https://github.com/apertus-open-source-cinema/naps/blob/9ebbc0/naps/soc/cli.py#L17

for PowerDecoder and PowerDecoder2 the output is sufficiently complex that duplicating it (and maintaining a duplicate) is not sensible. therefore, create a VERY basic nmigen-to-c converter through simple AST node tree-walking.
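As a rough illustration of "simple AST node tree-walking", here is a minimal sketch in C. The `node` struct, `emit()`, `emit_assign()` and the output format are all hypothetical: a real converter would walk nmigen's own Python AST (in the style of _pyrtl.py) rather than a C struct. What it shows is the core idea: a recursive walk that emits fully bracketed C, with the result truncated to the target Signal's width (assumed here to be at most 64 bits).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified expression node. */
typedef enum { NODE_CONST, NODE_SIGNAL, NODE_BINOP } node_kind;

typedef struct node {
    node_kind kind;
    uint64_t value;                 /* NODE_CONST: the constant */
    const char *name;               /* NODE_SIGNAL: the C variable name */
    const char *op;                 /* NODE_BINOP: "+", "&", "|", ... */
    const struct node *lhs, *rhs;   /* NODE_BINOP: operands */
} node;

/* Output buffer; this sketch simply asserts that it never overflows. */
typedef struct {
    char *buf;
    size_t size;
    size_t len;
} out_t;

static void put(out_t *o, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(o->buf + o->len, o->size - o->len, fmt, ap);
    va_end(ap);
    assert(w >= 0 && (size_t)w < o->size - o->len);
    o->len += (size_t)w;
}

/* Recursive walk: every binary operation gets its own brackets, so C
   operator precedence can never re-associate the tree. */
static void emit(out_t *o, const node *n)
{
    switch (n->kind) {
    case NODE_CONST:
        put(o, "%lluULL", (unsigned long long)n->value);
        break;
    case NODE_SIGNAL:
        put(o, "%s", n->name);
        break;
    case NODE_BINOP:
        put(o, "(");
        emit(o, n->lhs);
        put(o, " %s ", n->op);
        emit(o, n->rhs);
        put(o, ")");
        break;
    }
}

/* Emit "target = expr & mask;", truncating to the target's width. */
static void emit_assign(out_t *o, const char *target, unsigned width,
                        const node *rhs)
{
    uint64_t mask = width >= 64 ? UINT64_MAX : (((uint64_t)1 << width) - 1);
    put(o, "%s = ", target);
    emit(o, rhs);
    put(o, " & 0x%llxULL;", (unsigned long long)mask);
}
```

For `x.eq(y + 5)` with an 8-bit `x`, this walk writes `x = (y + 5ULL) & 0xffULL;` into the buffer.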
> for PowerDecoder and PowerDecoder2 the output is sufficiently
> complex that duplicating it (and maintaining a duplicate) is not sensible.

An alternative for this would be to convert to a C++ simulation using cxxrtl and wrap the evaluation function into a library.
(In reply to Cesar Strauss from comment #1)
> > for PowerDecoder and PowerDecoder2 the output is sufficiently
> > complex that duplicating it (and maintaining a duplicate) is not sensible.
>
> An alternative for this would be to convert to a C++ simulation using cxxrtl
> and wrap the evaluation function into a library.

nice idea in theory, however c++ and the associated template library it uses will not make it into the linux kernel.

i took a look yesterday at _pyrtl.py: i did not realise it actually creates *python* code which is eval'd, compiled and then executed as a nameless function. this is extremely cool, because the python code (which can be inspected by setting a debug environment variable) is very basic, and conversion to c should be extremely easy.
ok so the idea here is to have the bare minimum code-generator whose output is actually executable c code. it is reasonable to assume (for now) that the maximum Signal width will be 64-bit, but not reasonable to assume it will stay that way.

therefore, part of the project involves creating some macro-templates for Signal arithmetic (in c) and having the compiler spit out both the macros and their usage.

nmigen:

    comb += x.eq(y + 5)

c output (or close to it):

    #define SIGNAL uint64_t
    #define SADD(res, x, y) ((res) = (x) + (y))
    ....
    SADD(x, y, 5)

something like that.
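A minimal sketch of the macro-template idea, assuming every Signal is at most 64 bits wide. `SIGNAL` and `SADD` follow the example above; the `SMASK` helper, the extra width argument to `SADD`, and `comb_x()` are hypothetical additions. The width is needed because nmigen truncates the right-hand side of `x.eq(...)` to the width of `x`, whereas plain `uint64_t` arithmetic wraps only at 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define SIGNAL uint64_t

/* all-ones mask for a Signal of the given width (1..64 bits) */
#define SMASK(width) \
    ((width) >= 64 ? ~(SIGNAL)0 : (((SIGNAL)1 << (width)) - 1))

/* res = x + y, truncated to res_width bits; every argument is bracketed */
#define SADD(res, res_width, x, y) \
    ((res) = ((SIGNAL)(x) + (SIGNAL)(y)) & SMASK(res_width))

/* what a generator might emit for: comb += x.eq(y + 5), with x 8 bits wide */
static SIGNAL comb_x(SIGNAL y)
{
    SIGNAL x;
    SADD(x, 8, y, 5);
    return x;
}
```

With this sketch `comb_x(255)` returns 4, because 255 + 5 = 260 wraps around in an 8-bit Signal.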
(In reply to Luke Kenneth Casson Leighton from comment #3)
> ok so the idea here is to have the bare minimum code-generator which is
> actually executable c code. it is reasonable to assume (for now) that
> the maximum Signal width will be 64-bit, but not reasonable to assume
> it will stay that way.
>
> therefore, part of the project involves creating some macro-templates
> for Signal arithmetic (in c) and having the compiler spit out both
> the macros and their usage.

I'd expect that it'll work better for the C to be completely de-generic-ified, and not use a mountain of undecipherable macros to make everything work; being able to read the generated code would be nice :)
(In reply to Jacob Lifshay from comment #4)
> I'd expect that it'll work better for the C to be completely
> de-generic-ified, and not use a mountain of undecipherable macros to make
> everything work, being able to read the generated code would be nice :)

signals unfortunately are not limited in length in any way, shape or form. there is no such concept in c as a basic integer type capable of adding 4,096 bits.

consequently, macros (or macros hiding functions) are unavoidable.
(In reply to Luke Kenneth Casson Leighton from comment #5)
> signals unfortunately are not limited in length in any way, shape or form.
> there is no such concept in c as a basic integer type capable of adding
> 4,096 bits.

(cxxsim uses c++ templates. compile-times are off the charts as a result)
(In reply to Luke Kenneth Casson Leighton from comment #5)
> (In reply to Jacob Lifshay from comment #4)
>
> > I'd expect that it'll work better for the C to be completely
> > de-generic-ified, and not use a mountain of undecipherable macros to make
> > everything work, being able to read the generated code would be nice :)
>
> signals unfortunately are not limited in length in any way, shape or form.
> there is no such concept in c as a basic integer type capable of adding
> 4,096 bits.
>
> consequently, macros (or macros hiding functions) are unavoidable.

there's an easy solution: use arrays when signals are more than 64 bits:

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t signal_word_t;
    typedef uint64_t signal_dword_t;
    #define SIGNAL_WORD_BITS 32
    #define SIGNAL_ARRAY_SIZE(bits) \
        (((size_t)(bits) + (SIGNAL_WORD_BITS - 1)) / SIGNAL_WORD_BITS)

    static inline size_t saturating_sub(size_t a, size_t b)
    {
        return a >= b ? a - b : 0;
    }

    static inline void add_signal(
        signal_word_t *restrict out,
        const signal_word_t *in0,
        const signal_word_t *in1,
        size_t bits)
    {
        size_t i;
        signal_dword_t carry = 0;
        for(i = 0; bits > 0; i++) {
            signal_dword_t sum = (signal_dword_t)in0[i];
            sum += (signal_dword_t)in1[i] + carry;
            carry = sum >> SIGNAL_WORD_BITS;
            if(bits < SIGNAL_WORD_BITS)
                sum &= (1ULL << bits) - 1;
            out[i] = (signal_word_t)sum;
            bits = saturating_sub(bits, SIGNAL_WORD_BITS);
        }
    }

    static inline void cast_unsigned_signal(
        signal_word_t *restrict out,
        size_t out_bits,
        const signal_word_t *in,
        size_t in_bits)
    {
        size_t i;
        for(i = 0; out_bits > 0; i++) {
            signal_word_t v = in_bits > 0 ? in[i] : 0;
            // assumption: `in` is already padded with zero bits
            // to fill out the last word
            if(out_bits < SIGNAL_WORD_BITS)
                v &= (1ULL << out_bits) - 1;
            out[i] = v;
            out_bits = saturating_sub(out_bits, SIGNAL_WORD_BITS);
            in_bits = saturating_sub(in_bits, SIGNAL_WORD_BITS);
        }
    }

    // placeholder register file: replace with the actual definition
    typedef struct {
        uint64_t ra, rb, rt;
    } openpower_regs;

    void openpower_add(openpower_regs *regs)
    {
        // replace with actual code:
        signal_word_t ra[SIGNAL_ARRAY_SIZE(64)];
        signal_word_t rb[SIGNAL_ARRAY_SIZE(64)];
        signal_word_t rt[SIGNAL_ARRAY_SIZE(64)];
        signal_word_t lhs[SIGNAL_ARRAY_SIZE(256)];
        signal_word_t rhs[SIGNAL_ARRAY_SIZE(256)];
        signal_word_t sum[SIGNAL_ARRAY_SIZE(256)];
        ra[0] = (signal_word_t)regs->ra;
        ra[1] = (signal_word_t)(regs->ra >> SIGNAL_WORD_BITS);
        rb[0] = (signal_word_t)regs->rb;
        rb[1] = (signal_word_t)(regs->rb >> SIGNAL_WORD_BITS);
        cast_unsigned_signal(lhs, 256, ra, 64);
        cast_unsigned_signal(rhs, 256, rb, 64);
        add_signal(sum, lhs, rhs, 256);
        cast_unsigned_signal(rt, 64, sum, 256);
        regs->rt = ((signal_dword_t)rt[1] << SIGNAL_WORD_BITS) | rt[0];
    }
https://github.com/numba/numba#:~:text=A%20Just-In-Time%20Compiler%20for%20Numerical%20Functions%20in%20Python,subset%20of%20numerically-focused%20Python%2C%20including%20many%20NumPy%20functions
mikolajw, dmitry has 2 weeks free (precious full-time availability), do you mind if he makes a start on this on tuesday?
Sure, go ahead.
https://git.libre-soc.org/?p=nmigen.git;a=blob;f=nmigen/sim/_pyrtl.py;h=13d515f25cb4a6a297726d29901268b8a6a94a59;hb=e88d283ed30448ed5fe3ba264e3e56b48f2a4982#l433

    if os.getenv("NMIGEN_pysim_dump"):
        file = tempfile.NamedTemporaryFile("w", prefix="nmigen_pysim_", delete=False)
        file.write(code)
        filename = file.name
mikolaj, rather than create a new repo or alter nmigen, can you please start by taking a complete *unmodified* copy of _pyrtl.py, plus anything else needed to run _pyrtl.py as a nmigen pyrtl Simulator(), and drop it into the openpower-isa repository.

with appropriate renaming of classes and imports, the first task should be to get an absolutely simple simulation running *in pyrtl*, but in the openpower-isa directory, not the nmigen/sim directory. do not copy over more than that!

the next task is then to stop the actual Simulation itself from going ahead (it is probably sufficient to just not call "sim.add_sync_process()" or "sim.run()"), and then to begin adjusting the *copy* of _pyrtl.py so that it blats out c instead of python.

this should be extremely straightforward, albeit slightly trippy in that you will be working at 3rd hand: this is after all compiler technology, which is always a bit hairy. the saving grace here is that the output is utterly, utterly braindead simple: no loops, no switches, no gotos, no whiles, nothing.
https://cffi.readthedocs.io/en/latest/overview.html#if-you-don-t-have-an-already-installed-c-library-to-call

cffi looks really good! it has the huge advantage over ctypes that if there are any errors in the c definition given, the c compiler will throw a syntax error.

looks like a great start:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=dd56be83b81d17105061478e40fdfa57db9c8af1

that looks like the bare minimum needed, which is great. one thing you will find incredibly useful is seeing the names of the Signals:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ba350a46860654464853df07e6583f22d052132d

also i fixed an error in nmutil in the detection of the new Simulator API: the engine= parameter was getting overwritten. i also added an on_Display() function, because some of us are using a patched version of nmigen which can output $display for debugging purposes.

can i recommend starting from a way smaller unit test :) the one you picked, although it is the ultimate goal here, generates so much code that it's both overwhelming and also takes much longer to run. feel free to duplicate test_power_decoder.py and then drastically cut it back, or even replace the dut with e.g. one of the nmigen examples. this one would do:
https://git.libre-soc.org/?p=nmigen.git;a=blob;f=examples/basic/ctr.py;hb=HEAD
or, for a pure combinatorial one:
https://git.libre-soc.org/?p=nmigen.git;a=blob;f=examples/basic/pmux.py;hb=HEAD

general advice: start very small, so that it's extremely quick, you encounter errors quickly, and you can better relate the input to the output.

overall a fantastic start.
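To get a feel for what the pure-combinatorial pmux example should boil down to, here is a hand-written sketch of equivalent C. It assumes pmux.py switches on a 3-bit selector `s` with the cases "--1", "-1-", "1--" and a default of 0, and that (as in nmigen's Switch) the first matching case wins; the function names are hypothetical. The point it shows is how a pattern with "-" (don't-care) bits reduces to a mask/value compare, giving a flat if / else-if chain with no loops.

```c
#include <assert.h>
#include <stdint.h>

/* A case pattern such as "-1-" becomes mask 0b010, value 0b010:
   only the bits set in the mask take part in the comparison. */
static int matches(uint64_t s, uint64_t mask, uint64_t pattern)
{
    /* pattern bits outside the mask could never match */
    assert((pattern & ~mask) == 0);
    return (s & mask) == pattern;
}

/* Hypothetical hand translation of the pmux Switch. */
static uint32_t pmux(uint64_t s, uint32_t a, uint32_t b, uint32_t c)
{
    if (matches(s, 0x1, 0x1))        /* "--1" */
        return a;
    else if (matches(s, 0x2, 0x2))   /* "-1-" */
        return b;
    else if (matches(s, 0x4, 0x4))   /* "1--" */
        return c;
    else                             /* default case */
        return 0;
}
```

Because the cases are tried in order, `s = 0b011` selects `a` even though "-1-" also matches.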
i'm looking at the output, and it's full of these:

    set(&slots[1280], next_1280);

slots is obviously a global array of all of the Signals(), which get updated via repeated calls to run() functions until such time as no "slot" makes a change. at that point, you know everything's "settled". if keeping these "slots", the array will have to be passed in as the argument to "run()".

can i make a recommendation: create a suite of #defines for these indices?

    #define {name_of_signal}_1280 1280

then:

    set(&slots[{name_of_signal}_1280], next_{name_of_signal}_1280);

that way you can see the names, but the c code still uses integer indexing (the #define is resolved at compile time, so it costs nothing at run time). unlike in python, where it really really matters that slots[] is accessed as fast as possible (so integers are a damn good idea), this will be compiled, so it's useful to see what the heck is going on by having actual signal names.
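A small self-contained sketch of the named-index suggestion together with the "call run() until nothing changes" loop. Everything here is hypothetical: the signals `a`, `b`, `o` and their slot numbers, and `slot_t`, `commit()` and `settle()`, which only loosely mirror the curr/next state pysim keeps per Signal. `run()` stands in for generated code computing `o = a & b`, and takes the slots array as its argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical per-signal state: current value plus pending value */
typedef struct {
    uint64_t curr; /* value currently visible to readers */
    uint64_t next; /* value written by the last run() */
} slot_t;

/* named indices: signal name plus its slot number */
#define a_0 0
#define b_1 1
#define o_2 2
#define NUM_SLOTS 3

static void set(slot_t *slot, uint64_t value)
{
    slot->next = value;
}

/* stand-in for generated combinatorial code: o = a & b (1-bit signals) */
static void run(slot_t *slots)
{
    uint64_t next_o_2 = (slots[a_0].curr & slots[b_1].curr) & 0x1;
    set(&slots[o_2], next_o_2);
}

/* copy next into curr; report whether any slot changed */
static bool commit(slot_t *slots, size_t n)
{
    bool changed = false;
    for (size_t i = 0; i < n; i++) {
        if (slots[i].curr != slots[i].next) {
            slots[i].curr = slots[i].next;
            changed = true;
        }
    }
    return changed;
}

/* keep calling run() until no slot changes: everything has then settled */
static void settle(slot_t *slots, size_t n)
{
    unsigned rounds = 0;
    do {
        run(slots);
        rounds++;
        assert(rounds < 1000); /* guard against a combinatorial loop */
    } while (commit(slots, n));
}
```

Inputs are driven with `set()` and then `settle()` is called; after `set(&slots[a_0], 1); set(&slots[b_1], 1);` the settled value of `slots[o_2].curr` is 1.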
(In reply to Luke Kenneth Casson Leighton from comment #13)
> the one you picked, although it is the ultimate goal here,

[correction: PowerDecoder2 is the ultimate goal, in order to save vast amounts of time replicating then maintaining something in c that has already taken 12 months to develop]
moved code-generation to a template https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4d98a95ada958a9f683bc2c013f7c8037762eb17 https://libre-soc.org/irclog/%23libre-soc.2021-12-20.log.html#t2021-12-20T14:57:53
just added a very quick PowerDecode2 unit test, which really does nothing but put a LD instruction through the wringer:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=9e5d4975cf6a55642670a3f93ca0eb1893af1c95
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=86cc477b87b46087e3208ccad096c4c67147d643

there's no actual testing (no asserts), but it's enough to get a compile running. i fixed a couple of things, but left bool() and other functions, which will also be needed.

one thing to watch out for: brackets (operator precedence). there are some warnings being output that "|" and "&" tend to be problematic. it's generally a good idea to make things explicit: although the code ends up larger, it's not really meant to be read by humans, and at least there will be no issues due to the Abstract Syntax Tree being mismatched against the c standard's precedence rules. where the AST would expect "or(x, and(y, z))", we do not want c operator precedence to accidentally turn that into "or(and(x, y), z)", and some explicit brackets make that potential problem go away.
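A small sketch of why the explicit brackets matter. The function names are hypothetical; each pair emits the same AST once with full brackets and once naively, and the naive versions compute something different. These naive forms are also the kind of expression GCC's -Wparentheses (enabled by -Wall) warns about, which is likely where the "|" and "&" warnings come from.

```c
#include <assert.h>
#include <stdint.h>

/* AST and(x, or(y, z)), emitted with explicit brackets */
static uint64_t and_or_bracketed(uint64_t x, uint64_t y, uint64_t z)
{
    return (x & (y | z));
}

/* same AST emitted naively: C binds '&' tighter than '|',
   so this is parsed as (x & y) | z, a different tree */
static uint64_t and_or_naive(uint64_t x, uint64_t y, uint64_t z)
{
    return x & y | z;
}

/* AST eq(and(s, mask), pattern), emitted with explicit brackets */
static int match_bracketed(uint64_t s, uint64_t mask, uint64_t pattern)
{
    return ((s & mask) == pattern);
}

/* naive version: '==' binds tighter than '&', so this is
   s & (mask == pattern), which is 0 or s & 1 */
static int match_naive(uint64_t s, uint64_t mask, uint64_t pattern)
{
    return s & mask == pattern;
}
```

With `x = 0, y = 0, z = 1` the bracketed form gives 0 while the naive one gives 1, and with `s = mask = pattern = 2` the naive compare returns 0 instead of 1.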