665 – very basic nmigen-to-c compiler needed

Bug 665 - very basic nmigen-to-c compiler needed

Summary: very basic nmigen-to-c compiler needed

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:	760
Blocks:	241
	Show dependency tree / graph

Reported:	2021-08-05 11:50 BST by Luke Kenneth Casson Leighton
Modified:	2022-07-22 05:15 BST (History)
CC List:	5 users (show)

See Also:	583
NLnet milestone:	NLNet.2019.10.046.Standards
total budget (EUR) for completion of task and all subtasks:	2200
budget (EUR) for this task, excluding subtasks' budget:	2200
parent task for budget allocation:	241
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	lkcl = { amount = 1700, submitted = 2022-06-25, paid = 2022-07-21 } [jacob] amount = 500 submitted = 2022-07-06 paid = 2022-07-21

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luke Kenneth Casson Leighton 2021-08-05 11:50:21 BST

https://github.com/apertus-open-source-cinema/naps/blob/9ebbc0/naps/soc/cli.py#L17

for PowerDecoder and PowerDecoder2 the output is sufficiently
complex that duplicating it (and maintaining a duplicate) is not sensible.

therefore create a VERY basic nmigen-to-c converter through
simple AST node tree-walking.

Comment 1 Cesar Strauss 2021-08-06 18:17:58 BST

> for PowerDecoder and PowerDecoder2 the output is sufficiently
> complex that duplicating it (and maintaining a duplicate) is not sensible.

An alternative for this would be to convert to a C++ simulation using cxxrtl and wrap the evaluation function into a library.

Comment 2 Luke Kenneth Casson Leighton 2021-08-06 19:29:19 BST

(In reply to Cesar Strauss from comment #1)
> > for PowerDecoder and PowerDecoder2 the output is sufficiently
> > complex that duplicating it (and maintaining a duplicate) is not sensible.
> 
> An alternative for this would be to convert to a C++ simulation using cxxrtl
> and wrap the evaluation function into a library.

nice idea in theory however c++ and the associated template library
it uses will not make it into the linux kernel.

i took a look yesterday at _pyrtl.py, i did not realise it actually creates
*python* code which is eval'd and compiled and then executed as a nameless
function.

this is extremely cool because the python code (which can be inspected
by enabling a debug os.ENV var) is very basic and conversion to c should
be extremely easy.

Comment 3 Luke Kenneth Casson Leighton 2021-08-31 20:30:43 BST

ok so the idea here is to have the bare minimum code-generator which is
actually executable c code.  it is reasonable to assume (for now) that
the maximum Signal width will be 64-bit, but not reasonable to assume
it will stay that way.

therefore, part of the project involves creating some macro-templates
for Signal arithmetic (in c) and having the compiler spit out both
the macros and their usage.

nmigen:

     comb += x.eq(y + 5)

c output (or close to it):

     #define SIGNAL uint64_t
     #define SADD(res, x, y) (res = x + y)

     .... SADD(x, y, 5)

something like that.

Comment 4 Jacob Lifshay 2021-08-31 20:36:48 BST

(In reply to Luke Kenneth Casson Leighton from comment #3)
> ok so the idea here is to have the bare minimum code-generator which is
> actually executable c code.  it is reasonable to assume (for now) that
> the maximum Signal width will be 64-bit, but not reasonable to assume
> it will stay that way.
> 
> therefore, part of the project involves creating some macro-templates
> for Signal arithmetic (in c) and having the compiler spit out both
> the macros and their usage.

I'd expect that it'll work better for the C to be completely de-generic-ified, and not use a mountain of undecipherable macros to make everything work, being able to read the generated code would be nice :)

Comment 5 Luke Kenneth Casson Leighton 2021-08-31 21:01:19 BST

(In reply to Jacob Lifshay from comment #4)

> I'd expect that it'll work better for the C to be completely
> de-generic-ified, and not use a mountain of undecipherable macros to make
> everything work, being able to read the generated code would be nice :)

signals unfortunately are not limited in length in any way, shape or form.
there is no such concept in c as a basic integer type capable of adding
4,096 bits.

consequently, macros (or macros hiding functions) are unavoidable.

Comment 6 Luke Kenneth Casson Leighton 2021-08-31 21:02:02 BST

(In reply to Luke Kenneth Casson Leighton from comment #5)

> signals unfortunately are not limited in length in any way, shape or form.
> there is no such concept in c as a basic integer type capable of adding
> 4,096 bits.

(cxxsim uses c++ templates.  compile-times are off the charts as a result)

Comment 7 Jacob Lifshay 2021-08-31 21:48:22 BST

(In reply to Luke Kenneth Casson Leighton from comment #5)
> (In reply to Jacob Lifshay from comment #4)
> 
> > I'd expect that it'll work better for the C to be completely
> > de-generic-ified, and not use a mountain of undecipherable macros to make
> > everything work, being able to read the generated code would be nice :)
> 
> signals unfortunately are not limited in length in any way, shape or form.
> there is no such concept in c as a basic integer type capable of adding
> 4,096 bits.
> 
> consequently, macros (or macros hiding functions) are unavoidable.

there's an easy solution: use arrays when signals are more than 64-bits:

typedef uint32_t signal_word_t;
typedef uint64_t signal_dword_t;
#define SIGNAL_WORD_BITS 32
#define SIGNAL_ARRAY_SIZE(bits) \
    (((size_t)(bits) + (SIGNAL_WORD_BITS - 1)) / SIGNAL_WORD_BITS)

static inline size_t saturating_sub(size_t a, size_t b)
{
    return a >= b ? a - b : 0;
}

static inline void add_signal(
    signal_word_t *restrict out,
    const signal_word_t *in0,
    const signal_word_t *in1,
    size_t bits)
{
    size_t i;
    signal_dword_t carry = 0;
    for(i = 0; bits > 0; i++)
    {
        signal_dword_t sum = (signal_dword_t)in0[i];
        sum += (signal_dword_t)in1[i] + carry;
        carry = sum >> SIGNAL_WORD_BITS;
        if(bits < SIGNAL_WORD_BITS)
            sum &= (1ULL << bits) - 1;
        out[i] = (signal_word_t)sum;
        bits = saturating_sub(bits, SIGNAL_WORD_BITS);
    }
}

static inline void cast_unsigned_signal(
    signal_word_t *restrict out,
    size_t out_bits,
    const signal_word_t *in,
    size_t in_bits)
{
    size_t i;
    for(i = 0; out_bits > 0; i++)
    {
        signal_word_t v = in_bits > 0 ? in[i] : 0;
        // assumption: `in` is already padded with zero bits
        // to fill out the last word
        if(out_bits < SIGNAL_WORD_BITS)
            v &= (1ULL << out_bits) - 1;
        out[i] = v;
        out_bits = saturating_sub(out_bits, SIGNAL_WORD_BITS);
        in_bits = saturating_sub(in_bits, SIGNAL_WORD_BITS);
    }
}

void openpower_add(openpower_regs *regs) {
    // replace with actual code:
    signal_word_t ra[SIGNAL_ARRAY_SIZE(64)];
    signal_word_t rb[SIGNAL_ARRAY_SIZE(64)];
    signal_word_t rt[SIGNAL_ARRAY_SIZE(64)];
    signal_word_t lhs[SIGNAL_ARRAY_SIZE(256)];
    signal_word_t rhs[SIGNAL_ARRAY_SIZE(256)];
    signal_word_t sum[SIGNAL_ARRAY_SIZE(256)];
    ra[0] = (signal_word_t)regs.ra;
    ra[1] = regs.ra >> SIGNAL_WORD_BITS;
    rb[0] = (signal_word_t)regs.rb;
    rb[1] = regs.rb >> SIGNAL_WORD_BITS;
    cast_unsigned_signal(lhs, 256, ra, 64);
    cast_unsigned_signal(rhs, 256, rb, 64);
    add_signal(sum, lhs, rhs, 256);
    cast_unsigned_signal(rt, 64, sum, 256);
    regs.rt = ((signal_dword_t)rt[1] << SIGNAL_WORD_BITS) | rt[0];
}

Comment 8 Luke Kenneth Casson Leighton 2021-09-20 20:08:33 BST

https://github.com/numba/numba#:~:text=A%20Just-In-Time%20Compiler%20for%20Numerical%20Functions%20in%20Python,subset%20of%20numerically-focused%20Python%2C%20including%20many%20NumPy%20functions

Comment 9 Luke Kenneth Casson Leighton 2021-11-21 18:26:26 GMT

mikolajw, dmitry has 2 weeks free (precious full-time availability), do you
mind if he makes a start on this on tuesday?

Comment 10 wielgusmikolaj 2021-11-21 18:50:56 GMT

Sure, go ahead.

Comment 11 Luke Kenneth Casson Leighton 2021-12-03 19:01:18 GMT

https://git.libre-soc.org/?p=nmigen.git;a=blob;f=nmigen/sim/_pyrtl.py;h=13d515f25cb4a6a297726d29901268b8a6a94a59;hb=e88d283ed30448ed5fe3ba264e3e56b48f2a4982#l433

 433             if os.getenv("NMIGEN_pysim_dump"):
 434                 file = tempfile.NamedTemporaryFile("w", prefix="nmigen_pysim_", delete=False)
 435                 file.write(code)
 436                 filename = file.name

Comment 12 Luke Kenneth Casson Leighton 2021-12-03 20:58:48 GMT

mikolaj rather than create a new repo, or alter nmigen, can you
please start by taking a complete *unmodified* copy of _pyrtl.py and anything
else needed to run _pyrtl.py as a nmigen pyrtl Simulator() and drop
it into the openpower-isa repository.

with appropriate renaming of classes and imports the first task should
be to get an absolutely simple simulation running *in pyrtl* but in
the openpower-isa directory not nmigen/sim directory.

do not copy over more than that!

the next task is then to stop the actual Simulation
itself from going ahead, it is probably sufficient to
just not call "sim.add_sync_process()" or "sim.run()"
and to then begin the task of adjusting the *copy* of
_pyrtl.py so that it blats out c instead of python.

this should be extremely straightforward albeit slightly
trippy in that you will be working at 3rd hand: this is
after all compiler technology, which is always a bit hairy.

the saving grace here is that the output is utterly utterly
braindead simple. no loops, no switches, no gotos, no whiles,
nothing.

Comment 13 Luke Kenneth Casson Leighton 2021-12-08 12:57:05 GMT

https://cffi.readthedocs.io/en/latest/overview.html#if-you-don-t-have-an-already-installed-c-library-to-call

cffi looks really good! it has the huge advantage over ctypes that if
there are any errors in the c definition given, the c compiler will throw
a syntax error.


loooks like a great start

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=dd56be83b81d17105061478e40fdfa57db9c8af1

that looks like the bare minimum needed, which is great.

one thing you will find incredibly useful, seeing the names of the Signals:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ba350a46860654464853df07e6583f22d052132d

also i fixed an error in nmutil on the detection of the new Simulator API,
the engine= parameter was getting overwritten.  i also added an on_Display()
function which, some of us are using a patched version of nmigen which can
output $display for debugging purposes.

can i recommend starting from a waay smaller unit test :)
the one you picked, although it is the ultimate goal here, it generates
so much code that it's both overwhelming and also takes much longer to
run.

feel free to duplicate test_power_decoder.py then drastically cut it back
or even replace the dut with e.g. one of the nmigen examples.  this one
would do:

https://git.libre-soc.org/?p=nmigen.git;a=blob;f=examples/basic/ctr.py;hb=HEAD

or, for a pure combinatorial one:

https://git.libre-soc.org/?p=nmigen.git;a=blob;f=examples/basic/pmux.py;hb=HEAD

general advice: start very small so that it's extremely quick, you can 
encounter errors quickly, and better relate the input to the output.

overall a fantastic start.

Comment 14 Luke Kenneth Casson Leighton 2021-12-08 13:03:54 GMT

i'm looking at the output, and it's full of these:

    set(&slots[1280], next_1280);

slots is obviously a global array of all of the Signals(), which get
updated via repeated calls to run() functions, until such time as
no "slot" makes a change.  at that point, you know everything's "settled".

it will have to be passed in as the argument to "run()"

if keeping these "slots", can i make a recommendation to create a suite
of #defines for these indices?

#define {name_of_signal}_1280 1280

then

    set(&slots[{name_of_signal}_1280], next_{name_of_signal}_1280);

that way you can see the names, but in the c code it still uses integer
indexing?

unlike in python where it really really matters that slots[] is accessed
as fast as possible (so integers is a damn good idea), as this will be
compiled it's useful to see what the heck is going on by having actual
signal names.

Comment 15 Luke Kenneth Casson Leighton 2021-12-08 19:32:48 GMT

(In reply to Luke Kenneth Casson Leighton from comment #13)

> the one you picked, although it is the ultimate goal here,

[correction: PowerDecoder2 is the ultimate goal, in order to
save vast amounts of time replicating then maintaining something
in c that has already taken 12 months to develop]

Comment 16 Luke Kenneth Casson Leighton 2021-12-20 15:58:14 GMT

moved code-generation to a template

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4d98a95ada958a9f683bc2c013f7c8037762eb17

https://libre-soc.org/irclog/%23libre-soc.2021-12-20.log.html#t2021-12-20T14:57:53

Comment 17 Luke Kenneth Casson Leighton 2021-12-26 13:42:34 GMT

just added a very quick PowerDecode2 unit test which really does nothing
but put a LD instruction through the wringer

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=9e5d4975cf6a55642670a3f93ca0eb1893af1c95
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=86cc477b87b46087e3208ccad096c4c67147d643

there's no actual testing (no asserts), it's enough to get a compile
running though.

i fixed a couple of things, but left bool() and other functions, which
will be needed

also one thing to watch out for: brackets (operator precedence).
there are some warnings being outputted that "|" and "&" tend to
be problematic, it's generally a good idea to make things explicit,
although the code ends up larger, it's not really to be read by humans,
and at least there will be no issues due to the Abstract Syntax Tree
being mis-matched against the c standard.

where the AST would expect "or(x, and(y, z)" we do not want c operator
precedence to accidentally do that as "or(and(x,y), z)" and some
explicit brackets makes that potential problem go away