Bug 665 - very basic nmigen-to-c compiler needed
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 241
Reported: 2021-08-05 11:50 BST by Luke Kenneth Casson Leighton
Modified: 2021-11-21 18:50 GMT

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2021-08-05 11:50:21 BST
https://github.com/apertus-open-source-cinema/naps/blob/9ebbc0/naps/soc/cli.py#L17

for PowerDecoder and PowerDecoder2 the output is sufficiently
complex that duplicating it (and maintaining a duplicate) is not sensible.

therefore create a VERY basic nmigen-to-c converter through
simple AST node tree-walking.
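
As a rough illustration of what "simple AST node tree-walking" could mean, here is a minimal sketch of a recursive walker that emits C text for an expression tree. The `node` struct, `emit_expr` and `emit_assign` are hypothetical names made up for this example; they are not nmigen's real AST classes, and a real converter would most likely do the walk in Python over nmigen's own nodes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression node, NOT nmigen's real AST. */
typedef enum { NODE_SIGNAL, NODE_CONST, NODE_ADD } node_kind;

typedef struct node {
    node_kind kind;
    const char *name;          /* NODE_SIGNAL: name of the C variable */
    unsigned long long value;  /* NODE_CONST: literal value */
    const struct node *lhs;    /* NODE_ADD: left operand */
    const struct node *rhs;    /* NODE_ADD: right operand */
} node;

/* Append s to buf (current length len, capacity cap); returns new length.
 * Text that does not fit is dropped, which is acceptable for a sketch. */
static size_t emit_str(char *buf, size_t len, size_t cap, const char *s)
{
    size_t n = strlen(s);
    if (len + n + 1 > cap)
        return len;
    memcpy(buf + len, s, n + 1);
    return len + n;
}

/* Walk the tree depth-first, emitting a C expression for each node. */
static size_t emit_expr(const node *n, char *buf, size_t len, size_t cap)
{
    char tmp[32];
    switch (n->kind) {
    case NODE_SIGNAL:
        return emit_str(buf, len, cap, n->name);
    case NODE_CONST:
        snprintf(tmp, sizeof tmp, "%lluULL", n->value);
        return emit_str(buf, len, cap, tmp);
    case NODE_ADD:
        len = emit_str(buf, len, cap, "(");
        len = emit_expr(n->lhs, buf, len, cap);
        len = emit_str(buf, len, cap, " + ");
        len = emit_expr(n->rhs, buf, len, cap);
        return emit_str(buf, len, cap, ")");
    }
    return len;
}

/* Emit "target = expr;" for an assignment such as x.eq(y + 5). */
static void emit_assign(const char *target, const node *expr,
                        char *buf, size_t cap)
{
    size_t len = 0;
    assert(cap > 0);
    buf[0] = '\0';
    len = emit_str(buf, len, cap, target);
    len = emit_str(buf, len, cap, " = ");
    len = emit_expr(expr, buf, len, cap);
    emit_str(buf, len, cap, ";");
}
```

Given a tree for `y + 5` and target `x`, `emit_assign` fills the buffer with `x = (y + 5ULL);`. Width handling and signedness are deliberately left out here.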
Comment 1 Cesar Strauss 2021-08-06 18:17:58 BST
> for PowerDecoder and PowerDecoder2 the output is sufficiently
> complex that duplicating it (and maintaining a duplicate) is not sensible.

An alternative for this would be to convert to a C++ simulation using cxxrtl and wrap the evaluation function into a library.
Comment 2 Luke Kenneth Casson Leighton 2021-08-06 19:29:19 BST
(In reply to Cesar Strauss from comment #1)
> > for PowerDecoder and PowerDecoder2 the output is sufficiently
> > complex that duplicating it (and maintaining a duplicate) is not sensible.
> 
> An alternative for this would be to convert to a C++ simulation using cxxrtl
> and wrap the evaluation function into a library.

nice idea in theory however c++ and the associated template library
it uses will not make it into the linux kernel.

i took a look yesterday at _pyrtl.py, i did not realise it actually creates
*python* code which is eval'd and compiled and then executed as a nameless
function.

this is extremely cool because the python code (which can be inspected
by setting a debug environment variable) is very basic and conversion to c
should be extremely easy.
Comment 3 Luke Kenneth Casson Leighton 2021-08-31 20:30:43 BST
ok so the idea here is to have the bare minimum code-generator which is
actually executable c code.  it is reasonable to assume (for now) that
the maximum Signal width will be 64-bit, but not reasonable to assume
it will stay that way.

therefore, part of the project involves creating some macro-templates
for Signal arithmetic (in c) and having the compiler spit out both
the macros and their usage.

nmigen:

     comb += x.eq(y + 5)

c output (or close to it):

     #define SIGNAL uint64_t
     #define SADD(res, x, y) ((res) = (x) + (y))

     .... SADD(x, y, 5)

something like that.
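
A slightly fuller sketch of the macro idea, assuming Signals of at most 64 bits for now: the result is masked to the target width, since an nmigen assignment truncates to the width of the left-hand Signal. The `bits` parameter, `SIGNAL_MASK` and `eval_x` are assumptions added for illustration and are not part of the proposal above; the example also assumes both x and y are 8 bits wide.

```c
#include <assert.h>
#include <stdint.h>

#define SIGNAL uint64_t

/* Mask covering the low `bits` bits; valid for 1 <= bits <= 64. */
#define SIGNAL_MASK(bits) \
    ((bits) >= 64 ? UINT64_MAX : ((UINT64_C(1) << (bits)) - 1))

/* res = (x + y), truncated to `bits` bits. */
#define SADD(res, x, y, bits) \
    ((res) = ((SIGNAL)(x) + (SIGNAL)(y)) & SIGNAL_MASK(bits))

/* What a generator might emit for:  comb += x.eq(y + 5)
 * assuming (for this example only) that x and y are 8-bit Signals. */
static SIGNAL eval_x(SIGNAL y)
{
    SIGNAL x;
    assert(y <= SIGNAL_MASK(8)); /* y is expected to fit its declared width */
    SADD(x, y, 5, 8);
    return x;
}
```

With these assumptions `eval_x(3)` gives 8, and `eval_x(255)` wraps around to 4. Swapping the macro bodies for function calls later (for Signals wider than 64 bits) would not change the emitted call sites.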
Comment 4 Jacob Lifshay 2021-08-31 20:36:48 BST
(In reply to Luke Kenneth Casson Leighton from comment #3)
> ok so the idea here is to have the bare minimum code-generator which is
> actually executable c code.  it is reasonable to assume (for now) that
> the maximum Signal width will be 64-bit, but not reasonable to assume
> it will stay that way.
> 
> therefore, part of the project involves creating some macro-templates
> for Signal arithmetic (in c) and having the compiler spit out both
> the macros and their usage.

I'd expect that it'll work better for the C to be completely de-generic-ified, and not use a mountain of undecipherable macros to make everything work. Being able to read the generated code would be nice :)
Comment 5 Luke Kenneth Casson Leighton 2021-08-31 21:01:19 BST
(In reply to Jacob Lifshay from comment #4)

> I'd expect that it'll work better for the C to be completely
> de-generic-ified, and not use a mountain of undecipherable macros to make
> everything work, being able to read the generated code would be nice :)

signals unfortunately are not limited in length in any way, shape or form.
there is no such concept in c as a basic integer type capable of adding
4,096 bits.

consequently, macros (or macros hiding functions) are unavoidable.
Comment 6 Luke Kenneth Casson Leighton 2021-08-31 21:02:02 BST
(In reply to Luke Kenneth Casson Leighton from comment #5)

> signals unfortunately are not limited in length in any way, shape or form.
> there is no such concept in c as a basic integer type capable of adding
> 4,096 bits.

(cxxsim uses c++ templates.  compile-times are off the charts as a result)
Comment 7 Jacob Lifshay 2021-08-31 21:48:22 BST
(In reply to Luke Kenneth Casson Leighton from comment #5)
> (In reply to Jacob Lifshay from comment #4)
> 
> > I'd expect that it'll work better for the C to be completely
> > de-generic-ified, and not use a mountain of undecipherable macros to make
> > everything work, being able to read the generated code would be nice :)
> 
> signals unfortunately are not limited in length in any way, shape or form.
> there is no such concept in c as a basic integer type capable of adding
> 4,096 bits.
> 
> consequently, macros (or macros hiding functions) are unavoidable.

there's an easy solution: use arrays when signals are more than 64-bits:

#include <stddef.h>
#include <stdint.h>

typedef uint32_t signal_word_t;
typedef uint64_t signal_dword_t;
#define SIGNAL_WORD_BITS 32
#define SIGNAL_ARRAY_SIZE(bits) \
    (((size_t)(bits) + (SIGNAL_WORD_BITS - 1)) / SIGNAL_WORD_BITS)

static inline size_t saturating_sub(size_t a, size_t b)
{
    return a >= b ? a - b : 0;
}

static inline void add_signal(
    signal_word_t *restrict out,
    const signal_word_t *in0,
    const signal_word_t *in1,
    size_t bits)
{
    size_t i;
    signal_dword_t carry = 0;
    for(i = 0; bits > 0; i++)
    {
        signal_dword_t sum = (signal_dword_t)in0[i];
        sum += (signal_dword_t)in1[i] + carry;
        carry = sum >> SIGNAL_WORD_BITS;
        if(bits < SIGNAL_WORD_BITS)
            sum &= (1ULL << bits) - 1;
        out[i] = (signal_word_t)sum;
        bits = saturating_sub(bits, SIGNAL_WORD_BITS);
    }
}

static inline void cast_unsigned_signal(
    signal_word_t *restrict out,
    size_t out_bits,
    const signal_word_t *in,
    size_t in_bits)
{
    size_t i;
    for(i = 0; out_bits > 0; i++)
    {
        signal_word_t v = in_bits > 0 ? in[i] : 0;
        // assumption: `in` is already padded with zero bits
        // to fill out the last word
        if(out_bits < SIGNAL_WORD_BITS)
            v &= (1ULL << out_bits) - 1;
        out[i] = v;
        out_bits = saturating_sub(out_bits, SIGNAL_WORD_BITS);
        in_bits = saturating_sub(in_bits, SIGNAL_WORD_BITS);
    }
}

// openpower_regs is assumed to be a struct with uint64_t fields ra, rb, rt
void openpower_add(openpower_regs *regs) {
    // replace with actual code:
    signal_word_t ra[SIGNAL_ARRAY_SIZE(64)];
    signal_word_t rb[SIGNAL_ARRAY_SIZE(64)];
    signal_word_t rt[SIGNAL_ARRAY_SIZE(64)];
    signal_word_t lhs[SIGNAL_ARRAY_SIZE(256)];
    signal_word_t rhs[SIGNAL_ARRAY_SIZE(256)];
    signal_word_t sum[SIGNAL_ARRAY_SIZE(256)];
    // regs is a pointer, so fields are accessed with ->
    ra[0] = (signal_word_t)regs->ra;
    ra[1] = (signal_word_t)(regs->ra >> SIGNAL_WORD_BITS);
    rb[0] = (signal_word_t)regs->rb;
    rb[1] = (signal_word_t)(regs->rb >> SIGNAL_WORD_BITS);
    cast_unsigned_signal(lhs, 256, ra, 64);
    cast_unsigned_signal(rhs, 256, rb, 64);
    add_signal(sum, lhs, rhs, 256);
    cast_unsigned_signal(rt, 64, sum, 256);
    regs->rt = ((signal_dword_t)rt[1] << SIGNAL_WORD_BITS) | rt[0];
}
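
To check that the word-array approach carries across a 32-bit word boundary, here is a small self-contained sketch. `add_signal` is a condensed copy of the function in this comment; `add_u64_via_words` is a hypothetical wrapper added only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t signal_word_t;
typedef uint64_t signal_dword_t;
#define SIGNAL_WORD_BITS 32
#define SIGNAL_ARRAY_SIZE(bits) \
    (((size_t)(bits) + (SIGNAL_WORD_BITS - 1)) / SIGNAL_WORD_BITS)

/* Condensed copy of add_signal: ripple-carry add over 32-bit words. */
static void add_signal(signal_word_t *restrict out,
                       const signal_word_t *in0,
                       const signal_word_t *in1,
                       size_t bits)
{
    signal_dword_t carry = 0;
    for (size_t i = 0; bits > 0; i++) {
        signal_dword_t sum = (signal_dword_t)in0[i] + in1[i] + carry;
        carry = sum >> SIGNAL_WORD_BITS;
        if (bits < SIGNAL_WORD_BITS)
            sum &= (1ULL << bits) - 1; /* truncate a partial top word */
        out[i] = (signal_word_t)sum;
        bits = bits >= SIGNAL_WORD_BITS ? bits - SIGNAL_WORD_BITS : 0;
    }
}

/* Hypothetical wrapper: add two 64-bit values held as two-word signals. */
static uint64_t add_u64_via_words(uint64_t a, uint64_t b)
{
    signal_word_t wa[SIGNAL_ARRAY_SIZE(64)] = {
        (signal_word_t)a, (signal_word_t)(a >> SIGNAL_WORD_BITS)
    };
    signal_word_t wb[SIGNAL_ARRAY_SIZE(64)] = {
        (signal_word_t)b, (signal_word_t)(b >> SIGNAL_WORD_BITS)
    };
    signal_word_t ws[SIGNAL_ARRAY_SIZE(64)];

    assert(SIGNAL_ARRAY_SIZE(64) == 2); /* 64 bits occupy two 32-bit words */
    add_signal(ws, wa, wb, 64);
    return ((uint64_t)ws[1] << SIGNAL_WORD_BITS) | ws[0];
}
```

For example, `add_u64_via_words(0xFFFFFFFF, 1)` yields `0x100000000` (the carry moves into the second word), and `add_u64_via_words(UINT64_MAX, 1)` wraps to 0 because the final carry is discarded at 64 bits.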
Comment 9 Luke Kenneth Casson Leighton 2021-11-21 18:26:26 GMT
mikolajw, dmitry has 2 weeks free (precious full-time availability), do you
mind if he makes a start on this on tuesday?
Comment 10 wielgusmikolaj 2021-11-21 18:50:56 GMT
Sure, go ahead.