Bug 979 - Implement C-based Power ISA decoder compiler
Summary: Implement C-based Power ISA decoder compiler
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Dmitry Selyutin
URL:
Depends on: 1094
Blocks: 1154
  Show dependency treegraph
 
Reported: 2022-12-08 18:17 GMT by Dmitry Selyutin
Modified: 2024-06-01 14:45 BST (History)
4 users (show)

See Also:
NLnet milestone: NLnet.2021-08-071.cavatools
total budget (EUR) for completion of task and all subtasks: 5500
budget (EUR) for this task, excluding subtasks' budget: 5500
parent task for budget allocation: 939
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
ghostmansd={amount=5000, submitted=2023-09-14, paid=2023-09-20} lkcl={amount=500, submitted=2023-09-10, paid=2023-09-12}


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Selyutin 2022-12-08 18:17:25 GMT
In scope of supporting cavatools, one of crucial pieces is instruction decoder for Power ISA. This task can re-use some of existing infrastructure of the project, relying on nMigen and insndb. It is critically important to provide the code which can be used as a standalone entity: we can and should generate some bits for this code, but it should not rely on external components other than cavatools itself.
Comment 1 Dmitry Selyutin 2022-12-08 18:19:45 GMT
In an ideal world I think we should not even depend on cavatools (in fact, the dependency should be opposite). However, I relaxed this requirement for this iteration due to the time constraints and the understanding that the world is not ideal.
Comment 2 Dmitry Selyutin 2023-08-29 20:08:14 BST
With new visitors (https://bugs.libre-soc.org/show_bug.cgi?id=1094) we should (mostly?) be ready to generate the huge major opcode table like binutils do. Approximately the following steps:
1. Generate a huge table with instructions sorted by opcodes, along with the operands.
2. Provide a way to extend this table with additional data we currently lack a bit (e.g. type of operand, various flags, etc.).
3. Supply an additional manually written code which performs the lookup based on the instruction.
4. Everything should be in C. I suggest we compile this code into standalone library so that it can be reused by anything which needs instruction decoding. I assume we'll need this functionality later for other activities as well.
Comment 3 Dmitry Selyutin 2023-08-29 20:11:04 BST
After some thoughts, I decided to postpone raising subtasks here, because the overall description above already perfectly summarizes the task. I suggest that we limit this exact task to the library, and leave the actual adoption for later milestones. I will, however, check that the library works; if time permits, I'll even might provide some tests.
Comment 4 Dmitry Selyutin 2023-08-30 21:03:03 BST
I started working on this task, marked as "in progress"
Comment 5 Dmitry Selyutin 2023-08-31 22:16:38 BST
OK, it looks like I have something already. Still working on that, but the result looks promising. Here is the sketch; I've deliberately dropped "parts" of the code (8K+ lines).

static const struct svp64_operand svp64_operands[] = {
    /* snip */
    [29] = /* SVxd */ {
        .asm = svp64_operand_nonzero_asm,
        .dis = svp64_operand_nonzero_dis,
        .fields = {6, 7, 8, 9, 10},
    },
    /* snip */
    [64] = /* FRT */ {
        .asm = svp64_operand_fpr_asm,
        .dis = svp64_operand_fpr_dis,
        .fields = {6, 7, 8, 9, 10},
    },
    /* snip */
    [70] = /* FRA */ {
        .asm = svp64_operand_fpr_asm,
        .dis = svp64_operand_fpr_dis,
        .fields = {11, 12, 13, 14, 15},
    },
    /* snip */
};

static const struct svp64_instruction svp64_instructions[] = {
    /* snip */
    {
        .name = "fpow.",
        .opcode = {
            .value = 0xfc0007db,
            .mask = 0xfc0007ff,
        },
        .operands = {
            /* FRT */ [0] = &svp64_operands[64],
            /* FRA */ [1] = &svp64_operands[70],
            /* FRB */ [2] = &svp64_operands[71],
        },
    },
};

As you can see, the overall code structure is somewhat familiar... :-) This generation needs two visitors, where the first one creates a compact table of operands and the second one creates the instructions. Perhaps there will be more visitors, we have to output more entries.
Comment 6 Dmitry Selyutin 2023-08-31 22:23:33 BST
Ah yes, I forgot that asm is a keyword. Sigh. I'll use assemble/disassemble.
Comment 7 Luke Kenneth Casson Leighton 2023-08-31 22:50:42 BST
(In reply to Dmitry Selyutin from comment #6)
> Ah yes, I forgot that asm is a keyword.

i did tell you!

> Sigh. I'll use assemble/disassemble.

i like "ass" and "disass" but these maaay not be well-received
by the wider binutils community :)
Comment 8 Luke Kenneth Casson Leighton 2023-08-31 23:26:29 BST
(In reply to Dmitry Selyutin from comment #5)
> OK, it looks like I have something already. Still working on that, but the
> result looks promising.

awesome, that was quick.

> As you can see, the overall code structure is somewhat familiar... :-) 

indeed, i suspect it will end up near-identical to binutils obj/asm
except perhaps with a stand-alone, more self-contained API?

btw watch out for the size of the decoder table, and consider
(as a *second* phase of optimisation) using that quine algorithm
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/quine.py;hb=HEAD

it should then be easy to "group" by the largest number of
matching bits (by masked-value up to a maximum
set array size, then branch out in a tree:

opcode 0b--011111
opcode 0b---11110

results in a "first level" matcher of "0b---1111-".

the idea there being to reduce decode time by not having massive
numbers of if-elif-elif-elif conditions but instead a cascading
sequence of jump-tables. hell, even arrays of functions to call
that decode the next part.  anything to avoid the massive
switch/case mess.

i would expect this to create a three-level jump-table, corresponding
to MAJOR PO, then 1/2 of the XO (bits 26-31), and finally bits 21-25
although ironically you might as well do that by hand, sigh.

a fun option to explore would be hashes, but not just any hash:
*actually dynamically and randomly find one* that happens to
produce 100% unique hits (remember "illegal opcode" has to be
also detected)

and if you *really* want some fun, and to not have L1 Cache
end up being obliterated due to the size of decode being so large
it cannot fit in L1 Cache, look up "MROB RIES", which is a FORTH
engine, and adapt it to find an *equation* that produces a unique
index for each opcode (!!! yes really! i have seen this done with
UCS-2 to DOS Codepage tables, in samba)


but... all still way into the future as an optimisation.
Comment 9 Jacob Lifshay 2023-08-31 23:31:31 BST
(In reply to Luke Kenneth Casson Leighton from comment #7)
> (In reply to Dmitry Selyutin from comment #6)
> > Ah yes, I forgot that asm is a keyword.

I tend to use x_ when x is a keyword. So, asm_
Comment 10 Jacob Lifshay 2023-08-31 23:33:42 BST
sounds like you want
https://en.wikipedia.org/wiki/Perfect_hash_function#Minimal_perfect_hash_function
Comment 11 Dmitry Selyutin 2023-08-31 23:37:04 BST
TL;DR: at this stage, this will likely heavily follow binutils. IIRC they hash PO.

Detailed answer. At this stage, I will most likely avoid optimizing it. I want to make something that works, and care about optimizations at the later stage. First, they are hard; second, I have not so much time; third, but not the least, if I dive into optimizing it now, this will go outside of budget bounds.
Comment 12 Dmitry Selyutin 2023-08-31 23:45:19 BST
Also, I insist we will limit this to Power ISA which we support, and likely minus SVP64 disassembly. I don't want to copy manually written bits of binutils code, and would rather generate parts of it (that is, extend fields with visitors, and the ini-like or similar format). But these deserve standalone tasks. Any objections?
Comment 13 Dmitry Selyutin 2023-08-31 23:51:02 BST
So, to summarize:
0. Only disassembly. I reserve the right to prepare some ground for assembly, you already saw, but I don't promise assembly (it's even outside of the tasks description).
1. Vanilla 32-bit instructions, but only those we have in openpower-isa. I assume all operands, all names, etc.
2. No SVP64 yet. No sooner than we have some format for fields. I'm tired of editing it in two places, and the third one will likely lead me to a suicide.
3. No nifty extensions whatsoever.
4. Code will follow binutils logic mostly. All optimizations beyond are likely to be done later.
Comment 14 Luke Kenneth Casson Leighton 2023-09-01 00:06:06 BST
(In reply to Dmitry Selyutin from comment #11)
> TL;DR: at this stage, this will likely heavily follow binutils. IIRC they
> hash PO.

makes sense.

> Detailed answer. At this stage, I will most likely avoid optimizing it. 

gooood.

(In reply to Dmitry Selyutin from comment #13)
> So, to summarize:
> 0. Only disassembly.

gooood.

> I don't promise assembly (it's even outside of the tasks description).

ohh yes very much so. i'm slightly surprised you mention it, i can't
see what use there would be: i believe cavatools might actually use
objdump/disasm in its console?

> 1. Vanilla 32-bit instructions, but only those we have in openpower-isa. 

yes absolutely.  this is a crucial first step (even in the
openpower-isa PowerDecoder / PowerDecoder2) the augmentation
from SVP64 is entirely separate.

i absolutely insisted that there be NO mixing or changing of
SVP64-Prefixed instruction encoding, precisely so that
the 32-bit decoder may be used as-is, later.

> 2. No SVP64 yet.

agreed. it is firmly out of scope but would be covered by the
separate task anyway. (at which point, using c++ classes and
operator-overloads (on registers in particular) will make
the transformation of RA-as-5-bit into RA-as-7-bit much MUCH
easier, but again this is definitely NOT part of this task).
Comment 15 Dmitry Selyutin 2023-09-02 10:21:25 BST
I think you remember how binutils did these tricks, eh?

#define TO TBR + 1
#define DUI TO
#define SVme TO
#define SVG TO
#define TO_MASK (0x1f << 21)
  { 0x1f, 21, NULL, NULL, 0 },

  /* The UI field in a D form instruction.  */
#define UI TO + 1
  { 0xffff, 0, NULL, NULL, 0 },


I've updated the generation so that it's clear that the same entry may refer to multiple possible operands.

    [0] = /* BO, SVG, SVme, TO */ {
        .assemble = svp64_operand_assemble,
        .disassemble = svp64_operand_disassemble,
        .fields = {6, 7, 8, 9, 10},
    },

The decision on whether two different operands are the same depends on two key factors:
1. Is operand class (cf. insndb/core.py) the same?
2. If so, are it's fields the same?

We don't care about names, at all, other than for documentation purposes.
Comment 16 Dmitry Selyutin 2023-09-02 10:54:56 BST
You know what? I thought about it more. Who needs these fields, we can create assemble/disassemble functions dealing with fields immediately, instead of passing the fields into the generic function.
This way, the whole assemble/disassemble process becomes just a simple jump to the corresponding function.
Comment 17 Dmitry Selyutin 2023-09-02 11:06:03 BST
Yep, like this:

    [0] = /* BO, SVG, SVme, TO */ {
        .assemble = svp64_operand_0_assemble,
        .disassemble = svp64_operand_0_disassemble,
    },
    [1] = /* RA */ {
        .assemble = svp64_operand_1_assemble,
        .disassemble = svp64_operand_1_disassemble,
    },

And, once svp64_operand_##_assemble and svp64_operand_##_disassemble are generated, we have just a jump.
Comment 18 Luke Kenneth Casson Leighton 2023-09-02 12:14:18 BST
(In reply to Dmitry Selyutin from comment #15)
> I think you remember how binutils did these tricks, eh?
> 
> #define TO TBR + 1
> #define DUI TO
> #define SVme TO
> #define SVG TO

oh dear...

> We don't care about names, at all, other than for documentation purposes.

internally, no. externally, yes.  i.e. "as an API user", it should
be bloody obvious, when someone looks at the spec, that "operand N
is under Form M which oh look it's really obvious that that links
this header file line Y to section A.B.C of the spec" 

therefore can you please make sure that the header-file and/or structs contain
the Form as part of the "unique identification"?  something like:

   X_FORM_L0
   X_FORM_L1
   ...

i have very specifically ensured that "L" (which is a nightmare
candidate that i have been warning IBM about for 3 years) as an
example is unique... *only* if you prefix it with the Form, by
renaming conflicting occurrences of "L" to "L0 L1 L2 L3".

i really want this to work with c, and not to critically depend
on c++ namespaces.
Comment 19 Luke Kenneth Casson Leighton 2023-09-02 12:16:44 BST
(In reply to Dmitry Selyutin from comment #16)
> You know what? I thought about it more. Who needs these fields, we can
> create assemble/disassemble functions dealing with fields immediately,
> instead of passing the fields into the generic function.
> This way, the whole assemble/disassemble process becomes just a simple jump
> to the corresponding function.

you have to be very careful with higher-order-functions because they are
very expensive (unless static-inlined, and even then)

this has to be a *high performance* API, with a budget of only 50 CPU cycles
per emulated instruction.

each function call blows a whopping *ten percent* of that available budget.
Comment 20 Dmitry Selyutin 2023-09-02 14:00:44 BST
(In reply to Luke Kenneth Casson Leighton from comment #18)
> (In reply to Dmitry Selyutin from comment #15)
> > I think you remember how binutils did these tricks, eh?
> > 
> > #define TO TBR + 1
> > #define DUI TO
> > #define SVme TO
> > #define SVG TO
> 
> oh dear...

I knew you'd like it! They even had to drop parentheses in constructs like (TBR + 1), since some compilers bark at those.


> > We don't care about names, at all, other than for documentation purposes.
> 
> internally, no. externally, yes.  i.e. "as an API user", it should
> be bloody obvious, when someone looks at the spec, that "operand N
> is under Form M which oh look it's really obvious that that links
> this header file line Y to section A.B.C of the spec" 
> 
> therefore can you please make sure that the header-file and/or structs
> contain
> the Form as part of the "unique identification"?  something like:

Sure thing! Since it's not needed for either assembly or dissasembly yet, I'll handle it later, OK? I'm planning to add even more stuff. I eventually want the library to cover everything which we have in insndb. I think that, regardless of cavatools, this will be useful per se. So, short answer: yes, there will be many enums eventually, not that many in scope of this task though.

>    X_FORM_L0
>    X_FORM_L1
>    ...
> 
> i have very specifically ensured that "L" (which is a nightmare
> candidate that i have been warning IBM about for 3 years) as an
> example is unique... *only* if you prefix it with the Form, by
> renaming conflicting occurrences of "L" to "L0 L1 L2 L3".
> 
> i really want this to work with c, and not to critically depend
> on c++ namespaces.

This all will be C-based. The only use case for C++ I'm thinking of is register remapping. That said, even there I'll likely avoid C++. First (and yes, this is a damn good reason!), I don't like it. Second, it will complicate things. Third... do we even need third here? Wrapping C API is trivial, especially for C++; if some weirdo needs classes and templates (no idea why), they can do it themselves.


(In reply to Luke Kenneth Casson Leighton from comment #19)
> you have to be very careful with higher-order-functions because they are
> very expensive (unless static-inlined, and even then)
> 
> this has to be a *high performance* API, with a budget of only 50 CPU cycles
> per emulated instruction.
> 
> each function call blows a whopping *ten percent* of that available budget.

Yes, binutils optimize the case when callback is empty, and deal with it internally. I'm thinking of another way to handle it AND at the same time still generate it. Stay tuned.
Comment 21 Dmitry Selyutin 2023-09-02 16:37:16 BST
OK, here's a better idea. No functions at all other than one huge function which performs the jumps.

static inline int64_t
svp64_operand_disassemble_value(size_t id, uint64_t insn) {
{
    switch (id) {
    case 0: /* BO, SVG, SVme, TO */
        return (int64_t)(
            (((insn >> UINT64_C(21)) & UINT64_C(1)) << UINT64_C(0)) |
            (((insn >> UINT64_C(22)) & UINT64_C(1)) << UINT64_C(1)) |
            (((insn >> UINT64_C(23)) & UINT64_C(1)) << UINT64_C(2)) |
            (((insn >> UINT64_C(24)) & UINT64_C(1)) << UINT64_C(3)) |
            UINT64_C(0)
        );
    case 1: /* RA */
        return (int64_t)(
            (((insn >> UINT64_C(16)) & UINT64_C(1)) << UINT64_C(0)) |
            (((insn >> UINT64_C(17)) & UINT64_C(1)) << UINT64_C(1)) |
            (((insn >> UINT64_C(18)) & UINT64_C(1)) << UINT64_C(2)) |
            (((insn >> UINT64_C(19)) & UINT64_C(1)) << UINT64_C(3)) |
            UINT64_C(0)
        );
}

That's obviously is a sketch, but should illustrate the idea.
Comment 22 Luke Kenneth Casson Leighton 2023-09-02 19:27:51 BST
(In reply to Dmitry Selyutin from comment #21)

> static
>     case 0: /* BO, SVG, SVme, TO */
>         return (int64_t)(
>             (((insn >> UINT64_C(21)) & UINT64_C(1)) << UINT64_C(0)) |

compilers will merge all these together as static bitshifts.
should do fine.
Comment 23 Jacob Lifshay 2023-09-03 07:20:03 BST
(In reply to Luke Kenneth Casson Leighton from comment #19)
> each function call blows a whopping *ten percent* of that available budget.

with tail-call optimization, you can get even better performance than a giant switch if you do it right...

also modern cpus usually correctly predict exactly where call and return insns go leaving them as very low cost (maybe 1 cycle each, or less if they overlap other computation), so, no, calls aren't always expensive.

(In reply to Dmitry Selyutin from comment #21)
> OK, here's a better idea. No functions at all other than one huge function
> which performs the jumps.

i think having separate functions is easier to understand, and with __attribute__((always_inline)) produces essentially identical code.
Comment 24 Dmitry Selyutin 2023-09-03 07:51:38 BST
(In reply to Jacob Lifshay from comment #23)
> (In reply to Luke Kenneth Casson Leighton from comment #19)
> > each function call blows a whopping *ten percent* of that available budget.
> 
> with tail-call optimization, you can get even better performance than a
> giant switch if you do it right...
> 
> also modern cpus usually correctly predict exactly where call and return
> insns go leaving them as very low cost (maybe 1 cycle each, or less if they
> overlap other computation), so, no, calls aren't always expensive.

I wouldn't rely too much on optimizations in the generated code. When it's short and manually-written, it's already difficult enough to perform such optimizations.

> (In reply to Dmitry Selyutin from comment #21)
> > OK, here's a better idea. No functions at all other than one huge function
> > which performs the jumps.
> 
> i think having separate functions is easier to understand, and with
> __attribute__((always_inline)) produces essentially identical code.

On the generation _product_ -- yes. I'd argue it's easier when we're talking about the generator _itself_. :-) Also, I'd like to avoid non-portable extensions whenever possible.

We do the same thing as binutils except that we don't have the distinction between "usual" and "special" operands (those with and without callbacks). The function with the switch is static and is a simple (C99) inline. Sufficient for me, I'm moving on.
Comment 25 Jacob Lifshay 2023-09-03 08:05:32 BST
(In reply to Dmitry Selyutin from comment #24)

> On the generation _product_ -- yes. I'd argue it's easier when we're talking
> about the generator _itself_. :-) Also, I'd like to avoid non-portable
> extensions whenever possible.

if you're worried about portability, just use any of the macros that expand to always inline when available and nothing otherwise:

https://github.com/bminor/binutils-gdb/blob/41770089162edd23ea7ae33b7952585316accde5/gnulib/config.in#L1642
Comment 26 Jacob Lifshay 2023-09-03 08:14:31 BST
(In reply to Jacob Lifshay from comment #23)
> also modern cpus usually correctly predict exactly where call and return
> insns go leaving them as very low cost (maybe 1 cycle each, or less if they
> overlap other computation), so, no, calls aren't always expensive.

e.g. here, the call/return pair costs 0.13 ns/call on average when I tested it, which is less than one clock cycle:
https://gcc.godbolt.org/z/vfME5dbjG
Comment 27 Luke Kenneth Casson Leighton 2023-09-03 08:28:58 BST
(In reply to Dmitry Selyutin from comment #24)

> We do the same thing as binutils except that we don't have the distinction
> between "usual" and "special" operands

gooood.  the non-portable extensions (callbacks) were (hypothesis) because
of a lack of thoroughness in insndb [?]

> (those with and without callbacks).
> The function with the switch is static and is a simple (C99) inline.
> Sufficient for me, I'm moving on.

does-the-job. gets you to the next hoop. makes more money for you because
it works and was quicker to implement. am totally behind that.
Comment 28 Dmitry Selyutin 2023-09-03 09:00:32 BST
(In reply to Luke Kenneth Casson Leighton from comment #27)
> (In reply to Dmitry Selyutin from comment #24)
> 
> gooood.  the non-portable extensions (callbacks) were (hypothesis) because
> of a lack of thoroughness in insndb [?]

Naaah, that was just an attempt to make their special case a common one. They call callbacks when they have it; my first idea was just to call it unconditionally. Then I revisited this place and thought that there's no real need to call something unconditionally (yet creating one or two additional tables) when this can be converted to direct jumps to code. That is, why bother jumping to data and hope it'd be optimized when we can just jump to the code directly? :-)

Any approach can be done, but jumping to the code location makes it even simpler. Speaking of non-portable extensions, we could've even have this:

{
    static void *table[] = { &&operand_0, &&operand_1, &&operand_3 };

    goto *table[id];
 
   operand_0:
        /* code 0 */
    operand_1:
        /* code 1 */
}

But, in the end of the day, it's just the same as switch, just highly non-portable one. I'm happy with the switch for now. :-)
Comment 29 Dmitry Selyutin 2023-09-03 09:06:38 BST
That said, to support Jacob: I agree that generally jumping to function by its pointer via index is cleaner and more readable. It's just that I can ignore this for now, because in the generated code it really wouldn't help: navigating in switch or navigating in array for generated code is kinda the same for large amount of entries.
Comment 30 Dmitry Selyutin 2023-09-06 20:18:43 BST
Hi folks! Good news, I've already implemented a very trivial generated library based on visitors. This library is intended to keep both assembler and disassembler; the code looks like this (obviously the code is truncated):


$ cat svp64.h
enum svp64_state {
    SVP64_SUCCESS,
    SVP64_ERROR_LOOKUP,
    SVP64_ERROR_OPERAND_0,
    SVP64_ERROR_OPERAND_1,
    SVP64_ERROR_OPERAND_2,
    SVP64_ERROR_OPERAND_3,
    SVP64_ERROR_OPERAND_4,
    SVP64_ERROR_OPERAND_5,
    SVP64_ERROR_OPERAND_6,
    SVP64_ERROR_OPERAND_7,
};

struct svp64_opcode {
    uint32_t value;
    uint32_t mask;
};

struct svp64_record {
    struct svp64_opcode opcode;
    uint8_t operands[8];
    char name[16];
};

struct svp64_ctx {
    struct svp64_record const *record;
    int64_t operands[8];
};

enum svp64_state
svp64_disassemble(struct svp64_ctx *ctx, uint32_t insn);

struct svp64_record const *
svp64_lookup_insn(uint32_t insn);


$ cat svp64-dis-gen.c
static inline enum svp64_state
svp64_disassemble_operand(struct svp64_ctx *ctx, uint32_t insn, size_t id) {
    int64_t value;

    switch (ctx->record->operands[id]) {
    case 0x02: /* RA */
        value = (int64_t)(
            /* 11 */ (((insn >> UINT64_C(20)) & UINT64_C(1)) << UINT64_C(4)) |
            /* 12 */ (((insn >> UINT64_C(19)) & UINT64_C(1)) << UINT64_C(3)) |
            /* 13 */ (((insn >> UINT64_C(18)) & UINT64_C(1)) << UINT64_C(2)) |
            /* 14 */ (((insn >> UINT64_C(17)) & UINT64_C(1)) << UINT64_C(1)) |
            /* 15 */ (((insn >> UINT64_C(16)) & UINT64_C(1)) << UINT64_C(0)) |
            UINT64_C(0)
        );
        break;

    case 0x03: /* SI */
        value = (int64_t)(
            (
                (
                    /* 16 */ (((insn >> UINT64_C(15)) & UINT64_C(1)) << UINT64_C(15)) |
                    /* 17 */ (((insn >> UINT64_C(14)) & UINT64_C(1)) << UINT64_C(14)) |
                    /* 18 */ (((insn >> UINT64_C(13)) & UINT64_C(1)) << UINT64_C(13)) |
                    /* 19 */ (((insn >> UINT64_C(12)) & UINT64_C(1)) << UINT64_C(12)) |
                    /* 20 */ (((insn >> UINT64_C(11)) & UINT64_C(1)) << UINT64_C(11)) |
                    /* 21 */ (((insn >> UINT64_C(10)) & UINT64_C(1)) << UINT64_C(10)) |
                    /* 22 */ (((insn >> UINT64_C(9)) & UINT64_C(1)) << UINT64_C(9)) |
                    /* 23 */ (((insn >> UINT64_C(8)) & UINT64_C(1)) << UINT64_C(8)) |
                    /* 24 */ (((insn >> UINT64_C(7)) & UINT64_C(1)) << UINT64_C(7)) |
                    /* 25 */ (((insn >> UINT64_C(6)) & UINT64_C(1)) << UINT64_C(6)) |
                    /* 26 */ (((insn >> UINT64_C(5)) & UINT64_C(1)) << UINT64_C(5)) |
                    /* 27 */ (((insn >> UINT64_C(4)) & UINT64_C(1)) << UINT64_C(4)) |
                    /* 28 */ (((insn >> UINT64_C(3)) & UINT64_C(1)) << UINT64_C(3)) |
                    /* 29 */ (((insn >> UINT64_C(2)) & UINT64_C(1)) << UINT64_C(2)) |
                    /* 30 */ (((insn >> UINT64_C(1)) & UINT64_C(1)) << UINT64_C(1)) |
                    /* 31 */ (((insn >> UINT64_C(0)) & UINT64_C(1)) << UINT64_C(0)) |
                    UINT64_C(0)
                )
                ^
                (UINT64_C(1) << (UINT64_C(16) - 1))
            )
            -
            (UINT64_C(1) << (UINT64_C(16) - 1))
        );
        break;

    case 0x04: /* RS, RT */
        value = (int64_t)(
            /* 6  */ (((insn >> UINT64_C(25)) & UINT64_C(1)) << UINT64_C(4)) |
            /* 7  */ (((insn >> UINT64_C(24)) & UINT64_C(1)) << UINT64_C(3)) |
            /* 8  */ (((insn >> UINT64_C(23)) & UINT64_C(1)) << UINT64_C(2)) |
            /* 9  */ (((insn >> UINT64_C(22)) & UINT64_C(1)) << UINT64_C(1)) |
            /* 10 */ (((insn >> UINT64_C(21)) & UINT64_C(1)) << UINT64_C(0)) |
            UINT64_C(0)
        );
        break;

    case 0x05: /* RB */
        value = (int64_t)(
            /* 16 */ (((insn >> UINT64_C(15)) & UINT64_C(1)) << UINT64_C(4)) |
            /* 17 */ (((insn >> UINT64_C(14)) & UINT64_C(1)) << UINT64_C(3)) |
            /* 18 */ (((insn >> UINT64_C(13)) & UINT64_C(1)) << UINT64_C(2)) |
            /* 19 */ (((insn >> UINT64_C(12)) & UINT64_C(1)) << UINT64_C(1)) |
            /* 20 */ (((insn >> UINT64_C(11)) & UINT64_C(1)) << UINT64_C(0)) |
            UINT64_C(0)
        );
        break;
    }


$ cat svp64-opc-gen.c
    [29] = {
        .name = "addi",
        .opcode = {
            .value = UINT64_C(0x0000000038000000),
            .mask = UINT64_C(0x00000000fc000000),
        },
        .operands = {
            [0] = UINT8_C(0x04), /* RT */
            [1] = UINT8_C(0x02), /* RA */
            [2] = UINT8_C(0x03), /* SI */
            [3] = UINT8_C(0x00), /* NIL */
        },
    },
    [232] = {
        /*
         * OE=0 [21]
         * Rc=0 [31]
         */
        .name = "add",
        .opcode = {
            .value = UINT64_C(0x000000007c000214),
            .mask = UINT64_C(0x00000000fc0007ff),
        },
        .operands = {
            [0] = UINT8_C(0x04), /* RT */
            [1] = UINT8_C(0x02), /* RA */
            [2] = UINT8_C(0x05), /* RB */
            [3] = UINT8_C(0x00), /* NIL */
        },
    },
Comment 31 Dmitry Selyutin 2023-09-06 20:22:25 BST
I implemented a hash lookup by PO as well. With this lookup, here's what we have:

$ echo -n > /tmp/test.s; for i in $(seq 4194304); do echo "addi 2,1,-1" >> /tmp/test.s; done
$ powerpc64le-linux-gnu-as -mlibresoc /tmp/test.s -o /tmp/test.o && powerpc64le-linux-gnu-objcopy -O binary /tmp/test.o /tmp/bin.o

$ time powerpc64le-linux-gnu-objdump -dr -mpowerpc -D -b binary -mpowerpc -EL /tmp/bin.o > /dev/null 

real    0m3.033s
user    0m1.922s
sys     0m1.082s


$ cat test.c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "svp64.h"

int
main(int argc, char *argv[]) {
    FILE *fp;
    struct svp64_ctx ctx;

    fp = fopen(argv[1], "rb");
    for (size_t index = 0; index < 4194304UL; ++index) {
        uint32_t insn;
        enum svp64_state state;

        fread(&insn, sizeof(uint32_t), 1, fp);
        state = svp64_disassemble(&ctx, insn);
        if (state != SVP64_SUCCESS)
            abort();

        printf("%s ", ctx.record->name);
        for (size_t id = 0; ctx.record->operands[id]; ++id)
            printf("%ld ", ctx.operands[id]);
        printf("\n");
    }

    return 0;
}


$ ./test /tmp/bin.o > /dev/null

real    0m0.854s
user    0m0.828s
sys     0m0.023s


I for sure realize that binutils does more work than we do, but still, I think this is a good start.
Comment 33 Dmitry Selyutin 2023-09-06 20:41:07 BST
The next plans are to handle other operand types (e.g. generating the correct masks; for now we support only usual DynamicOperand and SignedOperand classes). I plan to handle other operand types and, when that's done, mark this task as closed, because that's fairly sufficient for this revision.

First of all, I'm sorry about the code quality. Luke, I'm sorry, this includes line breaks; I have so little time that I really wanted to make something that works and paid little attention to anything besides making it work. Feel free to refactor it, or you can take my promise that I'll do it eventually. I'm really, really sorry, but I wanted to have a result to show. You can format it (just, please, use a single indent, OK?). Or just leave it to me.

Besides operands, as I stated in comment #13 (https://bugs.libre-soc.org/show_bug.cgi?id=979#c13), I have no further plans in scope of this task. Everything else is to be handled later. This "everything else" includes:

1. SVP64 support. This one depends on the following:
1.a. Introducing some text format for fields.
1.b. Introducing a generated jump table based on these fields.
2. Tests, including stress tests and tricky cases (like svhape2 vs svshape).
3. Assembler. Yes, we'll eventually need it, because at some stage I'd like to re-use bits of this generated code in binutils and possibly clang and whatever.
4. Possibly further optimizations (though I should tell I was surprized with the quality even now; I expected much worse outcome).
5. An opportunity for caller to adjust the output based on entity type (like colored output in binutils).

Each of these items deserves a standalone task.
Comment 34 Jacob Lifshay 2023-09-06 20:56:14 BST
(In reply to Dmitry Selyutin from comment #33)
> First of all, I'm sorry about the code quality. Luke, I'm sorry, this
> includes line breaks;

if you're referring to the generated C output, you can just run it through clang-format, it has a GNU style option.
Comment 35 Dmitry Selyutin 2023-09-06 21:04:03 BST
(In reply to Jacob Lifshay from comment #34)
> (In reply to Dmitry Selyutin from comment #33)
> > First of all, I'm sorry about the code quality. Luke, I'm sorry, this
> > includes line breaks;
> 
> if you're referring to the generated C output, you can just run it through
> clang-format, it has a GNU style option.

Nah, unfortunately I meant codegen.py, which takes care of the generator. As for the generated code, I'm not that strict; but the generator itself, I wish it could be better.
That said, despite the code quality... This idea with visitor was REALLY impressive. It'd have taken more time to achieve even this degree without visitors.
Comment 36 Dmitry Selyutin 2023-09-07 21:14:11 BST
Good news everyone! I've updated the code generation so that it also includes the details about the operand flags:

struct svp64_operand {
    uint32_t value;
    uint32_t flags;
};

#define SVP64_OPERAND_SIGNED    (UINT32_C(1) << UINT32_C(0))
#define SVP64_OPERAND_GPR       (UINT32_C(1) << UINT32_C(1))
#define SVP64_OPERAND_FPR       (UINT32_C(1) << UINT32_C(2))
#define SVP64_OPERAND_PAIR      (UINT32_C(1) << UINT32_C(3))
#define SVP64_OPERAND_CR3       (UINT32_C(1) << UINT32_C(4))
#define SVP64_OPERAND_CR5       (UINT32_C(1) << UINT32_C(5))
#define SVP64_OPERAND_NONZERO   (UINT32_C(1) << UINT32_C(6))
#define SVP64_OPERAND_ADDRESS   (UINT32_C(1) << UINT32_C(7))

From now on, every operand has flags assigned as well, including tricky cases like address operands and operands split accross multiple fields:

    case 0x0e: /* target_addr */
        value = (
            (
                (
                    /* 16 */ (((insn >> UINT32_C(15)) & UINT32_C(1)) << UINT32_C(13)) |
                    /* 17 */ (((insn >> UINT32_C(14)) & UINT32_C(1)) << UINT32_C(12)) |
                    /* 18 */ (((insn >> UINT32_C(13)) & UINT32_C(1)) << UINT32_C(11)) |
                    /* 19 */ (((insn >> UINT32_C(12)) & UINT32_C(1)) << UINT32_C(10)) |
                    /* 20 */ (((insn >> UINT32_C(11)) & UINT32_C(1)) << UINT32_C(9)) |
                    /* 21 */ (((insn >> UINT32_C(10)) & UINT32_C(1)) << UINT32_C(8)) |
                    /* 22 */ (((insn >> UINT32_C(9)) & UINT32_C(1)) << UINT32_C(7)) |
                    /* 23 */ (((insn >> UINT32_C(8)) & UINT32_C(1)) << UINT32_C(6)) |
                    /* 24 */ (((insn >> UINT32_C(7)) & UINT32_C(1)) << UINT32_C(5)) |
                    /* 25 */ (((insn >> UINT32_C(6)) & UINT32_C(1)) << UINT32_C(4)) |
                    /* 26 */ (((insn >> UINT32_C(5)) & UINT32_C(1)) << UINT32_C(3)) |
                    /* 27 */ (((insn >> UINT32_C(4)) & UINT32_C(1)) << UINT32_C(2)) |
                    /* 28 */ (((insn >> UINT32_C(3)) & UINT32_C(1)) << UINT32_C(1)) |
                    /* 29 */ (((insn >> UINT32_C(2)) & UINT32_C(1)) << UINT32_C(0)) |
                    UINT32_C(0)
                )
                ^
                (UINT32_C(1) << (UINT32_C(14) - 1))
            )
            -
            (UINT32_C(1) << (UINT32_C(14) - 1))
        );
        flags = SVP64_OPERAND_SIGNED;
        break;

    case 0x12: /* D */
        value = (
            (
                (
                    /* 16 */ (((insn >> UINT32_C(15)) & UINT32_C(1)) << UINT32_C(15)) |
                    /* 17 */ (((insn >> UINT32_C(14)) & UINT32_C(1)) << UINT32_C(14)) |
                    /* 18 */ (((insn >> UINT32_C(13)) & UINT32_C(1)) << UINT32_C(13)) |
                    /* 19 */ (((insn >> UINT32_C(12)) & UINT32_C(1)) << UINT32_C(12)) |
                    /* 20 */ (((insn >> UINT32_C(11)) & UINT32_C(1)) << UINT32_C(11)) |
                    /* 21 */ (((insn >> UINT32_C(10)) & UINT32_C(1)) << UINT32_C(10)) |
                    /* 22 */ (((insn >> UINT32_C(9)) & UINT32_C(1)) << UINT32_C(9)) |
                    /* 23 */ (((insn >> UINT32_C(8)) & UINT32_C(1)) << UINT32_C(8)) |
                    /* 24 */ (((insn >> UINT32_C(7)) & UINT32_C(1)) << UINT32_C(7)) |
                    /* 25 */ (((insn >> UINT32_C(6)) & UINT32_C(1)) << UINT32_C(6)) |
                    /* 11 */ (((insn >> UINT32_C(20)) & UINT32_C(1)) << UINT32_C(5)) |
                    /* 12 */ (((insn >> UINT32_C(19)) & UINT32_C(1)) << UINT32_C(4)) |
                    /* 13 */ (((insn >> UINT32_C(18)) & UINT32_C(1)) << UINT32_C(3)) |
                    /* 14 */ (((insn >> UINT32_C(17)) & UINT32_C(1)) << UINT32_C(2)) |
                    /* 15 */ (((insn >> UINT32_C(16)) & UINT32_C(1)) << UINT32_C(1)) |
                    /* 31 */ (((insn >> UINT32_C(0)) & UINT32_C(1)) << UINT32_C(0)) |
                    UINT32_C(0)
                )
                ^
                (UINT32_C(1) << (UINT32_C(16) - 1))
            )
            -
            (UINT32_C(1) << (UINT32_C(16) - 1))
        );
        flags = SVP64_OPERAND_SIGNED;
        break;

    case 0x1b: /* SVxd */
        value = (
            /* 6  */ (((insn >> UINT32_C(25)) & UINT32_C(1)) << UINT32_C(4)) |
            /* 7  */ (((insn >> UINT32_C(24)) & UINT32_C(1)) << UINT32_C(3)) |
            /* 8  */ (((insn >> UINT32_C(23)) & UINT32_C(1)) << UINT32_C(2)) |
            /* 9  */ (((insn >> UINT32_C(22)) & UINT32_C(1)) << UINT32_C(1)) |
            /* 10 */ (((insn >> UINT32_C(21)) & UINT32_C(1)) << UINT32_C(0)) |
            UINT32_C(0)
        );
        flags = SVP64_OPERAND_NONZERO;
        break;
Comment 38 Luke Kenneth Casson Leighton 2023-09-07 21:32:19 BST
(In reply to Dmitry Selyutin from comment #36)

> From now on, every operand has flags assigned as well, including tricky
> cases like address operands and operands split accross multiple fields:
> 
>     case 0x0e: /* target_addr */

ahh fantastic, i wasgoing to ask about those: really we need some
expressions (pseudocode fragments), i cannot say further as it is
confidential but that is a big hint.

ermermer! you said "svp64_operand", that should be
"powerisa_operand" or just "power_operand"?
Comment 39 Luke Kenneth Casson Leighton 2023-09-07 21:34:19 BST
(In reply to Dmitry Selyutin from comment #36)
> Good news everyone! I've updated the code generation so that it also
> includes the details about the operand flags:
> 
> struct svp64_operand {
>     uint32_t value;
>     uint32_t flags;
> };
> 
> #define SVP64_OPERAND_SIGNED    (UINT32_C(1) << UINT32_C(0))

yreah these should all be POWERISA_OPERAND_xxx

SVP64 is *only* the prefix. nothing to do with Power ISA 32-bit
instructions at all.
Comment 40 Dmitry Selyutin 2023-09-07 21:41:26 BST
(In reply to Luke Kenneth Casson Leighton from comment #39)
> (In reply to Dmitry Selyutin from comment #36)
> > Good news everyone! I've updated the code generation so that it also
> > includes the details about the operand flags:
> > 
> > struct svp64_operand {
> >     uint32_t value;
> >     uint32_t flags;
> > };
> > 
> > #define SVP64_OPERAND_SIGNED    (UINT32_C(1) << UINT32_C(0))
> 
> yreah these should all be POWERISA_OPERAND_xxx
> 
> SVP64 is *only* the prefix. nothing to do with Power ISA 32-bit
> instructions at all.

I used svp64 as prefix just becase the lib is called libsvp64. :-) I wanted it to be called libresoc, but libibresoc sounds disguisting (although I know one similar naming case). I think that'd be great to enforse this naming due to popularisation. Other naming options are welcome, as long as they're short.
Comment 41 Luke Kenneth Casson Leighton 2023-09-07 21:42:15 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/libsvp64

sorry dmitry, can you please rename it to libpowerisa or libpower?
or libopenpower or libopenpowerisa?

otherwise it implies that we are attempting to dominate and
subsume the entirety of Power ISA which is Trademarked, and
the OPF and IBM would have justification to jump on us from
a very great height.

this is serious.  it is actually a Criminal Offense to take
Trademarked material and "pass it off" as your own under
a different brand.  It's called "Counterfeiting" and it's
illegal (jail time).

please rename the library at your earliest convenience to include
"power" or "openpower" in order to respect the Trademark.
Comment 42 Luke Kenneth Casson Leighton 2023-09-07 21:43:15 BST
(In reply to Dmitry Selyutin from comment #40)

> to popularisation.

whilst not hijacking the Trademark and attempting to pass off Trademarked
material as "belonging to Libre-SOC". that's called "Counterfeiting".

> Other naming options are welcome, as long as they're
> short.

libpower libopenpowerisa libopenpower or libpowerisa.
Comment 43 Dmitry Selyutin 2023-09-07 21:48:48 BST
(In reply to Luke Kenneth Casson Leighton from comment #41)
> https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/
> libsvp64
> 
> sorry dmitry, can you please rename it to libpowerisa or libpower?
> or libopenpower or libopenpowerisa?
> 
> otherwise it implies that we are attempting to dominate and
> subsume the entirety of Power ISA which is Trademarked, and
> the OPF and IBM would have justification to jump on us from
> a very great height.
> 
> this is serious.  it is actually a Criminal Offense to take
> Trademarked material and "pass it off" as your own under
> a different brand.  It's called "Counterfeiting" and it's
> illegal (jail time).

Hm. I haven't thought about this aspect. OK then...

> please rename the library at your earliest convenience to include
> "power" or "openpower" in order to respect the Trademark.

How about libop? Or perhaps libopium, OpenPower Instruction Universal-or-Unified M-something? I like op as prefix. But opisa sounds awful.
Comment 44 Dmitry Selyutin 2023-09-07 21:53:05 BST
(In reply to Dmitry Selyutin from comment #43)
> How about libop? Or perhaps libopium, OpenPower Instruction
> Universal-or-Unified M-something? I like op as prefix. But opisa sounds
> awful.

Maybe libppcx, PPC extensions?
Comment 45 Dmitry Selyutin 2023-09-07 21:55:22 BST
(In reply to Dmitry Selyutin from comment #44)
> Maybe libppcx, PPC extensions?

Hm, this would likely hit the same issue as svp64 prefix. Damn, naming things is always the most difficult thing to do.
I thought about libinsndb, but that's too vague. libopinsndb is better, but stuff like `opinsndb_record` looks too long... or is it?
Comment 46 Jacob Lifshay 2023-09-07 22:11:47 BST
(In reply to Dmitry Selyutin from comment #45)
> (In reply to Dmitry Selyutin from comment #44)
> > Maybe libppcx, PPC extensions?
> 
> Hm, this would likely hit the same issue as svp64 prefix. Damn, naming
> things is always the most difficult thing to do.
> I thought about libinsndb, but that's too vague. libopinsndb is better, but
> stuff like `opinsndb_record` looks too long... or is it?

libopid? openpower instruction database?
Comment 47 Jacob Lifshay 2023-09-07 22:13:33 BST
(In reply to Jacob Lifshay from comment #46)
> libopid? openpower instruction database?

plus it seems to have no relevant search results so should be unique
Comment 48 Jacob Lifshay 2023-09-07 22:15:59 BST
do note that if we name it openpower or similar, openpower would reasonably expect it to eventually implement all instructions in the powerisa spec., including po1. that can happen much later tho.
Comment 49 Dmitry Selyutin 2023-09-07 22:58:59 BST
(In reply to Jacob Lifshay from comment #46)
> libopid? openpower instruction database?

Oh, nice! I totally like it. Luke, do you have any objections if this library is called libopid?
Comment 50 Luke Kenneth Casson Leighton 2023-09-07 23:01:10 BST
(In reply to Dmitry Selyutin from comment #49)

> Oh, nice! I totally like it. Luke, do you have any objections if this
> library is called libopid?

love it.
Comment 51 Dmitry Selyutin 2023-09-08 07:56:57 BST
Done, the library is now named libopid:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=905dd24c769e74247935514f5ab8d462724ef798

Further directions are mentioned in comment #13 and comment #33. These, I think, should be done in scope of separate tasks.

I think the current state is sufficient to mark this task as completed. Any objections?
Comment 52 Luke Kenneth Casson Leighton 2023-09-08 10:11:38 BST
(In reply to Dmitry Selyutin from comment #51)
> Done, the library is now named libopid:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=905dd24c769e74247935514f5ab8d462724ef798

awesome.

> I think the current state is sufficient to mark this task as completed. 

ah sorry, i'm not seeing any unit tests. that's a hell of a lot of
money for such a short amount of code, and i am therefore uncomfortable
signing it off. i do appreciate the Visitor system is that powerful,
but it in turn means we can achieve more with less (our key strategy)

could you do say... mmm...
3-4 instructions that demonstrate how to use it? with real simple
"exit(-1)" if fail and exit(0) if true on verifying that the operands
decode correctly?

the other alternative is to also auto-generate a *python c-module*
interface on top, and we can then accelerate *ISACaller* by importing
it, replace PowerDecoder entirely (hooray).
Comment 53 Dmitry Selyutin 2023-09-08 16:31:47 BST
(In reply to Luke Kenneth Casson Leighton from comment #52)
> could you do say... mmm...
> 3-4 instructions that demonstrate how to use it? with real simple
> "exit(-1)" if fail and exit(0) if true on verifying that the operands
> decode correctly?

No, if we do it, we'll do it correctly. I'll come back with something which can be used to check (almost) arbitrary instructions. I have an idea on how to do it so that this will be a cross-check. Otherwise there's no point in illustrating the API with 3-4 instructions, since for now API is dead simple and completely self-explanatory. Tests, on the other hand, are much a better idea.

> the other alternative is to also auto-generate a *python c-module*
> interface on top, and we can then accelerate *ISACaller* by importing
> it, replace PowerDecoder entirely (hooray).

Using your own wording, I'd say that's a hell of a little of money for such a big work, and I am therefore uncomfortable starting it in scope of this task. :-)

> for such a short amount of code, and i am therefore uncomfortable
> signing it off. i do appreciate the Visitor system is that powerful,
> but it in turn means we can achieve more with less (our key strategy)

Luke, I have to stop at this paragraph, even though I'm not comfortable doing it. Judging the payment based on the code size is a totally wrong metric, which will likely yield not the outcome you expect. The fact that you see this code the way it looks like is because there were _many_ efforts before this code came to life. Don't think it was easy, because in reality wasn't, quite the contrary. It ended up simpler I expected, that's true, but it was not a simple 5-minute job.

I ask you, please, don't ever, ever use this as an argument. The only possible outcome we could gain from such strategy is that people would write more code only to justify the payment. Or they might think against writing it at all. I'm writing this, because I've seen this metric in action, and I know the outcome pretty well. Again: please don't use this as an argument, ever.


So, to summarize: I'll return with something to be used for testing purposes, because this is The Right Thing To Do. This decision has nothing to do with the nonsensical argument about the code size.
Comment 54 Luke Kenneth Casson Leighton 2023-09-08 16:41:51 BST
(In reply to Dmitry Selyutin from comment #53)

> No, if we do it, we'll do it correctly. I'll come back with something which
> can be used to check (almost) arbitrary instructions.

if you're happy to do that and it's easy within the budget _great_.
but remember what i said (and you do it below) always always assert/state
what *you* value the code at.  NLnet's guide is EUR 3,000 per *month* so
anything over that, we have to have a *really* good reason.

>  I have an idea on how
> to do it so that this will be a cross-check. Otherwise there's no point in
> illustrating the API with 3-4 instructions, since for now API is dead simple
> and completely self-explanatory.

to you, it is, because you wrote it. remember: other users who are unfamiliar
*need* documentation and/or examples.

>  Tests, on the other hand, are much a better idea.

they'll illustrate the usage in the same way as examples. i'm happy with that.

> > the other alternative is to also auto-generate a *python c-module*
> > interface on top, and we can then accelerate *ISACaller* by importing
> > it, replace PowerDecoder entirely (hooray).
> 
> Using your own wording, I'd say that's a hell of a little of money for such
> a big work, and I am therefore uncomfortable starting it in scope of this
> task. :-)

i don't have a problem with that at all.
 
> Luke, I have to stop at this paragraph, even though I'm not comfortable
> doing it. Judging the payment based on the code size is a totally wrong
> metric, 

i remember the "$1 for tapping with a hammer, $9999 for knowing where to tap"
anecdote, and you've given NLnet (and the EU Auditor) enough here to appreciate
the value of what you've done.

> I'll return with something to be used for testing purposes, because this is 
> The Right Thing To Do

perfect. no rush, but don't push it outside of what you value your
contribution and effort, either.
Comment 55 Dmitry Selyutin 2023-09-08 18:06:00 BST
I re-read my comment and came to conclusion it might have given you the wrong impression. To clarify: adding the test is a perfectly fine and valid idea. The only item which attracted my attention was this code size thing, so there was my reaction. Perhaps that was somewhat a bit more exaggerated than needed; if the comment made that impression, I'm sorry, Luke, I didn't want it to appear this way.

As for the tests, I actually liked the idea, and the motto of the comment overall was "that's The Right Thing To Do, I really missed it, thank you for pointing that!". I've been thinking of the following:
1. Standalone C executable (opid-dis), which is capable of base instruction decoding (just gets the raw data and outputs in some simple-ish format easy to parse).
2. Some Python test which builds the instructions, assembles them and then feeds to opid-dis and gets its output.
3. When opid-dis output is captured, the Python test compares the results to the expected.

The rationale:
1. Most of the logic is in Python. Checking new instructions and extending the test is dead simple.
2. opid-dis can be converted into a real disassembler. Not that we actually need it (perhaps we don't, perhaps we do), but we can grasp what's missing on API side for somebody who wants a real disassembler.
3. At some later stage, we might opt to use the same approach for the assembler. I really would like to have both assembly and disassembly generated, adding each and every instruction to binutils manually is the worst choice.
Comment 56 Andrey Miroshnikov 2023-09-08 18:30:23 BST
(In reply to Dmitry Selyutin from comment #55)
>
> As for the tests, I actually liked the idea, and the motto of the comment
> overall was "that's The Right Thing To Do, I really missed it, thank you for
> pointing that!". I've been thinking of the following:
> 1. Standalone C executable (opid-dis), which is capable of base instruction
> decoding (just gets the raw data and outputs in some simple-ish format easy
> to parse).
> 2. Some Python test which builds the instructions, assembles them and then
> feeds to opid-dis and gets its output.
> 3. When opid-dis output is captured, the Python test compares the results to
> the expected.

If you don't mind me butting in, I really like the idea.

Because, to my understanding, all supported instructions of the disassembler can be tested with this python script (eventually, probably out-of-scope/unnecessary right now).

And having opid-dis as a standalone tool means it can eventually be matured to become a full disassembler (as you have already rationalised).

> 
> 3. At some later stage, we might opt to use the same approach for the
> assembler.

Making the future work of the assembler easier by having this executable/python test framework sounds great as well.

> I really would like to have both assembly and disassembly
> generated, adding each and every instruction to binutils manually is the
> worst choice.

I'm sure we all agree that autogenerating is better. Perhaps another task we can allocate some funding to from one of the grants that need to be used up?
Comment 57 Dmitry Selyutin 2023-09-08 19:59:27 BST
(In reply to Andrey Miroshnikov from comment #56)
> (In reply to Dmitry Selyutin from comment #55)
> If you don't mind me butting in, I really like the idea.

No, I'm really glad that you are interested in this, and it's perfect that I can know your opinion on this. It's difficult to develop something without a feedback; that's one of the reasons I'm writing so many comments during the development. :-)

> Because, to my understanding, all supported instructions of the disassembler
> can be tested with this python script (eventually, probably
> out-of-scope/unnecessary right now).

Exactly! I suspect we'll unlikely be able to test each and every instruction at this point, since this took literally months and separate works on rewriting Python asm and disasm to reach even the previous point, but the overall idea is that they must be on par, Python version and C version. And they kinda complement each other: C code is based on Python code, but at some places in Python, where we strive for speed, we'll have to switch to C code.

> And having opid-dis as a standalone tool means it can eventually be matured
> to become a full disassembler (as you have already rationalised).

I have some doubts on the latter part, to be honest. I have no illusions on us being used as a replacement on binutils or clang: they are de facto standard. If we want to survive -- their support is mandatory. However, I'm really, really tired of adding all code in binutils manually, and doing the same work for clang would be completely insane. Since we basically want to support all our code in binutils and might (and likely will) opt to support it in clang, we basically want to have everything ready for writing assembler and disassembler. And the best way to ensure that we indeed have everything ready for assembler and disassembler is writing at least trivial verions of them. Plus it helps to invent the overall API architecture: updating it on the later stages would be quite a difficult task.

> Making the future work of the assembler easier by having this
> executable/python test framework sounds great as well.

Yep. I also have a confession to make: we currently have quite a limited set of tests for assembly and disassembly in Python, and completely lack them for binutils. The latter is mostly caused by the fact that until recent we lacked a way to generate so much code, we're just in the beginning of the way. Until insndb, we had no generalized and uniform way to process all data we had in repository; until visitors, we had no good way to re-use this information for other means (including code generation).

> I'm sure we all agree that autogenerating is better. Perhaps another task we
> can allocate some funding to from one of the grants that need to be used up?

Yep, I think we need to have it. Especially if we want to support other tools, like clang. That's actually the main reason why I chose to implement this code as a library with explicit dis-related parts. Besides assembler, I also shared some future directions. I'll try and list them again in a separate comment (there is some information here and there, some also in my head). Then we can put priorities to these tasks and find volunteers who can help with them: there's a huge amount of work to do, I will be glad for help from all people interested.
Comment 58 Luke Kenneth Casson Leighton 2023-09-08 20:38:38 BST
(In reply to Andrey Miroshnikov from comment #56)

> I'm sure we all agree that autogenerating is better. Perhaps another task we
> can allocate some funding to from one of the grants that need to be used up?

bug #984. create a subtask under there. the rationale is: it is
"progress *towards*" SVP64 decode. don't take the piss (allocate
too large an amount) eq/under EUR 2500, 3 such bugs is ok as long
as they are still "towards" SVP64. given that 32bit decode is needed
before SVP64, you get the idea.
Comment 59 Dmitry Selyutin 2023-09-09 19:26:08 BST
Hi folks, I've created a test which can be as a foundation to check the generated code. The idea is to provide standalone micro-disassembler, which just fetches the instructions from stdin and prints the very basic disassembly (instruction name, operands values, operands flags). The Python test builds the instructions from plain text via insndb, then feeds them to micro-disassembler, then checks whether the expected output matches the actual one.

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=77fc07a0cb5819a908da042d7a980b44ca782d74

Currently I've checked three variants of addpcis (these have split fields and signed operands), and svshape vs svshape2 (because these have non-zero operands and complicated opcode mapping).

Luke, please check whether these are sufficient to mark the task as completed.
Comment 60 Luke Kenneth Casson Leighton 2023-09-09 20:45:07 BST
(In reply to Dmitry Selyutin from comment #59)

> Luke, please check whether these are sufficient to mark the task as
> completed.

Perfect.