Bug 1154 - Support basic PowerPC generated assembly
Summary: Support basic PowerPC generated assembly
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Dmitry Selyutin
URL:
Depends on: 979
Blocks: 984
  Show dependency treegraph
 
Reported: 2023-09-08 21:40 BST by Andrey Miroshnikov
Modified: 2023-10-13 18:18 BST (History)
3 users (show)

See Also:
NLnet milestone: NLnet.2021-08-071.cavatools
total budget (EUR) for completion of task and all subtasks: 2500
budget (EUR) for this task, excluding subtasks' budget: 2500
parent task for budget allocation: 984
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
[ghostmansd] amount=2500 submitted=2023-09-14 paid=2023-09-20


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrey Miroshnikov 2023-09-08 21:40:18 BST
To reduce the time and effort required to align the binutils assembler with isndb(?) (whenever changes are made to the Libre-SOC ISA), a code generator needs to be written. At present, the binutils source has to be modified manually, which is error-prone and time/effort intensive.

This task will aid with cavatools, as an SVP64 assembler will be required regardless of the simulator being used (or even hardware), and reduce the burden on developers picking up future tasks.

See bug #979 comment #55 (https://bugs.libre-soc.org/show_bug.cgi?id=979#c55) where this task had been suggested.
Comment 1 Dmitry Selyutin 2023-09-08 22:59:48 BST
A whole SVP64 assembly depends on a critical task (I think to be raised yet, but I'll check whether we already have it already tomorrow): we need to generate the specs and fields, otherwise syncing them would be a nightmare.

See task #4 here:
http://lists.libre-soc.org/pipermail/libre-soc-dev/2023-September/005633.html

At this point, we can support a trivial PPC assembly. This will need:
1. Fast instruction lookup by name (therefore another table to be generated).
2. Additional operands logic (likely the validation mechanisms).
3. Overall assembly API. We want it to be robust and flexible, but at the same time easy to integrate into other projects (binutils as the first candidate). Whilst disassembly API is simple ("just take this integer as an instruction and get stuff from there"), the assembler API will likely be a multistage (e.g. on a per-operand basis); otherwise no chance of making a friendship with binutils macro subsystem which successfully kicked me in the back several times.

Ironically, the items above appear in the order of the complexity and amount of efforts. :-)
Comment 2 Dmitry Selyutin 2023-09-08 23:10:51 BST
And, as usual, limitations:

1. No macros. At all. That's not our level. libopid should rely on higher-level abstractions which just provide the value. The only exceptions are registers parsing and some stuff like our custom CRs representations; both "exceptions" are just a way to convert an input into a numeric value (we'll want this to be reused by binutils macro subsystem).

2. Only raw binaries. Only a way to build instruction so that it's reused by real assembler. No elves (ELF), no dwarves (DWARF), nothing similar. "Fire and forget", "instruction name + (set of?) API(s?) to handle operands -> 32-bit insn". As I said, likely a set of calls like "update context with insn name", "update context with operand at index X", "update context with operand at index Y", etc. I'm not yet sure where we have a good balance with macro subsystem; perhaps functions to convert from registers/branches should just convert to values and APIs should just operate on the values already converted. Feeding everything as a "long" string assumes we have to parse it on our own, and that's something I'd like to avoid. Again, that's not our task, it's macro subsystem plus tokenizer plus lexer of higher-level stuff; we can aid but we do not intend to substitute.

3. Only instructions which have at this point in openpower-isa, and only those for which we already managed to generate the disassembly. We cannot support what we don't yet have. No aliases, no fancy vectors.
Comment 3 Jacob Lifshay 2023-09-08 23:22:53 BST
if this is intended to replace the ppc encode/decode in binutils, it needs to eventually have some support for relocations -- basically selecting the right one based on what was written in the assembly source, since relocations can depend on which instruction form is used.
Comment 4 Dmitry Selyutin 2023-09-08 23:49:05 BST
(In reply to Jacob Lifshay from comment #3)
> if this is intended to replace the ppc encode/decode in binutils, it needs
> to eventually have some support for relocations -- basically selecting the
> right one based on what was written in the assembly source, since
> relocations can depend on which instruction form is used.

I have yet to check, but I suspect this is to be handled by binutils, not us; or maybe we could just give the caller a way to pass a callback. It's up to binutils to give us a proper stuff here. I want both asm and dis to be completely trivial, just like "we give you building bricks but it's up to you to decide how to build a house". Maybe it's the same houses as three pigs built to hide from the wolf, we have yet to investigate. :-)
Comment 5 Jacob Lifshay 2023-09-09 01:53:37 BST
(In reply to Dmitry Selyutin from comment #4)
> (In reply to Jacob Lifshay from comment #3)
> > if this is intended to replace the ppc encode/decode in binutils, it needs
> > to eventually have some support for relocations -- basically selecting the
> > right one based on what was written in the assembly source, since
> > relocations can depend on which instruction form is used.
> 
> I have yet to check, but I suspect this is to be handled by binutils, not
> us;

binutils needs to know which bits to tell the linker to modify and how, that seems very closely related to encoding/decoding instructions.
Comment 6 Dmitry Selyutin 2023-09-10 17:42:34 BST
Damn, this proves to be more difficult than I thought. :-)
We already have a big table sorted by opcodes for disassembler, but obviously sorting by opcodes makes no sense when we try to look up a name.
Currently the best option I'm thinking is another table:

// opcodes table
    [73] = {
        .name = "svshape",
        .opcode = {
            .value = UINT32_C(0x0000000058000019),
            .mask = UINT32_C(0x00000000fc0007bf),
        },
        .operands = {
            [0] = UINT8_C(0x1b), /* SVxd */
            [1] = UINT8_C(0x1c), /* SVyd */
            [2] = UINT8_C(0x1d), /* SVzd */
            [3] = UINT8_C(0x1e), /* SVrm */
            [4] = UINT8_C(0x1f), /* vf */
            [5] = UINT8_C(0x00), /* NIL */
        },
    },
    [80] = {
        .name = "svshape",
        .opcode = {
            .value = UINT32_C(0x0000000058000099),
            .mask = UINT32_C(0x00000000fc0007bf),
        },
        .operands = {
            [0] = UINT8_C(0x1b), /* SVxd */
            [1] = UINT8_C(0x1c), /* SVyd */
            [2] = UINT8_C(0x1d), /* SVzd */
            [3] = UINT8_C(0x1e), /* SVrm */
            [4] = UINT8_C(0x1f), /* vf */
            [5] = UINT8_C(0x00), /* NIL */
        },
    },

// names table
/* svshape */
        /* 010110---------------0000-011001 */ [650] = 73,
        /* 010110---------------0001-011001 */ [651] = 80,


Then there will be another table sorted by name which contains a pair of integers assigned to each name (e.g. 650 and 663). I'm still working on sorting this.

We actually I think have only two of these pigs -- isel and svshape. But I still would like to have a generic and uniform algorithm.
Comment 7 Dmitry Selyutin 2023-09-10 17:50:11 BST
That is, to clarify, the opcode assembly (which includes all "static operands", i.e. PO, XO and say Rc), is going to look like this:

1. Lookup a range in name_range table by name; a range is represented as pair of integers.
2. Use the range from name_range table to iterate over id_opcode table. Each index (id) maps to index in opcodes (opcode)
3. Try to assemble the provided operands based on the data provided in the opcodes table.

Gory, but somewhat similar to binutils. They, unlike us, cheat somewhat: they merge all possible opcodes into a single entry. This, however, needs another kind of hack, e.g. place svshape2 before any other svshape.

The assembler turns out to be a real pig here, not disassembler... And all thanks to isel and svshape.
Comment 8 Dmitry Selyutin 2023-09-10 21:37:32 BST
I had to decouple the generated code from the static one, because the the Python generator is already messy:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ccb321faf53018750abdfd30fd66ccae8d8b22e1

This, in turn, required to rename the name of the binary which we use to test the disassembler:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=0e29ecf5e5785fe6f20158fef09a50644e1746e4

But at least now we have something which can be used for opcode lookup:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=7448458d2c062b4e90e1e550520cb24aeb231ca0


From now on, the following function is used to query all records which match the instruction name:

size_t
opid_lookup_name(char const *name,
        struct opid_record const *records[], size_t count);


This function unsurprisignly takes an array and returns the expected count (as in printf: the count that'd have been written if the initial count permits).

This should be sufficient to cover all "static operands" (a dumb name from insndb; actually these include PO, XO, Rc and similar fields).


The next stage is to add the generation for the dynamic operands (again, if someone smarter than me looks at this: arguments).
Comment 9 Dmitry Selyutin 2023-09-10 21:38:49 BST
The reference code, for anybody curious:

    @classmethod
    def assemble(cls, record, arguments=None):
        if arguments is None:
            arguments = ()

        insn = cls.integer(value=0)

        for operand in cls.static_operands(record=record):
            operand.assemble(insn=insn)

        arguments = Arguments(record=record,
            arguments=arguments, operands=cls.dynamic_operands(record=record))
        for (value, operand) in arguments:
            operand.assemble(insn=insn, value=value)

        return insn

static_operands function yields all POs and XOs as well. That is, for isel, this would bring multiple POs.
Comment 10 Dmitry Selyutin 2023-09-10 21:41:06 BST
When I'll get my hands to raising bugs related to http://lists.libre-soc.org/pipermail/libre-soc-dev/2023-September/005633.html, I'll add one item: refactor insndb so that instructions are distinguished by an opcode, not by name. Currently we get one record with multiple opcodes, which is quite an illogical choice (should be per-opcode).
Comment 11 Dmitry Selyutin 2023-09-10 21:44:16 BST
(In reply to Dmitry Selyutin from comment #9)
> static_operands function yields all POs and XOs as well. That is, for isel,
> this would bring multiple POs.

Correction: multiple XOs. PO is the same (31).
Comment 12 Dmitry Selyutin 2023-09-10 21:49:18 BST
I rebased several commits a bit, so the most interesting commit ended up with the different hash: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=7448458d2c062b4e90e1e550520cb24aeb231ca0.
Comment 13 Dmitry Selyutin 2023-09-10 21:51:24 BST
Before you ask: yes I know we can do much better than binary search here. In fact, we can do a perfect hash, since we know all names in advance. However, I don't want to bring gperf as dependency, and time constraints don't allow me to invent micro-gperf in Python.
Comment 14 Dmitry Selyutin 2023-09-10 22:04:48 BST
(In reply to Dmitry Selyutin from comment #12)
> I rebased several commits a bit, so the most interesting commit ended up
> with the different hash:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=7448458d2c062b4e90e1e550520cb24aeb231ca0.

Another correction: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=e911f74fdb8eff76fe7b07b194fd12a7b7491d63
Comment 15 Dmitry Selyutin 2023-09-13 18:34:31 BST
It took an awful lot of time, thanks to major refactoring of code generation and isel and svshape. From now on, we generate a unique opcode for each and every instruction; no more redundant table which specifies "hey iterate over multiple records, here's the range as the second table so that you could jump into the third table".

    [85] = {
        /*
         * PO=22 [0, 1, 2, 3, 4, 5]
         * XO=281 [21, 22, 23, 26, 27, 28, 29, 30, 31]
         */
        .name = "svshape2",
        .opcode = {
            .value = UINT64_C(0x58000419),
            .mask = UINT64_C(0xfc00073f),
        },
        .operands = {
            [0] = UINT8_C(0x2a), /* SVo */
            [1] = UINT8_C(0x0b), /* SVyx */
            [2] = UINT8_C(0x21), /* rmm */
            [3] = UINT8_C(0x1d), /* SVd */
            [4] = UINT8_C(0x1f), /* sk */
            [5] = UINT8_C(0x24), /* mm */
            [6] = UINT8_C(0x00), /* NIL */
        },
    },
    [86] = {
        /*
         * PO=22 [0, 1, 2, 3, 4, 5]
         * XO=25 [26, 27, 28, 29, 30, 31]
         */
        .name = "svshape",
        .opcode = {
            .value = UINT64_C(0x58000019),
            .mask = UINT64_C(0xfc00003f),
        },
        .operands = {
            [0] = UINT8_C(0x1b), /* SVxd */
            [1] = UINT8_C(0x1c), /* SVyd */
            [2] = UINT8_C(0x1d), /* SVzd */
            [3] = UINT8_C(0x1e), /* SVrm */
            [4] = UINT8_C(0x1f), /* vf */
            [5] = UINT8_C(0x00), /* NIL */
        },
    },
    [359] = {
        /*
         * PO=31 [0, 1, 2, 3, 4, 5]
         * XO=15 [26, 27, 28, 29, 30]
         */
        .name = "isel",
        .opcode = {
            .value = UINT64_C(0x7c00001e),
            .mask = UINT64_C(0xfc00003e),
        },
        .operands = {
            [0] = UINT8_C(0x04), /* RT */
            [1] = UINT8_C(0x02), /* RA */
            [2] = UINT8_C(0x05), /* RB */
            [3] = UINT8_C(0x2d), /* BC */
            [4] = UINT8_C(0x00), /* NIL */
        },
    },

The most difficult task was to make svshape2 appear before svshape in a way that is generic. The table used to be sorted by just (opcode.value, opcode.mask). Now we check count of bits in mask as well. In hoc signo uincemus!
Comment 16 Dmitry Selyutin 2023-09-13 18:39:26 BST
I also used an opportunity to switch to 64-bit integers for opcode, because we might use the same mechanism to generate the whole svp64 prefix too, or maybe other instructions which include prefix too. As a crazy idea, we could match the whole instruction immediately, including all specs (e.g. /vli, /w=##, etc.).

I had to refactor some of the previous commits and rebase and so on. But the most relevant commits are:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=2a1f83cb7b6dc54ef67838493808d12386453dac
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ef53c672a2d168329b1a73884018be6f1e556136
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=027e8446a638b467f61d418e9efba92161c7e9c8
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6d14c20dc0f19e15fedb6e9aac4ef663e8de6224
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=c5c274572df7910601cdf732a42fb0c0779581fd
Comment 17 Luke Kenneth Casson Leighton 2023-09-13 18:48:16 BST
(In reply to Dmitry Selyutin from comment #15)

> The most difficult task was to make svshape2 appear before svshape in a way
> that is generic. The table used to be sorted by just (opcode.value,
> opcode.mask). Now we check count of bits in mask as well. In hoc signo
> uincemus!

awesome :) i saw your query monday but have been travelling, glad it's
solved. the way it is done for isel was just to ignore the 5-bit selector
field.

svshape would need 14 matchers for the 4-bits SVRM and the 2 for svshape2
can be done with 3-bit match.  bit of a pain that. sorry :)
Comment 18 Dmitry Selyutin 2023-09-13 22:02:56 BST
Hooray! The small opid-check binary is now capable of cross-checking assembly and disassembly. The flow is the following:

1. List all instructions along with their arguments in Python as sequences of (string, string-tuple).
2. Assemble all such instructions via insndb and output into a temporary unnamed file the object code.
3. Feed this code to opid-check: without arguments, it acts as a disassembler. opid-check reads its stdin and collects all instructions, then disassembles them.
4. After comparing the disassembly with the expected output, return the temporary unnamed file with the object code.
5. Then, once we're done with the disassembly, feed the same instructions into opid-check assembler mode. opid-check treats its command line arguments as instructions in form "insn1 arg0 arg1 : insn2 arg0 arg1 arg3". The semicolon is used as a separator. Then opid-check outputs the object code into stdout, which, in Python, is another unnamed temporary file.
6. In the end, we compare the code we got at stages 3-4 with the object code from stage 5.
Comment 19 Dmitry Selyutin 2023-09-13 22:04:25 BST
Luke, what'd you say if I tell this is completed? I find the results encouraging, especially considering the tests.
Comment 21 Luke Kenneth Casson Leighton 2023-09-13 22:17:07 BST
(In reply to Dmitry Selyutin from comment #19)
> Luke, what'd you say if I tell this is completed? I find the results
> encouraging, especially considering the tests.

i'd already marked it done :) we initially only wanted assembly,
disass is icing.