Bug 550 - binutils support needed for svp64
Summary: binutils support needed for svp64
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: Other Linux
: --- enhancement
Assignee: Dmitry Selyutin
URL:
Depends on:
Blocks: 577
  Show dependency treegraph
 
Reported: 2020-12-18 20:33 GMT by Luke Kenneth Casson Leighton
Modified: 2022-05-19 11:19 BST (History)
6 users (show)

See Also:
NLnet milestone: NLNet.2019.10.Formal
total budget (EUR) for completion of task and all subtasks: 4500
budget (EUR) for this task, excluding subtasks' budget: 1500
parent task for budget allocation: 577
child tasks for budget allocation: 833 837 838
The table of payments (in EUR) for this task; TOML format:


Attachments
Ho ho ho! (4.67 KB, patch)
2020-12-26 02:14 GMT, Alexandre Oliva
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2020-12-18 20:33:41 GMT
gnu as and objdump support is needed for the new svp64 encoding.

first recommended approach (simplest): a new "instruction"

     svp64 0xNNNNNN (or binary)

encoding would be as described in "Prefix Fields" starting with Major Opcode EXT01

https://libre-soc.org/openpower/sv/svp_rewrite/svp64/

subsequent encoding (TBD) would be:

    svp64 SUBVL=2,ew=8,rt=v,mask=r3

and finally allow that to be a "prefix" of instructions as (TBD)

     addi[r3] r64.w.vec2.v, 5

w: elwidth=32
v: RT is vector
vec2: SUBVL=2
[r3]: mask=r3

opcodes listed here, gives prefixing info, autogenerated.
https://libre-soc.org/openpower/opcode_regs_deduped/
Comment 1 Alexandre Oliva 2020-12-21 19:08:46 GMT
I'm a little puzzled (not just because I can hardly make head from tail of the svp64 web page :-)

why bother with "svp64 0x..." syntax, if we already have .long?


as for making sense of the page.  I guess it must all make some sense if you have some vague notion of what the prefixes are supposed to accomplish, but that's not me.  I could use some examples, or pointers to earlier, more complete and self-contained docs that would give me some sense of what's supposed to be going on there.

not that I really need to be able to make sense of it before I can implement binutils changes, mind you; it just helps avoid silly mistakes, and wrong assumptions, and I figured I might be able to help validate the proposed design, if only I had the required background.  alas, I suppose I'm missing background on GPUs, ppc 3.1 opcodes, and the earlier simd design for risc-v
Comment 2 Luke Kenneth Casson Leighton 2020-12-21 20:09:51 GMT
(In reply to Alexandre Oliva from comment #1)
> I'm a little puzzled (not just because I can hardly make head from tail of
> the svp64 web page :-)
> 
> why bother with "svp64 0x..." syntax, if we already have .long?

yes jacob pointed that out... although... an "svp64 0xNNNNNNN" instruction would help you to understand the "first phase": where the RM field fits.


> 
> as for making sense of the page.  I guess it must all make some sense if you
> have some vague notion of what the prefixes are supposed to accomplish, but
> that's not me.

SV - aka SimpleV - is a hardware for-loop around instructions.

that's it.  full stop.

here is some pseudocode that shows what that looks like, using ADD as an example:

https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=simple_v_extension/simple_v_chennai_2018.tex;hb=HEAD#l190


>  I could use some examples, or pointers to earlier, more
> complete and self-contained docs that would give me some sense of what's
> supposed to be going on there.

this paragraph puts the above one-liner and the pseudocode into context:

https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=simple_v_extension/specification.mdwn;hb=HEAD#l38


> not that I really need to be able to make sense of it before I can implement
> binutils changes, mind you; it just helps avoid silly mistakes, and wrong
> assumptions, and I figured I might be able to help validate the proposed
> design, if only I had the required background. 

appreciated.

> alas, I suppose I'm missing
> background on GPUs, ppc 3.1 opcodes, and the earlier simd design for risc-v

there was no SIMD ISA: SV is *categorically* and very specifically diametrically opposed to SIMD.

SIMD is considered harmful:
https://www.sigarch.org/simd-instructions-considered-harmful/

x86 expanded from 70 to *1400* instructions since 1978, thanks to SIMD (far, far more since adding AVX512.  SIMD is an O(N^6) opcode proliferation nightmare.

also we are not adding v3.1B opcodes (that is a separate discussion which requires OPF permission). the sole exclusive reason for using EXT01 is to get the "fitting in" with v3.1B 64 bit prefixing in a nondisruptive fashion that the OPF ISA WG should not have any objection to.


the sigarch article shows how RVV works.  SV is based on the exact same underlying principle: you have an instruction, you have a vector loop on that instruction, elements are computed based on that instruction.

full stop.

it's real simple.

VL in our case can be anywhere from 1 to 64.  *very rarely* it is permitted to be zero.

so how do we set this "VL" or vector length?

well, with an instruction of course.
https://libre-soc.org/openpower/sv/setvl/

and... err... then what?  well, no standard 32 bit scalar instructions do anything: they don't "understand" VL.

so we "Prefix" them.  this says, "hey you know that VL for-loop you want applied? well the next 32 bits contains the instruction to be smashed into that for-loop, oh and by the way here's some other random trash to chuck at the loop, such as predication, blah blah".

therefore, ultimately, we want this kind of syntax:

    setvl r3, r5, VL=4
    SUBVL=2, ELWIDTH=8 { add r5, r5, r2 }

the output will be:

* 32 bits containing an instruction for setvl
* 32 bits starting with EXT01 as its Major Opcode and continuing with the pattern that drops SUBVL=2 and ELWIDTH=8 somewhere into the RM field bits
* 32 bits containing an addi instruction

this will get us that hardware for-loop activated 4 times (0-3) on that add instruction.

actually 8 because SUBVL=2

and, actually, it will be 8bit adds not 64bit adds because ELWIDTH=8.

does that provide you with a quick crash-course in how SV works?
Comment 3 Luke Kenneth Casson Leighton 2020-12-21 21:50:27 GMT
i've added a rapid prototype "Assembly Annotation" to the appendix,
and also updated the "Prefix Fields".

unnnforttunately, i just realised that, actually, working out which
of the "Remapped Encodings" to apply, will need to be a per-instruction
basis, for everything but MASK_KIND, MASK, ELWIDTH, SUBVL and MODE.
these are always in the same place: everything else (EXTRAs, ELWIDTH_SRC,
MASK_SRC) critically depends on what instruction is used.

we can "get away with this" by specifying the mode-type as part of the
svp64 encoding... for now.
Comment 4 Alexandre Oliva 2020-12-26 02:14:34 GMT
Created attachment 123 [details]
Ho ho ho!

Here's a patch that introduces in GNU binutils an svp64 pseudo-instruction, that takes a single 24-bit operand, and encodes it as a 32-bit insn with EXT01 as the major opcode, and MSB0 bits 7 and 9 also set, shuffling the top two bits of the 24-bit operand, RM[0] and RM[1], into bits 6 and 8 of the insn.
Comment 5 Luke Kenneth Casson Leighton 2020-12-26 02:38:08 GMT
(In reply to Alexandre Oliva from comment #4)
> Created attachment 123 [details]
> Ho ho ho!

:)
 
> Here's a patch that introduces in GNU binutils an svp64 pseudo-instruction,
> that takes a single 24-bit operand, and encodes it as a 32-bit insn with
> EXT01 as the major opcode, and MSB0 bits 7 and 9 also set, shuffling the top
> two bits of the 24-bit operand, RM[0] and RM[1], into bits 6 and 8 of the
> insn.

cool!  this is fantastic, it means that the next stages open up as well, for adding basic SV capability to ISACaller (the simulator).

alexandre, i will create a binutils git clone tomorrow, to make sure this gets tracked properly.
Comment 6 Alexandre Oliva 2020-12-26 02:51:30 GMT
thanks for the crash course.  as I said in the call, it was very useful.
it's all beginning to make sense.

> we can "get away with this" by specifying the mode-type as part of the
svp64 encoding... for now.

I was going to ask about that.  it seems that there's nothing in the svp64 prefix instruction itself that tells how to decode its fields, you have to look at the actual insn that follows to know.

Once we get to a stage in which we'll want to specify svp64 fields separately, rather than combined into a 24-bit immediate, an explicit specification of mode may help the assembler, to some extent, but the disassembler (and the assembler, if it's to detect inconsistencies) will have to look at prefix+insn as a single thing to be able to do its job.
Comment 7 Luke Kenneth Casson Leighton 2020-12-26 08:03:59 GMT
(In reply to Alexandre Oliva from comment #6)
> thanks for the crash course.  as I said in the call, it was very useful.
> it's all beginning to make sense.

ah good.  it's kinda surprising that nobody has thought of this before.

> > we can "get away with this" by specifying the mode-type as part of the
> svp64 encoding... for now.
> 
> I was going to ask about that.  it seems that there's nothing in the svp64
> prefix instruction itself that tells how to decode its fields, you have to
> look at the actual insn that follows to know.

correct.  bit (haha) annoying however with bits so precious it's how it goes. the alternative is that we request a Major Opcode then use the 2 extra bits, one for 1/2 Predication, the other for 2/3 EXTRA (although to be honest, 2 more bits means 4 more modes/features...)

the sv_analysis.py program is generating tables already
https://libre-soc.org/openpower/opcode_regs_deduped/

the idea is to create CSV files which give those 2 missing bits.  it is not outside the realm of possibility to autogenerate a header file for inclusion in binutils.
 
> Once we get to a stage in which we'll want to specify svp64 fields
> separately, rather than combined into a 24-bit immediate, an explicit
> specification of mode may help the assembler, to some extent, 

autogenerated.  otherwise it's too much work (200 insns) and you get transcription errors.  dunno bout you but i don't want to have to check that.

> but the
> disassembler (and the assembler, if it's to detect inconsistencies) will
> have to look at prefix+insn as a single thing to be able to do its job.

indeed.  and PowerDecoder2 as well.  this is how it goes.

i'm not happy about it because normally RISC is not supposed to have lots of gates in the decoder.

if we were doing our own ISA from scratch these two bits, saying whether 1P/2P was set and whether EXTRA2/3 was set would definitely be part of the opcode.
Comment 8 Luke Kenneth Casson Leighton 2021-09-07 00:22:55 BST
relevant links:
http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-August/003590.html

the only thing: it is *not* a good idea to hand-create the tables needed by binutils.  these should be *auto-generated*, teaching sv_analysis.py how to do that.

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/sv_analysis.py;hb=HEAD

there's nothing particularly sophisticated or clever about that program: it's written in a bland, non-OO "Get It Done" style.  it:
* reads OpenPOWER ISA v3.0B CSV files containing micro-code-style instruction format information
(exactly like the tables in binutils)
* identifies and groups v3.0B instructions by identical register file profile (number of Read regs, number of Write regs, number of CR regs read etc)
* assigns an SVP64 "Style" to each (Twin/Single-predicate, 2 or 3 EXTRA bits for reg extension)
* spits out *more* CSV files with that grouping information in it, to assist in decoding

thus rather than hand-create the SVP64 decoding information in binutils it should be trivial to autogenerate c header files and c structs.

http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-August/003592.html


no deadlines given that i am using the python class, which has a mode where it can do .S processing.  i actually had to add gas macro recognition to get that to work.

so there is a temporary workaround.  however it will become increasingly more of a priority particularly for Lauri who is working at assembler level for Video/Audio CODECs, and later for compilers.

the function entrypoint is asm_process()

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;h=45b292b4c4c32bbff548f2bf299235633d31db6c;hb=HEAD#l1052

you can see it looks for ".set" macros of the utmost basic form, example where this is used:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/Makefile;h=4dd904b6ba48f3fcae3b1ab04e1b0479e460abd4;hb=HEAD#l34

and some actual assembler containing sv.xxx opcodes, which get translated by asm_process() libe by line into ".long xxxxx; some_v3.0b_asmopcode"

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/audio/mp3/mp3_0_apply_window_float_basicsv.s;hb=HEAD

you've seen the spec page which contains the format?

 https://libre-soc.org/openpower/sv/svp64/

it's very deliberately only describing the format, not why it is what it us, or how to *use* that format (how to implement hardware etc i mean).
Comment 9 lechenko 2021-09-19 20:33:01 BST
(In reply to Luke Kenneth Casson Leighton from comment #8)

Slowly but surely, I figured out, what https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;h=45b292b4c4c32bbff548f2bf299235633d31db6c;hb=HEAD#l1052 does. 

As I understood, it translates svp64 asm mnemonics to prefix as a 32-bit literal and subsequent scalar OpenPOWER asm mnemonic. And after that translated .S-file feeds to binutils to produce binary file.

So, now we want to support the same svp64 asm mnemonics directly in binutils. But, my guess, that scalar OpenPOWER instructions are already there. Thus, few questions.

Does it mean, that we can try to implement the same two-step translation logic inside binutils? Or reuse OpenPOWER-related header files, at least?

Another one about svp64 asm syntax. As far as I understand, it is already support current version of asm syntax, but is there a spec on it? Could you share a link, please.

And the last one, for now. I guess, I can reuse/refactor both sv_analysis.py and svp64.py to generate header for binutils. Was that your intentions on how to do that task?
Comment 10 Luke Kenneth Casson Leighton 2021-09-19 22:25:30 BST
(In reply to lechenko from comment #9)
> (In reply to Luke Kenneth Casson Leighton from comment #8)
> 
> Slowly but surely, I figured out, what
> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/
> trans/svp64.py;h=45b292b4c4c32bbff548f2bf299235633d31db6c;hb=HEAD#l1052
> does. 
> 
> As I understood, it translates svp64 asm mnemonics to prefix as a 32-bit
> literal and subsequent scalar OpenPOWER asm mnemonic.

because there is no binutils support for svp64, yes.  when the .long
and the 32 bit mnemonic get passed to binutils, they get converted to
binary *without* binutils ever needing to know anout svp64 assembler.

the exercise is therefore to merge the EXACT functionality of svp64.py
*into* binutils.


> And after that
> translated .S-file feeds to binutils to produce binary file.

yes.

> So, now we want to support the same svp64 asm mnemonics directly in
> binutils. But, my guess, that scalar OpenPOWER instructions are already
> there. 

yes, and yes.

> Thus, few questions.
> 
> Does it mean, that we can try to implement the same two-step translation
> logic inside binutils?

yes, exactly. or, more to the point: after "conversion" to ".long xxxxx; {equivalent v3.0B}" pass that *again* to the relevant function and get it to convert those to the appropriate binary output.

it is important to do that conversion pass *after* all the macro renaming
and expansion of registers.  gas has a builtin macro system, you cannot
process SVP64 registers until you know the actual number, 0-127.


> Or reuse OpenPOWER-related header files, at least?

yes absolutely, for goodness sake don't duplicate the entirety of power isa headers for scalar operations.


> Another one about svp64 asm syntax. As far as I understand, it is already
> support current version of asm syntax, but is there a spec on it? Could you
> share a link, please.

there is a spec, http://libre-soc.org/openpower/sv/svp64 however
i literally made up the syntax as i went along.

"need a way to indicate mapreduce, err "mr" is short, that'll do"

no kidding! :)

svpy4.py is pretty much it, alongside the consts.py and other data structures,
power_enums.py and so on.

svp64.py and the Decoder are the canonical sources at the moment until
such time as there *is* time for someone *to* write documentation like
this.


> And the last one, for now. I guess, I can reuse/refactor both sv_analysis.py
> and svp64.py to generate header for binutils. Was that your intentions on
> how to do that task?

yyyep!  or, err refactor svp64.py? ahh more, "use svp64.py as a reference
to create the exact same thing in c", and *add* to sv_analysis.py to get
it to output autogenerated headers for use in binutils.  if you look
at how the microwatt vhdl struct is autogenerated that is pretty much
exactly what is needed.

cut, paste, substitute right magic constants, done.

you will see there are some data structures in binutils headers, that list
instructions.

if you add *one* extra field to that (a pointer to a binutils-ppc-svp64-struct)
ordinarily by leaving that out of the existing structs it will default to NULL.

you can then autogenerate a binutils svp64 header full of ppc-svp64-struct entries then have a function which, before use, *fills in*, *at runtime*, the pointers.

btw you got the message about copyright assignment to the FSF? this is *really* important.  binutils code that has not had an FSF copyright assignment *cannot be accepted upstream* and whatever you did would have to be thrown away and duplicated by someone who has.
Comment 11 Luke Kenneth Casson Leighton 2021-09-20 13:24:49 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_svp64_rm.py;hb=HEAD

that's the reverse: disassembly of binary to internal data structure
for use in the simulator and in HDL.  it will give some insight.

did you receive the email with the contact details of the copyright
clerk at the FSF?
Comment 12 lechenko 2021-09-20 22:42:24 BST
(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to lechenko from comment #9)
> > (In reply to Luke Kenneth Casson Leighton from comment #8)
> > Thus, few questions.
> > 
> > Does it mean, that we can try to implement the same two-step translation
> > logic inside binutils?
> 
> yes, exactly. or, more to the point: after "conversion" to ".long xxxxx;
> {equivalent v3.0B}" pass that *again* to the relevant function and get it to
> convert those to the appropriate binary output.

Okay, I shall find that function then.

> it is important to do that conversion pass *after* all the macro renaming
> and expansion of registers.  gas has a builtin macro system, you cannot
> process SVP64 registers until you know the actual number, 0-127.
> 

You mean, the conversion of 'sv.*' instruction to '.long xxxxxx; \1'? Will this macro/expansion machinery work correctly on 'sv.*' instruction?


> there is a spec, http://libre-soc.org/openpower/sv/svp64 however
> i literally made up the syntax as i went along.

Aha. This is a binary format spec and there is no spec for mnemonics per se. My guess, that I have to dig out the format from svp64.py and tests.

Also, forgot to mention last time. There are some macro processing in svp64.py. Where can I read about it?

> 
> btw you got the message about copyright assignment to the FSF? this is
> *really* important.  binutils code that has not had an FSF copyright
> assignment *cannot be accepted upstream* and whatever you did would have to
> be thrown away and duplicated by someone who has.

I noticed the email. But had no time to fill in and send. I'll deal with paperwork as soon as I'll start hacking binutils.
Comment 13 Luke Kenneth Casson Leighton 2021-09-20 23:27:59 BST
(In reply to lechenko from comment #12)

> > it is important to do that conversion pass *after* all the macro renaming
> > and expansion of registers.  gas has a builtin macro system, you cannot
> > process SVP64 registers until you know the actual number, 0-127.
> > 
> 
> You mean, the conversion of 'sv.*' instruction to '.long xxxxxx; \1'? Will
> this macro/expansion machinery work correctly on 'sv.*' instruction?

yes, a macro expandion like system should work perfectly.

SVP64 has a hard rule: you cannot do this:

     sv.X   ==>  .long NNNN; Y

it MUST be this:

     sv.X   ==>  .long NNNN; X

in other words there is not ONE SINGLE 64 bit instruction that does not
map to its corresponding 32 bit one.

therefore you can perfectly well do a runtime substitution.

> > there is a spec, http://libre-soc.org/openpower/sv/svp64 however
> > i literally made up the syntax as i went along.
> 
> Aha. This is a binary format spec and there is no spec for mnemonics per se.
> My guess, that I have to dig out the format from svp64.py and tests.

yes, sorry.  we can put a documentation budget if you would like to write
it, even if it is very sparse working notes for yourself.

> Also, forgot to mention last time. There are some macro processing in
> svp64.py. Where can I read about it?

binutils docs describe the macro system, it is ".set X Y"
i simply copied that at its most basic simplest level so that
Lauri could do some very basic macros, ".set counter r3" and
so on.  this was enough.

 
> > 
> > btw you got the message about copyright assignment to the FSF? this is
> > *really* important.  binutils code that has not had an FSF copyright
> > assignment *cannot be accepted upstream* and whatever you did would have to
> > be thrown away and duplicated by someone who has.
> 
> I noticed the email. But had no time to fill in and send. I'll deal with
> paperwork as soon as I'll start hacking binutils.

ok cool. i must send it as well.
Comment 14 lechenko 2021-10-03 17:54:56 BST
I have found a task https://bugs.libre-soc.org/show_bug.cgi?id=615 and it makes me wonder. Have been results of those discussion implemented in svp64.py?
Comment 15 Luke Kenneth Casson Leighton 2021-10-03 22:21:20 BST
(In reply to lechenko from comment #14)
> I have found a task https://bugs.libre-soc.org/show_bug.cgi?id=615 and it
> makes me wonder. Have been results of those discussion implemented in
> svp64.py?

not yet.  i thought about it (use of "?") and it makes the entire syntax
well... not understandable.

so i meant to ask about an alternative separator and haven't got round
to it yet.

probably sensible to make it a #define for now.
Comment 16 Luke Kenneth Casson Leighton 2021-11-21 18:24:10 GMT
anton, dmitry has a couple of weeks spare very shortly (tuesday), this
is precious time: what progress has been made, and do you mind if he
helps out?
Comment 17 lechenko 2021-11-22 12:44:21 GMT
Hi! Well, I didn't do much, studied code base and outlined plan of coding. But unfortunately, I have no time to do that until January. So I may brief Dmitry on my findings and pass this task to him for now.
Comment 18 Dmitry Selyutin 2021-11-24 05:07:33 GMT
Re-assigned the task to me. For now, I briefly checked the comments and patch by Alexandre; I'll try to take a look at https://libre-soc.org/openpower/sv/svp64/ and src/openpower/sv/trans/svp64.py today.
Comment 19 Luke Kenneth Casson Leighton 2021-11-24 11:00:15 GMT
(In reply to dmitry.selyutin from comment #18)
> Re-assigned the task to me. For now, I briefly checked the comments and
> patch by Alexandre; I'll try to take a look at
> https://libre-soc.org/openpower/sv/svp64/ and
> src/openpower/sv/trans/svp64.py today.

an entire new table will be needed (an extra field in the "normal"
table which defaults to NULL and is optionally filled in for any
svp64 instruction)

it is *not safe* to write that SVP64-information table by hand.

it should be entirely auto-generated, just like this table for
microwatt is auto-generated:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/sv_analysis.py;h=d62b587521db0303189f88b57873bb6a725d7ec0;hb=e5d2a21bd25720f9267c7c8045df83163bc63a20#l653

    constant sv_minor_19_decode_rom_array :
             sv_minor_19_rom_array_t := (
        -- insn  Ptype  Etype  in1  in2  in3  out  out2  CR in  CR out  sv_in1  sv_in2  sv_in3  sv_out  sv_out2  sv_cr_in  sv_cr_out
    2#0000000000# => (P2, EXTRA3, NONE, NONE, NONE, NONE, NONE, BFA, BF, NONE, NONE, NONE, NONE, NONE, Idx1, Idx0), -- mcrf


etc etc. etc. etc.

those consts - BFA, BF, Idx1, Idx0 etc etc. - heck even those can be
auto-generated through enumeration of the corresponding magic enums
in power_enums.py:

@unique
class SVEXTRA(Enum):
    NONE = 0
    Idx0 = 1
    Idx1 = 2
    Idx2 = 3
    Idx3 = 4
    Idx_1_2 = 5  # due to weird BA/BB for crops

==>

for value in SVEXTRA:
      f.write("#define SVP64_SVEXTRA_%s %s\n" % (value.name, value.field))

something like that.

almost nothing in the entire table should be written by hand.
Comment 20 Luke Kenneth Casson Leighton 2021-11-24 13:46:07 GMT
here:

an extra pointer in powerpc_opcode:

https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=include/opcode/ppc.h;h=8c356b1ff4c0e82d7dd8b367cb27559707fd14dd;hb=84629a61ee0f459a78e245e5aa41bec73f30c4d1#l64

  35 struct powerpc_opcode
  36 {
  38   const char *name;
  42   uint64_t opcode;
  48   uint64_t mask;
  53   ppc_cpu_t flags;
  58   ppc_cpu_t deprecated;
  63   unsigned char operands[8];
       svp64_opcode_table* svp64;
  64 };

where here, the *AUTO-GENERATED* svp64_opcode_table would be "merged"
by this function:

https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=gas/config/tc-ppc.c;h=cd0ba56ddffa728f90f2a9b1d299990b435645ad;hb=84629a61ee0f459a78e245e5aa41bec73f30c4d1#l1646

into this massive table of epic proportions:

https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=opcodes/ppc-opc.c;h=bbbadffad8f62f867c53630b9bf67cfe72ecfed6;hb=84629a61ee0f459a78e245e5aa41bec73f30c4d1
Comment 21 Dmitry Selyutin 2021-11-24 17:43:00 GMT
We've discussed the state of art today with Anton. I'll dig binutils and svp64.
Comment 22 Dmitry Selyutin 2021-11-24 19:40:40 GMT
What's the current state of 615? Is there an established format?
Comment 23 Dmitry Selyutin 2021-11-24 19:44:35 GMT
(In reply to Luke Kenneth Casson Leighton from comment #20)
>   35 struct powerpc_opcode
>   36 {
>   63   unsigned char operands[8];
>        svp64_opcode_table* svp64;
>   64 };

Did you mean a standalone table, or you'd like to bound each and every Power ISA instruction to its own SVP64 table? It really looks like that you meant to add plus one opcodes[]/num_opcodes pair, but the structure layout you posted tells otherwise.

I'd really start with checking for "sv." prefix in mnemonics, and then splitting by space or slash. I, however, do not yet understand how SVP64 works; it'd be nice to have some examples like "extsw from vanilla Power ISA vs sv.extsw 5, 3 vs sv.extsw. vs sv.extsw./satu/sz/dz/sm=r3/dm=r3 5, 31" and so on.
Comment 24 Luke Kenneth Casson Leighton 2021-11-24 19:59:03 GMT
use a #definable char.

we were advised that "?" is available, but it looks.. well.. s***

   sv.addi?vec2?ew=8?sm=r3

vs

   sv.addi/vec2/ew=8/sm=r3

if you use a #defined SVP64SEP it will not hold things up.
Comment 25 Luke Kenneth Casson Leighton 2021-11-24 20:24:58 GMT
(In reply to dmitry.selyutin from comment #23)
> (In reply to Luke Kenneth Casson Leighton from comment #20)
> >   35 struct powerpc_opcode
> >   36 {
> >   63   unsigned char operands[8];
> >        svp64_opcode_table* svp64;
> >   64 };
> 
> Did you mean a standalone table, 

YES.  one that is autogenerated by sv_analysis.py.  do not even
remotely consider creating it by hand, this will be absolutely
disastrous.

> or you'd like to bound each and every Power
> ISA instruction to its own SVP64 table?

the SVP64_table to powerpc_opcode table is a many-to-one relationship.
it is not a many-to-many.

it would be absolutely distastrous to try to modify powerpc_opcode
by hand (10,000 instructions).

therefore:

1) autogenerate svp64_opcode_table
2) in ppc_setup_opcodes hunt through powerpc_opcodes looking
   for name matches powerpc_opcode->name == svp64_opcode

oh i see, yes:

a) call it "struct svp64_opcode_augmentation" (or something)
b) autogenerate a table named svp64_table[135] (whatever)
c) add a pointer to the struct svp64_opcode_augmentation

35 struct powerpc_opcode
36 {
63   unsigned char operands[8];
     struct svp64_opcode_augmentation* svp64;
64 };


> It really looks like that you meant
> to add plus one opcodes[]/num_opcodes pair, but the structure layout you
> posted tells otherwise.

yes name of table not name of struct.

> I'd really start with checking for "sv." prefix in mnemonics, and then
> splitting by space or slash. I, however, do not yet understand how SVP64
> works; it'd be nice to have some examples like "extsw from vanilla Power ISA
> vs sv.extsw 5, 3 vs sv.extsw. vs sv.extsw./satu/sz/dz/sm=r3/dm=r3 5, 31" and
> so on.

ironically - interestingly - understanding svp64 is not actually necessary
in order to translate it!  the job is that simple and self-contained.

a higher priority item is, funnily enough, to add the *32 bit* Draft opcodes
such as the FFT and DCT butterfly muladd, and the new ternlogi, all of
which need to go under a binutils --experimental-svp64 option (something
like that)

those can be done under a separate bugreport, btw.

SVP64 overview including FOSDEM video is here:
https://libre-soc.org/openpower/sv/overview/

bear in mind, there really is nothing actually useful or
relevant to this task, in there.  if working on the *simulator*
however, that would be a different matter: it would be absolutely
essential to understand SVP64.

however the data format, that is in links you will already find in
earlier comments, as well as in svp64.py etc which you should consider
to be the canonical reference guide.
Comment 26 Dmitry Selyutin 2021-11-27 20:41:04 GMT
Pushed a pair of smallish commits into sv_analysis.py so that it'll be capable of generating code for binutils. These are mostly preparations, but I think it's inevitable: I have neither desire nor time to copy&paste sv_analysis.py customizations (all these if/else branches, continues in loops and so on). I intend to decouple the format-specific parts into, well, format-specific entities. It'd be great if you could take a look and tell me if it's OK or not on the early stage. :-)
Comment 27 Luke Kenneth Casson Leighton 2021-11-27 21:06:34 GMT
(In reply to dmitry.selyutin from comment #26)
> Pushed a pair of smallish commits into sv_analysis.py so that it'll be
> capable of generating code for binutils. These are mostly preparations, but
> I think it's inevitable: I have neither desire nor time to copy&paste
> sv_analysis.py customizations (all these if/else branches, continues in
> loops and so on). I intend to decouple the format-specific parts into, well,
> format-specific entities. It'd be great if you could take a look and tell me
> if it's OK or not on the early stage. :-)

yyeah go for it.  try not to go too overboard (a la jinja2) it's just
not worth it.  see how it goes: if there turns out to be a hell of a lot
of "if format == " statements then some sort of *really small* API would
be worth considering, i do tend to find however that the readability of
such APIs is absolutely awful and after about the 5th one you deeply
regret having created them, and go back to code-duplication, where at
least the damn code is linear, shorter, and bleedin obvious.

see how you get on.  this could be short enough to work out okay: it's
only 50 lines within the cvs.svp64.items() loop.

what might be handy would be to drop in some namedtuples. or perhaps
if i'd put in comment it would help, ehn?
Comment 28 Dmitry Selyutin 2021-11-27 21:25:34 GMT
I actually prefer dataclasses to namedtuples, these tend to be more flexible, especially for code generation. And it's also one of rare cases where type annotations _are_ useful (though, as usual, in a limited way).
Comment 29 Luke Kenneth Casson Leighton 2021-11-28 03:20:15 GMT
(In reply to dmitry.selyutin from comment #28)
> I actually prefer dataclasses to namedtuples, these tend to be more
> flexible, especially for code generation.

if this was even 300 lines i would agree with you no question.

however this is not 300 lines: it is one single for-loop
on one single dictionary, with a scant 30 lines inside
that one single for-loop.

it is under absolutely no circumstances whatsoever going
to get more complicated than that one single dictionary.
there will NEVER be anything else added, ever.

consequently attempting to add complex infrastructure normally
suited to large sophisticated code generation is in danger of creating massive
code bloat.

to give you some idea of how inappropriate code generator
technology is here: there is so little to output here that
it could well be covered entirely by a single 64 bit uint64_t
per row, comprising the ORing of a series of #defines:
not even a struct need be created (unlike in the VHDL case).
Comment 30 Luke Kenneth Casson Leighton 2021-11-28 15:25:37 GMT
(In reply to Luke Kenneth Casson Leighton from comment #29)

> it could well be covered entirely by a single 64 bit uint64_t
> per row, comprising the ORing of a series of #defines:
> not even a struct need be created (unlike in the VHDL case).

... possibly.  there's actually quite a lot.  looking at the VHDL:

        -- insn  Ptype  Etype  in1  in2  in3  out  out2  CR in  CR out  sv_in1  sv_in2  sv_in3  sv_out  sv_out2  sv_cr_in  sv_cr_out
2#0100001010# => (P1, EXTRA3, RA, RB, NONE, RT, NONE, NONE, CR0, Idx1, Idx2, NONE, Idx0, NONE, NONE, Idx0), -- add

so:

* sv_* fields are all 3-bit (5 possible values):  (None, Idx0, Idx1, Idx2, Idx3)
* there are 7 of those (in1, in2, in3, out, out2, cr_in, cr_out)
* that totals 21 bits
* Ptype (predicate type) is 1 bit: PTYPE_P1 or PTYPE_P2
* Etype (EXTRA fields type) is 1 bit: EXTRA2 or EXTRA3

(total 23 so far)

but, ahh, where there is sv_in1 .... etc. there are also 7 fields
in1, in2, in3, out, out2, CRin, Crout which tell you *which register*
those 7 sv_* fields are associated with.  in the example above:

* in1 = RA, but sv_in1 = Idx1.  therefore:
  - RA is an input
  - RA EXTRA3 encoding is in position Idx1
* in2 = RB, but sv_in2 = Idx2 therefore:
  - RB is an input
  - RB EXTRA3 encoding is in position Idx2
...
* CRout = CR0, but sv_cr_out = Idx0.  therfore
  - CR0 is an **OUTPUT**
  - CR0 EXTRA3 encoding is in position Idx0

now, the number of possible entries for in1/in2 etc. is:

* RA
* RB
* RC
* RT
* EA
* FRA
* FRB
* FRC
* FRS
* FRT
* CR0
* CR1

so 12 possible values, requiring 4 bits.  there are 7 of them,
therefore another 4*7=28 bits

28 + 23 == 51

ooooo it's still achievable to fit into all the SVP64 table
encoding information into a single uint64_t
Comment 31 Dmitry Selyutin 2021-11-30 20:49:53 GMT
Pushed few more commits, now it looks like the whole code generation is format-based. I don't quite like the part where we convert CSV entry into (op, insn, row) stuff, this should likely be in some function or whatever, but maybe later. Anyway, I think the code is ready to output what we need for binutils.
Comment 32 Dmitry Selyutin 2021-11-30 20:51:22 GMT
For the record, I completely dislike that we (ab)use print for logging. I'd rather prefer using print to output the file. But I guess that's one of really "way too late" remarks. :-)
Comment 33 Luke Kenneth Casson Leighton 2021-12-01 16:33:06 GMT
(In reply to dmitry.selyutin from comment #31)
> Pushed few more commits, now it looks like the whole code generation is
> format-based. I don't quite like the part where we convert CSV entry into
> (op, insn, row) stuff, this should likely be in some function or whatever,
> but maybe later.

yes.  it's quick and dirty, Gets The Job Done.

> Anyway, I think the code is ready to output what we need
> for binutils.

brilliant.

(In reply to dmitry.selyutin from comment #32)
> For the record, I completely dislike that we (ab)use print for logging. I'd
> rather prefer using print to output the file. But I guess that's one of
> really "way too late" remarks. :-)

feel free to convert to the openpower.util.log function:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/util.py;h=9be902dbd8721f32667cc054cc789514976f089d;hb=4328f255d67d40ce7caef6b42799bcab2ec5a733#l64

if there was any UI-related code (if console was important) it would
matter greatly.  it doesn't... therefore it's a matter of aesthetics...
therefore if time permits feel free to convert to the log function.
Comment 34 Dmitry Selyutin 2021-12-07 19:02:50 GMT
The progress so far: I've updated the binutils sources (they're vastly outdated), and also made some changes to gas to make it parse some of our extensions (it seems all but bc stuff). After parsing, nothing is done for now: we discard parsing results. This is an intermediate stage to let interested parties to take a look at, until things go too far. Please refer to binutils-gdb svp64 branch.

https://git.libre-soc.org/?p=binutils-gdb.git;a=shortlog;h=refs/heads/svp64
Comment 35 Luke Kenneth Casson Leighton 2021-12-07 20:54:20 GMT
(In reply to dmitry.selyutin from comment #34)
> The progress so far: I've updated the binutils sources (they're vastly
> outdated), 

yes, no rush there.

> and also made some changes to gas to make it parse some of our
> extensions (it seems all but bc stuff).

fantastic.

> After parsing, nothing is done for
> now: we discard parsing results. This is an intermediate stage to let
> interested parties to take a look at, until things go too far. Please refer
> to binutils-gdb svp64 branch.
> 
> https://git.libre-soc.org/?p=binutils-gdb.git;a=shortlog;h=refs/heads/svp64

ah good it's a branch.  wheww

it looks really good: the only thing that's a little odd is:

+static char *
+svp64_decode_ff_or_pr(char *str, int *value)
+{
+  size_t i;
+  static const struct svp64_ff_or_pr_map table[] = {
+    SVP64_FF_OR_PR_MAP ("RC1" , SVP64_FF_OR_PR_RC1),
+    SVP64_FF_OR_PR_MAP ("~RC1", SVP64_FF_OR_PR_NRC1),
+    SVP64_FF_OR_PR_MAP ("lt"  , 0),
+    SVP64_FF_OR_PR_MAP ("nl"  , 1),
+    SVP64_FF_OR_PR_MAP ("ge"  , 1), /* same value as nl */
+    SVP64_FF_OR_PR_MAP ("gt"  , 2),

which is making me wonder where the heck "pr=" and "ff=" are
coming from :)

decode_predicate() i totally get.  the lt/ge/gt should not
be in the list.

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;h=45b292b4c4c32bbff548f2bf299235633d31db6c;hb=ca475c00b80c1b66d03505e1d7ba6f26a379ff2e#l676

decode_ffirst() should be:

 121 # partial-decode fail-first mode
 122 def decode_ffirst(encoding):
 123     if encoding in ['RC1', '~RC1']:
 124         return encoding
 125     return decode_bo(encoding)

so only RC1 or ~RC1

other than that it looks really good.
Comment 36 Luke Kenneth Casson Leighton 2021-12-07 21:06:24 GMT
one thing: do cross-reference (in the source code):

a) this bugreport
b) the svp64 spec
c) put "this is draft / experimental" but with a few more words
   around it.

one other thing: it's *really* important that you sign a copyright
"assignment-then-get-an-automatic license-back-to-your-own-work"
agreement with the FSF.

email copyright-clerk@fsf.org and cross-reference this bugreport,
probably put that you want to work on gcc, binutils and gdb as well.
Comment 37 Dmitry Selyutin 2021-12-13 19:09:26 GMT
(In reply to Luke Kenneth Casson Leighton from comment #35)

> it looks really good: the only thing that's a little odd is:
> 
> +static char *
> +svp64_decode_ff_or_pr(char *str, int *value)

> so only RC1 or ~RC1
> 
> other than that it looks really good.

In Python, the function returns either int or str, which looks kinda bad for C; instead of this, I added special values, -1 and -2, corresponding to RC1 and ~RC1. These are handled specially. The exact equivalent of Python code looks quite unorthodox to me; I'm open to suggestions on how to implement it better (including the function name and the semantics). For starters, I'll split the routine svp64_decode_ff_or_pr into two, where one is svp64_decode_bo.
Comment 38 Luke Kenneth Casson Leighton 2021-12-13 19:49:03 GMT
+static char *
+svp64_decode_ff_or_pr(char *str, int *value)
+{
+  size_t i;
+  static const struct svp64_ff_or_pr_map table[] = {
+    SVP64_FF_OR_PR_MAP ("RC1" , SVP64_FF_OR_PR_RC1),
+    SVP64_FF_OR_PR_MAP ("~RC1", SVP64_FF_OR_PR_NRC1),
+}

you missed that the only permitted options for pr=
and ff= are RC1 and ~RC1.

look again at the equivalent python.

the addition of le lt ge etc is most likely a cut/paste error
Comment 39 Luke Kenneth Casson Leighton 2021-12-13 19:51:12 GMT
 125     return decode_bo(encoding)
 126 

urrr i cannot read code i wrote 6 months ago.  sigh.
no all good, you are on the right track.
Comment 40 Dmitry Selyutin 2021-12-18 12:26:10 GMT
Split the routine into parts so that it's clearer what's going on. An hour is lost trying to understand why gas doesn't understand `sv.setb/dm=r3/sm=1<<r3 5, 7`; turns out I must've used `-mpower9` switch... Now on my way to considering is_ldst and is_bc.
Comment 41 Luke Kenneth Casson Leighton 2021-12-18 15:35:14 GMT
(In reply to dmitry.selyutin from comment #40)
> Split the routine into parts so that it's clearer what's going on. An hour
> is lost trying to understand why gas doesn't understand
> `sv.setb/dm=r3/sm=1<<r3 5, 7`; turns out I must've used `-mpower9` switch...

oo sorry :)  yyeah, setb is probably a v3.0 instruction... yyep,
see v3.0B p1225 "p122 setb v3.0 Set Boolean"

> Now on my way to considering is_ldst and is_bc.

yyeah basically there, there are different SVP64 "mode" tables for:

* LD/ST instructions
* Branch instructions
* Condition instructions
* everything-else instructions ("normal")

(i haven't put Condition instruction mode-format into sv/trans/svp64.py yet)

see mode section:

https://libre-soc.org/openpower/sv/svp64/

which shows these pages:

* https://libre-soc.org/openpower/sv/cr_ops/
* https://libre-soc.org/openpower/sv/ldst/
* https://libre-soc.org/openpower/sv/branch/
* https://libre-soc.org/openpower/sv/normal/

and you can see the different specialist tables giving alternative
interpretations for the "Mode" bits

watch out for CR and Branch ops because actually some of the
*OTHER* fields (elwidth for example) are used as additional
mode bits (because of running out of space)

i do apologise that all the bits in the specs are in MSB0 order.
if you take enough drugs you either melt your brain enough to
follow along or, more likely, don't care one way or the other.
Comment 42 Luke Kenneth Casson Leighton 2021-12-18 15:36:37 GMT
(putting in a preliminary budget for this, i'll push it up if needed)
Comment 43 Dmitry Selyutin 2021-12-18 20:43:16 GMT
Added ldst/bc checks. I also squashed some commits, so now it appears as one single commit "support parsing". There are also some minor fixes and code cleanup.
Comment 44 Dmitry Selyutin 2021-12-18 20:45:11 GMT
I assumed that bc_svstep and bc_step indicate the same[0]; also I don't for now check whether we've already encountered bc mode so far[1].

[0] https://libre-soc.org/irclog/%23libre-soc.2021-12-18.log.html#t2021-12-18T20:27:00
[1] https://libre-soc.org/irclog/%23libre-soc.2021-12-18.log.html#t2021-12-18T20:32:18
Comment 45 Dmitry Selyutin 2021-12-18 20:58:34 GMT
There will be more cleanup ahead, stay tuned. The variable initialization in svp64.py is somewhat inconsistent, I assume the meaning of variables changed along with the design. :-) The current state is somewhat inconsistent, I'll choose something between flags and unions and finally unify it soon.

UPD: done; now everything is declared as bit fields. Not really that I like using these, but here I think they're OK: don't think we need to bother with ordering etc. inside the standalone app. Could've even used standalone booleans (but this is even worse).
Comment 46 Luke Kenneth Casson Leighton 2021-12-18 23:52:15 GMT
i must apologise: sv.bc in svp64.py is slightly behind what's actually
in the spec.  "svstep" mode turned out to be fantastically complex in
hardware and i removed it.  the merging of the capability of the
svstep instruction into branch turned out to be CISC, not RISC,
and in RISC you keep things as two "simpler" instructions.

CTR-mode on the other hand is a simple variant of standard bcctr.
it's dead easy.

VLset mode is also not that hard.
Comment 47 Luke Kenneth Casson Leighton 2021-12-18 23:54:02 GMT
(In reply to dmitry.selyutin from comment #44)
> I assumed that bc_svstep and bc_step indicate the same[0];

svstep mode is not in the spec any more.

https://libre-soc.org/openpower/sv/branches/
Comment 48 Dmitry Selyutin 2021-12-19 13:38:07 GMT
lkcl: I cannot map the branches spec[0] to "encmodes" (as per svp64.py argot). I got that we have SNZ, ALL and ~ALL as encmodes (the last one is also omitted from svp64.py).

Luke, could you, please, a) help me to read the spec so that I have a clear vision how it maps into encmodes, or b) give an exhaustive list of things I need to consider as encmodes along with flags to be set, or c) update svp64.py respectively?

Ideally some table would be great, like:
MODE      BITS
ALL       bc_all=1
COCOJUMBO bc_coco=0b11, bc_jumbo=0b101
Comment 49 Dmitry Selyutin 2021-12-19 13:43:43 GMT
For now, I've dropped "st" and "sr" encmodes, and also dropped bc_svstep and bc_brc bits.
Comment 50 Luke Kenneth Casson Leighton 2021-12-19 16:14:56 GMT
(In reply to dmitry.selyutin from comment #48)
> lkcl: I cannot map the branches spec[0] to "encmodes" (as per svp64.py
> argot). I got that we have SNZ, ALL and ~ALL as encmodes (the last one is
> also omitted from svp64.py).

yes because by default it is non-all ("some") therefore having ~ALL is a
waste of space.

> Luke, could you, please, a) help me to read the spec so that I have a clear
> vision how it maps into encmodes, or b) give an exhaustive list of things I
> need to consider as encmodes along with flags to be set, or c) update
> svp64.py respectively?

honestly i don't have time (NGI POINTER contract) and would appreciate if
you could do it.
  
the table *is* the table, which is "Format and fields".

each bit to be set requires a corresponding assembler slash-mnemonic
(in lower-case), the only exceptions being:

* /ctr obviously sets bit 19, and /cti sets bit 6.
  but having /ctr/cti is a total waste of space.  therefore, logically,
  /cti should set bit 19 *AND* bit 6

* similarly for sz and snz.  if you specify /snz then it *implies* /sz
  and so *both* bit 5 and 23 must be set when /snz is specified.

please do make something up for bits 20, 21 and 7, along the same lines:
make it "compact", logical, and keep to a maximum of 3 letters per
mnemonic.  vs for "VLSET mode" (bit 20) but "vsb" for *BOTH* bit 20 *AND*
bit 7.  vi and vib likewise involve bit 21 as well.

just... make it up as you go along, keeping it as compact as possible.
some instructions can easily already be well over 25 characters, which
is of some concern.
Comment 51 Dmitry Selyutin 2021-12-19 19:39:45 GMT
lkcl: OK, I'll take a look at branches page and try to come up with something sensible. Since I'm now close to re-using ISA tables, I took a step back to take a look at what I did with sv_analysis.py, and I must admit I don't really like it (in fact, I don't like it at all). I'm going to revert these bits, and, instead, will provide a standalone script which parses VHDL.
An overall parsing is present at src/openpower/sv/sv_binutils.py.
Comment 52 Luke Kenneth Casson Leighton 2021-12-19 22:40:37 GMT
(In reply to dmitry.selyutin from comment #51)
> lkcl: OK, I'll take a look at branches page and try to come up with
> something sensible.

star.  there do exist a couple of svp64 branches, do try not to
bust them :)  they will be using bits and syntax already implemented


> Since I'm now close to re-using ISA tables, I took a
> step back to take a look at what I did with sv_analysis.py, and I must admit
> I don't really like it (in fact, I don't like it at all).

i did try to tell you it's way simpler than you thought it would be :)

> I'm going to
> revert these bits, and, instead, will provide a standalone script which
> parses VHDL.

that's a great idea.

> An overall parsing is present at src/openpower/sv/sv_binutils.py.

will take a look
Comment 53 Luke Kenneth Casson Leighton 2021-12-19 22:45:14 GMT
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=393db2bedb8bc307a40bbd54f5eaefede35dfa2f

y'all know i can't read regex's, right? :) they melt my brain
which is why i used the longer more verbose explicit style.
i do love the one-line-per-thing though, but would ask that
you put some sort of english-language explanation of the
opcode one.

(edit: an example would help.  this input XYZ is turned into ABC)
Comment 54 Dmitry Selyutin 2021-12-25 18:33:29 GMT
Extended gas (and likely other binutils, since it's a common code) with -mlibresoc switch, which, amongst other stuff, sets PPC_OPCODE_SVP64 flag. I'm not sure what'd be the right latest POWER version, so I took "-mfuture" and added one more flag there.
Also pushed some rather minor changes to sv_binutils script so that it groups fields into a dataclass. I'll add a couple of methods there so that it's capable of generating the C code. For now, the main question is, what to do with codes like `-----01101`? If we consider `struct powerpc_opcode` as a "quite-close-but-not-exactly-the-same" thing, do these hyphens correspond to `mask` and `flags` members?
Comment 55 Luke Kenneth Casson Leighton 2021-12-25 23:49:36 GMT
(In reply to dmitry.selyutin from comment #54)
> Extended gas (and likely other binutils, since it's a common code) with
> -mlibresoc switch, which, amongst other stuff, sets PPC_OPCODE_SVP64 flag.
> I'm not sure what'd be the right latest POWER version, so I took "-mfuture"
> and added one more flag there.

binutils-dev, best place to ask.

> Also pushed some rather minor changes to sv_binutils script so that it
> groups fields into a dataclass. I'll add a couple of methods there so that
> it's capable of generating the C code.

i did the math in an earlier comment, a uint64_t is sufficient to hold all
the bits.

> For now, the main question is, what
> to do with codes like `-----01101`? If we consider `struct powerpc_opcode`
> as a "quite-close-but-not-exactly-the-same" thing, do these hyphens
> correspond to `mask` and `flags` members?

"-" means simply "don't care, ignore".  masked out, yes.
Comment 56 Luke Kenneth Casson Leighton 2021-12-26 14:33:04 GMT
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/sv_binutils.py;hb=HEAD

erm... ermerm... that looks like its input is the
VHDL rather than the csv files, which is a bit "um"
shall we say.  it makes the VHDL a necessary prerequisite
and that should definitely be documented in *both*
source files.

otherwise someone changes the VHDL and breaks binutils,
or, as in my case, wonders what the heck is going on.
took me 10 days to realise!
Comment 57 Dmitry Selyutin 2021-12-26 15:05:31 GMT
Yeah, I really want to avoid duplicating this whole parsing done in sv_analysis, especially in the form of copy & paste. This is barely maintainable.
That said, it's really simple to make sv_analysis output one big CSV instead of VHDL, so perhaps we can do it and feed CSV to sv_binutils. This will eliminate regex, but I think we can tolerate this loss.
Comment 58 Dmitry Selyutin 2021-12-26 18:08:51 GMT
(In reply to Luke Kenneth Casson Leighton from comment #55)
> binutils-dev, best place to ask.

I thought that's a question to us, perhaps I've asked it badly... I mean, do we consider POWER10 instructions or only consider POWER9? From openpower-isa it looks like it's POWER9; I've chosen POWER10 as present in "-mfuture", but I can change it easily. Below are excerpts on how these -mpower10/-mfuture/-mlibresoc look like in code (basically we're "-mfuture" with PPC_OPCODE_SVP64 flag).

  { "power10", (PPC_OPCODE_PPC | PPC_OPCODE_ISEL | PPC_OPCODE_64
		| PPC_OPCODE_POWER4 | PPC_OPCODE_POWER5 | PPC_OPCODE_POWER6
		| PPC_OPCODE_POWER7 | PPC_OPCODE_POWER8 | PPC_OPCODE_POWER9
		| PPC_OPCODE_POWER10 | PPC_OPCODE_ALTIVEC | PPC_OPCODE_VSX),
    0 },
  { "libresoc",  (PPC_OPCODE_PPC | PPC_OPCODE_ISEL | PPC_OPCODE_64
		| PPC_OPCODE_POWER4 | PPC_OPCODE_POWER5 | PPC_OPCODE_POWER6
		| PPC_OPCODE_POWER7 | PPC_OPCODE_POWER8 | PPC_OPCODE_POWER9
		| PPC_OPCODE_POWER10 | PPC_OPCODE_ALTIVEC | PPC_OPCODE_VSX
		| PPC_OPCODE_SVP64),
    0 },
  { "future",  (PPC_OPCODE_PPC | PPC_OPCODE_ISEL | PPC_OPCODE_64
		| PPC_OPCODE_POWER4 | PPC_OPCODE_POWER5 | PPC_OPCODE_POWER6
		| PPC_OPCODE_POWER7 | PPC_OPCODE_POWER8 | PPC_OPCODE_POWER9
		| PPC_OPCODE_POWER10 | PPC_OPCODE_ALTIVEC | PPC_OPCODE_VSX),
    0 },
Comment 59 Dmitry Selyutin 2021-12-26 18:11:23 GMT
Is src/openpower/decoder/power_svp64.py also OK and not outdated? Seems to be really closer to what I'd like to have for an input, compared to sv_analysis.py. As I see src/openpower/decoder/power_decoder.py uses it, e.g. create_pdecode function.
Comment 60 Luke Kenneth Casson Leighton 2021-12-26 18:55:28 GMT
(In reply to dmitry.selyutin from comment #58)
> (In reply to Luke Kenneth Casson Leighton from comment #55)
> > binutils-dev, best place to ask.
> 
> I thought that's a question to us, perhaps I've asked it badly... I mean, do
> we consider POWER10 instructions or only consider POWER9?

mmm right, got it.  at the moment it's only POWER9.  we *may* end up
cherry-picking certain POWER ISA v3.1 instructions particularly some
of the logical ones (centrifuge for example) but that's another story.
Comment 61 Dmitry Selyutin 2022-01-05 16:42:56 GMT
Hi folks, hope you had good holidays! Here's the state of art:
- Refactored sv_binutils so that the code generation is handled by separate entities, e.g. enums simply inherit from power_enums and add some C code generation capabilities atop.
- From now on, we use exactly SVP64RM class directly, as sv_analysis and decoders do; the stuff we get is then converted into the main structure (struct svp64_entry).
- You now should be able to generate the header via $(SILENCELOG=true python3 src/openpower/sv/sv_binutils.py ppc-opc-svp64.h). SILENCELOG is a MUST, unless our util.log is converted into something which doesn't pollute stdout.
Comment 62 Luke Kenneth Casson Leighton 2022-01-05 17:21:41 GMT
(In reply to dmitry.selyutin from comment #61)
> Hi folks, hope you had good holidays! Here's the state of art:
> - Refactored sv_binutils so that the code generation is handled by separate
> entities, e.g. enums simply inherit from power_enums and add some C code
> generation capabilities atop.

sensible

> - From now on, we use exactly SVP64RM class directly, as sv_analysis and
> decoders do; the stuff we get is then converted into the main structure
> (struct svp64_entry).

exccellent.

> - You now should be able to generate the header via $(SILENCELOG=true
> python3 src/openpower/sv/sv_binutils.py ppc-opc-svp64.h). SILENCELOG is a
> MUST, unless our util.log is converted into something which doesn't pollute
> stdout.

can i suggest using getopt or just say "nuts to that", take one argument
(sys.argv[1]) which is the directory into which the ppc-opc-svp64.[ch]
files are placed?

we are not going for sophistication here :)
Comment 63 Luke Kenneth Casson Leighton 2022-01-07 20:01:08 GMT
dmitry it occurred to me yesterday that if autogenerating a c file
this is way more than i would expect.

i was expecting a header file only which contained enough info
such that a non-auto-generated c file could do the correct
actions such as merging the svp64 table entries into ppc opts

what were you considering placing in an autogenerated c file?
Comment 64 Dmitry Selyutin 2022-01-07 22:16:31 GMT
(In reply to Luke Kenneth Casson Leighton from comment #63)
> what were you considering placing in an autogenerated c file?

Mostly the array of entries (cf. struct svp64_entry per ppc-svp64-opc.h). Sure it could go to header as well; I'd prefer this to be a standalone entity, though.
Comment 65 Luke Kenneth Casson Leighton 2022-01-08 01:50:56 GMT
(In reply to dmitry.selyutin from comment #64)
> (In reply to Luke Kenneth Casson Leighton from comment #63)
> > what were you considering placing in an autogenerated c file?
> 
> Mostly the array of entries (cf. struct svp64_entry per ppc-svp64-opc.h).\\

see comment #30, i was expecting it to be a single uint64_t per
instruction - no struct at all - just a series of #defined macros
which extract the relevant bits from the required locations in the
uint64_t.

this is exactly the same technique used elsewhere in binutils.

therefore there should be no struct entries per se - just a list
of #defined entries, followed by ORing them together, and a matching
set of macros to extract the required information.
Comment 66 Dmitry Selyutin 2022-01-08 06:32:50 GMT
(In reply to Luke Kenneth Casson Leighton from comment #65)
> therefore there should be no struct entries per se - just a list
> of #defined entries, followed by ORing them together, and a matching
> set of macros to extract the required information.

Please explain then, how do you expect to bind these on a per-instruction basis. Let's start with some instruction from some CSV. I compressed the bits into a single uint64 (bitfields) inside struct svp64_entry.
Comment 67 Luke Kenneth Casson Leighton 2022-01-08 13:21:09 GMT
(In reply to dmitry.selyutin from comment #66)
> (In reply to Luke Kenneth Casson Leighton from comment #65)
> > therefore there should be no struct entries per se - just a list
> > of #defined entries, followed by ORing them together, and a matching
> > set of macros to extract the required information.
> 
> Please explain then, how do you expect to bind these on a per-instruction
> basis. Let's start with some instruction from some CSV. I compressed the
> bits into a single uint64 (bitfields) inside struct svp64_entry.

brilliant, that's great.  btw do disregard what i said about one file,
i realised late last night that two are needed: one for defines, one for
the list. took me a while, sorry, too much thinking in python :)

there should not be a list of 10,000 entries like in binutils,
that would be insane and also tie the codegenerator to the state
of binutils.

my thoughts here were to have an empty (0x0) value in the binutils
table (struct ppc_something) QTY 10,000 and, deep breath, when
creating the hashtable in binutils to at that
time *also* make a lookup for the corresponding
svp64 entry and fill in the empty field.

basically a manual implementation of a SQL database "LEFT JOIN"
(left, because some entries in the binutils hashtable will still
be 0x0 - NULL. a right join you would REMOVE the missing binutils
entry which is not what we want)

if both the 10000 binutils entries and the 200-or-so svp64 
entries are in strict alphabetical order then the LEFT JOIN
algorithm is O(N) because a lexicographic comparison can
simply increment each list index by one.
Comment 68 Dmitry Selyutin 2022-01-08 14:42:37 GMT
(In reply to Luke Kenneth Casson Leighton from comment #67)
> brilliant, that's great.  btw do disregard what i said about one file,
> i realised late last night that two are needed: one for defines, one for
> the list. took me a while, sorry, too much thinking in python :)

OK, so we both agree that there must be at least _some_ array, even if this array contains only uint64_t (I doubt, though: there must be at least something which binds svp64 entry and ppc opcode, be it instruction name or opcode itself).

> there should not be a list of 10,000 entries like in binutils,
> that would be insane and also tie the codegenerator to the state
> of binutils.

283 entries totally. :-)

> my thoughts here were to have an empty (0x0) value in the binutils
> table (struct ppc_something) QTY 10,000 and, deep breath, when
> creating the hashtable in binutils to at that
> time *also* make a lookup for the corresponding
> svp64 entry and fill in the empty field.

I thought of standalone hash but I think that's mostly implementation detail.

> the 10000 binutils entries and the 200-or-so svp64 
> entries are in strict alphabetical order then the LEFT JOIN
> algorithm is O(N) because a lexicographic comparison can
> simply increment each list index by one.

PPC entries are alphabetically ordered, inevitably, so will be ours due to the same rationale. :-) There's no comparison or sorting yet in Python, but yes, this is needed. Again, not sure if we need to touch existing structures by adding the pointer, I'd rather prefer a separate stuff to keep the code as decoupled as possible.
Comment 69 Luke Kenneth Casson Leighton 2022-01-08 15:34:58 GMT
(In reply to dmitry.selyutin from comment #68)

> OK, so we both agree that there must be at least _some_ array, even if this
> array contains only uint64_t (I doubt, though: there must be at least
> something which binds svp64 entry and ppc opcode, be it instruction name or
> opcode itself).

errr oh yes, you're right: instruction name (as an ascii string) followed
by uint64_t.  doh.

> > there should not be a list of 10,000 entries like in binutils,
> > that would be insane and also tie the codegenerator to the state
> > of binutils.
> 
> 283 entries totally. :-)

fuuun
 
> > my thoughts here were to have an empty (0x0) value in the binutils
> > table (struct ppc_something) QTY 10,000 and, deep breath, when
> > creating the hashtable in binutils to at that
> > time *also* make a lookup for the corresponding
> > svp64 entry and fill in the empty field.
> 
> I thought of standalone hash but I think that's mostly implementation detail.

true.  my thoughts there were that the lexicographical O(N) trick during
actual creating of the ppc_something binutils hashtree would avoid the
need for a standalone hash.

and, also, avoid the CPU cycles of double hashtree lookups, which has
to be taken into consideration due to the sheer overwhelming size of
executables these days.

> > the 10000 binutils entries and the 200-or-so svp64 
> > entries are in strict alphabetical order then the LEFT JOIN
> > algorithm is O(N) because a lexicographic comparison can
> > simply increment each list index by one.
> 
> PPC entries are alphabetically ordered, inevitably, so will be ours due to
> the same rationale. :-) 

ah superb.  so yes, a LEFT-JOIN style trick relying on lexicographic order
would work.  the only thing being, names need to match without the "o"
"." and "o." on them (or, re-create those names).

> There's no comparison or sorting yet in Python, but
> yes, this is needed. Again, not sure if we need to touch existing structures
> by adding the pointer, I'd rather prefer a separate stuff to keep the code
> as decoupled as possible.

high performance in binutils is essential, and having double hashtable lookups
would likely raise legitimate complaints.

plus, by doing a LEFT-JOIN-like merge, that is only done once [and is
pretty optimal as long as the two lists are pre-sorted], whereas double
hashtable lookups are on every single instruction.
Comment 70 Dmitry Selyutin 2022-01-09 15:14:14 GMT
(In reply to Luke Kenneth Casson Leighton from comment #69)
> (In reply to dmitry.selyutin from comment #68)
> 
> > OK, so we both agree that there must be at least _some_ array, even if this
> > array contains only uint64_t (I doubt, though: there must be at least
> > something which binds svp64 entry and ppc opcode, be it instruction name or
> > opcode itself).
> 
> errr oh yes, you're right: instruction name (as an ascii string) followed
> by uint64_t.  doh.

It also looks like that at least a pair of two uint32_t is needed to represent opcode plus mask (cf. opcode in extra.csv).

> and, also, avoid the CPU cycles of double hashtree lookups, which has
> to be taken into consideration due to the sheer overwhelming size of
> executables these days.
> ...
> plus, by doing a LEFT-JOIN-like merge, that is only done once [and is
> pretty optimal as long as the two lists are pre-sorted], whereas double
> hashtable lookups are on every single instruction.

Ah, OK, I got the rationale. Well, for this particular case, we could've even searched entries in a sorted array, since count of entries is quite limited (cache-friendly 283 entries; makes even more sense to keep these compact). However, since we have to lookup PPC entry anyway, regardless of whether we deal with SVP64 or not, I think saving one lookup is totally reasonable. I'll need to consult you when I start doing this, since I've never used SQL, but I think I get the overall idea. Thanks for the tip!
Comment 71 Dmitry Selyutin 2022-01-09 15:19:14 GMT
A question on matching the names: we have bunch of instructions named like "4/6=mtfsfi", "26/6=fmrgow", etc. In good ol' PPC, these have no prefixes, I assume SVP64 added them? Should I simply discard everything before '=' sign to keep the name as it appears in binutils?
Comment 72 Jacob Lifshay 2022-01-09 15:37:05 GMT
(In reply to dmitry.selyutin from comment #71)
> A question on matching the names: we have bunch of instructions named like
> "4/6=mtfsfi", "26/6=fmrgow", etc. In good ol' PPC, these have no prefixes, I
> assume SVP64 added them? Should I simply discard everything before '=' sign
> to keep the name as it appears in binutils?

we need to eventually get around to cleaning up our .csv files...those entries should be moved to a field vhdl_comment or something (they were originally comments in microwatt's decoder's source) ...allowing the comment field (really mnemonic) to be just mtfsfi.

dmitry, taking the stuff after an = should be good enough for now.
Comment 73 Dmitry Selyutin 2022-01-09 15:55:04 GMT
(In reply to Jacob Lifshay from comment #72)
> dmitry, taking the stuff after an = should be good enough for now.

I actually kinda worry that we rely on the "name" field being an implicit part of "comment" field. That's what sv_analysis does, sure, but I don't find it to be a good idea, frankly speaking.

> allowing the comment field (really mnemonic) to be just mtfsfi.

Raised an issue on this; cf. https://bugs.libre-soc.org/show_bug.cgi?id=761.
Comment 74 Dmitry Selyutin 2022-01-09 16:40:35 GMT
Wow, inheriting power_enums.py stuff was really a nice idea. With the recent updates from Jacob, I see stuff like `SVP64_IN2SEL_CONST_XBI` for free!
Comment 75 Luke Kenneth Casson Leighton 2022-01-09 17:19:43 GMT
(In reply to dmitry.selyutin from comment #71)
> A question on matching the names: we have bunch of instructions named like
> "4/6=mtfsfi", "26/6=fmrgow", etc. 

not quite: the developers of microwatt placed the instruction name
into decode1.vhdl and decided in the case of FP instructions to
(consistently) subdivide the XO field into a pair of HI-LO numbers.


> In good ol' PPC, these have no prefixes, I
> assume SVP64 added them?

i don't follow the question, but the above may explain and make it moot.

> Should I simply discard everything before '=' sign
> to keep the name as it appears in binutils?

yes.  this is exactly what is already done in power_decoder.py (and in
the CSV-reading classes that you have already been using).

(In reply to dmitry.selyutin from comment #70)

> > errr oh yes, you're right: instruction name (as an ascii string) followed
> > by uint64_t.  doh.
> 
> It also looks like that at least a pair of two uint32_t is needed to
> represent opcode plus mask (cf. opcode in extra.csv).

ah no, definitely not.  the name of the instruction is the "unique key".
you *do not* need the opcode.  the *name* is unique and can be used as
the match.

(only thing is, you need to match "add." against "add")

> Ah, OK, I got the rationale. Well, for this particular case, we could've
> even searched entries in a sorted array, since count of entries is quite
> limited (cache-friendly 283 entries; makes even more sense to keep these
> compact). However, since we have to lookup PPC entry anyway, regardless of
> whether we deal with SVP64 or not, I think saving one lookup is totally
> reasonable. I'll need to consult you when I start doing this, since I've
> never used SQL, but I think I get the overall idea. Thanks for the tip!

something like this:

ppc_insns = [{name="addi", blahblah, svp64stuff=0x0},
             {name="addeo", blahblah, svp64stuff=0x0},
             {name="fredinsn", ....}
             ... ]
svp64_mods = [{name="addi", svp64stuff=0x991212121},
             ]

lex_idx_svp64 = 0
for i in range(len(ppc_insns)):
     ppc = ppc_insns[i]
     found_match = False
     while True:
         svp64 = svp64_mods[lex_idx_svp64]
         lexicographical_compare = cmp(ppc->name, svp64->name);
         if lexicographical_compare == 0:
             found_match = True
             break
         if match > 0: // no match, name is greater
             break
         if match < 0: // must "catch up" svp64 array
             lex_idx_svp64 += 1
             if lex_idx_svp64 == len(svp64_mods): // overflow!
                  break
          if found {
               // fill in the svp64 field in ppc_insns
               ppc->svp64stuff = svp64->svp64stuff

that's it.  that's a SQL "LEFT JOIN", and it works by assuming
that the primary key in each "table" is pre-sorted.

you can do it even inside the function that creates the hash lookup
as it is walking the 10,000-long ppc_insns 

you *do not* need the opcode/mask to do this.
to repeat: you *do not* need the opcode/mask.
the match is by *name*.

in python (SVP64 CSV-reading class) i do this merge using a dictionary, so
it is like... 2 lines of code.

the only tricky bit is that the "cmp" function must strip off any
"o", "." or "o." at the end before doing the compare. Rc=1 and OE=1
are the exact same instruction, so they have to match.

match:  ppc->name = "addo" svp64->name = "add" MATCHES
match:  ppc->name = "add." svp64->name = "add" MATCHES
match:  ppc->name = "addeo" svp64->name = "addi" FAILS (obviously,
         because it's a different instruction)

(In reply to dmitry.selyutin from comment #74)
> Wow, inheriting power_enums.py stuff was really a nice idea. With the recent
> updates from Jacob, I see stuff like `SVP64_IN2SEL_CONST_XBI` for free!

gooood.  that's a huge relief, because maintaining stuff by hand that
gets out of sync ==> bad idea.  nice decision.
Comment 77 Dmitry Selyutin 2022-01-16 18:57:48 GMT
Added the array of entries itself, also pushed minor cleanup and fixup. Hopefully we can now proceed to merging entries as Luke described. One note, though: I was wrong, PPC tables are _not_ sorted by names, but instead are sorted by opcodes:

include/opcode/ppc.h
/* The table itself is sorted by major opcode number, and is otherwise
   in the order in which the disassembler should consider
   instructions.  */

Given that we have not that many entries, I think we might simply iterate over all SVP64 entries and lookup all PPC entries which match upon start (e.g. when we meet "add" SVP64 entry, we'll try addo, add., addo. as well. Luke, does it look OK to you? I'd be really glad if you see another trick like you suggested initially with LEFT-JOIN. :-)
Comment 78 Dmitry Selyutin 2022-01-16 19:00:34 GMT
You might use SILENCELOG=true python3 src/openpower/sv/sv_binutils.py ppc-opc-svp64.c to see the output. I'll think about "simply let 'em feed the goddamn directory path" approach as well, for now it's simpler to operate with explicit files. For now the output contains opcodes as well; I know we intend to drop these, but perhaps they might be good for some checks we do cross PPC opcodes.
Comment 79 Luke Kenneth Casson Leighton 2022-01-16 19:26:50 GMT
(In reply to dmitry.selyutin from comment #77)
> Added the array of entries itself, also pushed minor cleanup and fixup.
> Hopefully we can now proceed to merging entries as Luke described. One note,
> though: I was wrong, PPC tables are _not_ sorted by names, but instead are
> sorted by opcodes:

mmm that's a bit of a pig, then.

> include/opcode/ppc.h
> /* The table itself is sorted by major opcode number, and is otherwise
>    in the order in which the disassembler should consider
>    instructions.  */
> 
> Given that we have not that many entries, I think we might simply iterate
> over all SVP64 entries and lookup all PPC entries which match upon start

yeah, either that, or use the ppc64 lookup function *after* the hashtable
is created, and just go over the svp64 list in linear order.

> (e.g. when we meet "add" SVP64 entry, we'll try addo, add., addo. as well.
> Luke, does it look OK to you?

yyep, should work great.

(In reply to dmitry.selyutin from comment #78)
> You might use SILENCELOG=true python3 src/openpower/sv/sv_binutils.py
> ppc-opc-svp64.c to see the output. I'll think about "simply let 'em feed the
> goddamn directory path" approach as well, for now it's simpler to operate
> with explicit files.

> For now the output contains opcodes as well; I know we
> intend to drop these, but perhaps they might be good for some checks we do
> cross PPC opcodes.

extra effort. all mnemonics should be unique.  hey if it's in it's effort
to take it out.

now, about this:

struct svp64_entry {
    const char *name;
    struct svp64_opcode opcode;
    uint64_t in1 : 3;
    uint64_t in2 : 5;
    uint64_t in3 : 3;
    uint64_t out : 3;
    uint64_t out2 : 3;
    uint64_t cr_in : 4;
    uint64_t cr_out : 3;
    uint64_t sv_ptype : 2;
    uint64_t sv_etype : 2;
    uint64_t sv_in1 : 3;
    uint64_t sv_in2 : 3;
    uint64_t sv_in3 : 3;
    uint64_t sv_out : 3;
    uint64_t sv_out2 : 3;
    uint64_t sv_cr_in : 3;
    uint64_t sv_cr_out : 3;
    uint64_t : 15;
};

that's not going to work.  that's a massive... 1k entry where a 64-bit one will do.

i was expecting this:

struct svp64_entry {
    const char *name;
    uint64_t svp64_info;
}

followed by some (perhaps auto-generated) macros:

#define SET_SV_IN1(val) ((val&0x3)<<5)
#define GET_SV_IN1(val) ((val&0x3)>>5)

where the settings in the c file comprise:

    svp64_info = SET_SV_IN1(someconst) | SET_SV_IN2(anotherconst) ....


something like that.

the code inside binutils would then use the GET_SV_XXXX macros to
obtain the required information.

using bit-fields in structs is non-portable and it won't make it
past review by the binutils developers.

also, the fact that there would be ten THOUSAND such 1k entries,
(because of 25 64-bit ints) that's far too much memory wasted.

whereas - as i said in several of the comments - a single uint64_t
is perfectly sufficient. adding up the total number of bits above
comes to 60 bits, which is less than 64.
Comment 80 Dmitry Selyutin 2022-01-16 20:25:13 GMT
What exactly is non-portable with bitfields here, having GCC-compatible compilers in mind? You don't memcpy the structures, you don't send these over the network; in fact, the machine where the code was compiled is exactly the same which runs it (I mean, even if you cross-compile for ARM64 on x86, you will run it on ARM64, not on some SPARC which got the PPC entry over the network). Sure, 64-bit integer bitfields are not ANSI C blessed, but the code is already GCC-compliant (__attribute__ et al.). The only argument which stands is least surprise principle (i.e. that other bits use the macros). But "x->y" is much more readable than "X_GET_Y(x)", and same stands for setters. Anyway, that's not difficult to change.

As for a lot of 64-bit integers: I don't get where you find 25 uint64_t fields. There's no need to assign something other than "const struct svp64_entry *" in PPC opcode structures. The only way these are used are mapping these into the hash table. Even if there is a need to stick to a single 64-bit integer, you _can_ still pack bitfields of 64-bit integers into single 64-bit macros, like you suggest above. My point, however, is readability: unless there are some portability concerns, there's no need to make it less readable, be it machine-generated or not.

So I'll think about macros (though I personally find their use for such purpose quite awful outside of e.g. network programming), but mostly because of least surprise principle. I'm not convinced of other reasons. As for having a copy of uint64_t in PPC, I think there must be a pointer to SVP64 entry.

On 64-bit machine, "size of(struct svp64_entry)" would be 24 bytes. If we drop the "struct svp64_opcode" part, that'd be 16 bytes (we have 300 or so entries since recent). For each PPC entry, if we keep a pointer or 64-bit int, it'd add 8 bytes per entry. I don't see a problem here.
Comment 81 Luke Kenneth Casson Leighton 2022-01-16 20:45:21 GMT
(In reply to dmitry.selyutin from comment #80)

> into single 64-bit macros, like you suggest above. My point, however, is
> readability: unless there are some portability concerns, there's no need to
> make it less readable, be it machine-generated or not.

again to repeat: space and the amount of time taken to complete is a priority.
binutils is extremely low-level and it will be expected to have
an above-average level of optimisation that cannot unfortunately be ignored.

> So I'll think about macros (though I personally find their use for such
> purpose quite awful outside of e.g. network programming), but mostly because
> of least surprise principle. I'm not convinced of other reasons. As for
> having a copy of uint64_t in PPC, I think there must be a pointer to SVP64
> entry.

no: to repeat again: there is no need. if it was more than 64 bits of
information, then yes a pointer would be needed.  i have said three
times that there will never be more than the 64 bits of information.

> On 64-bit machine, "size of(struct svp64_entry)" would be 24 bytes. If we
> drop the "struct svp64_opcode" part, that'd be 16 bytes (we have 300 or so
> entries since recent). For each PPC entry, if we keep a pointer or 64-bit
> int, it'd add 8 bytes per entry. I don't see a problem here.

again: this means extra processing power required, first to access the
pointer and second to access the contents of the pointer.

whereas providing the 64-bit value directly as the sole exclusive value
that is never going to be extended beyond those 64 bits is much more
efficient.

binutils is unfortunately a suite of programs where optimisation matters
due to the sheer overwhelming size of the task it carries out.

when sub-optimal code is presented to the binutils team for inclusion
they will immediately pick up on it and request it to be made acceptable.
therefore we might as well make the effort now.
Comment 82 Jacob Lifshay 2022-01-16 20:56:23 GMT
(In reply to Luke Kenneth Casson Leighton from comment #81)
> no: to repeat again: there is no need. if it was more than 64 bits of
> information, then yes a pointer would be needed.  i have said three
> times that there will never be more than the 64 bits of information.

imho, if that struct is not public API with backward compat constraints, then just use a struct with bitfields but without the extra pointer. bitfields produce the exact same machine code as lkcl's macros but are waay more clear imho. if we run out of space in the future it's pretty easy to refactor it to use indirection if needed. because the bitfields are always only accessed by reading/writing through that struct, it doesn't matter if the fields are laid out non-portably...the compiler generates the code that adapts between the non-portable fields and the rest the C code. non-portable field layout only matters when there's a way to access the fields either without using that struct or from a different cpu architecture.

e.g.:
typedef struct {
    // no pointers...
    uint64_t field1 : 3;
    uint64_t field2 : 3;
    uint64_t field3 : 3;
    uint64_t field4 : 3;
} my_struct;
Comment 83 Luke Kenneth Casson Leighton 2022-01-22 22:27:48 GMT
dmitry, saw the message to binutils (thank you for ccing me)

my thoughts after a week are, the struct is ok - but will need
separating out into a separate struct, so that it can be
copied into the ppc_binutils field which is #ifdef'd out

struct svp64_stuff // no more than 64 bits
{
    bitfield1: 1
    bitfield2: 1
    ...
};

struct svp64_autogenerated_list
{
     char *name;
     struct svp64_stuff stuff;
};

struct svp64_autogenerated_list list[200]=
   {{name="add", ....}
    ...
   }

and in binutils:

struct ppc_binutils_fields
     ....
     .....
#ifdef SVP64
     struct svp64_stuff svp64;
#endif
};

note: *not* a *pointer* to struct svp64_stuff.  repeat: *not* a pointer
because the total number of bits going into any svp64_stuff *will* be
*below* 64-bit and will *never* go above 64-bit, ever.

(it may be useful therefore to dedicate one bit to indicate that the
entry is valid)

struct svp64_stuff // no more than 64 bits
{
    valid: 1;
    bitfield1: 1
    bitfield2: 1
    ...
};
Comment 84 Dmitry Selyutin 2022-01-23 11:30:31 GMT
(In reply to Luke Kenneth Casson Leighton from comment #83)
> dmitry, saw the message to binutils (thank you for ccing me)

I'm kinda sad nobody replied yet; yet I don't won't to appear that annoying, so I'll ping them in some days.

> my thoughts after a week are, the struct is ok - but will need
> separating out into a separate struct

Fair enough. I'll do it soon.

> note: *not* a *pointer* to struct svp64_stuff.  repeat: *not* a pointer
> because the total number of bits going into any svp64_stuff *will* be
> *below* 64-bit and will *never* go above 64-bit, ever.

OK, OK, I surrender. :-)
Comment 85 Dmitry Selyutin 2022-01-23 11:40:18 GMT
(In reply to Luke Kenneth Casson Leighton from comment #76)
> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/
> decoder/power_decoder.py;h=8fabeabcb63bfc4d0b8f53bf614c4e4d8f777c18;
> hb=55bf9f6152d7348135aec60d70ccde7692c81e0b#l259
> 
> there you go.  comment field is processed by removing anything "="

Another question, we have entries like mfcr/mfocrf and mtcrf/mtocrf. Are these synonyms, and I should output two or more entries there which contain the same data? I didn't find the particular place where we split by "/".
Comment 86 Dmitry Selyutin 2022-01-23 11:43:16 GMT
Seems likely these should output different fields. Here's what we have in binutils:

{"mfcr",	XFXM(31,19,0,0), XFXFXM_MASK, COM,	0,		{RT, FXM4}},
{"mfocrf",	XFXM(31,19,0,1), XFXFXM_MASK, COM,	0,		{RT, FXM}},

{"mtcr",	XFXM(31,144,0xff,0), XRARB_MASK, COM,	EXT,		{RS}},
{"mtcrf",	XFXM(31,144,0,0), XFXFXM_MASK, COM,	0,		{FXM, RS}},
{"mtocrf",	XFXM(31,144,0,1), XFXFXM_MASK, COM,	0,		{FXM, RS}},
Comment 87 Luke Kenneth Casson Leighton 2022-01-23 12:32:36 GMT
(In reply to Dmitry Selyutin from comment #84)
> (In reply to Luke Kenneth Casson Leighton from comment #83)
> > dmitry, saw the message to binutils (thank you for ccing me)
> 
> I'm kinda sad nobody replied yet; yet I don't won't to appear that annoying,
> so I'll ping them in some days.

it happens. if no reply, then, well, we did ask.

> > my thoughts after a week are, the struct is ok - but will need
> > separating out into a separate struct
> 
> Fair enough. I'll do it soon.

star.
 
> > note: *not* a *pointer* to struct svp64_stuff.  repeat: *not* a pointer
> > because the total number of bits going into any svp64_stuff *will* be
> > *below* 64-bit and will *never* go above 64-bit, ever.
> 
> OK, OK, I surrender. :-)

:)

(In reply to Dmitry Selyutin from comment #86)
> Seems likely these should output different fields. Here's what we have in
> binutils:
> 
> {"mfcr",	XFXM(31,19,0,0), XFXFXM_MASK, COM,	0,		{RT, FXM4}},
> {"mfocrf",	XFXM(31,19,0,1), XFXFXM_MASK, COM,	0,		{RT, FXM}},
> 
> {"mtcr",	XFXM(31,144,0xff,0), XRARB_MASK, COM,	EXT,		{RS}},
> {"mtcrf",	XFXM(31,144,0,0), XFXFXM_MASK, COM,	0,		{FXM, RS}},
> {"mtocrf",	XFXM(31,144,0,1), XFXFXM_MASK, COM,	0,		{FXM, RS}},

the two pairs are handled by the same HDL but yes you will need
to drop the same svp64 struct entry into both of them.

easiest way to do that is probably just duplicate the svp64 struct entry
in sv_binutils.* 

similar thing for mtmsr/mtmsrd btw.  the exact same HDL deals with
both, then does further bit-decoding in the actual HDL.
Comment 88 Dmitry Selyutin 2022-01-23 18:17:11 GMT
I'm also curious about these entries in minor_31.csv:

0b1101111010,SHIFT_ROT,OP_EXTSWSLI,NONE,CONST_SH,RS,RA,NONE,CR0,0,0,ZERO,0,NONE,0,0,0,0,0,0,RC,0,0,extswsli,XS,
0b1101111011,SHIFT_ROT,OP_EXTSWSLI,NONE,CONST_SH,RS,RA,NONE,CR0,0,0,ZERO,0,NONE,0,0,0,0,0,0,RC,0,0,extswsli,XS,

Almost the same except for opcode. I assume the latter is `extswsli.`?

UPD: same for isel.
Comment 89 Luke Kenneth Casson Leighton 2022-01-23 18:49:57 GMT
(In reply to Dmitry Selyutin from comment #88)
> I'm also curious about these entries in minor_31.csv:
> 
> 0b1101111010,SHIFT_ROT,OP_EXTSWSLI,NONE,CONST_SH,RS,RA,NONE,CR0,0,0,ZERO,0,
> NONE,0,0,0,0,0,0,RC,0,0,extswsli,XS,
> 0b1101111011,SHIFT_ROT,OP_EXTSWSLI,NONE,CONST_SH,RS,RA,NONE,CR0,0,0,ZERO,0,
> NONE,0,0,0,0,0,0,RC,0,0,extswsli,XS,

https://github.com/antonblanchard/microwatt/blob/6ff3b2499ceaf911252f634d44469fefb58952ae/decode1.vhdl#L255

let's have a look at the spec (v3.0C p108 book I 3.3.14.2.1

       445 sh Rc
    21     30 31

MSB0 bit 30 is for the "sh" field.  a "convenient" way to ensure that
only 9 bits of what is normally considered to be the XO field both go
to EXTSWSLI would be... to have two entries.

> Almost the same except for opcode. I assume the latter is `extswsli.`?
> 
> UPD: same for isel.

likewise for isel except there are 32 identical entries.

pick one. doesn't matter which because they're identical.  the relationship
is by instruction name not any other fields.

the link from svp64_struct to binutils is by name.

all and any other relationships are for other purposes not in any way
related to the job needed to be done here.
Comment 90 Dmitry Selyutin 2022-01-23 19:45:20 GMT
Multiple updates in binutils and isa parts (major stuff, not going into details):
* decoupled svp64_record (bits only);
* decoupled all SVP64-related binutils code into standalone files;
* checked that the auto-generated code can be compiled as part of binutils.

Next steps:
1. drop `struct svp64_opcode` in some of the next revisions;
2. dig into autotools to drop tricks like `#include "tc-ppc-svp64.c"` (the binutils-gdb build system is somewhat complex) and also transfer `#define SVP64`.
3. find a way to make `struct powerpc_opcode` pass `-Wmissing-fields-initializers` (see `git log` on current way to bypass it);
4. finally iterate over all generated entries so that we can find their counterparts in `ppc_hash` (i.e. integrate right after `ppc_setup_opcodes`);
5. deal with missing mnemonics (https://bugs.libre-soc.org/show_bug.cgi?id=550#c50);
6. the actual remap.

Luke, if you have some time to deal with mnemonics (https://bugs.libre-soc.org/show_bug.cgi?id=550#c50), that'd be great, since they would look alike and keep the same spirit. Otherwise I'll handle it when we're here.

Actually I think everything before item 5 is simple, hopefully we'll be there soon.
Comment 91 Dmitry Selyutin 2022-01-25 20:00:38 GMT
More updates:

sv_binutils
* dropped reliance on opcodes; from now on, we take the first insn name we met, and don't bother checking it one more time;
* introduced a per-record validity flag; for now, we only check whenever we attempt to bind an SVP64 recprd which was already bound;

binutils-gdb
* updated the generated files respectively;
* introduced svp64_setup_opcodes routine which takes care of binding svp64_record.
Comment 92 Dmitry Selyutin 2022-01-29 16:44:36 GMT
After the discussion, updated the binutils code so that we use a separate hash table, after all. :-) This allowed us to drop "valid" flag on sv_binutils.py side, since the hash used by binutils keeps track of duplicated entries on its own.
I'd like to update the generator so that it outputs "dot" entries, like with "add.", because I'd like to pass name as is into the lookup routine.
Comment 93 Dmitry Selyutin 2022-01-31 19:16:21 GMT
Updated sv_binutils with the way I _think_ we should consider `INSN.` versions. In the current revision, I check for `rc is RC.RC`; please, let me know, if this is a wrong way. As for `INSNo` versions, it seems that every `INSNo` entry is already present in CSVs, so there should not be problems.
Also, whilst I was here, I also took a liberty to refactor some code, to keep it somewhat more maintainable (yehyeh, I know, that's always a personal stuff; but I feel the code got simplified a bit).
Comment 94 Luke Kenneth Casson Leighton 2022-01-31 20:38:14 GMT
(In reply to Dmitry Selyutin from comment #93)
> Updated sv_binutils with the way I _think_ we should consider `INSN.`
> versions. In the current revision, I check for `rc is RC.RC`;

yeah looks reasonable.  there's a couple of ways it could be used:

1) to do the detection in python (similar to ISACaller) and actually
   add a duplicate entry "add." (and when you get to OE, "addo" and
   then "addo." as well)

2) pass the information over to gas through the svp64_struct and
   use just the one entry

now, i would not recommend (2) even though (1) will result in more
entries (with duplicate information in effect) because the size of
the svp64_struct could go up to beyond 64-bit if putting more and
more info in it.
Comment 95 Dmitry Selyutin 2022-02-28 22:08:37 GMT
sv_binutils updates: mostly cosmetic changes, including enums simplification. Also added enum-like class to incorporate constants, which, in turn, lead to changes in openpower.consts module. From now on, we're able to generate bunch of defines aimed to substitute SVP64MODE.MOD2_MSB et al.

binutils updates: minor refactoring and cleanup. Adding validation of fields after parsing, as well as updating the fields in course of it (e.g. srcwid/destwid manipulations).

Despite that changes look pretty simple, they took long time; what you see is the result of massive try-and-probe series, with several rebases. :-) At least I feel that I know more of Python Enum than I ever wanted to know.
Comment 96 Dmitry Selyutin 2022-03-04 18:35:07 GMT
sv_binutils updates: generalized some handlers; introduced rc_mode check; added svp64 modes handling (normal, mapreduce, failfirst, saturation, predicate-result).

I'll likely refactor some bits, considering that the code gets way too complex to keep it in one function, likely just as I did with the parsing (i.e. some array with callbacks).
Comment 97 Luke Kenneth Casson Leighton 2022-04-06 21:40:50 BST
https://libre-soc.org/irclog/%23libre-soc.2022-04-06.log.html
Comment 98 Luke Kenneth Casson Leighton 2022-05-11 11:32:48 BST
raise discussion of limit being hit on number of opcode fields
limited to 8-bit index

https://sourceware.org/pipermail/binutils/2022-May/120774.html
https://libre-soc.org/irclog/%23libre-soc.2022-05-11.log.html#t2022-05-11T10:25:11
Comment 99 Jacob Lifshay 2022-05-11 11:36:03 BST
(In reply to Luke Kenneth Casson Leighton from comment #98)
> raise discussion of limit being hit on number of opcode fields
> limited to 8-bit index
> 
> https://sourceware.org/pipermail/binutils/2022-May/120774.html

that's the wrong url, the correct url is:
https://sourceware.org/pipermail/binutils/2022-May/120775.html
Comment 101 Dmitry Selyutin 2022-05-17 12:22:09 BST
Patches on hitting opcode limits in mailing lists:
https://sourceware.org/pipermail/binutils/2022-May/120783.html