Bug 550 - binutils support needed for svp64
Summary: binutils support needed for svp64
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: Other Linux
Importance: --- enhancement
Assignee: Alexandre Oliva
URL:
Depends on:
Blocks:
 
Reported: 2020-12-18 20:33 GMT by Luke Kenneth Casson Leighton
Modified: 2021-09-20 23:27 BST

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments
Ho ho ho! (4.67 KB, patch)
2020-12-26 02:14 GMT, Alexandre Oliva

Description Luke Kenneth Casson Leighton 2020-12-18 20:33:41 GMT
gnu as and objdump support is needed for the new svp64 encoding.

first recommended approach (simplest): a new "instruction"

     svp64 0xNNNNNN (or binary)

encoding would be as described in "Prefix Fields" starting with Major Opcode EXT01

https://libre-soc.org/openpower/sv/svp_rewrite/svp64/

subsequent encoding (TBD) would be:

    svp64 SUBVL=2,ew=8,rt=v,mask=r3

and finally allow that to be a "prefix" of instructions as (TBD)

     addi[r3] r64.w.vec2.v, 5

w: elwidth=32
v: RT is vector
vec2: SUBVL=2
[r3]: mask=r3

opcodes listed here, gives prefixing info, autogenerated.
https://libre-soc.org/openpower/opcode_regs_deduped/
Comment 1 Alexandre Oliva 2020-12-21 19:08:46 GMT
I'm a little puzzled (not just because I can hardly make head from tail of the svp64 web page :-)

why bother with "svp64 0x..." syntax, if we already have .long?


as for making sense of the page.  I guess it must all make some sense if you have some vague notion of what the prefixes are supposed to accomplish, but that's not me.  I could use some examples, or pointers to earlier, more complete and self-contained docs that would give me some sense of what's supposed to be going on there.

not that I really need to be able to make sense of it before I can implement binutils changes, mind you; it just helps avoid silly mistakes, and wrong assumptions, and I figured I might be able to help validate the proposed design, if only I had the required background.  alas, I suppose I'm missing background on GPUs, ppc 3.1 opcodes, and the earlier simd design for risc-v
Comment 2 Luke Kenneth Casson Leighton 2020-12-21 20:09:51 GMT
(In reply to Alexandre Oliva from comment #1)
> I'm a little puzzled (not just because I can hardly make head from tail of
> the svp64 web page :-)
> 
> why bother with "svp64 0x..." syntax, if we already have .long?

yes jacob pointed that out... although... an "svp64 0xNNNNNN" instruction would help you to understand the "first phase": where the RM field fits.


> 
> as for making sense of the page.  I guess it must all make some sense if you
> have some vague notion of what the prefixes are supposed to accomplish, but
> that's not me.

SV - aka SimpleV - is a hardware for-loop around instructions.

that's it.  full stop.

here is some pseudocode that shows what that looks like, using ADD as an example:

https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=simple_v_extension/simple_v_chennai_2018.tex;hb=HEAD#l190


>  I could use some examples, or pointers to earlier, more
> complete and self-contained docs that would give me some sense of what's
> supposed to be going on there.

this paragraph puts the above one-liner and the pseudocode into context:

https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=simple_v_extension/specification.mdwn;hb=HEAD#l38


> not that I really need to be able to make sense of it before I can implement
> binutils changes, mind you; it just helps avoid silly mistakes, and wrong
> assumptions, and I figured I might be able to help validate the proposed
> design, if only I had the required background. 

appreciated.

> alas, I suppose I'm missing
> background on GPUs, ppc 3.1 opcodes, and the earlier simd design for risc-v

there was no SIMD ISA: SV is *categorically* and very specifically diametrically opposed to SIMD.

SIMD is considered harmful:
https://www.sigarch.org/simd-instructions-considered-harmful/

x86 expanded from 70 to *1400* instructions since 1978, thanks to SIMD (far, far more since adding AVX512).  SIMD is an O(N^6) opcode proliferation nightmare.

also we are not adding v3.1B opcodes (that is a separate discussion which requires OPF permission). the sole exclusive reason for using EXT01 is to get the "fitting in" with v3.1B 64 bit prefixing in a nondisruptive fashion that the OPF ISA WG should not have any objection to.


the sigarch article shows how RVV works.  SV is based on the exact same underlying principle: you have an instruction, you have a vector loop on that instruction, elements are computed based on that instruction.

full stop.

it's real simple.

VL in our case can be anywhere from 1 to 64.  *very rarely* it is permitted to be zero.

so how do we set this "VL" or vector length?

well, with an instruction of course.
https://libre-soc.org/openpower/sv/setvl/

and... err... then what?  well, no standard 32 bit scalar instructions do anything: they don't "understand" VL.

so we "Prefix" them.  this says, "hey you know that VL for-loop you want applied? well the next 32 bits contains the instruction to be smashed into that for-loop, oh and by the way here's some other random trash to chuck at the loop, such as predication, blah blah".

therefore, ultimately, we want this kind of syntax:

    setvl r3, r5, VL=4
    SUBVL=2, ELWIDTH=8 { add r5, r5, r2 }

the output will be:

* 32 bits containing an instruction for setvl
* 32 bits starting with EXT01 as its Major Opcode and continuing with the pattern that drops SUBVL=2 and ELWIDTH=8 somewhere into the RM field bits
* 32 bits containing an add instruction

this will get us that hardware for-loop activated 4 times (0-3) on that add instruction.

actually 8 because SUBVL=2

and, actually, it will be 8bit adds not 64bit adds because ELWIDTH=8.

does that provide you with a quick crash-course in how SV works?
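
as a rough illustration of the "hardware for-loop" described above, here is a toy Python model.  it is only a sketch under loud assumptions: elements live in a flat list rather than being packed into 64-bit registers, there is no predication, and the function name and arguments are hypothetical (they are not taken from the spec or the simulator).

```python
def sv_add(elements, rt, ra, rb, vl, subvl=1, elwidth=64):
    """Toy model of an SV-prefixed add: one scalar add per element.

    Assumption: 'elements' is a flat list with one entry per element,
    which ignores how real registers pack narrow elements.
    """
    mask = (1 << elwidth) - 1        # results wrap at the element width
    for i in range(vl * subvl):      # VL iterations, SUBVL sub-elements each
        elements[rt + i] = (elements[ra + i] + elements[rb + i]) & mask
    return elements


# VL=4 with SUBVL=2 performs 8 element-level adds, each 8 bits wide
regs = list(range(32))
sv_add(regs, rt=16, ra=0, rb=8, vl=4, subvl=2, elwidth=8)
```

with VL=4 and SUBVL=2 the loop body runs 8 times, and with elwidth=8 each result wraps at 8 bits, matching the description above.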
Comment 3 Luke Kenneth Casson Leighton 2020-12-21 21:50:27 GMT
i've added a rapid prototype "Assembly Annotation" to the appendix,
and also updated the "Prefix Fields".

unfortunately, i just realised that, actually, working out which
of the "Remapped Encodings" to apply, will need to be a per-instruction
basis, for everything but MASK_KIND, MASK, ELWIDTH, SUBVL and MODE.
these are always in the same place: everything else (EXTRAs, ELWIDTH_SRC,
MASK_SRC) critically depends on what instruction is used.

we can "get away with this" by specifying the mode-type as part of the
svp64 encoding... for now.
Comment 4 Alexandre Oliva 2020-12-26 02:14:34 GMT
Created attachment 123 [details]
Ho ho ho!

Here's a patch that introduces in GNU binutils an svp64 pseudo-instruction, that takes a single 24-bit operand, and encodes it as a 32-bit insn with EXT01 as the major opcode, and MSB0 bits 7 and 9 also set, shuffling the top two bits of the 24-bit operand, RM[0] and RM[1], into bits 6 and 8 of the insn.
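
For readers who want to check the bit layout, here is a minimal Python sketch of the encoding described above.  It is not the patch itself: the function name is hypothetical, and it assumes that the remaining 22 bits of the operand, RM[2] to RM[23], go unchanged into MSB0 bits 10 to 31 of the insn.

```python
EXT01 = 0b000001  # major opcode, occupying MSB0 bits 0..5


def svp64_prefix(rm):
    """Pack a 24-bit RM value into a 32-bit prefix word (sketch).

    MSB0 bit n of a 32-bit word is LSB0 bit (31 - n).
    """
    if not 0 <= rm < (1 << 24):
        raise ValueError("RM must fit in 24 bits")
    rm0 = (rm >> 23) & 1      # RM[0], the top bit of the operand
    rm1 = (rm >> 22) & 1      # RM[1]
    rest = rm & 0x3FFFFF      # RM[2..23], assumed to land in bits 10..31
    return ((EXT01 << 26)     # bits 0..5
            | (rm0 << 25)     # bit 6
            | (1 << 24)       # bit 7, always set
            | (rm1 << 23)     # bit 8
            | (1 << 22)       # bit 9, always set
            | rest)


print(hex(svp64_prefix(0)))
```

Under these assumptions an all-zero RM field gives the word 0x5400000.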
Comment 5 Luke Kenneth Casson Leighton 2020-12-26 02:38:08 GMT
(In reply to Alexandre Oliva from comment #4)
> Created attachment 123 [details]
> Ho ho ho!

:)
 
> Here's a patch that introduces in GNU binutils an svp64 pseudo-instruction,
> that takes a single 24-bit operand, and encodes it as a 32-bit insn with
> EXT01 as the major opcode, and MSB0 bits 7 and 9 also set, shuffling the top
> two bits of the 24-bit operand, RM[0] and RM[1], into bits 6 and 8 of the
> insn.

cool!  this is fantastic, it means that the next stages open up as well, for adding basic SV capability to ISACaller (the simulator).

alexandre, i will create a binutils git clone tomorrow, to make sure this gets tracked properly.
Comment 6 Alexandre Oliva 2020-12-26 02:51:30 GMT
thanks for the crash course.  as I said in the call, it was very useful.
it's all beginning to make sense.

> we can "get away with this" by specifying the mode-type as part of the
> svp64 encoding... for now.

I was going to ask about that.  it seems that there's nothing in the svp64 prefix instruction itself that tells how to decode its fields, you have to look at the actual insn that follows to know.

Once we get to a stage in which we'll want to specify svp64 fields separately, rather than combined into a 24-bit immediate, an explicit specification of mode may help the assembler, to some extent, but the disassembler (and the assembler, if it's to detect inconsistencies) will have to look at prefix+insn as a single thing to be able to do its job.
Comment 7 Luke Kenneth Casson Leighton 2020-12-26 08:03:59 GMT
(In reply to Alexandre Oliva from comment #6)
> thanks for the crash course.  as I said in the call, it was very useful.
> it's all beginning to make sense.

ah good.  it's kinda surprising that nobody has thought of this before.

> > we can "get away with this" by specifying the mode-type as part of the
> > svp64 encoding... for now.
> 
> I was going to ask about that.  it seems that there's nothing in the svp64
> prefix instruction itself that tells how to decode its fields, you have to
> look at the actual insn that follows to know.

correct.  bit (haha) annoying however with bits so precious it's how it goes. the alternative is that we request a Major Opcode then use the 2 extra bits, one for 1/2 Predication, the other for 2/3 EXTRA (although to be honest, 2 more bits means 4 more modes/features...)

the sv_analysis.py program is generating tables already
https://libre-soc.org/openpower/opcode_regs_deduped/

the idea is to create CSV files which give those 2 missing bits.  it is not outside the realm of possibility to autogenerate a header file for inclusion in binutils.
 
> Once we get to a stage in which we'll want to specify svp64 fields
> separately, rather than combined into a 24-bit immediate, an explicit
> specification of mode may help the assembler, to some extent, 

autogenerated.  otherwise it's too much work (200 insns) and you get transcription errors.  dunno bout you but i don't want to have to check that.

> but the
> disassembler (and the assembler, if it's to detect inconsistencies) will
> have to look at prefix+insn as a single thing to be able to do its job.

indeed.  and PowerDecoder2 as well.  this is how it goes.

i'm not happy about it because normally RISC is not supposed to have lots of gates in the decoder.

if we were doing our own ISA from scratch these two bits, saying whether 1P/2P was set and whether EXTRA2/3 was set would definitely be part of the opcode.
Comment 8 Luke Kenneth Casson Leighton 2021-09-07 00:22:55 BST
relevant links:
http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-August/003590.html

the only thing: it is *not* a good idea to hand-create the tables needed by binutils.  these should be *auto-generated*, teaching sv_analysis.py how to do that.

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/sv_analysis.py;hb=HEAD

there's nothing particularly sophisticated or clever about that program: it's written in a bland, non-OO "Get It Done" style.  it:
* reads OpenPOWER ISA v3.0B CSV files containing micro-code-style instruction format information
(exactly like the tables in binutils)
* identifies and groups v3.0B instructions by identical register file profile (number of Read regs, number of Write regs, number of CR regs read etc)
* assigns an SVP64 "Style" to each (Twin/Single-predicate, 2 or 3 EXTRA bits for reg extension)
* spits out *more* CSV files with that grouping information in it, to assist in decoding

thus rather than hand-create the SVP64 decoding information in binutils it should be trivial to autogenerate c header files and c structs.
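
to make the idea concrete, here is a small sketch of "group by register profile, then emit a c table".  everything in it is an assumption for illustration only: the CSV column names, the sample rows, the struct name and its fields are all made up, and none of it is taken from sv_analysis.py or from binutils.

```python
import csv
import io
from collections import defaultdict

# hypothetical input: one row per instruction with its register usage
SAMPLE_CSV = """insn,in1,in2,out
add,RA,RB,RT
subf,RA,RB,RT
addi,RA,NONE,RT
"""


def group_by_profile(text):
    """Group instruction names by identical (in1, in2, out) profile."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        profile = (row["in1"], row["in2"], row["out"])
        groups[profile].append(row["insn"])
    return dict(groups)


def emit_c_header(groups):
    """Emit a C table with one entry per instruction (hypothetical struct)."""
    lines = ["/* autogenerated: do not edit */",
             "struct svp64_entry { const char *name; int group; };",
             "static const struct svp64_entry svp64_table[] = {"]
    for gid, (profile, insns) in enumerate(sorted(groups.items())):
        for name in insns:
            lines.append('  { "%s", %d }, /* %s */'
                         % (name, gid, ",".join(profile)))
    lines.append("};")
    return "\n".join(lines)


print(emit_c_header(group_by_profile(SAMPLE_CSV)))
```

the point is only that the grouping step and the header-emitting step are both a few lines each once the CSV data is loaded.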

http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-August/003592.html


no deadlines given that i am using the python class, which has a mode where it can do .S processing.  i actually had to add gas macro recognition to get that to work.

so there is a temporary workaround.  however it will become increasingly more of a priority particularly for Lauri who is working at assembler level for Video/Audio CODECs, and later for compilers.

the function entrypoint is asm_process()

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;h=45b292b4c4c32bbff548f2bf299235633d31db6c;hb=HEAD#l1052

you can see it looks for ".set" macros of the utmost basic form, example where this is used:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/Makefile;h=4dd904b6ba48f3fcae3b1ab04e1b0479e460abd4;hb=HEAD#l34

and some actual assembler containing sv.xxx opcodes, which get translated by asm_process() line by line into ".long xxxxx; some_v3.0b_asmopcode"

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/audio/mp3/mp3_0_apply_window_float_basicsv.s;hb=HEAD

you've seen the spec page which contains the format?

 https://libre-soc.org/openpower/sv/svp64/

it's very deliberately only describing the format, not why it is what it is, or how to *use* that format (how to implement hardware etc i mean).
Comment 9 lechenko 2021-09-19 20:33:01 BST
(In reply to Luke Kenneth Casson Leighton from comment #8)

Slowly but surely, I figured out, what https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/trans/svp64.py;h=45b292b4c4c32bbff548f2bf299235633d31db6c;hb=HEAD#l1052 does. 

As I understood, it translates svp64 asm mnemonics to prefix as a 32-bit literal and subsequent scalar OpenPOWER asm mnemonic. And after that translated .S-file feeds to binutils to produce binary file.

So, now we want to support the same svp64 asm mnemonics directly in binutils. But, my guess, that scalar OpenPOWER instructions are already there. Thus, few questions.

Does it mean, that we can try to implement the same two-step translation logic inside binutils? Or reuse OpenPOWER-related header files, at least?

Another one about svp64 asm syntax. As far as I understand, it is already support current version of asm syntax, but is there a spec on it? Could you share a link, please.

And the last one, for now. I guess, I can reuse/refactor both sv_analysis.py and svp64.py to generate header for binutils. Was that your intentions on how to do that task?
Comment 10 Luke Kenneth Casson Leighton 2021-09-19 22:25:30 BST
(In reply to lechenko from comment #9)
> (In reply to Luke Kenneth Casson Leighton from comment #8)
> 
> Slowly but surely, I figured out, what
> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/
> trans/svp64.py;h=45b292b4c4c32bbff548f2bf299235633d31db6c;hb=HEAD#l1052
> does. 
> 
> As I understood, it translates svp64 asm mnemonics to prefix as a 32-bit
> literal and subsequent scalar OpenPOWER asm mnemonic.

because there is no binutils support for svp64, yes.  when the .long
and the 32 bit mnemonic get passed to binutils, they get converted to
binary *without* binutils ever needing to know about svp64 assembler.

the exercise is therefore to merge the EXACT functionality of svp64.py
*into* binutils.


> And after that
> translated .S-file feeds to binutils to produce binary file.

yes.

> So, now we want to support the same svp64 asm mnemonics directly in
> binutils. But, my guess, that scalar OpenPOWER instructions are already
> there. 

yes, and yes.

> Thus, few questions.
> 
> Does it mean, that we can try to implement the same two-step translation
> logic inside binutils?

yes, exactly. or, more to the point: after "conversion" to ".long xxxxx; {equivalent v3.0B}" pass that *again* to the relevant function and get it to convert those to the appropriate binary output.

it is important to do that conversion pass *after* all the macro renaming
and expansion of registers.  gas has a builtin macro system, you cannot
process SVP64 registers until you know the actual number, 0-127.


> Or reuse OpenPOWER-related header files, at least?

yes absolutely, for goodness sake don't duplicate the entirety of power isa headers for scalar operations.


> Another one about svp64 asm syntax. As far as I understand, it is already
> support current version of asm syntax, but is there a spec on it? Could you
> share a link, please.

there is a spec, http://libre-soc.org/openpower/sv/svp64 however
i literally made up the syntax as i went along.

"need a way to indicate mapreduce, err "mr" is short, that'll do"

no kidding! :)

svp64.py is pretty much it, alongside the consts.py and other data structures,
power_enums.py and so on.

svp64.py and the Decoder are the canonical sources at the moment until
such time as there *is* time for someone *to* write documentation like
this.


> And the last one, for now. I guess, I can reuse/refactor both sv_analysis.py
> and svp64.py to generate header for binutils. Was that your intentions on
> how to do that task?

yyyep!  or, err refactor svp64.py? ahh more, "use svp64.py as a reference
to create the exact same thing in c", and *add* to sv_analysis.py to get
it to output autogenerated headers for use in binutils.  if you look
at how the microwatt vhdl struct is autogenerated that is pretty much
exactly what is needed.

cut, paste, substitute right magic constants, done.

you will see there are some data structures in binutils headers, that list
instructions.

if you add *one* extra field to that (a pointer to a binutils-ppc-svp64-struct)
then, because the existing struct initialisers leave it out, it will ordinarily default to NULL.

you can then autogenerate a binutils svp64 header full of ppc-svp64-struct entries then have a function which, before use, *fills in*, *at runtime*, the pointers.

btw you got the message about copyright assignment to the FSF? this is *really* important.  binutils code that has not had an FSF copyright assignment *cannot be accepted upstream* and whatever you did would have to be thrown away and duplicated by someone who has.
Comment 11 Luke Kenneth Casson Leighton 2021-09-20 13:24:49 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_svp64_rm.py;hb=HEAD

that's the reverse: disassembly of binary to internal data structure
for use in the simulator and in HDL.  it will give some insight.

did you receive the email with the contact details of the copyright
clerk at the FSF?
Comment 12 lechenko 2021-09-20 22:42:24 BST
(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to lechenko from comment #9)
> > (In reply to Luke Kenneth Casson Leighton from comment #8)
> > Thus, few questions.
> > 
> > Does it mean, that we can try to implement the same two-step translation
> > logic inside binutils?
> 
> yes, exactly. or, more to the point: after "conversion" to ".long xxxxx;
> {equivalent v3.0B}" pass that *again* to the relevant function and get it to
> convert those to the appropriate binary output.

Okay, I shall find that function then.

> it is important to do that conversion pass *after* all the macro renaming
> and expansion of registers.  gas has a builtin macro system, you cannot
> process SVP64 registers until you know the actual number, 0-127.
> 

You mean, the conversion of 'sv.*' instruction to '.long xxxxxx; \1'? Will this macro/expansion machinery work correctly on 'sv.*' instruction?


> there is a spec, http://libre-soc.org/openpower/sv/svp64 however
> i literally made up the syntax as i went along.

Aha. This is a binary format spec and there is no spec for mnemonics per se. My guess, that I have to dig out the format from svp64.py and tests.

Also, forgot to mention last time. There are some macro processing in svp64.py. Where can I read about it?

> 
> btw you got the message about copyright assignment to the FSF? this is
> *really* important.  binutils code that has not had an FSF copyright
> assignment *cannot be accepted upstream* and whatever you did would have to
> be thrown away and duplicated by someone who has.

I noticed the email. But had no time to fill in and send. I'll deal with paperwork as soon as I'll start hacking binutils.
Comment 13 Luke Kenneth Casson Leighton 2021-09-20 23:27:59 BST
(In reply to lechenko from comment #12)

> > it is important to do that conversion pass *after* all the macro renaming
> > and expansion of registers.  gas has a builtin macro system, you cannot
> > process SVP64 registers until you know the actual number, 0-127.
> > 
> 
> You mean, the conversion of 'sv.*' instruction to '.long xxxxxx; \1'? Will
> this macro/expansion machinery work correctly on 'sv.*' instruction?

yes, a macro-expansion-like system should work perfectly.

SVP64 has a hard rule: you cannot do this:

     sv.X   ==>  .long NNNN; Y

it MUST be this:

     sv.X   ==>  .long NNNN; X

in other words there is not ONE SINGLE 64 bit instruction that does not
map to its corresponding 32 bit one.

therefore you can perfectly well do a runtime substitution.
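
a minimal sketch of that substitution in Python, to show how mechanical it is.  assumptions: the 32-bit prefix word is already computed and handed in by the caller, the mnemonic carries no mode suffixes, and the operands are plain enough to pass through untouched.  the function name is hypothetical and this is not the code in svp64.py.

```python
import re

# matches an optional indent, "sv.", the mnemonic, then the operands
SV_LINE = re.compile(r"^(\s*)sv\.(\S+)(.*)$")


def translate_sv_line(line, prefix_word):
    """Rewrite 'sv.X operands' as '.long PREFIX; X operands'.

    Lines that do not start with 'sv.' are returned unchanged.
    The mnemonic X is reused as-is: sv.X never maps to a different Y.
    """
    m = SV_LINE.match(line)
    if m is None:
        return line
    indent, mnemonic, operands = m.groups()
    return "%s.long 0x%08x; %s%s" % (indent, prefix_word, mnemonic, operands)


print(translate_sv_line("sv.add 5, 5, 2", 0x05400000))
```

because the 32-bit mnemonic is always the same X, the only real work is computing the prefix word.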

> > there is a spec, http://libre-soc.org/openpower/sv/svp64 however
> > i literally made up the syntax as i went along.
> 
> Aha. This is a binary format spec and there is no spec for mnemonics per se.
> My guess, that I have to dig out the format from svp64.py and tests.

yes, sorry.  we can put a documentation budget if you would like to write
it, even if it is very sparse working notes for yourself.

> Also, forgot to mention last time. There are some macro processing in
> svp64.py. Where can I read about it?

binutils docs describe the macro system, it is ".set X Y"
i simply copied that at its most basic simplest level so that
Lauri could do some very basic macros, ".set counter r3" and
so on.  this was enough.
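
a toy sketch of that kind of very basic macro handling, for illustration only.  it is not the code in svp64.py: the function name is hypothetical, the separator between name and value is assumed to be either whitespace or a comma, and the .set lines are simply consumed rather than passed through.  as noted in comment 10, this expansion has to happen before any sv.* conversion, because registers must be known numbers by then.

```python
import re

# ".set NAME VALUE" or ".set NAME, VALUE" (separator is an assumption)
SET_LINE = re.compile(r"\s*\.set\s+(\w+)\s*[,\s]\s*(\S+)\s*$")


def expand_set_macros(lines):
    """Record '.set' definitions and substitute them in later lines."""
    macros = {}
    out = []
    for line in lines:
        m = SET_LINE.match(line)
        if m:
            macros[m.group(1)] = m.group(2)   # remember the definition
            continue                          # the .set line itself is dropped
        for name, value in macros.items():
            # whole-word replacement so "counter" does not hit "counter2"
            line = re.sub(r"\b%s\b" % re.escape(name), value, line)
        out.append(line)
    return out


print(expand_set_macros([".set counter r3", "sv.addi counter, counter, 1"]))
```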

 
> > 
> > btw you got the message about copyright assignment to the FSF? this is
> > *really* important.  binutils code that has not had an FSF copyright
> > assignment *cannot be accepted upstream* and whatever you did would have to
> > be thrown away and duplicated by someone who has.
> 
> I noticed the email. But had no time to fill in and send. I'll deal with
> paperwork as soon as I'll start hacking binutils.

ok cool. i must send it as well.