Bug 550 - binutils support needed for svp64
Summary: binutils support needed for svp64
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: Other Linux
Importance: --- enhancement
Assignee: Alexandre Oliva
URL:
Depends on:
Blocks:
 
Reported: 2020-12-18 20:33 GMT by Luke Kenneth Casson Leighton
Modified: 2020-12-28 14:54 GMT

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments
Ho ho ho! (4.67 KB, patch)
2020-12-26 02:14 GMT, Alexandre Oliva

Description Luke Kenneth Casson Leighton 2020-12-18 20:33:41 GMT
gnu as and objdump support is needed for the new svp64 encoding.

first recommended approach (simplest): a new "instruction"

     svp64 0xNNNNNN (or binary)

encoding would be as described in "Prefix Fields" starting with Major Opcode EXT01

https://libre-soc.org/openpower/sv/svp_rewrite/svp64/

subsequent encoding (TBD) would be:

    svp64 SUBVL=2,ew=8,rt=v,mask=r3

and finally allow that to be a "prefix" of instructions as (TBD)

     addi[r3] r64.w.vec2.v, 5

w: elwidth=32
v: RT is vector
vec2: SUBVL=2
[r3]: mask=r3

opcodes listed here, gives prefixing info, autogenerated.
https://libre-soc.org/openpower/opcode_regs_deduped/
Comment 1 Alexandre Oliva 2020-12-21 19:08:46 GMT
I'm a little puzzled (not just because I can hardly make head or tail of the svp64 web page :-)

why bother with "svp64 0x..." syntax, if we already have .long?


as for making sense of the page.  I guess it must all make some sense if you have some vague notion of what the prefixes are supposed to accomplish, but that's not me.  I could use some examples, or pointers to earlier, more complete and self-contained docs that would give me some sense of what's supposed to be going on there.

not that I really need to be able to make sense of it before I can implement binutils changes, mind you; it just helps avoid silly mistakes, and wrong assumptions, and I figured I might be able to help validate the proposed design, if only I had the required background.  alas, I suppose I'm missing background on GPUs, ppc 3.1 opcodes, and the earlier simd design for risc-v
Comment 2 Luke Kenneth Casson Leighton 2020-12-21 20:09:51 GMT
(In reply to Alexandre Oliva from comment #1)
> I'm a little puzzled (not just because I can hardly make head or tail of
> the svp64 web page :-)
> 
> why bother with "svp64 0x..." syntax, if we already have .long?

yes jacob pointed that out... although... an "svp64 0xNNNNNN" instruction would help you to understand the "first phase": where the RM field fits.


> 
> as for making sense of the page.  I guess it must all make some sense if you
> have some vague notion of what the prefixes are supposed to accomplish, but
> that's not me.

SV - aka SimpleV - is a hardware for-loop around instructions.

that's it.  full stop.

here is some pseudocode that shows what that looks like, using ADD as an example:

https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=simple_v_extension/simple_v_chennai_2018.tex;hb=HEAD#l190
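
As a rough illustration of that one-liner (a minimal sketch, not the pseudocode from the linked paper): assuming all three operands are vectors, there is no predication, and elements are the default 64 bits wide, the loop looks like this. The function name `sv_add` and the model of the register file as a Python list are made up for the example.

```python
MASK64 = (1 << 64) - 1  # default element width assumed to be 64 bits


def sv_add(regs, rt, ra, rb, vl):
    """Repeat a scalar add vl times, stepping every register number by one.

    regs is a hypothetical register file modelled as a list of integers.
    All three operands are assumed to be vectors; no predication is applied.
    """
    for i in range(vl):
        regs[rt + i] = (regs[ra + i] + regs[rb + i]) & MASK64
    return regs


regs = list(range(32))          # r0=0, r1=1, ... purely for demonstration
sv_add(regs, rt=8, ra=16, rb=24, vl=4)
```

With VL=1 this degenerates to the ordinary scalar add, which is the point: the instruction itself does not change, only the loop around it.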


>  I could use some examples, or pointers to earlier, more
> complete and self-contained docs that would give me some sense of what's
> supposed to be going on there.

this paragraph puts the above one-liner and the pseudocode into context:

https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=simple_v_extension/specification.mdwn;hb=HEAD#l38


> not that I really need to be able to make sense of it before I can implement
> binutils changes, mind you; it just helps avoid silly mistakes, and wrong
> assumptions, and I figured I might be able to help validate the proposed
> design, if only I had the required background. 

appreciated.

> alas, I suppose I'm missing
> background on GPUs, ppc 3.1 opcodes, and the earlier simd design for risc-v

there was no SIMD ISA: SV is *categorically* and very specifically diametrically opposed to SIMD.

SIMD is considered harmful:
https://www.sigarch.org/simd-instructions-considered-harmful/

x86 expanded from 70 to *1400* instructions since 1978, thanks to SIMD (far, far more since adding AVX512).  SIMD is an O(N^6) opcode proliferation nightmare.

also we are not adding v3.1B opcodes (that is a separate discussion which requires OPF permission). the sole exclusive reason for using EXT01 is to get the "fitting in" with v3.1B 64 bit prefixing in a nondisruptive fashion that the OPF ISA WG should not have any objection to.


the sigarch article shows how RVV works.  SV is based on the exact same underlying principle: you have an instruction, you have a vector loop on that instruction, elements are computed based on that instruction.

full stop.

it's real simple.

VL in our case can be anywhere from 1 to 64.  *very rarely* it is permitted to be zero.

so how do we set this "VL" or vector length?

well, with an instruction of course.
https://libre-soc.org/openpower/sv/setvl/

and... err... then what?  well, the standard 32 bit scalar instructions do nothing with it: they don't "understand" VL.

so we "Prefix" them.  this says, "hey you know that VL for-loop you want applied? well the next 32 bits contains the instruction to be smashed into that for-loop, oh and by the way here's some other random trash to chuck at the loop, such as predication, blah blah".

therefore, ultimately, we want this kind of syntax:

    setvl r3, r5, VL=4
    SUBVL=2, ELWIDTH=8 { add r5, r5, r2 }

the output will be:

* 32 bits containing an instruction for setvl
* 32 bits starting with EXT01 as its Major Opcode and continuing with the pattern that drops SUBVL=2 and ELWIDTH=8 somewhere into the RM field bits
* 32 bits containing an add instruction

this will get us that hardware for-loop activated 4 times (0-3) on that add instruction.

actually 8 because SUBVL=2

and, actually, it will be 8bit adds not 64bit adds because ELWIDTH=8.
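
The arithmetic in that example can be sketched as follows. This is only an illustration under simplifying assumptions: operands are given as flat lists of elements (how elements are packed into registers is ignored), there is no predication, and results simply wrap at the element width. The names `element_ops` and `sv_add_elements` are invented here.

```python
def element_ops(vl, subvl):
    """Number of element-level operations the loop performs."""
    return vl * subvl


def sv_add_elements(a, b, vl, subvl, elwidth):
    """Element-wise add over vl * subvl elements, each elwidth bits wide.

    a and b are flat lists of element values (packing into registers is
    deliberately not modelled); results wrap modulo 2**elwidth.
    """
    mask = (1 << elwidth) - 1
    return [(a[i] + b[i]) & mask for i in range(element_ops(vl, subvl))]


# VL=4, SUBVL=2, ELWIDTH=8, as in the setvl/add example above
result = sv_add_elements([250] * 8, [10] * 8, vl=4, subvl=2, elwidth=8)
```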

does that provide you with a quick crash-course in how SV works?
Comment 3 Luke Kenneth Casson Leighton 2020-12-21 21:50:27 GMT
i've added a rapid prototype "Assembly Annotation" to the appendix,
and also updated the "Prefix Fields".

unfortunately, i just realised that, actually, working out which
of the "Remapped Encodings" to apply, will need to be a per-instruction
basis, for everything but MASK_KIND, MASK, ELWIDTH, SUBVL and MODE.
these are always in the same place: everything else (EXTRAs, ELWIDTH_SRC,
MASK_SRC) critically depends on what instruction is used.

we can "get away with this" by specifying the mode-type as part of the
svp64 encoding... for now.
Comment 4 Alexandre Oliva 2020-12-26 02:14:34 GMT
Created attachment 123 [details]
Ho ho ho!

Here's a patch that introduces in GNU binutils an svp64 pseudo-instruction, that takes a single 24-bit operand, and encodes it as a 32-bit insn with EXT01 as the major opcode, and MSB0 bits 7 and 9 also set, shuffling the top two bits of the 24-bit operand, RM[0] and RM[1], into bits 6 and 8 of the insn.
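
The bit shuffling described above can be sketched in Python as below. This is not the patch itself, just an illustration of the layout. It assumes that EXT01 is primary opcode 1 in MSB0 bits 0-5, and that the remaining 22 bits of the operand, RM[2] to RM[23], land unchanged in MSB0 bits 10-31 (the comment only spells out where RM[0] and RM[1] go). The function names are hypothetical.

```python
EXT01 = 1  # primary opcode of 64-bit prefixed instructions (assumed value)


def msb0(bit):
    """Mask for MSB0-numbered bit `bit` of a 32-bit word (bit 0 is the MSB)."""
    return 1 << (31 - bit)


def encode_svp64_prefix(rm):
    """Pack a 24-bit RM operand into a 32-bit prefix word."""
    if not 0 <= rm < (1 << 24):
        raise ValueError("RM operand must fit in 24 bits")
    word = EXT01 << 26              # major opcode in MSB0 bits 0..5
    word |= msb0(7) | msb0(9)       # the two fixed bits that are always set
    if rm & (1 << 23):              # RM[0], the top bit of the operand
        word |= msb0(6)
    if rm & (1 << 22):              # RM[1]
        word |= msb0(8)
    word |= rm & 0x3FFFFF           # RM[2..23] -> bits 10..31 (assumption)
    return word


def decode_svp64_prefix(word):
    """Inverse of encode_svp64_prefix: recover the 24-bit RM operand."""
    rm = word & 0x3FFFFF
    rm |= ((word >> 25) & 1) << 23  # MSB0 bit 6 -> RM[0]
    rm |= ((word >> 23) & 1) << 22  # MSB0 bit 8 -> RM[1]
    return rm
```

Under these assumptions an all-zero operand encodes as 0x05400000, and the decode function is what a disassembler would use to get the operand back.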
Comment 5 Luke Kenneth Casson Leighton 2020-12-26 02:38:08 GMT
(In reply to Alexandre Oliva from comment #4)
> Created attachment 123 [details]
> Ho ho ho!

:)
 
> Here's a patch that introduces in GNU binutils an svp64 pseudo-instruction,
> that takes a single 24-bit operand, and encodes it as a 32-bit insn with
> EXT01 as the major opcode, and MSB0 bits 7 and 9 also set, shuffling the top
> two bits of the 24-bit operand, RM[0] and RM[1], into bits 6 and 8 of the
> insn.

cool!  this is fantastic, it means that the next stages open up as well, for adding basic SV capability to ISACaller (the simulator).

alexandre, i will create a binutils git clone tomorrow, to make sure this gets tracked properly.
Comment 6 Alexandre Oliva 2020-12-26 02:51:30 GMT
thanks for the crash course.  as I said in the call, it was very useful.
it's all beginning to make sense.

> we can "get away with this" by specifying the mode-type as part of the
> svp64 encoding... for now.

I was going to ask about that.  it seems that there's nothing in the svp64 prefix instruction itself that tells how to decode its fields, you have to look at the actual insn that follows to know.

Once we get to a stage in which we'll want to specify svp64 fields separately, rather than combined into a 24-bit immediate, an explicit specification of mode may help the assembler, to some extent, but the disassembler (and the assembler, if it's to detect inconsistencies) will have to look at prefix+insn as a single thing to be able to do its job.
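
A minimal sketch of what "prefix+insn as a single thing" could mean for a disassembler front end, assuming a prefix is recognised by major opcode EXT01 (taken to be primary opcode 1) with MSB0 bits 7 and 9 set, as in the pseudo-instruction above. The function names and the tuple representation are made up for illustration; real field decoding would still need per-instruction tables.

```python
PREFIX_MASK = 0xFD400000   # major opcode bits 0..5 plus MSB0 bits 7 and 9
PREFIX_VALUE = 0x05400000  # opcode 1 (assumed EXT01) with bits 7 and 9 set


def is_svp64_prefix(word):
    """True if a 32-bit word looks like an svp64 prefix."""
    return (word & PREFIX_MASK) == PREFIX_VALUE


def group_insns(words):
    """Pair each prefix word with the instruction word that follows it.

    Returns a list of (prefix, insn) tuples; prefix is None for plain
    32-bit instructions. A dangling prefix at the very end of the stream
    is returned unpaired rather than guessed at.
    """
    out = []
    i = 0
    while i < len(words):
        if is_svp64_prefix(words[i]) and i + 1 < len(words):
            out.append((words[i], words[i + 1]))
            i += 2
        else:
            out.append((None, words[i]))
            i += 1
    return out


# 0x7CA51214 is just a placeholder non-prefix word for the demonstration
pairs = group_insns([0x7CA51214, 0x05400000, 0x7CA51214])
```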
Comment 7 Luke Kenneth Casson Leighton 2020-12-26 08:03:59 GMT
(In reply to Alexandre Oliva from comment #6)
> thanks for the crash course.  as I said in the call, it was very useful.
> it's all beginning to make sense.

ah good.  it's kinda surprising that nobody has thought of this before.

> > we can "get away with this" by specifying the mode-type as part of the
> > svp64 encoding... for now.
> 
> I was going to ask about that.  it seems that there's nothing in the svp64
> prefix instruction itself that tells how to decode its fields, you have to
> look at the actual insn that follows to know.

correct.  it's a bit (haha) annoying, however with bits so precious that's how it goes. the alternative is that we request a Major Opcode then use the 2 extra bits, one for 1/2 Predication, the other for 2/3 EXTRA (although to be honest, 2 more bits means 4 more modes/features...)

the sv_analysis.py program is generating tables already
https://libre-soc.org/openpower/opcode_regs_deduped/

the idea is to create CSV files which give those 2 missing bits.  it is not outside the realm of possibility to autogenerate a header file for inclusion in binutils.
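
a sketch of what that autogeneration might look like, purely as an illustration: the CSV column names (`insn`, `ptype`, `etype`), the `SVP64_` macro prefix, the `svp64_info` struct and the sample rows are all invented here and are not the actual sv_analysis.py output format.

```python
import csv
import io


def csv_to_header(csv_text):
    """Turn a CSV of per-instruction svp64 info into C header text.

    Expects hypothetical columns: insn, ptype (1P or 2P) and
    etype (EXTRA2 or EXTRA3). Emits one table entry per row.
    """
    lines = [
        "/* autogenerated: do not edit by hand */",
        "static const struct svp64_info svp64_table[] = {",
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append('  { "%s", SVP64_%s, SVP64_%s },'
                     % (row["insn"], row["ptype"], row["etype"]))
    lines.append("};")
    return "\n".join(lines) + "\n"


# placeholder rows, not real instruction data
SAMPLE = "insn,ptype,etype\ninsn_a,1P,EXTRA2\ninsn_b,2P,EXTRA3\n"
header = csv_to_header(SAMPLE)
```

generating the table this way means the assembler and the CSV files cannot drift apart through hand transcription.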
 
> Once we get to a stage in which we'll want to specify svp64 fields
> separately, rather than combined into a 24-bit immediate, an explicit
> specification of mode may help the assembler, to some extent, 

autogenerated.  otherwise it's too much work (200 insns) and you get transcription errors.  dunno bout you but i don't want to have to check that.

> but the
> disassembler (and the assembler, if it's to detect inconsistencies) will
> have to look at prefix+insn as a single thing to be able to do its job.

indeed.  and PowerDecoder2 as well.  this is how it goes.

i'm not happy about it because normally RISC is not supposed to have lots of gates in the decoder.

if we were doing our own ISA from scratch these two bits, saying whether 1P/2P was set and whether EXTRA2/3 was set would definitely be part of the opcode.