Bug 701 - document Matrix REMAP in SVP64
Summary: document Matrix REMAP in SVP64
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification (show other bugs)
Version: unspecified
Hardware: Other Linux
: --- enhancement
Assignee: Andrey Miroshnikov
URL: https://libre-soc.org/openpower/sv/co...
Depends on:
Blocks: 952 953 241
  Show dependency treegraph
 
Reported: 2021-09-19 21:29 BST by Luke Kenneth Casson Leighton
Modified: 2024-01-15 17:01 GMT (History)
3 users (show)

See Also:
NLnet milestone: NLnet.2022-08-051.OPF
total budget (EUR) for completion of task and all subtasks: 2000
budget (EUR) for this task, excluding subtasks' budget: 2000
parent task for budget allocation: 953
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
red={amount=1000,submitted=2024-01-05,paid=2024-01-12} lkcl={amount=1000,submitted=2023-12-01,paid=2023-12-19}


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2021-09-19 21:29:53 BST
the initial exploration was successful, it needs a further round
of rationalisation and analysis
Comment 1 Luke Kenneth Casson Leighton 2023-10-15 21:40:02 BST
andrey can you please publish that matrix example under the URL above
Comment 3 Luke Kenneth Casson Leighton 2023-10-30 06:59:12 GMT
spelling corrections
remove marketing terminology
use triple and single quotes
correct SPR names
add tag
use standard layout frmat for links (at top)
correct wiki links (needed for PDF building)
"Power ISA v3.0C" is not named "OpenPOWER ISA v3.0C"

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=78747e8
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=df1a1d30
Comment 4 Andrey Miroshnikov 2023-11-05 17:26:59 GMT
I've been going over the matrix multiply example again, and noticed that the maddld (Multiply-Add Low Doubleword) in the 3.1b version of the PowerISA spec is in the Linux Compliancy Subset, not SFS or SFFS. See appendix G table entry on page 1477 of the document, or page 1503 of the pdf.
Comment 5 Luke Kenneth Casson Leighton 2023-11-05 17:54:46 GMT
(In reply to Andrey Miroshnikov from comment #4)
> I've been going over the matrix multiply example again, and noticed that the
> maddld (Multiply-Add Low Doubleword) in the 3.1b version of the PowerISA
> spec is in the Linux Compliancy Subset, not SFS or SFFS. See appendix G
> table entry on page 1477 of the document, or page 1503 of the pdf.

(for the next para, read page vii onwards carefully).

it is permitted to implement any additional instructions over-and-above
what is in any given subset. however *some* instructions (such as the
MMA group) are a "all or nothing".

maddld is not one such instruction.

therefore we can add it.


i am checking v3.0C and p1215 figure 94 shows it to be v3.0

000100 ..... 110000VAI78maddhd v3.0 Multiply-Add HighDoubleword
000100 ..... 110001VAI78maddhdu v3.0 Multiply-Add HighDoubleword Unsigned
000100 ..... 110011VAI78maddld v3.0 Multiply-Add Low Doubleword

again: Public v3.1, p1425 figure 92 shows exactly the same. v3.0

checking p1455 in the version i have)

000100 ..... 110000I ..XX maddhd v3.0 p88 Multiply-Add High Doubleword VA-form

so again, v3.0.

let's have a look at the "key" to the table, find out what "..XX" means.

OpenPOWER Compliancy Subsets
X... Instruction included in the Scalar Fixed-Point Compliancy subset
.X.. Instruction included in the Scalar Fixed-Point + Floating-Point Compliancy
..X. Instruction included in the Linux Compliancy subset
...X Instruction included in the AIX Compliancy subset.

that's plain wrong.  let's check addi.
001110 ..... XXXX addi P176Add Immediate D-form

not helpful. fcfid

111111 ..... ///// ..... 11010 01110.I .XXX fcfid

ok better. ok so the coding of X is correct.
Comment 6 Andrey Miroshnikov 2023-11-05 19:13:20 GMT
(In reply to Luke Kenneth Casson Leighton from comment #5)
> (for the next para, read page vii onwards carefully).
> 
> it is permitted to implement any additional instructions over-and-above
> what is in any given subset. however *some* instructions (such as the
> MMA group) are a "all or nothing".
Thanks, I found the optional features listed on pages x-xiii.

> 
> maddld is not one such instruction.
> 
> therefore we can add it.
Maddld belongs to the 3.3.9 Fixed-Point Arithmetic Instructions Category, more specifically the *3.3.9.1 64-bit Fixed-Point Arithmetic Instructions* (both on 3.0c and 3.1b).

On page xiii, the optional feature:
"Fixed-point instructions that modify OV to
indicate whether overflow occurred (OV)
(addex and instructions with OE=1 such
as addo, subfo, etc.)"

With the reference pointing to 3.3.9.

Do we need to implement the whole set of Fixed-Point Arithmetic Instructions under section 3.3.9 to be compliant (be easily supported by the compiler)?

It seems far-fetched to me that just maddld by itself would be acceptable, unless we would also have provide compiler modifications ourselves (i.e. a flag to say this instruction is present).

> 000100 ..... 110000I ..XX maddhd v3.0 p88 Multiply-Add High Doubleword
> VA-form
> 
> so again, v3.0.
> 
> let's have a look at the "key" to the table, find out what "..XX" means.
> 
> OpenPOWER Compliancy Subsets
> X... Instruction included in the Scalar Fixed-Point Compliancy subset
> .X.. Instruction included in the Scalar Fixed-Point + Floating-Point
> Compliancy
> ..X. Instruction included in the Linux Compliancy subset
> ...X Instruction included in the AIX Compliancy subset.
> 
> that's plain wrong.  let's check addi.
Not sure what you meant by "that's plain wrong".
The Compliancy Subset column looks to be correct, however I'm probably not understanding something.

> 111111 ..... ///// ..... 11010 01110.I .XXX fcfid
> 
> ok better. ok so the coding of X is correct.
Yep, part of SFFS and above.
Comment 7 Luke Kenneth Casson Leighton 2023-11-05 19:37:29 GMT
(In reply to Andrey Miroshnikov from comment #6)

> Thanks, I found the optional features listed on pages x-xiii.

we are permitted to add any instruction that is not in one of
those "all-or-nothing" optional subsets. MMA is an example
of one of those "all-or-nothing" subsets.

whilst good that you raised it, do not spend further time on this. it's in
microwatt, therefore we include it. that's all there is to it. end of the story.
https://github.com/antonblanchard/microwatt/blob/63d4553faeea897926f89eb7486dd991cd9e83c8/decode1.vhdl#L246
Comment 8 Jacob Lifshay 2023-11-05 19:44:37 GMT

    
Comment 9 Andrey Miroshnikov 2023-11-05 19:49:39 GMT
(In reply to Luke Kenneth Casson Leighton from comment #7)
> whilst good that you raised it, do not spend further time on this. it's in
> microwatt, therefore we include it. that's all there is to it. end of the
> story.
> https://github.com/antonblanchard/microwatt/blob/
> 63d4553faeea897926f89eb7486dd991cd9e83c8/decode1.vhdl#L246
No problem, thanks for the clarification. Won't be dwelling on it.
Comment 10 Luke Kenneth Casson Leighton 2023-11-05 20:12:39 GMT
(In reply to Jacob Lifshay from comment #8)

> I have previously mentioned

please keep this bugreport solely and exclusively on-topic for Matrix
Multiply as it is likely to be put in front of an Investor or a Client.
i will now delete the off-topic discussion.  thank you.
Comment 11 Andrey Miroshnikov 2023-11-06 19:37:23 GMT
Added links to setvl and remapyield pages:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=b1af6290493d5e24b66a697e18c48198dae336c0

Added side-by-side outer and inner product indices table (maybe unnecessary, we'll see):
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=e753285cd38279c477ebd4a20d9cc56ab0fb7201
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=f0f633cf4c971a0719859dcdd33ef170eb5f408a

Worked example drawings (to show why outer product is inefficient):
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=ec354c22512c8d7ba90ee86b2a566bff198cb045
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=5b92098f9f10fa004274e1e549e9634a8683219e
(These diagrams were done earlier for my understanding, so adding them to the page was trivial)

Fixed permute table:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=f38a4ebf02eace64055088c15e44034ed5ab11fd

(I may move the whole SVSHAPE SPR section to the appendix as it's too detailed. Still would prefer to keep it, as it aids with some understanding).

How to compute the SVxd/SVyd/SVzd parameters (to make sure I can adjust the example for any matrix):
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=e27736d030ed3a806a67652bac3fd3f436aba04e
There's probably more to it than that, but for the basic examples I will be doing those rules are sufficient for now.

svremp insn breakdown and simulator example (terrible example of putting two separate changes in one commit, I noticed too late)
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=05ccb75bebd8956aa3f0361bcea8c34e6a86d04b
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=c4ef47abccbb5519ffc6b2c8ad1f9d3348dd0b41

None of my explanations cover SVSHAPE1/2/3, but I'm guessing their are set by the svremap instruction?
The sole svshape instruction specifies REMAP mode 0, which only deals with SVSHAPE0 to my understanding.

A section on the indices generate by remapyield.py, and how those correspond to the actual indices of the operand and result matrices. Section for now incomplete.
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=4be5bec8309348e84bfd3c35df0aebc1a82c5ac7
Comment 12 Andrey Miroshnikov 2023-11-06 19:39:27 GMT
(In reply to Andrey Miroshnikov from comment #11)
> A section on the indices generate by remapyield.py, and how those correspond
> to the actual indices of the operand and result matrices.
Forgot the quotes for the table:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=b82da23c7a0d368ac811df9b98b1bd670a28e786
Comment 13 Luke Kenneth Casson Leighton 2023-11-06 21:09:30 GMT
andrey it looks good, esp. the images. must convert them soe time.

 however the explanation of svshape is unnecessarily
long, duplicating content that is already in thespec. it is a distraction
(and is "spec information" style not "readable" style writing)

also this is wrong

   For Matrix Multiply, SHAPE0 SPR is used:

*three* SVSHAPE SPRs are used and they are named SVSHAPEn not SHAPEn
(i corrected this earlier last week)

you can see the instruction source code, you do not need me
to see that:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/simplev.mdwn;h=33a02e#l137

notice how SVSHAPE0-2 are set up?
one for A, one for B, one for C  in C=AxB...

otherwise how can we ever do the schedule for A B and C
unless there is an SVSHAPE for each?

hmmm i thought i had created a yield-demo which uses remapyield.py
to shw that more clearly, but i think (over 2!) years ago i did
a demo for DCT and FFT.

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_fft_yield.py;hb=HEAD

yyeah there it is. that is standalone executable (designed as such).
sorry, didn't do a similar one for matrix, rempyield.py is much more
basic it was the first REMAP subsystem i ever did.
Comment 14 Luke Kenneth Casson Leighton 2023-11-06 21:10:33 GMT
(In reply to Andrey Miroshnikov from comment #12)

> Forgot the quotes for the table:


if you can indent thm back four spaces that will look better.
Comment 15 Andrey Miroshnikov 2023-11-06 23:43:19 GMT
(In reply to Luke Kenneth Casson Leighton from comment #13)
> *three* SVSHAPE SPRs are used and they are named SVSHAPEn not SHAPEn
> (i corrected this earlier last week)
> 
> you can see the instruction source code, you do not need me
> to see that:
> 
> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/
> simplev.mdwn;h=33a02e#l137
Thanks for the clarification, I was confused by the "SHAPE Remapping SPRs" section in openpower/sv/remap/ (I thought only SVSHAPE0 is being used).

> 
> notice how SVSHAPE0-2 are set up?
> one for A, one for B, one for C  in C=AxB...
> 
> otherwise how can we ever do the schedule for A B and C
> unless there is an SVSHAPE for each?
A quick look does show SVSHAPE0-2 being used and setup, yeah.
I'll look over the pseudocode properly on the train. 

> 
> yyeah there it is. that is standalone executable (designed as such).
> sorry, didn't do a similar one for matrix, rempyield.py is much more
> basic it was the first REMAP subsystem i ever did.
Only after studying svshape instruction and SVSHAPE0 did I realise just how capable and flexible the configuration can be. However getting used to this paradigm is not easy (kind of frightening when there are so many parameters to tweak).
Comment 16 Andrey Miroshnikov 2023-11-06 23:46:17 GMT
(In reply to Luke Kenneth Casson Leighton from comment #14)
> if you can indent thm back four spaces that will look better.
It does indeed look better.
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=6c51cb7909ab90e8a85f5db85477cb40a5032082

(In reply to Andrey Miroshnikov from comment #15)
> > https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/
> > simplev.mdwn;h=33a02e#l137
> Thanks for the clarification, I was confused by the "SHAPE Remapping SPRs"
> section in openpower/sv/remap/ (I thought only SVSHAPE0 is being used).
Forgot to include commit:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=dff78c5a215434f15ffb23db2f1174c969079c40
I haven't remove the duplicated bits yet, but will do that after the meeting.
Comment 17 Luke Kenneth Casson Leighton 2023-11-07 03:54:23 GMT
(In reply to Andrey Miroshnikov from comment #15)

> I'll look over the pseudocode properly on the train. 

it's not that exciting :)
basically it does-the-job-in-one-instruction, providing "commonly used"
setups that would in some cases require THIRTY OR GREATER instructions
to do.

i mean, who the heck wants to do all that bitmanipulation?

> Only after studying svshape instruction and SVSHAPE0 did I realise just how
> capable and flexible the configuration can be. However getting used to this
> paradigm is not easy (kind of frightening when there are so many parameters
> to tweak).

you're not... really... supposed to program SVSHAPE* directly, although
you can. the penalty for doing so will in *some* hardware implementations
be a slowdown/stall (remember we are *writing a spec* not "writing something
solely and exclusively to be implemented by us and us alone", so other
potential implementors have to be imagined and then thought through)
Comment 18 Andrey Miroshnikov 2023-11-09 23:37:18 GMT
Made a bunch of updates to the remap matrix entry.

Minor corrections:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=d00d706aad0255ce8bf4e33bf609fecd8cf3f9c5
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=16b44c819cd3f2922b7fd973201ea3850c576eb1
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=30be7bc693a9f3ba8dc179eb8b3eff28639257c8
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=4ee70db0509793a5a5d2cc438f12612099f44e51

Save space by removing indenting:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=92151876263c1213f010c808057b7d1060afc75c

Minor formatting fixes:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=725b8e3b43b08823f9d189849f4c66bbd3dfc7f7

Add row/column index table (finally realised how this corresponds to the SVSHAPE's x/y/z indices):
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=68c1a4d9d3200c4b625e1a84d962b9234128e6b6

Remove the duplicated SVSHAPE SPR section, update svremap and maddld sections:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=353c9900c7128084f26a8e23c20a7340ed0f255f


Added an brief explanation on why three instructions are sufficient:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=f36dcf74ead4da26825747b5e1cd4280653e72e6
If this explanation is irrelevant, feel free to take it out. I just thought that it would be helpful.


At this point I feel it's done.
Let me know if anything else is missing.
Comment 19 Andrey Miroshnikov 2023-11-30 12:55:48 GMT
(In reply to Andrey Miroshnikov from comment #18)
> At this point I feel it's done.
> Let me know if anything else is missing.

Luke, please confirm if this is considered complete. I've been asked to submit an RfP for Red.
Comment 20 Luke Kenneth Casson Leighton 2023-11-30 14:37:17 GMT
(In reply to Andrey Miroshnikov from comment #19)

> Luke, please confirm if this is considered complete. 

you can see by the status "fixed" that it is indeed.

> I've been asked to submit an RfP for Red.

can you ask (and advise whomever it was to directly do so i  future)
to ask on the libre-soc-dev mailing list if they don't know
how RFPs work, where i can reply informing you where (and how) the other
tasks are that can be checked for RFPs, and in the
near future you can add the procedure to bug #1126 so as to
be able to advise people on how to check for themselves
without you getting utterly exhausted becuse you are repeating
it to everyone *privately* dozens of times?

(bottom line, message behind above: you will not be doing NLet any
favours by submitting one RFP per task completed)