the initial exploration was successful, it needs a further round of rationalisation and analysis
andrey can you please publish that matrix example under the URL above
Writing up the matrix remap doc (currently only covered the matrix multiply): https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=afdc384bb78d1ae7938eb0d5f15ba785863d4043 https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=d559ce5ac28615cc5dbbb2cb57a66a2374f341ad https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=9006971fe6c4e95560706918af19a4e42707a9d9
spelling corrections remove marketing terminology use triple and single quotes correct SPR names add tag use standard layout frmat for links (at top) correct wiki links (needed for PDF building) "Power ISA v3.0C" is not named "OpenPOWER ISA v3.0C" https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=78747e8 https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=df1a1d30
I've been going over the matrix multiply example again, and noticed that the maddld (Multiply-Add Low Doubleword) in the 3.1b version of the PowerISA spec is in the Linux Compliancy Subset, not SFS or SFFS. See appendix G table entry on page 1477 of the document, or page 1503 of the pdf.
(In reply to Andrey Miroshnikov from comment #4) > I've been going over the matrix multiply example again, and noticed that the > maddld (Multiply-Add Low Doubleword) in the 3.1b version of the PowerISA > spec is in the Linux Compliancy Subset, not SFS or SFFS. See appendix G > table entry on page 1477 of the document, or page 1503 of the pdf. (for the next para, read page vii onwards carefully). it is permitted to implement any additional instructions over-and-above what is in any given subset. however *some* instructions (such as the MMA group) are a "all or nothing". maddld is not one such instruction. therefore we can add it. i am checking v3.0C and p1215 figure 94 shows it to be v3.0 000100 ..... 110000VAI78maddhd v3.0 Multiply-Add HighDoubleword 000100 ..... 110001VAI78maddhdu v3.0 Multiply-Add HighDoubleword Unsigned 000100 ..... 110011VAI78maddld v3.0 Multiply-Add Low Doubleword again: Public v3.1, p1425 figure 92 shows exactly the same. v3.0 checking p1455 in the version i have) 000100 ..... 110000I ..XX maddhd v3.0 p88 Multiply-Add High Doubleword VA-form so again, v3.0. let's have a look at the "key" to the table, find out what "..XX" means. OpenPOWER Compliancy Subsets X... Instruction included in the Scalar Fixed-Point Compliancy subset .X.. Instruction included in the Scalar Fixed-Point + Floating-Point Compliancy ..X. Instruction included in the Linux Compliancy subset ...X Instruction included in the AIX Compliancy subset. that's plain wrong. let's check addi. 001110 ..... XXXX addi P176Add Immediate D-form not helpful. fcfid 111111 ..... ///// ..... 11010 01110.I .XXX fcfid ok better. ok so the coding of X is correct.
(In reply to Luke Kenneth Casson Leighton from comment #5) > (for the next para, read page vii onwards carefully). > > it is permitted to implement any additional instructions over-and-above > what is in any given subset. however *some* instructions (such as the > MMA group) are a "all or nothing". Thanks, I found the optional features listed on pages x-xiii. > > maddld is not one such instruction. > > therefore we can add it. Maddld belongs to the 3.3.9 Fixed-Point Arithmetic Instructions Category, more specifically the *3.3.9.1 64-bit Fixed-Point Arithmetic Instructions* (both on 3.0c and 3.1b). On page xiii, the optional feature: "Fixed-point instructions that modify OV to indicate whether overflow occurred (OV) (addex and instructions with OE=1 such as addo, subfo, etc.)" With the reference pointing to 3.3.9. Do we need to implement the whole set of Fixed-Point Arithmetic Instructions under section 3.3.9 to be compliant (be easily supported by the compiler)? It seems far-fetched to me that just maddld by itself would be acceptable, unless we would also have provide compiler modifications ourselves (i.e. a flag to say this instruction is present). > 000100 ..... 110000I ..XX maddhd v3.0 p88 Multiply-Add High Doubleword > VA-form > > so again, v3.0. > > let's have a look at the "key" to the table, find out what "..XX" means. > > OpenPOWER Compliancy Subsets > X... Instruction included in the Scalar Fixed-Point Compliancy subset > .X.. Instruction included in the Scalar Fixed-Point + Floating-Point > Compliancy > ..X. Instruction included in the Linux Compliancy subset > ...X Instruction included in the AIX Compliancy subset. > > that's plain wrong. let's check addi. Not sure what you meant by "that's plain wrong". The Compliancy Subset column looks to be correct, however I'm probably not understanding something. > 111111 ..... ///// ..... 11010 01110.I .XXX fcfid > > ok better. ok so the coding of X is correct. Yep, part of SFFS and above.
(In reply to Andrey Miroshnikov from comment #6) > Thanks, I found the optional features listed on pages x-xiii. we are permitted to add any instruction that is not in one of those "all-or-nothing" optional subsets. MMA is an example of one of those "all-or-nothing" subsets. whilst good that you raised it, do not spend further time on this. it's in microwatt, therefore we include it. that's all there is to it. end of the story. https://github.com/antonblanchard/microwatt/blob/63d4553faeea897926f89eb7486dd991cd9e83c8/decode1.vhdl#L246
(In reply to Luke Kenneth Casson Leighton from comment #7) > whilst good that you raised it, do not spend further time on this. it's in > microwatt, therefore we include it. that's all there is to it. end of the > story. > https://github.com/antonblanchard/microwatt/blob/ > 63d4553faeea897926f89eb7486dd991cd9e83c8/decode1.vhdl#L246 No problem, thanks for the clarification. Won't be dwelling on it.
(In reply to Jacob Lifshay from comment #8) > I have previously mentioned please keep this bugreport solely and exclusively on-topic for Matrix Multiply as it is likely to be put in front of an Investor or a Client. i will now delete the off-topic discussion. thank you.
Added links to setvl and remapyield pages: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=b1af6290493d5e24b66a697e18c48198dae336c0 Added side-by-side outer and inner product indices table (maybe unnecessary, we'll see): https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=e753285cd38279c477ebd4a20d9cc56ab0fb7201 https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=f0f633cf4c971a0719859dcdd33ef170eb5f408a Worked example drawings (to show why outer product is inefficient): https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=ec354c22512c8d7ba90ee86b2a566bff198cb045 https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=5b92098f9f10fa004274e1e549e9634a8683219e (These diagrams were done earlier for my understanding, so adding them to the page was trivial) Fixed permute table: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=f38a4ebf02eace64055088c15e44034ed5ab11fd (I may move the whole SVSHAPE SPR section to the appendix as it's too detailed. Still would prefer to keep it, as it aids with some understanding). How to compute the SVxd/SVyd/SVzd parameters (to make sure I can adjust the example for any matrix): https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=e27736d030ed3a806a67652bac3fd3f436aba04e There's probably more to it than that, but for the basic examples I will be doing those rules are sufficient for now. svremp insn breakdown and simulator example (terrible example of putting two separate changes in one commit, I noticed too late) https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=05ccb75bebd8956aa3f0361bcea8c34e6a86d04b https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=c4ef47abccbb5519ffc6b2c8ad1f9d3348dd0b41 None of my explanations cover SVSHAPE1/2/3, but I'm guessing their are set by the svremap instruction? The sole svshape instruction specifies REMAP mode 0, which only deals with SVSHAPE0 to my understanding. A section on the indices generate by remapyield.py, and how those correspond to the actual indices of the operand and result matrices. Section for now incomplete. https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=4be5bec8309348e84bfd3c35df0aebc1a82c5ac7
(In reply to Andrey Miroshnikov from comment #11) > A section on the indices generate by remapyield.py, and how those correspond > to the actual indices of the operand and result matrices. Forgot the quotes for the table: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=b82da23c7a0d368ac811df9b98b1bd670a28e786
andrey it looks good, esp. the images. must convert them soe time. however the explanation of svshape is unnecessarily long, duplicating content that is already in thespec. it is a distraction (and is "spec information" style not "readable" style writing) also this is wrong For Matrix Multiply, SHAPE0 SPR is used: *three* SVSHAPE SPRs are used and they are named SVSHAPEn not SHAPEn (i corrected this earlier last week) you can see the instruction source code, you do not need me to see that: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/simplev.mdwn;h=33a02e#l137 notice how SVSHAPE0-2 are set up? one for A, one for B, one for C in C=AxB... otherwise how can we ever do the schedule for A B and C unless there is an SVSHAPE for each? hmmm i thought i had created a yield-demo which uses remapyield.py to shw that more clearly, but i think (over 2!) years ago i did a demo for DCT and FFT. https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/remap_fft_yield.py;hb=HEAD yyeah there it is. that is standalone executable (designed as such). sorry, didn't do a similar one for matrix, rempyield.py is much more basic it was the first REMAP subsystem i ever did.
(In reply to Andrey Miroshnikov from comment #12) > Forgot the quotes for the table: if you can indent thm back four spaces that will look better.
(In reply to Luke Kenneth Casson Leighton from comment #13) > *three* SVSHAPE SPRs are used and they are named SVSHAPEn not SHAPEn > (i corrected this earlier last week) > > you can see the instruction source code, you do not need me > to see that: > > https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/ > simplev.mdwn;h=33a02e#l137 Thanks for the clarification, I was confused by the "SHAPE Remapping SPRs" section in openpower/sv/remap/ (I thought only SVSHAPE0 is being used). > > notice how SVSHAPE0-2 are set up? > one for A, one for B, one for C in C=AxB... > > otherwise how can we ever do the schedule for A B and C > unless there is an SVSHAPE for each? A quick look does show SVSHAPE0-2 being used and setup, yeah. I'll look over the pseudocode properly on the train. > > yyeah there it is. that is standalone executable (designed as such). > sorry, didn't do a similar one for matrix, rempyield.py is much more > basic it was the first REMAP subsystem i ever did. Only after studying svshape instruction and SVSHAPE0 did I realise just how capable and flexible the configuration can be. However getting used to this paradigm is not easy (kind of frightening when there are so many parameters to tweak).
(In reply to Luke Kenneth Casson Leighton from comment #14) > if you can indent thm back four spaces that will look better. It does indeed look better. https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=6c51cb7909ab90e8a85f5db85477cb40a5032082 (In reply to Andrey Miroshnikov from comment #15) > > https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/ > > simplev.mdwn;h=33a02e#l137 > Thanks for the clarification, I was confused by the "SHAPE Remapping SPRs" > section in openpower/sv/remap/ (I thought only SVSHAPE0 is being used). Forgot to include commit: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=dff78c5a215434f15ffb23db2f1174c969079c40 I haven't remove the duplicated bits yet, but will do that after the meeting.
(In reply to Andrey Miroshnikov from comment #15) > I'll look over the pseudocode properly on the train. it's not that exciting :) basically it does-the-job-in-one-instruction, providing "commonly used" setups that would in some cases require THIRTY OR GREATER instructions to do. i mean, who the heck wants to do all that bitmanipulation? > Only after studying svshape instruction and SVSHAPE0 did I realise just how > capable and flexible the configuration can be. However getting used to this > paradigm is not easy (kind of frightening when there are so many parameters > to tweak). you're not... really... supposed to program SVSHAPE* directly, although you can. the penalty for doing so will in *some* hardware implementations be a slowdown/stall (remember we are *writing a spec* not "writing something solely and exclusively to be implemented by us and us alone", so other potential implementors have to be imagined and then thought through)
Made a bunch of updates to the remap matrix entry. Minor corrections: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=d00d706aad0255ce8bf4e33bf609fecd8cf3f9c5 https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=16b44c819cd3f2922b7fd973201ea3850c576eb1 https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=30be7bc693a9f3ba8dc179eb8b3eff28639257c8 https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=4ee70db0509793a5a5d2cc438f12612099f44e51 Save space by removing indenting: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=92151876263c1213f010c808057b7d1060afc75c Minor formatting fixes: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=725b8e3b43b08823f9d189849f4c66bbd3dfc7f7 Add row/column index table (finally realised how this corresponds to the SVSHAPE's x/y/z indices): https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=68c1a4d9d3200c4b625e1a84d962b9234128e6b6 Remove the duplicated SVSHAPE SPR section, update svremap and maddld sections: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=353c9900c7128084f26a8e23c20a7340ed0f255f Added an brief explanation on why three instructions are sufficient: https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=f36dcf74ead4da26825747b5e1cd4280653e72e6 If this explanation is irrelevant, feel free to take it out. I just thought that it would be helpful. At this point I feel it's done. Let me know if anything else is missing.
(In reply to Andrey Miroshnikov from comment #18) > At this point I feel it's done. > Let me know if anything else is missing. Luke, please confirm if this is considered complete. I've been asked to submit an RfP for Red.
(In reply to Andrey Miroshnikov from comment #19) > Luke, please confirm if this is considered complete. you can see by the status "fixed" that it is indeed. > I've been asked to submit an RfP for Red. can you ask (and advise whomever it was to directly do so i future) to ask on the libre-soc-dev mailing list if they don't know how RFPs work, where i can reply informing you where (and how) the other tasks are that can be checked for RFPs, and in the near future you can add the procedure to bug #1126 so as to be able to advise people on how to check for themselves without you getting utterly exhausted becuse you are repeating it to everyone *privately* dozens of times? (bottom line, message behind above: you will not be doing NLet any favours by submitting one RFP per task completed)