FFT and DCT floating-point butterfly instructions have been implemented, but integer ones have not. once designed (see bug #962) they need to be implemented.
First working implementation/prototype at: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/butterfly.mdwn;h=555e8abd120bc1252c52c2610a5afabe888136f3;hb=refs/heads/maddsubrs
REVERTED *** This bug has been marked as a duplicate of bug 1074 *** *** This bug has been marked as a duplicate of bug 1074 ***
small budget for me as i am making actual changes to the FP instructions, as well. the number of operands is reduced: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f07887871e71669b6cce2dc2aaac53f2b02a4621 https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6fe2b6ccc37181c0f416df3706110ea609377746 also i need to add double variants of some of these instructions.
I added a test case as well as adjusted sv_analysis.py to properly handle maddsubrs. commit 7d8c8cb54774392e018fbd48840ba76c09b1a6af Author: Jacob Lifshay <programmerjake@gmail.com> Date: Wed May 31 23:46:23 2023 -0700 add worked-out svp64 16-bit maddsubrs test case commit 7c3488d2cda99ad682c2e4de31ecc99cf3e64b09 Author: Jacob Lifshay <programmerjake@gmail.com> Date: Wed May 31 23:30:48 2023 -0700 make maddsubrs show up in SVP64 generated CSVs commit ce029ac5663a48665bb8b558f785d078dad8c6a5 Author: Jacob Lifshay <programmerjake@gmail.com> Date: Wed May 31 23:29:59 2023 -0700 raise error on unhandled instruction kind
Latest commit fixes the XLEN != 64 cases. https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=189a461f1c86d21a6813ed8df892c0cf0b6d8acb
I did some work on this (adding the unit test), therefore I should get paid some amount, I'll let luke figure out how much. that's why there's the jacob=0. markos, please don't remove that, it isn't an accident.
(In reply to Konstantinos Margaritis (markos) from comment #5) > Latest commit fixes the XLEN != 64 cases. > > https://git.libre-soc.org/?p=openpower-isa.git;a=commit; > h=189a461f1c86d21a6813ed8df892c0cf0b6d8acb yay! looks good for XLEN<64 you should be able to delete both XLEN=64 branches and just keep the else blocks, since then the whole function is independent of how many bits all the values have. this will make the XLEN=64 case not drop bits -- e.g. right now the XLEN=64 case will return the wrong result for SH=31 with RT=2^32, RA=0, and RB=2^32 (essentially fixed-point 2.0 * 2.0 for 33.31 fixed-point format) whereas afaict the else block would return the correct result, if only you used it for XLEN=64 too. so, basically you can delete lines 24-40 and 74-91 in https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/butterfly.mdwn;h=b6b7efe523c4ee9c9e61dd1fc6ce5ab96d3d7121;hb=189a461f1c86d21a6813ed8df892c0cf0b6d8acb#l24
(In reply to Jacob Lifshay from comment #7) > (In reply to Konstantinos Margaritis (markos) from comment #5) > > Latest commit fixes the XLEN != 64 cases. > > > > https://git.libre-soc.org/?p=openpower-isa.git;a=commit; > > h=189a461f1c86d21a6813ed8df892c0cf0b6d8acb > > yay! looks good for XLEN<64 > > you should be able to delete both XLEN=64 branches and just keep the else > blocks, since then the whole function is independent of how many bits all > the values have. this will make the XLEN=64 case not drop bits -- e.g. right > now the XLEN=64 case will return the wrong result for SH=31 with RT=2^32, > RA=0, and RB=2^32 (essentially fixed-point 2.0 * 2.0 for 33.31 fixed-point > format) whereas afaict the else block would return the correct result, if > only you used it for XLEN=64 too. if this breaks test cases, the test cases may just be incorrect.
while writing unit tests for maddrs, I noticed that it's a 4-in 2-out instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes RT and RS. If you must have that many inputs/outputs, it seems to me that it would be better to just directly compute the expression you care about, since that's also 4-in 2-out, rather than needing 2 instructions: fdct_round_shift(a * c1 +/- b * c2) basically replace maddrs with: Multiply Multiply Add Sub Right Shift A-Form mmaddsubrs RT, RA, RB, SH prod1 <- MULS((RT), (RA)) prod2 <- MULS((RS), (RB)) RT <- round_shift(prod1 + prod2, SH) RS <- round_shift(prod1 - prod2, SH)
(In reply to Jacob Lifshay from comment #9) > while writing unit tests for maddrs, I noticed that it's a 4-in 2-out > instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes > RT and RS. i already explained to markos on IRC why that is prohibited. the instruction has not been updated since that discussion.
(In reply to Luke Kenneth Casson Leighton from comment #10) > (In reply to Jacob Lifshay from comment #9) > > while writing unit tests for maddrs, I noticed that it's a 4-in 2-out > > instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes > > RT and RS. > > i already explained to markos on IRC why that is prohibited. > the instruction has not been updated since that discussion. ah, ok. reading through the pseudo-code for maddsubrs/maddrs I noticed 2 things: 1. all the code to compute masks and merge in the sign bit is not actually necessary, you extract the full XLEN-bit value you need with just: prod1[XLEN - n:XLEN*2 - n - 1] no further adjustment is needed (since casting to the result type is essentially modular reduction, hence ignores the high bits). 2. for maddsubrs: if you sign extend the sum/difference by one bit before multiplying, you avoid errors due to overflow: e.g. use (RT)[0] || (RT) instead of (RT) overflow errors do make it into the final result due to the right-shifting
https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=1ede1d3464b184bae24668bf58d92a9bca73fce4 commit 1ede1d3464b184bae24668bf58d92a9bca73fce4 Author: Jacob Lifshay <programmerjake@gmail.com> Date: Thu Jul 20 15:42:25 2023 -0700 add much more exhaustive maddrs unit tests commit 1967fdf575ea7285800dfbc05f5656b537b9119f Author: Jacob Lifshay <programmerjake@gmail.com> Date: Thu Jul 20 15:19:19 2023 -0700 add much more exhaustive maddsubrs unit tests commit 709d5c02dac8d3ab2b91fb15a63e1d55eb633d2d Author: Jacob Lifshay <programmerjake@gmail.com> Date: Thu Jul 20 15:18:18 2023 -0700 format code
(In reply to Luke Kenneth Casson Leighton from comment #10) > (In reply to Jacob Lifshay from comment #9) > > while writing unit tests for maddrs, I noticed that it's a 4-in 2-out > > instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes > > RT and RS. > > i already explained to markos on IRC why that is prohibited. > the instruction has not been updated since that discussion. a suggestion to end up with a version of maddrs that doesn't have too many registers: split it into the mul-add-shift and the mul-sub-shift parts and have each be a separate instruction: maddrs RT, RA, RB, SH prod <- MULS((RA), (RB)) RT <- round_shift(RT + prod, SH) msubrs RT, RA, RB, SH prod <- MULS((RA), (RB)) RT <- round_shift(RT - prod, SH) the final sequence then could be: maddsubrs r1,r10,0,r11 maddrs r1,r10,r12,14 msubrs r2,r10,r12,14
Latest commit https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=161e3857953adb8258e0069fece330c1b194cbc7 This concludes implementation as per last suggestion by Jacob: maddsubrs r1,r10,0,r11 maddrs r1,r10,r12,14 msubrs r2,r10,r12,14 Documentation updated in: https://libre-soc.org/openpower/sv/twin_butterfly/