Bug 1028 - implement integer-versions of fft/dct "butterfly" instructions in ISACaller Simulator
Summary: implement integer-versions of fft/dct "butterfly" instructions in ISACaller S...
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on: 962
Blocks: 1076
  Show dependency treegraph
 
Reported: 2023-03-15 17:03 GMT by Luke Kenneth Casson Leighton
Modified: 2024-03-10 20:30 GMT (History)
3 users (show)

See Also:
NLnet milestone: NLnet.2022-08-107.ongoing
total budget (EUR) for completion of task and all subtasks: 4000
budget (EUR) for this task, excluding subtasks' budget: 4000
parent task for budget allocation: 1027
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
lkcl = { amount = 500, submitted = 2024-02-26,paid = 2024-03-08 } donated = { amount = 2700, submitted = 2024-02-26,paid = 2024-03-08 } [jacob] amount = 800 submitted = 2024-02-26 paid = 2024-03-08


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2023-03-15 17:03:25 GMT
FFT and DCT floating-point butterfly instructions have been implemented,
but integer ones have not. once designed (see bug #962) they need to be
implemented.
Comment 2 Luke Kenneth Casson Leighton 2023-04-28 18:30:50 BST
REVERTED *** This bug has been marked as a duplicate of bug 1074 ***

*** This bug has been marked as a duplicate of bug 1074 ***
Comment 3 Luke Kenneth Casson Leighton 2023-04-29 16:46:27 BST
small budget for me as i am making actual changes to the FP
instructions, as well.  the number of operands is reduced:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f07887871e71669b6cce2dc2aaac53f2b02a4621
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6fe2b6ccc37181c0f416df3706110ea609377746

also i need to add double variants of some of these instructions.
Comment 4 Jacob Lifshay 2023-06-01 07:51:18 BST
I added a test case as well as adjusted sv_analysis.py to properly handle maddsubrs.

commit 7d8c8cb54774392e018fbd48840ba76c09b1a6af
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed May 31 23:46:23 2023 -0700

    add worked-out svp64 16-bit maddsubrs test case

commit 7c3488d2cda99ad682c2e4de31ecc99cf3e64b09
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed May 31 23:30:48 2023 -0700

    make maddsubrs show up in SVP64 generated CSVs

commit ce029ac5663a48665bb8b558f785d078dad8c6a5
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed May 31 23:29:59 2023 -0700

    raise error on unhandled instruction kind
Comment 5 Konstantinos Margaritis (markos) 2023-07-19 14:51:11 BST
Latest commit fixes the XLEN != 64 cases.

https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=189a461f1c86d21a6813ed8df892c0cf0b6d8acb
Comment 6 Jacob Lifshay 2023-07-19 17:10:33 BST
I did some work on this (adding the unit test), therefore I should get paid some amount, I'll let luke figure out how much. that's why there's the jacob=0. markos, please don't remove that, it isn't an accident.
Comment 7 Jacob Lifshay 2023-07-19 17:33:43 BST
(In reply to Konstantinos Margaritis (markos) from comment #5)
> Latest commit fixes the XLEN != 64 cases.
> 
> https://git.libre-soc.org/?p=openpower-isa.git;a=commit;
> h=189a461f1c86d21a6813ed8df892c0cf0b6d8acb

yay! looks good for XLEN<64

you should be able to delete both XLEN=64 branches and just keep the else blocks, since then the whole function is independent of how many bits all the values have. this will make the XLEN=64 case not drop bits -- e.g. right now the XLEN=64 case will return the wrong result for SH=31 with RT=2^32, RA=0, and RB=2^32 (essentially fixed-point 2.0 * 2.0 for 33.31 fixed-point format) whereas afaict the else block would return the correct result, if only you used it for XLEN=64 too.

so, basically you can delete lines 24-40 and 74-91 in https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/butterfly.mdwn;h=b6b7efe523c4ee9c9e61dd1fc6ce5ab96d3d7121;hb=189a461f1c86d21a6813ed8df892c0cf0b6d8acb#l24
Comment 8 Jacob Lifshay 2023-07-19 17:35:28 BST
(In reply to Jacob Lifshay from comment #7)
> (In reply to Konstantinos Margaritis (markos) from comment #5)
> > Latest commit fixes the XLEN != 64 cases.
> > 
> > https://git.libre-soc.org/?p=openpower-isa.git;a=commit;
> > h=189a461f1c86d21a6813ed8df892c0cf0b6d8acb
> 
> yay! looks good for XLEN<64
> 
> you should be able to delete both XLEN=64 branches and just keep the else
> blocks, since then the whole function is independent of how many bits all
> the values have. this will make the XLEN=64 case not drop bits -- e.g. right
> now the XLEN=64 case will return the wrong result for SH=31 with RT=2^32,
> RA=0, and RB=2^32 (essentially fixed-point 2.0 * 2.0 for 33.31 fixed-point
> format) whereas afaict the else block would return the correct result, if
> only you used it for XLEN=64 too.

if this breaks test cases, the test cases may just be incorrect.
Comment 9 Jacob Lifshay 2023-07-20 23:36:47 BST
while writing unit tests for maddrs, I noticed that it's a 4-in 2-out instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes RT and RS.

If you must have that many inputs/outputs, it seems to me that it would be better to just directly compute the expression you care about, since that's also 4-in 2-out, rather than needing 2 instructions:
fdct_round_shift(a * c1  +/- b * c2)

basically replace maddrs with:

Multiply Multiply Add Sub Right Shift A-Form

mmaddsubrs RT, RA, RB, SH

prod1 <- MULS((RT), (RA))
prod2 <- MULS((RS), (RB))
RT <- round_shift(prod1 + prod2, SH)
RS <- round_shift(prod1 - prod2, SH)
Comment 10 Luke Kenneth Casson Leighton 2023-07-20 23:45:43 BST
(In reply to Jacob Lifshay from comment #9)
> while writing unit tests for maddrs, I noticed that it's a 4-in 2-out
> instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes
> RT and RS.

i already explained to markos on IRC why that is prohibited.
the instruction has not been updated since that discussion.
Comment 11 Jacob Lifshay 2023-07-20 23:52:00 BST
(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to Jacob Lifshay from comment #9)
> > while writing unit tests for maddrs, I noticed that it's a 4-in 2-out
> > instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes
> > RT and RS.
> 
> i already explained to markos on IRC why that is prohibited.
> the instruction has not been updated since that discussion.

ah, ok.


reading through the pseudo-code for maddsubrs/maddrs I noticed 2 things:
1. all the code to compute masks and merge in the sign bit is not actually necessary, you extract the full XLEN-bit value you need with just:
prod1[XLEN - n:XLEN*2 - n - 1]

no further adjustment is needed (since casting to the result type is essentially modular reduction, hence ignores the high bits).

2. for maddsubrs: if you sign extend the sum/difference by one bit before multiplying, you avoid errors due to overflow:
e.g. use (RT)[0] || (RT) instead of (RT)

overflow errors do make it into the final result due to the right-shifting
Comment 12 Jacob Lifshay 2023-07-20 23:56:34 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=1ede1d3464b184bae24668bf58d92a9bca73fce4

commit 1ede1d3464b184bae24668bf58d92a9bca73fce4
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Thu Jul 20 15:42:25 2023 -0700

    add much more exhaustive maddrs unit tests

commit 1967fdf575ea7285800dfbc05f5656b537b9119f
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Thu Jul 20 15:19:19 2023 -0700

    add much more exhaustive maddsubrs unit tests

commit 709d5c02dac8d3ab2b91fb15a63e1d55eb633d2d
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Thu Jul 20 15:18:18 2023 -0700

    format code
Comment 13 Jacob Lifshay 2023-07-21 00:06:37 BST
(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to Jacob Lifshay from comment #9)
> > while writing unit tests for maddrs, I noticed that it's a 4-in 2-out
> > instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes
> > RT and RS.
> 
> i already explained to markos on IRC why that is prohibited.
> the instruction has not been updated since that discussion.

a suggestion to end up with a version of maddrs that doesn't have too many registers: split it into the mul-add-shift and the mul-sub-shift parts and have each be a separate instruction:

maddrs RT, RA, RB, SH

prod <- MULS((RA), (RB))
RT <- round_shift(RT + prod, SH)


msubrs RT, RA, RB, SH

prod <- MULS((RA), (RB))
RT <- round_shift(RT - prod, SH)


the final sequence then could be:
maddsubrs r1,r10,0,r11
maddrs r1,r10,r12,14
msubrs r2,r10,r12,14
Comment 14 Konstantinos Margaritis (markos) 2023-07-25 16:23:19 BST
Latest commit 

https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=161e3857953adb8258e0069fece330c1b194cbc7

This concludes implementation as per last suggestion by Jacob:

maddsubrs r1,r10,0,r11
maddrs r1,r10,r12,14
msubrs r2,r10,r12,14

Documentation updated in:

https://libre-soc.org/openpower/sv/twin_butterfly/