1028 – implement integer-versions of fft/dct "butterfly" instructions in ISACaller Simulator

Bug 1028 - implement integer-versions of fft/dct "butterfly" instructions in ISACaller Simulator

Summary: implement integer-versions of fft/dct "butterfly" instructions in ISACaller S...

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:	962
Blocks:	1076
	Show dependency tree / graph

Reported:	2023-03-15 17:03 GMT by Luke Kenneth Casson Leighton
Modified:	2024-03-10 20:30 GMT (History)
CC List:	3 users (show)

See Also:	962 1042 1074
NLnet milestone:	NLnet.2022-08-107.ongoing
total budget (EUR) for completion of task and all subtasks:	4000
budget (EUR) for this task, excluding subtasks' budget:	4000
parent task for budget allocation:	1027
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	lkcl = { amount = 500, submitted = 2024-02-26,paid = 2024-03-08 } donated = { amount = 2700, submitted = 2024-02-26,paid = 2024-03-08 } [jacob] amount = 800 submitted = 2024-02-26 paid = 2024-03-08

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luke Kenneth Casson Leighton 2023-03-15 17:03:25 GMT

FFT and DCT floating-point butterfly instructions have been implemented,
but integer ones have not. once designed (see bug #962) they need to be
implemented.

Comment 1 Konstantinos Margaritis (markos) 2023-04-28 18:13:00 BST

First working implementation/prototype at:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/butterfly.mdwn;h=555e8abd120bc1252c52c2610a5afabe888136f3;hb=refs/heads/maddsubrs

Comment 2 Luke Kenneth Casson Leighton 2023-04-28 18:30:50 BST

REVERTED *** This bug has been marked as a duplicate of bug 1074 ***

*** This bug has been marked as a duplicate of bug 1074 ***

Comment 3 Luke Kenneth Casson Leighton 2023-04-29 16:46:27 BST

small budget for me as i am making actual changes to the FP
instructions, as well.  the number of operands is reduced:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f07887871e71669b6cce2dc2aaac53f2b02a4621
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6fe2b6ccc37181c0f416df3706110ea609377746

also i need to add double variants of some of these instructions.

Comment 4 Jacob Lifshay 2023-06-01 07:51:18 BST

I added a test case as well as adjusted sv_analysis.py to properly handle maddsubrs.

commit 7d8c8cb54774392e018fbd48840ba76c09b1a6af
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed May 31 23:46:23 2023 -0700

    add worked-out svp64 16-bit maddsubrs test case

commit 7c3488d2cda99ad682c2e4de31ecc99cf3e64b09
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed May 31 23:30:48 2023 -0700

    make maddsubrs show up in SVP64 generated CSVs

commit ce029ac5663a48665bb8b558f785d078dad8c6a5
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed May 31 23:29:59 2023 -0700

    raise error on unhandled instruction kind

Comment 5 Konstantinos Margaritis (markos) 2023-07-19 14:51:11 BST

Latest commit fixes the XLEN != 64 cases.

https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=189a461f1c86d21a6813ed8df892c0cf0b6d8acb

Comment 6 Jacob Lifshay 2023-07-19 17:10:33 BST

I did some work on this (adding the unit test), therefore I should get paid some amount, I'll let luke figure out how much. that's why there's the jacob=0. markos, please don't remove that, it isn't an accident.

Comment 7 Jacob Lifshay 2023-07-19 17:33:43 BST

(In reply to Konstantinos Margaritis (markos) from comment #5)
> Latest commit fixes the XLEN != 64 cases.
> 
> https://git.libre-soc.org/?p=openpower-isa.git;a=commit;
> h=189a461f1c86d21a6813ed8df892c0cf0b6d8acb

yay! looks good for XLEN<64

you should be able to delete both XLEN=64 branches and just keep the else blocks, since then the whole function is independent of how many bits all the values have. this will make the XLEN=64 case not drop bits -- e.g. right now the XLEN=64 case will return the wrong result for SH=31 with RT=2^32, RA=0, and RB=2^32 (essentially fixed-point 2.0 * 2.0 for 33.31 fixed-point format) whereas afaict the else block would return the correct result, if only you used it for XLEN=64 too.

so, basically you can delete lines 24-40 and 74-91 in https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/butterfly.mdwn;h=b6b7efe523c4ee9c9e61dd1fc6ce5ab96d3d7121;hb=189a461f1c86d21a6813ed8df892c0cf0b6d8acb#l24

Comment 8 Jacob Lifshay 2023-07-19 17:35:28 BST

(In reply to Jacob Lifshay from comment #7)
> (In reply to Konstantinos Margaritis (markos) from comment #5)
> > Latest commit fixes the XLEN != 64 cases.
> > 
> > https://git.libre-soc.org/?p=openpower-isa.git;a=commit;
> > h=189a461f1c86d21a6813ed8df892c0cf0b6d8acb
> 
> yay! looks good for XLEN<64
> 
> you should be able to delete both XLEN=64 branches and just keep the else
> blocks, since then the whole function is independent of how many bits all
> the values have. this will make the XLEN=64 case not drop bits -- e.g. right
> now the XLEN=64 case will return the wrong result for SH=31 with RT=2^32,
> RA=0, and RB=2^32 (essentially fixed-point 2.0 * 2.0 for 33.31 fixed-point
> format) whereas afaict the else block would return the correct result, if
> only you used it for XLEN=64 too.

if this breaks test cases, the test cases may just be incorrect.

Comment 9 Jacob Lifshay 2023-07-20 23:36:47 BST

while writing unit tests for maddrs, I noticed that it's a 4-in 2-out instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes RT and RS.

If you must have that many inputs/outputs, it seems to me that it would be better to just directly compute the expression you care about, since that's also 4-in 2-out, rather than needing 2 instructions:
fdct_round_shift(a * c1  +/- b * c2)

basically replace maddrs with:

Multiply Multiply Add Sub Right Shift A-Form

mmaddsubrs RT, RA, RB, SH

prod1 <- MULS((RT), (RA))
prod2 <- MULS((RS), (RB))
RT <- round_shift(prod1 + prod2, SH)
RS <- round_shift(prod1 - prod2, SH)

Comment 10 Luke Kenneth Casson Leighton 2023-07-20 23:45:43 BST

(In reply to Jacob Lifshay from comment #9)
> while writing unit tests for maddrs, I noticed that it's a 4-in 2-out
> instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes
> RT and RS.

i already explained to markos on IRC why that is prohibited.
the instruction has not been updated since that discussion.

Comment 11 Jacob Lifshay 2023-07-20 23:52:00 BST

(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to Jacob Lifshay from comment #9)
> > while writing unit tests for maddrs, I noticed that it's a 4-in 2-out
> > instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes
> > RT and RS.
> 
> i already explained to markos on IRC why that is prohibited.
> the instruction has not been updated since that discussion.

ah, ok.


reading through the pseudo-code for maddsubrs/maddrs I noticed 2 things:
1. all the code to compute masks and merge in the sign bit is not actually necessary, you extract the full XLEN-bit value you need with just:
prod1[XLEN - n:XLEN*2 - n - 1]

no further adjustment is needed (since casting to the result type is essentially modular reduction, hence ignores the high bits).

2. for maddsubrs: if you sign extend the sum/difference by one bit before multiplying, you avoid errors due to overflow:
e.g. use (RT)[0] || (RT) instead of (RT)

overflow errors do make it into the final result due to the right-shifting

Comment 12 Jacob Lifshay 2023-07-20 23:56:34 BST

https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=1ede1d3464b184bae24668bf58d92a9bca73fce4

commit 1ede1d3464b184bae24668bf58d92a9bca73fce4
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Thu Jul 20 15:42:25 2023 -0700

    add much more exhaustive maddrs unit tests

commit 1967fdf575ea7285800dfbc05f5656b537b9119f
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Thu Jul 20 15:19:19 2023 -0700

    add much more exhaustive maddsubrs unit tests

commit 709d5c02dac8d3ab2b91fb15a63e1d55eb633d2d
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Thu Jul 20 15:18:18 2023 -0700

    format code

Comment 13 Jacob Lifshay 2023-07-21 00:06:37 BST

(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to Jacob Lifshay from comment #9)
> > while writing unit tests for maddrs, I noticed that it's a 4-in 2-out
> > instruction (most likely too many) -- it reads RA, RB, RT, and RS and writes
> > RT and RS.
> 
> i already explained to markos on IRC why that is prohibited.
> the instruction has not been updated since that discussion.

a suggestion to end up with a version of maddrs that doesn't have too many registers: split it into the mul-add-shift and the mul-sub-shift parts and have each be a separate instruction:

maddrs RT, RA, RB, SH

prod <- MULS((RA), (RB))
RT <- round_shift(RT + prod, SH)


msubrs RT, RA, RB, SH

prod <- MULS((RA), (RB))
RT <- round_shift(RT - prod, SH)


the final sequence then could be:
maddsubrs r1,r10,0,r11
maddrs r1,r10,r12,14
msubrs r2,r10,r12,14

Comment 14 Konstantinos Margaritis (markos) 2023-07-25 16:23:19 BST

Latest commit 

https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=161e3857953adb8258e0069fece330c1b194cbc7

This concludes implementation as per last suggestion by Jacob:

maddsubrs r1,r10,0,r11
maddrs r1,r10,r12,14
msubrs r2,r10,r12,14

Documentation updated in:

https://libre-soc.org/openpower/sv/twin_butterfly/