1074 – create twin-butterfly research page into DCT/FFT instructions

Bug 1074 - create twin-butterfly research page into DCT/FFT instructions

Summary: create twin-butterfly research page into DCT/FFT instructions

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Specification (show other bugs)
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Konstantinos Margaritis (markos)

URL:	https://libre-soc.org/openpower/sv/tw...

Duplicates (1):	962 (view as bug list)
Depends on:
Blocks:	1076
	Show dependency tree / graph

Reported:	2023-04-27 16:38 BST by Luke Kenneth Casson Leighton
Modified:	2024-03-01 00:16 GMT (History)
CC List:	4 users (show)

See Also:	1028
NLnet milestone:	NLnet.2022-08-051.OPF
total budget (EUR) for completion of task and all subtasks:	2500
budget (EUR) for this task, excluding subtasks' budget:	2500
parent task for budget allocation:	1011
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	lkcl = { amount = 500, submitted = 2024-02-07, paid = 2024-02-29 } markos = { amount = 1600, submitted = 2023-09-03, paid = 2023-09-15 } [jacob] amount = 400 submitted = 2024-02-09 paid = 2024-02-29

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luke Kenneth Casson Leighton 2023-04-27 16:38:41 BST

the FP DCT/FFT twin-butterfly instructions need listing
and explaining, and also the (new) INT DCT/FFT twin-butterfly ones

Comment 1 Luke Kenneth Casson Leighton 2023-04-27 18:04:04 BST

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=5b0a082545185799b7bf053374aa3b60117ef74b

+| NN | RT | RA  | RB   | RC      | sh 01  00 |0 | maddsubrs | BF-Form  |

Comment 2 Luke Kenneth Casson Leighton 2023-04-27 18:17:22 BST

https://libre-soc.org/irclog/%23libre-soc.2023-04-27.log.html#t2023-04-27T16:16:55

Comment 3 Jacob Lifshay 2023-04-27 20:16:05 BST

current pseudocode is 5-in 2-out, since that's too much, i have an idea that might work for how to reduce that:
https://libre-soc.org/irclog/%23libre-soc.2023-04-27.log.html#t2023-04-27T20:06:41
> idea: put the pair of coefficients and accumulated sums each in 1 reg with each value being the lower/upper half of a reg...this should reduce input/output regs to 4-in 1-out
> idk if that'll fit the DCT pattern tho
> this is kinda like how cdtbcd works where the upper and lower halves are independent
> e.g. RT <- ((RT)[0:XLEN/2-1] + prod0) || ((RT)[XLEN/2:XLEN-1] + prod1)
> that way if you set elwid=32 you get 2 16-bit results

Comment 4 Luke Kenneth Casson Leighton 2023-04-27 20:32:24 BST

(In reply to Jacob Lifshay from comment #3)
> current pseudocode is 5-in 2-out, since that's too much, i have an idea that
> might work for how to reduce that:

already sorted. copying the pre-established pattern for ffmadds
it is RA that is used as the input-accumulator.

this is the way twin-butterfly works


RT = RA+RB*RC
RS = RA-RB*RC

where if you look at the unit test and also dig into the DCT
Schedule you find RT=RA and RB=RS.

Comment 5 Luke Kenneth Casson Leighton 2023-04-27 21:02:54 BST

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f8e2c0cb1467391aa7ae4b8b092c281ee2e16a7b

+            with m.If((major == 22) & xo6.matches(
+                    '-01000',  # maddsubrs
+                )):
+                comb += self.implicit_rs.eq(1)
+                comb += self.extend_rb_maxvl.eq(1) # extend RB

this says "RB is the extended register offset by MAXVL", and
detects the opcode 22 XO top 5 bits indicating maddsubrs.

Comment 6 Luke Kenneth Casson Leighton 2023-04-28 12:15:47 BST

i have fdmadds down to 3 operands in the instruction encoding,
it is still 3-in 2-out, FRT is overwrite and FRS destination
is implicit.

DCT-Form

   fdmadds FRT,FRA,FRB (Rc=0)
   fdmadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

   FRS <- FPADD32(FRT, FRB)
   sub <- FPSUB32(FRT, FRB)
   FRT <- FPMUL32(FRA, sub)

astonishingly this worked.  so actually you should be able to
have a full 5-bits shift.

# 1.6.7.2 DCTI-FORM
    |0     | 6    |11      |16     |21    |25      |31  |
    | PO   |  RT  |   RA   |   RB  |   SH |   XO   | Rc |


yes like that :)

which, actually, means it can be added as a variant of A-Form
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text;hb=HEAD


 210 # 1.6.17 A-FORM
 211     |0     |6     |11      |16     |21      |26    |31 |
 212     | PO   |  FRT |   FRA  | FRB   |   FRC  |   XO |Rc |
 213     | PO   |  FRT |   FRA  | FRB   |    /// |   XO |Rc |
 214     | PO   |  FRT |   FRA  |   /// |   FRC  |   XO |Rc |
 215     | PO   |  FRT |    /// | FRB   |    /// |   XO |Rc |
 216     | PO   |   RT |   RA   |   RB  |    BC  |   XO |  /|
 217 

         | PO   |  RT  |   RA   |   RB  |   SH |   XO   | Rc |

fits perfectly.

Comment 7 Luke Kenneth Casson Leighton 2023-04-28 12:21:06 BST

although it looks horrible can i suggest this instead?

    prod1 <- MUL(RB, sum)   # RB = c
    prod2 <- MUL(RB, diff)  # TODO: Pick high half?
    res1 <- ROTL64(prod1, XLEN-SH)
    res2 <- ROTL64(prod2, XLEN-SH)

==>

    prod1 <- MUL(RB, sum)   # RB = c
    prod2 <- MUL(RB, diff)  # TODO: Pick high half?

    res1 <- prod1[XLEN/2-SH:XLEN-1-SH]
    res2 <- prod2[XLEN/2-SH:XLEN-1-SH]

because the ROTL64 actually requires masking out of the top
LSBs.

Comment 8 Luke Kenneth Casson Leighton 2023-04-28 12:27:34 BST

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=9094d03d96dc474267c587fc94b8b5fdc8244227


+    res1 <- ROTL64(prod1, XLEN-SH)
+    res2 <- ROTL64(prod2, XLEN-SH)

ha, excellent.

Comment 9 Luke Kenneth Casson Leighton 2023-04-28 12:29:28 BST

(In reply to Luke Kenneth Casson Leighton from comment #8)
> https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;
> h=9094d03d96dc474267c587fc94b8b5fdc8244227
> 
> 
> +    res1 <- ROTL64(prod1, XLEN-SH)
> +    res2 <- ROTL64(prod2, XLEN-SH)
> 
> ha, excellent.

oh hang on, no, that doesn't quite do it, the MSBs still get
rotated into LSB positions.  or err the the other way round.

Comment 10 Luke Kenneth Casson Leighton 2023-04-28 16:46:40 BST

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6fe2b6ccc37181c0f416df3706110ea609377746

ffmadds is now down to 3 operands in the instruction form,
it is still 3-in 2-out, it's just that it looks like an X-Form
now (10-bit XO) which is great.

Comment 11 Konstantinos Margaritis (markos) 2023-04-28 18:09:49 BST

First working prototype here:

https://libre-soc.org/openpower/sv/twin_butterfly/

Comment 12 Luke Kenneth Casson Leighton 2023-04-28 18:30:50 BST

*** Bug 1028 has been marked as a duplicate of this bug. ***

Comment 13 Luke Kenneth Casson Leighton 2023-04-28 18:31:32 BST

*** Bug 962 has been marked as a duplicate of this bug. ***

Comment 14 Luke Kenneth Casson Leighton 2023-04-29 15:04:50 BST

konstantinos, question: should this be


   (a + b + 1) * c) >> N

?

you see why i suggest that?  it is to do with averaging a+b
where otherwise it is a FLOOR situation.

Comment 15 Luke Kenneth Casson Leighton 2023-04-29 17:47:57 BST

holy cow you're right.  this ends up as one instruction. wow.

    add 9,5,4
    subf 5,5,4
    mullw 9,9,6
    mullw 5,5,6
    addi 9,9,8192
    addi 5,5,8192
    srawi 9,9,14
    srawi 5,5,14

Comment 16 Konstantinos Margaritis (markos) 2023-07-25 16:23:42 BST

Documentation updated in:

https://libre-soc.org/openpower/sv/twin_butterfly/