the FP DCT/FFT twin-butterfly instructions need listing and explaining, and also the (new) INT DCT/FFT twin-butterfly ones
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=5b0a082545185799b7bf053374aa3b60117ef74b

+| NN | RT | RA | RB | RC | sh 01 00 |0 | maddsubrs | BF-Form |
https://libre-soc.org/irclog/%23libre-soc.2023-04-27.log.html#t2023-04-27T16:16:55
current pseudocode is 5-in 2-out, since that's too much, i have an idea that might work for how to reduce that:

https://libre-soc.org/irclog/%23libre-soc.2023-04-27.log.html#t2023-04-27T20:06:41

> idea: put the pair of coefficients and accumulated sums each in 1 reg with each value being the lower/upper half of a reg...this should reduce input/output regs to 4-in 1-out
> idk if that'll fit the DCT pattern tho
> this is kinda like how cdtbcd works where the upper and lower halves are independent
> e.g. RT <- ((RT)[0:XLEN/2-1] + prod0) || ((RT)[XLEN/2:XLEN-1] + prod1)
> that way if you set elwid=32 you get 2 16-bit results
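The packed-halves idea quoted above can be sketched in Python roughly as follows. This is only an illustration of the quoted one-line pseudocode, not the instruction's actual definition: the function name `packed_accumulate` is made up here, and it assumes that bit 0 is the MSB (Power ISA numbering) and that each half wraps independently with no carry into the other half.

```python
def packed_accumulate(rt, prod0, prod1, xlen=64):
    """Sketch of RT <- ((RT)[0:XLEN/2-1] + prod0) || ((RT)[XLEN/2:XLEN-1] + prod1).

    Assumption: MSB0 numbering, so bits 0..XLEN/2-1 are the upper half.
    Each half is truncated to XLEN/2 bits, so no carry crosses halves.
    """
    half = xlen // 2
    mask = (1 << half) - 1
    upper = ((rt >> half) + prod0) & mask   # upper half accumulates prod0
    lower = ((rt & mask) + prod1) & mask    # lower half accumulates prod1
    return (upper << half) | lower


# upper half holds 1, lower half holds 2
rt = (1 << 32) | 2
print(hex(packed_accumulate(rt, 10, 20)))
```

With `xlen=32` the same function yields two independent 16-bit results, which is the elwid=32 case mentioned in the quote.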
(In reply to Jacob Lifshay from comment #3)
> current pseudocode is 5-in 2-out, since that's too much, i have an idea that
> might work for how to reduce that:

already sorted. copying the pre-established pattern for ffmadds it is RA that is used as the input-accumulator. this is the way twin-butterfly works

    RT = RA+RB*RC
    RS = RA-RB*RC

where if you look at the unit test and also dig into the DCT Schedule you find RT=RA and RB=RS.
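For anyone following along, the two equations in this comment amount to the following (a minimal sketch using plain Python integers; the function name is made up for illustration and no register widths, rounding or overflow behaviour are modelled):

```python
def twin_butterfly(ra, rb, rc):
    """Return (rt, rs) where rt = ra + rb*rc and rs = ra - rb*rc."""
    prod = rb * rc          # the product is shared by both outputs
    return ra + prod, ra - prod


rt, rs = twin_butterfly(10, 3, 4)
print(rt, rs)  # 22 -2
```

One multiply feeds both the sum and the difference, which is why the operation is 3-in 2-out.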
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f8e2c0cb1467391aa7ae4b8b092c281ee2e16a7b

+ with m.If((major == 22) & xo6.matches(
+     '-01000', # maddsubrs
+ )):
+     comb += self.implicit_rs.eq(1)
+     comb += self.extend_rb_maxvl.eq(1) # extend RB

this says "RB is the extended register offset by MAXVL", and detects the opcode 22 XO top 5 bits indicating maddsubrs.
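As a rough software analogue of the decode condition in that diff: the helper below treats the pattern string as written MSB first with '-' meaning "don't care", which is how the pattern appears to be used there. The helper names are hypothetical, and how `major` and `xo6` are extracted from the instruction word is not shown in the diff and is not modelled.

```python
def matches(value, pattern):
    """True if value matches pattern (MSB first, '-' = don't care)."""
    width = len(pattern)
    bits = format(value & ((1 << width) - 1), "0{}b".format(width))
    return all(p in ("-", b) for p, b in zip(pattern, bits))


def is_maddsubrs(major, xo6):
    """Mirror of: (major == 22) & xo6.matches('-01000')."""
    return major == 22 and matches(xo6, "-01000")


print(is_maddsubrs(22, 0b001000), is_maddsubrs(22, 0b101000))
```

Both values of the don't-care bit are accepted, so two 6-bit XO values select the same instruction.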
i have fdmadds down to 3 operands in the instruction encoding, it is still 3-in 2-out, FRT is overwrite and FRS destination is implicit.

DCT-Form

    fdmadds  FRT,FRA,FRB (Rc=0)
    fdmadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

    FRS <- FPADD32(FRT, FRB)
    sub <- FPSUB32(FRT, FRB)
    FRT <- FPMUL32(FRA, sub)

astonishingly this worked.

so actually you should be able to have a full 5-bits shift.

    # 1.6.7.2 DCTI-FORM
    |0 | 6 |11 |16 |21 |25 |31 |
    | PO | RT | RA | RB | SH | XO | Rc |

yes like that :) which, actually, means it can be added as a variant of A-Form

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text;hb=HEAD

    # 1.6.17 A-FORM
    |0 |6 |11 |16 |21 |26 |31 |
    | PO | FRT | FRA | FRB | FRC | XO |Rc |
    | PO | FRT | FRA | FRB | /// | XO |Rc |
    | PO | FRT | FRA | /// | FRC | XO |Rc |
    | PO | FRT | /// | FRB | /// | XO |Rc |
    | PO | RT | RA | RB | BC | XO | /|
    | PO | RT | RA | RB | SH | XO | Rc |

fits perfectly.
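The fdmadds pseudo-code in this comment can be tried out with a small Python model. This is a sketch under assumptions, not the simulator's implementation: `fp32` is a made-up helper that rounds a Python double to single precision via `struct`, and FPADD32/FPSUB32/FPMUL32 are assumed to be plain single-precision add, subtract and multiply with round-to-nearest (no flags, NaN handling or Rc=1 behaviour).

```python
import struct


def fp32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fdmadds(frt, fra, frb):
    """Return (new_frt, frs) following the pseudo-code in the comment."""
    frt, fra, frb = fp32(frt), fp32(fra), fp32(frb)
    frs = fp32(frt + frb)       # FRS <- FPADD32(FRT, FRB)
    sub = fp32(frt - frb)       # sub <- FPSUB32(FRT, FRB)
    new_frt = fp32(fra * sub)   # FRT <- FPMUL32(FRA, sub)
    return new_frt, frs


print(fdmadds(1.5, 2.0, 0.5))  # (2.0, 2.0)
```

FRT is read before it is overwritten, so the model returns the new FRT alongside the implicit FRS result rather than mutating anything.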
although it looks horrible can i suggest this instead?

    prod1 <- MUL(RB, sum)    # RB = c
    prod2 <- MUL(RB, diff)   # TODO: Pick high half?
    res1 <- ROTL64(prod1, XLEN-SH)
    res2 <- ROTL64(prod2, XLEN-SH)

==>

    prod1 <- MUL(RB, sum)    # RB = c
    prod2 <- MUL(RB, diff)   # TODO: Pick high half?
    res1 <- prod1[XLEN/2-SH:XLEN-1-SH]
    res2 <- prod2[XLEN/2-SH:XLEN-1-SH]

because the ROTL64 actually requires masking out the LSBs that get rotated round into the top bits.
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=9094d03d96dc474267c587fc94b8b5fdc8244227

+ res1 <- ROTL64(prod1, XLEN-SH)
+ res2 <- ROTL64(prod2, XLEN-SH)

ha, excellent.
(In reply to Luke Kenneth Casson Leighton from comment #8)
> https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;
> h=9094d03d96dc474267c587fc94b8b5fdc8244227
>
> + res1 <- ROTL64(prod1, XLEN-SH)
> + res2 <- ROTL64(prod2, XLEN-SH)
>
> ha, excellent.

oh hang on, no, that doesn't quite do it, the MSBs still get rotated into LSB positions. or err the other way round.
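To make the rotate-versus-slice point concrete, here is a small Python comparison. It is a sketch with assumptions: XLEN is taken as 64, `prod` is treated as an XLEN-bit value, bit ranges use MSB0 numbering, and `rotl64` / `field_msb0` are illustrative helpers rather than the simulator's own functions.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def rotl64(x, n):
    """Rotate a 64-bit value left by n bits."""
    n %= XLEN
    x &= MASK
    return ((x << n) | (x >> (XLEN - n))) & MASK


def field_msb0(x, start, end, width=XLEN):
    """Extract bits start..end inclusive, where bit 0 is the MSB."""
    nbits = end - start + 1
    shift = width - 1 - end
    return (x >> shift) & ((1 << nbits) - 1)


prod, sh = 0x12345, 4
rot = rotl64(prod, XLEN - sh)                          # rotate right by SH
sl = field_msb0(prod, XLEN // 2 - sh, XLEN - 1 - sh)   # prod[XLEN/2-SH:XLEN-1-SH]
print(hex(rot))  # 0x5000000000001234
print(hex(sl))   # 0x1234
```

The low SH bits of the product (0x5 here) wrap round into the top of the rotated value, whereas the bit-range form simply drops them. The two agree only after the rotated result is masked down to the lower half.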
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6fe2b6ccc37181c0f416df3706110ea609377746

ffmadds is now down to 3 operands in the instruction form, it is still 3-in 2-out, it's just that it looks like an X-Form now (10-bit XO) which is great.
First working prototype here: https://libre-soc.org/openpower/sv/twin_butterfly/
*** Bug 1028 has been marked as a duplicate of this bug. ***
*** Bug 962 has been marked as a duplicate of this bug. ***
konstantinos, question: should this be ((a + b + 1) * c) >> N ?

you see why i suggest that? it is to do with averaging a+b where otherwise it is a FLOOR situation.
holy cow you're right. this ends up as one instruction. wow.

    add 9,5,4
    subf 5,5,4
    mullw 9,9,6
    mullw 5,5,6
    addi 9,9,8192
    addi 5,5,8192
    srawi 9,9,14
    srawi 5,5,14
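For reference, the eight-instruction sequence in this comment can be modelled in Python as below. This is a sketch with assumptions: `a`, `b` and `c` stand for the initial contents of r5, r4 and r6 (that naming is mine), results are treated as signed 32-bit values the way srawi sees them, and the CA side effect of srawi is ignored. A plain floor version is included only for comparison.

```python
def to_s32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def butterfly_round(a, b, c, n=14):
    """Model of the add/subf/mullw/addi/srawi sequence (a=r5, b=r4, c=r6)."""
    half = 1 << (n - 1)                 # 8192 when n = 14
    s = a + b                           # add   9,5,4
    d = b - a                           # subf  5,5,4  (computes r4 - r5)
    rs = to_s32(s * c + half) >> n      # mullw, addi, srawi on r9
    rd = to_s32(d * c + half) >> n      # mullw, addi, srawi on r5
    return rs, rd


def butterfly_floor(a, b, c, n=14):
    """Same computation without the rounding constant."""
    return to_s32((a + b) * c) >> n, to_s32((b - a) * c) >> n


# c = 11585 is roughly cos(pi/4) scaled by 2**14
print(butterfly_round(3, 5, 11585))  # (6, 1)
print(butterfly_floor(3, 5, 11585))  # (5, 1)
```

Here 8 * 0.7071 is about 5.66: adding 8192 before the shift rounds it to 6, while the plain shift floors it to 5.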
Documentation updated in: https://libre-soc.org/openpower/sv/twin_butterfly/