Bug 323 - create POWER9 MUL pipeline
Summary: create POWER9 MUL pipeline
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: Other Linux
Importance: --- enhancement
Assignee: Jacob Lifshay
URL:
Depends on: 356 419 432 448
Blocks: 383
Reported: 2020-05-19 13:01 BST by Luke Kenneth Casson Leighton
Modified: 2021-04-20 14:47 BST (History)

See Also:
NLnet milestone: NLNet.2019.10.Wishbone
total budget (EUR) for completion of task and all subtasks: 750
budget (EUR) for this task, excluding subtasks' budget: 750
parent task for budget allocation: 383
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
"lkcl"={amount=250, paid=2020-08-21}
"jacob"={amount=500, paid=2020-08-21}


Description Luke Kenneth Casson Leighton 2020-05-19 13:01:46 BST
a MUL pipeline is needed similar to the other pipelines in soc.fu, covering MUL operations.

https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/fu/mul;hb=HEAD
Comment 1 Luke Kenneth Casson Leighton 2020-05-19 13:24:58 BST
there are actually two different types of MUL here.

* VA Form - 3 int in, no carry/overflow
* X Form - usual style just like ALU/Logical

my feelings are mixed as that is a lot of ports if they are combined. still, actually, after some thought, the port allocation after combining is the same as Shift.


# Multiply-Add High Doubleword VA-Form

VA-Form

* maddhd RT,RA,RB,RC

    prod[0:127] <- (RA) * (RB)
    sum[0:127] <- prod + EXTS(RC)
    RT <- sum[0:63]

Special Registers Altered:

    None
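
The pseudocode above can be sketched in plain Python as follows. This is only an illustration, not the soc.fu.mul implementation: the helper names (to_signed, maddhd) are hypothetical, and the operands are treated as signed 64-bit values on the assumption that maddhd is the signed variant of the instruction.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def maddhd(ra, rb, rc):
    """Return the high doubleword of (RA) * (RB) + EXTS(RC)."""
    prod = to_signed(ra, 64) * to_signed(rb, 64)      # prod[0:127]
    total = (prod + to_signed(rc, 64)) & MASK128      # sum[0:127]
    # sum[0:63] in MSB0 numbering is the most significant 64 bits
    return (total >> 64) & MASK64
```

Note that RC is sign-extended before the add, so a negative RC can borrow from the high doubleword even when the product itself is small.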
Comment 2 Luke Kenneth Casson Leighton 2020-05-20 01:35:55 BST
https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=a60febdeb1c572a4b85b410c6519383fc581732d

i moved mul operations over to a MUL Function Unit.  the unit test,
test_pipe_caller.py, when cookie-cut copied over, should then be changed:

                    fn_unit = yield pdecode2.e.fn_unit
                    self.assertEqual(fn_unit, Function.SHIFT_ROT.value)

to:

                    fn_unit = yield pdecode2.e.fn_unit
                    self.assertEqual(fn_unit, Function.MUL.value)

really we should look at some point at deriving a class to contain
the common code from all these tests, soc.fu.*.test.test_pipe_caller.py
Comment 3 Luke Kenneth Casson Leighton 2020-05-27 23:21:25 BST
from microwatt: how to set up the inputs to the mul pipeline.  this can go in main_stage.py when calling the mul unit:



if e_in.is_32bit = '1' then
    if e_in.is_signed = '1' then
        x_to_multiply.data1 <= (others => a_in(31));
        x_to_multiply.data1(31 downto 0) <= a_in(31 downto 0);
        x_to_multiply.data2 <= (others => b_in(31));
        x_to_multiply.data2(31 downto 0) <= b_in(31 downto 0);
    else
        x_to_multiply.data1 <= '0' & x"00000000" & a_in(31 downto 0);
        x_to_multiply.data2 <= '0' & x"00000000" & b_in(31 downto 0);
    end if;
else
    if e_in.is_signed = '1' then
        x_to_multiply.data1 <= a_in(63) & a_in;
        x_to_multiply.data2 <= b_in(63) & b_in;
    else
        x_to_multiply.data1 <= '0' & a_in;
        x_to_multiply.data2 <= '0' & b_in;
    end if;
end if;
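
a rough Python model of the same operand setup, for reference only. The function name extend_operand is hypothetical and this is not nmigen code: it just shows each operand being widened to 65 bits, sign-extended from bit 31 or bit 63 when signed and zero-extended otherwise.

```python
def extend_operand(value, is_32bit, is_signed):
    """Return the 65-bit multiplier input as an unsigned Python int."""
    width = 32 if is_32bit else 64
    value &= (1 << width) - 1          # keep only the source bits
    if is_signed and (value >> (width - 1)) & 1:
        # replicate the sign bit through bit 64 (65 bits in total)
        value |= ((1 << 65) - 1) & ~((1 << width) - 1)
    return value
```

with both operands widened this way, a single signed 65x65 multiplier can serve all four cases (32/64 bit, signed/unsigned).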
Comment 4 Luke Kenneth Casson Leighton 2020-07-06 19:33:33 BST
i made a start on this, no multi stage, just to get at least something moving forward.

immediately found an issue with the simulator pseudocode.  mulli operands are supposed to be signed and it alters the output considerably.

this will need alteration of the pseudocode, even to the extent of creating a special MULS function.
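
The exact simulator failure is not recorded here, so the following is only an illustration of why signedness matters for mulli. It assumes a 16-bit immediate SI and compares sign-extending it against treating it as unsigned; all function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def exts(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mulli_signed(ra, si):
    """Low 64 bits of a signed multiply, with SI sign-extended from 16 bits."""
    return (exts(ra, 64) * exts(si, 16)) & MASK64


def mulli_unsigned_imm(ra, si):
    """Same multiply, but with SI (wrongly) treated as unsigned."""
    return ((ra & MASK64) * (si & 0xFFFF)) & MASK64
```

With SI = 0xFFFF the signed version computes RA * -1 while the unsigned version computes RA * 65535. The two only agree when the immediate is non-negative.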
Comment 5 Luke Kenneth Casson Leighton 2020-07-09 10:54:30 BST
commit 512e2d72912ba57913ab1b1297a085d5fae67181 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Jul 9 10:52:46 2020 +0100

    add new stages etc. to get multiply working without xer_ca

removing xer_ca from the DIV and MUL pipelines (both on input and
output) needs a bit of tweaking.

it's important because unnecessary registers being read/written to
creates dependencies that create chaining and prevent opportunities
for parallelism.
Comment 6 Luke Kenneth Casson Leighton 2020-07-10 22:32:26 BST
hmm to match the exact behaviour of IBM's POWER9 core it is necessary to modify the pseudocode of mulhwu and mulhw to return the 32 bits of the product mapped *twice*.

this is exactly what microwatt does.
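
a minimal sketch of "mapped twice", assuming it means the high 32 bits of a 32x32 product are copied into both halves of the 64-bit result. The function name mul_high_word is hypothetical, and this is neither the actual pseudocode nor microwatt's VHDL.

```python
MASK32 = (1 << 32) - 1


def exts32(value):
    """Sign-extend the low 32 bits of value to a Python int."""
    value &= MASK32
    return value - (1 << 32) if value >> 31 else value


def mul_high_word(ra, rb, signed=True):
    """High word of a 32x32 multiply, duplicated into both result halves."""
    a, b = ra & MASK32, rb & MASK32
    if signed:
        a, b = exts32(a), exts32(b)
    hi = ((a * b) >> 32) & MASK32      # upper 32 bits of the 64-bit product
    return (hi << 32) | hi             # same 32 bits in both halves of RT
```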

the second modification needed is going to be in creating a variable named overflow in the pseudocode and returning it.

the microwatt test is quite neat: overflow is when the hi bits are neither all zero nor all 1s.

this can be easily expressed in the pseudocode.
Comment 7 Luke Kenneth Casson Leighton 2020-07-25 22:28:04 BST
ha, hilarious

    overflow <- ((prod[0:32] != 0x0_0000_0000) &
                 (prod[0:32] != 0x1_ffff_ffff))

that's in hexadecimal, which is 36 bits long, not 33.  so the
pseudocode rightly complains.

i changed it to [0]*33 and [1]*33 and that works.
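
the same 33-bit check sketched in Python, for illustration only. It assumes prod is a 64-bit product and that prod[0:32] in MSB0 numbering is its top 33 bits; the function name overflow_32bit is hypothetical.

```python
ALL_ONES_33 = (1 << 33) - 1


def overflow_32bit(prod):
    """True if a 64-bit product does not fit in a signed 32-bit result."""
    top33 = (prod >> 31) & ALL_ONES_33   # prod[0:32] in MSB0 numbering
    # a valid result has the top 33 bits all zero or all one
    return top33 != 0 and top33 != ALL_ONES_33
```

building the constant from a bit count, as with [1]*33, avoids the width mismatch that a 9-digit hex literal causes.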
Comment 8 Luke Kenneth Casson Leighton 2020-08-18 12:17:55 BST
jacob EUR 500, lkcl EUR 250 on this one i feel is reasonable.  MAC is still TODO, tests are ok. a proof is still needed, however that is separate.