3-operand FMAC (multiply-and-accumulate) needed: FP16/32/64, with unit tests. https://github.com/gilani/fpfma/blob/fpfma_clean/fpfma.v
should be part of fpmul

the simd integer multiplier needs modification to support it, or we can just
add an adder after
(In reply to Jacob Lifshay from comment #1)
> should be part of fpmul

Yes.

> the simd integer multiplier needs modification to support it, or we can just
> add an adder after

Hm hm, don't know. Might be simpler to do that as a first option, and optimise
later under separate funding / bugs.

I will see if something can be thrown together with the existing pipeline
building blocks, without having to do full normalisation and rounding followed
immediately by denormalisation.

The post-normalisation stages of both mul and add are designed to output more
bits to the rounding phase: it might be possible to just extend the add number
to the same (nonstandard) mantissa width and go from there.
(In reply to Luke Kenneth Casson Leighton from comment #2)
> (In reply to Jacob Lifshay from comment #1)
> > should be part of fpmul
>
> Yes.
>
> > the simd integer multiplier needs modification to support it, or we can just
> > add an adder after

meant adding an integer adder right after the integer mul. it needs to be:
a 53*3-bit-wide adder for fp64
24*3 for fp32

needed to handle the case of a*b + c where a = 0.00xxxx, b = 0.00yyyy, c =
0.zzzz: the unrounded result is 0.zzzzpppppppp where a * b = 0.0000pppppppp

rounding and normalization need to take all of those bits into account
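A rough, illustrative sketch of that width argument in plain Python (the
constants and names here are made up for illustration, not taken from the
fpfma or hardfloat code):

SIG = 53                        # FP64 significand width
prod_width = 2 * SIG            # a*b occupies up to 2*SIG bits
# worst case for widening: c's significand sits entirely above the a*b
# product bits, so the aligned, unrounded sum spans roughly 3*SIG bits
c_sig = (1 << SIG) - 1          # a full c mantissa
prod = (1 << prod_width) - 1    # a full a*b product
aligned_sum = (c_sig << prod_width) | prod
print(aligned_sum.bit_length())  # 159 == 3*53: all of it feeds rounding

i.e. none of the low product bits can be discarded before rounding decides
the final result.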
(In reply to Jacob Lifshay from comment #3)
> meant adding an integer adder right after the integer mul. it needs to be:
> a 53*3-bit-wide adder for fp64
> 24*3 for fp32
>
> needed to handle the case of a*b + c where a = 0.00xxxx, b = 0.00yyyy, c =
> 0.zzzz: the unrounded result is 0.zzzzpppppppp where a * b = 0.0000pppppppp

zowee, a 53*3-wide add. is the gate latency on that ok? I don't know.

> rounding and normalization need to take all of those bits into account

I'm fairly certain there are optimisations involving the rounding modes, and
depending on whether a*b is larger than c or not (or, whether a-exp plus b-exp
is greater than c-exp).

I took a look at hardfloat-3 and it has something like that, although decoding
what is being done is a different matter: no code comments. Will try a
non-optimal version and see what happens.
https://github.com/ucb-bar/berkeley-hardfloat/blob/master/src/main/scala/MulAddRecFN.scala
(In reply to Luke Kenneth Casson Leighton from comment #4)
> (In reply to Jacob Lifshay from comment #3)
>
> > meant adding an integer adder right after the integer mul. it needs to be:
> > a 53*3-bit-wide adder for fp64
> > 24*3 for fp32
> >
> > needed to handle the case of a*b + c where a = 0.00xxxx, b = 0.00yyyy, c =
> > 0.zzzz: the unrounded result is 0.zzzzpppppppp where a * b = 0.0000pppppppp
>
> zowee, a 53*3-wide add. is the gate latency on that ok? I don't know.

ignoring wire delay, n-bit addition can be done in O(log n) time and
O(n * log n) space using carry lookahead.

> > rounding and normalization need to take all of those bits into account
>
> I'm fairly certain there are optimisations involving the rounding modes, and
> depending on whether a*b is larger than c or not (or, whether a-exp plus
> b-exp is greater than c-exp).

yeah, but the hw needs to handle the worst case, which is as above.

don't forget that we need to handle a * b - c as well.
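A throwaway sketch of that parallel-prefix carry-lookahead idea in plain
Python (the function name and the 64-bit width are mine, not from any of the
code above): generate/propagate pairs merge over doubling distances, so the
carry chain is resolved in log2(n) levels.

def prefix_add(a, b, width=64):
    # per-bit generate and propagate
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    d = 1
    while d < width:
        # merge each (g, p) with the group d positions below: log2(width) levels
        g = [g[i] | (p[i] & g[i - d]) if i >= d else g[i] for i in range(width)]
        p = [p[i] & p[i - d] if i >= d else p[i] for i in range(width)]
        d *= 2
    # carry into bit i is the group-generate of bits 0..i-1 (carry-in = 0)
    c = [0] + g[:width - 1]
    return sum(((((a >> i) ^ (b >> i) ^ c[i]) & 1) << i) for i in range(width))

assert prefix_add(0x123456789, 0x987654321) == 0x123456789 + 0x987654321

Laid out as hardware, those per-level combines become gates: O(n log n) area
at O(log n) depth, which is where the figures above come from.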
http://www.jhauser.us/arithmetic/HardFloat-1/doc/HardFloat-Verilog.html
http://www.jhauser.us/arithmetic/HardFloat.html

the source is a .zip archive, where mulAddRecFn.v can be found, and it's
proven and correct. i can do a conversion to nmigen, keeping the logic intact;
it's only 450 lines or so, and we'll not need to do "research".

also i just spotted that there's a really clean "rounding" function which will
be extremely useful.
# XXX check! {doSubMags ? ~sigC : sigC,
#             {(sigSumWidth - sigWidth + 2){doSubMags}}};
extComplSigC.eq(Cat((sigSumWidth - sigWidth + 2){doSubMags}},
                    Mux(doSubMags, ~sigC, sigC))),

anyone know what this translates to?
(In reply to Luke Kenneth Casson Leighton from comment #8)
> # XXX check! {doSubMags ? ~sigC : sigC,
> #             {(sigSumWidth - sigWidth + 2){doSubMags}}};
> extComplSigC.eq(Cat((sigSumWidth - sigWidth + 2){doSubMags}},
>                     Mux(doSubMags, ~sigC, sigC))),
>
> anyone know what this translates to?

I think that's a repeat (Verilog's replication operator).
(In reply to Jacob Lifshay from comment #9)
> (In reply to Luke Kenneth Casson Leighton from comment #8)
> > # XXX check! {doSubMags ? ~sigC : sigC,
> > #             {(sigSumWidth - sigWidth + 2){doSubMags}}};
> > extComplSigC.eq(Cat((sigSumWidth - sigWidth + 2){doSubMags}},
> >                     Mux(doSubMags, ~sigC, sigC))),
> >
> > anyone know what this translates to?
>
> I think that's a repeat (Verilog's replication operator).

yeh, i looked it up; fortunately i found a stackexchange resource that used
double-brackets like this, so i did this:

# XXX check! {doSubMags ? ~sigC : sigC,
#             {(sigSumWidth - sigWidth + 2){doSubMags}}};
sc = [doSubMags] * (sigSumWidth - sigWidth + 2) + \
     [Mux(doSubMags, ~sigC, sigC)]
extComplSigC.eq(Cat(*sc))

so it's 1 bit's worth of doSubMags, repeated ssw-sw+2 times, with sigC tacked
onto the end (the MSB end), inverted if doSubMags is true. possible values
(sketching sigC as a single bit):

0b0111111
0b1111111
0b1000000
0b0000000

something like that
(In reply to Luke Kenneth Casson Leighton from comment #10)
> yeh, i looked it up; fortunately i found a stackexchange resource that used
> double-brackets like this, so i did this:
>
> # XXX check! {doSubMags ? ~sigC : sigC,
> #             {(sigSumWidth - sigWidth + 2){doSubMags}}};
> sc = [doSubMags] * (sigSumWidth - sigWidth + 2) + \
>      [Mux(doSubMags, ~sigC, sigC)]
> extComplSigC.eq(Cat(*sc))

nmigen has a Repl operator that makes the result much cleaner. It translates
directly to Verilog's replication construct.
(In reply to Jacob Lifshay from comment #11)
> nmigen has a Repl operator that makes the result much cleaner. It translates
> directly to Verilog's replication construct.

rats! i keep forgetting about that one. there's several places now where i
used [x] * w. thanks for the reminder.
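For reference, a minimal sketch of what the Repl-based version of that
assignment might look like (the widths below are placeholders for
illustration, not hardfloat's actual sigSumWidth):

from nmigen import Module, Signal, Cat, Repl, Mux

sigWidth = 53                          # placeholder: FP64 significand
sigSumWidth = 3 * sigWidth + 3         # placeholder width only

m = Module()
doSubMags = Signal()
sigC = Signal(sigWidth + 1)
extComplSigC = Signal((sigSumWidth - sigWidth + 2) + (sigWidth + 1))

m.d.comb += extComplSigC.eq(Cat(
    Repl(doSubMags, sigSumWidth - sigWidth + 2),   # LSBs: doSubMags repeated
    Mux(doSubMags, ~sigC, sigC)))                  # MSB end: sigC, inverted on subtract

Cat is LSB-first, so this keeps the same bit ordering as the [x] * w list
version above.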
See https://docs.rs/simple-soft-float/0.1.0/simple_soft_float/struct.Float.html#method.fused_mul_add for the docs for the fused-mul-add implementation.
*** This bug has been marked as a duplicate of bug 877 ***