Bug 48 - Complete IEEE754 floating point pipeline
Summary: Complete IEEE754 floating point pipeline
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: ALU (including IEEE754 16/32/64-bit FPU)
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on: 102 114 115 116 118 119 121 122 123 125 126 132 136 512 43 44 74 75 76 77 78 99 101 107 108 111 112 113 117 120 127 129 130
Blocks: 191
Reported: 2019-03-21 11:26 GMT by Luke Kenneth Casson Leighton
Modified: 2020-10-06 13:15 BST

See Also:
NLnet milestone: NLnet.2019.02
total budget (EUR) for completion of task and all subtasks: 15000
budget (EUR) for this task, excluding subtasks' budget: 3225
parent task for budget allocation: 191
child tasks for budget allocation: 43 44 74 75 77 78 99 101 102 106 107 108 111 112 113 114 115 116 117 118 119 120 121 122 123 125 126 129 130 132 136 171 172 173 189
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2019-03-21 11:26:38 GMT
requirements:

* FP16 / FP32 / FP64
* odd/even/normal rounding modes
* bug #76 - RISC-V tininess
* bug #77 - mul (2 operand - as a pipeline and an FSM)
* bug #123 - mul (FMAC - 3 operand - as a pipeline)
* bug #129 - FCMP (FNE, FGE, FLT)
* bug #130 - FMIN/MAX
* bug #78 - div (as a FSM)
* bug #99 - div (as a *pipeline*)
* bug #75 - add (as a pipeline and FSM)
* bug #43 - sqrt   (as a pipeline)
* bug #44 - 1/sqrt (as a pipeline)
* bug #107 - FCVT (float-to-float downsize)
* bug #108 - FCVT (float-to-float upsize)
* bug #111 - FCVT (int-to-float)
* bug #112 - FCVT (float-to-int)
* bug #113 - improve FCVT unit tests
* bug #117 - FCLASS (determine type of float)
* bug #120 - FABS (make positive), FNEG (make negative), FINV (x=-x)
* bug #110 - FPRSQRT RISC-V Opcode needed
* bug #118 - FPFlags needed (to go into FPCSR)
* bug #119 - zero and sign extension needed
* bug #122 - FP software emulation needed, incl RSQRT, incl rounding modes: softfloat-3 cannot do RSQRT.
* bug #136 - partitioned multiplier needs to use Dadda tree algorithm
* bug #132 - partitioned signal (use for SIMDification of FPU)
Comment 1 Luke Kenneth Casson Leighton 2019-03-21 17:47:43 GMT
comment from jacob (to be discussed to create separate milestones):

I think we should split this into the div/mod/sqrt/inv-sqrt pipeline
(referred to as div pipeline hereafter) and the main pipeline. In
particular,  I'm planning on having the div pipeline also handle integer
div/mod and having the main pipeline handle integer multiplication and
probably additional operations.
Comment 2 Luke Kenneth Casson Leighton 2019-04-26 21:51:22 BST
(In reply to Luke Kenneth Casson Leighton from comment #1)
> comment from jacob (to be discussed to create separate milestones):
> 
> I think we should split this into the div/mod/sqrt/inv-sqrt pipeline
> (referred to as div pipeline hereafter) and the main pipeline. In
> particular,  I'm planning on having the div pipeline also handle integer
> div/mod and having the main pipeline handle integer multiplication and
> probably additional operations.

i just created a FMUL bugreport/milestone, which depends on the
integer-mul one, before seeing this, so we're along the same
lines.

didn't occur to me about the DIV.  DIV already exists as an FSM,
so there's no need to duplicate the work already done there.

looking at the code now, the need to split it out does not appear
to be as urgent as it clearly is for MUL, since DIV is nothing
more complex than ADDs, 1-bit shifters, CMPs, XORs and SUBs.

it's just not that complicated.

MUL, on the other hand, is a *massive* block of gates, and the need
to break it out behind an API that gives us the option to reduce
gate count (at the cost of increasing pipeline stage length) is
quite clear.
Comment 3 Luke Kenneth Casson Leighton 2019-04-26 21:54:32 BST
https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/add/nmigen_div_experiment.py;hb=HEAD#l158

relevant code:

    div.quot.eq(div.quot << 1),
    div.rem.eq(Cat(div.dend[-1], div.rem[0:])),
    div.dend.eq(div.dend << 1),

and:

    with m.If(div.rem >= div.dor):
        m.d.sync += [div.quot[0].eq(1),
                     div.rem.eq(div.rem - div.dor),]
    with m.If(div.count == div.width-2):
        m.next = "divide_3"
    with m.Else():
        m.next = "divide_1"
        m.d.sync += div.count.eq(div.count + 1),

so the way i see it, it's hardly even worth sub-classing.
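
For reference, the shift / compare / subtract loop quoted above is essentially restoring division. What follows is a minimal software sketch of that idea in plain Python, not the nmigen code itself: the function name `restoring_divide` and its `width` parameter are made up for illustration, and it iterates a full `width` times rather than using the FSM's `div.width-2` termination test.

```python
from typing import Tuple


def restoring_divide(dividend: int, divisor: int, width: int) -> Tuple[int, int]:
    """Unsigned restoring division, one quotient bit per iteration.

    Hypothetical helper for illustration only (not part of ieee754fpu).
    Returns (quotient, remainder).
    """
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in width bits")

    mask = (1 << width) - 1
    quot, rem, dend = 0, 0, dividend
    for _ in range(width):
        # Shift step: make room for the next quotient bit and move the
        # top bit of the dividend into the bottom of the remainder.
        quot = (quot << 1) & mask
        rem = (rem << 1) | ((dend >> (width - 1)) & 1)
        dend = (dend << 1) & mask
        # Compare/subtract step: if the divisor fits, record a 1 bit.
        if rem >= divisor:
            quot |= 1
            rem -= divisor
    return quot, rem


print(restoring_divide(100, 7, 8))
```

Each iteration needs only a 1-bit shift, a compare and a subtract, which is the point being made above about DIV being cheap in gates compared to MUL.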
Comment 4 Luke Kenneth Casson Leighton 2019-04-26 22:04:30 BST
(In reply to Luke Kenneth Casson Leighton from comment #1)
> comment from jacob (to be discussed to create separate milestones):
> 
> particular,  I'm planning on having the div pipeline also handle integer
> div/mod and having the main pipeline handle integer multiplication and
> probably additional operations.

bear in mind in an OoO (parallel) design, keeping the units separate allows
the developer to decide *how many* of each ALU type shall be deployed, and then
to pass in a part-encoded instruction which tells the pipeline what action
to take.

i would greatly prefer there not to be massive code-duplication between
the single-core barrel processor and the multi-core OoO one because the
pipelines designed for one cannot be used in the other!

would it be ok to design the ALUs to have the *option* to cover multiple
functions, by way of being constructed from an API-selectable *range*
of available pipelineable (and FSM-based) units?

the reason that i ask is because, if you look at Mitch Alsup's 2nd book
chapter, you will see a detailed assessment of how to design a suite of
pipelines based on performance requirements that go so far as to analyse
the application requirements from a statistical analysis perspective...

... and then *specifically* target the design of the pipeline suite *and*
the register file port count to that *specific* analysis.

if the pipeline is specifically *hard-coded* to *require* that it handle
integer and DIV, that type of analysis and specific design targeting is
no longer possible.
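
A rough sketch of what "API-selectable" could look like, as a plain Python behavioural model rather than hardware. Everything here is an assumption made for illustration: `UNIT_REGISTRY`, `ConfigurableALU` and the operation names do not come from the ieee754fpu code.

```python
from typing import Callable, Dict, Iterable

# Hypothetical registry of available function units, keyed by operation name.
UNIT_REGISTRY: Dict[str, Callable[[int, int], int]] = {
    "int_add": lambda a, b: a + b,
    "int_mul": lambda a, b: a * b,
    "int_div": lambda a, b: a // b,
}


class ConfigurableALU:
    """An ALU built from a caller-selected subset of the registry."""

    def __init__(self, ops: Iterable[str]):
        ops = list(ops)
        unknown = [op for op in ops if op not in UNIT_REGISTRY]
        if unknown:
            raise ValueError("unknown units: %s" % unknown)
        self.units = {op: UNIT_REGISTRY[op] for op in ops}

    def supports(self, op: str) -> bool:
        return op in self.units

    def execute(self, op: str, a: int, b: int) -> int:
        if op not in self.units:
            raise ValueError("this ALU was not built with %r" % op)
        return self.units[op](a, b)


# One design might deploy a DIV-only ALU plus a combined MUL/ADD one;
# another might merge all three. The choice is made at construction time.
div_alu = ConfigurableALU(["int_div"])
main_alu = ConfigurableALU(["int_mul", "int_add"])
print(div_alu.execute("int_div", 7, 2), main_alu.execute("int_mul", 3, 4))
```

The same unit definitions can then be reused by a single-issue design and an OoO one, with only the selection differing.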
Comment 5 Luke Kenneth Casson Leighton 2019-07-26 22:39:22 BST
(In reply to Luke Kenneth Casson Leighton from comment #1)

> particular,  I'm planning on having the div pipeline also handle integer
> div/mod and having the main pipeline handle integer multiplication and
> probably additional operations.

Yep got it, raised separate milestones.

ctx.op can be used to select operations, and the information about the capabilities at each Reservation Station needs to be hard-coded into the Dependency Matrices.

If there are no free RSs in which the operands (and operator) can be stored, it is absolutely essential that further instruction issue be frozen, locked up solid, until an RS becomes free.

It is therefore pretty essential that we get the balance right: provide enough ALUs behind the RSs, with each ALU able to handle the right mix of operations.

That said, we cannot go overboard either, which is why I really like the idea of sharing the INT MUL/ADD/DIV and having bypass capability on the early and late stages of FP. See bug #116.
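
To make the stall condition concrete, here is a toy Python model of issue against a small set of Reservation Stations. It is only a sketch under assumptions: the `ReservationStation` class, `try_issue` and the operation names are invented for illustration, and real Dependency Matrices track far more than a busy flag.

```python
from typing import Iterable, List, Optional


class ReservationStation:
    """Hypothetical RS model: a fixed set of supported ops and a busy flag."""

    def __init__(self, name: str, ops: Iterable[str]):
        self.name = name
        self.ops = frozenset(ops)  # capabilities, fixed at design time
        self.busy = False


def try_issue(stations: List[ReservationStation],
              op: str) -> Optional[ReservationStation]:
    """Claim the first free RS able to handle op; None means issue must stall."""
    for rs in stations:
        if not rs.busy and op in rs.ops:
            rs.busy = True
            return rs
    return None


stations = [
    ReservationStation("rs0", {"fmul", "int_mul"}),
    ReservationStation("rs1", {"fdiv", "int_div"}),
]

first = try_issue(stations, "fmul")      # takes rs0
second = try_issue(stations, "int_mul")  # rs0 is busy: issue stalls
first.busy = False                       # rs0 completes and is released
third = try_issue(stations, "int_mul")   # issue can now proceed
print(first.name, second, third.name)
```

With too few stations able to handle MUL, the second instruction stalls even though rs1 is idle, which is the balance problem described above.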
Comment 6 Jacob Lifshay 2020-09-21 16:53:57 BST
sort blocks list
Comment 7 Jacob Lifshay 2020-09-21 21:24:01 BST
change back to original MoU amount