requirements:

* FP16 / FP32 / FP64
* odd/even/normal rounding modes
* bug #76 - RISC-V tininess
* bug #77 - mul (2 operand - as a pipeline and an FSM)
* bug #123 - mul (FMAC - 3 operand - as a pipeline)
* bug #129 - FCMP (FNE, FGE, FLT)
* bug #130 - FMIN/MAX
* bug #78 - div (as an FSM)
* bug #99 - div (as a *pipeline*)
* bug #75 - add (as a pipeline and FSM)
* bug #43 - sqrt (as a pipeline)
* bug #44 - 1/sqrt (as a pipeline)
* bug #107 - FCVT (float-to-float downsize)
* bug #108 - FCVT (float-to-float upsize)
* bug #111 - FCVT (int-to-float)
* bug #112 - FCVT (float-to-int)
* bug #113 - improve FCVT unit tests
* bug #117 - FCLASS (determine type of float)
* bug #120 - FABS (make positive), FNEG (make negative), FINV (x=-x)
* bug #110 - FPRSQRT RISC-V Opcode needed
* bug #118 - FPFlags needed (to go into FPCSR)
* bug #119 - zero and sign extension needed
* bug #122 - FP software emulation needed, incl RSQRT, incl rounding modes: softfloat-3 cannot do RSQRT.
* bug #136 - partitioned multiplier needs to use Dadda tree algorithm
comment from jacob (to be discussed to create separate milestones): I think we should split this into the div/mod/sqrt/inv-sqrt pipeline (referred to as div pipeline hereafter) and the main pipeline. In particular, I'm planning on having the div pipeline also handle integer div/mod and having the main pipeline handle integer multiplication and probably additional operations.
(In reply to Luke Kenneth Casson Leighton from comment #1)
> comment from jacob (to be discussed to create separate milestones):
>
> I think we should split this into the div/mod/sqrt/inv-sqrt pipeline
> (referred to as div pipeline hereafter) and the main pipeline. In
> particular, I'm planning on having the div pipeline also handle integer
> div/mod and having the main pipeline handle integer multiplication and
> probably additional operations.

i just created a FMUL bugreport/milestone, which depends on the integer-mul one, before seeing this, so we're along the same lines.

didn't occur to me about the DIV. DIV already exists as an FSM, so there's no need to duplicate the work already done there.

looking at the code now, the need to split it out does not appear to be as urgent as it clearly is for MUL, as DIV is nothing more complex than ADDs, 1-bit shifters, CMPs, XORs and SUBs. it's just not that complicated.

MUL on the other hand is a *massive* block of gates, and the need to break it out behind an API that gives us the option to reduce gate count (at the cost of increasing pipeline stage length) is quite clear.
https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/add/nmigen_div_experiment.py;hb=HEAD#l158

relevant code:

    div.quot.eq(div.quot << 1),
    div.rem.eq(Cat(div.dend[-1], div.rem[0:])),
    div.dend.eq(div.dend << 1),

and:

    with m.If(div.rem >= div.dor):
        m.d.sync += [div.quot[0].eq(1),
                     div.rem.eq(div.rem - div.dor),]
    with m.If(div.count == div.width-2):
        m.next = "divide_3"
    with m.Else():
        m.next = "divide_1"
        m.d.sync += div.count.eq(div.count + 1),

so the way i see it, it's hardly even worth sub-classing.
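to make the shift/compare/subtract loop quoted above easier to follow, here is a minimal software model of it in plain Python. this is only a sketch of the general restoring-division technique, not the nmigen code itself: the function name `restoring_divide` and its arguments are hypothetical, it handles unsigned integers only, and it runs a fixed `width` iterations rather than reproducing the FSM's state names or its `count` termination test.

```python
from typing import Tuple


def restoring_divide(dividend: int, divisor: int, width: int) -> Tuple[int, int]:
    """Bit-serial restoring division of unsigned width-bit integers.

    Hypothetical software model: one loop iteration stands in for one
    pass through the shift step and the compare/subtract step.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << width) - 1
    dend = dividend & mask
    quot = 0
    rem = 0
    for _ in range(width):
        # shift step: quotient moves left, the top bit of the dividend
        # is shifted into the bottom of the remainder
        quot = (quot << 1) & mask
        rem = (rem << 1) | ((dend >> (width - 1)) & 1)
        dend = (dend << 1) & mask
        # compare/subtract step: if the divisor fits, set the new
        # quotient bit and subtract the divisor from the remainder
        if rem >= divisor:
            quot |= 1
            rem -= divisor
    return quot, rem
```

each iteration needs only a 1-bit shift, one comparison and one subtraction, which is the point being made: there is very little logic per step to hide behind an API.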
(In reply to Luke Kenneth Casson Leighton from comment #1)
> comment from jacob (to be discussed to create separate milestones):
>
> particular, I'm planning on having the div pipeline also handle integer
> div/mod and having the main pipeline handle integer multiplication and
> probably additional operations.

bear in mind in an OoO (parallel) design, keeping the units separate allows the developer to decide *how many* of each ALU type shall be deployed, and then to pass in a part-encoded instruction which tells the pipeline what action to take.

i would greatly prefer there not to be massive code-duplication between the single-core barrel processor and the multi-core OoO one because the pipelines designed for one cannot be used in the other!

would it be ok to design the ALUs to have the *option* to cover multiple functions, by way of being constructed from an API-selectable *range* of available pipelineable (and FSM-based) units?

the reason that i ask is because, if you look at Mitch Alsup's 2nd book chapter, you will see a detailed assessment of how to design a suite of pipelines based on performance requirements that go so far as to analyse the application requirements from a statistical analysis perspective...

... and then *specifically* target the design of the pipeline suite *and* the register file port count to that *specific* analysis.

if the pipeline is specifically *hard-coded* to *require* that it handle integer and DIV, that type of analysis and specific design targeting is no longer possible.
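as a rough illustration of what "constructed from an API-selectable range of units" could look like, here is a small Python sketch. everything in it is hypothetical: the `UNITS` table, the `ALU` class and the operation names are invented for illustration, and the units are plain integer functions standing in for real pipelines or FSMs.

```python
from typing import Callable, Dict, Iterable

# hypothetical table of available units; in a real design each entry
# would be a pipeline or an FSM, not a Python function
UNITS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a // b,
}


class ALU:
    """An ALU assembled from a caller-selected subset of UNITS."""

    def __init__(self, ops: Iterable[str]) -> None:
        self.units = {op: UNITS[op] for op in ops}

    def supports(self, op: str) -> bool:
        return op in self.units

    def execute(self, op: str, a: int, b: int) -> int:
        # the op argument plays the role of the part-encoded
        # instruction that tells the ALU which action to take
        if op not in self.units:
            raise ValueError(f"operation {op!r} not built into this ALU")
        return self.units[op](a, b)


# the designer picks how many ALUs of each mix to deploy
alus = [ALU(["add", "mul"]), ALU(["add", "mul"]), ALU(["div"])]
```

the same unit definitions can then serve a single shared ALU (pass every op to one `ALU`) or several specialised ones, so the mix can follow the workload analysis rather than being fixed in the pipeline code.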
(In reply to Luke Kenneth Casson Leighton from comment #1)
> particular, I'm planning on having the div pipeline also handle integer
> div/mod and having the main pipeline handle integer multiplication and
> probably additional operations.

Yep got it, raised separate milestones. ctx.op can be used to select operations, and the information about the capabilities at each Reservation Station needs to be hard-coded into the Dependency Matrices.

If there are no free RSs in which the operands (and operator) can be stored, it is absolutely essential that further instruction issue be frozen, locked up solid, until an RS becomes free.

It is therefore pretty essential that we get the balance right: provide enough ALUs behind RSs, with each ALU being able to handle the right mix of operations.

That said, we cannot go overboard either, which is why I really like the idea of sharing the INT MUL/ADD/DIV and having bypass capability on the early and late stages of FP.

issue #116
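The stall rule described above can be sketched in a few lines of Python. This is an assumption-laden toy model, not the project's design: `ReservationStation` and `try_issue` are hypothetical names, the capability set on each station stands in for what would be hard-coded into the Dependency Matrices, and operand storage is omitted.

```python
from typing import Iterable, List, Optional


class ReservationStation:
    """Toy RS: a fixed capability set plus a busy flag."""

    def __init__(self, ops: Iterable[str]) -> None:
        self.ops = frozenset(ops)  # operations the ALU behind this RS handles
        self.busy = False


def try_issue(stations: List[ReservationStation], op: str) -> Optional[int]:
    """Place op in the first free, capable RS.

    Returns the index of the RS that accepted the operation, or None
    when no such RS exists, in which case issue must be frozen.
    """
    for index, rs in enumerate(stations):
        if not rs.busy and op in rs.ops:
            rs.busy = True
            return index
    return None


stations = [ReservationStation({"add", "mul"}), ReservationStation({"div"})]
first = try_issue(stations, "mul")   # accepted by station 0
second = try_issue(stations, "mul")  # no free capable RS: must stall
stations[0].busy = False             # the RS becomes free again
third = try_issue(stations, "mul")   # issue can resume
```

With only one RS able to take `mul`, the second multiply stalls issue until the first completes, which is the balance problem in miniature: too few capable RSs and issue freezes often, too many and the area is wasted.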
sort blocks list
change back to original MoU amount
received authorisation from Michiel that the MoU is updated to EUR 15525 for this Milestone. total budget is still EUR 50,000.