An IEEE 754 compliant "DIV" unit is needed, as an FSM (state machine) rather than a pipeline. It must still conform to the pipeline API, and must support FP16/32/64.
operational again (after elaboratable rework)
> FDIV is often implemented with a FRECIP
> (reciprocal) followed by a FMUL.
we *might* need a pipelined fdiv, yet to be evaluated. where in a standard processor, time is not really critical, for a GPU it definitely is.
FSQRT and ISQRT are definitely going to be done as pipelines, jacob knows if we need FDIV to be pipelined.
if an FRECIP can be tracked down and it can be done as a pipeline (that's if we need DIV to be pipelined), that would be good.
(In reply to Luke Kenneth Casson Leighton from comment #2)
> > FDIV is often implemented with a FRECIP
> > (reciprocal) followed by a FMUL.
We can't use FRECIP followed by FMUL without additional intermediate precision, since the RISC-V spec requires FDIV to have correctly rounded results.
> we *might* need a pipelined fdiv, yet to be evaluated. where in a
> standard processor, time is not really critical, for a GPU it definitely
> is.
>
> FSQRT and ISQRT are definitely going to be done as pipelines, jacob knows
> if we need FDIV to be pipelined.
having a pipelined fdiv is more important than sqrt or rsqrt, since divisions are much more common (every pixel needs at least 1 division)
> if an FRECIP can be tracked down and it can be done as a pipeline
> (that's if we need DIV to be pipelined), that would be good.
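To illustrate the rounding point: a minimal sketch in plain Python (FP64 only), using the language's own correctly rounded `/` as the reference. The helper name `recip_mul_mismatches` is made up for illustration and is not part of any codebase discussed here. It looks for divisors `n` where `n * (1.0 / n)` differs from `n / n`, i.e. where reciprocal-then-multiply at working precision gives a different answer than a single correctly rounded divide.

```python
def recip_mul_mismatches(limit):
    """Return every n in 1..limit where n * (1/n) != n / n in FP64.

    Python floats are IEEE 754 binary64, and '/' is correctly rounded,
    so 'direct' is the reference result. 'via_recip' rounds twice:
    once for the reciprocal, once for the multiply.
    """
    found = []
    for n in range(1, limit + 1):
        x = float(n)
        recip = 1.0 / x        # first rounding
        via_recip = x * recip  # second rounding
        direct = x / x         # single rounding, exactly 1.0
        if via_recip != direct:
            found.append(n)
    return found


if __name__ == "__main__":
    print(recip_mul_mismatches(100))
```

Any hit in that list is a case where FRECIP followed by FMUL, with no extra intermediate bits, would not give the correctly rounded FDIV result.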
(In reply to Jacob Lifshay from comment #3)
> > FSQRT and ISQRT are definitely going to be done as pipelines, jacob knows
> > if we need FDIV to be pipelined.
> having a pipelined fdiv is more important than sqrt or rsqrt, since
> divisions are much more common (every pixel needs at least 1 division)
rats. ok i should be able to knock something together quite quickly, however it's going to need a whopping *fourteen* pipeline stages even if each stage is a chain of 4 combinatorial blocks. or it could be *26* pipeline stages of 2 combinatorial blocks each. that's just for 32-bit FP. 64-bit FP would be a staggering 56 pipeline stages. luckily we don't need that, as the focus isn't 64-bit.
will raise a separate bug report for this one.
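For a rough feel of why the stage count scales with operand width, here is a hedged sketch of a restoring integer divider in Python. It assumes one quotient bit per combinatorial block and `bits_per_stage` blocks chained per pipeline stage. The function name and parameters are hypothetical, and the exact stage counts quoted above depend on the internal mantissa width (guard bits etc.), which is not stated in this thread.

```python
def restoring_divide(dividend, divisor, n_bits, bits_per_stage):
    """Restoring division producing one quotient bit per block.

    Returns (quotient, remainder, stages), where 'stages' is how many
    pipeline stages are needed if each stage chains 'bits_per_stage'
    single-bit blocks.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if bits_per_stage <= 0:
        raise ValueError("bits_per_stage must be positive")
    if not 0 <= dividend < (1 << n_bits):
        raise ValueError("dividend does not fit in n_bits")

    quotient = 0
    remainder = 0
    stages = 0
    bit = n_bits - 1
    while bit >= 0:
        # One pipeline stage: a chain of up to bits_per_stage blocks.
        for _ in range(min(bits_per_stage, bit + 1)):
            # Shift in the next dividend bit, then trial-subtract.
            remainder = (remainder << 1) | ((dividend >> bit) & 1)
            quotient <<= 1
            if remainder >= divisor:
                remainder -= divisor
                quotient |= 1
            bit -= 1
        stages += 1
    return quotient, remainder, stages


if __name__ == "__main__":
    print(restoring_divide(100, 7, 8, 4))
```

Under these assumptions the stage count is the quotient width divided by `bits_per_stage`, rounded up, so halving the chain length per stage doubles the pipeline depth.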