Bug 1025 - create IEEE754 FP Pipelines and decoder for TestIssuer
Summary: create IEEE754 FP Pipelines and decoder for TestIssuer
Status: IN_PROGRESS
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: Other Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on: 1134 1136 1137
Blocks: 1072
Reported: 2023-03-15 16:42 GMT by Luke Kenneth Casson Leighton
Modified: 2023-12-01 18:21 GMT

See Also:
NLnet milestone: Future
total budget (EUR) for completion of task and all subtasks: 40000
budget (EUR) for this task, excluding subtasks' budget: 40000
parent task for budget allocation: 487
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2023-03-15 16:42:14 GMT
The first NLnet grant allowed us to create IEEE754 FP pipelines,
which now need integration into the Libre-SOC Core, with suitable
unit tests created. This task will add approximately 60 new instructions
for Power ISA v3.0 SFFS compliance. Second priority is Power ISA v3.1.

----------------------------------------------------------------

To-do list:

(this doesn't include formal proofs, many of which will be complex enough to blow out our budget and/or not even run in a reasonable amount of time)

cost estimates:
* NOT COMPLETED AT ALL: make educated guess how much code will be needed
  * TODO: comment #6 and other comments (which you need to track down
          and re-read, jacob) regarding getting FP into TestIssuer.
          without TestIssuer supporting the FP pipeline there *is* no
          FP pipeline. this set of tasks *includes documenting*
          TestIssuer and how it fits together: finding, checking
          and then updating the existing page.
  * TODO: plan out modifications to LDSTCompUnit, integrating DOUBLE/SINGLE
          into them, coordinating with Cesar ensuring that the Formal Proof
          plans are not disrupted (TODO: find the bugreport, link it here,
          link it to this bugreport and increase cesar's budget up again
          to cover the additional work)

  * DONE: done for everything but:
    * fptrans
      * huge, needs to be split into much smaller subtasks.
        it's enough work that, except for the tasks already split out 
        (recip/[r]sqrt/mod/rem and fminmax) we should probably eliminate
        it entirely from this grant due to being lower priority and difficult.
      * will figure out only if we have budget left over from other tasks
* DONE: translate code size estimates to monetary amounts.
  using conversion factor of 27 days per 1000 lines of code, estimated in:
  https://lists.libre-soc.org/pipermail/libre-soc-dev/2023-August/005607.html
  using EUR 3000/mo from:
  https://lists.libre-soc.org/pipermail/libre-soc-dev/2023-August/005592.html
  this comes out as EUR 2.70 per line of code, assuming a month is 30 days.
  I then rounded budgets to the nearest EUR 100
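
A minimal sketch of the conversion arithmetic described above, for illustration only (it is not the procedure actually used to produce the budgets). The function names are hypothetical, and counting code and test lines together is an assumption; the 27 days per 1000 lines, EUR 3000/mo and 30-day month figures are taken from the text above. Budgets listed for individual subtasks need not match this formula exactly.

```python
DAYS_PER_1000_LINES = 27   # figure quoted above
EUR_PER_MONTH = 3000       # figure quoted above
DAYS_PER_MONTH = 30        # "assuming a month is 30 days"


def eur_per_line():
    """Cost of one line of code in EUR."""
    eur_per_day = EUR_PER_MONTH / DAYS_PER_MONTH
    return DAYS_PER_1000_LINES * eur_per_day / 1000


def estimate_eur(code_lines, test_lines):
    """Hypothetical helper: return (raw cost, cost rounded to EUR 100).

    Assumption: code and test lines are costed at the same rate.
    """
    raw = (code_lines + test_lines) * eur_per_line()
    return raw, int(round(raw / 100.0)) * 100


if __name__ == "__main__":
    # the completed FPSCR/rounding-mode item: 265 lines of code, 70 of tests
    raw, rounded = estimate_eur(265, 70)
    print("raw EUR %.2f, rounded EUR %d" % (raw, rounded))
```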

====================
subtasks we're doing
====================

Needed for v3.0/v3.1-without-PO1:

* DONE: add FPSCR and rounding-mode definitions to ieee754fpu
  https://bugs.libre-soc.org/show_bug.cgi?id=1135
  * 265 lines of code with 70 lines of tests
  * rounded to EUR 900
* TODO: add FPSCR and FP registers and dependency-tracking to soc.git
  * probably 100 lines of code with 100 lines of tests
  * rounded to EUR 1400
* TODO: add PowerISA SINGLE/DOUBLE conversion functions
  * (not the same as ieee 754 conversion)
  * 200 lines of code with 100 lines of tests
  * rounded to EUR 1600
* TODO: add FP loads to soc.git
  * 200 lines of code with 200 lines of tests
  * lf[s/d][u][x]
  * lfiwax/lfiwzx
  * rounded to EUR 1900
* TODO: add FP stores to soc.git
  * 200 lines of code with 200 lines of tests
  * stf[s/d][u][x]
  * stfiwx
  * rounded to EUR 1900
* TODO: FP move/sign-bit-manipulation
  * fmr[.]
  * fneg[.]/fabs[.]/fnabs[.]
  * fcpsgn[.]
  * probably 100 lines of code with 50 lines of tests
  * rounded to EUR 500
* TODO: FP add/sub
  * fadd[s][.]/fsub[s][.] using existing pipelines
    * probably 100 lines with 100 lines of tests
  * add FPSCR/rounding support to add/sub pipeline
    * probably 200 lines with 100 lines of tests (due to rounding modes)
  * rounded to EUR 2600
* TODO: FP select
  * probably 100 lines with 50 lines of tests
  * fsel[.]
  * simple enough that I don't think we even need an ieee754fpu pipeline
  * doesn't interact with FPSCR except for writing CR1 when Rc=1
  * rounded to EUR 400
* TODO: FPSCR manipulation
  * probably 300 lines with 200 lines of tests
  * mffs[.]
  * mffsce[.]
  * mffsc[d]rn[i]
  * mffsl
  * mcrfs
  * mtfsf[i][.]
  * mtfsb[0/1][.]
  * rounded to EUR 1400
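
To illustrate why the move/sign-bit-manipulation and select items in the list above are costed as small tasks, here is a minimal Python sketch that works on raw 64-bit register patterns. It assumes the usual Power ISA semantics (sign is the most significant bit; fsel picks FRC when FRA >= 0.0 and FRB otherwise, including when FRA is a NaN). All function names are hypothetical, and Rc=1 (CR1) handling is not modelled.

```python
import struct

SIGN = 1 << 63
MASK64 = (1 << 64) - 1


def to_bits(x):
    """Python float -> 64-bit IEEE754 double pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def from_bits(b):
    """64-bit IEEE754 double pattern -> Python float."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def fneg(frb):
    return (frb ^ SIGN) & MASK64           # invert the sign bit


def fabs_(frb):
    return frb & ~SIGN & MASK64            # clear the sign bit


def fnabs(frb):
    return (frb | SIGN) & MASK64           # set the sign bit


def fcpsgn(fra, frb):
    # sign taken from FRA, everything else from FRB
    return (fra & SIGN) | (frb & ~SIGN & MASK64)


def fsel(fra, frc, frb):
    # NaN >= 0.0 is False and -0.0 >= 0.0 is True in Python,
    # which matches the assumed fsel behaviour
    return frc if from_bits(fra) >= 0.0 else frb
```

None of these need rounding or FPSCR updates, which is consistent with the note above that the select operation probably does not need an ieee754fpu pipeline.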

=============================
not doing as part of this bug
=============================

(just here since I already did the budgeting work)

(overlaps with https://bugs.libre-soc.org/show_bug.cgi?id=1026#c0, we need to resolve what goes where)

Needed for v3.0/v3.1-without-PO1:

* TODO: FP store double pair
  * 300-500 lines of code with 400 lines of tests
  * stfdp[x]
  * stores are part of SFFS subset, loads are not according to v3.1B Appendix H
  * rounded to EUR 2200
* TODO: FP merge words
  * fmrgew
  * fmrgow
  * probably 100-200 lines of code with 50 lines of tests
  * rounded to EUR 500
* TODO: FP mul/fma
  * fmul[s][.]/f[n]madd[s][.]/f[n]msub[s][.] using existing pipelines
    * probably 200-300 lines with 200 lines of tests
  * add FPSCR/rounding support to mul/fma pipeline
    * probably 200 lines with 100 lines of tests (due to rounding modes)
  * rounded to EUR 3100
* TODO: FP div/sqrt/fre/frsqrte
  * fdiv[s][.]/fsqrt[s][.]/fre[s][.]/frsqrte[s][.] using existing giant div core
    (fre[s][.]/frsqrte[s][.] are full accuracy ops since that's easiest since
    we already need the full accuracy ops for fptrans and we already have a
    pipeline that does the core operation.)
    * probably 300-400 lines with 300 lines of tests
  * create a new, much smaller FSM pipeline suitable for small cores
    (the giant pipeline won't fit on most of our FPGAs),
    basically taking the whole giant div core and converting each stage
    to an FSM state. This includes fmod*/fremainder*/frecip*/frsqrt* since
    those are simple on top of the existing div/sqrt/rsqrt logic.
    * probably 500 lines with 200 lines of tests
  * add FPSCR/rounding support to div/sqrt/rsqrt pipeline
    * probably 200 lines with 100 lines of tests
  * rounded to EUR 4500, 1800 for just using existing giant div core,
    3600 without full rounding support
* TODO: FP test for SW div/sqrt
  * ftdiv/ftsqrt
    * probably 150 lines with 50 lines of tests
  * rounded to EUR 800
* TODO: FP round to single
  * frsp[.]
  * add FPSCR/rounding support
  * probably 200-300 lines with 300 lines of tests
  * rounded to EUR 1500
* TODO: FP convert from integer
  * https://bugs.libre-soc.org/show_bug.cgi?id=1137
  * needs fcfi* in ISACaller -- 500 lines with 200 lines of tests
    * mostly copy-paste, so only counting 300 lines total
  * probably 200-400 lines for pipelines/insns
  * create ctfpr pipeline, it can be used (perhaps with slight
    modifications) for all the fcfi* insns too.
  * this covers all of:
    * fcfid[u][s][.]
    * ctfpr*
  * rounded to EUR 1900
* TODO: FP convert to integer
  * https://bugs.libre-soc.org/show_bug.cgi?id=1136
  * needs fcti* in ISACaller -- 500 lines with 200 lines of tests
    * mostly copy-paste, so only counting 300 lines total
  * probably 400-500 lines for pipelines/insns
  * create cffpr pipeline, it can be used (perhaps with slight
    modifications) for all the fcti* insns too.
  * this covers all of:
    * fcti[d/w][u][z][.]
    * cffpr*
  * rounded to EUR 2000
* TODO: FP round to integer (FP to FP, not FP to int)
  * probably 200 lines with 100 lines of tests
  * fri[n/p/z/m][.]
  * add FPSCR support
  * rounded to EUR 800
* TODO: FP compare
  * probably 200 lines with 100 lines of tests
  * fcmp[u/o]
  * add FPSCR support
  * rounded to EUR 800

Needed for Libre-SOC extensions (stuff that's in openpower-isa):

* TODO: fmvis/fishmv
  * probably 100 lines with 50 lines of tests
  * rounded to EUR 400
* SEE ABOVE: c[ft]fpr* (covered by tasks above for FP convert from/to integer)
* TODO: m[ft]fpr*
  * probably 150 lines with 50 lines of tests
  * rounded to EUR 500
* TODO: fptrans insns (fminmax*, fsqrt*/frsqrt*, frecip*, fmod*/fremainder*
  are split out into separate tasks)
  * easily multiple thousands of lines of code, should be split up further into
    sub-sub-tasks
* SEE ABOVE: fsqrt*/frsqrt*, frecip*, fmod*/fremainder* (covered by
  task above for FP div/sqrt/fre/frsqrte)
* TODO: fminmax*
  * probably 300 lines with 100 lines of tests
  * https://bugs.libre-soc.org/show_bug.cgi?id=1134
  * rounded to EUR 1100
* TODO: FFT/DCT FP ops
  * fdmadds/ffadd/ffdiv[s]/ffmadds/ffmsubs/ffmul[s]
    /ffnmadds/ffnmsubs/ffsub[s] in openpower-isa
  * probably 500 lines with 300 lines of tests
  * rounded to EUR 2200
Comment 1 Jacob Lifshay 2023-05-06 04:49:10 BST
(In reply to Luke Kenneth Casson Leighton from bug #1072 comment #8)
> (In reply to Jacob Lifshay from bug #1072 comment #7)
> 
> > all I need is just the register with accessible fields, 
> 
> copy the style here
> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/svstate.py;hb=HEAD

Done, though I may have gone a little overboard -- I added a parser that parses the fields from the doc comment to ensure they always match. One benefit is that it can probably be easily adapted to whatever LaTeX table thing has the field definitions whenever the OpenPower Foundation releases the spec source.

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=bb7069cee706f7fbc75dc7cafec3afe19cd87e02;hp=c8b2c5d2c984cc444880187679c2a9589bae0526

commit bb7069cee706f7fbc75dc7cafec3afe19cd87e02
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Fri May 5 20:38:22 2023 -0700

    fix fpscr table parser error reporting

commit bef52a38023bfce850174a3156d61170f767dd01
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Fri May 5 20:34:18 2023 -0700

    add FPSCRState and FPSCRRecord and a FPSCR smoke-test

> 
> i will followup by adding it to ISACaller.

Thanks!
Comment 2 Luke Kenneth Casson Leighton 2023-08-19 04:17:31 BST
(In reply to Jacob Lifshay from comment #1)

> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=bb7069cee706f7fbc75dc7cafec3afe19cd87e02;hp=c8b2c5d2c984cc444880187679c2a9589bae0526

+
+    @property
+    def DRN(self):
+        return self.fsi['DRN'].asint(msb0=True)
+
+    @DRN.setter
+    def DRN(self, value):
+        self.fsi['DRN'].eq(value)
+

please remove all of those and replace them with
dynamic functions added within the for-loop.
there are far too many and you should have
immediately red-flagged the fact that they are
a regular pattern (60+ identical duplications of the
above lines with different 'DRN')

python is not rust or java.

use a simple override on __getattr__ and __setattr__. do not
waste time creating metaclasses (or overriding __dir__):
https://amir.rachum.com/python-dynamic-attributes/ we do
not want complex concepts nor unintelligible code nor
vast quantities of unmaintainable code.
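
A minimal sketch of the __getattr__/__setattr__ approach requested above. The class is a hypothetical stand-in, not the real FPSCRState: it stores plain integers in a dict instead of the real field objects (no `fsi`, `asint` or `eq`), and the field names are an illustrative subset only.

```python
class FieldsSketch:
    """Hypothetical stand-in showing dynamic field access."""

    FIELD_NAMES = ("DRN", "RN", "VE")  # illustrative subset only

    def __init__(self):
        # bypass our own __setattr__ while creating the backing store
        object.__setattr__(self, "_fields",
                           {name: 0 for name in self.FIELD_NAMES})

    def __getattr__(self, name):
        # only called when normal attribute lookup has already failed
        fields = object.__getattribute__(self, "_fields")
        try:
            return fields[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        if name in self._fields:
            self._fields[name] = value      # a known field
        else:
            object.__setattr__(self, name, value)  # ordinary attribute


if __name__ == "__main__":
    s = FieldsSketch()
    s.DRN = 5
    print(s.DRN, s.RN)
```

One pair of methods replaces every per-field property/setter pair, so adding a field only means adding a name to the table.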
Comment 3 Jacob Lifshay 2023-08-22 03:37:12 BST
(In reply to Luke Kenneth Casson Leighton from comment #2)
> please remove all of those and replace them with
> dynamic functions added within the for-loop.

I put them in there because that's what you did for SVSHAPE

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/svshape.py;hb=d40763cd6e186ad9b17ce6f974a38b4c4877965e

I can remove them if you like...
Comment 4 Jacob Lifshay 2023-08-26 01:46:08 BST
I adjusted the costs for different tasks, and figured out what would fit in our EUR 8000 budget (it ended up excluding all Libre-SOC instructions, so I think some of those should just be moved to #1026 where we explicitly have a budget for Libre-SOC instructions)

I picked:
* add FPSCR and rounding-mode definitions to ieee754fpu
  already done
* add FPSCR and FP registers and dependency-tracking to soc.git
  needed for everything else
* add PowerISA SINGLE/DOUBLE conversion functions
  needed for nearly everything else
* add FP loads to soc.git
  we need a way to get data in/out of FP registers and loads/stores are what
  compilers currently use (they don't support m[ft]fpr yet)
* add FP stores to soc.git
  same as FP loads reasoning
* FP move/sign-bit-manipulation
  data in registers needs to move around
* FP add/sub
  common and fits in budget
* FP select
  exactly fits in EUR 400 left over from everything else
* FPSCR manipulation
  we need a way to get data in/out of FPSCR
Comment 5 Luke Kenneth Casson Leighton 2023-08-26 01:55:10 BST
(In reply to Jacob Lifshay from comment #3)
> (In reply to Luke Kenneth Casson Leighton from comment #2)
> > please remove all of those and replace them with
> > dynamic functions added within the for-loop.
> 
> I put them in there because that's what you did for SVSHAPE

yes but not 64 individual bits resulting in 250+ lines of duplicated
code that could easily be autogenerated.

> I can remove them if you like...

no it's too late now, done and a waste of time removing them, despite
being a maintenance nightmare.

to be honest these should all be done by insndb, which is a big enough
task to justify its own budget, and autogenerating nmigen
Record-derivatives as well.
Comment 6 Luke Kenneth Casson Leighton 2023-08-26 02:18:52 BST
(In reply to Jacob Lifshay from comment #4)
> I adjusted the costs for different tasks, and figured out what would fit in
> our EUR 8000 budget (it ended up excluding all Libre-SOC instructions, so I
> think some of those should just be moved to #1026 where we explicitly have a
> budget for Libre-SOC instructions)

not sure yet. see how it goes.
 
> I picked:
  ....

  it is a good start, gives the general idea of cost, but is missing
preparation (please find where i described it already) and regfile
profile analysis and documentation.  also adding regfiles, and prep
code to initialise then extract them in the TestAPI. and adding
config params, and adding TestIssuer options, all the way to Makefile
to compile with or without FP.


> * add FP loads to soc.git
>   we need a way to get data in/out of FP registers

ignore LD/ST initially and have the unit tests pre-arrange values
in regs. i had to add that for GPRs, and so adding the ability
to upload into FPRs prior to starting the nmigen sim is *another*
task on the list.

please follow the following incremental strategy
as top-level bugs:

* add FPR regfile **ONLY** plus do the pipeline reg allocation
  analysis.  grep all code for "IntRegs" and duplicate all sections,
  then document the proposed pipeline reg allocation.
  - fneg will be in 1R1W (plus FPSCR plus CR1, read and write)
  - fmac in 3R1W (plus FPSCR CR1)
  - fadd *also* consider in 3R1W but with mul as "no input".
  - fld/fst as another (bear in mind these are special-case)

* do the 1in1out as one bug "group"

* do 3r1w as another

* do LD/ST as another

* add FPSCR and CR1 to all pipelines *as a totally separate set of
  tasks* with their own toplevel budget.  reason: look in common_input
  and output_stage.py, the same can be done for CR1 / FPSCR.
  FPSCR regfile can copy XERRegs style.

this will give a coarse granularity that NLnet will be happier with,
and is a safer incremental approach.
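
One possible way to write down the proposed pipeline reg allocation from the list above is as a small table, sketched here in Python. The names `GROUP_PORTS`, `ALLOCATION` and `by_group` are hypothetical. Only the groupings stated above are included: fld/fst are omitted because no port counts are given for them, and FPSCR/CR1 are left out because they are to be added as a separate set of tasks.

```python
from collections import defaultdict

# (FPR read ports, FPR write ports) per proposed pipeline group
GROUP_PORTS = {
    "1R1W": (1, 1),
    "3R1W": (3, 1),
}

# instruction -> (group, note), taken from the list above
ALLOCATION = {
    "fneg": ("1R1W", ""),
    "fmac": ("3R1W", ""),
    "fadd": ("3R1W", "mul operand treated as 'no input'"),
}


def by_group(allocation):
    """Return {group: [insns]} so each bug "group" can be listed."""
    groups = defaultdict(list)
    for insn, (group, _note) in sorted(allocation.items()):
        groups[group].append(insn)
    return dict(groups)


if __name__ == "__main__":
    for group, insns in sorted(by_group(ALLOCATION).items()):
        reads, writes = GROUP_PORTS[group]
        print("%s (%dR/%dW): %s" % (group, reads, writes, ", ".join(insns)))
```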
Comment 7 Luke Kenneth Casson Leighton 2023-08-26 16:18:23 BST
i did think of another way (another subdivision) but i can't remember it
this morning. oh yes!

* pipeline (+tests)
  * CompUnit (+tests using classes as used in test_caller_*)
     * FunctionUnit (+tests using same classes)
        * Core integration (test_core.py - needs FP option)
           * TestIssuer integration (test_issuer.py - needs option(s) plural)

the *exact* same unit tests - many of which have already been written
for use by test_caller*.py - can if generalised be used the entire way,
*four* new (additional) times.

i got test_core.py up and running again last time i looked at this
(when doing the InOrder core).
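
The idea of running the exact same test cases at every level can be sketched as follows. This is a minimal illustration under stated assumptions: `FPCase`, `standin_runner` and `run_all` are hypothetical names, and the stand-in runner only understands a hard-coded fneg on pre-arranged FPR values. Real runners would drive the pipeline, CompUnit, FunctionUnit, Core and TestIssuer simulations respectively.

```python
from dataclasses import dataclass

LEVELS = ["pipeline", "CompUnit", "FunctionUnit", "Core", "TestIssuer"]


@dataclass
class FPCase:
    insn: str            # only "fneg" is understood by the stand-in runner
    initial_fprs: dict   # FPR number -> 64-bit pattern, pre-arranged
    expected_fprs: dict


def standin_runner(case):
    """Hypothetical runner: a real one would run a hardware simulation."""
    fprs = dict(case.initial_fprs)
    if case.insn == "fneg":
        # FRT=1, FRB=0 are hard-coded for this sketch
        fprs[1] = fprs[0] ^ (1 << 63)
    return fprs


def run_all(cases, runners):
    """Run every case at every level; True means the result matched."""
    return {(level, case.insn): runners[level](case) == case.expected_fprs
            for level in LEVELS for case in cases}


CASES = [FPCase("fneg",
                {0: 0x3FF0000000000000, 1: 0},
                {0: 0x3FF0000000000000, 1: 0xBFF0000000000000})]
# the same stand-in is used at all five levels purely for demonstration
RUNNERS = {level: standin_runner for level in LEVELS}

if __name__ == "__main__":
    for key, ok in run_all(CASES, RUNNERS).items():
        print(key, "pass" if ok else "FAIL")
```

Because the cases carry their own initial register values, no FP loads or stores are needed to run them.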