Bug 418 - SPR pipeline formal correctness proof needed
Summary: SPR pipeline formal correctness proof needed
Status: IN_PROGRESS
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Formal Verification (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Samuel A. Falvo II
URL:
Depends on: 429
Blocks: 195 383 315 348
  Show dependency treegraph
 
Reported: 2020-07-07 12:13 BST by Luke Kenneth Casson Leighton
Modified: 2021-04-24 18:16 BST (History)
2 users (show)

See Also:
NLnet milestone: NLNet.2019.10.Formal
total budget (EUR) for completion of task and all subtasks: 400
budget (EUR) for this task, excluding subtasks' budget: 400
parent task for budget allocation: 195
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
"kc5tja"=350 "lkcl"={amount=50, submitted=2021-04-24}


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2020-07-07 12:13:59 BST
a formal proof of the SPR pipeline is needed
Comment 1 Luke Kenneth Casson Leighton 2020-07-09 09:51:59 BST
code is here:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/spr/main_stage.py;hb=HEAD

example which can be cookie-cut as a starting point:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu/formal/proof_main_stage.py;hb=HEAD
Comment 2 Luke Kenneth Casson Leighton 2020-07-11 10:43:56 BST
so the purpose of the SPR pipeline is dead simple: there are only two
instructions.  they copy SPRs (special purpose registers) to / from
GPRs (general purpose registers).

that's it.  end of story.

the minor complication is that as you can see from main_stage.py the
SPRs are *not* in "one single register file", they're spread out across
*three* separate register files:

* XER   (implemented)
* FAST  (implemented)
* SLOW  (TODO)

when we add SLOW SPRs, a re-mapping will need to be used (see PowerDecoder2
SPRMap) which transforms the 10-bit PowerISA numbering scheme into a binary
index into the SPR SRAM (which has only 70 or so entries, not 1024).

so yes: a very dumb, very straightforward, very obvious and well-documented
formal proof is needed.  combinatorial.  nothing fancy at all.
Comment 3 Samuel A. Falvo II 2020-07-14 18:19:39 BST
There is so much meta-programming happening, I cannot navigate the interface of the SPR main stage module.  Is there some place I can go to read up on how interfaces are constructed and what their signal meanings are?  E.g., what is the difference between spr1 and fast1?  Where does the op field come from?  Etc.  Thanks.
Comment 4 Samuel A. Falvo II 2020-07-14 19:55:42 BST
Ahh, I just located https://libre-soc.org/3d_gpu/architecture/regfile/ which is helping to resolve some of the mysteries.
Comment 5 Luke Kenneth Casson Leighton 2020-07-14 20:53:52 BST
(In reply to Samuel A. Falvo II from comment #3)
> There is so much meta-programming happening, I cannot navigate the interface
> of the SPR main stage module. 

the code is designed to set up a massive parallel batch of pipelines.
not "1 lovely pipeline, oo look, innit pretty".

consequently, data paths branch out massively.  to avoid huge code-duplication,
the use of OO is just... necessary.

you will find a file soc/fu/README.md which explains the structure:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/README.md;hb=HEAD

for example: spr_input_subset.py contains *ONLY* the subset of PowerISA
field information *SPECIFICALLY* relevant to this particular pipeline.

spr/pipe_data.py *ONLY* contains the input and output format of register
data coming into and out of this particular pipeline.

> Is there some place I can go to read up on
> how interfaces are constructed and what their signal meanings are?  E.g.,
> what is the difference between spr1 and fast1?

the docstring at the top of fu/spr/pipe_data.py is worth reading.  it
explains that there are multiple register files.  you've discovered
the regfiles wiki page and that should link you back to regfiles.py

>  Where does the op field come from?

self.i.ctx.op is established by the pipeline code.  it contains
the "operation" - an instance of the Operation Class - in this
case that's CompOpSPRSubset, which you'll find in fu/spr/spr_input_record.py

basically, this is however all context.  the key things to know are:

* the pipelines *do not* themselves read or write to register files.
  they do not even actually know which register port nor which actual
  register they are writing to or reading from.

  all they need to know is: that has *already been taken care of*, and
  they need only, specifically and exclusively to think in terms of InputData
  and OutputData, as defined by "regspecs".

* therefore in this case. all that you need to know is:

  - self.i.ra/fast1/spr1/xer_so/ca/ov is incoming data
  - self.o.o/fast1/spr1/etc is outgoing data

* these are listed in the "convenience variables" in main_stage.py
  and you will see that exact same pattern both in other main_stage.py
  files *and in their corresponding proofs*

  see comment #1 for a link to e.g. the ALU formal proof

* in this particular pipeline, some of the SPRs (and other data) need
  to be accessed much more regularly than others.  these are labelled
  "Fast" Regs.

  these are in an *unary* addressed regfile with *six* read ports and
  five write ports (6R5W).

  all other SPRs are considered "slow" and are in a 2R1W binary-addressed
  regfile.
Comment 6 Luke Kenneth Casson Leighton 2020-07-14 21:00:32 BST
commit 769a8d5444f46cd093ff03645658198eaa8bd76d
Author: Samuel A. Falvo II <kc5tja@arrl.net>
Date:   Tue Jul 14 12:17:45 2020 -0700

    SPR: FV that should fail currently passes
    
    WIP.  Cannot figure out why this is not failing.  Code review requested.
Comment 7 Luke Kenneth Casson Leighton 2020-07-14 21:09:07 BST
(In reply to Luke Kenneth Casson Leighton from comment #6)
> commit 769a8d5444f46cd093ff03645658198eaa8bd76d
> Author: Samuel A. Falvo II <kc5tja@arrl.net>
> Date:   Tue Jul 14 12:17:45 2020 -0700
> 
>     SPR: FV that should fail currently passes
>     
>     WIP.  Cannot figure out why this is not failing.  Code review requested.

commit ead389c3edc78ec2af354a3f713788ac8c0ce6a2 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Jul 14 20:59:59 2020 +0100

    cookie-cut setup from alu proof_main_stage.py

Michael pre-established a pattern for verifying the input fields and in
particular for verifying that the ctx object fields are correctly
copied by the pipeline from input to output.

i've cookie-cut this code over.  this takes care of verifying this (one)
line of code at the end of fu/spr/main_stage.py:

        comb += self.o.ctx.eq(self.i.ctx)

actually.. it doesn't.  other ctx contents aren't verified.  it only
verifies ctx.op, oh well.  will raise a separate bugreport about that.



regarding unit tests:

    #don't worry about it - tests are run manually anyway.  fail is fine.
    #@skipUnless(getenv("FORMAL_SPR"), "Exercise SPR formal tests [WIP]")
    def test_formal(self):

commenting out / disabling unit tests "just because they fail or are
otherwise incomplete" is generally bad practice.  the python unittest
module is *specifically* designed to catch success/fail exceptions and
provide reporting.

disabling a test, there has to be a *really* good reason such as "the unit
test actively interferes with the completion or running of other tests",
such as failing to complete (at all), taking several days to complete,
or destroying or tampering with filesystems in ways that prevent other
tests from running.
Comment 8 Luke Kenneth Casson Leighton 2020-07-14 21:15:03 BST
(In reply to Luke Kenneth Casson Leighton from comment #5)

> * in this particular pipeline, some of the SPRs (and other data) need
>   to be accessed much more regularly than others.  these are labelled
>   "Fast" Regs.
> 
>   these are in an *unary* addressed regfile with *six* read ports and
>   five write ports (6R5W).
> 
>   all other SPRs are considered "slow" and are in a 2R1W binary-addressed
>   regfile.

and (forgot to mention) there are 5 (6 but one is ignored) bits from
the XER SPR which are accessed so frequently (carry, overflow, sticky)
that if we did not have a separate dedicated register file for them,
where each register is only 2 bits (!) then the pipelines would be
completely incapable of any reasonable level of parallelism / overlap.

this means however that in this *particular* pipeline, which implements
"transfer arbitrary SPR to register" and "transfer register to arbitrary
SPR", when that SPR is "XER" it becomes necessary to *construct* the
response - not from a Fast Reg - not from a slow SPR reg - but from
the *XER Regfile*.

microwatt does exactly the same thing.
Comment 9 Luke Kenneth Casson Leighton 2020-07-14 21:26:58 BST
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Tue, Jul 14, 2020 at 9:17 PM Samuel Falvo II <sam.falvo@gmail.com> wrote:
>
> On Tue, Jul 14, 2020 at 1:09 PM <bugzilla-daemon@libre-soc.org> wrote:
> >     cookie-cut setup from alu proof_main_stage.py
>
> I'm sorry; I'm trying; I'm just not getting it.  I do not feel
> comfortable just copy-pasta code from one file to another without
> knowing what it does. 

hm.  interesting.  then perhaps first it would be of value for you
to review the existing proofs, which were written by Michael, who
seems to very much know what he's doing, however (as you saw from
what i wrote and raised bug #429) occasionally misses thigns.

> It actively undermines, IMO, the value
> proposition behind formal verification. 

i hear you on that.

> If something goes wrong, and
> I cannot explain it, and even less, be able to reason about it, I
> don't know what value I'm providing the project.

very true.

ok so that makes reviewing the existing formal proofs up to and
*only* up to the verification of the op Record - quite important.


>
> >     #don't worry about it - tests are run manually anyway.  fail is fine.
> >     #@skipUnless(getenv("FORMAL_SPR"), "Exercise SPR formal tests [WIP]")
> >     def test_formal(self):
> >
> > commenting out / disabling unit tests "just because they fail or are
> > otherwise incomplete" is generally bad practice.  the python unittest
> > module is *specifically* designed to catch success/fail exceptions and
> > provide reporting.
>
> This goes against what was written, several different places, on the
> libre-soc wiki insisting that the test suites always pass. 

ah no, definitely not.  if that's the impression then it's definitely
wrong.

> I can't
> remember where I read it now, but I distinctly remember reading that
> features and tests which are known to fail should not interfere with
> the remainder of the code.

yes - and a simple test like this definitely will not interfere with
the remainder of the code.  some of the tests *do* actually interfere.

>  Hence my feature flag; my FV tests are not
> yet complete, and can be misleading to others as I learn more about
> the build.

don't worry about it.  really.

> The intention is, once I'm confident that the tests are of good
> quality, then the feature flag can be removed.

this actively interferes with collaboration.  anyone else attempting
to help with the development and understanding of the test has to
also set that "feature" flag.

as this practice (of setting "feature" flags) has not been established,
discussed, or documented, sticking to "standard python unit test"
practice is the default.

and that means "only damaging unit tests get de-activated or skipped",
and this is not one of them.
Comment 10 Samuel A. Falvo II 2020-07-17 18:32:48 BST
Code fleshed out in https://git.libre-soc.org/?p=soc.git;a=commit;h=3af7a728eebe2ce97238825470d2995e4be96ca7
Comment 11 Samuel A. Falvo II 2020-07-17 19:09:26 BST
I think I chose the wrong status.  Whoops.
Comment 12 Luke Kenneth Casson Leighton 2020-07-17 19:44:56 BST
(In reply to Samuel A. Falvo II from comment #11)
> I think I chose the wrong status.  Whoops.

normally we'd go through a review process, don't worry about it.
Comment 13 Luke Kenneth Casson Leighton 2020-07-17 19:49:49 BST
(In reply to Samuel A. Falvo II from comment #10)
> Code fleshed out in
> https://git.libre-soc.org/?p=soc.git;a=commit;
> h=3af7a728eebe2ce97238825470d2995e4be96ca7

looks really good, samuel.  i wouldn't have spotted that fast1.ok
is disallowed when there's a slow path (and vice-versa).
Comment 14 Luke Kenneth Casson Leighton 2020-09-06 20:13:59 BST
samuel i'm re-opening this one (and increasing the budget accordingly)
although it is very simple we need the slow SPRs as well as the DEC and TB
SPRs doing (see fu/spr/main_stage.py it's real simple).  the gotcha is
that TBU is a copy of the upper half of the internal TB register.