Bug 340 - formal proof of POWER9 SHIFTROT pipeline needed
Summary: formal proof of POWER9 SHIFTROT pipeline needed
Status: PAYMENTPENDING FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Formal Verification (show other bugs)
Version: unspecified
Hardware: PC Linux
: Low enhancement
Assignee: Jacob Lifshay
URL:
Depends on:
Blocks: 339
  Show dependency treegraph
 
Reported: 2020-05-22 21:28 BST by Luke Kenneth Casson Leighton
Modified: 2022-09-15 17:42 BST (History)
2 users (show)

See Also:
NLnet milestone: NLNet.2019.10.032.Formal
total budget (EUR) for completion of task and all subtasks: 2400
budget (EUR) for this task, excluding subtasks' budget: 2400
parent task for budget allocation: 195
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
jacob = { amount = 2000, paid = 2022-03-03 } [lkcl] amount = 400 submitted = 2022-06-16 paid = 2022-09-06


Attachments
Screenshot showing failure mode of shiftrot pipeline. (34.79 KB, image/png)
2020-08-03 16:21 BST, Samuel A. Falvo II
Details
Showing where ((1<<63) << 0) == 0 (62.83 KB, image/png)
2020-09-05 01:11 BST, Samuel A. Falvo II
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2020-05-22 21:28:41 BST
a formal proof of the POWER9 shiftrot function is needed

* OP_EXTSWSLI - done
* OP_SHL - done
* OP_SHR - done
* OP_RLC - done
* OP_RLCL - done
* OP_RLCR - done
* OP_GREV - done
* OP_TERNLOG - done

* Data.ok - done
* CA32 out - done
* CA out - done

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/formal/proof_main_stage.py;hb=HEAD
Comment 1 Luke Kenneth Casson Leighton 2020-07-13 20:31:28 BST
commit 21f5fbd55b42fd211604b9902eeb06996a1b7fb5
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Mon Jul 13 20:28:11 2020 +0100

    formal proof of OP_EXTSWSLI
Comment 2 Luke Kenneth Casson Leighton 2020-07-26 22:56:56 BST
samuel the code is fine as there are a number of unit tests that already pass.  we just need the mask/clear proofs added.  i had made a start on it already however the different options was doing my head in.

here we definitely, definitely need to stick with the field decoders.  there are odd things in the pseudocode that swap the MSB to the LSB.

example:
     
    b <- mb[5] || mb[0:4]

however please especially note: the 3.0B spec pseudocode is **WRONG** for the field named "sh".  it is the shift amount AS-IS, it must NOT be Cat(sh[5], sh[0:5]) or somesuch.

i have raised this as a bugreport on openpowerhdl-cores.

https://libre-soc.org/openpower/isa/fixedshift/

so, again: shl, extswsli, these are all done.  it is just the "mask" versions rl* that need doing.
Comment 3 Samuel A. Falvo II 2020-07-28 23:59:55 BST
I cannot get the formal proofs for this pipeline working at all.  See the following error:

```
======================================================================
ERROR: test_formal (proof_main_stage.ALUTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kc5tja/git/libre-soc/soc/src/soc/fu/shift_rot/formal/proof_main_stage.py", line 174, in test_formal
    self.assertFormal(module, mode="bmc", depth=2)
  File "/home/kc5tja/git/libre-soc/nmutil/src/nmutil/formaltest.py", line 97, in assertFormal
    rtlil=rtlil.convert(Fragment.get(spec, platform="formal"))
  File "/home/kc5tja/git/libre-soc/nmigen/nmigen/hdl/ir.py", line 39, in get
    obj = obj.elaborate(platform)
  File "/home/kc5tja/git/libre-soc/soc/src/soc/fu/shift_rot/formal/proof_main_stage.py", line 68, in elaborate
    dut_sig = getattr(dut.o.ctx.op, name)
  File "/home/kc5tja/git/libre-soc/nmigen/nmigen/hdl/rec.py", line 149, in __getattr__
    return self[name]
  File "/home/kc5tja/git/libre-soc/nmigen/nmigen/hdl/rec.py", line 160, in __getitem__
    raise AttributeError("{} does not have a field '{}'. Did you mean one of: {}?"
AttributeError: Record 'alu_op' does not have a field 'alu_op__insn_type'. Did you mean one of: insn_type, fn_unit, imm_data, rc, oe, invert_a, zero_a, invert_out, write_cr0, input_carry, output_carry, is_32bit, is_signed, data_len, insn?
```

I tried changing all references of AluOp to SrOp, but I get the exact same errors.  I'm at a loss.  Can anyone provide guidance?  Thanks.
Comment 4 Luke Kenneth Casson Leighton 2020-07-29 00:16:51 BST
(In reply to Samuel A. Falvo II from comment #3)

> "/home/kc5tja/git/libre-soc/soc/src/soc/fu/shift_rot/formal/proof_main_stage.
> py", line 68, in elaborate
>     dut_sig = getattr(dut.o.ctx.op, name)

moo this should have been replaced, like when you found in trap formal proof
(or spr?) that muxid was not being checked.  i'd done ALU and a couple others.

> I tried changing all references of AluOp to SrOp, but I get the exact same
> errors.  I'm at a loss.  Can anyone provide guidance?  Thanks.

sorted, git pull, try again

commit 92d39595d144700306ce7ac5a37260cc918c93fc (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Jul 29 00:14:33 2020 +0100

    use ctx.op compare (and muxid) in shiftrot proof
    also use correct input record type and spec
Comment 5 Samuel A. Falvo II 2020-07-29 00:42:27 BST
(In reply to Luke Kenneth Casson Leighton from comment #4)
> moo this should have been replaced, like when you found in trap formal proof
> (or spr?) that muxid was not being checked.  i'd done ALU and a couple
> others.

Gahh, not sure why I didn't think of that.  I missed that.  Thanks; it passes now for me.
Comment 6 Samuel A. Falvo II 2020-07-31 18:01:17 BST
Committed some WIP code; I cannot get closure on the instructions:

rlwinm(.)
rlwnm(.)
rlwimi(.)

On my system, there appears to be a bug with mask generation.  Yosys generates an RLC-type instruction with MB=0x10 and ME=0x14, which should generate a mask of 0x0000F000.  However, the final mask that does get generated is 0x00000000 (since ml=0x00000000 while mr=0xFFFFFFFF.  I would *expect* that ml=0x0000FFFF and mr=0xFFFFF000, so that their bitwise-AND would give the desired mask.

But, when I calculate by hand how the masks are generated, the results are, indeed, exactly what I'd expect them to be.

I can't explain the discrepancy.

BTW, any reason why the mb and me signals are 7-bits wide in proof_main_stage.py?  Since GPRs are only 64-bits wide, wouldn't 6-bits be sufficient?
Comment 7 Luke Kenneth Casson Leighton 2020-07-31 18:30:25 BST
(In reply to Samuel A. Falvo II from comment #6)
> Committed some WIP code; I cannot get closure on the instructions:
> 
> rlwinm(.)
> rlwnm(.)
> rlwimi(.)
> 
> On my system, there appears to be a bug with mask generation.  Yosys
> generates an RLC-type instruction with MB=0x10 and ME=0x14, which should
> generate a mask of 0x0000F000.  However, the final mask that does get
> generated is 0x00000000 (since ml=0x00000000 while mr=0xFFFFFFFF.  I would
> *expect* that ml=0x0000FFFF and mr=0xFFFFF000, so that their bitwise-AND
> would give the desired mask.

ok suggestion there is to write some short program that actually prints out the values.

i got horribly confused by one of the masks being subtracted from 63/64 and not the other.


> But, when I calculate by hand how the masks are generated, the results are,
> indeed, exactly what I'd expect them to be.
> 
> I can't explain the discrepancy.
> 
> BTW, any reason why the mb and me signals are 7-bits wide in
> proof_main_stage.py?  Since GPRs are only 64-bits wide, wouldn't 6-bits be
> sufficient?

i believe this is because they are calculated as 64 bit and can be signed?  the top bit allow expression of -64 which requires 7 bits not 6.

honestly though i am just guessing, there.
Comment 8 Samuel A. Falvo II 2020-07-31 18:42:12 BST
(In reply to Luke Kenneth Casson Leighton from comment #7)
> ok suggestion there is to write some short program that actually prints out
> the values.

I'm not sure how this would help, as that would just tell me the same information that GTKWave is reporting, no?

The other question I had, upon further reflection, is that we have three types of RLC instructions, but one of them (rlwimi) has different behavior.  Once this masking bug is resolved, this will become another failure in the proofs.  To work around this, I need to actually check the major opcode (m_fields.OPCD).  I have *not* implemented that yet, but will shortly.  Should we assign a different OP_RLCxxx token for rlwimi because of this?  Or, is there some other field I should check that tells me whether or not I should OR-in the (a & ~mask) part of the result?
Comment 9 Luke Kenneth Casson Leighton 2020-07-31 19:46:12 BST
(In reply to Samuel A. Falvo II from comment #8)
> (In reply to Luke Kenneth Casson Leighton from comment #7)
> > ok suggestion there is to write some short program that actually prints out
> > the values.
> 
> I'm not sure how this would help, as that would just tell me the same
> information that GTKWave is reporting, no?

i meant as a smaller program rather than a much larger one.  i wrote a small mask-printing test a few weeks ago, ran for loops from 0 to 63 and it was very instructive.

> The other question I had, upon further reflection, is that we have three
> types of RLC instructions, but one of them (rlwimi) has different behavior.

it shouldn't.  or, microwatt has already taken it into consideration in the design of rotator.py and how it is called from sr main_stage.py

 
> Once this masking bug is resolved, this will become another failure in the
> proofs.  To work around this, I need to actually check the major opcode
> (m_fields.OPCD).  I have *not* implemented that yet, but will shortly. 
> Should we assign a different OP_RLCxxx token for rlwimi because of this?

no absolutely not.  the reason is because the CSV files come directly from microwatt decode1.vhdl, the code was written by a team with *over 25* years experience *each* in PowerISA, and we can trust they they know what they are doing.

the way that microwatt (and libresoc) works is that the immediates are placed into the RB input, as the second operand.

we therefore *lose and do not need* to even know that the instruction was an immediate or a register variant.

all that the pipeline needs to know is: RA, RB and for this pipeline RS as well.

however some operations also need mb and me fields so that is why rotator.py takes extra parameters.


> Or, is there some other field I should check that tells me whether or not I
> should OR-in the (a & ~mask) part of the result?

take a look at shift rot main_stage.py.  

you will see that rotator is called with some pretty clear decoding and identification of the instructions

* left mask needed
* right mask needed
* is arithmetic 
* left_or_right

something like that.

but really: you really *do not* need to know it was an immediate variant or not.  that has definitely been encoded into RB (and in the case of LDST into RA as a zero).  look at DecodeInB in power_decode2 .py search for "imm_out" then crossreference the CONST_xxx against the CSV files in isatables to see what is going on.
Comment 10 Luke Kenneth Casson Leighton 2020-07-31 19:46:22 BST
basically samuel if you believe that rlwimi requires an additional opcode test then you are on the wrong track and should take that into account and think why.
Comment 11 Luke Kenneth Casson Leighton 2020-08-01 10:42:10 BST
*click*.

no, you might be right.... no wait... nope :)

ok it's confusing.

* immediates come in on the imm_data part of the input record
* however it is the *CompUnit* that notices if imm_data.ok is set
* the *CompUnit* redirects imm_data into RB if imm_data.ok is set

therefore yes, by the time the pipeline receives anything, there
*are* no "immediates", and there's definitely no change of behaviour
based on whether it's a rlxxxxi

i'm going to expand out the for-loop here because the debug
reports from the formal proof are masked. every port there
is given the same line number from the source file, with no
"name"

38:        for p in rec.ports():
39:            comb += p.eq(AnyConst(p.width))

Value for anyconst in top (fu/shift_rot/formal/proof_main_stage.py:39): 1
Value for anyconst in top (fu/shift_rot/formal/proof_main_stage.py:39): 56
Value for anyconst in top (fu/shift_rot/formal/proof_main_stage.py:39): 1064
Comment 12 Luke Kenneth Casson Leighton 2020-08-01 10:50:21 BST
ok next stage, from those values, back-reverse-engineer the instruction
that *would* create those failed values into an actual instruction,
create a unit test for it and run it against qemu and ISACaller.
Comment 13 Luke Kenneth Casson Leighton 2020-08-01 12:20:47 BST
samuel i added this to the unit test: it passes.

--- a/src/soc/fu/shift_rot/test/test_pipe_caller.py
+++ b/src/soc/fu/shift_rot/test/test_pipe_caller.py
@@ -77,6 +77,14 @@ def set_alu_inputs(alu, dec2, sim):
 
 class ShiftRotTestCase(TestAccumulatorBase):
 
+    def case_0_proof_regression_rlwnm(self):
+        lst = ["rlwnm 3, 1, 2, 16, 20"]
+        initial_regs = [0] * 32
+        initial_regs[1] = 0x7ffdbffb91b906b9
+        initial_regs[2] = 31
+        print(initial_regs[1], initial_regs[2])
+        self.add_case(Program(lst, bigendian), initial_regs)
+

ah i think i know why.

look in rotator.py:

            with m.Case(0b00):
                comb += self.result_o.eq((rot & (mr & ml)) | (ra & ~(mr & ml)))
            with m.Case(0b01):
                comb += self.result_o.eq((rot & (mr | ml)) | (ra & ~(mr | ml)))


and the only (corresponding) case in the proof is:

                comb += exp_ol.eq(field((exp_rot & mrl) | (ainp & ~mrl),


so the detection for when the mask beginning is at the opposite end i.e.
(greater or less?) than the mask end, is missing.

basically from the spec:

    "A mask is generated having 1-bits from bit MB+32 through
     bit ME+32 and 0-bits elsewhere."

this mask actually *rotates* and hence the inversion thing above.

notice that the OR of mr and ml produces 0xffffffff? (MASK_FOR_RLC)

that should be 0x0000f800 because they are supposed to be ANDed

ml = 0xffffff800
mr = 0x00000ffff

and when 0x0000f800 is ANDed with the input it produces the correct
answer which is 0x00008000

however because the mask is ORed together (which it should not be)

                comb += mrl.eq(ml | mr)

the proof *incorrectly* asserts the correct output as being untrue.
Comment 14 Samuel A. Falvo II 2020-08-03 16:13:32 BST
(In reply to Luke Kenneth Casson Leighton from comment #13)
> so the detection for when the mask beginning is at the opposite end i.e.
> (greater or less?) than the mask end, is missing.
> 
> basically from the spec:
> 
>     "A mask is generated having 1-bits from bit MB+32 through
>      bit ME+32 and 0-bits elsewhere."
> 
> this mask actually *rotates* and hence the inversion thing above.
> 
> notice that the OR of mr and ml produces 0xffffffff? (MASK_FOR_RLC)
> 
> that should be 0x0000f800 because they are supposed to be ANDed
> 
> ml = 0xffffff800
> mr = 0x00000ffff

It looks like your test case is different from mine, as this is exactly not what I'm seeing.  I'm literally seeing ml = 0 and mr = -1.  Further, I'm seeing that MB < ME, so ml should not be equal to 0.

> however because the mask is ORed together (which it should not be)

The statement reads ml | mr because that was the final test I'd performed before checking everything in and giving up.  The test fails also with ml & mr, which as I'd documented previously, equals 0x00000000.

I'll add the MB > ME test and select & vs | based on its results.  If this makes a significant difference, then I'd be shocked, as the failing case would not at all indicate this was the problem.  I *should* be seeing ml and mr being disjoint masks if that were the case, but I'm not.
Comment 15 Samuel A. Falvo II 2020-08-03 16:19:22 BST
Updated repo to check for MB > ME and select ml &/| mr appropriately.  But, still same failure mode.  I've captured a screenshot.
Comment 16 Samuel A. Falvo II 2020-08-03 16:21:19 BST
Created attachment 95 [details]
Screenshot showing failure mode of shiftrot pipeline.

In this screenshot, you can see GTKWave's output for the failing case.  Note that MB < ME, which means that ml & mr should yield a non-zero mask.  ml should have the lower 16 bits set.  mr should have the top-most 20 bits set.  Instead, ml=0, and mr=0xFFFFFFFF.
Comment 17 Luke Kenneth Casson Leighton 2020-08-03 17:21:00 BST
(In reply to Samuel A. Falvo II from comment #14)
> (In reply to Luke Kenneth Casson Leighton from comment #13)
> > so the detection for when the mask beginning is at the opposite end i.e.
> > (greater or less?) than the mask end, is missing.
> > 
> > basically from the spec:
> > 
> >     "A mask is generated having 1-bits from bit MB+32 through
> >      bit ME+32 and 0-bits elsewhere."
> > 
> > this mask actually *rotates* and hence the inversion thing above.
> > 
> > notice that the OR of mr and ml produces 0xffffffff? (MASK_FOR_RLC)
> > 
> > that should be 0x0000f800 because they are supposed to be ANDed
> > 
> > ml = 0xffffff800
> > mr = 0x00000ffff
> 
> It looks like your test case is different from mine, 

hmm i was expecting symbiyosys to be deterministic.

if it is, you needed to have tested exactly the same code otherwise it woukd be different.

of course, the other way is to explicitly set the same input values in the proof itself.

> as this is exactly not
> what I'm seeing.  I'm literally seeing ml = 0 and mr = -1.  Further, I'm
> seeing that MB < ME, so ml should not be equal to 0.

ok so here's what i did:

* examined the proof debug assert failure logs
* associated each failure with the signal it came from
* back-reverse-engineer the instruction that would create the required MB, ME, abd OP_RLC
* create a unit test with those values
* get confidence that the unit tests succeeds (which tells us that the proof is wrong)
* compare the gtkwave files and look fir discrepancies.

> > however because the mask is ORed together (which it should not be)
> 
> The statement reads ml | mr because that was the final test I'd performed
> before checking everything in and giving up.  The test fails also with ml &
> mr, which as I'd documented previously, equals 0x00000000.

the problem with doing that is that the engine simply finds different inputs that cause *that* to fail instead.

so ml and mr get different values.

> I'll add the MB > ME test and select & vs | based on its results.

remember that you also need, when doing &, to invert it.

check rotator.py closely, again.


>  If this
> makes a significant difference, then I'd be shocked, as the failing case
> would not at all indicate this was the problem.  I *should* be seeing ml and
> mr being disjoint masks if that were the case, but I'm not.

MASK() which is the function back in the soec, very early on, is really unclear to me, unfortunately.  you might be able to make sense of it.
Comment 18 Samuel A. Falvo II 2020-08-04 03:49:11 BST
(In reply to Luke Kenneth Casson Leighton from comment #17)
> remember that you also need, when doing &, to invert it.
> 
> check rotator.py closely, again.

I did check this file, but I don't see where the mask is inverted.  The closest thing I can find to an inversion is this:

            with m.Case(0b00):
                comb += self.result_o.eq((rot & (mr & ml)) | (ra & ~(mr & ml)))
            with m.Case(0b01):
                comb += self.result_o.eq((rot & (mr | ml)) | (ra & ~(mr | ml)))

where bits from the un-rotated RA value are mixed into the final output via the negated mask.  I'm already doing that in the proof.  Inverting mask bits when MB > ME also contradicts the definition of the MASK() operation, documented on page 6 of Book I.

> MASK() which is the function back in the soec, very early on, is really
> unclear to me, unfortunately.  you might be able to make sense of it.

Based on my understanding of how rwlinm, et. al. seem to work (per the specs), MASK() takes a start and end position, in IBM bit position notation.  Starting at bit x (start), the mask contains 1 bits.  It ends (optionally wrapping) at bit y (end).  All other bits are 0.  So, let's pretend I'm using an 8-bit mask; then,

  MASK(1,7) = 0b01111111
  MASK(7,1) = 0b11000001
  MASK(3,4) = 0b00011000
  MASK(4,3) = 0b11111111

---

I apologize for taking so long on this.  This is a real mind-bender for me.  I'm increasingly inclined to put this task aside and work on something else; it seems intractable for my skill level.  I'm frustrated that I've been going on a week, and nothing to really show for it.
Comment 19 Luke Kenneth Casson Leighton 2020-08-04 09:18:50 BST
(In reply to Samuel A. Falvo II from comment #18)

> I did check this file, but I don't see where the mask is inverted.  The
> closest thing I can find to an inversion is this:
> 
>             with m.Case(0b00):
>                 comb += self.result_o.eq((rot & (mr & ml)) | (ra & ~(mr &
> ml)))
>             with m.Case(0b01):
>                 comb += self.result_o.eq((rot & (mr | ml)) | (ra & ~(mr |
> ml)))

yes that's it.  and using boolean remapping one of those could be done as, let's pick the 1st one

(rot & ~(mr | ml)) | (ra & ~(mr & ml)))

see how they're the bit-opposite?

> where bits from the un-rotated RA value are mixed into the final output via
> the negated mask.  I'm already doing that in the proof.  Inverting mask bits
> when MB > ME also contradicts the definition of the MASK() operation,
> documented on page 6 of Book I.

it's confusing as hell because all four possible combinations of and or not of ml and mr are involved in those 2 lines in rotator.py
 

> Based on my understanding of how rwlinm, et. al. seem to work (per the
> specs), MASK() takes a start and end position, in IBM bit position notation.
> Starting at bit x (start), the mask contains 1 bits.  It ends (optionally
> wrapping) at bit y (end).  All other bits are 0.  So, let's pretend I'm
> using an 8-bit mask; then,
> 
>   MASK(1,7) = 0b01111111
>   MASK(7,1) = 0b11000001
>   MASK(3,4) = 0b00011000
>   MASK(4,3) = 0b11111111

ahh now that is clear.  oh ah are you sure about 4,3? should that be 4,3=0b11011111?
 

> I apologize for taking so long on this.  This is a real mind-bender for me. 

likewise :)

> I'm increasingly inclined to put this task aside and work on something else;
> it seems intractable for my skill level.  I'm frustrated that I've been
> going on a week, and nothing to really show for it.

ok give it a rest, we can try mul and come back to this one.
Comment 20 Samuel A. Falvo II 2020-08-04 21:49:49 BST
(In reply to Luke Kenneth Casson Leighton from comment #19)
> (rot & ~(mr | ml)) | (ra & ~(mr & ml)))
> 
> see how they're the bit-opposite?

This doesn't satisfy DeMorgan's Theorem identities:

x AND y = ~(~x OR ~y)

It looks like you'll also need to invert mr and ml on the left-hand transformation as well.  Am I missing something?

> ahh now that is clear.  oh ah are you sure about 4,3? should that be
> 4,3=0b11011111?

I did some searching, and it seems that this StackOverflow response supports my understanding of how the mask is constructed:

https://stackoverflow.com/questions/30896622/understanding-powerpc-rlwinm-instruction

It looks like IBM uses inclusive bit numbering for the mask on both ends.  So, if MB and ME are 3-bit fields (implying an 8-bit mask, as in the previous example), then a mask of (0,7) == (8,7) == all bits set.  So, for a 32-bit wide mask, the only way to produce all bits set is if ((MB AND 31) == ((ME+1) AND 31)).

> ok give it a rest, we can try mul and come back to this one.

I'll keep toying with it from time to time.  I despise defeat.
Comment 21 Luke Kenneth Casson Leighton 2020-08-04 22:16:24 BST
(In reply to Samuel A. Falvo II from comment #20)

> This doesn't satisfy DeMorgan's Theorem identities:
> 
> x AND y = ~(~x OR ~y)
> 
> It looks like you'll also need to invert mr and ml on the left-hand
> transformation as well.  Am I missing something?

no, i get boolean logic wrong :)
 
> > ahh now that is clear.  oh ah are you sure about 4,3? should that be
> > 4,3=0b11011111?
> 
> I did some searching, and it seems that this StackOverflow response supports
> my understanding of how the mask is constructed:
> 
> https://stackoverflow.com/questions/30896622/understanding-powerpc-rlwinm-
> instruction
> 
> It looks like IBM uses inclusive bit numbering for the mask on both ends. 
> So, if MB and ME are 3-bit fields (implying an 8-bit mask, as in the
> previous example), then a mask of (0,7) == (8,7) == all bits set.  So, for a
> 32-bit wide mask, the only way to produce all bits set is if ((MB AND 31) ==
> ((ME+1) AND 31)).

ahh this explains why right_mask (in rotator.py) is one different from left_mask.

> 
> > ok give it a rest, we can try mul and come back to this one.
> 
> I'll keep toying with it from time to time.  I despise defeat.

:)
Comment 22 Luke Kenneth Casson Leighton 2020-08-19 22:28:35 BST
after realising that the inputs to OP_MUL_H32 can only be valid if op.is_32bit is true, it occurred to me that we have the exact same issue here.
Comment 23 Luke Kenneth Casson Leighton 2020-08-19 22:36:32 BST
with m.Elif(self.right_shift):
  # this is basically mb = sh + (is_32bit? 32: 0);

i noticed this is rotator.py and so when creating the mask it is offset by 32 bits.
Comment 24 Luke Kenneth Casson Leighton 2020-08-19 22:41:28 BST
i checked in the microwatt code, and yes, OP_RLC can be 64 bit, these are rldwm

https://github.com/antonblanchard/microwatt/blob/6d6cf59bb76b907abb9bd43c1e95dee94ee38afa/decode1.vhdl#L158
Comment 25 Samuel A. Falvo II 2020-09-05 00:16:48 BST
I'm running into a situation where computing ((1ULL << 63) << 0) == 0 instead of ((1ULL << 63) << 0) == (1ULL << 63).  

        # main assertion of arithmetic operations
        with m.Switch(itype):

            # left-shift: 64/32-bit
            with m.Case(MicrOp.OP_SHL):
                comb += Assume(ra == 0)
                with m.If(rec.is_32bit):
                    comb += Assert(o[0:32] == (rs << b[0:6])[0:32])
                    comb += Assert(o[32:64] == 0)
                with m.Else():
                    comb += DESIRED_sig.eq((rs << b[0:7])[0:64])
                    comb += Assert(o == DESIRED_sig)  #<--- assertion fails here.


From what I can discover, based on GTKWave investigation, it looks like the Rotator module is operating in mode 0, which applies both mr and ml masks to the output.  The problem is, while ml = 0xFFFFFFFFFFFFFFFF, mr is 0x0000000000000000.  This means that (ml & mr) == 0, which means the actual rotator output is also zero.

This begs the question, when invoking an OP_SHL type instruction, where mb=me=0, should it be operating in mode 0?  Should the masks be generated differently for these types of instructions?  Etc.
Comment 26 Luke Kenneth Casson Leighton 2020-09-05 00:44:39 BST
(In reply to Samuel A. Falvo II from comment #25)

> This begs the question, when invoking an OP_SHL type instruction, where
> mb=me=0, should it be operating in mode 0?  Should the masks be generated
> differently for these types of instructions?  Etc.

ok looking at p109 v3.0B spec, "sld" and "srd", the pseudocode has,
for sld, a MASK(0, 63-n) where n=RB.

and for slw (p107) it is MASK(32, 63-n)


so there is only one mask (mb?), the other one should be "Assume(me? == 0)".
or something.  maybe 0xffffffffffffffff, i don't know.

for OP_SHR it will be the other mask.

looking at the Rotator class, "input mode=0b0000" means:

* right_rhift=0,
* arith=0
* clear_left=0
* clear_right=0

this results in "output mode" (line 160) == 0b00 which yes, gives
us rot & (mr & ml)

wherrrrre.... how do we get me and mb both equal to zero?

oh hang on, look at the pseudocode again.

r = ROTL64((RS), n)
if (RB)57 = 0 then
     m = MASK(0, 63-n)

that means: if RB is between 64 and 127, the output is zero, simple as that.

i'll add a unit test for that, see how it goes.
Comment 27 Luke Kenneth Casson Leighton 2020-09-05 00:47:33 BST
    def case_sld_rb_too_big(self):
        lst = ["sld 3, 1, 4",
               ]
        initial_regs = [0] * 32
        initial_regs[1] = 0xffffffffffffffff
        initial_regs[4] = 64 # too big, output should be zero
        self.add_case(Program(lst, bigendian), initial_regs)

yep, output is zero.
Comment 28 Luke Kenneth Casson Leighton 2020-09-05 00:51:31 BST
(In reply to Samuel A. Falvo II from comment #25)
> I'm running into a situation where computing ((1ULL << 63) << 0) == 0
> instead of ((1ULL << 63) << 0) == (1ULL << 63).  
> 
>         # main assertion of arithmetic operations
>         with m.Switch(itype):
> 
>             # left-shift: 64/32-bit
>             with m.Case(MicrOp.OP_SHL):
>                 comb += Assume(ra == 0)
>                 with m.If(rec.is_32bit):

i think we should have "comb += Assume(b[5] == 0) here, to match the
behaviour of slw (v3.0B spec, p107)

>                     comb += Assert(o[0:32] == (rs << b[0:6])[0:32])
>                     comb += Assert(o[32:64] == 0)
>                 with m.Else():

add "comb += Assume(b[6] == 0) here and i believe things will "work"

>                     comb += DESIRED_sig.eq((rs << b[0:7])[0:64])
>                     comb += Assert(o == DESIRED_sig)  #<--- assertion fails
> here.
Comment 29 Samuel A. Falvo II 2020-09-05 01:07:49 BST
(In reply to Luke Kenneth Casson Leighton from comment #28)
> add "comb += Assume(b[6] == 0) here and i believe things will "work"

I still get the failure.
Comment 30 Samuel A. Falvo II 2020-09-05 01:11:28 BST
Created attachment 101 [details]
Showing where ((1<<63) << 0) == 0
Comment 31 Samuel A. Falvo II 2020-09-05 01:29:12 BST
Added a unit test covering the failing proof case.  However, I'm unable to run the unit tests for some reason.  I get a big pile of errors.  I need to review how to run unit tests; there's something in my environment which I'm missing.

*sigh*

Very frustrating to have error upon error upon error upon error.
Comment 32 Luke Kenneth Casson Leighton 2020-09-05 03:07:45 BST
(In reply to Samuel A. Falvo II from comment #31)
> Added a unit test covering the failing proof case.  However, I'm unable to
> run the unit tests for some reason.  I get a big pile of errors. 

you'll need to be a lot more specific.  "it don't work" means nothing to anyone, i'm afraid.

first thing: global search replace "def case_" with "def cse_" except on the one test to run.

this drastically cuts down output to only that which is useful.

then run only that one test with nohup and attach the output, here.

it is highly likely that you do not have all dependencies installed.  this is dead easy to spot (import error).
Comment 33 Luke Kenneth Casson Leighton 2020-09-05 04:14:30 BST
taking a look at the mask generation, microwatt added an extra test in the left mask, it may be a genuine bug (yay for formal proofs) will check tomorrow.
Comment 34 Luke Kenneth Casson Leighton 2020-09-05 14:00:12 BST
ran the unit test, samuel: it passed.

['RB', 'RS']
reading reg 8
reading reg 1
[SelectableInt(value=0x0, bits=64), SelectableInt(value=0x8000000000000000, bits=64)]
getitem slice(58, 64, None) 64 0x0
getitem 57 64 0x0
__eq__ SelectableInt(value=0x0, bits=1) 0
    eq 0 0 True
globals after SelectableInt(value=0x0, bits=64) SelectableInt(value=0x4, bits=64)
args[0] SelectableInt(value=0x0, bits=64) SelectableInt(value=0x4, bits=64)
(SelectableInt(value=0x8000000000000000, bits=64),)
after func SelectableInt(value=0x0, bits=64) SelectableInt(value=0x4, bits=64)
carry already done? 0b0
internal overflow None 0 0
writing reg 3 SelectableInt(value=0x8000000000000000, bits=64)
end of call SelectableInt(value=0x0, bits=64) SelectableInt(value=0x4, bits=64)
check extra output 'sld 3, 1, 8' 0 0
hw outputs {'o': 9223372036854775808}
sim outputs {'o': 9223372036854775808}
expected int sim 8000000000000000, actual: 8000000000000000
.
----------------------------------------------------------------------
Ran 1 test in 0.901s

OK
Comment 35 Luke Kenneth Casson Leighton 2020-09-07 11:59:17 BST
samuel it occurred to me that, just like in MUL, there may be some
spurious inputs being "wiggled" by the proof engine that should never
have been enabled in the first place.

examples of this include the "op.is_signed" flag - which should *definitely*
be False - for OP_SHL.

it may be worthwhile adding some Asserts on rotator itself

change this:

        # set up microwatt rotator module
        m.submodules.rotator = rotator = Rotator()

to this:

        # set up microwatt rotator module
        m.submodules.rotator = self.rotator = rotator = Rotator()

and that will allow you to get at it.

also:

        # instruction rotate type
        mode = Signal(4, reset_less=True)

change that to

        # instruction rotate type
        self.mode = mode = Signal(4, reset_less=True)

also it occurs to me that mb_extra - one of the key fields going in to
rotator - has not been set/asserted/assumed. 

therefore it could be adversely affecting the test.

the construction of mb and me *might* not be correct.  it's a little
more complicated (as you can see if you look in rotator.py) you have
to bring in a 6th bit... *sometimes*.
Comment 36 Samuel A. Falvo II 2020-09-07 23:38:49 BST
(In reply to Luke Kenneth Casson Leighton from comment #35)
> samuel it occurred to me that, just like in MUL, there may be some
> spurious inputs being "wiggled" by the proof engine that should never
> have been enabled in the first place.
> 
> examples of this include the "op.is_signed" flag - which should *definitely*
> be False - for OP_SHL.

I added this to the test:

    comb += Assume(~rec.is_signed)
    comb += Assume(~dut.i.ctx.op.is_signed)

but this did not change the outcome.  :(

I'll report my results looking at the other things separately.
Comment 37 Cole Poirier 2021-02-08 21:17:51 GMT
Taking as there hasn't been activity on this since September 2020
Comment 38 Jacob Lifshay 2022-02-22 09:48:34 GMT
I split up the formal proof into a per-instruction proof, allowing the proofs to be run in parallel, greatly speeding up the tests (if run with `pytest -n auto` or similar).

The code is mid-rewrite and very WIP, but it at least implements everything that was there before (except checking Data.ok, which I forgot). The new code also implements CA[32]. All non-rotate tests pass.

https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=442546d5bffafb8571f50b1a4dc48e83ace1ce47
Comment 39 Jacob Lifshay 2022-02-22 10:32:28 GMT
afaict samuel did some work on this too, icr if he said something about not wanting to be paid...if not, we should add him to the list of payees.
Comment 40 Luke Kenneth Casson Leighton 2022-02-22 10:34:21 GMT
(In reply to Jacob Lifshay from comment #38)
> I split up the formal proof into a per-instruction proof, allowing the
> proofs to be run in parallel, greatly speeding up the tests (if run with
> `pytest -n auto` or similar).

nice.

> The code is mid-rewrite and very WIP, but it at least implements everything
> that was there before (except checking Data.ok, which I forgot). The new
> code also implements CA[32]. All non-rotate tests pass.

briilliant.  this one kicked all our asses, there's some combinations
of the inputs to the Rotate module which are possible to set (imagine
a wire that is floating, and can randomly set 1 or 0) that *never*
get set by the main_stage.py pipeline... but Formal Correctness Engines
don't know that, and, in the absence of any Asserts/Assumes, will
detect and advise you of every single corner-case and permutation
that is "floating":

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/rotator.py;h=7c3d811c8fa0402a70d5a3e1551e8ecb83873280;hb=20ffa8ec223be44fc37df5184ca09d648e85a844#l49

basically there are combinations of those inputs that are never under any
legitimate circumstances in the Power ISA going to be set, and they need 
locking down.
Comment 41 Luke Kenneth Casson Leighton 2022-02-22 10:35:09 GMT
(In reply to Jacob Lifshay from comment #39)
> afaict samuel did some work on this too, icr if he said something about not
> wanting to be paid...if not, we should add him to the list of payees.

i leave that to you to contact him. he took himself off the cc list.
Comment 42 Jacob Lifshay 2022-02-24 02:45:22 GMT
added formal proof for OP_RLC

the way OpenPower decided to split up and recombine the mb field of rldimi was quite annoying to get right!