565 – Improve formal verification on PartitionedSignal

Bug 565 - Improve formal verification on PartitionedSignal

Summary: Improve formal verification on PartitionedSignal

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Formal Verification (show other bugs)
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Cesar Strauss

URL:	https://lists.libre-soc.org/pipermail...

Depends on:
Blocks:	132
	Show dependency tree / graph

Reported:	2021-01-03 11:42 GMT by Cesar Strauss
Modified:	2022-09-01 20:19 BST (History)
CC List:	3 users (show)

See Also:	132 575 176 596
NLnet milestone:	NLNet.2019.10.032.Formal
total budget (EUR) for completion of task and all subtasks:	3000
budget (EUR) for this task, excluding subtasks' budget:	3000
parent task for budget allocation:	196
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	lkcl = { amount = 300, submitted = 2022-08-28, paid = 2022-08-31 } cesar = { amount = 2200, submitted = 2022-06-16, paid = 2022-07-21 } [jacob] amount = 500 submitted = 2022-07-06 paid = 2022-07-21

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Cesar Strauss 2021-01-03 11:42:30 GMT

On 01/02/2021 15:42, Luke Kenneth Casson Leighton wrote:
> we do need formal verification of PartitionedSignal, but in a way that it
> obvious how it works, so that it is easy to read the proof code.

I took a look at ieee754fpu/src/ieee754/part* and its proofs. I see what you mean.

I think I have an idea.

The approach is to create nested for loops that will enumerate all possible partitions of varying widths and positions. Then, check whether that partition is "complete" (delimited by ones on each side) and "unbroken" (all zeros inside). If so, compute the expected result, compare and assert.

It's horribly inefficient on resources (O(n^2) I think). Let's hope the proof is not too slow.

Comment 1 Luke Kenneth Casson Leighton 2021-01-03 12:27:02 GMT

(In reply to Cesar Strauss from comment #0)
> On 01/02/2021 15:42, Luke Kenneth Casson Leighton wrote:
> > we do need formal verification of PartitionedSignal, but in a way that it
> > obvious how it works, so that it is easy to read the proof code.
> 
> I took a look at ieee754fpu/src/ieee754/part* and its proofs. I see what you
> mean.

yes, the propagation across the partition points, when done optimally, is challenging to understand.

> I think I have an idea.
> 
> The approach is to create nested for loops that will enumerate all possible
> partitions of varying widths and positions. 

this can be done as a python for-loop that adds AST for each case, rather than an explicit hard-coded switch statement, just like in the Decoder.

> Then, check whether that
> partition is "complete" (delimited by ones on each side) and "unbroken" (all
> zeros inside). If so, compute the expected result, compare and assert.

an option there is to compute all 8-bit results @ offsets 0..N-1
all 16-bit results @ offsets 0..N-2
and so on.

then to "connect" those with the appropriate partition-derived test.

> It's horribly inefficient on resources (O(n^2) I think). Let's hope the
> proof is not too slow.

well, i would expect it to be dreadful for shift and multiply but tolerable for add, compares, OR etc.  however if the initial tests are only 32-bit and 3 partition points it should complete in reasonable time.

another speed-up trick would be to run multiple tests in parallel, with subsets of the permutations disallowed.  however... that would only be appropriate for speeding up testing and development.

also if you can, Cesar, use a higher-order-function (or, plan to add one, as an iterative incremental development) for the computation of the "thing" and the "thing to compare against".

the partitioning itself is identically-coded in all cases and it will save having to repeat the scheme you devised above.  this is important because we will need to revise some of the eq/etc. code to "propagate" compare/tests across all bits.  currently eq/gt/lt produce this:

     if a > b return 0b0001 # partition size of 32-bit

where what is actually needed is this:

     if a > b return 0b1111 # one bit set in each partition to indicate "True"

luckily there is a class which performs this job, it propagates the LSB in a partition, repeated up to the end of a partition.

Comment 2 Cesar Strauss 2021-01-03 18:20:44 GMT

New idea: take an arbitrary partition, by choosing its start point and size at random. Use "Assume" to ensure it is a whole unbroken partition (start and end points are one, with only zeros in between). Shift inputs and outputs down to zero. Loop over all possible partition sizes and, if it's the right size, compute the expected value, compare with the result, and assert.

I suspect it's still O(n^2), but it seems simpler (no nested for loop) in spite of some extra setup work.

To test the above, an idea is to create a partition pattern generator, that outputs an unique pattern, depending on partition size. For instance, on a size 3 partition, it could be: 0x31, 0x32, 0x33. 

(In reply to Luke Kenneth Casson Leighton from comment #1)
> also if you can, Cesar, use a higher-order-function (or, plan to add one, as
> an iterative incremental development) for the computation of the "thing" and
> the "thing to compare against".

Sure.

Comment 3 Luke Kenneth Casson Leighton 2021-01-03 18:41:20 GMT

(In reply to Cesar Strauss from comment #2)
> New idea: take an arbitrary partition, by choosing its start point and size
> at random. Use "Assume" to ensure it is a whole unbroken partition (start
> and end points are one, with only zeros in between). Shift inputs and
> outputs down to zero. Loop over all possible partition sizes and, if it's
> the right size, compute the expected value, compare with the result, and
> assert.

unfortunately we have to be careful: ultimately a full permutation proof has to be done.

perhaps a simpler task would be to first write a "length+offset" proof.  i.e. to write a proof of a sub-class that turns partitions into "length of partition plus the offset that it starts from".

this only needs one for-loop:

    for partitionbits in range(1<<partitionlength):
        # analyse partition bits, looking for runs of zeros

if this were expanded out from 32-bit with 3 partition bits, it would produce:

0 0 0 -> 8,0   8,8  8,16  8,24
0 0 1 -> 16,0       8,16  8,24
0 1 0 -> 8,0   16,8       8,24
0 1 1 -> 24,0             8,24
1 0 0 -> 8,0   8,8  16,16
1 0 1 -> 16,0       16,16
1 1 0 -> 8,0   24,8
1 1 1 -> 32,0

a *python* function can analyse the partitionbits and Assert those patterns


    with m.Switch(partitionsignal):
       for partitionbits in range(1<<partitionlength):
           # analyse partition bits, looking for runs of zeros
           l = create_patterns(partitionbits) # above table
           with m.Case(partitionbits):
               # check each pattern from the list

> I suspect it's still O(n^2), but it seems simpler (no nested for loop) in
> spite of some extra setup work.
> 
> To test the above, an idea is to create a partition pattern generator, that
> outputs an unique pattern, depending on partition size.

a for-loop can go through them all, the thing is it doesn't have to be
done manually and by rote like this:

   https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_cmp/formal/proof_gt.py;hb=HEAD

but it *can* be done in a similar way, with the m.Case() being in a for-loop.

    with m.Switch(partitionsignal):
       for partitionbits in range(1<<partitionlength):
           with m.Case(partitionbits):
                Assert(....)

Comment 4 Cesar Strauss 2021-01-03 19:01:42 GMT

(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Cesar Strauss from comment #2)
> > New idea: take an arbitrary partition, by choosing its start point and size
> > at random. Use "Assume" to ensure it is a whole unbroken partition (start
> > and end points are one, with only zeros in between). Shift inputs and
> > outputs down to zero. Loop over all possible partition sizes and, if it's
> > the right size, compute the expected value, compare with the result, and
> > assert.
> 
> unfortunately we have to be careful: ultimately a full permutation proof has
> to be done.

Maybe not: by letting the formal engine choose the size and offset, I think we are indeed covering all possible cases. It's just that we test only one partition at a time (the partition that the formal engine selected) instead of all, and let the formal engine do the outer loop for us. Reminds me of SIMT.

Comment 5 Luke Kenneth Casson Leighton 2021-01-03 19:17:31 GMT

(In reply to Cesar Strauss from comment #4)

> Maybe not: by letting the formal engine choose the size and offset, I think
> we are indeed covering all possible cases. It's just that we test only one
> partition at a time (the partition that the formal engine selected) instead
> of all, and let the formal engine do the outer loop for us. Reminds me of
> SIMT.

... that's the problem: we cannot assume that the testing of one partition will not cause interactions (due to coding mistakes) that result in the corruption of other partitions.

the way that the inference engines work is to exhaustively search for the combinations of inputs (and outputs!) that cause a failure.

or, are you speaking of turning the for-loops around (on their head), such
that you start from the *lengths* (and positions) and perform the Assert on the resultant partition bits?  that would be interesting.

in other words, you have patterns as follows:

  8-bit offsets 0,1,2,3
  16-bit offsets 0,1,2
  24-bit offsets 0,1
  32-bit

* for 8-bit the partition bit between the two must be zero
* for 16-bit the partition bit at the offset must be 1 and the partition bit
  after it must be zero
* for 24-bit the partition bits at the offset and at offset+1 must be 1 and at
  offset+2 must be zero
* for 32-bit the 3 bits must be 1 and you run out of partition bits to test for zero

this would be efficient/effective in that things being tested (arith) would only be tested once.

interesting.  and it has the advantage that "it's not the same underlying algorithm as the code it's testing" which is always a good thing.

Comment 6 Cesar Strauss 2021-01-03 19:52:13 GMT

(In reply to Luke Kenneth Casson Leighton from comment #5)
> or, are you speaking of turning the for-loops around (on their head), such
> that you start from the *lengths* (and positions) and perform the Assert on
> the resultant partition bits?  that would be interesting.

That's the idea, yes.
Only, since partition bits are inputs, we use Assume, not Assert.
We do Assert the result.

> 
> in other words, you have patterns as follows:
> 
>   8-bit offsets 0,1,2,3
>   16-bit offsets 0,1,2
>   24-bit offsets 0,1
>   32-bit
> 
> * for 8-bit the partition bit between the two must be zero
> * for 16-bit the partition bit at the offset must be 1 and the partition bit
>   after it must be zero
> * for 24-bit the partition bits at the offset and at offset+1 must be 1 and
> at
>   offset+2 must be zero
> * for 32-bit the 3 bits must be 1 and you run out of partition bits to test
> for zero

Exactly.

Well, only:
1) You have the polarity of the gates inverted. For instance, 32-bit is all zeros, see https://libre-soc.org/3d_gpu/architecture/dynamic_simd/eq/

2) I will also check that bit before the offset is 1. For instance, 16-bit is 101. Otherwise, you can be actually at the end of a bigger partition.

3) I will add two guard bits (= 1) at each end, so 32-bit is not a special case.

> this would be efficient/effective in that things being tested (arith) would
> only be tested once.
> 
> interesting.  and it has the advantage that "it's not the same underlying
> algorithm as the code it's testing" which is always a good thing.

Indeed.

Comment 7 Luke Kenneth Casson Leighton 2021-01-03 20:19:58 GMT

(In reply to Cesar Strauss from comment #6)

> Only, since partition bits are inputs, we use Assume, not Assert.
> We do Assert the result.

got it.
 

> Well, only:
> 1) You have the polarity of the gates inverted. For instance, 32-bit is all
> zeros, see https://libre-soc.org/3d_gpu/architecture/dynamic_simd/eq/

yes, you probably noticed i get order/polarity wrong

> 2) I will also check that bit before the offset is 1. For instance, 16-bit
> is 101. Otherwise, you can be actually at the end of a bigger partition.
> 
> 3) I will add two guard bits (= 1) at each end, so 32-bit is not a special
> case.

good idea.  if you can also parameterise the length but keep to smaller lengths initially, for testing, an O(N^2) algorithm on 64 bit could take some time, and shift or mul may even run for several hours.

i like the plan, Cesar.

Comment 8 Cesar Strauss 2021-01-08 18:47:00 GMT

The story so far:

https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/formal/proof_partition.py;h=c8d15a2a61869941a6ed623352f90374d4e67ce7;hb=HEAD

* Implemented a partitioned pattern generator
* Implemented a prover, using the above as a test case

Next:

* Make the prover generic, avoiding repetition on the next proofs
* Implement proofs of the remaining partitioned operations

This code assumes equally spaced partition points. Otherwise it would have to loop over all possible start and end points, instead of just the possible widths, making it O(n^3).

Comment 9 Luke Kenneth Casson Leighton 2021-01-08 20:18:46 GMT

(In reply to Cesar Strauss from comment #8)
> The story so far:

:)

> This code assumes equally spaced partition points.

i think we can get away with that.  the other permutation set would be "aligned" ones.  so 16-8-8 but not 8-16-8

Comment 10 Cesar Strauss 2021-01-08 20:27:03 GMT

(In reply to Luke Kenneth Casson Leighton from comment #9)
> > This code assumes equally spaced partition points.
> 
> i think we can get away with that.  the other permutation set would be
> "aligned" ones.  so 16-8-8 but not 8-16-8

Both 16-8-8 and 8-16-8 are supported. They correspond respectively to gate patterns 011 and 101.

What is not supported is, for instance, 111 being 16-16-8-8 instead of 8-8-8-8.

Comment 11 Jacob Lifshay 2021-01-08 21:39:06 GMT

Note the partitioned multiplier currently assumes partitions are aligned, so 8-16-8 is invalid.

Comment 12 Cesar Strauss 2021-01-08 22:40:51 GMT

(In reply to Jacob Lifshay from comment #11)
> Note the partitioned multiplier currently assumes partitions are aligned, so
> 8-16-8 is invalid.

Understood.

There need to be a way to randomly generate a legal partitioning in this case, either by construction, or by constraining the output.

Comment 13 Jacob Lifshay 2021-01-08 22:56:48 GMT

A recursive function to create legal partitions:

def legal_partitions(log2_size):
    yield [0] * ((1 << log2_size) - 1)
    if log2_size == 0:
        return
    assert log2_size > 0
    next_smaller_partitions = list(legal_partitions(log2_size - 1))
    for left in next_smaller_partitions:
        for right in next_smaller_partitions:
            yield [*left, 1, *right]

for log2_size in range(4):
    print(f"log2_size={log2_size}:")
    for p in legal_partitions(log2_size):
        print(p)

Comment 14 Cesar Strauss 2021-01-09 10:41:06 GMT

(In reply to Jacob Lifshay from comment #13)
> A recursive function to create legal partitions:

Thanks. 32-bit is only 5 valid combinations, and 64-bit is still only 26. I think it can be done simply by enumerating them into a lookup-table, indexed by a random value.

(In reply to Cesar Strauss from comment #8)
> * Make the prover generic, avoiding repetition on the next proofs

I factored out the gate pattern generator, this should simplify proofs.

> * Implement proofs of the remaining partitioned operations

This is next. I'll begin with PartitionedEq.

Comment 15 Luke Kenneth Casson Leighton 2021-01-09 12:38:41 GMT

(In reply to Cesar Strauss from comment #14)

> > * Implement proofs of the remaining partitioned operations
> 
> This is next. I'll begin with PartitionedEq.

remember that where PartitionedEq currently returns 0b00001 (LSB and zeros) in each partition, this needs to be modified to return 0b11111 to indicate "equal".
that is done (easily) with nmutil.ripple.RippleLSB

also, that PartitionedEq itself isn't actually in use: it was one of the early intermediate stages of development, the actual code in use is 
PartitionedEqGtGe.  that also needs to be run through RippleLSB as its last stage, to modify the output.

ultimately PartitionedEqGtGe will need to be revised rather than post-processed: given that it took 3 weeks to write this can be done during an optimisation pass.

Comment 16 Cesar Strauss 2021-01-09 16:19:28 GMT

(In reply to Luke Kenneth Casson Leighton from comment #15)
> (In reply to Cesar Strauss from comment #14)
> 
> > > * Implement proofs of the remaining partitioned operations
> > 
> > This is next. I'll begin with PartitionedEq.

Done. It passes! See:

https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_cmp/experiments/formal/proof_partitioned_eq.py;h=ba222a86356b744906fe05516ffb7e257e938cdd;hb=HEAD

> remember that where PartitionedEq currently returns 0b00001 (LSB and zeros)
> in each partition, this needs to be modified to return 0b11111 to indicate
> "equal".
> that is done (easily) with nmutil.ripple.RippleLSB

Understood. I added post processing by RippleLSB in the latest commit.

> also, that PartitionedEq itself isn't actually in use: it was one of the
> early intermediate stages of development, the actual code in use is 
> PartitionedEqGtGe.  that also needs to be run through RippleLSB as its last
> stage, to modify the output.

Sure. I guess it was a stepping stone. Will do PartitionedEqGtGe next.

Comment 17 Cesar Strauss 2021-01-09 17:44:38 GMT

(In reply to Cesar Strauss from comment #16)
> Will do PartitionedEqGtGe next.

Done!

It was very similar to PartitionedEq, just a few more lines to check the different opcodes.

By the way, I'm testing the full 64-bit width, 8 times 8-bit partitions. The run-time is still just a few seconds. Let's see how it goes.

I will now proceed to work on the remaining operations.

Comment 18 Luke Kenneth Casson Leighton 2021-01-09 18:06:49 GMT

(In reply to Cesar Strauss from comment #17)

> Done!
> 
> It was very similar to PartitionedEq, just a few more lines to check the
> different opcodes.

excellent.  yes it should be pretty easy, if abstracted out.  you shouuld be able to use python operator class.  operator.eq, etc. just pass that in as a parameter and "apply" it to both the PartitionedSignal *and* the Signal to assert against.
 
> By the way, I'm testing the full 64-bit width, 8 times 8-bit partitions. The
> run-time is still just a few seconds

cool!

> I will now proceed to work on the remaining operations.

shift and mul will be extremely long, thus is just normal.

Comment 19 Cesar Strauss 2021-01-10 17:18:27 GMT

A few files dealing with PartitionedSignal use the following idiom to get the width of a Signal:

width = sig.shape()[0]

In the latest git master, Shape is no longer a tuple, breaking this idiom.

I'll look into changing these to:

width = len(sig)

The above works since the last nMigen release, so it shouldn't break things for anyone.

Comment 20 Luke Kenneth Casson Leighton 2021-01-10 17:32:29 GMT

(In reply to Cesar Strauss from comment #19)

> I'll look into changing these to:
> 
> width = len(sig)
> 
> The above works since the last nMigen release, so it shouldn't break things
> for anyone.

yehyeh no should be fine

Comment 21 Cesar Strauss 2021-01-10 22:18:35 GMT

(In reply to Luke Kenneth Casson Leighton from comment #18)
> (In reply to Cesar Strauss from comment #17)
> excellent.  yes it should be pretty easy, if abstracted out.  you shouuld be
> able to use python operator class.  operator.eq, etc. just pass that in as a
> parameter and "apply" it to both the PartitionedSignal *and* the Signal to
> assert against.

Done, for all comparison operators.

https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/formal/proof_partition.py;h=f257d0243b897456d4aded6de62b5d12cae2e1db;hb=HEAD#l459

This proof driver can only accept comparison operators, because it assumes the output is "predicate-like" (1-bit partitions), and it applies the ripple to its expected value.

Comment 22 Luke Kenneth Casson Leighton 2021-01-11 00:09:52 GMT

(In reply to Cesar Strauss from comment #21)
> (In reply to Luke Kenneth Casson Leighton from comment #18)
> > (In reply to Cesar Strauss from comment #17)
> > excellent.  yes it should be pretty easy, if abstracted out.  you shouuld be
> > able to use python operator class.  operator.eq, etc. just pass that in as a
> > parameter and "apply" it to both the PartitionedSignal *and* the Signal to
> > assert against.
> 
> Done, for all comparison operators.

fantastic.  could you try PartitionedSignal.all() as well?

https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/partsig.py;h=53561d4bcc97c0dca31981261428b48d2e69dfff;hb=0bbad62934298fa33b05b655e9b34a4ab7d58bc0#l286

> https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/formal/
> proof_partition.py;h=f257d0243b897456d4aded6de62b5d12cae2e1db;hb=HEAD#l459

wait.. wheer.... how come you can use Repl() on the LSB, at line 380, and it produces the right answer? :)

i thought that PS.Ge/Le/Eq was returning 0b000001/0b00000 rather than ob111111/0b00000

ah well :)


> This proof driver can only accept comparison operators, because it assumes
> the output is "predicate-like" (1-bit partitions), and it applies the ripple
> to its expected value.

PartitionedSignal.all() should also be returning a predicate-like (partition-like) signal.  

it's one op - there is a way to detect the number of arguments a function can take.
https://stackoverflow.com/questions/847936/how-can-i-find-the-number-of-arguments-of-a-python-function

blech.

or... just a number of args are passed in (1,2,3) and it's a list
rather than a,b, it's arglist=[] then use self.op(*arglist)

Comment 23 Cesar Strauss 2021-01-13 23:06:41 GMT

(In reply to Luke Kenneth Casson Leighton from comment #22)
> (In reply to Cesar Strauss from comment #21)
> > This proof driver can only accept comparison operators, because it assumes
> > the output is "predicate-like" (1-bit partitions), and it applies the ripple
> > to its expected value.
> 
> PartitionedSignal.all() should also be returning a predicate-like
> (partition-like) signal.  
> 
> or... just a number of args are passed in (1,2,3) and it's a list
> rather than a,b, it's arglist=[] then use self.op(*arglist)

Sure, sounds like a good idea.

Also good is the idea you mentioned of defining an op_all(obj) function that returns obj.all().

More parameters can be added as needed to the driver, to cover more cases, like whether the output is arithmetic or boolean, if it has carry or not (like add_op and sub_op), and whether the shift operation is scalar or dynamic.

Comment 24 Cesar Strauss 2021-01-16 17:37:17 GMT

(In reply to Luke Kenneth Casson Leighton from comment #22)
> PartitionedSignal.all() should also be returning a predicate-like
> (partition-like) signal.  

> or... just a number of args are passed in (1,2,3) and it's a list
> rather than a,b, it's arglist=[] then use self.op(*arglist)

It works! The proof driver can now accept unary operations (returning a partition-like result).

PartitionedSignal.all() passes. Congratulations!

https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/formal/proof_partition.py;h=df1b6f871c15c1cf16e338bb918b3e976d919434;hb=HEAD#l551

Comment 25 Luke Kenneth Casson Leighton 2021-01-16 18:12:27 GMT

(In reply to Cesar Strauss from comment #24)

> It works! The proof driver can now accept unary operations (returning a
> partition-like result).
> 
> PartitionedSignal.all() passes. Congratulations!

pffh, that's really funny given that it was one line of code :)

it's also fascinating in that its implementation is one of the first "meta" operators: i.e. it actually uses __eq__ by comparing against Const(-1)

next is to add some() which is __ne__ against Const(0)

Comment 26 Luke Kenneth Casson Leighton 2021-01-16 19:10:01 GMT

done some() - worked! - Cesar could you take a look at the (new) PartitionedXOR?
under bug #176.  it may need inverting of the input, or it may need RippleMSB.

Comment 27 Cesar Strauss 2021-01-16 21:08:31 GMT

(In reply to Luke Kenneth Casson Leighton from comment #26)
> done some() - worked! - Cesar could you take a look at the (new)
> PartitionedXOR?
> under bug #176.  it may need inverting of the input, or it may need
> RippleMSB.

It needed both, RippleLSB (not RippleMSB) and inverting the output (not the input). Also, formally verifying it on 8-bit partitions seemed to take a really long time, but it quickly passed with 4-bit partitions.

Comment 28 Luke Kenneth Casson Leighton 2021-01-16 22:00:10 GMT

(In reply to Cesar Strauss from comment #27)
> (In reply to Luke Kenneth Casson Leighton from comment #26)
> > done some() - worked! - Cesar could you take a look at the (new)
> > PartitionedXOR?
> > under bug #176.  it may need inverting of the input, or it may need
> > RippleMSB.
> 
> It needed both, RippleLSB (not RippleMSB) and inverting the output (not the
> input). 

wow that's very cool that it worked based on "guessing" :)

> Also, formally verifying it on 8-bit partitions seemed to take a
> really long time, but it quickly passed with 4-bit partitions.

fascinating.  that may be worthwhile looking at the ilang file in yosys
"show top", to see what the gate level looks like.

Comment 29 Cesar Strauss 2021-01-17 17:40:55 GMT

(In reply to Luke Kenneth Casson Leighton from comment #28)
> (In reply to Cesar Strauss from comment #27)
> > Also, formally verifying it on 8-bit partitions seemed to take a
> > really long time, but it quickly passed with 4-bit partitions.
> 
> fascinating.  that may be worthwhile looking at the ilang file in yosys
> "show top", to see what the gate level looks like.

I suspect it has to do with the search space. On an OR-cascade (PartitionedEQ), as soon as one bit equals one, the result becomes independent of the other bits, and the search tree can be pruned. With an XOR-cascade (PartitionedXOR) there is no such opportunity, it must follow every branch.

In other news: the proof driver now can also handle operations which give arithmetic outputs. Addition and subtraction (without carry-in or out) passes.

Comment 30 Luke Kenneth Casson Leighton 2021-01-17 17:53:18 GMT

(In reply to Cesar Strauss from comment #29)

> independent of the other bits, and the search tree can be pruned. With an
> XOR-cascade (PartitionedXOR) there is no such opportunity, it must follow
> every branch.

ah true.
 
> In other news: the proof driver now can also handle operations which give
> arithmetic outputs. Addition and subtraction (without carry-in or out)
> passes.

fantastic.  that's a great addition (pun intended).  neg is the next obvious one, and "or", "and", "not" and "implies" should be trivial but important to add for completeness.

shift: likewise, but these are expected to take a looong time, it may be better to add them as separate independent unit tests.

oh.  one important code-path: RHS (operand 2) as:

* Const
* Signal
* PartitionedSignal

Shift in particular is commmpletely different when passed in a global Const or a global (nonpartitioned) Signal.

Comment 31 Cesar Strauss 2021-01-17 22:58:28 GMT

(In reply to Luke Kenneth Casson Leighton from comment #30)
> neg is the next obvious one,

For some reason, neg(a) was returning (a-1) instead of (-a). I re-implemented it as (0-a).

The test case at src/ieee754/part/test/test_partsig.py was also testing for
(a-1). I fixed it by shifting down the input, negating it and shifting it back again. This way, it becomes independent of my implementation.

Speaking of which, that test suite fails when checking the partitioned carry-out from the adder. As it turns out, the carry-out is being registered, which confuses the test. See:

https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_mul_add/adder.py;h=e1849b4d25fc5ec4fc4473cc02b32f49dd5912b2;hb=HEAD#l271

Should we replace sync by comb? Or was it intentional?

> and "or", "and", "not" and "implies" should be trivial but important to
> add for completeness.

Sure. A partitioned check for those seems a bit overkill, tough. I guess it's worth it to be extra careful.

> shift: likewise, but these are expected to take a looong time, it may be
> better to add them as separate independent unit tests.

OK.

> oh.  one important code-path: RHS (operand 2) as:
> 
> * Const
> * Signal
> * PartitionedSignal
> 
> Shift in particular is commmpletely different when passed in a global Const
> or a global (nonpartitioned) Signal.

Sure.

Const and Signal apply globally, right? For instance, "pa + 1" should add one independently to each partition, correct?

Const must be given an initial value when created, so we can't depend on the solver to generate random values for us. We have to repeat running the proof for random values of the Const.

Comment 32 Luke Kenneth Casson Leighton 2021-01-17 23:36:59 GMT

(In reply to Cesar Strauss from comment #31)
> (In reply to Luke Kenneth Casson Leighton from comment #30)
> > neg is the next obvious one,
> 
> For some reason, neg(a) was returning (a-1) instead of (-a). I
> re-implemented it as (0-a).

odd.  sounds better.

> The test case at src/ieee754/part/test/test_partsig.py was also testing for
> (a-1).

bizzare

> 
> Speaking of which, that test suite fails when checking the partitioned
> carry-out from the adder. As it turns out, the carry-out is being
> registered, which confuses the test. See:
> 
> https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/
> part_mul_add/adder.py;h=e1849b4d25fc5ec4fc4473cc02b32f49dd5912b2;hb=HEAD#l271
> 
> Should we replace sync by comb? 

err yes :)

> Or was it intentional?

not at all!
 
> > and "or", "and", "not" and "implies" should be trivial but important to
> > add for completeness.
> 
> Sure. A partitioned check for those seems a bit overkill, tough. I guess
> it's worth it to be extra careful.

yehyeh, that's the idea.  they're bitlevel, it's "irrelevant" but still

> Const and Signal apply globally, right? For instance, "pa + 1" should add
> one independently to each partition, correct?

yes. there will be cases where this needs to be done, such as immediate operations, the immediate will need adding to all SIMD elements.

likewise for example if an algorithm needs a shift by X const then SIMD shift by X applies to all elements 
 
> Const must be given an initial value when created, so we can't depend on the
> solver to generate random values for us. We have to repeat running the proof
> for random values of the Const.

hmm yuk, we can probably get away with just Signal then

Comment 33 Cesar Strauss 2021-01-20 22:11:24 GMT

(In reply to Luke Kenneth Casson Leighton from comment #32)
> (In reply to Cesar Strauss from comment #31)
> > (In reply to Luke Kenneth Casson Leighton from comment #30)
> > > neg is the next obvious one,
> > 
> > For some reason, neg(a) was returning (a-1) instead of (-a). I
> > re-implemented it as (0-a).

I have a theory. The test suite for PartitionedSignal has several instances of (-a) being calculated as ~(a-1), which is correct, even if unconventional. I think the intention was for neg(a) in PartitionedSignal itself to be calculated as ~(a-1), but the final inversion was forgotten, leaving only the (a-1) I found.

> > > and "or", "and", "not" and "implies" should be trivial but important to
> > > add for completeness.
> > 
> > Sure. A partitioned check for those seems a bit overkill, tough. I guess
> > it's worth it to be extra careful.
> 
> yehyeh, that's the idea.  they're bitlevel, it's "irrelevant" but still

Done.

implies() was calculated as "~premise | conclusion". I had to change it to "conclusion | ~premise", otherwise, the inversion would return a Signal, and the following OR would be Signal's OR, not PartitionedSignal's.

For some reason, the solver again took a long time with implies(). It doesn't make sense, since it's just like OR, which is fast, but with one input inverted. 

On a whim, I changed the solver from the default (yices2) to z3. Not only it didn't show a slowdown on implies(), but the earlier slow XOR became fast as well. On the other hand, NOT, ADD and SUB are slower with z3, while fast with yices2.

I'll add an optional "solver" parameter to FHDLTestCase.assertFormal().

Comment 34 Luke Kenneth Casson Leighton 2021-01-20 23:47:54 GMT

(In reply to Cesar Strauss from comment #33)
> (In reply to Luke Kenneth Casson Leighton from comment #32)

> Done.
> 
> implies() was calculated as "~premise | conclusion". I had to change it to
> "conclusion | ~premise", otherwise, the inversion would return a Signal, and
> the following OR would be Signal's OR, not PartitionedSignal's.

hmm there is a precedence system in python to deal with this, to get it to use the reverse version.

but, really, this is a bug in PartitionedSignal if the inverse fn is not returning a PartitionedSignal.  let me check...

https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/partsig.py;h=728764e77f972d75ba0229f4d5e2e821a059ea78;hb=54ebe54bb3dceacfddf376a73088896daf900bc6#l65

yep that is a bug, it should be return PartitionedSignal(~self.sig, self.partpoints)

that's the root cause of the problem you are seeing.

 
> For some reason, the solver again took a long time with implies(). It
> doesn't make sense, since it's just like OR, which is fast, but with one
> input inverted. 

hmm.
 
> On a whim, I changed the solver from the default (yices2) to z3. Not only it
> didn't show a slowdown on implies(), but the earlier slow XOR became fast as
> well. On the other hand, NOT, ADD and SUB are slower with z3, while fast
> with yices2.

intriguing
 
> I'll add an optional "solver" parameter to FHDLTestCase.assertFormal().

nice idea.  do let whitequark know.

Comment 35 Cesar Strauss 2021-01-23 11:37:47 GMT

(In reply to Luke Kenneth Casson Leighton from comment #34)
> (In reply to Cesar Strauss from comment #33)
> > (In reply to Luke Kenneth Casson Leighton from comment #32)
> but, really, this is a bug in PartitionedSignal if the inverse fn is not
> returning a PartitionedSignal.  let me check...
> 
> https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/
> partsig.py;h=728764e77f972d75ba0229f4d5e2e821a059ea78;
> hb=54ebe54bb3dceacfddf376a73088896daf900bc6#l65
> 
> yep that is a bug, it should be return PartitionedSignal(~self.sig,
> self.partpoints)

Well, you may notice that every arithmetic/logical/shift operation is returning Signal instead of PartitionedSignal, not just inverse. Should we fix them all?

On the other hand, partition-like outputs (results from comparisons, any, all, xor) should be kept as Signals, right?

For instance: (pa >= pmin) & (pa <= pmax)
In the above, "&" is Signal's "and".

> > I'll add an optional "solver" parameter to FHDLTestCase.assertFormal().

Done. As a result, all proofs up to now use the full 64-bit, 8 x 8-bit partitions.

Comment 36 Luke Kenneth Casson Leighton 2021-01-23 14:17:05 GMT

(In reply to Cesar Strauss from comment #35)
> (In reply to Luke Kenneth Casson Leighton from comment #34)
> > (In reply to Cesar Strauss from comment #33)
> > > (In reply to Luke Kenneth Casson Leighton from comment #32)
> > but, really, this is a bug in PartitionedSignal if the inverse fn is not
> > returning a PartitionedSignal.  let me check...
> > 
> > https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/
> > partsig.py;h=728764e77f972d75ba0229f4d5e2e821a059ea78;
> > hb=54ebe54bb3dceacfddf376a73088896daf900bc6#l65
> > 
> > yep that is a bug, it should be return PartitionedSignal(~self.sig,
> > self.partpoints)
> 
> Well, you may notice that every arithmetic/logical/shift operation is
> returning Signal instead of PartitionedSignal, not just inverse. Should we
> fix them all?

arg!  yes :)


> On the other hand, partition-like outputs (results from comparisons, any,
> all, xor) should be kept as Signals, right?

actually at some point it may be necessary to create a "PartitionSignal"
class for them, or a "PartitionedBool" class might be more accurate.

in effect, partition-like outputs are also a PartitionedSignal, it's just
that the bitwidth of each partition is only one bit wide.

 
> For instance: (pa >= pmin) & (pa <= pmax)
> In the above, "&" is Signal's "and".

this *should* - for now - be ok.  this was why i insisted on the full
bits to be set on boolean tests (RippleMSBDown / RippleLSB).
actually it is related to PMux (parallel mux).

if all the bits are set then anything treated as boolean (pa >= pmin)
is a simple & or | to do these tests and it *should* all "work"

PMux (the parallel version of Mux) is then *DEAD* simple: every single
individual 8-bit group can just be a for-loop:
    for i in range(8):
         output[i] = Mux(test[i], a[i], b[i])

and that's it: no knowledge of the partitions is needed, at all.  if however
they were single-bit then PMux becomes much more complicated: it needs a
RippleLSB before doing the for-loop on the Muxes.


regarding PartitionedBool, this *may* be needed, we will have to see how
things interact when it comes to bool() overrides.


> > > I'll add an optional "solver" parameter to FHDLTestCase.assertFormal().
> 
> Done. As a result, all proofs up to now use the full 64-bit, 8 x 8-bit
> partitions.

star.

Comment 37 Luke Kenneth Casson Leighton 2021-01-23 14:22:49 GMT

(In reply to Luke Kenneth Casson Leighton from comment #36)

> > Well, you may notice that every arithmetic/logical/shift operation is
> > returning Signal instead of PartitionedSignal, not just inverse. Should we
> > fix them all?
> 
> arg!  yes :)

if the context of being a PartitionedSignal is not propagated it will
result in data corruption when creating chains of operators.

the idea here is to be able to use the standard operations, including
with m.If(), and to not even know it's a dynamic SIMD partitioned
object.

Comment 38 Cesar Strauss 2021-01-23 19:23:18 GMT

(In reply to Luke Kenneth Casson Leighton from comment #34)
> (In reply to Cesar Strauss from comment #33)
> > (In reply to Luke Kenneth Casson Leighton from comment #32)
> yep that is a bug, it should be return PartitionedSignal(~self.sig,
> self.partpoints)

It doesn't work that way, exactly. PartitionedSignal's __init__() doesn't wrap around a nMigen expression.

It must be:

    result = PartitionedSignal(self.partpoints, self.sig.width)
    self.m.d.comb += result.sig.eq(~self.sig)
    return result

Could be simplified a bit by implementing result = PartitionedSignal.like(self).

What do you think? Is there a way to do it the way you described?

Comment 39 Luke Kenneth Casson Leighton 2021-01-23 19:31:41 GMT

(In reply to Cesar Strauss from comment #38)

> Could be simplified a bit by implementing result =
> PartitionedSignal.like(self).

ahh yehyeh i was going to suggest that.  occurred to me a couple days
ago that like was missing.
 
> What do you think?

go for it on the like() method.  yes it is a bit of a nuisance that
all of these operators need to "automatically" add a module.  fortunately
when you look at the graphviz it's pretty clear / nice.

>  Is there a way to do it the way you described?

i was in pseudo-code mode (and concentrating on svp64), apologies.