On 01/02/2021 15:42, Luke Kenneth Casson Leighton wrote: > we do need formal verification of PartitionedSignal, but in a way that it > obvious how it works, so that it is easy to read the proof code. I took a look at ieee754fpu/src/ieee754/part* and its proofs. I see what you mean. I think I have an idea. The approach is to create nested for loops that will enumerate all possible partitions of varying widths and positions. Then, check whether that partition is "complete" (delimited by ones on each side) and "unbroken" (all zeros inside). If so, compute the expected result, compare and assert. It's horribly inefficient on resources (O(n^2) I think). Let's hope the proof is not too slow.

(In reply to Cesar Strauss from comment #0) > On 01/02/2021 15:42, Luke Kenneth Casson Leighton wrote: > > we do need formal verification of PartitionedSignal, but in a way that it > > obvious how it works, so that it is easy to read the proof code. > > I took a look at ieee754fpu/src/ieee754/part* and its proofs. I see what you > mean. yes, the propagation across the partition points, when done optimally, is challenging to understand. > I think I have an idea. > > The approach is to create nested for loops that will enumerate all possible > partitions of varying widths and positions. this can be done as a python for-loop that adds AST for each case, rather than an explicit hard-coded switch statement, just like in the Decoder. > Then, check whether that > partition is "complete" (delimited by ones on each side) and "unbroken" (all > zeros inside). If so, compute the expected result, compare and assert. an option there is to compute all 8-bit results @ offsets 0..N-1 all 16-bit results @ offsets 0..N-2 and so on. then to "connect" those with the appropriate partition-derived test. > It's horribly inefficient on resources (O(n^2) I think). Let's hope the > proof is not too slow. well, i would expect it to be dreadful for shift and multiply but tolerable for add, compares, OR etc. however if the initial tests are only 32-bit and 3 partition points it should complete in reasonable time. another speed-up trick would be to run multiple tests in parallel, with subsets of the permutations disallowed. however... that would only be appropriate for speeding up testing and development. also if you can, Cesar, use a higher-order-function (or, plan to add one, as an iterative incremental development) for the computation of the "thing" and the "thing to compare against". the partitioning itself is identically-coded in all cases and it will save having to repeat the scheme you devised above. this is important because we will need to revise some of the eq/etc. code to "propagate" compare/tests across all bits. currently eq/gt/lt produce this: if a > b return 0b0001 # partition size of 32-bit where what is actually needed is this: if a > b return 0b1111 # one bit set in each partition to indicate "True" luckily there is a class which performs this job, it propagates the LSB in a partition, repeated up to the end of a partition.

New idea: take an arbitrary partition, by choosing its start point and size at random. Use "Assume" to ensure it is a whole unbroken partition (start and end points are one, with only zeros in between). Shift inputs and outputs down to zero. Loop over all possible partition sizes and, if it's the right size, compute the expected value, compare with the result, and assert. I suspect it's still O(n^2), but it seems simpler (no nested for loop) in spite of some extra setup work. To test the above, an idea is to create a partition pattern generator, that outputs an unique pattern, depending on partition size. For instance, on a size 3 partition, it could be: 0x31, 0x32, 0x33. (In reply to Luke Kenneth Casson Leighton from comment #1) > also if you can, Cesar, use a higher-order-function (or, plan to add one, as > an iterative incremental development) for the computation of the "thing" and > the "thing to compare against". Sure.

(In reply to Cesar Strauss from comment #2) > New idea: take an arbitrary partition, by choosing its start point and size > at random. Use "Assume" to ensure it is a whole unbroken partition (start > and end points are one, with only zeros in between). Shift inputs and > outputs down to zero. Loop over all possible partition sizes and, if it's > the right size, compute the expected value, compare with the result, and > assert. unfortunately we have to be careful: ultimately a full permutation proof has to be done. perhaps a simpler task would be to first write a "length+offset" proof. i.e. to write a proof of a sub-class that turns partitions into "length of partition plus the offset that it starts from". this only needs one for-loop: for partitionbits in range(1<<partitionlength): # analyse partition bits, looking for runs of zeros if this were expanded out from 32-bit with 3 partition bits, it would produce: 0 0 0 -> 8,0 8,8 8,16 8,24 0 0 1 -> 16,0 8,16 8,24 0 1 0 -> 8,0 16,8 8,24 0 1 1 -> 24,0 8,24 1 0 0 -> 8,0 8,8 16,16 1 0 1 -> 16,0 16,16 1 1 0 -> 8,0 24,8 1 1 1 -> 32,0 a *python* function can analyse the partitionbits and Assert those patterns with m.Switch(partitionsignal): for partitionbits in range(1<<partitionlength): # analyse partition bits, looking for runs of zeros l = create_patterns(partitionbits) # above table with m.Case(partitionbits): # check each pattern from the list > I suspect it's still O(n^2), but it seems simpler (no nested for loop) in > spite of some extra setup work. > > To test the above, an idea is to create a partition pattern generator, that > outputs an unique pattern, depending on partition size. a for-loop can go through them all, the thing is it doesn't have to be done manually and by rote like this: https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_cmp/formal/proof_gt.py;hb=HEAD but it *can* be done in a similar way, with the m.Case() being in a for-loop. with m.Switch(partitionsignal): for partitionbits in range(1<<partitionlength): with m.Case(partitionbits): Assert(....)

(In reply to Luke Kenneth Casson Leighton from comment #3) > (In reply to Cesar Strauss from comment #2) > > New idea: take an arbitrary partition, by choosing its start point and size > > at random. Use "Assume" to ensure it is a whole unbroken partition (start > > and end points are one, with only zeros in between). Shift inputs and > > outputs down to zero. Loop over all possible partition sizes and, if it's > > the right size, compute the expected value, compare with the result, and > > assert. > > unfortunately we have to be careful: ultimately a full permutation proof has > to be done. Maybe not: by letting the formal engine choose the size and offset, I think we are indeed covering all possible cases. It's just that we test only one partition at a time (the partition that the formal engine selected) instead of all, and let the formal engine do the outer loop for us. Reminds me of SIMT.

(In reply to Cesar Strauss from comment #4) > Maybe not: by letting the formal engine choose the size and offset, I think > we are indeed covering all possible cases. It's just that we test only one > partition at a time (the partition that the formal engine selected) instead > of all, and let the formal engine do the outer loop for us. Reminds me of > SIMT. ... that's the problem: we cannot assume that the testing of one partition will not cause interactions (due to coding mistakes) that result in the corruption of other partitions. the way that the inference engines work is to exhaustively search for the combinations of inputs (and outputs!) that cause a failure. or, are you speaking of turning the for-loops around (on their head), such that you start from the *lengths* (and positions) and perform the Assert on the resultant partition bits? that would be interesting. in other words, you have patterns as follows: 8-bit offsets 0,1,2,3 16-bit offsets 0,1,2 24-bit offsets 0,1 32-bit * for 8-bit the partition bit between the two must be zero * for 16-bit the partition bit at the offset must be 1 and the partition bit after it must be zero * for 24-bit the partition bits at the offset and at offset+1 must be 1 and at offset+2 must be zero * for 32-bit the 3 bits must be 1 and you run out of partition bits to test for zero this would be efficient/effective in that things being tested (arith) would only be tested once. interesting. and it has the advantage that "it's not the same underlying algorithm as the code it's testing" which is always a good thing.

(In reply to Luke Kenneth Casson Leighton from comment #5) > or, are you speaking of turning the for-loops around (on their head), such > that you start from the *lengths* (and positions) and perform the Assert on > the resultant partition bits? that would be interesting. That's the idea, yes. Only, since partition bits are inputs, we use Assume, not Assert. We do Assert the result. > > in other words, you have patterns as follows: > > 8-bit offsets 0,1,2,3 > 16-bit offsets 0,1,2 > 24-bit offsets 0,1 > 32-bit > > * for 8-bit the partition bit between the two must be zero > * for 16-bit the partition bit at the offset must be 1 and the partition bit > after it must be zero > * for 24-bit the partition bits at the offset and at offset+1 must be 1 and > at > offset+2 must be zero > * for 32-bit the 3 bits must be 1 and you run out of partition bits to test > for zero Exactly. Well, only: 1) You have the polarity of the gates inverted. For instance, 32-bit is all zeros, see https://libre-soc.org/3d_gpu/architecture/dynamic_simd/eq/ 2) I will also check that bit before the offset is 1. For instance, 16-bit is 101. Otherwise, you can be actually at the end of a bigger partition. 3) I will add two guard bits (= 1) at each end, so 32-bit is not a special case. > this would be efficient/effective in that things being tested (arith) would > only be tested once. > > interesting. and it has the advantage that "it's not the same underlying > algorithm as the code it's testing" which is always a good thing. Indeed.

(In reply to Cesar Strauss from comment #6) > Only, since partition bits are inputs, we use Assume, not Assert. > We do Assert the result. got it. > Well, only: > 1) You have the polarity of the gates inverted. For instance, 32-bit is all > zeros, see https://libre-soc.org/3d_gpu/architecture/dynamic_simd/eq/ yes, you probably noticed i get order/polarity wrong > 2) I will also check that bit before the offset is 1. For instance, 16-bit > is 101. Otherwise, you can be actually at the end of a bigger partition. > > 3) I will add two guard bits (= 1) at each end, so 32-bit is not a special > case. good idea. if you can also parameterise the length but keep to smaller lengths initially, for testing, an O(N^2) algorithm on 64 bit could take some time, and shift or mul may even run for several hours. i like the plan, Cesar.

The story so far: https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/formal/proof_partition.py;h=c8d15a2a61869941a6ed623352f90374d4e67ce7;hb=HEAD * Implemented a partitioned pattern generator * Implemented a prover, using the above as a test case Next: * Make the prover generic, avoiding repetition on the next proofs * Implement proofs of the remaining partitioned operations This code assumes equally spaced partition points. Otherwise it would have to loop over all possible start and end points, instead of just the possible widths, making it O(n^3).

(In reply to Cesar Strauss from comment #8) > The story so far: :) > This code assumes equally spaced partition points. i think we can get away with that. the other permutation set would be "aligned" ones. so 16-8-8 but not 8-16-8

(In reply to Luke Kenneth Casson Leighton from comment #9) > > This code assumes equally spaced partition points. > > i think we can get away with that. the other permutation set would be > "aligned" ones. so 16-8-8 but not 8-16-8 Both 16-8-8 and 8-16-8 are supported. They correspond respectively to gate patterns 011 and 101. What is not supported is, for instance, 111 being 16-16-8-8 instead of 8-8-8-8.

Note the partitioned multiplier currently assumes partitions are aligned, so 8-16-8 is invalid.

(In reply to Jacob Lifshay from comment #11) > Note the partitioned multiplier currently assumes partitions are aligned, so > 8-16-8 is invalid. Understood. There need to be a way to randomly generate a legal partitioning in this case, either by construction, or by constraining the output.

A recursive function to create legal partitions: def legal_partitions(log2_size): yield [0] * ((1 << log2_size) - 1) if log2_size == 0: return assert log2_size > 0 next_smaller_partitions = list(legal_partitions(log2_size - 1)) for left in next_smaller_partitions: for right in next_smaller_partitions: yield [*left, 1, *right] for log2_size in range(4): print(f"log2_size={log2_size}:") for p in legal_partitions(log2_size): print(p)

(In reply to Jacob Lifshay from comment #13) > A recursive function to create legal partitions: Thanks. 32-bit is only 5 valid combinations, and 64-bit is still only 26. I think it can be done simply by enumerating them into a lookup-table, indexed by a random value. (In reply to Cesar Strauss from comment #8) > * Make the prover generic, avoiding repetition on the next proofs I factored out the gate pattern generator, this should simplify proofs. > * Implement proofs of the remaining partitioned operations This is next. I'll begin with PartitionedEq.

(In reply to Cesar Strauss from comment #14) > > * Implement proofs of the remaining partitioned operations > > This is next. I'll begin with PartitionedEq. remember that where PartitionedEq currently returns 0b00001 (LSB and zeros) in each partition, this needs to be modified to return 0b11111 to indicate "equal". that is done (easily) with nmutil.ripple.RippleLSB also, that PartitionedEq itself isn't actually in use: it was one of the early intermediate stages of development, the actual code in use is PartitionedEqGtGe. that also needs to be run through RippleLSB as its last stage, to modify the output. ultimately PartitionedEqGtGe will need to be revised rather than post-processed: given that it took 3 weeks to write this can be done during an optimisation pass.

(In reply to Luke Kenneth Casson Leighton from comment #15) > (In reply to Cesar Strauss from comment #14) > > > > * Implement proofs of the remaining partitioned operations > > > > This is next. I'll begin with PartitionedEq. Done. It passes! See: https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_cmp/experiments/formal/proof_partitioned_eq.py;h=ba222a86356b744906fe05516ffb7e257e938cdd;hb=HEAD > remember that where PartitionedEq currently returns 0b00001 (LSB and zeros) > in each partition, this needs to be modified to return 0b11111 to indicate > "equal". > that is done (easily) with nmutil.ripple.RippleLSB Understood. I added post processing by RippleLSB in the latest commit. > also, that PartitionedEq itself isn't actually in use: it was one of the > early intermediate stages of development, the actual code in use is > PartitionedEqGtGe. that also needs to be run through RippleLSB as its last > stage, to modify the output. Sure. I guess it was a stepping stone. Will do PartitionedEqGtGe next.

(In reply to Cesar Strauss from comment #16) > Will do PartitionedEqGtGe next. Done! It was very similar to PartitionedEq, just a few more lines to check the different opcodes. By the way, I'm testing the full 64-bit width, 8 times 8-bit partitions. The run-time is still just a few seconds. Let's see how it goes. I will now proceed to work on the remaining operations.

(In reply to Cesar Strauss from comment #17) > Done! > > It was very similar to PartitionedEq, just a few more lines to check the > different opcodes. excellent. yes it should be pretty easy, if abstracted out. you shouuld be able to use python operator class. operator.eq, etc. just pass that in as a parameter and "apply" it to both the PartitionedSignal *and* the Signal to assert against. > By the way, I'm testing the full 64-bit width, 8 times 8-bit partitions. The > run-time is still just a few seconds cool! > I will now proceed to work on the remaining operations. shift and mul will be extremely long, thus is just normal.

A few files dealing with PartitionedSignal use the following idiom to get the width of a Signal: width = sig.shape()[0] In the latest git master, Shape is no longer a tuple, breaking this idiom. I'll look into changing these to: width = len(sig) The above works since the last nMigen release, so it shouldn't break things for anyone.

(In reply to Cesar Strauss from comment #19) > I'll look into changing these to: > > width = len(sig) > > The above works since the last nMigen release, so it shouldn't break things > for anyone. yehyeh no should be fine

(In reply to Luke Kenneth Casson Leighton from comment #18) > (In reply to Cesar Strauss from comment #17) > excellent. yes it should be pretty easy, if abstracted out. you shouuld be > able to use python operator class. operator.eq, etc. just pass that in as a > parameter and "apply" it to both the PartitionedSignal *and* the Signal to > assert against. Done, for all comparison operators. https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/formal/proof_partition.py;h=f257d0243b897456d4aded6de62b5d12cae2e1db;hb=HEAD#l459 This proof driver can only accept comparison operators, because it assumes the output is "predicate-like" (1-bit partitions), and it applies the ripple to its expected value.

(In reply to Cesar Strauss from comment #21) > (In reply to Luke Kenneth Casson Leighton from comment #18) > > (In reply to Cesar Strauss from comment #17) > > excellent. yes it should be pretty easy, if abstracted out. you shouuld be > > able to use python operator class. operator.eq, etc. just pass that in as a > > parameter and "apply" it to both the PartitionedSignal *and* the Signal to > > assert against. > > Done, for all comparison operators. fantastic. could you try PartitionedSignal.all() as well? https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/partsig.py;h=53561d4bcc97c0dca31981261428b48d2e69dfff;hb=0bbad62934298fa33b05b655e9b34a4ab7d58bc0#l286 > https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/formal/ > proof_partition.py;h=f257d0243b897456d4aded6de62b5d12cae2e1db;hb=HEAD#l459 wait.. wheer.... how come you can use Repl() on the LSB, at line 380, and it produces the right answer? :) i thought that PS.Ge/Le/Eq was returning 0b000001/0b00000 rather than ob111111/0b00000 ah well :) > This proof driver can only accept comparison operators, because it assumes > the output is "predicate-like" (1-bit partitions), and it applies the ripple > to its expected value. PartitionedSignal.all() should also be returning a predicate-like (partition-like) signal. it's one op - there is a way to detect the number of arguments a function can take. https://stackoverflow.com/questions/847936/how-can-i-find-the-number-of-arguments-of-a-python-function blech. or... just a number of args are passed in (1,2,3) and it's a list rather than a,b, it's arglist=[] then use self.op(*arglist)

(In reply to Luke Kenneth Casson Leighton from comment #22) > (In reply to Cesar Strauss from comment #21) > > This proof driver can only accept comparison operators, because it assumes > > the output is "predicate-like" (1-bit partitions), and it applies the ripple > > to its expected value. > > PartitionedSignal.all() should also be returning a predicate-like > (partition-like) signal. > > or... just a number of args are passed in (1,2,3) and it's a list > rather than a,b, it's arglist=[] then use self.op(*arglist) Sure, sounds like a good idea. Also good is the idea you mentioned of defining an op_all(obj) function that returns obj.all(). More parameters can be added as needed to the driver, to cover more cases, like whether the output is arithmetic or boolean, if it has carry or not (like add_op and sub_op), and whether the shift operation is scalar or dynamic.

(In reply to Luke Kenneth Casson Leighton from comment #22) > PartitionedSignal.all() should also be returning a predicate-like > (partition-like) signal. > or... just a number of args are passed in (1,2,3) and it's a list > rather than a,b, it's arglist=[] then use self.op(*arglist) It works! The proof driver can now accept unary operations (returning a partition-like result). PartitionedSignal.all() passes. Congratulations! https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/formal/proof_partition.py;h=df1b6f871c15c1cf16e338bb918b3e976d919434;hb=HEAD#l551

(In reply to Cesar Strauss from comment #24) > It works! The proof driver can now accept unary operations (returning a > partition-like result). > > PartitionedSignal.all() passes. Congratulations! pffh, that's really funny given that it was one line of code :) it's also fascinating in that its implementation is one of the first "meta" operators: i.e. it actually uses __eq__ by comparing against Const(-1) next is to add some() which is __ne__ against Const(0)

done some() - worked! - Cesar could you take a look at the (new) PartitionedXOR? under bug #176. it may need inverting of the input, or it may need RippleMSB.

(In reply to Luke Kenneth Casson Leighton from comment #26) > done some() - worked! - Cesar could you take a look at the (new) > PartitionedXOR? > under bug #176. it may need inverting of the input, or it may need > RippleMSB. It needed both, RippleLSB (not RippleMSB) and inverting the output (not the input). Also, formally verifying it on 8-bit partitions seemed to take a really long time, but it quickly passed with 4-bit partitions.

(In reply to Cesar Strauss from comment #27) > (In reply to Luke Kenneth Casson Leighton from comment #26) > > done some() - worked! - Cesar could you take a look at the (new) > > PartitionedXOR? > > under bug #176. it may need inverting of the input, or it may need > > RippleMSB. > > It needed both, RippleLSB (not RippleMSB) and inverting the output (not the > input). wow that's very cool that it worked based on "guessing" :) > Also, formally verifying it on 8-bit partitions seemed to take a > really long time, but it quickly passed with 4-bit partitions. fascinating. that may be worthwhile looking at the ilang file in yosys "show top", to see what the gate level looks like.

(In reply to Luke Kenneth Casson Leighton from comment #28) > (In reply to Cesar Strauss from comment #27) > > Also, formally verifying it on 8-bit partitions seemed to take a > > really long time, but it quickly passed with 4-bit partitions. > > fascinating. that may be worthwhile looking at the ilang file in yosys > "show top", to see what the gate level looks like. I suspect it has to do with the search space. On an OR-cascade (PartitionedEQ), as soon as one bit equals one, the result becomes independent of the other bits, and the search tree can be pruned. With an XOR-cascade (PartitionedXOR) there is no such opportunity, it must follow every branch. In other news: the proof driver now can also handle operations which give arithmetic outputs. Addition and subtraction (without carry-in or out) passes.

(In reply to Cesar Strauss from comment #29) > independent of the other bits, and the search tree can be pruned. With an > XOR-cascade (PartitionedXOR) there is no such opportunity, it must follow > every branch. ah true. > In other news: the proof driver now can also handle operations which give > arithmetic outputs. Addition and subtraction (without carry-in or out) > passes. fantastic. that's a great addition (pun intended). neg is the next obvious one, and "or", "and", "not" and "implies" should be trivial but important to add for completeness. shift: likewise, but these are expected to take a looong time, it may be better to add them as separate independent unit tests. oh. one important code-path: RHS (operand 2) as: * Const * Signal * PartitionedSignal Shift in particular is commmpletely different when passed in a global Const or a global (nonpartitioned) Signal.

(In reply to Luke Kenneth Casson Leighton from comment #30) > neg is the next obvious one, For some reason, neg(a) was returning (a-1) instead of (-a). I re-implemented it as (0-a). The test case at src/ieee754/part/test/test_partsig.py was also testing for (a-1). I fixed it by shifting down the input, negating it and shifting it back again. This way, it becomes independent of my implementation. Speaking of which, that test suite fails when checking the partitioned carry-out from the adder. As it turns out, the carry-out is being registered, which confuses the test. See: https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_mul_add/adder.py;h=e1849b4d25fc5ec4fc4473cc02b32f49dd5912b2;hb=HEAD#l271 Should we replace sync by comb? Or was it intentional? > and "or", "and", "not" and "implies" should be trivial but important to > add for completeness. Sure. A partitioned check for those seems a bit overkill, tough. I guess it's worth it to be extra careful. > shift: likewise, but these are expected to take a looong time, it may be > better to add them as separate independent unit tests. OK. > oh. one important code-path: RHS (operand 2) as: > > * Const > * Signal > * PartitionedSignal > > Shift in particular is commmpletely different when passed in a global Const > or a global (nonpartitioned) Signal. Sure. Const and Signal apply globally, right? For instance, "pa + 1" should add one independently to each partition, correct? Const must be given an initial value when created, so we can't depend on the solver to generate random values for us. We have to repeat running the proof for random values of the Const.

(In reply to Cesar Strauss from comment #31) > (In reply to Luke Kenneth Casson Leighton from comment #30) > > neg is the next obvious one, > > For some reason, neg(a) was returning (a-1) instead of (-a). I > re-implemented it as (0-a). odd. sounds better. > The test case at src/ieee754/part/test/test_partsig.py was also testing for > (a-1). bizzare > > Speaking of which, that test suite fails when checking the partitioned > carry-out from the adder. As it turns out, the carry-out is being > registered, which confuses the test. See: > > https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/ > part_mul_add/adder.py;h=e1849b4d25fc5ec4fc4473cc02b32f49dd5912b2;hb=HEAD#l271 > > Should we replace sync by comb? err yes :) > Or was it intentional? not at all! > > and "or", "and", "not" and "implies" should be trivial but important to > > add for completeness. > > Sure. A partitioned check for those seems a bit overkill, tough. I guess > it's worth it to be extra careful. yehyeh, that's the idea. they're bitlevel, it's "irrelevant" but still > Const and Signal apply globally, right? For instance, "pa + 1" should add > one independently to each partition, correct? yes. there will be cases where this needs to be done, such as immediate operations, the immediate will need adding to all SIMD elements. likewise for example if an algorithm needs a shift by X const then SIMD shift by X applies to all elements > Const must be given an initial value when created, so we can't depend on the > solver to generate random values for us. We have to repeat running the proof > for random values of the Const. hmm yuk, we can probably get away with just Signal then

(In reply to Luke Kenneth Casson Leighton from comment #32) > (In reply to Cesar Strauss from comment #31) > > (In reply to Luke Kenneth Casson Leighton from comment #30) > > > neg is the next obvious one, > > > > For some reason, neg(a) was returning (a-1) instead of (-a). I > > re-implemented it as (0-a). I have a theory. The test suite for PartitionedSignal has several instances of (-a) being calculated as ~(a-1), which is correct, even if unconventional. I think the intention was for neg(a) in PartitionedSignal itself to be calculated as ~(a-1), but the final inversion was forgotten, leaving only the (a-1) I found. > > > and "or", "and", "not" and "implies" should be trivial but important to > > > add for completeness. > > > > Sure. A partitioned check for those seems a bit overkill, tough. I guess > > it's worth it to be extra careful. > > yehyeh, that's the idea. they're bitlevel, it's "irrelevant" but still Done. implies() was calculated as "~premise | conclusion". I had to change it to "conclusion | ~premise", otherwise, the inversion would return a Signal, and the following OR would be Signal's OR, not PartitionedSignal's. For some reason, the solver again took a long time with implies(). It doesn't make sense, since it's just like OR, which is fast, but with one input inverted. On a whim, I changed the solver from the default (yices2) to z3. Not only it didn't show a slowdown on implies(), but the earlier slow XOR became fast as well. On the other hand, NOT, ADD and SUB are slower with z3, while fast with yices2. I'll add an optional "solver" parameter to FHDLTestCase.assertFormal().

(In reply to Cesar Strauss from comment #33) > (In reply to Luke Kenneth Casson Leighton from comment #32) > Done. > > implies() was calculated as "~premise | conclusion". I had to change it to > "conclusion | ~premise", otherwise, the inversion would return a Signal, and > the following OR would be Signal's OR, not PartitionedSignal's. hmm there is a precedence system in python to deal with this, to get it to use the reverse version. but, really, this is a bug in PartitionedSignal if the inverse fn is not returning a PartitionedSignal. let me check... https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/partsig.py;h=728764e77f972d75ba0229f4d5e2e821a059ea78;hb=54ebe54bb3dceacfddf376a73088896daf900bc6#l65 yep that is a bug, it should be return PartitionedSignal(~self.sig, self.partpoints) that's the root cause of the problem you are seeing. > For some reason, the solver again took a long time with implies(). It > doesn't make sense, since it's just like OR, which is fast, but with one > input inverted. hmm. > On a whim, I changed the solver from the default (yices2) to z3. Not only it > didn't show a slowdown on implies(), but the earlier slow XOR became fast as > well. On the other hand, NOT, ADD and SUB are slower with z3, while fast > with yices2. intriguing > I'll add an optional "solver" parameter to FHDLTestCase.assertFormal(). nice idea. do let whitequark know.

(In reply to Luke Kenneth Casson Leighton from comment #34) > (In reply to Cesar Strauss from comment #33) > > (In reply to Luke Kenneth Casson Leighton from comment #32) > but, really, this is a bug in PartitionedSignal if the inverse fn is not > returning a PartitionedSignal. let me check... > > https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/ > partsig.py;h=728764e77f972d75ba0229f4d5e2e821a059ea78; > hb=54ebe54bb3dceacfddf376a73088896daf900bc6#l65 > > yep that is a bug, it should be return PartitionedSignal(~self.sig, > self.partpoints) Well, you may notice that every arithmetic/logical/shift operation is returning Signal instead of PartitionedSignal, not just inverse. Should we fix them all? On the other hand, partition-like outputs (results from comparisons, any, all, xor) should be kept as Signals, right? For instance: (pa >= pmin) & (pa <= pmax) In the above, "&" is Signal's "and". > > I'll add an optional "solver" parameter to FHDLTestCase.assertFormal(). Done. As a result, all proofs up to now use the full 64-bit, 8 x 8-bit partitions.

(In reply to Cesar Strauss from comment #35) > (In reply to Luke Kenneth Casson Leighton from comment #34) > > (In reply to Cesar Strauss from comment #33) > > > (In reply to Luke Kenneth Casson Leighton from comment #32) > > but, really, this is a bug in PartitionedSignal if the inverse fn is not > > returning a PartitionedSignal. let me check... > > > > https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/ > > partsig.py;h=728764e77f972d75ba0229f4d5e2e821a059ea78; > > hb=54ebe54bb3dceacfddf376a73088896daf900bc6#l65 > > > > yep that is a bug, it should be return PartitionedSignal(~self.sig, > > self.partpoints) > > Well, you may notice that every arithmetic/logical/shift operation is > returning Signal instead of PartitionedSignal, not just inverse. Should we > fix them all? arg! yes :) > On the other hand, partition-like outputs (results from comparisons, any, > all, xor) should be kept as Signals, right? actually at some point it may be necessary to create a "PartitionSignal" class for them, or a "PartitionedBool" class might be more accurate. in effect, partition-like outputs are also a PartitionedSignal, it's just that the bitwidth of each partition is only one bit wide. > For instance: (pa >= pmin) & (pa <= pmax) > In the above, "&" is Signal's "and". this *should* - for now - be ok. this was why i insisted on the full bits to be set on boolean tests (RippleMSBDown / RippleLSB). actually it is related to PMux (parallel mux). if all the bits are set then anything treated as boolean (pa >= pmin) is a simple & or | to do these tests and it *should* all "work" PMux (the parallel version of Mux) is then *DEAD* simple: every single individual 8-bit group can just be a for-loop: for i in range(8): output[i] = Mux(test[i], a[i], b[i]) and that's it: no knowledge of the partitions is needed, at all. if however they were single-bit then PMux becomes much more complicated: it needs a RippleLSB before doing the for-loop on the Muxes. regarding PartitionedBool, this *may* be needed, we will have to see how things interact when it comes to bool() overrides. > > > I'll add an optional "solver" parameter to FHDLTestCase.assertFormal(). > > Done. As a result, all proofs up to now use the full 64-bit, 8 x 8-bit > partitions. star.

(In reply to Luke Kenneth Casson Leighton from comment #36) > > Well, you may notice that every arithmetic/logical/shift operation is > > returning Signal instead of PartitionedSignal, not just inverse. Should we > > fix them all? > > arg! yes :) if the context of being a PartitionedSignal is not propagated it will result in data corruption when creating chains of operators. the idea here is to be able to use the standard operations, including with m.If(), and to not even know it's a dynamic SIMD partitioned object.

(In reply to Luke Kenneth Casson Leighton from comment #34) > (In reply to Cesar Strauss from comment #33) > > (In reply to Luke Kenneth Casson Leighton from comment #32) > yep that is a bug, it should be return PartitionedSignal(~self.sig, > self.partpoints) It doesn't work that way, exactly. PartitionedSignal's __init__() doesn't wrap around a nMigen expression. It must be: result = PartitionedSignal(self.partpoints, self.sig.width) self.m.d.comb += result.sig.eq(~self.sig) return result Could be simplified a bit by implementing result = PartitionedSignal.like(self). What do you think? Is there a way to do it the way you described?

(In reply to Cesar Strauss from comment #38) > Could be simplified a bit by implementing result = > PartitionedSignal.like(self). ahh yehyeh i was going to suggest that. occurred to me a couple days ago that like was missing. > What do you think? go for it on the like() method. yes it is a bit of a nuisance that all of these operators need to "automatically" add a module. fortunately when you look at the graphviz it's pretty clear / nice. > Is there a way to do it the way you described? i was in pseudo-code mode (and concentrating on svp64), apologies.