Bug 176 - partitioned dynamic bool/all/any/xor operators
Summary: partitioned dynamic bool/all/any/xor operators
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: ALU (including IEEE754 16/32/64-bit FPU) (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 132
  Show dependency treegraph
 
Reported: 2020-02-10 20:41 GMT by Luke Kenneth Casson Leighton
Modified: 2021-01-16 23:01 GMT (History)
3 users (show)

See Also:
NLnet milestone: NLnet.2019.02
total budget (EUR) for completion of task and all subtasks: 150
budget (EUR) for this task, excluding subtasks' budget: 150
parent task for budget allocation: 132
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2020-02-10 20:41:03 GMT
similar to gt/le/eq in form, bool/all/any/xor operators are needed.

wiki page here

https://libre-soc.org/3d_gpu/architecture/dynamic_simd/logicops/
Comment 1 Luke Kenneth Casson Leighton 2020-12-21 16:32:08 GMT
pranav wrote:

> I spent some time reading the code in the partitioned adder/ shift etc so clear about the "width".

excellent.  it's basically a parameter which by default we set to 64 (with 7 partitions, every 8th bit) in real-world but the examples are all 32 bit (and only 3 partitions to give 4x 8 bit)

this is because it's just... smaller examples.

> am still a little unsure of how to figure the "batches" and also the
> 4/8/16/32 in the implementation of simd, how and when to choose the same. 

partitions first.  basically, if we did not have partitions, then we would need *separate* HDL for 1x64, separate HDL for 2x32, separate 4x16 and 8x8.

then a massive utterly awful switch statement literally in every single pipeline, with a parameter "if mode == 1x64 do this else if 2x32 do that".

this is insane and flat-out unmaintainable code.

instead, the pipeline has a "partition context" as an extra input.  these partition points "break up" the 64 bit Signal into smaller, completely isolated chunks, aka "batches".

it turns out that, just like Long Multiplication, you can *use* those "batches" to create larger results.

so the first thing to do is to break the computation down into the smallest "chunks".  this is typically 8x8.

for the 8x8 SIMD case (where all the partition gates are CLOSED), this *is* the answer!

now let us consider the 4x16 case.  how do you create four 16-bit results when you have performed 8x 8bit sub-results already?

well, the answer is very simple: you combine pairs of 8bit subresults to make 4x 16bit ones!

and if the 8bit subresults were, say, XOR of all 8 bits together, then, well, to create the 16 bit results you um take pairs of the XOR intermediaries, and simply XOR those together, too.

dead simple in principle.

i started describing it on this page, see boolean xor table

https://libre-soc.org/3d_gpu/architecture/dynamic_simd/logicops/
Comment 2 ps905 2020-12-27 19:17:17 GMT
>similar to gt/le/eq in form, bool/all/any/xor operators are needed.

>wiki page here

>https://libre-soc.org/3d_gpu/architecture/dynamic_simd/logicops/
1. needed some help in understanding if the result has to be specific for the width or it has to go through all the possibilities. 
I am guessing that I could be understanding these partitions improperly.
What I mean is given if width of the signal, and for an example it is 32. Does it have to be split into 16x4 and 8x8 as well or only specifically for 32x2
Comment 3 Luke Kenneth Casson Leighton 2020-12-27 20:25:47 GMT
(In reply to ps905 from comment #2)
> >similar to gt/le/eq in form, bool/all/any/xor operators are needed.
> 
> >wiki page here
> 
> >https://libre-soc.org/3d_gpu/architecture/dynamic_simd/logicops/
> 1. needed some help in understanding if the result has to be specific for
> the width or it has to go through all the possibilities. 

well, it turns out that it's pretty straightforward to do all possibilities, simply by having the 7 bits of partitioning.

> I am guessing that I could be understanding these partitions improperly.
> What I mean is given if width of the signal, and for an example it is 32.
> Does it have to be split into 16x4 and 8x8 as well or only specifically for
> 32x2

all the code written takes 2 parameters:
a) width
b) partition points

take a look at the eq code as an example.

if an instance of the eq class is declared as 32 bit the partition points can be:

8 8 8 8
16 8 8
8 16 8
8 8 16
16 16
8 24
24 8
32

because the partition settings will be

0 0 0
1 0 0
0 1 0
0 0 1
etc etc.

the code does NOT take a parameter "partition equals 64 bit, partition equals 2x32 bit, partition equals 4x15"

the code takes a parameter indicating the OPENING GATE POINTs... *between* the 8 bit sub-sections

look at the code and the notes that went with it.

https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_cmp/experiments/equal_ortree.py;h=470fb603f48c31c84a13e2cf4e240ea7acd2f590;hb=54ebe54bb3dceacfddf376a73088896daf900bc6

the actual code is quite small but it did take a good 2-3 weeks for us to work through it.
Comment 4 Luke Kenneth Casson Leighton 2020-12-27 22:50:20 GMT
https://bugs.libre-soc.org/show_bug.cgi?id=132#c43

here's where the original conversation took place.

look again at the logical_ops wiki page.
Comment 5 Luke Kenneth Casson Leighton 2020-12-28 01:12:52 GMT
ah i just realised, the all() operator is equivalent to "is a == 0b111111111111111111"

and the some() operator is equivalent to "is b != 0b0000000000"

so in PartSig these can be done using PertitionedEqGtLt() with constants in b

this just leaves the xor() operator
Comment 6 Luke Kenneth Casson Leighton 2020-12-28 06:24:02 GMT
(In reply to Luke Kenneth Casson Leighton from comment #5)
> ah i just realised, the all() operator is equivalent to "is a ==
> 0b111111111111111111"


https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/partsig.py;h=728764e77f972d75ba0229f4d5e2e821a059ea78;hb=54ebe54bb3dceacfddf376a73088896daf900bc6#l291

pranav this one is dead easy.

  def any(self):
     width = self.sig.shape()[0]
     return self.__eq__(Const(-1, width))

or probably just:

  def any(self):
     return self == Const(-1)


would you like to try writing the (two lines!) some() case?
Comment 7 ps905 2020-12-28 07:01:19 GMT
>all the code written takes 2 parameters:
>a) width
>b) partition points

>take a look at the eq code as an example.

>if an instance of the eq class is declared as 32 bit the partition points can >be:

>8 8 8 8
>16 8 8
>8 16 8
>8 8 16
>16 16
>8 24
>24 8
>32

>because the partition settings will be

>0 0 0
>1 0 0
>0 1 0
>0 0 1
>etc etc.

>the code does NOT take a parameter "partition equals 64 bit, partition equals >2x32 bit, partition equals 4x15"

>the code takes a parameter indicating the OPENING GATE POINTs... *between* the >8 bit sub-sections

>look at the code and the notes that went with it.

got a better hold of it, thanks

>pranav this one is dead easy.

>  def any(self):
>     width = self.sig.shape()[0]
>     return self.__eq__(Const(-1, width))

or probably just:

>  def any(self):
>     return self == Const(-1)
 
>would you like to try writing the (two lines!) some() case?

sure will do this
Comment 8 Luke Kenneth Casson Leighton 2020-12-28 13:38:35 GMT
(In reply to ps905 from comment #7)

> >  def any(self):
> >     return self == Const(-1)
>  
> >would you like to try writing the (two lines!) some() case?
> 
> sure will do this

cool.  i just committed this:

commit cc37b3eed710bc3d8877b6170dcb0d6da183eea2 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Mon Dec 28 13:34:49 2020 +0000

    partsig: redirect bool to any for now, and use a == Const(-1) for any

so if you can do the same, ahh sorry, that one was for "all()" not "any()"
(doh)

if you can do any() using the != and, just like i did, make sure to add a
comment in both the code and in the commit message (do not exceed 80 char
line lengths in either case)

just commit it: don't worry about writing unit tests (etc) yet, thsis code is unused at the moment so there is no "damage" that you can do.  this will get you familiar with the "commit" process.  i'll do a review after.
Comment 9 Luke Kenneth Casson Leighton 2021-01-09 12:44:15 GMT
note in bug #575 about RippleLSB to be used as post-processing, to make the result 0b111111/0b0000 not 0b00001/0b00000
Comment 10 Luke Kenneth Casson Leighton 2021-01-16 01:55:22 GMT
https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_cmp/experiments/eq_combiner.py;h=f3d299f780618a732debb32faa62221f20240068;hb=4a5f5fedddb68de9704cfb9c9991d93d32af8e47

xor can, i believe be constructed by simply copying (better, adding a parameter to augment) EqCombiner so that, at line 50, it does an XOR rather than an OR

and a copy of this:

https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_cmp/experiments/equal_ortree.py;h=470fb603f48c31c84a13e2cf4e240ea7acd2f590;hb=4a5f5fedddb68de9704cfb9c9991d93d32af8e47

augmented to firstly only have one argument and secondly to produce, at lihe 49, a list of partial XORs

     ...append(self.a[start:end].xor())

and i believe it's done.
Comment 11 Luke Kenneth Casson Leighton 2021-01-16 18:40:37 GMT
index 53561d4..ed221d3 100644
--- a/src/ieee754/part/partsig.py
+++ b/src/ieee754/part/partsig.py
@@ -281,6 +281,7 @@ class PartitionedSignal:
         Value, out
             ``1`` if any bits are set, ``0`` otherwise.
         """
+        return self != Const(0) # leverage the __ne__ operator here

pranav, you see how the any() operator was as simple as comparing "not equal
to zero"?
Comment 12 Luke Kenneth Casson Leighton 2021-01-16 18:51:58 GMT
done, not tested, have to see how it goes

commit c6583a6ae212dd6f28453451a43effdb053ed493 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Jan 16 18:45:10 2021 +0000

    convert EQCombiner to general-purpose, create XORCombiner

commit 8a652f6dc19cb9951ddd763881af11f328c01fc6 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Jan 16 18:51:34 2021 +0000

    add first cut (untested) of PartitionedXOR
Comment 13 Luke Kenneth Casson Leighton 2021-01-16 23:01:57 GMT
fixed and formally verified XOR by Cesar.
initial guess needed RippleMSBDown and inversion of output.