734 – add Partitioned SimdSignal support for elwidth-based layouts (currently ElwidPartType)

Bug 734 - add Partitioned SimdSignal support for elwidth-based layouts (currently ElwidPartType)

Summary: add Partitioned SimdSignal support for elwidth-based layouts (currently Elwid...

Status:	CONFIRMED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:	https://libre-soc.org/3d_gpu/architec...

Depends on:
Blocks:	132 458
	Show dependency tree / graph

Reported:	2021-10-22 14:46 BST by Luke Kenneth Casson Leighton
Modified:	2022-06-16 15:51 BST (History)
CC List:	2 users (show)

See Also:	733 713 716
NLnet milestone:	NLnet.2019.02.012
total budget (EUR) for completion of task and all subtasks:	0
budget (EUR) for this task, excluding subtasks' budget:	0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luke Kenneth Casson Leighton 2021-10-22 14:46:16 BST

as part of investigating SimdSignal.Slice it became apparent that the
elwidth-based layouts would need to be gotten up and running, first,
because those properly support layouts with blank/padding in them.

context:
http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-October/003920.html

this doesn't necessarily rely on SimdScope (to be renamed SimdMode?)
but is very closely related... TBD
initial version:
https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/simd_scope.py;h=a7f355d766d69ac52c32c9465955a36fa41d45dc;hb=96f15c29e57a41f1cdd480493da25826d84f7153

Comment 1 Luke Kenneth Casson Leighton 2021-10-22 14:56:48 BST

two key commits, here:

1) https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=cd06267946b1488104ddc2a5960f6b7002f322ec

 a) removing simd_full_width_hint because it's down to individual
    Signals (SimdSignals) to determine and explicitly set - if
    needed - their explicit width
 b) adding a module context.
    this is REALLY important.  firstly that the Type 1 (AST) casting
    rules for the module need to be declared as "Simd-aware"
    secondly that every single SimdSignal created must know what its
    module is, in order for it to be capable of creating sub-modules.

2) https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=f1b7f195e21c3f7434a96520ebdb8484b696e54b

 a) added a docstring / bare-minimum usage

 b) added a Signal() function which either redirects to nmigen Signal
    or SimdSignal as appropriate

(2b) also shows - finally - in a concrete way - why SimdShape had to derive
from Shape.  unless it did so, there was no way that the SimdScope context
could transparently switch over between scalar and simd without one hell of
a lot of unnecessary futzing about

caveat: must update nmigen libresoc-partsig at least this commit:
https://git.libre-soc.org/?p=nmigen.git;a=commitdiff;h=f7a0244e6355c70efb12cb54802cd765323c964a

this new function has been added to dsl.Module which *must* be called:

 185 
 186     def _setAstTypeCastFn(self, typefn=None):
 187         self._astTypeCast  = typefn or Value.cast

Comment 2 Luke Kenneth Casson Leighton 2021-10-22 15:02:22 BST

just realised it's critically important not to use the SimdSignal cast
function in scalar SimdScope mode!

https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=1411a10fcf1bc0986fe57bb23144904b67941a4f

Comment 3 Luke Kenneth Casson Leighton 2021-10-22 15:28:37 BST

i put in quite a few TODO of code-comments that are missing in SimdScope

https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=a95f90848c98564d6fcb2ec48c018d2362eed2e4

one in particular stood out: it is critically important NOT to try
to treat SimdSignal as if it could potentially be a scalar.  this will
have massive detrimental implications

the idea is to completely transparently switch all code - at compile-time -
with an option to issuer_verilog.py and to test_issuer.py - between
Simd and Scalar with a single command-line option that goes directly
into SimdScope's scalar argument.

if that scalar argument is True, the ENTIRE Simd behaviour of SimdScope
must get the hell out the way, leaving the entire code base unaltered
in every way at every level, exactly as if SimdScope was never even
being used.

this allows us to test that the changes to the ALUs do in fact not
damage the code's scalar behaviour

the absolute last thing we need is to radically alter the entire
suite of ALUs, encounter a bug, and have absolutely no idea and no
way to determine if the bug was down to EITHER:

1) changes to the ALUs
2) the new SimdSignal

only by having the ability to swap SimdScope and SimdScope RIGHT
out the way do we have a means by which the distinction between
those two can be made.

that, and the amount of code created will be s*** on gate count
for scalar, punishing test_issuer in scalar mode completely
unnecessarily.

Comment 4 Luke Kenneth Casson Leighton 2021-10-23 16:40:33 BST

(In reply to Luke Kenneth Casson Leighton from comment #3)

> 1) changes to the ALUs
> 2) the new SimdSignal

by moving the scalar-emulation to a util function there are 3 tests
that can be done. actually, 4:

1) existing scalar
2) scalar-emulation
3) SIMD with elwidth=0b00
4) SIMD with other elwidths (vs scalar with an equivalent XLEN)

this last one will be fun / interesting, and justifies doing scalar
XLEN=32/16/8 HDL, fascinatingly.

so, please don't *delete* the scalar-emulation, move it to a util function
that can be used during establishing PSpec at runtime/config.
should be very interesting and useful.

Comment 5 Luke Kenneth Casson Leighton 2021-10-23 18:39:23 BST

in https://bugs.libre-soc.org/show_bug.cgi?id=716#c17
it was determined that mismatched width/partitionpoints
needs an adapter that would adjust automatically the
input operands.  this *may* include zero/sign-extension
in the case of arithmetic/logical ops.

it has implications for the ElwidPartType because ElwidPartType
needs to use the correct element width per partition, including
padding pieces, to ensure that for each elwid case the total
width is always the same.

Comment 6 Luke Kenneth Casson Leighton 2021-10-24 12:43:31 BST

    elwid       |    |        |  |      |    |   
    0b00  x x x  x x  x x x x |B7 B6B5B4 B3B2 B1B0A2A1A0
    0b01  x x x |B7B6 B5B4AaA9 A8|x x x |B3B2 B1B0A2A1A0
    0b10  B7B6Ae AdAc|B5B4AaA9 A8|B3B2A6 A5A4|B1B0A2A1A0

this example, from the slice page, is showing up an oversight
in PartitionedCat (and other places that have a get_chunk
function)
https://libre-soc.org/3d_gpu/architecture/dynamic_simd/slice/

the issue is that PartitionedCat etc. were not designed with
disparate PartitionPoints in mind, and the case of a fixed-overall-width
and a fixed-element-width is breaking the assumptions.

a workaround _would_ be to convert all of the input parameters
(PartitionedCat.catlist) to the exact same partitionpoints
but i am currently unable to see how to do that.

determining the new lane_shapes is easy enough: just add up
the widths at each elwidth of every contributor in catlist,
create a new lane_shapes, job done.

the issue is: even with those new lane_shapes after creating
a new PartitionPoints, how the heck do you convert each one
of the catlist contributors to the same layout, when each
contributor is smaller and would fit in totally different
places? i don't believe i'm overthinking this.

it may need a design shift in PartitionedCat itself.

need to look at it some more.

Comment 7 Luke Kenneth Casson Leighton 2021-10-24 14:12:08 BST

here is the relevant code section from PartitionedCat:

 110     for pbit in self.ptype.get_cases():
 111         # set up some indices pointing to where things have got
 112         # then when called below in the inner nested loop they give
 113         # the relevant sequential chunk
 114         output = []
 115         y = [0] * len(self.catlist)
 116         # get a list of the length of each partition run
 117         runlengths = get_runlengths(pbit, len(keys))
 118         print ("pbit", bin(pbit), "runs", runlengths)
 119         for i in runlengths: # for each partition
 120             for yidx in range(len(y)):
 121                 thing = self.get_chunk(y, yidx, i) # sequential chunks
 122                 output.append(thing)

the core of PartitionedCat is lines 119 and 120, which
create the Cat()ed result on a per-elwidth basis (line 110)
for every element (line 119) in the partition, where the
actual Cat on a per-element basis is the loop at line 120.

what is missing from this picture is the arbitrary sizes and occurrences
of padding blocks.

to that end, get_chunk needs to move into ElwidPartType, and it
needs to be possible to determine that the "things" being created
do not necessarily match perfectly with the expected width, and
to pad them.

still thinking it through

Comment 8 Luke Kenneth Casson Leighton 2021-10-24 17:20:46 BST

https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=f4af452829a8bbe686a5ae34f1ddc362882c9d86

i have a feeling that PartitionedCat needs a redesign to take into
consideration the lengths of each individual element, from which,
in combination with the length of each output element, it becomes
possible to calculate the length of any missing bits, aka "padding".

these would explicitly be set to zero.

another way to do it would be to use targetted slices:
   out[start:end].eq(chunk)

Comment 9 Jacob Lifshay 2021-10-24 20:45:42 BST

Luke, I already wrote a __Cat__ implementation that does everything we need in:
https://bugs.libre-soc.org/show_bug.cgi?id=731#c8

No need to spend a bunch of time trying to figure out how to do it yourself.

Comment 10 Luke Kenneth Casson Leighton 2021-10-24 21:04:03 BST

(In reply to Jacob Lifshay from comment #9)
> Luke, I already wrote a __Cat__ implementation that does everything we need
> in:
> https://bugs.libre-soc.org/show_bug.cgi?id=731#c8

i know: and to remind you, i said that it is a regression, and its
immediate deployment is too disruptive: there is no "from where things are"
to "where it is" path.

the code that you wrote has no understanding of PartType which makes it
a disruptive regression and places it firmly in version 2 category.

> No need to spend a bunch of time trying to figure out how to do it yourself.

well, then, please help by making PartitionedCat (the code that has
unit tests and is embedded into the existing 6 month tested code), work
instead of forcing me to duplicate effort in an area that i am not good at.

you could if you chose to do this in minutes, i am getting quite irritated
that you leave me to struggle with algorithmic issues that take me literally
ten or greater times longer due to a form of dyslexia.

Comment 11 Luke Kenneth Casson Leighton 2021-10-24 21:34:30 BST

(In reply to Jacob Lifshay from comment #9)
> Luke, I already wrote a __Cat__ implementation that does everything we need
> in:
> https://bugs.libre-soc.org/show_bug.cgi?id=731#c8
> 
> No need to spend a bunch of time trying to figure out how to do it yourself.

the only reason i am FORCED to, as you put it, "spend a bunch of time"
is because you are NOT helping by doing what i have told you repeatedly
is required.

please HELP to fix the EXISTING code in an INCREMENTAL fashion

i have stated very clearly multiple times that a project of this size
requires an incremental coding policy.  this is a mandatory requirement
that is not going to be revoked.  no exceptions.

please accept this policy and do not fight it directly or indirectly
by leaving me to struggle with algorithmic issues that you find so easy
"because your version 2 solution is better".

you wrote code that does not fit with existing code.

therefore if it doesn't fit it is *automatically* rejected.

please get this straight and start helping.

Comment 12 Luke Kenneth Casson Leighton 2021-10-25 12:12:17 BST

i've thrown together a SimdShape class and started tying it in.

https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=b5ed5a2edf34115e830605370cf5397481b7ea13
https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=b9657e88aa2b50d117c86b88c6d6186b3c91bf8c

the fundamental assumption is that you would never create
a SimdShape without also putting it under a SimdScope
ContextManager.  have to see how that turns out

to make things "look" like they're the same as Scalar i've
also added SimdScope.Shape

commit 8cb1ccf1de1ca0166b0c62fcc46d5011df218eb2 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Mon Oct 25 12:10:46 2021 +0100

    add SimdScope.Shape redirector which switches from scalar to simd behaviour
    depending on context.
    
    for compatibility with nmigen Shape() the width parameter is passed
    through to widths_at_elwid in SimdShape, allowing the layout() function
    the opportunity to turn a scalar width into fixed element widths at
    all elwids.


it may actually turn out to be better to have a "fixed_width" parameter
for SimdShape.  hmmm...

(edit:)

commit ff455da2a1e30a17c74ffd06ec5d0dcd53953093 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Mon Oct 25 12:19:04 2021 +0100

    rename the arguments to SimdShape() so as to match up with Shape() params

Comment 13 Luke Kenneth Casson Leighton 2021-10-25 13:16:24 BST

i cut out an entire swathe of code which creates massive unnecessary
complications, replacing nearly 100 lines of complex interlocking
type analysis and conversions with *three* lines of code.

please write code that is simple and adapt it *only when it proves necessary*
(with incremental commits)


commit 7446824ba1944d822ae07ec826fd664bbdbf8b43 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Mon Oct 25 13:14:06 2021 +0100

    remove unnecessary code which creates complications from SimdScope
    constructor.
    
    elwid is mandatory, vec_el_counts is mandatory.  no need for complicated
    types, complicated adaptation, complicated interlocking conditional
    behaviour.
    
    pass in elwid (Signal)
    pass in vec_el_counts.
    
    done.

Comment 14 Luke Kenneth Casson Leighton 2021-10-25 14:33:42 BST

ha! it works!

commit fff6d9ab2de5700103e0f9c1bb9fab64233f1783 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Mon Oct 25 14:28:55 2021 +0100

    had to add fixed_width parameter temporarily to confirm that the
    minitest, test_partsig_scope.py, worked (which it did).
    now can work out how to remove it


now, the next incremental phases are:

1) work out how to remove the SimdScope.Signal(fixed_width) parameter
2) probably by creating a SimdShape(fixed_width=nnn)
3) which is passed in as the 1st parameter to SimdScope.Signal
4) which in the minitest makes it clear why rudimentary arithmetic
   is needed in SimdScope

        with SimdScope(self.m, elwid, vec_el_counts) as s:
            # BE CAREFUL with the fixed_width parameter.
            # it is NOT available in SimdScope.scalar mode
            self.a = s.Signal(fixed_width=width)
            self.b = s.Signal(fixed_width=width*2)
            self.o = s.Signal(fixed_width=width*3)

==>

        with SimdScope(self.m, elwid, vec_el_counts) as s:
            shape = s.Scope(fixed_width=width)
            self.a = s.Signal(shape)
            self.b = s.Signal(shape*2) # simple __mul__ needed
            self.o = s.Signal(shape*3) # again, simple __mul__ needed

Comment 15 Luke Kenneth Casson Leighton 2021-10-25 14:43:15 BST

done:

commit 6edd0e1e0732b3e9781d3c4d8d3830d475cd4a50
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Mon Oct 25 14:35:57 2021 +0100

    use explicit SimdShape for minitest example rather than
    fixed_width parameter

+            shape = SimdShape(s, fixed_width=width)
+            shape2 = SimdShape(s, fixed_width=width*2)
+            shape3 = SimdShape(s, fixed_width=width*3)
+            self.a = s.Signal(shape)
+            self.b = s.Signal(shape2) # TODO: shape*2
+            self.o = s.Signal(shape3) # TODO: shape*3

TODO: the multiply part.  basically, SimdShape should (multiply-inherit)
derive from SimdMap, *or* (probably better) the functions *of* SimdMap
go into SimdShape.

jacob you had probably best do that, but *please do not go overboard*.

also please do not make SimdShape critically depend on any types.
do not alter it so that it critically depends on the enum Elwid.

if necessary start again with the absolute basics, ignoring SimdMap
entirely and *only* add in the *absolute* minimum rudimentary
functions needed, this time.

remember: write code *on-demand* - NOT "i think this might be needed
so am going to write 1,000 lines of code that has no unit tests
and no feet-on-the-ground justification for its existence"

show the need *first*, create code *second*, ok?

Comment 16 Luke Kenneth Casson Leighton 2021-10-25 19:42:12 BST

i have two functions added to PartType which are a new API, addressing
elements by their elwid and by the element number and returning the
range within the underlying Signal.

now it becomes possible, for an ElwidPartType version, to accumulate the
pieces to be Cat()ed together, and to check the length.  if it does not
fit exactly the expected length of that partition, add some padding zeros
so that it does.