Bug 731 - potential design oversight in Partitioned SimdSignal Cat/Assign/etc lhs/rhs
Summary: potential design oversight in Partitioned SimdSignal Cat/Assign/etc lhs/rhs
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: Other Linux
: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on: 733
Blocks: 732
  Show dependency treegraph
 
Reported: 2021-10-17 20:07 BST by Luke Kenneth Casson Leighton
Modified: 2021-10-25 16:26 BST (History)
2 users (show)

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2021-10-17 20:07:34 BST
nmigen allows constructs such as:

     m.d.comb += Cat(a, b).eq(c)

which places the AST Construct on the left side not the right.
this is unlikely to have been forseen in the original Partitioned
submodule designs for PartitionedCat, PartitionedRepl, PartitionedAssign
etc. which may only cope at present with the AST Construct being on
the rhs:

     m.d.comb += c.eq(Cat(a, b))

this needs investigation and potentially correcting

Strategy / Incremental work plan:

1) Unit test showing and confirming need for this work       TODO 
  a) first prototype    DONE (confirmed, comment #15)
  b) Cat     TODO
  e) Slice     TODO
2) incremental step of adding "mirror (trmporary) storage classes"
   (or add "hack" to allow PartitionedAssign to set missing LHS/RHS)
  a) Cat class    DONE commit 9a9db43f4cecf0a43e1390a4fb8fd6746776f433
  d) Slice class  TODO
3) Add recognition (isinstance) for mirrors in PartitionedAssign
  (or add recognition of "hack")
  DONE commit ad925fc12563d9097dd1b93df0e0f3dc033b00ad
4) swap over to mirror classes in partsig.py one at a time
   (not needed if using "hack")
  a) Cat    TODO
  d) Slice  TODO
5) create entirely new LHS Partitioned classes (or adapt existing ones)
  a) PartitionedCat    TODO
  d) Slice  TODO
6) update PartitionedAssign to cope with LHS   TODO
7) update partsig if necessary to new PartitionedAssign   TODO
8) run LHS unit tests, confirm functional     TODO

note that the "mirror" classes for Cat, Slice etc do not actually
do anything, they act as simply temporary storage for the arguments
passed to ast.Cat() etc pending encountering an Assign, at which
point and ONLY at which point the missing information (LHS or
RHS) is apparent.
Comment 1 Luke Kenneth Casson Leighton 2021-10-17 20:30:26 BST
firstly, some unit tests need writing (in test_partsig.py),
it does not matter if they fail, that indicates a need to be
fixed.

examining PartitionedCat as an example, the flaw is anticipated
but not yet confirmed to be here:

 112                 with m.Case(pbit):
 113                     # direct access to the underlying Signal
 114                     comb += self.output.sig.eq(Cat(*output))

this is expected only to cope with RHS.  the option would therefore
be required for the following:

                      comb += Cat(*output).eq(self.input)
             

but, paradoxically, this results in exactly the same code because
output has to be followed up with an assignment to self.output.sig
anyway.

thus, we conclude that for "lhs" mode there must be a *list of outputs*
rather than as is the case currently only a list of *inputs*.

determining which mode to use, unfortunately, cannot be determined until
the actual Assignment takes place!

therefore, Assignment must actually perform an analysis of its lhs and
rhs arguments and set which mode (lhs or rhs) they are set to!

this is a rather large piece of work so should only be done if it
turns out to be entirely necessary, and, furthermore, evaluated against
whether the existing HDL can be adopted to avoid LHS AST assignments,
temporarily.  there are a number of known uses of Cat for example
Comment 2 Jacob Lifshay 2021-10-17 22:46:42 BST
(In reply to Luke Kenneth Casson Leighton from comment #1)
> thus, we conclude that for "lhs" mode there must be a *list of outputs*
> rather than as is the case currently only a list of *inputs*.

Yup, having a list of outputs is exactly what I did in SwizzledSimdValue...That list (actually a dict from Signals to be assigned to to the computed values to assign to those Signals) is AssignSwizzle.outputs.

> determining which mode to use, unfortunately, cannot be determined until
> the actual Assignment takes place!
> 
> therefore, Assignment must actually perform an analysis of its lhs and
> rhs arguments and set which mode (lhs or rhs) they are set to!

Well, actually, the exact same object can be used as both a lhs and a rhs:
a = Signal(16)
s = a[5]
b = Signal()
m.d.comb += [b.eq(s), s.eq(1)]

> this is a rather large piece of work so should only be done if it
> turns out to be entirely necessary, and, furthermore, evaluated against
> whether the existing HDL can be adopted to avoid LHS AST assignments,
> temporarily.  there are a number of known uses of Cat for example

I'm expecting that we will want to support a[3:5].eq(...) since that's very common iirc. Assigning to Cat is very uncommon (I don't think we ever do it, but I could be wrong .. maybe as part of assigning to a Record?)
Comment 3 Jacob Lifshay 2021-10-17 22:49:45 BST
(In reply to Luke Kenneth Casson Leighton from comment #0)
> submodule designs for PartitionedCat, PartitionedRepl, PartitionedAssign
> etc. which may only cope at present with the AST Construct being on
> the rhs:

We won't have to worry about Repl, since nmigen only supports assigning to Signals and Slice, Cat, and Part:
https://github.com/nmigen/nmigen/blob/177f1b2e40694da473fdb0d95f4f6b33f5ea12ab/nmigen/back/rtlil.py#L631
Comment 4 Luke Kenneth Casson Leighton 2021-10-18 00:04:59 BST
ngggh i have this horrible feeling that the solution here, if the problem
is demonstrated to exist, involves a "collation" (AST temporary storage)
phase, a suite of mirror Cat/Part/Repl etc classes which keep track of
their Values, such that Assign may, once it is used, explicitly determine
what is LHS and which is RHS and *only then* perform the required
Partition submodule allocation.

given that the existing PartitionedCat and Repl etc. appear to be RHS-only
they need to be complemented with corresponding LHS classes, whereupon
Assign has what it needs.

drat.

solvable but complicated.
Comment 5 Luke Kenneth Casson Leighton 2021-10-18 00:22:47 BST
(In reply to Jacob Lifshay from comment #2)
> (In reply to Luke Kenneth Casson Leighton from comment #1)
> > thus, we conclude that for "lhs" mode there must be a *list of outputs*
> > rather than as is the case currently only a list of *inputs*.
> 
> Yup, having a list of outputs is exactly what I did in
> SwizzledSimdValue...That list (actually a dict from Signals to be assigned
> to to the computed values to assign to those Signals) is
> AssignSwizzle.outputs.

have you written the unit test first that confirms the work is necessary?

(also we do not chuck away existing code without serious justification,
we morph existing code with incremental changes)
 
> > determining which mode to use, unfortunately, cannot be determined until
> > the actual Assignment takes place!
> > 
> > therefore, Assignment must actually perform an analysis of its lhs and
> > rhs arguments and set which mode (lhs or rhs) they are set to!
> 
> Well, actually, the exact same object can be used as both a lhs and a rhs:
> a = Signal(16)
> s = a[5]
> b = Signal()
> m.d.comb += [b.eq(s), s.eq(1)]

if i am following correctly this assumes throwing away existing tested code
and replacing it with unauthorised code which was not discussed whether it
was needed, nor its design aspecymts discussed, before being written?
(if so this is a habit that you really, really need to break)

> I'm expecting that we will want to support a[3:5].eq(...) since that's very
> common iirc. 

the alternative which is a nuisance is to do Cat() construction darn it
that is LHS too.

> Assigning to Cat is very uncommon (I don't think we ever do it,
> but I could be wrong .. maybe as part of assigning to a Record?)

i use it a lot.  the PriorityPicker, some of the ALUs i think.
it is an extremely valuable technique that dramatically clarifies
graphviz diagrams.  there will not be that many though. oh wait...
the entirety of RecordObject, the Stage API... although perhaps
this is outside of the ALUs or there is a workaround.

good about Repl, of course makes no sense to assign LHS multiple
times.
Comment 6 Luke Kenneth Casson Leighton 2021-10-20 13:42:30 BST
ok Grant application nearly over, can semi-focus on this.
strategy / incremental workplan to go into top comment.

regarding unit tests: in Test-Driven Development the unit tests
are written *before* the code (!) which is done entirely as "mock"
that is then substituted one at a time.

it is perfectly fine to commit unit tests that FAIL, that
is their purpose, to keep track.

edit: updated top comment, it is a big damn workplan but quite straighforward,
should be done in about 20 separate commits.

i do not believe it necessary to do "full AST walk" of the AST looking
for things like:

    comb += Cat(Cat(a, b), c, z[3:5]).eq(d)

the "mirror" (temporary) AST classes can therefore take arguments
that are expected to have only one level of substitution.
PartitionedAssign therefore need only look at the type (isinstance)
and go, "oh, this a Cat mirror, i am the LHS of the Assign, i need
to allocate a LHS PartitionedCat"

no more complex than that.

but the workplan *looks* complex because it is designed to
ensure that at no time does any existing code fail existing
unit tests.
Comment 7 Luke Kenneth Casson Leighton 2021-10-21 03:50:33 BST
(In reply to Luke Kenneth Casson Leighton from comment #6)
 
> i do not believe it necessary to do "full AST walk" of the AST looking
> for things like:
> 
>     comb += Cat(Cat(a, b), c, z[3:5]).eq(d)

more to the point: attempts to do exactly that will result in
calls inside ast.py attempting to perform Value.cast() on objects
that do not derive from Value/UserValue.

unless the temporary classes do in fact do exactly that, in which
case there is not much point having them: the existing submodules
might as well just be created.

however... the tricky bit there is making sure that the missing
information (LHS or RHS) is propagated to the submodule *BEFORE*
the submodule's elaborate() function is called.
Comment 8 Jacob Lifshay 2021-10-21 04:05:20 BST
(In reply to Luke Kenneth Casson Leighton from comment #0)
> Strategy / Incremental work plan:

reordering a bit for clarity:

> 2) incremental step of adding "mirror (trmporary) storage classes"

SwizzledSimdValue *is* that temporary storage class, it is just a little different than what you envisioned here:
* use as rhs: its way of knowing when it's being used as the rhs is by trapping other code's accesses of SwizzledSimdValue.sig, and lazily adding the submodule that computes .sig's value. This way, no other code needs to operate differently when using SwizzledSimdValue as a rhs, it looks like a completely normal SimdSignal.
* use as lhs: it overrides __Assign__ so doesn't use PartitionedAssign, instead returning a list of nmigen Assigns, therefore PartitionedAssign doesn't need to be aware of SwizzledSimdValue at all.


> 
> 1) Unit test showing and confirming need for this work        TODO
>   a) first prototype
>   b) Cat     TODO
>   c) Part     TODO

iirc we weren't planning on supporting Part(...).eq(...), since, like Array, it's very complex and not super necessary. SwizzledSimdValue doesn't implement Part's functionality, because, if it did, it would basically devolve to a full general turing-complete logic network, rather than a simple single-layer bit-swizzle (different swizzle for each elwid).

If we need Part, we can have a rhs-only class (like PartitionedDynamicShift), which should be good enough.

lhs Part is hard, as evidenced by nmigen's rtlil backend having to raise an exception to convert each Part to a top-level rtlil switch statement:
raise here:
https://github.com/nmigen/nmigen/blob/177f1b2e40694da473fdb0d95f4f6b33f5ea12ab/nmigen/back/rtlil.py#L677
conversion to switch here:
https://github.com/nmigen/nmigen/blob/177f1b2e40694da473fdb0d95f4f6b33f5ea12ab/nmigen/back/rtlil.py#L785

>   d) Mux?     TODO

Mux(...).eq(...) doesn't work in nmigen, so we can remove that one from the list:

Traceback (most recent call last):
  File "/data/data/com.termux/files/home/nmigen-eq-test.py", line 54, in <module>
    m.d.comb += Mux(a, b, c).eq(d)
  File "/data/data/com.termux/files/home/nmigen/nmigen/hdl/dsl.py", line 38, in __iadd__
......
    raise TypeError("Value {!r} cannot be used in assignments".format(self))
TypeError: Value (m (sig a) (sig b) (sig c)) cannot be used in assignments

>   e) Slice     TODO
> 3) Add recognition (isinstance) for mirrors in PartitionedAssign  TODO

unnecessary, as outlined above

> 4) swap over to mirror classes in partsig.py one at a time
>   a) Cat    TODO
>   b) Part   TODO
>   c) Mux?   TODO
>   d) Slice  TODO
> 5) create entirely new LHS Partitioned classes (or adapt existing ones)

We won't actually need any new classes, since SwizzledSimdValue is general enough that Cat, Slice, and Repl *could* be implemented entirely (for both lhs and rhs) by having __Cat__, __Slice__, etc. just construct a SwizzledSimdValue instance:
kinda like (SimdSignal's method):
def __Cat__(self, *others):
    # convert to SwizzledSimdValue with same value
    retval = SwizzledSimdValue.from_simd_signal(self)

    # basically a wrapper around elwid, SwizzleKey can probably
    # be fully replaced with PartType once it grows more methods
    kinda_elwid = retval.swizzle_key

    # we construct the return value by Cat-ting SimdSignals one at a time,
    # this has no cost in complexity of the produced gates, since it
    # turns the whole sequence of Simd Cats into one layer of
    # non-simd muxes kinda like: Mux(elwid, Cat(bits...), Cat(...))
    for other in others:
        # convert from Value/SimdSignal to SwizzledSimdValue with
        # same value, splatting if needed
        other = SwizzledSimdValue.from_value(kinda_elwid, other)
        lhs_layout = get_layout(retval)
        rhs_layout = get_layout(other)
        layout = compute_cat_layout(retval, other)
        # dict of swizzles for each kinda_elwid value
        swizzles = {}
        for k in kinda_elwid.possible_values:
            # make `width` bit wide Swizzle initialized to const zeros
            swizzle = Swizzle.from_const(0, layout.width)
            # get lhs swizzle
            lhs = retval.swizzles[k]
            # get rhs swizzle
            rhs = other.swizzles[k]
            for lhs_el, rhs_el, el in zip(lhs_layout.elements(k),
                                          rhs_layout.elements(k),
                                          layout.elements(k)):
                # Cat the two lists of bits for this element, and
                # assign to slice of target Swizzle.bits.
                bits = lhs.bits[lhs_el.slice] + rhs.bits[rhs_el.slice]
                swizzle.bits[el.slice] = bits

                # above should probably be refactored into Swizzle.cat
                # and Swizzle.__setitem__ methods
            swizzles[k] = swizzle
        retval = SwizzledSimdValue(kinda_elwid, swizzles)
    return retval


        

> 6) update PartitionedAssign to cope with LHS   TODO

not necessary, as explained above

> 7) update partsig if necessary to new PartitionedAssign   TODO

not necessary, as explained above

> 8) run LHS unit tests, confirm functional     TODO

we retain step 8.

> note that the "mirror" classes for Cat, Part etc do not actually
> do anything, they act as simply temporary storage for the arguments
> passed to ast.Cat() etc pending encountering an Assign, at which
> point and ONLY at which point the missing information (LHS or
> RHS) is apparent.

That works great for LHS, however, for RHS use, you need the actual bits of a Cat, Slice, etc. SimdSignal well before encountering a .eq() call. SwizzledSimdValue handles that by trapping self.sig and lazily computing its value if needed.
Comment 9 Jacob Lifshay 2021-10-21 04:23:25 BST
(In reply to Jacob Lifshay from comment #8)
> iirc we weren't planning on supporting Part(...).eq(...), since, like Array,
> it's very complex and not super necessary. SwizzledSimdValue doesn't
> implement Part's functionality, because, if it did, it would basically
> devolve to a full general turing-complete logic network, rather than a
> simple single-layer bit-swizzle (different swizzle for each elwid).
> 
> If we need Part, we can have a rhs-only class (like
> PartitionedDynamicShift), which should be good enough.

To be clear, Part could be used on a SwizzledSimdValue (by treating it as a generic SimdSignal), it just can't be done without a separate class like PartitionedDynamicShift, unlike Cat and Slice.

The above only applies to dynamic Parts; static Parts can be easily done by just calling Slice, which gains all the SwizzledSimdValue benefits.
Comment 10 Luke Kenneth Casson Leighton 2021-10-21 11:37:31 BST
(In reply to Jacob Lifshay from comment #8)
> (In reply to Luke Kenneth Casson Leighton from comment #0)
> > Strategy / Incremental work plan:
> 
> reordering a bit for clarity:
> 
> > 2) incremental step of adding "mirror (trmporary) storage classes"
> 
> SwizzledSimdValue *is* that temporary storage class, it is just a little
> different than what you envisioned here:

unfortunately, it is much more than that.  it does that job *and* replaces
other code which has unit tests done *and* is a regression on capability
of existing code *and* is not a drop-in replacement for existing code.

it is also an optimisation and it goes too far.

we absolutely have to be extremely strict about this and keep to
an incremental development strategy at all times.

if that class is to be used it needs to be scheduled as its own
incremental update with full unit tests and full functionality and
redesigned as a dropin replacement.

given that the existing classes are functional this firmly places
it into "not essential right now and in fact detrimental to urgent
timescales" territory. which has been discussed and explained.


> * use as rhs: its way of knowing when it's being used as the rhs is by
> trapping other code's accesses of SwizzledSimdValue.sig, and lazily adding
> the submodule that computes .sig's value. This way, no other code needs to
> operate differently when using SwizzledSimdValue as a rhs, it looks like a
> completely normal SimdSignal.
> * use as lhs: it overrides __Assign__ so doesn't use PartitionedAssign,
> instead returning a list of nmigen Assigns, therefore PartitionedAssign
> doesn't need to be aware of SwizzledSimdValue at all.

if you had separated that out as an incremental update it could have
been considered.

> 
> > 
> > 1) Unit test showing and confirming need for this work        TODO
> >   a) first prototype
> >   b) Cat     TODO
> >   c) Part     TODO
> 
> iirc we weren't planning on supporting Part(...).eq(...), since, like Array,
> it's very complex and not super necessary.

i actually use that in dcache.py (!) but that is not ALU related (thank
goodness).

i realised retrospectively that i had looked at Value.__getitem__
and because of its int/Const checking assumed that Part offset must
also be int/Const.

again retrospectively i realised that Slice is const but Part is not,
so yes, Part is firmly off the table.

RHS style SimdSignal.__Part__ can be implemented in terms of a straight shift
and then slice.  it is about 3 lines of code due to SimdSignal shift being fully
functional already.  depends on RHS SimdSignal.__Slice__ being done first
though.  discuss in bug #716.

> If we need Part, we can have a rhs-only class (like
> PartitionedDynamicShift), which should be good enough.

yes.  3 lines.
 
> lhs Part is hard, 

yep, let's skip it.
> >   d) Mux?     TODO
> 
> Mux(...).eq(...) doesn't work in nmigen, so we can remove that one from the
> list:

gone.
 
> >   e) Slice     TODO
> > 3) Add recognition (isinstance) for mirrors in PartitionedAssign  TODO
> 
> unnecessary, as outlined above

version2.  we are not doing version2.  or any kind of premature optimisation.
sorry.  this is basic Project Management.
 
> > 4) swap over to mirror classes in partsig.py one at a time
> >   a) Cat    TODO
> >   b) Part   TODO
> >   c) Mux?   TODO
> >   d) Slice  TODO
> > 5) create entirely new LHS Partitioned classes (or adapt existing ones)
> 
> We won't actually need any new classes, since SwizzledSimdValue is general
> enough that Cat, Slice, and Repl *could* be implemented entirely

again: sorry. basic Stable Project Management rules apply.  SwizzledSimdValue
is too far away from existing code. it is version2 and we have made it
clear that version 2 is off the table until version 1 is complete.


> > note that the "mirror" classes for Cat, Part etc do not actually
> > do anything, they act as simply temporary storage for the arguments
> > passed to ast.Cat() etc pending encountering an Assign, at which
> > point and ONLY at which point the missing information (LHS or
> > RHS) is apparent.
> 
> That works great for LHS, however, for RHS use, you need the actual bits of
> a Cat, Slice, etc. SimdSignal well before encountering a .eq() call.
> SwizzledSimdValue handles that by trapping self.sig and lazily computing its
> value if needed.

if you had followed incremental development practices essential for a
project of this size and separated out the concepts i would be able to
follow this.

not that it needs explanation because the class cannot be used because it
violates Project Devevelopment Practices, i don't understand or follow how
a simple substitution of classes which
defer creation of classes that use the parameters could
cause RHS to stop working.
Comment 11 Luke Kenneth Casson Leighton 2021-10-21 12:39:04 BST
(In reply to Jacob Lifshay from comment #9)

> To be clear, Part could be used on a SwizzledSimdValue (by treating it as a
> generic SimdSignal), it just can't be done without a separate class like
> PartitionedDynamicShift, unlike Cat and Slice.

leaving that Swizzled cannot be used because it violates Project Devlopment
Practices to adopt, i've taken out Part entirely from the incremental
work plan, leaving just Cat and Slice.

i will go ahead with an exploration of Cat, if you can do Slice
as in bug #716 then it can catch up.

if i get into trouble with the incremental work plan i will
stop immediately and report back the results.
Comment 12 Luke Kenneth Casson Leighton 2021-10-21 13:09:30 BST
jacob: you damaged the code by adding in a NotImplemented exception
behind SimdSignal.__Slice__

it is clear that you had not run the unit tests because if you had,
you would have found the error that i just encountered, and fixed
in both PartitionedAssign and PartitionedRepl.

diff --git a/src/ieee754/part_repl/repl.py b/src/ieee754/part_repl/repl.py
index 7347648..05db372 100644
--- a/src/ieee754/part_repl/repl.py
+++ b/src/ieee754/part_repl/repl.py
@@ -73,7 +73,8 @@ class PartitionedRepl(Elaboratable):
         start = keys[upto]
         end = keys[upto+numparts]
         print ("start end", start, end, len(x))
-        return x[start:end]
+        # access the underlying Signal of SimdSignal directly
+        return x.sig[start:end]
 
     def elaborate(self, platform):
         m = Module()

please ensure that you follow Project Development Practices by always
identifying and running relevant unit tests.

please do not assume "the code is unused therefore it doesnt' matter"
(unless of course it is *actually* not used - including not having
a unit test at all because it's code that's just been added - in which
case Project Development Practices state clearly that it's ok to
change code that's not in use).

in this case, however, given the extensive unit tests, how absolutely
crucial this is, how it is critical path and how much effort has gone
into keeping this code stable (6 months and climbing)
it does *not* qualify as "unused", and *requires* running unit tests.

test_partsig.py at the bare minimum.  visual inspection of each
sub-classes output, and running them to ensure no syntax or runtime
errors is a secondary (but high) priority.
Comment 13 Luke Kenneth Casson Leighton 2021-10-21 13:44:54 BST
i added a very simple unit test:

    m.d.comb += o.eq(Cat(a, b))

and some quick investigation "debug prints":

     def __Cat__(self, *args, src_loc_at=0):
+        print ("partsig cat", self, args)
     def __Assign__(self, val, *, src_loc_at=0):
+        print ("partsig assign", self, val)

@@ -75,6 +75,7 @@ class PartitionedAssign(Elaboratable):
     def elaborate(self, platform):
+        print ("PartitionedAssign start")
...
...
+        print ("PartitionedAssign end")

etc. which reveals:

partsig cat <ieee754.part.partsig.SimdSignal object at 0x7f0e88f28400> (<ieee754.part.partsig.SimdSignal object at 0x7f0e88e697b8>,)
partsig assign <ieee754.part.partsig.SimdSignal object at 0x7f0e88e76128> <ieee754.part.partsig.SimdSignal object at 0x7f0e88e76470>
PartitionedCat start
PartitionedCat end
PartitionedAssign start
PartitionedAssign end

which is kinda as expected (but a relief to confirm) that elaborate()
is called **AFTER** submodule instantiation.

therefore, the opportunity to call a function "set_lhs()" exists
inside SimdSignal.__Assign__(), performing intrusive "tree-walking"
or basic one-level inspection of the contents of its inputs, and
adding the necessary parameter which alters the behaviour from
(default RHS) to LHS.

i have absolutely no problem at all with a truly dreadful hack of
adding in __something variables by PCat (and PSlice) which tell
the output that it came from a submodule.

def PCat(m, arglist, ctx):
    from ieee754.part_cat.cat import PartitionedCat # avoid recursive import
    global modcount
    modcount += 1
    pc = PartitionedCat(arglist, ctx)
    setattr(m.submodules, "pcat%d" % modcount, pc)
    pc.output.__trulyterriblehack = pc             <----- hack here
    return pc.output

thus, in SimdSignal.__Assign__ it may go, "oh, does lhs have a member
variable __trulyterriblehack? so let's call lhs.__trulyterriblehack.set_lhs()"

the behaviour of PartitionedCat may therefore SWITCH to one of LHS
rather than RHS.

also, *after* the set_lhs() function is called, lengths of signals at that
time may be adapted / adjusted, such that PartitionedAssign can ascertain
the lengths of the sub-signals, solving what i *think* you might have been 
referring to in comment #9

this should not cause any "damage" and is a minimal incremental change
satisfying the Project Development Practices.

that it is absolutely truly dreadful and will make anyone who sees it
scream if their eyes do not melt first is completely irrelevant :)

hack first [and for goodness sake document it], clean up later.


commit 994d0c3f3d549b1aaa71f64277ed589098c15405 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Oct 21 13:40:29 2021 +0100

    add quick print statements to show that elaborate() gets called as a
    second phase after the creation of the AST tree
    this gives a window of opportunity to tree-walk and set whether SimdSignals
    are LHS or RHS as determined by encountering SimdSignal.__Assign__
Comment 14 Luke Kenneth Casson Leighton 2021-10-21 15:07:19 BST
this "works" and does not cause "damage".  i did however have to
fix one of the unit tests where an assumption had been made
("ok to copy output of SimdSignal Cat onto a Signal") which
does "work" (due to UserValue downcasting).

https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=9a9db43f4cecf0a43e1390a4fb8fd6746776f433
https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=ad925fc12563d9097dd1b93df0e0f3dc033b00ad


commit ad925fc12563d9097dd1b93df0e0f3dc033b00ad (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Oct 21 15:04:33 2021 +0100

    continue truly awful hack which, in SimdSignal.__Assign__, detects the
    back-link to the submodule (PartitionedCat) in its return result,
    and calls set_lhs_mode(True) or (False) on LHS and RHS as appropriate.
    
    the default value is *NOT* set in the PartitionedCat constructor very very
    deliberately so as to show up any bugs.  it is particularly fortunate that
    this was chosen to be done because there was, in fact, a bug in the
    TestCatMod unit test, which assumed that it was ok to splat a Cat() result
    of a pair of SimdSignals directly onto a Signal().
    
    it *is* in fact "technically allowed" by nmigen due to automatic casting
    of UserValue, but should not strictly have been done.
Comment 15 Luke Kenneth Casson Leighton 2021-10-21 16:49:39 BST
confirmed: this is what happens:

/home/lkcl/src/libresoc/ieee754fpu/src/ieee754/part/partsig.py:117: DriverConflict: Signal '(sig sig)' is driven from multiple fragments: top, top.pcat2; hierarchy will be flattened
  self.sig = Signal(*args, **kwargs)


lkcl@fizzy:~/src/libresoc/ieee754fpu/src$ git diff

--- a/src/ieee754/part/test/minitest_partsig.py
+++ b/src/ieee754/part/test/minitest_partsig.py
@@ -22,12 +22,19 @@ if __name__ == "__main__":
+    # RHS Cat
     m.d.comb += o.eq(Cat(a, b))
+    # LHS Cat
+    m.d.comb += Cat(a1, b1).eq(o)


commit ded76e2fe0d749ce1ef2930e21d7c6d74dfa3696 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Oct 21 16:50:01 2021 +0100

    confirmed (in prototype form) that LHS Cat will cause conflict

https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=ded76e2fe0d749ce1ef2930e21d7c6d74dfa3696
Comment 16 Luke Kenneth Casson Leighton 2021-10-21 16:58:47 BST
wow.  just... wow.  it actually frickin worked.  thiis was all it took
(after having the "hack" set is_lhs).

+++ b/src/ieee754/part_cat/cat.py
@@ -120,7 +122,10 @@ class PartitionedCat(Elaboratable):
                         output.append(thing)
                 with m.Case(pbit):
                     # direct access to the underlying Signal
-                    comb += self.output.sig.eq(Cat(*output))
+                    if self.is_lhs:
+                        comb += Cat(*output).eq(self.output.sig) # LHS mode
+                    else:
+                        comb += self.output.sig.eq(Cat(*output)) # RHS mode
 
         print ("PartitionedCat end")
         return m

commit bc4f03efdc4ae932f2650bec0807070398178aa6 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Oct 21 16:56:51 2021 +0100

    add LHS support into PartitionedCat. amazingly - stunningly - it works

https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=bc4f03efdc4ae932f2650bec0807070398178aa6
Comment 17 Luke Kenneth Casson Leighton 2021-10-21 17:54:17 BST
so, jacob, to recap there:

* 1 line of code: a hack backlink was made in the PCat function
  which allowed the signal containing the result to know the
  submodule that *produced* that result
* 4 lines of code: Partitioned Assign detected that hack and
  set a flag LHS/RHS
* 3 lines of code: ParttitionedCat based on that flag performs
  a LHS Cat instead of a RHS Cat.

eight - EIGHT - lines of code (not 400 replacing and regressing
existing classes and setting us back over a week)
excluding explanation, exploration, unit tests
and comments, all of which far exceed the additions themselves.

it doesn't have to be pretty, it doesn't need to fulfil future
needs, it has to do what is immediately needed in the shortest
timescale with the least fuss and in a way that at no time
destroys confidence in or undermines preexisting code.

turns out that PAssign does not need to worry about lengths,
i have no idea why but am not in the least bit going to
be concerned about it.  i suspect that on investigation
it will turn out that nmigen does deal with two assigns
somehow, by identifying the in/out direction.
Comment 18 Jacob Lifshay 2021-10-21 19:29:17 BST
Does it pass:

a = SimdSignal(...)
b = SimdSignal(...)
c = SimdSignal(...)
d = SimdSignal(...)

cat = Cat(a, b)

m.d.comb += cat.eq(c)
m.d.comb += d.eq(cat)

I expect not, since you fell into the trap that each PartitionedCat would only ever be used on the lhs or the rhs, not both. SwizzledSimdValue correctly handles this.

Also, does it handle:
a = SimdSignal(...)
b = SimdSignal(...)
c = SimdSignal(...)
d = SimdSignal(...)

cat = Cat(a, b)

m.d.comb += Cat(cat, c).eq(d) # two stages of Cat

I also expect not, SwizzledSimdValue also correctly handles this case, because it is designed to handle assignments through any number of Cat and Slice calls.
Comment 19 Jacob Lifshay 2021-10-21 19:36:50 BST
(In reply to Luke Kenneth Casson Leighton from comment #12)
> jacob: you damaged the code by adding in a NotImplemented exception
> behind SimdSignal.__Slice__

That's because, at the time, nmigen didn't call __Slice__ for __getitem__ (or so I thought). Apparently I was on an old checkout of nmigen, before:
https://git.libre-soc.org/?p=nmigen.git;a=commit;h=ccfd17f885a34062bbe2bd155a4a71690f5827c6
Comment 20 Jacob Lifshay 2021-10-21 20:30:11 BST
(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to Jacob Lifshay from comment #8)
> > (In reply to Luke Kenneth Casson Leighton from comment #0)
> > > Strategy / Incremental work plan:
> > 
> > reordering a bit for clarity:
> > 
> > > 2) incremental step of adding "mirror (trmporary) storage classes"
> > 
> > SwizzledSimdValue *is* that temporary storage class, it is just a little
> > different than what you envisioned here:
> 
> unfortunately, it is much more than that.  it does that job *and* replaces
> other code which has unit tests done

Those unit tests can be reused...

> *and* is a regression on capability
> of existing code

how so? What can existing code do that SwizzledSimdValue can't (after Cat and Slice are implemented in terms of it and it's todos are done)? As far as I can tell, SwizzledSimdValue can do everything existing code can, and more.

> *and* is not a drop-in replacement for existing code.

I'm justifying that because I reasoned that the existing Cat code was insufficient for LHS use, and because combining Cat and Slice (and possibly Repl) all into one general class greatly simplifies the logic, as well as generates far more optimal code (a requirement if we want our cpu to actually be low-area/low-power when using simd alus; I don't want us to have to discard all of SimdSignal as waay too inefficient).
For things like Cat(a[30:50][2:10:2], b, Repl(c[23], 5)):
SimdSignal will generate 8(!) layers of muxes (on the order of 2-4000 gates!), whereas SwizzledSimdValue will generate only 1 layer (since that's what it always generates) (I'd guess at most 500 gates, for a 64-bit output SimdSignal with 4 possible elwid values -- generating a single 4-in 64-bit mux).

> 
> it is also an optimisation and it goes too far.
> 
> we absolutely have to be extremely strict about this and keep to
> an incremental development strategy at all times.
> 
> if that class is to be used it needs to be scheduled as its own
> incremental update with full unit tests and full functionality and
> redesigned as a dropin replacement.
> 
> given that the existing classes are functional

Which they're not, Cat doesn't work for nested use on lhs, and doesn't work when the same Cat instance is used both on lhs and rhs, Slice isn't implemented at all. All of those can be quickly implemented in terms of SwizzledSimdValue (I already posted an implementation of Cat previously that handles *all* of those requirements by virtue of letting *already implemented* SwizzledSimdValue do all the heavy lifting).

Here's __Slice__ (SimdSignal method):
def __Slice__(self, start, stop):
    # convert to SwizzledSimdValue with same value
    lhs_v = SwizzledSimdValue.from_simd_signal(self)

    # basically a wrapper around elwid, SwizzleKey can probably
    # be fully replaced with PartType once it grows more methods
    kinda_elwid = lhs_v.swizzle_key
    lhs_layout = get_layout(lhs_v)
    layout = compute_slice_layout(lhs_v, start, stop)
    # dict of swizzles for each kinda_elwid value
    swizzles = {}
    for k in kinda_elwid.possible_values:
        # make `width` bit wide Swizzle initialized to const zeros
        swizzle = Swizzle.from_const(0, layout.width)
        # get lhs swizzle
        lhs = lhs_v.swizzles[k]
        for lhs_el, el in zip(lhs_layout.elements(k), layout.elements(k)):
            # Slice the list of bits for this element, and
            # assign to slice of target Swizzle.bits.
            bits = lhs.bits[lhs_el.slice][start:stop]
            swizzle.bits[el.slice] = bits

            # above should probably be refactored into using
            # Swizzle.__setitem__ method
        swizzles[k] = swizzle
    return SwizzledSimdValue(kinda_elwid, swizzles)


> this firmly places
> it into "not essential right now and in fact detrimental to urgent
> timescales" territory.

The above firmly places it into *faster* to working code territory, cuz I already wrote basically all the code.

> which has been discussed and explained.
> 
> 
> > * use as rhs: its way of knowing when it's being used as the rhs is by
> > trapping other code's accesses of SwizzledSimdValue.sig, and lazily adding
> > the submodule that computes .sig's value. This way, no other code needs to
> > operate differently when using SwizzledSimdValue as a rhs, it looks like a
> > completely normal SimdSignal.
> > * use as lhs: it overrides __Assign__ so doesn't use PartitionedAssign,
> > instead returning a list of nmigen Assigns, therefore PartitionedAssign
> > doesn't need to be aware of SwizzledSimdValue at all.
> 
> if you had separated that out as an incremental update it could have
> been considered.

Well, luke, you have to realize that not all things can be implemented incrementally, and that forcing them to occur incrementally when not suited to that will take waay longer and the diffs will not actually make much sense.
Comment 21 Luke Kenneth Casson Leighton 2021-10-21 22:20:07 BST
(In reply to Jacob Lifshay from comment #18)
> Does it pass:
> 
> a = SimdSignal(...)
> b = SimdSignal(...)
> c = SimdSignal(...)
> d = SimdSignal(...)
> 
> cat = Cat(a, b)
> 
> m.d.comb += cat.eq(c)
> m.d.comb += d.eq(cat)

no, because there aren't any locations where that's used
(because the practice of multiple-use of expensive AST
expressions as python variables i learned very early on
results in duplicated HDL)

> I expect not, since you fell into the trap that each PartitionedCat would
> only ever be used on the lhs or the rhs, not both.

... not a trap per se: i entirely avoid the practice throughout
the entirety of the code and will, if i see it being used, request
that it be fixed as a severe bug (or be commented in exceptional
circumstances)

if it turns out to be necessary, there is a simple fix: identify
whether set_lhs_mode has been called or not, if it's an incompatible
{LR}HS, and if so, duplicate the expression prior to using it.

this is about 6-8 lines of code, not 400.

> Also, does it handle:
> a = SimdSignal(...)
> b = SimdSignal(...)
> c = SimdSignal(...)
> d = SimdSignal(...)
> 
> cat = Cat(a, b)
> 
> m.d.comb += Cat(cat, c).eq(d) # two stages of Cat

no, and i explained already that this is, again, something that
is not used (double-cat), because, again, i learned very quickly
(2 years ago) that using python assignments is not a good idea.

therefore i know that there's no code that does this,
therefore it is not needed right now,
therefore there is no point worrying about it and certainly not
getting into discussions that delay progress more than progress
has already been delayed by trying to work in version 2 code and
concepts into version 1.
 
> I also expect not, SwizzledSimdValue also correctly handles this case,
> because it is designed to handle assignments through any number of Cat and
> Slice calls.

that's great - if it were needed for version 1.0 which is the "pragmatic
get-it-done-as-quickly-as-possible and learn from the exercise" version.

i know what you're doing. you're solving problems ahead-of-time.  this
is what i used to do.  you feel, "this person isn't listening to what
i'm saying therefore i'm just going to go ahead and do it".

the problem with that approach is that things emerge retrospectively.
i didn't know that you'd designed something that solved issues i hadn't
even thought about or understood, yet.

an incremental approach on the other hand allows me to understand what
you are trying to communicate, in terms of what i already understand.

the two cases you've noted - if in the extremely unlikely event that they
are needed - can be fixed by adding the capability to the existing code.
Comment 22 Luke Kenneth Casson Leighton 2021-10-21 22:35:51 BST
(In reply to Jacob Lifshay from comment #20)

> how so? What can existing code do that SwizzledSimdValue can't (after Cat
> and Slice are implemented in terms of it and it's todos are done)?

it doesn't use PartType.

> I'm justifying that because I reasoned that the existing Cat code was
> insufficient for LHS use, 

because you didn't trust the incremental approach that i described
is essential for projects of this size and complexity.

> and because combining Cat and Slice (and possibly
> Repl) all into one general class greatly simplifies the logic,

but abandons existing work which you resisted doing so i had to do it.
if you had done it when i asked instead of resisting, then you would
have been the one to make the decisions.

instead, because you kept resisting and wanting to design a version 2.0
before version 1.0 had been completed, you didn't help write PartitionedCat
etc. which then forced me into a position of doing it myself.

once they're done, they're done.  those then become the code that needs
to be incrementally morphed.

>  as well as
> generates far more optimal code (a requirement if we want our cpu to
> actually be low-area/low-power when using simd alus;

this is a good justification: it's also a version 2.0 justification.

we are *seriously behind* jacob.  how many times do i have to point
this out.

a version 1.0 is absolutely absolutely critical to moving the ALUs forward
*regardless* of how inefficient version 1.0 is.

please get this through to your understanding that this is urgent,
urgent, urgent.

> I don't want us to have
> to discard all of SimdSignal as waay too inefficient).
> For things like Cat(a[30:50][2:10:2], b, Repl(c[23], 5)):
> SimdSignal will generate 8(!) layers of muxes (on the order of 2-4000
> gates!), whereas SwizzledSimdValue will generate only 1 layer (since that's
> what it always generates) (I'd guess at most 500 gates, for a 64-bit output
> SimdSignal with 4 possible elwid values -- generating a single 4-in 64-bit
> mux).

that's great!  [for version 2.0 and/or for a drop-in replacement along the
way once we have multiple people working on ALU conversions and writing new
ones]

how many times do i have to repeat that this is urgent, urgent, urgent.

discussing this multiple times is actually actively contributing to those
very delays that i am trying to get you to stop creating.

you are *aggravating* the project, not *assisting* the project right now
by denying taking on board this lesson.



> > it is also an optimisation and it goes too far.
> > 
> > we absolutely have to be extremely strict about this and keep to
> > an incremental development strategy at all times.
> > 
> > if that class is to be used it needs to be scheduled as its own
> > incremental update with full unit tests and full functionality and
> > redesigned as a dropin replacement.
> > 
> > given that the existing classes are functional
> 
> Which they're not, Cat doesn't work for nested use on lhs, 

don't need it.

> and doesn't work when the same Cat instance is used both on lhs and rhs,

don't need it.

>  Slice isn't implemented at all.

only because you haven't done what you've been asked to do!


> All of those can be quickly implemented in terms of
> SwizzledSimdValue (I already posted an implementation of Cat previously that
> handles *all* of those requirements by virtue of letting *already
> implemented* SwizzledSimdValue do all the heavy lifting).

this can be done *AFTER* the urgent urgent urgent job of finishing all
of the AST constructs has been

we are holding up *three people* from helping on parallel tasks of
converting ALUs right now.

a "totally crap from your perspective" version 1 would allow unblocking
of that critical path.

*please listen* and help out

you keep not listening and keep fighting this and not learning this extremely
important Project Management lesson.

we are on *critical path*.

please listen.

please do the tasks that are needed to get off of critical path as
quickly as possible.

please listen.

> The above firmly places it into *faster* to working code territory, cuz I
> already wrote basically all the code.

which i have already told you goes *backwards* because it is not compatible
with the existing API and functionality.

which you haven't listened to me about because you've been trying to focus
everything on version 2.0.

please for goodness sake listen.

> Well, luke, you have to realize that not all things can be implemented
> incrementally, and that forcing them to occur incrementally when not suited
> to that will take waay longer and the diffs will not actually make much
> sense.

this is a skill that i am attempting to teach you.  it is absolutely
essential that you learn it for a project of this magnitude and
complexity.

so please.

will you please.

follow the path that i've outlined.

we are running out of time.

we are holding up multiple people and multiple tasks.
Comment 23 Luke Kenneth Casson Leighton 2021-10-22 00:21:59 BST
(In reply to Jacob Lifshay from comment #20)

> For things like Cat(a[30:50][2:10:2], b, Repl(c[23], 5)):
> SimdSignal will generate 8(!) layers of muxes (on the order of 2-4000
> gates!), whereas SwizzledSimdValue will generate only 1 layer (since that's
> what it always generates)

ok.  so - finally - it emerges what the purpose (and value) of SwizzledSimdValue
is.

anyone else who did not also have Asperger's would say something stupid
(like they did to me, and i found it terribly discouraging), along the lines of,
"well why the bloody hell didn't you explain this initially??"

the idea - complete idea - sort-of "emerged" in your head, fully-formed,
solving multiple problems at once, and the full list of things it solved
were so overwhelming that you *couldn't* explain them all at once, could you?

(i cannot count the number of times over the past 30 years that this happened
to me.  you are extremely lucky that i recognise it).

so.

what i learned to do was to say this:

"i have a solution, but it solves so many things that i can't explain all
of them immediately to you.  even if i write this in 4 hours flat it will
take about 2 days to 3 weeks for me to even grasp fully what *i* have solved
in a way that allows me to explain it. i need to put together a
proof-of-concept, which can be analysed retrospectively: is that ok?"

because i recognise this syndrome ("ideas emerging fully-formed") i have
no problem at all with you saying that.  [corollary: where i _do_ have a
problem is if you go ahead *without* saying that, because it severely
disrupts planning and task prioritisation].

i will put together a new bugreport, and what i will do - and i need you
to understand and accept this - is to put together a series of tasks
outlining necessary test suites and other blocking tasks that it is
*essential* that are completed first because SwizzledSimdValue is categorised
as an "optimisation" not an "essential", where the existing classes
"already do the job" and allow SimdSignal to be signed off and thus move
off of critical path.

those test suites - all of which can be added right now - do not have
to pass with version 1 code: they do have to be written though.  those
unit tests will be essential to proving that the ALU code is not going to
be disrupted by a drop-in replacement in the form of SwizzledSimdValue.

in particular, the unit tests must include the 2 different PartTypes
(one "existing" PartitionedSignal behaviour, covering all permutations,
one "elwidth" behaviour.  i expect SwizzledSimdValue to pass 100% all
unit tests with *both* PartTypes, and that means that the unit tests
need to be augmented - beforehand - to show that *existing* SimdSignal
with both types of behaviour - also work 100%.

i also have no problem at all with you putting in "this is planned to
be replaced with version 2 code, see bugreport XYZ" *as long as you
actually help do the version 1 code*.

(edit: done.  bug #732)
Comment 24 Luke Kenneth Casson Leighton 2021-10-25 16:26:49 BST
i just realised: when set_lhs_mode() is called, the callee may likewise
do exactly the sane thing, with its parameters.