nmigen allows constructs such as: m.d.comb += Cat(a, b).eq(c) which places the AST Construct on the left side not the right. this is unlikely to have been forseen in the original Partitioned submodule designs for PartitionedCat, PartitionedRepl, PartitionedAssign etc. which may only cope at present with the AST Construct being on the rhs: m.d.comb += c.eq(Cat(a, b)) this needs investigation and potentially correcting Strategy / Incremental work plan: 1) Unit test showing and confirming need for this work TODO a) first prototype DONE (confirmed, comment #15) b) Cat TODO e) Slice TODO 2) incremental step of adding "mirror (trmporary) storage classes" (or add "hack" to allow PartitionedAssign to set missing LHS/RHS) a) Cat class DONE commit 9a9db43f4cecf0a43e1390a4fb8fd6746776f433 d) Slice class TODO 3) Add recognition (isinstance) for mirrors in PartitionedAssign (or add recognition of "hack") DONE commit ad925fc12563d9097dd1b93df0e0f3dc033b00ad 4) swap over to mirror classes in partsig.py one at a time (not needed if using "hack") a) Cat TODO d) Slice TODO 5) create entirely new LHS Partitioned classes (or adapt existing ones) a) PartitionedCat TODO d) Slice TODO 6) update PartitionedAssign to cope with LHS TODO 7) update partsig if necessary to new PartitionedAssign TODO 8) run LHS unit tests, confirm functional TODO note that the "mirror" classes for Cat, Slice etc do not actually do anything, they act as simply temporary storage for the arguments passed to ast.Cat() etc pending encountering an Assign, at which point and ONLY at which point the missing information (LHS or RHS) is apparent.
firstly, some unit tests need writing (in test_partsig.py), it does not matter if they fail, that indicates a need to be fixed. examining PartitionedCat as an example, the flaw is anticipated but not yet confirmed to be here: 112 with m.Case(pbit): 113 # direct access to the underlying Signal 114 comb += self.output.sig.eq(Cat(*output)) this is expected only to cope with RHS. the option would therefore be required for the following: comb += Cat(*output).eq(self.input) but, paradoxically, this results in exactly the same code because output has to be followed up with an assignment to self.output.sig anyway. thus, we conclude that for "lhs" mode there must be a *list of outputs* rather than as is the case currently only a list of *inputs*. determining which mode to use, unfortunately, cannot be determined until the actual Assignment takes place! therefore, Assignment must actually perform an analysis of its lhs and rhs arguments and set which mode (lhs or rhs) they are set to! this is a rather large piece of work so should only be done if it turns out to be entirely necessary, and, furthermore, evaluated against whether the existing HDL can be adopted to avoid LHS AST assignments, temporarily. there are a number of known uses of Cat for example
(In reply to Luke Kenneth Casson Leighton from comment #1) > thus, we conclude that for "lhs" mode there must be a *list of outputs* > rather than as is the case currently only a list of *inputs*. Yup, having a list of outputs is exactly what I did in SwizzledSimdValue...That list (actually a dict from Signals to be assigned to to the computed values to assign to those Signals) is AssignSwizzle.outputs. > determining which mode to use, unfortunately, cannot be determined until > the actual Assignment takes place! > > therefore, Assignment must actually perform an analysis of its lhs and > rhs arguments and set which mode (lhs or rhs) they are set to! Well, actually, the exact same object can be used as both a lhs and a rhs: a = Signal(16) s = a[5] b = Signal() m.d.comb += [b.eq(s), s.eq(1)] > this is a rather large piece of work so should only be done if it > turns out to be entirely necessary, and, furthermore, evaluated against > whether the existing HDL can be adopted to avoid LHS AST assignments, > temporarily. there are a number of known uses of Cat for example I'm expecting that we will want to support a[3:5].eq(...) since that's very common iirc. Assigning to Cat is very uncommon (I don't think we ever do it, but I could be wrong .. maybe as part of assigning to a Record?)
(In reply to Luke Kenneth Casson Leighton from comment #0) > submodule designs for PartitionedCat, PartitionedRepl, PartitionedAssign > etc. which may only cope at present with the AST Construct being on > the rhs: We won't have to worry about Repl, since nmigen only supports assigning to Signals and Slice, Cat, and Part: https://github.com/nmigen/nmigen/blob/177f1b2e40694da473fdb0d95f4f6b33f5ea12ab/nmigen/back/rtlil.py#L631
ngggh i have this horrible feeling that the solution here, if the problem is demonstrated to exist, involves a "collation" (AST temporary storage) phase, a suite of mirror Cat/Part/Repl etc classes which keep track of their Values, such that Assign may, once it is used, explicitly determine what is LHS and which is RHS and *only then* perform the required Partition submodule allocation. given that the existing PartitionedCat and Repl etc. appear to be RHS-only they need to be complemented with corresponding LHS classes, whereupon Assign has what it needs. drat. solvable but complicated.
(In reply to Jacob Lifshay from comment #2) > (In reply to Luke Kenneth Casson Leighton from comment #1) > > thus, we conclude that for "lhs" mode there must be a *list of outputs* > > rather than as is the case currently only a list of *inputs*. > > Yup, having a list of outputs is exactly what I did in > SwizzledSimdValue...That list (actually a dict from Signals to be assigned > to to the computed values to assign to those Signals) is > AssignSwizzle.outputs. have you written the unit test first that confirms the work is necessary? (also we do not chuck away existing code without serious justification, we morph existing code with incremental changes) > > determining which mode to use, unfortunately, cannot be determined until > > the actual Assignment takes place! > > > > therefore, Assignment must actually perform an analysis of its lhs and > > rhs arguments and set which mode (lhs or rhs) they are set to! > > Well, actually, the exact same object can be used as both a lhs and a rhs: > a = Signal(16) > s = a[5] > b = Signal() > m.d.comb += [b.eq(s), s.eq(1)] if i am following correctly this assumes throwing away existing tested code and replacing it with unauthorised code which was not discussed whether it was needed, nor its design aspecymts discussed, before being written? (if so this is a habit that you really, really need to break) > I'm expecting that we will want to support a[3:5].eq(...) since that's very > common iirc. the alternative which is a nuisance is to do Cat() construction darn it that is LHS too. > Assigning to Cat is very uncommon (I don't think we ever do it, > but I could be wrong .. maybe as part of assigning to a Record?) i use it a lot. the PriorityPicker, some of the ALUs i think. it is an extremely valuable technique that dramatically clarifies graphviz diagrams. there will not be that many though. oh wait... the entirety of RecordObject, the Stage API... although perhaps this is outside of the ALUs or there is a workaround. good about Repl, of course makes no sense to assign LHS multiple times.
ok Grant application nearly over, can semi-focus on this. strategy / incremental workplan to go into top comment. regarding unit tests: in Test-Driven Development the unit tests are written *before* the code (!) which is done entirely as "mock" that is then substituted one at a time. it is perfectly fine to commit unit tests that FAIL, that is their purpose, to keep track. edit: updated top comment, it is a big damn workplan but quite straighforward, should be done in about 20 separate commits. i do not believe it necessary to do "full AST walk" of the AST looking for things like: comb += Cat(Cat(a, b), c, z[3:5]).eq(d) the "mirror" (temporary) AST classes can therefore take arguments that are expected to have only one level of substitution. PartitionedAssign therefore need only look at the type (isinstance) and go, "oh, this a Cat mirror, i am the LHS of the Assign, i need to allocate a LHS PartitionedCat" no more complex than that. but the workplan *looks* complex because it is designed to ensure that at no time does any existing code fail existing unit tests.
(In reply to Luke Kenneth Casson Leighton from comment #6) > i do not believe it necessary to do "full AST walk" of the AST looking > for things like: > > comb += Cat(Cat(a, b), c, z[3:5]).eq(d) more to the point: attempts to do exactly that will result in calls inside ast.py attempting to perform Value.cast() on objects that do not derive from Value/UserValue. unless the temporary classes do in fact do exactly that, in which case there is not much point having them: the existing submodules might as well just be created. however... the tricky bit there is making sure that the missing information (LHS or RHS) is propagated to the submodule *BEFORE* the submodule's elaborate() function is called.
(In reply to Luke Kenneth Casson Leighton from comment #0) > Strategy / Incremental work plan: reordering a bit for clarity: > 2) incremental step of adding "mirror (trmporary) storage classes" SwizzledSimdValue *is* that temporary storage class, it is just a little different than what you envisioned here: * use as rhs: its way of knowing when it's being used as the rhs is by trapping other code's accesses of SwizzledSimdValue.sig, and lazily adding the submodule that computes .sig's value. This way, no other code needs to operate differently when using SwizzledSimdValue as a rhs, it looks like a completely normal SimdSignal. * use as lhs: it overrides __Assign__ so doesn't use PartitionedAssign, instead returning a list of nmigen Assigns, therefore PartitionedAssign doesn't need to be aware of SwizzledSimdValue at all. > > 1) Unit test showing and confirming need for this work TODO > a) first prototype > b) Cat TODO > c) Part TODO iirc we weren't planning on supporting Part(...).eq(...), since, like Array, it's very complex and not super necessary. SwizzledSimdValue doesn't implement Part's functionality, because, if it did, it would basically devolve to a full general turing-complete logic network, rather than a simple single-layer bit-swizzle (different swizzle for each elwid). If we need Part, we can have a rhs-only class (like PartitionedDynamicShift), which should be good enough. lhs Part is hard, as evidenced by nmigen's rtlil backend having to raise an exception to convert each Part to a top-level rtlil switch statement: raise here: https://github.com/nmigen/nmigen/blob/177f1b2e40694da473fdb0d95f4f6b33f5ea12ab/nmigen/back/rtlil.py#L677 conversion to switch here: https://github.com/nmigen/nmigen/blob/177f1b2e40694da473fdb0d95f4f6b33f5ea12ab/nmigen/back/rtlil.py#L785 > d) Mux? TODO Mux(...).eq(...) doesn't work in nmigen, so we can remove that one from the list: Traceback (most recent call last): File "/data/data/com.termux/files/home/nmigen-eq-test.py", line 54, in <module> m.d.comb += Mux(a, b, c).eq(d) File "/data/data/com.termux/files/home/nmigen/nmigen/hdl/dsl.py", line 38, in __iadd__ ...... raise TypeError("Value {!r} cannot be used in assignments".format(self)) TypeError: Value (m (sig a) (sig b) (sig c)) cannot be used in assignments > e) Slice TODO > 3) Add recognition (isinstance) for mirrors in PartitionedAssign TODO unnecessary, as outlined above > 4) swap over to mirror classes in partsig.py one at a time > a) Cat TODO > b) Part TODO > c) Mux? TODO > d) Slice TODO > 5) create entirely new LHS Partitioned classes (or adapt existing ones) We won't actually need any new classes, since SwizzledSimdValue is general enough that Cat, Slice, and Repl *could* be implemented entirely (for both lhs and rhs) by having __Cat__, __Slice__, etc. just construct a SwizzledSimdValue instance: kinda like (SimdSignal's method): def __Cat__(self, *others): # convert to SwizzledSimdValue with same value retval = SwizzledSimdValue.from_simd_signal(self) # basically a wrapper around elwid, SwizzleKey can probably # be fully replaced with PartType once it grows more methods kinda_elwid = retval.swizzle_key # we construct the return value by Cat-ting SimdSignals one at a time, # this has no cost in complexity of the produced gates, since it # turns the whole sequence of Simd Cats into one layer of # non-simd muxes kinda like: Mux(elwid, Cat(bits...), Cat(...)) for other in others: # convert from Value/SimdSignal to SwizzledSimdValue with # same value, splatting if needed other = SwizzledSimdValue.from_value(kinda_elwid, other) lhs_layout = get_layout(retval) rhs_layout = get_layout(other) layout = compute_cat_layout(retval, other) # dict of swizzles for each kinda_elwid value swizzles = {} for k in kinda_elwid.possible_values: # make `width` bit wide Swizzle initialized to const zeros swizzle = Swizzle.from_const(0, layout.width) # get lhs swizzle lhs = retval.swizzles[k] # get rhs swizzle rhs = other.swizzles[k] for lhs_el, rhs_el, el in zip(lhs_layout.elements(k), rhs_layout.elements(k), layout.elements(k)): # Cat the two lists of bits for this element, and # assign to slice of target Swizzle.bits. bits = lhs.bits[lhs_el.slice] + rhs.bits[rhs_el.slice] swizzle.bits[el.slice] = bits # above should probably be refactored into Swizzle.cat # and Swizzle.__setitem__ methods swizzles[k] = swizzle retval = SwizzledSimdValue(kinda_elwid, swizzles) return retval > 6) update PartitionedAssign to cope with LHS TODO not necessary, as explained above > 7) update partsig if necessary to new PartitionedAssign TODO not necessary, as explained above > 8) run LHS unit tests, confirm functional TODO we retain step 8. > note that the "mirror" classes for Cat, Part etc do not actually > do anything, they act as simply temporary storage for the arguments > passed to ast.Cat() etc pending encountering an Assign, at which > point and ONLY at which point the missing information (LHS or > RHS) is apparent. That works great for LHS, however, for RHS use, you need the actual bits of a Cat, Slice, etc. SimdSignal well before encountering a .eq() call. SwizzledSimdValue handles that by trapping self.sig and lazily computing its value if needed.
(In reply to Jacob Lifshay from comment #8) > iirc we weren't planning on supporting Part(...).eq(...), since, like Array, > it's very complex and not super necessary. SwizzledSimdValue doesn't > implement Part's functionality, because, if it did, it would basically > devolve to a full general turing-complete logic network, rather than a > simple single-layer bit-swizzle (different swizzle for each elwid). > > If we need Part, we can have a rhs-only class (like > PartitionedDynamicShift), which should be good enough. To be clear, Part could be used on a SwizzledSimdValue (by treating it as a generic SimdSignal), it just can't be done without a separate class like PartitionedDynamicShift, unlike Cat and Slice. The above only applies to dynamic Parts; static Parts can be easily done by just calling Slice, which gains all the SwizzledSimdValue benefits.
(In reply to Jacob Lifshay from comment #8) > (In reply to Luke Kenneth Casson Leighton from comment #0) > > Strategy / Incremental work plan: > > reordering a bit for clarity: > > > 2) incremental step of adding "mirror (trmporary) storage classes" > > SwizzledSimdValue *is* that temporary storage class, it is just a little > different than what you envisioned here: unfortunately, it is much more than that. it does that job *and* replaces other code which has unit tests done *and* is a regression on capability of existing code *and* is not a drop-in replacement for existing code. it is also an optimisation and it goes too far. we absolutely have to be extremely strict about this and keep to an incremental development strategy at all times. if that class is to be used it needs to be scheduled as its own incremental update with full unit tests and full functionality and redesigned as a dropin replacement. given that the existing classes are functional this firmly places it into "not essential right now and in fact detrimental to urgent timescales" territory. which has been discussed and explained. > * use as rhs: its way of knowing when it's being used as the rhs is by > trapping other code's accesses of SwizzledSimdValue.sig, and lazily adding > the submodule that computes .sig's value. This way, no other code needs to > operate differently when using SwizzledSimdValue as a rhs, it looks like a > completely normal SimdSignal. > * use as lhs: it overrides __Assign__ so doesn't use PartitionedAssign, > instead returning a list of nmigen Assigns, therefore PartitionedAssign > doesn't need to be aware of SwizzledSimdValue at all. if you had separated that out as an incremental update it could have been considered. > > > > > 1) Unit test showing and confirming need for this work TODO > > a) first prototype > > b) Cat TODO > > c) Part TODO > > iirc we weren't planning on supporting Part(...).eq(...), since, like Array, > it's very complex and not super necessary. i actually use that in dcache.py (!) but that is not ALU related (thank goodness). i realised retrospectively that i had looked at Value.__getitem__ and because of its int/Const checking assumed that Part offset must also be int/Const. again retrospectively i realised that Slice is const but Part is not, so yes, Part is firmly off the table. RHS style SimdSignal.__Part__ can be implemented in terms of a straight shift and then slice. it is about 3 lines of code due to SimdSignal shift being fully functional already. depends on RHS SimdSignal.__Slice__ being done first though. discuss in bug #716. > If we need Part, we can have a rhs-only class (like > PartitionedDynamicShift), which should be good enough. yes. 3 lines. > lhs Part is hard, yep, let's skip it. > > d) Mux? TODO > > Mux(...).eq(...) doesn't work in nmigen, so we can remove that one from the > list: gone. > > e) Slice TODO > > 3) Add recognition (isinstance) for mirrors in PartitionedAssign TODO > > unnecessary, as outlined above version2. we are not doing version2. or any kind of premature optimisation. sorry. this is basic Project Management. > > 4) swap over to mirror classes in partsig.py one at a time > > a) Cat TODO > > b) Part TODO > > c) Mux? TODO > > d) Slice TODO > > 5) create entirely new LHS Partitioned classes (or adapt existing ones) > > We won't actually need any new classes, since SwizzledSimdValue is general > enough that Cat, Slice, and Repl *could* be implemented entirely again: sorry. basic Stable Project Management rules apply. SwizzledSimdValue is too far away from existing code. it is version2 and we have made it clear that version 2 is off the table until version 1 is complete. > > note that the "mirror" classes for Cat, Part etc do not actually > > do anything, they act as simply temporary storage for the arguments > > passed to ast.Cat() etc pending encountering an Assign, at which > > point and ONLY at which point the missing information (LHS or > > RHS) is apparent. > > That works great for LHS, however, for RHS use, you need the actual bits of > a Cat, Slice, etc. SimdSignal well before encountering a .eq() call. > SwizzledSimdValue handles that by trapping self.sig and lazily computing its > value if needed. if you had followed incremental development practices essential for a project of this size and separated out the concepts i would be able to follow this. not that it needs explanation because the class cannot be used because it violates Project Devevelopment Practices, i don't understand or follow how a simple substitution of classes which defer creation of classes that use the parameters could cause RHS to stop working.
(In reply to Jacob Lifshay from comment #9) > To be clear, Part could be used on a SwizzledSimdValue (by treating it as a > generic SimdSignal), it just can't be done without a separate class like > PartitionedDynamicShift, unlike Cat and Slice. leaving that Swizzled cannot be used because it violates Project Devlopment Practices to adopt, i've taken out Part entirely from the incremental work plan, leaving just Cat and Slice. i will go ahead with an exploration of Cat, if you can do Slice as in bug #716 then it can catch up. if i get into trouble with the incremental work plan i will stop immediately and report back the results.
jacob: you damaged the code by adding in a NotImplemented exception behind SimdSignal.__Slice__ it is clear that you had not run the unit tests because if you had, you would have found the error that i just encountered, and fixed in both PartitionedAssign and PartitionedRepl. diff --git a/src/ieee754/part_repl/repl.py b/src/ieee754/part_repl/repl.py index 7347648..05db372 100644 --- a/src/ieee754/part_repl/repl.py +++ b/src/ieee754/part_repl/repl.py @@ -73,7 +73,8 @@ class PartitionedRepl(Elaboratable): start = keys[upto] end = keys[upto+numparts] print ("start end", start, end, len(x)) - return x[start:end] + # access the underlying Signal of SimdSignal directly + return x.sig[start:end] def elaborate(self, platform): m = Module() please ensure that you follow Project Development Practices by always identifying and running relevant unit tests. please do not assume "the code is unused therefore it doesnt' matter" (unless of course it is *actually* not used - including not having a unit test at all because it's code that's just been added - in which case Project Development Practices state clearly that it's ok to change code that's not in use). in this case, however, given the extensive unit tests, how absolutely crucial this is, how it is critical path and how much effort has gone into keeping this code stable (6 months and climbing) it does *not* qualify as "unused", and *requires* running unit tests. test_partsig.py at the bare minimum. visual inspection of each sub-classes output, and running them to ensure no syntax or runtime errors is a secondary (but high) priority.
i added a very simple unit test: m.d.comb += o.eq(Cat(a, b)) and some quick investigation "debug prints": def __Cat__(self, *args, src_loc_at=0): + print ("partsig cat", self, args) def __Assign__(self, val, *, src_loc_at=0): + print ("partsig assign", self, val) @@ -75,6 +75,7 @@ class PartitionedAssign(Elaboratable): def elaborate(self, platform): + print ("PartitionedAssign start") ... ... + print ("PartitionedAssign end") etc. which reveals: partsig cat <ieee754.part.partsig.SimdSignal object at 0x7f0e88f28400> (<ieee754.part.partsig.SimdSignal object at 0x7f0e88e697b8>,) partsig assign <ieee754.part.partsig.SimdSignal object at 0x7f0e88e76128> <ieee754.part.partsig.SimdSignal object at 0x7f0e88e76470> PartitionedCat start PartitionedCat end PartitionedAssign start PartitionedAssign end which is kinda as expected (but a relief to confirm) that elaborate() is called **AFTER** submodule instantiation. therefore, the opportunity to call a function "set_lhs()" exists inside SimdSignal.__Assign__(), performing intrusive "tree-walking" or basic one-level inspection of the contents of its inputs, and adding the necessary parameter which alters the behaviour from (default RHS) to LHS. i have absolutely no problem at all with a truly dreadful hack of adding in __something variables by PCat (and PSlice) which tell the output that it came from a submodule. def PCat(m, arglist, ctx): from ieee754.part_cat.cat import PartitionedCat # avoid recursive import global modcount modcount += 1 pc = PartitionedCat(arglist, ctx) setattr(m.submodules, "pcat%d" % modcount, pc) pc.output.__trulyterriblehack = pc <----- hack here return pc.output thus, in SimdSignal.__Assign__ it may go, "oh, does lhs have a member variable __trulyterriblehack? so let's call lhs.__trulyterriblehack.set_lhs()" the behaviour of PartitionedCat may therefore SWITCH to one of LHS rather than RHS. also, *after* the set_lhs() function is called, lengths of signals at that time may be adapted / adjusted, such that PartitionedAssign can ascertain the lengths of the sub-signals, solving what i *think* you might have been referring to in comment #9 this should not cause any "damage" and is a minimal incremental change satisfying the Project Development Practices. that it is absolutely truly dreadful and will make anyone who sees it scream if their eyes do not melt first is completely irrelevant :) hack first [and for goodness sake document it], clean up later. commit 994d0c3f3d549b1aaa71f64277ed589098c15405 (HEAD -> master, origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Thu Oct 21 13:40:29 2021 +0100 add quick print statements to show that elaborate() gets called as a second phase after the creation of the AST tree this gives a window of opportunity to tree-walk and set whether SimdSignals are LHS or RHS as determined by encountering SimdSignal.__Assign__
this "works" and does not cause "damage". i did however have to fix one of the unit tests where an assumption had been made ("ok to copy output of SimdSignal Cat onto a Signal") which does "work" (due to UserValue downcasting). https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=9a9db43f4cecf0a43e1390a4fb8fd6746776f433 https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=ad925fc12563d9097dd1b93df0e0f3dc033b00ad commit ad925fc12563d9097dd1b93df0e0f3dc033b00ad (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Thu Oct 21 15:04:33 2021 +0100 continue truly awful hack which, in SimdSignal.__Assign__, detects the back-link to the submodule (PartitionedCat) in its return result, and calls set_lhs_mode(True) or (False) on LHS and RHS as appropriate. the default value is *NOT* set in the PartitionedCat constructor very very deliberately so as to show up any bugs. it is particularly fortunate that this was chosen to be done because there was, in fact, a bug in the TestCatMod unit test, which assumed that it was ok to splat a Cat() result of a pair of SimdSignals directly onto a Signal(). it *is* in fact "technically allowed" by nmigen due to automatic casting of UserValue, but should not strictly have been done.
confirmed: this is what happens: /home/lkcl/src/libresoc/ieee754fpu/src/ieee754/part/partsig.py:117: DriverConflict: Signal '(sig sig)' is driven from multiple fragments: top, top.pcat2; hierarchy will be flattened self.sig = Signal(*args, **kwargs) lkcl@fizzy:~/src/libresoc/ieee754fpu/src$ git diff --- a/src/ieee754/part/test/minitest_partsig.py +++ b/src/ieee754/part/test/minitest_partsig.py @@ -22,12 +22,19 @@ if __name__ == "__main__": + # RHS Cat m.d.comb += o.eq(Cat(a, b)) + # LHS Cat + m.d.comb += Cat(a1, b1).eq(o) commit ded76e2fe0d749ce1ef2930e21d7c6d74dfa3696 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Thu Oct 21 16:50:01 2021 +0100 confirmed (in prototype form) that LHS Cat will cause conflict https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=ded76e2fe0d749ce1ef2930e21d7c6d74dfa3696
wow. just... wow. it actually frickin worked. thiis was all it took (after having the "hack" set is_lhs). +++ b/src/ieee754/part_cat/cat.py @@ -120,7 +122,10 @@ class PartitionedCat(Elaboratable): output.append(thing) with m.Case(pbit): # direct access to the underlying Signal - comb += self.output.sig.eq(Cat(*output)) + if self.is_lhs: + comb += Cat(*output).eq(self.output.sig) # LHS mode + else: + comb += self.output.sig.eq(Cat(*output)) # RHS mode print ("PartitionedCat end") return m commit bc4f03efdc4ae932f2650bec0807070398178aa6 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Thu Oct 21 16:56:51 2021 +0100 add LHS support into PartitionedCat. amazingly - stunningly - it works https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=bc4f03efdc4ae932f2650bec0807070398178aa6
so, jacob, to recap there: * 1 line of code: a hack backlink was made in the PCat function which allowed the signal containing the result to know the submodule that *produced* that result * 4 lines of code: Partitioned Assign detected that hack and set a flag LHS/RHS * 3 lines of code: ParttitionedCat based on that flag performs a LHS Cat instead of a RHS Cat. eight - EIGHT - lines of code (not 400 replacing and regressing existing classes and setting us back over a week) excluding explanation, exploration, unit tests and comments, all of which far exceed the additions themselves. it doesn't have to be pretty, it doesn't need to fulfil future needs, it has to do what is immediately needed in the shortest timescale with the least fuss and in a way that at no time destroys confidence in or undermines preexisting code. turns out that PAssign does not need to worry about lengths, i have no idea why but am not in the least bit going to be concerned about it. i suspect that on investigation it will turn out that nmigen does deal with two assigns somehow, by identifying the in/out direction.
Does it pass: a = SimdSignal(...) b = SimdSignal(...) c = SimdSignal(...) d = SimdSignal(...) cat = Cat(a, b) m.d.comb += cat.eq(c) m.d.comb += d.eq(cat) I expect not, since you fell into the trap that each PartitionedCat would only ever be used on the lhs or the rhs, not both. SwizzledSimdValue correctly handles this. Also, does it handle: a = SimdSignal(...) b = SimdSignal(...) c = SimdSignal(...) d = SimdSignal(...) cat = Cat(a, b) m.d.comb += Cat(cat, c).eq(d) # two stages of Cat I also expect not, SwizzledSimdValue also correctly handles this case, because it is designed to handle assignments through any number of Cat and Slice calls.
(In reply to Luke Kenneth Casson Leighton from comment #12) > jacob: you damaged the code by adding in a NotImplemented exception > behind SimdSignal.__Slice__ That's because, at the time, nmigen didn't call __Slice__ for __getitem__ (or so I thought). Apparently I was on an old checkout of nmigen, before: https://git.libre-soc.org/?p=nmigen.git;a=commit;h=ccfd17f885a34062bbe2bd155a4a71690f5827c6
(In reply to Luke Kenneth Casson Leighton from comment #10) > (In reply to Jacob Lifshay from comment #8) > > (In reply to Luke Kenneth Casson Leighton from comment #0) > > > Strategy / Incremental work plan: > > > > reordering a bit for clarity: > > > > > 2) incremental step of adding "mirror (trmporary) storage classes" > > > > SwizzledSimdValue *is* that temporary storage class, it is just a little > > different than what you envisioned here: > > unfortunately, it is much more than that. it does that job *and* replaces > other code which has unit tests done Those unit tests can be reused... > *and* is a regression on capability > of existing code how so? What can existing code do that SwizzledSimdValue can't (after Cat and Slice are implemented in terms of it and it's todos are done)? As far as I can tell, SwizzledSimdValue can do everything existing code can, and more. > *and* is not a drop-in replacement for existing code. I'm justifying that because I reasoned that the existing Cat code was insufficient for LHS use, and because combining Cat and Slice (and possibly Repl) all into one general class greatly simplifies the logic, as well as generates far more optimal code (a requirement if we want our cpu to actually be low-area/low-power when using simd alus; I don't want us to have to discard all of SimdSignal as waay too inefficient). For things like Cat(a[30:50][2:10:2], b, Repl(c[23], 5)): SimdSignal will generate 8(!) layers of muxes (on the order of 2-4000 gates!), whereas SwizzledSimdValue will generate only 1 layer (since that's what it always generates) (I'd guess at most 500 gates, for a 64-bit output SimdSignal with 4 possible elwid values -- generating a single 4-in 64-bit mux). > > it is also an optimisation and it goes too far. > > we absolutely have to be extremely strict about this and keep to > an incremental development strategy at all times. > > if that class is to be used it needs to be scheduled as its own > incremental update with full unit tests and full functionality and > redesigned as a dropin replacement. > > given that the existing classes are functional Which they're not, Cat doesn't work for nested use on lhs, and doesn't work when the same Cat instance is used both on lhs and rhs, Slice isn't implemented at all. All of those can be quickly implemented in terms of SwizzledSimdValue (I already posted an implementation of Cat previously that handles *all* of those requirements by virtue of letting *already implemented* SwizzledSimdValue do all the heavy lifting). Here's __Slice__ (SimdSignal method): def __Slice__(self, start, stop): # convert to SwizzledSimdValue with same value lhs_v = SwizzledSimdValue.from_simd_signal(self) # basically a wrapper around elwid, SwizzleKey can probably # be fully replaced with PartType once it grows more methods kinda_elwid = lhs_v.swizzle_key lhs_layout = get_layout(lhs_v) layout = compute_slice_layout(lhs_v, start, stop) # dict of swizzles for each kinda_elwid value swizzles = {} for k in kinda_elwid.possible_values: # make `width` bit wide Swizzle initialized to const zeros swizzle = Swizzle.from_const(0, layout.width) # get lhs swizzle lhs = lhs_v.swizzles[k] for lhs_el, el in zip(lhs_layout.elements(k), layout.elements(k)): # Slice the list of bits for this element, and # assign to slice of target Swizzle.bits. bits = lhs.bits[lhs_el.slice][start:stop] swizzle.bits[el.slice] = bits # above should probably be refactored into using # Swizzle.__setitem__ method swizzles[k] = swizzle return SwizzledSimdValue(kinda_elwid, swizzles) > this firmly places > it into "not essential right now and in fact detrimental to urgent > timescales" territory. The above firmly places it into *faster* to working code territory, cuz I already wrote basically all the code. > which has been discussed and explained. > > > > * use as rhs: its way of knowing when it's being used as the rhs is by > > trapping other code's accesses of SwizzledSimdValue.sig, and lazily adding > > the submodule that computes .sig's value. This way, no other code needs to > > operate differently when using SwizzledSimdValue as a rhs, it looks like a > > completely normal SimdSignal. > > * use as lhs: it overrides __Assign__ so doesn't use PartitionedAssign, > > instead returning a list of nmigen Assigns, therefore PartitionedAssign > > doesn't need to be aware of SwizzledSimdValue at all. > > if you had separated that out as an incremental update it could have > been considered. Well, luke, you have to realize that not all things can be implemented incrementally, and that forcing them to occur incrementally when not suited to that will take waay longer and the diffs will not actually make much sense.
(In reply to Jacob Lifshay from comment #18) > Does it pass: > > a = SimdSignal(...) > b = SimdSignal(...) > c = SimdSignal(...) > d = SimdSignal(...) > > cat = Cat(a, b) > > m.d.comb += cat.eq(c) > m.d.comb += d.eq(cat) no, because there aren't any locations where that's used (because the practice of multiple-use of expensive AST expressions as python variables i learned very early on results in duplicated HDL) > I expect not, since you fell into the trap that each PartitionedCat would > only ever be used on the lhs or the rhs, not both. ... not a trap per se: i entirely avoid the practice throughout the entirety of the code and will, if i see it being used, request that it be fixed as a severe bug (or be commented in exceptional circumstances) if it turns out to be necessary, there is a simple fix: identify whether set_lhs_mode has been called or not, if it's an incompatible {LR}HS, and if so, duplicate the expression prior to using it. this is about 6-8 lines of code, not 400. > Also, does it handle: > a = SimdSignal(...) > b = SimdSignal(...) > c = SimdSignal(...) > d = SimdSignal(...) > > cat = Cat(a, b) > > m.d.comb += Cat(cat, c).eq(d) # two stages of Cat no, and i explained already that this is, again, something that is not used (double-cat), because, again, i learned very quickly (2 years ago) that using python assignments is not a good idea. therefore i know that there's no code that does this, therefore it is not needed right now, therefore there is no point worrying about it and certainly not getting into discussions that delay progress more than progress has already been delayed by trying to work in version 2 code and concepts into version 1. > I also expect not, SwizzledSimdValue also correctly handles this case, > because it is designed to handle assignments through any number of Cat and > Slice calls. that's great - if it were needed for version 1.0 which is the "pragmatic get-it-done-as-quickly-as-possible and learn from the exercise" version. i know what you're doing. you're solving problems ahead-of-time. this is what i used to do. you feel, "this person isn't listening to what i'm saying therefore i'm just going to go ahead and do it". the problem with that approach is that things emerge retrospectively. i didn't know that you'd designed something that solved issues i hadn't even thought about or understood, yet. an incremental approach on the other hand allows me to understand what you are trying to communicate, in terms of what i already understand. the two cases you've noted - if in the extremely unlikely event that they are needed - can be fixed by adding the capability to the existing code.
(In reply to Jacob Lifshay from comment #20) > how so? What can existing code do that SwizzledSimdValue can't (after Cat > and Slice are implemented in terms of it and it's todos are done)? it doesn't use PartType. > I'm justifying that because I reasoned that the existing Cat code was > insufficient for LHS use, because you didn't trust the incremental approach that i described is essential for projects of this size and complexity. > and because combining Cat and Slice (and possibly > Repl) all into one general class greatly simplifies the logic, but abandons existing work which you resisted doing so i had to do it. if you had done it when i asked instead of resisting, then you would have been the one to make the decisions. instead, because you kept resisting and wanting to design a version 2.0 before version 1.0 had been completed, you didn't help write PartitionedCat etc. which then forced me into a position of doing it myself. once they're done, they're done. those then become the code that needs to be incrementally morphed. > as well as > generates far more optimal code (a requirement if we want our cpu to > actually be low-area/low-power when using simd alus; this is a good justification: it's also a version 2.0 justification. we are *seriously behind* jacob. how many times do i have to point this out. a version 1.0 is absolutely absolutely critical to moving the ALUs forward *regardless* of how inefficient version 1.0 is. please get this through to your understanding that this is urgent, urgent, urgent. > I don't want us to have > to discard all of SimdSignal as waay too inefficient). > For things like Cat(a[30:50][2:10:2], b, Repl(c[23], 5)): > SimdSignal will generate 8(!) layers of muxes (on the order of 2-4000 > gates!), whereas SwizzledSimdValue will generate only 1 layer (since that's > what it always generates) (I'd guess at most 500 gates, for a 64-bit output > SimdSignal with 4 possible elwid values -- generating a single 4-in 64-bit > mux). that's great! [for version 2.0 and/or for a drop-in replacement along the way once we have multiple people working on ALU conversions and writing new ones] how many times do i have to repeat that this is urgent, urgent, urgent. discussing this multiple times is actually actively contributing to those very delays that i am trying to get you to stop creating. you are *aggravating* the project, not *assisting* the project right now by denying taking on board this lesson. > > it is also an optimisation and it goes too far. > > > > we absolutely have to be extremely strict about this and keep to > > an incremental development strategy at all times. > > > > if that class is to be used it needs to be scheduled as its own > > incremental update with full unit tests and full functionality and > > redesigned as a dropin replacement. > > > > given that the existing classes are functional > > Which they're not, Cat doesn't work for nested use on lhs, don't need it. > and doesn't work when the same Cat instance is used both on lhs and rhs, don't need it. > Slice isn't implemented at all. only because you haven't done what you've been asked to do! > All of those can be quickly implemented in terms of > SwizzledSimdValue (I already posted an implementation of Cat previously that > handles *all* of those requirements by virtue of letting *already > implemented* SwizzledSimdValue do all the heavy lifting). this can be done *AFTER* the urgent urgent urgent job of finishing all of the AST constructs has been we are holding up *three people* from helping on parallel tasks of converting ALUs right now. a "totally crap from your perspective" version 1 would allow unblocking of that critical path. *please listen* and help out you keep not listening and keep fighting this and not learning this extremely important Project Management lesson. we are on *critical path*. please listen. please do the tasks that are needed to get off of critical path as quickly as possible. please listen. > The above firmly places it into *faster* to working code territory, cuz I > already wrote basically all the code. which i have already told you goes *backwards* because it is not compatible with the existing API and functionality. which you haven't listened to me about because you've been trying to focus everything on version 2.0. please for goodness sake listen. > Well, luke, you have to realize that not all things can be implemented > incrementally, and that forcing them to occur incrementally when not suited > to that will take waay longer and the diffs will not actually make much > sense. this is a skill that i am attempting to teach you. it is absolutely essential that you learn it for a project of this magnitude and complexity. so please. will you please. follow the path that i've outlined. we are running out of time. we are holding up multiple people and multiple tasks.
(In reply to Jacob Lifshay from comment #20) > For things like Cat(a[30:50][2:10:2], b, Repl(c[23], 5)): > SimdSignal will generate 8(!) layers of muxes (on the order of 2-4000 > gates!), whereas SwizzledSimdValue will generate only 1 layer (since that's > what it always generates) ok. so - finally - it emerges what the purpose (and value) of SwizzledSimdValue is. anyone else who did not also have Asperger's would say something stupid (like they did to me, and i found it terribly discouraging), along the lines of, "well why the bloody hell didn't you explain this initially??" the idea - complete idea - sort-of "emerged" in your head, fully-formed, solving multiple problems at once, and the full list of things it solved were so overwhelming that you *couldn't* explain them all at once, could you? (i cannot count the number of times over the past 30 years that this happened to me. you are extremely lucky that i recognise it). so. what i learned to do was to say this: "i have a solution, but it solves so many things that i can't explain all of them immediately to you. even if i write this in 4 hours flat it will take about 2 days to 3 weeks for me to even grasp fully what *i* have solved in a way that allows me to explain it. i need to put together a proof-of-concept, which can be analysed retrospectively: is that ok?" because i recognise this syndrome ("ideas emerging fully-formed") i have no problem at all with you saying that. [corollary: where i _do_ have a problem is if you go ahead *without* saying that, because it severely disrupts planning and task prioritisation]. i will put together a new bugreport, and what i will do - and i need you to understand and accept this - is to put together a series of tasks outlining necessary test suites and other blocking tasks that it is *essential* that are completed first because SwizzledSimdValue is categorised as an "optimisation" not an "essential", where the existing classes "already do the job" and allow SimdSignal to be signed off and thus move off of critical path. those test suites - all of which can be added right now - do not have to pass with version 1 code: they do have to be written though. those unit tests will be essential to proving that the ALU code is not going to be disrupted by a drop-in replacement in the form of SwizzledSimdValue. in particular, the unit tests must include the 2 different PartTypes (one "existing" PartitionedSignal behaviour, covering all permutations, one "elwidth" behaviour. i expect SwizzledSimdValue to pass 100% all unit tests with *both* PartTypes, and that means that the unit tests need to be augmented - beforehand - to show that *existing* SimdSignal with both types of behaviour - also work 100%. i also have no problem at all with you putting in "this is planned to be replaced with version 2 code, see bugreport XYZ" *as long as you actually help do the version 1 code*. (edit: done. bug #732)
i just realised: when set_lhs_mode() is called, the callee may likewise do exactly the sane thing, with its parameters.