PartitionedSignal is a dynamic SIMD version of Signal. it needs to work with the following constructs: * m.If / Elif / Else * m.Switch / Case without PartitionedSignal, SIMD has to be done as follows: with m.If(partition == 64): operations treating Signals at full 64 bit with m.Elif(partition == 2x32bit): for i in range(2): exactly the same code as 64 bit except now it is repeated twice, first on the low 32 bits and then on the hi 32 and that repetition continues right the way down to 8 bit, complicating design massively. PartitionedSignal hides that entirely, as far as arithmetic and logic operations are concerned (__lt__, __or__, see bug #132) where things break down is this: ppts = PartitionPoints(64, 8) x = PartitionedSignal(64, partition=ppts) with m.If(x == 5): ... do something. the reason it breaks down is because PartitionedSignal.__eq__ does *not* return a single bit object, it returns a *dynamic multi-bit* object that dynamically reflects the current state of the partitioning. example: * partition points is set to 8 breaks * this subdivides the 64 bit signal into 8 separate 8 bit values * comparison against a constant (5) creates EIGHT separate 8-bit comparisons * those 8 comparisons are bundled together when the partitions are cleared to indicate that the 64 bits are to be treated as a single 64 bit value, the comparison "5" is done against the entire 64 bit, however the answer still goes into the same 8 bit object used when the partition was set to 8x8bit. the m.If construct cannot cope with this. the current workaround is to completely avoid using m.If entirely and to use a special PartitionedMux construct, PMux, instead. this however is less than ideal. similar logic applies to Switch/Case. Arrays need some discussion.
https://github.com/nmigen/nmigen/blob/59ef6e6a1c4e389a41148554f2dd492328820ecd/nmigen/hdl/dsl.py#L447 ah ha! this is the location where the code-fragments for m.If/Elif/Else are generated, and it should be really pretty straightforward to drop the appropriate replacement (parallel, SIMD) statements in at this location, with a replacement (override) of Module._pop_ctrl
more context: https://freenode.irclog.whitequark.org/nmigen/2020-12-06#28537093;
relevant page for describing the dynamic partitioned signal operators https://libre-soc.org/3d_gpu/architecture/dynamic_simd/
taking a look at this more closely, i realised that Value (and UserValue) will need some tweaks. at present, the code does this: class Value: def part(....) return Part(self, ...) where Part is a global function. in order to support override capability this needs to change to: class Value: def part(....) return ValuePart(self, ...) where Part is defined as: def Part(lhs, ....): return lhs.part(....) and the *current* Part function is *renamed* to ValuePart. UserValue then does: class UserValue: def part(....) return UserValuePart(self, ...) or more likely also calls ValuePart but it is clearly documented that user-derived overrides are perfectly well permitted. PartitionedSignal would do *exactly that*. then, a crucial addition to Value and UserValue would be: def mux(self, choice1, choice2): return ValueMux(self, choice1, choice2) and PartitionedSignal could override it to provide the correct dynamic SIMD functionality.
http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-September/003750.html carrying on from there, i am slowly recovering the analysis and state from many months back. line 437: if if_test is not None: if_test = Value.cast(if_test) if len(if_test) != 1: if_test = if_test.bool() that was what i was talking about might need a function "isconsideredbool" instead of len iftest !=1 line 447: self._statements.append(Switch(Cat(tests), cases, src_loc=src_loc, case_src_locs=dict(zip(cases, if_src_locs)))) the Switch there can be Value.__Switch__ and that function call standard Switch. the PartitionedSignal.__Switch__ variant instead does: for i, pbit in enumerate(self.partition_bits): totest = Part(test, i) ... and creates *multiple* switch/case statements, each of which takes a chunk at a time rather than does the entire lot as one hit. 8 paritions means *EIGHT* completely separate switch/cases, one for each partition. the reason why this works is because the HDL gates are going to be there anyway, and the rule for PartitionedBool is that ALL bits have to be set across the whole partition. therefore, each "column" in the partition receives a copy of the same "if" decision.
there's a couple of phases here: 1) redirection of Part (similar to operator.add(x, y) calling x.__add__(y) 2) redirection of Mux (same) 3) redirection of Switch (same) Part: a) existing Part renamed to InternalPart https://github.com/nmigen/nmigen/blob/59ef6e6a1c4e389a41148554f2dd492328820ecd/nmigen/hdl/ast.py#L785 any name will do. _InternalPart? _DefaultInternalPart? the idea here is *not* to disrupt the codebase, minimise changes, where actually moving the contents *of* Part into UserValue would flag that we are "making more changes than it actually is". if whitequark then asks for the merge, that's great. b) replacement for Part. real simple: def Part(lhs, *args): lhs.__part__(*args) this is exactly what is done in python operator module, follow the breadcrumbs.... :) c) add overrideable function to Value https://github.com/nmigen/nmigen/blob/59ef6e6a1c4e389a41148554f2dd492328820ecd/nmigen/hdl/ast.py#L133 a case can be made for calling it Value.__part__ because of the convention used by the python operator module. d) (later, TODO) write a PartitionedSignal.__part__() but this can be under a separate bugreport, separate budget. all of the others (Switch, Mux, Cat as well) can follow the same pattern. Cat() needs some special consideration. we have a nmigen repo copy, i suggest making a branch (sigh) and using that. means we also have to change all the documentation and scripts (argh) or (argh) work with a separate branch for anything depending on this... hmm, that might not be needed because the ieee754 PartitionedSignal class isn't used anywhere have to see how that goes.
separate comment about Cat() Cat() is a pain. what does it mean in a parallel / SIMD context to even attempt to concatenate a set of Signals or PartitionedSignals together? this is sufficiently complex i think it needs its own bugreport (TODO, edit... bug #707)
(In reply to Luke Kenneth Casson Leighton from comment #7) > this is sufficiently complex i think it needs its own bugreport > (TODO, edit... bug #707) worked it out. as long as only PartitionedSignals are concatenated it works perfectly fine, irrespective of the sizes of the PSes. if a Signal or Const is needed it is reasonable to require copying into a PartitionedSignal.
working through this, this afternoon (see main bug URL), although the parallel "columnisation" concept is fine, i.e. treating a parallelised SIMD Switch as a for-loop around "Lanes" of completely separate and independent Switch/Cases, one per partition, the *assignment* part of that - the action to be taken *in* each column - is more involved than i otiginslly envisaged, and that comes down not to the assignments themselves but to the fact that If and Switch statements *can be nested*. the original assumption i made was that the use of ast.Switch() by dsl.py would always create assignments within each case. this code: with m.If(iftest): comb += o.eq(c) with m.Elif(elsetest) comb += o eq(d) gets turned into: ast.Switch Cat(iftest, eliftest): case("1-"): comb += o.eq(c) case("-1"): comb += o.eq(d) which can be parallelised with a simple for-loop creating *multiple* Switch statements: for i in range(4partitions): ast.Switch Cat(iftest[i], eliftest[i]): case("1-"): comb += o[i].eq(c[i]) case("-1"): comb += o[i].eq(d[i]) the problem comes if there is a Switch inside a Switch. or, an m.If inside an m.If. i have no problem at all with limiting the nested depth to one and ONLY one in order to get this done as quickly as possible. this would involve analysing the first depth of all case statements, prior to creating the (independent) for-loops, checking that they are of type Assign, and creating a temporary Signal set (one for each assignment of each case) and, in each of the for-loop replace with *partial* assignment. o.eq(d) # d being potentially a complex expression o.eq(c) # likewise would be replaced with: tempin_d.eq(d) tempin_c.eq(c) o.eq(tempout) and the switch statement for-loop then morphed to: for i in range(4partitions): ast.Switch Cat(iftest[i], eliftest[i]): case("1-"): comb += tempout_o[i].eq(tempin_c[i]) case("-1"): comb += tempout_o[i].eq(tempin_d[i]) in and of itself this is a lot more involved that i first envisaged. hm. it involves analysing all Case statements LHS and RHS to create a global temporary substitution lookup table. also, anything such as "Cat(a,b,c).eq(.....)" will not be supported within an m.If or m.Switch, which is perfectly fine. this needs to be done as fast as possible, we are under time pressure now. complex analysis and complex "elegant, complete" solutions that take months to complete unfortunately are off the table. limited functionality is perfectly fine as a first pass, so we can get things moving, given that this is now on the critical path.
gaah. another idea, which may be simpler to implement and is less disruptive than temporary Signals *and* - maybe - has the potential to handle nested Switch/case/If: * override ast.Assign *as well* (PartitionedSignal.__Assign__, Value.__Assign__) * at every SIMD Switch perform a full tree-walk of all AST searching for Assign statements * append the test condition cases *onto the Assign* * merge (AND) all test conditions together * create a mini (leaf-node) Partitioned Switch with its own (private, leaf) for-loop, exactly as described in Comment #9 * on the (reasonable) assumption that a Partitioned Switch only contains Partitioned Signal Assignments, the base level (actual) Switch performs all its case statements unconditionally, on the basis that 100% of all switch/case Assignments are now at the leaf nodes. in other words, PartitionedSignal.__Assign__ becomes a private (leaf node) Switch-for-loop-merged-with-Assign the key strategic part is the merging (ANDing) together of all Switch conditions that were created by dsl.py when parsing the user's HDL. multiple nested Switches would have their conditions merged into a single point (a single switch). the actual Switch (back at the original PartitionedSignal.__Switch__ would *not* actually do anything at all except to output all of its case statements as unconditional statements, the assumption being that all Switching/Caseing has been moved entirely down to leaf-node level. what is a huge relief is that this would *still* not require a massive total replacement of dsl.py Module with a special variant suitable for PartitionedSignal only. it also has the advantage that, by merging of all tests, there is no combinatorial explosion caused by nested partitioned switch/case.
https://bugs.libre-soc.org/show_bug.cgi?id=709#c1 thoughts on the idea of a tree-walk down to leaf-node Assign statements: the PartitionedSignal.__Assign__ can be made *aware* of the Switch, and even help by having a hidden "Condition" Signal (which defaults to a Const of length equal to the number of partitions). instead of making a mess with temporary Signals as described at the end of Comment #9 the Switch hooks into the hidden "Condition" Signal on a per-partition basis. or, more to the point, PartitionedSignal.__Assign__ takes the list of switch/cases (which were put there as described in Comment #10), ANDs them together in the relevant partition, and sets the relevant Condition bit in the partition. by setting that Condition bit in each partition, it will either activate or prevent the actual assignment in that partition, but without making a mess. basically this is SIMD Predicate Masks, done in HDL.
fascinating and useful technical discussion with vup, with many thanks https://libera.irclog.whitequark.org/nmigen/2021-09-27
well, that could have gone better. turns out that whitequark has decided not to support concepts such as PartitionedSignal. this forces us into a position of making a hard fork of nmigen in order to support PartitionedSignal and other advanced concepts. we have no choice in the matter because we cannot further delay progress, Dynamic SIMD is now on the critical path. i have begun the hard fork branches and added prerequisite "merge" scripts that will bring in multiple (smaller) branches and allow progress to be made with minimal disruption and ongoing maintenance. https://git.libre-soc.org/?p=nmigen.git;a=summary
to explain the branches: * libresoc-nmigen-fork is a *target* branch, automatically updated by a script * libresoc-merger - the script for performing the merge is called ./branch_merger.sh and should be run for maintenance / releases / dev-releases / by developers. (TODO: update it as needed, to be more useful, including adding git pulls / git pushes) * libresoc-partsig - the branch related to PartitionedSignal. * display-patch - a useful patch for getting $display support in nmigen (which hasn't been added yet) the process is therefore: * git checkout libresoc-partsig * {do whatever} * {commit whatever} * git push libresoc which assumes this section in .git/config: [remote "libresoc"] url = gitolite3@git.libre-soc.org:nmigen.git fetch = +refs/heads/*:refs/remotes/origin/* * git checkout libresoc-merger ./branch_merger.sh * git push libresoc then back to libresoc-partsig to continue working.
(In reply to Luke Kenneth Casson Leighton from comment #13) > well, that could have gone better. turns out that whitequark has decided > not to support concepts such as PartitionedSignal. > > this forces us into a position of making a hard fork of nmigen in order to > support PartitionedSignal and other advanced concepts. well, if we make a wrapper class like: class PModule: def __init__(self, nmigen_module): self.nmigen_module = nmigen_module self.d = ... def If(self, cond): ... # ultimately call Module.If if needed ... we avoid needing a nmigen fork
(In reply to Jacob Lifshay from comment #15) > (In reply to Luke Kenneth Casson Leighton from comment #13) > > well, that could have gone better. turns out that whitequark has decided > > not to support concepts such as PartitionedSignal. > > > > this forces us into a position of making a hard fork of nmigen in order to > > support PartitionedSignal and other advanced concepts. > > well, if we make a wrapper class like: > class PModule: > def __init__(self, nmigen_module): > self.nmigen_module = nmigen_module > self.d = ... > def If(self, cond): > ... # ultimately call Module.If if needed > ... > > we avoid needing a nmigen fork ironically this type of approach gives the catastrophic and dangerously misleading impression that there is even a need or implication that the behaviour of the nmigen language has, needs, or requires, to be altered in order to support PartitionedSignal or other advanced data types. even the implication that the behaviour is changed by having an override of a function that fundamentally provides the core of the nmigen language is disastrous in the extreme. just like how SVP64 does not alter the behavioural characteristics of Power ISA v3.0B, PartitionedSignal or other advanced data type created by third parties should *in no way* alter the fundamental behaviour of m.If, m.Switch, m.FSM and so on. it is PartitionedSignal's responsibility to define what "Cat" means in the context of use of PartitionedSignals. it is not *Module*'s responsibility to define what "Cat" means in the context of use of PartitionedSignals. Module provides "intermediary" nmigen language characteristics (higher level than Cat, Mux, add, xor, Assign). even by creating an override of any of those intermediary language features would automatically mean that the fork has a unacceptably high maintenance burden. also: *there are* no functions that require overriding in dsl.Module! the entire functionality that requires overriding *is* in ast.Cat, ast.Mux, ast.Assign, not dsl.Module.
https://git.libre-soc.org/?p=nmigen.git;a=commitdiff;h=6a66c8a2283f031dbc7aec12bbf7b5b708e66469 Value.__Assign__, __Mux__ and __Switch__ added so far. Cat was not successful, will need investigating. it might need a static class function not a method, there. PartitionedSignal now derives from UserValue (this is temporary) but does not yet have __Mux__ etc. part_mux.py is independent, part_cat.py likewise, part_assign.py needs writing. part_add.py on the other hand and other arithmetic operations are linked in to respective operator-overrides (__add__ etc.)
https://git.libre-soc.org/?p=nmigen.git;a=commitdiff;h=113ccd91a27a6256656c44f6afc5035dd7328d43 __Cat__ was successful: it was just a little more complex than previously envisaged, requiring "intrusive inspection" of the first argument before proceeding with a Value.cast(), exactly as is done in (what is now called) _InternalCat. __Repl__ has also been added: https://git.libre-soc.org/?p=nmigen.git;a=commitdiff;h=160d23ee7ea78c3f2d522fc7daa1705bf8cb4f17 and PartitionedSignal.__Mux__ added: https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=52e20ac9e51992a34bbe7297c816162d2f480998 also a unit test for PartitionedSignal.__Mux__: https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=4369d65de14dff6a6a0c66c18e78ed2a299abece
a quick note about the idea floated in tuesday's meeting to use "wrappers". i have worked with python wrappers in the past: they turn out to be absolutely awful, creating massive slow-downs of at least 100%. also, the code that is almost 95% completed shows that wrappers are unnecessary. (In reply to Jacob Lifshay from comment #15) > we avoid needing a nmigen fork this is being dealt with by another route that does not involve throwing away 5+ months of work. we are under time and budget pressure. we cannot afford to throw away the work that has already been done.
https://git.libre-soc.org/?p=nmigen.git;a=blob;f=nmigen/hdl/dsl.py;h=982b3a574fd4aad475878b0ff9a927f11dae5a07;hb=refs/heads/libresoc-partsig#l184 an earlier analysis missed the fact that in both m.Switch and m.If/Elif, Value.cast was being used on the switch value and condition test, respectively. to achieve 100% abstraction of Type 2 (dsl.Module) from Type 1 (ast) nmigen language constructs, Module has to be told what Ast type it is permitted to use to cast to. this defaults to ast.Value on instantiating a Partition-aware dsl.Module, this argument may be set to PartitionedSignal (or, PartitionedBool if such a thing is ever created). this neatly solves the issue in _pop_ctrl where the tests were being Value.cast()ed followed by dropping to bool(), because with the AST type being optionally set to PartitionedSignal, that now becomes a PartitionedSignal.cast followed by a PartitionedSignal.bool() which of course is exactly the desired behaviour.
Repl needs similar treatment to Cat: a similar strategy as outlined here https://libre-soc.org/3d_gpu/architecture/dynamic_simd/cat/
(In reply to Luke Kenneth Casson Leighton from comment #21) > Repl needs similar treatment to Cat: a similar strategy as outlined > here https://libre-soc.org/3d_gpu/architecture/dynamic_simd/cat/ commit 4842290ca7444952b0a70984e8b5afa2d4a51fa0 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue Oct 5 18:09:24 2021 +0100 add PartitionedRepl first version, no unit test just demo needs documenting though. based on Assign as it was slightly easier.
commit 423050fa77d75b84d0281f91a58764a688a3b3cb (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue Oct 5 18:25:27 2021 +0100 add PartitionedRepl into PartitionedSignal.__Repl__ uses same auto-module-creation as PCat, PMux and PAssign
commit 2cc4385620a9386361948683d6927fc567890281 Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Sat Oct 9 17:14:56 2021 +0100 add TestReplMod, under development commit 8db716e8227fad2ea78be493733fc598d8531679 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Sun Oct 10 12:15:11 2021 +0100 fix SimdSignal Repl test (was previously unfinished)
done, using AST abstraction https://gitlab.com/nmigen/nmigen/-/merge_requests/9