Bug 458 - PartitionedSignal needs nmigen constructs "m.If", Switch etc
Summary: PartitionedSignal needs nmigen constructs "m.If", Switch etc
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: Other Linux
: --- enhancement
Assignee: Jacob Lifshay
URL: https://libre-soc.org/3d_gpu/architec...
Depends on: 708 734
Blocks: 132 362 732
  Show dependency treegraph
 
Reported: 2020-08-18 01:06 BST by Luke Kenneth Casson Leighton
Modified: 2021-10-22 14:46 BST (History)
4 users (show)

See Also:
NLnet milestone: NLNet.2019.10.Wishbone
total budget (EUR) for completion of task and all subtasks: 1250
budget (EUR) for this task, excluding subtasks' budget: 1250
parent task for budget allocation: 362
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
# uncomment once the budget is decided on... # commrnted since sum of 0 causes an error #awygle=0 #lkcl=0 #jacob=0


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2020-08-18 01:06:55 BST
PartitionedSignal is a dynamic SIMD version of Signal.  it needs to work with the following constructs:

* m.If / Elif / Else
* m.Switch / Case

without PartitionedSignal, SIMD has to be done as follows:

with m.If(partition == 64):
    operations treating Signals at full 64 bit
with m.Elif(partition == 2x32bit):
    for i in range(2):
         exactly the same code as 64 bit
         except now it is repeated twice,
         first on the low 32 bits and then
         on the hi 32

and that repetition continues right the way down to 8 bit, complicating design massively.

PartitionedSignal hides that entirely, as far as arithmetic and logic operations are concerned (__lt__, __or__, see bug #132)

where things break down is this:

ppts = PartitionPoints(64, 8)
x = PartitionedSignal(64, partition=ppts)

with m.If(x == 5):
    ... do something.

the reason it breaks down is because PartitionedSignal.__eq__ does *not* return a single bit object, it returns a *dynamic multi-bit* object that dynamically reflects the current state of the partitioning.

example:

* partition points is set to 8 breaks
* this subdivides the 64 bit signal into 8 separate 8 bit values
* comparison against a constant (5) creates EIGHT separate 8-bit comparisons
* those 8 comparisons are bundled together

when the partitions are cleared to indicate that the 64 bits are to be treated as a single 64 bit value, the comparison "5" is done against the entire 64 bit, however the answer still goes into the same 8 bit object used when the partition was set to 8x8bit.

the m.If construct cannot cope with this.

the current workaround is to completely avoid using m.If entirely and to use a special PartitionedMux construct, PMux, instead.

this however is less than ideal.

similar logic applies to Switch/Case. Arrays need some discussion.
Comment 1 Luke Kenneth Casson Leighton 2020-12-06 01:06:06 GMT
https://github.com/nmigen/nmigen/blob/59ef6e6a1c4e389a41148554f2dd492328820ecd/nmigen/hdl/dsl.py#L447

ah ha!

this is the location where the code-fragments for m.If/Elif/Else are
generated, and it should be really pretty straightforward to drop the appropriate replacement (parallel, SIMD) statements in at this location, with a replacement (override) of Module._pop_ctrl
Comment 2 Luke Kenneth Casson Leighton 2020-12-06 17:15:06 GMT
more context:
https://freenode.irclog.whitequark.org/nmigen/2020-12-06#28537093;
Comment 3 Luke Kenneth Casson Leighton 2020-12-07 00:33:17 GMT
relevant page for describing the dynamic partitioned signal operators

https://libre-soc.org/3d_gpu/architecture/dynamic_simd/
Comment 4 Luke Kenneth Casson Leighton 2021-01-13 18:41:08 GMT
taking a look at this more closely, i realised that Value (and UserValue) will need some tweaks.
at present, the code does this:

class Value:
     def part(....)
         return Part(self, ...)

where Part is a global function.  in order to support override capability this needs to change to:

class Value:
     def part(....)
         return ValuePart(self, ...)

where Part is defined as:

def Part(lhs, ....):
    return lhs.part(....)

and the *current* Part function is *renamed* to ValuePart.

UserValue then does:

class UserValue:
     def part(....)
         return UserValuePart(self, ...)

or more likely also calls ValuePart but it is clearly documented that user-derived overrides are perfectly well permitted.

PartitionedSignal would do *exactly that*.

then, a crucial addition to Value and UserValue would be:

    def mux(self, choice1, choice2):
         return ValueMux(self, choice1, choice2)

and PartitionedSignal could override it to provide the correct dynamic SIMD functionality.
Comment 5 Luke Kenneth Casson Leighton 2021-09-21 18:48:41 BST
http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-September/003750.html

carrying on from there, i am slowly recovering the analysis and state
from many months back.


line 437:
                if if_test is not None:
                    if_test = Value.cast(if_test)
                    if len(if_test) != 1:
                        if_test = if_test.bool()


that was what i was talking about might need a function "isconsideredbool"
instead of len iftest !=1

line 447:


            self._statements.append(Switch(Cat(tests), cases,
                src_loc=src_loc, case_src_locs=dict(zip(cases, if_src_locs))))

the Switch there can be Value.__Switch__ and that function
call standard Switch.

the PartitionedSignal.__Switch__ variant instead does:

    for i, pbit in enumerate(self.partition_bits):
        totest = Part(test, i)
        ...

and creates *multiple* switch/case statements, each of which takes
a chunk at a time rather than does the entire lot as one hit.

8 paritions means *EIGHT* completely separate switch/cases, one for
each partition.

the reason why this works is because the HDL gates are going to be
there anyway, and the rule for PartitionedBool is that ALL bits
have to be set across the whole partition.

therefore, each "column" in the partition receives a copy
of the same "if" decision.
Comment 6 Luke Kenneth Casson Leighton 2021-09-23 20:39:36 BST
there's a couple of phases here:

1) redirection of Part (similar to operator.add(x, y) calling x.__add__(y)

2) redirection of Mux (same)

3) redirection of Switch (same)

Part:

a) existing Part renamed to InternalPart

https://github.com/nmigen/nmigen/blob/59ef6e6a1c4e389a41148554f2dd492328820ecd/nmigen/hdl/ast.py#L785

any name will do. _InternalPart? _DefaultInternalPart?

the idea here is *not* to disrupt the codebase, minimise
changes, where actually moving the contents *of* Part into
UserValue would flag that we are "making more changes than
it actually is".

if whitequark then asks for the merge, that's great.

b) replacement for Part.

real simple:

def Part(lhs, *args): lhs.__part__(*args)

this is exactly what is done in python operator module,
follow the breadcrumbs.... :)

c) add overrideable function to Value

https://github.com/nmigen/nmigen/blob/59ef6e6a1c4e389a41148554f2dd492328820ecd/nmigen/hdl/ast.py#L133

a case can be made for calling it Value.__part__ because of the convention used by the python operator module.

d) (later, TODO) write a PartitionedSignal.__part__()

but this can be under a separate bugreport, separate budget.


all of the others (Switch, Mux, Cat as well) can follow the
same pattern.  Cat() needs some special consideration.



we have a nmigen repo copy, i suggest making a branch (sigh)
and using that.  means we also have to change all the documentation
and scripts (argh) or (argh) work with a separate branch for
anything depending on this... hmm, that might not be needed
because the ieee754 PartitionedSignal class isn't used anywhere
have to see how that goes.
Comment 7 Luke Kenneth Casson Leighton 2021-09-23 20:42:03 BST
separate comment about Cat()

Cat() is a pain.

what does it mean in a parallel / SIMD context to even attempt to
concatenate a set of Signals or PartitionedSignals together?
this is sufficiently complex i think it needs its own bugreport
(TODO, edit... bug #707)
Comment 8 Luke Kenneth Casson Leighton 2021-09-24 11:21:34 BST
(In reply to Luke Kenneth Casson Leighton from comment #7)

> this is sufficiently complex i think it needs its own bugreport
> (TODO, edit... bug #707)

worked it out.  as long as only PartitionedSignals are concatenated
it works perfectly fine, irrespective of the sizes of the PSes.
if a Signal or Const is needed it is reasonable to require copying
into a PartitionedSignal.
Comment 9 Luke Kenneth Casson Leighton 2021-09-26 17:11:34 BST
working through this, this afternoon (see main bug URL),
although the parallel "columnisation" concept is
fine, i.e. treating a parallelised SIMD Switch
as a for-loop around "Lanes" of completely separate
and independent Switch/Cases, one per partition,
the *assignment* part of that - the action to
be taken *in* each column - is more involved than
i otiginslly envisaged, and that comes down
not to the assignments themselves but to the
fact that If and Switch statements *can be nested*.

the original assumption i made was that the use
of ast.Switch() by dsl.py would always create
assignments within each case. this code:

     with m.If(iftest):
          comb += o.eq(c)
     with m.Elif(elsetest)
          comb += o eq(d)

gets turned into:

     ast.Switch Cat(iftest, eliftest):
          case("1-"):
              comb += o.eq(c)
          case("-1"):
              comb += o.eq(d)

which can be parallelised with a simple for-loop
creating *multiple* Switch statements:

     for i in range(4partitions):
        ast.Switch Cat(iftest[i], eliftest[i]):
          case("1-"):
              comb += o[i].eq(c[i])
          case("-1"):
              comb += o[i].eq(d[i])

the problem comes if there is a Switch inside a Switch.
or, an m.If inside an m.If.

i have no problem at all with limiting the nested depth
to one and ONLY one in order to get this done as quickly as
possible.

this would involve analysing the first depth of all
case statements, prior to creating the (independent)
for-loops, checking that they are of type Assign,
and creating a temporary Signal set (one for each
assignment of each case) and, in each of the for-loop
replace with *partial* assignment.

    o.eq(d) # d being potentially a complex expression
    o.eq(c) # likewise

would be replaced with:

    tempin_d.eq(d)
    tempin_c.eq(c)
    o.eq(tempout)

and the switch statement for-loop then
morphed to:

     for i in range(4partitions):
        ast.Switch Cat(iftest[i], eliftest[i]):
          case("1-"):
              comb += tempout_o[i].eq(tempin_c[i])
          case("-1"):
              comb += tempout_o[i].eq(tempin_d[i])

in and of itself this is a lot more involved that i first envisaged.
hm.  it involves analysing all Case statements LHS and RHS to create
a global temporary substitution lookup table.

also, anything such as "Cat(a,b,c).eq(.....)" will not be supported
within an m.If or m.Switch, which is perfectly fine.

this needs to be done as fast as possible, we are under time pressure now.
complex analysis and complex "elegant, complete" solutions that take months to complete unfortunately are off the table.

limited functionality is perfectly fine as a first pass, so we can get things
moving, given that this is now on the critical path.
Comment 10 Luke Kenneth Casson Leighton 2021-09-26 17:52:03 BST
gaah.  another idea, which may be simpler to implement and is less disruptive than temporary Signals *and* - maybe - has the potential to handle nested Switch/case/If:

* override ast.Assign *as well* (PartitionedSignal.__Assign__,
  Value.__Assign__)
* at every SIMD Switch perform a full tree-walk of all AST searching
  for Assign statements
* append the test condition cases *onto the Assign*
* merge (AND) all test conditions together
* create a mini (leaf-node) Partitioned Switch with its
  own (private, leaf) for-loop, exactly as described
  in Comment #9
* on the (reasonable) assumption that a Partitioned Switch
  only contains Partitioned Signal Assignments, the base level
  (actual) Switch performs all its case statements unconditionally,
  on the basis that 100% of all switch/case Assignments are now
  at the leaf nodes.

in other words, PartitionedSignal.__Assign__ becomes
a private (leaf node) Switch-for-loop-merged-with-Assign

the key strategic part is the merging (ANDing) together of all Switch conditions that were created by dsl.py when parsing the user's HDL.

multiple nested Switches would have their conditions
merged into a single point (a single switch).

the actual Switch (back at the original PartitionedSignal.__Switch__
would *not* actually do anything at all except
to output all of its case statements as unconditional statements,
the assumption being that all Switching/Caseing has been moved
entirely down to leaf-node level.

what is a huge relief is that this would *still* not require
a massive total replacement of dsl.py Module with a special
variant suitable for PartitionedSignal only.

it also has the advantage that, by merging of all
tests, there is no combinatorial
explosion caused by nested partitioned switch/case.
Comment 11 Luke Kenneth Casson Leighton 2021-09-27 13:31:21 BST
https://bugs.libre-soc.org/show_bug.cgi?id=709#c1

thoughts on the idea of a tree-walk down to leaf-node
Assign statements: the PartitionedSignal.__Assign__
can be made *aware* of the Switch, and even help by having
a hidden "Condition" Signal (which defaults to a Const
of length equal to the number of partitions).

instead of making a mess with temporary Signals as described
at the end of Comment #9 the Switch hooks into the hidden "Condition"
Signal on a per-partition basis.

or, more to the point, PartitionedSignal.__Assign__ takes the
list of switch/cases (which were put there as described in Comment #10),
ANDs them together in the relevant
partition, and sets the relevant Condition bit in the
partition.

by setting that Condition bit in each partition, it will either
activate or prevent the actual assignment in that partition,
but without making a mess.

basically this is SIMD Predicate Masks, done in HDL.
Comment 12 Luke Kenneth Casson Leighton 2021-09-27 14:44:36 BST
fascinating and useful technical discussion with vup, with many thanks
https://libera.irclog.whitequark.org/nmigen/2021-09-27
Comment 13 Luke Kenneth Casson Leighton 2021-09-27 19:42:19 BST
well, that could have gone better. turns out that whitequark has decided
not to support concepts such as PartitionedSignal.

this forces us into a position of making a hard fork of nmigen in order to
support PartitionedSignal and other advanced concepts.

we have no choice in the matter because we cannot further delay progress,
Dynamic SIMD is now on the critical path.

i have begun the hard fork branches and added prerequisite "merge"
scripts that will bring in multiple (smaller) branches and allow
progress to be made with minimal disruption and ongoing maintenance.

https://git.libre-soc.org/?p=nmigen.git;a=summary
Comment 14 Luke Kenneth Casson Leighton 2021-09-27 20:55:37 BST
to explain the branches:

* libresoc-nmigen-fork is a *target* branch, automatically updated by
  a script
* libresoc-merger - the script for performing the merge is called
  ./branch_merger.sh and should be run for maintenance / releases /
  dev-releases / by developers.
  (TODO: update it as needed, to be more useful, including adding
   git pulls / git pushes)
* libresoc-partsig - the branch related to PartitionedSignal.
* display-patch - a useful patch for getting $display support in nmigen
  (which hasn't been added yet)

the process is therefore:

* git checkout libresoc-partsig
* {do whatever}
* {commit whatever}
* git push libresoc which assumes this section in .git/config:
   [remote "libresoc"]
       url = gitolite3@git.libre-soc.org:nmigen.git
       fetch = +refs/heads/*:refs/remotes/origin/*
* git checkout libresoc-merger
  ./branch_merger.sh
* git push libresoc

then back to libresoc-partsig to continue working.
Comment 15 Jacob Lifshay 2021-09-27 21:04:08 BST
(In reply to Luke Kenneth Casson Leighton from comment #13)
> well, that could have gone better. turns out that whitequark has decided
> not to support concepts such as PartitionedSignal.
> 
> this forces us into a position of making a hard fork of nmigen in order to
> support PartitionedSignal and other advanced concepts.

well, if we make a wrapper class like:
class PModule:
    def __init__(self, nmigen_module):
        self.nmigen_module = nmigen_module
        self.d = ...
    def If(self, cond):
        ... # ultimately call Module.If if needed
    ...

we avoid needing a nmigen fork
Comment 16 Luke Kenneth Casson Leighton 2021-09-27 21:39:27 BST
(In reply to Jacob Lifshay from comment #15)
> (In reply to Luke Kenneth Casson Leighton from comment #13)
> > well, that could have gone better. turns out that whitequark has decided
> > not to support concepts such as PartitionedSignal.
> > 
> > this forces us into a position of making a hard fork of nmigen in order to
> > support PartitionedSignal and other advanced concepts.
> 
> well, if we make a wrapper class like:
> class PModule:
>     def __init__(self, nmigen_module):
>         self.nmigen_module = nmigen_module
>         self.d = ...
>     def If(self, cond):
>         ... # ultimately call Module.If if needed
>     ...
> 
> we avoid needing a nmigen fork

ironically this type of approach gives the catastrophic
and dangerously misleading impression that there is even a
need or implication that the behaviour of the nmigen language
has, needs, or requires, to be altered in order to support
PartitionedSignal or other advanced data types.

even the implication that the behaviour is changed by having
an override of a function that fundamentally provides the
core of the nmigen language is disastrous in the extreme.

just like how SVP64 does not alter the behavioural characteristics
of Power ISA v3.0B, PartitionedSignal or other advanced data
type created by third parties should *in no way* alter the
fundamental behaviour of m.If, m.Switch, m.FSM and so on.

it is PartitionedSignal's responsibility to define what "Cat"
means in the context of use of PartitionedSignals.

it is not *Module*'s responsibility to define what "Cat" means
in the context of use of PartitionedSignals.

Module provides "intermediary" nmigen language characteristics
(higher level than Cat, Mux, add, xor, Assign).

even by creating an override of any of those intermediary
language features would automatically mean that the fork has a
unacceptably high maintenance burden.

also: *there are* no functions that require overriding in
dsl.Module!

the entire functionality that requires overriding *is*
in ast.Cat, ast.Mux, ast.Assign, not dsl.Module.
Comment 17 Luke Kenneth Casson Leighton 2021-09-27 22:11:56 BST
https://git.libre-soc.org/?p=nmigen.git;a=commitdiff;h=6a66c8a2283f031dbc7aec12bbf7b5b708e66469

Value.__Assign__, __Mux__ and __Switch__ added so far.  Cat was not successful,
will need investigating.  it might need a static class function not a method, there.

PartitionedSignal now derives from UserValue (this is temporary)
but does not yet have __Mux__ etc.

part_mux.py is independent, part_cat.py likewise, part_assign.py needs writing.
part_add.py on the other hand and other arithmetic operations are linked
in to respective operator-overrides (__add__ etc.)
Comment 18 Luke Kenneth Casson Leighton 2021-09-28 18:07:55 BST
https://git.libre-soc.org/?p=nmigen.git;a=commitdiff;h=113ccd91a27a6256656c44f6afc5035dd7328d43

__Cat__ was successful: it was just a little more complex than previously
envisaged, requiring "intrusive inspection" of the first argument before
proceeding with a Value.cast(), exactly as is done in (what is now called)
_InternalCat.

__Repl__ has also been added:

https://git.libre-soc.org/?p=nmigen.git;a=commitdiff;h=160d23ee7ea78c3f2d522fc7daa1705bf8cb4f17

and PartitionedSignal.__Mux__ added:

https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=52e20ac9e51992a34bbe7297c816162d2f480998

also a unit test for PartitionedSignal.__Mux__:

https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=4369d65de14dff6a6a0c66c18e78ed2a299abece
Comment 19 Luke Kenneth Casson Leighton 2021-09-30 16:03:56 BST
a quick note about the idea floated in tuesday's meeting to use "wrappers".

i have worked with python wrappers in the past: they turn out to be absolutely
awful, creating massive slow-downs of at least 100%.

also, the code that is almost 95% completed shows that wrappers are unnecessary.

(In reply to Jacob Lifshay from comment #15)

> we avoid needing a nmigen fork

this is being dealt with by another route that does not involve throwing
away 5+ months of work.

we are under time and budget pressure.

we cannot afford to throw away the work that has already been done.
Comment 20 Luke Kenneth Casson Leighton 2021-10-02 20:00:54 BST
https://git.libre-soc.org/?p=nmigen.git;a=blob;f=nmigen/hdl/dsl.py;h=982b3a574fd4aad475878b0ff9a927f11dae5a07;hb=refs/heads/libresoc-partsig#l184

an earlier analysis missed the fact that in both m.Switch and
m.If/Elif, Value.cast was being used on the switch value and
condition test, respectively.

to achieve 100% abstraction of Type 2 (dsl.Module) from Type 1 (ast)
nmigen language constructs, Module has to be told what Ast type
it is permitted to use to cast to.  this defaults to ast.Value

on instantiating a Partition-aware dsl.Module, this argument may
be set to PartitionedSignal (or, PartitionedBool if such a thing
is ever created).

this neatly solves the issue in _pop_ctrl where the tests were
being Value.cast()ed followed by dropping to bool(), because
with the AST type being optionally set to PartitionedSignal,
that now becomes a PartitionedSignal.cast followed by a
PartitionedSignal.bool() which of course is exactly the desired
behaviour.
Comment 21 Luke Kenneth Casson Leighton 2021-10-05 17:50:59 BST
Repl needs similar treatment to Cat: a similar strategy as outlined
here https://libre-soc.org/3d_gpu/architecture/dynamic_simd/cat/
Comment 22 Luke Kenneth Casson Leighton 2021-10-05 18:10:01 BST
(In reply to Luke Kenneth Casson Leighton from comment #21)
> Repl needs similar treatment to Cat: a similar strategy as outlined
> here https://libre-soc.org/3d_gpu/architecture/dynamic_simd/cat/

commit 4842290ca7444952b0a70984e8b5afa2d4a51fa0 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Oct 5 18:09:24 2021 +0100

    add PartitionedRepl first version, no unit test just demo

needs documenting though. based on Assign as it was slightly easier.
Comment 23 Luke Kenneth Casson Leighton 2021-10-05 18:25:54 BST
commit 423050fa77d75b84d0281f91a58764a688a3b3cb (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Oct 5 18:25:27 2021 +0100

    add PartitionedRepl into PartitionedSignal.__Repl__
    uses same auto-module-creation as PCat, PMux and PAssign
Comment 24 Luke Kenneth Casson Leighton 2021-10-10 12:16:13 BST
commit 2cc4385620a9386361948683d6927fc567890281
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Oct 9 17:14:56 2021 +0100

    add TestReplMod, under development


commit 8db716e8227fad2ea78be493733fc598d8531679 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sun Oct 10 12:15:11 2021 +0100

    fix SimdSignal Repl test (was previously unfinished)