Bug 716 - PartitionedSignal Slice and Part needed for __getitem__
Summary: PartitionedSignal Slice and Part needed for __getitem__
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Jacob Lifshay
URL: https://libre-soc.org/3d_gpu/architec...
Depends on: 733
Blocks: 732
  Show dependency treegraph
 
Reported: 2021-10-02 13:06 BST by Luke Kenneth Casson Leighton
Modified: 2022-06-16 15:51 BST (History)
1 user (show)

See Also:
NLnet milestone: NLnet.2019.02.012
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2021-10-02 13:06:27 BST
PartitionedSignal needs a way to select parts by standard python
x[3:5] and x[4] syntax, but the results returned to respect
SIMD (return a SIMD result)
Comment 1 Luke Kenneth Casson Leighton 2021-10-15 07:41:32 BST

like Value.abs and Value.implies, Value.__getitem__ is in
terms of other Value.xxxx operations.  a SimdSignal.__getitem__
is *not* required.  once SimdSignal.__Slice__ and part are written
simdsignal[start:end:step] should just magically "work"
as intuitively expected.

thank goodness Slice and Part take fixed constants only
(unlike Array)

the basic principle is that the submodule 
(call them slice.py and part.py) does all
the heavy lifting, and a "helper" (see pcat.py, 
prepl.py) is responsible for auto-instantiating an
on-demand instance of that submodule followed by
returning the output *of* that module, which, back
in the SimdSignal override, simply returns that
submodule output as *its* return result.

it's a pretty straightforward trick that only got
one scary moment for the Assign override but i managed
to work that out, thank goodness.

you _should_ be able to just use e.g. repl.py as a base
template, i did cat.py first then used it to create repl.py
dont use assign.py as the starting point it operates
differently (returns statements not a signal)

you can check the git history (commits are in bug #458)
for the sequence i did, cat.py first, pcat.py as an afterthought,
integrate submodule into partsig.py third and finally
unit test fourth (each as separate commits)

should be very straightforward, and by starting from
either cat.py or repl.py you will get to see how PartType
works.
Comment 2 Jacob Lifshay 2021-10-15 07:54:45 BST
I think we also need something to handle stuff like:
a[3:5].eq(b)
since assigning to a slice is quite common.
Comment 3 Luke Kenneth Casson Leighton 2021-10-15 08:22:06 BST
(In reply to Jacob Lifshay from comment #2)
> I think we also need something to handle stuff like:
> a[3:5].eq(b)
> since assigning to a slice is quite common.

this should work automatically because (i forgot to
emphasise) the return result from the submodule
(which is then a return result of SimdSignal.Slice/Part,
which in turn is then a return result of Value.__getitem__)
must be *another SimdSignal*.

that's very important.

have a close look at repl.py and prepl.py for how it
(very surprisingly) works.

the submodule is responsible for computing the width
of the output SimdSignal.  in the case of repl.py it
was dead easy, take the width of the input and multiply
it by the count.  cat.py likewise, just sum up the
total width of all input args.

for Slice and Part the width is easily calculated (statically)
thank goodness by analysing the slice which only
takes integers/Consts so can also be done in
the constructor.

in early revisions Michael and i forgot to make
the submodule return result a SimdSignal, we accidentally
returned a Signal().  if that mistake had not been
corrected then yes, things like Cat(simd11, simd2).eq(simd3)
would totally fail.

even things like simd1+simd2+simd3 would totally fail
because the first add would have returned a Signal
not another SimdSignal, and the 2nd add of simd3
would have produced utterly the wrong result.
Comment 4 Jacob Lifshay 2021-10-15 09:00:18 BST
(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Jacob Lifshay from comment #2)
> > I think we also need something to handle stuff like:
> > a[3:5].eq(b)
> > since assigning to a slice is quite common.
> 
> this should work automatically because (i forgot to
> emphasise) the return result from the submodule
> (which is then a return result of SimdSignal.Slice/Part,
> which in turn is then a return result of Value.__getitem__)
> must be *another SimdSignal*.

I think you're missing my point, which is that, for the following code:
a = SimdSignal(...)
b = SimdSignal(...)
c = Cat(a, b)
d = a[5]
m.d.comb += [c.eq(0), d.eq(0)]

c and d are the lhs of an assignment, which means that the signals *have to flow backwards* from c and d back to a and b.

From reading the existing source for PartitionedCat, PartitionedAssign, and SimdSignal, that case isn't handled at all (unless I'm missing something).
Comment 5 Jacob Lifshay 2021-10-15 09:03:14 BST
(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Jacob Lifshay from comment #2)
> > I think we also need something to handle stuff like:
> > a[3:5].eq(b)
> > since assigning to a slice is quite common.
> 
> this should work automatically because (i forgot to
> emphasise) the return result from the submodule
> (which is then a return result of SimdSignal.Slice/Part,
> which in turn is then a return result of Value.__getitem__)
> must be *another SimdSignal*.

That operators need to return SimdSignals is totally obvious to me...
Comment 6 Jacob Lifshay 2021-10-15 09:10:36 BST
(In reply to Jacob Lifshay from comment #5)
> (In reply to Luke Kenneth Casson Leighton from comment #3)
> > (In reply to Jacob Lifshay from comment #2)
> > > I think we also need something to handle stuff like:
> > > a[3:5].eq(b)
> > > since assigning to a slice is quite common.
> > 
> > this should work automatically because (i forgot to
> > emphasise) the return result from the submodule
> > (which is then a return result of SimdSignal.Slice/Part,
> > which in turn is then a return result of Value.__getitem__)
> > must be *another SimdSignal*.
> 
> That operators need to return SimdSignals is totally obvious to me...

actually, now that I think about it, Slice and Cat and Part have to return a SimdValue, not a SimdSignal, since otherwise a[5].eq(b) will assign to the signal that is a result of slicing, but the assignment won't propagate back to `a` since it won't know where to propagate to.
Comment 7 Luke Kenneth Casson Leighton 2021-10-15 21:06:09 BST
(In reply to Jacob Lifshay from comment #5)

> That operators need to return SimdSignals is totally obvious to me...

:)  the non-obvious bit is, "but... butbut... is that it??" :)

> actually, now that I think about it, Slice and Cat and Part have to return a
> SimdValue, not a SimdSignal, since otherwise a[5].eq(b) will assign to the
> signal that is a result of slicing, but the assignment won't propagate back
> to `a` since it won't know where to propagate to.

oo, oo, err... i need to think about that one.  this was part of the
mind-bending involving Assign.

... half-a-brainwave: i think it's ok... as long as the submodules entirely
use the comb domain (which they do), the netlist from LHS and RHS become
synonymous in effect, with the end result that an eq assignment should
propagate.

(this because unlike programming variables where assignment to LHS
overwrites RHS but a second assignment preserves the first, HDL comb
eq()s make the LHS and RHS "the same netlist"

   comb += x.eq(y)
   comb += y.eq(z)

x now == y now == z which would not be true if that was software.

i will have to think about that, or simply try a test Cat(x, y).eq(z)
which is doable because Cat and Assign are available.

ooo, it'd better work!
Comment 8 Luke Kenneth Casson Leighton 2021-10-15 22:00:44 BST
(In reply to Jacob Lifshay from comment #4)

> I think you're missing my point, which is that, for the following code:
> a = SimdSignal(...)
> b = SimdSignal(...)
> c = Cat(a, b)
> d = a[5]
> m.d.comb += [c.eq(0), d.eq(0)]
> 
> c and d are the lhs of an assignment, which means that the signals *have to
> flow backwards* from c and d back to a and b.
> 
> From reading the existing source for PartitionedCat, PartitionedAssign, and
> SimdSignal, that case isn't handled at all (unless I'm missing something).

well, one way to find out: try it :)  comb assignments should
(LHS or RHS) result in netlist merging, so i believe with a say... 40%
certainty it should work :]

it's not intuitively obvious, i appreciate, and early versions of
nmigen sim couldn't handle long assignment chains (whoops)
Comment 9 Jacob Lifshay 2021-10-16 02:26:31 BST
I added WIP code that will handle any combination of Slice and Cat, it can reduce it all down to a single layer of muxes. it also supports assignment to any combination of Slice and Cat of SimdSignals. I put it all in a swizzle.py file, that handles arbitrary bit-swizzles:
https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_swizzle/swizzle.py;h=44c1922f4893b03a1aa58f2225426191deadf037;hb=8692c384d1bab5a4dc86ccad5110c3db597b35a7

The basic idea is there's SwizzledSimdValue, a subclass of SimdSignal, that tracks where each bit goes from the original Signals for every possible value of mask/elwid (we really need to get elwid integrated into SimdSignal...), then we have each SwizzledSimdValue instance independently construct it's .sig member on-demand from the *original* Signals' bits, *instead of* from any other SwizzledSimdValue.sig.

When a SwizzledSimdValue is assigned to, it uses a completely different evaluation method: it constructs the full list of Signals that it's built out of, and returns the assignments that would reassign those full signals to the correct values.
Comment 10 Jacob Lifshay 2021-10-16 02:28:01 BST
(In reply to Jacob Lifshay from comment #9)
> I added WIP code that will handle any combination of Slice and Cat, it can
> reduce it all down to a single layer of muxes. it also supports assignment
> to any combination of Slice and Cat of SimdSignals. I put it all in a
> swizzle.py file, that handles arbitrary bit-swizzles:
> https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/
> part_swizzle/swizzle.py;h=44c1922f4893b03a1aa58f2225426191deadf037;
> hb=8692c384d1bab5a4dc86ccad5110c3db597b35a7

SwizzledSimdValue will effectively replace the existing Cat, since it's more powerful.
Comment 11 Jacob Lifshay 2021-10-16 02:30:53 BST
Assign for SwizzledSimdValue is 100% completed except for converting the rhs to the right SimdSignal shape.
Comment 12 Jacob Lifshay 2021-10-16 02:33:23 BST
(In reply to Luke Kenneth Casson Leighton from comment #8)
> well, one way to find out: try it :)  comb assignments should
> (LHS or RHS) result in netlist merging, so i believe with a say... 40%
> certainty it should work :]

I'm 99% certain that won't work, since yosys tries to emulate verilog's semantics, and the following doesn't work in verilog:
wire a, b, c;

# this doesn't connect a with c
always b = a;
always b = c;
Comment 13 Luke Kenneth Casson Leighton 2021-10-16 12:52:35 BST
(In reply to Jacob Lifshay from comment #12)
> (In reply to Luke Kenneth Casson Leighton from comment #8)
> > well, one way to find out: try it :)  comb assignments should
> > (LHS or RHS) result in netlist merging, so i believe with a say... 40%
> > certainty it should work :]
> 
> I'm 99% certain that won't work, since yosys tries to emulate verilog's
> semantics, and the following doesn't work in verilog:
> wire a, b, c;
> 
> # this doesn't connect a with c
> always b = a;
> always b = c;

so you didn't check it, with a simple test comprising 15-20 lines,
before writing nearly 400 and introducing new classes and concepts
that haven't been demonstrated to be needed.

you keep doing this.

now please can you do what was asked, which was to write a simple test.
Comment 14 Luke Kenneth Casson Leighton 2021-10-16 22:51:43 BST
(jacob, i have to share this with you: i too went through similar experiences during my early career.  i have done exactly these things.  i had to learn, and am happy to share the results with you.  for me, they were... hard knocks, and yet i actually ended up appreciating the fact that managers, other libre developers etc. actually took the time to explain things to me, because i understood that they were important.  i trust that you understand this too)
Comment 15 Luke Kenneth Casson Leighton 2021-10-21 12:11:47 BST
carrying over from 
https://bugs.libre-soc.org/show_bug.cgi?id=731#c10

Slice is straightforward to do because it is fixed consts.
*please do not use SimdSwizzle* it is too massive a jump
and throws away far too much code.

once Slice is done Part is (error checking aside) literally
1 line of code, using existing Simd shift operator and using
straight slices:

    return (self >> offset)[:width]

that's it.  incredibly simple.

yes agreed under no circumstances do we go anywhere near LHS Part
right now.
Comment 16 Luke Kenneth Casson Leighton 2021-10-21 12:28:57 BST
inferring how Part works from the difference between bit_select and word_select:

 413         if type(offset) is Const and isinstance(width, int):
 414             return self[offset.value * width:(offset.value + 1) * width]
 415         return Part(self, offset, width, stride=width, src_loc_at=1)

the one-line (key part of) Part i believe would be more like:

     return (self >> (offset*stride))[:width]

it is this aimple because under no circumstances are we doing LHS Part.
Comment 17 Luke Kenneth Casson Leighton 2021-10-23 18:41:42 BST
in the URL for this bugreport
https://libre-soc.org/3d_gpu/architecture/dynamic_simd/slice/

the question was raised, how the heck do you connect
between a Slice()d signal and a non-sliced one,
even when they use the same elwidth, because the
non-sliced one has no padding partitionpoints
and the sliced one does.

the answer is: an adapter submodule.

the exact details of how that works will emerge from
the examples, TBD.  i will write a couple now.
Comment 18 Luke Kenneth Casson Leighton 2021-10-23 20:06:27 BST
(In reply to Luke Kenneth Casson Leighton from comment #17)

> the answer is: an adapter submodule.
> 
> the exact details of how that works will emerge from
> the examples, TBD.  i will write a couple now.

oo. err and um. that's interesting.  an add of a 3 bit
sliced quantity with a 2 bit (element width=2bit) quantity
requires sign (or zero) extension prior to partition
expansion.

this is going to be interesting (translation: hairy)
Comment 19 Jacob Lifshay 2021-10-24 20:24:55 BST
(In reply to Luke Kenneth Casson Leighton from comment #18)
> (In reply to Luke Kenneth Casson Leighton from comment #17)
> 
> > the answer is: an adapter submodule.
> > 
> > the exact details of how that works will emerge from
> > the examples, TBD.  i will write a couple now.
> 
> oo. err and um. that's interesting.  an add of a 3 bit
> sliced quantity with a 2 bit (element width=2bit) quantity
> requires sign (or zero) extension prior to partition
> expansion.
> 
> this is going to be interesting (translation: hairy)

The answer to all of those conversions is to have a SimdSignal.cast method that handles conversion of an input nmigen Value or SimdSignal to the desired SimdShape:
Full conversion implementation:
class SimdSignal(...):
    @staticmethod
    def cast(value, *, shape=None, scope):
        # scope injected by calling this method as
        # scope.Signal_cast(value, shape=shape), which is a
        # partial() of SimdSignal.cast
        if shape is not None:
            assert shape.scope is scope
        if isinstance(value, SimdSignal):
            assert value.shape().scope is scope
            if shape is None or shape == value.shape():
                return value
        # I'm assuming SwizzleKey is already replaced by SimdScope.
        # from_value handles splatting a scalar nmigen Value if needed.
        src = SwizzledSimdValue.from_value(scope, value)
        swizzles = {}
        for src_el, dst_el in zip(src.shape().elements(),
                                  shape.elements()):
            elwid = src_el.elwid
            # SimdShape always iterates elements in the same order across all
            # SimdShapes for a particular scope.
            assert elwid == dst_el.elwid
            if elwid not in swizzles:
                swizzles[elwid] = Swizzle.from_const(0, shape.width)
            # get bits for current src_el
            el = src.swizzles[elwid][src_el.slice]
            if src_el.shape.signed: # *src* determines sign/zero extension
                # truncate/sign-extend
                el = el.convert_s_to(dst_el.shape)
            else:
                # truncate/zero-extend
                el = el.convert_u_to(dst_el.shape)
            # assign converted bits to dest
            swizzles[elwid].bits[dst_rl.slice] = el.bits
        # oops, I forgot shape argument for __init__
        return SwizzledSimdValue(scope, swizzles, shape=shape)

now, all ops can use:
ops_input_arg = SimdSignal.cast(ops_input_arg, shape=desired_shape,
                                scope=scope)

e.g.:
    def __add__(self, other):
        l_shape = self.shape()
        scope = l_shape.scope
        # splat scalars to SimdSignal
        other = SimdSignal.cast(other, scope=scope)
        r_shape = other.shape()
        def get_el_shape(l, r):
            # let nmigen compute result element's shape
            return (Const(0, l) + Const(0, r)).shape()
        el_shapes = SimdMap.map(get_el_shape, l_shape.el_shapes,
                                r_shape.el_shapes)
        shape = SimdShape(el_shapes, scope=scope)
        # could have a SimdShape.map method letting us write:
        # shape = SimdShape.map(get_el_shape, l_shape, r_shape)

        # convert to result's shape
        lhs = SimdSignal.cast(self, shape=shape, scope=scope)
        rhs = SimdSignal.cast(other, shape=shape, scope=scope)
        
        # do add
        return lhs.add_op(lhs, rhs, carry=0)
Comment 20 Luke Kenneth Casson Leighton 2021-10-24 20:55:08 BST
(In reply to Jacob Lifshay from comment #19)
> (In reply to Luke Kenneth Casson Leighton from comment #18)
> > (In reply to Luke Kenneth Casson Leighton from comment #17)
> > 
> > > the answer is: an adapter submodule.
> > > 
> > > the exact details of how that works will emerge from
> > > the examples, TBD.  i will write a couple now.
> > 
> > oo. err and um. that's interesting.  an add of a 3 bit
> > sliced quantity with a 2 bit (element width=2bit) quantity
> > requires sign (or zero) extension prior to partition
> > expansion.
> > 
> > this is going to be interesting (translation: hairy)
> 
> The answer to all of those conversions is to have a SimdSignal.cast method
> that handles conversion of an input nmigen Value or SimdSignal to the
> desired SimdShape:

ooo that makes sense.  like it.

Cat however still needs alteration: it can't cope with the
fact that the input(s) and output are fundamentally tied
together.

take e.g. a b and c to be Cat()ed together.  b creates holes
right in the middle of the result's PartitionPoints, that
critically depend on the size of a and c.  only PartitionedCat
itself can determine where to copy the bits of b into the
result.

bottom line is, SimdSignal.cast should work fine for one-to-one,
but not for many-to-one.  hopefully Cat() is just the weird
one here.
Comment 21 Jacob Lifshay 2021-10-24 21:00:48 BST
(In reply to Luke Kenneth Casson Leighton from comment #20)
> bottom line is, SimdSignal.cast should work fine for one-to-one,
> but not for many-to-one.  hopefully Cat() is just the weird
> one here.

See my other email for Cat's solution:

https://bugs.libre-soc.org/show_bug.cgi?id=734#c9
Comment 22 Luke Kenneth Casson Leighton 2021-10-24 21:09:45 BST
(In reply to Jacob Lifshay from comment #21)
> (In reply to Luke Kenneth Casson Leighton from comment #20)
> > bottom line is, SimdSignal.cast should work fine for one-to-one,
> > but not for many-to-one.  hopefully Cat() is just the weird
> > one here.
> 
> See my other email for Cat's solution:
> 
> https://bugs.libre-soc.org/show_bug.cgi?id=734#c9

please
help
to
fix
the
existing
code.

please do not keep advocating disruptive regressions
that have no unit tests.

i have said this multiple times and yet you keep
insisting on ignoring what i am saying.

this is getting extremely tedious.

when i give some reasons please LISTEN to and RESPECT those
reasons.

please do not keep making me waste time repeating things
multiple times.

you are making me angry at how much of my time you are wasting here.