I think what we need to do is have the entire PartitionedSignal (and related)
API be more like SIMT -- where basically all operations operate in-parallel
and independently on each partition, including Cat, Mux, slicing, +, If,
Switch, Case, conversion from Signal (which I think should only happen
explicitly via a splat function/static-method, otherwise mixing
PartitionedSignal and Signal results in a type error), conversion from int,
etc.

Essentially we treat a PartitionedSignal conceptually as a list of Signals
where all normal operations operate element-wise on each signal in the list.
I think this makes the code waay more readable and consistent. There would be
specially designated methods for converting the whole PartitionedSignal
from/to its underlying Signal (think of conceptually concatenating all
signals in a list together, or splitting a signal into a list of signals).

PartitionedSignal would essentially end up being a list of Signals (not
literally, they would be pieces of a Signal), where the Signals could be
either dynamically or statically sized; the size would be chosen at runtime
by combinatorial logic (basically a small lookup table) based on a layout
enum. That layout enum (with the combinatorial logic) replaces
PartitionPoints. This allows things like:

class FPULayoutEnum(Enum):
    F16x4 = ...
    F32x2 = ...
    F64x1 = ...

exponent_field_layout = {  # probably in a FP layout class
    FPULayoutEnum.F16x4: 5,
    FPULayoutEnum.F32x2: 8,
    FPULayoutEnum.F64x1: 11
}

exponent_field_slice = {  # probably in a FP layout class
    FPULayoutEnum.F16x4: slice(10, 16),
    FPULayoutEnum.F32x2: slice(23, 32),
    FPULayoutEnum.F64x1: slice(52, 64)
}

float_layout = {  # probably next to FPULayoutEnum
    FPULayoutEnum.F16x4: 16,
    FPULayoutEnum.F32x2: 32,
    FPULayoutEnum.F64x1: 64
}

class MyFPUStage:
    def __init__(self):
        self.fpu_layout = Signal(FPULayoutEnum)
        with PartitionedSignal.layout_scope(self.fpu_layout):
            self.float_bits = PartitionedSignal(float_layout)
            self.exponent_field = PartitionedSignal(exponent_field_layout)

    def elaborate(self, platform):
        m = Module()
        with PartitionedSignal.layout_scope(self.fpu_layout):
            # generic over all float types! easy to read/think about!
            m.d.comb += self.exponent_field.eq(
                self.float_bits[exponent_field_slice])
            # do stuff with self.exponent_field...
        return m

This means PartitionedSignal would sometimes be partitioned in a more complex
way, such as if all partitions always have 5 bits (conceptually like
[Signal(5) for _ in ...]) -- there would need to be gaps in some cases.

For all examples:
    a, b, and c are PartitionedSignal
    s is a Signal
    v is a Python variable

Examples:

v = a + b  # adding two PartitionedSignals
conceptual equivalent:
    v = [i + j for i, j in zip(a, b)]

v = a + splat(s)
conceptual equivalent:
    v = [i + s for i in a]

v = a + s  # adding PartitionedSignal and Signal
conceptual equivalent:
    raise TypeError()

v = splat(s)
conceptual equivalent:
    v = [s] * len(partitions)

v = Cat(a, b, c)
conceptual equivalent:
    v = [Cat(i, j, k) for i, j, k in zip(a, b, c)]

v = Cat(a, s)  # `Cat`ting PartitionedSignal and Signal
conceptual equivalent:
    raise TypeError()

v = a[3:5]
conceptual equivalent:
    v = [i[3:5] for i in a]

with m.Switch(a):
    with m.Case(3, 10, '--101'):
        m.d.comb += b.eq(23)
    with m.Default():
        m.d.comb += b.eq(45)
conceptual equivalent:
    for i, j in zip(a, b):
        with m.Switch(i):
            with m.Case(3, 10, '--101'):
                m.d.comb += j.eq(23)
            with m.Default():
                m.d.comb += j.eq(45)

v = PartitionedSignal(underlying=s, partitions=...)
conceptual equivalent:
    v = list(s.partitions())  # more or less...

v = a.underlying
conceptual equivalent:
    v = Cat(*a)  # more or less...
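to spell out that "list of lanes" mental model, here is a rough pure-Python
sketch: plain ints stand in for Signals, and SimdValue/splat are hypothetical
names used purely for illustration, not the existing PartitionedSignal API:

    class SimdValue:
        def __init__(self, lanes):
            self.lanes = list(lanes)

        def __add__(self, other):
            # element-wise add; mixing with a bare scalar is a type error
            if not isinstance(other, SimdValue):
                raise TypeError("mix of SimdValue and scalar: use splat()")
            return SimdValue(i + j for i, j in zip(self.lanes, other.lanes))

        def __getitem__(self, idx):
            # element-wise bit-slicing of every lane (idx is a bit-range slice)
            mask = (1 << (idx.stop - idx.start)) - 1
            return SimdValue((lane >> idx.start) & mask for lane in self.lanes)

    def splat(value, nlanes):
        # explicit scalar -> SIMD conversion, as proposed above
        return SimdValue([value] * nlanes)

    a = SimdValue([1, 2, 3, 4])
    b = SimdValue([10, 20, 30, 40])
    print((a + b).lanes)            # [11, 22, 33, 44]
    print((a + splat(5, 4)).lanes)  # [6, 7, 8, 9]
    print(b[1:4].lanes)             # bits 1..3 of each lane: [5, 2, 7, 4]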
this change wouldn't require changing PartitionedSignal all that much, mostly just going all-in on the conceptual change, rather than what we have currently, which is waay more confusing due to being a haphazard mix of SIMT and traditional Signal semantics.
assuming i have time after fixing #706, i'll build a small demo with partitioned-add and slicing
unfortunately, although it is a great idea, it should have been raised 18 months ago when the original discussion, design, and implementation of PartitionedSignal was taking place. there has now been about 4-5 man-months of work (and payments) already gone out, leaving no spare budget for a major overhaul of the design *and* unit tests *and* Formal Correctness Proofs *and* updating the associated wiki documentation which is critical to understanding the API and its rationale. in addition, a redesign would push us even further behind. therefore, unfortunately, we are committed to completing the existing design, which was planned about 15-18 months ago, and will have to revisit this when other obligations have been met.

i don't have a problem at all with putting in a "refactoring" Grant request which learns from the lessons and experiences we get from actual practical application of the *current* design, the reason being that any painful gotchas will be indelibly etched on our retinas like supernova flares and can be part of the justification for the extra Grant request. right now, if we put in such a Grant request without having completed the milestones already underway, without concrete results and progress, it will raise eyebrows, with good justification, i feel.
(In reply to Jacob Lifshay from comment #0)
> exponent_field_layout = {  # probably in a FP layout class
>     FPULayoutEnum.F16x4: 5,
>     FPULayoutEnum.F32x2: 8,
>     FPULayoutEnum.F64x1: 11
> }

yes, this would solve the tricky issue of irregular sizes that change depending on the partition size. over-allocation will be required, in effect, where the total bits allocated must be more than *might* be used for certain operation widths, and the class ends up *dynamically* selecting the appropriate sub-parts depending on the partition points. really quite mind-bending.

unfortunately, like ArrayProxy, i feel it is just too much to tackle in a "first cut" of the API. i would much rather that we:

a) complete the work as planned (esp. given that the budget's already spent)
b) go through the whole rigmarole of workarounds, documenting them clearly,
   to deal with the varying sizes explicitly (using the existing
   PartitionedSignal)

one of the primary reasons for (b) is that there has been such extensive testing and such comprehensive Formal Correctness Proofs done: we *know* that __eq__ etc. work. if we get into serious trouble, or get completely fried and brain-melted by how exasperating PartitionedSignal turns out to be, that makes it easy to justify an extra Grant.
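to make the over-allocation point concrete, a quick back-of-envelope sketch
in plain Python (illustrative only: it assumes one fixed slot per
finest-grained lane position, which is just one possible allocation scheme,
not the planned implementation):

    # width of the exponent field per layout, and how many lanes each layout has
    exponent_bits = {"F16x4": 5, "F32x2": 8, "F64x1": 11}
    lanes         = {"F16x4": 4, "F32x2": 2, "F64x1": 1}

    # slot i (of 4) is occupied by a layout only if that layout has more than
    # i lanes; the slot must fit the widest exponent field among those layouts
    slot_widths = [max(exponent_bits[l] for l in lanes if lanes[l] > i)
                   for i in range(4)]
    print(slot_widths)       # [11, 8, 5, 5]
    print(sum(slot_widths))  # 29 bits allocated in total
    for l in lanes:
        # F16x4 uses 20, F32x2 uses 16, F64x1 uses 11 -- the rest are gaps
        print(l, "uses", lanes[l] * exponent_bits[l], "of", sum(slot_widths))

i.e. whichever layout is selected at runtime, some of the 29 allocated bits
go unused, and the choice of which sub-parts are live has to be made
dynamically.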
(In reply to Jacob Lifshay from comment #0)
> F32x2 = ...
> F64x1 = ...
> exponent_field_layout = {  # probably in a FP layout class
>     FPULayoutEnum.F16x4: 5,
>     FPULayoutEnum.F32x2: 8,
>     FPULayoutEnum.F64x1: 11
> }

ah. and F16x2+F32x1 and all other combinations. yes, PartitionedSignal is designed to handle arbitrary partition points: "1x24 1x8 2x16" is perfectly legitimate, and all operators, including __eq__, shift, etc., work perfectly with these arbitrary lengths. it was a *LOT* of work, requiring some 3 weeks *per operator* of analysis in some cases even before coding could begin.
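as a worked illustration of what an arbitrary "1x24 1x8 2x16" layout means in
practice (taking the lanes LSB-first in one possible ordering; the real
PartitionPoints API may differ):

    from itertools import accumulate

    lane_widths = [16, 16, 8, 24]   # "2x16 1x8 1x24", one LSB-first ordering
    assert sum(lane_widths) == 64   # still fills a 64-bit underlying Signal
    breaks = list(accumulate(lane_widths))[:-1]
    print(breaks)                   # [16, 32, 40] -- internal split positions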
(In reply to Jacob Lifshay from comment #2)
> assuming i have time after fixing #706, i'll build a small demo with
> partitioned-add and slicing

as long as this does not become a distraction: remember we are on a tight time schedule and there is no spare budget.
(In reply to Luke Kenneth Casson Leighton from comment #6)
> (In reply to Jacob Lifshay from comment #2)
> > assuming i have time after fixing #706, i'll build a small demo with
> > partitioned-add and slicing
>
> as long as this does not become a distraction: remember we are on a tight
> time schedule and there is no spare budget.

A proof-of-concept that ended up much larger than originally intended:

https://salsa.debian.org/Kazan-team/simd_signal