I think what we need to do is have the entire PartitionedSignal (and related)
API be more like SIMT -- where basically all operations operate in-parallel
and independently on each partition, including Cat, Mux, slicing, +, If,
Switch, Case, conversion from Signal (which I think should only happen
explicitly via a splat function/static-method, otherwise mixing
PartitionedSignal and Signal results in a type error), conversion from int,
etc.

Essentially we treat a PartitionedSignal conceptually as a list of Signals
where all normal operations operate element-wise on each signal in the list.
I think this makes the code waay more readable and consistent. There would be
specially designated methods for converting the whole PartitionedSignal
from/to its underlying Signal (think of conceptually concatenating all
signals in a list together, or splitting a signal into a list of signals).

PartitionedSignal would essentially end up being a list of Signals (not
literally, they would be pieces of a Signal), where the Signals could be
either dynamically or statically sized; the size would be chosen at runtime
by combinatorial logic (basically a small lookup table) based on a layout
enum. That layout enum (with the combinatorial logic) replaces
PartitionPoints. This allows things like:

class FPULayoutEnum(Enum):
    F16x4 = ...
    F32x2 = ...
    F64x1 = ...

exponent_field_layout = {  # probably in a FP layout class
    FPULayoutEnum.F16x4: 5,
    FPULayoutEnum.F32x2: 8,
    FPULayoutEnum.F64x1: 11
}

exponent_field_slice = {  # probably in a FP layout class
    FPULayoutEnum.F16x4: slice(10, 16),
    FPULayoutEnum.F32x2: slice(23, 32),
    FPULayoutEnum.F64x1: slice(52, 64)
}

float_layout = {  # probably next to FPULayoutEnum
    FPULayoutEnum.F16x4: 16,
    FPULayoutEnum.F32x2: 32,
    FPULayoutEnum.F64x1: 64
}

class MyFPUStage:
    def __init__(self):
        self.fpu_layout = Signal(FPULayoutEnum)
        with PartitionedSignal.layout_scope(self.fpu_layout):
            self.float_bits = PartitionedSignal(float_layout)
            self.exponent_field = PartitionedSignal(exponent_field_layout)

    def elaborate(self, platform):
        m = Module()
        with PartitionedSignal.layout_scope(self.fpu_layout):
            # generic over all float types! easy to read/think about!
            m.d.comb += self.exponent_field.eq(
                self.float_bits[exponent_field_slice])
            # do stuff with self.exponent_field...
        return m

This means PartitionedSignal would sometimes be partitioned in a more complex
way, such as if all partitions always have 5 bits (conceptually like
[Signal(5) for _ in ...]) -- there would need to be gaps in some cases.

For all examples:
    a, b, and c are PartitionedSignal
    s is a Signal
    v is a Python variable

Examples:

v = a + b  # adding two PartitionedSignals
conceptual equivalent:
    v = [i + j for i, j in zip(a, b)]

v = a + splat(s)
conceptual equivalent:
    v = [i + s for i in a]

v = a + s  # adding PartitionedSignal and Signal
conceptual equivalent:
    raise TypeError()

v = splat(s)
conceptual equivalent:
    v = [s] * len(partitions)

v = Cat(a, b, c)
conceptual equivalent:
    v = [Cat(i, j, k) for i, j, k in zip(a, b, c)]

v = Cat(a, s)  # `Cat`ting PartitionedSignal and Signal
conceptual equivalent:
    raise TypeError()

v = a[3:5]
conceptual equivalent:
    v = [i[3:5] for i in a]

with m.Switch(a):
    with m.Case(3, 10, '--101'):
        m.d.comb += b.eq(23)
    with m.Default():
        m.d.comb += b.eq(45)
conceptual equivalent:
    for i, j in zip(a, b):
        with m.Switch(i):
            with m.Case(3, 10, '--101'):
                m.d.comb += j.eq(23)
            with m.Default():
                m.d.comb += j.eq(45)

v = PartitionedSignal(underlying=s, partitions=...)
conceptual equivalent:
    v = list(s.partitions())  # more or less...

v = a.underlying
conceptual equivalent:
    v = Cat(*a)  # more or less...
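to spell out that "list of lanes" mental model, here is a rough pure-Python
sketch: plain ints stand in for Signals, and SimdValue/splat are hypothetical
names used purely for illustration, not the existing PartitionedSignal API:

    class SimdValue:
        def __init__(self, lanes):
            self.lanes = list(lanes)

        def __add__(self, other):
            # element-wise add; mixing with a bare scalar is a type error
            if not isinstance(other, SimdValue):
                raise TypeError("mix of SimdValue and scalar: use splat()")
            return SimdValue(i + j for i, j in zip(self.lanes, other.lanes))

        def __getitem__(self, idx):
            # element-wise bit-slicing of every lane (idx is a bit-range slice)
            mask = (1 << (idx.stop - idx.start)) - 1
            return SimdValue((lane >> idx.start) & mask for lane in self.lanes)

    def splat(value, nlanes):
        # explicit scalar -> SIMD conversion, as proposed above
        return SimdValue([value] * nlanes)

    a = SimdValue([1, 2, 3, 4])
    b = SimdValue([10, 20, 30, 40])
    print((a + b).lanes)            # [11, 22, 33, 44]
    print((a + splat(5, 4)).lanes)  # [6, 7, 8, 9]
    print(b[1:4].lanes)             # bits 1..3 of each lane: [5, 2, 7, 4]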
this change wouldn't require changing PartitionedSignal all that much, mostly just going all-in on the conceptual change, rather than what we have currently, which is waay more confusing due to being a haphazard mix of SIMT and traditional Signal semantics.
assuming i have time after fixing #706, i'll build a small demo with partitioned-add and slicing
unfortunately, although it is a great idea, it should have been raised 18 months ago when the original discussion, design, and implementation of PartitionedSignal was taking place. there has now been about 4-5 man-months of work (and payments) already gone out, leaving no spare budget for a major overhaul of the design *and* unit tests *and* Formal Correctness Proofs *and* updating the associated wiki documentation which is critical to understanding the API and its rationale. in addition, a redesign would push us even further behind. therefore, unfortunately, we are committed to completing the existing design, which was planned about 15-18 months ago, and will have to revisit this when other obligations have been met.

i don't have a problem at all with putting in a "refactoring" Grant request which learns from the lessons and experiences we get from actual practical application of the *current* design, the reason being that any painful gotchas will be indelibly etched on our retinas like supernova flares and can be part of the justification for the extra Grant request. right now, if we put in such a Grant request without having completed the milestones already underway, without concrete results and progress, it will raise eyebrows, with good justification, i feel.
(In reply to Jacob Lifshay from comment #0)
> exponent_field_layout = {  # probably in a FP layout class
>     FPULayoutEnum.F16x4: 5,
>     FPULayoutEnum.F32x2: 8,
>     FPULayoutEnum.F64x1: 11
> }

yes, this would solve the tricky issue of irregular sizes that change depending on the partition size. over-allocation will be required, in effect, where the total bits allocated must be more than *might* be used for certain operation widths, and the class ends up *dynamically* selecting the appropriate sub-parts depending on the partition points. really quite mind-bending.

unfortunately, like ArrayProxy, i feel it is just too much to tackle in a "first cut" of the API. i would much rather that we:

a) complete the work as planned (esp. given that the budget's already spent)
b) go through the whole rigmarole of workarounds, documenting them clearly,
   to deal with the varying sizes explicitly (using the existing
   PartitionedSignal)

one of the primary reasons for (b) is that there has been such extensive testing and such comprehensive Formal Correctness Proofs done: we *know* that __eq__ etc. work. if we get into serious trouble, or get completely fried and brain-melted by how exasperating PartitionedSignal turns out to be, that makes it easy to justify an extra Grant.
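to make the over-allocation point concrete, a quick back-of-envelope sketch
in plain Python (illustrative only: it assumes one fixed slot per
finest-grained lane position, which is just one possible allocation scheme,
not the planned implementation):

    # width of the exponent field per layout, and how many lanes each layout has
    exponent_bits = {"F16x4": 5, "F32x2": 8, "F64x1": 11}
    lanes         = {"F16x4": 4, "F32x2": 2, "F64x1": 1}

    # slot i (of 4) is occupied by a layout only if that layout has more than
    # i lanes; the slot must fit the widest exponent field among those layouts
    slot_widths = [max(exponent_bits[l] for l in lanes if lanes[l] > i)
                   for i in range(4)]
    print(slot_widths)       # [11, 8, 5, 5]
    print(sum(slot_widths))  # 29 bits allocated in total
    for l in lanes:
        # F16x4 uses 20, F32x2 uses 16, F64x1 uses 11 -- the rest are gaps
        print(l, "uses", lanes[l] * exponent_bits[l], "of", sum(slot_widths))

i.e. whichever layout is selected at runtime, some of the 29 allocated bits
go unused, and the choice of which sub-parts are live has to be made
dynamically.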
(In reply to Jacob Lifshay from comment #0)
> F32x2 = ...
> F64x1 = ...
> exponent_field_layout = {  # probably in a FP layout class
>     FPULayoutEnum.F16x4: 5,
>     FPULayoutEnum.F32x2: 8,
>     FPULayoutEnum.F64x1: 11
> }

ah. and F16x2+F32x1 and all other combinations. yes, PartitionedSignal is designed to handle arbitrary partition points: "1x24 1x8 2x16" is perfectly legitimate, and all operators, including __eq__, shift, etc., work perfectly with these arbitrary lengths. it was a *LOT* of work, requiring some 3 weeks *per operator* of analysis in some cases even before coding could begin.
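as a worked illustration of what an arbitrary "1x24 1x8 2x16" layout means in
practice (taking the lanes LSB-first in one possible ordering; the real
PartitionPoints API may differ):

    from itertools import accumulate

    lane_widths = [16, 16, 8, 24]   # "2x16 1x8 1x24", one LSB-first ordering
    assert sum(lane_widths) == 64   # still fills a 64-bit underlying Signal
    breaks = list(accumulate(lane_widths))[:-1]
    print(breaks)                   # [16, 32, 40] -- internal split positions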
(In reply to Jacob Lifshay from comment #2)
> assuming i have time after fixing #706, i'll build a small demo with
> partitioned-add and slicing

as long as this does not become a distraction: remember we are on a tight time schedule and there is no spare budget.
(In reply to Luke Kenneth Casson Leighton from comment #6)
> (In reply to Jacob Lifshay from comment #2)
> > assuming i have time after fixing #706, i'll build a small demo with
> > partitioned-add and slicing
>
> as long as this does not become a distraction: remember we are on a tight
> time schedule and there is no spare budget.

A proof-of-concept that ended up much larger than originally intended:

https://salsa.debian.org/Kazan-team/simd_signal