as part of investigating SimdSignal.Slice it became apparent that the elwidth-based layouts would need to be gotten up and running, first, because those properly support layouts with blank/padding in them. context: http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-October/003920.html this doesn't necessarily rely on SimdScope (to be renamed SimdMode?) but is very closely related... TBD initial version: https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part/simd_scope.py;h=a7f355d766d69ac52c32c9465955a36fa41d45dc;hb=96f15c29e57a41f1cdd480493da25826d84f7153
two key commits, here: 1) https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=cd06267946b1488104ddc2a5960f6b7002f322ec a) removing simd_full_width_hint because it's down to individual Signals (SimdSignals) to determine and explicitly set - if needed - their explicit width b) adding a module context. this is REALLY important. firstly that the Type 1 (AST) casting rules for the module need to be declared as "Simd-aware" secondly that every single SimdSignal created must know what its module is, in order for it to be capable of creating sub-modules. 2) https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=f1b7f195e21c3f7434a96520ebdb8484b696e54b a) added a docstring / bare-minimum usage b) added a Signal() function which either redirects to nmigen Signal or SimdSignal as appropriate (2b) also shows - finally - in a concrete way - why SimdShape had to derive from Shape. unless it did so, there was no way that the SimdScope context could transparently switch over between scalar and simd without one hell of a lot of unnecessary futzing about caveat: must update nmigen libresoc-partsig at least this commit: https://git.libre-soc.org/?p=nmigen.git;a=commitdiff;h=f7a0244e6355c70efb12cb54802cd765323c964a this new function has been added to dsl.Module which *must* be called: 185 186 def _setAstTypeCastFn(self, typefn=None): 187 self._astTypeCast = typefn or Value.cast
just realised it's critically important not to use the SimdSignal cast function in scalar SimdScope mode! https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=1411a10fcf1bc0986fe57bb23144904b67941a4f
i put in quite a few TODO of code-comments that are missing in SimdScope https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=a95f90848c98564d6fcb2ec48c018d2362eed2e4 one in particular stood out: it is critically important NOT to try to treat SimdSignal as if it could potentially be a scalar. this will have massive detrimental implications the idea is to completely transparently switch all code - at compile-time - with an option to issuer_verilog.py and to test_issuer.py - between Simd and Scalar with a single command-line option that goes directly into SimdScope's scalar argument. if that scalar argument is True, the ENTIRE Simd behaviour of SimdScope must get the hell out the way, leaving the entire code base unaltered in every way at every level, exactly as if SimdScope was never even being used. this allows us to test that the changes to the ALUs do in fact not damage the code's scalar behaviour the absolute last thing we need is to radically alter the entire suite of ALUs, encounter a bug, and have absolutely no idea and no way to determine if the bug was down to EITHER: 1) changes to the ALUs 2) the new SimdSignal only by having the ability to swap SimdScope and SimdScope RIGHT out the way do we have a means by which the distinction between those two can be made. that, and the amount of code created will be s*** on gate count for scalar, punishing test_issuer in scalar mode completely unnecessarily.
(In reply to Luke Kenneth Casson Leighton from comment #3) > 1) changes to the ALUs > 2) the new SimdSignal by moving the scalar-emulation to a util function there are 3 tests that can be done. actually, 4: 1) existing scalar 2) scalar-emulation 3) SIMD with elwidth=0b00 4) SIMD with other elwidths (vs scalar with an equivalent XLEN) this last one will be fun / interesting, and justifies doing scalar XLEN=32/16/8 HDL, fascinatingly. so, please don't *delete* the scalar-emulation, move it to a util function that can be used during establishing PSpec at runtime/config. should be very interesting and useful.
in https://bugs.libre-soc.org/show_bug.cgi?id=716#c17 it was determined that mismatched width/partitionpoints needs an adapter that would adjust automatically the input operands. this *may* include zero/sign-extension in the case of arithmetic/logical ops. it has implications for the ElwidPartType because ElwidPartType needs to use the correct element width per partition, including padding pieces, to ensure that for each elwid case the total width is always the same.
elwid | | | | | | 0b00 x x x x x x x x x |B7 B6B5B4 B3B2 B1B0A2A1A0 0b01 x x x |B7B6 B5B4AaA9 A8|x x x |B3B2 B1B0A2A1A0 0b10 B7B6Ae AdAc|B5B4AaA9 A8|B3B2A6 A5A4|B1B0A2A1A0 this example, from the slice page, is showing up an oversight in PartitionedCat (and other places that have a get_chunk function) https://libre-soc.org/3d_gpu/architecture/dynamic_simd/slice/ the issue is that PartitionedCat etc. were not designed with disparate PartitionPoints in mind, and the case of a fixed-overall-width and a fixed-element-width is breaking the assumptions. a workaround _would_ be to convert all of the input parameters (PartitionedCat.catlist) to the exact same partitionpoints but i am currently unable to see how to do that. determining the new lane_shapes is easy enough: just add up the widths at each elwidth of every contributor in catlist, create a new lane_shapes, job done. the issue is: even with those new lane_shapes after creating a new PartitionPoints, how the heck do you convert each one of the catlist contributors to the same layout, when each contributor is smaller and would fit in totally different places? i don't believe i'm overthinking this. it may need a design shift in PartitionedCat itself. need to look at it some more.
here is the relevant code section from PartitionedCat: 110 for pbit in self.ptype.get_cases(): 111 # set up some indices pointing to where things have got 112 # then when called below in the inner nested loop they give 113 # the relevant sequential chunk 114 output = [] 115 y = [0] * len(self.catlist) 116 # get a list of the length of each partition run 117 runlengths = get_runlengths(pbit, len(keys)) 118 print ("pbit", bin(pbit), "runs", runlengths) 119 for i in runlengths: # for each partition 120 for yidx in range(len(y)): 121 thing = self.get_chunk(y, yidx, i) # sequential chunks 122 output.append(thing) the core of PartitionedCat is lines 119 and 120, which create the Cat()ed result on a per-elwidth basis (line 110) for every element (line 119) in the partition, where the actual Cat on a per-element basis is the loop at line 120. what is missing from this picture is the arbitrary sizes and occurrences of padding blocks. to that end, get_chunk needs to move into ElwidPartType, and it needs to be possible to determine that the "things" being created do not necessarily match perfectly with the expected width, and to pad them. still thinking it through
https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=f4af452829a8bbe686a5ae34f1ddc362882c9d86 i have a feeling that PartitionedCat needs a redesign to take into consideration the lengths of each individual element, from which, in combination with the length of each output element, it becomes possible to calculate the length of any missing bits, aka "padding". these would explicitly be set to zero. another way to do it would be to use targetted slices: out[start:end].eq(chunk)
Luke, I already wrote a __Cat__ implementation that does everything we need in: https://bugs.libre-soc.org/show_bug.cgi?id=731#c8 No need to spend a bunch of time trying to figure out how to do it yourself.
(In reply to Jacob Lifshay from comment #9) > Luke, I already wrote a __Cat__ implementation that does everything we need > in: > https://bugs.libre-soc.org/show_bug.cgi?id=731#c8 i know: and to remind you, i said that it is a regression, and its immediate deployment is too disruptive: there is no "from where things are" to "where it is" path. the code that you wrote has no understanding of PartType which makes it a disruptive regression and places it firmly in version 2 category. > No need to spend a bunch of time trying to figure out how to do it yourself. well, then, please help by making PartitionedCat (the code that has unit tests and is embedded into the existing 6 month tested code), work instead of forcing me to duplicate effort in an area that i am not good at. you could if you chose to do this in minutes, i am getting quite irritated that you leave me to struggle with algorithmic issues that take me literally ten or greater times longer due to a form of dyslexia.
(In reply to Jacob Lifshay from comment #9) > Luke, I already wrote a __Cat__ implementation that does everything we need > in: > https://bugs.libre-soc.org/show_bug.cgi?id=731#c8 > > No need to spend a bunch of time trying to figure out how to do it yourself. the only reason i am FORCED to, as you put it, "spend a bunch of time" is because you are NOT helping by doing what i have told you repeatedly is required. please HELP to fix the EXISTING code in an INCREMENTAL fashion i have stated very clearly multiple times that a project of this size requires an incremental coding policy. this is a mandatory requirement that is not going to be revoked. no exceptions. please accept this policy and do not fight it directly or indirectly by leaving me to struggle with algorithmic issues that you find so easy "because your version 2 solution is better". you wrote code that does not fit with existing code. therefore if it doesn't fit it is *automatically* rejected. please get this straight and start helping.
i've thrown together a SimdShape class and started tying it in. https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=b5ed5a2edf34115e830605370cf5397481b7ea13 https://git.libre-soc.org/?p=ieee754fpu.git;a=commitdiff;h=b9657e88aa2b50d117c86b88c6d6186b3c91bf8c the fundamental assumption is that you would never create a SimdShape without also putting it under a SimdScope ContextManager. have to see how that turns out to make things "look" like they're the same as Scalar i've also added SimdScope.Shape commit 8cb1ccf1de1ca0166b0c62fcc46d5011df218eb2 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Mon Oct 25 12:10:46 2021 +0100 add SimdScope.Shape redirector which switches from scalar to simd behaviour depending on context. for compatibility with nmigen Shape() the width parameter is passed through to widths_at_elwid in SimdShape, allowing the layout() function the opportunity to turn a scalar width into fixed element widths at all elwids. it may actually turn out to be better to have a "fixed_width" parameter for SimdShape. hmmm... (edit:) commit ff455da2a1e30a17c74ffd06ec5d0dcd53953093 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Mon Oct 25 12:19:04 2021 +0100 rename the arguments to SimdShape() so as to match up with Shape() params
i cut out an entire swathe of code which creates massive unnecessary complications, replacing nearly 100 lines of complex interlocking type analysis and conversions with *three* lines of code. please write code that is simple and adapt it *only when it proves necessary* (with incremental commits) commit 7446824ba1944d822ae07ec826fd664bbdbf8b43 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Mon Oct 25 13:14:06 2021 +0100 remove unnecessary code which creates complications from SimdScope constructor. elwid is mandatory, vec_el_counts is mandatory. no need for complicated types, complicated adaptation, complicated interlocking conditional behaviour. pass in elwid (Signal) pass in vec_el_counts. done.
ha! it works! commit fff6d9ab2de5700103e0f9c1bb9fab64233f1783 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Mon Oct 25 14:28:55 2021 +0100 had to add fixed_width parameter temporarily to confirm that the minitest, test_partsig_scope.py, worked (which it did). now can work out how to remove it now, the next incremental phases are: 1) work out how to remove the SimdScope.Signal(fixed_width) parameter 2) probably by creating a SimdShape(fixed_width=nnn) 3) which is passed in as the 1st parameter to SimdScope.Signal 4) which in the minitest makes it clear why rudimentary arithmetic is needed in SimdScope with SimdScope(self.m, elwid, vec_el_counts) as s: # BE CAREFUL with the fixed_width parameter. # it is NOT available in SimdScope.scalar mode self.a = s.Signal(fixed_width=width) self.b = s.Signal(fixed_width=width*2) self.o = s.Signal(fixed_width=width*3) ==> with SimdScope(self.m, elwid, vec_el_counts) as s: shape = s.Scope(fixed_width=width) self.a = s.Signal(shape) self.b = s.Signal(shape*2) # simple __mul__ needed self.o = s.Signal(shape*3) # again, simple __mul__ needed
done: commit 6edd0e1e0732b3e9781d3c4d8d3830d475cd4a50 Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Mon Oct 25 14:35:57 2021 +0100 use explicit SimdShape for minitest example rather than fixed_width parameter + shape = SimdShape(s, fixed_width=width) + shape2 = SimdShape(s, fixed_width=width*2) + shape3 = SimdShape(s, fixed_width=width*3) + self.a = s.Signal(shape) + self.b = s.Signal(shape2) # TODO: shape*2 + self.o = s.Signal(shape3) # TODO: shape*3 TODO: the multiply part. basically, SimdShape should (multiply-inherit) derive from SimdMap, *or* (probably better) the functions *of* SimdMap go into SimdShape. jacob you had probably best do that, but *please do not go overboard*. also please do not make SimdShape critically depend on any types. do not alter it so that it critically depends on the enum Elwid. if necessary start again with the absolute basics, ignoring SimdMap entirely and *only* add in the *absolute* minimum rudimentary functions needed, this time. remember: write code *on-demand* - NOT "i think this might be needed so am going to write 1,000 lines of code that has no unit tests and no feet-on-the-ground justification for its existence" show the need *first*, create code *second*, ok?
i have two functions added to PartType which are a new API, addressing elements by their elwid and by the element number and returning the range within the underlying Signal. now it becomes possible, for an ElwidPartType version, to accumulate the pieces to be Cat()ed together, and to check the length. if it does not fit exactly the expected length of that partition, add some padding zeros so that it does.