there is a problem with using GPRs for indices in svindex, but also a lack of flexibility in the REMAP shortcut instructions. this is solved by having a separate regfile for REMAP Indices (all of them) where svshape2/3/4 svoffset and svindex all rewrite them, but they can be overwritten simply by using mtspr (or sv.mtspr) this solves a major problem of GPR Hazards.
if there's separate registers, i think it makes it easier to handle out-of-range indexes because svindex can simply do for i in range(VL): v = READ_INPUT(i) index[i] = 0 if v >= VL else v
though maybe it should be sv.index -- a standard SV loop over a scalar instruction loading the index SPRs one byte at a time
(In reply to Jacob Lifshay from comment #1) > if there's separate registers, i think it makes it easier to handle > out-of-range indexes because svindex can simply do > > for i in range(VL): > v = READ_INPUT(i) > index[i] = 0 if v >= VL else v to be clear, this means that out of range indexes cause you to access element 0 instead. e.g. if the input indexes are [1, 2345, 0, 2] and you run: sv.index *input # svremap something set *in2 to be remapped sv.mv *out, *in2 then out = [in2[1], in2[0], in2[0], in2[2]]
(In reply to Jacob Lifshay from comment #3) > to be clear, this means that out of range indexes cause you to access > element 0 instead. no. undefined. and the spec is not changing in the scope of this task. please keep bugreports specifically to scope and on task. i literally just requested that you observe this hard requirement just 12 hours ago, having had to repeat it to you multiple times systematically now for over three years. i am shocked that you have not listened when i very clearly outlined it to you, as has David, so many times.
(In reply to Jacob Lifshay from comment #1) > if there's separate registers, there are other reasons for doing a separate register file that are high priority. one of them being the sheer number of SPRs. the indices need "association" with an SVSHAPE, and there are 4 SVSHAPEs. that in turn means QTY 128 indices, QTY 4of. that's a whopping *512* bytes just for Indices in a single regfile. divided by 8 (bytes) that's 64 x 64-bit which is an alarming 64 SPRs, which is just not happening. deep breath this in turn means adding mtidx and mfidx as two new scalar instructions. also to consider is how to reduce the amount of context switch latency. a length encoding would do the trick, stored in the 128th register, given that VL goes from 0 to 127.
(In reply to Luke Kenneth Casson Leighton from comment #5) > also to consider is how to reduce the amount of context switch latency. > a length encoding would do the trick, stored in the 128th register, > given that VL goes from 0 to 127. that's a good idea, except that it needs to *not* be the 128th register, since we need to plan ahead for when we'll have more than 127 as the max length. so, it could work to put the length in register -1, and just use registers 0-126 as indexes...that's equivalent to putting the length in register 0 and having register i + 1 be for element number i. also, my idea for when we have to support indexes >=256 is we just add another bank of registers called idxhi or something that stores the msb 8 bits for every element and the old idx registers store the lsb 8 bits.
(In reply to Luke Kenneth Casson Leighton from comment #4) > (In reply to Jacob Lifshay from comment #3) > > > to be clear, this means that out of range indexes cause you to access > > element 0 instead. > > no. undefined. and the spec is not changing in the scope of this task. > > please keep bugreports specifically to scope and on task. ok, created bug #1243 I'll note that out-of-range behavior is closely related, because this bug is most likely defining new instructions for loading the idx registers (the sv.index i proposed and the sv.mtidx you proposed), and the out-of-range behavior is part of those instructions.
(In reply to Jacob Lifshay from comment #6) > that's a good idea, except that it needs to *not* be the 128th register, > since we need to plan ahead for when we'll have more than 127 as the max > length. the extra bits on 128th can expand the length in its MSB just like the additional bank expands the indices to 16-bit. > also, my idea for when we have to support indexes >=256 is we just add > another bank of registers called idxhi or something that stores the msb 8 > bits for every element and the old idx registers store the lsb 8 bits. yes. awkward but doable, like the ZX Spectrum screen which did 1st row of 1st 8 rows then 2nd row of 1st 8 rows - really odd but hardware does not care. (In reply to Jacob Lifshay from comment #7) > ok, created bug #1243 i've closed it as invalid, if you are not familiar with why please ask on that invalid bugreport (only), do not re-open it or spend significant time on it, we have too much to do and very little time.
(In reply to Luke Kenneth Casson Leighton from comment #8) > (In reply to Jacob Lifshay from comment #6) > > > that's a good idea, except that it needs to *not* be the 128th register, > > since we need to plan ahead for when we'll have more than 127 as the max > > length. > > the extra bits on 128th can expand the length in its MSB just like > the additional bank expands the indices to 16-bit. you missed the point, which is that when you set VL=200, which register does it use for the 128th element? the 128th register, of course. this conflicts with using that register for something other than an index.