(In reply to Jacob Lifshay from bug #1242 comment #3)
> (In reply to Jacob Lifshay from bug #1242 comment #1)
> > if there's separate registers, i think it makes it easier to handle
> > out-of-range indexes because svindex can simply do
> >
> > for i in range(VL):
> >     v = READ_INPUT(i)
> >     index[i] = 0 if v >= VL else v
>
> to be clear, this means that out of range indexes cause you to access
> element 0 instead.
> e.g. if the input indexes are [1, 2345, 0, 2]
> and you run:
> sv.index *input
> # svremap something set *in2 to be remapped
> sv.mv *out, *in2
>
> then out = [in2[1], in2[0], in2[0], in2[2]]
an alternative idea is to have out-of-range indexes become -1, which is a special marker that means register writes are ignored and register reads give zero. this is relatively easy to do, all of RISC-V, AArch64, CDC 6600, System/360, and more have a zero register that behaves basically the same, and x86-64 OoO cpus often have a microarchitectural zero register.
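a rough Python sketch of the -1 marker idea, for illustration only. all names here (ZERO_MARKER, make_index, remapped_read, remapped_write) are hypothetical and not taken from any spec; it assumes the marker is substituted when the index values are created, and that "in range" means 0 <= v < VL.

```python
# Hypothetical marker value meaning "behave like a zero register".
ZERO_MARKER = -1


def make_index(inputs, vl):
    """Replace every out-of-range index with the marker (assumed to happen once, at index creation)."""
    return [v if 0 <= v < vl else ZERO_MARKER for v in inputs]


def remapped_read(regs, index):
    """A read through a marker index yields zero; other reads are ordinary element reads."""
    return [0 if i == ZERO_MARKER else regs[i] for i in index]


def remapped_write(regs, index, values):
    """A write through a marker index is ignored; other writes land on regs[i]."""
    out = list(regs)
    for i, val in zip(index, values):
        if i != ZERO_MARKER:
            out[i] = val
    return out


index = make_index([1, 2345, 0, 2], vl=4)
print(index)                                   # [1, -1, 0, 2]
print(remapped_read([10, 11, 12, 13], index))  # [11, 0, 10, 12]
```

with the input indexes [1, 2345, 0, 2] from the earlier example, the out-of-range entry reads as zero instead of aliasing element 0.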
(In reply to Jacob Lifshay from comment #1)
> an alternative idea is to have out-of-range indexes become -1

this has the benefit of matching the semantics of wasm's dynamic swizzle, as well as ARM and RISC-V and sorta x86.
(In reply to Jacob Lifshay from comment #1)
> and x86-64 OoO cpus often have a microarchitectural zero register.

more stuff about zero registers:
https://yarchive.net/comp/zero_register.html

as you know, ppc sorta has a zero register in that many instructions use zero instead of loading from a GPR when given register number 0.
(In reply to Jacob Lifshay from comment #1)
> an alternative idea is to have out-of-range indexes become -1,

whatever it is it has to be extreme-gate-efficient. this is absolute paramount absolute top priority. the indices are right smack in between decode and hazard matrix filling, and *any* extraneous logic severely damages top speed.

comparators against VL are absolutely out.

comparators against -1 are likewise not a good idea. a sub-2nm CPU will have at least a 4 gate cascade.

this is why it is UNDEFINED behaviour.

please drop this extraneous change and keep to the topic at hand for the third time of requesting in under 24 hours.
(In reply to Luke Kenneth Casson Leighton from comment #4)
> comparators against VL are absolutely out.

the comparison against VL happens in the mtidx instruction, where gate efficiency isn't as critical, *not* when the idx register is used.

> please drop this extraneous change and keep to the topic at hand
> for the third time of requesting in under 24 hours.

this bug is the topic at hand because this message is posted on this bug. if you want to talk aspects other than if out-of-range should have defined behavior, there is a different bug for that: bug #1242
(In reply to Luke Kenneth Casson Leighton from comment #4)
> comparators against -1 are likewise not a good idea.

a comparison against -1 takes less latency than the addition that indexing mode already has to do anyway. a comparison would have the latency of an 8-input and-gate, which is comparable to a 2-bit adder, and we need at least a 7-bit adder either way. remember all those can be done in parallel. the only additional latency is that of 1 layer of nand or nor gates (depending on what register number the zero register is given), or can be merged into the final layer of xor gates for the sum output, giving essentially zero additional latency.

alternatively, instead of having a zero register, it could just change that input/output to use r0, which might be even more powerful since it allows loading zero for compatibility, as well as any other value you want to stuff in r0. additionally, this alternative only requires 7-bit register numbers instead of kinda 8-bit.
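to make the above concrete, here is a purely behavioural Python sketch (it models the logic, not gate timing). it assumes 8-bit index values and 7-bit register numbers as described above; ZERO_REG, all_ones and resolve_regnum are hypothetical names, and the value 128 for the zero register is just an illustrative choice.

```python
IDX_WIDTH = 8     # index width, per the "8-input and-gate" above
REG_MASK = 0x7F   # 7-bit architectural register numbers
ZERO_REG = 128    # hypothetical extra "zero register" number (the "kinda 8-bit" case)


def all_ones(idx, width=IDX_WIDTH):
    """Compare against -1 as an AND-reduction over the index bits."""
    result = 1
    for bit in range(width):
        result &= (idx >> bit) & 1
    return result


def resolve_regnum(base, idx, redirect_to_r0=False):
    """Pick the register an indexed operand refers to.

    The sum and the marker check depend only on the inputs, so in
    hardware they could be evaluated side by side; here they are
    simply computed one after the other.
    """
    regnum = (base + idx) & REG_MASK   # the addition indexing needs anyway
    is_marker = all_ones(idx)          # independent of the sum
    if is_marker:
        # Either use a dedicated zero register, or fall back to r0.
        return 0 if redirect_to_r0 else ZERO_REG
    return regnum


print(resolve_regnum(8, 3))                          # 11
print(resolve_regnum(8, 0xFF))                       # 128
print(resolve_regnum(8, 0xFF, redirect_to_r0=True))  # 0
```

the r0 variant never produces a register number outside the 7-bit range, which is the point of the last paragraph above.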
I'm dropping this for now, but i think it's worth reconsidering at least the change to mtidx at some later point (doesn't modify fetch/decode/etc. at all), so i'm marking this deferred.