Bug 1243 - when loading index registers, have defined behavior for out of range indexes
Summary: when loading index registers, have defined behavior for out of range indexes
Status: DEFERRED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks:
 
Reported: 2024-01-02 03:13 GMT by Jacob Lifshay
Modified: 2024-01-02 11:22 GMT
2 users

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:



Description Jacob Lifshay 2024-01-02 03:13:25 GMT
(In reply to Jacob Lifshay from bug #1242 comment #3)
> (In reply to Jacob Lifshay from bug #1242 comment #1)
> > if there's separate registers, i think it makes it easier to handle
> > out-of-range indexes because svindex can simply do
> > 
> > for i in range(VL):
> >     v = READ_INPUT(i)
> >     index[i] = 0 if v >= VL else v
> 
> to be clear, this means that out of range indexes cause you to access
> element 0 instead.
> e.g. if the input indexes are [1, 2345, 0, 2]
> and you run:
> sv.index *input
> # svremap something set *in2 to be remapped
> sv.mv *out, *in2
> 
> then out = [in2[1], in2[0], in2[0], in2[2]]
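The quoted clamp-to-element-0 rule can be sketched as a small Python model. This is only an illustration: `svindex_clamp` and `remapped_mv` are hypothetical helper names (not spec pseudocode), and the `in2` contents are made up. Using the example indexes [1, 2345, 0, 2] with VL=4 reproduces the result quoted above.

```python
def svindex_clamp(inputs, vl):
    """Hypothetical model of the proposed svindex rule:
    any index >= VL is replaced by 0 (i.e. it accesses element 0)."""
    return [0 if v >= vl else v for v in inputs[:vl]]


def remapped_mv(src, indices):
    """Hypothetical model of `sv.mv *out, *in2` with *in2 remapped by indices."""
    return [src[i] for i in indices]


in2 = ["a", "b", "c", "d"]  # made-up contents of in2[0..3]
idx = svindex_clamp([1, 2345, 0, 2], vl=4)
out = remapped_mv(in2, idx)
print(idx)  # [1, 0, 0, 2]
print(out)  # ['b', 'a', 'a', 'c'], i.e. [in2[1], in2[0], in2[0], in2[2]]
```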
Comment 1 Jacob Lifshay 2024-01-02 03:25:08 GMT
an alternative idea is to have out-of-range indexes become -1, which is a special marker meaning that register writes are ignored and register reads give zero. this is relatively easy to do: RISC-V, AArch64, CDC 6600, System/360, and more all have a zero register that behaves basically the same, and x86-64 OoO cpus often have a microarchitectural zero register.
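A minimal sketch of the -1 marker semantics, assuming the marker is stored in place of the index and is checked on every access. `ZERO_MARKER`, `svindex_marker`, `read_elem` and `write_elem` are hypothetical names used only for illustration. Note that the explicit check matters in a Python model, because `regs[-1]` would otherwise silently wrap to the last element.

```python
ZERO_MARKER = -1  # hypothetical encoding of "index was out of range"


def svindex_marker(inputs, vl):
    """Replace every out-of-range index (>= VL) with the -1 marker."""
    return [ZERO_MARKER if v >= vl else v for v in inputs[:vl]]


def read_elem(regs, idx):
    # reads through the marker give zero, like a hardwired zero register
    return 0 if idx == ZERO_MARKER else regs[idx]


def write_elem(regs, idx, value):
    # writes through the marker are silently discarded
    if idx != ZERO_MARKER:
        regs[idx] = value


regs = [10, 20, 30, 40]
idx = svindex_marker([1, 2345, 0, 2], vl=4)
print(idx)                                # [1, -1, 0, 2]
print([read_elem(regs, i) for i in idx])  # [20, 0, 10, 30]
for i, v in zip(idx, [100, 200, 300, 400]):
    write_elem(regs, i, v)
print(regs)                               # [300, 100, 400, 40]
```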
Comment 2 Jacob Lifshay 2024-01-02 03:29:07 GMT
(In reply to Jacob Lifshay from comment #1)
> an alternative idea is to have out-of-range indexes become -1

this has the benefit of matching the semantics of wasm's dynamic swizzle, as well as those of ARM, RISC-V and (sort of) x86.
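For comparison, here are Python reference models of WebAssembly `i8x16.swizzle` (any lane index of 16 or more yields 0) and of x86 `PSHUFB` on 128-bit vectors (a lane is zeroed only when bit 7 of its index is set; otherwise only the low 4 bits are used, so indexes 16..127 wrap instead of zeroing). That wrapping is presumably what "sort of x86" refers to. These are plain-Python sketches of the documented semantics, not code taken from either specification.

```python
def i8x16_swizzle(a, s):
    """wasm i8x16.swizzle: lane i = a[s[i]] if s[i] < 16, else 0
    (s[i] treated as an unsigned byte)."""
    assert len(a) == 16 and len(s) == 16
    return [a[j] if j < 16 else 0 for j in (x & 0xFF for x in s)]


def pshufb(a, s):
    """x86 PSHUFB (128-bit): lane i = 0 if bit 7 of s[i] is set,
    otherwise a[s[i] & 0x0F], so indexes 16..127 wrap around."""
    assert len(a) == 16 and len(s) == 16
    return [0 if x & 0x80 else a[x & 0x0F] for x in s]


a = list(range(100, 116))      # source lanes 100..115
s = [3, 20, 200] + [0] * 13    # in range, out of range (no bit 7), bit 7 set
print(i8x16_swizzle(a, s)[:3])  # [103, 0, 0]
print(pshufb(a, s)[:3])         # [103, 104, 0]
```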
Comment 3 Jacob Lifshay 2024-01-02 03:36:53 GMT
(In reply to Jacob Lifshay from comment #1)
> and x86-64 OoO cpus often have a microarchitectural zero register.

more stuff about zero registers: https://yarchive.net/comp/zero_register.html

as you know, ppc sorta has a zero register in that many instructions use zero instead of loading from a GPR when given register number 0.
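The PPC behaviour mentioned here is the Power ISA `(RA|0)` notation. For example, a D-form load/store computes `EA = (RA|0) + D`, where an RA field of 0 means the literal value 0 rather than the contents of GPR0. The following is a minimal sketch of that effective-address rule. The helper name is hypothetical, and `d` is assumed to be the already sign-extended displacement.

```python
MASK64 = (1 << 64) - 1


def ea_d_form(gpr, ra, d):
    """EA = (RA|0) + D: RA field 0 selects the constant 0, not GPR0.
    d is assumed to be the already sign-extended 16-bit displacement."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + d) & MASK64


gpr = [0] * 32
gpr[0] = 0x1234  # ignored whenever the RA field is 0
gpr[5] = 0x1000
print(hex(ea_d_form(gpr, 0, 8)))   # 0x8
print(hex(ea_d_form(gpr, 5, 8)))   # 0x1008
print(hex(ea_d_form(gpr, 5, -8)))  # 0xff8
```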
Comment 4 Luke Kenneth Casson Leighton 2024-01-02 09:53:30 GMT
(In reply to Jacob Lifshay from comment #1)
> an alternative idea is to have out-of-range indexes become -1,

whatever it is it has to be extreme-gate-efficient.

this is the absolute, paramount, top priority.

the indices are right smack in between decode and hazard matrix
filling, and *any* extraneous logic severely damages top speed.

comparators against VL are absolutely out.
comparators against -1 are likewise not a good idea.
even in a sub-2nm CPU such a comparator will be at least a 4-gate cascade.

this is why it is UNDEFINED behaviour.

please drop this extraneous change and keep to the topic at hand
for the third time of requesting in under 24 hours.
Comment 5 Jacob Lifshay 2024-01-02 10:02:13 GMT
(In reply to Luke Kenneth Casson Leighton from comment #4)
> comparators against VL are absolutely out.

the comparison against VL happens in the mtidx instruction, where gate efficiency isn't as critical, *not* when the idx register is used.
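To illustrate where the comparator sits, here is a hypothetical Python model (not spec pseudocode) in which the VL check runs once, when `mtidx` writes the index register. The use-time path, which sits between decode and hazard-matrix filling, is then a plain lookup with no comparison. The class and method names are assumptions, and the clamp-to-0 policy from the bug description is used as the example rule.

```python
class IndexRegs:
    """Hypothetical model: the VL comparison happens only at write time."""

    def __init__(self):
        self.idx = []

    def mtidx(self, values, vl):
        # write-time path: the only place a comparator against VL is needed
        self.idx = [0 if v >= vl else v for v in values[:vl]]

    def remap(self, element):
        # use-time path: no comparison, just read back the sanitised index
        return self.idx[element]


r = IndexRegs()
r.mtidx([1, 2345, 0, 2], vl=4)
print([r.remap(i) for i in range(4)])  # [1, 0, 0, 2]
```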

> please drop this extraneous change and keep to the topic at hand
> for the third time of requesting in under 24 hours.

this bug is the topic at hand because this message is posted on this bug. if you want to talk about aspects other than whether out-of-range indexes should have defined behavior, there is a different bug for that: bug #1242
Comment 6 Jacob Lifshay 2024-01-02 10:28:36 GMT
(In reply to Luke Kenneth Casson Leighton from comment #4)
> comparators against -1 are likewise not a good idea.

a comparison against -1 takes less latency than the addition that indexing mode already has to do anyway. a comparison would have the latency of an 8-input and-gate, which is comparable to a 2-bit adder, and we need at least a 7-bit adder either way.
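Comparing an n-bit field against -1 is simply an n-input AND of its bits. The sketch below models that as a balanced tree of 2-input AND gates and counts the gate levels. For 8 bits this gives ceil(log2(8)) = 3 levels. This is an idealised 2-input-gate count for illustration only. Real cell libraries, fan-in limits and wiring will change the exact depth.

```python
def and_tree(bits):
    """Reduce 0/1 bits with a balanced tree of 2-input AND gates.
    Returns (result, depth), where depth is the number of gate levels."""
    level = list(bits)
    depth = 0
    while len(level) > 1:
        nxt = [level[i] & level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd bit out passes straight to the next level
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


def is_minus_one(value, width=8):
    """True (1) when value, viewed as a width-bit field, is all ones."""
    return and_tree([(value >> i) & 1 for i in range(width)])


print(is_minus_one(0xFF))  # (1, 3)
print(is_minus_one(0x7F))  # (0, 3)
```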

remember all of those can be done in parallel. the only additional latency is that of 1 layer of nand or nor gates (depending on what register number the zero register is given), or it can be merged into the final layer of xor gates for the sum output, giving essentially zero additional latency.

alternatively, instead of having a zero register, it could just redirect that input/output to r0. this might be even more powerful, since it allows loading zero for compatibility as well as any other value you want to stuff in r0. additionally, this alternative only requires 7-bit register numbers instead of sort-of 8-bit ones (7 bits plus the extra -1 marker).
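Here is a sketch of the redirect-to-r0 alternative. It assumes, for illustration only, that the remapped register number is simply a base register plus the element index (element widths are ignored). `svindex_to_r0` and `base_reg` are hypothetical names. All resulting register numbers stay within 7 bits, and whatever value is in r0 is what an out-of-range element reads.

```python
NUM_GPRS = 128  # 7-bit register numbers, as in the comment above


def svindex_to_r0(inputs, vl, base_reg):
    """Hypothetical variant: in-range index v selects register base_reg + v;
    an out-of-range index (v >= vl) selects r0 instead of a zero register."""
    regnums = [base_reg + v if v < vl else 0 for v in inputs[:vl]]
    assert all(0 <= n < NUM_GPRS for n in regnums)  # still fits in 7 bits
    return regnums


gpr = [0] * NUM_GPRS
gpr[32:36] = [7, 8, 9, 10]
regnums = svindex_to_r0([1, 2345, 0, 2], vl=4, base_reg=32)
print(regnums)                    # [33, 0, 32, 34]
print([gpr[n] for n in regnums])  # [8, 0, 7, 9]
gpr[0] = 99                       # any value stuffed into r0 is what gets read
print([gpr[n] for n in regnums])  # [8, 99, 7, 9]
```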
Comment 7 Jacob Lifshay 2024-01-02 11:22:43 GMT
I'm dropping this for now, but i think it's worth reconsidering at some later point, at least the change to mtidx (which doesn't modify fetch/decode/etc. at all), so i'm marking this deferred.