Bug 1242 - SV REMAP: store REMAP indices in 4 groups of 16 64-bit SPRs (or registers)
Summary: SV REMAP: store REMAP indices in 4 groups of 16 64-bit SPRs (or registers)
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification (show other bugs)
Version: unspecified
Hardware: Other Linux
: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 1211
  Show dependency treegraph
 
Reported: 2024-01-02 00:27 GMT by Luke Kenneth Casson Leighton
Modified: 2024-01-02 10:05 GMT (History)
2 users (show)

See Also:
NLnet milestone: NLnet.2023-12-059.expansion
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2024-01-02 00:27:46 GMT
there is a problem with using GPRs for indices
in svindex, but also a lack of flexibility in
the REMAP shortcut instructions.

this is solved by having a separate regfile for
REMAP Indices (all of them) where svshape2/3/4
svoffset and svindex all rewrite them, but they
can be overwritten simply by using mtspr (or sv.mtspr)

this solves a major problem of GPR Hazards.
Comment 1 Jacob Lifshay 2024-01-02 00:34:32 GMT
if there's separate registers, i think it makes it easier to handle out-of-range indexes because svindex can simply do

for i in range(VL):
    v = READ_INPUT(i)
    index[i] = 0 if v >= VL else v
Comment 2 Jacob Lifshay 2024-01-02 00:36:43 GMT
though maybe it should be sv.index -- a standard SV loop over a scalar instruction loading the index SPRs one byte at a time
Comment 3 Jacob Lifshay 2024-01-02 00:49:54 GMT
(In reply to Jacob Lifshay from comment #1)
> if there's separate registers, i think it makes it easier to handle
> out-of-range indexes because svindex can simply do
> 
> for i in range(VL):
>     v = READ_INPUT(i)
>     index[i] = 0 if v >= VL else v

to be clear, this means that out of range indexes cause you to access element 0 instead.
e.g. if the input indexes are [1, 2345, 0, 2]
and you run:
sv.index *input
# svremap something set *in2 to be remapped
sv.mv *out, *in2

then out = [in2[1], in2[0], in2[0], in2[2]]
Comment 4 Luke Kenneth Casson Leighton 2024-01-02 02:36:12 GMT
(In reply to Jacob Lifshay from comment #3)

> to be clear, this means that out of range indexes cause you to access
> element 0 instead.

no. undefined. and the spec is not changing in the scope of this task.

please keep bugreports specifically to scope and on task.

i literally just requested that you observe this hard
requirement just 12 hours ago, having had to repeat it to
you multiple times systematically now for over three years.

i am shocked that you have not listened when i very clearly
outlined it to you, as has David, so many times.
Comment 5 Luke Kenneth Casson Leighton 2024-01-02 02:43:11 GMT
(In reply to Jacob Lifshay from comment #1)

> if there's separate registers, 

there are other reasons for doing a separate register file
that are high priority. one of them being the sheer number
of SPRs. the indices need "association" with an SVSHAPE,
and there are 4 SVSHAPEs. that in turn means QTY 128 indices,
QTY 4of. that's a whopping *512* bytes just for Indices in
a single regfile.

divided by 8 (bytes) that's 64 x 64-bit which is an alarming
64 SPRs, which is just not happening.

deep breath this in turn means adding mtidx and mfidx as two new
scalar instructions.

also to consider is how to reduce the amount of context switch latency.
a length encoding would do the trick, stored in the 128th register,
given that VL goes from 0 to 127.
Comment 6 Jacob Lifshay 2024-01-02 03:08:09 GMT
(In reply to Luke Kenneth Casson Leighton from comment #5)
> also to consider is how to reduce the amount of context switch latency.
> a length encoding would do the trick, stored in the 128th register,
> given that VL goes from 0 to 127.

that's a good idea, except that it needs to *not* be the 128th register, since we need to plan ahead for when we'll have more than 127 as the max length. so, it could work to put the length in register -1, and just use registers 0-126 as indexes...that's equivalent to putting the length in register 0 and having register i + 1 be for element number i.

also, my idea for when we have to support indexes >=256 is we just add another bank of registers called idxhi or something that stores the msb 8 bits for every element and the old idx registers store the lsb 8 bits.
Comment 7 Jacob Lifshay 2024-01-02 03:17:36 GMT
(In reply to Luke Kenneth Casson Leighton from comment #4)
> (In reply to Jacob Lifshay from comment #3)
> 
> > to be clear, this means that out of range indexes cause you to access
> > element 0 instead.
> 
> no. undefined. and the spec is not changing in the scope of this task.
> 
> please keep bugreports specifically to scope and on task.

ok, created bug #1243

I'll note that out-of-range behavior is closely related, because this bug is most likely defining new instructions for loading the idx registers (the sv.index i proposed and the sv.mtidx you proposed), and the out-of-range behavior is part of those instructions.
Comment 8 Luke Kenneth Casson Leighton 2024-01-02 10:01:27 GMT
(In reply to Jacob Lifshay from comment #6)

> that's a good idea, except that it needs to *not* be the 128th register,
> since we need to plan ahead for when we'll have more than 127 as the max
> length.

the extra bits on 128th can expand the length in its MSB just like
the additional bank expands the indices to 16-bit.

> also, my idea for when we have to support indexes >=256 is we just add
> another bank of registers called idxhi or something that stores the msb 8
> bits for every element and the old idx registers store the lsb 8 bits.

yes. awkward but doable, like the ZX Spectrum screen which did 1st row
of 1st 8 rows then 2nd row of 1st 8 rows - really odd but hardware does
not care.

(In reply to Jacob Lifshay from comment #7)

> ok, created bug #1243

i've closed it as invalid, if you are not familiar with why please
ask on that invalid bugreport (only), do not re-open it or spend
significant time on it, we have too much to do and very little time.
Comment 9 Jacob Lifshay 2024-01-02 10:05:51 GMT
(In reply to Luke Kenneth Casson Leighton from comment #8)
> (In reply to Jacob Lifshay from comment #6)
> 
> > that's a good idea, except that it needs to *not* be the 128th register,
> > since we need to plan ahead for when we'll have more than 127 as the max
> > length.
> 
> the extra bits on 128th can expand the length in its MSB just like
> the additional bank expands the indices to 16-bit.

you missed the point, which is that when you set VL=200, which register does it use for the 128th element? the 128th register, of course. this conflicts with using that register for something other than an index.