Bug 553 - svp64 register mapping to accommodate AltiVec vectors expanding fp registers
Summary: svp64 register mapping to accommodate AltiVec vectors expanding fp registers
Status: DEFERRED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification
Version: unspecified
Hardware: Other Linux
Importance: --- enhancement
Assignee: Jacob Lifshay
URL:
Depends on:
Blocks:
 
Reported: 2020-12-23 18:58 GMT by Jacob Lifshay
Modified: 2021-01-15 00:47 GMT

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Comment 1 Jacob Lifshay 2021-01-13 20:16:15 GMT
(In reply to Luke Kenneth Casson Leighton from bug #558 comment #57)
> (In reply to Jacob Lifshay from bug #558 comment #56)
> 
> > So, to be clear, you're advocating for not using the scheme I proposed just
> > now, or not using the scheme I proposed 18 months ago as part of the SVP for
> > RISC-V spec?
> 
> i'd really like to use both (dynamically), that was what the CR8x8 matrix
> concept was.  there is room to overload elwidth to do it... however the
> implications for the DMs are so complex that it would be foolish to try as a
> first iteration.
> 
> given that if we *don't* use vertical numbering on CRs we are forced instead
> to add a 1 year delay on the critical path it is clearly unacceptable to use
> the SVP scheme... for CRs

I would argue that we should use vertical numbering for all int/fp/cr register files since that makes for nice consistency as well as having benefits for register allocation.

you could think of it as extending OpenPower v3.1 scalar to have 4-reg vectors at every int/fp reg and 8-element vectors at every CR field.

This means we don't have to extend gcc's register allocator to handle ranges for the MVP, *saving several months time*. this will limit gcc for now to handling 8-element vectors or 256-bit vectors, whichever is smaller.
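The "8-element vectors or 256-bit vectors, whichever is smaller" limit can be sketched as follows. This is only an illustration: it assumes 64-bit registers, 4 sub-registers per scalar register slot (4 x 64 = 256 bits), and elements packed back to back. The function names and the packing order are hypothetical and are not taken from the SVP64 spec.

```python
# Illustrative sketch only. Assumptions: 64-bit registers, 4 sub-registers per
# scalar register slot, elements packed back to back starting at sub-register 0.
REG_BITS = 64
SUBREGS_PER_SLOT = 4   # "4-reg vectors at every int/fp reg"
MAX_ELEMENTS = 8       # "8-element vectors at every CR field"


def max_vector_length(elwidth_bits):
    """Element count limit: 8 elements or 256 bits, whichever is smaller."""
    if elwidth_bits <= 0 or REG_BITS % elwidth_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    slot_bits = REG_BITS * SUBREGS_PER_SLOT
    return min(MAX_ELEMENTS, slot_bits // elwidth_bits)


def element_location(elwidth_bits, index):
    """Return (sub-register, bit offset) of element `index` inside its slot."""
    if not 0 <= index < max_vector_length(elwidth_bits):
        raise IndexError("element index outside the vector")
    bit = index * elwidth_bits
    return bit // REG_BITS, bit % REG_BITS
```

With these assumptions, 64-bit elements are limited by the 256-bit bound (4 elements), while 32-bit and narrower elements are limited by the 8-element bound.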

> given that it is clearly unacceptable to completely cut off entire swathes
> of the regfile from scalar operations

that's not a valid reason to prefer horizontal since both horizontal and vertical schemes cut off an equal number of registers.

> forcing the use of convoluted
> predicated mv operations if we *do* use vertical numbering on FP and Int
> operations it is clearly unacceptable to use the vertical numbering
> scheme... for FP and INT.

We don't realistically need that many scalar registers, 64 is (more than?) sufficient. the 128 are needed for vector purposes.

> conclusion: vertical numbering for CRs (reluctantly), horizontal numbering
> for INT and FP.

I disagree for the above reasons.
Comment 2 Luke Kenneth Casson Leighton 2021-01-14 13:38:39 GMT
(In reply to Jacob Lifshay from comment #1)

> I would argue that we should use vertical numbering for all int/fp/cr
> register files since that makes for nice consistency as well as having
> benefits for register allocation.

jacob: i already made it clear that the complexity in understanding the fractional numbering is too high.  it was almost two weeks before i understood it.

the only reason for considering it for CRs is because we're forced to.  and CRs are Hell anyway, with the low 2 bits not being incremented through.


> you could think of it as extending OpenPower v3.1 scalar to have 4-reg
> vectors at every int/fp reg and 8-element vectors at every CR field.
> 
> This means we don't have to extend gcc's register allocator to handle ranges
> for the MVP, *saving several months time*.

i know.  i'm not happy about it.  but:
a) tough.  the hardware is too complex.  i have said it five or six times now: i am not redesigning the routing on the regfile.

b) there exists some explicit control over gcc fp/int regs where there is NONE on CRs.  as a concept they do not exist AT ALL in the frontend.  we are therefore FORCED to reluctantly use an alternative scheme.

> this will limit gcc for now to
> handling 8-element vectors or 256-bit vectors, whichever is smaller.

you are completely forgetting about the hardware design.  i do not want at this incredibly late stage to THINK about any kind of FP/INT redesign involving renumbering.

the only reason i'm considering it at all is because CRs are only 4 bits.  those can be batched up in groups of 8 which is only 32 wires.
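The arithmetic behind "groups of 8 which is only 32 wires" is 8 fields x 4 bits = 32 bits, which is also the width of the full Condition Register that mfcr reads. A minimal sketch, assuming field 0 sits in the most significant nibble (as with CR0 in the Power ISA); the helper names are hypothetical:

```python
# Illustrative sketch only: batching eight 4-bit CR fields into one 32-bit word.
# Assumption: field 0 occupies the most significant nibble.
CR_FIELDS = 8
CR_FIELD_BITS = 4
CR_WORD_BITS = CR_FIELDS * CR_FIELD_BITS  # 32 wires for one batch


def pack_cr(fields):
    """Pack eight 4-bit values into a single 32-bit integer."""
    if len(fields) != CR_FIELDS:
        raise ValueError("expected exactly 8 CR fields")
    word = 0
    for value in fields:
        if not 0 <= value < (1 << CR_FIELD_BITS):
            raise ValueError("each CR field must fit in 4 bits")
        word = (word << CR_FIELD_BITS) | value
    return word


def unpack_cr(word):
    """Split a 32-bit integer back into eight 4-bit fields, field 0 first."""
    mask = (1 << CR_FIELD_BITS) - 1
    return [
        (word >> (CR_FIELD_BITS * (CR_FIELDS - 1 - i))) & mask
        for i in range(CR_FIELDS)
    ]
```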

the DMs are going to be shit, basically.  supporting mfcr is going to be a huge penalty on performance and require a rewrite of the whole CR pipeline.


 
> > given that it is clearly unacceptable to completely cut off entire swathes
> > of the regfile from scalar operations
> 
> that's not a valid reason to prefer horizontal since both horizontal and
> vertical schemes cut off an equal number of registers.

it's not about the quantity.

the way in which they are cut off forces the use of additional expensive instructions.


> > forcing the use of convoluted
> > predicated mv operations if we *do* use vertical numbering on FP and Int
> > operations it is clearly unacceptable to use the vertical numbering
> > scheme... for FP and INT.
> 
> We don't realistically need that many scalar registers, 64 is (more than?)
> sufficient. the 128 are needed for vector purposes.

having a system where interaction between the two PUNISHES developers for doing so is not going to fly.

i'm not running that by the OPF ISA WG.

i am really sorry, this one is also invalid.

we are under time pressure and we are wasting time discussing this.

we are not going to be adding 128 bit or VSX any time in the next 2+ years.

the INT/FP regfile design and routing is extremely complex, was done months ago, is *specifically* targeted at 64 bit and cannot change.

none of us can earn any donations from NLnet for continuing to discuss this.

can we please stop discussing this, i am getting very fed up of repeating myself, and getting very concerned that i am not earning any money for having to repeatedly go over something that is not going to happen.

can we PLEASE move on to implementation.

you keep putting me under huge pressure repeatedly by asking again and again for something that i have already said no to multiple times.  i appreciate that you want to do a full investigation but this is getting too much.  we HAVE to stop, i cannot cope.
Comment 3 Jacob Lifshay 2021-01-14 18:45:44 GMT
Ahh, so gcc already supporting contiguous register ranges in the register allocator combined with avoiding reworking existing instructions/ABIs in gcc that use register pairs finally sounds like a good enough reason to me to not implement #553. Let's just hope it can efficiently allocate large ranges without n^2 or n^3 runtime :)
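For a sense of what "without n^2 or n^3 runtime" could look like, here is a minimal sketch of finding one free contiguous run of registers in a single linear pass. It is not gcc's allocator and says nothing about how gcc actually behaves; the function name and the boolean in-use list are hypothetical.

```python
# Illustrative sketch only, not gcc's algorithm: find the first run of `length`
# consecutive free registers in one O(n) pass over an in-use list.
def find_free_range(in_use, length):
    """Return the start index of the first free run of `length`, or None."""
    if length <= 0:
        raise ValueError("length must be positive")
    run = 0  # number of consecutive free registers seen so far
    for index, used in enumerate(in_use):
        run = 0 if used else run + 1
        if run == length:
            return index - length + 1
    return None
```

For example, with 128 registers where only registers 0 and 5 are in use, a request for 4 registers returns 1 and a request for 8 returns 6.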
Comment 4 Luke Kenneth Casson Leighton 2021-01-15 00:29:04 GMT
apologies, jacob, the information i am holding in my head (unimplemented) is becoming beyond my capacity to explain.  coupled with the difference between electrical and chemical neural recall (chemical is long-term and often difficult to access) i am basically getting symptoms best known by the phenomenon "writer's block".

in essence i "know" something is "not right" but am literally unable to say why because my memory recall is not responding immediately, and due to the length of time that has gone by on some of the details (2 years) it may actually be *several days* before details emerge sufficiently to be able to *begin* to describe them to you.

bottom line is that when you push and push and push basically demanding an exact and precise response *i cannot give you one* and this is terribly frustrating for me, not to be able to speak and give you the "exact" answer that you expect.

the only way that this is going to work is if implementation proceeds *right now*, without further delay, getting the core details out into code that can be reviewed, understood, and incrementally adjusted accordingly.
Comment 5 Jacob Lifshay 2021-01-15 00:47:24 GMT
(In reply to Luke Kenneth Casson Leighton from comment #4)
> the only way that this is going to work is if implementation proceeds *right
> now*, without further delay, getting the core details out into code that can
> be reviewed, understood, and incrementally adjusted accordingly.

Ok, then we should start implementing stuff! If you write it all out in code, it will likely become easier to think about! Writing it out can be one of the ways to think through the consequences.

The changes that implementing this bug report would require are pretty localized to the decoder anyway, so, if we decide that we need it (which we may want despite the extra work in gcc due to needing to efficiently support 128-bit data values for AES/SHA256/etc.), it should be pretty easy to add on afterwards.

One future option (that you don't need to think about now -- just start writing code) is to instead conceptually base the SV ISA on 128-bit registers, where all integer and fp registers are rearranged into 64 128-bit registers (instead of 128 64-bit registers) and standard 64-bit scalar operations just operate on the lower 64 bits. Vector operations tack a sequence of 128-bit registers together to form the backing storage for SV vectors.
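A rough model of that option, as a sketch only: 64 registers of 128 bits each, scalar accesses touching the lower 64 bits, and vectors backed by a run of consecutive 128-bit registers. The class and method names are hypothetical, and two details are assumptions not stated above: a scalar write preserves the upper 64 bits, and the first register of a vector holds the least significant bits.

```python
# Illustrative sketch only of a 64 x 128-bit register file.
# Assumptions: scalar writes preserve the upper 64 bits; in a vector, the
# first register holds the least significant 128 bits.
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


class RegFile128:
    def __init__(self, nregs=64):
        self.regs = [0] * nregs

    def read_scalar(self, n):
        """Standard 64-bit scalar read: lower 64 bits only."""
        return self.regs[n] & MASK64

    def write_scalar(self, n, value):
        """Standard 64-bit scalar write: replace the lower 64 bits."""
        upper = (self.regs[n] >> 64) << 64
        self.regs[n] = upper | (value & MASK64)

    def read_vector(self, start, count):
        """Tack `count` 128-bit registers together into one wide value."""
        value = 0
        for i in range(count):
            value |= self.regs[start + i] << (128 * i)
        return value

    def write_vector(self, start, count, value):
        """Spread a wide value across `count` consecutive 128-bit registers."""
        for i in range(count):
            self.regs[start + i] = (value >> (128 * i)) & MASK128
```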

I'm marking this bug as deferred, since we *do* need to think about it later, just not right now, instead of saying we'll never do it.

We can defer this till after we have an initial working cpu design.