RFC ls005 (XLEN) needs iteration, review, feedback and questions.
there is a March 2023 (CONFIDENTIAL) OPF ISA WG discussion, permission
needed to copy questions from minutes.
Author: Luke Kenneth Casson Leighton <firstname.lastname@example.org>
Date: Tue Apr 18 16:30:22 2023 +0100
add extsb/h/w example of XLEN
see https://bugs.libre-soc.org/show_bug.cgi?id=1061 and
Author: Jacob Lifshay <email@example.com>
Date: Tue Apr 18 14:04:02 2023 -0700
fix extsb pseudo-code
some of the bit positions had been wrong
(In reply to Jacob Lifshay from comment #2)
> commit 3683d40d020ca785168fb059f75e7159cc904ab1
> Author: Jacob Lifshay <firstname.lastname@example.org>
> Date: Tue Apr 18 14:04:02 2023 -0700
> fix extsb pseudo-code
i have reverted this UNAUTHORISED change.
please consult, discuss, and obtain consent before making changes.
i have now put in corrected and short pseudocode after having
to also spend time reverting unauthorised pseudocode changes.
the purpose of the example pseudocode within this RFC
is to demonstrate bit-level functionality in as immediate
and direct fashion as possible.
Comments from IBM architects regarding LS005:
This is subject matter for which no degree of handwaving is acceptable. Every last detail needs to be spelled out, perhaps not in final words, but in clear, unambiguous descriptions. We would much prefer to scalarize or packed SIMD vectorize into scalar registers using existing Vector/VSX instructions to avoid the many pains associated with, for example, address generation using tiny registers.
Is XLEN a fixed constant for a given implementation, or can it be programmed?
Other complications that need to be addressed include interaction with SPRs, and with SPR-like resources whose fixed lengths are not subject to XLEN, how to do CMODX (self-modifying code), assurances about atomicity of operations that need to be atomic, etc.
Since the MMU depends on 64-bit addresses, it's hard to see anything smaller being acceptable for privileged architecture. It might be possible to get away with 32-bit for problem state. How does this apply to Vector and VSX? Although LibreSoC may not care, the architecture needs to be complete.
Given that this document (ls005) explicitly doesn't settle on final wording for the contentious cases, it is not an RFC, but rather a Formal Proposal.
Overall my opinion is that this would be quite invasive yet not really satisfy the need to specify what happens with SVP64 vectorization. It expresses the potentially shorter element width that SVP64 can specify, but doesn't express doing multiple operations within one register, or the effect of predication, or possible saturating arithmetic, all of which SVP64 can specify.
Also, with SVP64, the value of XLEN that applies to a given instruction is set dynamically, whereas there is no obvious way to control XLEN in the other cases given as justification. Is it a constant of the implementation in those cases?
I also don't think that setting XLEN=32 globally is a good way to describe a 32-bit Power ISA processor, because it (i.e. the current ISA with XLEN set to 32) would be quite incompatible with past 32-bit PowerPC implementations. For example, instructions like cmpw would only operate on 16 bits rather than 32, and you'd have to use cmpd to get a 32-bit comparison (which is an illegal instruction on 32-bit PowerPC CPUs).
It seems that there would be no chance of binary compatibility between an XLEN=32 implementation and an XLEN=64 implementation. In contrast, you can successfully run a binary compiled for an older 32-bit PowerPC implementation on a CPU that conforms to PowerISA v3.1B (at least as far as user-level instructions go).
How do loads and stores work? Does XLEN=32 mean they do half width, or are they exempt? If they only do half width, then they would only load or store half the amount of data that programmers would expect from the mnemonic. For example, an lwz instruction would load 16 bits, an lhz would load 8 bits, and I don't know what an lbz would do. That would be quite confusing.
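The reading of XLEN described in the paragraph above can be tabulated with a small sketch. It assumes, as that paragraph does, that every nominal operand width scales by XLEN/64; whether ls005 actually intends this for loads and stores is exactly the open question. The table `NOMINAL_BITS` and the function `effective_width` are illustrative names only, not taken from ls005 or the Power ISA.

```python
# Nominal access widths, in bits, implied by the load mnemonics.
NOMINAL_BITS = {"lbz": 8, "lhz": 16, "lwz": 32, "ld": 64}

def effective_width(mnemonic, xlen):
    """Width in bits a load would access IF widths scale by XLEN/64.

    This scaling rule is an assumption used to illustrate the concern;
    it is not a statement of what ls005 specifies.
    """
    if xlen not in (8, 16, 32, 64):
        raise ValueError("xlen must be 8, 16, 32 or 64")
    return NOMINAL_BITS[mnemonic] * xlen // 64

if __name__ == "__main__":
    for name in NOMINAL_BITS:
        print(name, effective_width(name, 32))
```

Under this assumption, lbz at XLEN=32 comes out as 4 bits, which is not a whole byte and shows why the lbz case has no obvious answer.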
Regarding the notion of grouping registers to get an address, when the size of an address is larger than the size of a register, does that apply to SPRs too? Do SPRs change size according to XLEN? If so, do we then need multiple SPRs for things like the PTCR?
In general, having instructions with "d" for "doubleword" in their name only do 32 (or 16 or 8) bits would be very confusing for programmers (and similarly for "word" and "halfword").
Thinking about this a bit further, the instruction description would not need to mention the possibility of doing multiple elements in a register in the SV case, because the SV iterations would take care of that. What would need to be mentioned, though, is that the bits affected are not always the least significant bits.
In other words, it isn't really sufficient for the RTL to say that bits 64-XLEN:63 of the relevant GPRs are used or modified. It needs to be something like
64 - ((i + 1) * XLEN) : 63 - (i * XLEN)
where i is the element number within the GPR. In the non-SV case, i would always be 0.
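The range above can be checked with a short sketch. It assumes MSB0 bit numbering (bit 0 is the most significant bit of a 64-bit GPR, bit 63 the least significant) and that XLEN is one of 8, 16, 32 or 64. The helper names `element_bit_range` and `extract_element` are hypothetical and are not taken from ls005.

```python
GPR_WIDTH = 64  # width of a GPR in bits

def element_bit_range(i, xlen):
    """Return (first, last) MSB0 bit numbers of element i within a GPR."""
    if xlen not in (8, 16, 32, 64):
        raise ValueError("xlen must be 8, 16, 32 or 64")
    if not 0 <= i < GPR_WIDTH // xlen:
        raise ValueError("element index out of range for this xlen")
    first = GPR_WIDTH - (i + 1) * xlen   # 64 - ((i + 1) * XLEN)
    last = GPR_WIDTH - 1 - i * xlen      # 63 - (i * XLEN)
    return first, last

def extract_element(gpr, i, xlen):
    """Extract element i from a 64-bit GPR value held as a Python int."""
    _, last = element_bit_range(i, xlen)
    shift = GPR_WIDTH - 1 - last         # distance from the LSB, i.e. i * xlen
    return (gpr >> shift) & ((1 << xlen) - 1)

if __name__ == "__main__":
    value = 0x1122334455667788
    print(element_bit_range(0, 32), hex(extract_element(value, 0, 32)))
    print(element_bit_range(1, 32), hex(extract_element(value, 1, 32)))
```

With i = 0 the range reduces to 64-XLEN:63, matching the non-SV case; higher element numbers move towards the most significant end of the register.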