Bug 1062 - OPF RFC ls005 iterative feedback and questions
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL: https://libre-soc.org/openpower/sv/rf...
Depends on: 988
Reported: 2023-04-18 16:41 BST by Luke Kenneth Casson Leighton
Modified: 2023-05-31 08:04 BST
NLnet milestone: NLnet.2022-08-051.OPF
total budget (EUR) for completion of task and all subtasks: 2500
budget (EUR) for this task, excluding subtasks' budget: 2500
parent task for budget allocation: 1012
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
lkcl=1100
red=1000
jacob=400


Description Luke Kenneth Casson Leighton 2023-04-18 16:41:50 BST
RFC ls005 (XLEN) needs iteration, review, feedback and questions.
there is a March 2023 (CONFIDENTIAL) OPF ISA WG discussion, permission
needed to copy questions from minutes.
Comment 1 Luke Kenneth Casson Leighton 2023-04-18 16:46:24 BST
commit 315cfdb906e073ef61ca71a00713163b895c6cb8
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Apr 18 16:30:22 2023 +0100

    add extsb/h/w example of XLEN
    see https://bugs.libre-soc.org/show_bug.cgi?id=1061 and

Comment 2 Jacob Lifshay 2023-04-18 22:06:24 BST
commit 3683d40d020ca785168fb059f75e7159cc904ab1
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Tue Apr 18 14:04:02 2023 -0700

    fix extsb pseudo-code
    some of the bit positions had been wrong

Comment 3 Luke Kenneth Casson Leighton 2023-04-18 22:59:17 BST
(In reply to Jacob Lifshay from comment #2)
> commit 3683d40d020ca785168fb059f75e7159cc904ab1
> Author: Jacob Lifshay <programmerjake@gmail.com>
> Date:   Tue Apr 18 14:04:02 2023 -0700
>     fix extsb pseudo-code
i have reverted this UNAUTHORISED change.

please consult, discuss, and obtain consent before making changes.
Comment 4 Luke Kenneth Casson Leighton 2023-04-18 23:18:02 BST
i have now put in corrected and short pseudocode after having
to also spend time reverting unauthorised pseudocode changes.

the purpose of the example pseudocode within this RFC
is to demonstrate bit-level functionality in as immediate
and direct a fashion as possible.
Comment 5 Paul Mackerras 2023-05-25 06:57:16 BST
Comments from IBM architects regarding LS005:

This is subject matter for which no degree of handwaving is acceptable. Every last detail needs to be spelled out, perhaps not in final words, but in clear, unambiguous descriptions. We would much prefer to scalarize or packed SIMD vectorize into scalar registers using existing Vector/VSX instructions to avoid the many pains associated with, for example, address generation using tiny registers.

Is XLEN a fixed constant for a given implementation, or can it be programmed?

Other complications that need to be addressed include interaction with SPRs and things like SPRs that have fixed lengths not subject to XLEN, how to do CMODX (self-modifying code), assurances about atomicity of operations that need to be atomic, etc.

Since the MMU depends on 64b addresses, it's hard to see anything smaller being acceptable for privileged architecture. Might get away with 32b for problem state. How does this apply to Vector and VSX? Although LibreSoC may not care, the architecture needs to be complete.
Comment 6 Paul Mackerras 2023-05-31 05:21:05 BST
Given that this document (ls005) explicitly doesn't settle on final wording for the contentious cases, it is not an RFC, but rather a Formal Proposal.

Overall my opinion is that this would be quite invasive yet not really satisfy the need to specify what happens with SVP64 vectorization. It expresses the potentially shorter element width that SVP64 can specify, but doesn't express doing multiple operations within one register, or the effect of predication, or possible saturating arithmetic, all of which SVP64 can specify.

Also, with SVP64, the value of XLEN that applies to a given instruction is set dynamically, whereas there is no obvious way to control XLEN in the other cases given as justification. Is it a constant of the implementation in those cases?

I also don't think that setting XLEN=32 globally is a good way to describe a 32-bit Power ISA processor, because it (i.e. the current ISA with XLEN set to 32) would be quite incompatible with past 32-bit PowerPC implementations. For example, instructions like cmpw would only operate on 16 bits rather than 32, and you'd have to use cmpd to get a 32-bit comparison (which is an illegal instruction on 32-bit PowerPC CPUs).

It seems that there would be no chance of binary compatibility between an XLEN=32 implementation and an XLEN=64 implementation. In contrast, you can successfully run a binary compiled for an older 32-bit PowerPC implementation on a CPU that conforms to PowerISA v3.1B (at least as far as user-level instructions go).

How do loads and stores work? Does XLEN=32 mean they do half width, or are they exempt? If they only do half width, then they would only load or store half the amount of data that programmers would expect from the mnemonic. For example, an lwz instruction would load 16 bits, an lhz would load 8 bits, and I don't know what an lbz would do. That would be quite confusing.
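
The concern above can be sketched numerically. The following is an illustration only, assuming the reading described in this comment, in which every nominal operand width scales by XLEN/64. The table `NOMINAL_BITS` and the function `scaled_width` are hypothetical names invented for this sketch and do not come from ls005.

```python
# Illustrative sketch only: models the ASSUMED reading in which an
# instruction's nominal operand width scales with XLEN / 64.
# NOMINAL_BITS and scaled_width are hypothetical names, not from ls005.

# Operand widths implied by the mnemonics at XLEN=64.
NOMINAL_BITS = {"lbz": 8, "lhz": 16, "lwz": 32, "ld": 64}


def scaled_width(mnemonic: str, xlen: int) -> int:
    """Return the number of bits accessed if widths scale with XLEN."""
    if xlen not in (8, 16, 32, 64):
        raise ValueError(f"unsupported XLEN for this sketch: {xlen}")
    return NOMINAL_BITS[mnemonic] * xlen // 64


if __name__ == "__main__":
    for name in NOMINAL_BITS:
        print(f"{name}: {scaled_width(name, 64)} bits at XLEN=64, "
              f"{scaled_width(name, 32)} bits at XLEN=32")
```

Under this assumed rule, lwz and lhz at XLEN=32 come out at 16 and 8 bits, matching the figures in the comment, while lbz comes out at 4 bits, which is smaller than a byte and shows why its behaviour is unclear.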

Regarding the notion of grouping registers to get an address, when the size of an address is larger than the size of a register, does that apply to SPRs too? Do SPRs change size according to XLEN? If so, do we then need multiple SPRs for things like the PTCR?

In general, having instructions with "d" for "doubleword" in their name only do 32 (or 16 or 8) bits would be very confusing for programmers (and similarly for "word" and "halfword").
Comment 7 Paul Mackerras 2023-05-31 08:04:59 BST
Thinking about this a bit further, the instruction description would not need to mention the possibility of doing multiple elements in a register in the SV case, because the SV iterations would take care of that. What would need to be mentioned, though, is that the bits affected are not always the least significant bits.

In other words, it isn't really sufficient for the RTL to say that bits 64-XLEN:63 of the relevant GPRs are used or modified. It needs to be something like

64 - ((i + 1) * XLEN) : 63 - (i * XLEN)

where i is the element number within the GPR. In the non-SV case, i would always be 0.
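
A minimal sketch of the range expression above, assuming a 64-bit GPR and the Power ISA MSB0 convention (bit 0 is the most significant bit, bit 63 the least significant). The function names `element_bit_range` and `extract_element` are hypothetical and chosen for illustration only.

```python
# Sketch of the expression 64 - ((i + 1) * XLEN) : 63 - (i * XLEN).
# Assumes a 64-bit GPR and MSB0 numbering (bit 0 = most significant).
# Function names are hypothetical, for illustration only.


def element_bit_range(i: int, xlen: int, reg_bits: int = 64) -> tuple[int, int]:
    """Return the inclusive MSB0 (start, end) bit range of element i."""
    if xlen <= 0 or reg_bits % xlen != 0:
        raise ValueError("xlen must be positive and divide reg_bits")
    if not 0 <= i < reg_bits // xlen:
        raise ValueError("element index out of range for this register")
    start = reg_bits - (i + 1) * xlen
    end = reg_bits - 1 - i * xlen
    return start, end


def extract_element(value: int, i: int, xlen: int, reg_bits: int = 64) -> int:
    """Extract element i from a register held as a Python integer."""
    element_bit_range(i, xlen, reg_bits)  # validates i and xlen
    # MSB0 bit (reg_bits - 1) is the least significant bit, so element i
    # sits i * xlen bits above the least significant end.
    return (value >> (i * xlen)) & ((1 << xlen) - 1)


if __name__ == "__main__":
    for i in range(2):
        print(i, element_bit_range(i, 32))
```

With i = 0 the range reduces to 64-XLEN:63, the non-SV case described above; with XLEN=32 and i = 1 it selects bits 0:31, the most significant half of the register.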