1053 – Separate Vector CRs containing CR8-CR127 from Scalar CR containing CR0-CR7

Summary: Separate Vector CRs containing CR8-CR127 from Scalar CR containing CR0-CR7

Status:	CONFIRMED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Specification
Version:	unspecified
Hardware:	Other Linux

Importance:	Highest critical
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:
Blocks:

Reported:	2023-04-11 14:25 BST by Luke Kenneth Casson Leighton
Modified:	2023-11-30 07:01 GMT
CC List:	3 users

See Also:
NLnet milestone:	NLnet.2022-08-051.OPF
total budget (EUR) for completion of task and all subtasks:	1500
budget (EUR) for this task, excluding subtasks' budget:	1500
parent task for budget allocation:	1011
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	lkcl=1200 jacob=300

Attachments
subdivision into separate regfiles (583.94 KB, image/jpeg) 2023-04-11 14:28 BST, Luke Kenneth Casson Leighton

Description Luke Kenneth Casson Leighton 2023-04-11 14:25:35 BST

TODO: IT IS CRITICAL that this be reverted, with a note that
implementations with an existing CR (CR0-7)
may instead internally perform micro-coding to achieve the
same end-result as described below.

we've been made aware that the use of CR Fields as both Vectors and
Predicate Masks could compromise multi-issue out-of-order systems
due to the massive Hazard Management it creates.

to ensure that *scalar* instructions are not "damaged" the idea is
to make instructions that mix and match from CR0-7 and CR8-127
raise Illegal Instruction traps, *with the exception* of 1-in 1-out
such as sv.mfcr and the sv.crweird group, which would still be
restricted to single-scalar destination if the destination is CR0-CR7.

high-performance systems could therefore consider CR0-7 as
a *completely and literally separate* register file from
CR8-CR127.
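
The proposed rule can be sketched as a small legality check. This is only an illustration of the wording above, not decoder code from the specification: the function names, the "scalar"/"vector" labels and the `dest_is_vector` flag are hypothetical, and the mcrf exception discussed later in this report is not modelled.

```python
SCALAR_CR_FIELDS = range(0, 8)     # CR0-CR7
VECTOR_CR_FIELDS = range(8, 128)   # CR8-CR127


def cr_file(field):
    """Return which (hypothetical) regfile a CR field number belongs to."""
    if field in SCALAR_CR_FIELDS:
        return "scalar"
    if field in VECTOR_CR_FIELDS:
        return "vector"
    raise ValueError(f"CR field out of range: {field}")


def is_legal(srcs, dests, dest_is_vector=False):
    """True if the operands obey the proposed split, False if the
    instruction would raise an Illegal Instruction trap.

    srcs and dests are lists of CR field numbers (0-127).
    """
    files = {cr_file(f) for f in list(srcs) + list(dests)}
    if len(files) <= 1:
        return True              # everything in one regfile: no mixing
    # mixing is only tolerated for 1-in 1-out operations
    if len(srcs) != 1 or len(dests) != 1:
        return False
    # ...and a destination in CR0-CR7 must be a single scalar
    if cr_file(dests[0]) == "scalar" and dest_is_vector:
        return False
    return True
```

Under these assumptions a two-source operation reading CR0 and CR9 is rejected, while a 1-in 1-out operation from CR9 to a scalar CR3 is accepted.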

the same concept could also hypothetically be applied to GPR and
FPR but the result could damage Simple-V by restricting the number
of contiguous registers usable as Vectors: the existing scalar
GPR/FPR being 25% of SVP64's register range.
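
For reference, the 25% figure works out as follows. The counts used here are assumptions read off the text: 32 existing scalar GPRs/FPRs out of an SVP64 range of 128, and 8 existing CR fields out of 128. The helper name `split` is made up for the example.

```python
SVP64_REGS = 128        # assumed size of each SVP64 register range
SCALAR_GPR_FPR = 32     # existing scalar GPRs (and FPRs)
SCALAR_CR_FIELDS = 8    # existing scalar CR fields, CR0-CR7


def split(total, scalar):
    """Return (scalar fraction, registers left for contiguous Vectors)."""
    return scalar / total, total - scalar


gpr_fraction, gpr_vector_regs = split(SVP64_REGS, SCALAR_GPR_FPR)
cr_fraction, cr_vector_fields = split(SVP64_REGS, SCALAR_CR_FIELDS)

print(f"GPR/FPR: {gpr_fraction:.2%} scalar, {gpr_vector_regs} left for Vectors")
print(f"CR:      {cr_fraction:.2%} scalar, {cr_vector_fields} left for Vectors")
```

Splitting the CR fields costs a much smaller share of the range than the same split would cost for GPR/FPR.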

Comment 1 Luke Kenneth Casson Leighton 2023-04-11 14:28:17 BST

Created attachment 189
subdivision into separate regfiles

Comment 2 Jacob Lifshay 2023-04-12 09:24:26 BST

as i mentioned in the meeting on tuesday, I think we need to specifically permit crmove and mcrf between cr0-7 and cr8-127 because the register allocator needs to have an inexpensive method of moving cr fields around -- this can be restricted to svp64 scalar-mode only.

crmove a, b is cror a, b, b

Comment 3 Luke Kenneth Casson Leighton 2023-04-12 12:33:21 BST

(In reply to Jacob Lifshay from comment #2)
> as i mentioned in the meeting on tuesday, I think we need to specifically
> permit crmove and mcrf between cr0-7 and cr8-127 because the register
> allocator needs to have an inexpensive method of moving cr fields around --
> this can be restricted to svp64 scalar-mode only.

mcrf yes agreed 100%.
 
> crmove a, b is cror a, b, b

mmmm... it's making me nervous, because that's a really deep-dive into
the decoding.  not only is it "is this a cror" it's "is BFA equal to BFB"
as well as the "is BFA and BFB EXTRA3 marked Scalar" which is already being
proposed here.
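
To give a feel for the extra decode work, here is a rough sketch of spotting the crmove alias in a plain 32-bit scalar cror. It uses the Power ISA XL-form field names BT, BA and BB for the three operands, and assumes the usual encoding (primary opcode 19, extended opcode 449). The SVP64 prefix and the EXTRA3 Scalar check are deliberately left out, and the function names are hypothetical.

```python
PRIMARY_OPCODE_CR = 19   # primary opcode shared by the CR logical ops
XO_CROR = 449            # extended opcode for cror


def fields(insn):
    """Split a 32-bit XL-form instruction word into its fields."""
    return {
        "opcode": (insn >> 26) & 0x3F,
        "BT": (insn >> 21) & 0x1F,
        "BA": (insn >> 16) & 0x1F,
        "BB": (insn >> 11) & 0x1F,
        "XO": (insn >> 1) & 0x3FF,
    }


def is_crmove(insn):
    """True if insn is cror BT,BA,BB with BA == BB, i.e. crmove BT,BA."""
    f = fields(insn)
    return (f["opcode"] == PRIMARY_OPCODE_CR
            and f["XO"] == XO_CROR
            and f["BA"] == f["BB"])


def encode_cror(bt, ba, bb):
    """Build a cror instruction word, for trying out is_crmove()."""
    return ((PRIMARY_OPCODE_CR << 26) | (bt << 21) | (ba << 16)
            | (bb << 11) | (XO_CROR << 1))
```

The operand-equality comparison is the part that goes beyond an ordinary opcode match, and a similar test would be needed for each further alias.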

with all the other possible aliases (from other crops), which would also have
to be tackled, i'm really not keen. remember this is the *decoder* we're talking
about.

honestly i feel it would be better to keep that to the crweird mcrfm
instruction, which achieves the same thing and doesn't expect IBM to
"damage" their existing implementation.

https://libre-soc.org/openpower/sv/cr_int_predication/

also mcrfm can handle up to 4 bits at a time.

yes you can't do EQ->LT transfers/copies but you could transfer multiple
bits (either with mcrfm or with mcrf) and then start moving bits in a
single field (anywhere within the 32-bits of the Condition Register)
as a separate instruction.

Comment 4 Luke Kenneth Casson Leighton 2023-04-12 14:29:05 BST

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=8a607ea0122bd043125f3318bbc6ef1294255e1b

Comment 5 Luke Kenneth Casson Leighton 2023-04-12 17:03:26 BST

second aspect, add to crops page

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=834b0e8450499da9db632e2315162cfc5034e609

Comment 6 Luke Kenneth Casson Leighton 2023-04-12 17:06:10 BST

(In reply to Luke Kenneth Casson Leighton from comment #0)

> the same concept could also hypothetically be applied to GPR and
> FPR but the result could damage Simple-V by restricting the number
> of contiguous registers useable as Vectors: the existing scalar
> GPR/FPR being 25% of SVP64's register range.

there is a reason why that is not needed: it's because unlike predication
using the CR Fields there is no additional Hazard Dependency created
just by Vector-Looping.

Comment 7 Luke Kenneth Casson Leighton 2023-04-12 18:06:40 BST

third aspect, add to "quirks" page

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=1e7791dd188efc45d5fc889071210b56790c7238

Comment 8 Luke Kenneth Casson Leighton 2023-09-05 05:36:18 BST

it is ABSOLUTELY PARAMOUNT that the changes actioned under this bugreport
be REVERTED, replaced by an "Engineering Note" advising the use of microcoding