TODO: IT IS CRITICAL that this be reverted, with a note that implementations with an existing CR (CR0-7) instead may internally perform micro-coding to achieve the same end-result as described below. we've been made aware that the use of CR Fields as both Vectors and Predicate Masks could compromise multi-issue out-of-order systems due to the massive Hazard Management it creates. to ensure that *scalar* instructions are not "damaged", the idea is to make instructions that mix and match from CR0-7 and CR8-127 raise Illegal Instruction traps, *with the exception* of 1-in 1-out instructions such as sv.mfcr and the sv.crweird group, which would still be restricted to a single-scalar destination if the destination is CR0-CR7. high-performance systems could therefore consider CR0-7 as a *completely and literally separate* register file from CR8-CR127. the same concept could also hypothetically be applied to GPR and FPR but the result could damage Simple-V by restricting the number of contiguous registers usable as Vectors: the existing scalar GPR/FPR being 25% of SVP64's register range.
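a rough sketch of the legality rule described above, in Python, for illustration only. the names `check_cr_operands`, `cr_bank` and `IllegalInstruction`, and the `one_in_one_out` / `dest_is_vector` flags, are hypothetical and not taken from any existing decoder or simulator. it also assumes one particular reading of the exception: that the single-scalar restriction applies only when a 1-in 1-out instruction crosses between the two banks.

```python
class IllegalInstruction(Exception):
    """Stands in for an Illegal Instruction trap in this sketch."""


def cr_bank(field: int) -> int:
    """Return 0 for CR0-CR7 (existing scalar CR) and 1 for CR8-CR127."""
    if not 0 <= field <= 127:
        raise ValueError(f"CR field out of range: {field}")
    return 0 if field <= 7 else 1


def check_cr_operands(srcs, dests, one_in_one_out=False, dest_is_vector=False):
    """Raise IllegalInstruction if the CR operands break the proposed rule.

    srcs / dests   : CR field numbers used as sources / destinations
    one_in_one_out : hypothetical flag for the exempt class
                     (sv.mfcr, the sv.crweird group)
    dest_is_vector : True if the destination is marked as a Vector
    """
    banks = {cr_bank(f) for f in list(srcs) + list(dests)}
    mixed = len(banks) > 1
    if not mixed:
        return  # everything lives in one bank: nothing to check
    if not one_in_one_out:
        raise IllegalInstruction("operands mix CR0-7 with CR8-127")
    # assumption: exempt instructions may cross banks, but a destination
    # in CR0-7 must then be a single scalar field
    if dest_is_vector and any(cr_bank(d) == 0 for d in dests):
        raise IllegalInstruction("Vector destination in CR0-7")
```

treating the two ranges as separate "banks" mirrors the point that a high-performance implementation could handle CR0-7 as a separate register file from CR8-CR127.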
Created attachment 189: subdivision into separate regfiles
as i mentioned in the meeting on tuesday, I think we need to specifically permit crmove and mcrf between cr0-7 and cr8-127 because the register allocator needs to have an inexpensive method of moving cr fields around -- this can be restricted to svp64 scalar-mode only. crmove a, b is cror a, b, b
(In reply to Jacob Lifshay from comment #2)
> as i mentioned in the meeting on tuesday, I think we need to specifically
> permit crmove and mcrf between cr0-7 and cr8-127 because the register
> allocator needs to have an inexpensive method of moving cr fields around --
> this can be restricted to svp64 scalar-mode only.

mcrf yes agreed 100%.

> crmove a, b is cror a, b, b

mmmm... it's making me nervous, because that's a really deep-dive into the decoding. not only is it "is this a cror", it's "is BFA equal to BFB" as well as the "is BFA and BFB EXTRA3 marked Scalar" check which is already being proposed here. with all the other possible aliases (from other crops), which would also have to be tackled, i'm really not keen. remember this is the *decoder* we're talking about.

honestly i feel it would be better to keep that to the crweird mcrfm instruction, which achieves the same thing and doesn't expect IBM to "damage" their existing implementation.

https://libre-soc.org/openpower/sv/cr_int_predication/

also mcrfm can handle up to 4 bits at a time. yes you can't do EQ->LT transfers/copies but you could transfer multiple bits (either with mcrfm or with mcrf) and then start moving bits in a single field (anywhere within the 32-bits of the Condition Register) as a separate instruction.
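to make the decode-depth concern concrete, here is a small Python sketch contrasting the two checks. it uses the standard Power ISA XL-form layout (primary opcode 19, XO 0 for mcrf, XO 449 for cror, where the two cror source operands are the 5-bit BA and BB fields). the helper names are hypothetical, and the SVP64 prefix with its EXTRA3 scalar markings is deliberately left out, so this shows only the 32-bit suffix part of the check.

```python
OPCODE_19 = 19   # primary opcode shared by mcrf and the CR logical ops
XO_MCRF = 0      # extended opcode of mcrf
XO_CROR = 449    # extended opcode of cror


def encode_cror(bt: int, ba: int, bb: int) -> int:
    """Build a 32-bit cror BT,BA,BB word (XL-form)."""
    return (OPCODE_19 << 26) | (bt << 21) | (ba << 16) | (bb << 11) | (XO_CROR << 1)


def encode_mcrf(bf: int, bfa: int) -> int:
    """Build a 32-bit mcrf BF,BFA word (3-bit field numbers)."""
    return (OPCODE_19 << 26) | (bf << 23) | (bfa << 18) | (XO_MCRF << 1)


def opcode_and_xo(insn: int):
    """Extract the primary opcode and 10-bit extended opcode."""
    return (insn >> 26) & 0x3F, (insn >> 1) & 0x3FF


def is_mcrf(insn: int) -> bool:
    # a plain opcode match: no operand comparison needed
    return opcode_and_xo(insn) == (OPCODE_19, XO_MCRF)


def is_crmove(insn: int) -> bool:
    # opcode match *plus* an operand-equality test (BA == BB),
    # which is the extra decode depth being objected to
    ba = (insn >> 16) & 0x1F
    bb = (insn >> 11) & 0x1F
    return opcode_and_xo(insn) == (OPCODE_19, XO_CROR) and ba == bb
```

every further alias from the other crops would need its own operand-comparison test of the same kind on top of its opcode match, whereas mcrf (and mcrfm, as a dedicated instruction) is identified by opcode alone.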
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=8a607ea0122bd043125f3318bbc6ef1294255e1b
second aspect, add to crops page https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=834b0e8450499da9db632e2315162cfc5034e609
(In reply to Luke Kenneth Casson Leighton from comment #0)
> the same concept could also hypothetically be applied to GPR and
> FPR but the result could damage Simple-V by restricting the number
> of contiguous registers usable as Vectors: the existing scalar
> GPR/FPR being 25% of SVP64's register range.

there is a reason why that is not needed: unlike predication using the CR Fields, there is no additional Hazard Dependency created just by Vector-Looping.
third aspect, add to "quirks" page https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=1e7791dd188efc45d5fc889071210b56790c7238
it is ABSOLUTELY PARAMOUNT that the changes actioned under this bugreport be REVERTED and replaced by an "Engineering Note" advising the use of micro-coding