if Hazard protection is done per 4-bit-wide CR Field, the number of CR operands comes out as 4-in 1-out. this is too much. however if the CR is treated as a single 32-bit register then both instructions (crternlogi and crbinlog) are 1-in 1-out. the complication comes with SVP64. some discussion and analysis of architectural options is needed.
for scalar use, assuming that there is only one 32-bit CR, this is pretty standard fare already: mtcr, mfcr. thus, for scalar i think there will be no problem.

in TestIssuer the CR regfile is extremely weird: it is mostly 4-bit ports, but these are then combined with *unary masking* to create at least two full 32-bit-wide ports that take read-enable (write-enable) on a per-4-bit-field basis. mfxcr is therefore a perfect match for this design: just pass the incoming mask immediate directly to the regfile.

SVP64 gets complicated because the additional range (128 CR Fields) means that there would be QTY 16 of 32-bit CRs (or, implementors could choose to make that QTY 8 of 64-bit CRs). thus unless the programmer happens to issue an instruction that reads (then writes) to/from the same group of 8 CR Fields...

crternluti can be reduced down to 3-in 1-out by making it an "overwrite". likewise crbinlut. this does have the significant advantage of greatly reducing opcode space, as well. i will see if there is an existing Form. drat i should have spotted this 6+ months ago.
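the per-field write-enable on a 32-bit CR port described above can be sketched roughly as below. this is only an illustration, not TestIssuer code: the function names are hypothetical, and it assumes an 8-bit field mask where the most significant mask bit selects CR Field 0 (the top 4 bits of the 32-bit CR).

```python
def expand_field_mask(fxm: int) -> int:
    """Expand an 8-bit per-field mask into a 32-bit per-bit mask.

    Assumption: mask value 0x80 selects CR Field 0, which occupies
    the top 4 bits of the 32-bit CR; 0x01 selects CR Field 7.
    """
    mask = 0
    for field in range(8):
        if fxm & (0x80 >> field):
            mask |= 0xF << (4 * (7 - field))
    return mask


def masked_cr_write(cr: int, new: int, fxm: int) -> int:
    """Write only the enabled 4-bit fields of `new` into `cr`."""
    m = expand_field_mask(fxm)
    return (cr & ~m & 0xFFFFFFFF) | (new & m)
```

with this shape, a masked write touches only the selected fields and leaves every other field of the 32-bit word as it was.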
* `crternlogi BT, BA, BB, BC, TLI, msk`

| 0.5| 6-8 | 9-11 | 12-14 | 15-17 | 18-20 | 21-28 | 29-30 | 31  | Form     |
|----|-----|------|-------|-------|-------|-------|-------|-----|----------|
| PO | BF  | BFA  | BFB   | BFC   | msk   | TLI   | XO    | msk | CRB-Form |

looking at https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/fields.text;hb=HEAD

->

* `crternlogi BT, BA, BB, TLI, msk`

| 0.5| 6-8 | 9-10 | 11-13 | 14-15 | 16-18 | 19-25 | 26-30 | 31  | Form     |
|----|-----|------|-------|-------|-------|-------|-------|-----|----------|
| PO | BF  | msk  | BFA   | msk   | BFB   | TLI   | XO    | TLI | CRB-Form |

This is messy but 7 bits of TLI match with TLI-Form, BF BFA and BFB match with X-Form, the intervening 2 bits being allocated to msk, the XO being the same length as A-Form, VA-Form, VA2-Form and most of the SV*-Forms, by starting at bit 26.
crbinlog

| 0.5|6.8 | 9.11|12.14|15.17|18.21|22...30  |31|
| -- | -- | --- | --- | --- |-----| -------- |--|
| NN | BT | BA  | BB  | BC  |m0-m3|000101110 |0 |

-> same trick except i think it possible to just leave out the TLI Field

| 0.5| 6-8 | 9-10 | 11-13 | 14-15 | 16-18 | 19-25 |26-30 | 31 | Form     |
|----|-----|------|-------|-------|-------|-------|------|----|----------|
| PO | BF  | msk  | BFA   | msk   | BFB   | ///   | XO   | /  | CRB-Form |

this means decoding both instructions is not costly.
current entry in fields.text:

    TLI (21:28)
        Field used by the ternlogi instruction as the
        look-up table.
        Formats: TLI

needed:

    TLI (21:25,19,20,31)
        Field used by the crternlogi instruction as the
        look-up table.
        Formats: CRB-Form

and msk is missing:

    msk (9:10,14:15)
        field used by crternlogi and crbinlut to select which
        bits of CR Field BF are to be modified.
        Formats: CRB-Form
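the split fields above (TLI and msk are both non-contiguous) can be gathered with a small helper. a minimal sketch, assuming Power ISA MSB0 bit numbering on a 32-bit instruction word and that the bits listed in fields.text are concatenated in the order given; `extract_field` and `insert_field` are hypothetical names, not functions from openpower-isa.

```python
from typing import Sequence

# bit positions (MSB0) taken from the proposed fields.text entries
TLI_BITS = [21, 22, 23, 24, 25, 19, 20, 31]
MSK_BITS = [9, 10, 14, 15]


def extract_field(insn: int, bits: Sequence[int]) -> int:
    """Gather the listed MSB0 bit positions of a 32-bit word, in order."""
    value = 0
    for b in bits:
        value = (value << 1) | ((insn >> (31 - b)) & 1)
    return value


def insert_field(insn: int, bits: Sequence[int], value: int) -> int:
    """Scatter `value` into the listed MSB0 bit positions of `insn`."""
    n = len(bits)
    for i, b in enumerate(bits):
        bit = (value >> (n - 1 - i)) & 1
        insn = (insn & ~(1 << (31 - b))) | (bit << (31 - b))
    return insn & 0xFFFFFFFF
```

usage: `extract_field(insn, MSK_BITS)` returns the 4-bit msk, with instruction bit 9 as its most significant bit.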
imho we should copy TLI-form:

|0   |6   |9  |11   |14 |16      |21   |29  |31 |
| PO | BF |msk| BFA |msk| RB/BFB | TLI | XO |Rc |

except we don't need Rc, so:

|0   |6   |9  |11   |14 |16      |21   |28  |31 |
| PO | BF |msk| BFA |msk| RB/BFB | TLI | XO |TLI|

RB/BFB is the LUT.
(In reply to Jacob Lifshay from comment #5)
> imho we should copy TLI-form:
> |0   |6   |9  |11   |14 |16      |21   |29  |31 |
> | PO | BF |msk| BFA |msk| RB/BFB | TLI | XO |Rc |
>
> except we don't need Rc, so:
>
> |0   |6   |9  |11   |14 |16      |21   |28  |31 |
> | PO | BF |msk| BFA |msk| RB/BFB | TLI | XO |TLI|
>
> RB/BFB is the LUT.

oh, actually we only ever need BFB for crternlogi, so two more bits of XO by moving TLI bits over more.

crbinlog is what needs RB/BFB where we can use almost the entire space used by TLI for XO, just need one bit for nh, which should match binlog's nh.
so, yeah, more or less what you came up with, just more space for RB and nh
(In reply to Jacob Lifshay from comment #6)
> crbinlog is what needs RB/BFB where we can use almost the entire space used
> by TLI for XO, just need one bit for nh, which should match binlog's nh.

already explained in detail that the answer is a no on using the GPR for crbinlog.

https://bugs.libre-soc.org/show_bug.cgi?id=1017#c18
commit 8e5326caf15e238494a20cb0937b0c41c0ea9d20 (HEAD -> master, origin/master, origin/HEAD)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Mar 15 15:00:29 2023 +0000

    rewrite crternlogi and crbinlog to match new format,
    required to reduce both instructions to 3-read 1-write.
    https://bugs.libre-soc.org/show_bug.cgi?id=1023#c2

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=8e5326caf15e238494a20cb0937b0c41c0ea9d20

--

commit aa9020ffed1996c7ab3526a5cfcf906ed61eeb04 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Mar 15 15:09:35 2023 +0000

    add CRB-Form fields for crternlogi and crbinlog, they are both now
    reduced to 3-in 1-out, both needing to become overwrites due to
    the mask field (msk) making BF a Read-Modify-Write
    https://bugs.libre-soc.org/show_bug.cgi?id=1023#c4

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=aa9020ffed1996c7ab3526a5cfcf906ed61eeb04
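to illustrate why msk turns BF into a Read-Modify-Write (3-in 1-out, with BF as both source and destination), here is a rough behavioural sketch. it is not the specification pseudocode: the function name is hypothetical, and the LUT index ordering (BF bit most significant, then BFA, then BFB), the LSB0 indexing into TLI and the msk bit ordering are all assumptions made for illustration.

```python
def crternlogi_field(bf: int, bfa: int, bfb: int, tli: int, msk: int) -> int:
    """Apply an 8-entry LUT bitwise across three 4-bit CR Fields.

    Assumptions (not taken from the spec):
      * the LUT index is (bf_bit << 2) | (bfa_bit << 1) | bfb_bit
      * bit `idx` of tli, counted from the LSB, is the LUT entry
      * bit i of msk enables writing bit i of the result
    Bits of BF whose msk bit is clear keep their old value, which is
    why BF has to be read as well as written.
    """
    result = bf
    for i in range(4):
        a = (bf >> i) & 1
        b = (bfa >> i) & 1
        c = (bfb >> i) & 1
        idx = (a << 2) | (b << 1) | c
        lut_bit = (tli >> idx) & 1
        if (msk >> i) & 1:
            result = (result & ~(1 << i)) | (lut_bit << i)
    return result & 0xF
```

under these assumptions `tli=0x96` acts as a three-way XOR, and `msk=0` leaves BF untouched.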
the answer here remained a firm "NO" following a detailed and unscheduled explanation of Dependency Matrices. the premise that grouping similar instructions together is ok because they already have GPR and CR paths from regfiles was demonstrated to be false.

grouping of instructions behind shared Reservation Stations into the same pipelines requires the DM Cells to contain the *union* of the Register Profiles for those pipelines. thus whilst it is fine to group ALU and Logical together because they both share XER CR0 RT RS RA and RB, it is *not* fine to group madd with that same group because RC increases the size of *all* the DM Cells by 15% just for serving that one extra register. grouping CR Ops with these instructions similarly increases the total number of registers handled by every DM Cell in front of those RSes.

this is why the CDC6600 and the 68000 have 3 separate register files with very few crossover instructions: it keeps the DMs lean and sparse, right where it is critically important to keep gate count down. jacob was unaware of all of this and it had to be explained under duress.
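the "union of Register Profiles" point can be shown with a toy calculation. this is only a sketch: the profiles are simplified from the names in the comment above, the madd profile is assumed to be the shared set plus RC, the `crops` profile is purely hypothetical, and it counts registers per DM Cell rather than modelling gate count.

```python
# simplified, illustrative register profiles (assumptions, see above)
PROFILES = {
    "alu": {"XER", "CR0", "RT", "RS", "RA", "RB"},
    "logical": {"XER", "CR0", "RT", "RS", "RA", "RB"},
    "madd": {"XER", "CR0", "RT", "RS", "RA", "RB", "RC"},
    "crops": {"BF", "BFA", "BFB"},  # hypothetical CR Ops profile
}


def dm_cell_registers(group):
    """Registers every DM Cell must track for pipelines sharing RSes."""
    regs = set()
    for name in group:
        regs |= PROFILES[name]
    return regs


if __name__ == "__main__":
    for group in (["alu", "logical"],
                  ["alu", "logical", "madd"],
                  ["alu", "logical", "crops"]):
        print(group, len(dm_cell_registers(group)))
```

grouping ALU with Logical costs nothing extra because the profiles are identical, whereas each added pipeline with a different profile widens every cell in front of the shared Reservation Stations.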