we need a Condition Register pipeline * https://github.com/antonblanchard/microwatt/blob/master/execute1.vhdl * https://libre-soc.org/openpower/isa/condition/ * https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/fu/cr;hb=HEAD * OP_MFCR done * OP_CROP (which includes MCRF) done * OP_MTCRF done * OP_ISEL done * OP_SETB done
Are we planning to do crand, cror, crxor, and friends? They're not implemented in microwatt as far as I can tell
(In reply to Michael Nolan from comment #1) > Are we planning to do crand, cror, crxor, and friends? They're not > implemented in microwatt as far as I can tell https://github.com/antonblanchard/microwatt/blob/master/execute1.vhdl line 719. oink. am i reading this correctly: they actually use the cr operand as if it was a *table*?? moo? so the instruction field 0-15 is treated as a 4-bit number (a 4-bit truth-table) you then take ba and bb and create a 2-bit index. then use that 2-bit address to get one of the bits of the 4-bit instruction field. wooow. i'm deeply impressed. um... answer... yes they do implement crand, cror etc :) this must be how FPGAs do things, and is probably why LUTs are 4-bit.
The hell? IBM separately defines mtcrf and mtocrf, and reserves a bit to differentiate between the two. Except there's no functional difference between the two that I can tell, and the assembler will even assemble some instances of mtcrf as mtocrf. Basically mtcrf fxm, reg uses fxm to construct a 32 bit mask, where each bit in fxm corresponds to 4 bits in the mask (so mtcrf 0xf1 would correspond to a mask of 0xffff000f). It then uses this mask to choose between bits of CR and the input register, and writes them back to CR mtocrf fxm, reg assumes fxm is a one hot encoded index into the bits of reg. It grabs those bits from reg and inserts them into cr, leaving the rest untouched. If fxm is not onehot, then the result is undefined. Why these are different, I don't know.
btw apologies i did a code-munge on cr/main_stage.py - it did work prior to a rebase-conflict... am investigating, will keep in touch if i spot the discrepancy.
(In reply to Luke Kenneth Casson Leighton from comment #4) > btw apologies i did a code-munge on cr/main_stage.py - it did work prior to > a rebase-conflict... am investigating, will keep in touch if i spot the > discrepancy. oh: i think it's just that you added the unit test before adding the code.
(In reply to Michael Nolan from comment #3) > The hell? IBM separately defines mtcrf and mtocrf, and reserves a bit to > differentiate between the two. Except there's no functional difference > between the two that I can tell, and the assembler will even assemble some > instances of mtcrf as mtocrf. bizarre. * mtcrf FXM,RS mask <- ([FXM[0]]*4 || [FXM[1]]*4 || [FXM[2]]*4 || [FXM[3]]*4 || [FXM[4]]*4 || [FXM[5]]*4 || [FXM[6]]*4 || [FXM[7]]*4) CR <- ((RS)[32:63] & mask) | (CR & ¬mask) * mtocrf FXM,RS count <- 0 do i = 0 to 7 if FXM[i] = 1 then n <- i count <- count + 1 if count = 1 then CR[4*n+32:4*n+35] <- (RS)[4*n+32:4*n+35] else CR <- undefined > Basically mtcrf fxm, reg uses fxm to construct a 32 bit mask, where each bit > in fxm corresponds to 4 bits in the mask (so mtcrf 0xf1 would correspond to > a mask of 0xffff000f). It then uses this mask to choose between bits of CR > and the input register, and writes them back to CR > mtocrf fxm, reg assumes fxm is a one hot encoded index into the bits of reg. > It grabs those bits from reg and inserts them into cr, leaving the rest > untouched. If fxm is not onehot, then the result is undefined. > > Why these are different, I don't know. clueless. suggest to use nmutil PriorityPicker on FXM (see nmutil) because it can be used do the one-hot detection very efficiently. if the output is not exactly equal to the input from a PriorityPicker (and is not zero) then this is the condition for detection that FXM is one-hot. p = PriorityPicker p.i.eq(FXM) is_onehot.eq(p.i == p.o & p.i != 0)
(In reply to Luke Kenneth Casson Leighton from comment #6) sorry: p = PriorityPicker(8) comb += p.i.eq(FXM) comb += is_onehot.eq(p.i == p.o & p.i != 0)
(In reply to Luke Kenneth Casson Leighton from comment #6) > > clueless. suggest to use nmutil PriorityPicker on FXM (see nmutil) because > it > can be used do the one-hot detection very efficiently. > > if the output is not exactly equal to the input from a PriorityPicker > (and is not zero) then this is the condition for detection that FXM > is one-hot. > > p = PriorityPicker > p.i.eq(FXM) > > is_onehot.eq(p.i == p.o & p.i != 0) For mtocrf I don't think this is necessary. If the onehot condition is not satisfied, it lists the result as being undefined. This should mean that the current behavior is acceptable For mfocrf, the situation is similar, the extra bits of the output register are undefined, unless the input is onehot, in which case they are set to 0. Setting them to 0 for all bits not selected by the mask should be acceptable here too
ugh, ugh. i have a horrible feeling about "undefined" behaviour. we need to properly investigate and find out exactly what happens during that "undefined" behaviour by running native assembler on the TALOS-II workstation, with a full range of FXM bits (only 256 instructions), and a random set of CR and RS (times 256). this will tell us exactly what happens. my concern is that if we rely on the "undefined" behaviour to say "ok we choose to make mtcrf and mtocrf the same", some assembly writers may have chosen *specifically* to assume that CR is set to ZERO (or some other value) after noting that that behaviour occurs on POWER9... you follow where that goes? i really do not like "undefined" behaviour in specifications. it's always a problem.
(In reply to Michael Nolan from comment #8) > For mtocrf I don't think this is necessary. If the onehot condition is not > satisfied, it lists the result as being undefined. This should mean that the > current behavior is acceptable > > For mfocrf, the situation is similar, the extra bits of the output register > are undefined, unless the input is onehot, in which case they are set to 0. > Setting them to 0 for all bits not selected by the mask should be acceptable > here too except (cross-over) this may bite us due to incompatibility with the "undefined" behaviour that assembly writers use these for - in *direct* but de-facto violation of the standard. royal nuisance. let's not hyper-prioritise it though.
(In reply to Luke Kenneth Casson Leighton from comment #9) > ugh, ugh. > > i have a horrible feeling about "undefined" behaviour. we need to properly > investigate and find out exactly what happens during that "undefined" > behaviour by running native assembler on the TALOS-II workstation, with > a full range of FXM bits (only 256 instructions), and a random set of > CR and RS (times 256). > > this will tell us exactly what happens. Could you send me the login for the talos server so I can test this?
# NOTE: we really should be doing the field decoding which # selects which bits of CR are to be read / written, back in the # decoder / insn-isue, have both self.i.cr and self.o.cr # be broken down into 4-bit-wide "registers", with their # own "Register File" (indexed by bt, ba and bb), # exactly how INT regs are done (by RA, RB, RS and RT) # however we are pushed for time so do it as *one* register. We can do this for crand and friends, as well as mcrf, but it doesn't work for mtcrf or mfcr because they can read or write the whole cr register
(In reply to Luke Kenneth Casson Leighton from comment #9) > ugh, ugh. > > i have a horrible feeling about "undefined" behaviour. we need to properly > investigate and find out exactly what happens during that "undefined" > behaviour by running native assembler on the TALOS-II workstation, with > a full range of FXM bits (only 256 instructions), and a random set of > CR and RS (times 256). > > this will tell us exactly what happens. Qemu seems to do nothing when it's not onehot lis 1, 0xdead ori 1, 1, 0xbeef # r1 = 0xdeadbeef lis 2, 0x1234 ori 2, 2, 0x5678 # r2 = 0x12345678 mtcr 1 # cr = 0xdeadbeef mtcrf 0xf, 2 mfcr 3 # r3 should be 0xdead5678 mtcr 1 # cr = 0xdeadbeef again # mtocrf 0xf, 2 .byte 0x7c, 0x50, 0xf1, 0x20 mfcr 4 # who knows (gdb) p/x $r3 $1 = 0xdead5678 (gdb) p/x $r4 $2 = 0xdeadbeef
(In reply to Michael Nolan from comment #12) > We can do this for crand and friends, as well as mcrf, but it doesn't work > for mtcrf or mfcr because they can read or write the whole cr register ah good catch. updated. yes you're right. this would be where the "multi-bit hazard" detection would come into play. the bit-vector would be 0 1 ... 7 8 CR0 CR1 ... CR7 {ALLCRs} for crand the decoding of BA (etc) would set the corresponding CR register hazard *and* on vector bit 8 {ALLCRs}... but not the other bits. thus it would be possible to have multiple parallel overlapping crand(s) (etc) as long as the read/write dependencies on CR0-CR7 do not clash however mtcrf would set bit 8 {ALLCRs} *AND* it would set *ALL* other bits 0 thru 7. as it's an optimisation it's off the TODO list for now.
(In reply to Michael Nolan from comment #11) > Could you send me the login for the talos server so I can test this? oh - err, yes, sorry. except my laptop segfaulted and although i have a mosh-screen session running that will still be open, i can't remember the server's IP address! it doesn't have a domain name associated with it yet. will find it tomorrow - 23:00 hrs here now. (In reply to Michael Nolan from comment #13) > Qemu seems to do nothing when it's not onehot interesting. i think i have a hypothesis that fits: it's possible that an early version of POWERPC had one version, they realised they needed the other, but couldn't really retire the old one as it was too commonly used.
hmmmm there are not many good choices for putting this instruction in. it requires RA RC INT regs, and full CR's BI field as input. for output it needs INT RS CR is the closest least disruptive candidate, if a 2nd INT reg is added: RB. when OP_ISEL => crbit := to_integer(unsigned(insn_bc(e_in.insn))); if e_in.cr(31-crbit) = '1' then result := a_in; else result := b_in; end if; result_en := '1';
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/cr/pipe_data.py;h=2894144fd814444a7bdb75ae6c695af8519ae084;hb=HEAD can add b to CRInputData. https://libre-soc.org/openpower/isa/fixedtrap/ what on earth isel is doing in the trap instruction group is beyond me. could be i just put the pseudocode in the wrong file
i'm so excited, i'm so happy, i added an instruction (isel).
Created attachment 62 [details] table of mtocrf mask, reg results from the talos server So I decided to test mtocrf with non-onehot masks on the talos server, and it appears to me that it uses the most significant bit of the mask and ignores the rest (except for mask=0xf, dunno what's up there). Very interesting.
hm might be good to add the test assembly code to the repo, put it somewhere around soc.fu.cr
unit test is not testing reg output (MFCRF, ISEL)
(In reply to Luke Kenneth Casson Leighton from comment #21) > unit test is not testing reg output (MFCRF, ISEL) now it is... commit c2a94b3e640c6c3a8e33834f8d073c795ce66815 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Sun May 24 21:46:56 2020 +0100 add test of reg output, for MFCRF and ISEL