see * https://github.com/antonblanchard/microwatt/blob/master/execute1.vhdl * https://libre-soc.org/openpower/isa/branch/ * https://libre-soc.org/openpower/isa/fixedtrap/ * https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/fu/branch;hb=HEAD we need: * OP_B* - done * OP_TRAP (moved to trap pipe) * OP_RFID (also) anything that sets the PC by setting NIA. (decision made to have trap separate)
I'm working on defining the pipeline data type for this, and I'm wondering how I should handle the CR input. Conditional branches can branch on any of the 32 bits in the cr, so I think it should take in the entire CR. However, I suspect we'll be implementing cr0..cr7 as separate 4 bit registers to avoid false dependencies, so making the branch take in the whole register would complicate matters
(In reply to Michael Nolan from comment #1) > I'm working on defining the pipeline data type for this, and I'm wondering > how I should handle the CR input. Conditional branches can branch on any of > the 32 bits in the cr, so I think it should take in the entire CR. there is a selector in the insn, right? that is the "CR" equivalent of the RA, RB, RS and RT fields. > However, > I suspect we'll be implementing cr0..cr7 as separate 4 bit registers to > avoid false dependencies, so making the branch take in the whole register > would complicate matters yes those are the two ways to do it: A treat CR as separate 4 bit regs. B treat it as one 64 bit reg scheme A requires 8 regs as input if you need the full CR. 8 inputs from the Dependency Matrices is a bit much. sometimes however you need the full 64 bit. i do have a Dep Matrix strategy for both, it involves a vector grouping system: * both A and B are available, however q request for CR0 read will *also* set a request for the *whole* 64 bits of CR. something like that. however, it is complicated. so for now, i think we just go with full 64 bit. annoying as that is.
hmm hmm i thought perhaps instead of passing nia into all ALUs (where it would then not actually be used - the wires would be routed, taking up space but then not actually go anywhere), to create a separate CompBROpSubset? then, likewise in CompBROpSubset, anything _not_ used in any branch operation can be removed from it. diff --git a/src/soc/alu/alu_input_record.py b/src/soc/alu/alu_input_record.py index 2c3dcc4..41a40eb 100644 --- a/src/soc/alu/alu_input_record.py +++ b/src/soc/alu/alu_input_record.py @@ -13,7 +13,6 @@ class CompALUOpSubset(Record): def __init__(self, name=None): layout = (('insn_type', InternalOp), ('fn_unit', Function), - ('nia', 64), ('imm_data', Layout((("imm", 64), ("imm_ok", 1)))), #'cr = Signal(32, reset_less=True) # NO: this is from the C R SPR #'xerc = XerBits() # NO: this is from the XER SPR
commit d1a2cb9f4852bd8118cb4db043e6622cae077830 (HEAD -> master, origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri May 15 19:58:47 2020 +0100 add branch input record will swap over to that in a mo
oink. you *removed* NIA :) well... we will probably need CompBROpSubset anyway. things like data_len can be removed (i don't see branch needing that), and there may be other fields as well. i would have kiinda expected NIA to be passed in though (to CompBROpSubset), because it comes from the Program Counter, which would be special-cased rather than stored in an SPR. hum hum, don't know. oh. right. NIA is *next* instruction address, not an input. ok so i think we need something called "cia" - current instruction address - in CompBROpSubset. btw do a git pull i just did a minor cleanup
(In reply to Luke Kenneth Casson Leighton from comment #5) > oink. you *removed* NIA :) well... we will probably need CompBROpSubset > anyway. Yep, since nothing but the branch unit used it > hum hum, don't know. oh. right. NIA is *next* instruction address, not an > input. > ok so i think we need something called "cia" - current instruction address - > in > CompBROpSubset. I added CIA to the branch unit's input data because it wasn't one of the things output by the decoder (well NIA is, but it's defaulted to 0), and I needed to set it from the testbench.
ok am done. btw comb Signals default to zero if not set... actually they default to the value in their initialisation - Signal(len, reset=0x5000) - so do not need to be initialised to zero.
(In reply to Michael Nolan from comment #6) > (In reply to Luke Kenneth Casson Leighton from comment #5) > > oink. you *removed* NIA :) well... we will probably need CompBROpSubset > > anyway. > > Yep, since nothing but the branch unit used it good call. > > hum hum, don't know. oh. right. NIA is *next* instruction address, not an > > input. > > ok so i think we need something called "cia" - current instruction address - > > in > > CompBROpSubset. > > I added CIA to the branch unit's input data because it wasn't one of the > things output by the decoder (well NIA is, but it's defaulted to 0), as it's an output - and the CompBROpSubset is intended as as *input* to the pipeline - yes NIA definitely goes. > and I needed to set it from the testbench. yehyeh. we need cia however. hmm hmm... it's not a "register" - there's no delay in fetching it: the branch unit will receive the current PC right there, because that (like the insn) is "known information". yep, it needs to be added to CompBROpSubset. will do that now... commit 4e88a5d7027d5c60093925d38da9b1e63b77f743 (HEAD -> master, origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri May 15 20:33:15 2020 +0100 add cia to CompBROpSubset, remove data_len
argh. comb += branch.p.data_i.ctx.op.eq_from_execute1(pdecode2.e) nope. cia can't be part of BranchCompALUSubset - that's not its role, the ctx.op only stores information decoded from the actual instruction. will revert shortly.
diff --git a/src/soc/branch/pipe_data.py b/src/soc/branch/pipe_data.py index cab43cf..6d97102 100644 --- a/src/soc/branch/pipe_data.py +++ b/src/soc/branch/pipe_data.py @@ -22,10 +22,11 @@ class BranchInputData(IntegerData): # We need both lr and spr for bclr and bcctrl. Bclr can read # from both ctr and lr, and bcctrl can write to both ctr and # lr. <snip> + self.tar = Signal(64, reset_less=True) # Target Address Register I removed the TAR input, because the branch instruction would need to decide which of the 3 inputs to branch to. Instead, it should be fed that operand in the spr input (though that does mean it will be fed ctr twice for bcctr...). Does that seem reasonable?
(In reply to Michael Nolan from comment #10) > I removed the TAR input, because the branch instruction would need to decide > which of the 3 inputs to branch to. Instead, it should be fed that operand > in the spr input (though that does mean it will be fed ctr twice for > bcctr...). Does that seem reasonable? nggghhh. *thinks*, *thinks*.... ok, the way that the Function Units work is: well, let me find a diagram.. https://libre-soc.org/3d_gpu/comp_unit_req_rel.jpg although that's an ALU FU, in the case of a *branch* FU, those inputs would be: * CTR * CIA * CR * TAR basically, those latches at the top "accumulate" all of the inputs needed (some of which as you can see actually come from the operand - this is CompBROpSubset) anything that needs to be read from a regfile - whether it be the SPR one, the Condition Regfile or the INT Regfile - all needs to have those pairs of REQUEST/GO_READ signals. * the outgoing one *requests* the register (actually the regfile port) * the incoming one (GO_READ) demands that the FU latch the incoming register data, right now. it will not get another chance. so the idea of doing time-synchronous feeding of the input data (sharing one of the register inputs for two registers, one after the other) is massively more complicated than simply storing the data in (multiple) Function Unit latches. now, *at the regfile* there may only be one available Reg Port Bus. therefore *over the (one) Reg Port Bus*, it may take two clock cycles: * one to read CTR * one to read TAR and this can be done in any order: CTR first, TAR first, or if 2 Reg Port Buses happen to be available, *both* simultaneously. however that task is handled at the CompUnit (like in the diagram above), precisely so that the pipelines *do not* have to have such complexity. therefore, really, if the branch op needs 2 regs (one CTR, one TAR), then the BranchCompUnit will need 2 latches to cover them, and consequently BranchInputData needs to have both ctr and tar. let's have a look at what was generated by the parser. all the others are subsets (less regs) than this one: @inject() def op_bctarl(self, CTR, CR, TAR, LR): global NIA if mode_is_64bit: M = 0 else: M = 32 if ~BO[2]: CTR = CTR - 1 ctr_ok = BO[2] | ne(CTR[M:64], 0) ^ BO[3] cond_ok = BO[0] | ~(CR[BI + 32] ^ BO[1]) if ctr_ok & cond_ok: NIA = concat(TAR[0:62], SelectableInt(value=0x0, bits=2)) if LK: LR = CIA + 4 return (CTR, CR, TAR, LR,) that's interesting. TAR is returned but i do not see it being modified. that's a bug. NIA however is calculated but is not returned *from* the... oh wait, yes i see, it's a global. ok. got it. so, yes, apologies, we can't use spr for dual-purpose, we need 2 regs, and they might as well be named ctr and tar, rather than spr.
likewise we need LR at the same time. yeah, i know. it's a hell of a lot of incoming wires and registers. * LR 64-bit * CIA 64-bit (48 probably more like?) * CTR 64-bit * TAR 64-bit * CR 32-bit * CompBROpSubset a *lot*. around 64 bits due to the expanded decoding. we just have to tolerate that, and have a *five* input Branch Computation Unit. which has me slightly concerned for adding TRAP to the same pipeline, because that brings in *another* 128 bits (!!) - RA 64-bit, RB 64-bit, plus it modifies the CR so that's another 32 bits on the output side as well. blegh. can't do that. new bugreport, there.
ahhh hang on: just looking at decoder/isa/system.py as well that will take in... either: * CTR for rfsvc * SRR1 for one of the other ops * HSRR1 for another so yes! CTR should remain named "spr" (or better, be named "spr1"). however the 2nd one... ahh op_rfid refers to *both* MSR *and* SRR1 so it should be called "spr2". then, the execution issue will have to * hard-decode the operations to detect which SPR is to go to spr1 and which to spr2 * put those SPR numbers into the SPR-Dependency-Matrix and in this way, the right SPRs will arrive at spr1 and spr2 we should record those somewhere.... hmmm... in the BranchInputData file for now?
waaaa... insn CR SPR1 SPR2 SPR3 ---- -- ---- ---- ---- op_b xx LR xx xx op_ba xx LR xx xx op_bl xx LR xx xx op_bla xx LR xx xx op_bc CR, LR, CTR xx op_bca CR, LR, CTR xx op_bcl CR, LR, CTR xx op_bcla CR, LR, CTR xx op_bclr CR, LR, CTR xx op_bclrl CR, LR, CTR xx op_bcctr CR, LR, CTR xx op_bcctrl CR, LR, CTR xx op_bctar CR, LR, CTR, TAR op_bctarl CR, LR, CTR, TAR op_sc xx xx xx MSR op_scv xx LR, SRR1, MSR op_rfscv xx LR, CTR, MSR op_rfid xx SRR0, SRR1, MSR op_hrfid xx HSRR0, HSRR1, MSR and if we did TRAP in the Branch pipeline as well, that would be *six* incoming 64-bit register latch paths. nooo, i don't think so :) so the above table should guide the numbering. LR - that's not an SPR, is it? if not, then ho hum we should call the field lr_or_spr1 or something horrible. not a huge fan of long data structure names, but hey :) * the BranchCompUnit should expect to read the SPR regfile and allocate lr_or_spr1, spr2, and spr3 accordingly. * BranchInputData should be the target recipient of that data * branch/main_stage should decode the three fields lr_or_spr1, spr2 and spr3 likewise according to that table. for op_bc* that's dead simple: lr = lr_or_spr1, ctr = spr2, tar = spr3 for op_sc and other system calls it's more complex, but doable.
(In reply to Luke Kenneth Casson Leighton from comment #14) > waaaa... > > insn CR SPR1 SPR2 SPR3 > ---- -- ---- ---- ---- > op_b xx LR xx xx > op_ba xx LR xx xx > op_bl xx LR xx xx > op_bla xx LR xx xx > op_bc CR, LR, CTR xx > op_bca CR, LR, CTR xx > op_bcl CR, LR, CTR xx > op_bcla CR, LR, CTR xx > op_bclr CR, LR, CTR xx > op_bclrl CR, LR, CTR xx > op_bcctr CR, LR, CTR xx > op_bcctrl CR, LR, CTR xx > op_bctar CR, LR, CTR, TAR > op_bctarl CR, LR, CTR, TAR > > op_sc xx xx xx MSR > op_scv xx LR, SRR1, MSR > op_rfscv xx LR, CTR, MSR > op_rfid xx SRR0, SRR1, MSR > op_hrfid xx HSRR0, HSRR1, MSR > > and if we did TRAP in the Branch pipeline as well, that would be > *six* incoming 64-bit register latch paths. nooo, i don't think so :) That table is a bit large, at least for the branch instructions. None of the branches except for bclr(l) need to read LR, only write to it. And bclr doesn't need to read TAR, so multiplexing TAR/LR on the same port would be possible. > so the above table should guide the numbering. LR - that's not an SPR, > is it? It is, it's SPR #8 what are the opcodes starting with op_sc? I don't see any of them in the power_enums list, nor do I see corresponding opcodes in the power ISA specification
(In reply to Michael Nolan from comment #15) > > what are the opcodes starting with op_sc? I don't see any of them in the > power_enums list, nor do I see corresponding opcodes in the power ISA > specification NVM, I see where they're from
(In reply to Michael Nolan from comment #15) > (In reply to Luke Kenneth Casson Leighton from comment #14) > > waaaa... > > > > insn CR SPR1 SPR2 SPR3 > > ---- -- ---- ---- ---- > > op_b xx LR xx xx > > op_ba xx LR xx xx > > op_bl xx LR xx xx > > op_bla xx LR xx xx > > op_bc CR, LR, CTR xx > > op_bca CR, LR, CTR xx > > op_bcl CR, LR, CTR xx > > op_bcla CR, LR, CTR xx > > op_bclr CR, LR, CTR xx > > op_bclrl CR, LR, CTR xx > > op_bcctr CR, LR, CTR xx > > op_bcctrl CR, LR, CTR xx > > op_bctar CR, LR, CTR, TAR > > op_bctarl CR, LR, CTR, TAR > > > > op_sc xx xx xx MSR > > op_scv xx LR, SRR1, MSR > > op_rfscv xx LR, CTR, MSR > > op_rfid xx SRR0, SRR1, MSR > > op_hrfid xx HSRR0, HSRR1, MSR > > > > and if we did TRAP in the Branch pipeline as well, that would be > > *six* incoming 64-bit register latch paths. nooo, i don't think so :) > > > That table is a bit large, at least for the branch instructions. None of the > branches except for bclr(l) need to read LR, only write to it. err... nuts. the parser is bringing lr in as an input, when it should not. that's another bug. let me update the table... > And bclr > doesn't need to read TAR, so multiplexing TAR/LR on the same port would be > possible. yep. good catch. > > so the above table should guide the numbering. LR - that's not an SPR, > > is it? > It is, it's SPR #8 okaaay good. > what are the opcodes starting with op_sc? I don't see any of them in the > power_enums list, nor do I see corresponding opcodes in the power ISA > specification (i saw you found them, later comment - decoder/isa/system.py) sooo, iiinteresting: OP_TRAP needs up to 3 SPRs, but OP_B* does not. that is almost worthwhile considering having OP_TRAP and OP_B* in the same pipeline... mm... 5 64-bit datapaths... yyeah no :)
while you're doing the cr pipe i'll sort out spr1/2/3 in branch.
(In reply to Luke Kenneth Casson Leighton from comment #18) > while you're doing the cr pipe i'll sort out spr1/2/3 in branch. done - please do let me know if the submodule is messed up (it shouldn't be). in the unit test it will be necessary to have some detection logic (decoding the op somewhat) in order to correctly set spr1,2,3. this will not be "wasted" code because once written it is exactly what the instruction-issue phase will have to do.
michael is this correct? # Yes, the CTR only counts 32 bits ctr = Signal(64, reset_less=True) comb += ctr.eq(self.i.ctr - 1) that's 64-bit for ctr, and the input self.i.ctr is also 64-bit
if (mode_is_64bit) then M <- 0 else M <- 32 answer's "no". 32-bit in 32-bit mode, 64-bit in 64-bit mode
(In reply to Luke Kenneth Casson Leighton from comment #21) > if (mode_is_64bit) then M <- 0 > else M <- 32 > > answer's "no". 32-bit in 32-bit mode, 64-bit in 64-bit mode Oops. I thought I had seen somewhere that that wasn't the case when I wrote the comment, but decided to double check and forgot to delete the comment.
i looked more closely at the difference between branch, sys and trap, and rheir combination would be a massive EIGHT incoming register ports. * RA, RB, CR, MSR for trap * 2x SPRs, MSR for syscalls * 2x SPRs, no MSR for branches this is too much. i had assumed that MSR was a SPR, it isn't. so the number of SPRs for branch can be reduced to 2. spr3 can be deleted.
ehn? any clues, michael? ``` Traceback (most recent call last): File "fu/branch/test/test_pipe_caller.py", line 134, in run_all sim = Simulator(m) File "/home/lkcl/src/libresoc/nmigen/nmigen/back/pysim.py", line 935, in __init__ self._fragment = Fragment.get(fragment, platform=None).prepare() File "/home/lkcl/src/libresoc/nmigen/nmigen/hdl/ir.py", line 39, in get obj = obj.elaborate(platform) File "/home/lkcl/src/libresoc/nmigen/nmigen/hdl/dsl.py", line 537, in elaborate fragment.add_subfragment(Fragment.get(self._named_submodules[name], platform), name) File "/home/lkcl/src/libresoc/nmigen/nmigen/hdl/ir.py", line 39, in get obj = obj.elaborate(platform) File "/home/lkcl/src/libresoc/nmigen/nmigen/hdl/dsl.py", line 537, in elaborate fragment.add_subfragment(Fragment.get(self._named_submodules[name], platform), name) File "/home/lkcl/src/libresoc/nmigen/nmigen/hdl/ir.py", line 39, in get obj = obj.elaborate(platform) File "/home/lkcl/src/libresoc/soc/src/soc/decoder/power_decoder.py", line 315, in elaborate m = PowerDecoder.elaborate(self, platform) File "/home/lkcl/src/libresoc/soc/src/soc/decoder/power_decoder.py", line 262, in elaborate comb += self.op._eq(row) File "/home/lkcl/src/libresoc/soc/src/soc/decoder/power_decoder.py", line 139, in _eq self.internal_op.eq(InternalOp[row['internal op']]), File "/usr/lib/python3.7/enum.py", line 351, in __getitem__ return cls._member_map_[name] KeyError: 'OP_TDI' ```
nggggh submodules is beginning to piss me off. git silently failed and refused to note that the submodule had been changed. nothing reported by "git diff", nothing reported by "git submodule" - nothing.
added OP_RFID it seemed to be missing?
(In reply to Luke Kenneth Casson Leighton from comment #25) > nggggh submodules is beginning to piss me off. git silently failed and > refused to note that the submodule had been changed. nothing reported > by "git diff", nothing reported by "git submodule" - nothing. Yep... I haven't had that happen to me yet, but it's pissing me off too. I removed the opcodes OP_TDI, OP_TD, OP_TWI, OP_TW and replaced them with OP_TRAP
(In reply to Michael Nolan from comment #27) > (In reply to Luke Kenneth Casson Leighton from comment #25) > > nggggh submodules is beginning to piss me off. git silently failed and > > refused to note that the submodule had been changed. nothing reported > > by "git diff", nothing reported by "git submodule" - nothing. > > Yep... I haven't had that happen to me yet, but it's pissing me off too. We definitely should move to a monorepo
(In reply to Jacob Lifshay from comment #28) > (In reply to Michael Nolan from comment #27) > > Yep... I haven't had that happen to me yet, but it's pissing me off too. > > We definitely should move to a monorepo no. that sends the wrong message, unfortunately [it says, very loudly, "other specific focussed alternative uses for our work not welcome, in the slightest. download our monorepo containing everything you don't want or need, and like it. or Go Home. we don't care"] when you have time, look up the nightmare of freeswitch ignoring standard development practices when it comes to dependency management. the intention of having the csv and isatables in the wiki is to make it clear that they are independently useable *outside* of our project. without requiring gigabyte additional downloads and dependencies which we need but they most certainly do not. really they should be split out of the wiki entirely, to make it *really* clear that they are nothing to do with soc *or* libresoc. it is just submodules pissing me off as i am not used to active development with them. i'll get over it.
(In reply to Michael Nolan from comment #27) > Yep... I haven't had that happen to me yet, but it's pissing me off too. i keep forgetting to do "git submodule update " after every git pull. i think i will write a shell script which does both pull and update. hmm i wonder... maybe it should be added as a git hook > I removed the opcodes OP_TDI, OP_TD, OP_TWI, OP_TW and replaced them with > OP_TRAP i saw. i added OP_RFID to enums. rhat was missing.
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder2.py;h=dbd87f967255f97bbec0b47a2b295780d6381033;hb=e46a1253d05ea47b9e62714ed50bcec6a7530d0f#l299 is that supposed to be a full 32 bit ranged lookup? for branch, the full 32 bit CR is needed and BI looks up ... oh i see. the idea is to use 3 bits of BI to look up CR00 to 7 then in OP_BC look up the remaining 2 bits of BI to select which bit of CRn is to be used, right? that's neat because different branches ysing different CRs would not block on the CR regfile.
(In reply to Luke Kenneth Casson Leighton from comment #31) > the idea is to use 3 bits of BI to look up CR00 to 7 (3 MSBs) > then in OP_BC look up the remaining 2 bits of BI to select which bit of CRn > is to be used, right? (2 LSBs) except, small issue: endian-ness looks like it will be an issue. in main_stage.py the bit-order is reversed using [31-BI] # The bit of CR selected by BI cr_bit = Signal(reset_less=True) comb += cr_bit.eq((self.i.cr & (1<<(31-BI))) != 0) if we are to reduce self.i.cr (BranchInputData.cr) to 4 bits, should that not be comb += cr_bit.eq((self.i.cr & (1<<(3-BI))) != 0) but also, back in power_decoder2.py, for DecodeCRIn, what about here? with m.Case(CRInSel.BI): comb += self.cr_bitfield.data.eq(self.dec.BI[2:5]) comb += self.cr_bitfield.ok.eq(1)
(In reply to Luke Kenneth Casson Leighton from comment #32) > (In reply to Luke Kenneth Casson Leighton from comment #31) > > > the idea is to use 3 bits of BI to look up CR00 to 7 > > (3 MSBs) > > > then in OP_BC look up the remaining 2 bits of BI to select which bit of CRn > > is to be used, right? > > (2 LSBs) Yep, that's the idea! > > except, small issue: endian-ness looks like it will be an issue. > > in main_stage.py the bit-order is reversed using [31-BI] > > # The bit of CR selected by BI > cr_bit = Signal(reset_less=True) > comb += cr_bit.eq((self.i.cr & (1<<(31-BI))) != 0) > > if we are to reduce self.i.cr (BranchInputData.cr) to 4 bits, > should that not be > comb += cr_bit.eq((self.i.cr & (1<<(3-BI))) != 0) > > but also, back in power_decoder2.py, for DecodeCRIn, what about here? > > with m.Case(CRInSel.BI): > comb += self.cr_bitfield.data.eq(self.dec.BI[2:5]) > comb += self.cr_bitfield.ok.eq(1) I think the register file should be designed to address the 8 crs in the correct bit order. If I ask for cr0, I should get cr0, whether cr0 comes from the 4 lsbs of the full register, or the 4 msbs.
(In reply to Michael Nolan from comment #33) > (In reply to Luke Kenneth Casson Leighton from comment #32) > > > then in OP_BC look up the remaining 2 bits of BI to select which bit of CRn > > > is to be used, right? > > > > (2 LSBs) > > Yep, that's the idea! excccellent, muhahaha oopssorry. > I think the register file should be designed to address the 8 crs in the > correct bit order. If I ask for cr0, I should get cr0, whether cr0 comes > from the 4 lsbs of the full register, or the 4 msbs. sigh. i will need help when it comes to that. i will have no clue which way round. basically i believe it may be quite simple... but this is POWER. * the CR regfile will be subdivided into 8 separate latch-banks (not a standard SRAM, at all) * there will be 8 read-enable (and write-enable) lines. * it will be possible to read/write *all* of those simultaneously to obtain the full 32-bit CR. this on a 32-bit data path * it will also be possible to read/write individual 4-bit CRs by a single read/write-enable line. this on a 4-bit data path the DMs take care of ensuring that 32 and 4 bit read/writes are entirely mutually exclusively time-sliced. however for the purposes of this discussion, the LSBs/MSBs i *believe* is as simple as hard-wiring the ordering correctly on the 32-bit path. * all 8 4-bit read/writes *at the regfile*, to produce the full 32-bit CR, will be LSB/MSB *hard-wired* correctly, right there. so with that in mind, would i be correct in thinking that the CR Regfile actually needs to store the data as: CR0: bit 3 2 1 0 CR1: 7 6 5 4 CR2: 11 10 9 8 ?
(In reply to Luke Kenneth Casson Leighton from comment #34) > however for the purposes of this discussion, the LSBs/MSBs i *believe* > is as simple as hard-wiring the ordering correctly on the 32-bit path. > > * all 8 4-bit read/writes *at the regfile*, to produce the full 32-bit CR, > will be LSB/MSB *hard-wired* correctly, right there. > > > so with that in mind, would i be correct in thinking that the CR Regfile > actually needs to store the data as: > > CR0: bit 3 2 1 0 CR1: 7 6 5 4 CR2: 11 10 9 8 > > ? I think it'll actually be CR0: bit 0 1 2 3 CR1: 4 5 6 7 CR2: 8 9 10 11 but it'll need some testing to be sure.
(In reply to Michael Nolan from comment #35) > I think it'll actually be > > CR0: bit 0 1 2 3 CR1: 4 5 6 7 CR2: 8 9 10 11 and.. um.. um... branch main_stage does crbit = cr[3-BI[0:2]] something like that? > but it'll need some testing to be sure. sigh :)
(In reply to Luke Kenneth Casson Leighton from comment #36) > (In reply to Michael Nolan from comment #35) > > > I think it'll actually be > > > > CR0: bit 0 1 2 3 CR1: 4 5 6 7 CR2: 8 9 10 11 > > and.. um.. um... branch main_stage does crbit = cr[3-BI[0:2]] something like > that? Just updated it to use the new interface, and yep that's what it does.
I have a formal proof for branches now, I handled the CRs in a similar way to the CR unit so I know the CR handling is good. (reply at https://bugs.libre-soc.org/show_bug.cgi?id=335#c1)
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/branch/pipe_data.py;h=28099eaa84f065065132208abaa9d8fa4a6f003f;hb=HEAD whoops the regspec for cr needs to be 0:3 range. 4 means "select bit 4 only". ie only a 1 bit wide CR.
looking at putting CR_ISEL into CR pipe, it also uses BC so i checked DecodeCRin and it puts all 5 bits of BC into the reg bitfield. that seems to be an error, i think it should be BC[2:5] like the others i grepped BC in the csv files and at present only BR uses BC in the CRin field.
bctar bcctr and bclr need to be added as unit tests. bclr fails at the moment: File "fu/branch/test/test_pipe_caller.py", line 208, in assert_outputs self.assertEqual(branch_taken, sim_branch_taken, code) AssertionError: 1 != False : bclr 0, 5, 0
michael i saw the change in fu/branch/test_pipe_caller.py to use spr2. it now needs to be read_fast1.data and read_fast2.data, not read_spr1.data and read_spr2.data.
(In reply to Luke Kenneth Casson Leighton from comment #42) > michael i saw the change in fu/branch/test_pipe_caller.py to use spr2. > it now needs to be read_fast1.data and read_fast2.data, > not read_spr1.data and read_spr2.data. How do you think TAR should be handled? Should it be a fast SPR as well?
(In reply to Michael Nolan from comment #43) > (In reply to Luke Kenneth Casson Leighton from comment #42) > > michael i saw the change in fu/branch/test_pipe_caller.py to use spr2. > > it now needs to be read_fast1.data and read_fast2.data, > > not read_spr1.data and read_spr2.data. > > How do you think TAR should be handled? Should it be a fast SPR as well? yes. it's accessed often enough. if you can hack that into DecodeA and DecodeB, such that this is in power_decoder2.py rather than in the unit test, it would be great: insn_type = yield dec2.e.insn_type if insn_type == InternalOp.OP_BCREG.value: xo9 = yield dec2.dec.FormXL.XO[9] xo5 = yield dec2.dec.FormXL.XO[5] if xo9 == 0: yield branch.p.data_i.spr2.eq(sim.spr['LR'].value) elif xo5 == 1: yield branch.p.data_i.spr2.eq(sim.spr['TAR'].value) else: yield branch.p.data_i.spr2.eq(sim.spr['CTR'].value) then, read_fast1 and read_fast2 can be used in both unit tests: fu/compunit/test/test_branch_compunit.py and fu/branch/test/test_pipe_caller.py if you can handle the modifications to power_decoder2.py that would be great. bear in mind that DecodeA reads CTR, (which goes into read_fast1) and DecodeB reads LR or TAR (which goes into read_fast2). so the above test needs to be split.
it's ok i got it, i appreciate you're dropping in on occasion at the moment.
(In reply to Luke Kenneth Casson Leighton from comment #44) > bear in mind that DecodeA reads CTR, (which goes into read_fast1) > and DecodeB reads LR or TAR (which goes into read_fast2). > > so the above test needs to be split. So main_stage.py will need to check whether the instruction is a bcctr and use spr1, otherwise use spr2?
(In reply to Michael Nolan from comment #46) > (In reply to Luke Kenneth Casson Leighton from comment #44) > > > bear in mind that DecodeA reads CTR, (which goes into read_fast1) > > and DecodeB reads LR or TAR (which goes into read_fast2). > > > > so the above test needs to be split. > > So main_stage.py will need to check whether the instruction is a bcctr and > use spr1, otherwise use spr2? sigh yes. although what i was going to do was to just make BCREG use spr2 for CTR/LR/TAR and BC uses spr1=CTR. i believe that may end up being simpler, because main_stage.py would not need to decode XO[5/9]?
(In reply to Luke Kenneth Casson Leighton from comment #44) > yes. it's accessed often enough. Lol, back when I was doing some PowerPC reverse engineering I never saw a bctar. Then again, it was a fairly old chip so TAR may not have been introduced yet. > > then, read_fast1 and read_fast2 can be used in both unit tests: > fu/compunit/test/test_branch_compunit.py > and > fu/branch/test/test_pipe_caller.py This has been done to test_pipe_caller, still need to do the compunit one.
(In reply to Michael Nolan from comment #48) > (In reply to Luke Kenneth Casson Leighton from comment #44) > > > yes. it's accessed often enough. > > Lol, back when I was doing some PowerPC reverse engineering I never saw a > bctar. okok :) put another way: i don't want the hassle of adding an extra regfile port (a slow SPR one) to the branch regspecs :) > > then, read_fast1 and read_fast2 can be used in both unit tests: > > fu/compunit/test/test_branch_compunit.py > > and > > fu/branch/test/test_pipe_caller.py > > This has been done to test_pipe_caller, still need to do the compunit one. the compunit one _should_ just pick it up automatically, because of the use of regspecs. howeverrrr.... there's something fishy going on in test_bc_cr, which is only noticed on the *next* instruction, test_bc_ctr here's test_bc_cr output phase: check extra output 'bc 12, 3, -12720' {} <-- uh-oh... no output regs. test_bc_ctr following that, this - in the output log - is extremely bad: before inputs, rd_rel, wr_rel: 0b1100 0b101 <--- wr_rel *must* be zero here. rd_rel 0 wait HI 0 spr1 0x57b4136e rd_rel 0 wait HI 1 spr1 0x57b4136e rd_rel 2 wait HI 1 cr_a 0x0 rd_rel 3 wait HI 1 cia 0x0 if that occurs, it means that the pipeline has set Data.ok flags but that PowerDecode2 has *NOT* set the corresponding write regs information. or the other way round.
problem is caused by test_bc_reg() - any one of those will cause subsequent tests to fail
found it. when no registers were to be written, MultiCompUnit couldn't cope. this actually happens with some of the branch tests. no change to LR, no change to CTR, no change to NIA (because the branch was not taken): wark-wark. i added yet another hacked-in condition to MCU to get it to test if the ALU was finished.