the creation of wrmask was found to rely on the ALU's data and ok signals being set and *left* set, rather than being snapshot-captured at the one moment they are actually valid.
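the difference can be shown with a minimal plain-Python model (not the actual nmigen code; function names here are invented for illustration). the ok bits are asserted for a single cycle, coincident with the valid pulse:

```python
def level_sensitive(ok_stream, valid_stream):
    """wrmask simply mirrors ok every cycle: only correct while ok is held."""
    return [ok for ok, _ in zip(ok_stream, valid_stream)]

def snapshot_captured(ok_stream, valid_stream):
    """wrmask is captured into a register at the valid pulse and held
    until the next capture, so ok may drop the very next cycle."""
    wrmask = 0
    out = []
    for ok, valid in zip(ok_stream, valid_stream):
        if valid:          # capture at (and only at) the valid pulse
            wrmask = ok
        out.append(wrmask)
    return out

# ok asserted for one cycle only, at the same time as the valid pulse
ok    = [0b11, 0b00, 0b00, 0b00]
valid = [1,    0,    0,    0]

print(level_sensitive(ok, valid))    # [3, 0, 0, 0] - lost after the pulse
print(snapshot_captured(ok, valid))  # [3, 3, 3, 3] - survives write phase
```

with snapshot capture, the ALU is free to drop its ok outputs immediately after the handshake, instead of being forced to sustain them into the write phase.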
On https://bugs.libre-soc.org/show_bug.cgi?id=336#c75, I said: "It seems that MultiCompUnit depends on the ALU *.ok output fields being held valid long enough into the write phase. In the test ALU, I tried holding *.ok valid just for the duration of the ALU valid_o, and it didn't work. I ended up instead directly decoding the input operation, combinatorially, into the *.ok, so they do stay valid during the entire instruction cycle." Maybe related?
http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-November/004086.html
https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=809cf2faa4450901779045cfaa89e69f70ed9f42

(In reply to Cesar Strauss from comment #1)
> On https://bugs.libre-soc.org/show_bug.cgi?id=336#c75, I said:
>
> "It seems that MultiCompUnit depends on the ALU *.ok output fields being
> held valid long enough into the write phase. In the test ALU, I tried
> holding *.ok valid just for the duration of the ALU valid_o, and it didn't
> work. I ended up instead directly decoding the input operation,
> combinatorially, into the *.ok, so they do stay valid during the entire
> instruction cycle."
>
> Maybe related?

sounds right. the issue is that wrmask goes directly into the write release, when it should instead have triggered the relevant latches at (and only at) the time that the ok occurs, which, of course, is when n.i_valid occurs. that the ok has to be set and stay set *after* i_valid is a serious problem.
as expected, wrmask, which comes from all the "ok" signals (which are only supposed to be valid when n.i_valid is set), is used on the write request latches:

    256         # dest operand latch (not using issue_i)
    257         m.d.sync += req_l.s.eq(alu_pulsem & self.wrmask)
    258         m.d.sync += req_l.r.eq(reset_w | prev_wr_go)

however this does not look right:

    361         # write-release gated by busy and by shadow (and write-mask)
    362         brd = Repl(self.busy_o & self.shadown_i, self.n_dst)
    363         m.d.comb += self.wr.rel_o.eq(req_l.q & brd & self.wrmask)

gating the wr.rel_o output with wrmask is most likely the bug: it requires wrmask to be sustained (sync'd), and it is redundant anyway.
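why the extra AND is redundant can be shown with a hypothetical bit-level model of the request latch (plain Python, not nmigen): because wrmask is already ANDed into req_l.s, any bit where wrmask was 0 can never become 1 in req_l.q, so ANDing wrmask again on wr.rel_o changes nothing — *provided* wrmask is still valid at that point, which is exactly the sustain requirement being complained about.

```python
def sr_latch_step(q, s, r):
    """One sync step of a set/reset latch, reset dominating (a sketch of
    the SRLatch behaviour assumed here, not the real nmutil code)."""
    return (q | s) & ~r

def wr_rel(q, busy, shadown, wrmask=None):
    """Write-release: req_l.q gated by busy & shadow, optionally by wrmask."""
    rel = q if (busy and shadown) else 0
    return rel & wrmask if wrmask is not None else rel

wrmask = 0b01                             # only dest 0 needs writing
alu_pulse = 0b11                          # pulse arrives on both bits
q = sr_latch_step(0, alu_pulse & wrmask, 0)   # req_l.s.eq(pulse & wrmask)

print(bin(q))                                     # 0b1: masked bit never set
print(wr_rel(q, 1, 1) == wr_rel(q, 1, 1, wrmask)) # True: extra AND redundant
```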
removing these seems to be ok:

    --- a/src/soc/experiment/compalu_multi.py
    +++ b/src/soc/experiment/compalu_multi.py
    @@ -215,11 +215,9 @@ class MultiCompUnit(RegSpecALUAPI, Elaboratable):
             # is enough, when combined with when read-phase is done (rst_l.q)
             wr_any = Signal(reset_less=True)
             req_done = Signal(reset_less=True)
    -        m.d.comb += self.done_o.eq(self.busy_o &
    -                                   ~((self.wr.rel_o & ~self.wrmask).bool()))
    +        m.d.comb += self.done_o.eq(self.busy_o & ~(self.wr.rel_o).bool())
             m.d.comb += wr_any.eq(self.wr.go_i.bool() | prev_wr_go.bool())
    -        m.d.comb += req_done.eq(wr_any & ~self.alu.n.i_ready &
    -                                ((req_l.q & self.wrmask) == 0))
    +        m.d.comb += req_done.eq(wr_any & ~self.alu.n.i_ready & (req_l.q == 0))
             # argh, complicated hack: if there are no regs to write,
             # instead of waiting for regs that are never going to happen,
             # we indicate "done" when the ALU is "done"
    @@ -360,7 +358,7 @@ class MultiCompUnit(RegSpecALUAPI, Elaboratable):
             # write-release gated by busy and by shadow (and write-mask)
             brd = Repl(self.busy_o & self.shadown_i, self.n_dst)
    -        m.d.comb += self.wr.rel_o.eq(req_l.q & brd & self.wrmask)
    +        m.d.comb += self.wr.rel_o.eq(req_l.q & brd)
             # output the data from the latch on go_write
             for i in range(self.n_dst):

the reasons why it's ok are likely because:

* wrmask is the trigger (ANDed) into req_l's set input
* ANDing wr.rel_o with wrmask therefore changes nothing (it is redundant)
* almost all of the places removed are (wr.rel_o & wrmask)

the one that i am leaving in is the "complicated hack":

    with m.If((self.wrmask == 0) & self.alu.n.i_ready &
              self.alu.n.o_valid & self.busy_o):
        m.d.comb += req_done.eq(1)

this is a special indicator which detects when the incoming ready/valid pulse occurs, which is supposed to be the only time when wrmask (the Cat() of all data ok signals) is valid. when all data ok signals are zero at the *exact* moment of the incoming ready/valid, that indicates that there are no registers to write, so there is no waiting.
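the no-wait condition boils down to a simple combinatorial test, sketched here in plain Python (names mirror the diff above, but this is only a model):

```python
def req_done_hack(wrmask, n_i_ready, n_o_valid, busy):
    """Fire req_done immediately when, at the exact ready/valid handshake
    moment, no ok bits are set: there are no registers to wait for."""
    return wrmask == 0 and n_i_ready and n_o_valid and busy

# ALU result valid, downstream ready, unit busy, but no ok bits set:
print(req_done_hack(0b00, True, True, True))   # True: done right away
# a register *does* need writing: must go through the normal req_l path
print(req_done_hack(0b10, True, True, True))   # False
```

the key point is that this is the one place where sampling wrmask is legitimate, because it is evaluated precisely at the handshake moment when wrmask is defined to be valid.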
now, on the face of it, it looks like wr_any covers this:

    m.d.comb += wr_any.eq(self.wr.go_i.bool() | prev_wr_go.bool())
    m.d.comb += req_done.eq(wr_any & ~self.alu.n.i_ready & (req_l.q == 0))

why would you have to raise req_done to cancel the request when, clearly, wr.go_i should have no requests, right? wrong. the special "hack" covers the case where the *ALU* decides that it has no need to write to registers. this is for instances involving XER or CR, for example (XER.so in particular), where the *ALU* determines that it has no need to write to XER.so, saving a reg write. but it was not possible to determine that before giving the ALU the *opportunity* to write to XER.so (or whichever register).

commit 0b16b46191e24089916b902a1512902c87fd782
ha, that's good: that's fixed ReservationStations2, it can now set the data combinatorially on a single pulse rather than being forced to hold it for multiple cycles. Cesar, would you like to have a look, make sure you're happy? i am just adding no-wait states to ReservationStations2.