Bug 742 - MultiCompUnit wrmask relies on sync ok
Summary: MultiCompUnit wrmask relies on sync ok
Status: CONFIRMED
Alias: None
Product: Libre-SOC's second ASIC
Classification: Unclassified
Component: Milestones (show other bugs)
Version: unspecified
Hardware: Other Linux
: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 690
  Show dependency treegraph
 
Reported: 2021-11-06 12:26 GMT by Luke Kenneth Casson Leighton
Modified: 2021-11-07 15:42 GMT (History)
2 users (show)

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2021-11-06 12:26:37 GMT
the creation of wrmask was found to rely on the data and ok signals being
set and left set, rather than snapshot-captured.
Comment 1 Cesar Strauss 2021-11-06 13:06:45 GMT
On https://bugs.libre-soc.org/show_bug.cgi?id=336#c75, I said:

"It seems that MultiCompUnit depends on the ALU *.ok output fields being held valid long enough into the write phase. In the test ALU, I tried holding *.ok valid just for the duration of the ALU valid_o, and it didn't work. I ended up instead directly decoding the input operation, combinatorially, into the *.ok, so they do stay valid during the entire instruction cycle."

Maybe related?
Comment 2 Luke Kenneth Casson Leighton 2021-11-06 13:14:05 GMT
http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-November/004086.html

https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=809cf2faa4450901779045cfaa89e69f70ed9f42

(In reply to Cesar Strauss from comment #1)
> On https://bugs.libre-soc.org/show_bug.cgi?id=336#c75, I said:
> 
> "It seems that MultiCompUnit depends on the ALU *.ok output fields being
> held valid long enough into the write phase. In the test ALU, I tried
> holding *.ok valid just for the duration of the ALU valid_o, and it didn't
> work. I ended up instead directly decoding the input operation,
> combinatorially, into the *.ok, so they do stay valid during the entire
> instruction cycle."
> 
> Maybe related?

sounds right.

the issue is that wrmask goes directly into the write release, and it should
have triggered the relevant latches at (and only at) the time that the ok
occurs, which, of course, is when the n.i_valid occurs.

that the ok has to be set and stay set *after* i_valid is a serious problem
Comment 3 Luke Kenneth Casson Leighton 2021-11-06 13:20:56 GMT
as expected, wrmask which comes from all the "ok" signals
which are only supposed to be valid when n.i_valid is set,
is used on the write request latches:

 256         # dest operand latch (not using issue_i)
 257         m.d.sync += req_l.s.eq(alu_pulsem & self.wrmask)
 258         m.d.sync += req_l.r.eq(reset_w | prev_wr_go)

however this does not look right:

 361         # write-release gated by busy and by shadow (and write-mask)
 362         brd = Repl(self.busy_o & self.shadown_i, self.n_dst)
 363         m.d.comb += self.wr.rel_o.eq(req_l.q & brd & self.wrmask)

wrmask gating with the wr.rel_o output is most likely the bug,
it says that wrmask has to be sustained (sync'd) and it is redundant
anyway.
Comment 4 Luke Kenneth Casson Leighton 2021-11-07 12:30:12 GMT
removing these seems to be ok:

--- a/src/soc/experiment/compalu_multi.py
+++ b/src/soc/experiment/compalu_multi.py
@@ -215,11 +215,9 @@ class MultiCompUnit(RegSpecALUAPI, Elaboratable):
         # is enough, when combined with when read-phase is done (rst_l.q)
         wr_any = Signal(reset_less=True)
         req_done = Signal(reset_less=True)
-        m.d.comb += self.done_o.eq(self.busy_o &
-                                   ~((self.wr.rel_o & ~self.wrmask).bool()))
+        m.d.comb += self.done_o.eq(self.busy_o & ~(self.wr.rel_o).bool())
         m.d.comb += wr_any.eq(self.wr.go_i.bool() | prev_wr_go.bool())
-        m.d.comb += req_done.eq(wr_any & ~self.alu.n.i_ready &
-                                ((req_l.q & self.wrmask) == 0))
+        m.d.comb += req_done.eq(wr_any & ~self.alu.n.i_ready & (req_l.q == 0))
         # argh, complicated hack: if there are no regs to write,
         # instead of waiting for regs that are never going to happen,
         # we indicate "done" when the ALU is "done"
@@ -360,7 +358,7 @@ class MultiCompUnit(RegSpecALUAPI, Elaboratable):
 
         # write-release gated by busy and by shadow (and write-mask)
         brd = Repl(self.busy_o & self.shadown_i, self.n_dst)
-        m.d.comb += self.wr.rel_o.eq(req_l.q & brd & self.wrmask)
+        m.d.comb += self.wr.rel_o.eq(req_l.q & brd)
 
         # output the data from the latch on go_write
         for i in range(self.n_dst):

the reasons why it's ok is likely because:

* wrmask is the trigger (ANDed) into req_l setting)
* wr.rel_o ANDed with wrmask therefore should never be done
  (and is redundant)
* almost all of the places removed are (wr.rel_o & wrmask)

the ones that i am leaving in are the "complicated hack":

        with m.If((self.wrmask == 0) &
                  self.alu.n.i_ready & self.alu.n.o_valid & self.busy_o):
            m.d.comb += req_done.eq(1)

this is a special indicator which detects when the incoming ready/valid
pulse is valid, which is supposed to be the only time when wrmask (the
Cat() of all data ok signals) is valid.

and when all data ok signals are zero, at the *exact* moment of the
incoming ready/valid, that indicates that there are no registers to
write, so there is no waiting.

now, on the face of it, it looks like wr_any covers this:

        m.d.comb += wr_any.eq(self.wr.go_i.bool() | prev_wr_go.bool())
        m.d.comb += req_done.eq(wr_any & ~self.alu.n.i_ready & (req_l.q == 0))

why would you have to request req_done to cancel the request when, clearly,
wr.go_i should have no requests, right?

wrong.

the special "hack" covers the case where the *ALU* decides that it has
no need to write to registers.

this is for instances where XER or CR for example (XER.so in particular)
writing is determined *by the ALU* that it has no need to write to XER.so,
saving on a reg write.

but, it was not possible to determine that before giving the ALU the
*opportunity* to write to the XER.so (or other register)

commit 0b16b46191e24089916b902a1512902c87fd782
Comment 5 Luke Kenneth Casson Leighton 2021-11-07 13:10:23 GMT
ha, that's good: that's fixed ReservationStations2, it can now set the
data combinatorial on a single pulse rather than being forced to set it
for multiple cycles.

Cesar would you like to have a look, make sure you're happy?
i am just adding no-wait states to ReservationStations2.