Bug 879 - Formal proof of soc.experiment.compalu_multi.MultiCompUnit needed
Summary: Formal proof of soc.experiment.compalu_multi.MultiCompUnit needed
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Formal Verification (show other bugs)
Version: unspecified
Hardware: PC Linux
: High enhancement
Assignee: Cesar Strauss
Depends on:
Blocks: 197
  Show dependency treegraph
Reported: 2022-07-01 14:10 BST by Cesar Strauss
Modified: 2022-11-14 17:49 GMT (History)
2 users (show)

See Also:
NLnet milestone: NLNet.2019.10.032.Formal
total budget (EUR) for completion of task and all subtasks: 2500
budget (EUR) for this task, excluding subtasks' budget: 2500
parent task for budget allocation: 197
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Note You need to log in before you can comment on or make changes to this bug.
Description Cesar Strauss 2022-07-01 14:10:27 BST
In short, MultiCompUnit:

1) stores an opcode from Issue, when not "busy", and "issue" is pulsed
2) signals "busy" high
3) fetches its operand(s), if any (which are not masked or zero) from the Scoreboard (REQ/REL protocol)
4) starts the ALU (ready/valid protocol), as soon as all inputs are available
5) captures result from ALU (again ready/valid)
5) sends the result(s) back to the Scoreboard (again REQ/REL)
6) pulses "done", and drops "busy"

Note that, if the conditions are right, many of the above can occur together, on a single cycle.

The formal proof involves ensuring that:
1) the ALU gets the right opcode from Issue
2) the ALU gets the right operands from the Scoreboard
3) the Scoreboard receives the right result from the ALU
4) no transactions are dropped or repeated
Comment 1 Cesar Strauss 2022-10-12 19:57:14 BST
The work done on the formal proof of MultiCompUnit is here:

It did caught issues and subtle bugs, unseen so far, which managed to keep hidden from our current unit tests.

The most serious bug is improper assertion of rel_o, for a masked operand, when the previous instruction had zero_a or imm_ok. This is due to the src_l latch currently not being reset, on issue.

This particular sequence was not present on the unit tests. In fact, there were already tests, which tested each of zero_a, imm_ok, and masked operands. They were just not adjacent to each other... Sure enough, I moved the tests to be next to each other, and it did reproduce the bug!

I guess the unexpected operand read (which was thrown away) did not affect results, so far.

Another bug found is the req_l latch not being clear on system reset. Even if the holding register of SRLatch is really reset-less, it is still reset at startup, because the latch reset port has reset=1 (https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/latch.py;h=e2d7541d396fa7468668f4bd903346abd504a94f;hb=8aa377705c96f365606950c0fa142e40546e566b#l69). But, this only works if the latch reset port is driven by the sync domain. In case of req_l, it's driven by the comb domain, so the latch is not reset on system reset.

A minor issue is, on handshakes, to assume the other end only sends pulses when requested/allowed. For instance, MultiCompUnit is not resilient to:
1) Receiving a new issue pulse when busy is already high
2) Receiving a go signal when rel is low

If desired, just replace the use of pulses with the appropriate AND of the handshake signals.

All in all, I consider the formal verification work a success in catching subtle bugs in MultiCompUnit.
Comment 2 Luke Kenneth Casson Leighton 2022-10-12 20:22:54 BST
(In reply to Cesar Strauss from comment #1)

> All in all, I consider the formal verification work a success in catching
> subtle bugs in MultiCompUnit.

fantastic, cesar, we should definitely consider one for LDSTCompUnit,
in the next grant.