Bug 726 - Additional core_stop check after Execute breaks single-stepping
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Linux
Importance: High major
Assignee: Cesar Strauss
URL: https://libre-soc.org/irclog/latest.l...
Depends on:
Blocks: 737
Reported: 2021-10-12 22:34 BST by Cesar Strauss
Modified: 2021-12-25 03:00 GMT

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Cesar Strauss 2021-10-12 22:34:51 BST
Executing:

1) python ~/src/soc/src/soc/simple/issuer_verilog.py --disable-svp64 --debug=dmi ~/src/soc/src/soc/litex/florent/libresoc/libresoc.v

2) python ~/src/soc/src/soc/litex/florent/sim.py --debug --variant=standard

... simulates the Libre-SOC core, with an embedded FSM single-stepping it, controlled by DMI.

Right now, one out of every two DMI single-step commands does not actually execute an instruction, deterministically.

Since we may want to stop the core in the middle of a VL loop, I have put another core stop check after Execute. Together with the check before Fetch, that's two core stop checks in a row.

What I didn't anticipate was core_stop being pulsed low for a single-step. Since core_stop goes high again immediately, the second of the two checks (the one before Fetch) catches it, and execution does not resume.
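
For illustration, here is a toy cycle-level model of the sequence described above. It is only a sketch, not the actual issuer FSM: the state names, the one-cycle width of the core_stop pulse and the number of settle cycles are all assumptions, and `check_after_execute=False` simply drops the second check, which is not necessarily how a real fix would look.

```python
def run_steps(num_steps, check_after_execute=True, settle_cycles=8):
    """Return the cumulative count of executed instructions after each
    single-step command (hypothetical model, not the real FSM)."""
    state = "WAIT_BEFORE_FETCH"   # core starts stopped at the pre-Fetch check
    executed = 0
    history = []
    for _ in range(num_steps):
        # One single-step command: core_stop low for one cycle (assumed),
        # then high again while the FSM settles.
        for core_stop in [False] + [True] * settle_cycles:
            if state == "WAIT_BEFORE_FETCH":
                if not core_stop:
                    state = "FETCH"
            elif state == "FETCH":
                state = "EXECUTE"
            elif state == "EXECUTE":
                executed += 1
                if check_after_execute:
                    state = "WAIT_AFTER_EXECUTE"
                else:
                    state = "WAIT_BEFORE_FETCH"
            elif state == "WAIT_AFTER_EXECUTE":
                # The pulse is consumed here; by the time the FSM reaches
                # the pre-Fetch check, core_stop is high again.
                if not core_stop:
                    state = "WAIT_BEFORE_FETCH"
        history.append(executed)
    return history


print(run_steps(4))                             # two checks in a row
print(run_steps(4, check_after_execute=False))  # only the pre-Fetch check
```

In this model, with both checks enabled, every second single-step command only moves the FSM from the post-Execute check to the pre-Fetch check, without executing anything.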

Unfortunately it seems likely that this bug ended up on the chip. The additional core_stop check after Execute was not conditional on --svp64.

These are the tasks as I see them:

1) Make a test-case that catches this regression
2) Fix the FSM to avoid the issue
3) Document the present behavior of the test chip
4) Develop and test mitigations for testing the chip

In principle, running two DMI single step commands in a row should work around this problem on the chip.
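
A rough sketch of why the doubled command should help, using a toy model of the FSM with both stop checks. Everything here is hypothetical: `SteppedCore`, `dmi_single_step` and `step_with_workaround` are illustrative names, not part of the soc codebase or of any DMI tool, and the one-cycle pulse is an assumption.

```python
class SteppedCore:
    """Toy FSM with stop checks before Fetch and after Execute (assumed)."""

    def __init__(self, state="WAIT_BEFORE_FETCH"):
        self.state = state
        self.executed = 0

    def clock(self, core_stop):
        """Advance the model by one clock cycle."""
        if self.state == "WAIT_BEFORE_FETCH":
            if not core_stop:
                self.state = "FETCH"
        elif self.state == "FETCH":
            self.state = "EXECUTE"
        elif self.state == "EXECUTE":
            self.executed += 1
            self.state = "WAIT_AFTER_EXECUTE"
        elif self.state == "WAIT_AFTER_EXECUTE":
            if not core_stop:
                self.state = "WAIT_BEFORE_FETCH"

    def dmi_single_step(self, settle_cycles=8):
        """Model one DMI single-step: core_stop low for one cycle, then high."""
        self.clock(core_stop=False)
        for _ in range(settle_cycles):
            self.clock(core_stop=True)


def step_with_workaround(core):
    """Issue two single-step commands; return how many instructions ran."""
    before = core.executed
    core.dmi_single_step()
    core.dmi_single_step()
    return core.executed - before


core = SteppedCore()
print([step_with_workaround(core) for _ in range(3)])
```

In this model, a pair of commands executes exactly one instruction regardless of which of the two checks the core was stopped at. Whether the real chip behaves the same way still needs to be confirmed by testing.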

A side effect is that, after randomly stopping the core, the PC read by DMI may or may not point to the next instruction, depending on whether the last executed instruction updated the PC and whether the core stopped at the check after Execute.

Too bad about the chip. Let's hope the workaround actually works in practice, and doesn't impact testing by much. Sorry about this.
Comment 1 Cesar Strauss 2021-12-23 11:57:03 GMT
FSM fixed in https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=a1639b39fcd13094c9469ce677f0265fd8d0fea2

Single-stepping into a VL loop no longer works, but at least DMI single-step works again, as intended, for regular cases.

Please check. Sorry for the inconvenience.
Comment 2 Luke Kenneth Casson Leighton 2021-12-25 03:00:54 GMT
works great cesar, i was able to get a verilator dump
of all instructions and regs, and tracked down the
anomaly compared to microwatt mmu. thank you