Bug 812 - invalid access to 0x0 on startup leads to core hang
Summary: invalid access to 0x0 on startup leads to core hang
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: Other Other
: --- major
Assignee: tpearson
URL:
Depends on:
Blocks: 813
  Show dependency treegraph
 
Reported: 2022-04-16 08:31 BST by tpearson
Modified: 2023-08-02 18:57 BST (History)
2 users (show)

See Also:
NLnet milestone: NLnet.2019.02.012
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments
PC async reset fix (1.14 KB, patch)
2022-04-16 19:47 BST, tpearson
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description tpearson 2022-04-16 08:31:03 BST
With the current ls2/soc/nmigen versions, the CPU core reset appears to be overridden by the default nmigen soc reset, and both resets appear to be released at the same time.

This has the net effect of holding the CPU PC at 0x0 instead of <RESET_VECTOR> during the reset pulse.  When the CPU reset and SoC resets are released simultaneously, the ICache sees the 0x0 PC for an instant and starts loading a cache line from 0x0.  This locks up the bus since nothing responds to the requests for 0x0, 0x8, etc.

I am working on staggering the main rst pulse and the CPU reset pulse to fix this problem.
Comment 1 Luke Kenneth Casson Leighton 2022-04-16 11:41:48 BST
(In reply to tpearson from comment #0)

> With the current ls2/soc/nmigen versions, the CPU core reset appears to be
> overridden by the default nmigen soc reset, and both resets appear to be
> released at the same time.

sorry, something undocumented and i forgot to tell you

use "make microwatt_external_core_bram" back in soc to get it to create
a version of external_core_top.v that has its reset at 0xff00_0000
or "make microwatt_external_core_spi" if you want one that has its reset
at 0xf000_0000
 
> I am working on staggering the main rst pulse and the CPU reset pulse to fix
> this problem.

the above make targets should in theory ensure that doesn't happen.

look in external_core_top.v for this:

          \dec2_cur_pc$next  = 64'h00000000ff000000;
          \dec2_cur_msr$next  = 64'h8000000000000001;

and other occurrences of h0000000f0000000
Comment 2 Luke Kenneth Casson Leighton 2022-04-16 11:49:51 BST
there's higher priority items than this one.

a "workaround" is to put everything back to boot at 0x0000_0000

* set fw_addr=0x0000_0000
* in coldboot/Makefile set BOOT_MEM_BASE=0x00000000
* change this to 0x4000_0000       ddr_addr=0x00000000,      # DRAM_BASE

* make clean and make on coldboot
* in soc use "make microwatt_external_core"

that will get you back to booting at 0x0000_0000 with the
firmware block loaded into BRAM at 0x0000_0000

the plan has always been to move these to config files
(or a python module that can be named on the commandline)
and to hook them into the dynamic pinmux system (adding
JTAG, GPIO and Pin Multiplexing in one single hit)

but actually getting the peripherals up and running *at all*
has taken priority over getting that done.

if anything is a priority it would be to do that runtime
config, as a JSON file, as a mini task under bug #50
Comment 3 Luke Kenneth Casson Leighton 2022-04-16 12:52:36 BST
(In reply to tpearson from comment #0)

> I am working on staggering the main rst pulse and the CPU reset pulse to fix
> this problem.

took a look: it's a mess, inside TestIssuer.  i added a state to the
Fetch FSM called "pre-idle" and it's not properly stopped.  there's
supposed to be a delay-count and it's running out too early.
Comment 4 Luke Kenneth Casson Leighton 2022-04-16 13:41:50 BST
still a mess but a working mess.

added an extra reset delay counter to ECP5CRG which is *separate* from
the 25 mhz "init" clock domain.

this allows different reset releases which is ok.

better design would be to trigger them in a cascade. later.

also added in a single cycle delay *in the core* which gives time
for reset pc to get into the fetch FSM.
Comment 5 Luke Kenneth Casson Leighton 2022-04-16 15:46:29 BST
Tim if you're adding into openpower-isa the only reason i can think
of why you'd do that is to add to PowerDecoder2, which is the wrong
place to add a pc-reset signal.

PowerDecoder2 is used not only by the SoC but also by the Power ISA
Simulator (which is why it's in openpower-isa, not soc).

the DMI Interface must also perform an over-ride of the PC, by putting
the core into "stopped", writing to the PC, then letting it carry on
(or single-stepping).

it's all quite hairy with a hell of a lot going on so has to be done
with some care, otherwise other things start breaking (such as DMI)
Comment 6 tpearson 2022-04-16 19:47:49 BST
Created attachment 161 [details]
PC async reset fix

OK so this is what I'd come up with in parallel, it's pretty minimal as far as changes and I think we've both kinda figured out the same root cause.

The openpower-isa change is a "nice to have", right now SoC reset forces the PC to 0x0 which I think is incorrect.  The patch to fix that (no access to push a branch) is attached.

For the other reset / WB address issues, here's the fixes I came up with.  Working in hardware and simulation now:

https://git.libre-soc.org/?p=soc.git;a=commit;h=95348f69a936ad4dbf02f02f647613d302d3ec14

https://git.libre-soc.org/?p=soc.git;a=commit;h=b25bcf06665b246515186a10b954dcea24df5bf4

https://git.libre-soc.org/?p=soc.git;a=commit;h=c1a38c32b850f96c0d83da0a5cd3c04285271147