Bug 855 - add libre-soc to kestrel
Summary: add libre-soc to kestrel
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's second ASIC
Classification: Unclassified
Component: source code
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: tpearson
URL:
Depends on: 859
Blocks: 850
 
Reported: 2022-06-14 21:00 BST by Luke Kenneth Casson Leighton
Modified: 2023-03-06 19:17 GMT (History)

See Also:
NLnet milestone: NGI.POINTER.Gigabit.ASIC
total budget (EUR) for completion of task and all subtasks: 40000
budget (EUR) for this task, excluding subtasks' budget: 40000
parent task for budget allocation: 850
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
red={amount=40000,paid=2022-12-16}


Attachments
Initial Web server functioning (73.27 KB, image/png)
2022-07-05 19:58 BST, tpearson

Description Luke Kenneth Casson Leighton 2022-06-14 21:00:55 BST
add libre-soc as a direct drop-in replacement for microwatt.
no software modifications. only hardware modifications to
ensure that libresoc.v is absolutely identical functionality
to microwatt.v in every way necessary

https://gitlab.raptorengineering.com/dormito/pythondata-cpu-microwatt/-/commit/d491878d1f9af2df6345a956f38976447958d5f7
Comment 1 Luke Kenneth Casson Leighton 2022-06-16 19:41:04 BST

    
Comment 2 tpearson 2022-06-16 19:45:01 BST
I can start work on this as soon as the SPR in bug 859 is implemented.  The end goal will be the Zephyr RTOS variant of Kestrel running on LibreSoC using the exact same (binary match) firmware image as the Microwatt version.
Comment 3 tpearson 2022-06-16 20:00:28 BST
Also wanted to mention that Zephyr image is the same one that has a full Web server integrated into it, and initial Redfish support, so it will easily show working Gigabit Ethernet.

The hardware target is a single Arctic Tern module plugged into the BMC carrier card.
Comment 4 tpearson 2022-07-02 18:44:08 BST
Getting close.  Still chasing down some bugs in interrupt handling...

====================================================
    __ __          __            __
   / //_/__  _____/ /_________  / /
  / ,< / _ \/ ___/ __/ ___/ _ \/ /
 / /| /  __(__  ) /_/ /  /  __/ /
/_/_|_\___/____/\__/_/ __\___/_/ _________
  / ___/____  / __/ /_/ __ )/  |/  / ____/
  \__ \/ __ \/ /_/ __/ __  / /|_/ / /
 ___/ / /_/ / __/ /_/ /_/ / /  / / /___
/____/\____/_/  \__/_____/_/  /_/\____/

====================================================

 (c) Copyright 2020-2022 Raptor Engineering, LLC
 (c) Copyright 2012-2020 Enjoy-Digital
 (c) Copyright 2007-2015 M-Labs

 BIOS built on Jul  2 2022 02:38:29
 BIOS CRC passed (23f2a5dc)

 Migen git sha1: 5d8ad08
 LiteX git sha1: 7495d92c

--=============== SoC ==================--
CPU:            LibreSoC @ 60MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            8-bit data
ROM:            52KiB
SRAM:           8KiB
L2:             8KiB
SDRAM:          1048576KiB 32-bit @ 240MT/s (CL-6 CWL-5)

--========== Initialization ============--
Ethernet init...
Initializing SDRAM @0x40000000...
Switching SDRAM to software control
Comment 5 Luke Kenneth Casson Leighton 2022-07-02 18:53:34 BST
(In reply to tpearson from comment #4)
> Getting close.

dang!

>  Still chasing down some bugs in interrupt handling...

don't be too surprised if the problem disappears at 55 or 50 mhz

                             vv
> CPU:            LibreSoC @ 60MHz
                             ^^

also do bear in mind i added KAIVB to *everything*.
external interrupts, DEC/TB timer, everything: all
exceptions go via the same codepath, and sc is also
considered a type of exception.
Comment 6 tpearson 2022-07-02 18:55:34 BST
(In reply to Luke Kenneth Casson Leighton from comment #5)
> (In reply to tpearson from comment #4)
> > Getting close.
> 
> dang!
> 
> >  Still chasing down some bugs in interrupt handling...
> 
> don't be too surprised if the problem disappears at 55 or 50 mhz
> 
>                              vv
> > CPU:            LibreSoC @ 60MHz
>                              ^^

Yeah, I know. :)  I have ... reasons ... for that setting, but will be cranking it back down at some point.

> 
> also do bear in mind i added KAIVB to *everything*.
> external interrupts, DEC/TB timer, everything: all
> exceptions go via the same codepath, and sc is also
> considered a type of exception.

Understood.
Comment 7 tpearson 2022-07-03 01:15:17 BST
OK, so it's not the clock frequency, in simulation the interrupts are never getting enabled even though the interrupt controller is set up correctly.

Digging further, it looks like when the EE bit of the MSR is set, this never gets propagated to the actual register store.  This is the bit of assembler that's supposed to turn the interrupts on:

a6 00 20 7d     mfmsr   r9
00 80 29 61     ori     r9,r9,32768
64 01 20 7d     mtmsrd  r9

In practice, this sets r9 to 0x8000 and tries to run mtmsrd.  The instruction starts to execute and does in fact set the core.state.MSR4 data field correctly, but for some reason the write enable for MSR1 is then fired instead of the write enable for MSR4.  Needless to say, the operational MSR is unchanged at 0x0 and the pending external interrupts are ignored.

Thoughts?
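
As a cross-check on the three instruction words listed above, here is a minimal Python sketch that decodes the little-endian byte dumps into Power ISA fields. The `decode` helper and its dictionary keys are illustrative names, not part of any Libre-SOC or Microwatt tool; only the field positions (primary opcode in MSB0 bits 0..5, X-form extended opcode in bits 21..30) are taken from the ISA.

```python
import struct

def decode(le_bytes_hex):
    """Decode one little-endian 32-bit Power ISA instruction word.

    Hypothetical helper for illustration only. Not every field is
    meaningful for every instruction form (e.g. 'xo' for D-form ori).
    """
    (word,) = struct.unpack("<I", bytes.fromhex(le_bytes_hex))
    return {
        "word": word,
        "primary_opcode": word >> 26,   # MSB0 bits 0..5
        "rt_rs": (word >> 21) & 0x1F,   # MSB0 bits 6..10 (RT or RS)
        "ra": (word >> 16) & 0x1F,      # MSB0 bits 11..15 (D-form RA)
        "l_bit": (word >> 16) & 0x1,    # MSB0 bit 15 (mtmsrd L field)
        "xo": (word >> 1) & 0x3FF,      # MSB0 bits 21..30 (X-form)
        "ui": word & 0xFFFF,            # D-form 16-bit immediate
    }

mfmsr = decode("a6 00 20 7d")   # mfmsr  r9
ori = decode("00 80 29 61")     # ori    r9,r9,32768
mtmsrd = decode("64 01 20 7d")  # mtmsrd r9

print(hex(mfmsr["word"]), mfmsr["primary_opcode"], mfmsr["xo"])
print(hex(ori["word"]), ori["primary_opcode"], hex(ori["ui"]))
print(hex(mtmsrd["word"]), mtmsrd["xo"], mtmsrd["l_bit"])
```

The decoded L field of the mtmsrd word is 0, and the ori immediate is 0x8000, which is the EE bit of the MSR.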
Comment 8 tpearson 2022-07-03 01:26:25 BST
For clarity here, the software is *trying* to set the EE bit, but the MSR never changes.  The newly set bit propagates partway up to the core but never actually gets stored in the appropriate register file.

When the MSR set attempts to run, state_wen is set to 0x2 (MSR1 WEN), not MSR4_WEN as I'd expect from at least initial tracing.
Comment 9 tpearson 2022-07-03 01:32:15 BST
Digging a bit further, MSR1 data is also set correctly and WEN1 does strobe, yet the MSR remains unchanged.  I'll continue investigating.
Comment 10 tpearson 2022-07-03 05:40:23 BST
Turned out to be msr_i_ok, defaulted to 1 which was overriding the MSR in all cases.  Explicitly forcing it to 0 seems to have resolved the issues, the bootloader at least is functional with the core at 60MHz.

Will continue with Kestrel bringup over the long weekend, but good progress so far!
Comment 11 Luke Kenneth Casson Leighton 2022-07-03 09:44:12 BST
(In reply to tpearson from comment #10)
> Turned out to be msr_i_ok, defaulted to 1 which was overriding the MSR in
> all cases.  

obviously that shouldn't happen!

it's probably this:

    with m.Case(MicrOp.OP_MTMSRD, MicrOp.OP_MTMSR):
        # L => bit 16 in LSB0, bit 15 in MSB0 order
        L = self.fields.FormX.L1[0:1] # X-Form field L1
        # start with copy of msr
        comb += msr_o.eq(msr_i)
                ^^^^^^^^
this is copying a straight 64-bit Signal into a 65-bit Record
(data, ok)

it should be msr_o.data.eq(msr_i) 

likewise:

        comb += field(msr_o, 51).eq(field(srr1_i, 51)) # ME

should be field(msr_o.data, 51)...

> Explicitly forcing it to 0 seems to have resolved the issues,

can you add the diff here, i need to take a look.



> the bootloader at least is functional with the core at 60MHz.
> 
> Will continue with Kestrel bringup over the long weekend, but good progress
> so far!

fantastic

(In reply to tpearson from comment #7)

> OK, so it's not the clock frequency, in simulation the interrupts are never
> getting enabled even though the interrupt controller is set up correctly.
> 
> Digging further, it looks like when the EE bit of the MSR is set, this never
> gets propagated to the actual register store.  This is the bit of assembler
> that's supposed to turn the interrupts on:
> 
> a6 00 20 7d     mfmsr   r9
> 00 80 29 61     ori     r9,r9,32768
> 64 01 20 7d     mtmsrd  r9

this looks perfect for at least a unit test and probably a microwatt
stand-alone unit test as well.

> In practice, this sets r9 to 0x8000 and tries to run mtmsrd.

that's "mtmsrd r9,0" where you maaay have meant to use "mtmsrd r9,1"
p989 v3.0C Book III section 5.4.4

  L=1:
     Bits 48 and 62 of register RS are placed into the
     corresponding bits of the MSR. The remaining bits
     of the MSR are unchanged.

regardless, if there's something borked about L=0 that's important
to fix.
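
A minimal Python sketch of the L=1 behaviour quoted from the spec above, assuming nothing beyond that quoted text: only MSB0 bits 48 (EE) and 62 (RI) are copied from RS, and every other MSR bit is left alone. The function name `mtmsrd_l1` is illustrative, and the more involved L=0 rules are deliberately not modelled here.

```python
MASK64 = (1 << 64) - 1

# MSB0 bit n of a 64-bit register is LSB0 bit (63 - n).
EE = 1 << (63 - 48)   # 0x8000, External interrupt Enable
RI = 1 << (63 - 62)   # 0x2, Recoverable Interrupt

def mtmsrd_l1(msr, rs):
    """Model of 'mtmsrd RS,1': copy only bits 48 and 62 (MSB0) into the MSR."""
    mask = EE | RI
    return (msr & ~mask & MASK64) | (rs & mask)

# Starting from MSR=0 with r9=0x8000, only EE ends up set.
print(hex(mtmsrd_l1(0x0, 0x8000)))
```

With this form, the `mfmsr` / `ori` read-modify-write is not strictly needed just to set EE, since the remaining MSR bits are preserved by the instruction itself.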

>  The
> instruction starts to execute and does in fact set the core.state.MSR4 data
> field correctly, but for some reason the write enable for MSR1 is then fired
> instead of the write enable for MSR4.

there isn't an MSR4 or an MSR1, i'm guessing you're referring to the
auto-renamed auto-generated variables as data is copied through the
hierarchy.

>  Needless to say, the operational MSR
> is unchanged at 0x0 and the pending external interrupts are ignored.

this would explain a hell of a lot of things about when i was trying to
run linux-5.7. sigh.

the way that Function Units work, they are given the data from register files,
where yes, PC DEC SVSTATE TB and MSR are registers (State Regfile)
that also happen to be cached in core.state (CoreState), and the
cached copies are to be used for moving multi-issue state on, later,
so that is why in their associated Input Record you will see msr and cia being
passed in rather than via Regfile reads

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/trap/trap_input_record.py;hb=HEAD

    ('msr', 64),     # from core.state
    ('cia', 64),     # likewise
    ('svstate', 64), # likewise

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/trap/pipe_data.py;hb=HEAD

    # note here that MSR CIA and SVSTATE are *not* read as regs:
    # they are passed in as incoming "State", via the
    # CompTrapOpSubset


they send back notifications of a desire to write, by setting the "ok"
bit.
Comment 12 tpearson 2022-07-03 23:22:32 BST
I'm holding off on posting the patches until I'm a little bit further along -- basically I'm still validating the overall approach to integration here to make sure I'm not going down a dead end.

There are still a couple of issues, mostly centered around the timebase / decrementer but also some general problems and functional differences from Microwatt that I need to investigate.

As a bit of background, Zephyr (the RTOS underlying the full network-enabled version of Kestrel) uses both the timebase and the decrementer, along with the decrementer interrupt, for thread scheduling and delays / timeouts.  For some reason I haven't been able to track down yet, Zephyr operates as if it is under continual overload -- timeouts are running way too fast, it's literally saturated a test network with continuous DHCP requests instead of waiting the normal second or two for a response.

Potentially related to the above: I have noticed that LibreSoC is significantly slower than Microwatt in at least the bootloader memory test loop, and that would be explained by a decrementer interrupt firing more often than under Microwatt.

Finally, I've noticed problems with the GPIO setup / set using the LiteX CSR access routines.  I need to determine if this is a difference in how LibreSoC is executing an instruction specific to the CSR bytewise access, or a general problem in the new integration "glue logic".
Comment 13 Luke Kenneth Casson Leighton 2022-07-04 00:04:30 BST
(In reply to tpearson from comment #12)
> I'm holding off on posting the patches until I'm a little bit further along
> -- basically I'm still validating the overall approach to integration here
> to make sure I'm not going down a dead end.

please don't take this approach, aside from anything we are under Audit
Conditions.  and also under time-pressure: we cannot wait for weeks.

please do use a branch, like last time, i don't mind: create
what you need.

> There are still a couple of issues, mostly centered around the timebase /
> decrementer but also some general problems and functional differences from
> Microwatt that I need to investigate.

there should be absolutely no difference whatsoever: that is the inviolate
rule.

if you investigate those "in a general uncontrolled way", it will
take forever, you will have hundreds of thousands or potentially millions
of instructions to go through with an almost impossible / non-existent
debugging environment.

our approach in this project has been: unit tests, unit tests, unit tests.

the reason why Libre-SOC works at all is not because i knew exactly
what i was doing, it was because i strictly followed a process of
creating unit tests that could produce *identical* output to that
of microwatt, usually under verilator, literally dumping output
of all registers line-by-line.

i have run mmu.bin, helloworld.bin, binary tests1 through 10,
dectb.bin, and several others, in this way.

by doing simple "diffs" on runs containing those register-dumps i
could then track down the exact point at which the instruction was
wrong.
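
The diff-the-dumps step described above can be sketched in a few lines of Python. This is only an illustration of the idea: the dump line format shown is made up, and `first_divergence` is a hypothetical helper, not the script actually used in the project.

```python
from itertools import zip_longest

def first_divergence(dump_a, dump_b):
    """Return (line_number, a_line, b_line) for the first differing line
    of two register dumps, or None if the dumps are identical."""
    for lineno, (a, b) in enumerate(zip_longest(dump_a, dump_b), start=1):
        if a != b:
            return lineno, a, b
    return None

# Hypothetical per-instruction register dumps (format is an assumption).
microwatt = ["pc=0x100 r9=0x0", "pc=0x104 r9=0x8000", "pc=0x108 msr=0x8000"]
libresoc = ["pc=0x100 r9=0x0", "pc=0x104 r9=0x8000", "pc=0x108 msr=0x0"]

print(first_divergence(microwatt, libresoc))
```

The first mismatching line identifies the instruction to turn into a stand-alone unit test.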

i then added a libre-soc unit test, corrected the python-based Simulator
to match it, ran the same unit test against the HDL, corrected that,
then went back and re-ran the verilator simulation and found that it worked.

after enough of these (18 months worth) it made progress.

mmu.bin took weeks to get working.  each unit test in mmu.bin
that failed required 2-4 unit tests in libre-soc to be written,
including getting the python-based Simulator to replicate the
functionality.


to reiterate: if there is a problem i *need* to know *exactly* where
it is, with a small stand-alone (helloworld-like) unit test or other
repro case.


> As a bit of background, Zephyr (the RTOS underlying the full network-enabled
> version of Kestrel) uses both the timebase and the decrementer, along with
> the decrementer interrupt, for thread scheduling and delays / timeouts.  For
> some reason I haven't been able to track down yet, Zephyr operates as if it
> is under continual overload -- timeouts are running way too fast, it's
> literally saturated a test network with continuous DHCP requests instead of
> waiting the normal second or two for a response.

timer/dec in neither microwatt nor libre-soc are "according to spec".
the linux kernel device-tree file has to have an explicit entry to
say what the timer base frequency is, and does a calculation on what
multiplier is needed to get that into "real" Hz, by running a tight
"CTR" loop and checking how much dec/tb time elapsed.

with libre-soc running an FSM, a similar calibration may be needed
because the FSM runs instructions approximately 10x slower than microwatt.

it would not surprise me at all if timer/dec were running 1-2 orders of
magnitude faster than "expected".

 
> Potentially related to the above: I have noticed that LibreSoC is
> significantly slower than Microwatt in at least the bootloader memory test
> loop,

you should find it to be approximately 10x slower because it is still a
Finite State Machine not a pipeline.  there's a pipelined Core in development.


> and that would be explained by a decrementer interrupt firing more
> often than under Microwatt.

DEC/TB also operate slightly differently: they run on a 4-clock schedule,
readdec-writedec-readtb-writetb to avoid having half a dozen register file
ports.

so you can expect DEC/TB to be 4x slower than microwatt which updates
them both exactly on a clock cycle per update.
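
A toy Python model of that 4-clock schedule, to make the 4x factor concrete. It assumes one shared read/write slot cycling readdec, writedec, readtb, writetb, with DEC decremented and TB incremented by one per full cycle; the names and the initial values are illustrative and the real HDL is not reproduced here.

```python
MASK64 = (1 << 64) - 1

def run_dec_tb(clocks, dec=100, tb=0):
    """Step a shared-port DEC/TB updater for `clocks` clock cycles."""
    latch = 0
    for clk in range(clocks):
        phase = clk % 4
        if phase == 0:            # readdec
            latch = dec
        elif phase == 1:          # writedec
            dec = (latch - 1) & MASK64
        elif phase == 2:          # readtb
            latch = tb
        else:                     # writetb
            tb = (latch + 1) & MASK64
    return dec, tb

# After 8 clocks each counter has moved by 2, i.e. once per 4 clocks.
print(run_dec_tb(8))
```

A core that updates both counters every clock would have moved each by 8 over the same interval.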

> Finally, I've noticed problems with the GPIO setup / set using the LiteX CSR
> access routines.  I need to determine if this is a difference in how
> LibreSoC is executing an instruction specific to the CSR bytewise access, or
> a general problem in the new integration "glue logic".

again: unit tests.

i need to activate the project's standard procedures, i appreciate this one
may be slightly tricky, although it may be possible to do with a verilator
c++ GPIO simulator module, first under microwatt then drop-in libre-soc.

this is the FSM which performs $display() dumping of full register state
from both microwatt and libresoc, using DMI control

https://git.libre-soc.org/?p=libresoc-litex.git;a=blob;f=sim.py;h=f4ec8dce544e5ede1ec909d86cd28a7b3d2df08b;hb=0f03df1546c8cf6ab91ef63b04713dca768a84c4#l189

it's pretty braindead, it's dreadfully slow, and has saved vast amounts
of pain.
Comment 14 tpearson 2022-07-04 00:13:51 BST
Fair enough, the 10x slower bit may explain a lot actually especially in the context of an RTOS that is expecting time slices in wall clock time.

I'll push what I have into some branches shortly.

Yes, I've seen DEC firing differently than under Microwatt.

Agreed on unit tests, I've been spending some time trying to figure out what the differences are so that they can be isolated / pulled into unit tests.  So far, the biggest one may well be the slower execution, and that may be throwing Zephyr for a loop.  I'll try adjusting the timeslice configuration to better match the execution speed; unfortunately that is a difference that I can't really work around any other way.

I do have a fairly sophisticated simulation setup running that I'm using to debug with, along with the real hardware.  I think it's close, I just need to track down the remaining differences and figure out how to best handle them.

Zephyr does have an optional runtime timebase step read capability, and I might be able to use that to abstract away the difference between Microwatt and LibreSoC speed, but that's for after I at least have it up and running.

The good news is it does talk over the network.  Once I have the timer issues sorted I expect receive to work as well.
Comment 15 Luke Kenneth Casson Leighton 2022-07-04 00:53:50 BST
(In reply to tpearson from comment #14)
> Fair enough, the 10x slower bit may explain a lot actually especially in the
> context of an RTOS that is expecting time slices in wall clock time.
> 
> I'll push what I have into some branches shortly.

now that there's *a* difference spotted i have to write a unit test
for mtmsr anyway. the sooner i can get that started the better.

i also need to add corresponding DEC/TB counting in the python simulator

> Yes, I've seen DEC firing differently than under Microwatt.

it even makes life difficult to write comparative unit tests against
the python-based simulator, because the (small) program even if it is
3 lines will not react identically.

annoying but it is what it is.

> Agreed on unit tests, I've been spending some time trying to figure out what
> the differences are so that they can be isolated / pulled into unit tests. 
> So far, the biggest one may well be the slower execution, and that may be
> throwing Zephyr for a loop.  I'll try adjusting the timeslice configuration
> to better match the execution speed; unfortunately that is a difference that
> I can't really work around any other way.

no, agreed. i hit this with the linux kernel as well, it was getting
timer interrupts actually overlapping at one point.

> The good news is it does talk over the network. 

dang.

> Once I have the timer
> issues sorted I expect receive to work as well.
Comment 16 tpearson 2022-07-04 23:10:56 BST
Very quick update in advance of a more detailed update tomorrow...

Tracked the main Zephyr problem down to a specific section of mis-executed assembler:

subf    r9,r29,r31
addi    r9,r9,999
lis     r30,8388
ori     r30,r30,39845
rldicr  r30,r30,32,31
oris    r30,r30,58195
ori     r30,r30,63439
mulhd   r30,r9,r30
sradi   r10,r30,7
sradi   r30,r9,63
subf    r30,r30,r10
clrldi  r30,r30,32

What this is supposed to do is basically:

ceiling_fraction(deadline - now, MSEC_PER_SEC);

where ceiling_fraction is defined as:

#define ceiling_fraction(numerator, divider) \
        (((numerator) + ((divider) - 1)) / (divider))

r31 is "deadline", set to 8480 decimal
r29 is "now", set to 480 decimal

The correct output would be 8 decimal, stored in r30.  What's actually returned is zero, which causes all manner of chaos in the timing systems of Zephyr.

I've verified this mis-execution on real hardware and in simulation.  I'm working on reducing the test case down and identifying the actual broken instruction.

I'll also be uploading the requested branches tomorrow, just wanted to get this out there in case it sparks some ideas in the interim.
Comment 17 tpearson 2022-07-04 23:11:51 BST
Forgot to add -- MSEC_PER_SEC is a fixed constant, in this case 1000.
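
For reference, the assembler sequence from comment 16 can be modelled step by step in Python to confirm the expected result. This is a sketch with hand-written helper names (`mulhd`, `sradi`, `ceil_ms`), not the project's ISACaller simulator; the compiler has replaced the division by 1000 with a multiply by a 64-bit constant followed by an arithmetic shift.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Interpret a 64-bit register value as a signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def mulhd(ra, rb):
    """High 64 bits of the signed 128-bit product."""
    return ((to_signed(ra) * to_signed(rb)) >> 64) & MASK64

def sradi(rs, sh):
    """64-bit arithmetic shift right."""
    return (to_signed(rs) >> sh) & MASK64

def ceil_ms(deadline, now):
    r9 = (deadline - now) & MASK64   # subf   r9,r29,r31
    r9 = (r9 + 999) & MASK64         # addi   r9,r9,999
    r30 = 8388 << 16                 # lis    r30,8388
    r30 |= 39845                     # ori    r30,r30,39845
    r30 = (r30 << 32) & MASK64       # rldicr r30,r30,32,31 (a shift here)
    r30 |= 58195 << 16               # oris   r30,r30,58195
    r30 |= 63439                     # ori    r30,r30,63439
    assert r30 == 0x20C49BA5E353F7CF
    r30 = mulhd(r9, r30)             # mulhd  r30,r9,r30
    r10 = sradi(r30, 7)              # sradi  r10,r30,7
    r30 = sradi(r9, 63)              # sradi  r30,r9,63 (sign correction)
    r30 = (r10 - r30) & MASK64       # subf   r30,r30,r10
    return r30 & 0xFFFFFFFF          # clrldi r30,r30,32

print(ceil_ms(8480, 480))
```

With deadline 8480 and now 480 the model returns 8, matching `(8000 + 999) / 1000` in integer arithmetic.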
Comment 18 Luke Kenneth Casson Leighton 2022-07-05 04:46:32 BST
(In reply to tpearson from comment #16)
> Very quick update in advance of a more detailed update tomorrow...

no problem

> Tracked the main Zephyr problem down to a specific section of mis-executed
> assembler:
> 
> subf    r9,r29,r31
> addi    r9,r9,999
> lis     r30,8388
> ori     r30,r30,39845
> rldicr  r30,r30,32,31
> oris    r30,r30,58195
> ori     r30,r30,63439
> mulhd   r30,r9,r30
> sradi   r10,r30,7
> sradi   r30,r9,63
> subf    r30,r30,r10
> clrldi  r30,r30,32

if you can get the incoming register values (r9, r29, r30, r31)
i can throw that easily into a unit test for the python-based
Simulator.  bear in mind it doesn't understand pseudo-ops, i
know what lis is but not clrldi
Comment 19 tpearson 2022-07-05 07:39:13 BST
It's mis-executing mulhd.  Input 0x20c49ba500000000 in r30 and 0x1f40 in r9, out comes 0x1 which is completely wrong.

Does the core need the SVP64 instructions enabled for mulhd and friends to work?

Full update tomorrow.
Comment 20 tpearson 2022-07-05 07:39:58 BST
Correction, 0x20c49ba5e353f7cf in r30, though I suspect the other value will cause a similar problem.
Comment 21 Luke Kenneth Casson Leighton 2022-07-05 13:44:38 BST
(In reply to tpearson from comment #19)
> It's mis-executing mulhd.  Input 0x20c49ba500000000 in r30 and 0x1f40 in r9,
> out comes 0x1 which is completely wrong.

briiilliant, an awesome and brain-dead-easy unit test for that, coming up
 
> Does the core need the SVP64 instructions enabled for mulhd and friends to
> work?

hell no.  it's supposed to be exactly SFFS (Power ISA 3.0B) compliant.
or... more to the point (sigh), exactly *Microwatt* Compliant because
"the Spec" != "IBM POWER9" and "The Spec" == "retrospective writing"
as we well know...

> Full update tomorrow.

the mulhd regression is perfect.
Comment 22 Luke Kenneth Casson Leighton 2022-07-05 14:04:48 BST
    def case_kestrel_regression_0(self):
        lst = ["mulhd r30,r9,r30"]
        initial_regs = [0] * 32
        initial_regs[30] = 0x20c49ba5e353f7cf
        initial_regs[9] = 0x1f40
        e = ExpectedState(initial_regs, 4)
        e.intregs[30] = 0x400
        self.add_case(Program(lst, bigendian), initial_regs, expected=e)

expected value according to the ISACaller Simulator is 0x400
i'm going to go with that and see what TestIssuer does.

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=fdeeb295ae18191d5702e7acd8f940a1aeb2977b
Comment 23 Luke Kenneth Casson Leighton 2022-07-05 14:21:12 BST
oh ha ha very funny, the output from the multiply had been
truncated to 64-bit.

i'll need to re-run some of the other unit tests before pushing
because Div uses the same data structure


--- a/src/soc/fu/mul/pipe_data.py
+++ b/src/soc/fu/mul/pipe_data.py
@@ -25,7 +25,7 @@ class MulOutputData(FUBaseData):
 
     @property
     def regspec(self):
-        return [('INT', 'o', "0:%d" % (self.pspec.XLEN)),
+        return [('INT', 'o', "0:%d" % (self.pspec.XLEN*2)), # 2xXLEN
                ('XER', 'xer_so', '32')] # XER bit 32: SO
Comment 24 Luke Kenneth Casson Leighton 2022-07-05 14:46:29 BST
ah.  this was me screwing up in february, replacing a 128-bit spec
for the pipeline output with only XLEN(=64) not XLEN*2.
how in hell's teeth i managed not to spot this when we have so
many unit tests.

mul and div tests all pass.  git pushed.
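
A small Python illustration of why the output width matters for mulhd, using the operands from the regression test. It only shows that a 64-bit-wide result cannot carry the upper half of the product; it does not attempt to reproduce the exact wrong value seen on hardware, and `high_half` is a made-up helper name.

```python
def high_half(a, b, width):
    """Upper 64 bits of a*b after the product is truncated to `width` bits.

    Operands are treated as unsigned, which is fine for these positive values.
    """
    product = (a * b) & ((1 << width) - 1)
    return product >> 64

a, b = 0x1F40, 0x20C49BA5E353F7CF

print(hex(high_half(a, b, 128)))  # full 2*XLEN-wide output
print(hex(high_half(a, b, 64)))   # output truncated to XLEN bits
```

With the full 128-bit width the upper half is 0x400, the value expected by the unit test; with only 64 bits kept there is nothing left above bit 63.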



https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=8aaa3876cc22950271d8e4cf622d1658efe93aef

diff --git a/src/soc/fu/mul/pipe_data.py b/src/soc/fu/mul/pipe_data.py
index 072c5da647451ab77d9938c88664e2becc29e243..ded4c5089a56dd22319a2343324830a9a99bd8f6 100644 (file)
--- a/src/soc/fu/mul/pipe_data.py
+++ b/src/soc/fu/mul/pipe_data.py
@@ -15,8 +15,6 @@ class MulIntermediateData(DivInputData):
 
 
 class MulOutputData(FUBaseData):
-    regspec = [('INT', 'o', '0:128'),
-               ('XER', 'xer_so', '32')] # XER bit 32: SO
     def __init__(self, pspec):
         super().__init__(pspec, False) # still input style
 
@@ -25,6 +23,11 @@ class MulOutputData(FUBaseData):
         self.data.append(self.neg_res)
         self.data.append(self.neg_res32)
 
+    @property
+    def regspec(self):
+        return [('INT', 'o', "0:%d" % (self.pspec.XLEN)),
+               ('XER', 'xer_so', '32')] # XER bit 32: SO
+
Comment 25 tpearson 2022-07-05 19:58:36 BST
Created attachment 168
Initial Web server functioning
Comment 26 tpearson 2022-07-05 20:03:06 BST
As promised, a large update...

Main repositories available here:

https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-litex/litex

https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-litex/pythondata-cpu-libresoc

https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-litex/litex-boards

https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-firmware/zephyr-firmware

A single module Arctic Tern card, in the PCIe carrier, is the hardware target.

Those include the recent change to fix mulhd, which allows Zephyr to boot and we can load Web pages using the exact same binary used with Microwatt -- in the end, the timing differences are not a major problem at least under Zephyr.  See attached screenshot...

BIOS output:

====================================================
    __ __          __            __
   / //_/__  _____/ /_________  / /
  / ,< / _ \/ ___/ __/ ___/ _ \/ /
 / /| /  __(__  ) /_/ /  /  __/ /
/_/_|_\___/____/\__/_/ __\___/_/ _________
  / ___/____  / __/ /_/ __ )/  |/  / ____/
  \__ \/ __ \/ /_/ __/ __  / /|_/ / /
 ___/ / /_/ / __/ /_/ /_/ / /  / / /___
/____/\____/_/  \__/_____/_/  /_/\____/

====================================================

 (c) Copyright 2020-2022 Raptor Engineering, LLC
 (c) Copyright 2012-2020 Enjoy-Digital
 (c) Copyright 2007-2015 M-Labs

 BIOS built on Jul  5 2022 13:02:44
 BIOS CRC passed (59e4ee3e)

 Migen git sha1: 5d8ad08
 LiteX git sha1: 7495d92c

--=============== SoC ==================--
CPU:            LibreSoC @ 50MHz
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            8-bit data
ROM:            52KiB
SRAM:           8KiB
L2:             8KiB
SDRAM:          1048576KiB 32-bit @ 200MT/s (CL-6 CWL-5)

--========== Initialization ============--
Ethernet init...
Initializing SDRAM @0x40000000...
Switching SDRAM to software control.
Read leveling:
  m0, b0: |01110000| delays: 02+-01
  m0, b1: |00000000| delays: -
  m0, b2: |00000000| delays: -
  m0, b3: |00000000| delays: -
  best: m0, b00 delays: 02+-01
  m1, b0: |01110000| delays: 02+-01
  m1, b1: |00000000| delays: -
  m1, b2: |00000000| delays: -
  m1, b3: |00000000| delays: -
  best: m1, b00 delays: 02+-01
  m2, b0: |01110000| delays: 02+-01
  m2, b1: |00000000| delays: -
  m2, b2: |00000000| delays: -
  m2, b3: |00000000| delays: -
  best: m2, b00 delays: 02+-01
  m3, b0: |01110000| delays: 02+-01
  m3, b1: |00000000| delays: -
  m3, b2: |00000000| delays: -
  m3, b3: |00000000| delays: -
  best: m3, b00 delays: 02+-01
  best: m3, b00 delays: 02+-01
Switching SDRAM to hardware control.
Memtest at 0x00000040000000 (2MiB)...
  Write: 0x40000000-0x40200000 2MiB
   Read: 0x40000000-0x40200000 2MiB
Memtest OK
Memspeed at 0x00000040000000 (2MiB)...
  Write speed: 10MiB/s
   Read speed: 11MiB/s

--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
             Timeout
Booting from network...
Local IP: 192.168.1.50
Remote IP: 192.168.1.1
Booting from boot.json...
Booting from boot.bin...
Copying boot.bin to 0x00000040000000... (641024 bytes)
Executing booted program at 0x40000000

--============= Liftoff! ===============--
*** Booting Zephyr OS build zephyr-v2.5.0-3798-gc14cfa8dd9bf  ***


[00:00:03.985,852] <err> gpio_litex: H
[00:00:03.985,949] <inf> spi_tercel: Raptor Tercel SPI master found, device version 1.0.-939524096 0x000038c1/0x4009c224

[00:00:03.996,337] <inf> spi_tercel: Tercel SPI controller frequency configured to 4 MHz (bus frequency 10 MHz, dummy cycles 10305)

[00:00:03.996,856] <err> spi_nor: SFDP magic 00000000 invalid
[00:00:03.996,915] <err> spi_nor: SFDP read failed: -22
[00:00:04.024,300] <inf> shell_telnet: Telnet shell backend initialized
[00:00:04.029,245] <inf> net_config: Initializing network
[00:00:04.029,762] <inf> net_config: IPv4 address: 192.168.1.80
[00:00:04.030,173] <inf> net_config: Running dhcpv4 client...
uart:~$ Area 2 at 0xc00000 on bmc for 4194304 bytes

<etc>

I *think* we might be able to call this one closed soon. :)  Great job to all involved, this is no small feat -- 600k+ of binary running as-is on a completely different CPU...
Comment 27 tpearson 2022-07-05 20:05:00 BST
Also, you might note something interesting in the BIOS output.  I'm not serial loading that image (it takes too long, I'm not patient enough to wait for it each time).  It's actually being loaded over the network via TFTP...
Comment 28 Luke Kenneth Casson Leighton 2022-07-05 20:20:42 BST
(In reply to tpearson from comment #25)
> Created attachment 168
> Initial Web server functioning

oleeee!

(In reply to tpearson from comment #26)

> Main repositories available here:
> https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-litex/
> litex
> https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-litex/
> pythondata-cpu-libresoc
> https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-litex/
> litex-boards
> https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-firmware/
> zephyr-firmware

bonus points if we can do an automated repro script for getting everything
compiled-and-running (happy to have it include microwatt as an option)
aside from anything it acts as "documentation".

if there's a 4th milestone (very short one) that could be done separately,
or we can put someone else on it?

> A single module Arctic Tern card, in the PCIe carrier, is the hardware
> target.
> 
> Those include the recent change to fix mulhd, which allows Zephyr to boot
> and we can load Web pages using the exact same binary used with Microwatt --

frickin fantastic.


> Booting from network...
> Local IP: 192.168.1.50
> Remote IP: 192.168.1.1
> Booting from boot.json...
> Booting from boot.bin...

niiice

> I *think* we might be able to call this one closed soon. :)  

yeah i'd agree.  up to you if you'd like to do a "full auto build"
script (not the upload, just the building)?
https://git.libre-soc.org/?p=dev-env-setup.git;a=summary


> Great job to
> all involved, this is no small feat -- 600k+ of binary running as-is on a
> completely different CPU...

4,000+ separate unit tests helps quite a lot, there :)
Comment 29 Luke Kenneth Casson Leighton 2022-07-05 20:39:06 BST
tim did you need any TRAP pipeline changes in the end?
or was the sorting-out on msr_o.data sufficient?
i really want to make sure there's a unit test covering
what you had, so need to know the input criteria: what
was MSR when starting those 3 instructions (basically)
mfmsr ori mtmsrd
Comment 30 tpearson 2022-07-05 21:10:40 BST
> bonus points if we can do an automated repro script for getting everything
> compiled-and-running (happy to have it include microwatt as an option)
> aside from anything it acts as "documentation".
> 
> if there's a 4th milestone (very short one) that could be done separately,
> or we can put someone else on it?

I don't mind putting something together, it's not too hard and I know the commands involved.  Just expect it to be a 2 hour build cycle on a reasonably powerful POWER workstation, and much longer on a laptop. :)

> tim did you need any TRAP pipeline changes in the end?
> or was the sorting-out on msr_o.data sufficient?
> i really want to make sure there's a unit test covering
> what you had, so need to know the input criteria: what
> was MSR when starting those 3 instructions (basically)
> mfmsr ori mtmsrd

The relevant change I made was here:

https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-litex/litex/-/blob/d86cad8308d15ce50de19684051956eaede15a46/litex/soc/cores/cpu/libresoc/core.py#L135

If for some reason that's an incorrect signal let me know.
Comment 31 tpearson 2022-07-05 21:13:41 BST
One other little bonus...

--=============== SoC ==================--
                           vvvvv
CPU:            LibreSoC @ 60MHz
                           ^^^^^
BUS:            WISHBONE 32-bit @ 4GiB
CSR:            8-bit data
ROM:            52KiB
SRAM:           8KiB
L2:             8KiB
SDRAM:          1048576KiB 32-bit @ 240MT/s (CL-6 CWL-5)

<snip>

Memspeed at 0x00000040000000 (2MiB)...
               vvvvvvvv
  Write speed: 15MiB/s
   Read speed: 14MiB/s
               ^^^^^^^^

<snip>

--- 8 messages dropped ---
[00:00:04.071,476] <inf> spi_nor: PH1: ff84 rev 0.128: 14529 DW @ 4009c224
[00:00:04.071,609] <inf> spi_tercel: Tercel SPI controller frequency configured to 4 MHz (bus frequency 10 MHz, dummy cycles 41153)

[00:00:04.072,275] <inf> spi_nor: bmc: SFDP v 1.255 AP 3a732502 with 544612432 PH
[00:00:04.072,335] <inf> spi_nor: PH0: ff00 rev 6.48: 26817 DW @ 4009c220
[00:00:04.073,105] <inf> spi_nor: bmc: 64 MiBy flash
[00:00:04.073,443] <inf> spi_nor: PH1: ff84 rev 0.128: 14529 DW @ 4009c224
[00:00:04.073,608] <inf> spi_tercel: Tercel SPI controller frequency configured to 4 MHz (bus frequency 10 MHz, dummy cycles 0)

[00:00:04.074,247] <inf> spi_nor: H
[00:00:04.074,360] <inf> spi_nor: PH0: ff00 rev 6.48: 27329 DW @ 4009c220
[00:00:04.075,077] <inf> spi_nor: fpga: 16 MiBy flash
[00:00:04.075,182] <inf> spi_nor: PH1: ff84 rev 0.128: 8385 DW @ 4009c214
[00:00:04.104,709] <inf> shell_telnet: Telnet shell backend initialized
[00:00:04.112,128] <inf> net_config: Initializing network
[00:00:04.112,661] <inf> net_config: IPv4 address: 192.168.1.80
[00:00:04.113,082] <inf> net_config: Running dhcpv4 client...
uart:~$ Area 2 at 0xc00000 on bmc for 4194304 bytes
FAIL: mount id 2 at /lfs: -28
Raptor Aquila LPC slave found, device version 1.0.0
Raptor Tercel SPI master found, device version 1.0.0
Flash controller frequency configured to 15 MHz (bus frequency 60 MHz, dummy cycles 10)
FPGA SPI flash ID: 0x20ba1810
Raptor Tercel SPI master found, device version 1.0.0
Micron N25Q 512Mb Flash device detected, configuring
Flash controller frequency configured to 5 MHz (bus frequency 60 MHz, dummy cycles 10)
BMC SPI flash ID: 0x20ba2010
Raptor Tercel SPI master found, device version 1.0.0
Micron N25Q 512Mb Flash device detected, configuring
Flash controller frequency configured to 5 MHz (bus frequency 60 MHz, dummy cycles 10)
Host SPI flash ID: 0x20ba2010
FSP0>

Pretty good speed bump in Web page load just with that change to the core clock.

Who wants Arctic Tern modules? ;)
Comment 32 tpearson 2022-07-05 22:16:30 BST
Things are currently a bit unstable at 60MHz, but a quick check at 50MHz is showing better functionality.  Over time, as part of a different project, we'll be working on getting the timing fixed -- for now, 50MHz is the known-good target for at least the industrial grade Arctic Tern modules (commercial grade may be able to go faster).
Comment 33 Luke Kenneth Casson Leighton 2022-07-06 00:10:35 BST
(In reply to tpearson from comment #32)
> Things are currently a bit unstable at 60MHz, but a quick check at 50MHz is
> showing better functionality.  Over time, as part of a different project,
> we'll be working on getting the timing fixed -- for now, 50MHz is the
> known-good target for at least the industrial grade Arctic Tern modules
> (commercial grade may be able to go faster).

manufacturing tolerances on DRAM ICs are rather unfortunately in the range
48 to 55 MHz.  the *listed* minimum clock rate in datasheets is 100 MHz!
i.e. if you can get stability at 55 MHz then chances are high all product
sold will work.

unfortunately i learned recently that nextpnr is not capable of dealing
with twin non-synchronous clocks. i.e. it will screw up the routing if
you try CDC.
Comment 34 tpearson 2022-07-06 05:20:52 BST
Quick question: the current LiteX support needs this commit to the soc repository, does it look good to push to the main soc repo?

https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-libresoc/soc/-/commit/52a661be75b38d7f0a0a7d5e8751b1c469132629
Comment 35 Luke Kenneth Casson Leighton 2022-07-06 09:12:15 BST
(In reply to tpearson from comment #34)
> Quick question: the current LiteX support needs this commit to the soc
> repository, does it look good to push to the main soc repo?
> 
> https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-libresoc/soc/-/commit/52a661be75b38d7f0a0a7d5e8751b1c469132629

other than florent pissed me off so much and caused so many problems
after i gave him 3 chances that i don't even want the word "litex" in the
repository, or to look even remotely like his work is being endorsed
or supported in any way... yes.

i'm redoing it as "fabric" - "Add fabric compatibility mode" and "fabric_compat"

commit 557e6d75a40c0901e74a4963b71b4ce395361e57 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Jul 6 09:11:47 2022 +0100
Comment 36 tpearson 2022-07-06 18:43:57 BST
OK...in the future could we ensure the authorship is split / maintained?  Mostly thinking of potential audits, since it looks like someone else did the work. :)

I'll get the CPU data module updated to reference the new GIT commits.
Comment 37 Luke Kenneth Casson Leighton 2022-07-06 18:56:33 BST
(In reply to tpearson from comment #36)
> OK...in the future could we ensure the authorship is split / maintained? 

drat! feel free to do a commit author rewrite / force-push,
do watch out for the git submodule update in the current commit
Comment 38 tpearson 2022-07-06 18:59:35 BST
(In reply to Luke Kenneth Casson Leighton from comment #37)
> (In reply to tpearson from comment #36)
> > OK...in the future could we ensure the authorship is split / maintained? 
> 
> drat! feel free to do a commit author rewrite / force-push,
> do watch out for the git submodule update in the current commit

It's not a big deal here, there's an archive copy at https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-libresoc/soc/-/tree/libresoc-bug-855-original-changes and a record in this bug report.  Was more a heads up for the future, in case there was a process issue. :)

Working on the CI test script today...
Comment 39 tpearson 2022-07-06 20:11:17 BST
Kestrel install and CI scripts added:

https://git.libre-soc.org/?p=dev-env-setup.git;a=commit;h=ba2b5226bff6356e7eecb66b957cfa5c6f931141
Comment 41 Luke Kenneth Casson Leighton 2022-07-06 20:40:40 BST
just investigating this: 

  File "./rcs_arctic_tern_bmc_card.py", line 604, in <module>
    main()
  File "./rcs_arctic_tern_bmc_card.py", line 580, in main
    builder.build(**builder_kargs, run=args.build)
  File "/home/lkcl/src/kestrel/litex/litex/soc/integration/builder.py", line 252, in build
    self._generate_includes()
  File "/home/lkcl/src/kestrel/litex/litex/soc/integration/builder.py", line 134, in _generate_includes
    for k, v in export.get_cpu_mak(self.soc.cpu, self.compile_software):
  File "/home/lkcl/src/kestrel/litex/litex/soc/integration/export.py", line 92, in get_cpu_mak
    ("ARCHITECTURE", select_triple(triple).split("-")[0]),
  File "/home/lkcl/src/kestrel/litex/litex/soc/integration/export.py", line 87, in select_triple
    raise OSError(msg)
OSError: Unable to find any of the cross compilation toolchains:
- powerpc64le-linux
- powerpc64le-linux-gnu

which is odd because:

$  dpkg -l | grep powerpc
ii  binutils-powerpc64-linux-gnu           2.31.1-16                               amd64        GNU binary utilities, for powerpc64-linux-gnu target
ii  binutils-powerpc64le-linux-gnu         2.31.1-16                               amd64        GNU binary utilities, for powerpc64le-linux-gnu target
ii  cpp-8-powerpc64-linux-gnu              8.3.0-2cross2                           amd64        GNU C preprocessor
ii  cpp-8-powerpc64le-linux-gnu            8.3.0-2cross1                           amd64        GNU C preprocessor
ii  gcc-8-powerpc64-linux-gnu              8.3.0-2cross2                           amd64        GNU C compiler
ii  gcc-8-powerpc64-linux-gnu-base:amd64   8.3.0-2cross2                           amd64        GCC, the GNU Compiler Collection (base package)
ii  gcc-8-powerpc64le-linux-gnu            8.3.0-2cross1                           amd64        GNU C compiler
ii  gcc-8-powerpc64le-linux-gnu-base:amd64 8.3.0-2cross1                           amd64        GCC, the GNU Compiler Collection (base package)
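
One possible explanation, offered as a guess rather than a confirmed diagnosis: the "whichin powerpc64le-linux-gnu-gcc" style of lookup seen in the build output searches PATH for an unversioned `<triple>-gcc` binary, while Debian's versioned gcc-8-* cross packages normally install binaries with a `-8` suffix only. The sketch below shows that kind of PATH check. `find_cross_gcc` is a hypothetical helper written for illustration, not a LiteX function; the two triples are copied from the error message above.

```python
import shutil

# Triples taken from the OSError text in the traceback above.
TRIPLES = ("powerpc64le-linux", "powerpc64le-linux-gnu")


def find_cross_gcc(triples=TRIPLES, which=shutil.which):
    """Return the first triple whose unversioned '<triple>-gcc' is on PATH.

    Hypothetical helper: `which` is injectable so the check can be
    exercised without a real toolchain installed.
    """
    for triple in triples:
        if which(triple + "-gcc") is not None:
            return triple
    return None


if __name__ == "__main__":
    found = find_cross_gcc()
    if found is None:
        print("no unversioned cross gcc found for:", ", ".join(TRIPLES))
    else:
        print("using toolchain triple:", found)
```

If only `powerpc64le-linux-gnu-gcc-8` exists, a check of this shape returns None even though a compiler is installed, which would match the symptom.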
Comment 42 Luke Kenneth Casson Leighton 2022-07-06 20:42:18 BST
doh.

--- a/ppc64-gdb-gcc
+++ b/ppc64-gdb-gcc
@@ -5,7 +5,9 @@ if [ "$EUID" -ne 0 ]
 fi
 
 # first install powerpc64 gcc-8 cross-compiler
-apt-get install gcc-8-powerpc64-linux-gnu wget texinfo
+apt-get install gcc-8-powerpc64-linux-gnu \
+                gcc-powerpc64-linux-gnu \
+                wget texinfo
Comment 43 tpearson 2022-07-06 20:43:34 BST
Yeah, this was built on a POWER box.  Apologies for overlooking the cross compiler on other platforms.
Comment 44 Luke Kenneth Casson Leighton 2022-07-06 21:38:23 BST
no problem.  currently fighting nextpnr-ecp5 build errors.
Comment 45 Luke Kenneth Casson Leighton 2022-07-07 14:28:25 BST
5962de9b
36310a2d
21d109cf
60af61c1
f4fa39f8
d31a60f8
a809cc53
4fb52854
d1a961d2
a2a2d991
43ec5a4f
aa91bdd0
a6c76b34
5fd31b34
f1df7c45
dffe0db   <-- missing leading zero
994c82b1
617ed732


this is probably something of a fluke but definitely a litex bug

Info: Program finished normally.
Inconsistent word width at line 15 of /home/lkcl/src/kestrel/litex-boards/litex_boards/targets/build/rcs_arctic_tern_bmc_card/gateware/rom.init!
triple ('powerpc64le-linux', 'powerpc64le-linux-gnu')
platform linux-x86_64
whichin powerpc64le-linux-gcc
whichin powerpc64le-linux-gnu-gcc
platform linux-x86_64
whichin powerpc64le-linux-gcc
whichin powerpc64le-linux-gnu-gcc
Traceback (most recent call last):
  File "./rcs_arctic_tern_bmc_card.py", line 604, in <module>
    main()
  File "./rcs_arctic_tern_bmc_card.py", line 587, in main
    "-t", os.path.join(builder.gateware_dir, "rom_data.init")])
  File "/usr/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ecpbram', '-i', '/home/lkcl/src/kestrel/litex-boards/litex_boards/targets/build/rcs_arctic_tern_bmc_card/gateware/rcs_arctic_tern_bmc_card.config', '-o', '/home/lkcl/src/kestrel/litex-boards/litex_boards/targets/build/rcs_arctic_tern_bmc_card/gateware/rcs_arctic_tern_bmc_card_stuffed.config', '-f', '/home/lkcl/src/kestrel/litex-boards/litex_boards/targets/build/rcs_arctic_tern_bmc_card/gateware/rom.init', '-t', '/home/lkcl/src/kestrel/litex-boards/litex_boards/targets/build/rcs_arctic_tern_bmc_card/gateware/rom_data.init']' returned non-zero exit status 1.
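
For reference, a minimal sketch of the failure mode, assuming the init file is meant to hold one fixed-width hex word per line (32-bit words, going by the 8-digit entries above). Both function names are hypothetical and this is not the code from any of the repositories involved: it only shows that zero-padding on output keeps every line the same width, and how a short line such as "dffe0db" can be spotted.

```python
def format_init_words(words, width_bits=32):
    """Render each word as lowercase hex, zero-padded to a fixed width."""
    digits = (width_bits + 3) // 4  # hex digits needed for width_bits
    return [f"{word:0{digits}x}" for word in words]


def inconsistent_lines(lines):
    """Return 1-based numbers of lines whose width differs from the first."""
    if not lines:
        return []
    expected = len(lines[0].strip())
    return [number for number, line in enumerate(lines, start=1)
            if len(line.strip()) != expected]


if __name__ == "__main__":
    # Three consecutive words copied from the listing above.
    sample = ["f1df7c45", "dffe0db", "994c82b1"]
    print("short lines:", inconsistent_lines(sample))
    print("padded:", format_init_words([int(word, 16) for word in sample]))
```

Formatting with a plain "%x" drops leading zeros, so any word below 0x10000000 comes out one or more digits short; a fixed-width format avoids that.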
Comment 46 tpearson 2022-07-07 18:40:03 BST
In my experience, that's because of Python pulling in a different LiteX version.  The Kestrel version has this patch applied:

https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-litex/migen/-/commit/36c73dd8a7adfbeeaaf460cc0bd25c92eb51a468
Comment 47 tpearson 2022-07-07 18:44:55 BST
Scratch that, my bad, it's a migen repo patch not a litex one.  We need to merge that patch to the LibreSoC nmigen version, is that something that would be acceptable?
Comment 48 Luke Kenneth Casson Leighton 2022-07-07 19:22:34 BST
(In reply to tpearson from comment #47)
> Scratch that, my bad, it's a migen repo patch not a litex one.  We need to
> merge that patch to the LibreSoC nmigen version, is that something that
> would be acceptable?

migen or nmigen?  i'm only the maintainer of nmigen (in which case yes),
migen is upstream sebastien bourdeauducq.
Comment 49 tpearson 2022-07-07 19:39:04 BST
Looking further, I think it's migen.  We can handle that one of two ways, the simplest is to use the Kestrel migen version in the hdl setup scripts, the more problematic way would be to try to push it upstream.
Comment 50 tpearson 2022-07-07 19:41:03 BST
I went ahead and added a build script for the Zephyr firmware:

https://git.libre-soc.org/?p=dev-env-setup.git;a=commit;h=1bac06b4bdfe6c45753e58ad18ce25d2c46d6038

In theory, you should now be able to build both the LibreSoC-enabled bitstream as well as the RTOS image for network loading.  The only piece you're missing at this point is the Arctic Tern hardware to load those two components onto...
Comment 51 Jacob Lifshay 2022-07-07 19:49:17 BST
(In reply to tpearson from comment #50)
> The only piece
> you're missing at this point is the Arctic Tern hardware to load those two
> components onto...

yup, sorry, still waiting on nlnet for that...
Comment 52 tpearson 2022-07-07 19:50:56 BST
(In reply to Jacob Lifshay from comment #51)
> yup, sorry, still waiting on nlnet for that...

No problem.  At least this exercise shook out a few files that hadn't made their way to the public GIT trees, so everything should be ready for when the hardware is purchased and connected...
Comment 53 Luke Kenneth Casson Leighton 2022-07-07 19:51:49 BST
(In reply to tpearson from comment #49)
> Looking further, I think it's migen.  We can handle that one of two ways,
> the simplest is to use the Kestrel migen version in the hdl setup scripts,
> the more problematic way would be to try to push it upstream.

yes please kestrel it is, but sebastien should be open (and really grateful) to
receive migen upstream patches. 15 years old or not.
https://github.com/m-labs/migen

i added the clone here
https://git.libre-soc.org/?p=dev-env-setup.git;a=commitdiff;h=bf7327ad43be743b85805a98eb40056cdeff75a1

didn't do anything with it though :)

(In reply to tpearson from comment #50)
> I went ahead and added a build script for the Zephyr firmware:
> 
> https://git.libre-soc.org/?p=dev-env-setup.git;a=commit;h=1bac06b4bdfe6c45753e58ad18ce25d2c46d6038

fantastic.

> In theory, you should now be able to build both the LibreSoC-enabled
> bitstream as well as the RTOS image for network loading.  The only piece
> you're missing at this point is the Arctic Tern hardware to load those two
> components onto...

soon. $EUR.
Comment 54 tpearson 2022-07-07 20:25:05 BST
Let's see what upstream says: https://github.com/m-labs/migen/pull/262
Comment 55 Luke Kenneth Casson Leighton 2022-09-28 11:35:30 BST
Substituting Libre-SOC in place of Microwatt was successful, and allowed a
demonstration of a Web Server front-end running from Zephyr Real-Time OS
through a Gigabit Ethernet port on an Arctic Tern FPGA Board.

This showed that Libre-SOC's Power ISA Core was capable of running an
Operating System through a Gigabit Ethernet port.