393 – Hook up augmented-Wishbone Memory Bus to LDSTCompUnit (via PortInterface)

Bug 393 - Hook up augmented-Wishbone Memory Bus to LDSTCompUnit (via PortInterface)

Summary: Hook up augmented-Wishbone Memory Bus to LDSTCompUnit (via PortInterface)

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	PC Mac OS

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:	403
Blocks:	383
	Show dependency tree / graph

Reported:	2020-06-19 22:33 BST by Yehowshua
Modified:	2021-04-20 14:48 BST (History)
CC List:	2 users (show)

See Also:	382 318 216 397
NLnet milestone:	NLNet.2019.10.043.Wishbone
total budget (EUR) for completion of task and all subtasks:	300
budget (EUR) for this task, excluding subtasks' budget:	300
parent task for budget allocation:	383
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	"lkcl"={amount=300, paid=2020-08-21}

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Yehowshua 2020-06-19 22:33:42 BST

Comment 1 Luke Kenneth Casson Leighton 2020-06-19 22:58:19 BST

increnental strategy described here:
http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-June/008231.html

the idea is to first morph L0CacheBuffer to use the minerva LoadStoreInterface.

this to be done by creating a TestMemoryLoadStoreUnit (with associated unit tests) that ignores the wishbone sbus entirely.

once L0CacheBuffer "talks" minerva LoadStoreInterface, it is plain sailing from that point.

the only changes to the minerva loadstore.py at that point would be to widen the buses to 64 bit.

Comment 2 Yehowshua 2020-06-20 14:52:39 BST

Some example memory connections.
Will merge into SOC git repo later.

https://github.com/BracketMaster/nmigen-by-example

Comment 3 Luke Kenneth Casson Leighton 2020-06-20 15:11:58 BST

(In reply to Yehowshua from comment #2)
> Some example memory connections.
> Will merge into SOC git repo later.
> 
> https://github.com/BracketMaster/nmigen-by-example

see bug #382, i have begun that process (and found a bug in the sram.py as well).

the README btw is excellent.  it would be fantastic to have the same quality of documentation in the documentation store known as the "wiki", for the project.

by being part of the project, this would help increase the readability of the project and help us to gain more contributors, because people respect the source of the information as a valuable resource.

although superb and exactly the kind of high quality readability that we need, and i appreciate that there was a dependency (nmigen-soc) whch is now added to git.libre-soc.org - unfortunately the high quality README has the unintended side-effect of directing - diverting - attention *away* from our project.

Comment 4 Yehowshua 2020-06-20 15:36:36 BST

> the README btw is excellent.  it would be fantastic to have the same quality
> of documentation in the documentation store known as the "wiki", for the
> project.

Right. Can add into the Wiki.
For a bit more context, I have a student working on my Thesis and
encountered another guy in France who found the README and examples
pretty helpful.

Comment 5 Luke Kenneth Casson Leighton 2020-06-20 20:06:12 BST

(In reply to Yehowshua from comment #4)
> > the README btw is excellent.  it would be fantastic to have the same quality
> > of documentation in the documentation store known as the "wiki", for the
> > project.
> 
> Right. Can add into the Wiki.

star

> For a bit more context, I have a student working on my Thesis and
> encountered another guy in France who found the README and examples
> pretty helpful.

yehyeh, no i meant it, they're really good!  it's just that they praise
nmigen and nmigen-soc (which is fantastic) but don't mention libre-soc.
which in a larger context is why (bug #374) this small example illustrates
why it would be detrimental to even have a mirror of our code on github
[as opposed to a respectful and short README with a link]

Comment 6 Luke Kenneth Casson Leighton 2020-06-20 20:18:51 BST

ok so back on topic here: the two interfaces that are *nearly* the same
are:

* LoadStoreInterface from minerva
* PortInterface from libre-soc l0cache.py

these are what need "matching up".

these are the signals from LoadStoreInterface:

        self.x_addr = Signal(addr_wid)
        self.x_mask = Signal(mask_wid)
        self.x_load = Signal()
        self.x_store = Signal()
        self.x_store_data = Signal(data_wid)
        self.x_stall = Signal()
        self.x_valid = Signal()
        self.m_stall = Signal()
        self.m_valid = Signal()

        self.x_busy = Signal()
        self.m_busy = Signal()
        self.m_load_data = Signal(data_wid)
        self.m_load_error = Signal()
        self.m_store_error = Signal()
        self.m_badaddr = Signal(addr_wid-log2_int(mask_wid))

and these are the ones from PortInterface:

        # distinguish op type (ld/st)
        self.is_ld_i = Signal(reset_less=True)
        self.is_st_i = Signal(reset_less=True)
        self.op = CompLDSTOpSubset()  # hm insn_type ld/st duplicates here

        # common signals
        self.busy_o = Signal(reset_less=True)     # do not use if busy
        self.go_die_i = Signal(reset_less=True)   # back to reset
        self.addr = Data(addrwid, "addr_i")            # addr/addr-ok
        # addr is valid (TLB, L1 etc.)
        self.addr_ok_o = Signal(reset_less=True)
        self.addr_exc_o = Signal(reset_less=True)  # TODO, "type" of exception

        # LD/ST
        self.ld = Data(regwid, "ld_data_o")  # ok to be set by L0 Cache/Buf
        self.st = Data(regwid, "st_data_i")  # ok to be set by CompUnit

Data is a Record that contains a "data" and an "ok".

therefore:

* pi.is_ld_i == lsi.x_load
* pi.is_st_i == lsi.x_store
* pi.ld.data == lsi.m_load_data
* pi.ld.ok   == lsi.m_valid     (probably - just guessing here)
* pi.st.data == lsi.x_store_data
* pi.st.ok   == don't know
* pi.addr.data == lsi.x_addr
* pi.addr.ok   == lsi.x_valid   (probably)
* pi.go_die_i == {concept does not exist in LSI - *might* be lsi.rst}
* pi.busy_o  == lsi.x_busy      (probably - again, just guessing)
* pi.op.data_len == lsi.x_mask but **ONLY** after processing through LenExpand
* pi.addr_ok_o   == {concept does not exist in LSI, i don't think}
* pi.addr_exc_o  == lsi.m_load_error *and* lsi.m_store_error
* {nonexistent}  == lsi.m_badaddr

this leaves lsi.x_busy to identify - plus all of the other guess-work
signals.

Comment 7 Yehowshua 2020-06-20 20:23:32 BST

Aha. So l0_cache.py **isn't** RTL - its just a mock of what the RTL would do.
So, would you want me to fit RTL into that?

Comment 8 Yehowshua 2020-06-20 20:27:41 BST

    * only one of is_ld_i or is_st_i may be asserted.  busy_o
      will immediately be asserted and remain asserted.

That sounds like a rule for formal...

Comment 9 Luke Kenneth Casson Leighton 2020-06-20 20:35:20 BST

(In reply to Yehowshua from comment #7)
> Aha. So l0_cache.py **isn't** RTL -

it is indeed the RTL.

> its just a mock of what the RTL would do.

no, it's a combination of different classes that includes a "test"
class (the current L0CacheBuffer) which, as the docstring says,
allows us to "move on" in other areas of the code without the
memory architecture being a major blocker

it also contains sub-components which are needed for the "production"
version.

now, if we don't have time before the Oct 2020 deadline, L0CacheBuffer
*will* be the RTL that ends up being taped out.

this would be bad, because it can't handle misaligned ld/st and it
is deliberately hard-limited to one LD/ST at a time.


> So, would you want me to fit RTL into that?

see comment #1 and the link to the incremental strategy that i outlined.

i advocate that we go one step at a time - forget L1 Cache entirely for now
as it is a distraction.

if we can get a TestMemoryLoadStoreUnit written - *no cache nothing
its sole purpose being to write to TestMemory* which can then be
connected into L0CacheBuffer, then dropping in either BareLoadStoreUnit
*or* CacheLoadStoreUnit *OR* the cache code that you would like to write
is an absolutely trivial one-line change.

however without everything complying with that LoadStoreInterface, none
of that is possible.

Comment 10 Yehowshua 2020-06-20 20:36:12 BST

Sorry - its an RTL interface. I got it now.

Comment 11 Luke Kenneth Casson Leighton 2020-06-20 21:38:46 BST

---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Sat, Jun 20, 2020 at 8:55 PM Yehowshua <yimmanuel3@gatech.edu> wrote:
>
> >
> > what do you think?
>
> > * create a TestMemoryLoadStoreUnit (TMLSU) which is compliant with the
> > LSUI interface, but *IGNORES* LSUI's wishbone bus entirely
> > * make TMLSU read and write to TestMemory *using* the LSUI x_addr,
> > x_mask, etc. etc.
>
> Wait - why not just have TestMemory test
> LSUI connected to memory via wishbone?

because that's yet another piece in the chain that would need to be
understood.   the FSMs of wishbone are complex enough to understand
without being in the way of TestMemory.

    L0CacheBuffer <- LoadStoreInterface -> TestMemory is dead simple

    L0CacheBuffer <-> LoadStoreInterface <- can have a unit test

    unit test here -> LoadStoreInterface -> TestMemory is dead simple

what you are proposing is:

    L0CacheBuffer <- LoadStoreInterface <- SomeClass -> Wishbone -> TestMemory

for which we need *four* unit tests - one at each join-point.  (note: SomeClass
is almost identical to BareUnitLoadStore).

now that *is* actually what we need.... however: one thing at a time, and,
more than that, it's... well... 4 things (actually 8 including unit tests)
to write rather than 2 (actually 4) before we get to the point where we can
get some feedback about whether things work.

>
> That would make sense to me?
>
> I might not be understanding everything here,
> but if we make all memories look like wishbone mems,
> whether caches or SRAMS, then the only problem that needs solving
> Is ensuring LSUI is wishbone compliant.

yes... however that's a lot of complexity - a lot of code to write before
we can actually do integration.

the strategy above (outlined by starting with TestMemoryLoadStoreUnit)
is an incremental one with a lot less risk.

> However, I remember that we had two caches - that is, the even/odd cache
> Thingies connected directly to LDSTunit .

thingies lol.  yes.  that's the next phase.... but not yet.

> It might make sense to abstract out these with wishbone?

wishbone's sufficiently complex that it needs its own unit test... *and*
we need a unit test on the thing it connects to.

should anything go wrong with wishbone, we can't even *get* to the thing that
it's behind.

Comment 12 Yehowshua 2020-06-20 22:15:41 BST

Well - we don't have to do full wishbone...

If we targeting Litex DRAM, sub, cycle, addr, re, and data
really suffice I think.


Basically, LiteX DRAM behaves kinda like Harry Ho's SRAM.

And now I'm confused.
L0 Cache **isnt** the even odd cache things?

At some point, we should have a call so I can better understand this.

For now, I can make a dead simple cache that looks like a memory.
I'm afraid I won't be much help on LDST as my conversation will generate
more questions.

I really wish thing were more simple.

Comment 13 Yehowshua 2020-06-20 22:24:26 BST

I won't be of much use on LDST until I've read through the link below...
Unless - you have something wishbone that you want me to plug into.
https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE

Luke, I would suggest not going for full wishbone compliance.
Just do something that **works with LiteX DRAM sitting below the LDST
unit.

It can really be quite simple - even if the resulting FSM isn't fully functional.

Comment 14 Yehowshua 2020-06-20 22:25:10 BST

> be quite simple - even if the resulting FSM isn't fully
> functional.

I mean to say compliant.
It should be functional of course.

Comment 15 Yehowshua 2020-06-20 22:27:33 BST

Ah yes - L0_cache buffer - I see it here:
https://bugs.libre-soc.org/show_bug.cgi?id=216

Comment 16 Yehowshua 2020-06-20 22:27:52 BST

Pardon - here.
https://libre-soc.org/3d_gpu/architecture/6600scoreboard/

Comment 17 Luke Kenneth Casson Leighton 2020-06-20 22:48:42 BST

(In reply to Yehowshua from comment #12)
> Well - we don't have to do full wishbone...

it's the fact that it has to be done at all which places a chain of 4
completely unknown pieces of code in a line, before any of the 4 can
be tested that is the problem.

the strategy is to cut that back *first* to *only* 2... and *then*
add in the other 2.

we have two tasks, here:

1) convert L0CacheBuffer to talk the LoadStoreInterface
2) add wishbone.

what you are proposing is to do those *two things at the same time*.

can you see that this chain of unknowns provides us with no way to
know which two pieces *might* be a problem, if there is code that
"does not work"?

_yes_ we need to do both tasks.... however you are proposing to do
*both* of them *at the same time* and consequently this will be far
harder than it needs to be


> If we targeting Litex DRAM, sub, cycle, addr, re, and data
> really suffice I think.

... which is extra complexity that still does not help us because that
is step (2) *and we still need step (1)*.


> Basically, LiteX DRAM behaves kinda like Harry Ho's SRAM.
> 
> And now I'm confused.
> L0 Cache **isnt** the even odd cache things?

see line 322 of L0CacheBuffer:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/experiment/l0_cache.py#l322

 322     Note that the final version will have *two* interfaces per LDSTCompUnit,
 323     to cover mis-aligned requests, as well as *two* 128-bit L1 Cache
 324     interfaces: one for odd (addr[4] == 1) and one for even (addr[4] == 1).

(i have been meaning to correct that to "even addr[4] == 0" for some time) ... :)

it *becomes* the odd-even cache system

> At some point, we should have a call so I can better understand this.
> 
> For now, I can make a dead simple cache that looks like a memory.

as long as it conforms to the LoadStoreInterface it will be useful.


> I'm afraid I won't be much help on LDST as my conversation will generate
> more questions.

questions are great.
 

> I really wish thing were more simple.

they are.  it's just that you're suggesting to add extra complexity without
fully understanding that it is in fact extra complexity, not less.

the very simple approach - we need two things:

1) a class named TestMemoryLoadStoreUnit which conforms to LoadStoreInterface
   (as a slave)

2) modifications to L0CacheBuffer which gets it to conform to LoadStoreInterface
   (as a master)

that's it.  that's all that's needed to get this:

    L0CacheBuffer <- LoadStoreInterface -> TestMemory

it cannot get less simple than that.

it's two unit tests.

one of those unit tests actually partly exists already (the existing unit
test for L0CacheBuffer).

wishbone is *not* needed.  at all, in that.

yet it provides us with the absolutely critical step of converting L0CacheBuffer
to LoadStoreInterface.


what you are (might be) proposing is (requires the following steps):

1) a class that implements a Memory that conforms to Wishbone (we already have
   this, it is Harry's sram.py)

2) to bring in BareLoadStoreUnit and connect it to the Wishbone-enabled sram.py
   (BareLoadStoreUnit conforms slave-side to LoadStoreInterface)

3) to modify L0CacheBuffer to get it to conform to LoadStoreInterface
   (master-side)

connect (3) to (2) to (1)

that's four unit tests.

that's more work, more complexity, and more things to go wrong.


or, what you might be proposing - which is *even more* complex is:

1) to write a *completely new* from-scratch cache

2) to put an interface around it that conforms to Wishbone

3) to (create?) a completely new(?) class that conforms to LoadStoreInterface
   (slave-side)

4) to modify L0CacheBuffer to get it to conform to LoadStoreInterface.
   (master-compliant)

that's *FIVE* unit tests, and *THREE* completely unknown, completely new sets
of code.

that's *even more* complex!


again: *yes* we have to get to that point.  however if we do it *right now*,
straight away, we have a chain of completely unknown untested code.

i don't understand why it is that you don't understand that what you are proposing
is far more complex.

what you are proposing *still and always requires that L0CacheBuffer be converted
to LoadStoreInterface*.

therefore, it should be logically obvious that the first incremental step should
be to get that done as the first top priority.

and it should also be logically obvious that, because we are doing incremental
unit tests, that the simplest possible way to demonstrate that L0CacheBuffer
"works" when converted to LoadStoreInterface is:

write a simple class that connects LoadStoreInterface (slave) to TestMemory.

what is not obvious and simple about that?

i don't understand why it's not obvious and simple.

perhaps if you write it in your own words, describing how to connect to
PortInterface, that would help?

PortInterface is our equivalent of minerva LoadStoreInterface.

can you describe - in your own words - what you would do, what code you would
write - that would connect to PortInterface, to provide LOAD/STORE capability?

Comment 18 Luke Kenneth Casson Leighton 2020-06-20 22:51:57 BST

(In reply to Yehowshua from comment #13)
> I won't be of much use on LDST until I've read through the link below...
> Unless - you have something wishbone that you want me to plug into.
> https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE
> 
> Luke, I would suggest not going for full wishbone compliance.
> Just do something that **works with LiteX DRAM sitting below the LDST
> unit.

ok.  think it through.  what classes are needed?

> It can really be quite simple - even if the resulting FSM isn't fully
> functional.

indeed.  that is not the issue.

what components are needed?  describe the *full* chain.

think it through.

bear in mind that no matter what is created, they *have* to talk - interface to -
PortInterface [the libre-soc equivalent of LoadStoreInterface].

forget *everything* in l0_cache.py

describe to me - in your own words - *every* component, every piece of code,
outlining the *full* connectivity between each and every component.... starting
at:

     PortInterface.

go from there.

Comment 19 Luke Kenneth Casson Leighton 2020-06-22 14:21:10 BST

i've created TestMemoryPortInterface:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/experiment/pimem.py;hb=HEAD

it's based *on* L0CacheBuffer however is simpler in that it "manages"
the TestMemory itself.

i've added comments into TestMemoryPortInterface that it is intended as the
"absolute simplest compliant thing with PortInterface that can handle LOAD/STOREs"

L0CacheBuffer now basically reduces to the role of "PortInterface Arbiter",
picking one and only one incoming PortInterface and passing it through.

that's the next task.

Comment 20 Luke Kenneth Casson Leighton 2020-06-22 20:29:13 BST

(In reply to Luke Kenneth Casson Leighton from comment #19)

> L0CacheBuffer now basically reduces to the role of "PortInterface Arbiter",
> picking one and only one incoming PortInterface and passing it through.
> 
> that's the next task.

done.  L0CacheBuffer's sole job now is to select between incoming
PortInterfaces (one per LDSTCompUnit), and to output *to* a PortInterface.

that will however not provide support for misaligned requests.

however the nice thing is: DualPortSplitter *will* deal with misaligned
requests... and DualPortSplitter is intended to comply with PortInterface.

hook one up to the other...

Comment 21 Luke Kenneth Casson Leighton 2020-06-25 11:40:12 BST

michael i created a file soc/experiment/pi2ls.py which contains this:

"""
    PortInterface                LoadStoreUnitInterface

    is_ld_i/1          x_ld_i
    is_st_i/1          x_st_i

    data_len/4         x_mask/16  (translate using LenExpand)

    busy_o/1           most likely to be x_busy_o
    go_die_i/1         rst?
    addr.data/48       x_addr_i[4:] (x_addr_i[:4] goes into LenExpand)
    addr.ok/1          probably x_valid_i & ~x_stall_i

    addr_ok_o/1        no equivalent.  *might* work using x_stall_i
    addr_exc_o/2(?)    m_load_err_o and m_store_err_o

    ld.data/64         m_ld_data_o
    ld.ok/1            probably implicit, when x_busy drops low
    st.data/64         x_st_data_i
    st.ok/1            probably kinda redundant, set to x_st_i
"""

it shows the mapping between PI and LSUI, as best as i can determine,
and also shows where LenExpand gets used.  from TestMemoryPortInterface:

            # set up LenExpander with the LD len and lower bits of addr
            lsbaddr, msbaddr = self.splitaddr(ldport.addr.data)
            comb += lenexp.len_i.eq(ldport.data_len)
            comb += lenexp.addr_i.eq(lsbaddr)

and then this should do the trick:

            comb += lsui.x_mask_i.eq(lenexp.exp_o)

btw TestMemoryPortInterface is done as a FSM because, given that it represents
PortInterface, it has to do the 3 things: offer, exchange, complete.

so this is why there is the "adrok_l" latch, because that is kept LOW to
indicate the "offer" phase is still being negotiated.  once "adrok_l" goes
HI this indicates a move into "exchange" phase of the Contract.

* "complete" is represented by reset_l.

the "offer" part the bit where we have to fudge things for a while: still have that
"address ok / not-ok" phase however return fake answers ("yes address is always
ok")

or, if you actually want to make it "work", check the size of the TestMemory,
and if the address is out of range of the size of the TestMemory, actually
set addr_exc_o to 1!

Comment 22 Luke Kenneth Casson Leighton 2020-06-25 18:54:13 BST

oh, also, as you can probably see, i got fed up with LoadStoreUnitInterface names not being clear which were inputs and which were outputs :)

i went through minerva core.py and also loadstore.py and ended up independently confirming what you had identified as input and output.

Comment 23 Luke Kenneth Casson Leighton 2020-06-25 20:55:24 BST

[master 7400c18] start connecting up Pi2LSUI

okok i was bored, i started on it :)

looking at lsmem.py, it occurs to me that simply hooking up
BareMemoryLoadStoreUnit / CacheLoadStoreUnit to harry ho's
sram.py the exact same unit test could be used.

and that that could be done through that reconfigureable
class in bug #403.

Comment 24 Luke Kenneth Casson Leighton 2020-06-26 12:45:41 BST

here's the test class that connects Minerva BareLoadStoreUnit to a
harry ho wishbone-based SRAM:

[master 638a80f] add a test SRAM that lives behind a minerva LoadStoreUnitInterface

Comment 25 Luke Kenneth Casson Leighton 2020-07-13 00:03:01 BST

error handling (exceptions) still TODO here

Comment 26 Luke Kenneth Casson Leighton 2020-08-18 13:54:52 BST

TODO add error handling separately once dcache vhdl has been converted and analysed.  this will tell us the full range of error messages required.