Bug 382 - nmigen wishbone Memory (SRAM) object needed
Summary: nmigen wishbone Memory (SRAM) object needed
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 383
  Show dependency treegraph
 
Reported: 2020-06-14 18:32 BST by Luke Kenneth Casson Leighton
Modified: 2021-04-20 14:55 BST (History)
3 users (show)

See Also:
NLnet milestone: NLNet.2019.10.Wishbone
total budget (EUR) for completion of task and all subtasks: 150
budget (EUR) for this task, excluding subtasks' budget: 150
parent task for budget allocation: 383
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
"lkcl"={amount=150, paid=2020-08-21}


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2020-06-14 18:32:15 BST
for testing purposes (and also because we will actually need an on-board SRAM
and maybe a ROM as well) we need a means to place a Memory object behind a
wishbone bus.

does anyone know of anything like this?

one of the simplest ways to achieve it is to hack the minerva L1 cache
code, removing the "tag" aspect and substituting direct memory read-write.

however it would be preferable not to spend the time doing that if someone
knows of a pre-existing block.
Comment 3 Luke Kenneth Casson Leighton 2020-06-19 21:51:43 BST
found another
https://git.m-labs.hk/M-Labs/HeavyX

heavycomps/heavycomps/wishbone.py contains an SRAM.
Comment 4 Luke Kenneth Casson Leighton 2020-06-19 21:52:29 BST
(In reply to Luke Kenneth Casson Leighton from comment #3)
> found another
> https://git.m-labs.hk/M-Labs/HeavyX
> 
> heavycomps/heavycomps/wishbone.py contains an SRAM.

btw watch out for that repository, it now appears to be default language
in russian.
Comment 5 Luke Kenneth Casson Leighton 2020-06-19 22:06:28 BST
https://git.libre-soc.org/?p=nmigen-soc.git;a=blob;f=nmigen_soc/wishbone/sram.py;h=de5c15d551c081a224590ee4ba8706ca3d2d7547;hb=refs/heads/wishbone_interconnect

i've created a repository nmigen-soc, tracking harry ho's repo
https://github.com/HarryMakes/nmigen-soc

i can't stand branches: they always end up making a mess, indicating divergence
and communication delay.

we need to move much faster.
Comment 6 Luke Kenneth Casson Leighton 2020-06-19 23:39:44 BST
i did a quick scan/review of heavycomp SRAM and the branched nmigen-soc version and the nmigen-soc version looks much better.

it's the wishbone arbiter (etc) that just looks more complete.

we will need to add nmigen-soc as a library dependency.

unfortunately with no obvious central project being responsible for this code and with the deadline rapidly approaching we cannot delay so it makes sense to simply fork this code, including adding unit tests to it and so on.

if it was a much larger codebase or if we did not have a hard deadline i would not recommend that strategy.
Comment 7 Luke Kenneth Casson Leighton 2020-06-20 11:58:01 BST
yehowshua, hi, can you please put where the external source code is that contains
the libresoc unit tests for sram, i would like to run them and am delayed doing
so until i have spent time tracking them down.
Comment 8 Luke Kenneth Casson Leighton 2020-06-20 11:59:31 BST
found them by searching through my inbox containing 60,000 messages:
https://github.com/BracketMaster/nmigen-by-example
Comment 9 Yehowshua 2020-06-20 15:39:31 BST
> yehowshua, hi, can you please put where the external source code is that
> contains

Just now saw this. Glad you found it!
Also, what was the bug in SRAM.py?
Comment 10 Yehowshua 2020-06-20 18:35:14 BST
You can't write to Minerva's cache.
I was to shy to say this before because it sounds silly and I thought maybe I didn't understand Minerva.
But after EXTENSIVE review, I'm certain this is the case.
Minerva handles writes by going directly to memory.
In fact, Minerva's Cache is really really simple - I don't think its what we're looking for.

Luke, I HAVE to have a cache for my thesis, so I'm thinking of writing one.
Maybe I can kill two birds with one stone - just tell me what the cache must be able to do.
I can also formally verify it no problem.
Comment 11 Yehowshua 2020-06-20 18:36:37 BST
In fact, Minerva's cache codebase is really bizarre.
We'd really have to shoe horn it I think.
Comment 12 Yehowshua 2020-06-20 18:43:28 BST
(In reply to Yehowshua from comment #11)
> In fact, Minerva's cache codebase is really bizarre.
> We'd really have to shoe horn it I think.

Basically, any time there is a write, if the entry is in the cache,
it gets evicted from the cache.

The write data then goes around the cache - not through the cache,
on a separate bus, directly to memory.

Yehowshua
Comment 13 Yehowshua 2020-06-20 18:55:56 BST
Basically this is the cache interface that I tried to do around Minerva's cache,
but its really really hard to shoehorn Minerva's two stage cache.

(Higher Memory or CPU) <==> (CACHE) <==> (Lower Memory or RAM)

I'm going to start writing such a cache and its proof.
I could have finished it in the time I spent trying to shoehorn Minerva.
It will be useful for my thesis, hopefully it will make sense for LibreSOC.
Comment 14 Luke Kenneth Casson Leighton 2020-06-20 19:01:00 BST
(In reply to Yehowshua from comment #9)
> > yehowshua, hi, can you please put where the external source code is that
> > contains
> 
> Just now saw this. Glad you found it!
> Also, what was the bug in SRAM.py?


            n_wrport = wrport.en.shape()[0]
            n_bussel = self.bus.sel.shape()[0]
            assert n_wrport == n_bussel, "bus enable count %d " \
                    "must match memory wen count %d" % (n_wrport, n_bussel)
            for i in range(n_wrport):
                m.d.comb += wrport.en[i].eq(self.bus.cyc & self.bus.stb &
                                            self.bus.we & self.bus.sel[i])

was:
            for i in range(4):

which would only copy over a subset of wen signals.  and on certain
granularities would overrun the wen array.
Comment 15 Yehowshua 2020-06-20 19:26:29 BST
> which would only copy over a subset of wen signals.  and on certain
> granularities would overrun the wen array.


Oh yeah. I implicitly in my mind recognized you need to modify that for 64 bits.
Comment 16 Luke Kenneth Casson Leighton 2020-06-20 19:37:26 BST
(In reply to Yehowshua from comment #10)
> You can't write to Minerva's cache.

correct.  you only write via CacheLoadStoreUnit - that's its job.
Minerva.L1Cache is just a subcomponent that is managed *through*
CacheLoadStoreUnit.

it's a bit of a pain / mind-bender.

i believe RocketChip, ariane, and so on have this type of design.
however microwatt i believe has the design you are "expecting":

there, i believe, we can see not just evidence of L1, we can see TLB
as well:

https://github.com/antonblanchard/microwatt/blob/master/dcache.vhdl

in other words their team has merged at least three separate and
distinct objects into one, and consequently it "looks" more like
what you might be expecting.

if we were to *separate* the code from lines 102 through 139 (and
possibly slightly beyond) into a *separate* file named l1cache.vhdl,
*now* we have the equivalent of Minerva.L1Cache and *now* you would,
i believe, become likewise confused :)


> I was to shy to say this before because it sounds silly and I thought maybe
> I didn't understand Minerva.

:)

> But after EXTENSIVE review, I'm certain this is the case.
> Minerva handles writes by going directly to memory.

it should write to *both*.  that's the definition of a write-through cache
[that's if i understand the code correctly in CacheLoadStoreUnit]

i.e. i believe CacheLoadStoreUnit *is* what you are thinking of as being a 
"write-through" cache.... *not* Minerva.L1Cache itself.

my guess is that you are thinking - please do let me know if this is
correct or not - in terms of an interface where there is a single
Wishbone interface to {InsertNameOfL1CacheToBeDesignedHere}

and that in {InsertNameHereL1Cache} *that* is where it takes care of
write-through to Main Memory (and so on).

if that is the case, then it is simply a terminology / naming confusion,
because CacheLoadStoreUnit *is* that {InsertNameHereL1Cache}.

this becomes clearer when we come to adding L2 (and L3?) because then there
will be *THREE* wishbone Buses in CacheLoadStoreUnit:

1) a WB Bus to the object known by the name "Minerva.L1Cache"
2) a WB Bus to the [future] object known by the name "L2 Cache"
3) a WB Bus to *MAIN MEMORY*.

ths [future] CacheLoadStoreUnit would:

a) make a read request first of (1), then if that failed 
b) make a request of 2) and if that failed
c) make a read request to 3) and if that failed
d) report an address error / exception.

for a write - if we were to respect "write-thru" rules, you would initiate
*all three* of these at once:

a) write request to (1)
b) write request to (2)
c) write request to (3)

now, if you wanted to do "evict" and MESI or MOESI blah blah you would do:

a) write request to (1), and if not room in L1 perform an "evict" which
b) writes the *EVICTED* data to (2), and if not room in L2 likewise "evict"
c) the *L2* evicted data gets written to (3)


what i *can't* tell is whether the Minerva CacheLoadStoreUnit is actually
a write-through or whether it implements the latter delayed/eviction algorithm.

and *sigh* with no documentation, we have no idea.  some unit tests which
deliberately fill up the cache would help us to tell.


> In fact, Minerva's Cache is really really simple - I don't think its what
> we're looking for.

i believe you may agree, after reflecting on the above, that it's just
a sub-component, and that through naming you *may* have been misdirected
in expectations that it does more than its name suggests.


> Luke, I HAVE to have a cache for my thesis, so I'm thinking of writing one.
> Maybe I can kill two birds with one stone - just tell me what the cache must
> be able to do.

i think the main thing we need - the one thing that minerva's cache cannot
do which we need at this point (there will be others) is to give a
"reservation" on a cache line which can be dropped.

basically a decoupling of the "address assertion" from "actually putting
data into it".

however with a little bit of thought, i *believe* that this may be
achieved in *CacheLoadStoreUnit*...

*WITHOUT* having to actually modify Minerva's L1 Cache code at all.

i.e. it is a *FSM* issue, not a "modification of Minerva.L1Cache" issue.

however... honestly... i would be inclined to recommend to you that you
simply go ahead and write one anyway, because it will give you a better
understanding than trying to examine Minerva's code.

because you were the person that wrote it.

and therefore (hint, hint) documented it.

however what i suggest is - if this is ok - please do stick to that
same LoadStoreUnitInterface.

the reason is because then everything becomes a drop-in replacement,
revolving around that common data structure.


> I can also formally verify it no problem.

ahh excellent.
Comment 17 Luke Kenneth Casson Leighton 2020-06-20 19:40:31 BST
(In reply to Yehowshua from comment #15)
> > which would only copy over a subset of wen signals.  and on certain
> > granularities would overrun the wen array.
> 
> 
> Oh yeah. I implicitly in my mind recognized you need to modify that for 64
> bits.

(i replied in bug #382.  or will once i hit send)
Comment 18 Luke Kenneth Casson Leighton 2020-06-20 19:43:56 BST
moving to this bugreport

(In reply to Yehowshua from https://bugs.libre-soc.org/show_bug.cgi?id=382#c15)
> > which would only copy over a subset of wen signals.  and on certain
> > granularities would overrun the wen array.
> 
> 
> Oh yeah. I implicitly in my mind recognized you need to modify that for 64
> bits.

yes.  it melted my brain a bit that granularity is specified in number of
bits, not number of subdivisions.

and that nmigen Memory likewise has this convention.

we can't go 128 bit wide with this code without modifying it because
it directly ties SRAM width to Wishbone width.

and Wishbone is not permitted to be 128 bit (which is a pain, for us).

we could however do odd-even at bit 3 of the address, and have *pairs*
of 64 bit Wishbone-accessible SRAMs.

but, for an internal SRAM (for boot purposes and test purposes) i don't
think we need that level of high performance access (128 bit wide data)
Comment 19 Yehowshua 2020-06-20 19:58:26 BST
Hey, lets schedule a call and flesh out the cache interface.
Making sure we’re on the same page.

I can then document the chosen interface and get to work.
Comment 20 Jacob Lifshay 2020-06-21 21:14:04 BST
I started writing a L1 cache, you could probably use my code:
https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/memory_pipe_experiment;h=fa3eca3be8acf77bc31c680b54e56c08c5ef810b;hb=HEAD
Comment 21 Luke Kenneth Casson Leighton 2020-06-21 21:56:49 BST
(In reply to Jacob Lifshay from comment #20)
> I started writing a L1 cache, you could probably use my code:
> https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/memory_pipe_experiment;
> h=fa3eca3be8acf77bc31c680b54e56c08c5ef810b;hb=HEAD

the topic for this bugreport is specifically for an SRAM behind
a Wishbone interface.  strictly speaking, L1 cache design should
be discussed under a separate bugreport.