for testing purposes (and also because we will actually need an on-board SRAM and maybe a ROM as well) we need a means to place a Memory object behind a wishbone bus. does anyone know of anything like this? one of the simplest ways to achieve it is to hack the minerva L1 cache code, removing the "tag" aspect and substituting direct memory read-write. however it would be preferable not to spend the time doing that if someone knows of a pre-existing block.
found one:
https://codecov.io/gh/m-labs/nmigen-soc/pull/6/diff?src=pr&el=tree#diff-bm1pZ2VuX3NvYy93aXNoYm9uZS9zcmFtLnB5
https://github.com/m-labs/nmigen-soc/pull/6
found another
https://git.m-labs.hk/M-Labs/HeavyX

heavycomps/heavycomps/wishbone.py contains an SRAM.
(In reply to Luke Kenneth Casson Leighton from comment #3)
> found another
> https://git.m-labs.hk/M-Labs/HeavyX
>
> heavycomps/heavycomps/wishbone.py contains an SRAM.

btw watch out for that repository: its web interface now appears to default to Russian.
https://git.libre-soc.org/?p=nmigen-soc.git;a=blob;f=nmigen_soc/wishbone/sram.py;h=de5c15d551c081a224590ee4ba8706ca3d2d7547;hb=refs/heads/wishbone_interconnect

i've created a repository, nmigen-soc, tracking harry ho's repo:
https://github.com/HarryMakes/nmigen-soc

i can't stand branches: they always end up making a mess, indicating divergence and communication delay. we need to move much faster.
i did a quick scan/review of the heavycomps SRAM and the branched nmigen-soc version, and the nmigen-soc version looks much better: it is the wishbone arbiter (etc.) that makes it look more complete. we will need to add nmigen-soc as a library dependency.

unfortunately, with no obvious central project being responsible for this code and with the deadline rapidly approaching, we cannot delay, so it makes sense to simply fork this code, including adding unit tests to it and so on. if it were a much larger codebase, or if we did not have a hard deadline, i would not recommend that strategy.
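for reference, a minimal usage sketch of putting a Memory behind wishbone with that sram.py. the module path and constructor signature here are assumed from the linked file, so treat them as unverified:

    from nmigen import Memory, Module
    from nmigen_soc.wishbone.sram import SRAM

    m = Module()
    # 32 x 64-bit words, 8-bit (byte) write granularity => an 8-bit bus.sel
    mem = Memory(width=64, depth=32)
    m.submodules.sram = sram = SRAM(memory=mem, granularity=8)
    # sram.bus is a wishbone Interface (cyc, stb, we, sel, adr, dat_w, dat_r,
    # ack): hang it off an arbiter/decoder, or drive it directly in a testbench.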
yehowshua, hi, can you please point to where the external source code is that contains the libresoc unit tests for sram? i would like to run them but am delayed until i can track them down.
found them by searching through my inbox containing 60,000 messages: https://github.com/BracketMaster/nmigen-by-example
> yehowshua, hi, can you please point to where the external source code is
> that contains

Just now saw this. Glad you found it!
Also, what was the bug in SRAM.py?
You can't write to Minerva's cache.

I was too shy to say this before because it sounds silly and I thought maybe I didn't understand Minerva. But after EXTENSIVE review, I'm certain this is the case. Minerva handles writes by going directly to memory.

In fact, Minerva's Cache is really, really simple - I don't think it's what we're looking for.

Luke, I HAVE to have a cache for my thesis, so I'm thinking of writing one. Maybe I can kill two birds with one stone - just tell me what the cache must be able to do. I can also formally verify it, no problem.
In fact, Minerva's cache codebase is really bizarre. We'd really have to shoehorn it, I think.
(In reply to Yehowshua from comment #11)
> In fact, Minerva's cache codebase is really bizarre.
> We'd really have to shoehorn it, I think.

Basically, any time there is a write, if the entry is in the cache, it gets evicted from the cache. The write data then goes around the cache - not through the cache - on a separate bus, directly to memory.

Yehowshua
Basically this is the cache interface that I tried to build around Minerva's cache, but it's really, really hard to shoehorn Minerva's two-stage cache:

(Higher Memory or CPU) <==> (CACHE) <==> (Lower Memory or RAM)

I'm going to start writing such a cache and its proof. I could have finished it in the time I spent trying to shoehorn Minerva. It will be useful for my thesis, and hopefully it will make sense for LibreSOC.
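As a first sketch of that interface shape (the signal names here are made up for illustration - this is not Minerva's code, and the body is a deliberately degenerate pass-through rather than a real cache):

    from nmigen import Elaboratable, Module, Signal

    class CacheStub(Elaboratable):
        """Sketch of a cache between a CPU-side request port and a
        lower-memory port. The internals are a placeholder pass-through;
        a real cache adds tag/data arrays, hit/miss logic and, for the
        write-around behaviour described above, evict-on-write."""
        def __init__(self, addr_width=32, data_width=64):
            # CPU-facing ("higher memory") side
            self.req_valid = Signal()
            self.req_we    = Signal()
            self.req_addr  = Signal(addr_width)
            self.req_wdata = Signal(data_width)
            self.rsp_valid = Signal()
            self.rsp_rdata = Signal(data_width)
            # RAM-facing ("lower memory") side - in LibreSOC terms this
            # would be a wishbone Interface to the next level down
            self.mem_addr  = Signal(addr_width)
            self.mem_we    = Signal()
            self.mem_wdata = Signal(data_width)
            self.mem_rdata = Signal(data_width)

        def elaborate(self, platform):
            m = Module()
            # degenerate "cache": pass everything straight through to memory
            m.d.comb += [
                self.mem_addr.eq(self.req_addr),
                self.mem_we.eq(self.req_valid & self.req_we),
                self.mem_wdata.eq(self.req_wdata),
                self.rsp_rdata.eq(self.mem_rdata),
                self.rsp_valid.eq(self.req_valid & ~self.req_we),
            ]
            return m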
(In reply to Yehowshua from comment #9)
> > yehowshua, hi, can you please point to where the external source code is
> > that contains
>
> Just now saw this. Glad you found it!
> Also, what was the bug in SRAM.py?

    n_wrport = wrport.en.shape()[0]
    n_bussel = self.bus.sel.shape()[0]
    assert n_wrport == n_bussel, "bus enable count %d " \
                "must match memory wen count %d" % (n_bussel, n_wrport)
    for i in range(n_wrport):
        m.d.comb += wrport.en[i].eq(self.bus.cyc & self.bus.stb &
                                    self.bus.we & self.bus.sel[i])

was:

    for i in range(4):

which would only copy over a subset of wen signals, and on certain granularities would overrun the wen array.
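to make the mismatch concrete: the number of write-enables (and bus.sel bits) is data_width divided by granularity, so hard-coding 4 only happens to work for a 32-bit bus with byte granularity. standalone illustration, not code from the repo:

    def n_sel(data_width, granularity):
        # one write-enable (one bus.sel bit) per granularity-sized lane
        return data_width // granularity

    assert n_sel(32, 8)  == 4   # the case the old range(4) silently assumed
    assert n_sel(64, 8)  == 8   # range(4) drives only half of the wen bits
    assert n_sel(64, 32) == 2   # range(4) indexes past the end of wen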
> which would only copy over a subset of wen signals, and on certain
> granularities would overrun the wen array.

Oh yeah. I implicitly recognized in my mind that you need to modify that for 64 bits.
(In reply to Yehowshua from comment #10)
> You can't write to Minerva's cache.

correct. you only write via CacheLoadStoreUnit - that's its job. Minerva.L1Cache is just a subcomponent that is managed *through* CacheLoadStoreUnit. it's a bit of a pain / mind-bender. i believe RocketChip, ariane, and so on have this type of design.

however microwatt, i believe, has the design you are "expecting": there, i believe, we can see not just evidence of L1, we can see TLB as well:
https://github.com/antonblanchard/microwatt/blob/master/dcache.vhdl

in other words their team has merged at least three separate and distinct objects into one, and consequently it "looks" more like what you might be expecting. if we were to *separate* the code from lines 102 through 139 (and possibly slightly beyond) into a *separate* file named l1cache.vhdl, *now* we have the equivalent of Minerva.L1Cache and *now* you would, i believe, become likewise confused :)

> I was too shy to say this before because it sounds silly and I thought maybe
> I didn't understand Minerva.

:)

> But after EXTENSIVE review, I'm certain this is the case.
> Minerva handles writes by going directly to memory.

it should write to *both*. that's the definition of a write-through cache [that's if i understand the code correctly in CacheLoadStoreUnit]. i.e. i believe CacheLoadStoreUnit *is* what you are thinking of as being a "write-through" cache.... *not* Minerva.L1Cache itself.

my guess is that you are thinking - please do let me know if this is correct or not - in terms of an interface where there is a single Wishbone interface to {InsertNameOfL1CacheToBeDesignedHere}, and that in {InsertNameHereL1Cache} *that* is where it takes care of write-through to Main Memory (and so on).

if that is the case, then it is simply a terminology / naming confusion, because CacheLoadStoreUnit *is* that {InsertNameHereL1Cache}.

this becomes clearer when we come to adding L2 (and L3?) because then there will be *THREE* wishbone Buses in CacheLoadStoreUnit:

1) a WB Bus to the object known by the name "Minerva.L1Cache"
2) a WB Bus to the [future] object known by the name "L2 Cache"
3) a WB Bus to *MAIN MEMORY*.

this [future] CacheLoadStoreUnit would:

a) make a read request first of (1), then if that failed
b) make a read request of (2), and if that failed
c) make a read request to (3), and if that failed
d) report an address error / exception.

(a rough sketch of the control-flow shape of this read path is below, after this comment.)

for a write - if we were to respect "write-thru" rules - you would initiate *all three* of these at once:

a) write request to (1)
b) write request to (2)
c) write request to (3)

now, if you wanted to do "evict" and MESI or MOESI blah blah, you would do:

a) write request to (1), and if there is no room in L1 perform an "evict", which
b) writes the *EVICTED* data to (2), and if there is no room in L2 likewise "evict", so that
c) the *L2* evicted data gets written to (3)

what i *can't* tell is whether the Minerva CacheLoadStoreUnit is actually write-through or whether it implements the latter delayed/eviction algorithm. and *sigh* with no documentation, we have no idea. some unit tests which deliberately fill up the cache would help us to tell.

> In fact, Minerva's Cache is really, really simple - I don't think it's what
> we're looking for.

i believe you may agree, after reflecting on the above, that it's just a sub-component, and that through naming you *may* have been misdirected into expecting that it does more than its name suggests.

> Luke, I HAVE to have a cache for my thesis, so I'm thinking of writing one.
> Maybe I can kill two birds with one stone - just tell me what the cache must
> be able to do.

i think the main thing we need - the one thing that minerva's cache cannot do which we need at this point (there will be others) - is to give a "reservation" on a cache line which can be dropped. basically, a decoupling of the "address assertion" from "actually putting data into it".

however, with a little bit of thought, i *believe* that this may be achieved in *CacheLoadStoreUnit*... *WITHOUT* having to actually modify Minerva's L1 Cache code at all. i.e. it is an *FSM* issue, not a "modification of Minerva.L1Cache" issue.

however... honestly... i would be inclined to recommend that you simply go ahead and write one anyway, because it will give you a better understanding than trying to examine Minerva's code, because you were the person that wrote it. and therefore (hint, hint) documented it.

however, what i suggest is - if this is ok - please do stick to that same LoadStoreUnitInterface. the reason is that then everything becomes a drop-in replacement, revolving around that common data structure.

> I can also formally verify it, no problem.

ahh excellent.
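a very rough nmigen outline of the read path described in the previous comment (try L1, then L2, then main memory). the names are hypothetical and this is *not* Minerva's CacheLoadStoreUnit, just the control-flow shape:

    from nmigen import Elaboratable, Module, Signal

    class ReadPathOutline(Elaboratable):
        def __init__(self):
            self.req       = Signal()   # read request from the CPU side
            self.l1_hit    = Signal()   # L1 reports hit/miss
            self.l2_hit    = Signal()   # L2 reports hit/miss
            self.mem_ack   = Signal()   # main memory (wishbone) ack
            self.mem_fault = Signal()   # main memory reports an error
            self.done      = Signal()
            self.addr_err  = Signal()

        def elaborate(self, platform):
            m = Module()
            with m.FSM():
                with m.State("IDLE"):
                    with m.If(self.req):
                        m.next = "L1"
                with m.State("L1"):
                    with m.If(self.l1_hit):
                        m.d.comb += self.done.eq(1)
                        m.next = "IDLE"
                    with m.Else():
                        m.next = "L2"
                with m.State("L2"):
                    with m.If(self.l2_hit):
                        m.d.comb += self.done.eq(1)
                        m.next = "IDLE"
                    with m.Else():
                        m.next = "MEM"
                with m.State("MEM"):
                    with m.If(self.mem_fault):
                        m.d.comb += self.addr_err.eq(1)  # report exception
                        m.next = "IDLE"
                    with m.Elif(self.mem_ack):
                        m.d.comb += self.done.eq(1)
                        m.next = "IDLE"
            return m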
(In reply to Yehowshua from comment #15)
> > which would only copy over a subset of wen signals, and on certain
> > granularities would overrun the wen array.
>
> Oh yeah. I implicitly recognized in my mind that you need to modify that
> for 64 bits.

(i replied in bug #382. or will once i hit send)
moving to this bugreport

(In reply to Yehowshua from https://bugs.libre-soc.org/show_bug.cgi?id=382#c15)
> > which would only copy over a subset of wen signals, and on certain
> > granularities would overrun the wen array.
>
> Oh yeah. I implicitly recognized in my mind that you need to modify that
> for 64 bits.

yes. it melted my brain a bit that granularity is specified in number of bits, not number of subdivisions, and that nmigen Memory likewise has this convention.

we can't go 128 bits wide with this code without modifying it, because it directly ties the SRAM width to the Wishbone width, and Wishbone is not permitted to be 128 bits wide (which is a pain, for us). we could however do odd-even at bit 3 of the address, and have *pairs* of 64-bit Wishbone-accessible SRAMs (a rough sketch of that is below). but, for an internal SRAM (for boot purposes and test purposes) i don't think we need that level of high-performance access (128-bit-wide data).
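to illustrate the odd/even pairing idea (all signal names here are made up, and this is just the address-decode part, not a full interconnect):

    from nmigen import Module, Signal

    m = Module()

    addr      = Signal(32)   # byte address of a 64-bit-wide access
    even_cyc  = Signal()     # cycle strobe for the "even" 64-bit SRAM
    odd_cyc   = Signal()     # cycle strobe for the "odd" 64-bit SRAM
    bank_adr  = Signal(28)   # word index presented to whichever bank is hit

    m.d.comb += [
        # bit 3 of the byte address says which of the two 64-bit SRAMs
        # holds this word; the bits above it index within each bank
        even_cyc.eq(~addr[3]),
        odd_cyc.eq(addr[3]),
        bank_adr.eq(addr[4:]),
    ]
    # a full 128-bit access (16-byte aligned) would cycle both banks at
    # once: low 64 bits from the even bank, high 64 bits from the odd bank.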
Hey, let's schedule a call and flesh out the cache interface, making sure we’re on the same page. I can then document the chosen interface and get to work.
I started writing an L1 cache; you could probably use my code:
https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/memory_pipe_experiment;h=fa3eca3be8acf77bc31c680b54e56c08c5ef810b;hb=HEAD
(In reply to Jacob Lifshay from comment #20)
> I started writing an L1 cache; you could probably use my code:
> https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/memory_pipe_experiment;
> h=fa3eca3be8acf77bc31c680b54e56c08c5ef810b;hb=HEAD

the topic for this bugreport is specifically an SRAM behind a Wishbone interface. strictly speaking, L1 cache design should be discussed under a separate bugreport.