incremental strategy described here:
http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-June/008231.html

the idea is to first morph L0CacheBuffer to use the minerva LoadStoreInterface. this is to be done by creating a TestMemoryLoadStoreUnit (with associated unit tests) that ignores the wishbone bus entirely.

once L0CacheBuffer "talks" minerva LoadStoreInterface, it is plain sailing from that point. the only changes to the minerva loadstore.py at that point would be to widen the buses to 64 bit.
Some example memory connections. Will merge into SOC git repo later. https://github.com/BracketMaster/nmigen-by-example
(In reply to Yehowshua from comment #2)
> Some example memory connections.
> Will merge into SOC git repo later.
>
> https://github.com/BracketMaster/nmigen-by-example

see bug #382, i have begun that process (and found a bug in the sram.py as well).

the README btw is excellent. it would be fantastic to have the same quality of documentation in the project's documentation store, known as the "wiki". by being part of the project, it would improve the project's readability and help us gain more contributors, because people respect a valuable source of information.

although superb, and exactly the kind of high-quality readability that we need - and i appreciate that there was a dependency (nmigen-soc) which is now added to git.libre-soc.org - the high-quality README unfortunately has the unintended side-effect of directing - diverting - attention *away* from our project.
> the README btw is excellent. it would be fantastic to have the same quality
> of documentation in the documentation store known as the "wiki", for the
> project.

Right. Can add into the Wiki.

For a bit more context, I have a student working on my Thesis and encountered another guy in France who found the README and examples pretty helpful.
(In reply to Yehowshua from comment #4)
> > the README btw is excellent. it would be fantastic to have the same quality
> > of documentation in the documentation store known as the "wiki", for the
> > project.
>
> Right. Can add into the Wiki.
>
> For a bit more context, I have a student working on my Thesis and
> encountered another guy in France who found the README and examples
> pretty helpful.

yehyeh, no i meant it, they're really good! it's just that they praise nmigen and nmigen-soc (which is fantastic) but don't mention libre-soc.

which, in a larger context (bug #374), is why this small example illustrates that it would be detrimental to even have a mirror of our code on github [as opposed to a respectful and short README with a link]
ok so back on topic here: the two interfaces that are *nearly* the same are:

* LoadStoreInterface from minerva
* PortInterface from libre-soc l0_cache.py

these are what need "matching up". these are the signals from LoadStoreInterface:

    self.x_addr = Signal(addr_wid)
    self.x_mask = Signal(mask_wid)
    self.x_load = Signal()
    self.x_store = Signal()
    self.x_store_data = Signal(data_wid)
    self.x_stall = Signal()
    self.x_valid = Signal()
    self.m_stall = Signal()
    self.m_valid = Signal()
    self.x_busy = Signal()
    self.m_busy = Signal()
    self.m_load_data = Signal(data_wid)
    self.m_load_error = Signal()
    self.m_store_error = Signal()
    self.m_badaddr = Signal(addr_wid-log2_int(mask_wid))

and these are the ones from PortInterface:

    # distinguish op type (ld/st)
    self.is_ld_i = Signal(reset_less=True)
    self.is_st_i = Signal(reset_less=True)
    self.op = CompLDSTOpSubset()  # hm insn_type ld/st duplicates here

    # common signals
    self.busy_o = Signal(reset_less=True)      # do not use if busy
    self.go_die_i = Signal(reset_less=True)    # back to reset
    self.addr = Data(addrwid, "addr_i")        # addr/addr-ok
    # addr is valid (TLB, L1 etc.)
    self.addr_ok_o = Signal(reset_less=True)
    self.addr_exc_o = Signal(reset_less=True)  # TODO, "type" of exception

    # LD/ST
    self.ld = Data(regwid, "ld_data_o")  # ok to be set by L0 Cache/Buf
    self.st = Data(regwid, "st_data_i")  # ok to be set by CompUnit

Data is a Record that contains a "data" and an "ok".
therefore:

* pi.is_ld_i == lsi.x_load
* pi.is_st_i == lsi.x_store
* pi.ld.data == lsi.m_load_data
* pi.ld.ok == lsi.m_valid (probably - just guessing here)
* pi.st.data == lsi.x_store_data
* pi.st.ok == don't know
* pi.addr.data == lsi.x_addr
* pi.addr.ok == lsi.x_valid (probably)
* pi.go_die_i == {concept does not exist in LSI - *might* be lsi.rst}
* pi.busy_o == lsi.x_busy (probably - again, just guessing)
* pi.op.data_len == lsi.x_mask but **ONLY** after processing through LenExpand
* pi.addr_ok_o == {concept does not exist in LSI, i don't think}
* pi.addr_exc_o == lsi.m_load_error *and* lsi.m_store_error
* {nonexistent} == lsi.m_badaddr

this leaves lsi.x_busy to identify - plus all of the other guess-work signals.
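the pi.op.data_len to lsi.x_mask translation is worth pinning down. a plain-python behavioural sketch of what LenExpand has to compute - this is a model for illustration only, not the actual soc RTL, and the function name here is made up:

```python
def len_expand(data_len, addr_lsbs, mask_wid=16):
    """Behavioural model of the LenExpand step: turn a transfer length
    in bytes (1/2/4/8) plus the low bits of the address into a
    byte-enable mask, as needed to drive lsi.x_mask from pi.op.data_len.
    """
    mask = (1 << data_len) - 1               # data_len contiguous byte-enables
    return (mask << addr_lsbs) & ((1 << mask_wid) - 1)
```

e.g. a 4-byte LD whose address has low bits 0b10 gives the mask 0b111100: four byte-enables, shifted up by two byte positions.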
Aha. So l0_cache.py **isn't** RTL - it's just a mock of what the RTL would do. So, would you want me to fit RTL into that?
> * only one of is_ld_i or is_st_i may be asserted. busy_o will
>   immediately be asserted and remain asserted.

That sounds like a rule for formal...
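before it goes into actual formal proofs, the rule can be modelled as a plain-python trace checker. this is a sketch under assumptions: signal names are from PortInterface, and the exact cycle on which busy_o must accompany a request is my guess:

```python
def check_pi_request(trace):
    """Check a list of per-cycle (is_ld_i, is_st_i, busy_o) samples
    against the PortInterface rule: is_ld_i and is_st_i are mutually
    exclusive, and busy_o is asserted on any cycle a request is being
    presented.  Returns True if the trace obeys the rule."""
    for is_ld, is_st, busy in trace:
        if is_ld and is_st:
            return False            # only one op type at a time
        if (is_ld or is_st) and not busy:
            return False            # busy_o must accompany a request
    return True
```

the same two properties would become Assert() statements in an nmigen formal harness.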
(In reply to Yehowshua from comment #7)
> Aha. So l0_cache.py **isn't** RTL -

it is indeed the RTL.

> its just a mock of what the RTL would do.

no, it's a combination of different classes that includes a "test" class (the current L0CacheBuffer) which, as the docstring says, allows us to "move on" in other areas of the code without the memory architecture being a major blocker.

it also contains sub-components which are needed for the "production" version.

now, if we don't have time before the Oct 2020 deadline, L0CacheBuffer *will* be the RTL that ends up being taped out. this would be bad, because it can't handle misaligned ld/st and it is deliberately hard-limited to one LD/ST at a time.

> So, would you want me to fit RTL into that?

see comment #1 and the link to the incremental strategy that i outlined. i advocate that we go one step at a time - forget L1 Cache entirely for now as it is a distraction.

if we can get a TestMemoryLoadStoreUnit written - *no cache, nothing, its sole purpose being to write to TestMemory* - which can then be connected into L0CacheBuffer, then dropping in either BareLoadStoreUnit *or* CacheLoadStoreUnit *OR* the cache code that you would like to write is an absolutely trivial one-line change.

however without everything complying with that LoadStoreInterface, none of that is possible.
Sorry - it's an RTL interface. I got it now.
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Sat, Jun 20, 2020 at 8:55 PM Yehowshua <yimmanuel3@gatech.edu> wrote:
>
> > > what do you think?
> > > * create a TestMemoryLoadStoreUnit (TMLSU) which is compliant with the
> > >   LSUI interface, but *IGNORES* LSUI's wishbone bus entirely
> > > * make TMLSU read and write to TestMemory *using* the LSUI x_addr,
> > >   x_mask, etc. etc.
>
> Wait - why not just have TestMemory test
> LSUI connected to memory via wishbone?

because that's yet another piece in the chain that would need to be understood. the FSMs of wishbone are complex enough to understand without being in the way of TestMemory.

    L0CacheBuffer <- LoadStoreInterface -> TestMemory

is dead simple:

    L0CacheBuffer <-> LoadStoreInterface    <- can have a unit test here
    LoadStoreInterface -> TestMemory        <- unit test here

what you are proposing is:

    L0CacheBuffer <- LoadStoreInterface <- SomeClass -> Wishbone -> TestMemory

for which we need *four* unit tests - one at each join-point. (note: SomeClass is almost identical to BareLoadStoreUnit).

now that *is* actually what we need.... however: one thing at a time, and, more than that, it's... well... 4 things (actually 8 including unit tests) to write rather than 2 (actually 4) before we get to the point where we can get some feedback about whether things work.

> > That would make sense to me?
>
> I might not be understanding everything here,
> but if we make all memories look like wishbone mems,
> whether caches or SRAMS, then the only problem that needs solving
> Is ensuring LSUI is wishbone compliant.

yes... however that's a lot of complexity - a lot of code to write before we can actually do integration. the strategy above (starting with TestMemoryLoadStoreUnit) is an incremental one with a lot less risk.

> However, I remember that we had two caches - that is, the even/odd cache
> Thingies connected directly to LDSTunit .

thingies lol. yes. that's the next phase....
but not yet.

> It might make sense to abstract out these with wishbone?

wishbone's sufficiently complex that it needs its own unit test... *and* we need a unit test on the thing it connects to. should anything go wrong with wishbone, we can't even *get* to the thing that it's behind.
Well - we don't have to do full wishbone...

If we're targeting LiteX DRAM, then sub, cycle, addr, re, and data really suffice, I think. Basically, LiteX DRAM behaves kinda like Harry Ho's SRAM.

And now I'm confused. L0 Cache **isn't** the even/odd cache things?

At some point, we should have a call so I can better understand this. For now, I can make a dead simple cache that looks like a memory.

I'm afraid I won't be much help on LDST as my conversation will generate more questions. I really wish things were simpler.
I won't be of much use on LDST until I've read through the link below... Unless - you have something wishbone that you want me to plug into.

https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE

Luke, I would suggest not going for full wishbone compliance. Just do something that **works** with LiteX DRAM sitting below the LDST unit. It can really be quite simple - even if the resulting FSM isn't fully functional.
> be quite simple - even if the resulting FSM isn't fully
> functional.

I meant to say compliant. It should be functional, of course.
Ah yes - L0_cache buffer - I see it here: https://bugs.libre-soc.org/show_bug.cgi?id=216
Pardon - here. https://libre-soc.org/3d_gpu/architecture/6600scoreboard/
(In reply to Yehowshua from comment #12)
> Well - we don't have to do full wishbone...

it's the fact that it has to be done at all which places a chain of 4 completely unknown pieces of code in a line, before any of the 4 can be tested, that is the problem.

the strategy is to cut that back *first* to *only* 2... and *then* add in the other 2. we have two tasks, here:

1) convert L0CacheBuffer to talk the LoadStoreInterface
2) add wishbone.

what you are proposing is to do those *two things at the same time*. can you see that this chain of unknowns provides us with no way to know which of the pieces *might* be a problem, if there is code that "does not work"?

_yes_ we need to do both tasks.... however you are proposing to do *both* of them *at the same time* and consequently this will be far harder than it needs to be.

> If we targeting Litex DRAM, sub, cycle, addr, re, and data
> really suffice I think.

... which is extra complexity that still does not help us, because that is step (2) *and we still need step (1)*.

> Basically, LiteX DRAM behaves kinda like Harry Ho's SRAM.
>
> And now I'm confused.
> L0 Cache **isnt** the even odd cache things?

see line 322 of L0CacheBuffer:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/experiment/l0_cache.py#l322

    322 Note that the final version will have *two* interfaces per LDSTCompUnit,
    323 to cover mis-aligned requests, as well as *two* 128-bit L1 Cache
    324 interfaces: one for odd (addr[4] == 1) and one for even (addr[4] == 1).

(i have been meaning to correct that to "even addr[4] == 0" for some time) ... :)

it *becomes* the odd-even cache system.

> At some point, we should have a call so I can better understand this.
>
> For now, I can make a dead simple cache that looks like a memory.

as long as it conforms to the LoadStoreInterface it will be useful.

> I'm afraid I won't be much help on LDST as my conversation will generate
> more questions.

questions are great.

> I really wish thing were more simple.

they are.
it's just that you're suggesting to add extra complexity without fully understanding that it is in fact extra complexity, not less. the very simple approach - we need two things:

1) a class named TestMemoryLoadStoreUnit which conforms to LoadStoreInterface (as a slave)
2) modifications to L0CacheBuffer which get it to conform to LoadStoreInterface (as a master)

that's it. that's all that's needed to get this:

    L0CacheBuffer <- LoadStoreInterface -> TestMemory

it cannot get any simpler than that. it's two unit tests, and one of those unit tests actually partly exists already (the existing unit test for L0CacheBuffer). wishbone is *not* needed. at all, in that. yet it provides us with the absolutely critical step of converting L0CacheBuffer to LoadStoreInterface.

what you are (might be) proposing requires the following steps:

1) a class that implements a Memory that conforms to Wishbone (we already have this: it is Harry's sram.py)
2) to bring in BareLoadStoreUnit and connect it to the Wishbone-enabled sram.py (BareLoadStoreUnit conforms slave-side to LoadStoreInterface)
3) to modify L0CacheBuffer to get it to conform to LoadStoreInterface (master-side)

connect (3) to (2) to (1). that's four unit tests. that's more work, more complexity, and more things to go wrong.

or, what you might be proposing - which is *even more* complex - is:

1) to write a *completely new* from-scratch cache
2) to put an interface around it that conforms to Wishbone
3) to create a completely new class that conforms to LoadStoreInterface (slave-side)
4) to modify L0CacheBuffer to get it to conform to LoadStoreInterface (master-side)

that's *FIVE* unit tests, and *THREE* completely unknown, completely new sets of code. that's *even more* complex!

again: *yes* we have to get to that point. however if we do it *right now*, straight away, we have a chain of completely unknown, untested code.
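for reference, the behaviour that TestMemoryLoadStoreUnit has to provide against TestMemory is just masked 64-bit-wide load/store. a plain-python behavioural model (purely illustrative - not nmigen RTL, and the class name here is not the actual soc class):

```python
class TestMemModel:
    """Byte-masked 64-bit-wide memory: one row per 8-byte-aligned address,
    with an 8-bit byte-enable mask on both load and store."""

    def __init__(self, rows=64):
        self.mem = [0] * rows

    def store(self, row, data, mask):
        """Write only the bytes of `data` enabled by `mask`."""
        cur = self.mem[row]
        for i in range(8):
            if mask & (1 << i):
                shift = 8 * i
                cur = (cur & ~(0xff << shift)) | (data & (0xff << shift))
        self.mem[row] = cur

    def load(self, row, mask):
        """Return the row with the non-enabled bytes zeroed."""
        out = 0
        for i in range(8):
            if mask & (1 << i):
                out |= self.mem[row] & (0xff << (8 * i))
        return out
```

the RTL version does the same thing with the LoadStoreInterface x_addr/x_mask/x_store_data signals driving the read/write ports of TestMemory.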
i don't understand why it is that you don't understand that what you are proposing is far more complex.

what you are proposing *still and always requires that L0CacheBuffer be converted to LoadStoreInterface*. therefore, it should be logically obvious that the first incremental step should be to get that done as the first top priority.

and it should also be logically obvious that, because we are doing incremental unit tests, the simplest possible way to demonstrate that L0CacheBuffer "works" when converted to LoadStoreInterface is: write a simple class that connects LoadStoreInterface (slave) to TestMemory.

what is not obvious and simple about that? i don't understand why it's not obvious and simple.

perhaps if you write it in your own words, describing how to connect to PortInterface, that would help? PortInterface is our equivalent of minerva LoadStoreInterface.

can you describe - in your own words - what you would do, what code you would write - that would connect to PortInterface, to provide LOAD/STORE capability?
(In reply to Yehowshua from comment #13)
> I won't be of much use on LDST until I've read through the link below...
> Unless - you have something wishbone that you want me to plug into.
> https://groups.google.com/forum/#!topic/comp.arch/cbGAlcCjiZE
>
> Luke, I would suggest not going for full wishbone compliance.
> Just do something that **works with LiteX DRAM sitting below the LDST
> unit.

ok. think it through. what classes are needed?

> It can really be quite simple - even if the resulting FSM isn't fully
> functional.

indeed. that is not the issue. what components are needed? describe the *full* chain. think it through.

bear in mind that no matter what is created, they *have* to talk - interface to - PortInterface [the libre-soc equivalent of LoadStoreInterface].

forget *everything* in l0_cache.py. describe to me - in your own words - *every* component, every piece of code, outlining the *full* connectivity between each and every component.... starting at: PortInterface. go from there.
i've created TestMemoryPortInterface:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/experiment/pimem.py;hb=HEAD

it's based *on* L0CacheBuffer however is simpler, in that it "manages" the TestMemory itself. i've added comments into TestMemoryPortInterface that it is intended as the "absolute simplest compliant thing with PortInterface that can handle LOAD/STOREs".

L0CacheBuffer now basically reduces to the role of "PortInterface Arbiter", picking one and only one incoming PortInterface and passing it through. that's the next task.
(In reply to Luke Kenneth Casson Leighton from comment #19)
> L0CacheBuffer now basically reduces to the role of "PortInterface Arbiter",
> picking one and only one incoming PortInterface and passing it through.
>
> that's the next task.

done. L0CacheBuffer's sole job now is to select between incoming PortInterfaces (one per LDSTCompUnit), and to output *to* a PortInterface.

that will however not provide support for misaligned requests. however the nice thing is: DualPortSplitter *will* deal with misaligned requests... and DualPortSplitter is intended to comply with PortInterface. hook one up to the other...
michael, i created a file soc/experiment/pi2ls.py which contains this:

    """
    PortInterface       LoadStoreUnitInterface

    is_ld_i/1           x_ld_i
    is_st_i/1           x_st_i
    data_len/4          x_mask/16  (translate using LenExpand)
    busy_o/1            most likely to be x_busy_o
    go_die_i/1          rst?
    addr.data/48        x_addr_i[4:]  (x_addr_i[:4] goes into LenExpand)
    addr.ok/1           probably x_valid_i & ~x_stall_i
    addr_ok_o/1         no equivalent. *might* work using x_stall_i
    addr_exc_o/2(?)     m_load_err_o and m_store_err_o
    ld.data/64          m_ld_data_o
    ld.ok/1             probably implicit, when x_busy drops low
    st.data/64          x_st_data_i
    st.ok/1             probably kinda redundant, set to x_st_i
    """

it shows the mapping between PI and LSUI, as best as i can determine, and also shows where LenExpand gets used. from TestMemoryPortInterface:

    # set up LenExpander with the LD len and lower bits of addr
    lsbaddr, msbaddr = self.splitaddr(ldport.addr.data)
    comb += lenexp.len_i.eq(ldport.data_len)
    comb += lenexp.addr_i.eq(lsbaddr)

and then this should do the trick:

    comb += lsui.x_mask_i.eq(lenexp.exp_o)

btw TestMemoryPortInterface is done as a FSM because, given that it represents PortInterface, it has to do the 3 things: offer, exchange, complete.

* this is why there is the "adrok_l" latch: it is kept LOW to indicate that the "offer" phase is still being negotiated. once "adrok_l" goes HI, this indicates a move into the "exchange" phase of the Contract.
* "complete" is represented by reset_l.

for the "offer" part - the bit where we have to fudge things for a while - still have that "address ok / not-ok" phase, however return fake answers ("yes, address is always ok"). or, if you actually want to make it "work": check the size of the TestMemory, and if the address is out of range of the size of the TestMemory, actually set addr_exc_o to 1!
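the offer/exchange/complete Contract can be modelled in plain python. this is a best-guess behavioural sketch, not the actual pimem.py code - the class and method names are made up, and the adrok_l/reset_l comments are interpretations of the latches described above:

```python
class PortContract:
    """Behavioural sketch of the three-phase PortInterface contract:
    "offer" (address being negotiated, adrok_l low), "exchange"
    (adrok_l high, data moving), "complete" (reset_l, back to idle)."""

    def __init__(self, mem_rows=64):
        self.mem_rows = mem_rows
        self.phase = "idle"

    def offer(self, row):
        """Present an address.  Out-of-range TestMemory accesses are
        rejected (addr_exc_o would be raised); in-range ones move the
        contract to the "exchange" phase (adrok_l goes HI)."""
        if not (0 <= row < self.mem_rows):
            self.phase = "idle"
            return False            # addr_exc_o set, no exchange
        self.phase = "exchange"     # adrok_l HI: addr_ok_o answered
        return True

    def complete(self):
        """End the exchange: reset_l fires, contract returns to idle."""
        assert self.phase == "exchange"
        self.phase = "idle"
```

the "fake answers" variant described above is simply offer() always returning True; the range check is the minimal way to make addr_exc_o honest.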
oh, also, as you can probably see, i got fed up with the LoadStoreUnitInterface names not making clear which were inputs and which were outputs :) i went through minerva core.py and also loadstore.py and ended up independently confirming what you had identified as input and output.
[master 7400c18] start connecting up Pi2LSUI

okok i was bored, i started on it :)

looking at lsmem.py, it occurs to me that simply by hooking up BareMemoryLoadStoreUnit / CacheLoadStoreUnit to harry ho's sram.py, the exact same unit test could be used. and that could be done through that reconfigurable class in bug #403.
here's the test class that connects the Minerva BareLoadStoreUnit to a harry ho wishbone-based SRAM:

[master 638a80f] add a test SRAM that lives behind a minerva LoadStoreUnitInterface
error handling (exceptions) still TODO here
TODO add error handling separately once dcache vhdl has been converted and analysed. this will tell us the full range of error messages required.