Bug 485 - Create I-Cache from microwatt icache.vhdl
Summary: Create I-Cache from microwatt icache.vhdl
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: PC Linux
: Normal normal
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on: 502 590
Blocks: 383 491 690
  Show dependency treegraph
 
Reported: 2020-09-02 20:04 BST by Cole Poirier
Modified: 2022-07-26 16:43 BST (History)
5 users (show)

See Also:
NLnet milestone: NLnet.2019.02.012
total budget (EUR) for completion of task and all subtasks: 1500
budget (EUR) for this task, excluding subtasks' budget: 1500
parent task for budget allocation: 51
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
[tplaten] amount = 300 submitted = 2022-06-16 paid = 2022-07-20 [lkcl] amount = 1200 submitted = 2022-06-16 paid = 2022-07-21


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Cole Poirier 2020-09-02 20:04:41 BST
The goal is to achieve a working I-Cache by translating microwatt's vhdl icache code

***Must follow up on nmigen SRAM attributes***
Comment 1 Luke Kenneth Casson Leighton 2020-09-29 16:31:56 BST
 397 def read_tag(way, tagset):
 398     return tagset[way * TAG_BITS:(way + 1) * TAG_BITS]
 

use word_select @ TAG_BITS width.

write_tag use same trick in dcache.

copy into temp cv

assign.

copy back.
Comment 2 Luke Kenneth Casson Leighton 2020-09-29 16:39:09 BST
 716                 comb += plru.lru_o.eq(plru_out)
 717 

why are you destroying the output from the plru by overwriting it with a signal that is not used?
Comment 3 Cole Poirier 2020-09-29 16:47:17 BST
(In reply to Luke Kenneth Casson Leighton from comment #2)
>  716                 comb += plru.lru_o.eq(plru_out)
>  717 
> 
> why are you destroying the output from the plru by overwriting it with a
> signal that is not used?

Because I still barely know what I'm doing. I think I may not properly understand how port/generic maps should be converted i.e. I mix up the assignees and the assignments. What should I be doing instead in this case?
Comment 4 Luke Kenneth Casson Leighton 2020-09-29 16:51:08 BST
1107 #                 r.wb.stb <= '0';
1108 #               -- We only ever do reads on wishbone
1109 #               r.wb.dat <= (others => '0');
1110 #               r.wb.sel <= "11111111";

missed the wb.sel.

just set comb += r.wb.sel.eq(-1)
Comment 5 Luke Kenneth Casson Leighton 2020-09-29 16:51:55 BST
1119 #                 -- Process cache invalidations
1120 #                 if inval_in = '1' then
1121 #                     for i in index_t loop
1122 #                         cache_valids(i) <= (others => '0');
1123 #                     end loop;
1124 #                     r.store_valid <= '0';
1125 #                 end if;
1126         # Process cache invalidations
1127         with m.If(inval_in):
1128             for i in range(NUM_LINES):
1129                 sync += cache_valid_bits[i].eq(~1)

line 1122 says "set to zero".  why set to ~1?
Comment 6 Cole Poirier 2020-09-29 16:55:04 BST
(In reply to Luke Kenneth Casson Leighton from comment #4)
> 1107 #                 r.wb.stb <= '0';
> 1108 #               -- We only ever do reads on wishbone
> 1109 #               r.wb.dat <= (others => '0');
> 1110 #               r.wb.sel <= "11111111";
> 
> missed the wb.sel.
> 
> just set comb += r.wb.sel.eq(-1)

It's inside of an 'if rst=1' block, you said to ignore all code inside such blocks:

```
1100 #             if rst = '1' then
1101 # On reset, clear all valid bits to force misses
1102 #               for i in index_t loop
1103 #                   cache_valids(i) <= (others => '0');
1104 #               end loop;
1105 #                 r.state <= IDLE;
1106 #                 r.wb.cyc <= '0';
1107 #                 r.wb.stb <= '0';
1108 #               -- We only ever do reads on wishbone
1109 #               r.wb.dat <= (others => '0');
1110 #               r.wb.sel <= "11111111";
1111 #               r.wb.we  <= '0';
```
Comment 7 Cole Poirier 2020-09-29 16:57:08 BST
(In reply to Luke Kenneth Casson Leighton from comment #5)
> 1119 #                 -- Process cache invalidations
> 1120 #                 if inval_in = '1' then
> 1121 #                     for i in index_t loop
> 1122 #                         cache_valids(i) <= (others => '0');
> 1123 #                     end loop;
> 1124 #                     r.store_valid <= '0';
> 1125 #                 end if;
> 1126         # Process cache invalidations
> 1127         with m.If(inval_in):
> 1128             for i in range(NUM_LINES):
> 1129                 sync += cache_valid_bits[i].eq(~1)
> 
> line 1122 says "set to zero".  why set to ~1?

I thought vhdl '(others => '0')' nmigen equivalent was ~1, doing just eq(0) you said, only sets the first bit to 0, where '(others => '0')' sets all bits to zero.
Comment 8 Luke Kenneth Casson Leighton 2020-09-29 16:58:53 BST
1227 #                   if i = replace_way then
1228 #                       tagset := cache_tags(r.store_index);
1229 #                       write_tag(i, tagset, r.store_tag);
1230 #                       cache_tags(r.store_index) <= tagset;
1231 #                   end if;
1232 #               end loop;
1233                     for i in range(NUM_WAYS):
1234                         with m.If(i == replace_way):
1235                             sync += tagset.eq(cache_tags[r.store_index])
1236                             sync += write_tag(i, tagset, r.store_tag)
1237                             sync += cache_tags[r.store_index].eq(tagset)

on clock cycle N+1, tagset gets updated.
on clock cycle N, write_tags gets updated but using the value of tagset from cycle N-1 because of the sync on tagset.

you have to keep in mind that all output from sync will ONLY change on the NEXT cycle.

you want comb += tagset.eq(....)
Comment 9 Luke Kenneth Casson Leighton 2020-09-29 17:00:44 BST
1243 #               -- Requests are all sent if stb is 0
1244 #               stbs_done := r.wb.stb = '0';
1245                 # Requests are all sent if stb is 0
1246                 sync += stbs_done.eq(r.wb.stb == 0)

likewise stbs_done will here be set ONLY ON THE NEXT CYCLE yet the expectation is clearly for use in this FSM state.

you want comb += just like i did in dcache.
Comment 10 Luke Kenneth Casson Leighton 2020-09-29 17:05:43 BST
1252                 # was one accepted?
1253                 with m.If(~wb_in.stall & ~stbs_done):
1254 #               -- That was the last word ? We are done sending.


actually annoyingly you will need a 2nd temporary variable at line 1247 stbs_zero and test that at line 1253 

comb stbs_zero.eq(stbs==0)
comb stbs_fone.eq(stbs_done)

          with m.If(~wb_in.stall & ~stbs_zero):

otherwise a comb loop is created on stbs_done.
Comment 11 Cole Poirier 2020-09-29 17:06:37 BST
(In reply to Luke Kenneth Casson Leighton from comment #8)
> 1227 #                   if i = replace_way then
> 1228 #                       tagset := cache_tags(r.store_index);
> 1229 #                       write_tag(i, tagset, r.store_tag);
> 1230 #                       cache_tags(r.store_index) <= tagset;
> 1231 #                   end if;
> 1232 #               end loop;
> 1233                     for i in range(NUM_WAYS):
> 1234                         with m.If(i == replace_way):
> 1235                             sync += tagset.eq(cache_tags[r.store_index])
> 1236                             sync += write_tag(i, tagset, r.store_tag)
> 1237                             sync += cache_tags[r.store_index].eq(tagset)
> 
> on clock cycle N+1, tagset gets updated.
> on clock cycle N, write_tags gets updated but using the value of tagset from
> cycle N-1 because of the sync on tagset.
> 
> you have to keep in mind that all output from sync will ONLY change on the
> NEXT cycle.
> 
> you want comb += tagset.eq(....)

Like this?

```
1236                     for i in range(NUM_WAYS):
1237                         with m.If(i == replace_way):
1238                             comb += tagset.eq(cache_tags[r.store_index])
1239                             comb += write_tag(i, tagset, r.store_tag)
1240                             sync += cache_tags[r.store_index].eq(tagset)
```

At first I only changed line 1238 to comb, but got a nmigen error about trying to drive tagset from comb and sync domains, changing line 1239 to comb as well fixed that error.
Comment 12 Cole Poirier 2020-09-29 17:11:06 BST
(In reply to Luke Kenneth Casson Leighton from comment #9)
> 1243 #               -- Requests are all sent if stb is 0
> 1244 #               stbs_done := r.wb.stb = '0';
> 1245                 # Requests are all sent if stb is 0
> 1246                 sync += stbs_done.eq(r.wb.stb == 0)
> 
> likewise stbs_done will here be set ONLY ON THE NEXT CYCLE yet the
> expectation is clearly for use in this FSM state.
> 
> you want comb += just like i did in dcache.

Done. This reqires making the following change as well:
```
1264    # That was the last word ?
1265    # We are done sending.
1266    # Clear stb and set stbs_done
1267    # so we can handle
1268    # an eventual last ack on
1269    # the same cycle.
1270    with m.If(is_last_row_addr(r.wb.adr, r.end_row_ix)):
1271        sync += r.wb.stb.eq(0)
1272        comb += stbs_done.eq(1)
```

r is driven by sync domain in the rest of the function/process, is the above correct?
Comment 13 Luke Kenneth Casson Leighton 2020-09-29 17:14:55 BST
(In reply to Cole Poirier from comment #7)
> (In reply to Luke Kenneth Casson Leighton from comment #5)
> > 1119 #                 -- Process cache invalidations
> > 1120 #                 if inval_in = '1' then
> > 1121 #                     for i in index_t loop
> > 1122 #                         cache_valids(i) <= (others => '0');
> > 1123 #                     end loop;
> > 1124 #                     r.store_valid <= '0';
> > 1125 #                 end if;
> > 1126         # Process cache invalidations
> > 1127         with m.If(inval_in):
> > 1128             for i in range(NUM_LINES):
> > 1129                 sync += cache_valid_bits[i].eq(~1)
> > 
> > line 1122 says "set to zero".  why set to ~1?
> 
> I thought vhdl '(others => '0')' nmigen equivalent was ~1, 

no, it's "set all other bits to zero".  it says "0".

you meant (are confusing this with) "others => '1'

that asks for "all other bits to be set to 1."

and the way to do that is to set eq(-1) which is the same as eq(~0)

> doing just eq(0)
> you said, only sets the first bit to 0,

no i did not.  i said, "all nonspecified bits are set to zero i.e eq(0) is zero-extended"

> where '(others => '0')' sets all
> bits to zero.

yes.
Comment 14 Luke Kenneth Casson Leighton 2020-09-29 17:17:13 BST
(In reply to Cole Poirier from comment #12)
> (In reply to Luke Kenneth Casson Leighton from comment #9)
> > 1243 #               -- Requests are all sent if stb is 0
> > 1244 #               stbs_done := r.wb.stb = '0';
> > 1245                 # Requests are all sent if stb is 0
> > 1246                 sync += stbs_done.eq(r.wb.stb == 0)
> > 
> > likewise stbs_done will here be set ONLY ON THE NEXT CYCLE yet the
> > expectation is clearly for use in this FSM state.
> > 
> > you want comb += just like i did in dcache.
> 
> Done. This reqires making the following change as well:

> 1271        sync += r.wb.stb.eq(0)
> 1272        comb += stbs_done.eq(1)

well spotted.  this sets stbs_done in the cycle it is needed.

> ```
> 
> r is driven by sync domain in the rest of the function/process,

correct.

> is the above
> correct?

looks reasonable.
Comment 15 Cole Poirier 2020-09-29 17:19:13 BST
(In reply to Luke Kenneth Casson Leighton from comment #13)
> (In reply to Cole Poirier from comment #7)

> > I thought vhdl '(others => '0')' nmigen equivalent was ~1, 
> 
> no, it's "set all other bits to zero".  it says "0".
> 
> you meant (are confusing this with) "others => '1'

Correct.

> that asks for "all other bits to be set to 1."
> 
> and the way to do that is to set eq(-1) which is the same as eq(~0)

Gotcha.

> > doing just eq(0)
> > you said, only sets the first bit to 0,
> 
> no i did not.  i said, "all nonspecified bits are set to zero i.e eq(0) is
> zero-extended"

!! So eq(1) is only the first bit, but eq(0) is all the bits of the Signal?

> > where '(others => '0')' sets all
> > bits to zero.
> 
> yes.

Ok, glad I understood at least that correctly.
Comment 16 Cole Poirier 2020-09-29 17:21:20 BST
(In reply to Luke Kenneth Casson Leighton from comment #14)
> (In reply to Cole Poirier from comment #12)
> > Done. This reqires making the following change as well:
> 
> > 1271        sync += r.wb.stb.eq(0)
> > 1272        comb += stbs_done.eq(1)
> 
> well spotted.  this sets stbs_done in the cycle it is needed.

Thank you, I'm having significant difficulty making the determination of which cycle which signal's value is needed, and only got it right in this instance because r is used many times throughout the rest of that function in the sync domain.

> > r is driven by sync domain in the rest of the function/process,
> 
> correct.
> 
> > is the above
> > correct?
> 
> looks reasonable.

Awesome.
Comment 17 Luke Kenneth Casson Leighton 2020-09-29 17:22:35 BST
(In reply to Cole Poirier from comment #11)

> 1236                     for i in range(NUM_WAYS):
> 1237                         with m.If(i == replace_way):
> 1238                             comb += tagset.eq(cache_tags[r.store_index])
> 1239                             comb += write_tag(i, tagset, r.store_tag)
> 1240                             sync += cache_tags[r.store_index].eq(tagset)
> ```
> 
> At first I only changed line 1238 to comb, but got a nmigen error about
> trying to drive tagset from comb and sync domains, changing line 1239 to
> comb as well fixed that error.

err... errr... yes!  exactly the same pattern as with cv, a few lines above.

copy into a temp var using comb.  modify.  store back.

btw the use of write_tag there will result in some awful code but we sort that later.
Comment 18 Jacob Lifshay 2020-09-29 17:23:49 BST
(In reply to Cole Poirier from comment #15)
> (In reply to Luke Kenneth Casson Leighton from comment #13)
> > (In reply to Cole Poirier from comment #7)
> > > doing just eq(0)
> > > you said, only sets the first bit to 0,
> > 
> > no i did not.  i said, "all nonspecified bits are set to zero i.e eq(0) is
> > zero-extended"
> 
> !! So eq(1) is only the first bit, but eq(0) is all the bits of the Signal?

eq(<Python int>) sets the bits to the sign-extended version if the int is negative and to the zero-extended version otherwise. It always sets all bits of the thing eq() is called on.
Comment 19 Luke Kenneth Casson Leighton 2020-09-29 17:25:49 BST
(In reply to Cole Poirier from comment #15)

> !! So eq(1) is only the first bit, but eq(0) is all the bits of the Signal?

incorrect. 

whatever value you place into the LHS is zero extended.

therefore eq(1) will be 1 in the LSB and all zeros in the remaining MSBs.

which is why eq(~1) is especially doubly wrong.

this places 0b1111111111111111111110 into the LHS i.e a 0 in the LSB and all 1s in the MSBs.
Comment 20 Cole Poirier 2020-09-29 17:32:54 BST
(In reply to Luke Kenneth Casson Leighton from comment #17)
> (In reply to Cole Poirier from comment #11)
> 
> > 1236                     for i in range(NUM_WAYS):
> > 1237                         with m.If(i == replace_way):
> > 1238                             comb += tagset.eq(cache_tags[r.store_index])
> > 1239                             comb += write_tag(i, tagset, r.store_tag)
> > 1240                             sync += cache_tags[r.store_index].eq(tagset)
> > ```
> > 
> > At first I only changed line 1238 to comb, but got a nmigen error about
> > trying to drive tagset from comb and sync domains, changing line 1239 to
> > comb as well fixed that error.
> 
> err... errr... yes!  exactly the same pattern as with cv, a few lines above.
> 
> copy into a temp var using comb.  modify.  store back.
> 
> btw the use of write_tag there will result in some awful code but we sort
> that later.

Tried to do that, is this correct:
```
1236 for i in range(NUM_WAYS):
1237     with m.If(i == replace_way):
1238         ts = Signal(INDEX_BITS)
1239         comb += ts.eq(cache_tags[r.store_index])
1240         comb += ts.bit_select(i, TAG_BITS).eq(
1241                  r.store_tag
1242                 )
1243         comb += tagset.eq(ts)
1244         sync += cache_tags[r.store_index].eq(tagset)
```

The code for write_tag for quick reference:
```
def write_tag(way, tagset, tag):
    return tagset[way * TAG_BITS:(way + 1) * TAG_BITS].eq(tag)
```
Comment 21 Cole Poirier 2020-09-29 17:37:19 BST
(In reply to Jacob Lifshay from comment #18)

> eq(<Python int>) sets the bits to the sign-extended version if the int is
> negative and to the zero-extended version otherwise. It always sets all bits
> of the thing eq() is called on.

Hmmm...

(In reply to Luke Kenneth Casson Leighton from comment #19)
> (In reply to Cole Poirier from comment #15)
> 
> > !! So eq(1) is only the first bit, but eq(0) is all the bits of the Signal?
> 
> incorrect. 
> 
> whatever value you place into the LHS is zero extended.
> 
> therefore eq(1) will be 1 in the LSB and all zeros in the remaining MSBs.
> 
> which is why eq(~1) is especially doubly wrong.
> 
> this places 0b1111111111111111111110 into the LHS i.e a 0 in the LSB and all
> 1s in the MSBs.

Aha! Ok understood now... mostly... I think.
Comment 22 Luke Kenneth Casson Leighton 2020-09-29 17:37:47 BST
(In reply to Cole Poirier from comment #20)

> 1236 for i in range(NUM_WAYS):
> 1237     with m.If(i == replace_way):
> 1238         ts = Signal(INDEX_BITS)

wrong width.

> 1239         comb += ts.eq(cache_tags[r.store_index])
> 1240         comb += ts.bit_select(i, TAG_BITS).eq(

word_srlect.

> 1241                  r.store_tag
> 1242                 )
> 1243         comb += tagset.eq(ts)
> 1244         sync += cache_tags[r.store_index].eq(tagset)
> ```
> 
> The code for write_tag for quick reference:
> ```
> def write_tag(way, tagset, tag):
>     return tagset[way * TAG_BITS:(way + 1) * TAG_BITS].eq(tag)
> ```

oh, errr then forget that for now write_tag actually looks reasonable.

leave it for now
Comment 23 Cole Poirier 2020-09-29 17:42:52 BST
(In reply to Luke Kenneth Casson Leighton from comment #22)
> (In reply to Cole Poirier from comment #20)
> 
> > 1236 for i in range(NUM_WAYS):
> > 1237     with m.If(i == replace_way):
> > 1238         ts = Signal(INDEX_BITS)
> 
> wrong width.
> 
> > 1239         comb += ts.eq(cache_tags[r.store_index])
> > 1240         comb += ts.bit_select(i, TAG_BITS).eq(
> 
> word_srlect.
> 
> > 1241                  r.store_tag
> > 1242                 )
> > 1243         comb += tagset.eq(ts)
> > 1244         sync += cache_tags[r.store_index].eq(tagset)
> > ```
> > 
> > The code for write_tag for quick reference:
> > ```
> > def write_tag(way, tagset, tag):
> >     return tagset[way * TAG_BITS:(way + 1) * TAG_BITS].eq(tag)
> > ```
> 
> oh, errr then forget that for now write_tag actually looks reasonable.
> 
> leave it for now

Ok, undone, back to previous version.
Comment 24 Luke Kenneth Casson Leighton 2020-09-30 03:34:24 BST
 397 #       return tagset((way+1) * TAG_BITS - 1 downto way * TAG_BITS);
 398 #     end;
 399 # Read a tag from a tag memory row
 400 def read_tag(way, tagset):
 401     return tagset.word_select(way, TAG_WIDTH)[:TAG_BITS]


nooo.  return tagset.wordselect(way, TAGBITS).

then in writetag return readtag(....).eq(...)
Comment 25 Luke Kenneth Casson Leighton 2020-09-30 09:46:05 BST
commit 22746aeb180b2ea5f358f7f3cbcf1c39ffa29436 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Sep 30 09:43:40 2020 +0100

    forgot to add PLRUs as submodules

commit 43d8b84ac86f063328f730251f9c2cd55cae6f86 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Sep 30 09:45:56 2020 +0100

    fix read_tag to use word_select correctly
Comment 26 Cole Poirier 2020-09-30 19:49:10 BST
Fantastic! Thanks for fixing it! Do you want me to work on the memory-dictionary test utility that we discussed yesterday?
Comment 27 Luke Kenneth Casson Leighton 2020-09-30 20:16:39 BST
(In reply to Cole Poirier from comment #26)
> Fantastic! Thanks for fixing it!

it was after looking at the gtkwave i noticed that the SRAM module only sent ack every other cycle.

but... the r.store_row was advanced *even when no ack had yet come in*

> Do you want me to work on the
> memory-dictionary test utility that we discussed yesterday?

yes and do the comment corrections that you missed.

a dictionary memory test based on wb_get will allow very large addresses to be sent and also lots of them.

if you do several thousand this will create some cache flush (evictions) and we will find potential errors that might otherwise be missed.
Comment 28 Luke Kenneth Casson Leighton 2020-10-01 22:21:04 BST
ah. right.  not a bug at all.  turns out the SRAM WB module was not properly
WB spec compiliant.  sigh.
Comment 29 Cole Poirier 2020-10-01 23:10:58 BST
(In reply to Luke Kenneth Casson Leighton from comment #28)
> ah. right.  not a bug at all.  turns out the SRAM WB module was not properly
> WB spec compiliant.  sigh.

(In reply to Luke Kenneth Casson Leighton from comment #27)
> (In reply to Cole Poirier from comment #26)
> > Fantastic! Thanks for fixing it!
> 
> it was after looking at the gtkwave i noticed that the SRAM module only sent
> ack every other cycle.
> 
> but... the r.store_row was advanced *even when no ack had yet come in*

In-light of the above comment #28, do I have to do something to the SRAM or have you already done it?

> > Do you want me to work on the
> > memory-dictionary test utility that we discussed yesterday?
> 
> yes and do the comment corrections that you missed.

Sorry I'm not sure what you're referring to here? Is this in reference to my commit message saying 'fixed bugs raised in bugzilla comments' without providing the url to the exact comment numbers?

> a dictionary memory test based on wb_get will allow very large addresses to
> be sent and also lots of them.
> 
> if you do several thousand this will create some cache flush (evictions) and
> we will find potential errors that might otherwise be missed.

Very cool, I'll give it a go today.
Comment 30 Luke Kenneth Casson Leighton 2020-10-01 23:18:26 BST
(In reply to Cole Poirier from comment #29)

> In-light of the above comment #28, do I have to do something to the SRAM

no.

> > > Do you want me to work on the
> > > memory-dictionary test utility that we discussed yesterday?
> > 
> > yes and do the comment corrections that you missed.
> 
> Sorry I'm not sure what you're referring to here? Is this in reference to my
> commit message saying 'fixed bugs raised in bugzilla comments' without
> providing the url to the exact comment numbers?

no, it's "you didn't read all comments therefore didn't implement them".

if you had made individual commits and placed the url including comment numver in each commit you would be able to easily review and find which ones you missed.

now you have much morecwork to do. congratulations! :)
Comment 31 Cole Poirier 2020-10-01 23:42:40 BST
(In reply to Luke Kenneth Casson Leighton from comment #30)

> no, it's "you didn't read all comments therefore didn't implement them".

I've gone over this bug report twice and haven't found a comment that I missed... can you give me a hint? :)
 
> if you had made individual commits and placed the url including comment
> numver in each commit you would be able to easily review and find which ones
> you missed.
> 
> now you have much morecwork to do. congratulations! :)

Heh :) Trying my best, still end up being a bit of a monkey who doesn't remember certain key development rules, and stuffs things up sometimes. I have to stress, I've reviewed each comment on this bug report and haven't been able to find a comment that is not implemented in icache.py... and the current very limited unit test pass so.. I'm stuck and confused.
Comment 32 Luke Kenneth Casson Leighton 2020-10-02 00:03:04 BST
req_hit_way is missing from icache_comb and at line 763 comb assign from req_hit also missing.
Comment 33 Luke Kenneth Casson Leighton 2020-10-02 00:06:27 BST
(In reply to Cole Poirier from comment #31)

> I've gone over this bug report twice and haven't found a comment that I
> missed... can you give me a hint? :)

mmm... just went over them, i maay have done them.  hmm will need to re-read the code.  i know there's something just can't remember what.
Comment 34 Cole Poirier 2020-10-02 00:19:44 BST
(In reply to Luke Kenneth Casson Leighton from comment #32)
> req_hit_way is missing from icache_comb and at line 763 comb assign from
> req_hit also missing.

Pushed with link to this comment in commit message. I'm pretty sure I fixed the first part, unsure about the second part, but I think I fixed it?
Comment 35 Cole Poirier 2020-10-02 00:21:55 BST
(In reply to Luke Kenneth Casson Leighton from comment #33)
> (In reply to Cole Poirier from comment #31)
> 
> > I've gone over this bug report twice and haven't found a comment that I
> > missed... can you give me a hint? :)
> 
> mmm... just went over them, i maay have done them.  hmm will need to re-read
> the code.  i know there's something just can't remember what.

Ok... I'm a bit confused.

I didn't find the missing code in a comment on this bug report but rather took it from icache.vhdl. When you were referring to comments were you referring to the comments in icache.py that I should have left that were the code of icache.vhdl pasted in at the start of the translation process?
Comment 36 Luke Kenneth Casson Leighton 2020-10-02 11:42:54 BST
(In reply to Cole Poirier from comment #34)
> (In reply to Luke Kenneth Casson Leighton from comment #32)
> > req_hit_way is missing from icache_comb and at line 763 comb assign from
> > req_hit also missing.
> 
> Pushed with link to this comment in commit message. I'm pretty sure I fixed
> the first part, unsure about the second part, but I think I fixed it?

did you run it?  i can tell immediately just from looking at the code
that you didn't.  why did you not run the code before committing?

remember: you need to think, logically, like a computer program, keeping
a model of "expected python programs" in your head, and actually run them
to make sure that you are constantly reminded of that.

  File "experiment/icache.py", line 768, in icache_comb
    comb += req_hit_way.eq(hit_way)
NameError: name 'req_hit_way' is not defined
Comment 37 Cole Poirier 2020-10-02 22:13:49 BST
(In reply to Luke Kenneth Casson Leighton from comment #36)
> (In reply to Cole Poirier from comment #34)
> > (In reply to Luke Kenneth Casson Leighton from comment #32)
> > > req_hit_way is missing from icache_comb and at line 763 comb assign from
> > > req_hit also missing.
> > 
> > Pushed with link to this comment in commit message. I'm pretty sure I fixed
> > the first part, unsure about the second part, but I think I fixed it?
> 
> did you run it?  i can tell immediately just from looking at the code
> that you didn't.  why did you not run the code before committing?
> 
> remember: you need to think, logically, like a computer program, keeping
> a model of "expected python programs" in your head, and actually run them
> to make sure that you are constantly reminded of that.
> 
>   File "experiment/icache.py", line 768, in icache_comb
>     comb += req_hit_way.eq(hit_way)
> NameError: name 'req_hit_way' is not defined

No I didn't run it, you're right, I forgot. Sorry I've been really off this week, will try harder :)
Comment 38 Luke Kenneth Casson Leighton 2020-10-02 22:29:18 BST
(In reply to Cole Poirier from comment #37)

> > NameError: name 'req_hit_way' is not defined
> 
> No I didn't run it, you're right, I forgot. 

wark-wark, melted-brain :)  i could have written precise instructions, it would be better however for you to learn and that means adding some "hmm" moments
Comment 39 Cole Poirier 2020-10-03 00:57:17 BST
(In reply to Luke Kenneth Casson Leighton from comment #38)
> wark-wark, melted-brain :)  i could have written precise instructions, it
> would be better however for you to learn and that means adding some "hmm"
> moments

Melted-brain indeed... I don't know why but I've had such bad brain fog this week. Yeah, fixing it was super straight-forward, just had to add req_hit_way as a parameter to the function and the function call.

Brain-fog seems to be clearing up, look for a forthcoming email re OPENTitan-Litex.
Comment 40 Cole Poirier 2020-10-03 01:26:19 BST
To confirm my memory of what my next steps here are:

* I am to extract the entirety of the wb_get() fn from lines 27-76 of experiment/test/test_mmu_dcache.py into a file experiment/test/wb_get.py

* I then will paramaterize the 'mem' dict in wb_get such that I can call wb_get in icache_sim() passing in a dictionary containing 10,000 entries, adding a bias to the random generator function to ensure lots of large 64-bit address are tested

* I call wb_get in a for i in range(10000) loop, comparing the results returned from the icache hdl code with the SourceOfTruth(TM) i.e. the mem dict passed in to the wb_get function?

*** I can see some mistakes in my above statements but I can't figure out the correct version so I'm writing it here for help and correction
Comment 41 Luke Kenneth Casson Leighton 2020-10-03 03:06:59 BST
(In reply to Cole Poirier from comment #40)

> *** I can see some mistakes in my above statements but I can't figure out
> the correct version so I'm writing it here for help and correction

pretty much.

1) start from known-good ie make sure that the dcache test works with
   the creation of the wb_get modification FIRST.

2) test and commit that.

3) do an equivalent test for icache

4) test and commit that

5) create a dictionary with 1000 entries where BOTH key AND data are
   large random values.

6) test and commit that.

note two things:

a) *keep what you do simple in each commit* and ALWAYS test it.  work
   INCREMENTALLY.  can you see the plan that you laid out was a chain
   of ever-complex dependence, without any checking that each stage was
   going to work before moving to the next?

b) you almost had the latter phase right: the for-loop is not to add
   *addresses* from 0 to 10,000, the idea behind the test is to
   have QTY 10,000 *completely* random data AND address requests.

but, start with 100 or 1000 and move upwards.
Comment 42 Cole Poirier 2020-10-03 06:40:53 BST
(In reply to Luke Kenneth Casson Leighton from comment #41)
[snip]
> note two things:
> 
> a) *keep what you do simple in each commit* and ALWAYS test it.  work
>    INCREMENTALLY.  can you see the plan that you laid out was a chain
>    of ever-complex dependence, without any checking that each stage was
>    going to work before moving to the next?

Noted. That’s a much easier, more sensible, and less frantic way of doing things. Thanks for the guidance, it’s much easier to become a better programmer with help :)

> b) you almost had the latter phase right: the for-loop is not to add
>    *addresses* from 0 to 10,000, the idea behind the test is to
>    have QTY 10,000 *completely* random data AND address requests.

Completely random, noted. I guessed wrong based on for some reason thinking this was like the div unit test where we specifically wanted to test the transition points, whereas in this instance we are much more likely to hit the necessary areas of the test space. But then again unit tests can’t cover the whole test space. Will an eventual task be to write the formal proof for icache?

> but, start with 100 or 1000 and move upwards.

Sensible, speeds up the debug cycle. Noted.
Comment 43 Luke Kenneth Casson Leighton 2020-10-03 20:54:41 BST
remember also, move FSM states to separate function. IDLE, CLR/WAIT,
will reduce code-indentation increase code clarity.
Comment 44 Cole Poirier 2020-10-05 00:35:05 BST
(In reply to Luke Kenneth Casson Leighton from comment #43)
> remember also, move FSM states to separate function. IDLE, CLR/WAIT,
> will reduce code-indentation increase code clarity.

Will do.
Comment 45 Cole Poirier 2020-10-05 03:12:24 BST
(In reply to Luke Kenneth Casson Leighton from comment #41)
> 1) start from known-good ie make sure that the dcache test works with
>    the creation of the wb_get modification FIRST.

Starting with extracting wb_get into a separate file now. I'm being thrown off by the 'global stop' that is shared with mmu_sim and mmu_lookup in experiment/test/test_mmu_dcache.py.

Should stop become a parameter of each of the 3 fn's i.e. including wb_get()?
Comment 46 Luke Kenneth Casson Leighton 2020-10-05 03:36:48 BST
(In reply to Cole Poirier from comment #45)

> Starting with extracting wb_get into a separate file now. I'm being thrown
> off by the 'global stop' that is shared with mmu_sim and mmu_lookup in
> experiment/test/test_mmu_dcache.py.

o Rs. you will have to keep everything in that one file for now.

the "solution" here is to use classes.  i was trying to not do that.

shove the icache unit test in the same file for now.

> Should stop become a parameter of each of the 3 fn's i.e. including wb_get()?

that would make it local, wouldn't it? because function parameters are local, aren't they?

how could changing a local boolean variable change a completely separate isolated local boolean in another function?

the purpose of stop being global is so that one function can tell another function to stop, isn't it?
Comment 47 Luke Kenneth Casson Leighton 2020-10-05 11:48:59 BST
(*sotto voice* yeah don't do this, please)

        self.itlb_lookup( 
            m, tlb_req_index, itlb_ptes, itlb_tags, real_addr,
            itlb_valid_bits, ra_valid, eaa_priv, priv_fault, access_ok
        )

use this convention:

        self.itlb_lookup(m, tlb_req_index, itlb_ptes, itlb_tags, real_addr,
                         itlb_valid_bits, ra_valid, eaa_priv, priv_fault,
                         access_ok)
Comment 48 Luke Kenneth Casson Leighton 2020-10-05 11:49:55 BST
just did a couple of whitespace cleanups.
Comment 49 Jacob Lifshay 2020-10-05 12:19:15 BST
https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=6f8178aab36b954b2c7f1c5b93a92cfa476942ce

Oops, Python accidentally gained a new "return if" statement.
Comment 50 Luke Kenneth Casson Leighton 2020-10-05 12:28:54 BST
(In reply to Jacob Lifshay from comment #49)

> Oops, Python accidentally gained a new "return if" statement.

wark-wark.  err yeah let's not encourage them to keep quotes evolving quotes the python syntax :)  list incomprehensions and new assignment operators are bad enough...
Comment 51 Jacob Lifshay 2020-10-05 12:37:35 BST
(In reply to Luke Kenneth Casson Leighton from comment #50)
> (In reply to Jacob Lifshay from comment #49)
> 
> > Oops, Python accidentally gained a new "return if" statement.
> 
> wark-wark.  err yeah let's not encourage them to keep quotes evolving quotes
> the python syntax :)  list incomprehensions and new assignment operators are
> bad enough...

https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=040a0600105dc8d4d4e16af68437d3ed752ef53f

how about using something that's not still a syntax error (note `:`):
return ((n << 32) & ((n - 1) << 32)) == 0
instead of:
return ((n << 32) & ((n-1) << 32)) == 0:
Comment 52 Luke Kenneth Casson Leighton 2020-10-05 12:51:53 BST
(In reply to Jacob Lifshay from comment #51)

> return ((n << 32) & ((n-1) << 32)) == 0:

nggggh you can tell i am not doing any dev work today.
Comment 53 Jacob Lifshay 2020-10-05 13:44:28 BST
The version of ispow2 isn't actually correct, you can pass in 0 and it returns True:
def ispow2(n):
    return ((n << 32) & ((n-1) << 32)) == 0

The correct code is (assuming the input is always a Python int, adjust for Signals):
def ispow2(n):
    return n != 0 and (n & (n - 1)) == 0

see https://stackoverflow.com/a/57025941

AFAIK the original VHDL code is correct though, since to_unsigned causes an error for negative inputs, catching the case where n = 0.

Also note (n << 32) is not how to translate to_unsigned, the translation is complicated, though it can often be just the identity function.

It's also often just Value.cast(n).as_unsigned() or just n.as_unsigned() if n is already a Value/Signal.

For the WIP version of to_unsigned, see:
https://github.com/nmigen/nmigen/issues/464
Comment 54 Cole Poirier 2020-10-05 17:40:34 BST
(In reply to Luke Kenneth Casson Leighton from comment #46)
> (In reply to Cole Poirier from comment #45)

> o Rs. you will have to keep everything in that one file for now.
> 
> the "solution" here is to use classes.  i was trying to not do that.

Aha!
 
> shove the icache unit test in the same file for now.

Will do.

> > Should stop become a parameter of each of the 3 fn's i.e. including wb_get()?
> 
> that would make it local, wouldn't it? because function parameters are
> local, aren't they?
> 
> how could changing a local boolean variable change a completely separate
> isolated local boolean in another function?
> 
> the purpose of stop being global is so that one function can tell another
> function to stop, isn't it?

This is why I asked because my python is mediocre at best. Anything more complicated than little scripts I write in rust so that I have the compiler assisting me ;)
Comment 55 Cole Poirier 2020-10-05 17:41:20 BST
(In reply to Luke Kenneth Casson Leighton from comment #47)
> (*sotto voice* yeah don't do this, please)
> 
>         self.itlb_lookup( 
>             m, tlb_req_index, itlb_ptes, itlb_tags, real_addr,
>             itlb_valid_bits, ra_valid, eaa_priv, priv_fault, access_ok
>         )
> 
> use this convention:
> 
>         self.itlb_lookup(m, tlb_req_index, itlb_ptes, itlb_tags, real_addr,
>                          itlb_valid_bits, ra_valid, eaa_priv, priv_fault,
>                          access_ok)

Will do.
Comment 56 Cole Poirier 2020-10-05 17:43:45 BST
(In reply to Jacob Lifshay from comment #53)
> The version of ispow2 isn't actually correct, you can pass in 0 and it
> returns True:
> def ispow2(n):
>     return ((n << 32) & ((n-1) << 32)) == 0
> 
> The correct code is (assuming the input is always a Python int, adjust for
> Signals):
> def ispow2(n):
>     return n != 0 and (n & (n - 1)) == 0
> 
> see https://stackoverflow.com/a/57025941
> 
> AFAIK the original VHDL code is correct though, since to_unsigned causes an
> error for negative inputs, catching the case where n = 0.
> 
> Also note (n << 32) is not how to translate to_unsigned, the translation is
> complicated, though it can often be just the identity function.
> 
> It's also often just Value.cast(n).as_unsigned() or just n.as_unsigned() if
> n is already a Value/Signal.
> 
> For the WIP version of to_unsigned, see:
> https://github.com/nmigen/nmigen/issues/464

Thanks for your help Jacob. Fixing it now.
Comment 57 Cole Poirier 2020-10-08 03:00:04 BST
(In reply to Luke Kenneth Casson Leighton from comment #41)
> (In reply to Cole Poirier from comment #40)
> 
> > *** I can see some mistakes in my above statements but I can't figure out
> > the correct version so I'm writing it here for help and correction
> 
> pretty much.
> 
> 1) start from known-good ie make sure that the dcache test works with
>    the creation of the wb_get modification FIRST.
> 
> 2) test and commit that.
> 
> 3) do an equivalent test for icache
> 
> 4) test and commit that


Ok, I've tried to do (1,2), I committed a tested working version, then tried the harder (3,4) and committed that but commented out so as to not affect the working test_mmu(), tested and the test_mmu() still works. Currently it goes into an infinite loop because I have done *something* wrong, I don't know what however, so can you take a look and point out whatever mistakes are obvious to you but not to me please :)
Comment 58 Luke Kenneth Casson Leighton 2020-10-08 05:14:46 BST
def wb_get(c, mem=default_mem):

never do this with a dictionary in python.

def wb_get(c, mem):
   ...
   ...


 140     sim.add_sync_process(wrap(mmu_sim(mmu)))
 141     sim.add_sync_process(wrap(wb_get(dcache, default_mem)))
Comment 59 Luke Kenneth Casson Leighton 2020-10-08 05:28:55 BST
 139     sim.add_sync_process(wrap(icache_mmu_sim(mmu)))
 140     sim.add_sync_process(wrap(wb_get(icache, "ICACHE")))

right.

ok.

the default dict here you have left it as mem_default.

that mem_default is set up to answer an example taken from gem5-power which is *specifically* for dcache.

it has absolutely no entries for instruction cache reading whatsoever 

also this is not the test that needs to be written (yet).

the test that needs to be written - *still using wb_get when taking two arguments one of which is a mem dict* - is to test the icache, *not the icache using the mmu*

in other words look at icache.py unit test and instead of the SRAM add the wb_get as the sync process with a nicely pre-prepared mem dict (that contains 100 randomly created keys and values)

later we can and will do "icache-with-mmu" but only after doing "icache" because if we have not tested icache how can we tell what is faulty if the with-mmu fails?

one test at a time.
Comment 60 Cole Poirier 2020-10-08 19:25:25 BST
(In reply to Luke Kenneth Casson Leighton from comment #58)
> def wb_get(c, mem=default_mem):
> 
> never do this with a dictionary in python.
> 
> def wb_get(c, mem):
>    ...
>    ...
> 
> 
>  140     sim.add_sync_process(wrap(mmu_sim(mmu)))
>  141     sim.add_sync_process(wrap(wb_get(dcache, default_mem)))

Got it. Out of curiosity and to be a better python programmer, why is this bad practice?
Comment 61 Luke Kenneth Casson Leighton 2020-10-08 19:33:13 BST
(In reply to Cole Poirier from comment #60)

> Got it. Out of curiosity and to be a better python programmer, why is this
> bad practice?

aside from needing totally different dictionaries, it's called a "singleton".  i.e. that dictionary is *shared between function calls*.

i.e. you call the function once, make some changes, then call it again and despite expecting an entirely new fresh default dict argument, after many days of frustrating debugging go "wtf, this dictionary has been modified, wtf??"

don't do it unless that is EXACTLY what you want and make absolutely sure that you comment the code saying "this is intentionally a singleton default dictionary" because it is a major source of really obscure bugs.
Comment 62 Cole Poirier 2020-10-08 19:35:39 BST
(In reply to Luke Kenneth Casson Leighton from comment #59)
>  139     sim.add_sync_process(wrap(icache_mmu_sim(mmu)))
>  140     sim.add_sync_process(wrap(wb_get(icache, "ICACHE")))
> 
> right.
> 
> ok.
> 
> the default dict here you have left it as mem_default.
> 
> that mem_default is set up to answer an example taken from gem5-power which
> is *specifically* for dcache.
> 
> it has absolutely no entries for instruction cache reading whatsoever 

Yes indeed, because...

> also this is not the test that needs to be written (yet).
> 
> the test that needs to be written - *still using wb_get when taking two
> arguments one of which is a mem dict* - is to test the icache, *not the
> icache using the mmu*

... this. Also I was trying to infer what to do from the existing test in that file, which turns out to have been the wrong approach. To confirm, I can use wb_get without modification... Can I import it into icache.py or should I be writing the test in the same file as wb_get(). How is wb_get connected to ICache? Like this?

```
sim.add_sync_process(wrap(icache_test(icache)))
sim.add_sync_process(wrap(wb_get(icache, icache_mem, "ICACHE")))
```

> in other words look at icache.py unit test and instead of the SRAM add the
> wb_get as the sync process with a nicely pre-prepared mem dict (that
> contains 100 randomly created keys and values)

I don't see how the wb_get fn can be used in icache.py given the global variable... though I think it's likely I'm showing my python ignorance here.
 
> later we can and will do "icache-with-mmu" but only after doing "icache"
> because if we have not tested icache how can we tell what is faulty if the
> with-mmu fails?

Of course, I was just starting from the working dcache_mmu_test because the global varible threw me off, but I was also thrown off by looking at the test_mmu_dcache and thinking I should follow it's design... Just lots of confusion on my end.

> one test at a time.

The only way to do it successfully :)
Comment 63 Cole Poirier 2020-10-08 19:36:18 BST
(In reply to Luke Kenneth Casson Leighton from comment #61)
> (In reply to Cole Poirier from comment #60)
> 
> > Got it. Out of curiosity and to be a better python programmer, why is this
> > bad practice?
> 
> aside from needing totally different dictionaries, it's called a
> "singleton".  i.e. that dictionary is *shared between function calls*.
> 
> i.e. you call the function once, make some changes, then call it again and
> despite expecting an entirely new fresh default dict argument, after many
> days of frustrating debugging go "wtf, this dictionary has been modified,
> wtf??"
> 
> don't do it unless that is EXACTLY what you want and make absolutely sure
> that you comment the code saying "this is intentionally a singleton default
> dictionary" because it is a major source of really obscure bugs.

Aha! That makes a lot of sense, thanks for taking to time to explain it in detail.
Comment 64 Luke Kenneth Casson Leighton 2020-10-08 19:58:08 BST
(In reply to Cole Poirier from comment #62)
> Can I import it into icache.py 

no.  said this twice now.

> or
> should I be writing the test in the same file as wb_get().

yes.  said that twice now, too. 

> How is wb_get
> connected to ICache? Like this?
> sim.add_sync_process(wrap(icache_test(icache)))
> sim.add_sync_process(wrap(wb_get(icache, icache_mem, "ICACHE")))
> ```

yes.

> > in other words look at icache.py unit test and instead of the SRAM add the
> > wb_get as the sync process with a nicely pre-prepared mem dict (that
> > contains 100 randomly created keys and values)
> 
> I don't see how the wb_get fn can be used in icache.py given the global
> variable... though I think it's likely I'm showing my python ignorance here.

that's why i said (third time) put the test in the same file.

> Of course, I was just starting from the working dcache_mmu_test

indeed.

we don't yet know how to use mmu in icache mode, it will need some research, reading some of microwatt tests for example.
Comment 65 Luke Kenneth Casson Leighton 2020-10-08 19:58:18 BST

    
Comment 66 Cole Poirier 2020-10-08 20:12:22 BST
(In reply to Luke Kenneth Casson Leighton from comment #64)
> (In reply to Cole Poirier from comment #62)
> > Can I import it into icache.py 
> 
> no.  said this twice now.

Yes, but I have a thick head :)

> > or
> > should I be writing the test in the same file as wb_get().
> 
> yes.  said that twice now, too. 

I get nervous and panicky with this kind of stuff as you know, I'm continuing to work on that.

> > How is wb_get
> > connected to ICache? Like this?
> > sim.add_sync_process(wrap(icache_test(icache)))
> > sim.add_sync_process(wrap(wb_get(icache, icache_mem, "ICACHE")))
> > ```
> 
> yes.

Great.

> > > in other words look at icache.py unit test and instead of the SRAM add the
> > > wb_get as the sync process with a nicely pre-prepared mem dict (that
> > > contains 100 randomly created keys and values)
> > 
> > I don't see how the wb_get fn can be used in icache.py given the global
> > variable... though I think it's likely I'm showing my python ignorance here.
> 
> that's why i said (third time) put the test in the same file.
> 
> > Of course, I was just starting from the working dcache_mmu_test
> 
> indeed.
> 
> we don't yet know how to use mmu in icache mode, it will need some research,
> reading some of microwatt tests for example.

Ok will do... I'm not sure they have a test for that. I just looked through all the icache* and mmu* files in microwatt and other than the test I've already translated into icache.py from icache_tb.vhdl they are all C, assembly, or binaries, I don't see an HDL test for MMU-ICache.
Comment 67 Cole Poirier 2020-10-08 21:36:48 BST
(In reply to Luke Kenneth Casson Leighton from comment #59)
> the test that needs to be written - *still using wb_get when taking two
> arguments one of which is a mem dict* - is to test the icache, *not the
> icache using the mmu*
> 
> in other words look at icache.py unit test and instead of the SRAM add the
> wb_get as the sync process with a nicely pre-prepared mem dict (that
> contains 100 randomly created keys and values)

Ok I've committed my attempt at this, it fails assert valid because I suspect I haven't connected things quite the right way... but I don't really understand how to connect them the right way. Can you take a look? You'll have to uncomment test_icache() at the bottom of the file, I commented it out so as not to break test_mmu_dcache.py in case tobias is using it.
Comment 68 Luke Kenneth Casson Leighton 2020-10-08 23:17:25 BST
(In reply to Cole Poirier from comment #67)

> Ok I've committed my attempt at this, it fails assert valid because I
> suspect I haven't connected things quite the right way... but I don't really
> understand how to connect them the right way. Can you take a look? You'll
> have to uncomment test_icache() at the bottom of the file, I commented it
> out so as not to break test_mmu_dcache.py in case tobias is using it.

we really should be using python unittest, this "stops" such problems
dead.
Comment 69 Luke Kenneth Casson Leighton 2020-10-08 23:19:28 BST
(In reply to Cole Poirier from comment #66)

> I get nervous and panicky with this kind of stuff as you know, I'm
> continuing to work on that.

does this help?

http://www.userfriendly.org/cartoons/archives/20oct/uf013202.gif

> Ok will do... I'm not sure they have a test for that. I just looked through
> all the icache* and mmu* files in microwatt and other than the test I've
> already translated into icache.py from icache_tb.vhdl they are all C,
> assembly, or binaries, I don't see an HDL test for MMU-ICache.

yyeahh i know.  it's why i went and hunted down the gem5 power RADIX
walkthrough for dcache.
Comment 70 Luke Kenneth Casson Leighton 2020-10-08 23:28:02 BST
(In reply to Cole Poirier from comment #67)
> (In reply to Luke Kenneth Casson Leighton from comment #59)
> > the test that needs to be written - *still using wb_get when taking two
> > arguments one of which is a mem dict* - is to test the icache, *not the
> > icache using the mmu*
> > 
> > in other words look at icache.py unit test and instead of the SRAM add the
> > wb_get as the sync process with a nicely pre-prepared mem dict (that
> > contains 100 randomly created keys and values)
> 
> Ok I've committed my attempt at this, it fails assert valid because I
> suspect I haven't connected things quite the right way... but I don't really
> understand how to connect them the right way. 

neither do i.  that's why the unit test, to use it to "find out".

> Can you take a look?

looks like it's a bug! yay for unit tests!
Comment 71 Cole Poirier 2020-10-09 02:22:31 BST
(In reply to Luke Kenneth Casson Leighton from comment #69)
> (In reply to Cole Poirier from comment #66)
> 
> > I get nervous and panicky with this kind of stuff as you know, I'm
> > continuing to work on that.
> 
> does this help?
> 
> http://www.userfriendly.org/cartoons/archives/20oct/uf013202.gif

Yes, that's hilarious!

> > Ok will do... [snip] I don't see an HDL test for MMU-ICache.
> 
> yyeahh i know.  it's why i went and hunted down the gem5 power RADIX
> walkthrough for dcache.

Oh boy... this will be the first time I'm looking for code outside of microwatt!

(In reply to Luke Kenneth Casson Leighton from comment #68)
> we really should be using python unittest, this "stops" such problems
> dead.

Sounds good, I'll try to figure out how to do that here from other modules where it's used quite extensively. grep is my friend.

(In reply to Luke Kenneth Casson Leighton from comment #70)
> (In reply to Cole Poirier from comment #67)
> > Ok I've committed my attempt at this, it fails assert valid because I
> > suspect I haven't connected things quite the right way... but I don't really
> > understand how to connect them the right way. 
> 
> neither do i.  that's why the unit test, to use it to "find out".

!! :)

> > Can you take a look?
> 
> looks like it's a bug! yay for unit tests!

Perhaps, but my concern is that it's a bug in my test or test setup, so can you check that I haven't made a stupid and obvious mistake?
Comment 72 Luke Kenneth Casson Leighton 2020-10-09 11:40:09 BST
(In reply to Cole Poirier from comment #71)

> > looks like it's a bug! yay for unit tests!
> 
> Perhaps, but my concern is that it's a bug in my test or test setup, so can
> you check that I haven't made a stupid and obvious mistake?

that's what i did.  i'll need 2 days to go over it in-depth.
Comment 73 Cole Poirier 2020-10-09 18:28:14 BST
(In reply to Luke Kenneth Casson Leighton from comment #72)
> (In reply to Cole Poirier from comment #71)
> 
> > > looks like it's a bug! yay for unit tests!
> > 
> > Perhaps, but my concern is that it's a bug in my test or test setup, so can
> > you check that I haven't made a stupid and obvious mistake?
> 
> that's what i did.  i'll need 2 days to go over it in-depth.

Ok perfect, just wanted to make sure because I didn't have any confidence I had implemented the unit test correctly. Are you saying I got it right! Super cool! :)
Comment 74 Cole Poirier 2020-10-13 04:27:56 BST
The unit test fails in different ways depending on the address being requested. Some will fail the assert on the first of the 100 loop iterations/requests, others will fail by hanging.

```fail first assert
    ICACHE get 1f0 data 5d5f2f0e00000000
    ICACHE LOOKUP FAIL 1c0
Traceback (most recent call last):
  File "test_mmu_dcache.py", line 216, in <module>
    test_icache()
[snip]
  File "test_mmu_dcache.py", line 115, in icache_sim
    "insn @%x=%x expected %x" % (nia, insn, v)
AssertionError: insn @1f3=0 expected aebed43600000000
```

```fail by hanging
ICACHE LOOKUP FAIL 190
^CTraceback (most recent call last):
  File "test_mmu_dcache.py", line 216, in <module>
    test_icache()
  File "test_mmu_dcache.py", line 146, in test_icache
    sim.run()
[snip]
KeyboardInterrupt
```

Any idea of what this tells us?
Comment 75 Luke Kenneth Casson Leighton 2020-10-13 05:35:31 BST
(In reply to Cole Poirier from comment #74)
> The unit test fails in different ways depending on the address being
> requested. Some will fail the assert on the first of the 100 loop
> iterations/requests, others will fail by hanging.

can you narrow that down to a bare minimum set?  2 or 3 requests maximum.

random tests find problems, however they are nonreproducible (unless you set random.seed) which you should do.

then create a repro sequence and add a test specifically for that (short) sequence.

> Any idea of what this tells us?

not until i get an entire day looking at it, no.
Comment 76 Cole Poirier 2020-10-18 01:23:12 BST
(In reply to Luke Kenneth Casson Leighton from comment #75)
> (In reply to Cole Poirier from comment #74)
> > The unit test fails in different ways depending on the address being
> > requested. Some will fail the assert on the first of the 100 loop
> > iterations/requests, others will fail by hanging.
> 
> can you narrow that down to a bare minimum set?  2 or 3 requests maximum.
> 
> random tests find problems, however they are nonreproducible (unless you set
> random.seed) which you should do.
> 
> then create a repro sequence and add a test specifically for that (short)
> sequence.
> 
> > Any idea of what this tells us?
> 
> not until i get an entire day looking at it, no.

Apologies, are you waiting for my repro before you spend 'an entire day looking at it'? I'll try to do this now.
Comment 77 Cole Poirier 2020-10-18 01:38:58 BST
Done.

Have the two different failure modes repro'd, obviously need to have two different random seeds, as in one case it will fail on first by failing an assertion, and on the other it will fail on first by looping infinitely.

These are the two failure modes, documented in icache.py. You'll have to uncomment test_icache() at the bottom of the file and comment/uncomment the different seeds depending on which repro case you want.

```
129    # fail 'AssertionError: insn @1d8=0 expected 61928a6100000000'
130    #random.seed(41)
131    # fail infinite loop 'cache read adr: 24 data: 0'
132    random.seed(43)
```
Comment 78 Luke Kenneth Casson Leighton 2020-10-18 16:32:35 BST
(In reply to Cole Poirier from comment #77)
> Done.
> 
> Have the two different failure modes repro'd, obviously need to have two
> different random seeds, as in one case it will fail on first by failing an
> assertion, and on the other it will fail on first by looping infinitely.
> 
> These are the two failure modes, documented in icache.py. You'll have to
> uncomment test_icache() at the bottom of the file and comment/uncomment the
> different seeds depending on which repro case you want.

ok the next thing to do will be:

* to reduce the loop size down to the absolute minimum value that causes "fail"
* to print out those values, and manually copy them into an array (new test,
  new memory dictionary)
* to start cutting out values and repeating the test, *only* leaving in those
  values that continue to cause the failure

basically if there are 100-1000 values that cause "failure", staring at the
gtkwave file will take a huge amount of time (1,000-10,000 cycles).

if however it's 4, 8, 10 or so values, *now* it's only a few cycles in gtkwave and the problem is easier to diagnose.
Comment 79 Cole Poirier 2020-10-20 20:19:13 BST
(In reply to Luke Kenneth Casson Leighton from comment #78)
> (In reply to Cole Poirier from comment #77)
> > Done.
> > 
> > Have the two different failure modes repro'd, obviously need to have two
> > different random seeds, as in one case it will fail on first by failing an
> > assertion, and on the other it will fail on first by looping infinitely.
> > 
> > These are the two failure modes, documented in icache.py. You'll have to
> > uncomment test_icache() at the bottom of the file and comment/uncomment the
> > different seeds depending on which repro case you want.
> 
> ok the next thing to do will be:
> 
> * to reduce the loop size down to the absolute minimum value that causes
> "fail"
> * to print out those values, and manually copy them into an array (new test,
>   new memory dictionary)
> * to start cutting out values and repeating the test, *only* leaving in those
>   values that continue to cause the failure
> 
> basically if there are 100-1000 values that cause "failure", staring at the
> gtkwave file will take a huge amount of time (1,000-10,000 cycles).
> 
> if however it's 4, 8, 10 or so values, *now* it's only a few cycles in
> gtkwave and the problem is easier to diagnose.

I have the loop set to 3, but actually there need not be a loop at all because it will always fail on the first use/request. So, I'm pretty confident I've already done what you're requesting above. Is that correct?
Comment 80 Luke Kenneth Casson Leighton 2020-10-20 21:25:32 BST
(In reply to Cole Poirier from comment #79)

> I have the loop set to 3, but actually there need not be a loop at all
> because it will always fail on the first use/request.

ok then set up just the one item in the mem dict and create just that one
test, as its own unit test.
Comment 81 Cole Poirier 2020-10-20 21:36:55 BST
(In reply to Luke Kenneth Casson Leighton from comment #80)
> (In reply to Cole Poirier from comment #79)
> 
> > I have the loop set to 3, but actually there need not be a loop at all
> > because it will always fail on the first use/request.
> 
> ok then set up just the one item in the mem dict and create just that one
> test, as its own unit test.

Ok. Will do this eventually. Not a priority until JTAG is debugged and ULX3S85F JTAG pins are done.
Comment 82 Cesar Strauss 2020-11-13 11:13:21 GMT
I've run the icache_tb.vhdl testbench in Microwatt, and it fails for me. Maybe the test itself is defective, or has bitrotten.

Maybe one of the problems with our test, is that it is based on this Microwatt test bench, that already fails.
Comment 83 Luke Kenneth Casson Leighton 2020-11-13 14:15:14 GMT
(In reply to Cesar Strauss from comment #82)
> I've run the icache_tb.vhdl testbench in Microwatt, and it fails for me.

indeed. i have no idea why, yet icache itself clearly works.  bit of a nuisance as it is usually the unit tests by which the code can be understood.
Comment 84 Cole Poirier 2020-11-13 19:14:13 GMT
(In reply to Cesar Strauss from comment #82)
> I've run the icache_tb.vhdl testbench in Microwatt, and it fails for me.
> Maybe the test itself is defective, or has bitrotten.
> 
> Maybe one of the problems with our test, is that it is based on this
> Microwatt test bench, that already fails.

Thanks for your help Cesar! Seems so obvious now that you've done it, I should have tried running the microwatt test first.

(In reply to Luke Kenneth Casson Leighton from comment #83)
> (In reply to Cesar Strauss from comment #82)
> > I've run the icache_tb.vhdl testbench in Microwatt, and it fails for me.
> 
> indeed. i have no idea why, yet icache itself clearly works.  bit of a
> nuisance as it is usually the unit tests by which the code can be understood.

I think it works in 'simple' cases, but seems to lose it when it gets a 64 bit address that isn't basically an 8 bit address i.e. 0x0000000000000001. Not sure how to figure out where this is going wrong, would likely need to understand the module to the extent that I could write the formal proof for it, then I'd likely be able to figure out which part is violating the correct behaviour. But that's a big undertaking, for after the 2 dec tapeout deadline.
Comment 85 Luke Kenneth Casson Leighton 2020-11-13 19:42:00 GMT
(In reply to Cole Poirier from comment #84)

> I think it works in 'simple' cases, but seems to lose it when it gets a 64
> bit address that isn't basically an 8 bit address i.e. 0x0000000000000001.

if the memory is not large enough the wishbone address will "wrap" (truncate)
Comment 86 Luke Kenneth Casson Leighton 2021-12-10 00:38:32 GMT
TODO: adapt to Minerva FetchUnitInterface

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/minerva/units/fetch.py;hb=HEAD

  23         # inputs: address to fetch PC, and valid/stall signalling
  24         self.a_pc_i = Signal(self.addr_wid)
  25         self.a_stall_i = Signal()
  26         self.a_i_valid = Signal()
  27         self.f_stall_i = Signal()
  28         self.f_i_valid = Signal()
  29 
  30         # outputs: instruction (or error), and busy indicators
  31         self.a_busy_o = Signal()
  32         self.f_busy_o = Signal()
  33         self.f_instr_o = Signal(self.data_wid)
  34         self.f_fetch_err_o = Signal()
  35         self.f_badaddr_o = Signal(bad_wid)

aldo contains stall in and out

  31 class Fetch1ToICacheType(RecordObject):
  32     def __init__(self, name=None):
  33         super().__init__(name=name)
  34         self.req           = Signal()
  35         self.virt_mode     = Signal()
  36         self.priv_mode     = Signal()
  37         self.stop_mark     = Signal()
  38         self.sequential    = Signal()
  39         self.nia           = Signal(64)
  40 
  41 
  42 class ICacheToDecode1Type(RecordObject):
  43     def __init__(self, name=None):
  44         super().__init__(name=name)
  45         self.valid         = Signal()
  46         self.stop_mark     = Signal()
  47         self.fetch_failed  = Signal()
  48         self.nia           = Signal(64)
  49         self.insn          = Signal(32)