71 – option to replace SetAssocCache PLRU with random selection (LFSR)

Bug 71 - option to replace SetAssocCache PLRU with random selection (LFSR)

Summary: option to replace SetAssocCache PLRU with random selection (LFSR)

Status:	PAYMENTPENDING FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:
Blocks:

Reported:	2019-04-22 04:56 BST by Luke Kenneth Casson Leighton
Modified:	2020-12-06 14:03 GMT (History)
CC List:	3 users (show)

See Also:
NLnet milestone:	NLnet.2019.02.012
total budget (EUR) for completion of task and all subtasks:	500
budget (EUR) for this task, excluding subtasks' budget:	500
parent task for budget allocation:	191
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	"lkcl"={amount=250, paid=2019-06-04} "jacob"={amount=250, paid=2019-06-04}

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luke Kenneth Casson Leighton 2019-04-22 04:56:01 BST

http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-April/001168.html

Comment 1 Luke Kenneth Casson Leighton 2019-04-22 04:59:12 BST

jacob can you write (and add) a LFSR module? that would be really handy.

https://git.libre-riscv.org/?p=soc.git;a=tree;f=TLB/src;hb=HEAD

Comment 2 Luke Kenneth Casson Leighton 2019-04-22 06:20:35 BST

it goes something like this:

* cset is the address (passed to N banks of memory, simultaneously)
* cset also goes into the PLRU to generate an arbitrary N (one of the mem banks)
  where there is one PLRU *per cset* (so... lots of them)
* the incoming tag is dropped into the data that's stored in the selected bank
* active bit is set as well (bit 0 of the actual data stored in the Memory)

to read:

* hit *all* N banks with the cset address and the tag
* ask all banks to read and see if the tag-bits match the requested tag
* if so, and the active bit is also set in the bank, return data.

so... what you are saying, jacob, is that a random selection of which bank
to use is fine, no need to check if any of the banks are full or not.

i have seen on google searches that both algorithms are sometimes used.
am curious as to why.

Comment 3 Jacob Lifshay 2019-04-22 06:35:00 BST

(In reply to Luke Kenneth Casson Leighton from comment #2)
> so... what you are saying, jacob, is that a random selection of which bank
> to use is fine, no need to check if any of the banks are full or not.
Yeah, when you get a miss, then picking a bank to replace/fill randomly works fine.
> 
> i have seen on google searches that both algorithms are sometimes used.
> am curious as to why.
Random replacement is used for the simplicity (hence why ARM picked it for some of their processors) and because it's generally good enough.
LRU is used because it is a conceptually simple deterministic algorithm that was historically one of the few algorithms initially proposed. It's also closely related to the optimal algorithm (load what will be used next), in that it's a time-mirrored version.

Comment 4 Jacob Lifshay 2019-04-22 08:50:31 BST

(In reply to Luke Kenneth Casson Leighton from comment #1)
> jacob can you write (and add) a LFSR module? that would be really handy.
> 
> https://git.libre-riscv.org/?p=soc.git;a=tree;f=TLB/src;hb=HEAD

I finished a comprehensive LFSR implementation: https://git.libre-riscv.org/?p=soc.git;a=blob;f=TLB/src/LFSR2.py;h=8cdb2cc560ca9a28e88d7e96e81213da2c44bf3d;hb=HEAD

It appears as though Daniel was a little faster though

Comment 5 Luke Kenneth Casson Leighton 2019-04-22 17:09:36 BST

(In reply to Jacob Lifshay from comment #4)

> I finished a comprehensive LFSR implementation:
> https://git.libre-riscv.org/?p=soc.git;a=blob;f=TLB/src/LFSR2.py;
> h=8cdb2cc560ca9a28e88d7e96e81213da2c44bf3d;hb=HEAD

excellent.  i like the split of the polynomials, and the unit test's great.
 
> It appears as though Daniel was a little faster though

yes :) context:
http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-April/001180.html

ironically it probably only took you what... 45 minutes to write?
i've been doing review/code-morph for about 2.5 hours so far (!)

https://youtu.be/dljWBXSBGYw

even after that i found some more things (which make the code even shorter,
and more "direct use of python features")

http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-April/001181.html

every time i look at it there's another way to reduce the code by another
line or two.  LFSR2.__init__ i managed to remove 2 more lines of code just
now:

    def __init__(self, exponents=()):
        for e in exponents:
            assert isinstance(e, int), TypeError("%s must be an int" % repr(e))
            assert (e >= 0), ValueError("%d must not be negative" % e)
        set.__init__(self, set(exponents).union({0})) # must contain zero

the exponents can be a list, tuple, set, dictionary, generator, iterator
or another LFSRPolynomial (which is derived from a set) and it doesn't
matter, they all get converted to a set, zero is union'd into it, and
because the result of the union is a new set, that new set is returned
to the constructor of set.

Comment 6 Luke Kenneth Casson Leighton 2019-04-22 20:45:13 BST

http://www.ntu.edu.sg/home/smitha/ParaCache/Paracache/sa4.html

found an online simulator of a 4-way set-associative cache,
may prove very useful to understanding.

Comment 7 Daniel Benusovich 2019-04-23 05:52:00 BST

I saw that there is a plru or lfsr option which is kind of neat. Should we remove the plru entirely? The ticket leads me to believe the extermination of the plru is the only way forward haha.

Comment 8 Luke Kenneth Casson Leighton 2019-04-24 19:18:15 BST

(In reply to Daniel Benusovich from comment #7)
> I saw that there is a plru or lfsr option which is kind of neat. Should we
> remove the plru entirely? The ticket leads me to believe the extermination
> of the plru is the only way forward haha.

:) i honestly don't know the answer to that one, the nice thing is, if
it's not activated it won't do any harm or add any gates.

just because i'm curious, i'd actually like to *see* the gates increase
that results, and a unit test to output some hit/miss statistics for both
algorithms.

much of this project is not just about "getting it done", it's about
making the processor a significant long-term educational resource.

we want people to instead of having a tablet or phone or laptop and
feeling totally disempowered about their device, to go, "moo? i'm running
a device where i can actually see how it works at the hardware level??
moo? awesome!"