Bug 502 - determine SRAM block size and implement it
Summary: determine SRAM block size and implement it
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Hardware Layout
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Staf Verhaegen
URL:
Depends on: 591
Blocks: 199 383 469 485
Reported: 2020-09-24 16:20 BST by Luke Kenneth Casson Leighton
Modified: 2021-05-05 21:44 BST

See Also:
NLnet milestone: NLNet.2019.Coriolis2
total budget (EUR) for completion of task and all subtasks: 1250
budget (EUR) for this task, excluding subtasks' budget: 1250
parent task for budget allocation: 199
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
# TBD, placeholder
"staf"={amount=900, submitted=2021-04-21}
"lkcl"={amount=300, submitted=2021-04-24}
"cole"={amount=50, paid=2021-05-05}


Attachments
SRAM block spice simulation (17.37 KB, image/png)
2021-04-20 13:20 BST, Staf Verhaegen
Description Luke Kenneth Casson Leighton 2020-09-24 16:20:42 BST
see https://bugs.libre-soc.org/show_bug.cgi?id=138#c12
Comment 1 Luke Kenneth Casson Leighton 2020-10-05 18:06:57 BST
Staf, we are likely to go with a 1k SRAM block size as the "unit".
this is because microwatt is designed around creating multiple independent
SRAM blocks.

Cache Tag RAM widths are very odd amounts: 192 for D-Cache and 184 for I-Cache.
these are better off staying as DFFs.

the arrangement we need is:

* 64 bits (8 byte) data width
* byte-level "select" lines
* 7 bit addressing (128 "rows")
* 1R *or* 1W (not both)
* one clock synchronous latency/delay on reads

i believe this is a "standard" arrangement?
Comment 2 Staf Verhaegen 2020-10-06 15:09:26 BST
(In reply to Luke Kenneth Casson Leighton from comment #1)

> the arrangement we need is:
> 
> * 64 bits (8 byte) data width
> * byte-level "select" lines
> * 7 bit addressing (128 "rows")
> * 1R *or* 1W (not both)
> * one clock synchronous latency/delay on reads
> 
> i believe this is a "standard" arrangement?

The SRAM will have a single access port, usable for either read or write, with the following signals:
- a: input of 7 bit
- d: input of 64 bit
- q: output of 64 bit
- we: input of 8 bit
- clk: input of 1 bit

The we vector input determines, for each byte (i.e. 8 bits), whether it is written or read.

Suppose we do an operation with 0x0000000000000000 stored at an address, with d equal to 0xFFFFFFFFFFFFFFFF and we equal to 0xF0. After the operation the address will contain 0xFFFFFFFF00000000 and the q output will also be 0xFFFFFFFF00000000.
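
The byte-wise write-enable behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not the SRAM implementation: the function name apply_byte_we is hypothetical, and it assumes that bit i of we controls bits 8*i+7 down to 8*i of the word, which is consistent with the 0xF0 example.

```python
def apply_byte_we(stored, d, we, nbytes=8):
    """Return the word held at an address after a byte-masked write.

    Assumption (not confirmed by the thread): bit i of `we` selects
    byte i of the word, i.e. bits 8*i+7 down to 8*i.
    """
    result = stored
    for i in range(nbytes):
        if (we >> i) & 1:
            byte_mask = 0xFF << (8 * i)
            # Replace byte i of the stored word with byte i of d.
            result = (result & ~byte_mask) | (d & byte_mask)
    # Keep the result within the word width.
    return result & ((1 << (8 * nbytes)) - 1)


# The example from the comment: we = 0xF0 writes only the upper four bytes.
new_word = apply_byte_we(0x0000000000000000, 0xFFFFFFFFFFFFFFFF, 0xF0)
print(hex(new_word))
```

With we equal to 0x00 the stored word is returned unchanged, which corresponds to a pure read.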
Comment 3 Staf Verhaegen 2020-10-06 15:11:05 BST
(In reply to Luke Kenneth Casson Leighton from comment #1)
> Staf we are likely to go with a 1k SRAM block size as the "unit".
> this because microwatt is designed around creating multiple independent
> SRAM blocks.

I think you should estimate the maximum number of blocks you want to put on the design this way and then confirm this with Jean-Paul for P&R.
Comment 4 Luke Kenneth Casson Leighton 2020-10-06 15:32:35 BST
(In reply to Staf Verhaegen from comment #2)

> The SRAM will have 1 port that can be used both for read or write with the
> following ports:
> - a: input of 7 bit
> - d: input of 64 bit
> - q: output of 64 bit
> - we: input of 8 bit
> - clk: input of 1 bit
> 
> The we vector input will determine for each byte (e.g. 8bits) if it is
> written or read.

ok that sounds great.  it matches with the above, i believe.

> Suppose we do an operation with 0x000000000000000 stored in an address and
> with d equal to 0xFFFFFFFFFFFFFFFF and we equal to 0xF0. After the operation
> the address will contain 0xFFFFFFFF00000000 and the Q output will also be
> 0xFFFFFFFF00000000.

address will contain 0xFFFFFFFF00000000? did you mean data in... oh, you
mean the data *at* the address.


(In reply to Staf Verhaegen from comment #3)

> I think you should estimate the maximum number of blocks you want to put on
> the design this way and confirm this then with Jean-Paul for P&R.

it should only be 9 (or so)

* 1x at address 0x0000_0000 for internal SRAM
* 4x for I-cache (4 "ways")
* 4x for D-cache (4 "ways")

yes only 4k I-cache and 4k D-cache.  (if we do need to expand that to 8k i will do 2x 1k SRAMs and route "manually" using bit 8 of the address).
Comment 5 Luke Kenneth Casson Leighton 2020-11-14 16:38:58 GMT
Staf, Jean-Paul: i have worked out how in litex to add multiple SRAMs
and it is very easy.

i realised that to support the PowerISA Interrupt Handlers, SRAM has to
be at addresses 0x700, etc. which are barely covered by a single 4096 byte
SRAM.

Jean-Paul is it ok to add 4x separate 4096 byte SRAMs?  i assume you are
happy to put them all down the left hand side, starting from the bottom
left corner?

Staf, what size do the 4096 byte SRAM blocks come out at?
Comment 6 Staf Verhaegen 2020-11-16 14:52:11 GMT
(In reply to Luke Kenneth Casson Leighton from comment #5)

> Jean-Paul is it ok to add 4x separate 4096 byte SRAMs?  i assume you are
> happy to put them all down the left hand side, starting from the bottom
> left corner?
> 
> Staf, what size do the 4096 byte SRAM blocks come out at?

The design is not finished, but the current abstract view has dimensions of about 0.5mm by 0.7mm.
You seem to want to fix the exact location of the IO cells and the macro blocks now. Typically one leaves floorplanning the freedom to move some of these to better places.
Comment 7 Luke Kenneth Casson Leighton 2020-11-16 15:03:38 GMT
(In reply to Staf Verhaegen from comment #6)

> > Staf, what size do the 4096 byte SRAM blocks come out at?
> 
> Design is not finished but current abstract view has dimension of about
> 0.5mm by 0.7mm.

ok good to know.

> You seem to want to fix the exact location of the IO cells and the macro
> blocks now.

nono: not at all.  i need to know "acceptable QTY" not "position".

> Typically one let freedom to floorplanning to put move some of
> these to better places.

yes, agreed.  the reason i am asking is so that JP has the information
needed.  however what i really need to know from JP - before i add them -
is:

    is it ok to add QTY 4of 4k SRAM blocks?

i need to know "yes or no" to that question.
Comment 8 Staf Verhaegen 2020-12-06 12:35:53 GMT
As asked in http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-December/001451.html below is a generic SRAM simulation model in VHDL. I had to adapt the model for multiple WE bits so the model is not fully tested. It does analyze with ghdl though.

    -- Generic SRAM simulation model
    
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;
    
    entity sram is
      port (
        CLK:    in std_logic;
        -- Width of address will determine number of words in the RAM
        A:      in std_logic_vector;
        -- D and Q have to have the same width
        D:      in std_logic_vector;
        Q:      out std_logic_vector;
        -- Width of WE determines the write granularity
        WE:     in std_logic_vector
      );
    end entity sram;
    
    architecture rtl of sram is
      constant WEWORDBITS: integer := (D'length)/(WE'length);
      type word is array (WE'length - 1 downto 0) of std_logic_vector(WEWORDBITS - 1 downto 0);
      type ram_type is array (0 to (2**A'length) - 1) of word;
    
      signal RAM:       ram_type;
      signal A_hold:    std_logic_vector(A'range);
    
      signal addr:      integer;
      signal addr_hold: integer;
    begin
      addr <= to_integer(unsigned(A));
      addr_hold <= to_integer(unsigned(A_hold));
    
      process(CLK) is
      begin
        if (rising_edge(CLK)) then
          A_hold <= A;
          for weword in 0 to WE'length - 1 loop
              if WE(weword) = '1' then
                -- Write cycle
                RAM(addr)(weword) <= D((weword + 1)*WEWORDBITS - 1 downto weword*WEWORDBITS);
              end if;
          end loop;
        end if;
      end process;
    
      read: for weword in 0 to WE'length - 1 generate
      begin
        Q((weword + 1)*WEWORDBITS - 1 downto weword*WEWORDBITS) <= RAM(addr_hold)(weword);
      end generate;
    end architecture rtl;
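
For quick experiments outside a VHDL simulator, the same behaviour can be mirrored in Python. The following is a minimal sketch under stated assumptions, not a verified equivalent of the model above: the class name SramModel and its default widths are hypothetical, and the memory is initialised to zero here whereas the VHDL signal starts out uninitialised. It follows the VHDL structure: writes happen per WE lane on the clock edge, the address is registered, and Q shows the contents at the registered address.

```python
class SramModel:
    """Cycle-level sketch: synchronous write, registered read address."""

    def __init__(self, addr_bits=9, data_bits=64, we_bits=8):
        self.depth = 1 << addr_bits
        self.we_bits = we_bits
        self.lane_bits = data_bits // we_bits  # WEWORDBITS in the VHDL
        self.ram = [0] * self.depth            # assumption: zero-initialised
        self.a_hold = 0                        # A_hold in the VHDL

    def clock(self, a, d, we):
        """Apply one rising clock edge and return the resulting Q."""
        lane_mask = (1 << self.lane_bits) - 1
        word = self.ram[a]
        for lane in range(self.we_bits):
            if (we >> lane) & 1:
                shifted = lane_mask << (lane * self.lane_bits)
                # Overwrite only the lanes whose WE bit is set.
                word = (word & ~shifted) | (d & shifted)
        self.ram[a] = word
        self.a_hold = a
        # Q follows the registered address, so a write is visible on Q
        # in the same cycle (write-through).
        return self.ram[self.a_hold]


sram = SramModel()
print(hex(sram.clock(a=5, d=0xFFFFFFFFFFFFFFFF, we=0xFF)))  # write cycle
print(hex(sram.clock(a=5, d=0, we=0x00)))                   # read cycle
```

Both calls return the written word: the first because of the write-through on Q, the second because nothing is overwritten when we is zero.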
Comment 9 Luke Kenneth Casson Leighton 2020-12-22 12:41:03 GMT
(In reply to Jean-Paul.Chaput from comment #100)
> Hello Luke,
> 
> Staf is now in a state where he can provide me with a first
> version of the SRAM block. So would it be possible to include
> instances of those blocks inside the ls180 dry run ?

well.. yes... if i knew how it was done.  i think the most sensible
thing to do is: you and Staf create a small example, first.  doesn't
matter how it's created: verilog, vhdl, nmigen/ilang, doesn't matter.
it also doesn't matter where it goes: soclayout/experiments12, or
alliance-check-toolkit.

also what i would recommend is to include that 1k DFF SRAM, although
i recommend you make it 512 bytes (and i will include two) because
yosys-abc goes mental above 512 bytes, kicking in a different "technique"
which can take several gigabytes of resident RAM.

with a small example that shows both, we will know if there are any surprises.

it might be necessary for example to put the model into its own special
Cell Library, given its own "instance name", so that it is separate and
distinct from the "standard" Cell Library for memory, which results in the
DFF SRAM being substituted.  i am not the best person to write such a
Cell Library as i've never done one before.

once that's completed i will be able to see how it works, and will be
able to do the same thing for ls180.

at the moment, i have no idea how to use the model shown in comment #8

also, we will know if, like last time, there are any surprises as far
as NDAs are concerned.
Comment 10 Luke Kenneth Casson Leighton 2020-12-22 13:06:54 GMT
(In reply to Luke Kenneth Casson Leighton from comment #9)

> with a small example that shows both, we will know if there are any
> surprises.

afterthought / clarity : adding both the 512 byte DFF Memory and the 4k
SRAM Memory to the same worked example will have the advantage of showing
if there are any problems getting yosys to support / understand both.

if the standard way of doing Memory in yosys is taken with the DFF Memory,
how is the SRAM supposed to fit?

if the standard way of doing Memory in yosys is taken with the SRAM Memory,
how is the DFF version supposed to fit?

i have absolutely no idea how to answer these questions although if nobody
else knows either i can help work them out.

(and, of course, in a small example, iteration and discovery of those answers
will take minutes to compile rather than 90 minutes as it does in ls180)

also JP it will be a good place to show how the DFF SRAM manual layout works?
Comment 11 Staf Verhaegen 2020-12-22 13:27:38 GMT
Currently I use SPBlock_512W64B8W as the name of the 4K SRAM block. This should be the nmigen code to include it:

    a = Signal(9)
    q = Signal(64)
    d = Signal(64)
    we = Signal(8)
    sram = Instance("SPBlock_512W64B8W", i_a=a, o_q=q, i_d=d, i_we=we)
    m.submodules += sram

I don't know how to do the conversion to litex.

Using this should make it possible to generate a Verilog netlist that instantiates the SRAM blocks.
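
As a sanity check on the signal widths used in this thread, the block geometry can be worked out with a little arithmetic. This is a sketch only; the helper name sram_geometry is hypothetical. It takes the address, data and write-enable widths and returns the number of words, the capacity in bytes and the number of data bits controlled by each write-enable bit.

```python
def sram_geometry(addr_bits, data_bits, we_bits):
    """Return (words, total_bytes, bits_per_we_lane) for a single-port SRAM."""
    words = 2 ** addr_bits
    total_bytes = words * data_bits // 8
    lane_bits = data_bits // we_bits
    return words, total_bytes, lane_bits


# 9-bit address, 64-bit data, 8-bit we: the block instantiated above.
print(sram_geometry(9, 64, 8))
# 7-bit address, 64-bit data, 8-bit we: the 1k "unit" from comment #1.
print(sram_geometry(7, 64, 8))
```

A 9-bit address gives 512 words of 8 bytes, i.e. 4096 bytes, while the 7-bit arrangement from comment #1 gives 128 rows and 1024 bytes; both have byte-level write granularity.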
Comment 12 Luke Kenneth Casson Leighton 2020-12-22 14:31:33 GMT
(In reply to Staf Verhaegen from comment #11)
> Currently I SPBlock_512W64B8W as name of the 4K SRAM block.

ah! this was part of the missing information for the puzzle :)

> This should be
> the nmigen code to include it:
> 
>     a = Signal(9)
>     q = Signal(64)
>     d = Signal(64)
>     we = Signal(8)
>     sram = Instance("SPBlock_512W64B8W", i_a=a, o_q=q, i_d=d, i_we=we)
>     m.submodules += sram

ahh goood, perfect.  so this will not conflict with yosys detection of Memory/arrays at all. excellent.

based on this, creating a tiny example for soclayout called experiments12 should be very easy.

> How to do the conversion to litex I don't know.

that's why i suggested doing an extremely simple example (not involving litex at all).

i may have to create a special wishbone peripheral for this (mostly cut/paste of the way that litex does SRAM) so as to keep it separate.

then, the standard litex "SocCore.add_sram()" function will create Memory (which yosys turns to DFF), while the special peripheral creates the SPBlock_512W64B8W instance.


> Using this should allow to generate Verilog netlist that instantiates the
> SRAM blocks.

fanntastic.

to complete a "make lvx", on the tiny example, will a special (new) Cell Library be needed, one that contains one item: SPBlock_512W64B8W?

or, is there something else going on?
Comment 13 Luke Kenneth Casson Leighton 2020-12-22 15:04:31 GMT
okaay, here we go.  "make lvx" in soclayout experiments12


1. Executing RTLIL frontend.
Input filename: memory.il

2. Executing HIERARCHY pass (managing design hierarchy).

2.1. Analyzing design hierarchy..
Top module:  \memory
ERROR: Module `\SPBlock_512W64B8W' referenced in module `\memory' in cell `\U$$0' is not part of the design.
mk/synthesis-yosys.mk:50: recipe for target 'memory.blif' failed
make: *** [memory.blif] Error 1


this is what i was expecting: there is no Cell Library for yosys to "understand" the block named SPBlock_512W64B8W.  how is that solved?


commit 4b443ec0a071074334b29f3a972949a889f61cd4
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Dec 22 15:02:32 2020 +0000

    add SPBlock_512W64B8W to memory.py

https://git.libre-soc.org/?p=soclayout.git;a=commitdiff;h=4b443ec0a071074334b29f3a972949a889f61cd4
Comment 14 Staf Verhaegen 2020-12-22 16:13:05 GMT
(In reply to Luke Kenneth Casson Leighton from comment #13)
> okaay, here we go.  "make lvx" in soclayout experiments12
> 
> 
> 1. Executing RTLIL frontend.
> Input filename: memory.il
> 
> 2. Executing HIERARCHY pass (managing design hierarchy).
> 
> 2.1. Analyzing design hierarchy..
> Top module:  \memory
> ERROR: Module `\SPBlock_512W64B8W' referenced in module `\memory' in cell
> `\U$$0' is not part of the design.
> mk/synthesis-yosys.mk:50: recipe for target 'memory.blif' failed
> make: *** [memory.blif] Error 1
> 
> 
> this is what i was expecting: there is no Cell Library for yosys to
> "understand" the block named SPBlock_512W64B8W.  how is that solved?

I can think of some solutions:
* Custom yosys code that defines the external cell
* An empty verilog module for the block, yosys scripting then has to mark the block as external so it is not removed
* A liberty file for the SRAM block (like there is for the cells). This would contain the pins but no timing.

I think it is Jean-Paul who has to look at which solution is the best for his Coriolis flow.
Comment 15 Cole Poirier 2020-12-22 18:50:11 GMT
(In reply to Staf Verhaegen from comment #14)
> (In reply to Luke Kenneth Casson Leighton from comment #13)
> > okaay, here we go.  "make lvx" in soclayout experiments12
> > [snip]
> > this is what i was expecting: there is no Cell Library for yosys to
> > "understand" the block named SPBlock_512W64B8W.  how is that solved?
> 
> I can think of some solutions:
> [snip]
> I think it is Jean-Paul who has to look at which solution is the best for
> his Coriolis flow.

I think this stackoverflow question and answer may provide the process needed to do this: https://stackoverflow.com/questions/60143268/how-to-create-a-custom-technology-cell-map-for-yosys

Note that Dave Shah has commented on the answer so it seems like it's the right process.
Comment 16 Luke Kenneth Casson Leighton 2021-01-25 14:28:37 GMT
Staf: we still need that cell library (aka liberty file) with just the one
item in it: SPBlock_512W64B8W

as none of us in libresoc know how to create liberty files the project is held
up until this is available.

question: if we were to take the IO pad library, which if i remember correctly you said only has one cell in it, and replace its files with the model in comment #8 would that work?

as we do not know how liberty files are made we need a *simple* template to start from and the IO pad library would be as good a starting point as any.

if that is the case, is there anything that we need to know to make that cell library containing one item?

once we have one example like this it should be possible to create the PLL and the SRAM-DFF version, but until we have the one example everything is held up.
Comment 17 Staf Verhaegen 2021-01-30 12:11:14 GMT
(In reply to Luke Kenneth Casson Leighton from comment #16)
> Staf: we still need that cell library (aka liberty file) with just the one
> item in it: SPBlock_512W64B8W
> 

I propose to use Verilog files to define blackboxes for yosys. This would be as follows for the SRAM block:

    (* blackbox = 1 *)
    module SPBlock_512W64B8W(input [8:0] a, input [63:0] d, output [63:0] q, input [7:0] we, input clk);
    endmodule // SPBlock_512W64B8W

This has been tested to work by Jean-Paul, and support for it has been added to the Coriolis flow.
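
Since the same approach is expected to be reused for other macro blocks, such a stub could also be generated from a port list rather than written by hand. The following is a hedged sketch: the helper verilog_blackbox and the SPBLOCK_PORTS table are hypothetical and not part of the Coriolis flow or of any existing tool.

```python
def verilog_blackbox(name, ports):
    """Build a yosys blackbox stub.

    `ports` is a list of (direction, width, port_name) tuples,
    where direction is "input" or "output".
    """
    decls = []
    for direction, width, port in ports:
        # Single-bit ports get no range, wider ports get [width-1:0].
        rng = "" if width == 1 else "[%d:0] " % (width - 1)
        decls.append("%s %s%s" % (direction, rng, port))
    return "(* blackbox = 1 *)\nmodule %s(%s);\nendmodule // %s\n" % (
        name, ", ".join(decls), name)


# Port list taken from the blackbox module shown above.
SPBLOCK_PORTS = [
    ("input", 9, "a"),
    ("input", 64, "d"),
    ("output", 64, "q"),
    ("input", 8, "we"),
    ("input", 1, "clk"),
]

stub = verilog_blackbox("SPBlock_512W64B8W", SPBLOCK_PORTS)
print(stub)
```

The printed text reproduces the three-line module above, so the port list stays in one place if a second stub with a different geometry is needed.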

I also noticed that I did not connect the clk pin to the SRAM block in the previous code. The nmigen code should be:

    a = Signal(9)
    q = Signal(64)
    d = Signal(64)
    we = Signal(8)
    sram = Instance(
        "SPBlock_512W64B8W", i_a=a, o_q=q, i_d=d, i_we=we, i_clk=ClockSignal())
    m.submodules += sram
Comment 18 Staf Verhaegen 2021-01-30 12:13:30 GMT
Accidentally saved comment, nmigen code:

    a = Signal(9)
    q = Signal(64)
    d = Signal(64)
    we = Signal(8)
    sram = Instance("SPBlock_512W64B8W",
        i_a=a, o_q=q, i_d=d, i_we=we, i_clk=ClockSignal()
    )
    m.submodules += sram
Comment 19 Luke Kenneth Casson Leighton 2021-01-30 12:34:03 GMT
thank you Staf (and Jean-Paul), this is great, it unblocks the 4k SRAM
and the PLL can be done the same way.  the DFF-SRAM is slightly
different but could either be ignored for now or done differently.

unfortunately the critical reliance on NDA'd versions of FlexLib means
that i cannot give any kind of confirmation or perform iterative
development or debugging:

diff --git a/experiments12/Makefile b/experiments12/Makefile
index acd76db..5be0fc9 100755
--- a/experiments12/Makefile
+++ b/experiments12/Makefile
@@ -2,7 +2,7 @@
 
         LOGICAL_SYNTHESIS = Yosys
        PHYSICAL_SYNTHESIS = Coriolis
-               DESIGN_KIT = sxlib
+               DESIGN_KIT = FlexLib018
 
 #           YOSYS_FLATTEN = Yes
                      CHIP = chip

if removing that and reverting to sxlib:

        Python stack trace:
        #0 in                  <module>() at /home/lkcl/soclayout/experiments12/coriolis2/settings.py:23
        #1 in          loadUserSettings() at .../lib/python2.7/dist-packages/crlcore/helpers/__init__.py:441
        #2 in                  <module>() at /home/lkcl/alliance-check-toolkit/bin/doChip.py:15
        Error was:
          No module named NDA.node180.tsmc_c018

settings.py contains this:


+from   NDA.node180.tsmc_c018 import techno, FlexLib, LibreSOCIO, LibreSOCMem
Comment 20 Jean-Paul.Chaput 2021-02-01 17:45:21 GMT
Hello,

I've committed d35e748 which provides correct block netlist integration.

To integrate a block, aside from the layout, you have to provide:

* A Verilog blackbox netlist ("machin.v") for Yosys.
* A VHDL hollow netlist ("machin.vbe") for blif2vst and Coriolis
  at large.

Concerning the use in symbolic mode, we would need a symbolic abstract
view of the SRAM block. This is not very complicated, but still needs
a modicum of time. And as it has a somewhat more complex interface than
the I/O pads, I leave it to the initiative of Staf.

And to use Coriolis in full compliance we should also add a diode
(dio_x0) to the symbolic library nsxlib.

The layout integration is not completed yet, but is well on its way.

Best,
Comment 21 Luke Kenneth Casson Leighton 2021-02-02 17:56:50 GMT
(In reply to Jean-Paul.Chaput from comment #20)
> Hello,
> 
> I've commited d35e748 which provides correct block netlist integration.

star.

> 
> To integrate a block, asides from the layout you have to provide :
> 
> * A Verilog blackbox netlist ("machin.v") for Yosys.

i am investigating if there is an easy way for nmigen to apply
user attributes to modules.  this would do the same job.


> * A VHDL hollow netlist ("machin.vbe") for blif2vst and Coriolis
>   at large.
> 
> Concerning the use in symbolic mode, we would need a symbolic abstract
> view of the SRAM block.

otherwise the burden of even basic syntax checking falls entirely to
you and Staf.

> This is not very complicated, but still needs
> a modicum of time. And as it has a bit complex interface than the
> I/O pads, I leave it to the initiative of Staf.

Staf i think i will assign some budget to this task, to help with that.
 
> And to use the Coriolis in full compliance we should also add a diode
> (dio_x0) to the symbolic library nsxlib.
> 
> The layout integration is not completed yet, but in good way.

super.
Comment 22 Luke Kenneth Casson Leighton 2021-02-02 19:35:08 GMT
apparently this will do the trick:

     m.submodules.a.attrs["test"] = "value"
Comment 23 Luke Kenneth Casson Leighton 2021-02-20 14:32:43 GMT
commit 800e4d580b833f1307bf447987a1bc3acf2515a4 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Feb 20 14:30:07 2021 +0000

    add Wishbone-wrapped SPBlock_512W64B8W

now this needs adding to ls180.  once added i cannot simulate it (because it is
an Instance), and i cannot P&R it because there is no Symbolic representation.
great care has to be taken, therefore.
Comment 24 Luke Kenneth Casson Leighton 2021-02-20 14:59:14 GMT
commit 362d5638d3c51a76bf42f140ab781af0ce58328b (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Feb 20 14:58:58 2021 +0000

    add QTY 4of 4k SRAMs SPBlock512W64B8W to TestIssuer if enabled
Comment 25 Luke Kenneth Casson Leighton 2021-02-20 15:22:31 GMT
commit 0cd474099a8106c81178c6ac1cd507737068d24d (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Feb 20 15:22:18 2021 +0000

    add litex wishbone interconnect to 4x 4k SRAMs
    also had to add one more of the massive DFF 512 byte SRAMs in order to cover
    all the exception areas (0x900) without going into 4k SRAM area,
    which litex demands to be on an aligned boundary
Comment 26 Luke Kenneth Casson Leighton 2021-02-20 15:32:06 GMT
https://git.libre-soc.org/?p=soclayout.git;a=commitdiff;h=342a89ebd25fa4c988826d01e1db0ff3d24387a0

commit 342a89ebd25fa4c988826d01e1db0ff3d24387a0
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Feb 20 15:25:29 2021 +0000

    add 4k sram build

ok that's in.  

* the QTY 4of SPBlock_512W64B8W instances are actually created
  in nmigen using Instance(), exposed via QTY 4of Wishbone Buses

* QTY 4of Wishbone Buses are created by TestIssuer Verilog
  (make ls180_verilog)

* litex libresoc/core.py "picks up" those QTY 4 Wishbone Buses

* litex ls180soc.py actually connects those up onto the main litex
  interconnect bus.

* make ls180 in soc/litex/florent/Makefile constructs the ilang file


it's done this way because there's not a cat in hell's chance i'm going
to modify or add to litex.  i'm sure it's possible: it's just so devoid of
debug-messages and error-catching that it's not worth the risk.
Comment 27 Staf Verhaegen 2021-02-21 10:59:08 GMT
(In reply to Luke Kenneth Casson Leighton from comment #23)
> commit 800e4d580b833f1307bf447987a1bc3acf2515a4 (HEAD -> master)
> Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
> Date:   Sat Feb 20 14:30:07 2021 +0000
> 
>     add Wishbone-wrapped SPBlock_512W64B8W
> 
> now this needs adding to ls180.  once added i cannot simulate it (because it
> is
> an Instance),

Maybe you can give the block an option simulation=(False|True) so that you have a Wishbone-wrapped Memory block during simulation?

> and i cannot P&R it because there is no Symbolic representation.

As an intermediate step, you should be able to do synthesis using, for example, nsxlib and then simulate the design post-synthesis using a VHDL or Verilog model for the SRAM block.
Comment 28 Luke Kenneth Casson Leighton 2021-02-21 12:54:19 GMT
(In reply to Staf Verhaegen from comment #27)

> Maybe you can make the block with an option simulation=(False|True) so you
> can have a Wishbone wrapped Memory block during simulation ?

ah! i think there might be a way to detect "platform=" when running simulations.

> > and i cannot P&R it because there is no Symbolic representation.
> 
> As intermediary step, you should be able to do synthesis using for example
> nsxlib and then simulate the design post-synthesis using a VHDL or verilog
> model for the SRAM block.

good point.
Comment 29 Luke Kenneth Casson Leighton 2021-04-19 20:46:31 BST
this was done last week, successfully simulated as well.
Comment 30 Luke Kenneth Casson Leighton 2021-04-20 09:57:10 BST
summary work:
* cole - research into techniques for blackbox cells in yosys
* staf - extra verification work on the selected SRAM size block
* lkcl - integration as a blackbox into nmigen HDL
Comment 31 Staf Verhaegen 2021-04-20 13:06:38 BST
(In reply to Luke Kenneth Casson Leighton from comment #30)
> summary work:
> * cole - research into techniques for blackbox cells in yosys
> * staf - extra verification work on the selected SRAM size block
> * lkcl - integration as a blackbox into nmigen HDL

I would say I did both the integration of the SRAM in Libre-SOC and the extra verification.
Comment 32 Staf Verhaegen 2021-04-20 13:20:07 BST
Created attachment 131
SRAM block spice simulation

Here are the results of the verification I did on the SRAM block.  I simulated four clock cycles:
- cycle 1: Write 0 to address 0
- cycle 2: Write $FFFFFFFFFFFFFFFF to address 5
- cycle 3: Read address 0
- cycle 4: Read address 5

In the picture you can see the clk, d (=data_in) and we (=write-enable) signals on the top graph and clk and q (=data_out) on the bottom one.
You can see the write-through of the written data in the first two cycles and the correct values being read in the next two cycles.

The graph also shows the clk->q delay for the typical corner: a value of almost 2ns. This is without parasitics; with those included I think clk->q will be more like 3-4ns, which still fits within the 5ns period of a 200MHz clock, so I think 200MHz is not a problem. It all depends, of course, on how much logic follows the SRAM.
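
The timing argument can be made concrete with a small calculation. This is a rough sketch only: the helper names are hypothetical, and the budget deliberately ignores setup time, clock skew and routing delay, none of which are quantified in this thread.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def logic_budget_ns(freq_mhz, clk_to_q_ns):
    """Time left in one cycle for logic after the SRAM output.

    Simplification: setup time, skew and wire delay are not subtracted.
    """
    return clock_period_ns(freq_mhz) - clk_to_q_ns


# Simulated value (about 2ns) and the estimated range with parasitics (3-4ns).
for clk_to_q in (2.0, 3.0, 4.0):
    print(clk_to_q, logic_budget_ns(200.0, clk_to_q))
```

At 200MHz the period is 5ns, so a 3-4ns clk->q leaves roughly 1-2ns for whatever logic sits between the q output and the next register.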

Unfortunately I had to use the proprietary Eldo spice simulator as ngspice did not find a DC solution after two days of simulation. Xyce could not read the TSMC SPICE models although that should be possible with some helper tools. The Eldo simulation finished in about 15 minutes.