see https://bugs.libre-soc.org/show_bug.cgi?id=138#c12
Staf, we are likely to go with a 1k SRAM block size as the "unit". this is because microwatt is designed around creating multiple independent SRAM blocks. Cache Tag RAM widths are very odd amounts: 192 for D-Cache and 184 for I-Cache. these are better off staying as DFFs.

the arrangement we need is:

* 64 bits (8 byte) data width
* byte-level "select" lines
* 7 bit addressing (128 "rows")
* 1R *or* 1W (not both)
* one clock synchronous latency/delay on reads

i believe this is a "standard" arrangement?
(In reply to Luke Kenneth Casson Leighton from comment #1)
> the arrangement we need is:
>
> * 64 bits (8 byte) data width
> * byte-level "select" lines
> * 7 bit addressing (128 "rows")
> * 1R *or* 1W (not both)
> * one clock synchronous latency/delay on reads
>
> i believe this is a "standard" arrangement?

The SRAM will have 1 port that can be used both for read or write with the following ports:
- a: input of 7 bit
- d: input of 64 bit
- q: output of 64 bit
- we: input of 8 bit
- clk: input of 1 bit

The we vector input will determine for each byte (i.e. 8 bits) if it is written or read. Suppose we do an operation with 0x0000000000000000 stored in an address and with d equal to 0xFFFFFFFFFFFFFFFF and we equal to 0xF0. After the operation the address will contain 0xFFFFFFFF00000000 and the Q output will also be 0xFFFFFFFF00000000.
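The byte-wise write-enable behaviour described above can be sketched in a few lines of Python. This is only an illustration, assuming that bit i of we controls byte i of the 64-bit word (which is what the 0xF0 example implies); the function name byte_masked_write is made up for this sketch and is not part of any tool.

```python
def byte_masked_write(stored: int, d: int, we: int, nbytes: int = 8) -> int:
    """Return the word held at an address after one write operation.

    Assumption: bit i of `we` selects byte i (bits 8*i+7 .. 8*i) of the word.
    Bytes whose `we` bit is 0 keep their stored value.
    """
    result = stored
    for i in range(nbytes):
        if (we >> i) & 1:
            byte_mask = 0xFF << (8 * i)
            # replace only the selected byte with the matching byte of d
            result = (result & ~byte_mask) | (d & byte_mask)
    return result & ((1 << (8 * nbytes)) - 1)


# The example from the comment: all-zero word, d all ones, we = 0xF0
new_word = byte_masked_write(0x0000000000000000, 0xFFFFFFFFFFFFFFFF, 0xF0)
print(hex(new_word))  # 0xffffffff00000000
```

With we = 0xF0 only the upper four bytes are replaced, which gives the 0xFFFFFFFF00000000 quoted in the comment.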
(In reply to Luke Kenneth Casson Leighton from comment #1) > Staf we are likely to go with a 1k SRAM block size as the "unit". > this because microwatt is designed around creating multiple independent > SRAM blocks. I think you should estimate the maximum number of blocks you want to put on the design this way and confirm this then with Jean-Paul for P&R.
(In reply to Staf Verhaegen from comment #2)
> The SRAM will have 1 port that can be used both for read or write with the
> following ports:
> - a: input of 7 bit
> - d: input of 64 bit
> - q: output of 64 bit
> - we: input of 8 bit
> - clk: input of 1 bit
>
> The we vector input will determine for each byte (i.e. 8 bits) if it is
> written or read.

ok that sounds great. it matches with the above, i believe.

> Suppose we do an operation with 0x0000000000000000 stored in an address and
> with d equal to 0xFFFFFFFFFFFFFFFF and we equal to 0xF0. After the operation
> the address will contain 0xFFFFFFFF00000000 and the Q output will also be
> 0xFFFFFFFF00000000.

address will contain 0xFFFFFFFF00000000? did you mean data in... oh, you mean the data *at* the address.

(In reply to Staf Verhaegen from comment #3)
> I think you should estimate the maximum number of blocks you want to put on
> the design this way and confirm this then with Jean-Paul for P&R.

it should only be 9 (or so):

* 1x at address 0x0000_0000 for internal SRAM
* 4x for I-cache (4 "ways")
* 4x for D-cache (4 "ways")

yes only 4k I-cache and 4k D-cache. (if we do need to expand that to 8k i will do 2x 1k SRAMs and route "manually" using bit 8 of the address).
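The block count and cache sizes above follow directly from the 7-bit, 64-bit-wide arrangement. A small Python sketch of the arithmetic (the variable names are only for illustration):

```python
ADDR_BITS = 7    # 7 bit addressing, i.e. 128 "rows"
DATA_BITS = 64   # 64 bits (8 byte) data width

rows = 2 ** ADDR_BITS                 # 128 rows
block_bytes = rows * DATA_BITS // 8   # 128 rows * 8 bytes = 1024 bytes (1k)

# block budget as listed in the comment
blocks = {
    "internal SRAM": 1,
    "I-cache ways": 4,
    "D-cache ways": 4,
}
total_blocks = sum(blocks.values())                  # 9

icache_bytes = blocks["I-cache ways"] * block_bytes  # 4096 (4k)
dcache_bytes = blocks["D-cache ways"] * block_bytes  # 4096 (4k)

print(block_bytes, total_blocks, icache_bytes, dcache_bytes)
```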
Staf, Jean-Paul: i have worked out how in litex to add multiple SRAMs and it is very easy. i realised that to support the PowerISA Interrupt Handlers, SRAM has to be at addresses 0x700, etc. which are barely covered by a single 4096 byte SRAM. Jean-Paul is it ok to add 4x separate 4096 byte SRAMs? i assume you are happy to put them all down the left hand side, starting from the bottom left corner? Staf, what size do the 4096 byte SRAM blocks come out at?
(In reply to Luke Kenneth Casson Leighton from comment #5) > Jean-Paul is it ok to add 4x separate 4096 byte SRAMs? i assume you are > happy to put them all down the left hand side, starting from the bottom > left corner? > > Staf, what size do the 4096 byte SRAM blocks come out at? Design is not finished but current abstract view has dimension of about 0.5mm by 0.7mm. You seem to want to fix the exact location of the IO cells and the macro blocks now. Typically one leaves floorplanning the freedom to move some of these to better places.
(In reply to Staf Verhaegen from comment #6) > > Staf, what size do the 4096 byte SRAM blocks come out at? > > Design is not finished but current abstract view has dimension of about > 0.5mm by 0.7mm. ok good to know. > You seem to want to fix the exact location of the IO cells and the macro > blocks now. nono: not at all. i need to know "acceptable QTY" not "position". > Typically one leaves floorplanning the freedom to move some of > these to better places. yes, agreed. the reason i am asking is so that JP has the information needed. however what i really need to know from JP - before i add them - is: is it ok to add QTY 4of 4k SRAM blocks? i need to know "yes or no" to that question.
As asked in http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-December/001451.html below is a generic SRAM simulation model in VHDL. I had to adapt the model for multiple WE bits so the model is not fully tested. It does analyze with ghdl though.

-- Generic SRAM simulation model
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sram is
  port (
    CLK: in std_logic;
    -- Width of address will determine number of words in the RAM
    A: in std_logic_vector;
    -- D and Q have to have the same width
    D: in std_logic_vector;
    Q: out std_logic_vector;
    -- Width of WE determines the write granularity
    WE: in std_logic_vector
  );
end entity sram;

architecture rtl of sram is
  constant WEWORDBITS: integer := (D'length)/(WE'length);
  type word is array (WE'length - 1 downto 0) of std_logic_vector(WEWORDBITS - 1 downto 0);
  type ram_type is array (0 to (2**A'length) - 1) of word;

  signal RAM: ram_type;
  signal A_hold: std_logic_vector(A'range);
  signal addr: integer;
  signal addr_hold: integer;
begin
  addr <= to_integer(unsigned(A));
  addr_hold <= to_integer(unsigned(A_hold));

  process(CLK) is
  begin
    if (rising_edge(CLK)) then
      A_hold <= A;
      for weword in 0 to WE'length - 1 loop
        if WE(weword) = '1' then
          -- Write cycle
          RAM(addr)(weword) <= D((weword + 1)*WEWORDBITS - 1 downto weword*WEWORDBITS);
        end if;
      end loop;
    end if;
  end process;

  read: for weword in 0 to WE'length - 1 generate
  begin
    Q((weword + 1)*WEWORDBITS - 1 downto weword*WEWORDBITS) <= RAM(addr_hold)(weword);
  end generate;
end architecture rtl;
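For anyone more comfortable with Python than VHDL, the behaviour of the model above can be approximated as follows. This is a rough sketch, not a replacement for the VHDL: the class name SramModel and its clock() method are made up for this example, and uninitialised words read as 0 here rather than as 'U'.

```python
class SramModel:
    """Cycle-based approximation of the generic VHDL SRAM model."""

    def __init__(self, addr_bits: int, data_bits: int, we_bits: int):
        assert data_bits % we_bits == 0
        self.we_bits = we_bits
        self.lane_bits = data_bits // we_bits   # WEWORDBITS in the VHDL
        self.ram = [0] * (2 ** addr_bits)
        self.addr_hold = 0                      # A_hold in the VHDL

    def clock(self, a: int, d: int = 0, we: int = 0) -> None:
        """One rising clock edge: write the enabled lanes, latch the address."""
        lane_mask = (1 << self.lane_bits) - 1
        word = self.ram[a]
        for lane in range(self.we_bits):
            if (we >> lane) & 1:
                mask = lane_mask << (lane * self.lane_bits)
                word = (word & ~mask) | (d & mask)
        self.ram[a] = word
        self.addr_hold = a

    @property
    def q(self) -> int:
        """Q shows the word at the address latched on the last clock edge."""
        return self.ram[self.addr_hold]


# 512 words of 64 bits with 8 write-enable bits
sram = SramModel(addr_bits=9, data_bits=64, we_bits=8)
trace = []
sram.clock(a=0, d=0, we=0xFF)
trace.append(sram.q)                          # written data appears on q
sram.clock(a=5, d=0xFFFFFFFFFFFFFFFF, we=0xFF)
trace.append(sram.q)
sram.clock(a=0)                               # we = 0: read address 0
trace.append(sram.q)
sram.clock(a=5)                               # we = 0: read address 5
trace.append(sram.q)
print([hex(v) for v in trace])
```

Because Q is driven from the latched address, a write cycle shows the freshly written word on Q after the clock edge, and a read returns its data one clock after the address is presented.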
(In reply to Jean-Paul.Chaput from comment #100)
> Hello Luke,
>
> Staf is now in a state where he can provide me with a first
> version of the SRAM block. So would it be possible to include
> instances of those blocks inside the ls180 dry run ?

well.. yes... if i knew how it was done.

i think the most sensible thing to do is: you and Staf create a small example, first. doesn't matter how it's created: verilog, vhdl, nmigen/ilang, doesn't matter. it also doesn't matter where it goes: soclayout/experiments12, or alliance-check-toolkit.

also what i would recommend is to include that 1k DFF SRAM, although i recommend you make it 512 bytes (and i will include two) because yosys-abc goes mental above 512 bytes, kicking in a different "technique" which can take several gigabytes of resident RAM.

with a small example that shows both, we will know if there are any surprises. it might be necessary for example to put the model into its own special Cell Library, given its own "instance name", so that it is separate and distinct from the "standard" Cell Library for memory, which results in the DFF SRAM being substituted. i am not the best person to write such a Cell Library as i've never done one before.

once that's completed i will be able to see how it works, and will be able to do the same thing for ls180. at the moment, i have no idea how to use the model shown in comment #8.

also, we will know if, like last time, there are any surprises as far as NDAs are concerned.
(In reply to Luke Kenneth Casson Leighton from comment #9) > with a small example that shows both, we will know if there are any > surprises. afterthought / clarity : adding both the 512 byte DFF Memory and the 4k SRAM Memory to the same worked example will have the advantage of showing if there are any problems getting yosys to support / understand both. if the standard way of doing Memory in yosys is taken with the DFF Memory, how is the SRAM supposed to fit? if the standard way of doing Memory in yosys is taken with the SRAM Memory, how is the DFF version supposed to fit? i have absolutely no idea how to answer these questions although if nobody else knows either i can help work them out. (and, of course, in a small example, iteration and discovery of those answers will take minutes to compile rather than 90 minutes as it does in ls180) also JP it will be a good place to show how the DFF SRAM manual layout works?
Currently I use SPBlock_512W64B8W as the name of the 4K SRAM block. This should be the nmigen code to include it:

a = Signal(9)
q = Signal(64)
d = Signal(64)
we = Signal(8)
sram = Instance("SPBlock_512W64B8W", i_a=a, o_q=q, i_d=d, i_we=we)
m.submodules += sram

How to do the conversion to litex I don't know. Using this should allow generating a Verilog netlist that instantiates the SRAM blocks.
(In reply to Staf Verhaegen from comment #11)
> Currently I use SPBlock_512W64B8W as the name of the 4K SRAM block.

ah! this was part of the missing information for the puzzle :)

> This should be
> the nmigen code to include it:
>
> a = Signal(9)
> q = Signal(64)
> d = Signal(64)
> we = Signal(8)
> sram = Instance("SPBlock_512W64B8W", i_a=a, o_q=q, i_d=d, i_we=we)
> m.submodules += sram

ahh goood, perfect. so this will not conflict with yosys detection of Memory/arrays at all. excellent. based on this, creating a tiny example for soclayout called experiments12 should be very easy.

> How to do the conversion to litex I don't know.

that's why i suggested doing an extremely simple example (not involving litex at all). i may have to create a special wishbone peripheral for this (mostly cut/paste of the way that litex does SRAM) so as to keep it separate. then, the standard litex "SocCore.add_sram()" function will create Memory (which yosys turns to DFF), and the special peripheral creates the SPBlock_512W64B8W instance.

> Using this should allow generating a Verilog netlist that instantiates the
> SRAM blocks.

fanntastic. to complete a "make lvx", on the tiny example, will a special (new) Cell Library be needed, one that contains one item: SPBlock_512W64B8W? or, is there something else going on?
okaay, here we go. "make lvx" in soclayout experiments12

1. Executing RTLIL frontend.
Input filename: memory.il

2. Executing HIERARCHY pass (managing design hierarchy).

2.1. Analyzing design hierarchy..
Top module: \memory
ERROR: Module `\SPBlock_512W64B8W' referenced in module `\memory' in cell `\U$$0' is not part of the design.
mk/synthesis-yosys.mk:50: recipe for target 'memory.blif' failed
make: *** [memory.blif] Error 1

this is what i was expecting: there is no Cell Library for yosys to "understand" the block named SPBlock_512W64B8W. how is that solved?

commit 4b443ec0a071074334b29f3a972949a889f61cd4
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Tue Dec 22 15:02:32 2020 +0000

    add SPBlock_512W64B8W to memory.py

https://git.libre-soc.org/?p=soclayout.git;a=commitdiff;h=4b443ec0a071074334b29f3a972949a889f61cd4
(In reply to Luke Kenneth Casson Leighton from comment #13)
> okaay, here we go. "make lvx" in soclayout experiments12
>
> 1. Executing RTLIL frontend.
> Input filename: memory.il
>
> 2. Executing HIERARCHY pass (managing design hierarchy).
>
> 2.1. Analyzing design hierarchy..
> Top module: \memory
> ERROR: Module `\SPBlock_512W64B8W' referenced in module `\memory' in cell
> `\U$$0' is not part of the design.
> mk/synthesis-yosys.mk:50: recipe for target 'memory.blif' failed
> make: *** [memory.blif] Error 1
>
> this is what i was expecting: there is no Cell Library for yosys to
> "understand" the block named SPBlock_512W64B8W. how is that solved?

I can think of some solutions:

* Custom yosys code that defines the external cell
* An empty verilog module for the block; yosys scripting then has to mark the block as external so it is not removed
* A liberty file for the SRAM block (like there is for the cells). This would contain the pins but no timing.

I think it is Jean-Paul who has to look at which solution is the best for his Coriolis flow.
(In reply to Staf Verhaegen from comment #14)
> (In reply to Luke Kenneth Casson Leighton from comment #13)
> > okaay, here we go. "make lvx" in soclayout experiments12
> > [snip]
> > this is what i was expecting: there is no Cell Library for yosys to
> > "understand" the block named SPBlock_512W64B8W. how is that solved?
>
> I can think of some solutions:
> [snip]
> I think it is Jean-Paul who has to look at which solution is the best for
> his Coriolis flow.

I think this stackoverflow question and answer may provide the process needed to do this:

https://stackoverflow.com/questions/60143268/how-to-create-a-custom-technology-cell-map-for-yosys

Note that Dave Shah has commented on the answer so it seems like it's the right process.
Staf: we still need that cell library (aka liberty file) with just the one item in it: SPBlock_512W64B8W as none of us in libresoc know how to create liberty files the project is held up until this is available. question: if we were to take the IO pad library, which if i remember correctly you said only has one cell in it, and replace its files with the model in comment #8 would that work? as we do not know how liberty files are made we need a *simple* template to start from and the IO pad library would be as good a starting point as any. if that is the case, is there anything that we need to know to make that cell library containing one item? once we have one example like this it should be possible to create the PLL and the SRAM-DFF version, but until we have the one example everything is held up.
(In reply to Luke Kenneth Casson Leighton from comment #16)
> Staf: we still need that cell library (aka liberty file) with just the one
> item in it: SPBlock_512W64B8W

I propose to use Verilog files to define blackboxes for yosys. This would be as follows for the SRAM block:

(* blackbox = 1 *)
module SPBlock_512W64B8W(input [8:0] a, input [63:0] d, output [63:0] q, input [7:0] we, input clk);
endmodule // SPBlock_512W64B8W

This has been tested to work by Jean-Paul, and support for it has been added to the Coriolis flow.

I also noticed that I did not connect the clk pin to the SRAM block in the previous code. The nmigen code should be:

a = Signal(9)
q = Signal(64)
d = Signal(64)
we = Signal(8)
sram = Instance(
    "SPBlock_512W64B8W",
    i_a=a, o_q=q, i_d=d, i_we=we, i_clk=ClockSignal()
)
m.submodules += sram
Accidentally saved comment, nmigen code:

a = Signal(9)
q = Signal(64)
d = Signal(64)
we = Signal(8)
sram = Instance("SPBlock_512W64B8W",
    i_a=a, o_q=q, i_d=d, i_we=we, i_clk=ClockSignal()
)
m.submodules += sram
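Since the same blackbox approach is expected to be reused for other hard macros, the Verilog stub can be generated from a port list rather than written by hand. The following Python sketch is only an illustration: the helper blackbox_verilog and the (direction, width, name) tuple format are invented here and are not part of yosys, nmigen or Coriolis.

```python
def blackbox_verilog(name, ports):
    """Return a Verilog blackbox stub for a hard macro.

    ports is a list of (direction, width, port_name) tuples, where
    direction is "input" or "output" and width is the number of bits.
    """
    decls = []
    for direction, width, port in ports:
        # single-bit ports get no range, wider ones get [width-1:0]
        rng = "" if width == 1 else "[%d:0] " % (width - 1)
        decls.append("%s %s%s" % (direction, rng, port))
    return "(* blackbox = 1 *)\nmodule %s(%s);\nendmodule // %s\n" % (
        name, ", ".join(decls), name)


# port list of the 4k SRAM block as given in the comments above
SRAM_PORTS = [
    ("input", 9, "a"),
    ("input", 64, "d"),
    ("output", 64, "q"),
    ("input", 8, "we"),
    ("input", 1, "clk"),
]

stub = blackbox_verilog("SPBlock_512W64B8W", SRAM_PORTS)
print(stub)
```

The printed text has the same module header as the hand-written stub, so the two can be compared directly.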
thank you Staf (and Jean-Paul), this is great, it unblocks the 4k SRAM and the PLL can be done the same way. the DFF-SRAM is slightly different but could either be ignored for now or done differently.

unfortunately the critical reliance on NDA'd versions of FlexLib means that i cannot give any kind of confirmation or perform iterative development or debugging:

diff --git a/experiments12/Makefile b/experiments12/Makefile
index acd76db..5be0fc9 100755
--- a/experiments12/Makefile
+++ b/experiments12/Makefile
@@ -2,7 +2,7 @@
 LOGICAL_SYNTHESIS = Yosys
 PHYSICAL_SYNTHESIS = Coriolis
- DESIGN_KIT = sxlib
+ DESIGN_KIT = FlexLib018
 # YOSYS_FLATTEN = Yes
 CHIP = chip

if removing that and reverting to sxlib:

Python stack trace:
#0 in <module>() at /home/lkcl/soclayout/experiments12/coriolis2/settings.py:23
#1 in loadUserSettings() at .../lib/python2.7/dist-packages/crlcore/helpers/__init__.py:441
#2 in <module>() at /home/lkcl/alliance-check-toolkit/bin/doChip.py:15

Error was:
No module named NDA.node180.tsmc_c018

settings.py contains this:

+from NDA.node180.tsmc_c018 import techno, FlexLib, LibreSOCIO, LibreSOCMem
Hello,

I've committed d35e748 which provides correct block netlist integration.

To integrate a block, aside from the layout, you have to provide:

* A Verilog blackbox netlist ("machin.v") for Yosys.
* A VHDL hollow netlist ("machin.vbe") for blif2vst and Coriolis at large.

Concerning the use in symbolic mode, we would need a symbolic abstract view of the SRAM block. This is not very complicated, but still needs a modicum of time. And as it has a somewhat more complex interface than the I/O pads, I leave it to the initiative of Staf.

And to use Coriolis in full compliance we should also add a diode (dio_x0) to the symbolic library nsxlib.

The layout integration is not completed yet, but is well on its way.

Best,
(In reply to Jean-Paul.Chaput from comment #20)
> Hello,
>
> I've committed d35e748 which provides correct block netlist integration.

star.

> To integrate a block, aside from the layout, you have to provide:
>
> * A Verilog blackbox netlist ("machin.v") for Yosys.

i am investigating if there is an easy way for nmigen to apply user attributes to modules. this would do the same job.

> * A VHDL hollow netlist ("machin.vbe") for blif2vst and Coriolis
> at large.
>
> Concerning the use in symbolic mode, we would need a symbolic abstract
> view of the SRAM block.

otherwise the burden of even basic syntax checking falls entirely to you and Staf.

> This is not very complicated, but still needs
> a modicum of time. And as it has a somewhat more complex interface than the
> I/O pads, I leave it to the initiative of Staf.

Staf i think i will assign some budget to this task, to help with that.

> And to use Coriolis in full compliance we should also add a diode
> (dio_x0) to the symbolic library nsxlib.
>
> The layout integration is not completed yet, but is well on its way.

super.
apparently this will do the trick: m.submodules.a.attrs["test"] = "value"
commit 800e4d580b833f1307bf447987a1bc3acf2515a4 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Sat Feb 20 14:30:07 2021 +0000

    add Wishbone-wrapped SPBlock_512W64B8W

now this needs adding to ls180. once added i cannot simulate it (because it is an Instance), and i cannot P&R it because there is no Symbolic representation. great care has to be taken, therefore.
commit 362d5638d3c51a76bf42f140ab781af0ce58328b (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Sat Feb 20 14:58:58 2021 +0000 add QTY 4of 4k SRAMs SPBlock512W64B8W to TestIssuer if enabled
commit 0cd474099a8106c81178c6ac1cd507737068d24d (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Sat Feb 20 15:22:18 2021 +0000

    add litex wishbone interconnect to 4x 4k SRAMs

also had to add one more of the massive DFF 512 byte SRAMs in order to cover all the exception areas (0x900) without going into the 4k SRAM area, which litex demands to be on an aligned boundary
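The alignment constraint mentioned above can be illustrated with a little arithmetic. This is only a sketch under assumptions that are not stated in this thread: that the 512 byte DFF SRAMs are laid out contiguously from address 0, and that the four 4k SRAMs follow at the next 4k-aligned addresses. It is not the actual ls180 memory map, and the helper names are made up.

```python
DFF_BLOCK = 0x200     # 512 byte DFF SRAM
SRAM_BLOCK = 0x1000   # 4096 byte SRAM block


def dff_blocks_to_cover(last_addr, block=DFF_BLOCK):
    """Number of contiguous blocks, starting at 0, needed to reach last_addr."""
    return last_addr // block + 1


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment."""
    return (addr + alignment - 1) // alignment * alignment


# hypothetical layout: cover the exception area at 0x900 with DFF SRAMs
n_dff = dff_blocks_to_cover(0x900)       # 5 blocks, covering 0x000-0x9ff
dff_end = n_dff * DFF_BLOCK              # 0xa00

# the 4k SRAMs must start on a 4k-aligned boundary
first_sram = align_up(dff_end, SRAM_BLOCK)                   # 0x1000
sram_bases = [first_sram + i * SRAM_BLOCK for i in range(4)]

print(n_dff, hex(dff_end), [hex(b) for b in sram_bases])
```

Under these assumptions the DFF region ends below 0x1000, so the first 4k SRAM can sit on an aligned boundary without overlapping the exception area.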
https://git.libre-soc.org/?p=soclayout.git;a=commitdiff;h=342a89ebd25fa4c988826d01e1db0ff3d24387a0

commit 342a89ebd25fa4c988826d01e1db0ff3d24387a0
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Sat Feb 20 15:25:29 2021 +0000

    add 4k sram build

ok that's in.

* the QTY 4of SPBlock_512W64B8W instances are actually created in nmigen using Instance(), exposed via QTY 4of Wishbone Buses
* QTY 4of Wishbone Buses are created by TestIssuer Verilog (make ls180_verilog)
* litex libresoc/core.py "picks up" those QTY 4of Wishbone Buses
* litex ls180soc.py actually connects those up onto the main litex interconnect bus.
* make ls180 in soc/litex/florent/Makefile constructs the ilang file

it's done this way because there's not a cat in hell's chance i'm going to modify or add to litex. i'm sure it's possible: it's just so devoid of debug-messages and error-catching that it's not worth the risk.
(In reply to Luke Kenneth Casson Leighton from comment #23) > commit 800e4d580b833f1307bf447987a1bc3acf2515a4 (HEAD -> master) > Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> > Date: Sat Feb 20 14:30:07 2021 +0000 > > add Wishbone-wrapped SPBlock_512W64B8W > > now this needs adding to ls180. once added i cannot simulate it (because it > is > an Instance), Maybe you can make the block with an option simulation=(False|True) so you can have a Wishbone wrapped Memory block during simulation ? > and i cannot P&R it because there is no Symbolic representation. As intermediary step, you should be able to do synthesis using for example nsxlib and then simulate the design post-synthesis using a VHDL or verilog model for the SRAM block.
(In reply to Staf Verhaegen from comment #27) > Maybe you can make the block with an option simulation=(False|True) so you > can have a Wishbone wrapped Memory block during simulation ? ah! i think there might be a way to detect "platform=" when running simulations. > > and i cannot P&R it because there is no Symbolic representation. > > As intermediary step, you should be able to do synthesis using for example > nsxlib and then simulate the design post-synthesis using a VHDL or verilog > model for the SRAM block. good point.
this was done last week, successfully simulated as well.
summary work: * cole - research into techniques for blackbox cells in yosys * staf - extra verification work on the selected SRAM size block * lkcl - integration as a blackbox into nmigen HDL
(In reply to Luke Kenneth Casson Leighton from comment #30) > summary work: > * cole - research into techniques for blackbox cells in yosys > * staf - extra verification work on the selected SRAM size block > * lkcl - integration as a blackbox into nmigen HDL I would say I did both the integration of the SRAM in Libre-SOC and the extra verification.
Created attachment 131
SRAM block spice simulation

Here are the results of the verification I did on the SRAM block. I simulated four clock cycles:
- cycle 1: Write 0 to address 0
- cycle 2: Write $FFFFFFFFFFFFFFFF to address 5
- cycle 3: Read address 0
- cycle 4: Read address 5

In the picture you can see the clk, d (=data_in) and we (=write-enable) signals on the top graph and clk and q (=data_out) on the bottom one. You can see the write-through of written data in the first 2 cycles and the correct values read in the next two cycles. The clk->q delay is also shown in the graph for the typical corner; a value of almost 2ns is seen. This is without parasitics, so with those included I think the clk->q delay will be more like 3-4ns, meaning 200MHz is, I think, not a problem. It all depends of course on how much logic comes after the SRAM.

Unfortunately I had to use the proprietary Eldo spice simulator as ngspice did not find a DC solution after two days of simulation. Xyce could not read the TSMC SPICE models although that should be possible with some helper tools. The Eldo simulation finished in about 15 minutes.
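To put the clk->q estimate into perspective, here is a small Python sketch of the timing budget at 200MHz. It deliberately ignores flip-flop setup time, clock skew and routing delay, so the numbers are an upper bound on what is left for logic after the SRAM; the function names are only for illustration.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(freq_mhz, clk_to_q_ns):
    """Time left in one clock period after the SRAM clk->q delay."""
    return period_ns(freq_mhz) - clk_to_q_ns


period = period_ns(200)          # 5.0 ns at 200MHz
best_case = slack_ns(200, 3.0)   # 2.0 ns left with a 3ns clk->q delay
worst_case = slack_ns(200, 4.0)  # 1.0 ns left with a 4ns clk->q delay

print(period, best_case, worst_case)
```

So with the estimated 3-4ns clk->q delay, roughly 1-2ns of the 5ns period remains for whatever logic follows the SRAM output.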