934 – Evaluation of optimized placement for Yosys generated SRAM

Status:	CONFIRMED

Alias:	None

Product:	Libre-SOC's second ASIC
Classification:	Unclassified
Component:	source code
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:
Blocks:	889

Reported:	2022-09-22 09:58 BST by Jean-Paul Chaput
Modified:	2022-10-30 21:22 GMT
CC List:	1 user

See Also:
NLnet milestone:	NGI.POINTER.Gigabit.ASIC
total budget (EUR) for completion of task and all subtasks:	2000
budget (EUR) for this task, excluding subtasks' budget:	2000
parent task for budget allocation:	889
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	jean-paul=2000

Description Jean-Paul Chaput 2022-09-22 09:58:22 BST

An SRAM intrinsically has a matrix-like structure. Letting the regular
placer handle it results in a moderate loss of area but, more importantly,
in much longer wiring. This increase in wiring is likely to make the overall
design much more difficult to route, not to mention the timing issues.

Comment 1 Jean-Paul Chaput 2022-09-22 10:13:41 BST

The evaluation results can be rebuilt with:

* coriolis commit #7d31d6c4
* alliance-check-toolkit commit #d389964d

This is a copy of the results given in the Cumulus plugin sramplacer2.py


Automatic placement of a Yosys generated SRAM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* We were expecting the output decoder to be the same for each bit
  line, allowing us to rebuild a matrix-like placement. This is not so:
  each output mux equation is synthesized differently. Knowing that,
  we created a row-based placement with reordering capabilities,
  so that we can optimize the mux placement.

* Alas, the previous effort was doomed from the start. If you have
  the same multiplexing function for all the bits, the command signals
  from the decoder are the same for every bit. For example, to mux
  256 words, assuming we use only mux2, we need 8 control lines.
  Given that we also have to take into account "ce", "we", "rst"
  and "oe", there are more of them, but not many more. Let's say 20.

    When running placeSRAM and looking at the last level (5) of the
  decoder's DAG, we see that it contains 832 gates, which means as
  many command signals. That is 26 control signals *per* bit.
  This is the direct consequence of *each* multiplexer having its
  own structure. 26 signals take up more than half of the horizontal
  routing capacity of a slice (40). This results in an unroutable
  design, with each bit kept in its own row.
    The figure of 832 gates is for TSMC 180nm; for SkyWater 130nm we
  got 976 gates on the third level.
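
The arithmetic behind the TSMC 180nm figures above can be checked with a
short script. This is only a sketch: the variable and function names are
made up for illustration, and the word width of 32 bits is not stated in
the report but inferred from 832 gates at 26 signals per bit.

```python
def mux2_select_lines(words: int) -> int:
    """Select lines needed by a tree of mux2 cells choosing 1 of `words`."""
    return (words - 1).bit_length()

# Figures quoted in the report (TSMC 180nm run).
WORDS = 256                # words to multiplex
LAST_LEVEL_GATES = 832     # gates on the last level of the decoder's DAG
SIGNALS_PER_BIT = 26       # control signals per bit
SLICE_H_CAPACITY = 40      # horizontal routing capacity of a slice

# With one shared multiplexing function, every bit reuses these lines.
shared_lines = mux2_select_lines(WORDS)

# Word width implied by the two reported numbers (an inference, see above).
implied_bits = LAST_LEVEL_GATES // SIGNALS_PER_BIT

# Fraction of a slice's horizontal capacity taken by one bit's controls.
occupancy = SIGNALS_PER_BIT / SLICE_H_CAPACITY

print(f"shared select lines : {shared_lines}")
print(f"implied word width  : {implied_bits} bits")
print(f"slice occupancy     : {occupancy:.0%}")
```

The point of the comparison: with a shared decoder the whole array would
need on the order of 20 control signals in total, whereas the synthesized
netlist needs 26 for every single bit.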

Conclusions
~~~~~~~~~~~

1. A Yosys-generated SRAM cannot be placed regularly, neither in a
   2-D matrix fashion nor in a simple bit-line organization.

2. Worse, a thorough analysis of the generated netlist shows it is
   highly sub-optimal. Yosys generates *way* too many signals to
   implement it, resulting in a bloated design.

3. Creating a small SRAM generator, even one based on standard cells,
   would be a great improvement over the Yosys-generated one
   (the simpler OpenRAM approach).

In retrospect, as we were using a Yosys-generated SRAM in LibreSOC,
this explains a lot of the observed congestion.

Comment 2 Jean-Paul Chaput 2022-10-14 09:45:56 BST

Posted here by mistake; now filed as a separate item, see bug #951.