An SRAM intrinsically has a matrix-like structure. Letting the regular
placer place it results in a moderate loss of area but, more importantly, in much
longer wiring. This increase in wiring is likely to make the overall design
much more difficult to route, not to mention the timing issues.
The evaluation results can be rebuilt with:
* coriolis commit #7d31d6c4
* alliance-check-toolkit commit #d389964d
This is a copy of the results given in the Cumulus plugin sramplacer2.py
Automatic placement of a Yosys generated SRAM
* We were expecting the output decoder to be the same for each bit
line, allowing us to rebuild a matrix-like placement. This is not so:
each output mux equation is synthesized differently. Knowing that,
we created a row-based placement with reordering capabilities,
so that we can optimize the mux placement.
* Alas, the previous effort was doomed from the start. If you have
the same multiplexing function for all the bits, the command signals
from the decoder are the same. For example, to mux 256 words,
assuming we use only mux2, we need 8 bits (control lines).
Given that we also have to take into account "ce", "we", "rst"
and "oe", there are more of them, but not many more. Let's say 20.
When running placeSRAM and looking at the last level (5) of the
DAG's decoder, we see that it contains 832 gates, which means as
many command signals. That is 26 control signals *per* bit.
This is the direct consequence of *each* multiplexer having its
own structure. 26 signals take up more than half the horizontal
routing capacity of a slice (40); this results in an unroutable
design with the bits kept in one row each.
832 gates is for TSMC 180nm; for SkyWater 130nm we got
976 gates on the third level.
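The arithmetic behind these figures can be checked with a short sketch. All numbers are taken from the text above, except the word width, which is not stated and is only inferred here as 832 / 26; the variable names are illustrative and are not part of placeSRAM or sramplacer2.py.

```python
# Figures quoted in the text (TSMC 180nm case).
WORDS = 256              # words to multiplex
LAST_LEVEL_GATES = 832   # gates on the last level of the decoder
SIGNALS_PER_BIT = 26     # control signals per bit
SLICE_H_CAPACITY = 40    # horizontal routing capacity of a slice

# Select lines needed by a mux2-only tree over WORDS inputs: log2(256) = 8.
select_lines = (WORDS - 1).bit_length()

# ASSUMPTION: the word width is not given in the text; it is inferred
# from the two quoted figures (832 gates / 26 signals per bit).
inferred_bits = LAST_LEVEL_GATES // SIGNALS_PER_BIT

# Fraction of a slice's horizontal capacity taken by one bit's control signals.
h_usage = SIGNALS_PER_BIT / SLICE_H_CAPACITY

print(f"select lines for a shared mux2 tree: {select_lines}")
print(f"inferred word width:                 {inferred_bits} bits")
print(f"horizontal capacity used per bit:    {h_usage:.0%}")
```

With a shared decoder the 8 select lines would be common to every bit, whereas here each bit brings its own 26 signals.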
1. A Yosys generated SRAM cannot be regularly placed, neither in a
2-D matrix fashion nor in a simple bit-line organization.
2. Worse, a thorough analysis of the generated netlist shows that it is
highly sub-optimal. Yosys generates *way* too many signals to
achieve it, resulting in a bloated design.
3. Creating a small SRAM generator, even one based on standard cells,
would be a great improvement over the Yosys generated one
(the simpler OpenRAM approach).
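As an illustration of what a regular structure would look like, here is a minimal sketch of a mux2 tree for one output bit in which every level shares a single select signal. It is not the Coriolis/Cumulus API nor an actual SRAM generator; the function `mux2_tree` and the net names (`w*`, `sel*`, `m*_*`) are hypothetical.

```python
def mux2_tree(words):
    """Build a regular mux2 tree selecting one of `words` inputs.

    Returns a list of levels; each level is a list of
    (in0, in1, select, out) tuples. All mux2 of a given level share
    the same select net, so the structure is identical for every bit.
    """
    # Only power-of-two word counts are handled in this sketch.
    assert words > 1 and words & (words - 1) == 0
    nets = [f"w{i}" for i in range(words)]
    levels = []
    while len(nets) > 1:
        sel = f"sel{len(levels)}"      # one select net per level
        stage, next_nets = [], []
        for j in range(0, len(nets), 2):
            out = f"m{len(levels)}_{j // 2}"
            stage.append((nets[j], nets[j + 1], sel, out))
            next_nets.append(out)
        levels.append(stage)
        nets = next_nets
    return levels


tree = mux2_tree(256)
mux_count = sum(len(stage) for stage in tree)
select_nets = {mux[2] for stage in tree for mux in stage}

print(f"levels: {len(tree)}, mux2 cells: {mux_count}, "
      f"select nets: {len(select_nets)}")
```

For 256 words this gives 8 levels and 255 mux2 cells per bit, driven by 8 select nets that can be shared across all bits, which is what makes a matrix-like or bit-line placement possible.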
Looking back, as we were using Yosys generated SRAMs in LibreSOC,
this explains a lot of the observed congestion.
Mistakenly put here; now put in a separate item, see bug #951.