Bug 951 - Standard Cell based SRAM for ethmac
Summary: Standard Cell based SRAM for ethmac
Alias: None
Product: Libre-SOC's second ASIC
Classification: Unclassified
Component: source code
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Jean-Paul Chaput
Depends on:
Reported: 2022-10-14 09:54 BST by Jean-Paul Chaput
Modified: 2022-10-30 21:23 GMT

See Also:
NLnet milestone: NGI.POINTER.Gigabit.ASIC
total budget (EUR) for completion of task and all subtasks: 5000
budget (EUR) for this task, excluding subtasks' budget: 5000
parent task for budget allocation: 889
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:

Plot of the SRAM 256x32, folded once (151.82 KB, application/pdf)
2022-10-14 10:03 BST, Jean-Paul Chaput
Plot of the SRAM 256x32, folded twice (502.18 KB, application/pdf)
2022-10-14 10:04 BST, Jean-Paul Chaput
Plot of the SRAM 256x32, four fold. (159.46 KB, image/png)
2022-10-17 16:32 BST, Jean-Paul Chaput

Description Jean-Paul Chaput 2022-10-14 09:54:04 BST
Provide an SRAM with an optimized, regular (matrix-like) placement instead of the Yosys-generated one (placed by the general-purpose placer Etesian).
Comment 1 Jean-Paul Chaput 2022-10-14 10:00:09 BST
Evaluation results can be rebuilt with:

* coriolis commit #d294a770
* alliance-check-toolkit commit #1049f10

Provisional results

.. note:: All lengths are in micrometres.

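+--------------+-----------------------------+-----------------------------+
| Kind         | Generator                   | Yosys                       |
+==============+=============================+=============================+
| # Gates      | 23209      (-25.4%)         | 32121                       |
+--------------+-----------------------------+-----------------------------+
|                                  1 Fold                                  |
+--------------+-----------------------------+-----------------------------+
| Area         | 7182 x 330  (-5.5%)         | 7380 x 340                  |
+--------------+-----------------------------+-----------------------------+
| Wirelength   | 1841036     (-4.3%)         | 1924153                     |
+--------------+-----------------------------+-----------------------------+
|                                  2 Fold                                  |
+--------------+-----------------------------+-----------------------------+
| Area         | 3599 x 660  (-5.3%)         | 3690 x 680                  |
+--------------+-----------------------------+-----------------------------+
| Wirelength   | 1670455     (-6.3%)         | 1782558                     |
+--------------+-----------------------------+-----------------------------+
|                                  4 Fold                                  |
+--------------+-----------------------------+-----------------------------+
| Area         | 1812 x 1320 (-4.6%)         | 1900 x 1320                 |
+--------------+-----------------------------+-----------------------------+
| Wirelength   | 1699810     (-1.5%)         | 1726436                     |
+--------------+-----------------------------+-----------------------------+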
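
The area and wirelength percentages can be rechecked from the raw figures. The sketch below is only an illustration: it assumes the percentage is the signed change of the generator figure relative to the Yosys one, and that "area" is width times height. The names `RESULTS`, `relative_change` and `compare` are hypothetical, not part of Coriolis or alliance-check-toolkit.

```python
# Figures copied from the table above (lengths in micrometres).
RESULTS = {
    "1 fold": {"generator": {"area": (7182, 330), "wirelength": 1841036},
               "yosys": {"area": (7380, 340), "wirelength": 1924153}},
    "2 fold": {"generator": {"area": (3599, 660), "wirelength": 1670455},
               "yosys": {"area": (3690, 680), "wirelength": 1782558}},
    "4 fold": {"generator": {"area": (1812, 1320), "wirelength": 1699810},
               "yosys": {"area": (1900, 1320), "wirelength": 1726436}},
}


def relative_change(new, ref):
    """Signed percentage change of `new` with respect to `ref`."""
    return 100.0 * (new - ref) / ref


def compare(fold):
    """Return (area %, wirelength %) of the generator relative to Yosys."""
    gen, ref = RESULTS[fold]["generator"], RESULTS[fold]["yosys"]
    gen_area = gen["area"][0] * gen["area"][1]  # assumed: width x height
    ref_area = ref["area"][0] * ref["area"][1]
    return (round(relative_change(gen_area, ref_area), 1),
            round(relative_change(gen["wirelength"], ref["wirelength"]), 1))


for fold in RESULTS:
    area_pct, wire_pct = compare(fold)
    print(f"{fold}: area {area_pct:+.1f}%, wirelength {wire_pct:+.1f}%")
```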

The conclusions we can draw from these results are:

1. The generator version uses substantially fewer gates than the Yosys one.
   As both SRAMs use exactly the same number of SFFs, the difference is
   due only to the decoder that controls the input and output muxes.

2. Despite having fewer gates, the generator version occupies a similar area,
   which means that it uses fewer but significantly *bigger* cells.

3. The FlexLib library supplied for SkyWater 130nm does not contain all the
   SxLib cells, which restricts our choices.

   In particular, to build the output multiplexer we only have the mx2 and
   mx3 cells, which are large. The density of the SRAM could be much
   higher if we had nmx2 and nmx3. We could also try to synthesise
   the tree using nandX and norX, but we are short of time.

   Furthermore, as the output multiplexers are a controlled case,
   we could also use three-state driver cells (which have not been
   ported either).

.. note:: Cell width in the SkyWater 130 port of FlexLib:

          ==============  =====
          Cell            Width
          ==============  =====
          mx2_x2          7
          mx3_x2          11
          nand2_x0        2
          nand3_x0        3
          nand4_x0        4
          nor2_x0         2
          ==============  =====

          1. mx2_x2 + mx3_x2         = 18
          2. 9 * nand2_x0            = 18
          3. 4 * nand3_x0 + nand4_x0 = 16
          4. 6 * nand2_x0 + nor2_x0  = 14
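
The four totals in the note are simple weighted sums of the cell widths. A minimal sketch to reproduce them follows. The widths come from the note and are in whatever unit the note uses. The option labels and the names `WIDTH`, `OPTIONS` and `total_width` are made up for this illustration.

```python
# Cell widths copied from the note above.
WIDTH = {
    "mx2_x2": 7,
    "mx3_x2": 11,
    "nand2_x0": 2,
    "nand3_x0": 3,
    "nand4_x0": 4,
    "nor2_x0": 2,
}

# Cell counts for the four options listed in the note (labels are hypothetical).
OPTIONS = {
    "mx2 + mx3": {"mx2_x2": 1, "mx3_x2": 1},
    "nand2 only": {"nand2_x0": 9},
    "nand3 + nand4": {"nand3_x0": 4, "nand4_x0": 1},
    "nand2 + nor2": {"nand2_x0": 6, "nor2_x0": 1},
}


def total_width(cells):
    """Sum of cell widths, each weighted by the number of instances."""
    return sum(WIDTH[name] * count for name, count in cells.items())


for label, cells in OPTIONS.items():
    print(f"{label:15s} {total_width(cells)}")

narrowest = min(OPTIONS, key=lambda label: total_width(OPTIONS[label]))
print("narrowest option:", narrowest)
```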
Comment 2 Jean-Paul Chaput 2022-10-14 10:03:23 BST
Created attachment 171
Plot of the SRAM 256x32, folded once
Comment 3 Jean-Paul Chaput 2022-10-14 10:04:06 BST
Created attachment 172
Plot of the SRAM 256x32, folded twice
Comment 4 Jean-Paul Chaput 2022-10-17 16:29:05 BST
Evaluation results can be rebuilt with:

* coriolis commit #9594476a
* alliance-check-toolkit commit #9eec8a0

Updated SRAM results

Added results for the NAND2/NOR2 output multiplexer version.

All the benchmarks were run using the Google/SkyWater 130nm DK, with a port
of Chips4Makers/FlexLib. A version using TSMC_C180 has also been done, but
running it outside Sorbonne Université/LIP6 requires access to the NDA.

.. note:: All lengths are in micrometres.

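+--------+--------------+-----------------------------+---------------------------+
| Arch   | Kind         | Generator                   | Yosys                     |
+========+==============+=============================+===========================+
|  Mux   | # Gates      | 23209      (-25.4%)         | 32121                     |
+--------+--------------+-----------------------------+                           |
|  Nao   | # Gates      | 34637      (+7.8%)          |                           |
+--------+--------------+-----------------------------+---------------------------+
|                                       1 Fold                                    |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 7182 x 330  (-5.5%)         | 7380 x 340                |
|  Mux   +--------------+-----------------------------+---------------------------+
|        | Wirelength   | 1841036     (-4.3%)         | 1924153                   |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 6680 x 340  (-9.5%)         |                           |
|  Nao   +--------------+-----------------------------+                           |
|        | Wirelength   | 1637781     (-14.9%)        |                           |
+--------+--------------+-----------------------------+---------------------------+
|                                       2 Fold                                    |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 3599 x 660  (-5.3%)         | 3690 x 680                |
|  Mux   +--------------+-----------------------------+---------------------------+
|        | Wirelength   | 1670455     (-6.3%)         | 1782558                   |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 3350 x 680  (-9.2%)         |                           |
|  Nao   +--------------+-----------------------------+                           |
|        | Wirelength   | 1548358     (-13.1%)        |                           |
+--------+--------------+-----------------------------+---------------------------+
|                                       4 Fold                                    |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 1812 x 1320 (-4.6%)         | 1900 x 1320               |
|  Mux   +--------------+-----------------------------+---------------------------+
|        | Wirelength   | 1699810     (-1.5%)         | 1726436                   |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 1692 x 1360 (-8.2%)         |                           |
|  Nao   +--------------+-----------------------------+                           |
|        | Wirelength   | 1512107     (-12.4%)        |                           |
+--------+--------------+-----------------------------+---------------------------+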
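
The table compares each generator variant against Yosys. The two generator variants can also be compared directly with each other. The sketch below does this for the gate count and the wirelength. It is a derived illustration, not an additional measurement, and the names `GATES`, `WIRELENGTH` and `nao_vs_mux` are hypothetical.

```python
# Generator figures copied from the table above (wirelength in micrometres).
GATES = {"mux": 23209, "nao": 34637}
WIRELENGTH = {
    "1 fold": {"mux": 1841036, "nao": 1637781},
    "2 fold": {"mux": 1670455, "nao": 1548358},
    "4 fold": {"mux": 1699810, "nao": 1512107},
}


def nao_vs_mux(mux_value, nao_value):
    """Signed percentage change of the Nao figure relative to the Mux one."""
    return round(100.0 * (nao_value - mux_value) / mux_value, 1)


print(f"gates: {nao_vs_mux(GATES['mux'], GATES['nao']):+.1f}%")
for fold, row in WIRELENGTH.items():
    print(f"{fold} wirelength: {nao_vs_mux(row['mux'], row['nao']):+.1f}%")
```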

The two implementations differ only in the *output* multiplexer: it is
either a 4-input mux made of mux2+mux3, or a 2-input multiplexer made of
alternating layers of nand2 and nor2.
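
To illustrate why alternating nand2 and nor2 layers can act as a multiplexer, here is a small behavioural sketch of a 4-input slice. It is not the generator's actual netlist. It assumes one-hot select lines coming from the decoder and an output left inverted at this stage, and the function names are made up. The gate count (six nand2 plus one nor2) matches the narrowest option in the cell width note of comment 1, but that correspondence is only our reading.

```python
from itertools import product


def nand2(a, b):
    """2-input NAND on 0/1 values."""
    return 1 - (a & b)


def nor2(a, b):
    """2-input NOR on 0/1 values."""
    return 1 - (a | b)


def mux4_nand_nor(data, onehot):
    """4-input selector; returns the *inverted* selected bit.

    Assumption: `onehot` has exactly one line set to 1.
    """
    # Layer 0: gate each data bit with its select line (active-low result).
    gated = [nand2(d, s) for d, s in zip(data, onehot)]
    # Layer 1: NAND2 of two active-low signals is the OR of the active-high ones.
    pair0 = nand2(gated[0], gated[1])
    pair1 = nand2(gated[2], gated[3])
    # Layer 2: NOR2 merges the two pairs and inverts the polarity again.
    return nor2(pair0, pair1)


# Exhaustive check against the expected behaviour (inverted selected input).
for index in range(4):
    onehot = [int(i == index) for i in range(4)]
    for data in product((0, 1), repeat=4):
        assert mux4_nand_nor(data, onehot) == 1 - data[index]
print("all 64 cases match")
```

In a deeper tree the same De Morgan argument repeats: each nand2 layer merges active-low signals and each nor2 layer merges active-high ones, so the polarity flips at every level.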

Conclusions for the mux2+mux3 implementation:

1. The generator version uses substantially fewer gates than the Yosys one.
   As both SRAMs use exactly the same number of SFFs, the difference is
   due only to the decoder that controls the input and output muxes.

2. Despite having fewer gates, the generator version occupies a similar area,
   which means that it uses fewer but significantly *bigger* cells.

3. The FlexLib library supplied for SkyWater 130nm does not contain all the
   SxLib cells, which restricts our choices.

   In particular, to build the output multiplexer we only have the mx2 and
   mx3 cells, which are large. The density of the SRAM could be much
   higher if we had nmx2 and nmx3.

   Furthermore, as the output multiplexers are a controlled case,
   we could also use three-state driver cells (which have not been
   ported either).

Conclusions for the nand2+nor2 implementation:

1. This multiplexer gives a much more compact area and a noticeably
   shorter wire length, at the cost of more cells (not an issue).

2. The total wire length is extremely sensitive to the placement, which
   in our case is just a column ordering. To optimize, the binary tree
   (for the netlist) is not placed fully symmetrically but slightly
Comment 5 Jean-Paul Chaput 2022-10-17 16:32:53 BST
Created attachment 173
Plot of the SRAM 256x32, four fold.

This is the final NAND2/NOR2 version.

The PDF image was too big to be downloaded :-(