Provide an SRAM with an optimized, regular (matrix-like) placement instead of the Yosys-generated one (placed by the general-purpose placer Etesian).
Evaluation results can be rebuilt with:

* coriolis commit #d294a770
* alliance-check-toolkit commit #1049f10

Provisional results
~~~~~~~~~~~~~~~~~~~

.. note:: All lengths are in micrometers.

+--------------+-----------------------------+-----------------------------+
| Kind         | Generator                   | Yosys                       |
+==============+=============================+=============================+
| # Gates      | 23209 (-25.4%)              | 32121                       |
+--------------+-----------------------------+-----------------------------+
| 1 Fold                                                                   |
+--------------+-----------------------------+-----------------------------+
| Area         | 7182 x 330 (-5.5%)          | 7380 x 340                  |
+--------------+-----------------------------+-----------------------------+
| Wirelength   | 1841036 (-4.3%)             | 1924153                     |
+--------------+-----------------------------+-----------------------------+
| 2 Fold                                                                   |
+--------------+-----------------------------+-----------------------------+
| Area         | 3599 x 660 (-5.3%)          | 3690 x 680                  |
+--------------+-----------------------------+-----------------------------+
| Wirelength   | 1670455 (-6.3%)             | 1782558                     |
+--------------+-----------------------------+-----------------------------+
| 4 Fold                                                                   |
+--------------+-----------------------------+-----------------------------+
| Area         | 1812 x 1320 (-4.6%)         | 1900 x 1320                 |
+--------------+-----------------------------+-----------------------------+
| Wirelength   | 1699810 (-1.5%)             | 1726436                     |
+--------------+-----------------------------+-----------------------------+

Conclusions that we can draw from those results are:

1. The generator version uses substantially fewer gates than the Yosys one.
   As both SRAMs use the exact same number of SFFs, the difference is only
   due to the decoder that controls the input and output muxes.

2. Despite having fewer gates, the generator version uses a similar area,
   which means that we use fewer but significantly *bigger* cells.

3. The FlexLib library supplied for SkyWater 130nm does not contain all the
   SxLib cells, effectively restricting our choices. In particular, to build
   the output multiplexer we only have mx2 and mx3 cells, which are large.
   The density of the SRAM could be much higher if we had nmx2 and nmx3.
   We could also try to synthesise the tree using nandX and norX, but we are
   short of time. Furthermore, as the output multiplexer is a controlled
   case, we could also use three-state driver cells (which have not been
   ported either).

.. note:: Cell width in the SkyWater 130 port of FlexLib:

   ==============  =====
   Cell            Width
   ==============  =====
   mx2_x2          7
   mx3_x2          11
   nand2_x0        2
   nand3_x0        3
   nand4_x0        4
   nor2_x0         2
   ==============  =====

   1. mx2_x2 + mx3_x2 = 18
   2. 9 * nand2_x0 = 18
   3. 4 * nand3_x0 + nand4_x0 = 16
   4. 6 * nand2_x0 + nor2_x0 = 14
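
The width sums at the end of the note can be reproduced with a few lines of Python. This is only a sketch: the dictionary names and the variant labels are made up for illustration, and reading each of the four sums as one candidate implementation of the same output multiplexer stage is an assumption. The cell widths and counts are the ones quoted in the note.

```python
# Cell widths as quoted in the note (SkyWater 130 port of FlexLib).
CELL_WIDTH = {
    "mx2_x2": 7,
    "mx3_x2": 11,
    "nand2_x0": 2,
    "nand3_x0": 3,
    "nand4_x0": 4,
    "nor2_x0": 2,
}

# Candidate implementations as {cell name: count}.
# The labels are hypothetical; the counts come from the note.
VARIANTS = {
    "mx2 + mx3": {"mx2_x2": 1, "mx3_x2": 1},
    "9 nand2": {"nand2_x0": 9},
    "4 nand3 + nand4": {"nand3_x0": 4, "nand4_x0": 1},
    "6 nand2 + nor2": {"nand2_x0": 6, "nor2_x0": 1},
}


def total_width(cells):
    """Sum of the cell widths, weighted by the number of instances."""
    return sum(CELL_WIDTH[name] * count for name, count in cells.items())


def rank_variants():
    """Return (width, label) pairs, narrowest variant first."""
    return sorted((total_width(cells), label) for label, cells in VARIANTS.items())


if __name__ == "__main__":
    for width, label in rank_variants():
        print(f"{label:<18} {width}")
```

Ranking by summed width only compares cell footprint; it says nothing about routing, which is measured separately by the wirelength figures.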
Created attachment 171: Plot of the SRAM 256x32, folded once
Created attachment 172: Plot of the SRAM 256x32, folded twice
Evaluation results can be rebuilt with:

* coriolis commit #9594476a
* alliance-check-toolkit commit #9eec8a0

Updated SRAM results
~~~~~~~~~~~~~~~~~~~~

Added results for the NAND2/NOR2 output multiplexer version.

All the benchmarks have been run using the Google/SkyWater 130nm DK, with a
port of Chips4Makers/Flexlib. The version using TSMC_C180 has also been done,
but it requires NDA access to be run outside Sorbonne Université/LIP6.

.. note:: All lengths are in micrometers.

+--------+--------------+-----------------------------+---------------------------+
| Arch   | Kind         | Generator                   | Yosys                     |
+========+==============+=============================+===========================+
| Mux    | # Gates      | 23209 (-25.4%)              | 32121                     |
+--------+--------------+-----------------------------+                           |
| Nao    | # Gates      | 34637 (+7.8%)               |                           |
+--------+--------------+-----------------------------+---------------------------+
| 1 Fold                                                                          |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 7182 x 330 (-5.5%)          | 7380 x 340                |
| Mux    +--------------+-----------------------------+---------------------------+
|        | Wirelength   | 1841036 (-4.3%)             | 1924153                   |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 6680 x 340 (-14.9%)         |                           |
| Nao    +--------------+-----------------------------+                           |
|        | Wirelength   | 1637781 (-14.9%)            |                           |
+--------+--------------+-----------------------------+---------------------------+
| 2 Fold                                                                          |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 3599 x 660 (-5.3%)          | 3690 x 680                |
| Mux    +--------------+-----------------------------+---------------------------+
|        | Wirelength   | 1670455 (-6.3%)             | 1782558                   |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 3350 x 680 (-9.2%)          |                           |
| Nao    +--------------+-----------------------------+                           |
|        | Wirelength   | 1548358 (-13.1%)            |                           |
+--------+--------------+-----------------------------+---------------------------+
| 4 Fold                                                                          |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 1812 x 1320 (-4.6%)         | 1900 x 1320               |
| Mux    +--------------+-----------------------------+---------------------------+
|        | Wirelength   | 1699810 (-1.5%)             | 1726436                   |
+--------+--------------+-----------------------------+---------------------------+
|        | Area         | 1692 x 1360 (-8.2%)         |                           |
| Nao    +--------------+-----------------------------+                           |
|        | Wirelength   | 1512107 (-12.4%)            |                           |
+--------+--------------+-----------------------------+---------------------------+

The difference between the two implementations resides only in the *output*
multiplexer. It is built either from 4-input muxes made of mux2+mux3, or from
2-input multiplexers made of alternating layers of nand2 and nor2.

Conclusions for the mux2+mux3 implementation:

1. The generator version uses substantially fewer gates than the Yosys one.
   As both SRAMs use the exact same number of SFFs, the difference is only
   due to the decoder that controls the input and output muxes.

2. Despite having fewer gates, the generator version uses a similar area,
   which means that we use fewer but significantly *bigger* cells.

3. The FlexLib library supplied for SkyWater 130nm does not contain all the
   SxLib cells, effectively restricting our choices. In particular, to build
   the output multiplexer we only have mx2 and mx3 cells, which are large.
   The density of the SRAM could be much higher if we had nmx2 and nmx3.
   Furthermore, as the output multiplexer is a controlled case, we could
   also use three-state driver cells (which have not been ported either).

Conclusions for the nand2+nor2 implementation:

1. This multiplexer gives a much more compact area and a noticeably shorter
   wire length, at the cost of an increased number of cells (not an issue).

2. The total wire length is extremely sensitive to the placement, which in
   our case is just a column ordering. To optimize it, the binary tree (of
   the netlist) is not placed fully symmetrically but slightly "askew".
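
To illustrate how alternating nand2/nor2 layers can act as a multiplexer, here is a small Python truth-table check. It is a sketch under stated assumptions, not the generator's actual netlist: it assumes one-hot select lines coming from the decoder, and a 4-input stage built from 6 nand2 and 1 nor2 (the cell count quoted in the cell-width note). The function names are hypothetical.

```python
from itertools import product


def nand2(a, b):
    """2-input NAND on 0/1 integers."""
    return 1 - (a & b)


def nor2(a, b):
    """2-input NOR on 0/1 integers."""
    return 1 - (a | b)


def mux4_nao(data, sel_onehot):
    """Assumed 4-input stage: 6 nand2 + 1 nor2, inverting output."""
    # Layer 1: four nand2 gate each data bit with its select line.
    # Unselected branches sit at 1, the selected one carries NOT(data).
    layer1 = [nand2(d, s) for d, s in zip(data, sel_onehot)]
    # Layer 2: two nand2 merge the branches pairwise (OR of the gated bits).
    layer2 = [nand2(layer1[0], layer1[1]), nand2(layer1[2], layer1[3])]
    # Layer 3: one nor2 merges both halves; the result is inverted.
    return nor2(layer2[0], layer2[1])


def check_mux4_nao():
    """Exhaustively compare against an inverting 4-input multiplexer."""
    for data in product((0, 1), repeat=4):
        for k in range(4):
            sel = [int(i == k) for i in range(4)]
            if mux4_nao(data, sel) != 1 - data[k]:
                return False
    return True


if __name__ == "__main__":
    print("equivalent to inverting mux4:", check_mux4_nao())
```

Each layer flips the signal polarity, which is why nand2 and nor2 alternate: nand2 merges active-low branches and nor2 merges active-high ones, so no separate inverters are needed between layers.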
Created attachment 173: Plot of the SRAM 256x32, four fold. This is the final NAND2/NOR2 version. The PDF image was too big to be downloaded :-(