Bug 826 - Trial run of ethmac (freecores) layout.
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's second ASIC
Classification: Unclassified
Component: source code
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
Blocks: 690
Reported: 2022-04-30 14:42 BST by Jean-Paul Chaput
Modified: 2022-08-29 23:11 BST
CC List: 3 users

total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0

Description Jean-Paul Chaput 2022-04-30 14:42:04 BST
Make a layout of ethmac as a standalone block to evaluate multiple clock-tree strategies.

ethmac is taken from here:
    https://github.com/freecores/ethmac.git
Comment 1 Jean-Paul Chaput 2022-04-30 14:55:36 BST
The Verilog from Freecores/ethmac does not seem to be readable by Yosys.
It hangs at this point:


     Yosys 0.12+23 (git sha1 UNKNOWN, gcc 11.2.1 -fPIC -Os)

    1. Executing Verilog-2005 frontend: ethmac.v
    Parsing Verilog input from `ethmac.v' to AST representation.
    make: *** [mk/synthesis-yosys.mk:53: ethmac.blif] Error 247

I'm setting it up as a standalone example in alliance-check-toolkit.
Do you want me to commit it right now?

There will also likely be questions about the implementation of the
FIFOs/SRAMs.
Comment 2 Jean-Paul Chaput 2022-04-30 16:19:31 BST
(In reply to Jean-Paul Chaput from comment #1)

With the following script, inspired by:

    https://git.libre-soc.org/?p=ls2.git;a=blob;f=simsoc.ys;h=a4adcefdcd7103aa29cc6578bdaf470b89e0f845;hb=0ed190756075447abdf96cb7e508e7ed92118236#l33

That is:

    yosys read_verilog  eth_clockgen.v
    yosys read_verilog  eth_cop.v
    yosys read_verilog  eth_crc.v
    yosys read_verilog  eth_fifo.v
    yosys read_verilog  eth_maccontrol.v
    yosys read_verilog  ethmac_defines.v
    yosys read_verilog  eth_macstatus.v
    yosys read_verilog  ethmac.v
    yosys read_verilog  eth_miim.v
    yosys read_verilog  eth_outputcontrol.v
    yosys read_verilog  eth_random.v
    yosys read_verilog  eth_receivecontrol.v
    yosys read_verilog  eth_registers.v
    yosys read_verilog  eth_register.v
    yosys read_verilog  eth_rxaddrcheck.v
    yosys read_verilog  eth_rxcounters.v
    yosys read_verilog  eth_rxethmac.v
    yosys read_verilog  eth_rxstatem.v
    yosys read_verilog  eth_shiftreg.v
    yosys read_verilog  eth_spram_256x32.v
    yosys read_verilog  eth_top.v
    yosys read_verilog  eth_transmitcontrol.v
    yosys read_verilog  eth_txcounters.v
    yosys read_verilog  eth_txethmac.v
    yosys read_verilog  eth_txstatem.v
    yosys read_verilog  eth_wishbone.v
    yosys read_verilog  timescale.v
    yosys hierarchy -check -top ethmac
    yosys synth            -top ethmac
    yosys memory
    yosys dfflibmap -liberty    FlexLib.lib
    yosys abc       -liberty    FlexLib.lib
    yosys clean
    yosys write_blif ethmac.blif

I can go a little further:

     Yosys 0.12+23 (git sha1 UNKNOWN, gcc 11.2.1 -fPIC -Os)

    1. Executing Verilog-2005 frontend: eth_clockgen.v
    Parsing Verilog input from `eth_clockgen.v' to AST representation.
    Generating RTLIL representation for module `\eth_clockgen'.
    Successfully finished Verilog frontend.

    2. Executing Verilog-2005 frontend: eth_cop.v
    Parsing Verilog input from `eth_cop.v' to AST representation.
    Generating RTLIL representation for module `\eth_cop'.
    eth_cop.v:0: Warning: System task `$display' outside initial block is unsupported.
    eth_cop.v:0: Warning: System task `$display' outside initial block is unsupported.
    eth_cop.v:0: Warning: System task `$display' outside initial block is unsupported.
    eth_cop.v:0: Warning: System task `$display' outside initial block is unsupported.
    eth_cop.v:0: Warning: System task `$display' outside initial block is unsupported.
    eth_cop.v:0: Warning: System task `$display' outside initial block is unsupported.
    eth_cop.v:0: Warning: System task `$display' outside initial block is unsupported.
    eth_cop.v:0: Warning: System task `$display' outside initial block is unsupported.
    eth_cop.v:0: Warning: System task `$display' outside initial block is unsupported.
    eth_cop.v:0: Warning: System task `$display' outside initial block is unsupported.
    eth_cop.v:0: Warning: System task `$display' outside initial block is unsupported.
    eth_cop.v:0: ERROR: System task `$stop' outside initial block is unsupported.


As I'm not fluent in Verilog, I cannot tell whether this is a feature Yosys
does not support or an outright Verilog error.
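The script above is one `yosys <command>` line per source file followed by a fixed flow, so it is easy to generate rather than hand-edit (for instance when simulation-only files have to be dropped from the list). A minimal Python sketch, assuming the same Tcl-script style (Yosys runs such scripts with `yosys -c`); the function name and its parameters are hypothetical, while the Yosys commands are exactly the ones used in the script:

```python
def yosys_tcl_script(sources, top, liberty, blif):
    """Return a Yosys Tcl script ("yosys <command>" per line) that reads
    every Verilog file in `sources`, synthesizes `top`, maps it onto the
    `liberty` cell library and writes the netlist to `blif`."""
    lines = [f"yosys read_verilog {src}" for src in sources]
    lines += [
        f"yosys hierarchy -check -top {top}",
        f"yosys synth -top {top}",
        "yosys memory",
        f"yosys dfflibmap -liberty {liberty}",
        f"yosys abc -liberty {liberty}",
        "yosys clean",
        f"yosys write_blif {blif}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative subset of the file list used in the script above.
    files = ["eth_clockgen.v", "eth_crc.v", "ethmac.v"]
    print(yosys_tcl_script(files, "ethmac", "FlexLib.lib", "ethmac.blif"), end="")
```

Writing the output to a file and running it with `yosys -c` should reproduce the hand-written script for the full file list.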
Comment 3 Luke Kenneth Casson Leighton 2022-04-30 17:11:55 BST
(In reply to Jean-Paul Chaput from comment #2)

>     eth_cop.v:0: ERROR: System task `$stop' outside initial block is
> unsupported.
> 
> 
> As I'm not fluent in Verilog, I cannot tell if it's a Yosys unsupported
> feature or
> an outright Verilog error.

https://github.com/freecores/ethmac/blob/master/rtl/verilog/eth_cop.v

it is for simulation purposes (icarus, verilator): $display and $stop
clearly will not work in an ASIC! if you remove the $stop calls you will
get further.
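One way to remove the `$stop` calls without hand-editing, sketched here as a hypothetical preprocessing helper (guarding the simulation code in the Verilog source itself would work just as well): replace each call with a Verilog null statement `;`. A bare `;` is still a legal statement, so `if (err) $stop;` keeps a valid empty body instead of silently capturing the following line, which is what plain line-commenting could do. The regex assumes simple calls such as `$stop;` or `$stop(1);` and does not understand comments or strings; the `$display` calls are left alone since Yosys only warns about them.

```python
import re

# A `$stop` system-task call with an optional argument list, e.g. `$stop;` or `$stop(1);`
_STOP_CALL = re.compile(r"\$stop\b\s*(?:\([^)]*\))?\s*;")


def strip_stop_calls(verilog_src: str) -> str:
    """Replace every `$stop` call with a Verilog null statement (`;`)."""
    return _STOP_CALL.sub(";", verilog_src)


if __name__ == "__main__":
    print(strip_stop_calls('if (err) begin $display("bad"); $stop; end'))
```

Typical use would be reading `eth_cop.v`, passing its text through `strip_stop_calls` and writing the result to a copy that is handed to `read_verilog`.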
Comment 4 Luke Kenneth Casson Leighton 2022-04-30 17:13:39 BST
(In reply to Jean-Paul Chaput from comment #1)

> I'm setting it up as a standalone example in alliance-check-toolkit.
> Do you want me to commit it right now?

sure, let's get it up and running.
Comment 5 Jean-Paul Chaput 2022-05-21 17:57:03 BST
Took a while to get it up and running. It triggered some annoying bugs that
I wanted completely solved before going any further.
All known errors in the router should now be cleared.

Committed the base ethmac example in Coriolis #c877d7e9 and
alliance-check-toolkit #5fb4f50. It is provided for both TSMC 180nm
(private use only) and SkyWater 130nm (for the general public).

This is the starting point from which I will optimize the
P&R of the block.
Comment 6 Luke Kenneth Casson Leighton 2022-05-21 19:20:18 BST
(In reply to Jean-Paul Chaput from comment #5)
> Took a while to get it up and running. It triggered some annoying bugs that
> I wanted to be completely solved before going any further.

interesting.

> This is the starting point from which I will start optimizing the
> P&R of the block.

ad-hoc clock tree, localisation of the parts connected to it?
it would be interesting to hear. also, it occurs to me that maybe jtag_tck
could be treated similarly on ls180, as a bigger test?
Comment 7 Jean-Paul Chaput 2022-05-21 21:24:57 BST
(In reply to Luke Kenneth Casson Leighton from comment #6)

> adhoc clock tree, localisation of the parts connected to it?
> be interesting to hear, also it occurs to me that maybe jtag_tck
> could be treated similarly on ls180 as bigger test?

  Yes. I will analyse which blocks the clocks are connected to,
  and see whether a manual placement of those blocks can help.

  I will also look at the data flow, as we have huge buses and
  clearly bi-directional data flow.

  Concerning jtag_tck, that will depend on how many DFFs
  it is connected to and how widely they are spread across the
  rest of the design.
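The analysis described here (which blocks each clock reaches, how many DFFs it drives and how widely they are spread) can be sketched as a small report over a list of DFFs. This assumes the DFFs have already been extracted from the netlist and placement with their clock net and coordinates; that extraction is not shown, and the `Dff` record and `clock_report` function are hypothetical names for illustration only.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Dff:
    path: str   # hierarchical instance name, "<block>/<...>/<cell>"
    clock: str  # net driving the DFF clock pin
    x: float    # placed coordinates, any consistent unit
    y: float


def clock_report(dffs):
    """For each clock net: number of DFFs, DFFs per top-level block,
    and the bounding box (xmin, ymin, xmax, ymax) of those DFFs."""
    by_clock = defaultdict(list)
    for dff in dffs:
        by_clock[dff.clock].append(dff)

    report = {}
    for clock, group in by_clock.items():
        xs = [d.x for d in group]
        ys = [d.y for d in group]
        report[clock] = {
            "count": len(group),
            # first component of the hierarchical path = top-level block
            "blocks": dict(Counter(d.path.split("/", 1)[0] for d in group)),
            "bbox": (min(xs), min(ys), max(xs), max(ys)),
        }
    return report
```

A clock whose bounding box covers most of the die but whose DFFs sit in only one or two blocks is a candidate for the manual placement mentioned above.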
Comment 8 Luke Kenneth Casson Leighton 2022-05-21 21:58:38 BST
(In reply to Jean-Paul Chaput from comment #7)

>   Yes. I will analyse to what block the clocks are connected.
>   See if a manual placement of said block can help.
> 
>   Also will look at the data flow as we have huge buses and
>   clearly bi-directional data-flow.

yes. all IO Pads. these are combinatorial muxes to re-route IO
for testing.

>   Concerning the jtag_tck, that will depend on how many DFFs
>   is it connected to and how widespread in the rest of the
>   design they are.

there will be a lot of muxes onto the wishbone bus (i set that up
to cut off the core in case things go wrong), but they should
not involve DFFs.

basically, whilst information on the JTAG side goes into or comes
out of async DFFs to cross over between tck and sysclk, signal
interception goes through *combinatorial* muxes.

you will see this (Clock-Domain-Crossing)

    jtag side signal -> DFF(tck) -> DFF(clk) -> clk controlled signal


you will NEVER see this:

    jtag side signal -> DFF(clk) -> clk controlled signal

or this:

    jtag side signal -> DFF(tck) -> clk controlled signal


you will also see this in ethmac around the FIFOs: a
pair of DFFs chained together.
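The rule illustrated by the three diagrams can be written as a toy check on a register path, using the same `DFF(<clock>)` notation: a valid crossing is one or more DFFs clocked by tck followed by one or more DFFs clocked by clk, and nothing else. This is only a sketch of the rule as stated in this comment, not a real CDC analysis tool; the function name and the string encoding of the path are hypothetical, and the combinatorial interception muxes are deliberately out of scope.

```python
def follows_cdc_rule(stages, src="tck", dst="clk"):
    """Check a register path written as in the diagrams, e.g.
    ["DFF(tck)", "DFF(clk)"]: it must start with one or more DFFs clocked
    by `src` and end with one or more DFFs clocked by `dst`, with nothing
    else on the path."""
    clocks = []
    for stage in stages:
        if not (stage.startswith("DFF(") and stage.endswith(")")):
            return False  # only clocked DFF stages are modelled here
        clocks.append(stage[len("DFF("):-1])

    n_src = 0
    while n_src < len(clocks) and clocks[n_src] == src:
        n_src += 1
    dst_part = clocks[n_src:]
    return n_src >= 1 and len(dst_part) >= 1 and all(c == dst for c in dst_part)
```

With this encoding the first diagram passes, the two "NEVER" diagrams fail, and a path with two chained clk-domain DFFs after the tck DFF (the pair mentioned around the FIFOs) also passes.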
Comment 9 Staf Verhaegen 2022-05-23 09:02:40 BST
(In reply to Luke Kenneth Casson Leighton from comment #6)
> 
> also it occurs to me that maybe jtag_tck
> could be treated similarly on ls180 as bigger test?

jtag_tck is indeed an interesting case, as the boundary scan spans all the input signals. So although jtag_tck is not used in the core, it needs to be distributed close to all the IO cells. The placer is also involved here, as it determines where exactly the logic is placed.

As the boundary scan is basically a big shift register, one could also use a strategy for jtag_tck that distributes the clock in a ring rather than a tree. Typically this is done with the clock running in the opposite direction to the shift register, to help with hold violations.

From a timing point of view, the maximum operating frequency of jtag_tck can also be made lower than the core's maximum clock frequency.
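Why the counter-flowing ring helps hold, and why a lower jtag_tck frequency goes with it, follows from the usual setup/hold slack equations for one shift-register hop. A minimal sketch, where `skew` is the clock arrival at the capturing DFF minus the arrival at the launching DFF; all timing values are made up for illustration and are not SkyWater or TSMC figures, and the function and class names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlopTiming:
    t_cq: float     # clock-to-Q of the launching DFF (ns)
    t_setup: float  # setup time of the capturing DFF (ns)
    t_hold: float   # hold time of the capturing DFF (ns)


def stage_slacks(period, t_data, skew, ft):
    """Return (setup_slack, hold_slack) in ns for one shift-register hop.

    Clock flowing WITH the data gives skew > 0 (hold gets worse);
    clock flowing AGAINST the data gives skew < 0 (hold improves by the
    same amount that setup loses)."""
    arrival = ft.t_cq + t_data                      # data ready after the launch edge
    hold = arrival - ft.t_hold - skew               # must not race the capture edge
    setup = period + skew - ft.t_setup - arrival    # must settle before the next capture edge
    return setup, hold


if __name__ == "__main__":
    ft = FlopTiming(t_cq=0.3, t_setup=0.2, t_hold=0.1)  # made-up values
    hop_delay = 0.4  # made-up clock delay between neighbouring scan cells
    for name, skew in (("with data", +hop_delay), ("against data", -hop_delay)):
        setup, hold = stage_slacks(period=10.0, t_data=0.05, skew=skew, ft=ft)
        print(f"{name:>12}: setup {setup:+.2f} ns, hold {hold:+.2f} ns")
```

With these numbers the short scan hop violates hold when the clock travels with the data and passes when it travels against it; the setup slack lost in exchange is what a longer `period` (lower jtag_tck frequency) pays back.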
Comment 10 Jean-Paul Chaput 2022-06-10 15:56:20 BST
After a first basic run with the Yosys-generated SRAM, it appears that the SRAM takes up 42% of the area for the DFFs alone. Once all the address-decoding and output-muxing paraphernalia is added, we should be close to 60%.
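A back-of-the-envelope count shows why the non-DFF logic is not negligible for a 256-word by 32-bit memory with four byte-wide write enables (as in the interface requested below): building each output bit's 256-to-1 read multiplexer from 2:1 muxes takes 255 cells, so the read path alone needs almost as many mux cells as there are storage DFFs. This assumes a naive flop-array mapping; what Yosys and abc actually emit may use other cells, so treat it as an order-of-magnitude estimate. The function name is hypothetical.

```python
def flop_sram_cell_counts(words, bits, byte_lanes):
    """Rough cell counts for an SRAM built from standard cells (no macro).

    storage:       one DFF per stored bit
    read_mux2:     per output bit, a words-to-1 multiplexer built as a tree
                   of 2:1 muxes, which needs (words - 1) of them
    write_enables: one decoded enable per (word, byte lane)
    """
    storage = words * bits
    read_mux2 = bits * (words - 1)
    write_enables = words * byte_lanes
    return {"storage": storage, "read_mux2": read_mux2, "write_enables": write_enables}


if __name__ == "__main__":
    print(flop_sram_cell_counts(256, 32, 4))
```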

So, would it be possible to have a SRAM of 256 words of 32 bits,
conforming to the following interface:

entity cmpt_eth_spram_256x32 is
  port ( ce   : in bit 
       ; clk  : in bit 
       ; oe   : in bit 
       ; rst  : in bit 
       ; we   : in bit_vector(3 downto 0)
       ; addr : in bit_vector(7 downto 0)
       ; di   : in bit_vector(31 downto 0)
       ; dato : out bit_vector(31 downto 0)
       ; vdd  : in bit 
       ; vss  : in bit 
       );
end cmpt_eth_spram_256x32;

It would ensure a drastic area reduction.
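To make the requested interface concrete, here is a cycle-level Python model of a 256 x 32 single-port SRAM with those ports. The behaviour is an assumption (a typical synchronous single-port RAM), not something taken from the ethmac sources or from the entity above: the class and method names are hypothetical, the read is registered with read-before-write, `rst` clears the output register, `we(i)` writes byte lane i, and `dato` is treated as undefined while `oe` is low. `vdd`/`vss` are power pins and are not modelled.

```python
class Spram256x32:
    """Cycle-level model of the requested cmpt_eth_spram_256x32 interface.

    Assumed behaviour (not taken from the ethmac sources):
      * everything happens on the rising edge of clk, and only when ce = 1;
      * the read is registered and returns the word *before* any write
        in the same cycle (read-before-write);
      * rst = 1 clears the output register instead of loading it;
      * we bit i writes byte lane i, i.e. di(8*i+7 downto 8*i);
      * dato is undefined (None here) while oe = 0.
    """
    WORDS = 256

    def __init__(self):
        self.mem = [0] * self.WORDS
        self.q = 0  # output register

    def rising_edge(self, ce, rst, we, addr, di):
        if not 0 <= addr < self.WORDS:
            raise ValueError("addr must fit in 8 bits")
        if not ce:
            return
        word = self.mem[addr]
        self.q = 0 if rst else word
        for lane in range(4):
            if (we >> lane) & 1:
                mask = 0xFF << (8 * lane)
                word = (word & ~mask) | (di & mask)
        self.mem[addr] = word

    def dato(self, oe):
        return self.q if oe else None
```

Such a model can serve as a reference when checking a hand-made or generated SRAM block against the same port list.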
Comment 11 Luke Kenneth Casson Leighton 2022-06-12 18:45:20 BST
removed from bug #850 and copied here, to the appropriate bug report:


> * 256x32 SRAM for eth_mac

If that's supposed to hold a full ethernet packet, it's too small for the standard ethernet frame size (256 x 32 bits is only 1024 bytes):

it needs to be at least 1522 bytes if we don't want to support jumbo frames (the ethernet header fields, not just the payload, are needed for full packet capture, e.g. for wireshark):
https://en.wikipedia.org/wiki/Ethernet_frame

if we want to support jumbo frames we'll need 9022 bytes:
https://en.wikipedia.org/wiki/Jumbo_frame
Comment 12 Luke Kenneth Casson Leighton 2022-06-12 18:57:12 BST
(In reply to Luke Kenneth Casson Leighton from comment #11)

> If that's supposed to hold a full ethernet packet, 

no. it holds registers (and something called "BD", Buffer Descriptor, whatever
that is).

packets are transferred directly to/from FIFOs from/to memory
using a Wishbone Master interface.

in theory the SRAM could be made larger.
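If, as we understand the OpenCores ethmac specification, each buffer descriptor occupies two 32-bit words (a length/status word and a pointer to the packet buffer in main memory), then the 256-word SRAM holds up to 128 descriptors and never the frames themselves, which the Wishbone master moves to and from memory. A minimal sketch of that addressing; the two-word layout is an assumption to check against the spec, and the names are hypothetical:

```python
SRAM_WORDS = 256
WORDS_PER_BD = 2   # ASSUMPTION: one length/status word + one buffer-pointer word
MAX_BDS = SRAM_WORDS // WORDS_PER_BD   # 128 under that assumption


def bd_word_addrs(n):
    """SRAM word addresses holding buffer descriptor n (0-based)."""
    if not 0 <= n < MAX_BDS:
        raise IndexError(f"descriptor {n} out of range 0..{MAX_BDS - 1}")
    base = n * WORDS_PER_BD
    return base, base + 1
```

Under this assumption, making the SRAM larger would only buy more descriptors, not room for larger frames.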
Comment 13 Luke Kenneth Casson Leighton 2022-07-12 14:29:22 BST
(In reply to Jean-Paul Chaput from comment #10)
> After a first basic run with the Yosys generated SRAM, it appears that the
> SRAM takes up 42% of the area for the DFF only. If all the paraphernalia of
> address decoding and output muxing is added we should be close to 60%.

as no actual ASIC is being manufactured, this is not such a big concern.

> So, would it be possible to have a SRAM of 256 words of 32 bits,
> conforming to the following interface:

we are out of budget for that: everything has been allocated.
Comment 14 Jean-Paul Chaput 2022-07-13 13:53:53 BST
(In reply to Luke Kenneth Casson Leighton from comment #13)
> (In reply to Jean-Paul Chaput from comment #10)
> > After a first basic run with the Yosys generated SRAM, it appears that the
> > SRAM takes up 42% of the area for the DFF only. If all the paraphernalia of
> > address decoding and output muxing is added we should be close to 60%.
> 
> as there is not an actual ASIC being manufactured this is not such a big
> concern.

  Yes and no... We won't do the ASIC, but we still plan to submit a
  mini-design to the Google/SkyWater MPW program. To answer your
  question preemptively: yes, the SkyWater I/O pads are too slow
  to run the ethmac at nominal speed, but we will try to run it slower
  just to check the whole design.
    More generally, I think some people may not have access to
  optimized SRAM blocks and will still rely on Yosys-generated
  ones, so having a dedicated placer for them should benefit the
  community at large.
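One possible shape for such a dedicated placer, sketched under stated assumptions: place the storage DFFs as a regular array, folding `words_per_row` words into each row and interleaving them bit by bit, so the same bit of neighbouring words sits side by side (the column-multiplexing arrangement used in SRAM macros) and each output bit's read mux stays local. The function and its parameters are hypothetical; a real implementation would use the DFF cell's actual width, snap to placement sites and go through the Coriolis API, none of which is shown here.

```python
def place_sram_array(words, bits, words_per_row, cell_w, row_h):
    """Return {(word, bit): (x, y)} lower-left positions for the storage DFFs."""
    if words % words_per_row:
        raise ValueError("words must be a multiple of words_per_row")
    placement = {}
    for w in range(words):
        row, slot = divmod(w, words_per_row)
        for b in range(bits):
            # all words_per_row copies of bit b are adjacent in the row,
            # so the column mux for bit b only spans words_per_row cells
            col = b * words_per_row + slot
            placement[(w, b)] = (col * cell_w, row * row_h)
    return placement


if __name__ == "__main__":
    p = place_sram_array(256, 32, words_per_row=8, cell_w=1.0, row_h=1.0)
    cols = 1 + max(x for x, _ in p.values())
    rows = 1 + max(y for _, y in p.values())
    print(f"{len(p)} DFFs in {int(rows)} rows x {int(cols)} columns")
```

`words_per_row` trades array height against width, so it would be the knob for matching the block's aspect ratio to the floorplan.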

> > So, would it be possible to have a SRAM of 256 words of 32 bits,
> > conforming to the following interface:

  I'll leave it up to Staf whether he still wants to do it.