Bug 488 - Build test serdes on 180nm test chip for oct2020
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Hardware Layout
Version: unspecified
Hardware: All All
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks:
 
Reported: 2020-09-10 16:50 BST by Jacob Lifshay
Modified: 2020-09-24 17:24 BST

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Jacob Lifshay 2020-09-10 16:50:45 BST
I found a paper describing a voltage-controlled delay element with differential inputs and outputs that can operate at about 100ps delay at 350nm and about 10ps delay at 28nm. I was thinking that we might want to try building a PLL where the VCO has 4 outputs, all 90deg apart, based on 4, 8, or 12 of those delay elements connected in a ring oscillator, where one of them is wired in an inverting configuration by swapping its output wires.

We could then build a high-speed serdes based on the VCO's 4 outputs. For the deserializer, the 4 outputs trigger 4 D-FFs whose D inputs are wired to the high-speed input data. For the serializer, a similar set of 4 FFs has Q and not-Q wired up to a pile of and-or gates that build a differential-signal-based (for equal propagation delay) 4-input xor gate, whose output is the serialized signal.

http://jultika.oulu.fi/files/nbnfi-fe2018121851276.pdf

If we have time, I think we should try to build those circuits on the 180nm October tapeout and see how fast we can run them, how much power they take, how much area they take, etc. I'd estimate the required area to be much less than 1/10 mm^2 and the required power when running to be in the range of 200mW. I'd guess we can achieve somewhere around 5-20Gb/s over one wire in 180nm.

The frequency can be reduced to near zero by reducing the delay elements' control voltage to near zero, so we won't need to add AND gates (or something similar) in the delay loop to stop the oscillator.
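As a rough sanity check on the numbers above, here is a minimal sketch of the ring-oscillator arithmetic. It assumes the usual relation for a ring with one inversion, period = 2 x stages x stage delay, and one bit per clock phase per period. The function name is hypothetical, and the 100ps and 10ps delays are the paper's 350nm and 28nm figures as quoted above, not measured 180nm values.

```python
def ring_oscillator(n_stages: int, stage_delay_s: float, n_phases: int = 4):
    """Return (oscillation frequency in Hz, aggregate bit rate in bit/s)."""
    # An edge has to travel round the ring twice (once inverted, once not)
    # before the ring is back in its starting state.
    period_s = 2 * n_stages * stage_delay_s
    freq_hz = 1.0 / period_s
    # Assumption: one bit is sampled or launched on each clock phase.
    return freq_hz, n_phases * freq_hz


for delay_s in (100e-12, 10e-12):      # delays quoted for 350nm and 28nm
    for n_stages in (4, 8, 12):        # ring lengths suggested above
        freq_hz, rate = ring_oscillator(n_stages, delay_s)
        print(f"{n_stages:2d} stages @ {delay_s * 1e12:.0f}ps: "
              f"{freq_hz / 1e9:.2f} GHz, {rate / 1e9:.2f} Gb/s")
```

Under these assumptions, 4 stages at 100ps give 1.25GHz and 5Gb/s with 4 phases. Longer rings run proportionally slower.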
Comment 1 Jacob Lifshay 2020-09-10 17:13:56 BST
If you like, I can try to build an example schematic in circuitjs.
Comment 2 Jacob Lifshay 2020-09-10 19:28:45 BST
After some experimenting in circuitjs, that particular delay cell will probably need a circuit to help keep the two wires of the differential signal in sync, since the simulation I was playing with ended up with both wires high or both wires low relatively often.
Comment 3 Staf Verhaegen 2020-09-10 19:36:40 BST
Prof. Dimitri Galayko @ LIP6 is already working on a PLL for the tape-out including a VCO.
I don't see how you guys could do an analog design for the October tape-out as you don't have access to the PDK. Also the paper is just a simulation exercise which has not been verified in silicon.
I would also think there is enough to do on the digital side for the Power core + GPU extension for the October tape-out.
Comment 4 Jacob Lifshay 2020-09-10 21:24:49 BST
(In reply to Staf Verhaegen from comment #3)
> Prof. Dimitri Galayko @ LIP6 is already working on a PLL for the tape-out
> including a VCO.

IIRC that particular VCO design doesn't have 4 output phases and isn't designed to run at 10GHz since our processor core won't ever be running that fast.

What I wanted to do is more about building and testing the digital logic for the serdes than about the VCO. I figured that since we're taping out a test chip anyway, if we have time, we might as well put an experimental serdes design on there to see if it might work and what speeds it works at. That could influence our decision on whether to include a similar serdes running at 50Gbaud for OMI on 40nm/28nm, with a VCO designed by someone with more experience.


> I don't see how you guys could do an analog design for the October tape-out
> as you don't have access to the PDK.

That's true; however, if it doesn't take much work to do the digital side of the serdes, the only non-standard cells needed would be the capacitor for storing the control voltage, a much smaller capacitor for the charge pump for adjusting the control voltage, and the variable delay circuit. I'd guess that the capacitors aren't very much work, and the delay circuit might take a day or two for you to draw.

> Also the paper is just a simulation
> exercise which has not been verified in silicon.

True. If it doesn't work, it most likely won't affect the test chip much: all you do is tell the CPU not to access that particular peripheral and use a transmission gate to short the control voltage capacitor to ground, causing the experimental VCO to stop.

If it does work, it would be great evidence that the 50Gbaud serdes would probably work on 40nm/28nm.

> I would also think there is enough to do on the digital side for the Power
> core + GPU extension for the October tape-out.

The October tape-out probably won't actually have any GPU extensions since those aren't even out of the ISA design stage yet.

The Power core is mostly working on an FPGA, so the stuff that's left for the October tapeout is adding an MMU and more peripherals, IIRC.
Comment 5 Jacob Lifshay 2020-09-11 02:37:21 BST
I added a PLL with 8-phase outputs, plus serializer/deserializer designs, to the wiki:
https://libre-soc.org/resources/high-speed-serdes-in-circuitjs/
Comment 6 Staf Verhaegen 2020-09-11 13:34:19 BST
(In reply to Jacob Lifshay from comment #4)
> (In reply to Staf Verhaegen from comment #3)

> > I don't see how you guys could do an analog design for the October tape-out
> > as you don't have access to the PDK.
> 
> That's true, however, if it doesn't take much work to do the digital side of
> the serdes, the only non-standard cells needed would be the capacitor for
> storing the control voltage, a much smaller capacitor for the charge pump
> for adjusting the control voltage, and the variable delay circuit. I'd guess
> that the capacitors aren't very much work, and the delay circuit might take
> a day or two for you to draw.

You also have the two switches in the phase frequency detector; all of these have to be properly designed and laid out with proper scaling to get the right filtering response and speed, with good stability.
It's not a small job, and even if it were a small job I don't see how you could do it without access to the PDK.
The main problem is that at high frequencies the parasitic resistances and capacitances of the actual layout and the interconnects become important. Again, I don't see how you could do a design taking that into account without access to the PDK.

I would say such an exercise would be a much better fit for Sky130, where they (plan to) make available the information needed to do the design.

> 
> > Also the paper is just a simulation
> > exercise which has not been verified in silicon.
> 
> True, if it doesn't work it most likely won't affect the test chip much, all
> you do is tell the cpu not to access that particular peripheral and use a
> transmission gate to short the control voltage capacitor to ground, causing
> the experimental VCO to stop.
> 
> If it does work, it would be great evidence that the 50Gbaud serdes would
> probably work on 40nm/28nm.

How would you test whether the design works or not? For such a design, the design of the test circuit is often as involved as, if not more involved than, designing the circuit itself.
You can't simply bring high-frequency signals out as outputs, as that will always put too much capacitive load on the output signal.
Comment 7 Jacob Lifshay 2020-09-11 15:30:11 BST
(In reply to Staf Verhaegen from comment #6)
> (In reply to Jacob Lifshay from comment #4)
> > (In reply to Staf Verhaegen from comment #3)
> 
> > > I don't see how you guys could do an analog design for the October tape-out
> > > as you don't have access to the PDK.
> > 
> > That's true, however, if it doesn't take much work to do the digital side of
> > the serdes, the only non-standard cells needed would be the capacitor for
> > storing the control voltage, a much smaller capacitor for the charge pump
> > for adjusting the control voltage, and the variable delay circuit. I'd guess
> > that the capacitors aren't very much work, and the delay circuit might take
> > a day or two for you to draw.
> 
> You also have the two switches in the phase frequency detector; all of these
> have to be properly designed and laid out with proper scaling to get the
> right filtering response and speed, with good stability.
> It's not a small job, and even if it were a small job I don't see how you
> could do it without access to the PDK.

Use the smallest 3-state buffer as the charge pump, the smallest transmission gate (set to always on) between the control voltage capacitor and the charge pump, and a bigger-than-necessary control voltage capacitor. Even though the response time will be really slow, it will still lock.

 +---------+
 | PF det. |
 +---------+
      |   |
    +---+ |
     \ /--+
      V
      | <- could insert more transmission gates here
      |
      +---- to VCO
      |
    +---+
     \ /
Vdd --XO-- Vss
     / \
    +---+
      |
      |
   ---+---

   ---+---
      |
     Vss
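To illustrate the "slow but still locks" argument, here is a minimal behavioural sketch and not a circuit simulation. It assumes the 3-state buffer pulls the capacitor towards Vdd or Vss through the transmission gate's resistance once per reference cycle, and it only compares frequencies (a real phase-frequency detector also tracks phase). Every name and value is a hypothetical placeholder, including the 10k resistance, 1.8V supply, divide-by-100 and 10GHz/V VCO gain. Only the 100MHz reference comes from the circuitjs example.

```python
import math


def cycles_to_lock(c_farads, r_ohms=10e3, t_pulse_s=1e-9, vdd=1.8,
                   f_ref_hz=100e6, divider=100, kvco_hz_per_v=10e9,
                   tolerance=0.01, max_cycles=1_000_000):
    """Count reference cycles until the divided VCO frequency is within
    `tolerance` of the reference, or return None if that never happens."""
    # Fraction of the remaining voltage gap closed by one RC-limited pulse.
    alpha = 1.0 - math.exp(-t_pulse_s / (r_ohms * c_farads))
    v_ctrl = 0.0  # capacitor starts discharged, so the VCO is stopped
    for cycle in range(max_cycles):
        f_div = kvco_hz_per_v * v_ctrl / divider
        if abs(f_div - f_ref_hz) <= tolerance * f_ref_hz:
            return cycle
        # Too slow: pull towards Vdd. Too fast: pull towards Vss.
        rail = vdd if f_div < f_ref_hz else 0.0
        v_ctrl += (rail - v_ctrl) * alpha
    return None


for c in (10e-12, 100e-12):
    print(f"C = {c * 1e12:.0f}pF: {cycles_to_lock(c)} reference cycles")
```

In this toy model a larger capacitor makes each correction smaller, so settling takes more reference cycles but the control voltage cannot jump past the target window.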
   
    

> Main problem is that at the high frequencies the parasitic resistances and
> capacitances of the actual layout and the interconnects become important.
> Again I don't see how you could do a design taking that into account without
> access to the PDK.

Do the approximate layout in a way where those are taken into account for a typical PDK, e.g. lay out the VCO's delay loop such that it loops back on itself halfway through, so the end is right next to the beginning and is unlikely to have excessive parasitics.

> 
> I would say such an exercise would be a much better fit for Sky130, where
> they (plan to) make available the information needed to do the design.

IIRC they already have that information available in the git repos for the cells.
> 
> > 
> > > Also the paper is just a simulation
> > > exercise which has not been verified in silicon.
> > 
> > True, if it doesn't work it most likely won't affect the test chip much, all
> > you do is tell the cpu not to access that particular peripheral and use a
> > transmission gate to short the control voltage capacitor to ground, causing
> > the experimental VCO to stop.
> > 
> > If it does work, it would be great evidence that the 50Gbaud serdes would
> > probably work on 40nm/28nm.
> 
> How would you test whether the design works or not?

Test the VCOs by running their output through a divider chain long enough that the divided result doesn't switch too fast for the CPU to measure, and/or send the divided output to a pin.
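A quick sketch of how long that divider chain would need to be, assuming a chain of divide-by-2 stages. The 10GHz figure is the VCO speed mentioned earlier in this thread; the 10MHz limit for what the CPU or a pin can handle is purely an assumed placeholder, and the function name is hypothetical.

```python
def divider_stages(f_vco_hz: float, f_max_hz: float) -> int:
    """Smallest number of divide-by-2 stages with f_vco / 2**n <= f_max."""
    n = 0
    while f_vco_hz / (1 << n) > f_max_hz:
        n += 1
    return n


n = divider_stages(10e9, 10e6)
print(f"{n} stages -> {10e9 / (1 << n) / 1e6:.2f} MHz at the output")
```

The VCO frequency is then recovered by multiplying the measured frequency by 2**n.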

The PLL reference frequency can be adjusted by adjusting whatever clock source is used, or the reference can just be brought to an external pin. In the circuitjs example, the reference frequency is 100MHz IIRC, which should be doable. If not, the PLL can be adjusted to use a lower reference frequency.

Test the serializer output by having one or more D-FFs with the clock taken from a source whose timing can be adjusted in fine increments, e.g. an external pin hooked to a high-resolution signal generator. The D inputs are connected to the serialized data. This allows fast sampling of the serialized data.

Have the serializer output connected to the deserializer input, with the deserializer selectable between using its own PLL and using the serializer's PLL. Test whether sending input to the serializer gives the same output from the deserializer.
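The loopback check can be sketched as a purely behavioural model. This is one possible reading of the 4-FF-plus-xor serializer from the description, not the wiki design: each flip-flop may only change on its own clock phase, and it is loaded with whatever value makes the 4-input xor equal the next data bit. The model assumes a shared clock (the "use the serializer's PLL" option), an ideal wire, and aligned word boundaries. All names are hypothetical.

```python
from functools import reduce
from operator import xor

PHASES = 4  # four clock phases, 90 degrees apart


def serialize(words):
    """Turn 4-bit words into a bit stream via four flip-flops and an xor."""
    q = [0] * PHASES  # flip-flop outputs; only q[k] may change on phase k
    line = []
    for word in words:
        for k in range(PHASES):
            bit = (word >> k) & 1            # least significant bit first
            others = reduce(xor, q) ^ q[k]   # xor of the three idle flip-flops
            q[k] = bit ^ others              # makes the overall xor equal `bit`
            line.append(reduce(xor, q))      # the wire carries the 4-input xor
    return line


def deserialize(line):
    """Sample the line once per phase and regroup the samples into words."""
    words = []
    usable = len(line) - len(line) % PHASES  # ignore a trailing partial word
    for i in range(0, usable, PHASES):
        word = 0
        for k in range(PHASES):
            word |= line[i + k] << k
        words.append(word)
    return words


sent = [0b1010, 0b0001, 0b1111, 0b0110]
received = deserialize(serialize(sent))
print("loopback ok" if received == sent else "loopback FAILED")
```

On the real chip the same comparison would be done by the CPU, and a mismatch that only appears above some VCO frequency would indicate the maximum usable speed.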

Have muxes between the serdes PLLs and the digital logic, to allow switching the clock source to a Johnson counter for testing operation at low speeds.
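For reference, a small sketch of why a Johnson counter works as a slow stand-in for the multi-phase VCO: a 2-bit Johnson (twisted-ring) counter steps through 4 states, and its two bits are a quarter period apart, so the bits plus their complements give 4 phases 90deg apart. The function below is a hypothetical software model, not RTL.

```python
def johnson_counter(n_bits: int, n_steps: int):
    """Yield successive states of an n-bit Johnson (twisted-ring) counter."""
    state = [0] * n_bits
    for _ in range(n_steps):
        yield tuple(state)
        # Shift by one place and feed the inverted last bit into the first.
        state = [1 - state[-1]] + state[:-1]


for step, bits in enumerate(johnson_counter(2, 8)):
    phases = bits + tuple(1 - b for b in bits)  # 0, 90, 180, 270 degrees
    print(step, phases)
```

A 4-bit counter built the same way cycles through 8 states and would give 8 phases, matching the 8-phase PLL variant, at one eighth of the input clock frequency.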

> For such design
> often the design of the test circuit is as involved if not more involved
> than designing the circuit itself.
> You can't simply bring high frequency signals out as output as that will
> always have too much capacitive load on the output signal.

True, hence the above workarounds.
Comment 8 Staf Verhaegen 2020-09-24 16:36:25 BST
We had a meeting with the parties involved in the .18um libre-soc prototype tape-out (see https://bugs.libre-soc.org/show_bug.cgi?id=138#c12).
Unfortunately we did not see a possibility to account for this request in the already strict tape-out roadmap.
So we should see what the best way to proceed with this is. Sky130 or maybe a new .18um TSMC run are possibilities. The former should be free if you are accepted on the run; for the latter we need to find funding. As the area of the prototype is reduced, part of that money may be repurposed, but that needs to be discussed with NLnet.
Comment 9 Luke Kenneth Casson Leighton 2020-09-24 17:24:43 BST
just spotted this:
https://groups.google.com/g/skywater-pdk-users/c/A_58XfdGlMU/m/FMgsYeS0CAAJ