Bug 21 - LPDDR3/LPDDR4 needed

Summary: LPDDR3/LPDDR4 needed

Status:	CONFIRMED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Specification
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:
Blocks:	2

Reported:	2018-05-20 11:48 BST by Luke Kenneth Casson Leighton
Modified:	2020-02-24 17:22 GMT
CC List:	5 users

See Also:
NLnet milestone:	---
total budget (EUR) for completion of task and all subtasks:	0
budget (EUR) for this task, excluding subtasks' budget:	0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:

Description Luke Kenneth Casson Leighton 2018-05-20 11:48:41 BST

http://libre-riscv.org/shakti/m_class/DDR/

Comment 1 Luke Kenneth Casson Leighton 2018-05-20 11:55:31 BST

found LPDDR3 PHY through edmund (thank you!)

* LPDDR3 PHY - BSD-licensed: USD $200k
* LPDDR3 PHY layout - USD $100k.
* LPDDR3-to-LPDDR4 conversion - USD $300k

estimated time: 8-12 months as it is all analog and needs porting
to the target fab node.

edmund's team has MOSIS-like auto-generation of macro library cells:
no need to use the foundry's proprietary cells.

Comment 2 Jacob Lifshay 2020-02-24 05:49:01 GMT

(In reply to Luke Kenneth Casson Leighton from comment #1)
> found LPDDR3 PHY through edmund (thank you!)
> 
> * LPDDR3 PHY - BSD-licensed: USD $200k

any source code links?

If we do go with the proprietary DDR PHY, for the test chip it might be a good idea to include even a partial implementation of a libre PHY (maybe with a separate power domain if we're worried about shorts), and either put a pin-mux on the DDR pins or just wire the two PHYs to separate pins. The extra pins shouldn't be much of a concern for the test chip, since the major cost is the masks rather than die packaging.

For the actual DDR PHY, the only special (not standard digital) parts are:

* a VDD/2 power rail (can be provided by an external power supply)
* differential input amplifiers (some of which can be used to compare single-ended inputs with VDD/2).
* output drivers (basically high-current buffers with a high-impedance mode) -- may need series impedance matching resistors between drivers and pins.
* the memory clock PLLs (we need PLLs anyway for driving the other clock domains)
* termination resistors with transmission gates to enable/disable them
* a delay-locked-loop (DLL).
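As a rough illustration of the switchable-termination item in that list, here is a minimal Python sketch, assuming each termination is built from identical resistor legs, each switched in by a transmission gate that is treated as an ideal (zero on-resistance) switch. The 240 Ω leg value is an assumption borrowed from DDR3-style calibration, not something decided in this thread, and `termination_ohms` / `legs_for_target` are hypothetical helper names.

```python
# Hypothetical model: N identical resistor legs, each enabled by a
# transmission gate modelled as an ideal switch (real gates add on-resistance).
LEG_OHMS = 240.0  # assumed leg value (DDR3-style reference), illustration only


def termination_ohms(enabled_legs: int, leg_ohms: float = LEG_OHMS) -> float:
    """Effective resistance of `enabled_legs` equal resistors in parallel.

    Returns infinity when no legs are enabled (termination switched off)."""
    if enabled_legs < 0:
        raise ValueError("enabled_legs must be non-negative")
    if enabled_legs == 0:
        return float("inf")
    return leg_ohms / enabled_legs


def legs_for_target(target_ohms: float, max_legs: int,
                    leg_ohms: float = LEG_OHMS) -> int:
    """Pick the leg count (1..max_legs) whose parallel value is closest to target."""
    return min(range(1, max_legs + 1),
               key=lambda n: abs(termination_ohms(n, leg_ohms) - target_ohms))


if __name__ == "__main__":
    for n in range(7):
        print(n, "legs ->", termination_ohms(n), "ohms")
```

With six legs available this gives 240, 120, 80, 60, 48 and 40 Ω settings (plus "off"), which is why a small bank of switched legs is enough to cover several termination strengths.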

The only part that seems complex to design (ignoring the PLL, since we need one anyway) is the DLL. However, we may be able to get away without a DLL if we can run the PLL at 2x the transfers-per-second rate (4x or 8x the DDR clock, I can't recall which) and dynamically select between clocking on the positive or negative edge of that 2x clock. The 2x clock would only run to the flip-flops at the DDR inputs, so it shouldn't be a major power draw, since it doesn't need to go off-chip.
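The 2x-clock idea can be sketched numerically. Assuming "DDR clock" means the I/O clock CK (data moves on both edges, so transfers/s = 2 × CK), a clock at 2x the transfer rate is 4x CK, and within one bit time (unit interval, UI) it has two rising and two falling edges spaced UI/4 apart. Picking whichever edge lands nearest the centre of the data eye then leaves at most UI/8 of phase error. `clock_plan` and `best_edge` are hypothetical names, and the model ignores jitter, setup/hold windows and metastability.

```python
# Minimal timing model of "sample on the rising or falling edge of a fast clock".
# Assumption: "DDR clock" = CK, and data transfers happen on both CK edges.


def clock_plan(ddr_clock_mhz: float) -> dict:
    transfers_mts = 2 * ddr_clock_mhz      # DDR: one transfer per CK edge
    fast_clock_mhz = 2 * transfers_mts     # proposed "2x the transfer rate" clock
    return {
        "transfers_mts": transfers_mts,
        "fast_clock_mhz": fast_clock_mhz,
        "ratio_to_ck": fast_clock_mhz / ddr_clock_mhz,
    }


# The fast clock's period is UI/2, so inside one UI it has rising edges at
# 0 and 1/2 UI and falling edges at 1/4 and 3/4 UI.
EDGES_UI = [(0.0, "rising"), (0.25, "falling"), (0.5, "rising"), (0.75, "falling")]


def best_edge(eye_center_ui: float):
    """Return (edge_kind, edge_time_ui, error_ui) for the edge nearest the eye centre."""
    centre = eye_center_ui % 1.0
    best = None
    for t, kind in EDGES_UI:
        err = abs(centre - t)
        err = min(err, 1.0 - err)  # distance wraps around the UI boundary
        if best is None or err < best[2]:
            best = (kind, t, err)
    return best


if __name__ == "__main__":
    print(clock_plan(800.0))   # e.g. DDR3-1600: CK = 800 MHz
    print(best_edge(0.3))      # eye centre 0.3 UI after the reference point
```

Under this assumption the "2x" clock for an 800 MHz CK runs at 3200 MHz, which shows why it is attractive to keep it confined to the input flip-flops.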

If it turns out we need the DLL, we could try building an all-digital DLL with:
* the variable delay element being a chain of buffers, with a mux selecting the point in that chain the signal is taken from. we could also add an XOR gate to easily add another 180° phase shift.
* the phase detector being a flip-flop with D attached to one input and clk to the other. we might need an additional synchronizing flip-flop (or two) on its output.
* the rest of the DLL being an up/down counter that counts up when the phase detector signals "too early" and counts down for "too late", with the count selecting how many delay stages the delay element enables.

we would need to design it to avoid glitches, since those would mess up the DLL output.
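The all-digital DLL described above can be modelled behaviourally in a few lines. This is a minimal sketch under stated assumptions, not a circuit: every buffer has the same fixed delay, the phase detector flip-flop samples the delayed clock (D) on the reference clock edge (clk) with no metastability, and the up/down counter moves one tap per update. `DigitalDLL` and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DigitalDLL:
    period_ps: float        # reference clock period T
    stage_delay_ps: float   # assumed (fixed) delay of one buffer in the chain
    n_stages: int           # chain length; the mux can select taps 0..n_stages
    invert: bool = False    # XOR with 1 -> extra 180 degree (T/2) shift
    tap: int = 0            # up/down counter value = selected mux tap

    def delay_ps(self) -> float:
        extra = self.period_ps / 2 if self.invert else 0.0
        return self.tap * self.stage_delay_ps + extra

    def phase_detector(self, ref_offset_ps: float) -> bool:
        """Flip-flop with D = delayed clock, clk = reference clock.

        True ("too early") if the delayed clock is already high when the
        reference edge arrives, i.e. it rose within the preceding half period."""
        since_rise = (ref_offset_ps - self.delay_ps()) % self.period_ps
        return since_rise < self.period_ps / 2

    def step(self, ref_offset_ps: float) -> None:
        # Count up (add delay) when too early, down when too late, one tap per
        # update; saturate at the ends of the chain.
        if self.phase_detector(ref_offset_ps):
            self.tap = min(self.tap + 1, self.n_stages)
        else:
            self.tap = max(self.tap - 1, 0)


if __name__ == "__main__":
    dll = DigitalDLL(period_ps=1250.0, stage_delay_ps=20.0, n_stages=64)
    for _ in range(100):
        dll.step(ref_offset_ps=300.0)
    print("locked tap:", dll.tap, "delay:", dll.delay_ps(), "ps")
```

Once locked, this bang-bang loop dithers between two adjacent taps around the target delay. If the target is more than half a period away, the counter would run into tap 0 and stay there; enabling the XOR inversion shifts the starting point by T/2 so the loop can still converge, which is one concrete use for that extra 180° option.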

Comment 3 Luke Kenneth Casson Leighton 2020-02-24 08:23:18 GMT

(In reply to Jacob Lifshay from comment #2)


> any source code links?

no.  it can be *made* libre on receipt of funds.

> If we do go with the proprietary DDR PHY, for the test chip it might be a
> good idea to include even a partial implementation of a libre PHY (maybe
> with a separate power domain if we're worried about shorts) 

all DDR PHYs use separate power domains: the voltage levels are different from the core (and other IO).

> and include a
> pin-mux on the DDR pins, or just include both of them wired to separate
> pins, the extra pins shouldn't be too much of a concern for the test chip

it is far too much work and far too costly (1000% more costly) to do a DDR PHY for the 180nm test chip, and 180nm cannot cope with DDR3 anyway.

SDRAM (133 MHz max) is the limit there, and it is doable.
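To put the 133 MHz SDRAM limit in perspective, peak theoretical bandwidth is clock × transfers per clock × bus width in bytes. The sketch below assumes a 16-bit memory bus purely for illustration and compares against DDR3-1600 (800 MHz CK) as a hypothetical reference point; real sustained bandwidth is lower once refresh, row activation and bus turnaround are accounted for. `peak_bandwidth_mb_s` is a hypothetical helper name.

```python
def peak_bandwidth_mb_s(clock_mhz: float, transfers_per_clock: int,
                        bus_width_bits: int) -> float:
    # MHz x transfers per clock x bytes per transfer = MB/s (decimal megabytes)
    return clock_mhz * transfers_per_clock * bus_width_bits / 8


BUS_BITS = 16  # assumed bus width, for illustration only

sdr = peak_bandwidth_mb_s(133, 1, BUS_BITS)    # SDR SDRAM at the 133 MHz limit
ddr3 = peak_bandwidth_mb_s(800, 2, BUS_BITS)   # hypothetical DDR3-1600 comparison

if __name__ == "__main__":
    print(f"SDR-133: {sdr:.0f} MB/s, DDR3-1600: {ddr3:.0f} MB/s")
```

On the same assumed 16-bit bus, that is roughly 266 MB/s versus 3200 MB/s, about a 12x gap, which is the trade-off being accepted for the 180nm test chip.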

Comment 4 Jacob Lifshay 2020-02-24 08:30:59 GMT

(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Jacob Lifshay from comment #2)
> > any source code links?
> 
> no.  it can be *made* libre on receipt of funds.

ah, missed that.

> > If we do go with the proprietary DDR PHY, for the test chip it might be a
> > good idea to include even a partial implementation of a libre PHY (maybe
> > with a separate power domain if we're worried about shorts) 
> 
> all DDR PHYs use separate power domains,  the voltage levels are different
> from core (and other IO).

I had meant a separate power domain from even the other DDR interface such that if there was a problem, it wouldn't prevent the chip from powering on.

> SDRAM (133mhz max) is the limit there and it is doable.

Yeah, makes sense.

Oh well, my ideas for the DLL might be useful for something else :)

Comment 5 Luke Kenneth Casson Leighton 2020-02-24 08:49:27 GMT

(In reply to Jacob Lifshay from comment #4)

> Oh well, my ideas for the DLL might be useful for something else :)

when we have large funding, yes.

SDRAM is asynchronous and is basically XT bus aka AT Bus aka 8080 MCU bus aka FlexBus aka IDE Bus aka PCMCIA bus aka CompactFlash Bus aka ONFI NAND Bus.

all of these are literally the same fundamental async bus, all from the same era (IBM / Intel), using WEN, REN, CS# wires etc etc etc etc. it just got faster and hit a practical limit of around 133 MHz.

DDR2 went synchronous and clock-driven, and that's when it got complicated.

Comment 6 Michael Nolan 2020-02-24 14:13:28 GMT

(In reply to Luke Kenneth Casson Leighton from comment #5)
> (In reply to Jacob Lifshay from comment #4)
> 
> > Oh well, my ideas for the DLL might be useful for something else :)
> 
> when we have large funding, yes.
> 
> SDRAM is asynchronous and is basically XT bus aka AT Bus aka 8080 MCU bus
> aka FlexBus aka IDE Bus aka PCMCIA bus aka CompactFlash Bus aka ONFI NAND
> Bus.
> 
> all of these are literally the same fundamental async bus all from the same
> era (IBM / Intel) using WEN REN CS# wires etc etc etc etc. it just got
> faster and hit a practical limit of around 133 mhz.
> 
> DDR2 went synchronous clock driven and that's when it got complicated.

Nit: both SDR SDRAM and DDR SDRAM are synchronous (that's the S in SDRAM).

On a more constructive note, would it be feasible to do DDR1 or DDR2 ourselves? IIRC the interface is a bit simpler:
 - Slower speed
 - No differential IOs (not counting the clock) on DDR1 at least
 - Data signals are terminated with resistors to VCC not VCC/2

I think we'd still need a DLL or PLL, though Jacob's suggestion of using a PLL at 2x the transfer frequency would be easier here than for DDR3/4.

Comment 7 Luke Kenneth Casson Leighton 2020-02-24 14:24:54 GMT

(In reply to Michael Nolan from comment #6)

> Nit: both SDR SDRAM and DDR SDRAM are synchronous (that's the S in SDRAM).

ah, appreciate the correction.
 
> On a more constructive note, would it be feasible to do DDR1 or DDR2
> ourselves?

not in the timescales we have.

l.

Comment 8 Luke Kenneth Casson Leighton 2020-02-24 14:44:11 GMT

(In reply to Luke Kenneth Casson Leighton from comment #7)
> (In reply to Michael Nolan from comment #6)
> > On a more constructive note, would it be feasible to do DDR1 or DDR2
> > ourselves?
> 
> not in the timescales we have.

plus, it's essential that we keep everything that we're doing in the
digital domain.

any analog work - PLLs, resistors - is a risk we can't afford to take:
the resulting analog component would be a massively critical
dependency, and its failure would jeopardise 100% of the entire
design.

and we don't have an allocated budget - or the time - to put it in
anyway.

october 2020.

that's 7 months in which to finish the core, and the instruction issue
engine, and the FPU, and the peripherals, and the pinmux, and the L1
cache, and, and, and, and *then do the layout*.

Comment 9 Michael Nolan 2020-02-24 17:22:40 GMT

(In reply to Luke Kenneth Casson Leighton from comment #8)

> > not in the timescales we have.
> 
> plus, it's essential that we keep everything that we're doing in the
> digital domain.
> 
> any analog work - PLLs, resistors - we can't afford to take the risk
> (and for the resultant analog design component to be a massively
> critical dependency, without which success jeapordises 100% of the
> entire design)
> 
> and we don't have an allocated budget - or the time - to put it in
> anyway.
> 

Ah ok, thanks for the clarification.