The current implementation of gram is bistable. It has a working state and a non-working state, and one of them is entered at random when the FPGA clock tree is reset. If the FPGA enters the working state, memory accesses are reliable until another reset (or reprogram) is issued. At that point the device has a 50% chance of entering the non-working state. Similarly, in the non-working state the device has a 50% chance of entering the working state on reset / reprogram.

After significant debugging effort, the root cause has been located. In a nutshell, the DDR3 interface blocks require two aligned clocks, the ECLK (DDR clock) and the SCLK (SDR clock). In the working state, the transmitter blocks are aligned with the receiver blocks. In the non-working state they are misaligned by 1T (1 ECLK period, or 1/2 SCLK period).

The ECP5 relies on a single master reset wire, shared among all DDR primitives in a specific logical controller, to synchronize the interface blocks at startup. While this reset wire has been plumbed to the data I/O blocks, it is absent from the address / command blocks, where it is hardwired by nmigen to 0. This means the address / command generator can be up to 1/2 SCLK out of alignment with the rest of the system. Whether this unwanted phase delay is applied is decided at random at startup by stochastic analog behavior.
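A minimal behavioural model in Python may help make the 1T figure and the 50% odds concrete. It is a sketch under stated assumptions, not ECP5 timing. Each X2 gearbox is assumed to serialize one 4-bit SCLK word as two bits per ECLK period. A gearbox that never sees the shared reset is assumed to settle into ECLK phase 0 or 1 with equal probability. The names `serialize`, `startup_phase` and `aligned_fraction` are hypothetical.

```python
import random

BITS_PER_ECLK = 2   # DDR: two bit slots per ECLK period
BITS_PER_SCLK = 4   # X2 gearbox: four bits per SCLK period (SCLK = ECLK / 2)

def serialize(words, phase):
    """Flatten 4-bit SCLK words into a DDR bit stream (simplified model).

    `phase` (0 or 1) is the gearbox's ECLK phase relative to SCLK. Phase 1
    delays the whole stream by one ECLK period (1T = 1/2 SCLK period).
    Idle bit slots are shown as None."""
    stream = [None] * (phase * BITS_PER_ECLK)
    for word in words:
        if len(word) != BITS_PER_SCLK:
            raise ValueError("each SCLK word must carry exactly 4 bits")
        stream.extend(word)
    return stream

def startup_phase(has_reset, rng):
    """Phase a gearbox settles into after clock tree reset.

    Model assumption: released by the shared reset -> always phase 0,
    otherwise (RST tied to 0) a fair coin flip."""
    return 0 if has_reset else rng.randrange(2)

def aligned_fraction(cmd_has_reset, trials=10_000, seed=0):
    """Fraction of simulated startups where data and addr/cmd gearboxes agree."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        dq = startup_phase(True, rng)             # data I/O: reset is plumbed
        cmd = startup_phase(cmd_has_reset, rng)   # addr/cmd: reset missing?
        ok += int(dq == cmd)
    return ok / trials

if __name__ == "__main__":
    print(serialize([[1, 0, 1, 1]], 1))            # [None, None, 1, 0, 1, 1]
    print(aligned_fraction(cmd_has_reset=True))    # 1.0
    print(aligned_fraction(cmd_has_reset=False))   # close to 0.5
```

With the address / command reset tied off, roughly half of the simulated startups come up shifted by one ECLK, which mirrors the observed bistability. Feeding the shared reset to every block removes the coin flip.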
(In reply to tpearson from comment #0)
> The ECP5 relies on a single master reset wire, shared among all DDR
> primitives in a specific logical controller, to synchronize the interface
> blocks at startup. While this reset wire has been plumbed to the data I/O
> blocks, it is absent from the address / command blocks where it is hardwired
> by nmigen to 0.

urrr.

do you happen to know of any other implementation that gets this right? if i have something to work from i can take a look.
(In reply to Luke Kenneth Casson Leighton from comment #1)
> (In reply to tpearson from comment #0)
>
> > The ECP5 relies on a single master reset wire, shared among all DDR
> > primitives in a specific logical controller, to synchronize the interface
> > blocks at startup. While this reset wire has been plumbed to the data I/O
> > blocks, it is absent from the address / command blocks where it is hardwired
> > by nmigen to 0.
>
> urrr.
>
> do you happen to know of any other implementation that gets this right?
> if i have something to work from i can take a look.

LiteDRAM gets it right:

https://github.com/enjoy-digital/litedram/blob/15f7ba27138367f21832e5c00e7882db8a6fab54/litedram/phy/ecp5ddrphy.py#L229
(In reply to tpearson from comment #2)
> LiteDRAM gets it right:
>
> https://github.com/enjoy-digital/litedram/blob/15f7ba27138367f21832e5c00e7882db8a6fab54/litedram/phy/ecp5ddrphy.py#L229

got it. i know what to do now.

https://gitlab.com/nmigen/nmigen/-/issues/2
part of the solution here is to take the rather drastic but necessary step of altering the nmigen API by adding a reset line to the Pin data structure. there's really no other way to get the reset signal down to the DDR Instances that need it.

whilst investigating i noticed that the assumption that the reset pads are "straight" (xdr=1) is wrong: they're also supposed to be 4x phased (xdr=4). that's quite fascinating, because it means that rising and falling edge *reset* lines do different things inside the DRAM IC.

i've now wired up all the DRAM pad resets to ResetSignal("dramsync"), which in theory should give stability. an early check showed that yes, they were locking much more often. making rst xdr=4 should also help: the previous xdr=1 setting meant that half the IO pads were not being properly reset at all (hm)
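the intent of the API change can be sketched in plain Python. the `Pin` dataclass below is a hypothetical stand-in, NOT nmigen's actual Pin class, and the signal names are invented strings (`dramsync_rst` stands in for ResetSignal("dramsync")). the ODDRX2F port names (D0-D3, ECLK, SCLK, RST, Q) are the ECP5 primitive's; everything else is illustrative. the point is that once the pin carries a `rst` field, the code building each ODDRX2F Instance can pass the shared reset through as `i_RST` instead of tying it to 0, and can reject pins that are not xdr=4.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Pin:
    """Hypothetical stand-in for a platform pin record (NOT nmigen's real Pin)."""
    name: str
    xdr: int                      # 1 = straight, 2 = DDR, 4 = X2 gearbox
    o: List[str]                  # one data signal name per gearbox phase
    rst: Optional[str] = None     # proposed new field: the shared reset
    eclk: str = "dramsync_eclk"   # hypothetical fast (DDR) clock name
    sclk: str = "dramsync_sclk"   # hypothetical slow (SDR) clock name

def oddrx2f_ports(pin: Pin) -> dict:
    """Return keyword arguments in the style of Instance("ODDRX2F", **ports).

    Signals are plain name strings so the sketch runs without nmigen."""
    if pin.xdr != 4 or len(pin.o) != 4:
        raise ValueError(f"{pin.name}: ODDRX2F needs xdr=4 and four data phases")
    if pin.rst is None:
        # without a reset the gearbox phase is left to chance (the 1T bug)
        raise ValueError(f"{pin.name}: no reset plumbed to the gearbox")
    ports = {
        "i_ECLK": pin.eclk,
        "i_SCLK": pin.sclk,
        "i_RST": pin.rst,   # same reset for data AND address/command pads
        "o_Q": pin.name,
    }
    for i, sig in enumerate(pin.o):
        ports[f"i_D{i}"] = sig   # D0..D3: one bit per phase of the SCLK word
    return ports

if __name__ == "__main__":
    for n in ("dram_a0", "dram_ras_n"):
        pad = Pin(n, xdr=4, o=[f"{n}_p{p}" for p in range(4)], rst="dramsync_rst")
        print(n, oddrx2f_ports(pad)["i_RST"])   # every pad shares one reset
```

routing every address / command and data pad through one helper like this is one way to make sure they all share the same reset: a pad with no reset fails at build time instead of showing up later as a random 1T offset.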