Hi,
We have had an issue similar to the one in the linked topic, in which some of our boards exhibit consistent byte corruption.
Specifically, we have five 16-bit-wide DRAM chips (one reserved for 8-bit ECC), and for every 32 bytes of data read, the 7th byte (the lower byte on chip 4) experiences spurious bit flips.
We came across the linked post and investigated the DXnLCDLRn registers, and found that between a "good" board and a "bad" board there were differences in R0DQSGD and R0DGSL for the byte lane experiencing the bit flipping (registers DX6LCDLR2 and DX6GTR). Increasing R0DQSGD by 0x10 could induce the bit flipping in more bytes (i.e. bytes 15, 23 and 31), while reducing it by 0x10 would stabilise those bytes. We also saw the "bad" board stabilise when we set these fields to the values obtained from the "good" board.
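For reference, this is roughly how we have been reading and adjusting the field during the experiment above. It is a bare-metal sketch only: the base address, register offset and field layout are placeholders we have assumed for illustration (the real values come from the reference manual), and it assumes an 8-bit R0DQSGD field in bits [7:0] of DX6LCDLR2, which may not match this PHY revision.

```c
#include <stdint.h>

/* Placeholder address/offset -- substitute the values from the SoC
 * reference manual; these are assumptions for illustration only. */
#define DDR_PHY_BASE    0x01234000u               /* hypothetical */
#define DX6LCDLR2       (DDR_PHY_BASE + 0x2E0u)   /* hypothetical offset */

#define R0DQSGD_SHIFT   0       /* assuming R0DQSGD occupies bits [7:0] */
#define R0DQSGD_MASK    0xFFu

static inline uint32_t reg_read(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

static inline void reg_write(uintptr_t addr, uint32_t val)
{
    *(volatile uint32_t *)addr = val;
}

/* Read-modify-write of the rank 0 read DQS gating delay for byte lane 6,
 * i.e. the +/- 0x10 experiment described above. */
static void adjust_dx6_dqsgd(int32_t delta)
{
    uint32_t reg   = reg_read(DX6LCDLR2);
    uint32_t dqsgd = (reg >> R0DQSGD_SHIFT) & R0DQSGD_MASK;

    dqsgd = (uint32_t)((int32_t)dqsgd + delta) & R0DQSGD_MASK;

    reg &= ~(R0DQSGD_MASK << R0DQSGD_SHIFT);
    reg |= dqsgd << R0DQSGD_SHIFT;
    reg_write(DX6LCDLR2, reg);
}
```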
Attached are annotated memory dumps of the DDR PHY registers of the "good" board and the "bad" board. The orange-highlighted values can be ignored; they are just where our tool has detected a value changing live while the window is open.
This is the "Good" board
This is the "Bad" board
We are particularly interested in the exact meaning of, and the relationships between, the highlighted registers and the fields within them, as the datasheet isn't 100% clear on how they relate to timings.
- The Write Levelling Delay field (DXnLCDLR0.R0WLD) in both cases shows a general increase in value that mirrors each lane's distance from the SoC along the fly-by routed address lines (see the read-out sketch after this list for how we are extracting these per-lane values).
- What does this field represent in terms of units? e.g. whole/half/quarter clock cycles, picoseconds/nanoseconds, etc.
- The Read DQS Gating Delay field (DXnLCDLR2.R0DQSGD) also shows a general increase, but seems to be linked with the DQS Gating System Latency field (DXnGTR.R0DGSL).
- The R0DGSL field is clearly documented as whole clock cycles of latency, but what does the R0DQSGD field represent in terms of units? As above, is it clock-cycle or time related?
- During operation, the Master Delay Line register field MDL Delay (DXnMDLR.MDLD) was repeatedly being updated, which is understandable as master delay line calibration was enabled.
- As above, does the MDL value have a specific unit of measurement?
- The values seemed to jump around quite a lot; what is the typical magnitude of the delay line corrections? e.g. typically a few LSBs, or jumps of the size we observed?
- The existence of the Master Delay Line suggests that there is a global delay line to which the Read/Write/Gating delays are related.
- How does the Master Delay Line relate to the other delays? Is it an initial delay common to all of them, or common to only some?
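For completeness, this is roughly how we are extracting the per-lane values referred to in the list above, dumped over our debug console. It is only a sketch: the base address, per-lane stride, register offsets, field positions and lane count below are assumptions for illustration (the real ones come from the reference manual), so please correct us if the field widths differ on this PHY revision.

```c
#include <stdint.h>
#include <stdio.h>

/* All addresses, offsets and bit positions below are assumptions for
 * illustration only; the real values come from the SoC reference manual. */
#define DDR_PHY_BASE    0x01234000u   /* hypothetical PHY base address      */
#define DX_STRIDE       0x100u        /* hypothetical per-lane (DXn) stride */
#define DX_LCDLR0_OFF   0x00u         /* hypothetical DXnLCDLR0 offset      */
#define DX_LCDLR2_OFF   0x08u         /* hypothetical DXnLCDLR2 offset      */
#define DX_GTR_OFF      0x0Cu         /* hypothetical DXnGTR offset         */
#define DX_MDLR_OFF     0x10u         /* hypothetical DXnMDLR offset        */

static inline uint32_t reg_read(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

/* Dump the per-lane fields we have been comparing between boards.
 * Assumed field layout: 8-bit R0WLD/R0DQSGD/MDLD, 3-bit R0DGSL, with
 * MDLD in bits [23:16] of DXnMDLR. Lane count assumed to be DX0..DX8;
 * adjust to the actual configuration. */
void dump_lane_delays(void)
{
    for (unsigned lane = 0; lane < 9; lane++) {
        uintptr_t base   = DDR_PHY_BASE + lane * DX_STRIDE;
        uint32_t  lcdlr0 = reg_read(base + DX_LCDLR0_OFF);
        uint32_t  lcdlr2 = reg_read(base + DX_LCDLR2_OFF);
        uint32_t  gtr    = reg_read(base + DX_GTR_OFF);
        uint32_t  mdlr   = reg_read(base + DX_MDLR_OFF);

        printf("DX%u: R0WLD=0x%02x R0DQSGD=0x%02x R0DGSL=%u MDLD=0x%02x\n",
               lane,
               (unsigned)(lcdlr0 & 0xFFu),
               (unsigned)(lcdlr2 & 0xFFu),
               (unsigned)(gtr & 0x7u),
               (unsigned)((mdlr >> 16) & 0xFFu));
    }
}
```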
Additionally, we are considering using precalculated values for some of these calibration fields (a sketch of what we have in mind follows the questions below).
- What is the recommended point at which to write these values: pre- or post-initialisation?
- Should we still run the calibration at initialisation if we are using these values?
- Will the PHY use the supplied values as the calibration starting point, or will it discard them?
- What is the maximum delta that the VT compensation can apply to our supplied field values?
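To make the "precalculated values" idea concrete, below is a sketch of what we have in mind: restoring a saved set of per-lane delay fields around the normal initialisation sequence. The addresses and offsets are again placeholder assumptions, and whether this restore should happen before or after the PHY init/training trigger (and whether training should run at all) is exactly what we are asking above.

```c
#include <stdint.h>

/* Saved per-lane calibration values captured from a known-good run. */
struct lane_cal {
    uint32_t lcdlr0;   /* write levelling delay (R0WLD)      */
    uint32_t lcdlr2;   /* read DQS gating delay (R0DQSGD)    */
    uint32_t gtr;      /* DQS gating system latency (R0DGSL) */
};

/* Hypothetical placeholders -- real values from the reference manual. */
#define DDR_PHY_BASE    0x01234000u
#define DX_STRIDE       0x100u
#define DX_LCDLR0_OFF   0x00u
#define DX_LCDLR2_OFF   0x08u
#define DX_GTR_OFF      0x0Cu

static inline void reg_write(uintptr_t addr, uint32_t val)
{
    *(volatile uint32_t *)addr = val;
}

/* Write the precalculated values back into the PHY, one byte lane at a
 * time. Whether calibration/training should then be triggered, and whether
 * it would overwrite these values, is what we are asking in the questions
 * above. */
void restore_precalculated(const struct lane_cal cal[], unsigned num_lanes)
{
    for (unsigned lane = 0; lane < num_lanes; lane++) {
        uintptr_t base = DDR_PHY_BASE + lane * DX_STRIDE;
        reg_write(base + DX_LCDLR0_OFF, cal[lane].lcdlr0);
        reg_write(base + DX_LCDLR2_OFF, cal[lane].lcdlr2);
        reg_write(base + DX_GTR_OFF,    cal[lane].gtr);
    }
}
```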
Finally, for determining stable values for our use case, are there any recommended procedures we can follow?
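Failing a documented procedure, our current idea is a per-lane sweep: step R0DQSGD around the trained value, run our own memory stress test at each point, and centre the final value in the passing window. A rough sketch follows; set_dx6_dqsgd() and memory_stress_test_passes() are our own hypothetical helpers, and it assumes a single contiguous passing window that stays within the field's valid range. Would this be a reasonable approach, or is there a vendor-recommended method?

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: a field write like the earlier sketch, and our own
 * read/write stress test over the affected region. */
extern void set_dx6_dqsgd(uint32_t value);
extern bool memory_stress_test_passes(void);

/* Sweep R0DQSGD from (trained - span) to (trained + span) in 'step'
 * increments, recording the first and last passing values so we can pick
 * a final value centred in the window with margin for VT drift. */
void find_dqsgd_window(int32_t trained, int32_t span, int32_t step,
                       int32_t *lo, int32_t *hi)
{
    int32_t first_pass = trained;
    int32_t last_pass  = trained;
    bool    seen_pass  = false;

    for (int32_t v = trained - span; v <= trained + span; v += step) {
        set_dx6_dqsgd((uint32_t)v);
        if (memory_stress_test_passes()) {
            if (!seen_pass) {
                first_pass = v;
                seen_pass  = true;
            }
            last_pass = v;
        }
    }

    *lo = first_pass;
    *hi = last_pass;
    /* Final choice would be roughly (*lo + *hi) / 2, re-checked over
     * temperature so VT compensation only has to track drift inside
     * the measured window. */
}
```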
Many thanks for taking the time to read through this post. I can provide more information if necessary, and I hope we can find answers to some of these questions.