AM5K2E04: DDR3 DXnLCDLRn and associated Registers, field interpretation

Daniel Moran

Part Number: AM5K2E04

Hi,

We have had an issue similar to the linked topic, in which some of our boards exhibit consistent corruption of a single byte.

Specifically, we have five 16-bit-wide DRAM chips configured (one reserved for the 8-bit ECC), and in every 32 bytes of data read, the 7th byte (the lower byte on chip 4) experienced spurious bit flips.

We came across the linked post and investigated the DXnLCDLRn registers. Comparing a "good" board with a "bad" board, we could see differences in R0DQSGD and R0DGSL for the byte lane with the bit flips (registers DX6LCDLR2 and DX6GTR). Increasing R0DQSGD by 0x10 could make the bit flips appear in more bytes (bytes 15, 23 and 31), while decreasing it by 0x10 stabilised those bytes. Setting these values on the "bad" board to those read from the "good" board also stabilised it.
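The ±0x10 experiment is a read-modify-write of a single bit field. Below is a minimal sketch that operates on a plain 32-bit register value rather than on hardware. The field position (R0DQSGD in bits [7:0] of DXnLCDLR2) is an assumption to check against the register description in SPRUHN7, and all names are hypothetical.

```python
# ASSUMPTION: R0DQSGD occupies bits [7:0] of DXnLCDLR2; verify in SPRUHN7.
R0DQSGD_SHIFT = 0
R0DQSGD_WIDTH = 8


def get_field(reg: int, shift: int, width: int) -> int:
    """Extract an unsigned bit field from a 32-bit register value."""
    mask = (1 << width) - 1
    return (reg >> shift) & mask


def set_field(reg: int, shift: int, width: int, value: int) -> int:
    """Return reg with one field replaced; all other bits are preserved."""
    mask = (1 << width) - 1
    if not 0 <= value <= mask:
        raise ValueError(f"value 0x{value:X} does not fit in {width} bits")
    return (reg & ~(mask << shift) & 0xFFFFFFFF) | (value << shift)


def nudge_field(reg: int, shift: int, width: int, delta: int) -> int:
    """Add delta to a field, clamping to the field's range instead of wrapping."""
    mask = (1 << width) - 1
    new = min(max(get_field(reg, shift, width) + delta, 0), mask)
    return set_field(reg, shift, width, new)


if __name__ == "__main__":
    dx6lcdlr2 = 0x00000058  # example value only, not taken from the attached dumps
    print(hex(nudge_field(dx6lcdlr2, R0DQSGD_SHIFT, R0DQSGD_WIDTH, -0x10)))  # 0x48
```

Clamping is a deliberate choice here. A silent wrap of the fine delay would jump the gate by almost a full period rather than by 0x10 codes.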

Attached are annotated memory dumps of the DDR PHY registers of the "good" board and "bad" board. The orange highlighted values can be ignored as they are just where our tool has detected a live change while the window is open.

This is the "Good" board

This is the "Bad" board

We are particularly interested in the exact meaning of, and relationships between, the highlighted registers and fields, as the datasheet is not entirely clear about how they relate to timings.

The Write Levelling Delay field (DXnLCDLR0.R0WLD), in both cases, shows a general increase in value which mirrors the distance away from the SoC along the fly-by routed address lines.
1. What does the field represent (generally) in terms of units? For example, is it whole/half/quarter clock cycles, picoseconds/nanoseconds, etc.?
The Read DQS Gating Delay field (DXnLCDLR2.R0DQSGD), also follows a general increase, but seems to be linked with the DQS Gating System Latency field (DXnGTR.R0DGSL).
1. The R0DGSL field is clearly documented as whole clock cycles of latency, but what does the R0DQSGD field represent in terms of units? As above, is it related to clock cycles or to time?
During operation, the Master Delay Line register field MDL Delay (DXnMDLR.MDLD) was repeatedly being updated; which is understandable as the master delay line calibration was enabled.
1. As above, does the MDL value have a specific unit of measurement?
2. The values seemed to jump around quite a lot. What is the typical magnitude of the delay line corrections: a few LSBs, or as large as we observed?
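One way to put a number on "jump around quite a lot" is to read DXnMDLR.MDLD repeatedly, for example with the same tool that produced the dumps, and summarise the spread in LSBs. The following is a minimal sketch. How the samples are collected is left out, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def summarize_mdl(samples: list[int]) -> dict:
    """Summarise repeated readings of an MDLD field value, in LSBs."""
    if not samples:
        raise ValueError("no samples")
    return {
        "min": min(samples),
        "max": max(samples),
        "span": max(samples) - min(samples),  # peak-to-peak movement
        "mean": mean(samples),
        "stdev": pstdev(samples),  # population standard deviation
    }
```

Logging these figures alongside board temperature would show whether the movement tracks environmental change or is noise between calibration updates.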
The existence of the Master Delay line suggests that there is a global delay line which the Read/Write/Gating delays are related to.
1. How does the Master Delay line relate to the other Delays? Is the Master Delay line like an initial delay common to all delays? Is it common to only some delays?

Additionally, we are considering the use of precalculated values for some of these calibrations fields.

What is the recommended point at which to write these values: pre- or post-initialisation?
Should we run the calibration at initialisation if we are using these values?
1. Will the PHY use the supplied values as the calibration starting point, or will it discard the values?
What is the maximum magnitude of the deltas that the VT compensation can make to our supplied field values?

To determine stable values for our use case, are there any recommended procedures we can follow?

Many thanks for taking the time to read through this post. I can provide more information if necessary, and I hope we can find answers to some of these questions.

kcastille

Hello,

Sorry for the delay. Can you summarize your status or progress since this post?

Can you also summarize project status? Is this a new design? Or previously in production? Any recent changes?

Thanks,

Kyle

Daniel Moran, in reply to kcastille

Hello Kyle,

Sorry for the delay in responding.

We have multiple boards, and multiple instances of those boards. Our hardware team will make every effort to keep the DDR track layout the same (or as close to the same as possible) across the different board types. We are currently in the mid-to-late prototyping stage, with one, maybe two, revisions to go before the design is fixed.

We saw this issue when we had a second batch of an existing design produced; we think it may have been a combination of a marginally stable design and variations in manufacture which resulted in the first batch working, and the second batch being unstable.

Our current provisional plan is to write a set of known-good values to the aforementioned registers, with the values identical across all board types and instances. The values would be obtained by observing a "stable" auto-configuration.
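If the known-good values are derived from several observed trainings, one hypothetical way to pick them (not a TI-recommended procedure) is to fold R0DGSL and R0DQSGD into a single gate position before choosing a representative value. This keeps a 0x7F→0x00 rollover from skewing the result. The sketch below assumes that one R0DGSL step equals `period_code` R0DQSGD codes. That is unconfirmed, although the GDQSPRD field mentioned later in this thread may be a candidate source for `period_code`. All names are hypothetical.

```python
def to_linear(dgsl: int, dqsgd: int, period_code: int) -> int:
    """Fold (R0DGSL, R0DQSGD) into one gate position in fine-delay codes.

    ASSUMPTION: one R0DGSL step equals period_code R0DQSGD codes.
    """
    return dgsl * period_code + dqsgd


def pick_gate_setting(runs: list[tuple[int, int]], period_code: int) -> tuple[tuple[int, int], int]:
    """Return a representative (R0DGSL, R0DQSGD) pair and the spread of the runs.

    runs: pairs read back after training runs that passed memory tests.
    Working on the combined position keeps (0x2, 0x7F) and (0x3, 0x00)
    adjacent instead of 0x7F codes apart.
    """
    if not runs:
        raise ValueError("need at least one training run")
    positions = sorted(to_linear(d, q, period_code) for d, q in runs)
    mid = positions[(len(positions) - 1) // 2]  # lower median: always an observed value
    spread = positions[-1] - positions[0]  # in fine-delay codes
    dgsl, dqsgd = divmod(mid, period_code)
    return (dgsl, dqsgd), spread
```

The spread is worth recording per lane and per board. A lane whose trained position varies widely between runs is the one least likely to be well served by a single fixed value.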

Because we do not fully understand these registers and their purpose, we do not have full confidence in this provisional plan. Could you provide some guidance on the questions asked in the original post?

Many thanks

Daniel

Kevin S

Hi,

Daniel Moran said:
The existence of the Master Delay line suggests that there is a global delay line which the Read/Write/Gating delays are related to.

How does the Master Delay line relate to the other Delays? Is the Master Delay line like an initial delay common to all delays? Is it common to only some delays?

The master delay line is used to track VT variation. This information is then used to adjust the settings of the other delay lines.
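A common way to use a master delay line for VT tracking is proportional rescaling. This is an illustrative model, not this PHY's documented algorithm. Suppose the MDL needs N elements to span one period at training time and N' elements now. Each local delay-line code is then scaled by N'/N, so that its delay in time stays constant. Below is a minimal sketch with integer rounding. The function name and the 8-bit clamp are assumptions.

```python
def vt_compensate(trained_code: int, mdl_at_training: int, mdl_now: int,
                  max_code: int = 0xFF) -> int:
    """Rescale a local delay-line code so its delay in time is unchanged.

    Assumed model: if elements get slower (fewer needed per period), every
    local code shrinks by the same ratio, and vice versa.
    """
    if mdl_at_training <= 0:
        raise ValueError("MDL measurement must be positive")
    # Integer round-half-up of trained_code * mdl_now / mdl_at_training.
    scaled = (trained_code * mdl_now + mdl_at_training // 2) // mdl_at_training
    return min(max(scaled, 0), max_code)
```

Under this model, each correction is proportional to the trained code. For the same MDL drift, a large R0DQSGD therefore moves by more LSBs than a small one.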

Daniel Moran said:
What does the field represent (generally) in terms of units? i.e. is it whole/half/quarter clock cycles, picoseconds/nanoseconds, etc.

Generally speaking, delay lines will be made up of delay elements. Typically, the time for the signal to propagate through the delay element would be significantly less than the clock period such that the signal could be delayed by fractions of a clock cycle. Then to achieve the desired delay, some number of delay elements can be "turned on".

Generally, delay lines are "calibrated" to determine how many delay elements are equivalent to one clock cycle. Based on this information, the PHY or user can programmatically set the desired delay. For instance, you would typically want the DQ/DQS timing relationship to be shifted by about 1/4 of the DRAM clock period. Section 4.65 of the Keystone 2 DDR3 Memory Controller User Guide (https://www.ti.com/lit/pdf/spruhn7) states:

"The write data delay (WDQD) and the read DQS delay (RDQSD) are automatically derived from the measured period during calibration. WDQD and RDQSD correspond to a 90 degrees phase shift for DQ during writes and DQS during reads, respectively. The 90 degrees phase shift is used to centre DQS into the write and read data eyes. A 90 degrees phase shift is equivalent to half the DDR clock period. After calibration WDQD and RDQSD fields will contain a value that corresponds to half the DDR clock period (or a quarter of the SDRAM clock period)."

In other words, after training, the values in DXnLCDLR1 should correspond to about 1/4 of the SDRAM clock period. However, the actual value depends on the period measured during calibration.
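As a worked example, take the 1600 MT/s data rate mentioned later in the thread. The SDRAM clock is then 800 MHz, so the expected 90-degree (quarter SDRAM clock) shift is:

```latex
\begin{aligned}
f_{\mathrm{CK}} &= \tfrac{1600\ \mathrm{MT/s}}{2} = 800\ \mathrm{MHz} \\
t_{\mathrm{CK}} &= \tfrac{1}{800\ \mathrm{MHz}} = 1250\ \mathrm{ps} \\
\tfrac{1}{4}\,t_{\mathrm{CK}} &= 312.5\ \mathrm{ps}
\end{aligned}
```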

This indicates that the register value itself does not correspond directly to a fraction of a clock cycle, but rather likely corresponds to a code used to determine how many delay elements to "turn on". As far as I am aware, this would also be true for the LCDLs corresponding to the write leveling delay and the read DQS gating delay. Note that there are parameters "GDQSPRD" and "WLPRD" found in the DATX8 General Status Register (DXnGSR0) which seem to indicate the code equivalent to the DDR clock period.
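If GDQSPRD/WLPRD do give the code equivalent to one clock period, a code can be converted to an approximate time by simple proportion. The sketch below rests on two assumptions, neither confirmed by the documents cited here. First, the delay line is linear (every element adds the same delay). Second, `tck_ps` is the period of the clock that the period code was measured against. The quoted user-guide text uses "DDR clock" and "SDRAM clock" for periods that differ by a factor of two, so which clock `tck_ps` should be is itself something to check. The function name is hypothetical.

```python
def code_to_ps(code: int, period_code: int, tck_ps: float) -> float:
    """Convert a delay-line code to picoseconds under a linear model.

    period_code: code reported as equivalent to one clock period
    (e.g. a GDQSPRD or WLPRD reading taken on the same board).
    tck_ps: period of that clock, in picoseconds (assumption, see above).
    """
    if period_code <= 0:
        raise ValueError("period_code must be positive")
    return code * tck_ps / period_code
```

Because the period code is measured per board and changes with VT, comparing R0DQSGD values from the "good" and "bad" boards in time needs each board's own period code, read at the same moment as the delay code.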

Regards,
Kevin

Daniel Moran, in reply to Kevin S

Hi Kevin,

Thank you for your response; it has provided a little bit of clarification.

Unfortunately, since the first post in the thread, another board (of a different design) has shown the same problems, but on byte lanes 0, 1 and 2. We are currently forcing DXnLCDLR2.R0DQSGD and DXnGTR.R0DGSL to values observed on "good" runs.

While we are setting these values manually, we currently cannot justify why these are "good", beyond "it seems to work"; it's a bit of a stab in the dark.

Is there a recommended manual method for determining initial values for these registers?
We are currently overwriting these values at the end of our DDR3 bring-up procedures. Is this recommended? Is this the best time to do this?
If we supply these initial values before we initiate the training/auto-calibration, will they be used, or ignored?

During automatic calibration, we have observed that R0DGSL usually takes the value 0x2, with R0DQSGD between 0x40 and 0x7F. Occasionally, however, where we would expect R0DQSGD in the 0x70-0x7F range with R0DGSL at 0x2, R0DGSL instead takes the value 0x3 and R0DQSGD is 0x00-0x12. In other words, when R0DQSGD seems to roll over some limit, R0DGSL appears to increment.
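One reading of this observation is that R0DGSL and R0DQSGD together form a single gate delay: whole cycles plus a fraction of a cycle. The sketch below assumes this. If the period-equivalent code is close to 0x80 (also an assumption, which the observed rollover only hints at), then (0x2, 0x7F) and (0x3, 0x00) describe nearly the same total delay. The function name is hypothetical.

```python
def gate_delay_cycles(dgsl: int, dqsgd: int, period_code: int) -> float:
    """Total read-gate delay in clock cycles under an assumed model.

    DGSL counts whole cycles; DQSGD is a fraction of a cycle, with
    period_code codes making up one full cycle.
    """
    if period_code <= 0:
        raise ValueError("period_code must be positive")
    return dgsl + dqsgd / period_code


if __name__ == "__main__":
    # Settings from the ranges described above, with an ASSUMED period code of 0x80.
    for setting in [(0x2, 0x70), (0x2, 0x7F), (0x3, 0x00), (0x3, 0x12)]:
        print(setting, gate_delay_cycles(*setting, period_code=0x80))
```

Under this model, a board that trains into the boundary region is not at a radically different setting. Its gate position simply sits near a whole-cycle edge.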

In this boundary region; what can we expect to happen with the ongoing calibration? Will R0DQSGD and R0DGSL correctly update when their values drift up/down?

Ultimately, we are trying to understand:

Why the automatic calibration is choosing delay-line register values which give unstable/incorrect data reads;
How we can avoid this happening (whether via software with "known-good" register values; or if we have to redesign the hardware);
What the justification is for the chosen mitigation(s); i.e. can we have justifiable confidence in the mitigation.

Many Thanks,

Daniel

Kevin S, in reply to Daniel Moran

Hi Daniel,

Daniel Moran said:
Is there a recommended manual method for determining initial values for these registers?

The gate delay depends on the round-trip delay of the traces: specifically, the sum of the command/address flight time and the DQS flight time.
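That sum can be estimated from the layout. The sketch below is a rough calculation, not a TI tool. `ps_per_mm` is an assumed propagation delay for the stack-up: roughly 6.7 ps/mm for FR-4 stripline, somewhat less for microstrip. The result covers only the board round trip. PHY and DRAM internal latencies are not modelled, so it cannot be mapped directly onto R0DGSL/R0DQSGD. The function name is hypothetical.

```python
def round_trip_estimate(ca_len_mm: float, dqs_len_mm: float,
                        ps_per_mm: float, tck_ps: float) -> tuple[float, float]:
    """Board round trip (CA flight to this DRAM + DQS flight back).

    Returns (picoseconds, clock cycles). ps_per_mm is an assumption;
    take it from your own stack-up or SI tool.
    """
    flight_ps = (ca_len_mm + dqs_len_mm) * ps_per_mm
    return flight_ps, flight_ps / tck_ps
```

For a fly-by topology, `ca_len_mm` is the address/command length from the SoC to the specific DRAM that the byte lane connects to, so it grows along the chain.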

For some older TI devices, spreadsheets were provided that took in trace delays and calculated approximate values for these types of settings. However, I am not finding anything similar for this device in the documents below.

https://www.ti.com/lit/pdf/sprabx7

https://www.ti.com/lit/pdf/spracm0

Daniel Moran said:
We are currently overwriting these values at the end of our DDR3 bring-up procedures. Is this recommended? Is this the best time to do this?

According to information I have: "The fixed values should be applied after PHY initialization. If the fixed values are provided before PHY initialization, then the training algorithm will clear the values and find a data trained result."
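Following that guidance, the ordering can be sketched as: let initialisation and training finish, check that they did, then overwrite only the chosen fields. In this minimal sketch, a dict stands in for the register file. `training_done` is a hypothetical callable; on real hardware it would be a PHY status poll. The bit positions (R0DQSGD in DXnLCDLR2[7:0], R0DGSL in DXnGTR[2:0]) are assumptions to check against SPRUHN7.

```python
from typing import Callable, Dict, Tuple

# (register name, shift, width). Names follow the thread; bit positions
# used in practice must be verified against the device documentation.
Field = Tuple[str, int, int]


def apply_overrides_after_training(
    regs: Dict[str, int],
    overrides: Dict[Field, int],
    training_done: Callable[[], bool],
) -> None:
    """Write fixed delay values only once PHY initialisation/training is done.

    Values written before PHY initialisation are cleared by training
    (per the guidance quoted above), so refuse to write them earlier.
    """
    if not training_done():
        raise RuntimeError("PHY training not complete; overrides would be discarded")
    for (name, shift, width), value in overrides.items():
        mask = (1 << width) - 1
        if not 0 <= value <= mask:
            raise ValueError(f"{name}[{shift + width - 1}:{shift}] cannot hold 0x{value:X}")
        # Read-modify-write: preserve every bit outside the target field.
        regs[name] = (regs[name] & ~(mask << shift) & 0xFFFFFFFF) | (value << shift)


if __name__ == "__main__":
    regs = {"DX6LCDLR2": 0xAABBCC58, "DX6GTR": 0x00000003}  # illustrative values only
    apply_overrides_after_training(
        regs,
        {("DX6LCDLR2", 0, 8): 0x48, ("DX6GTR", 0, 3): 0x2},  # ASSUMED field positions
        training_done=lambda: True,
    )
    print({k: hex(v) for k, v in regs.items()})
```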

Daniel Moran said:
If we supply these initial values before we initiate the training/auto-calibration, will they be used, or ignored?

I cannot find any indication that the value set in these registers would be consumed by the training / auto-calibration.

Daniel Moran said:
In this boundary region; what can we expect to happen with the ongoing calibration? Will R0DQSGD and R0DGSL correctly update when their values drift up/down?

I am not aware of any issue or erratum indicating that the PHY cannot handle adjustments due to environmental conditions after initially converging to a delay near a clock cycle boundary.

Regards,
Kevin

Daniel Moran, in reply to Kevin S

Hi Kevin,

Apologies for the delay, we've had some internal network issues; and thank you for your response.

We are currently evaluating a few more configurations for stability, including (but not limited to) specific timing values, a slower DDR speed, and setting DTMPR in DTCR.

Additionally, we are discussing the possibility of probing the DDR signals to check their integrity.

We have also found that lanes whose traces are close to a specific length (approximately 50 mm when operating at 1600 MT/s) seem more likely to have training issues.
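The observed length can be expressed in clock terms. Assume a propagation delay of about 6.7 ps/mm, which is typical of FR-4 stripline. Microstrip is faster, and the real figure depends on the stack-up. The one-way flight time is then roughly a quarter of an 800 MHz SDRAM clock period:

```latex
\begin{aligned}
t_{\mathrm{flight}} &\approx 50\ \mathrm{mm} \times 6.7\ \mathrm{ps/mm} \approx 335\ \mathrm{ps} \\
t_{\mathrm{CK}} &= \tfrac{1}{800\ \mathrm{MHz}} = 1250\ \mathrm{ps} \\
t_{\mathrm{flight}} / t_{\mathrm{CK}} &\approx 335 / 1250 \approx 0.27
\end{aligned}
```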

Regards,
Daniel
