DDR3 Readback issue and DDR3 leveling questions

Hello,

We have an urgent need for design guidance (see below).

Background…

We are experiencing DDR3 read-back issues on our first-spin PCB design, which has a C6655 DSP and two Micron MT41K128M16JT-125:K 2Gb DDR3 (16M x 16 x 8) memories operating at 1033 MHz. We have been experiencing single-bit errors on some of the boards (the number of errors varies from a few thousand to only a few when repeatedly reading back 1 MByte of previously written data). The bit locations of the errors change somewhat, but certain bits have a much higher probability of error. Other boards repeatedly show no read-back errors. So far there is no indication of problems writing to memory reliably. We tried slowing the DDR clock from 1033 to 825 and still had issues, although the number of errors was somewhat reduced.
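To make the observation that certain bits fail far more often than others concrete, a per-bit error histogram can be collected during the read-back test. The following is only a minimal sketch, assuming a 32-bit-wide data bus and an expected-pattern buffer held in known-good memory; the function name `ddr_bit_error_histogram` is hypothetical and is not part of any TI test code.

```c
#include <stddef.h>
#include <stdint.h>

/* Count, for each of the 32 data bits, how many words read back with that
 * bit flipped relative to the expected pattern. Returns the number of
 * mismatching words. bit_errors must point to 32 counters. */
size_t ddr_bit_error_histogram(const volatile uint32_t *mem,
                               const uint32_t *expected,
                               size_t words,
                               uint32_t bit_errors[32])
{
    size_t bad_words = 0;

    for (int b = 0; b < 32; b++)
        bit_errors[b] = 0;

    for (size_t i = 0; i < words; i++) {
        uint32_t diff = mem[i] ^ expected[i];   /* set bits mark flipped bits */
        if (diff == 0)
            continue;
        bad_words++;
        for (int b = 0; b < 32; b++) {
            if (diff & (UINT32_C(1) << b))
                bit_errors[b]++;
        }
    }
    return bad_words;
}
```

Grouping the 32 counters into four sets of eight would give a rough per-byte-lane view, provided the bit-to-lane mapping is checked against the schematic.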

Up until yesterday we had been using the “Partial Automatic Leveling” mode, which is the same method used on the EVM for the 6657 and seems to be the most common method discussed on TI E2E. Yesterday we changed the leveling method to “Full Automatic & Incremental” and followed the details exactly as described in sprugv8c.pdf, example 22. With this change, all of the PCBs work 100% of the time (no read-back errors). Excerpts from the GEL for the DDR initialization for both leveling cases are included in the attachment.

I believe we have followed all of the schematic guidelines, and all of the registers are set using the two TI Excel files, DDR3 Register Calc v4.xlsx and DDR3 PHY Calc v10.xls, to obtain the proper register settings (see attached). We have followed the DDR initialization in the same manner as described in sprugv8c.pdf and the EVM GEL.

We believed we were following all of the TI PCB layout guidelines, but we now see that some of the net class routing rules specified in section 4 of sprabi1a.pdf are being exceeded, namely the following:

  • All address and command net classes shall be skew matched to their respective clock lines to within <= 20 mils.
  • Total length from DSP to each SDRAM for all respective DQ and DM signals within a byte lane should be skew matched to the DQS line to within <= 10.00 mils.
  • DQS to DQS# skew shall be <= 10.00 mils.
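The skew limits quoted above can be checked mechanically against a table of routed lengths. This is a minimal sketch, assuming lengths are available in mils per net; the helper name `nets_within_skew` is hypothetical and the check covers length only, not simulated delay.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every net length is within max_skew_mils of the
 * reference net (for example DQS for a byte lane, or CLK for the
 * address/command class). */
bool nets_within_skew(const double *len_mils, size_t count,
                      double ref_len_mils, double max_skew_mils)
{
    for (size_t i = 0; i < count; i++) {
        double skew = len_mils[i] - ref_len_mils;
        if (skew < 0.0)
            skew = -skew;               /* absolute length difference */
        if (skew > max_skew_mils)
            return false;
    }
    return true;
}
```

For a byte lane, the DQ and DM lengths would be passed with the DQS length as the reference and 10.0 as the limit; for address and command nets, the clock length and 20.0.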

The attached file Signal Routing Lengths.xls lists the routed length of each net as well as the simulated propagation delays.

We are correcting the net class routing to meet the class limits in the production design, but we want to be confident that the read-back problem we are experiencing will be remedied by the layout changes and that there is not some other reason for our issue. Ideally we would like to use the more common “Partial Automatic Leveling” method.

OK, now for some questions…

1)    Is there anything you can see that we are doing incorrectly or should be checking into?

2)    To the degree that we are exceeding the net class routing rule limits, would this likely explain why we have read-back errors when using “Partial Automatic Leveling” but none with “Full Automatic & Incremental”?

3)    Should we be using routing rules for net classes based on absolute length as described above, or use propagation time from modeling? The two approaches are likely to give conflicting results.

4)    There is very little information given on the Full Automatic & Incremental leveling method, and there are several caveats listed in sprugv8c.pdf and the silicon errata.

  1. Is using Full Automatic & Incremental a viable short-term and/or long-term solution?
  2. With the values used in example 22 of sprugv8c.pdf, how often is memory unavailable due to the incremental leveling operations? Is leveling occurring every 10 ms, and does the leveling process take 1 ms each time? Are there any updates to the recommended setup of the associated registers? Is there any more documentation on Full Automatic & Incremental leveling? What is the difference between read eye and read gate leveling?
  3. Could we turn on Full Automatic & Incremental leveling after DDR init and then switch it off? Will it retain the optimal values for read-eye and read-gate timing? The procedure to turn off Full Automatic & Incremental leveling is not provided. I tried the code snippet shown below and it seemed to work (no errors in read back when I came back many hours later to reread memory), but I don’t know if Automatic & Incremental leveling is still active.

DDR_RDWR_LVL_RMP_CTRL = 0x80000000; // Enable full leveling
DDR_RDWR_LVL_CTRL = 0x80000000;     // Trigger full leveling
Delay_milli_seconds(10);            // Wait 10 ms

// Turn on Full Automatic & Incremental Leveling per SPRABL2A example 20
DDR_RDWR_LVL_RMP_WIN = 0x00000502;
DDR_RDWR_LVL_RMP_CTRL = 0x80030300;
DDR_RDWR_LVL_CTRL = 0xFF090900;
Delay_milli_seconds(500);           // How long to wait before disabling?

// Turn off Full Automatic & Incremental Leveling
// Turn off incremental DQS gate and read eye training; don't retrigger
DDR_RDWR_LVL_CTRL = 0x00000000;
// Turn off incremental DQS gate and read eye training
DDR_RDWR_LVL_RMP_CTRL = 0x80000000;
DDR_RDWR_LVL_RMP_WIN = 0x00000000;

 

Thanks so much for your assistance in this urgent matter.

 

Larry

Files for DDR3 Readback Issue Posting.zip
  • Larry,

    We will review the data provided and provide a response.  The 'partial automatic leveling' and 'full automatic leveling' procedures should work equally well.  Both are fully qualified for long-term use.

    Tom

     

  • Larry,

    We will have an initial response by Friday after a thorough review of the provided data.

    Tom

     

  • Tom,

    Thank you in advance for reviewing our design documentation. I also have a few questions that should have nothing to do with our specific design details and are simply questions about the C6655 DDR interface. I was hoping to get a quick reply on these matters.

    Re Leveling:

    I am glad to hear that the “full automatic” leveling mode is fully qualified on the C6655. In our case, to get robust, error-free read back on all PCBs, we also need to turn on “incremental leveling after full leveling” (Examples 21 and 22 in section 3.3 of SPRABL2A). Using just the auto-leveling (Example 21 only), the read back can still be erratic on some PCBs. Based on the available information, it sounds as though the read-eye sample point may not converge, but when we read the DDR Status register it always gives a successful result, 0x40000004.
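As a sketch of how the status value quoted above might be interpreted, the snippet below tests a few flag bits. The bit positions (PHY ready at bit 2, leveling timeout flags at bits 4 to 6) and the macro names are assumptions based on the usual layout of this memory controller family and should be confirmed against the STATUS register description in SPRUGV8 before use.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED bit positions; verify against the SPRUGV8 STATUS register table. */
#define DDR_STATUS_PHY_READY    (UINT32_C(1) << 2)
#define DDR_STATUS_WRLVL_TO     (UINT32_C(1) << 4)
#define DDR_STATUS_RDLVL_TO     (UINT32_C(1) << 5)
#define DDR_STATUS_RDLVLGATE_TO (UINT32_C(1) << 6)

/* True when the PHY-ready flag is set. */
bool ddr_status_phy_ready(uint32_t status)
{
    return (status & DDR_STATUS_PHY_READY) != 0;
}

/* True when any of the three leveling timeout flags is set. */
bool ddr_status_leveling_timeout(uint32_t status)
{
    const uint32_t mask = DDR_STATUS_WRLVL_TO | DDR_STATUS_RDLVL_TO |
                          DDR_STATUS_RDLVLGATE_TO;
    return (status & mask) != 0;
}
```

Under that assumed layout, 0x40000004 has the ready flag set and no timeout flags, which matches the "successful result" described. A clean status of this kind only says that leveling did not time out; it does not by itself say how much margin the chosen sample point has.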

    Following the DDR initialization, we are using the leveling procedure as described in section 3.3 of SPRABL2A:

    // Removed for auto leveling: DDR3_CONFIG_REG_23 |= 0x00000200;

    // Example 21 - auto leveling
    RDWR_LVL_RMP_CTRL = 0x80000000; // Enable leveling
    RDWR_LVL_CTRL = 0x80000000;     // Trigger full leveling
    for(i=0;i<1000;i++);            // Wait 3ms for leveling to complete

    // Example 22 - incremental leveling
    RDWR_LVL_RMP_WIN = 0x00000502;  // Set incremental ramp window to 10ms ?
    RDWR_LVL_RMP_CTRL = 0x80030300; // Set read data eye and read gate interval during ramp window
    RDWR_LVL_CTRL = 0xFF090900;     // Set read data eye and read gate training interval
    // Delay (TBD ms): how long will 64 iterations take?

    If we leave the leveling mode registers unchanged at this point, then according to the documentation, incremental leveling adjustments will be made every 10 ms to compensate for temperature and voltage changes. I don’t know whether incremental leveling would interfere with normal memory operation (refresh, memory R/W operations) if operation is not halted. Is this a valid concern?

    Is there a procedure to turn off incremental leveling while retaining the read-gate and read-data-eye leveling settings (understanding that it will no longer track voltage and temperature changes)? I added the lines of code below after example 22 and it seems to work (robust read back with all PCBs), but I am uncertain whether the incremental leveling operations have ceased.

    DDR_RDWR_LVL_CTRL = 0x00000000; //turn off incr DQS gate & read eye training , don't retrigger.
    DDR_RDWR_LVL_RMP_CTRL = 0x80000000; //turn off DQS gate and read eye training    
    DDR_RDWR_LVL_RMP_WIN = 0x00000000;
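One way to reason about whether incremental leveling is still armed is to read the two control registers back and split them into fields. The sketch below assumes both registers share the layout bit 31 (trigger or enable), bits 30:24 (prescaler), bits 23:16 (read data eye interval), bits 15:8 (read gate interval) and bits 7:0 (write leveling interval); that layout, the type `lvl_fields_t` and both function names are assumptions to be checked against SPRUGV8, not confirmed TI definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED field layout shared by RDWR_LVL_CTRL and RDWR_LVL_RMP_CTRL. */
typedef struct {
    bool    top_bit;      /* bit 31: full-leveling trigger or leveling enable */
    uint8_t prescaler;    /* bits 30:24 */
    uint8_t rd_eye_int;   /* bits 23:16 */
    uint8_t rd_gate_int;  /* bits 15:8  */
    uint8_t wr_lvl_int;   /* bits 7:0   */
} lvl_fields_t;

/* Split a raw register value into the assumed fields. */
lvl_fields_t decode_lvl_reg(uint32_t v)
{
    lvl_fields_t f;
    f.top_bit     = ((v >> 31) & 0x1u) != 0;
    f.prescaler   = (uint8_t)((v >> 24) & 0x7Fu);
    f.rd_eye_int  = (uint8_t)((v >> 16) & 0xFFu);
    f.rd_gate_int = (uint8_t)((v >> 8) & 0xFFu);
    f.wr_lvl_int  = (uint8_t)(v & 0xFFu);
    return f;
}

/* True when all interval fields read back as zero in both registers. */
bool incremental_intervals_zero(uint32_t rdwr_lvl_ctrl,
                                uint32_t rdwr_lvl_rmp_ctrl)
{
    /* The low 24 bits hold the three interval fields in the assumed layout. */
    return ((rdwr_lvl_ctrl | rdwr_lvl_rmp_ctrl) & 0x00FFFFFFu) == 0;
}
```

With this layout, 0xFF090900 decodes to a prescaler of 0x7F with read eye and read gate intervals of 9, and the values written in the turn-off sequence (0x00000000 and 0x80000000) leave every interval at zero. Whether a zero interval actually stops incremental training is a question for the register descriptions or for TI to confirm.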

    Please provide guidance on the use of incremental leveling after automatic leveling.

    Does the fact that we need incremental leveling after automatic leveling to make read back robust indicate that another design issue is present?

    Re Layout Design Rules:

    We are reexamining the PCB layout constraints and want to know whether the routing length guidelines (i.e., total length from DSP to each SDRAM for all respective DQ and DM signals within a byte lane should be skew matched to the DQS line +/- 10.00 mils) take precedence over ICX propagation timing simulation results.

    Thank you for your consideration in this matter.

    Larry

     

  • Hi Larry and TI experts:

    Glad to see Larry's DDR works fine. I have the same DDR, MT41K128M16JT-125IT, on my own board, which contains a C66x DSP. After downloading Larry's files posted here, I tried to write my own initialization code, but it failed.

    I have some questions that I hope you can help with:

    1. Does DDR initialization depend on the DSP type (does 6614 vs. 6657 make a difference)? We have the same DDR; does that mean we will get the same register values calculated by those XLS files?

    2. Larry, what is your configuration for PLL2: what are the values of your PLLM_DDR and PLLD_DDR? Why did you choose 1033 MHz as the output DDR data rate? As far as I know, our DDR should support a 1600 data rate.

    3. Do you have any suggestions for debugging the DDR registers?

    Thank you all in advance !

    Striker Qian

  • Striker,

    You cannot use the XLS from another design.  Some of the inputs are from the board layout.

    KeyStone-I devices are currently limited to 1333MT/s.  Please refer to the Data Manuals and Errata docs.
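Because speeds in this thread are quoted both in MHz and in MT/s, a short conversion sketch may help. DDR memories transfer data on both clock edges, so the data rate in MT/s is twice the clock frequency in MHz; the function names below are hypothetical helpers, not TI or JEDEC identifiers.

```c
/* Data rate in MT/s for a given DDR clock in MHz (two transfers per clock). */
double ddr_data_rate_mts(double clock_mhz)
{
    return 2.0 * clock_mhz;
}

/* Clock period tCK in nanoseconds for a given data rate in MT/s. */
double ddr_clock_period_ns(double data_rate_mts)
{
    return 2000.0 / data_rate_mts;
}
```

For example, a 1600 MT/s rating corresponds to an 800 MHz clock and a 1.25 ns clock period, which is what the -125 suffix in the Micron part number denotes. Given the 1333 MT/s limit quoted above, the 1033 figure used elsewhere in the thread is most plausibly a data rate rather than a clock frequency.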

    Tom

     

  • Striker,

    I agree with Tom's comments (i.e., 1333 MT/s is the maximum bus speed of this DSP).

    BTW, I am using 1033 because the original design used 1066 memory (now EOL) and because of the clock input frequency (51.6 MHz) I had available for the DDR PLL clock input.

    Larry

     

  • Hi Tom,

    Can you specify which parts depend on my DDR type and which parts come from my board layout?

    In my opinion, the values calculated from DDR3 Register Calc v4.xls, which are:

    DDR_SDTIM1
    DDR_SDTIM2
    DDR_SDTIM3
    DDR_SDCFG
    DDR_SDRFC (normal)
    DDR_SDRFC (ext temp)
    DDR_SDRFC (initialization)

    will come from the MT41K128M16JT-125I datasheet.

    And the values in DDR3 PHY Calc v10.xls, which will be determined by my board layout, are:

    Register
    DATA0_WRLVL_INIT_RATIO
    DATA1_WRLVL_INIT_RATIO
    DATA2_WRLVL_INIT_RATIO
    DATA3_WRLVL_INIT_RATIO
    DATA4_WRLVL_INIT_RATIO
    DATA5_WRLVL_INIT_RATIO
    DATA6_WRLVL_INIT_RATIO
    DATA7_WRLVL_INIT_RATIO
    DATA8_WRLVL_INIT_RATIO
     
    Register
    DATA0_GTLVL_INIT_RATIO
    DATA1_GTLVL_INIT_RATIO
    DATA2_GTLVL_INIT_RATIO
    DATA3_GTLVL_INIT_RATIO
    DATA4_GTLVL_INIT_RATIO
    DATA5_GTLVL_INIT_RATIO
    DATA6_GTLVL_INIT_RATIO
    DATA7_GTLVL_INIT_RATIO
    DATA8_GTLVL_INIT_RATIO
     
    Register
    DDR3_CONFIG_REG_12 OR Mask

    Is that the right way to determine all the registers we need? I asked our board's PCB designer, and he told me that the board's DDR layout is the same as that of the 6614 EVM board. Should I refer to the PHY configuration of the EVM board?

  • Larry,

    Many thanks for your kind help. I just want to know which documents offer enough information to determine all those register values using the XLS. I have downloaded the MT41K128M16JT-125IT data sheet, 2GB_1_35V_DDR3L.pdf, but the information I got from it is very limited. Can you list the docs that can help to determine all the values?

  • Striker, yes, I know the data sheet for the MT41K128M16JT-125IT has limited timing info. As you know, it is an LV part (1.35V). An FAE from Micron indicated that, where not specified otherwise for the MT41K128M16JT-125IT, I can use the MT41J128M16 spec sheet (1.5V part). This spec sheet had all of the timing specs I needed, and it is what I entered into the TI Excel file to determine the register settings. Here is a link to a Micron technical note regarding 1.35V memory: http://www.micron.com/~/media/Documents/Products/Technical%20Note/DRAM/tn4114_ddr3_1_35v_1_5v_compat.pdf

    "Micron 1.35V DDR3L and DDR3L-RS devices use the same die as 1.5V DDR3 devices, but have been separated during the test screen and marking process. The Micron 1.35V test screen incorporates testing to ensure backward compatibility to 1.5V operation. Therefore, all parts marked as DDR3L or DDR3L-RS are backward compatible to parts marked as DDR3, and meet the JEDEC 1.5V voltage level operation specifications. This is in compliance with both Micron and JEDEC specifications."

    Note: I am using a 1.5V supply with the MT41K128M16JT-125IT. I don't know if there would be voltage incompatibility issues with the DSP if you were to operate the DDR3 at 1.35V. This is something you would need to discuss with TI.

    Larry

  • Hi Larry,

    You’ve described your problem as a read error issue. I just want to be sure about the specific test that you are performing. Based on your notes, the memory test writes a block of data and then reads it back multiple times, comparing the value read with the value expected. Is that correct? When you see the errors, do the bad values change from read to read or remain consistent? Do the errors appear more frequently in one bit than another, or do they appear randomly across the byte lanes? Are any of the byte lanes stable? What memory test are you using? Is this a test provided by TI or a test that you have developed? As you commented, reading back the same error multiple times from the same memory location would indicate a write error, but seeing different incorrect values over multiple reads would indicate read errors. Customers often use a memory test that writes to memory and then reads, repeated multiple times. This doesn’t narrow the problem to either a write error or a read error. Based on your comments, you’re reading multiple times with different incorrect results, indicating a read error, but I wanted to be sure.
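The distinction drawn above between stable and changing bad values can be written down as a small classifier over repeated reads of one location. This is a minimal sketch, assuming the samples were captured by reading the same address several times after a single write and with caching out of the way; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    WORD_OK,              /* every read matched the expected value */
    WORD_STABLE_ERROR,    /* same wrong value every time: points to the write */
    WORD_UNSTABLE_ERROR   /* values differ between reads: points to the read path */
} word_result_t;

/* Classify n repeated reads of one location against the written value. */
word_result_t classify_samples(const uint32_t *samples, size_t n,
                               uint32_t expected)
{
    bool any_bad = false;
    bool varies = false;

    for (size_t i = 0; i < n; i++) {
        if (samples[i] != expected)
            any_bad = true;
        if (samples[i] != samples[0])
            varies = true;
    }
    if (!any_bad)
        return WORD_OK;
    return varies ? WORD_UNSTABLE_ERROR : WORD_STABLE_ERROR;
}
```

A test that rewrites the location before every read cannot make this distinction, since each sample then reflects a fresh write as well as a fresh read.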

    We've reviewed the data that you sent and we have some questions and concerns. We want to be sure that any changes to your PCB are complete and successful.

    Length spreadsheet

    It's clear from the length matching spreadsheet that you have routed the two memories in fly-by mode, but we have some questions. You asked about the length requirements compared to simulated timing requirements. Ideally we would present requirements as a timing relationship, but we have found that stating these requirements in terms of length is more accepted by PCB designers. On most designs the length and the delay correlate pretty closely, since groups of signals are routed on the same layer.

    1)      The length of the data lanes is listed as a single number with no reference to layers. In the PHY calc spreadsheet this length is reflected entirely as stripline. What portion of each trace is on the top or bottom layers?

    2)      In a number of cases your delay and your length do not correlate as expected. For example, for your DQSN/P_0 and your EDQ0 signals, the DQS traces have shorter lengths than EDQ0 but the delays for DQS are longer. Since these signals should be routed on the same layers, we’re trying to understand what caused this difference. Were they routed on the same layer? Was the top or bottom layer used to route a significant portion of these traces? Were the same number of vias used for each of the traces in each byte lane?

    3)      How was the length of each of the address and command traces calculated? The length should have been measured from the ball on the C665x to the ball of the memory device. How many vias were used on each trace? Were the terminating resistors placed at the end of the trace?

    4)      One of the concerns that we have that is not captured well in the routing guidelines is the length of the stub between the address trace and the ball of the memory. Since the memory trace continues from the C665x to the terminating resistor there is always a stub from the fly-by trace to the ball of the memory. This stub should be kept as short as possible.

    5)      Can your PCB layout tool provide a report that specifies the length of the address, command, clock and data traces? This report should show the portion of the length between any vias that are part of the layout and the layers where each portion resides. For example a data lane routed between the C665x on the top layer to a memory  on the top layer would have a length on top between the C665x and a via, a length of some other layer between the via and a second via and, finally, a length between the second via and the ball of the memory on the top layer.
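The questions above about layers come down to the fact that delay per unit length differs between outer-layer (microstrip) and inner-layer (stripline) routing. The following is only a sketch, assuming per-segment lengths from a layout report and per-inch delays supplied by the caller; the type names, the function name and the numbers in the usage note are illustrative, not values from the TI spreadsheets or from this board.

```c
#include <stddef.h>

typedef enum { LAYER_MICROSTRIP, LAYER_STRIPLINE } layer_kind_t;

typedef struct {
    double       length_mils;   /* segment length between pads/vias */
    layer_kind_t kind;          /* outer layer or inner layer */
} trace_segment_t;

/* Sum the propagation delay of a trace from its per-layer segments.
 * Via delay is ignored in this sketch. */
double trace_delay_ps(const trace_segment_t *seg, size_t count,
                      double microstrip_ps_per_inch,
                      double stripline_ps_per_inch)
{
    double total = 0.0;

    for (size_t i = 0; i < count; i++) {
        double rate = (seg[i].kind == LAYER_STRIPLINE)
                          ? stripline_ps_per_inch
                          : microstrip_ps_per_inch;
        total += (seg[i].length_mils / 1000.0) * rate;  /* mils to inches */
    }
    return total;
}
```

With illustrative rates of 150 ps/inch for microstrip and 180 ps/inch for stripline, a 1400 mil trace that is mostly stripline (100 + 1200 + 100 mils) comes out slower than a 1500 mil trace that is mostly microstrip (800 + 500 + 200 mils): about 246 ps versus about 240 ps. That is one way a shorter net can show a longer simulated delay.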

     

    Schematic

    1)      Would you be willing to provide the schematic pages in a searchable PDF for our review?

    PCB layout

    In addition to lengths there are a number of other layout issues that could cause memory errors. The error we see most commonly is the lack of a solid ground plane adjacent to the data and clock traces and the lack of a solid ground or solid DVDD15 adjacent to the address and command traces.

    1)      Can you provide the PCB stackup? Can you specify which layers were used for routing DDR signals?

    2)      Were the signals for each byte lane routed on a common layer? To be clear each byte lane consists of eight data signals and the associated DQM and DQSP/N.

    3)      Have the routing requirements associated with ground planes and DVDD15 planes been followed?

    4)      Would you be willing to provide a PCB database that we could check? We have the ability to view Cadence and PADS databases.

    REG Calc spreadsheet

    Based on the 1.5V version of the datasheet on the Micron website for the memory device that you are using, the timing information needed for the REG calc spreadsheet appears to be correct.

    PHY calc spreadsheet

    This spreadsheet looks fine based on the numbers you entered but, as mentioned previously, the lengths are not broken down into stripline and microstrip. Are the lengths accurately entered?

    Regards, Bill

  • Hi All,

    We are also facing the same sort of issue. Was the issue solved on your side?

    Regards,

    Avinash N

  • Hi Avinash, sorry, I did not see your reply until today, but I did see the post "2Gb DDR3L" you initiated on 11/5 and was wondering if you were having the same issue. We have still not determined why partial automatic leveling does not work (erratic read back) with the MT41K128M16JT-125 DDR3L part (the same part number you are using). We switched to full automatic leveling along with incremental leveling (turned on for 100 ms and then off) after DDR initialization completes, and that is our plan for production. This leveling method has worked 100% for us even with the higher DDR speed planned for the product (1315 vs. 1033). We did have a few boards made with two different 1.5V DDR3 part numbers. The 1.5V DDR parts all seem to work fine using partial automatic leveling (same GEL settings), but we have not done extensive testing. This seems to point to a difference in the DDR3L part (note we are still operating the DDR3 with a VDD of 1.5V), but to date we have not found an explanation.

    Can you provide an update/info from your end? Is the problem you are having erratic read back (i.e., typically a single bit in a word, varying between 100 and 10,000 errors when writing and then reading a 3 MB file; the errors are data dependent and the bit locations of the errors vary from board to board) when using partial automatic leveling? Does the memory work fine with a DDR3 (1.5V) part using partial automatic leveling? Have you tried adding full leveling with incremental leveling on the DDR3L part to see if the problem is fixed? Thanks

    Larry

  • Larry,

    The C6654/55/57 DDR3 initialization sequence was found to be deficient. This has been addressed in other e2e posts. The corrected initialization process is now fully documented in the on-line documentation.

    The revised KeyStone I DDR3 Initialization Application Report SPRABL2D can be downloaded from: http://www.ti.com/litv/pdf/sprabl2d. Please also download the latest support spreadsheets from: http://www.ti.com/litv/zip/sprabl2d.

    The revised KeyStone Architecture DDR3 Memory Controller User Guide SPRUGV8E can be downloaded from: http://www.ti.com/lit/pdf/sprugv8.

    The latest C6654/55/57 GEL file change is in Keystone1 Emupack release 1.0.9.1 available at: http://software-dl.ti.com/sdoemb/sdoemb_public_sw/ti_emupack_keystone1/latest/index_FDS.html.

    Tom