DRA74P: DRA7xxP HW leveling sometimes times out, only on EMIF1 / 2Gb

Part Number: DRA74P

Our customer returned a board to us on which, after DDR setup, the DDR controller and the DDR3 memory sometimes fail to come up.

The board has a Jacinto+ DRA746PPIGABZRQ1 and 4 x MT41K512M16VRP-107AATP memories (2 per EMIF).

When the problem occurs, the EMIF1 status register of the Jacinto+ reports timeout errors during leveling:

EMIF_STATUS register value read = 0x40000054
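
To make the failing value easier to compare, the short sketch below lists which bit positions are set in 0x40000054 and which of them differ from the value 0x40000004 found at address 0x4c000004 in the SW-leveling dump later in this post. It is plain bit arithmetic only: treating 0x4c000004 as the same status register is an assumption, the meaning of each bit has to be taken from the TRM, and `set_bits` is a helper name made up for this example.

```python
def set_bits(value):
    """Return the positions of all bits set in a 32-bit register value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


failing_status = 0x40000054  # EMIF_STATUS read when HW leveling fails
dump_status = 0x40000004     # value at 0x4c000004 in the SW-leveling dump

# Bits set in the failing case
print(set_bits(failing_status))
# Bits that differ between the failing value and the SW-leveling dump
print(set_bits(failing_status ^ dump_status))
```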

The issue occurs on 20-30% of boots on this board.

When boot was OK, we performed a climatic DDR test (-20/+60°C, 12 h, 10 cycles) and did not see any reset.

After these results, we focused on EMIF full leveling.

As a preliminary workaround, we introduced a 100 ms delay between a failure event and a retry of the EMIF init configuration and leveling (no action on the DDR, only on the EMIF), with a 100% positive result: EMIF leveling OK.
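
The control flow of this workaround can be sketched as below. This is a host-side model only, not our bootloader code: `init_and_level` is a hypothetical callable standing in for "EMIF init configuration + leveling" that returns True on success, and the limit of 3 attempts is an assumption (only the 100 ms delay comes from our test).

```python
import time


def level_with_retry(init_and_level, max_attempts=3, retry_delay_s=0.1,
                     sleep=time.sleep):
    """Run EMIF init + leveling, retrying after a delay when it fails.

    init_and_level: hypothetical callable, returns True when leveling succeeded.
    Returns the number of attempts used, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if init_and_level():
            return attempt
        if attempt < max_attempts:
            sleep(retry_delay_s)  # 100 ms pause before re-running EMIF init
    return 0


# Simulated run: first attempt fails, second one succeeds.
outcomes = iter([False, True])
print(level_with_retry(lambda: next(outcomes), sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; on target the delay would come from a hardware timer.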

We acquired all EMIF registers (listed in an Excel file, EMIF_regs_Leveling_compare.xls, for comparison):

        before Leveling

        after Leveling goes well

        after Leveling Fail

        after a retry and Leveling goes well
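
A comparison like this can also be scripted. The sketch below assumes each capture is available as text in the same "Reg 0xADDR: 0xVALUE" format as the dump further down; `parse_dump` and `diff_dumps` are names made up for this example, and the two-line excerpts are illustrative (only the two status values are taken from this post).

```python
import re

DUMP_LINE = re.compile(r"Reg\s+0x([0-9a-fA-F]+):\s+0x([0-9a-fA-F]+)")


def parse_dump(text):
    """Map register address -> value for every 'Reg 0xADDR: 0xVALUE' line."""
    return {int(addr, 16): int(val, 16) for addr, val in DUMP_LINE.findall(text)}


def diff_dumps(reference, other):
    """Return {address: (reference_value, other_value)} for registers that differ.

    A register missing from one capture is reported with None on that side.
    """
    addresses = sorted(set(reference) | set(other))
    return {
        addr: (reference.get(addr), other.get(addr))
        for addr in addresses
        if reference.get(addr) != other.get(addr)
    }


# Illustrative excerpts, not the real captures.
ok_text = "Reg 0x4c000000: 0x50443d01\nReg 0x4c000004: 0x40000004\n"
fail_text = "Reg 0x4c000000: 0x50443d01\nReg 0x4c000004: 0x40000054\n"

for addr, (ok, bad) in diff_dumps(parse_dump(ok_text), parse_dump(fail_text)).items():
    print(f"0x{addr:08x}: 0x{ok:x} -> 0x{bad:x}")
```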

After these tests, we suspected that the problem is in "HW leveling", so we replaced it with SW leveling and did not see any issue (100% of board starts OK).

We also acquired the EMIF registers in the attached file (Emif1_reg_dump_SwLeveling.txt).

We need support to understand whether the problem is on the DDR chips or on the EMIF side.

Do you have any tools to retrieve additional information?

SW LEVELING DONE:

EMIF DUMP
Reg 0x4c000000: 0x50443d01
Reg 0x4c000004: 0x40000004
Reg 0x4c000008: 0x61851ab2
Reg 0x4c00000c: 0x8000000
Reg 0x4c000010: 0x40f1
Reg 0x4c000014: 0x61b
Reg 0x4c000018: 0xcccf36b3
Reg 0x4c00001c: 0x1c000000
Reg 0x4c000020: 0x30bf7fda
Reg 0x4c000024: 0x0
Reg 0x4c000028: 0x427f8ba8
Reg 0x4c00002c: 0xf8120
Reg 0x4c000030: 0x11220c0c
Reg 0x4c000034: 0x11220c0c
Reg 0x4c000038: 0x0
Reg 0x4c00003c: 0x0
Reg 0x4c000040: 0x0
Reg 0x4c000044: 0x0
Reg 0x4c000048: 0x0
Reg 0x4c00004c: 0x0
Reg 0x4c000050: 0x0
Reg 0x4c000054: 0x7770000
Reg 0x4c000058: 0x90001010
Reg 0x4c00005c: 0x42727
Reg 0x4c000060: 0x2011
Reg 0x4c000064: 0x0
Reg 0x4c000068: 0x0
Reg 0x4c00006c: 0x0
Reg 0x4c000070: 0x0
Reg 0x4c000074: 0x0
Reg 0x4c000078: 0x0
Reg 0x4c00007c: 0x0
Reg 0x4c000080: 0x0
Reg 0x4c000084: 0x0
Reg 0x4c000088: 0x10000
Reg 0x4c00008c: 0x0
Reg 0x4c000090: 0xa052faf
Reg 0x4c000094: 0x0
Reg 0x4c000098: 0x50000
Reg 0x4c00009c: 0x90000
Reg 0x4c0000a0: 0x0
Reg 0x4c0000a4: 0x0
Reg 0x4c0000a8: 0x0
Reg 0x4c0000ac: 0x0
Reg 0x4c0000b0: 0x0
Reg 0x4c0000b4: 0x0
Reg 0x4c0000b8: 0x0
Reg 0x4c0000bc: 0x0
Reg 0x4c0000c0: 0x0
Reg 0x4c0000c4: 0x0
Reg 0x4c0000c8: 0xf0000
Reg 0x4c0000cc: 0x0
Reg 0x4c0000d0: 0x0
Reg 0x4c0000d4: 0x0
Reg 0x4c0000d8: 0x80000000
Reg 0x4c0000dc: 0x0
Reg 0x4c0000e0: 0x0
Reg 0x4c0000e4: 0xe24400b
Reg 0x4c0000e8: 0xe24400b
Reg 0x4c0000ec: 0x0
Reg 0x4c0000f0: 0x0
Reg 0x4c0000f4: 0x0
Reg 0x4c0000f8: 0x0
Reg 0x4c0000fc: 0x0
Reg 0x4c000100: 0x0
Reg 0x4c000104: 0x0
Reg 0x4c000108: 0x0
Reg 0x4c00010c: 0x0
Reg 0x4c000110: 0x0
Reg 0x4c000114: 0x0
Reg 0x4c000118: 0x0
Reg 0x4c00011c: 0x0
Reg 0x4c000120: 0x305
Reg 0x4c000124: 0xffffff
Reg 0x4c000128: 0x0
Reg 0x4c00012c: 0x0
Reg 0x4c000130: 0x0
Reg 0x4c000134: 0x0
Reg 0x4c000138: 0x0
Reg 0x4c00013c: 0x0
Reg 0x4c000140: 0x0
Reg 0x4c000144: 0x14d0f3
Reg 0x4c000148: 0x6d32a350
Reg 0x4c00014c: 0xa
Reg 0x4c000150: 0x120000
Reg 0x4c000154: 0x9999
Reg 0x4c000158: 0x4924
Reg 0x4c00015c: 0x0
Reg 0x4c000160: 0x0
Reg 0x4c000164: 0x0
Reg 0x4c000168: 0x0
Reg 0x4c00016c: 0x0
Reg 0x4c000170: 0x7000700
Reg 0x4c000174: 0x7000700
Reg 0x4c000178: 0x7000700
Reg 0x4c00017c: 0x7000700
Reg 0x4c000180: 0x7000700
Reg 0x4c000184: 0x4603a4
Reg 0x4c000188: 0x2be01ba
Reg 0x4c00018c: 0x3a50291
Reg 0x4c000190: 0x570091
Reg 0x4c000194: 0x110339
Reg 0x4c000198: 0x260384
Reg 0x4c00019c: 0x29e019a
Reg 0x4c0001a0: 0x3850271
Reg 0x4c0001a4: 0x370071
Reg 0x4c0001a8: 0x110339
Reg 0x4c0001ac: 0x10f00000
Reg 0x4c0001b0: 0x0
Reg 0x4c0001b4: 0x0
Reg 0x4c0001b8: 0x0
Reg 0x4c0001bc: 0x0
Reg 0x4c0001c0: 0x0
Reg 0x4c0001c4: 0x0
Reg 0x4c0001c8: 0x0
Reg 0x4c0001cc: 0x0
Reg 0x4c0001d0: 0x0
Reg 0x4c0001d4: 0x0
Reg 0x4c0001d8: 0x0
Reg 0x4c0001dc: 0x0
Reg 0x4c0001e0: 0x0
Reg 0x4c0001e4: 0x0
Reg 0x4c0001e8: 0x0
Reg 0x4c0001ec: 0x0
Reg 0x4c0001f0: 0x0
Reg 0x4c0001f4: 0x0
Reg 0x4c0001f8: 0x0
Reg 0x4c0001fc: 0x0
Reg 0x4c000200: 0x10040100
Reg 0x4c000204: 0x10040100
Reg 0x4c000208: 0x910091
Reg 0x4c00020c: 0x910091
Reg 0x4c000210: 0x950095
Reg 0x4c000214: 0x950095
Reg 0x4c000218: 0x9b009b
Reg 0x4c00021c: 0x9b009b
Reg 0x4c000220: 0x9e009e
Reg 0x4c000224: 0x9e009e
Reg 0x4c000228: 0x6b006b
Reg 0x4c00022c: 0x6b006b
Reg 0x4c000230: 0x350035
Reg 0x4c000234: 0x350035
Reg 0x4c000238: 0x350035
Reg 0x4c00023c: 0x350035
Reg 0x4c000240: 0x350035
Reg 0x4c000244: 0x350035
Reg 0x4c000248: 0x350035
Reg 0x4c00024c: 0x350035
Reg 0x4c000250: 0x350035
Reg 0x4c000254: 0x350035
Reg 0x4c000258: 0x60006d
Reg 0x4c00025c: 0x60006d
Reg 0x4c000260: 0x600069
Reg 0x4c000264: 0x600069
Reg 0x4c000268: 0x600067
Reg 0x4c00026c: 0x600067
Reg 0x4c000270: 0x60006b
Reg 0x4c000274: 0x60006b
Reg 0x4c000278: 0x600060
Reg 0x4c00027c: 0x600060
Reg 0x4c000280: 0x40004d
Reg 0x4c000284: 0x40004d
Reg 0x4c000288: 0x400049
Reg 0x4c00028c: 0x400049
Reg 0x4c000290: 0x400047
Reg 0x4c000294: 0x400047
Reg 0x4c000298: 0x40004b
Reg 0x4c00029c: 0x40004b
Reg 0x4c0002a0: 0x400040
Reg 0x4c0002a4: 0x400040
Reg 0x4c0002a8: 0x800080
Reg 0x4c0002ac: 0x800080
Reg 0x4c0002b0: 0x800080
Reg 0x4c0002b4: 0x800080
Reg 0x4c0002b8: 0x40010080
Reg 0x4c0002bc: 0x40010080
Reg 0x4c0002c0: 0x8102040
Reg 0x4c0002c4: 0x8102040
Reg 0x4c0002c8: 0x1500150
Reg 0x4c0002cc: 0x1500150
Reg 0x4c0002d0: 0x1500150
Reg 0x4c0002d4: 0x1500150
Reg 0x4c0002d8: 0x1500150
Reg 0x4c0002dc: 0x1500150
Reg 0x4c0002e0: 0x1500150
Reg 0x4c0002e4: 0x1500150
Reg 0x4c0002e8: 0x1500150
Reg 0x4c0002ec: 0x1500150
Reg 0x4c0002f0: 0x0
Reg 0x4c0002f4: 0x0
Reg 0x4c0002f8: 0x0
Reg 0x4c0002fc: 0x0
Reg 0x4c000300: 0x0
Reg 0x4c000304: 0x0
Reg 0x4c000308: 0x0
Reg 0x4c00030c: 0x0
Reg 0x4c000310: 0x0
Reg 0x4c000314: 0x0
Reg 0x4c000318: 0x77
Reg 0x4c00031c: 0x77
EMIF_regs_Leveling_compare.xlsx 

Thanks in advance for your help.

regards

Nello Michele

 

  • Hi All,

    we made additional tests ONLY on EMIF1 trying to isolate the problem: 100

    • emif_ddr_phy_ctlr_1_init = 0x0624400B - skip gate training and skip write leveling training: 100% of board starts with leveling good
    • emif_ddr_phy_ctlr_1_init = 0x0424400B - skip gate training ONLY: we still saw leveling FAIL occurrences
    • emif_ddr_phy_ctlr_1_init = 0x0224400B - skip write leveling training ONLY: we still saw leveling FAIL occurrences

    How can we interpret these results?

    Can we assume that the problem is isolated to the DDR, which does not seem to send the leveling info back to the EMIF PHY (the EMIF signals a timeout in the leveling operation)?
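
    For reference, the three values differ only in two bits, as the small sketch below shows. Based on the descriptions above, bit 26 would be the "skip gate training" bit and bit 25 the "skip write leveling" bit; this is inferred from these three values only and should be checked against the TRM. `set_bits` is a helper name made up for this example.

```python
def set_bits(value):
    """Return the positions of all bits set in a 32-bit register value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


both_skipped = 0x0624400B   # gate training and write leveling skipped
gate_skipped = 0x0424400B   # gate training skipped only
wrlvl_skipped = 0x0224400B  # write leveling skipped only

# Bit that changes when write leveling runs again (gate training still skipped)
print(set_bits(both_skipped ^ gate_skipped))
# Bit that changes when gate training runs again (write leveling still skipped)
print(set_bits(both_skipped ^ wrlvl_skipped))
```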

    thanks in advance

    Regards

    Michele and Nello

  • Hi Michele and Nello,

    In the log file, it shows the following registers having a value of 0x1500150, but in the XLS, it shows them as 0x0. How are these configured in your software code?

    Reg 0x4c0002c8: 0x1500150
    Reg 0x4c0002cc: 0x1500150
    Reg 0x4c0002d0: 0x1500150
    Reg 0x4c0002d4: 0x1500150
    Reg 0x4c0002d8: 0x1500150
    Reg 0x4c0002dc: 0x1500150
    Reg 0x4c0002e0: 0x1500150
    Reg 0x4c0002e4: 0x1500150
    Reg 0x4c0002e8: 0x1500150
    Reg 0x4c0002ec: 0x1500150

  • Hi Kevin,

    In our code, with HW_Leveling enabled, these registers are all configured as 0x0 (you can see the values in the XLS file, which contains all register values read back from the EMIF after its configuration).

    The log file relates to SW_Leveling; in that case these registers are not configured by our code (we configure registers up to EMIF_EXT_PHY_CONTROL_25). The log file contains all register values read back from the EMIF after its configuration.

    Regards

    Michele

  • Hi Michele,

    Ok, thanks. It looks like training is failing to a specific memory (lower two bytes). It also looks like all training types (for those byte lanes) are yielding poor results. This is generally true if the first training type fails; however it seems like both write leveling and gate training fail per your results.

    Is it a single board failing or multiple? If multiple failing boards, do all boards show same behavior (failing on same EMIF / byte lanes)?

    Additionally, is this design relatively new, or has it been in production for some time?

    Regards,
    Kevin

  • Hi Kevin,

    The failure is only on this single board (one board out of a production batch of more than 10k boards).
    We have to provide our customer with the 8D report, including the root cause, in a short time.
    If we understood your reply correctly, the problem seems to be located in the memory connected to the lower two data bytes.

    We have to find the root cause. What do you suggest?
    Could you please provide a test to isolate the problem?


    Regards Nello Michele

  • Hi Nello,

    Ok thanks for that additional information.

    Would you be able to perform an A-B-A swap of the DDR3 memory connected to the two lower bytes to see if the failure follows the DDR3 device currently populated on the PCB?

    Regards,
    Kevin

  • No, we can only replace the DDR chip with a new one (we do not have a reballing machine, so we cannot swap with 100% success).
    Moreover, we need to be sure we have done all possible checks in the current state to find the root cause.

    We can foresee 3 possible outcomes:
    1. The problem follows the DDR chip (we would ask Micron for a chip analysis). Can you confirm?
    2. The problem disappears (in this case, a production problem?). Please consider that when leveling is OK, the board has been stressed at -20/+60 °C for 10 h without problems.
    3. The problem remains as it is (in this case, the DDR controller?). Are there other tests we could run in this direction before reworking?

    regards Nello

  • Hi Kevin,

    Sorry to press you: I'm the MTA HW design manager. The customer is escalating to get the root cause.

    I would ask you to focus on case 3 and tell us whether we could run further tests before starting the rework which, as Nello said, carries the risk of damaging the DDR and closing the analysis unsuccessfully.

    Many thanks

    Br

    Paolo

  • Hi Paolo,

    Because the failure is specific to hardware training, there is not much from a software perspective that can be performed to isolate the issue further. A-B-A swaps and/or physical measurements would be needed.

    A few thoughts:

    • Check voltage rails (vdd_core / DDR IO supply)
      • Measure with scope / probe, ensuring value matches expected value
      • Try increasing vdd_core and the DDR IO supply by 50 mV to see if there is any impact on the failure
    • Since re-initializing / re-training resolves the issue, verify that the power-up / reset procedure matches expectations. At a minimum,
      • Ensure power supplies ramped and stable at least 200 us before DDR reset released (rising edge)
      • Check that there is a 500 us delay between DDR reset (rising edge) and DDR cke (rising edge)
      • Check that clocks are locked and stable prior to DDR cke active (rising edge)
    • Since the device fails write leveling,
      • Turn off all trainings except write leveling,
      • Try to probe DQS and DQ during the write leveling procedure to ensure activity. Perform this on a good byte lane (byte lane 2 or 3) as well as the bad byte lanes (byte lane 0 or 1) for comparison. 
      • If the TI processor is sending activity on the DQS but the DRAM is not returning anything on the DQ, then it seems like the issue may be on the DRAM side.
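
    The three power-up / reset checks listed above can be written down as a small script for evaluating scope captures. This is a sketch only: the function and argument names are made up for this example, all timestamps are assumed to be in microseconds on a common time base, and the example values are hypothetical, not measurements.

```python
MIN_SUPPLY_STABLE_TO_RESET_US = 200.0  # supplies stable before DDR reset rising edge
MIN_RESET_TO_CKE_US = 500.0            # DDR reset rising edge to DDR cke rising edge


def check_power_up(t_supply_stable_us, t_reset_rise_us, t_cke_rise_us,
                   t_clock_stable_us):
    """Return a list of violated checks; an empty list means all three pass."""
    problems = []
    if t_reset_rise_us - t_supply_stable_us < MIN_SUPPLY_STABLE_TO_RESET_US:
        problems.append("supplies stable < 200 us before reset release")
    if t_cke_rise_us - t_reset_rise_us < MIN_RESET_TO_CKE_US:
        problems.append("reset release to cke < 500 us")
    if t_clock_stable_us > t_cke_rise_us:
        problems.append("clock not stable before cke")
    return problems


# Hypothetical timestamps, for illustration only.
print(check_power_up(0.0, 250.0, 800.0, 700.0))
print(check_power_up(0.0, 150.0, 600.0, 700.0))
```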

    In the case of #3 (problem still exists with new DDR memory), you might be able to return the TI device for further analysis as outlined here: https://www.ti.com/support-quality/additional-information/customer-returns.html . 

    Regards,
    Kevin

  • Hi Kevin,

    If "HW Leveling" fails, do you have a retry procedure to redo HW leveling?

    What do you suggest doing if something goes wrong during the HW leveling procedure (done by the bootloader software)?

    As we already said (at the start of this thread), "As a preliminary workaround, we introduced a 100 ms delay between a failure event and a retry of the EMIF init configuration and leveling (no action on the DDR, only on the EMIF), with a 100% positive result: EMIF leveling OK.", so we wonder whether you have a retry procedure to follow.

    Thanks in advance

    Regards

    Michele

  • Hi Kevin,

    We would like to inform you that we replaced one DDR3 IC, the one connected to the two lower bytes of EMIF1 of the J6+.

    After thousands of restarts, we no longer see the issue.
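
    As a rough plausibility check, the sketch below computes how unlikely that many clean boots would be if the original 20-30% failure rate were still present. It assumes boots are independent and uses 1000 boots as a conservative lower bound for "thousands"; `prob_all_pass` is a name made up for this example.

```python
def prob_all_pass(fail_rate, boots):
    """Probability that `boots` independent boots all pass at the given per-boot fail rate."""
    return (1.0 - fail_rate) ** boots


for fail_rate in (0.20, 0.30):
    print(fail_rate, prob_all_pass(fail_rate, 1000))
```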

    Thanks for your support

    Regards

    Michele and Nello