Linux/TMS320DM8168: MSATA drive link down at 60C but comes up at room temperature.

Part Number: TMS320DM8168

Tool/software: Linux

We have tried mSATA drives from multiple vendors and have eliminated 2 of them because of slow link speed. Some drives come up at 3 Gb/s and some at 1.5 Gb/s. We can't tolerate the 1.5 Gb/s rate because of the long boot time and the sluggish system response that follows. Our system uses only 1 SATA link (port 2) on the TI DSP.

 

This one drive works great, except that at 65C it is not even detected after a power cycle. The attached logs show our sequence. When our main voltage is power cycled, no shutdown command is issued, so our software detects this and requires a reboot. This second reboot then repeats until we power cycle the unit.

 

We found that 1 vendor's drive works from -35C to 65C across power cycles but occasionally enters this failure state at 65C.

 

  1.  Power cycle at 65C and the system does not even recognize the MSATA.

  2. Our system hangs 2 minutes and reboots. 

  3. Again the MSATA is not detected.

  4. This repeats forever until we power cycle.

     

If we lower the temperature to 30C it still does not reboot on its own. We think the 65C temperature initiates the problem but lowering the temperature does not allow it to recover.

 

Here is the startup sequence to compare with a good startup. The left side below shows the problem, where ata2 is down: the drive is not detected for some reason. We do not have an ata1 device.

 

 

You can compare the first 2 files. The left side is the teraterm WCS2 May 7… and the right side is the teraterm May Virtium good.

 

This always occurs at ~1.35 seconds on power cycle as recorded by the kernel. 
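To make the good/bad comparison repeatable across power cycles, the link state of one port can be pulled out of the kernel log automatically. The following is only a sketch: the function name `ata_link_state` is hypothetical, and it assumes the kernel prints the usual libata messages of the form `ata2: SATA link up 3.0 Gbps (...)` or `ata2: SATA link down (...)`.

```shell
#!/bin/sh
# Sketch: classify the link state of one ATA port from kernel log text on stdin.
# Assumes libata-style "ataN: SATA link up <speed>" / "ataN: SATA link down" lines.
# Prints "up <speed>", "down", or "unknown" (no link message found for the port).
ata_link_state() {
    port="$1"
    # Keep only the last link message for this port; later retries override earlier ones.
    line=$(grep -E "${port}: SATA link (up|down)" | tail -n 1)
    case "$line" in
        *"SATA link up"*)
            # Extract the negotiated speed, e.g. "3.0 Gbps" or "1.5 Gbps".
            speed=$(printf '%s\n' "$line" | sed -n 's/.*SATA link up \([0-9.]* Gbps\).*/\1/p')
            echo "up ${speed}"
            ;;
        *"SATA link down"*)
            echo "down"
            ;;
        *)
            echo "unknown"
            ;;
    esac
}
```

On the target this could be run as `dmesg | ata_link_state ata2` after each power cycle, logging the result together with the chamber temperature.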

 

  • Hi Clark,

    There are some silicon errata notes related to SATA module, please make sure you are aligned with these:

    Advisory 2.1.36 SERDES Transit Signals Pass ESD-CDM Up To ±150 V
    Advisory 2.1.105 SATA: Gen2 Link Fails With Some of GEN3 Devices

    Advisory 2.0.64 — SATA: Link Establishment Fails With SATA GEN3 Capable Targets - this one is NOT valid for 2.1 silicon revision devices

    Check also DM816x datasheet, section 9.17.1 SATA Interface Design Specifications


    A heat dissipation solution is required for proper device operation. Thermal performance of the overall system must be carefully considered to ensure conformance with the recommended operating conditions. Heat generated by this device must be removed with the help of heat sinks, heat spreaders, or airflow. SmartReflex can significantly lower the power consumption of this device and its use is required for proper device operation. A thermal model can be provided for thermal simulation to estimate the system thermal environment.

    Check also below e2e threads:

    e2e.ti.com/.../123355
    e2e.ti.com/.../475302
    e2e.ti.com/.../360053

    Regards,
    Pavel
  • I reviewed your suggestions and can confirm that the DM8168 we are using is silicon revision 2.1 only.

    1. Advisory 2.1.36 is for ESD. I don't think this applies to the msata coming up. If this does become a problem for my design then a board spin may be required for the new protection. I am not aware of any protection for the 3Gb/s serdes pairs and would ask TI for a suggestion that does not degrade performance.

    2. Advisory 2.1.105 is the main SATA problem that initiated this trouble report. I have found this suggestion is now invalid as the vendors I am working with do not make GEN2 or GEN1 devices. I have asked for firmware from the msata vendors to force the memory to GEN2 (3.0Gb/s) speeds but no vendors have provided a solution. I think TI needs to revisit this as this seems only valid for old 2.5" harddrives and is not possible for the new form factor msata parts. TI's DSP needs to work with the new GEN3 devices.

    3. Advisory 2.0.64 is not applicable for 2.1 silicon.

    4. Thermal suggestions. I think this is a contributing factor, but this is a proven design of 2+ years and the only change is the move to a new NAND flash (msata). The DSP has a heatsink and strong forced-air cooling, and software reads the DSP temperature as ~76C in a 65C ambient. I think this is well within the TI DSP operating range of 0 to 105C.

    Conclusion:

    1. I have 2 msata ports connected to the DSP. Port 2 is a 52pin PCIE (msata) connector and the Port 1 is a 7pin (TX/RX only) cable connection with local power. The 52pin trace lengths are near the max length allowed (10") at 9.5" but the 7 pin msata lengths are much shorter at ~5.5". At 65C Port 2 (52pin) the msata does not come up but port 1 (7 pin) comes up at 65C and 70C on 10+ power cycles. I think the signal integrity of the msata is degrading on the long traces and is not able to establish a connection at 3Gb/s or 1.5Gb/s. I don't know if the problem is the TX/RX side of the DSP or the memory. Can TI provide guidance on how to check this? I don't have a SATA analyzer and don't know how to test in circuit.

  • Clark Tollerson said:

    2. Advisory 2.1.105 is the main SATA problem that initiated this trouble report. I have found this suggestion is now invalid as the vendors I am working with do not make GEN2 or GEN1 devices. I have asked for firmware from the msata vendors to force the memory to GEN2 (3.0Gb/s) speeds but no vendors have provided a solution. I think TI needs to revisit this as this seems only valid for old 2.5" harddrives and is not possible for the new form factor msata parts. TI's DSP needs to work with the new GEN3 devices.

    The DM816x is a legacy device and no more HW/SW updates are planned. For new designs, the AM572x is recommended. Refer to the below e2e post:

    Check also below wiki, these also have some info regarding SATA GEN3:

    Clark Tollerson said:
    1. I have 2 msata ports connected to the DSP. Port 2 is a 52pin PCIE (msata) connector and the Port 1 is a 7pin (TX/RX only) cable connection with local power. The 52pin trace lengths are near the max length allowed (10") at 9.5" but the 7 pin msata lengths are much shorter at ~5.5". At 65C Port 2 (52pin) the msata does not come up but port 1 (7 pin) comes up at 65C and 70C on 10+ power cycles. I think the signal integrity of the msata is degrading on the long traces and is not able to establish a connection at 3Gb/s or 1.5Gb/s. I don't know if the problem is the TX/RX side of the DSP or the memory. Can TI provide guidance on how to check this? I don't have a SATA analyzer and don't know how to test in circuit.

    I will check this and come back to you.

    Regards,
    Pavel

  • Clark Tollerson said:
    1. I have 2 msata ports connected to the DSP. Port 2 is a 52pin PCIE (msata) connector and the Port 1 is a 7pin (TX/RX only) cable connection with local power. The 52pin trace lengths are near the max length allowed (10") at 9.5" but the 7 pin msata lengths are much shorter at ~5.5". At 65C Port 2 (52pin) the msata does not come up but port 1 (7 pin) comes up at 65C and 70C on 10+ power cycles. I think the signal integrity of the msata is degrading on the long traces and is not able to establish a connection at 3Gb/s or 1.5Gb/s. I don't know if the problem is the TX/RX side of the DSP or the memory. Can TI provide guidance on how to check this? I don't have a SATA analyzer and don't know how to test in circuit.

    Check the below pointers:

    Also I found the below info:

    I know how to run the SATA TSG compliance test. These tests are on SATA TX. The steps are:

    1. Power cycle the board and boot Linux.
    2. Once Linux has booted, type the following at the Linux command prompt for the MFTP pattern:

    MFTP

    mem_rdwr.out --wr 4a14012c 320;
    mem_rdwr.out --wr 4a1400a4 40706

    1. Power cycle the board and boot Linux.
    2. Once Linux has booted, type the following at the Linux command prompt for the HFTP pattern:

    HFTP
    mem_rdwr.out --wr 4a14012c 320;
    mem_rdwr.out --wr 4a1400a4 40707

    Follow the same procedure for the rest of the patterns below:

    LBP
    mem_rdwr.out --wr 4a14012c 320;
    mem_rdwr.out --wr 4a1400a4 40705

    LFTP
    mem_rdwr.out --wr 4a14012c 320;
    mem_rdwr.out --wr 4a1400a4 40708

    SSOP
    mem_rdwr.out --wr 4a14012c 320;
    mem_rdwr.out --wr 4a1400a4 40700

    The above are for GEN2. For GEN1, type the following at the command prompt:

    Gen1:

    MFTP
    mem_rdwr.out --wr 4a14012c 310;
    mem_rdwr.out --wr 4a1400a4 40706

    HFTP
    mem_rdwr.out --wr 4a14012c 310;
    mem_rdwr.out --wr 4a1400a4 40707

    LBP
    mem_rdwr.out --wr 4a14012c 310;
    mem_rdwr.out --wr 4a1400a4 40705

    LFTP
    mem_rdwr.out --wr 4a14012c 310;
    mem_rdwr.out --wr 4a1400a4 40708

    SSOP
    mem_rdwr.out --wr 4a14012c 310;
    mem_rdwr.out --wr 4a1400a4 40700
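The command pairs above all follow one scheme: the first write selects the generation (320 for GEN2, 310 for GEN1) and the second selects the pattern. As a convenience, a small helper could print the pair for a given generation and pattern. This is only a sketch: the function name `sata_pattern_cmds` is hypothetical, the addresses and values are copied from the lists above, and the helper only prints the commands rather than writing any register.

```shell
#!/bin/sh
# Sketch: print (do not execute) the mem_rdwr.out command pair for one test pattern.
# Usage: sata_pattern_cmds <gen1|gen2> <SSOP|LBP|MFTP|HFTP|LFTP>
sata_pattern_cmds() {
    gen="$1"
    pattern="$2"
    # Value written to 4a14012c, as listed above for each generation.
    case "$gen" in
        gen1) speed=310 ;;
        gen2) speed=320 ;;
        *) echo "unknown generation: $gen" >&2; return 1 ;;
    esac
    # Value written to 4a1400a4, as listed above for each pattern.
    case "$pattern" in
        SSOP) code=40700 ;;
        LBP)  code=40705 ;;
        MFTP) code=40706 ;;
        HFTP) code=40707 ;;
        LFTP) code=40708 ;;
        *) echo "unknown pattern: $pattern" >&2; return 1 ;;
    esac
    echo "mem_rdwr.out --wr 4a14012c ${speed}"
    echo "mem_rdwr.out --wr 4a1400a4 ${code}"
}
```

For example, `sata_pattern_cmds gen1 LFTP` prints the two GEN1 LFTP commands, which can then be pasted at the prompt after a fresh power cycle as described in the steps above.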

    I have run the SATA TSG compliance test on one Netra EVM and UDWORKS board. The GEN2 jitter is within spec. You will need a Tektronix CRO and SATA compliance suite installed on the CRO.

     

    Regards,
    Pavel