AM5748: stuck at high temperature

Part Number: AM5748

AM57x Team,

My customer ran into an issue with the AM5748ABZXA getting stuck after being operated at a high temperature during testing.

2 out of 10 units were observed to have this issue while chamber testing.

Observations:

  • At high junction temperature (> 133C) the processor would reboot, reach U-Boot, try to start Linux, and reboot again. The cycle continues until the temperature is lowered, which breaks the loop
  • If the unit is left in this reboot cycle for too long (> 1 hour or so), it will eventually stop rebooting and get “stuck”
  • When it is stuck, lowering the temperature will not revive it, i.e. it will not reboot anymore
  • Even after keeping the ambient temperature around the AM5748 at room temperature for 2 days, manually resetting the AM5748 (which also issues a power-on-reset) does not boot it up
    • I made sure that power-on-reset is applied during my troubleshooting
    • When I manually reset it, the soc didn’t issue a reset_out
  • The only thing that can revive it is by doing a power cycle (removing power and applying power)

 

What has been checked:

  • Sysboot pins are at correct level the whole time the unit is “stuck”. So when manual reset is applied, the straps are at proper level
  • Clock input is present and correct
  • PMIC got reset and PMIC went out of reset (it releases power-on-reset) indicating that all voltage output are good

 

I also noticed that the SoC has an internal watchdog (WD_Timer2) that should kick every 3 minutes during booting if the system fails to boot, i.e. if it fails to find a device to boot from or if it can’t execute the image.

However, I didn’t see this 3 minute interval during this “stuck” condition.

 

I don’t think I can use a JTAG debugger for this issue since this issue occurs very early on in the booting process (in the ROM startup/initialization sequence).

 

I would like to know what is happening and where in the boot process it is stuck.

Thanks,

Tom

  • Hi,

    I would like to give you some updates: 

    - During this stuck condition, VDD_CORE (SMPS6 output of PMIC: TPS6590378ZWSR) is at 0.00V. The rest of the voltage rails are good.

    At power up this VDD_CORE normally is 1.14V (OTP) and changes to 1.01V. But in this stuck condition, it becomes 0.00V.

    How is this possible? For this specific rail to be 0.00V, AM5748 has to either set PMIC SMPS6 to OFF or change the VSEL to 0.00V through I2C.

    The only way to restore VDD_CORE to 1.14V is by power cycling it. Even cutting 3.3V supply to PMIC's VCC won't restore it.

    1. Why is that?

    2. Can you help me understand this? What is the mechanism of PMIC configuration/initialization by AM5748 during bootup?

    3. Is there a document that outlines this process?

    4. Also, where can I find information on this VDD_CORE = 1.01V ? Why did it change to 1.01V?

    Thanks,

    Louis

  • Louis, Tom,

    I can comment on the SoC side, we may need other input on the PMIC side.

    The SoC has a built in TSHUT mechanism that will force an internal reset when the on die temp sensors reach 123 degC.  It's possible depending on the ambient temperature, processor load, and cooling mechanisms that the part can get stuck in a reset loop.

    TSHUT is documented in the TRM in Section 19.4.6.2 Thermal Management Related Registers

    Re: the 1.14V (OTP) to 1.01V observation:  The processor has a feature called AVS, Adaptive Voltage Scaling.  Each SoC is fused with a set of unique voltage settings in the SoC AVS registers (refer to the table "Registers Associated With AVS Class 0 Voltage"). 

    In normal operation the device boots at the NOM voltages (1.15V for CORE and MPU). SW then reads the AVS EFuse register for the appropriate domain and OPP and sends an I2C command to the PMIC to set the "optimized" voltage level for that domain/OPP on that particular device.

    AVS is documented in the TRM in section 19.4.6.12 AVS Class 0 Associated Registers.
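
    The AVS step described above can be sketched as follows. This is a minimal illustration, not the SDK's code. The 12-bit millivolt field in the efuse word, the SMPS encoding (0.50V at code 6, 10mV per step), and the accepted 850mV to 1150mV window are all assumptions to check against the TRM and the PMIC register manual, and the function names are hypothetical. The sketch also shows one possible guard against a bogus efuse read such as 0x000: fall back to the nominal voltage instead of passing the value on to the PMIC.

```python
NOMINAL_CORE_MV = 1150                 # NOM boot voltage for CORE quoted in this thread
AVS_MIN_MV, AVS_MAX_MV = 850, 1150     # assumed window of plausible AVS values
SMPS_BASE_MV = 500                     # assumed voltage at the base code
SMPS_STEP_MV = 10                      # assumed step size
SMPS_BASE_CODE = 6                     # assumed code for the base voltage


def efuse_to_mv(efuse_word):
    """Extract the AVS voltage in mV (assumed to sit in bits 11:0)."""
    return efuse_word & 0xFFF


def mv_to_vsel(mv):
    """Convert millivolts to a VSEL code under the assumed linear encoding."""
    if mv < SMPS_BASE_MV:
        raise ValueError("voltage below the assumed SMPS range")
    steps, remainder = divmod(mv - SMPS_BASE_MV, SMPS_STEP_MV)
    if remainder:
        steps += 1                     # round up so the rail is never undervolted
    return SMPS_BASE_CODE + steps


def core_vsel_from_efuse(efuse_word):
    """Return the VSEL to program, falling back to NOM on an implausible read."""
    mv = efuse_to_mv(efuse_word)
    if not AVS_MIN_MV <= mv <= AVS_MAX_MV:
        mv = NOMINAL_CORE_MV           # e.g. efuse read back as 0x000
    return mv_to_vsel(mv)
```

    With these assumptions, an efuse value of 1010 maps to code 57 (0x39), while a read of 0x000 is rejected and the nominal 1150mV (code 71, 0x47) is used instead.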

    Regards,

    Kyle

  • Hi Kyle,

    Thanks for your reply! Regarding the TSHUT: yes, I agree, as long as the ambient temperature is high, it might get stuck in a reset loop due to die temp sensor doing its work.

    However, I lowered the ambient temperature to 20C for 2 days and uboot was still unable to start. The only way to reboot it in this scenario is to power cycle the system (removing both 3.3V and 5V to PMIC). And we found out that this happened when VDD_CORE is 0.00V. 

    Regarding the 0.00V of VDD_CORE, I can think of several scenarios (can you please comment on them):

    - Is it possible that after many tries, processor decides to "kill" the system by sending 0V for vdd_core to PMIC? (sounds crazy and extreme, but just some thought)

    - according to fig. 19-13 in sec. 19.4.6.12, boot loader determines the OPP and read values from CTRL_CORE_STD_FUSE_OPP_VMIN_CORE_2 register and update PMIC through I2C. Is it possible that it reads 0x000 and write it to PMIC?

    - overcurrent in PMIC SMPS6 shutting down VDD_CORE

    Another question:

    - this AVS communication between processor and PMIC seems a bit risky, especially in unknown rebooting conditions. Would it be possible to set NOM for CORE at 1.01V and disable the I2C command sending for CORE?

    - or perhaps even disable (I2C for AVS) and set hardened voltage values?

    Thanks,

    Louis

  • Louis, 

    - Is it possible that after many tries, processor decides to "kill" the system by sending 0V for vdd_core to PMIC? (sounds crazy and extreme, but just some thought)

    No, that is extremely unlikely unless you added code that does this, which I don't suspect.

    - according to fig. 19-13 in sec. 19.4.6.12, boot loader determines the OPP and read values from CTRL_CORE_STD_FUSE_OPP_VMIN_CORE_2 register and update PMIC through I2C. Is it possible that it reads 0x000 and write it to PMIC?

    That is unlikely for this part, which has been in production for years and is very mature.  You can easily check this by connecting Code Composer and reading that register's address in a memory window.  

    - overcurrent in PMIC SMPS6 shutting down VDD_CORE

    I haven't heard of anything like this.  But I'll pull in the PMIC team to comment on whether there is any configuration that can be queried inside the PMIC to see what may be going on.

    - this AVS communication between processor and PMIC seems a bit risky, especially in unknown rebooting conditions. Would it be possible to set NOM for CORE at 1.01V and disable the I2C command sending for CORE?

    - or perhaps even disable (I2C for AVS) and set hardened voltage values?

    If you're using a recent "enough" SDK then the AVS code should be quite mature.  

    The 1.01V in efuse is "optimized" for the exact/single unit you are debugging at the moment.  Other units may have higher or lower (or the same...) voltage.  If you were to set 1.01V for units that required higher voltage then you can run into functionality issues.

    On the other hand, if you choose to use a fixed 1.15V (which is valid but not recommended) the vdd_core domain will draw significantly higher current and consume more power and generate more heat.  So if you really have a power and heat dissipation problem then running at fixed 1.15V will be going in the wrong direction.

    Regards,

    Kyle

  • Thanks Kyle,

    I know that you are not the expert on the PMIC. Can you please relay this to the PMIC team? Or how do I tag the PMIC team in this thread?

    At this point I am looking more and more at the PMIC. In this "stuck" condition, where VDD_CORE is at 0V, why did the PMIC (TPS6590378) release the reset? Shouldn't it hold RESET_OUT low until VDD_CORE is back to OTP (1.15V)?

    SMPS6's max current output is 3A, so it is possible that a short circuit condition occurred due to continuous rebooting at high temperature. I noticed that the current consumption can jump quite high during bootup. If it ever gets to a short condition that leads to SMPS6 shutting down, who will reset it (INT2_STATUS register)?

    Thank you!

    Louis

  • Hi Louis,

    I will provide an update to this tomorrow.

    Thanks,

    Daniel W

  • Hi Louis,

    You should be able to check if there was an overcurrent event in SMPS6 by checking the SMPS_SHORT_STATUS register. Additionally, the device may be shutting off an SMPS due to it crossing a thermal threshold. You can check this in the  SMPS_THERMAL_STATUS register.

    https://www.ti.com/lit/pdf/sliu015

    Additionally, as mentioned above, the SMPS can be disabled through I2C.
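
    Once a status byte has been read from SMPS_SHORT_STATUS or SMPS_THERMAL_STATUS, decoding it could look like the sketch below. The bit assignment in `ASSUMED_STATUS_BITS` is a placeholder, not taken from this thread, and must be confirmed against the register manual linked above. The I2C read itself is left out, and `flagged_rails` is a hypothetical helper name.

```python
# Hypothetical bit-to-rail assignment; verify against the register manual.
ASSUMED_STATUS_BITS = {
    0: "SMPS12",
    1: "SMPS45",
    2: "SMPS3",
    3: "SMPS6",
    4: "SMPS7",
    5: "SMPS8",
    6: "SMPS9",
}


def flagged_rails(status_byte, bit_names=ASSUMED_STATUS_BITS):
    """Return the names of all rails whose bit is set in the status byte."""
    if not 0 <= status_byte <= 0xFF:
        raise ValueError("status must be a single byte")
    return [
        name
        for bit, name in sorted(bit_names.items())
        if status_byte & (1 << bit)
    ]
```

    Under the assumed layout, `flagged_rails(0x08)` reports only SMPS6, which is the rail feeding VDD_CORE in this design.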

    Thanks,

    Daniel W

  • Hi Daniel,


    Thanks. I did an experiment on a separate unit (not the rebooting unit at high temperature) on my desk and shorted output of SMPS6 to ground. I was able to read the short status in SMPS_SHORT_STATUS register. Now, how does PMIC recover from this situation? Who is responsible to clear it when this happens? Because when it happens, processor (AM5748) is not running.

    I tried several things through I2C to reset it, like clearing INT2_STATUS, reading SMPS_SHORT_STATUS register, but none of them could reset it. 

    Thanks,

    Louis

  • Hi Louis,

    SMPS_SHORT_STATUS register should clear either on a read or a power-on reset, as described in the Datasheet, in section 5.3.2.1.3 Current Monitoring and Short Circuit Detection. https://www.ti.com/lit/gpn/tps659037

    However, if the SMPS is still shorted when the register is reset, then it will immediately trigger the short status register again.

    As for responsibility, both of these solutions require external control of the PMIC.

    For clearing any interrupts generated by this please see Datasheet section 5.3.8 on clearing interrupts.

    Additionally, when you are shorting SMPS6, please ensure you do not exceed 4A as this may permanently damage the device.
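
    The read-to-clear behavior described above can be pictured with a toy model. This is only a sketch of the described behavior, not the PMIC's actual logic, and the class and method names are hypothetical.

```python
class ShortStatusModel:
    """Toy model of a status bit that clears on read but re-triggers
    immediately if the short is still present."""

    def __init__(self):
        self.short_present = False   # physical condition on the rail
        self._latched = False        # what the status register holds

    def apply_short(self):
        self.short_present = True
        self._latched = True

    def remove_short(self):
        self.short_present = False   # the latched bit stays until it is read

    def read_status(self):
        value = self._latched
        # The read clears the bit; a persisting short sets it again at once.
        self._latched = self.short_present
        return value
```

    In this model a first read after the short is removed still returns the latched event, and only the following read returns a clear status.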

    Thanks,

    Daniel W

  • Hi Daniel,

    Thank you for the pointers. Last week, I read the SMPS_SHORT_STATUS register and it did indeed report a short. However, the device did not recover, even though the short condition had been removed before I read the register. I will try again and update you.

    My next question is: how come PMIC is out of reset when vdd_core is outputting 0V? Isn't it supposed to hold RESET_OUT pin low?

    Regards,

    Louis

  • Hi Daniel,

    I could recover the device after shorting SMPS6 briefly by reading the SMPS_SHORT_STATUS register and applying warm reset (RESWARM pin). 

    However, could you help me understand my observation: how come PMIC is out of reset when vdd_core is outputting 0V? Isn't it supposed to hold RESET_OUT pin low?

    Thanks,

    Louis

  • Hi Louis,

    The TPS6590378 will set RESET_OUT to high at the end of the power sequence. After this, if a short is detected and an SMPS is turned off it will not pull RESET_OUT low again. Also, during the power sequence, short detection is masked. However, the POWERGOOD pin should change when the short is registered.
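
    The behavior described above can be summarized in a small state model. It is a simplified sketch of this description only: the names are hypothetical, and representing the POWERGOOD change as a drop to "not good" is an assumption about the pin's polarity and configuration.

```python
class PmicResetModel:
    """Toy model: RESET_OUT goes high at the end of the power sequence and is
    not pulled low again when a later short turns off an SMPS."""

    def __init__(self):
        self.in_power_sequence = True
        self.reset_out = False       # held low until the sequence completes
        self.powergood = False
        self.smps6_on = False

    def finish_power_sequence(self):
        self.in_power_sequence = False
        self.smps6_on = True
        self.reset_out = True
        self.powergood = True

    def short_on_smps6(self):
        if self.in_power_sequence:
            return                   # short detection is masked during the sequence
        self.smps6_on = False        # the rail is turned off
        self.powergood = False       # assumed polarity of the POWERGOOD change
        # reset_out is deliberately left unchanged


pmic = PmicResetModel()
pmic.finish_power_sequence()
pmic.short_on_smps6()
print(pmic.reset_out, pmic.smps6_on, pmic.powergood)
```

    The model ends with RESET_OUT still high while SMPS6 is off, which matches the observation of VDD_CORE at 0V with the processor released from reset.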

    Thanks,

    Daniel W