TDA2SX: thermal issue: critical temperature reached (125°C)

Part Number: TDA2SX

Hello, Cherry,

Nice to communicate with you. Last week I ran into a thermal issue on the SoC. While the app was running, the following log was printed on screen and the app stopped:

[ 2740.140898] critical temperature 1 reached
[ 2740.141423] thermal thermal_zone0: critical temperature reached(125 C),shutting down
[ 2740.227952] Attempting kernel_power_off
[ 2740.228586] reboot: Failed to start orderly shutdown: forcing the issue
[ 2740.243864] reboot: Power down
[ 2740.244359] reboot: Power down
[ 2740.245737] palmas_power_off: Unable to write to DEV_CTRL_DEV_ON: -121
[ 2740.246570] kernel_power_off has failed! Attempting emergency_restart

It seems that some core in the SoC became too hot to continue running, so the thermal monitoring powered off the system. The current temperature of each sensor in the SoC can be checked with the commands below:

                cat /sys/class/thermal/thermal_zone0/temp 
                cat /sys/class/thermal/thermal_zone1/temp 
                cat /sys/class/thermal/thermal_zone2/temp 
                cat /sys/class/thermal/thermal_zone3/temp 
                cat /sys/class/thermal/thermal_zone4/temp 
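
Each zone directory in the Linux thermal sysfs also contains a `type` file next to `temp`, so the sensor name of every zone can be read directly from the running kernel. A minimal sketch, assuming Python is available on the target and that `temp` reports millidegrees Celsius (the usual sysfs convention); the helper name `read_thermal_zones` is only illustrative:

```python
from pathlib import Path

PREFIX = "thermal_zone"


def read_thermal_zones(base="/sys/class/thermal"):
    """Return a list of (zone name, sensor type, temperature in deg C)."""
    zones = [
        p for p in Path(base).glob(PREFIX + "*")
        if p.name[len(PREFIX):].isdigit()
    ]
    # Sort numerically so thermal_zone10 comes after thermal_zone2.
    zones.sort(key=lambda p: int(p.name[len(PREFIX):]))

    readings = []
    for zone in zones:
        sensor_type = (zone / "type").read_text().strip()
        milli_c = int((zone / "temp").read_text().strip())
        readings.append((zone.name, sensor_type, milli_c / 1000.0))
    return readings
```

Calling `read_thermal_zones()` on the board and printing each tuple shows which sensor name belongs to which zone number, without relying on a mapping taken from another device family.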

I've searched some documents and found the following mapping between thermal zones and cores for TDA4x:

thermal_zone0 -> WKUP domain DMSC core 

thermal_zone1 -> MAIN domain MPU A72 core

thermal_zone2 -> MAIN domain C7x core

thermal_zone3 -> MAIN domain GPU core

thermal_zone4 -> MAIN domain R5F core

But I can't find the mapping for TDA2x (TDA2SX). Could you kindly tell me the mapping between cores and thermal zones? Thanks in advance.

Another question:

Could you kindly tell me what kinds of reasons could make a core too hot to run? I've checked the load of every core: the A15 load is about 40~50%, DSP2 is at 60%, IPU2 is at 40%, and I can't get the stat data for DSP1. That seems OK. However, the temperature gets very high when SR1 has little free memory, and I can't see the mechanism linking the memory and the temperature.

thanks in advance.

  • The 5 sensors are near MPU, GPU, the core, IVA, and DSPEVE.

    The SoC consumes power and this turns into heat; the temperature of the SoC will increase until the rate of heat generation matches the rate at which heat is removed from the system. This is a basic idea, but maybe it is worth thinking about how you are getting heat out of the system...
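
    The balance described above can be written as a simple lumped thermal model. This is only a sketch: the symbols $C_{th}$ (heat capacity of the part and its surroundings), $R_{th}$ (thermal resistance from the part to ambient), $P$ (dissipated power), and $T_a$ (ambient temperature) are assumptions introduced here for illustration, not values from the TDA2SX documentation.

```latex
% Heat stored = heat generated - heat removed to ambient
C_{th}\,\frac{dT}{dt} = P - \frac{T - T_a}{R_{th}}

% The temperature stops rising when both terms on the right are equal:
\frac{dT}{dt} = 0
\quad\Longrightarrow\quad
T_{ss} = T_a + P\,R_{th}
```

    In this model the steady-state temperature can be lowered either by reducing $P$ (less load) or by reducing $R_{th}$ (a better path for heat to leave the system).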

    Kevin

  • Hello, Kevin,

    Thanks for your kind explanation.


    First, one thing needs to be confirmed: is the mapping between sensors and cores mentioned in your response as follows?
    thermal_zone0 -> MPU (I deduce that this is the A15 core)

    thermal_zone1 -> GPU

    thermal_zone2 -> the core (IPU?)

    thermal_zone3 -> IVA

    thermal_zone4 -> DSP and EVE

    The PCB lies on a desk in our office. The air conditioning is running, and the room temperature is about 20 to 25 °C, so the SoC gets little heat from the environment.

    Regarding the mechanism of heat generation and removal, I agree that the heat comes from the power consumption of the SoC. I have observed that the temperature of the SoC increases as the A15 load becomes heavier, e.g. from 5% to 40%. So I guess that when the A15 load increases, the power consumption increases too, and then the temperature of the SoC goes up.

    Another observation is that when the available SR1 memory decreases, the temperature of the SoC increases. Does this mean that when SR1 has little free space, the kernel does more work to search for and allocate memory for the app, so that power consumption increases and the temperature of the SoC goes up?

    Thanks and best regards.


  • I have copied header file info so that you can see the structure, but "yes" your understanding is correct:

    • MPU --> 0
    • CORE --> 1
    • IVA --> 2
    • DSPEVE --> 3
    • GPU --> 4

    With regards to the real problem -- the part is getting hot -- can you add a heat sink or a fan? What I am writing is obvious, but... the part becomes increasingly efficient at dissipating heat as the temperature difference between the part and the ambient increases. So, at some point, the temperature will level off, but waiting for that is not the solution.

    • The part can only dissipate heat to the ambient through its surface so a heat sink increases the surface.
    • When a part has a heat sink, most of the heat is dissipated through the fins of the heat sink; without a heat sink, the heat flows through the BGA balls into the board. The board, then, dissipates heat to the ambient. 
    • The fan circulates air so that the "local" ambient does not rise and make the temperature difference small (and less efficient)
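
    The effect of the three points above can be illustrated with the common steady-state approximation T = T_ambient + P × R_th. The sketch below is only a worked example under that assumption: the power figure and the thermal resistances are made-up illustrative numbers, not TDA2SX data sheet values, and the function names are hypothetical.

```python
def steady_state_temp_c(ambient_c, power_w, r_th_c_per_w):
    """Steady-state part temperature for a lumped thermal resistance."""
    return ambient_c + power_w * r_th_c_per_w


def max_thermal_resistance(limit_c, ambient_c, power_w):
    """Largest part-to-ambient resistance that keeps the part at or below limit_c."""
    return (limit_c - ambient_c) / power_w


# Illustrative numbers only (assumptions, not measured values).
AMBIENT_C = 25.0
POWER_W = 8.0
CASES = {
    "bare part, still air": 12.0,   # deg C per watt
    "with heat sink": 6.0,
    "heat sink plus fan": 4.0,
}

for label, r_th in CASES.items():
    temp = steady_state_temp_c(AMBIENT_C, POWER_W, r_th)
    print(f"{label}: {temp:.0f} C")
```

    With these assumed numbers, halving the thermal resistance removes roughly half of the temperature rise above ambient, which is why a heat sink or fan usually has more effect than a modest reduction in load.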

    typedef enum pmhalPrcmVdId
    {
        PMHAL_PRCM_VD_LOWER_BOUND = (-(int32_t)1),
        /**< Lower Bound Exclusive */
        PMHAL_PRCM_VD_MIN = 0,
        /**< Minimum Voltage Domain */
        PMHAL_PRCM_VD_MPU = PMHAL_PRCM_VD_MIN,
        /**< VD_MPU Voltage Domain */
        PMHAL_PRCM_VD_CORE = 1,
        /**< VD_CORE Voltage Domain */
        PMHAL_PRCM_VD_IVAHD = 2,

    ...

    Kevin

  • Hello, Kevin,

    Thanks for your kind help in clarifying the concepts and mechanism of heat dissipation.

    After adding a heat sink, the chip temperature stays below 90 °C. Before that, I wanted to solve this issue in software, for example by reducing the load on the hottest core. It looks like a hardware and mechanical solution is more effective for this issue.
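
    For the software side, one option is to watch how close a zone is to its critical trip point and reduce load before the kernel shuts down. A minimal sketch, assuming the standard Linux thermal sysfs files `temp`, `trip_point_N_type` and `trip_point_N_temp` (values in millidegrees Celsius) are present on the board; the function names are hypothetical, and what the app does when the margin gets small is left open.

```python
from pathlib import Path


def critical_trip_c(zone_dir):
    """Return the critical trip temperature in deg C, or None if the zone has none."""
    zone = Path(zone_dir)
    for type_file in sorted(zone.glob("trip_point_*_type")):
        if type_file.read_text().strip() == "critical":
            # trip_point_N_type -> trip_point_N_temp
            temp_file = type_file.with_name(type_file.name.replace("_type", "_temp"))
            return int(temp_file.read_text().strip()) / 1000.0
    return None


def margin_to_critical_c(zone_dir):
    """Return (current temp, critical trip, remaining margin), all in deg C."""
    zone = Path(zone_dir)
    current = int((zone / "temp").read_text().strip()) / 1000.0
    critical = critical_trip_c(zone)
    margin = None if critical is None else critical - current
    return current, critical, margin
```

    An app could poll `margin_to_critical_c("/sys/class/thermal/thermal_zone0")` periodically and lower its workload when the margin falls below a threshold it chooses. This only complements the heat sink; it does not remove any heat by itself.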

    It is also said that a chip produces more faults as its temperature increases. So the whole cause-effect chain becomes clear:
    1. The app puts a heavy load on the core.
    2. The core becomes hot.
    3. The SoC produces more errors.
    4. The app produces faults.

    This chain could explain the observed behavior.

    Thanks and best regards.