AM5728 Thermal Management Queries

Hi All,

While testing the AM5728, we came across a situation where the CPU load suddenly went high. On debugging further, we found that the MPU OPP had switched automatically from HIGH (1.5 GHz) to OD (1.176 GHz).

Based on this, we have the following observations and queries.

  1. We believe that there are on-chip thermal sensors for each zone and the device driver reads temperature of that particular zone. Can you please confirm if this is the junction temperature of that particular zone?
  2. We found out that some “talert” interrupts were triggered when this switch from HIGH to OD had happened. On checking the PSDK driver we saw that the device responds to two events “hot threshold and cold threshold” by changing the OPP when it hits the hot threshold and till it reaches the cold threshold it works at a reduced configuration. The hot threshold was set to 100C and critical temperature was set to 125C (shut down temperature). Can you please confirm if our understanding is correct here?
  3. Also as per the Table 5-3 in the AM5728 datasheet Rev “O”, the maximum junction temperature of commercial grade part is 90C. But, we are not clear on why the driver code uses 100C for the hot threshold.
    1. We also read the MPU’s thermal sensor using the following command “cat /sys/class/thermal/thermal_zone0/temp” and see that the value is around 85C.
    2. However on measuring the package temperature of the processor at the same time using a thermal gun, we saw it to be 50C.
    3. Now as per Table 5-18 of the datasheet Rev “O” of AM5728, R-thetaJC (junction to case) is 0.82 C/W. The R-thetaJT (junction to top) is 0.62 C/W.
    4. Thus the junction temperature should have been around 3-5 C higher than the measured package top temperature.
    5. Can you please help us understand this observation, in particular the large difference between the measured package top temperature and the reported junction temperature?
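
The estimate in item 3.4 can be reproduced with a short calculation. This is a minimal sketch: it assumes the relation Tj = Ttop + R × P with the 0.62 C/W figure quoted above, and the 5 W to 8 W power range is an assumption chosen for illustration, not a measured value.

```python
# Expected junction-to-top temperature rise, assuming Tj = Ttop + R_JT * P.
R_THETA_JT = 0.62  # C/W, junction to package top (datasheet value quoted above)


def junction_rise(power_w, r_jt=R_THETA_JT):
    """Return the expected Tj - Ttop in degrees C for a given power in watts."""
    return r_jt * power_w


if __name__ == "__main__":
    # Assumed SoC power range for illustration; not a measured value.
    for power_w in (5.0, 8.0):
        rise = junction_rise(power_w)
        print(f"P = {power_w:.1f} W -> Tj - Ttop = {rise:.2f} C")
```

With these assumed power levels the expected rise is roughly 3 C to 5 C, far below the 35 C gap between the reported 85 C and the measured 50 C.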

 

It would be good if TI could help us understand the above observations and let us know what can be done in such events. Also, if TI has any documentation on thermal management (as handled by the Linux device drivers/frameworks), it would be good if it could be shared.

Regards

Ayusman

  • Hi Biser,

    We will go through this, but I don't feel this will be enough to answer our queries.
    As you can see, this link also states that the operating junction temperature is 0 to 90C. However, the driver uses 100C for the hot threshold. Do you have any idea why these values differ?

    It will be good if you can go through my original queries to get a feel of what experiments and study we have done so far and help us understand the behavior of processor better.

    Regards
    Ayusman
  • I will ask the factory team, but feedback may be delayed due to upcoming holidays.
  • Hello Ayusman,

    Answers to your questions are inline below, along with a couple follow-up questions.

    1. We believe that there are on-chip thermal sensors for each zone and the device driver reads temperature of that particular zone. Can you please confirm if this is the junction temperature of that particular zone?

      Correct

    2. We found out that some “talert” interrupts were triggered when this switch from HIGH to OD had happened. On checking the PSDK driver we saw that the device responds to two events “hot threshold and cold threshold” by changing the OPP when it hits the hot threshold and till it reaches the cold threshold it works at a reduced configuration. The hot threshold was set to 100C and critical temperature was set to 125C (shut down temperature). Can you please confirm if our understanding is correct here?

      Correct

    3. Also as per the Table 5-3 in the AM5728 datasheet Rev “O”, the maximum junction temperature of commercial grade part is 90C. But, we are not clear on why the driver code uses 100C for the hot threshold

      This is a bug due to reuse of omap4-cpu-thermal.dtsi (will be fixed in future release of SDK).  You can modify cpu_alert0 temperature to 90000 and cpu_alert1 temperature to 105000.  To be thorough, there are two other critical temperatures defined as 125C in omap5-gpu-thermal.dtsi and omap5-core-thermal.dtsi that should be changed to 105000.

      1. We also read the MPU’s thermal sensor using the following command “cat /sys/class/thermal/thermal_zone0/temp” and see that the value is around 85C.
      2. However on measuring the package temperature of the processor at the same time using a thermal gun, we saw it to be 50C.
      3. Now as per Table 5-18 of the datasheet Rev “O” of AM5728, R-thetaJC (junction to case) is 0.82 C/W. The R-thetaJT (junction to top) is 0.62 C/W.
      4. Thus the junction temperature should have been around 3-5 C higher than the measured package top temperature.

        You will also need the thermal resistance between the SoC and ambient, otherwise you do not have a complete circuit.  With a complete circuit, you can then work backwards to your case temperature.  We have a few older app-notes on ti.com that walk through these calculations, and will be refreshing the app note for the AM57xx family.  Are you using a heatsink/ heat spreader by chance?

      5. Can you please help us understand this particular observation and such huge difference in the measured package top temperature and junction temperature.

        Just to confirm, your physical measurement is 50C, and Linux is reporting 85C?  I will need to confirm on my side this observation - it is possible you are encountering a bug.  If you make repeated queries, does the temperature change at all?  What about the other thermal zones?
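
    One way to check repeated readings across all zones is a small polling script. The sketch below assumes the standard Linux thermal sysfs layout (each /sys/class/thermal/thermal_zone*/temp file reporting millidegrees Celsius, with a type file naming the zone). The function names, sample count, and interval are illustrative choices.

```python
import glob
import os
import time

THERMAL_ROOT = "/sys/class/thermal"


def millideg_to_c(raw):
    """Convert a sysfs temperature string in millidegrees C to degrees C."""
    return int(raw.strip()) / 1000.0


def read_zones(root=THERMAL_ROOT):
    """Return {"thermal_zoneN:type": temperature_C} for every readable zone."""
    readings = {}
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                temp_c = millideg_to_c(f.read())
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
        readings[f"{os.path.basename(zone)}:{name}"] = temp_c
    return readings


if __name__ == "__main__":
    for _ in range(3):  # a few samples, half a second apart
        zones = read_zones()
        if not zones:
            print("no thermal zones found")
            break
        print(zones)
        time.sleep(0.5)
```

    If the reported value never changes between samples, or only one zone looks out of line with the others, that points toward a sensor or driver problem rather than a real thermal condition.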

        Regards,
        Mike

  • Ayusman,

    Sorry, I forgot to supply you with some documentation as you requested.

    In the TRM, see section 18.4.6.2, Thermal Management Related Registers.

    An overview of the Linux thermal framework can be found here:
    https://linuxplumbersconf.org/2015/ocw//system/presentations/2613/original/thermal-framework-status-no-transitioning.pdf

    The Linux thermal framework kernel documentation can be found here:
    Documentation/devicetree/bindings/thermal/ti_soc_thermal.txt
    Documentation/devicetree/bindings/thermal/thermal.txt

    Regards,
    Mike
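
    The trip temperatures set in the device tree can also be checked at runtime. The sketch below assumes the standard sysfs attributes trip_point_N_temp (millidegrees Celsius) and trip_point_N_type under each thermal zone directory. The helper names are hypothetical, and the 90 C limit is the commercial-grade maximum junction temperature cited earlier in this thread.

```python
import glob
import os


def read_trip_points(zone_dir):
    """Return a list of (trip_type, temperature_C) for one thermal zone directory."""
    trips = []
    for temp_path in sorted(glob.glob(os.path.join(zone_dir, "trip_point_*_temp"))):
        type_path = temp_path[: -len("_temp")] + "_type"
        try:
            with open(temp_path) as f:
                temp_c = int(f.read().strip()) / 1000.0
            with open(type_path) as f:
                trip_type = f.read().strip()
        except (OSError, ValueError):
            continue  # skip trip points that cannot be read or parsed
        trips.append((trip_type, temp_c))
    return trips


def trips_above(trips, limit_c):
    """Return the trip points whose temperature exceeds limit_c."""
    return [(t, c) for t, c in trips if c > limit_c]


if __name__ == "__main__":
    MAX_TJ_C = 90.0  # limit taken from the datasheet figure cited in this thread
    for zone in sorted(glob.glob("/sys/class/thermal/thermal_zone*")):
        for trip_type, temp_c in trips_above(read_trip_points(zone), MAX_TJ_C):
            print(f"{zone}: {trip_type} trip at {temp_c:.0f} C exceeds {MAX_TJ_C:.0f} C")
```

    A 90 C limit will flag critical trips as well; pass a different limit to compare those against the 105 C value suggested above.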

  • Hi Mike,

    Thanks for the links. I will go through them and let you know, if we have any queries.

    Meanwhile, please find my answers inline.

    1. Also as per the Table 5-3 in the AM5728 datasheet Rev “O”, the maximum junction temperature of commercial grade part is 90C. But, we are not clear on why the driver code uses 100C for the hot threshold

    [Mike] This is a bug due to reuse of omap4-cpu-thermal.dtsi (will be fixed in future release of SDK).  You can modify cpu_alert0 temperature to 90000 and cpu_alert1 temperature to 105000.  To be thorough, there are two other critical temperatures defined as 125C in omap5-gpu-thermal.dtsi and omap5-core-thermal.dtsi that should be changed to 105000.

    [Ayusman] We will check this and let you know.

    1. We also read the MPU’s thermal sensor using the following command “cat /sys/class/thermal/thermal_zone0/temp” and see that the value is around 85C.
    2. However on measuring the package temperature of the processor at the same time using a thermal gun, we saw it to be 50C.
    3. Now as per Table 5-18 of the datasheet Rev “O” of AM5728, R-thetaJC (junction to case) is 0.82 C/W. The R-thetaJT (junction to top) is 0.62 C/W.
    4. Thus the junction temperature should have been around 3-5 C higher than the measured package top temperature.

    [Mike] You will also need the thermal resistance between the SoC and ambient, otherwise you do not have a complete circuit.  With a complete circuit, you can then work backwards to your case temperature.  We have a few older app-notes on ti.com that walk through these calculations, and will be refreshing the app note for the AM57xx family.  Are you using a heatsink/ heat spreader by chance?

    [Ayusman] We assume that the R-thetaJT (junction to package top) is the thermal resistance between the SoC and ambient when there is no heatsink/heat spreader. Please let us know if this assumption is fine. We have not used a heatsink, and the 50C measurement is the package top temperature measured by a thermal gun.

    1. Can you please help us understand this particular observation and such huge difference in the measured package top temperature and junction temperature.

    [Mike] Just to confirm, your physical measurement is 50C, and Linux is reporting 85C?  I will need to confirm on my side this observation - it is possible you are encountering a bug.  If you make repeated queries, does the temperature change at all?  What about the other thermal zones?

    [Ayusman] Yes, the physical measurement and the Linux values do not match. It would be really good if you could confirm this from your side as well. Are you talking about errata i813 and i814? If so, the workaround for these conditions is already present and in use in the driver. We have also measured thermal zone 5, i.e. IVAHD, which was 3-4C higher than thermal zone 0. We have not measured the temperatures of the other thermal zones.

    Regards

    Ayusman

  • Ayusman,

    Thank you for getting back to us.  My responses to your questions are below.

    [Ayusman] We assume that the R-thetaJT (junction to package top) is the thermal resistance between the SoC and ambient when there is no heatsink/heat spreader. Please let us know if this assumption is fine. We have not used a heatsink, and the 50C measurement is the package top temperature measured by a thermal gun.

    [Mike] You must also add the case-to-ambient thermal resistance term. This is not given separately in our data manual, but we do list junction-to-free-air (R-thetaJA).  Depending on whether the air is still or moving, you'd be looking at 7.5-11.0 C/W.  If you had a heatsink, you would add your heatsink thermal resistance to the R-thetaJC term.

    In the case of no heatsink, a large percentage of the heat generated by the SoC will be dissipated through the PCB.  The thermal resistance term is called junction-to-board (R-thetaJB) and is listed as 3.78 C/W in the data manual.

    These thermal resistance numbers are measured using JEDEC standards and will vary based on application.  If you have a small, dense PCB, your thermal resistance values can be quite different.

    We have an older app note for the DM6446 that has good guidelines that can be applied to characterizing any SoC and deriving thermal resistance numbers for your system: http://www.ti.com/lit/an/spraae4a/spraae4a.pdf
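
    As a rough illustration of the junction-to-ambient relation, Tj = Tambient + P × R-thetaJA can be inverted to see what power would produce a given junction temperature. This is a minimal sketch: the 25 C ambient is an assumption, the 7.5 and 11.0 C/W values are the ends of the range quoted above, and the JEDEC-based figures may not match a particular board.

```python
def junction_temp(ambient_c, power_w, theta_ja):
    """Estimate Tj (C) from ambient temperature, power (W) and R-thetaJA (C/W)."""
    return ambient_c + power_w * theta_ja


def power_for_junction(tj_c, ambient_c, theta_ja):
    """Invert the same relation: power (W) that would produce tj_c."""
    return (tj_c - ambient_c) / theta_ja


if __name__ == "__main__":
    AMBIENT_C = 25.0  # assumed room temperature
    for theta_ja in (7.5, 11.0):  # ends of the R-thetaJA range quoted above
        p = power_for_junction(85.0, AMBIENT_C, theta_ja)
        print(f"theta_JA = {theta_ja:4.1f} C/W -> {p:.2f} W gives Tj = 85 C")
```

    Under these assumptions, a junction temperature of 85 C corresponds to roughly 5.5 W to 8 W of dissipation. Because the real thermal resistance depends on the board, treat this as an order-of-magnitude check only.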

     

    [Ayusman] Yes, the physical measurement and the Linux values do not match. It would be really good if you could confirm this from your side as well. Are you talking about errata i813 and i814? If so, the workaround for these conditions is already present and in use in the driver. We have also measured thermal zone 5, i.e. IVAHD, which was 3-4C higher than thermal zone 0. We have not measured the temperatures of the other thermal zones.

    [Mike] Yes, these were the errata I was thinking about.  I have been running experiments with cpuburn running on both A15 cores, but have not seen any anomalies yet.  My kernel version is 4.1.6-g52c4aa7.  Are you using a kernel derived from the TI SDK?  Which version are you on?

    Regards,
    Mike

  • Dear Mike,

    Yes we are using a kernel derived from TI-PSDK Version 02.00.00.00 which has the Linux kernel version linux-4.1.6-g52c4aa7.

    I went through the app note you have shared and have a few queries, listed below.

    1. Section 3.5 discusses Psi-JT, which is what I had assumed in the calculations in my post. Section 5.1 describes how to measure the case temperature using an IR gun, and we did the measurement the same way. I believe what we measured with the IR gun is actually the package top temperature. The Psi-JT value in the datasheet is 0.6, whereas we see Tj = 85C and T(package-top) = 50C, which would give a very high Psi-JT value if used in the calculation.

    2. I understand that thetaJA is 11C/W for junction to ambient, but what we measured is the package top temperature of the IC and not the nearby ambient temperature.

    3. You also mentioned that without a heatsink a lot of heat will be dissipated by the PCB. In that case, less heat would be left to dissipate through the case, so the difference between junction temperature and case temperature should have been smaller.

    Can you please check and let us know, if you are also seeing the same behavior.

    Regards

    Ayusman

  • Dear Mike,

    Did you get a chance to go through my message above?

    Regards
    Ayusman
  • Ayusman,

    Sorry for the delay.

    Do you have a thermocouple you can use to confirm the processor lid temperature?  I believe we may be seeing effects of emissivity because of the relatively shiny processor lid.  

    Regards,
    Mike

  • Dear Mike,

    Unfortunately we don't have a thermocouple to measure the temperature. Can we do something with the IR gun we have to measure the temperature properly? Is there a way to account for the effects of emissivity?

    Regards
    Ayusman
  • Ayusman,

    The best way would be to correct the measurement using the emissivity coefficient for the processor lid material. However bare metals can have a pretty wide emissivity range, depending on luster, so that may not really help. You can determine emissivity empirically with a thermocouple, but without one you cannot do that.

    If you can apply a layer of flat black paint, that should get you much closer, but it will not be perfect (and will be messy).
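
    To give a feel for how much emissivity can matter, the sketch below corrects an IR reading using a simple broadband gray-body (T^4) radiance balance. It is an approximation only: real IR thermometers respond over a limited wavelength band, the 0.95 instrument setting is an assumed typical value, and the lid emissivities tried here are guesses rather than measured properties of the AM5728 lid.

```python
def corrected_temp_c(indicated_c, ambient_c, eps_true, eps_set=0.95):
    """Correct an IR reading for a surface whose emissivity differs from the
    instrument setting, using a broadband (T^4) gray-body approximation."""
    t_ind = indicated_c + 273.15  # kelvin
    t_amb = ambient_c + 273.15    # kelvin, also used as the reflected temperature
    # Radiance balance seen by the instrument:
    #   eps_set*T_ind^4 + (1 - eps_set)*T_amb^4
    #     = eps_true*T_true^4 + (1 - eps_true)*T_amb^4
    t_true4 = (eps_set * t_ind**4 + (eps_true - eps_set) * t_amb**4) / eps_true
    return t_true4**0.25 - 273.15


if __name__ == "__main__":
    # Indicated 50 C at an assumed 25 C ambient, for a few guessed lid emissivities.
    for eps in (0.95, 0.7, 0.5, 0.3):
        t = corrected_temp_c(50.0, 25.0, eps)
        print(f"emissivity {eps:.2f} -> estimated surface temperature {t:.1f} C")
```

    The lower the true emissivity, the further the real surface temperature sits above the indicated value, which is why a shiny lid can read much cooler than it is.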

    What is the reading on the back of your PCB under the processor (if accessible)?

    Check whether you have any DMMs that can read a thermocouple; thermocouples by themselves are not too expensive.
  • Dear Mike,

    Sure. We are trying to get a thermocouple. Will update you once we do the measurements again.

    To summarize this discussion so far,

    1. I understand that you have already measured the package top temperature and the junction temperature and see no disparities in the values. The values are in line with the theta values given in the datasheet.
    2. Also, there are some bugs in the driver's temperature thresholds, which are set higher than the junction temperature rating of the device. I believe these will be fixed in future BSP releases.

    Can you please confirm?

    Regards
    Ayusman
  • Ayusman,

    To answer your questions:

    1)  The temperature measurements we have done here are reasonable, though I have not attempted to confirm the theta values.  This requires a specific test setup.

    2)  The bugs in the bandgap driver have been fixed as of Processor SDK 2.0.  You can verify that the i813 and i814 fixes are present in your kernel/drivers/thermal/ti-soc-thermal/ti-bandgap.c

    A follow-up question for you: do you plan to use a heatsink or some other thermal management on your product?  What processor speed do you intend to use?

    Regards,
    Mike 

  • Hi Mike,

    We don't intend to use a heatsink for our product.

    We plan to run the AM5728 ARM at 1.5GHz with all the other cores up and running. Do you foresee any issues?

    Regards

    Ayusman

  • Ayusman,

    Have you done thermal modeling of the PCB?

    You will most likely require a heatsink, especially if your product will be used at elevated temperatures.  With the processor fully loaded you will probably see >8W power consumption and that heat has to go somewhere.

    If you've designed your board with enough copper to deal with this, great, but please be advised that the AM57xx family of processors is on a different level in terms of processing power and power consumption compared with our previous ARM A8/A9 devices.

    Regards,
    Mike

  • Hi Mike,

    We usually don't do thermal modeling of the PCB unless required.
    You are right, we may need a heatsink if the product is used at elevated temperatures. However, the observations that I have shared with you were all made at room temperature. At this ambient temperature, we don't expect the junction temperature to rise up to 85C even without a heatsink.

    Also, can you tell us which silicon revision you used for your measurements? We have used PG 1.1. Do you see any issues with that?

    Regards
    Ayusman
  • Ayusman,

    Much of our thermal data was taken early on in development, so was likely PG 1.0 or PG 1.1 silicon.

    Do you anticipate heavy CPU utilization for extended periods? I would recommend at least experimenting with running cpuburn on both ARM A15 cores for a period of time, and watch the junction temperature. This will give you some idea of worst-case loading on the ARM side.

    Another variable to consider is nominal vs hot silicon. Odds are you are testing with nominal silicon, since that makes up the majority of the shipped devices, but if you end up with some hot devices, you may find running at full clock speed without a heatsink is not possible.

    Regards,
    Mike

  • Dear Mike,

    We have made thermal measurements with a thermocouple as suggested. We loaded both ARM A15 MPU cores to nearly 96% with the frequency set to 1.5GHz. With this, we see a difference of around 13 degrees between the junction temperature (reported in Linux) and the body temperature. This does not match our understanding of the datasheet value of R-thetaJC (junction to case, 0.82 C/W) multiplied by the power dissipation. We have also gone through the links provided in this thread but have not been able to reach a clear understanding.

    We also observe that the junction temperature rise is very steep, and the temperature is reached within 3 minutes of applying the load.

    Regards,

    Ritesh 

  • Dear Mike,

    Did you get a chance to go through the earlier post by Ritesh? As you suggested, we got a thermocouple and did the measurements again. However, the results still do not match our understanding.

    It will be good if you can help us out on this.

    Regards
    Ayusman
  • Ritesh,

    Assuming you do not have provisions for measuring SOC power, what is the power consumption at idle vs max MPU load?  If you can also share your actual temperature measurements, we can see how they correlate with our own measurements.

    I recall you are not using a heatsink, correct?  R-thetaJC is only valid with a heatsink.  You should be looking at R-thetaJA in applications without a heatsink.

    In practice, R-theta numbers are really only good for making comparisons between packages and SOC vendors because they are measured using an industry standard test board, with the part dissipating a specific amount of power.  In an application without a heatsink, the majority of the heat generated by the SOC will be dissipated through the PCB, so design plays a very large role, and R-theta values will be different.  

    Regards,
    Mike

  • Hi Mike,

    We have done power measurements on our board with one of the MPU cores loaded to 100%. In the idle condition we see a power consumption of 4.9W. With the MPU core loaded to 100%, we see a power consumption of 7.4W. Can you please check and let us know if you have seen similar behavior?

    Regards

    Ayusman

  • Hi Ayusman,

    The delta between your idle power and fully loaded single MPU core power is close to our measurements: our number on nominal silicon on the X15 EVM is approximately 2.6W, close to your 2.5W delta.

    For single core, 100% MPU load, we measure Tambient = 25C, Tcase = 55.7C, and Tj = 64C (Tj is coming from the MPU bandgap temperature sensor).

    The thermal delta is about 8.3C, which is a bit less than your 13C.  We can probably attribute the difference to our larger board. I recall you are designing a SOM?
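
    The figures quoted in the last two posts can be laid side by side with a few lines of arithmetic. This is only a sketch over the numbers stated in this thread; the dictionary and function names are illustrative.

```python
# Measurements quoted in this thread (degrees C and watts).
TI_EVM = {"t_ambient": 25.0, "t_case": 55.7, "t_junction": 64.0}
CUSTOMER_POWER = {"idle_w": 4.9, "loaded_w": 7.4}


def junction_to_case(m):
    """Tj - Tcase for one set of measurements."""
    return m["t_junction"] - m["t_case"]


def case_to_ambient(m):
    """Tcase - Tambient for one set of measurements."""
    return m["t_case"] - m["t_ambient"]


if __name__ == "__main__":
    print(f"Tj - Tcase       = {junction_to_case(TI_EVM):.1f} C")
    print(f"Tcase - Tambient = {case_to_ambient(TI_EVM):.1f} C")
    delta_w = CUSTOMER_POWER["loaded_w"] - CUSTOMER_POWER["idle_w"]
    print(f"Board power delta (idle -> one core loaded) = {delta_w:.1f} W")
```

    On these numbers, most of the temperature rise (about 30.7 C) sits between the case and ambient rather than between junction and case (about 8.3 C).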

    Regards,
    Mike