AM5726: problem with internal temperature sensor

Part Number: AM5726
Other Parts Discussed in Thread: BEAGLEBOARD-X15

AM5726_Temperature_Analysis_EN.pdf 

  • Hello Christoph,

    Can you clarify your question? I am not sure there is one; from the document, it seems the system is working as expected, correct?

    I will confer internally to see what is the range of normal temperature fluctuations.

    I believe the SoC is set up with the governor set to "ondemand"; see https://software-dl.ti.com/processor-sdk-linux/esd/AM57X/11_01_02_01/exports/docs/linux/Foundational_Components/Power_Management/pm_dvfs.html

    -Josue

  • Hello Josue,

    My colleagues are a little concerned that there might be a hardware issue somewhere, because the temperature fluctuates more than on the BeagleBoard. Personally, I’m a bit more relaxed because everything else on our board works fine: the two DDR3 RAM banks, PCIe, 2x 1 Gbps LAN, and even USB 3.0. The temperature sensor is completely internal; maybe the bootloader configures it a little differently on the BeagleBoard?

  • Christoph,

    Could you please expand on what you mean by HW issue? Meaning on the SoC itself or your own design's PCB?

    -Josue 

  • They think the range of temperature fluctuations is too large and that this could have a hardware cause. All of this is on a board I designed myself; the BeagleBoard is only the reference.

  • Hello Josue,

    Allow me to show you the issue on our custom hardware:



    As you can see, these variations cannot possibly be the real temperature.
    The BeagleBoard-x15 does not show this behaviour: the curve there is very smooth, with maybe 1°C difference between readouts.

    I guess the questions are:
    - What could explain this issue on our hardware?
    - On what do these bandgap temperature registers depend? (it does not seem possible to configure anything in software).

    Also worth mentioning:
    * This seems totally independent from the software used: the results are the same between Linux (Scarthgap BSP) and the TI baremetal "Readout_Temperatures" (AM572x_prcm_config.gel).
    * We had (I/O) issues in the past because of our custom power-up logic/sequence. As Christoph wrote, everything else now works as expected.
    * We could use omapconf to audit/dump DPLL/OPP/PRCM etc if this helps.

    Julien

  • Hi Julien,

    Thank you for the graph. I will reassign this to one of our power/thermal engineers to review.

    -Josue

  • I share your concern/uncertainty about the temperature response. A temperature response is often relatively slow. In this case you are showing a 15C jump in ~50ms. This is possible but a little unusual. Two high-level questions:
    (1)
    The initial document comments on an 8C delta between the reported -43C and -35C. I assumed that in this case you were intending to be at -40C. Can you confirm?
    The document talks about idle/wakeup cycling. If I were trying to measure the stability of the temperature sensor, I would recommend providing a static load to the SoC. The idea would be to eliminate sharp changes in the SoC temperature due to changes in the loading. Do you have results with a more-or-less constant SoC loading as opposed to the idle/wakeup ideas?
    (2)
    In the graph within the e2e ticket, the temperature range varies between ~70C and ~40C. I assume that this is a room temperature test. Can you confirm?
    Also, in this test is the SoC load static?

    I know that you have provided information on the settings before, but I'd like to see these registers on your custom hardware.

    Clocks:
    CM_L3INSTR_CTRL_MODULE_BANDGAP_CLKCTRL  0x4A008E50
    CM_CLKSEL_WKUPAON  0x4AE06108
    CM_CLKSEL_ABE_LP_CLK  0x4AE061D8
    CTRL_CORE_BANDGAP_MASK_1 0x4A002380

    Bandgap trim:
    CTRL_CORE_STD_FUSE_OPP_BGAP_MPU  0x4A0021E4

    Board-level connections:
    The MPU bandgap is supplied through the pin: vdda_ldo_bg_mpu1. Could you just make sure that the routing on your custom hardware makes sense to this pin?

    And finally, if you ran this same test on the CORE temperature sensor, do you see the same kind of variation?

    Kevin
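    For readers who want to collect the dumps requested above without omapconf, here is a minimal sketch that reads the listed physical addresses through /dev/mem. The addresses and register names come from the post; everything else (the helper names `page_split`, `read32`, and `dump_registers`, the use of /dev/mem, and root access on the target) is an assumption for illustration. The access width used by the Python runtime is not guaranteed to be a single 32-bit load, so treat the result as a convenience check rather than a reference method.

```python
import mmap
import os
import struct

# Register addresses exactly as listed in the post above.
REGISTERS = {
    "CM_L3INSTR_CTRL_MODULE_BANDGAP_CLKCTRL": 0x4A008E50,
    "CM_CLKSEL_WKUPAON": 0x4AE06108,
    "CM_CLKSEL_ABE_LP_CLK": 0x4AE061D8,
    "CTRL_CORE_BANDGAP_MASK_1": 0x4A002380,
    "CTRL_CORE_STD_FUSE_OPP_BGAP_MPU": 0x4A0021E4,
}


def page_split(addr, page_size=mmap.PAGESIZE):
    """Split a physical address into (page-aligned base, offset in page)."""
    base = addr & ~(page_size - 1)
    return base, addr - base


def read32(addr, dev="/dev/mem"):
    """Read one little-endian 32-bit word at a physical address.

    Hypothetical helper: assumes /dev/mem is available and readable.
    """
    base, offset = page_split(addr)
    fd = os.open(dev, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ, offset=base) as mem:
            return struct.unpack_from("<I", mem, offset)[0]
    finally:
        os.close(fd)


def dump_registers():
    """Print every register in the 'value (name)' style used in this thread."""
    for name, addr in REGISTERS.items():
        print(f"0x{read32(addr):08X} ({name})")
```

    Calling `dump_registers()` on the target (as root) would print one line per register; nothing is read at import time.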

  • Hello Kevin, I will cover what I can:

    (2)
    - Yes, those are always room temperature tests.
    - The SoC load is static.
    We see these same variations whether the board is mostly idling (Linux boot only: ~0% CPU load) or if we stress it (Linux with our apps: 2xCPU @ 100% + DSP1 running).
    - The requested dumps:

    Clocks:
    0x01000001 (CM_L3INSTR_CTRL_MODULE_BANDGAP_CLKCTRL)
    0x00000000 (CM_CLKSEL_WKUPAON)
    0x00000000 (CM_CLKSEL_ABE_LP_CLK)
    0x2000002A (CTRL_CORE_BANDGAP_MASK_1)
    Bandgap trim:
    0xFEFF0400 (CTRL_CORE_STD_FUSE_OPP_BGAP_MPU)

    - And yes we see the same variation for the CORE, this is not specific to the MPU.
    Actually, all sensors show the same variations at the exact same time. For instance, when calling "omapconf show temp" manually twice:
    |--------------------------------------------|
    | Sensor | Temperature (C) | Temperature (F) |
    |--------------------------------------------|
    | MPU    | 55              | 131             |
    | GPU    | 55              | 131             |
    | CORE   | 52              | 125             |
    | IVA    | 53              | 127             |
    | DSPEVE | 53              | 127             |
    |--------------------------------------------|
    
    |--------------------------------------------|
    | Sensor | Temperature (C) | Temperature (F) |
    |--------------------------------------------|
    | MPU    | 39              | 102             |
    | GPU    | 40              | 104             |
    | CORE   | 40              | 104             |
    | IVA    | 42              | 107             |
    | DSPEVE | 39              | 102             |
    |--------------------------------------------|

    Julien
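    To make the "all sensors move together" observation concrete, the two readouts above can be compared per sensor. This is only a small sketch over the values pasted in the post; the dictionary layout and the `deltas` helper are illustrative, not part of omapconf.

```python
# Temperatures in degrees C, copied from the two "omapconf show temp" tables.
first = {"MPU": 55, "GPU": 55, "CORE": 52, "IVA": 53, "DSPEVE": 53}
second = {"MPU": 39, "GPU": 40, "CORE": 40, "IVA": 42, "DSPEVE": 39}


def deltas(before, after):
    """Return the per-sensor change (after - before) in degrees C."""
    return {name: after[name] - before[name] for name in before}


changes = deltas(first, second)
for name, change in changes.items():
    print(f"{name:7s} {change:+d} C")

# Every sensor drops, by between 11 and 16 degrees C.
print("smallest drop:", -max(changes.values()), "C")
print("largest drop: ", -min(changes.values()), "C")
```

    All five sensors fall by a double-digit amount between the two calls, which is what suggests a common cause rather than a single misbehaving sensor.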

  • Julien,

    I think the stair-step pattern might be related to the setting in the following register. The initial "2" indicates that bit 29 is set (bits 28 and 27 are 0). This setting creates a counter delay of 250ms. I think the stair-step pattern arises in this example because the code is re-reading the same data over and over.

    0x2000002A (CTRL_CORE_BANDGAP_MASK_1)
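    The bit reading described above can be checked with a few lines. This sketch only extracts bits [29:27] from the dumped value; the `bits` helper is illustrative, and the meaning of the field (250ms counter delay for this setting) is taken from the explanation above rather than verified here.

```python
def bits(value, msb, lsb):
    """Extract the bit field value[msb:lsb] (inclusive) as an integer."""
    width = msb - lsb + 1
    return (value >> lsb) & ((1 << width) - 1)


mask_1 = 0x2000002A  # CTRL_CORE_BANDGAP_MASK_1 as dumped on the custom board

field = bits(mask_1, 29, 27)
print(f"bits [29:27] = 0b{field:03b}")  # bit 29 set, bits 28 and 27 clear
```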

    In order to prevent reading the same data repeatedly, you might validate the EOCz bit.

    -----

    This suggestion does not obviously address the variability of the measurements. Still, it might be worth trying this modification. (The COUNTER_DELAY could also be reduced...)

    Kevin
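    The stair-step explanation above can be illustrated with a small simulation: a value that is refreshed every 250ms but polled more often is returned several times in a row. This is a toy model only. The 50ms polling interval and the temperature values are made up, and the conversion index stands in for whatever "new data" indication (such as the EOCz bit) the real readout code would check.

```python
UPDATE_MS = 250  # refresh period, per the counter delay described above
POLL_MS = 50     # hypothetical software polling interval


def latched_value(t_ms, conversions):
    """Return (conversion index, value) held by the sensor at time t_ms."""
    index = min(t_ms // UPDATE_MS, len(conversions) - 1)
    return index, conversions[index]


def poll(conversions, duration_ms):
    """Sample the latched value every POLL_MS for duration_ms."""
    samples = []
    for t in range(0, duration_ms, POLL_MS):
        index, value = latched_value(t, conversions)
        samples.append((t, index, value))
    return samples


def fresh_only(samples):
    """Keep the first sample of each conversion and drop the re-reads."""
    seen = set()
    fresh = []
    for t, index, value in samples:
        if index not in seen:
            seen.add(index)
            fresh.append((t, value))
    return fresh


conversions = [55, 39, 52, 41]  # made-up values, degrees C
samples = poll(conversions, 1000)
print([value for _, _, value in samples])  # each value repeats five times
print(fresh_only(samples))                 # one entry per conversion
```

    Plotting the raw samples gives flat steps of five identical points; filtering on fresh conversions removes the steps but, as noted above, does nothing about the size of the jumps between them.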

  • Hello Kevin,
    the readout frequency is not really a concern at the moment, but thanks for the info: the EOCz tip will be useful later.
    Anything else I can do on the software side to help with the investigation?
    Julien

  • Julien,

    My understanding is that you have this board which behaves strangely and another board that behaves as expected. 

    I have two requests:

    (1) read out DIE ID from both good and bad boards

    CTRL_WKUP_STD_FUSE_DIE_ID_0 0x4AE0C200
    CTRL_WKUP_STD_FUSE_DIE_ID_1 0x4AE0C208
    CTRL_WKUP_STD_FUSE_DIE_ID_2 0x4AE0C20C
    CTRL_WKUP_STD_FUSE_DIE_ID_3 0x4AE0C210

    (2) Read Bandgap trim from the good board:
    CTRL_CORE_STD_FUSE_OPP_BGAP_MPU  0x4A0021E4

    Thanks.

    Kevin

  • Hello Kevin, yes our evaluation board is the BB-X15:
    www.beagleboard.org/.../beagleboard-x15

    Here are the requested values in one table:

    |---------------------------------------|
    | BeagleBoard-X15 | Custom AM5726 board |
    |---------------------------------------|
    | 0x19011003      | 0x0400800B          | 0x4AE0C200 (Die ID 0)
    | 0x01495F1B      | 0x0153F06D          | 0x4AE0C208 (Die ID 1)
    | 0x3EBE1A1E      | 0x3ABE1A1E          | 0x4AE0C20C (Die ID 2)
    | 0x5D8C08E2      | 0x4FF008A2          | 0x4AE0C210 (Die ID 3)
    |---------------------------------------|
    | 0x01000001      | 0x01000001          | 0x4A008E50 (Bgap clk ctrl)
    | 0x00000000      | 0x00000000          | 0x4AE06108 (Wkup always-on)
    | 0x00000000      | 0x00000000          | 0x4AE061D8 (ABE LP clk)
    | 0x28000015      | 0x2000002A          | 0x4A002380 (Bgap mask 1)
    |---------------------------------------|
    | 0xFE09020F      | 0xFEFF0400          | 0x4A0021E4 (Fuse OPP Bgap MPU)

    Thank you for your help.
    Julien
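    A quick way to see which configuration registers differ between the two boards is to XOR the values from the table above. The sketch below uses only the numbers posted in the thread; the Die ID rows are left out because they are expected to differ between devices, and the row labels and the `differing` helper are illustrative.

```python
# (label, BeagleBoard-X15 value, custom AM5726 board value), from the table above.
ROWS = [
    ("Bgap clk ctrl", 0x01000001, 0x01000001),
    ("Wkup always-on", 0x00000000, 0x00000000),
    ("ABE LP clk", 0x00000000, 0x00000000),
    ("Bgap mask 1", 0x28000015, 0x2000002A),
    ("Fuse OPP Bgap MPU", 0xFE09020F, 0xFEFF0400),
]


def differing(rows):
    """Return (label, reference, custom, xor) for every row that differs."""
    return [(label, ref, custom, ref ^ custom)
            for label, ref, custom in rows if ref != custom]


for label, ref, custom, xor in differing(ROWS):
    print(f"{label:18s} 0x{ref:08X} vs 0x{custom:08X}  differing bits 0x{xor:08X}")
```

    The three clock registers match. The two rows that differ are the bandgap mask register (bit 27 and the low six bits) and the fused bandgap trim value.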

  • This behavior does not match our expectations, and we do expect the tighter range you see on the Beagleboard.

    If you need to return the part, you can follow this link:

    https://www.ti.com/quality-reliability/faqs/customer-returns.html

    Kevin

  • I checked this point again: Board-level connections:
    The MPU bandgap is supplied through the pin: vdda_ldo_bg_mpu1. Could you just make sure that the routing on your custom hardware makes sense to this pin?

    Within the reference manual, vdda_ldo_bg_mpu1 appears only twice: on page 454


    and on page 458


    The pinout table does not list vdda_ldo_bg_mpu1, but judging from the surrounding signals, I suppose it is ball N16.

    Can you confirm this?