AM62A7: Not Booting When Temperature is High

Part Number: AM62A7
Other Parts Discussed in Thread: SK-AM62A-LP

Hello.
I'm Jae Young Choi.

We are experiencing a boot issue on our custom board using the AM62A7 and would like to ask for your support.
Is it possible that the AM62A7 fails to boot when its internal temperature is high?


Experiment Setup:

  • Custom board based on TI AM62A7
  • Both A53 and C7x cores are used
  • PROCESSOR-SDK-LINUX-AM62A 10.00.00.08
  • Booting from eMMC

Observed Behavior:

  • After operating for 2–3 hours, some boards fail to power up when turned off and on again.
  • This occurs in about 2 to 3 out of 10 boards.
  • The failure is not consistent: different boards may fail on different attempts.
  • At the time of failure, the MPU temperature is around 80°C to 90°C.

Expected Behavior:

  • The board should boot successfully even when the temperature is around 80°C to 90°C.

Additional Experiment:

  • When heating the SoC area with a heat gun to raise the temperature above 90°C, the board consistently fails to power on.
    However, if the power is turned off and the board is left for about 5 minutes to cool down, all boards power on normally again.


Questions:

  • Can high internal temperature at power-on prevent the AM62A7 from booting?
  • The datasheet specifies a maximum operating temperature of 125°C, but is there a lower temperature limit specifically for the power-on or boot process?

Serial Log:

U-Boot SPL 2024.04-ti-g818c76aed67f (Nov 11 2024 - 15:14:15 +0900)
SYSFW ABI: 4.0 (firmware rev 0x000a '10.0.8--v10.00.08 (Fiery Fox)')
SPL initial stack usage: 13568 bytes
Trying to boot from MMC1

Thank you. 

Best regards. 

  • Hello Jae Young Choi, 

    Thank you for the query.

    At the time of failure, the MPU temperature is around 80°C to 90°C.

    Please help me understand what you mean by MPU temperature. Is this the VTM sensor temperature?

    Regards,

    Sreenivasa

  • Hello 

    Yes. I measured the VTM sensor temperature using the command below.  

    cat /sys/class/thermal/thermal_zone*/temp

    Thank you for your assistance. 

    Regards. 

  • Hello Jae Young Choi, 

    Yes. I measured the VTM sensor temperature using the command below.  

    Was the VTM temperature measured as 90C?

    Do you know whether the customer made the changes that were suggested for the 5V DC/DC converter (reducing the capacitance)?

    Regards,

    Sreenivasa

  • Hello. 

    The temperatures I mentioned were measured before making any modifications to the 5V DCDC peripheral circuitry.

    Thank you. 

    Regards 

  • Hello Jae Young Choi,

    Thank you.

    Was the VTM temperature measured as 90C?

    Please help answer the query.

    Has the customer made any progress?

    Regards,

    Sreenivasa

  • Hello.

    The customer tried reducing the capacitance by approximately 100 uF, but it had no effect.
    On the custom board, low-cost alternatives were used for both the 25 MHz and 32.768 kHz crystals.
    We plan to test the custom board using the ECS crystal used on the TI SK-AM62A-LP.
    The custom board also uses a low-cost alternative for the eMMC. We will test with the Micron eMMC used on the TI SK-AM62A-LP as well.
    We will share the booting test results once they are available.

    Thank you.
    Regards.

  • Hello Jae Young Choi,

    Thank you.

    I have a few observations on the schematics (most likely component value changes) and will share them so that the customer can review them and apply the updates during testing as soon as possible.

    Regards,

  • Hello, Mr. Sreenivasa

    It's been a while.
    In the meantime, we built new boards that use the same ECS crystal and Micron eMMC as the EVM.
    Specifically, we built four combinations of the two crystal vendors and the two eMMC vendors to isolate the cause:
    1. alternative crystal & alternative eMMC
    2. alternative crystal & Micron eMMC
    3. ECS crystal & alternative eMMC
    4. ECS crystal & Micron eMMC

    Unfortunately, all four types show the same symptom, a reboot failure, when the board temperature is around 90°C to 100°C.
    We tested them under more extreme conditions, at higher temperatures than we reported before, to reproduce the real-world conditions as closely as possible.

    So I would like to ask the following:

    1. I believe the reboot failure is caused by the high-temperature condition.
    Could the crystal and/or the eMMC be the cause of the reboot failure under this condition?
    The rated operating temperature of the crystal and the eMMC, 125°C, is much higher than the tested condition.
    Or could some other operating condition, such as the crystal clock timing, prevent the MPU from rebooting (possibly by affecting the boot sequence)?

    2. Under this condition, is it possible that the PMIC malfunctions and fails to supply the correct power to the MPU?
    I found "Power Supply Requirements", section 7.10.2, in the MPU datasheet.
    It explains the requirement in terms of "Slew Rate", and I guess that is why you asked me to try reducing the capacitance in the previous thread.
    Am I correct?

    3. I understand that you have some point to test.
    Could you let me know the result?

  • Hello HOGYUN RYU 

    Thank you for the inputs.

    I mentioned this before, but can you measure the VTM temperature?

    The SoC uses the die temperature to trip, and this can be higher than the ambient temperature.

    Regards,

    Sreenivasa

  • Hello Kallikuppa Sreenivasa

    Thank you for your reply.
    Apologies for the confusion.
    All the temperature values mentioned refer to the VTM temperatures, obtained using the following command:

    # cat /sys/class/thermal/thermal_zone*/temp
    84974
    84498
    86869
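
The raw values reported by these sysfs files are in millidegrees Celsius, so 84974 corresponds to roughly 85.0°C. A minimal Python sketch of the conversion is shown below. The helper names `millideg_to_c` and `read_vtm_temps_c` and the glob-based zone discovery are illustrative assumptions, not part of the TI SDK.

```python
import glob


def millideg_to_c(raw: str) -> float:
    """Convert a sysfs thermal reading (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_vtm_temps_c(pattern: str = "/sys/class/thermal/thermal_zone*/temp") -> dict:
    """Return {sysfs path: temperature in degrees Celsius} for each readable zone."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                temps[path] = millideg_to_c(f.read())
        except (OSError, ValueError):
            # A zone may be disabled or return a non-numeric value; skip it.
            continue
    return temps


if __name__ == "__main__":
    for path, temp in read_vtm_temps_c().items():
        print(f"{path}: {temp:.1f} degC")
```

Logging these values with a timestamp just before each power cycle would make it easier to correlate the die temperature with each failed boot.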
    

    Thank you.
    Best regards,

  • Hello Jae Young Choi,

    Thank you.

    I understand that you have some point to test.
    Could you let me know the result?

    I am not sure where this was discussed.

    I reviewed the schematics and shared the review comments.

    Some of the review comments are related to value changes.

    Can you please review the comments and check?

    Can you please confirm the ambient temperature you are setting or seeing?

    Regards,

    Sreenivasa

  • Hello Jae Young Choi,

    Is the heating done locally or in a chamber?

    Is there a way to verify whether the board came out of reset, tried to boot, and failed, or whether the board did not start up at all?

    Regards,

    Sreenivasa

  • Hello Kallikuppa Sreenivasa,

    We applied heat to the entire device.
    To raise the temperature quickly, we covered the devices with blankets.

    For debugging, we connected a UART0 debug cable to one of the devices.
    When the temperature reached around 85°C, we performed cold reset.

    The device comes out of reset but hangs during the boot process.
    It does not always stop at the same point—sometimes it hangs during U-Boot, and other times during kernel booting.
    For your reference, I have attached two representative log files.
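
Because the hang point varies, one way to tally where each attempt stopped is to scan every captured UART log for the last boot-stage banner it contains. The sketch below is a minimal Python illustration. The marker strings are taken from the logs attached here, while the stage labels and the function name `last_boot_stage` are assumptions chosen for illustration.

```python
# (label, marker) pairs. The marker strings appear in the attached logs;
# the labels are illustrative names only.
STAGE_MARKERS = [
    ("spl", "U-Boot SPL"),
    ("atf_handoff", "Starting ATF on ARM64 core"),
    ("bl31", "NOTICE:  BL31:"),
    ("optee_error", "E/TC:"),
    ("uboot_autoboot", "Hit any key to stop autoboot"),
    ("kernel_handoff", "Starting kernel ..."),
    ("linux_early", "Booting Linux on physical CPU"),
    ("kernel_panic", "Kernel panic"),
]


def last_boot_stage(log_text: str) -> str:
    """Return the label of the marker that occurs latest in the log text."""
    best_label, best_pos = "no_output", -1
    for label, marker in STAGE_MARKERS:
        pos = log_text.rfind(marker)  # -1 if the marker is absent
        if pos > best_pos:
            best_label, best_pos = label, pos
    return best_label


if __name__ == "__main__":
    import sys

    # Usage: python classify_boot_log.py capture1.log capture2.log ...
    for name in sys.argv[1:]:
        with open(name, errors="replace") as f:
            print(f"{name}: {last_boot_stage(f.read())}")
```

Running this over logs from many power cycles gives a rough histogram of failure points, which may help show whether the failures cluster around one stage or are spread across the boot flow.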

    If you have any ideas or suggestions that might be helpful, please feel free to share them.

    Thank you.

    Regards. 

    [ 5249.581712] audit: type=1334 audit(1751366042.968:209): prog-id=113 op=LOAD
    [ 5279.836836] audit: type=1334 audit(1751366073.988:210): prog-id=113 op=UNLOAD
    [ 5279.844140] audit: type=1334 audit(1751366073.988:211): prog-id=112 op=UNLOAD
    [ 5279.851340] audit: type=1334 audit(1751366073.988:212): prog-id=111 op=UNLOAD
    
    U-Boot SPL 2024.04-ti-g818c76aed67f (Nov 11 2024 - 15:14:15 +0900)
    SYSFW ABI: 4.0 (firmware rev 0x000a '10.0.8--v10.00.08 (Fiery Fox)')
    SPL initial stack usage: 13568 bytes
    Trying to boot from MMC1
    Authentication passed
    Authentication passed
    Authentication passed
    Authentication passed
    Authentication passed
    Starting ATF on ARM64 core...
    
    NOTICE:  BL31: v2.10.0(release):v2.10.0-367-g00f1ec6b87-dirty
    NOTICE:  BL31: Built : 16:09:05, Feb  9 2024
    ERROR:   Timeout waiting for thread SP_RESPONSE to fill
    ERROR:   Thread SP_RESPONSE verification failed (-60)
    ERROR:   Message receive failed (-60)
    ERROR:   Failed to get response (-60)
    ERROR:   Transfer send failed (-60)
    ERROR:   Timeout waiting for thread SP_RESPONSE to fill
    ERROR:   Thread SP_RESPONSE verification failed (-60)
    ERROR:   Message receive failed (-60)
    ERROR:   Failed to get response (-60)
    ERROR:   Transfer send failed (-60)
    ERROR:   Unable to query firmware capabilities (-60)
    E/TC:0 0 k3_sec_proxy_verify_thread:108 Queue is busy
    E/TC:0 0 k3_sec_proxy_recv:196 Thread SEC_PROXY_RESPONSE_THREAD verification failed. ret = -65523
    E/TC:0 0 ti_sci_get_response:101 Message receive failed (-65523)
    E/TC:0 0 ti_sci_do_xfer:150 Failed to get response (-65523)
    E/TC:0 0 ti_sci_init:486 Unable to communicate with control firmware (-65523)
    E/TC:0 0 call_initcalls:43 Initcall __text_start + 0x00070568 failed
    E/TC:0 0 k3_sec_proxy_verify_thread:108 Queue is busy
    E/TC:0 0 k3_sec_proxy_recv:196 Thread SEC_PROXY_RESPONSE_THREAD verification failed. ret = -65523
    E/TC:0 0 ti_sci_get_response:101 Message receive failed (-65523)
    E/TC:0 0 ti_sci_do_xfer:150 Failed to get response (-65523)
    E/TC:0 0 k3_sec_proxy_verify_thread:108 Queue is busy
    E/TC:0 0 k3_sec_proxy_recv:196 Thread SEC_PROXY_RESPONSE_THREAD verification failed. ret = -65523
    E/TC:0 0 ti_sci_get_response:101 Message receive failed (-65523)
    E/TC:0 0 ti_sci_do_xfer:150 Failed to get response (-65523)
    E/TC:0 0 k3_sec_proxy_verify_thread:108 Queue is busy
    E/TC:0 0 k3_sec_proxy_recv:196 Thread SEC_PROXY_RESPONSE_THREAD verification failed. ret = -65523
    E/TC:0 0 ti_sci_get_response:101 Message receive failed (-65523)
    E/TC:0 0 ti_sci_do_xfer:150 Failed to get response (-65523)
    E/TC:0 0 tee_otp_get_hw_unique_key:97 Could not get HUK
    E/TC:0 0 call_initcalls:43 Initcall __text_start + 0x00070590 failed
    E/TC:0 0 k3_sec_proxy_verify_thread:108 Queue is busy
    E/TC:0 0 k3_sec_proxy_recv:196 Thread SEC_PROXY_RESPONSE_THREAD verification failed. ret = -65523
    E/TC:0 0 ti_sci_get_response:101 Message receive failed (-65523)
    E/TC:0 0 ti_sci_do_xfer:150 Failed to get response (-65523)
    E/TC:0 0 k3_sec_proxy_verify_thread:108 Queue is busy
    E/TC:0 0 k3_sec_proxy_recv:196 Thread SEC_PROXY_RESPONSE_THREAD verification failed. ret = -65523
    E/TC:0 0 ti_sci_get_response:101 Message receive failed (-65523)
    E/TC:0 0 ti_sci_do_xfer:150 Failed to get response (-65523)
    E/TC:0 0 sa2ul_init:106 Could not change TRNG firewall owner
    E/TC:0 0 call_initcalls:43 Initcall __text_start + 0x00070598 failed
    E/TC:0 0
    E/TC:0 0 Core data-abort at address 0x14 (translation fault)
    E/TC:0 0  esr 0x96000005  ttbr0 0x9e8a1000   ttbr1 0x00000000   cidr 0x0
    E/TC:0 0  cpu #0          cpsr 0x600003c4
    E/TC:0 0  x0  000000009e874000 x1  0000000000000000
    E/TC:0 0  x2  0000000000000000 x3  0000000000000000
    E/TC:0 0  x4  0000000000000050 x5  000000009e891d70
    E/TC:0 0  x6  ffffffffffffffb0 x7  0000000000010cb0
    E/TC:0 0  x8  0000000000010cb0 x9  000000009e891f80
    E/TC:0 0  x10 000000009e881070 x11 0000000000000008
    E/TC:0 0  x12 0000000000000000 x13 000000009e8a2e60
    E/TC:0 0  x14 0000000000000000 x15 0000000000000000
    E/TC:0 0  x16 000000009e81c74c x17 0000000000000000
    E/TC:0 0  x18 0000000000000000 x19 000000009e8a31e0
    E/TC:0 0  x20 000000009e8a31e8 x21 000000009e874000
    E/TC:0 0  x22 000000009e874000 x23 000000009e874f00
    E/TC:0 0  x24 000000009e873dc0 x25 0000000000000000
    E/TC:0 0  x26 0000000000000000 x27 0000000000000000
    E/TC:0 0  x28 0000000000000000 x29 000000009e8a3170
    E/TC:0 0  x30 000000009e816f6c elr 000000009e816f7c
    E/TC:0 0  sp_el0 000000009e8a3170
    E/TC:0 0 TEE load address @ 0x9e800000
    E/TC:0 0 Call stack:
    E/TC:0 0  0x9e816f7c
    E/TC:0 0  0x9e807c64
    E/TC:0 0  0x9e8220ec
    E/TC:0 0  0x9e807de0
    E/TC:0 0 Panic 'unhandled pageable abort' at /usr/src/debug/optee-os/4.2.0+git/core/arch/arm/kernel/abort.c:582 <abort_handler>
    E/TC:0 0 TEE load address @ 0x9e800000
    E/TC:0 0 Call stack:
    E/TC:0 0  0x9e808030
    E/TC:0 0  0x9e81ebb0
    E/TC:0 0  0x9e807884
    E/TC:0 0  0x9e804a68
    

    U-Boot SPL 2024.04-gb666d1ba-dirty (Jul 14 2025 - 17:04:52 +0900)
    SYSFW ABI: 4.0 (firmware rev 0x000a '10.0.8--v10.00.08 (Fiery Fox)')
    SPL initial stack usage: 13568 bytes
    Trying to boot from MMC1
    Authentication passed
    Authentication passed
    Authentication passed
    Authentication passed
    Authentication passed
    Starting ATF on ARM64 core...
    
    NOTICE:  BL31: v2.10.0(release):v2.10.0-367-g00f1ec6b87-dirty
    NOTICE:  BL31: Built : 16:09:05, Feb  9 2024
    
    U-Boot SPL 2024.04-gb666d1ba-dirty (Jul 14 2025 - 17:04:47 +0900)
    SYSFW ABI: 4.0 (firmware rev 0x000a '10.0.8--v10.00.08 (Fiery Fox)')
    Delaying before mmc_init: 50 msec
    Trying to boot from MMC1
    Authentication passed
    Authentication passed
    
    
    U-Boot 2024.04-gb666d1ba-dirty (Jul 14 2025 - 17:04:47 +0900)
    
    SoC:   AM62AX SR1.0 HS-FS
    Model: S1 AM62A7 S1HS
    DRAM:  2 GiB (effective 4 GiB)
    Delaying in board_init: 30 msec
    Core:  83 devices, 28 uclasses, devicetree: separate
    MMC:   mmc@fa10000: 0, mmc@fa00000: 1
    Loading Environment from nowhere... OK
    In:    serial@2800000
    Out:   serial@2800000
    Err:   serial@2800000
    Net:   Could not get PHY for mdio@f00: addr 0
    am65_cpsw_nuss_port ethernet@8000000port@1: phy_connect() failed
    No ethernet found.
    
    Hit any key to stop autoboot:  0
    switch to partitions #0, OK
    mmc0(part 0) is current device
    SD/MMC found on device 0
    Can't set block device
    20632064 bytes read in 193 ms (101.9 MiB/s)
    60448 bytes read in 33 ms (1.7 MiB/s)
    Working FDT set to 88000000
    ## Flattened Device Tree blob at 88000000
       Booting using the fdt blob at 0x88000000
    Working FDT set to 88000000
       Loading Device Tree to 000000008feee000, end 000000008fffffff ... OK
    Working FDT set to 8feee000
    
    Starting kernel ...
    
    [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
    [    0.000000] Linux version 6.6.32-ti-01301-gdb8871293143-dirty (oe-user@oe-host) (aarch64-oe-linux-gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.42.0.20240620) #1 SMP PREEMPT Thu Aug  1 19:10:56 UTC 2024
    [    0.000000] KASLR disabled due to lack of seed
    [    0.000000] Machine model: Texas Instruments AM62A7 S1HS
    [    0.000000] earlycon: ns16550a0 at MMIO32 0x0000000002800000 (options '')
    [    0.000000] printk: bootconsole [ns16550a0] enabled
    [    0.000000] efi: UEFI not found.
    [    0.000000] Reserved memory: created CMA memory pool at 0x00000000c0000000, size 576 MiB
    [    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
    [    0.000000] OF: reserved mem: 0x00000000c0000000..0x00000000e3ffffff (589824 KiB) map reusable linux,cma
    [    0.000000] OF: reserved mem: 0x0000000080000000..0x000000008007ffff (512 KiB) nomap non-reusable tfa@80000000
    [    0.000000] Reserved memory: created DMA memory pool at 0x0000000099800000, size 1 MiB
    [    0.000000] OF: reserved mem: initialized node c7x-dma-memory@99800000, compatible id shared-dma-pool
    [    0.000000] OF: reserved mem: 0x0000000099800000..0x00000000998fffff (1024 KiB) nomap non-reusable c7x-dma-memory@99800000
    [    0.000000] Reserved memory: created DMA memory pool at 0x0000000099900000, size 31 MiB
    [    0.000000] OF: reserved mem: initialized node c7x-memory@99900000, compatible id shared-dma-pool
    [    0.000000] OF: reserved mem: 0x0000000099900000..0x000000009b7fffff (31744 KiB) nomap non-reusable c7x-memory@99900000
    [    0.000000] Reserved memory: created DMA memory pool at 0x000000009b800000, size 1 MiB
    [    0.000000] OF: reserved mem: initialized node r5f-dma-memory@9b800000, compatible id shared-dma-pool
    [    0.000000] OF: reserved mem: 0x000000009b800000..0x000000009b8fffff (1024 KiB) nomap non-reusable r5f-dma-memory@9b800000
    [    0.000000] Reserved memory: created DMA memory pool at 0x000000009b900000, size 15 MiB
    [    0.000000] OF: reserved mem: initialized node r5f-dma-memory@9b900000, compatible id shared-dma-pool
    [    0.000000] OF: reserved mem: 0x000000009b900000..0x000000009c7fffff (15360 KiB) nomap non-reusable r5f-dma-memory@9b900000
    [    0.000000] Reserved memory: created DMA memory pool at 0x000000009c800000, size 1 MiB
    [    0.000000] OF: reserved mem: initialized node r5f-dma-memory@9c800000, compatible id shared-dma-pool
    [    0.000000] OF: reserved mem: 0x000000009c800000..0x000000009c8fffff (1024 KiB) nomap non-reusable r5f-dma-memory@9c800000
    [    0.000000] Reserved memory: created DMA memory pool at 0x000000009c900000, size 30 MiB
    [    0.000000] OF: reserved mem: initialized node r5f-dma-memory@9c900000, compatible id shared-dma-pool
    [    0.000000] OF: reserved mem: 0x000000009c900000..0x000000009e6fffff (30720 KiB) nomap non-reusable r5f-dma-memory@9c900000
    [    0.000000] OF: reserved mem: 0x000000009e800000..0x000000009fffffff (24576 KiB) nomap non-reusable optee@9e800000
    [    0.000000] OF: reserved mem: 0x00000000a0000000..0x00000000a0ffffff (16384 KiB) nomap non-reusable edgeai-rtos-ipc-memory-region
    [    0.000000] Reserved memory: created DMA memory pool at 0x00000000a1000000, size 32 MiB
    [    0.000000] OF: reserved mem: initialized node edgeai-dma-memory@a1000000, compatible id shared-dma-pool
    [    0.000000] OF: reserved mem: 0x00000000a1000000..0x00000000a2ffffff (32768 KiB) nomap non-reusable edgeai-dma-memory@a1000000
    [    0.000000] OF: reserved mem: initialized node edgeai_shared-memories, compatible id dma-heap-carveout
    [    0.000000] OF: reserved mem: 0x00000000a3000000..0x00000000adffffff (180224 KiB) map non-reusable edgeai_shared-memories
    [    0.000000] Reserved memory: created DMA memory pool at 0x00000000ae000000, size 288 MiB
    [    0.000000] OF: reserved mem: initialized node edgeai-core-heap-memory@ae000000, compatible id shared-dma-pool
    [    0.000000] OF: reserved mem: 0x00000000ae000000..0x00000000bfffffff (294912 KiB) nomap non-reusable edgeai-core-heap-memory@ae000000
    [    0.000000] Zone ranges:
    [    0.000000]   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
    [    0.000000]   DMA32    empty
    [    0.000000]   Normal   [mem 0x0000000100000000-0x00000008ffffffff]
    [    0.000000] Movable zone start for each node
    [    0.000000] Early memory node ranges
    [    0.000000]   node   0: [mem 0x0000000080000000-0x000000008007ffff]
    [    0.000000]   node   0: [mem 0x0000000080080000-0x00000000997fffff]
    [    0.000000]   node   0: [mem 0x0000000099800000-0x000000009e6fffff]
    [    0.000000]   node   0: [mem 0x000000009e700000-0x000000009e7fffff]
    [    0.000000]   node   0: [mem 0x000000009e800000-0x00000000a2ffffff]
    [    0.000000]   node   0: [mem 0x00000000a3000000-0x00000000adffffff]
    [    0.000000]   node   0: [mem 0x00000000ae000000-0x00000000bfffffff]
    [    0.000000]   node   0: [mem 0x00000000c0000000-0x00000000ffffffff]
    [    0.000000]   node   0: [mem 0x0000000880000000-0x00000008ffffffff]
    [    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000008ffffffff]
    [    0.000000] psci: probing for conduit method from DT.
    [    0.000000] psci: PSCIv1.1 detected in firmware.
    [    0.000000] psci: Using standard PSCI v0.2 function IDs
    [    0.000000] psci: Trusted OS migration not required
    [    0.000000] psci: SMC Calling Convention v1.4
    [    0.000000] percpu: Embedded 20 pages/cpu s43112 r8192 d30616 u81920
    [    0.000000] Detected VIPT I-cache on CPU0
    [    0.000000] CPU features: detected: GIC system register CPU interface
    [    0.000000] CPU features: detected: ARM erratum 845719
    [    0.000000] alternatives: applying boot alternatives
    [    0.000000] Kernel command line: console=ttyS2,115200n8 earlycon=ns16550a,mmio32,0x02800000 mtdparts=spi-nand0:512k(ospi_nand.tiboot3),2m(ospi_nand.tispl),4m(ospi_nand.u-boot),256k(ospi_nand.env),256k(ospi_nand.env.backup),98048k@32m(ospi_nand.rootfs),256k@130816k(ospi_nand.phypattern) root=PARTUUID=a90d02f3-01 rw rootfstype=ext4 rootwait
    [    0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
    [    0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
    [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 1032192
    [    0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
    [    0.000000] software IO TLB: area num 4.
    [    0.000000] software IO TLB: mapped [mem 0x00000000fbfff000-0x00000000fffff000] (64MB)
    [    0.000000] Memory: 2807496K/4194304K available (12224K kernel code, 1268K rwdata, 4092K rodata, 2432K init, 508K bss, 796984K reserved, 589824K cma-reserved)
    [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
    [    0.000000] rcu: Preemptible hierarchical RCU implementation.
    [    0.000000] rcu:     RCU event tracing is enabled.
    [    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
    [    0.000000]  Trampoline variant of Tasks RCU enabled.
    [    0.000000]  Tracing variant of Tasks RCU enabled.
    [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
    [    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
    [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
    [    0.000000] Unable to handle kernel paging request at virtual address 0000000000004000
    [    0.000000] Mem abort info:
    [    0.000000]   ESR = 0x0000000096000004
    [    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
    [    0.000000]   SET = 0, FnV = 0
    [    0.000000]   EA = 0, S1PTW = 0
    [    0.000000]   FSC = 0x04: level 0 translation fault
    [    0.000000] Data abort info:
    [    0.000000]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
    [    0.000000]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
    [    0.000000]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
    [    0.000000] [0000000000004000] user address but active_mm is swapper
    [    0.000000] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
    [    0.000000] Modules linked in:
    [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.32-ti-01301-gdb8871293143-dirty #1
    [    0.000000] Hardware name: Texas Instruments AM62A7 S1HS (DT)
    [    0.000000] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [    0.000000] pc : __pi_strcmp+0x24/0x140
    [    0.000000] lr : __of_device_is_compatible+0xb4/0x15c
    [    0.000000] sp : ffff800081273cc0
    [    0.000000] x29: ffff800081273cc0 x28: 00000000830040ac x27: 0000000000000000
    [    0.000000] x26: 0000000000000000 x25: ffff00087f8a0e98 x24: ffff8000810b92e0
    [    0.000000] x23: ffff8000810b9300 x22: ffff8000810b9320 x21: 0000000000000000
    [    0.000000] x20: ffff00087b8a0f80 x19: ffff800080f7b650 x18: 0000000000000006
    [    0.000000] x17: 696a203532207369 x16: 2079616c65642074 x15: ffff8000812737e0
    [    0.000000] x14: ffff800081396440 x13: 0000000000000000 x12: fffffc0020000a48
    [    0.000000] x11: 0000000000000000 x10: ffff800080f77650 x9 : 0000000000000002
    [    0.000000] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 0000000000000000
    [    0.000000] x5 : 0080808080000000 x4 : 0000000000000061 x3 : 62697461706d6f63
    [    0.000000] x2 : ffff8000810b9300 x1 : ffff800080f7b650 x0 : 0000000000004000
    [    0.000000] Call trace:
    [    0.000000]  __pi_strcmp+0x24/0x140
    [    0.000000]  of_find_matching_node_and_match+0x68/0x14c
    [    0.000000]  of_irq_init+0xa4/0x398
    [    0.000000]  irqchip_init+0x18/0x24
    [    0.000000]  init_IRQ+0x9c/0xb4
    [    0.000000]  start_kernel+0x258/0x608
    [    0.000000]  __primary_switched+0xbc/0xc4
    [    0.000000] Code: 54000401 b50002c6 d503201f f86a6803 (f8408402)
    [    0.000000] ---[ end trace 0000000000000000 ]---
    [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
    [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

  • Hello Jae Young Choi,

    Thank you.

    Let me review the inputs.

    I may have to request our software expert to review the logs.

    When the temperature reached around 85°C, we performed cold reset.

    I need you to confirm whether this is the die temperature or the ambient temperature.

    If this is the die temperature, please help me confirm the ambient temperature.

    Regards,

    Sreenivasa

  • Hi Sreenivasa.

    If the chip temperature rises excessively (for example, above 80~90°C), there seems to be a theoretical possibility that certain internal analog blocks, power switching, or PLL functions may become temporarily unstable during reset operations (both cold and hot resets). In particular, if the temperature is close to the critical threshold at which the thermal shutdown circuit is about to activate, normal operation may not be guaranteed even after power is reapplied immediately following a reset.

    After an automatic shutdown due to overheating or a user-initiated forced shutdown, the chip does not automatically reboot and recover solely through its internal boot logic; instead, the system can only resume normal booting after the temperature has sufficiently decreased and power is supplied (i.e., the system is reset).


    If the system is designed to power up and boot normally once the temperature has dropped low enough and power is re-applied, what is the safe temperature range to which the chip must return for normal operation?
    Also, does the PMIC have a feature that can automatically re-supply power and attempt to reboot the system by itself when the temperature returns to a safe range?

    If the cpu_critical temperature value of thermal-zones is set to a temperature similar to 85–90°C, and the IC reaches that temperature causing the system to power off, would it be possible to power on (cold reset) immediately afterwards and expect normal boot without any issues?
    Is it also necessary to verify whether there is any difference in results between restarting the system right after a normal shutdown at a similar temperature and rebooting the system by performing a hot or cold reset at that temperature?

     
    Thanks.

    Best regards, 

    Jack

  • Hello Jack, 

    Thank you for the inputs.

    I need you to confirm whether this is the die temperature or the ambient temperature.

    If this is the die temperature, please help me confirm the ambient temperature.

    Can you please answer the above question so that I can check with the experts?

    Regards,

    Sreenivasa

  • Hello Kallikuppa Sreenivasa,

    The temperature I mentioned earlier refers to the die temperature (VTM temperature). 
    The custom board is enclosed in a mechanical housing, and we are unable to measure the temperature inside the enclosure directly.
    We placed a temperature sensor inside the blanket, approximately 3 cm away from the enclosure, and the measured temperature was around 34°C.

    Thank you.
    Regards,

  • Hello Jack, 

    Thank you.

    Can you confirm the temperature setting?

    #define MAX_TEMP 115000
    +#define COOL_DOWN_TEMP 90000

    Have you tried setting the temperatures as above and checked, as a test, whether the SoC trips and recovers?

    The setup description is not clear without the SoC ambient temperature information.
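
One possible reading of the two values above is that the SoC trips once the die temperature reaches MAX_TEMP and is allowed to run again only after it has cooled to COOL_DOWN_TEMP. A minimal Python sketch of that hysteresis follows. It is an assumption about the intended behavior for illustration only, not the actual firmware or driver code; the class name `ThermalGate` and the exact comparison operators are hypothetical.

```python
MAX_TEMP = 115000        # trip threshold in millidegrees Celsius (value quoted above)
COOL_DOWN_TEMP = 90000   # recovery threshold in millidegrees Celsius (value quoted above)


class ThermalGate:
    """Hypothetical trip-and-recover hysteresis on the die temperature."""

    def __init__(self, max_temp: int = MAX_TEMP, cool_down_temp: int = COOL_DOWN_TEMP):
        self.max_temp = max_temp
        self.cool_down_temp = cool_down_temp
        self.tripped = False

    def update(self, temp_mc: int) -> bool:
        """Feed one temperature sample; return True while operation is allowed."""
        if not self.tripped and temp_mc >= self.max_temp:
            # Too hot: trip and hold until the die has cooled down.
            self.tripped = True
        elif self.tripped and temp_mc <= self.cool_down_temp:
            # Cooled to the recovery threshold: allow operation again.
            self.tripped = False
        return not self.tripped


if __name__ == "__main__":
    gate = ThermalGate()
    for sample in (84974, 116000, 100000, 90000, 95000):
        print(sample, "allowed" if gate.update(sample) else "held")
```

Under this reading, a die at 85°C to 90°C sits below the trip threshold, so a thermal trip alone would not be expected to block a boot at the temperatures reported in this thread.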

    Regards,

    Sreenivasa

  • Hi Sreenivasa.

    Should I ask this customer to try your suggestion through device tree changes as below?

    thermal-zones {
        cpu_thermal: cpu_thermal {
            polling-delay-passive = <250>; // ms
            polling-delay = <1000>; // ms

            trips {
                cpu_alert0: trip0 {
                    temperature = <90000>; // COOL_DOWN_TEMP (in millicelsius)
                    hysteresis = <2000>;
                    type = "passive";
                };

                cpu_crit: trip1 {
                    temperature = <115000>; // MAX_TEMP (in millicelsius)
                    hysteresis = <2000>;
                    type = "critical";
                };
            };

            cooling-maps {
                map0 {
                    trip = <&cpu_alert0>;
                    cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
                };
            };
        };
    };

  • Hello JK, 

    Thank you.

    Anshu, did you have some thoughts?

    Regards,

    Sreenivasa