DRA829V: Intermittent hang when handing over to U-Boot proper

Part Number: DRA829V
Other Parts Discussed in Thread: DRA829, AM69

Hello experts,

The following thread:

AM69A: Intermittent Boot Failures - Frequency Handshake Timeout and SPL Hang

discusses a hang that occurs just as the SPL is about to hand over to U-Boot proper. We have seen the same hang on our device, and I have now been able to reproduce it on the j721e_evm platform.

Boot log:

U-Boot SPL 2025.01-00551-g743712b9ee4b (Jul 31 2025 - 11:32:44 +0000)
SYSFW ABI: 4.0 (firmware rev 0x000b '11.1.8--v11.01.08 (Fancy Rat)')
Trying to boot from SPI
Skipping authentication on GP device
Skipping authentication on GP device
Skipping authentication on GP device
Skipping authentication on GP device
Skipping authentication on GP device
Loading Environment from nowhere... OK
Starting ATF on ARM64 core...

NOTICE:  BL31: v2.13.0(release):v2.13.0-259-ge0c4d3903b-dirty
NOTICE:  BL31: Built : 07:01:36, Jul  1 2025
I/TC:
I/TC: OP-TEE version: 4.6.0-dev (gcc version 13.4.0 (GCC)) #1 Fri Apr 25 11:17:53 UTC 2025 aarch64
I/TC: WARNING: This OP-TEE configuration might be insecure!
I/TC: WARNING: Please check https://optee.readthedocs.io/en/latest/architecture/porting_guidelines.html
I/TC: Primary CPU initializing
I/TC: GIC redistributor base address not provided
I/TC: Assuming default GIC group status and modifier
I/TC: SYSFW ABI: 4.0 (firmware rev 0x000b '11.1.8--v11.01.08 (Fancy Rat)')
I/TC: Activated SA2UL device
I/TC: Fixing SA2UL firewall owner for GP device
I/TC: Enabled firewalls for SA2UL TRNG device
I/TC: SA2UL TRNG initialized
I/TC: SA2UL Drivers initialized
I/TC: HUK Initialized
I/TC: Primary CPU switching to normal world boot

U-Boot SPL 2025.01-00551-g743712b9ee4b-dirty (Jul 31 2025 - 11:32:44 +0000)
SYSFW ABI: 4.0 (firmware rev 0x000b '11.1.8--v11.01.08 (Fancy Rat)')
DM ABI: 3.0 (firmware ver 0x000b 'PSDK.11.01.00.04--v11.01.08a' patch_ver: 8)
Detected: J7X-BASE-CPB rev A
Detected: J7X-GESI-EXP rev A
Trying to boot from SPI
k3-navss-ringacc ringacc@2b800000: Ring Accelerator probed rings:286, gp-rings[96,20] sci-dev-id:235
k3-navss-ringacc ringacc@2b800000: dma-ring-reset-quirk: disabled
cadence_spi spi@47040000: Pattern not found. Skipping calibration
Skipping authentication on GP device
Skipping authentication on GP device

(hangs here)

I downloaded the latest SDK (ti-processor-sdk-linux-adas-j721e-evm-11_01_00_03) and wrote the pre-built image to an SD card. I then flashed the four boot files to the QSPI flash and strapped the board to boot from QSPI (to mimic our own custom device). Finally, I created systemd scripts that perform an MCU_PORz reset (using a GPIO pin wired to the push button) 10 seconds after Linux has loaded. After 919 reboots, I saw the behavior above.
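
For anyone repeating a long reboot run like this, it helps to classify each captured console log by the last boot marker it contains, so hangs at the handover can be told apart from other failures. Below is a minimal sketch of that idea, not a tool from the SDK: the function `last_boot_stage` and the `boot_stage` enum are hypothetical names, the marker strings are copied from the logs in this thread, and it assumes each log holds exactly one boot attempt.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stage names; not part of U-Boot or any TI tool. */
enum boot_stage {
    STAGE_NONE = 0,     /* no known marker seen */
    STAGE_R5_SPL,       /* first "U-Boot SPL" banner (R5 SPL) */
    STAGE_ATF,          /* "Starting ATF on ARM64 core..." */
    STAGE_OPTEE_DONE,   /* OP-TEE switched to normal world */
    STAGE_A72_SPL,      /* later "U-Boot SPL" banner (A72 SPL) */
    STAGE_UBOOT_PROPER  /* "U-Boot <version>" banner without "SPL" */
};

/* Return nonzero if the line (len bytes, not NUL-terminated) starts with prefix. */
static int line_starts_with(const char *line, size_t len, const char *prefix)
{
    size_t plen = strlen(prefix);
    return len >= plen && strncmp(line, prefix, plen) == 0;
}

/* Scan a captured console log and return the last boot stage it reached.
 * Assumption: the log covers a single boot attempt. */
enum boot_stage last_boot_stage(const char *log)
{
    enum boot_stage stage = STAGE_NONE;
    const char *p = log;

    assert(log != NULL);

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        /* Skip indentation so quoted or indented logs still match. */
        while (len > 0 && (*p == ' ' || *p == '\t')) {
            p++;
            len--;
        }

        if (line_starts_with(p, len, "U-Boot SPL ")) {
            /* First SPL banner is the R5 SPL, any later one the A72 SPL. */
            stage = (stage == STAGE_NONE) ? STAGE_R5_SPL : STAGE_A72_SPL;
        } else if (line_starts_with(p, len, "U-Boot ") && len > 7 &&
                   isdigit((unsigned char)p[7])) {
            /* U-Boot proper prints "U-Boot <version>" with no "SPL". */
            stage = STAGE_UBOOT_PROPER;
        } else if (line_starts_with(p, len, "Starting ATF on ARM64 core")) {
            stage = STAGE_ATF;
        } else if (line_starts_with(p, len,
                   "I/TC: Primary CPU switching to normal world boot")) {
            stage = STAGE_OPTEE_DONE;
        }

        p += len;          /* now at the newline or the end of the log */
        if (*p == '\n')
            p++;
    }
    return stage;
}
```

A log that stops at `STAGE_A72_SPL` matches the hang shown above, while a healthy boot should reach `STAGE_UBOOT_PROPER`. Counting logs per stage over the whole run gives a failure rate per boot phase.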

Do you have any insight as to what can be the problem, or better yet, any workarounds to keep it from happening?

Regards,

/Bo

  • Hi Bo,

    The TI resource assigned to this post is currently out of office, so please be aware that there will be a delay in the response.

    Regards,

    kb

  • Hi Bo,

    This is seen at the A72 SPL boot stage. At this point the DDR is up and running.

    U-Boot is being loaded from the QSPI to DDR.

    Can you check whether the A72 is executing U-Boot, or whether it hangs at the SPL level while trying to load the U-Boot image?

    Best Regards,

    Keerthy 

  • Hi Keerthy,

    I have reproduced this on the EVM platform, as we first saw it on our custom device.

    This is a log from my own device when it hangs:

    U-Boot SPL 2024.04-ti-gda00c65d74dc (Oct 02 2025 - 08:57:18 +0000)
    SYSFW ABI: 4.0 (firmware rev 0x000a '10.1.6--v10.01.06 (Fiery Fox)')
    >>> boot_from_devices
    Trying to boot from SPI
    cadence_spi spi@47050000: Unable to find PHY pattern partition
    cadence_spi_of_to_plat: regbase=47050000 ahbbase=58000000 max-frequency=40000000 page-size=256
    cadence_spi_set_speed: speed=40000000
    cadence_spi_set_speed: speed=40000000
    Authentication passed
    Authentication passed
    Authentication passed
    >>> spl_perform_fixups
    >>> bootstage_mark_name
    >>> spl_board_prepare_for_boot
    >>> HANDOVER to A72
    Loading Environment from nowhere... OK
    Authentication passed
    Authentication passed
    Starting ATF on ARM64 core...
    
    NOTICE:  BL31: v2.11.0(release):v2.11.0-906-g58b25570c9-dirty
    NOTICE:  BL31: Built : 08:49:40, Aug  5 2025
    I/TC:
    I/TC: OP-TEE version: 4.4.0-dev (gcc version 13.3.0 (GCC)) #1 Tue Jul  1 15:17:54 UTC 2025 aarch64
    I/TC: WARNING: This OP-TEE configuration might be insecure!
    I/TC: WARNING: Please check https://optee.readthedocs.io/en/latest/architecture/porting_guidelines.html
    I/TC: Primary CPU initializing
    I/TC: GIC redistributor base address not provided
    I/TC: Assuming default GIC group status and modifier
    I/TC: SYSFW ABI: 4.0 (firmware rev 0x000a '10.1.6--v10.01.06 (Fiery Fox)')
    I/TC: Activated SA2UL device
    I/TC: Enabled firewalls for SA2UL TRNG device
    I/TC: SA2UL TRNG initialized
    I/TC: SA2UL Drivers initialized
    I/TC: HUK Initialized
    I/TC: Primary CPU switching to normal world boot
    
    U-Boot SPL 2024.04-ti-gda00c65d74dc (Oct 02 2025 - 08:57:19 +0000)
    SYSFW ABI: 4.0 (firmware rev 0x000a '10.1.6--v10.01.06 (Fiery Fox)')
    Successfully set the A72 clock frequency to 1000000000
    Successfully set the MSMC clock frequency to 500000000
    >>> boot_from_devices
    Trying to boot from SPI
    cadence_spi spi@47050000: Unable to find PHY pattern partition
    Authentication passed
    Authentication passed
    >>> spl_perform_fixups
    >>> bootstage_mark_name
    >>> spl_board_prepare_for_boot
    >>> HANDOVER to A72
    image entry point: 0x80800000
    
    
    (hangs here)

    The last debug printout is from this function, in spl.c:

    __weak void __noreturn jump_to_image_no_args(struct spl_image_info *spl_image)
    {
    	typedef void __noreturn (*image_entry_noargs_t)(void);
    
    	image_entry_noargs_t image_entry =
    		(image_entry_noargs_t)spl_image->entry_point;
    
    	debug("image entry point: 0x%lx\n", spl_image->entry_point);
    	image_entry();
    }
    

    This is as far as I have been able to trace it. But please read up on the previous E2E thread, linked in my first post. They observed that MCU_RESETSTATz and RESETSTATz are held low when this error occurs.

    This has now been observed on two different platforms (AM69 and DRA829), which makes me think that this is a fundamental flaw in the K3 architecture boot process.

    A colleague of mine has experimented with swapping tiboot3.bin and sysfw.itb for older versions based on U-Boot 2023.04 and SYSFW ABI version 10.0.1. These show better stability (say 1 hang in 10000 boots), but they still hang in the handover if left running long enough.

    Also note that our device, based on DRA829V, has now been released and is an active product on the market. I cannot stress enough that this problem keeps us awake at night, so please escalate it to the highest level.

    Regards,

    /Bo