Hello.
I'm using an evmj784s4 development board and am in the process of porting a custom operating system to it.
We have previously run successfully on several Arm devices, such as Cortex-A53, Cortex-A57 and another Cortex-A72 SoC, but this problem is new to us.
The board prints the following information during boot:
U-Boot SPL 2021.01-g1c0d06c606 (Oct 30 2022 - 20:20:29 +0000)
SYSFW ABI: 3.1 (firmware rev 0x0008 '8.4.5--v08.04.05 (Jolly Jellyfi')
SPL initial stack usage: 13472 bytes
Trying to boot from MMC2
Starting ATF on ARM64 core...

NOTICE: BL31: v2.7(release):v2.7.0-359-g1309c6c805-dirty
NOTICE: BL31: Built : 20:18:26, Oct 30 2022
I/TC:
I/TC: OP-TEE version: 3.18.0-134-g6bf4a81a8 (gcc version 9.2.1 20191025 (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10))) #1 Sun Oct 30 20:18:43 UTC 2022 aarch64
I/TC: WARNING: This OP-TEE configuration might be insecure!
I/TC: WARNING: Please check https://optee.readthedocs.io/en/latest/architecture/porting_guidelines.html
I/TC: Primary CPU initializing
I/TC: SYSFW ABI: 3.1 (firmware rev 0x0008 '8.4.5--v08.04.05 (Jolly Jellyfi')
I/TC: HUK Initialized
I/TC: Activated SA2UL device
I/TC: Fixing SA2UL firewall owner for GP device
I/TC: Enabled firewalls for SA2UL TRNG device
I/TC: SA2UL TRNG initialized
I/TC: SA2UL Drivers initialized
I/TC: Primary CPU switching to normal world boot

U-Boot SPL 2021.01-g1c0d06c606 (Oct 30 2022 - 20:19:37 +0000)
SYSFW ABI: 3.1 (firmware rev 0x0008 '8.4.5--v08.04.05 (Jolly Jellyfi')
Trying to boot from MMC2

U-Boot 2021.01-g1c0d06c606 (Oct 30 2022 - 20:19:37 +0000)

SoC: J784S4 SR1.0 GP
Model: Texas Instruments J784S4 EVM
DRAM: 32 GiB
Flash: 0 Bytes
MMC: mmc@4f80000: 0, mmc@4fb0000: 1
Loading Environment from MMC... OK
In: serial@2880000
Out: serial@2880000
Err: serial@2880000
am65_cpsw_nuss ethernet@46000000: K3 CPSW: nuss_ver: 0x6BA02102 cpsw_ver: 0x6BA82102 ale_ver: 0x00293904 Ports:1 mdio_freq:1000000
Net: eth0: ethernet@46000000port@1
I am currently having issues with an external abort that I cannot explain, triggered by the execution of an ldaxr instruction.
The ldaxr instruction is part of a spin lock. I have no issues reading the lock variable with a normal read/load.
The MMU is enabled, as are the I and D caches. We are executing on the A72 cores; nothing is running on the R5s.
The lock is used during the initialization and boot of our software running in EL1.
Disabling this lock (commenting it out, or replacing it with a simple interrupt lock) hides the external abort: no crash is observed, and other locks (using ldaxr) later on in our software work as intended.
So, as far as I can tell, it is not an issue with the instruction itself or with how the spin locks are implemented.
During debugging I've tried to clean and invalidate the cache, but the external abort still appears.
We do not have any custom software running in EL3, so the default software handles the external abort and gives the following dump:
ERROR: Unhandled External Abort received on 0x80000000 from S-EL1
ERROR: exception reason=0 syndrome=0xbf000002
Unhandled Exception from EL1
x0 = 0x0000000000000001
x1 = 0x0000000000000040
x2 = 0x0000000080020000
x3 = 0x000000008002000b
x4 = 0x0000000000000000
x5 = 0x0000000000000034
x6 = 0x2d2d2d2d2d2d2d2d
x7 = 0x000000000000000d
x8 = 0x0000000082aed000
x9 = 0x0000000080022578
x10 = 0x0000000082713c40
x11 = 0x0000000000000020
x12 = 0x0000000082f1dc20
x13 = 0x0000000082f1dc20
x14 = 0x00000000826f5000
x15 = 0x0000000000000002
x16 = 0x0000000000000100
x17 = 0x0000000082335658
x18 = 0x0000000082335678
x19 = 0x0000000082aef750
x20 = 0x0000000000000001
x21 = 0x0000000000000001
x22 = 0x0000000000000061
x23 = 0x00000000826f2000
x24 = 0x0000000082137ad0
x25 = 0x0000000000000001
x26 = 0x00000000000002c0
x27 = 0x0000000082725ed0
x28 = 0x00000000826f2638
x29 = 0x0000000082f1dba0
x30 = 0x0000000082137bcc
scr_el3 = 0x000000000000073d
sctlr_el3 = 0x0000000030cd183f
cptr_el3 = 0x0000000000000000
tcr_el3 = 0x0000000080803520
daif = 0x00000000000002c0
mair_el3 = 0x00000000004404ff
spsr_el3 = 0x00000000200002c5
elr_el3 = 0x0000000082137c04
ttbr0_el3 = 0x0000000070011cc0
esr_el3 = 0x00000000bf000002
far_el3 = 0x0000000000000000
spsr_el1 = 0x0000000000000000
elr_el1 = 0x0000000000000000
spsr_abt = 0x0000000000000000
spsr_und = 0x0000000000000000
spsr_irq = 0x0000000000000000
spsr_fiq = 0x0000000000000000
sctlr_el1 = 0x0000000030d00801
actlr_el1 = 0x0000000000000000
cpacr_el1 = 0x0000000000300000
csselr_el1 = 0x0000000000000002
sp_el1 = 0x0000000082f1dba0
esr_el1 = 0x0000000000000000
ttbr0_el1 = 0x0000000000000000
ttbr1_el1 = 0x0000000000000000
mair_el1 = 0x0000000000000000
amair_el1 = 0x0000000000000000
tcr_el1 = 0x0000000000800080
tpidr_el1 = 0x0000000082aeff30
tpidr_el0 = 0x0000000000000000
tpidrro_el0 = 0x0000000000000000
par_el1 = 0x0000000000000000
mpidr_el1 = 0x0000000080000000
afsr0_el1 = 0x0000000000000000
afsr1_el1 = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1 = 0x00000000ffee2000
cntp_ctl_el0 = 0x0000000000000000
cntp_cval_el0 = 0x0000000000000000
cntv_ctl_el0 = 0x0000000000000000
cntv_cval_el0 = 0x0000000000000000
cntkctl_el1 = 0x0000000000000102
sp_el0 = 0x000000007000b380
isr_el1 = 0x0000000000000000
dacr32_el2 = 0x0000000000000000
ifsr32_el2 = 0x0000000000000000
cpuectlr_el1 = 0x0000001b00000040
cpumerrsr_el1 = 0x0000000000000000
l2merrsr_el1 = 0x0000000000000000
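As a cross-check on the dump, the syndrome value (esr_el3 = 0x00000000bf000002) can be split into its architectural ESR_ELx fields. The following is only a minimal sketch: the helper name decode_esr and the small table of exception classes are illustrative assumptions, not part of any TI or Arm tooling, and the table covers just a handful of EC values.

```python
# Minimal sketch: split an AArch64 ESR_ELx value into its top-level fields.
# decode_esr and EC_NAMES are hypothetical names used for illustration only.

# A few architectural exception class (EC) encodings; deliberately incomplete.
EC_NAMES = {
    0x15: "SVC instruction execution in AArch64 state",
    0x20: "Instruction Abort from a lower Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
    0x2F: "SError interrupt",
}


def decode_esr(esr):
    """Return the EC, IL and ISS fields of a 32-bit ESR_ELx value."""
    ec = (esr >> 26) & 0x3F    # bits [31:26]: exception class
    il = (esr >> 25) & 0x1     # bit  [25]:    instruction length
    iss = esr & 0x1FFFFFF      # bits [24:0]:  instruction specific syndrome
    fields = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not listed in this sketch"),
        "il": il,
        "iss": iss,
    }
    if ec == 0x2F:
        # For an SError, ISS bit 24 (IDS) says whether the remaining ISS bits
        # use an implementation defined format.
        fields["ids"] = (iss >> 24) & 0x1
        fields["iss_low"] = iss & 0xFFFFFF
    return fields


if __name__ == "__main__":
    for name, value in decode_esr(0xBF000002).items():
        shown = hex(value) if isinstance(value, int) else value
        print(f"{name:8s} = {shown}")
```

For the value in the dump this gives EC = 0x2f (SError interrupt), IL = 1 and IDS = 1, with 0x2 left in the low ISS bits. With IDS set, those low bits are implementation defined, so their meaning has to come from the Cortex-A72 documentation rather than from this sketch.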
The ELF disassembled around elr_el3 looks like this:
82137bfc: d50320bf sevl
82137c00: d503205f wfe
82137c04: 885fff60 ldaxr w0, [x27] <---------- elr_el3
82137c08: 35ffffc0 cbnz w0, 82137c00 <xxx+0x130>
82137c0c: 88007f75 stxr w0, w21, [x27]
82137c10: 35ffffa0 cbnz w0, 82137c04 <xxx+0x134>
82137c14: 39402262 ldrb w2, [x19, #8]
I am at a loss here, and any input is appreciated. Please let me know if you require more information from me and I will supply it to the best of my ability.
Thanks in advance,
Jonas Karlsson