PROCESSOR-SDK-J784S4: External abort executing ldaxr instruction.

Jonas Karlsson

Hello.

I'm using an evmj784s4 development board and am in the process of porting a custom operating system to this board.
We've previously run successfully on a few different Arm devices, such as Cortex-A53, A57 and another A72 SoC, so this problem is new to us.

The board prints the following information at boot:

U-Boot SPL 2021.01-g1c0d06c606 (Oct 30 2022 - 20:20:29 +0000)
SYSFW ABI: 3.1 (firmware rev 0x0008 '8.4.5--v08.04.05 (Jolly Jellyfi')
SPL initial stack usage: 13472 bytes
Trying to boot from MMC2
Starting ATF on ARM64 core...

NOTICE: BL31: v2.7(release):v2.7.0-359-g1309c6c805-dirty
NOTICE: BL31: Built : 20:18:26, Oct 30 2022
I/TC:
I/TC: OP-TEE version: 3.18.0-134-g6bf4a81a8 (gcc version 9.2.1 20191025 (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10))) #1 Sun Oct 30 20:18:43 UTC 2022 aarch64
I/TC: WARNING: This OP-TEE configuration might be insecure!
I/TC: WARNING: Please check https://optee.readthedocs.io/en/latest/architecture/porting_guidelines.html
I/TC: Primary CPU initializing
I/TC: SYSFW ABI: 3.1 (firmware rev 0x0008 '8.4.5--v08.04.05 (Jolly Jellyfi')
I/TC: HUK Initialized
I/TC: Activated SA2UL device
I/TC: Fixing SA2UL firewall owner for GP device
I/TC: Enabled firewalls for SA2UL TRNG device
I/TC: SA2UL TRNG initialized
I/TC: SA2UL Drivers initialized
I/TC: Primary CPU switching to normal world boot

U-Boot SPL 2021.01-g1c0d06c606 (Oct 30 2022 - 20:19:37 +0000)
SYSFW ABI: 3.1 (firmware rev 0x0008 '8.4.5--v08.04.05 (Jolly Jellyfi')
Trying to boot from MMC2


U-Boot 2021.01-g1c0d06c606 (Oct 30 2022 - 20:19:37 +0000)

SoC: J784S4 SR1.0 GP
Model: Texas Instruments J784S4 EVM
DRAM: 32 GiB
Flash: 0 Bytes
MMC: mmc@4f80000: 0, mmc@4fb0000: 1
Loading Environment from MMC... OK
In: serial@2880000
Out: serial@2880000
Err: serial@2880000
am65_cpsw_nuss ethernet@46000000: K3 CPSW: nuss_ver: 0x6BA02102 cpsw_ver: 0x6BA82102 ale_ver: 0x00293904 Ports:1 mdio_freq:1000000
Net: eth0: ethernet@46000000port@1

I am currently having issues with a (to me) unexplained external abort triggered by the execution of an ldaxr instruction.
The ldaxr instruction is part of a spin lock. I have no issues reading the lock variable with a normal read/load.
The MMU is enabled, I and D cache as well. We are executing on the a72 cores, nothing is running on the r5's.

The lock is used during the initialization and boot of our software running in EL1.
Disabling this lock (commenting it out, or replacing it with a simple interrupt lock) hides the external abort: no crash is observed, and other locks (using ldaxr) later on in our software work as intended.
So it is not (as far as I can tell) an issue with the instruction or how the spin lock(s) are implemented.

During debugging I've tried to clean and invalidate the cache, but the external abort still appears.
We do not have any custom software running in EL3, so the default software handles the external abort and gives the following dump:

ERROR: Unhandled External Abort received on 0x80000000 from S-EL1
ERROR: exception reason=0 syndrome=0xbf000002
Unhandled Exception from EL1
x0 = 0x0000000000000001
x1 = 0x0000000000000040
x2 = 0x0000000080020000
x3 = 0x000000008002000b
x4 = 0x0000000000000000
x5 = 0x0000000000000034
x6 = 0x2d2d2d2d2d2d2d2d
x7 = 0x000000000000000d
x8 = 0x0000000082aed000
x9 = 0x0000000080022578
x10 = 0x0000000082713c40
x11 = 0x0000000000000020
x12 = 0x0000000082f1dc20
x13 = 0x0000000082f1dc20
x14 = 0x00000000826f5000
x15 = 0x0000000000000002
x16 = 0x0000000000000100
x17 = 0x0000000082335658
x18 = 0x0000000082335678
x19 = 0x0000000082aef750
x20 = 0x0000000000000001
x21 = 0x0000000000000001
x22 = 0x0000000000000061
x23 = 0x00000000826f2000
x24 = 0x0000000082137ad0
x25 = 0x0000000000000001
x26 = 0x00000000000002c0
x27 = 0x0000000082725ed0
x28 = 0x00000000826f2638
x29 = 0x0000000082f1dba0
x30 = 0x0000000082137bcc
scr_el3 = 0x000000000000073d
sctlr_el3 = 0x0000000030cd183f
cptr_el3 = 0x0000000000000000
tcr_el3 = 0x0000000080803520
daif = 0x00000000000002c0
mair_el3 = 0x00000000004404ff
spsr_el3 = 0x00000000200002c5
elr_el3 = 0x0000000082137c04
ttbr0_el3 = 0x0000000070011cc0
esr_el3 = 0x00000000bf000002
far_el3 = 0x0000000000000000
spsr_el1 = 0x0000000000000000
elr_el1 = 0x0000000000000000
spsr_abt = 0x0000000000000000
spsr_und = 0x0000000000000000
spsr_irq = 0x0000000000000000
spsr_fiq = 0x0000000000000000
sctlr_el1 = 0x0000000030d00801
actlr_el1 = 0x0000000000000000
cpacr_el1 = 0x0000000000300000
csselr_el1 = 0x0000000000000002
sp_el1 = 0x0000000082f1dba0
esr_el1 = 0x0000000000000000
ttbr0_el1 = 0x0000000000000000
ttbr1_el1 = 0x0000000000000000
mair_el1 = 0x0000000000000000
amair_el1 = 0x0000000000000000
tcr_el1 = 0x0000000000800080
tpidr_el1 = 0x0000000082aeff30
tpidr_el0 = 0x0000000000000000
tpidrro_el0 = 0x0000000000000000
par_el1 = 0x0000000000000000
mpidr_el1 = 0x0000000080000000
afsr0_el1 = 0x0000000000000000
afsr1_el1 = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1 = 0x00000000ffee2000
cntp_ctl_el0 = 0x0000000000000000
cntp_cval_el0 = 0x0000000000000000
cntv_ctl_el0 = 0x0000000000000000
cntv_cval_el0 = 0x0000000000000000
cntkctl_el1 = 0x0000000000000102
sp_el0 = 0x000000007000b380
isr_el1 = 0x0000000000000000
dacr32_el2 = 0x0000000000000000
ifsr32_el2 = 0x0000000000000000
cpuectlr_el1 = 0x0000001b00000040
cpumerrsr_el1 = 0x0000000000000000
l2merrsr_el1 = 0x0000000000000000

The ELF disassembled around elr_el3 looks like this:

82137bfc:   d50320bf        sevl
82137c00:   d503205f        wfe
82137c04:   885fff60        ldaxr   w0, [x27]       <---------- elr_el3
82137c08:   35ffffc0        cbnz    w0, 82137c00 <xxx+0x130>
82137c0c:   88007f75        stxr    w0, w21, [x27]
82137c10:   35ffffa0        cbnz    w0, 82137c04 <xxx+0x134>
82137c14:   39402262        ldrb    w2, [x19, #8]

I am at a loss here, any input is appreciated, please let me know if you require more information from me and I will supply it to the best of my ability.

Thanks in advance,
Jonas Karlsson

over 2 years ago

0 Keerthy J over 2 years ago

TI Guru 162770 points

Hi Jonas,

Jonas Karlsson said:
I'm using an evmj784s4 development board and am in the process of porting a custom operating system to this board.
We've previously run successfully on a few different Arm devices, such as Cortex-A53, A57 and another A72 SoC, so this problem is new to us.

The board prints the following information at boot:

Could you share the full logs?

Jonas Karlsson said:
The MMU is enabled, I and D cache as well. We are executing on the a72 cores, nothing is running on the r5's.

We cannot say this if you are executing U-Boot on A72. MCU1_0 is running the device manager firmware binary.

M4 is running the TIFS binary.

- Keerthy

0 Jonas Karlsson over 2 years ago in reply to Keerthy J

Prodigy 10 points

Keerthy J said:
Could you share the full logs?

The logs I attached are the information that is given to me when I reset the board. I am not sure how to give you the full logs. Could you give me a hint as to how that is possible?

This is how we load the image, if that helps.

=> reset
resetting ...

U-Boot SPL 2021.01-g1c0d06c606 (Oct 30 2022 - 20:20:29 +0000)
SYSFW ABI: 3.1 (firmware rev 0x0008 '8.4.5--v08.04.05 (Jolly Jellyfi')
SPL initial stack usage: 13472 bytes
Trying to boot from MMC2
Starting ATF on ARM64 core...

NOTICE:  BL31: v2.7(release):v2.7.0-359-g1309c6c805-dirty
NOTICE:  BL31: Built : 20:18:26, Oct 30 2022
I/TC: 
I/TC: OP-TEE version: 3.18.0-134-g6bf4a81a8 (gcc version 9.2.1 20191025 (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10))) #1 Sun Oct 30 20:18:43 UTC 2022 aarch64
I/TC: WARNING: This OP-TEE configuration might be insecure!
I/TC: WARNING: Please check https://optee.readthedocs.io/en/latest/architecture/porting_guidelines.html
I/TC: Primary CPU initializing
I/TC: SYSFW ABI: 3.1 (firmware rev 0x0008 '8.4.5--v08.04.05 (Jolly Jellyfi')
I/TC: HUK Initialized
I/TC: Activated SA2UL device
I/TC: Fixing SA2UL firewall owner for GP device
I/TC: Enabled firewalls for SA2UL TRNG device
I/TC: SA2UL TRNG initialized
I/TC: SA2UL Drivers initialized
I/TC: Primary CPU switching to normal world boot

U-Boot SPL 2021.01-g1c0d06c606 (Oct 30 2022 - 20:19:37 +0000)
SYSFW ABI: 3.1 (firmware rev 0x0008 '8.4.5--v08.04.05 (Jolly Jellyfi')
Trying to boot from MMC2


U-Boot 2021.01-g1c0d06c606 (Oct 30 2022 - 20:19:37 +0000)

SoC:   J784S4 SR1.0 GP
Model: Texas Instruments J784S4 EVM
DRAM:  32 GiB
Flash: 0 Bytes
MMC:   mmc@4f80000: 0, mmc@4fb0000: 1
Loading Environment from MMC... OK
In:    serial@2880000
Out:   serial@2880000
Err:   serial@2880000
am65_cpsw_nuss ethernet@46000000: K3 CPSW: nuss_ver: 0x6BA02102 cpsw_ver: 0x6BA82102 ale_ver: 0x00293904 Ports:1 mdio_freq:1000000
Net:   eth0: ethernet@46000000port@1
=> tftpboot 0x82000000 joka/evm4.bin
k3-navss-ringacc ringacc@2b800000: Ring Accelerator probed rings:286, gp-rings[96,20] sci-dev-id:328
k3-navss-ringacc ringacc@2b800000: dma-ring-reset-quirk: disabled
am65_cpsw_nuss_port ethernet@46000000port@1: K3 CPSW: rflow_id_base: 2
link up on port 1, speed 1000, full duplex
Using ethernet@46000000port@1 device
TFTP from server 172.24.20.23; our IP address is 172.24.20.140
Filename 'joka/evm4.bin'.
Load address: 0x82000000
Loading: #################################################################
	 #################################################################
	 #################################################################
	 #################################################################
	 #################################################################
	 #################################################################
	 #################################################################
	 #############################################
	 9.2 MiB/s
done
Bytes transferred = 7332152 (6fe138 hex)
=> tftpboot 0x82000000 joka/evm4.bin
am65_cpsw_nuss_port ethernet@46000000port@1: K3 CPSW: rflow_id_base: 2
link up on port 1, speed 1000, full duplex
Using ethernet@46000000port@1 device
TFTP from server 172.24.20.23; our IP address is 172.24.20.140
Filename 'joka/evm4.bin'.
Load address: 0x82000000
Loading: #################################################################
	 #################################################################
	 #################################################################
	 #################################################################
	 #################################################################
	 #################################################################
	 #################################################################
	 #############################################
	 9.2 MiB/s
done
Bytes transferred = 7332152 (6fe138 hex)
=> go 0x82000000
## Starting application at 0x82000000 ..

From there on we have some logs from our OS up until the usage of the spinlock, which results in the dump I posted in the original post.

On the SD-card:

=> fatls mmc 1
   422365   tiboot3.bin
   946564   tispl.bin
  1116176   u-boot.img
      484   uenv.txt
        0   uenv.txt.base
      120   uenv.txt.disp_sharing
      101   uenv.txt.jailhouse
       54   uenv.txt.psdkra
       15   version
      483   uEnv_am62a_edgeai_apps.txt
      483   uEnv_am62a_vision_apps.txt
      483   uEnv_j721e_edgeai_apps.txt
      483   uEnv_j721e_vision_apps.txt
      484   uEnv_j721s2_edgeai_apps.txt
      484   uEnv_j721s2_vision_apps.txt
      484   uEnv_j784s4_edgeai_apps.txt
      484   uEnv_j784s4_vision_apps.txt
        1   .psdk_setup

18 file(s), 0 dir(s)

We are using the bootloader(s) that came with the board, no customization made.

Keerthy J said:
We cannot say this if you are executing U-Boot on A72. MCU1_0 is running the device manager firmware binary.

M4 is running the TIFS binary.

Yes, thank you. This was also pointed out to me by a coworker here. What I was aiming at is that the software we are having issues with runs on the a72 cores. Sorry for the confusion.

~ Jonas Karlsson

0 Keerthy J over 2 years ago in reply to Jonas Karlsson

TI Guru 162770 points

Hi Jonas,

I believe this would be a better question for the Arm forums.

https://community.arm.com/support-forums/f/architectures-and-processors-forum/44293/atomic-write-ldaxr-stlxr-causes-infinite-loop-on-cortex-a72

I found the thread above, which has some resolution. Since the OS is not known to us, it is difficult to comment without reproducing the issue ourselves.

Best Regards,

Keerthy

0 Alexandru Avadanii over 1 year ago in reply to Keerthy J

Prodigy 40 points

Hi, Keerthy,

We investigated this further and made some changes that got us a bit further.

This is most likely a SoC-specific issue, as we have tested the same code on multiple Cortex-A72 CPUs and the behavior described by Jonas differs between J784S4 and other SoCs integrating the A72.

When we started looking into this, we were using the bootloader binaries that came with our EVM board on the SD card.

The ARM Trusted Firmware (ATF) version used was probably old enough to not include [1], which means asynchronous aborts (SError) were taken at EL3 and not at EL2/EL1.

That was our first source of confusion, as we were expecting both synchronous and asynchronous exceptions to be delivered to our operating system running in EL1 (non-secure).

Since then, we upgraded all the bootloader binaries (tiboot3.bin, tispl.bin and u-boot.img) to the versions from the most recent SDK (09.x) and also analyzed the behavior using the XDS110 on-board debugger.

With the new ATF, SErrors are delivered to our operating system at EL1, as expected.

We still see the SError triggered by "ldaxr", but since this is an asynchronous exception, it doesn't necessarily mean "ldaxr" was the instruction that caused the SError.
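As a cross-check, the syndrome value from the original EL3 dump (esr_el3 = 0xbf000002) can be decoded by hand. The following is a minimal sketch in C, assuming the standard Armv8-A ESR_ELx field layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]); the function names are hypothetical and only serve the illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* ESR_ELx field layout as defined by the Armv8-A architecture. */
#define ESR_EC_SHIFT    26u
#define ESR_EC_MASK     0x3fu
#define ESR_IL_BIT      (UINT32_C(1) << 25)
#define ESR_ISS_MASK    UINT32_C(0x01ffffff)
#define ESR_EC_SERROR   0x2fu                 /* exception class: SError interrupt */
#define ESR_SERROR_IDS  (UINT32_C(1) << 24)   /* set: ISS[23:0] is IMPLEMENTATION DEFINED */

/* Extract the exception class. */
unsigned esr_ec(uint32_t esr)
{
    return (unsigned)((esr >> ESR_EC_SHIFT) & ESR_EC_MASK);
}

/* Extract the instruction specific syndrome. */
uint32_t esr_iss(uint32_t esr)
{
    return esr & ESR_ISS_MASK;
}

/* Non-zero if the syndrome reports an SError (asynchronous) exception. */
int esr_is_serror(uint32_t esr)
{
    return esr_ec(esr) == ESR_EC_SERROR;
}

/* Non-zero if this is an SError whose ISS format is implementation defined. */
int esr_serror_impl_defined(uint32_t esr)
{
    return esr_is_serror(esr) && (esr & ESR_SERROR_IDS) != 0;
}

/* Hypothetical helper: print the decoded fields on one line. */
void esr_print(uint32_t esr)
{
    printf("EC=0x%02x IL=%u ISS=0x%07x\n",
           esr_ec(esr),
           (unsigned)((esr & ESR_IL_BIT) != 0),
           (unsigned)esr_iss(esr));
}
```

Passing 0xbf000002 to these helpers gives EC = 0x2f with the IDS bit set, i.e. an SError with an implementation defined syndrome. That is consistent with the exception being asynchronous, so the reported ELR need not point at the instruction that caused it.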

We added some "nop" instructions to narrow this down. Without them, we see the following sequence of events:

82137c04:   885fff60        ldaxr   w0, [x27]                 <--- the first time this is called, it loads the contents from [x27] to w0
82137c08:   35ffffc0        cbnz    w0, 82137c00 <xxx+0x130>
82137c0c:   88007f75        stxr    w0, w21, [x27]            <--- seems to store the contents of w21 to [x27], but writes "1" to w0
82137c10:   35ffffa0        cbnz    w0, 82137c04 <xxx+0x134>  <--- jumps back to ldaxr to try again, what happens next should be another "ldaxr", but instead we get a SError

If we change the above sequence by adding some "nop" instructions:

ldaxr <---- ok
cbnz  <---- ok
stxr  <---- returns "1"
nop   <---- ok
nop   <---- SError, most likely triggered by "ldaxr" above

Now, before going further, a bit of context:

- the code snippet above is a very common spinlock implementation, if that was not obvious;

- we execute this code section very early, before the MMU was initialized, so both SCTLR_EL1.M and SCTLR_EL1.C are 0 (MMU off, caches off);

- we don't expect the "ldaxr"/"stxr" to actually execute successfully with caches off, as they rely on the cache to mark for exclusive access;

- we _do_ expect one of the following though:

(a) ldaxr to throw a _synchronous_ exception, like a data abort - this is what most (if not all) of our other A72-based SoCs do;

(b) ldaxr/stxr to use the "Global monitor" instead of the "local monitor" to ensure exclusive access - however, based on the comment in [2], we suspect that J784S4 does not support that (?);

- the SError is only triggered on core 0 for some reason, all other cores will gracefully fail during stxr (write "1" to w0, but _not_ throw any exception);
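For readers less familiar with the pattern, the ldaxr/stxr loop in the disassembly corresponds roughly to the following C11 sketch. This is not the actual lock source from the OS in question; the type and function names are hypothetical. On AArch64 cores without LSE atomics, such as the Cortex-A72, compilers typically lower the acquire compare-exchange to an ldaxr/stxr retry loop like the one shown above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_uint state;
} sketch_lock_t;

void sketch_lock_init(sketch_lock_t *l)
{
    atomic_init(&l->state, 0u);
}

/* One acquisition attempt: succeeds only if the lock was observed free. */
bool sketch_trylock(sketch_lock_t *l)
{
    unsigned expected = 0u;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1u,
        memory_order_acquire,   /* ordering on success */
        memory_order_relaxed);  /* ordering on failure */
}

void sketch_lock(sketch_lock_t *l)
{
    while (!sketch_trylock(l)) {
        /* Spin on a plain load until the lock looks free again.
         * The disassembled code waits with WFE at this point. */
        while (atomic_load_explicit(&l->state, memory_order_relaxed) != 0u) {
        }
    }
}

void sketch_unlock(sketch_lock_t *l)
{
    atomic_store_explicit(&l->state, 0u, memory_order_release);
}
```

The sketch depends on the exclusive access actually succeeding at some point. If the store-exclusive always reports failure, as described for the MMU-off case, the loop can never make progress.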

We are familiar with exception handling, so for us it would be ideal if we could get a _synchronous_ exception in this context, instead of an _asynchronous_ SError _or_ get ldaxr/stxr to work using the "global monitor" (if that is available on J784S4).

Note that once MMU is ON and caches are enabled on all cores, ldaxr/stxr do work as intended.

So:

- does J784S4 have a "global monitor" and if so, can ldaxr/stxr leverage that instead of the "local monitor" while caches are off?

- is there another mechanism that we can use for "ldaxr" to trigger a _synchronous_ abort instead of an _asynchronous_ exception?

- is there anything we can do for all the cores to trigger the same exception in a _predictable_ manner, instead of currently getting SErrors only on core 0?

Thank you,

Alex

[1] git.ti.com/.../k3

[2] git.ti.com/.../k3

0 Richard Woodruff over 1 year ago in reply to Alexandru Avadanii

TI Mastermind 23505 points

Hello,

At the time of the error, is only one cluster up, or is this spin lock being used across clusters? There may be some multi-cluster settings which are missing; however, based on your comments it seems like an early error where the code is still in one cluster.

Exclusives are meant to be used on memory with Normal memory attributes. Local monitors in the cache levels normally take care of things. At the next bus level out of the caches, my recollection is that the coherency engine should result in the same effect. I also recall that exclusives against non-Normal memory have undefined behavior. I recall aborts when folks tried to port in and run exclusive code in loaders before the caches and MMU were enabled; that was fixed by not doing that, as it's undefined/illegal. Before the MMU is on, the default behavior is that of a non-Normal memory type. Some research and looking up in the specs would be needed to get to 100% on these recollections. At any rate, there is no global monitor, and this is a construction which Arm reviewed and which has been working on multiple HLOS (Linux, QNX) and RTOS. I am not optimistic that you can be assured a synchronous error, and if others happen to do that, it's likely in the implementation defined zone.

Regards,

Richard W.

0 Alexandru Avadanii over 1 year ago in reply to Richard Woodruff

Prodigy 40 points

Hi, Richard,

Yes, when this happens, only A72_0_0 is up, all other A72 cores are off. We didn't get past the early spinlock lock attempt in order to continue bringing up the MMU on A72_0_0, then bring up the other cores.

Richard Woodruff said:
At any rate there is no global monitor

Well, that might explain the differences between J784S4 and our other SoCs.

Richard Woodruff said:
I am not optimistic that you can be assured a synchronous error, and if others happen to do that, it's likely in the implementation defined zone.

That's ok, we wanted to make sure it's not us missing something in the specs, as there are a lot of places that mention "implementation defined" or "unknown" behavior in this area. Since this is not well defined behavior (could have been SoC specific, as long as it was predictable), we will just avoid using spinlocks before MMU and caches are enabled on all A72 cores.
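One way to enforce that decision in early boot code is to check the relevant SCTLR_EL1 bits before taking any lock built on exclusives. The snippet below is a minimal sketch in C under stated assumptions: the function name is hypothetical, and the register value is passed in as a plain integer (on target it would come from reading SCTLR_EL1, for example with an `mrs` instruction). Only the architectural bit positions M (bit 0) and C (bit 2) are used.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_M (UINT64_C(1) << 0)  /* stage 1 MMU enable */
#define SCTLR_C (UINT64_C(1) << 2)  /* data cacheability enable */

/* Hypothetical guard: report whether the MMU and data cache are both
 * enabled for the given SCTLR_EL1 value, which is the state in which
 * the thread reports ldaxr/stxr working as intended. */
bool exclusives_usable(uint64_t sctlr_el1)
{
    const uint64_t required = SCTLR_M | SCTLR_C;
    return (sctlr_el1 & required) == required;
}
```

Early code could fall back to a simple interrupt lock, as mentioned in the original post, while this check returns false. Note that the check only covers the calling core; the thread's conclusion is to wait until the MMU and caches are enabled on all A72 cores.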

For what it's worth, we successfully hacked a synchronous abort for the "ldaxr" instruction by adding "1" to the address used if SCTLR_EL1.M is 0. Of course, that's not something we can use, but it was nice to see a synchronous abort for a change.

Thank you for looking into this, I think we got our answers.

Best regards,

Alex

+1 Richard Woodruff over 1 year ago in reply to Alexandru Avadanii

TI Mastermind 23505 points

Hello Alex,

Before the MMU is up, several features will not behave (alignment fix-ups and the FPU are other examples which can fail). Not using these primitives and getting the MMU on quickly is the best path. Thanks for sharing your experimental result; maybe that will help spot some future issue faster.

As far as exclusives go, you will need to ensure you follow the setup here, otherwise cross-cluster use can have an issue: https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/ti/k3/common/k3_helpers.S#L127

Regards,

Richard W.
