AM62L: AM62L Real-time Performance Issue

xi he

Part Number: AM62L

Hi,

We are encountering real-time performance issues while using the AM62L chip. The current SDK version in use is ti-processor-sdk-linux-rt-am62lxx-evm-11.02.08.02-Linux-x86-Install.bin, with a kernel version of 6.12.57. The issue details are as follows:

1. Core isolation configuration was applied to CPU1, and real-time optimization parameters were set in the cmdline, as shown below:

root@am62xx-evm:/# cat /proc/cmdline
console=ttyS0,115200n8 earlycon=ns16550a,mmio32,0x02800000 ubi.mtd=ospi_nand.rootfs root=ubi0:rootfs rw rootfstype=ubifs rootwait rcu_nocb_poll rcu_nohz=1 idle=poll rcu_nocbs=1 nohz=on nohz_full=1 kthread_cpus=0 irqaffinity=0 isolcpus=managed_irq,domain,1
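
Whether these parameters took effect can be checked against sysfs after boot. A minimal sketch, assuming a POSIX shell on the target; the helper names `cmdline_param` and `report_isolation` are hypothetical, and the two sysfs files may be empty or absent depending on the kernel configuration:

```shell
#!/bin/sh
# cmdline_param "<cmdline text>" <key>: print the value of key=value, if present.
cmdline_param() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p" | tail -n 1
}

# Print what was requested on the command line next to what the kernel reports.
report_isolation() {
    cmdline=$(cat /proc/cmdline)
    echo "isolcpus requested : $(cmdline_param "$cmdline" isolcpus)"
    echo "nohz_full requested: $(cmdline_param "$cmdline" nohz_full)"
    for f in isolated nohz_full; do
        path=/sys/devices/system/cpu/$f
        if [ -r "$path" ]; then
            echo "$f active: $(cat "$path")"
        else
            echo "$f active: (not exposed by this kernel)"
        fi
    done
}
```

Calling `report_isolation` on the target prints the requested and the active CPU lists side by side; a mismatch would suggest that a parameter was not parsed as intended.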

2. A CPU load was added to Core 1, and jitter was tested using cyclictest with the following commands:

root@am62xx-evm:/# taskset -c 1 stress-ng --cpu 1 --cpu-load 70 --vm 1 --vm-bytes 80% &
root@am62xx-evm:/# cyclictest -a 0-1 -t 2 -p 99 -m -D 0 &

Under this stress load, the maximum jitter exceeded 150 µs within a 10-minute test, as shown in the output below:
T: 0 ( 404) P:99 I:1000 C:72252 Min: 5 Act: 11 Avg: 15 Max: 203
T: 1 ( 405) P:99 I:1500 C:48159 Min: 9 Act: 33 Avg: 33 Max: 152

We are particularly concerned about the real-time performance of Core 1, because real-time tasks in the actual application will also be assigned to this core. However, the current results deviate significantly from the 60–70 µs jitter reported in the official documentation. We would appreciate your suggestions for real-time performance optimization.
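
The comparison against the documented figure can be scripted from the cyclictest summary lines. A minimal sketch, assuming the per-thread `T: ...` line format shown above; the function name `check_max_latency` is hypothetical, and the 70 µs limit is simply the upper end of the documented range:

```shell
#!/bin/sh
# check_max_latency <limit_us>: read cyclictest summary lines on stdin and
# print "<thread> <max> <ok|over>" for each per-thread line.
check_max_latency() {
    awk -v limit="$1" '
        $1 == "T:" {
            max = $NF + 0                      # last field is the Max value
            print $2, max, (max <= limit ? "ok" : "over")
        }'
}
```

For example, piping the two result lines above through `check_max_latency 70` flags both threads as `over`.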

Thank you for your support!

Attachment: Our kernel .config file.

1 month ago

0 Nick Saulnier 1 month ago

TI__Guru** 109890 points

Hello Xi,

I always like to make sure that we have the same "known good" starting point before doing any development.

Before attempting any core isolation, did you attempt to replicate the jitter results of the official documentation by following the exact steps in the docs? Please share the results of that test with an unmodified default SDK image:
https://software-dl.ti.com/processor-sdk-linux-rt/esd/AM62LX/11_02_08_02/exports/docs/devices/AM62LX/linux/RT_Linux_Performance_Guide.html

Regards,

Nick

0 xi he 1 month ago in reply to Nick Saulnier

Prodigy 20 points

Hi Nick,

On our own product, we have performed driver adaptation based on the official SDK, incorporating considerations specific to our product design. Following the stress test commands and cyclictest parameters described in the issue, we observed the following:

When conducting tests strictly per the official documentation, the applied stress appears relatively light, yielding real-time performance results of approximately 69μs. However, when using our customized test commands—which simulate higher workloads aligned with actual application scenarios—the real-time performance degrades significantly.

Given this discrepancy, we would appreciate real-time performance optimization recommendations tailored to our revised test methodology and increased load conditions.

Regards,

0 Nick Saulnier 1 month ago in reply to xi he

TI__Guru** 109890 points

Hello Xi,

Before we continue:

Disclaimer: I am not an expert at tuning RT-Linux

I will be learning with you as we work to tune your RT Linux system.

What processor are you using?

You mentioned AM62L, but your terminal logs report AM62x (am62xx-evm instead of am62lxx-evm). Which processor are you using?

These different processors have slightly different hardware, which affects some of the adjustments you could make to improve performance.

Tell me more about your use case

Ok, so we have 2 cores. You want core1 dedicated to performing a single heavy task (I assume this is a stand-in for CodeSys), and everything else running on core0. Is that correct?

What other tasks are running on core0?

Do I see any issues with your current test setup?

I have used cgroups in the past to divide tasks between processor cores. I am not sure if it would cause significantly different behavior from isolcpus. I have attached feedback from an AI agent below [1].

How to tune the RT-Linux performance for your application?

You can find some getting started documentation for tuning RT Linux performance here:
https://software-dl.ti.com/processor-sdk-linux-rt/esd/AM62LX/latest/exports/docs/linux/How_to_Guides/Target/How_to_Tune_Real_Time_Linux.html

Bootlin also provides excellent trainings. You can find their RT Linux training here:
https://bootlin.com/training/preempt-rt/

The first step is to disable any applications or drivers that you are not using, regardless of which core that code is running on. Why?

Core1 is still impacted by the code running on core0, even after you isolate the tasks and the interrupts:

1) shared cache: the cores in the A53 cluster share the L2 cache, so code running on core0 can evict data that core1 needs. A cache miss means a longer response time

2) DDR interface: this could be a bottleneck if both cores are doing heavy DDR traffic

Regards,

Nick

[1] AI feedback on isolcpus vs cgroup isolation:

Q: What are the differences between isolcpus and cgroup isolation?

A: [GENERAL KNOWLEDGE] Both mechanisms, key differences:

isolcpus (what you have)

Kernel scheduler mechanism. Works at boot, no userspace needed.

What it does:
- domain — removes CPU 1 from SMP load balancing domains. Scheduler won't migrate tasks onto it spontaneously
- managed_irq — prevents managed IRQs (MSI/MSI-X) from being assigned to CPU 1
- Tasks can still be explicitly pinned to isolated CPU via taskset/sched_setaffinity

Limits:
- Kernel threads (kthread_cpus=0 helps but not complete coverage)
- Some kernel subsystems ignore it
- No enforcement — any process can call sched_setaffinity and land on CPU 1
- No memory, I/O, or network isolation — only scheduling

cpuset cgroups (what you don't have — CONFIG_CPUSETS not set)

Hierarchical resource container. Enforced by kernel at task level.

What it does differently:
- Hard partition: assign CPUs to cgroups, tasks in that cgroup cannot run elsewhere
- cpuset.cpus.exclusive=1 (cgroups v2) — full exclusive ownership, not just isolation hint
- Combined with cpuset.mems — also isolates NUMA memory nodes (less relevant on single-socket AM62L, but still affects page allocator
behavior)
- Inheritable: child processes stay in cgroup automatically
- Composable with other cgroup controllers (memory limits, I/O weights)

The key semantic difference:

isolcpus: "scheduler, don't put things HERE unless explicitly asked"
cpusets: "this CPU belongs to THIS cgroup, full stop"

isolcpus is an opt-out hint to the scheduler. cpuset is an enforced partition.

Without CONFIG_CPUSETS, what can still escape CPU 1?

[GENERAL KNOWLEDGE] Even with your full cmdline:
- Workqueue threads — kthread_cpus=0 helps, but WQ_UNBOUND workqueues may still use any CPU
- Softirqs — bound to CPU where hardware interrupt fires; irqaffinity=0 helps but not complete
- Timer softirqs — nohz_full=1 suppresses most, but not all kernel activity
- Any process that explicitly calls sched_setaffinity(1) — no enforcement layer to stop this without cpusets

Practical recommendation for real-time workload on CPU 1

[GENERAL KNOWLEDGE]

Option A — pure isolcpus (what you have): Sufficient if your RT task is the only thing you deliberately pin to CPU 1, and you trust
userspace not to misbehave. Simpler.

Option B — enable cpusets: Stronger guarantee. Boot CPU 1 isolated, then:
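# Create exclusive cpuset for RT task (cgroup v1 layout; assumes the cpuset
# hierarchy is mounted at /sys/fs/cgroup/cpuset)
mkdir /sys/fs/cgroup/cpuset/rt
echo 1 > /sys/fs/cgroup/cpuset/rt/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/rt/cpuset.mems
echo $RT_PID > /sys/fs/cgroup/cpuset/rt/tasks

# Remaining system tasks stay in the root cpuset. The root cpuset's cpuset.cpus
# is read-only, so other tasks are kept off CPU 1 by isolcpus (or by moving
# them into a second cpuset limited to CPU 0).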

Both together is strongest: isolcpus keeps scheduler away during boot before cgroups are configured; cpusets enforce the partition at
runtime.

Given CONFIG_CPUSETS is not set — if your RT workload is sensitive, worth enabling it. Single kernel recompile cost, significant
runtime isolation guarantee gain.
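
The Option B commands above use cgroup v1 paths. On an image that mounts the unified cgroup v2 hierarchy, the file names differ. A minimal sketch of the v2 equivalent, assuming cgroup v2 is mounted at /sys/fs/cgroup and the kernel is built with CONFIG_CPUSETS; the group name `rt`, the helper names, and the dry-run default are all hypothetical choices for illustration:

```shell
#!/bin/sh
# Dry-run by default: prints each step instead of performing it.
# Run as root with APPLY=1 to perform the writes.
CGROOT=${CGROOT:-/sys/fs/cgroup}   # assumed cgroup v2 mount point

# cg_write <value> <file>: write a value to a cgroup control file (or print it)
cg_write() {
    if [ "${APPLY:-0}" = "1" ]; then
        printf '%s\n' "$1" > "$2"
    else
        echo "echo $1 > $2"
    fi
}

# cpuset_pin <group> <cpu> <pid>: confine one process to one CPU.
# Memory node 0 is assumed to be the only node.
cpuset_pin() {
    if [ "${APPLY:-0}" = "1" ]; then
        mkdir -p "$CGROOT/$1"
    else
        echo "mkdir -p $CGROOT/$1"
    fi
    cg_write "+cpuset" "$CGROOT/cgroup.subtree_control"   # expose cpuset.* in children
    cg_write "$2" "$CGROOT/$1/cpuset.cpus"
    cg_write "0" "$CGROOT/$1/cpuset.mems"
    cg_write "$3" "$CGROOT/$1/cgroup.procs"               # moves the whole process
}
```

Running `cpuset_pin rt 1 "$RT_PID"` without `APPLY=1` only prints the writes, which makes it easy to review them before touching a running system.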

0 Nick Saulnier 1 month ago in reply to Nick Saulnier

TI__Guru** 109890 points

What is the target performance?

Are you looking for a specific benchmark number?

Or are you just trying to get the cyclictest number to go down, and then re-running Codesys with the new configuration to see if a faster cyclictest is associated with better Codesys performance?

0 xi he 1 month ago in reply to Nick Saulnier

Prodigy 20 points

Hi Nick,

Thank you very much for your reply.

First, the actual hardware I am using is the AM62L3.

Second, my core application is Codesys. The real-time workload is the Codesys EtherCAT master. I configure all Codesys EtherCAT tasks to be scheduled on the real-time core (Core 1) through the Codesys host software.

As for my test plan: I will first verify the system's real-time jitter with dedicated test tools. My target is a maximum jitter below 70 microseconds over a continuous 48-hour test. After reaching this target, I will deploy the actual application, which includes driving servo motors over the EtherCAT bus and monitoring the peak jitter values through the Codesys host software. That is why I use cyclictest and stress-ng to simulate high CPU occupancy and heavy memory pressure consistent with the actual workload.

In parallel, I am running verification tests on the cpuset isolation approach you mentioned. I will share the latest test data as soon as I have new results.

Regards,

0 Nick Saulnier 1 month ago in reply to xi he

TI__Guru** 109890 points

EDITED May 18, 2026 - Edits in RED

Hello Xi,

Expectations for this week

My AM62L EVM is set up. I will run tests this week to isolate specific settings which could reduce worst-case latency.

I expect lots of DDR traffic will lead to higher worst-case interrupt response time. I am not sure if <70usec cyclictest is reasonable for a 48 hour test, but let's see how low we can get it.

Test 260518_1: exactly replicate commands in SDK 11.2 docs (except using OPTEE TRNG driver)

UPDATE: The SDK benchmark results were captured with the Pseudo RNG driver enabled. I did not enable it for this test. Will run again with updated results.

My initial results do not match the docs. I will look into:
* stress-ng -c 4 vs -c 2 for a dual core A53
* WARN: stat /dev/cpu_dma_latency failed: No such file or directory

FOLLOWUP NOTES:

Yes, stress-ng -c 4 is the standard test TI runs for all the AM6 devices, regardless of whether the A53 cluster is quad-core or dual-core. It is unrelated to the test results.

The warning about cpu_dma_latency is expected. /dev/cpu_dma_latency is the kernel's PM QoS interface that cyclictest uses to keep the CPUs out of deep idle states; it is not provided on this image, which is why cyclictest is unable to find it on this processor.

root@am62lxx-evm:~# uname -a
Linux am62lxx-evm 6.12.57-ti-rt-g31b07ab8dfbc #1 SMP PREEMPT_RT Thu Dec  4 13:07:37 UTC 2025 aarch64 GNU/Linux

root@am62lxx-evm:~# stress-ng --cpu-method=all -c 4 &
[2] 1002
[1]   Done                    stress-ng --cpu-method=all -c 2
root@am62lxx-evm:~# stress-ng: info:  [1002] defaulting to a 1 day, 0 secs run per stressor
stress-ng: info:  [1002] dispatching hogs: 4 cpu
root@am62lxx-evm:~# cyclictest -m -Sp80 -D6h -h400 -i200 -M -q
WARN: stat /dev/cpu_dma_latency failed: No such file or directory
# Histogram
000000 000000   000000
000001 000000   000000
000002 000000   000000
000003 000000   000000
000004 000000   000000
000005 345248   762554
000006 27360548 24662134
000007 37407885 25009511
000008 24363913 18800255
000009 10887474 15688180
000010 3723560  10831733
000011 1546892  5943833
000012 880035   2643272
000013 538533   1211026
000014 319457   674355
000015 187475   449129
000016 117677   316653
000017 083693   227635
000018 066541   163902
000019 054741   119252
000020 042386   090612
000021 028144   076970
000022 016581   069648
000023 009738   059346
000024 005631   046448
000025 003812   034093
000026 002662   024576
000027 001900   018179
000028 001396   014439
000029 001044   011841
000030 000767   009951
000031 000603   008023
000032 000472   006384
000033 000324   005158
000034 000257   004241
000035 000189   003392
000036 000097   002807
000037 000086   002390
000038 000047   001997
000039 000034   001506
000040 000021   001195
000041 000012   000865
000042 000009   000688
000043 000004   000463
000044 000010   000362
000045 000001   000248
000046 000002   000206
000047 000004   000144
000048 000002   000081
000049 000001   000076
000050 000001   000050
000051 000004   000038
000052 000003   000028
000053 000005   000015
000054 000002   000013
000055 000001   000006
000056 000002   000005
000057 000004   000006
000058 000002   000002
000059 000001   000000
000060 000000   000000
000061 000001   000000
000062 000000   000000
000063 000003   000000
000064 000004   000001
000065 000003   000000
000066 000001   000001
000067 000003   000000
000068 000003   000000
000069 000002   000000
000070 000001   000000
000071 000002   000000
000072 000000   000000
000073 000004   000000
000074 000002   000000
000075 000003   000000
000076 000000   000000
000077 000002   000000
000078 000001   000000
000079 000002   000000
000080 000001   000000
000081 000004   000000
000082 000002   000000
000083 000002   000000
000084 000001   000000
000085 000003   000000
000086 000003   000000
000087 000001   000000
000088 000002   000000
000089 000000   000000
000090 000000   000000
000091 000001   000000
000092 000000   000000
000093 000000   000000
000094 000001   000000
000095 000000   000000
000096 000001   000000
000097 000000   000000
000098 000001   000000
000099 000000   000000
000100 000000   000000
000101 000000   000000
000102 000001   000000
000103 000001   000000
000104 000000   000000
000105 000001   000000
000106 000000   000000
000107 000000   000000
000108 000001   000000
000109 000000   000000
000110 000001   000000
000111 000000   000000
000112 000000   000000
000113 000000   000000
000114 000000   000000
000115 000000   000000
000116 000000   000000
000117 000000   000000
000118 000000   000000
000119 000001   000000
000120 000000   000000
000121 000000   000000
000122 000000   000000
000123 000000   000000
000124 000000   000000
000125 000000   000000
000126 000001   000000
…
# Total: 108000000 107999918
# Min Latencies: 00005 00005
# Avg Latencies: 00007 00008
# Max Latencies: 00126 00066
# Histogram Overflows: 00000 00000
# Histogram Overflow at cycle number:
# Thread 0:
# Thread 1:

root@am62lxx-evm:~# stress-ng: info:  [1002] skipped: 0
stress-ng: info:  [1002] passed: 4: cpu (4)
stress-ng: info:  [1002] failed: 0
stress-ng: info:  [1002] metrics untrustworthy: 0
stress-ng: info:  [1002] successful run completed in 1 day, 0.77 secs

[2]+  Done                    stress-ng --cpu-method=all -c 4
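
The single Max value hides how often the tail is hit. A minimal sketch that sums the histogram rows at or above a threshold, assuming the two-column `-h` histogram format shown above; the function name `count_over` is hypothetical:

```shell
#!/bin/sh
# count_over <threshold_us>: read a cyclictest histogram on stdin and print,
# per thread column, how many samples fall at or above the threshold.
count_over() {
    awk -v limit="$1" '
        /^[0-9]+[ \t]+[0-9]/ {                 # data rows only; "#" lines are skipped
            if (NF > cols) cols = NF
            if ($1 + 0 >= limit)
                for (i = 2; i <= NF; i++) over[i] += $i
        }
        END {
            for (i = 2; i <= cols; i++)
                printf "%s%d", (i > 2 ? " " : ""), over[i]
            print ""
        }'
}
```

Saving the cyclictest output to a file and running `count_over 70 < file` gives the number of wakeups per thread that missed a 70 µs target. Rows elided from a pasted log are of course not counted.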

Next test

exactly replicate your initial test above

re-do Test 260518_1 with OP-TEE TRNG disabled

Topics for later

DDR & A53 Quality of Service (QoS) and Class of Service (CoS) might be helpful for us here. I am still reading up on it. Links for future readers:

https://www.ti.com/lit/sprads6

RE: AM6422: [LinuxRT] Poor lantency performance on isolated core

Regards,

Nick

0 Nick Saulnier 1 month ago in reply to Nick Saulnier

TI__Guru** 109890 points

Hello Xi,

Did you build OP-TEE with Pseudo RNG drivers for your tests?
https://software-dl.ti.com/processor-sdk-linux-rt/esd/AM62LX/11_02_08_02/exports/docs/linux/Foundational_Components_OPTEE.html#building-optee-with-prng

The performance guide document has a slight bug. It mentions disabling the OP-TEE true RNG driver (TRNG) and enabling the Pseudo RNG driver as an option, which is good. But the performance guide does not mention that the TI tests were conducted with the default SDK image + U-Boot files, where the U-Boot files are rebuilt with a modified OP-TEE with PRNG:
https://software-dl.ti.com/processor-sdk-linux-rt/esd/AM62LX/11_02_08_02/exports/docs/linux/Foundational_Components/U-Boot/BG-Build-K3.html

I have filed a ticket to update the performance guide document in future SDK releases.

I will re-run Test 260518_1 with OP-TEE TRNG disabled to see if that allows me to replicate the benchmark results.

Regards,

Nick

0 Nick Saulnier 1 month ago in reply to Nick Saulnier

TI__Guru** 109890 points

Please note that there is a bug in the SDK 11.2 version of the SDK docs for building OP-TEE which is fixed starting in SDK 12.0 docs.

This is the wrong argument in the SDK 11.2 version of docs:
https://software-dl.ti.com/processor-sdk-linux-rt/esd/AM62LX/11_02_08_02/exports/docs/linux/Foundational_Components_OPTEE.html

PLATFORM=k3-k3-am62lx

This is the correct argument in the SDK 12.0 version of docs:
https://software-dl.ti.com/processor-sdk-linux-rt/esd/AM62LX/latest/exports/docs/linux/Foundational_Components_OPTEE.html

PLATFORM=k3-am62lx

0 Tony Tang 1 month ago in reply to Nick Saulnier

TI__Mastermind 30522 points

Nick Saulnier said:
I have filed a ticket to update the performance guide document in future SDK releases.

A Jira ticket can only update future versions. Is there a way to correct the error online for the already-released version, to avoid confusing later readers, now that it has been reported and confirmed?

All released versions stay online permanently, so leaving a known error there for readers worldwide doesn't make sense.

Nick Saulnier said:
PLATFORM=k3-k3-am62lx

I also found this error when I built OP-TEE, and I reported it to someone. If it had been updated, you would not have seen it again.

0 xi he 1 month ago in reply to Nick Saulnier

Prodigy 20 points

Hi Nick,
My OP-TEE firmware uses the SDK-provided prebuilt image: board-support/prebuilt-images/am62lxx-evm/bl32.bin. Since there were no development or modification requirements, I directly used this firmware without recompilation.
However, upon reviewing my log records, it appears that the OP-TEE module was not properly loaded. While this does not affect the overall system boot or functionality, could you help confirm the potential implications of this issue? Below are relevant excerpts from my U-Boot logs and kernel logs related to OP-TEE:

UBOOT LOG:

NOTICE: bl1_plat_arch_setup arch setup
NOTICE: Booting Trusted Firmware
NOTICE: BL1: v2.12.0(release):11.02.01-14-g5939ceaeb-dirty
NOTICE: BL1: Built : 08:05:07, Jan 28 2026
NOTICE: BL1: dram_class: 10
NOTICE: lpddr4: post start - PI training status=0x29c02000
NOTICE: bl1_platform_setup DDR init done
NOTICE: k3_bl1_handoff ENTERING WFI - end of bl1
NOTICE: BL31: v2.12.0(release):11.02.01-14-g5939ceaeb-dirty
NOTICE: BL31: Built : 08:05:11, Jan 28 2026
NOTICE: SYSFW ABI: 4.0 (firmware rev 0x000b '11.2.5-v11.02.05a (Fancy Rat)')
get_device_type a0a
ERROR: Agent 0 Protocol 0x10 Message 0x7: not supported

U-Boot SPL 2025.01-g398f44b6f7db (Apr 30 2026 - 03:21:19 +0000)
SPL initial stack usage: 2048 bytes
Trying to boot from SPINAND
ERROR: Agent 0 Protocol 0x10 Message 0x7: not supported


U-Boot 2025.01-g398f44b6f7db (Apr 30 2026 - 03:21:19 +0000)

SoC: AM62LX SR1.1 HS-FS
Model: Texas Instruments AM62L3 Evaluation Module
DRAM: 512 MiB
ERROR: Agent 0 Protocol 0x10 Message 0x7: not supported
Core: 67 devices, 31 uclasses, devicetree: separate
MMC: mmc@fa10000: 0
Loading Environment from nowhere... OK
In: serial@2800000
Out: serial@2800000
Err: serial@2800000
Net: eth0: ethernet@8000000port@1
Warning: ethernet@8000000port@2 (eth1) using random MAC address - 4e:11:69:6c:28:a0
, eth1: ethernet@8000000port@2
Hit any key to stop autoboot: 0
Total of 1 byte(s) were the same
Total of 1 byte(s) were the same
Setting bus to 0
ubi0: attaching mtd4
ubi0: scanning is finished
ubi0: attached mtd4 (name "ospi_nand.rootfs", size 123 MiB)
ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
ubi0: good PEBs: 991, bad PEBs: 0, corrupted PEBs: 0
ubi0: user volume: 1, internal volumes: 1, max. volumes count: 128
ubi0: max/mean erase counter: 4/1, WL threshold: 4096, image sequence number: 80226512
ubi0: available PEBs: 0, total reserved PEBs: 991, PEBs reserved for bad PEB handling: 20
Loading file '/boot/Image' to addr 0x82000000...
Done
Loading file '/boot/dtb/ti/k3-am62l3-plc.dtb' to addr 0x88000000...
Done
## Flattened Device Tree blob at 88000000
Booting using the fdt blob at 0x88000000
Working FDT set to 88000000
Loading Device Tree to 000000008fff2000, end 000000008ffff7b9 ... OK
Working FDT set to 8fff2000

Starting kernel ...

And here is my kernel log about op-tee:

[ 1.338980] optee: probing for conduit method.
[ 1.338998] optee: api uid mismatch
[ 1.339004] optee firmware:optee: probe with driver optee failed with error -22

Regards,

0 Nick Saulnier 1 month ago in reply to xi he

TI__Guru** 109890 points

Test 260519_1: exactly replicate commands in SDK 11.2 docs, INCLUDING disabling OP-TEE TRNG driver

I am now able to replicate the observations in the Linux SDK 11.2 docs.

Step 1: Get the OP-TEE source code. I will link to the SDK 12.0 version of the docs since it fixes some bugs in the SDK 11.2 docs.

Step 2: Build the OP-TEE code with Pseudo RNG instead of True RNG:

#!/bin/bash
# Stop at the first failing command so "Make succeeded!" is only printed on success
set -e

# set variables
export OPTEE_PLATFORM="k3-am62lx"
export SDK_INSTALL_DIR="<path_to>/ti-processor-sdk-linux-rt-am62lxx-evm-11.02.08.02"
export CROSS_COMPILE_64="${SDK_INSTALL_DIR}/linux-devkit/sysroots/x86_64-arago-linux/usr/bin/aarch64-oe-linux/aarch64-oe-linux-"
export SYSROOT_64="${SDK_INSTALL_DIR}/linux-devkit/sysroots/aarch64-oe-linux"
export CC_64="${CROSS_COMPILE_64}gcc --sysroot=${SYSROOT_64}"
export CROSS_COMPILE_32="${SDK_INSTALL_DIR}/k3r5-devkit/sysroots/x86_64-arago-linux/usr/bin/arm-oe-eabi/arm-oe-eabi-"
export CFLAGS64="--sysroot=${SYSROOT_64}"
export KCFLAGS="--sysroot=${SYSROOT_64}"
export LDFLAGS="--sysroot=${SYSROOT_64}"

# clean sources
make CROSS_COMPILE="$CROSS_COMPILE_64" clean

# make OPTEE
make CROSS_COMPILE64="$CROSS_COMPILE_64" PLATFORM="$OPTEE_PLATFORM" CFG_ARM64_core=y CFG_WITH_SOFTWARE_PRNG=y CFG_USER_TA_TARGETS=ta_arm64

echo "Make succeeded!"

Step 3: Rebuild U-Boot with the updated OP-TEE code:

#!/bin/bash
# Stop at the first failing command so "Make succeeded!" is only printed on success
set -e

# set variables
export PLATFORM_DEFCONFIG="am62lx_evm_defconfig"
export SDK_INSTALL_DIR="<path_to>/ti-processor-sdk-linux-rt-am62lxx-evm-11.02.08.02"
export CROSS_COMPILE_64="${SDK_INSTALL_DIR}/linux-devkit/sysroots/x86_64-arago-linux/usr/bin/aarch64-oe-linux/aarch64-oe-linux-"
export SYSROOT_64="${SDK_INSTALL_DIR}/linux-devkit/sysroots/aarch64-oe-linux"
export CC_64="${CROSS_COMPILE_64}gcc --sysroot=${SYSROOT_64}"
export CROSS_COMPILE_32="${SDK_INSTALL_DIR}/k3r5-devkit/sysroots/x86_64-arago-linux/usr/bin/arm-oe-eabi/arm-oe-eabi-"
export CFLAGS64="--sysroot=${SYSROOT_64}"
export KCFLAGS="--sysroot=${SYSROOT_64}"
export LDFLAGS="--sysroot=${SYSROOT_64}"

export UBOOT_DIR="${SDK_INSTALL_DIR}/board-support/ti-u-boot-2025.01+git"
export TI_LINUX_FW_DIR="${SDK_INSTALL_DIR}/board-support/prebuilt-images/am62lxx-evm"
# I used prebuilt TFA
export TFA_DIR="${TI_LINUX_FW_DIR}"
# I used rebuilt OPTEE
# NOTE: Make sure to check out the appropriate OPTEE tag, check Release Notes
export OPTEE="/home/a0226750local/git/optee_os/out/arm-plat-k3/core/tee-pager_v2.bin"
# Using prebuilt OPTEE
#export OPTEE="${TI_LINUX_FW_DIR}/bl32.bin"

# clean sources
make CROSS_COMPILE="$CROSS_COMPILE_64" clean

#configure u-boot
make CROSS_COMPILE="$CROSS_COMPILE_64" "$PLATFORM_DEFCONFIG"

# build u-boot
make CROSS_COMPILE="$CROSS_COMPILE_64" BL1=$TFA_DIR/bl1.bin BL31=$TFA_DIR/bl31.bin BINMAN_INDIRS=$TI_LINUX_FW_DIR TEE=$OPTEE

echo "Make succeeded!"

Test results:

root@am62lxx-evm:~# uname -a
Linux am62lxx-evm 6.12.57-ti-rt-g31b07ab8dfbc #1 SMP PREEMPT_RT Thu Dec  4 13:07:37 UTC 2025 aarch64 GNU/Linux
root@am62lxx-evm:~# stress-ng --cpu-method=all -c 4 &
[1] 936
root@am62lxx-evm:~# stress-ng: info:  [936] defaulting to a 1 day, 0 secs run per stressor
stress-ng: info:  [936] dispatching hogs: 4 cpu

root@am62lxx-evm:~# cyclictest -m -Sp80 -D6h -h400 -i200 -M -q
WARN: stat /dev/cpu_dma_latency failed: No such file or directory
# Histogram
000000 000000   000000
000001 000000   000000
000002 000000   000000
000003 000000   000000
000004 000000   000000
000005 316691   705021
000006 20586842 22417152
000007 29812256 33595187
000008 23874842 24719871
000009 16066047 15036644
000010 8610582  6758354
000011 3841554  2421157
000012 1763729  936561
000013 1002112  478290
000014 640858   266835
000015 406657   151581
000016 249747   094050
000017 157131   069519
000018 111235   059859
000019 091638   056972
000020 084283   053234
000021 078843   046950
000022 070029   037546
000023 057772   027135
000024 042498   018041
000025 030672   011984
000026 023045   008631
000027 017919   006895
000028 014503   005833
000029 012056   004995
000030 009495   003834
000031 007223   002603
000032 005447   001679
000033 003909   001064
000034 002929   000702
000035 002046   000463
000036 001535   000347
000037 001102   000274
000038 000761   000189
000039 000577   000142
000040 000375   000100
000041 000272   000080
000042 000190   000027
000043 000148   000025
000044 000131   000019
000045 000087   000009
000046 000064   000005
000047 000032   000004
000048 000028   000002
000049 000025   000001
000050 000025   000003
000051 000014   000000
000052 000011   000000
000053 000011   000001
000054 000005   000001
000055 000003   000001
000056 000003   000000
000057 000002   000000
000058 000003   000000
000059 000000   000000
000060 000003   000000
000061 000002   000000
000062 000001   000000
000063 000000   000000
...
000399 000000   000000
# Total: 108000000 107999872
# Min Latencies: 00005 00005
# Avg Latencies: 00008 00007
# Max Latencies: 00062 00055
# Histogram Overflows: 00000 00000
# Histogram Overflow at cycle number:
# Thread 0:
# Thread 1:

0 Nick Saulnier 1 month ago in reply to Nick Saulnier

TI__Guru** 109890 points

How does changing OP-TEE settings impact Linux interrupt response time?

My understanding is that the context switch between Linux and OP-TEE can add latency to the kernel's interrupt response time. Since Linux also has a pseudo RNG driver, disabling the OP-TEE hardware RNG driver means that Linux does NOT switch to OP-TEE when random numbers are needed (Linux already has a driver to generate them). Because context switching to OP-TEE is reduced, latency is reduced.

This is NOT the same as disabling or removing OP-TEE. My current understanding is that if your application requires regular switching to OP-TEE, we should expect the latency to get worse again.
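
One way to see whether an OP-TEE-backed hardware RNG is registered with Linux is the standard hw_random sysfs directory. A minimal sketch; the function name `show_hwrng` is hypothetical, and the device name to look for (for example `optee-rng`) is an assumption that should be confirmed on the target:

```shell
#!/bin/sh
# show_hwrng [sysfs_dir]: report the hardware RNG the kernel currently uses.
# The directory defaults to the standard hw_random location.
show_hwrng() {
    dir=${1:-/sys/class/misc/hw_random}
    if [ -r "$dir/rng_current" ]; then
        echo "current: $(cat "$dir/rng_current")"
        echo "available: $(cat "$dir/rng_available" 2>/dev/null)"
    else
        echo "current: none"
    fi
}
```

Running `show_hwrng` before and after swapping in the rebuilt OP-TEE shows whether the set of registered hardware RNGs changed between the two boots.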

Regards,

Nick

0 Nick Saulnier 1 month ago in reply to Nick Saulnier

TI__Guru** 109890 points

Please create a separate thread for the OP-TEE loading question

That way we can make sure that the separate question gets addressed properly.

0 Nick Saulnier 1 month ago in reply to Nick Saulnier

TI__Guru** 109890 points

Edited May 20 2026

Test 260519_2: exactly replicate customer setup from May 12, including using OoB OP-TEE

root@am62lxx-evm:~# cat /proc/cmdline
console=ttyS0,115200n8 vt.global_cursor_default=0 rcu_nocb_poll rcu_nohz=1 rcu_nocbs=1 idle=poll nohz=on nohz_full=1 kthread_cpus=0 irqaffinity=0 isolcpus=managed_irq,domain,1 earlycon=ns16550a,mmio32,0x02800000 root=PARTUUID=076c4a2a-02 rw rootfstype=ext4 rootwait
root@am62lxx-evm:~# uname -a
Linux am62lxx-evm 6.12.57-ti-rt-g31b07ab8dfbc #1 SMP PREEMPT_RT Thu Dec  4 13:07:37 UTC 2025 aarch64 GNU/Linux
root@am62lxx-evm:~# taskset -c 1 stress-ng --cpu 1 --cpu-load 70 --vm 1 --vm-bytes 80% &
[1] 986
root@am62lxx-evm:~# stress-ng: info:  [986] defaulting to a 1 day, 0 secs run per stressor
stress-ng: info:  [986] dispatching hogs: 1 cpu, 1 vm
stress-ng: info:  [988] cpu: for stable load results, select a specific cpu stress method with --cpu-method other than 'all'

root@am62lxx-evm:~# cyclictest -a 0-1 -t 2 -p 99 -m -D 0 &

# output from earlier in the day
policy: fifo: loadavg: 0.57 1.56 1.84 1/158 1920
stress-ng: info:  [986] failed: 0
T: 0 (  992) P:99 I:1000 C:86443004 Min:      5 Act:    9 Avg:   15 Max:     489
T: 1 (  993) P:99 I:1500 C:57628654 Min:      6 Act:    8 Avg:   28 Max:     295

# output as of 3:30pm central, not sure if stress-ng has finished running or not
policy: fifo: loadavg: 0.00 0.01 0.01 1/157 1964
T: 0 (  992) P:99 I:1000 C:90242144 Min:      5 Act:    5 Avg:   15 Max:     489
T: 1 (  993) P:99 I:1500 C:60161415 Min:      6 Act:    7 Avg:   27 Max:     295

This is the test after running for almost 24 hours ^

Plan for tomorrow:

1) Re-run Test 260519_2 with OP-TEE TRNG disabled

2) Re-run Test 260519_2 with OoB OP-TEE, but different cyclictest arguments:

* cyclictest -m -S -p99 -D6h -h600 -i200 -M -q does not work as-is: the cmdline entry isolcpus=domain,1 breaks -S, so affinity needs to be set explicitly.

* cyclictest -m -a 0-1 -t 2 -p99 -D6h -h600 -i200 -M -q
* change 1ms --> 200 usec test interval (-i200)
* avoid page faults from memory being paged out (-m, mlockall); -M only delays screen updates until a new max latency is hit
* leave cyclictest priority at 99 (not sure how this relates to CodeSys behavior)
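
The effect of the shorter interval on the sample count is simple arithmetic: run duration divided by wakeup interval. A minimal sketch; the function name `expected_cycles` is hypothetical:

```shell
#!/bin/sh
# expected_cycles <duration_s> <interval_us>: wakeups one cyclictest thread
# should record over the run.
expected_cycles() {
    echo $(( $1 * 1000000 / $2 ))
}
```

`expected_cycles 21600 200` gives 108000000 for a 6-hour run at 200 µs, which matches the `# Total: 108000000` reported by the 6-hour runs earlier in this thread. At the default 1000 µs interval the same run would collect 21600000 samples per thread.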

see an AI comparison of the differences between the initial two tests at [1]

Regards,

Nick

[1] compare the behavior of "cyclictest -a 0-1 -t 2 -p 99 -m -D 0" against "cyclictest -m -Sp80 -D6h -h400 -i200 -M -q"

  ┌───────────────┬────────────────────────────────────────────────────────────────────┬──────────────────────────────────────────┐
  │   Parameter   │                cyclictest -a 0-1 -t 2 -p 99 -m -D 0                │ cyclictest -m -Sp80 -D6h -h400 -i200 -M  │
  │               │                                                                    │                    -q                    │
  ├───────────────┼────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────┤
  │ Thread        │ -t 2 explicit 2 threads                                            │ -S auto: 1 thread per CPU                │
  │ creation      │                                                                    │                                          │
  ├───────────────┼────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────┤
  │ CPU affinity  │ -a 0-1 pin to CPU0+1                                               │ -S handles affinity automatically        │
  ├───────────────┼────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────┤
  │ RT priority   │ -p 99 (max SCHED_FIFO)                                             │ -p 80                                    │
  ├───────────────┼────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────┤
  │ Duration      │ -D 0: no time limit, runs until stopped (ran ~24 h in this thread) │ -D 6h explicit 6-hour run                │
  ├───────────────┼────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────┤
  │ Sample        │ default = 1000 µs                                                  │ -i 200 = 200 µs (5× more samples)        │
  │ interval      │                                                                    │                                          │
  ├───────────────┼────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────┤
  │ Histogram     │ none                                                               │ -h 400 (buckets up to 400 µs)            │
  ├───────────────┼────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────┤
  │ Screen        │ not set: per-thread status lines refresh continuously              │ -M (--refresh_on_max): update display    │
  │ refresh       │                                                                    │ only when a new max latency is hit       │
  ├───────────────┼────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────┤
  │ Output        │ verbose — prints every sample, floods terminal                     │ -q quiet — prints only final             │
  │               │                                                                    │ summary/histogram                        │
  └───────────────┴────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────┘

0 xi he 1 month ago in reply to Nick Saulnier

Prodigy 20 points

Hi Nick,

In our project, the OP-TEE module is not loaded, and our actual business scenarios do not use any security modules. Is it necessary to pursue this direction and run the related verification?

Additionally, in the provided test commands, I originally applied only 70% CPU load and 80% memory load to CPU1 while leaving CPU0 idle. However, in real-world scenarios, CPU0 runs business-critical functions such as CANopen and Ethernet-related CODESYS operations. Only the CODESYS EtherCAT master station operates on CPU1. Therefore, I have also added a similar load to CPU0 using the following command:
taskset -c 0 stress-ng --cpu 1 --cpu-load 70 --vm 1 --vm-bytes 40% &
(The CPU/memory load was reduced because assigning too high a load would trigger OOM [Out-Of-Memory] on our hardware platform.) Under this configuration, the significant jitter issue on CPU1 can be reproduced more quickly.

Could you please provide some additional kernel-level optimization approaches to help mitigate the maximum jitter on CPU1?

Regards,

0 Tony Tang 1 month ago in reply to Nick Saulnier

TI__Mastermind 30522 points

Nick Saulnier said:
I am now able to replicate the observations in the Linux SDK 11.2 docs.

When I tested with a self-built OP-TEE, the jitter varied between tests and between power cycles. Sometimes I get results as good as yours and the SDK user guide, and sometimes I get larger numbers. Can you run the test more times?

0 Nick Saulnier 1 month ago in reply to Tony Tang

TI__Guru** 109890 points

Tony and I are discussing offline.

Hello Xi,

Let's set expectations: this will take time. There is no "magic" kernel optimization that will allow us to skip important tests.

You have set a difficult goal. Right now, I do not have a pre-packaged solution which magically reaches your goal.

Your goal might be possible, or it might not. There are a LOT of ways to optimize performance on a complex part like this. It will take time to run tests and see which tests improve performance. You can speed up the process by running tests on your side and sharing your observations.

What is my approach?

I talk with our TI experts about things that they expect to impact RT Linux performance, and then I run tests to verify (or challenge) their expectations. I am starting with things that the TI experts have already tested, or that the experts are very confident about (like OP-TEE TRNG).

I am also looking at specific hardware settings to make sure that the hardware configuration is not slowing down code execution.

Once I have investigated the suggestions of the TI experts and we are confident that the hardware is running as efficiently as possible, then I could look at other software settings. If you want to start testing your own software configuration, feel free to do the Bootlin RT training and get started: https://bootlin.com/training/preempt-rt/

Questions for you

"isolcpus=managed_irq" - is there a specific reason you are using managed_irq for this test?

Have you benchmarked resource usage on your CodeSys platform, and found that taskset -c 0 stress-ng --cpu 1 --cpu-load 70 --vm 1 --vm-bytes 40% accurately mirrors your resource usage? Or is this just a guess? These tests will be most useful if we are actually approximating your system.

If there is a different setup that you would like me to use, please give me the updated cmdline arguments, taskset for both cores, etc.

Test 260520_1: replicate customer setup from May 12, but disable OP-TEE TRNG

Disabling OP-TEE TRNG appears to improve performance. Hard to tell for sure since the cyclictest command used did not generate a histogram output.

Out-of-the-box OP-TEE:
T: 0 ( 992) P:99 I:1000 C:86443004 Min: 5 Act: 9 Avg: 15 Max: 489
T: 1 ( 993) P:99 I:1500 C:57628654 Min: 6 Act: 8 Avg: 28 Max: 295

OP-TEE TRNG disabled:
T: 0 ( 945) P:99 I:1000 C:75969951 Min: 4 Act: 42 Avg: 13 Max: 372
T: 1 ( 946) P:99 I:1500 C:50646620 Min: 7 Act: 23 Avg: 25 Max: 252

screenshot at 2:30pm: ~21 hrs test run

root@am62lxx-evm:~# uname -a
Linux am62lxx-evm 6.12.57-ti-rt-g31b07ab8dfbc #1 SMP PREEMPT_RT Thu Dec  4 13:07:37 UTC 2025 aarch64 GNU/Linux
root@am62lxx-evm:~# cat /proc/cmdline
console=ttyS0,115200n8 vt.global_cursor_default=0 rcu_nocb_poll rcu_nohz=1 rcu_nocbs=1 idle=poll nohz=on nohz_full=1 kthread_cpus=0 irqaffinity=0 isolcpus=managed_irq,domain,1 earlycon=ns16550a,mmio32,0x02800000 root=PARTUUID=076c4a2a-02 rw rootfstype=ext4 rootwait
root@am62lxx-evm:~# taskset -c 1 stress-ng --cpu 1 --cpu-load 70 --vm 1 --vm-bytes 80% &
[1] 939
root@am62lxx-evm:~# stress-ng: info:  [939] defaulting to a 1 day, 0 secs run per stressor
stress-ng: info:  [939] dispatching hogs: 1 cpu, 1 vm
stress-ng: info:  [941] cpu: for stable load results, select a specific cpu stress method with --cpu-method other than 'all'

root@am62lxx-evm:~# # started test at 5:20pm central
root@am62lxx-evm:~# cyclictest -a 0-1 -t 2 -p 99 -m -D 0 &
[2] 944
root@am62lxx-evm:~# WARN: stat /dev/cpu_dma_latency failed: No such file or directory
policy: fifo: loadavg: 2.04 2.05 2.00 3/161 1794

T: 0 (  945) P:99 I:1000 C:75969951 Min:      4 Act:   42 Avg:   13 Max:     372
T: 1 (  946) P:99 I:1500 C:50646620 Min:      7 Act:   23 Avg:   25 Max:     252

Test 260520_2: replicate customer setup from May 12 with OoB OP-TEE, but use different cyclictest arguments

cyclictest -m -a 0-1 -t 2 -p99 -D6h -h600 -i200 -M -q
* change 1ms --> 200 usec test interval (-i200)
* avoid page faults from stack & memory swapped to disk (-m -M)
* leave cyclictest priority at 99 (not sure how this relates to CodeSys behavior)

This leads to MUCH lower results. Additional testing needed to determine which change or changes contributed.
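
As a sanity check on the new arguments, the number of samples each thread should record follows directly from the test duration and the interval. A minimal sketch in shell arithmetic (the function name `expected_samples` is made up for illustration, and a shell with 64-bit arithmetic is assumed):

```bash
# expected_samples HOURS INTERVAL_US
# Number of wakeups one cyclictest thread should record in HOURS
# when it wakes every INTERVAL_US microseconds.
# The intermediate product exceeds 32 bits, so 64-bit shell arithmetic is assumed.
expected_samples() {
    echo $(( $1 * 3600 * 1000000 / $2 ))
}

expected_samples 6 200     # -D6h -i200
expected_samples 6 1000    # -D6h -i1000
```

For `-D6h -i200` this gives 108000000, which matches the `# Total:` value reported for thread 0 in the histogram of this test. The same 6 hours at `-i1000` would give 21600000 samples, so the two runs do not collect the same amount of data.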

root@am62lxx-evm:~# uname -a
Linux am62lxx-evm 6.12.57-ti-rt-g31b07ab8dfbc #1 SMP PREEMPT_RT Thu Dec  4 13:07:37 UTC 2025 aarch64 GNU/Linux
root@am62lxx-evm:~# cat /proc/cmdline
console=ttyS0,115200n8 vt.global_cursor_default=0 rcu_nocb_poll rcu_nohz=1 rcu_nocbs=1 idle=poll nohz=on nohz_full=1 kthread_cpus=0 irqaffinity=0 isolcpus=managed_irq,domain,1 earlycon=ns16550a,mmio32,0x02800000 root=PARTUUID=076c4a2a-02 rw rootfstype=ext4 rootwait
root@am62lxx-evm:~# taskset -c 1 stress-ng --cpu 1 --cpu-load 70 --vm 1 --vm-bytes 80% &
[1] 985
root@am62lxx-evm:~# stress-ng: info:  [985] defaulting to a 1 day, 0 secs run per stressor
stress-ng: info:  [985] dispatching hogs: 1 cpu, 1 vm
stress-ng: info:  [987] cpu: for stable load results, select a specific cpu stress method with --cpu-method other than 'all'

root@am62lxx-evm:~# cyclictest -m -a 0-1 -t 2 -p99 -D6h -h600 -i200 -M -q
WARN: stat /dev/cpu_dma_latency failed: No such file or directory
# Histogram
000000 000000   000000
000001 000000   000000
000002 000000   000000
000003 000000   000000
000004 000084   000000
000005 63517791 000000
000006 23151139 000149
000007 9505702  000107
000008 4289494  000096
000009 2509577  1496370
000010 1595242  24763741
000011 929138   32345454
000012 531403   23753747
000013 373906   13131295
000014 293321   5402244
000015 237471   2276379
000016 192376   1205394
000017 081270   761352
000018 000679   543054
000019 000040   418520
000020 000027   338381
000021 013613   283021
000022 058118   238270
000023 085291   199003
000024 097215   163140
000025 101050   135156
000026 097044   112877
000027 084739   094526
000028 067873   077422
000029 050777   062743
000030 035735   048960
000031 024383   037456
000032 016773   028125
000033 011804   020976
000034 008708   015963
000035 006392   011913
000036 005001   009026
000037 004123   006706
000038 003454   004873
000039 002870   003411
000040 002557   002530
000041 002241   001877
000042 001884   001409
000043 001550   001037
000044 001373   000724
000045 001123   000466
000046 000908   000308
000047 000733   000219
000048 000568   000180
000049 000422   000152
000050 000373   000105
000051 000277   000084
000052 000237   000107
000053 000206   000072
000054 000158   000088
000055 000163   000060
000056 000142   000053
000057 000133   000054
000058 000119   000052
000059 000127   000051
000060 000105   000044
000061 000096   000039
000062 000083   000031
000063 000070   000031
000064 000055   000024
000065 000073   000023
000066 000063   000035
000067 000045   000023
000068 000048   000021
000069 000035   000016
000070 000027   000014
000071 000028   000013
000072 000032   000011
000073 000032   000007
000074 000022   000007
000075 000022   000008
000076 000025   000006
000077 000017   000011
000078 000015   000006
000079 000012   000004
000080 000013   000009
000081 000011   000006
000082 000008   000005
000083 000009   000008
000084 000008   000004
000085 000013   000008
000086 000000   000001
000087 000007   000003
000088 000005   000002
000089 000003   000002
000090 000004   000003
000091 000003   000005
000092 000004   000004
000093 000003   000002
000094 000002   000001
000095 000003   000003
000096 000005   000002
000097 000003   000001
000098 000006   000002
000099 000003   000002
000100 000005   000000
000101 000007   000000
000102 000005   000002
000103 000007   000001
000104 000001   000002
000105 000003   000002
000106 000003   000000
000107 000003   000000
000108 000004   000001
000109 000002   000001
000110 000001   000002
000111 000004   000000
000112 000004   000000
000113 000001   000001
000114 000000   000000
000115 000002   000000
000116 000001   000000
000117 000001   000000
000118 000000   000000
000119 000001   000000
000120 000000   000000
000121 000001   000000
000122 000001   000000
000123 000002   000000
000124 000001   000000
000125 000002   000000
000126 000000   000000
000127 000003   000000
000128 000000   000000
...
000599 000000   000000
# Total: 108000000 107999937
# Min Latencies: 00004 00006
# Avg Latencies: 00006 00011
# Max Latencies: 00127 00113
# Histogram Overflows: 00000 00000
# Histogram Overflow at cycle number:
# Thread 0:
# Thread 1:
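
A histogram this long is easier to compare across runs if the tail is reduced to a single number per thread. A sketch using awk, assuming the `cyclictest -h ... -q` output was saved to a file; the function name `hist_over` and the log file name in the comment are illustrative and not part of cyclictest:

```bash
# hist_over THRESHOLD_US
# Reads cyclictest histogram output on stdin and prints, per thread, how many
# samples had a latency of THRESHOLD_US or more.
# Usage (hypothetical file name):  hist_over 100 < cyclictest_260520_2.log
hist_over() {
    awk -v t="$1" '
        $1 !~ /^[0-9]+$/ { next }                 # skip "#" lines, prompts, blanks, "..."
        { if (NF - 1 > cols) cols = NF - 1 }      # one column per thread
        ($1 + 0) >= t { for (i = 2; i <= NF; i++) n[i - 1] += $i }
        END { for (i = 1; i <= cols; i++) printf "thread %d: %d\n", i - 1, n[i] + 0 }
    '
}
```

Counting samples above a fixed threshold is less sensitive to a single outlier than the `Max` value alone, which may help when comparing 6-hour runs against each other.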

Next steps:

I am working on modifying the DDR QoS / CoS settings so that we can ensure DDR accesses are happening as efficiently as possible, and generating a minimal filesystem to check the impact of removing unneeded software (I have been told that removing unneeded software is a BIG part of the software optimizations you asked about). But it will be another day or so before those tests are ready.

In the meantime, tonight I will
1) replicate test 260520_2, but with OP-TEE TRNG disabled
2) replicate test 260520_2, but with 1ms test interval (cyclictest -m -a 0-1 -t 2 -p99 -D6h -h600 -i1000 -M -q)

Regards,

Nick

0 xi he 1 month ago in reply to Nick Saulnier

Prodigy 20 points

Hi Nick,

Thanks a lot for helping analyze this performance issue.

First, regarding the isolcpus=managed_irq parameter, there are no extra dedicated business configurations at the application layer.

Second, for the resource occupancy evaluation of Codesys, we mainly adopt the pressure simulation mode for Core 1 we discussed before. I have observed that the CPU load on Core 0 fluctuates dynamically from 0% to 50% in actual service scenarios. Such load changes normally will not cause jitter on Core 1. My only concern is that unexpected sharp load surges on Core 0 may impact the running performance of Core 1.

In short, I believe there is no need to change the current test environment configuration.

Regards,

0 Nick Saulnier 1 month ago in reply to xi he

TI__Guru** 109890 points

Hello Xi,

Thanks for the confirmation. I will continue using your initial stressors from May 12 in my tests for now. Let me know if you find a better way to model the system in the future.

Brief update: tests from today

My goal for the previous 24 hours was to dig further into how the cyclictest parameters impact the test results. I collected histograms from 3 more tests, and I started graphing them to make it easier to compare behavior. I will share that information when it is ready - maybe tomorrow, but my higher priority goal is to run tests with different DDR options. So it might be a few days before you see those graphs.

Those tests were:
260521_1 replicate 260520_2 with -i1000 (cyclictest -m -a 0-1 -t 2 -p99 -D6h -h600 -i1000 -M -q)
260521_2 replicate 260520_2 (OP-TEE TRNG disabled)
260521_3 replicate 260520_2 with -i1000 (OP-TEE TRNG disabled)
260522_1 replicate 260520_2 without -M (OoB OP-TEE)

Next steps

I added cyclictest and stress-ng and their dependencies to a base filesystem image. I will report initial tests tomorrow. (this is a barebones image. SDK documentation is here: https://software-dl.ti.com/processor-sdk-linux-rt/esd/AM62LX/11_02_08_02/exports/docs/linux/Foundational_Components_Filesystem.html ).

My hypothesis is that removing most of the code from the filesystem will improve latency. We will see how much of a difference it makes.

I am still figuring out the DDR settings. I hope to start running DDR tests by Friday night, but it might take until next week. Please note that Monday is a holiday in the U.S.

Regards,

Nick

0 Nick Saulnier 1 month ago in reply to Nick Saulnier

TI__Guru** 109890 points

quick update since I am on vacation today:

adjusting the DDR settings made a huge improvement in latency numbers. By tomorrow I’ll have histograms from 20+ different test runs.

I’ll post an analysis of the results before Wednesday your time. I am looking at:
* DDR setting impact
* OP-TEE setting impact
* filesystem impact (i.e., impact of removing code)
* core isolation impact
* impact of different cyclictest arguments

0 Nick Saulnier 1 month ago in reply to Nick Saulnier

TI__Guru** 109890 points

Hello Xi,

Status update

26 tests have been run, all 6 hours or more. 24 histograms generated. I will continue running a few 6-hour tests every day for the next few days.

Summary of findings:

1) DDR QoS has the biggest impact on cyclictest results for your test setup. I will provide settings for you to run experiments

2) A smaller filesystem does NOT result in a clear reduction of latency

3) the cyclictest arguments have a big impact on the reported results (specifically, -i1000 vs -i200). Since there is not a 1 to 1 mapping between cyclictest results and CodeSys performance, I would prioritize actual CodeSys performance over specific cyclictest output numbers

More tests needed to comment on OP-TEE impact and core isolation impact.

How to apply the DDR QoS configuration to test with cyclictest & CodeSys?

type this into your terminal before running tests:

# first, set CBASS QoS priority
# A53 READ and WRITE ports default to the same EPRIORITY = 7
# 7 is the lowest priority
# set A53 READ port EPRIORITY from 7 (default) to 6 (higher priority)
root@am62lxx-evm:~# devmem2 0x45D20500 w 0x00006000

# verify write
root@am62lxx-evm:~# devmem2 0x45D20500

# next, set DDR priority
# DDR priority defaults to all entries = AXI priority 0 (highest priority)
# DDRSS DEF_PRI_MAP — map VBUSM priority 7 → DDR AXI priority 1
# all other PRIMAP entries stay 0 = DDR AXI priority 0 (higher priority)
# thus A53 READ has higher priority than WRITE
root@am62lxx-evm:~# devmem2 0x0F300030 w 0x00000001

# verify write
root@am62lxx-evm:~# devmem2 0x0F300030
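
If these writes need to be repeated before every test run, they could be wrapped in a small script. The sketch below is only an illustration: `DRY_RUN`, `run`, and `apply_ddr_qos` are hypothetical names, and only the four `devmem2` invocations come from the commands above (AM62L addresses). With `DRY_RUN=1`, the default here, nothing touches the hardware and the commands are only printed.

```bash
# Hypothetical wrapper around the devmem2 commands above (AM62L addresses).
# DRY_RUN=1 (default) prints the commands; run with DRY_RUN=0 on the target.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_ddr_qos() {
    run devmem2 0x45D20500 w 0x00006000   # A53 read port EPRIORITY 7 -> 6
    run devmem2 0x45D20500                # read back
    run devmem2 0x0F300030 w 0x00000001   # DEF_PRI_MAP: VBUSM 7 -> DDR AXI 1
    run devmem2 0x0F300030                # read back
}

apply_ddr_qos
```

Since the registers return to their defaults at reset, a script like this would have to run after each boot and before the stressors and cyclictest are started.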

Show me the raw data please

Sure.

test_table.csv has the list of all the tests that have been run that resulted in a histogram, and all the configurations that were used for each test.

260525_test_table.csv

cat1/2/3 files compare the histograms for tests that can be used to learn more about the impact of cyclictest parameters, DDR configuration, and the filesystem.

cat1_cyclictest_parameters.html

cat2_ddr_configuration.html

cat3_filesystem.html

latency_report has all histograms.

latency_report.html

Regards,

Nick

0 Nick Saulnier 1 month ago in reply to Nick Saulnier

TI__Guru** 109890 points

Hello Xi,

Just wanted to let you know that I am still running tests. I will put together reports on whether I observe any improvements with OP-TEE and with core isolation at the end of the day on Friday.

Regards,

Nick

0 Nick Saulnier 1 month ago in reply to Nick Saulnier

TI__Guru** 109890 points

Hello Xi,

I got some interesting findings for how to get the best cyclictest results for your test case:

1) use DDR QoS settings - already discussed

2) Do NOT isolate cores - performance was significantly worse in all tests

3) Do NOT use OPTEE TRNG (I assume not a factor if you are not actually loading OPTEE)

4) The data around whether the filesystem matters is inconclusive. If it matters, the effect is much smaller than the previous 3 points.

I was surprised that core isolation caused such a drop in performance. I am not sure how this would affect Codesys, but I would suggest testing with core isolation off to see if the performance improves.

Full reports

report_core_isolation.html

report_ddr_qos.html

report_optee.html

report_filesystem.html

Any explanation of the DDR QoS settings?

Here is my current draft. Let me know if there are followup questions.

ddr_qos_a53_read_priority.md

# Prioritizing A53 Memory Reads Over Writes on AM\* Sitara™ Processors

---

## Section 1 — Background and Motivation

Real-time Linux workloads on AM\* processors can exhibit elevated cyclictest latency under heavy DDR
load, even when the A53 core is isolated (`isolcpus`, `nohz_full`, etc.). A representative
reproduction command:

```bash
stress-ng --vm-method=zero-one --memrate 2 &
cyclictest -m -p 99 -i 200 -l 100000 -a <isolated_cpu>
```

The symptom is a large increase in worst-case latency (maximum, not average) that appears only when
DDR bandwidth is under pressure. On AM64x, field testing showed worst-case cyclictest latency
drop from **800+ µs to ~170 µs** after applying the register writes described in this document.

**Root cause mechanism:**

1. An RT task preempts a lower-priority task and must run on the isolated A53 core.
2. The RT task's working set is not fully in L1/L2 cache (cold cache, or evicted by the previous
   task). This causes cache-fill read misses.
3. Cache-fill reads must complete before the A53 core can execute the first instruction of the RT
   task. The core stalls until the data arrives from SDRAM.
4. Meanwhile, background write traffic from the same or other masters is in flight. At reset,
   DDR QoS is not configured, so reads and writes compete at equal priority inside the DDR
   controller's command queue.
5. A53 cache-fill reads are delayed behind write traffic. The RT task does not actually begin
   executing until those reads complete — this delay is the observed cyclictest latency spike.

**Solution:** Configure the CBASS (interconnect) and the DDR subsystem (DDRSS) to give A53 read
transactions higher priority than write transactions. Both hardware blocks have independent
priority arbitrators, and configuring both yields the maximum latency reduction.

**Devices covered:** AM62L, AM62x, AM64x, AM62Ax, AM62Px.

---

## Section 2 — Hardware Architecture

### 2.1 Signal Path from A53 to DDR

Two hardware building blocks sit between the A53 cluster and SDRAM:

1. **CBASS (Crossbar Switch)** — Routes transactions between all initiators (A53, R5F, DMA, GPU,
   etc.) and targets (DDR, MSMC, peripherals). Performs priority arbitration. Priority is encoded
   as a 3-bit VBUSM field where **0 = highest priority, 7 = lowest**. Each initiator port has a
   QoS block that injects a configurable EPRIORITY value into outgoing transactions.

2. **DDR Subsystem (DDRSS)** — Contains a VBUSM2AXI bridge (called MSMC2DDR on AM64x) that
   translates the VBUSM priority carried on incoming transactions to an AXI priority used by the
   DDR controller's command queue. A second stage of priority arbitration occurs here, operating
   on all transactions already buffered inside the DDR subsystem.

```
A53 Cluster (128-bit VBUSM per port)
  │
  ├─ AXI Read port ──→ [CBASS QoS Block]  ─┐
  └─ AXI Write port ─→ [CBASS QoS Block]  ─┤
                                            │  VBUSM bus (carries priority + Route ID)
                       Other initiators ───→┤
                                            ↓
                               ┌─────────────────────────────────┐
                               │  DDR Subsystem (DDRSS)          │
                               │  ┌──────────────────────────┐   │
                               │  │  VBUSM2AXI Bridge        │   │
                               │  │  • Route ID comparators  │   │
                               │  │  • VBUSM→AXI priority    │   │
                               │  │    mapping               │   │
                               │  └──────────┬───────────────┘   │
                               │             ↓                    │
                               │  ┌──────────────────────────┐   │
                               │  │  DDR Controller          │   │
                               │  │  (AXI priority queue)    │   │
                               │  └──────────┬───────────────┘   │
                               └─────────────│────────────────────┘
                                             ↓
                                        SDRAM (LPDDR4/DDR4)
```

**DDR32SS devices (AM62Ax, AM62Px)** use a bridge with two physical VBUSM input ports:

```
                               │  ┌──────────────────────────┐   │
                               │  │  VBUSM2AXI Bridge        │   │
                               │  │                          │   │
                               │  │  HPT port ← orderID 8-15 │   │  (High Priority Thread)
                               │  │  LPT port ← orderID 0-7  │   │  (Low Priority Thread)
                               │  │                          │   │
                               │  │  HPT always preempts LPT │   │
                               │  └──────────────────────────┘   │
```

HPT transactions always enter the DDR controller's command queue ahead of LPT transactions —
this is a hardware-enforced structural priority, not a software-controlled mapping.

### 2.2 Key Concept: Two-Stage Priority Control

There are two independent priority control points:

| Stage | Location | What it controls |
|-------|----------|-----------------|
| **Stage 1 — CBASS** | CBASS QoS MAP0 register, EPRIORITY field | Which transactions win arbitration through the interconnect fabric, before they reach the DDR subsystem |
| **Stage 2 — DDRSS** | VBUSM2AXI bridge DEF_PRI_MAP register | Which transactions win arbitration inside the DDR controller's command queue |

For maximum latency reduction, configure both stages. Stage 2 alone still provides meaningful
improvement because the DDR controller holds many in-flight transactions simultaneously and its
internal arbitration determines which complete first.

### 2.3 Route ID

Every VBUSM transaction carries a 12-bit **Route ID** identifying the originating initiator
interface. The DDRSS VBUSM2AXI bridge can inspect this Route ID for its range-match CoS
mechanism (see Section 3, Approach C).

Route IDs are assigned per initiator port in the CBASS connectivity table. Confirmed values
across AM62x, AM62Ax, AM62Px, and AM64x:

| Initiator                  | Route ID range |
|----------------------------|---------------|
| A53 Write port (CBA_AXI_W) | **0–7**       |
| A53 Read port (CBA_AXI_R)  | **16–23**     |
| Other initiators (R5F, GPU, DMA, etc.) | 64+ (device-specific) |

> **[UNCERTAIN for AM62L]** Route IDs for AM62L (write 0–7, read 16–23) have not been directly
> verified from the AM62L TRM Route ID table. The pattern is consistent across all other devices
> and the same CBASS IP is used.
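
The ranges in the table can be expressed as a small lookup, which may be handy when reading bus traces or planning range-match settings. This is a sketch only: the function name `classify_route_id` is invented here, and the AM62L values carry the same uncertainty as the note above.

```bash
# classify_route_id ID
# Maps a Route ID to the initiator class from the table above.
# IDs outside the listed ranges are reported as "unlisted" rather than guessed.
classify_route_id() {
    if [ "$1" -ge 0 ] && [ "$1" -le 7 ]; then
        echo "A53 write"
    elif [ "$1" -ge 16 ] && [ "$1" -le 23 ]; then
        echo "A53 read"
    elif [ "$1" -ge 64 ]; then
        echo "other initiator"
    else
        echo "unlisted"
    fi
}

classify_route_id 3
classify_route_id 18
```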

### 2.4 Device Comparison Table

| Device | A53 Cores    | DDR SS   | DDR Ports     | Bridge        | CBASS QoS base |
|--------|-------------|---------|---------------|---------------|----------------|
| AM62L  | 2 (dual)    | DDR16SS | 1 (LPT only)  | VBUSM2AXI     | 0x45D20000     |
| AM62x  | 4 (quad)    | DDR16SS | 1 (LPT only)  | VBUSM2AXI     | 0x45D20000     |
| AM64x  | 2 (dual)    | DDR16SS | 1 (LPT only)  | MSMC2DDR      | **0x45D80000** |
| AM62Ax | 4 (quad)    | DDR32SS | 2 (HPT + LPT) | VBUSM2AXI     | 0x45D20000     |
| AM62Px | 4 (quad)    | DDR32SS | 2 (HPT + LPT) | VBUSM2AXI     | 0x45D20000     |

**AM64x** is the exception: its A53 QoS block resides in a separate CBASS region at **0x45D80000**,
not 0x45D20000. AM64x also has four R5F cores (two R5FSS subsystems). AM62Ax has a C7x DSP and
MCU R5F. AM62Px has an MCU R5F.
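
Because the base address is the only per-device difference for Stage 1, the MAP0 register address can be derived from the device name. A minimal sketch, assuming the read and write port offsets 0x500 and 0x900 that appear in the commands of this document; `map0_addr` is an illustrative name:

```bash
# map0_addr DEVICE PORT
# Prints the address of the A53 CBASS QoS MAP0 register.
# DEVICE: AM62L | AM62x | AM64x | AM62Ax | AM62Px    PORT: read | write
map0_addr() {
    case "$1" in
        AM64x) base=0x45D80000 ;;                       # separate CBASS region
        AM62L|AM62x|AM62Ax|AM62Px) base=0x45D20000 ;;
        *) echo "unknown device: $1" >&2; return 1 ;;
    esac
    case "$2" in
        read)  off=0x500 ;;
        write) off=0x900 ;;
        *) echo "unknown port: $2" >&2; return 1 ;;
    esac
    printf '0x%08X\n' $(( $base + $off ))
}

map0_addr AM62L read
map0_addr AM64x write
```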

### 2.5 How Priority Settings Work

This section traces a single transaction end-to-end under two configurations — reset defaults
and after applying Approach A1 — to make the interaction between Stage 1 and Stage 2 concrete.

**Default state (at reset):**

- All CBASS QoS MAP0 registers reset to **0x7000**: every initiator port injects EPRIORITY = 7
  (numerically lowest priority) into outgoing transactions. A53 reads, A53 writes, DMA, GPU, and
  all other masters leave CBASS carrying the same VBUSM priority 7.
- DDRSS DEF_PRI_MAP resets to **0x00000000**: all VBUSM priorities (0–7) map to DDR AXI
  priority 0 (numerically highest). The DDR controller sees every transaction at equal AXI
  priority 0 and services them in arrival order.
- Net effect: every master competes identically at both the CBASS arbitration stage and the DDR
  controller command queue. A53 cache-fill reads wait behind write traffic with no mechanism to
  advance.

**After Approach A1:**

- A53 read port MAP0 is written to **0x6000**: A53 reads now leave CBASS carrying VBUSM
  priority 6. All other traffic (A53 writes, DMA, etc.) remains at VBUSM priority 7.
- DDRSS DEF_PRI_MAP is written to **0x00000001**: VBUSM priority 7 now maps to DDR AXI
  priority 1 (one step lower). VBUSM priority 6 retains its mapping to DDR AXI priority 0.
- Net effect: A53 reads carry VBUSM priority 6 and are mapped to DDR AXI priority 0 (highest).
  A53 writes and all other traffic carry VBUSM priority 7 and are mapped to DDR AXI priority 1.
  At both the CBASS arbitration point and the DDR controller command queue, A53 reads win.

| Stage | Default | After Approach A1 |
|-------|---------|-------------------|
| A53 read VBUSM priority out of CBASS | 7 (equal to writes) | 6 (beats writes at 7) |
| A53 read DDR controller AXI priority | 0 (equal to all traffic) | 0 (writes degraded to AXI 1) |

**Why both stages are necessary:** At reset, DEF_PRI_MAP maps all VBUSM priorities to DDR AXI
priority 0 regardless of their VBUSM value. If only Stage 1 (CBASS EPRIORITY) is configured
without updating DEF_PRI_MAP, the DDR controller receives transactions at VBUSM 6 and VBUSM 7
but maps both to AXI priority 0 — the distinction is lost, and reads and writes still compete
equally inside the DDR controller. Stage 2 (DDRSS DEF_PRI_MAP) must be written to translate
the distinct VBUSM priorities into distinct DDR AXI priorities for the differentiation to have
effect inside the DDR controller.
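
The trace in this section can be reproduced with shell arithmetic. The sketch below assumes that EPRIORITY is the 3-bit field at MAP0 bits [14:12] (consistent with the `6 << 12 = 0x6000` encoding used in this document) and that each PRIMAPn field occupies bits [(7-n)*4+2 : (7-n)*4] of DEF_PRI_MAP. That layout is extrapolated from PRIMAP7[2:0] and PRIMAP6[6:4] and should be checked against the TRM; all function names are illustrative.

```bash
# eprio MAP0_VALUE: extract the EPRIORITY field (assumed bits [14:12]).
eprio() { echo $(( ($1 >> 12) & 0x7 )); }

# axi_prio DEF_PRI_MAP VBUSM_PRIO: look up the DDR AXI priority for one
# VBUSM priority (assumed PRIMAPn layout, see lead-in).
axi_prio() { echo $(( ($1 >> ((7 - $2) * 4)) & 0x7 )); }

# trace LABEL READ_MAP0 WRITE_MAP0 DEF_PRI_MAP
trace() {
    r=$(eprio "$2")
    w=$(eprio "$3")
    echo "$1: read VBUSM $r -> AXI $(axi_prio "$4" "$r"), write VBUSM $w -> AXI $(axi_prio "$4" "$w")"
}

trace "default" 0x7000 0x7000 0x00000000
trace "A1"      0x6000 0x7000 0x00000001
```

Calling `trace` with read MAP0 = 0x6000 but DEF_PRI_MAP left at 0x00000000 shows the Stage-1-only case: the VBUSM priorities differ, yet both map to AXI priority 0.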

---

## Section 3 — Configuration Approaches

### Overview: Which approaches are available per device

| Approach | Works on | What it does |
|----------|----------|--------------|
| **A: EPRIORITY + DEF_PRI_MAP** | All devices | Prioritizes at both CBASS and DDR levels |
| **B: HPT/LPT orderID routing** | AM62Ax, AM62Px only | Routes reads to structurally-preferred HPT port |
| **C: DDRSS Route ID range match** | All devices | DDR-level differentiation by initiator Route ID |

Approaches can be combined. Approach A is the well-tested baseline. Approach B is the most
direct method on DDR32SS devices. Approach C provides the finest per-initiator control.

---

### Approach A: EPRIORITY + DEF_PRI_MAP (all devices)

Register addresses for this approach: CBASS QoS MAP0 (see Appendix A.1), DDRSS DEF_PRI_MAP
(see Appendix A.2).

**How it works:**

1. Write a lower EPRIORITY value (numerically smaller = higher priority) into the A53 read port
   MAP0 register. This raises the read port's VBUSM priority above the write port's value.
2. At reset, all traffic enters the DDR controller with the same AXI priority (DEF_PRI_MAP = 0).
   After lowering the EPRIORITY value on reads (that is, raising their priority), reads arrive with a distinct VBUSM priority.
3. Configure DEF_PRI_MAP to map the now-distinct VBUSM priorities to distinct DDR AXI priorities.
4. The effect operates at both the interconnect (CBASS arbitration) and the DDR controller
   (command queue arbitration).

#### Variant A1: Read priority only (minimal, recommended starting point)

Result: A53 reads → VBUSM priority 6 → DDR AXI priority 0 (highest). A53 writes and all other
masters → VBUSM priority 7 → DDR AXI priority 1.

```bash
# U-Boot commands (mw.l = memory write, md.l = memory display); run at the U-Boot prompt.
# ── Stage 1: CBASS QoS — raise A53 read EPRIORITY from 7 to 6 ──────────────────
# EPRIORITY = 6 → 6 << 12 = 0x6000
# AM62L / AM62x / AM62Ax / AM62Px
mw.l 0x45D20500 0x00006000   # A53 read port EPRIORITY=6
md.l 0x45D20500 1            # read-back verification

# AM64x (different CBASS region)
mw.l 0x45D80500 0x00006000   # A53 read port EPRIORITY=6
md.l 0x45D80500 1

# ── Stage 2: DDRSS DEF_PRI_MAP — map VBUSM priority 7 → DDR AXI priority 1 ──────
# PRIMAP7[2:0] = 1 → 0x00000001  (all other PRIMAP entries stay 0 = DDR AXI priority 0)
mw.l 0x0F300030 0x00000001   # same address on all devices
```

> **NOTE for AM62Ax / AM62Px:** 0x0F300030 is LPT_DEF_PRI_MAP. This setting applies only to
> traffic on the LPT port. If A53 reads are routed to HPT via Approach B, also configure
> HPT_DEF_PRI_MAP at 0x0F30004C with the same value.

#### Variant A2: Both read and write boosted above other masters

Result: A53 reads → VBUSM 5 → DDR AXI 0. A53 writes → VBUSM 6 → DDR AXI 1. All other
masters → VBUSM 7 → DDR AXI 2.

```bash
# ── Stage 1: CBASS QoS ────────────────────────────────────────────────────────────
# AM62L / AM62x / AM62Ax / AM62Px
mw.l 0x45D20500 0x00005000   # A53 read EPRIORITY=5 (5<<12 = 0x5000)
md.l 0x45D20500 1
mw.l 0x45D20900 0x00006000   # A53 write EPRIORITY=6 (6<<12 = 0x6000)
md.l 0x45D20900 1

# AM64x
mw.l 0x45D80500 0x00005000   # A53 read EPRIORITY=5
md.l 0x45D80500 1
mw.l 0x45D80900 0x00006000   # A53 write EPRIORITY=6
md.l 0x45D80900 1

# ── Stage 2: DDRSS DEF_PRI_MAP ───────────────────────────────────────────────────
# PRIMAP6[6:4] = 1 (VBUSM 6 → DDR AXI 1), PRIMAP7[2:0] = 2 (VBUSM 7 → DDR AXI 2)
# = (1<<4) | (2<<0) = 0x10 | 0x02 = 0x00000012
mw.l 0x0F300030 0x00000012   # all devices (LPT_DEF_PRI_MAP on DDR32SS)
```

> **NOTE on 0x00000102:** Some historical AM64x references cite this value for DEF_PRI_MAP in
> the A2 variant. This value encodes PRIMAP5=1 (VBUSM 5 → AXI 1) and PRIMAP7=2 (VBUSM 7 →
> AXI 2), leaving PRIMAP6=0 (VBUSM 6 → AXI 0). That would give A53 writes (VBUSM 6) higher DDR
> AXI priority than A53 reads (VBUSM 5) — the opposite of the intended behavior. The correct
> value for the stated intent is **0x00000012**.
>
> Field testing showed no measurable latency difference between A1 and A2. **Variant A1 is
> recommended** as the simpler starting point.
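
The difference between the two values is easier to see when DEF_PRI_MAP is built from a list of per-priority mappings. A sketch, assuming each PRIMAPn field sits at bits [(7-n)*4+2 : (7-n)*4]; this generalizes the PRIMAP7[2:0] and PRIMAP6[6:4] positions given above and has not been checked against the TRM. The function name `def_pri_map` is illustrative.

```bash
# def_pri_map P0 P1 P2 P3 P4 P5 P6 P7
# Arguments are the DDR AXI priorities for VBUSM priorities 0..7, in order.
# Prints the resulting DEF_PRI_MAP value (assumed field layout, see lead-in).
def_pri_map() {
    v=0
    n=0
    for p in "$@"; do
        v=$(( v | ((p & 0x7) << ((7 - n) * 4)) ))
        n=$(( n + 1 ))
    done
    printf '0x%08X\n' "$v"
}

def_pri_map 0 0 0 0 0 0 1 2   # VBUSM 6 -> AXI 1, VBUSM 7 -> AXI 2
def_pri_map 0 0 0 0 0 1 0 2   # VBUSM 5 -> AXI 1, VBUSM 7 -> AXI 2
```

The first call prints 0x00000012 and the second prints 0x00000102, so the historical value corresponds to demoting VBUSM 5 (the A53 reads in Variant A2) rather than VBUSM 6.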

---

### Approach B: HPT/LPT orderID Routing (AM62Ax, AM62Px only)

Register addresses for this approach: CBASS QoS MAP0 ORDERID field (see Appendix A.1). DDRSS
range registers not required for B1; see Appendix A.2 for HPT_DEF_PRI_MAP used in B2.

**How it works:**

The DDR32SS VBUSM2AXI bridge has two physical VBUSM input ports: HPT (High Priority Thread) and
LPT (Low Priority Thread). HPT always has structural priority over LPT — HPT commands enter
the DDR controller's command queue ahead of LPT commands at every arbitration cycle. This is
enforced in hardware and does not require any DEF_PRI_MAP configuration.

The CBASS routes a transaction to HPT if its ORDERID field ≥ 8, to LPT if ORDERID ≤ 7. ORDERID
is set per-initiator in the CBASS QoS MAP0 register, bits [7:4].

#### Approach B1: Route A53 reads to HPT, leave writes on LPT

```bash
# ── CBASS QoS: set ORDERID=8 on A53 read port ────────────────────────────────────
# MAP0: EPRIORITY stays at 7 (default = 0x7000), ORDERID = 8 → 8<<4 = 0x0080
# Full register value: 0x7000 | 0x0080 = 0x00007080
mw.l 0x45D20500 0x00007080   # AM62Ax / AM62Px: A53 read → HPT
md.l 0x45D20500 1
# A53 write port stays at default 0x7000 (ORDERID=0 → LPT). No write needed.
```

No DDRSS register changes are required. The HPT structural priority handles the differentiation.

#### Approach B2: Combine HPT routing with DEF_PRI_MAP for finer control

```bash
# A53 reads to HPT (ORDERID=8) and also elevated EPRIORITY=6
# EPRIORITY=6 → 0x6000, ORDERID=8 → 0x0080 → combined: 0x00006080
mw.l 0x45D20500 0x00006080   # AM62Ax / AM62Px: read → HPT, EPRIORITY=6
md.l 0x45D20500 1

# HPT_DEF_PRI_MAP: reads arrive at HPT with VBUSM priority 6
# PRIMAP6 stays 0 → A53 reads (VBUSM 6) → DDR AXI 0
# PRIMAP7[2:0] = 1 → other HPT traffic at VBUSM 7 → DDR AXI 1
mw.l 0x0F30004C 0x00000001   # HPT_DEF_PRI_MAP: PRIMAP7=1

# LPT_DEF_PRI_MAP: writes arrive at LPT with VBUSM priority 7 → DDR AXI 1
mw.l 0x0F300030 0x00000001   # LPT_DEF_PRI_MAP: PRIMAP7=1
```

> **NOTE (AM62Ax):** UDMA write channels are mapped to HPT by hardware for guaranteed QoS (per
> TRM section 4.6). When using HPT routing for A53 reads, UDMA writes also compete on the HPT
> port. If UDMA write bandwidth is a concern, use EPRIORITY or range-match registers
> (LPT_R\*/HPT_R\*) to further differentiate within the HPT port.

---

### Approach C: DDRSS Route ID Range Matching (all devices)

Register addresses for this approach: DDRSS range match MAT registers and range priority map
registers (see Appendix A.3), DEF_PRI_MAP (see Appendix A.2).

**How it works:**

The VBUSM2AXI bridge inspects the Route ID on every incoming transaction. If the Route ID
matches one of the three range match registers (R1, R2, R3 MAT), the corresponding range
priority map register (R1, R2, R3 PRI_MAP) overrides DEF_PRI_MAP for that transaction. This
allows different VBUSM→AXI priority mappings per initiator, entirely within the DDRSS and
without any CBASS EPRIORITY change.

**Match logic:** `(incoming_routeid >> MASK) == (ROUTEID_field >> MASK)`, where MASK specifies
how many LSBs to ignore. MASK=3 matches any Route ID in the same octet (e.g., 16–23 all match
with ROUTEID=16 and MASK=3).
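
The match rule can be tried out on a host shell before touching hardware. This is a small sketch of the comparison as described above, not a model of the bridge itself; the function name `route_matches` is hypothetical.

```bash
# route_matches INCOMING ROUTEID MASK: succeed (exit status 0) when the two
# Route IDs are equal after dropping the MASK least-significant bits.
route_matches() {
    [ $(( $1 >> $3 )) -eq $(( $2 >> $3 )) ]
}

# Probe the edges of the 16-23 range with ROUTEID=16, MASK=3.
for id in 15 16 23 24; do
    if route_matches "$id" 16 3; then
        echo "Route ID $id matches ROUTEID=16, MASK=3"
    else
        echo "Route ID $id does not match"
    fi
done
```

Route IDs 16 and 23 match; 15 and 24 fall just outside the range and do not.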

**Route ID values confirmed for AM62x, AM62Ax, AM62Px, AM64x:**

| Initiator                   | Route ID range | ROUTEID_x field | MASK_x |
|-----------------------------|---------------|-----------------|--------|
| A53 Write (CBA_AXI_W)       | 0–7           | 0x000           | 3      |
| A53 Read (CBA_AXI_R)        | 16–23         | 0x010 (= 16)    | 3      |
| C7x DSP (AM62Ax)            | 32–39         | 0x020 (= 32)    | 3      |
| GPU Read/Write (AM62x/Px)   | 64–65         | see device TRM  | 0      |
| R5FSS (AM64x)               | 66–74         | see device TRM  | varies |
| PRU_ICSSG (AM64x)           | 384–447       | see device TRM  | 6      |
| MMCSD, GIC, USB, etc.       | 256+          | see device TRM  | varies |

**Encoding example for A53 reads (Route IDs 16–23):**

```
R1_MAT = (RANGEEN_A=1 << 31) | (MASK_A=3 << 28) | (ROUTEID_A=16 << 16)
       = 0x80000000 | 0x30000000 | 0x00100000
       = 0xB0100000
```

#### Approach C1: A53 reads → highest DDR priority (Route ID match, no CBASS change)

Effect: A53 reads → DDR AXI priority 0 (via range match, reset default). All other traffic →
DDR AXI priority 1 (via DEF_PRI_MAP). CBASS EPRIORITY is not changed.

```bash
# R1_MAT: enable A, MASK_A=3 (ignore lower 3 bits), ROUTEID_A=16 → matches Route IDs 16-23
mw.l 0x0F300024 0xB0100000

# R1_PRI_MAP stays at reset (0x0) → A53 reads get DDR AXI priority 0 (all fields = 0)

# DEF_PRI_MAP: PRIMAP7[2:0]=1 → everyone else (VBUSM 7) → DDR AXI priority 1
mw.l 0x0F300030 0x00000001

# AM62Ax / AM62Px: the above sets LPT registers. HPT range map registers (0x0F300050)
# reset to 0 (AXI priority 0) and typically need no change.
```

#### Approach C2: Three-tier — A53 reads > A53 writes > everyone else

Uses two range registers. R3 matches A53 reads; R1 matches A53 writes. R3 > R1 in precedence.

```bash
# R3_MAT: match A53 reads (Route IDs 16-23)
mw.l 0x0F30002C 0xB0100000

# R3_PRI_MAP stays at reset (0x0) → A53 reads → DDR AXI priority 0

# R1_MAT: match A53 writes (Route IDs 0-7), ROUTEID_A=0, MASK_A=3
# (1<<31)|(3<<28)|(0<<16) = 0x80000000|0x30000000|0x00000000 = 0xB0000000
mw.l 0x0F300024 0xB0000000

# R1_PRI_MAP: PRIMAP7[2:0]=1 → A53 writes → DDR AXI priority 1
mw.l 0x0F300034 0x00000001

# DEF_PRI_MAP: PRIMAP7[2:0]=2 → everyone else → DDR AXI priority 2
mw.l 0x0F300030 0x00000002
```

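On AM62Ax / AM62Px, 0x0F300034 and 0x0F30003C are the LPT range priority maps (LPT_R1_PRI_MAP
and LPT_R3_PRI_MAP), so the writes above apply to range-matched LPT traffic. Traffic arriving on
the HPT port uses the separate HPT_R1_PRI_MAP (0x0F300050) and HPT_R3_PRI_MAP (0x0F300058)
registers listed in Appendix A.2.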

#### Approach C3: MCU R5F above A53 above others (AM64x, AM62Ax, AM62Px)

Use case: a system where an MCU R5F runs the hardest RT task and needs priority over A53.
AM64x R5FSS0 CPU0 read Route ID = 66, write = 67.

```bash
# R3_MAT: match R5F read (Route ID 66, exact match MASK=0)
# RANGEEN_A=1, MASK_A=0, ROUTEID_A=66 (0x042) → (1<<31)|(0<<28)|(0x042<<16) = 0x80420000
mw.l 0x0F30002C 0x80420000

# R3_PRI_MAP stays at reset → R5F reads → DDR AXI priority 0

# R2_MAT: match A53 reads (Route IDs 16-23)
mw.l 0x0F300028 0xB0100000

# R2_PRI_MAP: PRIMAP7[2:0]=1 → A53 reads → DDR AXI priority 1
mw.l 0x0F300038 0x00000001

# DEF_PRI_MAP: PRIMAP7[2:0]=2 → everyone else → DDR AXI priority 2
mw.l 0x0F300030 0x00000002
```

### Approach Comparison Summary

| Aspect | A: EPRIORITY + DEF_PRI_MAP | B: HPT/LPT orderID | C: Route ID range match |
|--------|---------------------------|---------------------|-------------------------|
| Devices | All | AM62Ax, AM62Px only | All |
| Affects CBASS arbitration | Yes | Indirectly (via orderID) | No |
| Affects DDR arbitration | Yes | Yes (structurally) | Yes |
| Requires Route ID knowledge | No | No | Yes |
| Per-initiator granularity | Low (all share same PRIMAP bucket) | Low | High (up to 6 initiators) |
| Field tested | Yes (AM64x, AM62Px) | Not specifically confirmed | Not confirmed |
| Complexity | Low | Low | Medium |

**Recommendation:** Start with **Approach A, Variant A1** on all devices. On AM62Ax/AM62Px,
**Approach B1** is the most direct method due to the hardware-enforced HPT priority. Use
Approach C when multiple RT masters need independent priority tiers (primarily AM64x with MCU
R5F and A53 both running RT workloads).

---

## Section 4 — When to Apply These Settings

### 4.1 Initialization Timing

Apply these register writes during early firmware initialization (BL1 / R5 SPL), before the
A53 cluster starts executing workloads. The TRM states:

> *"QoS block programming shall happen during device initialization time while there is no
> in-flight transaction for that initiator."*

For Linux-only setups, these can also be applied as one-time register writes in early kernel
init or via a pre-boot register write mechanism, since the registers persist until the next
power cycle.

### 4.2 U-Boot K3 QoS Framework Support

U-Boot provides a K3 QoS framework (`setup_qos()` iterating over a `qos_data[]` array in
`arch/arm/mach-k3/`) that writes CBASS QoS MAP registers from per-device data files at SPL
init time. The `K3_QOS_REG(base, i)` macro computes `base + 0x100 + i*4` to locate MAP
register offsets; `K3_QOS_VAL()` encodes the EPRIORITY, ORDERID, and ASEL fields into the
register value.
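
The address arithmetic can be reproduced in a shell to cross-check the MAP0 addresses used throughout this document. This sketch mirrors only the `base + 0x100 + i*4` formula; it is not the U-Boot macro itself, and the function name `qos_map_addr` is hypothetical. The block bases are the A53 values listed in Appendix A.1.

```bash
# qos_map_addr BASE I: address of MAP register I inside the QoS block at BASE.
qos_map_addr() {
    printf '0x%08X\n' $(( $1 + 0x100 + $2 * 4 ))
}

qos_map_addr 0x45D20400 0   # A53 read QoS block  -> 0x45D20500
qos_map_addr 0x45D20800 0   # A53 write QoS block -> 0x45D20900
```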

| Device | U-Boot QoS file | A53 priority entries | Notes |
|--------|-----------------|----------------------|-------|
| AM62Ax | `r5/am62ax/am62a_qos_uboot.c` | Not present | DSS display DMA only (orderID=8) |
| AM62Px | `r5/am62px/am62p_qos_uboot.c` | Not present | DSS display DMA ×2 (orderID=15) |
| AM62L  | None | N/A | Must write registers manually |
| AM62x  | None | N/A | Must write registers manually |
| AM64x  | None | N/A | Must write registers manually |

`CONFIG_K3_QOS` defaults to `y` only for `SOC_K3_AM62A7` in Kconfig. It must be explicitly
enabled for AM62Px (`CONFIG_K3_QOS=y` in defconfig). It is not set for AM62x, AM64x, or AM62L.

### 4.3 Gaps — What Is Missing from the Current U-Boot Framework

The following gaps affect all five devices and are documented here as reference for the
development team.

**Gap 1 — AM62Ax and AM62Px: A53 priority entries absent from qos_data[]**

The `am62a_qos.h` and `am62p_qos.h` headers define the A53 CBASS block addresses:

- `SAM62A_A53_512KB_WRAP_MAIN_0_A53_QUAD_WRAP_CBA_AXI_R = 0x45D20400`
- `SAM62A_A53_512KB_WRAP_MAIN_0_A53_QUAD_WRAP_CBA_AXI_W = 0x45D20800`

The MAP0 registers for these blocks are at base + 0x100, i.e.:
- A53 read MAP0 = 0x45D20400 + 0x100 + 0×4 = **0x45D20500**
- A53 write MAP0 = 0x45D20800 + 0x100 + 0×4 = **0x45D20900**

Despite these addresses being defined in the headers, the existing `qos_data[]` arrays in both
`am62a_qos_uboot.c` and `am62p_qos_uboot.c` only configure DSS display DMA. No entries exist
to set EPRIORITY on the A53 read or write ports.

**Gap 2 — AM62L, AM62x, AM64x: No _qos_uboot.c files exist at all**

`CONFIG_K3_QOS` defaults to `y` only for `SOC_K3_AM62A7`; it is not enabled for AM62x, AM64x,
or AM62L in Kconfig. No `qos.h` header files with initiator endpoint addresses exist for these
devices in the U-Boot tree (verified against both SDK 11.2 / ti-u-boot-2025.01 and SDK 12.0 /
ti-u-boot-2026.01 — both are identical in QoS content). Any QoS configuration for these devices
must be written manually via direct register access in SPL or a separate init stage.

**Gap 3 — DDRSS Stage 2 registers not in framework scope**

The K3 QoS framework (`setup_qos()` / `k3_qos_data`) only covers CBASS QoS MAP registers.
The DDRSS registers — DEF_PRI_MAP (0x0F300030), HPT_DEF_PRI_MAP (0x0F30004C), and range match
MAT and PRI_MAP registers at the 0x0F300000 base — are outside the current framework and would
require a separate mechanism (e.g., explicit writes in the board init sequence alongside or
after `setup_qos()`).

---

## Verification

After applying the register settings on hardware:

1. **Baseline:** `cyclictest -m -p 99 -i 200 -l 100000 -a <isolated_cpu>`
2. **Apply DDR load:** `stress-ng --vm-method=zero-one --memrate 2 &`
3. **Loaded run:** `cyclictest -m -p 99 -i 200 -l 100000 -a <isolated_cpu>`
4. **Compare** worst-case (maximum) latency between the two runs.
5. **Expected result:** Significant reduction in maximum latency. AM64x reference:
   800+ µs → ~170 µs.
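
To compare the two runs in step 4, the worst-case value can be pulled out of saved cyclictest summaries. The sketch below assumes the per-thread summary format shown earlier in this thread (`... Avg: 15 Max: 203`) with whitespace between `Max:` and the number; the function name `worst_case_max` is hypothetical.

```bash
# worst_case_max: read cyclictest summary lines on stdin and print the
# largest value that follows a "Max:" token across all threads.
worst_case_max() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "Max:" && $(i + 1) + 0 > max) max = $(i + 1) + 0
    } END { print max + 0 }'
}

# Example with the two summary lines from the original report.
worst_case_max <<'EOF'
T: 0 ( 404) P:99 I:1000 C:72252 Min: 5 Act: 11 Avg: 15 Max: 203
T: 1 ( 405) P:99 I:1500 C:48159 Min: 9 Act: 33 Avg: 33 Max: 152
EOF
```

Running the same function over the baseline log and the loaded log gives the two numbers to compare.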

**Verify register writes took effect** by reading back immediately after writing:

```bash
# AM62L / AM62x / AM62Ax / AM62Px — after Approach A1
md.l 0x45D20500 1   # should read 0x00006000
md.l 0x0F300030 1   # should read 0x00000001

# AM64x — after Approach A1
md.l 0x45D80500 1   # should read 0x00006000
md.l 0x0F300030 1   # should read 0x00000001
```

---

## Appendix A — Register Reference

### A.1 CBASS QoS MAP0 Registers (Stage 1)

Each initiator port has one QoS block. The MAP0 register is at `block_base + 0x100`. All
registers reset to **0x7000** (EPRIORITY = 7, lowest priority).

**MAP0 register bitfield layout (identical on all devices):**

| Bits  | Field     | Reset | Description |
|-------|-----------|-------|-------------|
| 14:12 | EPRIORITY | 7h    | VBUSM priority injected on outgoing transactions. 0 = highest, 7 = lowest. |
| 11:8  | ASEL      | 0h    | Leave at 0 for DDR access. (Values 14/15 reserved for A53 ACP.) |
| 7:4   | ORDERID   | 0h    | 0–7 → LPT port; 8–15 → HPT port. Relevant only on DDR32SS devices (AM62Ax, AM62Px). |
| 2:0   | QOS       | 0h    | Not used. |
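
The MAP0 values used in Approaches A and B follow directly from this layout. As a minimal sketch (the helper name `qos_map0` is hypothetical, and ASEL is assumed to stay at 0 unless given):

```bash
# qos_map0 EPRIORITY ORDERID [ASEL]: build a MAP0 value from the layout above.
# EPRIORITY -> bits 14:12, ASEL -> bits 11:8, ORDERID -> bits 7:4.
qos_map0() {
    printf '0x%08X\n' $(( (($1 & 0x7) << 12) | ((${3:-0} & 0xF) << 8) | (($2 & 0xF) << 4) ))
}

qos_map0 7 0   # reset default               -> 0x00007000
qos_map0 6 0   # EPRIORITY=6, LPT            -> 0x00006000
qos_map0 7 8   # EPRIORITY=7, ORDERID=8, HPT -> 0x00007080
qos_map0 6 8   # EPRIORITY=6, ORDERID=8, HPT -> 0x00006080
```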

**A53 QoS register addresses (MAP0 = block_base + 0x100):**

| Device | A53 Read port MAP0 | A53 Write port MAP0 | QoS block bases (R / W) |
|--------|--------------------|---------------------|--------------------------|
| AM62L  | **0x45D20500**     | **0x45D20900**      | 0x45D20400 / 0x45D20800  |
| AM62x  | **0x45D20500**     | **0x45D20900**      | 0x45D20400 / 0x45D20800  |
| AM64x  | **0x45D80500**     | **0x45D80900**      | 0x45D80400 / 0x45D80800  |
| AM62Ax | **0x45D20500**     | **0x45D20900**      | 0x45D20400 / 0x45D20800  |
| AM62Px | **0x45D20500**     | **0x45D20900**      | 0x45D20400 / 0x45D20800  |

> **TRM NOTE:** After any write to the 0x4500_0000–0x45FF_FFFF range, always read back the
> register to confirm the write landed.

> **TRM NOTE:** For peripherals with both a QoS block0 and block1 serving the same function
> (e.g., MMCSD), both must be written to the same value. For A53 specifically, block0 = read
> port and block1 = write port — intentionally different values are correct.

### A.2 DDRSS Priority Map Registers (Stage 2)

**Base address: 0x0F300000** (DDR16SS0 SSCFG, all devices)

**Default priority map registers (VBUSM → DDR AXI priority):**

| Register                            | Offset | Address      | Devices                        |
|-------------------------------------|--------|--------------|--------------------------------|
| EMIF_SSCFG_V2A_DEF_PRI_MAP_REG      | 0x30   | 0x0F300030   | AM62L, AM62x, AM64x            |
| EMIF_SSCFG_V2A_LPT_DEF_PRI_MAP_REG  | 0x30   | 0x0F300030   | AM62Ax, AM62Px (LPT port)      |
| EMIF_SSCFG_V2A_HPT_DEF_PRI_MAP_REG  | 0x4C   | 0x0F30004C   | AM62Ax, AM62Px (HPT port)      |

**Range match registers (all devices; shared between LPT/HPT on DDR32SS):**

| Register                  | Offset | Address    |
|---------------------------|--------|------------|
| EMIF_SSCFG_V2A_R1_MAT_REG | 0x24   | 0x0F300024 |
| EMIF_SSCFG_V2A_R2_MAT_REG | 0x28   | 0x0F300028 |
| EMIF_SSCFG_V2A_R3_MAT_REG | 0x2C   | 0x0F30002C |

**Range priority map registers:**

| Register                           | Offset | Address    | Devices                        |
|------------------------------------|--------|------------|--------------------------------|
| EMIF_SSCFG_V2A_R1_PRI_MAP_REG      | 0x34   | 0x0F300034 | AM62L, AM62x, AM64x            |
| EMIF_SSCFG_V2A_R2_PRI_MAP_REG      | 0x38   | 0x0F300038 | AM62L, AM62x, AM64x            |
| EMIF_SSCFG_V2A_R3_PRI_MAP_REG      | 0x3C   | 0x0F30003C | AM62L, AM62x, AM64x            |
| EMIF_SSCFG_V2A_LPT_R1_PRI_MAP_REG  | 0x34   | 0x0F300034 | AM62Ax, AM62Px (LPT)           |
| EMIF_SSCFG_V2A_LPT_R2_PRI_MAP_REG  | 0x38   | 0x0F300038 | AM62Ax, AM62Px (LPT)           |
| EMIF_SSCFG_V2A_LPT_R3_PRI_MAP_REG  | 0x3C   | 0x0F30003C | AM62Ax, AM62Px (LPT)           |
| EMIF_SSCFG_V2A_HPT_R1_PRI_MAP_REG  | 0x50   | 0x0F300050 | AM62Ax, AM62Px (HPT)           |
| EMIF_SSCFG_V2A_HPT_R2_PRI_MAP_REG  | 0x54   | 0x0F300054 | AM62Ax, AM62Px (HPT)           |
| EMIF_SSCFG_V2A_HPT_R3_PRI_MAP_REG  | 0x58   | 0x0F300058 | AM62Ax, AM62Px (HPT)           |

**DEF_PRI_MAP / LPT_DEF_PRI_MAP / HPT_DEF_PRI_MAP bitfield layout (identical for all):**

| Bits  | Field   | Description |
|-------|---------|-------------|
| 30:28 | PRIMAP0 | VBUSM priority 0 → DDR AXI priority (0 = highest, 7 = lowest) |
| 26:24 | PRIMAP1 | VBUSM priority 1 → DDR AXI priority |
| 22:20 | PRIMAP2 | VBUSM priority 2 → DDR AXI priority |
| 18:16 | PRIMAP3 | VBUSM priority 3 → DDR AXI priority |
| 14:12 | PRIMAP4 | VBUSM priority 4 → DDR AXI priority |
| 10:8  | PRIMAP5 | VBUSM priority 5 → DDR AXI priority |
| 6:4   | PRIMAP6 | VBUSM priority 6 → DDR AXI priority |
| 2:0   | PRIMAP7 | VBUSM priority 7 → DDR AXI priority |

**Reset value: 0x00000000** — all VBUSM priorities map to DDR AXI priority 0. At reset, every
master has equal highest priority inside the DDR controller.
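
A priority map value can be assembled from the eight per-priority fields rather than computed by hand. This is a sketch under the layout above; the function name `pri_map` is hypothetical.

```bash
# pri_map P0 P1 P2 P3 P4 P5 P6 P7: Pn is the DDR AXI priority assigned to
# VBUSM priority n. P0 ends up in bits 30:28 and P7 in bits 2:0.
pri_map() {
    value=0
    for axi in "$@"; do
        value=$(( (value << 4) | (axi & 0x7) ))
    done
    printf '0x%08X\n' "$value"
}

pri_map 0 0 0 0 0 0 0 1   # only VBUSM 7 demoted to AXI 1  -> 0x00000001
pri_map 0 0 0 0 0 0 1 2   # Variant A2                     -> 0x00000012
pri_map 0 1 0 0 0 0 0 2   # VBUSM 1 -> AXI 1, 7 -> AXI 2   -> 0x01000002
```

The function expects exactly eight arguments; passing fewer shifts the fields toward the low bits and gives a wrong value.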

### A.3 Range Match Register Bitfields

Each MAT register contains two independent Route ID matchers (A and B):

| Bits  | Field      | Description |
|-------|------------|-------------|
| 31    | RANGEEN_A  | Enable matcher A |
| 30:28 | MASK_A     | Number of LSBs to ignore: 0 = exact match, 1 = match pairs, 3 = match octets |
| 27:16 | ROUTEID_A  | 12-bit Route ID pattern for matcher A |
| 15    | RANGEEN_B  | Enable matcher B |
| 14:12 | MASK_B     | Number of LSBs to ignore for matcher B |
| 11:0  | ROUTEID_B  | 12-bit Route ID pattern for matcher B |

**Priority resolution:** if multiple range registers match a transaction, the highest-numbered
range wins: R3 > R2 > R1 > DEF.

**Encoding formula:**

```
REG = (RANGEEN_A<<31) | (MASK_A<<28) | (ROUTEID_A<<16)
    | (RANGEEN_B<<15) | (MASK_B<<12) | (ROUTEID_B<<0)
```
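
The formula can be wrapped in a small helper to reproduce the MAT values used in Approach C. This is a sketch only; the names `mat_half` and `mat_reg` are hypothetical, and the last call (using matcher B for a second Route ID) is an untested illustration of the layout rather than a confirmed configuration.

```bash
# mat_half EN MASK ROUTEID: 16-bit encoding of one matcher
# (enable bit, 3-bit mask, 12-bit Route ID).
mat_half() {
    echo $(( (($1 & 0x1) << 15) | (($2 & 0x7) << 12) | ($3 & 0xFFF) ))
}

# mat_reg A_EN A_MASK A_ID [B_EN B_MASK B_ID]: matcher A occupies the upper
# half of the register, matcher B the lower half (disabled if omitted).
mat_reg() {
    printf '0x%08X\n' $(( ($(mat_half "$1" "$2" "$3") << 16) | $(mat_half "${4:-0}" "${5:-0}" "${6:-0}") ))
}

mat_reg 1 3 16          # A53 reads, Route IDs 16-23 -> 0xB0100000
mat_reg 1 3 0           # A53 writes, Route IDs 0-7  -> 0xB0000000
mat_reg 1 0 66          # exact match on Route ID 66 -> 0x80420000
mat_reg 1 0 66 1 0 67   # matcher A = 66, matcher B = 67 -> 0x80428043
```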

---

*Document scope: DDR QoS configuration for A53 read/write prioritization. Out of scope: leaky
bucket threshold registers, ECC CoS configuration, per-range priority tuning beyond the
examples given.*

Next steps

I am going to do a few more test runs of the "best case scenario" with the default and base filesystems to see if there is a difference in behavior. After that, optimizations would be at the software level.

Regards,

Nick

0 xi he 30 days ago in reply to Nick Saulnier

Prodigy 20 points

Hi Nick,

Thanks for your help.

After adding the DDR QoS configuration, we ran the CODESYS application test environment (1ms cycle, 8 motor axes in operation). The test has been ongoing for 6 hours, with jitter ranging from -141 μs to 140 μs. Another performance indicator shows the maximum cycle duration reached 1120 μs, exceeding the expected 1ms.

We also conducted verification with the same DDR configuration using cyclictest. Over a 6-hour test, the maximum jitter was 132 μs. Therefore, we expect to explore new optimization approaches to reduce the jitter to the target of 100 μs.

As for core isolation: disabling it brought no noticeable improvement in the CODESYS scenario. Hence I do not recommend turning it off. In the current setup, CPU1 maintains a steady load of 70%–75% for a long time, while CPU0 runs at 40%–50%. If the load on CPU0 rises along with business operations, disabling core isolation will likely impact CPU1 performance.

Regards,

0 Nick Saulnier 30 days ago in reply to xi he

TI__Guru** 109890 points

For future readers, I moved the discussion about relocating interrupts with CODESYS to another thread:
RE: AM62L: Testing Codesys

This thread is already pretty long, so we will focus on cyclictest results here.

Hello Xi,

1) Did you test with both core isolation = OFF and DDR QoS = ON? I would suggest running with both settings.

2) Did you ever create a separate thread to confirm the OP-TEE behavior discussed? If you are confident that your use case is not loading OP-TEE, then we do not need to investigate that further. My hypothesis is that having the OP-TEE TRNG enabled led to increased latencies by switching between the "regular" Linux context and the OP-TEE context, so if you do not have OP-TEE then there is no OP-TEE context switch to worry about.

Regards,

Nick

0 xi he 29 days ago in reply to Nick Saulnier

Prodigy 20 points

Hi Nick,

Yes, I have tried running without isolcpus while keeping DDR QoS enabled, but my test commands still apply fixed stress to the specified CPU cores:

cyclictest -a 0-1 -t 2 -p 99 -m -D 0
taskset -c 1 stress-ng --cpu 1 --cpu-load 70 --vm 1 --vm-bytes 80% &
taskset -c 0 stress-ng --cpu 1 --cpu-load 50 &

By the way, I'm applying stress to CPU0 because, based on the CODESYS environment reference, stressing CPU0 can quickly reproduce issues. After 6 hours of testing, the current maximum jitter is 166 µs, whereas with isolcpus + DDR QoS enabled, under the same stress conditions, the maximum jitter after 6 hours of testing is 132 µs.

Regarding OP-TEE, there are no logs showing its successful startup in either our kernel or U-Boot, so I believe OP-TEE is not being invoked in my scenario, and therefore I'm not considering this direction.

Regards,

0 Nick Saulnier 23 days ago in reply to xi he

TI__Guru** 109890 points

Hello Xi,

Sounds good. I have been doing some background research into how different drivers work to try to determine the next direction to investigate. Nothing major to report from a driver standpoint.

It is possible that additional DDR QoS settings may give better performance (for example, if we raise the priority of data transfers from the CPSW DMA to DDR).

I do not have a CodeSys setup. If I give you register settings, would you be able to run tests and report back?

Regards,

Nick

0 xi he 23 days ago in reply to Nick Saulnier

Prodigy 20 points

Hi Nick,

No problem, I'm happy to assist with the relevant testing and verification. Please give me the register settings which need to be tested in Codesys.

0 Nick Saulnier 22 days ago in reply to xi he

TI__Guru** 109890 points

Hello Xi,

Ok, let's start with a slight variation of the previous tests. Remember that first there is CBASS QoS (quality of service) to say which initiators are prioritized within the bus structure inside of the processor, and then DDR CoS (class of service) which maps CBASS priority with priority in the DDR controller.

PREVIOUS TEST

CBASS QoS
EPRIORITY=0 is highest priority, 7 is lowest priority
CPSW DMA = 0 (unchanged)
A53 read = 6 (changed)
A53 writes & everything else = 7 (unchanged)

DDR CoS
priority 0 is highest priority, 7 is lowest priority
CPSW DMA = 0 (unchanged)
A53 read = 0 (unchanged)
A53 writes & everything else = 1 (changed)

The CPSW DMA cannot be changed from EPRIORITY = 0 for the CBASS QoS. So there are 2 simple ways to adjust the test:

1) We could lower the priority of A53 reads at the DDR CoS below that of CPSW DMA, so that CPSW DMA has priority in both CBASS and DDR (this might help if latencies are caused by CPSW not getting packets into DDR consistently)

2) We could raise the priority of A53 reads in CBASS to the same as CPSW DMA, and adjust DDR CoS to prefer A53 reads over all other entries (this might help if latencies are caused by DMA writes preempting A53 reads to refresh the cache)

Register writes to test CPSW DMA > A53 reads > everything else

CPSW > A53 reads > everything else

# Stage 1: CBASS — set A53 reads EPRIORITY=1
devmem2 0x45D20500 w 0x00001000
devmem2 0x45D20500               # expect 0x00001000

# Stage 2: DDRSS DEF_PRI_MAP
# PRIMAP0[30:28]=0: pktDMA/BCDMA (VBUSM 0) → AXI 0
# PRIMAP1[26:24]=1: A53 reads  (VBUSM 1) → AXI 1
# PRIMAP7[ 2: 0]=2: everything else (VBUSM 7) → AXI 2
# = (0<<28)|(1<<24)|(2<<0) = 0x01000002
devmem2 0x0F300030 w 0x01000002
devmem2 0x0F300030               # expect 0x01000002

Register writes to test A53 reads > CPSW DMA > everything else

A53 reads > CPSW (Route ID match) > everything else

# Stage 1: CBASS — set A53 reads to EPRIORITY=0 (same bucket as pktDMA)
devmem2 0x45D20500 w 0x00000000
devmem2 0x45D20500               # expect 0x00000000

# This assumes route IDs for AM62L A53 cores match AM62x
# Stage 2: DDRSS — R1_MAT: match Route IDs 16-23 (A53 reads)
# R1_MAT: RANGEEN_A=1, MASK_A=3 (match octets), ROUTEID_A=16 → matches Route IDs 16-23 (A53 reads)
# = (1<<31)|(3<<28)|(16<<16) = 0x80000000|0x30000000|0x00100000 = 0xB0100000
devmem2 0x0F300024 w 0xB0100000
devmem2 0x0F300024               # expect 0xB0100000

# R1_PRI_MAP at 0x0F300034 stays at reset (0x00000000) — no write needed
# PRIMAP0=0 in R1_PRI_MAP → matched A53 reads (VBUSM 0) → DDR AXI 0

# DDRSS DEF_PRI_MAP
# PRIMAP0[30:28]=1: VBUSM 0 (pktDMA/BCDMA, unmatched) → AXI 1
# PRIMAP7[ 2: 0]=2: everything else (VBUSM 7) → AXI 2
# = (1<<28)|(2<<0) = 0x10000002
devmem2 0x0F300030 w 0x10000002
devmem2 0x0F300030               # expect 0x10000002

Regards,

Nick

0 Nick Saulnier 21 days ago in reply to Nick Saulnier

TI__Guru** 109890 points

Hello Xi,

I am going to start a new round of testing soon. Does this look like the test you would want me to run to more closely replicate your environment?

taskset -c 1 stress-ng --cpu 1 --cpu-load 70 --vm 1 --vm-bytes 80% &
taskset -c 0 stress-ng --cpu 1 --cpu-load 50 &
cyclictest -m -a 0-1 -t 2 -p99 -D6h -h600 -i1000 -q

Regards,

Nick

0 xi he 21 days ago in reply to Nick Saulnier

Prodigy 20 points

Hi Nick,

Of course, I will perform multiple verification tests after power cycling this configuration to ensure the test data remains stable each time.

However, under the "Register writes to test A53 reads > CPSW DMA > everything else" configuration, the 0x0F300024 register could not be read or written; only the other two registers were successfully modified. In the CODESYS business scenario, after testing for approximately 3 hours, the jitter results were [-136, 136], so I stopped further testing.

Here is the error log:

root@am62xx-evm:~/hxtest# devmem2 0x0f300024 w
/dev/mem opened.
Memory mapped at address 0xffff7f772000.
Bus error

The CODESYS business scenario involves: 1ms cycle time with 8-axis motor load.

By the way, the first set of CBASS QoS tests gave jitter results of [-132, 136].

So, I will perform multiple verification tests after power cycling this configuration to ensure the test data remains stable each time.

Additionally: I believe your new load stress test is similar to my current business CPU situation, and I think it can quickly simulate and reproduce harsh scenarios.

Regards,

0 Nick Saulnier 20 days ago in reply to xi he

TI__Guru** 109890 points

Hello Xi,

Thank you for the report. I will take a closer look at the failed register write later this week.

I am starting a new round of testing now with a kernel config - will probably take a few days to run all the tests. Expect to see more early next week.

FYI, I am still using your core isolation settings, but some of them do not actually apply on AM62L:

root@am62lxx-evm:~# dmesg | grep -i "unknown\|command line"
[    0.000000] Kernel command line: console=ttyS0,115200n8 vt.global_cursor_default=0 rcu_nocb_poll rcu_nohz=1 rcu_nocbs=1 idle=poll nohz=on nohz_full=1 kthread_cpus=0 irqaffinity=0 isolcpus=managed_irq,domain,1 earlycon=ns16550a,mmio32,0x02800000 root=PARTUUID=076c4a2a-02 rw rootfstype=ext4 rootwait
[    0.000000] Unknown kernel command line parameters "rcu_nohz=1 idle=poll kthread_cpus=0", will be passed to user space.
[    5.787020] systemd[1]: Starting Generate network units from Kernel command line...
[    6.370733] systemd[1]: Finished Generate network units from Kernel command line.

Here is a summary of analysis from an AI tool:

  What is actually active vs. dropped

  Dropped (passed to userspace, no kernel effect):
  - rcu_nohz=1 — confirmed not a real param.
  - kthread_cpus=0 — confirmed not upstream / not in this kernel.
  - idle=poll — [GENERAL KNOWLEDGE] this is an x86-only parameter, parsed in arch/x86/kernel/process.c. On arm64 (AM62Lx is
  Cortex-A53/A35-class arm64) the idle= knob is not registered, so it is silently ignored.

  Active and recognized:
  - rcu_nocb_poll, rcu_nocbs=1, nohz=on, nohz_full=1, irqaffinity=0, isolcpus=managed_irq,domain,1.

  The "no kernel threads on CPU 1" and "no idle entry on CPU 1" properties Set 1 appears to claim are not happening — those parameters
  are unknown.

  What to use instead on this arm64 kernel

  To actually get the protections that the dropped parameters were trying to provide:

  - Idle-state suppression on CPU 1 — at runtime, disable cpuidle states per-CPU:
  for s in /sys/devices/system/cpu/cpu1/cpuidle/state*/disable; do echo 1 > "$s"; done
  - Or use PM-QoS via /dev/cpu_dma_latency (write a 32-bit 0 and hold the fd open) for a global cap. [GENERAL KNOWLEDGE] cpuidle.off=1
  is a kernel param that disables cpuidle entirely; verify in this kernel before relying on it.
  - Kernel-thread placement — there is no upstream kthread_cpus= boot param. The closest mechanisms are:
    - nohz_full= already routes some kthreads off the full-dynticks CPUs automatically.
    - For unbound kthreads that remain, set affinity at runtime (taskset -p 1 <pid> for each unbound kworker) or use cgroup cpusets
  with the housekeeping mask.
  - rcu_nohz — drop it; not a real parameter.
  - nohz=on — drop it; redundant with the kernel default.
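
For anyone repeating this check on their own board, the dropped parameters can be listed straight from the kernel log. This is a minimal sketch that assumes the message wording shown in the dmesg output above; the function name `unknown_params` is hypothetical.

```bash
# unknown_params: read kernel log text on stdin and print, one per line, the
# parameters the kernel reported as unknown and passed on to user space.
unknown_params() {
    sed -n 's/.*Unknown kernel command line parameters "\([^"]*\)".*/\1/p' | tr ' ' '\n'
}

# Example using the log line from this thread; on a target, pipe dmesg in instead.
echo '[    0.000000] Unknown kernel command line parameters "rcu_nohz=1 idle=poll kthread_cpus=0", will be passed to user space.' | unknown_params
```

An empty result means the log contains no such message, so every parameter on the command line was recognized.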

Regards,

Nick

0 Nick Saulnier 7 days ago in reply to Nick Saulnier

TI__Guru** 109890 points

Hello Xi,

Can I get you to share your latest kernel configs?

I am still working on the report and testing the QoS register writes.

Regards,

Nick

0 Nick Saulnier 7 days ago in reply to Nick Saulnier

TI__Guru** 109890 points

I am unable to replicate your observation about writing to the register addresses:

root@am62lxx-evm:~# # writes for CPSW DMA > A53 reads > everything else
root@am62lxx-evm:~# devmem2 0x45D20500 w 0x00001000
/dev/mem opened.
Memory mapped at address 0xffff9f4dc000.
Read at address  0x45D20500 (0xffff9f4dc500): 0x00007000
Write at address 0x45D20500 (0xffff9f4dc500): 0x00001000, readback 0x00001000
root@am62lxx-evm:~# devmem2 0x0F300030 w 0x01000002
/dev/mem opened.
Memory mapped at address 0xffff9518a000.
Read at address  0x0F300030 (0xffff9518a030): 0x00000000
Write at address 0x0F300030 (0xffff9518a030): 0x01000002, readback 0x01000002

root@am62lxx-evm:~# # now writes for A53 reads > CPSW DMA > everything else
root@am62lxx-evm:~# devmem2 0x45D20500 w 0x00000000
/dev/mem opened.
Memory mapped at address 0xffff8103d000.
Read at address  0x45D20500 (0xffff8103d500): 0x00001000
Write at address 0x45D20500 (0xffff8103d500): 0x00000000, readback 0x00000000
root@am62lxx-evm:~# devmem2 0x0F300024 w 0xB0100000
/dev/mem opened.
Memory mapped at address 0xffffad545000.
Read at address  0x0F300024 (0xffffad545024): 0x00000000
Write at address 0x0F300024 (0xffffad545024): 0xB0100000, readback 0xB0100000
root@am62lxx-evm:~# devmem2 0x0F300024 w
/dev/mem opened.
Memory mapped at address 0xffffb93f7000.
Read at address  0x0F300024 (0xffffb93f7024): 0xB0100000
root@am62lxx-evm:~# devmem2 0x0F300030 w 0x10000002
/dev/mem opened.
Memory mapped at address 0xffff8a297000.
Read at address  0x0F300030 (0xffff8a297030): 0x01000002
Write at address 0x0F300030 (0xffff8a297030): 0x10000002, readback 0x10000002

The read command should be "devmem2 0x0f300024", not "devmem2 0x0f300024 w". I have fixed my previous post. But either way I am able to write to the registers.

0 xi he 7 days ago in reply to Nick Saulnier

Prodigy 20 points

6131.config.gz

Hi Nick,

Here are my latest kernel configs.

As for register address 0x0F300024, I still get an error when writing to it and have not figured out the reason. Here is the error log:

root@am62xx-evm:~# devmem2 0x0F300024 w 0xB0100000
/dev/mem opened.
Memory mapped at address 0xffffbe524000.
Bus error

0 Kevin Peng 5 days ago in reply to Nick Saulnier

TI__Expert 4858 points

Hi Nick,

Can you think of any reason why Xi still cannot write this register (e.g., firewall restrictions)?

Thanks,

Kevin
