Part Number: SK-AM62
Other Parts Discussed in Thread: AM625
Hello all,
I ran a series of tests using the PREEMPT_RT kernel (v5.10.120-rt70) on an AM625 SoC, aimed at assessing the real-time performance I can get from this device, specifically the worst-case latency for a user task waiting on timer events. My question is whether the results below accurately represent the typical worst-case latency we may expect (sorry for the long post, but the details may help).
The test configuration is as follows:
- PLL set to 25 MHz (boot switches)
- worst-case latency on timer events measured with the regular 'cyclictest' program from the rt-tests suite (clock_nanosleep() interface only).
- TI vendor kernel available from [1]
- kernel config tweaks:
* disable ACPI (CONFIG_ACPI)
* force enable CPU_FREQ 'performance' governor
(CPU_FREQ_DEFAULT_GOV_PERFORMANCE), disable all other governors
* all kernel debug switches off
- 20-minute sampling loop running at 1 kHz, performed by a single thread. This may be far too short to observe the true worst case, but is enough in our case to already observe high values.
- the sampling thread was always pinned to a single CPU, either isolated (CPU2) or not (CPU1).
- a stress load ran in parallel with the test, composed of a dd loop continuously streaming zeroed memory and a 'hackbench' loop issuing a massive number of context switches, all left free-running on the non-isolated CPUs.
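For reference, the config tweaks listed above would correspond to a fragment along these lines (symbol names taken from the v5.10 Kconfig; a sketch, not the exact diff that was applied):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# CONFIG_ACPI is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_GOV_SCHEDUTIL is not set
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~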
Practically, the commands used were:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# dd if=/dev/zero of=/dev/null bs=128M&
# while :; do hackbench; done&
# cyclictest -a <cpu_nr> -p 98 -m -n -i 1000 -D 20m -q
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The 'isolcpus=2' boot parameter was added when testing the isolated-CPU case. Proper CPU affinity for the sampling thread was double-checked.
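In case it helps others reproduce the setup, the isolation and affinity checks can be done along these lines (a sketch; substitute the real cyclictest PID for `$$`):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# List the CPUs isolated at boot (empty output if none):
cat /sys/devices/system/cpu/isolated
# Check the affinity the kernel reports for a given task;
# replace $$ with the actual cyclictest PID.
grep Cpus_allowed_list /proc/$$/status
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~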
The results are as follows, displayed as "worst-case (average)", all in microseconds:
ISOLATED (CPU2) | NON-ISOLATED (CPU1)
    170 (27)    |      368 (59)
These figures seem high for this class of hardware, with significant disturbance/noise on the isolated CPU running the latency test, caused by activity on the other CPUs moving memory around and context-switching at a high rate. As expected, it is much worse in the non-isolated case.
Any insight about those figures, and a way to get them down if possible, would be much appreciated.
Sidenote: the Xenomai4 EVL core, ported to this SoC on top of the TI base kernel [2], revealed the same impact from the non-RT stress load, with 130(10) and 284(40) respectively. This may rule out a PREEMPT_RT-specific issue, since the two implementations have nothing in common. Another takeaway from this particular test: we could clearly see a negative impact of enabling transparent huge page support in the configuration when looking at the EVL figures. Ftracing suggests this may have to do with I/D cache maintenance operations after fixing up page table entries, which significantly delay interrupts even though the CPU is not masking them. I could not check this with PREEMPT_RT, since that configuration is not supported.
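For anyone wanting to reproduce the THP observation: transparent huge pages can be inspected and toggled at runtime via sysfs, or disabled outright on the kernel command line. A sketch (the sysfs write requires root):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Show the current THP policy; the bracketed entry is the active one,
# e.g. "[always] madvise never":
cat /sys/kernel/mm/transparent_hugepage/enabled
# To disable THP for a test run (as root):
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
# Or permanently, via the kernel command line:
#   transparent_hugepage=never
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~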
Thanks,
[1] git.ti.com/.../ (at #gca705d5c043)