
[FAQ] AM625: How to measure interrupt latency on multicore Sitara devices using cyclictest?

Part Number: AM625
Other Parts Discussed in Thread: AM3357, AM6442

How can I measure interrupt latency on Sitara multicore devices such as AM625 and compare the results to single core devices such as AM3357?

  • Cyclictest (https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest/start) is the standard real-time test tool for measuring interrupt latency in Linux. The OSADL pages (https://www.osadl.org/Realtime-Preempt-Kernel.kernel-rt.0.html#externaltestingtool) provide many details and examples on how to run the tests, and OSADL also maintains an extensive farm of hardware with public details on the Linux kernels running and the resulting interrupt latencies. Most often the maximum (worst-case) latency is a more important metric than the average.

    cyclictest -l100000000 -m -Sp90 -i200 -h400 -q >output

    is the OSADL example command. The parameters mean:

     -m locks all memory allocations; in a real-time embedded system you do not want memory paged out to disk

     -l100000000 runs 100M iterations. Worst-case outliers often happen only, say, once per 1M iterations or once per hour, so a long run is required (note this means over 5 hours of real time)

     -S is SMP mode: start one measurement thread per core. It is often simplest to start one per core rather than measure on a single core (-t1 -a2 would be an alternative to start one thread pinned to core 2)

     -p90 runs the measurement threads at real-time priority 90 (99 is the highest). This is a choice of priority relative to the other things you want to measure against

     -i200 measures latency every 200 microseconds (each per-core thread is scheduled to run every 200us)

     -h400 collects a histogram of observed interrupt latencies up to 400us (400 one-microsecond buckets)

     -q is quiet mode: print nothing until the end of the run
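    The run time follows directly from -l and -i: 100M iterations at a 200us interval is 20,000 seconds, roughly five and a half hours. A small sketch of that arithmetic (run_duration_s is a hypothetical helper, not part of cyclictest):

```python
def run_duration_s(iterations: int, interval_us: int) -> float:
    """Wall-clock time cyclictest needs for the given loop count and interval."""
    return iterations * interval_us / 1_000_000  # microseconds -> seconds

# The OSADL example: -l100000000 -i200
dur = run_duration_s(100_000_000, 200)
print(f"{dur:.0f} s = {dur / 3600:.1f} h")  # 20000 s = 5.6 h
```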

    In general it is also important to consider what else is running and how representative it is of your final system. Things like a filesystem over Ethernet (NFS) will affect the result. Running just cyclictest on an otherwise idle system is of limited value; it often makes sense to start a background load that represents the non-realtime part of the actual system. One way to do this is with a synthetic test program such as stress-ng.

    On a single-core system the background and real-time work co-exist on the same core and execute serially. On an SMP multicore system background threads can run in parallel on other cores. To control for this variable and guarantee a lower maximum latency, both in the real application and in cyclictest runs, core affinity and core isolation can be used. On Sitara multicore processors you can isolate one or more cores from kernel scheduling with the kernel command line parameter isolcpus. For example, below I halt the boot at the U-Boot prompt and isolate CPU 3, so the kernel will not consider it for scheduling or schedule anything on it:

    U-Boot 2021.01-g2dd2e1d366 (Sep 27 2022 - 16:43:29 +0000)
     
    SoC:   AM62X SR1.0 GP
    Model: Texas Instruments AM625 SK
    EEPROM not available at 0x50, trying to read at 0x51
    Board: AM62-SKEVM rev E2
    DRAM:  2 GiB
    MMC:   mmc@fa10000: 0, mmc@fa00000: 1, mmc@fa20000: 2
    Loading Environment from MMC... OK
    In:    serial@2800000
    Out:   serial@2800000
    Err:   serial@2800000
    Net:   eth0: ethernet@8000000port@1
    Hit any key to stop autoboot:  0
    =>
    => setenv optargs isolcpus=3
    => boot
    switch to partitions #0, OK
    mmc1 is current device
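    After boot you can confirm the isolation took effect, for example by checking /proc/cmdline or /sys/devices/system/cpu/isolated. As an illustration, a small sketch that parses the isolcpus= list out of a kernel command line string (the helper name and the sample strings are hypothetical, not from the log above):

```python
def isolated_cpus(cmdline: str) -> set[int]:
    """Return the CPU numbers named by isolcpus= on a kernel command line.

    Handles comma-separated lists and simple a-b ranges, e.g. isolcpus=3
    or isolcpus=1,3 or isolcpus=2-3. Non-numeric flag tokens (such as the
    'managed_irq' flag the kernel also accepts here) are skipped.
    """
    cpus: set[int] = set()
    for token in cmdline.split():
        if not token.startswith("isolcpus="):
            continue
        for part in token[len("isolcpus="):].split(","):
            if "-" in part and not part.startswith("-"):
                lo, hi = part.split("-", 1)
                if lo.isdigit() and hi.isdigit():
                    cpus.update(range(int(lo), int(hi) + 1))
            elif part.isdigit():
                cpus.add(int(part))
    return cpus

# On a live system: isolated_cpus(open("/proc/cmdline").read())
print(isolated_cpus("console=ttyS2,115200n8 root=/dev/mmcblk1p2 isolcpus=3"))
```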

    Then you can start a background load, in this case a memory-heavy test (memrate issues 1MB bursts at the rate needed to meet the requested MB/s parameters):

    stress-ng --memrate 1 --memrate-rd-mbs 70 --memrate-wr-mbs 140 --taskset 0 &

    This runs one reader and one writer on core 0. You can also give --taskset a list like 0,2 and start more workers. This represents the background load.

    Then start the actual cyclictest, either only on the isolated core (3) with -t1 -a3, or, as I do below, on all cores:

    cyclictest -l100000000 -m -Sp90 -i200 -h400 -q > output

    This will run for about five and a half hours. The file output will contain the histogram and basic statistics. I have attached 3 runs below. The graphs are drawn as suggested in https://www.osadl.org/Create-a-latency-plot-from-cyclictest-hi.bash-script-for-latency-plot.0.html
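    When -h is used, the -q output ends with per-CPU summary lines such as `# Max Latencies: 00099 00167 ...` (the exact footer format can vary between cyclictest versions, so check yours). A sketch that pulls those per-CPU statistics out of the output file, assuming that footer format:

```python
def latency_summary(text: str) -> dict[str, list[int]]:
    """Extract per-CPU Min/Avg/Max latency values (us) from cyclictest -q -h output."""
    stats: dict[str, list[int]] = {}
    for line in text.splitlines():
        for key in ("Min", "Avg", "Max"):
            prefix = f"# {key} Latencies:"
            if line.startswith(prefix):
                stats[key.lower()] = [int(v) for v in line[len(prefix):].split()]
    return stats

# Illustrative output fragment (values taken from the AM625 table below):
sample = """# Histogram
000005 000001 000000 000002 000000
# Min Latencies: 00005 00005 00005 00005
# Avg Latencies: 00007 00006 00006 00006
# Max Latencies: 00099 00167 00167 00047
"""
s = latency_summary(sample)
print("worst case per CPU (us):", s["max"])  # [99, 167, 167, 47]
```

    On a live system you would pass `open("output").read()` instead of the sample string.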

    AM625 SDK 8.4, isolcpus=3, default filesystem with everything running, stress-ng memrate on core 0.

             CPU0   CPU1   CPU2   CPU3
    min         5      5      5      5
    avg         7      6      6      6
    max        99    167    167     47

    So the worst case for the real-time core is 47us (CPU3 in the graph above).

    AM6442 SDK 8.4, isolcpus=1, default filesystem with everything running and inline ECC on, stress-ng memrate on core 0.

             CPU0   CPU1
    min         6      7
    avg        10      9
    max       178     64

    So the worst case for the real-time core is 64us (CPU1 in the graph above).

    And for reference a case without isolcpus.

    AM6442 SDK 8.4, default filesystem with everything running, no isolcpus, stress-ng memrate on core 0.

             CPU0   CPU1
    min         6
    avg         9     10
    max       163    351

    The OSADL site has results for many processors, from embedded parts up to server class. See https://www.osadl.org/Thumbnails-of-all-default-latency-plots.qa-latencyplot-thumbnails.0.html as a starting point. For embedded targets <50us is a realistic worst-case target; worst-case behavior below 20us is only reached with very high-end processors.

      Pekka