
[FAQ] AM625: How to measure interrupt latency on multicore Sitara devices using cyclictest?

Part Number: AM625
Other Parts Discussed in Thread: AM3357, AM6442

How can I measure interrupt latency on Sitara multicore devices such as AM625 and compare the results to single core devices such as AM3357?

  • Cyclictest (https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest/start) is the standard real-time test tool for measuring interrupt latency in Linux. The OSADL pages (https://www.osadl.org/Realtime-Preempt-Kernel.kernel-rt.0.html#externaltestingtool) provide many details and examples on how to run the tests, and OSADL also maintains an extensive farm of hardware with public details on the Linux kernels running and the resulting interrupt latencies. Most often the maximum (worst-case) latency is a more important metric than the average.

    cyclictest -l100000000 -m -Sp90 -i200 -h400 -q >output

    is the OSADL example command. The parameters mean:

     -m locks all memory allocations; in a real-time embedded system you do not want memory paged out to disk

     -l100000000 runs 100M iterations. Worst-case outliers often happen only, say, once per 1M iterations or once per hour, so a long run is required (note this means over 5 hours of real time)

     -S is SMP mode: start one measurement thread per core. It is often simplest to start one per core rather than measure on a single core (-t1 -a2 would be an alternative to start one thread pinned to core 2)

     -p90 runs the measurement threads at real-time priority 90 (99 is the highest). This is a choice of priority relative to the other things you want to measure against

     -i200 measures latency every 200 microseconds (each per-core thread is scheduled to run every 200us)

     -h400 collects a histogram of observed interrupt latencies up to 400us (400 one-microsecond buckets)

     -q is quiet mode: print nothing until the end of the run
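    The run time follows directly from -l and -i: 100M iterations at a 200us interval is 20,000 seconds, roughly five and a half hours. A small sketch of that arithmetic (run_duration_s is a hypothetical helper, not part of cyclictest):

```python
def run_duration_s(iterations: int, interval_us: int) -> float:
    """Wall-clock time cyclictest needs for the given loop count and interval."""
    return iterations * interval_us / 1_000_000  # microseconds -> seconds

# The OSADL example: -l100000000 -i200
dur = run_duration_s(100_000_000, 200)
print(f"{dur:.0f} s = {dur / 3600:.1f} h")  # 20000 s = 5.6 h
```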

    In general it is also important to consider what else is running and how representative it is of your final system. Things like a filesystem over Ethernet (NFS) will affect the result. Running just cyclictest on an otherwise idle system is of limited value; it often makes sense to start a background load that represents the non-realtime part of the actual system. One way to do this is with a synthetic test program such as stress-ng.

    On a single-core system the background and real-time work co-exist on the same core and execute serially. On an SMP multicore system background threads can run in parallel on other cores. To control for this variable and guarantee a lower maximum latency, both in the real application and in cyclictest runs, core affinity and core isolation can be used. On Sitara multicore processors you can isolate one or more cores from kernel scheduling with the kernel command line parameter isolcpus. For example, below I halt the boot at the U-Boot prompt and isolate CPU 3, so the kernel will not consider it for scheduling or schedule anything on it:

    U-Boot 2021.01-g2dd2e1d366 (Sep 27 2022 - 16:43:29 +0000)
     
    SoC:   AM62X SR1.0 GP
    Model: Texas Instruments AM625 SK
    EEPROM not available at 0x50, trying to read at 0x51
    Board: AM62-SKEVM rev E2
    DRAM:  2 GiB
    MMC:   mmc@fa10000: 0, mmc@fa00000: 1, mmc@fa20000: 2
    Loading Environment from MMC... OK
    In:    serial@2800000
    Out:   serial@2800000
    Err:   serial@2800000
    Net:   eth0: ethernet@8000000port@1
    Hit any key to stop autoboot:  0
    =>
    => setenv optargs isolcpus=3
    => boot
    switch to partitions #0, OK
    mmc1 is current device
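    After boot you can confirm the isolation took effect, for example by checking /proc/cmdline or /sys/devices/system/cpu/isolated. As an illustration, a small sketch that parses the isolcpus= list out of a kernel command line string (the helper name and the sample strings are hypothetical, not from the log above):

```python
def isolated_cpus(cmdline: str) -> set[int]:
    """Return the CPU numbers named by isolcpus= on a kernel command line.

    Handles comma-separated lists and simple a-b ranges, e.g. isolcpus=3
    or isolcpus=1,3 or isolcpus=2-3. Non-numeric flag tokens (such as the
    'managed_irq' flag the kernel also accepts here) are skipped.
    """
    cpus: set[int] = set()
    for token in cmdline.split():
        if not token.startswith("isolcpus="):
            continue
        for part in token[len("isolcpus="):].split(","):
            if "-" in part and not part.startswith("-"):
                lo, hi = part.split("-", 1)
                if lo.isdigit() and hi.isdigit():
                    cpus.update(range(int(lo), int(hi) + 1))
            elif part.isdigit():
                cpus.add(int(part))
    return cpus

# On a live system: isolated_cpus(open("/proc/cmdline").read())
print(isolated_cpus("console=ttyS2,115200n8 root=/dev/mmcblk1p2 isolcpus=3"))
```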

    Then you can start a background load, in this case a memory-heavy test (memrate issues 1MB bursts at the rate needed to meet the requested MB/s parameters):

    stress-ng --memrate 1 --memrate-rd-mbs 70 --memrate-wr-mbs 140 --taskset 0 &

    This runs one reader and one writer on core 0. You can also give --taskset a list like 0,2 and start more workers. This represents the background load.

    Then start the actual cyclictest, either only on the isolated core (3) with -t1 -a3, or, as I do below, on all cores:

    cyclictest -l100000000 -m -Sp90 -i200 -h400 -q > output

    This will run for about five and a half hours. The file output will contain the histogram and basic statistics. I have attached 3 runs below. The graphs are drawn as suggested in https://www.osadl.org/Create-a-latency-plot-from-cyclictest-hi.bash-script-for-latency-plot.0.html
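    When -h is used, the -q output ends with per-CPU summary lines such as `# Max Latencies: 00099 00167 ...` (the exact footer format can vary between cyclictest versions, so check yours). A sketch that pulls those per-CPU statistics out of the output file, assuming that footer format:

```python
def latency_summary(text: str) -> dict[str, list[int]]:
    """Extract per-CPU Min/Avg/Max latency values (us) from cyclictest -q -h output."""
    stats: dict[str, list[int]] = {}
    for line in text.splitlines():
        for key in ("Min", "Avg", "Max"):
            prefix = f"# {key} Latencies:"
            if line.startswith(prefix):
                stats[key.lower()] = [int(v) for v in line[len(prefix):].split()]
    return stats

# Illustrative output fragment (values taken from the AM625 table below):
sample = """# Histogram
000005 000001 000000 000002 000000
# Min Latencies: 00005 00005 00005 00005
# Avg Latencies: 00007 00006 00006 00006
# Max Latencies: 00099 00167 00167 00047
"""
s = latency_summary(sample)
print("worst case per CPU (us):", s["max"])  # [99, 167, 167, 47]
```

    On a live system you would pass `open("output").read()` instead of the sample string.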

    AM625 SDK 8.4, isolcpus=3, default filesystem with everything running, stress-ng memrate on core 0.

             CPU0   CPU1   CPU2   CPU3
    min         5      5      5      5
    avg         7      6      6      6
    max        99    167    167     47

    So the worst case for the real-time core is 47us (CPU3 in the graph above).

    AM6442 SDK 8.4, isolcpus=1, default filesystem with everything running and inline ECC on, stress-ng memrate on core 0.

             CPU0   CPU1
    min         6      7
    avg        10      9
    max       178     64

    So the worst case for the real-time core is 64us (CPU1 in the graph above).

    And for reference a case without isolcpus.

    AM6442 SDK 8.4, default filesystem with everything running, no isolcpus, stress-ng memrate on core 0.

             CPU0   CPU1
    min         6
    avg         9     10
    max       163    351

    The OSADL site has results for many processors, from embedded parts up to server class. See https://www.osadl.org/Thumbnails-of-all-default-latency-plots.qa-latencyplot-thumbnails.0.html as a starting point. For embedded targets <50us is a realistic worst-case target; worst-case behavior below 20us is only reached with very high-end processors.

      Pekka