PROCESSOR-SDK-AM64X: Real-time performance regression since AM57xx

Part Number: PROCESSOR-SDK-AM64X
Other Parts Discussed in Thread: AM5728, AM6441

We are evaluating the XAM6442 on the AM64x SKEVM with the Linux-RT SDK. We observe relatively poor real-time performance: "cyclictest" reports latencies in the range of 100us under load.

Until now, we have been using the AM571x and were quite happy with its single-core (no SMP) "cyclictest" result of less than 30us.

I know that the SDK is at an early stage of development and that the kernel configuration is probably not yet optimized. Are there any plans to release a 32-bit kernel (armv7)? We need the best possible single-thread performance and the lowest possible latency under Linux on the A53.
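For readers unfamiliar with what the "cyclictest" figures above represent, here is a minimal sketch of the idea: sleep until an absolute deadline, then record how late the wake-up actually was. The function names (`wakeup_latencies_us`, `summarize`) and the default interval are assumptions made for illustration. This is not cyclictest itself, and the Python interpreter adds overhead, so the numbers it prints are not comparable with the 30us and 100us figures quoted here.

```python
import time


def wakeup_latencies_us(interval_us=1000, loops=200):
    """Return the wake-up latency (in microseconds) of each periodic cycle."""
    interval_ns = interval_us * 1000
    next_wake = time.monotonic_ns() + interval_ns
    latencies = []
    for _ in range(loops):
        # Sleep until the absolute deadline rather than for a fixed duration,
        # so lateness in one cycle does not shift the following deadlines.
        remaining_ns = next_wake - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        now = time.monotonic_ns()
        # Latency = how long after the intended deadline we actually woke up.
        latencies.append((now - next_wake) / 1000.0)
        next_wake += interval_ns
    return latencies


def summarize(latencies):
    """Return (min, avg, max) of a list of latencies."""
    return (min(latencies), sum(latencies) / len(latencies), max(latencies))
```

Calling `summarize(wakeup_latencies_us())` gives the minimum, average and maximum lateness; the maximum is the value that matters for real-time use, which is why a single 100us outlier under load is a concern.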

  • Hi,

    Could you please share which SDK release version you are using?

    I will check, but I don't believe there are plans to release a 32-bit ARMv7 kernel for the AM64x. Could you explain why this would be desirable?

    Best Regards,

    Schuyler

  • We are using the prebuilt binary "tisdk-default-image-am64xx-evm.wic.xz" from the Linux-RT SDK at "https://downloads.ti.com/processor-sdk-linux-rt/esd/AM64X/latest/exports/tisdk-default-image-am64xx-evm.wic.xz?tracked=1" 

    The kernel version is "Linux am64xx-evm 5.4.106-rt54-g519667b0d8 #1 SMP PREEMPT_RT Fri May 28 14:38:16 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux"

    In fact, we don't need the 32-bit kernel; it was just a wild guess that it might provide better real-time performance, as we saw on the Cortex-A15 in AM57xx (it is a beast). There we found that disabling SMP on AM5728 leads to much better cyclictest latency results. We are focused on single-thread performance and would probably switch to the AM6441, but the AM64x SK carries the AM6442. We cannot disable the SMP option in aarch64 kernels; it is designed this way.

    In the default image, I see a lot of features that might affect RT latency: Wi-Fi, Jailhouse, and possibly other peripherals whose interrupts might cause the bad max latency results.

    I have tried disabling various subsystems without success (no improvement in latency). I also experimented a little with isolcpus, but again saw no improvement. I think that the RT kernels have CPU idle and frequency scaling disabled by default; at least this was the case in other TI kernels for older Sitara SDKs. I also checked the interrupt statistics in Linux: removing the wl18xx-related kernel modules did stop some interrupts. What remains are the "rescheduling interrupts" and the core timer interrupts (on both cores).

    I also see the "irqbalance" service running. I am not sure what it does, and I am going to check whether it affects the results.

    So I wonder whether TI has internal figures for this. I saw that the lmbench latency metrics (lat_unix and friends) seem fine, at around 40us. Usually this corresponds to the cyclictest results, so I wonder what I did wrong on my system. I suppose TI uses the GPEVM while I use the SKEVM; maybe something in the platform is different?
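The interrupt check described above (looking at which counters keep increasing after the wl18xx modules are removed) can be sketched as a comparison of two snapshots of `/proc/interrupts`. This is only an illustration under stated assumptions: the helper names `parse_interrupts` and `interrupt_deltas` are made up for this example, and the parser assumes the usual layout of a header row of CPU columns followed by `label: counts... description` rows.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (per_cpu_counts, description)}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: ..." row
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        # Rows such as "Err:" carry a single counter; pad to one slot per CPU.
        counts += [0] * (ncpus - len(counts))
        table[label.strip()] = (counts, description)
    return table


def interrupt_deltas(before_text, after_text):
    """Return {label: (per_cpu_increase, description)} for counters that changed."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    deltas = {}
    for label, (counts, description) in after.items():
        old_counts = before.get(label, ([0] * len(counts), ""))[0]
        diff = [new - old for new, old in zip(counts, old_counts)]
        if any(diff):
            deltas[label] = (diff, description)
    return deltas
```

To use it on the target, read `/proc/interrupts` once, run the cyclictest load for a few seconds, read the file again, and pass both strings to `interrupt_deltas`. Any row with a non-zero increase on the CPU running the measurement thread is a candidate source of latency; rescheduling IPIs and per-core timer interrupts would be expected to show up here, as noted above.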