AM6442: Codesys performance problems

Part Number: AM6442
Other Parts Discussed in Thread: SK-AM64B

Edited March 26 2025

Hello Sitara team,

My customer has developed a small-scale PLC based on the AM6442, running Codesys as the EtherCAT master.

They are trying to achieve the best CODESYS EtherCAT performance (Max. Cycle Time) with the AM6442.
First I will describe the problem (in the attached presentation) and then the measures we have taken to improve the cycle time performance.

 4863.CODESYS Performance.pptx

Some basic information:
For our tests we are using the SK-AM64B evaluation board.
We are using Yocto to build custom firmware based on SDK 09.02.00 with the RT patch added.

 

root@plcnext:~# uname -a
Linux plcnext 6.1.83-rt28-ti-rt-g96b0ebd82722 #1 SMP PREEMPT_RT Mon May 13 23:06:24 UTC 2024 aarch64 GNU/Linux

 

We are using a very simple CODESYS project with almost no code, just the one recommended for Trace. The EtherCAT master is running and cyclically exchanges process data with a Beckhoff EK1100 and EL2252.
We have a USB dongle for CODESYS licensing.

This is the Codesys project:
https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/3324.Trace_5F00_demo.project

The goal is to rotate 10 servo drives in SoftMotion on a 1 msec cycle time (corrected from 1 usec).

Is this possible?

Do you see other measures to improve CODESYS Max. Cycle time?

Best regards
Manuel

  • Hello Manuel,

    The thread owner is out of the office for the rest of March. Feel free to ping the thread in early April if you have not received a response.

    Regards,

    Nick

  • Hello,

    in the meantime I have pointed my customer to this similar thread: PROCESSOR-SDK-AM64X: XDP Support

    Unfortunately, changing the ksoftirqd priority to FIFO 52 has not improved the maximum cycle time.

    Can you please help to debug the cause?

    Best regards
    Manuel

  • Have you contacted Codesys? What is their expected performance for a dual core A53 at 1GHz? Step through all the guidance they and others have:

    https://content.helpme-codesys.com/en/CODESYS%20Control/_rtsl_performance_optimization_linux.html

    https://www.linutronix.de/blog/A-Checklist-for-Real-Time-Applications-in-Linux 

    "We have USB Dongle for CODESYS licensing."

    It is very likely that the latency outliers are related to USB and the CodeMeter license server that Codesys uses. Have you run the same test without USB involved, using some other licensing method or just the demo version that does not require USB? In our Codesys test runs, the worst offenders we saw were the licensing infrastructure and the related USB drivers.

    Is there a specific reason you are using the AM6442? For running Codesys I'd recommend https://www.ti.com/product/DRA821U , https://www.ti.com/product/AM62P or https://www.ti.com/product/AM67 . These should perform significantly better even without tuning the RT Linux setup.

      Pekka

  • "The goal is to rotate 10 servo drives in SoftMotion on a 1 msec cycle time (corrected from 1 usec).

    Is this possible?"

    Good clarification, I was assuming 1 millisecond. The theoretical best EtherCAT performance is 31.25us (microseconds). See https://www.ibv-augsburg.de/downloads/icECAT_EtherCAT_Master_Stack_Benchmark.pdf , which shows a 100us cycle time benchmark breakdown on the AM64x R5 core. You'll also see 500us A53 Linux results, so assuming the setup is tuned properly, a 1ms EtherCAT master seems possible.

      Pekka

  • The customer has tested licensing without the USB dongle, but the results did not change much.

    This is with USB licensing:

    This is without:

    Would it make any difference if they re-compiled Codesys with their own environment and toolchain?

    Best regards
    Manuel

  • First question: what is Codesys saying?

    Second question: why AM64x? Why not a more powerful device from the TI portfolio? AM64x A53 Linux performance is the worst of all the TI AM6x portfolio.

    "Would it make any difference if they re-compiled Codesys with their own environment and toolchain?"

    Codesys is a black-box binary; to my knowledge they do not allow their customers access to the source code.

    From the couple of screenshots, both runs look like they meet the 1000us cycle time. The run without the USB dongle looks very short, 370s compared with 1429s, but both still look like they meet 1000us. To my knowledge (they should of course check with the vendor that sold them Codesys), "max cycle time" is the worst case; when it is below the desired cycle time, the schedule is being met.
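
    The rule in the last paragraph reduces to one comparison per run. A minimal sketch, assuming both values are in microseconds; schedule_met is a hypothetical helper name, not a Codesys tool, and the numbers in the usage lines are placeholders rather than values read from the screenshots:

    ```shell
    #!/bin/sh
    # Succeeds (exit status 0) only when the worst-case cycle time is
    # strictly below the desired cycle time. Both arguments are integers
    # in microseconds (an assumption of this sketch).
    schedule_met() {
        max_us="$1"
        target_us="$2"
        [ "$max_us" -lt "$target_us" ]
    }

    # Placeholder numbers for illustration only:
    schedule_met 900 1000  && echo "max 900 us: schedule met"
    schedule_met 1200 1000 || echo "max 1200 us: schedule missed"
    ```

    The comparison is strict because the text above says "below"; whether a max exactly equal to the target counts as met is something to confirm with the Codesys vendor.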

      Pekka

  • The design with AM6442 is done already and performance was expected to be better based on the DMIPS compared to the Zynq solution used before, with lower core clock.

    The issue is that the 1ms cycle time is only reached with one device in the loop.
    With 8 devices the performance drops to ~3ms:

    No of Axis | PRG/LINE | Min Cycle Time | Average Cycle Time | Max Cycle Time | Difference | Recommended Max | Target Cycle Time
    8          | Idle     | 1348           | 1423               | 1710           | 287        | 3200            | 8000
    8          | CAM      | 1499           | 1565               | 1879           | 314        | 3200            | 8000
    8          | 60000    | 2699           | 2765               | 2948           | 183        | 3200            | 8000
    8          | 120000   | 4021           | 4089               | 4303           | 214        | 3200            | 8000
    8          | 240000   | 6661           | 6722               | 6949           | 227        | 3200            | 8000


    This is their full task priority list:

    There is no IPC between R5F and A53 running.

    root@plcnext:~# uname -a
    Linux plcnext 6.1.83-rt28-ti-rt-g96b0ebd82722 #1 SMP PREEMPT_RT Mon May 13 23:06:24 UTC 2024 aarch64 GNU/Linux

    In a discussion with Thomas Schneider he mentioned that we have seen ~300us on AM6442 with 3 drives connected.

    There seems to be something wrong with the overall Codesys configuration?

    Regards
    Manuel

  • OK, so this seems quite a long way from the goal. For the reference device, do you have more specifics: which ZYNQ device? An old one like the ZYNQ-7000 with Cortex-A9s, or an UltraScale+ with A53s? Have they followed all the same steps and kernel configurations that were used there to get to the performance they needed?

    It should be down to "normal" RT tuning and latency optimizations. This is something that companies like Linutronix and BayLibre, or other high-end Linux contractors, are good at. But here are a few starting steps.

    1. Let's make sure interrupt handling is not the issue. PREEMPT_RT uses the ksoftirqd threads to handle soft interrupts (https://bootlin.com/doc/training/preempt-rt/preempt-rt-slides.pdf ). What is their priority? Type:

    ps aux | grep ksoftirq

    Look at the PIDs of the two ksoftirqd threads (one per core). By default in our SDKs they are not RT, i.e. they do not use FIFO scheduling. Change them to a high priority such as FIFO at 10, with commands like the ones below, assuming the PIDs were 13 and 27:

    chrt -f -p 10 13 
    chrt -f -p 10 27 

    2. In the full task priority list I see lots of stuff at RT priority (negative numbers) in the PRI+RT column. I think more than 90% of the entries in green should not be at that high a priority. Only things related to the Ethernet interface used with Codesys should be RT; everything else should not. RT is a zero-sum game: the priority comes from having as little as possible at a high priority, squashing down everyone else so the ambulance can pass. The more there is at high priority, the less high priority is worth. Get the same printout on the ZYNQ and compare.

    3. Make the system lean. A good rule of thumb is that the smaller the Linux kernel, the better the RT performance will be. Remove all services and kernel modules you don't need.

    4. Start going down the list at https://www.linutronix.de/blog/A-Checklist-for-Real-Time-Applications-in-Linux
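
    Step 1 above can be scripted so the PIDs do not have to be typed by hand. A minimal sketch, assuming procps pgrep and util-linux chrt are available on the target; the helper name rt_cmds_for and the dry-run default are illustrative and not part of the SDK:

    ```shell
    #!/bin/sh
    # Print one "chrt -f -p PRIO PID" line per PID given on the command line.
    # This is a dry run: nothing is changed until the output is piped to sh.
    # rt_cmds_for is a hypothetical helper name used only in this sketch.
    rt_cmds_for() {
        prio="$1"
        shift
        for pid in "$@"; do
            printf 'chrt -f -p %s %s\n' "$prio" "$pid"
        done
    }

    # Typical use on the target (applying the change needs root):
    #   rt_cmds_for 10 $(pgrep ksoftirqd)        # review the commands first
    #   rt_cmds_for 10 $(pgrep ksoftirqd) | sh   # then apply them
    ```

    Printing the commands before applying them makes it easy to check that only the intended threads are touched, in line with keeping as little as possible at RT priority.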

      Pekka

  • Hello, and thanks for adding me to the forum and thank you for your support!

    Changing the priority of the ksoftirqd threads doesn't improve the performance.

    "OK, so this seems quite a long way from the goal. For the reference device, do you have more specifics: which ZYNQ device? An old one like the ZYNQ-7000 with Cortex-A9s, or an UltraScale+ with A53s? Have they followed all the same steps and kernel configurations that were used there to get to the performance they needed?"

    For our old systems we are using ZYNQ 7020 based on Cortex-A9.

    "Second question why AM64x? Why not a more powerful device from the TI portfolio. AM64x A53 Linux performance is the worst of all the TI AM6x portfolio."

    This statement is quite interesting for me. What is the reason? And would the performance become worse if we use rpmsgs between all 4 R5 cores and ARM64 core?

  • Why do you have all the interrupts as real-time at priority -51? You should only have the Codesys tasks, ksoftirq, maybe a couple other things as real-time. Everything else should be lower priority. Real-time is a zero sum game, you can really have only one thing prioritized, everyone else should be pushed down and suffer. Can you print out the tasks and priorities in your old system and the new evaluation system? Something like

    uname -a
    to get the kernel version

    ps -ALo psr,policy,priority,pid,tid,cputime,comm
    to get what is running and what is real-time
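
    To make the comparison between the two systems easier, the listing can be reduced to the real-time threads only. A minimal sketch, assuming the procps ps policy column prints FF for SCHED_FIFO and RR for SCHED_RR; only_rt is a hypothetical helper name:

    ```shell
    #!/bin/sh
    # Read ps output on stdin and keep the header line plus every thread
    # whose scheduling policy (second column) is FF or RR.
    # only_rt is a hypothetical helper name used only in this sketch.
    only_rt() {
        awk 'NR == 1 || $2 == "FF" || $2 == "RR"'
    }

    # Typical use on the target:
    #   ps -ALo psr,policy,priority,pid,tid,cputime,comm | only_rt
    ```

    Running this on both the old and the new system and diffing the two outputs shows which threads were promoted to real-time on one system but not on the other.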

    "performance was expected to be better based on the DMIPS compared to the Zynq solution"

    DMIPS measures average performance with a warm L1 cache. It has nothing to do with real-time performance, and its correlation with the performance of a complex Linux application is almost non-existent.

    "Second question why AM64x? Why not a more powerful device from the TI portfolio. AM64x A53 Linux performance is the worst of all the TI AM6x portfolio."

    "This statement is quite interesting for me. What is the reason? And would the performance become worse if we use rpmsgs between all 4 R5 cores and ARM64 core?"

    What are you trying to do? Codesys does not utilize the R5s. For Codesys Linux performance you just want A-cores, maximum clock speed and cache, and DDR performance. Generally yes, the more things you have running in parallel, the worse Linux real-time performance will be.

    This is with USB licensing:

    This is without:

    The results in these screenshots look like they meet the 1ms cycle time and are 5x better than:

    No of Axis | PRG/LINE | Min Cycle Time | Average Cycle Time | Max Cycle Time | Difference | Recommended Max | Target Cycle Time
    8          | Idle     | 1348           | 1423               | 1710           | 287        | 3200            | 8000
    8          | CAM      | 1499           | 1565               | 1879           | 314        | 3200            | 8000
    8          | 60000    | 2699           | 2765               | 2948           | 183        | 3200            | 8000
    8          | 120000   | 4021           | 4089               | 4303           | 214        | 3200            | 8000
    8          | 240000   | 6661           | 6722               | 6949           | 227        | 3200            | 8000

    Can you elaborate on what is different?

      Pekka