AM6442: PRP throughput on Linux

Part Number: AM6442
Other Parts Discussed in Thread: SK-AM64B, AM6442

Dear Experts,

The PRP throughput was measured by connecting two SK-AM64Bs back-to-back.
The measured throughput for iperf client-server communication was approximately 300Mbps.

- The two Ethernet ports on the board are connected directly to the ports of the communication partner, with no L2 switch in between.
- The PRP stack implemented in the Linux driver was used (PRP non-offload mode). The SDK used was 09.02.01.09 (29 Mar 2024).
- In a normal single-cable connection without PRP, the throughput is over 800Mbps.

So here are my questions. Please answer as many as possible.

Is the PRP throughput of 300Mbps for the SK-AM64B reasonable?
1) Is there an error in the way I'm using PRP?
Is the interface speed supported by PRP 1Gbps, or is it 100Mbps?
Does the PRP in the SDK I used implement the IEC 62439-3 Edition 3 standard?

2) How can I increase the throughput?
If I use two Linux PCs instead of SK-AM64Bs, the PRP throughput is 1Gbps.
Will updating to the latest SDK 10.00.07.04 (14 Aug 2024) improve it?
Are there any tuning items?

3) Is this the performance limit of non-offload mode?
When will the PRP offload modules (firmware) for PRU-ICSSG and CPSW3g be completed?
When will the Linux drivers for these offload modules be provided?

Thank you in advance
Regards,
Takeshi OHNO

  • Hello Takeshi,

    2) How can I increase the throughput?

    A couple of initial suggestions for this:

    • elevate the priority of the ksoftirq threads
    • change the scheduling policy of ksoftirq from "TS" to "FIFO" and elevate the priority
    • elevate the priority of iperf3 if that is the test program you are using to test the throughput of PRP
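
    As a concrete sketch (the PID and priority values below are examples, not taken from your system), these suggestions could be applied with chrt from util-linux:

```shell
# List the ksoftirqd threads with their CPU, scheduling policy, and priority.
# (The fallback echo keeps the pipeline from failing in containers where
# kernel threads are not visible.)
ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep ksoft || echo "no ksoftirqd threads visible"

# Example: switch one ksoftirqd thread to SCHED_FIFO at priority 70
# (replace 14 with the PID reported above):
#   chrt --fifo -p 70 14

# Example: run the iperf3 client itself under SCHED_FIFO at priority 60:
#   chrt --fifo 60 iperf3 -c 192.168.1.20
```
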
    1) Is there an error in the way I'm using PRP?

    Are you using the setup steps provided in https://software-dl.ti.com/processor-sdk-linux/esd/AM64X/09_02_01_09/exports/docs/linux/Foundational_Components/Kernel/Kernel_Drivers/Network/HSR_PRP_Non_Offload.html ? According to the same page, IEC 62439-3 is supported.
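
    For reference, the core of that setup script creates the PRP interface with iproute2. A minimal sketch (the interface names and IP address follow the documentation example; this needs root and the hsr kernel module):

```shell
# Bring up the two member ports that will carry the duplicated frames.
ip link set eth0 up
ip link set eth1 up

# Create the PRP interface: the "hsr" link type covers both protocols,
# and "proto 1" selects PRP (proto 0 would be HSR).
ip link add name prp0 type hsr slave1 eth0 slave2 eth1 proto 1

# Assign the address and bring the PRP interface up.
ip addr add 192.168.1.10/24 dev prp0
ip link set prp0 up
```
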

    When will the PRP offload modules (firmware) for PRU-ICSSG and CPSW3g be completed?

    As you probably already know, the current latest released SDK 10.00.07.04 (14 Aug 2024) does not support PRP offload. Only limited improvements are expected from offloading PRP, as discussed in https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1397884/am6442-prp-offload-firmware-on-linux. PRP offload support is currently planned for 1H 2025.

    -Daolin

  • Hello Daolin,

    Thank you for your quick response.
    Please allow me to continue this discussion a little longer.
    I would like to ask a few additional questions.

    A couple of initial suggestions for this:

    • elevate the priority of the ksoftirq threads
    • change the scheduling policy of ksoftirq from "TS" to "FIFO" and elevate the priority
    • elevate the priority of iperf3 if that is the test program you are using to test the throughput of PRP

    I am using iperf3.
    The only processes or kernel processing running after Linux starts up are iperf3 and PRP communication.
    Before starting iperf3, the A53 load is only 2-3%, but while iperf3 is running it reaches its limit of about 90%.
    The method you advised prioritizes iperf3 and related processes over others,
    so I don't think it will be very effective when there are almost no other processes.
    One way to increase the PRP throughput would be to parallelize the PRP processing so that both A53 cores on the AM6442 can be fully used.
    Is this possible?
    Without using PRP, the iperf3 throughput is 800Mbps, but the CPU load at that time reaches its limit of about 90%.

    Yes, it's almost the same procedure as the information you gave me. The difference is that I used "ifconfig" instead of "ip link" for the setup.
    Since there doesn't seem to be a problem with how we're using it, can we assume that the PRP throughput of only 300Mbps is a limit of the AM6442?
    Are you getting similar results in your environment?
    Are there any plans to improve the throughput for the PRP Non-Offload implementation?

    As you probably already know, the current latest released SDK 10.00.07.04 (14 Aug 2024) does not support PRP offloaded. There are only some limited improvements that can be found by offloading PRP as discussed in https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1397884/am6442-prp-offload-firmware-on-linux. PRP offload support is currently planned for 1H 2025.

    Thank you for the information about when PRP off-loading will be supported.
    PRP processing can be almost completely covered by dup-offload and tag-in-offload on the sender and tag-rm-offload on the receiver,
    so the effect of offloading is likely to be great.
    Even if the PRP offload function does not improve the PRP throughput beyond 300Mbps,
    I hope that the load on the A53 cores will decrease to the same level as when PRP is not used.

    Thank you in advance
    Regards,
    Takeshi OHNO

  • Hi Takeshi,

    The only processes or kernel processing running after Linux starts up are iperf3 and PRP communication.

    You should be able to see the ksoftirq threads with "ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep ksoft". These threads are related to handling Ethernet interrupts, which is why we suggested elevating their priority and changing the scheduling policy to a more real-time-friendly one.

    One way to increase the PRP throughput would be to parallelize the PRP processing so that both A53 cores on the AM6442 can be fully used.
    Is this possible?

    From what I understand, a process can only make use of more than one core at the same time if it has been programmed/designed to split into multiple parallelizable threads, at which point each thread can run on a different core. Since the PRP setup is not an application-level program and is configured via Linux drivers, I don't think parallel processing of PRP will be possible. However, I could be incorrect about this, and I will double-check with the internal team.
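
    What can be parallelized at the application level, though, is the test traffic itself: to my knowledge, iperf3 (before 3.16) is single-threaded, so one sketch of a workaround is to run two client/server pairs on separate ports and pin each to its own A53 core with taskset (the ports and -t duration are arbitrary example values):

```shell
# On EVM2 (server), listen on two ports:
#   iperf3 -s -p 5201 &
#   iperf3 -s -p 5202 &

# On EVM1 (client), pin one iperf3 instance to each A53 core:
#   taskset -c 0 iperf3 -c 192.168.1.20 -p 5201 -t 10 &
#   taskset -c 1 iperf3 -c 192.168.1.20 -p 5202 -t 10 &
#   wait

# Check the CPU affinity of a running process (here, the current shell):
taskset -p $$
```

    Adding the two sender totals then gives an estimate of the combined throughput.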

    PRP processing can be almost completely covered by dup-offload and tag-in-offload on the sender and tag-rm-offload on the receiver,
    so the effect of offloading is likely to be great.

    We currently don't have data on the specific CPU load improvements of dup-offload, tag-in-offload, and tag-rm-offload. Some effect should be present, but it won't be as great as the forwarding-offload benefit that would be seen in an HSR setup.

    A potential way to estimate the load improvement of dup-offload, tag-in-offload, and tag-rm-offload would be to test HSR both with and without those offloads enabled and compare the CPU load.

    Are you getting similar results in your environment?
    Are there any plans to improve the throughput for the PRP Non-Offload implementation?

    I recently haven't been able to test PRP in my environment, but I will try to do so on Friday or early next week. If I'm able to replicate the performance you are seeing, I will also check with the development team whether there are any plans to improve the PRP non-offload implementation. From my knowledge, there will not be, because we are planning to support PRP offload.

    Please feel free to ping this thread again if you don't hear back from me by Monday next week.

    -Daolin

  • Hi Takeshi,

    Is there a particular reason why you have chosen AM64x instead of, for example, AM62x? AM62x has a lower price and would give more throughput with zero SW effort (it has four A53 cores compared to the AM64x's two).

    You should be able to see the ksoftirq threads with "ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep ksoft". These threads are related to handling Ethernet interrupts, which is why we suggested elevating their priority and changing the scheduling policy to a more real-time-friendly one.

    Please let us know if you can see the ksoftirqs and tested what the performance is like with these elevated in priority.

    The PRP throughput was measured by connecting two SK-AM64Bs back-to-back.
    The measured throughput for iperf client-server communication was approximately 300Mbps.

    - The two Ethernet ports on the board are connected directly to the ports of the communication partner, with no L2 switch in between.
    - The PRP stack implemented in the Linux driver was used (PRP non-offload mode). The SDK used was 09.02.01.09 (29 Mar 2024).
    - In a normal single-cable connection without PRP, the throughput is over 800Mbps.

    Just to clarify, is your setup to test this the following? I.e. can you share a diagram of what your test topology looks like?

    Board 1: SK-AM64B port 0 <> port 0 SK-AM64B (Board 2)

    Board 1: SK-AM64B port 1 <> port 1 SK-AM64B (Board 2)

    -Daolin

  • Hi Daolin,

    First, a correction. I made a mistake in my original post.
    In the title at the beginning, I should have written AM6442 instead of AM6422. Sorry.
    Can you fix that?

    Is there a particular reason why you have chosen AM64x instead of for example AM62x?

    The CPU board we need is one that can operate stably at both high and low ambient temperatures.
    I've heard that the only SoC that meets this condition and is currently available is the AM64x.

    Please let us know if you can see the ksoftirqs and tested what the performance is like with these elevated in priority.

    Previously, we changed to RT scheduling and raised the priority of ksoftirq in order to speed up receive processing.
    However, simply raising the priority of ksoftirq disrupted the processing order relative to other kernel threads, resulting in a number of hard-to-diagnose system errors.
    Do you know how to safely raise the priority of ksoftirq?

    Just to clarify, is your setup to test this the following? I.e. can you share a diagram of what your test topology looks like?

    Yes, that is correct.
    I have taken a photo of our measurement environment and will send it to you for your reference.

    I recently haven't been able to test PRP in my environment but I will try to do so on Friday or early next week. If I'm able to replicate the performance you are seeing I will also need to check with the development team if there will be any plans to improve PRP Non-Offload implementation.

    We look forward to hearing from you regarding the above.

    Thank you in advance.
    Regards,
    Takeshi OHNO

  • Hi Takeshi,

    In the title at the beginning, I should have written AM6442 instead of AM6422. Sorry.
    Can you fix that?

    Got it, I just fixed the part number in the post.

    However, simply raising the priority of ksoftirq disrupted the processing order with other kernel processes, resulting in a number of incomprehensible system errors.
    Do you know how to safely raise the priority of ksoftirq?

    What priority are you raising the ksoftirqs to? From my understanding, the highest priority you should use in Linux is 98; priority 99 is used by the system tick, so running any application/thread at 99 might introduce issues.
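
    You can check the valid static priority range on your kernel with chrt (the output varies by kernel, so treat the values in the comment as typical rather than guaranteed):

```shell
# Print the min/max static priority for each scheduling policy.
# SCHED_FIFO typically reports 1/99; staying at or below 98 leaves the
# very top priority free for critical kernel threads.
chrt -m
```
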

    Regarding testing in my environment: since I wanted to clarify your setup before setting something up and running it on my side, I was not yet able to test PRP. I won't be able to get to it this week and will be out of office early next week, so I will try setting this up late next week or, more likely, the following week.

    In the meantime, if possible, can you share the exact commands you run, from boot-up of the EVMs through measuring the CPU load and throughput, so that once I get to this I can compare?

    -Daolin

  • Hi Daolin,

    Got it, I just fixed the part number in the post.

    Thank you for your trouble.

    What priority are you raising the ksoftirqs to?

    Before answering, I have one question.
    How much do you expect the throughput to improve from the current value by raising ksoftirqs to the maximum of 98?
    The CPU load from processes unrelated to the throughput measurement is 2-3%.
    As I reported before, the CPU load just before starting the throughput measurement was 2-3%.
    Therefore, I expect the throughput improvement from raising ksoftirqs to the maximum of 98 to be at most 2-3%.
    Is my thinking correct?

    In the meantime, if possible, can you share the exact commands you run, from boot-up of the EVMs through measuring the CPU load and throughput, so that once I get to this I can compare?

    We appreciate your consideration, but we would like to receive your results as soon as possible,
    so here is the command execution procedure for our environment.
    (1) Network and PRP settings after starting Linux on the client: setPRP_client.sh

    (2) Network and PRP settings after starting Linux on the server: setPRP_server.sh

    (3) PRP performance measurement procedure: HowToUse_measurementPRP.txt
    When measuring throughput, please stop the CPU load measurement and tcpdump, as they may affect the results.
    - Client script: measurementPRP_client.sh
    - Server script: measurementPRP_server.sh

    Takeshi OHNO

    MeasurementProcedure.zip

  • Hello Takeshi,

    Thank you for your patience. I was able to see similar results of ~300Mbps throughput when PRP was set up on two AM64x EVMs (for me, TMDS64EVMs) using the CPSW Ethernet interfaces.

    Topology:

    TMDS64EVM1 eth0 <> TMDS64EVM2 eth0

    TMDS64EVM1 eth1 <> TMDS64EVM2 eth1

    SDK Version:

    RT-Linux 10.00.07

    Test Sequence:

    1. Run sh ./prp_setup.sh prp eth0 eth1 192.168.1.10 on EVM1 where prp_setup.sh is from the script in https://software-dl.ti.com/processor-sdk-linux/esd/AM64X/10_00_07_04/exports/docs/linux/Foundational_Components/Kernel/Kernel_Drivers/Network/HSR_PRP_Non_Offload.html?highlight=prp

    2. Run sh ./prp_setup.sh prp eth0 eth1 192.168.1.20 on EVM2

    3. Ping from EVM1 to EVM2 and disconnect one cable path --> no discontinuation of communication

    4. Set up an iperf3 server on EVM2 and an iperf3 client on EVM1 using TCP communication; I see the result below. What is concerning is that retries occur in every interval, which indicates that many packets are being dropped. This in turn could be influencing the throughput, resulting in a lower-than-expected rate.

    EVM1 Console Log:
    
    root@am64xx-evm:~# iperf3 -c 192.168.1.20                                                           
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 44482 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  26.0 MBytes   218 Mbits/sec   23   46.5 KBytes       
    [  5]   1.00-2.00   sec  26.2 MBytes   220 Mbits/sec   20   36.6 KBytes       
    [  5]   2.00-3.00   sec  26.8 MBytes   224 Mbits/sec   22   52.1 KBytes       
    [  5]   3.00-4.00   sec  26.6 MBytes   223 Mbits/sec   22   76.0 KBytes       
    [  5]   4.00-5.00   sec  32.5 MBytes   273 Mbits/sec   13   97.2 KBytes       
    [  5]   5.00-6.00   sec  38.9 MBytes   326 Mbits/sec   32   81.7 KBytes       
    [  5]   6.00-7.00   sec  38.8 MBytes   325 Mbits/sec   11   98.6 KBytes       
    [  5]   7.00-8.00   sec  39.6 MBytes   332 Mbits/sec   14   80.3 KBytes       
    [  5]   8.00-9.00   sec  39.8 MBytes   333 Mbits/sec   12   69.0 KBytes       
    [  5]   9.00-10.01  sec  38.8 MBytes   322 Mbits/sec   12   80.3 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.01  sec   334 MBytes   280 Mbits/sec  181             sender
    [  5]   0.00-10.01  sec   334 MBytes   280 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 
    

    Some additional actions I tried were:

    1. Changing the iperf3 priority to 60 and scheduling policy from Round Robin to FIFO --> increased throughput slightly to ~450Mbps

    2. Changing ksoftirq0 and ksoftirq1 priority to 70 and scheduling policy from Time Slice to FIFO --> did not result in any noticeable changes to throughput
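
    As a quick check (not a full profile) of whether the receive-side work is actually spread across both cores, you can snapshot the per-CPU softirq counters:

```shell
# The CPU0/CPU1 columns show how many NET_RX/NET_TX softirqs each core
# has handled; a strong imbalance means one core is doing most of the
# Ethernet receive processing.
grep -E 'CPU|NET_RX|NET_TX' /proc/softirqs
```
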

    Is there a particular reason why you have chosen AM64x instead of for example AM62x?
    The CPU board we need is one that can operate stably in both high and low environmental temperatures.
    I've heard that the only SoC that meets this condition and is currently available is the AM64x.

    May I ask where you heard this information from? Comparing the datasheets for AM64x and AM62x the temperature range is the same for both.

    AM62x: https://www.ti.com/document-viewer/AM625/datasheet#GUID-086D5402-AAEC-4ED6-B0BD-CFF7E4FE15D5/TITLE-SPRSP56INT_SPRS858_PACK_00005

    AM64x: https://www.ti.com/document-viewer/AM6442/datasheet#GUID-CD6E613A-70C1-4449-8B74-B692CA434DE6/TITLE-SPRSP56INT_SPRS858_PACK_00005

    Again, we point to the AM62x as a better option because of its higher CPU frequency and core count, which can potentially improve throughput performance.

    What is the end equipment that you are trying to implement PRP for? It might be useful to try and compare PRP performance on AM62x if you have access to some AM62x SK-EVMs.

    -Daolin

  • Hi Daolin,

    Thank you for measuring at your company as well.
    I'm relieved that you got the same results.

    4. Set up an iperf3 server on EVM2 and an iperf3 client on EVM1 using TCP communication; I see the result below. What is concerning is that retries occur in every interval, which indicates that many packets are being dropped. This in turn could be influencing the throughput, resulting in a lower-than-expected rate.

    In my measurements, I observed packet drops, but they occurred infrequently, so I believe the impact on throughput is small.

    1. Changing the iperf3 priority to 60 and scheduling policy from Round Robin to FIFO --> increased throughput slightly to ~450Mbps

    I'm amazed by your results.
    The throughput has improved by just under 50%.
    I'll measure it under the same conditions.
    However, the version of RT-Linux used here is 09.02.01.09.

    May I ask where you heard this information from? Comparing the datasheets for AM64x and AM62x the temperature range is the same for both.

    Certainly, what you say is correct.
    I will now check the authenticity of my information. It is possible that I have misunderstood something.
    Please wait a moment.

    Regards,
    Takeshi OHNO

  • Hi Daolin,

    1. Changing the iperf3 priority to 60 and scheduling policy from Round Robin to FIFO --> increased throughput slightly to ~450Mbps

    We performed the same measurements as above in our environment, but did not see the improvement in throughput that you experienced.
    Our results showed roughly the same throughput as last time.

    So, I have three questions for you:

    1. I have attached the file with our measurements; is there any difference between our procedure and yours?
    - Although the type of evaluation board is different, I think the performance is almost the same.
    - I changed the SDK version from 09.02.01.09 to 10.00.07 to match yours, and the results were the same.

    FIFO_SchedulingPolicy.zip

    2. Can you observe a throughput of 450Mbps when repeating the test several times?
    I would like to know how much your measured throughput fluctuates, so please tell me the maximum and minimum values.

    3. Why do you think the above changes improved your throughput?
    Processing unrelated to the measurement takes up only a few percent of the CPU load.
    Even so, the above changes improved throughput by nearly 50%.
    For that reason, I do not think the improvement comes simply from raising the priority.
    If you change to the FIFO scheduling policy of RT-Linux, is it possible that the mechanism of thread waiting and synchronization changes?

    May I ask where you heard this information from? Comparing the datasheets for AM64x and AM62x the temperature range is the same for both.

    The information I explained was from before the AM62x specs were made public. My apologies.
    Even now, the reason to choose the AM64x is that it has four Ethernet ports,
    while the AM62x only has two.
    I'm not a hardware expert, but I understand it would be difficult to increase the number of ports on the AM62x to four.

    Regards,
    Takeshi OHNO

  • Hi Takeshi,

    1. I have attached the file with our measurements, but is there any difference between our measurements and your procedure?

    Are you able to double check the Sched Policy and Priority of your iperf3 configuration with "ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep iperf"?

    I do not think there are any significant differences between our setups other than what you have already pointed out.

    We performed the same measurements as above in our environment, but did not see the improvement in throughput that you experienced.
    2. Can you observe a throughput of 450Mbps by repeating the process several times?

    I attempted to reproduce the ~450Mbps throughput I saw in last week's testing. Unfortunately, using the same steps as before, I saw a maximum of only ~320Mbps across 4 test runs.

    root@am64xx-evm:~# chrt -f 60 iperf3 -c 192.168.1.20
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 50890 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  37.1 MBytes   311 Mbits/sec   19   40.8 KBytes       
    [  5]   1.00-2.00   sec  38.8 MBytes   325 Mbits/sec   18   70.4 KBytes       
    [  5]   2.00-3.00   sec  39.6 MBytes   332 Mbits/sec   38   76.0 KBytes       
    [  5]   3.00-4.00   sec  39.0 MBytes   327 Mbits/sec   19   81.7 KBytes       
    [  5]   4.00-5.00   sec  39.1 MBytes   328 Mbits/sec   35   60.6 KBytes       
    [  5]   5.00-6.00   sec  39.5 MBytes   331 Mbits/sec   30   62.0 KBytes       
    [  5]   6.00-7.00   sec  39.5 MBytes   331 Mbits/sec   19   70.4 KBytes       
    [  5]   7.00-8.00   sec  40.2 MBytes   338 Mbits/sec   17   91.5 KBytes       
    [  5]   8.00-9.00   sec  39.5 MBytes   331 Mbits/sec   15    107 KBytes       
    [  5]   9.00-10.00  sec  40.5 MBytes   339 Mbits/sec   19   76.0 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   393 MBytes   330 Mbits/sec  229             sender
    [  5]   0.00-10.00  sec   392 MBytes   329 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 
    root@am64xx-evm:~# chrt -f 60 iperf3 -c 192.168.1.20
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 42044 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  26.4 MBytes   221 Mbits/sec   33   63.4 KBytes       
    [  5]   1.00-2.00   sec  26.5 MBytes   222 Mbits/sec   27   54.9 KBytes       
    [  5]   2.00-3.00   sec  26.1 MBytes   219 Mbits/sec   22   36.6 KBytes       
    [  5]   3.00-4.00   sec  26.1 MBytes   219 Mbits/sec   25   53.5 KBytes       
    [  5]   4.00-5.00   sec  25.4 MBytes   213 Mbits/sec   25   52.1 KBytes       
    [  5]   5.00-6.00   sec  31.0 MBytes   260 Mbits/sec   18   80.3 KBytes       
    [  5]   6.00-7.00   sec  38.1 MBytes   320 Mbits/sec   15   97.2 KBytes       
    [  5]   7.00-8.00   sec  37.8 MBytes   317 Mbits/sec   17   64.8 KBytes       
    [  5]   8.00-9.00   sec  38.2 MBytes   321 Mbits/sec   18   60.6 KBytes       
    [  5]   9.00-10.00  sec  38.0 MBytes   318 Mbits/sec   18   56.3 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   314 MBytes   263 Mbits/sec  218             sender
    [  5]   0.00-10.00  sec   314 MBytes   263 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 
    root@am64xx-evm:~# chrt -f 60 iperf3 -c 192.168.1.20
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 57844 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  37.8 MBytes   317 Mbits/sec   14   94.3 KBytes       
    [  5]   1.00-2.00   sec  36.2 MBytes   304 Mbits/sec   16   88.7 KBytes       
    [  5]   2.00-3.00   sec  38.8 MBytes   325 Mbits/sec   16   62.0 KBytes       
    [  5]   3.00-4.00   sec  38.2 MBytes   321 Mbits/sec   14   67.6 KBytes       
    [  5]   4.00-5.00   sec  38.5 MBytes   323 Mbits/sec   18   49.3 KBytes       
    [  5]   5.00-6.00   sec  37.9 MBytes   318 Mbits/sec   17   81.7 KBytes       
    [  5]   6.00-7.00   sec  38.5 MBytes   323 Mbits/sec   15   64.8 KBytes       
    [  5]   7.00-8.00   sec  38.5 MBytes   323 Mbits/sec   15    103 KBytes       
    [  5]   8.00-9.00   sec  39.0 MBytes   327 Mbits/sec   15   50.7 KBytes       
    [  5]   9.00-10.00  sec  38.0 MBytes   318 Mbits/sec   14    100 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   382 MBytes   320 Mbits/sec  154             sender
    [  5]   0.00-10.01  sec   380 MBytes   319 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 
    root@am64xx-evm:~# chrt -f 60 iperf3 -c 192.168.1.20
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 55036 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  26.0 MBytes   218 Mbits/sec   44   40.8 KBytes       
    [  5]   1.00-2.00   sec  25.6 MBytes   215 Mbits/sec   26   46.5 KBytes       
    [  5]   2.00-3.00   sec  27.2 MBytes   229 Mbits/sec   20   31.0 KBytes       
    [  5]   3.00-4.00   sec  25.4 MBytes   213 Mbits/sec   25   36.6 KBytes       
    [  5]   4.00-5.00   sec  26.2 MBytes   220 Mbits/sec   26   50.7 KBytes       
    [  5]   5.00-6.00   sec  26.2 MBytes   220 Mbits/sec   32   54.9 KBytes       
    [  5]   6.00-7.00   sec  27.0 MBytes   226 Mbits/sec   21   50.7 KBytes       
    [  5]   7.00-8.00   sec  28.0 MBytes   235 Mbits/sec   22   57.7 KBytes       
    [  5]   8.00-9.00   sec  39.4 MBytes   330 Mbits/sec   17   56.3 KBytes       
    [  5]   9.00-10.00  sec  39.1 MBytes   328 Mbits/sec   20   56.3 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   291 MBytes   244 Mbits/sec  253             sender
    [  5]   0.00-10.00  sec   290 MBytes   243 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 
    root@am64xx-evm:~# 
    root@am64xx-evm:~# chrt -f 60 iperf3 -c 192.168.1.20
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 43844 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  33.2 MBytes   279 Mbits/sec   18    100 KBytes       
    [  5]   1.00-2.00   sec  35.6 MBytes   299 Mbits/sec   15   66.2 KBytes       
    [  5]   2.00-3.00   sec  34.4 MBytes   288 Mbits/sec   14   74.6 KBytes       
    [  5]   3.00-4.00   sec  34.8 MBytes   292 Mbits/sec   16   73.2 KBytes       
    [  5]   4.00-5.00   sec  35.6 MBytes   299 Mbits/sec   33   78.9 KBytes       
    [  5]   5.00-6.00   sec  38.6 MBytes   324 Mbits/sec   22   56.3 KBytes       
    [  5]   6.00-7.00   sec  39.2 MBytes   329 Mbits/sec   18   40.8 KBytes       
    [  5]   7.00-8.00   sec  39.5 MBytes   331 Mbits/sec   18   40.8 KBytes       
    [  5]   8.00-9.00   sec  39.0 MBytes   327 Mbits/sec   17   69.0 KBytes       
    [  5]   9.00-10.00  sec  40.4 MBytes   338 Mbits/sec   15   83.1 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   370 MBytes   311 Mbits/sec  186             sender
    [  5]   0.00-10.01  sec   370 MBytes   310 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 

    3. Why do you think your throughput was improved by the above changes?

    The FIFO scheduling policy is a "real-time" policy that will always immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or SCHED_IDLE processes. Theoretically, changing to FIFO should keep background processes from preempting the iperf3 process.

    However, in theory the main throughput improvement should come from the priority itself (the higher the priority, the better the expected throughput).

    Comparing the three measurements below, #3 shows a much lower CPU load for the iperf3 process. What is strange, however, is that I was no longer able to see the throughput improvement from last week.

    1. Baseline CPSW (no PRP) CPU Load = ~35.8%

    2. Baseline PRP CPU Load = ~77.2%

    3. FIFO priority 85 CPU load = ~17.6%

    4. Baseline CPSW CPU load
        0[####************************************************************                         66.9%] Tasks: 34, 16 thr, 119 kthr; 0 running
        1[*********************************************************************************        86.0%] Load average: 1.38 0.70 0.28 
      Mem[||||||#@$$$$$$$$$$                                                                  114M/1.78G] Uptime: 00:02:52
      Swp[                                                                                         0K/0K]
    
      [Main] [I/O]
        PID USER       PRI  NI  VIRT   RES   SHR S  CPU%-MEM%   TIME+  Command                                                                                                                                  
        956 root        20   0 17560  3328  2688 S  35.8  0.2  0:20.33 iperf3 -c 192.168.2.20 -t0                                                                                                               
        957 root        20   0  5588  3840  2432 R   4.1  0.2  0:02.65 htop
    
    5. Baseline PRP CPU load
        0[###**************************************************************************            80.5%] Tasks: 35, 16 thr, 119 kthr; 0 running
        1[*********************************************************                                59.9%] Load average: 1.57 0.96 0.42 
      Mem[||||||#@$$$$$$$$$$                                                                  109M/1.78G] Uptime: 00:04:20
      Swp[                                                                                         0K/0K]
    
      [Main] [I/O]
        PID USER       PRI  NI  VIRT   RES   SHR S  CPU%-MEM%   TIME+  Command                                                                                                                                  
       1038 root       -51   0 17560  3328  2688 R  77.2  0.2  0:19.59 iperf3 -c 192.168.1.20 -t0                                                                                                               
       1039 root        20   0  5584  3840  2432 R   4.0  0.2  0:01.04 htop
        961 root        20   0 10136  2432  1152 S   1.3  0.1  0:01.04 rpmsg_json /usr/share/benchmark-server/app/oob_data.json
    
    6. FIFO 85 iperf CPU load
        0[###*************************************************                                     54.9%] Tasks: 35, 16 thr, 119 kthr; 0 running
        1[#********************************************************************                    72.7%] Load average: 1.58 1.04 0.47 
      Mem[||||||#@$$$$$$$$$$                                                                  110M/1.78G] Uptime: 00:05:09
      Swp[                                                                                         0K/0K]
    
      [Main] [I/O]
        PID USER       PRI  NI  VIRT   RES   SHR S  CPU%-MEM%   TIME+  Command                                                                                                                                  
       1051 root       -86   0 17560  3328  2688 S  17.6  0.2  0:02.19 iperf3 -c 192.168.1.20 -t0                                                                                                               
       1052 root        20   0  5584  3840  2432 R   4.1  0.2  0:00.56 htop
        157 root        20   0 28084  8320  5760 S   1.4  0.4  0:01.91 /usr/lib/systemd/systemd-udevd
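
    As a side note, the htop views above can also be captured non-interactively for logging. A minimal sketch using procps `top` in batch mode (assuming `top` is present in the SDK filesystem, as it typically is):

    ```shell
    # One-shot, scriptable snapshot of per-process CPU usage, similar to the
    # interactive htop views above; -b is batch mode, -n 1 takes a single sample
    top -b -n 1 | head -n 20
    ```

    Redirecting this to a file at fixed intervals gives a record of the per-process CPU figures without needing an interactive terminal.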
    

    On the other hand, the AM62x only has two ports.

    Yes, it looks like from the product page that the AM62x only supports 2 external Ethernet ports: https://www.ti.com/product/AM625#features

    I'll need to check with the internal team to see if there are additional ideas as to why the observed throughput is lower than expected. I'm aiming to give an update later this week.

    -Daolin

  • Hi Daolin,

    Thank you for attempting to reproduce the ~450Mbps throughput test.

    I'll need to check with the internal team to see if there are additional ideas as to why the observed throughput is lower than expected. I'm aiming to give an update later this week.

    I'm waiting for an update from you, but I haven't heard anything.
    So, I've concluded the following from your AM6442 measurements. Is this correct?
    - PRP throughput is limited to ~320Mbps.
    - The ~450Mbps throughput could not be reproduced. I think there must have been some kind of mistake.

    Are you able to double check the Sched Policy and Priority of your iperf3 configuration with "ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep iperf"?

    The output of the above command is as follows for both Client and Server. Is it correct?
    1 FF -61 7595 7595 00:00:01 iperf3

    However, theoretically, the main improvement in throughput should be from the priority (higher the priority, the better the throughput is the theoretical behavior).

    We believe that your theory above does not hold in the following situation. Don't you agree?
    When one specific process is using most of the CPU and the other processes are barely using it,
    the performance of that process will not change whether its priority is high or low.

    4. Baseline CPSW CPU load
        0[####************************************************************                         66.9%] Tasks: 34, 16 thr, 119 kthr; 0 running
        1[*********************************************************************************        86.0%] Load average: 1.38 0.70 0.28
      Mem[||||||#@$$$$$$$$$$                                                                  114M/1.78G] Uptime: 00:02:52
      Swp[                                                                                         0K/0K]

      [Main] [I/O]

    Please tell me the procedure for measuring CPU load with iperf3 in your environment.
    What Linux command is used to produce the output above?
    Our measurement procedure uses the vmstat command:
    we run vmstat at two points, before and during the execution of iperf3,
    and calculate the CPU load from the increase between the two CPU load values that are output.
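
    The two-point measurement described above can be scripted. Below is a minimal sketch that reads /proc/stat (the same counters vmstat reports) and computes the overall CPU busy percentage over a one-second window; it is an illustration only, not part of either test setup, and it folds all non-idle fields together as a simple approximation:

    ```shell
    #!/bin/sh
    # Compute overall CPU busy% over a 1-second window from /proc/stat,
    # mirroring the before/during vmstat comparison described above.
    cpu_snapshot() {
        # Aggregate "cpu" line: user nice system idle iowait irq softirq steal
        awk '/^cpu / { print $2+$3+$4+$5+$6+$7+$8+$9, $5 }' /proc/stat
    }

    read -r t1 i1 <<EOF
    $(cpu_snapshot)
    EOF
    sleep 1
    read -r t2 i2 <<EOF
    $(cpu_snapshot)
    EOF

    # busy% = 100 * (delta_total - delta_idle) / delta_total
    echo "CPU busy: $(( 100 * ((t2 - t1) - (i2 - i1)) / (t2 - t1) ))%"
    ```

    Running this once before starting iperf3 and once during the run gives the two readings whose difference is the load attributable to the test.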

    1. Baseline CPSW (no PRP) CPU Load = ~35.8%

    2. Baseline PRP CPU Load = ~77.2%

    3. FIFO priority 85 CPU load = ~17.6%

    The CPU load rates of iperf3 you measured above are all lower than ours, so your CPU still appears to have headroom.
    In this case, where do you think the bottleneck in iperf3/PRP throughput is?
    According to our measurements using vmstat, the CPU load is over 90% when the throughput of iperf3/PRP is 300Mbps.
    Based on these results, we believe that the throughput bottleneck is in the CPU processing capacity.

    I have three new questions.
    - Will the PRP offload function of AM6442, scheduled for the first half of 2025, work with both PRU-ICSSG and CPSW3g?

    - I would like to know the iperf3 throughput value with the PRP offload function as soon as possible.
      When will you be able to provide a beta version in advance?
      If it is difficult to provide one, please tell us the performance target value estimated at the time of design.
    - Is the PRP offload function based on an implementation that improves on the performance of the current PRP stack?

    Regards,
    Takeshi OHNO

  • Hi Takeshi,

    I'm waiting for an update from you, but I haven't heard anything.

    Thanks for following up, and apologies for the late response. I still have not learned of any useful information as to why this throughput would be around ~300Mbps. What I'm currently trying to understand is the relationship between the throughput and the CPU load.

    So, I've concluded the following from your AM6442 measurements. Is this correct?
    - PRP throughput is limited to ~320Mbps.
    - The ~450Mbps throughput could not be reproduced. I think there must have been some kind of mistake.

    Yes, this is correct; I was unable to replicate the ~450Mbps throughput.

    The output of the above command is as follows for both Client and Server. Is it correct?
    1 FF -61 7595 7595 00:00:01 iperf3

    This is correct and matches what I see when I configure iperf3 for FIFO and priority 60.

    We believe that your theory above does not hold in the following situation. Don't you agree?
    When one specific process is using most of the CPU and the other processes are barely using it,
    the performance of that process will not change whether its priority is high or low.

    Yes, I see your point. If increasing the priority does not impact throughput, then from your perspective, decreasing the CPU load should in turn improve the throughput?

    Please tell me the procedure for measuring CPU load with iperf3 in your environment.
    What Linux command is used to produce the output above?
    Our measurement procedure uses the vmstat command:
    we run vmstat at two points, before and during the execution of iperf3,
    and calculate the CPU load from the increase between the two CPU load values that are output.

    1. Setup PRP on both EVMs

    2. Run iperf3 with elevated priority (priority 85)

    3. Run htop to view the CPU load. This Linux utility tool should already be included in the filesystem from the SDK and can be used to view information about all processes that are currently running live. Information about htop: https://htop.dev/ 

    4. Compare the CPU load from htop to the case of PRP not being setup (Baseline CPSW) and the case of iperf3 without the elevated priority (Baseline PRP)
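
    For reference, step 2 above can be done with `chrt` from util-linux. A minimal sketch; iperf3, the server address 192.168.1.20, and root privileges are assumptions taken from this thread's setup, and the snippet guards them so it degrades gracefully when they are absent:

    ```shell
    #!/bin/sh
    # If root and iperf3 are available, launch the client under SCHED_FIFO
    # at priority 85 (the configuration used in step 2); otherwise skip.
    if [ "$(id -u)" -eq 0 ] && command -v iperf3 >/dev/null 2>&1; then
        chrt --fifo 85 iperf3 -c 192.168.1.20 -t 10 &
    fi

    # Verify the applied policy and priority (FF = SCHED_FIFO), using the
    # same ps invocation quoted earlier in this thread
    ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep iperf || true

    # For contrast, a normal process shows policy TS (SCHED_OTHER):
    ps -o policy=,pri= -p $$
    ```

    The same `chrt` approach applies to the ksoftirqd threads mentioned earlier, using their PIDs instead of launching a new command.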

    The CPU load rates of iperf3 you measured above are all lower than ours, so your CPU still appears to have headroom.
    In this case, where do you think the bottleneck in iperf3/PRP throughput is?
    According to our measurements using vmstat, the CPU load is over 90% when the throughput of iperf3/PRP is 300Mbps.
    Based on these results, we believe that the throughput bottleneck is in the CPU processing capacity.

    Could you share the result of running htop (a snapshot of the entire output)? I'm wondering if there might be some other processes running in the background that resulted in the higher CPU load you reported.

    Will the PRP offload function of AM6442, scheduled for the first half of 2025, work with both PRU-ICSSG and CPSW3g?

    From my knowledge, the PRP offload function is planned specifically for PRU-ICSSG Ethernet interfaces. This is because the feature offloads processing to the PRU cores, which can only be done through the PRU-ICSSG interfaces. This is currently how HSR offload is handled.

    May I ask why you are interested in a PRP topology as opposed to an HSR topology? 

    When will you be able to provide a beta version in advance?
    If it is difficult to provide one, please tell us the performance target value estimated at the time of design.
    - Is the PRP offload function based on an implementation that improves on the performance of the current PRP stack?

    From my understanding, since the PRP offload function is still in development, we currently don't have a date for when a version can be shared in advance. I can check whether there is a performance target value, but I'm guessing the performance will be similar to the offloaded HSR performance. 

    -Daolin