AM6442: PRP throughput on Linux

Part Number: AM6442
Other Parts Discussed in Thread: SK-AM64B, AM6442

Dear Experts,

The PRP throughput was measured by connecting two SK-AM64Bs back-to-back.
The measured throughput for iperf client-server communication was approximately 300Mbps.

- The two Ethernet ports on the board are connected directly to the ports of the communication partner, with no L2 switch in between.
- The PRP stack implemented in the Linux driver was used (PRP non-offload mode). The SDK used was 09.02.01.09 (29 Mar 2024).
- In a normal single-cable connection without PRP, the throughput is over 800Mbps.

So here are my questions. Please answer as many as possible.

Is the PRP throughput of 300Mbps for the SK-AM64B reasonable?
1) Is there an error in the way I'm using PRP?
Is the interface speed supported by PRP 1Gbps, or is it 100Mbps?
Does the PRP in the SDK I used implement the IEC 62439-3 Edition 3 standard?

2) How can I increase the throughput?
If I use two Linux PCs instead of SK-AM64Bs, the PRP throughput is 1Gbps.
Will updating to the latest SDK 10.00.07.04 (14 Aug 2024) improve it?
Are there any tuning items?

3) Is this the performance limit of non-offload mode?
When will the PRP offload modules (firmware) for PRU-ICSSG and CPSW3g be completed?
When will the Linux drivers for these offload modules be provided?

Thank you in advance
Regards,
Takeshi OHNO

  • Hello Takeshi,

    2) How can I increase the throughput?

    A couple of initial suggestions for this:

    • elevate the priority of the ksoftirq threads
    • change the scheduling policy of ksoftirq from "TS" to "FIFO" and elevate the priority
    • elevate the priority of iperf3 if that is the test program you are using to test the throughput of PRP
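
    As a concrete sketch (the PID and priority values below are examples, not taken from your system), these suggestions could be applied with chrt from util-linux:

```shell
# List the ksoftirqd threads with their CPU, scheduling policy, and priority.
# (The fallback echo keeps the pipeline from failing in containers where
# kernel threads are not visible.)
ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep ksoft || echo "no ksoftirqd threads visible"

# Example: switch one ksoftirqd thread to SCHED_FIFO at priority 70
# (replace 14 with the PID reported above):
#   chrt --fifo -p 70 14

# Example: run the iperf3 client itself under SCHED_FIFO at priority 60:
#   chrt --fifo 60 iperf3 -c 192.168.1.20
```
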
    1) Is there an error in the way I'm using PRP?

    Are you using the setup steps provided in https://software-dl.ti.com/processor-sdk-linux/esd/AM64X/09_02_01_09/exports/docs/linux/Foundational_Components/Kernel/Kernel_Drivers/Network/HSR_PRP_Non_Offload.html ? According to the same page, IEC 62439-3 is supported.
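
    For reference, the core of that setup script creates the PRP interface with iproute2. A minimal sketch (the interface names and IP address follow the documentation example; this needs root and the hsr kernel module):

```shell
# Bring up the two member ports that will carry the duplicated frames.
ip link set eth0 up
ip link set eth1 up

# Create the PRP interface: the "hsr" link type covers both protocols,
# and "proto 1" selects PRP (proto 0 would be HSR).
ip link add name prp0 type hsr slave1 eth0 slave2 eth1 proto 1

# Assign the address and bring the PRP interface up.
ip addr add 192.168.1.10/24 dev prp0
ip link set prp0 up
```
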

    When will the PRP offload modules (firmware) for PRU-ICSSG and CPSW3g be completed?

    As you probably already know, the current latest released SDK 10.00.07.04 (14 Aug 2024) does not support PRP offload. Only limited improvements are expected from offloading PRP, as discussed in https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1397884/am6442-prp-offload-firmware-on-linux. PRP offload support is currently planned for 1H 2025.

    -Daolin

  • Hello Daolin,

    Thank you for your quick response.
    Please allow me to continue this discussion a little longer.
    I would like to ask a few additional questions.

    A couple of initial suggestions for this:

    • elevate the priority of the ksoftirq threads
    • change the scheduling policy of ksoftirq from "TS" to "FIFO" and elevate the priority
    • elevate the priority of iperf3 if that is the test program you are using to test the throughput of PRP

    I am using iperf3.
    The only processes or kernel processing running after Linux starts up are iperf3 and PRP communication.
    Before starting iperf3, the A53 load is only 2-3%, but while iperf3 is running it reaches its limit of about 90%.
    The method you advised prioritizes iperf3 and related processes over others,
    so I don't think it will be very effective when there are almost no other processes.
    One way to increase the PRP throughput would be to parallelize the PRP processing so that both A53 cores on the AM6442 can be fully used.
    Is this possible?
    Without using PRP, the iperf3 throughput is 800Mbps, but the CPU load at that time reaches its limit of about 90%.

    Yes, it's almost the same procedure as the information you gave me. The difference is that I used "ifconfig" instead of "ip link" for the setup.
    Since there doesn't seem to be a problem with how we're using it, can we assume that the PRP throughput of only 300Mbps is a limit of the AM6442?
    Are you getting similar results in your environment?
    Are there any plans to improve the throughput for the PRP Non-Offload implementation?

    As you probably already know, the current latest released SDK 10.00.07.04 (14 Aug 2024) does not support PRP offloaded. There are only some limited improvements that can be found by offloading PRP as discussed in https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1397884/am6442-prp-offload-firmware-on-linux. PRP offload support is currently planned for 1H 2025.

    Thank you for the information about when PRP off-loading will be supported.
    PRP processing can be almost completely covered by dup-offload and tag-in-offload on the sender and tag-rm-offload on the receiver,
    so the effect of offloading is likely to be great.
    Even if the PRP offload function does not improve the PRP throughput beyond 300Mbps,
    I hope that the load on the A53 cores will decrease to the same level as when PRP is not used.

    Thank you in advance
    Regards,
    Takeshi OHNO

  • Hi Takeshi,

    The only processes or kernel processing running after Linux starts up are iperf3 and PRP communication.

    You should be able to see the ksoftirq threads with "ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep ksoft". These threads are related to handling Ethernet interrupts, which is why we suggested elevating their priority and changing the scheduling policy to a more real-time-friendly one.

    One way to increase the PRP throughput would be to parallelize the PRP processing so that both A53 cores on the AM6442 can be fully used.
    Is this possible?

    From what I understand, a process can only make use of more than one core at the same time if it has been programmed/designed to split into multiple parallelizable threads, at which point each thread can run on a different core. Since the PRP setup is not an application-level program and is configured via Linux drivers, I don't think parallel processing of PRP will be possible. However, I could be incorrect about this, and I will double-check with the internal team.
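
    What can be parallelized at the application level, though, is the test traffic itself: to my knowledge, iperf3 (before 3.16) is single-threaded, so one sketch of a workaround is to run two client/server pairs on separate ports and pin each to its own A53 core with taskset (the ports and -t duration are arbitrary example values):

```shell
# On EVM2 (server), listen on two ports:
#   iperf3 -s -p 5201 &
#   iperf3 -s -p 5202 &

# On EVM1 (client), pin one iperf3 instance to each A53 core:
#   taskset -c 0 iperf3 -c 192.168.1.20 -p 5201 -t 10 &
#   taskset -c 1 iperf3 -c 192.168.1.20 -p 5202 -t 10 &
#   wait

# Check the CPU affinity of a running process (here, the current shell):
taskset -p $$
```

    Adding the two sender totals then gives an estimate of the combined throughput.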

    PRP processing can be almost completely covered by dup-offload and tag-in-offload on the sender and tag-rm-offload on the receiver,
    so the effect of offloading is likely to be great.

    We currently don't have data on the specific CPU load improvements of dup-offload, tag-in-offload, and tag-rm-offload. Some effect should be present, but it won't be as great as the forwarding-offload benefit that would be seen in an HSR setup.

    A potential way to estimate the load improvement of dup-offload, tag-in-offload, and tag-rm-offload would be to test HSR both with and without those offloads enabled and compare the CPU load.

    Are you getting similar results in your environment?
    Are there any plans to improve the throughput for the PRP Non-Offload implementation?

    I recently haven't been able to test PRP in my environment, but I will try to do so on Friday or early next week. If I'm able to replicate the performance you are seeing, I will also check with the development team whether there are any plans to improve the PRP non-offload implementation. From my knowledge, there will not be, because we are planning to support PRP offload.

    Please feel free to ping this thread again if you don't hear back from me by Monday next week.

    -Daolin

  • Hi Takeshi,

    Is there a particular reason why you have chosen AM64x instead of, for example, AM62x? AM62x has a lower price and would give more throughput with zero SW effort (it has four A53 cores compared to the AM64x's two).

    You should be able to see the ksoftirq threads with "ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep ksoft". These threads are related to handling Ethernet interrupts, which is why we suggested elevating their priority and changing the scheduling policy to a more real-time-friendly one.

    Please let us know if you can see the ksoftirqs and tested what the performance is like with these elevated in priority.

    The PRP throughput was measured by connecting two SK-AM64Bs back-to-back.
    The measured throughput for iperf client-server communication was approximately 300Mbps.

    - The two Ethernet ports on the board are connected directly to the ports of the communication partner, with no L2 switch in between.
    - The PRP stack implemented in the Linux driver was used (PRP non-offload mode). The SDK used was 09.02.01.09 (29 Mar 2024).
    - In a normal single-cable connection without PRP, the throughput is over 800Mbps.

    Just to clarify, is your setup to test this the following? I.e. can you share a diagram of what your test topology looks like?

    Board 1: SK-AM64B port 0 <> port 0 SK-AM64B (Board 2)

    Board 1: SK-AM64B port 1 <> port 1 SK-AM64B (Board 2)

    -Daolin

  • Hi Daolin,

    First, a correction. I made a mistake in my original post.
    In the title at the beginning, I should have written AM6442 instead of AM6422. Sorry.
    Can you fix that?

    Is there a particular reason why you have chosen AM64x instead of for example AM62x?

    The CPU board we need is one that can operate stably at both high and low ambient temperatures.
    I've heard that the only SoC that meets this condition and is currently available is the AM64x.

    Please let us know if you can see the ksoftirqs and tested what the performance is like with these elevated in priority.

    Previously, we changed to RT scheduling and raised the priority of ksoftirq in order to speed up receive processing.
    However, simply raising the priority of ksoftirq disrupted the processing order relative to other kernel threads, resulting in a number of hard-to-diagnose system errors.
    Do you know how to safely raise the priority of ksoftirq?

    Just to clarify, is your setup to test this the following? I.e. can you share a diagram of what your test topology looks like?

    Yes, that is correct.
    I have taken a photo of our measurement environment and will send it to you for your reference.

    I recently haven't been able to test PRP in my environment but I will try to do so on Friday or early next week. If I'm able to replicate the performance you are seeing I will also need to check with the development team if there will be any plans to improve PRP Non-Offload implementation.

    We look forward to hearing from you regarding the above.

    Thank you in advance.
    Regards,
    Takeshi OHNO

  • Hi Takeshi,

    In the title at the beginning, I should have written AM6442 instead of AM6422. Sorry.
    Can you fix that?

    Got it, I just fixed the part number in the post.

    However, simply raising the priority of ksoftirq disrupted the processing order with other kernel processes, resulting in a number of incomprehensible system errors.
    Do you know how to safely raise the priority of ksoftirq?

    What priority are you raising the ksoftirqs to? From my understanding, the highest priority you should use in Linux is 98; priority 99 is used by the system tick, so running any application/thread at 99 might introduce issues.
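
    You can check the valid static priority range on your kernel with chrt (the output varies by kernel, so treat the values in the comment as typical rather than guaranteed):

```shell
# Print the min/max static priority for each scheduling policy.
# SCHED_FIFO typically reports 1/99; staying at or below 98 leaves the
# very top priority free for critical kernel threads.
chrt -m
```
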

    Regarding testing in my environment: since I wanted to clarify your setup before setting something up and running it on my side, I was not yet able to test PRP. I won't be able to get to it this week and will be out of office early next week, so I will try setting this up late next week or, more likely, the following week.

    In the meantime, if possible, can you share the exact commands you run, from boot-up of the EVMs through measuring the CPU load and throughput, so that once I get to this I can compare?

    -Daolin

  • Hi Daolin,

    Got it, I just fixed the part number in the post.

    Thank you for your trouble.

    What priority are you raising the ksoftirqs to?

    Before answering, I have one question.
    How much do you expect the throughput to improve from the current value by raising ksoftirqs to the maximum of 98?
    The CPU load from processes unrelated to the throughput measurement is 2-3%.
    As I reported before, the CPU load just before starting the throughput measurement was 2-3%.
    Therefore, I expect the throughput improvement from raising ksoftirqs to the maximum of 98 to be at most 2-3%.
    Is my thinking correct?

    In the meantime, if possible, can you share the exact commands you run, from boot-up of the EVMs through measuring the CPU load and throughput, so that once I get to this I can compare?

    We appreciate your consideration, but we would like to receive your results as soon as possible,
    so here is the command execution procedure for our environment.
    (1) Network and PRP settings after starting Linux on the client: setPRP_client.sh

    (2) Network and PRP settings after starting Linux on the server: setPRP_server.sh

    (3) PRP performance measurement procedure: HowToUse_measurementPRP.txt
    When measuring throughput, please stop the CPU load measurement and tcpdump, as they may affect the results.
    - Client script: measurementPRP_client.sh
    - Server script: measurementPRP_server.sh

    Takeshi OHNO

    MeasurementProcedure.zip

  • Hello Takeshi,

    Thank you for your patience. I was able to see similar results of ~300Mbps throughput when PRP was set up on two AM64x EVMs (for me, TMDS64EVMs) using the CPSW Ethernet interfaces.

    Topology:

    TMDS64EVM1 eth0 <> TMDS64EVM2 eth0

    TMDS64EVM1 eth1 <> TMDS64EVM2 eth1

    SDK Version:

    RT-Linux 10.00.07

    Test Sequence:

    1. Run sh ./prp_setup.sh prp eth0 eth1 192.168.1.10 on EVM1 where prp_setup.sh is from the script in https://software-dl.ti.com/processor-sdk-linux/esd/AM64X/10_00_07_04/exports/docs/linux/Foundational_Components/Kernel/Kernel_Drivers/Network/HSR_PRP_Non_Offload.html?highlight=prp

    2. Run sh ./prp_setup.sh prp eth0 eth1 192.168.1.20 on EVM2

    3. Ping from EVM1 to EVM2 and disconnect one cable path --> no discontinuation of communication

    4. Set up an iperf3 server on EVM2 and an iperf3 client on EVM1 using TCP communication; I see the result below. What is concerning is that retries occur in every interval, which indicates that many packets are being dropped. This in turn could be influencing the throughput, resulting in a lower-than-expected rate.

    EVM1 Console Log:
    
    root@am64xx-evm:~# iperf3 -c 192.168.1.20                                                           
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 44482 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  26.0 MBytes   218 Mbits/sec   23   46.5 KBytes       
    [  5]   1.00-2.00   sec  26.2 MBytes   220 Mbits/sec   20   36.6 KBytes       
    [  5]   2.00-3.00   sec  26.8 MBytes   224 Mbits/sec   22   52.1 KBytes       
    [  5]   3.00-4.00   sec  26.6 MBytes   223 Mbits/sec   22   76.0 KBytes       
    [  5]   4.00-5.00   sec  32.5 MBytes   273 Mbits/sec   13   97.2 KBytes       
    [  5]   5.00-6.00   sec  38.9 MBytes   326 Mbits/sec   32   81.7 KBytes       
    [  5]   6.00-7.00   sec  38.8 MBytes   325 Mbits/sec   11   98.6 KBytes       
    [  5]   7.00-8.00   sec  39.6 MBytes   332 Mbits/sec   14   80.3 KBytes       
    [  5]   8.00-9.00   sec  39.8 MBytes   333 Mbits/sec   12   69.0 KBytes       
    [  5]   9.00-10.01  sec  38.8 MBytes   322 Mbits/sec   12   80.3 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.01  sec   334 MBytes   280 Mbits/sec  181             sender
    [  5]   0.00-10.01  sec   334 MBytes   280 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 
    

    Some additional actions I tried were:

    1. Changing the iperf3 priority to 60 and scheduling policy from Round Robin to FIFO --> increased throughput slightly to ~450Mbps

    2. Changing ksoftirq0 and ksoftirq1 priority to 70 and scheduling policy from Time Slice to FIFO --> did not result in any noticeable changes to throughput
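
    As a quick check (not a full profile) of whether the receive-side work is actually spread across both cores, you can snapshot the per-CPU softirq counters:

```shell
# The CPU0/CPU1 columns show how many NET_RX/NET_TX softirqs each core
# has handled; a strong imbalance means one core is doing most of the
# Ethernet receive processing.
grep -E 'CPU|NET_RX|NET_TX' /proc/softirqs
```
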

    Is there a particular reason why you have chosen AM64x instead of for example AM62x?
    The CPU board we need is one that can operate stably in both high and low environmental temperatures.
    I've heard that the only SoC that meets this condition and is currently available is the AM64x.

    May I ask where you heard this information from? Comparing the datasheets for AM64x and AM62x the temperature range is the same for both.

    AM62x: https://www.ti.com/document-viewer/AM625/datasheet#GUID-086D5402-AAEC-4ED6-B0BD-CFF7E4FE15D5/TITLE-SPRSP56INT_SPRS858_PACK_00005

    AM64x: https://www.ti.com/document-viewer/AM6442/datasheet#GUID-CD6E613A-70C1-4449-8B74-B692CA434DE6/TITLE-SPRSP56INT_SPRS858_PACK_00005

    Again, we point to the AM62x as a better option because of its higher CPU frequency and core count, which can potentially improve throughput performance.

    What is the end equipment that you are trying to implement PRP for? It might be useful to try and compare PRP performance on AM62x if you have access to some AM62x SK-EVMs.

    -Daolin

  • Hi Daolin,

    Thank you for measuring at your company as well.
    I'm relieved that you got the same results.

    4. Set up an iperf3 server on EVM2 and an iperf3 client on EVM1 using TCP communication; I see the result below. What is concerning is that retries occur in every interval, which indicates that many packets are being dropped. This in turn could be influencing the throughput, resulting in a lower-than-expected rate.

    In my measurements, I observed packet drops, but they occurred infrequently, so I believe the impact on throughput is small.

    1. Changing the iperf3 priority to 60 and scheduling policy from Round Robin to FIFO --> increased throughput slightly to ~450Mbps

    I'm amazed by your results.
    The throughput has improved by just under 50%.
    I'll measure it under the same conditions.
    However, the version of RT-Linux used here is 09.02.01.09.

    May I ask where you heard this information from? Comparing the datasheets for AM64x and AM62x the temperature range is the same for both.

    Certainly, what you say is correct.
    I will now check the authenticity of my information. It is possible that I have misunderstood something.
    Please wait a moment.

    Regards,
    Takeshi OHNO

  • Hi Daolin,

    1. Changing the iperf3 priority to 60 and scheduling policy from Round Robin to FIFO --> increased throughput slightly to ~450Mbps

    We performed the same measurements as above in our environment, but did not see the improvement in throughput that you experienced.
    Our results showed roughly the same throughput as last time.

    So, I have three questions for you:

    1. I have attached the file with our measurements; is there any difference between our procedure and yours?
    - Although the type of evaluation board is different, I think the performance is almost the same.
    - I changed the SDK version from 09.02.01.09 to 10.00.07 to match yours, and the results were the same.

    FIFO_SchedulingPolicy.zip

    2. Can you observe a throughput of 450Mbps when repeating the test several times?
    I would like to know how much your measured throughput fluctuates, so please tell me the maximum and minimum values.

    3. Why do you think the above changes improved your throughput?
    Processing unrelated to the measurement takes up only a few percent of the CPU load.
    Even so, the above changes improved throughput by nearly 50%.
    For that reason, I do not think the improvement comes simply from raising the priority.
    If you change to the FIFO scheduling policy of RT-Linux, is it possible that the mechanism of thread waiting and synchronization changes?

    May I ask where you heard this information from? Comparing the datasheets for AM64x and AM62x the temperature range is the same for both.

    The information I explained was from before the AM62x specs were made public. My apologies.
    Even now, the reason to choose the AM64x is that it has four Ethernet ports,
    while the AM62x only has two.
    I'm not a hardware expert, but I understand it would be difficult to increase the number of ports on the AM62x to four.

    Regards,
    Takeshi OHNO

  • Hi Takeshi,

    1. I have attached the file with our measurements, but is there any difference between our measurements and your procedure?

    Are you able to double check the Sched Policy and Priority of your iperf3 configuration with "ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep iperf"?

    I do not think there are any significant differences between our setups other than what you have already pointed out.

    We performed the same measurements as above in our environment, but did not see the improvement in throughput that you experienced.
    2. Can you observe a throughput of 450Mbps by repeating the process several times?

    I attempted to reproduce the ~450Mbps throughput I saw in last week's testing. Unfortunately, using the same steps as before, I saw a maximum of only ~320Mbps across 4 test runs.

    root@am64xx-evm:~# chrt -f 60 iperf3 -c 192.168.1.20
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 50890 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  37.1 MBytes   311 Mbits/sec   19   40.8 KBytes       
    [  5]   1.00-2.00   sec  38.8 MBytes   325 Mbits/sec   18   70.4 KBytes       
    [  5]   2.00-3.00   sec  39.6 MBytes   332 Mbits/sec   38   76.0 KBytes       
    [  5]   3.00-4.00   sec  39.0 MBytes   327 Mbits/sec   19   81.7 KBytes       
    [  5]   4.00-5.00   sec  39.1 MBytes   328 Mbits/sec   35   60.6 KBytes       
    [  5]   5.00-6.00   sec  39.5 MBytes   331 Mbits/sec   30   62.0 KBytes       
    [  5]   6.00-7.00   sec  39.5 MBytes   331 Mbits/sec   19   70.4 KBytes       
    [  5]   7.00-8.00   sec  40.2 MBytes   338 Mbits/sec   17   91.5 KBytes       
    [  5]   8.00-9.00   sec  39.5 MBytes   331 Mbits/sec   15    107 KBytes       
    [  5]   9.00-10.00  sec  40.5 MBytes   339 Mbits/sec   19   76.0 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   393 MBytes   330 Mbits/sec  229             sender
    [  5]   0.00-10.00  sec   392 MBytes   329 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 
    root@am64xx-evm:~# chrt -f 60 iperf3 -c 192.168.1.20
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 42044 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  26.4 MBytes   221 Mbits/sec   33   63.4 KBytes       
    [  5]   1.00-2.00   sec  26.5 MBytes   222 Mbits/sec   27   54.9 KBytes       
    [  5]   2.00-3.00   sec  26.1 MBytes   219 Mbits/sec   22   36.6 KBytes       
    [  5]   3.00-4.00   sec  26.1 MBytes   219 Mbits/sec   25   53.5 KBytes       
    [  5]   4.00-5.00   sec  25.4 MBytes   213 Mbits/sec   25   52.1 KBytes       
    [  5]   5.00-6.00   sec  31.0 MBytes   260 Mbits/sec   18   80.3 KBytes       
    [  5]   6.00-7.00   sec  38.1 MBytes   320 Mbits/sec   15   97.2 KBytes       
    [  5]   7.00-8.00   sec  37.8 MBytes   317 Mbits/sec   17   64.8 KBytes       
    [  5]   8.00-9.00   sec  38.2 MBytes   321 Mbits/sec   18   60.6 KBytes       
    [  5]   9.00-10.00  sec  38.0 MBytes   318 Mbits/sec   18   56.3 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   314 MBytes   263 Mbits/sec  218             sender
    [  5]   0.00-10.00  sec   314 MBytes   263 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 
    root@am64xx-evm:~# chrt -f 60 iperf3 -c 192.168.1.20
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 57844 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  37.8 MBytes   317 Mbits/sec   14   94.3 KBytes       
    [  5]   1.00-2.00   sec  36.2 MBytes   304 Mbits/sec   16   88.7 KBytes       
    [  5]   2.00-3.00   sec  38.8 MBytes   325 Mbits/sec   16   62.0 KBytes       
    [  5]   3.00-4.00   sec  38.2 MBytes   321 Mbits/sec   14   67.6 KBytes       
    [  5]   4.00-5.00   sec  38.5 MBytes   323 Mbits/sec   18   49.3 KBytes       
    [  5]   5.00-6.00   sec  37.9 MBytes   318 Mbits/sec   17   81.7 KBytes       
    [  5]   6.00-7.00   sec  38.5 MBytes   323 Mbits/sec   15   64.8 KBytes       
    [  5]   7.00-8.00   sec  38.5 MBytes   323 Mbits/sec   15    103 KBytes       
    [  5]   8.00-9.00   sec  39.0 MBytes   327 Mbits/sec   15   50.7 KBytes       
    [  5]   9.00-10.00  sec  38.0 MBytes   318 Mbits/sec   14    100 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   382 MBytes   320 Mbits/sec  154             sender
    [  5]   0.00-10.01  sec   380 MBytes   319 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 
    root@am64xx-evm:~# chrt -f 60 iperf3 -c 192.168.1.20
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 55036 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  26.0 MBytes   218 Mbits/sec   44   40.8 KBytes       
    [  5]   1.00-2.00   sec  25.6 MBytes   215 Mbits/sec   26   46.5 KBytes       
    [  5]   2.00-3.00   sec  27.2 MBytes   229 Mbits/sec   20   31.0 KBytes       
    [  5]   3.00-4.00   sec  25.4 MBytes   213 Mbits/sec   25   36.6 KBytes       
    [  5]   4.00-5.00   sec  26.2 MBytes   220 Mbits/sec   26   50.7 KBytes       
    [  5]   5.00-6.00   sec  26.2 MBytes   220 Mbits/sec   32   54.9 KBytes       
    [  5]   6.00-7.00   sec  27.0 MBytes   226 Mbits/sec   21   50.7 KBytes       
    [  5]   7.00-8.00   sec  28.0 MBytes   235 Mbits/sec   22   57.7 KBytes       
    [  5]   8.00-9.00   sec  39.4 MBytes   330 Mbits/sec   17   56.3 KBytes       
    [  5]   9.00-10.00  sec  39.1 MBytes   328 Mbits/sec   20   56.3 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   291 MBytes   244 Mbits/sec  253             sender
    [  5]   0.00-10.00  sec   290 MBytes   243 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 
    root@am64xx-evm:~# 
    root@am64xx-evm:~# chrt -f 60 iperf3 -c 192.168.1.20
    Connecting to host 192.168.1.20, port 5201
    [  5] local 192.168.1.10 port 43844 connected to 192.168.1.20 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec  33.2 MBytes   279 Mbits/sec   18    100 KBytes       
    [  5]   1.00-2.00   sec  35.6 MBytes   299 Mbits/sec   15   66.2 KBytes       
    [  5]   2.00-3.00   sec  34.4 MBytes   288 Mbits/sec   14   74.6 KBytes       
    [  5]   3.00-4.00   sec  34.8 MBytes   292 Mbits/sec   16   73.2 KBytes       
    [  5]   4.00-5.00   sec  35.6 MBytes   299 Mbits/sec   33   78.9 KBytes       
    [  5]   5.00-6.00   sec  38.6 MBytes   324 Mbits/sec   22   56.3 KBytes       
    [  5]   6.00-7.00   sec  39.2 MBytes   329 Mbits/sec   18   40.8 KBytes       
    [  5]   7.00-8.00   sec  39.5 MBytes   331 Mbits/sec   18   40.8 KBytes       
    [  5]   8.00-9.00   sec  39.0 MBytes   327 Mbits/sec   17   69.0 KBytes       
    [  5]   9.00-10.00  sec  40.4 MBytes   338 Mbits/sec   15   83.1 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   370 MBytes   311 Mbits/sec  186             sender
    [  5]   0.00-10.01  sec   370 MBytes   310 Mbits/sec                  receiver
    
    iperf Done.
    root@am64xx-evm:~# 

    3. Why do you think your throughput was improved by the above changes?

    The FIFO scheduling policy is a "real-time" policy that will always immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or SCHED_IDLE processes. Theoretically, changing to FIFO should keep background processes from preempting the iperf3 process.

    However, in theory the main throughput improvement should come from the priority itself (the higher the priority, the better the expected throughput).

    Comparing the three measurements below, #3 shows a much lower CPU load for the iperf3 process. What is strange, however, is that I was no longer able to see the throughput improvement from last week.

    1. Baseline CPSW (no PRP) CPU Load = ~35.8%

    2. Baseline PRP CPU Load = ~77.2%

    3. FIFO priority 85 CPU load = ~17.6%

    4. Baseline CPSW CPU load
        0[####************************************************************                         66.9%] Tasks: 34, 16 thr, 119 kthr; 0 running
        1[*********************************************************************************        86.0%] Load average: 1.38 0.70 0.28 
      Mem[||||||#@$$$$$$$$$$                                                                  114M/1.78G] Uptime: 00:02:52
      Swp[                                                                                         0K/0K]
    
      [Main] [I/O]
        PID USER       PRI  NI  VIRT   RES   SHR S  CPU%-MEM%   TIME+  Command                                                                                                                                  
        956 root        20   0 17560  3328  2688 S  35.8  0.2  0:20.33 iperf3 -c 192.168.2.20 -t0                                                                                                               
        957 root        20   0  5588  3840  2432 R   4.1  0.2  0:02.65 htop
    
    5. Baseline PRP CPU load
        0[###**************************************************************************            80.5%] Tasks: 35, 16 thr, 119 kthr; 0 running
        1[*********************************************************                                59.9%] Load average: 1.57 0.96 0.42 
      Mem[||||||#@$$$$$$$$$$                                                                  109M/1.78G] Uptime: 00:04:20
      Swp[                                                                                         0K/0K]
    
      [Main] [I/O]
        PID USER       PRI  NI  VIRT   RES   SHR S  CPU%-MEM%   TIME+  Command                                                                                                                                  
       1038 root       -51   0 17560  3328  2688 R  77.2  0.2  0:19.59 iperf3 -c 192.168.1.20 -t0                                                                                                               
       1039 root        20   0  5584  3840  2432 R   4.0  0.2  0:01.04 htop
        961 root        20   0 10136  2432  1152 S   1.3  0.1  0:01.04 rpmsg_json /usr/share/benchmark-server/app/oob_data.json
    
    6. FIFO 85 iperf CPU load
        0[###*************************************************                                     54.9%] Tasks: 35, 16 thr, 119 kthr; 0 running
        1[#********************************************************************                    72.7%] Load average: 1.58 1.04 0.47 
      Mem[||||||#@$$$$$$$$$$                                                                  110M/1.78G] Uptime: 00:05:09
      Swp[                                                                                         0K/0K]
    
      [Main] [I/O]
        PID USER       PRI  NI  VIRT   RES   SHR S  CPU%-MEM%   TIME+  Command                                                                                                                                  
       1051 root       -86   0 17560  3328  2688 S  17.6  0.2  0:02.19 iperf3 -c 192.168.1.20 -t0                                                                                                               
       1052 root        20   0  5584  3840  2432 R   4.1  0.2  0:00.56 htop
        157 root        20   0 28084  8320  5760 S   1.4  0.4  0:01.91 /usr/lib/systemd/systemd-udevd
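
    As a side note, the htop views above can also be captured non-interactively for logging. A minimal sketch using procps `top` in batch mode (assuming `top` is present in the SDK filesystem, as it typically is):

    ```shell
    # One-shot, scriptable snapshot of per-process CPU usage, similar to the
    # interactive htop views above; -b is batch mode, -n 1 takes a single sample
    top -b -n 1 | head -n 20
    ```

    Redirecting this to a file at fixed intervals gives a record of the per-process CPU figures without needing an interactive terminal.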
    

    On the other hand, the AM62x only has two ports.

    Yes, it looks like from the product page that the AM62x only supports 2 external Ethernet ports: https://www.ti.com/product/AM625#features

    I'll need to check with the internal team to see if there are additional ideas as to why the observed throughput is lower than expected. I'm aiming to give an update later this week.

    -Daolin

  • Hi Daolin,

    Thank you for attempting to reproduce the ~450Mbps throughput test.

    I'll need to check with the internal team to see if there are additional ideas as to why the observed throughput is lower than expected. I'm aiming to give an update later this week.

    I'm waiting for an update from you, but I haven't heard anything.
    So, I've concluded the following from your AM6442 measurements. Is this correct?
    - PRP throughput is limited to ~320Mbps.
    - The ~450Mbps throughput could not be reproduced. I think there must have been some kind of mistake.

    Are you able to double check the Sched Policy and Priority of your iperf3 configuration with "ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep iperf"?

    The output of the above command is as follows for both Client and Server. Is it correct?
    1 FF -61 7595 7595 00:00:01 iperf3

    However, theoretically, the main improvement in throughput should be from the priority (higher the priority, the better the throughput is the theoretical behavior).

    We believe that your theory above does not hold in the following situation. Don't you agree?
    When one specific process is using most of the CPU and the other processes are barely using it,
    the performance of that process will not change whether its priority is high or low.

    4. Baseline CPSW CPU load
        0[####************************************************************                         66.9%] Tasks: 34, 16 thr, 119 kthr; 0 running
        1[*********************************************************************************        86.0%] Load average: 1.38 0.70 0.28
      Mem[||||||#@$$$$$$$$$$                                                                  114M/1.78G] Uptime: 00:02:52
      Swp[                                                                                         0K/0K]

      [Main] [I/O]

    Please tell me the procedure for measuring CPU load with iperf3 in your environment.
    What Linux command is used to produce the output above?
    Our measurement procedure uses the vmstat command:
    we run vmstat at two points, before and during the execution of iperf3,
    and calculate the CPU load from the increase between the two CPU load values that are output.
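
    The two-point measurement described above can be scripted. Below is a minimal sketch that reads /proc/stat (the same counters vmstat reports) and computes the overall CPU busy percentage over a one-second window; it is an illustration only, not part of either test setup, and it folds all non-idle fields together as a simple approximation:

    ```shell
    #!/bin/sh
    # Compute overall CPU busy% over a 1-second window from /proc/stat,
    # mirroring the before/during vmstat comparison described above.
    cpu_snapshot() {
        # Aggregate "cpu" line: user nice system idle iowait irq softirq steal
        awk '/^cpu / { print $2+$3+$4+$5+$6+$7+$8+$9, $5 }' /proc/stat
    }

    read -r t1 i1 <<EOF
    $(cpu_snapshot)
    EOF
    sleep 1
    read -r t2 i2 <<EOF
    $(cpu_snapshot)
    EOF

    # busy% = 100 * (delta_total - delta_idle) / delta_total
    echo "CPU busy: $(( 100 * ((t2 - t1) - (i2 - i1)) / (t2 - t1) ))%"
    ```

    Running this once before starting iperf3 and once during the run gives the two readings whose difference is the load attributable to the test.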

    1. Baseline CPSW (no PRP) CPU Load = ~35.8%

    2. Baseline PRP CPU Load = ~77.2%

    3. FIFO priority 85 CPU load = ~17.6%

    The CPU load rates of iperf3 you measured above are all lower than ours, so your CPU still appears to have headroom.
    In this case, where do you think the bottleneck in iperf3/PRP throughput is?
    According to our measurements using vmstat, the CPU load is over 90% when the throughput of iperf3/PRP is 300Mbps.
    Based on these results, we believe that the throughput bottleneck is in the CPU processing capacity.

    I have three new questions.
    - Will the PRP offload function of AM6442, scheduled for the first half of 2025, work with both PRU-ICSSG and CPSW3g?

    - I would like to know the iperf3 throughput value with the PRP offload function as soon as possible.
      When will you be able to provide a beta version in advance?
      If it is difficult to provide one, please tell us the performance target value estimated at the time of design.
    - Is the PRP offload function based on an implementation that improves on the performance of the current PRP stack?

    Regards,
    Takeshi OHNO

  • Hi Takeshi,

    I'm waiting for an update from you, but I haven't heard anything.

    Thanks for following up, and apologies for the late response. I still have not learned of any useful information as to why this throughput would be around ~300Mbps. What I'm currently trying to understand is the relationship between the throughput and the CPU load.

    So, I've concluded the following from your AM6442 measurements. Is this correct?
    - PRP throughput is limited to ~320Mbps.
    - The ~450Mbps throughput could not be reproduced. I think there must have been some kind of mistake.

    Yes, this is correct; I was unable to replicate the ~450Mbps throughput.

    The output of the above command is as follows for both Client and Server. Is it correct?
    1 FF -61 7595 7595 00:00:01 iperf3

    This is correct and matches what I see when I configure iperf3 for FIFO and priority 60.

    We believe that your theory above does not hold in the following situation. Don't you agree?
    When one specific process is using most of the CPU and the other processes are barely using it,
    the performance of that process will not change whether its priority is high or low.

    Yes, I see your point. If increasing the priority does not impact throughput, then from your perspective, decreasing the CPU load should in turn improve the throughput?

    Please tell me the procedure for measuring CPU load with iperf3 in your environment.
    What Linux command is used to produce the output above?
    Our measurement procedure uses the vmstat command:
    we run vmstat at two points, before and during the execution of iperf3,
    and calculate the CPU load from the increase between the two CPU load values that are output.

    1. Setup PRP on both EVMs

    2. Run iperf3 with elevated priority (priority 85)

    3. Run htop to view the CPU load. This Linux utility tool should already be included in the filesystem from the SDK and can be used to view information about all processes that are currently running live. Information about htop: https://htop.dev/ 

    4. Compare the CPU load from htop to the case of PRP not being setup (Baseline CPSW) and the case of iperf3 without the elevated priority (Baseline PRP)
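
    For reference, step 2 above can be done with `chrt` from util-linux. A minimal sketch; iperf3, the server address 192.168.1.20, and root privileges are assumptions taken from this thread's setup, and the snippet guards them so it degrades gracefully when they are absent:

    ```shell
    #!/bin/sh
    # If root and iperf3 are available, launch the client under SCHED_FIFO
    # at priority 85 (the configuration used in step 2); otherwise skip.
    if [ "$(id -u)" -eq 0 ] && command -v iperf3 >/dev/null 2>&1; then
        chrt --fifo 85 iperf3 -c 192.168.1.20 -t 10 &
    fi

    # Verify the applied policy and priority (FF = SCHED_FIFO), using the
    # same ps invocation quoted earlier in this thread
    ps -ALo psr,policy,priority,pid,tid,cputime,comm | grep iperf || true

    # For contrast, a normal process shows policy TS (SCHED_OTHER):
    ps -o policy=,pri= -p $$
    ```

    The same `chrt` approach applies to the ksoftirqd threads mentioned earlier, using their PIDs instead of launching a new command.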

    The CPU load rates of iperf3 you measured above are all lower than ours, so your CPU still appears to have headroom.
    In this case, where do you think the bottleneck in iperf3/PRP throughput is?
    According to our measurements using vmstat, the CPU load is over 90% when the throughput of iperf3/PRP is 300Mbps.
    Based on these results, we believe that the throughput bottleneck is in the CPU processing capacity.

    Could you share the result of running htop (a snapshot of the entire output)? I'm wondering if there might be some other processes running in the background that resulted in the higher CPU load you reported.

    Will the PRP offload function of AM6442, scheduled for the first half of 2025, work with both PRU-ICSSG and CPSW3g?

    From my knowledge, the PRP offload function is planned specifically for PRU-ICSSG Ethernet interfaces. This is because the feature offloads processing to the PRU cores, which can only be done through the PRU-ICSSG interfaces. This is currently how HSR offload is handled.

    May I ask why you are interested in a PRP topology as opposed to an HSR topology? 

    When will you be able to provide a beta version in advance?
    If it is difficult to provide one, please tell us the performance target value estimated at the time of design.
    - Is the PRP offload function based on an implementation that improves on the performance of the current PRP stack?

    From my understanding, since the PRP offload function is still in development, we currently don't have a date for when a version can be shared in advance. I can check whether there is a performance target value, but I'm guessing the performance will be similar to the offloaded HSR performance. 

    -Daolin