AM4378: PRU-ETH performance

Part Number: AM4378

The customer's AM4378 board is currently running a 4.9 Linux kernel with TI adaptations and the PREEMPT_RT patch.
We would like to migrate to a newer version. The board is connected to Ethernet via the PRU.

With the 4.9 kernel, ethernet performance is about 11 MBytes/second in both directions.
With the newer ti-rt-linux-5.4.y kernel, the AM4378 only uploads at about 3 MBytes/second. Downloads run at the expected 11 MBytes/second.

The 5.10 rt kernel would also be an option for us, but when I last checked, the 5.10 rt kernel was not officially released and prueth didn't work at all.

# Software versions

## Kernel 5.4.106+git
KERNEL_GIT_URI = "git://git.ti.com/ti-linux-kernel/ti-linux-kernel.git"
BRANCH = "ti-rt-linux-5.4.y"
SRCREV = "519667b0d81d74a6e55105dcd6072ae550352599"

## prueth-fw 5.6.15-r0 (am437x-pru1-prueth-fw.elf)
SRC_URI = "git://git.ti.com/processor-firmware/ti-linux-firmware.git"
BRANCH = "ti-linux-firmware"
SRCREV = "11fecaf08eeed27f2a834c9911edb8a5fb2a23b1"


AM437x download:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec   112 MBytes  11.2 MBytes/sec                  receiver
-----------------------------------------------------------

AM437x upload:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  34.0 MBytes  3.40 MBytes/sec    0             sender
-----------------------------------------------------------
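
The `-f M` results above are in MBytes/sec, which is awkward to compare against a link speed quoted in Mbit/s. A minimal conversion sketch follows. It assumes iperf3 counts 1 MByte as 1024 × 1024 bytes and 1 Mbit as 1,000,000 bits; the helper name `mbytes_to_mbits` is made up for illustration.

```shell
# Convert an iperf3 "-f M" rate (MBytes/sec) to Mbit/s.
# Assumption: 1 MByte = 1048576 bytes, 1 Mbit = 1000000 bits.
mbytes_to_mbits() {
    awk -v mb="$1" 'BEGIN { printf "%.1f\n", mb * 1048576 * 8 / 1000000 }'
}
```

Under that assumption, `mbytes_to_mbits 11.2` prints 94.0 and `mbytes_to_mbits 3.40` prints 28.5.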


Is there a fix for this issue, or is support for kernel 5.10 planned?

Regards, Bernd

  • Hello Bernd,

    Apologies for the delayed response. Generic PRUETH will be supported in the AM437x Linux kernel 5.10 release. At this point, I have not tested the download and upload speeds.

    Could you post the instructions you used for your tests so I can verify?

    Regards,

    Nick

  • Hello Nick

    I'm the person who did the measurements. The measurements were done with iperf3.
    Start the server like this:

    iperf3 -4 -s -p 12345 -f M

    Then start the client and do the measurements like this:

    iperf3 -c 192.168.0.1 -p 12345 -4 -f M

    Adding '-R' will send/receive in the reverse direction.
    It doesn't matter if you start the server or the client on the AM437x board.
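
    For repeated runs, the two client invocations above could be wrapped as in the sketch below. The function names are hypothetical, and it assumes an iperf3 server is already listening on the given host and port.

    ```shell
    # Print the iperf3 client arguments for one direction.
    # "tx": the client sends (default); "rx": adds -R so the server sends.
    iperf3_client_args() {
        host=$1; port=$2; direction=$3
        args="-c $host -p $port -4 -f M"
        [ "$direction" = "rx" ] && args="$args -R"
        echo "$args"
    }

    # Run both directions back to back against a server that is already listening.
    measure_both_directions() {
        for direction in tx rx; do
            echo "== $direction =="
            # Word splitting of the argument string is intentional here.
            iperf3 $(iperf3_client_args "$1" "$2" "$direction")
        done
    }
    ```

    For example, `measure_both_directions 192.168.0.1 12345` would run the forward test first and the `-R` test second.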

    Question:
    From your response, it is implied that Generic PRUETH is not supported in the AM437x Linux kernel 5.4 release. Is that correct?

    Thanks for your help

    Regards, Daniel

  • Hello Daniel,

    We descoped some PRU networking functionality in the AM437x Linux kernel 5.4 release (HSR/PRP with PRU offload and the userspace ICSS-EMAC driver; see https://software-dl.ti.com/processor-sdk-linux/esd/AM437X/07_03_00_005/exports/docs/devices/AM437X/linux/Release_Specific_Release_Notes.html#release-07-03-00). However, generic PRUETH is supported in the AM437x Linux kernel 5.4 release.

    Factors like ARM core frequency and power settings, overall processing capability, how many other threads are using the ARM core, etc. will impact the rate at which a processor can generate packets to send over iperf/iperf3. With that said, I do observe a significant decrease in throughput from Linux kernel 4.9 to Linux kernel 5.4 when AM437x is generating the packets. I have not had time yet to run tests on the kernel 5.10 release or look at why there might be differences between those kernel releases.

    Note that I used iperf instead of iperf3 since iperf3 was not on the default filesystem for AM437x Processor SDK 4.3.

    Setup: AM64x EVM connected directly to AM437x IDK PRU Ethernet port with a CAT6 cable. AM437x is running the out-of-the-box filesystem.

    When the AM64x processor was generating the Ethernet packets for iperf, I saw a bitrate of 94 Mbits/sec (i.e., line rate) regardless of the version of Linux running on the AM437x. That is expected.

    SDK 7.3 (Linux kernel 5.4). When AM437x was generating the Ethernet packets for iperf, I saw a bitrate of ~27 Mbits/sec.

    root@am437x-evm:~# uname -a
    Linux am437x-evm 5.4.106-g023faefa70 #1 PREEMPT Fri Jul 2 11:03:51 UTC 2021 armv7l armv7l armv7l GNU/Linux
    
    root@am437x-evm:~# iperf -c 192.168.2.160 -p 12345
    ------------------------------------------------------------
    Client connecting to 192.168.2.160, TCP port 12345
    TCP window size: 83.8 KByte (default)
    ------------------------------------------------------------
    [  3] local 192.168.2.141 port 41836 connected with 192.168.2.160 port 12345
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  32.6 MBytes  27.3 Mbits/sec
    
    root@am437x-evm:~# iperf -s -p 12345
    ------------------------------------------------------------
    Server listening on TCP port 12345
    TCP window size:  128 KByte (default)
    ------------------------------------------------------------
    [  4] local 192.168.2.141 port 12345 connected with 192.168.2.160 port 60608
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.1 sec   113 MBytes  94.0 Mbits/sec
    

    SDK 4.3 (Linux kernel 4.9). When AM437x was generating the Ethernet packets for iperf, I saw a bitrate of ~88 Mbits/sec.

    root@am437x-evm:~# uname -a
    Linux am437x-evm 4.9.69-g9ce43c71ae #1 PREEMPT Mon Mar 26 12:08:26 EDT 2018 armv7l GNU/Linux
    
    root@am437x-evm:~# iperf -c 192.168.2.160 -p 12345
    ------------------------------------------------------------
    Client connecting to 192.168.2.160, TCP port 12345
    TCP window size: 43.8 KByte (default)
    ------------------------------------------------------------
    [  3] local 192.168.2.141 port 35008 connected with 192.168.2.160 port 12345
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec   105 MBytes  88.3 Mbits/sec
    
    root@am437x-evm:~# iperf -s -p 12345
    ------------------------------------------------------------
    Server listening on TCP port 12345
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    [  4] local 192.168.2.141 port 12345 connected with 192.168.2.160 port 60606
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.1 sec   113 MBytes  94.2 Mbits/sec

    Regards,

    Nick

  • Hello Nick

    Thanks for looking into this. Is there going to be any further development on
    the 5.4 kernel release?

    Regards, Daniel

  • Hello Daniel,

    Apologies for the delayed response. I have confirmed that AM335x Processor SDK 8.2 (kernel 5.10) has similar performance to kernel 5.4. Unfortunately the core development for Processor SDK 8.2 has already been completed, so the official TI release will still have reduced performance.

    However, the developers will take a look at addressing this in the next AM335x SDK release (at this point in time, it looks like the next AM335x SDK release will also be on kernel 5.10). If they are able to find a fix, we can publish the patch here before the next release is out.

    The developers will be focused on kernel 5.10 instead of kernel 5.4, so the patch would be for kernel 5.10. However, it does not sound like the team made many changes to the PRU Ethernet drivers between kernel 5.4 and kernel 5.10, so I would expect that the kernel 5.10 patches could be backported to kernel 5.4.

    Regards,

    Nick

  • Hello Nick,

    Thanks for the info. Since 5.10 is going to be used with Yocto 4.0 Kirkstone and Kirkstone is an LTS release, it would be good if it could be fixed for it since it will be used for quite a while.

    Regards, Daniel

  • Hello yall,

    Unlocking this thread so we can continue the conversation.

    It sounds like using zero copy mode leads to improved performance, but your application does not support zero copy. Is that correct? Are there any other details you can provide?

    I am passing along to the developers that zero copy is not an acceptable workaround, and that we need to continue working on the issue.

    Regards,

    Nick

  • Hello Nick,

    Yes, that is correct. If I run iperf3 with the `-Z` flag, the performance on the wire is as expected. But unfortunately, our applications don't support zero-copy so we have to use the "normal" mode. I'm using iperf3 because it's an easy way to measure the bandwidth.

    I ran the tests again with a 5.10 kernel with the following versions:
    - ti-rt-linux-5.10.y branch commit 44a4e68ecf519fd2e35417371fbac546c416d2d9
    - prueth-fw 5.1.4-r0.0

    What kind of information can I provide to assist? Are you able to reproduce the problem on an am437x evk?

    Regards, Daniel

  • Hello Nick,

    Do you have an update?

    Kind regards, Daniel

  • Hello Daniel,

    Thank you for the ping. We are able to replicate your observations. I have not heard back from the developers yet. I will ping them again and see if I can give y'all a timeframe for when you can expect more information from us.

    Regards,

    Nick

  • Hello Daniel,

    The dev team is looking for additional information about your use case. I assume you are not using iperf3 in your application, but some other Ethernet software or commands? Can you share those with us, either here or over email via Bernd?

    Thanks,

    Nick

  • Hello Nick,
    We are using the following:
    - nodejs 14.17.1
    - swupdate 2021.11
    - mosquitto 2.0.12
    - openssh 8.7p1

    Most important in terms of bandwidth are the first two.

    But I want to point out that I first discovered the issue with ssh. Then I tested with iperf3, and also nodejs. To me it seems that the issue is not application-dependent, as I have the same behavior with all of them.

    Do you need detailed steps to reproduce the problem?

    Regards, Daniel

  • Hello Daniel,

    Great, thank you. We have replicated with iperf3, but it might be helpful to see the commands you are using with openssh and nodejs to make sure that any patches actually address the problem for your use case.

    Regards,

    Nick

  • Hello Daniel,

    The developer has started bisecting the patches between Linux 4.19 and Linux 5.4 to look for the culprit. I do not have a timeframe for when we can expect another update. I am going on vacation for the rest of August, but my manager will be watching your thread. Feel free to ping if you need an update.

    Regards,

    Nick

  • Hello Nick,

    Thank you for the update. I will post concrete steps to reproduce later today.

    Regards, Daniel

  • Hello Nick,

    Here is a test for downloading from node.js, which I mentioned before that we're using. You can test it like this:
    - Install node.js (we're using 14.17.1) on the am437x board.
    - Put fileserver.js (see below) in a directory of your choice and change into it.
    - Create a file with random bytes to download:
      dd if=/dev/urandom of=random.dat bs=1M count=256
    - Start node.js:
      node fileserver.js
    - On another computer, use wget to measure download speed:
      wget 192.168.0.1:9000/random.dat
    - Check the speed, it will look like this:

    $ wget 192.168.0.1:9000/random.dat
    --2022-08-18 08:05:08-- 192.168.0.1:9000/random.dat
    Connecting to 192.168.0.1:9000... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 268435456 (256M) [text/plain]
    Saving to: ‘random.dat’

    random.dat 100%[======================>] 256.00M 3.51MB/s in 72s

    2022-08-18 08:06:25 (3.53 MB/s) - ‘random.dat’ saved [268435456/268435456]

    - With the 4.9 kernel, it looks like this:

    2022-08-18 08:14:47 (11.0 MB/s) - ‘random.dat’ saved [268435456/268435456]

    Regards, Daniel


    // fileserver.js
    // Minimal static file server used to reproduce the slow upload from the board.
    const http = require('http');
    const url = require('url');
    const fs = require('fs');
    const port = process.argv[2] || 9000;

    http.createServer(function (req, res) {
      console.log(`${req.method} ${req.url}`);
      // Map the request path onto the current working directory.
      const parsedUrl = url.parse(req.url);
      const pathname = `.${parsedUrl.pathname}`;
      fs.exists(pathname, function (exist) {
        if (!exist) {
          res.statusCode = 404;
          res.end(`file not found: ${pathname}`);
          return;
        }
        // Read the whole file into memory, then send it in one response.
        fs.readFile(pathname, function (err, data) {
          if (err) {
            res.statusCode = 500;
            res.end(`can't read file: ${err}.`);
          } else {
            res.setHeader('Content-type', 'application/octet-stream');
            res.end(data);
          }
        });
      });
    }).listen(parseInt(port, 10));
    console.log(`listening port: ${port}`);

  • Hi Daniel, the subject owner is currently out of the office but will be back next week. He should then be able to update the status of the developer testing etc. Regards, Andreas

  • Daniel,

    Testing on our end has shown that the speed drop is caused by a regression in the PRU Ethernet firmware, not the kernel. Can you please try am437x-pru0-prueth-fw.elf and am437x-pru1-prueth-fw.elf from the v6.03 SDK release (also attached) and report back here? The files should go into

    /lib/firmware/ti-pruss/



    am437x-prueth-firmwre-sdk-6.03.zip

    Regards, Andreas

  • Hello Andreas,

    It doesn't work at all. Even though the new firmware seems to load OK, no traffic comes through. Which kernel version is this firmware supposed to run against? I tried these ti-linux versions:

    - 5.10.109-rt65 (08.03.00.003-rt), version used for my original tests
    - 5.10.131-rt72 (08.04.01.003-rt), latest version available in git

    Are there any changes needed in the device tree?

    Kind regards, Daniel

  • Hello Daniel,

    Test updates

    I have verified both your observations and the developer's observations on AM335x (should behave the same as PRUETH on AM437x), with a link partner of AM64x.

    SDK 8.2 running firmware from SDK 8.2: Link comes up, able to ping, TX throughput for iperf3 ~29 Mbits/sec
    SDK 8.2 running firmware from SDK 7.3: Link comes up, not able to ping
    SDK 8.2 running firmware from SDK 6.3: Link comes up, not able to ping

    SDK 7.3 running firmware from SDK 8.2: Link comes up, able to ping, TX throughput for iperf3 ~26 Mbits/sec
    SDK 7.3 running firmware from SDK 7.3: Link comes up, able to ping, TX throughput for iperf3 ~27 Mbits/sec
    SDK 7.3 running firmware from SDK 6.3: Link comes up, able to ping, TX throughput for iperf3 ~90 Mbits/sec

    SDK 6.3 running firmware from SDK 8.2: Link comes up, able to ping (long ping times ~1-15 ms), TX throughput for iperf3 ~24 Mbits/sec
    SDK 6.3 running firmware from SDK 7.3: Link comes up, able to ping (long ping times ~1-10 ms), TX throughput for iperf3 ~30 Mbits/sec
    SDK 6.3 running firmware from SDK 6.3: Link comes up, able to ping (long ping times ~1-15 ms), TX throughput for iperf3 ~87 Mbits/sec

    Developer updates 

    I am still waiting to get input from the firmware developers; I have pinged them again. The Linux developer is travelling without access to hardware for the next week or so. I am still figuring out our plan while he is away from his boards.

    Double-checking on your project needs 

    At this point, which kernel version should I make sure the developers are focusing on? Any other specific system needs we should keep in mind?

    Regards,

    Nick

  • Hello Nick,

    Thanks for the update. We are focusing on kernel 5.10-rt.

    I did my last tests with 08.03.00.003-rt (44a4e68ecf519fd2e35417371fbac546c416d2d9).
    This is branch `ti-rt-linux-5.10.y` from git.ti.com/.../.
    I guess this is a little bit newer than SDK 8.2.

    Regards, Daniel

  • Hello Daniel,

    Understood. I will ask the team to focus on getting SDK 8.2 working then, since that is closest to your development branch.

    Status update: The PRU firmware developers gave us a summary of changes between the different SDK releases, and the Linux developer is back this week. They are starting to look at how the interaction between the PRU firmware and the Linux driver changed between firmware releases to see if there is something simple we can set from the Linux side.

    I will provide another update next week. Please ping the thread if I have not provided an update by September 28.

    Regards,

    Nick

  • Hello Daniel,

    This week's update: We need more information and debug from the firmware developers. Last week the firmware developers were busy getting the MCU+ SDKs out for AM24x / AM62x / AM64x, and this week they are on vacation. I am getting them and the Linux developer on the phone so they can talk to each other late this week or early next week, and will provide another update next week.

    Regards,

    Nick

  • Hello Daniel,

    Thank you for your patience. This week the firmware developers are finally working on the request. I am syncing with them every day to get updates and provide support where I can. No major updates yet (the past couple days they have been focused on getting detailed benchmarks with different packet sizes on different software releases). I will give another update here as soon as I have news of forward progress, or sometime next week.

    Regards,

    Nick

  • Hello Daniel,

    We are making good progress! It looks like the issue was caused by extra error prevention code that was added to the PRU firmware.

    More assembly instructions take longer to execute, so each Ethernet packet takes longer to send and Ethernet throughput drops.

    The developer has found a way to rewrite that firmware code to do the same error checking with fewer assembly instructions. The updated firmware seems to be running at SDK 6.3 throughput levels. The developer is still validating the firmware this week and next week, so this is still "draft" firmware. I will send Andreas PRU firmware binaries for you to test by the end of the day today.

    Summary of changes to allow testing 

    Linux driver & kernel: no changes needed

    PRU firmware: provided as a prebuilt binary. Just replace the existing PRU firmware in the filesystem at
    /lib/firmware/ti-pruss
    (if you want to keep the original binaries for testing, feel free to rename them, e.g. "am335x-pru0-prueth-fw.elf.orig").
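
    A minimal sketch of that replacement step, keeping each original binary under a ".orig" name. The function name `install_prueth_fw` and the `*-prueth-fw.elf` file pattern are assumptions for illustration; adjust them to the files you actually received.

    ```shell
    # Copy new PRU firmware into place, keeping each original as "<name>.orig".
    # Usage: install_prueth_fw NEW_FW_DIR [TARGET_DIR]
    install_prueth_fw() {
        src_dir=$1
        fw_dir=${2:-/lib/firmware/ti-pruss}
        for new_fw in "$src_dir"/*-prueth-fw.elf; do
            [ -e "$new_fw" ] || continue          # no matching files in src_dir
            name=$(basename "$new_fw")
            # Back up the current binary once; never overwrite an existing backup.
            if [ -e "$fw_dir/$name" ] && [ ! -e "$fw_dir/$name.orig" ]; then
                cp -p "$fw_dir/$name" "$fw_dir/$name.orig" || return 1
            fi
            cp "$new_fw" "$fw_dir/$name" || return 1
        done
    }
    ```

    Run it as root on the board, for example `install_prueth_fw /tmp/new-fw`, then reboot so the new binaries are loaded. Because an existing backup is never overwritten, running it a second time does not replace the saved originals with the test firmware.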

    Regards,

    Nick

  • Hello Daniel,

    I received the following feedback offline:

    I did some tests and have some observations:
    - Right after boot, the performance is still degraded (tested with iperf3).
    - After some time, the performance returns to normal, sometimes after less than
       a minute, sometimes after much more than that.
    - It is not clear to me yet how to trigger it.

    Note that transmitting packets is pretty resource intensive for the Linux ARM core. If throughput is not as high immediately after boot, it is possible that Linux is still working on setting up other parts of the system.

    You can test by setting up multiple connections to the EVM and checking processor usage with top. When the TX performance is not as high, is the Linux core getting consumed by other processes? I've attached some sample test setups below.

    Example setup 1: static IP address on AM437x board used in loading filesystem over NFS 

    Linux PC terminal 1: connected to /dev/ttyUSBx connection as usual

    Boot the board. The board has a static IP address assigned to the CPSW port in U-Boot (e.g., 192.168.1.140), and a PRUETH port connected to Linux PC Ethernet port 192.168.2.100.

    Once the EVM login prompt appears on Linux terminal 1, connect over SSH in Linux PC terminal 2:
    ssh root@192.168.1.140

    Then use Linux terminal 3 for running commands on the Linux PC itself.

    Now we can run experiments:
    terminal 1: # top
    terminal 3: $ iperf3 -4 -s -p 12345 -f M
    terminal 2: # ifconfig eth1 192.168.2.140
    terminal 2: # iperf3 -c 192.168.2.100 -p 12345 -4 -f M

    Example setup 2: not using NFS 

    In this case, we need to assign an IP address to the CPSW port before we can connect to it.

    Linux PC terminal 1: connected to /dev/ttyUSBx connection as usual

    CPSW port connected to Linux PC port 192.168.1.100, PRUETH port connected to Linux PC port 192.168.2.100.

    Once the EVM login prompt appears, login and set up the Ethernet ports:

    terminal 1: # ifconfig eth0 192.168.1.140
    terminal 1: # ifconfig eth1 192.168.2.140

    Now we can continue the testing as in example setup 1:

    terminal 2: $ ssh root@192.168.1.140
    terminal 3: $ iperf3 -4 -s -p 12345 -f M
    terminal 1: # top
    terminal 2: # iperf3 -c 192.168.2.100 -p 12345 -4 -f M
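
    As a complement to watching top by hand, overall CPU load during an iperf3 run could be sampled from /proc/stat, roughly as sketched below. The function names are hypothetical, and the sketch counts iowait as idle time.

    ```shell
    # Compute the busy percentage between two "cpu ..." lines from /proc/stat.
    # Fields after the label: user nice system idle iowait irq softirq steal ...
    cpu_busy_pct() {
        printf '%s\n%s\n' "$1" "$2" | awk '
            { total = 0; for (i = 2; i <= 9; i++) total += $i; idle = $5 + $6 }
            NR == 1 { t0 = total; i0 = idle }
            NR == 2 { dt = total - t0; if (dt > 0) printf "%.1f\n", 100 * (dt - (idle - i0)) / dt }'
    }

    # Sample the aggregate CPU line over an interval (default 5 seconds).
    sample_cpu_busy() {
        before=$(head -n 1 /proc/stat)
        sleep "${1:-5}"
        cpu_busy_pct "$before" "$(head -n 1 /proc/stat)"
    }
    ```

    Running `sample_cpu_busy 10` on the board while the iperf3 client is sending gives one number per run. If it is already high before iperf3 starts, another process is probably competing for the ARM core.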

    Regards,

    Nick

  • Hello Nick

    You are right, there was a process using an excessive amount of CPU time, which degraded performance. Once it finishes, the performance is 10.5 MBytes/sec as expected.

    Kind regards, Daniel

  • Hello Daniel,

    Ok, good to know. If you run into any other issues while testing, please let me know. The developer should complete validation on their side by the end of this week. Once they finish validating, I will publish the AM437x PRUETH binaries publicly for SDK 7.3 and 8.2 and link to them here.

    Regards,

    Nick

  • Hello Nick,

    Do you have an update? I'd be interested to test the official release.

    Kind regards,

    Daniel

  • Hello Daniel,

    I have not gotten updates from the firmware developers yet. I am checking in with them again, thank you for the ping.

    Regards,

    Nick

  • Hello Nick,

    Do you have any news?

    Regards, Daniel

  • Hello Daniel,

    I have pinged the developers again. The firmware we provided to you has passed all the tests that the developer who modified it and I have put it through, but I also want the rest of the team to sign off that there is not any additional testing we want to put it through before calling the firmware "production ready".

    Regards,

    Nick

  • Hello Daniel,

    Apologies for the delayed responses here. The team says the AM437x PRU Ethernet firmware we sent you is good to go.

    Regards,

    Nick

  • Hi Nick

    Thanks for the feedback.

    Regards, Daniel

  • Pinging to keep this thread on my TODO list (I'm making sure this gets resolved for future AM335x & AM437x SDK releases). No action needed on your side.

  • Last update that I'll put on this thread.

    Future readers,

    We will include the updated AM437x firmware that has equivalent throughput to Linux SDK 6.3 in the AM437x Linux SDK 9.1 later this year. The team is still looking into the AM335x firmware to see if we can apply a similar fix. If you want an update, feel free to create a new thread and ask for an update based on this thread.

    Regards,

    Nick