AM3356: PRU Ethernet "frame" errors

Part Number: AM3356
Other Parts Discussed in Thread: DP83822I, AM3358

We are using the PRUs on the AM3356 for two additional Ethernet interfaces with DP83822I PHYs. The CPU runs at 300MHz, and our maximum error-free UDP throughput over the PRU Ethernet interfaces is ~40Mbit/s; above 40Mbit/s we see a growing number of lost packets in the direction towards the DUT. Overclocking the CPU gives slightly better results, so we suspect that the CPU is the bottleneck here.

However, we are confused by the "frame" value of ifconfig:

root@DUT:~# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX  
          inet addr:192.168.XXX.XXX  Bcast:192.168.XXX.XXX  Mask:255.255.255.0
          inet6 addr: fe80::a0e8:11ff:feac:47d6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:821369 errors:0 dropped:0 overruns:0 frame:425
          TX packets:189814 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1219881292 (1.1 GiB)  TX bytes:279577183 (266.6 MiB)

This value increases only when we increase the data rate, e.g. iperf3 [...] -b80M. Here is some additional information:

root@DUT:~# ethtool -S eth2
NIC statistics:
     txBcast: 1
     txMcast: 29
     txUcast: 189823
     txOctets: 280342003
     rxBcast: 74
     rxMcast: 0
     rxUcast: 821329
     rxOctets: 1223170114
     tx64byte: 23
     tx65_127byte: 1540
     tx128_255byte: 764
     tx256_511byte: 90
     tx512_1023byte: 3
     tx1024byte: 187433
     rx64byte: 68
     rx65_127byte: 2414
     rx128_255byte: 331
     rx256_511byte: 41
     rx512_1023byte: 0
     rx1024byte: 818549
     lateColl: 0
     singleColl: 0
     multiColl: 0
     excessColl: 0
     rxMisAlignmentFrames: 0
     stormPrevCounter: 0
     macRxError: 0
     SFDError: 0
     defTx: 0
     macTxError: 0
     rxOverSizedFrames: 0
     rxUnderSizedFrames: 0
     rxCRCFrames: 0
     droppedPackets: 8
     txHWQOverFlow: 0
     txHWQUnderFlow: 0
     emacMulticastDropped: 4
     emacVlanDropped: 0

root@DUT:~# mii -i eth2 -d 2
MII-Address 0x02
 00   01   02   03   04   05   06   07   08   09   0A   0B   0C   0D   0E   0F
3100 786D 2000 A240 01E1 C5E1 000D 2001 4806 0000 0100 100B 0000 0000 0000 0000 
 10   11   12   13   14   15   16   17   18   19   1A   1B   1C   1D   1E   1F
4615 0108 0000 0000 0000 0000 0100 0049 0400 8C22 0000 007D 05EE 0000 0102 0000

We are worried that ifconfig's "frame" value may indicate a HW error. How can we exclude a hardware issue here?

  • Hi,

    I will discuss this internally with fellow team members. The MAC statistics, which is where you would expect to find HW errors, are not showing any errors.

    Best Regards,

    Schuyler

  • Hi Schuyler,

    Thank you for the fast reply. Do you already have an update from your colleagues?

    Best regards,
    Christoph

  • Hi Christoph,

    Sorry for the delayed response.

    We are looking into the details you mentioned.

    Can you please confirm which SDK version and kernel version you are using?

    Also, can you please confirm that you have all recent patches in the firmware?

    Regards,

    Mohan

  • Hi Mohan,

    We are using Linux kernel version 4.19.59. Assuming that you are referring to the PRU firmware, the prueth-fw package version is 5.2.7-r0.0. I do not know how to find out the Processor SDK version. We are at meta-processor-sdk commit 0d3220f2aa26 and meta-ti commit a65f0a338d8f.

    Best regards,
    Christoph

  • Hi Mohan,

    Do you need any more information?

    Best regards,

    Christoph

  • Hi Christoph,

    Sorry for the delayed response.

    Since ethtool -S does not show any errors, one possibility is that the kernel sees a pkt_info.length that is either <= 0 or >= the maximum Ethernet packet length; this case is counted in ndevstats->rx_length_errors.

    Ideally, when this type of error is detected, the corresponding frame is dropped.

    Also, the frame error value looks like an accumulation of several error statistics.


    Regards,

    Mohan
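
To check which counter is behind the "frame" value, the individual statistics can be read from sysfs. The sketch below assumes a mainline-style kernel, where the "frame" column of /proc/net/dev (which ifconfig prints) is, to the best of my knowledge, the sum of rx_length_errors, rx_over_errors, rx_crc_errors and rx_frame_errors. The function name and the sysfs_root parameter are illustrative, not part of any TI tool.

```python
from pathlib import Path

# Counters that (on mainline-style kernels, as assumed here) are summed into
# the "frame" column of /proc/net/dev, which ifconfig reports as "frame".
FRAME_COMPONENTS = (
    "rx_length_errors",
    "rx_over_errors",
    "rx_crc_errors",
    "rx_frame_errors",
)


def frame_breakdown(iface, sysfs_root="/sys/class/net"):
    """Return {counter_name: value} for the counters behind ifconfig's "frame"."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in FRAME_COMPONENTS}


def format_breakdown(counters):
    """Render the counters and their sum as printable lines."""
    lines = [f"{name:18s} {value}" for name, value in counters.items()]
    lines.append(f"{'frame (sum)':18s} {sum(counters.values())}")
    return "\n".join(lines)
```

On the DUT, print(format_breakdown(frame_breakdown("eth2"))) would show whether rx_length_errors alone accounts for the reported value.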

  • Hi Mohan,

    Thank you for the explanation on how the symptoms arise. However, we would like to understand the root cause. Is it CPU load related or could it be a hardware problem?

    Best regards,
    Christoph

  • Hi Christoph,

    We would like to get a few more error statuses at the firmware level for better analysis.

    Can you please print the MII_RT error registers (RXERR0 and RXERR1) when a frame error has occurred and share the log?

    Regards,

    Mohan.

  • Hi Mohan,

    I am not sure which register exactly you want me to read: In the PRU ICSS documentation available to us, only Bit 0 of register MII_RT (0x2C) is documented/not reserved. With and without frame errors, address 0x4a30002c reads 0x00000014.

    Best regards,
    Christoph

  • Hi Christoph,

    Sorry for the delayed response.

    The address you read points to DRAM.

    The MII_RT register space starts at address 0x4A332000.

    Can you please read RXERR0 and RXERR1 from addresses 0x4A332050 and 0x4A332054, respectively?

    Regards,

    Mohan.
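
One way to read these two registers from Linux userspace is through /dev/mem. The following is a minimal sketch, assuming the kernel exposes /dev/mem for this address range, the script runs as root, and the PRU-ICSS is clocked. The two addresses are the ones given above; the helper names are illustrative. Note that Python does not guarantee a single 32-bit bus access here, so a dedicated tool such as devmem2 may be the safer choice for registers that are sensitive to access width.

```python
import mmap
import os
import struct

RXERR0 = 0x4A332050  # address quoted in the reply above
RXERR1 = 0x4A332054  # address quoted in the reply above


def split_address(phys_addr, page_size=mmap.PAGESIZE):
    """Split a physical address into (page-aligned base, offset within page)."""
    base = phys_addr & ~(page_size - 1)
    return base, phys_addr - base


def read_reg32(phys_addr, devmem="/dev/mem"):
    """Read one little-endian 32-bit word at phys_addr through devmem."""
    base, offset = split_address(phys_addr)
    fd = os.open(devmem, os.O_RDONLY | os.O_SYNC)
    try:
        # Map only the page that contains the register, read-only.
        with mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED, mmap.PROT_READ,
                       offset=base) as mem:
            return struct.unpack_from("<I", mem, offset)[0]
    finally:
        os.close(fd)
```

For example, print(hex(read_reg32(RXERR0)), hex(read_reg32(RXERR1))) could be run right after the ifconfig frame counter has increased.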

  • Hi Mohan,

    Both registers read 0x0, with and without frame errors.

    Best regards,
    Christoph

  • Hi Christoph,

    Apologies for the delayed response.


    We haven't seen this issue in our testing so far. We are trying to reproduce it on our end.

    We have made multiple fixes in the firmware and would like you to check whether the issue is reproducible with the latest firmware.

    Also, can you please explain the test procedure you are following to reproduce it?

    You can find the latest FW at git commit id 63238bb7aef515dce407a6b3f016ac3d46771198 of the pdk master branch.

    Please check with the latest FW and let us know the results. Meanwhile, we will try on our end.

    Regards,
    Mohan

  • Hi Mohan,

    I am confused: icss_emac_ver.h on commit 63238bb7aef says "icss_emac Driver Revision: 01.00.00.16" whereas our current Yocto recipe references commit 5978212c594 of git://git.ti.com/keystone-rtos/icss-emac.git and defines PV = "01.00.00.17". This suggests that we are actually using a more recent version than the one you suggest. Can you clarify?

    Best regards,
    Christoph

  • Hi Christoph,

    I can see that you are pointing to the icss-emac repo and I am pointing to the pdk repo.

    I am trying to find the differences between these repos and versions. I will get back to you by Monday with more information.

    Regards,

    Mohan

  • Hi Christoph,

    Sorry for the delayed response.

    Below is the information I have:

    I checked the code base at commit id 5978212c594. There is a release tag for this commit, "DEV.ICSS-EMAC_LLD.01.00.00.17", which says that the version is 01.00.00.17, and this is what you are referring to as your present version number.

    However, the version number in the icss_emac_ver.h file at that commit is only 01.00.00.16.

    I also checked the icss_emac_ver.h file at the pdk repo commit id I provided earlier (63238bb7aef515dce407a6b3f016ac3d46771198), and it is likewise 01.00.00.16.

    I am checking internally with the team to get more details on the release tag. I will keep you posted.

    Regards,

    Mohan

  • Hi Mohan,

    Thank you for the update.

    Looking through your other replies, I noticed that we still owe you the test procedure:

    1. On the AM3356 DUT, start an iperf3 server (we use version 3.6):
      iperf3 -s -B <PRU interface IP>
    2. On the test host (some PC), start an iperf3 (we use version 3.7) test:
      iperf3 -u -B <interface connected to AM3356 PRU interface> -c <PRU interface IP> -t 10 -b40M

    If we run with more than 40Mbit/s, e.g. "-b60M" in step 2, packets are lost.

    Best regards,
    Christoph

  • Hello Christoph,

    Apologies for the delayed response. I will try to replicate your observations on an AM335x ICE board over the weekend.

    Regards,

    Nick

  • Hello Christoph,

    I am able to observe ifconfig's frame count going up when following your test steps, using Linux SDK 6.3 (kernel 4.19.94) and the packaged firmware. I have not yet tried the updated firmware Mohan mentioned above. Will give another update tomorrow.

    Regards,

    Nick

  • Do you need additional information to be able to access that updated firmware?

    Regards,

    Nick

  • Hello Christoph,

    There might be multiple elements at play here. Let's separate them out:

    1) UDP throughput upper limit (regardless of whether there are frame errors):

    The ARM running at 300MHz could be a limiting factor here. You mentioned that you have at least two Ethernet ports (both PRU Ethernet). Do you also have CPSW ports? How much data are you trying to move across all ports? Is there anything else fighting for processor time?

    Using TCP as an example: on the AM335x SDK 7.3 coming out in a couple of weeks, TCP is about 93 Mbits/sec for a single PRU Ethernet server when the ARM core is running at 800MHz. However, when I limit the ARM core to 300MHz, TCP is about 45 Mbits/sec. On AM335x SDK 6.3 release, TCP is about the same as the SDK 7.3 results when the ARM core is running at 800MHz, but lower than the SDK 7.3 results when the ARM core is running at 300MHz.

    2) Frame errors:

    I observe UDP frame errors with PRU Ethernet on SDK 6.3, but not with CPSW Ethernet. Even when I set UDP to 100M and reduce the ARM clock frequency to 300MHz, so that UDP packets start getting dropped, no frame errors are reported for CPSW.

    I have not yet dug into potential changes to reduce UDP dropped packets (e.g., adjusting the UDP buffer size).

    Regards,

    Nick

  • Hi Nick,

    I tried the firmware version Mohan suggested and even the head of the pdk.git master branch. I replaced /lib/firmware/ti-pruss/am335x-pru0/1-prueth-fw.elf, but neither of the two worked: a link was detected, but no packets would pass through. Maybe there is more to it than replacing those two files?

    Yes, we are using an additional CPSW interface but we made sure that no CPU intensive task was running in parallel. Your TCP results point to a similar throughput limitation at 300MHz on your side.

    "[...] TCP is about the same as the SDK 7.3 results when the ARM core is running at 800MHz, but lower than the SDK 7.3 results when the ARM core is running at 300MHz."

    What is your maximum UDP throughput on SDK 6.3?

    It is reassuring that you see similar results on your hardware. We have to discuss this internally, but I am confident that we can accept this performance if you see a similar limitation for UDP on SDK 6.3 @300MHz CPU.

    Best regards,
    Christoph

  • Hello Christoph,

    Brief update: for SDK 6.3, I am seeing UDP throughput around 92 Mbits/sec with AM3358 CPU governor set to userspace, scaling_setspeed = 800000 (800MHz), and UDP throughput around 10 Mbits/sec with AM335x CPU governor set to userspace, scaling_setspeed = 300000 (300MHz). On SDK 7.3, I am seeing UDP throughput around 94 Mbits/sec with AM3358 CPU governor set to userspace, scaling_setspeed = 800000 (800MHz), and UDP throughput around 20 Mbits/sec with AM335x CPU governor set to userspace, scaling_setspeed = 300000 (300MHz). 

    Those numbers come from a test with iperf3 -s <AM335x IP address> as the server and iperf3 -c <AM335x IP address> -t 10 -u -b 100M as the client.

    When talking with the others, the fact that ethtool -S does not report the frame errors that we see in ifconfig makes us suspect those frames are getting lost somewhere in the software stack rather than at the hardware level.

    I started digging into potentially optimizing network performance, but I have not made much progress so far. Increasing the RX and TX buffer sizes at the kernel level (raising the rmem_max, tcp_rmem, and udp_rmem_min values as described in "Linux Tune Network Stack (Buffers Size) To Increase Networking Performance - nixCraft (cyberciti.biz)") has not yielded any big changes in results.

    Here are other pages I've started looking at. TI does not guarantee information on non-TI websites is correct, but maybe they will give you some ideas:
    linux - UDP Packet drop - INErrors Vs .RcvbufErrors - Stack Overflow and the link in the top answer,
    Monitoring and Tuning the Linux Networking Stack: Receiving Data - Packagecloud Blog
    28.6.2.3. Improving UDP Performance by Configuring OS UDP Buffer Limits JBoss Enterprise Application Platform 5 | Red Hat Customer Portal
    How to enlarge Linux UDP buffer size? - SysTutorials

    Regards,

    Nick
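
To see whether datagrams are being lost at the socket buffer rather than lower in the stack, the UDP counters named in the first linked page (InErrors and RcvbufErrors) can be read from /proc/net/snmp before and after an iperf3 run. The sketch below assumes the usual layout of that file, with one "Udp:" header line followed by one "Udp:" value line; the function names are illustrative.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp text as {name: int}."""
    # "UdpLite:" rows do not match, because the fourth character differs.
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp header/value pair found")
    header, values = rows[0][1:], rows[1][1:]
    return dict(zip(header, map(int, values)))


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the Udp counters from the running system."""
    with open(path) as f:
        return parse_udp_counters(f.read())


def counter_delta(before, after):
    """Return only the counters that changed between two snapshots."""
    return {k: after[k] - before[k] for k in after if after[k] != before.get(k)}
```

Taking one snapshot with read_udp_counters() before the test and one after, then passing both to counter_delta(), shows which counters moved. A rising RcvbufErrors would point at a full socket receive buffer, which is a different failure from the rx_length_errors path discussed earlier in the thread.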

  • Hi Nick,

    If we use -b100M, the throughput is extremely poor when sending UDP packets to the AM3356, around 10kbit/s in the best case. This can only be seen on the server side; the sending iperf3 does not see any errors (which is expected behavior). With UDP, we have to increase the rate carefully, step by step, to find the maximum rate without packet loss.

    We would be very interested in your maximum throughput using this approach with both SDK versions.

    Best regards,
    Christoph
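
The stepwise search described above can be sketched as follows. This is only an outline under assumptions: the measure callable is hypothetical and must return the loss as seen by the receiver (the AM335x side), since the sender does not see it. The lost_datagrams helper assumes that an iperf3 JSON report keeps its UDP summary under end.sum with the keys lost_packets and packets, which should be verified against the iperf3 version in use.

```python
def lost_datagrams(report):
    """Extract (lost, total) from a parsed iperf3 JSON UDP report.

    Assumption: the summary lives under end.sum with the keys
    "lost_packets" and "packets"; check this for your iperf3 version.
    """
    summary = report["end"]["sum"]
    return summary["lost_packets"], summary["packets"]


def highest_lossless_rate(rates_mbit, measure):
    """Step through ascending rates and return the last one with zero loss.

    measure(rate) must return (lost, total) as seen by the receiver.
    The search stops at the first lossy rate. Returns None if even the
    lowest rate loses datagrams.
    """
    best = None
    for rate in sorted(rates_mbit):
        lost, _total = measure(rate)
        if lost > 0:
            break
        best = rate
    return best
```

A measure implementation would typically start one iperf3 UDP run at the given rate, obtain the receiver-side report, and pass the parsed JSON to lost_datagrams(). Repeating each rate a few times before accepting it would make the result less sensitive to single outliers.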

  • Hello Christoph,

    With Processor SDK 6.3 and the processor limited to 300MHz, the best UDP throughput I see is ~10.5Mbits/sec when I'm using iperf3 -c <Ip address> -t 10 -u -b 20M. Reducing the UDP traffic (e.g., to 5M) still sees a decent number of lost datagrams.

    I need to chat with my teammates about the tests I'm running. Will respond again tomorrow.

    Regards,

    Nick

  • Hello Christoph,

    Are you using RT Linux or regular Linux?

    I am seeing higher UDP drops than I would expect. I am checking with the software dev team about a Linux 5.10 patch that addresses a buffer overflow issue for PRU Ethernet on a different processor. I want to make sure that issue does not affect AM335x running Linux 4.19, and establish workarounds if it does affect your software.

    Regards,

    Nick

  • Hi Nick,

    No, we are not using any real-time patches. We are at commit 5f8c1c6121da on branch processor-sdk-linux-4.19.y of git.ti.com/processor-sdk/processor-sdk-linux.git. Here is our kernel configuration: 6242.kernel.config

    Best regards
    Christoph

  • Hello Christoph,

    Apologies for the delayed response. My issue with lots of UDP drops was due to using a Linux PC as a link partner. I switched to using an AM65x EVM as the link partner for the AM335x ICEv2 board. With the CPU governor set to userspace and the frequency set to 300000, I am able to replicate your observation of starting to lose a small number of packets at a UDP throughput of 40 Mbit/s:

    root@am65xx-evm:~# iperf3 -c 192.168.2.131 -t 300 -u -b 40M
    [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
    .
    .
    .
    [  5]   0.00-300.00 sec  1.40 GBytes  40.0 Mbits/sec  0.000 ms  0/1035909 (0%)  sender
    
    [  5]   0.00-300.03 sec  1.40 GBytes  40.0 Mbits/sec  0.227 ms  501/1035909 (0.048%)  receiver
    
    
    root@am335x-evm:~# iperf3 -s 192.168.2.131
    -----------------------------------------------------------
    Server listening on 5201
    -----------------------------------------------------------
    Accepted connection from 192.168.2.166, port 50668
    [  5] local 192.168.2.131 port 5201 connected to 192.168.2.166 port 46292
    .
    .
    .
    [ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
    [  5]   0.00-300.03 sec  1.40 GBytes  40.0 Mbits/sec  0.227 ms  501/1035909 (0.048%)  receiver
    

    Regards,

    Nick
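
For anyone repeating this measurement, the CPU setting mentioned above (userspace governor, 300000 kHz) can be applied through the standard cpufreq sysfs files. The sketch below assumes the kernel was built with the userspace governor, that the script runs as root, and that the requested frequency is one the platform supports; the function name and the sysfs_root parameter are illustrative.

```python
from pathlib import Path


def pin_cpu_frequency(khz, cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Select the userspace governor and request a fixed frequency in kHz.

    Returns the frequency the kernel reports afterwards, which may differ
    from the request if khz is not a supported operating point.
    """
    cpufreq = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    (cpufreq / "scaling_governor").write_text("userspace\n")
    (cpufreq / "scaling_setspeed").write_text(f"{khz}\n")
    return int((cpufreq / "scaling_cur_freq").read_text())
```

Calling pin_cpu_frequency(300000) before starting the iperf3 server, and checking that the returned value is 300000, keeps the clock from changing during the run.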

  • Hi Nick,

    Glad to hear that you can reproduce the issue. Is there something that can be done about the low performance? Or maybe something is already scheduled for a more recent SDK?

    Best regards,
    Christoph