AM3352: RGMII throughput testing iperf3 lower bandwidth than expected

Part Number: AM3352

Hello,

I have a customer testing gigabit Ethernet throughput on a custom board and they're seeing decent throughput but not achieving the bandwidth expected based on our processor SDK benchmarking. The test setup is two customer boards directly connected via Ethernet, no additional equipment is on the network and the boards are otherwise idle besides running iperf3. Customer is using a DP83867 Ethernet PHY. Are there any differences between how we test our EVMs compared to how the customer is testing which could contribute to a discrepancy? Any suggestions on what else could contribute to a lower than expected throughput?

Representative iperf3 output below:

# iperf3 -c 192.168.2.3 -t 30

Connecting to host 192.168.2.3, port 5201

[  5] local 192.168.2.1 port 43668 connected to 192.168.2.3 port 5201

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd

[  5]   0.00-1.01   sec  29.4 MBytes   245 Mbits/sec   83    286 KBytes

[  5]   1.01-2.00   sec  27.1 MBytes   229 Mbits/sec   87    191 KBytes

[  5]   2.00-3.00   sec  29.8 MBytes   250 Mbits/sec    0    279 KBytes

[  5]   3.00-4.00   sec  27.9 MBytes   233 Mbits/sec   26    211 KBytes

[  5]   4.00-5.00   sec  30.1 MBytes   253 Mbits/sec    0    290 KBytes

[  5]   5.00-6.00   sec  29.4 MBytes   246 Mbits/sec    5    283 KBytes

[  5]   6.00-7.00   sec  30.4 MBytes   255 Mbits/sec   53    260 KBytes

[  5]   7.00-8.00   sec  30.1 MBytes   252 Mbits/sec  130    252 KBytes

[  5]   8.00-9.00   sec  30.1 MBytes   254 Mbits/sec   48    257 KBytes

[  5]   9.00-10.00  sec  29.3 MBytes   245 Mbits/sec  102    185 KBytes

[  5]  10.00-11.00  sec  28.1 MBytes   236 Mbits/sec   20    208 KBytes

[  5]  11.00-12.00  sec  28.8 MBytes   241 Mbits/sec    0    288 KBytes

[  5]  12.00-13.00  sec  30.4 MBytes   256 Mbits/sec   58    270 KBytes

[  5]  13.00-14.00  sec  29.5 MBytes   247 Mbits/sec   42    247 KBytes

[  5]  14.00-15.00  sec  31.1 MBytes   261 Mbits/sec   35    233 KBytes

[  5]  15.00-16.00  sec  30.1 MBytes   253 Mbits/sec    6    221 KBytes

[  5]  16.00-17.00  sec  30.6 MBytes   256 Mbits/sec    0    300 KBytes

[  5]  17.00-18.00  sec  30.7 MBytes   257 Mbits/sec  116    276 KBytes

[  5]  18.00-19.01  sec  30.6 MBytes   256 Mbits/sec    4    267 KBytes

[  5]  19.01-20.01  sec  30.4 MBytes   255 Mbits/sec   10    256 KBytes

[  5]  20.01-21.00  sec  23.4 MBytes   197 Mbits/sec   66    201 KBytes

[  5]  21.00-22.00  sec  29.8 MBytes   250 Mbits/sec    0    290 KBytes

[  5]  22.00-23.01  sec  26.2 MBytes   219 Mbits/sec   75    232 KBytes

[  5]  23.01-24.08  sec  29.4 MBytes   230 Mbits/sec    0    298 KBytes

[  5]  24.08-25.00  sec  26.3 MBytes   239 Mbits/sec    4    264 KBytes

[  5]  25.00-26.01  sec  28.8 MBytes   241 Mbits/sec   21    202 KBytes

[  5]  26.01-27.00  sec  30.6 MBytes   258 Mbits/sec    0    291 KBytes

[  5]  27.00-28.01  sec  30.9 MBytes   258 Mbits/sec    4    281 KBytes

[  5]  28.01-29.00  sec  30.1 MBytes   254 Mbits/sec    1    262 KBytes

[  5]  29.00-30.00  sec  25.3 MBytes   212 Mbits/sec   59    214 KBytes

- - - - - - - - - - - - - - - - - - - - - - - - -

[ ID] Interval           Transfer     Bitrate         Retr

[  5]   0.00-30.00  sec   875 MBytes   245 Mbits/sec  1055             sender

[  5]   0.00-30.03  sec   874 MBytes   244 Mbits/sec                  receiver
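As a sanity check on the summary lines above, the reported bitrate can be re-derived from the transfer size and duration. The sketch below assumes iperf3's usual convention that "MBytes" are 2^20 bytes while "Mbits" are 10^6 bits. To put the retransmit count in proportion it also assumes a TCP payload of 1448 bytes per segment, which is not stated anywhere in this thread. The function names are illustrative only.

```python
MIB = 1024 * 1024  # iperf reports "MBytes" in binary units


def bitrate_mbps(transfer_mbytes: float, seconds: float) -> float:
    """Convert an iperf-style transfer (MBytes) and duration into Mbits/sec."""
    return transfer_mbytes * MIB * 8 / seconds / 1e6


def retransmit_fraction(retr: int, transfer_mbytes: float, mss: int = 1448) -> float:
    """Rough share of segments retransmitted; mss is an assumed payload size."""
    segments = transfer_mbytes * MIB / mss
    return retr / segments


if __name__ == "__main__":
    # Figures copied from the sender/receiver summary lines.
    print(f"sender:   {bitrate_mbps(875, 30.00):.1f} Mbits/sec")
    print(f"receiver: {bitrate_mbps(874, 30.03):.1f} Mbits/sec")
    print(f"retransmitted: about {retransmit_fraction(1055, 875):.2%} of segments")
```

Under these assumptions the retransmits are a small share of the traffic, so they are worth noting but probably do not account for the gap to line rate on their own.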

Thanks!

Munan

  • Hi,

    Could you provide the TI SDK version? What is the clock speed the processor is running at? Which kernel version?

    Also, could you attach the output of ethtool -S eth0? I am assuming that eth0 is the interface in use.

    Best Regards,

    Schuyler

  • Schuyler,

    The SDK version is sdk-06.03.00.106

    The processor clock speed is 800 MHz

    The Linux kernel version appears to be 4.19.94 (as reported by “uname -a” and consistent with the git log)

    I don’t know if we mentioned it, but we’re using an AM3352 processor and two DP83867 Ethernet transceivers

     

    I’m currently testing with eth1 but see basically the same performance with eth0. I have eth0 on the same ethernet network as my pc. I use eth1 connected between two of our boards so there’s nothing else on that network to interfere. (They are separate networks: eth0 is 169.254.x.x and eth1 is 192.168.x.x.) I run iperf3 on only one ethernet port at a time.

     

    With ethtool, I’ve verified both ports are full duplex and 1000 Mb/s. (We’re seeing throughput well above 100 Mb/s but I still checked)

     

    =========================

    # ethtool -S eth1

    NIC statistics:

         Good Rx Frames: 970543

         Broadcast Rx Frames: 464574

         Multicast Rx Frames: 57499

         Pause Rx Frames: 0

         Rx CRC Errors: 0

         Rx Align/Code Errors: 0

         Oversize Rx Frames: 0

         Rx Jabbers: 0

         Undersize (Short) Rx Frames: 0

         Rx Fragments: 0

         Rx Octets: 82461618

         Good Tx Frames: 12617530

         Broadcast Tx Frames: 380121

         Multicast Tx Frames: 18914

         Pause Tx Frames: 0

         Deferred Tx Frames: 0

         Collisions: 0

         Single Collision Tx Frames: 0

         Multiple Collision Tx Frames: 0

         Excessive Collisions: 0

         Late Collisions: 0

         Tx Underrun: 0

         Carrier Sense Errors: 0

         Tx Octets: 1403621831

         Rx + Tx 64 Octet Frames: 373080

         Rx + Tx 65-127 Octet Frames: 880258

         Rx + Tx 128-255 Octet Frames: 2491

         Rx + Tx 256-511 Octet Frames: 116136

         Rx + Tx 512-1023 Octet Frames: 5205

         Rx + Tx 1024-Up Octet Frames: 12210903

         Net Octets: 1486083449

         Rx Start of Frame Overruns: 1668

         Rx Middle of Frame Overruns: 0

         Rx DMA Overruns: 1668

         Rx DMA chan 0: head_enqueue: 1

         Rx DMA chan 0: tail_enqueue: 937490

         Rx DMA chan 0: pad_enqueue: 0

         Rx DMA chan 0: misqueued: 245

         Rx DMA chan 0: desc_alloc_fail: 0

         Rx DMA chan 0: pad_alloc_fail: 0

         Rx DMA chan 0: runt_receive_buf: 0

         Rx DMA chan 0: runt_transmit_bu: 0

         Rx DMA chan 0: empty_dequeue: 0

         Rx DMA chan 0: busy_dequeue: 862282

         Rx DMA chan 0: good_dequeue: 937363

         Rx DMA chan 0: requeue: 6

         Rx DMA chan 0: teardown_dequeue: 0

         Tx DMA chan 0: head_enqueue: 584076

         Tx DMA chan 0: tail_enqueue: 12033454

         Tx DMA chan 0: pad_enqueue: 0

         Tx DMA chan 0: misqueued: 169666

         Tx DMA chan 0: desc_alloc_fail: 0

         Tx DMA chan 0: pad_alloc_fail: 0

         Tx DMA chan 0: runt_receive_buf: 0

         Tx DMA chan 0: runt_transmit_bu: 334210

         Tx DMA chan 0: empty_dequeue: 584074

         Tx DMA chan 0: busy_dequeue: 3476743

         Tx DMA chan 0: good_dequeue: 12617530

         Tx DMA chan 0: requeue: 349403

         Tx DMA chan 0: teardown_dequeue: 0

    =========================

  • Hi,

    The counters below indicate that some frames are being dropped because the CPSW FIFOs are full, although the number is small compared to the total number of frames received.

    Rx Start of Frame Overruns: 1668

    Rx Middle of Frame Overruns: 0

    Rx DMA Overruns: 1668
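To put "a small number" into figures, the overrun counter can be compared against the Good Rx Frames counter from the same ethtool -S dump. This is a minimal sketch using only those two values; whether the driver includes dropped frames in Good Rx Frames is not confirmed here, so the result should be read as an approximation.

```python
def overrun_ratio(overruns: int, good_rx_frames: int) -> float:
    """Overruns as a fraction of successfully received frames."""
    return overruns / good_rx_frames


if __name__ == "__main__":
    # Values copied from the ethtool -S eth1 output above.
    ratio = overrun_ratio(overruns=1668, good_rx_frames=970543)
    print(f"overruns per good Rx frame: {ratio:.3%}")
```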

    You mentioned two ports are connected, but are both links up at the same time? I see that you mentioned only using one link at a time, but there may be background traffic on the link not being used for iperf. Could you please try the test with the unused link down?

    Regarding the performance listed in the SDK user guide: that benchmark was run on a processor running at 1 GHz, and I believe it used iperf2 (iperf) rather than iperf3. Could you try running iperf with the different window sizes listed in the benchmark table?

    Best Regards,

    Schuyler

  • I’m using two of our boards for the testing. Each board has two ethernet ports.

    Eth0 is disconnected (and reports link down) on both boards

    Eth1 is directly connected between the two separate boards (There’s just wire - no hub, switch, or similar)

     

    I found an old version of “iperf” for the test. I rebooted the board and ran this command:

       for w in 4 8 16 32 64 128 256; do

          iperf -c 192.168.2.3 -t 15 -w ${w}k

       done

     

    The window sizes were picked to match the values listed in “2.2.1.1.9.1. TCP Throughput”

     

    Here’s the iperf output:

    ------------------------------------------------------------

    Client connecting to 192.168.2.3, TCP port 5001

    TCP window size: 8.00 KByte (WARNING: requested 4.00 KByte)

    ------------------------------------------------------------

    [  3] local 192.168.2.1 port 57136 connected with 192.168.2.3 port 5001

    [ ID] Interval       Transfer     Bandwidth

    [  3]  0.0-15.0 sec   289 MBytes   161 Mbits/sec

    ------------------------------------------------------------

    Client connecting to 192.168.2.3, TCP port 5001

    TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)

    ------------------------------------------------------------

    [  3] local 192.168.2.1 port 57138 connected with 192.168.2.3 port 5001

    [ ID] Interval       Transfer     Bandwidth

    [  3]  0.0-15.0 sec   295 MBytes   165 Mbits/sec

    ------------------------------------------------------------

    Client connecting to 192.168.2.3, TCP port 5001

    TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)

    ------------------------------------------------------------

    [  3] local 192.168.2.1 port 57140 connected with 192.168.2.3 port 5001

    [ ID] Interval       Transfer     Bandwidth

    [  3]  0.0-15.0 sec   299 MBytes   167 Mbits/sec

    ------------------------------------------------------------

    Client connecting to 192.168.2.3, TCP port 5001

    TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)

    ------------------------------------------------------------

    [  3] local 192.168.2.1 port 57142 connected with 192.168.2.3 port 5001

    [ ID] Interval       Transfer     Bandwidth

    [  3]  0.0-15.0 sec   323 MBytes   181 Mbits/sec

    ------------------------------------------------------------

    Client connecting to 192.168.2.3, TCP port 5001

    TCP window size:  128 KByte (WARNING: requested 64.0 KByte)

    ------------------------------------------------------------

    [  3] local 192.168.2.1 port 57144 connected with 192.168.2.3 port 5001

    [ ID] Interval       Transfer     Bandwidth

    [  3]  0.0-15.0 sec   370 MBytes   207 Mbits/sec

    ------------------------------------------------------------

    Client connecting to 192.168.2.3, TCP port 5001

    TCP window size:  256 KByte (WARNING: requested  128 KByte)

    ------------------------------------------------------------

    [  3] local 192.168.2.1 port 57146 connected with 192.168.2.3 port 5001

    [ ID] Interval       Transfer     Bandwidth

    [  3]  0.0-15.0 sec   487 MBytes   272 Mbits/sec

    ------------------------------------------------------------

    Client connecting to 192.168.2.3, TCP port 5001

    TCP window size:  352 KByte (WARNING: requested  256 KByte)

    ------------------------------------------------------------

    [  3] local 192.168.2.1 port 57148 connected with 192.168.2.3 port 5001

    [ ID] Interval       Transfer     Bandwidth

    [  3]  0.0-15.0 sec   480 MBytes   268 Mbits/sec
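A note on the WARNING lines in the output above: on Linux this is expected. As documented in socket(7), the kernel doubles the value passed to SO_SNDBUF/SO_RCVBUF to leave room for bookkeeping overhead, and caps the request at net.core.wmem_max (or rmem_max). The sketch below is a simplified model of that behaviour, not the kernel code itself. The 176 KByte cap is an inference from the 352 KByte line and is not a value reported anywhere in this thread.

```python
KIB = 1024

# Assumed cap, inferred from "352 KByte (WARNING: requested 256 KByte)".
ASSUMED_WMEM_MAX = 176 * KIB


def effective_buffer(requested: int, wmem_max: int = ASSUMED_WMEM_MAX) -> int:
    """Simplified model: clamp the request to the sysctl cap, then double it."""
    return 2 * min(requested, wmem_max)


REQUESTED_KIB = [4, 8, 16, 32, 64, 128, 256]
REPORTED_KIB = [8, 16, 32, 64, 128, 256, 352]  # copied from the iperf output

if __name__ == "__main__":
    for req, rep in zip(REQUESTED_KIB, REPORTED_KIB):
        model = effective_buffer(req * KIB) // KIB
        print(f"requested {req:>3} KByte: model {model:>3} KByte, iperf reported {rep:>3} KByte")
```

If the model holds, requests above the cap all yield the same socket buffer, so going beyond 256 KByte would only help after raising the sysctl limit.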

     

     

    =========================

    # ethtool -S eth1

    NIC statistics:

         Good Rx Frames: 49252

         Broadcast Rx Frames: 271

         Multicast Rx Frames: 62

         Pause Rx Frames: 0

         Rx CRC Errors: 0

         Rx Align/Code Errors: 0

         Oversize Rx Frames: 0

         Rx Jabbers: 0

         Undersize (Short) Rx Frames: 0

         Rx Fragments: 0

         Rx Octets: 3465829

         Good Tx Frames: 1850713

         Broadcast Tx Frames: 220

         Multicast Tx Frames: 105

         Pause Tx Frames: 0

         Deferred Tx Frames: 0

         Collisions: 0

         Single Collision Tx Frames: 0

         Multiple Collision Tx Frames: 0

         Excessive Collisions: 0

         Late Collisions: 0

         Tx Underrun: 0

         Carrier Sense Errors: 0

         Tx Octets: 2795182850

         Rx + Tx 64 Octet Frames: 59

         Rx + Tx 65-127 Octet Frames: 50592

         Rx + Tx 128-255 Octet Frames: 180

         Rx + Tx 256-511 Octet Frames: 429

         Rx + Tx 512-1023 Octet Frames: 16303

         Rx + Tx 1024-Up Octet Frames: 1832402

         Net Octets: 2798648679

         Rx Start of Frame Overruns: 0

         Rx Middle of Frame Overruns: 0

         Rx DMA Overruns: 0

         Rx DMA chan 0: head_enqueue: 1

         Rx DMA chan 0: tail_enqueue: 49329

         Rx DMA chan 0: pad_enqueue: 0

         Rx DMA chan 0: misqueued: 0

         Rx DMA chan 0: desc_alloc_fail: 0

         Rx DMA chan 0: pad_alloc_fail: 0

         Rx DMA chan 0: runt_receive_buf: 0

         Rx DMA chan 0: runt_transmit_bu: 0

         Rx DMA chan 0: empty_dequeue: 0

         Rx DMA chan 0: busy_dequeue: 47244

         Rx DMA chan 0: good_dequeue: 49202

         Rx DMA chan 0: requeue: 0

         Rx DMA chan 0: teardown_dequeue: 0

         Tx DMA chan 0: head_enqueue: 56557

         Tx DMA chan 0: tail_enqueue: 1794156

         Tx DMA chan 0: pad_enqueue: 0

         Tx DMA chan 0: misqueued: 2436

         Tx DMA chan 0: desc_alloc_fail: 0

         Tx DMA chan 0: pad_alloc_fail: 0

         Tx DMA chan 0: runt_receive_buf: 0

         Tx DMA chan 0: runt_transmit_bu: 187

         Tx DMA chan 0: empty_dequeue: 56557

         Tx DMA chan 0: busy_dequeue: 1213604

         Tx DMA chan 0: good_dequeue: 1850713

         Tx DMA chan 0: requeue: 48461

         Tx DMA chan 0: teardown_dequeue: 0

    =========================

  • Hi,

    The statistics look good, no overruns. 

    I am assuming that you have not enabled rx interrupt coalescing on the interface. Could you try this command as well:

     ethtool -C eth0 rx-usecs 500
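For a rough sense of what this setting changes, the arithmetic can be sketched as follows. It assumes full-size 1500-byte frames (ignoring Ethernet header overhead) and treats rx-usecs as a minimum spacing between receive interrupts. Both are simplifications rather than statements about how the CPSW driver actually paces interrupts.

```python
def frames_per_second(bitrate_mbps: float, frame_bytes: int = 1500) -> float:
    """Frames per second needed to carry the given bitrate (assumed frame size)."""
    return bitrate_mbps * 1e6 / 8 / frame_bytes


def max_interrupts_per_second(rx_usecs: int) -> float:
    """Upper bound on interrupt rate if interrupts are spaced rx_usecs apart."""
    return 1e6 / rx_usecs


if __name__ == "__main__":
    fps = frames_per_second(250)             # roughly the rate seen with iperf3
    irq_cap = max_interrupts_per_second(500)
    print(f"frames/s at 250 Mbits/sec: {fps:.0f}")
    print(f"interrupt ceiling at rx-usecs=500: {irq_cap:.0f}/s")
    print(f"frames handled per interrupt: {fps / irq_cap:.1f}")
```

Fewer interrupts reduce CPU load, but each frame may wait up to rx-usecs before it is processed, which can slow TCP when the window is small. It is therefore worth trying several values rather than only the largest one.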

    Would you be able to run the test against a Linux PC? I believe the benchmark tests were run against a PC.

    Best Regards,

    Schuyler

  • Hi Schuyler,

    Feedback from the customer is that running iperf3 against a Windows PC was slower than running between two of their boards. Is there any point in pressing further on testing specifically against a Linux PC, or should we move on?

    The customer also ran the suggested rx interrupt coalescing command as specified and saw throughput decrease as well. Any other suggestions? Results below.

    ------------------------------------------------------------
    
    Client connecting to 192.168.2.3, TCP port 5001
    
    TCP window size: 8.00 KByte (WARNING: requested 4.00 KByte)
    
    ------------------------------------------------------------
    
    [  3] local 192.168.2.1 port 37626 connected with 192.168.2.3 port 5001
    
    [ ID] Interval       Transfer     Bandwidth
    
    [  3]  0.0-15.0 sec   255 MBytes   143 Mbits/sec
    
    ------------------------------------------------------------
    
    Client connecting to 192.168.2.3, TCP port 5001
    
    TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
    
    ------------------------------------------------------------
    
    [  3] local 192.168.2.1 port 37628 connected with 192.168.2.3 port 5001
    
    [ ID] Interval       Transfer     Bandwidth
    
    [  3]  0.0-15.0 sec   259 MBytes   145 Mbits/sec
    
    ------------------------------------------------------------
    
    Client connecting to 192.168.2.3, TCP port 5001
    
    TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
    
    ------------------------------------------------------------
    
    [  3] local 192.168.2.1 port 37630 connected with 192.168.2.3 port 5001
    
    [ ID] Interval       Transfer     Bandwidth
    
    [  3]  0.0-15.0 sec   256 MBytes   143 Mbits/sec
    
    ------------------------------------------------------------
    
    Client connecting to 192.168.2.3, TCP port 5001
    
    TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
    
    ------------------------------------------------------------
    
    [  3] local 192.168.2.1 port 37632 connected with 192.168.2.3 port 5001
    
    [ ID] Interval       Transfer     Bandwidth
    
    [  3]  0.0-15.0 sec   262 MBytes   146 Mbits/sec
    
    ------------------------------------------------------------
    
    Client connecting to 192.168.2.3, TCP port 5001
    
    TCP window size:  128 KByte (WARNING: requested 64.0 KByte)
    
    ------------------------------------------------------------
    
    [  3] local 192.168.2.1 port 37634 connected with 192.168.2.3 port 5001
    
    [ ID] Interval       Transfer     Bandwidth
    
    [  3]  0.0-15.0 sec   374 MBytes   209 Mbits/sec
    
    ------------------------------------------------------------
    
    Client connecting to 192.168.2.3, TCP port 5001
    
    TCP window size:  256 KByte (WARNING: requested  128 KByte)
    
    ------------------------------------------------------------
    
    [  3] local 192.168.2.1 port 37636 connected with 192.168.2.3 port 5001
    
    [ ID] Interval       Transfer     Bandwidth
    
    [  3]  0.0-15.0 sec   462 MBytes   258 Mbits/sec
    
    ------------------------------------------------------------
    
    Client connecting to 192.168.2.3, TCP port 5001
    
    TCP window size:  352 KByte (WARNING: requested  256 KByte)
    
    ------------------------------------------------------------
    
    [  3] local 192.168.2.1 port 37638 connected with 192.168.2.3 port 5001
    
    [ ID] Interval       Transfer     Bandwidth
    
    [  3]  0.0-15.0 sec   439 MBytes   245 Mbits/sec

    Munan

  • Hi Munan,

    The coalesce command can be tried with different settings; 500 µs is probably the maximum. Does the uname -a command return PREEMPT RT or just PREEMPT? An RT kernel would explain the lower throughput.

    Overall, if there are no errors and the bandwidth is sufficient for their application, then I would think it is OK to move on.
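The PREEMPT check can be scripted if it needs to be repeated across boards. The sketch below classifies a uname string by the preemption tag in its version field. The sample strings in the test are hypothetical and only illustrate the format; they are not output from the customer's board. The pattern also accepts the PREEMPT_RT spelling used by newer kernels.

```python
import re


def preemption_model(uname_output: str) -> str:
    """Return 'PREEMPT RT', 'PREEMPT' or 'none' based on the kernel version string."""
    # Check the RT tag first, since it also contains the word PREEMPT.
    if re.search(r"\bPREEMPT[ _]RT\b", uname_output):
        return "PREEMPT RT"
    if re.search(r"\bPREEMPT\b", uname_output):
        return "PREEMPT"
    return "none"


if __name__ == "__main__":
    # Hypothetical example string, not taken from this thread.
    sample = "Linux board 4.19.94 #1 PREEMPT Thu Jan 1 00:00:00 UTC 1970 armv7l GNU/Linux"
    print(preemption_model(sample))
```

On the target itself, the input could come from the output of uname -a or from os.uname().version.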

    Best Regards,

    Schuyler