AM4372: Rx start of frame overruns

Part Number: AM4372
Other Parts Discussed in Thread: TMDXSK437X

Hi Sitara support Team,

Related to the following thread, my customer is facing packet loss.
e2e.ti.com/.../3183488

[Check list items]
processors.wiki.ti.com/.../5x_CPSW

-Kernel version and source, also include the results of this command: uname -a
 Linux am437x-evm 4.19.38-g4dae378bbe #155 PREEMPT Mon Nov 11 09:38:02 JST 2019 armv7l GNU/Linux

-File system, TI SDK or Arago/Yocto based filesystem
 targetNFS; NFS boot

-Custom board or TI board? Please include device tree source file.
 TMDXSK437X

-ifconfig <interface such as eth0 or eth1>

6332.20191211_ifconfig.txt
root@am437x-evm:/home# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:80:B3:71:20:7B
          inet addr:192.168.200.1  Bcast:192.168.200.255  Mask:255.255.255.0
          inet6 addr: fe80::280:b3ff:fe71:207b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1456725 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1362267 errors:13 dropped:17 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2009790566 (1.8 GiB)  TX bytes:1969445225 (1.8 GiB)
          Interrupt:43

eth1      Link encap:Ethernet  HWaddr 00:80:B3:71:20:7C
          inet addr:192.168.100.1  Bcast:192.168.100.255  Mask:255.255.255.0
          inet6 addr: fe80::280:b3ff:fe71:207c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2493884 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1709897 errors:5 dropped:5 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2573924347 (2.3 GiB)  TX bytes:2528802376 (2.3 GiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:124 errors:0 dropped:0 overruns:0 frame:0
          TX packets:124 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13116 (12.8 KiB)  TX bytes:13116 (12.8 KiB)


Here is the test environment:
-Host: server PC
-Client: TI:AM3358SK board
-LAN 1ch TCP communication (16MB data TX/RX)

Result:
"Rx Start of Frame Overruns" counts up.
Please refer to the logs of "ethtool", "cat /proc/sys/net/core/rmem_max",
"cat /proc/sys/net/core/rmem_default", and "cat /proc/net/snmp".

0412.LAN1ch_NG_log.txt
cat /proc/sys/net/core/rmem_max
163840
cat /proc/sys/net/core/rmem_default
163840


cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 2 64 1666056 0 0 0 0 0 1666056 2234203 0 0 0 0 0 0 0 0 0
Icmp: InMsgs InErrors InCsumErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps
Icmp: 2 0 0 0 0 0 0 0 0 2 0 0 0 0 2 0 0 0 0 0 0 2 0 0 0 0 0
IcmpMsg: InType0 OutType8
IcmpMsg: 2 2
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts InCsumErrors
Tcp: 1 200 120000 -1 9 8 0 1 1 1665979 2234058 0 0 8 0
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors
Udp: 10 0 0 10 0 0 0
UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors
UdpLite: 0 0 0 0 0 0 0


ethtool -S eth0
NIC statistics:
     Good Rx Frames: 1667934
     Broadcast Rx Frames: 68
     Multicast Rx Frames: 14
     Pause Rx Frames: 0
     Rx CRC Errors: 0
     Rx Align/Code Errors: 0
     Oversize Rx Frames: 0
     Rx Jabbers: 0
     Undersize (Short) Rx Frames: 0
     Rx Fragments: 0
     Rx Octets: 2250244253
     Good Tx Frames: 2234223
     Broadcast Tx Frames: 149
     Multicast Tx Frames: 0
     Pause Tx Frames: 0
     Deferred Tx Frames: 0
     Collisions: 0
     Single Collision Tx Frames: 0
     Multiple Collision Tx Frames: 0
     Excessive Collisions: 0
     Late Collisions: 0
     Tx Underrun: 0
     Carrier Sense Errors: 0
     Tx Octets: 2339581508
     Rx + Tx 64 Octet Frames: 301
     Rx + Tx 65-127 Octet Frames: 920938
     Rx + Tx 128-255 Octet Frames: 87
     Rx + Tx 256-511 Octet Frames: 157
     Rx + Tx 512-1023 Octet Frames: 25
     Rx + Tx 1024-Up Octet Frames: 2980649
     Net Octets: 294858465
     Rx Start of Frame Overruns: 1991
     Rx Middle of Frame Overruns: 0
     Rx DMA Overruns: 1991
     Rx DMA chan: head_enqueue: 1
     Rx DMA chan: tail_enqueue: 1665992
     Rx DMA chan: pad_enqueue: 0
     Rx DMA chan: misqueued: 76
     Rx DMA chan: desc_alloc_fail: 0
     Rx DMA chan: pad_alloc_fail: 0
     Rx DMA chan: runt_receive_buf: 0
     Rx DMA chan: runt_transmit_buf: 0
     Rx DMA chan: empty_dequeue: 0
     Rx DMA chan: busy_dequeue: 454591
     Rx DMA chan: good_dequeue: 1665929
     Rx DMA chan: requeue: 28
     Rx DMA chan: teardown_dequeue: 0
     Tx DMA chan: head_enqueue: 1090146
     Tx DMA chan: tail_enqueue: 1144077
     Tx DMA chan: pad_enqueue: 0
     Tx DMA chan: misqueued: 57621
     Tx DMA chan: desc_alloc_fail: 0
     Tx DMA chan: pad_alloc_fail: 0
     Tx DMA chan: runt_receive_buf: 0
     Tx DMA chan: runt_transmit_buf: 122
     Tx DMA chan: empty_dequeue: 1443565
     Tx DMA chan: busy_dequeue: 446485
     Tx DMA chan: good_dequeue: 2234223
     Tx DMA chan: requeue: 1097574
     Tx DMA chan: teardown_dequeue: 0
     

Questions:
1. Regarding "Rx Start of Frame Overruns" in the log,
 does it correspond to the counter "15.3.2.20.1.11 Rx Start of Frame Overruns" in the TRM?

2. If packets are dropped due to the frame overrun or a FIFO restriction in the CPDMA,
 is there any workaround for this?

Any advice or suggestions would be appreciated.

Best regards,
Kanae

  • Hi,

    Yes, there is a solution to this problem: more RX descriptors need to be added.

    In the linked software developer's guide, please look for the section "Configure number of TX/RX descriptors".

    This command-line option allows you to increase the number of descriptors that the CPSW driver will use. We don't have an algorithm that would let you compute the exact number of descriptors, as this is very traffic dependent. The example uses 4096, which should be more than enough. RX descriptor exhaustion, which causes the overflows, typically happens in bursty environments, with excessive small packets, or, as in your case, with high-volume usage on both ports. To confirm what you are describing: the switch is dropping packets because no RX descriptor is available to push the packet into DDR.

    One other item to point out is the network stack buffering. You appear to be aware of this, since you are looking at rmem_max. If packets were dropping at this level, it would be due to undersized buffering configured for the network stack. The overflow problem you see is completely independent of this level.
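
    To keep the two levels apart when reading logs, the hardware counters can be pulled out of the "ethtool -S" text on their own. The following is only a minimal sketch in Python, not a TI tool: the function names parse_ethtool_stats and counter_delta are hypothetical, and the sample text is copied from the log posted above.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S` style text into a {counter name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the LAST colon so names such as
        # "Rx DMA chan: desc_alloc_fail" stay intact.
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def counter_delta(before, after):
    """Return only the counters that changed between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


# Excerpt of the statistics posted earlier in this thread.
SAMPLE = """\
NIC statistics:
     Good Rx Frames: 1667934
     Rx Start of Frame Overruns: 1991
     Rx Middle of Frame Overruns: 0
     Rx DMA Overruns: 1991
     Rx DMA chan: desc_alloc_fail: 0
"""

stats = parse_ethtool_stats(SAMPLE)
# Rough ratio only: overruns per 1000 good received frames.
per_thousand = 1000 * stats["Rx Start of Frame Overruns"] / stats["Good Rx Frames"]
print(f"SOF overruns per 1000 good RX frames: {per_thousand:.2f}")
```

    Taking one snapshot before a test run and one after, then passing both to counter_delta, shows whether the overrun counters moved during that run, independently of anything reported in /proc/net/snmp.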

    Best Regards,

    Schuyler

  • Hi Schuyler,

    Thank you for your reply.

    My customer has already checked the page and added RX descriptors.

    Here are the results.
    *They changed the value of descs_pool_size in cpsw.c and ran a 16MB TCP data transfer on 2 LAN channels.

    [Size, Status (throughput per channel)]
    1. descs_pool_size = 256;       // Exception occurred during communication (driver default setting)
    2. descs_pool_size = 4096;      // Rx Start of Frame Overruns counts up (12~14MB/s)
    3. descs_pool_size = 32*1024;   // Rx Start of Frame Overruns counts up (10~12MB/s)
    4. descs_pool_size = 128*1024;  // Rx Start of Frame Overruns counts up (5~7MB/s)
    5. descs_pool_size = 512*1024;  // Rx Start of Frame Overruns counts up (2~4MB/s)
    6. descs_pool_size = 1024*1024; // Linux boot error (kernel panic)

    TCP communication works in tests No. 1, 2, 3, 4, and 5, but Rx Start of Frame Overruns counts up.

    Questions:
    1. Is it normal for "Rx Start of Frame Overruns" to count up?
       Or is there something wrong with the system?

    2. Can you suggest another solution?

    3. This question is from my last post.
       Regarding "Rx Start of Frame Overruns" in the log,
       does it correspond to the counter "15.3.2.20.1.11 Rx Start of Frame Overruns" in the TRM?

    4. Does the Rx Start of Frame Overruns count represent the number of packets that were discarded
       because, while a large number of packets was being received, they could not be processed due to insufficient buffer memory?

    Best regards,
    Kanae

  • Hi Schuyler,

    Thank you for your support.
    Could you please answer as soon as possible,
    especially question No. 4?

    Best regards,
    Kanae

  • Hi Kanae,

    Answering question 4 completely requires some additional information, as the issue you are seeing may be running up against a system limitation. Could you describe the expected network load in terms of bit rate per Ethernet port? Could you generically describe the application that is running?

    When the overflows happen, this is due to exhaustion of RX descriptors; until an RX descriptor becomes available, any received packets will be dropped. Exhaustion of descriptors can come from the processor not processing the packets fast enough due to other application requirements, or perhaps the processor is running too slowly for the planned network load.

    I notice that you have two threads open on this topic. I will review them and may close one of them.

    Best Regards,

    Schuyler

  • Hi Schuyler,

    Thank you for your support!

    Here are the answers to your questions.

    1. Could you describe the network load expected in terms of bit rate per Ethernet port? 

    >It uses 1000-Base, and the load is 35.5MB/s (284Mbit/s) per LAN channel with TCP.
     When the two ports are combined, 71MB/s (568Mbit/s) of data would be received.

    2. Could you generically describe the application running?

    >It is a simple application:
    The client transmits data
    -> The server sends the data back
    -> The client receives the data

    In the case of TCP, the TX/RX data is 16MB.
    In the case of UDP, the TX/RX data is 30KB.
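
    For reference, the arithmetic behind the figures above can be written out as follows. This is only a sketch: it assumes 1 MB = 10^6 bytes and, for the frame-rate estimate, that every TCP segment carries a full 1460-byte payload (a typical value for MTU 1500 without TCP options, not something measured in this test).

```python
BITS_PER_BYTE = 8


def mbytes_to_mbits(mbytes_per_s):
    """Convert a rate in MB/s to Mbit/s."""
    return mbytes_per_s * BITS_PER_BYTE


per_channel_mbytes = 35.5   # MB/s per LAN channel, from the description above
channels = 2

per_channel_mbits = mbytes_to_mbits(per_channel_mbytes)
combined_mbits = mbytes_to_mbits(per_channel_mbytes * channels)
print(f"per channel: {per_channel_mbits:.0f} Mbit/s")
print(f"combined:    {combined_mbits:.0f} Mbit/s")

# Rough receive frame rate per channel, ASSUMING full-size segments.
ASSUMED_TCP_PAYLOAD_BYTES = 1460
frames_per_s = per_channel_mbytes * 1_000_000 / ASSUMED_TCP_PAYLOAD_BYTES
print(f"approx. frames per second per channel: {frames_per_s:.0f}")
```

    The frame rate is the number the receive path has to keep up with on each channel; with smaller segments it would be proportionally higher.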

    Additional questions about your comments:

    1. Are there any other possible causes besides "other processing, or the CPU being too slow" and
    "DMA processing cannot be kicked off and packets cannot be queued because there are not enough RX descriptors"?

    For example, if data arrives in the RX FIFO faster than the DMA can transfer it,
    the DMA transfer cannot keep up, which would result in "not enough FIFO space or RX descriptors".
    If this is the cause of the issue, is it possible to avoid it by increasing the internal clock speed?

    2. Currently, only two processes run the LAN communication application (client),
        one process per LAN channel.

    When the CPU usage is monitored with the top command, the total including "ksoftirqd" is 93.2%, as shown below.
    Does this mean the issue happens because the processor is too slow?

    ===============================================================================
     PID USER  PR  NI  VIRT   RES   SHR S  %CPU  %MEM    TIME+ COMMAND
    1166 root   0   0 18268 17688  1224 S  37.4   1.7  2:14.30 client
    1184 root  20   0 18268 17684  1224 S  35.5   1.7  0:26.34 client
       9 root  20   0     0     0     0 R  20.6   0.0  2:24.42 ksoftirqd/0
    ===============================================================================

    Best regards,
    Kanae

  • Hi Kanae,

    Thank you for the description of the application and the processor load. The network load you are looking at is beyond the capability of the AM437x. If the above statement is correct, the application needs more than 852Mbps, which is more than twice what an AM437x-class processor can handle. The first load example does not align with the second load description; only approximately 130Mbps is defined there, and that load would work.

    The best and only advice I can provide is to look for a way to reduce the network load.

    To answer the other questions:

    - The CPDMA packet transfers are triggered by HW. The SW interaction is that the CPSW driver processes the descriptor after packet reception, pushes the packet to the network stack, and then adds the descriptor back to the list of available descriptors. So the exhaustion occurs because the processor is not able to return the descriptors fast enough, as the network load is beyond what the AM437x can handle.

    - The ksoftirqd thread services several kinds of softirq work, including network processing.
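
    The descriptor cycle described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions (fixed arrival and service rates per tick, one descriptor per frame); it is not the CPSW driver, and the name simulate_rx_pool and all of the numbers are hypothetical.

```python
def simulate_rx_pool(pool_size, arrivals_per_tick, serviced_per_tick, ticks):
    """Toy model of an RX descriptor pool; returns (delivered, dropped)."""
    free = pool_size   # descriptors the hardware may still fill
    pending = 0        # filled descriptors waiting for the driver
    delivered = 0
    dropped = 0
    for _ in range(ticks):
        # Hardware side: every arriving frame needs one free descriptor,
        # otherwise it is dropped (counted here like an overrun).
        accepted = min(arrivals_per_tick, free)
        dropped += arrivals_per_tick - accepted
        free -= accepted
        pending += accepted
        # Software side: the driver hands frames to the stack and
        # returns their descriptors to the free list.
        done = min(serviced_per_tick, pending)
        pending -= done
        free += done
        delivered += done
    return delivered, dropped


if __name__ == "__main__":
    for size in (256, 4096):
        delivered, dropped = simulate_rx_pool(size, 10, 8, 10_000)
        print(f"pool={size}: delivered={delivered}, dropped={dropped}")
```

    In this model, as long as frames arrive faster than the software returns descriptors, a larger pool only delays the first drop; once the pool is exhausted, frames are dropped at the same rate regardless of pool size. When the service rate is at least the arrival rate, nothing is dropped.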

    Best Regards,

    Schuyler

  • Hi Schuyler,

    Thank you for your support!
    My customer will make adjustments to reduce the load on the network.
    So, could you please advise on the following questions?
    Since these questions are broad, answers under stated conditions are fine.

    Q1. What is the limit of the AM437x's network load per channel?

    Q2. Does the above network load limit depend on the CPU load?
      Can you give a rough estimate of how much network load it can withstand?

    Q3. Does it support "pause frames"?
      If it supports pause frames, is it possible to avoid this issue with them?

    Best regards,
    Kanae

  • Hi Schuyler,

    Thank you for your support.

    Regarding Q3 in my last post,
    my customer assumes that a pause frame would be sent
    when the RX descriptors are exhausted.
    Is this function not supported?

    If not, is there any reason why the pause frame cannot be sent?

    My customer would like to resolve these questions within this year.
    Please give us as many answers as you can.
    If you need any other information, please let me know.

    Best regards,
    Kanae


  • Hi Sitara support Team,

    Are there any support members in this forum?
    Has this year's support ended?

    Best regards,
    Kanae

  • Hi Schuyler,

    Could I have your reply to the questions?

    Should I post this as a new thread?

    Best regards,
    Kanae

  • Hi,

    The pause frames are only issued when the internal FIFOs between the two ports inside the switch are full and not when there is an RX descriptor exhaustion. I would suggest reducing the bit rate as the best route to take since even if the RX descriptor issued pause frames this would still cause the bit rate to drop significantly.

    Best Regards,

    Schuyler

  • Hi Schuyler,

    Thank you for your reply!

    I would like to confirm my understanding of what you explained:
    "even if the RX descriptor issued pause frames ...".
    This case applies when flow control is turned off for the CPDMA.
    If flow control is turned on for the CPDMA,
    the CPPI DMA engine drops the frame, which means that
    the internal FIFOs between the two ports inside the switch do not become full.

    Is my understanding correct?
    I appreciate your support.

    Best regards,
    Kanae