AM335x Ethernet receive performance issue with small UDP packet sizes on Linux SDK 5.07

 Hi all,

   We are facing an AM335x Ethernet receive performance issue with small UDP packet sizes on Linux SDK 5.07.

   The server sends UDP packets to the BeagleBone without any delay between packets:

1)      If the packet size is 172 bytes (or anything less than 1200 bytes) and 5000 packets are sent to the AM335x, the AM335x loses over 3500 packets.

2)      If the packet size is >= 1200 bytes, no packets are lost.

3)      I set the pacing interval to 500 usecs, following the command line on the wiki, but it made no difference to the packet loss for sizes below 1200 bytes.

   So could you give some advice on how to tune this?

Thanks!

Yaoming

 

  • Hello,

    There are two places where packets could be getting dropped:

    1) In hardware, due to DMA overruns. You can check whether this is the case by reading the statistics available in /sys/class/net/eth0/hw_stats.

    2) In the Linux network stack. You can try increasing the socket buffer and UDP queue sizes. Here is a link with information on how to do this: http://developerweb.net/viewtopic.php?id=5872
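    As a hedged sketch of the second point (the function name and buffer sizes are illustrative, not from this thread), enlarging the receive buffer on the receiving UDP socket could look like this in C:

    ```c
    #include <sys/socket.h>
    #include <unistd.h>

    /* Request a larger receive buffer on a UDP socket and return the
     * size the kernel actually granted, or -1 on error. */
    int set_udp_rcvbuf(int requested)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            return -1;

        /* The kernel silently clamps the request to net.core.rmem_max. */
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) < 0) {
            close(fd);
            return -1;
        }

        /* Read back what was granted; Linux reports double the effective
         * value to account for bookkeeping overhead. */
        int granted = 0;
        socklen_t len = sizeof(granted);
        if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
            granted = -1;

        close(fd);
        return granted;
    }
    ```

    On the receiver, calling something like set_udp_rcvbuf(1024 * 1024) before entering the recvfrom() loop requests a 1 MiB buffer; a granted value much smaller than requested usually means net.core.rmem_max needs to be raised as well.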

    Thanks,

    Sekhar

  • Hi Sekhar,

       Thanks for your reply!

       For the DMA overruns, I checked on the latest EZSDK, whose kernel is 3.2.0. The output of cat /sys/class/net/eth0/hw_stats is below, but I don't see "rxdmaoverruns" in the list. Do you know why, and how can I judge from these results whether there are DMA overruns?

    CPSW Statistics:
    rxgoodframes ............................ 1187
    rxbroadcastframes ....................... 413
    rxmulticastframes ....................... 767
    rxoctets ................................ 139726
    txgoodframes ............................ 2
    txbroadcastframes ....................... 2
    txoctets ................................ 664
    octetframes64 ........................... 260
    octetframes65t127 ....................... 739
    octetframes128t255 ...................... 27
    octetframes256t511 ...................... 153
    octetframes512t1023 ..................... 2
    octetframes1024tup ...................... 8
    netoctets ............................... 140390

    RX DMA Statistics:
    head_enqueue ............................ 1
    tail_enqueue ............................ 899
    busy_dequeue ............................ 783
    good_dequeue ............................ 836

    TX DMA Statistics:
    head_enqueue ............................ 2
    empty_dequeue ........................... 783
    good_dequeue ............................ 2



  • Hello,

    The above log shows that there is no packet drop in hardware; the packet drop lies in the network stack. The DMA overruns stat is shown only when the count is non-zero.
    Did you try increasing the network queue sizes as Sekhar mentioned? For more info, you can also refer to the following wiki:
    http://processors.wiki.ti.com/index.php/TI81XX_UDP_Performance_Improvement

    Regards
    Mugunthan V N 

  • Hi  Mugunthan,

      Thanks for your reply!

      I am sorry, there was a mistake in my test setup. There actually is UDP packet loss in my test; the /sys/class/net/eth0/hw_stats output with different nRecvBuf values is below. It seems that the bigger nRecvBuf is, the fewer DMA overruns occur. So do you have any further comments on how to improve the performance?

    Meanwhile, I will check the wiki page you mentioned.

       nRecvBuf=1024*1024:

       

    CPSW Statistics:
    rxgoodframes ............................       6049
    rxbroadcastframes .......................          1
    rxoctets ................................    1293246
    txgoodframes ............................       1048
    txbroadcastframes .......................          1
    txoctets ................................     228310
    octetframes64 ...........................          3
    octetframes128t255 ......................       7094
    netoctets ...............................    1521556
    rxsofoverruns ...........................        658
    rxdmaoverruns ...........................        658

     

    RX DMA Statistics:
    head_enqueue ............................          1
    tail_enqueue ............................       5454
    misqueued ...............................       1205
    busy_dequeue ............................       1279
    good_dequeue ............................       5391

     

    TX DMA Statistics:
    head_enqueue ............................       1046
    tail_enqueue ............................          2
    misqueued ...............................          2
    runt_transmit_buff ......................          1
    empty_dequeue ...........................       1340
    busy_dequeue ............................          7
    good_dequeue ............................       1048

    nRecvBuf=2048*1024

    /mnt/rwfs/var/bin # cat /sys/class/net/eth0/hw_stats
    CPSW Statistics:
    rxgoodframes ............................       6060
    rxbroadcastframes .......................          4
    rxoctets ................................    1295292
    txgoodframes ............................       1056
    txbroadcastframes .......................          1
    txoctets ................................     230054
    octetframes64 ...........................          2
    octetframes65t127 .......................          3
    octetframes128t255 ......................       7111
    netoctets ...............................    1525346
    rxsofoverruns ...........................        641
    rxdmaoverruns ...........................        641

     

    RX DMA Statistics:
    head_enqueue ............................          1
    tail_enqueue ............................       5482
    misqueued ...............................       1156
    busy_dequeue ............................       1341
    good_dequeue ............................       5419

     

    TX DMA Statistics:
    head_enqueue ............................       1054
    tail_enqueue ............................          2
    misqueued ...............................          2
    runt_transmit_buff ......................          1
    empty_dequeue ...........................       1399
    busy_dequeue ............................         10
    good_dequeue ............................       1056

  • Hi Mugunthan,

       According to your comments here, we should handle the DMA overruns first.

       However, could you clarify how to implement the following on the AM335x:

    • Move the DMA descriptors from internal BD RAM to a DDR location
    • Increase the descriptor memory size
    • Increase the number of Rx descriptors queued to the hardware

    Thanks!
  • Hello

    Please make the following changes to the am33xx_cpsw_pdata structure in arch/arm/mach-omap2/devices.c:

    .no_bd_ram              = true,  /* allocate descriptors from DDR instead of internal BD RAM */
    .bd_ram_size            = <have increased size for BD>,
    .rx_descs               = <increase this according to the above BD RAM size>,
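    For illustration only, with assumed example values (not recommendations from this thread), the modified fields might look like the sketch below. Each CPDMA hardware descriptor occupies 16 bytes, so the RX and TX descriptor counts together must fit within bd_ram_size:

    ```c
    /* arch/arm/mach-omap2/devices.c -- hypothetical example values */
    static struct cpsw_platform_data am33xx_cpsw_pdata = {
            /* ... other fields unchanged ... */
            .no_bd_ram   = true,    /* descriptors go to DDR via dma_alloc_coherent() */
            .bd_ram_size = SZ_16K,  /* assumed size; 16 KiB holds 1024 16-byte descriptors */
            .rx_descs    = 256,     /* assumed count; must leave room for TX descriptors too */
    };
    ```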

    Regards
    Mugunthan V N 

  •  Hi Mugunthan V N,

       

        Thanks for your reply!

       After some tuning, we have seen a good improvement.

        However, for the CPDMA channels, I still have a question about the software configuration of RX HDP/TX HDP.

        I dumped the register values from 0x4A100A00 to 0x4A100A3C; only offset 0x20 has a value, and all the others are 0.

        It seems that only one DMA channel is enabled, while in am33xx_cpsw_pdata in arch/arm/mach-omap2/devices.c the member .channels is 8.

        So does .channels here refer to the DMA channels?

        If so, how can we enable all 8 channels to improve performance?

    Thanks!

    Yaoming



  • Yaoming,

    Are you still seeing DMA overruns, or have they stopped now?
    The Davinci CPDMA driver was not implemented for multi-channel usage; I don't think we can use multiple DMA channels with the existing driver.

    Regards
    Mugunthan V N 

  • Hi Mugunthan,

       There are still some DMA overruns, but many fewer than before.

        Thanks for your clarification!

        I will continue to tune the performance; if I have any questions, I will post them here.

      Thanks!

    Yaoming

  • Hi Mugunthan V N ,


       Could you explain why multi-channel DMA is not supported in the current Linux driver?

       Is there any plan to support it?

    Thanks!

    Yaoming

  • Hi Mugunthan, I found an issue when configuring the CPSW DMA buffers in DDR, as below, on the latest EZSDK 6.0 (PSP version psp04.06.00.11):

    1. In arch/arm/mach-omap2/devices.c, in the am33xx_cpsw_pdata structure, change .no_bd_ram to true.

    2. With this modification, CPSW fails when it is probed; the error log is:

    [    1.588470] cpsw cpsw: coherent DMA mask is unset

    [    1.593414]  (null): error initializing dma

    3. To debug the issue, I added some logging and found that it fails in dma_alloc_coherent(). The call stack is:

    cpsw_probe -> cpdma_ctlr_create -> cpdma_desc_pool_create -> dma_alloc_coherent

    According to this call stack, it seems the root cause is in the configuration for the CPSW DMA buffer allocation.

    Could you have a check as well?
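    The "coherent DMA mask is unset" warning suggests dma_alloc_coherent() was reached before the device's coherent DMA mask was configured. As a hedged sketch (the exact placement is an assumption, not confirmed in this thread), setting the mask early in the probe path with the standard kernel helper might look like:

    ```c
    /* Hypothetical placement: in cpsw_probe(), before cpdma_ctlr_create()
     * triggers dma_alloc_coherent() for the descriptor pool in DDR. */
    ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
    if (ret) {
            dev_err(&pdev->dev, "failed to set coherent DMA mask\n");
            return ret;
    }
    ```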

    Thanks!

    Yaoming