TDA4VM: CPSW9G performance issue on SDK 7.0

Part Number: TDA4VM

Hi, our application requires high network throughput, about 800 Mbps.

In our architecture, we plan to use CPSW9G for network data exchange. However, CPSW9G's CPU usage seems too high. Here are my test results:

iperf TCP TX on EVM board: 942 Mbits/sec with ~25% CPU usage

iperf TCP RX on EVM board: 811 Mbits/sec with ~60% CPU usage
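To compare the two directions on equal footing, the reported figures can be normalized to CPU usage per Gbit/s of traffic. Below is a minimal Python sketch that uses only the numbers above. It assumes CPU usage scales roughly linearly with throughput. The post does not say whether the percentage is for one A72 core or the whole cluster, so only the TX/RX ratio is meaningful. IperfRun is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class IperfRun:
    direction: str
    mbps: float         # measured iperf throughput in Mbit/s
    cpu_percent: float  # reported CPU usage in percent (per-core vs. total not stated)

    def cpu_per_gbps(self) -> float:
        """CPU percent per 1 Gbit/s of traffic, assuming linear scaling."""
        return self.cpu_percent / (self.mbps / 1000.0)


# Figures taken from the test results above.
tx = IperfRun("TX", 942.0, 25.0)
rx = IperfRun("RX", 811.0, 60.0)

for run in (tx, rx):
    print(f"{run.direction}: {run.cpu_per_gbps():.1f}% CPU per Gbit/s")

# How much more CPU each received bit costs compared with a transmitted bit.
ratio = rx.cpu_per_gbps() / tx.cpu_per_gbps()
print(f"RX/TX CPU cost ratio: {ratio:.1f}")
```

With these numbers, RX costs roughly 2.8 times as much CPU per bit as TX. This suggests the receive path is where optimization would help most.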

We would like to know whether there are techniques to reduce the CPU usage, since CPU resources are very limited in our software design.

Any driver patches or Linux config tweaks are welcome.

Thanks.

  • felix,

    Can you confirm whether checksum offload is currently enabled? You can check with ethtool -k <interface> (lowercase -k queries the settings; uppercase -K changes them).

    This is supported and should be enabled by default. If it is not enabled, turning it on is a possible quick improvement. Additional optimizations are under evaluation.

    Best regards,

    Dave

  • Hi Dave,

    Yes, checksum offload is enabled by default:

    root@j7-evm:~# ethtool -k eth1
    Features for eth1:
    rx-checksumming: on
    tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on [fixed]
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
    scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
    tcp-segmentation-offload: off
    tx-tcp-segmentation: off [fixed]
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: off [fixed]
    tx-tcp6-segmentation: off [fixed]
    generic-segmentation-offload: on
    generic-receive-offload: on
    large-receive-offload: off [fixed]
    rx-vlan-offload: off [fixed]
    tx-vlan-offload: off [fixed]
    ntuple-filters: off [fixed]
    receive-hashing: off [fixed]
    highdma: off [fixed]
    rx-vlan-filter: off [fixed]
    vlan-challenged: off [fixed]
    tx-lockless: off [fixed]
    netns-local: off [fixed]
    tx-gso-robust: off [fixed]
    tx-fcoe-segmentation: off [fixed]
    tx-gre-segmentation: off [fixed]
    tx-gre-csum-segmentation: off [fixed]
    tx-ipxip4-segmentation: off [fixed]
    tx-ipxip6-segmentation: off [fixed]
    tx-udp_tnl-segmentation: off [fixed]
    tx-udp_tnl-csum-segmentation: off [fixed]
    tx-gso-partial: off [fixed]
    tx-sctp-segmentation: off [fixed]
    tx-esp-segmentation: off [fixed]
    tx-udp-segmentation: off [fixed]
    fcoe-mtu: off [fixed]
    tx-nocache-copy: off
    loopback: off [fixed]
    rx-fcs: off [fixed]
    rx-all: off [fixed]
    tx-vlan-stag-hw-insert: off [fixed]
    rx-vlan-stag-hw-parse: off [fixed]
    rx-vlan-stag-filter: off [fixed]
    l2-fwd-offload: off [fixed]
    hw-tc-offload: off [fixed]
    esp-hw-offload: off [fixed]
    esp-tx-csum-hw-offload: off [fixed]
    rx-udp_tunnel-port-offload: off [fixed]
    tls-hw-tx-offload: off [fixed]
    tls-hw-rx-offload: off [fixed]
    rx-gro-hw: off [fixed]
    tls-hw-record: off [fixed]
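    To script this check, the ethtool -k output above can be parsed into a feature table. The following is a minimal Python sketch. It assumes the "name: on|off [note]" line format shown above. parse_ethtool_features and query_features are hypothetical helper names, not part of any TI tool. In this output, rx-checksumming and tx-checksumming are on. However, tcp-segmentation-offload is off and tx-tcp-segmentation is off [fixed], so large TCP sends are segmented in software by generic-segmentation-offload rather than by the hardware.

```python
import re
import subprocess
from typing import Dict, NamedTuple


class Feature(NamedTuple):
    enabled: bool  # "on" -> True, "off" -> False
    fixed: bool    # "[fixed]": the driver does not allow changing this feature


# Matches lines such as "rx-checksumming: on" or "tx-checksum-ipv4: off [fixed]".
_FEATURE_LINE = re.compile(r"^\s*([\w-]+):\s+(on|off)(?:\s+\[([^\]]*)\])?\s*$")


def parse_ethtool_features(text: str) -> Dict[str, Feature]:
    """Parse `ethtool -k <iface>` output into {feature name: Feature}."""
    features: Dict[str, Feature] = {}
    for line in text.splitlines():
        match = _FEATURE_LINE.match(line)
        if match is None:
            continue  # e.g. the "Features for eth1:" header line
        name, state, note = match.groups()
        features[name] = Feature(enabled=(state == "on"), fixed=(note == "fixed"))
    return features


def query_features(iface: str) -> str:
    """Run `ethtool -k` (query only, changes nothing) and return its output."""
    result = subprocess.run(["ethtool", "-k", iface],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

    On the board, parse_ethtool_features(query_features("eth1"))["rx-checksumming"].enabled would report whether RX checksum offload is active.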

  • Felix,

    There were some recent updates that remove unnecessary logging, which should help. You can pull the latest v5.4 kernel or cherry-pick the relevant patches (assuming the issue also reproduces on the EVM): https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/log/?h=ti-linux-5.4.y

    Also, can you try running netperf or iperf3 for performance measurement and report the results?
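    For iperf3, the JSON report (-J) records throughput and CPU load together. Below is a minimal Python sketch. It assumes iperf3 is installed on both ends and a TCP test is run. The key names follow iperf3's JSON report (end.sum_received, end.cpu_utilization_percent). The -R flag makes the server send and the client receive, so running the client on the EVM with reverse=True measures EVM RX. iperf3's CPU figure is measured for the iperf3 process itself and may not include all interrupt and softirq work done by the network driver. A system-wide view such as top is therefore a useful cross-check. run_iperf3 and parse_iperf3_json are hypothetical helper names.

```python
import json
import subprocess
from typing import NamedTuple


class Iperf3Result(NamedTuple):
    mbps: float        # throughput seen by the receiver, Mbit/s
    host_cpu: float    # CPU % of the local (client) iperf3 process
    remote_cpu: float  # CPU % of the remote (server) iperf3 process


def parse_iperf3_json(text: str) -> Iperf3Result:
    """Extract TCP throughput and CPU usage from `iperf3 -J` output."""
    end = json.loads(text)["end"]
    received_bps = end["sum_received"]["bits_per_second"]
    cpu = end["cpu_utilization_percent"]
    return Iperf3Result(received_bps / 1e6, cpu["host_total"], cpu["remote_total"])


def run_iperf3(server: str, seconds: int = 10, reverse: bool = False) -> Iperf3Result:
    """Run an iperf3 TCP client against `server` and parse the JSON report."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]
    if reverse:
        cmd.append("-R")  # server transmits, this client (e.g. the EVM) receives
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_iperf3_json(out)
```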

    Best regards,

    Dave

  • I picked up the latest j721e-cpsw-virt-mac.c and k3-cppi-desc-pool.c from the v5.4 kernel, but the throughput and CPU usage are unchanged; there is no obvious difference.

  • felix,

    Thanks for confirming. We don't have any other optimizations that can be quickly applied on top of SDK 7.0; as mentioned, we are evaluating options for improvements in future SDK releases.

    Please do try netperf and iperf3 as well.

    Best regards,

    Dave

  • Hi Dave,

    Since there is not much room for optimization on A72 Linux, is it possible to use the R5F to achieve high bandwidth with a low CPU load?

    The customer's goal is to use the interface to transmit as much data as possible.

    Thanks & Best Regards!

    ZM

  • ZM,

    As this thread is marked resolved, please start a new thread to discuss the R5F and/or distributing the host traffic load and related optimizations, so that we do not miss updates.

    Best regards,

    Dave