TDA4VM: CPSW9G performance issue on SDK 7.0

felix xu

Part Number: TDA4VM

Hi, we have a high network throughput requirement, about 800 Mbps.

In our architecture design we will use CPSW9G for network data exchange, but its CPU usage seems too high. Here are my test results:

iperf TCP TX on EVM board: 942 Mbits/sec with ~25% CPU usage

iperf TCP RX on EVM board: 811 Mbits/sec with ~60% CPU usage
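To make the two results easier to compare, they can be normalized to throughput per percentage point of CPU load. This is only a rough sketch using the figures quoted above; the function name is illustrative, and it assumes the CPU percentages were measured the same way in both directions.

```python
def mbps_per_cpu_percent(throughput_mbps: float, cpu_percent: float) -> float:
    """Throughput delivered per percentage point of CPU load."""
    if cpu_percent <= 0:
        raise ValueError("cpu_percent must be positive")
    return throughput_mbps / cpu_percent


# Figures taken from the iperf results in this post.
tx_eff = mbps_per_cpu_percent(942, 25)
rx_eff = mbps_per_cpu_percent(811, 60)

print(f"TX: {tx_eff:.1f} Mbit/s per CPU %")  # TX: 37.7 Mbit/s per CPU %
print(f"RX: {rx_eff:.1f} Mbit/s per CPU %")  # RX: 13.5 Mbit/s per CPU %
```

By this measure the receive path costs roughly 2.8 times more CPU per Mbit/s than the transmit path.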

So we wonder whether there are any techniques to reduce the CPU usage, since CPU resources are precious in our software design.

Any driver patches and Linux config tweaks are welcome.

Thanks.

over 5 years ago

0 Dave Bell over 5 years ago

TI__Genius 14680 points

felix,

Can you confirm whether checksum offload is currently enabled? You can check with the ethtool -k command.

It is supported and should be on by default, but if it is not enabled, turning it on is a possible quick improvement. Additional optimizations are under evaluation.

Best regards,

Dave

0 felix xu over 5 years ago in reply to Dave Bell

Intellectual 500 points

Hi Dave,

Yes, checksum offload is enabled by default:

root@j7-evm:~# ethtool -k eth1
Features for eth1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on [fixed]
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
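A long feature dump like this is easier to review if it is filtered down to the offloads that are off but not marked [fixed], since those are the only ones that can still be changed. The following is a minimal sketch that parses text in the format shown above; the function names are illustrative, and the sample is a subset of the lines from this output.

```python
def parse_ethtool_features(text):
    """Map feature name -> (state, fixed) from `ethtool -k` style output."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skips blank lines and the "Features for ethX:" header
        state = value.split()[0]          # "on" or "off"
        fixed = "[fixed]" in value        # driver does not allow changing it
        features[name.strip()] = (state, fixed)
    return features


def tunable_off_features(features):
    """Features that are currently off but not fixed by the driver."""
    return [n for n, (state, fixed) in features.items()
            if state == "off" and not fixed]


# Subset of the output posted above.
sample = """Features for eth1:
rx-checksumming: on
tx-checksum-ipv4: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
generic-receive-offload: on
tx-nocache-copy: off
rx-gro-hw: off [fixed]
"""

features = parse_ethtool_features(sample)
print(tunable_off_features(features))
```

On this sample it lists only tcp-segmentation-offload and tx-nocache-copy. Note that every TSO sub-feature in the full output is off [fixed], so the tcp-segmentation-offload summary line being unfixed does not by itself mean TSO can be turned on for this interface.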

0 Dave Bell over 5 years ago in reply to felix xu

TI__Genius 14680 points

Felix,

There were some recent updates that remove unnecessary logging, which should help. You can pull the latest v5.4 kernel or pick the relevant patches (assuming you are seeing this on the EVM too): https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/log/?h=ti-linux-5.4.y

Also, can you try running netperf or iperf3 for performance measurement and report the results?
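When repeating the measurement with netperf or iperf3, it helps to record CPU load the same way for every run. One possible approach, sketched below, is to sample the aggregate cpu line of /proc/stat before and after the test and compute the busy percentage over that window. This assumes a Linux target with the usual /proc/stat layout; the function names and the 10 second default interval are illustrative, not part of any TI tool.

```python
import time


def parse_cpu_line(line):
    """Return (total, idle) jiffies from the aggregate 'cpu' line of /proc/stat."""
    # Fields: user nice system idle iowait irq softirq steal
    fields = [int(v) for v in line.split()[1:9]]
    total = sum(fields)
    idle = fields[3] + fields[4]  # idle + iowait
    return total, idle


def cpu_busy_percent(before, after):
    """Busy percentage between two samples of the 'cpu' line."""
    t0, i0 = parse_cpu_line(before)
    t1, i1 = parse_cpu_line(after)
    dt = t1 - t0
    if dt <= 0:
        return 0.0
    return 100.0 * (dt - (i1 - i0)) / dt


def read_cpu_line(path="/proc/stat"):
    """Read the first (aggregate) line of /proc/stat."""
    with open(path) as f:
        return f.readline()


def sample_busy_percent(interval_s=10.0):
    """Measure CPU busy percentage over interval_s seconds on the target."""
    before = read_cpu_line()
    time.sleep(interval_s)
    return cpu_busy_percent(before, read_cpu_line())


# Synthetic samples so the arithmetic can be checked off-target.
before = "cpu  100 0 50 800 50 0 0 0 0 0"
after = "cpu  160 0 140 1150 50 0 0 0 0 0"
print(cpu_busy_percent(before, after))  # 30.0
```

On the board, sample_busy_percent() would be started in a second shell while the iperf3 or netperf run is in progress, so that the reported figure covers only the test window.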

Best regards,

Dave

0 felix xu over 5 years ago in reply to Dave Bell

Intellectual 500 points

I picked up the latest j721e-cpsw-virt-mac.c and k3-cppi-desc-pool.c from the v5.4 kernel. The throughput and CPU usage are still the same, with no obvious difference.

0 Dave Bell over 5 years ago in reply to felix xu

TI__Genius 14680 points

felix,

Thanks for confirming. We don't have other optimizations that can be quickly applied on top of SDK 7.0 and, as mentioned, we are looking at options for improvements in future SDK releases.

Please do try netperf and iperf3 as well.

Best regards,

Dave

0 Ming Zhong10 over 5 years ago in reply to Dave Bell

TI__Intellectual 2235 points

Hi Dave,

Since there is not much optimization available on A72 Linux, is it possible to use the R5F to achieve high bandwidth with low CPU load?

Their target is to utilize the interface to transmit as much as possible.

Thanks & Best Regards!

0 Dave Bell over 5 years ago in reply to Ming Zhong10

TI__Genius 14680 points

ZM,

As this thread is marked resolved, please do start a new thread for discussion on R5F and/or distributing the host traffic loading/optimizations so we do not miss updates.

Best regards,

Dave
