AM3352: USB Ethernet TX errors

Part Number: AM3352

We have issues with a USB Ethernet controller (lan9500, smsc95xx driver) on a custom board with an AM3352. When DMA is enabled, packets suddenly can no longer be transmitted and the TX error count increases. This occurs under moderate traffic after anywhere from a few hours to several weeks. It cannot be reproduced by simply transmitting a terabyte of data; the packet count seems to be more influential than the byte count.


The problem is present with all kernel versions we have used so far (4.0, 4.4, 4.14; kernel.org and git.ti.com). Currently we are using 4.14.172 based on the ti-linux-4.14.y branch with the missing stable kernel patches applied. The lan9500 is connected through a USB hub and has an Ethernet switch attached to it. The switch is controlled by a Distributed Switch Architecture (DSA) driver, but the same is used on the cpsw Ethernet, where it works fine. I don't think the USB hub is causing problems either, because there have been similar errors with other devices (CDC Ether, or another lan9500 without the switch) where the TX error count increased. However, those resulted in netdev watchdog errors. The likelihood of the error occurring increases if more (up to 3) lan9500 devices are attached to the USB hub. Once it has occurred, it is also a lot more likely to recur.

The workaround so far has been to check the TX error counter on the network interface and restart the interface once the counter increases. This causes packet loss and a downtime of at least a few seconds. A big customer will no longer accept that, and furthermore restarting the interface is not fully reliable: it might simply stop working, and the only workaround then is to reboot the whole device. Receiving packets still works, and another lan9500 attached to the USB hub can continue to function properly.
Turning off DMA is not an option, because of performance issues.
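For reference, the workaround described above can be sketched roughly as follows. This is a minimal illustration, not the script we actually run: it assumes Python and the iproute2 `ip` tool are available on the target, reads the standard sysfs counter `/sys/class/net/<iface>/statistics/tx_errors`, and the function names, the polling interval and the `max_polls` parameter are made up for the example.

```python
import subprocess
import time
from pathlib import Path


def read_tx_errors(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's tx_errors counter for the given interface."""
    path = Path(sysfs_root) / iface / "statistics" / "tx_errors"
    return int(path.read_text().strip())


def restart_interface(iface):
    """Bounce the interface (assumption: iproute2 `ip` is installed)."""
    subprocess.run(["ip", "link", "set", "dev", iface, "down"], check=True)
    subprocess.run(["ip", "link", "set", "dev", iface, "up"], check=True)


def watch(ifaces, poll_s=1.0, read=read_tx_errors,
          restart=restart_interface, max_polls=None):
    """Poll tx_errors and restart any interface whose counter increased.

    Returns the number of restarts performed. max_polls=None loops forever;
    read/restart are injectable so the loop can be exercised off-target.
    """
    last = {iface: read(iface) for iface in ifaces}
    restarts = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll_s)
        polls += 1
        for iface in ifaces:
            now = read(iface)
            if now > last[iface]:
                restart(iface)
                restarts += 1
                # Take a fresh baseline after the restart.
                now = read(iface)
            last[iface] = now
    return restarts
```

On the board this would be started as `watch(["swm4", "swm5"])`. It has the same weaknesses as the real workaround: traffic is lost while the interface is down, and the restart itself may not recover the link.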

Here is an example of the error occurring. The network interfaces are called swm4 and swm5.

[142502.538889] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142502.538966] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142502.539055] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142502.539187] smsc95xx 2-1.2:1.0 swm4: kevent 0 may have been dropped
[142502.539253] smsc95xx 2-1.2:1.0 swm4: kevent 0 may have been dropped
[142502.568045] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142502.568147] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142507.341366] net_ratelimit: 298 callbacks suppressed
[142507.341389] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142507.341458] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142507.341520] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142507.601300] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142507.601380] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142507.601447] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142507.601567] smsc95xx 2-1.2:1.0 swm4: kevent 0 may have been dropped
[142507.601630] smsc95xx 2-1.2:1.0 swm4: kevent 0 may have been dropped
[142507.754775] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped
[142507.754853] smsc95xx 2-1.1:1.0 swm5: kevent 0 may have been dropped

~ # ifconfig swm4
swm4      Link encap:Ethernet  HWaddr 00:05:B6:99:88:78  
          inet6 addr: fe80::205:b6ff:fe99:8878/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:183313 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3478853 errors:70475 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:35783060 (34.1 MiB)  TX bytes:409513090 (390.5 MiB)

~ # ifconfig swm5
swm5      Link encap:Ethernet  HWaddr 00:05:B6:03:10:52  
          inet6 addr: fe80::205:b6ff:fe03:1052/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:344310 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3967330 errors:58438 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:29199213 (27.8 MiB)  TX bytes:337955614 (322.2 MiB)


I will give kernel 5.4 a try, but it will take me at least a week to merge my other changes and run the tests.
Any suggestions?

  • Hi Markus,

    What kind of test program do you run? Can you connect the lan9500 device to an AM3352 EVM (for example the GP EVM or BeagleBone Black) to run your test? That way, I think you would not have to spend time merging your changes.

    I would recommend testing the TI 4.19.y kernel and the community v5.6 kernel. I think some CPPI41 DMA driver patches were added in recent kernels. If the issue still happens with the latest TI and community kernels, the next step would be to connect a LeCroy USB protocol analyzer to capture the USB bus traffic. It would still be challenging to figure out how to trigger the analyzer to stop once the problem happens, because you mentioned the issue takes hours or weeks to appear.

  • Hi Bin, thank you for your comment.
    I forgot to mention that I have not yet been able to trigger the error with any deterministic test program. A busy corporate network, or PLCs talking Profinet, can trigger the error. I suspect that broadcast and multicast packets trigger it. It is a 4-port switch, so the kernel would send 4 packets "at once".
    I now have a BBB running the 5.6 community kernel with 2 consumer lan9500 devices attached. I might be able to mimic the behavior of the DSA with a bridge with lots of VLAN interfaces.
    I don't see why a USB analyzer would help if it runs perfectly smoothly with DMA disabled.
    The cppi41 fixes you were referring to are already included in 4.14.152, I think. There were other musb fixes in .165 and .172; at least I saw 2 other issues solved. There was a high load when disconnecting a wireless modem, and a LAN7850 did not work properly when connected through a USB hub.

  • Hi Markus,

    A USB analyzer provides USB bus traces, which could help me to understand the data pattern in your use case. This is typically the first step of the debug process.

    Please let me know your test result with BBB. If you could reproduce the symptom with BBB or AM335x GP EVM and I could replicate the same, it would help me to debug the issue locally.
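
The trigger suspected in this thread (bursts of small broadcast frames, about four back to back for a 4-port switch) could be approximated on a test board with a sketch like the one below. It is only a guess at a reproducer: nothing in the thread confirms that this pattern triggers the TX errors. The function names, burst size, payload length and gap are hypothetical, `SO_BINDTODEVICE` is Linux-only and normally needs root, and the interface is assumed to have an IPv4 address.

```python
import socket
import time


def open_broadcast_socket(iface=None):
    """Create a UDP socket that is allowed to send broadcast datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    if iface is not None:
        # Linux-only: force all traffic out of one interface (needs root).
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                        iface.encode())
    return sock


def send_bursts(sock, dest, bursts, burst_size=4, payload_len=64, gap_s=0.001):
    """Send `bursts` groups of `burst_size` small datagrams back to back.

    The goal is a high packet count with little data, since packet count
    rather than byte count appeared to matter. Returns datagrams sent.
    """
    payload = bytes(payload_len)  # zero-filled dummy payload
    sent = 0
    for _ in range(bursts):
        for _ in range(burst_size):
            sock.sendto(payload, dest)
            sent += 1
        time.sleep(gap_s)  # short pause between bursts
    return sent
```

A run might look like `send_bursts(open_broadcast_socket("swm5"), ("255.255.255.255", 9), bursts=1_000_000)`, where port 9 (discard) is an arbitrary choice, while watching the interface's TX error counter.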