Wifi stops working with dmesg log `transmit queue 2 timed out`

Hi,

    We are using a BeagleBone-based custom board running Linux 3.12.10, with a Marvell Wi-Fi chip (88w8801) connected to the MMC0 port. Everything seemed to work fine until we deployed the software at one specific alpha site. There, after a random amount of time and a random number of client connections, the Wi-Fi just hangs (no station can connect, and clients that were already connected can no longer communicate), and I see the following dmesg logs.

Note: The kernel is not hung; I am able to log in and communicate through the wired interface. The problem affects only Wi-Fi. Once this happens, I am also unable to communicate with the Wi-Fi chip. However, the SSID continues to be broadcast by the Wi-Fi chip.


```
[ 1870.342582] ADDBA REQ: 80:XX:XX:XX:f8:45 tid=0 ssn=0 win_size=64,amsdu=0
[ 1871.609920] uap0: 1470354595.716682 : Event: 0x55
[ 1871.609975] EVENT: TX_DATA_PAUSE
[ 1871.609994] TxPause: 80:XX:XX:XX:f8:45 pause=1, pkts=5
[ 1871.668451] uap0: 1470354595.775212 : Event: 0x55
[ 1871.668498] EVENT: TX_DATA_PAUSE
[ 1871.668516] TxPause: 80:XX:XX:XX:f8:45 pause=0, pkts=2
[ 2110.089467] ------------[ cut here ]------------
[ 2110.089525] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x254/0x260()
[ 2110.089539] NETDEV WATCHDOG: uap0 (wlan_sdio): transmit queue 2 timed out
[ 2110.089547] Modules linked in: sd8xxx(O) mlan(PO)
[ 2110.089581] CPU: 0 PID: 0 Comm: swapper Tainted: P           O 3.12.10-005-ts-armv7l #1
[ 2110.089633] [<c0012d24>] (unwind_backtrace+0x0/0xf4) from [<c0011130>] (show_stack+0x10/0x14)
[ 2110.089672] [<c0011130>] (show_stack+0x10/0x14) from [<c0036090>] (warn_slowpath_common+0x6c/0x84)
[ 2110.089698] [<c0036090>] (warn_slowpath_common+0x6c/0x84) from [<c00360d8>] (warn_slowpath_fmt+0x30/0x40)
[ 2110.089721] [<c00360d8>] (warn_slowpath_fmt+0x30/0x40) from [<c033a4a8>] (dev_watchdog+0x254/0x260)
[ 2110.089754] [<c033a4a8>] (dev_watchdog+0x254/0x260) from [<c003e7d4>] (call_timer_fn.isra.37+0x24/0x88)
[ 2110.089780] [<c003e7d4>] (call_timer_fn.isra.37+0x24/0x88) from [<c003e978>] (run_timer_softirq+0x140/0x1b4)
[ 2110.089802] [<c003e978>] (run_timer_softirq+0x140/0x1b4) from [<c00393c0>] (__do_softirq+0xf4/0x1d0)
[ 2110.089822] [<c00393c0>] (__do_softirq+0xf4/0x1d0) from [<c0039534>] (do_softirq+0x4c/0x54)
[ 2110.089841] [<c0039534>] (do_softirq+0x4c/0x54) from [<c00397ac>] (irq_exit+0xa0/0xe8)
[ 2110.089862] [<c00397ac>] (irq_exit+0xa0/0xe8) from [<c000eaa8>] (handle_IRQ+0x3c/0x84)
[ 2110.089885] [<c000eaa8>] (handle_IRQ+0x3c/0x84) from [<c0008770>] (omap3_intc_handle_irq+0x68/0x74)
[ 2110.089911] [<c0008770>] (omap3_intc_handle_irq+0x68/0x74) from [<c041ce40>] (__irq_svc+0x40/0x50)
[ 2110.089922] Exception stack(0xc05c3f68 to 0xc05c3fb0)
[ 2110.089940] 3f60:                   00000000 00000000 00000000 00000000 c05c2000 c05ca094
[ 2110.089958] 3f80: c05c2000 c0614308 00000001 c0614308 c05c2000 c05c2000 01000000 c05c3fb0
[ 2110.089971] 3fa0: c000ec10 c000ec14 60000013 ffffffff
[ 2110.089994] [<c041ce40>] (__irq_svc+0x40/0x50) from [<c000ec14>] (arch_cpu_idle+0x2c/0x30)
[ 2110.090025] [<c000ec14>] (arch_cpu_idle+0x2c/0x30) from [<c005c2e4>] (cpu_startup_entry+0xc0/0xf8)
[ 2110.090057] [<c005c2e4>] (cpu_startup_entry+0xc0/0xf8) from [<c058b9f8>] (start_kernel+0x2dc/0x2e8)
[ 2110.090070] ---[ end trace 76d71ba5357e91cd ]---
[ 2110.090085] 181008 : uap0 (bss=1): Tx timeout (1)
[ 2192.169473] 189216 : uap0 (bss=1): Tx timeout (2)
[ 2274.089476] 197408 : uap0 (bss=1): Tx timeout (3)
[ 2356.169471] 205616 : uap0 (bss=1): Tx timeout (4)
[ 2438.089472] 213808 : uap0 (bss=1): Tx timeout (5)
```

I'm not sure whether this is an MMC driver issue or a Wi-Fi chip issue. I tried googling the dmesg error `transmit queue 2 timed out` but was not very successful in finding a solution.
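
For anyone triaging a similar log, here is a minimal Python sketch that pulls the watchdog warning and the driver's `Tx timeout` lines out of dmesg text and reports the spacing between the timeouts. The regular expressions are modelled only on the lines shown above, so they are an assumption about the format and may need adjusting for other driver versions; the function name `summarize` and the `SAMPLE` string are made up for illustration.

```python
import re

# Kernel watchdog warning, modelled on the line in the log above:
# "[ 2110.089539] NETDEV WATCHDOG: uap0 (wlan_sdio): transmit queue 2 timed out"
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NETDEV WATCHDOG: (?P<dev>\S+) \((?P<drv>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)

# Driver message, modelled on the line in the log above:
# "[ 2192.169473] 189216 : uap0 (bss=1): Tx timeout (2)"
TX_TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\d+ : (?P<dev>\S+) \(bss=\d+\): "
    r"Tx timeout \((?P<count>\d+)\)"
)


def summarize(dmesg_text):
    """Collect watchdog warnings, Tx timeout events, and the gaps between them."""
    watchdog = []
    tx_timeouts = []
    for line in dmesg_text.splitlines():
        m = WATCHDOG_RE.search(line)
        if m:
            watchdog.append((float(m["ts"]), m["dev"], m["drv"], int(m["queue"])))
            continue
        m = TX_TIMEOUT_RE.search(line)
        if m:
            tx_timeouts.append((float(m["ts"]), m["dev"], int(m["count"])))
    # Seconds between consecutive "Tx timeout" messages.
    gaps = [round(b[0] - a[0], 2) for a, b in zip(tx_timeouts, tx_timeouts[1:])]
    return {"watchdog": watchdog, "tx_timeouts": tx_timeouts, "gaps": gaps}


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[ 2110.089539] NETDEV WATCHDOG: uap0 (wlan_sdio): transmit queue 2 timed out
[ 2110.090085] 181008 : uap0 (bss=1): Tx timeout (1)
[ 2192.169473] 189216 : uap0 (bss=1): Tx timeout (2)
[ 2274.089476] 197408 : uap0 (bss=1): Tx timeout (3)
"""

if __name__ == "__main__":
    result = summarize(SAMPLE)
    print("watchdog warnings:", result["watchdog"])
    print("seconds between Tx timeouts:", result["gaps"])
```

On a live system the input could come from saving the output of `dmesg` to a file and passing its contents to `summarize`. Regularly spaced timeouts, as in this log, suggest the queue never recovers after the first stall.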

Any help regarding this issue is highly appreciated.

Thanks
Shankar

  • Hi Shankar,

    I need more information about how the issue can be reproduced. Could you also clarify what you mean by "we deployed the software in one specific alpha site"?

    BR
    Tsvetolin Shulev
  • Hi Tsvetolin Shulev,
    What I meant was that we placed our product at a site to test its features. Something about this site seems to be unique, as the "transmit queue 2 timed out" error only happens here. We haven't seen this issue at the other places where we have run the product.

    We have not been able to figure out what is unique about this environment (or whether the cause is something else). The site has a couple of other routers, such as Cisco and Linksys models (similar to other places where the issue is not seen).

    I'm really concerned about the "transmit queue 2 timed out" error in dmesg. What does it mean, and when does it happen?
    I haven't been able to find a good answer yet.

    Thanks
    Shankar
  • We found out a bit more. The issue sometimes happens when we connect to the product through WinSCP on a laptop that is itself connected to the product's Wi-Fi.
    WinSCP is configured to connect to the product over SFTP using SSH keys.
    There seems to be a lot of traffic while WinSCP is logging in, which seems to cause this issue.

    Note: I was able to reproduce this with WinSCP multiple times, but not every time, so it seems to be random.

    Is there a way we could end up with a transmit queue timeout when there is a lot of traffic?

    Thanks
    Shankar
  • Hi Shankar,

    Sorry for the delay. The message "NETDEV WATCHDOG: %s (%s): transmit queue 2 timed out" is printed when a transmit queue of the network device times out; the two "%s" fields are the device name and the driver name. See the dev_watchdog(...) function in ../linux/net/sched/sch_generic.c.
    I can only guess at what causes this timeout, but I suggest searching the Wi-Fi chip documentation for known issues or limitations related to incompatibility with certain Wi-Fi router settings.
    Another guess of mine is that WinSCP is trying to send traffic at a higher rate than the maximum throughput of the Wi-Fi chip.

    BR
    Tsvetolin Shulev
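
To make the watchdog condition above more concrete, here is a simplified Python model of the check that dev_watchdog() performs, as far as I understand it: a queue counts as timed out when the driver has stopped it and more than the device's watchdog timeout has passed since the last transmit. This is only an illustrative sketch, not kernel code. The class and function names are hypothetical, and the 5-second timeout and the queue timestamps are assumed values, not taken from the Marvell driver.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TxQueue:
    stopped: bool       # True when the driver has stopped the queue
    trans_start: float  # time of the last transmit on this queue, in seconds


def first_timed_out_queue(queues: List[TxQueue], now: float,
                          watchdog_timeo: float) -> Optional[int]:
    """Return the index of the first stopped queue whose last transmit
    is older than the watchdog timeout, or None if no queue qualifies."""
    for index, queue in enumerate(queues):
        if queue.stopped and now > queue.trans_start + watchdog_timeo:
            return index
    return None


def watchdog_message(dev: str, driver: str, queues: List[TxQueue],
                     now: float, watchdog_timeo: float) -> Optional[str]:
    """Build a warning in the same shape as the one seen in dmesg."""
    index = first_timed_out_queue(queues, now, watchdog_timeo)
    if index is None:
        return None
    return f"NETDEV WATCHDOG: {dev} ({driver}): transmit queue {index} timed out"


if __name__ == "__main__":
    # Hypothetical state: queue 2 was stopped long ago and never restarted.
    queues = [
        TxQueue(stopped=False, trans_start=2109.0),
        TxQueue(stopped=False, trans_start=2108.5),
        TxQueue(stopped=True, trans_start=1871.7),
        TxQueue(stopped=False, trans_start=2109.9),
    ]
    # watchdog_timeo=5.0 is an assumed value chosen for illustration.
    print(watchdog_message("uap0", "wlan_sdio", queues,
                           now=2110.0, watchdog_timeo=5.0))
```

In this model a queue that is busy but still running never triggers the warning. Only a queue that was stopped and never restarted in time does, which fits a chip or firmware that stops completing transmissions.
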
  • It looks like the issue was in the Wi-Fi chip, as you suggested. The chip seems to lock up due to some issue.

    The Marvell guys gave us new firmware, and this seems to have solved the issue for us.

    Thanks for all the help.

    Thanks

    Shankar

  • Hi Shankar,

    Thank you for the update!

    BR

    Tsvetolin Shulev