Other Parts Discussed in Thread: WL1837
Hi,
We have two devices, both using the WL1837 Wi-Fi chip, connected to each other: one acts as the access point and the other operates as a client (station mode). No other device is connected to the Wi-Fi network.
The 5 GHz band, channel 48, is used.
Linux kernel 4.14.x is used.
A custom software protocol with timeout values of around 150 ms runs between the two systems.
The problem is that every few hours (1-3 hours) we get timeouts/disconnections associated with the following kernel error: kernel: wlcore: ERROR SW watchdog interrupt received! starting recovery.
When the issue happens, we see the following kernel log on the access point side:
Jun 22 19:23:47 x5_Test user.err kernel: wlcore: ERROR SW watchdog interrupt received! starting recovery.
Jun 22 19:23:47 x5_Test user.warn kernel: ------------[ cut here ]------------
Jun 22 19:23:47 x5_Test user.warn kernel: WARNING: CPU: 0 PID: 584 at /home/autosvn/work/exorint-1.3.x/build/tmp/work-shared/ns01-x5xx/kernel-source/drivers/net/wireless/ti/wlcore/main.c:795 wl12xx_queue_recovery_work+0x64/0x68
Jun 22 19:23:47 x5_Test user.warn kernel: Modules linked in: wl18xx caam_jr wlcore_sdio secvio caam
Jun 22 19:23:47 x5_Test user.warn kernel: CPU: 0 PID: 584 Comm: irq/133-s-wl18x Not tainted 4.14.78-rt47 #1
Jun 22 19:23:47 x5_Test user.warn kernel: Hardware name: Freescale i.MX6 UltraLite (Device Tree)
Jun 22 19:23:47 x5_Test user.warn kernel: [<8010e83c>] (unwind_backtrace) from [<8010adf0>] (show_stack+0x10/0x14)
Jun 22 19:23:47 x5_Test user.warn kernel: [<8010adf0>] (show_stack) from [<808aef84>] (dump_stack+0x78/0x8c)
Jun 22 19:23:47 x5_Test user.warn kernel: [<808aef84>] (dump_stack) from [<80125b18>] (__warn+0xe4/0x100)
Jun 22 19:23:47 x5_Test user.warn kernel: [<80125b18>] (__warn) from [<801257dc>] (warn_slowpath_null+0x20/0x28)
Jun 22 19:23:47 x5_Test user.warn kernel: [<801257dc>] (warn_slowpath_null) from [<8058eaa4>] (wl12xx_queue_recovery_work+0x64/0x68)
Jun 22 19:23:47 x5_Test user.warn kernel: [<8058eaa4>] (wl12xx_queue_recovery_work) from [<8058ee68>] (wlcore_irq+0xf8/0x154)
Jun 22 19:23:47 x5_Test user.warn kernel: [<8058ee68>] (wlcore_irq) from [<8016b618>] (irq_thread_fn+0x1c/0x54)
Jun 22 19:23:47 x5_Test user.warn kernel: [<8016b618>] (irq_thread_fn) from [<8016b8c4>] (irq_thread+0x11c/0x1d4)
Jun 22 19:23:47 x5_Test user.warn kernel: [<8016b8c4>] (irq_thread) from [<801420fc>] (kthread+0x124/0x154)
Jun 22 19:23:47 x5_Test user.warn kernel: [<801420fc>] (kthread) from [<80107630>] (ret_from_fork+0x14/0x24)
Jun 22 19:23:47 x5_Test user.warn kernel: ---[ end trace 0000000000000002 ]---
Jun 22 19:23:47 x5_Test user.info kernel: wlcore: Hardware recovery in progress. FW ver: Rev 8.9.0.0.81
Jun 22 19:23:47 x5_Test user.info kernel: wlcore: pc: 0x18f7a, hint_sts: 0x00000000 count: 1
Jun 22 19:23:47 x5_Test user.info kernel: wlcore: down
Jun 22 19:23:47 x5_Test user.info kernel: ieee80211 phy0: Hardware restart was requested
Jun 22 19:23:48 x5_Test user.info kernel: wlcore: using inverted interrupt logic: 2
Jun 22 19:23:48 x5_Test user.info kernel: wlcore: PHY firmware version: Rev 8.2.0.0.243
Jun 22 19:23:48 x5_Test user.info kernel: wlcore: firmware booted (Rev 8.9.0.0.81)
At the same time, the following kernel log appears on the other device (the client side):
Jun 22 19:23:47 HMI-aba6 user.err kernel: wlcore: ERROR SW watchdog interrupt received! starting recovery.
Jun 22 19:23:47 HMI-aba6 user.warn kernel: ------------[ cut here ]------------
Jun 22 19:23:47 HMI-aba6 user.warn kernel: WARNING: CPU: 0 PID: 606 at /home/autosvn/work/exorint-1.3.x/build/tmp/work-shared/ns01-x5xx/kernel-source/drivers/net/wireless/ti/wlcore/main.c:795 wl12xx_queue_recovery_work+0x64/0x68
Jun 22 19:23:47 HMI-aba6 user.warn kernel: Modules linked in: wl18xx caam_jr asix usbnet mii wlcore_sdio secvio caam
Jun 22 19:23:47 HMI-aba6 user.warn kernel: CPU: 0 PID: 606 Comm: irq/84-s-wl18xx Not tainted 4.14.78-rt47 #1
Jun 22 19:23:47 HMI-aba6 user.warn kernel: Hardware name: Freescale i.MX6 UltraLite (Device Tree)
Jun 22 19:23:47 HMI-aba6 user.warn kernel: [<8010e83c>] (unwind_backtrace) from [<8010adf0>] (show_stack+0x10/0x14)
Jun 22 19:23:47 HMI-aba6 user.warn kernel: [<8010adf0>] (show_stack) from [<808aef84>] (dump_stack+0x78/0x8c)
Jun 22 19:23:47 HMI-aba6 user.warn kernel: [<808aef84>] (dump_stack) from [<80125b18>] (__warn+0xe4/0x100)
Jun 22 19:23:47 HMI-aba6 user.warn kernel: [<80125b18>] (__warn) from [<801257dc>] (warn_slowpath_null+0x20/0x28)
Jun 22 19:23:47 HMI-aba6 user.warn kernel: [<801257dc>] (warn_slowpath_null) from [<8058eaa4>] (wl12xx_queue_recovery_work+0x64/0x68)
Jun 22 19:23:47 HMI-aba6 user.warn kernel: [<8058eaa4>] (wl12xx_queue_recovery_work) from [<8058ee68>] (wlcore_irq+0xf8/0x154)
Jun 22 19:23:47 HMI-aba6 user.warn kernel: [<8058ee68>] (wlcore_irq) from [<8016b618>] (irq_thread_fn+0x1c/0x54)
Jun 22 19:23:47 HMI-aba6 user.warn kernel: [<8016b618>] (irq_thread_fn) from [<8016b8c4>] (irq_thread+0x11c/0x1d4)
Jun 22 19:23:47 HMI-aba6 user.warn kernel: [<8016b8c4>] (irq_thread) from [<801420fc>] (kthread+0x124/0x154)
Jun 22 19:23:47 HMI-aba6 user.warn kernel: [<801420fc>] (kthread) from [<80107630>] (ret_from_fork+0x14/0x24)
Jun 22 19:23:47 HMI-aba6 user.warn kernel: ---[ end trace 0000000000000002 ]---
Jun 22 19:23:47 HMI-aba6 user.info kernel: wlcore: Hardware recovery in progress. FW ver: Rev 8.9.0.0.81
Jun 22 19:23:47 HMI-aba6 user.info kernel: wlcore: pc: 0x18f7a, hint_sts: 0x00000000 count: 1
Jun 22 19:23:47 HMI-aba6 user.info kernel: wlcore: down
Jun 22 19:23:47 HMI-aba6 user.info kernel: ieee80211 phy0: Hardware restart was requested
Jun 22 19:23:47 HMI-aba6 user.notice jmlauncher: channelReqReadyRead process last data 2
Jun 22 19:23:48 HMI-aba6 user.info kernel: wlcore: using inverted interrupt logic: 2
Jun 22 19:23:48 HMI-aba6 user.info kernel: wlcore: PHY firmware version: Rev 8.2.0.0.243
Jun 22 19:23:48 HMI-aba6 user.info kernel: wlcore: firmware booted (Rev 8.9.0.0.81)
Jun 22 19:23:48 HMI-aba6 user.info kernel: wlcore: Association completed.
Jun 22 19:23:48 HMI-aba6 user.info kernel: wlcore: Beacon loss detected. roles:0x1
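To quantify how often the recovery happens on each side, the logs can be scanned for the watchdog line. The following is only a minimal sketch, not part of our setup: the function names are hypothetical, and since syslog timestamps carry no year, the `year` argument is an arbitrary placeholder that only matters for date arithmetic across a year boundary.

```python
import re
from datetime import datetime

# Matches syslog lines in the format shown above, e.g.
# "Jun 22 19:23:47 x5_Test user.err kernel: wlcore: ERROR SW watchdog interrupt received! ..."
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) .*"
    r"wlcore: ERROR SW watchdog interrupt received"
)


def watchdog_events(lines, year=2000):
    """Return a list of (timestamp, host) for every SW watchdog line."""
    events = []
    for line in lines:
        m = WATCHDOG_RE.match(line)
        if not m:
            continue
        # Normalise padding ("Jun  2" -> "Jun 2") and add the assumed year.
        stamp = " ".join(m.group("stamp").split())
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((ts, m.group("host")))
    return events


def intervals_minutes(events):
    """Minutes between consecutive watchdog events, in time order."""
    times = sorted(ts for ts, _ in events)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
```

Feeding each device's saved syslog file line by line into `watchdog_events` and passing the result to `intervals_minutes` gives the spacing between recoveries, which can then be compared between the access point and the client.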
Note that the situation has improved (the issue is less frequent) after upgrading the WL18xx firmware to the latest version, 8.9.0.0.81, but we still see errors like those above.
The two communicating systems are very close to each other (less than 1 m apart) for the whole duration of the test.
Is this a known issue? Is there a fix available?
Very, very urgent.
Thanks,
Stefano