This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

WL1837MOD: Driver randomly crashes with R8.7_SP3

Part Number: WL1837MOD

We have an embedded device that utilizes the TI WL1837MOD module, and in order to resolve some performance issues we were facing with 802.11s mesh, we recently switched drivers & wpa_supplicant versions from mainline releases to the TI forks. Our current versions in use are as follows:

  • Kernel: v3.10 (mainline)
  • wl18xx driver: TI kernel driver - R8.7_SP3
  • wl18xx firmware: 8.9.0.0.79
  • wpa_supplicant/hostapd: 2.6 (TI fork - R8.7_SP3)

We saw the improved mesh speeds that we were looking for, but we've also been experiencing occasional driver crashes. They appear to occur randomly. For example, over the course of 14 hours of uptime, our device saw 4 different crashes at 8, 8 1/2, 9, and 13 1/2 hours of uptime.

A snippet from the crash can be seen below:

[48635.774871] wlcore: Scan completed due to error.
[48635.779535] ------------[ cut here ]------------
[48635.784245] WARNING: at /home/kollisc/ssp-linux/lib_oss/linux-backports-a/drivers/net/wireless/ti/wlcore/main.c:807 wl1271_op_cancel_hw_scan+0x83c/0x840 [wlcore]()
[48635.798894] Modules linked in: ieee1609dot3(PO) ieee1609dot4(PO) ieee1609gnl(PO) cw_llc(O) cw_os(O) cw_pps(O) cryptodev(O) wl18xx(O) wlcore(O) mac80211(O) cfg80211(O) wlcore_sdio(O) compat(O)
[48635.816182] CPU: 0 PID: 1893 Comm: kworker/u2:0 Tainted: P        W  O 3.10.108-ltsi #2
[48635.824265] Workqueue: phy0 wl1271_scan_complete_work [wlcore]
[48635.830165] [<c0012874>] (unwind_backtrace+0x0/0xf4) from [<c0011134>] (show_stack+0x10/0x14)
[48635.838737] [<c0011134>] (show_stack+0x10/0x14) from [<c002a5ec>] (warn_slowpath_common+0x54/0x68)
[48635.847772] [<c002a5ec>] (warn_slowpath_common+0x54/0x68) from [<c002a69c>] (warn_slowpath_null+0x1c/0x24)
[48635.857507] [<c002a69c>] (warn_slowpath_null+0x1c/0x24) from [<bf0c66d4>] (wl1271_op_cancel_hw_scan+0x83c/0x840 [wlcore])
[48635.868580] [<bf0c66d4>] (wl1271_op_cancel_hw_scan+0x83c/0x840 [wlcore]) from [<bf0d99c8>] (wl1271_scan_complete_work+0x100/0x104 [wlcore])
[48635.881173] [<bf0d99c8>] (wl1271_scan_complete_work+0x100/0x104 [wlcore]) from [<c0041094>] (process_one_work+0xf4/0x344)
[48635.892178] [<c0041094>] (process_one_work+0xf4/0x344) from [<c004141c>] (worker_thread+0x138/0x3d4)
[48635.901352] [<c004141c>] (worker_thread+0x138/0x3d4) from [<c0046cf0>] (kthread+0xb4/0xb8)
[48635.909657] [<c0046cf0>] (kthread+0xb4/0xb8) from [<c000dfa0>] (ret_from_fork+0x14/0x34)
[48635.917771] ---[ end trace b9d5c2d586feab90 ]---
[48635.954584] wlcore: Hardware recovery in progress. FW ver: Rev 8.9.0.0.79
[48635.975226] wlcore: pc: 0x0, hint_sts: 0x00000000 count: 4
[48635.981401] wlcore: down
[48635.989523] wlcore: down
[48635.995332] ieee80211 phy0: Hardware restart was requested
[48636.489759] wlcore: PHY firmware version: Rev 8.2.0.0.242
[48636.680608] wlcore: firmware booted (Rev 8.9.0.0.79)
[48636.709952] wlcore: Association completed.
[48636.742602] wlan0: deauthenticated from 08:60:6e:bb:5b:40 (Reason: 7=CLASS3_FRAME_FROM_NONASSOC_STA)
[48636.880554] cfg80211: World regulatory domain updated:
[48636.885718] cfg80211:  DFS Master region: unset
[48636.890115] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[48636.899895] cfg80211:   (2402000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 600 mBm), (N/A)
[48637.277447] cfg80211: Regulatory domain changed to country: CA
[48637.283325] cfg80211:  DFS Master region: FCC
[48637.287546] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[48637.297330] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 3000 mBm), (N/A)
[48637.305352] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 1700 mBm), (N/A)
[48637.313410] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz), (N/A, 2400 mBm), (0 s)
[48637.321444] cfg80211:   (5490000 KHz - 5730000 KHz @ 80000 KHz), (N/A, 2400 mBm), (0 s)
[48637.329531] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 3000 mBm), (N/A)
[48643.669927] wlcore: down

Any ideas as to why we're seeing this crash? Please let me know what information I can provide to help diagnose the issue.

  • Hi,

    Can you pls re-test with latest firmware version : https://git.ti.com/cgit/wilink8-wlan/wl18xx_fw/

    Thanks

    Saurabh

  • Thanks for the response. It looks like we still hit the driver crash. Below is a snippet from dmesg:

    [ 4233.294824] wlcore: Scan completed due to error.
    [ 4233.299485] ------------[ cut here ]------------
    [ 4233.304179] WARNING: at /home/kollisc/ssp-linux/lib_oss/linux-backports-a/drivers/net/wireless/ti/wlcore/main.c:807 wl1271_op_cancel_hw_scan+0x83c/0x840 [wlcore]()
    [ 4233.318865] Modules linked in: wl18xx(O) wlcore(O) cryptodev(O) mac80211(O) cfg80211(O) wlcore_sdio(O) compat(O) [last unloaded: wlcore]
    [ 4233.331329] CPU: 0 PID: 20447 Comm: kworker/u2:1 Tainted: G        W  O 3.10.108-ltsi #2
    [ 4233.339495] Workqueue: phy3 wl1271_scan_complete_work [wlcore]
    [ 4233.345371] [<c0012874>] (unwind_backtrace+0x0/0xf4) from [<c0011134>] (show_stack+0x10/0x14)
    [ 4233.353947] [<c0011134>] (show_stack+0x10/0x14) from [<c002a5ec>] (warn_slowpath_common+0x54/0x68)
    [ 4233.362936] [<c002a5ec>] (warn_slowpath_common+0x54/0x68) from [<c002a69c>] (warn_slowpath_null+0x1c/0x24)
    [ 4233.372658] [<c002a69c>] (warn_slowpath_null+0x1c/0x24) from [<bf18e6d4>] (wl1271_op_cancel_hw_scan+0x83c/0x840 [wlcore])
    [ 4233.383714] [<bf18e6d4>] (wl1271_op_cancel_hw_scan+0x83c/0x840 [wlcore]) from [<bf1a19c8>] (wl1271_scan_complete_work+0x100/0x104 [wlcore])
    [ 4233.396299] [<bf1a19c8>] (wl1271_scan_complete_work+0x100/0x104 [wlcore]) from [<c0041094>] (process_one_work+0xf4/0x344)
    [ 4233.407288] [<c0041094>] (process_one_work+0xf4/0x344) from [<c004141c>] (worker_thread+0x138/0x3d4)
    [ 4233.416448] [<c004141c>] (worker_thread+0x138/0x3d4) from [<c0046cf0>] (kthread+0xb4/0xb8)
    [ 4233.424743] [<c0046cf0>] (kthread+0xb4/0xb8) from [<c000dfa0>] (ret_from_fork+0x14/0x34)
    [ 4233.432849] ---[ end trace 3a1b164fbdf3dd69 ]---
    [ 4233.437502] wlcore: ERROR failed to send reg domain dfs config
    [ 4233.449412] wlcore: Hardware recovery in progress. FW ver: Rev 8.9.0.0.85
    [ 4233.465473] wlcore: pc: 0x0, hint_sts: 0x00000000 count: 1
    [ 4233.471440] wlcore: down
    [ 4233.474456] wlcore: down
    [ 4233.478449] ieee80211 phy3: Hardware restart was requested
    [ 4233.915197] wlcore: PHY firmware version: Rev 8.2.0.0.245
    [ 4234.150512] wlcore: firmware booted (Rev 8.9.0.0.85)
    [ 4234.181726] wlcore: Association completed.
    [ 4234.219386] wlan0: deauthenticated from 00:25:ca:2e:19:a9 (Reason: 7=CLASS3_FRAME_FROM_NONASSOC_STA)
    [ 4234.275713] wlan0: authenticate with 00:25:ca:2e:19:a9
    [ 4234.294967] wlan0: send auth to 00:25:ca:2e:19:a9 (try 1/3)
    [ 4234.581704] cfg80211: World regulatory domain updated:
    [ 4234.586882] cfg80211:  DFS Master region: unset
    [ 4234.591242] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
    [ 4234.601013] cfg80211:   (2402000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 600 mBm), (N/A)
    [ 4234.609707] wlan0: authenticated
    [ 4234.614414] wl18xx_driver wl18xx.0.auto wlan0: disabling HT as WMM/QoS is not supported by the AP
    [ 4234.623351] wl18xx_driver wl18xx.0.auto wlan0: disabling VHT as WMM/QoS is not supported by the AP
    [ 4234.638236] wlan0: associate with 00:25:ca:2e:19:a9 (try 1/3)
    [ 4235.120259] cfg80211: Regulatory domain changed to country: US
    [ 4235.126116] cfg80211:  DFS Master region: FCC
    [ 4235.130321] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
    [ 4235.140093] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 3000 mBm), (N/A)
    [ 4235.148113] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 1700 mBm), (N/A)
    [ 4235.156197] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz), (N/A, 2300 mBm), (0 s)
    [ 4235.164231] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 3000 mBm), (N/A)
    [ 4235.172259] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 4000 mBm), (N/A)
    [ 4235.183025] wlan0: RX AssocResp from 00:25:ca:2e:19:a9 (capab=0x411 status=0 aid=1)
    [ 4235.203713] wlan0: associated
    [ 4235.210431] wlan0: Limiting TX power to 30 (30 - 0) dBm as advertised by 00:25:ca:2e:19:a9
    [ 4235.226287] wlcore: Association completed.
    [ 4235.236216] cfg80211: Regulatory domain changed to country: US
    [ 4235.242119] cfg80211:  DFS Master region: FCC
    [ 4235.246315] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
    [ 4235.256091] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 3000 mBm), (N/A)
    [ 4235.264111] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 1700 mBm), (N/A)
    [ 4235.272135] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz), (N/A, 2300 mBm), (0 s)
    [ 4235.280169] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 3000 mBm), (N/A)
    [ 4235.288195] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 4000 mBm), (N/A)

  • In case it helps track down the root cause of the issue, I enabled full debug logging by running "echo 1048575 > /sys/module/wlcore/parameters/debug_level" and recreated the issue.  The resulting log file is attached.  Let me know if any other information would be helpful.

    syslog_2020-12-11_14-30-03.log.gz

  • Hi Saurabh,

    Happy new year!  I hope you are doing well.  Just checking to see if you have any further thoughts on our issue or ideas to try.  I can reproduce the issue fairly easily, but I'm not sure what else to do to try and resolve it.

    Thanks!

  • Hi,

    Thanks , Happy new year to you too . How many roles are running on WiLink8 device ? AP / STA / Mesh . Is BT enabled ?

    Regards

    Saurabh

  • The easiest way to reproduce the issue is to configure both AP & STA roles simultaneously.  However, I've seen the issue very occasionally occur when using just the STA role.  Bluetooth is not enabled.

  • Hi,

    In wlcore\scan.c, can  you pls try commenting out "wl12xx_queue_recovery_work(wl);" and re-test. This is a troubleshooting step

    Also pls try to set country code ex: "iw reg set US" before running the tests

    Regards

    Saurabh

  • I repeated the test with that line commented.  I added the message "Scan completed due to error. Skipping recovery" to confirm my changes were in place.

    In the attached log, the connection issues started shortly after the following line:

    Jan  1 03:09:48 buildroot user.debug kernel: [  571.067076] wlcore: Scanning complete

    This continued until around the following line:

    Jan  1 03:19:30 buildroot user.debug kernel: [ 1153.454837] wlcore: Scan completed due to error. Skipping recovery.

    Shortly after that line, it seems like recovery was still performed elsewhere:

    Jan  1 03:19:36 buildroot user.err kernel: [ 1158.462620] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery

    After recovery, the connection started working again.  Let me know if you need more details.

    syslog_2021-01-06_10-43-53.log.gz

  • Sorry, I just realized that the most recent test I just ran was using the older 8.9.0.0.79 firmware again.  I see a 8.9.0.0.86 firmware is out now.  I'll try and recreate the issue using that & attach a new log file.

  • I think I have some good news!  I tried out the latest 8.9.0.0.86 firmware, and after running for 16 hours, I haven't been able to reproduce the issue.  I reverted back to the 8.9.0.0.85 firmware, and I hit the issue in less than 10 minutes.  I'd like to let my test run on firmware .86 through the weekend before calling it resolved, but things look promising.

    I noticed that there are no changelogs for the minor firmware revisions.  Do you have any details on what changed from version .85 to .86?  Does it make sense that this new firmware would resolve the issue I was seeing?

  • Hi,

    Glad to hear it's working better for you. We have had similar feedback from other customers.

    We made adjustments to WiFi adaptivity code in firmware release 86. Pls keep us posted

    Regards

    Saurabh

  • Ran the test through the weekend, and didn't see a single crash.  At this point, I'm comfortable saying that v8.9.0.0.86 of the firmware resolved the issues we were seeing.

    The only thing I noticed are the following log messages occasionally printing:

    Jan 3 02:11:36 buildroot user.err kernel: [ 6567.791373] wlcore: ERROR Unsupported RX rate from HW: 28
    Jan 3 04:58:06 buildroot user.err kernel: [16557.843698] wlcore: ERROR Unsupported RX rate from HW: 28
    Jan 3 05:14:10 buildroot user.err kernel: [17522.332191] wlcore: ERROR Unsupported RX rate from HW: 28
    Jan 3 07:27:38 buildroot user.err kernel: [25530.208244] wlcore: ERROR Unsupported RX rate from HW: 28
    Jan 3 19:45:40 buildroot user.err kernel: [69811.867040] wlcore: ERROR Unsupported RX rate from HW: 28
    Jan 4 10:52:05 buildroot user.err kernel: [124196.881365] wlcore: ERROR Unsupported RX rate from HW: 28
    Jan 5 15:18:08 buildroot user.err kernel: [226559.758716] wlcore: ERROR Unsupported RX rate from HW: 28
    Jan 5 21:33:43 buildroot user.err kernel: [249095.458940] wlcore: ERROR Unsupported RX rate from HW: 28

    I didn't notice any negative behavior when these messages occurred.  Do you have any more detail on what they indicate?

    Thanks again for all of the help!