WL1807MOD: Mesh and AP: Incompatible with 5.4.56+ and occasional "skbuff: skb_under_panic"

Part Number: WL1807MOD

We are using the WL1807MOD with Linux 5.4 on an i.MX6 platform. I know that only 4.19 is supported, but I'd like to report our findings anyway. These are the patches we are using:

0002-wlcore-mesh-Add-support-for-RX-Boradcast-Key.patch
0005-wlcore-patch.patch
0007-Adding-support-to-IGTK-key-AES-CMAC128-in-the-wlcore.patch
0018-Adding-support-for-AP-MESH-multi-role.patch
0023-wlcore-Fixing-PN-drift-on-encrypted-link-after-recov.patch
rev-3f15e3e62c80-better-leak-than-crash.patch

All but the last are from WiLink. The last one is a revert of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=3f15e3e62c80 . Without that revert, the system crashes as soon as wl18xx is loaded. This commit was introduced in Linux v5.4.56. In other words, something in the WiLink patches seems to depend on that memory leak. Any idea?

Additionally, in some environments, the kernel crashes with "skbuff: skb_under_panic". This has been verified even with the 5.4.229 kernel. I noticed that the same problem is described on https://e2e.ti.com/support/wireless-connectivity/other-wireless-group/other-wireless/f/other-wireless-technologies-forum/963451/wilink-sw-facing-skb-panic-when-connecting-mesh-devices-in-sae-authentication , but with kernel 4.19.59. In other words, this problem seems to apply to a wide range of Linux kernels. Any ideas?

  • 2023-01-24T15:30:22+0000 imx6ull14x14evk sshd[1895]: Timeout, client not responding from user admin 172.24.1.195 port 12815
    2023-01-24T15:30:22+0000 imx6ull14x14evk systemd[1]: sshd@0-172.24.1.1:22-172.24.1.195:12815.service: Succeeded.
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: skbuff: skb_under_panic: text:7374c860 len:158 put:16 head:2c36c147 data:b6d2be94 tail:0x9ba09c9c end:0x9ba09d00 dev:wlan0_mesh
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP ARM
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Modules linked in: iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ip_tables x_tables lm75 wlcore_sdio qcaspi qca_7k_common wl18xx wlcore mac80211 cfg80211
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: CPU: 0 PID: 184 Comm: irq/178-wl18xx Not tainted 5.4.229-garo-1 #1
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Hardware name: Freescale i.MX6 Ultralite (Device Tree)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: PC is at skb_panic+0x60/0x64
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: LR is at skb_panic+0x60/0x64
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: pc : [<80a767ac>]    lr : [<80a767ac>]    psr: 60030013
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: sp : 9b0ffe28  ip : 00000000  fp : 9b787840
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: r10: 00000002  r9 : 00000000  r8 : 9b2b6e88
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: r7 : 9b031680  r6 : 00000001  r5 : 9ba09c9c  r4 : 9ba09bfe
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: r3 : 81004f08  r2 : 00000000  r1 : 9db45414  r0 : 0000007f
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Control: 10c5387d  Table: 9b17406a  DAC: 00000051
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Process irq/178-wl18xx (pid: 184, stack limit = 0x3ba87ba0)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Stack: (0x9b0ffe28 to 0x9b100000)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fe20:                   00000010 9ba09c00 9ba09bfe 9ba09c9c 9ba09d00 9b2b6000
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fe40: 9b031680 808e6580 00000000 0000009e 00000001 7f10fdfc 9b0ffe9f 9b031680
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fe60: 00000000 00000005 00000000 9ad90180 00000001 00000000 00000000 00147201
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fe80: 000fac01 00000000 000fac02 7f120564 7f1209b8 00000000 00000000 ff05fffc
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fea0: 00000000 81004f08 00000000 9b0316ac 00000040 9b2b4ea4 00000001 00000002
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fec0: 00000060 9b031680 9b0317b4 7f105b74 00000000 fffffff1 9b8690c0 7f125e00
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fee0: 7f11e12c 9b032488 000000ff 00000076 9b0316d8 9b0317b4 9b0fff3c 9b031680
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: ff00: 9b0316ac 60030013 9b0316d8 9b0316c4 9b031860 ffffe000 00000000 7f107e74
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: ff20: 9ae133c0 9a174800 9b0fe000 00000001 9ae133c0 8016b660 ffffe000 8016b67c
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: ff40: 9a174800 9ae133e4 9b0fe000 8016ba64 00000000 00000000 8016b714 81004f08
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: ff60: 9b0fff74 9ad92040 9ae13ac0 9b0fe000 00000000 9ae133c0 8016b910 9a8a9e1c
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: ff80: 9ad9205c 801409b4 00000000 9ae13ac0 80140864 00000000 00000000 00000000
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: ffa0: 00000000 00000000 00000000 801010e8 00000000 00000000 00000000 00000000
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80a767ac>] (skb_panic) from [<808e6580>] (__skb_checksum+0x0/0x328)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<808e6580>] (__skb_checksum) from [<00000000>] (0x0)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Code: e34800e1 e58d3014 e59c3054 ebffcc5d (e7f001f2) 
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: ---[ end trace b95c2649d2c4cba5 ]---
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: genirq: exiting task "irq/178-wl18xx" (184) is an active IRQ thread (irq 178)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: irq 178: nobody cared (try booting with the "irqpoll" option)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: CPU: 0 PID: 184 Comm: irq/178-wl18xx Tainted: G      D           5.4.229-garo-1 #1
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Hardware name: Freescale i.MX6 Ultralite (Device Tree)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<801101f0>] (unwind_backtrace) from [<8010b51c>] (show_stack+0x10/0x14)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<8010b51c>] (show_stack) from [<80a76eac>] (dump_stack+0x90/0xa4)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80a76eac>] (dump_stack) from [<80a69bbc>] (__report_bad_irq+0x3c/0xc0)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80a69bbc>] (__report_bad_irq) from [<8016e21c>] (note_interrupt+0x264/0x2b0)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<8016e21c>] (note_interrupt) from [<8016a8dc>] (handle_irq_event_percpu+0x80/0x88)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<8016a8dc>] (handle_irq_event_percpu) from [<8016a91c>] (handle_irq_event+0x38/0x5c)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<8016a91c>] (handle_irq_event) from [<8016ede4>] (handle_level_irq+0xb8/0x13c)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<8016ede4>] (handle_level_irq) from [<801699fc>] (generic_handle_irq+0x24/0x34)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<801699fc>] (generic_handle_irq) from [<804af230>] (mxc_gpio_irq_handler+0x40/0xf8)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<804af230>] (mxc_gpio_irq_handler) from [<804afb68>] (mx3_gpio_irq_handler+0x60/0xac)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<804afb68>] (mx3_gpio_irq_handler) from [<801699fc>] (generic_handle_irq+0x24/0x34)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<801699fc>] (generic_handle_irq) from [<80169fe0>] (__handle_domain_irq+0x5c/0xb0)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80169fe0>] (__handle_domain_irq) from [<8049c3c8>] (gic_handle_irq+0x4c/0x90)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<8049c3c8>] (gic_handle_irq) from [<80101b0c>] (__irq_svc+0x6c/0xa8)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Exception stack(0x9b0ffbf0 to 0x9b0ffc38)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fbe0:                                     00208044 80f84e80 00000000 1cbc7000
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fc00: 00000002 00000000 00000000 00000001 9b0fe000 9b0ffcd8 9b0ffc40 ffffe000
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fc20: 540a3b30 9b0ffc40 80126470 801022c0 60030113 ffffffff
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80101b0c>] (__irq_svc) from [<801022c0>] (__do_softirq+0x98/0x278)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<801022c0>] (__do_softirq) from [<80126470>] (irq_exit+0xb0/0xd8)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80126470>] (irq_exit) from [<80169fe4>] (__handle_domain_irq+0x60/0xb0)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80169fe4>] (__handle_domain_irq) from [<8049c3c8>] (gic_handle_irq+0x4c/0x90)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<8049c3c8>] (gic_handle_irq) from [<80101b0c>] (__irq_svc+0x6c/0xa8)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Exception stack(0x9b0ffcd8 to 0x9b0ffd20)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fcc0:                                                       9a174868 a08bc014
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fce0: 9a174800 00009125 9a174800 9ae133c0 9a174868 9a174814 9a5b3cd4 9a5b3cb0
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fd00: 9a5b3840 ffffe000 00000000 9b0ffd28 8016b568 80a7d100 20030013 ffffffff
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80101b0c>] (__irq_svc) from [<80a7d100>] (_raw_spin_unlock_irq+0x1c/0x4c)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80a7d100>] (_raw_spin_unlock_irq) from [<8016b568>] (irq_finalize_oneshot.part.0+0x84/0xd4)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<8016b568>] (irq_finalize_oneshot.part.0) from [<8013e420>] (task_work_run+0x8c/0xa8)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<8013e420>] (task_work_run) from [<80124fc4>] (do_exit+0x374/0xabc)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80124fc4>] (do_exit) from [<8010b88c>] (die+0x36c/0x378)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<8010b88c>] (die) from [<8010bb2c>] (do_undefinstr+0x144/0x1d0)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<8010bb2c>] (do_undefinstr) from [<80101be8>] (__und_svc_finish+0x0/0x38)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Exception stack(0x9b0ffdd8 to 0x9b0ffe20)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fdc0:                                                       0000007f 9db45414
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fde0: 00000000 81004f08 9ba09bfe 9ba09c9c 00000001 9b031680 9b2b6e88 00000000
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: fe00: 00000002 9b787840 00000000 9b0ffe28 80a767ac 80a767ac 60030013 ffffffff
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80101be8>] (__und_svc_finish) from [<80a767ac>] (skb_panic+0x60/0x64)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<80a767ac>] (skb_panic) from [<808e6580>] (__skb_checksum+0x0/0x328)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<808e6580>] (__skb_checksum) from [<00000000>] (0x0)
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: handlers:
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: [<70c4bc2a>] irq_default_primary_handler threaded [<611c88e3>] wlcore_irq [wlcore]
    2023-01-24T15:53:58+0000 imx6ull14x14evk kernel: Disabling IRQ #178
    

  • Hi,

    patch #23 also requires the latest firmware 8.9.1.0.0. Are you using the latest firmware?

    Regards,

    Shlomi

  • Yes, we introduced patch #23 when we updated to the latest firmware (since this is required).

  • Hi,

    I will look into it and let you know.

    Is this happening with SAE only or also in OPEN?

    Shlomi

  • We have now changed so that wpa_supplicant runs with key_mgmt=NONE, but the problem happens still. Any ideas?

    Also, what are the plans for the future regarding patches and kernel versions? The current WiLink 8.8 solution consists of many patches, including patches that change earlier patches, patches that have been rejected by the upstream community, and patches without any description. The requirement of Linux 4.19 is also problematic. This Linux version will reach End of Life in December 2024, which is only 22 months away. Thus, it does not make sense to build new products based on this kernel. Other vendors, such as NXP, have moved to later kernels.

    We would really appreciate some effort to get a stable solution based on, for example, Linux 5.4. Can we work together towards this goal?

  • Hi,

    Thanks for the update.

    Actually, I am currently in the process of integrating a new wpa_supplicant.

    As we moved a couple years ago from kernel 4.4.x to 4.19.38, we are now discussing moving to 5.x as well but there is no schedule yet.

    In any case, at least with WiLink there is currently no plan to upstream it to the main kernel repository.

    Any reason you chose to take only specific patches of the kernel?

    also, what wpa_supplicant are you using?

    Shlomi

  • "We have now changed so that wpa_supplicant runs with key_mgmt=NONE, but the problem happens still. Any ideas?"

    We spoke too soon - actually it seems like the problem disappears when disabling SAE.

    I will soon follow up with additional responses and information.

  • Thanks for the update. Please share more details once you have them.

  • "Any reason you chose to take only specific patches of the kernel?"

    Yes. With Linux 5.4.X, some patches are already included, not necessary, or even wrong for this kernel version. The purpose of the patches is not very well described. In many cases, the description covers what is being changed rather than the underlying purpose. In some cases, such as 0004 and 0005, there is no description at all.

    I have now analyzed all patches and summarized this at https://github.com/astrand/wilink8-wlan-build-utilites/wiki . 

    If I understand it correctly, patch 0003 is not even relevant for your recommended version 4.19.38, since that feature has been included in upstream Linux since 4.8.

    "also, what wpa_supplicant are you using?"

    We are currently using an upstream development version (https://w1.fi/cgit/hostap/commit/?id=f725254cc1297532a553de9fa5af8484ec95cda4), with a few custom patches. We understand that we need to support this version ourselves. In any case, a userspace application such as wpa_supplicant should never be able to crash the kernel, so the skb_under_panic is indeed a kernel bug. It is of course possible that this version of wpa_supplicant triggers the skb_under_panic problem, which would explain why you are not seeing it.

    "In any case, at least with WiLink there is currently no plan to upstream it to the main kernel repository."

    It is not necessary to upstream all of WiLink, especially not added features. However, actual bugs in the driver, or changes required to use the latest firmware, should be pushed upstream, I think. As my Wiki page indicates, such efforts have been made in the past.

    The interface between the driver and the firmware is very problematic. The latest firmware apparently includes a security correction, but this firmware "mandates upgrading to the latest driver", which in turn is only supported on Linux 4.19.38. Thus, the rest of the Linux community, which does not use this version, might not be able to get this security correction. 

    Regarding our main problem with "skb_under_panic" we will do additional tests with those patches that are marked with "???" in my table, but if you have any suggestions this would be very much appreciated.

  • Thanks for the update,

    Regarding the last comment, on the firmware-driver interface, I agree. However, there was no other option: a major bug was found, and the only way to fix it was to modify the API (more accurately, the event coming from the firmware). This was highlighted, of course, and in addition a mechanism that checks the version was added so that at least the firmware would not get loaded.

    The security fix was actually just added as a piggy back on top but it was not the main contributor to the release.

    Lastly, the driver patch should be trivial to integrate on top of later kernel versions as the source code functions are limited and haven't changed for a long time. Anyway, it may have been implemented differently to also allow older driver versions to load without crashing.

    Regards,

    Shlomi

  • "Regarding our main problem with "skb_under_panic" we will do additional tests with those patches that are marked with "???" in my table"

    We have now done such a test, but unfortunately the kernel still crashes with skb_under_panic. The list of patches we used for this test was:


    0002-wlcore-mesh-Add-support-for-RX-Boradcast-Key.patch
    0004-mac80211-patch.patch
    0010-mac80211-mesh-fixed-HT-ies-in-beacon-template.patch
    0011-Mesh-bypass-blockack-encryption-WORKAROUND.patch
    0012-mesh-frames-received-out-of-order.patch
    0018-Adding-support-for-AP-MESH-multi-role.patch
    0023-wlcore-Fixing-PN-drift-on-encrypted-link-after-recov.patch
    rev-3f15e3e62c80-better-leak-than-crash.patch

    This bug is a major problem for us. Any recommendations on how to proceed?

  • We have now managed to reproduce the problem even with 5.0.21, with all of your recommended patches. The Linux 5.0 series is based on 4.20, so it is not that far from 4.19, which you support. Would it make any difference if we managed to reproduce this problem with Linux 4.20 or 4.19? Could you provide any additional help in that case?

  • Hi,

    Reproducing on 4.19 would be perfect.

    Can you clarify again whether the issue also happens with no security? You mentioned above that it does not happen when SAE is not used.

    Regards,

    Shlomi

  • After a lot of work migrating to 4.19, we have been able to reproduce the "skb_under_panic" problem with 4.19:

    skbuff: skb_under_panic: text:c7703318 len:158 put:16 head:cf7a8af6 data:10cc3c43 tail:0x9c9d269c end:0x9c9d2740 dev:wlan0_mesh
    Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP ARM
    Modules linked in: iptable_filter iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ip_tables x_tables lm75 wlcore_sdio qcaspi qca_7k_common wl18xx wlcore mac80211 cfg80211
    CPU: 0 PID: 124 Comm: kworker/u2:2 Not tainted 4.19.38-g-1 #1
    Hardware name: Freescale i.MX6 UltraLite (Device Tree)
    Workqueue: phy0 wl1271_tx_work [wlcore]
    PC is at skb_panic+0x60/0x64
    LR is at skb_panic+0x60/0x64
    pc : [<80861bbc>]    lr : [<80861bbc>]    psr: 60070013
    sp : 9cabfe78  ip : 00000002  fp : 9d2f0300
    r10: 00000002  r9 : 00000000  r8 : 9c7d2d9c
    r7 : 9c614d20  r6 : 00000001  r5 : 9c9d269c  r4 : 9c9d25fe
    r3 : 80f04d08  r2 : 00000000  r1 : 80c1dd74  r0 : 0000007f
    Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
    Control: 10c53c7d  Table: 9d40006a  DAC: 00000051
    Process kworker/u2:2 (pid: 124, stack limit = 0x53236fc1)
    Stack: (0x9cabfe78 to 0x9cac0000)
    fe60:                                                       00000010 9c9d2600
    fe80: 9c9d25fe 9c9d269c 9c9d2740 9c7d2000 809c60d8 8085aaf8 00000000 0000009e
    fea0: 00000001 7f10aae4 9cabfeef 809c612c 9fbda180 00000005 00000000 9cacdd80
    fec0: 00000001 9d31c000 00000000 00147201 000fac01 00000000 000fac02 7f11b438
    fee0: 9d367000 00000000 9ca57688 ff054d08 00000000 80f04d08 00000004 9c614ef8
    ff00: 9c614d5c 9c614d20 9ce30900 00000000 9c614efc 00000000 9cabe000 7f10b640
    ff20: 9c614ef8 9c8a6680 9c004200 801464bc 00000088 9c004218 9c8a6680 9c8a6694
    ff40: 9c004200 00000088 9c004218 80f03d00 9c004200 801471cc ffffe000 80fa65c0
    ff60: 9c591ebc 9c90cd80 9cabca40 00000000 9cabe000 9c8a6680 80147180 9c90cd9c
    ff80: 9c591ebc 8014c0c4 00000000 9cabca40 8014bfa4 00000000 00000000 00000000
    ffa0: 00000000 00000000 00000000 801010e8 00000000 00000000 00000000 00000000
    ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
    [<80861bbc>] (skb_panic) from [<8085aaf8>] (__skb_to_sgvec+0x0/0x278)
    [<8085aaf8>] (__skb_to_sgvec) from [<00147201>] (0x147201)
    Code: e34800cc e58d3014 e59c305c ebe443b6 (e7f001f2)
    ---[ end trace 378110a279f72e5f ]---
    

    This is with all your recommended patches.

    Can you help?

    We have not been able to reproduce the problem when not using SAE. 

  • I can add that when the kernel got "skb_under_panic" above, /sys/kernel/debug/ieee80211/phy0/wlcore/tx_queue_len was 17.

  • Hi,

    Sorry, I was out.

    On the other E2E link above it is mentioned that it is hard to reproduce and requires several stations connected.

    Can you elaborate on the exact conditions/environment on your side?

    It would also be good to know the exact steps taken to reproduce the issue so I can give it a try on my side.

    Regards,

    Shlomi

  • Hi. The Mesh group consists of 3 nodes. In the vicinity there are other WiFi networks, which do not participate in this group. CCMP and SAE are used. The crash usually happens after a few hours.

    Apparently, it is some kind of memory corruption. If it is difficult to reproduce, perhaps it is easier to start by debugging why this patch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=3f15e3e62c80 is incompatible with WiLink? With that patch, i.e. with 5.4.70, the kernel crashes immediately when Mesh is started, at least for us. I suspect that these two issues are caused by the same problem.

    The commit above is for the 5.4 series, but perhaps it applies to 4.19 as well? It's a one line fix, so very easy to try out.

  • Hi,

    I am not sure it is related to the patch that was reverted, and it is hard to see why it would crash immediately.

    As for the actual crash, it will be challenging to debug if it happens on your side only after a few hours.

    I need to get more devices if I want to try reproducing it.

    Just to understand regarding the setup, you mentioned 3 devices, are all communicating over mesh? or some connected over legacy station/AP? I am asking since you mentioned mesh/AP role. Also, what kind of traffic are you doing?

    I am trying to understand the setup, roles and type of data/service in your setup.

    Regards,

    Shlomi

  • "Just to understand regarding the setup, you mentioned 3 devices, are all communicating over mesh? or some connected over legacy station/AP? I am asking since you mentioned mesh/AP role. Also, what kind of traffic are you doing"

    Yes, all 3 devices are communicating over Mesh. AP mode (and hostapd) is also active on all devices, but no devices are connected to these APs. Traffic is low-bandwidth TCP/HTTP communication.

  • OK, thanks for the clarification.

  • Hi,

    Updates from my side, I was able to grab another BBB so in total I have two units.

    I tried to reproduce with AP+Mesh as you did, doing a two-way TCP throughput on mesh (no connected stations to the AP) @100Kbps.

    I left it running for a couple of days, but nothing happened.

    Do you think it is the 3rd device that may cause the kernel panic?

  • Yes, this is possible.

    We have noticed that in cmd.c there are some skb calls that are made without checking if there is enough space available. We are trying a patch for this and currently evaluating if this helps.

  • Thanks, I will try to get my hands on another one.

    Great catch, let me know if you find anything.

  • We have made a patch that tries to improve skb handling. Additional testing is pending, but I'm sharing it already below:

    diff -Naru org/drivers/net/wireless/ti/wlcore/cmd.c mod/drivers/net/wireless/ti/wlcore/cmd.c
    --- org/drivers/net/wireless/ti/wlcore/cmd.c	2023-02-20 07:52:48.234966433 +0100
    +++ mod/drivers/net/wireless/ti/wlcore/cmd.c	2023-02-22 13:39:29.343299684 +0100
    @@ -1278,15 +1278,29 @@
     	}
     
     	if (extra) {
    -		u8 *space = skb_push(skb, extra);
    -		memset(space, 0, extra);
    +		if (skb_headroom(skb) < extra && pskb_expand_head(skb, extra, 0, GFP_ATOMIC)) {
    +			wl1271_error("skb push debug, ti/cmd#1\n");
    +			ret = -ENOMEM;
    +			goto out;
    +		}
    +		memset(skb_push(skb, extra), 0, extra);
     	}
     
     	/* QoS header - BE */
    -	if (wlvif->sta.qos)
    +	if (wlvif->sta.qos) {
    +		if (skb_headroom(skb) < sizeof(__le16) && pskb_expand_head(skb, sizeof(__le16), 0, GFP_ATOMIC)) {
    +			wl1271_error("skb push debug, ti/cmd#2\n");
    +			ret = -ENOMEM;
    +			goto out;
    +		}
     		memset(skb_push(skb, sizeof(__le16)), 0, sizeof(__le16));
    -
    +	}
     	/* mac80211 header */
    +	if (skb_headroom(skb) < sizeof(*hdr) && pskb_expand_head(skb, sizeof(*hdr), 0, GFP_ATOMIC)) {
    +		wl1271_error("skb push debug, ti/cmd#3\n");
    +		ret = -ENOMEM;
    +		goto out;
    +	}
     	hdr = skb_push(skb, sizeof(*hdr));
     	memset(hdr, 0, sizeof(*hdr));
     	fc = IEEE80211_FTYPE_DATA | IEEE80211_FCTL_TODS;
    diff -Naru org/drivers/net/wireless/ti/wlcore/tx.c mod/drivers/net/wireless/ti/wlcore/tx.c
    --- org/drivers/net/wireless/ti/wlcore/tx.c	2023-02-20 07:52:48.250966336 +0100
    +++ mod/drivers/net/wireless/ti/wlcore/tx.c	2023-02-22 13:40:39.454976529 +0100
    @@ -210,6 +210,13 @@
     	total_blocks = wlcore_hw_calc_tx_blocks(wl, total_len, spare_blocks);
     
     	if (total_blocks <= wl->tx_blocks_available) {
    +		if (skb_headroom(skb) < (total_len - skb->len) &&
    +		    pskb_expand_head(skb, (total_len - skb->len), 0, GFP_ATOMIC)) {
    +			wl1271_free_tx_id(wl, id);
    +			dev_kfree_skb(skb);
    +			wl1271_error("skb push debug, ti/tx\n");
    +			return -ENOMEM;
    +		}
     		desc = skb_push(skb, total_len - skb->len);
     
     		wlcore_hw_set_tx_desc_blocks(wl, desc, total_blocks,
    

  • Thanks for the update Peter.

    Please let me know if you find something.

    Shlomi