WL1831MOD: wl1271_sdio mmc1:0001:2: sdio write failed (-110) and hci0: command 0x0406 tx timeout

Part Number: WL1831MOD
Other Parts Discussed in Thread: WL1271

Environment

Device: SolidRun N6 Indoor (EU)

Board: Hummingboard 2

OS: Debian 10

Kernel: 5.10.0-0.deb10.16-armmp #1 SMP Debian 5.10.127-2~bpo10+1 (2022-07-28) armv7l GNU/Linux

[ 37.367306] wlcore: PHY firmware version: Rev 8.2.0.0.245
[ 37.432614] wlcore: firmware booted (Rev 8.9.0.0.88)


Description

We have a few of these devices, and the Wi-Fi/BT module starts causing problems after running for a couple of days.

There are four types of installations:

Devices A) Connected to internet via wifi and constant Bluetooth connection to a node

Devices B) Connected to internet via Huawei 4G USB dongle and constant Bluetooth connection to a node, wifi is enabled (not connected)

Devices C) Connected to internet via Huawei 4G USB dongle and constant Bluetooth connection to a node, wifi is disabled

Devices D) Connected to internet via Ethernet and constant Bluetooth connection to a node, wifi is enabled (not connected)

Observations

Devices A are working without any issues. Connections are stable.

Devices B and D work fine for a couple of days, but then a kernel error occurs (device_BD_kernel_error.txt).

Devices C work fine for a couple of days, but then Bluetooth suddenly stops working.

Actions on Devices C

The kernel log entries are:

[66377.110327] Bluetooth: hci0: command 0x0406 tx timeout
[66389.141653] Bluetooth: hci0: command 0x0406 tx timeout
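A note on the opcode in these lines: an HCI command opcode packs a 6-bit OGF and a 10-bit OCF as (OGF << 10) | OCF, so 0x0406 decodes to OGF 0x01 (Link Control), OCF 0x0006, which is HCI_Disconnect. In other words, the controller did not answer a disconnect command in time. A small illustrative Python sketch of the split (the function and table names are made up for this example, not taken from any tool):

```python
# Illustrative helper: split a 16-bit HCI command opcode into OGF and OCF.
# Per the Bluetooth Core Specification, opcode = (OGF << 10) | OCF.
OGF_NAMES = {
    0x01: "Link Control",
    0x03: "Controller & Baseband",
    0x04: "Informational",
    0x08: "LE Controller",
    0x3F: "Vendor-specific",
}


def decode_hci_opcode(opcode):
    """Return (ogf, ocf) for a 16-bit HCI command opcode."""
    ogf = (opcode >> 10) & 0x3F  # upper 6 bits
    ocf = opcode & 0x3FF         # lower 10 bits
    return ogf, ocf


if __name__ == "__main__":
    ogf, ocf = decode_hci_opcode(0x0406)  # opcode from the tx timeout lines
    print(f"OGF=0x{ogf:02x} ({OGF_NAMES.get(ogf, 'unknown')}), OCF=0x{ocf:04x}")
    # prints: OGF=0x01 (Link Control), OCF=0x0006
```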

When checking the hci0 status with hciconfig -a, the status is: DOWN INIT RUNNING

Running hciconfig hci0 down and then hciconfig hci0 up returns a timeout.

When I unloaded and reloaded the hci_uart kernel module with rmmod and modprobe, the kernel log showed Bluetooth being registered and the driver being loaded.

Then, after unblocking Bluetooth with rfkill and running hciconfig hci0 up, Bluetooth became operational and hciconfig -a showed the status UP RUNNING.
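The manual recovery that brought Bluetooth back on devices C can be collected into a single script. A minimal sketch, assuming the module and interface names used in this thread (hci_uart, hci0) and root privileges; the helper name is hypothetical, and it defaults to a dry run that only returns the command lines. The pause after modprobe is an assumption, not something measured in this thread.

```python
import subprocess
import time

# Steps that restored Bluetooth on devices C, in order.
BT_RECOVERY_STEPS = [
    ["rmmod", "hci_uart"],
    ["modprobe", "hci_uart"],
    ["rfkill", "unblock", "bluetooth"],
    ["hciconfig", "hci0", "up"],
]


def recover_bluetooth(dry_run=True, settle_s=2.0):
    """Run the recovery steps (root required) or, in dry-run mode, just list them."""
    executed = []
    for cmd in BT_RECOVERY_STEPS:
        if not dry_run:
            # check=True stops at the first failing step, e.g. a timed-out "hciconfig hci0 up"
            subprocess.run(cmd, check=True)
            if cmd[0] == "modprobe":
                time.sleep(settle_s)  # assumed delay to let hci0 re-register
        executed.append(" ".join(cmd))
    return executed


if __name__ == "__main__":
    for line in recover_bluetooth(dry_run=True):
        print(line)
```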

Actions on devices B and D

When the kernel error happened, I tried the following:

rmmod the modules wlcore_sdio, hci_uart, wl18xx, wlcore, mac80211, cfg80211 in that order, then modprobe them in reverse order.

The kernel showed the following errors:
kernel: [259780.864873] wl1271_sdio: probe of mmc1:0001:1 failed with error -16
kernel: [259780.879516] wl1271_sdio: probe of mmc1:0001:2 failed with error -16
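For repeatability, the reload attempt above can be written as an explicit plan. A sketch, assuming the listed order is the unload order (dependent modules first) and that loading happens in reverse; modprobe also resolves dependencies on its own, so the load order mainly documents intent. Nothing is executed here, and the names are illustrative.

```python
# Module list from this thread, in the order it was unloaded (dependents first).
WIFI_BT_MODULES = ["wlcore_sdio", "hci_uart", "wl18xx", "wlcore", "mac80211", "cfg80211"]


def reload_plan(modules):
    """Return rmmod commands in the given order followed by modprobe commands in reverse."""
    unload = [f"rmmod {name}" for name in modules]
    load = [f"modprobe {name}" for name in reversed(modules)]
    return unload + load


if __name__ == "__main__":
    print("\n".join(reload_plan(WIFI_BT_MODULES)))
```

In the failing case, the step that reports the -16 errors is presumably the load of wlcore_sdio, since that is what triggers the wl1271_sdio probe of mmc1:0001:1 and mmc1:0001:2.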

When I tried the same Bluetooth recovery as on devices C, I got the following errors:

kernel: [278120.054462] Bluetooth: HCI UART driver ver 2.3
kernel: [278120.059123] Bluetooth: HCI UART protocol H4 registered
kernel: [278120.083246] Bluetooth: HCI UART protocol LL registered
kernel: [278120.088622] Bluetooth: HCI UART protocol ATH3K registered
kernel: [278120.137792] Bluetooth: HCI UART protocol Three-wire (H5) registered
kernel: [278120.167498] Bluetooth: HCI UART protocol Intel registered
kernel: [278120.206318] Bluetooth: HCI UART protocol Broadcom registered
kernel: [278120.240479] Bluetooth: HCI UART protocol QCA registered
kernel: [278120.246033] Bluetooth: HCI UART protocol AG6XX registered
kernel: [278120.259502] Bluetooth: HCI UART protocol Marvell registered
kernel: [278120.287146] Bluetooth: hci0: Failed to get CTS
kernel: [278137.679614] Bluetooth: hci0: Failed to get CTS
kernel: [278137.951733] Bluetooth: hci0: Failed to get CTS

These are the MMC clock settings:

root@machina1:/# cat sys/kernel/debug/mmc0/ios
clock: 0 Hz
vdd: 0 (invalid)
bus mode: 2 (push-pull)
chip select: 0 (don't care)
power mode: 0 (off)
bus width: 0 (1 bits)
timing spec: 0 (legacy)
signal voltage: 0 (3.30 V)
driver type: 0 (driver type B)
root@machina1:/# cat sys/kernel/debug/mmc1/ios
clock: 400000 Hz
vdd: 21 (3.3 ~ 3.4 V)
bus mode: 2 (push-pull)
chip select: 0 (don't care)
power mode: 2 (on)
bus width: 0 (1 bits)
timing spec: 0 (legacy)
signal voltage: 0 (3.30 V)
driver type: 0 (driver type B)
root@machina1:/# cat sys/kernel/debug/mmc2/ios
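To compare these snapshots over time (for example, from a healthy device and from one showing the -110 errors), the ios output can be parsed into fields. In the snapshot above, mmc1 (the bus carrying the wl1271_sdio mmc1:0001 functions) runs at 400 kHz with a 1-bit bus, which matches the rate used during card identification rather than normal data transfer; whether that is expected at the moment of the snapshot is not clear from the thread. A minimal Python sketch; the function names are hypothetical and the sample is copied from the mmc1 output above. Reading the real file requires root and a mounted debugfs.

```python
import re


def parse_mmc_ios(text):
    """Parse the 'key: value' lines of /sys/kernel/debug/mmcN/ios into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def clock_hz(fields):
    """Extract the numeric clock rate, e.g. 'clock: 400000 Hz' -> 400000."""
    match = re.match(r"(\d+)\s*Hz", fields.get("clock", ""))
    return int(match.group(1)) if match else 0


if __name__ == "__main__":
    sample = """clock: 400000 Hz
vdd: 21 (3.3 ~ 3.4 V)
power mode: 2 (on)
bus width: 0 (1 bits)
timing spec: 0 (legacy)"""
    ios = parse_mmc_ios(sample)
    print(clock_hz(ios), ios["bus width"], ios["power mode"])
    # prints: 400000 0 (1 bits) 2 (on)
    # On the device: parse_mmc_ios(open("/sys/kernel/debug/mmc1/ios").read())
```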

How to proceed with this problem?

The kernel error log and the decompiled DTS file are attached.

device_BD_kernel_error.txt
[15317.287353] Bluetooth: received HCILL_WAKE_UP_ACK in state 0
[23199.690581] ------------[ cut here ]------------
[23199.695270] WARNING: CPU: 1 PID: 256 at drivers/net/wireless/ti/wlcore/sdio.c:123 wl12xx_sdio_raw_write+0x10c/0x1b8 [wlcore_sdio]
[23199.707002] Modules linked in: veth(E) xt_nat(E) xt_tcpudp(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) nft_counter(E) xt_addrtype(E) nft_compat(E) nft_chain_nat(E) nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) 8021q(E) garp(E) mrp(E) stp(E) llc(E) cmac(E) rfcomm(E) wl18xx(E) wlcore(E) mac80211(E) bnep(E) cfg80211(E) libarc4(E) dw_hdmi_cec(E) imx6_media_csi(CE) dw_hdmi_ahb_audio(E) v4l2_fwnode(E) evdev(E) imx_thermal(E) snd_soc_simple_card(E) snd_soc_simple_card_utils(E) hci_uart(E) btqca(E) btrtl(E) btbcm(E) btintel(E) imx_vdoa(E) snd_soc_imx_audmux(E) nvmem_imx_ocotp(E) wlcore_sdio(E) mux_mmio(E) video_mux(E) mux_core(E) snd_soc_fsl_ssi(E) imx_pcm_dma(E) imx_pcm_fiq(E) snd_soc_core(E) snd_pcm_dmaengine(E) imx2_wdt(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) imx6_media(CE) imx_media_common(CE) videobuf2_dma_contig(E) v4l2_mem2mem(E) dw_hdmi_imx(E) videobuf2_memops(E) dw_hdmi(E) cec(E) videobuf2_v4l2(E) videobuf2_common(E) imxdrm(E)
[23199.707405] imx_ipu_v3(E) drm_kms_helper(E) ir_rc6_decoder(E) leds_gpio(E) rc_rc6_mce(E) gpio_ir_recv(E) rc_core(E) drm(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) imx6q_cpufreq(E) iptable_mangle(E) iptable_filter(E) bluetooth(E) jitterentropy_rng(E) overlay(E) cbc(E) aes_arm_bs(E) crypto_simd(E) cryptd(E) drbg(E) aes_arm(E) aes_generic(E) ansi_cprng(E) ecdh_generic(E) rfkill(E) ecc(E) libaes(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) phy_generic(E) ci_hdrc_imx(E) ci_hdrc(E) ulpi(E) roles(E) ehci_hcd(E) udc_core(E) sdhci_esdhc_imx(E) sdhci_pltfm(E) cqhci(E) usbcore(E) i2c_imx(E) sdhci(E) usbmisc_imx(E) phy_mxs_usb(E) at803x(E) anatop_regulator(E) gpio_mxc(E)
[23199.861926] CPU: 1 PID: 256 Comm: wpa_supplicant Tainted: G WC E 5.10.0-0.deb10.16-armmp #1 Debian 5.10.127-2~bpo10+1
[23199.873585] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[23199.880117] Backtrace:
[23199.882598] [<c0cc7fb4>] (dump_backtrace) from [<c0cc837c>] (show_stack+0x20/0x24)
[23199.890179] r7:0000007b r6:60070013 r5:00000000 r4:c13cdd68
[23199.895873] [<c0cc835c>] (show_stack) from [<c0ccd2b0>] (dump_stack+0xd0/0xe4)
[23199.903118] [<c0ccd1e0>] (dump_stack) from [<c034d6b8>] (__warn+0xfc/0x114)
[23199.910088] r7:0000007b r6:00000009 r5:bf4ee26c r4:bf4efbc8
[23199.915763] [<c034d5bc>] (__warn) from [<c0cc91c0>] (warn_slowpath_fmt+0x70/0xd8)
[23199.923253] r7:0000007b r6:bf4efbc8 r5:c1305e4c r4:00000000
[23199.928933] [<c0cc9154>] (warn_slowpath_fmt) from [<bf4ee26c>] (wl12xx_sdio_raw_write+0x10c/0x1b8 [wlcore_sdio])
[23199.939115] r9:c6f42d80 r8:00000004 r7:c3730010 r6:0001fffc r5:c1305e4c r4:c4566800
[23199.946935] [<bf4ee160>] (wl12xx_sdio_raw_write [wlcore_sdio]) from [<bf709244>] (wlcore_runtime_resume+0xdc/0x220 [wlcore])
[23199.958164] r10:bf4ee160 r9:cc4f1818 r8:bf72cc40 r7:cc4f17ec r6:c1304d00 r5:c1305e4c

imx6dl-hummingboard2-emmc-som-v15.dts.txt
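The key line of the B/D log is the WARNING at wlcore/sdio.c:123 in wl12xx_sdio_raw_write; the backtrace shows it was called from wlcore_runtime_resume, i.e. the SDIO write failed while the driver was resuming the chip, in the context of wpa_supplicant. When collecting logs from several B and D devices, a small parser makes it easy to check whether they all fail at the same place. A sketch, assuming the WARNING line format seen in this thread; the names are illustrative.

```python
import re

# Matches lines such as:
# WARNING: CPU: 1 PID: 256 at drivers/net/wireless/ti/wlcore/sdio.c:123 wl12xx_sdio_raw_write+0x10c/0x1b8 [wlcore_sdio]
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def find_warnings(log):
    """Return (file, line, function, module) for every kernel WARNING in a dmesg dump."""
    return [
        (m["file"], int(m["line"]), m["func"], m["module"])
        for m in WARN_RE.finditer(log)
    ]


if __name__ == "__main__":
    sample = (
        "[23199.695270] WARNING: CPU: 1 PID: 256 at drivers/net/wireless/ti/wlcore/sdio.c:123 "
        "wl12xx_sdio_raw_write+0x10c/0x1b8 [wlcore_sdio]"
    )
    print(find_warnings(sample))
    # prints: [('drivers/net/wireless/ti/wlcore/sdio.c', 123, 'wl12xx_sdio_raw_write', 'wlcore_sdio')]
```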

  • Hi,

    This seems to be a duplicate of a ticket that has already been closed.

    As mentioned, this is usually a signal integrity issue on the SDIO lines.

    Let's start with Wi-Fi (devices B and D). It seems that reducing the clock does not help. Can you make sure that all the modules are up and running after insmod?

    Just as a test, during regular runtime, can you trigger a recovery and report whether the system continues to run? The command to trigger a recovery is (assuming phy1):

    echo 1 > /sys/kernel/debug/ieee80211/phy1/wl12xx/start_recovery

    Regards,

    Shlomi
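The debugfs directory name depends on the kernel: the path above uses phy1/wl12xx, while the reply that follows, on this 5.10 system, uses phy0/wlcore. A small sketch that looks the node up instead of hard-coding it; the glob pattern and helper names are assumptions, and writing the node requires root and a mounted debugfs.

```python
import glob
import os


def find_recovery_nodes(debugfs_root="/sys/kernel/debug"):
    """List start_recovery nodes under any phyN/<driver dir> in debugfs."""
    pattern = os.path.join(debugfs_root, "ieee80211", "phy*", "*", "start_recovery")
    return sorted(glob.glob(pattern))


def trigger_recovery(node):
    """Equivalent of `echo 1 > <node>`; asks wlcore to queue a firmware recovery."""
    with open(node, "w") as f:
        f.write("1\n")


if __name__ == "__main__":
    nodes = find_recovery_nodes()
    print(nodes or "no start_recovery node found (debugfs mounted? running as root?)")
```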

  • Hi,

    I will verify the module statuses once the crash happens again. 

    When executing the following command:
    echo 1 > /sys/kernel/debug/ieee80211/phy0/wlcore/start_recovery

    The system continues to run, BT is still connected, and the Wi-Fi scan completes successfully.

    dmesg shows the following messages:

    [48502.485828] ------------[ cut here ]------------
    [48502.490650] WARNING: CPU: 1 PID: 14072 at drivers/net/wireless/ti/wlcore/main.c:802 wl12xx_queue_recovery_work+0x6c/0x70 [wlcore]
    [48502.502519] Modules linked in: algif_hash(E) algif_skcipher(E) af_alg(E) xt_nat(E) xt_tcpudp(E) veth(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) cmac(E) rfcomm(E) nft_counter(E) xt_addrtype(E) nft_compat(E) nft_chain_nat(E) nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) wl18xx(E) wlcore(E) bnep(E) mac80211(E) cfg80211(E) libarc4(E) evdev(E) imx6_media_csi(CE) v4l2_fwnode(E) hci_uart(E) snd_soc_simple_card(E) imx_thermal(E) btqca(E) snd_soc_simple_card_utils(E) btrtl(E) btbcm(E) btintel(E) imx_vdoa(E) 8021q(E) garp(E) mrp(E) stp(E) llc(E) nvmem_imx_ocotp(E) snd_soc_imx_audmux(E) wlcore_sdio(E) mux_mmio(E) video_mux(E) mux_core(E) snd_soc_fsl_ssi(E) imx_pcm_dma(E) imx_pcm_fiq(E) snd_soc_core(E) snd_pcm_dmaengine(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) imx2_wdt(E) imx6_media(CE) imx_media_common(CE) videobuf2_dma_contig(E) v4l2_mem2mem(E) videobuf2_memops(E) videobuf2_v4l2(E) videobuf2_common(E) imxdrm(E) imx_ipu_v3(E) drm_kms_helper(E)
    [48502.503249] iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) iptable_mangle(E) ir_rc6_decoder(E) drm(E) iptable_filter(E) rc_rc6_mce(E) gpio_ir_recv(E) rc_core(E) leds_gpio(E) imx6q_cpufreq(E) bluetooth(E) jitterentropy_rng(E) cbc(E) overlay(E) aes_arm_bs(E) crypto_simd(E) cryptd(E) drbg(E) aes_arm(E) aes_generic(E) ansi_cprng(E) ecdh_generic(E) rfkill(E) ecc(E) libaes(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) ci_hdrc_imx(E) ci_hdrc(E) phy_generic(E) ulpi(E) roles(E) ehci_hcd(E) udc_core(E) sdhci_esdhc_imx(E) sdhci_pltfm(E) cqhci(E) i2c_imx(E) sdhci(E) usbcore(E) usbmisc_imx(E) at803x(E) anatop_regulator(E) phy_mxs_usb(E) gpio_mxc(E)
    [48502.655562] CPU: 1 PID: 14072 Comm: bash Tainted: G WC E 5.10.0-0.deb10.16-armmp #1 Debian 5.10.127-2~bpo10+1
    [48502.666529] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
    [48502.673062] Backtrace:
    [48502.675552] [<c0cc7fb4>] (dump_backtrace) from [<c0cc837c>] (show_stack+0x20/0x24)
    [48502.683135] r7:00000322 r6:60000013 r5:00000000 r4:c13cdd68
    [48502.688811] [<c0cc835c>] (show_stack) from [<c0ccd2b0>] (dump_stack+0xd0/0xe4)
    [48502.696051] [<c0ccd1e0>] (dump_stack) from [<c034d6b8>] (__warn+0xfc/0x114)
    [48502.703023] r7:00000322 r6:00000009 r5:bf6fde20 r4:bf716e3c
    [48502.708694] [<c034d5bc>] (__warn) from [<c0cc91c0>] (warn_slowpath_fmt+0x70/0xd8)
    [48502.716184] r7:00000322 r6:bf716e3c r5:c1305e4c r4:00000000
    [48502.721919] [<c0cc9154>] (warn_slowpath_fmt) from [<bf6fde20>] (wl12xx_queue_recovery_work+0x6c/0x70 [wlcore])
    [48502.731932] r9:c2a69f58 r8:00000002 r7:00aac758 r6:00000002 r5:c2f197c0 r4:c2f197c0
    [48502.739761] [<bf6fddb4>] (wl12xx_queue_recovery_work [wlcore]) from [<bf70f368>] (start_recovery_write+0x30/0x40 [wlcore])
    [48502.750809] r5:c2f197c0 r4:c2f19804
    [48502.754437] [<bf70f338>] (start_recovery_write [wlcore]) from [<c0655698>] (full_proxy_write+0x64/0x80)
    [48502.763838] r7:00aac758 r6:cb818ee0 r5:bf70f338 r4:c342a780
    [48502.769513] [<c0655634>] (full_proxy_write) from [<c058b684>] (vfs_write+0xd4/0x434)

    Regards,

    Taneli

  • Thanks for the reply.

    I just wanted to make sure that reloading the firmware goes well.

    Just to confirm: reloading the modules does not recover it either, only a reboot does?

    Shlomi

  • Hi,

    When the error "wl1271_sdio mmc1:0001:2: sdio write failed (-110)" happens and I try to reload the modules, the following error occurs:

    kernel: [259780.864873] wl1271_sdio: probe of mmc1:0001:1 failed with error -16

    Only reboot recovers from this.

  • Hi,

    Error -16 (EBUSY) means that the device or resource is busy, which ends up as the -110 (ETIMEDOUT) error when the SDIO read/write fails.

    Since BT is also affected when this happens, I suspect that the chip may not be getting toggled.

    Can you make sure that BT_EN and WL_EN are working?

    Another case where I saw -16 with one of our customers was when they used the calibrator tool while the wlan0 interface was still up (they needed to bring it down before running the calibrator tool).

    Regards,

    Shlomi
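For reference, kernel functions return negated errno values, so the two codes in this thread are -EBUSY (16) and -ETIMEDOUT (110). A tiny lookup sketch; the values are the Linux asm-generic ones used on ARM and are hard-coded so the result does not depend on the host OS (on other platforms, such as macOS, ETIMEDOUT has a different number). The helper name is illustrative.

```python
# Linux errno values (include/uapi/asm-generic/errno-base.h and errno.h).
LINUX_ERRNO = {
    16: ("EBUSY", "Device or resource busy"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def describe_kernel_error(code):
    """Describe a negative kernel return code such as -16 or -110."""
    name, text = LINUX_ERRNO.get(abs(code), ("E?", "unknown error"))
    return f"{code} = -{name} ({text})"


if __name__ == "__main__":
    for code in (-16, -110):
        print(describe_kernel_error(code))
    # prints:
    # -16 = -EBUSY (Device or resource busy)
    # -110 = -ETIMEDOUT (Connection timed out)
```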

  • Hi, thank you for your response.
    How can I verify that BT_EN and WL_EN are working?

    We haven't run the calibrator tool.

  • Hi,

    The calibrator tool is just an example of a similar behavior I had in the past. There is no need to run it.

    As for BT_EN and WL_EN, you can probe them with an external logic analyzer.

    Regards,

    Shlomi
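A logic analyzer is the reliable check. As a quick software-side cross-check on a running board, the kernel's /sys/kernel/debug/gpio lists requested GPIOs with their consumer label, direction, and level; this may only reflect what the kernel believes it is driving, not what is actually on the pin. The "Failed to get CTS" messages on devices B and D may fit this picture, since the hci_ll driver appears to report them when the controller does not raise CTS after BT_EN is toggled. The consumer labels for WL_EN and BT_EN depend on the device tree, so the search strings and sample lines here are hypothetical, and the line format is assumed from 5.10-era gpiolib output.

```python
import re

# Assumed line format, e.g. " gpio-40  (                    |wlan-en             ) out hi"
GPIO_RE = re.compile(r"gpio-(\d+)\s+\(([^|]*)\|([^)]*)\)\s+(out|in)\s+(hi|lo|\?)")


def gpio_states(text, wanted):
    """Map consumer labels containing any of `wanted` to (gpio, direction, level)."""
    found = {}
    for m in GPIO_RE.finditer(text):
        gpio, _name, consumer, direction, level = m.groups()
        consumer = consumer.strip()
        if any(w in consumer for w in wanted):
            found[consumer] = (int(gpio), direction, level)
    return found


if __name__ == "__main__":
    # Hypothetical sample; on the board use open("/sys/kernel/debug/gpio").read() as root.
    sample = (
        " gpio-40  (                    |wlan-en             ) out hi\n"
        " gpio-41  (                    |bt-en               ) out lo\n"
        " gpio-42  (                    |cd                  ) in  hi\n"
    )
    print(gpio_states(sample, ["wlan", "bt"]))
```

A WL_EN line reading lo while Wi-Fi should be on, or a BT_EN level that never changes across a hci_uart reload, would be worth confirming with the logic analyzer.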