This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

Linux/WL1837MOD: INFO: task irq/107-wl18xx:635 blocked for more than 120 seconds.

Part Number: WL1837MOD
Other Parts Discussed in Thread: WL1833

Tool/software: Linux

On one of our Linux platforms, we are using WL1833 module.

We are seeing _random_ kernel lockups on our platform that suggests that the IO_RW_EXTENDED (CMD53) request from the host is not getting completed.

This call chain locks wl->mutex and then waits from the above request to complete thereby causing a deadlock between related tasks in the kernel. This can be seen from the tasks that go into deep sleep state:

 

>>>> 

….

[ 8178.341454] irq/107-wl18xx  D c04e5fb4     0  1026      2 0x00000000

[ 8178.347841] [<c04e5fb4>] (__schedule) from [<c04e6468>] (schedule+0x4c/0xa4)

[ 8178.354903] [<c04e6468>] (schedule) from [<c04e9298>] (schedule_timeout+0x1cc/0x20c)

[ 8178.362659] [<c04e9298>] (schedule_timeout) from [<c04e7168>] (wait_for_common+0x9c/0x154)

[ 8178.370935] [<c04e7168>] (wait_for_common) from [<c04e7240>] (wait_for_completion+0x20/0x24)

[ 8178.379392] [<c04e7240>] (wait_for_completion) from [<c0389aac>] (mmc_wait_for_req_done+0x5c/0x14c)

[ 8178.388456] [<c0389aac>] (mmc_wait_for_req_done) from [<c0389bc8>] (mmc_wait_for_req+0x2c/0x30)

[ 8178.397173] [<c0389bc8>] (mmc_wait_for_req) from [<c0394c10>] (mmc_io_rw_extended+0x254/0x2fc)

[ 8178.405802] [<c0394c10>] (mmc_io_rw_extended) from [<c039622c>] (sdio_io_rw_ext_helper+0xc8/0x1ac)

[ 8178.414778] [<c039622c>] (sdio_io_rw_ext_helper) from [<c0396558>] (sdio_readsb+0x2c/0x34)

[ 8178.423068] [<c0396558>] (sdio_readsb) from [<bf849490>] (wl12xx_sdio_raw_read+0xbc/0x1a8 [wlcore_sdio])

[ 8178.432615] [<bf849490>] (wl12xx_sdio_raw_read [wlcore_sdio]) from [<bfa313a4>] (wlcore_rx+0x1dc/0x764 [wlcore])

[ 8178.442847] [<bfa313a4>] (wlcore_rx [wlcore]) from [<bfa248d4>] (wlcore_irq_locked+0xf0/0x344 [wlcore])

[ 8178.452286] [<bfa248d4>] (wlcore_irq_locked [wlcore]) from [<bfa25890>] (wlcore_irq+0xdc/0x19c [wlcore])

[ 8178.461801] [<bfa25890>] (wlcore_irq [wlcore]) from [<c006f57c>] (irq_thread_fn+0x2c/0x50)

[ 8178.470083] [<c006f57c>] (irq_thread_fn) from [<c006f8d8>] (irq_thread+0x13c/0x188)

[ 8178.477756] [<c006f8d8>] (irq_thread) from [<c0044c18>] (kthread+0xec/0x104)

[ 8178.484822] [<c0044c18>] (kthread) from [<c00107e8>] (ret_from_fork+0x14/0x2c)

 

….

[ 8177.737949] wpa_supplicant  D c04e5fb4     0   282      1 0x00000000
[ 8177.744335] [<c04e5fb4>] (__schedule) from [<c04e6468>] (schedule+0x4c/0xa4)
[ 8177.751396] [<c04e6468>] (schedule) from [<c04e686c>] (schedule_preempt_disabled+0x20/0x2c)
[ 8177.759760] [<c04e686c>] (schedule_preempt_disabled) from [<c04e8054>] (__mutex_lock_slowpath+0xa0/0x180)
[ 8177.769339] [<c04e8054>] (__mutex_lock_slowpath) from [<c04e8190>] (mutex_lock+0x5c/0x60)
[ 8177.777564] [<c04e8190>] (mutex_lock) from [<bfa22560>] (wl1271_op_hw_scan+0x5c/0x104 [wlcore])
[ 8177.786357] [<bfa22560>] (wl1271_op_hw_scan [wlcore]) from [<bf98d37c>] (__ieee80211_start_scan+0x220/0x7b8 [mac80211])
[ 8177.797224] [<bf98d37c>] (__ieee80211_start_scan [mac80211]) from [<bf98e390>] (ieee80211_request_scan+0x34/0x4c [mac80211])
[ 8177.808534] [<bf98e390>] (ieee80211_request_scan [mac80211]) from [<bf9a130c>] (ieee80211_scan+0x94/0xd0 [mac80211])
[ 8177.819218] [<bf9a130c>] (ieee80211_scan [mac80211]) from [<bf91be8c>] (nl80211_trigger_scan+0x5f8/0x784 [cfg80211])
[ 8177.829815] [<bf91be8c>] (nl80211_trigger_scan [cfg80211]) from [<c040e4f4>] (genl_rcv_msg+0x26c/0x3ec)
[ 8177.839229] [<c040e4f4>] (genl_rcv_msg) from [<c040d70c>] (netlink_rcv_skb+0xb0/0xcc)
[ 8177.847074] [<c040d70c>] (netlink_rcv_skb) from [<c040e278>] (genl_rcv+0x34/0x44)
[ 8177.854572] [<c040e278>] (genl_rcv) from [<c040d02c>] (netlink_unicast+0x168/0x254)
[ 8177.862244] [<c040d02c>] (netlink_unicast) from [<c040d4ec>] (netlink_sendmsg+0x2fc/0x360)
[ 8177.870524] [<c040d4ec>] (netlink_sendmsg) from [<c03cd704>] (sock_sendmsg+0x24/0x34)
[ 8177.878368] [<c03cd704>] (sock_sendmsg) from [<c03ce1b8>] (___sys_sendmsg+0x1dc/0x1e4)
[ 8177.886298] [<c03ce1b8>] (___sys_sendmsg) from [<c03cef80>] (__sys_sendmsg+0x4c/0x78)
[ 8177.894142] [<c03cef80>] (__sys_sendmsg) from [<c03cefc4>] (SyS_sendmsg+0x18/0x1c)
[ 8177.901727] [<c03cefc4>] (SyS_sendmsg) from [<c0010740>] (ret_fast_syscall+0x0/0x3c)

….

 

<<<< 

 

This is the sequence that I think is leading to the above hang:

-          wilink driver holds wl->mutex and waits for IO_RW_EXTENDED request to complete.

-          wpa_supplicant holds rtnl_mutex and waits for wl->mutex.

-          Tasks waiting for rtnl_mutex get blocked.

 

We are using Kernel 4.1.13 and I can see the last commit in drivers/net/wireless/ti as (5c73cc4b6c83e88863a5de869cc5df3b913aef4a).

Is this a known issue?

-Thanks,

Sarang.

  • our WLAN expert will review and get back shortly.

    Thanks
  • I can;t really comment on the state of the drivers in the 4.1.13 kernel.

    Our last official release (R8.7_SP!) is actually 4.4.63 based so there is a chance that the drivers you use are not updated.

    I suggest building our official release with your kernel which is done using backports.

    You can use the following instructions:

    Best Regards,

    Eyal Reizer

  • Hi Eyal,

    We are using the R8.5 version as part of Kernel 4.1.13. I don't think we can immediately move to latest WiLink driver unless this problem is guaranteed to be fixed after the porting.

    We want to try and figure out what could be happening here. I enabled MMC_DEBUG and captured some debug data leading upto this hang. I have inlined some hand-written comments prefixed with [**]:

    >>>>

    [ 352.381869] mmc0: starting CMD53 arg 24a1f0ec flags 000001b5
    [ 352.381884] mmc0: blksz 236 blocks 1 flags 00000200 tsac 1000 ms nsac 0
    [ 352.381912] mmci-pl18x 50070000.sdi: blksz 00ec blks 0001 flags 00000200
    [ 352.381984] mmci-pl18x 50070000.sdi: op 35 arg 24a1f0ec flags 000001b5
    [ 352.410296] mmci-pl18x 500b0000.sdi: op 19 arg 0088b9f0 flags 000000b5
    [ 352.416845] mmci-pl18x 500b0000.sdi: irq0 (data+cmd) 00000000
    [ 352.422889] mmci-pl18x 50070000.sdi: irq0 (data+cmd) 00000140
    [ 352.428671] mmci-pl18x 50070000.sdi: cmd->resp = 0x00001000 0x00000000 0x00000000 0x00000000
    [ 352.437142] mmc0: req done (CMD53): 0: 00001000 00000000 00000000 00000000
    [ 352.444027] mmc0: 236 bytes transferred: 0
    [ 352.448502] mmci-pl18x 50070000.sdi: irq0 (data+cmd) 00000000
    [ 352.448622] mmc0: starting CMD53 arg 28c2280c flags 000001b5
    [ 352.448638] mmc0: blksz 256 blocks 12 flags 00000200 tsac 1000 ms nsac 0
    [ 352.466960] mmci-pl18x 50070000.sdi: blksz 0100 blks 000c flags 00000200
    [ 352.528190] mmci-pl18x 50070000.sdi: op 35 arg 28c2280c flags 000001b5
    [ 352.534765] mmci-pl18x 50070000.sdi: irq0 (data+cmd) 00000040

    [ 352.540561] mmci-pl18x 50070000.sdi: cmd->resp = 0x00001000 0x00000000 0x00000000 0x00000000

    [ 352.561199] mmci-pl18x 50070000.sdi: irq0 (data+cmd) 00000100
    [ 352.566973] mmc0: req done (CMD53): 0: 00001000 00000000 00000000 00000000
    [ 352.573856] mmc0: 3072 bytes transferred: 0
    [ 352.578413] mmci-pl18x 50070000.sdi: irq0 (data+cmd) 00000000
    [ 352.578621] mmc0: starting CMD53 arg 24a1f0ec flags 000001b5
    [ 352.578637] mmc0: blksz 236 blocks 1 flags 00000200 tsac 1000 ms nsac 0
    [ 352.596784] mmci-pl18x 50070000.sdi: blksz 00ec blks 0001 flags 00000200

    ** Last command that gets BLOCKED, opcode = CMD53


    [ 352.621923] mmci-pl18x 50070000.sdi: op 35 arg 24a1f0ec flags 000001b5

    [ 352.628500] mmci-pl18x 50070000.sdi: irq0 (data+cmd) 00000040

    ** CMDEND interrupt

    [ 352.646385] mmci-pl18x 50070000.sdi: cmd->resp = 0x00001000 0x00000000 0x00000000 0x00000000

    ** This is a custom printk that prints response[0]..[3] after the command is completed. Command response suggests no errors and IO_CURRENT_STATE as CMD.


    [ 352.654845] mmci-pl18x 50070000.sdi: irq0 (data+cmd) 00000000

    ** No DATAEND interrupt.

    <<<<

    It seems MMC controller never sees a DATAEND interrupt for the last CMD53 command which causes the hang. Are there any known edge cases where the WiLink device would fail to send data from a register that the host is requesting to read?

    -Thanks,

    Sarang.