WiLink 8 WL1801/WG7801 module does not respond and hangs kernel when beginning the firmware upload process

Other Parts Discussed in Thread: WL1801MOD, WL1271, AM3352, WL1837, WL1831

Hello, 

We are using the Jorjin WG7801-D0 module, which is based on TI's WL1801MOD WiLink 8 chip.  It is connected to a Sitara AM3352BZCZ100 on MMC2.  We are running Linux kernel 3.14.19 and using wl18xx-build-utilities-R8.5.  Buildroot is the build system.

We are seeing a kernel hang approximately 2.5% of the time on boot up whenever we bring up the WiFi interface as an access point using:
/usr/local/bin/hostapd -B /opt/configs/hostapd.conf -P /var/run/hostapd.pid 

After investigation, the kernel hangs when it tries to grab a mutex when the udhcpc client is bringing the Ethernet interface up (ifconfig eth0 up).  The mutex was previously locked by hostapd when trying to load the WiFi module’s firmware and was never released.  The system was unable to load the firmware and got stuck endlessly waiting for a response from the module.  Here is the call stack when the module fails to respond following the call to hostapd: 

[…] (previous calls from the kernel omitted because deemed irrelevant)
drv_add_interface (net/mac80211/driver-ops.h)
wl1271_op_add_interface (drivers/net/wireless/ti/wlcore/main.c)
wl12xx_init_fw (drivers/net/wireless/ti/wlcore/main.c)
wl18xx_boot (drivers/net/wireless/ti/wl18xx/main.c)
wl18xx_pre_upload (drivers/net/wireless/ti/wl18xx/main.c)
wlcore_read_reg (drivers/net/wireless/ti/wlcore/io.h)
wlcore_raw_read32 (drivers/net/wireless/ti/wlcore/io.h)
wlcore_raw_read (drivers/net/wireless/ti/wlcore/io.h)
wl12xx_sdio_raw_read (drivers/net/wireless/ti/wlcore/sdio.c)
sdio_memcpy_fromio (drivers/mmc/core/sdio_io.c)
sdio_io_rw_ext_helper (drivers/mmc/core/sdio_io.c)
mmc_io_rw_extended (drivers/mmc/core/sdio_ops.c)
mmc_wait_for_req (drivers/mmc/core/core.c)
mmc_wait_for_req_done (drivers/mmc/core/core.c)
wait_for_completion (kernel/sched/completion.c)
[…] (further calls from the kernel omitted because deemed irrelevant)

The module always hangs at the same place, when trying to read the REG_CHIP_ID_B register (see wl1271_op_add_interface function).  Further investigation using a logic analyzer showed that the module indeed does not respond.  Here are the relevant snapshots.

When everything is working as expected:

When the module fails to respond:

  

In both snapshots:

  • D1 is the enable signal
  • D2 is the clock
  • D3 is CMD
  • D4, D5, D6, D7 are the data lines
  • (D0 is unused)

Please note that other SDIO transactions go through just fine before this problematic one.  Just to be clear, the system powers up, some transactions are done on the SDIO bus, the module responds fine, and then this IO_RW_EXTENDED command is sent on the SDIO bus to read register REG_CHIP_ID_B.  It is this command that never gets a response, although the previous ones did.
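
For reference, the 32-bit argument of IO_RW_EXTENDED (CMD53) packs the direction, function number, block/byte mode, address-increment flag, a 17-bit register address and a 9-bit count into fixed bit fields, as laid out in the SDIO specification. The following is a minimal user-space sketch of that packing; the helper name `sdio_cmd53_arg` is hypothetical and the function number in the usage note is an assumption, not something taken from the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper: builds the CMD53 (IO_RW_EXTENDED) argument word.
 * Bit layout per the SDIO specification:
 *   [31]    R/W flag (1 = write, 0 = read)
 *   [30:28] function number
 *   [27]    block mode (1 = count is in blocks, 0 = count is in bytes)
 *   [26]    OP code (1 = incrementing address, 0 = fixed address)
 *   [25:9]  register address (17 bits)
 *   [8:0]   byte or block count
 */
static uint32_t sdio_cmd53_arg(int write, unsigned fn, int block_mode,
                               int incr_addr, uint32_t addr, unsigned count)
{
    uint32_t arg = 0;

    arg |= write ? 0x80000000u : 0u;
    arg |= (uint32_t)(fn & 0x7u) << 28;
    arg |= block_mode ? 0x08000000u : 0u;
    arg |= incr_addr ? 0x04000000u : 0u;
    arg |= (addr & 0x1FFFFu) << 9;
    arg |= count & 0x1FFu;

    return arg;
}
```

As a usage example, a 4-byte incrementing read on function 1 (assumed) at address 0x13738, the value quoted later in this thread for REG_CHIP_ID_B, is `sdio_cmd53_arg(0, 1, 0, 1, 0x13738, 4)`, which evaluates to 0x166E7004.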

Here are the schematics for the WiFi module:

 

All signals going to/coming from page 6 are connected directly on the AM3352 without any passive or active parts.

HW-wise, I have probed the power rails of the module, looked at the setups and holds, applied heat or cold to change setups and holds, probed the enable signal for possible glitches, looked at the input clock, etc. without seeing anything suspicious.

We followed the WL18xx module integration checklist and the Jorjin datasheet.  As far as I can tell, everything was implemented as suggested.

Any help would be greatly appreciated.

Guillaume Fournier
Brioconcept Consulting Inc.

  • Hi,

    We'll check and get back to you on this.

    Regards,
    Gigi Joseph.
  • I read back my post and I would like to clarify what I meant by the "module does not respond".  In fact, the module responds but not completely.  Here is what's happening:

    • The IO_RW_EXTENDED command to read register REG_CHIP_ID_B is sent on the CMD lane from the host to the module;
    • The host keeps on feeding clocks to the module;
    • After a couple of clock cycles, the module begins to respond on the CMD lane but it never outputs data on the data lanes.

    So it responds on the CMD lane but does not return the requested data on the 4-bit data lanes.  When I compare a working trace with a non-working trace, the only difference is on the data lanes.

    Thought that could help you narrow the source of the problem.

    Guillaume

  • Hi,

    Is the AM335x IO at 1.8V, or is there a level shifter between the AM335x MMC2 port and the WL8 module?
    Can we get the full schematics, and possibly also the layout? This could be a timing issue.

    In addition, can you provide the .dts file of your board?

    Best Regards,
    Eyal
  • Hello Eyal,

    I would be glad to send you the schematics, layout and DTS file for the board.  However, I would prefer not to post these directly on the forums.  How can I send them directly to you?

    As for your questions, bank VDDSHV5 is powered at 1V8 and there are no level shifters between the AM3352 MMC2 port and the WG7801 module.

    Thanks for helping, it is greatly appreciated.

    Guillaume

  • Guillaume and Eyal,
    We are facing a similar scenario.

    Could you share how you resolved this problem, if you did?

    Thanks
    David
  • Hi David,

    Can you please clarify which module?

    Regards,
    Gigi Joseph.
  • It is the WL1837 module

  • Hi David,

    Please raise a new post.
    Please also share more information in the new post (host platform, driver/firmware version, dts, schematics, etc).

    Regards,
    Gigi Joseph.
  • Hello David,

    We were unable to clearly identify the source of the problem.  TI and Jorjin are suggesting layout issues, but we have not been able to confirm this yet.  We were able to make it work 100% of the time by reducing the SDIO clock to 8 MHz.  I did this by modifying mmc_sdio_get_max_clock() in drivers/mmc/core/sdio.c to return 8000000 whenever the request is for the module.  I ran some iperf tests to see whether performance was impacted; the results are below.  Basically, the performance stays the same.

    At 48 MHz:

    $ iperf -c 10.0.0.2 -w 64kb -l 64kb -M 1400 -i 10 -t 60
    WARNING: attempt to set TCP maximum segment size to 1400, but got 536
    ------------------------------------------------------------
    Client connecting to 10.0.0.2, TCP port 5001
    TCP window size:  125 KByte (WARNING: requested 62.5 KByte)
    ------------------------------------------------------------
    [  3] local 10.0.0.1 port 56921 connected with 10.0.0.2 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  26.7 MBytes  22.4 Mbits/sec
    [  3] 10.0-20.0 sec  26.4 MBytes  22.2 Mbits/sec
    [  3] 20.0-30.0 sec  26.7 MBytes  22.4 Mbits/sec
    [  3] 30.0-40.0 sec  27.0 MBytes  22.6 Mbits/sec
    [  3] 40.0-50.0 sec  27.2 MBytes  22.8 Mbits/sec
    [  3] 50.0-60.0 sec  25.6 MBytes  21.5 Mbits/sec
    [  3]  0.0-60.0 sec   160 MBytes  22.3 Mbits/sec
    $

    At 8 MHz:

    $ iperf -c 10.0.0.2 -w 64kb -l 64kb -M 1400 -i 10 -t 60
    WARNING: attempt to set TCP maximum segment size to 1400, but got 536
    ------------------------------------------------------------
    Client connecting to 10.0.0.2, TCP port 5001
    TCP window size:  125 KByte (WARNING: requested 62.5 KByte)
    ------------------------------------------------------------
    [  3] local 10.0.0.1 port 33696 connected with 10.0.0.2 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  26.7 MBytes  22.4 Mbits/sec
    [  3] 10.0-20.0 sec  25.8 MBytes  21.7 Mbits/sec
    [  3] 20.0-30.0 sec  25.9 MBytes  21.8 Mbits/sec
    [  3] 30.0-40.0 sec  25.6 MBytes  21.5 Mbits/sec
    [  3] 40.0-50.0 sec  27.0 MBytes  22.6 Mbits/sec
    [  3] 50.0-60.0 sec  26.7 MBytes  22.4 Mbits/sec
    [  3]  0.0-60.0 sec   158 MBytes  22.1 Mbits/sec
    $
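
    As a rough sanity check on these numbers: with the 4-bit data bus described earlier in the thread, the raw SDIO data rate is simply clock frequency times bus width, ignoring command/response and protocol overhead. The sketch below (the helper name is hypothetical) computes that upper bound. Even at 8 MHz it comes to 32 Mbit/s, which is above the roughly 22 Mbit/s measured, so the results are at least consistent with the link not being limited by the SDIO bus in this test.

    ```c
    #include <stdint.h>

    /*
     * Hypothetical helper: raw SDIO data-line rate in Mbit/s.
     * This is an upper bound only; it ignores command/response
     * overhead, CRC, and inter-block gaps.
     */
    static double sdio_raw_mbps(uint32_t clock_hz, unsigned bus_width_bits)
    {
        return (double)clock_hz * bus_width_bits / 1e6;
    }
    ```

    For example, `sdio_raw_mbps(8000000, 4)` gives 32.0 and `sdio_raw_mbps(48000000, 4)` gives 192.0.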
    So although we couldn't find the source of the problem, we found this workaround.
    Hope this helps,
    Guillaume
  • Dear Guillaume

    I am using WiLink 8 WL1831 modules and am seeing the same issue you observed.  However, when I tried the workaround you suggested, my controller could not initialize the SDIO device at all.  Could you please elaborate on the steps that need to be done?

    Thanks a lot

    Neha.

  • Hello Neha,

    Yes of course!  Here are the files I modified and the code that was modified. What you see here is the final source code.  Sorry I couldn't produce a patch for technical reasons. You will have to diff it yourself. This code implements a very specific patch that might not work as is for you.  For example, the position of the WiFi module is hardcoded (index=2), this might need to change for you.

    drivers/mmc/core/core.c

    static void mmc_wait_for_req_done(struct mmc_host *host,
    				  struct mmc_request *mrq)
    {
    	struct mmc_command *cmd;
    	unsigned long time_left;
    
    	while (1) {
    
    		cmd = mrq->cmd;
    
    		/*
    		 * If we're communicating with the WiFi module (index == 2) and the
    		 * register address being accessed is 0x13738 (REG_CHIP_ID_B), wait
    		 * for completion with a timeout, because the module seems to hang
    		 * at this exact place once in a while.  If this happens, an
    		 * emergency restart is triggered to avoid a kernel hang.
    		 * mrq->data is checked because its timeout is dereferenced below.
    		 */
    		if ((host->index != 2) || !mrq->data ||
    		    (((cmd->arg >> 9) & 0x1FFFF) != 0x13738)) {
    			wait_for_completion(&mrq->completion);
    		} else {
    			/* Returns 0 on timeout, otherwise the remaining jiffies. */
    			time_left = wait_for_completion_timeout(&mrq->completion,
    					msecs_to_jiffies(mrq->data->timeout_ns / 1000 / 1000));
    			if (time_left == 0) {
    				printk(KERN_EMERG "mmc_wait_for_req_done: wait_for_completion_timeout() timed out while communicating with wl18xx module.  Emergency restart required...\n");
    				mdelay(2000);
    				emergency_restart();
    			}
    		}
    
    		/*
    		 * If host has timed out waiting for the sanitize
    		 * to complete, card might be still in programming state
    		 * so let's try to bring the card out of programming
    		 * state.
    		 */
    		if (cmd->sanitize_busy && cmd->error == -ETIMEDOUT) {
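
    The address test used in the condition above can be exercised on its own in user space. This is only a sketch with assumed names: it extracts bits 25:9 of the command argument, which is where IO_RW_EXTENDED carries the register address, and compares the result with 0x13738. Note that IO_RW_DIRECT (CMD52) keeps its address in the same bit positions, so this test alone does not tell the two commands apart.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define WIFI_HOST_INDEX  2          /* hardcoded in the patch; board-specific */
    #define CHIP_ID_B_ADDR   0x13738u   /* address quoted in the patch comment */

    /* Hypothetical helper mirroring the check in mmc_wait_for_req_done(). */
    static bool is_chip_id_b_access(int host_index, uint32_t cmd_arg)
    {
        uint32_t addr = (cmd_arg >> 9) & 0x1FFFFu;  /* bits 25:9 */

        return host_index == WIFI_HOST_INDEX && addr == CHIP_ID_B_ADDR;
    }
    ```

    With an argument of 0x166E7004 (a 4-byte incrementing read at 0x13738 on an assumed function 1), the helper returns true for host index 2 and false for any other host.
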

    drivers/mmc/core/sdio.c

    static unsigned mmc_sdio_get_max_clock(struct mmc_card *card)
    {
    	unsigned max_dtr;
    
    	if (mmc_card_highspeed(card)) {
    		/*
    		 * The SDIO specification doesn't mention how
    		 * the CIS transfer speed register relates to
    		 * high-speed, but it seems that 50 MHz is
    		 * mandatory.
    		 */
    		max_dtr = 50000000;
    	} else {
    		max_dtr = card->cis.max_dtr;
    	}
    
    	if (card->type == MMC_TYPE_SD_COMBO)
    		max_dtr = min(max_dtr, mmc_sd_get_max_clock(card));
    
    	/*
    	 * If we're communicating with the WiFi module (index == 2), force its
    	 * SDIO clock frequency to 8 MHz.  This works around the intermittent
    	 * hang seen during the firmware upload.
    	 */
    	if (card->host->index == 2)
    		return 8000000;

    	return max_dtr;
    }
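
    A slightly more defensive variant of the final check would clamp the rate rather than override it, so that a card which negotiated less than 8 MHz is never sped up. A minimal user-space sketch of that idea, with hypothetical names and the same hardcoded host index:

    ```c
    #define WIFI_HOST_INDEX    2          /* board-specific assumption */
    #define WIFI_MAX_CLOCK_HZ  8000000u   /* 8 MHz cap from the workaround */

    /* Hypothetical helper: cap the clock for the WiFi host only. */
    static unsigned cap_sdio_clock(int host_index, unsigned max_dtr)
    {
        if (host_index == WIFI_HOST_INDEX && max_dtr > WIFI_MAX_CLOCK_HZ)
            return WIFI_MAX_CLOCK_HZ;

        return max_dtr;
    }
    ```

    Here `cap_sdio_clock(2, 50000000)` yields 8000000, while other hosts and already-slower rates pass through unchanged.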
    

    Hope this helps !

    Guillaume

  • Dear Guillaume

    Thanks for the prompt reply. Even with this change, the module did not work in our case.

    Actually, I reread your post and realized that your code proceeds quite a bit further before hitting the hang. In our case, we do not even see the wlan0 interface being created. The reason is that the very first time (while the wl18xx probe is in progress) "mmc_io_rw_extended" is issued with "PART_BOOT", the device does not respond and the system blocks in "wait_for_completion". I think this is why the wlan0 interface is not created on the board.

    If you have faced this problem and know the reason or a workaround, kindly share.

    Also, I have posted to TI as well; let's see what they say.

    I really appreciate your previous reply.

    Regards
    ~ Neha
  • Hello Neha,

    Please make sure the hardware is all fine (all connections are OK, overshoot, undershoot, frequency, signal integrity, setup, hold, etc.). I would think it is most likely a HW issue although I have not run into this same specific problem.

    Good luck!
    Guillaume