Linux/PROCESSOR-SDK-AM335X: Kernel Soft Lockup when removing SD card during I/O

Part Number: PROCESSOR-SDK-AM335X


Tool/software: Linux

Hello,

We've noticed an issue that causes the mmcqd thread to hang in an uninterruptible state, starving the other threads. This happens if we remove the SD card from its slot during I/O operations.

Checking the kernel, we've noticed that the mmc layer correctly detects that there has been a timeout during communication with the target and tells the upper layers to abort, but the thread keeps trying to start new requests (mmc_start_req) and eventually hangs. The omap_hsmmc driver also identifies the timeout, signaling that 0 bytes were transferred.

While I understand that the operation should be retried a few times in case of spurious timeouts, I can't see why it doesn't give up after a reasonable number of tries.

With this in mind, I have two questions:

1 - Is there any way to signal mmcqd that it should give up waiting for the data?

2 - If I wish to implement a mechanism to forcefully terminate the mmcqd thread, what is the best approach?

We are using the 3.2 kernel, and updating it is out of the question. Our MMC slot also has no card-detect pin.

Regards,

Guilherme

  • Hi,

    Apologies for the delayed response.

    Is this a custom board, or one of the TI reference designs on which I can try to reproduce this?

    Best Regards,
    Yordan
  • Yordan,

    It is a custom board. As I mentioned, we use the 3.2 kernel, and we do not have a card-detect pin on our MMC slot. I've enabled card detection by polling with MMC_CAP_NEEDS_POLLING, but the problem persists.

    Regards,
    Guilherme
  • Figured out the solution!

    The issue was, as I suspected, that block requests kept being issued even after the card was removed. The error was fixed in commit a8ad82cc1b22d04916d9cdb1dc75052e80ac803c from Texas Instruments' kernel repository:

    commit a8ad82cc1b22d04916d9cdb1dc75052e80ac803c
    Author: Sujit Reddy Thumma <sthumma@codeaurora.org>
    Date:   Thu Dec 8 14:05:50 2011 +0530
    
        mmc: card: Kill block requests if card is removed
    
        Kill block requests when the host realizes that the card is
        removed from the slot and is sure that subsequent requests
        are bound to fail. Do this silently so that the block
        layer doesn't output unnecessary error messages.
    
        Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
        Acked-by: Adrian Hunter <adrian.hunter@intel.com>
        Signed-off-by: Chris Ball <cjb@laptop.org>
    

    The key change in the commit is that mmc_prep_request() now returns BLKPREP_KILL once the card is known to be removed, so queued block requests are dropped instead of being issued again. The commit also makes the error handling more granular by adding a specific code for the case where the card was removed.
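
For anyone who wants to see the idea in isolation, here is a minimal user-space sketch of the "kill requests at prep time" gate described above. It is not the code from the commit and it does not use the kernel API: every name prefixed with sketch_ or SKETCH_ is hypothetical, and the structs are stand-ins for the real queue and card objects. It only illustrates why a removed card stops new requests from reaching the host driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the block layer's prep results.
 * The names and values are assumptions made for this sketch. */
enum sketch_prep { SKETCH_PREP_OK = 0, SKETCH_PREP_KILL = 1 };

/* Hypothetical mock of the card state tracked by the MMC core. */
struct sketch_card {
    bool removed;               /* set once the host decides the card is gone */
};

/* Hypothetical mock of the per-card request queue. */
struct sketch_queue {
    struct sketch_card *card;
};

/* Hypothetical mock of a block request. */
struct sketch_request {
    int sector;
};

/* True when the queue's card has been flagged as removed. */
bool sketch_card_removed(const struct sketch_card *card)
{
    return card != NULL && card->removed;
}

/* Prep hook: refuse the request up front if the card is gone,
 * so it never reaches the host driver and cannot time out again. */
enum sketch_prep sketch_prep_request(const struct sketch_queue *mq,
                                     struct sketch_request *req)
{
    (void)req;                  /* the request itself is not inspected here */
    if (mq != NULL && sketch_card_removed(mq->card))
        return SKETCH_PREP_KILL;
    return SKETCH_PREP_OK;
}

/* Walk a batch of requests and count how many would be issued to the driver. */
size_t sketch_drain(const struct sketch_queue *mq,
                    struct sketch_request *reqs, size_t n)
{
    size_t issued = 0;

    assert(reqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (sketch_prep_request(mq, &reqs[i]) == SKETCH_PREP_OK)
            issued++;
    }
    return issued;
}
```

With the card present, every request in a batch passes the gate; once removed is set, the gate rejects all of them, which is the behaviour that keeps the queue thread from retrying forever. In the real driver the removal state comes from the MMC core, which I have not reproduced here.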