AM4372: sdhci-omap timeout during reboot

Part Number: AM4372
Other Parts Discussed in Thread: AM4376

Hi All

 

We have the following issue with the sdhci-omap driver on the AM4372 processor on a custom board:

System MPU: AM4372BZDN80

Linux kernel: 5.10.120, SRCREV = "ab2d96e4f21159a7df2e87a6fb2a29bd9535506b" from git://git.ti.com/git/ti-linux-kernel/ti-linux-kernel.git

eMMC IC: various manufacturers, capacity 4GB and 8GB

The Linux kernel configuration, device tree, and design files can be provided through local support.

Issue description:

Sometimes, when running the reboot command from the Linux console, the sdhci-omap driver times out. The board hangs and does not reboot. The issue reproduces randomly, but usually within 1 hour when the board reboots after each startup in a loop.

Console log:

Rebooting... [  216.799795] mmc1: Timeout waiting for hardware cmd interrupt.

[  216.805608] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========

[  216.812095] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00003101

[  216.818577] mmc1: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000

[  216.825059] mmc1: sdhci: Argument:  0x00010000 | Trn mode: 0x00000000

[  216.831541] mmc1: sdhci: Present:   0x01f70000 | Host ctl: 0x00000000

[  216.838022] mmc1: sdhci: Power:     0x00000000 | Blk gap:  0x00000000

[  216.844501] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x00000000

[  216.850980] mmc1: sdhci: Timeout:   0x00000000 | Int stat: 0x00000000

[  216.857460] mmc1: sdhci: Int enab:  0x007f0003 | Sig enab: 0x007f0003

[  216.863939] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000

[  216.870419] mmc1: sdhci: Caps:      0x05e10080 | Caps_1:   0x00000000

[  216.876899] mmc1: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000

[  216.883378] mmc1: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000

[  216.889857] mmc1: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000

[  216.896333] mmc1: sdhci: Host ctl2: 0x00000000

[  216.900805] mmc1: sdhci: ============================================

[  216.936868] sdhci-omap 481d8000.mmc: Timeout waiting on controller reset in sdhci_omap_reset

[  216.974576] sdhci-omap 481d8000.mmc: Timeout waiting on controller reset in sdhci_omap_reset
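For reference, the Cmd and Argument fields in the dump above can be decoded by hand. The sketch below assumes the standard SDHCI Command register layout (the same masks appear in Linux's drivers/mmc/host/sdhci.h); the function names are illustrative and not part of any driver. Under that assumption, 0x0d1a is CMD13 (SEND_STATUS) with a 48-bit response, and the argument 0x00010000 carries RCA 1 in its upper 16 bits. This only identifies the last command written to the register; it does not by itself explain the timeout.

```python
# Decode the "Cmd:" and "Argument:" fields of an SDHCI register dump.
# Assumes the standard SDHCI Command register layout:
#   bits 13:8 command index, bits 7:6 command type, bit 5 data present,
#   bit 4 index check, bit 3 CRC check, bits 1:0 response type.

RESPONSE_TYPES = {
    0: "none",
    1: "136-bit",
    2: "48-bit",
    3: "48-bit with busy",
}


def decode_sdhci_cmd(reg):
    """Split a 16-bit SDHCI Command register value into its fields."""
    return {
        "index": (reg >> 8) & 0x3F,
        "cmd_type": (reg >> 6) & 0x3,
        "data_present": bool(reg & 0x20),
        "index_check": bool(reg & 0x10),
        "crc_check": bool(reg & 0x08),
        "response": RESPONSE_TYPES[reg & 0x03],
    }


def rca_from_argument(arg):
    """For addressed commands such as CMD13, the RCA is in bits 31:16."""
    return (arg >> 16) & 0xFFFF


# Values taken from the mmc1 dump above.
emmc_cmd = decode_sdhci_cmd(0x00000D1A)
emmc_rca = rca_from_argument(0x00010000)
print(emmc_cmd, emmc_rca)
```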

Do you have any idea what could be causing this and how to solve it?

  • Hey Pawel,

    Thank you for your question. My name is Andrew, and I would be happy to assist with this.  I have relayed your question to the team and hope to have a response for you within the next 1-2 business days.  While I look into this, have there been any updates or developments on your end that we should take into consideration?

    Best regards,

    Andrew

  • Hi Andrew

    Thank you for looking into it. So far we have no further updates. The issue also reproduces on our custom boards with the AM4376 HS processor version.

    Best regards

    Paweł

  • Hey Pawel,

    Regarding your comment that the "issue is still reproducing", I was hoping to clarify one thing: you originally said the issue reproduces inconsistently. Does this mean the above operation sometimes performs as expected and the error is avoided?  If so, could you please attach the log(s) from a successful run as well?

    Best regards,

    Andrew

  • Hi Andrew

    I mean that it still happens from time to time. It does not occur on every reboot, but we observe it often. The log from a successful reboot is:

    Sending all processes the TERM signal...
    logout
    Sending all processes the KILL signal...
    Unmounting remote filesystems...
    Deactivating swap...
    Unmounting local filesystems...
    [ 229.205054] EXT4-fs (dm-0): re-mounted. Opts: (null)
    Rebooting... [ 229.531787] reboot: Restarting system
    CCCCCC

    When the issue happens, it occurs at the moment of rebooting:

    Sending all processes the TERM signal...
    logout
    Sending all processes the KILL signal...
    Unmounting remote filesystems...
    Deactivating swap...
    Unmounting local filesystems...
    [ 47.642124] EXT4-fs (dm-0): re-mounted. Opts: (null)
    Rebooting... [ 58.086607] mmc1: Timeout waiting for hardware cmd interrupt.
    [ 58.092421] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
    [ 58.098906] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00003101
    [ 58.105388] mmc1: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000
    [ 58.111869] mmc1: sdhci: Argument: 0x00010000 | Trn mode: 0x00000000
    [ 58.118351] mmc1: sdhci: Present: 0x01f70000 | Host ctl: 0x00000000
    [ 58.124830] mmc1: sdhci: Power: 0x00000000 | Blk gap: 0x00000000
    [ 58.131311] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00000000
    [ 58.137791] mmc1: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000
    [ 58.144271] mmc1: sdhci: Int enab: 0x007f0003 | Sig enab: 0x007f0003
    [ 58.150750] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
    [ 58.157231] mmc1: sdhci: Caps: 0x05e10080 | Caps_1: 0x00000000
    [ 58.163711] mmc1: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
    [ 58.170191] mmc1: sdhci: Resp[0]: 0x00000000 | Resp[1]: 0x00000000
    [ 58.176670] mmc1: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
    [ 58.183148] mmc1: sdhci: Host ctl2: 0x00000000
    [ 58.187620] mmc1: sdhci: ============================================

    Best Regards

    Paweł

  • Hey Pawel,

    Thank you for the additional information.  Could you further clarify roughly how often you estimate this error occurs on reboot?  Additionally, are there any changes in circumstances (e.g., board setup or applications running before reboot) between the runs that execute successfully and those that fail?

    Best regards,

    Andrew

  • Hi Andrew

    When a reboot command is added to a startup script, the issue reproduces within one hour. The circumstances during this test are always the same: the system starts and reboots immediately after startup.

    Best Regards

    Paweł

  • Hey Pawel,

    Thank you for the clarification.  I have provided the team with all of your updates, and I will notify you with their response.

    Best regards,

    Andrew

  • Hey Pawel,

    Just an update from the team: we think this behavior may be caused by a race condition, observed during system shutdown.  We are working on verifying this error, and further identifying a fix; I will keep you updated with our progress.

    Best regards,

    Andrew

  • Hi Andrew

    Thanks for the update. I'm waiting for the results of your investigation.

    Best regards

    Paweł

  • Hey Pawel,

    I apologize for the delay; the team is still working on this with the highest priority.  I will keep you updated with our progress.

    Best regards,

    Andrew

  • Hi Andrew

    One update from my side: we caught a similar issue with an SD card, which is available in the system as the /dev/mmcblk0p1 device. It happened while mounting the SD card:

    [ 1099.362364] mmc0: Timeout waiting for hardware interrupt.
    [ 1099.367829] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
    [ 1099.374315] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00003101
    [ 1099.380796] mmc0: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000002
    [ 1099.387277] mmc0: sdhci: Argument: 0x00000802 | Trn mode: 0x00000033
    [ 1099.393757] mmc0: sdhci: Present: 0x01f70000 | Host ctl: 0x00000002
    [ 1099.400237] mmc0: sdhci: Power: 0x0000000e | Blk gap: 0x00000000
    [ 1099.406716] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000087
    [ 1099.413196] mmc0: sdhci: Timeout: 0x0000000a | Int stat: 0x000000c0
    [ 1099.419676] mmc0: sdhci: Int enab: 0x027f000b | Sig enab: 0x027f000b
    [ 1099.426155] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
    [ 1099.432635] mmc0: sdhci: Caps: 0x05e10080 | Caps_1: 0x00000000
    [ 1099.439115] mmc0: sdhci: Cmd: 0x0000123a | Max curr: 0x00000000
    [ 1099.445594] mmc0: sdhci: Resp[0]: 0x00000000 | Resp[1]: 0x00000000
    [ 1099.452074] mmc0: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
    [ 1099.458551] mmc0: sdhci: Host ctl2: 0x00000000
    [ 1099.463022] mmc0: sdhci: ============================================

    The mount command did not complete and the terminal was blocked. It was not possible to stop the mount command with Ctrl+C. Additionally, it was not possible to reboot the system from another terminal session (ssh).
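As a reading aid for the mmc0 dump above, the Int stat and Int enab values can be split into named bits. This is a minimal sketch that assumes the standard SDHCI interrupt bit layout (the names mirror the SDHCI_INT_* masks in Linux's drivers/mmc/host/sdhci.h); whether the AM437x controller uses bits 6 and 7 exactly this way has not been checked against the TRM here. Under that assumption, the only pending status bits (0xc0) are the card insert/remove bits, and none of the enabled command or transfer completion bits is set, which is consistent with the "Timeout waiting for hardware interrupt" message.

```python
# Name the bits of the SDHCI interrupt status/enable registers.
# Assumption: standard SDHCI bit layout, as in Linux's sdhci.h.
SDHCI_INT_BITS = {
    0x00000001: "RESPONSE",       # command complete
    0x00000002: "DATA_END",       # transfer complete
    0x00000004: "BLK_GAP",
    0x00000008: "DMA_END",
    0x00000010: "SPACE_AVAIL",
    0x00000020: "DATA_AVAIL",
    0x00000040: "CARD_INSERT",
    0x00000080: "CARD_REMOVE",
    0x00000100: "CARD_INT",
    0x00008000: "ERROR",
    0x00010000: "TIMEOUT",
    0x00020000: "CRC",
    0x00040000: "END_BIT",
    0x00080000: "INDEX",
    0x00100000: "DATA_TIMEOUT",
    0x00200000: "DATA_CRC",
    0x00400000: "DATA_END_BIT",
    0x00800000: "BUS_POWER",
    0x01000000: "AUTO_CMD_ERR",
    0x02000000: "ADMA_ERROR",
}


def decode_int(value):
    """Return the names of all known bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SDHCI_INT_BITS.items()) if value & bit]


# Values taken from the mmc0 dump above.
int_stat = 0x000000C0
int_enab = 0x027F000B

pending = decode_int(int_stat)
enabled = decode_int(int_enab)
pending_and_enabled = decode_int(int_stat & int_enab)

print("pending:", pending)
print("enabled:", enabled)
print("pending and enabled:", pending_and_enabled)
```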

    Do you have any results on this topic yet?

    Best Regards

    Paweł

  • Hey Pawel,

    Thank you for the additional information; I have passed it on to the team.  I apologize for the delay. The team is still investigating this issue, and we are working to have a resolution for you as soon as possible.

    Best regards,

    Andrew

  • Pawel

    We are not yet able to help you resolve this. We will continue to investigate.

    One quick query: have you seen this behavior with a GP (non-HS) EVM too? I do not think this is related to GP vs. HS, but since you explicitly called out HS, I wanted to see whether you have seen any difference in behavior.

  • Hi Mukul

    The issue reproduces on both versions of our custom boards (with GP and HS processors).

    It is important for us to solve this because we restart boards during software updates. If the failure occurs on the eMMC, the device will not boot after the software update. If the failure occurs on the SD card and is followed by a software update, we are also unable to reboot the board and finalize the update.

    Best Regards

    Paweł

  • Hi Andrew and Mukul

    Do you have any updates from your investigation?

    We observed that introducing a delay of a few seconds during shutdown prevents the issue from reproducing on the eMMC. The delay is at the end of the stop script for the process that writes to the flash. This could be a workaround for the eMMC case, but it does not solve the same problem with the SD card, which happens during card mounting.
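The shape of that workaround can be sketched as a stop hook: stop the writer, then wait before shutdown continues. This is only an illustration; the actual stop script is not shown in this thread. The function name, the stop_writer callback, and the 3-second default are hypothetical, and the os.sync() call is an added assumption that is not part of the delay-only workaround described above.

```python
import os
import time


def stop_writer_then_settle(stop_writer, settle_seconds=3.0, sleep=time.sleep):
    """Stop the flash-writing process, flush, then pause before returning.

    stop_writer: hypothetical callback that stops the writing process.
    settle_seconds: the "few seconds" delay; 3.0 is an arbitrary example.
    sleep: injectable so the hook can be exercised without real waiting.
    """
    stop_writer()
    # Assumption: ask the kernel to write out dirty data before the delay.
    # The workaround described in the thread only mentions the delay itself.
    os.sync()
    sleep(settle_seconds)
    return settle_seconds
```

In a real stop script this would be called last, for example stop_writer_then_settle(stop_my_service), where stop_my_service is whatever already stops the process today.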

    Best Regards

    Paweł

  • Hello Pawel,

    Andrew is out of office for the rest of the week. Please ping the thread if you do not get a response by the middle of next week.

    Regards,

    Nick