AM6442: Intermittent eMMC boot issue

Part Number: AM6442
Other Parts Discussed in Thread: AM62P, LP8733

The customer runs U-Boot + Linux on a TI AM6442-based module. The vast majority of the time it boots and runs OK. Occasionally, after a software reboot, the system loads the bootloader from an 8GB Micron eMMC and (as part of the normal boot process for any eMMC) starts to detect and set up faster eMMC modes. In some situations, the eMMC then times out and the system hangs. They added a software reboot to get out of the hang; although the system restarts and again loads and runs the bootloader from the eMMC, it runs into the same problem. This results in a constant reboot cycle. A power cycle clears the problem.

The issue has been observed between 10 and 20 times, typically when the system is rebooted after a software update. In these cases the systems had been up and running for several weeks.

As part of the debug process, the customer tried a custom image that rebooted at the end of the bootloader instead of booting Linux. They performed back-to-back reboots as fast as possible to see if they could trigger the failure condition. More than 15,000 reboot cycles were carried out without reproducing the failure. Thousands of reboots in many different configurations (heavy eMMC reads and writes, system idling, system performing software updates, etc.) were also conducted.

HS200 and HS400 modes are disabled for the eMMC.

The boot log (in both the good and the failing case) showed an error on an EEPROM read:

SYSFW ABI: 4.0 (firmware rev 0x000a '10.1.8--v10.01.08 (Fiery Fox)')
EEPROM not available at 0x50, trying to read at 0x51
Reading on-board EEPROM at 0x51 failed -121 

  1. Customer is investigating this to see if a potential I2C hang is tied to this error and the subsequent boot failure

The actual boot failure messages are:

mmc_get_op_cond: uhs_en=0, -110

mmc_get_op_cond:mmc_send_op_cond() -110

Card did not respond to voltage select! : -110

spl: mmc init failed with error: -95

SPL: failed to boot from all boot devices

  2. Customer is also checking whether the eMMC reset is asserted correctly, in both hardware and software, from the AM644x to the eMMC.

  3. eMMC legacy mode seems to work, so the performance impact of using it is being looked at.
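
The numeric codes in the two logs above follow the Linux errno numbering that U-Boot also uses. As a minimal sketch (the table and the helper names `LINUX_ERRNO`, `decode`, and `codes_in_log` are illustrative and not part of U-Boot), the codes can be extracted from a log and named like this:

```python
import re

# Linux errno numbers, which U-Boot also uses. Hardcoded (rather than taken
# from Python's errno module) so the result does not depend on the host OS.
LINUX_ERRNO = {
    95: ("EOPNOTSUPP", "operation not supported"),
    110: ("ETIMEDOUT", "timed out"),
    121: ("EREMOTEIO", "remote I/O error"),
}

# Lines copied from the boot logs quoted in this thread.
LOG = """\
Reading on-board EEPROM at 0x51 failed -121
mmc_get_op_cond: uhs_en=0, -110
mmc_get_op_cond:mmc_send_op_cond() -110
Card did not respond to voltage select! : -110
spl: mmc init failed with error: -95
"""


def decode(code):
    """Return a readable name for a negative errno-style return code."""
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in table"))
    return f"{code}: {name} ({text})"


def codes_in_log(log):
    """Collect the distinct negative codes that end a log line."""
    found = re.findall(r"(-\d+)\s*$", log, flags=re.MULTILINE)
    return sorted({int(value) for value in found})


for code in codes_in_log(LOG):
    print(decode(code))
```

Read this way, the eMMC failure appears to start as a command timeout (-110) on the operating-condition query, which SPL then reports as -95, while -121 belongs to the separate EEPROM probe.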

If there are any other suggestions on things to check please let me know.

Thanks!

  • Responses may be delayed as Monday is a TI US holiday.

    Please continue to share any additional relevant information from the suggested follow up action items above. 

  • Customer is investigating this to see if a potential I2C hang is tied to this error and the subsequent boot failure

    We have reviewed the I2C and can say that based on further testing, the I2C message is benign. It is reported because a device that is not present is being probed. We also confirmed that electrically SDA operates normally using an oscilloscope. To check for possible boot hang, we forced a short across SDA and were able to confirm that the boot completes normally with some additional messages:

    Timed out in wait_for_bb: status=1000
    Timed out in wait_for_bb: status=1000
    Timed out in wait_for_bb: status=1000
    EEPROM not available at 0x50, trying to read at 0x51
    Timed out in wait_for_bb: status=1000
    Reading on-board EEPROM at 0x51 failed -121

    Customer also checking if eMMC reset is asserted correctly both hardware and software wise from AM644x to eMMC

    We also looked at the reset going into the eMMC and the timings look OK, measuring 158 µs.

    We are curious about this issue:

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1402424/am6442-micron-emmc-chip-at-tmds64evm-b-not-answering-after-a-second-sotfware-reset-cmd-second-mmcsd-driver-init

    It sounds related, and we would appreciate comment on it in the context of our current eMMC problems.

    eMMC legacy mode seems to work so performance impact of that is being looked at.

    Currently work-in-progress.

    Thank you.

  • Hi Will,

    Thanks for the update.

    We also looked at the reset going into the eMMC and the timings look OK, measuring 158 µs.

    I reviewed the provided failure boot log, and I don't think the issue is related to eMMC reset. The log shows that the ROM is able to load and run the R5 SPL (tiboot3.bin) from eMMC, and the R5 SPL is able to load and run the A53 SPL (tispl.bin) from eMMC too; the eMMC failure happens when the A53 SPL tries to load A53 U-Boot (u-boot.img). So the eMMC seems to work fine when reading tiboot3.bin and tispl.bin, which suggests it is reset properly by the warm-reset signal.

    eMMC legacy mode seems to work so performance impact of that is being looked at.

    Can you please show the U-Boot patch for configuring eMMC to legacy mode?

    The boot log shows SYSFW version v10.1.8. Do you use U-Boot from AM64x SDK v10.1? There are some U-Boot MMC driver updates in SDK11.2 that are related to MMC timing; I need to check whether those patches should be backported to the SDK10.1 U-Boot for you to test.

  • Thanks for the update.

    Can you please show the U-Boot patch for configuring eMMC to legacy mode?

    Currently we are not forcing this mode. We believe that when the eMMC is reset it starts in this mode, and given that it appears to be readable until we get the timeout, we felt that legacy mode was not exhibiting a problem. Our follow-up intention was to force the eMMC to stay in this mode, without attempting faster modes, to see whether the problem still occurred and whether the kernel code was better able to deal with the eMMC state. When we have a patch to force this mode, I can share it, though ideally this would be for test purposes, at least initially.

    The boot log shows SYSFW version v10.1.8, do you use U-Boot from AM64x SDK v10.1?

    Our U-Boot sources date from 2024 and I believe SDK v10.1 does too; however, I need to compare the baselines to understand alignment. I am certainly interested in knowing more about the SDK11.2 updates particularly related to MMC timing. Thanks.

  • Hi Will,

    When we have a patch to force this mode, I can share though ideally, this would be for test purposes, at least initially.

    I agree that forcing legacy mode doesn't resolve the issue, so this work should be a lower priority.

    I am certainly interested in knowing more about the SDK11.2 updates particularly related to MMC timing.

    A quick check on U-Boot since the SDK v10.1 release tag shows 18 patches to the AM62x MMC controller driver.

    ti-u-boot.git$ glog 10.01.10.. drivers/mmc/am654_sdhci.c
    0fea7f943734 FROMLIST: mmc: am654_sdhci: Disable HS400 for AM62P SR1.0 and SR1.1
    98b6b3f5a259 (tag: cicd.scarthgap.202505151402, tag: 11.00.13) PENDING: mmc: am654_sdhci: Clear UHS_MODE_SELECT
    ee6c46a606bf FROMLIST: mmc: am654_sdhci: Add am654_sdhci_set_control_reg
    cd91d7360181 (tag: cicd.scarthgap.202503251551, tag: 11.00.09) PENDING: mmc: am654_sdhci: Unset HIGH_SPEED_ENA for MMC_HS_52
    804035fae6ea PENDING: mmc: am654_sdhci: Add MMC_HS_52 to timing data
    36e384d4eef5 PENDING: mmc: am654_sdhci: Set HIGH_SPEED_ENA for SDR12 and SDR25
    f2be440ceb3c PENDING: mmc: am654_sdhci: Fix possible NULL deref
    afdce7686371 mmc: am654_sdhci: Add the quirk to set TESTCD bit
    03de305ec48b Restore patch series "arm: dts: am62-beagleplay: Fix Beagleplay Ethernet"
    d678a59d2d71 Revert "Merge patch series "arm: dts: am62-beagleplay: Fix Beagleplay Ethernet""
    7938ac657ba6 mmc: Remove <common.h> and add needed includes
    2143a11e6149 mmc: Migrate MMC_SUPPORTS_TUNING to Kconfig
    f13a830e6e4a mmc: am654_sdhci: Fix ITAPDLY for HS400 timing
    a124e31a97cd mmc: am654_sdhci: Set ENDLL=1 for DDR52 mode
    056af04a39ae mmc: am654_sdhci: Add itap_del_ena[] to store itapdlyena bit
    5048b5c61afd mmc: am654_sdhci: Fix OTAP/ITAP delay values
    6b8dd9ca6e06 mmc: am654_sdhci: Add tuning algorithm for delay chain
    a3b2786651c7 mmc: Drop unused mmc_send_tuning() cmd_error parameter

    Instead of identifying which patches to backport, do you think it would be better for you to just compile the SDK11.02 U-Boot code base and test it on your board?
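
If the backport route is taken instead, a first triage step could be to separate the commits that name the driver in their subject from the tree-wide ones. The following is only a rough sketch: the regular expression, the helper names `parse` and `split_by_driver`, and the choice of "subject contains `am654_sdhci`" as the filter are assumptions made for illustration, not a TI-recommended selection method.

```python
import re

# Output of the `glog` command quoted above, copied verbatim.
GIT_LOG = '''\
0fea7f943734 FROMLIST: mmc: am654_sdhci: Disable HS400 for AM62P SR1.0 and SR1.1
98b6b3f5a259 (tag: cicd.scarthgap.202505151402, tag: 11.00.13) PENDING: mmc: am654_sdhci: Clear UHS_MODE_SELECT
ee6c46a606bf FROMLIST: mmc: am654_sdhci: Add am654_sdhci_set_control_reg
cd91d7360181 (tag: cicd.scarthgap.202503251551, tag: 11.00.09) PENDING: mmc: am654_sdhci: Unset HIGH_SPEED_ENA for MMC_HS_52
804035fae6ea PENDING: mmc: am654_sdhci: Add MMC_HS_52 to timing data
36e384d4eef5 PENDING: mmc: am654_sdhci: Set HIGH_SPEED_ENA for SDR12 and SDR25
f2be440ceb3c PENDING: mmc: am654_sdhci: Fix possible NULL deref
afdce7686371 mmc: am654_sdhci: Add the quirk to set TESTCD bit
03de305ec48b Restore patch series "arm: dts: am62-beagleplay: Fix Beagleplay Ethernet"
d678a59d2d71 Revert "Merge patch series "arm: dts: am62-beagleplay: Fix Beagleplay Ethernet""
7938ac657ba6 mmc: Remove <common.h> and add needed includes
2143a11e6149 mmc: Migrate MMC_SUPPORTS_TUNING to Kconfig
f13a830e6e4a mmc: am654_sdhci: Fix ITAPDLY for HS400 timing
a124e31a97cd mmc: am654_sdhci: Set ENDLL=1 for DDR52 mode
056af04a39ae mmc: am654_sdhci: Add itap_del_ena[] to store itapdlyena bit
5048b5c61afd mmc: am654_sdhci: Fix OTAP/ITAP delay values
6b8dd9ca6e06 mmc: am654_sdhci: Add tuning algorithm for delay chain
a3b2786651c7 mmc: Drop unused mmc_send_tuning() cmd_error parameter
'''

# One-line log entry: 12-digit hash, optional "(tag: ...)" decoration, subject.
LINE = re.compile(
    r"^(?P<sha>[0-9a-f]{12})\s+(?:\((?P<refs>[^)]*)\)\s+)?(?P<subject>.+)$"
)


def parse(log):
    """Turn one-line git log output into a list of dicts."""
    commits = []
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if match:
            commits.append(match.groupdict())
    return commits


def split_by_driver(commits, needle="am654_sdhci"):
    """Split commits by whether the subject names the driver."""
    driver = [c for c in commits if needle in c["subject"]]
    other = [c for c in commits if needle not in c["subject"]]
    return driver, other


commits = parse(GIT_LOG)
driver, other = split_by_driver(commits)
print(f"{len(driver)} of {len(commits)} commits name the driver in the subject")
for commit in other:
    print("other:", commit["sha"], commit["subject"])
```

For the list above, this flags 13 of the 18 commits as driver-specific. The remaining five look like tree-wide or merge-related changes, though each would still need to be checked by hand before deciding what to backport.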

  • Hi Bin Liu,

    A quick check on U-Boot since SDK v10.1 release tag, there are 18 patches to the AM62x MMC controller driver.

    We will review the list of patches, thanks.

    In the design, we have an LP8733 PMIC and tried to perform a SW_RESET in an effort to power-cycle the eMMC. However, it seems that the SW_RESET command removed power from the PMIC outputs but did not re-enable them, so the system did not restart. If the PMIC can be encouraged to do what we want, this could be a viable way to exit the reset loop and recover from the problem.

    Previously, I linked another E2E ticket & wondered if you had seen it & whether you thought it might be related to what we are seeing.

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1402424/am6442-micron-emmc-chip-at-tmds64evm-b-not-answering-after-a-second-sotfware-reset-cmd-second-mmcsd-driver-init

  • Hi Will, 

    The software reset for the LP8733 PMIC is more like a COLD reset that triggers a power-down sequence and turns OFF all regulators. PMIC does not have an I2C command to only toggle the PGOOD/PORz. 

    Thanks,

    Brenda

  • Hi Will, 

    Regarding the thread you referenced, we believe this could be a possible cause. It refers to a condition where the eMMC host controller may get into a state in which a re-initialization (i.e., trying to initialize an already initialized controller) fails, and the only way to recover is to power down (or reset) the eMMC host controller to get it back into the power-up state. This is done with an LPSC (Local Power/Sleep Controller).

    A possible experiment would be to take the failed board that you have and reset the host controller, but this could only be done via JTAG. Do you have JTAG access to that board?

    With JTAG access, we could also attempt to dump the eMMC controller register to see what state it is in.  

    Regards,

    James

  • Hi James,

    Thanks for the reply.

    Unfortunately, JTAG has been removed from the board for security reasons.

  • Thanks Brenda, that aligns with the tests that we carried out.

  • My action item from the call today was to describe my proposal of asserting the system's cold reset without cycling power.

    I initially misunderstood what you were trying to do with the PMIC I2C command. I thought you were trying to assert the processor MCU_PORz reset input to see if the system recovers without cycling power.

    Now I understand you were hoping the I2C command would cycle power to the eMMC device, where the PMIC would sequence the supply rails down followed by a power-up sequence. As Brenda mentioned, the I2C command you sent will only result in a power-down sequence.

    You may want to consider the test I thought you were trying to do. It may help us understand what is happening if we asserted MCU_PORz without cycling power to see if a failing system recovers from just a cold reset. I would not expect there to be any difference between a power cycle and the assertion of cold reset without cycling power since the assertion of MCU_PORz is expected to reset every circuit in the processor.  This is not the case when you assert a warm reset.

    It may be possible to pull the MCU_PORz signal low from an external source connected to the PMIC reset output since its output is open-drain. The external source would need to produce a low pulse with the appropriate minimum duration to ensure you do not violate the processor pulse width requirement. This minimum pulse width parameter of 1200 ns is defined as RST3 in the datasheet MCU_PORz Timing Requirements table.

    As mentioned above, I'm expecting the cold reset to produce the same results as cycling power. Therefore, you may want to hold off on this test if you have more fruitful tests for a system that takes a long time to get into the failing state.

    Regards,
    Paul

  • There was some confusion when talking to Brenda last week, and portions of my previous reply are not correct. She saw my reply on E2E and sent a private email this morning to clarify how the LP8733 PMIC works.

    Brenda confirmed that setting the SW_RESET bit will initiate a power-down sequence followed by a power-up sequence. She also confirmed there is no way to toggle the PMIC's reset output without cycling power.

    Regards,
    Paul

  • Hi Will,

    I know that the eMMC lockup issue was initially reported in the field after system firmware update on eMMC, but have you ever reproduced the issue without updating the firmware on eMMC? I am trying to understand if the issue could be related to eMMC write transactions during the firmware update process.

  • I received an offline response that confirms the issue has been seen without updating the firmware on eMMC.

    Hi Will,

    To continue the discussion about the PMIC power down/up topic that we talked about in today's meeting, can you please provide the details of the test procedure used to trigger SW_RESET on the PMIC on your board? For example, do you directly use i2c commands in Linux to access the PMIC registers, or use a Linux command that implements the i2c transfers to the PMIC under the hood?

  • Hi Bin Liu,

    We used i2c from Linux.

    i2cset -f -y 0 0x61 0x18 0x1

     independently tried the same command and also the U-Boot equivalent.
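
For readers unfamiliar with the command above: `i2cset -f -y 0 0x61 0x18 0x1` writes the byte 0x1 to register 0x18 of the device at 7-bit address 0x61 on I2C bus 0, where `-f` forces access even if a kernel driver has claimed the address and `-y` skips the confirmation prompt. A rough Python equivalent through the Linux i2c-dev interface might look like the sketch below. The helper names `register_write_payload` and `write_register` are made up for illustration, and the sketch assumes the adapter accepts a plain two-byte I2C write, which produces the same bus traffic as the SMBus byte-data write that `i2cset` issues by default.

```python
import fcntl
import os

# ioctl request from <linux/i2c-dev.h>: select the target address even if a
# kernel driver is already bound to it (what `i2cset -f` does).
I2C_SLAVE_FORCE = 0x0706


def register_write_payload(reg, value):
    """Build the two data bytes of a register write: register, then value."""
    for name, item in (("reg", reg), ("value", value)):
        if not 0 <= item <= 0xFF:
            raise ValueError(f"{name} must fit in one byte, got {item:#x}")
    return bytes([reg, value])


def write_register(bus, addr, reg, value):
    """Write one byte to a device register via /dev/i2c-<bus>."""
    fd = os.open(f"/dev/i2c-{bus}", os.O_RDWR)
    try:
        fcntl.ioctl(fd, I2C_SLAVE_FORCE, addr)
        os.write(fd, register_write_payload(reg, value))
    finally:
        os.close(fd)


# Usage on the target board only. With the values from this thread, the call
# sets the PMIC SW_RESET bit, so expect the supply rails to be sequenced:
# write_register(bus=0, addr=0x61, reg=0x18, value=0x01)
```

The call is left commented out on purpose, since executing it on the board triggers the PMIC reset behaviour discussed earlier in the thread.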