AM5716: Boot issue

Part Number: AM5716
Other Parts Discussed in Thread: CSD

Dear TI team,

Please see the thread linked below for background. I have run into the same problem as its author.

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/620142/linux-am3352-boot-issue?tisearch=e2e-sitesearch&keymatch=print%2520CCCCC#

I have confirmed that the md5sum of the MLO is correct. If I boot the device from the SD card, perform the following steps, unplug the SD card, and power on the AM5716 again, the device boots normally:

mount /dev/mmcblk1p1 /mnt/emmc

cd /mnt/emmc

cp MLO MLO-bake

rm MLO

mv MLO-bake MLO

umount /mnt/emmc

  • Dear TI,

    My SDK version is 6.3.0.106, but I do not think this issue is related to the SDK version. I suspect it is related to the boot ROM code of the chip, because the MLO image of the /dev/mmcblk1p1 is not damaged. I only performed cp/mv operations to move the position of MLO, and the system can boot normally.

  • Hi,

    Could you please elaborate? 

    "MLO image of the /dev/mmcblk1p1 is not damaged. I only performed cp/mv operations to move the position of MLO, and the system can boot normally"

    Does this happen every boot? We have not seen this on TI boards.

    - Keerthy

  • Dear Keerthy,

    1. We found this issue on both AM571x and AM335x. The probability of it occurring is relatively low, and we discovered it by chance. We are now testing and reproducing it with the steps below, which can be run about twice per minute. The problem reproduces in approximately 5-10 hours:

    mount /dev/mmcblk1p1 /mnt/emmc

    rm /mnt/emmc/MLO

    cp /home/root/image/MLO /mnt/emmc/MLO

    umount /mnt/emmc

    sync

    sync

    reboot

    2. After the last reboot, AM335x/AM571x powered on and system couldn't find MLO.

    3. I think this is a chip bug. You said you have not seen this problem, but several people on E2E have reported it. It has not been taken seriously, and so it continues to occur.

    1. Let me explain the process in detail. We first encountered this issue during an MLO upgrade. After we went through the following steps, the system restarted with the upgraded MLO and produced no debug output; it only printed CCCCCCCCC:

    mount /dev/mmcblk0p1 /mnt/emmc

    rm /mnt/emmc/MLO

    cp /home/root/image/MLO /mnt/emmc/MLO

    umount /mnt/emmc

    sync

    sync

    reboot

    2. Then, if I boot the device from the SD card, perform the following steps, unplug the SD card, and power on the AM5716 again, the device boots normally:

    mount /dev/mmcblk1p1 /mnt/emmc

    cd /mnt/emmc

    cp MLO MLO-bake

    rm MLO

    mv MLO-bake MLO

    umount /mnt/emmc

    3. We are now testing and reproducing this problem with the steps below, which we run about twice per minute. The problem reproduces in approximately 5-10 hours:

    mount /dev/mmcblk0p1 /mnt/emmc

    rm /mnt/emmc/MLO

    cp /home/root/image/MLO /mnt/emmc/MLO

    umount /mnt/emmc

    sync

    sync

    reboot

  • Dear TI team,

    I think this issue is very serious. We only use AM571x and AM335x, so we have only seen it on these two chips so far, but I suspect that all of your chips may have this issue. It is time to take it seriously.

  • Hello Zhicheng,

    We will try to recreate this on our hardware. Please give us a couple of days and we will get back to you with our status.

    Is your hardware a TI board or custom?

    Regards,

    -Josue

  • Dear Josue,

    1. It is a custom board, and we found this problem on both AM335x and AM571x. If you want to reproduce this issue, you can put the following in a startup script (for example /etc/rc5.d/S99reboot.sh):

    #! /bin/sh

    mount /dev/mmcblk0p1 /mnt/emmc

    rm -rf /mnt/emmc/MLO

    cp /root/image/MLO /mnt/emmc

    umount /mnt/emmc

    sync

    reboot

    2. The S99reboot.sh script above replaces the MLO on /dev/mmcblk0p1 and restarts the device on every boot. It takes only 10-24 hours to recreate the problem, so if you set up the environment today, it should be recreated by tomorrow.
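
When running a reboot loop like the one above, it can help to record how many cycles complete before the failure appears. The following is a minimal sketch and not part of the script posted in this thread: the function name `bump_boot_count` and the counter file path are hypothetical. It assumes the counter is stored outside the boot partition, so that it adds no writes to the FAT filesystem under test.

```shell
#!/bin/sh
# Sketch only: persist a reboot-cycle counter across boots.
# bump_boot_count and the counter path are illustrative names,
# not part of the script posted in this thread.

bump_boot_count() {
    counter_file=$1
    count=0
    # Reuse the previous value if the counter file already exists.
    if [ -f "$counter_file" ]; then
        count=$(cat "$counter_file")
    fi
    count=$((count + 1))
    echo "$count" > "$counter_file"
    # Print the new value so the caller can log it.
    echo "$count"
}

# Possible use near the top of the reboot script (path is an assumption):
# bump_boot_count /home/root/boot_count
```

After a failed boot, the last value in the counter file gives the number of replace-and-reboot cycles that completed.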

  • Hello Zhicheng,

    I appreciate the script. I will test this and get back to you with our results.

    Regards,

    Josue

  • Dear Josue,

    May I ask whether you have recreated this problem?

  • Zhicheng,

    Not yet. I was out for the weekend. I will be trying this today.

    Regards,

    Josue 

  • Hello Zhicheng,

    I apologize for the confusion. Unfortunately, we do not have a ready way to test this, since the TI AM571x EVM does not inherently have eMMC boot enabled.

    See this FAQ: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/959283/faq-ccs-am5728-am57xx-boot-modes-supported

    We are discussing this internally and will get back to you.

    -Josue

  • Hello Zhicheng,

    This might take a long time: as determined in the previous response, this is not validated in our latest SDK, and each experiment takes 5-10 hours, which equates to one run per day.

    We are working on setting up the experiment using an AM572x board, since it is the only board we have with eMMC boot available.

    I have a couple of questions:

    • Does this issue only occur with eMMC boot? Are there any other boot options, and has the issue been recreated using them?

    In order to make sure this issue is the same as the one in the other E2E thread, could the bad MLO be saved and used to do a clean boot via SD card?

    • Does this work? If the "bad MLO" consistently fails to boot, then the issue we are seeing now seems similar to the old thread. Otherwise, if the MLO can boot, it seems like a new issue, but with behavior similar to the old thread.

    Lastly, to narrow down whether the issue is with "copying" the MLO, "booting", or a combination of both, could you run two separate experiments? First, just reboot for 24 hours without copying, to see whether booting alone causes the issue. Second, recopy the file repeatedly for more than 24 hours and check whether the md5sum changes, to see whether copying causes the issue.

    Regards,

    -Josue
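
The copy-only experiment suggested above could be sketched as follows. This is a hedged illustration, not a script from the thread: the helper names `md5_of` and `replace_and_verify` are hypothetical, and the commented loop assumes the boot partition is already mounted at /mnt/emmc with the MLO source at /home/root/image/MLO, as in the earlier posts.

```shell
#!/bin/sh
# Sketch only: replace a file and confirm the copy matches the source.
# md5_of and replace_and_verify are illustrative helper names.

md5_of() {
    # Print only the hash field of md5sum's "hash  filename" output.
    md5sum "$1" | cut -d ' ' -f 1
}

replace_and_verify() {
    src=$1
    dst=$2
    rm -f "$dst"
    cp "$src" "$dst" || return 2
    sync
    # Succeeds only if source and destination hashes are identical.
    [ "$(md5_of "$src")" = "$(md5_of "$dst")" ]
}

# Example loop for the copy-only experiment (paths are assumptions):
# i=0
# while replace_and_verify /home/root/image/MLO /mnt/emmc/MLO; do
#     i=$((i + 1))
# done
# echo "copy failure or md5 mismatch after $i iterations"
```

Note that a matching md5sum only shows that the file contents are intact as seen by Linux; it says nothing about whether the boot ROM can locate the file.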

  • Dear Josue,

    1. Yes, we have only seen this issue with eMMC boot, because eMMC is the only boot mode we use; SD card boot is used only to flash the eMMC.

    2. After a boot failure, we copied the "bad MLO" image off the eMMC and placed it on other bootable devices, which could still boot with it.

    3. We have been restarting 5 devices repeatedly with a tool for 15 days and have not seen any boot failures so far. So this issue is related to replacing the "MLO" rather than to "booting".

  • Dear Josue,

    How did you set up an environment on the AM572x board to recreate this issue?

  • Dear Zhicheng,

    I attempted to boot and run the rootfs from eMMC, but the eMMC on the AM572x board only has 3.6 GiB, so I need to change the approach. Next I will attempt booting from eMMC and loading the kernel from the SD card.

    From the information you are providing, it seems like this issue is slightly different from the one in the other thread, despite the similar behavior. It might be caused by a combination of replacing the MLO and rebooting.

    "After the last reboot, AM335x/AM571x powered on and system couldn't find MLO"

    Does this mean that once the MLO boot fails, the board does not ever boot up again without changing the MLO?

    -Josue

  • Dear Josue,

    1. You must boot from eMMC to recreate this issue.

    2. You can run the script below to recreate this issue (/etc/rc5.d/S99reboot.sh):

    #! /bin/sh

    mount /dev/mmcblk0p1 /mnt/emmc

    rm -rf /mnt/emmc/MLO

    cp /root/image/MLO /mnt/emmc

    umount /mnt/emmc

    sync

    reboot

    3. After running the above script for a few hours, you will find that the system can no longer boot from eMMC. No matter how long you leave it, or how many times you power it off and on, it still fails to boot. But if you boot from the SD card, perform the following operations, unplug the SD card, and power on again, the system can boot from eMMC again:

    mount /dev/mmcblk0p1 /mnt/emmc

    cd /mnt/emmc

    cp MLO MLO-bake

    rm MLO

    mv MLO-bake MLO

    umount /mnt/emmc

    4. So the answer to "Does this mean that once the MLO boot fails, the board does not ever boot up again without changing the MLO?" is yes.

  • Dear Zhicheng,

    Running these tests will take a little longer than expected. I appreciate your patience as we get our environment set up.

    -Josue

  • Dear Zhicheng,

    I will be working on reproducing the issue.

    -Josue

  • Dear Josue,

    May I ask whether you have set up a reproduction environment yet?

    --> If not, please tell me what problems you encountered while building the environment.

    --> If yes, have you recreated this issue?

  • Hello Zhicheng,

    I finally got a tiny image of Linux running on the eMMC on the AM572x EVM. I should be able to run one or more tests today, depending on how long it takes for the issue to present itself, if it does.

    I will let you know how it goes.

    -Josue

  • Hello Zhicheng,

    Testing now, will report results by Monday CST.

    -Josue 

  • Hello Zhicheng,

    Results over the weekend: I believe the board stopped booting after approx. 3 hours, or 372 boots.

     

    The board stayed on without any print-outs. I did not see CCCCCC prints.

    The MLO has the same md5sum.

    Tried multiple times to re-boot board from eMMC, it did not work.

    Lastly, I tried to manually copy over the MLO again:

    -booted board via SD card

    -mounted mmcblk2p1 and copied over MLO from home/root/image into mmcblk2p1

    Turned off board, took out SD card, turned the board back on and it booted normally with same MLO.

    Does this match what you are seeing?

    I am doing another run today to investigate further. 

    -Josue

  • Dear Josue,

    1. Yes, this is the behavior I encountered. The CCCC output does not always appear: it depends on your boot mode configuration, and CCCC is printed only when the system tries to boot from UART.

    2. In our environment, CCCC is printed on AM335x, and nothing is printed on AM571x when booting fails. This is because the boot mode configurations of our two environments are different.

    3. In short, if we boot from the SD card and then either recopy the MLO to the eMMC or execute the following commands, the problem disappears:

    mount /dev/mmcblk1p1 /mnt/emmc

    cd /mnt/emmc

    cp MLO MLO-bake

    rm MLO

    mv MLO-bake MLO

    sync

    cd ~

    umount /mnt/emmc

    reboot

  • Hello Zhicheng,

    I also recreated this with eMMC boot as the primary boot mode. We are investigating internally and will report back.

    Thank you for your help.

    -Josue

  • Dear Josue,

    I'm glad you could recreate this issue. Looking forward to your input.

  • Hello Zhicheng,

    Could you share the eMMC part numbers used on your boards?

    -Josue

  • Dear Josue,

    Sorry for the late reply; I was away over the weekend for the past two days.

    We have seen the problem with several eMMC parts, for example Micron's MTFC8GAKAJCN-4MIT.

    I don't think this has anything to do with the eMMC part, as the problem occurs with several types of eMMC that we use.

  • Zhicheng,

    Our team is investigating some issues with Micron parts, but the error that I recreated happened on a Kingston part, so I agree: this issue is more than likely not eMMC part-related.

    -josue

  • Dear Josue,

    Thanks very much for your help. Looking forward to your update.

  • Zhicheng,

    I am currently out of office until next week so please expect delays.

    -Josue

  • Zhicheng,

    Picking this back up this week. I appreciate your patience as we sort it out.

    -Josue

  • Dear Josue,

    Thank you for your efforts in this matter. Looking forward to your input.

  • Dear Zhicheng,

    I regret to inform you that our tests came back inconclusive. 

    I looked at the ROM tracing vectors, analyzed the MBR for corruption, and checked the EXT_CSD register within the eMMC to see if there was any corruption. Unfortunately, this did not yield anything helpful.

    The ROM team that was in charge of this device no longer exists within TI, so we do not have the ability to be of any further support.  

    I will be placing an internal record for this as a possible errata.

    Unfortunately, there is no software fix for this. The only workaround is what you are already doing: boot up via SD card, manually replace the MLO, and boot up again.

    Best,

    Josue 

  • Hello Zhicheng,

    Had an idea of something to try. 

    I will try another test where we repartition the boot partition every time and then copy over the MLO, to check whether this helps avoid the error.

    I will report back once this is done. Have you done anything like this already?

    -Josue

  • Dear Josue,

    1. We ran the following test a long time ago, and it avoids the problem:

    mount /dev/mmcblk0p1   /mnt/emmc

    cp -rf /mnt/emmc/*   /home/root/p1_bake/

    mkfs.vfat -F 32 -n boot /dev/mmcblk0p1

    cp /home/root/MLO   /home/root/p1_bake/

    cp /home/root/p1_bake/*    /mnt/emmc

    umount   /mnt/emmc

    sync

    reboot

    2. We have determined that the execution time of the mkfs.vfat command is 600 ms. If the device loses power during that 600 ms window, it will no longer work and can only be restored by flashing the system again, which is unacceptable to us.

  • Hello Zhicheng,

    We had a detailed brainstorming session on this issue and have identified why the ROM code is unable to detect the newly written MLO file after a certain number of reboots.

    The reason is that the FAT driver in the eMMC boot functionality of the ROM code can read only a limited number of entries of the FAT table. Each time the MLO file is deleted and a new one is copied, a new FAT entry is added, and over time this exceeds the limit that the ROM code supports. The only workaround is to format the BOOT partition (which creates a new FAT table) and then copy the MLO file.

    Please note that we cannot modify or fix the ROM code, as it is fixed in the device. Any update to it would require a new silicon revision, which is not planned.

    So the only alternative for you is to format the BOOT partition and then copy the MLO file.

    Thanks.
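
The format-then-copy workaround described above could be scripted roughly as follows. This is a sketch under stated assumptions, not a TI-provided procedure: the helper names `run` and `reformat_and_copy_mlo` and the `DRY_RUN` switch are hypothetical, the `mkfs.vfat -F 32 -n boot` invocation is the one quoted earlier in the thread, and unmounting the partition before formatting is an assumption of this sketch. It does not remove the power-loss window during formatting that was raised earlier in the thread.

```shell
#!/bin/sh
# Sketch only: back up the FAT boot partition, reformat it, and restore
# its files together with a new MLO. Set DRY_RUN=1 to print the commands
# instead of executing them. All names here are illustrative.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reformat_and_copy_mlo() {
    dev=$1        # boot partition, e.g. /dev/mmcblk0p1 (assumption)
    mnt=$2        # mount point, e.g. /mnt/emmc (assumption)
    new_mlo=$3    # MLO image to install
    backup=$4     # scratch directory outside the boot partition

    run mkdir -p "$backup" &&
    run mount "$dev" "$mnt" &&
    run cp -r "$mnt/." "$backup/" &&        # save the current boot files
    run umount "$mnt" &&                    # assumption: format while unmounted
    run cp "$new_mlo" "$backup/MLO" &&      # stage the new MLO in the backup
    run mkfs.vfat -F 32 -n boot "$dev" &&   # creates a new FAT table
    run mount "$dev" "$mnt" &&
    run cp -r "$backup/." "$mnt/" &&        # restore files onto the clean filesystem
    run sync &&
    run umount "$mnt"
}

# Example (requires root; values are assumptions taken from this thread):
# reformat_and_copy_mlo /dev/mmcblk0p1 /mnt/emmc /home/root/MLO /home/root/p1_bake
```

Running it first with `DRY_RUN=1` prints the command sequence, so the device and paths can be checked before anything is formatted.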