AM3358: ROM code could not detect the error data in 1st block

Part Number: AM3358


Hello Champs,

The customer's product is in mass production, but some boards fail to boot. He examined the failing boards and found bit-flip errors in u-boot-spl (MLO). His check procedure is below.

1. He booted from UART to load u-boot into DDR RAM, then used nand dump to read out u-boot-spl (MLO), including the ECC data. Comparing the data shows bit-flip errors in u-boot-spl on the failing board.

2. u-boot-spl was written to NAND flash with BCH8 ECC; the u-boot version is u-boot-2011.09-psp04.06.00.07. Does the AM335x ROM code use an ECC scheme when reading the SPL from NAND flash?

3. According to the AM335x U-Boot guide, u-boot-spl is written to four blocks of NAND flash. If the ROM code detects an error in the 1st block, it should read the next block (the 2nd) to get a correct SPL. In his case, however, the ROM code does not switch to the next block. It seems that the ROM code cannot detect the bit-flip error in the 1st block.

4. Reading the SPL from the 1st block and the 2nd block into DDR RAM gives the same data, which means that the ECC in u-boot is correct.

So the customer wonders whether there is a bug in the ROM bootloader code.

Thanks.
Rgds

Shine

  • Hi,

    What is the NAND geometry, esp. page size/OOB size? The ROM code uses either BCH8 or BCH16 depending on this. See Figure 5-15 in the AM437x TRM Rev. H for details (AM335x ROM code uses the same algorithm).
  • Hi, the NAND is a Micron 2 Gbit MT29F2G08ABAEA. Page size: 2 KB, OOB size: 64 bytes.

  • This should indeed be BCH8 then. The other thing to check is the SYSBOOT[9] pin. It should be tied low.
  • We also checked that the SYSBOOT[9] pin is low.
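
One way to see why a 2 KB page with a 64-byte OOB points to BCH8 is to add up the spare bytes each scheme would need. This is only a rough sketch: the per-sector ECC byte counts (14 for BCH8, 26 for BCH16), the 512-byte sector size, and the 2 bytes reserved for the bad block marker are assumptions based on commonly used GPMC/ELM layouts, and the function names are hypothetical. It does not reproduce the ROM's actual selection logic from the TRM figure mentioned above.

```python
SECTOR_SIZE = 512                 # assumed ECC sector size in bytes
BAD_BLOCK_MARKER_BYTES = 2        # assumed bytes reserved at the start of the OOB
ECC_BYTES_PER_SECTOR = {          # assumed per-sector ECC sizes, verify against the TRM
    "BCH8": 14,
    "BCH16": 26,
}


def oob_bytes_needed(page_size: int, scheme: str) -> int:
    """Spare-area bytes a scheme would need for one page under the assumptions above."""
    sectors = page_size // SECTOR_SIZE
    return BAD_BLOCK_MARKER_BYTES + sectors * ECC_BYTES_PER_SECTOR[scheme]


def schemes_that_fit(page_size: int, oob_size: int) -> list:
    """Names of the schemes whose ECC bytes fit in the given OOB size."""
    return [
        scheme
        for scheme in ECC_BYTES_PER_SECTOR
        if oob_bytes_needed(page_size, scheme) <= oob_size
    ]


if __name__ == "__main__":
    # Geometry reported in this thread: 2 KB page, 64-byte OOB.
    for scheme in ECC_BYTES_PER_SECTOR:
        print(scheme, oob_bytes_needed(2048, scheme))
    print(schemes_that_fit(2048, 64))
```

Under these assumptions BCH8 needs 58 spare bytes and BCH16 needs 106, so only BCH8 fits in a 64-byte OOB.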

  • Shine, a few more questions:
    -do multiple runs of nanddump reveal the same bad bits?
    -do the bad boards always fail on NAND boot, or is the failure intermittent?
    -are the bad bits always the same on a failed board?
    -do you compare the whole image in the NAND? How many bad bits are there in the image? Just one, or many?
    -is it possible to connect JTAG to a failed board?

    Regards,
    James
  • Hi,

    We ran the nand dump test about 10 times on a failed board. The result is the same every time: there is only one bad bit, always in the same place in the 1st block (SPL) image. We also compared the 2nd through 4th blocks (0x20000-0x7FFFF) on the failed board, and those areas are normal.

    Sorry, we don’t have the ROM code source. How do we debug it by connecting JTAG to the failed board?
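
The comparison described here can be sketched in a few lines. This is a minimal sketch, not the customer's actual tooling: it assumes the dump contains main-area data only (no OOB bytes interleaved), that the erase block is 128 KiB (inferred from the 0x20000-0x7FFFF range for blocks 2 to 4), and that a known-good MLO image is available. The names `bit_diffs` and `compare_spl_copies` are hypothetical.

```python
BLOCK_SIZE = 0x20000   # assumed 128 KiB erase block, inferred from the offsets in this thread
SPL_COPIES = 4         # the SPL is written to four consecutive blocks


def bit_diffs(reference: bytes, candidate: bytes) -> list:
    """Return (byte_offset, bit_index, ref_bit, cand_bit) for every differing bit."""
    diffs = []
    for offset, (ref_byte, cand_byte) in enumerate(zip(reference, candidate)):
        changed = ref_byte ^ cand_byte
        bit = 0
        while changed:
            if changed & 1:
                diffs.append(
                    (offset, bit, (ref_byte >> bit) & 1, (cand_byte >> bit) & 1)
                )
            changed >>= 1
            bit += 1
    return diffs


def compare_spl_copies(dump: bytes, good_image: bytes,
                       block_size: int = BLOCK_SIZE,
                       copies: int = SPL_COPIES) -> dict:
    """Compare each SPL copy in a raw dump against a known-good image."""
    report = {}
    for index in range(copies):
        start = index * block_size
        copy = dump[start:start + len(good_image)]
        report[index] = bit_diffs(good_image, copy)
    return report


if __name__ == "__main__":
    # Synthetic demonstration: four copies of a fake image, one bit flipped in copy 0.
    good = bytes(range(256)) * 4
    block = 2048
    dump = bytearray()
    for _ in range(4):
        dump += good + b"\xff" * (block - len(good))
    dump[10] ^= 0x08
    for index, diffs in compare_spl_copies(bytes(dump), good, block_size=block).items():
        print(f"copy {index}: {len(diffs)} differing bit(s) {diffs}")
```

Running the same comparison on several dumps of one board shows whether the flipped bit stays at a fixed offset, which is what distinguishes a stuck cell from a read disturbance.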

  • It certainly seems like the ROM ECC should be correcting that one-bit error. Is the spare area the same on a passing board and a failing board? Can you nanddump the full block with the spare area (make sure to disable ECC in nanddump)?

    Also, after a failure, can you connect JTAG and read register 0x44E10040 (CONTROL_STATUS)? And take note of the PC.

    The tracing vectors at 0x4030CE40-0x4030CE48 can show you a little of what the ROM executed and possible failure codes.

    Regards,
    James
  • Hello,

    I will be closing the ticket. If you are still experiencing issues, please feel free to open a new ticket in the future.

    Regards,
    Krunal
  • hi, I have posted nanddump data in the attached file.

    nanddump_data.rar

  • On a failing board, can you confirm what is the value of register 0x44E10040 (CONTROL_STATUS) by connecting JTAG to read it? Also, what is the value of the PC?

    Regards,
    James
  • Many people have found this thread through search, but it ends without a conclusion. Although I don't know the final root cause of this case, I want to offer some advice here.

    In my experience, many customers get the SYSBOOT[9] setting wrong; usually it should be set to 0. ECC handled by the NAND is a good option, but I have not seen a NAND device with an integrated ECC feature so far.

    It is better to read back the value of register 0x44E10040 to confirm the configured BOOTMODE. I have seen many designs that reserve both a pull-up and a pull-down resistor on SYSBOOT[9]. The intent is usually to pull it down, but the board ends up with it pulled up by mistake. In that case the ROM does no ECC at all when loading the SPL. This does not cause failures on the production line, because a new NAND device usually does not bit-flip within a limited number of power cycles, but boot failures will be reported from time to time in the field, since even a 1-bit error in the SPL will cause the boot to fail.
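
To make the register check concrete, here is a minimal sketch of decoding a CONTROL_STATUS value read over JTAG. The bit layout is an assumption to verify against the AM335x TRM: it takes SYSBOOT[7:0] from bits [7:0] and SYSBOOT[15:8] from bits [23:16], which puts SYSBOOT[9] at bit 17. The function names and the sample values are hypothetical.

```python
CONTROL_STATUS_ADDR = 0x44E10040  # register address given in this thread


def sysboot_pins(control_status: int) -> int:
    """Rebuild the 16-bit SYSBOOT value from CONTROL_STATUS.

    Assumed layout (check the TRM): SYSBOOT[7:0] in bits [7:0],
    SYSBOOT[15:8] in bits [23:16].
    """
    low = control_status & 0xFF
    high = (control_status >> 16) & 0xFF
    return (high << 8) | low


def rom_does_ecc(control_status: int) -> bool:
    """True when SYSBOOT[9] is 0, the setting this thread says is needed for ROM ECC."""
    return ((sysboot_pins(control_status) >> 9) & 1) == 0


if __name__ == "__main__":
    # Hypothetical register values, differing only in bit 17 (SYSBOOT[9]).
    for value in (0x00C00014, 0x00C20014):
        print(f"CONTROL_STATUS=0x{value:08X} "
              f"SYSBOOT=0x{sysboot_pins(value):04X} "
              f"ROM ECC enabled: {rom_does_ecc(value)}")
```

A value with SYSBOOT[9] reading 1 on a board that was meant to have it pulled down would match the failure pattern described above.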