NAND boot and reliability

Hi,

In my company we're working on updating our high-reliability product (used in power solutions) to use the Sitara AM335x processor. At this early stage, we are thinking of using NAND flash as the persistent memory component instead of the NOR used by the current hardware.

However, we are wondering if it is a wise move concerning this type of product. Is NAND with ECC-handling considered as "safe" as a NOR flash?

Specifically, we've been reading the AM335x Sitara reference manual section on the ROM Code Start-up. Does this boot sequence allow a reliable boot if we have both SPL and U-Boot on NAND, considering ECC for error correction? Or do we need to manually implement a fall-back mechanism to handle cases where the flash is corrupted (e.g. store the SPL in multiple places, use MMC, etc.)?

It seems ECC has to be enabled in SW, so we need to ensure at least that U-Boot can be loaded correctly even if the flash itself is not 100% reliable.

Thank you for your clarification and suggestions

Best regards,

Jeremy

  • Hi Jeremy,

    You have the correct document. The relevant section for booting information is section 26. ECC is initially handled by ROM code. In NAND there should be 4 copies of SPL at specific locations, which are checked by the ROM code.
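
The "several copies, checked in turn" idea can be sketched roughly as follows. This is only an illustration of the fallback logic, not the actual ROM code: `load_first_good_copy`, the `read_image_fn` callback, and the simulated reader with its 128 KiB block size are all hypothetical names and values chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPL_COPIES 4

/* Hypothetical callback: reads the image at `offset` into `buf` (with ECC
 * applied) and returns 0 only if the data was readable and looks valid. */
typedef int (*read_image_fn)(uint32_t offset, uint8_t *buf, size_t len);

/* Try each copy in order. Returns the index of the first copy that reads
 * back correctly, or -1 if every copy failed. */
int load_first_good_copy(const uint32_t offsets[SPL_COPIES],
                         read_image_fn read_image,
                         uint8_t *buf, size_t len)
{
    assert(offsets != NULL && read_image != NULL && buf != NULL);
    for (int i = 0; i < SPL_COPIES; i++) {
        if (read_image(offsets[i], buf, len) == 0)
            return i;
    }
    return -1;
}

/* Simulated flash for demonstration only (assumed 128 KiB erase blocks).
 * Bit i of sim_bad_mask set means the copy in block i is unreadable. */
#define SIM_BLOCK_SIZE 0x20000u
static unsigned sim_bad_mask;

static int sim_read_image(uint32_t offset, uint8_t *buf, size_t len)
{
    unsigned block = offset / SIM_BLOCK_SIZE;
    if (sim_bad_mask & (1u << block))
        return -1;              /* uncorrectable read error */
    memset(buf, 0xA5, len);     /* dummy payload */
    return 0;
}
```

With copies placed at offsets 0x00000, 0x20000, 0x40000 and 0x60000 (an assumption for this example) and the first two marked unreadable, the function returns index 2; it only fails when all four copies are bad.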

    If you want my personal opinion, since you are saying that this is a high-reliability product (and therefore, I assume, not too cost-sensitive), NOR is the better option. You may also want to consider eMMC memory.

  • (First, sorry for the delayed answer)

    Thank you for your hints. Let me give a few more details on "high-reliability", as this term is not really precise anyway. We need to be reliable and certainly want to minimize breakdowns of the system, but this is not only tied to the type of flash memory. In the current product we do have some failures in the field caused by other HW/SW problems, but basically we don't want NAND to double the error cases!

    Cost is still a factor, because for thousands or more of products this could make a difference. We also need to increase the capacity (right now we have 64 MB of NOR, but it's likely to be upgraded to 128 MB or more), and we would like it not to be more expensive than it is currently.

    But again, what we want to be sure of is whether NAND would be usable in our case, whether we can get as much reliability as with NOR, and HOW (e.g. by making proper use of ECC, duplication of SPL, etc.). What is the rule of thumb in this situation? It seems NAND IS used in the telecom and/or power industry, but it's hard to get real insights about it.

    Thank you
  • If you are running Linux then NAND ECC handling is incorporated by default. As for SPL, there should be 4 copies of it in NAND per NAND layout requirements (http://processors.wiki.ti.com/index.php/AM335x_U-Boot_User's_Guide#NAND_Layout ).

  • Ok thanks. So just to sum up about the entire boot process :

    1) U-Boot SPL can be "reliably" stored because it is copied 4 times in the first NAND sectors
    2) U-Boot can be reliably loaded because SPL already incorporates ECC handling (is that right??)
    3) Same for loading and booting Linux from U-Boot if U-Boot correctly handles ECC
    4) Like you said, from Linux, ECC is handled correctly and shouldn't be an issue then

    Jérémy
  • ECC handling is done even earlier. The processor ROM code handles ECC when loading SPL from NAND. From there on everything is done with ECC.

  • Ok, thank you for your hints. As a side question, do you know of any examples in the industrial field of NAND being used successfully? In the end that's what we need to get the right confidence. If we're not the only ones and NAND is not only used in consumer products, then I think this wouldn't be a bad move...

    Thanks again
  • I don't have this sort of information, but I have asked and the reply will be posted here.

  • In my company, we produce devices with bare NAND flash, which run 24/7 and have a long life (5-10 years).

    Failure rate is low.

    There are some important points to consider:

    - NAND relies on a large amount of software in the boot loader and in Linux. Be sure that this software runs well and is bulletproof.

    - Use a decent amount of error correction. For bigger NAND chips, you HAVE to use 8-bit ECC, as the available NAND chips demand it.

    - Use a file system which does wear leveling and spreads the wear over the full NAND area. (Use ONE UBI partition.) If you use several partitions, NAND lifetime will suffer because some parts will wear out early.

    - An empty (erased) NAND sector is a special case: the ECC will be FF FF etc. (invalid, special ECC). DO NOT test for (all bytes == 0xFF). Instead, if some bytes are != 0xFF, count the zero bits, and allow for a small number of zero bits. Remember that bit flips can happen in the ECC area of erased sectors too.
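
The erased-sector check from the last point could look roughly like the sketch below. It is a minimal illustration under assumptions, not code from any particular driver: the function name `looks_erased`, the separate data/OOB buffers, and the `max_bitflips` threshold are hypothetical, and the threshold should be chosen to match the ECC strength in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count the zero bits in one byte (each zero bit is a potential bit flip
 * in an otherwise erased, all-0xFF area). */
static unsigned zero_bits(uint8_t b)
{
    unsigned n = 0;
    for (uint8_t inv = (uint8_t)~b; inv; inv = (uint8_t)(inv & (inv - 1)))
        n++;
    return n;
}

/* Add the zero bits of `buf` to *flips; stop early once the limit is passed. */
static bool within_limit(const uint8_t *buf, size_t len,
                         unsigned *flips, unsigned max_bitflips)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xFF) {
            *flips += zero_bits(buf[i]);
            if (*flips > max_bitflips)
                return false;
        }
    }
    return true;
}

/* Treat a sector as erased if data and OOB/ECC bytes together contain at
 * most `max_bitflips` zero bits, instead of requiring every byte == 0xFF. */
bool looks_erased(const uint8_t *data, size_t data_len,
                  const uint8_t *oob, size_t oob_len,
                  unsigned max_bitflips)
{
    unsigned flips = 0;
    assert(data != NULL && oob != NULL);
    return within_limit(data, data_len, &flips, max_bitflips) &&
           within_limit(oob, oob_len, &flips, max_bitflips);
}
```

A sector with one flipped bit in the data area and one in the ECC area is still reported as erased when `max_bitflips` is 2, whereas a strict all-0xFF comparison would wrongly treat it as programmed data with an ECC failure.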

    If you want an easy solution, and your storage requirement is >= 4 GByte, use an eMMC.

    regards

    Wolfgang

  • Hi Wolfgang,

    Thank you for your clarifications, this gives a good feeling about the possible use of NAND in this case.

    About the boot loader, that is indeed our main concern. It seems the issue is addressed here by adding redundancy for SPL, but the main U-Boot itself may also be corrupted, for example if a block goes bad at that location. How do you handle this particular problem? I saw here that there is a possibility to load U-Boot from a UBI volume, which would solve the problem, but I couldn't find out whether it is actually implemented at the moment. Should we use a TPL then?

    For ECC, we will of course use the best ECC possible, as requested by the HW. It is also planned to use 1 UBI volume with the kernel/dtb/application stored in an ubifs filesystem.

    The size would be something between 128 MB - 1 GB so we don't plan to use eMMC, as this would also ruin the cost advantage ;)

    Thanks again


    Jeremy

  • The standard procedure for U-Boot is to reserve more blocks than needed, and to skip bad blocks both when writing U-Boot and when reading it. In normal use, ECC will handle bit flips; blocks with correctable bit flips are not marked as "bad".
    We use MLC NAND chips (4 Gbit) with a declared ECC requirement of 4 bits, and we are using BCH8 (8-bit) as ECC.
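
The "reserve spare blocks and skip bad ones" rule can be sketched as a simple mapping from image block to physical block. This is only an illustration of the idea, not U-Boot's implementation: `nth_good_block` and the in-memory `is_bad` table are hypothetical (a real system would query the bad-block markers or bad-block table of the NAND).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Map the n-th block of an image to a physical block index inside its
 * partition, skipping blocks marked bad. The same mapping has to be used
 * when writing and when reading, so both sides skip the same blocks.
 * Returns -1 if the partition has too few good blocks left. */
int nth_good_block(const bool *is_bad, size_t part_blocks, size_t n)
{
    size_t good_seen = 0;
    assert(is_bad != NULL);
    for (size_t phys = 0; phys < part_blocks; phys++) {
        if (is_bad[phys])
            continue;           /* skip, the image simply shifts by one */
        if (good_seen == n)
            return (int)phys;
        good_seen++;
    }
    return -1;
}
```

For example, with an 8-block partition where blocks 1 and 4 are bad, image block 1 lands in physical block 2 and image block 3 in physical block 5. An image of up to 6 blocks still fits, which is why the partition is sized larger than the image.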