Covering OOB data with GPMC ECC in Micron MT29F4G16ABBDAHC

Section 7.1.3.3.12.3.3.2 of the AM335X TRM (SPRUH73J) describes various schemes that the GPMC can use to map the out-of-band (spare) area of a NAND Flash.  The WinCE BSP for the AM335X uses scheme M7, in which the spare data is not protected by the ECC. As a result, the Flash file system fails to mount if a non-ECC spare bit (such as one used for the logical-to-physical sector mapping) is corrupted.  We have seen this occur after repartitioning the Flash, which entails erasing all blocks: occasionally one such OOB bit is not successfully erased, even though the driver reported a completed and successful erasure.

Scheme M5 would seem to offer an improvement on this situation, as it covers the non-ECC spares with the ECC of sector 0.  My questions are:

  1. Is there a reason the M5 scheme was not used in the BSP?
  2. Since the documentation says that 512 bytes are covered by the ECC, does this mean that some of the bytes in the data sector are no longer covered, or that their protection is in some way lessened by protecting the OOB data?
  3. Is there example code demonstrating the use of the M5 scheme in correcting OOB data errors?

Given the fatality of this condition and the ubiquity of Flash technology I am surprised not to see this discussed elsewhere.  We have SYSBOOT configured to disable the boot ROM after XLDR has come up.
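
One way to catch the failure described above, where an erase is reported as successful but a spare bit still reads 0, is a blank check after the erase. The following is a minimal sketch in C, assuming the driver has already read the spare area of the erased page into a buffer. `count_unerased_bits` is a hypothetical helper, not a function from the WinCE BSP or any TI driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the bits that read 0 in a buffer that
 * should be all 0xFF after a successful erase. */
static unsigned count_unerased_bits(const uint8_t *buf, size_t len)
{
    unsigned zeros = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        uint8_t stuck = (uint8_t)~buf[i];   /* 1 marks a bit that is still 0 */
        while (stuck != 0) {
            stuck &= (uint8_t)(stuck - 1);  /* clear the lowest set bit */
            zeros++;
        }
    }
    return zeros;
}
```

A non-zero count on a freshly erased block would indicate that the erase did not fully take, whatever status the device reported. What to do next (retry the erase, or treat the block as suspect) is a policy choice for the driver.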

  • Moving this to the WinCE forum.

  • Biser-

     

    I’m glad to hear from you as I know from the Forum you have wide experience in such matters.  Have you encountered single-bit erase failures in the past?

     

    Actually, I think this pertains to Linux users as well:  whatever the driver may be, the question is what the AM335X hardware can do to protect OOB data with ECC, and what a driver (whether WinCE or Linux) would need to do to make use of it.

     

    Thanks,

    Bruce

  • Hi Bruce,

    I'm looking into this for you. Still getting caught up after long weekend. I'll likely have some follow-up questions for you once I finish reviewing the relevant documentation.

     

    Regards,

    -Brad

     

  • Understood, thanks for getting back to me.  It's complicated by the fact that ECC bits must be free to take either value (1 or 0), whereas Flash bits can only be programmed from 1 to 0 between erases. The ECC can therefore only be written together with the data it protects, so each independent write would need its own ECC.  The SectorInfo structure in the Microsoft FAL model has the physical-to-logical mapping in dwReserved1 and uses three dirty bits in wReserved2, which I believe are all written independently (I will confirm), so they would need to be protected independently.

    We are looking at other ways of using the 64-52 (for BCH8 ELM) = 12 bytes available in the OOB data area to address this, but my fundamental question is "How does this work at all?", since the BSP obviously wasn't designed with protecting OOB data in mind and wear leveling will eventually erase all blocks.  How can WinCE possibly provide reliable operation with NAND Flash file systems?  Are we the only people still using WinCE 7 with a Subarctic and NAND Flash?  Are we the only ones to have repartitioned a Flash drive in such an architecture?  Are our well-regarded Micron Flash parts somehow defective?  I am at a loss to understand how else this can be.  Any insight on this would be appreciated.

    Thanks,

    Bruce

    P.S.

    We have been advised by Adeneo that WEC13 is not likely to differ from WinCE 7 in this regard.
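
The constraint raised in this post, that ECC can only be written together with the data it protects, follows from how NAND programming works: without an erase, a program operation can only clear bits. The sketch below models that in C. All three function names are hypothetical and exist only for illustration; they are not taken from the FAL or the BSP.

```c
#include <assert.h>
#include <stdint.h>

/* Programming without an erase can only move bits from 1 to 0, so the
 * cell ends up holding (old AND new). */
static uint8_t nand_program_byte(uint8_t old_cell, uint8_t new_data)
{
    return (uint8_t)(old_cell & new_data);
}

/* Models marking one flag (such as a dirty bit) by clearing it in place. */
static uint8_t clear_flag(uint8_t cell, unsigned bit)
{
    assert(bit < 8);
    return nand_program_byte(cell, (uint8_t)~(1u << bit));
}

/* Returns 1 if `wanted` can be stored over `current` without an erase,
 * i.e. no bit has to go from 0 back to 1. */
static int reachable_without_erase(uint8_t current, uint8_t wanted)
{
    return (uint8_t)(current & wanted) == wanted;
}
```

Clearing a flag is always possible, but the ECC that covers the flag generally changes in both directions when the flag changes. If the stored ECC byte is 0xD8 and the updated data calls for 0xDA, `reachable_without_erase(0xD8, 0xDA)` returns 0, so the old ECC cannot simply be overwritten in place.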

  • Brad-

    Another fundamental question: my understanding had been that the OOB ECC bytes themselves are no more immune from corruption than the inband bytes; that the 8-bit BCH ECC algorithm is therefore resilient to bit errors in the OOB ECC data itself (otherwise it would seem rather pointless); and that the 8 correctable bit errors can be anywhere in the 512 inband bytes + 13 ELM ECC bytes.  However, in reviewing my original data it is not clear that this is the case.  Can you confirm or correct my understanding?

    Thanks,

    Bruce

  • I received the following responses from Micron (originally in red, shown here in brackets):

    My understanding had been:

    a)  that the OOB ECC bytes themselves are physically identical to and hence no more immune from corruption than the inband bytes, [Micron: Correct]

    b) that therefore the ECC algorithm is resilient to bit errors in the OOB ECC data itself, and [Micron: The ECC algorithm should be correcting the main array and the parity bytes in spare area equally (but this is not our algorithm).]

    c) that the 4 correctable bit errors can be anywhere in the 512 inband or the corresponding ECC bytes (else who will guard the guards?).  [Micron: Correct]

    However in reviewing my original data it is not clear that this is the case:  can you confirm or correct my understanding?

  • So then, can you confirm that the ECC algorithm as implemented by TI's ELM hardware is resilient to bit errors in the ECC OOB data itself as well as the inband data?

  • Bruce,

    • Is there a reason the M5 scheme was not used in the BSP?

      [BC] I cannot speak for what's implemented in the WinCE BSP. Looking through the Linux driver code, it doesn't use scheme M5 either. For reference, the Linux driver I'm referring to is located in /ti-sdk-am335x-evm-07.00.00.00/board-support/linux-3.12.10-ti2013.12.01/drivers/mtd/nand/omap2.c from our AM335x EZSDK. It looks like the Linux driver uses a hybrid of the schemes listed in the TRM. I attached this file for your reference.

      3833.omap2.zip

    • Since the documentation says that 512 bytes are covered by the ECC, does this mean that some of the bytes in the data sector are no longer covered, or that their protection is in some way lessened by protecting the OOB data?

      [BC] Each ECC calculation covers 512 bytes and produces an ECC code whose size depends on the algorithm used and the depth of protection. So you can ECC-protect a NAND page that's made up of multiple 512-byte sectors; it just takes multiple ECC calculations and results.


    • Is there example code demonstrating the use of the M5 scheme in correcting OOB data errors?
      [BC] There is no example code that I can find.
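
The "512 bytes per calculation" point above can be put into numbers. The sketch below is an illustration only, and rests on two assumptions that are not stated in this thread: that the BCH code works over GF(2^13), and that the parity size is at most m*t bits (the standard upper bound for a binary BCH code correcting t errors). The function names are hypothetical.

```c
#include <assert.h>

#define SECTOR_BYTES 512u   /* data covered by one ECC calculation */

/* Number of separate ECC calculations needed for one page. */
static unsigned sectors_per_page(unsigned page_bytes)
{
    assert(page_bytes % SECTOR_BYTES == 0);
    return page_bytes / SECTOR_BYTES;
}

/* Upper bound on parity bytes for one BCH codeword: m*t bits, rounded
 * up to whole bytes (m = field degree, t = correctable bit errors). */
static unsigned bch_parity_bytes(unsigned m, unsigned t)
{
    return (m * t + 7u) / 8u;
}

/* Total ECC bytes the spare area must hold for a whole page. */
static unsigned page_ecc_bytes(unsigned page_bytes, unsigned m, unsigned t)
{
    return sectors_per_page(page_bytes) * bch_parity_bytes(m, t);
}
```

With m = 13 and t = 8 this gives 13 bytes per 512-byte sector and 52 bytes for a 2048-byte page, which matches the 64-52 = 12 free spare bytes mentioned earlier in the thread.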
  • A page in NAND flash is just an array of cells. Although we logically partition it into 512-byte sectors followed by an OOB spare area, physically it's all the same: one big page of flash with identical cells. There is nothing special about the spare area. With that said, you are correct: the part of the page where you store the ECC codes is no more resilient to bit failures than the part where you store the payload data. It's all the same from a flash standpoint.

    The ECC algorithms are designed such that the ECC parity words (I've been calling them the ECC codes) are part of what's protected by the algorithm. So the ECC check should detect/correct errors in the ECC codes themselves.
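
The idea that the parity is part of what the code protects can be seen in a much smaller code than BCH. The sketch below uses Hamming(7,4), which corrects one bit error anywhere in a 7-bit codeword, as a stand-in for illustration. It is not the BCH8 algorithm implemented by the GPMC/ELM, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of the codeword (0-based) holds Hamming position i+1.
 * Positions 1, 2 and 4 are parity; positions 3, 5, 6 and 7 are data. */
static uint8_t hamming74_syndrome(uint8_t cw)
{
    uint8_t s = 0;
    for (unsigned pos = 1; pos <= 7; pos++)
        if (cw & (1u << (pos - 1)))
            s ^= (uint8_t)pos;
    return s;   /* 0 = clean, otherwise the position of the flipped bit */
}

static uint8_t hamming74_encode(uint8_t nibble)
{
    uint8_t cw = 0;

    assert(nibble < 16);
    if (nibble & 1u) cw |= 1u << 2;   /* position 3 */
    if (nibble & 2u) cw |= 1u << 4;   /* position 5 */
    if (nibble & 4u) cw |= 1u << 5;   /* position 6 */
    if (nibble & 8u) cw |= 1u << 6;   /* position 7 */

    /* Set the parity bits so that the full codeword has syndrome 0. */
    uint8_t s = hamming74_syndrome(cw);
    if (s & 1u) cw |= 1u << 0;        /* position 1 */
    if (s & 2u) cw |= 1u << 1;        /* position 2 */
    if (s & 4u) cw |= 1u << 3;        /* position 4 */
    return cw;
}

/* Corrects a single flipped bit, whether it is a data or a parity bit. */
static uint8_t hamming74_correct(uint8_t cw)
{
    uint8_t s = hamming74_syndrome(cw);
    if (s != 0)
        cw ^= (uint8_t)(1u << (s - 1));
    return cw;
}
```

Flipping a parity bit (position 1 or 4) is repaired exactly like flipping a data bit (position 7): the syndrome names the damaged position without caring what that position stores.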

  • My understanding matches what you received from Micron.

  • "So then, can you confirm that the ECC algorithm as implemented by TI's ELM hardware is resilient to bit errors in the ECC OOB data itself as well as the inband data?"

    I can say with certainty, by definition, yes.

  • Bruce,

    So I'd like to understand better what your goal is. Are you planning to modify the WinCE NAND driver yourself to implement wrapping scheme M5 to cover the unused spare area with ECC protection? Or are you looking to have the BSP provider (Adeneo) do this? I ask because I don't have any WinCE driver experience and after reading the Adeneo reply on OnTime, it looks pretty complicated.

     

    If you're looking to do the work yourself then I can certainly help from the AM335x chip perspective and I can always reach out internally to assist with that.

    -Brad