AM3552 NAND Programming ECC Errors



Hi,

We are programming the boot device (a 2 Gb Macronix MX30LF2G18AC NAND) for the AM3552 in an off-line programmer, which our production process requires. We use the TI software tool "bin2nand" to embed the BCH8 ECC data into the files before loading them into the programmer. Our application uses Linux, U-Boot and UBIFS. In general the process works, but not on every part. We have found that when there is even a single bit error within a page (found by comparing the programming file with a read-back of the part in the programmer) at certain locations read from the rootfs, the system outputs a message such as "UBI warning: ubi_io_read: error -74 (ECC error) while reading 3023 bytes from PEB 22:106696, read only 3023 bytes, retry"... PEB 22 does correlate with the block number where we see the bit error in the programmer read-back. It is my understanding that BCH8 ECC should be able to handle up to 8 bit errors per 512-byte sub-page.
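The comparison between the programming file and the programmer read-back can be scripted so that bit errors are reported per 512-byte sub-page. The following is a minimal sketch, not the tool used in this thread. It assumes both files are raw dumps in which every page is stored as 2048 data bytes followed by 64 OOB bytes, and that a block holds 64 pages; the function name and constants are hypothetical.

```python
PAGE_SIZE = 2048       # assumed main-area bytes per page
OOB_SIZE = 64          # assumed spare (OOB) bytes per page
SECTOR_SIZE = 512      # ECC sub-page size discussed in the post
PAGES_PER_BLOCK = 64   # assumed 128 KiB erase blocks


def bit_errors_per_sector(expected: bytes, readback: bytes) -> dict:
    """Return {(block, page_in_block, sector): flipped_bits} for every
    512-byte data sector that differs between the two raw dumps."""
    raw_page = PAGE_SIZE + OOB_SIZE
    n_pages = min(len(expected), len(readback)) // raw_page
    errors = {}
    for page in range(n_pages):
        base = page * raw_page
        for sector in range(PAGE_SIZE // SECTOR_SIZE):
            start = base + sector * SECTOR_SIZE
            a = expected[start:start + SECTOR_SIZE]
            b = readback[start:start + SECTOR_SIZE]
            if a != b:
                # XOR exposes the differing bits; count them per byte.
                flips = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
                key = (page // PAGES_PER_BLOCK, page % PAGES_PER_BLOCK, sector)
                errors[key] = flips
    return errors
```

Under the same page-size assumption, the offset in "PEB 22:106696" would fall in page 106696 // 2048 = 52 of that erase block. Mapping a PEB number to a physical block also depends on where the UBI partition starts, which this sketch does not model.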


In development, we programmed the NAND (i.e., files copied from an SD card) using the software tools in Linux and never ran into an issue. In that case, the BCH8 ECC data is generated within the GPMC.

Perhaps the ECC calculation and the data being added by the bin2nand tool are not completely correct. It works fine when there are no errors in the page, but not when there are one or more bit errors. The output does have the correct format: 64 bytes (see shaded region in attached example), matching figure 26-15 in the TRM (spruh73m.pdf), with the exception that bytes 15, 29, 43, and 57 are "FF" in our files and "00" in the TRM.
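The byte positions 15, 29, 43 and 57 are consistent with a 64-byte OOB layout of two leading marker bytes followed by four 14-byte groups, one per 512-byte sub-page, where each group is 13 BCH8 ECC bytes plus one trailing byte. The sketch below splits an OOB area under that assumed layout so the trailing bytes can be inspected. The layout is inferred from the offsets quoted above rather than copied from the TRM figure, and all names are hypothetical.

```python
OOB_SIZE = 64
MARKER_BYTES = 2        # assumed bad-block marker bytes at the start of the OOB
ECC_BYTES = 13          # BCH8 over 512 bytes: 8 errors x 13 bits = 104 bits
GROUP = ECC_BYTES + 1   # 13 ECC bytes plus one trailing byte per sub-page
SECTORS = 4             # 2048-byte page / 512-byte sub-pages


def trailing_byte_offsets() -> list:
    """OOB offsets of the 14th byte of each per-sector group."""
    return [MARKER_BYTES + s * GROUP + ECC_BYTES for s in range(SECTORS)]


def split_oob(oob: bytes) -> list:
    """Return one (ecc_bytes, trailing_byte_value) tuple per sub-page."""
    if len(oob) != OOB_SIZE:
        raise ValueError("expected a 64-byte OOB area")
    groups = []
    for s in range(SECTORS):
        start = MARKER_BYTES + s * GROUP
        groups.append((oob[start:start + ECC_BYTES], oob[start + ECC_BYTES]))
    return groups
```

With these assumptions `trailing_byte_offsets()` yields 15, 29, 43 and 57, the same positions where the bin2nand output and the TRM figure disagree.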


I know the explanation is brief, but I am really looking for a clue on where to look, since this subject matter is fairly complex. Has anyone successfully used bin2nand to generate the BCH8 ECC data for the AM3552? Was any post-processing of the output file needed? If this method was not used, what method was used to add ECC to the data files for the programmer?

Also, I understand there are issues and complexities with NAND ECC handling within UBIFS. Are there settings or code we need to include for the ECC to be handled properly?

Thanks in advance for any guidance and assistance you can provide.


Sincerely,

Larry Bernstein

  • Hi,

    I will notify the factory team about this.
  • Biser,  

    One additional remark. The text below is from processors.wiki.ti.com/.../Raw_NAND_ECC

    I am wondering what is meant by the highlighted text below.

    "The AM335x and AM437x devices support 4b, 8b, and 16b detection and error location. The software only needs to flip the error bits at the locations provided by the ELM."

    Note: We are using a 4-bit ECC NAND part.

     

    Thank you for connecting us with the TI experts in this area.

    Larry

  • Biser, we believe we have solved the NAND ECC error problem (see information below), but we are waiting for confirmation from TI that they agree this is the proper solution.

    Brief explanation:

    We have discovered through experimentation that bytes 15, 29, 43 and 57 in the OOB region need to be "00" for the ECC to work properly. If they are not, then whenever there is a bit error in the NAND on the corresponding page being read, Linux reports an ECC error message: "UBI warning: ubi_io_read: error -74 (ECC error)". The bin2nand program sets these four OOB bytes to "FF" when it adds the BCH8 ECC data to the programming file. We wrote a script to post-process the bin2nand output files, writing "00" to bytes 15, 29, 43 and 57 of every OOB region whose page has data written (unwritten pages are not touched). Based on several test cases, we strongly believe that this is a robust solution.
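    The post-processing step described above could look roughly like the following. This is an illustrative sketch, not the script used in this thread. It assumes the bin2nand output stores each page as 2048 data bytes followed by 64 OOB bytes, and it treats a page as unwritten when its data and OOB are entirely 0xFF. The function name is hypothetical.

```python
PAGE_SIZE = 2048                 # assumed main-area bytes per page
OOB_SIZE = 64                    # assumed OOB bytes stored after each page
PAD_OFFSETS = (15, 29, 43, 57)   # OOB bytes reported as needing 0x00


def zero_pad_bytes(image: bytes) -> bytes:
    """Return a copy of a raw page+OOB image with the four OOB bytes
    cleared to 0x00 on every page that is not completely blank."""
    raw_page = PAGE_SIZE + OOB_SIZE
    if len(image) % raw_page:
        raise ValueError("image is not a whole number of page+OOB units")
    out = bytearray(image)
    blank = b"\xff" * raw_page
    for base in range(0, len(out), raw_page):
        if out[base:base + raw_page] == blank:
            continue  # unwritten page: leave it in the erased state
        for off in PAD_OFFSETS:
            out[base + PAGE_SIZE + off] = 0x00
    return bytes(out)
```

    Skipping blank pages matters because writing 0x00 into the OOB of an otherwise erased page would program that page.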

    Getting into more details.

    We seem to have found that, when using the AM335x's GPMC and ELM with the BCH-8 algorithm to calculate ECC under Linux, the 14th byte of each ECC sector has a significant impact on whether our system encounters an uncorrectable ECC error from a small number of incorrect bits.

    Our tests seem to show that Linux uses the 14th byte of each ECC sector as an indicator of some kind rather than ignoring it. If there are 5 or more 1 bits in the 14th byte, then a single bit error in the sector becomes an issue for us and is presented as an uncorrectable ECC error. If there are 4 or fewer 1 bits in the 14th byte of an ECC sector, then single bit errors are corrected without issue, as expected. If we set the 14th byte in each ECC sector to 00h when the sector has data, as Linux and U-Boot both do (but bin2nand does not), this effectively gives us 5 bits of error tolerance if the errors are in the 14th byte, and we believe 8 bits of error correction if the errors are anywhere else. This is a much better situation than we had before, where 1 bit of error caused severe issues, but it is not as good as we would like.
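    The observed rule can be written down as a small check that flags which sub-pages of a page would be at risk. The sketch below only models the behaviour reported in these tests (5 or more 1 bits in the 14th byte). It is not taken from the Linux driver source, the threshold is the empirical value quoted above, and the function name is hypothetical.

```python
PAD_OFFSETS = (15, 29, 43, 57)   # 14th byte of each sub-page's OOB group


def at_risk_sectors(oob: bytes, threshold: int = 5) -> list:
    """Return indices of sub-pages whose 14th byte holds `threshold` or
    more 1 bits, i.e. those where a single data bit error was reported
    as uncorrectable in the tests described above."""
    return [
        sector
        for sector, off in enumerate(PAD_OFFSETS)
        if bin(oob[off]).count("1") >= threshold
    ]
```

    With the bin2nand default of FFh every sub-page is flagged. With 00h none is, and it would take 5 bit flips within that one byte before a sub-page is flagged again.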

    I have spent a few days going through the ELM and NAND MTD code for am335x in Linux but I have not yet found a smoking gun which explains what we are seeing. This is my first time working with the ELM and NAND on AM335x, so any advice or pointers will be greatly appreciated.

    Our system is currently using Linux 3.14 from the Yocto Project's beaglebone branch with slight modifications, but I have also been working on backporting changes which impact the operation of the ECC algorithm from future versions of Linux into our kernel source. Would it be worth having me send a few key files related to the ECC to you or other engineers at TI for review? Or would it be better to provide a full source tree (via a non-email method)? I have not yet done a comparison against one of the official TI PROCESSOR-SDK-LINUX-AM335X releases to see what differences exist, but we are hesitant to move away from Linux 3.14 at this point in the project as there is perceived risk in doing so, hence backporting changes has been our main focus.


    Larry

  • Thanks for updating the thread. This is now being tracked offline by your TI FAE together with the software team. You can close this thread if you think the issue is solved.