We have a DM368 IPNC-based camera design using Appro’s 3.012 release, which is based on MontaVista Pro 5 / Linux 2.6.18.
- A production lot built with Samsung Nand K9F1G08U0D-SCB0 seems to be ok
- A production lot built with Micron Nand mt29f2g08abaeawp-it worked fine initially, but after a number of months there seems to be a high incidence of errors that prevent boot-up
- On the console there is usually an error message from SquashFS about a specific block. Each affected board reports a different block, but for a given board the error is consistent.
- If we use uBoot commands to read the data from NAND into RAM and then write it back to NAND, the error disappears.
- This happens with boards that were powered up and working continuously for months, as well as with boards that were never powered up after successful initial programming and test. So the issue is not caused by spurious writes corrupting the NAND; in any case, a spurious write would alter the data in a way that simply reading and rewriting it under uBoot could not fix.
- This happens both with boards that had bad blocks in the factory bad block table and with boards that had none.
Based on the above, my guess is that ECC correction is applied when uBoot reads the NAND but not when the kernel loads the SquashFS image. Has anyone encountered this before?
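To make the hypothesis concrete, here is a toy model of why reading with correction and writing back would make the error disappear. It is a minimal sketch and not the DM368's actual ECC: it assumes a simple position-syndrome code that corrects one flipped bit per page, and the names `ToyPage`, `compute_ecc`, `read_raw` and `read_ecc` are hypothetical. The correcting read returns good data, but only rewriting that corrected data repairs the bits stored in the page, while an uncorrected read keeps seeing the flipped bit.

```python
PAGE_BYTES = 16  # toy page size (hypothetical); real large-page NAND uses 2048-byte pages


def compute_ecc(data: bytes) -> int:
    """Toy single-error-correcting code: XOR of (bit index + 1) over every set bit."""
    ecc = 0
    for i in range(len(data) * 8):
        if (data[i // 8] >> (i % 8)) & 1:
            ecc ^= i + 1
    return ecc


class ToyPage:
    """A page of data plus its ECC, as if stored in the spare area."""

    def __init__(self) -> None:
        self.data = bytearray(PAGE_BYTES)
        self.ecc = 0

    def program(self, src: bytes) -> None:
        # Real NAND needs the block erased before reprogramming; the toy skips that.
        self.data = bytearray(src)
        self.ecc = compute_ecc(src)

    def read_raw(self) -> bytes:
        # No correction: models a read path where ECC is not applied.
        return bytes(self.data)

    def read_ecc(self) -> tuple[bytes, bool]:
        # Corrects at most one flipped data bit, in the returned copy only;
        # the stored page stays degraded until it is rewritten.
        buf = bytearray(self.data)
        syndrome = self.ecc ^ compute_ecc(buf)
        if syndrome == 0:
            return bytes(buf), False
        bit = syndrome - 1
        if bit >= len(buf) * 8:
            raise ValueError("uncorrectable in this toy model")
        buf[bit // 8] ^= 1 << (bit % 8)
        return bytes(buf), True


golden = bytes((i * 37 + 5) & 0xFF for i in range(PAGE_BYTES))
page = ToyPage()
page.program(golden)
page.data[3] ^= 0x10  # simulate one bit lost while the board sat unpowered

raw_before = page.read_raw()             # uncorrected path sees the bad bit
corrected, was_corrected = page.read_ecc()
page.program(corrected)                  # "refresh": write the corrected data back
raw_after = page.read_raw()

print(raw_before == golden, was_corrected, corrected == golden, raw_after == golden)
```

Under these assumptions it prints `False True True True`: only the uncorrected read before the refresh returns bad data, which matches the symptom of uBoot's read-and-rewrite "fixing" a board.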
I understand that file systems other than SquashFS have more redundancy. My issue is not with SquashFS, but with making sure that the errors are corrected long before the file system ever sees them.
I was under the impression that the listed Samsung and Micron NANDs are both OK. Could there be a subtle difference in timing or operation between them?
I am aware of the UBL update that fixes an ECC issue, and our UBL already includes it. Evidently the problem is not in UBL or uBoot but in the kernel.
I am also aware that there are newer versions of the Appro release with a newer kernel; however, moving to one would mean a lot of changes for us, which might introduce new problems.
Thanks for any help or insights.