Need TI's help with DM368 IPNC NAND Flash issue

tek4

We have a DM368 IPNC-based camera design running Appro's 3.012 release, which is based on MontaVista Pro 5 / Linux 2.6.18.

A production lot built with Samsung Nand K9F1G08U0D-SCB0 seems to be ok

A production lot built with Micron NAND MT29F2G08ABAEAWP-IT worked fine initially, but after a number of months there seems to be a high incidence of errors preventing bootup.
If we check the console, usually there is an error message from SquashFS about a specific block. Each board with this issue reports a different block, but for a given board the error is consistent.
If we use uBoot commands to read the data from NAND to RAM and then write it back to NAND, the error disappears
This happens with boards that were powered up and working continuously for months, as well as boards that were never powered up after successful initial programming and test. So the issue is not due to spurious writes corrupting the NAND. Even if there were spurious writes, they would alter the NAND data in a way that simply reading and re-writing it under uBoot could not fix.
This happens with boards that had bad blocks in the original bad block table, and boards that had no bad blocks in the factory bad block table.

Based on the above facts, I am guessing that ECC error correction is happening when uBoot reads the NAND, but not when the Kernel loads the SquashFS. Has anyone encountered this before?
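To make the reasoning behind that guess concrete, here is a minimal toy sketch in Python. It is not the DM368 ECC (the hardware uses a much stronger 4-bit code over 512-byte sectors); it uses a Hamming(7,4) code purely for illustration, and all function names are made up for this example. It shows how a single flipped bit looks corrupt to a reader that ignores ECC, looks fine to a reader that applies ECC, and disappears once the corrected data is written back.

```python
# Toy model only: Hamming(7,4) corrects one flipped bit per 7-bit codeword.
# The real NAND controller ECC is different; this only illustrates the
# read / correct / rewrite reasoning. All names here are hypothetical.

def encode(nibble):
    """Return a 7-bit codeword (list of bits, positions 1..7) for 4 data bits."""
    cw = [0] * 8                       # index 0 unused so indices match positions
    cw[3], cw[5], cw[6], cw[7] = [(nibble >> i) & 1 for i in range(4)]
    cw[1] = cw[3] ^ cw[5] ^ cw[7]      # parity covering positions 1, 3, 5, 7
    cw[2] = cw[3] ^ cw[6] ^ cw[7]      # parity covering positions 2, 3, 6, 7
    cw[4] = cw[5] ^ cw[6] ^ cw[7]      # parity covering positions 4, 5, 6, 7
    return cw[1:]

def syndrome(cw):
    """Return 0 for a clean codeword, else the 1-based position of the flipped bit."""
    b = [0] + list(cw)
    s1 = b[1] ^ b[3] ^ b[5] ^ b[7]
    s2 = b[2] ^ b[3] ^ b[6] ^ b[7]
    s4 = b[4] ^ b[5] ^ b[6] ^ b[7]
    return s1 | (s2 << 1) | (s4 << 2)

def raw_read(cw):
    """Read the data bits without consulting the parity bits (no ECC)."""
    b = [0] + list(cw)
    return b[3] | (b[5] << 1) | (b[6] << 2) | (b[7] << 3)

def ecc_read(cw):
    """Read the data bits after correcting a single-bit error, if any."""
    pos = syndrome(cw)
    fixed = list(cw)
    if pos:
        fixed[pos - 1] ^= 1            # flip the faulty bit back
    return raw_read(fixed), pos

if __name__ == "__main__":
    cell = encode(0b1011)              # data as originally programmed
    cell[4] ^= 1                       # simulate one bit flipping over time (position 5)
    print("read without ECC:", bin(raw_read(cell)))
    data, pos = ecc_read(cell)
    print("read with ECC   :", bin(data), "corrected position", pos)
    cell = encode(data)                # write the corrected data back
    print("after rewrite   :", bin(raw_read(cell)), "syndrome", syndrome(cell))
```

If the kernel path behaves like `raw_read` while uBoot behaves like `ecc_read`, that would match all the observations listed above, but this is only a model of the hypothesis, not evidence for it.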

I understand that file systems other than SquashFS have more redundancy. My issue is not with SquashFS, but with making sure that the errors are corrected long before the file system sees them

I was under the impression that the listed Samsung and Micron NANDs are both OK. Is there possibly a subtle difference in timing or operation between them?

I am aware of the update to UBL to fix an ECC issue, and our UBL has that already. Evidently the problem is not in UBL or uBoot but in the Kernel.

I am also aware that there are newer versions of the Appro release with a newer Kernel. However, moving to one would mean a lot of changes for us, which might introduce new problems.

Thanks for any help or insights.

over 9 years ago

0 Cvetolin Shulev-XID over 9 years ago

TI__Guru 65405 points

Hello Tek4,

I suggest you compare the Micron NAND timing configuration in u-boot with the configuration in the Kernel. For details, visit the link below:
processors.wiki.ti.com/.../Linux_Core_NAND_User's_Guide

BR
Tsvetolin Shulev

0 tek4 over 9 years ago in reply to Cvetolin Shulev-XID

Intellectual 700 points

Hi Cvetolin, thanks so much for your reply and the helpful link

I was under the impression that the timing for the EMIF is actually set up in UBL device initialization and then left unchanged in uBoot and the Kernel, but I will definitely double check this. I saw that the uBoot code had some timing overrides, but it looked like those were bracketed with ifdefs that refer only to DM365 and didn't seem to apply to DM368. Still worth double and triple checking.
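For that double check, one option is to read the EMIF asynchronous configuration register for the NAND chip select once at the uBoot prompt and once from the running Kernel, then compare the timing fields. Below is a small Python sketch for decoding and diffing two such values. The bit layout is my assumption, taken from the usual DaVinci AEMIF register description, and should be verified against the DM36x AEMIF guide. The two register values are invented for illustration and are not from our boards.

```python
# Assumed AEMIF async config register layout: (field name, shift, width).
# Verify against the device documentation before relying on it.
FIELDS = [
    ("SS",       31, 1),
    ("EW",       30, 1),
    ("W_SETUP",  26, 4),
    ("W_STROBE", 20, 6),
    ("W_HOLD",   17, 3),
    ("R_SETUP",  13, 4),
    ("R_STROBE",  7, 6),
    ("R_HOLD",    4, 3),
    ("TA",        2, 2),
    ("ASIZE",     0, 2),
]

def decode(reg):
    """Split a 32-bit register value into its raw field values."""
    return {name: (reg >> shift) & ((1 << width) - 1)
            for name, shift, width in FIELDS}

def diff(reg_a, reg_b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    a, b = decode(reg_a), decode(reg_b)
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}

if __name__ == "__main__":
    uboot_value = 0x04522314    # hypothetical value read at the uBoot prompt
    kernel_value = 0x04522214   # hypothetical value read under Linux
    for name, (u, k) in diff(uboot_value, kernel_value).items():
        print(f"{name}: uBoot={u} Kernel={k}")
```

An empty diff would support the idea that the timing is set once in UBL and never touched again; any differing field would point at code in uBoot or the Kernel that reprograms the EMIF.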

I am still confused about how the timing could make a difference to the ECC error correction, though. There is a wait loop in the code that waits as long as necessary for the ECC calculation to complete, and just to be sure I added a print statement to let me know if a timeout was reached (it wasn't).

0 Cvetolin Shulev-XID over 9 years ago in reply to tek4

TI__Guru 65405 points

Tek4,

I have another hypothesis: the NAND read errors with ECC calculation could be explained by DM368 errata Advisory 1.2.4, "Bootloader: RBL Code 4bit ECC Mode Limitation", described at:
www.ti.com/.../sprz316b.pdf
My guess is that u-boot has the fix described in the linked errata but the kernel does not. This cannot explain why the Micron NAND worked fine initially and failed later, but the assumption should still be checked.

BR
Tsvetolin Shulev
