NAND ECC problems with the AM35x

Other Parts Discussed in Thread: AM3517

We're observing problems with NAND ECC handling on an AM3517evm board.  We're using TI's Linux 2.6.37 distribution.

Our configuration is set for OMAP_ECC_HAMMING_CODE_HW, which should detect and correct any single-bit error within a 512-byte subpage.  But it's clear that the generated ECC values are incorrect and cannot detect single-bit errors.  We can write two different pages, differing by only one bit, and see identical ECC values generated for the two subpages.
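To make the expected behavior concrete, here is a minimal software sketch of a Hamming-style parity code over one 512-byte subpage. It is only an illustration of the property we rely on (any single flipped data bit must change the ECC, and the change identifies the bit). The function names and the 24-bit layout are assumptions made for this sketch; it does not reproduce the bit ordering used by the GPMC hardware or by the kernel's software ECC.

```python
SUBPAGE_BYTES = 512
ADDR_BITS = 12                      # 512 * 8 = 4096 bit positions
ADDR_MASK = (1 << ADDR_BITS) - 1


def hamming_ecc(data):
    """Return a 24-bit parity word for one 512-byte subpage.

    For each of the 12 address bits we keep two parities: one over the
    set data bits whose position has that address bit = 1 (p_odd) and
    one over those whose position has it = 0 (p_even).
    Illustrative layout only, not the hardware's.
    """
    if len(data) != SUBPAGE_BYTES:
        raise ValueError("expected exactly 512 bytes")
    p_odd = 0
    p_even = 0
    for byte_index, byte in enumerate(data):
        for bit in range(8):
            if (byte >> bit) & 1:
                pos = byte_index * 8 + bit
                p_odd ^= pos
                p_even ^= ~pos & ADDR_MASK
    return (p_odd << ADDR_BITS) | p_even


def locate_single_bit_error(ecc_stored, ecc_computed):
    """Return the flipped data bit position, or None if the ECCs match.

    Raises ValueError for any other difference (a flipped ECC bit or a
    multi-bit error); this sketch does not tell those cases apart.
    """
    diff = ecc_stored ^ ecc_computed
    if diff == 0:
        return None
    p_odd = diff >> ADDR_BITS
    p_even = diff & ADDR_MASK
    # A single data-bit flip toggles exactly one parity of each pair.
    if p_odd ^ p_even == ADDR_MASK:
        return p_odd
    raise ValueError("not a single data-bit error")


page_a = bytes(SUBPAGE_BYTES)
page_b = bytearray(page_a)
page_b[100] ^= 0x08                 # flip bit 3 of byte 100
ecc_a = hamming_ecc(page_a)
ecc_b = hamming_ecc(page_b)
print(ecc_a != ecc_b, locate_single_bit_error(ecc_a, ecc_b))
```

With a scheme like this, two subpages that differ in exactly one bit can never produce the same ECC, which is why the identical values we read back point to a problem in how the ECC is generated or retrieved rather than to a limitation of the code itself.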

One possible culprit is incorrect configuration.  We've checked the configuration carefully.  But we've run across areas where NAND configuration parameters (not necessarily ECC related) are hard-coded in the kernel source rather than exposed through the normal kernel configuration menu, as they should be.  So it's possible that we missed something.

The more likely culprit is buggy kernel code.  The actual ECC calculation is done in hardware, but it's the kernel's handling of the ECC hardware setup and ECC retrieval that raises our suspicions.  Our review of the kernel code turned up faulty code in places, some of which we've seen fixed in later kernels, and some of which hasn't been.  Postings on TI forums by others with NAND problems further raise the likelihood that issues exist.  And, in later kernel releases, much of the NAND code has changed, suggesting it's been a problem area needing rework along the way.  One specific area of concern is TI's gpmc.c, which handles the ECC setup and which has been reworked since our release.

Because Linux 2.6.37 will soon be two years old, we think it makes the most sense to get up to date before digging in further.  What we're not sure about is the best way to do that.  Moving to a later Linux release is out of the question, since TI supports only 2.6.37 for the AM3517.  We've already applied the NAND backports available from the MTD and UBIFS maintainers at www.linux-mtd.infradead.org, and we are current there.  But those backports don't include TI-specific modules.  And they don't deal with the hardware level, which is TI-specific, so they're not the likely cause anyway.

So, can anyone direct us to the best way to get current on TI's kernel NAND modules in 2.6.37?  The TI AM35x trees in arago and rowboat appear all but dead, but perhaps we're looking in the wrong place.

Meanwhile, if anyone has any specific insights about the faulty handling of OMAP_ECC_HAMMING_CODE_HW ECC generation, we'd be interested in hearing about that.

Thanks,
Ron