Several of us have had trouble getting NAND devices to work properly in Linux 2.6.37. I for one have spent numerous hours chasing NAND problems on our AM35x board, and postings I've read over the past months show that others have too.
The cause of many of these troubles is bugs in the NAND drivers. Some of these bugs are specific to TI drivers (e.g. omap2.c, gpmc.c), and some are in the common drivers used for both TI and non-TI platforms (e.g. nand_base.c).
The issue is, what can we do to raise both the quality of the 2.6.37 NAND drivers and the level of support provided by TI? I know for sure that TI already has fixes for some of these problems, as in the example I give below, where TI committed a fix over a year ago, yet one year later still had not included the fix in their 2.6.37 support release. And postings on this forum looking for help often end up in a black hole. In my particular example I've spent many, many hours chasing the problem, at great expense in both time and dollars. A small amount of proactive support by TI would have prevented this.
Here's my example. There's a bug in the TI gpmc.c driver, where the 1-bit Hamming HW ECC is set up incorrectly, such that ECC correction just plain doesn't work. The field defining the number of bytes over which the ECC is computed is programmed into a GPMC register, but in getting the field into the correct register position the code erroneously shifts it left by 22 bits twice. The result is that the ECC is computed over 2 bytes instead of 512 bytes, meaning almost all NAND errors are ignored, being neither detected nor corrected. With NAND reliability being what it is, that's catastrophic, since it occurs silently, allowing bad data to slip by to upper layers such as UBIFS or Android. Those upper layers count on the lower MTD drivers to transparently correct problems, and if they can't, to at least report an error upwards. But with the existing TI driver, bad data gets through, being neither corrected nor reported, and a file system such as UBIFS eventually, and possibly quickly, self-destructs.
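To make the double shift concrete, here is a minimal sketch in plain C. It is not the driver code: the function and macro names are hypothetical, and it assumes the size field is 8 bits wide at bit 22 of a 32-bit register and is encoded as (bytes / 2) - 1, which is consistent with the 2-byte versus 512-byte behaviour described above.

```c
#include <stdint.h>

/* Assumed layout, for illustration only: an 8-bit size field at bits 29:22,
 * encoded as (bytes / 2) - 1. Names here are hypothetical, not the driver's. */
#define ECC_SIZE_SHIFT 22
#define ECC_SIZE_MASK  0xFFu

/* Intended behaviour: encode the byte count and shift it into place once. */
static uint32_t ecc_size_field_correct(uint32_t bytes)
{
    return ((bytes >> 1) - 1) << ECC_SIZE_SHIFT;
}

/* Buggy behaviour: an already-positioned field is shifted a second time. */
static uint32_t ecc_size_field_double_shift(uint32_t bytes)
{
    uint32_t positioned = ecc_size_field_correct(bytes);
    return positioned << ECC_SIZE_SHIFT;  /* bits fall off the top of 32 bits */
}

/* Decode the field back to the number of bytes the ECC would cover. */
static uint32_t ecc_bytes_covered(uint32_t reg)
{
    return (((reg >> ECC_SIZE_SHIFT) & ECC_SIZE_MASK) + 1) * 2;
}
```

Under these assumptions, a 512-byte request shifted once gives 0x3FC00000 and decodes back to 512 bytes, while the same request shifted twice gives 0, which decodes to 2 bytes. No error is raised anywhere along the way, which is why the problem stays silent.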
A blatant error such as this ECC error, once discovered, should have been openly disclosed to TI's customers. For this bug, TI committed a fix over a year ago (2011-07-29 commit 563e83aff9e53d77733f4e17dd129f94c0945c91), yet it wasn't included in TI's AM35x support release of last month (am3517-evm-sdk-src-05.05.00.00). Yes - I know that we customers can (and often do) each rummage through the many commits in repositories at Arago and elsewhere, and hope we run across something. But that's no way to expect customers to get up to date. Even if we stumble onto a promising patch that way, finding the cohesive set of patches that must accompany it is not an exercise customers should have to tackle individually.
So . . . this is a plea for TI to raise the bar on NAND support. At a minimum, please document known issues. Better yet, fix the issues, either in the form of backports to your original release, or by moving your platforms forward to a more current Linux release.
Regards,
Ron