I'm verifying the ECC operation in a client's product that is experiencing unreliable NAND flash.
I'm testing the product by writing corrupted data and OOB data into flash and reading it back. I use the MTD utilities from an Angstrom distribution running Linux 2.6.32.
The system correctly fixes 1 to 8 random bit errors in the data. However, when I corrupt a single bit of the BCH8 codes stored in the OOB, the correction fails and none of the data corrections are applied.
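For reference, the kind of corruption described above can be sketched in a few lines of Python. This only illustrates the bit-flipping step and is not the actual test tooling: the helper name flip_bits, the 512-byte sector size, and the LSB-first bit numbering are assumptions, and writing the result to flash is left out.

```python
import random


def flip_bits(buf, bit_positions):
    """Return a copy of buf with the given bit positions inverted.

    Position p addresses byte p // 8, bit p % 8 (bit 0 = LSB).
    This numbering is an assumption made for illustration.
    """
    out = bytearray(buf)
    for p in bit_positions:
        out[p // 8] ^= 1 << (p % 8)
    return bytes(out)


SECTOR_SIZE = 512  # assumed sector size covered by one BCH8 code

# A clean sector of all-ones data.
clean = bytes([0xFF]) * SECTOR_SIZE

# Pick 8 distinct random bit positions (seeded so the run is repeatable).
rng = random.Random(0)
positions = rng.sample(range(SECTOR_SIZE * 8), 8)

# The corrupted sector differs from the clean one in exactly those 8 bits.
corrupted = flip_bits(clean, positions)
```

The same helper can be applied to a copy of the OOB bytes to flip a single bit of the stored code instead of the data.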
The problem is in the algorithm starting at the decode_bch() function call. This source file (omap_bch_decoder.c) is identical to the source file in Arago.
Here is a test case that uses all 0xFFFF for data and has the following BCH8 codes. The read code has a single-bit error (the first byte should be 0x10).
Calculated ECC
0x6c, 0xa0, 0xd3, 0x66, 0x5f, 0x79, 0x17, 0xb5, 0x31, 0xd4, 0x7e, 0x32, 0xe6
Read ECC
0x11, 0xae, 0xd1, 0xf6, 0x12, 0x6c, 0x65, 0x3d, 0x68, 0x86, 0x1a, 0xdb, 0x4a
BCH decoding failed detect=1 correct=0
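The single-bit claim in this test case can be checked by XORing the read ECC against the intended one. A minimal Python sketch, assuming the intended ECC equals the read ECC with its first byte set to 0x10 as stated above; the names read_ecc, intended_ecc, and bit_diff are illustrative and not part of any MTD tool:

```python
# Read ECC bytes, copied from the test case above.
read_ecc = bytes([0x11, 0xae, 0xd1, 0xf6, 0x12, 0x6c, 0x65,
                  0x3d, 0x68, 0x86, 0x1a, 0xdb, 0x4a])

# Assumption: the intended ECC is the read ECC with byte 0 set to 0x10.
intended_ecc = bytes([0x10]) + read_ecc[1:]


def bit_diff(a, b):
    """Return (byte index, XOR mask) for every byte where a and b differ."""
    return [(i, x ^ y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = bit_diff(read_ecc, intended_ecc)
total_bits = sum(bin(mask).count("1") for _, mask in diffs)

print(diffs, total_bits)  # prints: [(0, 1)] 1
```

Only bit 0 of byte 0 differs, so the corruption is well within the 8-bit correction capability expected of BCH8.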
Can someone please explain why this is happening? It appears to be a shortcoming in the BCH decoding algorithm. I would expect the algorithm to correct bit errors in its own BCH8 codes stored in the OOB of the flash.