I frequently get ECC uncorrectable bitflip errors when running nanddump on one particular partition, /dev/mtd6. I also consistently get ECC corrected bitflips on at least 2 other partitions (out of 17 partitions in total). Micron believes the uncorrectable bitflip errors are due to a TI fix that is not in the SDK being used. The consistent number of corrected bitflips on several partitions also seems high (I'd expect none for new NAND).
I'm using the TI AM335x EVM Starter Kit ti-sdk-am335x-evm-06.00.00.00-Linux-x86-Install on a custom board with Micron MT29F4G08ABADAH4 NAND, AM3352 ARMv7 Processor rev 2 (v7l) CPU. Default ECC BCH8 is used.
/dev/mtd6 is flashed as follows:
mmc rescan
mw.b 0x81000000 0xFF 0x1E0000
fatload mmc 0 0x81000000 u-boot.img
iminfo 0x81000000
nand erase 0x60000 0x1E0000
nand write 0x81000000 0x600000 0x1E0000
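As a sanity check on the two offsets in the commands above, here is a minimal sketch that tests whether a write range falls entirely inside the range erased just before it. The helper name `erase_covers_write` is made up for illustration and is not part of U-Boot.

```python
def erase_covers_write(erase_off, erase_len, write_off, write_len):
    """Return True if [write_off, write_off + write_len) lies entirely
    inside the erased range [erase_off, erase_off + erase_len)."""
    erase_end = erase_off + erase_len
    write_end = write_off + write_len
    return erase_off <= write_off and write_end <= erase_end


# Arguments copied from the `nand erase` and `nand write` lines as typed.
print(erase_covers_write(0x60000, 0x1E0000, 0x600000, 0x1E0000))
```

Feeding it the erase and write arguments exactly as typed is a quick way to catch a mistyped offset before suspecting the NAND or the driver.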
nanddump (version 1.5.0) is run as follows (an SD card is mounted at /media/mmcblk0p2):
nanddump --bb=skipbad -o -f /media/mmcblk0p2/nanddump_mtd06 /dev/mtd6
ECC uncorrectable bitflips occur about 80% of the time on /dev/mtd6, sometimes in the hundreds:
root:~# nanddump --bb=skipbad -o -f /media/mmcblk0p2/nanddump_mtd06 /dev/mtd6
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x001e0000...
ECC: 3 uncorrectable bitflip(s) at offset 0x00000000
ECC: 3 uncorrectable bitflip(s) at offset 0x00000800
ECC: 3 uncorrectable bitflip(s) at offset 0x00001000
ECC: 4 uncorrectable bitflip(s) at offset 0x00001800
...
ECC: 2 uncorrectable bitflip(s) at offset 0x00176800
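To see whether the failures cluster in particular erase blocks, the messages above can be tallied per block. This is only a sketch: it assumes the message format is exactly as shown in the log and uses the 131072-byte block size that nanddump reports. The function name `tally_by_block` is my own.

```python
import re
from collections import Counter

# Matches lines such as:
#   ECC: 3 uncorrectable bitflip(s) at offset 0x00000800
ECC_LINE = re.compile(
    r"ECC: (\d+) uncorrectable bitflip\(s\) at offset (0x[0-9a-fA-F]+)"
)


def tally_by_block(log_text, block_size=131072):
    """Return {block_index: (pages_reported, total_bitflips)}."""
    pages = Counter()
    flips = Counter()
    for match in ECC_LINE.finditer(log_text):
        count = int(match.group(1))
        block = int(match.group(2), 16) // block_size
        pages[block] += 1
        flips[block] += count
    return {block: (pages[block], flips[block]) for block in sorted(pages)}
```

Running it over a captured log (saved to a file of your choosing) gives one entry per affected erase block, which makes it easier to compare runs.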
The file output by nanddump for /dev/mtd6 differs from the dumps of other partitions of the same size flashed with the same image.
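To quantify how the dumps differ, a host-side comparison along these lines can count the differing bits per page. It is a rough sketch: the function name is hypothetical, and the `stride` must match the dump layout (2048 if the dump holds page data only, 2048 + 64 if OOB bytes are included in the file), which I have not verified for this nanddump invocation.

```python
def differing_bits_per_page(dump_a, dump_b, stride=2048):
    """Return {page_index: differing_bit_count} for pages that differ.

    dump_a and dump_b are bytes objects of equal length; stride is the
    number of bytes each page occupies in the dump file (an assumption
    that depends on whether OOB data was dumped).
    """
    if len(dump_a) != len(dump_b):
        raise ValueError("dumps must be the same length")
    result = {}
    for start in range(0, len(dump_a), stride):
        chunk_a = dump_a[start:start + stride]
        chunk_b = dump_b[start:start + stride]
        # XOR each byte pair and count the set bits.
        bits = sum(bin(x ^ y).count("1") for x, y in zip(chunk_a, chunk_b))
        if bits:
            result[start // stride] = bits
    return result
```

Typical use would be to read two dump files with `open(path, "rb").read()` and pass both buffers in; pages with only a few differing bits point to bitflips, while pages that differ wholesale point to different content being written.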
When I run nanddump a second time on the same partition, the ECC uncorrectable bitflip errors do not necessarily occur again, but the statistics printed at the start of the subsequent run record 650 ECC failures:
root:~# nanddump --bb=skipbad -o -f nanddump_mtd06 /dev/mtd6
ECC failed: 650
ECC corrected: 1
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x001e0000...
root:~#
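Assuming the counters printed at the top of each run are running totals (which is how I read the two outputs above), the difference between two headers gives the number of events logged in between. A small sketch, with hypothetical helper names, that parses the four header lines and subtracts them:

```python
import re

# Header lines as printed by nanddump in the logs above.
STAT_LINE = re.compile(
    r"^(ECC failed|ECC corrected|Number of bad blocks|Number of bbt blocks)"
    r":[ \t]*(\d+)[ \t]*$",
    re.MULTILINE,
)


def parse_stats(header_text):
    """Return the header counters as a {name: int} dictionary."""
    return {name: int(value) for name, value in STAT_LINE.findall(header_text)}


def stats_delta(first, second):
    """Return second - first for every counter present in both headers."""
    return {name: second[name] - first[name] for name in first if name in second}
```

Applied to the two headers in this post, the delta for "ECC failed" is the 650 failures accumulated during the first dump.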
Running nandtest seems to work fine, but it does report the large number of ECC failures and, surprisingly (at least to me, as it's new NAND), shows a 1-bit ECC correction occurring:
root@skydrop:~# nandtest -p 10 /dev/mtd6
ECC corrections: 0
ECC failures : 650
Bad blocks : 0
BBT blocks : 0
001c0000: checking...
Finished pass 1 successfully
001c0000: checking...
Finished pass 2 successfully
00020000: reading...
1 bit(s) ECC corrected at 00020000
001c0000: checking...
Finished pass 3 successfully
001c0000: checking...
Finished pass 4 successfully
001c0000: checking...
Finished pass 5 successfully
001c0000: checking...
Finished pass 6 successfully
001c0000: checking...
Finished pass 7 successfully
001c0000: checking...
Finished pass 8 successfully
001c0000: checking...
Finished pass 9 successfully
001c0000: checking...
Finished pass 10 successfully
The uncorrectable bitflips reported by nanddump occur with or without the following patches intended for Spansion NAND: http://www.spansion.com/Support/Software/linux-psp-04.04.00.01-NAND.zip.
My biggest concern is that the factory reset partitions on this same NAND could return uncorrectable bitflip errors when read, and that the corrupted data would then be copied to all the other partitions, resulting in a bricked device.
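One guard I am considering for that scenario is to refuse the copy unless the data read back matches a known-good digest. The sketch below only illustrates the idea: `safe_to_copy` and the stored SHA-256 value are hypothetical and not something the SDK provides.

```python
import hashlib


def safe_to_copy(image, expected_sha256):
    """Return True only if the image read from the factory partition
    hashes to the known-good SHA-256 digest (hex string)."""
    actual = hashlib.sha256(image).hexdigest()
    return actual == expected_sha256.lower()
```

The digest would have to be recorded when the factory image is first written and stored somewhere other than the partition being checked.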
A separate question: would doubling the timeout values in drivers/mtd/nand/nand_base.c adversely affect U-Boot writes to NAND (the nand write command)? Without the change I frequently get timeouts when writing NAND partitions, especially the larger UBI rootfs partitions on this same Micron NAND (the following values have all been doubled):
u32 timeo = (CONFIG_SYS_HZ * 40) / 1000;
...
if (state == FL_ERASING)
timeo = (CONFIG_SYS_HZ * 800) / 1000;
else
timeo = (CONFIG_SYS_HZ * 40) / 1000;
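For reference, the arithmetic behind those expressions can be worked through as below. This is a sketch under two assumptions: CONFIG_SYS_HZ is taken to be 1000 (I have not confirmed the value for this board configuration), and the pre-change values are simply inferred by halving the doubled ones quoted above.

```python
def timeout_ticks(sys_hz, milliseconds):
    """Mirror the C expression (CONFIG_SYS_HZ * ms) / 1000 using
    integer division, as the compiler would for u32 operands."""
    return (sys_hz * milliseconds) // 1000


SYS_HZ = 1000  # assumption: timer ticks per second

# Millisecond constants from the doubled code above.
doubled = {"erase": 800, "other": 40}
# Inferred pre-change values (half of the doubled ones).
original = {state: ms // 2 for state, ms in doubled.items()}

for state in doubled:
    print(state,
          timeout_ticks(SYS_HZ, original[state]),
          "->",
          timeout_ticks(SYS_HZ, doubled[state]),
          "ticks")
```

With a 1000 Hz tick, each tick is one millisecond, so the numbers in the expressions read directly as the wait limit in milliseconds.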