Hi, I'm using the latest code from the master-ti81xx branch of the linux-omap3 arago repository because I needed 16-bit flash support (and also wanted BCH16). For the most part the code works great, but I ran into issues with UBI partitions. As far as I can tell, the cause is commit 72bff96.
The issue affects both my TI81848 and AM3874 hardware, in both BCH8 and BCH16 modes. I'm not sure whether it is relevant to other related architectures. The easiest way I found to reproduce the bug is as follows:
- Erase NAND flash partition
- ubiformat the partition
- ubiattach to the partition
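With mtd-utils, the steps above would look roughly like the sketch below. The MTD index 4 is only a placeholder (check /proc/mtd for the real one), and the RUN variable and repro function are helper names made up for this example. Older mtd-utils releases may ship flash_eraseall rather than flash_erase.

```shell
#!/bin/sh
# Hypothetical MTD index; look it up in /proc/mtd on the target.
MTD_NUM="${MTD_NUM:-4}"
# Leave RUN empty on the target; set RUN=echo to only print the commands.
RUN="${RUN:-}"

repro() {
    # Erase the whole partition (offset 0, block count 0 = to the end).
    $RUN flash_erase "/dev/mtd${MTD_NUM}" 0 0
    # Write fresh UBI erase-counter headers without prompting.
    $RUN ubiformat "/dev/mtd${MTD_NUM}" -y
    # Attach the partition; the scan during attach reads the headers back.
    $RUN ubiattach /dev/ubi_ctrl -m "${MTD_NUM}"
}
```

Calling repro on the target and then checking dmesg should show whether the ECC error appears during the attach.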
The UBI attach works but produces the following kernel dump:
UBI error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 2:0, read 64 bytes
Backtrace:
[<c0048dc8>] (dump_backtrace+0x0/0x110) from [<c036031c>] (dump_stack+0x18/0x1c)
r6:00000040 r5:ffffffb6 r4:dd876800 r3:60000013
[<c0360304>] (dump_stack+0x0/0x1c) from [<c02317cc>] (ubi_io_read+0x1cc/0x2a4)
[<c0231600>] (ubi_io_read+0x0/0x2a4) from [<c0231afc>] (ubi_io_read_ec_hdr+0x74/0x204)
[<c0231a88>] (ubi_io_read_ec_hdr+0x0/0x204) from [<c0235b68>] (ubi_scan+0x12c/0x132c)
[<c0235a3c>] (ubi_scan+0x0/0x132c) from [<c022c690>] (ubi_attach_mtd_dev+0x554/0xbf8)
[<c022c13c>] (ubi_attach_mtd_dev+0x0/0xbf8) from [<c022cf64>] (ctrl_cdev_ioctl+0xd8/0x168)
[<c022ce8c>] (ctrl_cdev_ioctl+0x0/0x168) from [<c00d2a18>] (do_vfs_ioctl+0x4d4/0x548)
r6:40186f40 r5:dd8af700 r4:bec29bb8
[<c00d2544>] (do_vfs_ioctl+0x0/0x548) from [<c00d2ae4>] (sys_ioctl+0x58/0x7c)
r9:dd810000 r8:00000000 r7:00000003 r6:40186f40 r5:bec29bb8
r4:dd8af700
[<c00d2a8c>] (sys_ioctl+0x0/0x7c) from [<c0045280>] (ret_fast_syscall+0x0/0x30)
r8:c0045428 r7:00000036 r6:00000003 r5:0000c9ce r4:bec29bb8
What seems to happen is that the first page written by ubiformat is stored with the wrong ECC. ubiattach then detects and corrects this, but the block is never marked bad because the next write to it works fine. Once UBIFS volumes are set up and actual data is written to the partition, many more of these errors crop up.
The bug seems to be caused by the order of the GPMC ECC configuration register writes, a behaviour that is not documented anywhere as far as I could find. The only way I could get it to work was to reorganise the register accesses back into the order used before the above-mentioned commit.
Here is a patch that fixes the issue: 6472.gpmc_ecc.diff
I'm not sure why it works though (all the other ordering combinations I tried don't seem to work). If anyone has any insights into what's going on here or documentation I've missed I would appreciate the feedback.