Hello,
we have two sets of custom boards based on the OMAP3530, with different NAND chips (BOARD1 and BOARD2).
The relevant configurations for our original hardware revision are as follows:
BOARD1
- Micron NAND flash - MT29F4G08ABCHC-ET
- 2.6.29 Linux kernel
- YAFFS2 flash file system (system boots from flash memory)
BOARD2
- Micron NAND flash - MT29F4G08ABBDAH4-IT
- 2.6.32 Linux kernel (long term support release)
- UBIFS flash file system (system boots from flash memory)
The issue initially occurred during power cycle testing with BOARD2, Linux 2.6.32, and the UBIFS file system:
- Power on for 4 minutes (more than enough time for the system to boot completely), power off for 1 minute, and repeat.
- Intermittently, UBIFS initialization fails during recovery from the unclean shutdown caused by power off. The failure persists across subsequent power cycles, and recovering from it requires reinitializing the board's flash memory. The specific error was:
- The UBIFS function ubifs_leb_unmap() returned an error, which is handled by switching the file system to read-only mode; this prevents normal startup.
Observations regarding the problem:
- Occurs at a constant 25C or during temperature cycling over -30C to 70C. Conclusion: temperature is not a factor.
- Occurs with BOARD1 and BOARD2 when using Linux 2.6.32 and UBIFS file system.
- Does not occur with BOARD1 using Linux 2.6.29 (old TI PSP) and YAFFS2 file system.
Our initial attempt to address the issue (with the Linux 2.6.32/UBIFS configuration) was:
- Analyze OMAP NAND flash timing. Timing was ruled out as cause of the issue, although improvements were identified and implemented.
- Apply a UBIFS patch to address the ubifs_leb_unmap() error.
- Set the Linux 2.6.32 I/O scheduler to NOOP to return to the 2.6.29 behavior (by default, I/O scheduling is enabled in 2.6.32).
The results were:
- Significant decrease in failure rate -- from 3-4 out of 8 test units in < 1 day to 1 out of 8 test units in 6 days.
- The ubifs_leb_unmap() error appears to be resolved, although the external symptoms are the same (file system switched to read-only mode, failure to boot).
- The current failure appears to be at a lower level beneath UBIFS in UBI or MTD, and related to incorrect handling of bad blocks during error recovery at system start. It is possible that UBIFS is triggering the error via incorrect usage of UBI/MTD.
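To put the two failure rates on a common basis, they can be expressed per unit and per power cycle. The sketch below is only a rough estimate under simplifying assumptions that are not measured values: each cycle lasts 5 minutes (4 on, 1 off), every unit keeps cycling for the whole period, "3-4 out of 8" is taken as 3.5, and "< 1 day" is taken as a full day. The helper name failures_per_unit_cycle is made up for this illustration.

```python
from fractions import Fraction

# Test schedule described above: 4 minutes on, 1 minute off.
CYCLE_MINUTES = 4 + 1
CYCLES_PER_DAY = 24 * 60 // CYCLE_MINUTES


def failures_per_unit_cycle(failures, units, days):
    """Failures divided by the total number of power cycles across all units.

    Assumes every unit keeps cycling for the whole period (a simplification:
    in practice a failed unit stops contributing good cycles).
    """
    return Fraction(failures) / (units * days * CYCLES_PER_DAY)


# Before the changes: 3-4 of 8 units in under a day (taken as 3.5 in 1 day).
before = failures_per_unit_cycle(Fraction(7, 2), units=8, days=1)
# After the changes: 1 of 8 units in 6 days.
after = failures_per_unit_cycle(1, units=8, days=6)

improvement = before / after

if __name__ == "__main__":
    print(f"cycles per day per unit: {CYCLES_PER_DAY}")
    print(f"before: about 1 failure per {float(1 / before):.0f} unit-cycles")
    print(f"after:  about 1 failure per {float(1 / after):.0f} unit-cycles")
    print(f"reduction factor: about {float(improvement):.0f}x")
```

Under these assumptions the failure rate drops by a factor of roughly 21, but with a single failure in the second run the estimate carries a wide uncertainty.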
The error we now see is:
UBI: attaching mtd6 to ubi0
UBI: physical eraseblock size: 131072 bytes (128 KiB)
UBI: logical eraseblock size: 126976 bytes
UBI: smallest flash I/O unit: 2048
UBI: VID header offset: 2048 (aligned 2048)
UBI: data offset: 4096
UBI: max. sequence number: 1621513
UBI: attached mtd6 to ubi0
UBI: MTD device name: "FileSystem"
UBI: MTD device size: 847 MiB
UBI: number of good PEBs: 6769
UBI: number of bad PEBs: 7
UBI: number of corrupted PEBs: 0
UBI: max. allowed volumes: 128
UBI: wear-leveling threshold: 4096
UBI: number of internal volumes: 1
UBI: number of user volumes: 1
UBI: available PEBs: 0
UBI: total number of reserved PEBs: 6769
UBI: number of PEBs reserved for bad PEB handling: 134
UBI: max/mean erase counter: 2417/239
UBI: image sequence number: 0
UBI: background thread "ubi_bgt0d" started, PID 587
UBIFS: recovery needed
UBIFS error (pid 590): ubifs_scan: corrupt empty space at LEB 997:116771
UBIFS error (pid 590): ubifs_scanned_corruption: corruption at LEB 997:116771
UBIFS error (pid 590): ubifs_scan: LEB 997 scanning failed
UBIFS error (pid 590): do_commit: commit failed, error -117
UBIFS warning (pid 590): ubifs_ro_mode: switched to read-only mode, error -117
UBIFS: recovery completed
UBIFS: mounted UBI device 0, volume 0, name "ROOTFS"
UBIFS: file system size: 839819264 bytes (820136 KiB, 800 MiB, 6614 LEBs)
UBIFS: journal size: 33521664 bytes (32736 KiB, 31 MiB, 264 LEBs)
UBIFS: media format: w4/r0 (latest is w4/r0)
UBIFS: default compressor: none
UBIFS: reserved for root: 4952683 bytes (4836 KiB)
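The figures in this log can be cross-checked, and the reported corruption offset translated into a NAND page, with a few lines of Python. This is a sketch only: the constants are copied from the log above, the helper name locate is made up for the example, and the mapping assumes that LEB data starts at the logged data offset inside the PEB. Error -117 corresponds to EUCLEAN ("Structure needs cleaning") on Linux.

```python
import errno

# Values copied from the UBI/UBIFS log above.
PEB_SIZE = 131072            # physical eraseblock size (bytes)
LEB_SIZE = 126976            # logical eraseblock size (bytes)
MIN_IO = 2048                # smallest flash I/O unit, i.e. one NAND page
DATA_OFFSET = 4096           # where LEB data starts inside a PEB
GOOD_PEBS = 6769
BAD_PEBS = 7
FS_LEBS = 6614
JOURNAL_LEBS = 264
CORRUPT_OFFSET = 116771      # from "corrupt empty space at LEB 997:116771"


def locate(offset, min_io=MIN_IO, data_offset=DATA_OFFSET):
    """Map a byte offset inside a LEB to page coordinates.

    Returns (page index within the LEB, byte within that page,
    page index within the PEB). Assumes the LEB begins at data_offset
    in the PEB, after the EC and VID header pages.
    """
    page_in_leb, byte_in_page = divmod(offset, min_io)
    page_in_peb = page_in_leb + data_offset // min_io
    return page_in_leb, byte_in_page, page_in_peb


if __name__ == "__main__":
    # Consistency checks between the logged numbers.
    print("LEB size matches PEB size minus data offset:",
          LEB_SIZE == PEB_SIZE - DATA_OFFSET)
    print("file system size (bytes):", FS_LEBS * LEB_SIZE)
    print("journal size (bytes):", JOURNAL_LEBS * LEB_SIZE)
    print("MTD device size (MiB):",
          (GOOD_PEBS + BAD_PEBS) * PEB_SIZE // (1024 * 1024))

    page_in_leb, byte_in_page, page_in_peb = locate(CORRUPT_OFFSET)
    print(f"corruption starts at LEB page {page_in_leb}, "
          f"byte {byte_in_page} (PEB page {page_in_peb} "
          f"of {PEB_SIZE // MIN_IO})")

    # errno 117 is EUCLEAN on Linux; other platforms may not define it.
    print("error 117:", errno.errorcode.get(117, "not defined on this platform"))
```

The offset falls at byte 35 of page 57 in the LEB, so the area UBIFS expected to be empty is not blank from that point in that page onward.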
Do you have any ideas about what we can do to solve this issue?
Thanks,
Matteo