AM3517 Froyo UBIFS Error with TI Kernel 2.6.32 (TI-Android-FroYo-DevKit-V2.2)

Other Parts Discussed in Thread: AM3517

Dear All,

I'm trying to use UBIFS on the NAND (Samsung K9F2G08U0C) of our AM3517 custom board.

It works fine for several hours, or sometimes for tens of days,

but then it freezes with the following messages.

[ 1664.949279] UBI error: ubi_io_write: error -5 while writing 2048 bytes to PEB 1839:118784, written 0 bytes
[ 1664.959167] UBI warning: ubi_eba_write_leb: failed to write data to PEB 1839
[ 1664.966278] UBI: recover PEB 1839, move data to PEB 1011
[ 1665.042999] UBI error: ubi_io_write: error -5 while writing 116736 bytes to PEB 1011:4096, written 40960 bytes
[ 1665.052978] UBI warning: recover_peb: failed to write to PEB 1011
[ 1665.059143] UBI: try again
[ 1665.061859] UBI: recover PEB 1839, move data to PEB 1012
[ 1665.136749] UBI error: ubi_io_write: error -5 while writing 116736 bytes to PEB 1012:4096, written 40960 bytes
[ 1665.146911] UBI warning: recover_peb: failed to write to PEB 1012
[ 1665.153137] UBI: try again
[ 1665.155822] UBI: recover PEB 1839, move data to PEB 1013
[ 1665.239959] UBI: data was successfully recovered
[ 1665.972717] UBI error: ubi_io_write: error -5 while writing 2048 bytes to PEB 659:67584, written 0 bytes
[ 1665.982421] UBI warning: ubi_eba_write_leb: failed to write data to PEB 659
[ 1665.989440] UBI: recover PEB 659, move data to PEB 1014
[ 1666.052215] UBI: data was successfully recovered
[ 1666.712554] UBI error: ubi_io_write: error -5 while writing 512 bytes to PEB 1334:0, written 0 bytes
[ 1666.721832] UBI error: erase_worker: failed to erase PEB 1334, error -5
[ 1666.728424] UBI: mark PEB 1334 as bad
[ 1666.749969] UBI error: ubi_io_mark_bad: cannot mark PEB 1334 bad, error -5
[ 1666.756896] UBI warning: ubi_ro_mode: switch to read-only mode
[ 1666.762756] UBI error: do_work: work failed with error code -5
[ 1666.768524] UBI error: ubi_thread: ubi_bgt0d: work failed with error code -5
[ 1666.816406] UBIFS error (pid 791): ubifs_wbuf_write_nolock: cannot write 2240 bytes to LEB 961:12288, error -30
[ 1666.826446] UBIFS warning (pid 791): ubifs_ro_mode: switched to read-only mode, error -30
[ 1666.834564] UBIFS error (pid 791): do_writepage: cannot write page 70 of inode 35509, error -30
[ 1666.872283] UBIFS error (pid 791): make_reservation: cannot reserve 160 bytes in jhead 1, error -30
[ 1666.881347] UBIFS error (pid 791): ubifs_write_inode: can't write inode 35509, error -30

As the messages above show, data recovery succeeded, but marking the bad block (ubi_io_mark_bad) failed.
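To make a log like this easier to read, a small filter can list which PEBs hit write errors. This is only a sketch: the function names `failed_pebs` and `errno_name` are made up for illustration, the message format is copied from the log above and may differ on other kernels, and it is meant to be run on a host against a captured log. On Linux, error -5 is EIO and error -30 is EROFS.

```shell
#!/bin/sh
# Summarise UBI write failures from a kernel log read on stdin.
# The pattern follows the message format seen in the log above.

# Print each PEB number that appears in a "ubi_io_write: error" line, once,
# in ascending numeric order.
failed_pebs() {
    sed -n 's/.*UBI error: ubi_io_write: error -[0-9]* while writing [0-9]* bytes to PEB \([0-9]*\):.*/\1/p' | sort -n -u
}

# Map the two errno values seen in the log to their Linux names.
errno_name() {
    case "$1" in
        5)  echo EIO ;;
        30) echo EROFS ;;
        *)  echo unknown ;;
    esac
}
```

For example, `dmesg | failed_pebs` on the log above would list PEBs 659, 1011, 1012, 1334 and 1839.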

Please give me any hints or advice on this.

I started with TI-Android-FroYo-DevKit-V2.2 (http://processors.wiki.ti.com/index.php/TI-Android-FroYo-DevKit-V2.2_UserGuide)

and made the UBI image as follows.

# mkfs.ubifs -r temp/ -m 2048 -e 126976 -c 1948 -o ubifs.img

[ubifs]
mode=ubi
image=ubifs.img
vol_id=0
vol_size=200MiB
vol_type=dynamic
vol_name=rootfs
vol_flags=autoresize

nandargs=setenv bootargs console=${console} eth=${ethaddr} init=/init noinitrd ip=off rootwait mem=256M mpurate=600 root=ubi0:rootfs rw ubi.mtd=4,2048 rootfstype=ubifs rootdelay=2 omap_vout.vid1_static_vrfb_alloc=y vram=16M

[ 132.138702] omap2-nand driver initializing
[ 132.143005] NAND device: Manufacturer ID: 0xec, Chip ID: 0xda (Samsung NAND 256MiB 3,3V 8-bit)
[ 132.151611] Creating 5 MTD partitions on "omap2-nand.0":
[ 132.156890] 0x000000000000-0x000000080000 : "xloader-nand"
[ 132.164154] 0x000000080000-0x000000240000 : "uboot-nand"
[ 132.171417] 0x000000240000-0x000000280000 : "params-nand"
[ 132.178039] 0x000000280000-0x000000780000 : "linux-nand"
[ 132.186462] 0x000000780000-0x000010000000 : "rootfs"
[ 132.295532] UBI: attaching mtd4 to ubi0
[ 132.299316] UBI: physical eraseblock size: 131072 bytes (128 KiB)
[ 132.305603] UBI: logical eraseblock size: 126976 bytes
[ 132.310974] UBI: smallest flash I/O unit: 2048
[ 132.315612] UBI: sub-page size: 512
[ 132.320190] UBI: VID header offset: 2048 (aligned 2048)
[ 132.326141] UBI: data offset: 4096
[ 133.125396] UBI warning: ubi_eba_init_scan: cannot reserve enough PEBs for bad PEB handling, reserved 10, need 19
[ 133.137542] UBI: attached mtd4 to ubi0
[ 133.141265] UBI: MTD device name: "rootfs"
[ 133.146301] UBI: MTD device size: 248 MiB
[ 133.151214] UBI: number of good PEBs: 1971
[ 133.155883] UBI: number of bad PEBs: 17
[ 133.160369] UBI: max. allowed volumes: 128
[ 133.164916] UBI: wear-leveling threshold: 4096
[ 133.169586] UBI: number of internal volumes: 1
[ 133.173980] UBI: number of user volumes: 1
[ 133.178375] UBI: available PEBs: 0
[ 133.182769] UBI: total number of reserved PEBs: 1971
[ 133.187683] UBI: number of PEBs reserved for bad PEB handling: 10
[ 133.193725] UBI: max/mean erase counter: 45/26
[ 133.198120] UBI: image sequence number: 1281654017
[ 133.203002] UBI: background thread "ubi_bgt0d" started, PID 402
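
The figures in this boot log can be cross-checked with a little shell arithmetic. This is a sketch rather than anything taken from the UBI sources: all values are copied from the log, except the 1% bad-PEB reserve, which is an assumption (the default CONFIG_MTD_UBI_BEB_RESERVE setting), and the function names are made up for illustration.

```shell
#!/bin/sh
# Values copied from the boot log above.
PEB_SIZE=131072            # physical eraseblock size in bytes
DATA_OFFSET=4096           # UBI data offset (EC header page + VID header page)
PART_START=$((0x780000))   # start of the "rootfs" MTD partition
PART_END=$((0x10000000))   # end of the "rootfs" MTD partition
GOOD_PEBS=1971             # "number of good PEBs"
BEB_RESERVE_PERCENT=1      # assumption: default CONFIG_MTD_UBI_BEB_RESERVE

# Logical eraseblock size: what is left of a PEB after the UBI headers.
leb_size()     { echo $((PEB_SIZE - DATA_OFFSET)); }

# Number of eraseblocks in the partition (good and bad together).
total_pebs()   { echo $(((PART_END - PART_START) / PEB_SIZE)); }

# PEBs UBI wants to reserve for bad-block handling (integer arithmetic).
beb_required() { echo $((GOOD_PEBS / 100 * BEB_RESERVE_PERCENT)); }

echo "LEB size:          $(leb_size)"
echo "PEBs in partition: $(total_pebs)"
echo "BEB reserve wanted: $(beb_required)"
```

The results (126976, 1988 and 19) agree with `-e 126976`, with 1971 good plus 17 bad PEBs, and with the "need 19" in the warning. The log shows that only 10 PEBs could actually be reserved.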

  • What is your kernel revision? Which ECC algorithm is used?

  • Kernel version is 2.6.32, downloaded from http://software-dl.ti.com/dsps/dsps_public_sw/sdo_tii/TI_Android_DevKit/02_02_00/index_FDS.html

    VERSION = 2
    PATCHLEVEL = 6
    SUBLEVEL = 32
    EXTRAVERSION =
    NAME = Man-Eating Seals of Antiquity

    The ECC algorithm is software (SW) ECC. I flashed the UBI rootfs via fastboot, which sets the NAND ECC mode to "sw" (U-Boot command "nandecc sw").

  • Choi,

    Can you send me the following files to renjith.thomas@pathpartnertech.com? 

    1. arch/arm/mach-omap2/board-am3517-evm.c or corresponding board file you are using?
    2. arch/arm/mach-omap2/board-flash.c
    3. arch/arm/mach-omap2/gpmc.c
    4. drivers/mtd/nand/nand_base.c
    5. drivers/mtd/nand/omap2.c
  • Thomas,

    I sent an email with the files you requested,

    but there is no arch/arm/mach-omap2/board-flash.c in my source tree.


    Please check the email.

    Thanks.

  • Choi,

    I guess board-flash.c is not present in the 2.6.32 kernel. I'll check the code and get back to you by tomorrow.

  • Choi,

    I have gone through the code. It looks like the driver supports only 1-bit Hamming code error correction. I think this error might be caused by more than one bit error occurring in your case. Since your algorithm corrects only 1 bit, it might not be able to recover from this.

    AM3517 also supports 4-bit and 8-bit BCH ECC algorithms in hardware, but I don't think your kernel has support for them. You might have to enable BCH-8 support in your NAND driver, which is good enough to handle the ECC requirements of most SLC NANDs with a 2K page size.

    Also, can you create a robust test case that makes the issue easy to reproduce? I suggest you write a small script that does the following:

    1. Write a file with random values.

    2. Calculate its checksum and store it in another file.

    3. Reboot the system.

    4. Read the file and calculate the checksum again.

    5. Read the stored checksum file and compare.

    6. If the comparison fails, halt the system.

    7. If it succeeds, delete both files and go to step 1.

    This script will help reproduce the issue, and so solve it, faster.
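
    The steps above could be sketched as a shell script along these lines. It is a minimal sketch, not a tested tool: TEST_DIR, SIZE_KB and the function names are hypothetical, and it assumes a busybox-style shell with md5sum, dd, cut and /dev/urandom available on the target. Each boot runs one pass: verify what the previous boot wrote, then write a new file.

```shell
#!/bin/sh
# Power-cycle data-integrity test, one pass per boot.
# TEST_DIR and SIZE_KB are hypothetical defaults; point TEST_DIR at a
# directory on the UBIFS volume under test.
TEST_DIR="${TEST_DIR:-/data/nandtest}"
SIZE_KB="${SIZE_KB:-1024}"

# Print only the hash field of md5sum's "hash  filename" output.
checksum() {
    md5sum "$1" | cut -d' ' -f1
}

# Steps 1-2: write random data, store its checksum, flush to flash.
write_cycle() {
    mkdir -p "$TEST_DIR" || return 1
    dd if=/dev/urandom of="$TEST_DIR/data.bin" bs=1024 count="$SIZE_KB" 2>/dev/null || return 1
    checksum "$TEST_DIR/data.bin" > "$TEST_DIR/data.md5" || return 1
    sync
}

# Steps 4-7: recompute the checksum and compare; remove files on success.
# Returns 0 when there is nothing to verify or the checksums match.
verify_cycle() {
    if [ ! -f "$TEST_DIR/data.bin" ] || [ ! -f "$TEST_DIR/data.md5" ]; then
        return 0
    fi
    expected=$(cat "$TEST_DIR/data.md5")
    actual=$(checksum "$TEST_DIR/data.bin")
    if [ "$expected" != "$actual" ]; then
        return 1
    fi
    rm -f "$TEST_DIR/data.bin" "$TEST_DIR/data.md5"
}

# One pass: verify the previous boot's file, then write a fresh one.
run_once() {
    if ! verify_cycle; then
        echo "checksum mismatch: leaving files in $TEST_DIR for inspection" >&2
        return 1
    fi
    write_cycle
}
```

    Step 3 is left to the caller: a boot script could run `run_once && reboot` (assuming a reboot command exists on the target), so that a mismatch leaves the system up with the failing files in place instead of rebooting again.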

  • Thomas,

    Thanks for your answer!

    The NAND flash on our board is a Samsung K9F2G08U0C, an SLC part that requires 1-bit ECC.

    So I think 1-bit ECC is enough for our NAND.

    Anyway, do I need to upgrade the kernel version to support 8-bit BCH ECC in hardware,

    or is it possible to enable it just by changing my kernel configuration?

    TI's kernel 2.6.32 refers to CONFIG_MTD_NAND_OMAP_HWECC, but there is no option or comment for it.

    I grepped the kernel sources and found just two lines that mention it:

    [edchoi@icanjji-H55M-S2V Android_Linux_Kernel_2_6_32]$ grep -r CONFIG_MTD_NAND_OMAP_HWECC *
    drivers/mtd/nand/omap2.c:#ifdef CONFIG_MTD_NAND_OMAP_HWECC
    drivers/mtd/nand/omap2.c:#ifdef CONFIG_MTD_NAND_OMAP_HWECC

    Regarding your suggested test case:

    How do I calculate the checksum of a file?

    Is any checksum algorithm OK?

    Edward.

  • Edward,

    It's not possible to enable BCH-8 through a kernel config option, as it's not supported in the NAND driver code. So you have to look at a later kernel version that supports it. You can either migrate to the newer kernel or backport its NAND driver. Alternatively, you can add BCH-8 support yourself.

    To calculate the checksum, you can use any algorithm. The purpose is only to check data integrity. You can try the md5sum command.

    http://linux.about.com/library/cmd/blcmdl1_md5sum.htm

  • Thomas,

    In the datasheet of the Samsung NAND flash, K9F2G08U0C,

    the ECC requirement is 1 bit per 528 bytes.

    I'll send you an email with the specifications of the Samsung NAND flash, K9F2G08U0C.

    I believe that 1-bit error correction is sufficient for our NAND, which is also used in another product of ours that is in production with 1-bit ECC.

    Please check my source code once more.

    I need to find the fastest way, because I don't have enough time to migrate kernel versions or to port the NAND driver (not just the NAND driver, but also GPMC and others...).

    Thanks,

  • Choi,

    I have gone through the datasheet that you shared. From my past experience, though, I doubt that 1-bit ECC is really sufficient. But we can try one experiment.

    Can you enable all the prints containing the string ECC UNCORRECTED_ERROR in the file "drivers/mtd/nand/omap2.c" and see whether any of them occur before the crash?

    DEBUG(MTD_DEBUG_LEVEL0, "ECC UNCORRECTED_ERROR 1\n");
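
    Once those prints are enabled (in kernels of this era that probably also means building with CONFIG_MTD_DEBUG, which is an assumption worth checking in your tree), a captured log can be checked with a small filter like the one below. It is only a sketch, and the function name `ecc_before_ubi_error` is made up for illustration.

```shell
#!/bin/sh
# Read a kernel log on stdin and print how many "ECC UNCORRECTED_ERROR"
# lines appear before the first "UBI error" line.
ecc_before_ubi_error() {
    awk '
        /ECC UNCORRECTED_ERROR/ { ecc++ }   # count ECC failure prints
        /UBI error/             { exit }    # stop at the first UBI error
        END                     { print ecc + 0 }
    '
}
```

    For example, `dmesg | ecc_before_ubi_error` printing a non-zero count would suggest that uncorrectable ECC errors were seen before UBI started failing.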

  • Choi,

    If you go through section 3.1 of the NAND datasheet, you'll find this statement:

    "The 1st block, which is placed on 00h block address, is guaranteed to be a valid block up to 1K program/erase cycles with 1bit/528Byte ECC". Does this mean that at least 1-bit ECC is required for the first block, and that maybe more ECC bits are required for the subsequent blocks?