Problem: UBIFS/NAND -74 error

Hi,

we have been facing a serious problem for months now, and it is really urgent. We can't get UBIFS to work and believe that there might be a NAND driver problem.

We are using a kernel based on linux-omap3.git (97101e6c43c0e956dbc2863bd3e50ab70f987a91) for the DM8148 Centaurus, so all patches for subpages should be included. The Linux system is based on a standard Yocto poky-denzil build.

The NAND flash is a Micron MT29F4G16ABBDAH4-IT:

512 MiB, 128 KiB blocks, 2 KiB pages (min I/O), 512 B subpages, 16-bit interface.
We are not sure whether it only supports 4-bit HW ECC (the datasheet only states a minimum of 4-bit ECC), but it has a 4-bit internal ECC engine.

We tried both 8-bit and 4-bit BCH ECC for the OMAP NAND controller config in the Kernel, but it fails in both modes.


mkfs.ubifs -F -x lzo -U -v --root=$RFSDIR --min-io-size=2048 --leb-size=129024 --max-leb-cnt=3942 -o rfs.ubifs

# Ubinize the ubifs file system
ubinize -o rfs_ubifs.img -m 2048 -p 128KiB -s 512 -O 512 ubinize.cfg

The ubinize config just states a single volume (dynamic 485 MiB, autoresize).
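
The --leb-size value follows from the flash geometry. Below is a minimal sketch of that arithmetic. It assumes the usual UBI layout: the VID header is 64 bytes long and sits at the -O offset, the data area starts at the next min-I/O boundary after it, and the LEB is what remains of the PEB. The function name leb_size is made up for this example.

```shell
#!/usr/bin/env bash
# leb_size PEB_BYTES MIN_IO_BYTES VID_HDR_OFFSET
# Assumption: data starts at the first min-I/O boundary after the 64-byte
# VID header; the LEB size is the PEB size minus that data offset.
leb_size() {
    local peb=$1 min_io=$2 vid_off=$3
    local vid_hdr_size=64
    # Round (vid_off + header size) up to a multiple of min_io.
    local data_off=$(( (vid_off + vid_hdr_size + min_io - 1) / min_io * min_io ))
    echo $(( peb - data_off ))
}

leb_size 131072 2048 512    # sub-page layout (-O 512)
leb_size 131072 2048 2048   # full-page layout (-O 2048)
```

With the values from this post the first call gives 129024, the --leb-size used above, and the second gives 126976 (124 KiB) for a layout without subpages.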

Since we had errors, we tried different combinations instead, e.g. a LEB size of 124 KiB with the subpage size set to 2048, and so on. We could mount the volume as RFS, but the number of UBI errors increased after some time, and sometimes the UBIFS was not mountable anymore.

All errors (dirty workaround settings, correct settings) look like these (here BCH4 with the actual correct settings):


ubiattach /dev/ubi_ctrl -m 7 -O 512
UBI: attaching mtd7 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    129024 bytes
UBI: smallest flash I/O unit:    2048
UBI: sub-page size:              512
UBI: VID header offset:          512 (aligned 512)
UBI: data offset:                2048
UBI: max. sequence number:       0
UBI: volume 0 ("rootfs") re-sized from 3942 to 3990 LEBs
UBI: attached mtd7 to ubi0
UBI: MTD device name:            "n_Filesystem/UBIFS"
UBI: MTD device size:            505 MiB
UBI: number of good PEBs:        4034
UBI: number of bad PEBs:         8
UBI: number of corrupted PEBs:   0
UBI: max. allowed volumes:       128
UBI: wear-leveling threshold:    4096
UBI: number of internal volumes: 1
UBI: number of user volumes:     1
UBI: available PEBs:             0
UBI: total number of reserved PEBs: 4034
UBI: number of PEBs reserved for bad PEB handling: 40
UBI: max/mean erase counter: 0/0
UBI: image sequence number:  1455614892
UBI: background thread "ubi_bgt0d" started, PID 538
UBI device number 0, total 4034 LEBs (520482816 bytes, 496.4 MiB), available 0 LEBs (0 bytes), LEB size 129024 bytes (126.0 KiB)
root@q7centaurus:~# UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4039:512, read 512 bytes
UBI: run torture test for PEB 4039
UBI: PEB 4039 passed torture test, do not mark it as bad
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4039:512, read 512 bytes
UBI: run torture test for PEB 4039
UBI: PEB 4039 passed torture test, do not mark it as bad
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4039:512, read 512 bytes
UBI: run torture test for PEB 4039
UBI: PEB 4039 passed torture test, do not mark it as bad
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4039:512, read 512 bytes
UBI: run torture test for PEB 4039
UBI: PEB 4039 passed torture test, do not mark it as bad
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4039:512, read 512 bytes
UBI: run torture test for PEB 4039
....

Or e.g.:


root@q7centaurus:~# UBI: PEB 1 passed torture test, do not mark it as bad
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4040:512, read 512 bytes
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 1:512, read 512 bytes
UBI: run torture test for PEB 1
UBI: PEB 1 passed torture test, do not mark it as bad
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4040:512, read 512 bytes
UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 1:512, read 512 bytes
UBI: run torture test for PEB 1

However, sometimes after erasing the NAND from U-Boot/kernel and reformatting the partition with the ubifs/ubinized image from the kernel, it was able to attach and mount without errors. After detaching and reattaching it, the errors appeared again.

If we run mtd_stresstest and mtd_subpagetest on mtd7, we get errors like these:


root@q7centaurus:~# modprobe mtd_stresstest dev=7

=================================================
mtd_stresstest: MTD device: 7
mtd_stresstest: MTD device size 529793024, eraseblock size 131072, page size 2048, count of eraseblocks 4042, pages per eraseblock 644
mtd_stresstest: scanning for bad eraseblocks
mtd_stresstest: block 37 is bad
mtd_stresstest: block 120 is bad
mtd_stresstest: block 1702 is bad
mtd_stresstest: block 1748 is bad
mtd_stresstest: block 1804 is bad
mtd_stresstest: block 1949 is bad
mtd_stresstest: block 3092 is bad
mtd_stresstest: block 3260 is bad
mtd_stresstest: scanned 4042 eraseblocks, 8 are bad
mtd_stresstest: doing operations
mtd_stresstest: 0 operations done
mtd_stresstest: error: read failed at 0x11b2346e
mtd_stresstest: error -74 occurred
=================================================
FATAL: Error inserting mtd_stresstest (/lib/modules/2.6.37_Trunk_Build_122-dirty-DM8148+/kernel/drivers/mtd/tests/mtd_stresstest.ko):e
root@q7centaurus:~# modprobe mtd_subpagetest dev=7

=================================================
mtd_subpagetest: MTD device: 7
mtd_subpagetest: MTD device size 529793024, eraseblock size 131072, page size 2048, subpage size 512, count of eraseblocks 0, pages p4
mtd_subpagetest: scanning for bad eraseblocks
mtd_subpagetest: block 37 is bad
mtd_subpagetest: block 120 is bad
mtd_subpagetest: block 1702 is bad
mtd_subpagetest: block 1748 is bad
mtd_subpagetest: block 1804 is bad
mtd_subpagetest: block 1949 is bad
mtd_subpagetest: block 3092 is bad
mtd_subpagetest: block 3260 is bad
mtd_subpagetest: scanned 4042 eraseblocks, 8 are bad
mtd_subpagetest: erasing whole device
mtd_subpagetest: erased 4042 eraseblocks
mtd_subpagetest: writing whole device
mtd_subpagetest: written up to eraseblock 0
mtd_subpagetest: written up to eraseblock 256
mtd_subpagetest: written up to eraseblock 512
mtd_subpagetest: written up to eraseblock 768
mtd_subpagetest: written up to eraseblock 1024
mtd_subpagetest: written up to eraseblock 1280
mtd_subpagetest: written up to eraseblock 1536
mtd_subpagetest: written up to eraseblock 1792
mtd_subpagetest: written up to eraseblock 2048
mtd_subpagetest: written up to eraseblock 2304
mtd_subpagetest: written up to eraseblock 2560
mtd_subpagetest: written up to eraseblock 2816
mtd_subpagetest: written up to eraseblock 3072
mtd_subpagetest: written up to eraseblock 3328
mtd_subpagetest: written up to eraseblock 3584
mtd_subpagetest: written up to eraseblock 3840
mtd_subpagetest: written 4042 eraseblocks
mtd_subpagetest: verifying all eraseblocks
mtd_subpagetest: error: read failed at 0x0
mtd_subpagetest: error -74 occurred
=================================================
FATAL: Error inserting mtd_subpagetest (/lib/modules/2.6.37_Trunk_Build_122-dirty-DM8148+/kernel/drivers/mtd/tests/mtd_subpagetest.koe

Does anybody have any clues? Thanks in advance!

Kind Regards,

Markus Hofstätter

  • OK, it could be due to a different OOB layout required by the NAND flash we use (see Micron Technote).
    Since the OOB calculation in the OMAP NAND driver for the TI8148 is already structured and laid out completely differently, it will take me some time to integrate and test this. According to the technote, something in the read command must be adapted as well. And it looks like the internal ECC engine is indeed only 4-bit. So I'll try to change only the OOB part first.

    As a first temporary fix, in order to keep working, I switched to SW Hamming ECC, which seems to work fine so far.

    If any of you has hints or information about which changes are not needed, which other changes are needed, and how they can be integrated flexibly, please let me know. Especially if any of these changes can break the NAND driver due to assumptions that I don't know of.

    Kind Regards,

    Markus Hofstaetter

    EDIT: Also could it be that the OnDie ECC is enabled by default and is interfering with the BCH GPMC engine due to conflicting OOB layouts? Can't find anything special about that.

    EDIT2: Ok, from my understanding now, the OOB is purely SW defined and can't conflict with the NAND chip. So adding the new OOB Layout probably won't change anything. The only way it could conflict is if the NAND chip's internal ECC is used (but should be disabled by default?) since it may assume a different OOB layout than the one used by the SW.

    So maybe it could be due to an error in the driver like in this thread?

  • Almost as expected, the fixes from the Micron technote did not help (I only applied the Linux OMAP NAND changes). Even worse, forcing the HW OOB layout to the one in the technote and setting HW ECC to BCH4 made basically every block bad, so NAND scrubbing was the only way to revert those changes.

    E.g., adding to omap2.c in the kernel:


    } else {
    
    			omap_oobinfo.oobfree->offset = BADBLOCK_MARKER_LENGTH;
    			omap_oobinfo.oobfree->length = info->mtd.oobsize -
    				BADBLOCK_MARKER_LENGTH - omap_oobinfo.eccbytes;
    			/*
    			offset is calculated considering the following :
    			1) 12 bytes and 24 bytes ECC for OOB_64 can be supported
    			2)Ecc bytes lie to the end of OOB area.
    			3)Ecc layout must match with u-boot's ECC layout.
    			*/
    			offset = info->mtd.oobsize - MAX_HWECC_BYTES_OOB_64;
    		}
    
    		for (i = 0; i < omap_oobinfo.eccbytes; i++)
    			omap_oobinfo.eccpos[i] = i + offset;
    		
    		+info->nand.ecc.layout = &micron_nand_oob_64;

    Here the OOB layout is a global static struct with the definition from the above technote. So I'm kind of lost as to where to start looking for the error. I still think it is most plausible that there is an error in the NAND driver. But since it works fine for a lot of other people...

    Maybe the OOB layout is not suited for the current subpage implementation of the driver? Is anybody from TI watching this thread?

    Kind Regards,

    Markus Hofstätter

  • Okay,

    when I use (linux shell, BCH8 nand driver):

    flash_erase /dev/mtd7 0 0
    nandwrite -p /dev/mtd7 rfs_ubifs.img
    nanddump -f comp_rfs -n -a -l [imagesize] /dev/mtd7

    The image data is equal, hence writing and reading the data essentially works fine.
    But if I try to read just the first block with ECC enabled (-n option removed), I get an uncorrectable error.
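
The comparison step itself is not shown in the post. One possible way to do it is sketched below. It assumes the dump file contains page data only (no interleaved OOB bytes) and may be longer than the image because nandwrite -p pads the last page. The helper name same_prefix is made up for this example.

```shell
#!/usr/bin/env bash
# same_prefix IMAGE DUMP
# Succeeds if DUMP starts with exactly the bytes of IMAGE.
# Fails if the files differ or if DUMP is shorter than IMAGE.
same_prefix() {
    local image=$1 dump=$2
    local size
    # $(( ... )) strips any padding some wc implementations add.
    size=$(( $(wc -c < "$image") ))
    cmp -s -n "$size" "$image" "$dump"
}

# Example call with the file names from the post (only meaningful on the target):
# same_prefix rfs_ubifs.img comp_rfs && echo "data matches"
```

A match here only says that the raw data bytes came back intact. It says nothing about whether the ECC bytes stored in the OOB area are consistent with them.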

    Here is the dump data of the first 2K page with the OOB:

    0x00000000: 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00 00
    0x00000010: 00 00 02 00 00 00 08 00 44 93 b9 8b 00 00 00 00
    0x00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x00000030: 00 00 00 00 00 00 00 00 00 00 00 00 9f be cb 4a
    0x00000040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000090: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000000a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000000b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000000c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000000d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000000e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000000f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000110: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000120: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000130: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000140: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000150: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000160: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000170: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000190: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000001a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000001b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000001c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000001d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000001e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000001f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000200: 55 42 49 21 01 01 00 05 7f ff ef ff 00 00 00 00
    0x00000210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x00000220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x00000230: 00 00 00 00 00 00 00 00 00 00 00 00 b8 25 64 a8
    0x00000240: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000250: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000260: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000270: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000280: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000290: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000002a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000002b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000002c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000002d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000002e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000002f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000300: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000310: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000320: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000330: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000340: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000350: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000360: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000370: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000380: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000390: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000003a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000003b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000003c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000003d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000003e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000003f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000400: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000410: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000420: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000430: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000440: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000450: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000460: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000470: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000480: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000490: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000004a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000004b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000004c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000004d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000004e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000004f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000500: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000510: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000520: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000530: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000540: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000550: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000560: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000570: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000590: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000005a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000005b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000005c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000005d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000005e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000005f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000610: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000620: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000630: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000640: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000650: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000660: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000670: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000690: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000006a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000006b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000006c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000006d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000006e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000006f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000710: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000720: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000730: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000740: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000750: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000760: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000770: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x00000790: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000007a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000007b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000007c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000007d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000007e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    0x000007f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      OOB Data: ff ff f6 af ae 85 98 d6 10 c1 35 b4 78 d5 49 00
      OOB Data: bd 6e 53 23 92 31 02 b2 c0 4b d5 11 72 00 10 ae
      OOB Data: d1 f6 12 6c 65 3d 68 86 1a db 4a 00 10 ae d1 f6
      OOB Data: 12 6c 65 3d 68 86 1a db 4a 00 ff ff ff ff ff ff

    Kind Regards,

    Markus

  • Hello,

    try to flash the image with "ubiformat" instead of "nandwrite", because nandwrite also writes the pages that contain only 0xFF data, together with the ECC calculated for them. UBIFS may treat these 0xFF-only pages as erased and use them, but then the OOB data will be corrupted.

    If you write the image with u-boot, try to use "nand write.trimffs" command.

    For more details : http://www.linux-mtd.infradead.org/doc/ubi.html#L_flasher_algo
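
To see which pages of an image would be affected, each page can be tested for containing only 0xFF bytes. The following is a rough sketch of such a check on an image file, not what ubiformat does internally. A real flasher, as described on the linked page, drops the empty pages at the end of each eraseblock. The helper name is_all_ff and the default page size of 2048 are assumptions for this example.

```shell
#!/usr/bin/env bash
# is_all_ff FILE PAGE_INDEX [PAGE_SIZE]
# Succeeds if the given page of FILE consists only of 0xFF bytes.
# Note: a page index beyond the end of the file also counts as "empty".
is_all_ff() {
    local file=$1 page=$2 page_size=${3:-2048}
    local other
    # Read one page, delete every 0xFF byte, count what is left.
    other=$(dd if="$file" bs="$page_size" skip="$page" count=1 2>/dev/null \
            | LC_ALL=C tr -d '\377' | wc -c)
    [ "$(( other ))" -eq 0 ]
}

# Example (hypothetical call): is page 1 of the image empty?
# is_all_ff rfs_ubifs.img 1 && echo "page 1 is all 0xFF"
```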

    Also your NAND doesn't support subpage (512 B) writes.

    Regards,

    Sunay Mutlu

  • Hello Sunay,

    since ubiformat did not work properly before, not even under U-Boot, I switched to nand write to do some sanity checks on whether writing to NAND and reading from it succeeds without errors using BCH8. It doesn't. I just used the UBI image to have some data to write and to compare against when reading it back with nand read. I didn't care whether UBI would be able to use it.

    Regarding subpages: U-Boot as well as the driver report that subpages are enabled, and I found this in the read parameter page description of the datasheet: bytes 86–89, number of data bytes per partial page: 00h, 02h, 00h, 00h.

    That would indicate 512-byte subpages, right? Also, using the UBI image with SW ECC and the 512 B subpage layout for UBI (LEB size = 126K instead of 124K) works.
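
The four parameter page bytes can be decoded as follows, assuming multi-byte fields are stored least significant byte first, as in ONFI parameter pages. The helper name le32 is made up for this example.

```shell
#!/usr/bin/env bash
# le32 B0 B1 B2 B3: combine four bytes, least significant byte first.
le32() {
    echo $(( $1 | ($2 << 8) | ($3 << 16) | ($4 << 24) ))
}

le32 0x00 0x02 0x00 0x00   # bytes 86-89 as quoted from the datasheet
```

Under that assumption the field decodes to 0x200, i.e. 512 bytes per partial page.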

    So, I'm still clueless. Thank you,

    Regards,

    Markus

  • Are there any new ideas how to solve this issue?

  • Hi,

    I am also facing a similar issue related to a UBIFS image. I ran the mtd tests for OOB and subpages, and they are failing.

    But other mtd tests like page read, stress test and speed test are working fine.

    What could be the issue, and how can it be resolved?

  • Hi Markus,

    I have a few questions:
    1. You have 7 mtd partitions in total, right?
    2. What is the size of each one?
    3. Is your mtd7 partition 505 MiB?
    Have you done the calculations for the total number of PEBs and LEBs correctly?

    I hope you followed the steps described on this page:
    http://processors.wiki.ti.com/index.php/UBIFS_Support

    I had a different problem where UBIFS could not be mounted. You can refer to the post below in case it gives you a hint:
    http://stackoverflow.com/questions/20917921/unable-to-attach-ubi-to-mtd-partition/21013407#21013407

    The commands I used were as follows:

    $(UBIFS_PATH)/mkfs.ubifs -r  $(TARGET_FS) -m 2048 -e 258048 -c 506 -o ubifs.img
    $(UBIFS_PATH)/ubinize -o $(SYSTEM_CFG)_$(HARDWARE_CFG)_ubifs -m 2048 -p 256KiB -O 2048  $(HARDWARE_CFG)_ubinize.cfg


    The ubinize.cfg I used was as follows:

    [ubifs]
    peb=0x40000
    min_io_size=0x800
    compress_type=zlib
    mode=ubi
    image=ubifs.img
    vol_id=0
    vol_size=124MiB
    vol_type=dynamic
    vol_name=rootfs
    vol_flags=autoresize

    Maybe you can try similar commands and a similar ubinize.cfg for your NAND and see whether it works for you.

    Thank you,

    Regards,
    Ankur

  • Hi Ankur,

    thank you very much for replying.

    1.) Yes, but only 5 of them are on the NAND chip (u-boot1, env, uboot2, kernel, ubi). mtd7 is our ubi partition.
    2.) mtd7 starts at 0x6c0000. So these would be 6912 KiB for the first few partitions.
    3.) 505 MiB is approximately the size of mtd7 (512 MiB-~7 MiB).
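
These numbers can be cross-checked with a little shell arithmetic. The sketch assumes that mtd7 extends from 0x6c0000 to the end of the 512 MiB chip. The variable names are made up for this example.

```shell
#!/usr/bin/env bash
chip_bytes=$(( 512 * 1024 * 1024 ))   # 512 MiB device
mtd7_start=$(( 0x6c0000 ))            # start offset of mtd7
peb_bytes=$(( 128 * 1024 ))           # 128 KiB eraseblocks

# Assumption: mtd7 runs to the end of the chip.
mtd7_bytes=$(( chip_bytes - mtd7_start ))
mtd7_pebs=$(( mtd7_bytes / peb_bytes ))

echo "space before mtd7: $(( mtd7_start / 1024 )) KiB"
echo "mtd7 size:         $mtd7_bytes bytes"
echo "mtd7 eraseblocks:  $mtd7_pebs"
```

The results (6912 KiB, 529793024 bytes, 4042 eraseblocks) agree with the device size and eraseblock count printed by mtd_stresstest in the first post.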

    Our erase size (block size) is 128 KiB. So depending on if you choose to use subpages or not the LEB size would be 126 KiB or 124 KiB.

    mkfs.ubifs -F -x lzo -U -v --root=$RFSDIR --min-io-size=2048 --leb-size=126976 --max-leb-cnt=3957 -o $RFSDIR/../rfs.ubifs
    
    # and the ubinize params
    -m 2048 -p 128KiB -s 2048 -O 2048
    #or
    -m 2048 -p 128KiB -s 512 -O 2048

    The volume was set to dynamic with autoresize and a size of 485 MiB, and we use HW BCH8 as ECC throughout.

    Using this image only works to some degree (the written image is about 40 MiB). When writing it to NAND with U-Boot, it usually works "well" and is mountable by both the kernel and U-Boot. When writing it with the kernel, it either does not work (uncorrectable ECC error -74) or it stops working after some time. Also, after some time relocations may happen, although the FS is not used/written very often.

    Since we noticed this problem, we wanted to verify whether the NAND (HW or SW) had problems. So we switched to plain NAND checks, i.e. writing and reading data, including the mtd tests. Therefore, when writing the UBI images, we were no longer concerned whether UBI could work with them, but rather whether we could read them back (dump the raw data) without any problems. It turned out that this was mostly not the case, at times even when written from U-Boot.


    Since we wanted to get closer to the root cause of the NAND system (HW or SW) not working correctly, we updated to the current OMAP U-Boot from the Arago project, which is also used for the TI814x, and switched to a chip release revision (3.0) of the TI814x instead of the experimental one. We ported mtd_stresstest to U-Boot (using only page-sized accesses and with a readback after each write). It turned out that the driver worked fine there (for (multi-)page-sized accesses).

    We then modified mtd_stresstest to do the same in Linux (page accesses and readback). It turned out that after about 160 read/write operations, so probably about 80 write operations, an error occurred: uncorrectable ECC error -74. To check that the page was clean/erased before writing to it, we read out the page, compared it against 0xFF and printed an error if this was not the case. Suddenly, no error was encountered anymore. To ensure this is not a timing issue, we replaced this empty-page check with a sleep (from 3 ms to 3 seconds) and, as expected, the error returned. This means that there is probably an error in the write function of the mtd or omap2.c driver, when calculating the ECC or writing the data. Probably the ECC engine is not set up correctly before the write, and the preceding read of the empty page removes that condition. This would also fit what we saw when writing the UBI image as a test: reading it back sometimes returned an uncorrectable ECC error, although the dumped data was correct.


    I don't know if we have the time to find the bug within the driver. But since we have pretty much narrowed it down to a driver bug within the write function, which does not occur when the page is empty and read before writing to it, someone at TI might be faster at finding and patching it.

    EDIT: We will also check again whether we really have the same driver as the current one of the Arago project in linux-omap3, i.e. that the mtd subsystem is untouched except for the mtd partition definition.

    Regards,


    Markus

  • Hi Markus,

    How are you writing the UBIFS image to mtd7? I suppose you are using mtd-utils for it, right?

    If yes, which version of mtd-utils are you using?

    If it is an older one, you can get the latest version of mtd-utils from the link below, cross-compile it, put it into the rootfs and use it.

    http://git.infradead.org/?p=mtd-utils.git

    The links below describe all the steps. I recently compiled it for my board by following both of them.

    http://processors.wiki.ti.com/index.php/MTD_Utilities

    http://wiki.beyondlogic.org/index.php/Cross_Compiling_MTD_Utils_for_ARM

    Thank you,

    Regards,

    Ankur

  • The reason I suspected mtd-utils is that flashing the UBIFS image from U-Boot works, while flashing it from the kernel fails.

  • Hi Ankur!


    The UBIFS image and some randomly generated image have been written using mtd-utils. However, due to the previous problems, it is not important to us whether these images represent valid file systems or are even mountable.

    The UBIFS image also works with the kernel (mountable etc.) when written with mtd-utils, but sometimes mounting or just reading the partition with mtd-utils can fail with an uncorrectable error once the kernel has made some changes (relocation of a PEB, ...). This happens even when the image was written with U-Boot before. That's why we suspected a lower-level problem and switched to raw tests with mtd-utils (writing either the ubifs.img or a randomly generated data image > 200 MiB).

    We then encountered these uncorrectable errors too. Also, if the (suspected) broken partition was read out with nand read in U-Boot, it failed there too, meaning that the kernel corrupted the partition. When we temporarily switched to SW ECC for writing the UBI image, it worked for both, although we did not run any longer tests.

    We build our file systems using Yocto. According to the temporary build tree, the version of mtd-utils is 1.49.
    As soon as we have time to test a newer version, we will switch to it.

    The stress test passes when we read out the erased page before we write in mtd_stresstest. Otherwise it fails at times with the uncorrectable ECC error.

    The normal cycle in mtd_stresstest is:

    select randomly whether a read or a write shall be performed

    For a write:

    select a random block -> select a random offset (page-aligned in our case) ->
    select a random length (multiple pages, at most to the end of the next block) ->
    erase blocks (only if writing to an offset with a length that has not been erased before or has been written to already) ->
    write (the generated random) data.

    We added ... -> write data -> read out data and compare. This breaks with an uncorrectable error after some time.
    If we add ... -> erase blocks -> read out affected pages and compare against 0xFF -> write data -> read out data and compare, the test passes without any error.

    Replacing the 0xFF readout with a sleep does not help. So the readout before writing puts the hardware (or driver) into a correct state, whereas a previous write, erase (or read) call leaves the hardware/driver in a bad state.

    I don't think that there is a (significant) error in the stress test, since it is a kernel module and does not depend on mtd-utils. Furthermore, it only uses the erase, read and write functions of the mtd driver. The one ported to U-Boot works fine. That's why we suspect an error in the write function of the Linux driver. For now we just want the low-level part to work, since as long as it does not work correctly, we do not need to consider writing or mounting any file system in HW BCH8 mode. We think that this may also be the cause of the errors when writing the UBI images (or any other data) with mtd-utils.

    Thank you very much, Ankur,


    Markus

  • Hi TI,

    I found that the NAND ready/busy handling is not implemented in the kernel linux-3.2.0-psp04.06.00.10 for the AM335x.

    Is that true? Did I miss something?

    If it is not implemented, how can it be added? Are there any patches?

    If RDY/BSY is implemented, I think the delay implementation is not required. Is that right?

    for ref:

    --------

    drivers/mtd/nand/omap2.c

    /*
    * If RDY/BSY line is connected to OMAP then use the omap ready
    * funcrtion and the generic nand_wait function which reads the status
    * register after monitoring the RDY/BSY line.Otherwise use a standard
    * chip delay which is slightly more than tR (AC Timing) of the NAND
    * device and read status register until you get a failure or success
    */
    if (pdata->dev_ready) {
    	info->nand.dev_ready = omap_dev_ready;
    	info->nand.chip_delay = 0;
    } else {  /* control comes here */
    	info->nand.waitfunc = omap_wait;
    	info->nand.chip_delay = 50;
    }

    Can you please clarify this?

    regards,

    Nagendra


  • Hi Nagendra,

    I think this is not related to the topic of my thread.

    As stated in the comment: if the RDY/BSY line of the NAND chip is connected to the OMAP, the delay approach is not used.


    Regards,


    Markus