DM365: Errors with YAFFS2 file system

Hello,

in my quest to find the best file system to use, I've run into some more errors while using the YAFFS2 file system on my EVM board. Basically, the problem happens when using "small partitions", while the same operations do not fail when using "big partitions". I have not been able to determine the exact size threshold, but the same operations repeated on the same board using two different MTD devices of different sizes give different results. Note that both MTD partitions are a lot bigger than the data I am trying to put in.

Moreover, when the problem happens, the bad block table of the device becomes corrupt until the next reboot.

This is my MTD setup:

dev:    size   erasesize  name
mtd0: 006c0000 00020000 "bootloader"
mtd1: 00060000 00020000 "params1"
mtd2: 00060000 00020000 "params2"
mtd3: 00400000 00020000 "Kernel1"
mtd4: 00400000 00020000 "Kernel2"
mtd5: 01800000 00020000 "Root1"
mtd6: 01800000 00020000 "Root2"
mtd7: 00400000 00020000 "User1"
mtd8: 00400000 00020000 "User2"
mtd9: 7b880000 00020000 "Spare"
mtd10: 00002000 00000010 "spi_eeprom"
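The size and erasesize columns above are hexadecimal byte counts, which makes the partition sizes hard to compare at a glance. A minimal sketch for converting them, assuming a POSIX shell on the target; the function name mtd_sizes is made up for illustration:

```sh
#!/bin/sh
# Sketch: summarize /proc/mtd-style text read from stdin.
# Prints "<dev> <name> <size> KiB (<n> erase blocks)" per partition.
mtd_sizes() {
    while read -r dev size erasesize name; do
        case "$dev" in
            mtd[0-9]*:) ;;        # partition row
            *) continue ;;        # header or blank line
        esac
        bytes=$((0x$size))        # fields are hex without a 0x prefix
        printf '%s %s %d KiB (%d erase blocks)\n' \
            "$dev" "$name" "$((bytes / 1024))" "$((bytes / 0x$erasesize))"
    done
}
```

On the board this would be run as `mtd_sizes < /proc/mtd`. For the table above it reports 24576 KiB (192 erase blocks) for mtd6 and 2023936 KiB for mtd9.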

 

In my test I'll use mtd6 (a 24MB partition) and mtd9 (a nearly 2GB partition, but with a 512MB partition the results are the same).

The operation without problems: I erase mtd9, mount a new yaffs2 file system on it, and untar my root file system into it.

# flash_eraseall /dev/mtd9
Erasing 128 Kibyte @ 1ba0000 --  1 % complete.
Skipping bad block at 0x01bc0000
Erasing 128 Kibyte @ dae0000 -- 11 % complete.
Skipping bad block at 0x0db00000

...

There are some bad blocks, but they do not cause any problem, as expected.
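flash_eraseall reports bad blocks as byte offsets into the partition, while yaffs reports block numbers. To relate the two, divide the offset by the erase size (0x20000 here, from the MTD table). A small sketch; offset_to_block is a hypothetical helper name:

```sh
#!/bin/sh
# offset_to_block OFFSET ERASESIZE -> block index within the partition
offset_to_block() {
    echo $(( $1 / $2 ))
}

offset_to_block 0x01bc0000 0x20000   # prints 222
offset_to_block 0x0db00000 0x20000   # prints 1752
```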

# mount -t yaffs2 /dev/mtdblock9 /mnt/point

No errors on mount command

# time tar -x -C /mnt/point -f /mnt/nfs/rootfs.arm.tar.gz
real    2m 10.99s
user    0m 0.34s
sys     1m 58.84s

Untarring on yaffs2 is a bit slow, but it works as expected. No errors, and the file system can be unmounted and mounted back. Note that the output of df shows the used space is a lot smaller than the partition. This is correct.

# df
Filesystem           1K-blocks      Used Available Use% Mounted on
mtd:Root1                24576      8464     16112  34% /
tmpfs                    25600         0     25600   0% /upgrade
tmpfs                     1024        64       960   6% /tmp
tmpfs                     1024        16      1008   2% /root
tmpfs                     1024        64       960   6% /dev
tmpfs                     1024        64       960   6% /var
tmpfs                     1024       424       600  41% /var/log
/dev/mtdblock7            4096       400      3696  10% /flashrw
/dev/mtdblock9         2023936      8368   2015568   0% /mnt/point

Now, the incorrect behavior: I'll repeat the same operation on mtd6, a 24MB partition.

# flash_eraseall /dev/mtd6
Erasing 128 Kibyte @ 17e0000 -- 99 % complete.

No bad blocks, good.

# mount -t yaffs2 /dev/mtdblock6 /mnt/point

Mounting is fine, as in the previous case. But now untarring the data into the fs causes problems. Note that using "cp" to copy the data in causes the same errors.

# tar -x -C /mnt/point -f /mnt/nfs/rootfs.arm.tar
tar: can't open './usr/share/zoneinfo/Africa/Sao_Tome': Cannot allocate memory

At some point during the untar, the system ran out of memory.

Notice the output of dmesg, truncated since it is too long:

# dmesg
**>> Block 111 retired
Block 111 is in state 9 after gc, should be erased
yaffs: Block struck out
nand_update_bbt: Out of memory
nand_erase: attempt to erase a bad block at page 0x00006ac0
yaffs: Failed to mark bad and erase block 112
**>> Block 112 retired
Block 112 is in state 9 after gc, should be erased
yaffs: Block struck out
yaffs: Block struck out
nand_update_bbt: Out of memory
...
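The nand_erase message gives a page number on the whole chip, while yaffs numbers blocks within the partition. A hedged sketch of how the two can be related, under assumptions that are not stated in this thread: 2 KiB pages (so 64 pages per 128 KiB erase block) and partitions laid out back to back from offset 0. The helper name page_to_part_block is made up for illustration.

```sh
#!/bin/sh
# Assumptions (not from the thread): 2 KiB pages, so 64 pages per
# 128 KiB erase block; partitions are contiguous starting at offset 0.
PAGES_PER_BLOCK=64
ERASESIZE=0x20000

# page_to_part_block PAGE PART_START_OFFSET -> block index inside the partition
page_to_part_block() {
    chip_block=$(( $1 / $PAGES_PER_BLOCK ))
    first_block=$(( $2 / $ERASESIZE ))
    echo $(( chip_block - first_block ))
}

# Start of mtd6 = sum of the sizes of mtd0..mtd5 from the MTD table
MTD6_START=$(( 0x006c0000 + 0x00060000 + 0x00060000 + 0x00400000 + 0x00400000 + 0x01800000 ))

page_to_part_block 0x00006ac0 "$MTD6_START"   # prints 111
```

If those assumptions hold, page 0x00006ac0 is the first page of chip block 427, which is block 111 of mtd6, the block yaffs had just retired.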

But what's worse is that the bad block table in memory is corrupted: notice the result of flash_eraseall repeated on the partition:

# umount /mnt/point

# flash_eraseall /dev/mtd6
Skipping bad block at 0x00000000
Skipping bad block at 0x00020000
Skipping bad block at 0x00040000
Skipping bad block at 0x00060000
Skipping bad block at 0x00080000
Skipping bad block at 0x000a0000
Skipping bad block at 0x000c0000
...

But the previous flash_eraseall on the same mtd device did not detect any problem. If I reboot the board, the problems with bad blocks disappear.
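To compare the bad blocks reported before and after the failure, the skipped offsets can be pulled out of the flash_eraseall output. A minimal sketch, assuming the messages have exactly the form shown above and that progress lines may be separated by carriage returns; skipped_offsets is a hypothetical helper name:

```sh
#!/bin/sh
# Print the offsets that flash_eraseall reported as bad, one per line.
# Reads the tool's output on stdin.
skipped_offsets() {
    # Progress lines may end in carriage returns; turn them into
    # newlines so every message starts at the beginning of a line.
    tr '\r' '\n' |
        sed -n 's/^Skipping bad block at \(0x[0-9a-fA-F]*\).*$/\1/p'
}
```

For example, `flash_eraseall /dev/mtd6 2>&1 | skipped_offsets > after.txt` saves one list; a list saved from the first run can then be compared against it with diff, and `wc -l` gives the number of skipped blocks.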

 

  • I did a complete reinstall via CCS + JTAG, with NAND erase.

    The NAND erase algorithm finds lots of bad blocks on NAND 1 and writes them into the bad block table.

    Somehow this bad block table is ignored later.

    On flash_eraseall of mtd3, lots of blocks are skipped.

    Restoring the nand filesystem gives no error - on boot up I see the following:

    yaffs: dev is 32505859 name is "mtdblock3"
    yaffs: Attempting MTD mount on 31.3, "mtdblock3"
    block 312 is bad
    block 854 is bad
    block 1100 is bad
    block 1108 is bad
    block 1266 is bad
    block 1334 is bad
    block 1712 is bad
    block 2004 is bad
    block 2026 is bad
    block 2036 is bad
    block 2038 is bad
    block 2058 is bad
    block 2060 is bad
    block 2206 is bad
    block 2222 is bad
    block 2298 is bad
    block 2407 is bad
    block 2504 is bad
    block 2512 is bad
    block 2570 is bad
    block 2574 is bad
    block 2592 is bad
    block 2598 is bad
    block 2710 is bad
    block 3034 is bad
    block 3052 is bad
    block 3090 is bad
    block 3098 is bad
    block 3232 is bad
    block 3310 is bad
    block 3670 is bad
    block 3744 is bad
    block 3746 is bad
    block 3768 is bad
    block 3876 is bad
    block 3958 is bad
    block 3974 is bad
    VFS: Mounted root (yaffs2 filesystem).
    Freeing init memory: 196K
    Warning: unable to open an initial console.
    Kernel panic - not syncing: No init found.  Try passing init= option to kernel.

    So I don't think there is any handling of bad blocks.

    I have a "REV D" board with Micron flash.

    I have had problems since first boot.

  • This seems to be the same problem I am experiencing...

    Bad block table is ok.

    I flash_eraseall and it works, the only bad blocks are those in the table.

    Then, when I mount yaffs2 and copy data in, a lot of errors are reported for blocks that are not actually bad, and the BBT in memory is corrupted. In fact, if I run flash_eraseall again, a lot of blocks are skipped since they are marked as bad.

    Then if I reboot the errors disappear since the "correct" BBT is used.

     

  • Make sure to have software ECC enabled in the kernel; using hardware ECC will result in quite a few problems with YAFFS2.

  • Thanks for your suggestion! But by "software ECC" do you mean enabling it in the MTD device (the NAND driver must be modified) or in the YAFFS file system options?

     

  • Please consult the DaVinci wiki, as it covers this (and other quite useful stuff):

    http://wiki.davincidsp.com/index.php/Put_YAFFS_Image_to_Flash -> How to put YAFFS2 in flash

    http://wiki.davincidsp.com/index.php/Disabling_NAND_HW_ECC_support -> Disabling HW ECC support in NAND MTD driver

    I have had no problems with YAFFS2 on the DM365. In fact, it has become the de facto filesystem of choice for our applications, as it is quite fast (about 5x faster than JFFS2 for mounting).

    Jerry

  • Will try that....

    I was just under the illusion that I could restore my board using the "default" kernel image and the "default" procedures as described.

    So I was missing the following lines:

    EVMDM365 is shipped with more than one type of nand flash.

    The nand flash differs in block size which triggers the need to enable/disable HW_ECC in the kernel depending on flash type.

    Kernel image xx is compiled for board revisions xx.

    For REV-D board you need to build image with xx features changed.

     

    rgds.