The problem of the yaffs2 filesystem on dm365

zhenhong

When I mount the NAND, restore the filesystem to it, and then boot from NAND, the first boot is fine. But when I reset and boot a second time, it prints many "Partially written block xxxx being set for retirement" messages. Why, and what can I do to resolve this?

over 14 years ago

Tom Hanson over 14 years ago

I've been having similar issues. LOTS of "block xxx prioritized" messages at unmount (after restoring the tar file into the filesystem or copying files in manually); lots of blocks being retired and struck out (both at mount and while running); lots of bbt out of memory errors at the same time.

I just switched to JFFS2 and all is quiet. Haven't run with it very long but it looks much better.

Marco Braga over 14 years ago in reply to Tom Hanson

Confirmed by me too. There is a problem with yaffs2. Too bad, since yaffs2 is way superior to JFFS2.

Craig Smith over 14 years ago in reply to Marco Braga

I'm seeing the same issue w/ yaffs2. However, I have not seen any actual problems other than the "Partially written block...." printouts.

Marco, when you said "there is a problem with yaffs2", can you elaborate?

I've tried jffs2, but it is painfully slow to mount.

Marco Braga over 14 years ago in reply to Craig Smith

Hi,

I'm referring to this:

http://e2e.ti.com/forums/t/10966.aspx

Craig Smith over 14 years ago in reply to Marco Braga

Thanks for the reply Marco.

Did you ever try your experiment using software ECC (i.e. .ecc_mode = NAND_ECC_SOFT in nand device setup)?

Marco Braga over 14 years ago in reply to Craig Smith

Sadly I've not had the time to try it. Also I'd like to understand the performance hit due to software ECC.

Kevin Claycomb over 14 years ago in reply to Marco Braga

I have been running YAFFS2 from NAND with software ECC compiled into my kernel and have seen none of the issues you guys are reporting. I have not run any performance comparisons, but my instinct is that the hit you take by going with JFFS2 instead of YAFFS2 is worse than the one you take with software ECC. Just my 2 cents.

Kevin

Craig Smith over 14 years ago in reply to Kevin Claycomb

That's good news Kevin. Can you (or anyone) tell me why there isn't a mkfs.yaffs2 tool to generate a binary image file from a directory that can be burned directly to flash (with u-boot or something else)?

It seems the only way to create an initial yaffs2 filesystem is with an NFS mount, then mount yaffs2, then untar some file onto it. Or am I missing something?

Kevin Claycomb over 14 years ago in reply to Craig Smith

Unfortunately I do not think you are missing anything, or at least there isn't anything that has been widely adopted, as far as my brief searches show. My process is as you described, except that I use the MontaVista DevRocket tool to build my platform image.

1. Use the MontaVista platform image builder to generate an ext3 filesystem file.

2. Mount the file using a loopback device, e.g. mount -t ext3 -o loop myfilesys.ext3 /tmp/tempFS

3. cp all files to a temp directory.

4. Add (or make install) modules, the DVSDK, or anything else you want on your root FS.

5. tar it up (tar -czf) and copy it to your NFS mount point.

6. (On the board) erase the NAND, mount your NAND partition as yaffs2, and untar your filesystem onto it.

It's painful, but can be automated with a little scripting.
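Step 6 is the part that runs on the board, so it is the natural piece to script. The sketch below is a hypothetical C helper (build_restore_cmds and print_script are illustrative names, not an existing tool) that produces the erase/mount/untar/unmount sequence for one MTD partition. The partition number, mount point and tarball path are placeholders, and it assumes the mtd-utils flash_eraseall tool used elsewhere in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CMDS 5
#define CMD_LEN  128

/* Hypothetical helper: fills cmds[] with the board-side commands for
 * restoring a root filesystem tarball onto a yaffs2 MTD partition.
 * Returns the number of commands written. */
static int build_restore_cmds(int mtd, const char *mnt, const char *tarball,
                              char cmds[MAX_CMDS][CMD_LEN])
{
    assert(mtd >= 0 && mnt != NULL && tarball != NULL);
    int n = 0;
    snprintf(cmds[n++], CMD_LEN, "flash_eraseall /dev/mtd%d", mtd);   /* erase partition */
    snprintf(cmds[n++], CMD_LEN, "mount -t yaffs2 /dev/mtdblock%d %s", mtd, mnt);
    snprintf(cmds[n++], CMD_LEN, "tar -xzf %s -C %s", tarball, mnt);  /* unpack rootfs */
    snprintf(cmds[n++], CMD_LEN, "sync");                             /* flush caches */
    snprintf(cmds[n++], CMD_LEN, "umount %s", mnt);
    return n;
}

/* Prints the commands as a shell script that stops at the first failure. */
static void print_script(FILE *out, char cmds[][CMD_LEN], int n)
{
    fprintf(out, "#!/bin/sh\nset -e\n");
    for (int i = 0; i < n; i++)
        fprintf(out, "%s\n", cmds[i]);
}
```

The output of print_script() could be saved next to the tarball on the NFS share and run on the board. The explicit sync before umount is a cautious choice rather than a yaffs2 requirement, since umount itself writes back pending data.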

Kevin

Marco Braga over 14 years ago in reply to Kevin Claycomb

Hi Kevin,

Can you please describe how you disabled hardware ECC and enabled software ECC? Thanks!

Kevin Claycomb over 14 years ago in reply to Marco Braga

Marco,

One wrinkle may be that I am using the Montavista Pro 5 2.6.18 distribution. I am not sure how much divergence it has from the TI git tree. That being said I have enabled the [Device Driver->MTD->NAND Flash Device Drivers->NAND Device Support->NAND ECC Smart Media Byte Order] option in my linux kernel xconfig window. I am also using a 4K page size NAND device.

Kevin

John Reynolds over 14 years ago in reply to Kevin Claycomb

Just to add my 2 cents: I've been running YAFFS2 with the 2.6.10 kernel patched for software ECC, with no problems, on the DM355EVM PCB. The number of bad blocks does not increase on subsequent boot-ups and I don't see any MTD errors. I have the YAFFS2 partition mounted as root.

One problem I have seen is on our custom PCB, which is based on the DM355EVM but uses a smaller 2Gb flash chip, an MT29G2G. I'm able to flash it with the DM3xx flash boot utils 1.50 and load U-Boot with no errors. I can save params in U-Boot for a number of cycles, then U-Boot complains about a "bad erase block", which prevents you from saving the param block. I put the PCB away and the next day I was able to flash in the params! As expected, the problem returned. We changed the flash chip; it worked for a while, then we got errors flashing in a uImage. As a last resort I did a nand scrub and re-flashed U-Boot, and it seems to be working so far. I'm not sure yet whether this is a hardware or a software issue.

John Reynolds

Marco Braga over 14 years ago in reply to John Reynolds

Hi,

Can you comment on reliability after an unclean unmount / power cycle?

Thanks!

Sandeep Paulraj over 14 years ago

We have patches that apply on top of LSP 2.6.18 to take care of this.

The patch is basically a backport of the latest version of yaffs2 that was available around April '09.

Craig Smith over 14 years ago in reply to Sandeep Paulraj

Sandeep, can you be specific about exactly what "this" is when you say the patch "takes care of this"? Also, are these patches available for download from TI?

Regards,

-Craig

Sandeep Paulraj over 14 years ago in reply to Craig Smith

Use the attached patch

John Reynolds over 14 years ago in reply to Marco Braga

Not sure if this is addressed to me, but I'm using YAFFS2 as the root so I don't normally unmount it.

I have no problems rebooting or re-powering. No MTD errors or more bad blocks etc.

John Reynolds

Marco Braga over 14 years ago in reply to John Reynolds

I'm asking about reliability because I've had some bad surprises with JFFS2. I expected it to withstand power cycles, but in several cases it failed. I've even seen several devices fail to boot because JFFS2 crashed the kernel when mounting a file system damaged by a power cycle during a write. If yaffs2 is superior in this respect, it might become my file system of choice. Too bad there is no way to create an image and write it with nandwrite; that is much faster with JFFS2.

Craig Smith over 14 years ago in reply to Marco Braga

FYI...I applied the patch posted by Sandeep and I no longer see the "Partially written block ... being set for retirement" message. I have not seen any issues yet, but I don't have too many cycles with it. Thanks.

Craig Smith over 14 years ago in reply to Craig Smith

OK, I spoke too soon. I am now seeing a problem where changes I make on the file system are not reflected after a reboot (i.e. I modify a file, reboot, and the file is restored to the original version). Any ideas?

John Reynolds over 14 years ago in reply to Craig Smith

I don't change the contents of files much across reboots, but I will check this out next week when I get back to work.

John

Marco Braga over 14 years ago in reply to Craig Smith

Try checking for yaffs2 errors with "dmesg". Perhaps there is something in there.

Pavel2 over 14 years ago

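You need to upgrade the yaffs2 source code. You can use the latest yaffs source, but a few points need attention:

1. The original setup uses hardware ECC, but the OOB area is only 64 bytes and hardware ECC uses 40 of them. What remains is not enough for yaffs2, so you need to use software ECC.

2. Modify the source of mkyaffs2image.c to add the 64 bytes of OOB data, using software ECC.

3. Modify some settings in nand_base.c. TI's default is hardware ECC, and the configuration option for software ECC has some problems that need to be fixed.

4. Modify U-Boot so that it can correctly program the filesystem image.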
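The OOB arithmetic in point 1 can be made explicit. This sketch assumes a 2048-byte page with a 64-byte OOB area, that the 4-bit hardware ECC stores 10 bytes per 512-byte step (consistent with the 40-byte figure), and that Linux's software ECC, a Hamming code, stores 3 bytes per 256-byte step. It deliberately ignores the bad-block marker and does not try to say how many bytes yaffs2 tags need, since that depends on the yaffs2 version and configuration.

```c
#include <assert.h>

/* ECC bytes per page: one ECC step covers step_size data bytes and
 * stores bytes_per_step check bytes in the OOB area. */
static int ecc_bytes_per_page(int page_size, int step_size, int bytes_per_step)
{
    assert(step_size > 0 && page_size % step_size == 0);
    return (page_size / step_size) * bytes_per_step;
}

/* OOB bytes left for filesystem tags after ECC. Simplification: the
 * bad-block marker and any other reserved OOB bytes are ignored. */
static int oob_left(int oob_size, int ecc_bytes)
{
    assert(ecc_bytes <= oob_size);
    return oob_size - ecc_bytes;
}
```

Under these assumptions hardware ECC leaves 24 OOB bytes per page and software ECC leaves 40. For a small-page part (512 + 16 bytes), software ECC would use 6 of the 16 OOB bytes.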

John Reynolds over 14 years ago in reply to Pavel2

Say what? :)

I haven't noticed any abnormal YAFFS2 errors in the logs with my 2.6.10 build. I applied the software ECC patch.

I update my root on a regular basis via the Linux 'cp' command. I will keep an eye on it and post any problems.

John

Kevin Claycomb over 14 years ago in reply to John Reynolds

All, I can report that the yaffs2 FS seems to be fairly robust to power cycles (as advertised). We have been running the filesystem in NAND flash on our three prototypes for about 2 weeks and have had no file system related issues at all. These boards are all seeing lots of reboots and hard power cycles as we are testing other parts of the prototype design.

Kevin

Craig Smith over 14 years ago in reply to Kevin Claycomb

I am experiencing a major problem using the yaffs2 filesystem. I am using 2.6.18 kernel patched with the patch supplied previously in this thread.

When I edit a file (a shell script for example), and reboot, the changes are gone. The original file reappears.

Also, when I copy a new file onto the filesystem (for example from an NFS-mounted directory), the file is gone after a reboot and I see:

ls: ./��y

Kevin, are you using 2.6.18? If so, did you apply the patch from this thread?

-Craig

Craig Smith over 14 years ago in reply to Craig Smith

Is it possible I am simply mounting the filesystem wrong? Can someone post their cmdline args and /etc/fstab they use with the yaffs2 filesystem? Thanks so much.

Craig Smith over 14 years ago in reply to Craig Smith

I just wanted to add some more debug info in case it might help someone give me a clue as to what is wrong.

It appears I am able to mount/unmount/modify files etc. with no problem when I just mount the yaffs2 filesystem on a temporary mount point (when my root is mounted via NFS).

For example:

mount -t yaffs2 /dev/mtdblock3 /mnt/nand
cd /mnt/nand

<<then I can copy files, edit files, etc.>>

cd;umount /mnt/nand

Then I can re-mount, reboot, etc. that same device (but still with NFS root) and the changes to the file system are persistent. It seems only when I am using the yaffs2 filesystem as my root, that I see the problem.

Thanks in advance for any advice.

Regards,

-Craig

Kevin Claycomb over 14 years ago in reply to Craig Smith

Craig,

I am using the 2.6.18 kernel from Montavista.

From what you are describing it sounds like maybe there could be an issue with your boot parameters?? What are the boot args that you are passing into the linux kernel from u-boot in order to have the kernel mount the nand as the root filesystem? You should have a read write option in there "rw". Is it possible that this has been omitted and the filesystem is being mounted readonly?? I am not sure what the kernel would do by default. I have all of my boot args on my work PC, I will post them up tomorrow.
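Kevin's point about "rw" is easy to check mechanically. In mainline Linux the root filesystem is mounted read-only unless rw appears on the kernel command line, so without it root stays read-only unless an init script remounts it. The C sketch below is a hypothetical sanity check, not part of any TI or MontaVista tool: it splits a bootargs string on spaces and requires root=, rootfstype=yaffs2 and rw, while rejecting any ro token.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the space-separated cmdline contains `tok` as a whole token,
 * or, when `tok` ends in '=', a token starting with it (e.g. "root="). */
static bool has_token(const char *cmdline, const char *tok)
{
    assert(cmdline != NULL && tok != NULL);
    size_t tlen = strlen(tok);
    bool prefix = tlen > 0 && tok[tlen - 1] == '=';
    const char *p = cmdline;
    for (;;) {
        while (*p == ' ')
            p++;                          /* skip separators */
        if (*p == '\0')
            return false;
        const char *start = p;
        while (*p != '\0' && *p != ' ')
            p++;                          /* find the end of this token */
        size_t len = (size_t)(p - start);
        bool match = prefix ? len >= tlen : len == tlen;
        if (match && strncmp(start, tok, tlen) == 0)
            return true;
    }
}

/* Hypothetical check: root on yaffs2, explicitly read-write. Any "ro"
 * token counts as a failure rather than modelling which flag wins. */
static bool yaffs2_root_is_writable(const char *cmdline)
{
    return has_token(cmdline, "root=") &&
           has_token(cmdline, "rootfstype=yaffs2") &&
           has_token(cmdline, "rw") &&
           !has_token(cmdline, "ro");
}
```

Matching whole tokens (rather than using strstr) avoids false positives such as "rw" inside another argument.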

Kevin

Craig Smith over 14 years ago in reply to Kevin Claycomb

I thought it might be my bootargs, but they look OK to me:

setenv bootargs console=ttyS0,115200n8 root=/dev/mtdblock3 rw rootfstype=yaffs2 ip=192.168.1.86:192.168.1.1:192.168.1.1:255.255.255.0:::off mem=180M video=davincifb:osd0=1920x1080x16,4050K@0,0:osd1=1920x1080x4,2025K@0,0 dm365_imp.oper_mode=0 davinci_capture.device_type=4

I also wondered if it had something to do with the "checkpoint" feature.

Kevin, are you using the 2.6.18 patch supplied from this thread? Also, can you post your YAFFS kernel options? Here are mine:

grep YAFFS .config
CONFIG_YAFFS_FS=y
CONFIG_YAFFS_YAFFS1=y
# CONFIG_YAFFS_9BYTE_TAGS is not set
CONFIG_YAFFS_DOES_ECC=y
# CONFIG_YAFFS_ECC_WRONG_ORDER is not set
CONFIG_YAFFS_YAFFS2=y
CONFIG_YAFFS_AUTO_YAFFS2=y
# CONFIG_YAFFS_DISABLE_LAZY_LOAD is not set
# CONFIG_YAFFS_DISABLE_WIDE_TNODES is not set
CONFIG_YAFFS_ALWAYS_CHECK_CHUNK_ERASED=y
CONFIG_YAFFS_SHORT_NAMES_IN_RAM=y

Kevin Claycomb over 14 years ago in reply to Craig Smith

My bootargs look similar to yours, so I guess that's not the problem.

console=ttyS0,115200n8 noinitrd ip=192.168.10.107:255.255.255.0:::DM365 root=/dev/mtdblock3 rw rootfstype=yaffs2 mem=76M video=davincifb:vid0=OFF:vid1=OFF:osd0=720x576x16,4050K dm365_imp.oper_mode=0 davinci_capture.device_type=4

However, I did not patch my kernel for yaffs2. Apparently you don't need to if you use the MontaVista source?

My Kernel params are below... hope this helps.

CONFIG_YAFFS_FS=y
CONFIG_YAFFS_YAFFS1=y
# CONFIG_YAFFS_9BYTE_TAGS is not set
# CONFIG_YAFFS_DOES_ECC is not set
CONFIG_YAFFS_YAFFS2=y
CONFIG_YAFFS_AUTO_YAFFS2=y
# CONFIG_YAFFS_DISABLE_LAZY_LOAD is not set
# CONFIG_YAFFS_DISABLE_WIDE_TNODES is not set
# CONFIG_YAFFS_ALWAYS_CHECK_CHUNK_ERASED is not set
CONFIG_YAFFS_SHORT_NAMES_IN_RAM=y

Kevin

Craig Smith over 14 years ago in reply to Kevin Claycomb

Hello Kevin, and others. I changed my kernel config to match Kevin's and my YAFFS2 filesystem started working great (although it doesn't seem like the options that differed should have mattered). However, now I have a new problem related to bad blocks: it seems that when the flash device detects bad blocks, I am not able to mount the filesystem.
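The two YAFFS config fragments posted above can be compared mechanically. The sketch below is a small, hypothetical C helper (not a kernel script) that parses CONFIG_X=value and "# CONFIG_X is not set" lines, treats an absent option as not set, and prints every option whose value differs. Run on Craig's and Kevin's YAFFS options, it reports two differences: CONFIG_YAFFS_DOES_ECC and CONFIG_YAFFS_ALWAYS_CHECK_CHUNK_ERASED, both enabled for Craig and disabled for Kevin.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 64
#define VAL_LEN  16

struct opt { char name[NAME_LEN]; char val[VAL_LEN]; };

/* Parses "CONFIG_X=v" or "# CONFIG_X is not set" (value "n").
 * Returns 1 if the line is a config option, 0 otherwise. */
static int parse_line(const char *line, struct opt *o)
{
    assert(line != NULL && o != NULL);
    if (line[0] == '#') {
        if (sscanf(line, "# %63s", o->name) == 1 &&
            strncmp(o->name, "CONFIG_", 7) == 0 &&
            strstr(line, " is not set") != NULL) {
            strcpy(o->val, "n");
            return 1;
        }
        return 0;
    }
    const char *eq = strchr(line, '=');
    if (strncmp(line, "CONFIG_", 7) != 0 || eq == NULL ||
        (size_t)(eq - line) >= NAME_LEN)
        return 0;
    memcpy(o->name, line, (size_t)(eq - line));
    o->name[eq - line] = '\0';
    snprintf(o->val, VAL_LEN, "%s", eq + 1);
    return 1;
}

/* Copies the value of `name` into out ("n" if absent); returns 1 if present. */
static int value_of(const char *const *lines, int n, const char *name, char *out)
{
    struct opt o;
    strcpy(out, "n");  /* an absent option behaves like "is not set" */
    for (int i = 0; i < n; i++)
        if (parse_line(lines[i], &o) && strcmp(o.name, name) == 0) {
            strcpy(out, o.val);
            return 1;
        }
    return 0;
}

/* Prints every option whose value differs between a and b; returns the count. */
static int diff_config(const char *const *a, int na, const char *const *b, int nb)
{
    struct opt o;
    char other[VAL_LEN];
    int diffs = 0;
    for (int i = 0; i < na; i++) {          /* options listed in a */
        if (!parse_line(a[i], &o)) continue;
        value_of(b, nb, o.name, other);
        if (strcmp(o.val, other) != 0) {
            printf("%s: %s -> %s\n", o.name, o.val, other);
            diffs++;
        }
    }
    for (int i = 0; i < nb; i++) {          /* options listed only in b */
        if (!parse_line(b[i], &o)) continue;
        if (!value_of(a, na, o.name, other) && strcmp(o.val, other) != 0) {
            printf("%s: %s -> %s\n", o.name, other, o.val);
            diffs++;
        }
    }
    return diffs;
}
```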

For example, if I create a new YAFFS2 filesystem using the "flash_eraseall /dev/mtd3, mount, untar, etc." method on a flash partition where bad blocks are detected, I can reboot and mount the filesystem once and it appears to work. However, the next time I reboot the filesystem can no longer be mounted (whether I make a change to the filesystem or not). I can post more details with kernel messages tomorrow when I'm in the office again. I am using 2.6.18 (Montavista Pro 5.0 from TI) with the patch supplied in this thread.

Has anyone experienced this? Shouldn't bad blocks be handled correctly no matter where they are detected?

Regards,

-Craig

Kevin Claycomb over 14 years ago in reply to Craig Smith

Craig,

I am not experiencing the bad-block issues you are describing. However, the hardware I am running on now is significantly different from the EVM board with respect to the NAND flash: I am using only one NAND bank instead of two, and it is a much smaller chip. I am not sure whether that is relevant.

Kevin

Craig Smith over 14 years ago in reply to Kevin Claycomb

Yes, I am also using custom hardware, totally different from the EVM, with a 256MB Numonyx NAND flash. Edit: Do you know if the bad block table U-Boot creates is the same as the one created by the kernel (assuming you are using U-Boot)?

Thanks.

Craig Smith over 14 years ago in reply to Craig Smith

Kevin, let me ask another question....is your system tolerant of power loss (wrt the filesystem)? In other words can you "pull the plug" on your board and have it remount the filesystem successfully? I think this might be the cause of the problem I'm seeing. If I restart with a "reboot" command or "shutdown" and then power cycle, then it seems I can remount the yaffs2 successfully (even with bad blocks).

However, if I just pull the power (just in an "idle" state, not doing any explicit file I/O) I am not able to remount. It was my understanding that yaffs2 was tolerant of these sorts of things. Please let me know your experience with this situation. Thanks so much.

Regards,

-Craig

Kevin Claycomb over 14 years ago in reply to Craig Smith

Craig,

My system has seen a decent amount of power loss without a proper shutdown. We have not seen any real problems and Yaffs2 does seem to be tolerant of it.

However, you mentioned that you were using a different NAND chip from the EVM; does your chip have two banks or one? You may be aware of this, but on the EVM board the 2GB Micron part has two banks of NAND. The EVM switches between the two banks using only one of the EMIF's CE lines plus an address pin. These lines are decoded in the Altera CPLD into two separate chip enables for the NAND chip. Since my custom design uses only one bank of NAND (I am using an MT29F4G08AAC 512MB Micron part), I had to alter the Linux driver so that it does not use that address line as a second CE. This was causing weird issues because the kernel was actually selecting the same chip twice without realizing it, so in effect it was writing to the same bank while thinking it was two different memory areas. The funny thing was that it would work fine until I tried to reboot, and then it showed all kinds of different behaviors.

This is a short explanation for a complicated issue so if you think you might have a similar problem let me know and I will go into it in more depth. Outside of that, I am not sure what could be going wrong. Have you tried using any other NAND parts?
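The aliasing Kevin describes can be reproduced in a toy model. The C sketch below is a simplified simulation, not the DaVinci NAND driver: the driver believes there are two chips; on an EVM-style board the CPLD turns the extra address line into a second chip enable, while on a single-chip board both logical chips reach the same device. The struct board type and its fields are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define BLOCKS 4

/* Toy board: two physical chip arrays, but on a single-chip board only
 * flash[0] is ever reached. Not the DaVinci driver's data structures. */
struct board {
    unsigned char flash[2][BLOCKS];
    int decodes_second_chip;   /* 1: EVM-style CPLD decode, 0: one chip */
};

/* Physical chip actually selected when the driver asks for `logical`. */
static int physical_chip(const struct board *b, int logical)
{
    assert(logical == 0 || logical == 1);
    return b->decodes_second_chip ? logical : 0;
}

static void write_block(struct board *b, int chip, int block, unsigned char v)
{
    assert(block >= 0 && block < BLOCKS);
    b->flash[physical_chip(b, chip)][block] = v;
}

static unsigned char read_block(const struct board *b, int chip, int block)
{
    assert(block >= 0 && block < BLOCKS);
    return b->flash[physical_chip(b, chip)][block];
}
```

On the single-chip board, writing block 0 of "chip 1" silently overwrites block 0 of "chip 0", which only shows up later when the data is read back, for example after a reboot. Limiting the driver to a single chip, as Craig later does by passing 1 to nand_scan(), corresponds to never using logical chip 1 in this model.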

Kevin

Craig Smith over 14 years ago in reply to Kevin Claycomb

Kevin, I was aware of the addressing-line-chipselect-using-the-CPLD on the EVM. However, since our flash has only 1 chip select (but 2 "internal banks") I did not think I needed to modify the kernel driver. We have tried using other NAND parts (Micron), but only in another prototype board and did not have many cycles on it.

So, I would be interested in the changes you made to disable the other chip select. Something in drivers/mtd I guess. I don't believe I've touched that code, so a patch may work.

Thanks,

-Craig

Craig Smith over 14 years ago in reply to Craig Smith

Kevin, I just looked at my kernel code again and it turns out I did make a small change to "force" it to only look for 1 NAND chip.

In drivers/mtd/nand/davinci.c:nand_davinci_probe(), I made the following change:

    /* Scan for the device.
     * Original: nand_scan(info->mtd, info->data_res[1] ? 2 : 1)
     * cds: force a single "chip", since this board has one chip select. */
    if (nand_scan(info->mtd, 1)) {
        dev_err(dev, "no nand device detected\n");
        err = -ENODEV;
        goto out_unuse_clk;
    }

At the time, everything appeared to work. But maybe this isn't correct or you know of additional changes that need to be made. Please let me know....thanks again.

Regards,

-Craig

Craig Smith over 14 years ago in reply to Craig Smith

Hi Kevin, I thought I'd report that I've found that if I do "sync" prior to powering off the device I do not see the failures when I reboot. However, this is obviously not a solution.
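Running sync before power-off helps because written data can sit in RAM caches until the kernel writes it back. A more targeted option for an application is to flush the specific files it cares about. The C sketch below is a hypothetical helper, assuming a POSIX system (whether the yaffs2 version in question fully honours fsync() is an assumption): it writes a file, flushes the stdio buffer with fflush(), then calls fsync() on the underlying descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `text` to `path`, then ask the kernel to push
 * the file's data to the storage device before returning.
 * Returns 0 on success, -1 on any failure. */
static int write_durably(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(text, f) >= 0       /* into the stdio buffer */
          && fflush(f) == 0            /* stdio buffer -> kernel */
          && fsync(fileno(f)) == 0;    /* kernel cache -> device */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

fsync() on the file does not by itself guarantee that the directory entry of a newly created file has reached flash; for that, the containing directory can also be opened and fsync()ed. This only narrows the window in which data is cached; it does not address the bad-block behaviour Craig reports.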

I am still interested in what changes you made to the kernel to "undo" the address-line-chip-select function used for the EVM. Thanks.

Regards,

-Craig

Craig Smith over 14 years ago in reply to Craig Smith

My problem appears to be related to bad blocks. I do not see the "reboot/can't mount without an explicit sync" problem on boards that don't have bad blocks in the filesystem partition. So I have a few questions that hopefully someone can answer:

1) Do you have NAND_USE_FLASH_BBT option enabled (it is not by default if using .ecc_mode= NAND_ECC_SOFT, which is needed for yaffs2).

2) If the answer to 1) is "yes", did you also modify u-boot to use ECC_SOFT (otherwise the kernel reports ECC errors when reading the BBT)?

3) If the answer to 1) is "no", then is the kernel/filesystem supposed to create its own BBT and still handle bad blocks correctly?

I am still digging in to this, but it would be great if someone could answer these questions.

Thanks,

-Craig

Hao Liu48112 over 13 years ago in reply to Craig Smith

Hi Craig, have you solved your problem? I have same problem. Hope you can update your findings. Thanks.

John Reynolds over 13 years ago in reply to Craig Smith

Hi, I'm running the 2.6.10 kernel and was getting bad blocks after repeated writes to flash. Eventually the flash would run out of good blocks and had to be formatted. To fix the problem I had to increase the default chip delay in nand_scan(), located in drivers/mtd/nand/nand_base.c, from 20 to 50 us.

John

int nand_scan (struct mtd_info *mtd, int maxchips)
{
    int i, nand_maf_id, nand_dev_id, busw, maf_id;
    struct nand_chip *this = mtd->priv;

    /* Get buswidth to select the correct functions */
    busw = this->options & NAND_BUSWIDTH_16;

    /* Check for proper chip_delay setup; JJR: default raised from 20 us to 50 us */
    if (!this->chip_delay)
        this->chip_delay = 50;    /* was: this->chip_delay = 20; */

    /* ... remainder of nand_scan() unchanged ... */
}

Chitat over 13 years ago in reply to Craig Smith

Hi, Craig and all friends,

We faced a similar issue mounting a yaffs2 file system on a Samsung NAND flash with a page size of (512 + 16) bytes and an erase block size of (16K + 512) bytes on our DM365 hardware prototype board. We fixed the issue by changing only .ecc_mode = NAND_ECC_SOFT in the kernel source (linux_2.6.18_pro500 from MontaVista in DVSDK_2_00_2_10_01_18), WITHOUT APPLYING THE PATCH in this thread. After applying the change, we are able to mount the yaffs2 file system without any reliability issues in the yaffs2 partition.

However, we found another issue after applying the change. We used the nandwrite tool to flash uImage to the kernel partition as follows:

flash_eraseall /dev/mtd2

nandwrite -p /dev/mtd2 uImage

After reboot, we got the following error message from u-boot during loading the kernel image from the NAND flash:

Loading from NAND 64MiB 3,3V 8-bit, offset 0x100000
** Read error
## Booting kernel from Legacy Image at 80700000 ...
   Image Name:   Linux-2.6.18_pro500-davinci_evm-
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    2074716 Bytes = 2 MB
   Load Address: 80008000
   Entry Point: 80008000
   Verifying Checksum ... Bad Data CRC
ERROR: can't get kernel image!

But if we restored the old setting, .ecc_mode = NAND_ECC_HW_SYNDROME, and repeated the same procedure, U-Boot could load the kernel image and boot without any problem.

Do you all have any ideas or comments on this weird problem?

Thanks

Chitat

Marco Braga over 13 years ago in reply to Chitat

I suppose the problem is that U-Boot and the Linux kernel don't agree about the ECC information written in the spare area. If you set ".ecc_mode = NAND_ECC_SOFT", ECC is calculated and written in software. This means that when you use nandwrite, the ECC is calculated by Linux using the software algorithm. But when U-Boot then reads the image using hardware ECC, it does not agree with the ECC information written by the kernel in software mode. Perhaps software and hardware ECC place the ECC bytes at different offsets in the spare area, or the software algorithm (a single-bit-correcting Hamming code) is simply different from the 4-bit hardware ECC.
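Marco's explanation boils down to this: check bytes are only useful if the reader recomputes them with the same scheme the writer used. The C sketch below is a toy illustration with two made-up one-byte checks (XOR and modular sum), not the real Hamming or hardware ECC codes. Data written with one scheme verifies under that scheme and is rejected by the other, analogous to U-Boot reporting a read error on a kernel image that nandwrite wrote with software ECC.

```c
#include <assert.h>
#include <stddef.h>

/* Toy one-byte "ECC" schemes: NOT real Hamming or hardware ECC codes. */
typedef unsigned char (*check_fn)(const unsigned char *data, size_t n);

static unsigned char check_xor(const unsigned char *data, size_t n)
{
    unsigned char c = 0;
    for (size_t i = 0; i < n; i++)
        c ^= data[i];
    return c;
}

static unsigned char check_sum(const unsigned char *data, size_t n)
{
    unsigned char c = 0;
    for (size_t i = 0; i < n; i++)
        c = (unsigned char)(c + data[i]);
    return c;
}

/* Reader side: recompute the check with the reader's scheme and compare it
 * with the byte the writer stored in the spare area. -1 means read error. */
static int read_page(const unsigned char *data, size_t n,
                     unsigned char stored, check_fn reader_check)
{
    assert(data != NULL && reader_check != NULL);
    return reader_check(data, n) == stored ? 0 : -1;
}
```

The practical consequence is that the kernel and U-Boot need the same ECC mode and OOB layout for any partition both of them read; Craig's earlier question about modifying U-Boot to use ECC_SOFT points in the same direction.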

Yaffs2 instead works because the same algorithm is used in read and write. Only the linux kernel is used to access the file system, and of course it's compatible with itself.

Chitat over 13 years ago in reply to Marco Braga

Thank you so much, Marco.

I'm not an expert on NAND flash file systems. Since there is a kernel patch for the yaffs2 problem, do you think the patch can fix my ECC-related issue? If not, where could I get more information on it?

Chitat

stevenyin over 13 years ago

Hi all:

I've run into a strange issue when rebooting my IPNC. I use the APPRO IPNC Ref Design 1.5.0 on our own DM365 IPNC board. If I reboot the system from the command line with "#reboot -f", Linux sometimes hangs; the probability is about 10%.

Here is an example of the output when it hangs:

******************************************************************************************************************************************************

nVideocodecmode = 5
nVideocodecres = 0
startStream 4**************
begin to call dual streams 1
begin to call dual streams 2
startStream 15**************
cmd = ./av_server.out FLIPH FLIPV AEWB APPRO2A 720P H264 3200000 CVBR H264 680000 CVBR MENUOFF &

Error: Invalid Semaphore handler
Error: Invalid Semaphore handler
Error: Invalid Semaphore handler
Error: Invalid Semaphore handler

******************************************************************************************************************************************************

It hangs at different steps, but always after "system(cmd);" calls "av_server.out ******" in StartStream in av_server_ctrl.c.

Only powering the board off and on again lets it boot the kernel and Linux applications successfully.

Any suggestions?

Steven

Best regards

YJ Liu over 12 years ago in reply to Pavel2

You're right, friend, but you forgot that TI's chips also support 1-bit ECC, so just change 4-bit to 1-bit and it works! Easy, right?

Hi guys.

Just update your yaffs2 from the official website, and set eccbit to 1 in board-dm365.c.

Enjoy:)
