erasing nand flash on OMAP 35x EVM

I've been trying to run yaffs2 on the mtdblock4 of my OMAP35x EVM (samsung memory) and have been starting to see numerous flash errors.  Now at boot time, I see...

Scanning device for bad blocks

num of blocks = 1024

Bad factory block61 at 0x007a0000

Bad factory block107 at 0x00d60000

...

 

I've tried erasing these using "onenand erase block 60-1023" but it skips all the "bad blocks".  "onenand scrub ..." doesn't seem to help.  Is there any way to get these blocks fully erased so I can start over?  I assume that I really don't have "bad" blocks, or this flash is failing in a hurry.

 

Thanks.

 

  • Running nand scrub will erase the bad block information, but that does not mean it will actually repair bad blocks; they just get detected again. Having bad blocks in a NAND flash is common: most NAND devices have some bad blocks out of the factory, so what you are seeing is not unusual. It would be more unusual, and perhaps concerning, if no bad blocks were detected. It just happens that the file system setup you have prints these messages. If you have end users who may see them, you may want to mask the messages to avoid this type of confusion.

  • Should I use "nand scrub" or "onenand scrub"?  If I use "onenand scrub block 60-1023" it still reports "onenand_erase: not erasing factory bad block@0x007a0000" etc. for each bad block.

    If I use "onenand scrub 0x007a0000 0x08000000" (which help describes as including bad blocks) I get "Erase block from 28733 to 29696. onenand_erase: Erase past end of device"  I must not be specifying the start/end addr correctly.  Can you add some detail here?
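For reference, the block numbers and addresses in the boot log are related by simple arithmetic. The sketch below is not taken from U-Boot: it assumes a 128 KiB erase block, which is inferred from the log line "Bad factory block61 at 0x007a0000", and the 1024 blocks the scan reports. The names `BLOCK_SIZE`, `block_to_addr` and `addr_to_block` are made up for illustration, and Python is used only because it is convenient for arithmetic.

```python
# Assumptions inferred from the boot log, not from the U-Boot source:
# block 61 is reported at 0x007a0000, so one erase block is 0x20000 bytes.
BLOCK_SIZE = 0x20000      # 128 KiB per erase block (assumed)
NUM_BLOCKS = 1024         # "num of blocks = 1024" in the log
DEVICE_SIZE = BLOCK_SIZE * NUM_BLOCKS


def block_to_addr(block: int) -> int:
    """Return the byte offset of the first byte of a block."""
    if not 0 <= block < NUM_BLOCKS:
        raise ValueError(f"block {block} is outside the device")
    return block * BLOCK_SIZE


def addr_to_block(addr: int) -> int:
    """Return the number of the block that contains a byte offset."""
    if not 0 <= addr < DEVICE_SIZE:
        raise ValueError(f"address {addr:#x} is past the end of the device")
    return addr // BLOCK_SIZE


if __name__ == "__main__":
    for block in (60, 61, 107, 1023):
        print(f"block {block:4d} starts at {block_to_addr(block):#010x}")
    print(f"device size: {DEVICE_SIZE:#010x}")
```

Under these assumptions the whole device spans 0x00000000 to 0x07ffffff, so 0x08000000 is the device size rather than a valid block address. Whether the scrub command expects offsets, lengths or block numbers depends on the U-Boot version and is not settled by this sketch.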

     

  • Sorry.  I should also add that I don't think this is normal wear or bad block detection.  I went from no blocks being reported bad to 195 bad blocks being reported after adding the yaffs2 support to kernel.

     

  • This does sound like the issue is yaffs2 related; perhaps the yaffs2 file system just does a more thorough check.  I believe what Bernie was suggesting is that if the file system is working properly (e.g. bad block issues are being handled correctly underneath despite the messages), the messages are simply informational and you can turn them off.  However, if they are causing issues with the file system, can you provide the software versions (Linux kernel/u-boot) you are using so we can look into this?

  • I do believe that onenand scrub is correct for the OMAP3; on other devices (without OneNAND) the command is nand scrub. Note that this should not be necessary: wiping the factory-entered bad block table will not fix bad blocks.

    The issue here is likely that the bad blocks always existed and were simply not reported until you installed yaffs2. We have seen similar situations on other platforms, where the terminal is spammed with these messages when using certain flash file systems. It is highly unlikely that your NAND flash had no bad blocks to begin with; as I mentioned previously, almost every NAND device has some bad blocks out of the factory.

  • I understand that wiping the factory-entered bad block table will not fix bad blocks, but I wanted to try this because I believe that the blocks aren't really "bad" but just marked that way erroneously.  I base this on the fact that the SDK u-boot (1.1.4, Jul 1 2008 - 17:07:33) did not report any bad blocks when "Scanning for bad blocks" (see capture below).  I then added yaffs2 support (several months old) to the kernel and continued to not see any bad blocks detected, either in kernel code or in u-boot.  I then updated the yaffs2 code to the latest; then, when trying to write to the yaffs2-mounted /dev/mtdblock4, I saw numerous errors ("**>> Block 500 retired.  Block 500 is in state 9 after gc, should be erased. yaffs: Block struck out").  From this point on, u-boot began reporting errors on startup (see further below).  So what I'm thinking is that completely erasing the blocks (including factory-marked bad blocks) would allow the bad block table to start clean.  Then (I think) scanning would find and mark the "actual" bad blocks.

    I also understand that NAND devices will generally have bad blocks straight from the factory.  However, this NAND went from 0 of 1024 being reported bad to 195 of 1024 being reported bad.  How could one ever have faith in NAND flash if close to 20% of blocks go bad just like that; and if they were bad all along - that's even worse!

    Thanks for your help on this.  I appreciate any additional feedback or suggestions you have.

     

    Before problem...

    X-Loader 1.41.0

    Detected Samsung MuxOneNAND1G Flash

    Starting OS Bootloader...

    U-Boot 1.1.4 (Jul  1 2008 - 17:07:33)

    OMAP3-GP rev 2, CPU-OPP2 L3-165MHz

    OMAP3EVM 1.0 Version + mPOP (Boot ONND)

    DRAM:  128 MB

    OneNAND Manufacturer: Samsung (0xec)

    Muxed OneNAND 128MB 1.8V 16-bit (0x30)

    OneNAND version = 0x0221

    Scanning device for bad blocks

    num of blocks = 1024

    In:      serial

    Out:    serial

    Err:      serial

    ...

     

    After problem...

    X-Loader 1.41.0

    Detected Samsung MuxOneNAND1G Flash

    Starting OS Bootloader...

    U-Boot 1.1.4 (Jul  1 2008 - 17:07:33)

    OMAP3-GP rev 2, CPU-OPP2 L3-165MHz

    OMAP3EVM 1.0 Version + mPOP (Boot ONND)

    DRAM:  128 MB

    OneNAND Manufacturer: Samsung (0xec)

    Muxed OneNAND 128MB 1.8V 16-bit (0x30)

    OneNAND version = 0x0221

    Scanning device for bad blocks

    num of blocks = 1024

    Bad factory block61 at 0x007a0000

    Bad factory block107 at 0x00d60000

    Bad factory block108 at 0x00d80000

    : (~190 bad factory blocks reported)

    Bad factory block562 at 0x04640000

    Bad factory block576 at 0x04800000

    Bad factory block960 at 0x07800000

    In:      serial

    Out:    serial

    Err:      serial
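To compare the two captures above without counting by hand, the "Bad factory block" lines can be pulled out of a saved console log. This is only a sketch: the line format is copied from the capture in this thread, the 128 KiB block size is inferred from "block61 at 0x007a0000", and the function name `parse_bad_blocks` is made up for illustration.

```python
import re

# Line format as it appears in the capture, e.g.
# "Bad factory block61 at 0x007a0000"
BAD_BLOCK_RE = re.compile(r"Bad factory block\s*(\d+) at (0x[0-9a-fA-F]+)")

BLOCK_SIZE = 0x20000  # assumed: inferred from block 61 at 0x007a0000


def parse_bad_blocks(log_text: str) -> list[tuple[int, int]]:
    """Return (block number, address) pairs for every reported bad block."""
    found = []
    for match in BAD_BLOCK_RE.finditer(log_text):
        block = int(match.group(1))
        addr = int(match.group(2), 16)
        found.append((block, addr))
    return found


if __name__ == "__main__":
    sample = """\
Scanning device for bad blocks
num of blocks = 1024
Bad factory block61 at 0x007a0000
Bad factory block107 at 0x00d60000
Bad factory block108 at 0x00d80000
"""
    blocks = parse_bad_blocks(sample)
    print(f"{len(blocks)} bad factory blocks reported")
    for block, addr in blocks:
        # Flag any line whose address does not match the assumed block size.
        status = "ok" if addr == block * BLOCK_SIZE else "address mismatch"
        print(f"block {block:4d} at {addr:#010x} ({status})")
```

Running the same function over a "before" and an "after" capture gives two lists that can be diffed to see exactly which blocks changed state.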

     

  • So based on your description, you believe that the cutting-edge YAFFS2 code corrupted the bad block table? That would seem to be the case, since the message comes from U-Boot, which you never actually changed. It still seems strange to me that a change to the kernel would cause U-Boot to start showing bad block messages. Perhaps U-Boot is set up to show bad block messages only once a certain threshold of bad blocks is reached, in which case corruption by the newer YAFFS2 would explain the change. On the other hand, this could just be some incompatibility between the latest YAFFS2 file system and the TI U-Boot version.

    In any case, if the table was corrupted and you needed to rebuild it, then the scrub command would be the way of clearing the slate. I would not expect the scrub command to give you back the 'not erasing bad block' messages. Looking in the source at OMAP35x_SDK_1.0.0\board_utilities\u-boot\drivers\onenand\onenand_base.c, it looks like if scrub is set to 1 it should ignore bad blocks and erase blindly (line 978). Also, the debug message printed in the else branch for the non-scrub case does not contain the word "factory", which makes me wonder whether there is another onenand source file in U-Boot that handles the scrub command, or whether you are somehow using another U-Boot version.

  • Yes, I believe the newer YAFFS2 code has corrupted/overwritten the bad block table.  Not knowing lots about YAFFS2 or onenand, I'm concerned that there's some conflict in how ecc data is calculated and stored - I'll have to investigate that. (or go back to the older yaffs2 code)

    I didn't give enough detail earlier.  When I use "onenand scrub 0x20780000 0x28000000" I do in fact see the following when it detects factory bad blocks...

    onenand_erase: not erasing factory bad block @0x7a0000

    So I see in the u-boot code what you see, I just hadn't explained the exact message I was receiving when I used "scrub".  From what I see, scrub should ignore bad blocks but not factory bad blocks.  I rebuilt u-boot with DEBUG turned up for MTD and confirmed this code is being executed.  Would there be harm in me modifying the scrub command to erase the block regardless of the return value of onenand_block_isbad() on line 977, and blindly doing the erase (even though _isbad() is returning 0x3)?

    One other question.  When I reflash u-boot, all the environment variables seem to get blown away.  Is this normal or am I doing something wrong?  (I basically load the reflash-samsung.txt env to target and run "run rf_ub")  It looks like from the reflash env file that the environment is kept in separate blocks, so I wonder if I've got something wrong.

    Thanks.

     

  • If YAFFS2 has already corrupted the factory bad block table, then clearing it with the scrub command seems like a valid option. The risk here is that you will be depending on U-Boot to detect all of the bad blocks on its own. This should work, of course, but there is always the possibility of complications.

    When you reflash U-Boot, on the OMAP3 EVM at least, you do wipe the environment variables. I am not sure if it is because of the erase commands used in the 'reflash-samsung.txt' file or just by virtue of using a fresh U-Boot image; in any case it is to be expected. If you have variables you want to save, you will probably want to keep them in a file like the GSG does, or in a script for your terminal program (TeraTerm does a good job of this).

  • As a follow up, the u-boot onenand scrub command by default includes bad blocks but excludes factory bad blocks.  Somehow the latest YAFFS2 code erroneously decided that a bunch (about 195) of blocks were bad and marked them such that u-boot recognized them as factory bad blocks.  Without determining exactly where YAFFS2 went wrong, I modified u-boot onenand scrub command to include factory bad blocks and erase the blocks in question.  I also reverted to known good YAFFS2 code.  Since then all is OK: YAFFS2 is working, u-boot is working, and no bad blocks being reported by either.

    Thanks for your help.
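The erase behaviour described in this post can be summarised as a small decision table: a plain erase skips every bad block, scrub also erases blocks marked bad at run time, and only the modified scrub touches factory bad blocks. The model below is a simplified sketch of that policy, not the U-Boot code. The names `BlockState`, `blocks_to_erase` and the `include_factory` flag are hypothetical, and the real driver reports block state through the return value of onenand_block_isbad() rather than an enum.

```python
from enum import Enum


class BlockState(Enum):
    GOOD = 0
    BAD = 1           # marked bad at run time
    FACTORY_BAD = 2   # carries the factory bad block marker


def blocks_to_erase(states, first, last, scrub=False, include_factory=False):
    """Return the block numbers in [first, last] that would be erased.

    scrub=False                       -> good blocks only
    scrub=True                        -> good and run-time bad blocks
    scrub=True, include_factory=True  -> every block (the modified scrub)
    """
    if not 0 <= first <= last < len(states):
        raise ValueError("erase range is past the end of the device")
    selected = []
    for block in range(first, last + 1):
        state = states[block]
        if state is BlockState.GOOD:
            selected.append(block)
        elif state is BlockState.BAD and scrub:
            selected.append(block)
        elif state is BlockState.FACTORY_BAD and scrub and include_factory:
            selected.append(block)
    return selected


if __name__ == "__main__":
    # Hypothetical device: 1024 blocks, block 61 flagged as factory bad.
    states = [BlockState.GOOD] * 1024
    states[61] = BlockState.FACTORY_BAD
    for scrub, force in ((False, False), (True, False), (True, True)):
        erased = blocks_to_erase(states, 60, 1023, scrub, force)
        print(f"scrub={scrub} include_factory={force}: "
              f"block 61 erased = {61 in erased}")
```

Forcing an erase of factory-marked blocks discards the manufacturer's markers for good, so it only makes sense in a case like this one, where the markers are believed to have been written erroneously.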

     

  • I am glad to hear that it was fixed by modifying the scrub command and reverting to the older YAFFS2 code. I have not done much digging to see if this could be a known issue but you may want to post the issue to the YAFFS mailing list as a potential bug.

  • Thank you for sharing this with the community; I am sure others will find this information helpful.