
Dealing with NAND bad blocks in production programming

We built a production lot of DM368 IPNC-based cameras with the originally specified Samsung NAND, and we're programming them with IPNC DM36x Version 3.1 software.

It seems that if the NAND has bad blocks in the Kernel area, the bad blocks are tolerated, but if the bad blocks are in the Squashfs area, the camera complains about errors in the file system.

Tracing through the uBoot source code, we find that it uses a function nand_read_skip_bad that reads the NAND and skips over bad blocks, and since the data was also written in uBoot with nand_write_skip_bad, everything works out fine for loading the kernel.

Unfortunately this doesn't seem to be happening when the Kernel reads the NAND to load the file system.

According to the docs, PathPartner has optimized the NAND read with assembly language for faster reading... but since it doesn't actually work when there are bad blocks in the Squashfs area, it seems that the optimized routines either don't skip over bad blocks or don't continue reading far enough to make up for the skipped blocks.

Has anyone else encountered this issue in production programming? What is the best solution?

  • Hi,

    We have implemented the optimized routines, but the optimized routines don't do any bad-block checking. You can see whether the BBT scan is enabled in the board file. If BBT scanning is enabled, it should check for the bad block area; I am not sure whether it checks u-boot's bad block table or not.

  • Hi Renjith, thanks for the response.

    Well, if bad blocks aren't skipped, this won't work for a production environment, where even brand-new NAND chips come with a few bad blocks. Is there an easy way to turn off this optimization and go back to the nand_skip_bad routine? Any #define or configuration option?

    As it is, we'll just have some cameras that simply don't work, out of a relatively small production lot.

    If there is no easy configuration option and we have to revert to the non-optimized routines, is there an easy way to find exactly where the kernel is reading the Squashfs section of the NAND?

  • Hi,

    There is a #define flag whose exact name I can't recall; it's something like BBT_SCAN or SCAN_BBT. It can be passed to the MTD driver from the board file. Please search for this pattern in the mach-davinci or drivers/mtd directory; you'll be able to figure it out.
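    For reference, in mainline kernels this is selected through flags in the DaVinci NAND platform data. The excerpt below is a hypothetical board-file fragment, not the IPNC 3.1 source; the field values and the partition array name are assumptions:

```c
/* Hypothetical DaVinci board-file fragment (values are assumptions).
 * The mainline flags are NAND_BBT_USE_FLASH (keep a bad block table
 * in flash) and NAND_SKIP_BBTSCAN (never scan for or build a BBT). */
static struct davinci_nand_pdata davinci_nand_data = {
    .mask_ale     = 0x08,
    .mask_cle     = 0x10,
    .parts        = davinci_nand_partitions,  /* assumed name */
    .nr_parts     = ARRAY_SIZE(davinci_nand_partitions),
    .ecc_mode     = NAND_ECC_HW,
    .ecc_bits     = 4,
    .bbt_options  = NAND_BBT_USE_FLASH,       /* enable flash BBT */
    /* If .options contains NAND_SKIP_BBTSCAN instead, the MTD core
     * never learns which blocks are bad. */
};
```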

    You can enable a print inside the do_read_pages() function in the drivers/mtd/nand/nand_base.c file. Printing the argument "from" will show the offsets that the kernel is reading.
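    Concretely, the debug print could look like the line below. Note that in mainline nand_base.c the main read path is nand_do_read_ops(), so "do_read_pages" may be a PathPartner-specific name, and the exact insertion point here is an assumption:

```c
/* Added near the top of the kernel NAND read path (nand_do_read_ops()
 * in drivers/mtd/nand/nand_base.c in mainline) to log each read offset: */
printk(KERN_INFO "nand read: from=0x%llx len=%zu\n",
       (unsigned long long)from, ops->len);
```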

    I am trying to recall this from memory rather than reading the code, so the names/paths may not be exact. You might have to do a bit of searching.

  • Hi Tek and Renjith,

    I have the exact same problem described in this post. What was the final solution?

    Thanks for any help.

    Best regards

    Nick

  • Well, TI never helped with this, and we noticed that the newer versions of the firmware are even more sensitive to bad blocks.

    In fact, TI's instructions for loading newer versions of firmware include using the "nand scrub" command, which causes uBoot to warn that it will damage the factory bad block info. That doesn't seem like a good idea. I'm not sure of the exact reasoning... whether it was more expedient for firmware development or a workaround for some bug, but it seems scary for production.

    Currently we have a really impractical solution: screening the NAND chips for bad blocks, grouping them into batches, and using one of several different memory-image layouts to keep the bad blocks out of the areas that the firmware apparently can't tolerate.

  • Thank you for your response, Tek4. I have done the same: I have two SD cards for programming, with different memory layouts.

    Renjith: do you have any more detailed info on how to make the kernel work with bad blocks? Thanks in advance for any help you can give!

  • Nick,

    Bad blocks can certainly be handled. I can't give a solution right now; I need to see your exact problem to propose something.

  • Hi Renjith,

    Our Problem:

    Our design is based on the IPNC DM368 MT5 hardware with version 3.1 software, and our NAND chip is the MT29F2G08ABAEAWP. If the NAND has bad blocks in the location where the file system is stored, the kernel fails:

    Starting kernel...

    SQUASHFS error: Unknown inode type 10 in squashfs_iget!

    Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(31,3)

    Nick 

  • Nick,

    Are you sure that you are seeing this behavior because of bad blocks? Have you tried writing the filesystem to another offset?

  • Hi Nick,

    If your problem is caused by a bad block, I think it is because of SQUASHFS.

    SQUASHFS has no bad block management at all and requires all blocks in order.

    Have you tried using SQUASHFS with UBIFS? UBIFS can handle the bad block management for SQUASHFS.

    Or you can try other filesystems that have their own bad block management, such as JFFS2 or YAFFS.

    Ivy
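    For reference, mounting the root filesystem through UBI as suggested above typically means kernel arguments along these lines; the MTD partition number and volume name here are placeholders, not values from the IPNC build:

```
root=ubi0:rootfs rootfstype=ubifs ubi.mtd=3 rw
```

    UBI then handles wear leveling and bad block remapping below the filesystem, so bad physical blocks never appear in the logical volume.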

  • Hello Ivy,

    Thanks for pointing out that other file systems have bad block management.

    However, in this case the problem is deeper than that.

    Some parts of the IPNC software read the NAND using a plain "read" function, some use a "read-and-skip-bad" function, and this got further confused when PathPartner added a faster NAND read function. It improved performance but doesn't seem to skip over bad blocks.

    Clearly, if a file system is written to NAND, and if the write process skips over a bad block, the read function must do the same so that it ends up reading exactly what was written.

    But if the optimized read does not skip bad blocks, it can make for a nice demo on cameras whose NAND happens not to have a bad block in an inconvenient location, yet it isn't suitable for production unless bad blocks are dealt with properly at a lower software level than the choice of file system implemented on top.

    We've had to resort to using older versions of the IPNC software, and since that doesn't solve the entire issue either, alternate file system layouts have to be used to avoid some bad blocks that cause trouble even with the older software.