workaround: am3517 NAND (x8) write errors in Linux

Anton Olofsson

Other Parts Discussed in Thread: AM3517, AM3359

Hi!

We have a custom board with an AM3517 and a Samsung K9F2G x8 NAND, running Linux 2.6.37 from http://arago-project.org/git/people/?p=sriram/ti-psp-omap.git (OMAPPSP_04.02.00.07 branch).

When writing to flash (JFFS2 and UBI) we were getting errors of this sort (JFFS2 shown): "Write of X bytes at Y failed. returned -5, retlen 0" followed by "Not marking the space at Y as dirty because the flash driver returned retlen zero". The -5 is -EIO.

Since other posts report similar problems, we wanted to share our findings. For some reason, NAND read/write/erase operations can take a long time to finish in polled mode.

Increasing the timeouts in omap_wait() in the omap2 NAND driver seems to remove all of the problems:

    --- a/drivers/mtd/nand/omap2.c
    +++ b/drivers/mtd/nand/omap2.c
    @@ -924,11 +924,11 @@ static int omap_wait(struct mtd_info *mtd, struct nand_chip *chip)
                                                             mtd);
             unsigned long timeo = jiffies;
             int status = NAND_STATUS_FAIL, state = this->state;
    +        //HACK:
             if (state == FL_ERASING)
    -                timeo += (HZ * 400) / 1000;
    +                timeo += (HZ * 4000) / 1000;
             else
    -                timeo += (HZ * 20) / 1000;
    +                timeo += (HZ * 1000) / 1000;

This is a hack, however, since the new timeouts are, to be honest, silly: they are just numbers that are big enough, while the Samsung datasheet says that 5/10/500 µs should be enough for read/write/erase.
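To put the numbers in the patch into ticks: omap_wait() converts milliseconds to jiffies with (HZ * ms) / 1000, so the real budget depends on the kernel's tick rate (CONFIG_HZ; 100, 250 and 1000 are common choices, and which one a given board uses is an assumption here). The userspace sketch below only evaluates that expression for the original and patched values; ms_to_ticks() and print_budgets() are hypothetical helper names, not kernel functions.

```c
#include <stdio.h>

/* Same arithmetic as omap_wait(): a timeout in milliseconds becomes
 * (HZ * ms) / 1000 jiffies. Integer division truncates toward zero. */
static unsigned long ms_to_ticks(unsigned long hz, unsigned long ms)
{
    return (hz * ms) / 1000;
}

/* Print the original and patched omap_wait() budgets for one HZ value. */
static void print_budgets(unsigned long hz)
{
    printf("HZ=%4lu  erase: %4lu -> %4lu ticks  other: %3lu -> %4lu ticks\n",
           hz,
           ms_to_ticks(hz, 400), ms_to_ticks(hz, 4000),
           ms_to_ticks(hz, 20), ms_to_ticks(hz, 1000));
}
```

At HZ=100 the original non-erase budget is only 2 ticks. Because jiffies only advance on timer ticks, the real window is then somewhere between 10 and 20 ms, depending on where in the current tick the wait starts, and any time the task spends scheduled out in cond_resched() counts against it too. The patched non-erase value is always HZ ticks, i.e. one second.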

Given how omap_wait() is written, flash performance is still pretty good: the loop exits as soon as the chip reports ready, so the full timeout is only spent if an error actually occurs (we believe).
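The early exit can be illustrated with a small userspace simulation. This is a sketch, not driver code: fake_chip, fake_read_status() and poll_ticks() are hypothetical, the clock is a plain counter standing in for jiffies (ignoring the wraparound that time_before() handles), and each idle poll is assumed to cost exactly one tick.

```c
#include <stdio.h>

#define SIM_STATUS_READY 0x40 /* local stand-in for NAND_STATUS_READY */

/* Fake chip: reports ready once the fake clock reaches ready_at. */
struct fake_chip {
    unsigned long now;      /* stand-in for jiffies */
    unsigned long ready_at; /* tick at which the operation completes */
};

static unsigned char fake_read_status(const struct fake_chip *c)
{
    return c->now >= c->ready_at ? SIM_STATUS_READY : 0;
}

/* Poll with a deadline, in the style of omap_wait(). Each idle poll
 * advances the fake clock by one tick. Returns the ticks spent waiting. */
static unsigned long poll_ticks(struct fake_chip *c, unsigned long timeout)
{
    unsigned long start = c->now, deadline = c->now + timeout;

    while (c->now < deadline) {
        if (fake_read_status(c) & SIM_STATUS_READY)
            break;
        c->now++; /* stand-in for cond_resched() letting time pass */
    }
    return c->now - start;
}
```

With the chip finishing at tick 3, a 2-tick budget times out, while budgets of 100 and 1000 ticks both return after 3 ticks: a longer timeout only costs time when the chip really is slow or stuck.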

Enabling DMA prefetch also works, but errors can still occur under high flash load. We believe this is because the driver falls back to polled-mode operations when DMA is busy.

We are currently using DMA prefetch together with the omap_wait() change above, which seems pretty stable.

That is what we have found so far. It would be nice to get some comments on whether this might be a problem with the NAND timings set in x-loader. We have not noticed any flash problems in u-boot, though, so this seems unlikely(?).

Another note: we cannot use subpage writes with UBI and must manually turn them off in the kernel to get UBI to work. With subpage writes off, UBI seems to work well.
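Turning subpage writes off also changes where UBI puts its headers. The sketch below is a simplified model of how UBI derives its default layout: the erase-counter header sits at offset 0 padded to one subpage, the VID header follows it, and volume data starts at the next full-page boundary. It assumes a 128 KiB eraseblock, 2 KiB pages and 512-byte subpages (typical of 2 Gbit x8 parts such as the K9F2G family; check the datasheet). The header-size macros are local copies, and compute_ubi_layout() is a hypothetical helper, not a kernel function.

```c
/* Local copies of UBI's on-flash header sizes (64 bytes each). */
#define UBI_EC_HDR_SIZE  64
#define UBI_VID_HDR_SIZE 64

/* Round x up to a multiple of a (a > 0). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
    return (x + a - 1) / a * a;
}

struct ubi_layout {
    unsigned long vid_hdr_offset; /* byte offset of the VID header in a PEB */
    unsigned long leb_start;      /* byte offset where volume data starts */
    unsigned long leb_size;       /* usable bytes per logical eraseblock */
};

/* Default layout for one physical eraseblock. With subpage writes
 * disabled, pass subpage_size == page_size. */
static struct ubi_layout compute_ubi_layout(unsigned long peb_size,
                                            unsigned long page_size,
                                            unsigned long subpage_size)
{
    struct ubi_layout l;

    /* EC header at offset 0, padded to one subpage; VID header follows. */
    l.vid_hdr_offset = align_up(UBI_EC_HDR_SIZE, subpage_size);
    /* Data starts at the first page boundary after the VID header. */
    l.leb_start = align_up(l.vid_hdr_offset + UBI_VID_HDR_SIZE, page_size);
    l.leb_size = peb_size - l.leb_start;
    return l;
}
```

With subpages, the VID header sits at offset 512 and each LEB holds 129024 bytes; without them, it moves to 2048 and the LEB shrinks to 126976 bytes. It therefore seems wise to build UBI images for the same subpage setting the kernel ends up using.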

over 14 years ago

daniel.nystrom, over 14 years ago

I'm affected by this as well. Posting to subscribe to the thread.

Anton Olofsson, over 14 years ago

Subpage writes seem to be fixed now: http://arago-project.org/git/people/?p=sriram/ti-psp-omap.git;a=commit;h=1f62a9d1143cffcdc5dbf5b433fb905fa5f78831

Nice work!

Regards,

Anton

Orjan Friberg, over 13 years ago, in reply to Anton Olofsson

Anton,

What was your final configuration where this worked? NAND_OMAP_PREFETCH and NAND_OMAP_PREFETCH_DMA both on? JFFS2 and/or UBIFS?

I tried the subpage write patch (on a 2.6.32 kernel, JFFS2), but I still see the problem occasionally (once every 1-4 hours under heavy file system load).

Thanks,

Orjan

Anton Olofsson, over 13 years ago, in reply to Orjan Friberg

Hello Orjan,

I would try increasing the timeouts for polled mode if you haven't already (without this we also got errors under heavy load).

"Final" solution for us (seems stable for both JFFS2 and UBI):

* NAND_OMAP_PREFETCH_DMA

* subpage fix from Arago

* Increased timeouts for polled mode (this was important even when using prefetch, as polled mode may still be used from time to time).

Regards,

Anton

Orjan Friberg, over 13 years ago, in reply to Anton Olofsson

Thanks. I wasn't sure if the increased timeout was needed once you got the subpage fix in place.

David Andrey, over 13 years ago, in reply to Orjan Friberg

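The timeout values are OK, but the exit path has to be updated. If the loop ends because the deadline passed (for example because the task was scheduled out in cond_resched() for longer than the timeout), the last status read may be stale even though the chip has finished in the meantime, so read it once more before giving up:

             while (time_before(jiffies, timeo)) {
                 status = __raw_readb(this->IO_ADDR_R);
                 if (status & NAND_STATUS_READY)
                     break;
                 cond_resched();
             }

    +        /* if we exited on time-out, read the status once more */
    +        if (!(status & NAND_STATUS_READY)) {
    +            status = __raw_readb(this->IO_ADDR_R);
    +        }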
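The failure mode this guards against can be reproduced in a small userspace simulation. Everything here is hypothetical: struct sim, sim_read_status() and sim_wait() stand in for the driver, the status values are local stand-ins for NAND_STATUS_FAIL and NAND_STATUS_READY, and a long cond_resched() is modelled by advancing the fake clock by resched_cost ticks.

```c
#include <stdbool.h>

#define SIM_STATUS_FAIL  0x01 /* local stand-in for NAND_STATUS_FAIL */
#define SIM_STATUS_READY 0x40 /* local stand-in for NAND_STATUS_READY */

struct sim {
    unsigned long now;          /* stand-in for jiffies */
    unsigned long ready_at;     /* tick at which the chip becomes ready */
    unsigned long resched_cost; /* ticks that pass while scheduled out */
};

static int sim_read_status(const struct sim *s)
{
    return s->now >= s->ready_at ? SIM_STATUS_READY : 0;
}

/* Simplified omap_wait(): poll until ready or deadline, optionally
 * followed by the extra status read after a time-out exit. */
static int sim_wait(struct sim *s, unsigned long timeout, bool recheck)
{
    unsigned long deadline = s->now + timeout;
    int status = SIM_STATUS_FAIL;

    while (s->now < deadline) {
        status = sim_read_status(s);
        if (status & SIM_STATUS_READY)
            break;
        s->now += s->resched_cost; /* cond_resched() may sleep a long time */
    }

    /* if we exited on time-out, read the status once more */
    if (recheck && !(status & SIM_STATUS_READY))
        status = sim_read_status(s);

    return status;
}
```

With a 2-tick budget, a chip that finishes at tick 1, and a task that is scheduled out for 5 ticks after its first poll, the version without the extra read returns a not-ready status, while the version with it returns ready. A chip that really is still busy is reported as not ready either way.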

Anton Olofsson, over 12 years ago, in reply to David Andrey

Update (concerning AM335x):

The reason for this (very late) post is that we found the same problem on the AM3359.

David: thank you! The fix seems to do the trick. As a side note, TI seems to have introduced a very similar fix in their AM335x kernel (Arago).

David Andrey, over 12 years ago, in reply to Anton Olofsson

You're welcome! Thanks for reporting :-)
