Problem in communicating with a flash device

Daniel Ciu

Other Parts Discussed in Thread: DM3730

Hi,

We are facing a problem using the DM3730 chip to communicate with a NAND device.

The NAND chip is connected over GPMC, and the software we are using is the standard Linux driver for OMAP NAND (mtd/nand/omap2.c).

The problem is that, occasionally, incorrect data is read from the NAND chip. Tests showed that the incorrect data is received when the DM reads from the NAND ahead of time (during tR, while the NAND is still busy preparing the requested block data). In other words, the DM sometimes does not wait (sleep) long enough between its command and the following read accesses.

The driver does sleep between the read command and the following read accesses, using a simple delay loop (udelay) of 50µs. In practice, however, the sleep usually takes the expected amount of time, but sometimes it is extremely short, significantly less than 1µs. This proves that the NAND commands executed by the ARM before the sleep were somehow stalled inside the DM until near the end of the sleep period.

Extra tests showed that a similar problem exists with the GPIO controller: a loop of GPIO toggling code with udelay in between showed normal delays most of the time, but on rare occasions a near-zero delay appeared. This proves that the stalls also exist for ARM GPIO commands.

Our tests included using memory barriers and uncached io remaps, and were run on several chips.

Are the stalls inside the DM normal behavior? And are there any specific workarounds to avoid these stalls?

Thanks in advance,

Daniel

over 13 years ago

0 Cvetolin Shulev-XID over 13 years ago

TI__Guru 65405 points

Hi Daniel,

The described behaviour is strange and not normal. Could you specify which software release and version you are using?

Tsvetolin Shulev

0 Renjith Thomas over 13 years ago

Guru 31670 points

Daniel,

To get rid of the problem, you can poll the wait pin instead of inserting a delay. The GPMC NAND controller has built-in support for the wait pin.

0 Daniel Ciu over 13 years ago in reply to Cvetolin Shulev-XID

Prodigy 240 points

Hi,

Mainline Linux version 3.4 is used.

Is there any parameter that may be causing the stalls?

0 Daniel Ciu over 13 years ago in reply to Renjith Thomas

Prodigy 240 points

Renjith,

I tried polling the wait pin, but unfortunately the problem still occurs, though rarely.

To use the wait pin, some short delay is still needed (for the pin to become busy) immediately after performing the read command.

It seems that the read command may still (rarely) be stalled during this short delay. The subsequent poll of the wait pin is then premature and returns a wrong result (not busy), so the driver reads from the NAND without any further delay.
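The race described here (polling before the pin has gone busy) can be illustrated with a two-phase wait: first wait until the device is actually seen busy, then wait until it is ready again. The following is only a minimal userspace sketch under stated assumptions, not code from omap2.c. The `ready_fn` callback, the `wait_result` codes, and the `sim_pin` model are all hypothetical names introduced for illustration, and the poll counts stand in for whatever timeout a real driver would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true while the NAND wait (R/B#) line reads "ready". */
typedef bool (*ready_fn)(void *ctx);

/* Hypothetical result codes for this sketch. */
enum wait_result { WAIT_OK, WAIT_NEVER_BUSY, WAIT_TIMEOUT };

/*
 * Phase 1: poll until the device is observed busy (at most busy_polls polls).
 * Phase 2: poll until the device is ready again (at most ready_polls polls).
 * A "ready" reading in phase 1 is never treated as completion.
 */
static enum wait_result wait_busy_then_ready(ready_fn is_ready, void *ctx,
                                             size_t busy_polls, size_t ready_polls)
{
    size_t i;

    assert(is_ready != NULL);

    for (i = 0; i < busy_polls; i++)
        if (!is_ready(ctx))
            break;                  /* busy observed, command has taken effect */
    if (i == busy_polls)
        return WAIT_NEVER_BUSY;     /* caller decides how to handle this case */

    for (i = 0; i < ready_polls; i++)
        if (is_ready(ctx))
            return WAIT_OK;
    return WAIT_TIMEOUT;
}

/* Simulated pin: busy for poll numbers in [busy_from, busy_until), ready otherwise. */
struct sim_pin { size_t calls; size_t busy_from; size_t busy_until; };

static bool sim_is_ready(void *ctx)
{
    struct sim_pin *p = ctx;
    size_t n = p->calls++;

    return !(n >= p->busy_from && n < p->busy_until);
}
```

With a simulated pin that goes busy only on the fourth poll (modelling a late-arriving command), `wait_busy_then_ready` still waits for the following ready edge instead of returning immediately. A busy period too short for the polling loop to observe would be reported as `WAIT_NEVER_BUSY`, so that case needs its own handling.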

I would appreciate any further ideas about the problem.

Thanks

0 Renjith Thomas over 13 years ago in reply to Daniel Ciu

Guru 31670 points

Daniel,

Have you tried increasing the NAND timing values in GPMC? Try the maximum values first and see whether you can still reproduce the problem. Then gradually reduce the values to improve performance while keeping stability.

0 Daniel Ciu over 13 years ago in reply to Renjith Thomas

Prodigy 240 points

Renjith,

I have tried slowing down the GPMC timing, but unfortunately this did not change the problem.

The stalls also appeared for GPIO.

0 Renjith Thomas over 13 years ago in reply to Daniel Ciu

Guru 31670 points

Daniel,

When you read incorrect data, have you observed the following?

1. Is the data incorrect for the whole page, or is it just corrupted by a few bits?

2. Is the error happening consistently at the same NAND offset or is it totally random?

0 Daniel Ciu over 13 years ago in reply to Renjith Thomas

Prodigy 240 points

Renjith,

The incorrect data (on the rare occasions it occurs) always appears as a few extra garbage bytes inserted at the beginning of a page, because the DM reads before the NAND is ready. After those garbage bytes, the real data appears properly (but shifted by the garbage bytes). The garbage values result from the bus not being driven by the currently busy NAND.
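The failure signature described here (correct data shifted by a few leading garbage bytes) can be checked automatically when the expected page content is known. The sketch below is an illustrative test helper only; `find_leading_shift` and its parameters are hypothetical names, and it assumes a known reference buffer is available for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns the smallest k in [0, max_shift] such that got[k..n-1] equals
 * expected[0..n-k-1], i.e. the read data is the expected data preceded by
 * k extra bytes. Returns 0 for an intact page and -1 if no shift matches.
 */
static int find_leading_shift(const unsigned char *expected,
                              const unsigned char *got,
                              size_t n, size_t max_shift)
{
    assert(expected != NULL && got != NULL);

    for (size_t k = 0; k <= max_shift && k < n; k++)
        if (memcmp(got + k, expected, n - k) == 0)
            return (int)k;
    return -1;
}
```

A result greater than zero matches the "premature read" pattern, while -1 would point to a different kind of corruption (for example bit errors), which is the distinction the two questions above are after.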

I already discovered using a logic analyzer that the NAND is busy because the DM read operation comes (rarely) very soon after the DM write operation (that requests the page from the NAND).

This finding eliminates any NAND-specific issues (and again, it also happened with GPIO writes from the DM).

The DM write and read operations are separated by a "udelay(50)" loop, so the only way for them to appear too close together is if the write operation is somehow stalled.

The actual delay observed with the logic analyzer was usually around 50µs, as it should be, but sometimes less than 1µs, which I can only explain by the write operation being stalled.

0 Renjith Thomas over 13 years ago in reply to Daniel Ciu

Guru 31670 points

Daniel,

udelay is not a precise way of inserting a delay. It just keeps looping based on a count, which depends on the CPU clock speed, i-cache, etc. Can you also try using getticks() or a similar function that uses timer support? This will give better accuracy.
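The difference between a count-based loop and a timer-based delay can be sketched as follows. This is a userspace model only, using POSIX `clock_gettime(CLOCK_MONOTONIC)` as the time source; it is not kernel code, and `now_ns` and `delay_us_by_clock` are hypothetical helper names. The delay ends when the clock says enough time has passed, rather than after a fixed number of loop iterations.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Busy-wait until at least usec microseconds have elapsed on the
 * monotonic clock. Returns the elapsed time in nanoseconds so the
 * caller can log how long the wait really took.
 */
static uint64_t delay_us_by_clock(uint64_t usec)
{
    uint64_t start = now_ns();
    uint64_t target = usec * 1000ull;
    uint64_t elapsed;

    do {
        elapsed = now_ns() - start;
    } while (elapsed < target);
    return elapsed;
}
```

Note that a delay of this kind only measures time as seen by the CPU. It says nothing about when a write issued before the delay actually reached the device.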

0 Daniel Ciu over 13 years ago in reply to Renjith Thomas

Prodigy 240 points

Renjith,

udelay is the standard way of inserting short delays. It is used extensively throughout the kernel, and in nand_base.c specifically.

This means that the stalls I am experiencing may show up in countless other places outside the NAND driver.

As for the precision of udelay, 10% variations are acceptable, but this cannot explain the over 10,000% anomaly described.

Unfortunately, using a timer-based function such as usleep_range affected performance dramatically, and in any case udelay as a simple loop cannot be avoided across the entire kernel.

Do you have ideas why the writes are stalled in spite of using memory barriers?
