Some questions about "Flash Read Error and Susceptibility for MSP430F54xxA"

milanqingren

Other Parts Discussed in Thread: MSP430F5438A, MSP430F5529, MSP430F5437A

Hi,

I saw the flash problem of MSP430F5438A which is described in "Flash Read Error and Susceptibility for MSP430F54xxA".
So, I have some questions here:

(1) When will this problem be fixed? Will TI take a long time to release the fixed version?

(2) What is the revision number of MSP430F5438A that has the problem fixed? We don't want to buy the version that has the error.

(3) "Flash Read Error and Susceptibility for MSP430F54xxA" indicates that "Specific actions and customer recommendations" are described in PCN #20100910003A. However, I can't find this document on the internet. Does anyone know where to find this information?

Thanks

Liu

over 14 years ago

0 Jeff Tenney over 14 years ago

Guru 12160 points

The original PCN is attached to this post. [Edit: The updated PCN is attached to a later post in this thread.]

Some of the information in the original PCN was not completely accurate.

The information in SLAA470 for '54xxA and SLAA471 for '55xx is newer and better.

Jeff

0636.PCN20100910003.pdf

0 Priya Thanigai over 14 years ago

TI__Mastermind 26380 points

Liu,

The device errata is the best place to look for updated information: http://focus.ti.com/docs/prod/folders/print/msp430f5438a.html

The errata provides a link for the Technical Document that accompanies the PCN (see Flash35).

Also, if you received a PCN from TI, it should list the dates for the revised silicon. The expected date is December 2010. The revision with the fix will be F5438A Rev. E.

Regards,

Priya

0 Chikku over 14 years ago in reply to Priya Thanigai

Intellectual 805 points

hi,

Does this error occur in all current devices (F54xxA, F55xx)?

We are using an MSP430F55xx. The device is working properly, and the application (100 KB code size) is almost in its final stage.

Unfortunately, this application has to pass some regulatory tests, and we are worried that this "flash read error" erratum might cause problems in those tests.

Please let me know what we should do in this case: should we wait for the next revision (until December) before applying for the regulatory tests, or can we continue with the present device?

Unfortunately, we don't have much time in our schedule.

Thank you

0 Jeff Tenney over 14 years ago in reply to Chikku

Guru 12160 points

Hi Chikku,

SLAA471 says that '550x and '5510 are not susceptible. They are immune. The other 55xx and all 54xxA parts seem to be susceptible though.

Our application just started production using the '5529. During development, we never saw this error (as far as we know). But our production firmware still includes a workaround, just to be safe. We are going to keep using the '5529 with the flash read error.

Another of our applications continues to use the 54xx non-A part until the 'A' parts become available with the fix.

As you consider your options, keep in mind that the parts might be very difficult to acquire. We buy through distributors, and they seem to be completely cleaned out already.

Our biggest problem now is not the flash read error but that parts are not available anymore. Distributors are quoting for February delivery, which implies updated parts. We are checking now to see if the factory has recovered inventory. Can't ramp up production without parts!

In your case, maybe there's no sense in regulatory testing until you can start production.

Jeff

0 Jeff Tenney over 14 years ago in reply to Jeff Tenney

Guru 12160 points

Here's the updated PCN. SLAA470 and SLAA471 refer to it.

Jeff

7367.PCN20100910003A.pdf

0 Jens-Michael Gross over 14 years ago in reply to Jeff Tenney

Guru 227245 points

Putting all ISRs into the flash above 0x8000 is a relatively easy and safe workaround if your project is based on one of the affected chips.
So after some idle time, the ISR's address is the first thing fetched from flash, and since the MSB is already 1 for all of them, nothing can go wrong.

There's of course nothing you can do against a device not starting after a reset because of the jump to the BSL code failing. In this case, however, the watchdog should retry after 32ms. And retry....

When using the non-A parts as a temporary replacement, some changes in the code may be necessary:

Besides the lower maximum frequency, the PMM settings are different (CoreV 2 or 3 required for all speeds). There is also no reference module (the reference for the ADC needs to be programmed in the ADC module). This is a non-backwards-compatible change from non-A to A (and even more so if going the other way). Also, the CRC module lacks the 'right' (CCITT-compliant) 'inverse' registers. And there is only an older, non-reprogrammable BSL.

The non-A, however, is cheaper. And available. :)

0 Chikku over 14 years ago in reply to Jens-Michael Gross

Intellectual 805 points

hi,

Thank you.

Our device is the MSP430F5529. As of now, there are no such errors or problems in the application. Does this mean our chips are immune to this "flash read error" erratum?

I don't understand how likely this error is to occur. Does it occur 100% of the time in every F5529 device?

Our application has a code size of more than 100 KB, and there is no way to relocate the code to the unaffected flash.

The errata says that if the program code resides in an affected block (blocks 0 and 2)

"there might be an application error due to incorrect instruction execution or incorrect data fetch"

Does anybody know what kind of error or problem exactly could occur?

Apart from ISR placement, what other workarounds are there to make the application more robust?

Thank you and regards.

0 Jens-Michael Gross over 14 years ago in reply to Chikku

Guru 227245 points

Chikku said:
"there might be an application error due to incorrect instruction execution or incorrect data fetch"
Does anybody know what kind of error or problem exactly could occur?

It depends on what is stored on this flash location.

In short, the problem only occurs if the flash is not read (idle) for a certain time. This happens, e.g., when going into LPM or running code from RAM.

The 'certain time' depends on core voltage level (higher = worse) and temperature, and varies between 0.5 and 20 milliseconds. It also varies over production (wafer etc.).

What happens? After this idle time, the first access to flash may return the MSB of the 32-bit word being read as set, even if it is correctly stored as 0 in flash.

This means that after some time in LPM, the fetch of the interrupt vector may have the MSB erroneously set (jumping to e.g. 0xd800 instead of 0x5800). If all ISRs are above 0x8000, it won't hurt.
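As a rough illustration of the effect just described, here is a minimal host-side sketch (not device code). The helper names `first_read_after_idle` and `half_word` are hypothetical, and the model assumes the worst case for the behavior reported in this thread: bit 31 of the 32-bit flash word reads as 1 on the first access after idle.

```c
#include <stdint.h>

/* Hypothetical host-side model of the flash read error; not device code.
 * Assumption: on the first read after an idle period, bit 31 of the
 * 32-bit flash word may read as 1 even though it is stored as 0. */
uint32_t first_read_after_idle(uint32_t stored)
{
    return stored | (UINT32_C(1) << 31);   /* worst case: the bit reads as 1 */
}

/* Extract one 16-bit half of a 32-bit flash word (little-endian layout:
 * half 0 is the low-order word, half 1 is the high-order word). */
uint16_t half_word(uint32_t word32, unsigned half)
{
    return (uint16_t)(word32 >> (half ? 16 : 0));
}
```

In this model, a stored word of 0x58001234 reads back with a high-order half of 0xD800 while the low-order half stays 0x1234. A high-order half that already has the bit set, such as 0xD800, reads back unchanged.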

If you spend some time in RAM (e.g. for flashing/erasing), the next read from flash might fail too. That read may be a constant read from flash by the RAM function, or the first instruction (or its first parameter, depending on the 32-bit boundary) after returning from RAM. This is more difficult to handle, but it can be worked around by manually placing an instruction there that already has the bit set (and set in its parameter).

Another problem arises if your code jumps in place (e.g. a while(1) loop). There it depends on whether the jump and the following instruction (which is not executed, but is fetched before the jump is executed) are in the same or different 32-bit words. If they are in the same word, then the flash is not accessed during the loop (32 bits are buffered). This is then similar to entering LPM and exposes the ISRs to the bug (with the above workaround).
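The alignment condition for such a loop can be checked with simple address arithmetic. This is only a sketch under the assumptions stated in this post (a 32-bit read buffer and a one-word jump followed by a prefetched word); the function names and any addresses passed in are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if two byte addresses fall in the same
 * 32-bit-aligned flash word. Assumes the 32-bit read buffer described
 * in this post. */
bool same_flash_word(uint32_t addr_a, uint32_t addr_b)
{
    return (addr_a & ~UINT32_C(3)) == (addr_b & ~UINT32_C(3));
}

/* A one-word (2-byte) jump-to-self at jmp_addr is followed by a
 * prefetched word at jmp_addr + 2. If both share one 32-bit flash word,
 * the loop is served from the buffer and flash is not read again. */
bool tight_loop_starves_flash(uint32_t jmp_addr)
{
    return same_flash_word(jmp_addr, jmp_addr + 2);
}
```

For example, a jump placed at 0x5C00 shares its 32-bit word with the word at 0x5C02, so the loop would not touch flash again; the same jump at 0x5C02 straddles two 32-bit words and keeps reading flash.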

If you

don't enter LPM,
don't execute code from ram,
don't have an empty 'jump on place' loop (it is sufficient to just permanently trigger the WDT inside the loop or add a NOP)

then you won't be affected at all.

0 Jeff Tenney over 14 years ago in reply to Jens-Michael Gross

Guru 12160 points

Jens-Michael Gross said:

If you

don't enter LPM,

don't execute code from ram,

don't have an empty 'jump on place' loop (it is sufficient to just permanently trigger the WDT inside the loop or add a NOP)

then you won't be affected at all.

I don't think that's right.

If code executes from an unaffected bank continuously for 1ms with no ISRs, then if program flow migrates into an affected bank, you could see this error on the first fetch from affected flash.

TI's latest documents regarding this error say the flash controller must access an affected bank frequently enough to prevent errors. Accessing any affected bank is sufficient to protect all the affected banks.

Also note that the '54xxA devices see the error in bit 31, while the '55xx devices see the error in bit 30.
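To make the difference between the two families concrete, here is a small host-side sketch. It assumes, as described in this thread, that only one bit of the 32-bit flash word is susceptible and that a value is safe if that bit is already 1. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device families named in this thread and the bit of the 32-bit flash
 * word that each one reportedly misreads. */
enum family { FAMILY_54XXA = 31, FAMILY_55XX = 30 };

/* Hypothetical check: a 16-bit value stored in the high-order half of a
 * 32-bit flash word is unaffected if the susceptible bit is already 1. */
bool high_half_value_is_immune(uint16_t value, enum family f)
{
    unsigned bit_in_half = (unsigned)f - 16u;   /* 31 -> 15, 30 -> 14 */
    return ((value >> bit_in_half) & 1u) != 0;
}
```

Under these assumptions, 0xD800 is safe on a '54xxA part and 0x5800 is not, while on a '55xx part the test moves to bit 14, so 0x8000 is no longer safe but 0xC000 is.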

This is definitely a tricky problem, so hopefully the forum can help us all figure it out together. In my workarounds I dedicate a DMA channel and a Timer A3 to the fix. I was lucky that my application could spare those resources.

Jeff

0 Chikku over 14 years ago in reply to Jeff Tenney

Intellectual 805 points

Hello Jeff,

I think that in order to access an affected bank frequently, we may have to sacrifice power consumption or else not use LPM at all.

In our case, we need to use LPM, and code also has to be executed from RAM when doing in-system flash programming.

If possible, could you please give me brief details of your workaround and its effect on the application?

Thank you

0 Jeff Tenney over 14 years ago in reply to Chikku

Guru 12160 points

Hi Chikku,

1. I dedicate DMA channel 2 to reading a byte from affected flash and storing it to a peripheral register that ignores byte writes. This is a DMA channel I wasn't using. It uses repeat single transfer mode to stay armed continuously.

2. I trigger the DMA with TA2, which is a Timer A3 module I wasn't using and is now dedicated. TA2 counts in up mode and rolls over every 750 microseconds when it runs. Our application needs to run reliably up to 60 deg C.

3. I turn off TA2 when giving control to code in RAM, which is a bootstrap loader utility that uses block writes. Block writes are not tolerant to random DMA reads of flash.

4. RAM code (my bootstrap loader) always reads one word of affected flash and then one word of unaffected flash immediately before reading flash for the host application (on a PC).

5. [Optional] I turn off TA2 when entering LPM. I restart TA2 (including a counter clear) after an ISR wakes from LPM on exit. (I have a single function called WAIT_FOR_INTERRUPT.)

6. [Required with 5.] I have "soft" interrupt vectors that reside in affected flash. These are 1-word JMP statements that I have placed only in the low-order words, where all 16 bits can be trusted. The companion words are all unused (filled by a DC directive in my case). When the CPU wakes to process an interrupt, the hard vector is fetched as usual (from unaffected flash), which points to my soft vector. My soft vector (in affected flash but safe in the low-order word) then jumps to the real interrupt handler. The real interrupt handlers are all in affected flash too, keeping that access current.

Without 5 and 6, the workaround adds about 15uA of quiescent current in my application. That wasn't tolerable, so I added steps 5 and 6. Now there is no perceptible extra current. In fact, most of the time my code is going back to LPM before the DMA ever makes a transfer. However if my code stays awake a long time (more than 750 us) then the autotriggered DMA is there to keep me safe.
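For anyone sizing a similar refresh timer, the compare value for a given rollover time can be worked out as below. This is a minimal sketch, not the code used in the application above: the helper name is hypothetical, and the timer clock frequency is an assumption because the post does not say which clock source TA2 uses. It relies on the Timer_A up-mode period being TAxCCR0 + 1 timer clocks.

```c
#include <stdint.h>

/* Hypothetical helper: largest TAxCCR0 value whose up-mode period
 * (TAxCCR0 + 1 timer clocks) does not exceed period_us.
 * Returns 0 if the clock is too slow, or the period too long, for a
 * 16-bit compare value; the caller must treat 0 as "not achievable". */
uint16_t ccr0_for_max_period(uint32_t timer_clk_hz, uint32_t period_us)
{
    /* Round down so the refresh never comes later than requested. */
    uint64_t ticks = ((uint64_t)timer_clk_hz * period_us) / 1000000u;
    if (ticks < 2 || ticks > 65536u) {
        return 0;
    }
    return (uint16_t)(ticks - 1);
}
```

With an assumed 1 MHz timer clock, a 750 microsecond rollover gives a compare value of 749. With an assumed 32768 Hz clock, only 24 whole ticks fit, so the value is 23 and the actual period is about 732 microseconds.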

Again, you might consider whether the effort is worthwhile. The parts are difficult to acquire now.

Jeff

0 Chikku over 14 years ago in reply to Jeff Tenney

Intellectual 805 points

Hi Jeff,

Thank you very much for the details about your workaround.

Regarding the devices, we have around 60 custom boards using affected devices. Later, we may use the newer revision for final production.

If time permits, I will see whether we can implement the workaround, but since we have little time left and have seen no errors so far, we may apply for the regulatory testing as it is.

Regards.

0 Jens-Michael Gross over 14 years ago in reply to Jeff Tenney

Guru 227245 points

Jeff Tenney said:
TI's latest documents regarding this error say the flash controller must access an affected bank frequently

Either I didn't read the latest version or I just overlooked it.

You're right, if you have code that runs only in an unaffected bank (or in RAM) for that long, it is an issue.
My code only runs in bank0 (the others are just for storage and constants), so maybe I didn't think of that even when reading the 'affected' attribute.

Jeff Tenney said:
Also note that the '54xxA devices see the error in bit 31, while the '55xx devices see the error in bit 30.

I wonder why it is in a particular bit anyway. I had expected it to happen randomly on any bit. Anyway, this will limit the location for the ISRs to > 0xc000 and/or limit the number of possible 'nop' instructions after a call that leads to an unaffected area and long execution time.

Jeff Tenney said:
In my workarounds I dedicate a DMA channel and a Timer A3 to the fix.

That's an elegant solution if you don't need them for the project.

Well, I know why I don't use bleeding-edge technology. I can live with bugs I know about before starting a project, but maybe not with those that show up after I have already done most of the work.

0 Louis_T over 14 years ago in reply to Jens-Michael Gross

Prodigy 225 points

Hi All,

I received the MSP430F5437A Rev E three weeks ago. The errata document, SLAZ057I, still shows that Rev E has this flash read susceptibility bug, FLASH35, which is further described in SLAA470. So this flash read susceptibility bug is not fixed in the latest silicon (Rev E). Is that a correct statement?

Thanks,

Louis

0 Jeff Tenney over 14 years ago in reply to Louis_T

Guru 12160 points

Hi Louis,

FLASH35 does not affect Rev E. If you look closely at the matrix in SLAZ057I, there is not a check mark in the FLASH35 column for Rev E parts.

Jeff
