Chip flash failing in deployment

Doug Jorgesen

Other Parts Discussed in Thread: MSP430F2618

I have a serious problem with my deployed MSP430F2618 chips. They work fine in my prototypes and early production units, but after I ship units to customers, the flash on the chip appears to be getting rewritten. The code is obviously still running to some extent, but at some point it hangs, apparently because the flash has somehow been overwritten. The failure is very difficult to reproduce.

So my question is: what precautions are recommended in a deployed unit to keep the MSP430's flash safe?

over 13 years ago

Curt Carpenter over 13 years ago

Expert 1255 points

Can you tell us how you know that your flash has been overwritten? Also, does your code itself ever write to flash during normal operation?

Jan Kesten over 13 years ago

Expert 1610 points

Hi Doug,

have you had a failed unit returned to you and verified that its flash contents have actually changed? Does your program intentionally write to flash?

Apart from that, have you taken ESD and EMI precautions in your device?

Doug Jorgesen over 13 years ago in reply to Jan Kesten

Intellectual 290 points

My device is a meter that reads a sensor and displays the reading on an LCD. I can tell that it has been corrupted because the background picture is displayed correctly, but the numbers in the picture are garbled. Communication with the computer has also failed, so I can't tell whether it is still taking readings. So either the pictures or the program has gone wrong, and since both are stored in flash, I suspect the flash is corrupted. The code runs on an MSP430F2618, which doesn't have a Spy-Bi-Wire interface, so I can't read or reprogram the device once it is built into the unit; I have to guess that the flash is corrupted. I've seen this same failure on several units so far.

For ESD protection, I don't really know what to do. I have the circuit encased in an anodized aluminum housing, but the ground is not really connected because I wanted to separate the analog and digital grounds to some extent. Other than that, I don't know what precautions to take.

The other possibility is that my program is overwriting its own flash. This is entirely possible. I write to flash occasionally to save data between resets, which I perform fairly often to prevent the code from freezing or entering an infinite loop.
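Because data is saved across frequent resets, one inexpensive way to keep resets from turning into erase cycles is to compare the stored bytes with the new data and skip the erase/write when nothing has changed. This is a minimal sketch; `flash_needs_update` is a hypothetical helper, not part of the code in this thread, and it assumes flash can be read through an ordinary pointer, as it can on the MSP430.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true when the bytes already in flash differ from the
 * data to be stored, i.e. when an erase/write cycle is actually needed. */
bool flash_needs_update(const unsigned char *stored, const unsigned char *data, size_t length)
{
    return memcmp(stored, data, length) != 0;
}
```

It would be used as `if (flash_needs_update(flash_ptr, data_ptr, length)) writeToFlashFromArray(flash_ptr, data_ptr, length);`, so a reset that re-saves unchanged settings costs no erase cycle.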

Here is the code I use to write to flash. If it all looks good, please let me know, and I can post the code that calls these functions, which may be more complicated.

From flash.h:

/// Configure Flash registers
void ConfigFlash();

/// Write a block of data to flash from an array.
void writeToFlashFromArray(unsigned char* flash_ptr, unsigned char* data_ptr, unsigned int length);

/// Write a block of data to flash from a queue.
void writeToFlashFromQueue(unsigned char* flash_ptr, queue* q, unsigned int length);

// Define a structure to store in flash, to minimize the number of flash writes
typedef struct
{
	uint8_t max_freq;
	uint8_t freq_set;
	uint16_t slope_size;
	uint16_t intercept_size;
	uint8_t screen_orientation;
	uint8_t freq;
	char IDN_name[24];
	char serial_number[8];
} flash_data;

// Let the compiler compute the size. With 16-bit alignment this struct occupies
// 40 bytes, not 38; if FLASH_SIZE is used as the write length, a hard-coded 38
// never saves the last two bytes of serial_number.
#define FLASH_SIZE sizeof(flash_data)

From flash.c:

#define FLASH_BLOCK_SIZE 512

#ifdef FUNCTION_TIMER
long timerA_value3;
long timerA_value4;
#endif /* FUNCTION_TIMER */

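void ConfigFlash()
{
	// Flash timing generator clock = MCLK / 6 (FSSEL0 selects MCLK; FN2 + FN0 = 5,
	// and the divider is FNx + 1). On the MSP430F2xx the result must lie within
	// about 257-476 kHz (see the datasheet), so this setting is only valid for an
	// MCLK of roughly 1.54-2.86 MHz.
	FCTL2 = FWKEY + FSSEL0 + FN2 + FN0;
}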

void writeToFlashFromArray(unsigned char* flash_ptr, unsigned char* data_ptr, unsigned int length)
{
#ifdef FUNCTION_TIMER
	timerA_value3 = TAR;
#endif /* FUNCTION_TIMER */
	/**
	 * \todo Add support for writing blocks of data that
	 *       extend past more than one flash block.
	 */
	unsigned int flash_count;
	unsigned char flash_block[FLASH_BLOCK_SIZE];
	unsigned char* flash_start_ptr;
	unsigned int flash_index;
	unsigned int wdt_saved;

	// Get start of address block. Main-memory segments are 512 bytes, but the
	// information-memory segments (0x1000-0x10FF) are only 64 bytes each, so this
	// routine must not be used on information memory as written.
	flash_start_ptr = (unsigned char*) ((unsigned int) flash_ptr & ~(FLASH_BLOCK_SIZE - 1u));

	// Get the index of the data to be changed in flash_block
	flash_index = (unsigned int) (flash_ptr - flash_start_ptr);

	// Refuse writes that run past the end of the segment: they would overflow
	// flash_block on the stack and could corrupt the return address.
	if (length > FLASH_BLOCK_SIZE - flash_index)
	{
		return;
	}

	// Hold the watchdog during the write (WDTNMI also switches RST/NMI to NMI mode),
	// remembering the previous settings so they can be restored afterwards.
	wdt_saved = WDTCTL & 0x00FF;
	WDTCTL = WDTPW + WDTHOLD + WDTNMI;

	// Read entire block into RAM
	for (flash_count = 0; flash_count < FLASH_BLOCK_SIZE; flash_count++)
	{
		flash_block[flash_count] = flash_start_ptr[flash_count];
	}

	// Write the new data to the RAM copy.
	for (flash_count = 0; flash_count < length; flash_count++)
	{
		flash_block[flash_count + flash_index] = data_ptr[flash_count];
	}

	// Remove flash lock.
	FCTL3 = FWKEY;

	// Erase flash segment (a dummy write to any address in the segment)
	FCTL1 = FWKEY + ERASE;
	*flash_start_ptr = 0;

	// Enable flash writing
	FCTL1 = FWKEY + WRT;

	// Write RAM copy back to flash
	for (flash_count = 0; flash_count < FLASH_BLOCK_SIZE; flash_count++)
	{
		while (!(FCTL3 & BIT3));	// BIT3 of FCTL3 is the WAIT flag
		flash_start_ptr[flash_count] = flash_block[flash_count];
	}

	// Disable flash writing.
	FCTL1 = FWKEY;

	// Reset Flash lock.
	FCTL3 = FWKEY + LOCK;

	// Restore the previous watchdog settings and clear its counter.
	WDTCTL = WDTPW + WDTCNTCL + wdt_saved;
#ifdef FUNCTION_TIMER
	timerA_value4 = TAR;
	_NOP();
#endif /* FUNCTION_TIMER */
}

void writeToFlashFromQueue(unsigned char* flash_ptr, queue* q, unsigned int length)
{
#ifdef FUNCTION_TIMER
	TAR = 0;
	timerA_value3 = TAR;
#endif /* FUNCTION_TIMER */
	/**
	 * \todo Add support for writing blocks of data that
	 *       extend past more than one flash block.
	 */
	unsigned int flash_count;
	unsigned char flash_block[FLASH_BLOCK_SIZE];
	unsigned char* flash_start_ptr;
	unsigned int flash_index;
	unsigned int wdt_saved;

	// Get start of address block. Main-memory segments are 512 bytes, but the
	// information-memory segments (0x1000-0x10FF) are only 64 bytes each, so this
	// routine must not be used on information memory as written.
	flash_start_ptr = (unsigned char*) ((unsigned int) flash_ptr & ~(FLASH_BLOCK_SIZE - 1u));

	// Get the index of the data to be changed in flash_block
	flash_index = (unsigned int) (flash_ptr - flash_start_ptr);

	// Refuse writes that run past the end of the segment (checked before popping,
	// so the queue is left untouched): they would overflow flash_block on the stack.
	if (length > FLASH_BLOCK_SIZE - flash_index)
	{
		return;
	}

	// Hold the watchdog during the write (WDTNMI also switches RST/NMI to NMI mode),
	// remembering the previous settings so they can be restored afterwards.
	wdt_saved = WDTCTL & 0x00FF;
	WDTCTL = WDTPW + WDTHOLD + WDTNMI;

	// Read entire block into RAM
	for (flash_count = 0; flash_count < FLASH_BLOCK_SIZE; flash_count++)
	{
		flash_block[flash_count] = flash_start_ptr[flash_count];
	}

	// Write the new data to the RAM copy.
	for (flash_count = 0; flash_count < length; flash_count++)
	{
		flash_block[flash_count + flash_index] = popQueue(q);
	}

	// Remove flash lock.
	FCTL3 = FWKEY;

	// Erase flash segment (a dummy write to any address in the segment)
	FCTL1 = FWKEY + ERASE;
	*flash_start_ptr = 0;

	// Enable flash writing
	FCTL1 = FWKEY + WRT;

	// Write RAM copy back to flash
	for (flash_count = 0; flash_count < FLASH_BLOCK_SIZE; flash_count++)
	{
		while (!(FCTL3 & BIT3));	// BIT3 of FCTL3 is the WAIT flag
		flash_start_ptr[flash_count] = flash_block[flash_count];
	}

	// Disable flash writing.
	FCTL1 = FWKEY;

	// Reset Flash lock.
	FCTL3 = FWKEY + LOCK;

	// Restore the previous watchdog settings and clear its counter.
	WDTCTL = WDTPW + WDTCNTCL + wdt_saved;
#ifdef FUNCTION_TIMER
	timerA_value4 = TAR;
	_NOP();
#endif /* FUNCTION_TIMER */
}
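The divider in ConfigFlash() only suits a narrow range of MCLK frequencies. As a sketch, assuming the roughly 257-476 kHz flash timing generator window of the MSP430F2xx family (check the device datasheet), the FNx value for a given source clock could be computed as follows; `flash_fn_for_clock` and the `FTG_*` constants are hypothetical names.

```c
#include <stdint.h>

/* Assumed flash timing generator limits for the MSP430F2xx (check the datasheet). */
#define FTG_MIN_HZ 257000UL
#define FTG_MAX_HZ 476000UL

/* Return the FNx value (divider - 1) that puts source_hz / (FNx + 1) inside
 * the allowed window, using the smallest divider that is not too fast.
 * Returns -1 if no divider from 1 to 64 works. */
int flash_fn_for_clock(uint32_t source_hz)
{
    uint32_t div = (source_hz + FTG_MAX_HZ - 1) / FTG_MAX_HZ;  /* ceil(source_hz / max) */
    if (div == 0)
        div = 1;
    if (div > 64)
        return -1;                      /* FNx is only 6 bits wide */
    if (source_hz < FTG_MIN_HZ * div)
        return -1;                      /* larger dividers would be even slower */
    return (int)(div - 1);
}
```

A non-negative result would be added to `FWKEY + FSSEL0` in FCTL2 in place of `FN2 + FN0`; for example, an 8 MHz MCLK gives FNx = 16, i.e. 8 MHz / 17, about 471 kHz.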

Curt Carpenter over 13 years ago in reply to Doug Jorgesen

Expert 1255 points

About how often do you write to flash? Could the program/erase endurance spec (10^4 to 10^5 cycles) be the problem?

I think Jan is correct in the note above: you're going to have to strip one of your units down and do a more detailed failure analysis. Until you do that, we're all just guessing.

Jan Kesten over 13 years ago in reply to Doug Jorgesen

Expert 1610 points

Doug Jorgesen said:

For ESD protection, I don't really know what to do. I have the circuit encased in an anodized aluminum housing, but the ground is not really connected because I wanted to separate the analog and digital grounds to some extent. Other than that, I don't know what precautions to take.

There are many things to consider: first of all a good PCB layout, then decoupling capacitors, series resistors, clamp diodes, and transient voltage suppressors. TI has application notes explaining some of the details. It may not be anyone's favourite task, but it isn't rocket science.

www.ti.com/lit/an/szza009/szza009.pdf

www.ti.com/lit/ml/slap126/slap126.pdf

Also, some MSP430 devices (yours included!) have a "Marginal Read Mode" that can be used to check the flash to some degree.

Doug Jorgesen said:

The other possibility is that my program is overwriting its own flash. This is entirely possible. I write to flash occasionally to save data between resets, which I perform fairly often to prevent the code from freezing or entering an infinite loop.

As mentioned, flash wears out after a while. Most MSP430 devices specify an endurance of 10,000 to 100,000 write/erase cycles, which can be reached fairly quickly. If you use a timer to rewrite your complete flash segment at a regular interval, say once a minute, wear-out may occur after 10,000 or 100,000 minutes, that is, after about 7 or 70 days. (One cycle is a segment erase plus the writes that follow it before the next erase.)

I had a similar problem where I needed to store a couple of values (only 4 bytes, but the size doesn't matter). To reduce flash cycles, I used a function that reads the segment from the beginning and returns the last stored struct: either the one just before the first 0xFFFFFFFF (erased flash) or the one at the end of the segment. When writing a new value, I use the position after the last one (which is still erased), or, if the segment is full, I erase it and start over.
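The append-and-erase-only-when-full scheme can be sketched on a PC by simulating one flash segment in RAM. This is an illustration under stated assumptions, not Jan's actual code: a 512-byte segment, 4-byte records, "all bytes 0xFF" meaning an unused slot (so a stored record must never be all 0xFF), and no handling of a record torn by power loss. The names `flash_sim_init`, `write_record`, and `read_latest` are hypothetical; on the MSP430 the simulated erase and program steps would become a real segment erase and byte writes.

```c
#include <stdint.h>
#include <string.h>

#define SEGMENT_SIZE        512u   /* assumed segment size */
#define RECORD_SIZE         4u     /* assumed record size */
#define RECORDS_PER_SEGMENT (SEGMENT_SIZE / RECORD_SIZE)   /* 128 */

static uint8_t segment[SEGMENT_SIZE];   /* simulated flash segment */
static unsigned erase_count;            /* erases performed since init */

/* Erased flash reads as all ones. */
static void sim_erase(void)
{
    memset(segment, 0xFF, sizeof segment);
    erase_count++;
}

/* Programming can only clear bits, as on real flash. */
static void sim_program(unsigned offset, const uint8_t *src)
{
    for (unsigned i = 0; i < RECORD_SIZE; i++)
        segment[offset + i] &= src[i];
}

void flash_sim_init(void)
{
    memset(segment, 0xFF, sizeof segment);
    erase_count = 0;
}

static int slot_is_erased(unsigned slot)
{
    for (unsigned i = 0; i < RECORD_SIZE; i++)
        if (segment[slot * RECORD_SIZE + i] != 0xFF)
            return 0;
    return 1;
}

/* First unused slot, or RECORDS_PER_SEGMENT if the segment is full. */
static unsigned first_free_slot(void)
{
    unsigned slot = 0;
    while (slot < RECORDS_PER_SEGMENT && !slot_is_erased(slot))
        slot++;
    return slot;
}

/* Copy the newest record into out; returns 0 if nothing is stored yet. */
int read_latest(uint8_t out[RECORD_SIZE])
{
    unsigned slot = first_free_slot();
    if (slot == 0)
        return 0;
    memcpy(out, &segment[(slot - 1) * RECORD_SIZE], RECORD_SIZE);
    return 1;
}

/* Append a record; erase the segment only when every slot is used. */
void write_record(const uint8_t rec[RECORD_SIZE])
{
    unsigned slot = first_free_slot();
    if (slot == RECORDS_PER_SEGMENT) {
        sim_erase();
        slot = 0;
    }
    sim_program(slot * RECORD_SIZE, rec);
}
```

Writing 300 records this way costs only two erases, because each erase is followed by 128 appends.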

With a segment erase on every write I again get 7 or 70 days of endurance. With successive writes (a 512-byte segment holds 128 four-byte records) I only need one erase every 128 writes, which extends endurance to about 2.5 or 25 years.
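The endurance figures above follow from simple arithmetic: total writes = erase endurance × writes per erase, and lifetime = total writes × interval between writes. A small sketch (the function name is hypothetical) that reproduces these numbers for one write per minute:

```c
/* Estimated lifetime in days: each erase cycle allows writes_per_erase
 * writes, and one write happens every minutes_per_write minutes. */
double lifetime_days(double endurance_cycles, double writes_per_erase,
                     double minutes_per_write)
{
    double total_writes = endurance_cycles * writes_per_erase;
    return total_writes * minutes_per_write / (60.0 * 24.0);
}
```

`lifetime_days(10000, 1, 1)` is about 6.9 days, while `lifetime_days(10000, 128, 1)` is about 889 days, roughly 2.4 years; the 100,000-cycle figures are ten times larger.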

But note: when using a scheme like this, the cumulative programming time between erases must not exceed the limit given in the datasheet (often 10 ms). With byte or word writes that should not be an issue.

Also, flash failures on the MSP430 are always bits that read 0 instead of 1 and can no longer be erased. So if you examine a failed device and find bit failures like that which could explain your problems, you have a strong indication of what's happening.
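Once a failed unit's flash has been dumped, comparing it with the original image shows which kind of failure occurred. As a sketch (hypothetical names, assuming both images are available as byte arrays), counting 1-to-0 and 0-to-1 bit changes separately distinguishes the worn-cell pattern described above (bits stuck at 0) from bits that read 1 where 0 was programmed, which suggests the region was erased, for example by a stray erase.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t one_to_zero;   /* expected 1, read 0: consistent with worn cells */
    size_t zero_to_one;   /* expected 0, read 1: consistent with an erase */
} bit_diff;

static unsigned popcount8(uint8_t v)
{
    unsigned n = 0;
    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Count differing bits between the known-good image and the dump, by direction. */
bit_diff compare_images(const uint8_t *expected, const uint8_t *actual, size_t length)
{
    bit_diff d = {0, 0};
    for (size_t i = 0; i < length; i++) {
        d.one_to_zero += popcount8((uint8_t)(expected[i] & ~actual[i]));
        d.zero_to_one += popcount8((uint8_t)(~expected[i] & actual[i]));
    }
    return d;
}
```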

But to be sure, attach a debugger and read the memory. Since the unit has already failed, there's no reason to be afraid of cutting traces and soldering wires to the pins :-)

Jens-Michael Gross over 13 years ago in reply to Jan Kesten

Guru 227245 points

Besides flash wear out, there are additional reasons why flash may fail.

I have seen occasional flash failures after a massive overvoltage was applied to a device (those were PICs, not MSP430s, but the flash was still erased).

Overtemperature can also be a problem. If flash is exposed to high temperatures, its data retention time drops significantly. We had a lot of PICs whose preprogrammed code vanished during soldering.

And you wouldn't believe what customers do with their devices. Unless the case melted, they will never tell you that the device was baking in the sun, frying inside a machine's motor housing, or connected to 400 V instead of 230 V.

Doug Jorgesen over 13 years ago in reply to Jan Kesten

Intellectual 290 points

There is a lot of useful information in these documents, but I'm not sure I'm understanding it properly. I have a few questions:

Is it saying that every signal output should have a series resistor and clamp diodes to VCC and ground to absorb ESD? If not every signal, how do you decide which ones?

I have RF connectors that connect to my case ground, but I deliberately left the USB shield floating, since I read that was better. Can anyone clarify what to do with a USB connector shell?

The section on software considerations for flash was very confusing. I write to flash only occasionally, when there is user input, so I should be well below the 10,000-cycle limit. I'm just doing a basic flash write; I don't know what it means by variable-generated keys, address range checking, destruction of variable keys, or writing a checksum of the data. Are these critical? What do they buy you?

I would really like to pull the chip and analyze what is in program memory right now, but I don't know how to do that either. How do you read the memory of a chip that you haven't set up for debugging and that has been reset repeatedly since you took it off the debugger? This would be a really useful ability for me to have!

Thanks for all the help so far!

Jan Kesten over 13 years ago in reply to Doug Jorgesen

Expert 1610 points

Hi Doug,

ESD protection is necessary on any path that is exposed to ESD, especially any signal that leaves your case, any shield connections, and so on. Transient voltage suppressors are also available to do the job.

Keeping the flash key in a variable that is computed at run time, instead of a constant, simply helps ensure that the program is running "correctly" when it writes to flash: it reduces the chance that a runaway jump into the flash-writing routine actually modifies flash. Reading the flash back after writing verifies at least that the write succeeded and that there is no immediate flash failure. Storing a checksum of the flash contents lets you detect, if you need to, whether a flash error has occurred since the last write (some MSP430s have a built-in CRC module for this).
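Two of those measures, a checksum over the saved data and an address range check before writing, are easy to sketch in plain C. This is a minimal sketch under stated assumptions: it uses a bitwise CRC-16/CCITT (polynomial 0x1021, initial value 0xFFFF) rather than any particular MSP430 CRC module, and `SETTINGS_START`/`SETTINGS_END` are a hypothetical address window for saved settings. The idea would be to store the CRC alongside the flash_data record and recompute it at startup, and to call the range check at the top of the write routine so that a call with a bad pointer does nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection. */
uint16_t crc16_ccitt(const uint8_t *data, size_t length)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < length; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical window reserved for saved settings (end is one past the last byte). */
#define SETTINGS_START 0xF000u
#define SETTINGS_END   0xF200u

/* Allow a write only if [addr, addr + length) lies entirely inside the window. */
int flash_range_ok(uint32_t addr, uint32_t length)
{
    return addr >= SETTINGS_START && addr <= SETTINGS_END
        && length <= SETTINGS_END - addr;
}
```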

There is a tool from TI, MSP430Flasher, which should let you read the flash contents of an MSP430 without reprogramming it (I don't know whether this can be done directly from Code Composer Studio or IAR). You will need an emulator and access to the SBW or JTAG pins to do this.

It is interesting, though, that there is RF nearby; that could cause problems via EMI. To say more, we would need more information about the radio and the layout.
