Obscure device failure during production: Software stops, no debug possible

Harald Ilg


Hi there,

We are at start of production (SOP). The firmware of our BLE device (CC2640, battery powered, some I2C devices) proved stable in extensive field testing, with no HW issues either.

Now, in the first two batches of series production, a strange effect hits about 5% of all devices: during casting, the processor stops working. The devices no longer advertise, cannot be connected to the JTAG debugger, and the Flash Programmer does not recognize them.

After a HW reset the devices run normally again. Unfortunately, this does not help, because the JTAG connector is not accessible in the final product.

Apparently there is no current draw; at least the battery voltage seems normal, even after three weeks in the failed state. As far as I can measure, there are no current peaks of the kind typical for a running device (e.g. during advertising). So far we have not implemented a watchdog, but we do have a software "advert lifeguard": if there has been no advertising callback for longer than 3 min, the device resets. So apparently the SW is no longer running.
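For readers unfamiliar with the idea, the "advert lifeguard" logic can be sketched roughly as follows. This is only a minimal illustration under assumptions: the type and function names (`advert_lifeguard_t`, `lifeguard_on_advert`, `lifeguard_expired`) and the millisecond time base are hypothetical and not taken from the actual firmware or from the TI BLE stack. Only the 3 min timeout comes from the post.

```c
#include <stdbool.h>
#include <stdint.h>

/* 3 minutes, as stated in the post; everything else here is an assumption. */
#define LIFEGUARD_TIMEOUT_MS (3u * 60u * 1000u)

/* Hypothetical state: time of the most recent advertising callback. */
typedef struct {
    uint32_t last_advert_ms;
} advert_lifeguard_t;

/* Start the lifeguard at the current time. */
void lifeguard_init(advert_lifeguard_t *lg, uint32_t now_ms)
{
    lg->last_advert_ms = now_ms;
}

/* Call from the advertising callback to record that adverts still happen. */
void lifeguard_on_advert(advert_lifeguard_t *lg, uint32_t now_ms)
{
    lg->last_advert_ms = now_ms;
}

/* Call periodically; returns true when the application should reset.
 * Unsigned subtraction keeps the result correct across counter wraparound. */
bool lifeguard_expired(const advert_lifeguard_t *lg, uint32_t now_ms)
{
    return (uint32_t)(now_ms - lg->last_advert_ms) > LIFEGUARD_TIMEOUT_MS;
}
```

A check like this is only evaluated while the time base behind `now_ms` keeps advancing and the periodic caller still runs. If the clock or the CPU stops, the timeout never fires.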

We did implement an exception handler that writes some debug info to an external EEPROM and resets the device, but apparently it was not triggered.
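As an illustration of what such a debug record might look like, here is a small sketch that packs a few registers into a byte buffer with a marker and a checksum, so that a later reader can tell a real record from erased or corrupted EEPROM content. The layout, the marker value, and all names (`fault_record_t`, `fault_record_encode`, `fault_record_decode`) are assumptions for illustration. The post does not say what is actually stored, and the EEPROM write itself is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define FAULT_RECORD_MAGIC 0xFA17C0DEu /* hypothetical marker value */
#define FAULT_RECORD_BYTES 17u         /* 4 magic + 3 * 4 registers + 1 checksum */

/* Hypothetical subset of the stacked exception frame worth keeping. */
typedef struct {
    uint32_t pc;  /* program counter at the fault */
    uint32_t lr;  /* link register at the fault */
    uint32_t psr; /* program status register */
} fault_record_t;

static void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Simple additive checksum over n bytes (modulo 256). */
static uint8_t sum8(const uint8_t *p, size_t n)
{
    uint8_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s = (uint8_t)(s + p[i]);
    }
    return s;
}

/* Serialize the record; returns bytes written, or 0 if the buffer is too small. */
size_t fault_record_encode(const fault_record_t *rec, uint8_t *out, size_t out_len)
{
    if (out_len < FAULT_RECORD_BYTES) {
        return 0;
    }
    put_u32le(out, FAULT_RECORD_MAGIC);
    put_u32le(out + 4, rec->pc);
    put_u32le(out + 8, rec->lr);
    put_u32le(out + 12, rec->psr);
    out[16] = sum8(out, 16);
    return FAULT_RECORD_BYTES;
}

/* Parse a record; returns 1 if marker and checksum match, else 0. */
int fault_record_decode(const uint8_t *in, size_t len, fault_record_t *rec)
{
    if (len < FAULT_RECORD_BYTES || get_u32le(in) != FAULT_RECORD_MAGIC ||
        sum8(in, 16) != in[16]) {
        return 0;
    }
    rec->pc = get_u32le(in + 4);
    rec->lr = get_u32le(in + 8);
    rec->psr = get_u32le(in + 12);
    return 1;
}
```

With a scheme along these lines, an EEPROM that still reads back as erased would indicate that no exception was recorded, which matches the observation that the handler was apparently not triggered.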

Any ideas on what else could be the reason for this? A stopped quartz crystal? Electromagnetic interference?

What measurements could help to find the root cause?

Regards

Harald

over 9 years ago

0 Christin Lee over 9 years ago

TI__Guru 56545 points

Hi,

Can you try to move the chip on the failing board to a functional board? Then we can at least see if it's a board problem or chip problem.

0 Harald Ilg over 9 years ago in reply to Christin Lee

Genius 3060 points

Hi Christin,

Thanks a lot for your response. I probably did not express myself well: as soon as I perform a HW reset (connect pin 10 to pin 9 on the JTAG connector), the device runs like a charm. And I am pretty sure that I can't unsolder the chip from the board without triggering a reset.

Nothing is damaged; the device is only in a bad state.

Regards

Harald

0 Christin Lee over 9 years ago in reply to Harald Ilg

TI__Guru 56545 points

Is it reproducible after you reset the device?
Can you send us some photos of the final product packaging, schematics and layout?

0 Harald Ilg over 9 years ago in reply to Christin Lee

Genius 3060 points

Hi,

> Is it reproducible after you reset the device?

No, the only situation where it happens is during casting of the device. We have never observed any device getting into this state before or after this process.

We now have one theory that would explain the behaviour:

During encapsulation of the device, for whatever reason (temperature, mechanical stress), one pin (e.g. of the quartz) lifts, so that the device stops. After cooling and curing of the casting compound, the pin returns to its original position, but the controller/quartz does not start again because it does not get a proper reset signal. As soon as we actively reset it by pulling down the reset pin, everything is back to normal.

Does that sound reasonable to you?

Regards

Harald

0 Allen Jameson over 9 years ago in reply to Harald Ilg

Intellectual 370 points

Harald,

I'm asking simply out of curiosity, in case I ever encounter the same thing, since you linked to this from the other thread. What is your casting process - what's the material, temperature, and cooling time?

I have had some unexplained failures where a hard reset works (and we do currently have the JTAG exposed for troubleshooting) but I would also be very interested in seeing if implementing the built-in watchdog helps your situation. That seems like the best thing to try in your case. I am currently planning on converting from a software implementation "soft watchdog" to the hardware watchdog, and hope this helps for our issues.

Best,
--Allen

0 Harald Ilg over 8 years ago in reply to Allen Jameson

Genius 3060 points

Hi there,

Although this was almost a year ago, I still owe you our explanation: we found out that the 32 kHz quartz is apparently very susceptible to being touched.

We defined new handling rules for production (wear gloves, ESD protection, etc.) and managed to eliminate the issue almost completely. In addition, a small SW change described here helped, too.

Regards

Harald

0 LeonardEllis over 8 years ago in reply to Harald Ilg

TI__Guru 71526 points

Harald,

Thank you for following up with this useful information. ESD protection and handling methods are indeed sometimes overlooked, and the consequences can be difficult to diagnose. Thank you for identifying and documenting this device handling issue.

~Leonard

0 Allen Jameson over 8 years ago in reply to Harald Ilg

Intellectual 370 points

Harald,

Thanks very much for your reply, it's actually very timely. I was just trying to diagnose some board failures on the line last week and this might help.

Did you ever see this (suspected) issue cause permanent device damage? I have two boards that will no longer connect to the debugger through JTAG, regardless of performing hard resets or completely removing power.

Thanks,

--Allen

0 Harald Ilg over 8 years ago in reply to Allen Jameson

Genius 3060 points

Hi Allen,
Actually, I do have quite an impressive pile of "bricked" devices on my desk, but I can't blame any of them on ESD issues. It seems that there are some circumstances, triggered by software, that put devices into a state they cannot be revived from. For instance, three devices passed away during implementation of the watchdog. OAD development also has some casualties on its conscience. However, the effect described in this thread can always be reverted by a hardware reset.
Regards
Harald
