This thread has been locked.

TM4C1294NCPDT: Catastrophic device failure

Part Number: TM4C1294NCPDT
Other Parts Discussed in Thread: TM4C1294NCZAD, STRIKE, EK-TM4C1294XL

Hi,

I recently had a TM4C1294 Micro-Controller fail catastrophically after 3 months of normal operation. The failure was on a design that is a couple of years old, and it is a failure mode that I have not witnessed previously. The device was pulling down the 3V3 supply line. A resistance measurement from VDD to 0V on the device was less than 1 ohm. We replaced the device on the board, which now runs as expected. A thorough inspection of supply voltages and signals associated with the processor yielded no clues as to the root cause of the failure.

I have considered ESD damage, but in the past this has generally manifested as flash corruption and not catastrophic failure. I also considered CMOS latch-up, but I believe that this has been eradicated from modern CMOS devices. An assembly/manufacturing issue is another possible cause, but our initial inspection of the failed device revealed no obvious assembly issues, no shorts etc.

If anyone has any ideas as to possible root causes or has experienced a similar failure mechanism, I would be interested to hear from you.

Thank you.  

  • Might the 'facts supplied' - lead to (more) - rather than fewer questions?

    RKRobinson said:
    Micro-Controller fail catastrophically after 3 months of normal operation.

    RKRobinson said:
    failure was on a design that is a couple of years old

    Your report notes a, 'Couple of years old design' (assumed successful)  - yet the (particular) failed device has performed 'normally' - for (only) 3 months!     Would not (some) explanation of why this board was (apparently) 'Placed into Service' (far later) than the 'years old' others - prove useful?

    • Is it possible that this failed board was a 'Replacement?'    (for an earlier failed board)    If so - that suggests a challenging operating environment.
    • Are you regularly placing such (new) boards into service?    (which also describes the '3 month life' of this failed board)
    • Might any component, software or assembly/handling 'changes' have impacted this failed board?     Its 'late in the campaign' introduction is unexplained - 'improper handling' (if the board was built 'years ago' - with others) may prove reasonable.
    • Was the MCU the 'ONLY' component to Fail?    That's a 'key/critical' clue - yet silent...    (as the MCU shorted - the board's 3V3 Reg. either went into 'current-limit' - or it too - died from excess demand.)

    From my firm's (long) experience w/ARM MCUs (multiple vendors: ARM Cortex: M0, M3, M4, M7) it is suspected that:

    • VDD exceeded specification - either (rather) severely (voltage-wise) or (duration-wise) ... or both - in combination!    (especially deadly...)
    • Board was 'Reverse Powered!'    

    Under rushed lab conditions - I (past) 'bypassed' our 'SAFE - Polarized Connector'  (thus SWITCHING the Power Leads) - and BLEW:

    • the MCU
    • Power Inlet Electro. Caps
    • both (2) voltage regulators!    

    Our MCU then 'read' - just as you described...     (It is hoped that I've 'made the case' for a 'Broader Reporting of board damage' - which reduces the number of 'Troubleshooting Variables' - always desirable.)

    Lastly - the 'Frequency of Board Failure' - rather than the noting of 'Just one Failure' - proves far more insightful!      One failure out of  "hundreds of successes" must be treated differently than "order of magnitude  fewer" successes...

  • Hello RKRobinson,

    Is this a one-time failure that occurred, or a repeating occurrence? If repeating, how many boards have been affected, and what is the frequency of the occurrence of the failure? Also if there is a case of multiple devices, is the failure condition identical in all failed devices?

    It sounds like from your initial post that just a single device failed. If that is the case, it would be very hard for us from an applications support standpoint to offer valuable feedback. You've already discussed the most common failure causes we observe.

    If understanding the root cause for a single unit failure is your primary intent, then it would be best to try and submit a failure analysis request. If you have a local TI support team they can handle this, or you can contact the distributor you source the device from.
  • Ralph,
    This is one failure in about three years of using the device. So at this point I was looking for anything I might have missed.
    Thanks.
  • Could you provide a rough estimate as to how many devices have been built using this design over the 3 years?  What I'm getting at is whether this failure is 1 out of 100 or 1 out of a million.

    I primarily ask because I happen to have a device with an identical failure (VDD shorted to ground, repaired by replacing TM4C device) with no obvious cause.  Our design and manufacturing methods are extremely robust (vicor power supplies, massive amounts of decoupling, near ridiculous levels of ESD protection on inputs, IPC Class 3 assembly, moisture bake-out of all MSL2+ devices prior to assembly, silicone conformal coating, etc), but we've only produced a few hundred units over roughly 2 years.  I originally chalked it up to ESD or transient voltage spike damage, but your failure being so similar to mine makes me question my conclusion.

    If you've built tens of thousands or more units and only had one failure, then I won't be concerned enough to spend much time on this issue.  If, however, like us, you've not manufactured all that many devices and have seen the same failure that we have, I might start asking if you would share which TI lot number your device came from.

    Ben

  • Small update: Right after posting, I realized that we are technically using different devices. I'm using TM4C1294NCZAD (BGA Package) while you're using TM4C1294NCPDT (QFP Package). That probably makes TI Lot number comparison less interesting without TI referencing the lot numbers back to bare die lots. I'm assuming that the die is the same in the PDT and ZAD package, but even that might not be true I suppose.

  • 'ESD' indeed receives 'prime suspect' status - when other (potential, even likely) causes - are not in high evidence.     And - as you likely know - ESD may occur anywhere - and repeatedly - w/in the 'entire' entry & sequential handling/processing - w/in your facility.    Further - once 'In the chip's house' - ESD defects often grow - and become sufficient in magnitude - to finally  (days/months later)  - be noted.    (i.e. cause some failure)

    Note that first I - followed then by this vendor - and now you - have each made clear requests for the 'Failing devices' 'Percentage w/in Delivered!'     That key feedback was 'never' properly provided.

    While working at a similar semi-giant - it was known that 'chip decap' - followed by optical inspection - often provided a (quick/dirty) - 'first-pass'  failure analysis.    The shrinking device geometries have caused VDD/VSS internal routings - more often than not - to be 'isolated.'    (i.e. each 'side' of the MCU has unique VDD/VSS routings.)     (such may be confirmed here - by 'careful & proper' (ESD Safe) probing - of a fresh device.)

    When the VDD-VSS 'short' - as you note - was confined to a single  MCU/IC side - such strongly pointed to the 'offender' arriving upon 'that' side.     When 'all' 4 MCU sides were impacted - most always - supply reversal or (prolonged) excursion - beyond VDD-VSS spec - was most likely.

    Is that device still available?    (It should be safeguarded - even though 'deceased!')     Most semi vendors will perform such post mortem reviews (graciously) - should you meet their (unique by vendor) 'qualifications.'     (Annual purchase volume - speaks most LOUDLY!)     Certain (advanced) board assembly houses may possess the equipment required  to perform such analysis - as well... (again - unless you have a 'solid' relationship - your checkbook - almost surely - will lessen...)

  • All,

    Thank you for your replies. As it stands now we have two failed devices out of 40 manufactured. This particular product is a new design, but we have been using this device in other products probably for the last three years without any failures. All supply lines are TVS protected and all user accessible I/O is TVS protected. The design was initially tested with an ESD gun and I have revisited this testing. However I have been unable to replicate the failures.   
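
    As a rough illustration of why the failure fraction keeps being asked for in this thread, the sketch below puts a confidence interval around an observed failure rate such as 2 out of 40. It uses the Wilson score interval. The function name `wilson_interval` and the 95% z-value are choices made for this example only, not anything taken from the posts above.

```python
import math


def wilson_interval(failures, units, z=1.96):
    """Return the (low, high) Wilson score interval for a failure proportion.

    failures: number of failed units observed
    units:    total number of units built
    z:        normal quantile (1.96 is the usual choice for ~95% confidence)
    """
    if units <= 0:
        raise ValueError("units must be positive")
    if not 0 <= failures <= units:
        raise ValueError("failures must be between 0 and units")

    p = failures / units
    denom = 1 + z ** 2 / units
    centre = (p + z ** 2 / (2 * units)) / denom
    half = z * math.sqrt(p * (1 - p) / units + z ** 2 / (4 * units ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    low, high = wilson_interval(2, 40)
    print(f"2 failures in 40 units: {low:.1%} to {high:.1%}")
```

    With 2 failures in 40 units the interval is wide, roughly 1% to 17%, so a build of this size cannot pin the underlying failure rate down very precisely, but it is clearly a different situation from one failure in many thousands of units.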

  • RKRobinson said:
    The design was initially tested with an ESD gun... and revisited...

    Was such testing performed - both w/your equipment powered (under proper bias) and then (again) unpowered/unbiased?       It IS possible (even likely) for ESD events to occur when the board/system is unpowered - and is it certain (guaranteed) that ALL Protection components - will (still) function - at & with 'Full Force' - during those conditions?

    Were the components purchased from the identical source?     If low volume (non Reel) purchase - might your (failing) MCUs have received 'less than pristine handling' @ the Disty?    (clients of ours HAVE Reported such...  Distys  and/or their supporters - should 'direct Fire' at our clients - NEVER upon (this blameless/simple)  reporter...)

    Two out of 40 would, 'Warrant my firm's attention!'     Surely you've investigated 'Any/All' Differences - between those '3 year - failure free products' - and these 'new, failing ones.'     As one quick/eased means to 'Bridge that Gap' - might you replace several of the MCUs - used w/in the 'failure-free' product - with the newly arrived MCUs - and then 'Run those 24/7 (or as appropriate)' trying to 'Induce such failure?'    

    Along those lines - our testing of boards (bound for defense/medical device) always employs 'Repeated & Continuous Temperature Cycling (Hi to Lo) - with the boards at/near Max Rating - for 'client agreed' duration.'    Boards - able to pass such testing - in our experience - ALWAYS ENJOY THE LOWEST FAILURE RATE - without doubt...

  • Ralph,

    We have now had a total of 5 devices fail. All short circuit VDD-GND. I have revisited the design and looked at the following.

    VDD ripple: 48mV p-p also protected by a transil.

    All unused I/O pins connected to GND.

    Unused USB pins pulled to GND with 1K resistors.

    We do have the VBAT pin connected to VDD as specified in the datasheet when not used. I did check the rise time on VDD and it is far slower than the 0.7 V/µs figure.

    Main Crystal oscillator follows table recommendations from datasheet. 

    Reflow profiles from our vendor.

    MSL issues: handled as MSL 3 devices according to our vendor.
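
    The VDD rise-time check in the list above amounts to comparing a measured slew rate against the 0.7 V/µs figure quoted there. A minimal sketch follows, assuming two oscilloscope cursor readings on the rising edge. The function names and the example readings are hypothetical, and the 0.7 V/µs value is treated here as an upper bound purely on the basis of the wording in the post.

```python
# Figure quoted in the post above; treated as an upper bound (assumption).
VDD_SLEW_FIGURE_V_PER_US = 0.7


def slew_rate_v_per_us(v_start, v_end, t_start_us, t_end_us):
    """Average slew rate in V/us between two scope cursor readings."""
    dt = t_end_us - t_start_us
    if dt <= 0:
        raise ValueError("t_end_us must be later than t_start_us")
    return (v_end - v_start) / dt


def is_slower_than_figure(slew, figure=VDD_SLEW_FIGURE_V_PER_US):
    """True when the measured slew rate is below the quoted figure."""
    return slew < figure


if __name__ == "__main__":
    # Hypothetical readings: 10% and 90% points of a 3.3 V rail, 500 us apart.
    slew = slew_rate_v_per_us(0.33, 2.97, 0.0, 500.0)
    print(f"slew = {slew:.5f} V/us, slower than figure: {is_slower_than_figure(slew)}")
```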

    We used the LM3S9B96-IQC80-C5 for the last ten years prior to moving to the TM4C1294, without a single failure that I can recall. We have devices that have been out in the field for at least 3-4 years without failure. I would have expected that if it was a design issue we would have seen more field failures. Is it possible to return devices for a postmortem? If so, how do we go about this? Would it also be possible to talk to or have a visit from an applications engineer?

    Thanks,

    Richard.

  • Hello Richard,

    TM4C1294 has more than a couple of differences from LM3S devices; they aren't exactly drop-in replacements. Yes, many design practices can be re-used, but I don't feel it's safe to compare the two in the way you are implying.

    That said, I absolutely agree that such failures are not expected and that something with the design has to be causing a currently unidentified issue.

    If you have not done so yet, I would recommend also cross checking the design against our System Design Guidelines: www.ti.com/.../spma056.pdf

    Regarding returning devices, yes you definitely can request an analysis of what went wrong. This can be done by your local field sales team, or if you don't have such contacts, through this form: www.ti.com/.../createReturn.tsp

    The failure analysis process may be able to inform on possible root causes for device damage so you have a more narrow focus on investigation, depending on how the damage presents itself.
  • Ralph,
    The original design was done in accordance with the design guidelines. I have in the last few days re-visited the datasheet, design guidelines and errata. I have also compared the LaunchPad design against our design. Can I send you the design and have it reviewed? Because if it is a design issue, I am unable to locate it. We ported all of our code to TI-RTOS. So other than I/O pin configuration we are using TI drivers.
    Thanks,
    Richard.
  • Hello RKRobinson,

    Unfortunately, we do not handle schematic and design reviews on the E2E forums. In general when asked for that, we recommend customers work with 3rd party designers such as those in the TI design network.
  • While, 'Oh for two' - in response received re: past diagnostic suggestions - arrives (one more!)

    Firm/I have observed (hearsay) that often the 'offending pin' (which introduced the MCU's destruction) will 'reveal itself' when the 'MCU is probed' (on or off board and unpowered.)      (DMM set to KΩ.)     The 'offender(s)' may (also) be shorted to VCC/GND (now joined) or be of 'significant difference' - when compared/contrasted to (other - usually GPIO pins.)

    This cannot equal the vendor's (proper) examination - but HAS proven of value (several times) and produces (far faster) results...    (when successful)
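
    The compare/contrast probing described above can be tabulated quite simply: record each pin's resistance to ground on a known-good MCU and on the failed one, then flag pins that read as a short or that differ by a large factor. The sketch below is only an illustration. The pin names, readings, and both thresholds (`short_ohms`, `ratio`) are hypothetical values chosen for the example.

```python
def flag_suspect_pins(good, bad, short_ohms=10.0, ratio=10.0):
    """Compare per-pin resistance-to-ground readings (in ohms).

    good, bad: dicts mapping pin name -> DMM reading on the known-good
               and the failed MCU respectively (unpowered).
    Returns a dict of pin name -> "short" or "deviates" for suspect pins.
    """
    suspects = {}
    for pin, r_bad in bad.items():
        r_good = good.get(pin)
        if r_good is None:
            continue  # no reference reading for this pin
        if r_bad <= short_ohms:
            suspects[pin] = "short"
        elif max(r_good, r_bad) >= ratio * min(r_good, r_bad):
            suspects[pin] = "deviates"
    return suspects


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    good_mcu = {"PA0": 250e3, "PA1": 240e3, "PF0": 260e3}
    failed_mcu = {"PA0": 245e3, "PA1": 0.8, "PF0": 12e3}
    for pin, verdict in flag_suspect_pins(good_mcu, failed_mcu).items():
        print(pin, verdict)
```

    Because VDD and GND are already joined on the failed part, readings on some pins may be pulled low through the supply short, so a flagged pin is a lead to follow up rather than proof of where the damage entered.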

  • cb1,
    Thanks. I will investigate the I/O pins.
  • cb1_mobile said:
    Firm/I have observed (hearsay) that often the 'offending pin' (which introduced the MCU's destruction) will 'reveal itself' when the 'MCU is probed' (on or off board and unpowered.)      

    Likewise, a DMM diode check tends to find any shorted I/O gates or rail diodes just as quickly if not quicker, with the MCU left on the PCB.

  • Resolved posting reveals the truth is not the truth after all..
  • BP101 said:
    DMM diode check tends to find any shorted I/O gates or rail diodes just as quickly if not quicker

    How do you justify your 'If not quicker' comment?     Recall that poster was advised to, 'Check for Resistance Differences' - comparing/contrasting 'good MCU'  vs. 'bad MCU.'    When (many/most) DMMs are set to "Diode Test" - they prove 'ineffective' in making resistance measurements.    And it was those 'mid-range' resistance measures - which my group had found to be useful.

    By testing ONLY via your directed "Diode Test" - those critical resistance measurements will require 'function switching' (into the resistance mode) - and thus take FAR LONGER to satisfactorily complete!    In addition - you will note that the PAST SPECIFIED 'Resistance Mode' EASILY DETECTS (VDD or Gnd) shorts - without ANY requirement for your (clearly) DELAYING,  (near)  endless 'function switching.'

  • Yet typical curve tracer employs drop technology XY vector to detect bad gates or rail diodes. Ohms check applies 1 volt to the I/O and slowly charges surrounding rail caps, not as revealing of partially shorted GPIO pins.
  • When in doubt, switch it out: DMM diode mode immediately checks the voltage drop of each pin's rail diode! Though with a catastrophic 1 ohm VDD-to-GND short, it is unlikely to reveal any better which pin(s) shorted without removing the MCU.
  • Cb1,

    We checked all the pins on the device and found several I/O pins that read short circuit to ground. Interestingly, two pins that have no connection were short circuit. This led me to believe that the damage was due to an ESD event, as I can think of no other reason for the gate structure to fail on an unconnected pin. Two other pins that drive LEDs on the back of the board had failed in a similar way. I have tracked down what I believe to be the source of the ESD problem. This turns out to be a plastic part being attached to the board assembly. Thank you for your help.

    Richard.

  • Hello Richard,

    Thanks for the update, and thank you to cb1 for the excellent tip that helped solve this.

    If you would, Richard, could you Verify the post from cb1 I have suggested as the resolution? That credits him correctly for the assistance in case any future users stumble onto this thread. :)
  • Thank you (both) poster Richard and of course - Vendor's Ralph - who provided encouragement - throughout this 'diagnostic exercise.'

    Note too - that there is an 'increasing desire' to, 'Install signal Leds upon the main (MCU) board - and yet, 'Pipe that light' (via a plastic 'light-guide') to a control panel - and that method INDEED ... proves ESD INVITING!

    Any/all connections - even those (strictly) mechanical - which exit the MCU's pcb - yet arrive @ 'the outside world' (panel, usually) are HIGHLY SUSPECT - and demand (very) careful consideration!

  • Have you considered that the MCU has some ESD protection built into most GPIO pins, along with human body contact certifications? I highly doubt that ESD alone delivered such a plight upon said pins, unless it was a blue arc spike above 20 kV. More likely it was EMI from a nearby lightning strike and DC power supply intrusion.

    Placement of an MOV at the DC supply input will be invaluable in areas prone to strikes, upon roofs covered in solar panels with not a single lightning rod tied to ground, etc.
  • All,

    Just as an added data point. I decided to take one of TI's EK-TM4C1294XL booster pack boards and subject the I/O pins to some ESD discharges. The processor failure mode was identical to the failures we have seen in our boards. 

    Thanks,

    Richard.  

  • Thank you - that's an excellently detailed report - and 'further stresses' the importance of 'STRICT ESD MEASURES - AT ALL POINTS - IN THE MCU RECEIVING/ASSEMBLY/TEST/PACKING PROCESSES!'

    Proper & Eternal  (and ENFORCED)  Vigilance  - appears the best, 'Take Away.'      Even as - and especially as - the necessary wrist-straps & static dissipation efforts - prove uncomfortable & inconvenient.

    Might you identify the 'Transil vendor' - ideally even device part number - if possible?     My firm has employed a (very) FAST N-FET - triggered by an (even faster) detection circuit - to 'Voltage Clamp' critical signal nodes - which are (necessarily) always monitored & so protected...

    [edit] ... somehow - poster Richard's 'Transil' and (other) effective ESD protective techniques posting - has 'disappeared!'     (replaced w/ 'Thanks')      Noted as this post was constructed (specifically) in 'follow' to poster Richard's (now AWOL) earlier posting...

  • Curiosity gets the Transil, anything like TVS diodes?

    Toshiba makes a tiny 20 kV bidirectional TVS (DF2B5M4SL), tested with a level 2-3 ESD gun; it is so small that it is difficult to see without magnification.

    Another nice 3V3 - 3V6 TVS is the OnSemi ESD9C3.3ST5G (15 kV air, 8 kV contact, >16 kV class 3 human body), <1 ns response, with SMT gullwing; a very easy add-on to aftermarket booster pack GPIO pins, or to place on top of bypass caps (0402 pads). Yet how often does ESD simply come out of nowhere? It is typically delivered via human or animal contact.
  • Cb1,
    Not sure what happened to my original post; I have posted the general content again. In answer to your question, Transils are TVS diodes. It is just ST's trademark for their version of a TVS device.
    Richard.
  • Thank you - appreciated - several of our 'giant' clients seem to converge upon 'Littelfuse' TVS devices.  (that's how they spell it, folks)     Will have to compare/contrast the specs - always a delight - when different vendors employ varying definitions/references/names - and 'points of (unequal) measure.'

    It appears that one, maybe two sentences 'survived' - w/in your earlier writing - yet the 'meat' has (disappeared) ... 

  • RKRobinson said:
    I decided to take one of TI's EK-TM4C1294XL booster pack boards and subject the I/O pins to some ESD discharges

    RKRobinson said:
    The processor failure mode was identical to the failures we have seen in our boards. 

    Your experiment has not proved the origin of said catastrophic GPIO damage. Seemingly your ESD gun testing reveals your PCB was exposed to human body contact in the field. Would not your system PCB be located within a shielded semi-metallic container, per FCC rules and regulations?

    Of course the launch pad will fail if exposed to ESD zap from gun well outside datasheet specifications and you admitted level 3 test 20KV zapped MCU. That is not conclusive evidence that ESD was or is the cause of your original failure. TM4C129x design guide clarifies external exposed GPIO pins near human body contact should be protected by TVS or other protection device. Seemingly your custom PCB omitted any protection devices in similar exposed GPIO areas??

    Again ESD level 3 testing of launch pad not exactly an apples to apples comparison. More possibly jump to an incorrect diagnosis based on very limited posted thread information of failed GPIO pins purposed to external or exposed human body contact of any control circuits. 

    A proper test would include adding protection to any exposed GPIO pins of custom PCB and only expose failed pins of (perceived) human body contact to similar ESD testing. Air transfer ESD would be controlled by proper layout of PCB with frame ground planted inside semi-metallic enclosure connected to earth ground. Perhaps once you have correctly identified the source of GPIO damage being outside the guidelines of ESD level 2 or 3 exposure then corrective action can be taken.

    Seemingly you are making a logical deduction based on poor forensic evidence.   

  • BP,

    The whole point of the test was to establish whether the failure mode of the processor when exposed to ESD was the same as the failures we had experienced. I was deliberately aggressive, with the intent of making the device fail, using a level 3 contact discharge, which is 6 kV, not 20 kV. To this end the device failed with VDD shorted to GND. This is the same failure we have witnessed on 4 other devices. In addition to this, we had taken a previously failed device and checked all I/O pins for damage; at least 2 pins that had failed could be directly linked to a plastic part that was added to the assembly during our production process. Two other I/O pins that were unused and had not been tied to GND also failed, which I believe adds further credibility to the ESD theory. The failed device in question failed during the build process in production. This was not a latent failure. As I'm sure you are aware, the human body model is a metric, not an absolute. In the real world, ESD events can easily exceed what is characterized by the human body model, especially in areas of low humidity or in air-conditioned environments. Whilst it is impossible to state categorically that the root cause of these failures is ESD, there seems to be enough evidence to support the theory, and whilst the electronics as shipped are protected by a metal case, during the build process they are not. All electronics that the user can come into contact with are suitably protected and have been tested with an ESD gun. All supply lines are protected by ESD devices. This is not a new design, and there are a number of units that have been out in the field for over two years without failure. These failures have been a recent cluster, which points to recent changes and/or practices during production.
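
    For readers following the 6 kV versus 20 kV point, the sketch below tabulates ESD gun test levels, assuming the "level 3" mentioned here refers to the IEC 61000-4-2 test levels (the thread does not name the standard). The table name and the helper `lowest_level_covering` are hypothetical and exist only for this illustration.

```python
# Assumed to be the IEC 61000-4-2 test levels, in kV (assumption: the
# thread's "level 3 contact discharge" refers to this standard).
ESD_TEST_LEVELS_KV = {
    1: {"contact": 2.0, "air": 2.0},
    2: {"contact": 4.0, "air": 4.0},
    3: {"contact": 6.0, "air": 8.0},
    4: {"contact": 8.0, "air": 15.0},
}


def lowest_level_covering(kv, mode="contact"):
    """Return the lowest numbered level whose test voltage is >= kv.

    Returns None when kv exceeds the highest numbered level in the table.
    """
    if mode not in ("contact", "air"):
        raise ValueError("mode must be 'contact' or 'air'")
    for level in sorted(ESD_TEST_LEVELS_KV):
        if ESD_TEST_LEVELS_KV[level][mode] >= kv:
            return level
    return None


if __name__ == "__main__":
    print(lowest_level_covering(6.0, "contact"))
    print(lowest_level_covering(20.0, "contact"))
```

    Under that assumption, a 6 kV contact discharge sits at level 3, while 20 kV is beyond every numbered level in the table for either discharge mode.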

    This is a forum for sharing knowledge and ideas, your comments seem to be deliberately critical and unhelpful!

  • [Deleted by Forum Staff]

  • Hello BP101,

    RKRobinson has made it quite clear he is satisfied with the discoveries he has made and I see no issue with his findings.

    Based on what he has expressed in his last posting regarding your feedback, I've removed the contents of your recent post and would ask that you please no longer reply on this thread.