Hi,

We have a serious problem with LM3S1B21 C5 devices. I carefully read all the posts regarding the Stellaris flash problem, and I think we have encountered something really strange.

Facts:

1. First production run of 100+ boards (after several months of development with prototypes; the problems described here were not observed then, of course...).
2. LM3S1B21 C5, marked IQC80C5XD $A 16P090H G3.
3. PCB: two layers, IMHO quite well decoupled with many 100 nF caps.
4. Internal LDO (assuming the C5 revision resolved the flash corruption due to an improper supply voltage ramp...) decoupled with 2x1 uF and 3x100 nF caps. The observed startup supply ramps are definitely NOT as in the datasheet, but I believe that for the C5 revision this is OK.
5. Supply: LT3480 DC/DC converter, which looks quite good on the oscilloscope.
6. 7.3728 MHz main oscillator, 4.194 MHz for hibernation. We use battery backup circuitry, also with a TI battery charger, the BQ24745.
7. After the first boards were released, trouble: three boards were returned by our customer with corrupted flash. No JTAG connection; we could only read the device ID (JTAG tap: lm3s.cpu tap/device found: 0x4ba00477 mfg: 0x23b, part: 0xba00, ver: 0x4), with no possibility to start the JTAG tools. The JTAG recovery procedure did not work except on just one board, and there we regained JTAG control only if we omitted the last recovery step (power cycling the microcontroller). So we took a first deep look at the forum and activated the brown-out solutions. Our first software release used the default settings. So far so good: we hoped the trouble was caused by an unstable supply during power-up and power-down.
8. There is no clock activity on locked boards. Another funny thing with the BQ24745: in some cases it produces 100 Hz+ spikes on AC_OK, which propagate to the WAKE circuitry. (After our uC programs the BQ24745 the spikes are gone, so maybe wrong LM3S1B21 activity during a failure gets the BQ24745 into trouble...)
9. BUT, yesterday one board in the lab did not start (software with brown-out enabled). Inspection: flash corrupted. Many attempts with reset, JTAG, etc. And here comes our case: we are pretty sure that JTAG loads the proper software into the uC flash memory (read/verify); then comes a reset (any kind: EXT RST, soft reset, hibernation, POR) and we get one 1 kB page starting at 0x00014c00 filled with 0xff!!! (Only the 32-bit words at addresses 0x14c00, 0x14c08, 0x14c10, 0x14c18... are erased; the 32-bit words at 0x14c04, 0x14c0c, 0x14c14, 0x14c1c... are OK.) This situation was stable for 10...20 attempts, and always the same page was corrupted. What is going wrong?? And finally the board stopped working, and we have one more LM3S1B21 C5 not responding any more...
10. Is there a chance that our wrong EXT RST circuitry damaged the flash? We connected a 220k pull-up to VBAT instead of VDD. In some situations (e.g. hibernation) VBAT was present without VDD. It is fixed now (27k to VDD, a quite short connection very close to the uC), but we are still trying to find a reasonable explanation.
11. What else can we/should we test?
12. We are aware of the 100 flash reprogramming limit with the 10-year guarantee, but that is not the reason: our uCs have been programmed 2-3 times so far.
13. We are now going to set up an on/off test for a bunch of boards, and maybe we can catch more troubles like this in our lab, but of course time is running and customer demands are waiting...
Best regards,
Jerzy Kasperek
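The alternating pattern reported above (every second 32-bit word erased, i.e. an 8-byte stride) is easier to confirm from a raw dump than from a memory view. A minimal sketch, assuming the affected page has been read back over JTAG into a little-endian binary blob; the function names are hypothetical and the page in the demo is synthetic, not data from the failing boards:

```python
import struct
from typing import List, Optional

ERASED_WORD = 0xFFFFFFFF  # an erased 32-bit flash word reads back as all ones


def erased_word_addresses(page: bytes, base_addr: int) -> List[int]:
    """Return the addresses of all 32-bit words in `page` that read as erased."""
    count = len(page) // 4
    words = struct.unpack("<%dI" % count, page[:count * 4])
    return [base_addr + 4 * i for i, w in enumerate(words) if w == ERASED_WORD]


def common_stride(addresses: List[int]) -> Optional[int]:
    """Return the constant spacing between the addresses, or None if it varies."""
    steps = {b - a for a, b in zip(addresses, addresses[1:])}
    return steps.pop() if len(steps) == 1 else None


if __name__ == "__main__":
    # Synthetic 1 KB page: even-numbered words erased, odd-numbered words intact.
    page = b"".join(
        struct.pack("<I", ERASED_WORD if i % 2 == 0 else 0x12345678)
        for i in range(256)
    )
    hits = erased_word_addresses(page, 0x14C00)
    print([hex(a) for a in hits[:4]], "stride:", common_stride(hits))
```

A constant stride across the whole page would suggest a regular mechanism rather than random damage. Note that a word whose legitimate content is 0xFFFFFFFF is also counted, so the result should be compared against the original image.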
Hi Jerzy,
It sounds like you are looking in all of the correct places for solutions to this issue.
Can you elaborate on the brownout solutions you tried? Have you implemented the brown-out reset, or are you using brown-out interrupts?
Also could you please post oscilloscope traces of VDD and VDDC on power up and power down? If you would rather send these to me directly, I will send you my contact information.
Regards,
Christian
Hi Christian,
Thank you for your fast reply.
1. Regarding the brown-out reset: we just set the BORIOR bit in the PBORCTL register, and we observed that it works OK. We use MDK-ARM Professional Version 4.23.
2. Yesterday we connected 16 boards via a relay to a power supply controlled by a simple app: a loop that switches on for 10 s, then off for 25 s. That gives the whole module time to switch on and then go into hibernation. After a 15-hour test (>1500 off/on runs) all boards are working OK.
3. Power-down/power-up oscilloscope screenshots. A few words of explanation: we use an internal battery as the supply backup. To cover all cases I present power-up/power-down cycles with and without the battery.
3.1. Power-up without battery (the green signal is taken from the node where the battery and main supply meet, and from which the DC/DC converter producing all uC voltages is fed…):
3.2. Power-down without battery:
With the battery mounted, the uC is either running or hibernating:
3.3. Then we can switch off the main supply, and after a while the uC goes into hibernation:
3.4. Now we can wake up from hibernation (based on the tilt sensor); there is still no main supply:
3.5. Or we can wake up because main power returns:
3.6. And finally, with no main power we run on battery, and we can switch the unit off:
3.7. and switch the unit on again (battery only):
4. I've sent you the corrupted binaries as well as the schematics, but just to satisfy the curiosity of potential readers, here it is... (values marked in red show corrupted flash)
Any clues?
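As a quick sanity check on the relay test described above (10 s on, 25 s off, 15 hours, 16 boards), the cycle count can be worked out directly. A minimal sketch; the function name is hypothetical and the figures are the ones quoted in the post:

```python
def completed_cycles(test_hours: float, on_seconds: float, off_seconds: float) -> int:
    """Number of full on/off cycles that fit into the test duration."""
    cycle_seconds = on_seconds + off_seconds
    return int(test_hours * 3600 // cycle_seconds)


if __name__ == "__main__":
    cycles = completed_cycles(15, 10, 25)
    print(cycles, "cycles per board,", cycles * 16, "board-cycles in total")
```

This comes out at a little over 1500 cycles per board, consistent with the ">1500" figure. Whether that is enough to expose a rare failure depends on how rare it is, which the thread does not yet establish.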
While you await official response - suggest this approach to "tighten" your problem identification:
a) Eliminate all aspects of the Hibernate Mode - force the part to be always "awake." Goal is to determine first "if" - and then how/where/when/why Hibernate causes or contributes to your corruption issue.
b) Reduce the number of power routings and/or sequences you present to the "BUT." (board under test) Goal is to determine if certain routings/sequences always work - and then to determine if certain routings/sequences guarantee corruption. Urge you to use the "best quality power" possible during this exploration.
c) You have a very nice series of scope captures - after performing methods (a & b, above) exaggerate the most suspect waveforms (both timing and rise/fall times) in the attempt to raise the frequency of failure. (at this stage - we simply seek to force the issue - find where (or if) the MCU is most sensitive to such disruption) You should test a minimum of 3-10 boards at a time - each isolated from its neighbor. Do everything in your power to cause the corruption issue as quickly as possible.
d) You need to develop a faster, better means to detect the onset of corruption. (one quick thought - place simple port toggle code w/in your program such that the program must call & execute all major functions - and then toggle port LED. This code then loops.)
e) The fact that the corruption starts (and perhaps ends) at the same location argues against the suspected Power Fault. Where in your code do you read/write to this or nearby location? What normally resides in this location: code, data, anything special? Is this location a predictable "offset" from some other operation your code performs - possibly with this corruption as an unanticipated consequence?
f) And last - relocate your code (similar to boot-loader placement of code) to completely avoid this corruption zone. (if possible) If you get lucky - the corruption may persist at the vacated locations - hopefully to reduced (or no) bad consequence. Or (as I fear) the corruption zone will follow (seemingly track) your flash offset - which may provide further insight into its cause.
Perhaps something to consider as you await officialdom...
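One host-side way to act on point d) is to read the flash back periodically and compare it page by page with the image that was programmed. A minimal sketch, assuming both images are available as raw binaries of equal length and that pages are 1 KB as described earlier in the thread; the function name and the demo data are hypothetical:

```python
from typing import List

PAGE_SIZE = 1024  # 1 KB page, as reported for the corrupted region


def changed_pages(golden: bytes, readback: bytes,
                  base_addr: int = 0, page_size: int = PAGE_SIZE) -> List[int]:
    """Return the start address of every page that differs from the golden image."""
    if len(golden) != len(readback):
        raise ValueError("images must have the same length")
    return [
        base_addr + offset
        for offset in range(0, len(golden), page_size)
        if golden[offset:offset + page_size] != readback[offset:offset + page_size]
    ]


if __name__ == "__main__":
    golden = bytes(range(256)) * 16      # synthetic 4 KB image
    readback = bytearray(golden)
    readback[2048 + 5] ^= 0x01           # flip one bit in the third page
    print([hex(a) for a in changed_pages(golden, bytes(readback), base_addr=0x8000)])
```

Logging which page addresses change, and after which power sequence, would also speak to point e): whether the corruption really stays locked to one location.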
Will be brief - so as to not "get in the way" of official response you clearly seek. (however have encountered - and solved - such flash & ram MCU issues)
Jerzy Kasperek: "we got one 1kB page starting 0x00014c00 filled with 0xff !!! This was stable situation for 10...20 times and always the same page corrupted. What is going wrong??"
Confirming IDE memory-view screen-shot you provide does not agree - instead your red high-light starts @ 0x0000.cc00 ! Of course you are under pressure - rushed - but devil is in the detail in these situations.
After further thinking about this flash corruption - becoming doubtful that your near exclusive emphasis on power conditioning is the only cause. If - despite the conflict within your flash "corruption start" location report - the corruption always "locks" to a certain range - suspect that software rises to firm suspect. (if not prime) I'd look especially hard at "set-up or incompletely deleted/switched-off flash test/view/seed" code. (we've found both instances @ clients designs) Base this upon the belief that any power-based flash corruption is unlikely to "cleave so cleanly" and "be so repeatable!" For completeness - have not read your MCU's (and most similar) errata history - to see if such consistent flash corruption has been past reported and/or acknowledged. (this very much your job!)
Will bow out now so official analysis may proceed. (although these 2 posts should not have discouraged - may even have been of some value...)
OK, sure you are right - but I also added:
(only 32bit words of address 0x14c00,0x14c08,0x14c10, 0x14c18... are erased, but 32bit words of 0x14c04,0x14c0c,0x14c14,0x14c1c... are OK).
and I did not mention the offset of 0x8000 (oops, sorry), which makes it 0x14c00. So there is no mistake in my post or in the presented screenshot. I dug through the forums quite thoroughly, IMHO, and I do know (as you suspect...) that this is first of all MY company's problem... The only clue so far is power up/down, since there were notes in the errata for the C3 revision that it can damage the flash permanently. So, is there a possibility that malfunctioning software on a Stellaris uC can damage the flash forever?? Really? If so, I must seriously think about retargeting to another uC... but before that I would like to run such a test; please give me an example, if you can. Then I will concentrate on potential bugs in my software... Of course I can overwrite some fragile registers and hurt my uC. But I imagine that even then I can still erase the whole device via JTAG or SWD? Yes or no?
Regards,
Jerzy Kasperek
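The address mismatch discussed in these two posts comes down to adding the load offset to the address shown in the memory view. A minimal sketch of that arithmetic; the helper names are hypothetical, and the 1 KB page size is the one quoted in the first post:

```python
PAGE_SIZE = 0x400  # 1 KB page


def device_address(view_address: int, load_offset: int) -> int:
    """Translate an address shown in the image view into the on-chip flash address."""
    return view_address + load_offset


def page_index(address: int, page_size: int = PAGE_SIZE) -> int:
    """Index of the flash page that contains `address`."""
    return address // page_size


if __name__ == "__main__":
    addr = device_address(0xCC00, 0x8000)
    print(hex(addr), "is in page", page_index(addr))
```

With the screenshot address 0xcc00 and the 0x8000 offset this gives 0x14c00, which is page-aligned, so the two reports describe the same page.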
No one seeks your sorrow - overall you've organized & presented a clear report - good job. My comments aimed to save you time - and reduce wear/tear on TI staff. (outsider may be qualified to perform basic analysis/problem organization)
The LMI Flash Programmer SW Tool can often "recover" a locked-up MCU. (you will 1st need to discover your MCU's "class") Have you tried this on your dead boards?
You do not address the issue I flagged, "Why and How does a "power up/down" cause data corruption to such a limited Flash memory range - and with such consistency?" Of course you must find some solution - but I think that your tight focus on, "Only Power" may retard your identification of some more likely cause. And that remains your software - and I have provided specific areas for you to investigate... And lean more heavily to your software as causal nexus...
Not my role/job/desire to defend MCU vendor - best to avoid negative comments (i.e. retargetting) and stick to the engineering issues as presented. (Likely you will need official analysis/comments once enfeebled/outsider diagnosis proves inexact...) (but one hopes not)
Read Sue Cozart's sticky post (atop this forum) - this reporter receives some credit and much of that post will prove helpful. Absolutely ensure that you have not re-purposed PC0-PC3 - especially inadvertently. (and any other JTAG related pin - we have seen cases where software has over-flowed - and overwritten these registers.)
Circuit strangely quiet. (NASA: "LOS" Loss of Signal - when craft's "straight-line" earth-view blocked) No such weekend siesta for our busy tech group...
Made the time/effort (Sat - 04:00 local) to review our discovery & correction of past, ARM, MCU flash corruption cases. (several Stellaris - many more from other ARM MCU vendors) And something "jumped out" - which remarkably duplicates an aspect of your code and implementation! (drum-roll, maestro... many of these similar, flash-corruption issues included, "CRC check" code - just as does yours!) Due to client confidentiality issues - cannot provide the exact solution here - but can state that in multiple instances - the failed "CRC check" code vector - produced unexpected (and very unwanted) consequences. Your treatment of the JTAG programming pins is also of very great importance - do not recall your description of their termination/treatment as your board is released to client.
While "official" LOS persists - suggest that any/all "block writes (even block reads)" be deeply revisited. (ideally by 2nd SW writer - free from, "pride of authorship!" - which descends too often into, "pride of defense"...) SW "NIH" is very much desired in your present situation...
Thank you cb1,
Investigation of SW issues is also under way...
I am a HW engineer, so I concentrate on the schematic and PCB design -> we definitely should correct the JTAG circuit: add external pull-ups...
(Posted by cb1_mobile on 16 Mar 2012 8:35 AM): Urge you to use dedicated pull-up R's for all JTAG signals - "promised" (w/in MCU) not always fully/properly implemented
:-) :-)...
Pins PC0...3 are used only for JTAG in our design... My question from the previous post is still open: can malfunctioning software permanently damage the flash? Using the LMFlash utility we recovered just one dead board out of four. The others just report over JTAG: (JTAG tap: lm3s.cpu tap/device found: 0x4ba00477 mfg: 0x23b, part: 0xba00, ver: 0x4) and that is all :-(...
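The three fields in that JTAG message are just slices of the 32-bit IDCODE, following the standard IEEE 1149.1 layout (version in bits 31:28, part number in bits 27:12, manufacturer in bits 11:1, bit 0 always 1). A minimal sketch of the decoding; the function name is hypothetical:

```python
from typing import Dict


def decode_idcode(idcode: int) -> Dict[str, int]:
    """Split a 32-bit JTAG IDCODE into its IEEE 1149.1 fields."""
    return {
        "version": (idcode >> 28) & 0xF,
        "part": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,
        "marker": idcode & 0x1,  # must be 1 for a valid IDCODE
    }


if __name__ == "__main__":
    fields = decode_idcode(0x4BA00477)
    print({name: hex(value) for name, value in fields.items()})
```

As far as I know, manufacturer code 0x23b is ARM's JEDEC identity and this IDCODE belongs to the Cortex-M3 debug port rather than to anything TI-specific. If that is right, reading it shows that the JTAG pins and the debug TAP respond, but probably says little about the state of the flash or the core behind it.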
Jerzy Kasperek: "My question from previous post is still open"
Jerzy - you don't want this reporter to "steal" all fun from vendor guys/gals - damage issues as you pose require great breadth & depth of data - not usually available to even "connected" outsider.
Are you sure that you've identified the proper "class" of your MCU - don't you need that to effectively use LM Flash Utility? Pointless to guess here - give the utility a chance to work - find your class MCU and enter it prior to attempting the "unlock."
The fact that you get some response via JTAG is generally good - which JTAG probe are you using? (recall that you use Keil - perhaps a U-Link?) Try another - and reduce the JTAG speed w/in your IDE by 25%, minimum. Install proper pull-ups (they may be external) to 3V3 to every JTAG pin (beware - sometimes PB7 has some degree of JTAG status - re-read your datasheet) *** Do not power these JTAG pull-ups from separate source - power them from 3V3 from your board - so that they can't "come up" before your board.
Something for you as HW guy to consider - "How many times did your hardware fault cause exactly the same 10 passives to fail?" Always same ones - in same locations. (assumes these are not series-connected - then string over-driven!) Yet this is the diagnosis seized upon - I believe mistakenly - at minimum incompletely...
BTW - not ideal practice to build/test/verify - then ship such small numbers. Your group has employed poor client as "after the fact" QC department - never good. (even though others may do this as well) Suggest that you build 10 board minimum asap - with external, on board JTAG pull-ups - and relaunch your tests.
*** In light of past finding (i.e. CRC check) suggest that you temporarily remove this "CRC" code block and repeat test/verify. Our report is indeed "anecdotal" - admittedly less than ideal - but in the absence of "official" analysis/comment/direction - is the best you've got for now... (and we can hear your SW guy "groan/protest" from here...)
*** after bit added thought - LMI/TI's minimalist "blinky" program should be "near ideal" means to determine HW versus SW as likely flash corruption cause. Do NOT allow your SW guy to alter this - want the bare metal SW - I would bet the sailboat (not beach house) that corruption will not occur! (your lack of boards to "keep up" with fertile stream of suggestions is proving costly/frustrating!)
Hello,
Thank you for support from Christian from TI.
Some news from our testbed. We prepared 8 boards with the original software and hardware (with the error on the RST net: pull-up to VBAT) and 8 boards with the RST network corrected and the brown-out bit set. We supply these 16 boards from their local batteries (we prepared nearly exhausted batteries just to test marginal situations...) and we switch the main power on/off (performing the on/hibernation/off cycle). Our software (bootloader and application) was the same as that already delivered to the client except for one thing: the external EEPROM where the application shadow copy is stored was cleared. So if the bootloader found a CRC error in the application, there was no attempt to overwrite the flash, since there was no new data to do so... And we can still connect JTAG and see the trouble...
And today, after 2 days of testing, our SW engineer discovered:
"Original" boards:
One board with the bootloader flash corrupted (differences between the original and corrupted flash highlighted in red)
This board does not work.
Two other boards with flash corruption (although the application still works OK, since the corruption is outside the active code)
First board:
and the second:
Unfortunately one of the corrected boards also shows a difference (application working OK)
Remark: the presented data are loaded with offset 0x0000 to the uC.
So, any clues as to what can cause single-bit errors (both 0->1 and 1->0) in flash memory? If the bug is in our software it is very strange: we never change single bits... I still suspect the power-up sequence...
Shall we implement flash protection for our bootloader?
Is any flash region somehow more vulnerable to such problems? We could move the bootloader...
Or should we add an external voltage monitor and somehow cut power to the uC when the supply is too low (duplicating the internal brown-out)?
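Since the direction of each flipped bit matters here, it may help to list the flips mechanically instead of reading them off a highlighted hex view. A minimal sketch, assuming the original and corrupted images are available as equal-length raw binaries; the function name and the sample bytes are hypothetical:

```python
from typing import List, Tuple


def classify_bit_flips(original: bytes, corrupted: bytes,
                       base_addr: int = 0) -> List[Tuple[int, int, str]]:
    """List every differing bit as (byte address, bit number, direction)."""
    if len(original) != len(corrupted):
        raise ValueError("images must have the same length")
    flips = []
    for offset, (o, c) in enumerate(zip(original, corrupted)):
        diff = o ^ c  # set bits mark positions where the two images disagree
        for bit in range(8):
            if diff & (1 << bit):
                direction = "0->1" if c & (1 << bit) else "1->0"
                flips.append((base_addr + offset, bit, direction))
    return flips


if __name__ == "__main__":
    original = bytes([0b00001111, 0xFF])
    corrupted = bytes([0b00011110, 0xFF])
    for addr, bit, direction in classify_bit_flips(original, corrupted):
        print(hex(addr), "bit", bit, direction)
```

In flash of this kind, programming normally only clears bits and only an erase sets them, so isolated 0->1 flips would be hard to produce with an ordinary program operation. That is a general property and an assumption about this device, not something confirmed in the thread.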
(poster's delayed/uncertain response - reluctance to follow (or even acknowledge) brief, critical suggestions - shall find other walls upon which to pound head...cb1 exits)
Should TI have provided additional guidance (beyond the single, earlier TI post) - such "private" guidance benefits no one (beyond lone recipient) and seems counter-productive to "multiple, other" Stellaris users who may encounter similar issue/condition in the future... (and outside the purpose of such forum!)
Hi all interested!,
The last two weeks we performed many, many tests in our lab. So far there is one conclusion: no problems running the whole supply cycle (power on -> hibernation -> power off...) without a battery or with a charged battery. Problems with the uC flash occurred again on two boards running with an almost exhausted battery. We are pretty convinced that the uC flash does not tolerate the disturbances that can appear on VDD when the DC-DC converter supplying VDD fails to start properly (with VBAT present...). The uC brown-out circuitry does not protect the microcontroller in such a case. And naturally, during the project review, we also found some other pitfalls, like DC-DC converter soft-start troubles or some minor bugs in software...
During these tests we've learnt a lot about our design :-)...
In the meantime we had a telecon with TI, and two of our "locked" boards are already in the TI lab in Texas for investigation.
So, just today we completed an additional PCB: a battery monitor with a transistor switch. When the battery voltage drops below a fixed level (with hysteresis, of course...) we cut the power to our board completely, so there is no more trouble in the supply circuitry. Tomorrow we are going to start testing the whole system... and we are waiting for TI's diagnosis of the locked uCs...
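The behaviour of such a cut-off with hysteresis can be modelled in a few lines, which is handy for checking that the two thresholds do not cause chattering on a sagging battery. A minimal sketch; the class name and the threshold voltages are invented placeholders, not values from the actual board:

```python
class BatteryCutoff:
    """Model of a supply switch that opens on low battery and recloses with hysteresis."""

    def __init__(self, off_below_v: float, on_above_v: float) -> None:
        if on_above_v <= off_below_v:
            raise ValueError("on threshold must be above off threshold")
        self.off_below_v = off_below_v
        self.on_above_v = on_above_v
        self.enabled = False  # board starts unpowered

    def update(self, vbat: float) -> bool:
        """Feed one battery voltage sample and return whether the board is powered."""
        if self.enabled and vbat < self.off_below_v:
            self.enabled = False
        elif not self.enabled and vbat > self.on_above_v:
            self.enabled = True
        return self.enabled


if __name__ == "__main__":
    switch = BatteryCutoff(off_below_v=3.3, on_above_v=3.6)  # hypothetical thresholds
    for v in (3.7, 3.5, 3.2, 3.5, 3.7):
        print(v, "->", "on" if switch.update(v) else "off")
```

Between the two thresholds the switch keeps its previous state, so a battery that recovers slightly after the load is removed does not immediately power the board again.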
Hi,
Unfortunately the case is still open... Today we noticed flash corruption again...
After the previous test we were convinced that starting VDD from a low battery causes the flash troubles. The test bed with 16 boards worked OK for many days in power-on/power-off cycles without batteries (without entering the hibernation stage...). In fact, with a low battery there were unpleasant "ups" and "downs" on the oscilloscope during VDD start-up when leaving hibernation, and especially during start-up of the GSM supply (DC-DC with a high-capacitance load...) in normal mode. So we added a low-battery cut-off circuit and modified the soft-start circuit of the GSM DC-DC, and now VDD looks perfect, but one board shows a different flash image...
Regards,
Jerzy Kasperek
Hi,
Interesting thing: we reprogrammed the board with the corrupted flash from the previous post and started testing it again... The next day, in the same test, the same board (out of 16!!) does not work... But this time we cannot read the flash any more: another microcontroller locked...
So, if flash corruption occurred twice, it seems that this board/uC is more vulnerable, for completely unknown reasons...