MSP low-power microcontrollers

MSP430FR4133: How serious are correctable FRAM errors?

Tom Dehler

Intellectual 755 points

Part Number: MSP430FR4133
Other Parts Discussed in Thread: MSP-FET

After developing an application using the MSP430FR4133, the first 16 devices were built
and testing has begun. One of these devices showed a correctable FRAM error. I had almost
forgotten that I enabled the NMI for these errors over a year ago, since I never
experienced them on my two test boards.
So I wonder if this single error is the famous "it just happens once in a
lifetime but now it happened to you" error or something serious.
I didn't find a lot of real-world statistics about FRAM errors -- just the note that
soldering the devices at an improper temperature could cause problems.
Yes, it was a correctable error and only one, but I have no idea if I should worry about it...

over 8 years ago

0 Jace H over 8 years ago

TI__Guru 56685 points

Hello Tom,

One correctable FRAM error is not that much of a concern. It would get more concerning if there are several or repeated errors. I would be more concerned about uncorrectable errors similar to what was found on this E2E post. e2e.ti.com/.../575920

For more information about the reliability of our FRAM, please see the following application note. It contains more information about error rates and potential fallout (very low). http://www.ti.com/lit/slaa526

0 Tom Dehler over 8 years ago in reply to Jace H

Intellectual 755 points

Hi,

thanks for answering. I also was under the impression that one correctable error every now and then
is not of much concern. Will try to explain the whole story:

As I said, there were 16 devices built. The firmware used on them is in quite good shape -- it ran
on 2 different testing devices for months without problems. Of these 16 devices now built, 14 run
perfectly, one showed this single correctable error and the 16th is driving me nuts:

It restarts about 5 to 10 times a day. It starts at the entry point of the whole firmware -- this
is where the interrupt source "System Reset (0xFFFE)" points to. Nothing else points to this
location and I added some debug counters and verified that this place really gets hit with each
restart.

I programmed quite a comfortable post-mortem analysis code which records, among other data, the
contents of the SYSRSTIV register so I can figure out the reason for restart. This is proven to
work as pushing the Reset button leaves 0x0004 there and a software POR gives me a 0x0014.
Interestingly, for about 90% of all these unwanted crashes, SYSRSTIV is 0x0000 but the debug
counter gets incremented so I can be sure it was caused by the "System Reset" vector at 0xFFFE.
I don't understand this as each restart should have a corresponding interrupt event which is
different from 0x0000.

The other 10% of these unwanted crashes are uncorrectable FRAM bit errors as I can see by the
value 0x001C left in SYSRSTIV.
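
For anyone reading a similar post-mortem log on a PC, the values reported so far can be mapped to text with a small host-side helper. This is only a sketch in C, not the firmware's own code: the function name sysrstiv_reason is made up for illustration, and it lists only the four SYSRSTIV values mentioned in this thread. Any other value should be looked up in the device family user's guide.

```c
#include <stdint.h>

/* Map a logged SYSRSTIV value to a human-readable reason.
 * Only the values reported in this thread are covered (hypothetical helper). */
const char *sysrstiv_reason(uint16_t sysrstiv)
{
    switch (sysrstiv) {
    case 0x0000: return "no reset source recorded";
    case 0x0004: return "RST/NMI pin (reset button)";
    case 0x0014: return "software POR";
    case 0x001C: return "uncorrectable FRAM bit error";
    default:     return "other (see the user's guide)";
    }
}
```

With such a mapping, the log of the misbehaving device would show mostly "no reset source recorded" entries and occasional "uncorrectable FRAM bit error" entries.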

I had read your PDF about FRAM reliability before, so currently my only conclusion would be that
the processors were affected somewhere or somehow. Maybe the soldering was inappropriate
(too hot / too long) or the sealed packs were opened too long before soldering or whatever.

Maybe I should replace the CPU of that crashing device and see if it performs better then...

Any ideas anybody?

Thanks!

0 Jace H over 8 years ago in reply to Tom Dehler

TI__Guru 56685 points

Tom,

If you have the ability to do a chip swap, that could give you a valuable data point. Also, do you have a pull-down on the TEST pin as recommended in the errata text for PORT28? I could see accidentally entering a JTAG/debug mode causing some resets.

0 Tom Dehler over 8 years ago in reply to Jace H

Intellectual 755 points

Yes, I can replace the chip. However, then all debugging options regarding
this issue will be gone ;-)

Stupid question: I've read PORT28 before but I wonder if, when using whatever
pulldown, won't there be a continuous reset and the device will never start?
Currently, RST_NMI_SBWTDIO is connected to a 1 nF capacitor against GND, a reset
switch and (using a short trace) to the pads for a TAG-Connect tool. SFRRPCR is
in its default state (0x001C), so currently the pullup is active.

0 Jace H over 8 years ago in reply to Tom Dehler

TI__Guru 56685 points

Tom,

The pulldown should be placed on the TEST/SBWTCK pin. The RST/NMI/SBWTDIO pin should only have the recommended 1 nF cap to ground.

0 Tom Dehler over 8 years ago in reply to Jace H

Intellectual 755 points

OK, now I got it:

SFRRPCR.SYSRSTRE normally should only control the pull resistor of RST/NMI/SBWTDIO.
But, due to a bug, it also controls the pulldown of TEST/SBWTCK. Therefore PORT28
recommends leaving SFRRPCR.SYSRSTRE set to 1 so we can be sure that TEST/SBWTCK
gets pulled down. But this way we force the pull resistor of RST/NMI/SBWTDIO to be
enabled as well, so one has to use SFRRPCR.SYSRSTUP to ensure this resistor is set up
in a way that it doesn't hurt.

As my SFRRPCR is in its default state (0x001C) I should be fine regarding this. However,
I will additionally tie TEST/SBWTCK to ground and see what happens.
Will report later, thanks so far!
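
The SFRRPCR reasoning above can be checked on a PC by decoding the register value. The following C sketch is only an illustration: the bit masks are assumed to match the usual MSP430 device headers (verify against your own header), the struct and function names are invented, and the edge meaning of SYSNMIIES is an assumption taken from the family user's guide. The default value 0x001C also has bit 4 set, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks assumed to match the MSP430 device headers. */
#define SYSNMI    0x0001u  /* 1 = RST/NMI pin acts as NMI, 0 = reset   */
#define SYSNMIIES 0x0002u  /* NMI edge select (assumed: 1 = falling)   */
#define SYSRSTUP  0x0004u  /* 1 = pull resistor is a pullup            */
#define SYSRSTRE  0x0008u  /* 1 = pull resistor enabled                */

/* Hypothetical decoded view of the reset pin configuration. */
typedef struct {
    bool pin_is_nmi;
    bool nmi_on_falling_edge;
    bool pull_enabled;
    bool pull_is_up;
} rst_pin_config;

rst_pin_config decode_sfrrpcr(uint16_t sfrrpcr)
{
    rst_pin_config cfg;
    cfg.pin_is_nmi          = (sfrrpcr & SYSNMI) != 0;
    cfg.nmi_on_falling_edge = (sfrrpcr & SYSNMIIES) != 0;
    cfg.pull_enabled        = (sfrrpcr & SYSRSTRE) != 0;
    cfg.pull_is_up          = (sfrrpcr & SYSRSTUP) != 0;
    return cfg;
}
```

Decoding 0x001C this way gives a reset pin (not NMI) with the pull resistor enabled as a pullup, which matches the statement that the default state should be fine with respect to PORT28.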

0 Tom Dehler over 8 years ago in reply to Jace H

Intellectual 755 points

Jace H said:
The pulldown should be placed on the TEST/SBWTCK pin.

OK, here are the results:

First, just to be sure, I checked the internal pulldown and this gave
me approx. 32 k in both directions.

Next step was adding an additional 10k resistor to GND. Didn't
help -- the first restart came after a few minutes.

Then I tried pulling it directly to GND. Same result.

And now comes the fun: When pulled to Vcc the beast runs for
24 hours without any restarts. Of course, the reset pin doesn't
work here since applying Vcc to TEST enables the JTAG which in
turn disables the external reset function.

So one explanation could be that this chip is somewhat messed
up and generates resets internally without real reasons. This
also could explain why SYSRSTIV is zero in 90% of all cases
and why this behaviour isn't seen on the 15+2 other devices.
By enabling TEST the reset function (including the erroneous
parts) is disabled.

0 Dietmar Walther over 8 years ago in reply to Tom Dehler

TI__Genius 14165 points

Hi Tom,

Jace invited me to this thread.
Did you use these parts in an ESD-safe environment, or did you operate them on your office desk without any ESD protection?
The 16th device, which shows resets without a value (0x0000) in SYSRSTIV, is strange. Even more, it looks damaged, given how it behaves when you pull up the TEST pin and enable JTAG mode.
Can you do me a favour and measure the LPM4 current of this device with all ports set to output low, and again with all set to output high, to see if we can identify leakage on this part?

Not sure if this was already asked, but I assume you set up the wait states according to your operating frequency to ensure you're operating in spec, right?

0 Tom Dehler over 8 years ago in reply to Dietmar Walther

Intellectual 755 points

Dietmar Walther said:
Hi Tom,
Jace invited me to this thread.

Welcome ;-)

Did you use these parts in an ESD-safe environment, or did you operate them on your office desk without any ESD protection?

Well, for me I can say "yes" and for the manufacturer I'd say so as well (they
have been manufacturing the predecessor of this device, with the
MSP430P325A, for ages).

The 16th device, which shows resets without a value (0x0000) in SYSRSTIV, is strange. Even more, it looks damaged, given how it behaves when you pull up the TEST pin and enable JTAG mode.

But isn't the behavior to still run with an enabled but not actively used JTAG normal?
Additionally, I use the JTAG (SBW, to be exact) all the time to upload software with
the MSP-FET and MSPFlasher on this device...

Can you do me a favour and measure the LPM4 current of this device with all ports set to output low, and again with all set to output high, to see if we can identify leakage on this part?

Well, that's difficult: Almost all ports are in use. Static low should be possible
(will have to check) but static high will produce bad effects which will make any
current measurement void.
But I can say this so far: The software goes down to LPM3. The usual current of
the whole(!) system in this case is around 3 uA (LCD, RTC,...) . This was measured
using a Keithley 2k DMM. This current is correct on the 15+2 other devices and
on this one as well.

Well, in case we decide to rip the whole part off the PCB I could cut all pins
apart from power, xtal, SBW,... and try that LPM4 measurement.

Not sure if this was already asked, but I assume you set up the wait states according to your operating frequency to ensure you're operating in spec, right?

I run 16 MHz with NWAITS_1. But that brings me to the idea to try something slower,
just to see if it helps...

0 Dietmar Walther over 8 years ago in reply to Tom Dehler

TI__Genius 14165 points

Tom,

thanks. So:
- you are operating in an ESD-safe environment
- your LPM3 current does not indicate leakage, which means there seems to be no critical damage (I think no LPM4 measurement with all GPIOs output low is required here)
- 16 MHz with NWAITS_1 is according to spec, but please try slower to exclude this

And yes, you're right: enabling JTAG does not necessarily stop the device as long as you do not apply sequences to the JTAG pins. So are your JTAG pins terminated while you do this?

Do I understand correctly that you can still reproduce the behavior and are able to work around it?

Another thing that comes to mind is a stack violation. Are you sure your stack cannot be corrupted by your firmware or data management?
Can you also reproduce the behavior with a debugger connected?

0 Jace H over 8 years ago in reply to Dietmar Walther

TI__Guru 56685 points

Tom,

This is peculiar to say the least. Two other tests I think you should run.
1) Set RESET to NMI functionality and see if the same resets happen.
2) Perform an A-B-A chip swap. This would mean replacing the current misbehaving device with a known good chip, and placing the misbehaving device onto a known good board. From there we can observe whether the issue follows the chip or the board.

0 Tom Dehler over 8 years ago in reply to Dietmar Walther

Intellectual 755 points

Dietmar Walther said:
Tom,
thanks. So:
- you are operating in an ESD-safe environment
- your LPM3 current does not indicate leakage, which means there seems to be no critical damage (I think no LPM4 measurement with all GPIOs output low is required here)
- 16 MHz with NWAITS_1 is according to spec, but please try slower to exclude this

Went up to NWAITS_7 -- still resetting. Currently I am running it
with 8 MHz and it did NOT reset yet. But this means nothing as
it's running for 18 minutes only. Will have to wait until tomorrow...

And yes, you're right: enabling JTAG does not necessarily stop the device as long as you do not apply sequences to the JTAG pins. So are your JTAG pins terminated while you do this?

Well, all pins I do not use (just 4) are pulled down internally.
The reset pin sees the typical 1nF and nothing else.

Do I understand correctly that you can still reproduce the behavior and are able to work around it?

In order to reproduce it I have to:
- Leave TEST/SBWTCK open (internal pulldown) or GND it
- Wait. Sometimes 2 minutes, sometimes 2 hours. And all in between.

I have no workaround for it apart from pulling TEST/SBWTCK high!

Another thing that comes to mind is a stack violation. Are you sure your stack cannot be corrupted by your firmware or data management?

Well, I have 17 devices running without problems. One for weeks and
another one for months. Software is all assembler, no libs, no foreign
code, no funky compiler bugs. I don't need a lot of stack as I don't do
pushes and pops. Just some function calls and interrupts. I have a stack
guard sitting at the bottom (and below this there is still unused RAM)
and it never got overwritten.

Can you also reproduce the behavior with a debugger connected?

I don't have a debugger, just an MSP-FET which is only used for upload
and download with MSPFlasher. Currently, the device runs unconnected
but I can retry with an MSP-FET connected (doing nothing, of course).

0 Tom Dehler over 8 years ago in reply to Jace H

Intellectual 755 points

I will try 1) tomorrow (or as soon as the 8 MHz test has failed ;-))

Regarding 2): I can replace the chip but I have my doubts that the currently misbehaving
one will survive. I assume that, as soon as I replace it, the board will behave properly.
It's no fancy design -- two layers (bottom almost entirely GND), some 0805 parts, some
SOT-23, a 24LC512, an FTDI for USB and an LCD. Easy to check optically and electrically
and manufactured together with the other 15 (good) devices...

0 Dietmar Walther over 8 years ago in reply to Tom Dehler

TI__Genius 14165 points

Tom, Jace,

thanks for all the input. Up to now I don't really have an idea what's causing these resets, based on the information we have gathered so far.

The A-B-A swap is dangerous because the solder heat could heal the effect; therefore I would vote to do this as a last option.

Assuming it is related to memory, given that it seems to work with 8 MHz and NWAITS_7, we should be able to see this with the debugger connected as well, and maybe it is possible to identify the location where it happens.

Tom,

since you have the FET, you already have the hardware. Do you also have the complete project and the source code for IAR or CCS, so that you can run it in an IDE to set breakpoints and debug the project? By doing this we might be able to gather more information.

0 Tom Dehler over 8 years ago in reply to Dietmar Walther

Intellectual 755 points

Dietmar Walther said:

The A-B-A swap is dangerous because the solder heat could heal the effect; therefore I would vote to do this as a last option.

Yes, will do this only when all other options are gone

Assuming it is related to memory, given that it seems to work with 8 MHz and NWAITS_7, we should be able to see this with the debugger connected as well, and maybe it is possible to identify the location where it happens.

Wait, just to be clear: The test with NWAITS_7 was done with 16 MHz and failed!
The 8 MHz test (w/o wait states) is not yet completed.
In fact, today in the morning it was running for 9 hours already -- something
I've never seen with 16 MHz. I didn't touch it and we'll see what will have
happened later in the afternoon today. If it survives with 8 MHz, I will
switch back to 16 again and see if the reboots come back (they probably will
but it's better to be sure).

Additionally, the test what happens when RESET is configured to cause NMIs
has to be done as well...

since you have the FET, you already have the hardware. Do you also have the complete project and the source code for IAR or CCS, so that you can run it in an IDE to set breakpoints and debug the project? By doing this we might be able to gather more information.

Never touched IAR or CCS -- I am using a combination of cpp, gas (and other
binutils) and srec_cat with MSPFlasher. And IAR or CCS won't run on my
platform.

0 Tom Dehler over 8 years ago in reply to Tom Dehler

Intellectual 755 points

With a clock of 8 MHz (no wait state) the system was running 26 hours without
resets. Then I switched back to 16 MHz (one wait state) and the first reset
came after a few minutes.

I have no idea what could be wrong. The power supply should be OK: One 1uF
sitting about 5 mm away from the chip, next comes a 10 uF and the TPS78233DDC
regulator. Vcc has got short and thick traces and GND is a GND plane anyway.
The chip doesn't have to drive a lot: One LCD, a 24LC512 EEPROM, a few
MOSFETs and that's all. I have also checked the 3.3 Volts with a fast scope
-- no glitches or noise, just a flat line.

When enabling the 16 MHz, the DCO is loaded from the TLV values and the
lower 9 bits of CSCTL0 jump between 164 and 165. That's not perfect but OK.

If time permits, I will play with the NMI function tomorrow and also enable
DCOFTRIMEN and set DCOFTRIM to 4 (this brings CSCTL0 closer to 256) just
to see if things change here...

0 Tom Dehler over 8 years ago in reply to Tom Dehler

Intellectual 755 points

I am getting closer. First some details which are needed to understand
what's going on:

The device uses a USB bus powered FTDI USB<->RS232 converter to transfer
data to a PC. I noticed that these spurious resets do NOT happen, when the
USB was connected (for normal operation it is not connected).

During normal operation, the device goes to LPM3 and is woken up by the
RTC ISR every 500 ms. It does some stuff for 10-100 us , then it goes back
to LPM3.

However, the FTDI's 3.3 Volts used for its interface logic is also fed to
a port pin of the MSP. This way the MSP can detect if the user has
(dis)connected the USB port and perform some appropriate action.

I am using a high baudrate which makes it necessary to use SMCLK for the
UART. So one of the tasks we have to deal with when the USB port is in use,
is to prevent the device from entering LPM3 -- the RTC ISR looks like this:

RTC_Interrupt:
        do some stuff
        if PC_connected then
                prepare LPM0
        else
                prepare LPM3
        fi
        reti

This way the UART is available with its high baudrate as soon as the PC has
been connected.
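
The mode selection in this ISR can be modelled on a PC. The C sketch below is an assumption about how "prepare LPMx" might work, namely by rewriting the low-power bits in the status register copy that reti restores. It is not the actual assembler code, and the function name sr_after_isr is invented. The status register bit values are the standard MSP430 ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSP430 status register low-power bits. */
#define CPUOFF 0x0010u
#define OSCOFF 0x0020u
#define SCG0   0x0040u
#define SCG1   0x0080u

#define LPM0_BITS (CPUOFF)                 /* CPU off, SMCLK keeps running */
#define LPM3_BITS (SCG1 | SCG0 | CPUOFF)   /* only ACLK keeps running      */
#define LPM_MASK  (SCG1 | SCG0 | OSCOFF | CPUOFF)

/* Return the status register value the ISR would leave for reti:
 * all low-power bits are cleared, then LPM0 or LPM3 is selected. */
uint16_t sr_after_isr(uint16_t saved_sr, bool pc_connected)
{
    uint16_t sr = saved_sr & (uint16_t)~LPM_MASK;
    return (uint16_t)(sr | (pc_connected ? LPM0_BITS : LPM3_BITS));
}
```

With the PC connected the device returns to LPM0, so SMCLK stays available for the UART. Otherwise it drops back to LPM3.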

When I noticed that these spurious resets were gone as soon as the USB was
connected, I disconnected all lines from the FTDI to the MSP to see what's
happening. And as soon as I pulled off the above mentioned 3.3 Volt
USB-is-present line, the device started to reset again after a few minutes
even though the USB port was connected.

The logical consequence was of course to self-supply the MSP with its own
3.3V on this port pin (so it assumes the USB port is connected, which prevents
it from entering LPM3). And now it runs for hours without reset (in LPM0 of
course).

So we can assume that something is happening with this device (but not the
17 others) in LPM3. Now I only have to figure out what...

0 Dietmar Walther over 8 years ago in reply to Tom Dehler

TI__Genius 14165 points

Hi Tom,

good that you are making progress. I have to think in a bit more detail about what you said, but it triggers some more questions:
1. How is the device powered?

2. Could it somehow be that you remove the primary supply (DVCC) and still supply any GPIO pin with another voltage? By this you would supply the device via the ESD rail backdoor, which is not allowed.

Best regards,
Dietmar

0 Tom Dehler over 8 years ago in reply to Dietmar Walther

Intellectual 755 points

Dietmar Walther said:
1. How is the device powered?

Vcc is good and stable (dedicated TPS78233, close to the device and
sufficient capacitors).

2. Could it somehow be that you remove the primary supply (DVCC) and still supply any GPIO pin with another voltage? By this you would supply the device via the ESD rail backdoor, which is not allowed.

I first thought in the same direction (assuming something in this
very chip is broken w.r.t. power distribution and that it gets
supplied via some port pins in the affected areas). But since I
changed jumping into LPM3 at the end of the ISR to switching only
into LPM0 it works even without USB (as this is the same what it
would have done with USB).

Given the fact that 8 MHz works and 16 MHz works only if we do not
drop into LPM3 makes me think that FLL / DCO might go insane on
this chip and it is slowly drifting out of the specs so it does
these funny things (like resetting w/o a vector or occasionally
spitting out FRAM errors).

This could be due to my usage pattern:

The cpu is in LPM3 all the time. Every 500 ms the RTC ISR wakes
it up. The task it has to do is normally quite short: between
1 and 4 ACLK cycles (that is something between 30 and 120 us).
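
The conversion from ACLK cycles to time can be checked with a few lines of C. This sketch assumes ACLK runs from a 32768 Hz watch crystal, which is not stated explicitly in the thread, and the function name is invented.

```c
#include <stdint.h>

/* Assumption: ACLK is driven by a 32768 Hz watch crystal. */
#define ACLK_HZ 32768u

/* Duration of a number of ACLK cycles in microseconds, rounded to nearest. */
uint32_t aclk_cycles_to_us(uint32_t cycles)
{
    return (uint32_t)(((uint64_t)cycles * 1000000u + ACLK_HZ / 2u) / ACLK_HZ);
}
```

Under that assumption one cycle is about 31 us and four cycles about 122 us, in line with the "between 30 and 120 us" estimate.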

It is currently configured with the FLL ON during these times.
However, I read in other posts that there are CPUs where the
FLL needs a longer time to stabilise the DCO properly and if it
doesn't get this time, it's going nuts.

I probably have to try something in this direction if no one has
a better idea...

0 Dietmar Walther over 8 years ago in reply to Tom Dehler

TI__Genius 14165 points

Tom,

OK, got it. Going in your direction with the clocks: can you measure them with a scope in the critical condition and provide the scope shots?

0 Tom Dehler over 8 years ago in reply to Dietmar Walther

Intellectual 755 points

Well, I played with this idea already. Fortunately, P8.0 is free so I can enable SMCLK here.
OTOH, it sometimes takes one hour to crash so I would have to sit one hour staring at the
scope to see what happens ;-). Well, I could try to check if trace buffers are big enough to
capture SMCLK for enough time so I put it into some pretrigger mode which actually gets
hit on the device's reset and then scroll back to see what had been...

My current approach is to disable FLL stabilisation entirely just for testing purposes -- as
long as I do not change Vcc (this will not happen) or the temperature, this should not hurt.

0 Dietmar Walther over 8 years ago in reply to Tom Dehler

TI__Genius 14165 points

Yeah, you're right, but you can use the pulse-width trigger, set it to a pulse width shorter than your actual clock's, and let it run.
I mean, assuming the clock is causing it for whatever reason, I would expect it to go too fast. But it might be that you have to run it twice, triggering once on a high and once on a low pulse (spike).

Sure, you can disable the FLL as well, but watching the clock gives you more confidence that the clock signal looks clean.

Another thing: you might implement port toggles before all the power mode changes, so you can trace with a scope when the device stops. This might help us better understand when it happens.

0 Tom Dehler over 8 years ago in reply to Dietmar Walther

Intellectual 755 points

Trigger on pulse widths? Wonder how to teach my 1988 Gould scope to do this ;-)
No, speaking seriously, you are right. I'll get a more modern scope if everything else fails.

But first I will try playing with the FLL. You don't know by accident if the effects
mentioned in

e2e.ti.com/.../74890

might apply for the FR4133 as well?

0 Dietmar Walther over 8 years ago in reply to Tom Dehler

TI__Genius 14165 points

Hi Tom,

I think that thread is not related, because if a clock stops you would not get a reset, and if you got one via the WDT in such a case, it would be indicated by SYSRSTIV. That's my opinion on this.

The FLL itself only mixes DCO steps back and forth, and the FR4133 has a different DCO architecture with much better resolution.

Again, checking the clock with a high-resolution scope would be best. Even better if you can trace the clock with two scopes, one triggering on a high spike and the other on a low spike! If the scopes do not trigger, I would expect there is nothing.

Another idea is to measure the current dynamically to see if there is a drop or spike indicating the reset. Maybe this will also lead us to a good conclusion.

0 Tom Dehler over 8 years ago in reply to Dietmar Walther

Intellectual 755 points

Dietmar Walther said:
I think that thread is not related, because if a clock stops you would not get a reset, and if you got one via the WDT in such a case, it would be indicated by SYSRSTIV. That's my opinion on this.

I copied the wrong link; I meant this one:

So I started monitoring CSCTL0 and it stayed the same all the time
which means that the FLL does not adjust the DCO. However, it did
not run away as in the thread mentioned...

So enabling the FLL during my very short active tasks was completely
useless. But see below:

The FLL itself only mixes DCO steps back and forth, and the FR4133 has a different DCO architecture with much better resolution.

Yes. I modified the code in a way that every 10-20 seconds one RTC
ISR call spent 1.5 ms in an LPM0 state -- giving the FLL enough
time to adjust the DCO. Now I could see the MOD bits of CSCTL0
changing each time while the DCO bits were at 0x165 most of the
time, sometimes 0x164. The TLV value is 0x161 so this is not too bad.
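
Splitting a logged CSCTL0 value into its DCO and MOD fields can be done on the PC side. This C sketch is illustrative only: the 9-bit DCO field is taken from this thread, while the MOD field position (bits 13..9) is an assumption to verify against the family user's guide. All names are invented.

```c
#include <stdint.h>

/* DCO tap: lower 9 bits (as stated in the thread).
 * MOD: assumed to sit in bits 13..9; verify in the user's guide. */
#define CSCTL0_DCO_MASK  0x01FFu
#define CSCTL0_MOD_SHIFT 9
#define CSCTL0_MOD_MASK  0x001Fu

uint16_t csctl0_dco(uint16_t csctl0)
{
    return csctl0 & CSCTL0_DCO_MASK;
}

uint16_t csctl0_mod(uint16_t csctl0)
{
    return (uint16_t)((csctl0 >> CSCTL0_MOD_SHIFT) & CSCTL0_MOD_MASK);
}

/* Signed distance of the current DCO tap from the TLV calibration tap. */
int csctl0_dco_offset(uint16_t csctl0, uint16_t tlv_dco)
{
    return (int)csctl0_dco(csctl0) - (int)(tlv_dco & CSCTL0_DCO_MASK);
}
```

For the numbers above, a DCO tap of 0x165 against a TLV value of 0x161 is an offset of 4 taps.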

However, the beast still resets ;-(

Again, checking the clock with a high-resolution scope would be best. Even better if you can trace the clock with two scopes, one triggering on a high spike and the other on a low spike! If the scopes do not trigger, I would expect there is nothing.

I will try this tomorrow...

0 Tom Dehler over 8 years ago in reply to Tom Dehler

Intellectual 755 points

Things start to become funny...

One thing I promised to try was to test how the device will react if we configure the RST pin as NMI.
So I added the corresponding code to the device's initialisation sequence:

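        bis     #SYSNMIIES, &SFRRPCR    ; select the NMI edge (falling, per the user's guide)
        bis     #SYSNMI, &SFRRPCR       ; RST/NMI pin now raises an NMI instead of a reset
        bis     #NMIIE, &SFRIE1         ; enable the NMI pin interrupt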

and tested it by pressing the reset button and checking that it came up with 0x02 in SYSUNIV. It did.
And it ran 32 hours without restarting until I stopped it.

So I think we can safely assume that my theory of the DCO slowly running away can finally be put
to rest. Why should this depend on the way the RST pin is configured?

But it's getting even more absurd:

I started it again with RST doing NMIs. After a few hours I decided to change this on the fly.
This is possible as I have a small monitor embedded within the code. So I started the monitor
and flipped Bit 0 in 0x104 which makes the RST pin act as RESET again. And the device did not
reset...

So I added "bic #SYSNMI, &SFRRPCR" a few lines after the 3 lines from above, started it, and it
reset within 10 minutes...

No idea what's going on here. Maybe it has to fall into LPM3 at least once until I may revert
the RST pin to actually doing RESETs. It is currently running in this mode but with #SYSNMI
cleared manually by the monitor (which implies several LPM3s) to see if it restarts during the
night but I have my doubts...

0 Tom Dehler over 8 years ago in reply to Tom Dehler

Intellectual 755 points

The device is still running after 12 hours. I think this chip is just nuts. No idea what
happened to it -- maybe an ESD issue (don't think so), maybe soldered too hot (don't
think so either) or it came broken from the fab...

0 Jace H over 8 years ago in reply to Tom Dehler

TI__Guru 56685 points

Tom,

What is really telling to me about this situation is that switching the RST line to NMI stopped your resets. This tells me that some kind of instability on the RESET line is causing issues. I was going back through the thread, and I just want to clarify the circuit attached to the RST line. You have a 1 nF cap to ground as well as a 47k pull-up to VCC on this pin, correct?

0 Dietmar Walther over 8 years ago in reply to Jace H

TI__Genius 14165 points

Tom,

to understand this correctly: you changed the RST pin function to NMI and switched it back after you had seen that the resets were gone, right?

And then you started again, setting it initially to the RST function, and now it works for 10 hours? Meaning the failure is gone?

By the way, do you operate in an ESD-safe environment?

Best regards,

Dietmar

0 Tom Dehler over 8 years ago in reply to Jace H

Intellectual 755 points

Jace H said:
What is really telling to me about this situation is that switching the RST line to NMI stopped your resets. This tells me that some kind of instability on the RESET line is causing issues.

If it was some instability on the (external part) of the RESET line, why does this not appear
when it is configured as NMI? And why only with 16 MHz and not with 8 MHz? And why not if I
do not drop down to LPM3 (and stay in LPM0 instead)?

I was going back through the thread, and I just want to clarify the circuit attached to the RST line. You have a 1 nF cap to ground as well as a 47k pull-up to VCC on this pin, correct?

I have a 1 nF C to GND. I have no external pullup but as I do not modify SYSRSTRE and
SYSRSTUP, the internal pullup is enabled.

0 Tom Dehler over 8 years ago in reply to Dietmar Walther

Intellectual 755 points

Dietmar Walther said:
to understand this correctly: you changed the RST pin function to NMI and switched it back after you had seen that the resets were gone, right?

Right. But only if I switch it back manually with the internal monitor (which takes a few
seconds: attach the USB cable, type in the keystrokes to call the monitor, toggle SYSNMI,
detach the cable). If I do it during the initialisation sequence just a few assembler
commands after the code which changed it to NMI, it does not work.

And then you started again, setting it initially to the RST function, and now it works for 10 hours? Meaning the failure is gone?

No, I probably expressed myself badly: It only works if the switch-back is done "a few
seconds later". No idea if it has to be "a few seconds" or it must have been at least
once in LPM3 (which is implied after 500ms) or whatever. I will play around with this
a bit but as I always have to wait a few hours to be sure, it takes some time...

By the way, do you operate in an ESD-safe environment?

I would say so, yes. Well, I have no idea what happened during manufacturing of this
device. It is a professional business who did it -- however, nobody knows what might
have happened to this very chip over there. Maybe I am just wasting the time of all
of us and the chip is just fried...

0 Tom Dehler over 8 years ago in reply to Tom Dehler

Intellectual 755 points

Just for the sake of completeness: The device has just been running for 23+ hours with the SYSNMI bit
cleared manually from within the monitor. I pressed the RESET button now and it restarted (of course).
I checked the log (with every restart it records the reason for the restart) and it was really a Reset event
(0x04 in SYSRSTIV) and no NMI.

So I will go on now trying to find out where I have to clear SYSNMI exactly...

0 Tom Dehler over 8 years ago in reply to Tom Dehler

Intellectual 755 points

Here are the final results: I can keep the device from restarting
if I put the following 3 commands into the initialisation part:

bis     #SYSNMIIES, &SFRRPCR
bis     #SYSNMI, &SFRRPCR
bis     #NMIIE, &SFRIE1

This switches the RST pin to NMI functionality. I can revert this
by putting

bic     #SYSNMI, &SFRRPCR

into the RTC ISR (which gets called every 0.5 seconds) and the
device will still run without restarts. If I put the "bic..." into the
initialisation code (that is, it will be run very shortly after the 3
cmds from above) restarting comes back.

I really need all 3 cmds from above. As soon as I omit the cmds
which modify SYSNMIIES or NMIIE, it does not work.

I am now at a point where I think it is safe to allege that this chip
simply has a defect...

0 Dietmar Walther over 8 years ago in reply to Tom Dehler

TI__Genius 14165 points

Hi Tom,

I really appreciate all the testing and debug you did on this case.

It's still only a single part that is showing this, right? All others are running correctly, right?

So I think we will take this offline and contact you to discuss further actions.

0 Tom Dehler over 8 years ago in reply to Dietmar Walther

Intellectual 755 points

Dietmar Walther said:
It's still only a single part that is showing this, right? All others are running correctly, right?

Right.

So I think we will take this offline and contact you to discuss further actions.

I will be on vacation anyway until 1.10. Maybe I will continue or a colleague will take over (or both).

Thanks so far!

0 Tom Dehler over 8 years ago in reply to Dietmar Walther

Intellectual 755 points

This is just a note that we decided to replace the CPU and now the device is running for 24 hours without any problems...

0 Jace H over 8 years ago in reply to Tom Dehler

TI__Guru 56685 points

Thanks for the update Tom. It seems to me the issue was a possible ESD event for that particular chip. Please let us know if it happens again. If you are up for it, we can try to confirm this by placing the problem chip into a known good board and see if the issue persists. For now I'm going to be closing this post as the issue was resolved with a replacement chip. Thanks for your help and patience.

0 Tom Dehler over 8 years ago in reply to Jace H

Intellectual 755 points

Jace H said:
... we can try to confirm this by placing the problem chip into a known good board and see if the issue persists.

Well, the chip didn't survive as I had to cut the legs in order to remove it.

Thanks for helping so far -- I also think this very chip simply got broken somehow...
