MSP430FR4133: How serious are correctable FRAM errors?

Part Number: MSP430FR4133
Other Parts Discussed in Thread: MSP-FET

After developing an application using the MSP430FR4133, the first 16 devices were built
and testing has begun. One of these devices showed a correctable FRAM error. I had almost
forgotten that I enabled the NMI for these errors over a year ago, since I never
experienced them on my two test boards.
So I wonder if this single error is the famous "it just happens once in a
lifetime but now it happened to you" error or if it is something serious.
I didn't find a lot of real-world statistics about FRAM errors -- just the note that
soldering the devices at an improper temperature could cause problems.
Yes, it was a correctable error and only one, but I have no idea if I should worry about it...

  • Hello Tom,

    One correctable FRAM error is not that much of a concern. It would get more concerning if there are several or repeated errors. I would be more concerned about uncorrectable errors similar to what was found on this E2E post. e2e.ti.com/.../575920

    For more information about the reliability of our FRAM, please see the following application note. It contains more information about error rates and potential fallout (very low). http://www.ti.com/lit/slaa526
  • Hi,

    thanks for answering. I also was under the impression that one correctable error every now and then
    is not of much concern. Will try to explain the whole story:

    As I said there were 16 devices built. The firmware used on them is in a quite good shape -- it ran
    on 2 different testing devices for months without problems. Of these 16 devices now built, 14 run
    perfectly, one showed this single correctable error and the 16th is driving me nuts:

    It restarts about 5 to 10 times a day. It starts at the entry point of the whole firmware -- this
    is where the interrupt source "System Reset (0xFFFE)" points to. Nothing else points to this
    location, and I added some debug counters and verified that this place really gets hit with each
    restart.
     
    I programmed quite a comfortable post-mortem analysis code which records, among other data, the
    contents of the SYSRSTIV register so I can figure out the reason for restart. This is proven to
    work as pushing the Reset button leaves 0x0004 there and a software POR gives me a 0x0014.
    Interestingly, for about 90% of all these unwanted crashes, SYSRSTIV is 0x0000 but the debug
    counter gets incremented so I can be sure it was caused by the "System Reset" vector at 0xFFFE.
    I don't understand this as each restart should have a corresponding interrupt event which is
    different from 0x0000.
     
    The other 10% of these unwanted crashes are uncorrectable FRAM bit errors as I can see by the
    value 0x001C left in SYSRSTIV.

    I had read your PDF about FRAM reliability before, so currently my only conclusion would be that
    the processors were affected somewhere or somehow. Maybe the soldering was inappropriate
    (too hot / too long) or the sealed packs were opened too long before soldering or whatever.
     
    Maybe I should replace the CPU of that crashing device and see if it performs better then...
     
    Any ideas anybody?
     
    Thanks!
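
For readers following the post-mortem log described above, the SYSRSTIV values can be decoded on the PC side. This is a minimal Python sketch that covers only the four values reported in this thread; the names `SYSRSTIV_SEEN`, `describe_reset` and `summarize` are made up for illustration, and the complete table is in the device family user's guide.

```python
# Reset causes for the SYSRSTIV values that appear in this thread only.
# The dictionary and function names are hypothetical, not part of any TI tool.
SYSRSTIV_SEEN = {
    0x0000: "no reset interrupt pending",
    0x0004: "RST/NMI pin (reset button)",
    0x0014: "software POR",
    0x001C: "uncorrectable FRAM bit error",
}


def describe_reset(sysrstiv):
    """Return a readable cause for one logged SYSRSTIV value."""
    return SYSRSTIV_SEEN.get(sysrstiv, "not covered here (0x%04X)" % sysrstiv)


def summarize(log):
    """Count how often each cause appears in a list of logged values."""
    counts = {}
    for value in log:
        cause = describe_reset(value)
        counts[cause] = counts.get(cause, 0) + 1
    return counts
```

Feeding it a log such as `[0x0000] * 9 + [0x001C]` reproduces the roughly 90% / 10% split described in the post.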

  • Tom,

    If you have the ability to do a chip swap, that could give you a valuable data point. Also, do you have a pull-down on the TEST pin as recommended in the errata text for PORT28? I could see accidentally entering a JTAG/debug mode causing some resets.
  • Yes, I can replace the chip. However, then all debugging options regarding
    this issue will be gone ;-)
     
    Stupid question: I've read PORT28 before but I wonder if, when using whatever
    pulldown, won't there be a continuous reset and the device will never start?
    Currently, RST_NMI_SBWTDIO is connected to a 1 nF capacitor against GND, a reset
    switch and (using a short trace) to the pads for a TAG-Connect tool. SFRRPCR is
    in its default state (0x001C) so currently the pullup is active.

  • Tom,

    The pulldown should be placed on the TEST/SBWTCK pin. The RST/NMI/SBWTDIO pin should only have the recommended 1 nF cap to ground.
  • OK, now I got it:

    SFRRPCR.SYSRSTRE normally should only control the pull resistor of RST/NMI/SBWTDIO.
    But, due to a bug, it also controls the pulldown of TEST/SBWTCK. Therefore PORT28
    recommends leaving SFRRPCR.SYSRSTRE set to 1 so we can be sure that TEST/SBWTCK
    gets pulled down. But this way we force the pull resistor of RST/NMI/SBWTDIO to be
    enabled as well, so one has to use SFRRPCR.SYSRSTUP to ensure this resistor is set up
    in a way that it doesn't hurt.

    As my SFRRPCR is in its default state (0x001C) I should be fine regarding this. However,
    I will additionally tie TEST/SBWTCK to ground and see what happens.
    Will report later, thanks so far!
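
The reasoning about SFRRPCR above can be checked with a small host-side decoder. This is only a sketch: the bit masks are assumed from the usual device header definitions and are not quoted anywhere in this thread, and `rst_pin_config` is a made-up helper name.

```python
# Bit masks assumed from the device header (not quoted in the thread).
SYSNMI = 0x0001     # RST/NMI pin acts as NMI when set
SYSRSTUP = 0x0004   # pull direction: 1 = pullup, 0 = pulldown
SYSRSTRE = 0x0008   # pull resistor enable


def rst_pin_config(sfrrpcr):
    """Describe the RST/NMI pin setup encoded in an SFRRPCR value."""
    function = "NMI" if sfrrpcr & SYSNMI else "reset"
    if not sfrrpcr & SYSRSTRE:
        pull = "none"
    elif sfrrpcr & SYSRSTUP:
        pull = "pullup"
    else:
        pull = "pulldown"
    return {"function": function, "pull": pull}
```

With the default value 0x001C this reports a reset pin with the pullup enabled, which matches the description in the post. Bit 4 of the default value is not interpreted here.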
  • Jace H said:
    The pulldown should be placed on the TEST/SBWTCK pin.

    OK, here are the results:

    First, just to be sure, I checked the internal pulldown and this gave
    me approx. 32 k in both directions.
     
    Next step was adding an additional 10k resistor to GND. Didn't
    help -- the first restart came after a few minutes.

    Then I tried pulling it directly to GND. Same result.
     
    And now comes the fun: When pulled to Vcc the beast runs for
    24 hours without any restarts. Of course, the reset pin doesn't
    work here since applying Vcc to TEST enables the JTAG which in
    turn disables the external reset function.

    So one explanation could be that this chip is somewhat messed
    up and generates resets internally without real reasons. This
    also could explain why SYSRSTIV is zero in 90% of all cases
    and why this behaviour isn't seen on the 15+2 other devices.
    By enabling TEST the reset function (including the erroneous
    parts) is disabled.

  • Hi Tom,

    Jace invited me to this thread.
    Did you use these parts in an ESD-safe environment, or did you operate them on your office desk without any ESD protection?
    The 16th device, which is showing resets without a value (0x0000) in SYSRSTIV, is strange. Even more, it looks damaged given what happens when you pull up the TEST pin and enable JTAG mode.
    Can you do me a favour and measure the LPM4 current of this device with all ports set to output low and then to output high, to see if we can identify leakage on this part?

    Not sure if already asked, but I assume you set up the wait states according to your operating frequency to ensure you're operating in spec, right?
  • Dietmar Walther said:
    Hi Tom,
    Jace invited me to this thread.


    Welcome ;-)


    Did you use these parts in an ESD-safe environment, or did you operate them on your office desk without any ESD protection?

    Well, for me I can say "yes" and for the manufacturer I'd say so as well (they
    manufactured and are still manufacturing the predecessor of this device with the
    MSP430P325A for ages).
     

    The 16th device, which is showing resets without a value (0x0000) in SYSRSTIV, is strange. Even more, it looks damaged given what happens when you pull up the TEST pin and enable JTAG mode.

    But isn't the behavior to still run with an enabled but not actively used JTAG normal?
    Additionally, I use the JTAG (SBW, to be exact) all the time to upload software with
    the MSP-FET and MSPFlasher on this device...
     

    Can you do me a favour and measure the LPM4 current of this device with all ports set to output low and then to output high, to see if we can identify leakage on this part?

    Well, that's difficult: Almost all ports are in use. Static low should be possible
    (will have to check) but static high will produce bad effects which will make any
    current measurement void.
    But I can say this so far: The software goes down to LPM3. The usual current of
    the whole(!) system in this case is around 3 uA (LCD, RTC,...) . This was measured
    using a Keithley 2k DMM. This current is correct on the 15+2 other devices and
    on this one as well.

    Well, in case we decide to rip the whole part off the PCB I could cut all pins
    apart from power, xtal, SBW,... and try that LPM4 measurement.

     

    Not sure if already asked, but I assume you set up the wait states according to your operating frequency to ensure you're operating in spec, right?

    I run 16 MHz with NWAITS_1. But that brings me to the idea to try something slower,
    just to see if it helps...
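
As a quick cross-check of the wait-state setting discussed here, the required number of FRAM wait states can be estimated from the CPU clock. This is a rough sketch under the assumption that FRAM can be accessed without wait states up to 8 MHz; that limit, the constant `FRAM_MAX_HZ_NO_WAIT` and the function name are assumptions for illustration, so please check the datasheet for the real limit.

```python
# Assumed zero-wait-state FRAM access limit; verify against the datasheet.
FRAM_MAX_HZ_NO_WAIT = 8_000_000


def wait_states_needed(mclk_hz, fram_max_hz=FRAM_MAX_HZ_NO_WAIT):
    """Return the minimum number of FRAM wait states for a given MCLK."""
    if mclk_hz <= 0:
        raise ValueError("MCLK frequency must be positive")
    # Ceiling division: how many FRAM access periods fit one CPU access.
    periods = -(-mclk_hz // fram_max_hz)
    return periods - 1
```

Under that assumption, 16 MHz needs one wait state (the NWAITS_1 setting mentioned above) and 8 MHz needs none.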

  • Tom,

    thanks, so:
    - you are operating in an ESD-safe environment
    - your LPM3 current does not indicate leakage, which means there seems to be no critical damage (I think no LPM4 measurement with all GPIOs at output low is required here)
    - 16 MHz with NWAITS_1 is according to spec, but please try slower to exclude this

    And yes, you're right: enabling JTAG does not necessarily stop the device as long as you do not apply sequences to the JTAG pins. So are your JTAG pins terminated while you do this?

    Do I understand this right that you still can reproduce the behavior and are able to work around it?

    Another thing coming into my mind is a STACK violation. Are you sure your STACK cannot be corrupted by your firmware or data management?
    Can you reproduce the behavior also with debugger connected?
  • Tom,

    This is peculiar to say the least. Two other tests I think you should run.
    1) Set RESET to NMI functionality and see if the same resets happen.
    2) Perform an A-B-A chip swap. This would mean replacing the current misbehaving device with a known good chip, and placing the misbehaving device onto a known good board. From there we observe whether the issue follows the chip or the board.
  • Dietmar Walther said:
    Tom,
    thanks, so:
    - you are operating in an ESD-safe environment
    - your LPM3 current does not indicate leakage, which means there seems to be no critical damage (I think no LPM4 measurement with all GPIOs at output low is required here)
    - 16 MHz with NWAITS_1 is according to spec, but please try slower to exclude this


    Went up to NWAITS_7 -- still resetting. Currently I am running it
    with 8 MHz and it did NOT reset yet. But this means nothing as
    it's running for 18 minutes only. Will have to wait until tomorrow...


    And yes, you're right: enabling JTAG does not necessarily stop the device as long as you do not apply sequences to the JTAG pins. So are your JTAG pins terminated while you do this?

    Well, all pins I do not use (just 4) are pulled down internally.
    The reset pin sees the typical 1nF and nothing else.


    Do I understand this right that you still can reproduce the behavior and are able to work around it?

    In order to reproduce it I have to:
    - Leave TEST/SBWTCK open (internal pulldown) or GND it
    - Wait. Sometimes 2 minutes, sometimes 2 hours. And all in between.

    I have no workaround for it apart from pulling TEST/SBWTCK high!


    Another thing coming into my mind is a STACK violation. Are you sure your STACK cannot be corrupted by your firmware or data management?

    Well, I have 17 devices running without problems. One for weeks and
    another one for months. Software is all assembler, no libs, no foreign
    code, no funky compiler bugs. I don't need a lot of stack as I don't do
    pushes and pops. Just some function calls and interrupts. I have a stack
    guard sitting at the bottom (and below this there is still unused RAM)
    and it never got overwritten.


    Can you reproduce the behavior also with debugger connected?

    I don't have a debugger, just an MSP-FET which is only used for upload
    and download with MSPFlasher. Currently, the device runs unconnected
    but I can retry with an MSP-FET connected (doing nothing, of course).

  • I will try 1) tomorrow (or as soon as the 8 MHz test has failed ;-))

    Regarding 2): I can replace the chip but I have my doubts that the currently misbehaving
    one will survive. I assume that, as soon as I replace it, the board will behave properly.
    It's no fancy design -- two layers (bottom almost entirely GND), some 0805 parts, some
    SOT-23, a 24LC512, an FTDI for USB and an LCD. Easy to check optically and electrically
    and manufactured together with the other 15 (good) devices...
  • Tom, Jace,

    thanks for all the inputs. Up to now I don't really have an idea what's causing these resets based on the information we have gathered so far.

    The A-B-A swap is dangerous because solder heat could heal the effects, so I would vote to do this as a last option.

    Assuming it is related to memory, given that it seems to work with 8 MHz and NWAIT_7, we should be able to see this with the debugger connected as well, and maybe it is possible to identify the location where it happens.

    Tom,

    since you have the FET you already have the HW. Do you also have the complete project and the source code for IAR or CCS, so that you can run it in an IDE, set breakpoints and debug the project? By doing this we might be able to gather more information.

  • Dietmar Walther said:

    The A-B-A swap is dangerous because solder heat could heal the effects, so I would vote to do this as a last option.


    Yes, will do this only when all other options are gone


    Assuming it is related to memory, given that it seems to work with 8 MHz and NWAIT_7, we should be able to see this with the debugger connected as well, and maybe it is possible to identify the location where it happens.

    Wait, just to be clear: The test with NWAIT_7 was done with 16 MHz and failed!
    The 8 MHz test (w/o WAITS) is not yet completed.
    In fact, today in the morning it was running for 9 hours already -- something
    I've never seen with 16 MHz. I didn't touch it and we'll see what will have
    happened later in the afternoon today. If it survives with 8 MHz, I will
    switch back to 16 again and see if the reboots come back (they probably will
    but it's better to be sure).

    Additionally, the test what happens when RESET is configured to cause NMIs      
    has to be done as well...
     


    since you have the FET you already have the HW. Do you also have the complete project and the source code for IAR or CCS, so that you can run it in an IDE, set breakpoints and debug the project? By doing this we might be able to gather more information.

    Never touched IAR or CCS -- I am using a combination of cpp, gas (and other
    binutils) and srec_cat with MSPFlasher. And IAR or CCS won't run on my
    platform.

  • With a clock of 8 MHz (no wait state) the system was running 26 hours without
    resets. Then I switched back to 16 MHz (one wait state) and the first reset
    came after a few minutes.
     
    I have no idea what could be wrong. The power supply should be OK: One 1uF
    sitting about 5 mm away from the chip, next comes a 10 uF and the TPS78233DDC
    regulator. Vcc has got short and thick traces and GND is a GND plane anyway.
    The chip doesn't have to drive a lot: One LCD, a 24LC512 EEPROM, a few
    MOSFETs and that's all. I have also checked the 3.3 Volts with a fast scope
    -- no glitches or noise, just a flat line.

    When enabling the 16 MHz, the DCO is loaded from the TLV values and the
    lower 9 bits of CSCTL0 jump between 164 and 165. That's not perfect but OK.

    If time permits, I will play with the NMI function tomorrow and also enable
    DCOFTRIMEN and set DCOFTRIM to 4 (this brings CSCTL0 closer to 256) just
    to see if things change here...

  • I am getting closer. First some details which are needed to understand
    what's going on:

    The device uses a USB bus-powered FTDI USB<->RS232 converter to transfer
    data to a PC. I noticed that these spurious resets do NOT happen when
    USB is connected (for normal operation it is not connected).

    During normal operation, the device goes to LPM3 and is woken up by the
    RTC ISR every 500 ms. It does some stuff for 10-100 us , then it goes back
    to LPM3.

    However, the FTDI's 3.3 Volts used for its interface logic is also fed to
    a port pin of the MSP. This way the MSP can detect if the user has
    (dis)connected the USB port and perform some appropriate action.

    I am using a high baudrate which makes it necessary to use SMCLK for the
    UART. So one of the tasks we have to deal with when the USB port is in use,
    is to prevent the device from entering LPM3 -- the RTC ISR looks like this:

    RTC_Interrupt:
      do some stuff
      if PC_connected then
        prepare LPM0
      else
        prepare LPM3
      fi
      reti

    This way the UART is available with its high baudrate as soon as the PC has
    been connected.

    When I noticed that these spurious resets were gone as soon as the USB was
    connected, I disconnected all lines from the FTDI to the MSP to see what's
    happening. And as soon as I pulled off the above mentioned 3.3 Volt
    USB-is-present line, the device started to reset again after a few minutes
    even though the USB port was connected.

    The logical consequence was of course to self-supply the MSP with its own
    3.3V on this port pin (so it assumes the USB port is connected, which prevents
    it from entering LPM3). And now it runs for hours without reset (in LPM0 of
    course).

    So we can assume that something is happening with this device (but not the
    17 others) in LPM3. Now I only have to figure out what...
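
The "prepare LPM0 / prepare LPM3" step in the RTC ISR pseudocode above is commonly done on the MSP430 by rewriting the low-power bits of the status register copy saved on the stack, so that `reti` restores the chosen mode. Whether the firmware in this thread does exactly that is an assumption. The following Python model only illustrates the bit arithmetic, and `sr_on_isr_exit` is a made-up name.

```python
# MSP430 status register low-power bits.
CPUOFF, OSCOFF, SCG0, SCG1 = 0x0010, 0x0020, 0x0040, 0x0080

LPM0_BITS = CPUOFF
LPM3_BITS = SCG1 | SCG0 | CPUOFF
ALL_LPM_BITS = SCG1 | SCG0 | OSCOFF | CPUOFF


def sr_on_isr_exit(saved_sr, pc_connected):
    """Model the saved SR after the ISR has selected the next low-power mode.

    All low-power bits are cleared first, then LPM0 is selected while the
    PC is connected (SMCLK stays available for the UART), otherwise LPM3.
    """
    cleared = saved_sr & ~ALL_LPM_BITS & 0xFFFF
    return cleared | (LPM0_BITS if pc_connected else LPM3_BITS)
```

For a saved SR of 0x00D8 (LPM3 with GIE set), a connected PC yields 0x0018 (LPM0 with GIE set), and the reverse case goes back to 0x00D8.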

  • Hi Tom,

    good that you are making progress. I have to think in more detail about what you said, but it triggers some more questions:
    1. How is the device powered?

    2. Could it somehow be that you remove the primary supply (DVCC) and still supply a GPIO pin with another voltage? That way you would supply the device via the ESD rail backdoor, which is not allowed.

    Best regards,
    Dietmar

  • Dietmar Walther said:
    1. How is the device powered?

    Vcc is good and stable (dedicated TPS78233, close to the device, and
    sufficient capacitors).

    2. could it somehow be that you remove primary supply (DVCC) and still supply any GPIO pin with another voltage? By this you would supply the device via the ESD rail backdoor which is not allowed.

    I first thought in the same direction (assuming something in this
    very chip is broken w.r.t. power distribution and that it gets
    supplied via some port pins in the affected areas). But since I
    changed jumping into LPM3 at the end of the ISR to switching only
    into LPM0 it works even without USB (as this is the same what it
    would have done with USB).

    Given the fact that 8 MHz works and 16 MHz works only if we do not
    drop into LPM3 makes me think that FLL / DCO might go insane on
    this chip and it is slowly drifting out of the specs so it does
    these funny things (like resetting w/o a vector or occasionally
    spitting out FRAM errors).

    This could be due to my usage pattern:

    The cpu is in LPM3 all the time. Every 500 ms the RTC ISR wakes
    it up. The task it has to do is normally quite short: between
    1 and 4 ACLK cycles (that is something between 30 and 120 us).
     
    It is currently configured with the FLL ON during these times.
    However, I read in other posts that there are CPUs where the
    FLL needs a longer time to stabilise the DCO properly and if it
    doesn't get this time, it's going nuts.
     
    I probably have to try something in this direction if no one has
    a better idea... 

  • Tom,

    OK, got it. Going in your direction with the clocks: can you measure them with a scope in the critical condition and provide the scope shots?
  • Well, I played with this idea already. Fortunately, P8.0 is free so I can enable SMCLK here.
    OTOH, it sometimes takes one hour to crash so I would have to sit one hour staring at the
    scope to see what happens ;-). Well, I could try to check if trace buffers are big enough to
    capture SMCLK for enough time so I put it into some pretrigger mode which actually gets
    hit on the device's reset and then scroll back to see what had been...

    My current approach is to disable FLL stabilisation entirely just for testing purposes -- as
    long as I do not change Vcc (this will not happen) or the temperature, this should not hurt.
  • Yeah, you're right, but you can use the pulse-width trigger, set it to a pulse width shorter than that of your actual clock, and let it run.
    I mean, assuming the clock is causing it for whatever reason, I would expect it to run too fast. But you might have to run it twice, triggering once on a high and once on a low pulse (spike).

    You can certainly disable the FLL as well, but watching the clock gives you more confidence that the clock signal looks clean.

    Another thing is that you might implement port toggles before all the power mode changes to trace with a scope when the device stops. This might help us to better understand when it happens.
  • Trigger on pulse widths? Wonder how to teach my 1988 Gould scope to do this ;-)
    No, speaking seriously, you are right. I'll get a more modern scope if everything else fails.

    But first I will try playing with the FLL. You don't know by accident if the effects
    mentioned in

    e2e.ti.com/.../74890

    might apply for the FR4133 as well?
  • Hi Tom,

    I think the thread is not related to it because if a clock stops you would not get a reset and if you would get it via the WDT in such a case it would be indicated by the SYSRSTIV. That's my opinion on this.

    The FLL itself only mixes DCO steps back and forth, and the FR4133 has a different DCO architecture with a much better resolution.

    Again, checking the clock with a high-resolution scope would be best. Even better if you can trace the clock with 2 scopes, with one triggering on the high spike and the other on a low spike! If the scopes do not trigger, I would expect there is nothing.

    Another idea is to measure the current in a dynamic way to see if there is a drop or spike indicating the reset maybe this will also give us some good conclusion.

  • Dietmar Walther said:
    I think the thread is not related to it because if a clock stops you would not get a reset and if you would get it via the WDT in such a case it would be indicated by the SYSRSTIV. That's my opinion on this.

    I copied the wrong link, I meant this one:

    So I started monitoring CSCTL0 and it stayed the same all the time
    which means that the FLL does not adjust the DCO. However, it did
    not run away as in the thread mentioned...

    So enabling the FLL during my very short active tasks was completely
    useless. But see below:

    The FLL itself only mixes DCO steps back and forth, and the FR4133 has a different DCO architecture with a much better resolution.

    Yes. I modified the code in a way that every 10-20 seconds one RTC
    ISR call spent 1.5 ms in an LPM0 state -- giving the FLL enough
    time to adjust the DCO. Now I could see the MOD bits of CSCTL0
    changing each time while the DCO bits were at 0x165 most of the
    time, sometimes 0x164. The TLV value is 0x161 so this is not too bad.

    However, the beast still resets ;-(

    Again, checking the clock with a high-resolution scope would be best. Even better if you can trace the clock with 2 scopes, with one triggering on the high spike and the other on a low spike! If the scopes do not trigger, I would expect there is nothing.

    I will try this tomorrow...
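
The CSCTL0 readings discussed above can be split into their DCO and MOD fields on the PC side. This is a sketch that assumes the DCO tap sits in bits 8..0 (the "lower 9 bits" mentioned earlier) and MOD in bits 13..9; the MOD position is taken from memory of the family user's guide and should be verified, and the function names are made up.

```python
DCO_MASK = 0x01FF   # bits 8..0: DCO tap (the "lower 9 bits" in the thread)
MOD_SHIFT = 9       # assumed position of the MOD field (bits 13..9)
MOD_MASK = 0x1F


def split_csctl0(value):
    """Split a raw CSCTL0 reading into its DCO tap and MOD counter."""
    return {"dco": value & DCO_MASK, "mod": (value >> MOD_SHIFT) & MOD_MASK}


def dco_offset_from_tlv(csctl0, tlv_dco):
    """Return how many DCO taps the current reading is away from the TLV value."""
    return split_csctl0(csctl0)["dco"] - (tlv_dco & DCO_MASK)
```

With the numbers from the post, a DCO field of 0x165 is 4 taps above the TLV value 0x161.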

  • Things start to become funny...

    One thing I promised to try was to test how the device will react if we configure the RST pin as NMI.
    So I added the corresponding code to the device's initialisation sequence:

            bis     #SYSNMIIES, &SFRRPCR
            bis     #SYSNMI, &SFRRPCR
            bis     #NMIIE, &SFRIE1
    

    and tested it by pressing the reset button and checking that it came up with 0x02 in SYSUNIV. It did.
    And it ran 32 hours without restarting until I stopped it.

    So I think we can safely assume that my theory of the DCO slowly running away can finally be put
    to rest. Why should this depend on the way the RST pin is configured?

    But it's getting even more absurd:

    I started it again with RST doing NMIs. After a few hours I decided to change this on the fly.
    This is possible as I have a small monitor embedded within the code. So I started the monitor
    and flipped Bit 0 in 0x104 which makes the RST pin act as RESET again. And the device did not
    reset...

    So I added "bic #SYSNMI, &SFRRPCR" a few lines after the 3 lines from above, started it, and it
    reset within 10 minutes...

    No idea what's going on here. Maybe it has to fall into LPM3 at least once until I may revert
    the RST pin to actually doing RESETs. It is currently running in this mode but with #SYSNMI
    cleared manually by the monitor (which implies several LPM3s) to see if it restarts during the
    night but I have my doubts...

  • The device is still running after 12 hours. I think this chip is just nuts. No idea what
    happened to it -- maybe an ESD issue (don't think so), maybe soldered too hot (don't
    think so either) or it came broken from the fab...

  • Tom,

    What is really telling to me about this situation is that switching the RST line to NMI stopped your resets. This tells me that some kind of instability on the RESET line is causing issues. I was going back through the thread, and I just want to clarify the circuit attached to the RST line. You have a 1 nF cap to ground as well as a 47k pull-up to VCC on this pin, correct?

  • Tom,

    to understand this correctly: you changed the RST pin function to NMI and switched it back after you saw the resets were gone, right?

    And then you started again, setting it initially to the RST function, and now it works for 10 hours? Does that mean the failure is gone?

    By the way, do you operate in an ESD-safe environment?

    Best regards,

    Dietmar

  • Jace H said:
    What is really telling to me about this situation is that switching the RST line to NMI stopped your resets. This tells me that some kind of instability on the RESET line is causing issues.

    If it was some instability on the (external part) of the RESET line, why does this not appear
    when it is configured as NMI? And why only with 16 MHz and not with 8 MHz? And why not if I
    do not drop down to LPM3 (and stay in LPM0 instead)?

    I was going back through the thread, and I just want to clarify the circuit attached to the RST line. You have a 1nF cap to ground as well as a 47k pull-up to VCC on this pin, correct?

    I have a 1 nF C to GND. I have no external pullup but as I do not modify SYSRSTRE and
    SYSRSTUP, the internal pullup is enabled.

  • Dietmar Walther said:
    to understand this correctly: you changed the RST pin function to NMI and switched it back after you saw the resets were gone, right?

    Right. But only if I switch it back manually with the internal monitor (which takes a few
    seconds: attach the USB cable, type in the keystrokes to call the monitor, toggle SYSNMI,
    detach the cable). If I do it during the initialisation sequence just a few assembler 
    commands after the code which changed it to NMI, it does not work.

    And then you started again, setting it initially to the RST function, and now it works for 10 hours? Does that mean the failure is gone?

    No, I probably expressed myself badly: It only works if the switch-back is done "a few
    seconds later". No idea if it has to be "a few seconds" or it must have been at least
    once in LPM3 (which is implied after 500ms) or whatever. I will play around with this
    a bit but as I always have to wait a few hours to be sure, it takes some time...

    By the way, do you operate in an ESD-safe environment?

    I would say so, yes. Well, I have no idea what happened during manufacturing of this
    device. It is a professional business who did it -- however, nobody knows what might
    have happened to this very chip over there. Maybe I am just wasting the time of all
    of us and the chip is just fried...

  • Just for the sake of completeness: The device has just been running for 23+ hours with the SYSNMI bit
    cleared manually from within the monitor. I pressed the RESET button now and it restarted (of course).
    I checked the log (with every restart it records the reason for the restart) and it was really a Reset event
    (0x04 in SYSRSTIV) and no NMI.

    So I will go on now trying to find out where I have to clear SYSNMI exactly...
  • Here are the final results: I can keep the device from restarting
    if I put the following 3 commands into the initialisation part:

    bis     #SYSNMIIES, &SFRRPCR
    bis     #SYSNMI, &SFRRPCR
    bis     #NMIIE, &SFRIE1
    

    This switches the RST pin to NMI functionality. I can revert this
    by putting

    bic     #SYSNMI, &SFRRPCR
    

    into the RTC ISR (which gets called every 0.5 seconds) and the
    device will still run without restarts. If I put the "bic..." into the
    initialisation code (that is, it will be run very shortly after the 3
    cmds from above) restarting comes back.
     
    I really need all 3 cmds from above. As soon as I omit the cmds
    which modify SYSNMIIES or NMIIE, it does not work.
     
    I am now at a point where I think it is safe to allege that this chip
    simply has a defect...
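
To make the register effect of the commands above easier to follow, here is a small Python trace of SFRRPCR through the sequence. It assumes the default value 0x001C mentioned earlier in the thread and the usual header masks for SYSNMI and SYSNMIIES (bit 0 is the one flipped via the monitor at 0x104). The write to SFRIE1 is not modeled, and all helper names are made up.

```python
SFRRPCR_DEFAULT = 0x001C  # default value reported earlier in the thread
SYSNMI = 0x0001           # assumed mask: RST pin acts as NMI
SYSNMIIES = 0x0002        # assumed mask: NMI edge select


def bis(reg, mask):
    """Model the MSP430 'bis' instruction on a 16-bit register."""
    return (reg | mask) & 0xFFFF


def bic(reg, mask):
    """Model the MSP430 'bic' instruction on a 16-bit register."""
    return reg & ~mask & 0xFFFF


def trace_nmi_sequence(start=SFRRPCR_DEFAULT):
    """Return SFRRPCR after each step: two 'bis' at init, 'bic' in the RTC ISR."""
    steps = []
    reg = bis(start, SYSNMIIES)
    steps.append(("bis #SYSNMIIES", reg))
    reg = bis(reg, SYSNMI)
    steps.append(("bis #SYSNMI", reg))
    reg = bic(reg, SYSNMI)
    steps.append(("bic #SYSNMI", reg))
    return steps
```

The final value differs from the default only in the edge-select bit, so on paper the pin is a plain reset input again after the `bic`; this does not explain why the order and timing matter on this particular chip.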

  • Hi Tom,

    I really appreciate all the testing and debug you did on this case.

    It's still only a single part that is showing this, right? All others are running correctly, right?

    So I think we will take this offline and contact you to discuss further actions.

  • Dietmar Walther said:
    It's still only a single part that is showing this, right? All others are running correctly, right?

    Right.

    So I think we will take this offline and contact you to discuss further actions.

    I will be on vacation anyway until 1.10. Maybe I will continue or a colleague will take over (or both).

    Thanks so far!

  • This is just a note that we decided to replace the CPU and now the device is running for 24 hours without any problems...
  • Thanks for the update Tom. It seems to me the issue was a possible ESD event for that particular chip. Please let us know if it happens again. If you are up for it, we can try to confirm this by placing the problem chip into a known good board and see if the issue persists. For now I'm going to be closing this post as the issue was resolved with a replacement chip. Thanks for your help and patience.
  • Jace H said:
    ... we can try to confirm this by placing the problem chip into a known good board and see if the issue persists.

    Well, the chip didn't survive as I had to cut the legs in order to remove it.

    Thanks for helping so far -- I also think this very chip simply got broken somehow...
