MSP doesn't respond to external reset, etc

Other Parts Discussed in Thread: MSP430WARE

Hello, I am having some big issues with a system using an MSP430. First, I am using some normally-open push buttons, where the GPIO port is configured as pull-up input. Usually it works okay, but sometimes after power-up the buttons do not work. It is odd because the rest of the code seems to execute okay. I have some LEDs on a front panel that are driven by an I2C-to-GPIO chip, and they are working, which tells me that much of my code is running. I am also toggling some pins in a timer ISR and I can see them toggling on an oscilloscope.

The second problem may or may not be related to the first. When I have the problem described above, a hardware reset (to the Reset pin on MSP) will often clear the problem and the system works okay again. But I have several boards where the hardware reset does not work at all any more, ever, even when I don't have the non-responsive push-button problem. At one time the hardware resets on these boards worked but now they don't.

Another interesting fact is that the system has an event that will trigger all init() functions to be called. After this event occurs, the push-buttons are responsive again. This is without any hardware or software reset.

I'm at a loss for what to do. I am sequentially trying different things to isolate the problem. This is the first project where I have tried using the MSP430Ware Driver Library. There have been quite a few times when I was debugging a program with the FET and Code Composer and the program seemed to hang. I would hit the pause button and it was often stuck in an MSP430Ware driver. So, based on those observations, my first attempt will be to remove MSP430Ware from the push-button code, although I really doubt this is the issue.

All the buttons are on the same port, P2, and I am initializing them all in one operation. I was also thinking that maybe there could be a problem with setting all the pins to pull-up at the same time. But the datasheet indicates something like 30 kOhm (I don't remember exactly) and the buttons are normally closed. So I doubt this would be causing an issue either.
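
For comparison, a register-level version of the button setup, without the driver library, might look like the sketch below. It assumes a port with P2DIR, P2REN, P2OUT and P2IN registers and buttons that pull the pin to ground when pressed. BUTTON_MASK and the function names are made up for illustration, and the host stand-ins exist only so the fragment compiles off-target. The second helper reports whether the pull-up configuration is still intact, which may help tell a button fault from a port register that was changed behind the program's back.

```c
#include <stdbool.h>
#include <stdint.h>

#ifdef __MSP430__
#include <msp430.h>
#else
/* Host stand-ins for the port registers (assumption: not real hardware). */
static volatile uint8_t P2DIR, P2REN, P2OUT, P2IN = 0xFF;
#endif

/* Hypothetical mask: which P2 pins carry buttons depends on the board. */
#define BUTTON_MASK 0x0Fu

/* Configure the button pins as inputs with the internal pull-up enabled. */
static void buttons_init(void)
{
    P2DIR &= (uint8_t)~BUTTON_MASK;  /* direction: input               */
    P2REN |= BUTTON_MASK;            /* enable the pull resistor       */
    P2OUT |= BUTTON_MASK;            /* with REN set: pull up, not down */
}

/* Return true if the pins are still configured as pull-up inputs. */
static bool buttons_config_ok(void)
{
    return (P2DIR & BUTTON_MASK) == 0u &&
           (P2REN & BUTTON_MASK) == BUTTON_MASK &&
           (P2OUT & BUTTON_MASK) == BUTTON_MASK;
}

/* A pressed button reads low, so invert the input before masking. */
static uint8_t buttons_pressed(void)
{
    return (uint8_t)(~P2IN & BUTTON_MASK);
}
```

Calling buttons_config_ok() from the main loop and toggling a spare pin when it fails would show on the scope the moment the configuration is lost.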

Hopefully someone has had a similar experience or can give me suggestions. I will post progress as I make it.

  • Ben Hagan said:
    a hardware reset (to the Reset pin on MSP) will often clear the problem and the system works okay again. But I have several boards where the hardware reset does not work at all any more, ever

    This can indicate problems in the post-reset code. A very common cause is oscillator failure; the next is running the CPU out of spec, for example setting the DCO to 16 MHz before VCC has reached the voltage needed for safe 16 MHz operation (most probably this is not your case, but anyway...). Is it possible to see your init code?

  • Ben Hagan said:
    But I have several boards where the hardware reset does not work at all any more, ever, even when I don't have the non-responsive push-button problem.

    This (together with the other observations) seems to indicate that you are unintentionally changing your hardware configuration.

    Maybe you're writing to an uninitialized pointer or beyond array bounds, so your data gets stored in the port config registers, or something like that. One thing (and currently the only one that comes to mind) that will make hardware resets stop working is the reconfiguration of the RST pin for NMI.
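
On F5xx/F6xx devices the RST/NMI pin function is selected by the SYSNMI bit in the SFRRPCR register. Assuming the part in question belongs to that family, a check along the following lines could be called from the main loop, or the register could simply be inspected in the debugger. The function names are invented for illustration, and the host stand-ins only let the fragment compile off-target.

```c
#include <stdbool.h>
#include <stdint.h>

#ifdef __MSP430__
#include <msp430.h>
#else
/* Host stand-ins (assumption: they mirror the F5xx/F6xx header names). */
static volatile uint16_t SFRRPCR;
#define SYSNMI 0x0001u
#endif

/* True if the RST/NMI pin has been switched from reset to NMI function. */
static bool rst_pin_is_nmi(void)
{
    return (SFRRPCR & SYSNMI) != 0u;
}

/* Put the pin back into reset mode; returns true if a repair was needed. */
static bool rst_pin_restore(void)
{
    if (!rst_pin_is_nmi()) {
        return false;
    }
    SFRRPCR &= (uint16_t)~SYSNMI;
    return true;
}
```

If rst_pin_restore() ever returns true, something wrote to SFRRPCR that should not have, and that write is the thing to hunt for.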

    You should check your code for stack overflows, out-of-bounds array writes, uninitialized pointers etc.
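
One common way to look for stack overflows is to fill the stack area with a known pattern at startup and later count how much of the pattern survives. The sketch below works on an arbitrary buffer; on the target, the region would be the stack area defined by the linker, whose symbol names depend on the toolchain and are not assumed here. The function names and the fill value are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u  /* arbitrary, easily recognized fill pattern */

/* Fill a region with the pattern before it is used as stack. */
static void stack_paint(uint8_t *region, size_t size)
{
    assert(region != NULL);
    for (size_t i = 0; i < size; i++) {
        region[i] = STACK_FILL;
    }
}

/*
 * Count untouched bytes at the low end of the region. The MSP430 stack
 * grows downward, so the lowest addresses are the last to be overwritten.
 * A result of 0 means the stack reached (or passed) the end of the region.
 */
static size_t stack_unused_bytes(const uint8_t *region, size_t size)
{
    assert(region != NULL);
    size_t unused = 0;
    while (unused < size && region[unused] == STACK_FILL) {
        unused++;
    }
    return unused;
}
```

A unit that reports far fewer unused bytes than the others, or zero, would be a strong hint that the stack is running into data.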

  • Jens-Michael Gross said:
    One thing (and currently the only one that comes to mind) that will make hardware resets stop working

    This is the explanation only when you are *sure* that reset really does not work, for example by lighting a debug LED right after reset, before executing *any* init code (just configure the pin and set the output level that turns the LED on).
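
A minimal sketch of such a sign of life, assuming an active-high LED (or a scope probe) on P1.0; the pin choice and the function name are hypothetical, and the host stand-ins only let the fragment compile off-target:

```c
#include <stdint.h>

#ifdef __MSP430__
#include <msp430.h>
#else
/* Host stand-ins for the port registers (assumption: not real hardware). */
static volatile uint8_t P1DIR, P1OUT;
#endif

#define DEBUG_LED_BIT 0x01u  /* hypothetical: LED on P1.0, active high */

/* Intended to be the very first statement in main(), before any init code. */
static void debug_led_on(void)
{
    P1OUT |= DEBUG_LED_BIT;  /* set the output level first to avoid a glitch */
    P1DIR |= DEBUG_LED_BIT;  /* then switch the pin to output */
}
```

Note that the compiler's startup code still runs before main(), so this shows that the reset happened and the startup code completed, not more than that.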

  • Ilmars said:
    This is the explanation only when you are *sure* that reset really does not work

    I didn't say it is the explanation for the observed behaviour. I said it is the only explanation when resets don't work (which leaves open the question of whether resets really don't work, which is beyond my line of sight).
    As it is the prerequisite to my answer, it can be assumed to be "sure" by definition :)

  • Jens-Michael, thank you for the response.

    Yes, this has been my fear. I've reviewed the code and I didn't see anything. I'm having another engineer peer review the code, so we'll see what comes of that. Any idea why I wouldn't see the same behavior on all units, and why it is always the same units (2 of 4 have this issue)? If it were a pointer issue or something similar, I would expect to see a similar response on all units. I'm not using any dynamic memory, other than the stack.

    I know the interrupt vectors are stored in flash memory, but where are the ISRs located after startup? I was thinking these are copied from flash to somewhere in RAM, but I don't know for sure. I was thinking I could hook up the debugger and check the memory location of the reset ISR to see if it had been corrupted.

  • Ilmars,

    I'm using the MSP430Ware Driver Library. Here is my init code. 

    void init (void)
    {
        // Stop WDT
        WDT_A_hold(WDT_A_BASE);

        // Set the core voltage to level 3
        status = PMM_setVCore(PMM_BASE, PMMCOREV_3);

        // Set DCO FLL reference = REFO
        UCS_clockSignalInit(UCS_BASE, UCS_FLLREF, UCS_REFOCLK_SELECT, UCS_CLOCK_DIVIDER_1);

        // Set ACLK = REFO
        UCS_clockSignalInit(UCS_BASE, UCS_ACLK, UCS_REFOCLK_SELECT, UCS_CLOCK_DIVIDER_1);

        // Disable the oscillator fault interrupt before configuring the FLL,
        // because of DCOFFG
        SFRIE1 &= ~OFIE;

        // UCS_initFLLSettle parameters (from the driver library comments):
        //   baseAddress: base address of the UCS module
        //   fsystem:     target frequency for MCLK in kHz
        //   ratio:       x/y, where x = fsystem and y = FLL reference frequency
        // 32 * 32768 Hz = 1048576 Hz, i.e. about 1048 kHz
        UCS_initFLLSettle(UCS_BASE, 1048, 32);
    }
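
The arithmetic behind the two numbers passed to UCS_initFLLSettle above can be written out as a small helper. This is only a sketch of the calculation with REFO at its nominal 32768 Hz; the function names are made up and are not part of the driver library.

```c
#include <stdint.h>

#define REFO_HZ 32768u  /* nominal REFO frequency in Hz */

/* DCO target in Hz for a given FLL ratio with REFO as the reference. */
static uint32_t fll_target_hz(uint16_t ratio)
{
    return (uint32_t)ratio * REFO_HZ;
}

/* Same target in kHz, rounded down, as the fsystem argument expects. */
static uint32_t fll_target_khz(uint16_t ratio)
{
    return fll_target_hz(ratio) / 1000u;
}
```

For a ratio of 32 this gives 1048576 Hz and 1048 kHz, matching the values in the init code.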

  • Ben Hagan said:
    I'm using the MSP430Ware Driver Library. Here is my init code. 

    The only thing I can say about MSP430Ware, which I am not familiar with, is that the function calls in your code look neat. The WDT is disabled, so you will not notice init code lock-ups. As discussed with JMG, you had better be sure that reset really does not work. Try to light up an LED right after reset, before the init() call.
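
With the watchdog held, one way to make an init lock-up visible is to put an upper bound on every wait loop and report a failure instead of spinning forever. The helper below is a generic sketch, not code from the driver library; its name and parameters are hypothetical. On the target, reg could for example point at a fault flag register such as SFRIFG1 with a mask like OFIFG, which is an assumption about the device family.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Poll *reg until all bits in mask read as zero, giving up after
 * max_polls attempts. Returns true if the bits cleared in time.
 */
static bool wait_bits_clear(volatile const uint16_t *reg, uint16_t mask,
                            uint32_t max_polls)
{
    assert(reg != NULL);
    while (max_polls-- > 0u) {
        if ((*reg & mask) == 0u) {
            return true;
        }
    }
    return false;
}
```

A false return can then be signalled on a spare pin, so a hang during clock setup no longer looks like a reset that did not happen.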

    I once had to investigate a similar problem where "reset is unreliable" and "sometimes nothing works". Actually the uC did reset and then executed code, only it did so in an unexpected way, without any indication to the outside world. Well, yes, it was an init problem and bad program flow design too.

  • Ben Hagan said:
    Any idea why I wouldn't see the same behavior on all units, and why it is always the same units (2 of 4 have this issue)?

    It could be a race condition. All MSPs have individual variations in timing: at startup, in the trigger level of the input gates, everywhere. Usually it doesn't make a difference, but sometimes it does, even in places where you wouldn't expect it. Maybe a timer interrupt happens at an unlucky moment in one unit, while the slightly different timing on another unit lets the interrupt happen where it doesn't hurt.

    Many years ago, I had a batch of 200 two-layer PCBs for the C64 module port. One of them caused the internal RTC of the C64 to stop. There were no noticeable shorts on the PCB, and it happened while the board was still unpopulated (just the FR4 and the traces). Also, there was no physical connection between the RTC-related part of the board (actually an AC-derived 100/120 Hz signal into a counter) and the module port. Also, all other operations of the C64 (including all other functions of the chip that was counting the AC signal for RTC usage) were totally unaffected. I never figured out what was going on. I still have this PCB somewhere (marked with a big red "X").

    Another similar case: I once bought an external RTC with a buffered supply for the C64. When I switched the C64 on, it stopped counting. Reproducibly. Only on this C64.
    Here I figured out the reason: to keep the C64 from sucking the buffered supply dry, they used a diode between the C64 supply voltage and the RTC supply. When the C64 was on, the higher supply voltage switched the RTC into active mode (I/O on). However, due to the voltage drop across the diode, the supply voltage was a tad below the apparently required operating voltage with active I/O. This specific C64 had a marginally lower +5 V level than my others. So the specific variation of the 5 V regulator in the C64, the specific voltage drop of the diode, and the specific internal thresholds of the RTC chip made it fail. I suggested that the manufacturer use a Schottky diode (giving another 0.5 V of operating voltage). They told me that, once I had analyzed the conditions, they were able to reproduce the problem with every device, and they thanked me for the suggested solution. But apparently none of their customers (except me) had had such an unlucky combination of parameters so far (or those who had simply dumped the device, or got it replaced in the shop and it worked for whoever got it next).

    The point is: the absence of an apparent failure is no proof that everything is okay. Sometimes it is just coincidence (or luck) that it works. And if you produce high-enough numbers of devices, you'll sooner or later get bad ones. Most companies just replace these parts together with those which have been badly manufactured and don't care for (or fix) flaws in the design. Unless the error count is high enough to justify the effort.
    Often enough, it isn't even detected whether it was just bad soldering or maybe a design problem.

    Ben Hagan said:
    where are the ISR's located after startup?

    Unless you explicitly told the linker (by compiler pragma) to put an ISR (or any function) in RAM, it will be in ROM and will stay there and execute from ROM. Only if you explicitly say so will the function code be copied from ROM to RAM at startup, with the interrupt vectors then pointing into RAM.

    However, it might be that your problem is somewhere else. Maybe the voltage supply (the regulator) is weak, and during flashing of the firmware the supply was so unstable that the flash wasn't programmed properly and cells start to change content after some time (and not after >10 years), even though the immediate verification showed no errors.
    Exposure to radiation may change the content of RAM cells. Unlikely, but perhaps someone uses your devices near a reactor or some radioactive lab samples :) (It has even happened that the chip material itself was emitting radiation. Most elements, including carbon, have a very small fraction of radioactive isotopes, but usually not enough to explain reproducible failure.)
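
The concern about flash cells changing content over time can be tested by computing a checksum over the firmware image and comparing it with a value stored at build time. Below is a software sketch using CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF). The function names are illustrative, and how the image range and the expected value are obtained on the target depends on the linker setup, which is not assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE over a byte buffer. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    assert(data != NULL);
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Compare the CRC of an image against the value recorded at build time. */
static bool image_is_intact(const uint8_t *image, size_t len, uint16_t expected)
{
    return crc16_ccitt(image, len) == expected;
}
```

Reading the image back with the debugger on a good and a bad unit and comparing the two CRCs would show quickly whether the flash contents differ.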
