
Stack overflow with static variables

Hi guys,

I've got quite a strange problem with the stack. All my variables are defined as static and the stack size is far larger than what is really needed, yet IAR still shows a warning message when exiting debug mode. Of course, the program goes wrong after a few seconds of running.

While debugging I noticed a couple of strange things, which could help to identify the problem:

1/ There is a loop in my SW like for(iter=0;iter<100;iter++) { ... a code ...}

The value of the variable iter increases by one each cycle, but during the loop iter also briefly takes a value like 4635 ... it's like: 1 2 3 4   4635   5 6   4635   7 8 9 10 11 12 ... etc.

2/ I'm using SPI. The buffer is filled correctly, but after a while I start to receive corrupted data. It recovers after a while ... correct data are received ... and then again, bad data.

 

Has anybody seen something like this?

Thanks a lot

Alex.

 

HW and SW details:

MCU: MSP430F2410 (4 KB RAM)

MCU goes to LPM3

stack size: 200, heap: 50 (but there is no dynamic allocation, so maybe the heap could be zero?)

3 800 bytes of CODE  memory
   540 bytes of DATA  memory (+ 45 absolute )
     8 bytes of CONST memory

PS: excuse my language, I'm not a native speaker [;)]

  • Alex,

    yes the stack monitor plugin can be a powerful tool, but one should exactly understand how it works (see IAR User's Guide) so that the results can be interpreted correctly. Basically, it is filling the stack with a known value (0xCD) before any code is executed, and then checks how many of these bytes are left intact after you stop code execution.

    For your scenario, a couple of quick ideas:

    1. Are you using nested interrupts? This could be a problem if not managed carefully from a system perspective.
    2. Maybe your static variables get corrupted by some means other than the stack? For example, some pointer that is going crazy? Some array that is accessed out of its bounds? C is a very powerful language but also very dangerous.
    3. Set a breakpoint at the reset entry point before the C startup code runs (at "?cstart_begin"). Then, fill the entire RAM with an arbitrary value such as 0xAD. Run your program. After a while, stop it. Now inspect the memory view. You should be able to tell exactly which areas of memory got overwritten (the contents are not 0xAD anymore). Some will have been overwritten intentionally by your variable allocations. Others will have been overwritten by stack activity. Can you see the stack growing outside its allocated range? (Top of RAM minus 200 in your case.) You should still see 0xCDs at the bottom of the stack as a result of the operation of IAR's stack plugin (="unused stack"). And for any typical application there should still be a nice amount of 0xADs left intact in the middle between where your variables end and where the stack is (="unused RAM").

    Regards,
    Andreas

  • You are using C, which will automatically use the stack the way it sees fit. You may (a) blindly accept whatever the C compiler decides to do with the stack, (b) try to twist its arm to do it the way you want, or (c) forget about C.

  • Hi there,

    Well, I see that I must describe my problem a little bit more. I've got a watch crystal as ACLK and a baud-rate crystal (approx. 1 MHz) as XT2; the DCO is not set. I use a couple of interrupts: SPI, UART RX, UART TX, PORT2, AD12, TimerA0. The main loop goes to LPM3 and is woken up by the UART RX ISR, the SPI ISR, and the TimerA0 ISR. The period of the TimerA0 ISR is 10 ms. Everything works fine except the SPI ISR. This interrupt completely destroys RAM data: it overwrites dozens of addresses and corrupts variables.

    It's a special kind of problem, nothing common. I've got 6 years of SW development for industrial devices, so there's no question of problems like stack overflow, bad pointers etc. Something just damages RAM data and I don't know what it is. Could it be something wrong with exiting LPM3, maybe? I don't really know. I implemented the control algorithm on a different MCU by Atmel and it works. My boss told me to leave MSPs and start using Atmel, but I don't like that [:D]

    If anybody has met something like this, please write to me. It could also be connected with the many interrupt routines, but I must say again it's really not a stack overflow issue. There is just a warning message in IAR Workbench saying "Stack overflow", but it's not, I'm 100% sure. If I disable the SPI interrupt, everything works fine. The SPI ISR is very short: just read the buffer and exit LPM3.

    Any help is welcome. I'm really confused by the behaviour of the MCU.

    Thank you.

    Alex.

  • It seems that you have plenty of unused RAM. You could easily allocate more RAM to the stack and see if that gets rid of the warning about stack usage.

    The list of known bugs in F2410 is twenty-some items long. None of them seems to be related to what you described. I think it is unlikely that you found a new bug in F2410.

    The way the MSP430 enters/exits LPMs, and some of the other IAR "intrinsics", are quite different from other microcontrollers. The IAR debugger is also not very robust in handling LPMs and peripherals in real time.

    If you like, you could post your source code in your file area here and I will take a look at it.

  • Yes, you're right. I've got a lot of RAM for the stack. I dedicated 3k to the stack but it has no influence on my problem. Of course, I'm not allowed to give you the whole code, but I can give you the essentials of it.

     

    #include <stdlib.h> // only the standard library

    static unsigned long RTC = 0; // plus a couple of other static variables - u_int, u_char, u_long

    volatile unsigned char var2; // some of the variables need to be volatile

    main()

    {

      WDTCTL = WDTPW + WDTHOLD;                     // Stop WDT for now
     
      DCOCTL |= DCO0 + DCO1;
      BCSCTL1 = 0x00;
      BCSCTL2 |= SELM_2 + SELS;                     // XTL2 as a MCLK, XTL2 as a SMCLK 

      BCSCTL3 |= XCAP_2 + XT2S_1;                // 10pf for XTL1, XTL2 1-3MHZ
      //the diagnostic of OSC
      do
      {
      IFG1 &= ~OFIFG;                            // Clear OSCFault flag
      for (int i = 0xFFFF; i > 0; i--);          // Time for flag to set
      }
      while ((IFG1 & OFIFG));                 // OSCFault flag still set?

    /* RTC setting */                              //TimerA0 is dedicated for simple SW RTC
      // 1 tick is to be  ~ 10ms 
      CCTL0 = CCIE;                              // CCR0 interrupt enabled
      CCR0 = 328 - 1;                           // 1 s = 32768 - 1
      TACTL = TASSEL_1 + MC_1;                 // ACLK + up-mode

    //USART0 - UART mode
    //Setting for 1.843MHz Xtal on XTL2
      UCA0CTL0 = 0x00;                            // 8-bit character
      UCA0CTL1 |= UCSSEL1;
      UCA0BR0 = 0x10;                             // 1.843200MHz  - 0xC0=9600, 0x20=57600, 0x10=115200
      UCA0BR1 = 0x00;                             //
      UCA0MCTL = 0x00;                            // modulation
      UCA0CTL1 &= ~UCSWRST;                       // Initialize USART state machine
      IE2 |= UCA0RXIE;                            // Enable USART0 RX interrupt
     
    //USART1 - SPI mode
      UCB1CTL0 |= UCCKPL + UCCKPH + UCMSB;              
      trash = UCB1RXBUF;
      UCB1CTL1 &= ~UCSWRST;                         // Initialize USART state machine
      IFG2 &= ~ 0xF0;
      UC1IE |= UCB1RXIE;                           // Enable USCI_B1 RX interrupt

    while(1)

        {

        // nothing else needs to be here; it still goes wrong

      // LPM3, GIE etc. ...

        }

    }

     

    // Timer_A0 Interrupt Vector handler
    #pragma vector=TIMERA0_VECTOR
    __interrupt void Timer_A(void)
    {   
      __bic_SR_register_on_exit(GIE);  //important for RTC

      RTC++;                                    

      P6OUT ^= 0x08;

       __bic_SR_register_on_exit(LPM3_bits);
       __bis_SR_register_on_exit(GIE);
    }

     

    // USART0 UART Mode Interrupt Vector handler
    #pragma vector=USCIAB0RX_VECTOR
    __interrupt void usart0_rx (void)
    {   
      RTS_HIGH; 
     Buffer[xy] = UCA0RXBUF;                          // RXBUF0
      RTS_LOW;
     LPM3_EXIT;
    }

    // USCI A0/B0 Transmit ISR
    #pragma vector=USCIAB0TX_VECTOR
    __interrupt void USCI0TX_ISR(void)
    {
      UCA0TXBUF = Buffer[iter++];                 // TX next character
     
      if (iter >= Tx_Length)                  // TX over?
      {
        IE2 &= ~UCA0TXIE;                       // Disable USCI_A0 TX interrupt
        iter=0;
        Tx_Length=0;
      }
    }

    // SPI Interrupt Vector handler
    #pragma vector=USCIAB1RX_VECTOR
    __interrupt void SPI1_rx (void)
    {
        buf_spi[WR_point] = UCB1RXBUF;
        SPIbuf_WR_point++;
        if (WR_point >= SPI_buffer_size) WR_point = 0;
        LPM3_EXIT;
    }

    // Port 2 interrupt service routine
    #pragma vector=PORT2_VECTOR
    __interrupt void Port_2(void)
    {

      var++;
      P2IFG &= ~0x04;                           // P2.2 IFG cleared
    }

    #pragma vector=ADC12_VECTOR
    __interrupt void ADC12ISR (void)
    {
        temp = ADC12MEM0;                       // Move results, IFG is cleared
        _BIC_SR_IRQ(CPUOFF);                    // Clear CPUOFF bit from 0(SR)
        asm("NOP");
    }

     

    The code is like the above. Of course, there is some more code and libraries for a special kind of algorithm, but even if I disable this functionality the MCU goes down or gets lost. I've also made my code as simple as possible to find out more. But if I use the SPI ISR, UART ISR and TIMERA ISR together, it's bad.

    Thanks for your ideas. I really don't have any clue ...

    Alex.

  • Alex,

    thanks for trying to boil this down, I know it's not always easy. Glancing over your code, a few quick comments:

    1. In such "strange" debug cases it's always good to make sure your device doesn't RESET. Set a breakpoint at the beginning of the code, for example at the line that disables the Watchdog timer. Run your code. Does the breakpoint get hit when it shouldn't? If so then that's your problem that needs to get investigated.
    2. Another good debug rule is to comment out the section that configures the clock system. Basically, leave all clock registers in their default configuration. There is always a chance that you accidentally "overclock" the device by violating the "Max. allowed system frequency vs. voltage" specification in the datasheet.
    3. I saw you enable nested interrupts inside Timer_A(), probably unintentionally. On the MSP430, when you enter an ISR, GIE gets cleared automatically (no nesting). However you explicitly disable interrupts (that's redundant), but then just before you leave the ISR you enable them. If now an interrupt is pending it will get nested. You should remove both lines that clear and set GIE.

    Regards,
    Andreas

  • You said: "...I dedicated 3k for stack but it has no influence on my problem..."

    Yes, that will not solve your problem. But IAR will fill those 3k with 0xCD before it calls your main(). This might help you to see which parts of RAM are changed. And you should not get the stack warning message anymore.

    I think the real problem is your improper use of the IAR intrinsics: _BIC_SR_IRQ(..); __bis_SR_register_on_exit(...); __bic_SR_register_on_exit(...); LPM3; LPM3_EXIT; etc

    I suggest that you stay away from any LPM and make your code run correctly first. Do not use any of these intrinsics for the time being. The only exception is, after all peripherals are set up and when your code is ready to handle interrupt, do a _BIS_SR(GIE);// enable GIE

    BTW,

    (a) IAR may take the liberty to totally ignore the line:

    for (int i = 0xFFFF; i > 0; i--);          // Time for flag to set

    To avoid this, you could declare int i to be volatile

    (b) If I were you, I would move the line:

    BCSCTL2 |= SELM_2 + SELS;                     // XTL2 as a MCLK, XTL2 as a SMCLK 

    after the do ... while loop.

     

  • Thanks Andreas,

    ad1/ I forgot to mention it. Of course I tried to find out whether it goes through RESET. I've got an LED indicating RESET and it doesn't go through RESET. It starts to behave crazily, but without any reset.

    ad2/ I've got just 1.8 MHz and a watch crystal. I know exactly what you mean, but I'm in a very safe area of the characteristic you are probably thinking of. I tried leaving out the clock settings; the clocks look good when I route them out of the MCU and watch them on my oscilloscope.

    ad3/ OK. I just didn't know how it's handled inside the MCU, hence the GIE control. It was just one of my attempts to fix the problem, but nothing changed.

    Alex.

    PS: I'll send some more details later. It looks like hazard (glitch) signals on some port when enabling SPI ...

  • // SPI Interrupt Vector handler
    #pragma vector=USCIAB1RX_VECTOR
    __interrupt void SPI1_rx (void)
    {
        buf_spi[WR_point] = UCB1RXBUF;
        SPIbuf_WR_point++;
        if (WR_point >= SPI_buffer_size) WR_point = 0;
        LPM3_EXIT;
    }

    This function will overwrite your SPI buffer.  Move the test:

     if (WR_point >= SPI_buffer_size) WR_point = 0;

    above the buffer assignment.

    This code is difficult to read (may be due to typos) as the indexing variables are not visible.

    One typo appears to be:

        SPIbuf_WR_point++;

    as you appear to use WR_point as the buffer index.

     

     

     

     

  • ...I've got 6 years of SW developing for industrial devices, so there's no question about problems like stack overflow, bad pointer etc...

    See Alex's reply of 10-16-2009 5:22 PM

  • Hey guys,

    I'm sorry that I haven't responded, but I was out of the office.

    Thanks, old_cow_yellow, for your notes. First I must say that I don't feel like a great programmer; I feel more like a greenhorn, especially now. :-)

    The code I've provided is not really the code I load into the MCU. It just shows how the SW works. I'm not allowed to put the code on the internet as it's for commercial purposes. I hope you understand my position, and I'm a little bit sorry about this.

    - The loop you noticed basically does nothing (a nop also works fine), but you're right, there is a mistake: it should be volatile. That's much safer; still, the compiler didn't remove it.

    - I also tried switching off the LPM modes. Still the same.

    - I tried using the DCO instead of XT2, i.e. I commented out the line with BCSCTL2. Still the same.

    - The RAM area for the stack is clean; there are no bad writes into the stack area and no stack overflow. Something writes bytes at different places in RAM like 0x1210 etc. (upper RAM), but the stack is clean.

    I took pictures of some signals as I see them on my oscilloscope. The first one is port 6.6, which toggles with a period of 3 s. The second is SPI data. The third one is the timer interrupt (XOR of port 6.4). As you can see in the pictures, it seems to be an interrupt problem. When the SPI interrupt comes up (or also the port 2 interrupt; I tried to handle SPI with a SW driver), it rewrites RAM at 0x1210, where the variable "Loop" is. Loop generates the 3 s period based on RTC ... Because of the rewriting of RAM, the "period" condition becomes true and the port is XORed ... So it looks to me like a nested-interrupt problem. I know quite exactly what causes the problem but I don't know how to fix it. My code works on Atmel8 and also on ARM7. My question is how to work with multiple (nested) interrupts correctly. I don't have much experience with MSPs but I like the architecture of the MCU, so I'd like to use them in my future projects. Nevertheless, I've got a boss too :-) and the problem has taken too much time. Nobody at my office sees the problem in the SW. It looks like strange behaviour of the MCU with multiple interrupts.

    God, please, save me :-)

    Thank's guys.

    Alex.

  • Hi Matthew,

    yes, you're right there's a mistake. The variable name for buffer index should be WR_point everywhere. I wrote this example code too quickly but it's just an example of the code. There is just SPI HW buffer reading when debugging. No SW buffers in use. There is also reset instruction behind each sw buffer to avoid buffer overflow etc. This is really not a problem.

    Thank you anyway [;)]

    Alex.


  • Hey guys,

     

    I've got it [:D] The problem was the UART or, strictly speaking, the UART baud rate. When the UART is running at 115.2 kbaud, the F2410 (or maybe just my F2410) has a serious problem. I went down with the baud rate and everything is all right. The question is what to do when one needs a higher speed ...

    OK, so thank you all for your advice.

    Alex.

  • Hi Alex,

    thanks for closing the loop, but I would be very careful with the "fix" you found. I'm pretty certain it just disguises some underlying fundamental system issue that could re-surface at any time during your development work. No matter what you set the UART baud rate to, the HW module itself won't have any issue handling it as long as it is used properly from a configuration and system perspective. I would recommend analyzing your system more in-depth. In case not done already, doing some real-time probing could be helpful. For example, toggling pins upon entering or exiting ISRs, and monitoring them in real-time using a scope, together with your UART data.

    Regards,
    Andreas

  • Hi Andreas,

    for sure. You're right. I've connected my oscilloscopes to every single pin I use and it all seems to be clean. The problem is definitely connected with the ISRs, as I use many ports and peripherals. There were too many nested ISRs and therefore almost "no time" for the main loop. I've slowed things down and it helps. It would be great to know exactly what's going on but I don't have the time. The SW testing department says that the SW is OK. It was just an implementation problem in the MCU. If I find something I'll let others know. [;)]

    Thanks,Alex.

  • Hi Alex,

    I admire the optimism being put into the solution, but I also understand the constraints under which you are working. And I'd love to help you out but without looking at the physical system and source code it seems very difficult.

    But one additional small piece of advice. When doing testing, you should definitely try it out using several devices (from different lots), and over the full operating temperature and supply voltage range to cover your bases. Plus, a good production test should be in place. It's quite common that customers go to production with something, only to find out later that small shifts in device parameters (within the D/S specs) and/or unexpected operating conditions could lead to behavior interpreted as "device bug" - which in reality is just the uncovering of an underlying HW or SW design issue...

    Regards,
    Andreas

  • Well, I think I'll find the problem. I'm sure it's not a device bug. I've only got a breadboard so far, so there are a lot of problems connected with that. Tomorrow I'll be given a prototype on a PCB, so we will see. I know I've only scratched the surface of the problem, but I couldn't get any deeper. I couldn't do anything while there were just dozens of warnings and errors. Now it's "working" and I can solve the particular issues.

    It drove me crazy because this part of a project is very simple in comparison with FPGAs, ARMs and so on. Well, there's still something new to study [;)]

    Thank you.

    Regards,

    Alex.
