MSP430F5529: Stack pointer out of range

Bob Marley

Intellectual 310 points

Other Parts Discussed in Thread: MSP430F5529, MSP-FET, MSP430F5438

Hello,

I am using an MSP430F5529 and IAR Embedded Workbench.

My code is based on the TI example MSP430F552x_UCS_07.c, and my problem arises during debugging.

While single-stepping I get the following message:

"Warning CPU is off (Low Power Mode) and interrupts are disabled. Cannot execute step/go)."

Then the debug log window shows the following message:

"The stack pointer for stack 'Stack' (currently Memory:0xFFF6) is outside the stack range (Memory:0x4360 to Memory:0x4400) "

This happens somewhere in the loop that waits for the 32 kHz crystal to stabilize, or within __delay_cycles(782000).

As this error does not always occur on the same line (and most of the time no error is reported at all, and one of the loops just runs endlessly), I think whether and when the failure occurs is a matter of timing rather than of code. Once I even saw the program flow jump back to __delay_cycles(782000) without any warning or message, which can only be caused by a corrupted stack pointer.

Can anybody point me in the right direction as to where the problem could be?

Could this problem be caused, for example, by the PC putting a USB device (the FET) into power-saving mode?

I am asking because from time to time I get a connection failure message when I start debugging again.

Many thanks in advance!

Bob

over 14 years ago

Jim Noxon over 14 years ago

TI__Genius 14940 points

This is a fairly long delay period. Is there any possibility that the watchdog timer is active during this time and causes a PUC event? Are any ISRs running, with semaphores overflowing because they are not serviced in the baseline code during this delay?

Jim Noxon

Piotr Romaniuk over 14 years ago

Expert 2830 points

Is it possible that your code puts the MSP430 into a low-power mode? Do you change the SR register somewhere?

Did you examine your ISRs? Is anything executed during this delay? Are interrupts disabled during this test? If not, please check whether the error still appears when you disable them.

Can you determine which instruction (in assembler) is the last one executed before the error?

Does the error appear only when you are debugging step by step, or also when you set a breakpoint after the delay and execute __delay_cycles() in one go?

Regards,
Piotr Romaniuk, Ph.D.
ELESOFTROM

Bob Marley over 14 years ago in reply to Jim Noxon

Intellectual 310 points

Hello Jim Noxon,

thanks for your answer!

The first instruction in my main function switches off the WDT.

All interrupts that are used within the code are enabled only after the initialization of MCLK, SMCLK and ACLK.

So there shouldn't be any interrupts or any GIE at all. This is what my code does so far:

- WDT off

- Pin configuration

- XT1 activation + oscillator settings

- Increase Vcore

- FLL + CLK settings

- __delay_cycles(782000)

- A loop for stabilization of XT1 and XT2 (see below).

How can I approximate the right length of __delay_cycles(...)?
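A rough way to size the delay, sketched under one assumption: the worst-case settling rule quoted in the comments of TI's F5xx UCS examples, n x 32 x 32 x f_MCLK / f_FLL_reference MCLK cycles. With 25 MHz and a 32768 Hz reference that gives 781250, which is presumably where the 782000 in the example comes from. The helper name fll_settle_cycles is made up for illustration, and the rule should be checked against the UCS chapter of the family user's guide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: worst-case DCO settling time in MCLK cycles,
 * assuming the rule of thumb n * 32 * 32 * f_MCLK / f_FLL_reference
 * with n = 1. */
uint32_t fll_settle_cycles(uint32_t mclk_hz, uint32_t fll_ref_hz)
{
    assert(fll_ref_hz != 0u);
    /* 64-bit intermediate: 1024 * 25 MHz does not fit in 32 bits. */
    return (uint32_t)((32ull * 32ull * mclk_hz) / fll_ref_hz);
}
```

For MCLK = 16777216 Hz and a 32768 Hz reference the same rule gives 524288 cycles. Since __delay_cycles() takes a compile-time constant, the result would be computed once and written into the call rather than evaluated at run time.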

Bob

Bob Marley over 14 years ago in reply to Piotr Romaniuk

Intellectual 310 points

Hello Piotr Romaniuk,

my code doesn't change power modes. But within the clock initialization Vcore is of course changed, because MCLK is running at 16.7 MHz.

(XT1 = 32768 Hz --> FLL --> MCLK = 2^24 Hz = 16.7 MHz; XT1 --> FLL --> SMCLK = 2^16 Hz).
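The FLL multiplier behind these numbers can be checked with a few lines of C. This is only a sketch, assuming the UCS relation f_DCOCLKDIV = (N + 1) x f_FLLREFCLK / n from the F5xx family user's guide with n = 1 and a 10-bit FLLN field; fll_multiplier is a hypothetical helper, not a TI API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: FLL multiplier N for a target DCOCLKDIV frequency,
 * assuming f_DCOCLKDIV = (N + 1) * f_ref with reference divider n = 1.
 * Returns -1 if the target is not an exact multiple of the reference
 * or if N does not fit into 10 bits. */
int32_t fll_multiplier(uint32_t target_hz, uint32_t ref_hz)
{
    assert(ref_hz != 0u);
    if (target_hz < ref_hz || target_hz % ref_hz != 0u) {
        return -1;
    }
    uint32_t n = target_hz / ref_hz - 1u;
    return (n <= 1023u) ? (int32_t)n : -1;
}
```

For MCLK = 2^24 Hz from the 32768 Hz crystal this gives N = 511.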

I usually put a breakpoint right after __delay_cycles(), as I don't want to step through this piece of code, but stepping over it often works very well too.

Further, I put a breakpoint after the last loop within my clock initialization function. This is the loop I am writing about:

do
{
    UCSCTL7 &= ~(XT2OFFG + XT1LFOFFG + DCOFFG); // Clear XT2, XT1, DCO fault flags
    SFRIFG1 &= ~OFIFG;                          // Clear fault flags
} while (SFRIFG1 & OFIFG);                      // Test oscillator fault flag
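Because this loop spins forever if an oscillator never stabilizes, a bounded variant makes the failure visible instead of hanging. The following is a host-side sketch only: the two register variables and the bit masks are stand-ins for what the device header provides on the target, and clear_osc_faults and max_tries are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-ins. On the target these come from the device header
 * and must not be redefined. Bit positions are assumed from the F5xx
 * register description. */
volatile uint16_t UCSCTL7, SFRIFG1;
#define DCOFFG    0x0001u
#define XT1LFOFFG 0x0002u
#define XT2OFFG   0x0008u
#define OFIFG     0x0002u

/* Try to clear the oscillator fault flags at most max_tries times.
 * Returns true once OFIFG stays clear, false if the fault persists. */
bool clear_osc_faults(uint32_t max_tries)
{
    assert(max_tries > 0u);
    while (max_tries-- > 0u) {
        UCSCTL7 &= (uint16_t)~(XT2OFFG | XT1LFOFFG | DCOFFG);
        SFRIFG1 &= (uint16_t)~OFIFG;
        if ((SFRIFG1 & OFIFG) == 0u) {
            return true;
        }
    }
    return false;
}
```

On real hardware a false return would mean that some enabled oscillator is still faulty (or the DCO sits on an end tap), which is easier to diagnose than a loop that never exits.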

After the first breakpoint I usually debug the code step by step again.

-----------------------

First attempt today:

The last time I did this, the error didn't appear while stepping. I stepped through the last loop of my code (shown above) and, after going around the loop several times, I just let the code run. But the second breakpoint never stopped the code flow. After using the break icon, the disassembly window showed me that I am at 000004 (memory) 3FFF (execution), and no error was shown by the debugger.

But when I tried to debug just one further step, the following warning occurred: "Warning: CPU is OFF (Low Power Mode) and interrupts are disabled! Cannot execute Step/Go."

-----------------------

Several further attempts:

After the first breakpoint I stepped through the code again. After a varying number of loop iterations I could no longer see any green line in the disassembly window at "}while (SFRIFG1&OFIFG);".

In the disassembly window this piece of code is translated into two lines (see below). The error always occurs on the first line.

}while (SFRIFG1&OFIFG);

0049C0 (memory) B3A2 0102 (executions) 0049C4 (memory) 2FF8 (execution) <-- Here debugging step by step just went into nowhere.

I am sorry I can't translate this into assembler. But I hope it will help you.

}

0049C6 (memory) 0110 (execution)
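The jump words in this listing can be decoded by hand. In the MSP430 jump format the top three bits are 001, the next three select the condition, and the low ten bits are a signed word offset relative to the next instruction. A small sketch of that decoding follows; the names decode_jump and msp430_jump are made up for illustration. Read this way, B3A2 0102 is BIT.W #2,&0x0102 (the SFRIFG1 & OFIFG test), 2FF8 is the conditional jump back to the top of the loop, and 0110 should be RETA. The 3FFF seen at address 000004 decodes to a jump onto itself.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    const char *mnemonic;  /* condition mnemonic */
    uint32_t target;       /* branch target address */
} msp430_jump;

/* Decode an MSP430 jump-format word (top three bits 001).
 * pc is the address of the instruction word itself. */
msp430_jump decode_jump(uint16_t word, uint32_t pc)
{
    static const char *const names[8] = {
        "JNE", "JEQ", "JNC", "JC", "JN", "JGE", "JL", "JMP"
    };
    assert((word & 0xE000u) == 0x2000u);    /* must be a jump-format opcode */

    int32_t offset = (int32_t)(word & 0x03FFu);
    if (offset & 0x0200) {
        offset -= 0x0400;                   /* sign-extend the 10-bit offset */
    }

    msp430_jump j;
    j.mnemonic = names[(word >> 10) & 0x7u];
    j.target = (uint32_t)((int32_t)pc + 2 + 2 * offset);
    return j;
}
```

For the word 2FF8 at 0049C4 this yields JC with target 0049B6, ten bytes before the BIT instruction, which fits a loop body of two read-modify-write instructions. For 3FFF at 000004 it yields JMP with target 000004.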

------------------------

As the error sometimes occurs after several loop iterations and sometimes right at the beginning of the first, and as it probably always arises within the first disassembly line of the while statement, I suppose that there must be something wrong with the condition "SFRIFG1&OFIFG".

Yesterday I saw the debugger's code flow jump back to __delay_cycles(). I assume that this occurs because software (the IDE) or PCs sometimes do strange things after working the whole day. Maybe the saved return address on the stack crept upward over the course of the day.

So the question I have now is: what could be wrong with the condition "SFRIFG1&OFIFG"?

Many thanks in advance!

Bob

Bob Marley over 14 years ago in reply to Bob Marley

Intellectual 310 points

Okay, I think the error must have been because I did not change my old code correctly according to the latest core code version.

I rebuilt that part of my code from scratch. Now there is no longer a loop that never ends. The complete code can be debugged as expected.

But sometimes I get the message "Communication error. Please connect the device and press Retry to reconnect, or press Cancel to abort" when starting debugging + downloading.

I have a new PC. Is there a chance that Windows 7 powers down the MSP-FET430UIF if it is not used for a while?

Bob

Piotr Romaniuk over 14 years ago in reply to Bob Marley

Expert 2830 points

I think that this is not a case of the MSP-FET430UIF being powered down. If that had happened, the LEDs on the debug device would have switched off.

I am also using Windows 7, but with Code Composer Studio v4. Unfortunately, I have also sometimes experienced communication problems. I had to upgrade the firmware in the FET device; otherwise I was not able to use it with CCS v4 on Windows 7 (some driver issue).

In my opinion the problems are related to the drivers or the firmware in the MSP-FET. I think that there are still some bugs.

Regards,
Piotr Romaniuk, Ph.D.
ELESOFTROM

Jens-Michael Gross over 14 years ago in reply to Bob Marley

Guru 227245 points

Bob Marley said:
the disassembly window showed me that I am at 000004 (memory) 3FFF (execution)

This looks "promising". It looks like the CPU has jumped to 0x0004 for some reason. It usually happens if there is an interrupt and no ISR for it (the interrupt vector points to 0xffff, which lets the CPU jump to 0xfffe and then 0x0000 etc.) Since you say the interrupts are off (and GIE is clear), there is only one typical explanation: somehow you're triggering an NMI (which will jump to the NMI vector independently of GIE).
At this point in your code I suspect the OFIFG. Do you set OFIE somewhere? This will trigger an NMI rather than an IRQ when OFIFG is set, to handle oscillator faults with highest priority and unconditionally.

OFIFG is set when any of the active oscillators fails (e.g. XT2, if there is none and XT2OFF isn't set, or if there is no XT1 but OSCOFF is not set). On devices with FLL it will also be set if the DCO is operating on its lowest or highest tap, as this indicates that the FLL reached the end of its adjusting capabilities but probably didn't reach the desired output frequency. In this case, you're probably using the wrong RSEL setting (note that the DCO frequency spreads across devices and changes with temperature).
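Whether the FLL has run into an end tap can be read back from the DCO field. A minimal sketch, assuming the F5xx UCSCTL0 layout with the 5-bit DCO tap in bits 12..8; dco_tap and dco_range_hint are hypothetical helpers, and the range setting they refer to is the RSEL setting mentioned above (DCORSEL on this family, as far as I know).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed UCSCTL0 layout on F5xx devices: DCO tap in bits 12..8. */
unsigned dco_tap(uint16_t ucsctl0)
{
    return (ucsctl0 >> 8) & 0x1Fu;
}

/* Hypothetical helper: returns -1 if the FLL ran into the lowest tap
 * (the selected range is too fast, try a lower range), +1 if it ran
 * into the highest tap (range too slow, try a higher one), else 0. */
int dco_range_hint(unsigned tap)
{
    assert(tap <= 31u);
    if (tap == 0u) {
        return -1;
    }
    if (tap == 31u) {
        return +1;
    }
    return 0;
}
```

On the target this would be called as dco_range_hint(dco_tap(UCSCTL0)) after the FLL has had time to settle; a non-zero result corresponds to the DCO fault condition described above.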

One more hint: pre-setting the DCO to a value that is known to be near the end frequency (but below something that could exceed the allowed speed, based on the worst-case values from the datasheet) will significantly speed up the FLL adjustment process (e.g. setting DCOx to half of the maximum will halve the maximum time needed for adjustment).

You can also replace your long delay with a loop that checks the value of the DCO config (adjusted by the FLL) over a period of 3 adjustment cycles (~100µs on 32kHz) and tests whether there was a change of 2 or more, which indicates that the FLL is still adjusting into one direction. If the change is 1 or less, the FLL has most likely reached the target frequency.
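The settle check described here can be sketched as a pure function over a series of register readings. The assumptions are mine: that the FLL steps the combined 10-bit DCO+MOD field in UCSCTL0 (bits 12..3), and that one reading is taken per observation window of about three adjustment cycles. The names fll_position and fll_settled_at are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed UCSCTL0 layout: DCO tap in bits 12..8, MOD in bits 7..3,
 * treated here as one 10-bit value that the FLL steps up or down. */
unsigned fll_position(uint16_t ucsctl0)
{
    return (ucsctl0 >> 3) & 0x03FFu;
}

/* samples[i] is UCSCTL0 read once per observation window. Returns the
 * index of the first window in which the position moved by 1 or less,
 * or -1 if the FLL was still moving by 2 or more in every window. */
int fll_settled_at(const uint16_t *samples, size_t count)
{
    assert(samples != NULL);
    for (size_t i = 1; i < count; ++i) {
        unsigned prev = fll_position(samples[i - 1]);
        unsigned now = fll_position(samples[i]);
        unsigned delta = (now > prev) ? now - prev : prev - now;
        if (delta <= 1u) {
            return (int)i;
        }
    }
    return -1;
}
```

On the target the same comparison would sit inside a polling loop that reads UCSCTL0, waits one window, and reads it again, ideally with an upper bound on the number of windows so that it cannot spin forever.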

Piotr Romaniuk over 14 years ago in reply to Jens-Michael Gross

Expert 2830 points

Jens-Michael Gross said:
It looks like the CPU has jumped to 0x0004 for some reason. It usually happens if there is an interrupt and no ISR for it (the interrupt vector points to 0xffff, which lets the CPU jump to 0xfffe and then 0x0000 etc.)

Indeed, if the interrupt vector is not defined, it can be expected to contain the address 0xFFFF (I checked IAR Workbench and CCS v4). If the interrupt related to that vector occurs, execution moves to 0xFFFE (not 0xFFFF) because the PC must contain an aligned address (I checked this in the debugger on an MSP430F5438).
Further behavior depends on the reset vector, because its original meaning is an address and in this unusual case it is executed as an opcode.
In the example program that I compiled, the reset vector corresponds to a "neutral" instruction:

ADD.W @R12,0xffff(R14)

So its execution does not redirect the CPU elsewhere; it continues running further opcodes from 0x10000 (note that this type of MCU has a second block of flash memory there). If that block was not programmed, it contains 0xFFFF, which means:

AND.B @R15+,0xffff(R15)

Such instructions are executed as long as flash memory is present; after that (at least in the debugger) execution stops on a prefetch where there is no memory. I wonder if it can raise an NMI in this case [?]. In my debugger it does not continue from 0x0000 (additional information: a segment with peripherals starts at address 0x0000).

Of course, the reset vector may be different and translate into another opcode that makes a jump. This needs to be checked in the particular example.

I think that the jump to 0x0004 can occur because of some problem with the clock, stack corruption, a RET execution, or some explicit instruction that jumps to an address stored in a register. I am not sure, but although an NMI is not maskable, the source that generates this interrupt still has to be enabled, doesn't it?

Regards,
Piotr Romaniuk, Ph.D.
ELESOFTROM

Piotr Romaniuk over 14 years ago in reply to Bob Marley

Expert 2830 points

Hi Bob,

Would you be so kind as to describe the source of your error, if you have found it?
You wrote something about unexpected behavior after reset caused by interrupt usage; can you write something more about it?
Is it described in an errata?

I am asking because it is interesting, and it is better to know such pitfalls before we meet them.

Regards,
Piotr Romaniuk, Ph.D.
ELESOFTROM

Piotr Romaniuk over 14 years ago in reply to Piotr Romaniuk

Expert 2830 points

I found that access to vacant memory generates an SNMI when enabled (VMAIE=1). Reads from this gap result in the value 0x3FFF (the same that Bob reported), so we can be sure that there is no memory there and that execution was redirected to 0x0004 somehow. This code means JMP $, so it spins there.

Below is a part of the msp430f5xxx data sheet:

1.11.1 Vacant Memory Space

Vacant memory is non-existent memory space. Accesses to vacant memory space generate a system (non)maskable interrupt (SNMI) when enabled (VMAIE = 1). Reads from vacant memory results in the value 3FFFh. In the case of a fetch, this is taken as JMP $. Fetch accesses from vacant peripheral space result in a PUC. After the boot code is executed, it behaves like vacant memory space and also causes an NMI on access.

Of course, it does not explain how the address 0x0004 was reached.

Regards,
Piotr Romaniuk, Ph.D.
ELESOFTROM

Bob Marley over 14 years ago in reply to Jens-Michael Gross

Intellectual 310 points

"On devices with FLL it will also be set if the DCO is operating on its lowest or highest tap, as this indicates that the FLL reached the end of its adjusting capabilities but probably didn't reach the desired output frequency. In this case, you're probably using the wrong RSEL setting (note that the DCO frequency spreads across devices and changes with temperature)"

Yes, many thanks Jens-Michael Gross!! I think that was the main problem.

Thanks, too, for the further hints. I am sure they will help for some fine tuning!

Romaniuk, I don't think that the address 000004 was reached in just one step. I think the stack grew by one address step every time the "unexpected interrupt" occurred. So over the day many more different addresses were reached.

But I have to work through all of your suggestions and Jens-Michael Gross's hints today to give a complete answer.

Thanks to you, Romaniuk, too!

Bob

Piotr Romaniuk over 14 years ago in reply to Bob Marley

Expert 2830 points

Hi Bob,

together with Jens-Michael we discussed the 'unexpected interrupt' issue. Can you provide us with the content of the reset vector and the following memory at 0xFFFE - 0x10010 in the initial code that caused the problem?

Accumulation on the stack is not very probable, because when an interrupt is entered, interrupts are disabled and execution locks up in some vacant memory. Furthermore, after a reset the stack is re-initialized by the C startup code.

Regards,
Piotr Romaniuk, Ph.D.
ELESOFTROM

Bob Marley over 14 years ago in reply to Piotr Romaniuk

Intellectual 310 points

Hello Piotr Romaniuk,

at the moment I have been pulled off the project. I am sorry, but right now I can't say when I will be able to review my old code.

Thank you very much for your effort.

Regards,

Bob
