MSP430F5335 Crashing

Other Parts Discussed in Thread: MSP430F5335

Hello Team,

I am currently having a strange problem with my application on the auxiliary board MSP430F5335 crashing. I am running FreeRTOS v8.2.2 on an MSP430F5335 based board. I am using IAR v6.30.3. It is a relatively simple application which acts as an i2c slave device and provides analog outputs. A separate i2c master device controls the analog outputs.

I have been working on adding firmware update support where the master i2c device can update the firmware in the MSP430F5335 slave device. The first thing I did was implement the part where I send down the image. Right now, I am simply sending down the firmware in 128-byte chunks over i2c and getting back an 8-byte confirmation message for each chunk. The 128 bytes are stored in a RAM buffer. I actually use a 512-byte RAM buffer. Eventually, once I have 4 of these 128-byte chunks, I will program the 512-byte block into the flash memory in the MSP430. Right now though, I am only storing to the RAM buffer in my testing. The master does this repeatedly as fast as possible to send down a total of 0x10000 bytes in the firmware image.
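
For readers following along, the staging scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `fw_stage_t`, `fw_stage_chunk` and `fw_stage_reset` are hypothetical and do not come from the actual application, and the flash write itself is left out, as it is in the test setup described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FW_CHUNK_SIZE 128u  /* bytes received per i2c message (from the post) */
#define FW_BLOCK_SIZE 512u  /* RAM buffer size, i.e. 4 chunks (from the post) */

/* Hypothetical staging state; the names are illustrative only. */
typedef struct {
    uint8_t block[FW_BLOCK_SIZE];
    size_t  fill;               /* number of valid bytes in block[] */
} fw_stage_t;

/* Copy one 128-byte chunk into the staging buffer.
 * Returns true once the buffer holds a complete 512-byte block,
 * which is where the flash write would eventually be triggered. */
bool fw_stage_chunk(fw_stage_t *st, const uint8_t *chunk)
{
    assert(st != NULL && chunk != NULL);
    assert(st->fill + FW_CHUNK_SIZE <= FW_BLOCK_SIZE);

    memcpy(&st->block[st->fill], chunk, FW_CHUNK_SIZE);
    st->fill += FW_CHUNK_SIZE;
    return st->fill == FW_BLOCK_SIZE;
}

/* Mark the staged block as consumed so the next chunk starts at offset 0. */
void fw_stage_reset(fw_stage_t *st)
{
    assert(st != NULL);
    st->fill = 0;
}
```

In a design like this the copy would run in the message processing task rather than in the i2c ISR, so the ISR stays short.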

I am using the i2c interrupt service routine to trigger a message processing task by calling xSemaphoreGiveFromISR when a complete i2c message has been received. portYIELD_FROM_ISR is called at the end of the service routine. The MSP430 application is crashing at random times, sometimes on the first i2c message and sometimes after many have been processed. Using the State Storage window in IAR CSpy, I set advanced triggers to break on address bus instruction fetches outside of the actual program memory space. As best I can tell, the crash seems to occur when xQueueGenericReceive calls vTaskSuspendAll on line 1496 of queue.c (FreeRTOS code). It looks like the CALLA instruction jumps to 0x13736 instead of 0x13734 like it should. You can see the difference in the attached screen shots “Crash” and “Normal Operation”. From the State Storage Window you can see where the CALLA jumped to 0x13736 during the crash instead of 0x13734 as it did in the normal operation screen shot. When the jump is made to the wrong address, a “jne 0x1393A” is executed instead of the normal RETA instruction that should occur. 0x1393A is an invalid address which is outside of the actual code image. That is what triggered the breakpoint. I have not used the State Storage Window much in the past, so I don't have much experience interpreting the results. I could imagine that with the 2-stage pipeline it may be somewhat deceiving at times. I have attached a screen shot of CSpy with the information during a successful normal call and also when the crash occurs.

I have been searching the internet for possible solutions and have tried many things to solve the problem. Finally late last night I found something that seemed to fix the problem. I changed the optimization settings in IAR from "none" to "high, balanced". I set up the i2c master application to continually send the 0x10000 firmware image over and over and the MSP app ran all night with no problems. I have been running the code with optimizations set to "high, balanced" for the past week or so with no problems at all. I changed the optimization level back to none today just to confirm, and immediately started seeing the problem again.

I wonder if the problem has to do with the CALLA and RETA. I have seen several issues that occurred while creating the MSP430X port for FreeRTOS, but have not really seen any reports of issues with recent versions.

Any ideas on why setting the IAR optimization level to "high balanced" would have solved the crashing problem?
Any ideas on how to fix the issue with the optimization level set to none would be greatly appreciated. Debugging is much easier when set to none. In addition, I really need to understand exactly why this is happening to make sure my application will be reliable long term. Setting the optimization to high may only be masking the problem. It may still occur with extensive long term testing.

Normal Operation Screenshot

Crash Screenshot

Thank you and I appreciate the help!!

over 8 years ago

0 Clemens Ladisch over 8 years ago

Guru 243090 points

Sounds like erratum CPU40:

PC is corrupted when executing jump/conditional jump instruction that is followed by instruction with PC as destination register or a data section

Apparently, with optimization, IAR generates different code. Or it needs optimization enabled to generate the NOP workaround.

0 Ryan Brown1 over 8 years ago

TI Guru 202096 points

Javier,

Can you try re-posting the mentioned screen shots? I am not seeing them for some reason or another.

I agree with Clemens that the issue you are seeing could be related to CPU40, and that building with the "high, balanced" optimization setting applies the NOP workaround that the unoptimized build fails to implement. In this case manually inserting NOPs between the jump/conditional jump instruction and program code might help, but I'm not sure how feasible this is inside of FreeRTOS. Anders Lindgren is an author of the MSP430 IAR compiler and may be able to provide more support regarding the differences between the unoptimized and "high, balanced" builds.

Regards,
Ryan

0 JP over 8 years ago in reply to Ryan Brown1

TI Expert 3200 points

Sorry about that, I went ahead and added them to the bottom of the post.

-Javier

0 Anders Lindgren over 8 years ago

Expert 2070 points

Unfortunately, it's hard to give a straight answer to your question. But I can try to help you figure out what is going on. The fact that the optimization level affects the situation indicates that the problem is triggered by certain code patterns. Why this is, I don't know, but it can be anything from a certain code sequence that simply doesn't work (due to known or unknown CPU bugs), to interrupt timing (optimized code is faster), to some part of the code being corrupted by a write from another part of the program, etc.

One thing you can do to try to narrow down the problem is to build different files with different optimization levels. Hopefully, you will find a single file. The next step is to isolate the "offending" function. One way to do this is to split the file repeatedly until you have a single function in a single file that is affected by the optimization level.

When it comes to the trace in the state storage window, we have noticed that sometimes the address that is presented there doesn't correspond to the real address. In other words, the call might have gone to 0x13734 after all, so this might be a red herring.

When it comes to the CPU40 bug: It only affects JMP and Jcc instructions, and only if the memory right after the instruction contains a certain bit pattern. (The next word is prefetched and, apparently, the processor acts upon some things even if the code should not be executed.) The IAR compiler ensures that this case is handled in C code, but you might need to investigate any assembly files for offending code. (The problem is fixed by adding a NOP after jumps at the end of a code block, or if the next instruction modifies the PC.)

Can you factor out the RTOS in any way, and still trigger the problem? If you could, it would make the situation much easier to debug.

Another thing to investigate is CPU bugs related to I2C. For example, one of our customers has reported having trouble with USC137; however, I failed to find a reference to it when doing a search.

-- Anders Lindgren, IAR Systems

0 Clemens Ladisch over 8 years ago in reply to JP

Guru 243090 points

Javier, to confirm whether this is caused by the CPU40 erratum (and a missing workaround in the compiler), we need to see the machine code instruction directly after the offending CALLA, i.e., at EAC6 in your example.

0 Anders Lindgren over 8 years ago in reply to Clemens Ladisch

Expert 2070 points

Clemens Ladisch said:
Javier, to confirm whether this is caused by the CPU40 erratum (and a missing workaround in the compiler), we need to see the machine code instruction directly after the offending CALLA, i.e., at EAC6 in your example.

My understanding is that the instruction CALLA is not affected by the CPU40 bug. Only conditional and unconditional relative branches are.

Anyway, please post the relevant code section. It might provide vital clues, and it can allow us to disprove this theory.

-- Anders Lindgren

0 Clemens Ladisch over 8 years ago in reply to Anders Lindgren

Guru 243090 points

My understanding is that the instruction CALLA is not affected by the CPU40 bug. Only conditional and unconditional relative branches are.

CALLA does an absolute jump, but the CPU40 description just says "jump/conditional jump instruction". It might be a good idea to insert NOPs after all jumps followed by 0?40/0?50.

(The trace buffer also shows that CPU37 happened, but this is probably unrelated.)
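
As a rough aid for anyone wanting to check an image for this, a scan along the following lines could flag candidate locations. It is only a heuristic sketch: it assumes the "0?40/0?50" pattern quoted above (top nibble 0, low byte 0x40 or 0x50, second nibble don't-care), it only looks at relative jumps (opcodes 0x2000..0x3FFF), and it walks every 16-bit word without tracking instruction boundaries, so operand words that happen to look like jumps will produce false positives. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSP430 relative jumps (JMP and the conditional Jcc forms) have the
 * top three opcode bits equal to 001, i.e. opcodes 0x2000..0x3FFF. */
static int is_relative_jump(uint16_t word)
{
    return (word & 0xE000u) == 0x2000u;
}

/* The "0?40/0?50" pattern from the thread: top nibble 0,
 * low byte 0x40 or 0x50, second nibble ignored. */
static int matches_cpu40_pattern(uint16_t word)
{
    uint16_t masked = (uint16_t)(word & 0xF0FFu);
    return masked == 0x0040u || masked == 0x0050u;
}

/* Scan 'count' words and record the index of every jump word that is
 * directly followed by a word matching the pattern. At most 'max_hits'
 * indices are stored in 'hits'; the total number found is returned. */
size_t find_cpu40_candidates(const uint16_t *words, size_t count,
                             size_t *hits, size_t max_hits)
{
    size_t found = 0;
    assert(words != NULL || count == 0);

    for (size_t i = 0; i + 1 < count; i++) {
        if (is_relative_jump(words[i]) && matches_cpu40_pattern(words[i + 1])) {
            if (hits != NULL && found < max_hits) {
                hits[found] = i;
            }
            found++;
        }
    }
    return found;
}
```

Multiplying a reported index by 2 and adding the load address of the scanned region gives the address of the jump to inspect in the disassembly.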

0 Anders Lindgren over 8 years ago in reply to Clemens Ladisch

Expert 2070 points

Clemens Ladisch said:
CALLA does an absolute jump, but the CPU40 description just says "jump/conditional jump instruction". It might be a good idea to insert NOPs after all jumps followed by 0?40/0?50.

Fortunately, the IAR tool never generates the offending instruction (i.e. an instruction that manipulates the PC). In addition, the tools never generate a call following a fall through to an unknown module. Hence, from C, there is no need to insert a NOP after a CALLA, even if it were affected by the CPU40 bug.

Again, I don't think this problem is CPU40 related. As you pointed out, it is more likely that the CPU37 bug caused the trace output to display the wrong location, even if the execution occurred in the correct place.

-- Anders Lindgren, IAR Systems

0 Greg Dunn over 8 years ago in reply to Anders Lindgren

Intellectual 475 points

I want to thank everyone very much for all of your comments. When I had this problem a few weeks back, I was preparing to release my code for use in a sales presentation our company was doing, so I was forced to continue with my development using "high balanced" optimization. I now have some time to focus on this issue and hopefully get to the root of the problem. I should have backed up my project when I created the original post but I didn't think of it at the time. With my new code, I have now set the optimization back to "none". I am seeing a similar issue as before but with a slightly different crash scenario. I am reviewing the code with your comments in mind and will post additional information when I feel I have something useful. I mostly wanted to thank you for your comments and let you know that I am still working on the problem and will give you further feedback hopefully by the end of the day.

Thanks,
Greg Dunn

0 Greg Dunn over 8 years ago in reply to Anders Lindgren

Intellectual 475 points

Below, I will attempt to provide more screen shots which include the various parts of code that have been requested. This code is slightly different than in the previous post as I did continue development on some other areas of the application - these addresses will not match addresses in earlier screen shots.

This first screen shot shows current execution point when the break point occurred. The breakpoint was configured to trigger anytime code is executed outside of the actual code ranges (PC<=0x8053 or PC>=0x13434). The PC seems to indicate that the jne 0x136A0 in the State Storage Window was probably executed since the current PC is 0x136A0 as shown in the CPU Register View.

The next two screen shots show the disassembly window and code window at the various points along the execution path in the State Storage Window. This first screen shot is in the xQueueGenericReceive function in the FreeRTOS queue.c file. The "calla #vTaskSuspendAll" is where the problem seems to begin.

The next screen shot shows the code and disassembly window for the vTaskSuspendAll function. The calla to this function is where the first evidence of something going wrong seems to occur. The calla was supposed to jump to 0x133FE, but in the State Storage Window it looks like the calla jumped to 0x13400, which is 2 bytes past the intended jump address. This causes the instructions executed for vTaskSuspendAll to be incorrect. Address 0x13402 contains the jne 0x136A0 instruction which appears to have actually been executed, as shown in the first screenshot when the breakpoint occurred.

As was mentioned previously, I would agree the first occurrence of address 0xE5C2 (calla #vTaskSuspendAll) in the State Storage Window is probably due to CPU37. I would think that the nop at 0xE5BC was actually executed instead.

Just as a reminder, this crash seems to be fairly random. It may crash immediately when I start testing or it may take several minutes. I wonder if it could be related to interrupt timing. It definitely occurs quicker when I am sending the i2c messages as quick as possible with no delays. I assume that if an interrupt service routine was called during the code sequence shown in the State Storage Window, it would have actually shown up in the window. Is that correct?

I assume that CPU40 would not apply if a "jump/conditional jump instruction" executes and then an interrupt (which changes the PC) occurs. Surely this would not be a problem or we all would have seen it.

One other thought is that the board I am using has an MSP430F5335 that has a marginal problem. I will confirm that I see the same results on multiple boards tomorrow and update my results with a post.

Could USCI39 be the problem? This may be the i2c erratum that you were thinking of. The description says that unpredictable code execution can occur. I wonder what exactly "unpredictable code execution" means. Does this make sense at all with what is in the State Storage Window? I have tried updating the portENABLE_INTERRUPTS macro in FreeRTOS to apply the USCI39 workaround but I still saw the same type of crash. Here are the two alternate versions I tested:

#define portENABLE_INTERRUPTS()	    UCB1IE &= ~(UCNACKIE+UCSTPIE+UCSTTIE);  \
                                    _NOP();                                 \
                                    _EINT();                                \
                                    UCB1IE |= UCSTPIE+UCSTTIE;
    
#define portENABLE_INTERRUPTS()	    uint8_t saved_UCB1IE = UCB1IE;          \
                                    UCB1IE &= ~(UCNACKIE+UCSTPIE+UCSTTIE);  \
                                    _NOP();                                 \
                                    _EINT();                                \
                                    UCB1IE |= saved_UCB1IE;
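
A side note on macro hygiene, separate from the crash itself: both variants above expand to several bare statements, and the second one declares a local variable, so they can misbehave after an unbraced `if` or when used twice in one scope. A minimal sketch of a statement-safe form is shown below. It makes no claim about fixing the crash. The host-side stand-ins (the `UCB1IE` variable, the counters and the bit values) are assumptions added only so the sketch compiles off-target; on the real part these names come from the device header and the compiler intrinsics. Unlike the second variant above, it restores only the three masked bits.

```c
#include <stdint.h>

#if !defined(__ICC430__) && !defined(__MSP430__)
/* Host-side stand-ins (assumptions for illustration only). */
static volatile uint8_t UCB1IE;
static unsigned nop_count, eint_count;
#define _NOP()   ((void)nop_count++)
#define _EINT()  ((void)eint_count++)
#define UCSTTIE  0x04u
#define UCSTPIE  0x08u
#define UCNACKIE 0x20u
#endif

/* The three interrupt enables named in the USCI39 workaround. */
#define USCI39_MASK (UCNACKIE + UCSTPIE + UCSTTIE)

/* Same sequence as the second variant above, wrapped in do/while(0) so
 * it behaves as a single statement and keeps its local in its own scope.
 * Only the bits that were enabled beforehand are re-enabled. */
#define portENABLE_INTERRUPTS()                                 \
    do {                                                        \
        uint8_t saved_ie = (uint8_t)(UCB1IE & USCI39_MASK);     \
        UCB1IE &= (uint8_t)~USCI39_MASK;                        \
        _NOP();                                                 \
        _EINT();                                                \
        UCB1IE |= saved_ie;                                     \
    } while (0)
```

With this form, `if (x) portENABLE_INTERRUPTS(); else ...` compiles and behaves as expected, which the bare multi-statement macros do not.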

Again, thanks for all the help. I will continue to gather more information and see if I can somehow repeat the issue in a simplified application and/or isolate the offending code. Also, if anyone would be interested I am sure I could set up a remote session where you all could work with my computer remotely to see the issue first hand.

Thanks,

Greg Dunn

0 Anders Lindgren over 8 years ago in reply to Greg Dunn

Expert 2070 points

Unfortunately, I can't see the screenshots. Can you repost them?

-- Anders

0 Clemens Ladisch over 8 years ago in reply to Greg Dunn

Guru 243090 points

This is definitely not CPU40, because the following DINT is a 'safe' instruction.

This is likely to be USCI39, because all the conditions are there: I²C communication, and EINT being executed.
(Even if this particular problem were not USCI39, you should still protect against it.)

"Unpredictable code execution" can mean anything.

0 Greg Dunn over 8 years ago in reply to Clemens Ladisch

Intellectual 475 points

Here are the same three screen shots with the USCI39 workaround added. It seems to be the exact same crash. I am still assuming that CPU37 is causing the first instance of "0xE212 calla #vTaskSuspendAll" to show up in the State Storage Window. It should have shown the "0xE202 and.b #0xD3,&UCB1ICTL" instruction which I believe was executed. The "calla #vTaskSuspendAll" still seems to be jumping 2 bytes past the target address for some reason. It did take several minutes of continuous i2c communications before the crash happened.

0 Clemens Ladisch over 8 years ago in reply to Greg Dunn

Guru 243090 points

I'm just flailing around blindly here, but could you just insert a bunch of _NOP();s before and after the call to vTaskSuspendAll(), to see whether it makes any difference?

In the worst case, you could insert a _NOP() at the beginning of vTaskSuspendAll(), but I wouldn't be happy not knowing what causes this.

0 Greg Dunn over 8 years ago in reply to Clemens Ladisch

Intellectual 475 points

After several days of testing this problem on 4 separate boards, I have all but convinced myself that this may be a CPUXv2 bug on the MSP430F5335. Hopefully someone can point me in the direction of why this is occurring and how to solve it without the Band-Aids described below. I have narrowed the failure down to 2 different scenarios, both of which I have observed on all 4 boards. In the process of debugging, I have made many small changes in the code which cause the functions involved to be located at different memory addresses. The same 2 exact failure scenarios occurred, at the same 2 spots in the code, through all of the code versions.

I can’t explain why these two failure modes would occur and am very concerned at this point about the reliability of my entire application. Even if it runs with the fixes described below, as the development continues surely these types of problems would show up again. I really need some help to understand the root cause.

Here are the details on each of the two failure scenarios.

1) First Failure Mode

The “calla #vTaskSuspendAll” instruction jumps to an address which is 2 bytes past the target address. I am sure there are lots and lots of calla #address instructions throughout the application. I am not sure why this one would be so special. It always occurs at the exact same spot in the code in FreeRTOS where the xQueueGenericReceive function calls vTaskSuspendAll.

Here are the details of what occurs during this failure:

Code from disassembly window in IAR EW430 v6.30.3:

This is the State Storage Window after a breakpoint triggers on an instruction fetch outside of the actual code range. Note: I believe the first occurrence of address 0xE2BC calla #vTaskSuspendAll in the window is due to CPU37, the instruction at 0xE2AC is actually executed since the jump was not taken. You can see that the second occurrence of 0xE2AC calla #vTaskSuspendAll actually jumps to 0x134EC instead of 0x134EA as specified in the calla instruction. This causes invalid code to execute which includes the 0x134EE jne 0x1378C which is jumping to an invalid address and triggering the breakpoint.

Here is the CPU Register window and disassembly window right when the breakpoint triggers:

As a test recommended by Clemens Ladisch, I inserted a __no_operation() as the first statement in the vTaskSuspendAll() function. This would allow the calla to work correctly if the target address hits on the correct address or 2 bytes beyond. This seems to have fixed (“put an ugly Band-Aid on”) the problem. I have been running the code with this patch along with the one described below on item 2 for about 5 hours now with no failures. Previously, I would have seen the problem on at least one board after about 15 minutes.

Here is the disassembly window showing the fix that was added in the code that does not crash:

2) Second Failure Mode

The second failure mode that I have observed is when a single “popm.a #4,R11” instruction appears to be executed twice, leaving SP pointing to the wrong place for the reta instruction. When the running function exits with the reta, the program returns to a location outside of the actual code area, triggering my breakpoint. As with item 1 above, this failure also seems to occur at the same spot in the code each time, even with various small changes in the application which shift the addresses. This one occurs at the end of the xTaskResumeAll function when it tries to return to xQueueGenericReceive.

Here is a detailed description of this failure mode:

Code from disassembly window in IAR EW430 v6.30.3:

Code below is in xQueueGenericReceive where xTaskResumeAll is called

Code below is at the end of the xTaskResumeAll function where it should return from the above call.

Below is the State Storage Window, along with the CPU Registers and a memory dump of the stack area at the time the breakpoint triggered. You can see the double execution of the popm.a #4,R11 instruction in the state storage window. You can also see in the memory dump of the stack that the correct return address (0xE37C) was actually on the stack (highlighted in the memory view). At the time the reta was executed, SP was positioned 16 bytes (4*4) higher in memory due to the extra popm.a being executed. This caused the reta to go to 0x50004 instead of 0xE37C like it should have. Looking at the register view, SP is pointing to 0x31AA when the breakpoint occurred (after the reta). The previous 4 bytes on the stack would have been used for the reta: 05 00 a5 a5. I believe this would have given a return address of 0x50005 – with word alignment would become 0x50004 which is seen in the current PC. If you look at the 4 previous 4-byte values on the stack, you can see that they were popped into R8..11. R8 received the return address that should have been used for the reta.
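
The arithmetic in the paragraph above can be reproduced with a few lines of C. This is a small worked sketch, assuming the stack layout used in that description: the low 16 bits of the return address sit at SP, the upper 4 bits are in the low nibble of the word at SP+2, and memory is little-endian. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rebuild the 20-bit address that reta would load from the four bytes
 * at the top of the stack (little-endian, low word first). */
uint32_t reta_target(const uint8_t sp_bytes[4])
{
    assert(sp_bytes != NULL);

    uint16_t low  = (uint16_t)(sp_bytes[0] | (sp_bytes[1] << 8));
    uint16_t high = (uint16_t)(sp_bytes[2] | (sp_bytes[3] << 8));

    /* Only the low nibble of the second word contributes to the address. */
    uint32_t addr = ((uint32_t)(high & 0x000Fu) << 16) | low;

    /* The PC is word aligned, so bit 0 is dropped. */
    return addr & ~(uint32_t)1u;
}

/* Stack bytes consumed by popm.a #n: each register takes 4 bytes. */
unsigned popm_a_bytes(unsigned n)
{
    return n * 4u;
}
```

Feeding it the bytes 05 00 a5 a5 gives 0x50004, matching the PC seen at the breakpoint, and one extra popm.a #4 accounts for the 16-byte SP offset.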

Disassembly window at time of breakpoint:

I decided to try adding some extra NOP’s before the popm.a instruction as a test. This appears to have fixed (another big Band-Aid) the problem. This modification along with the extra NOP described in item 1 has been running on all 4 boards with no failures for the last 4-5 hours.

Below is the modified C code at the end of the xTaskResumeAll function along with the resulting disassembly showing the fix that was added in the code that has not crashed so far:

Thank you very much for any help that may be provided.

Greg Dunn

0 Clemens Ladisch over 8 years ago in reply to Greg Dunn

Guru 243090 points

In both cases, you have funny things happening after an EINT (combined with I²C events).

Try appending a call to a dummy function containing NOPs at the end of portENABLE_INTERRUPTS(). This is still a band-aid, but should catch all places where this error could happen.

0 Anders Lindgren over 8 years ago in reply to Greg Dunn

Expert 2070 points

Can you try the following:

* Add a NOP before the "bis.b #0xC, &UCB1ICTL" instruction?

The reason I ask is that TI has specified that a NOP should be inserted in 430Xv2 before every instruction that enables interrupts. However, the tools can only do this for the EINT instruction, not for control registers.

* Try to swap the order of the aforementioned instruction and the EINT instruction.

-- Anders Lindgren, IAR Systems

0 Greg Dunn over 8 years ago in reply to Anders Lindgren

Intellectual 475 points

Well, I have tried all of the suggestions. I first removed the extra nops that I had added in my above post for these tests. Here are the results:

1) Clemens's suggestion: "Try appending a call to a dummy function containing NOPs at the end of portENABLE_INTERRUPTS(). This is still a band-aid, but should catch all places where this error could happen."

Here are my code changes:

I updated the FreeRTOS port.c file to include the following dummy_nop function:

void dummy_nops( void ) {
    __no_operation(); // this would be the normal target address of the calla
    __no_operation(); // this address would be hit when the calla hits 2 bytes past the target address.
}

I updated the portENABLE_INTERRUPTS macro as follows:

#define portENABLE_INTERRUPTS() extern void dummy_nops( void );         \
                                    UCB1IE &= ~(UCNACKIE+UCSTPIE+UCSTTIE);  \
                                    _NOP();                                 \
                                    _EINT();                                \
                                    UCB1IE |= UCSTPIE+UCSTTIE;              \
                                    dummy_nops();

This version seemed to fix the first failure spot (calla #vTaskSuspendAll) described in my previous post. Calling into the double nop's would probably be ok if the target address hits 2 bytes past the intended address. The second failure spot described previously still had a problem. It seems like the PC went to 0x102D8 instead of 0x102D6 after the 0x102D0 bis.b #0xC,&UCB1ICTL instruction was executed. Further code execution was then in trouble...

The disassembly and State Storage Window are shown below when the breakpoint triggered outside of normal code space:

2) Anders's Suggestion #1: "Add a NOP before the "bis.b #0xC, &UCB1ICTL" instruction."

Here is the new version of portENABLE_INTERRUPTS()

#define portENABLE_INTERRUPTS()  UCB1IE &= ~(UCNACKIE+UCSTPIE+UCSTTIE); \
                                    _NOP();                                 \
                                    _EINT();                                \
                                    _NOP();                                 \
                                    UCB1IE |= UCSTPIE+UCSTTIE;

This version seemed to have the exact same problem as the first failure mode described in my previous post. I probably didn't test long enough to determine if the second failure mode would still occur or not.

Here is the crash point:

Here is the State Storage Window (As before, I assume the first occurrence of address 0xE2C2 in the state storage window is due to CPU37).

And the disassembly

3) Anders's Suggestion #2: "Add a NOP before the "bis.b #0xC, &UCB1ICTL" instruction. Try to swap the order of the aforementioned instruction and the EINT instruction"

I am not sure that I understand exactly what order you would like me to try for this one. According to the TI Errata sheet, the workaround for USCI39 (MSP430F5335) shows the bis.b after the EINT. There are also some discrepancies between their wording and the code. The words say to disable all three xIFG flags and then set the enable flags again. I assume the wording should have had xIE, not xIFG. The code example uses xIE throughout.

Workaround: Disable the UCSTTIFG, UCSTPIFG and UCNACKIFG before the GIE is set. After GIE is set, the local interrupt enable flags can be set again.

Assembly example:

bic #UCNACKIE+UCSTPIE+UCSTTIE, UCBxIE ; disable all self-clearing interrupts
NOP
EINT
bis #UCNACKIE+UCSTPIE+UCSTTIE, UCBxIE ; enable all self-clearing interrupts

Anyway, here are a couple more versions I tried:

Here is the portENABLE_INTERRUPTS macro used for this test:

#define portENABLE_INTERRUPTS()  UCB1IE &= ~(UCNACKIE+UCSTPIE+UCSTTIE); \
                                    _NOP();                                 \
                                    _EINT();                                \
                                    UCB1IE |= UCSTPIE+UCSTTIE;              \
                                    _NOP();

This version seemed to crash relatively quickly in a similar fashion as the second failure mode described in the previous post where the popm.a #6,R11 executed twice. In this case however, the popm.a instruction seemed to be skipped, still causing a problem with the reta of course.

Here is the detail during the crash:

Another version of portENABLE_INTERRUPTS

#define portENABLE_INTERRUPTS()  UCB1IE &= ~(UCNACKIE+UCSTPIE+UCSTTIE); \
                                    _NOP();                                 \
                                    _EINT();                                \
                                    _NOP();                                 \
                                    UCB1IE |= UCSTPIE+UCSTTIE;              \
                                    _NOP();

This version also crashed in the same spot:

Disassembly:

State Storage Window:

One thing I noticed in these last two state storage windows: the IRQ flag = 1 in the Control Signal bits when the reta instruction is executed. I don't recall seeing IRQ=1 in the previous traces. When IRQ=1, does this mean that the interrupt just became pending when the reta instruction was executed, or that an interrupt routine was fully executed at this point but not shown? How does the state storage window react when an interrupt runs? Do you see the instructions executed during the interrupt and then also after the reti? I would suspect just a simple stack problem in an interrupt routine, but adding the nop's in the spots described in my previous post did fix both of the problem spots. Three boards ran all weekend long with no failures. I really don't want to just run with these fixes though. This sure seems like some kind of strange CPU issue that needs to be explained and possibly corrected with a general purpose workaround in the compiler if possible.

I was also wondering if TI or someone else may have a more advanced JTAG emulator that would make more detail and a longer state storage window available. I know that these more advanced tools may not be available to the public, but it sure would be nice to be able to dig deeper into this issue and determine exactly what is going on. I would be more than happy to provide a board to assist in this troubleshooting. I can also arrange for a remote session on my computer if anyone is interested.

On a secondary note: I noticed that there is probably a bug in the IAR State Storage Window code. The opcodes for the bis.b instruction are not shown correctly in the state storage window. It looks like a printf format specifier just needs to be changed to %2.2X instead of %X to make sure 2 digits are shown for each byte in the opcode.

In disassembly window:

00E2BA D0F2 000C 063C bis.b #0xC,&UCB1ICTL

In state storage window:

0xE2BA F2D0C03C6 bis.b #0xC,&UCB1IE.UCNACKEI
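
The guess about the format specifier is easy to check on a PC. The sketch below prints the six opcode bytes of the bis.b instruction (F2 D0 0C 00 3C 06, i.e. the words D0F2 000C 063C in little-endian byte order) once with %X and once with %2.2X. The helper `format_opcode` is hypothetical and only illustrates the difference; how the actual tool formats its output is not known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Append each byte to 'out' as hex, either unpadded ("%X") or padded to
 * two digits ("%2.2X"). Returns the number of characters written. */
size_t format_opcode(char *out, size_t out_size,
                     const uint8_t *bytes, size_t n, int pad)
{
    size_t used = 0;
    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, out_size - used,
                         pad ? "%2.2X" : "%X", (unsigned)bytes[i]);
        if (w < 0 || (size_t)w >= out_size - used) {
            break;  /* stop on error or if the buffer is too small */
        }
        used += (size_t)w;
    }
    return used;
}
```

With %X the bytes 0C, 00 and 06 each lose their leading zero, which yields the 9-character string F2D0C03C6 seen in the state storage window; with %2.2X the result is the full 12-character F2D00C003C06.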

Thanks again for all your help,

Greg Dunn

0 Greg Dunn over 8 years ago in reply to Greg Dunn

Intellectual 475 points

Here is the latest information on my issue.

I think I have found a modification to the FreeRTOS portENABLE_INTERRUPTS macro that seems to correct the crashing problem, at least during an overnight test on 4 boards anyway. Here is the updated version of the macro:

#define portENABLE_INTERRUPTS()  UCB1IE &= ~(UCNACKIE+UCSTPIE+UCSTTIE); \
                                    _NOP();                                 \
                                    _EINT();                                \
                                    UCB1IE |= (UCNACKIE+UCSTPIE+UCSTTIE);   \
                                    _NOP();                                 \
                                    _NOP()

The first four instructions are taken directly from the workaround recommendation for USCI39 in the MSP430F5335 Errata Documentation. The remaining 4 nop's were required to keep the various crash modes described in the previous posts from occurring. If I use 0 to 3 nop's at the end of the macro then the crashes will still occur. I am not sure what kind of strange execution may be occurring during these nop's, but I would imagine that some of them may be skipped or executed twice at times based on what I saw in the previous crashes. Apparently, the issue corrects itself by the last nop and the code afterward functions properly, at least as far as I can determine at the moment. I am not sure if these problems were all related only to USCI39 or if they are triggered by some other type of scenario in combination. While investigating the problem, I have tried changing PMMCOREV, DCORSEL and several other settings to see if they make a difference, but I found the problem always stayed the same. I am running from XTL1 with a 32.768 kHz crystal and using the FLL to generate MCLK = 7.995392 MHz. I have monitored the single 3.3 volt supply which feeds DVCC1, DVCC2, DVCC3, AVCC1, and VBAT and no significant noise spikes are observable. Each DVCCx and AVCC1 pin has an individual 0.1uF ceramic cap located close to the respective pin. VBAK has a 4.7 nF cap. The pc board is 4 layers and uses internal ground and power planes. To this point, I have only tested at room temperature. I plan to test this new fix over the full operating temperature range. I suppose there still is a possibility that something else in my application is the real culprit, but I feel that it is a pretty remote chance given that 4 simple nop's correct the problem.

Based on all of these issues, I have a few more questions that I would like to have detailed answers on:

1) At this point, I have disabled all of my low power mode code. The final version does, however, require the use of low power modes. From other posts, which have only limited answers, I understand it may be necessary to add some special handling for USCI39 when you enter and exit LPM. I believe my application will not require i2c triggered wakeup from LPM, so I can probably just disable the i2c IE flags before entering LPM and enable them after wakeup. I would, however, like to have detailed steps for the proper procedure to follow to deal with USCI39 if it may affect operation.

2) Every time an interrupt service routine is exited with the RETI instruction, SR is popped from the stack which will normally always set the GIE bit. Is there any special handling required at this point to deal with USCI39? Other threads have mentioned this possible area as a problem but no definitive solutions have been provided.

3) What other areas should I focus on checking that could cause these types of problems to occur (calla #FUNCTION jumping 2 bytes past target and instructions being skipped or executed twice)?

Thanks,

Greg Dunn

Ryan Brown1 over 8 years ago in reply to Greg Dunn

Hi Greg,

I apologize for my absence from this thread but am glad that you've narrowed the issue down to the USCI39 errata. In typical applications once the GIE bit is set (commonly during initialization) it is not reset elsewhere except for entering an interrupt. There are time-critical applications that call for disabling interrupts, such as writing to flash, but this does not seem to apply to your application and in any case I would recommend disabling the IFGs themselves or simply re-doing the USCI39 workaround afterwards. I don't know too much about FreeRTOS operation so it would be beneficial to search the code and make sure that _DINT(), or disabling interrupt commands in general, is not used or required for your application. Regarding interrupts, you will have to disable the hardware-clear-able IFGs upon entering the ISR so that these bits are not set when the ISR exits, which would then set the GIE bit. This also requires the hardware-clear-able IFGs to be set manually (and the insertion of additional _NOP()s) after the return from each ISR. Also note the GIE rules provided in 1.3.4.1 of the User's Guide.

The hardware-clear-able IFGs can toggle freely without causing the errata so long as the USCI39 workaround, which involves setting the GIE bit first, has been carried out. I share your confusion, though, about the need for the extra _NOP() instructions afterwards. Can you please tell me which instructions follow the code segment you've provided? I notice that you're operating at an MCLK of 8 MHz, which is 8x faster than the default 1 MHz for which the errata workaround might have been tested, so a further delay might be warranted depending on the following instructions. I would have to ask our Systems Team for further investigation and clarification of the USCI39 workaround; this might simply be something that needs to be added to the description.

The abbreviated answers to your questions are as follows:
1) I agree with your workaround assessment and would advise re-setting the hardware-clear-able IFGs followed by four _NOP()s after waking from LPM, which occurs through an ISR that resets and sets the GIE bit.
2) I would advise incorporating your USCI39 workaround after each return from interrupt.
3) The types of problems you've experienced are a direct result of the "unpredictable code execution" defined by the USCI39 description so I can assure that other areas of your code will not replicate this issue so long as USCI39 has been resolved.
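The ordering discussed for question 1 (mask the i2c interrupt enables, sleep, then re-enable followed by four NOPs) can be sketched as below. This is a host-compilable illustration only: `UCB1IE`, the bit values, `_NOP()` and `ENTER_LPM()` are hypothetical stand-ins, and `sleep_without_i2c_wakeup` is an invented helper name. On the target, the sleep step would be something like `__bis_SR_register(LPM3_bits + GIE)`, with the wake-up ISR clearing the LPM bits on exit. It shows the sequence only and is not a verified fix for USCI39.

```c
#include <string.h>

/* Host-only stand-ins (assumptions for illustration); on the target use the
   device header and the compiler's low-power intrinsics instead. */
static unsigned UCB1IE = 0x2Cu;        /* starts with the three bits enabled */
static char lpm_trace[32];             /* records the order of operations */
#define UCSTTIE   0x04u                /* placeholder bit values */
#define UCSTPIE   0x08u
#define UCNACKIE  0x20u
#define I2C_SELF_CLEARING_IE (UCNACKIE + UCSTPIE + UCSTTIE)
#define _NOP()    strcat(lpm_trace, "N")
/* Logs "S" if the i2c bits are masked at sleep time, "!" if they are not. */
#define ENTER_LPM() \
    strcat(lpm_trace, (UCB1IE & I2C_SELF_CLEARING_IE) ? "!" : "S")

/* Hypothetical helper: sleep with the self-clearing i2c interrupts masked. */
const char *sleep_without_i2c_wakeup(void)
{
    lpm_trace[0] = '\0';
    UCB1IE &= ~I2C_SELF_CLEARING_IE;   /* mask before GIE is touched */
    ENTER_LPM();                       /* resumes here after a wake-up ISR */
    UCB1IE |= I2C_SELF_CLEARING_IE;    /* re-enable with GIE already set */
    _NOP(); _NOP(); _NOP(); _NOP();    /* four NOPs, as suggested above */
    return lpm_trace;
}
```

In this mock, a correct run records one sleep with the bits masked followed by four NOPs, and the three enable bits are restored afterwards.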

Regards,
Ryan

Ryan Brown1 over 8 years ago in reply to Ryan Brown1

Further investigation revealed that, from the CPU's perspective, there is no available workaround to avoid the USCI39 bug. Whatever the code might be, if the USCI self-clears the interrupt within a few cycles of the CPU enabling these interrupts, the bug can be observed. This includes GIE being set by the RETI instruction at the end of an ISR. To avoid this bug completely, the application must not use the USCI's self-clearing interrupts.

The good news is that most master operation modes are not affected by USCI39. UCSTTIFG and UCSTPIFG are generally used for slave mode operation, so in master mode there is no concern over those IFGs causing the errata. If UCNACKIFG is used, there is still a concern about it being set within a few cycles of exiting an ISR, but NACKs should be quite rare given that the slave devices should be returning ACKs instead.

Regards,
Ryan
