MSP430FR2355: bug in code with respect to volatile uint ??

Part Number: MSP430FR2355

I have two radios sending data back and forth every 1/2 second. Aperiodically, a bug stops the code running on the client side. Every time it stops, the debugger reports a large integer value for this flag:

volatile uint joinSwitchFlag = 0;

This uint is ONLY ever assigned 1 or 0, and it is assigned in only two places. The first is an ISR:

#pragma vector = TIMER3_B0_VECTOR
__interrupt void Debounce(void)
{
    TB3CCTL0 &= ~CCIE;
    if (joinSwitchFlag == 1)
        joinSwitchFlag = 0;
    else
        joinSwitchFlag = 1;
    flags.swlockout = F;
    P1IE |= BIT2;
    LPM3_EXIT;
}

The second is in the main code, where it gets assigned 0 after a joining process. The radios work as follows: they communicate once on a join channel, and after successful communication/acceptance they change their ID and move to a data channel. These IDs are currently hard-coded in the MSP and are loaded into the radio as needed to communicate on each channel. Once on the data channel, the radios should never move back to the join channel. The code must have a bug, because the client always seems to move over to the join channel at some point. My question is: how is it possible for this variable to be something other than 0 or 1? What am I missing, or is the debugger just screwing up?

Thanks

  • How is "uint" declared? Specifically, is it <=16 bits, i.e. atomic?

    What sequence is used to set joinSwitchFlag in main()?

  • typedef unsigned int uint;
    

    The code performs a bunch of initializations, then loads the radio registers and reads the current channel number from the radio. Client vs. host is determined by P2.1 (i.e. host vs. !host).

    The following code precedes the while (1) LPM3 loop:

        if(!(host))
            {
                strcpy(Send, r_enable_groupID);
                wakeRadio(T);
                radioMachine(strncat(Send, "3422", 4)); //hard-coded group ID
                memcpy(radioID, (const void *)&RXData[9], sizeof(radioID));
                P1OUT &= ~BIT0;
            }
        if(host)
            wakeRadio(F);
    
        /*
         * client : should not come up in JOIN mode
         * host : could come up in either JOIN
         * or DATA mode depending on radio ID read
         */
        if (!strncmp((const char *)radioID, default_Info, sizeof(default_Info)))
        {
            joinSwitchFlag = 1;
            flags.clientJoinFlag = T;
        }
        else
            joinSwitchFlag = 0;
    

    After LPM3 there is a conditional flag, swlockout, that is initially 0 and changes whenever there is a keystroke on P1.2. This code is there for BOTH host and client. My scenario: after power-up, I join the host to the client by pressing P1.2 to go to the join channel (this writes to the radio) and then pressing P1.2 on the client. A few handshaking messages, along with a data channel ID, are transferred to the host, which then writes this new channel to its radio, switches over and begins data transmissions. The client stays on the join channel UNTIL P1.2 is pressed again, at which point it switches its radio over to the data channel and data communication starts.

    This works for a while, but when it stops it always seems to be on the client end, and joinSwitchFlag is a large integer. I've set breakpoints, but I NEVER stop at them, and I'm not sure why.

    Here is code after LPM3:

            if (!flags.swlockout)
            {
                memset((char*)RXData, '\0', sizeof(RXData));
                flags.swlockout = T;  //a rising edge registered on P1.2 switch
                wakeRadio(T);
                radioMachine(r_get_groupID);
                strcpy(Send, r_enable_groupID);
    //            wakeRadio(T);
                if (joinSwitchFlag)
                {
                    P1OUT |= BIT0;
                    radioMachine(strncat(Send, default_Info, sizeof(default_Info)));
                }
                else //data
                {
                    P1OUT &= ~BIT0;
                    radioMachine(strncat(Send, permanent_Info, 4));
                }
                memcpy(radioID, (const void *)&RXData[9], sizeof(radioID));
                UCA1IFG &= ~UCRXIFG;   // clear only the RX interrupt flag
                if (host)
                    wakeRadio(F);
            }
    

    The timer ISR (above) is a 100 ms debounce for P1.2. Within the client section of the code (after LPM3) there is a condition, if (flags.joinSwitchFlag), that looks for messages with the correct ID. If it finds one, all is well and it acknowledges, etc. If not, that is, if the radio ID matches something other than the join channel, the code issues the following:

                            wakeRadio(T);
                            radioMachine(r_get_groupID);
                            wakeRadio(F);
                            memcpy(radioID, (const void *)&RXData[9], sizeof(radioID));
                            if (!memcmp( radioID, permanent_Info, sizeof(radioID)))
                                P1IFG |= BIT2;
                            else
                                flags.swlockout = F;
    

    Basically, the code gets here if a join is occurring on something OTHER than the join channel (i.e. a data channel). If the ID matches the MSP's data ID, then a bug brought us here, and we issue a 'soft' press of the P1.2 pushbutton (by setting P1IFG), which should flip us back to the data channel the node is on. If not, clearing swlockout lets the code above go to the join channel, because the ID is something else.

    I apologize for all the info. I thought it might help you understand what I really want to know at the moment, which is why joinSwitchFlag becomes this monster integer when nowhere is it ever set to anything other than 1 or 0. Hope you can get through the heavy read.

    Thanks

  • What value(s) do you see in joinSwitchFlag?

    Two things I would be doing about now:

    1) Paste those values into a calculator and display them in hex, to see if they look like ASCII or some other recognizable pattern.

    2) Check the .map file ("ordered by address" section) to see what variable(s) immediately precede that variable.

    These might indicate a buffer overflow, and only take a couple of minutes.
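    As an illustration of step 1, here is a minimal host-side C sketch. It is not MSP430 code, and describe_u16 and its helpers are hypothetical names. It assumes the value is a 16-bit uint, as unsigned int is on the MSP430. It also assumes the MSP430 stores the value little-endian, so the low byte is the one at the lower address.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* MSP430 stores a 16-bit variable little-endian: the low byte is at the
 * lower address, so it is the first byte you see in a memory dump. */
static uint8_t low_byte(uint16_t v)  { return (uint8_t)(v & 0xFFu); }
static uint8_t high_byte(uint16_t v) { return (uint8_t)(v >> 8); }

/* The byte as a printable ASCII character, or '.' if it is not printable. */
static char ascii_or_dot(uint8_t b)
{
    return isprint(b) ? (char)b : '.';
}

/* Print a suspicious value in decimal and hex, plus its two bytes as ASCII
 * in memory order (low byte first). */
static void describe_u16(uint16_t v)
{
    printf("%u = 0x%04X, bytes in memory: '%c' '%c'\n",
           (unsigned)v, (unsigned)v,
           ascii_or_dot(low_byte(v)), ascii_or_dot(high_byte(v)));
}
```

    For example, describe_u16(15191) prints 15191 = 0x3B57, bytes in memory: 'W' ';'. Likewise, describe_u16(27568) prints 27568 = 0x6BB0, bytes in memory: '.' 'k'. Bytes that decode as plausible text, or that match the radio's message format, suggest that something is writing string data past the end of a buffer.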

  • If you suspect the generated code is wrong, you should look at it. I use "msp430-elf-objdump -S elf-filename|less" for that all of the time.

    When you do that you will find that the notational advantages of using bitfields, if any, come at the cost of code efficiency. The compiler tends to shift and mask the field:

        4540:       5c 42 16 1c     mov.b   &0x1c16,r12     ;0x1c16
        4544:       12 c3           clrc                    
        4546:       0c 10           rrc     r12             ;
        4548:       7c f0 03 00     and.b   #3,     r12     ;
        454c:       5c 93           cmp.b   #1,     r12     ;r3 As==01
    

    I can also see that when it sets that 2 bit field it is altering 3 bits which is a bit of a mystery.

        4558:       5c 42 16 1c     mov.b   &0x1c16,r12     ;0x1c16
        455c:       7c f0 f9 ff     and.b   #-7,    r12     ;#0xfff9
        4560:       6c d3           bis.b   #2,     r12     ;r3 As==10
        4562:       c2 4c 16 1c     mov.b   r12,    &0x1c16 ;
    

  • After startup, running for a few minutes and then stopping the code shows 0, which is what is expected: the join occurs and we live on the data channel. When it stops working (the LED stops flashing), I stop the code and see numbers like 27568. Crazy large.

    I will look at your suggestions tomorrow evening. The ASCII patterns (receive data) are always cleared after I recognize that the message is valid, and the send array always seems to show the correct pattern.

    I am only sending data every 1/3 second, and it is a mere 8-9 bytes. I've extended the TX period to 1 second and still have the same issue.

    The live code does have:

                if (UCA1STATW & UCRXERR)
                    junk = UCA1RXBUF;
    

    This occurs right after the incomingMssg flag is set (which means we've received a stream ending in the character '\n'). After this, the CRC of the message is compared with the CRC that was sent, to validate it.

  • Where in the CCS GUI is the setting to see this? I would be interested. Yes, I've seen a 2-bit field that is ONLY ever assigned 0 or 1 somehow become 3. I've since moved this out of the bit-field and defined it as an unsigned int that is either 0 or 1, but lo and behold I am still seeing issues. I've been over and over the code and cannot seem to trap this oddity.

  • What's perhaps more significant than raw speed is that this is not even close to atomic. One can wave hands at something like "A |= 3;" since it will (probably) turn into a BIS instruction, but the read/modify/write window here is at least 10 clocks long.

  • I don't quite understand the meaning of 'atomic' (I'm a hardware guy), but it seems to me it's a simple on/off variable, that's it. I did declare it volatile as a hint to the compiler, because it is changed within an ISR.

  • You can generate a listing (.lst) file on each build using "Build Options->Build->Compiler->Advanced->Assembler->Generate Listing File". I suspect it doesn't use objdump, but the format is similar.

  • Did this. I'm not sure I know what I'm looking at, or for.

  • I was referring to the bitfield thing that David brought up (more applicable to the other thread). 

    In general terms "atomic" just means "can't be caught partly-done". In software, this usually means "can't be interrupted". If main is in the middle of a non-atomic update sequence and an ISR updates the item being updated then you get a mess.

    That's why I asked about "uint" -- if it were e.g. a 32-bit type then it requires two instructions and thus can be interrupted. A 16-bit constant assignment is typically atomic; "volatile" still helps since it forces a write to memory right then. (That's why I'm starting to suspect a buffer overflow.)

    I don't use bitfields voluntarily (I had a brief flirtation maybe 30 years ago), but I supposed that a constant assignment would turn into a BIS or something else atomic. David's observations change all that.
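    To make the 32-bit case concrete, here is a host-side simulation. It is not target code. It assumes a 16-bit CPU that stores a 32-bit variable as two 16-bit writes, low word first, and the variable and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side model of a 16-bit CPU: a 32-bit variable lives in two 16-bit
 * words, and the CPU stores one word per instruction. */
static volatile uint16_t var_lo, var_hi;

/* What an ISR would read if it ran at this instant. */
static uint32_t isr_read(void)
{
    return ((uint32_t)var_hi << 16) | var_lo;
}

/* Main-line store of a 32-bit value as two word writes (low word first is
 * an assumption; a compiler may use either order). If 'seen' is not NULL,
 * an "interrupt" is simulated between the two writes and records the value
 * the ISR would have observed. */
static void store32(uint32_t v, uint32_t *seen)
{
    var_lo = (uint16_t)(v & 0xFFFFu);
    if (seen != NULL)
        *seen = isr_read();          /* new low word, old high word */
    var_hi = (uint16_t)(v >> 16);
}
```

    Suppose main stores 0x00010000 and then 0x0000FFFF, and the simulated interrupt fires between the two writes. The ISR then sees 0x0001FFFF, a value main never wrote. On the target, the usual fix is to disable interrupts briefly around such multi-word updates. A 16-bit uint like joinSwitchFlag is written with a single instruction and cannot tear this way.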

  • Oops. I realized that I somehow missed the "-" in front of the 7 in that and.b instruction. That means the code is dealing with the 2-bit-wide field as it should: #-7 is 0xF9 as a byte, which clears only bits 1 and 2. I should have looked at the hex representation handily provided to the right.
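    Decoded into C, the two instruction sequences in the earlier listing amount to roughly the following sketch. It assumes, as the rrc / and.b #3 read implies, that the 2-bit field sits in bits 1 and 2 of the byte at 0x1c16. FIELD_SHIFT, field_get and field_set are hypothetical names, not anything the compiler emits.

```c
#include <stdint.h>

/* Hypothetical 2-bit field in bits 1..2 of a status byte (0x1c16 in the
 * listing). */
#define FIELD_SHIFT 1u
#define FIELD_MASK  (0x3u << FIELD_SHIFT)   /* 0x06: bits 1 and 2 */

/* Read: shift right one place, keep two bits (clrc/rrc, then and.b #3). */
static uint8_t field_get(uint8_t byte)
{
    return (uint8_t)((byte >> FIELD_SHIFT) & 0x3u);
}

/* Write: clear bits 1..2 with ~0x06 == 0xF9 (the "#-7" in and.b), then OR in
 * the shifted value (bis.b #2 when value is 1). Other bits are preserved. */
static uint8_t field_set(uint8_t byte, uint8_t value)
{
    return (uint8_t)((byte & (uint8_t)~FIELD_MASK) |
                     ((value & 0x3u) << FIELD_SHIFT));
}
```

    field_set(0xFF, 1) returns 0xFB, so only bits 1 and 2 change. The write is still a load, mask, OR and store sequence. If an ISR changes a different field in the same byte between the load and the store, that change is lost when the stale byte is written back.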

  • Steve Wenner said:
    My question is how is it possible for this variable to be something other than 0 or 1??

    It is possible to set a "Watchpoint with Data" that traps any write of a value other than zero or one, to catch a software bug that writes some other value. To do that:

    1. In the Breakpoints view right click and select Breakpoint (Code Composer Studio) -> Watchpoint With Data.

    2. Populate the dialogue as:

    3. The above sets a watchpoint for when the value 2 is written. Since we want to trap values other than zero or one being written, right click on the watchpoint in the Breakpoints view and select Breakpoint Properties. For Trigger 1 on the Memory Data Bus, change the Operator from "==" to ">=":

    This will stop the program if the software attempts to write a value of other than zero or one to the join_flag variable.

    Advanced Debugging Using the Enhanced Emulation Module (EEM) With Code Composer Studio Version 6 has more information on other EEM features.

  • This has become my new favorite friend. Thanks for showing me this. I haven't found the bug yet and have switched to another flag that is having similar issues. Running now; will keep you posted.

  • Chester....

    I did exactly as your drawing showed, with a flag that is either 1 or 0 (joinSwitchFlag). Although the debugger never stopped, the code did break. I could see this from a RED LED on the board that turns on when things stop working. I also triggered my logic analyzer and saw that although the transmitter is still sending packets, the receiver (client) has gone off into some other land. I then paused the debugger (remember, I had it set to stop at >= 2), and as you can see, the flag shows > 15000. What am I doing wrong here? It seems to me the debugger NEVER caught the flag going bad (see the included pics). So then I hit the play button on the debugger, and the RF comms start up again with flashing green LEDs (I did NOT reset anything, proved with the logic analyzer). I then paused again, and the flag now = 0. Very confusing.

    The top one shows the breakpoint.

    The bottom one shows joinSwitchFlag = 15191. (Attachment: 4762.Mem.docx)

  • I think I see some faults in my code with respect to your 'buffer overflow' comment. I use strcpy and memcpy, and memcpy prolifically (9 times). The strcpy calls look good, that is to say I am always copying a char string literal (always within ""). BUT with memcpy it looks like I have instances where the target is smaller than the source. I suspect this is a real problem?!

    Is this a problem with memcpy?

                    memcpy(radioID, (const void *)RXData, sizeof(radioID)); 
    

    radioID is a 4-element array.

    RXData is 32 elements deep.

    Notice that I do use the sizeof of the smaller array.

    I do it this way about 6 times in the code.

  • Steve Wenner said:
    Seems to me the debugger NEVER caught the flag going off

    Do you use DMA in your program?

    I think the suggested Watchpoint With Data setup will only trap writes from the CPU. There are other options to trap DMA accesses.

  • Steve Wenner said:
    notice that I do use the sizeof the small array

    That isn't a problem, as it won't write off the end of the destination array.

  • I do not use DMA....

  • That memcpy is safe. The usual mistake is to use the size of the source, not the destination, but you're doing the right thing here.

    My calculator says that 15191 = 0x3B57, which is ASCII ";W". Does that look familiar at all? Yes, this is a shot in the dark, but sometimes there's a clue here.

    Just interpolating from the source, I suspect the preceding variable is "pResult", which as a "uint *" is not an obvious candidate for an overrun. 

    Is the compiler giving you any Warnings?
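    Here is a sketch of the memcpy pattern, using the sizes given in this thread. RXData is 32 bytes, radioID is 4 bytes, and the ID is copied from RXData[9]. ID_OFFSET, load_radio_id and radio_id_is are hypothetical names. Copying sizeof(radioID) bytes protects the destination, and a compile-time check can also confirm that the source region is large enough.

    The same reasoning applies to the startup check strncmp(radioID, default_Info, sizeof(default_Info)). Suppose default_Info is declared as a 4-character string literal (an assumption, since its declaration isn't shown). Its sizeof is then 5, so strncmp may read one byte past the 4-byte radioID. Comparing sizeof(radioID) bytes with memcmp avoids that.

```c
#include <stdint.h>
#include <string.h>

/* Sizes from this thread: RXData is 32 bytes, radioID is 4 bytes, and the
 * ID is copied from RXData[9]. ID_OFFSET is a name made up for this sketch. */
#define ID_OFFSET 9u

static uint8_t RXData[32];
static uint8_t radioID[4];

/* Refuse to build if the source region is shorter than the copy (C11). */
_Static_assert(ID_OFFSET + sizeof radioID <= sizeof RXData,
               "RXData is too small for the radio ID copy");

/* Copy exactly sizeof(destination) bytes, so radioID can never overflow. */
static void load_radio_id(void)
{
    memcpy(radioID, &RXData[ID_OFFSET], sizeof radioID);
}

/* radioID is not NUL-terminated, so compare a fixed byte count with memcmp.
 * 'expected' must point to at least sizeof radioID bytes. */
static int radio_id_is(const char *expected)
{
    return memcmp(radioID, expected, sizeof radioID) == 0;
}
```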

  • Your buffer overflow comment kept making sense, and last night, while looking over memcpy and strcpy, I decided to look into my radio ISR. There I realized that my receive data buffer sometimes gets garbage in it and that I had no check for overflow; I only checked for a terminating character. Long story short, I put this check into the ISR and ran the code successfully all night. The buffer overflow was the issue!
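    The thread doesn't show the ISR or the exact check that was added. Here is a minimal sketch of a bounds-checked receive routine, assuming a 32-byte RXData, '\n'-terminated frames and the incomingMssg flag mentioned earlier. rx_byte, rx_index and rx_overflow are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_LEN 32u  /* RXData is 32 bytes deep in this thread */

static volatile uint8_t RXData[RX_LEN];
static volatile uint8_t rx_index;      /* hypothetical: next free slot        */
static volatile bool    incomingMssg;  /* set when a '\n' frame is complete   */
static volatile bool    rx_overflow;   /* hypothetical: skipping a long frame */

/* Body of the UART RX ISR, written as a plain function so it can be tested
 * off-target. On the MSP430 it would be fed the byte read from UCA1RXBUF. */
static void rx_byte(uint8_t c)
{
    if (incomingMssg)              /* main has not consumed the last frame */
        return;                    /* drop the byte rather than overwrite it */

    if (rx_overflow) {             /* discarding the rest of a long frame */
        if (c == '\n')
            rx_overflow = false;   /* resynchronise on the next frame */
        return;
    }

    if (rx_index >= RX_LEN) {      /* buffer full: never write past RXData */
        rx_index = 0;              /* throw the partial frame away */
        rx_overflow = (c != '\n'); /* skip to the next '\n' unless this is it */
        return;
    }

    RXData[rx_index++] = c;
    if (c == '\n') {               /* terminator: hand the frame to main */
        incomingMssg = true;
        rx_index = 0;
    }
}
```

    In this sketch, main would process RXData when incomingMssg is set, including the CRC check, and then clear incomingMssg to accept the next frame. Skipping an over-long frame up to its '\n' keeps the receiver in step with the sender. Otherwise the tail of a garbage frame could be treated as a new message.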

    Thanks to everyone on this forum for the help and insight...
