CCS/MSP430F6720: Can variable value change when being passed to a different variable?

Part Number: MSP430F6720

Tool/software: Code Composer Studio

Hi,

In the following test setup, two systems, A and B, are connected via UART/Xbee radio.

System A (MSP430 + UART/Xbee Radio)   --->  System B (PC + Xbee with USB adapter)

System A repeatedly packs a hard-coded variable (INT16), a message count, and a checksum into a message, then sends it to System B via UART.

void xbeeSendIntAT(char moduleIdentifier, char msg_type, char port, char sensor_type, int data) {
	unsigned char tempArray[2];
	tempArray[0] = (data >> 8) & 0xFF;
	tempArray[1] = data & 0xFF;
	xbeeSendmsgAT(moduleIdentifier, msg_type, port, sensor_type, tempArray, 2);
}

void xbeeSendmsgAT(char moduleIdentifier, char msg_type, char port, char sensor_type, unsigned char data[], char dataLen) {
	char i;
	unsigned char msg[48];
	msg[0] = 7 + dataLen; 		// message length, including length byte itself and checksum
	msg[1] = moduleIdentifier;
	msg[2] = msg_type;
	msg[3] = port;				
	msg[4] = sensor_type;
	for(i = 0; i < dataLen; i++) {
		msg[5 + i] = data[i];
	}
	msg[5 + dataLen] = xbee.frameID;
	xbee.frameID++;
	unsigned char checkSum = 0;
	// Calculate checkSum and attach it to the end of each message so the receiving end can check integrity and detect message truncation and concatenation.
	for(i = 0; i < 6 + dataLen; i++) {
		checkSum += msg[i];
	}
	msg[6 + dataLen] = 0xFF - checkSum;
	if(xbeePresent()) {
		while (P3IN & BIT1);					// Wait till xbee is ready to take input
		delay_ms(10);							// Wait 2 character's time: 1.7ms; RO = 3, 3 character time 2.5ms. Otherwise two adjacent messages will be combined into one RF package. 4ms is not enough
		for (i = 0; i < 7 + dataLen; i++) {
			xbeeSendByte(msg[i]);
		}
	}
}

On the receiving end, software on the PC unpacks the message and inserts it into a SQLite database.
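For reference, the receiver-side integrity check implied by xbeeSendmsgAT() can be sketched as follows. This is only a minimal sketch: the thread does not show the PC software or say what language it is written in, and the names frameIsValid and frameInt16 are hypothetical. It relies only on the frame layout and checksum rule visible in the sender code above.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame layout taken from xbeeSendmsgAT():
 *   [0] length (7 + dataLen)   [1] moduleIdentifier   [2] msg_type
 *   [3] port   [4] sensor_type   [5..] data   [5 + dataLen] frameID
 *   [6 + dataLen] checksum = 0xFF - (sum of all preceding bytes mod 256)
 */

/* Hypothetical helper: returns 1 if the frame is self-consistent, 0 otherwise. */
int frameIsValid(const uint8_t *msg, size_t received)
{
    if (received < 7 || msg[0] != received) {
        return 0;               /* truncated or concatenated frame */
    }
    uint8_t sum = 0;
    for (size_t i = 0; i < received; i++) {
        sum += msg[i];          /* wraps modulo 256, like the sender */
    }
    return sum == 0xFF;         /* payload sum plus checksum must equal 0xFF */
}

/* Hypothetical helper: rebuilds the 16-bit value that xbeeSendIntAT() split into two bytes. */
int16_t frameInt16(const uint8_t *msg)
{
    return (int16_t)(((uint16_t)msg[5] << 8) | msg[6]);
}
```

A single flipped bit in msg[6] changes the byte sum by 0x80, so a check of this kind rejects the frame unless another byte changes in a compensating way.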

Occasionally, the value of the hard-coded variable in System A appears different on System B when received. Specifically, BIT7 of the value is flipped from 1 to 0 or 0 to 1. For example, 0x0172 shows up in the database on B as 0x01F2, 0x0078 --> 0x00F8, 0x0190 --> 0x0110, 0x1388 --> 0x130B.

The message count and checksum of the message appear to be correct on the receiving end, so it is unlikely that the value changed during transmission over UART and the Xbee radio. That leaves only one possible step: when the value of the hard-coded variable is passed to tempArray before being packed into a message. But I find this hard to believe.

We have a few hundred System A units deployed in the field. So far only one particular unit has generated enough errors to catch our attention. In a controlled test setup, the error rate of this particular unit is about 1 in every 250 messages. On other System A units with the same PCB, the same test code, and a shared System B, the error rate is less than 1 in 40000: we detected only 1 error in two weeks.

This seems quite puzzling so far. Could anyone provide some pointers? Many thanks in advance.

  • More than likely something is clobbering the memory.
  • The RAM usage is only 44%, while Flash is indeed being pushed to the limit, > 99%.

    Could you explain this a little more? If the error is caused by EMI, I would assume other parts of the code will be affected as well, and the control that is sitting right next to the one with high error rate will be affected at the same rate.

    Thanks.

  • It is very unlikely for one bit, but I assume some other part of your program - probably an errant pointer - is overwriting this memory space. Do you have any *x |= 0b01000000 like statements anywhere?
  • There are temporal variables used. What would be the solution if the error was caused by such an operation?

    Also, BIT7 occasionally flips from 1 to 0, but we don't have operations like &= ~BIT7;

    char sensorType = 1;
    int sensorValue = 0x0172;
    xbeeSendIntAT(1, 0, 1, sensorType | BIT7, sensorValue);




  • I don't know what you mean by a "temporal" variable, but the solution is to find out what is writing to the wrong variable and stop it.
  • Hi,
    Since you think there is no problem with the TX function, can you post the code that generates sensorValue?
    What is the hard-coded variable? Is sensorValue directly assigned by you?
    Eason
  • In the test setup, sensorValue is assigned/hard-coded as in the following code:

    char sensorType = 1;
    int sensorValue = 0x0172;
    xbeeSendIntAT(1, 0, 1, sensorType | BIT7, sensorValue);

    Then it is passed to tempArray[2], transmitted out via UART, and checked on the receiving end.

  • Hi,
    So, the mistake also happens in the test setup? Is it related to any other phenomenon?
    Do you use an RTOS?
    Are other bytes of msg[48] changed? You could make it a global variable and use a function to check it before passing it to xbeeSendByte().
    Eason
  • We got the error in the test setup, and we don't know yet whether other parts of msg[48] are also changed. But I doubt the remaining bytes are changed, since the first 9 bytes are not. Still, I think it is a good idea to poke around and see what happens.

    In the latest test session, we replaced the Xbee radio on A with a new one with the same firmware/hardware. In the 12 hours since, we haven't seen any errors on the receiving end. So the tentative conclusion is that that particular Xbee radio is somehow compromised and radiates more than the allowed amount of EMI, which in turn changes the value of sensorValue when it is passed to tempArray[2]. We also have circumstantial evidence that sensorValue itself is never changed, such as video of the value displayed on the LCD. And it is very unlikely that one single bit is changed during transmission while the checksum is changed accordingly to match, without any other bytes changing. From the latest test with the new Xbee radio, we know for a fact that the same firmware running on the same MSP430 MCU runs mostly fine. So a firmware problem can also be ruled out.
  • Hi,
    If it is an EMI problem, it should not produce such a regular change.
    Are you sure that only msg[5~6] change and other bytes don't? How about msg[0~5]? The checksum you use is too simple; you could try CRC8.
    Eason
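To make the CRC8 suggestion concrete, here is a minimal bitwise sketch. The thread does not say which CRC-8 variant was meant; the polynomial 0x07 with a zero initial value is an assumption chosen for illustration, and the function name crc8 is hypothetical. Both ends of the link would have to agree on the same variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-8: polynomial x^8 + x^2 + x + 1 (0x07), initial value 0x00,
 * no bit reflection, no final XOR. The variant is an assumption, not
 * something specified in the thread. */
uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0x00;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80) {
                crc = (uint8_t)((crc << 1) ^ 0x07);
            } else {
                crc = (uint8_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

In xbeeSendmsgAT() this would replace the additive sum over msg[0] through msg[5 + dataLen], with the result stored in msg[6 + dataLen]. Unlike the additive checksum, a CRC is not cancelled by flipping the same bit position in a data byte and in the check byte.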
  • The error always happens in msg[6] BIT7. It flips either from 1 to 0 or from 0 to 1, depending on the assigned sensorValue when the error happens. The checksum formula is indeed simple, but it is very unlikely that the checksum in msg[8] is changed in a way that compensates for the change of msg[6] BIT7. Meanwhile, we see msg[7] increment as it should on the receiving end. On top of the checksum in the RF payload, the Xbee radio has its own message integrity check.
  • Hi,

    OK, let's return to the original point.

    Let us clarify your problem. Please check every item carefully.

    1. When you send a word to the PC, you sometimes find that the data sent and the data the PC receives are different.

    2. The only difference lies in bit 7 of the lower byte (0 to 1 or 1 to 0).

    3. However, you find that the checksum is right when checked by the PC.

    If so, we can reach a conclusion: the data is changed before it is sent into xbeeSendByte().

    There are two possible causes: hardware and software.

    hardware: EMI, MSP430 bug

    software: pointer

    Either way, the problem happens in the MSP430. Can you define a constant value for sensorValue that is saved in flash, which is more reliable, and compare against it after generating the checkSum? If that check catches the error, we will know that the data is changed in RAM before the checkSum is generated.

    The problem is strange, so please do experiments and make assumptions based on solid evidence. All your replies so far are only in words; please attach some test results.

    Eason
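The flash-constant comparison proposed here might look roughly like the sketch below. The names kExpectedValue and payloadCorruptionMask are hypothetical, and the placement of the constant in flash is an assumption about the toolchain (a const object normally goes into flash on MSP430, but that should be confirmed in the map file).

```c
#include <stdint.h>

/* Hypothetical reference value. Assumption: the toolchain places const
 * data in flash rather than RAM. */
static const int16_t kExpectedValue = 0x0172;

/* Hypothetical check, meant to run inside xbeeSendmsgAT() right after the
 * checksum has been computed. Returns 0 when msg[5..6] still encode the
 * expected value, otherwise the XOR mask of the bits that differ. */
uint16_t payloadCorruptionMask(const uint8_t *msg, int16_t expected)
{
    uint16_t packed = (uint16_t)(((uint16_t)msg[5] << 8) | msg[6]);
    return (uint16_t)(packed ^ (uint16_t)expected);
}
```

A nonzero result at that point would mean the buffer was already wrong in RAM before any byte reached the UART; a result that is always zero while the PC still records bad values would point further down the chain.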

  • Your summary is right. I will add to and edit it for the sake of completeness.

    1. Data are sent to the PC via an Xbee radio connected to the UART on the MSP430. The Xbee part is important. We didn't test a wired connection, and I don't think a hardwired connection would generate such errors.

    4. frameID, the message count received on the PC, is also correct.

    The following code was used in one of the tests; the resulting data received and the plotted images are attached.

    int readTemperature() {
    	sdReadCount++;      // declared in header to generate artificial swing
    	if(sdReadCount > 59) {
            sdReadCount = 0;
            return 400;
    	} else {
            return 370;
        }
    }
    
    int readCO2() {
        return 5000;
    }
    
    int readRPM() {
        return 120;
    }
    
    // Following code runs in the main cycle, once per second
    sensorvalue = readTemperature();
    xbeeSendIntAT(1, 0, 1, 1, sensorvalue);
    
    sensorvalue = readCO2();
    xbeeSendIntAT(1, 0, 1, 10, sensorvalue);
    
    sensorvalue = readRPM();
    xbeeSendIntAT(1, 0, 1, 4, sensorvalue);

    Please note that in the ~12 hours when a good radio was installed, no spikes were detected. It is a short period of time: not conclusive, but indicative enough. Once the bad radio was put back in, spikes started to show up, with the value changing from 370 to 498 and from 400 to 272.

    Please note the circled spikes from 120 to 248

    Please note the tiny downward spike, 5000 to 4872
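All four spikes reported in this test (370 to 498, 400 to 272, 120 to 248, 5000 to 4872) differ from the sent value by exactly one bit, BIT7 of the low byte. A small helper like the one below makes that easy to check against logged data; the function names are hypothetical and not part of the original firmware or PC software.

```c
#include <stdint.h>

/* Hypothetical helper: bits that differ between the sent and received value. */
uint16_t flippedBits(uint16_t sent, uint16_t received)
{
    return (uint16_t)(sent ^ received);
}

/* Hypothetical helper: 1 if exactly one bit differs, 0 otherwise. */
int isSingleBitFlip(uint16_t sent, uint16_t received)
{
    uint16_t diff = flippedBits(sent, received);
    return diff != 0 && (diff & (uint16_t)(diff - 1)) == 0;
}
```

For example, flippedBits(370, 498) evaluates to 0x0080, since 370 is 0x0172 and 498 is 0x01F2.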

  • Please note messages 37, 38, and 39. Those are from an earlier test session.

    Message 37: scout reports that temperature is 0190
    one second later:
    Message 38: scout reports that temperature is 01F2 while the actual value should be 0172. Observations: 1) this 01F2 value came out of nowhere. 2) frameID byte correctly incremented by 1. 3) checksum byte adds up.

    200 milliseconds later:
    Message 39: scout reports that the alarm is cleared, with the latest reading (should be the same as in message 38) at 01F2. Again, the correct value should be 0172, because 01F2 is greater than the threshold 017C: if the current reading as seen in monitor.c were 01F2, monitor.c wouldn't think the alarm is cleared.

  • The conclusion so far is that a compromised Xbee radio caused this error, but how exactly this happens is unknown. Also, the problem is not limited to this particular radio. We have data from field tests showing that "good" radios can also cause such errors, but at a much lower rate.

    This was not clear to us at the time. We evaluated a wide range of possibilities and eventually narrowed it down to the radio by using a fixed sensorValue in firmware, swapping radios, etc.

    Please note the lonely upward spike here. This test lasted more than 2 weeks.

    Rotor speed data during the same period of time. Two upward spikes from 119 to 247. Please ignore the downward spikes, motor slowed down during normal operation.

  • Hi,
    Thanks for your detailed reply.
    1. That is a lot of data, but I can see that you suspect the problem lies in the radio.
    2. A good radio results in a lower frequency of problems. A bad radio results in a higher frequency of problems.

    Here are my questions:
    1. What is the difference between good and bad radios?
    2. You assume that the cause is the radio. So why not just remove the radio and use the PC or MCU to collect the data directly, and see whether the problem still happens?
    3. If the problem lies in the radio, it is still hard to explain how it would affect the MSP430 so regularly.

    I think this problem will need much effort and time to look into and solve, but it is worth it.
    Eason
  • To answer your questions:
    1. We don't see any obvious difference between the BAD radio and good radios. We just noted that that particular radio caused a much higher error rate (1/200 vs. ~1/40K) once installed on one of our MSP430 PCBs.
    2. The working theory is that the "BAD" radio causes the error when installed on an MSP430 PCB. In that regard, I wouldn't be surprised if we don't see such errors once the radio is removed and we connect the MSP430 PCB to the USB dongle with wires.
    3. It is hard to believe, but the only sensible theory seems to be the radio, because the much higher error rate moves around with that particular radio.

    Later this week we will devise a special version of the PC software and a bare-minimum firmware for the MSP430 to test the problem, and post back our results.
  • For the record, the problem is further narrowed down to either the Xbee radio itself or the communication between the MSP430 and the Xbee radio via UART.

    We were able to catch a bout of errors on the receiving end. In the attached image, the last column indicates whether the message is valid, judged by whether the checksum matches. Red marks the wrong byte(s), with the correct values in the following line. Double errors in both a sensor value byte and the checksum byte did happen. In this case, the message appears to be correct on the receiving end while it is actually incorrect.

    So far the most error-prone bit seems to be either BIT7 or BIT3. The problem could happen between the two Xbee radios or between the Xbee radio and the MSP430, but shouldn't have happened in memory.
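The double error described here is something the additive checksum cannot catch in the BIT7 case: adding 0x80 to a data byte and flipping BIT7 of the checksum byte cancel exactly modulo 256. For any other bit position, the two flips cancel only when the data bit and the checksum bit start in opposite states. The sketch below illustrates this; the function names are hypothetical, and only the checksum rule is taken from xbeeSendmsgAT().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same rule as the sender: 0xFF - (sum of the preceding bytes mod 256). */
uint8_t additiveChecksum(const uint8_t *bytes, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += bytes[i];
    }
    return (uint8_t)(0xFF - sum);
}

/* Hypothetical experiment: flip the bits in `mask` in both frame[dataIndex]
 * and the trailing checksum byte of a copy of the frame. Returns 1 if the
 * corrupted copy still passes the checksum, 0 if it is rejected, and -1 on
 * bad arguments. */
int passesAfterDoubleFlip(const uint8_t *frame, size_t len,
                          size_t dataIndex, uint8_t mask)
{
    uint8_t copy[48];                       /* same size as msg[] in the sender */
    if (len < 2 || len > sizeof copy || dataIndex >= len - 1) {
        return -1;
    }
    memcpy(copy, frame, len);
    copy[dataIndex] ^= mask;                /* corrupt one payload byte */
    copy[len - 1] ^= mask;                  /* corrupt the checksum byte */
    return additiveChecksum(copy, len - 1) == copy[len - 1];
}
```

With a frame carrying 0x0172, flipping BIT7 in both msg[6] and the checksum byte yields a frame that reads 0x01F2 and still validates, which matches the "valid but wrong" rows described above.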

     


  • Hi,
    It is good news to hear that, so the problem is simpler now.
    I guess the UART communication is most likely being affected by the radio.
    Why not use an oscilloscope to capture the waveform on the UART? Then you can evaluate the effect of the radio on the UART line.

    I still have a question:
    1. What will the PC do when the checksum of the read data is wrong? Throw it away?
    Eason
  • The problem is baud rate. Somehow that particular Xbee radio is really finicky about baud rate. The same radio generates different error rates in different setups:

    1) FTDI-based USB-UART adapter, sending by software on PC: 0/9000 messages have errors.
    2) MSP430, LPM0, SMCLK sourced from DCO: 0/1000 messages have errors.
    3) MSP430, LPM0, ACLK sourced from XT1: 27/1000 messages have errors.
    4) MSP430, LPM3, ACLK sourced from XT1: 41/1000 messages have errors.
    Good/new Xbee radio:
    5) MSP430, LPM3, ACLK sourced from XT1: less than 1/10K messages have errors.

    Not sure why the error rates differ between 3) and 4), because the MSP430 is fully awake and the errors usually happen in the later bytes of the message. Also not sure why that particular radio is finicky about baud rate. But going forward we will just use LPM0 and SMCLK as the clock source for all UART-related projects on MSP430, just to be safe.
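One plausible way to see why the ACLK setups are more marginal is to look at how coarse the bit timing is when the baud-rate generator runs from a slow clock. The sketch below is only arithmetic under stated assumptions: the thread never gives the baud rate or clock frequencies, so 9600 baud, a 32768 Hz XT1 crystal, and a 1048576 Hz SMCLK are all assumed values, and the function names are hypothetical.

```c
/* Size of one source-clock tick as a percentage of one UART bit period.
 * Bit edges can only be placed on clock ticks, so this bounds how finely
 * the transmitter can position each edge. */
double tickPercentOfBit(double clockHz, double baud)
{
    return 100.0 * baud / clockHz;
}

/* Baud-rate error in percent when the clock is divided by the nearest
 * integer only, that is, without any modulation stage. */
double integerDividerErrorPercent(double clockHz, double baud)
{
    long divider = (long)(clockHz / baud + 0.5);
    if (divider < 1) {
        divider = 1;
    }
    double actualBaud = clockHz / (double)divider;
    return 100.0 * (actualBaud - baud) / baud;
}
```

Under these assumptions, one tick of a 32768 Hz clock is about 29% of a 9600 baud bit, and a plain divide-by-3 would run roughly 14% fast, so the UART has to alternate between 3-tick and 4-tick bits to hit the average rate. With a clock near 1 MHz the tick is under 1% of a bit and the integer-divider error is around 0.2%. A receiver with little timing margin could tolerate the second case and not the first, which would be consistent with the error rates listed above, though the thread does not confirm this mechanism.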