CCS/TMS570LS0432: Unable to store entire UART packet into BYTE array

David Wiest Jr

Part Number: TMS570LS0432
Other Parts Discussed in Thread: HALCOGEN

Tool/software: Code Composer Studio

Hi team,

I am currently using the TMS570 to debug a customer issue with one of our automotive battery monitors. The battery monitor device is stackable and sends information such as cell voltage, temperature, etc. to a host MCU (the TMS570) via UART. Due to the stackable nature of the device, the UART packet can be quite long. For some reason, my array storing the data fills with nonsense, shown below:

This is a BYTE array, consisting of ~425 index points, but good data stops being stored after the 18th index. I have never had this problem before with smaller arrays. I have verified that the data actually being sent by the battery monitor is good (using a logic analyzer), and narrowed the problem down to the TMS570. I am assuming it is something basic that I managed to miss.... any help on the topic would be greatly appreciated!

over 8 years ago

0 Chuck Davenport over 8 years ago

TI__Guru 59540 points

Hello David,

From a device perspective, there is no known errata that would cause this misbehavior.

In order to help, we will need a bit more information about the application other than the simple problem statement.

How are you allocating the space for the array? Is it a standard declaration (unsigned byte array[425]) or is it dynamically allocated based on the number of stackable elements?

Is the array initialized to all 0's or some other known value at boot time or prior to use?

Is it possible that the Rx stops after the 19th element and you are seeing simply the default content of the RAM locations instead of corrupted received data?

Do you see any error conditions/flags associated with the UART? i.e., overrun errors, parity bit errors, etc?

What is the actual baud rate of the transmitters and receivers? i.e., is there a slight mismatch causing a stackup of tolerance in the byte stream causing sync issues to occur over time? i.e., framing errors?

If the number of bytes in the packet is variable, how does the receiver know when the packet is through? Is there a data length code sent as part of the packet? How is the received data correlated with each of the stackable elements? Is there any data protection built into the packet (i.e., CRC or transmission redundancy)?

If you set up the UART in loopback mode and send the same number of bytes, does the same issue occur?

Can you confirm that the array does not overrun the stack definition area or some other defined memory buffer storage from elsewhere in the code?

Also, have you tried to debug by putting a breakpoint on the 20th received data byte (19th index) to see what happens to the received data byte and find out if it is corrupted in the Rx buffer, during the move to the RAM array, or sometime after it is moved to the array?

0 David Wiest Jr over 8 years ago in reply to Chuck Davenport

TI__Genius 11480 points

Hi Chuck,

Thanks for the detailed response. Working in the embedded realm is out of my area of expertise, so I'll answer as best I can...

As you said, it is just a standard allocation: BYTE bFrame[423]. I do not believe it is set to 0's beforehand.

My initial thought has been that the RX of the TMS570 is stopping for some reason, as you mentioned. I am assuming it is somewhere "deeper" in the code, and I missed a bit. I am not sure why it would work for shorter packets, though.

From my logic analyzer, it looks like the TX is right around 500k, but the RX is 526k.

Newbie question....where can I look to see the status of those bits?

You are correct, the packet length is sent as part of the first byte. If we have 8 battery monitor devices, they will synchronize together and send out their information 1 by 1, starting with the highest cell. CRC is indeed sent by the battery monitor.

I am not so sure how to set the UART in loopback mode, but that is something I can check.

I don't think it would be overrunning the stack definition... once again my inexperience with MCUs is showing. I am not sure where I would actually check this... the array is initialized as bFrame[512], so there should be plenty of room in the array itself.

Thank you,

David

0 Chuck Davenport over 8 years ago in reply to David Wiest Jr

Hi David,

So let me take your reply one item at a time:

David Wiest Jr said:
As you said, it is just a standard allocation: BYTE bFrame[423]. I do not believe it is set to 0's beforehand.

David Wiest Jr said:
I don't think it would be overrunning the stack definition... once again my inexperience with MCUs is showing. I am not sure where I would actually check this... the array is initialized as bFrame[512], so there should be plenty of room in the array itself.

So, if you are using the standard way of defining the array, there shouldn't be an issue with overrunning the stack or other arrays/data space. The linker will take care of RAM utilization/management at build time. One curiosity, though: you state bFrame[423] in the first sentence and bFrame[512] in the second. Not that it makes a big difference, but it should be fixed to avoid unnecessary complexity.

David Wiest Jr said:
From my logic analyzer, it looks like the TX is right around 500k, but the RX is 526k.

So this means there is a >5% difference in the baud rate of your battery monitors and your MCU. This could lead to issues like you are seeing. Basically, the UART has a sample point based on the baud rate/SCI clock. When there is a mismatch in the clocks there will eventually be a bit shift and you will either get garbage or you will get UART errors (framing errors or parity errors).
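To put rough numbers on that mismatch, here is a minimal sketch. It assumes an 8N1 frame (1 start + 8 data + 1 stop = 10 bits, which the thread does not state) and a simplified receiver that resynchronizes on each start bit and samples every bit at its nominal center. The function names are hypothetical and not part of any TI driver.

```c
/* Relative baud-rate mismatch in percent, referenced to the sender. */
double baud_mismatch_pct(double sender_baud, double receiver_baud)
{
    double diff = receiver_baud - sender_baud;
    if (diff < 0.0) {
        diff = -diff;
    }
    return 100.0 * diff / sender_baud;
}

/*
 * Sampling-point error at the center of the last bit of a frame,
 * expressed in sender bit times. The receiver re-aligns on every start
 * bit, so the error only accumulates within a single frame.
 * bits_per_frame = 10 for the assumed 8N1 format.
 */
double drift_at_last_bit(double sender_baud, double receiver_baud,
                         unsigned bits_per_frame)
{
    /* Center of the last bit, counted in bit periods from the start edge. */
    double center = (double)bits_per_frame - 0.5;
    /* Receiver bit period expressed in units of the sender bit period. */
    double ratio = sender_baud / receiver_baud;
    double drift = center * (ratio - 1.0);
    if (drift < 0.0) {
        drift = -drift;
    }
    return drift;
}
```

With 500k on one side and 526k on the other, the mismatch is a little over 5%, and by the stop bit the sample point has drifted by a little under half a bit time, which is right at the edge of what this simplified model tolerates. A real SCI oversamples each bit, so treat this only as an order-of-magnitude check.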

David Wiest Jr said:
Newbie question....where can I look to see the status of those bits?

There is an error status register that can be checked. If you're using the HALCoGen drivers, there is a function (uint32 sciRxError(sciBASE_t *sci)) that returns the status flags read from the error status register in the SCI.

David Wiest Jr said:
I am not so sure how to set the UART in loopback mode, but that is something I can check.

Again, if you are using HALCoGen, there is a function to enable loopback (void sciEnableLoopback(sciBASE_t *sci, loopBackType_t Loopbacktype)). It can be used to enable either digital loopback (exercises only the digital logic and won't affect the pins) or analog loopback (exercises the analog IO buffers as well as the digital logic and will affect the pins). More information on loopback can be seen in the TRM. Basically, anything you Tx will be received on the Rx side.

0 David Wiest Jr over 8 years ago in reply to Chuck Davenport

Chuck,

Thanks again for the very detailed response...this has been a good learning experience for me (if it isn't painfully obvious, I am much more comfortable in the analog world)

Per your suggestion, I used the sciRxError function, and it returns an unsigned int of 33554432, or 0x02000000. It appears that this is an overrun error, if I am reading the documentation correctly.

So now my question is, how do I correct this? Is the TMS570 just not able to handle a UART packet that large?
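For reference, a small sketch of how that return value breaks down into individual error flags. The mask values below are assumptions based on my reading of the HALCoGen sci.h header (parity = bit 24, overrun = bit 25, framing = bit 26); check them against the generated header in your own project. The EX_ names and the SciErrors type are hypothetical.

```c
#include <stdint.h>

/* Assumed flag positions in the SCI flag register; verify against sci.h. */
#define EX_SCI_PE_FLAG 0x01000000u /* parity error  (assumed bit 24) */
#define EX_SCI_OE_FLAG 0x02000000u /* overrun error (assumed bit 25) */
#define EX_SCI_FE_FLAG 0x04000000u /* framing error (assumed bit 26) */

typedef struct {
    int parity;
    int overrun;
    int framing;
} SciErrors;

/* Split a raw error word (e.g. the value returned by sciRxError) into flags. */
SciErrors decode_sci_errors(uint32_t raw)
{
    SciErrors e;
    e.parity  = (raw & EX_SCI_PE_FLAG) != 0u;
    e.overrun = (raw & EX_SCI_OE_FLAG) != 0u;
    e.framing = (raw & EX_SCI_FE_FLAG) != 0u;
    return e;
}
```

Under those assumed masks, 33554432 (0x02000000) sets only the overrun field, which agrees with the reading above.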

EDIT: To answer a previous question, bFrame is initialized as bFrame[512]. There are only 423 bytes being sent (so 423 index points).

Also, to resolve this would I need to tweak the prescaler in the BRS value, so that the two match more closely?

0 Chuck Davenport over 8 years ago in reply to David Wiest Jr

Hi David,

David Wiest Jr said:
Per your suggestion, I used the sciRxError function, and it returns an unsigned int of 33554432, or 0x02000000. It appears that this is an overrun error, if I am reading the documentation correctly.

So now my question is, how do I correct this? Is the TMS570 just not able to handle a UART packet that large?

So, if you are getting an overrun error, it means you are not reading the Rx buffer quickly enough before the next byte is received, causing data to be lost. There are several possible reasons for this, but it depends on how you have the SCI set up. Are you using a polling method or interrupts to receive data? If you are using interrupts, when do you clear the interrupt flag (this should be the first thing)? I would recommend spending as little time as possible processing incoming data and simply doing a move from the Rx buffer in the module to your array.

Note that you can also set up the interrupts so that you are interrupted on an error and can process it. Does your protocol offer any retry mechanism? Or at least an ACK/NACK mechanism so the master can communicate to the slaves that data was lost and to retry?
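One common way to keep the per-byte work this small is a ring buffer: the interrupt handler only stores the byte, and everything else (length checks, CRC) happens later in the main loop. The sketch below is generic C and not HALCoGen code; RxRing, rx_ring_put and rx_ring_get are hypothetical names, and it assumes a single producer (the ISR) and a single consumer (the main loop).

```c
#include <stdint.h>

#define RX_RING_SIZE 512u /* must be a power of two; holds up to 511 bytes */

typedef struct {
    volatile uint8_t  data[RX_RING_SIZE];
    volatile uint32_t head; /* next write position, updated only by the ISR  */
    volatile uint32_t tail; /* next read position, updated only by main code */
} RxRing;

/* Call from the receive ISR with the byte just read from the data register.
 * Returns 1 on success, 0 if the ring is full and the byte was dropped. */
int rx_ring_put(RxRing *r, uint8_t byte)
{
    uint32_t next = (r->head + 1u) & (RX_RING_SIZE - 1u);
    if (next == r->tail) {
        return 0; /* full: the consumer is not keeping up */
    }
    r->data[r->head] = byte;
    r->head = next;
    return 1;
}

/* Call from the main loop. Returns 1 and stores a byte in *out if one is
 * available, 0 if the ring is empty. */
int rx_ring_get(RxRing *r, uint8_t *out)
{
    if (r->tail == r->head) {
        return 0;
    }
    *out = r->data[r->tail];
    r->tail = (r->tail + 1u) & (RX_RING_SIZE - 1u);
    return 1;
}
```

In this arrangement the ISR body reduces to reading the receive register and calling rx_ring_put, so the time spent per byte stays short and roughly constant regardless of packet length.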

David Wiest Jr said:
EDIT: To answer a previous question, bFrame is initialized as bFrame[512]. There are only 423 bytes being sent (so 423 index points).

Ok. Thanks. Just a question for clarity. I don't think this is impacting your issue.

David Wiest Jr said:
Also, to resolve this would I need to tweak the prescaler in the BRS value, so that the two match more closely?

If you are getting the overrun error and not a framing or parity error, this is most likely not your problem. However, it would be good to see if you can get the baud rates to align more closely by tweaking the prescaler. If you can't get it any closer due to the operating frequencies, then don't worry too much about it and chalk it up to a granularity issue.

Another question: in the received bytes, have you verified that the data is accurate? That is, in the collection of data received, are there any missing bytes? I am curious because this could tell us how severe the overrun issues are. For example, if the first stored byte corresponds to byte 0 of the packet and the byte at index 18 corresponds to the last byte of the 423-byte packet, then you are missing a lot of data in the middle and there is a severe software architectural issue with your code.

Finally, how frequent are the packets sent in your system? i.e., how much actual processing time is allowed for the packet between transmissions? Again, this is an issue of servicing the received data in the SCI Rx buffer and if there is sufficient time to retrieve the data and move it to the array before the next byte is received.
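As a rough way to quantify that servicing window, the sketch below computes the time one character occupies on the wire and the duration of a whole back-to-back packet. It assumes 10 bits per character (8N1), which the thread does not confirm; the function names are hypothetical.

```c
/* Time one UART character occupies on the wire, in microseconds. */
double byte_time_us(double baud, unsigned bits_per_frame)
{
    return 1.0e6 * (double)bits_per_frame / baud;
}

/* Duration of nbytes characters sent back to back, in microseconds. */
double packet_time_us(double baud, unsigned bits_per_frame, unsigned nbytes)
{
    return byte_time_us(baud, bits_per_frame) * (double)nbytes;
}
```

Under those assumptions, 500 kbaud gives about 20 us per byte, so each received byte has to be read out within roughly that window, and a 423-byte packet lasts about 8.46 ms if the bytes arrive without gaps.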

0 David Wiest Jr over 8 years ago in reply to Chuck Davenport

Chuck,

We are using interrupts, specifically the sciReceive function found in sci.c.

Here is the sciReceive function (I apologize if this is formatted incorrectly, but it may help):

void sciReceive(sciBASE_t *sci, uint32 length, uint8 * data)
{
/* USER CODE BEGIN (17) */
/* USER CODE END */

    if ((sci->SETINT & SCI_RX_INT) == SCI_RX_INT)
    {
        /* we are in interrupt mode */

        /* clear error flags */
        sci->FLR = ((uint32) SCI_FE_INT | (uint32) SCI_OE_INT | (uint32) SCI_PE_INT);

        g_sciTransfer_t.rx_length = length;
        /*SAFETYMCUSW 45 D MR:21.1 <APPROVED> "Valid non NULL input parameters are only allowed in this driver" */
        g_sciTransfer_t.rx_data = data;
    }
    else
    {
        /*SAFETYMCUSW 30 S MR:12.2,12.3 <APPROVED> "Used for data count in Transmit/Receive polling and Interrupt mode" */
        while ((length--) > 0U)
        {
            /*SAFETYMCUSW 28 D MR:NA <APPROVED> "Potentially infinite loop found - Hardware Status check for execution sequence" */
            while ((sci->FLR & SCI_RX_INT) == 0U)
            {
            } /* Wait */
            /*SAFETYMCUSW 45 D MR:21.1 <APPROVED> "Valid non NULL input parameters are only allowed in this driver" */
            *data = (uint8)(sci->RD & 0x000000FFU);
            /*SAFETYMCUSW 45 D MR:21.1 <APPROVED> "Valid non NULL input parameters are only allowed in this driver" */
            *data++;
        }
    }
/* USER CODE BEGIN (18) */
/* USER CODE END */
}

I think this is pretty standard, and is generated from HALCoGen. On our end, we use a "wait response frame" function, shown here:

int WaitRespFrame(BYTE *pFrame, uint32 bLen, uint32 dwTimeOut)
{
    uint16 wCRC = 0, wCRC16;
    BYTE bBuf[512];
    uint32 bRxDataLen;

    memset(bBuf, 0, sizeof(bBuf));

    sciEnableNotification(scilinREG, SCI_RX_INT);
    rtiDisableNotification(rtiNOTIFICATION_COMPARE0);
    rtiDisableNotification(rtiNOTIFICATION_COMPARE1);
    rtiDisableNotification(rtiNOTIFICATION_COMPARE2);
    rtiDisableNotification(rtiNOTIFICATION_COMPARE3);
    rtiInit();
    rtiEnableNotification(rtiNOTIFICATION_COMPARE1);
    /* rtiNOTIFICATION_COMPARE0 = 1ms
     * rtiNOTIFICATION_COMPARE1 = 5ms
     * rtiNOTIFICATION_COMPARE2 = 10ms
     * rtiNOTIFICATION_COMPARE3 = 15ms
     */
    rtiResetCounter(rtiCOUNTER_BLOCK0);
    rtiStartCounter(rtiCOUNTER_BLOCK0);
    sciReceive(scilinREG, bLen, bBuf);

    while(UART_RX_RDY == 0U)
    {
        // Check for timeout.
        if(RTI_TIMEOUT == 1U)
        {
            rtiStopCounter(rtiCOUNTER_BLOCK0);
            rtiDisableNotification(rtiNOTIFICATION_COMPARE0);
            rtiDisableNotification(rtiNOTIFICATION_COMPARE1);
            rtiDisableNotification(rtiNOTIFICATION_COMPARE2);
            rtiDisableNotification(rtiNOTIFICATION_COMPARE3);
            RTI_TIMEOUT = 0;

            return 0; // timed out
        }
    } /* Wait */
    rtiStopCounter(rtiCOUNTER_BLOCK0);
    rtiDisableNotification(rtiNOTIFICATION_COMPARE0);
    rtiDisableNotification(rtiNOTIFICATION_COMPARE1);
    rtiDisableNotification(rtiNOTIFICATION_COMPARE2);
    rtiDisableNotification(rtiNOTIFICATION_COMPARE3);
    UART_RX_RDY = 0;
    RTI_TIMEOUT = 0;
    bRxDataLen = bBuf[0];

    delayms(dwTimeOut);

    // rebuild bBuf to have bLen as first byte to use the same CRC function as TX
    // i = bRxDataLen + 3;
    // while(--i >= 0)
    // {
    //     bBuf[i + 1] = bBuf[i];
    // }
    // bBuf[0] = bRxDataLen;

    wCRC = bBuf[bRxDataLen+2];
    wCRC |= ((uint16)bBuf[bRxDataLen+3] << 8);
    wCRC16 = CRC16(bBuf, bRxDataLen+2);
    if (wCRC != wCRC16)
        return -1;

    memcpy(pFrame, bBuf, bRxDataLen + 4);

    return bRxDataLen + 1;
}

Regarding the data integrity, I verified that the first 18 bytes coming into the array are indeed the first 18 bytes coming in the packet. My logic analyzer shows it matches byte for byte, up until that magical 18th byte....

There are several ms in between packets. I think this is plenty of time for the mcu to work through.

0 Chuck Davenport over 8 years ago in reply to David Wiest Jr

Thanks David,

The sciReceive function above is capable of being used in either interrupt mode or non-interrupt mode (polling). The first conditional checks whether it is in interrupt mode and, if so, it simply clears the error flags and stores the requested length and the destination pointer in g_sciTransfer_t (rx_length and rx_data); no data is read at that point. It is then up to the interrupt handling code to move each received byte into the buffer that g_sciTransfer_t.rx_data points to and to manage the data lengths for the packet.

In the alternate part of the function (the else after checking for interrupt mode), it uses the passed length parameter to poll for data until the whole packet is received and placed into the data buffer passed in as an argument. However, there is a check in this code that concerns me in that it waits until either the receive flag is set (new data available) or an error flag is set (while ((sci->FLR & SCI_RX_INT) == 0U) ). If we consider that an error flag is set, the data is still read from the receive buffer into the data buffer, the buffer pointer is incremented, and we then wait for the next byte. So, if there is a sync issue as originally speculated, resulting in framing or parity errors, you would basically load garbage into your data array and move on.

I don't believe the baud rate mismatch is responsible for this issue either, but if you can adjust the baud rate of the slaves any closer to 500 kbps, it could eliminate this question. I checked the MCU and 500k is about the best you're going to do, as each 1-count change in the divider adjusts the baud rate by ~56 kbps, i.e., the best you can do is 500 +/- 56.
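The divider granularity can be sketched numerically. This assumes the asynchronous baud rate is VCLK / (16 * (P + 1)) for an integer prescaler P, and that VCLK is 80 MHz; both are assumptions on my part (the thread does not give the clock, and any fractional divider bits are ignored here), and sci_async_baud is a hypothetical helper.

```c
/*
 * Asynchronous SCI baud rate for an integer prescaler value p,
 * assuming baud = VCLK / (16 * (p + 1)). Fractional divider bits,
 * if the device has them, are not modeled.
 */
double sci_async_baud(double vclk_hz, unsigned p)
{
    return vclk_hz / (16.0 * ((double)p + 1.0));
}
```

With the assumed 80 MHz clock, P = 9 gives exactly 500 kbaud, while P = 8 gives about 555.6 kbaud and P = 10 about 454.5 kbaud, which is consistent with a step of roughly 56 kbps in the upward direction.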

Admittedly, I am a bit confused by your wait response frame function and the use of the RTI functions. I see that you call the sciReceive function within this wait response frame function and pass it a length of bLen, which determines the number of bytes to receive. This parameter is passed into the wait response frame function. However, when you copy the bBuf array into pFrame (not sure what this new buffer is), you use the value in bRxDataLen assigned from bBuf[0]. This could lead to inconsistency between the number of bytes specified by bLen and the number specified in the packet itself (bBuf[0] or bRxDataLen).
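One way to guard against that inconsistency is to validate the in-packet length against the number of bytes actually requested before computing the CRC or copying anything. The sketch below is hypothetical (frame_length_ok is not part of the posted code) and infers the frame layout from WaitRespFrame above: a frame occupies bBuf[0] + 4 bytes in total.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if the frame described by the length byte in buf[0] lies
 * entirely within the bytes that were requested and within the buffer,
 * 0 otherwise. The "+ 4" total is inferred from the posted code, which
 * copies bRxDataLen + 4 bytes; adjust it if the real protocol differs.
 */
int frame_length_ok(const uint8_t *buf, size_t buf_size, size_t requested_len)
{
    size_t frame_len;

    if (buf == NULL || requested_len == 0u || requested_len > buf_size) {
        return 0;
    }
    frame_len = (size_t)buf[0] + 4u;
    return frame_len <= requested_len;
}
```

Calling a check like this right after the receive completes, and returning an error when it fails, would keep the CRC indices and the memcpy length from running past the data that was actually received.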

It is also possible that if you have the RTI interrupts enabled, that the ISR for these is blocking the execution of your sciReceive function since it isn't an interrupt. This could cause the overrun error as well.

0 David Wiest Jr over 8 years ago in reply to Chuck Davenport

Chuck,

Thanks for sticking with me on this... it has definitely been a learning experience so far, and hopefully it can help someone else in the future.

I have changed the battery monitor baud rate to 250k, and have written 250000 to the TMS570 with the sciSetBaudrate function. Now, my analyzer shows both working happily at 252525 Hz.

I spoke with someone more familiar with the code, and the RTI functions are intended to be a timeout, in case the number of bytes received doesn't match the number of bytes requested. I have commented these out, and replaced them with phantom interrupts in sys_vim.c. Unfortunately, my problem persists after both changes...

The RTI interrupt was actually something that I had been suspicious of. Is there anything else I should do to ensure those are disabled? If I remember correctly, interrupts need to be enabled somewhere else besides sys_vim.c.

linHighLevelInterrupt is also set on channel 13 in sys_vim.c. I believe this is setting the SCI RX interrupt as high priority, correct?

0 Chuck Davenport over 8 years ago in reply to David Wiest Jr

David,

Can you send me your project so I can try to have a look at it here? Specifically, please include your HALCoGen files as well (hcg and dil files) and state which version of HALCoGen you are using.

I am curious about the interrupts because the interrupt you mentioned, linHighLevelInterrupt, is for LIN, not for SCI. I suspect you aren't really using interrupts as you believe, since you are manually calling the notification functions, which would need to be defined as ISRs for them to work properly as ISRs.

Anyway, if you can zip it up and send it by attaching to this thread or send it to me directly through TI email, that would work.
