cc2530 Rx errors

Fred10415

Other Parts Discussed in Thread: SIMPLICITI, TIMAC

I've been working on a proprietary network using the CC2530, and today is the first time we tried to get multiple nodes to connect at the same time (I'm only using 3 just to troubleshoot a bit). One-on-one the network works fine, and to make it more resilient there is a random delay every time a command requires multiple nodes to respond. In addition, each transmit is sent using ISTXONCCA in a loop similar to the one in SimpliciTI. The problem is that, one by one, the nodes stop triggering interrupts on received frames (except for the last node).

To make sure there aren't any bad frames being received I throw out everything that doesn't match the FCS and flush the buffer afterward (even though that never seems to happen). In debug mode, for some reason, the RF_VECTOR stops triggering. Like I said before, this problem only started when I began to attach multiple nodes (actually we had seen some problems before where a device would suddenly stop receiving, but that was usually after several days, not less than a minute like I'm seeing now). While debugging I look at the addresses for RFERROR interrupts and there aren't any, I look at flags that should indicate a received frame and there aren't any, and I look at the last frame I read and it matches what it should have received. I checked all the error flags and nothing is showing up, but if I reset the device it works again.

Even stranger: to test whether or not the RFERR flags are working properly, I purposely read 200 bytes from the rx buffer to create an underflow. Even though the RXUNDERF flag is set, the TCON.RFERRIF flag isn't set, so no RFERR interrupt is triggered.

Yes, I have IEN0.RFERRIE set (and verified that it stays set using the debugger) as well as IEN0.EA. As far as I can tell, nothing regarding the flags or interrupts the radio should look at has changed.

Has anyone else had similar problems joining nodes or noticed that the interrupt vector isn't working?

over 16 years ago

0 MaMoe over 16 years ago

TI__Intellectual 1005 points

To get the RFERR interrupt to trigger you also need to set the RFERRM bits. To get the RXUNDERF condition to trigger an interrupt you set RFERRM[3] = 1. The naming is perhaps confusing, as setting a mask bit to 1 actually means that the corresponding condition is enabled and not masked away.
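The gating described above can be modelled on a host machine as follows. This is only a sketch, not driver code: RXUNDERF at bit 3 comes from the reply above, while the IEN0 bit positions (RFERRIE at bit 0, EA at bit 7) are assumptions that should be checked against the CC2530 user's guide, and `rferr_irq_would_fire` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 3 is taken from the reply above; the IEN0 bits are assumptions. */
#define RFERR_RXUNDERF  (1u << 3)  /* RFERRF / RFERRM bit 3 */
#define IEN0_RFERRIE    (1u << 0)  /* assumed: RF error interrupt enable */
#define IEN0_EA         (1u << 7)  /* assumed: global interrupt enable */

/* Hypothetical helper: true when a raised error flag would reach the CPU.
 * A flag only propagates if its RFERRM bit is 1 ("mask" means enable here)
 * and both the RF error interrupt and global interrupts are enabled. */
static bool rferr_irq_would_fire(uint8_t rferrf, uint8_t rferrm, uint8_t ien0)
{
    bool unmasked = (rferrf & rferrm) != 0;
    bool cpu_enabled = (ien0 & IEN0_RFERRIE) != 0 && (ien0 & IEN0_EA) != 0;
    return unmasked && cpu_enabled;
}
```

With RFERRM left at 0, the RXUNDERF flag can be set without any interrupt being raised, which matches what was observed in the first post. On the target the corresponding change would be setting bit 3 of RFERRM before enabling the interrupt.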

0 Fred10415 over 16 years ago in reply to MaMoe

Prodigy 130 points

Thanks Moe, I don't know how I missed that. That takes care of the error interrupt problem I was having although I still notice that when my nodes lock up I never get an rf error.

For those of you interested I was looking in the Zigbee stack and found a note from TI that there is a hardware issue that can corrupt the rx buffer on the 2530. When trying to implement their solution I was still having problems with nodes locking up, even using the SimpliciTI code (as mentioned above). For my code I used the hal_rf.c file that comes with the 2530 software examples and just modified the tx and rx functions to match SimpliciTI and the TI MAC (from the Zigbee stack). I finally noticed that at the beginning of the file where they define macros they used:

#define ISFLUSHRX() st(RFST = 0xEC;)

But ISFLUSHRX should be RFST = 0xED. Instead of flushing the buffer it was trying to transmit, and because I ran the flush command twice in a row (per the data sheet) it was locking the node up. This helped, but I was still seeing nodes lock up. It turns out there was another problem with the random number generator code I took from SimpliciTI: it was always giving me the same "random number"... zero.
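For reference, a host-side sketch of the corrected macro and the double flush. The `rfst_write` function and its log are stand-ins I am assuming in place of the real RFST register so the example can run off-target; only the 0xED opcode and the flush-twice sequence come from this post.

```c
#include <stdint.h>

/* Host-side stand-in for the RFST strobe register: records each opcode. */
static uint8_t rfst_log[4];
static unsigned rfst_writes;

static void rfst_write(uint8_t opcode)
{
    if (rfst_writes < sizeof rfst_log) {
        rfst_log[rfst_writes] = opcode;
    }
    rfst_writes++;
}

/* Corrected opcode from this post: 0xED, not 0xEC. */
#define ISFLUSHRX()  rfst_write(0xED)

/* Flush twice in a row, as the post says the data sheet calls for. */
static void flush_rx_fifo(void)
{
    ISFLUSHRX();
    ISFLUSHRX();
}
```

On the target, `rfst_write(0xED)` would simply be the `RFST = 0xED;` assignment from the original macro.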

I didn't look into SimpliciTI enough to know if they were using a different clock speed or something, but with the initialization that hal_rf.c uses, RFRND was always 0x02 (and since I used the SimpliciTI code I was only looking at the first bit). I added a delay of 1 ms in the loop and that seemed to correct the problem, because now my numbers are random.
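A rough sketch of the bit-collection loop with the delay added, written so it can run on a host. The function-pointer parameters and the `stuck_rfrnd`, `toggling_rfrnd`, and `no_delay` stand-ins are hypothetical; on the target the reader would return RFRND and the delay would be a real 1 ms wait. This is not the SimpliciTI code, only an illustration of why a register stuck at 0x02 yields zero when only bit 0 is sampled.

```c
#include <stdint.h>

typedef uint8_t (*read_rfrnd_fn)(void);
typedef void (*delay_fn)(void);

/* Build one byte from bit 0 of eight successive reads, pausing between reads
 * so the radio has time to produce a fresh value. */
static uint8_t collect_random_byte(read_rfrnd_fn read_rfrnd, delay_fn delay_1ms)
{
    uint8_t value = 0;
    for (int i = 0; i < 8; i++) {
        value = (uint8_t)((value << 1) | (read_rfrnd() & 0x01));
        delay_1ms();
    }
    return value;
}

/* Host-side stand-ins (hypothetical). */
static uint8_t stuck_rfrnd(void) { return 0x02; }   /* the failure seen above */
static uint8_t toggle_state;
static uint8_t toggling_rfrnd(void) { toggle_state ^= 1; return toggle_state; }
static void no_delay(void) { }
```

With the stuck source every sampled bit is 0, so the result is always zero, which is the symptom described above.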

Now that the nodes have a random delay and properly flush bad frames, my small network doesn't seem to have collisions that disable nodes anymore. The only remaining issue is that if I take out the random backoff and two nodes try to transmit at exactly the same time, one or both of the nodes lock up and the radio stops receiving frames. Although this should be a rare occurrence, it's still possible, and I'd like a solution to prevent nodes from locking up as my network gets larger. If anyone knows how to stop this, or at least generate an error that I can spot and use to reset my device, I'd be very appreciative.

Thanks,

-Fred

0 MaMoe over 16 years ago in reply to Fred10415

TI__Intellectual 1005 points

It's hard to say exactly what is causing the lockup situation you describe. To start debugging you could read the FSMSTAT0 register to see if you have ended up in some state you did not expect, i.e. you are not in RX when you expect to be.

That being said, building a robust network is a challenging task and there are many ways to approach the problem. Even with random backoffs you will get collisions from time to time, and interferers will also make you lose packets. Automatic retransmissions based on some acknowledgement scheme, combined with random backoffs (from packet to packet), should guarantee that all the nodes get some data through eventually. SimpliciTI and TIMAC use these techniques and should be a good starting point.
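As a rough illustration of the acknowledge-and-retry idea, here is a minimal sketch. It is not how SimpliciTI or TIMAC implement it; `send_with_retries`, the callback types, and the `demo_*` stand-ins are all hypothetical names, and the radio and timer are replaced by callbacks so the logic can be exercised on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callbacks: transmit a frame and report whether an ACK arrived,
 * and wait a random time that may depend on the attempt number. */
typedef bool (*send_and_wait_ack_fn)(const uint8_t *frame, uint8_t len);
typedef void (*backoff_fn)(unsigned attempt);

/* Returns the attempt number that was acknowledged, or -1 if none was. */
static int send_with_retries(const uint8_t *frame, uint8_t len,
                             unsigned max_attempts,
                             send_and_wait_ack_fn send_and_wait_ack,
                             backoff_fn random_backoff)
{
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        random_backoff(attempt);            /* fresh backoff for every packet */
        if (send_and_wait_ack(frame, len)) {
            return (int)attempt;
        }
    }
    return -1;
}

/* Host-side stand-ins for trying the logic out. */
static const uint8_t demo_frame[] = { 0x01, 0x02 };
static unsigned demo_tx_count;
static bool demo_ack_on_third_try(const uint8_t *frame, uint8_t len)
{
    (void)frame; (void)len;
    return ++demo_tx_count >= 3;
}
static bool demo_never_acked(const uint8_t *frame, uint8_t len)
{
    (void)frame; (void)len;
    return false;
}
static void demo_backoff(unsigned attempt) { (void)attempt; }
```

Bounding the number of attempts matters: a node that retries forever can keep the channel busy for everyone else.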

0 Fred10415 over 16 years ago in reply to MaMoe

Prodigy 130 points

Thanks for the insight, Moe. You were right: it was being taken out of rx mode. It looks like when two nodes tried to respond at EXACTLY the same time (or at least within the 192 us window to turn on the transmitter) they would receive part of the other node's message. Sometimes it would have a length greater than 128 bytes (which I was already checking for, per the TI MAC stack); other times it would have a length that didn't correspond with the FCS, which I was also looking for. The case I was missing, and what seemed to be doing the damage, was when the length was less than RXFIFOCNT, which should never happen. I put in a check to make sure my length is always greater than RXFIFOCNT and flush the buffer if it isn't (before reading the whole frame), and that seems to have corrected the problem. When we get more chips in and I can test on a bigger network we'll see if this solution sticks. The only question remaining is why there was never an rf error generated.

Thanks again,

-Fred

0 MaMoe over 16 years ago in reply to Fred10415

TI__Intellectual 1005 points

I assume that when you write "length" you refer to the length field of a frame. I also assume that in the case you describe above you have two nodes in RX and they both want to start transmitting at roughly the same time (using STXON?).

In this case it is not unexpected that you get some partial frames in the rxfifo, since, as you write, the "slowest" node will start receiving what the "faster" node is transmitting. This causes partial frames to be stored in the rxfifo, and this must be handled by software. However, if the scenario is like this, I would expect the RXABO flag to be set. This flag is set if you abort reception in the middle of a frame.

Another way to avoid problems like this is to use STXONCCA. If you have set the appropriate CCA mode, you will not start transmitting while you are in active receive. You will still receive the frame from the other device, and you need to filter this out somehow.

I'm not quite sure what you mean when you write "length that didn't correspond to the FCS". The FCS doesn't contain any length information, does it?

It is quite possible that the length field of a packet is smaller than rxfifocnt; this happens when there is more than one frame in the rxfifo at the same time. If the length field you read is larger than rxfifocnt, you either have a partial frame in the fifo or are still receiving.

Hmm. Maybe we have a different understanding of what "length" is?
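The relationship between the length field and rxfifocnt described above can be sketched as a small classifier. The names are hypothetical. I am assuming the length field counts every byte that follows the length byte (including the two trailing FCS/status bytes), and that `fifo_count` is RXFIFOCNT sampled before the length byte is read out.

```c
#include <stdint.h>

#define MAX_FRAME_LEN 127u   /* largest valid 802.15.4 length-field value */
#define MIN_FRAME_LEN 2u     /* assumed: at least the two FCS/status bytes */

typedef enum {
    FRAME_INVALID_LENGTH,    /* length field out of range */
    FRAME_INCOMPLETE,        /* partial frame, or still receiving */
    FRAME_COMPLETE_LAST,     /* exactly one whole frame in the FIFO */
    FRAME_COMPLETE_MORE      /* one whole frame with more data behind it */
} frame_check_t;

static frame_check_t check_frame(uint8_t length_field, uint8_t fifo_count)
{
    /* Bytes this frame occupies in the FIFO: the length byte plus the rest. */
    unsigned needed = (unsigned)length_field + 1u;

    if (length_field < MIN_FRAME_LEN || length_field > MAX_FRAME_LEN) {
        return FRAME_INVALID_LENGTH;
    }
    if (needed > fifo_count) {
        return FRAME_INCOMPLETE;
    }
    return (needed == fifo_count) ? FRAME_COMPLETE_LAST : FRAME_COMPLETE_MORE;
}
```

Only the first two outcomes are candidates for flushing; a length smaller than the FIFO count is a normal situation when frames arrive back to back.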

0 Fred10415 over 16 years ago in reply to MaMoe

Prodigy 130 points

Moe,

Sorry to be so vague with my descriptions. Yes, when I talk about length I mean the frame length byte given just after the preamble. When transmitting, I set FRMCTRL0.RX_MODE to 0x03 (11) to disable symbol search, flush the rx buffer, and turn on the receiver. Once I get a valid RSSI I send the frame using RFST = ISTXONCCA, then check FSMSTAT1 for the sampled CCA. This is done in a while loop, so if the bit is zero the device will continue to retry sending the frame. Next, I look at FSMSTAT1.FIFO to see if any frames were received between flushing the rx buffer and sending the frame. If there are, I flush the buffer again. Below is the exact code I use to transmit, where halRfRxOn() simply flushes the rx buffer twice and turns the receiver on.

uint8 halRfTx(void)
{
    uint8 status;
    uint8 x;
    uint8 txActive = 0;

    HAL_INT_LOCK(x);    // Save the current global interrupt state (EA) and turn interrupts off

    FRMCTRL0 |= RX_MODE_SYMB_DIS;   // Set the receiver to disable symbol search
    halRfRxOn();    // Turn the receiver on and flush anything in the rx buffer
    while(!(RSSISTAT & RSSI_VALID)); // Wait for the RSSI to stabilize

    // Retry the TX strobe until the sampled CCA reports a clear channel
    while(!txActive)
    {
        ISTXONCCA(); // Sending
        txActive = FSMSTAT1 & FSM_SAMPLED_CCA;
    }
    status = SUCCESS;

    FRMCTRL0 &= ~RX_MODE_SYMB_DIS;  // Re-enable symbol search

    // Clear the receive buffer if partial frames were received during tx
    if(FSMSTAT1 & FSM_FIFO)
    {
        halRfClearRx();
        RFIRQF0 &= ~IRQ_RXPKTDONE;
        RFIRQF0 &= ~IRQ_FIFOP;

        S1CON = 0;
    }

    HAL_INT_UNLOCK(x); // Turn on global interrupts if they were on before

    return status;
}

Also, when I wrote that the "length didn't correspond to the FCS", what I meant is that I read the length byte and then read that amount of data from the RXFIFO (as long as it is less than FIFOCNT). If the last byte isn't 0xEC then I assume I received a bad frame and flush the rx buffer. The only rf interrupt I have enabled is RXPKTDONE, so I should never get into my function that processes the rx frame unless an entire packet was received. My RF ISR code is below, where pfISR is initialized to my function that reads and processes the frame.

HAL_ISR_FUNCTION( rfIsr, RF_VECTOR )
{
    uint8 x;

    HAL_INT_LOCK(x);

    if( RFIRQF0 & IRQ_RXPKTDONE )
    {
        do
        {
            if(pfISR){
                (*pfISR)();                 // Execute the custom ISR
            }
            S1CON = 0;                      // Clear general RF interrupt flag
            RFIRQF0 &= ~IRQ_RXPKTDONE;      // Clear RXPKTDONE interrupt
        } while(RXFIFOCNT);
    }
    S1CON = 0;
    HAL_INT_UNLOCK(x);
}

The portion of my processRx function that reads the frame is below:

uint8 fcs;
uint8* pBuf;
uint8 len;
uint8 frmType;

// Clear interrupt and disable new RX Frame Done interrupt
halRfDisableRxInterrupt();

// Read the header
pBuf = (uint8*) &rxPacket[rxIndex].rxHdr[0];
halRfReadRxBuf(pBuf, NWK_HDR_LEN + 1); // the extra 1 is for the length byte

// Set the length of the command id + command
len = rxPacket[rxIndex].rxHdr[0] - NWK_HDR_LEN - NWK_FCS_LEN;

// Make sure the length is less than 128 to avoid an overflow
// And make sure it is less than or equal to FIFOCNT to avoid an underflow
if(((len+ NWK_HDR_LEN + NWK_FCS_LEN) > NWK_MAX_FRM_LEN) ||
     ((len+NWK_FCS_LEN) > RXFIFOCNT))
{
    // Clear the RX buffer to get rid of a possible overflow or underflow
    halRfClearRx();

    // Enable RX frame done interrupt again
    halRfEnableRxInterrupt();
    return;
}

// Read the command id
pBuf = (uint8*) &rxPacket[rxIndex].rxCmd[0];
halRfReadRxBuf(pBuf, len);

// Read the rssi value
halRfReadRxBuf(&rxPacket[rxIndex].rssi, 1);

// Make sure the frame passed the check sum, otherwise erase the packet
halRfReadRxBuf(&fcs, 1);

// Check the FCS to make sure it is a valid frame
if (fcs != 0xEC)
{
    // Clear the RX buffer to get rid of a possible overflow or underflow
    halRfClearRx();

    // Enable RX frame done interrupt again
    halRfEnableRxInterrupt();
    return;
}

The structure rxPacket[5] is a structure I created to store the last 5 received packets. It contains an array rxHdr[12] that is a constant length and contains header information and an array rxCmd[115] that contains command information (as well as other info not relevant to this process). The last change I made between the time my nodes were locking up and the time they worked was adding the "|| ((len+NWK_FCS_LEN) > RXFIFOCNT)" to my frame length check.

It looks to me like the only time I could have been getting partial frames is when the device finished transmitting, the receiver came back online and it detected a 0000 in the middle of a frame (which can happen with the command they were sending). Does that make sense, or is there something else going on?

0 MaMoe over 16 years ago in reply to Fred10415

TI__Intellectual 1005 points

Are you using AUTOCRC=1? Why are you checking that the last byte is 0xEC? The FCS result is found in the most significant bit if you are using AUTOCRC. The lower 7 bits are a correlation value that is different from frame to frame.
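A small sketch of the check described above: test only the most significant bit of the last byte and treat the lower 7 bits as the correlation value. The helper and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_CRC_OK_MASK  0x80u   /* most significant bit: FCS result */
#define STATUS_CORR_MASK    0x7Fu   /* lower 7 bits: correlation value */

/* True when the appended status byte reports a good FCS. */
static bool frame_crc_ok(uint8_t status_byte)
{
    return (status_byte & STATUS_CRC_OK_MASK) != 0;
}

/* Correlation value, which differs from frame to frame. */
static uint8_t frame_correlation(uint8_t status_byte)
{
    return (uint8_t)(status_byte & STATUS_CORR_MASK);
}
```

Comparing the whole byte against 0xEC only accepts good frames whose correlation value happens to be 0x6C; a good frame ending in, say, 0xE9 would be thrown away.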

I cannot see exactly what goes wrong in your program. How do you handle the fact that there might be several frames in the rxfifo? There might be more than one frame in the rxfifo that needs to be processed on a rxpktdone interrupt. A useful tool to see if there are more data available is the FIFOP interrupt.

-M
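To illustrate handling several frames on one rxpktdone interrupt, here is a host-side sketch that walks a simulated RX FIFO frame by frame. The `rx_fifo_t` type, `drain_rx_fifo`, and the `demo_*` arrays are hypothetical stand-ins; the frame layout (length byte, then that many bytes, with the CRC result in bit 7 of the last one) follows the AUTOCRC description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for the RX FIFO. */
typedef struct {
    const uint8_t *bytes;
    size_t count;            /* plays the role of RXFIFOCNT */
} rx_fifo_t;

/* Walk every buffered frame and return how many passed the CRC.
 * *flushed is set when an implausible length forced a discard. */
static unsigned drain_rx_fifo(rx_fifo_t *fifo, int *flushed)
{
    unsigned good = 0;
    *flushed = 0;
    while (fifo->count > 0) {
        size_t len = fifo->bytes[0];
        if (len < 2 || len > 127 || len + 1 > fifo->count) {
            fifo->count = 0;             /* models a FIFO flush */
            *flushed = 1;
            break;
        }
        if (fifo->bytes[len] & 0x80) {   /* last byte of this frame */
            good++;
        }
        fifo->bytes += len + 1;          /* step over length byte and frame */
        fifo->count -= len + 1;
    }
    return good;
}

/* Demo data: two complete frames back to back, and one truncated frame. */
static const uint8_t demo_two_frames[] = { 3, 0x11, 0x20, 0xEC,
                                           4, 0x22, 0x33, 0x10, 0xE9 };
static const uint8_t demo_partial[]    = { 5, 0x01, 0x02 };
```

On real hardware a length larger than the current count can also mean a frame is still arriving, so whether to flush or wait in that case is a design decision this sketch does not settle.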

0 Fred10415 over 16 years ago in reply to MaMoe

Prodigy 130 points

Yes, I'm using AUTOCRC = 1 and I guess for my testing the correlation value has always been the same. Thanks for the tip, I'll have to change that.

To check if there's any data left in the RXFIFO, my ISR has a while loop in it that loops until RXFIFOCNT = 0 (see HAL_ISR_FUNCTION( rfIsr, RF_VECTOR ) in my last post).

Like I said, checking for a frame length longer than RXFIFOCNT appears to fix the problem. I'm just not sure why it wasn't generating an error, which makes me nervous as we start to build bigger networks.

Thanks for the help Moe,

-Fred

0 Klinkenbecker over 15 years ago in reply to Fred10415

Intellectual 615 points

I also had similar problems.

In testing I send raw (non-Zigbee) frames from one radio to the other. One node only sends, the other only receives.

After a few thousand frames (at varying power) the receiving node's radio reports FSMSTAT0=33 and FSMSTAT1=2, and it no longer receives frames.

Remember the receiver is not sending, but somehow the TX_ACTIVE bit is set and the radio state is >invalid< (state not found in Table 19.3 p209).

To fix: in the receive function, check whether TX_ACTIVE is set while SFD is low. This is an invalid state (I think) and you can reset the radio.

I use:

ISRFOFF();
ISRXON();

and the radio begins receiving again for the next few thousands of frames.
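The check could be sketched like this. The FSMSTAT1 bit positions (TX_ACTIVE at bit 1, SFD at bit 5) are assumptions to verify against the user's guide, and `radio_stuck_in_tx` is a hypothetical helper; on the target it would be fed the FSMSTAT1 register, and a true result would be followed by the two strobes shown above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed FSMSTAT1 bit positions; verify against the CC2530 user's guide. */
#define FSMSTAT1_TX_ACTIVE  (1u << 1)
#define FSMSTAT1_SFD        (1u << 5)

/* True when a receive-only node shows TX_ACTIVE set while SFD is low. */
static bool radio_stuck_in_tx(uint8_t fsmstat1)
{
    return (fsmstat1 & FSMSTAT1_TX_ACTIVE) != 0 &&
           (fsmstat1 & FSMSTAT1_SFD) == 0;
}
```

A register value of 2, as reported above, satisfies this condition.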

I have a feeling this is a result of a RX_FIFO overflow, but I am not certain and I may never bother to find out.

Enjoy ;-)

0 Klinkenbecker over 15 years ago in reply to Klinkenbecker

Intellectual 615 points

And just confirming - changing ISFLUSHRX() to 0xED instead of 0xEC fixes the (my) issue.
