Other Parts Discussed in Thread: CC2592
Hello,
We have a pretty simple design with the CC2652R and the CC2592 and are using the SimpleLink CC13x2/CC26x2 SDK 4.20.01.04 stack. We are running our router application on dozens of products, and it generally runs well.
During our testing, we have seen cases where one of our routers powers on and is able to transmit and receive for several seconds, but after that it no longer sends or receives anything. In a sniffer capture, the packets it did send have strong RSSI (e.g., -40 or -50 dBm). But after some number of seconds (2-15?), it no longer routes, repeats broadcasts, or even sends link status packets.
We added code to the router that triggers a watchdog reset if it doesn't receive any packets for 35 minutes, and after the reset it operates fine. It is rare for a router to get into this state, and the state does not persist across the reset.
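The inactivity check described above can be sketched as a counter that the application clears on every received packet and advances from a periodic timer. This is a minimal sketch under stated assumptions: the names `rx_watchdog_t`, `rx_watchdog_on_packet()`, and `rx_watchdog_tick()` are hypothetical, it assumes a 1-second tick, and the snapshot save and the actual watchdog reset are left to the caller when the tick reports expiry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: 35 minutes expressed in 1-second ticks. */
#define RX_INACTIVITY_LIMIT_S (35u * 60u)

typedef struct {
    uint32_t seconds_since_rx; /* seconds since the last received packet */
} rx_watchdog_t;

/* Call from the receive path, e.g. wherever incoming AF messages are handled. */
static void rx_watchdog_on_packet(rx_watchdog_t *wd)
{
    wd->seconds_since_rx = 0;
}

/* Call once per second from a periodic timer (assumed 1 s period).
 * Returns true exactly once, on the tick where the limit is reached; the
 * caller would then save its debug snapshot and force the watchdog reset. */
static bool rx_watchdog_tick(rx_watchdog_t *wd)
{
    if (wd->seconds_since_rx < RX_INACTIVITY_LIMIT_S) {
        wd->seconds_since_rx++;
        return wd->seconds_since_rx == RX_INACTIVITY_LIMIT_S;
    }
    return false; /* already expired; the caller has been notified */
}
```

Returning true only on the expiry tick keeps the snapshot from being written repeatedly if the reset is delayed for some reason.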
When the 35-minute timer expires without any packets being received, my code reads a number of values from the stack and application and writes them to a nonvolatile blob. After the reset, I send this blob in a special packet so we can see what state the stack was in while it was not transmitting or receiving anything. Here are some of the data items I'm saving, and how I'm getting them:
1. What the stack thinks the PAN ID and channel are, along with the rx_on_when_idle flag and if it's part of a network:
- I call Zstackapi_sysConfigReadReq() every 15 seconds and read the panID, chanList, macRxOnIdle, and devPartOfNetwork values. When a router gets into this bad state and the 35-minute timer expires, I save the most recently read values to nonvolatile memory right before forcing the watchdog reset.
2. If the stack thinks it is joined:
- This method is probably redundant with reading the devPartOfNetwork field using the Zstackapi_sysConfigReadReq() call in #1.
- In the zclGenericApp_ProcessCommissioningStatus() function, in the BDB_COMMISSIONING_NWK_STEERING case handler, I set a boolean flag to true if the bdbCommissioningStatus is BDB_COMMISSIONING_SUCCESS, and to false otherwise.
3. Monitor ZStackMessages to see if any "leave" or other unexpected messages are received.
- In zclGenericApp_processZStackMsgs(), I shift the ID of each received event into a U32 if it doesn't match the most recently recorded event, and I keep a counter of the number of received stack messages. This tracks the 4 most recent (distinct) stack events, plus the total number of stack events received.
4. Transmit Packet Status
- Our application sends a packet to the coordinator every 15 seconds. I track the number of packets we think we are sending, and the return status value from AF_DataRequest().
5. Number of Received Packets
- I increment a counter every time zclGenericApp_processAfIncomingMsgInd() is called. The coordinator sends broadcast commands once every couple of minutes that the router should receive.
6. Transmit Power Level
- I record the value passed to ZMacSetTransmitPower() any time the transmit power is set.
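As an illustration of how the items above might be packed into the nonvolatile blob, here is a minimal sketch. The struct layout and every field and function name are hypothetical, and writing the struct to nonvolatile storage is left out. It assumes each ZStack message ID fits in 8 bits, so a U32 holds the four most recent distinct IDs, with the newest in the low byte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the debug snapshot written to nonvolatile memory. */
typedef struct {
    /* 1. Values from the most recent Zstackapi_sysConfigReadReq() poll */
    uint16_t pan_id;
    uint32_t chan_list;
    bool     mac_rx_on_idle;
    bool     dev_part_of_network;
    /* 2. Set from the BDB_COMMISSIONING_NWK_STEERING status */
    bool     joined;
    /* 3. Four most recent distinct ZStack message IDs, newest in the low byte */
    uint32_t stack_event_history;
    uint32_t stack_event_count;
    /* 4. Application transmit bookkeeping */
    uint32_t tx_attempts;
    uint8_t  last_af_status;
    /* 5. Calls to the incoming AF message handler */
    uint32_t rx_count;
    /* 6. Last value passed to ZMacSetTransmitPower(), stored as signed dBm */
    int8_t   tx_power;
} debug_snapshot_t;

/* Item 3: count every stack message, but only shift the ID into the history
 * when it differs from the most recently recorded one. */
static void record_stack_event(debug_snapshot_t *s, uint8_t event_id)
{
    s->stack_event_count++;
    if ((uint8_t)(s->stack_event_history & 0xFFu) != event_id) {
        s->stack_event_history = (s->stack_event_history << 8) | event_id;
    }
}
```

With this scheme a repeated message (such as two back-to-back AF_DATA_CONFIRM_IND events) increments the counter but occupies only one slot in the history.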
There are many other fields, but these are most of the stack-related settings. When the router reset and sent the debug information, I extracted the following:
#1 - The values read from Zstackapi_sysConfigReadReq() indicate the panID and channel are correct. The RxOnWhenIdle setting is true, and the devPartOfNetwork setting is also true. So the stack seems to think it's connected and operating on the right channel & PAN ID.
#2 - My boolean "joined" flag is set to true.
#3 - The most recent ZStack messages include: BDB_Notification (0xc5), AF_DATA_CONFIRM_IND (0x91), and INCOMING_MSG_IND (0x92). (There were two 0x91 messages.) The number of received stack messages is 4.
#4 - The number of application messages in the counter is 140, which matches the expected number (4 per minute * 35 minutes = 140). The most recent AF_DataRequest() return status was SUCCESS. So the application is sending messages when it should, and the stack is responding with a SUCCESS status, even though they are never seen in the sniffer capture, or received by the coordinator.
#5 - The number of received packets is only 1.
#6 - The transmit power level is set to what we expect (0xF7, or -9 dBm). (Keep in mind this is the power level that feeds into the CC2592.)
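Two of the values above can be double-checked with a little arithmetic: the expected transmit count for a 15-second period over 35 minutes, and the signed interpretation of the raw power byte 0xF7. The sketch below also unpacks a U32 event history into individual message IDs. It assumes the newest ID sits in the low byte (an assumption about how the history was shifted in), and all helper names are hypothetical.

```c
#include <stdint.h>

/* Expected application transmissions: one every period_s seconds. */
static uint32_t expected_tx_count(uint32_t minutes, uint32_t period_s)
{
    return (minutes * 60u) / period_s;
}

/* Interpret the raw power byte as a two's-complement dBm value,
 * written so it does not rely on implementation-defined casts. */
static int8_t power_byte_to_dbm(uint8_t raw)
{
    return (raw < 0x80u) ? (int8_t)raw : (int8_t)((int)raw - 256);
}

/* Split a U32 history into IDs, with out[0] = newest (low byte). */
static void unpack_event_history(uint32_t history, uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(history >> (8 * i));
    }
}
```

For example, `expected_tx_count(35, 15)` gives the 140 messages counted in #4, and `power_byte_to_dbm(0xF7)` gives -9.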
I also noticed that when the router is reset after the 35-minute timeout, it ASSOCIATES and joins a new network, receiving a new short address. That seems to indicate something is wrong with the stack settings, causing it to join instead of just resuming operation on its current network.
--
So from what I can tell, when the router stops transmitting or receiving, the stack thinks it is joined to the correct network, the power level is set correctly, and the stack thinks it should be able to transmit packets since it is not returning an error. A couple of questions:
1. Are these valid conclusions? Am I reading the right APIs to see if the stack thinks it is joined and what channel and pan ID it thinks it's operating on? Or is there a better way to get that information? I've wondered if the radio is operating on a different channel for some reason, but that doesn't seem to be the case.
2. Are there any other APIs we could call or settings we can check to try and figure out why the radio is not transmitting or receiving, or to confirm if the radio or stack is currently enabled?
3. Are there any other hardware pins or firmware settings we should be looking at to troubleshoot this failure?
--
Lastly, we just noticed this problem seems very similar to the reported problem here, although that was posted years ago with a much older stack. We do have a Wi-Fi router operating close by, though, so channel interference could be a culprit. If this is a similar problem to the one in the link, I should be able to detect it as suggested in that post by monitoring the number of neighbors with >0 link cost, and then reset the router if that number drops to 0.
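The detection idea from that post could look something like the following: count neighbors whose link cost is non-zero and flag a reset when that count is zero. This is only a sketch. `neighbor_entry_t`, `count_linked_neighbors()`, and `should_reset_for_links()` are hypothetical, and how the neighbor table and its link costs are actually read depends on the Z-Stack version. It also assumes a design choice of requiring several consecutive zero readings, so that a single transient sample does not trigger a reset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the fields needed from one neighbor-table entry. */
typedef struct {
    uint16_t short_addr;
    uint8_t  tx_link_cost; /* 0 = no usable link */
} neighbor_entry_t;

/* Number of neighbors with a non-zero link cost. */
static size_t count_linked_neighbors(const neighbor_entry_t *tbl, size_t n)
{
    size_t linked = 0;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].tx_link_cost > 0) {
            linked++;
        }
    }
    return linked;
}

/* Call periodically, e.g. from an application timer. Returns true while the
 * last `limit` consecutive checks have all found no neighbor with a non-zero
 * link cost; any check that finds a linked neighbor clears the streak. */
static bool should_reset_for_links(const neighbor_entry_t *tbl, size_t n,
                                   uint32_t *zero_streak, uint32_t limit)
{
    if (count_linked_neighbors(tbl, n) == 0) {
        (*zero_streak)++;
    } else {
        *zero_streak = 0;
    }
    return *zero_streak >= limit;
}
```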