Extra route requests / Missing route replies on 5437+CC2520+CC2591

nkn

Other Parts Discussed in Thread: CC2591, MSP430F5437, CC2520, Z-STACK, CC2531

Good morning,

we are using a custom board with the ProFLEX01 module, which is a MSP430F5437 + CC2520 + CC2591 design, in a small profiling application in which up to 40 router devices periodically send unicast timestamped AF status messages to a serial gateway (also a router). The serial gateway outputs information about the sent and received AF messages as well as its APS acknowledgements on its RS-232 port to a desktop application on a PC.

We have modified the Z-Stack-EXP5438-2.4.0 to use the proper RESETN pin for the ProFLEX01 Rev. C (pin 7.6 instead of pin 1.2) and have enabled the HAL_PA_LNA compilation flag for proper CC2520<->CC2591 communication. The stack profile used is ZigBee Pro (ZIGBEEPRO compilation flag). The RF settings (f8wConfig.cfg and f8wRouter.cfg) were left at the default. Additionally, we use the BUILD_ALL_DEVICES directive and set an appropriate zgLogicalDeviceType at runtime. We do not use many-to-one or source routing.

Now with about 10 routers turned on and sending status updates every 2 seconds, we can observe that some routers lose their route to the serial gateway and will issue a route request again. However, most of the time, no route reply is sent from the destination device, although we can see that the destination device is still transmitting. In our testing setup, none of the devices is turned off or moved. Setting ROUTE_EXPIRY_TIME=0 and increasing NWK_MAX_ROUTERS doesn't seem to help. We have also unsuccessfully tried to use the usual ZigBee profile.

What could be the problem? Thanks very much for any pointers!

over 14 years ago

0 "Double 0" over 14 years ago

TI__Expert 4155 points

Hi nkn,

Thanks for your note. It looks like you are pretty knee deep with the ProFlex module and the Z-Stack and quite knowledgeable about the products already. Thank you for choosing LS Research and TI products. It would be helpful if you could share a "clean" Ubiqua sniffer trace of time t = 0 to when the problem occurs. The sniffer should show the time period from the start of the formation of the network to when the route requests are observed. It will be interesting to see which node is generating the route requests etc. and an analysis of the link status messages from each router can give some information as to who it thinks its best neighboring routers are. In addition, here are a few tips/comments/questions to help narrow the problem down:

1) What is the physical separation between each router?

2) Is NV_RESTORE being used?

3) What is the transmit power of the modules?

4) Do all 10 routers try to transmit at roughly the same time within the 2 second interval? Typically it makes sense to add some random start time + some jitter so that when the nodes come up the likelihood that they all try to send their messages at the same time is minimized.

5) Try turning off frequency agility on all the nodes by setting ZDNWKMGR_MIN_TRANSMISSIONS=0 in f8wConfig.cfg.
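
As an illustration of tip 4, here is a minimal sketch in plain C of a randomized start delay plus per-report jitter. The function names and the three constants are hypothetical and chosen only for this example; the 2 second period is the one mentioned in the original post. In a Z-Stack application the random input would presumably come from osal_rand() and the returned delay would be handed to a timer call such as osal_start_timerEx(), but that wiring is an assumption and is not shown here.

```c
#include <stdint.h>

/* Hypothetical values for illustration only; none of these are Z-Stack options. */
#define REPORT_PERIOD_MS    2000u  /* nominal status interval from the thread */
#define MAX_JITTER_MS        128u  /* jitter window, must be a power of two */
#define MAX_START_DELAY_MS  2000u  /* spread the first report over one period */

#if (MAX_JITTER_MS & (MAX_JITTER_MS - 1u)) != 0u
#error "MAX_JITTER_MS must be a power of two for the mask below"
#endif

/* Delay before the first report after power-up: 0 .. MAX_START_DELAY_MS-1 ms. */
uint16_t first_report_delay_ms(uint16_t rnd)
{
    return (uint16_t)(rnd % MAX_START_DELAY_MS);
}

/* Delay before each following report: nominal period plus 0 .. 127 ms. */
uint16_t next_report_delay_ms(uint16_t rnd)
{
    return (uint16_t)(REPORT_PERIOD_MS + (rnd & (MAX_JITTER_MS - 1u)));
}
```

The random start delay matters mostly when many nodes power up together (for example after a mains outage); the per-report jitter keeps nodes from staying locked in step afterwards.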

0 nkn over 14 years ago in reply to "Double 0"

Prodigy 115 points

Hello Double 0, thank you very much for your answer.

1) At the moment, the devices are on the same desk, about 20 cm apart. I have tried placing them 2-3 m apart from each other, but the observed behavior remains.

2) Not anymore, but we first observed the behavior with NV_RESTORE before reverting to NV_INIT. The attached packet sniffer snapshots are with NV_INIT enabled, so the problem is still there. However, I also rechecked the packet sniffer snapshots we created when NV_RESTORE was still turned on and stumbled upon a broadcast NWK status message with status 0x02 for one device, i.e. indicating "Non-tree link failure: The routing failure did not occur as a result of an attempt to route along the tree.". I was unable to find more information about this status, but I believe this could be related?

3) We are using a transmit power of 18 (non-register value) against logical channel 0x0B. I have not yet tried to turn the transmit power down.

4) Yes, they are programmed to send a message at the same interval, without jitter, but, at the moment, are turned on by hand one by one by inserting the batteries, so I don't think this should be a problem; however, I will keep this hint in mind.

5) Thanks for the suggestion, but the problem still occurs with the changed setting. I have attached a packet sniffer snapshot created with ZDNWKMGR_MIN_TRANSMISSIONS=0.

We do not have access to the Ubiqua packet sniffer or hardware supported by it, so I attached two snapshots created with the SmartRF packet sniffer (sniffer revision T, stack profile ZigBee PRO 2007, hardware CC2520), which I hope is fine as well. Here is a short description of the important events in the snapshot

extrarouterequests_missingroutereplies.psd, sorted by the sniffer's RX packet number:

#2 - The coordinator with short address 0x0000 has started up and created a PAN with ID 0xE100.

#13 - The serial gateway has successfully joined the PAN and received the short address 0xCE98.

#33 - Router A has successfully joined the PAN and received the short address 0x6A69.

#40 - Router A [0x6A69] sends a route request to find a route to the serial gateway [0xCE98].

#56 - Router A [0x6A69] receives a route reply for the serial gateway [0xCE98] from the coordinator.

#149 - Router B has successfully joined the PAN and received the short address 0xD877.

#170 - Router B [0xD877] sends a route request to find a route to the serial gateway [0xCE98].

#172 - Router B [0xD877] receives a route reply for the serial gateway [0xCE98] from the serial gateway itself.

#386 - Router C has successfully joined the PAN and received the short address 0xEE33.

#434 - Router C [0xEE33] sends a route request to find a route to the serial gateway [0xCE98].

#436 - Router C [0xEE33] receives a route reply for the serial gateway [0xCE98] from the serial gateway itself.

… Meanwhile, messages are sent without problems ...

#1163 - Router C [0xEE33] seems to have lost its route to the serial gateway [0xCE98] and sends out another route request.

#1217 - Last broadcast for Router C's [0xEE33] route request. There is no reply.

#1220 - Although not having received a route reply, router C [0xEE33] sends an AF message (APS counter 31) to the serial gateway [0xCE98] anyway and receives a MAC acknowledgement.

#1239 - The serial gateway [0xCE98] issues a route request for router C [0xEE33], which is rebroadcast until #1273, but receives no reply. It then proceeds to send an APS acknowledgement at #1277 for the AF message with APS counter 31 anyway.

… The problems go on, lost routes, route requests and no route replies.

From what I can tell through debugging, some of the messages, which are sent without having first received a route reply for the destination, are actually received at the target, but very randomly. In fact, I can observe jumps (i.e. no sequence with message transaction ID increased only by one) in the message transaction ID for received messages of AF_INCOMING_MSG_CMD and AF_DATA_CONFIRM.
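
To put a number on those jumps, a small helper like the following could be used on the receiving side. This is only a sketch: it assumes the transaction ID is an 8-bit counter that increments by one per message and wraps from 255 to 0, and the names seq_gap, seq_tracker_t and seq_tracker_update are made up for this example (they are not Z-Stack APIs).

```c
#include <stdint.h>

/* Number of messages missing between two consecutively received counters.
 * 0 means the sequence is contiguous. The cast to uint8_t makes the
 * arithmetic modulo 256, so the 255 -> 0 wrap is handled.
 * Note: a duplicate (curr == prev) yields 255 and should be filtered
 * by the caller if duplicates are possible. */
uint8_t seq_gap(uint8_t prev, uint8_t curr)
{
    return (uint8_t)(curr - prev - 1u);
}

typedef struct {
    uint8_t  last;       /* last counter value seen */
    uint8_t  have_last;  /* 0 until the first message has arrived */
    uint32_t lost;       /* running total of missing messages */
} seq_tracker_t;

/* Feed every received counter value into the tracker, one tracker per sender. */
void seq_tracker_update(seq_tracker_t *t, uint8_t counter)
{
    if (t->have_last) {
        t->lost += seq_gap(t->last, counter);
    }
    t->last = counter;
    t->have_last = 1u;
}
```

Keeping one tracker per source address and printing `lost` on the serial port would show whether the losses correlate with the extra route requests.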

Note that in this example, I have only turned on the coordinator + serial gateway + three routers. With more routers being turned on, more routers start a route request to the serial gateway as observed in this example.

[For what it's worth: Could this be a regression from the optimized route request/response handling introduced in Z-Stack 2.3.0? I found the following in the 2.3.0 changelog:

- Improved delivery of unicast response messages in a larger network that has broadcast "storms" - route responses could get lost when a route request occured. A delay was added to re-broadcasts of route requests and limited queuing of incoming broadcast messages. [3099]]

extrarouterequests_missingroutereplies.zip

0 nkn over 14 years ago in reply to nkn

Prodigy 115 points

In the meantime, I have tried some more ways to fix the problem and can add some information:

3) Setting a transmit power of 4 or 8 doesn't help. Also, I tried configuring the CC2591 in low-gain mode by calling the HAL_PA_LNA_RX_LGM() macro, which did not have an effect on the observed route requests.

4) I did add a 0-127ms jitter to the send interval periods, without effect.

Apart from that, I have also unsuccessfully tried to increase MAX_NEIGHBOR_ENTRIES and to change various broadcast settings (e.g. queue length).

0 nkn over 14 years ago in reply to nkn

Prodigy 115 points

Does anyone have some more suggestions?

0 "Double 0" over 14 years ago in reply to nkn

TI__Expert 4155 points

Hi nkn,

Thank you again for your detailed responses and for spending time investigating the problem. I have been trying to analyze the problem further, and what I notice is that there don't seem to be regular link status messages coming from all the nodes in the network. I'm not sure if this is just an artifact of where the sniffer was placed in relation to the rest of the nodes, but if the assumption is that all nodes are within earshot, then the sniffer should be picking up more regular link status messages from all the nodes. Each router should send a link status message at roughly 15 second intervals. If they are not doing this, then perhaps this is a clue to the aberrant behavior.
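
One way to check this offline is to export the link status timestamps for each router from the sniffer log and count the intervals that are clearly too long. The sketch below assumes the timestamps have already been extracted by hand into an ascending array of seconds for one router; the function name and the tolerance of two periods are hypothetical choices for this example, and only the 15 second nominal period comes from the explanation above.

```c
#include <stddef.h>
#include <stdint.h>

#define LINK_STATUS_PERIOD_S 15u  /* nominal interval quoted above */
#define MISSED_TOLERANCE      2u  /* hypothetical: flag gaps longer than 2 periods */

/* Counts the gaps between consecutive link status timestamps (in seconds,
 * ascending, all from one router) that exceed the tolerated interval. */
size_t count_late_link_status(const uint32_t *ts_s, size_t n)
{
    size_t late = 0;
    for (size_t i = 1; i < n; ++i) {
        if (ts_s[i] - ts_s[i - 1] > LINK_STATUS_PERIOD_S * MISSED_TOLERANCE) {
            ++late;
        }
    }
    return late;
}
```

A router that shows late intervals while its data frames are still visible in the same capture would point at the node rather than at the sniffer placement.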

This certainly is not normal behavior and something that we have not seen in our own 400 node test network.

If you can, I would recommend getting the CC2531 USB dongle ($49 each from the TI e-store) and try to capture this with the Ubilogix sniffer. I think you can download a 30-day free trial from ubilogix.com. The problem is a lot easier to analyze with this sniffer.

0 "Double 0" over 14 years ago in reply to "Double 0"

TI__Expert 4155 points

Could I also entice you to try this experiment with Z-Stack 2.5.0, the latest release? We have addressed some potential pitfalls with routing in this release.

0 "Double 0" over 14 years ago in reply to nkn

TI__Expert 4155 points

Just to confirm, are you using the default settings of MAX_RTG_ENTRIES=40 and MAX_NEIGHBOR_ENTRIES=16?

0 nkn over 14 years ago in reply to "Double 0"

Prodigy 115 points

Hello Double 0, thanks very much for your analysis. Sorry for the late answer, but I was on holiday for a few days.

> CC2531: It seems that the sniffer USB dongle is currently out of stock, so it's possibly not an option right now ;)

> Z-Stack 2.5.0: I have tried porting the current Z-Stack version to the ProFLEX01 module and our custom board, but I can't seem to bring the coordinator up. Actually, it just stops in the state PAN_CHNL_SELECTION and doesn't proceed; when I turn on/off the board a few times, the coordinator randomly does manage to come up, but devices turned on later are unable to join the network.

> Default settings: Yes, both settings are at their defaults. I have also tried increasing MAX_NEIGHBOR_ENTRIES to 40 without effect.

Based on your conclusion with the link status messages, I have tried to increase NWK_ROUTE_AGE=30 (i.e. number of missed link status frames) and also tried to turn off the link status messages by setting LINK_STATUS_PERIOD=0. While it seems to indeed help with a setup of 1x coordinator + 4x routers (I don't see the extra route requests within a timespan of 10 minutes), the problem again appears with 1x coordinator + 14 routers.

Do you think trying to change the CCA mode (as described in "Z-Stack How to change CCA mode?") would be worthwhile?

0 Dh Sh over 14 years ago in reply to nkn

Intellectual 645 points

Guys,

Did you have any luck? I am seeing a very similar issue.

Here is my configuration:

- 11 Nodes all configured as routers (all sending out reports to Gateway)

- Implementation is based exactly off Sensor Demo application.

- Stack is 2.4.0-1.4.0

- All nodes seem to be storing nwk info in NVRAM.

After a reset the entire network comes up. All nodes respond and I get the desired data. After a few hours of operation the whole network becomes unstable: some nodes stop sending data while others keep sending, but I see a lot of "Route Req" and "Match Desc Req" messages going around in the network. See below for a snapshot. The basic questions I have are:

- Is it a healthy sign that a "Route Req" and a "Match Desc Req" are generated every few seconds? I do not understand why this is happening, especially if that router is already successfully sending data to the gateway. I would understand such requests if they came from disconnected nodes.

- I took one node from this network and tested it on the bench. I found it was generating many "Route Reqs" and sending only partial data after dropping a few packets, even after turning it off and on. However, when I flashed new firmware, it stopped sending such a flood of requests and started responding neatly to the gateway: only a few Route Reqs initially, after that none.

I suspect this node, along with a few other nodes in the earlier network, went stale. But I still do not understand why and how this happened.

Any clue what is going on here?

(This snapshot is after few hours of NWK formation)

36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	62	0xFC63	0xFFFD	0xA6	0x1D
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	63	0x035B	0xFFFD	0xED	0x21
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	64	0x65A6	0xFFFD	0x67	0x1B
33	11	NWK: Route Request	0xC301	0x0000	0xFFFF	65	0x0000	0xFFFD	0x63
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	66	0x035B	0xFFFD	0xED	0x21
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	67	0x65A6	0xFFFD	0x67	0x1B
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	68	0x65A6	0xFFFD	0x67	0x1B
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	69	0x9C1E	0xFFFD	0x9D	0xE8
33	11	NWK: Route Request	0xC301	0x0000	0xFFFF	70	0x0000	0xFFFD	0x65
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	71	0x9C1E	0xFFFD	0x9D	0xE8
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	72	0x9C1E	0xFFFD	0x9D	0xE8
50	11	Reserved	0x7701	0xE800	0xFFF8	128
5	11	Acknowledgment				33
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	73	0x8878	0xFFFD	0x1F	0xE4
33	11	NWK: Route Request	0xC301	0x0000	0xFFFF	74	0x0000	0xFFFD	0x67
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	75	0xD1C5	0xFFFD	0xDF	0xFA
33	11	NWK: Route Request	0xC301	0x0000	0xFFFF	76	0x0000	0xFFFD	0x69
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	77	0x8878	0xFFFD	0x1F	0xE4
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	78	0xD1C5	0xFFFD	0xDF	0xFA
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	79	0x8878	0xFFFD	0x1F	0xE4
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	80	0xD1C5	0xFFFD	0xDF	0xFA
41	11	NWK: Link Status	0xC301	0x0000	0xFFFF	81	0x0000	0xFFFC	0x6A
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	82	0x6C39	0xFFFD	0xFE	0x42
33	11	NWK: Route Request	0xC301	0x0000	0xFFFF	83	0x0000	0xFFFD	0x6C
33	11	Reserved	0xB301	0x1029	0xFFFB	39	0x691D	0x0000	0x70
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	84	0x6C39	0xFFFD	0xFE	0x42
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	85	0x6C39	0xFFFD	0xFE	0x42
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	86	0xFC63	0xFFFD	0xA8	0x1E
33	11	NWK: Route Request	0xC301	0x0000	0xFFFF	87	0x0000	0xFFFD	0x6E
5	11	Acknowledgment				43
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	88	0xFC63	0xFFFD	0xA8	0x1E
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	89	0x035B	0xFFFD	0xEF	0x22
33	11	NWK: Route Request	0xC301	0x0000	0xFFFF	90	0x0000	0xFFFD	0x70
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	91	0xFC63	0xFFFD	0xA8	0x1E
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	92	0x035B	0xFFFD	0xEF	0x22
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	93	0x035B	0xFFFD	0xEF	0x22
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	94	0x65A6	0xFFFD	0x69	0x1C
33	11	NWK: Route Request	0xC301	0x0000	0xFFFF	95	0x0000	0xFFFD	0x72
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	96	0x65A6	0xFFFD	0x69	0x1C
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	97	0x65A6	0xFFFD	0x69	0x1C
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	98	0x9C1E	0xFFFD	0x9F	0xE9
33	11	NWK: Route Request	0xC301	0x0000	0xFFFF	99	0x0000	0xFFFD	0x74
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	100	0x9C1E	0xFFFD	0x9F	0xE9
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	101	0x9C1E	0xFFFD	0x9F	0xE9
41	11	NWK: Link Status	0xC301	0x0000	0xFFFF	102	0x0000	0xFFFC	0x75
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	103	0xD1C5	0xFFFD	0xE1	0xFB
33	11	NWK: Route Request	0xC301	0x0000	0xFFFF	104	0x0000	0xFFFD	0x77
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	105	0x8878	0xFFFD	0x21	0xE5
33	11	NWK: Route Request	0xC301	0x0000	0xFFFF	106	0x0000	0xFFFD	0x79
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	107	0xD1C5	0xFFFD	0xE1	0xFB
36	11	ZDP: Match_Desc_req	0xC301	0x0000	0xFFFF	108	0x8878	0xFFFD	0x21	0xE5

0 Leila Keyvani over 14 years ago in reply to Dh Sh

Expert 1090 points

Hi,

I wonder if you figured out the problem. I have something similar going on. My router is working fine and all of a sudden starts a route request. I have also disabled APS ack.

Thanks,

Leila

0 Darren_NZ over 13 years ago in reply to Dh Sh

Intellectual 375 points

Hi guys,

I seem to have observed a similar issue. I have found that "APS Ack" and "ZDP:MatchDescRsp" messages are not being routed back to the source device in a multi-hop network with a concentrator. I have witnessed this problem in several networks with route paths two to three hops deep.

Observed behaviour:
When a network is newly formed, all data packets are correctly routed to and from the network concentrator when each device reports in at a set interval. The network will work like this for days, but over time devices that are three to four hops deep seem to disappear from the network, i.e. they stop reporting to the network concentrator. I believe this is because they are not receiving the "APS Ack" for their report messages and are unbinding from the concentrator, as it appears not to exist. It also appears that when the now unbound device tries to find the concentrator again through "MatchDescReq", it is not receiving the "MatchDescRsp".

I have also found that if I try to connect by brute force to one of the devices that appears to have disappeared from the network, i.e. by constantly querying for an attribute, I can eventually establish a link to the device. Once a link is re-established, the device will rebind to the concentrator and start reporting again for a period, before eventually stopping again. I have also found that if I add a new concentrator to the network, all devices will bind and subsequently create a routing path to the new concentrator and report in, even those that have stopped reporting to the old concentrator. This suggests that all the devices in the network are stable and running.

I believe the problem is caused by source routing tables not being recorded properly or routing tables not being updated correctly. I have tried experimenting with different settings in the network concentrator, i.e. setting different discovery times and route cache options, but I always get the same result.

Has anyone had the same issue and managed to resolve it?

----------------------------------------------------------------------------------------------------------------------------------------

My test network:
I have a real-world network set up with 14 routers spread across a campus made up of 5 buildings, with open spaces and trees in between the buildings. My network is broken down as:
6 routers within the same room as the concentrator (my office), one hop from the concentrator,
3 routers set at two hops,
2 routers set at three hops,
3 routers set at four hops.

Hardware:
Customer boards with CC2530 + CC2591 into an F antenna
Running Z-Stack 2.5.0; I think I have also observed this behaviour with Z-Stack 2.3.1

MAC Settings:
MAC_CFG_TX_DATA_MAX         3
MAC_CFG_TX_MAX                     5
MAC_CFG_RX_MAX                     5

Network settings:
ROUTE_EXPIRY_TIME=0
APSC_ACK_WAIT_DURATION_POLLED=3000
NWK_INDIRECT_MSG_TIMEOUT=30

MAX_RTG_SRC_ENTRIES =12
SRC_RTG_EXPIRY_TIME =10

MAX_RREQ_ENTRIES=8
APSC_MAX_FRAME_RETRIES=3
MAX_POLL_FAILURE_RETRIES=2
MAX_BCAST=9

Concentrator settings I have tried:
CONCENTRATOR_ENABLE true
CONCENTRATOR_DISCOVERY_TIME 0, 15 and 60
CONCENTRATOR_ROUTE_CACHE false and true (not sure if the CC2530 can support a route cache)

0 adarsh.pll over 12 years ago in reply to Darren_NZ

Intellectual 580 points

Dear ALL

We too are facing the same problem as above. Please share if anybody has found a workaround or solution.

Hoping for some support.

Thanks in Advance

Adarsh
