Extra route requests / Missing route replies on 5437+CC2520+CC2591

Other Parts Discussed in Thread: CC2591, MSP430F5437, CC2520, Z-STACK, CC2531

Good morning,
    
we are using a custom board with the ProFLEX01 module, which is an MSP430F5437 + CC2520 + CC2591 design, in a small profiling application in which up to 40 router devices periodically send unicast timestamped AF status messages to a serial gateway (also a router). The serial gateway outputs information about the sent and received AF messages as well as its APS acknowledgements on its RS-232 port to a desktop application on a PC.
      
We have modified the Z-Stack-EXP5438-2.4.0 to use the proper RESETN pin for the ProFLEX01 Rev. C (pin 7.6 instead of pin 1.2) and have enabled the HAL_PA_LNA compilation flag for proper CC2520<->CC2591 communication. The stack profile used is ZigBee Pro (ZIGBEEPRO compilation flag). The RF settings (f8wConfig.cfg and f8wRouter.cfg) were left at the default. Additionally, we use the BUILD_ALL_DEVICES directive and set an appropriate zgLogicalDeviceType at runtime. We do not use many-to-one or source routing.
     
Now with about 10 routers turned on and sending status updates every 2 seconds, we can observe that some routers lose their route to the serial gateway and will issue a route request again. However, most of the time, no route reply is sent from the destination device, although we can see that the destination device is still transmitting. In our testing setup, none of the devices is turned off or moved. Setting ROUTE_EXPIRY_TIME=0 and increasing NWK_MAX_ROUTERS doesn't seem to help. We have also unsuccessfully tried to use the usual ZigBee profile.
    
What could be the problem? Thanks very much for any pointers!

 

  • Hi nkn,

    Thanks for your note. It looks like you are pretty knee deep with the ProFlex module and the Z-Stack and quite knowledgeable about the products already. Thank you for choosing LS Research and TI products. It would be helpful if you could share a "clean" Ubiqua sniffer trace of time t = 0 to when the problem occurs. The sniffer should show the time period from the start of the formation of the network to when the route requests are observed. It will be interesting to see which node is generating the route requests etc. and an analysis of the link status messages from each router can give some information as to who it thinks its best neighboring routers are. In addition, here are a few tips/comments/questions to help narrow the problem down:

    1) What is the physical separation between each router?

    2) Is NV_RESTORE being used?

    3) What is the transmit power of the modules?

    4) Do all 10 routers try to transmit at roughly the same time within the 2 second interval? Typically it makes sense to add some random start time + some jitter so that when the nodes come up the likelihood that they all try to send their messages at the same time is minimized.

    5) Try turning off frequency agility on all the nodes by setting ZDNWKMGR_MIN_TRANSMISSIONS=0 in f8wConfig.cfg.
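
    Tip 4 above could be sketched roughly as follows. This is a minimal, self-contained illustration and not Z-Stack code: the names REPORT_PERIOD_MS, MAX_JITTER_MS, jitter_next and next_report_delay_ms are made up for this example, and the 250 ms jitter window is an arbitrary assumption rather than a recommended value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names and values, chosen for illustration only. */
#define REPORT_PERIOD_MS 2000u /* 2 second status interval from the thread */
#define MAX_JITTER_MS     250u /* assumed jitter window, not a tuned value */

/* 16-bit xorshift PRNG so the sketch has no dependencies.
 * The state must be seeded with a non-zero, per-device value
 * (for example derived from the device's IEEE address). */
static uint16_t jitter_next(uint16_t *state)
{
    uint16_t x = *state;
    assert(x != 0u); /* a zero state would stay zero forever */
    x ^= (uint16_t)(x << 7);
    x ^= (uint16_t)(x >> 9);
    x ^= (uint16_t)(x << 8);
    *state = x;
    return x;
}

/* Delay until the next status report: nominal period plus 0..MAX_JITTER_MS. */
static uint16_t next_report_delay_ms(uint16_t *state)
{
    uint16_t jitter = (uint16_t)(jitter_next(state) % (MAX_JITTER_MS + 1u));
    return (uint16_t)(REPORT_PERIOD_MS + jitter);
}
```

    In a Z-Stack application the returned delay would presumably be handed to the OSAL timer API (for example osal_start_timerEx()), and osal_rand() could stand in for the PRNG. Treat both as assumptions to check against the OSAL documentation of your release.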

     

  • Hello Double 0, thank you very much for your answer.
          
    1) At the moment, the devices are on the same desk, about 20 cm apart. I have tried placing them 2-3 m apart from each other, but the observed behavior remains.
        
    2) Not anymore. We first observed the behavior with NV_RESTORE before reverting to NV_INIT. The attached packet sniffer snapshots were taken with NV_INIT enabled, so the problem is still there. I also rechecked the packet sniffer snapshots we created when NV_RESTORE was still turned on and stumbled upon a broadcast NWK status message with status 0x02 for one device, i.e. indicating "Non-tree link failure: The routing failure did not occur as a result of an attempt to route along the tree.". I was unable to find more information about this status, but I believe this could be related?
        
    3) We are using a transmit power of 18 (non-register value) against logical channel 0x0B. I have not yet tried to turn the transmit power down.
        
    4) Yes, they are programmed to send a message at the same interval, without jitter, but, at the moment, are turned on by hand one by one by inserting the batteries, so I don't think this should be a problem; however, I will keep this hint in mind.
        
    5) Thanks for the suggestion, but the problem still occurs with the changed setting. I have attached a packet sniffer snapshot created with ZDNWKMGR_MIN_TRANSMISSIONS=0.
        
    We do not have access to the Ubiqua packet sniffer or hardware supported by it, so I attached two snapshots created with the SmartRF packet sniffer (sniffer revision T, stack profile ZigBee PRO 2007, hardware CC2520), which I hope is fine as well. Here is a short description of the important events in the snapshot
    extrarouterequests_missingroutereplies.psd, sorted by the sniffer's RX packet number:
       
    #2 - The coordinator with short address 0x0000 has started up and created a PAN with ID 0xE100.
       
    #13 - The serial gateway has successfully joined the PAN and received the short address 0xCE98.
       
    #33 - Router A has successfully joined the PAN and received the short address 0x6A69.
       
    #40 - Router A [0x6A69] sends a route request to find a route to the serial gateway [0xCE98].
      
    #56 - Router A [0x6A69] receives a route reply for the serial gateway [0xCE98] from the coordinator.
       
    #149 - Router B has successfully joined the PAN and received the short address 0xD877.
      
    #170 - Router B [0xD877] sends a route request to find a route to the serial gateway [0xCE98].
      
    #172 - Router B [0xD877] receives a route reply for the serial gateway [0xCE98] from the serial gateway itself.
       
    #386 - Router C has successfully joined the PAN and received the short address 0xEE33.
       
    #434 - Router C [0xEE33] sends a route request to find a route to the serial gateway [0xCE98].
      
    #436 - Router C [0xEE33] receives a route reply for the serial gateway [0xCE98] from the serial gateway itself.
       
    … Meanwhile, messages are sent without problems ...
       
    #1163 - Router C [0xEE33] seems to have lost its route to the serial gateway [0xCE98] and sends out another route request.
       
    #1217 - Last broadcast for Router C's [0xEE33] route request. There is no reply.
       
    #1220 - Although not having received a route reply, router C [0xEE33] sends an AF message (APS counter 31) to the serial gateway [0xCE98] anyway and receives a MAC acknowledgement.
       
    #1239 - The serial gateway [0xCE98] issues a route request for router C [0xEE33], which is broadcast until #1273, but receives no reply. It then proceeds to send an APS acknowledgement at #1277 for the AF message with APS counter 31 anyway.
        
    … The problems go on, lost routes, route requests and no route replies. 
       
    From what I can tell through debugging, some of the messages that are sent without first having received a route reply for the destination are actually received at the target, but only sporadically. In fact, I can observe jumps in the message transaction ID (i.e. consecutive IDs that differ by more than one) for received messages of AF_INCOMING_MSG_CMD and AF_DATA_CONFIRM.
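
    The jumps in the transaction ID can be turned into a rough count of lost messages. The following is only a sketch under the assumption that the ID is an 8-bit counter that increases by one per message and wraps from 255 to 0; count_missing is a hypothetical helper, not a Z-Stack function. A repeated ID is treated as a retransmission and ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Estimate how many messages are missing from a series of received
 * transaction IDs, assuming an 8-bit counter that wraps around.
 * Losses of 255 or more messages in a row cannot be detected this way. */
static unsigned count_missing(const uint8_t *ids, size_t n)
{
    unsigned missing = 0;
    assert(ids != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        /* Modulo-256 difference handles the 255 -> 0 wraparound. */
        uint8_t step = (uint8_t)(ids[i] - ids[i - 1]);
        if (step > 1u) {
            missing += (unsigned)step - 1u;
        }
    }
    return missing;
}
```

    Logging this count per source address on the gateway would show whether the losses are spread evenly or concentrated on the routers that have just issued a route request.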
       
    Note that in this example, I have only turned on the coordinator + serial gateway + three routers. With more routers being turned on, more routers start a route request to the serial gateway as observed in this example.
        
    [For what it's worth: Could this be a regression from the optimized route request/response handling introduced in Z-Stack 2.3.0? I found the following in the 2.3.0 changelog:
     - Improved delivery of unicast response messages in a larger network that has broadcast "storms" - route responses could get lost when a route request occured. A delay was added to re-broadcasts of route requests and limited queuing of incoming broadcast messages. [3099]]
    extrarouterequests_missingroutereplies.zip
  • In the meantime, I have tried some more ways to fix the problem and can add some information:  

    3) Setting a transmit power of 4 or 8 doesn't help. Also, I tried configuring the CC2591 in low-gain mode by calling the HAL_PA_LNA_RX_LGM() macro, which did not have an effect on the observed route requests.

    4) I did add a 0-127ms jitter to the send interval periods, without effect.

    Apart from that, I have also unsuccessfully tried to increase MAX_NEIGHBOR_ENTRIES and to change various broadcast settings (e.g. queue length).

  • Does anyone have some more suggestions?

  • Hi nkn,

    Thank you again for your detailed responses and for spending time investigating the problem. I have been trying to analyze the problem further, and what I notice is that there don't seem to be regular link status messages coming from all the nodes in the network. I'm not sure if this is just an artifact of where the sniffer was placed in relation to the rest of the nodes, but if the assumption is that all nodes are within earshot, then the sniffer should be picking up more regular link status messages from all the nodes. Each router should send a link status message at roughly 15 second intervals. If they are not doing this, then perhaps this is a clue to the aberrant behavior.
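
    One way to check this against a capture is to export the timestamps of the link status frames per router and count the intervals that are clearly longer than the nominal period. The sketch below assumes timestamps in whole seconds, sorted in ascending order, for a single router; count_late_link_status and the tolerance parameter are hypothetical and not part of any sniffer tool or of Z-Stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINK_STATUS_PERIOD_S 15u /* nominal interval mentioned above */

/* Count the intervals between consecutive link status frames of ONE router
 * that exceed the nominal period plus a tolerance.
 * t_s: capture timestamps in seconds, sorted in ascending order. */
static unsigned count_late_link_status(const uint32_t *t_s, size_t n,
                                       uint32_t tolerance_s)
{
    unsigned late = 0;
    assert(t_s != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        assert(t_s[i] >= t_s[i - 1]); /* input must be sorted */
        if (t_s[i] - t_s[i - 1] > LINK_STATUS_PERIOD_S + tolerance_s) {
            late++;
        }
    }
    return late;
}
```

    A router whose count keeps growing while it is still sending application data would be a candidate for closer inspection.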

    This certainly is not normal behavior and something that we have not seen in our own 400 node test network.

    If you can, I would recommend getting the CC2531 USB dongle ($49 each from the TI e-store) and try to capture this with the Ubilogix sniffer. I think you can download a 30-day free trial from ubilogix.com. The problem is a lot easier to analyze with this sniffer.

  • Could I also entice you to try this experiment with Z-Stack 2.5.0, the latest release? We have addressed some potential pitfalls with routing in this release.

  • Just to confirm, are you using the default settings of MAX_RTG_ENTRIES=40 and MAX_NEIGHBOR_ENTRIES=16?

  • Hello Double 0, thanks very much for your analysis. Sorry for the late answer, but I was on holiday for a few days.
        
    > CC2531: It seems that the sniffer USB dongle is currently out of stock, so it's possibly not an option right now ;)
        
    > Z-Stack 2.5.0: I have tried porting the current Z-Stack version to the ProFLEX01 module and our custom board, but I can't seem to bring the coordinator up. Actually, it just stops in the state PAN_CHNL_SELECTION and doesn't proceed; when I turn on/off the board a few times, the coordinator randomly does manage to come up, but devices turned on later are unable to join the network.
       
    > Default settings: Yes, both settings are at their defaults. I have also tried increasing MAX_NEIGHBOR_ENTRIES to 40 without effect.
        
    Based on your conclusion with the link status messages, I have tried to increase NWK_ROUTE_AGE=30 (i.e. number of missed link status frames) and also tried to turn off the link status messages by setting LINK_STATUS_PERIOD=0. While it seems to indeed help with a setup of 1x coordinator + 4x routers (I don't see the extra route requests within a timespan of 10 minutes), the problem again appears with 1x coordinator + 14 routers. 
       
    Do you think trying to change the CCA mode (as described in the thread "Z-Stack How to change CCA mode?") would be worthwhile?

     

  • Guys,

     Did you have any luck? I am seeing a very similar issue.

    Here is my configuration:

    - 11 Nodes all configured as routers (all sending out reports to Gateway)

    - Implementation is based exactly off Sensor Demo application.

    - Stack is 2.4.0-1.4.0

    - All nodes seem to be storing nwk info in NVRAM.

    After a reset, the entire network comes up. All nodes respond and I get the desired data. After a few hours of operation the whole network becomes unstable: some nodes stop sending data while others keep sending, but I see a lot of "Route Req" and "Match desc Reqs" going around in the network. See below for a snapshot. The basic questions I had are:

    - Is it a healthy sign that every few seconds one "route req" and "match desc" gets generated? I do not understand why this is happening, especially if that router is already successfully sending data to the gateway. I would understand such requests if they came from disconnected nodes.

    - I took one node from this network and tested it on the bench. I found it was generating many "Route Reqs" along with sending partial data after dropping a few packets, even after turning it off and on. Apparently, when I flashed new FW, it stopped sending such a flood of requests and started responding neatly to the gateway: only a few Route Reqs initially, and none after that.

    I suspect that this node, along with a few other nodes in the earlier network, went stale. But I still do not understand why and how this happened.

    Any clue what is going on here?

     

    (This snapshot is after few hours of NWK formation)

    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 62 0xFC63 0xFFFD 0xA6 0x1D
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 63 0x035B 0xFFFD 0xED 0x21
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 64 0x65A6 0xFFFD 0x67 0x1B
    33 11 NWK: Route Request 0xC301 0x0000 0xFFFF 65 0x0000 0xFFFD 0x63
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 66 0x035B 0xFFFD 0xED 0x21
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 67 0x65A6 0xFFFD 0x67 0x1B
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 68 0x65A6 0xFFFD 0x67 0x1B
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 69 0x9C1E 0xFFFD 0x9D 0xE8
    33 11 NWK: Route Request 0xC301 0x0000 0xFFFF 70 0x0000 0xFFFD 0x65
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 71 0x9C1E 0xFFFD 0x9D 0xE8
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 72 0x9C1E 0xFFFD 0x9D 0xE8
    50 11 Reserved 0x7701 0xE800 0xFFF8 128
    5 11 Acknowledgment 33
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 73 0x8878 0xFFFD 0x1F 0xE4
    33 11 NWK: Route Request 0xC301 0x0000 0xFFFF 74 0x0000 0xFFFD 0x67
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 75 0xD1C5 0xFFFD 0xDF 0xFA
    33 11 NWK: Route Request 0xC301 0x0000 0xFFFF 76 0x0000 0xFFFD 0x69
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 77 0x8878 0xFFFD 0x1F 0xE4
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 78 0xD1C5 0xFFFD 0xDF 0xFA
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 79 0x8878 0xFFFD 0x1F 0xE4
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 80 0xD1C5 0xFFFD 0xDF 0xFA
    41 11 NWK: Link Status 0xC301 0x0000 0xFFFF 81 0x0000 0xFFFC 0x6A
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 82 0x6C39 0xFFFD 0xFE 0x42
    33 11 NWK: Route Request 0xC301 0x0000 0xFFFF 83 0x0000 0xFFFD 0x6C
    33 11 Reserved 0xB301 0x1029 0xFFFB 39 0x691D 0x0000 0x70
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 84 0x6C39 0xFFFD 0xFE 0x42
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 85 0x6C39 0xFFFD 0xFE 0x42
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 86 0xFC63 0xFFFD 0xA8 0x1E
    33 11 NWK: Route Request 0xC301 0x0000 0xFFFF 87 0x0000 0xFFFD 0x6E
    5 11 Acknowledgment 43
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 88 0xFC63 0xFFFD 0xA8 0x1E
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 89 0x035B 0xFFFD 0xEF 0x22
    33 11 NWK: Route Request 0xC301 0x0000 0xFFFF 90 0x0000 0xFFFD 0x70
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 91 0xFC63 0xFFFD 0xA8 0x1E
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 92 0x035B 0xFFFD 0xEF 0x22
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 93 0x035B 0xFFFD 0xEF 0x22
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 94 0x65A6 0xFFFD 0x69 0x1C
    33 11 NWK: Route Request 0xC301 0x0000 0xFFFF 95 0x0000 0xFFFD 0x72
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 96 0x65A6 0xFFFD 0x69 0x1C
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 97 0x65A6 0xFFFD 0x69 0x1C
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 98 0x9C1E 0xFFFD 0x9F 0xE9
    33 11 NWK: Route Request 0xC301 0x0000 0xFFFF 99 0x0000 0xFFFD 0x74
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 100 0x9C1E 0xFFFD 0x9F 0xE9
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 101 0x9C1E 0xFFFD 0x9F 0xE9
    41 11 NWK: Link Status 0xC301 0x0000 0xFFFF 102 0x0000 0xFFFC 0x75
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 103 0xD1C5 0xFFFD 0xE1 0xFB
    33 11 NWK: Route Request 0xC301 0x0000 0xFFFF 104 0x0000 0xFFFD 0x77
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 105 0x8878 0xFFFD 0x21 0xE5
    33 11 NWK: Route Request 0xC301 0x0000 0xFFFF 106 0x0000 0xFFFD 0x79
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 107 0xD1C5 0xFFFD 0xE1 0xFB
    36 11 ZDP: Match_Desc_req 0xC301 0x0000 0xFFFF 108 0x8878 0xFFFD 0x21 0xE5
  • Hi,

    I wonder if you figured out the problem. I have something similar going on. My router is working fine and all of a sudden starts a route request. I have also disabled APS ack.

    Thanks,

    Leila

  • Hi guys,

    I seem to have observed a similar issue.
    I have found that "APS Ack" and "ZDP:MatchDescRsp" messages are not being routed back to the source device in a multi-hop network with a concentrator. I have witnessed this problem with several networks with route paths two to three hops deep.

    Observed behaviour: 
    When a network is newly formed, all data packets are correctly routed to and from the network concentrator when each device reports in at a set interval. The network will work like this for days, but over time devices that are three to four hops deep seem to disappear from the network, i.e. stop reporting to the network concentrator. I believe this is because they are not receiving the "APS Ack" for their report messages and unbind from the concentrator because it appears not to exist. It also appears that when the now unbound device tries to re-find the concentrator through "MatchDescReq", it is not receiving the "MatchDescRsp".

    I have also found that if I try to connect to one of the devices that appears to have disappeared from the network by brute force, i.e. by constantly querying for an attribute, I can eventually establish a link to the device. Once a link is re-established, the device will rebind to the concentrator and start reporting again for a period, before eventually stopping again. I have also found that if I add a new concentrator to the network, all devices will bind and subsequently create a routing path to the new concentrator and report in, even those that have stopped reporting to the old concentrator. This shows that all the devices in the network are stable and running.

    I believe the problem is caused by source routing tables not being recorded properly or routing tables not being updated correctly. I have tried experimenting with different settings in the network concentrator, i.e. setting different discovery times and route cache, but always get the same result.

    Has anyone had the same issue and managed to resolve it?

    ----------------------------------------------------------------------------------------------------------------------------------------

    My test network:
    I have a real-world network setup with 14 routers spread across a campus made up of 5 buildings, with open spaces and trees in between the buildings. My network is broken down as:
    6 routers in the same room as the concentrator (my office), one hop from the concentrator,
    3 routers at two hops,
    2 routers at three hops,
    3 routers at four hops.

    Hardware:
    Customer boards with CC2530 + CC2591 into an F antenna
    Running  Zstack 2.5.0, I think I have also observed this behaviour with Zstack 2.3.1

    MAC Settings:
    MAC_CFG_TX_DATA_MAX         3
    MAC_CFG_TX_MAX                     5
    MAC_CFG_RX_MAX                     5

    Network settings:
    ROUTE_EXPIRY_TIME=0
    APSC_ACK_WAIT_DURATION_POLLED=3000
    NWK_INDIRECT_MSG_TIMEOUT=30

     MAX_RTG_SRC_ENTRIES =12
    SRC_RTG_EXPIRY_TIME =10

     MAX_RREQ_ENTRIES=8
    APSC_MAX_FRAME_RETRIES=3
    MAX_POLL_FAILURE_RETRIES=2
    MAX_BCAST=9

    Concentrator settings I have tried:
    CONCENTRATOR_ENABLE                    true
    CONCENTRATOR_DISCOVERY_TIME  0, 15 and 60
    CONCENTRATOR_ROUTE_CACHE        false and true (not sure if CC2530 can support route cache)

  • Dear ALL

    We too are facing the same problem as above. Please share if anybody got any work around or solution.

    Hoping for some support.

    Thanks in Advance

    Adarsh