CC2538: ZNP Sleeping ED rejoin via Router. Coordinator association not updated to expect no Data Requests.

Part Number: CC2538
Other Parts Discussed in Thread: Z-STACK

We have found what we believe to be an interesting bug in Z-Stack 3.0.1. If a sleepy ZED changes its parent from being directly connected to the ZC to being connected via a ZR, Z-Stack does not update its internal state to reflect the change from sleepy (wait for Data Request) to non-sleepy (buffered by ZR).

Network Setup:

1. First set up the network as follows: ZC (ZNP 3.0) -> ZED (commercial latch sensor).

2. Add a ZR (Z-Stack 1.2.2a) close to the ZED and wait. Eventually the ZED will choose to rejoin via the ZR because of its stronger signal.

3. The ZC is now unable to communicate with the ZED. All afDataRequests return ZMacTransactionExpired. It appears that the ZC still believes that the ZED is either directly connected or sleepy. This is despite seeing an update request from the ZR and packets (IAS Zone Status Update) from the ZED.

Packets sent from ZED to ZC (e.g IAS Zone Status Change) arrive correctly (ZED -> ZR -> ZC) and are processed as valid packets.

  • If you wait until the ZC ages out the ZED, can the message then be delivered from the ZC to the ZED?
  • YiKai Chen,

    ChildAging is enabled (zgChildAgingEnable = TRUE). NWK_END_DEV_TIMEOUT_DEFAULT is 8 seconds (default). The issue does not resolve itself after any amount of time.

    Is this enough information?

    I have also tested removing the battery in ZED and having it re-connect via the ZR (Orphan Notification, Coordinator Realignment packets). This does not resolve the issue either.
  • When this happens, have you checked whether the ZED is still in the association list of the ZC even after the child aging timeout?
  • Hi,

    NWK_END_DEV_TIMEOUT_DEFAULT set to 8 is not 8 seconds, it is 256 minutes (check the comments above the macro in ZGlobals.h). You can try setting it to 0 (10 seconds) and rerun your test to see if it is still an issue.
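The index-to-duration mapping mentioned above can be sketched as follows. Only the two values quoted in this thread (0 is 10 seconds, 8 is 256 minutes) come from the posts. The rule that every other index n in 1..14 means 2^n minutes is an assumption based on the Zigbee End Device Timeout enumeration and should be checked against the comments in ZGlobals.h. The function name `end_device_timeout` is hypothetical and not part of Z-Stack.

```python
from datetime import timedelta


def end_device_timeout(index: int) -> timedelta:
    """Convert an end-device timeout index into a duration.

    Assumption: index 0 means 10 seconds and index n (1..14) means
    2**n minutes. Only 0 and 8 are confirmed in this thread.
    """
    if not 0 <= index <= 14:
        raise ValueError(f"timeout index out of range: {index}")
    if index == 0:
        return timedelta(seconds=10)
    return timedelta(minutes=2 ** index)


if __name__ == "__main__":
    for idx in (0, 8):
        print(idx, end_device_timeout(idx))
```

With this mapping, an index of 8 gives a little over four hours, which is why the entry appears never to age out during a short test.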
  • Thanks Jason, I will do that test now.

    Thanks YiKai Chen,

    The best way to test this would be an LQI scan on 0x0000. That would return an entry for the ZED only if the ZED was in the association list on the ZC, correct?
  • Yes, it's correct to use LQI Scan on 0x0000 to know if the ZED was in the Association list on ZC.
  • YK Chen,

    LQI request on ZC (0x0000) entry for ZED:

    { extPandId: '0x14b5780200124b00', extAddr: '0x000d6f000b66e723', nwkAddr: 31432, deviceType: 2, rxOnWhenIdle: 0, relationship: 1, permitJoin: 2, depth: 1, lqi: 170 }

    LQI request on ZR (0xCECD) entry for ZED:

    { extPandId: '0x14b5780200124b00', extAddr: '0x000d6f000b66e723', nwkAddr: 31432, deviceType: 2, rxOnWhenIdle: 0, relationship: 1, permitJoin: 2, depth: 2, lqi: 117 }

    Jason,

    NWK_END_DEV_TIMEOUT_DEFAULT=0 does not seem to make the ZED age out. My hypothesis is that this value is stored in the association table, so a new association is required before it takes effect over the old default. Correct?

    Another way to test this would be a Leave request. Because of the bug, it would neither be relayed to the device via the ZR nor delivered by the ZC (no Data Request arrives), so it would take effect only on the ZC and remove the association there. Correct?

  • Jason,

    I rejoined the ZED via the ZC, then forced it to reconnect via the ZR (by moving it). It did not age out after 3 minutes (NWK_END_DEV_TIMEOUT_DEFAULT=0), as it still shows up in an LQI request to the ZC (0x0000).
  • It seems that your ZED exists in the tables of both the ZR and the ZC. Check whether zgChildAgingEnable is set to TRUE on your ZC.
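A host-side check for this "listed on both ZC and ZR" condition could look like the sketch below. It assumes the LQI responses have already been parsed into dictionaries with the `extAddr` and `relationship` fields shown in the earlier post, and that a relationship value of 1 means "child", as in the Mgmt_Lqi_rsp neighbor table. The function name `children_in_both` is hypothetical.

```python
def children_in_both(zc_table, zr_table, child_relationship=1):
    """Return extended addresses listed as a child in both neighbor tables."""
    def children(table):
        return {
            entry["extAddr"]
            for entry in table
            if entry["relationship"] == child_relationship
        }

    return sorted(children(zc_table) & children(zr_table))


# Entries reduced to the fields used here, taken from the LQI output above.
zc_entries = [{"extAddr": "0x000d6f000b66e723", "relationship": 1, "depth": 1}]
zr_entries = [{"extAddr": "0x000d6f000b66e723", "relationship": 1, "depth": 2}]

stale_candidates = children_in_both(zc_entries, zr_entries)
```

Any address returned is a candidate for a stale association, since an end device should have only one parent at a time.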
  • YK, it is. Although it should not be necessary.

    When the ZR takes over the ZED, the ZC should update its table to enable routing via the ZR.
  • Jason,

    To lower the timeout, an NVRAM reset is required. Changing the #define does not cause the item to be updated in the NVRAM.

    Once the timeout is lowered, the ED is removed from the ZC association list after the Child Aging timeout occurs. A route discovery is then performed, and the route via the ZR is found and used. Communication resumes.

    As this could take (as per the default) 256 minutes, it would be best if the association were removed as soon as the ED is first known to exist on the ZR. How can this be made to happen?

  • A possible internal workaround would be something like this: on a received packet, if nwkAddress != ieeeAddress (i.e. the packet was received from a router) and the sender is contained in the local association table, then remove it from the local association table. This would unfortunately require a packet from the ZED, but might be a good idea regardless. Thoughts?

    As a possible external workaround using MT commands, we might be able to do something like:
    1. If communication to the ZED is failing with ZMacTransactionExpired and there is at least one ZR on the network
    2. Issue MT_ZDO_EXT_ROUTE_DISC
    3. If a response is received with a route, clear the association from the ZC with MAC_DISASSOCIATE_REQ

    Result: the ZC would go on to do its own route discovery, learning about the route we found in step 3.

    Thoughts? Would that work? Am I understanding the action performed by these MT commands correctly?
  • Your original issue is that the ZED connects to the ZR while the ZC still has the ZED in its association list, so the ZC cannot send messages to the ZED. I don't think your workaround would work. I think the real question is why the ZC doesn't remove the ZED from its association list after the child aging timeout.
  • From the previous post. Based on our observations:

    After the Child Aging timeout occurs the ED will be removed from the ZC association list. Then a route discovery will be performed and the route via ZR is found and used. Communication resumes.

    -----

    The problem is that an outage of up to 256 minutes (the default timeout) occurs. Lowering the aging timeout to 10 seconds seems dangerous to me, as it would result in the loss of valid devices that sleep for longer than 10 seconds (most devices). Even longer timeouts carry the same risk, as we have devices that are known to sleep for very long periods during inactivity.

    What in particular would not work with either solution?

  • I just used my CC2538DK to run the Z-Stack 3.0.1 ZNP with NWK_END_DEV_TIMEOUT_DEFAULT set to 0 in ZGlobals.h to test this. I can see that the device is not removed from the association list after I turn off the ZED for more than 10 seconds. I think this is a bug. What do you think?
  • YiKai Chen,

    I found a workaround for that issue:

    1. Flash your firmware with the new defaults.
    2. Erase the device (this clears the NVRAM).
    3. Flash the firmware again.

    This way the default timeout will actually be set. This is because of the behaviour of NV_RESTORE and ZCD_STARTOPT_DEFAULT_CONFIG_STATE (not being set).
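The behaviour described above, where a compile-time default is only written when the NV item does not exist yet, can be modelled with a few lines. This is a simplified sketch of the general "initialise if absent" pattern, not Z-Stack code. The dictionary standing in for NVRAM, the function `nv_item_init`, and the key `"end_dev_timeout"` are all hypothetical.

```python
def nv_item_init(nv: dict, item_id: str, compiled_default: int) -> int:
    """Return the stored value if present; otherwise store and return the default."""
    return nv.setdefault(item_id, compiled_default)


nv = {}                                                # erased device, NV is empty
first_boot = nv_item_init(nv, "end_dev_timeout", 8)    # default 8 is written
after_rebuild = nv_item_init(nv, "end_dev_timeout", 0) # stored 8 wins over new default
nv.clear()                                             # erase the device
after_erase = nv_item_init(nv, "end_dev_timeout", 0)   # new default 0 is written
```

In this model the rebuilt firmware keeps using the old value until the stored item is cleared, which matches the flash, erase, flash sequence in the post.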

  • I erased the flash and tested again. However, the device is still not removed from the association list after the child aging timeout.
  • A bit more reading and it looks like the better behaviour would be:

    1. If communication to the ZED is failing with ZMacTransactionExpired and there is at least one ZR on the network
    2. Issue MT_ZDO_EXT_ROUTE_DISC
    3. If response received with route, clear association from ZC with MT_NLME_LEAVE_REQ
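The three steps above can be written down as host-side decision logic. The sketch below only captures the control flow. It builds no MT frames, and `discover_route` and `remove_association` are hypothetical callables that the host application would have to implement on top of MT_ZDO_EXT_ROUTE_DISC and MT_NLME_LEAVE_REQ. Whether the sequence actually recovers communication on real hardware is not established here.

```python
from typing import Callable, Optional


def recover_stale_child(
    send_expired: bool,
    router_count: int,
    discover_route: Callable[[], Optional[int]],
    remove_association: Callable[[], None],
) -> bool:
    """Run the three-step workaround; return True if the association was cleared.

    send_expired: the last send to the ZED failed with a transaction-expired status.
    router_count: number of ZRs known on the network.
    discover_route: hypothetical hook; returns a next-hop address or None.
    remove_association: hypothetical hook; clears the ZED from the ZC.
    """
    # Step 1: only act when sends fail and a router could be the new parent.
    if not send_expired or router_count < 1:
        return False
    # Step 2: ask for a route to the ZED.
    next_hop = discover_route()
    # Step 3: clear the local association only if a route was found.
    if next_hop is None:
        return False
    remove_association()
    return True
```

Passing the two actions in as callables keeps the decision logic testable without a serial link to the ZNP.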
  • Maybe it would be easier to enable many-to-one routing.
  • What do you mean by many to one routing? This is the coordinator not a router.
  • Hi YK,

    If you are using a Z3.0 ZED as well, it will perform the End Device Timeout Request handshake when doing network association, and it will set its particular timeout value to END_DEV_TIMEOUT_VALUE, which is 8 (256 minutes) by default.
  • OK, I was not aware of this. How can I change END_DEV_TIMEOUT_VALUE to 10 seconds for testing?
  • END_DEV_TIMEOUT_VALUE is defined for a ZED in ZGlobals.h as well and it uses the same values as NWK_END_DEV_TIMEOUT_DEFAULT, so you can set END_DEV_TIMEOUT_VALUE to 0 for the ZED to set its own timeout on its parent to 10 seconds.
  • Jason & YK Chen,

    Some additional information: my ZEDs are all ZHA 1.2 devices, not Zigbee 3.0. That's probably why YK's behaviour is different.

    I've started work on an MT-based external workaround. ZDO EXT ROUTE DISCOVERY has not proved to be the correct MT command, and the local route is preventing the route discovery request from going out. Hmm.

  • After I set END_DEV_TIMEOUT_VALUE to 0, my ZED can be aged out. Thanks for your tips.

  • YK,

    Now that your problem is resolved, have you been able to look at the issue I reported?

  • Can you provide complete sniffer log of your issue?
  • I will work on taking a packet capture as soon as possible. Unfortunately, with Easter approaching, it may be Monday or Tuesday.

    Today we found out that this issue (or a similar one) can also be reproduced by simply disconnecting the router abruptly. The devices reconnect to the coordinator on detecting that their router is gone, but the route for the ED disappears from the coordinator after a minute or so, at which point it performs route requests for the directly connected ED/child. We have not fully diagnosed this part of the issue, as it first occurred hours before a demo.

    This is quite a serious bug for anyone using Z-Stack 3.0.1 I dare say.
  • YK,

    As requested here is a sniffer dump.

    0x6F23 is the sleepy ED.

    As you can see, until packet 14 it communicates directly with the coordinator.

    After this packet I moved the sensor too far from the coordinator (but closer to router 0xCB43). The ZED rejoins the network via the router at packet 122.

    From here on the ZC is unable to send outgoing packets to the ZED. It looks like the ZC still thinks the ZED is a directly connected sleepy ZED (RFD) and is waiting for a Data Request.

    Sniffer Dump: https://ufile.io/7vy1k

  • Additional information: We have also seen this happen when ZC is turned off and rejoins a network where ZRs are left uninterrupted.

    This also occurs in ZS 3.0.0.

  • Hi,

    When I follow the link above, I get some CC2538 configuration file, not a sniffer log. Can you please repost it? You can .zip the file up and attach it directly to your e2e post.