CC2530: ZED (end device) refuses to leave its parent

Part Number: CC2530
Other Parts Discussed in Thread: Z-STACK

Hello,

I'm having a strange issue with the Zigbee stack that someone else may have come across, but I couldn't find an answer in the forum.

After a while, my end device refuses to leave the network when it receives the "Leave" message from its parent, and it keeps sending data requests as if nothing happened. The test was done in a network of 10 devices, and there is some other curious behavior: if I remove the parent the device was paired with (the one it refused to leave), the device detects the link loss and switches to a new parent, but no rejoin message is sent. Is this correct behaviour?

I'm using Z-Stack 3.0. We had the same problem with the previous stack (1.2.2a) and hoped the upgrade would fix it, but the problem remains. The end device is based on the temperature sensor example supplied with the stack, with power saving enabled because our hardware is battery powered.

These are the compilation flags in the current code, in case they are relevant:

POWER_SAVING
ISR_KEYINTERRUPT
BDB_REPORTING
SECURE=1
TC_LINKKEY_JOIN
NV_INIT
NV_RESTORE
NWK_AUTO_POLL
LCD_SUPPORTED=DEBUG
MULTICAST_ENABLED=FALSE
ZCL_READ
ZCL_WRITE
ZCL_BASIC
ZCL_IDENTIFY

I attached a sniffer capture showing the incorrect behaviour of the end device (short address 0xA301).

So far, these are the things I have tried to recover the device when communication is lost:

ZDP_NwkAddrReq - to make sure the short address of the device we are trying to access is correct

NLME_RouteDiscoveryRequest - to force a unicast route discovery

ZDP_MgmtLeaveReq - to broadcast a Leave request to all routers and coordinators and force the end device to rejoin
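The three steps above can be applied as an escalating policy in the coordinator application, trying the cheapest step first. This is a minimal sketch under assumptions: the type and function names, the three-miss threshold and the stage-by-stage escalation are hypothetical and not part of Z-Stack. The caller is expected to map each returned action to ZDP_NwkAddrReq, NLME_RouteDiscoveryRequest or ZDP_MgmtLeaveReq itself.

```c
#include <stdint.h>

/* Hypothetical recovery actions, one per step listed above. */
typedef enum {
    RECOVER_NONE = 0,
    RECOVER_NWK_ADDR_REQ,       /* caller would issue ZDP_NwkAddrReq */
    RECOVER_ROUTE_DISCOVERY,    /* caller would issue NLME_RouteDiscoveryRequest */
    RECOVER_LEAVE_WITH_REJOIN,  /* caller would issue ZDP_MgmtLeaveReq, rejoin set */
    RECOVER_GIVE_UP
} recover_action_t;

/* Hypothetical per-device bookkeeping kept by the coordinator application. */
typedef struct {
    uint8_t failed_reads;  /* consecutive reads with no response */
    uint8_t stage;         /* last recovery stage attempted */
} device_health_t;

/* Assumed threshold: escalate one stage after this many missed responses. */
#define MISSES_PER_STAGE 3u

/* Call once per attribute read; returns the next recovery step to try. */
recover_action_t on_read_result(device_health_t *d, int got_response)
{
    if (got_response) {
        d->failed_reads = 0;  /* device is reachable again: reset everything */
        d->stage = 0;
        return RECOVER_NONE;
    }
    if (++d->failed_reads < MISSES_PER_STAGE)
        return RECOVER_NONE;  /* not enough evidence yet */
    d->failed_reads = 0;
    if (d->stage < RECOVER_GIVE_UP)
        d->stage++;           /* move to the next, more disruptive step */
    return (recover_action_t)d->stage;
}
```

Escalating only after repeated misses keeps the disruptive leave/rejoin step as a last resort instead of the first reaction to a single lost read.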

If anyone has an idea that might mitigate the issue, feel free to share. I'm running out of things to try and places to look, and this behaviour isn't acceptable for the solution we are trying to design.

Leave infinito.zip

  • I don't see the sniffer log attached.
  • I edited the post after submitting it because I forgot to add it. You should be able to see it now.
  • I don't see any leave response in your sniffer log. I suggest setting a breakpoint in ZDO_LeaveInd() on the device side to check whether the device receives the leave request.
  • Thanks for the fast response.

    I was looking into the function you suggested, and there is a comment in it saying:
    "Only respond if we are not rejoining the network", on line 3169 of ZDApp.c.
    This leads me to believe the end device is not supposed to send a leave response, since the leave message has the rejoin flag set. Or am I reading it wrong?
    The difficulty in debugging this is that it only happens after the device has been powered for at least one day, and I haven't found a way to reproduce it consistently. This function seems to be called from the precompiled libraries, so suppose I manage to be debugging when the problem happens and the program doesn't stop at a breakpoint in that function: do you have any other ideas of things to try?
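    The ZDApp.c comment quoted above can be modeled as a one-line rule, which helps when reading the capture. This is a hypothetical sketch, not the Z-Stack source: the struct, the function name and the `request` field (assumed to mean "another node asked us to leave") are illustrative only.

    ```c
    #include <stdbool.h>

    /* Hypothetical, simplified view of a received NWK leave command. */
    typedef struct {
        bool request; /* assumed: set when another node asked us to leave */
        bool rejoin;  /* rejoin flag carried in the leave command */
    } leave_cmd_t;

    /* Models the comment "Only respond if we are not rejoining the network":
     * with rejoin set, no leave response is expected, so its absence in a
     * capture does not by itself prove the request was never received. */
    bool expect_leave_response(const leave_cmd_t *cmd)
    {
        return cmd->request && !cmd->rejoin;
    }
    ```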
  • Can you explain why the parent node keeps sending leave requests with the rejoin flag set to true to this end device?
  • The idea behind this approach was to recover the connection between the end device and the coordinator, which sometimes gets lost along the way.
    So far I haven't figured out why this happens, so I've been trying to find a way around it.
    First, I tried to make sure the coordinator has the device's current short address, using ZDP_NwkAddrReq, because I've seen the short address change at times when the device changes parents.
    Then I tried to force a unicast route discovery using NLME_RouteDiscoveryRequest, which sometimes seems to fix the problem when the short address has changed.

    I noticed in Ubiqua that when the device gets "lost" it still polls the parent for messages, but the messages don't seem to reach it. My idea was that a rejoin might fix the issue; that's why the Leave request has the rejoin flag enabled, since I want the device to rejoin through one of the parents. If I set that flag to false, the device would leave the network and never come back, which I don't want.

    In theory Zigbee should be able to deal with this by itself and recover without any workarounds, but reality looks a bit different. I need a reliable way to recover from this situation, and I'm not quite there yet.

    At this point, if I don't do anything, the end device stops communicating after about a day and the only way to recover it is a power cycle.
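    To confirm in the capture which flags the parent actually sends to 0xA301, the NWK leave command carries a one-byte options field. Per my reading of the ZigBee specification (please verify against your revision), bits 0-4 are reserved, bit 5 is Rejoin, bit 6 is Request and bit 7 is Remove Children. The helper names below are hypothetical; this is a sketch for decoding sniffer bytes, not a Z-Stack API.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed bit positions of the NWK leave command options field. */
    #define LEAVE_OPT_REJOIN          0x20u  /* bit 5 */
    #define LEAVE_OPT_REQUEST         0x40u  /* bit 6 */
    #define LEAVE_OPT_REMOVE_CHILDREN 0x80u  /* bit 7 */

    typedef struct {
        bool rejoin;
        bool request;
        bool remove_children;
    } leave_options_t;

    uint8_t leave_options_encode(leave_options_t o)
    {
        uint8_t b = 0;  /* bits 0-4 are reserved and left at zero */
        if (o.rejoin)          b |= LEAVE_OPT_REJOIN;
        if (o.request)         b |= LEAVE_OPT_REQUEST;
        if (o.remove_children) b |= LEAVE_OPT_REMOVE_CHILDREN;
        return b;
    }

    leave_options_t leave_options_decode(uint8_t b)
    {
        leave_options_t o;
        o.rejoin          = (b & LEAVE_OPT_REJOIN) != 0;
        o.request         = (b & LEAVE_OPT_REQUEST) != 0;
        o.remove_children = (b & LEAVE_OPT_REMOVE_CHILDREN) != 0;
        return o;
    }
    ```

    Under this assumed layout, a leave request with rejoin (and without remove children) would show up as options byte 0x60.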

  • What intrigues me is that this is an almost untouched version of the example supplied with the stack, so I find it hard to believe it's some sort of memory leak or firmware bug.
    I've been working on this problem for a few weeks and, to be honest, I don't know what else to do...
    In any case, thanks for your suggestions. I'll see if anything else comes to mind, or if I can at least reproduce the bug reliably.

    If I end up finding the problem I'll share it here, and I'm always open to new suggestions :)
  • 1. Since your device 0xA301 keeps polling, why do you say the device gets "lost"? Do you mean the device is polling but you cannot send messages from the ZC to it? If so, what kind of message do you send?

    2. You mentioned some sort of memory leak or firmware bug in an almost untouched version of the stack example. If you use a completely untouched version of the example, do you still see a memory leak?

  • By "lost" I mean I can't send or receive messages between the end device and the coordinator. I know that, since device 0xA301 is polling its parent, everything should be working as expected, but it isn't.
    I don't have a log with me right now, but what I saw in Ubiqua was that the read request traveled all the way to the parent but was never delivered to the end device, even with polls occurring every 3 seconds. Without being able to communicate with the end device there is very little I can do to recover, because from the end device's perspective everything is fine: it gets replies to its data requests.
    Even assuming the problem is in the parent, that shouldn't explain the leave loop that originally made me start this post, should it?

    About the memory leak: I didn't see one. What I meant is that I made almost no changes to the firmware, so I don't think I introduced a bug. I even used the SmartRF05 board with the original temperature sensor example supplied with the stack and got exactly the same behavior.
    Only the end device runs Z-Stack 3.0; the other devices in the test setup run 1.2.2a or are third-party products labeled as Zigbee compliant on the box. I'm not sure whether this could be a compatibility issue.
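    Whether a 3-second poll should be fast enough is simple arithmetic. A parent holds a message for a sleeping child in an indirect queue only for a limited time; in Z-Stack this is configured by NWK_INDIRECT_MSG_TIMEOUT, and the 7-second hold time used below is an assumption, so check the value in the parent's own configuration (here a 1.2.2a build). The function name is hypothetical.

    ```c
    #include <stdint.h>

    /* Worst case: the message reaches the parent just after a poll, so the
     * child's polls land at poll_ms, 2*poll_ms, ... after it was queued.
     * Returns how many of those polls fall within the parent's hold time;
     * 0 means the message can expire before the child asks for it. */
    uint32_t polls_before_expiry(uint32_t hold_ms, uint32_t poll_ms)
    {
        if (poll_ms == 0)
            return 0;  /* polling disabled: the child never asks */
        return hold_ms / poll_ms;
    }
    ```

    With a 3 s poll and the assumed 7 s hold time the result is 2, so timing alone should not explain a read request that reaches the parent and is never delivered.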
  • 1. Does the parent node run Z-Stack 3.0?
    2. How did you determine there is a memory leak?
  • The parent node is running stack 1.2.2a.

    About the memory leak, maybe I didn't explain myself well: I wasn't saying I saw or suspected one. Since the code is so close to the original example, I don't think that's the problem.
    I don't think it's a hardware problem either, since it also happens on the TI SmartRF05 board and not only on our custom board.

  • I suggest you use a Z-Stack 3.0 router to test this.
  • Hi MSantos,

    From your capture, I see that the device that doesn't leave the network also doesn't send a single packet besides Data Requests. I'm wondering if something is wrong with the device's NWK key. Can you stimulate the device to send any packet, or have other devices read an attribute or do something else besides the leave request, to see whether the device can exchange packets in the network?

    Also try a brand-new SDK and create a small network, just to see if everything works as expected, and then try to kick the device again to see whether it leaves or ignores the packets. Just try this as a sanity check.

    Regards,
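    The check suggested above ("does the device send anything besides Data Requests?") can be automated over an exported capture. This is a minimal sketch under assumptions: the frame struct and the three frame kinds are a hypothetical, simplified view of decoded sniffer output, not any sniffer's export format.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical, simplified classification of decoded sniffer frames. */
    typedef enum { FRAME_DATA_REQUEST, FRAME_DATA, FRAME_OTHER_CMD } frame_kind_t;

    typedef struct {
        uint16_t     src_short;  /* MAC source short address */
        frame_kind_t kind;
    } frame_t;

    /* Counts frames from `addr` that are not MAC Data Requests. Zero over a
     * long capture matches the symptom described above: the device polls
     * but never sends anything else. */
    size_t non_poll_frames_from(const frame_t *frames, size_t n, uint16_t addr)
    {
        size_t count = 0;
        for (size_t i = 0; i < n; i++)
            if (frames[i].src_short == addr && frames[i].kind != FRAME_DATA_REQUEST)
                count++;
        return count;
    }
    ```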
  • Hello,

    I don't have a capture with me, but I tried to read attributes while the device was in that loop, without success. If I do it again I'll attach a capture that may help show what's going on.
    About creating an entire network on the new SDK: I already have a router and an end device, but I'm still missing a coordinator to have a full Zigbee 3.0 system.
    So far, for the tests, I've been using a coordinator with the old stack (1.2.2a), which is supposedly compatible with Z-Stack 3.0.
    Creating a coordinator is a bit more involved since I need to implement a UART interface (or some other way) to control the system.

    One strange behavior I saw in the tests was this:
    -> Coordinator requests an attribute read from the end device
    -> Request travels all the way to the end device's parent
    -> End device polls the parent but no message is delivered. There were no timeouts and the messages seemed to be OK
    -> Tried to kick the device out of the network with rejoin active
    -> End device entered a perpetual leave loop with the parent

    Not sure if this indicates anything, but maybe it triggers an idea in someone else's head. Even though this problem happens here and there, so far I haven't found a way to reproduce it consistently.
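    Since the loop can't be reproduced on demand, one option is to detect it automatically (in the coordinator application or a capture post-processor) and trigger extra logging when it fires. This is a hypothetical sketch, not something Z-Stack provides: the struct, function name, 60-second window and threshold of 5 are arbitrary placeholders.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    #define LOOP_WINDOW_MS 60000u  /* assumed observation window */
    #define LOOP_THRESHOLD 5u      /* assumed: this many leaves in a window = loop */

    typedef struct {
        uint32_t window_start_ms;
        uint8_t  leaves_in_window;
    } leave_loop_detector_t;

    /* Call for every leave command seen for the device; returns true once
     * the count inside the current window reaches the threshold. */
    bool leave_loop_detected(leave_loop_detector_t *d, uint32_t now_ms)
    {
        /* Start a new window on the first leave or when the old one expired;
         * unsigned subtraction stays correct across a timer wrap-around. */
        if (d->leaves_in_window == 0 || now_ms - d->window_start_ms > LOOP_WINDOW_MS) {
            d->window_start_ms = now_ms;
            d->leaves_in_window = 0;
        }
        if (d->leaves_in_window < UINT8_MAX)
            d->leaves_in_window++;  /* saturate instead of wrapping to zero */
        return d->leaves_in_window >= LOOP_THRESHOLD;
    }
    ```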
  • Hi MSantos,

    What do you mean by "a coordinator to have a full system in Zigbee 3.0"?

    It will be helpful if you can create a capture of the behavior and share with us.

    Regards 

  • Hi Jose,

    The coordinator used in the system is also developed by us, but it uses an earlier version of the stack (1.2.2a).
    By a full Zigbee 3.0 system I mean that all devices use Z-Stack 3.0.
    The idea is to rule out possible compatibility problems between Z-Stack 1.2.2a and Z-Stack 3.0.
    I don't have the capture yet since I had to work on another project in the meantime, but as soon as I have it I will post it here.
  • I suggest doing the test with everything (ZC, ZR, and ZED) running Z-Stack 3.0.