CC2530: Z-Stack 3.0.2 Green Power bugs, memory leak

Part Number: CC2530
Other Parts Discussed in Thread: Z-STACK, SIMPLELINK-CC13X2-26X2-SDK, CC1352R, CC2652R

I'm working with Z-Stack 3.0.2 on a CC2530.  I'm trying to get it working as a Zigbee PRO Green Power Proxy (GPP).

I have found and fixed some bugs in Z-Stack, and now I can pair a GPD and the GPP works.  But after about 20 presses the router dies (even with non-GP packets).  There appears to be a large memory leak; I am afraid that the leak may be in the closed-source dGP_stub.

Please see my patch for the changes I have made.  I see in Z-Stack 3.5.0 some comments about a memory leak fixed (ZIGBEE-583).  Could this fix be back-ported to Z-Stack 3.0.x?

http://software-dl.ti.com/simplelink/esd/simplelink_cc13x2_26x2_sdk/3.30.00.03/exports/docs/zigbee/release_notes_zigbee_3_5_0.html

Thanks,

Ryan

Z-Stack_3.0.2_gp.patch.gz

  • Hi,

    For Z-Stack 3.0.2, the GPP functionality should work out-of-box, since GPP is required for Zigbee 3.0.
    If you use the original Z-Stack 3.0.2 and apply the same GP traffic, do you also see the router stop working?

    The fix for ZIGBEE-583 was, in fact, applied to dgp_stub.c (which is in a precompiled library).
    Additionally, given that the stacks/application framework have diverged between 3.0.x and 3.x.0, I'm not certain whether backporting will fix the issue.

    Regards,
    Toby

  • Hi,

    Thanks for your reply!

    The original Z-Stack 3.0.2 didn't work at all: it did not send the commissioning notification until I changed the line with !osal_memcmp(pKey,zgpSharedKey,SEC_KEY_LEN).  If you look at my patch you can see there were also pointer errors with zcl_memcpy().  Before I fixed these, the commissioning notification payload was garbage.  If you look closely at the code you can see the error: it takes the address of something that is already a pointer, so it gets the address of the pointer variable itself rather than the data it points to.  I don't see how it could have worked before.
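
    The address-of-a-pointer mistake described above can be reproduced off-target. The following is only a minimal sketch, not the actual Z-Stack code or the patch: it uses the standard memcpy() as a stand-in for zcl_memcpy(), and the function names copy_payload_buggy and copy_payload_fixed are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Buggy pattern: pSrc is already a pointer, so &pSrc is the address of
 * the pointer variable itself. The copy picks up the bytes of the
 * pointer value, not the data it points to. */
void copy_payload_buggy(uint8_t *dst, const uint8_t *pSrc, size_t len)
{
    /* Clamp so this demo never reads past the pointer object; the
     * real bug would read whatever happens to follow it in memory. */
    if (len > sizeof pSrc) {
        len = sizeof pSrc;
    }
    memcpy(dst, &pSrc, len);
}

/* Fixed pattern: pass the pointer itself so the pointed-to data is copied. */
void copy_payload_fixed(uint8_t *dst, const uint8_t *pSrc, size_t len)
{
    memcpy(dst, pSrc, len);
}
```

    With the buggy version the destination ends up holding part of an address, which would be consistent with a payload that looks like garbage on the sniffer.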

    The reason I think it is a memory leak is that memAlo increases greatly every time a GPDF is proxied, and keeps climbing with each further packet, until the unit stops responding altogether.

    A final change I made in the patch is to use the GPD nwk addr alias with the commissioning commands as well.

    Thanks,

    Ryan

  • One thing you could do to get more debug information: make a wrapper for the GP event handler, gp_event_loop, and replace gp_event_loop with gp_event_loop_wrapper in tasksArr. Something like the following:

    UINT16 gp_event_loop_wrapper( uint8 task_id, UINT16 events )
    {
      uint16 before, after;
      uint16 retVal;
      
      before = osal_heap_mem_used();
    
      retVal = gp_event_loop( task_id, events );
    
      after = osal_heap_mem_used();
    
      // check which event causes permanent rise in heap usage
    
      return retVal;
    
    }

    If you notice it to be a certain event each time, that will be the event to investigate further.
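
    One possible way to do the per-event check is to accumulate the heap change against each event bit. This is a sketch under stated assumptions: osal_heap_mem_used() and gp_event_loop() are replaced here by simulated stand-ins so the code compiles off-target (on the CC2530 the real Z-Stack functions would be used instead), and the heap_delta_by_event table is a hypothetical addition.

```c
#include <stdint.h>

typedef uint8_t  uint8;
typedef uint16_t uint16;

/* --- Simulated stand-ins, NOT Z-Stack code --- */
static uint16 sim_heap_used = 800;

static uint16 osal_heap_mem_used(void)
{
    return sim_heap_used;
}

static uint16 gp_event_loop(uint8 task_id, uint16 events)
{
    (void)task_id;
    if (events & 0x0001) {
        sim_heap_used += 110;   /* pretend event bit 0 leaks 110 bytes */
    }
    return 0;
}
/* --- End of stand-ins --- */

/* Net heap change attributed to each of the 16 OSAL event bits. */
int32_t heap_delta_by_event[16];

uint16 gp_event_loop_wrapper(uint8 task_id, uint16 events)
{
    uint8 bit;
    uint16 before = osal_heap_mem_used();
    uint16 retVal = gp_event_loop(task_id, events);
    uint16 after  = osal_heap_mem_used();
    int32_t delta = (int32_t)after - (int32_t)before;

    /* If several bits are pending at once, the delta is charged to all
     * of them, so look for the bit whose total keeps growing. */
    for (bit = 0; bit < 16; bit++) {
        if (events & (uint16)(1u << bit)) {
            heap_delta_by_event[bit] += delta;
        }
    }
    return retVal;
}
```

    Inspecting heap_delta_by_event in the debugger after a number of button presses should show which event bit, if any, accumulates a steadily growing total.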

  • Thanks for that, it is a good idea.  I did it like so, for setting a breakpoint.

    UINT16 gp_event_loop_wrapper( uint8 task_id, UINT16 events )
    {
      static uint16 before, after;
      uint16 retVal;
      before = osal_heap_mem_used();
      retVal = gp_event_loop( task_id, events );
      after = osal_heap_mem_used();
      // check which event causes permanent rise in heap usage
      if (after > before)
        ASM_NOP;
      return retVal;
    }
    

    I tried this, and I saw before = 797 and after = 907, 110 bytes lost.  But I don't think this is conclusive, because zclGp_SendGpNotificationCommand() queues up the messages and sends them later (they should then be freed in gp_NotificationMsgClean()).  So the increase I saw between before and after could be something (like the queued message) that is freed at a later time.
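
    Because a queued message can legitimately hold memory for a while, a single before/after pair cannot separate a deferred free from a real leak. One way to tell them apart is to watch the floor of heap usage over time: transient allocations come back down, a leak raises the floor. The sketch below is hypothetical (the heap_floor_t type, the function names, and the window size are all assumptions); the samples could come from the wrapper's after value.

```c
#include <stdint.h>

#define WINDOW 4   /* samples per window; arbitrary value for illustration */

typedef struct {
    uint16_t window_min;      /* lowest heap usage seen in the current window */
    uint16_t prev_min;        /* lowest heap usage seen in the previous window */
    uint8_t  count;           /* samples collected in the current window */
    uint8_t  has_prev;        /* nonzero once one full window has completed */
    uint8_t  rising_windows;  /* consecutive windows whose floor went up */
} heap_floor_t;

void heap_floor_init(heap_floor_t *t)
{
    t->window_min = UINT16_MAX;
    t->prev_min = 0;
    t->count = 0;
    t->has_prev = 0;
    t->rising_windows = 0;
}

/* Feed one heap-usage sample, e.g. the value of osal_heap_mem_used(). */
void heap_floor_sample(heap_floor_t *t, uint16_t used)
{
    if (used < t->window_min) {
        t->window_min = used;
    }
    t->count++;
    if (t->count < WINDOW) {
        return;
    }

    /* Window complete: compare its floor with the previous window's floor. */
    if (t->has_prev && t->window_min > t->prev_min) {
        t->rising_windows++;
    } else {
        t->rising_windows = 0;
    }
    t->prev_min = t->window_min;
    t->has_prev = 1;
    t->window_min = UINT16_MAX;
    t->count = 0;
}
```

    If rising_windows keeps increasing while GPDFs are proxied, the memory is not being returned later, which would point to a leak rather than to the deferred free in gp_NotificationMsgClean().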

    To test that theory, I disabled almost all GP callbacks.  I only leave the one needed, and it is called only on boot.

    void gp_CBInit(void)
    {
    //  GP_DataCnfGCB = GP_DataCnf;
      GP_endpointInitGCB = gp_endpointInit;  
    //  GP_expireDuplicateFilteringGCB = gp_expireDuplicateFiltering;
    //  GP_stopCommissioningModeGCB = gp_stopCommissioningMode;
    //  GP_returnOperationalChannelGCB = gp_returnOperationalChannel;
    //  GP_DataIndGCB = GP_DataInd;
    //  GP_SecReqGCB = GP_SecReq;   
    //  GP_CheckAnnouncedDeviceGCB = gp_CheckAnnouncedDevice;
        
      GP_aliasConflictAnnce = &aliasConflictAnnce;
      
      GP_endpointInitGCB();
    }

    As far as I can tell, no memory should be consumed now, because none of the application callbacks run when a GPDF is received.  Still, I saw before = 844 and after = 955, 111 bytes lost.  That makes me even more convinced that something inside dGP_stub is losing memory.  What do you think?

  • Can you share a sniffer log with the GPDFs?

  • Attached is a capture of the commissioning, followed by a button press (Toggle).  This is from a PTM 215Z (EnOcean module inside the Hue Tap).

  • Hi Ryan,

    I apologize for the delay but this is still being investigated.  Can you provide the actual sniffer file as this would be easier to parse using software?  Was it captured in Wireshark or Ubiqua?

    I also highly recommend considering the SimpleLink CC1352R/P or CC2652R/P devices, as the Z-Stack solution provided with SIMPLELINK-CC13X2-26X2-SDK receives quarterly updates, whereas the Z-Stack 3.0.2 solution received its final update in October 2018.  There are also advanced resources provided in TIRex: http://dev.ti.com/tirex/explore/node?a=pTTHBmu__&node=AOSQDVXMohlV5LElLx5wxA__pTTHBmu__LATEST

    Regards,
    Ryan

  • Thanks a lot for both of your help, I appreciate it.  Those newer parts are indeed nice but the CC2530 is still quite a bit less expensive.  So far I have been able to solve any issues because the source code is available, but I do not have that luxury with this last issue.

    It would be nice if you guys would consider creating another release.  I have needed to find other fixes here and there on the forum, mostly with the BDB code.  As far as I can tell the CC2530 is still active and not NRND.

    I have attached the Ubiqua capture in the native CUBX format.

    gpdf.cubx.gz

  • I've sent you a friend request so that I can share further information off the public forum.  The CC253X devices are not NRND; however, our R&D teams are committing all development efforts to newer products.  I apologize for the frustration this business decision causes your project, but I hope to provide more assistance off this thread.

    Regards,
    Ryan

  • I've accepted your friend request.  I understand that business decisions don't often follow what engineering wants.  What I'm struggling to understand is how Z-Stack 3.0.2 was certified for Zigbee 3.0 when it doesn't look like it actually works.  My patch in the first post showed some bugs that would not allow that code to work.  On the other hand, it seems so close.  If this memory leak problem were fixed then perhaps that would be good enough.  Perhaps it's worth reevaluating that business decision in light of these bugs.

    Did the sniffer capture help?  Were you able to reproduce the problem?

    Thanks again for your help!

  • The Zigbee Alliance certifies based on their Test Specification which does not account for memory leakage, among other possible issues for a given platform.  Fixing these issues and creating a new release would require re-certification and this involves several other problems since the CC253X Z-Stack solution uses the R21 Zigbee Specification which has been deprecated since 2018 in favor of newer revisions.  TI has determined that development efforts are better spent on the newest SimpleLink CC13X2 / CC26X2 platform.

    I have sent you a private message so that this investigation can be continued outside the public forum space.

    Regards,
    Ryan