SIMPLELINK-CC13X2-26X2-SDK: ZNP crashing when executing interpan commands

Part Number: SIMPLELINK-CC13X2-26X2-SDK
Other Parts Discussed in Thread: SIMPLELINK-CC13XX-CC26XX-SDK, CC2530, CC2531, Z-STACK, CC2652R

After executing some InterPAN commands, my ZNP firmware based on SIMPLELINK-CC13XX-CC26XX-SDK_5.30.00.56 crashes. The ZNP then stops responding and has to be unplugged and replugged before it works again. This also happens with older SDKs (I've tested 4.30, 4.40 and 5.10).

The same command sequence does not crash my firmware for the CC2531 and CC2530 (Z-Stack Home 1.2), so I believe this is a bug in the SimpleLink SDK.

Zigbee2MQTT log containing the executed Z-Stack commands:

Zigbee2MQTT:debug 2021-11-15 18:31:21: Received MQTT message on 'zigbee2mqtt/bridge/request/touchlink/scan' with data '{"transaction":"39mfd-2","value":true}'
Zigbee2MQTT:info  2021-11-15 18:31:21: Start Touchlink scan
  zigbee-herdsman:controller:touchlink Set InterPAN channel to '11' +15s
  zigbee-herdsman:adapter:zStack:znp:SREQ --> AF - interPanCtl - {"cmd":1,"data":[11]} +15s
  zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,2,36,16,1,11,60] +15s
  zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,100,16,0,117] +15s
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,100,16,0,117] +0ms
  zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 4 - 16 - [0] - 117 +0ms
  zigbee-herdsman:adapter:zStack:znp:SRSP <-- AF - interPanCtl - {"status":0} +15s
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] +1ms
  zigbee-herdsman:adapter:zStack:znp:SREQ --> AF - dataRequestExt - {"dstaddrmode":2,"dstaddr":"0x000000000000ffff","destendpoint":254,"dstpanid":65535,"srcendpoint":12,"clusterid":4096,"transid":18,"options":0,"radius":30,"len":9,"data":{"type":"Buffer","data":[17,0,0,163,155,172,203,4,18]}} +4ms
  zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,29,36,2,2,255,255,0,0,0,0,0,0,254,255,255,12,0,16,18,0,30,9,0,17,0,0,163,155,172,203,4,18,134] +4ms
  zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,100,2,0,103] +6ms
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,100,2,0,103] +0ms
  zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 4 - 2 - [0] - 103 +0ms
  zigbee-herdsman:adapter:zStack:znp:SRSP <-- AF - dataRequestExt - {"status":0} +7ms
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] +0ms
  zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,3,68,128,0,12,18,217] +3ms
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,3,68,128,0,12,18,217] +0ms
  zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 3 - 2 - 4 - 128 - [0,12,18] - 217 +0ms
  zigbee-herdsman:adapter:zStack:znp:AREQ <-- AF - dataConfirm - {"status":0,"endpoint":12,"transid":18} +16s
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] +1ms
  zigbee-herdsman:controller:touchlink Scan request failed or was not answered: 'Error: Timeout - null - 254 - null - 4096 - 1 after 500ms' +512ms
  zigbee-herdsman:controller:touchlink Set InterPAN channel to '15' +0ms
  zigbee-herdsman:adapter:zStack:znp:SREQ --> AF - interPanCtl - {"cmd":1,"data":[15]} +507ms
  zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,2,36,16,1,15,56] +508ms
  zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,100,16,0,117] +501ms
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,100,16,0,117] +0ms
  zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 4 - 16 - [0] - 117 +0ms
  zigbee-herdsman:adapter:zStack:znp:SRSP <-- AF - interPanCtl - {"status":0} +506ms
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] +1ms
  zigbee-herdsman:adapter:zStack:znp:SREQ --> AF - dataRequestExt - {"dstaddrmode":2,"dstaddr":"0x000000000000ffff","destendpoint":254,"dstpanid":65535,"srcendpoint":12,"clusterid":4096,"transid":19,"options":0,"radius":30,"len":9,"data":{"type":"Buffer","data":[17,0,0,38,82,31,81,4,18]}} +5ms
  zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,29,36,2,2,255,255,0,0,0,0,0,0,254,255,255,12,0,16,19,0,30,9,0,17,0,0,38,82,31,81,4,18,226] +4ms
  zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,100,2,0,103] +6ms
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,100,2,0,103] +0ms
  zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 4 - 2 - [0] - 103 +0ms
  zigbee-herdsman:adapter:zStack:znp:SRSP <-- AF - dataRequestExt - {"status":0} +6ms
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] +0ms
  zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,3,68,128,0,12,19,216] +4ms
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,3,68,128,0,12,19,216] +0ms
  zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 3 - 2 - 4 - 128 - [0,12,19] - 216 +0ms
  zigbee-herdsman:adapter:zStack:znp:AREQ <-- AF - dataConfirm - {"status":0,"endpoint":12,"transid":19} +513ms
  zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] +0ms
  zigbee-herdsman:controller:touchlink Scan request failed or was not answered: 'Error: Timeout - null - 254 - null - 4096 - 1 after 500ms' +512ms
  zigbee-herdsman:controller:touchlink Set InterPAN channel to '20' +1ms
  zigbee-herdsman:adapter:zStack:znp:SREQ --> AF - interPanCtl - {"cmd":1,"data":[20]} +508ms
  zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,2,36,16,1,20,35] +508ms
  zigbee-herdsman:controller:touchlink Restore InterPAN channel +6s
  zigbee-herdsman:adapter:zStack:znp:SREQ --> AF - interPanCtl - {"cmd":0,"data":[]} +6s
  zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,1,36,16,0,53] +6s
Zigbee2MQTT:error 2021-11-15 18:31:34: Request 'zigbee2mqtt/bridge/request/touchlink/scan' failed with error: 'SRSP - AF - interPanCtl after 6000ms'
Zigbee2MQTT:debug 2021-11-15 18:31:34: Error: SRSP - AF - interPanCtl after 6000ms
    at Timeout._onTimeout (/Users/koenkk/Git/zigbee2mqtt/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7)
Zigbee2MQTT:info  2021-11-15 18:31:34: MQTT publish: topic 'zigbee2mqtt/bridge/response/touchlink/scan', payload '{"data":{},"error":"SRSP - AF - interPanCtl after 6000ms","status":"error","transaction":"39mfd-2"}'
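
For anyone reading the raw frames in the log above, the following is a minimal sketch of how the `unpi:parser` lines relate to the byte arrays, assuming the standard Z-Stack Monitor and Test framing (start byte 0xFE, length, CMD0, CMD1, data, then an FCS that is the XOR of everything between the start byte and the FCS). The names `unpi_frame_t`, `unpi_fcs` and `unpi_parse` are hypothetical helpers written for illustration; they are not part of the SDK or of zigbee-herdsman.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the fields the parser prints for one frame
 * ("parsed <length> - <type> - <subsystem> - <command> - [data] - <fcs>"). */
typedef struct {
    uint8_t length;       /* number of data bytes */
    uint8_t type;         /* upper 3 bits of CMD0: 1 = SREQ, 2 = AREQ, 3 = SRSP */
    uint8_t subsystem;    /* lower 5 bits of CMD0: 4 = AF */
    uint8_t command;      /* CMD1 */
    const uint8_t *data;  /* points into the caller's buffer, not copied */
    uint8_t fcs;          /* frame check sequence as received */
} unpi_frame_t;

/* XOR of all bytes between the start-of-frame byte and the FCS byte,
 * i.e. length, CMD0, CMD1 and the data bytes. */
uint8_t unpi_fcs(const uint8_t *frame, size_t frame_len)
{
    uint8_t fcs = 0;
    for (size_t i = 1; i + 1 < frame_len; i++) {
        fcs ^= frame[i];
    }
    return fcs;
}

/* Returns 0 on success, -1 if the frame is malformed or the FCS is wrong. */
int unpi_parse(const uint8_t *frame, size_t frame_len, unpi_frame_t *out)
{
    if (frame_len < 5 || frame[0] != 0xFE) {
        return -1;                          /* too short or missing start byte */
    }
    if ((size_t)frame[1] + 5 != frame_len) {
        return -1;                          /* length field disagrees with buffer */
    }
    if (unpi_fcs(frame, frame_len) != frame[frame_len - 1]) {
        return -1;                          /* checksum mismatch */
    }
    out->length    = frame[1];
    out->type      = (uint8_t)(frame[2] >> 5);
    out->subsystem = (uint8_t)(frame[2] & 0x1F);
    out->command   = frame[3];
    out->data      = &frame[4];
    out->fcs       = frame[frame_len - 1];
    return 0;
}
```

Feeding it the first request from the log, `[254,2,36,16,1,11,60]`, gives type 1 (SREQ), subsystem 4 (AF), command 16 (interPanCtl) and data `[1,11]`, that is, "set InterPAN channel 11". The last two frames in the log, `[254,2,36,16,1,20,35]` and `[254,1,36,16,0,53]`, are well-formed requests of the same kind, so the failure is the missing SRSP from the ZNP rather than a malformed frame from the host.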

  • Hi Koen,

    Please further describe the nature of the InterPAN commands which replicate the issue.  For instance, is there a certain command, data, number, or interval which reproduces the issue and is it always guaranteed to fail eventually?  Are you able to further debug the ZNP to monitor the heap or check the call stack after it crashes?

    Regards,
    Ryan

  • Hi Ryan,

    I have a CC2652R launchpad which I can use for debugging. I'm able to run the SDK as Debug from CCS, while CCS is connected to it I can start Zigbee2MQTT and I can reproduce the issue. Where can I find the required debug information?

  • Hi Koen,

    Thanks for getting this set up.  Please add HEAPMGR_METRICS and add the expressions listed in the Application Overview section of the Z-Stack User's Guide.  Pause the device after the failure and record these values.  Please also show the full Debug window along with where the code stops.  You can refer to the Debugging module if you are comfortable using the ROV.  It may also help to disable optimizations.

    Regards,
    Ryan

  • I hope this contains the necessary information:

  • Thanks Koen, I'll let you know if any further data is needed from your end.

    Regards,
    Ryan

  • Hi Koen,

    The Software Development Team recently applied a fix which may help with (although most likely not resolve) the issue you've observed.  Please follow the workaround provided in this E2E post.  Also, does this issue only occur when performing TL scans, or can it also occur during normal network operation?

    Regards,
    Ryan

  • I applied the fix but it did not resolve the issue. This only happens when performing TL scans. I don't think the issue is caused by a memory leak, since it happens after just a few commands (with the MEM heap set to 48 kB).

  • Thanks for the feedback Koen.  I agree that this problem is not caused by a memory leak.  I've been able to replicate the behavior with Z-Tool and the default ZNP.  I'm continuing to investigate this issue with the Software Development Team.

    Regards,
    Ryan

  • Hi Koen,

    The Software Development Team found that when performing a StubAPS_SetInterPanChannel() channel change, the Zigbee network task sets its state to idle.  In this state no packets are sent over the air; they are queued instead.  The first channel change works correctly and the response is sent back.  However, subsequent requests are made with the network task still in the IDLE state, so the pending packets are never sent out.  This is why we see the same packet being queued and de-queued inside the call stack of the debug session.  This will be addressed in the v6.10 SIMPLELINK-CC13XX-CC26XX-SDK.  In the meantime, adding nwk_setStateIdle( FALSE ); at the end of the StubAPS_SetNewChannel function in stub_aps.c resolves the issue:

        ZMacSetReq( ZMacRxOnIdle, &rxOnIdle );
    
        channelChangeInProgress = FALSE;
        nwk_setStateIdle( FALSE );      //LINE ADDED
    
        return ( ZSuccess );
      }

    I verified this solution on my system but please test it out and let me know if it works for you as well.

    Regards,
    Ryan

  • Sorry for the late reply, that fixed it indeed!