CC2652R: End device takes very long to resume after pause for more than 10 minutes

Part Number: CC2652R
Other Parts Discussed in Thread: Z-STACK

A customer is testing the Zstackapi_pauseResumeDeviceReq API to pause and resume an end device. They can use this API to pause the device, and if Zstackapi_pauseResumeDeviceReq is called again shortly after the pause, the device rejoins the network smoothly. However, if the device is left paused for more than about 12-13 minutes, then after Zstackapi_pauseResumeDeviceReq is called to resume, the end device takes more than 18 minutes to rejoin.

According to the attached sniffer log, when the issue happens the end device sends a Beacon Request and the coordinator responds with a Beacon: NwkClosed. After that, however, the end device does not send the Rejoin Request immediately, and the device status shown in the UART UI stays at "Discovering". About 18 minutes later, the end device finally sends the Rejoin Request and rejoins successfully.

Normal pause and resume:

When the issue happens:

I am trying to debug this but have limited understanding of how Zstackapi_pauseResumeDeviceReq works. Please advise on where to start debugging. The issue can be reproduced with the zc_light and zed_sw examples from SDK 5.30, modified so that the buttons on the LaunchPad call the API:

static void zclSampleSw_processKey(uint8_t key, Button_EventMask buttonEvents)
{
    if (buttonEvents & Button_EV_CLICKED)
    {
        if(key == CONFIG_BTN_LEFT)
        {
            // Use left button for pause
            zstack_pauseResumeDeviceReq_t zstack_pauseResumeDeviceReq = { 0 };
            zstack_pauseResumeDeviceReq.pause = true;
            Zstackapi_pauseResumeDeviceReq(appServiceTaskId, &zstack_pauseResumeDeviceReq);
        }
        if(key == CONFIG_BTN_RIGHT)
        {
            // Right button for resume
            zstack_pauseResumeDeviceReq_t zstack_pauseResumeDeviceReq = { 0 };
            zstack_pauseResumeDeviceReq.pause = false;
            Zstackapi_pauseResumeDeviceReq(appServiceTaskId, &zstack_pauseResumeDeviceReq);
        }
    }
}

Thanks.

Best regards,

Shuyang

  • Hi Shuyang,

    This Z-Stack API is not commonly recommended, as it is not used by typical Z-Stack operations or included in any application example by default.  I do not know when it was last tested or verified by the Software Development Team.  Regardless, if you would like to debug further, please be aware that this API ultimately calls ZDApp_PauseNwk or ZDApp_ResumeNwk depending on the pause value.  You may want to pay particular attention to the behavior of bdb_parentLost, and you could also consider using SysCtrlSystemReset instead to simply restart the device.  Although not directly related, this issue could be connected to the behavior noted in this other E2E thread, which is currently being investigated and involves the low-level MAC.  Is power consumption affected when these APIs are used, and what is the applied poll rate?
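
    If a full restart is acceptable, the reset fallback mentioned above could be guarded by a timeout, so that the device only resets when a resume has not led to a rejoin within a chosen window.  The following is a minimal sketch in plain C with no Z-Stack dependencies.  The resume_guard_t type and its functions are hypothetical names introduced here for illustration, not SDK APIs, and the tick source and timeout value are left to the application.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helper, not part of Z-Stack: tracks how long a resume has
     * been pending so the application can decide to fall back to a reset. */
    typedef struct {
        bool     pending;     /* true between the resume request and the rejoin */
        uint32_t start_ms;    /* tick value when the resume was requested */
        uint32_t timeout_ms;  /* how long to wait before giving up */
    } resume_guard_t;

    /* Call right after requesting the resume. */
    void resume_guard_start(resume_guard_t *g, uint32_t now_ms, uint32_t timeout_ms)
    {
        g->pending = true;
        g->start_ms = now_ms;
        g->timeout_ms = timeout_ms;
    }

    /* Call once the device reports that it is back on the network. */
    void resume_guard_joined(resume_guard_t *g)
    {
        g->pending = false;
    }

    /* Call periodically; returns true when the fallback should run. */
    bool resume_guard_expired(const resume_guard_t *g, uint32_t now_ms)
    {
        /* Unsigned subtraction stays correct across tick counter wraparound. */
        return g->pending && (uint32_t)(now_ms - g->start_ms) >= g->timeout_ms;
    }
    ```

    On the target, the application would call resume_guard_start next to the resume request, resume_guard_joined when the rejoin completes, and SysCtrlSystemReset once resume_guard_expired returns true.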

    Regards,
    Ryan

  • Hi Ryan,

    I did some debugging and found that ZDO_NetworkDiscoveryConfirmCB is never called when the issue happens. Because the issue only appears a certain time after the pause command, is it possible that the RX is turned off and is not turned back on properly after some timeout?

    The power policy and poll rate are kept as default in the example code.

    Best regards,

    Shuyang

  • Hi Shuyang,

    Once ZDApp_ResumeNwk is called, bdb_parentLost runs, and it is then the application's responsibility to attempt a network rejoin.  This is accomplished by the device entering the BDB_COMMISSIONING_PARENT_LOST case from *_ProcessCommissioningStatus, which eventually leads to the SAMPLEAPP_END_DEVICE_REJOIN_EVT from zclSampleSw_process_loop calling Zstackapi_bdbRecoverNwkReq.  This will in turn use processBdbRecoverNwkReq -> bdb_recoverNwk -> ZDOInitDevice.  Can you confirm that all of this is taking place?  Can you also please test removing nwk_SetCurrentPollRateType( POLL_RATE_DISABLED, FALSE ); from ZDApp_ResumeNwk, since this would be the expected behavior for a device entering bdb_parentLost?  Otherwise, the logic is very similar to that of a ZED which orphans because its parent stops responding, and thus the device should rejoin the network momentarily.
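
    One way to confirm that each step runs is to record a checkpoint at every stage and inspect the sequence after a failed resume.  The following is a minimal sketch in plain C with no Z-Stack dependencies.  The trace_* functions and STEP_* identifiers are hypothetical, named after the stages listed above plus the ZDO_NetworkDiscoveryConfirmCB callback reported earlier in this thread, and each trace_mark call would have to be placed by hand in the corresponding function or case.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical checkpoint IDs, one per stage of the resume path. */
    typedef enum {
        STEP_RESUME_NWK = 1,     /* ZDApp_ResumeNwk entered */
        STEP_PARENT_LOST,        /* BDB_COMMISSIONING_PARENT_LOST case reached */
        STEP_REJOIN_EVT,         /* SAMPLEAPP_END_DEVICE_REJOIN_EVT handled */
        STEP_RECOVER_NWK,        /* bdb_recoverNwk entered */
        STEP_INIT_DEVICE,        /* ZDOInitDevice entered */
        STEP_DISCOVERY_CONFIRM   /* ZDO_NetworkDiscoveryConfirmCB entered */
    } trace_step_t;

    #define TRACE_DEPTH 16u

    static uint8_t trace_log[TRACE_DEPTH];
    static size_t  trace_count;

    /* Clear the log, for example right before requesting the resume. */
    void trace_reset(void)
    {
        trace_count = 0;
    }

    /* Record that a stage was reached; extra marks are dropped when full. */
    void trace_mark(trace_step_t step)
    {
        if (trace_count < TRACE_DEPTH) {
            trace_log[trace_count++] = (uint8_t)step;
        }
    }

    /* Returns the first expected step that does not appear in order in the
     * log, or 0 if every expected step was recorded. */
    int trace_first_missing(const trace_step_t *expected, size_t n)
    {
        size_t pos = 0;
        for (size_t i = 0; i < n; i++) {
            while (pos < trace_count && trace_log[pos] != (uint8_t)expected[i]) {
                pos++;
            }
            if (pos == trace_count) {
                return (int)expected[i];
            }
            pos++;
        }
        return 0;
    }
    ```

    Reading trace_log in the debugger, or calling trace_first_missing with the expected order, shows where the chain stops without relying on breakpoints that would disturb the timing.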

    Regards,
    Ryan

  • Hi Ryan,

    All of the steps have taken place. The device is able to send a beacon request, but never enters the callback for the beacon response.

    I also tried removing nwk_SetCurrentPollRateType( POLL_RATE_DISABLED, FALSE ); and the issue still occurs.

    Best regards,

    Shuyang

  • Hi Shuyang,

    Thank you for continuing to help debug this issue, as I am currently unable to replicate this behavior due to other matters.  I believe you are correct that the RX being turned off by the ZMacSetReq call inside bdb_parentLost is potentially causing the issue.  Next, I would like you to try replacing bdb_parentLost with bdb_rejoinNwk inside ZDApp_ResumeNwk, or removing it entirely, and see how that affects the device's behavior.

    Regards,
    Ryan

  • Hi Ryan,

    When bdb_parentLost is replaced with bdb_rejoinNwk, the end device does not send the periodic data request after the recovery. However, removing bdb_parentLost completely seems to work just fine.

    I also tested commenting out the ZMacSetReq call inside bdb_parentLost, which also seems to fix the problem: the end device is able to rejoin normally after 13 minutes. I think you are right about ZMacSetReq, although I don't understand the exact reason, or why the problem only appears a certain time after the ZDApp_PauseNwk call.

    Best regards,

    Shuyang

  • Hi Shuyang,

    So you believe that removing bdb_parentLost and/or ZMacSetReq completely resolves the customer's issue?  I doubt that these APIs have been evaluated since the changes to the orphan process and data polling management took place.  I will follow up with the Software Development Team to resolve this API problem.  There are ongoing improvements to the IEEE MAC which could improve the pause timing dependency.

    Regards,
    Ryan

  • Hi Ryan,

    I am not able to run a thorough test to verify the potential risks of this change; I can only observe that after the change the end device was able to rejoin the network and send on/off commands to the coordinator. It would be great if you could follow up with the R&D team to fully solve the issue. Looking forward to the bug fix, thanks!

    Best regards,

    Shuyang