CC1352P: Lost ZC->ZED data packets

Part Number: CC1352P
Other Parts Discussed in Thread: SYSCONFIG, Z-STACK

Hello

We have a ZED and a ZC, both using CC1352P1F3 devices and built with SDK 4.30.0.54. Almost everything works as designed.

The one exception is when queueing packets on the ZC. Once packets are queued, we can check their presence via nwkDB_CountIndirectHold(), and when the ZED wakes and connects, this count drops to zero as expected. However, many of the packets are never received by the ZED. These packets are custom ZCL clusters, addressed with short addresses.

Using Wireshark, a TI Zigbee sniffer dongle, and a bit of glue (whsniff), all over-the-air network activity can be clearly seen and decrypted.

When a ZED connects and goes through the rejoin request process, the "data request" packets are clearly seen, but the queued packet is only sometimes seen being returned. Yet the queued packet(s) are always removed from the queue buffer well before timing out (set to 127s).

What happens to the queued packets? Are they de-queued ready to be sent, but dropped? Could there be a setting we have missed which allows only some (but identical) packets through? Is this a known problem or feature?

Kind regards

  • Hello Gary,

    I assume that you have set SysConfig -> Z-Stack -> Advanced -> Network Indirect Message Timeout to 127 seconds?  What other changes to the network configuration have been made?  How often does the ZED poll for data? Are multiple packets being queued for several sleepy ZEDs?  Do you monitor the return value of AF_DataRequest for errors?  You may want to consider altering nwk_globals.c macros including NWK_MAX_DATABUFS_* and NWK_INDIRECT_MSG_MAX_PER depending on your system needs.  Feel free to provide a sniffer log if it could help provide more issue details. 

    Regards,
    Ryan

  • Is the current implementation of the indirect message queue first-in-first-out, meaning that a new indirect message cannot be put into the queue when the indirect message buffer is full?

  • That's correct YK, if the indirect message buffer is full then ZBufferFull is expected to be returned.

    Regards,
    Ryan

  • Thanks for confirming.

  • Hi

    The buffer has more than sufficient space (8 packets), and we only queue 1 or 2 packets at a time. We confirm that queuing was successful by checking the return value for "== ZSuccess".

    Kind regards

    Gary

  • Hi Ryan

    Yes, the timeout is set to 127s using the Network Indirect Messages Timeout setting.

    The ZED poll rate is currently set to 100ms, but it has also been 1s and 3s; all in an attempt to ensure collection of pending packets.

    Current testing is one ZC, one ZED, and 1 queued packet.

    Sniffer log to follow.

    Kind regards

    Gary Partis

  • Hi again Ryan

    Please find attached a PCAP file from Wireshark, showing a ZED connecting to a ZC, with a queued packet in the ZC, but not being transferred.

    Kind regards

    Gary Partis

    Hi again, I don't think the file was correctly attached in my last message, so here is another attempt.

    example of zigbee not getting zc to zed packet.zip

  • Can you please share the NWK key?  The packets are encrypted on my end.

    Regards,
    Ryan

  • Hi Ryan

    Ah - sorry - Doh!

    Default Network Key : AAABACADAEAFBABBBCBDBEBFCACBCCCD

    TC Link Key : 5E45A7A15E45A7A25E45A7A301020304

    Kind regards

    Gary
  • Can you please identify an instance where the queued packet is successfully transmitted?  I cannot see any ZCL messages from the ZC on the sniffer log.  Please also answer the following:

    • What channel are you using? 
    • Are you evaluating LaunchPads or a custom EVM?
    • What is the transmit power value?
    • How close together are the two nodes and where is the sniffer located?
    • Does your environment include physical barriers?
    • Is there a lot of channel noise or Wi-Fi/BLE interference?
    • Can you share a code snippet of the API used to send the packet?
    • Have you monitored zstackmsg_CmdIDs_AF_DATA_CONFIRM_IND for return messages?
    • Are there any other network configuration changes worth mentioning?

    Given your explanation of the poll rate and number of queued messages, I suspect a channel transmission failure rather than timed-out network messages.

    Regards,
    Ryan

  • Hi Ryan

    I'll answer each point in turn

    1. We are using channel 15 as channel 11 is used by another zigbee network
    2. We are using custom CC1352P boards which have been heavily tested with proprietary RF comms for another project
    3. TX power is the default of 0dBm
    4. Two nodes are around 20cm apart, with the sniffer being a further 1.5m away
    5. There are no physical barriers, everything sitting on desks in same office
    6. There is no BLE, but some WiFi activity in the area (difficult to avoid nowadays!)
    7. Data TX code is shown below
    8. When monitoring zstackmsg_CmdIDs_AF_DATA_CONFIRM_IND, the ZED connects, callback made, and zstackmsg_afDataConfirmInd_t structure contains a status value of 0xf0, with correct endpoint and custom cluster ID. Standard list of allowable status values does not include 0xf0, but I believe these status values can be propagated up from MAC level.
    9. Nothing that I can think of. We used the Zigbee generic app as the initial template and built from there. Zigbee & RF settings were left largely unchanged (poll rates excepted).

            /* get some memory for packet */
            zclReportCmd_t *pReportCmd = OsalPort_malloc( sizeof( zclReportCmd_t ) + sizeof( zclReport_t ) );
            if( pReportCmd != NULL )
            {
                afAddrType_t address;
                zstack_getZCLFrameCounterRsp_t rsp;
    
                /* create temporary address block of remote sensor device, with short address */
                address.addr.shortAddr = sensorid;
                address.addrMode       = afAddr16Bit;
                address.endPoint       = APP_ENDPOINT;
                address.panId          = 0x0000;
    
                /* fill in the attribute information for the time and send it */
                pReportCmd->numAttr = 1u;
                pReportCmd->attrList[ 0u ].attrID      = attrid;
                pReportCmd->attrList[ 0u ].dataType    = datatype;
                pReportCmd->attrList[ 0u ].attrData    = block;
                Zstackapi_getZCLFrameCounterReq( appServiceTaskId, &rsp );
                okay = ( zcl_SendReportCmd( APP_ENDPOINT, &address, clusterid, pReportCmd, ZCL_FRAME_SERVER_CLIENT_DIR, true, rsp.zclFrameCounter ) == ZSuccess );
                /* temporary test code: sanity-check the number of packets now held in the indirect queue */
                int32_t cnt = nwkDB_CountIndirectHold();
    
                /* release memory */
                OsalPort_free( pReportCmd );
            }
            else
            {
                /* could not allocate memory - so fail */
                okay = false;
            }
    

    I also grab the pending queue size as temp test code to sanity check stuff.

    One thing has come to light, however: could the zstackmsg_CmdIDs_AF_DATA_CONFIRM_IND status value 0xf0 come from MAC_TRANSACTION_EXPIRED (value of 0xf0)? The time between queueing and the callback was around 5s; nowhere near 127s. Plus, why the MAC_TRANSACTION_EXPIRED when the ZED connects? Too much of a coincidence...?!
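
    For logging the confirm status while debugging, a small decoder along the following lines may help. It is only a sketch: the SKETCH_ names and both functions are hypothetical and not part of Z-Stack. The 0xF0 value is the one quoted above; the other two MAC values are assumed to be the usual IEEE 802.15.4 status codes and should be verified against the stack headers before relying on them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local names for this sketch only; they are not Z-Stack identifiers. */
#define SKETCH_SUCCESS                     0x00u
#define SKETCH_MAC_CHANNEL_ACCESS_FAILURE  0xE1u  /* assumed IEEE 802.15.4 value */
#define SKETCH_MAC_NO_ACK                  0xE9u  /* assumed IEEE 802.15.4 value */
#define SKETCH_MAC_TRANSACTION_EXPIRED     0xF0u  /* value quoted in this thread */

/* Map a raw data-confirm status byte to a readable name for log output. */
const char *sketch_statusName(uint8_t status)
{
    switch (status)
    {
        case SKETCH_SUCCESS:                    return "SUCCESS";
        case SKETCH_MAC_CHANNEL_ACCESS_FAILURE: return "MAC_CHANNEL_ACCESS_FAILURE";
        case SKETCH_MAC_NO_ACK:                 return "MAC_NO_ACK";
        case SKETCH_MAC_TRANSACTION_EXPIRED:    return "MAC_TRANSACTION_EXPIRED";
        default:                                return "UNKNOWN";
    }
}

/* Optional start-up check that the table matches the value seen in the confirm. */
void sketch_selfCheck(void)
{
    assert(strcmp(sketch_statusName(0xF0u), "MAC_TRANSACTION_EXPIRED") == 0);
}
```

    Passing the status field of the confirm structure to sketch_statusName() makes it easier to spot whether a failure comes from the MAC layer or from a higher layer.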

    With regards to channel transmission failure, why just the queued packets, and nothing else? Again, too much of a coincidence.

    Kind regards

    Gary Partis

  • Hey Gary,

    Thank you for providing all of these details!  Your environment appears to be fine.  Your tested poll rates prove that Network Indirect Message Timeout doesn't even need to be set to its maximum value.  0xF0 indicates a ZMacTransactionExpired but I think this may be due to an incorrect destination address.  Are you using the shortAddr and endPoint for the destination ZED?  The panId parameter shouldn't  matter as this is not INTER_PAN communication but it is typically set to _NIB.nwkPanId.  If you were to bind the involved clusters on the ZC then you could use indirect addressing.  I'm also not sure whether zcl_SendReportCmd is the correct API based on your application's needs.  Please consider using Zstackapi_AfDataReq() and review the Zigbee Fundamentals SLA.

    Regards,
    Ryan

  • Hi Ryan

    Many thanks for your reply. I checked and double checked, and the short address is always correct. I also ensured all references to PanId were always set to _NIB.nwkPanId instead of zero, but as you quite rightly pointed out, this made no difference.

    However, the comment adjacent to the definition of MAC_TRANSACTION_EXPIRED says "device did not respond before the transaction expired or was purged". Whenever the ZED connects, it always does a "rejoin" request and is given a new (but the same) short address. Could this cause the purging of packets for that ZED?

    The fact that the ZED is allocated an address, although the same address, does smell of something I am not doing correctly; i.e. why does it request to rejoin when it ought not to need to?

    Please note that our ZED device powers down completely when "sleeping", and not into a CC1352P sleep state. Wake up is performed by an external timer.

    Kind regards

    Gary

  • Hey Gary,

    I don't believe the ZED is assigned the same short address on rejoin, as it uses the short address in the NWK header of the Rejoin Request.  It is more likely that the ZC Rejoin Response confirms the ZED's ability to rejoin as it has already been previously associated.  You can check its association and child status on the ZC with AssocCount and AssocIsChild, respectively.  If these were not true, though, I would expect different error messages or over-the-air behavior.  How long does a ZED sleep before a wake up?  Does it possibly age out of the ZC's child table (256 minutes by default)?  Once again, I expect this would be corrected on the rejoin and not affect behavior.  

    By looking through the Z-Stack source code, I can confirm that the ZC will remove any queued messages for a rejoining device when it receives the Rejoin Request so that no previously queued messages get in the way of the Rejoin Response.  You may need to account for this in your application.

    Regards,
    Ryan

  • Hi Ryan

    Once again, many thanks for your reply.

    The ZED wakes up after a couple of seconds, to ease testing, yet it still does a “rejoin request”. The PCAP file sent earlier in the week (above) shows this: the second wakeup in the file, at ~8.41s, does a “beacon request” and a “rejoin request”. It is (re)allocated the same short address, which is also visible in the PCAP file. It has had the same address for about 3 weeks now.

    However, your last message says something important, “looking through the Z-Stack source code, I can confirm that the ZC will remove any queued messages for a rejoining device”. This is indeed what is happening.

    Checking a Zigbee reference, it says ZEDs will issue a “rejoin request” after such events as a power outage. This is also happening as we kill power to the CC1352P via external control to reduce overall board power consumption to ~50nA. The CC1352P reboots for every sensor capture and Zigbee communication.

    Now that I know why it is happening, I need to compensate for it. I’ll probably do this by queuing messages myself, then passing them to the Zigbee stack after a ZED has rejoined (probably with a Check Announce callback message).
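
    A minimal sketch of that workaround is shown below, assuming the application can detect when a given ZED has finished rejoining. Every name in it (the hold_ functions, HOLD_SLOTS, HOLD_MAX_PAYLOAD, and the send callback type) is hypothetical and not part of Z-Stack. hold_sendStub only stands in for the real send call, such as the zcl_SendReportCmd() call shown earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HOLD_SLOTS        8u   /* hypothetical capacity of the application queue */
#define HOLD_MAX_PAYLOAD  32u  /* hypothetical payload limit per message */

typedef struct
{
    bool     used;
    uint16_t shortAddr;
    uint16_t clusterId;
    uint8_t  len;
    uint8_t  payload[HOLD_MAX_PAYLOAD];
} holdEntry_t;

/* Callback that hands one message to the stack; returns true when accepted. */
typedef bool (*holdSendFn_t)(uint16_t shortAddr, uint16_t clusterId,
                             const uint8_t *payload, uint8_t len);

static holdEntry_t holdQueue[HOLD_SLOTS];
unsigned hold_stubCalls = 0u;

/* Stand-in for the real send call; it only counts how often it is invoked. */
bool hold_sendStub(uint16_t shortAddr, uint16_t clusterId,
                   const uint8_t *payload, uint8_t len)
{
    (void)shortAddr; (void)clusterId; (void)payload; (void)len;
    hold_stubCalls++;
    return true;
}

/* Keep a message in the application instead of the stack's indirect queue. */
bool hold_put(uint16_t shortAddr, uint16_t clusterId,
              const uint8_t *payload, uint8_t len)
{
    assert(payload != NULL || len == 0u);
    if (len > HOLD_MAX_PAYLOAD) { return false; }
    for (size_t i = 0u; i < HOLD_SLOTS; i++)
    {
        if (!holdQueue[i].used)
        {
            holdQueue[i].used      = true;
            holdQueue[i].shortAddr = shortAddr;
            holdQueue[i].clusterId = clusterId;
            holdQueue[i].len       = len;
            if (len > 0u) { memcpy(holdQueue[i].payload, payload, len); }
            return true;
        }
    }
    return false; /* application queue is full */
}

/* Number of messages currently held for one device. */
size_t hold_count(uint16_t shortAddr)
{
    size_t n = 0u;
    for (size_t i = 0u; i < HOLD_SLOTS; i++)
    {
        if (holdQueue[i].used && holdQueue[i].shortAddr == shortAddr) { n++; }
    }
    return n;
}

/* Call after the ZED has rejoined: pass its held messages to the stack.
 * Messages the callback rejects stay queued for the next attempt. */
size_t hold_flush(uint16_t shortAddr, holdSendFn_t send)
{
    size_t sent = 0u;
    for (size_t i = 0u; i < HOLD_SLOTS; i++)
    {
        if (holdQueue[i].used && holdQueue[i].shortAddr == shortAddr &&
            send(shortAddr, holdQueue[i].clusterId,
                 holdQueue[i].payload, holdQueue[i].len))
        {
            holdQueue[i].used = false;
            sent++;
        }
    }
    return sent;
}
```

    Because the messages only reach the stack after the Rejoin Request has been processed, the purge on rejoin no longer affects them. Note that this simple slot array does not guarantee delivery in insertion order once slots have been freed and reused; a sequence counter per entry would be needed for that.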

    Kind regards

    Gary