CC2652R: High packet loss on Zigbee devices routed through other devices

Part Number: CC2652R
Other Parts Discussed in Thread: CC2530, Z-STACK, SIMPLELINK-CC13XX-CC26XX-SDK

Hi, I'm seeing some weird behavior on my Zigbee mesh. I have a mesh with several different generations of our IoT product line. The coordinator is based on a CC2530 and there are routers on the mesh based on both the CC2530 and CC2652.

I have a handful of CC2652 units that are seeing significant packet loss, in the neighborhood of 25% or so. I've been using Wireshark to sniff the network and I've found that the problematic units all seem to be routing to the coordinator through other intermediate devices, which of course is quite proper for a mesh network. In the sniffer, I can see the problem devices send out packets and in the normal case, I can see the intermediate device repeat the packet, sending it to the coordinator. When we see packet loss, I can still see the initial packet go out but the intermediate device never repeats it and the coordinator never sees it.

The screenshot below is a Wireshark capture of a normal sequence. The first packet in the capture, packet number 2097, is coming from the device with address 00:12:4b:00:25:cb:6f:cc. This device has been assigned a short MAC of bba4. The initial packet gets the 802.15.4 Ack response. Next, in packet number 2099, the intermediate routing device with address 00:12:4b:00:25:cb:6f:a5 repeats the packet, sending it on to the coordinator.

This next screenshot is a case where we see packet loss. Device 25:cb:6f:cc is sending out its packet and gets Ack'ed. The intermediate routing device never forwards it on though and the coordinator never reports seeing it. The next packet the sniffer sees is some time later and is a Link Status packet from a completely unrelated device.

I don't believe that this is a case of packets getting lost over the air. The units experiencing packet loss are physically in very close proximity to other devices that are seeing 0% packet loss.

When the problematic devices are put on a mesh where they are not routed through an intermediate device, the same devices see very, very low levels of packet loss.

I've got a current working hypothesis with very little hard evidence to back it up. Based solely on the RSSI values of the 802.15.4 Ack packets, I suspect that when we see packet loss it's the coordinator that is sending the Ack to the initial packet and not the intermediate routing device. I haven't got the visibility into the depths of the Zigbee stack to say if this is possible or not. I'll admit, I am leaping to that conclusion with very flimsy evidence.  Is there any way in Wireshark to see which device is sending the 802.15.4 Ack?

Any thoughts on what may be going on are appreciated. Please let me know if there is any other information that would help.

Regards,
Grant China
WattIQ, Inc.

  • Hi Grant,

    Are the intermediary devices CC2530s or CC2652Rs, and are there any ZEDs or are these all ZRs?  What is the NWK destination of the device as compared to the MAC destination?  You can select the packet in Wireshark to learn more about its origin and intent.  Please provide the sniffer log and exact packet numbers in question for a more detailed review.  Also note that SimpleLink CC13XX / CC26XX Z-Stack devices are tested for large network stability quarterly.  Some recommendations are provided in SWRA650.  I'd also recommend adopting a different ZC as the CC2530 has significant RAM limitations, and its Z-Stack solution has not been updated for several years (unlike the CC2652R).

    Regards,
    Ryan

  • Hi Ryan, thanks for the response. The mesh is a good mix of CC2652Rs and CC2530s but for the particular CC2652R ZR that I'm looking at in the examples below, that ZR is being routed through a CC2530 ZR. Aside from the ZC, all the devices on the mesh are ZRs.

    Here's my capture file: app.box.com/.../z430requg287tl9yo4y0q7xfhm5lklc9

    Let me know if you have any trouble downloading it. I'm sorry that I haven't got a minimal capture for you to look at, this one is from a mesh with several ZRs on it. It's a pretty busy capture file.

    First, let's take a look at a data packet that succeeds in getting all the way to the ZC. Take a look at the 4 packet sequence starting at packet #2097. This is a packet being sent by the ZR to the ZC. In packet #2097, the NWK destination is 0x0000. The NWK source is 0xbba4, which is the originating ZR. The WPAN source MAC of this packet is 00:12:4b:00:25:cb:6f:cc. The next packet, #2098, is the Ack. Then in packet #2099, we see another ZR device, WPAN source MAC 00:12:4b:00:25:cb:6f:a5, repeating the exact same packet, same NWK src and dst as the first data packet. This packet also gets Ack'ed by packet #2100. As I said, the ZC sees this data packet fine and sends it up to our app.

    For an example of a packet that is lost, look at packet #2026. This is a data packet from the same initial ZR but this packet never makes it to the ZC. Once again, the NWK destination of this data packet is 0x0000. Again, the NWK source is 0xbba4 and the WPAN source MAC of this packet is 00:12:4b:00:25:cb:6f:cc. This packet gets Ack'ed by packet #2027. And that's it. The packet is never seen by the ZC.

    This pattern is very consistent. For this particular originating ZR, NWK source 0xbba4, every data packet that makes it to the ZC gets repeated by the ZR with WPAN source MAC 00:12:4b:00:25:cb:6f:a5. Every data packet that gets dropped gets transmitted by the originating ZR but I never see it repeated. It gets Ack'ed but that's it, the ZC never sees it.

    I've double clicked on the Ack packets but Wireshark doesn't provide any detail about who is transmitting the Ack. I don't think that information is anywhere in the Ack packet itself.
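
    On the question of who sends the Ack: an IEEE 802.15.4 immediate Ack frame carries only a frame control field, a sequence number, and the FCS, so there are no address fields for Wireshark to show. The usual way to attribute an Ack is to pair it with the preceding frame that has the same sequence number; in normal operation the Ack comes from that frame's 802.15.4 destination. Below is a minimal C sketch of that pairing, assuming the raw frame bytes (including the two trailing FCS bytes) are available. The names `parse_ack` and `ack_matches` are illustrative and not part of any TI or Wireshark API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The IEEE 802.15.4 frame type occupies bits 0-2 of the frame control field. */
#define FRAME_TYPE_MASK 0x0007u
#define FRAME_TYPE_ACK  0x0002u

/* An immediate Ack is 5 octets: frame control (2), sequence number (1), FCS (2). */
#define ACK_FRAME_LEN 5u

/* Returns true if buf holds an immediate Ack and stores its sequence number. */
bool parse_ack(const uint8_t *buf, size_t len, uint8_t *seq_out)
{
    if (len != ACK_FRAME_LEN) {
        return false;
    }
    /* The frame control field is transmitted little-endian. */
    uint16_t fcf = (uint16_t)(buf[0] | ((uint16_t)buf[1] << 8));
    if ((fcf & FRAME_TYPE_MASK) != FRAME_TYPE_ACK) {
        return false;
    }
    *seq_out = buf[2];
    return true;
}

/* Hypothetical helper: does this Ack answer a frame sent with data_seq? */
bool ack_matches(const uint8_t *ack, size_t len, uint8_t data_seq)
{
    uint8_t seq;
    return parse_ack(ack, len, &seq) && seq == data_seq;
}
```

    Since the Ack only echoes the sequence number, the acknowledging device has to be inferred from the 802.15.4 destination address of the data frame that was acknowledged.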

    I'd appreciate it if you can help me understand what's going on in the sniffer log and how it's leading to packet loss on our mesh.

    Regarding adopting a different chip for the ZC, we do have a plan to transition to a CC2652R. Of course, CC2652Rs have been extremely difficult to come by recently. You wouldn't happen to have a few thousand lying around that we can take off your hands, would you? :-)

    Regards,
    Grant

  • Thanks for the sniffer log!  In packet 2097, the ZC (0x0000) is the NWK address (final destination) whereas 0xcd82 is the MAC address (next hop).  For packet 2026, the ZC is both the NWK and MAC address, meaning only one packet needed to be transferred directly and not through an intermediary router.  Something occurs between packets 2026 and 2097 for which the 0xbba4 router decides that routing messages through 0xcd82 is more stable than sending directly to 0x0000.  It's the NWK addresses which are being shown in your screenshots above, not the MAC addresses.  I cannot investigate further at the moment as I am probably missing the NWK key since all of the Zigbee messages show as "Bad FCS".
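
    The distinction above can be written down as a tiny C sketch: a frame is on its final hop when the 802.15.4 (MAC) destination equals the NWK destination, and it is being relayed otherwise. The struct and function names here are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the two destination fields of one sniffed frame. */
typedef struct {
    uint16_t mac_dst; /* IEEE 802.15.4 destination: the next hop */
    uint16_t nwk_dst; /* Zigbee NWK destination: the final recipient */
} hop_info_t;

/* True when the frame is addressed straight to its final destination. */
bool is_direct(hop_info_t h)
{
    return h.mac_dst == h.nwk_dst;
}
```

    With the values from the capture, packet 2097 (MAC 0xcd82, NWK 0x0000) is a relayed frame, while packet 2026 (MAC 0x0000, NWK 0x0000) is a direct one.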

    How are you observing packet loss from your ZC application and quantifying it as 25%?  Also, how many nodes in total are in your Zigbee network and how often is each node reporting to the ZC?

    Regards,
    Ryan

  • Ah, thanks, I wasn't looking at the Source and Destination at the 802.15.4 layer (so many layers and fields!) and looking at that helps a lot.  It makes sense to me that the router might change routing paths as conditions change but it looks like it's changing paths really frequently.  There's one point where it's changing paths on every report.  Here's a sequence of successive reports:

    • Packet 2313 is going through 0xcd82 and is received by the app
    • Packet 2391 is going straight to 0x0000 and is not received by the app
    • Packet 2481 is going through 0xcd82 and is received by the app
    • Packet 2556 is going straight to 0x0000 and is not received by the app
    • Packet 2609 is going through 0xcd82 and is received by the app

    Considering the data packets are getting Ack'ed, maybe the problem is on the ZC side?

    Sorry, when I said MAC, I really meant the IEEE 802.15.4 Extended Source address, which I always think of like an ethernet MAC.

    The "Bad FCS" message is showing up because the TI CC2531 sniffer overwrites the FCS field with the RSSI and LQI values.  You can tell Wireshark to decode this properly by going to the Preferences and finding "IEEE 802.15.4" under Protocols.  Change the "FCS format" value to "TI CC24xx metadata".
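
    For anyone decoding those two trailing bytes by hand, here is a minimal C sketch of the layout as I understand it for TI CC24xx-style radios: the first byte is a signed RSSI value, and the second byte carries a CRC-OK flag in bit 7 and a 7-bit correlation value in bits 0-6. The type and function names are made up for this example, and converting the raw RSSI to dBm may need a chip-specific offset that is not applied here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative container for the metadata that replaces the 2-byte FCS. */
typedef struct {
    int8_t  rssi;        /* raw signed RSSI byte as reported by the radio */
    uint8_t correlation; /* 7-bit correlation value, used as an LQI indicator */
    bool    crc_ok;      /* true if the radio's own FCS check passed */
} cc24xx_meta_t;

/* Decode the two bytes found where the FCS would normally be. */
cc24xx_meta_t decode_cc24xx_meta(uint8_t byte0, uint8_t byte1)
{
    cc24xx_meta_t m;
    /* Interpret byte0 as two's complement without relying on cast behavior. */
    m.rssi = (byte0 >= 128) ? (int8_t)((int)byte0 - 256) : (int8_t)byte0;
    m.correlation = (uint8_t)(byte1 & 0x7Fu);
    m.crc_ok = (byte1 & 0x80u) != 0;
    return m;
}
```

    Because the real FCS is no longer present in the capture, the CRC-OK bit is the only indication of whether the sniffer received the frame intact.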

    Our ZC application forwards data packets from each ZR up to our web application.  Each data packet has a sequence number in it so we can tell when we're missing reports.  Looking at it closer, it's probably closer to 15% than 25%.  Each ZR node sends up a report every 15 seconds.
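
    As a rough illustration of how a loss figure can be derived from sequence numbers, here is a C sketch. It assumes a 16-bit counter that increments by one per report and wraps around, and that reports arrive in order without duplicates; the actual width and behavior of our counter are not stated here, and `loss_fraction` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fraction of reports missing between the first and last received report.
 * seq[] holds the sequence numbers in arrival order (assumed 16-bit, wrapping).
 */
double loss_fraction(const uint16_t *seq, size_t n)
{
    if (n < 2) {
        return 0.0;
    }
    uint32_t expected = 0;
    for (size_t i = 1; i < n; i++) {
        /* Modulo-65536 difference handles counter wrap-around. */
        expected += (uint16_t)(seq[i] - seq[i - 1]);
    }
    uint32_t received = (uint32_t)(n - 1);
    if (expected <= received) {
        return 0.0; /* nothing missing, or duplicates violated the assumptions */
    }
    return (double)(expected - received) / (double)expected;
}
```

    For example, receiving reports 10, 11, 13, 14 means one of four expected reports after the first was missed, giving a loss fraction of 0.25.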

    Regards,

    Grant

  • The "Bad FCS" message is showing up because the TI CC2531 sniffer overwrites the FCS field with the RSSI and LQI values.  You can tell Wireshark to decode this properly by going to the Preferences and finding "IEEE 802.15.4" under Protocols.  Change the "FCS format" value to "TI CC24xx metadata".

    Thanks for this hint. After making the change, I am now officially seeing encrypted payloads, since I do not know the NWK key.

    Considering the data packets are getting Ack'ed, maybe the problem is on the ZC side?

    It's possible that the ZC does not consider 0xBBA4 to be a neighbor and is thus discarding direct messages from this device.  What are the contents of the Link Status messages of both 0x0000 and 0xBBA4 during this time?  If you can debug the CC2530 ZC then are you able to observe messages directly from this ZR reaching the application, or does it seem to be blocked by the pre-built Z-Stack libraries?  What version of Z-Stack is the CC2530 ZC?  Keep in mind the E2E post about Known Issues and Limitations.

    Regards,
    Ryan

  • Ryan,

    Looking at the Link Status messages is interesting. I only looked at the Link Status messages around a couple of data reports since it's time consuming digging through Wireshark captures.

    The Link Status messages from 0x0000 were very consistent. The neighbor entry for 0xBBA4 always has an incoming value of 0 and an outgoing value of 0.

    The Link Status messages from 0xBBA4 were less consistent. The neighbor entry for 0xCD82 always has an incoming value of 1 and an outgoing value of 1. But the neighbor entry for 0x0000 showed a pattern. Every time the ZC sees the data report packet, the neighbor entry for 0x0000 always has an incoming value of 1 and an outgoing value of 3. As a reminder, this is the case where the data report gets routed through 0xCD82. Every time the ZC doesn't see the report packet, the neighbor entry always has an incoming value of 1 and an outgoing value of 0. This is the case where the data report goes straight to 0x0000. You may be on to something thinking that the ZC doesn't consider 0xBBA4 a neighbor but that's some deep stack magic.

    Here's the same information in more of a tabular form:

    Link Status from 0x0000                   Link Status from 0xBBA4
    Neighbor entry for 0xBBA4: in 0, out 0    Neighbor entry for 0x0000: in 1, out 3
    Neighbor entry for 0xCD82: in 1, out 1    Neighbor entry for 0xCD82: in 1, out 1

    Packet #2313 (good packet) 0xBBA4 -> 0xCD82 -> 0x0000

    Link Status from 0x0000                   Link Status from 0xBBA4
    Neighbor entry for 0xBBA4: in 0, out 0    Neighbor entry for 0x0000: in 1, out 0
    Neighbor entry for 0xCD82: in 1, out 1    Neighbor entry for 0xCD82: in 1, out 1

    Packet #2391 (bad packet) 0xBBA4 -> 0x0000

    Link Status from 0x0000                   Link Status from 0xBBA4
    Neighbor entry for 0xBBA4: in 0, out 0    Neighbor entry for 0x0000: in 1, out 3
    Neighbor entry for 0xCD82: in 1, out 1    Neighbor entry for 0xCD82: in 1, out 1

    Packet #2481 (good packet) 0xBBA4 -> 0xCD82 -> 0x0000

    Link Status from 0x0000                   Link Status from 0xBBA4
    Neighbor entry for 0xBBA4: in 0, out 0    Missing!
    Neighbor entry for 0xCD82: in 1, out 1

    Packet #2556 (bad packet) 0xBBA4 -> 0x0000
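
    For reference, the in/out values above come from the per-neighbor entries of the Link Status command. As I understand the Zigbee NWK format, each entry is three bytes: a 16-bit neighbor address (little-endian) followed by one status byte with the incoming cost in bits 0-2 and the outgoing cost in bits 4-6. Here is a minimal C sketch of decoding one entry. The names are illustrative, and treating a cost of 0 as "no usable link information" is my reading rather than something confirmed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* One decoded entry from a Link Status command's link status list. */
typedef struct {
    uint16_t neighbor;      /* neighbor's 16-bit network address */
    uint8_t  incoming_cost; /* bits 0-2 of the status byte */
    uint8_t  outgoing_cost; /* bits 4-6 of the status byte */
} link_status_entry_t;

/* Decode a 3-byte entry: address low byte, address high byte, status byte. */
link_status_entry_t decode_link_status_entry(const uint8_t raw[3])
{
    link_status_entry_t e;
    e.neighbor = (uint16_t)(raw[0] | ((uint16_t)raw[1] << 8));
    e.incoming_cost = (uint8_t)(raw[2] & 0x07u);
    e.outgoing_cost = (uint8_t)((raw[2] >> 4) & 0x07u);
    return e;
}

/* Hypothetical check: both directions report a nonzero cost. */
bool link_known_both_ways(link_status_entry_t e)
{
    return e.incoming_cost != 0 && e.outgoing_cost != 0;
}
```

    Under that reading, the "in 1, out 3" entry counts as known in both directions, while the "in 1, out 0" and "in 0, out 0" entries do not.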

    We are using ZStack-CC2530 version 2.5.1a on the ZC. I haven't got an easy way to debug on the ZC at the moment.

    Nothing in the E2E Known issues leapt out at me as something that could cause this symptom but of course these issues are for version 3.0.2 of the stack.

    Regards,
    Grant

  • Hi Grant,

    Thanks for taking a closer look at the Link Status. I believe this confirms that the 0xBBA4 ZR believes the 0x0000 ZC is a neighbor whereas the ZC itself does not think the same of the ZR.  Thus this legacy version of Z-Stack is deciding to discard the messages instead of improving the routing mechanism or providing feedback over-the-air.  Since a fix would involve updates to a legacy Z-Stack which TI has not supported for several years, the best option would be to upgrade the ZC to Z-Stack 3.0.2 or, better still, consider a SimpleLink CC26X2 solution which uses SIMPLELINK-CC13XX-CC26XX-SDK with ongoing quarterly stack updates.

    Regards,
    Ryan

  • Ryan, it's unfortunate for us to hear that the problem is likely on the coordinator side. We've already got a non-trivial number of units in the field running on the CC2530 with version 2.5.1a of ZStack and upgrading the coordinators in the field would be a significant challenge.

    Just a note, along with our CC2530 ZC, we've had our CC2530 ZRs in the field for years now and have never seen this issue. One thing that I've noticed is that we have configured our CC2530 ZRs to remember what mesh they've joined and immediately start sending data reports on power up.

    Every time our new CC2652 product powers up, it does a complete mesh join negotiation, starting with a Beacon Request. It does this even if it has already successfully joined the same mesh in the past. Is there a way on the CC2652 to have the device remember the last mesh that it was on and skip the negotiation process? On power up, I call the following code snippet:

    /* Request BDB commissioning at power-up with no commissioning mode bits set. */
    zstack_bdbStartCommissioningReq_t zstack_bdbStartCommissioningReq;
    zstack_bdbStartCommissioningReq.commissioning_mode = 0;
    Zstackapi_bdbStartCommissioningReq(appServiceTaskId, &zstack_bdbStartCommissioningReq);

    And I also have some follow up questions.

    • You mentioned that you do a large network stability test quarterly. Is this a mixed environment with different versions of chipsets and software stacks? Or is your quarterly test done with homogeneous devices?
    • You provided a link to the E2E post about known issues for Z-Stack 3.0.2. I don't suppose you have a link to known issues in Z-Stack 2.5.1a?

    Thanks,
    Grant

    • The Zigbee End Device does remember the last network it was on through its saved NV memory and will send a Rejoin Request if it sees a Beacon Response with the network information it recognizes.
    • There are different chipsets using the same SIMPLELINK-CC13XX-CC26XX-SDK version each quarter.
    • You would have to check the Bug Fixes sections of the Release Notes for versions after Z-Stack 2.5.1a, however all except what remains in Z-STACK-ARCHIVE have been removed from the TI website.

    Regards,
    Ryan