CC2652P7: Critical Z-Stack Firmware Bug

Other Parts Discussed in Thread: CC2652P7, Z-STACK

CRITICAL Z-STACK FIRMWARE BUG: CC2652P7 SRSP AF Data Buffer Overflow / Lockup during high-volume Cluster Requests (e.g., GenTime)

I am managing a large-scale deployment controlling safety-critical infrastructure, including automated commercial gates and traffic light systems. Under high load, the CC2652P7 regularly locks up completely due to an unhandled Synchronous Response (SRSP) / Application Framework (AF) data buffer overflow.

The issue triggers when end-devices flood the coordinator with specific cluster requests—specifically GenTime queries (Cluster 0x000A). Instead of gracefully discarding unhandled requests, dropping timed-out packets, or returning a clean error code to the host application (Zigbee2MQTT/zigbee-herdsman), the Z-Stack's internal memory management for 'AF - dataRequest' completely saturates. Once the SRSP buffer is full, the entire serial interface freezes, requiring a hard power cycle and a complete NVRAM flash to restore functionality every few hours.

As an enterprise-grade component provider, TI cannot expect customers running critical infrastructure to implement absurd workarounds like physically cutting power to the MCU via external relays just because the Z-Stack fails to clear its own asynchronous command buffers. A simple web search proves that the community is plagued by these "SRSP AF" and Z-Stack crash loops on the CC2652 series.

I need immediate technical clarification from your engineering team on the following points:
1. Why does the Z-Stack fail to implement a strict FIFO/LIFO timeout mechanism to clear stalled AF requests from the memory before a total lockup occurs?
2. Is there an undocumented configuration parameter or a specific patch in the SimpleLink SDK that prevents the SRSP interface from freezing when flooded with unhandled cluster requests?
3. What is TI’s official roadmap to fix this severe stability flaw in the Z-Stack core firmware?

Please forward this directly to a Senior Z-Stack Systems Engineer. This is a production-stopping issue for a safety-critical environment.

  • Hi Dirk,

    Please provide the SimpleLink F2 SDK version you are evaluating, as well as any debug or sniffer logs if you have them.  Are you using the Zigbee2MQTT ZNP patches and can you provide further details as to this version as well?    and I engaged on this topic a few years ago, here are the references:

    SIMPLELINK-CC13X2-26X2-SDK: Firmware not stable since SDK 6.20.00.29
    https://github.com/Koenkk/Z-Stack-firmware/discussions/483
    https://github.com/Koenkk/Z-Stack-firmware/discussions/496

    Regards,
    Ryan

  • Technical details regarding the Z-Stack crash under high-density ZCL traffic on CC2652P7

    Hi Ryan,

    Thank you for the response. To prevent the usual 'edge-case' or 'out-of-spec' assumptions, let me provide you with the exact technical environment and the protocol-level behavior that triggers this critical Z-Stack freeze.

    1. The Environment (High-Density Enterprise Mesh)
    - Coordinator: SMLIGHT SLZB-06P7 based on the CC2652P7 chip (124 kB RAM available for Z-Stack).
    - Network Size: 120 physical Zigbee devices (111 Routers, 9 End Devices).
    - Traffic Profile: 31 of these routers are industrial-grade smart breakers that constantly stream energy metrics (Voltage, Current, Power) back to the coordinator via reporting.

    2. The Trigger Condition (The "ZCL Time-Sync Spam")
    The crash is reliably triggered when a specific brand of light bulbs (Manufacturer: Eglo/AwoX) floods the coordinator with ZCL Time cluster read requests every few seconds, each of which the coordinator has to answer with a genTime.readRsp.

    3. The Z-Stack Failure Mode (Why this is a Core Bug)
    When the high frequency of incoming energy metrics coincides with the rapid generation of genTime responses by the coordinator, the serial transmit buffer of the Z-Stack completely clogs up.

    Instead of gracefully dropping unacknowledged or low-priority packets, the Z-Stack exhibits the following behavior:
    - Latency Spike: ZCL command response latency suddenly jumps from <100 ms to over 6000 ms.
    - Serial Backlog: The communication between the Zigbee core and the serial interface (UART/Network bridge) freezes.
    - Fatal Lockup: The CC2652P7 chip stops responding entirely. It requires a hard power cycle (physical voltage cutoff via relay) to recover. A warm software reset is often not enough because the internal heap/NVRAM state remains corrupted.

    4. The Question to TI Engineering
    As a software engineer, I see a clear architectural flaw in how the Z-Stack handles asynchronous traffic peaks on the Application Layer.

    - Why does the Z-Stack allow the serial/ZCL response buffer to completely exhaust the 124 kB RAM instead of throttling incoming requests or dropping expired outbound packets?
    - Is there a hidden buffer allocation limit within the f8wConfig.cfg or the Z-Stack core that causes a silent stack overflow when processing high-volume genTime requests alongside standard attribute reports?

    We need a patch or a compiler flag recommendation to harden the Z-Stack against this type of Application-Layer Denial of Service (DoS) caused by misbehaving third-party routers.

    Looking forward to your deeper technical analysis.
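To make the request concrete, the "throttle or drop expired outbound packets" behaviour asked about above could look roughly like the sketch below. It is a minimal illustration in plain C and not Z-Stack code: the type and function names (txq_t, txq_push, txq_pop, txq_expire) and the capacity and time-to-live values are all hypothetical, and the caller is assumed to supply a millisecond tick.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits, chosen only for illustration. */
#define TXQ_CAPACITY 8u     /* maximum number of pending responses */
#define TXQ_TTL_MS   2000u  /* a response older than this is dropped */

typedef struct {
    uint16_t dstAddr;     /* short address of the requesting device */
    uint8_t  transId;     /* transaction ID of the pending response */
    uint32_t enqueuedMs;  /* tick at which the entry was queued */
} txq_entry_t;

typedef struct {
    txq_entry_t items[TXQ_CAPACITY];
    size_t head;   /* index of the oldest entry */
    size_t count;  /* number of entries currently queued */
} txq_t;

/* Drop entries from the front of the FIFO whose age has reached the TTL.
 * Entries are queued in time order, so the scan stops at the first live one.
 * The unsigned subtraction stays correct across tick counter wraparound. */
size_t txq_expire(txq_t *q, uint32_t nowMs)
{
    size_t dropped = 0;
    while (q->count > 0 &&
           (uint32_t)(nowMs - q->items[q->head].enqueuedMs) >= TXQ_TTL_MS) {
        q->head = (q->head + 1u) % TXQ_CAPACITY;
        q->count--;
        dropped++;
    }
    return dropped;
}

/* Queue a response. Returns false instead of allocating when the queue is
 * still full after expiry, so the caller can report an error upstream. */
bool txq_push(txq_t *q, uint16_t dstAddr, uint8_t transId, uint32_t nowMs)
{
    (void)txq_expire(q, nowMs);
    if (q->count == TXQ_CAPACITY) {
        return false;
    }
    size_t tail = (q->head + q->count) % TXQ_CAPACITY;
    q->items[tail].dstAddr = dstAddr;
    q->items[tail].transId = transId;
    q->items[tail].enqueuedMs = nowMs;
    q->count++;
    return true;
}

/* Fetch the oldest response that has not yet expired. */
bool txq_pop(txq_t *q, uint32_t nowMs, txq_entry_t *out)
{
    (void)txq_expire(q, nowMs);
    if (q->count == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1u) % TXQ_CAPACITY;
    q->count--;
    return true;
}
```

The point of the fixed-size array is that memory use is bounded by construction: a flood results in rejected pushes and expired entries rather than heap exhaustion.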

  • Hi Dirk,

    Are you in contact with a TI Field or Sales office? In order to best address your questions, I need to know what SDK version you are using and whether you've applied Koen's recommended changes.  Is your project using a ZNP or ZC application?  In particular, what are the values of the following definitions?

    • NPI_TL_BUF_SIZE
    • MAC_CFG_*_MAX
    • HEAPMGR_SIZE

    You could choose to utilize application processing in MT_AfIncomingMsg or processAfIncomingMsgInd (depending on ZNP or ZC) to filter out unwanted ZCL messages.  Or you could proactively have the ZC request that "misbehaving third-party routers" leave the network and prevent them from re-joining.  There is no current plan to modify the Z-Stack source code to filter incoming requests or drop packets in the TX buffer, so we will need to mitigate this behavior from the application as best as possible.

    Regards,
    Ryan
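As an illustration of the application-side filtering suggested above, here is a minimal sketch of a per-device rate limiter for Time cluster (0x000A) traffic. It is plain C with no Z-Stack dependencies. Everything in it is an assumption made for the example: the function name time_req_allowed, the CLUSTER_ID_TIME constant, the table size, and the one-minute interval. How the source address and cluster ID are read from the incoming message inside MT_AfIncomingMsg or processAfIncomingMsgInd depends on the SDK version and is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local constants for this sketch, not taken from any SDK header. */
#define CLUSTER_ID_TIME  0x000Au  /* ZCL Time cluster, as named in the thread */
#define LIMITER_SLOTS    16u      /* hypothetical number of tracked devices */
#define MIN_INTERVAL_MS  60000u   /* hypothetical: one Time request per minute */

typedef struct {
    uint16_t srcAddr;         /* short address of the tracked device */
    uint32_t lastAcceptedMs;  /* tick of the last request that was let through */
    bool     inUse;
} limiter_slot_t;

static limiter_slot_t slots[LIMITER_SLOTS];

/* Returns true if the incoming message should be processed, false if it
 * should be dropped. Only Time cluster traffic is rate limited; every other
 * cluster passes through untouched. */
bool time_req_allowed(uint16_t srcAddr, uint16_t clusterId, uint32_t nowMs)
{
    if (clusterId != CLUSTER_ID_TIME) {
        return true;
    }

    limiter_slot_t *freeSlot = NULL;
    for (size_t i = 0; i < LIMITER_SLOTS; i++) {
        if (slots[i].inUse && slots[i].srcAddr == srcAddr) {
            /* Unsigned subtraction stays correct across tick wraparound. */
            if ((uint32_t)(nowMs - slots[i].lastAcceptedMs) < MIN_INTERVAL_MS) {
                return false;  /* too soon since the last accepted request */
            }
            slots[i].lastAcceptedMs = nowMs;
            return true;
        }
        if (!slots[i].inUse && freeSlot == NULL) {
            freeSlot = &slots[i];
        }
    }

    if (freeSlot == NULL) {
        return false;  /* table full: drop rather than grow memory use */
    }
    freeSlot->srcAddr = srcAddr;
    freeSlot->lastAcceptedMs = nowMs;
    freeSlot->inUse = true;
    return true;
}
```

The incoming-message handler would call this before forwarding the message to the host and return early when it yields false. Dropping Time requests from untracked devices once the table is full is a design choice that keeps memory fixed; a deployment with more than LIMITER_SLOTS chatty devices would need a larger table or an eviction policy.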