LP-CC2652RB: Z-STACK sending Endpoint cluster data, NOT executing incoming ON/OFF commands

Roberto Nucera

Part Number: LP-CC2652RB
Other Parts Discussed in Thread: CC1352P, Z-STACK

Hello!

As you may have seen from previous tickets, we have prepared a preliminary plant test with 20 Zigbee devices (same HW, same FW), structured as follows:
4 x router devices: "zr_genericapp_CC1352P_2_LAUNCHXL_tirtos_ticlang"
16 x end devices: "zed_genericapp_CC1352P_2_LAUNCHXL_tirtos_ticlang"
All these devices have been configured in "zcl_genericapp_data.c" to handle 8 endpoints, each exposing 1 x ATTRID_SE_METERING_CURR_SUMM_DLVD and 1 x ATTRID_ON_OFF_ON_OFF.
We use these devices to collect power consumption data and to enable power distribution.

All these devices are joined to a SONOFF coordinator and, at the moment, we can handle all of them using the "ZigBee2Mqtt" application downloaded from the web.
We are testing the entire plant in the laboratory to check the stability of the system over several weeks.
During the test period each device: a) periodically sends 4 "CurrentSummationDelivered" measurements to our "ZigBee2Mqtt" console; b) handles remote ON/OFF commands from the console.
The biggest problem that has arisen recently is that sometimes (roughly one of the 20 devices each day) a single device still sends the four measurements but does not accept ON/OFF commands, while all other devices are fine.

In this condition (using an external PC application that talks to the CC1352P through a spare UART), we can check the device's main internal state variables: they read "zstack_DevState_DEV_END_DEVICE" and BDB_COMMISSIONING_NETWORK_RESTORED, as expected.
Furthermore, the periodic transmission of "CurrentSummationDelivered" measurements implies that the application is not locked up.
But there is no way to actuate the 8 ON/OFF buttons!
Every time this happens, power-cycling the device is enough to make everything work again.
Any idea what could be happening?
What can my application verify, besides checking "zstack_DevState"?

Again many thanks!

Luigi

over 1 year ago

0 Ryan Brown1 over 1 year ago

TI Guru, 202446 points

Hello Luigi,

What SDK version are you using? Does this only happen to ZED devices, or can it occur for ZRs as well? Just to be clear: a device which was previously accepting on/off commands without issue will at some point stop acting on them until a reset occurs? Are all devices susceptible to this behavior, or only a certain few? Are you able to further quantify what could be happening to cause this device state, or do you have a way to reliably reproduce the behavior? It is important to use a sniffer device to confirm that the faulty device is acknowledging these packets. You can also debug your application's zclGenericApp_processAfIncomingMsgInd, zclGeneral_ProcessInOnOff, and on/off cluster command callback (if enabled, see zclSampleLight_OnOffCB for an example) functions to determine whether the packet is received by the application and where processing fails.

Regards,
Ryan

0 Roberto Nucera over 1 year ago in reply to Ryan Brown1

Prodigy 235 points

Hello Ryan! Here are my answers. As you can see, Roberto is no longer working with us, so for the past month I have had to work hard on his project, and it's not as easy for me as it was for him. He was very dynamic!

About your questions: "...a device which was previously not experiencing any issues with accepting on/off commands during operation will at some time stop acting on them until a reset occurs": yes, this is what is happening. In two days, 3 different ZED devices showed this problem. Yesterday I reprogrammed one of them (complete flash erase, then program and verify), and it has now been working properly for 12 hours. This morning another one is showing the same behavior; I'll reprogram it.

In practice, over the last weeks I found that when a device was reprogrammed without a complete erase first, it sometimes failed to restore its ON/OFF attributes at power-on (even though this state is saved in NVM, as you surely know).

About SDK we’re still using “CC13xx CC26xx SDK [6.20.0.29]”; I remember a month ago Roberto installed a new release and after some troubles (the application was not working at zigbee network level) we decided to keep on working on the original SDK [6.20.0.29].

So if you think moving to a recent release would help, I'll try it. I hope I'm not bothering you more than I already am!

About "Are you able to further quantify what could be happening to cause this device state": I really don't know. We have prepared a test bench with 20 devices (4 routers and 16 end devices) running continuously: each device periodically sends simulated energy measurements and must accept remote ON/OFF commands. It is practically impossible to tell what the cause could be; we also can't attach a debugger to every device in order to catch one in this state.

But in our latest firmware release I trigger a buzzer beep every time "zclGenericApp_OnOffCB" is called. This way I'll learn whether the trouble is before the callback or after it. I hope it's after, because then I'll surely be able to resolve the problem myself, but I think it's before!

Consequently we do not have a way to reliably reproduce the behavior.

Please note that this is an important project for us: measurement of electricity and water consumption on port docks, for each vessel, with the possibility of remotely enabling both supplies, all managed by a single cloud server. As you can well understand, if the system works there are countless possible installations! But the system must be very reliable. We are a small company, working on HW/SW since 1983. We have invested a lot in this project relative to our means, about 50,000 euros in HW and SW.

Your help is very important to us. We could even be interested in coming to the USA if a developer in this field could review the entire project (naturally, we would pay for the activity). Can you help us in this way?

Thanks a lot for your kind collaboration

Luigi

0 Ryan Brown1 over 1 year ago in reply to Roberto Nucera

TI Guru, 202446 points

Luigi,

Thanks for the detailed response. So sometimes a device reset is sufficient, but other times a reprogram/NV reset is necessary to restore operation? This is unexpected as on/off attributes are not intrinsically associated with NV memory. Is it possible for some devices to never fail? You may try adding other GPIO debug to zclGenericApp_processAfIncomingMsgInd for clusterId matching the on/off cluster (0x0006) similar to your buzzer beep for another reference point. Is the on/off attribute set for reporting? How often is the ZED periodically sending CurrentSummationDelivered measurements? Is the ZED able to receive AF ACKs from sending these packets (see Zigbee Fundamentals SLA for details on AF_ACK_REQUEST) or receive any other ZCL commands (read/write other attributes for instance)? Once again, a sniffer log of the problematic behavior would be helpful. I cannot comment on further TI resources being dedicated for this investigation.
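The reference-point idea above can be sketched as a set of counters, one per stage of the receive path, filtered on the on/off cluster (0x0006). This is only a host-testable illustration: the stage names, `rx_trace_t`, `rx_trace_hit` and `rx_trace_first_gap` are hypothetical and not part of Z-Stack. The assumption is that one `rx_trace_hit` call would be placed in the AF incoming-message handler, one in the ZCL on/off processing, and one in the application callback, and that the counters would then be read out over the spare UART.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On/Off cluster ID, as given in the reply above. */
#define ONOFF_CLUSTER_ID 0x0006u

/* Hypothetical checkpoints along the receive path (not Z-Stack names). */
typedef enum {
    STAGE_AF_INCOMING = 0, /* message reached the AF incoming-message handler */
    STAGE_ZCL_ONOFF,       /* ZCL layer processed an On/Off command */
    STAGE_APP_CALLBACK,    /* application On/Off callback ran */
    STAGE_COUNT
} rx_stage_t;

typedef struct {
    uint32_t hits[STAGE_COUNT]; /* one counter per checkpoint */
} rx_trace_t;

/* Count a checkpoint hit, but only for On/Off cluster traffic. */
void rx_trace_hit(rx_trace_t *t, rx_stage_t stage, uint16_t clusterId)
{
    assert(t != NULL && stage < STAGE_COUNT);
    if (clusterId == ONOFF_CLUSTER_ID) {
        t->hits[stage]++;
    }
}

/* Return the first stage whose counter lags the stage before it, i.e. the
 * point where commands are being dropped. Returns STAGE_COUNT when no stage
 * lags. If every counter stays at zero while commands are being sent, nothing
 * reached the application at all and only a sniffer can tell more. */
rx_stage_t rx_trace_first_gap(const rx_trace_t *t)
{
    assert(t != NULL);
    for (int s = 1; s < STAGE_COUNT; s++) {
        if (t->hits[s] < t->hits[s - 1]) {
            return (rx_stage_t)s;
        }
    }
    return STAGE_COUNT;
}
```

Compared with a single buzzer beep in the callback, per-stage counters show which layer stopped seeing the command, and they can be compared against the number of commands sent from the console.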

Regards,
Ryan

0 Roberto Nucera over 1 year ago in reply to Ryan Brown1

Prodigy 235 points

Ryan, thanks for your thorough answer.

Normally a Reset is sufficient to restart operation.

About the question "Is it possible for some devices to never fail?": it would take a lot of time to get the answer, as this happens after many hours of burn-in.

We noticed that the failed devices seem to be different each time... I'll be more accurate about tracking.

Please note that each device (4 routers + 16 end devices) sends 4 x CurrentSummationDelivered measurements every 5 seconds (50 ms apart).

Furthermore, a 5th "CurrentSummationDelivered" message, containing device diagnostic events or just an alive code, is sent every 30 seconds.

With all this traffic, I think it's natural that an end device sometimes takes a few seconds to act on an ON/OFF command. Do you agree?

Endpoints 5, 6, 7 and 8 are not sending data at the moment, but in our application they will have to.

This morning, back at the office, I found 2 devices locked up: they did not send any "CurrentSummationDelivered" and did not accept ON/OFF commands.

Our host (Zigbee2Mqtt) was reporting timeout errors about on/off commands, and periodically "ping error" about the same devices.

Before resetting the two devices I was able to read some important variables (using my external app talking through the UART):

my internal main application status was "DEVICE_JOINED_OK", meaning it found the expected value in "zstack_DevState", namely "zstack_DevState_DEV_ROUTER" (and the BDB commissioning status was "BDB_COMMISSIONING_SUCCESS").

My application was alive, because it was periodically sending internal variables to my PC application and the diagnostic LEDs were blinking correctly.

So this monitoring shows exactly the same as on properly working devices: no difference at all.

We do not have a way to reliably reproduce the behavior.

Sorry to ask my original question again: is there any way to hook the incoming low-level stack "ping" command? If so, I could reset my processor when it times out!

Today I'll try to check all your questions, which at first sight do not seem easy to investigate.

Thank you very much!

Luigi

0 Ryan Brown1 over 1 year ago in reply to Roberto Nucera

TI Guru, 202446 points

You could be experiencing a memory leak from your application based on the behavior you are describing. The OSAL Heap manager for Z-Stack projects with SDK v6.20 uses a static heap size, HEAPMGR_SIZE which is declared inside the *.cfg file. Increasing the heap size would delay the issue but not prevent it. As you continue to debug devices, you should add the HEAPMGR_METRICS pre-define and monitor heap variables inside the IDE debugger. This is also described in the Heap Allocation and Management section of the Z-Stack User's Guide. You can also find common heap issue debug and resolution from the BLE5-Stack Debugging Guide. You can further troubleshoot to determine the cause of the memory leak if applicable.
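One way to act on the heap metrics without a debugger attached is to sample the heap usage periodically and flag a steady upward trend. The sketch below is a host-testable illustration only: `heap_trend_t` and its functions are hypothetical names, and the `bytes_in_use` value is assumed to come from whatever counters HEAPMGR_METRICS exposes (their exact names are not given in this thread, so they are left as a parameter). It treats a leak as suspected when the lowest in-use value of each sampling window keeps rising for several consecutive windows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical leak-trend tracker; not part of Z-Stack or the SDK. */
typedef struct {
    uint32_t window_min;      /* lowest in-use value seen in the current window */
    uint32_t prev_window_min; /* lowest in-use value of the previous window */
    uint16_t window_len;      /* samples per window */
    uint16_t samples;         /* samples taken in the current window */
    uint8_t  rising;          /* consecutive windows whose minimum rose */
    uint8_t  threshold;       /* rising windows needed to suspect a leak */
} heap_trend_t;

void heap_trend_init(heap_trend_t *h, uint16_t window_len, uint8_t threshold)
{
    assert(h != NULL && window_len > 0 && threshold > 0);
    h->window_min = UINT32_MAX;
    h->prev_window_min = UINT32_MAX; /* first window can never count as rising */
    h->window_len = window_len;
    h->samples = 0;
    h->rising = 0;
    h->threshold = threshold;
}

/* Feed one sample of heap bytes in use. Returns true while a leak is
 * suspected, i.e. the per-window minimum has risen for at least
 * `threshold` consecutive windows. */
bool heap_trend_sample(heap_trend_t *h, uint32_t bytes_in_use)
{
    assert(h != NULL);
    if (bytes_in_use < h->window_min) {
        h->window_min = bytes_in_use;
    }
    h->samples++;
    if (h->samples == h->window_len) {
        /* Window complete: compare its floor with the previous window's. */
        if (h->window_min > h->prev_window_min) {
            if (h->rising < UINT8_MAX) {
                h->rising++;
            }
        } else {
            h->rising = 0;
        }
        h->prev_window_min = h->window_min;
        h->window_min = UINT32_MAX;
        h->samples = 0;
    }
    return h->rising >= h->threshold;
}
```

Using the window minimum rather than the instantaneous value ignores short-lived allocations such as message buffers, so only memory that is never given back moves the trend. The result could be reported in the periodic diagnostic message instead of waiting for the device to lock up.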

You can use the Watchdog peripheral to force a device reset if it becomes unresponsive. You can find more information inside the Watchdog.h and WatchdogCC26XX.h files of the TI Drivers Runtime APIs and in the watchdog example project. This means you would set a watchdog timeout period and clear the watchdog at a regular interval, for example each time CurrentSummationDelivered values are sent; failing to clear it in time would indicate device failure, at which point the watchdog resets the device.
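A minimal sketch of that gating logic follows, assuming the watchdog is cleared only while both directions of traffic look healthy. Because the failures described in this thread include devices that keep transmitting but stop acting on incoming commands, the sketch also tracks the last received message; that receive check is an addition of this example, not something stated above, and it assumes the coordinator contacts each device regularly. `liveness_t` and its functions are hypothetical names. On the target, the caller would presumably call `Watchdog_clear()` from the TI Drivers Watchdog API whenever `liveness_ok` returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical liveness record; timestamps are millisecond tick values. */
typedef struct {
    uint32_t last_tx_ms;    /* last time a periodic report was sent */
    uint32_t last_rx_ms;    /* last time a Zigbee message reached the app */
    uint32_t tx_timeout_ms; /* maximum allowed silence on transmit */
    uint32_t rx_timeout_ms; /* maximum allowed silence on receive */
} liveness_t;

/* Call after each successful periodic report. */
void liveness_note_tx(liveness_t *l, uint32_t now_ms)
{
    assert(l != NULL);
    l->last_tx_ms = now_ms;
}

/* Call from the incoming-message path. */
void liveness_note_rx(liveness_t *l, uint32_t now_ms)
{
    assert(l != NULL);
    l->last_rx_ms = now_ms;
}

/* True when both directions were active recently. Unsigned subtraction keeps
 * the age calculation correct across a 32-bit tick counter wrap. */
bool liveness_ok(const liveness_t *l, uint32_t now_ms)
{
    assert(l != NULL);
    uint32_t tx_age = now_ms - l->last_tx_ms;
    uint32_t rx_age = now_ms - l->last_rx_ms;
    return tx_age <= l->tx_timeout_ms && rx_age <= l->rx_timeout_ms;
}
```

The point of the design is that the watchdog is not cleared from a free-running timer: if either direction goes quiet for longer than its timeout, the clear stops and the hardware resets the device without anyone power-cycling it.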

Regards,
Ryan

0 Roberto Nucera 11 months ago in reply to Ryan Brown1

Prodigy 235 points

Hi Ryan, here I am again with some questions about our project:

We moved to "simplelink_cc13xx_cc26xx_sdk_6_41_00_17".

Not so easy, but now it works!

What I noticed with this new release is that sometimes the application stops running: besides the Zigbee device not responding, I see that my flashing LEDs (driven by a standard timer callback) have stopped, meaning that the timer callback is not running either.

This could depend on some RAM-related defines that need tuning. I made no modifications relative to the previous stack release.

Could you kindly tell me where I can find these settings, and what good values would be?

Please consider that in my application I don’t use heap at all.

Another question: for similar reasons, I need to perform a software reset from my application; I tried "SysCtrlShutdown()" but it does not work!

Any suggestions?

Thanks a lot

BR Luigi

0 Ryan Brown1 11 months ago in reply to Roberto Nucera

TI Guru, 202446 points

Here is a guide discussing RAM allocation, otherwise I don't have any more comments beyond what was already provided by my most recent reply. You need to further debug the device to understand what its current state is and the cause of failure. Also, you should be using SysCtrlSystemReset to perform a device reset. SysCtrlShutdown will put the device into its lowest-power state (shutdown) which can only be recovered by a device reset or pre-configured pin use. You can learn more about the shutdown state inside the TRM.

Regards,
Ryan

0 Roberto Nucera 11 months ago in reply to Ryan Brown1

Prodigy 235 points

Thanks Ryan!

Luigi
