I have made a temperature sensor with two endpoints: one for an SHT sensor and another for an NTC. Everything seems to work as intended and the sensor reports all values correctly; however, it disconnects randomly after some time. It could be one or two days, but some units last as long as a couple of weeks before randomly dying. They reconnect after a power cycle (I see the device announce messages in the Z2M logs afterwards). If the device has been dead for a longer period, I have to manually re-pair it for it to work.
I've tried a SONOFF coordinator as well as a UZG coordinator and the behaviour is identical: random disconnects after a varied number of days/weeks.
I could not get a sniffer to work (I tried writing the firmware to a sniffer I bought from 2 different PCs), and since it might take days or weeks for a sensor to die, it would be extremely difficult to use a sniffer for this anyway. I am using SDK version 6.20. I will attach the full source code in case anyone is willing to take a look.
Has anyone else encountered a behaviour like this? Where can I delve deeper into the issue to try and isolate a possible cause for this?
Hello A V,
One thing we could try is speeding up the packet transmissions: for example, if the device reports every minute, changing it to every second could help reproduce the issue faster. Are you able to increase the total number of connections or transmissions per second/minute?
Thanks,
Alex F
Hello Alex,
Are you referring to the poll period? I use a poll period of 30 seconds to conserve battery, but I send reporting messages every 10 seconds. I've delved deeper into this and it seems that reporting messages are ignored when they are sent more often than the minimum reporting interval, so I assume they have no effect unless I set the minimum reporting interval to be shorter than the rate at which I poll the sensor and send reports.
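To make the behaviour described above concrete, here is a minimal sketch of how a minimum/maximum reporting interval gate can decide whether a report goes out. This is not Z-Stack code: the names report_gate_t and report_gate_should_send are hypothetical, and the real BDB reporting logic may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reporting gate, loosely modelled on ZCL attribute reporting. */
typedef struct {
    uint16_t min_interval_s;    /* no change report sooner than this after the last one */
    uint16_t max_interval_s;    /* report at least this often (0 = no periodic report) */
    int16_t  reportable_change; /* minimum delta that counts as a change */
    int16_t  last_reported;     /* value carried by the last report */
    uint32_t last_report_s;     /* time of the last report, in seconds */
} report_gate_t;

/* Returns true if a report should go out now, and records it as sent. */
bool report_gate_should_send(report_gate_t *g, int16_t value, uint32_t now_s)
{
    uint32_t elapsed = now_s - g->last_report_s;
    int32_t delta = (int32_t)value - (int32_t)g->last_reported;
    if (delta < 0) {
        delta = -delta;
    }

    bool changed = (delta != 0) && (delta >= g->reportable_change);
    bool min_ok  = elapsed >= g->min_interval_s;
    bool max_due = (g->max_interval_s != 0) && (elapsed >= g->max_interval_s);

    if ((changed && min_ok) || max_due) {
        g->last_reported = value;
        g->last_report_s = now_s;
        return true;
    }
    return false; /* suppressed: too soon, or nothing worth reporting */
}
```

With a 30 second minimum interval, a change notified 10 seconds after the previous report is suppressed in this model, which matches the observation that notifying the stack more often than the minimum interval has no visible effect.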
Anyway, I will make a new build with a poll period of 3 seconds (the default value for the sample projects) and let the sensors run to see if that has any impact. Keeping such a short poll period would be undesirable, as I was aiming for a few months of battery life, comparable to off-the-shelf products.
Were you referring to other means of increasing the message count?
Hello everyone,
I suspect there might be some issues with the coordinators I have, so I flashed another 6 sensors with the zed_temperaturesensor_LP_CC2652RB_tirtos_ticlang sample project and, within 12 hours, one of the sensors is already down. Might it also be caused by the sample project?
Are there any known bugs in TI's implementation of the Zigbee stack that might cause unexpected disconnects? Both coordinators I have (Sonoff and UZG) use a CC2652 MCU, so I suppose they just took the sample projects offered by TI and added some extra functionality to cater to their needs.
Hello A V,
Do you happen to have a sniffer that can grab the Zigbee packets (Wireshark)? This way we can check the outgoing packets to see if some error was reported.
On the topic of your second response, I can look into whether we have anything in the release notes about this, though as you mentioned the coordinators could be adding some extra functionality that we are unaware of.
Do you happen to have a CC2652RB (or CC26xx) LaunchPad we can use here?
Thanks,
Alex F
Hello,
I do have an extra LaunchPad that I could try to use as a sniffer, though in the 24 hours since my last update none of the 12 sensors I have has failed.
I talked about these issues more with my colleagues and have an additional question: is the Z-Stack API inherently task safe? I have an additional task inside which I poll the temperature sensor and issue a Zstackapi_bdbRepChangedAttrValueReq each time I perform a measurement. Is it safe to do so in another thread (without using any mutexes / semaphores) or should I only do it in the main zclSampleTemperatureSensor_process_loop?
Will share additional code details.
Inside temperaturesensor_data:
// *** Temperature Measurement Attributes ***
{
    ZCL_CLUSTER_ID_MS_TEMPERATURE_MEASUREMENT,
    { // Attribute record
        ATTRID_TEMPERATURE_MEASUREMENT_MEASURED_VALUE,
        ZCL_DATATYPE_INT16,
        ACCESS_CONTROL_READ | ACCESS_REPORTABLE,
        (void *)&zclSampleTemperatureSensor_MeasuredValue // value to hold temperature
    }
},
Function I call inside the sensor polling thread to send reportable values:
void reportTemperatureMeasurement(int16_t temp, zstack_bdbRepChangedAttrValueReq_t *zReq)
{
    if (NULL == zReq)
    {
        return;
    }

    zclSampleTemperatureSensor_MeasuredValue = temp;

#ifdef BDB_REPORTING
    zReq->attrID = ATTRID_TEMPERATURE_MEASUREMENT_MEASURED_VALUE;
    zReq->cluster = ZCL_CLUSTER_ID_MS_TEMPERATURE_MEASUREMENT;
    zReq->endpoint = SAMPLETEMPERATURESENSOR_ENDPOINT;

    zstack_ZStatusValues response = Zstackapi_bdbRepChangedAttrValueReq(appServiceTaskId, zReq);
#endif
}
Additional task delay function I use to allow for sensor measurements and other stuff:
void customTaskSleepMs(uint16_t ms)
{
    size_t ticksToSleep = 0;
    ticksToSleep = ms * (1000 / Clock_tickPeriod);
    Task_sleep(ticksToSleep);
}
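A side note on the delay helper: Clock_tickPeriod is expressed in microseconds, so 1000 / Clock_tickPeriod is exact only when the tick period divides 1000 evenly (and becomes 0 for tick periods above 1000 µs). The sketch below multiplies before dividing and rounds up. The function name ms_to_ticks is hypothetical, and the tick period is passed as a parameter so the arithmetic can be checked off-target.

```c
#include <stdint.h>

/* Convert milliseconds to RTOS ticks, given the tick period in microseconds.
 * Multiplies before dividing so a tick period that does not divide 1000
 * evenly is not truncated, and rounds up so the sleep is never shorter
 * than requested. Saturates at UINT32_MAX. */
uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_period_us)
{
    if (tick_period_us == 0) {
        return 0; /* invalid configuration; the caller decides what to do */
    }
    uint64_t us = (uint64_t)ms * 1000u;
    uint64_t ticks = (us + tick_period_us - 1u) / tick_period_us;
    return (ticks > UINT32_MAX) ? UINT32_MAX : (uint32_t)ticks;
}
```

On target this would be used as Task_sleep(ms_to_ticks(ms, Clock_tickPeriod)). With a 10 µs tick the result is the same as the original helper, so this is a robustness tweak rather than a fix for the disconnects.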
Hello A V,
Z-Stack should be safe here (it should not crash). Your code looks OK: Zstackapi_bdbRepChangedAttrValueReq can be called as needed (a function in the sample light example uses it in a similar way, and I see that your code matches the snippet below).
Thanks,
Alex F
Thanks a lot for the additional explanations and info. I am still at a loss... Now it's been 36 hours and the 12 sensors are all up and running with no issues.
I want to try to sniff some Zigbee traffic, but I need some more help from you, as you may be more experienced with this:
I see that within the snippet you posted, the variable zclSampleLight_OnOff is modified without guarding it with semaphores and mutexes. I am modifying this variable inside a function called on a different thread context (see reportTemperatureMeasurement in my previous reply) without using semaphores. Could this cause some issues? I tried declaring a new binary semaphore, but it seems the code hangs when pending regardless of me using BIOS_WAIT_FOREVER or a different number of ticks. The debugger tells me that the program is inside Power_sleep and never seems to wake up.
Hello A V,
I am still at a loss... Now it's been 36 hours and the 12 sensors are all up and running with no issues.
-No problem here; sometimes it works best exactly when we want it to "not work"! Just asking, but when you noticed the original crash was it in a noisy environment?
1. So I have a CC2531 in front of me and these are the steps I take:
a. Go to the install folder of the sniffer and locate the hex file sniffer_fw_cc2531.hex in C:\Program Files (x86)\Texas Instruments\SmartRF Tools\Packet Sniffer\bin\general\firmware.
b. Plug the dongle into a USB port, and plug in the CC Debugger as well.
c. Open Flash Programmer; you should see "CC2531 CC Debugger" detected.
d. Click on the flash image field and select the file from step a.
e. Erase and flash.
2. I may need to ask my co-worker about this; they are not in the office today but will be next week. I assume we need a way to continuously dump the logs to a file so the sniffer can run constantly without filling up with data.
I see that within the snippet you posted, the variable zclSampleLight_OnOff is modified without guarding it with semaphores and mutexes. I am modifying this variable inside a function called on a different thread context (see reportTemperatureMeasurement in my previous reply) without using semaphores. Could this cause some issues? I tried declaring a new binary semaphore, but it seems the code hangs when pending regardless of me using BIOS_WAIT_FOREVER or a different number of ticks. The debugger tells me that the program is inside Power_sleep and never seems to wake up.
-In this case (with your semaphore) does the program get to and call Zstackapi_bdbRepChangedAttrValueReq? As for the last part about sleep, that sounds like a possible error case where it got stuck in a while(1) statement or similar.
Thanks,
Alex F
when you noticed the original crash was it in a noisy environment?
It was in a noisy environment and they still are. I'm talking about my office, where we have about two dozen IoT devices plus the plethora of phones and computers everyone uses, mostly on 2.4 GHz. Could interference be such an issue? Though it may be, I find it weird that they're now working really well. It's been 4 days now and they're all up and running.
I will try your flashing steps, though they seem similar to other guides I've tried. I will share the results if I get it to work.
does the program get to and call Zstackapi_bdbRepChangedAttrValueReq?
It does not. From what I've managed to follow with the debugger, it gets stuck after the first execution of the for(;;) loop inside zclSampleTemperatureSensor_process_loop. A breakpoint on if(Semaphore_pend(appSemHandle, BIOS_WAIT_FOREVER)) inside that loop triggers only once, then the program gets stuck inside Power_sleep.
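One way to avoid sharing zclSampleTemperatureSensor_MeasuredValue between tasks without pending on an extra semaphore is to hand each sample to the application task and let that task write the attribute and call Zstackapi_bdbRepChangedAttrValueReq from its own context. The following is only a sketch of that idea using C11 atomics: temp_mailbox_t and the mailbox_* functions are hypothetical names, how the application task gets woken up (for example through an event it already handles) is left out, and whether this has any bearing on the hang described above is not known.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Sentinel outside the int16_t range, so it can never collide with a sample. */
#define MAILBOX_EMPTY INT32_MIN

/* Hypothetical single-slot mailbox: one writer (sensor task),
 * one reader (application task). */
typedef struct {
    _Atomic int32_t slot; /* latest sample, or MAILBOX_EMPTY */
} temp_mailbox_t;

void mailbox_init(temp_mailbox_t *mb)
{
    atomic_store(&mb->slot, MAILBOX_EMPTY);
}

/* Sensor task: publish the newest sample, overwriting any unread one. */
void mailbox_post(temp_mailbox_t *mb, int16_t sample)
{
    atomic_store(&mb->slot, (int32_t)sample);
}

/* Application task: fetch and clear in one atomic step.
 * Returns false if no new sample is waiting. */
bool mailbox_take(temp_mailbox_t *mb, int16_t *out)
{
    int32_t v = atomic_exchange(&mb->slot, MAILBOX_EMPTY);
    if (v == MAILBOX_EMPTY) {
        return false;
    }
    *out = (int16_t)v;
    return true;
}
```

In this arrangement the sensor task only calls mailbox_post, and the process loop calls mailbox_take when it runs, then updates the attribute and issues the report itself, so the attribute variable is only ever touched from one task.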
Hello Alex,
I've reached the office and tried sniffing again in two ways:
Regarding my issue with Packet Sniffer 2: the program does not detect any boards, though I see them in device manager. If I try to manually configure serial settings, the port dropdown is not available and cannot see the ports listed in device manager to manually select the device COM.
Anyway, 6 days into testing, not a single device has failed yet. They used to die within a day or two, but everything is now working without issues. Any more suggestions?
Hello A V,
Sorry to hear that the other sniffing options don't seem to be working well for you.
Regarding my issue with Packet Sniffer 2: the program does not detect any boards, though I see them in device manager. If I try to manually configure serial settings, the port dropdown is not available and cannot see the ports listed in device manager to manually select the device COM.
-It's unfortunate that the tools here aren't working well. The coordinator should use the latest Zigbee2MQTT ZNP code branch after the following issue was resolved: https://e2e.ti.com/f/1/t/1288546.
Anyway, 6 days into testing, not a single device has failed yet. They used to die within a day or two, but everything is now working without issues. Any more suggestions?
-We could consider a watchdog to reset if the device enters a state for which the execution code crashes
Thanks,
Alex F
The coordinator should use the latest Zigbee2MQTT ZNP code branch after the following issue was resolved: https://e2e.ti.com/f/1/t/1288546.
Hello Alex. Does that issue apply to coordinators only? I do not use TI coordinator examples, but two other coordinators working with Zigbee2MQTT: Sonoff and UZG. Does the fix mentioned there apply to coordinator firmware only, or could I apply it to my end device as well?
In case I was not clear enough, I suspect the disconnects are caused by my end device implementation, not the coordinators. If the disconnects were caused by the coordinators, I would have no control over that, nor over the issues clients might experience with them. I really do not think the coordinators could be at fault, as they work fine with other Aqara/Xiaomi/IKEA devices.
I would like to know if I should try these changes in my end device firmware before committing to flashing and adding additional devices to my network.
We could consider a watchdog to reset if the device enters a state for which the execution code crashes
I've already implemented a watchdog in my app, but it guards against potential infinite loops, not against the device receiving no Zigbee messages. I might be wrong, but I think a device that has never been paired with a coordinator should keep running without resetting, as long as everything else works correctly. I'm open to suggestions regarding how such a watchdog could be implemented to detect disconnects.
Hello A V,
I may need to go back through the release notes to check whether the fix was applied to device types other than coordinators.
On the topic of the watchdog, there are many ways we can approach this, by checking the connection status, checking if no packets are received, among other things. As you noted, ideally the device works fine without being on the Zigbee network; if you designed your application to handle some data without needing a Zigbee connection, it should work.
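As a rough illustration of the "no packets received" approach, the sketch below counts consecutive failed transmissions and escalates from a rejoin attempt to a reset. All names (link_monitor_t, link_monitor_update, the LINK_* actions) are hypothetical, and where the acknowledged/failed result comes from is an assumption: it could be a data confirm status or a poll failure indication, whichever the stack version in use exposes to the application.

```c
#include <stdint.h>

typedef enum { LINK_OK, LINK_REJOIN, LINK_RESET } link_action_t;

/* Hypothetical link supervision state kept by the application. */
typedef struct {
    uint8_t fail_count;       /* consecutive failed transmissions */
    uint8_t rejoin_threshold; /* failures before requesting one rejoin attempt */
    uint8_t reset_threshold;  /* failures before resetting the device */
} link_monitor_t;

/* Feed one transmission result (non-zero = acknowledged, 0 = failed).
 * Returns the action the application should take next. */
link_action_t link_monitor_update(link_monitor_t *m, int acked)
{
    if (acked) {
        m->fail_count = 0; /* any success clears the history */
        return LINK_OK;
    }
    if (m->fail_count < UINT8_MAX) {
        m->fail_count++;
    }
    if (m->fail_count >= m->reset_threshold) {
        return LINK_RESET;
    }
    if (m->fail_count == m->rejoin_threshold) {
        return LINK_REJOIN; /* requested once, to limit battery drain */
    }
    return LINK_OK;
}
```

On LINK_RESET the application could simply stop servicing its existing watchdog. The thresholds are a trade-off between recovery time and battery life and would need tuning against the 10 second report rate mentioned earlier in the thread.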
Thanks,
Alex F
Hello Alex.
checking the connection status, checking if no packets are received
Could you please explain how I could do this?
Hello A V,
Please refer to our SimpleLink Academy (SLA) module, which discusses reporting status:
Zigbee Fundamental Project Development (ti.com)
And an e2e thread about connection status:
https://e2e.ti.com/f/1/t/1357610/
Thanks,
Alex F
Hello Alex.
Sorry for my belated response. I followed the router steps (as I do not have control over the coordinator's firmware), yet I cannot get any hits inside zclSampleTemperatureSensor_ProcessIncomingMsg when debugging. I defined ZCL_REPORT and ZCL_REPORT_DESTINATION_DEVICE and made sure I have no optimizations enabled, yet I cannot get any hit when sending the reporting command from Z2M. What's even weirder, Z2M reports that reporting was configured and I can see the metrics being reported accordingly. I tried the other command defines as well, yet I still cannot see any breakpoint hits.
Hello A V,
Do we have any reports from BDB_REPORTING (see "Zigbee Configuration" in the SimpleLink CC13XX/CC26XX SDK Z-Stack User's Guide 7.40.00 documentation)?
We may need to get a log here of the data.
Thanks,
Alex F
Hello. I cannot provide sniffing logs because of the issues mentioned earlier:
Regarding my issue with Packet Sniffer 2: the program does not detect any boards, though I see them in device manager.
This is an example of setting Reporting Config and getting responses.
2024-10-16 09:32:42 z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/TestDev', payload '{"battery":0,"humidity_8":33.88,"linkquality":112,"temperature_8":22.14,"temperature_9":-80.15,"update":{"installed_version":2,"latest_version":2,"state":"available"},"update_available":null,"voltage":3200}'
info 2024-10-16 09:32:53 z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/TestDev', payload '{"battery":0,"humidity_8":33.9,"linkquality":120,"temperature_8":22.14,"temperature_9":-80.15,"update":{"installed_version":2,"latest_version":2,"state":"available"},"update_available":null,"voltage":3200}'
info 2024-10-16 09:33:04 z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/TestDev', payload '{"battery":0,"humidity_8":33.9,"linkquality":116,"temperature_8":22.15,"temperature_9":-80.15,"update":{"installed_version":2,"latest_version":2,"state":"available"},"update_available":null,"voltage":3200}'
info 2024-10-16 09:33:04 z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/TestDev', payload '{"battery":0,"humidity_8":33.96,"linkquality":116,"temperature_8":22.15,"temperature_9":-80.15,"update":{"installed_version":2,"latest_version":2,"state":"available"},"update_available":null,"voltage":3200}'
info 2024-10-16 09:33:15 z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/TestDev', payload '{"battery":0,"humidity_8":33.89,"linkquality":116,"temperature_8":22.15,"temperature_9":-80.15,"update":{"installed_version":2,"latest_version":2,"state":"available"},"update_available":null,"voltage":3200}'
info 2024-10-16 09:33:23 z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/TestDev', payload '{"battery":0,"humidity_8":33.89,"linkquality":112,"temperature_8":22.15,"temperature_9":-77.4,"update":{"installed_version":2,"latest_version":2,"state":"available"},"update_available":null,"voltage":3200}'
info 2024-10-16 09:33:26 z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/TestDev', payload '{"battery":0,"humidity_8":33.88,"linkquality":112,"temperature_8":22.15,"temperature_9":-77.4,"update":{"installed_version":2,"latest_version":2,"state":"available"},"update_available":null,"voltage":3200}'
The -77 °C temperature you see reported is not an error. The second endpoint is for an NTC that is not connected, so the ADC reads 0.
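As an aside, the ZCL Temperature Measurement cluster reserves the MeasuredValue 0x8000 to mean "invalid", so an unconnected NTC could be reported that way instead of as a very low temperature. The sketch below is only an illustration: the function name, the rail margin and the ADC full-scale value are assumptions, not taken from the firmware in this thread.

```c
#include <stdint.h>

/* 0x8000 as a signed 16-bit value: the ZCL "invalid" marker for MeasuredValue. */
#define TEMP_INVALID INT16_MIN

/* Returns the already converted temperature (in 0.01 degC units) unless the
 * raw ADC count sits at or near either rail, which is treated here as an
 * open or shorted sensor. rail_margin is an assumed tolerance in ADC counts. */
int16_t ntc_guarded_value(uint16_t adc_raw, uint16_t adc_max,
                          uint16_t rail_margin, int16_t converted_centi_c)
{
    if (adc_raw <= rail_margin || adc_raw >= adc_max - rail_margin) {
        return TEMP_INVALID;
    }
    return converted_centi_c;
}
```

The result would be written to the attribute variable before the report is requested. How a given coordinator or Zigbee2MQTT converter displays the invalid marker is not covered here and would need to be checked.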
Hello A V,
I will see if there is anything else we can try as a next step. I will likely discuss this with one of my team members!
Thanks,
Alex F
Hello A V,
Here is a summary of the discussion with my team member:
Unfortunately, we can't help with a ZC/ZNP programmed using Zigbee2MQTT code, and the Packet Sniffer 2 environment appears to be broken. We could try reinstalling Packet Sniffer 2, or using another computer if possible.
On the question, "i cannot get any hits inside zclSampleTemperatureSensor_ProcessIncomingMsg when debugging":
In SDK v7.40, zclParseInConfigReportCmd is handled if ZCL_REPORTING_DEVICE is defined, which comes with BDB_REPORTING being defined. The zclParseInConfigReportCmd function is in zcl.c; then, in zcl_HandleExternal, we see that ZCL_CMD_CONFIG_REPORT leads to Zstackapi_bdbProcessInConfigReportCmd and then processBdbProcessInConfigReportReq -> bdb_ProcessInConfigReportCmd. If zclport_registerZclHandleExternal is called in the application, then zcl_HandleExternal should pass the command to the application accordingly.
Thanks,
Alex F