Hello
We currently have an issue with our devices, which are based on a TI microcontroller: the firmware crashes for an unknown reason. Each crash results in one of the following two scenarios:
- The Wi-Fi/MQTT reconnect mechanism stops working, and the device can no longer reconnect to our access point (AP) or MQTT broker.
- The chip freezes and cannot recover on its own. All operational processes stop working, and the watchdog does not reboot the device.
Both problems can be fixed by a hard reset of the device (removing power for about a second).
To give you some more context about our use case: our business asked us for a helpchain solution that they can use to visualize, monitor and report issues occurring at workstations within a distribution center. For this, we designed and developed a custom PCB based on the TI CC3235SF chip. The device is connected to a stacklight whose lights are controlled by buttons on the device, indicating the current status of an issue (open, acknowledged or closed). This status is also sent to the AWS cloud, where it is processed and shown on screens in the distribution centers. Because these devices are managed in the AWS cloud, we decided to use Amazon FreeRTOS as our real-time operating system, since it already includes libraries and tools for AWS IoT Core services.
Some more details on our technical design: we are using version 2.02007.00 of the Amazon FreeRTOS kernel, which uses v2.10.00.04 of the TI SimpleLink SDK. All our devices run nearly the latest TI service pack (sp_4.8.0.8_3.7.0.1_3.1.0.26) and are connected to the 2.4 GHz Wi-Fi band.
It is very hard for us to identify the root cause, because we have no visibility into any logs. We tried implementing a mechanism that sends logs to the cloud while a device is connected and writes them to flash while it is disconnected, but without success. When we try to reproduce the issue in a controlled environment, all our recovery mechanisms seem to work:
- The auto-reconnect loop works for both the Wi-Fi and MQTT connections.
- The watchdog resets the device when it stops receiving kicks.
The watchdog is serviced by an RTOS task that kicks it periodically.
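One thing worth ruling out for the freeze scenario: if the watchdog task itself keeps running while the other tasks are stuck, it keeps kicking the watchdog and the device never resets. A common pattern is to kick only when every monitored task has checked in during the last period. The sketch below is plain C so it compiles on a host; the task IDs, `health_checkin()` and `health_should_kick()` are hypothetical names. On the target, the shared bitmask would more safely be a FreeRTOS event group (`xEventGroupSetBits()` in each task, `xEventGroupWaitBits()` waiting for all bits in the supervisor), and the supervisor would call the TI driver's `Watchdog_clear()` only when the check passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical IDs for the tasks that must prove they are alive.
   Adjust to the real task set; one bit per task. */
enum {
    TASK_NETWORK    = 0,  /* Wi-Fi/MQTT connection handling (hypothetical) */
    TASK_BUTTONS    = 1,  /* button handling (hypothetical) */
    TASK_STACKLIGHT = 2,  /* stacklight control (hypothetical) */
    TASK_COUNT
};

#define ALL_TASKS_MASK ((uint32_t)((1u << TASK_COUNT) - 1u))

/* Check-in bits collected since the last evaluation.
   NOTE: a plain |= is not atomic across tasks; on the target use a
   FreeRTOS event group or a critical section instead. */
static volatile uint32_t g_checkins;

/* Called by each monitored task once per iteration of its main loop. */
void health_checkin(unsigned task_id)
{
    assert(task_id < TASK_COUNT);
    g_checkins |= (uint32_t)1u << task_id;
}

/* Called by the watchdog task once per supervisor period.
   Returns true only if every monitored task checked in since the last call;
   the caller kicks the hardware watchdog only in that case.
   The bits are cleared on every call, so each period needs fresh check-ins. */
bool health_should_kick(void)
{
    uint32_t seen = g_checkins;
    g_checkins = 0;
    return (seen & ALL_TASKS_MASK) == ALL_TASKS_MASK;
}
```

With this design, the supervisor period has to be longer than the slowest task's check-in interval; otherwise a healthy but slow task causes a spurious reset.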
Currently, we are out of options to debug this problem further. Does anyone have advice on how we can tackle or debug this issue?
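Regarding the logging mechanism described earlier: a commonly used complement to logging to flash is a small log ring buffer in a RAM region that the startup code does not zero. After a watchdog or software reset, the previous lines are still present and can be uploaded once the device reconnects. This only helps for resets that keep RAM powered; the power-cycle recovery described above wipes it. The `.noinit` section name, the `CRASHLOG_*` macros and the `crashlog_*` functions are assumptions for illustration; how an uninitialized section is declared depends on the toolchain and linker script.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: on the target this expands to an attribute that places the
   buffer in a RAM section the startup code does not zero, e.g. with GCC
   -DCRASHLOG_SECTION='__attribute__((section(".noinit")))' together with a
   matching NOLOAD section in the linker script. Empty on the host. */
#ifndef CRASHLOG_SECTION
#define CRASHLOG_SECTION
#endif

#define CRASHLOG_MAGIC    0xC0FFEE42u  /* hypothetical "buffer valid" marker */
#define CRASHLOG_LINES    8u
#define CRASHLOG_LINE_LEN 48u

typedef struct {
    uint32_t magic;  /* CRASHLOG_MAGIC once initialized */
    uint32_t next;   /* slot that the next line is written to */
    uint32_t count;  /* number of valid lines, at most CRASHLOG_LINES */
    char lines[CRASHLOG_LINES][CRASHLOG_LINE_LEN];
} crashlog_t;

static crashlog_t g_crashlog CRASHLOG_SECTION;

/* Call once at boot. Returns how many lines survived from before the reset,
   or 0 if the header is invalid (power-on garbage or a torn update). */
uint32_t crashlog_init(void)
{
    if (g_crashlog.magic != CRASHLOG_MAGIC ||
        g_crashlog.next >= CRASHLOG_LINES ||
        g_crashlog.count > CRASHLOG_LINES) {
        memset(&g_crashlog, 0, sizeof g_crashlog);
        g_crashlog.magic = CRASHLOG_MAGIC;
        return 0;
    }
    return g_crashlog.count;
}

/* Append one line, overwriting the oldest line when the buffer is full. */
void crashlog_write(const char *msg)
{
    char *slot = g_crashlog.lines[g_crashlog.next];
    strncpy(slot, msg, CRASHLOG_LINE_LEN - 1);
    slot[CRASHLOG_LINE_LEN - 1] = '\0';
    g_crashlog.next = (g_crashlog.next + 1) % CRASHLOG_LINES;
    if (g_crashlog.count < CRASHLOG_LINES) {
        g_crashlog.count++;
    }
}

/* Return retained line i, where i = 0 is the oldest. */
const char *crashlog_get(uint32_t i)
{
    assert(i < g_crashlog.count);
    uint32_t oldest =
        (g_crashlog.next + CRASHLOG_LINES - g_crashlog.count) % CRASHLOG_LINES;
    return g_crashlog.lines[(oldest + i) % CRASHLOG_LINES];
}
```

On the host, calling `crashlog_init()` a second time without clearing memory stands in for a warm reset. On the device, the retained lines would be read right after boot and published once the MQTT connection is back.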
Thanks in advance!