CC3235SF: Connectivity issues with Amazon FreeRTOS

Other Parts Discussed in Thread: CC3235SF

Hello,

We currently have an issue with our devices, based on a TI microchip, where the firmware crashes for unknown reason(s). The crash results in one of the following two scenarios:

- The Wi-Fi/MQTT reconnect mechanism stops working and the device is not able to reconnect to our AP or MQTT broker again.

- The chip enters a frozen state from which it is unable to recover. Operational processes stop working, and the watchdog does not reboot the device.

Both problems can be fixed by executing a hard reset on the device (removing the power for a second).

 

To give you some more context about our use case: our business asked us for a helpchain solution that they can use to visualize, monitor, and report issues occurring at workstations within a distribution center. For this, we designed a custom PCB based on the TI CC3235SF chip. The device is connected to a stacklight whose lights are controlled by buttons on the device to indicate the current status of an issue (open, acknowledged, or closed). This information is also sent to the AWS cloud, where it is processed and reported on screens in the distribution centers. Because these devices are managed within the AWS cloud, we decided to use Amazon FreeRTOS as our real-time operating system, since it already contains libraries and tools for services in AWS IoT Core.

To go a bit further into our technical design: we are using version 2.02007.00 of the Amazon FreeRTOS kernel, which uses v2.10.00.04 of the TI SimpleLink SDK. All our devices run almost the latest TI service pack (sp_4.8.0.8_3.7.0.1_3.1.0.26) and are connected to the 2.4 GHz Wi-Fi band.

 

It is very hard for us to identify the root cause of this issue because we don't have visibility into any logs. We tried implementing a mechanism that sends logs to the cloud while the devices are connected and writes them to flash when the devices are disconnected, but without success. When we try to simulate this issue in a controlled environment, all our mechanisms seem to work:

- The auto-reconnect loop works for both the Wi-Fi and MQTT connections.

- The watchdog resets the device when it doesn't receive any kicks.

The watchdog is implemented by an RTOS task which kicks the watchdog periodically.
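
As an illustration of the auto-reconnect mechanism mentioned above, here is a minimal sketch in C of a retry policy with capped exponential backoff that escalates to a reset request after repeated failures. It is not the code running on these devices: the names (`reconnect_step`, `RECONNECT_RESET`) and the timing constants are assumptions made for the example, and the actual Wi-Fi and MQTT connect calls are left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning values, not taken from the thread. */
#define BACKOFF_START_MS   1000u
#define BACKOFF_MAX_MS    60000u
#define MAX_FAILED_TRIES     10u

typedef enum {
    RECONNECT_OK,     /* connected, counters cleared */
    RECONNECT_WAIT,   /* wait *wait_ms, then try again */
    RECONNECT_RESET   /* too many failures, ask for a device reset */
} reconnect_action_t;

typedef struct {
    uint32_t next_delay_ms; /* delay to apply after the next failure */
    uint32_t failed_tries;  /* consecutive failed attempts so far */
} reconnect_state_t;

static void reconnect_init(reconnect_state_t *s)
{
    assert(s != NULL);
    s->next_delay_ms = BACKOFF_START_MS;
    s->failed_tries = 0u;
}

/* Feed in the result of one connection attempt and get back what to do next. */
static reconnect_action_t reconnect_step(reconnect_state_t *s, bool connected,
                                         uint32_t *wait_ms)
{
    assert(s != NULL && wait_ms != NULL);
    *wait_ms = 0u;

    if (connected) {
        reconnect_init(s);          /* success clears the backoff */
        return RECONNECT_OK;
    }

    s->failed_tries++;
    if (s->failed_tries >= MAX_FAILED_TRIES) {
        return RECONNECT_RESET;     /* escalate instead of retrying forever */
    }

    *wait_ms = s->next_delay_ms;
    /* Double the delay for the following failure, capped at the maximum. */
    if (s->next_delay_ms <= BACKOFF_MAX_MS / 2u) {
        s->next_delay_ms *= 2u;
    } else {
        s->next_delay_ms = BACKOFF_MAX_MS;
    }
    return RECONNECT_WAIT;
}
```

The caller would attempt a connection, pass the result to `reconnect_step`, then either sleep for `wait_ms` or trigger a reset (for example by no longer kicking the watchdog) when `RECONNECT_RESET` is returned. The escalation path is there so that a loop that keeps failing does not spin indefinitely.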

 

Currently, we are out of options to debug this problem further. Does anyone have advice on how we can tackle or debug this issue?

 

Thanks in advance!

  • Hi,

    It is always hard to resolve such an issue when you cannot reproduce the problem at your desk. Unfortunately, nobody other than you will be able to resolve it. If I had a similar problem, I would take these steps:

    • Prepare independent logging hardware to capture logs at the final deployment site. For example, I would use a Raspberry Pi to capture the UART log, the NWP log, and WLAN and cloud connectivity statistics (response time, connection outages, etc.).
    • It would be good to do EMC measurements at the final deployment site using a far-field antenna and a spectrum analyser, to see what the EM environment looks like.
    • In the meantime, you can do EM immunity tests at your desk, including ESD tests on the connectors (e.g. EN 61000-4-2).
    • Depending on how often the issue occurs, it may be reasonable to consider redesigning your board with a "hardware" watchdog.
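
    As a firmware-side complement to external logging, a small in-RAM ring buffer can keep the most recent log lines so they can be drained over UART or written to flash later. The sketch below is an example under stated assumptions rather than anything from the SDK: `log_ring_t`, the slot count, and the message length are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes chosen for the example. */
#define LOG_SLOTS    8u
#define LOG_MSG_LEN 48u

typedef struct {
    uint32_t tick;              /* timestamp supplied by the caller */
    char     msg[LOG_MSG_LEN];  /* NUL-terminated, truncated if too long */
} log_entry_t;

typedef struct {
    log_entry_t slots[LOG_SLOTS];
    uint32_t head;   /* index of the next slot to write */
    uint32_t count;  /* number of valid entries, at most LOG_SLOTS */
} log_ring_t;

static void log_ring_init(log_ring_t *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof *r);
}

/* Store one entry, overwriting the oldest one when the ring is full. */
static void log_ring_put(log_ring_t *r, uint32_t tick, const char *msg)
{
    assert(r != NULL && msg != NULL);
    log_entry_t *e = &r->slots[r->head];
    e->tick = tick;
    strncpy(e->msg, msg, LOG_MSG_LEN - 1u);
    e->msg[LOG_MSG_LEN - 1u] = '\0';
    r->head = (r->head + 1u) % LOG_SLOTS;
    if (r->count < LOG_SLOTS) {
        r->count++;
    }
}

/* Return the i-th oldest entry (0 = oldest), or NULL if i is out of range. */
static const log_entry_t *log_ring_get(const log_ring_t *r, uint32_t i)
{
    assert(r != NULL);
    if (i >= r->count) {
        return NULL;
    }
    uint32_t oldest = (r->head + LOG_SLOTS - r->count) % LOG_SLOTS;
    return &r->slots[(oldest + i) % LOG_SLOTS];
}
```

    Entries are overwritten oldest-first, so the last few events before a hang stay in the buffer for whatever drains it. Access from several tasks would need a mutex or critical section, which is left out here.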

    Jan

  • Hi Hans,

    While I agree with Jan that it is unusual you cannot replicate the issue, it would also be good to verify the state of the application MCU:

    • When the device is in this state, are you able to ping it? This may give us an idea if the NWP is alive and connected, even if the MQTT client is not responding.
      • If the NWP does crash, there should be a tmp/crashminidump.bin saved in the file system. We can parse that and see if it's helpful.
    • Regarding the watchdog:
      • What priority is the thread kicking the watchdog versus other threads?
      • Check out the watchdog recovery sequence in section 10.4.2 of the TRM.
    • How are you disconnecting and reconnecting?
    • If you are able to pull out JTAG pins and access a device before it is reset, you can use the CCS debugger to check the current state of the application. See the Manual Launch section of Debug Overview or this video.
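
    On the watchdog priority question: if the kick thread has a higher priority than the threads doing the real work, it can keep kicking while those threads are stuck. One common way around this is to kick only when every supervised task has checked in during the last period. A minimal sketch, with hypothetical task bits and function names, and with the actual watchdog driver call left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per supervised task; these names are hypothetical. */
#define TASK_WIFI  (1u << 0)
#define TASK_MQTT  (1u << 1)
#define TASK_APP   (1u << 2)
#define TASK_ALL   (TASK_WIFI | TASK_MQTT | TASK_APP)

/* Bits set by tasks since the supervisor last looked. */
static volatile uint32_t g_checkins;

/* Called by each supervised task from its main loop to report progress. */
static void wdt_checkin(uint32_t task_bit)
{
    assert((task_bit & ~TASK_ALL) == 0u);
    g_checkins |= task_bit;  /* must be made atomic on a real RTOS */
}

/* Called once per period by the supervisor task.
 * Returns true only if every supervised task checked in since the last call,
 * then clears the flags so each task has to check in again. */
static bool wdt_should_kick(void)
{
    bool all_alive = ((g_checkins & TASK_ALL) == TASK_ALL);
    g_checkins = 0u;
    return all_alive;
}
```

    A low-priority supervisor would call `wdt_should_kick()` once per period and kick the hardware watchdog only when it returns true. On a real RTOS the read-modify-write of `g_checkins` would need to be atomic, for example through a critical section or a FreeRTOS event group.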

    Best regards,

    Sarah

  • Hi Sarah,

    We've checked if the device is able to receive a ping. The device is not connected to any of our networking devices, meaning it also doesn't have an IP address. This makes it impossible for us to ping it and leads us to believe the NWP has also frozen. We've not been able to check the crashminidump.bin in the file system yet. We'll do this as soon as we have the necessary hardware to read these files.

    We're also planning to lower the priority of the watchdog task and release this test next week.

    We'll keep you posted on our progress.

    Thanks for trying to help us figure this out!

    Kind regards,

    Vincent

  • I'm working on a similar design, but I found out that v2.10.00.04 of the TI SimpleLink SDK doesn't support the CC3235SF.

    Could that be the issue?

  • Thanks Will, I didn't catch that this SDK version predates the CC3235. This combination has not been tested, so we do not know what the impact might be.

    Hans, Vincent,

    Is there a reason you are working with an old AWS FreeRTOS version? We have the AWS plugin compatible with CC32xx SDK 4.30.00.02. I believe it's based on v3.0.1.

    Best regards,

    Sarah

  • Hey Sarah.

    Yes, we are using a pre-CC3235SF version of the SDK. We have tested this and saw no issues with the libraries provided in Amazon FreeRTOS, except for the OTA library, where we had to implement a workaround. When we started this project, we did some research and concluded that the AWS plugin didn't include an implementation of AWS OTA jobs, which was a hard requirement for us. If I understand correctly, the AWS plugin is basically a fork of the AWS IoT Device SDK for Embedded C project with some additional functionality specific to TI chips. That project does include a library for AWS OTA jobs. Do you know whether the AWS plugin will include this OTA library in future releases?

    Best regards

    Hans

  • Hi Hans,

    The pre-CC3235 host driver was not designed for or tested on that device, so we do not know if this causes issues on the host or NWP side. We cannot know if this impacts the unusual behavior you're seeing. You are also at risk of potential issues in future servicepack updates, as we do not test backwards-compatibility for that SDK version on this device.

    Where did you pull your original source from? The TI AWS plugin is just a re-package of the aws-iot-device-sdk-embedded-C git. You should be able to update to a later SDK version regardless.

    Best regards,

    Sarah

  • I have the same need as Hans: to use AWS OTA instead of TI OTA.

    Which way is easier?

    1. Update the TI CC32xx SDK in the Amazon FreeRTOS package to support the CC3235SF, so that hopefully the TI OTA port would work without much effort?

    2. Port the AWS OTA features (part of the AWS IoT C SDK) to the TI AWS IoT plugin? By the way, TI is using a 4-year-old AWS C SDK, 3.0.1 (May 10, 2018)!

  • Hi Will,

    I think the simplest method would be to update the host driver used by the AWS FreeRTOS package. This would just require updating the host driver source folder (source/ti/drivers/net/wifi/*). My minimum recommendation for CC323x is host driver version 3.0.1.46, which is in the 2.40 SDK. You can update to the latest version as well, but there may be more compatibility breaks.

    Best regards,

    Sarah