CC3220SF: CC3220 strange WDT reset issue.

Part Number: CC3220SF
Other Parts Discussed in Thread: CC3200

Hi all,

We are encountering a strange issue on the CC3220.

We have about 1000 CC3220 gateways running in the market.

About 50 of those 1000 units keep rebooting every 5 to 10 minutes, and the reboot is caused by a WDT (watchdog timer) reset.

They are running MQTT with a GCP server.

We think that there may be an issue between the CC3220 and some kinds of router.

It seems that some routers will kick the CC3220 out if it is idle for a short period.

We finally found out that it can be fixed by publishing an MQTT message every 10 s.

But this is not a solution, as it will incur a lot of network costs on GCP.

We then tried to fix it with the following:

1. Ping Google (8.8.8.8) every 10 s.

2. Sync the clock every 10 s.

3. Change the TCP keepalive time to 10 s (SLNETSOCK_OPSOCK_KEEPALIVE_TIME).

4. Send an MQTT PINGREQ every 10 s.

5. Publish an MQTT message every 20 s.

However, none of those attempts fixed the strange WDT reset issue.

Can anyone help? It is very strange, as we have 9xx units that are working great.

  • We are sure that it is not caused by the server side.
    We changed the server from Xively to GCP IoT, and the units also kept rebooting on Xively.

    We think this issue may occur when the CC3220 is connected to some kinds of router.

    However, as those units are already in the market, we don't know which brand of router causes this issue.

    Does anyone have this information, or know how we can fix it?

  • Hi Bryan,

    Without more info that will help us understand the root cause, it will be very hard for us to help.

    Do you see a disconnection from the server before the WDT expires?

    Any error event from the NWP?

    What are the SDK and SP versions? (Did you try updating to the latest?)

    I'm only aware of one possible issue that can occur during MQTT disconnection, depending on the application implementation (basically, MQTTClient_delete may be called only when there are no other active MQTTClient calls, such as MQTTClient_run or MQTTClient_publish), but there is no real reason for frequent disconnections.

    br,

    Kobi
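
One way to honor the constraint above (delete only when no other client call is active) is to count the calls in flight and defer the delete until the count drops to zero. The following is a minimal host-side sketch under stated assumptions: `ClientGuard` and the `guard_*` functions are hypothetical names that are not part of the SDK, and the place where the real `MQTTClient_delete` call would go is marked only by a comment. In an RTOS application the counter would also need a mutex or critical section, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical guard (not an SDK type): tracks client calls in flight. */
typedef struct {
    int  active_calls;     /* client API calls currently running        */
    bool delete_requested; /* teardown was requested while calls active */
    bool deleted;          /* client has been released                  */
} ClientGuard;

/* Placeholder for the actual teardown. Real code would call the
 * library's delete function here (assumption: exactly once). */
static void guard_do_delete(ClientGuard *g)
{
    g->deleted = true;
}

/* Call before any client API (run, publish, ...).
 * Returns false if the client is being torn down: skip the call. */
static bool guard_enter(ClientGuard *g)
{
    if (g->delete_requested || g->deleted) {
        return false;
    }
    g->active_calls++;
    return true;
}

/* Call after the client API returns. Performs a deferred delete
 * once the last active call has finished. */
static void guard_leave(ClientGuard *g)
{
    assert(g->active_calls > 0);
    g->active_calls--;
    if (g->active_calls == 0 && g->delete_requested && !g->deleted) {
        guard_do_delete(g);
    }
}

/* Request teardown. Returns true if the delete happened immediately,
 * false if it was deferred until the active calls drain. */
static bool guard_request_delete(ClientGuard *g)
{
    g->delete_requested = true;
    if (g->active_calls == 0 && !g->deleted) {
        guard_do_delete(g);
    }
    return g->deleted;
}
```

Each task would wrap its client call between `guard_enter` and `guard_leave`, and the code that handles the disconnect would call `guard_request_delete` instead of deleting the client directly.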

  • Hi Kobi,

    It is using SDK simplelink_cc32xx_sdk_1_60_00_04 and

    service pack sp_3.9.0.6_2.0.0.0_2.2.0.6.bin.

    We think it may be a compatibility issue when the CC3220SF is connected to some kinds of router.

    It seems that the issue is caused by some kinds of router.

    We had one device that was working great with our router, and

    we sent it to our customer as a replacement.

    However, it still kept disconnecting on his router.

    If we set that device to keep publishing a message every 10 s, it works fine...

    It is very strange that you need to keep publishing messages to keep it online on that router.

    We tried to keep sending PINGREQ, but it didn't work...

    On the server side, it shows that the connection is terminated by the client.

    No idea what the issue is.

    For the service pack, can I OTA it directly to the latest version?

    Best Regards,

    Bryan

  • Yes, you should be able to update the SP (and/or the MCU image) if you are using our OTA library (if it was supported).

    The versions are very old. There have been many fixes since. Please try the latest SP.

    Which AP (router) causes the problems?

    We test the SimpleLink device against most available routers.

    Are you seeing many disconnections?

    Br,

    Kobi

  • Hi Kobi,

    I will try to OTA those devices to the latest SP.

    We are trying to ask our customer for the router's information.

    "Are you seeing many disconnections?"

    The weird thing is that we have 9xx units that are running great.

    Only about 30 to 50 units have the disconnection issue.

    For those 30 to 50 units, we are sure that they were working great before being sold to our customer.

    That is why we suspect a compatibility issue.

    We tried a lot of methods, but only publishing an MQTT message every 10 s fixes it (however, that is not an acceptable solution for us)...

    I will try to OTA those devices to the latest SP first.

    Hopefully that fixes it.

    Best regards,

    Bryan

  • Kobi,

    I tested updating the SP, but it still keeps disconnecting. :'(

    For now, it seems the only way is to keep publishing messages...

    Do you have any other suggestions that we can try?

    We don't want to update the SDK. It should be our last option. :'(

    Thank you.

    Best regards,

    Bryan

  • There is also one more weird thing.

    Publishing MQTT messages and sending MQTT PINGREQ should both go over the same socket.

    No idea why only the former fixes the issue. :'(

  • Do you get any event from the NWP (in one of the Simplelink Event handlers)?

    Do you see the MQTT connection get disconnected?

    What happens if you disable the WDT?

    I'm still not sure what issue you are facing besides getting the WDT reset. The WDT typically means that your app got stuck. What can cause it to not reset the watchdog?

    You mentioned the issue is related to a certain router that is being used. What is the reason you suspect the router?

    I'm not sure I understand your answer about the disconnection from the router. Do you see many disconnections from the router where the system fails?
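
To narrow down what stops the watchdog from being reset, one common pattern is to have every task check in with a supervisor and feed the hardware watchdog only when all of them have reported progress. The sketch below is an illustration under stated assumptions: the task list, `WdtSupervisor`, `wdt_check_in` and `wdt_supervise` are hypothetical names, and the actual watchdog feed and the persistent logging of the missing-task mask are left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task list for illustration only. */
enum { TASK_NET = 0, TASK_MQTT, TASK_APP, TASK_COUNT };

typedef struct {
    uint32_t checked_in;   /* bit i set: task i reported progress this period */
    uint32_t last_missing; /* tasks that missed the most recent period        */
    unsigned feeds;        /* number of periods in which a feed was allowed   */
} WdtSupervisor;

/* Each task calls this from its main loop to report that it is alive. */
static void wdt_check_in(WdtSupervisor *s, int task)
{
    assert(task >= 0 && task < TASK_COUNT);
    s->checked_in |= (uint32_t)1u << task;
}

/* Call once per supervision period (shorter than the WDT timeout).
 * Returns true if the hardware watchdog should be fed. When it returns
 * false, last_missing identifies the stalled task(s); real code would
 * store that mask somewhere that survives the reset. */
static bool wdt_supervise(WdtSupervisor *s)
{
    const uint32_t all = ((uint32_t)1u << TASK_COUNT) - 1u;

    s->last_missing = all & ~s->checked_in;
    s->checked_in = 0; /* start a new period */

    if (s->last_missing != 0) {
        return false;  /* at least one task is stuck: let the WDT expire */
    }
    s->feeds++;
    return true;
}
```

After a watchdog reset, reading back the stored mask shows which task stopped checking in, which is the kind of information requested above.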

  • Honestly, we don't have much information.

    Those units are running on the customer side…

    We only have the information from our limited MQTT log.

    Why do we suspect the router?

    On the server side, there are 9xx units that are running great.

    On the hardware side, there is one unit that we were sure was working great with our test router before we sent it to the customer as a replacement. However, it got the same disconnection issue when it was connected to the customer's router.

    The weirdest thing is that publishing an MQTT message every 10 s fixes it… Do the TI engineers have any idea what may be related to this issue?

  • If you see a lot of disconnections from the AP, you may want to disable the PS-POLL policy, as some routers don't respect the PS-POLL indication sent by the device, and this can cause disconnections.

    SlWlanNoPSPollMode_t NoPsPollMode;

    // Enable "no PS-Poll" mode (work without PS-Poll frames).
    // 0: disabled (default) - the station sends a PS-Poll control frame to receive buffered frames from the AP when unicast traffic is indicated in the beacon.
    // 1: enabled - the station transitions from PS to Active whenever unicast traffic is indicated in the beacon (for interoperability with access points that don't fully support PS-Poll).
    NoPsPollMode.Enable = 1;

    sl_WlanSet(SL_WLAN_CFG_GENERAL_PARAM_ID, SL_WLAN_GENERAL_PARAM_OPT_NO_PS_POLL_MODE,
               sizeof(SlWlanNoPSPollMode_t), (_u8 *)&NoPsPollMode);

    Anyway, a disconnection at the Wi-Fi level (i.e. the connection to the router) or at the MQTT level (the connection to the broker) shouldn't cause the app to get stuck (and trigger the WDT). It may impact the power consumption, but it shouldn't be fatal.

    Unless you identify the exact issue that causes the app to get stuck, I don't know how to help with this.

    Br,

    Kobi
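
To illustrate the point that a disconnection should not stall the application, here is a minimal sketch of a non-blocking reconnect state machine with exponential backoff. It is a sketch under stated assumptions, not SDK code: `Link`, `link_on_error`, `link_poll`, the backoff limits and the two `stub_connect_*` functions are hypothetical, and the connect attempt passed in is assumed to return promptly instead of blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BACKOFF_MIN_MS 1000u   /* assumed first retry delay */
#define BACKOFF_MAX_MS 60000u  /* assumed retry delay cap   */

typedef enum { LINK_CONNECTED, LINK_WAIT_RETRY } LinkState;

typedef struct {
    LinkState state;
    uint32_t  retry_at_ms; /* time of the next connect attempt */
    uint32_t  backoff_ms;  /* current retry delay              */
} Link;

static void link_init(Link *l)
{
    l->state = LINK_CONNECTED;
    l->retry_at_ms = 0;
    l->backoff_ms = BACKOFF_MIN_MS;
}

/* Call from the socket-error or disconnect handler. Only records the
 * event; the reconnect itself happens later in link_poll(). */
static void link_on_error(Link *l, uint32_t now_ms)
{
    l->state = LINK_WAIT_RETRY;
    l->retry_at_ms = now_ms + l->backoff_ms;
}

/* Call on every pass of the main loop. Never waits, so the loop can
 * keep servicing the watchdog. Returns true while connected. */
static bool link_poll(Link *l, uint32_t now_ms, bool (*try_connect)(void))
{
    if (l->state == LINK_CONNECTED) {
        return true;
    }
    if ((int32_t)(now_ms - l->retry_at_ms) < 0) {
        return false; /* retry not due yet (wrap-safe comparison) */
    }
    if (try_connect()) {
        l->state = LINK_CONNECTED;
        l->backoff_ms = BACKOFF_MIN_MS;
        return true;
    }
    /* Attempt failed: double the delay, up to the cap. */
    l->backoff_ms = (l->backoff_ms >= BACKOFF_MAX_MS / 2u)
                        ? BACKOFF_MAX_MS
                        : l->backoff_ms * 2u;
    l->retry_at_ms = now_ms + l->backoff_ms;
    return false;
}

/* Stand-ins for a real, non-blocking connect attempt. */
static bool stub_connect_ok(void)   { return true; }
static bool stub_connect_fail(void) { return false; }
```

The main loop would call `link_poll` on each pass together with its other work, so a failed or slow reconnect costs extra retries and power but does not keep the loop from reaching the watchdog service call.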

  • Many thanks Kobi,

    I will test it and let you know the result.

    Best regards,

    Bryan

  • Ok.

    Please update once you find anything.

    Thanks,

    Kobi

  • Hi Kobi,

    Sorry for the late reply.

    It seems that SL_WLAN_GENERAL_PARAM_OPT_NO_PS_POLL_MODE is not in my old SDK.

    There is no option for that.

    We are trying to get more information about the disconnection issue.

    Thank you.

    Best regards,

    Bryan

  • You can add this command option to your old SDK (it mostly requires the OPTION value, assuming you use an updated SP), but it would be better to use this debug opportunity to update to the new SDK.

    Br,

    Kobi

  • Kobi,

    We have an interesting finding.

    We tried connecting our device to a cheaper router (Mercusys AC1200).
    We found that if the device doesn't keep publishing logs, it gets a socket error on the MQTT socket after a short period,
    and this causes the disconnection issue.
    Once we set the device back to publishing an MQTT message every 10 s, the socket error no longer occurs.

    We are sure that it is caused by this specific router,
    because there is another router on the same network, and
    if the same device is connected to it, it does not have the same issue (socket error).

    I turned on the MQTT library log. Here is the log when it got that error.

    C: Net 0, Raw Error -1, Time Out: N [601]
    C: RX closing Net 0 [-1]
    C: Cleaning session for net 0
    C: Net 0 now closed

    Do you have any idea about it?

    Best regards,
    Bryan

  • Are you sure you are using the CC3220 SDK MQTT stack?

    These are not logs from the CC3220 stack (I believe they are from the CC3200 MQTT lib).

  • ...

    We are using simplelink_cc32xx_sdk_1_60_00_04.
    The MQTT code is taken from the following example:
    C:\ti\simplelink_cc32xx_sdk_1_60_00_04\examples\rtos\CC3220SF_LAUNCHXL\demos\mqtt_client

  • Please try to update the SDK/MQTT (and probably the SP).

    We use a different library today and have solved many issues since version 1.60.

    Br,

    Kobi