Compiler/CC3220MODA: Network interface fails in CC3220MODA

Part Number: CC3220MODA

Tool/software: TI C/C++ Compiler

Team,

This is a follow-up to the issue reported earlier: https://e2e.ti.com/support/wireless-connectivity/wifi/f/968/p/857873/3187543?tisearch=e2e-sitesearch&keymatch=strange%252520mqtt#3187543

I still find that the issue persists on some modules: in rare cases the network interface fails, i.e. MQTT connections keep failing until the device is reset. I am still unable to get logs, as it happens on only a few of our production modules and we are finding it hard to reproduce on debug modules.

I am using SDK 3.30.01.02 (18 Oct 2019) and I see that the latest version is 4.20.00.07 (02 Jul 2020). Has anything changed in the network implementation or in MQTT?

I also found a known issue mentioned in the release notes of the latest SDK, i.e. 'CC3X20SDK-601 - Rarely the MQTT Server internal bridging does not work from server to client'. Can someone explain this issue further, please?

Appreciate your support.

Regards,

Zac

  • Hi Zac,

    Are you using the internal MQTT server?

    I'm not familiar with a similar issue. It will be difficult to help without any logs.

    You can try the latest service pack (SP). It works without any changes to the application.

    Besides the MQTT connection failure, do you see other issues with the NWP interface when the problem occurs?

    Br,

    Kobi

  • Hello Kobi,

    We are not using the MQTT server. The MQTT client is used to connect to a remote broker.

    I suspect that none of the network calls work when the issue happens.

    Do you have more details on the CC3X20SDK-601 issue?

    Regards,

    Zac

  • It concerns our MQTT server, which sometimes fails to forward local clients' messages to an external cloud broker.

    Since you are not using the MQTT server, it is not related.

    Br,

    Kobi  

  • Hi Kobi,

    Do you think anything else may cause such issues?

    Regards,

    Zac

  • Hi Zac,

    Please try the latest service pack.

    When you say the network interface fails, are you referring only to the MQTT level? Are you getting disconnected from the broker without being able to reconnect? Are you working against one broker, or have you seen this with different brokers?

    Or do you also get Wi-Fi related errors (i.e. through the SimpleLinkWlanEventHandler)?

    What error code do you get when trying to connect?

    Br,

    Kobi

  • Hello Kobi,

    The issue seems to happen during a fresh boot and has not been seen once the MQTT connection is established. All we know so far is that when the issue happens, MQTT, SNTP and all other network calls fail until the device is rebooted. We are not able to get the exact error code, so we are not sure about the exception.

    Regards,

    Zac

  • So let me know when you have more details (and try the new SP).

    I'm not aware of a similar issue and can't help you without more info. 

    Br,

    Kobi

  • Hello Kobi,

    I recently found a device which showed a similar issue (MQTT not connecting). Its WLAN was still connected to the router, and HTTP API calls such as /api/1/wlan/en_ap_scan were returning data successfully. Only the MQTT connection and other outbound REST API calls were failing. Also note that the issue happened after an internet failure (the WLAN was still active).

    Any idea when such behaviour would occur? Sorry, I am not able to get further details. I appreciate any help.

    Regards,

    Zac

  • You said that the internet was disconnected before. Did MQTTClient_run() return? (This happens only when the MQTTClient library detects the MQTT disconnection, based on the keep-alive (KA) timeout.)

    In such cases, we have seen applications get into a race condition: the disconnection path calls MQTTClient_delete(), which releases internal resources, including a mutex that may still be used by other pending MQTTClient commands (e.g. an MQTTClient_publish that was not completed due to the internet failure).

    MQTTClient_delete should be called only when no other MQTTClient activity is in progress.

    br,

    Kobi

  • Hi Kobi,

    Below is the sequence executed when the internet is disconnected.

    1. MQTTClient_DISCONNECT_CB_EVENT event received in the MQTT event handler.
    2. MQTTClient_unsubscribe called to unsubscribe from all topics (this may not be relevant as the internet is already down)
    3. MQTTClient_disconnect invoked on MQTT client
    4. MQTTClient_delete invoked
    5. MQ cleared, discarding any pending messages
    6. MQTTClient_run returns and MQTT Client thread exits.

    This sequence runs fine in my testing, and reconnection happens when the internet connection is restored. The sequence is executed through an MQueue, so MQTTClient_delete or MQTTClient_disconnect will not be called in the middle of a publish.

    Can you help us understand the behaviour when the internet drops during a publish? How should such scenarios be handled?

    Appreciate your help.

    Regards,

    Zac

  • Your sequence can cause issues. MQTTClient_delete should be called only after MQTTClient_run has returned.

    When the internet drops, the KA timeout expires, causing the library to report the disconnection and to return from MQTTClient_run.

    If there is a pending publish command (a publish sent with QOS1 or QOS2 returns only when the MQTT ACK is received), it would otherwise block forever, waiting on a signaling object that never gets signaled. During the disconnection sequence, however, the signaling object of every ACK-pending command is set, causing the command to complete.

    Before the publish command completes, it tries to update the internal library context, which is protected by a semaphore that is freed when MQTTClient_delete is called. MQTTClient_delete can therefore be called only when there are no pending commands.

    Br,

     Kobi
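
The ordering described above (release resources only after the run call has returned) can be sketched on a host with POSIX threads. This is a minimal sketch under stated assumptions, not SDK code: `FakeClient`, `fake_client_run`, `fake_client_delete` and `fake_link_drop` are hypothetical stand-ins for the client context, MQTTClient_run, MQTTClient_delete and the KA timeout. Only the ordering is the point.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MQTT client context (not the SDK type). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool link_down;        /* simulated KA timeout has expired */
    bool deleted;          /* resources have been released */
    bool run_after_delete; /* run touched the context after delete */
} FakeClient;

/* Stand-in for MQTTClient_run(): blocks until the link is reported down. */
static void fake_client_run(FakeClient *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->link_down) {
        pthread_cond_wait(&c->cond, &c->lock);
    }
    c->run_after_delete = c->deleted; /* true would mean a use-after-free */
    pthread_mutex_unlock(&c->lock);
}

/* Stand-in for MQTTClient_delete(): marks the resources as released. */
static void fake_client_delete(FakeClient *c)
{
    pthread_mutex_lock(&c->lock);
    c->deleted = true;
    pthread_mutex_unlock(&c->lock);
}

/* Stand-in for the KA timeout that makes the run call return. */
static void fake_link_drop(FakeClient *c)
{
    pthread_mutex_lock(&c->lock);
    c->link_down = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Client thread: delete is reached only after run has returned. */
static void *client_thread(void *arg)
{
    FakeClient *c = arg;
    fake_client_run(c);
    fake_client_delete(c);
    return NULL;
}

/* Returns true when delete happened and run never saw a deleted context. */
bool teardown_demo(void)
{
    FakeClient c = { .link_down = false, .deleted = false,
                     .run_after_delete = false };
    pthread_t tid;

    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.cond, NULL);
    int rc = pthread_create(&tid, NULL, client_thread, &c);
    assert(rc == 0);

    fake_link_drop(&c);      /* e.g. the internet goes away */
    pthread_join(tid, NULL); /* wait for run and delete to finish */

    bool ok = c.deleted && !c.run_after_delete;
    pthread_cond_destroy(&c.cond);
    pthread_mutex_destroy(&c.lock);
    return ok;
}
```

Calling `teardown_demo()` from a test harness exercises the sequence once. In a real application the two stand-in calls in `client_thread` would presumably be the SDK's run and delete calls, made in that order from the same thread.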

  • Thanks Kobi,

    How do we ensure that there are no pending commands? Should I always wait for MQTTClient_run to return before assuming disconnect?

    Can you also help us understand when MQTTClient_DISCONNECT_CB_EVENT is received, and whether we should just ignore it?

    Regards,

    Zac

  • For example, by adding critical-section protection (e.g. a mutex) around each call to MQTTClient, so that MQTTClient_delete won't be executed until MQTTClient_publish returns.

    Since SDK 4.20 we have an interface module (mqtt_if) that eases the use of the library. You can check its implementation.

    Br,

    Kobi
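
The mutex suggestion above might look roughly like the following host-side sketch using POSIX threads. The names `guarded_publish`, `guarded_delete`, `fake_publish` and `fake_delete` are hypothetical. The two `fake_` functions only stand in for MQTTClient_publish and MQTTClient_delete and do not reproduce their signatures.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t g_client_lock = PTHREAD_MUTEX_INITIALIZER;
static bool g_client_alive = true;  /* false once resources are released */
static int  g_use_after_delete = 0; /* counts calls made on a deleted client */

/* Stand-in for MQTTClient_publish(); hypothetical, not the SDK call. */
static int fake_publish(void)
{
    if (!g_client_alive) {
        g_use_after_delete++;
    }
    return 0;
}

/* Stand-in for MQTTClient_delete(); hypothetical, not the SDK call. */
static void fake_delete(void)
{
    g_client_alive = false;
}

/* Publish under the lock; refuse once the client has been deleted. */
int guarded_publish(void)
{
    int rc = -1;
    pthread_mutex_lock(&g_client_lock);
    if (g_client_alive) {
        rc = fake_publish();
    }
    pthread_mutex_unlock(&g_client_lock);
    return rc;
}

/* Delete under the same lock, so it waits for an in-progress publish. */
void guarded_delete(void)
{
    pthread_mutex_lock(&g_client_lock);
    if (g_client_alive) {
        fake_delete();
    }
    pthread_mutex_unlock(&g_client_lock);
}

static void *publisher_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        (void)guarded_publish();
    }
    return NULL;
}

/* Races a publisher against delete; returns the use-after-delete count. */
int guarded_demo(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, publisher_thread, NULL);
    assert(rc == 0);
    guarded_delete();
    pthread_join(tid, NULL);
    return g_use_after_delete;
}
```

One design note: a blocking run call cannot sit under the same lock, because it would hold the lock for the whole connection lifetime. A lock like this only covers the short commands. The delete should still wait until the run call has returned.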

  • Thanks Kobi,

    I have gone through the mqtt_if implementation and found a lot of changes. At the moment we may not have the bandwidth to port to the new implementation.

    A few observations from our testing: MQTTClient_publish returns even when the internet is disconnected in the middle of a publish (QOS is 1 and blocking send is true). And because an MQueue is used for the publish and disconnect sequence, MQTTClient_delete is never called until MQTTClient_publish returns.

    While debugging, we could also see that MQTTClient_DISCONNECT_CB_EVENT is received and MQTTClient_run returns at exactly the same time. We tried many cases, such as disconnecting the internet during a publish and while idle, and the behaviour seems to be the same.

    To be on the safe side, should we ignore MQTTClient_DISCONNECT_CB_EVENT, always wait for MQTTClient_run to return, and only then initiate the MQTT cleanup sequence? Is there a scenario in which MQTTClient_run gets stuck?

    Regards,

    Zac

  • Yes, you can ignore the MQTTClient_DISCONNECT_CB_EVENT and do the cleanup when MQTTClient_run returns.

    If the internet is off (and QOS>0 and blocking send is true), the publish should block.

    Are you using the library from the latest SDK?

  • Hi Kobi,

    I am using SDK 3.30.01.02 (18 Oct 2019). For us, if we try to publish a message (QOS is 1 and blocking send is true) as soon as the internet is disconnected, MQTTClient_publish returns (after a delay of 3 or 4 seconds) with a non-zero value.

    I have made changes as per your recommendation to invoke MQTTClient_delete only after MQTTClient_run has returned. The firmware has been pushed, and we may have to wait a while to see whether it resolves the issue.

    Meanwhile, can you please help us understand the following?

    1. What is the outcome when MQTTClient_delete is called before MQTTClient_run returns? Will MQTTClient_run get stuck forever? Does it have other impacts on the networking layer?
    2. The default KA seems to be 25 seconds. What is the recommended or most commonly used KA while using MQTT?

    Regards,

    Zac

  • 1. You shouldn't do that. The "_delete" call releases the connection resources (such as the mutex, signaling objects, etc.) that the "_run" call uses. As mentioned before, MQTTClient_delete can be called only when there are no pending MQTTClient calls (including the "_run").

    2. It depends on the application (on which matters more: power saving or latency, i.e. how late a failure is detected). I have seen people use a KA of more than 10 minutes.

    Br,

    Kobi
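
One way an application could check the "no pending MQTTClient calls (including the _run)" condition from point 1 is to count calls in flight and let the cleanup wait for the count to reach zero. The sketch below is an assumption, not part of the SDK: `CallGate` and the `gate_` functions are hypothetical helpers built on POSIX threads. The application would wrap each client call, including the run call, in `gate_enter` and `gate_leave`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper that tracks client calls currently in progress. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  idle;  /* signalled when pending drops to zero */
    int  pending;          /* calls in progress, including the run call */
    bool closing;          /* once set, no new calls are admitted */
} CallGate;

void gate_init(CallGate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->idle, NULL);
    g->pending = 0;
    g->closing = false;
}

/* Call before a client API call; returns false once teardown has begun. */
bool gate_enter(CallGate *g)
{
    pthread_mutex_lock(&g->lock);
    bool ok = !g->closing;
    if (ok) {
        g->pending++;
    }
    pthread_mutex_unlock(&g->lock);
    return ok;
}

/* Call after the client API call has returned. */
void gate_leave(CallGate *g)
{
    pthread_mutex_lock(&g->lock);
    g->pending--;
    if (g->pending == 0) {
        pthread_cond_broadcast(&g->idle);
    }
    pthread_mutex_unlock(&g->lock);
}

/* Blocks until nothing is pending; the delete call would go after this. */
void gate_close_and_wait(CallGate *g)
{
    pthread_mutex_lock(&g->lock);
    g->closing = true;
    while (g->pending > 0) {
        pthread_cond_wait(&g->idle, &g->lock);
    }
    pthread_mutex_unlock(&g->lock);
}

/* Worker that finishes the one pending call (e.g. the run call returning). */
static void *finish_call(void *arg)
{
    gate_leave((CallGate *)arg);
    return NULL;
}

/* Returns 1 if teardown waited for the pending call and refused new ones. */
int gate_demo(void)
{
    CallGate g;
    pthread_t tid;

    gate_init(&g);
    bool entered = gate_enter(&g); /* one call is now pending */
    assert(entered);
    int rc = pthread_create(&tid, NULL, finish_call, &g);
    assert(rc == 0);

    gate_close_and_wait(&g);       /* returns only when pending is zero */
    int pending = g.pending;
    bool late = gate_enter(&g);    /* refused: teardown has started */
    pthread_join(tid, NULL);

    pthread_cond_destroy(&g.idle);
    pthread_mutex_destroy(&g.lock);
    return (pending == 0 && !late) ? 1 : 0;
}
```

Compared with a single mutex around every call, a counter like this lets the blocking run call and a publish be in progress at the same time, while the cleanup still waits for both.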

  • Thanks a lot Kobi. This seems to have solved my issues.

    Regards,

    Zac