This thread has been locked.

handle mqtt reconnect

Other Parts Discussed in Thread: CC3200, CC3100

Hi everyone,

I'm having some difficulty handling MQTT disconnections when the device encounters them.

I want to build a system in which the CC3200 retries the connection whenever it loses its connection to the broker.

Any suggestions?

Thanks in advance.

  • Hi Aeromechs,

    What is the difficulty that you are encountering? If the connection is lost, you just need to re-connect.
    We provide the MQTT library for our CC3100/CC3200. You can simply call the sl_ExtLib_MqttClientConnect() API to connect to the broker again.
  • I've tried to handle the reconnect in the sl_MqttDisconnect callback by rebooting the MCU from it, but the system doesn't reboot.

    My configuration:

    connect_config usr_connect_config[] =
    {
        {
            {
                {
                    SL_MQTT_NETCONN_SEC | SL_MQTT_NETCONN_URL,
                    SERVER_ADDRESS,
                    PORT_NUMBER,
                    SL_SO_SEC_METHOD_SSLv3_TLSV1_2,
                    SL_SEC_MASK_TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,
                    1,
                    security_file_list
                },
                SERVER_MODE,
                true,
            },
            NULL,
            user,
            mqtt_username,
            mqtt_password,
            true,
            KEEP_ALIVE_TIMER,
            {Mqtt_Recv, sl_MqttEvt, sl_MqttDisconnect},
            TOPIC_COUNT,
            {topic_set, topic_get, topic_setup, topic_reboot},
            {QOS2, QOS2, QOS2},
            {WILL_TOPIC,WILL_MSG,WILL_QOS,WILL_RETAIN},
            false
        }
    };


    // Forward declaration: RebootMCU() is defined after its first use.
    static void RebootMCU(void);

    static void
    sl_MqttDisconnect(void *app_hndl)
    {
        RebootMCU();
    }


    static void RebootMCU(void)
    {
        //
        // Configure hibernate RTC wakeup
        //
        PRCMHibernateWakeupSourceEnable(PRCM_HIB_SLOW_CLK_CTR);

        //
        // Delay loop
        //
        MAP_UtilsDelay(8000000);

        //
        // Set wake up time
        //
        PRCMHibernateIntervalSet(330);

        //
        // Request hibernate
        //
        PRCMHibernateEnter();

        //
        // Control should never reach here
        //
        while(1)
        {
        }
    }


    What is wrong?
  • Hi Aeromechs,

    Is the new issue that the sl_MqttDisconnect() callback is not being triggered, or that the device doesn't enter hibernation?

  • Dear TI engineers,

    There is a big stability issue with the CC3200 when it runs for long periods of time.
    With ServicePack 1.0.0.10 (MQTT client based application),
    after running for many hours or days, the chip freezes and doesn't respond to commands properly, not even to an MCU reset (via hibernation).

    With ServicePack 1.0.1.6 / SDK 1.2, the MQTT client disconnects and it does fire the event. I try to reconnect, and sometimes it works,
    but after several attempts it just hangs and doesn't work anymore. Not even the MCU reset works!

    I believe SDK 1.2 is very buggy, and the above issues (stability with MQTT and the CC3200 OS/platform) are a big problem for
    us and other manufacturers who want to build IoT-capable devices based on this chip.

    In our case, the devices are in a "difficult to reach" area and we will be deploying about a thousand units, so manually resetting each device is not a solution for us.

    Please find a solution to this problem.

    F.K.
  • Hi Fadi,

    If I understand you correctly, the issue is observed in both SDK 1.1.0 and 1.2.0, but 1.2.0 has this error much quicker than 1.1.0?
  • Hi Victor,

    Thanks for getting back to us.

    While it is very difficult for me to provide that information precisely, I can comment on it to help steer the conversation.

    The firmware has some logic that can be described as:

    while( MqttConnect() )
    {
          Message("Error: Can't connect to broker, retry in 5 seconds\r\n");
          osi_Sleep(5000);
    }

    MqttConnect() sets up the parameters needed and then connects to the broker (port, authentication, topic, timeout, last will, etc., read from a saved config file).

    The last UART log message we get is the one above. The firmware never retries after 5 seconds or anything like that; it just dies/hangs.
    Web access to the device still works to a degree, but functionality is not 100% correct: a reset button on the page that should allow a remote
    reset doesn't execute successfully and the device never resets (hibernate cycle).
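
    The loop above retries forever with a fixed delay. One possible variation, sketched here under assumptions, caps the number of attempts, doubles the delay up to a ceiling, and reports failure so that the caller can escalate (for example to a reset). The names connect_with_retry, connect_fn and sleep_fn are hypothetical; in firmware the two callbacks would wrap MqttConnect() and osi_Sleep(). The demo_* functions are host-side stand-ins so that the sketch can be exercised on a PC.

```c
#include <stdint.h>

/* Hypothetical hooks: in firmware these would wrap MqttConnect() and osi_Sleep(). */
typedef int  (*connect_fn)(void *ctx);   /* returns 0 on success, non-zero on failure */
typedef void (*sleep_fn)(uint32_t ms);

typedef enum { RETRY_CONNECTED, RETRY_GAVE_UP } retry_result;

/* Try to connect up to max_attempts times, doubling the delay between
 * attempts until it reaches max_delay_ms. */
retry_result connect_with_retry(connect_fn try_connect, sleep_fn sleep_ms,
                                void *ctx, unsigned max_attempts,
                                uint32_t first_delay_ms, uint32_t max_delay_ms)
{
    uint32_t delay = first_delay_ms;
    unsigned attempt;

    for (attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_connect(ctx) == 0)
            return RETRY_CONNECTED;
        if (attempt + 1 < max_attempts) {      /* no sleep after the last try */
            sleep_ms(delay);
            delay = (delay > max_delay_ms / 2) ? max_delay_ms : delay * 2;
        }
    }
    return RETRY_GAVE_UP;   /* caller decides how to escalate */
}

/* Host-side stand-ins (not part of any SDK). */
int      g_failures_left;   /* how many connect attempts should still fail */
uint32_t g_slept_ms;        /* total time "slept" so far */

int demo_try_connect(void *ctx)
{
    (void)ctx;
    if (g_failures_left > 0) {
        g_failures_left--;
        return -1;
    }
    return 0;
}

void demo_sleep(uint32_t ms)
{
    g_slept_ms += ms;
}
```

    With a first delay of 5000 ms and a ceiling of 20000 ms, three failures followed by a success wait 5000 + 10000 + 20000 ms in total. A give-up result is the point at which the application could log the failure and trigger whatever reset path it trusts.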

    The above was logged via an ESP8266 module acting as a wireless UART logger since, as I mentioned, the devices are installed in places where it is not easy to stay next to them,
    so keeping my laptop attached to the UART is nearly impossible.

    This is from SDK 1.1.0.
    After migrating to 1.2, thinking it would be better, what I see now is more hangs in the chip, typically after many retries to connect to the broker. We also noticed that we now get stuck after connecting to and disconnecting from Wi-Fi. That is, if for any reason Wi-Fi disconnects, whether because of signal issues or because the device gets kicked off for technical/protocol reasons, the CC3200 just hangs after the reconnect attempts.
    No more logging and no more cycles executed.

    The only solution is to literally power-cycle the device. But even then there is no guarantee that connecting to Wi-Fi will be smooth; it might still fail to connect and hang again.

    I know that my issue might be broader than the original post, but I believe we partially share the same issue: MQTT disconnects and failing to reset properly afterwards.

    Thank you,

    Fadi

  • Hi Victor,

    The issue is that the sl_MqttDisconnect callback is not being triggered.
    I tried putting a UART_PRINT("I am HERE!"); in the callback, but the CC3200 doesn't print anything on the UART.
  • Hi Fadi, Aeromechs,

    It looks to me that there are multiple issues involved in the MQTT application:

    • The application stops execution after a certain period of time. -> Did you run the MQTT Client application directly from the SDK and see the issue? I tested the same SDK application (TI-RTOS version) on my side and it is still able to publish and receive events even after running for several hours.

    • The sl_MqttDisconnect() callback doesn't get triggered. -> I tried your code with the hibernation and it works for me. See the following screenshot.

  • Hi Fadi,

    Can you please provide more details on the issue?

    • What line of your code is getting stuck? Do you see the same issue when running the SDK MQTT client example directly?
    • What do you mean exactly by getting stuck? Are you able to perform any NWP operations?
    • Can you please provide some code snippet showing how you perform the reset?
    • Can you please provide some console print outs?

  • Hi Victor,

    Thanks for getting back to us.

    I can provide you with the information requested. Just need time to do the test for you and get things sorted.

    Should we use SDK 1.2 for the test, or do you recommend we stay with the previous version?
    I have not tested the unmodified MQTT example that came with the SDK because we had to make some changes
    for our own purposes (i.e. reading data from UART before pushing it back via MQTT, certain config parameters set via an HTML page, etc.).

    Let me study this on my end to see how I can present you with consistent information from a scientific perspective, in an effort
    to help you (TI) troubleshoot this with us and reach a proper solution.

    Please feel free to reach out to me via PM also.

    Fadi

  • Hi Fadi,

    Either SDK should work just fine.
  • Hi victor,
    I've connected my code, which is based on the mqtt_client example, to my Wi-Fi network.
    When I turn off the Wi-Fi, after a few seconds I get a " [WLAN ERROR] disconnected from ap" and a "[SOCK ERROR] -close scoket (81)" error, but the code never executes mqtt_disconnect.
    I want to handle the connection-loss error.
    When the Wi-Fi is lost, I want the CC3200 to reconnect to the Wi-Fi and then reconnect to the MQTT broker.
    How can I handle these events?
    Thanks for the help.
  • Hi Aeromechs,

    These are async events coming from the NWP. You can set up a flag to detect these events and perform the corresponding actions. Please see chapter 19 of the NWP Programmer's Guide.
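
    The flag approach can be sketched roughly as follows. This is a minimal illustration and not SDK code: note_wlan_disconnect(), note_socket_error() and next_recovery_action() are hypothetical names, and the assumption is that the first two are called from the application's asynchronous event handlers (in the SDK examples these are SimpleLinkWlanEventHandler and SimpleLinkSockEventHandler), while the application task polls the third and performs the actual reconnect outside the handler.

```c
#include <stdbool.h>

/* Flags set from the asynchronous event handlers and consumed by the
 * application task. A real multi-threaded build may need stronger
 * synchronisation than plain volatile flags. */
static volatile bool g_wlan_lost;     /* Wi-Fi link dropped              */
static volatile bool g_broker_lost;   /* broker socket closed or errored */

typedef enum { ACT_NONE, ACT_RECONNECT_WIFI, ACT_RECONNECT_BROKER } action_t;

/* Hypothetical hooks: call from the WLAN / socket event handlers. */
void note_wlan_disconnect(void) { g_wlan_lost = true; }
void note_socket_error(void)    { g_broker_lost = true; }

/* Called from the application loop. Returns one recovery step at a time:
 * Wi-Fi first, then the broker, because the broker socket cannot survive
 * a Wi-Fi drop. */
action_t next_recovery_action(void)
{
    if (g_wlan_lost) {
        g_wlan_lost = false;
        g_broker_lost = true;          /* broker must be redone afterwards */
        return ACT_RECONNECT_WIFI;
    }
    if (g_broker_lost) {
        g_broker_lost = false;
        return ACT_RECONNECT_BROKER;
    }
    return ACT_NONE;
}
```

    If a recovery step fails, the application can simply call the matching note_*() function again so that the same step is retried on the next pass of the loop.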

  • Hi victor,
    I've tried to handle this event, but I can't understand why the connection keeps failing when I try to reconnect to the broker after a connection failure.
  • Hi Aeromechs,

    Can you please elaborate on the issue a bit more? By "connection fail", do you mean the Wi-Fi connection failing or the MQTT connection failing?
    Are there console printouts that you can share?
  • Hi,
    By "connection fail" I mean the MQTT connection failing.

    I can't reconnect to MQTT either after an MQTT connection attempt fails or after an established MQTT connection hits a socket error (because the server died or the internet connection was lost).
    I want to handle these cases because the internet connection is not very stable.
    When I try to reconnect after a connection failure, the connection still fails, even if the MQTT broker is back up.

    Console output:

    [SOCK ERROR] - close socket (81) operation failed to transmit all queued packets

    [SOCK ERROR] -close scoket (81) operation connection less mode, rx packet fragmentation
    > 16K, packet is being released

    Sometimes:

    [SOCK ERROR] - close socket (82) operation failed to transmit all queued packets

    but the errors are otherwise the same.

    When the CC3200 is connected to the broker and I shut down the server, the errors are:

    [SOCK ERROR] -close scoket (81) operationconnection less mode, rx packet fragmentation
    > 16K, packet is being released[SOCK ERROR] -close socket (81) operationremote side down from secure to unsecure
    unknown sock async event: 2
    [SOCK ERROR] - close socket (81) operation failed to transmit all queued packets

  • I've finally arrived at a partial solution to the problem.

    I read the known issues of the SDK and modified the file as suggested there. This works when the CC3200 is connected to the broker and I then kill the broker on an AWS server, but it doesn't work when I cut the internet connection (not the Wi-Fi, only unplugging the ADSL cable).

    Now the mqtt_client method is called when there is a disconnection.

    Then I implemented a method that signals a sync object, which in turn unblocks a task.

    // These are the two tasks
    
    void retry_connection(void *pvParameters){
    
        int lRetVal = -1;
        while(1){
            // Block until a disconnection is signalled
            osi_SyncObjWait(&semafore_down_position, OSI_WAIT_FOREVER);
            disconnect_from_broker();
            lRetVal = Network_IF_ConnectAP(wifi_ssid, SecurityParams);
            if(lRetVal < 0)
            {
                MAP_UtilsDelay(80000000);
                //LOOP_FOREVER();
            }
        }
    }
    
    
    void retry_broker_connetion(connect_config *local_con_conf){
    
        int lRetVal;
    
        while(1){
            osi_Sleep(4000);
            // Block until a broker reconnect is requested
            osi_SyncObjWait(&semaphore_retry_broker_connection, OSI_WAIT_FOREVER);
            reconnect_to_broker();
        }
    }
    
    // End of the two tasks
    
    //modified Network_IF_ConnectAP as 
    long
    Network_IF_ConnectAP(char *pcSsid, SlSecParams_t SecurityParams)
    {
    #ifndef NOTERM  
        char acCmdStore[128];
        unsigned short usConnTimeout;
        unsigned char ucRecvdAPDetails;
    #endif
        long lRetVal;
        unsigned long ulIP = 0;
        unsigned long ulSubMask = 0;
        unsigned long ulDefGateway = 0;
        unsigned long ulDns = 0;
    
        //
        // Disconnect from the AP
        //
        Network_IF_DisconnectFromAP();
        
    
    
        //
        // This triggers the CC3200 to connect to specific AP
        //
        lRetVal = sl_WlanConnect((signed char *)pcSsid, strlen((const char *)pcSsid),
                            NULL, &SecurityParams, NULL);
        ASSERT_ON_ERROR(lRetVal);
    
    
        //
        // Wait for ~10 sec to check if connection to desired AP succeeds (should have been 15 instead of 10)
        //
        while(g_usConnectIndex < 10)
        {
    #ifndef SL_PLATFORM_MULTI_THREADED
            _SlNonOsMainLoopTask();
    #else
                  osi_Sleep(1);
    #endif
            MAP_UtilsDelay(8000000);
            if(IS_CONNECTED(g_ulStatus) && IS_IP_ACQUIRED(g_ulStatus))
            {
                break;
            }
            g_usConnectIndex++;
        }
    
    #ifndef NOTERM
        //
        // Check and loop until AP connection successful, else ask new AP SSID name
        //
        while(!(IS_CONNECTED(g_ulStatus)) || !(IS_IP_ACQUIRED(g_ulStatus)))
        {
    
            //
            // Disconnect the previous attempt
            //
            Network_IF_DisconnectFromAP();
            
            CLR_STATUS_BIT(g_ulStatus, STATUS_BIT_CONNECTION);
            CLR_STATUS_BIT(g_ulStatus, STATUS_BIT_IP_AQUIRED);
            UART_PRINT("Device could not connect to %s\n\r",pcSsid);
            if(wpa_errata==1){return -1;}
            /*do
            {
                ucRecvdAPDetails = 0;
    
                UART_PRINT("\n\r\n\rPlease enter the AP(open) SSID name # ");
    
                //
                // Get the AP name to connect over the UART
                //
                lRetVal = GetCmd(acCmdStore, sizeof(acCmdStore));
                if(lRetVal > 0)
                {
                    // remove start/end spaces if any
                    lRetVal = TrimSpace(acCmdStore);
    
                    //
                    // Parse the AP name
                    //
                    strncpy(pcSsid, acCmdStore, lRetVal);
                    if(pcSsid != NULL)
                    {
                        ucRecvdAPDetails = 1;
                        pcSsid[lRetVal] = '\0';
    
                    }
                }
            }while(ucRecvdAPDetails == 0);*/
    
            //
            // Reset Security Parameters to OPEN security type
            //
            //SecurityParams.Key = (signed char *)"";
            //SecurityParams.KeyLen = 0;
            //SecurityParams.Type = SL_SEC_TYPE_OPEN;
    
            UART_PRINT("\n\rTrying to connect to AP: %s ...\n\r",pcSsid);
    
            //
            // Get the current timer tick and setup the timeout accordingly
            //
            usConnTimeout = g_usConnectIndex + 15;
    
            //
            // This triggers the CC3200 to connect to specific AP
            //
            lRetVal = sl_WlanConnect((signed char *)pcSsid,
                                      strlen((const char *)pcSsid), NULL,
                                      &SecurityParams, NULL);
            //UART_PRINT("%d\n\r",lRetVal);
            ASSERT_ON_ERROR(lRetVal);
    
            //
            // Wait ~10 sec to check if connection to specified AP succeeds
            //
            while(!(IS_CONNECTED(g_ulStatus)) || !(IS_IP_ACQUIRED(g_ulStatus)))
            {
    #ifndef SL_PLATFORM_MULTI_THREADED
                _SlNonOsMainLoopTask();
    #else
                  osi_Sleep(1);
    #endif
                MAP_UtilsDelay(8000000);
                if(g_usConnectIndex >= usConnTimeout)
                {
                    break;
                }
                g_usConnectIndex++;
            }
    
        }
    #endif
        //
        // Put message on UART
        //
        UART_PRINT("\n\rDevice has connected to %s\n\r",pcSsid);
    
        //
        // Get IP address
        //
        lRetVal = Network_IF_IpConfigGet(&ulIP,&ulSubMask,&ulDefGateway,&ulDns);
        ASSERT_ON_ERROR(lRetVal);
    
        //
        // Send the information
        //
        UART_PRINT("Device IP Address is %d.%d.%d.%d \n\r\n\r",
                SL_IPV4_BYTE(ulIP, 3),SL_IPV4_BYTE(ulIP, 2),
                SL_IPV4_BYTE(ulIP, 1),SL_IPV4_BYTE(ulIP, 0));
    
        if(check_disconnection==1){
            reconnect_to_broker();
        }
        return 0;
    }
    
    
    void reconnect_to_broker(){
    
    	//RECONNECT_BROKER
    	tPushButtonMsg sMsg;
    
    	osi_messages var = RECONNECT_BROKER;
    
    	sMsg.received=var;
    
    	osi_MsgQWrite(&g_PBQueue,&sMsg,OSI_NO_WAIT);
    
    }
    
    // In the queue handler in the main function
    else if(RECONNECT_BROKER == RecvQue.received){
    
        if((sl_ExtLib_MqttClientConnect((void*)local_con_conf[iCount].clt_ctx,
                            local_con_conf[iCount].is_clean,
                            local_con_conf[iCount].keep_alive_time) & 0xFF) != 0)
        {
            UART_PRINT("\n\rBroker connect fail for conn no. %d \n\r",iCount+1);
        }
        else
        {
            UART_PRINT("\n\rSuccess: conn to Broker no. %d\n\r ", iCount+1);
            local_con_conf[iCount].is_connected = true;
            //iConnBroker++;
        }
    
        lRetVal=sl_ExtLib_MqttClientSub((void*)local_con_conf[iCount].clt_ctx,
                                               local_con_conf[iCount].topic,
                                               local_con_conf[iCount].qos, TOPIC_COUNT);
        if(lRetVal<0){
            // Subscription failed: drop the connection and ask the retry task to try again
            sl_ExtLib_MqttClientDisconnect((void*)local_con_conf[iCount].clt_ctx);
            osi_SyncObjSignal(&semaphore_retry_broker_connection);
        }
    }
    
    //end

  • Hi Aeromechs,

    Thank you for the feedback. We'll investigate this issue.
  • Hi victor,
    Can you explain how I can get the connection error code from the sl_ExtLib_MqttClientConnect function?

    A code snippet would be helpful, if possible.
    Thanks in advance.
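
    As a rough illustration of what such a snippet could look like: the handler code earlier in this thread tests the low byte of the sl_ExtLib_MqttClientConnect() return value with "& 0xFF", and a later post reports 0x000000FF on failure. Assuming (not confirmed here by TI) that this low byte carries the CONNACK return code defined by the MQTT specification, it could be decoded as below. connect_status() and connack_reason() are hypothetical helper names; the code values 0 to 5 are the ones defined by MQTT 3.1 and 3.1.1.

```c
#include <stdint.h>

/* Assumption: the low byte of the connect return value is the CONNACK
 * return code, as implied by the "& 0xFF" test used in this thread. */
uint8_t connect_status(long connect_ret)
{
    return (uint8_t)(connect_ret & 0xFF);
}

/* CONNACK return codes as defined by the MQTT 3.1 / 3.1.1 specification. */
const char *connack_reason(uint8_t code)
{
    switch (code) {
    case 0:  return "connection accepted";
    case 1:  return "unacceptable protocol version";
    case 2:  return "identifier rejected";
    case 3:  return "server unavailable";
    case 4:  return "bad user name or password";
    case 5:  return "not authorized";
    default: return "no valid CONNACK code (library or network error)";
    }
}
```

    Printing connack_reason(connect_status(ret)) next to the existing "Broker connect fail" message would at least separate a broker-side rejection (codes 1 to 5) from the case where no usable CONNACK arrived at all.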
  • I am having a similar problem. This happens on both SDK 1.1.0 and 1.2.0. My code is supposed to reconnect when the MQTT broker connection is lost. I simulated a connection loss by removing the network cable from the Wi-Fi router. It behaves as expected about 4 times, but then it consistently fails to reconnect to the MQTT broker on the fifth reconnection, and after that it cannot reconnect again. The MQTT_client example in the SDKs was not designed to reconnect upon disconnection. For a successful broker connection there seems to be a small delay before the sl_ExtLib_MqttClientConnect function call returns. For the fifth and subsequent reconnections, the sl_ExtLib_MqttClientConnect call returns immediately, as if the MQTT library does not even attempt to connect to the broker.

  • I have the same problem that Eruan Abdul Razak describes above, and despite some other TI forum posts, there is no final and reliable answer to this fundamental MQTT library problem.

    After each reconnection with sl_ExtLib_MqttClientConnect(), the "net number" is incremented. It seems as if the library runs into its limit of 4 brokers.
    I did not find a way to reset the net number to 17 because the MQTT library's programming and implementation style is not very transparent.
    In my application, a total reset is not possible because the user application must run continuously.


    Terminal output:

    Version: Client LIB 1.0.4, Common LIB 1.1.1.
    C: FH-B1 0x10 to net 17, Sent (43 Bytes) [@ 3]
    C: Rcvd msg Fix-Hdr (Byte1) 0x20 from net 17 [@ 3]
    C: Cleaning session for net 17
    C: Msg w/ ID 0x0000, processing status: Good
    ...

    After 4 reconnections:

    C: FH-B1 0x10 to net 20, Sent (43 Bytes) [@ 527]
    C: Rcvd msg Fix-Hdr (Byte1) 0x20 from net 20 [@ 527]
    C: Cleaning session for net 20
    C: Msg w/ ID 0x0000, processing status: Good
    ...

    After the next reconnection attempt, the library hangs. No debug message, no disconnection callback.

    Dear TI employees, could you please provide a final and reliable solution to this fundamental issue?
    (Answers like "You can simply call the sl_ExtLib_MqttClientConnect() API to connect to the broker again" in an earlier post do not help.)

    Thanks and regards
    Klaus

  • Hello Klaus,

    Can you please open a new thread as this one is closed?

    You can add a link to this one as a reference.

    Regards,

    Shlomi

  • Problem solved. Once you detect that the broker has disconnected, call the sl_ExtLib_MqttClientCtxDelete function before reconnecting. I suspect that some allocated memory must be released before memory is allocated for the new connection.

  • Hi Eruan
    Thanks for the suggestion. I have already tried calling sl_ExtLib_MqttClientCtxDelete(). After that call, reconnection was not possible at all. What did you do just before calling sl_ExtLib_MqttClientCtxDelete()?

    Please refer to the new thread for future discussion because this one was declared as closed by Shlomi:
    e2e.ti.com/.../559706

    Thank you
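  • Here is a pseudocode snippet:

    while(1)
    {
        // Reconnect only when the broker link is down but Wi-Fi is up and an IP address is assigned
        if((broker_config.is_connected==false) && wifi_is_connected && IP_ACQUIRED)
        {
            // Release the old client context before setting up the new connection
            sl_ExtLib_MqttClientCtxDelete(broker_config.clt_ctx);

            mqttbrokerconnect();
        }

        Task_sleep(5000);
    }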

  • Thanks for your assistance. Your proposal is one of the approaches I had already tried, unfortunately without success.
    After executing
    sl_ExtLib_MqttClientCtxDelete()
    --> sl_ExtLib_MqttClientConnect() returns error 0x000000FF

    Without executing sl_ExtLib_MqttClientCtxDelete()
    --> sl_ExtLib_MqttClientConnect() connects to the broker, but the "net number" is incremented and the library hangs after 3 retries, as described in my earlier post.
  • Once a call to sl_ExtLib_MqttClientCtxDelete is made, you need to call sl_ExtLib_MqttClientCtxCreate() before sl_ExtLib_MqttClientConnect().
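
    The order of calls described above (delete the old context, create a new one, then connect) can be sketched as follows. This is an illustration under assumptions, not SDK code: mqtt_ops, mqtt_link and mqtt_reconnect() are hypothetical names, and the three function pointers stand in for thin wrappers around sl_ExtLib_MqttClientCtxDelete(), sl_ExtLib_MqttClientCtxCreate() (plus whatever credential or will setup the application performs after creating a context) and sl_ExtLib_MqttClientConnect(). The demo_* functions only record the call order so that the sequence can be checked on a PC.

```c
#include <stddef.h>

/* Hypothetical wrappers around the three library calls. */
typedef struct {
    int   (*ctx_delete)(void *ctx);   /* returns 0 on success    */
    void *(*ctx_create)(void *cfg);   /* returns NULL on failure */
    int   (*connect)(void *ctx);      /* returns 0 on success    */
} mqtt_ops;

typedef struct {
    void *ctx;   /* current client context, NULL if none exists  */
    void *cfg;   /* opaque configuration passed to ctx_create    */
} mqtt_link;

/* Delete the stale context (if any), create a fresh one, then connect.
 * Returns 0 on success, a negative step code on failure. */
int mqtt_reconnect(const mqtt_ops *ops, mqtt_link *link)
{
    if (link->ctx != NULL) {
        if (ops->ctx_delete(link->ctx) != 0)
            return -1;
        link->ctx = NULL;              /* never reuse a deleted context */
    }
    link->ctx = ops->ctx_create(link->cfg);
    if (link->ctx == NULL)
        return -2;
    return (ops->connect(link->ctx) == 0) ? 0 : -3;
}

/* Host-side stand-ins that only record the order of calls. */
static char g_calls[8];   /* zero-initialised, so always NUL-terminated */
static int  g_ncalls;

static void record(char c) { if (g_ncalls < 7) g_calls[g_ncalls++] = c; }

int   demo_delete(void *ctx)  { (void)ctx; record('D'); return 0; }
void *demo_create(void *cfg)  { record('C'); return cfg; }
int   demo_connect(void *ctx) { (void)ctx; record('N'); return 0; }
```

    On the first call there is no context yet, so only create and connect run; on every later call the old context is deleted first. Connecting with a deleted context matches the situation in which the 0x000000FF error was reported earlier in the thread.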