
CC3220SF-LAUNCHXL: Subscription to multiple mqtt topics fails when Blocking Send is disabled

Part Number: CC3220SF-LAUNCHXL

In the MQTT Client example (SDK 5.10), when Blocking Send is disabled and we try to subscribe to multiple topics (more than 2 topics), the subscription fails with error code -5 (MQTT_PACKET_ERR_PKT_AVL).

I have tried various QoS levels, with and without retained messages, and the result is the same.

Regards,

Zac

  • Can you send the reference code you are using?

  • I took the MQTT client example in the SDK and enabled a secure MQTT connection with Blocking Send disabled. There were 4 topics, each with retained messages present.

  • Looking at the code (specifically, line 267 in mqtt_if.c), it seems that the example, as-is, currently supports only the blocking mode.

    Immediately after the MQTT connection is established, the application tries to send all the topic subscriptions.

    The error you see is due to limited resources in the module: sending a second subscribe command before the first one has completed (been SUB-ACKed) returns that error.

    You can re-call MQTT_IF_Subscribe if that specific error is returned (or wait until you get the SUB-ACK before sending the next subscription).

  • Thanks Kobi. It would be really helpful if these interfaces were more complete and tested before being released with the SDK. We were facing issues with blocking send, where MQTT Connect would get stuck when available memory was limited (7 tasks/threads running), and hence tested the non-blocking send option. However, it looks like we may need to analyse this further to ensure that the system is stable.

  • To add, we also found that some messages are missed while publishing multiple messages in non-blocking send mode. I guess this makes non-blocking send mode completely unusable. Adding checks for the ACK does not seem to have any advantage over blocking mode.

  • Did you check the return values of the missed publish calls?

    Non-blocking mode won't make sense in a sequential implementation (where it has no advantage over the blocking interface). It can be useful in an event-driven (state-machine) implementation.

    Please provide details on your original (blocking) issue. The library was tested in the context of the existing app; having more threads can strain a resource-limited system. We can review the MQTT internal memory allocation and may be able to help optimize it for your implementation.

    Br,

    Kobi

  • In blocking mode with a secure server and 4 subscribed topics, MQTT_IF_Connect sometimes gets stuck and never returns. This happens in 2 or 3 tests out of 10, and the same code works fine when a few other threads are disabled, which makes me believe it is memory related. We are unable to debug down to the exact step, as the issue is random and mostly does not occur during a debug session. Another issue: when calling MQTT_IF_Disconnect, we often see the error "[SOCK EVENT] - Unexpected Event [20x]". The same works fine in the example code and only occurs when more threads are running; not sure if this is also memory related.

  • What are the other threads doing during the MQTT connection?

    Is MQTT_IF_Subscribe called before or after MQTT_IF_Connect?

    Can you provide a sniffer log (showing the MQTT traffic)? The module may be stuck waiting for the CONN-ACK or SUB-ACK, but that shouldn't be impacted by the other threads.

    Please check the heap status and the threads' stack peaks (both can be viewed through the RTOS Object Viewer: in CCS, open Tools->ROV->HeapMem/Task during a debug session).

    Br,

    Kobi

    • Other threads include a WiFi provisioning task, NTP thread, OTA task, Scheduler (app-specific), GPIO read/write handler and a watchdog.
    • MQTT_IF_Subscribe is currently called before MQTT_IF_Connect. We also tried subscribing after connection and it works in our initial testing; however, this may need additional handling during the reconnect process. Also note that the code works fine when I disable a few other threads, such as the Scheduler and OTA.
    • Can you help me get the sniffer logs from CCS? (I am using macOS Big Sur)
    • Thread stack peaks are within range and the heap looks fine in ROV (snapshot attached)

  • I'm not sure what the Scheduler thread is, but does the OTA thread actually execute during the failed MQTT operation (or is it just pending/sleeping)?

    Maybe you can try raising the priority of the MQTTContextThread and the MQTTAppThread.

    It seems you could reduce the stack size of several threads (e.g. watchdog and GPIO) and gain more heap space (although I understand the heap is currently not the problem).

    CCS can't be used to sniff MQTT traffic. It should be done with an external sniffer (e.g. Wireshark).

    Upon reconnect, the previous subscriptions get (re)subscribed automatically (following the re-connection), so you are right that an issue with the auto-subscription can be problematic. The only fix may be to unsubscribe in response to the disconnection and then subscribe again after the (re)connection is completed.


  • OTA thread is scheduled to run at certain intervals and should not interfere with MQTT.

    I have raised the priority of both MQTTContextThread and MQTTAppThread, but the issue still occurs. 

    The watchdog and GPIO stack peaks increase during a few operations, so there is no scope for reducing their stack sizes.

    Subscribing to topics after a connection does work; however, unsubscribing from the topics during a disconnection fails with the error 'not connected to broker' and the module resets.

    Please note that a few topics have multi-level wildcards and most topics have retained messages. This results in receiving multiple messages concurrently upon subscription, which might be causing the connect method to get stuck. Can we disable auto-subscription? Or can you suggest a workaround?

  • I think I have found the issue. The topics have wildcards in them, resulting in more than 10 messages delivered simultaneously, but the maximum size of the "msgQueue" in mqtt_if.c is only 10, which seems to block the RECV event initially. Increasing its size to 20 has worked so far. I shall run further tests to confirm.

  • Thanks for this info.


    Can you send an example of a wildcard topic (one that translates to more than one packet) as you are using it?

    A standard MQTT wildcard should not produce more events than non-wildcard subscriptions (maybe there were pending events on the server).

    I'm still not sure how the event queue size can cause MQTT_IF_Connect to get stuck. The queue is for the application thread to handle the stack events, and I don't remember the application being blocked on any specific event.

    Can you tell where exactly MQTT_IF_Connect gets stuck? E.g. by adding prints, or by checking the state of the thread in ROV.

    I don't think this is related to the wildcards and would like to reproduce it and double check the issue.

    Thanks,

    Kobi

  • An example of an MQTT wildcard is /house/room/#, which matches /house/room/appliance1, /house/room/appliance2, /house/room/appliance3, etc.

    Each of these topics has a retained message, so upon subscribing to /house/room/#, the broker pushes all of them; in our case, 12 or more different messages were pushed. Please note that these are just retained messages, not pending events (we use non-persistent sessions).

    Our testing so far shows that the issue was the queue size: mq_send blocks when the queue is full.

    Regards,

    Zac

  • OK, that makes more sense. But even when mq_send gets stuck, it should be released when the app thread calls mq_receive, so the complete deadlock is still not clear to me.

    I guess mqtt_if could have used the O_NONBLOCK flag when creating the message queue (but then you might have missed some messages, so your fix is better).