Connected LaunchPad IoT Quickstart hangs after disconnect/reconnect to network

Other Parts Discussed in Thread: EK-TM4C1294XL, LM3S8962

Has anyone else noticed that the Connected LaunchPad IoT Quickstart example hangs after disconnecting and reconnecting to the network?

Using the original IoT Quickstart example application on the Tiva C Series Connected LaunchPad (EK-TM4C1294XL, with an XM4C1294NCPDT123CA32VW MCU), do the following:

  1. Plug in the Ethernet cable to the board
  2. Power on the Connected LaunchPad
  3. Verify connection and communications with the T.I. Exosite website
  4. Unplug the Ethernet cable from the board
  5. On the Connected LaunchPad virtual COM port, issue the "stats" command
  6. Observe the stats are continually updated about once a second
  7. Plug the Ethernet cable back into the board
  8. Observe the stats output stops after 4 to 5 seconds and the board is unresponsive to any commands

This is very repeatable. I have recompiled the qs_iot application using the Yagarto GCC compiler and get the same results. Unfortunately, I don't have a debugger to determine where the program is hanging!

Note that this does not appear to be a problem with any of the other network-related examples I tried:

  • enet_io
  • enet_lwip
  • enet_uip
  • enet_weather

  • Hello Dan

    The debugger is essential to understand where the program is stuck.

    Regards

    Amit

  • As a followup:
    Using spare GPIOs, instrumented code, and a logic analyzer, I have uncovered the following:

    There appear to be three different types of (apparently related) hangs:

    1. Infinite loop when processing an lwip TCP timer in "tcp_fasttmr()"
    2. Infinite loop when receiving Ethernet TCP input in "tcp_input()"
    3. Hard Fault exception when allocating memory for a TCP tcp_pcb structure in "tcp_alloc()"

    The first two hangs are the most common, while the Hard Fault exception hang occurs very infrequently.

    ---
    In the first hang, the while loop in "tcp_fasttmr()" (line 1055) never exits because the test on the next line (line 1056) finds 'pcb->last_timer' already equal to 'tcp_timer_ctr', so the loop never advances to the next pcb. This appears to be caused by corruption of the 'tcp_active_pcbs' linked list, in which the current 'pcb', its next, and its next's next are all the same value.

    The first hang's function call trace prior to the hang is as follows:
      - the "lwIPEthernetIntHandler()" interrupt service routine
         - calls "lwIPServiceTimers()"
            - which calls "lwIPHostTimerHandler()"
              - which calls "tcp_tmr()"
                 - which calls "tcp_fasttmr()"
                    - in: TivaWare\third_party\lwip-1.4.1\src\core\tcp.c
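    To show why a self-linked pcb makes this loop spin, here is a minimal, hypothetical reduction of the loop shape described above (not the actual lwip source): the cursor only moves to 'pcb->next' when the pcb has not yet been stamped with the current timer count. The 'struct pcb' type and 'fasttmr_walk()' helper are illustrative stand-ins, and an iteration cap is added so the demo terminates instead of hanging.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical stand-in for struct tcp_pcb: only the fields the loop uses. */
    struct pcb {
        struct pcb *next;
        unsigned char last_timer;
    };

    /* Walks the list like the fast-timer loop described above: the cursor
     * advances only if the pcb was not yet visited in this timer tick.
     * Returns the iteration count, or -1 if the cap is hit (on the target
     * there is no cap, so that case is an infinite loop). */
    static int fasttmr_walk(struct pcb *head, unsigned char ctr, int cap)
    {
        struct pcb *p = head;
        int iterations = 0;

        while (p != NULL) {
            if (++iterations > cap) {
                return -1;
            }
            if (p->last_timer != ctr) {
                p->last_timer = ctr;   /* stamp as visited in this tick */
                p = p->next;           /* the only place the cursor moves */
            }
        }
        return iterations;
    }

    int main(void)
    {
        unsigned char tcp_timer_ctr = 1;

        struct pcb b = { NULL, 0 };
        struct pcb a = { &b, 0 };      /* healthy list: a -> b -> NULL */

        struct pcb c = { NULL, 0 };
        c.next = &c;                   /* corrupted list: c -> c -> c ... */

        int healthy = fasttmr_walk(&a, tcp_timer_ctr, 1000);
        int looped = fasttmr_walk(&c, tcp_timer_ctr, 1000);
        printf("healthy: %d iterations, self-linked: %d\n", healthy, looped);

        assert(healthy == 2);
        assert(looped == -1);
        return 0;
    }
    ```

    On the second visit to 'c', 'last_timer' already equals the counter, so the cursor stays on 'c' forever.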
    ---
    Likewise, in the second hang, the for loop in "tcp_input()" (line 170) never exits because 'pcb' never becomes NULL. This appears to be caused by the same corruption of the 'tcp_active_pcbs' linked list.

    The second hang's function call trace prior to the hang is as follows:
      - the "lwIPEthernetIntHandler()" interrupt service routine
         - calls "lwIPServiceTimers()"
            - which calls "tivaif_receive()"
              - which calls "ethernet_input()"
                 - which calls "ip_input()"
                    - which calls "tcp_input()"
                       - in: TivaWare\third_party\lwip-1.4.1\src\core\tcp_in.c
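    Since both of these hangs come down to a list whose links loop back on themselves, one cheap diagnostic for instrumented builds is to check the list for a cycle before walking it. The sketch below uses Floyd's tortoise-and-hare algorithm on a hypothetical minimal node type; calling something like 'list_has_cycle(tcp_active_pcbs)' from a debug hook on the target is an assumption, not an existing lwip facility.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct pcb {               /* hypothetical stand-in for struct tcp_pcb */
        struct pcb *next;
    };

    /* Floyd's tortoise-and-hare: returns true if following ->next from head
     * ever revisits a node, i.e. a loop such as
     * for (pcb = head; pcb != NULL; pcb = pcb->next) would never end. */
    static bool list_has_cycle(const struct pcb *head)
    {
        const struct pcb *slow = head;
        const struct pcb *fast = head;

        while (fast != NULL && fast->next != NULL) {
            slow = slow->next;          /* one step  */
            fast = fast->next->next;    /* two steps */
            if (slow == fast) {
                return true;
            }
        }
        return false;                   /* fast hit NULL: the list terminates */
    }

    int main(void)
    {
        struct pcb c = { NULL };
        struct pcb b = { &c };
        struct pcb a = { &b };
        assert(!list_has_cycle(&a));    /* a -> b -> c -> NULL */

        c.next = &c;                    /* corrupt: c now points to itself */
        assert(list_has_cycle(&a));

        puts("cycle checks passed");
        return 0;
    }
    ```

    The check uses O(1) extra memory and at most a few passes over the list, so it is cheap enough to run from an ISR in a debug build.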
    ---
    The third hang is an infinite loop in the "FaultISR()" handler after a Hard Fault exception in the "tcp_alloc()" function (starting at line 1278 in tcp.c). As with the other hangs, I believe this is also caused by some kind of memory corruption.

    The third hang's function call trace prior to the hang is as follows:
      - the "SyncWithExosite()" routine
         - calls "Exosite_Write()"
            - which calls "connect_to_exosite()"
              - which calls "ethernet_input()"
                 - which calls "exoHAL_SocketOpenTCP()"
                    - which calls "tcp_new()"
                       - which calls "tcp_alloc()" which has a Hard Fault
                          - in: TivaWare\third_party\lwip-1.4.1\src\core\tcp.c
    ---

    In trying to find where the "tcp_active_pcbs" linked list gets corrupted, I found that it seems to happen in "tcp_connect()" in tcp.c, at the TCP_REG_ACTIVE() macro call (line 772), but I am not sure how it is happening.
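    One way a register-at-head macro can produce exactly the "current, next and next's next are all the same" pattern is if a pcb that is already on the list gets registered again, for example because the pool handed out a pcb that was still in use. The sketch below mirrors the push-to-head shape of lwip's TCP_REG() ('npcb->next = *pcbs; *pcbs = npcb;') using a hypothetical 'REG' macro and stand-in type; it illustrates the mechanism and is not lwip's actual macro. lwip's tcp_impl.h also appears to offer a TCP_DEBUG_PCB_LISTS option that asserts against re-registering a pcb; it is worth confirming in your copy before relying on it.

    ```c
    #include <assert.h>
    #include <stdio.h>

    struct pcb {                 /* hypothetical stand-in for struct tcp_pcb */
        struct pcb *next;
        int id;
    };

    /* Push-to-head registration, same shape as lwip's TCP_REG().
     * There is no check that npcb is already on the list. */
    #define REG(pcbs, npcb)              \
        do {                             \
            (npcb)->next = *(pcbs);      \
            *(pcbs) = (npcb);            \
        } while (0)

    int main(void)
    {
        struct pcb *active = NULL;
        struct pcb p1 = { NULL, 1 };
        struct pcb p2 = { NULL, 2 };

        REG(&active, &p1);           /* active: p1 -> NULL       */
        REG(&active, &p2);           /* active: p2 -> p1 -> NULL */

        /* The same pcb registered again while it is still at the head,
         * e.g. because it was allocated a second time. */
        REG(&active, &p2);           /* p2->next = p2; p1 is now unreachable */

        assert(active == &p2);
        assert(active->next == &p2);
        assert(active->next->next == &p2);
        printf("head=%d next=%d next->next=%d\n",
               active->id, active->next->id, active->next->next->id);
        return 0;
    }
    ```

    If the duplicated pcb sits deeper in the list instead of at the head, re-registering it creates a longer cycle rather than a self-link; either way, loops that walk the list until NULL never terminate.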

    ---
    In summary, this problem seems to happen when the application is sending network traffic while the network stack is coming back up, after the network cable was unplugged during active communications and then plugged back in. I believe the other T.I. TivaWare example applications I tried on the Connected LaunchPad board did not show this problem because they generally were not actively sending network traffic when the network cable was disconnected and reconnected.

    Also, as an experiment, I ported the Connected LaunchPad qs-iot example application to the Stellaris LM3S8962 Ethernet+CAN Evaluation board using StellarisWare and the older lwip version 1.3.2. The problem does *not* occur with that setup. Additionally, I ported the FreeRTOS-based senshub_iot TivaWare example to run the qs-iot example code, and saw the same problem after disconnecting and reconnecting the network cable. This leads me to believe the problem is related to either the Tiva C Series Ethernet driver or the newer lwip 1.4.1 network stack.

    In conclusion, I consider this a "show-stopping" problem in that one cannot tolerate a system hang because the network goes down and comes back up during active network communications!


  • As a further followup, I believe I have found the root cause of this problem!

    While investigating the qs-iot firmware further, I enabled debugging by globally defining DEBUG=1 in the Makefile, and enabled some lwip debugging by setting MEMP_SANITY_CHECK to 1 in lwipopts.h. With this debugging enabled, as soon as I disconnected the Ethernet cable, I got the following message:

    "ASSERT FAIL at line 464 of TivaWare/third_party/lwip-1.4.1/src/core/memp.c: memp sanity"

    While investigating the memory-freeing call that triggers the memp sanity failure when the Ethernet cable is disconnected, I discovered that a MEMP_TCP_PCB 'pcb' is being freed twice, corrupting the memp_tab[MEMP_TCP_PCB] pcb linked list! (See the related comment starting on line 204 of tcp.c.)
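    To illustrate why a double free corrupts the pool, the sketch below models a fixed-size pool with a singly linked LIFO free list, similar in spirit to lwip's memp_tab[] lists. The names 'pool_alloc()' and 'pool_free()' are hypothetical, not lwip's memp API. Freeing the same element twice in a row makes it point to itself, so every following allocation returns that same element, which is one way a single pcb could end up handed out (and registered) twice.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    #define POOL_SIZE 4

    /* Hypothetical pool element; while free, 'next' links the free list. */
    struct elem {
        struct elem *next;
        int id;
    };

    static struct elem pool[POOL_SIZE];
    static struct elem *free_list;       /* head of the LIFO free list */

    static void pool_init(void)
    {
        free_list = NULL;
        for (int i = POOL_SIZE - 1; i >= 0; i--) {
            pool[i].id = i;
            pool[i].next = free_list;
            free_list = &pool[i];        /* list: 0 -> 1 -> 2 -> 3 -> NULL */
        }
    }

    /* Pop the head of the free list (NULL if the pool is exhausted). */
    static struct elem *pool_alloc(void)
    {
        struct elem *e = free_list;
        if (e != NULL) {
            free_list = e->next;
        }
        return e;
    }

    /* Push e back onto the free list; there is no double-free check. */
    static void pool_free(struct elem *e)
    {
        e->next = free_list;
        free_list = e;
    }

    int main(void)
    {
        pool_init();

        struct elem *pcb = pool_alloc();
        pool_free(pcb);          /* first free (e.g. the abort path)      */
        pool_free(pcb);          /* second free: pcb->next now == pcb,
                                    and elements 1..3 are unreachable     */

        struct elem *a = pool_alloc();
        struct elem *b = pool_alloc();
        assert(a == b);          /* the same element is handed out twice */
        printf("a=%d b=%d same=%s\n", a->id, b->id, a == b ? "yes" : "no");
        return 0;
    }
    ```

    When, as in this sketch, the free-list link shares storage with the start of the element, writes to a duplicated element that is in use can also overwrite the link, so a later pop may follow a garbage pointer. That is one plausible route to the Hard Fault in tcp_alloc(), but it is an inference, not something the traces above confirm.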

    Here's what I believe is happening:

    In lwIPServiceTimers() in lwiplib.c, when the Ethernet link goes down, two separate timed handlers are called that each end up calling memp_free(). In one of these handlers (lwIPLinkDetect(), which is called every 10 ms), a tcp_abort() is done, and in the other handler (lwIPHostTimerHandler(), which is called every 100 ms), a tcp_close() is done. The end result is a double free of the same item, which corrupts the memp_tab[MEMP_TCP_PCB] pcb linked list and eventually hangs the firmware when the Ethernet cable is plugged back in and the corrupted linked list is used to allocate pcb items.

    I am not sure what the solution to this problem is, other than ensuring that the same allocated item is not freed twice when the Ethernet cable is disconnected!
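    A common way to guarantee a single release is to keep exactly one owning pointer per connection and clear it at the moment the pcb is handed back, so whichever handler runs second sees NULL and does nothing. The sketch below is hypothetical: the 'conn' struct and 'conn_release()' helper are not part of lwip or TivaWare, and a counter stands in for the real tcp_abort()/tcp_close() calls so it runs on a host. With real lwip, tcp_abort() frees the pcb itself and then invokes the connection's error callback, so the application's copy of the pointer should generally be cleared there as well.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical stand-ins: 'struct resource' plays the role of a tcp_pcb,
     * and release_count records how often it was handed back. */
    struct resource { int release_count; };

    struct conn {
        struct resource *pcb;    /* single owning pointer; NULL once released */
    };

    /* Releases the connection's resource at most once. Returns 1 if this
     * call did the release, 0 if another path had already released it. */
    static int conn_release(struct conn *c)
    {
        struct resource *r = c->pcb;
        if (r == NULL) {
            return 0;            /* already released by the other path */
        }
        c->pcb = NULL;           /* clear first so no later path reuses it */
        r->release_count++;      /* stands in for tcp_abort() or tcp_close() */
        return 1;
    }

    int main(void)
    {
        struct resource pcb = { 0 };
        struct conn c = { &pcb };

        conn_release(&c);        /* link-down path (10 ms link detect)    */
        conn_release(&c);        /* later host-timer path (100 ms timer)  */

        assert(pcb.release_count == 1);
        printf("released %d time(s)\n", pcb.release_count);
        return 0;
    }
    ```

    Per the analysis above, both handlers run from lwIPServiceTimers(), so they should not preempt each other; if the release paths could run in different interrupt contexts, the check-and-clear would need to be made atomic (for example, with interrupts masked).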

    I think a T.I. TivaWare developer needs to investigate this and provide a fix in the next release of TivaWare.

  • Does a TI spokesperson have a timeline, or even a progress update, on a fix for the IoT LaunchPad demonstration? Using these boards with a cloud system (e.g. Exosite) would be a fantastic starting point for lots of developers, but the system has to offer long-term stability to be of any use for remote data monitoring.

    This thread also covers the issue and has not received any support from TI:

    http://e2e.ti.com/support/microcontrollers/tiva_arm/f/908/t/358842.aspx?pi307171=3