This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

Connected LaunchPad Quick-start IoT Application dies after a day or two

We have seen this issue on every Connected LaunchPad (CLP) we have, both with the default application shipped on the board and after compiling the 'qs_iot' project and programming it onto the CLP.

Essentially, the device stops working after about a day. I don't have an exact time, although it should not be hard to let it run a few times to see whether it is always the same.

I have not yet hooked it up to the debugger and let it run, which would be our next troubleshooting step. I think others are seeing this too, and I wanted to know whether TI engineers are aware of the issue.

These forum posts may be related:

http://e2e.ti.com/support/microcontrollers/tiva_arm/f/908/p/332304/1158414.aspx

http://e2e.ti.com/support/microcontrollers/tiva_arm/f/908/p/332931/1164075.aspx#1164075

107 Replies

  • Just tried an experiment.

    Setup: LaunchPad to Linksys router to Asus router to dcm476 cable modem to internet.

    Unplug the LaunchPad for a second and plug it back in. It is then not possible to ping it, even after a few minutes, and it does not show up on the network. Only pressing the reset button brings it back. Exosite does not notice this for quite a while.

    I can disconnect the cable modem (dropping the internet connection) and reconnect it with no issues.

    I can drop the Linksys-to-Asus link with no issues.

    If I disconnect the LaunchPad from the Linksys, or power down the Linksys for a fraction of a second, the LaunchPad becomes un-pingable and I have to press the reset button.

    Same results with a fixed IP or with DHCP assigning the LaunchPad's address.

    Because of the reporting delay, Exosite is oblivious to these changes when they are done quickly enough.

    So this looks like an issue with the LaunchPad software, or the board itself, on its direct connection to the local network?

  • In reply to Richard Normandin:

    @Mike A

    Thanks for the info. Much appreciated.

    I repeated my quick experiment with the same results.

    However, I noticed that I have to time the internet outage (at the modem-to-internet link) to fall between temperature uploads to the website. If the internet is down when an upload happens, the unit appears to freeze.

    On the other hand, if there is even a brief interruption in the connection between the LaunchPad and the router, it goes down instantly.

    Perhaps I am looking at two different issues: the first one as you described, and another related to the LaunchPad's local network link. You might want to see if you can reproduce the latter.

    Cheers.

    Richard

  • In reply to Richard Normandin:

    I've fixed the issue where our server responds with a 401 (not authorized) and the device stops reporting. It isn't really a bug; it is how the code was written, I assume to make sure the developer notices this state. For a monitoring application, though, this behavior is not useful to users or developers: it is preferable that the device keeps running, first checking whether it should activate again and then continuing with its current CIK if no new CIK is available. I'm running tests on my changes, and we are making some other tweaks to that code, including a routine that quickly blinks one of the application LEDs when there is a connection problem, and printing the IP address at boot. I've had a unit run for a few days now without stopping.
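    The policy described above (never halt on a 401; try to activate, adopt a new CIK if one comes back, otherwise keep reporting with the current one) can be sketched as below. This is not the code in the attached qs_iot.c; next_step(), apply_activation_result() and the 40-hex-character CIK check are illustrative assumptions (the CIK length matches the one printed later in this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The CIK printed later in this thread is 40 hex characters (assumption: all are). */
#define CIK_LEN 40

/* What the reporting loop should do next. There is deliberately no
   "stop" state: the loop keeps running whatever the server answers. */
typedef enum { NEXT_REPORT, NEXT_ACTIVATE } next_step_t;

/* Decide the next step from the HTTP status of the last report
   (-1 = no response). Only an actual 401 triggers an activation attempt. */
next_step_t next_step(int http_status)
{
    return (http_status == 401) ? NEXT_ACTIVATE : NEXT_REPORT;
}

/* After a (hypothetical) activation attempt: adopt `new_cik` if it looks like
   a well-formed CIK, otherwise keep the current one. Returns true if the CIK
   changed. Either way, the caller goes back to reporting. */
bool apply_activation_result(char cik[CIK_LEN + 1], const char *new_cik)
{
    assert(cik != NULL);
    if (new_cik == NULL || strlen(new_cik) != CIK_LEN ||
        strspn(new_cik, "0123456789abcdefABCDEF") != CIK_LEN)
        return false;              /* no usable new CIK: keep running with the old one */
    memcpy(cik, new_cik, CIK_LEN + 1);
    return true;
}
```

    The point of the design is that a 401 only changes which request is sent next; it never ends the loop.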

    There does seem to be one other problem, though. If you disconnect the Ethernet cable (to simulate a network problem), the app simply stops running instead of continuing with error responses. When you plug the network back in, it does not recover from this state. I haven't figured out why yet and am digging into it.

    There are also some other items we are trying to clean up that will hopefully make this more robust. I'm attaching the qs_iot.c file here with my updates. (Sorry, I'm trying to work out whether I can put this in a public code repository that would show the changes, but I'm checking the TI code license first.)

    Disclaimer: there may still be issues with the attached qs_iot.c file, but I think it is good to use as is, and it does not depend on any other file changes. 6428.qs_iot.c

  • In reply to Mike Aanenson:

    Fantastic Mike. Will try it immediately.

  • In reply to Mike Aanenson:

    Glad to see that the issues have been replicated, as we are having the same issues with several LaunchPads.

    Thank you for working on this. Having a rock-solid sample that handles network issues with graceful degradation and recovery is extremely valuable.

  • In reply to Mike Aanenson:

    Great work, Mike. I was getting a bit worried at first, as I thought my application layer had a bug and was causing the dropouts on my trial system with Exosite. Amazing how time flies when you're hunting for a bug in your own code that doesn't actually exist.

    Cheers,

    Mike

  • In reply to Mike Aanenson:

    I haven't been able to actually observe a 401 coming back from our platform yet. I had my board running through Wireshark over the weekend and it kept working for over 63 hours. However, after restarting the board this morning, I was able to reproduce the issue some people are seeing (not sure it's actually been mentioned in this thread, but I know there are support tickets about it), where the device comes online for only a few seconds after a reset. The board appears to send a request and get a reply (Wireshark shows an RTT of about 220 ms), but for some reason the board never sees the response (or perhaps somehow interprets it as a 401). The board then decides the CIK must be bad and reports that it is trying to activate, but I never see an activate call go out.
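    If the board really is treating "no usable response" the same as a 401, one safeguard is to parse the status line explicitly and re-activate only on an actual 401, retrying otherwise. A minimal sketch, assuming the response is available as a raw byte buffer; http_status_code() and classify_response() are hypothetical names, not functions from qs_iot.c or the Exosite library.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* How a finished (or timed-out) request should be treated by the caller. */
typedef enum { RESP_NONE, RESP_OK, RESP_UNAUTHORIZED, RESP_OTHER } resp_kind_t;

/* Return the 3-digit code from an HTTP/1.x status line such as
   "HTTP/1.1 401 Unauthorized\r\n", or -1 if `buf` does not hold one
   (nothing received, partial data, or garbage). */
int http_status_code(const char *buf, size_t len)
{
    static const char prefix[] = "HTTP/1.";
    const size_t plen = sizeof(prefix) - 1;          /* 7 characters */

    assert(buf != NULL || len == 0);
    /* Shortest usable line is "HTTP/1.x NNN", i.e. plen + 5 characters. */
    if (len < plen + 5 || memcmp(buf, prefix, plen) != 0)
        return -1;
    if (!isdigit((unsigned char)buf[plen]) || buf[plen + 1] != ' ')
        return -1;

    int code = 0;
    for (size_t i = plen + 2; i < plen + 5; i++) {
        if (!isdigit((unsigned char)buf[i]))
            return -1;
        code = code * 10 + (buf[i] - '0');
    }
    return code;
}

/* Only a real 401 should lead to re-activation; RESP_NONE means a
   network problem and should simply be retried. */
resp_kind_t classify_response(const char *buf, size_t len)
{
    int code = http_status_code(buf, len);
    if (code < 0)
        return RESP_NONE;
    if (code == 401)
        return RESP_UNAUTHORIZED;
    if (code >= 200 && code < 300)
        return RESP_OK;
    return RESP_OTHER;
}
```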

    I haven't been able to catch this in the debugger and my board seems to be working again so I'm not sure if I'll be able to.

    If anyone else seeing these issues can run their board through Wireshark to see what's happening, please let me know what you find.

    (Note: In case it wasn't obvious I also work for Exosite.)

  • In reply to Patrick Barrett:

    Today I finally captured, on the server side, a packet showing what the Exosite server actually receives:

    POST /onep:v1/stack/alias HTTP/1.1
    Host: texasinstruments.m2.exosite.com
    POST /onep:v1/stack/alias HTTP/1.1
    Host: texasinstrumenUser-Agent:ti-ek-tm4c1294xl
    Content-Type: application/x-www-form-urlencoded; charset=utf-8
    Content-Length: 51

    It should look like this:

    POST /onep:v1/stack/alias HTTP/1.1
    Host: texasinstruments.m2.exosite.com
    X-Exosite-CIK: XXXXXXXXXXXXXXXXXXXXXXXXX
    User-Agent:ti-ek-tm4c1294xl
    Content-Type: application/x-www-form-urlencoded; charset=utf-8
    Content-Length: 53

    I've added a User-Agent line, in addition to using a vendor-specific Host address, to help me catch these packets. As you can see, the X-Exosite-CIK header is completely overwritten and the User-Agent header is malformed. I'm writing some code to check what the device code thinks it is sending, to see whether the problem lies in building these strings before they are passed to the socket or happens after they are handed to the socket.
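    One way to check the request on the device side before it reaches the socket is a structural sanity check that would reject exactly the kind of corruption captured above (a second request line where a header should be, or a missing X-Exosite-CIK header). A sketch, with a deliberately simplified header-name rule; request_looks_sane() and is_header_line() are hypothetical helpers, not part of the Exosite library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified check that [p, p+n) looks like "Name: value", where Name is
   letters, digits and '-' (a subset of the characters HTTP allows). */
static bool is_header_line(const char *p, size_t n)
{
    size_t i = 0;
    while (i < n && (isalnum((unsigned char)p[i]) || p[i] == '-'))
        i++;
    return i > 0 && i < n && p[i] == ':';
}

/* Sanity-check a request buffer before handing it to the socket layer:
   - every line ends in CRLF and the first one ends in " HTTP/1.1";
   - every following line up to the blank line is a "Name: value" header;
   - the header named `required` (e.g. "X-Exosite-CIK") appears exactly once.
   Names are compared case-sensitively, fine when the device writes them. */
bool request_looks_sane(const char *req, size_t len, const char *required)
{
    assert(req != NULL && required != NULL);
    static const char ver[] = " HTTP/1.1";
    const size_t vlen = sizeof(ver) - 1;
    const size_t rlen = strlen(required);
    const char *end = req + len;
    const char *line = req;
    bool first = true;
    int seen_required = 0;

    while (line < end) {
        const char *eol = memchr(line, '\n', (size_t)(end - line));
        if (eol == NULL || eol == line || eol[-1] != '\r')
            return false;                       /* missing or bare LF */
        size_t n = (size_t)(eol - line) - 1;    /* line length without CRLF */

        if (first) {
            if (n < vlen || memcmp(line + n - vlen, ver, vlen) != 0)
                return false;
            first = false;
        } else if (n == 0) {
            return seen_required == 1;          /* blank line ends the headers */
        } else {
            if (!is_header_line(line, n))
                return false;                   /* e.g. a repeated request line */
            if (n > rlen && line[rlen] == ':' && memcmp(line, required, rlen) == 0)
                seen_required++;
        }
        line = eol + 1;
    }
    return false;                               /* headers never terminated */
}
```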

  • In reply to Mike Aanenson:

    A quick update: I printed the buffer characters passed to the exoHAL_SocketSend() function. Here is what was sent, and, from the server log, what the server received:

    Device:

    POST /onep:v1/stack/alias HTTP/1.1
    Host: texasinstruments.m2.exosite.com
    X-Exosite-CIK: 20963248ad4730a749d90ce15dd108aa6a74c5ac
    Content-Type: application/x-www-form-urlencoded; charset=utf-8
    Content-Length: 50

    usrsw1=536931384&usrsw2=0&ontime=1926&thermof=74.5

    Server Log:

    POST /onep:v1/stack/alias  HTTP/1.1
    Host: texasinstruments.m2.exosite.com
    POST /onep:v1/stack/alias  HTTP/1.1
    Content-Type: application/x-www-form-urlencoded; charset=utf-8
    Content-Length: 50
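    Comparing the bytes handed to exoHAL_SocketSend() with the server log is what places the corruption after the send call. When making such a dump, it can help to render CR, LF and other control bytes visibly, so a missing or extra "\r\n" stands out. A sketch of a hypothetical helper for that; on the LaunchPad the resulting string could be printed with something like TivaWare's UARTprintf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy `len` bytes of `in` into `out` (size `outsz`), rendering CR, LF, TAB,
   backslash and other non-printable bytes as escapes ("\r", "\n", "\t",
   "\\", "\xNN"). Stops early rather than splitting an escape, always
   NUL-terminates, and returns the number of characters written (without NUL). */
size_t escape_for_log(const char *in, size_t len, char *out, size_t outsz)
{
    assert(in != NULL || len == 0);
    assert(out != NULL && outsz > 0);
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        char tmp[5];
        int n;
        if (c == '\r')       n = snprintf(tmp, sizeof tmp, "\\r");
        else if (c == '\n')  n = snprintf(tmp, sizeof tmp, "\\n");
        else if (c == '\t')  n = snprintf(tmp, sizeof tmp, "\\t");
        else if (c == '\\')  n = snprintf(tmp, sizeof tmp, "\\\\");
        else if (c < 0x20 || c >= 0x7f) n = snprintf(tmp, sizeof tmp, "\\x%02X", c);
        else { tmp[0] = (char)c; tmp[1] = '\0'; n = 1; }
        if (o + (size_t)n >= outsz)
            break;                 /* no room for this escape plus the NUL */
        memcpy(out + o, tmp, (size_t)n);
        o += (size_t)n;
    }
    out[o] = '\0';
    return o;
}
```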

  • In reply to Mike Aanenson:

    For the sake of completeness, I've found the same thing going out over the Ethernet port in my Wireshark log:

    GET /onep:v1/stack/alias?ledd1&ledd2&thresh  HTTP/1.1
    Host: texasinstruments.m2.exosite.com
    GET /onep:v1/stack/alias?ledd1&ledd2&thresh  HTTP/1.1
    HoUser-Agent:ti-ek-tm4c1294xl
    Exo-Log: fullrequest
    Accept: application/x-www-form-urlencoded; charset=utf-8
    
    
