
TI-RTOS-MCU: Issue with TCP Client reconnect using TM4C129E Launchpad and TI RTOS NDK

Part Number: TI-RTOS-MCU

Hi, 

Here is the issue I am having. I wrote a TCP client program using the TM4C129E LaunchPad and the NDK, and I tested it with two programs, Socket Test and Packet Sender; it works.

When the server (Socket Test or Packet Sender) disconnects first, my TCP client is able to call close() successfully, and whenever I reconnect to the server afterwards I have no problems. However, when the TCP client (LaunchPad) calls close() first, close() still returns without an error, and the server appears to disconnect too (at least based on the GUI, though I have no idea what goes on under the hood). But when I then try to reconnect, it fails for some period of time before eventually succeeding; I have to wait for some time.

During the time that reconnection fails, bind() returns an error. Using fdError(), I get error 48, which I think is EADDRINUSE. So I called setsockopt() to enable SO_REUSEPORT, which eliminated the error in bind(). However, now I get an error when calling connect(), and fdError() still reports 48. How is this possible? I have closed the client socket successfully, and the server seems to have closed its socket as well. I did some digging and found out about TIME_WAIT. How do you change TIME_WAIT here? I'm not sure if this is the solution.

Since I do not have complete control of the server, I decided to do another test. I got another LaunchPad and set it up as a TCP server, so now I have complete control of the close() call on the server side. When I call close() on the server side, close() returns successfully on the server, the client also closes successfully, and reconnection is not a problem. However, when I initiate the disconnect on the client side (calling close() there first), close() succeeds on the client and even on the server; it looks like the socket is closed on both sides. Yet I am not able to reconnect the client right away and need to wait for some time before I can connect. I am experiencing the same issue as explained above: I keep getting error 48 for a certain amount of time, from bind(), or from connect() if I have set SO_REUSEPORT.

Any solution to this matter? 

Thanks.

AJ

  • An engineer will look at this on Tues.

  • Hi AJ,

    What you've encountered is a well-known issue and is part of the TCP protocol itself. The behavior of close() varies depending on which side of the connection calls close() first, and this also dictates who must enter the TIME_WAIT state.

    There are two types of close() calls, "active" and "passive":

    1. Active close:
      1. The side that calls close() on a connected socket first performs an active close.
      2. This socket will enter the TCP TIME_WAIT state; this is part of the TCP protocol.
    2. Passive close:
      1. This is when close() is called on a socket because the other side of the connection called close() first.
      2. For example, the other side (the active closer) calls close(), which causes this side's socket APIs to fail (e.g. recv() returns a failure), and close() is then called as a result; this is the passive close.
      3. This socket gets cleaned up relatively quickly, as it *does not* need to wait in TIME_WAIT.

    The general consensus is that applications should be designed so that the client calls close() first (active close), thereby allowing the server to close quickly (passive close). This prevents a build-up of sockets stuck in TIME_WAIT on the server.

    Note that it is possible to force a socket to skip TIME_WAIT by configuring it as a linger socket and setting its linger time to zero. However, this is seen as a hack and is not recommended.

    Please refer to the following thread for further details:

    https://stackoverflow.com/questions/3757289/when-is-tcp-option-so-linger-0-required

    and especially the answer here.

    Steve

    (edit: fixed broken link)

  • Hi Steve,

    Thanks for the response. I clicked the "here" link, but the web page says it no longer exists. I also clicked the other link and read about SO_LINGER. Just to make sure I understood you correctly: in my application, the TM4C is the client and will always initiate the close; that's how we've designed our system. In this case, then, my device is the active closer, which is why there is that delay. And the only thing I can do to reduce that delay is select a small value for SO_LINGER? Did I get that right? Also, just curious, what is the default value for this delay? Is this value standard, or specific to the TI NDK? I couldn't open the other link, so I'm not sure if there is another solution.

    Thanks.

    AJ

    Another thing: the reason I asked if there is a standard default value is that I noticed the Socket Test program, when set up as a client and active closer, is able to reconnect without waiting. I am assuming whoever designed this program just set SO_LINGER to zero or a small value? Is there a recommended value for SO_LINGER in most applications?

    Thanks again

    AJ

  • AJ,

    My bad, I accidentally copied the wrong link in my other post. I fixed it there but here it is also: https://stackoverflow.com/a/13088864

    This too: http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html

    AJ_ee said:
    And the only thing I could do to reduce that delay is select a small value for SO_LINGER? Did I get that right?

    Yes, your options are to either wait (recommended) or configure the socket to be a linger socket with a timeout of zero (not recommended, do at your own risk).

    Like I mentioned before, the linger + zero timeout isn't recommended. The reason is that it's a bit of a hack; when you do this, your TCP connection will not close normally. Hopefully the shared links will help explain that better.

    AJ_ee said:
    Also, just curious, what is the default value for this delay? Is this value standard or just here in TI NDK?

    There is no default setting - sockets are not linger sockets by default. You would need to configure your socket to be a linger socket by calling setsockopt() with the SO_LINGER option.

    In that call, you must provide a struct linger in which you specify the linger time.

    So, you might say the answer is whatever you initialize it to be :)

    AJ_ee said:
    Whenever I reconnect, it fails for some period of time before eventually succeeding. During the time that reconnection fails, bind() returns an error; using fdError(), I get error 48, which I think is EADDRINUSE. So I called setsockopt() to enable SO_REUSEPORT, which eliminated the error in bind(). However, now I get an error when calling connect().

    Just wanted to clarify/confirm one detail in the above (from your original post): are you trying to re-use the same socket (the one you closed previously) to re-connect to the server app?

    Steve

  • Hi Steve,

    I don't need to reuse the socket; it doesn't matter, as long as I can get the client to connect again. Every time I try to connect, I call socket(), so I assume I get assigned a new one. What I did was use SO_REUSEPORT, which prevented the error from occurring in bind(), but then the error occurred in connect().

    Just to clarify: when you say linger socket, you mean SO_LINGER is set to 0, right? Or does it mean that as long as I call setsockopt() with SO_LINGER, even with a non-zero value, it's also a linger socket? In other words, if I configure a socket with a non-zero SO_LINGER, is that still okay, or is using SO_LINGER not recommended at all, regardless of whether it is zero or non-zero? For example, a 10-second wait is what I planned to use. Also, when I asked about the default, what I meant was: if I do not call setsockopt() with SO_LINGER, how long is TIME_WAIT? Is this value standard for any TCP stack?

    Thanks.

    Regards,

    AJ

  • Hi AJ,

    AJ_ee said:
    Just to clarify,when you say linger socket, SO_LINGER is set to 0 right? Or does it mean that as long I call setsockopt with SO_LINGER, even if it is set to a non-zero value, that's also a linger socket?

    No, and yes. Let me define it like this:

    • A Lingering Socket is a socket that is currently blocked, awaiting the Linger Time to expire, before proceeding on to actually closing and having its resources freed.
    • A Linger Socket is a socket that will linger for the Linger Time once that socket is closed

    So, the time value has nothing to do with being a linger socket. When you set the Linger Time to 0, it is still a linger socket; it will just linger for a time of 0 (i.e. it tries to linger but immediately times out, so it doesn't really linger at all).

    AJ_ee said:
    If I configure socket with an SO_LINGER that's non-zero, that's still okay right

    Yes this is OK. You're "supposed to" configure a non-zero time.

    The Linger Time of 0 is the one that's questionable and being debated in those linked threads.

    AJ_ee said:
    how long is TIME_WAIT? Is this value standard for any TCP stack?

    The standard is "2 x MSL (max segment lifetime)"

    But, it seems that MSL varies depending on the implementation.

    In NDK, it's set as follows:

    // in tcp.h
    /* Timer Constants (in 1/10 second ticks) */
    #define TCPTV_MSL               300     /* max seg lifetime (30 sec) */
    
    ...
    
    static void TcpEnterTimeWait( TCPPROT *pt )
    {
        ...
        pt->TicksWait2   = 2 * TCPTV_MSL;

    So it should be 1 minute (2 × 300 ticks × 0.1 s/tick = 60 s).

    Steve

  • Hi Steve,

    Thank you very much. You've answered my questions well. You may close this thread.

    Regards,

    Albert