TMDSICE3359: Timer in NDK stack

Part Number: TMDSICE3359

Hello,

I have a small problem with the various timers in the NDK. To explain it, I've attached the following trace:

It can be seen that the first SYN requires an ARP request to fill the ARP cache. But as this is a robustness test, the host at 192.168.1.2 does not respond to this ARP request. Meanwhile, more SYNs arrive. This fills the backlog queue of my listening socket. That in itself is no problem, but it takes 11 minutes to recover from this situation, which is too long for the test. In this test I had a listening socket with a backlog of 3.

My question: How can I reduce this timeout? I think two timers have to be changed (ARP and TCP).

Thank you in advance and best regards

Stefan

backlog_killer.zip

  • Hi Stefan,

    Can you tell us which Processor SDK RTOS for AM335x you are using? (6.1.0.8?)

    Ming

  • Hi Ming,

    we are using:

    • PRU-ICSS-EthernetIP_Adapter_01.00.03.04
    • processor_sdk_rtos_am335x_5_01_00_11
    • bios_6_73_00_12
    • ndk_3_40_01_01

    Thank you and best regards,

    Stefan

  • Hi Ming,

    can you tell me which timers have to be decreased for this ARP/TCP combination?

    Thank you in advance and best regards,

    Stefan

  • Hi Stefan,

    Let me follow up with NDK team and then get back to you...

    Regards,
    Garrett

  • Stefan,

    The easiest way to update the ARP and TCP connection timeouts is probably to change them in the .cfg file:

    You may also have figured out that you can use APIs like:

    /* Connect timeout in seconds */
    Int timeout = 1;
    CfgAddEntry(hCfg, CFGTAG_IP, CFGITEM_IP_SOCKTIMECONNECT,
                CFG_ADDMODE_UNIQUE, sizeof(timeout), (UINT8 *)&timeout, 0);

    Regards,
    Garrett

  • Hi Garrett,

    I haven't tried it yet but I think the numbers don't add up properly:

    • I have 3 pending connections in my backlog
    • It takes about 11 minutes for the NDK to clean it up

    • routeDownTime is 20 seconds in my case
    • CFGITEM_IP_SOCKTIMECONNECT is set to 24 seconds
    • I don't see any combination of these numbers that adds up to 11 minutes

    I think something is missing, but I will give it a try...

    Best regards

    Stefan

  • Hi Garrett,

    so I've set

    • routeDownTime and
    • CFGITEM_IP_SOCKTIMECONNECT

    to one second and it didn't help.

    Could you ask somebody from the NDK team who knows which timers have to be changed in the described scenario, where two different protocols (ARP and TCP) interact with each other in an unusual way?

    Thank you and best regards,

    Stefan

  • Hi Stefan,

    >>This leads to a filled backlog queue of my listening socket.

    Do you mean the 'backlog' is your listening socket’s total connection count? Do you get any error from the accept() function?

    It seems the accept() creates a new socket when a SYN packet arrives, then the NDK tries to send a SYN/ACK but is blocked on ARP as no ARP responses occur. There might be a timeout for the spawned socket in the TCP algorithm about never receiving a final ACK.

    Regards,

    Garrett

  • Hi Garrett,

    somehow my response from last week didn't arrive in this thread:

    > Do you mean the 'backlog' is your listening socket’s total connection count?

    When I start listening on a socket, I define the size of the backlog of this listening socket. It's kind of a queue where pending connections wait for an accept().

    > Do you get any error from accept() function?

    The problem is that accept() never returns a connection, as the ARP is not successful.

    As you mentioned, there should be one or two timeouts that can be modified. But I tried a bunch of timeouts without success. Could you tell me the right one?

    Thank you and best regards,

    Stefan

  • Hi Stefan,

    Thanks for answering those questions for us. 

    It looks like what's happening is that the TCP algorithm is attempting a SYN-ACK response to each SYN packet. These never go out on the wire, though, because ARP is never resolved. The TCP algorithm will retry this 3 times before giving up, and that can take some time. So the timer involved is the retransmit timer.

    The TCP retransmit timer is found in ti/ndk/stack/tcp/tcptime.c

    /* tcp_backoff - Exponential backoff applied to Timer Ticks */
    static uint32_t tcp_backoff[TCP_MAXBACKOFF+1] =
           { 1, 2, 4, 8, 16, 32, 64, 64, 64, 128, 128, 128, 128 };

    The timeouts between retransmissions increase exponentially. The RFCs regarding TCP dictate that the timer works like this, and this timer is used throughout the TCP code. So you can change it, but I strongly suggest you don't.

    Maybe there is another way to accomplish what you want, though. When you say "it takes 11 Minutes to recover from this situation", what do you mean by recover? The TCP code I mentioned above would eventually recover from the situation by closing any sockets spawned by SYN packets. If that behavior is okay with you, maybe you could forcibly close the listening socket after some timeout of your choosing?

    Regards,

    Dalton
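
As a rough illustration of why the backoff table quoted above leads to long waits, the multipliers can simply be summed. This is a minimal sketch, not NDK code: TCP_MAXBACKOFF is taken to be 12 only because the quoted table has 13 entries, backoff_sum is a hypothetical helper, and the result is in units of the base retransmit interval, which this thread does not state. Any clamping the stack may apply to individual timeouts is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: inferred from the 13 initializers in the quoted table. */
#define TCP_MAXBACKOFF 12

/* Multipliers copied from the tcp_backoff[] table quoted in the thread. */
static const uint32_t tcp_backoff[TCP_MAXBACKOFF + 1] =
    { 1, 2, 4, 8, 16, 32, 64, 64, 64, 128, 128, 128, 128 };

/*
 * Hypothetical helper: sum of the first `count` multipliers.
 * `count` is clamped to the table size.
 */
static uint32_t backoff_sum(size_t count)
{
    uint32_t sum = 0;
    size_t n = count > TCP_MAXBACKOFF + 1 ? TCP_MAXBACKOFF + 1 : count;

    for (size_t i = 0; i < n; i++)
        sum += tcp_backoff[i];
    return sum;
}
```

Summing all 13 entries gives 767 base intervals, while the first three entries contribute only 7, so almost all of the waiting comes from the later retries.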

  • Hi Dalton,

    thank you for this. It seems to explain why the other timers don't affect the recovery time I'm observing.

    Regarding the recovery time of 11 minutes:

    The test spams thousands of SYNs on the listening socket of my device. The listening socket has a backlog of three and queues the first SYNs. It takes about 11 minutes to recover from this, that is, until the device accepts another SYN from a different device.

    After seeing the tcp_backoff[] array, I expected TCP_MAXBACKOFF+1 ARPs per SYN. But somehow I'm seeing many more. And the ARPs follow a strange time pattern:

    0s, 1s, 2s, 3s, 57.9s, 59.9s, 60.9s, 61.9s, 62.9s, 117.9s, 120s, 121s, 123s, 177.9s, 180s, 181s,...

    So I'm seeing jumps of 1s, 2s and 54.9s in the ARP repetitions.

    I thought the backlog mechanism was working, as I never saw more than 3 SYNs stacked in the queue. But there are many more ARPs...

    Do you have any explanation for this behaviour?

    And more importantly: Do you have any idea how to decrease this recovery time?

    My idea at the moment: special treatment in place of TcpSetPersist for connections that are not yet established:

    /* Parenthesize the mask: == binds tighter than & in C */
    if ((flags & (TCP_SYN | TCP_ACK)) == (TCP_SYN | TCP_ACK))
    {
        /* something different */
    }
    else
    {
        TcpSetPersist( pt );
    }

    Thank you in advance and best regards

    Stefan

  • Hey Stefan,

    ARP also has an exponential retransmit timeout, but that doesn't really explain the large jumps in time you are seeing. I'm going to have to try to reproduce this, but it will take me a bit of time to get my network setup right for testing this, as we are working from home these days. I'll send an update tomorrow.

    If you can send me the test script you are running against the NDK, that would be very helpful.

    As for your TCP suggestion, I'd generally be cautious about manipulating any of that code. It is meant to match the TCP RFCs, but I'll look into this while reproducing and see what can be done. Again, it would be ideal if you could handle this with application-level code.

    Regards,

    Dalton

  • Hi Dalton,

    sorry for the late response.

    I can't send you the test script, as it's a very big licensed test software package. But a tool like ostinato lets you generate this kind of packet pretty fast.

    I don't see any way to handle this kind of behaviour at the application level. Maybe I'm missing something, but I don't think I get any callbacks if the ARPs or SYN-ACKs time out.

    Regards,

    Stefan

  • Hi Stefan,

    No problem on the test software. I can roughly reproduce this by generating packets with scapy. Still working through some ideas to reduce the timer here at the moment. 

    You are right that you don't get any callbacks for timeouts. I'm not sure how much you can design your server around this test, but a timeout on accept() would allow you to close the listening socket if you never receive anything from accept() (i.e. the 3-way handshake fails, as it does for you) within a given time.

    Regards,

    Dalton

  • Hi Dalton,

    since you can reproduce it: did you also see these strange jumps in the retransmission times?

    The approach of closing the socket could work for a client. But I don't know if it's a good idea for a server. There's also a monitor in the test system that checks the TCP ports all the time...

    I'm hoping you can develop a nice solution.

    Thank you and regards,
    Stefan

  • Hi Stefan,

    I am not seeing the odd retransmission pattern you are, but that might be because my current test script constructs packets in such a way that the NDK notices duplicate packets. So if I send 10 SYN packets, 9 are detected as duplicates. However, I still see it taking roughly 11 minutes for a socket created by a SYN packet to finally time out and be dropped.

    If your test is sending SYN packets in such a way that the NDK can't detect them as duplicates (we do this with the seq number), you might be seeing 4 separate retransmissions, as each SYN packet creates a new socket until you reach your listening socket's maximum number of connections. That would also explain why your listening socket backlog is getting full.

    In your test, can you examine the NDK_tcps struct (it could just be called tcps, depending on your NDK version) through the CCS Expressions window after the 11-minute period of cleaning up sockets and post a picture of that struct? That will let me know if the NDK is detecting duplicate packets for you.

    Regards,

    Dalton

  • Hi Dalton,

    at the moment my DUT is set up for another set of tests, which will take one or two days. If I remember correctly, there were duplicate SYNs, because the TCP port field is only 16 bits wide: after a few seconds all ports have been used, and the test system reuses them as the storm continues. The sequence number is always zero in the test system's SYNs...

    Since you see this 11-minute timeout too:

    • any idea how to decrease it?
    • what do you think about the approach of a special case for SYN-ACKs with a faster timeout?

    Thank you and best regards,

    Stefan

  • Hi Stefan,

    No problem on getting the test setup ready. I think I have a solution that will work for you, though. I think a reduced SYN-RECEIVED timer will help you out here. RFC 4987 actually mentions this specifically as a mitigation technique for SYN flood attacks. Note that it also states that this technique is not perfect, but I think it will be sufficient for your test.

    To actually implement this, you will need to modify TcpTimeoutRexmt() instead of TcpSetPersist(). It's up to you how much you want to reduce the timeout (or eliminate it completely) for connections in the SYN-RECEIVED state, but here's an example of a change I made:

    void TcpTimeoutRexmt( TCPPROT *pt )
    {
        uint32_t Ticks;
    
        DbgPrintf(DBG_INFO,"TcpTimeoutRexmt: Retransmit Timeout");
    
        /* Message has not been acked within retransmit interval. */
        /* Back off to a longer retransmit interval and retransmit one segment. */
        if( ++pt->t_rtxindex > TCP_MAXBACKOFF ||
           (pt->t_state == TSTATE_SYNRCVD && pt->t_rtxindex > 2 ))
        {
            /* Already at max - drop it like a bad habit */
            pt->t_rtxindex = TCP_MAXBACKOFF;
    
            /* Bump the drop stats */
            NDK_tcps.TimeoutDrops++;
    
            /* Set socket error and drop connection */
            TcpDrop( pt, NDK_ETIMEDOUT );
            return;
        }

    I tested this with my own SYN flood test, and it cleaned up my SYN sockets in under 3 minutes!

    Regards,

    Dalton
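
To see what the modified condition shown above changes, the drop decision can be replayed in isolation. This is a sketch under stated assumptions, not NDK code: should_drop and expiries_until_drop are hypothetical names, TCP_MAXBACKOFF is taken to be 12 because the quoted backoff table has 13 entries, and the retransmit index is assumed to start at 0 and never be reset while the peer stays silent.

```c
#include <stdbool.h>

/* Assumption: inferred from the 13-entry tcp_backoff[] table. */
#define TCP_MAXBACKOFF   12
/* Cutoff used for SYN-RECEIVED connections in the modified condition. */
#define SYNRCVD_MAXINDEX 2

/* Mirrors the modified test; `index` is the value after the increment. */
static bool should_drop(unsigned index, bool syn_rcvd)
{
    return index > TCP_MAXBACKOFF ||
           (syn_rcvd && index > SYNRCVD_MAXINDEX);
}

/*
 * Count how many times the retransmit timer expires before the
 * connection is dropped, starting from index 0.
 */
static unsigned expiries_until_drop(bool syn_rcvd)
{
    unsigned index = 0;
    unsigned expiries = 0;

    do {
        expiries++;
    } while (!should_drop(++index, syn_rcvd));
    return expiries;
}
```

Under these assumptions a connection stuck in SYN-RECEIVED is dropped on the third timer expiry instead of the thirteenth, so it only waits through the smallest entries of the backoff table.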

  • Hi Dalton,

    thank you, that looks great. As I'm working from home, I will try it tomorrow and give you feedback.

    Best regards,

    Stefan

  • Hi Dalton,

    it worked like a charm. Thank you again for this nice solution.

    Best regards,
    Stefan