NDK: High task stack usage with TCP loopback

Hello,

In my application, I had stack overflow problems caused by excessive recursive calls within the NDK.

The problem occurs when using TCP loopback (127.0.0.1) to transfer data between two tasks / sockets. If the transmit buffers of both sockets hold multiple outgoing packets, TcpOutput(), IPTxPacket(), IPRxPacket() and TcpInput() are called recursively for each packet. The stack usage was more than 16 kB in my application.
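
To make the pattern clearer, here is a simplified, hypothetical model of the call chain (this is not NDK source code, just an illustration of the recursion): over loopback, a transmitted segment is handed to the peer by a direct function call, so the peer's TCP input runs on the sender's stack and may transmit its own queued segment before returning.

    /* Hypothetical model of the loopback recursion -- not NDK code.
     * Each side has a few segments queued; "sending" one delivers it
     * to the peer synchronously, and the peer replies before the
     * original call returns, so the stack grows with every segment. */
    #include <stdio.h>

    static int queuedA = 4, queuedB = 4;    /* segments waiting on each socket */

    static void tcpInputModel(int toA);     /* forward declaration */

    static void tcpOutputModel(int fromA)
    {
        int *queued = fromA ? &queuedA : &queuedB;
        if (*queued == 0)
            return;
        (*queued)--;
        printf("output from %c\n", fromA ? 'A' : 'B');
        tcpInputModel(!fromA);              /* the loopback "wire" is a function call */
    }

    static void tcpInputModel(int toA)
    {
        /* The ACK / window update lets the receiver send its own
         * queued segment immediately -- this is the recursion. */
        tcpOutputModel(toA);
    }

    int main(void)
    {
        tcpOutputModel(1);                  /* one send drains both queues recursively */
        return 0;
    }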

I added entry/exit logging to the NDK. This is an example of the log when this problem occurs:

sock/sock.c line 1461: SockRecv(): Total=16548
sock/sock.c line 1549: SockRecv(): SBRead()
sock/sock.c line 1596: SockRecv(): SockPrRecv()
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177663525, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177647705, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177664985, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177649165, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177666445, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177650625, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177667905, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177652085, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177669365, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177653545, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177670825, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177655005, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177672285, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177656465, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177673745, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177657925, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177675205, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177659385, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177676665, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177660845, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177678125, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177662305, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177679585, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177663765, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177681045, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177665225, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177682505, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177666685, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177683965, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177668145, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 703: TcpOutput(): Seq=2177685425, len=1460
tcp/tcpin.c line 171: TcpInput(): Entry
tcp/tcpout.c line 240: TcpOutput(): Entry
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
tcp/tcpin.c line 1060: TcpInput(): Exit
tcp/tcpout.c line 497: TcpOutput(): Exit
sock/sock.c line 1613: SockRecv(): return

Is this a known problem?

Thanks,
Ralf

  • After further testing I had a situation where even a stack size of 32 kB wasn't enough.

    I'm not sure if there is an easy way to break this recursive behaviour. One solution could be to put the outgoing loopback packets into a special PBM queue and then signal the network scheduler, which would process the packets later.
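
    A very rough sketch of that idea (the queue variable and the scheduler wake-up helper are made up for illustration; only PBMQ_enq()/PBMQ_deq() and IPRxPacket() are meant to be existing NDK calls, and their exact signatures may differ between NDK versions):

        /* Sketch only -- not a tested patch. Instead of handing a loopback
         * packet straight to IPRxPacket() from the transmit path (which
         * recurses), park it on a queue and let the scheduler drain it. */
        static PBMQ loopbackQ;                    /* hypothetical deferred queue */

        void LoopbackDeferTx(PBM_Pkt *pPkt)
        {
            PBMQ_enq(&loopbackQ, pPkt);           /* queue instead of recursing */
            SignalLoopbackWork();                 /* hypothetical: wake the scheduler */
        }

        /* Called later from the network scheduler, outside the send path */
        void LoopbackServiceQueue(void)
        {
            PBM_Pkt *pPkt;
            while ((pPkt = PBMQ_deq(&loopbackQ)) != 0)
                IPRxPacket(pPkt);                 /* normal receive processing */
        }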

    Any help would be appreciated.

    Ralf

  • Ralf,

    Sorry for the late reply. Can we know what device you're using and your versions of NDK and TI-RTOS?

    Thanks,

    Moses

  • I'm using NDK 2.22.3.20 and SYS/BIOS 6.35.4.50.

    The device is a C6678 but I don't think this is important, because the NIMU code isn't used for TCP loopback.

    Thanks,
    Ralf

  • Ralf,

    I want to make sure I clearly understand what you're doing. It sounds like you're transferring data between sockets on 2 tasks running on the same device. Is this correct?

    Can I have more details on how you're testing this: "The problem occurs when using TCP loopback (127.0.0.1) to transfer data between two tasks / sockets"?

    Moses

  • Yes, I'm transferring data between tasks on the same device.

    Both tasks are opening a TCP socket connection to each other using the loopback address. Then, both tasks start sending and receiving blocks of data to each other. The first task calls send() and recv() in a loop, while the other calls recv() and then send() in a loop.
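
    Roughly, the two loops look like this (simplified sketch; BLOCKSIZE and the error handling are illustrative, and each task has already opened its file descriptor session and connected its socket):

        /* Simplified sketch of the two task loops -- not the full application. */
        #include <ti/ndk/inc/netmain.h>

        #define BLOCKSIZE 16384

        /* Task 1: send a block, then read the peer's data back */
        static void task1Loop(SOCKET s, char *buf)
        {
            for (;;) {
                if (send(s, buf, BLOCKSIZE, 0) < 0)
                    break;
                if (recv(s, buf, BLOCKSIZE, 0) <= 0)
                    break;
            }
        }

        /* Task 2: read data first, then answer with its own block */
        static void task2Loop(SOCKET s, char *buf)
        {
            for (;;) {
                if (recv(s, buf, BLOCKSIZE, 0) <= 0)
                    break;
                if (send(s, buf, BLOCKSIZE, 0) < 0)
                    break;
            }
        }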

    The problem seems to be difficult to isolate. Other tasks using the NDK seem to interfere with the problem. One situation causing this recursive behavior is when PBM buffers are depleted. This can be simulated by randomly dropping packets in TcpOutput(), where SockCreatePacket() would return 0 when no buffers are left.
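
    For illustration, the simulated drop looks something like this inside TcpOutput(), right after the packet allocation (the local variable names are made up; the only real behavior relied on is SockCreatePacket() returning 0 on failure, as described above):

        /* Fault-injection sketch (not actual NDK source): occasionally throw
         * away the buffer SockCreatePacket() just returned so TcpOutput()
         * takes the same path it would take if the PBM pool were empty. */
        pPkt = SockCreatePacket(hSock, PktLen);
        if (pPkt && (rand() % 4) == 0)
        {
            PBM_free(pPkt);     /* give the buffer back to the pool */
            pPkt = 0;           /* pretend the allocation failed */
        }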

    Another situation in which this recursion occurs is when one side of the connection gets closed while the other end is still sending data. This can easily be reproduced with one task:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>              // for memset()
    #include <ti/ndk/inc/netmain.h>

    #define BLOCKSIZE 16384

    int testSocketClose()
    {
        struct sockaddr_in sin;
        int len;
        SOCKET s_client, s_listen, s_server;
        unsigned int *pBuffer;

        // Create client and server sockets (TCP)
        s_client = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        s_listen = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (s_client == INVALID_SOCKET || s_listen == INVALID_SOCKET)
        {
            printf("Error: failed socket create (%d)\n", fdError());
            return -1;
        }

        pBuffer = (unsigned int *)malloc(BLOCKSIZE);
        if (pBuffer == NULL)
        {
            printf("Error: failed buffer allocation\n");
            return -1;
        }

        // Prepare server socket
        memset(&sin, 0, sizeof(sin));
        sin.sin_family      = AF_INET;
        sin.sin_len         = sizeof(sin);
        sin.sin_addr.s_addr = INADDR_ANY;
        sin.sin_port        = htons(5000);
        if (bind(s_listen, (PSA) &sin, sizeof(sin)) < 0)
        {
            printf("failed bind (%d)\n", fdError());
            return -1;
        }
        if (listen(s_listen, 1) < 0)
        {
            printf("failed listen (%d)\n", fdError());
            return -1;
        }


        // Prepare address for connect
        bzero( &sin, sizeof(struct sockaddr_in) );
        sin.sin_family      = AF_INET;
        sin.sin_len         = sizeof(sin);
        sin.sin_addr.s_addr = inet_addr("127.0.0.1");
        sin.sin_port        = htons(5000);

        // Connect to server
        if (connect(s_client, (PSA)&sin, sizeof(sin)) < 0)
        {
            printf("Error: failed connect (%d)\n", fdError());
            return -1;
        }

        // Accept connection
        len = sizeof(sin);
        s_server = accept(s_listen, (PSA)&sin, &len);
        if (s_server == INVALID_SOCKET)
        {
            printf("failed accept (%d)\n", fdError());
            return -1;
        }

        // Close one side of the connection
        fdClose(s_client);

        // Send data => stack overflow occurs
        if (send(s_server, pBuffer, BLOCKSIZE, 0) < 0)
        {
            printf("failed send (%d)\n", fdError());
            return -1;
        }

        fdClose(s_server);
        fdClose(s_listen);
        free(pBuffer);

        return 0;
    }

    Thanks,
    Ralf

  • Ralf,

    Looking at your code, I see that you have your server and client code interleaved, and that might cause problems. First off, I see that your client calls connect() before the server is even ready to accept. Your server should be blocking on accept() before the client calls connect(). Try separating your server into one task (higher priority so it runs first) and the client into another task. That way, your client task will not run until your server pends on accept().
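
    Something along these lines (sketch; the priorities and stack size are just examples, and each task function must still call fdOpenSession() before using sockets):

        /* Sketch: create the server task at a higher priority than the
         * client task so it reaches accept() first. */
        #include <xdc/std.h>
        #include <xdc/runtime/Error.h>
        #include <ti/sysbios/knl/Task.h>

        extern Void serverTaskFxn(UArg a0, UArg a1);  /* bind/listen/accept, then recv/send */
        extern Void clientTaskFxn(UArg a0, UArg a1);  /* connect, then send/recv */

        void createTestTasks(void)
        {
            Task_Params params;
            Error_Block eb;

            Error_init(&eb);
            Task_Params_init(&params);
            params.stackSize = 8192;

            params.priority = 6;                      /* server runs first */
            Task_create(serverTaskFxn, &params, &eb);

            params.priority = 5;                      /* client runs once the server blocks */
            Task_create(clientTaskFxn, &params, &eb);
        }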

    Try this and let me know if it fixes your problem.

    Moses

  • Moses,

    This code actually works for me, and I have no problems connecting both ends within one task. I think the important part is that listen() is called before connect(). Connected sockets are first put into a listen queue; accept() can be called any time later and takes the first connection from the listen queue (see also SPRU524H, section 3.3.3).

    My application actually uses two tasks, I just simplified it to create a test case.

    Ralf

  • Ralf,

    I see. I'll work on reproducing this issue and I'll get back to you.

    Thanks,

    Moses

  • Hi Moses,

    are you able to reproduce this issue?

    Thanks,
    Ralf

  • Hi Moses,

    any update on this issue?

    Thanks,
    Ralf

  • It seems that Moses doesn't respond to this thread any more.

    Can someone else reproduce this problem?

    Thanks,

    Ralf

  • I'm back. Sorry for letting this fall off our radar.
    I've run the code you provided and haven't seen any stack overflow. I'm using a different board (EK_TM4C1294) but with the same NDK version you have. Do you have other processes running in your application? Is there anything else I'm missing?

    Is it your Task stack that is overflowing? Is the program supposed to crash after the socket send?

    Thanks
    Moses
  • Thanks for taking the time.

    In my original application, the problem first occurred when multiple tasks were using the NDK simultaneously. This caused a depletion of all PBM buffers, which seems to trigger this recursive behavior.
    The test code, however, also triggers a similar situation without any other processes running.

    I'm not sure about the behavior of the program after a stack overflow. If your stack size is large enough, no overflow will occur and you won't have any problems. But you should see an increase of Task.stackPeak in the RTOS Object View after calling send().

    The actually used stack size depends on the number of recursive iterations. This is influenced by the configured TCP transmit and receive buffer sizes and is also limited by BLOCKSIZE in the test code.

    Here is an example when using a TCP transmit and receive buffer size of 32768 and a BLOCKSIZE of 32768:
    Before send(): (ROV Task view screenshot attached)

    After send(): (ROV Task view screenshot attached)

    If your buffer sizes are smaller, the increase in stack usage will also be smaller.

    Thanks,
    Ralf

  • Ralf,

    I'm able to get my hands on a C6678 device and would like to replicate your setup as closely as I can in order to reproduce this. I'm running your test with the same version of NDK on a different device and I'm not seeing what you're seeing. It would help if you could zip up your test project and send it to me; the simpler, the better. I see from the ROV screenshot that you have other tasks running as well; can you remove those and have just the task that runs the test? Also, I believe you made changes to some NDK sources and rebuilt it. Can you attach the changed files so I can see what you're doing?

    Thanks,

    Moses

  • Hello Moses,

    I didn't make any changes to the NDK. But my application uses the legacy configuration API instead of XCONF (enableCodeGeneration set to false). This is why you may see different tasks than in your own program. In contrast to XCONF, the "NDK Stack Thread" gets terminated after initialization; therefore, "GBIT_NetTask" is created to run the network scheduler instead.
    The idle task is required because Task.deleteTerminatedTasks is set by the NDK. "MainTask" is used to run the test code. The only thing I could remove was "daemon", which was running the NDK HTTP server, but removing it doesn't make any difference.

    My program will not work on a different type of hardware (e.g. different PLL settings). But I created a modified version of the MCSDK example in mcsdk_2_01_02_06\examples\ndk\helloWorld:
    1385.helloWorld.zip

    The archive contains a stripped version of the executable which should run on the C6678 EVM. The program prints the stack usage before and after send() in the CCS console. Example:
    Stack usage before send(): 1776
    Stack usage after send(): 23872
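
    The numbers can be obtained from the running task itself, e.g. with SYS/BIOS Task_stat(), roughly like this (sketch; it assumes stack initialization/checking is enabled in the BIOS configuration so that the used value is meaningful):

        /* Sketch: print the calling task's used stack before and after send(). */
        #include <stdio.h>
        #include <ti/sysbios/knl/Task.h>

        static void printStackUsage(const char *when)
        {
            Task_Stat stat;

            Task_stat(Task_self(), &stat);
            printf("Stack usage %s: %u\n", when, (unsigned)stat.used);
        }

        /* usage in the test task:
         *   printStackUsage("before send()");
         *   send(s_server, pBuffer, BLOCKSIZE, 0);
         *   printStackUsage("after send()");
         */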

    Thanks,
    Ralf

  • Ralf,

        Thanks for sending your project. I'll run it and get back to you.

    Thanks,

    Moses

  • Update:
    I get a similar increase in stack usage if I run my test function on a different hardware and software environment:

    • Custom C6415 hardware, ethernet controller connected using PCI
    • DSP/BIOS 5.42.0.07
    • NDK 1.93

    Ralf

  • Ralf,

    Got my hands on a C6678 EVM and ran your application. DHCP isn't working yet; I should figure out the DHCP issue on Monday. In the meantime, can you send the following file: helloWorld_pe66.rov.xs? I need it to get ROV working for your application.

    Thanks,

    Moses

  • Hi Moses,

    I think ROV will not work anyway, because I only uploaded a stripped version of the binary. The original file is somewhat large. But the stripped binary should print the stack usage in the console window. You can also try to recompile the project yourself; you may need to install the MCSDK.

    The steps for loading an application to the EVM are described here:
    http://processors.wiki.ti.com/index.php/BIOS_MCSDK_2.0_Getting_Started_Guide#Use_JTAG_to_Load_the_Application

    Ralf

  • Ralf,

    Thanks. I've finally been able to reproduce it. I played around with some of the variables, and the two that seem key to producing the issue are BLOCKSIZE and the fdClose(s_client) call. When I reduce BLOCKSIZE, the stack usage doesn't blow up anymore. Also, regardless of BLOCKSIZE, the code that closes the client seems to be key. You said you had other instances where this shows up; in those cases, did a client file descriptor get closed before the server sent data? With the client closed, I would have expected send() to return an error. From a high level, it looks like the client not being available causes something to happen repeatedly, thus causing the stack to grow a lot. It's definitely a bug, but we need to characterize it more. I'll carry out some more tests. Let me know of other cases you have.

    Thanks,

    Moses

  • In my original application, I'm not closing the socket before the remote side sends data. The communication continues to work if the stack size is large enough.

    As I tried to explain in my first posts, there is a recursion happening between the following NDK functions which leads to the growth in stack usage:
    TcpOutput() > IPTxPacket() > IPRxPacket() > TcpInput() > TcpOutput() > ...

    In my application, the recursion seems to be caused by depletion of PBM buffers. I think that the transmit buffers on both ends start filling up while each peer is sending data to the remote side. When PBM buffers become available again, the recursion suddenly starts happening.

    The test code in testSocketClose() causes a similar recursion in a different way. The value of BLOCKSIZE affects the number of recursive iterations (= number of packets) between both sockets.

    I'm not even sure this is a bug. It seems more implementation-specific and undocumented to me.

    Thanks,
    Ralf

  • Maybe an NDK expert should take a look at this issue and decide whether this is a bug or normal / undocumented behaviour.

    By the way, the NDK User's Guide already has a statement about recursion and stack sizes:

    3.1.2.2.1 Stack Sizes for Network Tasks
    Care should be taken when choosing a Task stack size. Due to its recursive nature, a Task tends to
    consume a significant amount of stack. A stack size of 3072 is appropriate for UDP based
    communications. For TCP, 4096 should be used as a minimum, with 5120 being chosen for protocol
    servers. The thread that calls the NETCTRL library functions should have a stack size of at least 4096
    bytes. If lesser values are used, stack overflow conditions may occur.

    Thanks,
    Ralf