
CC3200 driver stuck - TCP sockets (TxPoolCnt = 1)

Other Parts Discussed in Thread: CC3200, CC3200-LAUNCHXL

Hi,

We found an issue where the SimpleLink driver gets stuck in a high-bandwidth application using TCP sockets. This is the same problem as discussed in this thread: https://e2e.ti.com/support/wireless_connectivity/simplelink_wifi_cc31xx_cc32xx/f/968/t/432080. The only difference is that the reported issue was with UDP sockets, while our problem is with TCP communication. According to our investigation, the problem appears to be in the SimpleLink driver or in the NWP firmware itself.

 

Information about application:

We started developing our own web server on the CC3200 application processor. We have a main listening task, which accepts incoming TCP connections, and two tasks for client connections. The main task accepts a connection and sets the client socket handle, which unblocks a client task. The client task reads the incoming data (the HTTP header sent by the browser), opens a file from flash memory and sends the data back. The SimpleLink task has a higher priority than the listening and client tasks.

- We use the production version of the CC3200-LAUNCHXL or CC3200MODLAUNCHXL

- TI-RTOS

- Latest service pack (v1.0.1.5.0) and SDK (v1.1.0)

- The issue is reproduced with multiple types of APs from different vendors

 

Example of client socket sending code:

while(1) {
 // wait for valid handle set from listening task and read http header
 // …

 while(fileLen > 0) {
  // read data from flash file into sendBuff with length in toSend
  // …

  retVal = SL_EAGAIN;

  while(retVal == SL_EAGAIN) {
   //if(g_pCB->FlowContCB.TxPoolCnt < 5) osi_Sleep(10);
   retVal=sl_Send(sock, sendBuff, toSend, 0);
   //retVal2=_SlDrvMsgRead();
   osi_Sleep(1);
  }
  if (retVal < 0) break;
 }
 
 // close socket, etc.
}

 

Test conditions:

The CC3200 is connected to an AP. The Firefox web browser has 5 tabs open. Each tab has its refresh interval set to 2 s and requests a JPEG file from the CC3200 device. The file is about 150 kB and loads in ~600 ms.

Communication with the browsers works correctly for some time. But when I load the WiFi network with other devices (e.g. HD video streaming), network communication with the CC3200 starts slowing down. The global g_pCB->FlowContCB.TxPoolCnt starts fluctuating, and when it decreases to 1 the SimpleLink driver gets stuck. It is stuck at line 777 of driver.c (_SlDrvDataWriteOp()). The code at this line is VERIFY_PROTOCOL(g_pCB->FlowContCB.TxPoolCnt > FLOW_CONT_MIN + 1 ); which is nothing else than an endless loop whenever TxPoolCnt is not greater than FLOW_CONT_MIN + 1. Once the driver is stuck, no communication through the SimpleLink driver is possible. Only a restart of the application MCU recovers from this state.

I tried the following:

1. Adding the line retVal2=_SlDrvMsgRead(); after sl_Send() (a suggestion from the UDP forum thread). This decreased the fluctuation of TxPoolCnt, but the problem was not fixed; only its probability decreased.

2. Adding a variable sleep depending on the TxPoolCnt value: similar result to point 1.

3. Adding a long sleep (e.g. osi_Sleep(10);): not acceptable, because it dramatically decreases the bandwidth.
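Attempts 2 and 3 trade throughput against pressure on the TX pool. One hedged alternative to both a fixed 1 ms and a fixed 10 ms sleep is to back off exponentially and give up after a bounded number of attempts, so that a refused send never spins forever. This is a hypothetical sketch: send_with_backoff(), EAGAIN_CODE (standing in for SL_EAGAIN, with an arbitrary value) and the fake send/sleep functions are not SimpleLink APIs; in an application they would wrap sl_Send() and osi_Sleep(). It does not address the driver lock-up itself.

```c
/* Hypothetical stand-in for SL_EAGAIN; the value here is arbitrary. */
#define EAGAIN_CODE (-11)

typedef long (*send_fn_t)(void *ctx);
typedef void (*sleep_fn_t)(unsigned int ms);

/* Retry a non-blocking send while it reports EAGAIN_CODE. The sleep
 * starts at min_ms and doubles up to max_ms; after max_tries attempts
 * the last result is returned and the caller decides what to do
 * (e.g. close the socket). */
long send_with_backoff(send_fn_t send, void *ctx, sleep_fn_t sleep_ms,
                       unsigned int min_ms, unsigned int max_ms,
                       int max_tries)
{
    unsigned int delay = min_ms;
    long ret = EAGAIN_CODE;

    for (int attempt = 0; attempt < max_tries; attempt++) {
        ret = send(ctx);
        if (ret != EAGAIN_CODE)
            return ret;               /* sent, or a real socket error */
        if (attempt + 1 == max_tries)
            break;                    /* no point sleeping after the last try */
        sleep_ms(delay);
        delay = (delay * 2 > max_ms) ? max_ms : delay * 2;
    }
    return ret;
}

/* Test doubles: a send that is refused `refusals` times, then succeeds. */
typedef struct { int refusals; int calls; } fake_sock_t;

static long fake_send(void *ctx)
{
    fake_sock_t *s = ctx;
    s->calls++;
    return (s->refusals-- > 0) ? EAGAIN_CODE : 1400;
}

/* Records how long the caller would have slept in total. */
static unsigned int total_slept_ms;
static void fake_sleep(unsigned int ms) { total_slept_ms += ms; }
```

With min_ms = 1 and max_ms = 8, a send refused three times sleeps 1 + 2 + 4 ms before succeeding, and a permanently refused send is abandoned after max_tries calls instead of looping forever.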

 

My questions:

1. Is any fix available for this issue? I expect this is a known issue, because it was already reported with UDP sockets.

2. Is there any way to recover the SimpleLink driver when it is stuck, without resetting the application processor?

3. Any other suggestions are welcome.

 

This is a serious issue for us. If any clarification or confidential information is required, you can contact me at my email address. We have signed an NDA with TI regarding wireless connectivity.

 

Thank you for your answer

 

Jan

  • I have a small update.

    After studying the code of _SlDrvDataWriteOp() in driver.c, I am almost 100% sure that there is a problem with the synchronization objects. From my point of view it all looks OK, but there is definitely a bug somewhere. I am not able to find it.


    Jan
  • Hi,

    According to what you are describing, you are stuck at VERIFY_PROTOCOL(g_pCB->FlowContCB.TxPoolCnt > FLOW_CONT_MIN + 1 );.

    The only way to exit the while loop and get to this line is if the following condition is met:  if(g_pCB->FlowContCB.TxPoolCnt > FLOW_CONT_MIN + 1 ).

    So you are describing a case where another task is modifying g_pCB->FlowContCB.TxPoolCnt.

    Theoretically I cannot see it happening, since there are TxLockObj and GlobalLockObj, but I need some more information.

    1. How many tasks are using sl_send() and sl_recv()? Is it only 2 tasks or more?
    2. Can you tell me the value of g_pCB->FlowContCB.TxPoolCnt?
    3. What is the state of the other task when it happens?

    Basically, since extensive testing is done on the product with 16 sockets in parallel, it is most likely not the parallel tasks themselves but maybe something related to the behavior of TI-RTOS. I'm looking into it in parallel.

    Shlomi

  • Hi,

    From my study of the code in driver.c, I am not able to determine how this can occur. From my point of view, the synchronization in _SlDrvDataWriteOp() looks properly implemented. I can confirm that this behaviour depends on using sl_send() from "parallel" tasks.

    Answers to your questions:

    1. Yes, I have only two client tasks with sl_send() and sl_recv().

    2. When the driver is stuck, the value of g_pCB->FlowContCB.TxPoolCnt is 1.

    3. Other, higher-priority tasks without SimpleLink access run correctly. Tasks using SimpleLink are blocked in an sl_ API call.

    Here is the code of the "server". Please excuse the ugly code; it is only a proof of concept. Calling httpStartServer() creates the main listening task httpServerTask(). This task opens the file handle "/www/wifi.jpg", creates two client tasks httpClientTask() and starts listening for incoming requests. Each client task waits for a valid socket handle in socketTable[], reads the HTTP header and sends the JPEG file back for every request.

    To simulate load on the server, Firefox is used with 5 open tabs, each with an auto-refresh interval of 2 s.

    #include <stdio.h>
    #include <string.h>
    
    #include "simplelink.h"
    #include "protocol.h"
    #include "driver.h"
    
    #include "osi.h"
    #include "uart_if.h"
    #include "common.h"
    
    
    #define HTTP_STACK_SIZE                   4048
    #define HTTP_MAX_CONNECTIONS              2
    #define TM_DEST_PORT                      81
    
    int                                       socketTable[HTTP_MAX_CONNECTIONS];
    int                                       sockHanderNr = 0;
    int                                       sockNr;
    
    signed long                               fl;
    int                                       flSize;
    
    extern _SlDriverCb_t* g_pCB;
    
    
    //
    // close TCP socket
    //
    int httpCloseSocket(int socket) {
    
      int i = 0;
    
      if (socket < 0) return 0;
    
      do {
        if (sl_Close(socket) >= 0) return 0;
        osi_Sleep(500);
        // TODO break in case of disabled SimpleLink!
        i++;
      } while (i < 3);
    
      return -1;
    }
    
    
    //
    // open and set TCP socket
    //
    int httpOpenServerSocket(unsigned int tcpPort) {
    
      int             socketTCP, retVal;
      SlSockAddrIn_t  destAddr;
      long            nonBlocking = 1;
    
      // create socket
      socketTCP = sl_Socket(SL_AF_INET, SL_SOCK_STREAM, SL_IPPROTO_TCP);
      if (socketTCP < 0) return -1;
    
      // non blocking socket
      retVal = sl_SetSockOpt(socketTCP,
                             SL_SOL_SOCKET,
                             SL_SO_NONBLOCKING,
                             &nonBlocking,
    						 sizeof(nonBlocking));
      if (retVal < 0 ) {
    	httpCloseSocket(socketTCP);
    	return -2;
      }
    
      // bind socket
      destAddr.sin_family      = SL_AF_INET;
      destAddr.sin_port        = sl_Htons((unsigned short)tcpPort);
      destAddr.sin_addr.s_addr = 0;
      retVal = sl_Bind(socketTCP, (SlSockAddr_t *)&destAddr, sizeof(destAddr));
      if (retVal < 0) {
    	httpCloseSocket(socketTCP);
    	return -3;
      }
    
      // set listening
      retVal = sl_Listen(socketTCP, HTTP_MAX_CONNECTIONS);
      if (retVal < 0) {
    	httpCloseSocket(socketTCP);
    	return -4;
      }
    
      return socketTCP;
    }
    
    
    //
    // Client task
    //
    void httpClientTask(void *pvParameters) {
    
      // TODO break in case of disabled SimpleLink!
    
      unsigned char buffHndl[1401];
    
      int l, e, e2, retVal, op;
      int sock, i, readLen, readCyc, readPos;
    
      int param = sockHanderNr;
      socketTable[param] = -1;
      sockHanderNr++;
    
      UART_PRINT("client: Client %d Thread created\n\r", param);
    
      while(1) {
    
        // wait for available socket
        while (socketTable[param] == -1) osi_Sleep(10);
        sock = socketTable[param];
    
        e  = 0;
        e2 = 0;
        while (1) {
    
          l = sl_Recv(sock, buffHndl, 1000, 0);
          /*osi_Sleep(1);*/
    
          // wait for data
          if (l == SL_EAGAIN) {
            osi_Sleep(30);
            continue;
          }
          // socket error or timeout 200ms
          if ((l < 0) || ((e2++) > 200)) {
            e = 0;
            break;
          }
          // wait for CR LF, later will be reworked
          for (i = 0; i < l; i++) {
            if (buffHndl[i] == 0x0A) e++;
            if (buffHndl[i] > 0x1F) e = 0;
          }
          if (e == 2) break;
        }
    
        if (e == 2) {
    
          readLen = flSize;
          readCyc = 1400;
          readPos = 0;
          op = 0;
    
          // send http header, return code not handled yet...
          sprintf((char *)buffHndl, "HTTP/1.0 200 OK\r\nContent-type: image/jpeg\r\nConnection: close\r\n\r\n");
          sl_Send(sock, buffHndl, strlen((char *)buffHndl), 0);
    
          while (readLen > 0) {
    
        	// less than 1400B to read...
        	if (readLen < 1400) readCyc = readLen;
            // read from file
            retVal = sl_FsRead(fl, readPos, buffHndl, readCyc);
            if (retVal < 0) {
              UART_PRINT("client: sock (%d) read file error!\n\r", sock);
              break;
            }
    
            readLen = readLen - readCyc;
            readPos = readPos + readCyc;
    
            retVal = SL_EAGAIN;
            while (retVal == SL_EAGAIN) {
              // send to socket
              retVal = sl_Send(sock, buffHndl, readCyc, 0);
              osi_Sleep(1);
         	  if (retVal == SL_EAGAIN) op++;
            }
            // handle socket error
            if (retVal < 0) {
        	  UART_PRINT("client: sock (%d) err - %d!\n\r", sock, retVal);
        	  break;
            }
          }
    
          UART_PRINT("client: sock (%d) repeat-send: (%d) TxPoolCnt: (%d)\n\r", sock, op, g_pCB->FlowContCB.TxPoolCnt);
        }
    
        sl_Close(sock);
        socketTable[param] = -1;
        sockNr--;
      }
    }
    
    
    //
    // server task
    //
    void httpServerTask(void *pvParameters) {
    
      int                    listenSocket;
      int                    clientSocket;
      struct SlSockAddrIn_t  sourceAddr;
      int                    errorCode;
      int                    addrLen;
      long                   nonBlocking = 1;
    
      int                    i;
    
      UART_PRINT("http: Create client tasks\n\r");
      for (i = 0; i < HTTP_MAX_CONNECTIONS; i++) {
    	errorCode = osi_TaskCreate(httpClientTask, (const signed char*)"httpClientTask", HTTP_STACK_SIZE, NULL, 5, NULL);
        if (errorCode < 0) while(1) osi_Sleep(1);
      }
    
      // create/open listening socket
      listenSocket = httpOpenServerSocket(TM_DEST_PORT);
      if (listenSocket < 0) while(1) osi_Sleep(1);
    
      UART_PRINT("http: Waiting for TCP connection on port %d...\n\r", TM_DEST_PORT);
    
      sockNr = 0;
    
      while(1) {
    
    	clientSocket = SL_EAGAIN;
    
        while(clientSocket == SL_EAGAIN) {
    
          // allow only limited number of client connections
          if (sockNr >= HTTP_MAX_CONNECTIONS) {
            osi_Sleep(30);
        	continue;
          }
    
      // accept incoming connection (addrLen must hold the address buffer size on input)
      addrLen = sizeof(sourceAddr);
      clientSocket = sl_Accept(listenSocket, (struct SlSockAddr_t *)&sourceAddr, (SlSocklen_t*)&addrLen);
    
          // wait for connection
          if (clientSocket == SL_EAGAIN) {
            osi_Sleep(30);
            continue;
          }
    
          // socket error
          if(clientSocket < 0 ) {
            sl_Close(clientSocket);
            sl_Close(listenSocket);
            while(1) osi_Sleep(1);
          }
        }
    
        // set client socket to non blocking mode (TODO handle return code)
        sl_SetSockOpt(clientSocket, SL_SOL_SOCKET, SL_SO_NONBLOCKING, &nonBlocking, sizeof(nonBlocking));
    
    // add socket handle into socket "Table"
        for (i = 0; i < HTTP_MAX_CONNECTIONS; i++) {
          if (socketTable[i] == -1) {
            socketTable[i] = clientSocket;
            sockNr++;
            /*UART_PRINT("http: new socket (%d) for task %d\n\r", clientSocket, i);*/
            break;
          }
        }
      }
    }
    
    
    int httpStartServer(int serverPort) {
    
      sl_FsOpen("/www/wifi.jpg", FS_MODE_OPEN_READ, NULL, &fl);
      // please change the file size to match /www/wifi.jpg
      flSize = 272587;
    
      osi_TaskCreate(httpServerTask, (const signed char*)"httpServerTask", HTTP_STACK_SIZE, NULL, 5, NULL);
    
      return 0;
    }
    
    

    Jan
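In the posted server, socketTable[] and sockNr are written by the listening task and by the client tasks without any lock, and socketTable[] only holds -1 once each client task has run its start-up code (before that it is zero-initialized, and 0 may be a valid socket handle). This may well be unrelated to the TxPoolCnt hang, but a hand-off table guarded by one lock removes that uncertainty. The sketch below is hypothetical: handoff_t and its functions are illustrative names, and a pthread mutex stands in for the TI-RTOS/OSI lock object one would use on the CC3200.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_CONNECTIONS 2   /* plays the role of HTTP_MAX_CONNECTIONS */
#define NO_SOCKET (-1)

/* Hypothetical hand-off table: the slots and the active-connection count
 * are only touched while holding one mutex, so the listening task and the
 * client tasks never see a half-updated state. */
typedef struct {
    pthread_mutex_t lock;
    int slot[MAX_CONNECTIONS];
    int active;
} handoff_t;

/* Call once, before any task that uses the table is started. */
void handoff_init(handoff_t *h)
{
    pthread_mutex_init(&h->lock, NULL);
    for (int i = 0; i < MAX_CONNECTIONS; i++)
        h->slot[i] = NO_SOCKET;
    h->active = 0;
}

/* Listening task: place an accepted socket in the first free slot.
 * Returns the slot index, or -1 if every slot is busy. */
int handoff_post(handoff_t *h, int sock)
{
    int idx = -1;

    pthread_mutex_lock(&h->lock);
    for (int i = 0; i < MAX_CONNECTIONS; i++) {
        if (h->slot[i] == NO_SOCKET) {
            h->slot[i] = sock;
            h->active++;
            idx = i;
            break;
        }
    }
    pthread_mutex_unlock(&h->lock);
    return idx;
}

/* Client task i: read its slot (NO_SOCKET means nothing to serve yet). */
int handoff_peek(handoff_t *h, int i)
{
    pthread_mutex_lock(&h->lock);
    int sock = h->slot[i];
    pthread_mutex_unlock(&h->lock);
    return sock;
}

/* Client task i: free its slot after closing the socket. */
void handoff_release(handoff_t *h, int i)
{
    pthread_mutex_lock(&h->lock);
    if (h->slot[i] != NO_SOCKET) {
        h->slot[i] = NO_SOCKET;
        h->active--;
    }
    pthread_mutex_unlock(&h->lock);
}
```

The listening task would call handoff_post() instead of scanning socketTable[] itself, and active takes over the role of sockNr.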

  • Hi,

    The code looks OK.

    I have checked the locking semaphore in the case of TI-RTOS, and it seems that if 2 tasks grab the same lock and a 3rd task unlocks it, only the 1st task that grabbed the lock is unlocked, not both tasks. This behavior is expected, so there is no issue around TI-RTOS as I suspected.

    Regarding g_pCB->FlowContCB.TxPoolCnt being 1: the only explanation is that sl_recv() decreased this global variable, as sl_send() would be blocked at 2. So most likely one of the tasks called sl_recv() and decreased this counter to 1 while the other task was doing sl_send() and got blocked on the line you reported.

    However, I cannot see it happening since these sections are protected with TxLockObj.
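The behavior Shlomi describes above, where one unlock releases only one of the tasks waiting on the same lock, is ordinary semaphore semantics. A minimal deterministic illustration, using POSIX unnamed semaphores as a stand-in for the TI-RTOS lock objects (an assumption; the driver's own lock implementation is not shown in this thread), replaces the two blocked tasks with two sem_trywait() calls so that no threads are needed:

```c
#include <semaphore.h>

/* Two would-be owners try to take a semaphore that a third party has
 * posted exactly once. Returns how many of them got it. */
int takers_released_by_one_post(void)
{
    sem_t sem;
    int released = 0;

    sem_init(&sem, 0, 0);   /* start "locked": count 0 */
    sem_post(&sem);         /* the third task unlocks once: count 1 */

    /* sem_trywait() succeeds only while the count is above 0,
     * and each success decrements it. */
    if (sem_trywait(&sem) == 0) released++;
    if (sem_trywait(&sem) == 0) released++;

    sem_destroy(&sem);
    return released;
}
```

One sem_post() raises the count to 1, so exactly one of the two take attempts succeeds; in a real task the second one would keep blocking.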

    I will take a second look at it and think of a way to debug it on your setup.

    Shlomi

  • Hi,

    Thank you for investigating this issue.

    I apologize: I tried to reproduce the issue again just now, and it looks like g_pCB->FlowContCB.TxPoolCnt gets stuck at 2, not at 1. I am not sure whether in previous cases it was stuck at 1, or always at 2 and the 1 was only my mistake.

    Sorry for confusion.


    Jan
  • OK, thanks for the update.

    So it means these are 2 parallel sl_send().

    Doesn't change much in the investigation though :)

  • Hi,

    Any update on this topic? I tried the latest SDK (v1.2.0) and service pack (v1.0.1.6-2.6.0.5), and it looks like the problem with TxPoolCnt still persists. With the new SDK it does not get stuck at the same place as with the previous SDK (_SlDrvDataWriteOp() in driver.c). From my side it looks like, when the hang occurs, TI-RTOS reports a stack overflow. I have large stacks (8 kB and 4 kB for the tasks), so I do not expect insufficient stack in the normal case. I use TI-RTOS version 2_15_00_17. That is how it looks from my side, but I am not 100% sure. I can confirm that SimpleLinkGeneralEventHandler() was not called. The result is, however, the same: SimpleLink stopped responding.

    It looks like error handling was reworked in driver.c. The code behind the SL_TINY_EXT macro is not enabled in my case (driver.c line 837).

    Thank you for response,


    Regards,

    Jan

    UPDATE: There is no stack overflow, sorry for the mistake. All tasks using SL are blocked on a semaphore.

  • Hi,

    It is not very clear where it gets stuck with the new driver. Can you elaborate?

    Also, I can see that the spawned task is set to a lower priority than other tasks. The spawned task should get a higher priority than other tasks.

    Shlomi

  • Hi,

    It is hard to determine the place where it gets stuck (waiting for synchronisation). I will try to find the exact place.

    I am not sure whether you mean the SL task. Increasing the priority of the SimpleLink task does not change anything; this was one of the first things I tried.

    I use the new SDK, but I have not implemented the timeout mechanism in my code yet, because this mechanism is not described anywhere. Later I will try to apply the advice about timeouts from this topic (e2e.ti.com/support/wireless_connectivity/simplelink_wifi_cc31xx_cc32xx/f/968/t/497984). I am not sure whether it can help.

    Regards,


    Jan

  • Hi,

    I have update in this topic.

    First, I would like to thank you for trying to find the cause of the problem. Unfortunately, the issue still persists. I tried a few possible solutions, but none was 100% functional. As I get closer to the problem, it looks like the issue is inside the NWP firmware. I was not able to find any problem in the driver code.

    I have the latest SDK (1.2.0) and the latest service pack (1.0.1.6). It can happen under high load, with a weak signal (about -80 dBm) and multiple tasks using sockets simultaneously. When sl_Send() or sl_Recv() returns error code -7 (SL_RET_CODE_PROTOCOL_ERROR?), the driver gets stuck at the next sl_ call. TxPoolCnt is decreased to 1 or 2. The next sl_ function call waits forever in _SlDrvObjGlobalLockWaitForever() in driver.c (sl_LockObjLock(&g_pCB->GlobalLockObj, SL_OS_WAIT_FOREVER)).

    Thank you for your advice.

    Regards,

    Jan

  • Hi Jan,

    Sorry for the late response.

    First, -7 is indeed PROTOCOL_ERROR. Unlike the previous SimpleLink driver implementation, I have noticed that the SL_PROTOCOL_HANDLING is by default SL_HANDLING_ERROR and not SL_HANDLING_ASSERT. In this case, the macro causes _SlDrvDataWriteOp()/_SlDrvDataReadOp() to return. However, there are still locks that need to be unlocked (g_pCB->FlowContCB.TxLockObj and the Global lock). You can see those locks unlocked in the regular flow just after the VERIFY_PROTOCOL line.

    Note that even with SL_HANDLING_ASSERT, you would end up looping forever.

    Bottom line, since it should not happen, it really doesn't matter whether it loops forever or just returns from the function. It indicates that something is wrong, and the root cause should be found.
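The lock leak described above (with SL_HANDLING_ERROR the write/read operation returns while TxLockObj and the global lock are still held) fits the hang Jan reports, where the next sl_ call waits forever on the global lock. A generic way to keep an early error return from leaking locks is a single cleanup label that every exit path goes through. This is a hypothetical sketch, not the SimpleLink driver code: data_write_op(), locks_are_free() and PROTOCOL_ERROR are illustrative, and pthread mutexes stand in for the driver's lock objects. As Shlomi says, releasing the locks only turns a hang into an error; it does not find the root cause.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for g_pCB->FlowContCB.TxLockObj and
 * g_pCB->GlobalLockObj. */
static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

#define PROTOCOL_ERROR (-7)   /* the error value reported in this thread */

/* A write operation whose protocol check fails "soft" (returns an error)
 * but still releases both locks on the way out. */
int data_write_op(int tx_pool_cnt, int threshold)
{
    int ret = 0;

    pthread_mutex_lock(&tx_lock);
    pthread_mutex_lock(&global_lock);

    if (!(tx_pool_cnt > threshold)) {
        ret = PROTOCOL_ERROR;
        goto unlock;          /* never return with the locks held */
    }

    /* ... the actual transfer to the NWP would happen here ... */

unlock:
    pthread_mutex_unlock(&global_lock);
    pthread_mutex_unlock(&tx_lock);
    return ret;
}

/* True if both locks are free again (used to check the error path). */
bool locks_are_free(void)
{
    if (pthread_mutex_trylock(&tx_lock) != 0)
        return false;
    pthread_mutex_unlock(&tx_lock);
    if (pthread_mutex_trylock(&global_lock) != 0)
        return false;
    pthread_mutex_unlock(&global_lock);
    return true;
}
```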

    Second, let me elaborate where the possible root cause is.

    As stated before, theoretically we could not find where it fails. This was assuming you only use sl_send() and sl_recv(). However, looking at the tasks, you also call other commands that might get CPU time in the middle of sl_send() and sl_recv(). Any command is allocated a packet from the command pool and reduces TxPoolCnt by 1.

    The problem occurs when you break out of the wait on if(g_pCB->FlowContCB.TxPoolCnt > FLOW_CONT_MIN + 1 ) and, before you grab the global lock, another task takes it and sends a command. In most cases the command complete should release this packet, so that when the original task waiting on the global lock is released, VERIFY_PROTOCOL(g_pCB->FlowContCB.TxPoolCnt > FLOW_CONT_MIN + 1 ) passes. However, the packet may not get released on command complete, which leads to getting stuck as you see.

    What can be done?

    My recommendation is to do a minor change to overcome this case.

    1. _SlDrvDataWriteOp(): change the line VERIFY_PROTOCOL(g_pCB->FlowContCB.TxPoolCnt > FLOW_CONT_MIN + 1 ); to VERIFY_PROTOCOL(g_pCB->FlowContCB.TxPoolCnt > FLOW_CONT_MIN);
    2. _SlDrvDataReadOp(): change the line VERIFY_PROTOCOL(g_pCB->FlowContCB.TxPoolCnt > FLOW_CONT_MIN); to VERIFY_PROTOCOL(g_pCB->FlowContCB.TxPoolCnt > FLOW_CONT_MIN - 1);

    This way even if the device does not release the packet upon command complete, you still have headroom of one packet.
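Assuming again that FLOW_CONT_MIN is 1, the proposed change lowers the write-path re-check from at least 3 free buffers to at least 2, and the read-path re-check from at least 2 to at least 1. Written as plain predicates (illustrative helpers, not driver functions), the one-buffer headroom is easy to see:

```c
#include <stdbool.h>

#define FLOW_CONT_MIN 1   /* assumed value; check driver.c in your SDK */

/* Original VERIFY_PROTOCOL re-checks. */
bool write_ok_original(int cnt) { return cnt > FLOW_CONT_MIN + 1; }
bool read_ok_original(int cnt)  { return cnt > FLOW_CONT_MIN; }

/* Relaxed re-checks proposed above: one buffer less is required. */
bool write_ok_relaxed(int cnt)  { return cnt > FLOW_CONT_MIN; }
bool read_ok_relaxed(int cnt)   { return cnt > FLOW_CONT_MIN - 1; }
```

Only the VERIFY_PROTOCOL re-checks change; the wait loops in front of them keep their original thresholds, so a writer still waits for 3 free buffers and tolerates losing one of them before the re-check.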

    Please test it and let me know if it still happen.

    Shlomi

  • Hi,

    Thank you for answer. Tomorrow I'll try it and let you know.


    Jan
  • Hi,

    I have partially good news.

    We were not able to eliminate the issue with the advice from the previous post; the problem persists after the changes in _SlDrvDataReadOp() and _SlDrvDataWriteOp(). But we found an ugly hack that probably eliminates the issue. It does not solve the root cause, it only hides the problem. For the moment this is acceptable for me. Together with a higher level of supervision by a watchdog, it will be reliable.

    long WLANTxPoolProtection(void) {
      extern _SlDriverCb_t* g_pCB;
      int timeout = 0;
      // wait (max. ~300 ms) until enough TX buffers are free again
      while (g_pCB->FlowContCB.TxPoolCnt < 15) {
        osi_Sleep(100);
        // timeout after 3 sleeps
        if (++timeout > 2) return -2;
      }
      return 0;
    }
    
    if ((retVal = WLANTxPoolProtection()) != 0) {
      // timeout - close socket
      return retVal;
    }
    retVal = sl_Send(socket, buf, len, 0);

    For the moment, the issue with TxPoolCnt = 1 is resolved. We can close this thread.

    Thank you,

    Jan

  • Jan,

    To be honest, I am not sure I understand what you did and how it helps but if it works for you that is great.

    I'll close the thread for now.

    If you face related issues in the future, please open a new thread and refer to this one.

    Shlomi