Other Parts Discussed in Thread: SYSBIOS
Tool/software: TI-RTOS
Hello,
I have a very rare bug in my HTTPS Post task: occasionally the task gets stuck in a while loop, which blocks all other tasks until the device is reset. I cannot easily reproduce it, but I finally captured it with the debugger attached and was able to step through the code.
I am using TI-RTOS version 2.14.04.31.
My code has 2 HTTPS Post tasks that run intermittently (at least once a minute, sometimes more).
I see that my code is stuck on the 2nd of 4 "HTTPCli_sendField" calls that I make.
I am not sure if that is a consistent thing as this is the first time I have captured this with the debugger hooked up.
While stepping through the code, I can see that it's stuck in the while loop inside the "Ssock_send()" function in "C:\ti\tirtos_tivac_2_14_04_31\packages\ti\net\http\ssock.c"
The loop runs while (len > 0), but "nbytes = ssock->sec.send(ssock->ctx, ssock->s, buf, ilen, flags);" keeps returning 0.
Because nbytes is 0, the next condition, "if (nbytes >= 0)", is true, but "len -= nbytes;" leaves len unchanged, so the loop never exits.
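To illustrate the kind of guard I would have expected around this loop, here is a minimal sketch of a send loop that gives up after several consecutive zero-byte sends. It is not the actual ssock.c code: send_fn, send_all_guarded, MAX_STALLED_SENDS and SEND_ERR_STALLED are names I made up for illustration, and the two stub callbacks only simulate a stalled and a working transport.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ssock->sec.send callback: returns the number
 * of bytes accepted, 0 if nothing was accepted, or a negative error code. */
typedef int (*send_fn)(void *ctx, const char *buf, int len);

#define MAX_STALLED_SENDS 5      /* assumed limit, tune for the application */
#define SEND_ERR_STALLED  (-100) /* hypothetical error code */

/* Send len bytes, but bail out if the callback makes no progress
 * MAX_STALLED_SENDS times in a row instead of spinning forever. */
int send_all_guarded(send_fn send, void *ctx, const char *buf, int len)
{
    int stalled = 0;
    int total = 0;

    assert(send != NULL && buf != NULL);

    while (len > 0) {
        int nbytes = send(ctx, buf, len);

        if (nbytes < 0) {
            return nbytes;               /* propagate the transport error */
        }
        if (nbytes == 0) {
            if (++stalled >= MAX_STALLED_SENDS) {
                return SEND_ERR_STALLED; /* no progress: give up */
            }
            continue;                    /* a real task would sleep or yield here */
        }

        stalled = 0;                     /* progress was made, reset the counter */
        buf += nbytes;
        len -= nbytes;
        total += nbytes;
    }
    return total;
}

/* Stub transport that never accepts any data (simulates the stuck case). */
int stub_send_stalled(void *ctx, const char *buf, int len)
{
    (void)ctx; (void)buf; (void)len;
    return 0;
}

/* Stub transport that accepts at most two bytes per call. */
int stub_send_two_bytes(void *ctx, const char *buf, int len)
{
    (void)ctx; (void)buf;
    return (len < 2) ? len : 2;
}
```

With the stalled stub, send_all_guarded(stub_send_stalled, NULL, "abcd", 4) returns SEND_ERR_STALLED after five attempts instead of looping forever; with the two-byte stub the full length is returned.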
I do not understand what would cause ssock->sec.send to keep returning 0. I even unplugged the Ethernet cable while it was stuck, and nothing changed.
I included some pictures of the stack trace (I think that's what it's called?) after pausing and resuming the device a few times.
It looks like the NDK Stack Thread is toggling in priority between 1 (ROV shows 2 for some reason) and 9.
At least, that's what's reported in the SYSBIOS System Logger:
"LM_setPri: tsk: 0x20001978, func: 0xbbed, oldPri: 1, newPri 9"
"LM_setPri: tsk: 0x20001978, func: 0xbbed, oldPri: 9, newPri 1"
Deeper down in the WolfSSL code, I believe this error is being hit: "WOLFSSL_CBIO_ERR_WANT_WRITE"
But I do not know what that means, or why it would be happening.
EDIT: I think it's actually just returning WOLFSSL_CBIO_ERR_GENERAL or SOCKET_ERROR_E.
NDK_Send() calls fdint_lockfd() which hits this code in "tirtos_tivac_2_14_04_31\products\ndk_2_24_03_35\packages\ti\ndk\stack\fdt\file.c"
    /* Verify Socket and verify that it is open */
    if( !pfd ||
        (Type && pfd->Type != Type) ||
        ((pfd->Type != HTYPE_RAWETHSOCK) &&
#ifdef _INCLUDE_IPv6_CODE
         (pfd->Type != HTYPE_SOCK6) &&
#endif
         (pfd->Type != HTYPE_SOCK && pfd->Type != HTYPE_PIPE)
        ) ||
        !pfd->OpenCount )
    {
        pfdt->error = EBADF;
        return( SOCKET_ERROR );
    }
Because this issue occurs so rarely, I do not have any Wireshark captures or anything like that. I haven't figured out how to reproduce it; it takes days to weeks for the error to happen. The HTTPS tasks run every few minutes (or less).
This is an example of my HTTPS Post task. I trimmed a lot of the supporting code down so you can just see the key functionality.
static void HttpsPostTask(UArg arg0, UArg arg1)
{
    bool moreFlag = false;
    char responseDataBuf[256]; /* buffer for response data */
    int ret;

    // Storage variables
    char responseLengthStr[10];
    int responseLengthInt;
    char signature[41] = {0};     // 40 hex characters
    char counterString[12] = {0}; // 1 to 11 characters (unsigned int --> decimal)
    char http_uri[50] = {0};

    HTTPCli_Struct cli;

    // This section builds the static HTTP headers
    HTTPCli_Field fields[4] = {
        { HTTPStd_FIELD_NAME_HOST, HOSTNAME },
        { HTTPStd_FIELD_NAME_USER_AGENT, USER_AGENT },
        { HTTPStd_FIELD_NAME_CONTENT_TYPE, CONTENT_TYPE },
        { NULL, NULL }
    };

    HTTPCli_Params httpParams = {0};
    httpParams.timeout = HTTP_POST_TIMEOUT; /* Timeout value in seconds */

    for (;;) {
        if (!IsDomainResolved()) {
            // DNS resolution function call
        }

        if (!DoesSSL_ContextExist()) {
            // Setup SSL Context function call
        }

        if (IP_AddressIsSet && IsDomainResolved() && DoesSSL_ContextExist()) {
            // Clear storage
            memset(&responseLengthStr[0], 0, sizeof(responseLengthStr));
            memset(&counterString[0], 0, sizeof(counterString));
            memset(&signature[0], 0, sizeof(signature));
            memset(&http_uri[0], 0, sizeof(http_uri));

            // Set the correct URI (feedback or metrics)
            if (httpsEndpoint == metrics) {
                PrintDebug("\n ** HTTP Post to /metrics/ ** \n");
                strncpy(http_uri, METRICS_URI, sizeof(http_uri));
            }
            else if (httpsEndpoint == feedback) {
                PrintDebug("\n ** HTTP Post to /feedback/ ** \n");
                strncpy(http_uri, FEEDBACK_URI, sizeof(http_uri));
            }
            else if (httpsEndpoint == authentication) {
                PrintDebug("\n ** HTTP Post to /authentication/ ** \n");
                strncpy(http_uri, AUTHENTICATION_URI, sizeof(http_uri));
            }

            // Get counter int to string
            httpCounter++;
            sprintf(counterString, "%u", httpCounter);

            // Signature generation
            GenerateSignature(signature, httpDeviceId, counterString, httpsPostData);

            // Store the size of the data in INT and STRING
            responseLengthInt = strlen(httpsPostData);
            sprintf(responseLengthStr, "%d", responseLengthInt);

            // Start HTTP transaction
            HTTPCli_construct(&cli);
            HTTPCli_setRequestFields(&cli, fields);

            // In the actual code, I have error checks on each of these calls!
            ret = HTTPCli_connect(&cli, DNS_Results->ai_addr, HTTPCli_TYPE_TLS, &httpParams);
            ret = HTTPCli_sendRequest(&cli, HTTPStd_POST, http_uri, true);
            ret = HTTPCli_sendField(&cli, HTTP_HEADER_ID, httpDeviceId, false);
            ret = HTTPCli_sendField(&cli, HTTP_HEADER_COUNT, counterString, false);
            ret = HTTPCli_sendField(&cli, HTTP_HEADER_SIGN, signature, false);
            ret = HTTPCli_sendField(&cli, HTTPStd_FIELD_NAME_CONTENT_LENGTH, responseLengthStr, true);
            ret = HTTPCli_sendRequestBody(&cli, httpsPostData, responseLengthInt);
            ret = HTTPCli_getResponseStatus(&cli);
            ret = HTTPCli_getResponseField(&cli, responseDataBuf, sizeof(responseDataBuf), &moreFlag);

            do {
                ret = HTTPCli_readResponseBody(&cli, responseDataBuf,
                        (sizeof(responseDataBuf) - sizeof(responseDataBuf[0])), &moreFlag);
                if (ret < 0) {
                    // error code
                }
                else if (ret) {
                    if (ret < sizeof(responseDataBuf)) {
                        responseDataBuf[ret] = '\0';
                    }
                    // handle data
                }
            } while (moreFlag);

            // Need to check for success here.
            if (strncmp(METRICS_EXPECTED_RETURN, responseDataBuf, strlen(METRICS_EXPECTED_RETURN)) != 0) {
                PrintError(-1, "HttpsPostTask: response body does not match success");
                goto httpError;
            }

            failedCount = 0;
            goto httpCleanup;

            // This goto is hit if there is an error.
            // Uses a simple backoff timer up to 30 seconds.
httpError:
            failedCount++;
            ResetFleetCommsSuccess();
            PrintDebug("Fail Count: %u\n", failedCount);
            Task_sleep(failedCount > 8 ? 30000 : (1 << failedCount)*100);

httpCleanup:
            HTTPCli_disconnect(&cli);
            HTTPCli_destruct(&cli);
        }
        else {
            // Do nothing
            PrintDebug("No IP address or DNS not resolved. HttpsPostTask doing nothing.\n");
            failedCount++;
            ResetFleetCommsSuccess();
        }

        // "Turn off" this task every run.
        // Makes this task a "run on command" task.
        Task_setPri(taskHandleHttpsPost, -1);
    }
}

// Creates the https post task and starts the handler.
void InitHttpsPostTask()
{
    // Task creation and starting code.
}

void StartHttpsPost(char* data, enum Endpoints endpoint)
{
    if (Task_getPri(taskHandleHttpsPost) == -1) {
        Task_setPri(taskHandleHttpsPost, 1);
        strcpy(httpsPostData, data);
        httpsEndpoint = endpoint;
    }
}

bool IsHttpsPostTaskRunning()
{
    return (Task_getPri(taskHandleHttpsPost) > 0);
}

int GetPostFailureCount()
{
    return failedCount;
}

void ClearPostFailureCount()
{
    failedCount = 0;
}
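
For reference, the backoff expression in the httpError path can be pulled out into a small helper to see the delays it produces. This is only a sketch: backoff_ticks and backoff_self_check are hypothetical names, unsigned arithmetic is used here instead of the int expression in the task, and the values are Task_sleep ticks, which equal milliseconds only if the Clock tick period is 1 ms (an assumption about the configuration).

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the Task_sleep() argument in the error path:
 * (1 << failedCount) * 100 ticks, capped at 30000 once failedCount exceeds 8. */
uint32_t backoff_ticks(unsigned int failedCount)
{
    return (failedCount > 8u) ? 30000u : (1u << failedCount) * 100u;
}

/* Quick self-check of the delays for a few failure counts. */
void backoff_self_check(void)
{
    assert(backoff_ticks(1) == 200u);    /* first failure */
    assert(backoff_ticks(2) == 400u);    /* doubles on each failure */
    assert(backoff_ticks(8) == 25600u);  /* last value before the cap */
    assert(backoff_ticks(9) == 30000u);  /* capped from here on */
}
```

So the delay doubles from 200 ticks on the first failure up to 25600 ticks on the eighth, and stays at 30000 ticks after that.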