CC3220SF: Handling SLNETERR_BSD_EAGAIN error during update process

Swapnil

Part Number: CC3220SF

Hello Community,

I have implemented OTA updates in my custom application. The update file (.tar) is hosted on a server.

Sometimes during the update, I get the SLNETERR_BSD_EAGAIN error. This happens when the server does not respond before the socket timeout expires.

I have set the socket timeout to 5 seconds, which I think is enough time for the server to respond.

SLNETERR_BSD_EAGAIN means "try again", but in ProcessOta(), when this error occurs, the OTA update stops with the error message "OTA_NOTIF_DOWNLOAD_ERROR".
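Judging by its name, SLNETERR_BSD_EAGAIN mirrors the BSD EAGAIN error. As a host-side illustration only (plain POSIX sockets on a desktop OS, not the SimpleLink SlNetSock API), the sketch below sets a receive timeout with SO_RCVTIMEO and shows that an expired timeout makes recv() fail with EAGAIN/EWOULDBLOCK instead of returning data or 0. The helpers set_recv_timeout(), is_timeout_error() and demo_recv_timeout() are hypothetical names; on the device, SlNetSock_setOpt() would play the role of setsockopt().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Set a receive timeout on a socket using the POSIX SO_RCVTIMEO option. */
static int set_recv_timeout(int sd, long timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* True if a failed recv() means "the timeout expired, try again". */
static bool is_timeout_error(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK;
}

/* Host-side demo: a UDP socket bound to loopback that nobody sends to.
   recv() blocks until the timeout expires and then fails.
   Returns the errno seen by recv(), 0 if recv() unexpectedly got data,
   or -1 if the socket could not be set up. */
static int demo_recv_timeout(long timeout_ms)
{
    int sd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sd < 0) {
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the OS pick a free port */

    if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        set_recv_timeout(sd, timeout_ms) < 0) {
        close(sd);
        return -1;
    }

    char buf[64];
    ssize_t n = recv(sd, buf, sizeof(buf), 0);
    int err = (n < 0) ? errno : 0; /* save errno before close() */
    close(sd);
    return err;
}
```

Calling demo_recv_timeout(200) should return EAGAIN or EWOULDBLOCK after roughly 200 ms, which is the host equivalent of the timeout error described above: the socket is still open, no data arrived in time.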

To avoid this, I retry fetching the missing packet by calling the GetChunk callback again, as shown in the code snippet below.

For testing, I set the number of retry attempts to 10.

The problem is that if I don't sleep between attempts, I get the same socket timeout error on every consecutive attempt, and all my retry attempts are used up.

With the sleep, I still get this error, but the number of retries stays within the limit.

My question is: why is this sleep required?

Could it be that the NWP processor is busy and not responding to the requests in time?

I am not able to understand this behaviour.

The code snippet below is from the function ProcessOta() in the file ota_if.c:

do
{
    /* Now, fill the rest of the buffer (using the GetChunk callback) */
    rc = m_sessionCBs[m_ota.type].fGetChunk(m_ota.hSession, &m_ota.buff[nUnprocessed], OTA_BUFF_SIZE - nUnprocessed);
    if(rc > 0)
    {
        /* Update counters with the actual number of bytes read */
        m_ota.nTotalRead += rc;
        nUnprocessed += rc;
        ui8RetryCnt = 0;    /* successful read: reset the retry counter */
        LOG_DEBUG("ProcessOta:: read=%d (%d)", rc, m_ota.nTotalRead);
    }
    else if(rc == 0)
    {
        /* mark end of input */
        bEndOfInput = true;
        LOG_DEBUG("ProcessOta:: read=0 (%d)", m_ota.nTotalRead);
    }
    else if((rc == SLNETERR_BSD_EAGAIN) && (ui8RetryCnt < OTA_PACKET_RETRY_COUNT))
    {
        /* Socket timeout: count this attempt, wait, then loop to retry.
           Incrementing only here keeps a successful read from consuming a retry. */
        ui8RetryCnt++;
        LOG_WARNING("ProcessOta:: Socket did not respond and trying again!");
        sleep(1);   /* POSIX sleep(), declared in <unistd.h> */
    }
    else
    {
        /* Any other error, or the retry limit was reached */
        LOG_ERROR("ProcessOta: ---- Can't get next chunk (%d)", rc);
        return rc;
    }
}
while(rc == SLNETERR_BSD_EAGAIN);
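The same idea, a bounded retry that waits before each new attempt, can be isolated into a small helper. This is a hedged sketch, not the OTA library's code: SKETCH_ERR_EAGAIN is a hypothetical placeholder for SLNETERR_BSD_EAGAIN (the real value comes from the SlNetSock headers), chunk_reader_fn only imitates the shape of the fGetChunk callback, and flaky_read() is a fake reader for illustration. It also swaps the fixed 1-second sleep for exponential backoff.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for SLNETERR_BSD_EAGAIN. */
#define SKETCH_ERR_EAGAIN (-11)

/* Imitates the fGetChunk callback: >0 bytes read, 0 end of input, <0 error. */
typedef int32_t (*chunk_reader_fn)(void *ctx, uint8_t *buf, int32_t len);

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Read one chunk. On the timeout error, wait backoff_ms and retry, doubling the
   wait each time (capped at max_backoff_ms), for at most max_retries retries.
   Data, end of input, and any other error are returned immediately. */
static int32_t read_chunk_with_retry(chunk_reader_fn reader, void *ctx,
                                     uint8_t *buf, int32_t len,
                                     unsigned max_retries,
                                     long backoff_ms, long max_backoff_ms)
{
    unsigned attempt = 0;
    for (;;) {
        int32_t rc = reader(ctx, buf, len);
        if (rc != SKETCH_ERR_EAGAIN || attempt >= max_retries) {
            return rc;
        }
        attempt++;
        sleep_ms(backoff_ms);
        backoff_ms = (backoff_ms * 2 > max_backoff_ms) ? max_backoff_ms : backoff_ms * 2;
    }
}

/* Illustration only: times out `timeouts_left` times, then returns payload_len bytes. */
struct flaky_reader {
    unsigned timeouts_left;
    int32_t payload_len;
    unsigned calls;
};

static int32_t flaky_read(void *ctx, uint8_t *buf, int32_t len)
{
    struct flaky_reader *r = ctx;
    (void)buf; /* the fake reader does not fill the buffer */
    r->calls++;
    if (r->timeouts_left > 0) {
        r->timeouts_left--;
        return SKETCH_ERR_EAGAIN;
    }
    return r->payload_len < len ? r->payload_len : len;
}
```

A fixed delay, as in the snippet above, is the special case where backoff_ms equals max_backoff_ms. Note that the helper deliberately does not retry on 0, since that value signals end of input rather than a timeout.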

6 months ago

Shlomi Itzhak 6 months ago

TI__Guru 64415 points

Hi,

I am not sure which type of OTA you are using, local or remote; the behavior differs slightly between them.

For local OTA, the NWP acts as the server handling the HTTP transactions (through the NetApp mechanism).

For remote OTA, the server is in the cloud and the HTTP client on the SL device handles the transactions (in practice, SL opens a socket on top of the NWP).

I assume you are using the second option.

In this case, I don't see how the NWP could be busy. The wait simply lasts longer, giving the server more time to respond, and that seems to be the issue, since the request eventually succeeds when you wait longer.

Shlomi

Swapnil 6 months ago in reply to Shlomi Itzhak

Intellectual 550 points

Hello,

Thank you for your reply.

Yes, we are testing the remote OTA.

Also, I don't understand why rc = 0 is returned. Sometimes the update is left incomplete and rc = 0.

In what case does the GetChunk function return zero? If the server disconnected or the socket was closed, I would expect a negative value.

How can we continue the update in this case?

Regards,

Swapnil

Shlomi Itzhak 6 months ago in reply to Swapnil

TI__Guru 64415 points

Hi,

Basically, GetChunkData() calls SlNetSock_recv(), and a return value of 0 should mean that the other side closed the connection.

On an error you would get a negative value, so 0 means the server closed the socket.
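The return-value convention described above, plus the retriable timeout case, can be summarized in a small classifier. This is a hedged sketch: SKETCH_ERR_EAGAIN is a hypothetical placeholder for SLNETERR_BSD_EAGAIN (the real value comes from the SlNetSock headers), and classify_recv() is an illustrative helper, not part of the OTA library.

```c
#include <stdint.h>

/* Hypothetical stand-in for SLNETERR_BSD_EAGAIN. */
#define SKETCH_ERR_EAGAIN (-11)

typedef enum {
    RECV_DATA,        /* rc > 0: rc bytes were received */
    RECV_PEER_CLOSED, /* rc == 0: the server closed the connection */
    RECV_TIMEOUT,     /* receive timeout expired; a retry may succeed */
    RECV_ERROR        /* any other negative code: treat as fatal */
} recv_outcome;

/* Map a recv-style return code onto the cases above. */
static recv_outcome classify_recv(int32_t rc)
{
    if (rc > 0) {
        return RECV_DATA;
    }
    if (rc == 0) {
        return RECV_PEER_CLOSED;
    }
    if (rc == SKETCH_ERR_EAGAIN) {
        return RECV_TIMEOUT;
    }
    return RECV_ERROR;
}
```

The key design point is that RECV_PEER_CLOSED should not be fed into a retry loop: on a BSD-style socket, once the peer has closed the connection, further receive calls keep returning 0, so retrying on the same socket cannot recover the download.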

Do you see cases where the return value is 0 and the update succeeds?

Shlomi

Swapnil 6 months ago in reply to Shlomi Itzhak

Intellectual 550 points

Hello,

No, when SlNetSock_recv() returns zero, the update stops and never succeeds.

Is it possible to reconnect the socket and continue the update from where it was disconnected?

If so, how could it be done?

Regards,

Swapnil

Shlomi Itzhak 6 months ago in reply to Swapnil

TI__Guru 64415 points

So it makes sense now: 0 means the server side closed the connection.

I don't know of a way to resume the download from where it left off, and in any case I don't think that is up to the client, which simply fetches the data from the server.

Shlomi
