SIMPLELINK-WIFI-CC3120-SDK-PLUGIN: SL_DEVICE_EVENT_FATAL_DRIVER_ABORT occurs in sl_SendTo under weak radio quality

Yojiro Yane1

Part Number: SIMPLELINK-WIFI-CC3120-SDK-PLUGIN
Other Parts Discussed in Thread: UNIFLASH, CC3120, CC3100, CC3200

I am an FAE at a distributor in Japan.

My customer has a problem.

SL_DEVICE_EVENT_FATAL_DRIVER_ABORT occurs in sl_SendTo when multiple UDP communications are performed in an environment with weak radio quality.
They checked the details and found that TxPoolCnt was 4 at the beginning of the _SlDrvDataWriteOp function (#1), but 2 at VERIFY_PROTOCOL (#2).
It seems that the spawn task runs between #1 and #2, and the _SlDrvMsgRead function changes TxPoolCnt to 2.

- Is there any condition under which TxPoolCnt is suddenly updated to the minimum value?
- Can you give me advice on how to avoid this issue?
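
To make the reported sequence concrete, here is a minimal sketch that replays it outside the driver. It is a model only, not the host driver code: tx_pool_cnt, async_status_update and replay_sequence are hypothetical names, and FLOW_CONT_MIN is assumed to be 1 (please check driver.h in your SDK). It only shows that a counter overwritten between check #1 and check #2 can pass the first check and fail the second. Whether a change from 4 to 2 actually fails depends on allocTxPoolPkts, which is not stated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration only; verify against driver.h. */
#define FLOW_CONT_MIN 1u

/* Hypothetical stand-in for g_pCB->FlowContCB.TxPoolCnt. */
static volatile uint8_t tx_pool_cnt;

/* Check (#1): true when the caller is allowed to continue to the send path. */
static bool check1_passes(uint32_t alloc)
{
    return tx_pool_cnt > FLOW_CONT_MIN + alloc;
}

/* Check (#2): the condition that VERIFY_PROTOCOL expects to hold. */
static bool check2_passes(uint32_t alloc)
{
    return tx_pool_cnt > FLOW_CONT_MIN + alloc - 1u;
}

/* Hypothetical model of an asynchronous update that overwrites the counter
   with a newly reported value (what the spawn task is suspected of doing). */
static void async_status_update(uint8_t reported)
{
    tx_pool_cnt = reported;
}

/* Replays: check #1, asynchronous update, check #2.
   Returns true when #1 passed but #2 then failed (the abort case). */
bool replay_sequence(uint8_t initial, uint8_t reported, uint32_t alloc)
{
    tx_pool_cnt = initial;
    if (!check1_passes(alloc))
    {
        return false;   /* caller would wait or get EAGAIN; no abort */
    }
    async_status_update(reported);   /* lands between #1 and #2 */
    return !check2_passes(alloc);
}
```

Under these assumptions, replay_sequence(4, 2, 2) reports a failure, while replay_sequence(4, 2, 1) does not, so the number of packets allocated per send matters for reproducing the abort.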

/* ******************************************************************************/
/*   _SlDrvDataWriteOp                                                          */
/* ******************************************************************************/
_SlReturnVal_t _SlDrvDataWriteOp(_SlSd_t Sd,
                                 _SlCmdCtrl_t  *pCmdCtrl,
                                 void                *pTxRxDescBuff,
                                 _SlCmdExt_t         *pCmdExt)
{
    _SlReturnVal_t RetVal = SL_ERROR_BSD_EAGAIN;  /*  initiated as SL_EAGAIN for the non blocking mode */
    _u32 allocTxPoolPkts;

    while(1)
    {
        /*  Do Flow Control check/update for DataWrite operation */
        SL_DRV_OBJ_LOCK_FOREVER(&g_pCB->FlowContCB.TxLockObj);

                                :

        if(g_pCB->FlowContCB.TxPoolCnt <= FLOW_CONT_MIN + allocTxPoolPkts)   /* <---- (#1) */
        {
            /*  we have indication that this socket is set as blocking and we try to  */
            /*  unblock it - return an error */
            if(g_pCB->SocketNonBlocking & (1 << (Sd & SL_BSD_SOCKET_ID_MASK)))
            {
#if (defined (SL_PLATFORM_MULTI_THREADED)) && \
                (!defined (SL_PLATFORM_EXTERNAL_SPAWN))
                if(_SlDrvIsSpawnOwnGlobalLock())
                {
                    _SlInternalSpawnWaitForEvent();
                }
#endif
                SL_DRV_OBJ_UNLOCK(&g_pCB->FlowContCB.TxLockObj);
                return(RetVal);
            }
            /*  If TxPoolCnt was increased by other thread at this moment, */
            /*  TxSyncObj won't wait here */
#if (defined (SL_PLATFORM_MULTI_THREADED)) && \
            (!defined (SL_PLATFORM_EXTERNAL_SPAWN))
            if(_SlDrvIsSpawnOwnGlobalLock())
            {
                while(TRUE)
                {
                    /* If we are in spawn context, this is an API which was called from event handler,
                       read any async event and check if we got signaled */
                    _SlInternalSpawnWaitForEvent();
                    /* is it mine? */
                    if(0 ==
                       sl_SyncObjWait(&g_pCB->FlowContCB.TxSyncObj,
                                      SL_OS_NO_WAIT))
                    {
                        break;
                    }
                }
            }
            else
#endif
            {
                SL_DRV_SYNC_OBJ_WAIT_FOREVER(&g_pCB->FlowContCB.TxSyncObj);
            }
        }
        if(g_pCB->FlowContCB.TxPoolCnt > FLOW_CONT_MIN + allocTxPoolPkts)
        {
            break;
        }
        else
        {
            SL_DRV_OBJ_UNLOCK(&g_pCB->FlowContCB.TxLockObj);
        }
    }

    SL_DRV_LOCK_GLOBAL_LOCK_FOREVER(GLOBAL_LOCK_FLAGS_UPDATE_API_IN_PROGRESS);

    /* In case the global was succesffully taken but error in progress
       it means it has been released as part of an error handling and we should abort immediately */
    if(SL_IS_RESTART_REQUIRED)
    {
        SL_DRV_LOCK_GLOBAL_UNLOCK(TRUE);
        return(SL_API_ABORTED);
    }

    /* Here we consider the case in which some cmd has been sent to the NWP,
       And its allocated packet has not been freed yet. */
    VERIFY_PROTOCOL(g_pCB->FlowContCB.TxPoolCnt >
                    (FLOW_CONT_MIN + allocTxPoolPkts - 1));   /* <---- (#2) */
    g_pCB->FlowContCB.TxPoolCnt -= (_u8)allocTxPoolPkts;

    SL_DRV_OBJ_UNLOCK(&g_pCB->FlowContCB.TxLockObj);

    SL_TRACE1(DBG_MSG, MSG_312, "\n\r_SlDrvCmdOp: call _SlDrvMsgWrite: %x\n\r",
              pCmdCtrl->Opcode);

    /* send the message */
    RetVal = _SlDrvMsgWrite(pCmdCtrl, pCmdExt, pTxRxDescBuff);
    SL_DRV_LOCK_GLOBAL_UNLOCK(TRUE);

    return(RetVal);
}

The environments they use are:

SDK Version : 2.40.00.22 (simplelink_sdk_wifi_plugin_2_40_00_22.exe)
Service Pack : 3.11.1.0_2.0.0.0_2.2.0.6 (CC3x20ServicePack-3.11.1.0_2.0.0.0_2.2.0.6-windows-installer(2.40.00.22).exe)
Host OS : RTX OS (MCU: Renesas RZ/A1H)

Thank you for your advice.

Regards,

Yojiro

over 3 years ago

0 Michael Reymond over 3 years ago

TI__Mastermind 40965 points

Hi Yojiro-san,

Something to try is to update the servicepack to the latest release of the CC32xx SDK. Within the SDK, there is an up-to-date servicepack with the latest fixes including a UDP fix that may help. You can download the SDK here: http://www.ti.com/tool/download/SIMPLELINK-CC32XX-SDK

The servicepack will be in the /tools/cc32xx_tools/servicepack-cc3x20/ directory. The CC32xx servicepack is compatible with the CC3120, and backwards compatible with the host driver version in the 2.40.00.22 SDK, so you can simply use Uniflash to program the new servicepack onto your CC3120.

Let me know if you still run into the same issues and we can try more debug steps.

Regards,

Michael

0 Yojiro Yane1 over 3 years ago in reply to Michael Reymond

Intellectual 580 points

Hi Michael-san,

Thank you for your advice.

My customer has already tested with the latest servicepack (from simplelink_cc32xx_sdk_4_10_00_07.exe),
but it did not solve this issue.

Please let me know the next debug steps.

Regards,

Yojiro

0 Michael Reymond over 3 years ago in reply to Yojiro Yane1

TI__Mastermind 40965 points

Hi Yojiro-san,

The next diagnostic step would be to capture the NWP logs from the device as this error case occurs. Looking at the logs will allow me to see the state of the CC3120 as this error occurs. Please instruct your customer to follow the steps at this page to capture the logs from Pin62 of the device:

https://processors.wiki.ti.com/index.php/CC3120_%26_CC3220_Capture_NWP_Logs

If the customer can also provide me instructions or code to allow me to replicate the issue, that would also be useful for my debug.

Regards,

Michael

0 Yojiro Yane1 over 3 years ago in reply to Michael Reymond

Intellectual 580 points

Hi Michael-san,

I have asked the customer to capture the NWP log, but they cannot capture it in their environment because their board does not bring out the TEST_62 (NWP UART TX) pin.

So again, could you comment on the following question?

- Is there any condition under which TxPoolCnt is suddenly updated to the minimum value?

They modified the first TxPoolCnt check in _SlDrvDataWriteOp from:

    if(g_pCB->FlowContCB.TxPoolCnt <= FLOW_CONT_MIN + allocTxPoolPkts)
    {
        /* we have indication that this socket is set as blocking and we try to */
        /* unblock it - return an error */
        if(g_pCB->SocketNonBlocking & (1 << (Sd & SL_BSD_SOCKET_ID_MASK)))

to:

    if(g_pCB->FlowContCB.TxPoolCnt <= FLOW_CONT_MIN + alpha + allocTxPoolPkts)
    {
        /* we have indication that this socket is set as blocking and we try to */
        /* unblock it - return an error */
        if(g_pCB->SocketNonBlocking & (1 << (Sd & SL_BSD_SOCKET_ID_MASK)))

With this change, the issue no longer occurs.

Is there any problem with this modification?
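
As a rough illustration of what the added margin changes, the sketch below compares the two thresholds. It is not driver code: must_wait, protocol_ok and worst_case_headroom are hypothetical helper names, and FLOW_CONT_MIN is again assumed to be 1. The only point is that check #1 becomes stricter by alpha while check #2 is unchanged, so the smallest count that passes #1 can drop by alpha + 1 before #2 fails. It says nothing about whether the modification is safe with respect to the NWP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration only; verify against driver.h. */
#define FLOW_CONT_MIN 1u

/* Modified check (#1): true when the sender has to wait (blocking socket)
   or gets SL_ERROR_BSD_EAGAIN back (non-blocking socket). */
bool must_wait(uint32_t cnt, uint32_t alloc, uint32_t alpha)
{
    return cnt <= FLOW_CONT_MIN + alpha + alloc;
}

/* Unmodified check (#2): the condition VERIFY_PROTOCOL expects to hold. */
bool protocol_ok(uint32_t cnt, uint32_t alloc)
{
    return cnt > FLOW_CONT_MIN + alloc - 1u;
}

/* For the smallest count that still passes check #1, how far the count
   may drop between #1 and #2 while check #2 still holds. */
uint32_t worst_case_headroom(uint32_t alloc, uint32_t alpha)
{
    uint32_t lowest_passing = FLOW_CONT_MIN + alpha + alloc + 1u; /* passes #1 */
    uint32_t lowest_ok      = FLOW_CONT_MIN + alloc;              /* passes #2 */
    return lowest_passing - lowest_ok;                            /* alpha + 1 */
}
```

With alpha = 0 the headroom is a single packet; with alpha = 5 it is six. The cost is that sends start to wait, or return EAGAIN on non-blocking sockets, while more of the pool is still free.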

I really appreciate the support you’ve given me.

Best Regards,

Yojiro

0 Yojiro Yane1 over 3 years ago in reply to Yojiro Yane1

Intellectual 580 points

Hi Michael-san,

Would you please provide your comment?

Best regards,

Yojiro

0 Jan D over 3 years ago in reply to Yojiro Yane1

Guru 73215 points

Hi,

A quick comment.

I have seen an issue with "underflow" of TxPoolCnt on the previous generation of devices (CC3200/CC3100). I have never seen this issue on CC3220 or CC3120 devices. The way the issue was triggered was similar to yours (high traffic with a poor signal).

You can search the E2E forum for the keyword "TxPoolCnt". Maybe you will find some advice that works for you.

Jan

0 Yojiro Yane1 over 3 years ago in reply to Jan D

Intellectual 580 points

Hi Jan-san,

Thank you for your comment.

We have already looked at past threads on "TxPoolCnt".
I think those describe ways to avoid SL_DEVICE_EVENT_FATAL_DRIVER_ABORT when a problem occurs.

What we want to confirm is:

Is there any condition under which TxPoolCnt is suddenly updated to the minimum value on CC3120/CC3220?

Best Regards,

Yojiro

0 Jan D over 3 years ago in reply to Yojiro Yane1

Guru 73215 points

Hi Yojiro,

I am sorry, I am not able to answer your question. Please wait for an answer from a TI engineer.

Jan

0 Michael Reymond over 3 years ago in reply to Jan D

TI__Mastermind 40965 points

Hi Yojiro-san,

What is 'alpha' set as in the modified host driver code? TxPoolCnt is a counter used for flow control between the host MCU and the CC3120. Modifying the host driver code to allow for additional send commands after TxPoolCnt is exhausted is potentially unsafe. TxPoolCnt is set to 0 in the case of a deinit of the host driver, which could happen as part of a driver abort. If your customer edits that TxPoolCnt check, do they no longer get the abort?

Getting the NWP logs out of pin62 so that I can examine the state of the NWP would be useful, so if you can perform the needed modification to extract those logs that would be greatly appreciated. There may be other causes of TxPoolCnt being set to 0, notably in the case of memory corruption, and looking at the NWP logs would be useful for me.

Alternatively, being able to replicate the error on my setup would help with debug. Having the instructions and code to replicate would be useful.

Regards,

Michael

0 Yojiro Yane1 over 3 years ago in reply to Michael Reymond

Intellectual 580 points

Hi Michael-san,

Thank you for your support.

'alpha' is a constant that adds a margin to the TxPoolCnt check. They evaluated 'alpha' values from 5 to 10.
With this edit, TxPoolCnt no longer reaches the minimum value (FLOW_CONT_MIN) and aborts no longer occur.

Regarding the NWP log: on their board, the unused pins are left unconnected, so a lead wire cannot be attached. Therefore, they cannot capture the logs.

This problem occurs when three or more UDP communications are sending and receiving in separate tasks.

Can the NWP report a TxPoolCnt at the minimum value while the TxPoolCnt tracked by the host driver is not at the minimum, when the host is performing a lot of communication?

Best Regards,
Yojiro
