SIMPLELINK-WIFI-CC3120-SDK-PLUGIN: SL_DEVICE_EVENT_FATAL_DRIVER_ABORT occurs in sl_SendTo under weak radio quality

Part Number: SIMPLELINK-WIFI-CC3120-SDK-PLUGIN
Other Parts Discussed in Thread: UNIFLASH, CC3120, CC3100, CC3200

I am an FAE at a distributor in Japan.

My customer has a problem.

SL_DEVICE_EVENT_FATAL_DRIVER_ABORT occurs in sl_SendTo when multiple UDP communications are performed in an environment with weak radio quality.
They checked the details and found that TxPoolCnt was 4 at the first check in the _SlDrvDataWriteOp function (#1), but 2 at VERIFY_PROTOCOL (#2).
It seems that the spawn task runs between #1 and #2, and the _SlDrvMsgRead function changes TxPoolCnt to 2.

- Is there anything that could cause TxPoolCnt to suddenly be updated to the minimum value?
- Can you give me advice on how to avoid this issue?

/* ******************************************************************************/
/*   _SlDrvDataWriteOp                                                          */
/* ******************************************************************************/
_SlReturnVal_t _SlDrvDataWriteOp(_SlSd_t Sd,
                                 _SlCmdCtrl_t  *pCmdCtrl,
                                 void                *pTxRxDescBuff,
                                 _SlCmdExt_t         *pCmdExt)
{
    _SlReturnVal_t RetVal = SL_ERROR_BSD_EAGAIN;  /*  initiated as SL_EAGAIN for the non blocking mode */
    _u32 allocTxPoolPkts;

    while(1)
    {
        /*  Do Flow Control check/update for DataWrite operation */
        SL_DRV_OBJ_LOCK_FOREVER(&g_pCB->FlowContCB.TxLockObj);

                                :

        if(g_pCB->FlowContCB.TxPoolCnt <= FLOW_CONT_MIN + allocTxPoolPkts)   /* ----> (#1) */
        {
            /*  we have indication that this socket is set as blocking and we try to  */
            /*  unblock it - return an error */
            if(g_pCB->SocketNonBlocking & (1 << (Sd & SL_BSD_SOCKET_ID_MASK)))
            {
#if (defined (SL_PLATFORM_MULTI_THREADED)) && \
                (!defined (SL_PLATFORM_EXTERNAL_SPAWN))
                if(_SlDrvIsSpawnOwnGlobalLock())
                {
                    _SlInternalSpawnWaitForEvent();
                }
#endif
                SL_DRV_OBJ_UNLOCK(&g_pCB->FlowContCB.TxLockObj);
                return(RetVal);
            }
            /*  If TxPoolCnt was increased by other thread at this moment, */
            /*  TxSyncObj won't wait here */
#if (defined (SL_PLATFORM_MULTI_THREADED)) && \
            (!defined (SL_PLATFORM_EXTERNAL_SPAWN))
            if(_SlDrvIsSpawnOwnGlobalLock())
            {
                while(TRUE)
                {
                    /* If we are in spawn context, this is an API which was called from event handler,
                       read any async event and check if we got signaled */
                    _SlInternalSpawnWaitForEvent();
                    /* is it mine? */
                    if(0 ==
                       sl_SyncObjWait(&g_pCB->FlowContCB.TxSyncObj,
                                      SL_OS_NO_WAIT))
                    {
                        break;
                    }
                }
            }
            else
#endif
            {
                SL_DRV_SYNC_OBJ_WAIT_FOREVER(&g_pCB->FlowContCB.TxSyncObj);
            }
        }
        if(g_pCB->FlowContCB.TxPoolCnt > FLOW_CONT_MIN + allocTxPoolPkts)
        {
            break;
        }
        else
        {
            SL_DRV_OBJ_UNLOCK(&g_pCB->FlowContCB.TxLockObj);
        }
    }

    SL_DRV_LOCK_GLOBAL_LOCK_FOREVER(GLOBAL_LOCK_FLAGS_UPDATE_API_IN_PROGRESS);

    /* In case the global was successfully taken but error in progress
       it means it has been released as part of an error handling and we should abort immediately */
    if(SL_IS_RESTART_REQUIRED)
    {
        SL_DRV_LOCK_GLOBAL_UNLOCK(TRUE);
        return(SL_API_ABORTED);
    }

    /* Here we consider the case in which some cmd has been sent to the NWP,
       And its allocated packet has not been freed yet. */
    VERIFY_PROTOCOL(g_pCB->FlowContCB.TxPoolCnt >
                    (FLOW_CONT_MIN + allocTxPoolPkts - 1));   /* ----> (#2) */
    g_pCB->FlowContCB.TxPoolCnt -= (_u8)allocTxPoolPkts;

    SL_DRV_OBJ_UNLOCK(&g_pCB->FlowContCB.TxLockObj);

    SL_TRACE1(DBG_MSG, MSG_312, "\n\r_SlDrvCmdOp: call _SlDrvMsgWrite: %x\n\r",
              pCmdCtrl->Opcode);

    /* send the message */
    RetVal = _SlDrvMsgWrite(pCmdCtrl, pCmdExt, pTxRxDescBuff);
    SL_DRV_LOCK_GLOBAL_UNLOCK(TRUE);

    return(RetVal);
}
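
To make the reported sequence concrete, here is a minimal standalone sketch of the two comparisons. It is not SDK code: the helper names are hypothetical, and the values used in the usage note (FLOW_CONT_MIN = 2, allocTxPoolPkts = 1) are assumptions chosen only because they are consistent with the observed counts (4 at #1, 2 at #2). The real values in the host driver may differ.

```c
#include <stdbool.h>

/* Hypothetical helper: mirrors the comparison at (#1).
   Returns true when the caller may proceed without waiting. */
static bool entry_check_passes(unsigned txPoolCnt, unsigned flowContMin,
                               unsigned allocTxPoolPkts)
{
    return txPoolCnt > flowContMin + allocTxPoolPkts;
}

/* Hypothetical helper: mirrors the VERIFY_PROTOCOL condition at (#2).
   Returns false in the case where the driver would abort. */
static bool verify_passes(unsigned txPoolCnt, unsigned flowContMin,
                          unsigned allocTxPoolPkts)
{
    return txPoolCnt > flowContMin + allocTxPoolPkts - 1;
}
```

With the assumed values, entry_check_passes(4, 2, 1) is true, so the function proceeds past #1. If TxPoolCnt is then overwritten with 2 before #2 is reached, verify_passes(2, 2, 1) is false, which corresponds to the abort. A drop of only one packet (4 to 3) would still pass #2.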

The environments they use are:

  • SDK Version : 2.40.00.22 (simplelink_sdk_wifi_plugin_2_40_00_22.exe)
  • Service Pack : 3.11.1.0_2.0.0.0_2.2.0.6 (CC3x20ServicePack-3.11.1.0_2.0.0.0_2.2.0.6-windows-installer(2.40.00.22).exe)
  • Host OS : RTX OS (MCU: Renesas RZ/A1H)

Thank you for your advice.

Regards,

Yojiro

  • Hi Yojiro-san,

    Something to try is to update the servicepack to the latest release of the CC32xx SDK. Within the SDK, there is an up-to-date servicepack with the latest fixes including a UDP fix that may help. You can download the SDK here: http://www.ti.com/tool/download/SIMPLELINK-CC32XX-SDK

    The servicepack will be in /tools/cc32xx_tools/servicepack-cc3x20/ directory. The CC32xx servicepack is compatible with the CC3120, and backwards compatible with the host driver version in the 2.40.00.22 SDK, so you can simply use Uniflash to program the new servicepack onto your CC3120.

    Let me know if you still run into the same issues and we can try more debug steps.

    Regards,

    Michael

  • Hi Michael-san,

    Thank you for your advice.

    My customer has already tested with the latest servicepack (from simplelink_cc32xx_sdk_4_10_00_07.exe),
    but it did not solve this issue.

    Please let me know the next debug steps.

    Regards,

    Yojiro

  • Hi Yojiro-san,

    The next diagnostic step would be to capture the NWP logs from the device as this error case occurs. Looking at the logs will allow me to see the state of the CC3120 as this error occurs. Please instruct your customer to follow the steps at this page to capture the logs from Pin62 of the device:

    https://processors.wiki.ti.com/index.php/CC3120_%26_CC3220_Capture_NWP_Logs

    If the customer can also provide me instructions or code to allow me to replicate the issue, that would also be useful for my debug.

    Regards,

    Michael

  • Hi Michael-san,

    I asked the customer to capture the NWP log, but they cannot capture it in their environment because the TEST_62 (NWP UART TX) pin is not brought out on their board.

    So again, could you comment on the following question?

    - Is there anything that could cause TxPoolCnt to suddenly be updated to the minimum value?

    They changed the first TxPoolCnt check in _SlDrvDataWriteOp from
            if(g_pCB->FlowContCB.TxPoolCnt <= FLOW_CONT_MIN + allocTxPoolPkts) 

            {
                /*  we have indication that this socket is set as blocking and we try to  */
                /*  unblock it - return an error */
                if(g_pCB->SocketNonBlocking & (1 << (Sd & SL_BSD_SOCKET_ID_MASK)))
     to
            if(g_pCB->FlowContCB.TxPoolCnt <= FLOW_CONT_MIN + alpha + allocTxPoolPkts)
            {
                /*  we have indication that this socket is set as blocking and we try to  */
                /*  unblock it - return an error */
                if(g_pCB->SocketNonBlocking & (1 << (Sd & SL_BSD_SOCKET_ID_MASK)))
    With this change, the issue no longer occurs.
    Is there any problem with this modification?
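
As a rough illustration of what the margin does, the sketch below computes how far TxPoolCnt can fall between the modified entry check and the unchanged VERIFY_PROTOCOL check before the latter fails. The function names are hypothetical and the arithmetic is derived only from the two comparisons quoted in this thread. It is a model of the checks, not a statement about how the NWP behaves.

```c
/* Smallest TxPoolCnt that passes the modified entry check:
   TxPoolCnt > FLOW_CONT_MIN + alpha + allocTxPoolPkts */
static unsigned min_count_passing_entry(unsigned flowContMin, unsigned alpha,
                                        unsigned allocTxPoolPkts)
{
    return flowContMin + alpha + allocTxPoolPkts + 1;
}

/* Smallest TxPoolCnt that satisfies the unchanged VERIFY_PROTOCOL check:
   TxPoolCnt > FLOW_CONT_MIN + allocTxPoolPkts - 1 */
static unsigned min_count_passing_verify(unsigned flowContMin,
                                         unsigned allocTxPoolPkts)
{
    return flowContMin + allocTxPoolPkts;
}

/* Largest drop between the two checks that never trips VERIFY_PROTOCOL,
   measured from the lowest count the entry check lets through. */
static unsigned max_tolerated_drop(unsigned flowContMin, unsigned alpha,
                                   unsigned allocTxPoolPkts)
{
    return min_count_passing_entry(flowContMin, alpha, allocTxPoolPkts)
         - min_count_passing_verify(flowContMin, allocTxPoolPkts);
}
```

The result is always alpha + 1, so without a margin only a drop of one packet is tolerated, while alpha = 5 tolerates a drop of six. The margin therefore widens the window rather than explaining why the count drops, and non-blocking sockets will get SL_ERROR_BSD_EAGAIN at a higher TxPoolCnt than before.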

    I really appreciate the support you’ve given me.
    Best Regards,
    Yojiro
  • Hi Michael-san,

    Would you please provide your comment?


    Best regards,

    Yojiro

  • Hi,

    A quick comment.

    I have seen an issue with "underflow" of TxPoolCnt on the previous generation of devices (CC3200/CC3100). I have never seen this issue on CC3220 or CC3120 devices. The way the issue was triggered was similar to yours (high traffic with poor signal).

    You can search the E2E forum for the keyword "TxPoolCnt". Maybe you will find some advice that works for you.

    Jan

  • Hi Jan-san,

    Thank you for your comment.

    We have already looked through the past threads on "TxPoolCnt".
    I think those threads describe ways to avoid SL_DEVICE_EVENT_FATAL_DRIVER_ABORT when the problem occurs.

    What we want to confirm is:

    Is there anything that could cause TxPoolCnt to suddenly be updated to the minimum value

    on the CC3120/CC3220?

    Best Regards,

    Yojiro

  • Hi Yojiro,

    I am sorry, I am not able to answer your question. Please wait for an answer from a TI engineer.

    Jan

  • Hi Yojiro-san,

    What is 'alpha' set to in the modified host driver code? TxPoolCnt is a counter used for flow control between the host MCU and the CC3120. Modifying the host driver code to allow for additional send commands after TxPoolCnt is exhausted is potentially unsafe. TxPoolCnt is set to 0 in the case of a deinit of the host driver, which could happen as part of a driver abort. If your customer edits that TxPoolCnt check, do they no longer get the abort?

    Getting the NWP logs out of pin62 so that I can examine the state of the NWP would be useful, so if you can perform the needed modification to extract those logs that would be greatly appreciated. There may be other causes of TxPoolCnt being set to 0, notably in the case of memory corruption, and looking at the NWP logs would be useful for me.

    Alternatively, being able to replicate the error on my setup would help with debug. Having the instructions and code to replicate would be useful.

    Regards,

    Michael

  • Hi Michael-san,

    Thank you for your support.

    'alpha' is a constant that adds a margin to the TxPoolCnt check. They evaluated 'alpha' values from 5 to 10.
    With this edit, TxPoolCnt no longer reaches the minimum value (FLOW_CONT_MIN) and aborts no longer occur.

    Regarding the NWP log, the unused pins on their board are left unconnected, so a lead wire cannot be attached. Therefore, they cannot get the logs.

    The problem occurs when three or more UDP communications are sending and receiving in separate tasks.

    Can the NWP report the minimum TxPoolCnt value even though the TxPoolCnt managed by the host driver is not at the minimum and the host is performing a lot of communication?

    Best Regards,
    Yojiro