BOOSTXL-CC3120MOD: WiFi SDK: Problem with Non-Blocking sl_Accept()

Part Number: BOOSTXL-CC3120MOD
Other Parts Discussed in Thread: CC3120MOD

Hi,

I have run into a major problem in the port of the WiFi SDK for an MSP432 + CC3120MOD project. I'm trying to set up a simple listening TCP server:

sl_Start appears to run fine, and a short time later my start callback function receives notification of startup in ROLE_AP as expected. Subsequent calls to sl_Socket, sl_SetSockOpt (to set it to non-blocking), sl_Bind, and sl_Listen all appear to return the expected results.

But the subsequent call to sl_Accept fails. The call appears to block in a semaphore Pend call until the Pend timeout value is hit, at which point it returns with -2005 (Internal Error). I also get a Fatal Error callback at that point, with an ID of 5 (Command Timeout) and Code 4099, Value 0 (no idea what that means).

I've done some further debugging with a 'scope to try to understand the sequence of events in more depth (all timing approximate):

  1. About 60us after sl_Accept is called, a semaphore Pend is entered;
  2. About 200us later, the module IRQ line goes high; my interrupt dispatcher runs the registered SDK interrupt routine, a semaphore Post is generated, the Pend is released, and the module IRQ line returns to low; the interrupt routine returns. So far so good.
  3. About 150us later, a semaphore Pend is again entered - this is the one that eventually times out;
  4. About 100us later, the module IRQ line goes high, and my interrupt dispatcher runs the registered SDK interrupt routine again; but this time, when the interrupt routine returns, the module IRQ line is left high (?) and no semaphore Post has been generated, so the Pend continues to block until it eventually times out and fails.

Development setup is a CC3120MOD Boosterpack mounted on an MSP432 Boosterpack, programmed with IAR development environment V8.11. Software is running on bare metal, no RTOS.

I'd be very grateful for any thoughts or suggestions from anyone with more detailed knowledge of the workings of the SDK.... I'm at a dead end and can't really allocate too much more time to trying to debug this, so I think I'm going to have to abandon using this module, which would be a shame.... it seems so close to working....

(Just to clarify: I've also tried this without setting the socket to non-blocking, and I get exactly the same failure at sl_Accept).

Thanks and Best wishes,

John

  • Hi John,

    Which SDK are you using? And is there a specific software example you've been using to guide you?

    Is this the same semaphore being pended and at the same location? Can you locate where it's supposed to post and ensure that it gets there in your program?

    Hope to help you get this working!

    Best,
    Kevin
  • Hi Kevin, thanks for the reply, sorry for delay, I wasn't in office yesterday.....

    The SDK is "simplelink_sdk_wifi_plugin_1_55_00_42". I'm not using any specific example to guide me - I've been following the Porting Guide section of the Programming Manual, together with the comments in the user.h file, etc. I'm then following the function call sequence as documented in the Programming Manual, but I feel that this is quite a low-level problem, nothing to do with the high-level user function calls being wrong.

    The semaphore which is correctly released by the first interrupt is NOT the same semaphore as the one which later remains blocked and times out.... but the semaphore that times out IS always the same one, being pended at always the same time after sl_Accept() runs.... the sequence of events is totally reproducible.....

    I'm not sure what to look at next..... I feel it likely that the second call to the interrupt routine is not seeing what it expected or otherwise doing the right thing, but I'm well off the end of my knowledge there... Any help much appreciated!

    Best wishes, John

  • Hi John,

    No worries. Your 2nd semaphore is a timed wait/pend, correct? Is your 1st semaphore timed as well or just a normal sem pend/wait?

    Maybe the time limit set for the 2nd semaphore is being reached before the semaphore is posted. Have you found where the 2nd semaphore is supposed to post, and made sure that your program actually gets there? Increasing the semaphore time limit may fix the issue, but not if your program can never get to where it posts.

    I think that taking some debug measures around the sem post could help you figure out the issue (i.e. add debug print statements, add breakpoints, add debug variables and check them in the expressions window, etc.).

    Hope this helps,
    Kevin
  • Thanks Kevin, I'll investigate that. I honestly can't remember if the first semaphore is timed or not..... the one that fails times out after about 10 seconds (the 'timeout' value is 10000) so I doubt that it's too short.....

    I'll do as you suggest and try to trace where the semaphore post should occur..... I'll get back to you!

    Thanks! John
  • John,

    Sounds like a good plan. Let me know what you find.

    Best,
    Kevin
  • Hi Kevin, sorry for delay, got pulled off onto other stuff.....

    Well, I'm a bit bemused at what I've found, and thinking that I've misunderstood the business of running this driver in a non-operating-system environment:

    The Semaphore Pend is entered very soon in the sl_Accept() routine:

    _i16 sl_Accept(_i16 sd,
                   SlSockAddr_t *addr,
                   SlSocklen_t *addrlen)
    {
        _SlSockAcceptMsg_u Msg;
        _SlReturnVal_t RetVal;
        SlSocketAddrResponse_u AsyncRsp;

        _u8 ObjIdx = MAX_CONCURRENT_ACTIONS;

        /* verify that this api is allowed. if not allowed then
           ignore the API execution and return immediately with an error */
        VERIFY_API_ALLOWED(SL_OPCODE_SILO_SOCKET);

        Msg.Cmd.Sd = (_u8)sd;

        if((addr != NULL) && (addrlen != NULL))
        {
            /* If addr is present, addrlen has to be provided */
            Msg.Cmd.Family =
                (_u8)((sizeof(SlSockAddrIn_t) ==
                       *addrlen) ? SL_AF_INET : SL_AF_INET6);
        }
        else
        {
            /* In any other case, addrlen is ignored */
            Msg.Cmd.Family = (_u8)0;
        }

        ObjIdx = _SlDrvProtectAsyncRespSetting((_u8*)&AsyncRsp, ACCEPT_ID,
                                               (_u8)sd & SL_BSD_SOCKET_ID_MASK);

        if(MAX_CONCURRENT_ACTIONS == ObjIdx)
        {
            return(SL_POOL_IS_EMPTY);
        }

        /* send the command */
        VERIFY_RET_OK(_SlDrvCmdOp((_SlCmdCtrl_t *)&_SlAcceptCmdCtrl, &Msg, NULL));
        VERIFY_PROTOCOL(Msg.Rsp.Sd == (_u8)sd);

        RetVal = Msg.Rsp.StatusOrLen;

        if(SL_OS_RET_CODE_OK == RetVal)
        {
    #ifndef SL_TINY
            /* in case socket is non-blocking one, the async event should be received immediately */
            if(g_pCB->SocketNonBlocking & (1 << (sd & SL_BSD_SOCKET_ID_MASK)))
            {
                SL_DRV_SYNC_OBJ_WAIT_TIMEOUT(&g_pCB->ObjPool[ObjIdx].SyncObj,
                                             SL_DRIVER_TIMEOUT_SHORT,
                                             SL_OPCODE_SOCKET_ACCEPTASYNCRESPONSE
                                             );
            }
            else
    #endif

    When the interrupt occurs, the driver interrupt routine runs and appears to simply do a Spawn:

    _SlReturnVal_t _SlDrvRxIrqHandler(void *pValue)
    {
        (void)pValue;

        sl_IfMaskIntHdlr();

        RxIrqCnt++;

        if(TRUE == g_pCB->WaitForCmdResp)
        {
            OSI_RET_OK_CHECK(sl_SyncObjSignalFromIRQ(&g_pCB->CmdSyncObj));
        }
        else
        {
            (void)sl_Spawn((_SlSpawnEntryFunc_t)_SlDrvMsgReadSpawnCtx, NULL,
                           SL_SPAWN_FLAG_FROM_SL_IRQ_HANDLER);
        }
        return(SL_OS_RET_CODE_OK);
    }

    The interrupt routine then returns, and code execution returns to the Semaphore Pend routine..... and stays there until the Pend times out....

    What is that Spawn trying to achieve? Given that I'm running in a non-operating-system environment, the CPU code execution is effectively captured by the Pend routine unless an interrupt occurs.... The Spawn routine called is in "nonos.c" and simply sets some variables:

    #ifndef SL_PLATFORM_MULTI_THREADED

    #include "nonos.h"

    _SlNonOsCB_t g__SlNonOsCB;

    _SlNonOsRetVal_t _SlNonOsSpawn(_SlSpawnEntryFunc_t pEntry,
                                   void* pValue,
                                   _u32 flags)
    {
        _i8 i = 0;

        /* The parameter is currently not in use */
        (void)flags;

    #ifndef SL_TINY
        for(i = 0; i < NONOS_MAX_SPAWN_ENTRIES; i++)
    #endif
        {
            _SlNonOsSpawnEntry_t* pE = &g__SlNonOsCB.SpawnEntries[i];

            if(pE->IsAllocated == FALSE)
            {
                pE->pValue = pValue;
                pE->pEntry = pEntry;
                pE->IsAllocated = TRUE;
    #ifndef SL_TINY
                break;
    #endif
            }
        }

        return(NONOS_RET_OK);
    }

    Can you explain what should be happening here? Sorry if I'm missing something obvious!

    Thanks, Best wishes,

    John

  • Hi John,

    Are you calling sl_Task() at any point within your program? The spawn entries created within the interrupt by sl_Spawn may not be serviced if sl_Task() is not called, because _SlNonOsHandleSpawnTask (within nonos.c), which services these entries, or "tasks", will never be reached.

    I am no expert on this matter, but I'll try my best to explain the Spawn function.

    When sl_Spawn (and therefore _SlNonOsSpawn) is called within the interrupt, an entry is added to an array. Numerous entries can be put in this array, and when _SlNonOsHandleSpawnTask is entered (that is, when sl_Task() is called), the spawned functions that correspond to the entries are executed in the order in which they were added.
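
    The deferral described here can be illustrated with a minimal host-side sketch. This is not the SDK's nonos.c: every name in it (spawn, handle_spawn_task, count_event, MAX_SPAWN_ENTRIES) is hypothetical, and it assumes a simple FIFO ring buffer. The point it shows is that "spawning" from the interrupt only records work, and nothing runs until the handler is called from the main context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a non-OS spawn queue; not the SDK's implementation. */
#define MAX_SPAWN_ENTRIES 4

typedef void (*spawn_fn)(void *arg);

typedef struct {
    spawn_fn fn;
    void *arg;
} spawn_entry;

static spawn_entry queue[MAX_SPAWN_ENTRIES];
static size_t head;   /* index of the oldest pending entry */
static size_t count;  /* number of pending entries */

/* Intended to be called from interrupt context: records the work only.
 * On real hardware, head/count would need protection against the ISR. */
int spawn(spawn_fn fn, void *arg)
{
    assert(fn != NULL);
    if (count == MAX_SPAWN_ENTRIES) {
        return -1;                      /* queue full */
    }
    size_t tail = (head + count) % MAX_SPAWN_ENTRIES;
    queue[tail].fn = fn;
    queue[tail].arg = arg;
    count++;
    return 0;
}

/* Intended to be called from the main context: runs every pending entry,
 * oldest first, and returns how many entries were run. */
int handle_spawn_task(void)
{
    int ran = 0;
    while (count > 0) {
        spawn_entry e = queue[head];
        head = (head + 1) % MAX_SPAWN_ENTRIES;
        count--;
        e.fn(e.arg);
        ran++;
    }
    return ran;
}

/* Example deferred handler: increments the counter it is handed. */
void count_event(void *arg)
{
    (*(int *)arg)++;
}
```

    In this model, two calls to spawn(count_event, &events) leave events untouched; only a later call to handle_spawn_task() runs both handlers. If the handler is never called, the queued work never happens, which matches the symptom in this thread.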

    If the SL_DRV_SYNC_OBJ_WAIT_TIMEOUT call is timing out, then whatever is supposed to signal that sync object is not running before the timeout is reached.

    Could you provide some simple code from your project to help replicate what you are seeing?

    Best,
    Kevin
  • Hi Kevin,

    Thanks for reply! Now, what you say is somewhat what I'd guessed from that latest investigation, but it implies that the information in the Porting Guide is very incomplete....

    The Porting Guide in the WiFi Library Programmers Guide says about sl_Task() in non-operating-system environment (page 242, and I'm sure in other places) "This function must be called from the main loop in non-OS platform..."

    And I am indeed doing that.... sl_Task() is being called each time round my main loop. But the problem in this case is that the main loop is effectively being frozen by the call to sl_Accept() blocking at the Semaphore Pend..... so sl_Task() by definition cannot run during the execution of sl_Accept()......

    So, from what you're saying, sl_Task() running each time round the loop is actually not sufficient, because it has to run during the execution of some other function calls, not just between calls. Is that correct?
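
    The starvation being described can be reproduced in a small host-side model. This is only a sketch under assumptions: the names (sync_obj, defer_from_irq, run_deferred_work, wait_naive, wait_pumping) are hypothetical, run_deferred_work merely stands in for the role sl_Task() plays, and the timeout is counted in polls rather than milliseconds so the code runs anywhere. It contrasts a wait that only spins with a wait that services deferred work on each pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sync object: a flag that deferred work sets when it runs. */
typedef struct {
    volatile bool signaled;
} sync_obj;

typedef void (*job_fn)(void *arg);

/* Single deferred-job slot, filled by the "interrupt". */
static job_fn pending_fn;
static void *pending_arg;

/* Models the IRQ handler deferring its work instead of doing it. */
void defer_from_irq(job_fn fn, void *arg)
{
    pending_fn = fn;
    pending_arg = arg;
}

/* Stand-in for the main-loop task: runs the deferred job, if any. */
void run_deferred_work(void)
{
    if (pending_fn != NULL) {
        job_fn fn = pending_fn;
        pending_fn = NULL;
        fn(pending_arg);
    }
}

/* Deferred job used in the demo: signals the sync object. */
void signal_obj(void *arg)
{
    ((sync_obj *)arg)->signaled = true;
}

/* Naive wait: only polls the flag, so deferred work can never run. */
int wait_naive(sync_obj *obj, unsigned max_polls)
{
    assert(obj != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (obj->signaled) {
            obj->signaled = false;
            return 0;
        }
    }
    return -1;                          /* timed out */
}

/* Pumping wait: services deferred work on every pass of the loop. */
int wait_pumping(sync_obj *obj, unsigned max_polls)
{
    assert(obj != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        run_deferred_work();
        if (obj->signaled) {
            obj->signaled = false;
            return 0;
        }
    }
    return -1;                          /* timed out */
}
```

    With a job deferred via defer_from_irq(signal_obj, &obj), wait_naive times out however large its budget, while wait_pumping returns on its first pass. Whether a particular port should pump inside its own wait routine or run the task from elsewhere is a design decision this sketch does not settle.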

    If so, then that's not at all what the documentation implies (to me at least!), but is pretty easy to achieve. Can you confirm that sl_Task() is OK to be called completely asynchronously to the main program flow - that it doesn't matter if other WiFi library functions are executing when sl_Task() runs? If so, then I can call it from a low-priority timer interrupt so that it always runs regularly..... How often does sl_Task() ideally need to run? Is once every 100ms OK?

    Thanks for your help!

    John

  • Hi Kevin,

    Just further to this: I had a few minutes to play, so I tried dropping sl_Task() into a timer interrupt routine so that it runs every 200ms, asynchronously to the main loop..... and sl_Accept() now returns promptly with SL_ERROR_BSD_EAGAIN, which is what it should be doing.....

    Furthermore, if I use my laptop to connect to the WiFi Module and open a TCP connection to the port being listened on, sl_Accept() returns a sensible-looking client socket number instead of SL_ERROR_BSD_EAGAIN, and my state machine moves on as expected....
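
    The polling pattern implied here (try again on "would block", move on when a client socket comes back) might look roughly like the sketch below. It is a host-side illustration only: server, server_poll and fake_accept are hypothetical names, the accept call is injected as a function pointer rather than being the real sl_Accept(), and ERR_WOULD_BLOCK is a placeholder for the SDK's SL_ERROR_BSD_EAGAIN constant, whose value should be taken from the SDK headers.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for the SDK's SL_ERROR_BSD_EAGAIN; the value is an assumption. */
#define ERR_WOULD_BLOCK (-11)

/* Hypothetical wrapper type around a non-blocking accept call. */
typedef int (*accept_fn)(int listen_sd);

typedef enum {
    SRV_LISTENING,
    SRV_CONNECTED,
    SRV_FAILED
} server_state;

typedef struct {
    server_state state;
    int listen_sd;
    int client_sd;
} server;

/* One main-loop step: poll for a client without ever blocking. */
server_state server_poll(server *s, accept_fn try_accept)
{
    assert(s != NULL && try_accept != NULL);
    if (s->state != SRV_LISTENING) {
        return s->state;
    }
    int rc = try_accept(s->listen_sd);
    if (rc >= 0) {
        s->client_sd = rc;              /* got a client socket */
        s->state = SRV_CONNECTED;
    } else if (rc != ERR_WOULD_BLOCK) {
        s->state = SRV_FAILED;          /* a real error, not "try again" */
    }
    return s->state;
}

/* Fake accept for the demo: no client on the first two polls,
 * then returns client socket 1. */
static int fake_polls;

int fake_accept(int listen_sd)
{
    (void)listen_sd;
    return (++fake_polls < 3) ? ERR_WOULD_BLOCK : 1;
}
```

    Driving server_poll(&s, fake_accept) from a loop leaves the state at SRV_LISTENING for the first two passes and switches it to SRV_CONNECTED on the third, with client_sd set to 1.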

    So it's very possible that all is now working.... As I'm building my application as I go along I've not yet got any data transfer code to test, and I won't have a chance to do much more before next week, but it's all looking very hopeful....

    So I'm going to mark this thread as 'resolved', as any further issues would probably warrant a new thread title. Thank you very much for your help, I'm very grateful! I would suggest, though, that you feed back to the documentation team that the documentation on porting the driver is very misleading and could do with thorough review by someone who really knows about the library in depth.

    Thanks again!
    Best wishes,
    John