This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

[NDK] Crash on DaemonFree

Hi everyone,


I have one strange issue with the NDK 2.24. When calling DaemonFree the Application crashes. The reason (as far as I can tell) is:

void DaemonFree( HANDLE hEntry )
{
    DREC *pdr = (DREC *)hEntry;
    CHILD *pc;

    /* At this point we must have a semaphore */
    if( !hDSem )
        return;

    /* Enter our own lock */
    SemPend( hDSem, SEM_FOREVER );

    /* Sanity check */
    if( pdr->Type!=SOCK_STREAM && pdr->Type!=SOCK_STREAMNC && pdr->Type!=SOCK_DGRAM )
        goto errorout;

    /* Clear the record */
    pdr->Type = 0;
    RequestedRecords--;

    /* Close the socket session of all children. This will */
    /* cause them to eventually fall out of their code and */
    /* close their sockets */
    pc = pdr->pC;
    while( pc )
    {
        if( pc->hTask )
        {
            fdCloseSession( pc->hTask );
            pc->hTask = 0;
        }
        pc = pc->pNext;
    }

    /* Close the socket (this will wake anyone who's using it) */
    if( pdr->s != INVALID_SOCKET )
    {
        fdClose( pdr->s );
        pdr->s = INVALID_SOCKET;
    }

    /* If there are no more records, close the daemon task's */
    /* file descriptor session. That will cause it to error */
    /* out and remove itself */
    if( !RequestedRecords )
    {
        fdCloseSession( hDTask );
        hDTask = 0;
    }

errorout:
    /* Exit our lock */
    SemPost( hDSem );
}

As you can see DaemonFree calls fdClose(pdr->s) and invalidates the socket (pdr->s = INVALID_SOCKET). However, after that the main daemon thread crashes in the following code:

                    if( bind( drec[i].s,(struct sockaddr *)&sin1, sizeof(sin1) ) < 0 )
                    {
                        fdClose( drec[i].s );
                        drec[i].s = INVALID_SOCKET;
                    }

Here fdClose tries to close the already closed socket again... which crashes since fdClose doesn't check for INVALID_SOCKET.

Any ideas why and how?

Greetings,

Thomas

  • Hi Thomas,

    Can you please share your code so that we can reproduce the issue? Also can you share the full version of NDK and other TI products that you are using? Which board are you running this on?

    Thanks,
    Vikram
  • Hi Virkam,

    unfortunately I cannot provide any simple example code since this would take some effort (which I do not have time for right now). Sorry.

    However, I think I understand the circumstances of the crash now better. It is actually not DaemonFree causing the crash. It is some cleanup code, when doing CfgRemoveEntry.

    In our scenario we need to reconfigure the devices IP on runtime. So we use CfgRemoveEntry to remove the "old" IPs config entry and add a new one using CfgAddEntry. Now, when doing that CfgRemoveEntry, the NDK stack thread calls SPIpNet() on the road. There SockCleanPcb() is called.

    /*-------------------------------------------------------------------------- */
    /* SPIpNet() - CFGTAG_IPNET Service Provider (ti/ndk/netctrl/netsrv.c) */
    /*-------------------------------------------------------------------------- */
    static int SPIpNet(HANDLE hCfg, uint Tag, uint Item, uint Op, HANDLE hCfgEntry)
    {
        CI_IPNET *pi;
        HANDLE   hIF;
    
        (void)Tag;
    
        /* Get the information */
        if( CfgEntryInfo( hCfgEntry, 0, (UINT8 **)(&pi) ) < 0 )
            return( -1 );
    
    .... REMOVED SOME IRRELEVANT CODE HERE
    
        /* Else if this is a "remove", remove the entry */
        else if( Op == CFGOP_REMOVE )
        {
            /* If no network, return "pass" */
            if( !pi->hBind )
                return(0);
    
            /* Remove the network */
            NtRemoveNetwork( pi->hBind );
            pi->hBind = 0;
    
            /* BUG FIX SDSCM00014560 
             *  When an IP address is modified we need to close the sockets of all existing
             *  applications. */
            {
                SockCleanPcb (SOCKPROT_TCP, pi->IPAddr); // <===== HERE!!!
                SockCleanPcb (SOCKPROT_UDP, pi->IPAddr);
                SockCleanPcb (SOCKPROT_RAW, pi->IPAddr);
            }
    
            /* Notify NetCtrl */
            NC_IPUpdate( pi->IPAddr, Item, 0 );
        }
    
        /* Perform service maintenance when not in boot process */
        if( !fBooting )
            ServiceScan( hCfg );
    
        /* Return success */
        return(1);
    }
    
    

    Now, in SockCleanPcb(), SockClose() (which frees the allocated memory using SockIntAbort()) is called completely ignoring the fact, that the socket might already have been closed by any other thread (woken up through FdSignalEvent()).

    void SockCleanPcb (uint SockProt, IPN IPAddress)
    {
        SOCK    *ps;
        int error;
    
        /* Validate the socket protocol family. */
        if( SockProt<SOCKPROT_TCP || SockProt>SOCKPROT_RAW )
            return;
    
        /* Cycle through all the entries in the socket table. */
        ps = pSockList [SockProt];
        while (ps != NULL)
        {
            /* Clean the entry only if we have a match */
            if (ps->LIP == IPAddress) {
                /*
                 *  Look for any sockets that are blocked on accept() call
                 *  (fix for SDOCM00115513)
                 */
                if (ps->OptionFlags & SO_ACCEPTCONN) {
                    /* Set an error for this socket to force accept() to return */
                    error = ECONNABORTED;
                    SockSet((HANDLE)ps, SOL_SOCKET, SO_ERROR, &error,
                            sizeof(error));
                    /*
                     *  Wake the socket blocked on accept(), allowing accept() to
                     *  handle the error and return
                     */
                    FdSignalEvent(ps, FD_EVENT_READ);
                }
    
                SockClose (ps); // <====== HERE!!!
            }
    
            /* Get the next entry. */
            ps = ps->pProtNext;
        }
        return;
    }
    

    Now, there are two scenarios:

    1. Either the daemon thread wakes up because of the FdSignalEvent (see above) and closes the socket itself (this is done in fdPoll()). In that case the NDK stack user thread will crash when calling the above SockClose() later.

    2. Or the daemon thread wakes up later using a then invalid and (memory)freed socket... which will also crash.

    I think the problem here is, that SockCleanPcb() does not take the reference counting for the fd-Sockets into account. My current workaround is:

    void SockCleanPcb (uint SockProt, IPN IPAddress)
    {
        SOCK    *ps;
        int error;
    
        /* Validate the socket protocol family. */
        if( SockProt<SOCKPROT_TCP || SockProt>SOCKPROT_RAW )
            return;
    
        /* Cycle through all the entries in the socket table. */
        ps = pSockList [SockProt];
        while (ps != NULL)
        {
            /* Clean the entry only if we have a match */
            if (ps->LIP == IPAddress) {
                llEnter();
                /* BUGFIX for #9766
                   Increase OpenCount and use fdClose (instead of SockClose) since FdSignalEvent below might lead to other
                   threads closing the socket using fdClose. So SockClose below will then use an invalid socket pointer!!!
                */
                ps->fd.OpenCount++;
                
                /*
                 *  Look for any sockets that are blocked on accept() call
                 *  (fix for SDOCM00115513)
                 */
                if (ps->OptionFlags & SO_ACCEPTCONN) {
                    /* Set an error for this socket to force accept() to return */
                    error = ECONNABORTED;
                    SockSet((HANDLE)ps, SOL_SOCKET, SO_ERROR, &error,
                            sizeof(error));
                    /*
                     *  Wake the socket blocked on accept(), allowing accept() to
                     *  handle the error and return
                     */
                    llExit();
                    FdSignalEvent(ps, FD_EVENT_READ);
                    llEnter();
                    
                }
                llExit();
    
                fdClose((HANDLE)ps); // use fdClose instead of SockClose       
    
            }
    
            /* Get the next entry. */
            ps = ps->pProtNext;
        }
        
        return;
    }
    

    Please note the usage of fdClose instead of SockClose and upcounted fd.OpenCount. This ensures that the socket is not closed by any fdSignalEvent-woken-up thread.

    Or am I completely on the wrong road here?

    Greetings,
    Thomas

  • Hello.
    Can you hear me?
    ....

  • Sorry, Vikram is on vacation and we missed this thread. We'll have someone else look at it.
  • Hi Thomas,
    I'm going to file a bug against the NDK for this problem. I will also discuss your work-around with our local NDK expert when he is back in the office.
    Best regards,
    Janet
  • Hi Janet

    is there any news about this thread?

    Regards,
    Bernd

  • Hi Bernd,

    I filed a bug jira against the NDK for this:

    https://jira.itg.ti.com/browse/NDK-119?filter=-2

    The NDK team is looking into it.

    Best regards,

    Janet

  • Hi Janet

    any news?

    Regards,
    Bernd

  • Hi Brend,

    The NDK developer is working on a plan for the next NDK release, so this will probably be fixed within the next 6 months.

    Best regards,

    Janet

  • Hi Janet

    any update here?

    Regards,
    Bernd
  • Hi Bernd,

    No update on this bug.  The NDK release has been delayed until later this year.

    Best regards,

    Janet

  • Hi Janet

    more than 6 months are gone since your last feedback.
    Any news now?

    Regards,
    Bernd
  • I'm confirming the date for the next NDK release. I'll reply on Monday as the development manager is out today.
  • It is scheduled to be included in the NDK 3.00 release. The scheduled time-frame is Aug/Sept.

    Todd
  • Hi Thomas,

     

    we don’t plan to do any additional NDK work for Centaurus.

     

    Regards, Bernd