CC2530: Null pointer access in NWK manager - ZIGBEE-LINUX-SENSOR-TO-CLOUD_1.0.1

Part Number: CC2530


I got a null pointer access in the network server:

[2020-11-04 07:33:50.626,940] [NWK_MGR/MAIN] INFO   : [MUTEX] Unlock SRSP Mutex
[2020-11-04 07:33:51] nwkmgrservices.c:483:9: runtime error: member access within null pointer of type 'struct sNwkMgrDb_DeviceInfo_t'
[2020-11-04 07:33:51] SUMMARY: AddressSanitizer: undefined-behavior nwkmgrservices.c:483
[2020-11-04 07:33:51] ASAN:SIGSEGV
[2020-11-04 07:33:51] =================================================================
[2020-11-04 07:33:51] ==791==ERROR: AddressSanitizer: SEGV on unknown address 0x00000008 (pc 0x00065c98 bp 0xbea7b8d8 sp 0xbea7b568 T0)
[2020-11-04 07:33:51]     #0 0x65c97 in zNwkSrv_TimerCallback .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrservices.c:483
[2020-11-04 07:33:51]     #1 0x65387 in zNwkSrv_UpdateTimers .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrservices.c:419
[2020-11-04 07:33:51]     #2 0x45065 in timerHandler .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrsrv.c:886
[2020-11-04 07:33:51]     #3 0xb6d65a5f  (/lib/libc.so.6+0x2ca5f)
[2020-11-04 07:33:51]     #4 0xb6f41043 in pause (/lib/libpthread.so.0+0x11043)
[2020-11-04 07:33:51]     #5 0x54bd9 in getUserInput .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrsrv.c:5170
[2020-11-04 07:33:51]     #6 0x44063 in appMain .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrsrv.c:610
[2020-11-04 07:33:51]     #7 0x9a075 in main ../srvwrapper/main.c:182
[2020-11-04 07:33:51]     #8 0xb6d4fcf7 in __libc_start_main (/lib/libc.so.6+0x16cf7)
[2020-11-04 07:33:51]
[2020-11-04 07:33:51] AddressSanitizer can not provide additional info.
[2020-11-04 07:33:51] SUMMARY: AddressSanitizer: SEGV .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrservices.c:483 zNwkSrv_TimerCallback
[2020-11-04 07:33:51] ==791==ABORTING

This happens in the call to sendSimpleDescReq:

static void zNwkSrv_TimerCallback( zNwkSrv_AD_StateMachine_t *pState )
[...]
      case zNwkSrv_AD_State_GettingSimpleDesc_c:
        sendUnicastRouteReq( pState->pDeviceInfo->nwkAddr );
        sendSimpleDescReq( pState->pDeviceInfo->nwkAddr, pState->pDeviceInfo->aEndpoint[pState->ep].endpointId );
        break;

Given that the segmentation fault occurs on an access to address 0x00000008, the layout of struct sNwkMgrDb_DeviceInfo_t indicates that the access to pState->pDeviceInfo->nwkAddr is the cause. (The message names sNwkMgrDb_DeviceInfo_t, not zNwkSrv_AD_StateMachine_t, so the null pointer is pDeviceInfo rather than pState, even though pDeviceInfo is likely at that same offset within zNwkSrv_AD_StateMachine_t.)
However, line 482 passes the same argument. Either the report is off by one line, or pState was modified in the meantime. This source file was not changed.
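
To illustrate the reasoning about the fault address: when the base pointer of a struct is NULL, the faulting address equals the offset of the member being read. The sketch below uses a hypothetical layout (demo_device_info_t, with an 8-byte IEEE address ahead of a 16-bit network address). This is an assumption for illustration only; the real sNwkMgrDb_DeviceInfo_t may be laid out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for sNwkMgrDb_DeviceInfo_t; the real layout may differ. */
typedef struct
{
  uint64_t ieeeAddr;   /* assumed: 8-byte IEEE address at offset 0 */
  uint16_t nwkAddr;    /* assumed: 16-bit network address right after it */
} demo_device_info_t;

/* Address that a read of base->nwkAddr would touch, computed without dereferencing. */
static uintptr_t demo_fault_address( uintptr_t base )
{
  return base + offsetof( demo_device_info_t, nwkAddr );
}
```

With a base of 0 (a null pDeviceInfo), demo_fault_address returns 8 under this assumed layout, which would match the address in the sanitizer report.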

sendUnicastRouteReq does not seem to impact pState.

What could be causing this null pointer?

nwkmanagerNullPointerAccess.zip

  • Hi,

    This occurrence is likely associated with the previous debug print:

    [2020-11-04 07:33:50] AddDevice: Retrying state 6, remaining tries: 2
    [2020-11-04 07:33:50.606,082] [NWK_MGR/MAIN] MISC1 : NwkMgr MOT sendNwkRouteReq

    And then it seems like some "state machine" is freed:

    [2020-11-04 07:33:50.613,853] [NWK_MGR/HNDL] INFO : zNwkSrv_AD_WrapUp: Freeing State machine
    [2020-11-04 07:33:50.614,052] [NWK_MGR/HNDL] INFO : zNwkSrv_AD_FreeStateMachine: Entered pState=0xb3403e5c
    [2020-11-04 07:33:50.614,141] [NWK_MGR/HNDL] INFO : zNwkSrv_AD_FreeStateMachine: State machine freed

    So could it be that pState had been freed indirectly? There may be a good reason for this; it needs further investigation.

    It would probably help to print "pState=" at each key place where it is accessed.
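
A minimal sketch of such a trace point, assuming a simplified stand-in for the state machine struct. The names demo_state_t, demo_format_pstate and DEMO_TRACE_PSTATE are hypothetical and not part of the gateway sources; in the real code the gateway's own logging macros would be the natural output channel instead of stderr.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for zNwkSrv_AD_StateMachine_t (only the field traced here). */
typedef struct
{
  void *pDeviceInfo;
} demo_state_t;

/* Formats "<func>:<line> pState=<ptr> pDeviceInfo=<ptr>" into buf; returns snprintf's result. */
static int demo_format_pstate( char *buf, size_t len, const char *func, int line,
                               const demo_state_t *pState )
{
  assert( buf != NULL && len > 0 );
  return snprintf( buf, len, "%s:%d pState=%p pDeviceInfo=%p",
                   func, line, (const void *)pState,
                   pState ? pState->pDeviceInfo : NULL );
}

/* Trace point: logs to stderr with the caller's function name and line number. */
#define DEMO_TRACE_PSTATE( pState ) \
  do { \
    char demoBuf[128]; \
    demo_format_pstate( demoBuf, sizeof demoBuf, __func__, __LINE__, ( pState ) ); \
    fprintf( stderr, "%s\n", demoBuf ); \
  } while ( 0 )
```

Placing DEMO_TRACE_PSTATE( pState ) at the entry of the timer callback and of the free function would make it possible to match a freed address against a later access in the log.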

    Regards,
    Toby

  • I analysed the other thread first.

    I came to more or less the same conclusion, although I believe the conditions are different: this case is not entirely the same as the other one.

    There is a double test of pState/pDeviceInfo in one location:

    ./source/Projects/zstack/linux/nwkmgr/nwkmgrservices.c:  if ( !pState || !pState->pDeviceInfo )

    In the network manager, pState is assigned in two locations (other than memset or memcpy, if any):

    ./source/Projects/zstack/linux/nwkmgr/nwkmgrservices.c:  pState = zNwkSrv_AD_ReuseStateMachine( ieeeAddr );
    ./source/Projects/zstack/linux/nwkmgr/nwkmgrservices.c:    pState = zNwkSrv_AD_GetStateMachine();

    pDeviceInfo is apparently set in only one location (other than memset or memcpy):

      // found a free state machine. also, allocate a device info for this entry
      gNwkSrv_AD_StateMachine[i].pDeviceInfo = malloc( sizeof( sNwkMgrDb_DeviceInfo_t ) );
      if ( !gNwkSrv_AD_StateMachine[i].pDeviceInfo )
      {
        gNwkSrv_AD_StateMachine[i].state = zNwkSrv_AD_State_Available_c;
        return NULL;
      }
      // make sure to fill with 00s so things like simple descriptor pointers are NULL
      memset( gNwkSrv_AD_StateMachine[i].pDeviceInfo, 0, sizeof( sNwkMgrDb_DeviceInfo_t ) );

    Here I now replace the malloc with a calloc so that the memset is not needed.
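
A minimal sketch of that substitution, using a hypothetical stand-in struct (demo_device_info_t) rather than the real sNwkMgrDb_DeviceInfo_t:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for sNwkMgrDb_DeviceInfo_t. */
typedef struct
{
  unsigned short nwkAddr;
  void *pSimpleDesc;   /* assumed pointer member that must start out NULL */
} demo_device_info_t;

/* Allocates one zero-filled record; returns NULL when memory is exhausted. */
static demo_device_info_t *demo_alloc_device_info( void )
{
  /* calloc zero-fills the block, replacing the malloc + memset pair */
  return calloc( 1, sizeof( demo_device_info_t ) );
}
```

Like the original memset, this relies on all-bits-zero representing a NULL pointer, which holds on the usual Linux targets but is not guaranteed by the C standard.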

    The zNwkSrv_AD_GetStateMachine function also shows that the pDeviceInfo pointer can change when there is a lack of memory.
    In order to determine if there are issues with that, I should set NWKSRV_ADD_STATE_MACHINES to 1 during testing.

    Determining the cause of "[2020-11-04 07:33:50] AddDevice: Retrying state 6, remaining tries: 2" can also help.

    I might also have been running into a memory availability issue on the embedded platform (with all the debug options active, memory consumption was higher). But in that case, most of the gateway's memory allocations produce a message saying that the memory could not be allocated. So if memory were the issue here, it should appear in the log.

  • After a reboot (without a power cycle) of my system, the gateway/ZNP was still not responsive, so I did a reset by pressing the button on the CCDebugger.

    This resulted in the following memory access error:

    [2020-11-08 17:50:22] =================================================================
    [2020-11-08 17:50:22] ==783==ERROR: AddressSanitizer: heap-use-after-free on address 0xb4c02d78 at pc 0x00067045 bp 0xbe96a558 sp 0xbe96a55c
    [2020-11-08 17:50:22] READ of size 4 at 0xb4c02d78 thread T0
    [2020-11-08 17:50:22] #0 0x67043 in zNwkSrv_AD_FreeStateMachine .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrservices.c:803
    [2020-11-08 17:50:22] #1 0x6626d in zNwkSrv_TimerCallback /home/mdeweerd/workspace/habitat/3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrservices.c:514
    [2020-11-08 17:50:22] #2 0x65477 in zNwkSrv_UpdateTimers /home/mdeweerd/workspace/habitat/3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrservices.c:420
    [2020-11-08 17:50:22] #3 0x451d9 in timerHandler /home/mdeweerd/workspace/habitat/3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrsrv.c:886
    [2020-11-08 17:50:22] #4 0xb6dc2a5f (/lib/libc.so.6+0x2ca5f)
    [2020-11-08 17:50:22] #5 0xb6f9e043 in pause (/lib/libpthread.so.0+0x11043)
    [2020-11-08 17:50:22] #6 0x54d4d in getUserInput /home/mdeweerd/workspace/habitat/3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrsrv.c:5170
    [2020-11-08 17:50:22] #7 0x441d7 in appMain /home/mdeweerd/workspace/habitat/3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrsrv.c:610
    [2020-11-08 17:50:22] #8 0x9a005 in main ../srvwrapper/main.c:182
    [2020-11-08 17:50:22] #9 0xb6daccf7 in __libc_start_main (/lib/libc.so.6+0x16cf7)

    I added some tracking to the code to know when the zNwkSrv_AD_FreeStateMachine is entered. The relevant lines are:

    .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrservices.c:514
    .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrservices.c:410 (discriminator 2)
    .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrsrv.c:888
    .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrsrv.c:5176
    .../3rdparty/ti/Zigbee_3_0_Linux_Gateway_1_0_1/source/Projects/zstack/linux/nwkmgr/nwkmgrsrv.c:612


    Which corresponds to (line 514):
    zNwkSrv_AD_FreeStateMachine( pState );
    inside
    static void zNwkSrv_TimerCallback( zNwkSrv_AD_StateMachine_t *pState )

    Called from zNwkSrv_UpdateTimers (line 410).

    Line 888 in nwkmgrsrv.c is 'ualarm( TIMER_WAIT_PERIOD, 0 );' in 'static void timerHandler( int sig )'


    The error is triggered two lines before the ualarm call (but it happens later in time?), from 'zNwkSrv_UpdateTimers();'


    The memory access error happens just after that, on line 803 when "pState->pDeviceInfo" is accessed.

    Lines 5176 and 5170 are inside getUserInput, but I wasn't at the keyboard at that time.


    I now know what is wrong and I have an idea on how to fix it.

    1) Instead of using if(pState!=NULL), do:
    if(lockState(pState)) {
    // Code.
    releaseState(pState);
    }
    2) 'lockState' increments a semaphore (a use counter) showing that some code is using the state.
    3) 'releaseState' decrements the semaphore and deletes the state when it is marked "delete_state".
    4) When the state should be deleted, mark it as "delete_state".
    5) 'lockState' returns false when the state is marked "delete_state".
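
The five steps above could be sketched roughly as follows. This is a single-threaded illustration under stated assumptions, not the code that ended up in the gateway: demo_state_t, its fields and requestDeleteState are hypothetical, and the plain counter is neither thread-safe nor signal-safe, so a real version would need a mutex or atomics around it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for one state machine entry. */
typedef struct
{
  bool  inUse;         /* entry currently holds a live state machine */
  bool  deleteState;   /* the "delete_state" marker from step 4 */
  int   users;         /* number of lockState() holders (step 2) */
  void *pDeviceInfo;   /* owned allocation, released with the state */
} demo_state_t;

/* Releases the owned memory and returns the entry to the free pool. */
static void demo_destroy( demo_state_t *pState )
{
  free( pState->pDeviceInfo );
  pState->pDeviceInfo = NULL;
  pState->deleteState = false;
  pState->inUse = false;
}

/* Steps 2 and 5: take a reference unless the state is absent, free or being deleted. */
static bool lockState( demo_state_t *pState )
{
  if ( !pState || !pState->inUse || pState->deleteState )
    return false;
  pState->users++;
  return true;
}

/* Step 3: drop a reference; the last holder performs a pending deletion. */
static void releaseState( demo_state_t *pState )
{
  assert( pState && pState->users > 0 );
  if ( --pState->users == 0 && pState->deleteState )
    demo_destroy( pState );
}

/* Step 4: mark for deletion; destroy right away only when nobody holds the state. */
static void requestDeleteState( demo_state_t *pState )
{
  if ( !pState || !pState->inUse )
    return;
  pState->deleteState = true;
  if ( pState->users == 0 )
    demo_destroy( pState );
}
```

The point of the design is that a deletion requested while a holder is active is deferred until that holder calls releaseState, so the holder never sees pDeviceInfo disappear underneath it.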

    However, zNwkSrv_AD_GetStateMachine is making this more complicated because it changes the location of pState.
    To cope with that, it appears to me that pState has to become "pStateIdx", and accessing the original 'pState' would become PSTATE(pStateIdx) where '#define PSTATE(idx) (((idx) < gNwkSrv_AD_StateMachines) ? &gNwkSrv_AD_StateMachine[(idx)] : NULL)'.
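
A self-contained sketch of the index-handle idea, assuming a fixed-size table. The names gDemoStateMachine, gDemoStateMachines and DEMO_MAX_STATE_MACHINES are hypothetical stand-ins for the gateway's globals; the benefit is that an index stays meaningful even if the table itself is moved or reallocated, whereas a stored pointer would dangle.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_STATE_MACHINES 4   /* assumed table size */

/* Hypothetical stand-in for zNwkSrv_AD_StateMachine_t. */
typedef struct
{
  int state;
} demo_state_t;

static demo_state_t gDemoStateMachine[DEMO_MAX_STATE_MACHINES];
static size_t gDemoStateMachines = DEMO_MAX_STATE_MACHINES;   /* number of entries */

/* Resolves an index handle to a pointer, or NULL when the index is out of range. */
#define PSTATE( idx ) \
  ( ( (size_t)( idx ) < gDemoStateMachines ) ? &gDemoStateMachine[( idx )] : NULL )
```

Callers would then hold a pStateIdx and resolve it with PSTATE(pStateIdx) at each use, checking for NULL every time.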


    Another problem that occurred is that 'zigbeegw_debug' (the "wrapper" for zigbeegw) did not try to restart the servers and reported "bailing out".
    That means that an extra layer of monitoring is needed: the system must ensure that there is always one instance of 'zigbeegw' running.

    DoubleDeletionOfState.zip

  • The bailing out issue is fixed by setting the corresponding expected code to 1000.

    The GW server on my development system is now crashing repeatedly because of the invalid accesses to deleted pDeviceInfo entries.

  • I've changed the management of the state machines in 'nwkmgrsrvr.c' to be based on 'pthread', with an improved memory allocation and release architecture that no longer copies states, only the state pointers.

    There are still other issues to identify and fix, but I have had no more crashes caused by invalid memory accesses for now.

  • My updates to the code did not prevent me from entering a deadlock: the nwk_mgr was waiting for a statemachine to be released while it was already locked in the same thread.
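
One way to surface this kind of self-deadlock during testing is an error-checking mutex, which reports a relock from the owning thread instead of blocking. The sketch below is an illustration using standard POSIX calls; the helper names are hypothetical and it is not the locking code used in the modified gateway.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialises *m as an error-checking mutex; returns 0 on success. */
static int demo_init_errorcheck_mutex( pthread_mutex_t *m )
{
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init( &attr );
  if ( rc != 0 )
    return rc;
  rc = pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_ERRORCHECK );
  if ( rc == 0 )
    rc = pthread_mutex_init( m, &attr );
  pthread_mutexattr_destroy( &attr );
  return rc;
}

/* Locks twice from the same thread; returns the result of the second attempt. */
static int demo_relock_result( void )
{
  pthread_mutex_t m;
  int second;

  if ( demo_init_errorcheck_mutex( &m ) != 0 )
    return -1;
  pthread_mutex_lock( &m );
  second = pthread_mutex_lock( &m );   /* fails with EDEADLK instead of hanging */
  pthread_mutex_unlock( &m );
  pthread_mutex_destroy( &m );
  return second;
}
```

Checking the return value of every lock call and logging EDEADLK would turn a silent hang like the one described into a visible error.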

    This led to identifying what likely happened with regards to the null pointer access and the released memory access.

    At its core is the fact that zNwkSrv_AD_ReuseStateMachine does not actually reuse a machine, and that it ignores whether the machine is in the "WrapUp" state, which leads to freeing the state machine in the function where the wrap-up state was set.

    So when the wrap-up is interrupted, the state machine can be freed twice: once through zNwkSrv_AD_ReuseStateMachine and once in the WrapUp.

    I have updated zNwkSrv_AD_ReuseStateMachine to do stricter state checking.
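
As a rough illustration of what stricter state checking can look like (not the actual change made to zNwkSrv_AD_ReuseStateMachine), the sketch below refuses to hand out a machine that is in wrap-up and makes the free operation idempotent. All names and the three-value state enum are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced set of states. */
typedef enum { DEMO_AVAILABLE, DEMO_ACTIVE, DEMO_WRAPUP } demo_ad_state_t;

/* Hypothetical stand-in for zNwkSrv_AD_StateMachine_t. */
typedef struct
{
  demo_ad_state_t state;
  uint64_t        ieeeAddr;
  void           *pDeviceInfo;
} demo_state_t;

/* Only an active machine for the same device is reused; one in wrap-up is left to its owner. */
static demo_state_t *demo_reuse( demo_state_t *table, size_t count, uint64_t ieeeAddr )
{
  for ( size_t i = 0; i < count; i++ )
  {
    if ( table[i].state == DEMO_ACTIVE && table[i].ieeeAddr == ieeeAddr )
      return &table[i];
  }
  return NULL;
}

/* Idempotent free: returns true only when this call actually released the machine. */
static bool demo_free( demo_state_t *pState )
{
  if ( !pState || pState->state == DEMO_AVAILABLE )
    return false;
  free( pState->pDeviceInfo );
  pState->pDeviceInfo = NULL;
  pState->state = DEMO_AVAILABLE;
  return true;
}
```

With these two guards, a second free of the same machine becomes a no-op, and a machine that is wrapping up cannot be picked up and freed by the reuse path.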