This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

CC1352P: SRSP Cond Wait timed out

Part Number: CC1352P
Other Parts Discussed in Thread: Z-STACK, CC2538

We are having an issue with a smart home gateway using a CC1352 and the Z-Stack Linux gateway. After a certain number of commands sent to a smart plug, we get the following message:

[16:54:21.697,127] [Z_STACK/LSTN] ERROR  : SRSP Cond Wait timed out!
[16:54:21.696,797] [GATEWAY/LSTN] ERROR  : SRSP Cond Wait timed out!
[16:54:21.696,894] [GATEWAY/LSTN] ERROR  : apicSendSynchData() failed getting response

I doubt that this has to do with UART speed or other speed-related parameters, because the problem reproduces depending on the number of commands sent, not their interval:

  1. 2 secs interval: after about 50 minutes
  2. 5 secs interval: after about 3 hours
  3. 15 secs interval: after about 10 hours

When running a test script that changes the plug state every x seconds, at some point we no longer receive the state (this is when we get the timeout message), but the plug continues to switch correctly. After another 20 minutes or so, the plug also stops switching. For this second stage I cannot find a specific point in the log; you can only observe it on the plug itself.

Could it be some counter or buffer which will overrun after some time?

In the attached log you can find the event at 16:54:21.697,127. If needed, we can provide further logs. The problem happens with different units (gateway and plug), so it is not related to specific hardware.

The ZigBee gateway version is the latest (exact versions of each module are at the beginning of the trace); the Z-Stack on the module is 4.20.00.35.

Regards
Peter

  • Sorry, the attachment was missing in the previous post.

    5807.zigbee.zip

  • Hi,

    Does the issue clear up if you:
    - reset the ZNP?
    - restart the gateway servers?

    Is it possible to provide a log with more verbose logging enabled?
    For example, in zigbeeHAgw --> setup_for_arm --> export NPI_CMD="./$NPI_NAME $NPI_CONFIG -v 0xFFFFFFFF" (for all the other servers too).

    Regards,
    Toby

  • Hi Toby,

    Attached please find the logs with verbose output enabled.

    zigbee1.log is the log of the initial test, where it stopped responding after 2520 toggle operations. Timestamp 10:57:42:772.

    I have also tested whether it is a problem with the end device. I unplugged and replugged our plug to see if it would work again, but it did not. So we can exclude that option.

    Switching works fine again when restarting the server (zigbee2.log).

    In your answer, did you mean restarting single processes? I tried with ZLSZNP and it was not a good idea, as all other processes except GATEWAY_SRVR were closed as well. If you want me to try restarting single server processes, can you please advise a good order? As I have to wait about 2 hours each time to get back to this condition, trying all combinations is not an ideal scenario...

    Regards
    Peter

    ZigBee Logs.zip

  • Thanks for sharing the logs, I'll see if I can find anything there.

    Peter Hoyer said:
    In your answer, did you mean restarting single processes?


    I meant restarting all of them (one by one). An example of this is shown in the script zigbeeHAgw:

    stop_all()
    {
    	stop_tracker
    	stop_nwkmgr
    	stop_others
    }

    stop_others()
    {
    	stop_gateway
    	stop_ota
    	stop_zlsznp
    	stop_npi
    }

    start_all()
    {
    ...
    			start_npi
    			SERVER_PID_VAR=NPI_PID
    		;;
    		2)
    			start_zlsznp
    			SERVER_PID_VAR=ZLSZNP_PID
    		;;
    		3)
    			start_netmgr
    			SERVER_PID_VAR=NETWORK_MGR_PID
    		;;
    		4)
    			start_gateway
    			SERVER_PID_VAR=GATEWAY_SERVER_PID
    		;;
    		5)
    			start_otaserver
    			SERVER_PID_VAR=OTA_SERVER_PID
    		;;

  • Hi Toby,

    Then I have done the right thing. Yes, when restarting all services, it works well again. You can see it in the logs.

    Regards
    Peter

  • Looks like Z_STACK/READ is no longer active around the point where the issue begins, which explains why the timeouts occur every 2 seconds for Z_STACK/LSTN (your app is configured to send On/Off commands every 2 seconds in this case?).

    I don't see any further debug prints, so this could be a socket issue (recv calls by Z_STACK/READ).

    Also, there seem to be a lot of unnecessary route requests (something I've seen come up in another recent case).
    Can you use the following patch for reference?

    reduce_route_req.patch

  • Hi Toby,

    We are still testing, but the switching issue seems to be resolved with your patch, or at least it no longer occurs within a few hours.

    Thank you and regards
    Peter

  • Hi Toby,

    It just takes longer now, but unfortunately we still have this issue. It depends on the system: sometimes it happens after just 2 hours with a 5-second switching interval, sometimes only after 12 or more hours even with a 2-second switching interval. Software and hardware are always the same.

    I have attached some more logs in case they might be helpful.

    How can we find the actual cause of this problem?

    Regards
    Peter

    zb.tar.gz

  • Ok, I will look into this and update you within the next week.

    If possible, can you share sniffer logs as well?
    Ideally the log covers the period when the issue happens, but steady-state network traffic should be sufficient to see what is happening in the typical case.

    Thanks,
    Toby

  • Hello,

    I have the same problem. I posted about it a few months ago (e2e.ti.com/.../873858), but unfortunately no clear solution was found. In my case I send some commands (for example, 40 On/Off commands) to some devices. The interval is 500 ms at most, but after each command is sent I wait for its response before sending the next command to the GW server. In the first two or three tries it works well, but after that the confirmation responses fail and I need to restart the Zigbee server to get back to the normal state. I use a CC2538 as the ZNP device and Z-Stack Home 1.2.2. I would be very thankful if you could help solve this issue, because it has stopped our work.

  • Can this patch be used with Z-Stack Linux GW 1.2.2? And how can I use it?

  • Hi Ahmad.TI,

    This patch was made for the Zigbee 3.0 gateway. I recommend migrating to that one if possible. It should be compatible with Zigbee HA devices.

    The files which the patch touches are gatewaysrvr.c and zcl.c.

    For zcl.c, the change is applied in function zcl_SendCommand:

        status = AF_DataRequest( destAddr, epDesc, clusterID, msgLen, msgBuf,
                                 &zcl_TransID, options, AF_DEFAULT_RADIUS );

    is replaced with:

        // TP: if response message, use the same seqNum as original message instead of zcl_TransID
        if ( !specific &&
            ( cmd == ZCL_CMD_READ_RSP ||
              cmd == ZCL_CMD_WRITE_RSP ||
              cmd == ZCL_CMD_CONFIG_REPORT_RSP ||
              cmd == ZCL_CMD_DEFAULT_RSP ||
              cmd == ZCL_CMD_DISCOVER_ATTRS_RSP ) )
        {
            status = AF_DataRequest( destAddr, epDesc, clusterID, msgLen, msgBuf,
                                     &seqNum, options, AF_DEFAULT_RADIUS );
        }
        else
        {
            status = AF_DataRequest( destAddr, epDesc, clusterID, msgLen, msgBuf,
                                     &zcl_TransID, options, AF_DEFAULT_RADIUS );
        }

    For gatewaysrvr.c:

      if ( pTransEntry && ZCL_CMD_DEFAULT_RSP == pInMsg->hdr.commandID )
      {
        // We received the ack for our command, remove the transaction.
        gwMsgRemoveTrans( pTransEntry->appTransId );
      }
    

    is added before:

       // Free memory
       free( zclFrameInd.payload.data );

    Feel free to evaluate these changes. There may be other changes between the 1.2.2 and 3.0 gateways to consider, but again, it is recommended to use the 3.0 gateway if possible.

    >>Interval is 500ms at maximum

    In general I would not recommend sending too frequently, as this could overflow the buffers on the ZNP. That can cause some messages to never be sent over the air, meaning the gateway will time out commands for which it expects a response. Having this happen continuously could put the gateway in a non-ideal state.
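    The timeout the gateway reports when an expected SRSP never arrives can be illustrated with a minimal condition-variable sketch. This is not the gateway's actual code; the names `wait_for_srsp_ms`, `srspCond`, and `srspMutex` are hypothetical stand-ins for the synchronization that apicSendSynchData() performs:

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <pthread.h>
    #include <time.h>

    static pthread_mutex_t srspMutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t srspCond = PTHREAD_COND_INITIALIZER;

    /* Wait up to 'ms' milliseconds for an SRSP signal. Returns 0 if the
     * reader thread signals in time, or ETIMEDOUT if it never does --
     * the latter is what surfaces as "SRSP Cond Wait timed out!". */
    static int wait_for_srsp_ms(long ms)
    {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += ms / 1000;
        deadline.tv_nsec += (ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }

        pthread_mutex_lock(&srspMutex);
        int rc = pthread_cond_timedwait(&srspCond, &srspMutex, &deadline);
        pthread_mutex_unlock(&srspMutex);
        return rc;
    }

    int main(void)
    {
        /* No reader thread ever signals srspCond here, so the wait must
         * time out -- the same situation as when Z_STACK/READ has died. */
        assert(wait_for_srsp_ms(100) == ETIMEDOUT);
        return 0;
    }
    ```

    The point: once the thread responsible for signalling the condition stops running, every subsequent synchronous request will burn its full timeout and fail, which matches the repeating timeout pattern in the logs.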

    Since it seems this issue is easily reproducible for you, my suggestion for debugging your current setup is to check the transactions (e.g. "[GATEWAY/LSTN] INFO: (GW posted) cmdId: 39, zclTransId 68") with what is sent over the air (e.g. do you see the ZCL command with this sequence number?).

  • Hi Toby, 

    Thank you for your reply. I made these changes, but unfortunately no significant improvement was achieved. I increased the interval to one second and it became better, but still not acceptable. In most cases when this situation happens, the command reaches the destination device but the confirm command fails and no status from the device is received; so the command is in fact sent over the air.

    Here is the log file from when the problem occurs; in this case the server returned to its normal state after a while (after no commands were sent to it for a few minutes). At [12:25:47.219,580] the first timeout happened, and the failures continued until [12:27:45.898,562]. After that the server worked fine. Why can't it recover from the failure state, and why did it become OK after not sending commands for a while?

    4010.Zigbee-Server-Log1.txt

  • We have made various updates to the server wrapper functionality (e.g. api_client.c, api_server.c, ...; please see the files in C:\ti\Zigbee_3_0_Linux_Gateway_1_0_1\source\Projects\zstack\linux\srvwrapper). Your issue seems related to this. Please check out those files and consider migrating them into your project.

  • Hi Toby,

    Attached is a very quick one. This time it took less than one hour while switching 2 plugs on and off in alternation. After restarting the server, it works fine again.

    Regards
    Peter

    1321.zigbee.zip3487.sniffer log.zip

  • Thank you, this looks like a helpful data point; creating a similar traffic profile on my end will ideally reproduce the issue in a similar fashion, so the root cause can be found.

  • Hi Peter,

    An update from my side:

    After investigating the logs you've provided, I believe the issue is related to an EINTR that occurs during a call to recv() in SISreadThreadFunc: "recv: Interrupted system call". Notice that in zigbee.log, all the "SRSP Cond Wait timed out!" errors happen after "recv: Interrupted system call".

    Specifically for Z_STACK/READ:

      // Read from socket
      do
      {
        // Normal data
        n = recv( pInstance->sAPIconnected, &hdrbuf, sizeof(hdrbuf), MSG_WAITALL );
    
        if ( n <= 0 )
        {
          if ( n < 0 )
          {
            perror( "recv" );
          }
          else
          {
            uiPrintfEx(trINFO, "Peer closed connection\n" );
          }
          done = 1;
        }

    As you can see, if recv() returns with EINTR, this thread essentially ends (done = 1). The result is that Z_STACK/READ no longer receives messages, causing Z_STACK/LSTN to always time out on synchronous transactions (READ is the only thread that signals when an SRSP is received).

    To overcome this, we can retry the recv() if it was interrupted by a signal, something like:

    // Normal data
    do {
        n = recv( pInstance->sAPIconnected, &hdrbuf, sizeof(hdrbuf), MSG_WAITALL );
    } while ( n < 0 && errno == EINTR );

    // ...
        if ( len > 0 )
        {
          do {
            n = recv( pInstance->sAPIconnected, pMsg + 1, len, MSG_WAITALL );
          } while ( n < 0 && errno == EINTR );
        }
        else
        {
          // There are no payload bytes, which is also valid.
          n = 0;
        }

    Regards,
    Toby

  • Hi Toby,

    I have applied your changes, but now it runs into a segfault after some time, I suppose at the point where the other problem would normally have occurred.

    I have attached the full log file and, below, a screenshot from when it happens. The ZB trace is also attached and, just to make sure, a copy of the file which I have changed.

    Regards
    Peter

    5875.zigbee.zip

    sniffer trace.zip

    api_client.c
    /*********************************************************************
     Filename:       api_client.c
     Revised:        $Date: 2014-11-18 18:32:59 -0800 (Tue, 18 Nov 2014) $
     Revision:       $Revision: 41168 $
    
     Description:    This file contains the API Server Wrapper client APIs.
    
    
     Copyright 2013 - 2014 Texas Instruments Incorporated. All rights reserved.
    
     IMPORTANT: Your use of this Software is limited to those specific rights
     granted under the terms of a software license agreement between the user
     who downloaded the software, his/her employer (which must be your employer)
     and Texas Instruments Incorporated (the "License").  You may not use this
     Software unless you agree to abide by the terms of the License. The License
     limits your use, and you acknowledge, that the Software may not be modified,
     copied or distributed unless used solely and exclusively in conjunction with
     a Texas Instruments radio frequency device, which is integrated into
     your product.  Other than for the foregoing purpose, you may not use,
     reproduce, copy, prepare derivative works of, modify, distribute, perform,
     display or sell this Software and/or its documentation for any purpose.
    
     YOU FURTHER ACKNOWLEDGE AND AGREE THAT THE SOFTWARE AND DOCUMENTATION ARE
     PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
     INCLUDING WITHOUT LIMITATION, ANY WARRANTY OF MERCHANTABILITY, TITLE,
     NON-INFRINGEMENT AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL
     TEXAS INSTRUMENTS OR ITS LICENSORS BE LIABLE OR OBLIGATED UNDER CONTRACT,
     NEGLIGENCE, STRICT LIABILITY, CONTRIBUTION, BREACH OF WARRANTY, OR OTHER
     LEGAL EQUITABLE THEORY ANY DIRECT OR INDIRECT DAMAGES OR EXPENSES
     INCLUDING BUT NOT LIMITED TO ANY INCIDENTAL, SPECIAL, INDIRECT, PUNITIVE
     OR CONSEQUENTIAL DAMAGES, LOST PROFITS OR LOST DATA, COST OF PROCUREMENT
     OF SUBSTITUTE GOODS, TECHNOLOGY, SERVICES, OR ANY CLAIMS BY THIRD PARTIES
     (INCLUDING BUT NOT LIMITED TO ANY DEFENSE THEREOF), OR OTHER SIMILAR COSTS.
    
     Should you have any questions regarding your right to use this Software,
     contact Texas Instruments Incorporated at www.TI.com.
     *********************************************************************/
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <pthread.h>
    #include <semaphore.h>
    #include <poll.h>
    #include <sys/time.h>
    #include <unistd.h>
    #include <signal.h>
    #include <bits/local_lim.h>
    
    #ifndef NPI_UNIX
    #include <netdb.h>
    #include <arpa/inet.h>
    #endif
    
    #include "hal_types.h"
    #include "hal_rpc.h"
    #include "api_lnx_ipc_rpc.h"
    #include "api_client.h"
    #include "trace.h"
    #include "pb_utils.h"
    #include "api_server.h"
    
    /*********************************************************************
     * Constant
     *********************************************************************/
    #define MSG_AREQ_READY     0x0000
    #define MSG_AREQ_BUSY      0x0001
    
    // These came from api_lnx_ipc_rpc.h
    #define API_LNX_PARAM_NB_CONNECTIONS    1
    #define API_LNX_PARAM_DEVICE_USED     2
    
    /*********************************************************************
     * EXTERNAL VARIABLES
     *********************************************************************/
    extern apisSysParams_t *pAPIS_SysParams;
    
    /*********************************************************************
     * Typedefs
     *********************************************************************/
    
    #ifdef API_CLIENT_8BIT_LEN
    typedef apic8BitLenMsgHdr_t apicMsgHdr_t;
    #else // API_CLIENT_8BIT_LEN
    typedef apic16BitLenMsgHdr_t apicMsgHdr_t;
    #endif // API_CLIENT_8BIT_LEN
    struct linkedAreqMsg
    {
      struct linkedAreqMsg *nextMessage;
      uint8 subSys;
      uint8 cmdId;
      uint16 len;
    };
    
    typedef struct linkedAreqMsg areqMsg;
    
    typedef struct _apicInstance_t
    {
      // Client socket handle
      int sAPIconnected;
    
      // Mutex to handle synchronous request-response transactions
      pthread_mutex_t clientSREQRSPmutex;
    
      // Mutex to handle synchronous response
      // i.e., to protect the synchronous response queue
      pthread_mutex_t clientSREQmutex;
    
      // Mutex to protect asynchronous message receive queue
      pthread_mutex_t clientAREQmutex;
    
      // Mutex to protect socket send
      pthread_mutex_t sendMutex;
    
      // conditional variable to notify Synchronous response
      pthread_cond_t clientSREQcond;
    
      // conditional variable to notify that the AREQ is handled
      sem_t clientAREQsem;
    
      pthread_t SISRThreadId;
      pthread_t SISHThreadId;
    
      // Client data AREQ received buffer
      areqMsg *areq_rec_buf;
    
      // SRSP received message payload
      areqMsg *srsp_msg;
    
    #ifndef NPI_UNIX
      struct addrinfo *resAddr;
    #endif
    
      // variable to store the number of received synchronous response bytes
      long numOfReceivedSRSPbytes;
    
      // Message count to keep track of incoming and processed messages
      int areqRxMsgCount;
      int areqProcMsgCount;
    
      // Notification from receive thread to callback thread
      // that the connection is closed.
      bool closed;
    
      // Application triggered close
      bool appClosed;
    
      // Freeing this memory block is postponed because apicClose()
      // is called from a callback thread
      bool freePending;
    
      // Asynchronous message callback function
      pfnAsyncMsgCb pfnAsyncMsgHandler;
    } apicInstance_t;
    
    /*********************************************************************
     * Globals
     *********************************************************************/
    
    size_t apicThreadStackSize = (PTHREAD_STACK_MIN * 3); // 16K*3
    
    /*********************************************************************
     * Locals
     *********************************************************************/
    
    /*********************************************************************
     * Function Prototypes
     *********************************************************************/
    static void initSyncRes( apicInstance_t *pInstance );
    static void delSyncRes( apicInstance_t *pInstance );
    static void *SISreadThreadFunc( void *ptr );
    static void *SIShandleThreadFunc( void *ptr );
    static int asynchMsgCback( apicInstance_t *pInstance, areqMsg *pMsg );
    
    /*********************************************************************
     *
     * @fn          apicIgnoreSigPipe
     *
     * @brief       This function sets SIGPIPE signal handling action to ignore
     *              the signal.
     *              This function, if to be called, must be called prior to
     *              any apicInit() call because apicInit() call itself may cause
     *              a SIGPIPE signal.
     *              If this function is not called, there is a chance that
     *              the process may be terminated when server connected to the
     *              process drops the connection and hence application must
     *              handle SIGPIPE with its own handler in such a case in order
     *              to gracefully handle disconnection by peer.
     *
     * @param       None
     *
     * @return      None
     *
     *********************************************************************/
    void apicIgnoreSigPipe( void )
    {
      struct sigaction newact;
    
      /* Set up a new action to ignore the signal. */
      newact.sa_handler = SIG_IGN;
      sigemptyset( &newact.sa_mask );
      newact.sa_flags = 0;
      sigaction( SIGPIPE, &newact, NULL );
    }
    
    int apicGetConnectionHandle(apicHandle_t handle)
    {
      apicInstance_t *pInstance = (apicInstance_t *)handle;
    
      if (pInstance != NULL)
      {
        return pInstance->sAPIconnected;
      }
    
      return 0;
    }
    
    /*********************************************************************
     *
     * @fn          apicInit
     *
     * @brief       This function initializes API Client Socket
     *
     * @param       srvAddr - path to the serial interface server
     * @param       getVer - TRUE to get and display the server information after connection
     * @param       pFn - function pointer to async message handler
     *                    Note that if len argument is passed as 0xffffu to this
     *                    handler function, the API client is notifying that
     *                    the connection is dropped by the peer (server)
     *                    and other parameters passed to the handler function
     *                    are not valid.
     *
     * @return      API client handle if successful.
     *              NULL, otherwise.
     *
     *********************************************************************/
    apicHandle_t apicInit( const char *srvAddr, bool getVer, pfnAsyncMsgCb pFn )
    {
      int i = 0;
      const char *ipAddress = "", *port = "";
      apicInstance_t *pInstance;
    
      char *pStr, strTmp[128];
      pthread_attr_t attr;
    
      // prepare thread creation
      if ( pthread_attr_init( &attr ) )
      {
        perror( "pthread_attr_init" );
        return NULL;
      }
    
      if ( pthread_attr_setstacksize( &attr, apicThreadStackSize ) )
      {
        perror( "pthread_attr_setstacksize" );
        return NULL;
      }
    
      pInstance = malloc( sizeof(apicInstance_t) );
    
      if ( !pInstance )
      {
        uiPrintf( "[ERR] apicInit malloc failed\n" );
        return pInstance;
      }
    
      // Clear the instance
      memset( pInstance, 0, sizeof(*pInstance) );
    
      strncpy( strTmp, srvAddr, 128 );
    
      // use strtok to split string and find IP address and port;
      // the format is = IPaddress:port
      // Get first token
      pStr = strtok( strTmp, ":" );
      while ( pStr != NULL )
      {
        if ( i == 0 )
        {
          // First part is the IP address
          ipAddress = pStr;
        }
        else if ( i == 1 )
        {
          // Second part is the port
          port = pStr;
        }
        i++;
        if ( i > 2 )
          break;
        // Now get next token
        pStr = strtok( NULL, " ,:;-|" );
      }
    
      /**********************************************************************
       * Initiate synchronization resources
       */
      initSyncRes( pInstance );
    
      /**********************************************************************
       * Connect to the API server
       **********************************************************************/
    
    #ifdef NPI_UNIX
      int len;
      struct sockaddr_un remote;
    
      if ( (pInstance->sAPIconnected = socket( AF_UNIX, SOCK_STREAM, 0 )) == -1 )
      {
        perror( "socket" );
        exit( 1 );
      }
    #else
      struct addrinfo hints;
    
      memset( &hints, 0, sizeof(hints) );
    
      hints.ai_family = AF_UNSPEC;
      hints.ai_socktype = SOCK_STREAM;
    
      if ( port == NULL )
      {
        int res;
    
        // Fall back to default if port was not found in the configuration file
        uiPrintf( "Warning! Port not sent to RTIS. Will use default port: %s",
            APIC_PORT );
    
        if ( (res = getaddrinfo( ipAddress, APIC_PORT, &hints, &pInstance->resAddr ))
            != 0 )
        {
          uiPrintfEx(trERROR, "getaddrinfo: %s\n", gai_strerror( res ) );
          delSyncRes( pInstance );
          free( pInstance );
          return NULL;
        }
      }
      else
      {
        int res;
    
        uiPrintf( "Port: %s\n\n", port );
        if ( (res = getaddrinfo( ipAddress, port, &hints, &pInstance->resAddr ))
            != 0 )
        {
          uiPrintfEx(trERROR, "getaddrinfo: %s\n", gai_strerror( res ) );
          delSyncRes( pInstance );
          free( pInstance );
          return NULL;
        }
      }
    
      uiPrintf( "IP addresses for %s:\n\n", ipAddress );
    
      struct addrinfo *p;
    
    #ifdef __APP_UI__
      char ipstr[INET6_ADDRSTRLEN];
    
      for ( p = pInstance->resAddr; p != NULL; p = p->ai_next )
      {
        void *addr;
        char *ipver;
    
        // get the pointer to the address itself,
        // different fields in IPv4 and IPv6:
        if ( p->ai_family == AF_INET ) // IPv4
        {
          struct sockaddr_in *ipv4 = (struct sockaddr_in *) p->ai_addr;
          addr = &(ipv4->sin_addr);
          ipver = "IPv4";
        }
        else // IPv6
        {
          struct sockaddr_in6 *ipv6 = (struct sockaddr_in6 *) p->ai_addr;
          addr = &(ipv6->sin6_addr);
          ipver = "IPv6";
        }
    
        // convert the IP to a string and print it:
        inet_ntop( p->ai_family, addr, ipstr, sizeof ipstr );
        uiPrintf( "  %s: %s\n", ipver, ipstr );
      }
    #endif //__APP_UI__
    
    #endif
    
      uiPrintf( "Trying to connect...\n" );
    
    #ifdef NPI_UNIX
      remote.sun_family = AF_UNIX;
      strcpy( remote.sun_path, ipAddress );
      len = strlen( remote.sun_path ) + sizeof ( remote.sun_family );
      if ( connect( pInstance->sAPIconnected, (struct sockaddr *)&remote, len )
          == -1 )
      {
        perror( "connect" );
        close( pInstance->sAPIconnected );
        delSyncRes( pInstance );
        free( pInstance );
        return NULL;
      }
    #else
      for ( p = pInstance->resAddr; p != NULL; p = p->ai_next )
      {
        if ( (pInstance->sAPIconnected = socket( p->ai_family, p->ai_socktype,
            p->ai_protocol )) == -1 )
        {
          continue;
        }
    
        if ( connect( pInstance->sAPIconnected, p->ai_addr, p->ai_addrlen ) != -1 )
        {
          /* Success */
          break;
        }
    
        close( pInstance->sAPIconnected );
        pInstance->sAPIconnected = -1;
      }
    
      if ( pInstance->sAPIconnected == -1 )
      {
        perror( "connect" );
        freeaddrinfo( pInstance->resAddr );
        delSyncRes( pInstance );
        free( pInstance );
        return NULL;
      }
    #endif
    
      uiPrintf( "Connected.\n" );
    
      int no = 0;
      // allow out-of-band data
      if ( setsockopt( pInstance->sAPIconnected, SOL_SOCKET, SO_OOBINLINE, &no,
          sizeof(int) ) == -1 )
      {
        perror( "setsockopt" );
        close( pInstance->sAPIconnected );
        freeaddrinfo( pInstance->resAddr );
        delSyncRes( pInstance );
        free( pInstance );
        return NULL;
      }
    
      // Set up asynchronous message handler before creating callback thread.
      pInstance->pfnAsyncMsgHandler = pFn;
    
      /****************************************************************************
       * Create thread which can read new messages from the serial interface server
       ****************************************************************************/
    
      if ( pthread_create( &pInstance->SISRThreadId, &attr, SISreadThreadFunc,
          pInstance ) )
      {
        // thread creation failed
        uiPrintf( "Failed to create RTIS LNX IPC Client read thread\n" );
        close( pInstance->sAPIconnected );
        freeaddrinfo( pInstance->resAddr );
        delSyncRes( pInstance );
        free( pInstance );
        return NULL;
      }
    
      /******************************************************************************
       * Create thread which can handle new messages from the serial interface server
       ******************************************************************************/
    
      if ( pthread_create( &pInstance->SISHThreadId, &attr, SIShandleThreadFunc,
          pInstance ) )
      {
        // thread creation failed
        uiPrintf( "Failed to create RTIS LNX IPC Client handle thread\n" );
        close( pInstance->sAPIconnected );
        pthread_join( pInstance->SISRThreadId, NULL );
        freeaddrinfo( pInstance->resAddr );
        delSyncRes( pInstance );
        free( pInstance );
        return NULL;
      }
    
      if ( getVer )
      {
        uint8 version[3];
        uint8 param[2];
    
        //Read Software Version.
        apicReadVersionReq( pInstance, version );
        uiPrintf( "Connected to Server v%d.%d.%d\n", version[0], version[1],
            version[2] );
    
        //Read Number of Active Connection Version.
        apicReadParamReq( pInstance, API_LNX_PARAM_NB_CONNECTIONS, 2, param );
        uiPrintf( "%d active connection , out of %d maximum connections\n", param[0],
            param[1] );
    
        //Check Which interface is used.
        apicReadParamReq( pInstance, API_LNX_PARAM_DEVICE_USED, 1, param );
    uiPrintf( "Interface used by server: %d (0 = UART, 1 = SPI, 2 = I2C)\n",
            param[0] );
      }
    
      return pInstance;
    }
    
    void apicInitializeConnectionInfp(apicHandle_t handle, char * remoteName, uint8 remoteLN)
    {
      char * localName = pAPIS_SysParams->serverName;
      uint8 localLN = pAPIS_SysParams->layerNum;
      char tmpstr[strlen(localName) + 1];
    
      set_connection_details(apicGetConnectionHandle(handle), remoteName, strlen(remoteName), remoteLN);
    
      strcpy(tmpstr + 1, localName);
      tmpstr[0] = localLN;
      apicSendAsynchData(handle, RPC_DUMMYSYS_CONNECTION_INFO, RPC_CMD_CONNTCTION_ID, sizeof(tmpstr), (uint8 *)tmpstr);
    }
    
    /*********************************************************************
     *
     * @fn          apicClose
     *
     * @brief       This function stops API client
     *
     * @param       handle - API client handle
     *
     * @return      None.
     *
     *********************************************************************/
    void apicClose( apicHandle_t handle )
    {
      apicInstance_t *pInstance = handle;
    
      if ( pInstance->appClosed )
      {
        // This function was called again.
        return;
      }
    
      // Not to notify application of connection close when application triggered
      // connection close, indicate that the application triggered close.
      pInstance->appClosed = TRUE;
    
      // Close the API client socket connection
      // For some reason, close() without shutdown() does not unblock
      // recv() in the receive thread when attached to a debugger or
      // when run from valgrind, though close() supposedly encompass
      // shutdown().
      shutdown( pInstance->sAPIconnected, SHUT_RDWR );
      close( pInstance->sAPIconnected );
    
      // Join receive thread and callback thread.
      // It is important to wait till those threads are closed
      // since, otherwise, the threads will access released resources.
      pthread_join( pInstance->SISRThreadId, NULL );
    
      if ( pthread_self() == pInstance->SISHThreadId )
      {
        // To prevent deadlock, join() is not called from the same
        // context, but we know that SIShandleThreadFunc()
        // must no longer be accessing the pInstance variable
        pInstance->freePending = TRUE;
      }
      else
      {
        pthread_join( pInstance->SISHThreadId, NULL );
      }
    
      // Delete synchronization resources
      delSyncRes( pInstance );
    
    #ifndef NPI_UNIX
      freeaddrinfo( pInstance->resAddr ); // free the linked-list
    #endif //NPI_UNIX
      if ( !pInstance->freePending )
      {
        // Free the memory block for the instance
        free( pInstance );
      }
    }
    
    /*********************************************************************
     *
     * @fn          apicSendSynchData
     *
     * @brief       This function sends a message synchronously over the socket
     *
     * @param       handle - API client handle returned from apicInit()
     *
     * @param       subSys - Subsystem ID
     *
     * @param       cmdId - Command ID
     *
     * @param       len - length in bytes of the message payload to send
     *
     * @param       pData - payload message to send
     *
     * @param       pRxSubSys - pointer to a buffer to store the subsystem ID
     *                      of the received response message.
     *                      This parameter may be NULL.
     *
     * @param       pRxCmdId - pointer to a buffer to store the command ID
     *                      of the received response message.
     *                      This parameter may be NULL.
     *
     * @param       pRxLen - pointer to a buffer to store the received
     *                      response message payload length.
     *                      This parameter may be NULL.
     *
     * @return      Pointer to response message payload or NULL when failed.
     *              The caller has to call apicFreeSynchData() to free
     *              the returned memory block once it is done with the received
     *              response message unless this function returned NULL.
     *              Note that even if the value pRxLen points to is zero,
     *              this function shall return a valid address that needs to
     *              be freed, as long as the function succeeded.
     *
     *********************************************************************/
    uint8 *apicSendSynchData( apicHandle_t handle, uint8 subSys, uint8 cmdId,
                              uint16 len, const uint8 *pData, uint8 *pRxSubSys,
                              uint8 *pRxCmdId, uint16 *pRxLen )
    {
      int result = 0;
      struct timespec expirytime;
      struct timeval curtime;
      ssize_t n;
      uint8 *ptr;
      areqMsg *rspMsg = NULL;
      apicMsgHdr_t *hdr;
      apicInstance_t *pInstance = handle;
      uint8 * returnValue = NULL;
      visualization_args_t visualization_args;
    
    #ifdef API_CLIENT_8BIT_LEN
      if (len > 255)
    #else // API_CLIENT_8BIT_LEN
      if ( len == 0xFFFFu )
    #endif // API_CLIENT_8BIT_LEN
      {
        uiPrintfEx( trERROR, "apicSendSynchData failed due to excessive length\n" );
        return NULL;
      }
    
      hdr = malloc( sizeof(apicMsgHdr_t) + len );
      if ( !hdr )
      {
        return NULL;
      }
      hdr->subSys = subSys & RPC_SUBSYSTEM_MASK;
      if ( (subSys & RPC_CMD_TYPE_MASK) == 0 )
      {
        hdr->subSys |= RPC_CMD_SREQ;
      }
      hdr->cmdId = cmdId;
    #ifdef API_CLIENT_8BIT_LEN
      hdr->len = (uint8) len;
    #else // API_CLIENT_8BIT_LEN
      // Length field endianness conversion to little endian
      hdr->lenL = (uint8) len;
      hdr->lenH = (uint8)( len >> 8 );
    #endif // API_CLIENT_8BIT_LEN
      memcpy( hdr + 1, pData, len );
    
      uiPrintfEx(trINFO, "preparing to send %d bytes, subSys 0x%.2X, cmdId 0x%.2X, pData:\n",
          len,
          subSys,
          cmdId );
    
    
      // Lock mutexes
      uiPrintfEx(trINFO, "[MUTEX] Lock SRSP Transaction Mutex\n" );
      if ( pthread_mutex_lock( &pInstance->clientSREQRSPmutex ) != 0 )
      {
        perror( "pthread_mutex_lock" );
        exit( 1 );
      }
    
      len += sizeof(*hdr);
      ptr = (uint8 *) hdr;
      if ( pthread_mutex_lock( &pInstance->sendMutex ) != 0 )
      {
        perror( "pthread_mutex_lock" );
        exit( 1 );
      }
    
      visualization_args.localLN = pAPIS_SysParams->layerNum;
      visualization_args.localName = pAPIS_SysParams->serverName;
      visualization_args.remoteLN = get_connection_layer_number(pInstance->sAPIconnected);
      visualization_args.remoteName = get_connection_name(pInstance->sAPIconnected);
      visualization_args.directionSend = TRUE;
      visualization_args.max_layers = INVALID_LAYER_NUMBER;
    
      trace_print_buf(ptr, len, "[SREQ] ", &visualization_args);
      print_pb_msg(hdr->subSys, hdr->cmdId, ptr + sizeof(apicMsgHdr_t), len - sizeof(apicMsgHdr_t), 0, "", FALSE, &visualization_args, sizeof(apicMsgHdr_t), "[SREQ] ");
    
      for ( ;; )
      {
        n = send( pInstance->sAPIconnected, ptr, len, 0 );
        if ( n == -1 )
        {
          perror( "send" );
          pthread_mutex_unlock( &pInstance->sendMutex );
          free( hdr );
          pthread_mutex_unlock( &pInstance->clientSREQRSPmutex );
          return NULL;
        }
        if ( n < len )
        {
          ptr += n;
          len -= n;
          // Repeat till entire frame is sent out.
          continue;
        }
        break;
      }
      pthread_mutex_unlock( &pInstance->sendMutex );
      free( hdr );
    
      uiPrintfEx(trINFO, "Waiting for synchronous response...\n" );
      // Conditional wait for the response handled in the receiving thread,
      // wait maximum 2 seconds
      gettimeofday( &curtime, NULL );
      expirytime.tv_sec = curtime.tv_sec + 2;
      expirytime.tv_nsec = curtime.tv_usec * 1000;
    
      uiPrintfEx(trINFO, "[MUTEX] Lock SRSP Mutex\n" );
      if ( pthread_mutex_lock( &pInstance->clientSREQmutex ) != 0 )
      {
        perror( "pthread_mutex_lock" );
        exit( 1 );
      }
      
      uiPrintfEx(trINFO, "[MUTEX] Wait for SRSP Cond signal...\n" );
      while ((result == 0) && (pInstance->numOfReceivedSRSPbytes == 0))
      {
        result = pthread_cond_timedwait( &pInstance->clientSREQcond, &pInstance->clientSREQmutex, &expirytime );
      }
    
      // Wait for response
      if (result == 0)
      {
        if ( pInstance->numOfReceivedSRSPbytes > 0 )
        {
          // Copy response back in transmission buffer for processing
          rspMsg = pInstance->srsp_msg;
    
          // Clear the response queue
          pInstance->srsp_msg = NULL;
        }
        else if ( pInstance->numOfReceivedSRSPbytes == -1 ) // TODO: handle a client disconnection while waiting for the SRSP (in that case, lock pInstance->clientSREQmutex, set pInstance->numOfReceivedSRSPbytes = -1, and signal pInstance->clientSREQcond).
        {
          uiPrintf( "Server closed connection\n" );
        }
      }
      else if ( result == ETIMEDOUT )
      {
        // TODO: Indicate synchronous transaction error
        uiPrintfEx(trINFO, "[MUTEX] SRSP Cond Wait timed out!\n" );
        uiPrintfEx( trERROR, "SRSP Cond Wait timed out!\n" );
      }
      else
      {
        uiPrintfEx( trERROR, "SRSP Cond Wait returned %d\n", result );
      }
      
      pInstance->numOfReceivedSRSPbytes = 0;
    
      // Now unlock the mutexes before returning
      uiPrintfEx(trINFO, "[MUTEX] Unlock SRSP Mutex\n" );
      pthread_mutex_unlock( &pInstance->clientSREQmutex );
      pthread_mutex_unlock( &pInstance->clientSREQRSPmutex );
    
      if ( rspMsg )
      {
        if ( pRxSubSys )
        {
          *pRxSubSys = rspMsg->subSys;
        }
        if ( pRxCmdId )
        {
          *pRxCmdId = rspMsg->cmdId;
        }
        if ( pRxLen )
        {
          *pRxLen = rspMsg->len;
        }
        returnValue = (uint8 *) (rspMsg + 1);
      }
      else
      {
        uiPrintfEx( trERROR, "apicSendSynchData() failed getting response\n");
      }
    
      return returnValue;
    }
    
    /*********************************************************************
     *
     * @fn          apicFreeSynchData
     *
     * @brief       This function returns the received synchronous response
     *              data back to the memory pool.
     *
     * @param       pData - pointer to the memory block that were returned
     *                      from apicSendSynchData.
     *
     * @return      None
     *
     *********************************************************************/
    void apicFreeSynchData( uint8 *pData )
    {
      free( ((areqMsg *) pData) - 1 );
    }
    
    /*********************************************************************
     *
     * @fn          apicSendAsynchData
     *
     * @brief       This function sends a message asynchronously over the 
     *              socket
     *
     * @param       handle - API client handle
     *
     * @param       subSys - Subsystem ID
     *
     * @param       cmdId - Command ID
     *
     * @param       len - length in bytes of the message payload to send
     *
     * @param       pData - payload message to send
     *
     * @return      None.
     *
     *********************************************************************/
    void apicSendAsynchData( apicHandle_t handle, uint8 subSys, uint8 cmdId,
                             uint16 len, uint8 *pData )
    {
      ssize_t n; // send() returns ssize_t (the synchronous variant already uses ssize_t)
      uint8 *ptr;
      apicMsgHdr_t *hdr;
      apicInstance_t *pInstance = handle;
      visualization_args_t visualization_args;
    
    #ifdef API_CLIENT_8BIT_LEN
      if (len > 255)
      {
        uiPrintf( "[ERR] apicSendAsynchData called with excessive length %d\n",
            len );
        return;
      }
    #endif // API_CLIENT_8BIT_LEN
      hdr = malloc( sizeof(apicMsgHdr_t) + len );
      if ( !hdr )
      {
        uiPrintf( "[ERR] apicSendAsynchData failed malloc()\n" );
        return;
      }
    
      // Add Proper RPC type to header
      hdr->subSys = subSys & RPC_SUBSYSTEM_MASK;
    
      if ( (subSys & RPC_CMD_TYPE_MASK) == 0 )
      {
        hdr->subSys |= RPC_CMD_AREQ;
      }
      hdr->cmdId = cmdId;
    #ifdef API_CLIENT_8BIT_LEN
      hdr->len = len;
    #else // API_CLIENT_8BIT_LEN
      // Endianness conversion
      hdr->lenL = (uint8) len;
      hdr->lenH = (uint8)( len >> 8 );
    #endif // API_CLIENT_8BIT_LEN
      memcpy( hdr + 1, pData, len );
    
      uiPrintfEx(trINFO, "trying to send %d bytes, subSys 0x%.2X, cmdId 0x%.2X\n",
          len,
          subSys,
          cmdId );
    
    
      ptr = (uint8 *) hdr;
      len += sizeof(apicMsgHdr_t);
      if ( pthread_mutex_lock( &pInstance->sendMutex ) != 0 )
      {
        perror( "pthread_mutex_lock" );
        exit( 1 );
      }
    
      visualization_args.localLN = pAPIS_SysParams->layerNum;
      visualization_args.localName = pAPIS_SysParams->serverName;
      visualization_args.remoteLN = get_connection_layer_number(pInstance->sAPIconnected);
      visualization_args.remoteName = get_connection_name(pInstance->sAPIconnected);
      visualization_args.directionSend = TRUE;
      visualization_args.max_layers = INVALID_LAYER_NUMBER;
      
      trace_print_buf(ptr, len, "[AREQ] ", &visualization_args);
      print_pb_msg(hdr->subSys, hdr->cmdId, ptr + sizeof(apicMsgHdr_t), len - sizeof(apicMsgHdr_t), 0, "", FALSE, &visualization_args, sizeof(apicMsgHdr_t), "[AREQ] ");
    
      for ( ;; )
      {
        n = send( pInstance->sAPIconnected, ptr, len, 0 );
        if ( n == -1 )
        {
          perror( "send" );
        }
        else if ( n < len )
        {
          len -= n;
          ptr += n;
          // Repeat till all bytes of the message are sent.
          continue;
        }
        break;
      }
      pthread_mutex_unlock( &pInstance->sendMutex );
      free( hdr );
    }
    
    /*********************************************************************
     *
     * @fn          apicReadVersionReq
     *
     * @brief       This API is used to read the serial interface server 
     *              version.
     *
     * @param       handle - API client handle
     *
     * @param       pValue - Pointer to a buffer (at least 3 bytes) where the version is stored
     *
     * @return      None.
     *
     *********************************************************************/
    void apicReadVersionReq( apicHandle_t handle, uint8 *pValue )
    {
      uint8 *pRsp;
    
      // Send Read Version Request
      pRsp = apicSendSynchData( handle, RPC_SYS_SRV_CTRL,
          API_LNX_CMD_ID_VERSION_REQ, 0, NULL, NULL, NULL, NULL );
    
      if ( pRsp )
      {
        // copy the reply data to the client's buffer
        // Note: the first byte of the payload is reserved for the status
        memcpy( pValue, &pRsp[1], 3 );
        apicFreeSynchData( pRsp );
      }
    }
    
    /*********************************************************************
     *
     * @fn          apicReadParamReq
     *
     * @brief       This API is used to read serial interface server parameters.
     *
     * @param       handle - API client handle
     * @param       paramId - The parameter item identifier.
     * @param       len - The length in bytes of the item identifier's data.
     * @param       *pValue - Pointer to buffer where read data is placed.
     *
     * @return      None.
     *
     *********************************************************************/
    void apicReadParamReq( apicHandle_t handle, uint8 paramId, uint8 len,
                           uint8 *pValue )
    {
      uint8 *pRsp, reqdata[] =
      { paramId, len };
    
      // Send param request
      pRsp = apicSendSynchData( handle, RPC_SYS_SRV_CTRL,
          API_LNX_CMD_ID_GET_PARAM_REQ, 2, reqdata, NULL, NULL, NULL );
    
      if ( pRsp )
      {
        // copy the reply data to the client's buffer
        // Note: the first byte of the payload is reserved for the status
        memcpy( pValue, &pRsp[1], len );
        apicFreeSynchData( pRsp );
      }
    }
    
    /* Initialize thread synchronization resources */
    static void initSyncRes( apicInstance_t *pInstance )
    {
      // initialize all mutexes
      pthread_mutex_init( &pInstance->clientSREQRSPmutex, NULL );
      pthread_mutex_init( &pInstance->clientSREQmutex, NULL );
      pthread_mutex_init( &pInstance->clientAREQmutex, NULL );
      pthread_mutex_init( &pInstance->sendMutex, NULL );
    
      // initialize all conditional variables
      pthread_cond_init( &pInstance->clientSREQcond, NULL );
      if ( sem_init( &pInstance->clientAREQsem, 0, 0 ) != 0 )
      {
        uiPrintf( "[ERR] sem_init() failed\n" );
        exit( 1 );
      }
    }
    
    /* Destroy thread synchronization resources */
    static void delSyncRes( apicInstance_t *pInstance )
    {
      // destroy all conditional variables
      pthread_cond_destroy( &pInstance->clientSREQcond );
      sem_destroy( &pInstance->clientAREQsem );
    
      // destroy all mutexes
      pthread_mutex_destroy( &pInstance->clientSREQRSPmutex );
      pthread_mutex_destroy( &pInstance->clientSREQmutex );
      pthread_mutex_destroy( &pInstance->clientAREQmutex );
      pthread_mutex_destroy( &pInstance->sendMutex );
    
    }
    
    /*********************************************************************
     *
     * @fn          SISreadThreadFunc
     *
     * @brief       Read Thread
     *
     * @param       ptr - pointer to instance
     *
     * @return      pointer to instance
     *
     *********************************************************************/
    static void *SISreadThreadFunc( void *ptr )
    {
      int done = 0, n;
      apicMsgHdr_t hdrbuf;
      apicInstance_t *pInstance = ptr;
      char trace_prefix[35];
      visualization_args_t visualization_args;
    
      trace_init_thread("READ");
      /* thread loop */
    
      // Read from socket
      do
      {
        // Normal data
        do
        {
          n = recv( pInstance->sAPIconnected, &hdrbuf, sizeof(hdrbuf), MSG_WAITALL );
        }  while ( n < 0   &&   errno == EINTR);
    
        if ( n <= 0 )
        {
          if ( n < 0 )
          {
            perror( "recv" );
          }
          else
          {
            uiPrintfEx(trINFO, "Peer closed connection\n" );
          }
          done = 1;
        }
        else if ( n == sizeof(hdrbuf) )
        {
          size_t len;
          areqMsg *pMsg;
    
          // We have received the header,
          // now read out length bytes and process it,
          // if there are bytes to receive.
    #ifdef API_CLIENT_8BIT_LEN
          len = hdrbuf.len;
    #else // API_CLIENT_8BIT_LEN
          // Convert little endian length to host endianness
          len = hdrbuf.lenL;
          len |= (uint16) hdrbuf.lenH << 8;
    #endif // API_CLIENT_8BIT_LEN
          pMsg = malloc( sizeof(areqMsg) + len );
          if ( pMsg )
          {
            pMsg->nextMessage = NULL;
            pMsg->len = len;
            pMsg->subSys = hdrbuf.subSys;
            pMsg->cmdId = hdrbuf.cmdId;
    
            if ( len > 0 )
            {
              do
              {
                n = recv( pInstance->sAPIconnected, pMsg + 1, len, MSG_WAITALL );
              }  while ( n < 0   &&   errno == EINTR);
            }
            else
            {
              // There are no payload bytes; which is also valid.
              n = 0;
            }
    
            if ( n == pMsg->len )
            {
              uiPrintfEx(trINFO, "Received %d bytes, subSys 0x%.2X, cmdId 0x%.2X\n",
                  pMsg->len, pMsg->subSys, pMsg->cmdId );
    
    
    #ifdef API_CLIENT_8BIT_LEN
              STRING_START(trace_prefix,"[read] %02X:%02X:%02X:", hdrbuf.len, hdrbuf.subSys, hdrbuf.cmdId);
    #else
              STRING_START(trace_prefix,"[read] %02X:%02X:%02X:%02X:", hdrbuf.lenL, hdrbuf.lenH, hdrbuf.subSys, hdrbuf.cmdId);
    #endif
    
              visualization_args.localLN = pAPIS_SysParams->layerNum;
              visualization_args.localName = pAPIS_SysParams->serverName;
              visualization_args.remoteLN = get_connection_layer_number(pInstance->sAPIconnected);
              visualization_args.remoteName = get_connection_name(pInstance->sAPIconnected);
              visualization_args.directionSend = FALSE;
              visualization_args.max_layers = INVALID_LAYER_NUMBER;
    
              trace_print_buf(((uint8 *)(&pMsg[1])), n, trace_prefix, &visualization_args);
              print_pb_msg(pMsg->subSys, pMsg->cmdId, ((uint8 *)(&pMsg[1])), n, 0, "", FALSE, &visualization_args, sizeof(apicMsgHdr_t), "[read] ");
    
              if ( (pMsg->subSys & RPC_CMD_TYPE_MASK) == RPC_CMD_SRSP )
              {
                // and signal the synchronous reception
                uiPrintfEx(trINFO, "[MUTEX] SRSP Cond signal set\n" );
                uiPrintfEx(trINFO, "Client Read: (len %d):\n", pMsg->len + sizeof(hdrbuf) );
                fflush( stdout );
    
                if ( pthread_mutex_lock( &pInstance->clientSREQmutex ) != 0 )
                {
                  uiPrintf( "[ERR] Mutex lock failed while handling SRSP\n" );
                  exit( 1 );
                }
    
                if ( pInstance->srsp_msg )
                {
                  // Unhandled SRSP message must be freed
                  uiPrintfEx( trWARNING, "[ERR] Unhandled SRSP cleared\n" );
                  free( pInstance->srsp_msg );
                }
                pInstance->srsp_msg = pMsg;
                pInstance->numOfReceivedSRSPbytes = pMsg->len + sizeof(hdrbuf);
                pthread_cond_signal( &pInstance->clientSREQcond );
                pthread_mutex_unlock( &pInstance->clientSREQmutex );
              }
              else if ( (pMsg->subSys & RPC_CMD_TYPE_MASK) == RPC_CMD_AREQ )
              {
                uiPrintfEx(trINFO, "RPC_CMD_AREQ cmdId: 0x%.2X\n", pMsg->cmdId );
    
                pInstance->areqRxMsgCount++;
    
                uiPrintfEx(trINFO, "[DBG] Allocated \t@ 0x%08X"
                    " (received\040 %d messages)...\n",
                    (unsigned int)pMsg,
                    pInstance->areqRxMsgCount );
    
                uiPrintfEx(trINFO, "Filling new message (@ 0x%08X)...\n",
                    (unsigned int)pMsg );
    
                if ( pthread_mutex_lock( &pInstance->clientAREQmutex ) != 0 )
                {
                  uiPrintf( "[ERR] pthread_mutex_lock() failed"
                      " while processing AREQ\n" );
                  exit( 1 );
                }
    
                // Place message in read list
                if ( pInstance->areq_rec_buf == NULL )
                {
                  // First message in list
                  pInstance->areq_rec_buf = pMsg;
                }
                else
                {
                  areqMsg *searchList = pInstance->areq_rec_buf;
    
                  // Find last entry and place it here
                  while ( searchList->nextMessage != NULL )
                  {
                    searchList = searchList->nextMessage;
                  }
                  searchList->nextMessage = pMsg;
                }
                pthread_mutex_unlock( &pInstance->clientAREQmutex );
                // Flag semaphore
                sem_post( &pInstance->clientAREQsem );
              }
              else
              {
                // Cannot handle synchronous requests from RNP
                uiPrintf( "ERR: Received SREQ\n" );
                free( pMsg );
              }
            }
            else
            {
              // Possible if the socket connection is gone in the middle
              uiPrintf( "[ERR] Connection lost in the middle of reception\n"
                  "- n:%d, len:%d\n", n, pMsg->len );
              free( pMsg );
            }
          }
        }
        else
        {
          // Possible if the socket connection is gone in the middle
          uiPrintf( "[ERR] Connection lost in the middle of header reception"
              " - n: %d\n", n );
          done = 1;
        }
    
      } while ( !done );
    
      // Flag semaphore to notify the callback thread that the receive
      // thread is terminated.
      pInstance->closed = TRUE;
      sem_post( &pInstance->clientAREQsem );
    
      return (ptr);
    }
    
    /*********************************************************************
     *
     * @fn          SIShandleThreadFunc
     *
     * @brief       handle Thread
     *
     * @param       ptr - pointer to instance
     *
     * @return      pointer to instance
     *
     *********************************************************************/
    static void *SIShandleThreadFunc( void *ptr )
    {
      int done = 0;
      apicInstance_t *pInstance = ptr;
      trace_init_thread("HNDL");
    
      // Handle message from socket
      do
      {
        int semresult;
    
        uiPrintfEx(trINFO, "[MUTEX] Wait for AREQ semaphore\n" );
    
        do
        {
          semresult = sem_wait( &pInstance->clientAREQsem );
          /* Repeat while interrupt by signal */
        } while ( semresult != 0 && errno == EINTR );
    
        if ( semresult != 0 )
        {
          uiPrintf( "[ERR] sem_wait() for AREQ receive failed\n" );
          exit( 1 );
        }
    
        if ( pthread_mutex_lock( &pInstance->clientAREQmutex ) != 0 )
        {
          uiPrintf( "[ERR] pthread_mutex_lock() for AREQ receive failed\n" );
          exit( 1 );
        }
    
        // Walk through all received AREQ messages before releasing MUTEX
        areqMsg *searchList = pInstance->areq_rec_buf;
    
        // Note that some may think of the following statement
        //   areq_rec_buf = NULL;
        // here and process searchList as the entire linked list
        // without waiting for semaphore to reduce the sem_wait() calls
        // during callbacks.
        // However, it increases the risk of semaphore counter overflow
        // and it does not actually reduce the number of sem_wait() calls.
        if ( searchList )
        {
          pInstance->areq_rec_buf = searchList->nextMessage;
        }
    
        // Note that the callback calls must not hold the receive thread
        // hostage and hence, the thread is freed here.
        pthread_mutex_unlock( &pInstance->clientAREQmutex );
    
        uiPrintfEx(trINFO, "[MUTEX] Mutex for AREQ unlocked\n" );
    
        if ( searchList != NULL )
        {
          uiPrintfEx(trINFO, "[DBG] Processing \t@ 0x%08X\n",
              (unsigned int)searchList );
    
          // Must remove command type before calling callback function
          searchList->subSys &= ~(RPC_CMD_TYPE_MASK);
    
          uiPrintfEx(trINFO, "[MUTEX] AREQ Calling asynchMsgCback (Handle)...\n" );
    
          asynchMsgCback( pInstance, searchList );
    
          if ( pInstance->freePending )
          {
            // application must have called apicClose() from within the callback
            free( pInstance );
            return NULL;
          }
    
          uiPrintfEx(trINFO, "[MUTEX] AREQ (Handle) (message @ 0x%08X)...\n",
              (unsigned int)searchList );
    
          pInstance->areqProcMsgCount++;
    
          uiPrintfEx(trINFO, "[DBG] Clearing \t\t@ 0x%08X"
              " (processed %d messages)...\n",
              (unsigned int) searchList,
              pInstance->areqProcMsgCount );
    
          free( searchList );
        }
        else if ( pInstance->closed )
        {
          if ( !pInstance->appClosed && pInstance->pfnAsyncMsgHandler )
          {
            // Notify that the connection was closed when application did not
            // trigger close.
            pInstance->pfnAsyncMsgHandler( pInstance, 0, 0, 0xFFFFu, NULL );
            if ( pInstance->freePending )
            {
              // apicClose() must have been called from within the callback.
              // free the instance.
              free( pInstance );
              return NULL;
            }
          }
          done = TRUE;
        }
    
      } while ( !done );
    
      return (ptr);
    }
    
    /*********************************************************************
     * @fn          asynchMsgCback
     *
     * @brief       This function is an API callback to the client that
     *              indicates an asynchronous message has been received.
     *              The client software is expected to complete this call.
     *
     *              Note: The client must copy this message if it requires
     *                    it beyond the context of this call.
     *
     * input parameters
     *
     * @param       pInstance - API client instance
     * @param       *pMsg - A pointer to an asynchronously received message.
     *
     * output parameters
     *
     * None.
     *
     * @return      None.
     *********************************************************************/
    static int asynchMsgCback( apicInstance_t *pInstance, areqMsg *pMsg )
    {
      if ( pMsg )
      {
        if ( pInstance->pfnAsyncMsgHandler )
        {
          uiPrintfEx(trINFO, "[DBG] asyncCB: subSys:0x%08X, cmdId:0x%08X, len:0x%08X, pData:0x%08X\n",
              (unsigned int)pMsg->subSys,
              (unsigned int)pMsg->cmdId,
              (unsigned int)pMsg->len,
              (unsigned int)(pMsg + 1) );
          pInstance->pfnAsyncMsgHandler( pInstance, pMsg->subSys, pMsg->cmdId,
              pMsg->len, (uint8 *) (pMsg + 1) );
        }
      }
    
      return (0);
    }
    

  • This time it looks like the NPI server catches this issue (C:\ti\Zigbee_3_0_Linux_Gateway_1_0_1\source\Projects\zstack\linux\RemoTI-Linux-master\Projects\tools\LinuxHost\ipclib\server\npi_lnx_ipc.c).

    A similar concept can be applied here to the recv() calls.

    int NPI_LNX_IPC_ConnectionHandle(int connection)
    {
    //...
    
    	do {
    		n = recv(connection, npi_ipc_buf[0], RPC_FRAME_HDR_SZ, 0);
    	} while(n == -1 && errno == EINTR);
    
    //...
    			do {
    				n = recv(connection, (uint8*) &npi_ipc_buf[0][RPC_FRAME_HDR_SZ], ((npiMsgData_t *) npi_ipc_buf[0])->len , 0);
    			} while(n == -1 && errno == EINTR);
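The same retry would also cover the send() loops in apicSendSynchData()/apicSendAsynchData(), since send() can fail with EINTR too. A sketch of what that could look like (send_all is a hypothetical helper, not part of the gateway sources):

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sketch: send an entire buffer, retrying on EINTR and continuing
 * after partial sends, mirroring the recv() retry loops above. */
static ssize_t send_all( int fd, const unsigned char *buf, size_t len )
{
  size_t sent = 0;

  while ( sent < len )
  {
    ssize_t n = send( fd, buf + sent, len - sent, 0 );
    if ( n == -1 )
    {
      if ( errno == EINTR )
      {
        continue; // interrupted by a signal: retry instead of failing
      }
      return -1; // genuine socket error
    }
    sent += (size_t) n;
  }
  return (ssize_t) sent;
}
```

The caller would keep the existing sendMutex locking and error cleanup around the call.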

  • Hi Toby,

    Unfortunately, this resolves the segfault but not the problem. Actually, my impression is that, since introducing the while loop in znp_zdo, the error now happens even earlier. Attached one more log.

    Any chance to resolve this issue within a reasonable time? Otherwise, any suggestion on how to implement a "soft restart" which would only restart that specific thread, and not the whole server, once the SRSP timeout has been reached?

    Regards
    Peter

    zigbee1 (2).log

  • Apologies for the delay.

    Looking at this most recent log, there is a similar error which causes the NPI server to exit:

    ERR:: Interrupted system call
    [ERR] npi_ipc_errno 0x02030100
    [10:01:21.687,265] [NPISRVR/U_RX] PKT_HEX: [SOCZIGB>>NPISRVR ] [ASNC] 03:44:80:00:01:8E
    [10:01:21.687,385] [NPISRVR/ACBK] PKT_HEX: [ NPISRVR>>Z_STACK ] [bcst] 03:44:80:00:01:8E
    [10:01:21.687,799] [Z_STACK/MAIN] PKTTYPE: [ Z_STACK>>>>>>>>>>>GATEWAY ] afDataConfirmInd
    [10:01:21.687,863] [Z_STACK/MAIN] PKTBODY: cmdID = <NOT_FOUND>
    [10:01:21.687,904] [Z_STACK/MAIN] PKTBODY: status = ZSuccess
    [10:01:21.687,944] [Z_STACK/MAIN] PKTBODY: endpoint = 0x00000001 (1)
    [10:01:21.687,985] [Z_STACK/MAIN] PKTBODY: transID = 0x0000008E (142)
    Exit socket while loop
    pid 692 is not there

    Close to top of log, we see NPI_PID=692.

    The npi_ipc_errno 0x02030100 is defined in npi_lnx_error.h: #define NPI_LNX_ERROR_UART_SEND_FRAME_FAILED_TO_WRITE 0x02030100

    Later we see the part of log you've screenshot:

    [10:01:30.477,901] [Z_STACK/LSTN] PKTTYPE: [ NPISRVR<<Z_STACK ] [SREQ] 04:25:45:C0:4D:00:05
    [10:01:32.478,236] [Z_STACK/LSTN] ERROR : SRSP Cond Wait timed out!
    [10:01:32.478,333] [Z_STACK/LSTN] ERROR : apicSendSynchData() failed getting response

    At this point the NPISRVR is not active, and we see that the zigbeeHAgw script is attempting to restart all the servers:

    resetting ZigBee SoC
    ===================================================
    starting NPI, cmd ' ./NPI_lnx_arm_server NPI_Gateway.cfg -v 0x0000010E ' on Thu Nov 5 10:13:17 CET 2020
    [10:13:17.518,404] [NPISRVR/MAIN] UNMSKBL:
    [10:13:17.518,855] [NPISRVR/MAIN] UNMSKBL: ************************************************
    [10:13:17.518,887] [NPISRVR/MAIN] UNMSKBL: * NPI Server v1.0.2d *

    After this, it looks like the communication has been restored (e.g. there are GW_ATTRIBUTE_REPORTING_IND messages).

    Based on these, there are two options:

    1. Restart the specific server (in this case NPI). This would involve changes to other_server_died() (in zigbeeHAgw, which currently stops and then restarts all the servers), and also to the track_servers script, to specify which PID died.
    2. The "Interrupted system call" can be fixed, using similar methods as before (repeating a blocking operation if EINTR is detected, such as for npi_write()).

  • Toby Pan said:

    2. The "Interrupted system call" can be fixed, using similar methods as before (repeating a blocking operation if EINTR is detected, such as for npi_write()).

    Hi Toby,

    I have added the part you mentioned in 2 to npi_lnx_uart.c and it has been working fine for more than a week now, so I suppose it is resolved:

    	pthread_mutex_lock(&npi_write_mutex);
    + 	do { 
    		result = write(npi_fd, buf, count);
    + 	} while ( result < 0   &&   errno == EINTR);
    	pthread_mutex_unlock(&npi_write_mutex);
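    As a side note, write() on the UART fd can also return a short count, so a fuller variant of this retry would loop on partial writes as well. A sketch (write_all is a hypothetical helper, not from npi_lnx_uart.c):

```c
#include <errno.h>
#include <unistd.h>

/* Sketch: write the whole buffer, retrying on EINTR and
 * continuing after partial writes. */
static ssize_t write_all( int fd, const void *buf, size_t count )
{
  const unsigned char *p = buf;
  size_t written = 0;

  while ( written < count )
  {
    ssize_t n = write( fd, p + written, count - written );
    if ( n == -1 )
    {
      if ( errno == EINTR )
      {
        continue; // interrupted by a signal: retry
      }
      return -1; // genuine I/O error
    }
    written += (size_t) n;
  }
  return (ssize_t) written;
}
```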

    Maybe you can incorporate this fix, together with the changes from your previous answers, into your version control so that it will not be necessary to follow up this fix in the next version...

    Regards
    Peter