CC2538: Why does the "SRSP Cond Wait timed out!" condition occur in the Zigbee Linux Gateway 3.0 application?

Part Number: CC2538
Other Parts Discussed in Thread: Z-STACK, Z-STACK-ARCHIVE

Hello E2E community,

I am using a CC2538 module with a BeagleBone Black (BBB). So far I am able to run the ./start_application and ./zigbeeHAgw bbb applications successfully.

When I send multiple commands to the gateway server, the first 40-50 commands work successfully,

but after that the zigbeeHAgw server gets stuck and becomes slower.

When I go through the console log, I find that an error is printed continuously on the console:

[08:05:27.616,978] [Z_STACK/LSTN] ERROR  : SRSP Cond Wait timed out!
[08:05:27.617,093] [Z_STACK/LSTN] ERROR  : apicSendSynchData() failed getting response



Could you please help me understand what is going on here?
How can I resolve this error?


Thanks & Regards,

Shiv Patil.

  • I encountered the same problem a few years ago when I planned to develop with the Zigbee Linux gateway. I had to give up later and use other solutions instead. TI's Zigbee Linux gateway was released in 2014 and has not been maintained since. It has too many problems, for example memory leaks, processes crashing unexpectedly, UART communication timeouts, and so on.
  • I am using the newer version of the gateway application, i.e. Zigbee Linux Gateway application 3.0.
    Has TI resolved these problems in the new Zigbee Linux Gateway application 3.0?
    How do we solve this? Because of this issue the whole system slows down, which is not acceptable in real-time use.
  • Hi,

    Is the CC2538 connected to SmartRF06?

    If so, having the serial communication go through the XDS100v3 chip adds a little delay to the communication.

    You can try introducing a longer SRSP timeout in apicSendSynchData (located in <gateway install>/source/Projects/zstack/linux/srvwrapper/api_client.c):

      expirytime.tv_sec = curtime.tv_sec + 2; // increase timeout
      expirytime.tv_nsec = curtime.tv_usec * 1000;
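
    For readers unfamiliar with the pattern, the sketch below shows how an absolute expiry time like the one above is typically used with pthread_cond_timedwait. It is not the code from api_client.c: the srsp_waiter_t type and the srsp_waiter_* functions are hypothetical names made up for illustration, and it uses clock_gettime where the snippet above appears to use gettimeofday. It also normalizes tv_nsec so it stays below one second.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical state shared between the thread that sends a synchronous
 * request and the thread that receives the matching response (SRSP). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            response_ready;
} srsp_waiter_t;

void srsp_waiter_init(srsp_waiter_t *w)
{
    assert(w != NULL);
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);   /* default attributes: CLOCK_REALTIME */
    w->response_ready = false;
}

/* Called by the receiving side when a response has arrived. */
void srsp_waiter_signal(srsp_waiter_t *w)
{
    pthread_mutex_lock(&w->lock);
    w->response_ready = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Waits up to timeout_ms for a response.
 * Returns 0 if one arrived, otherwise the error from
 * pthread_cond_timedwait (normally ETIMEDOUT). */
int srsp_waiter_wait(srsp_waiter_t *w, long timeout_ms)
{
    struct timespec expiry;
    int rc = 0;
    bool got_response;

    /* Build an absolute deadline and keep tv_nsec within [0, 1e9). */
    clock_gettime(CLOCK_REALTIME, &expiry);
    expiry.tv_sec  += timeout_ms / 1000;
    expiry.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (expiry.tv_nsec >= 1000000000L) {
        expiry.tv_sec  += 1;
        expiry.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&w->lock);
    /* Loop to tolerate spurious wakeups; stop on timeout or error. */
    while (!w->response_ready && rc == 0)
        rc = pthread_cond_timedwait(&w->cond, &w->lock, &expiry);
    got_response = w->response_ready;
    w->response_ready = false;           /* consume the response */
    pthread_mutex_unlock(&w->lock);

    return got_response ? 0 : rc;
}
```

    Note that a longer timeout only changes how long the caller blocks when no response arrives. It does not make a missing response appear.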

    Within how much time are you sending the "multiple commands"?

    I would recommend evaluating the ZNP on a newer launchpad:

    - http://www.ti.com/tool/LAUNCHXL-CC1352R1
    - http://www.ti.com/tool/LAUNCHXL-CC26X2R1

    Use the hex image from building the ZNP project in the new SDK (as the CC13x2/26x2 ZNP images were built for Rev C launchpads).

    Regards,

    Toby

  • Dear Toby,

    We are not using an XDS emulator or any other interface. We built the ZNP hex file ourselves and flashed it on the CC2538, which is directly connected to the BeagleBone through UART at 115200 baud.

    Also, as we are using the CC2538, we would like to test and resolve the problem on the CC2538 instead of moving to other LaunchPads that use different chipsets.
  • Hi Toby,

    As per your suggestion I have made the following changes,

    install>/source/Projects/zstack/linux/srvwrapper/api_client.c):

    expirytime.tv_sec = curtime.tv_sec + 5; // timeout increased
    expirytime.tv_nsec = curtime.tv_usec * 1000;

    I did not find that it has any major effect on the application.
    I am sending requests to a device at a 50 ms interval.

    I am still getting the error mentioned above.

    I want to know:

    1. Is the gateway server buffering all incoming requests and processing them one by one?
    2. How do I fix this error whenever it arises?


    Thanks & Regards
    Shiv Patil.
  • Did you rebuild (setup.sh) after increasing the timeout?

    You may be sending requests too frequently to the server.

    When you increase from 2 to 5 seconds, are you able to send more than 40-50 commands successfully?
    What if you increased from 5 to 10 seconds?
  • Yes, we also tried SRSP timeout intervals from 5 to 10 seconds, but the problem is not related to the timeout.

    Suppose we set the SRSP timeout to 10 seconds.
    Once the SRSP timeout condition occurs, all subsequent commands also hit the SRSP timeout, and each returns a timeout status in the Generic Response Indication only after those 10 seconds.

    In real-time use this is not tolerable. If this condition occurs, the Zigbee gateway should restart the service or handle it gracefully.

    How can we handle this?
  • Does this still happen after you increased the baudrate here? e2e.ti.com/.../784775
  • Yes,

    it still occurs eventually.
    The "SRSP Cond Wait timed out!" error is still shown.
    Increasing the SRSP timeout interval did not resolve the problem either; it only slows down the gateway application further as we increase the timer value.

    While working on increasing the baud rate,
    I found that this condition occurs for each synchronous-response (SRSP) type of command sent to the ZNP device over the serial bus (i.e. UART):
    when the ZNP device does not respond to the gateway server, the server throws the error "SRSP Cond Wait timed out!".

    But I am not 100% sure about this,
    because I have a few questions.

    I have enabled the handshake by setting flowcontrol=1 in the NPI_Gateway.cfg file.

    The CTS & RTS pins are grounded on the ZNP device,

    because I was unable to find the CTS & RTS pins on the BeagleBone. There is no definition for them in the gateway application.

    1. Is this happening because I have not connected the CTS & RTS pins for handshaking between the ZNP and the gateway server?
               If yes,
               would you please let me know which pins are to be configured as CTS & RTS on the BeagleBone side?

    2. I think this is a critical problem in the gateway application. How can we handle it,
               so that even if it arises we can take immediate action to recover?

    Regards,

    Shiv Patil.

  • NPI_Gateway.cfg has devPath="/dev/ttyO4".
    According to beaglebone.cameon.net/.../serial-ports-uart, you can see the following mapping:
    - RX: P9_11
    - TX: P9_13
    - CTS: P8_35
    - RTS: P8_33

    Does the gateway recover after this error shows up?
    Can you provide the complete log for more context?
  • Hello Toby Pan,

    I was able to configure the CTS & RTS pins for hardware flow control on the ZNP device (CC2538) and the BeagleBone successfully.

    I have tested the Zigbee gateway application by sending commands to different devices over the network, and it is working fine.

    But I am still facing the problem "ERROR : SRSP Cond Wait timed out!".

    Does the gateway recover after this error shows up?

    Ans: It only recovers when we restart the gateway application, or when the gateway application restarts itself under certain conditions

                (i.e. when any one of the NWK / GATEWAY / OTA servers dies, the application triggers a restart by itself).

     

    Can you provide the complete log for more context?

    Ans: The attachment below will help you understand the problem.

    Line 56916 is the first line where I got this error. Up to that point I was able to receive responses for the actuation commands.
    Line 56936 was the last response from the ZNP device.

    Note: The gateway application now performs somewhat better than before (baud rate = 921600 and flow control enabled using CTS & RTS).

    If you search the logs for "ERROR : SRSP Cond Wait timed out!", you will see that this error is generated hundreds of times.

    Even though the system is in a working state, we have no control over it; the error is printed continually on the console.

    The major drawback is that it slows down the gateway application's responses according to the value of

    expirytime.tv_sec = curtime.tv_sec + 2; // in this case increasing the timeout delays the responses by the timeout interval
    expirytime.tv_nsec = curtime.tv_usec * 1000;
     
     
    Regards,
    Shiv Patil.
  • In line 56862, it looks like a socket recv fails: "recv: Interrupted system call"
    Perhaps one of the servers/clients is overloaded.

    For further debugging, can you rebuild the projects with -D__BIG_DEBUG__ ? This will provide additional output on the logs. Please provide the logs from that.

    In the following make files, you would add -D__BIG_DEBUG__ to the already existing DEFINES:
    - Zigbee_3_0_Linux_Gateway_1_0_0/source/Projects/zstack/linux/RemoTI-Linux-master/Projects/tools/LinuxHost/makefile
    - Zigbee_3_0_Linux_Gateway_1_0_0/source/Projects/zstack/linux/zstackserverznp/Makefile
    - Zigbee_3_0_Linux_Gateway_1_0_0/source/Projects/zstack/linux/hagateway/Makefile
    - Zigbee_3_0_Linux_Gateway_1_0_0/source/Projects/zstack/linux/nwkmgr/Makefile
    - Zigbee_3_0_Linux_Gateway_1_0_0/source/Projects/zstack/linux/otaserver/Makefile
  • To figure out what is happening in each server separately, you can also change the output directory of the debug logs to separate files.

    By default all output is written to /tmp/GW_SRVRS.out. This is specified in zigbeeHAgw

    NPI_OUT=/tmp/GW_SRVRS.out
    ZLS_OUT=/tmp/GW_SRVRS.out
    GWA_OUT=/tmp/GW_SRVRS.out
    OTA_OUT=/tmp/GW_SRVRS.out
    NWKMGR_OUT=/tmp/GW_SRVRS.out

    Using different names for the .out files will show you if it is the same server causing issues.

    Ultimately, I still recommend that you not send commands so frequently (50 ms).

  • Hello Toby Pan,

    I have tried both of the solutions you mentioned above.

    1. Enabling the compilation flag -D__BIG_DEBUG__ does not change the logs very much. I do get more information with it,
        but nothing additional related to "SRSP Cond Wait timed out!".

    2. Writing separate .out logs for each server component of the gateway also did not give me any information related to this
    problem.

    I also found something different:

    When we send multiple commands to a device rapidly, it goes offline after some time.

    Is the problem that the device does not respond, or are packets dropped on the ZNP or the gateway server because of its offline status?

    If the device goes offline, how do we bring it back online?

  • Do you still see "recv: Interrupted system call" on any of the logs?
    If yes, I think this is happening on one of the servers, and it'd be useful to figure out which one.

    What do you mean "it goes in offline mode" and "device goes offline":
    - does "it" refer to the ZNP or another Zigbee node?
    - what do you mean by "offline mode"?

    Please provide a sniffer log and specify the relevant packets.
  • Hello Toby Pan,

    I would like to answer your question.

    Q. What do you mean "it goes in offline mode" and "device goes offline"?

    Ans:

    If you look at the Zigbee gateway application code, you will find that there is a way to get the device network status.

    Its definition is in nwkmgr/nwkmgr.pb-c.h; the same definition also appears in serverpb/server.pb-c.h.

    typedef enum _NwkDeviceStatusT {
      /*
       * Device is off-line (non-responsive to service discovery)             // Here TI clearly mentioned that the device is in non-responsive mode.
       */
      NWK_DEVICE_STATUS_T__DEVICE_OFF_LINE = 0,
      /*
       * Status good
       */
      NWK_DEVICE_STATUS_T__DEVICE_ON_LINE = 1,
      /*
       * Device has been removed
       */
      NWK_DEVICE_STATUS_T__DEVICE_REMOVED = 2,
      /*
       * Not Applicable (this value is returned when gateway device)
       */
      NWK_DEVICE_STATUS_T__DEVICE_NA = 255
        PROTOBUF_C__FORCE_ENUM_TO_BE_INT_SIZE(NWK_DEVICE_STATUS_T)
    } NwkDeviceStatusT;

    This device state is maintained in the device_info_t struct in the file type.h.

    /* Display-related information for a Device in the network */
    typedef struct {
      endpoint_info_t ep_list[MAX_ENDPOINTS];
      uint64_t ieee_addr;
      uint16_t nwk_addr;
      uint16_t manufacturer_id;
      bool valid;
      bool selected;
      bool selected_as_bind_destination;
      uint8_t device_status;
      uint8_t num_endpoints;
      uint8_t selected_endpoint_index;
    } device_info_t;

    I am sending you a screenshot in which the gray lines show devices that have gone offline, and the red ones are online.

    This is defined in the function void ui_redraw_device_list(void) in the user demo / framework / user_interface.c file:

    STRING_ADD(device_string, DARK_WHITE " |%s%s" , ds_device_table[i].selected_as_bind_destination ? GREEN :   \
     ((ds_device_table[i].device_status == NWK_DEVICE_STATUS_T__DEVICE_OFF_LINE) ? BOLD BLACK : RED), ds_device_table[i].selected ? BOLD : "");

    STRING_ADD(device_string, " ");

    I hope this makes the device status clear.

    How can we get our device to respond to service discovery again?

     

    Thanks & Regards 

    Shiv Patil.

  • A device is considered offline if the ZNP detects that the device has failed to receive a consecutive number of messages.
    In the gateway server, this number of messages is MAX_DEVICE_FAILED_ATTEMPTS.
    You will need to check sniffer logs to see if this is the case.

    To get the offline device online again, this offline device must become active in the network again.
  • Yes, it tries up to MAX_DEVICE_FAILED_ATTEMPTS times; if the device does not respond to the requests, it goes offline.

    When the device goes offline, we no longer see the commands on the sniffer; it shows something like a Route Request.

    The attachment below will help you understand the problem.

    My question is: how do we get the device back to the online state?


  • The Route Request is attempting to find a routing path to the "offline device".

    Can you check if the "offline device" is actually functional?
  • To return the device to "online state", you need to make sure that it responds to the messages that the ZNP sends to it.
  • Understood.

    When the device is online, i.e. responsive to all requests sent to it, everything works very well.

    But whenever the device is non-responsive / offline, "SRSP Cond Wait timed out!" may occur.
    How do we handle this error?

    Regards,
    Shiv Patil.
  • I am not able to reproduce this behavior; when a device goes offline (by powering off), it is grayed out in the UI, but there is no time out in the linux gateway servers.

    Looking back at the sniffer log screenshot, it looks like your device (0x8692) is still functional (it sends the Route Reply).
    Do you still see similar behavior if you try any of the following:
    - sending commands less frequently from the gateway application
    - reducing the number of devices in the network
    ?

    Also, in your further investigations, do you still see "recv: Interrupted system call" in the gateway server logs? If yes, which server is printing this message?
  • Let us know if you have any updates to my followup questions.
  • Hello Toby Pan,

    I still have not found a solution for

    how to handle this ERROR : SRSP Cond Wait timed out!


    Regards,
    Shiv Patil.
  • Do you still see "recv: Interrupted system call" in the gateway server logs? If yes, which server is printing this message?

    As I'm unable to reproduce this issue, I still recommend sending commands less frequently.
  • Please try the following.

    Run the ./zigbeeHAgw bbb and ./start_application scripts.

    Add any device to the network that supports ON/OFF or any actuation command.

    Continuously send ON and OFF commands by pressing the keys 'n' and 'f'.

    Do this as fast as you can.

    After some time you will see that the device goes offline.

    The device details in start_application will be displayed in gray, as shown below.

    >00:12:4B:00:09:E2:91:21 B5F0 0F 0100(HA) | F2 0061(A1E0)

    Repeat this a few more times and you will see this message in the start_application logs:

    TIMEOUT waiting for confirmation

    This means that ./zigbeeHAgw produced the error

    ERROR : SRSP Cond Wait timed out!

    I am not sure, but this problem may be related to the NPI Linux server.

     

    Regards,

    Shiv Patil.

  • Again, I would not recommend sending commands so frequently from the ZNP to the other device.
    For example, if the other device is a sleepy end device, it may not poll quickly enough.
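
    One way to enforce a minimum spacing between commands on the application side is a small throttle like the sketch below. It is an assumption-laden illustration, not part of the gateway: the example_throttle_* names are hypothetical, and the gap to use is a value to tune for the network, since no specific interval is recommended in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper that enforces a minimum gap between commands. */
typedef struct {
    int64_t min_gap_ns;  /* required spacing between two sends */
    int64_t last_ns;     /* monotonic time of the previous send */
    int     started;     /* 0 until the first send has happened */
} example_throttle_t;

/* Current monotonic time in nanoseconds. */
int64_t example_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

void example_throttle_init(example_throttle_t *t, int64_t min_gap_ms)
{
    assert(t != NULL && min_gap_ms >= 0);
    t->min_gap_ns = min_gap_ms * 1000000LL;
    t->last_ns = 0;
    t->started = 0;
}

/* Blocks until at least min_gap_ns has passed since the previous call,
 * then records the current time. Call this right before each send. */
void example_throttle_wait(example_throttle_t *t)
{
    assert(t != NULL);

    if (t->started) {
        int64_t remaining = t->last_ns + t->min_gap_ns - example_now_ns();
        while (remaining > 0) {
            struct timespec delay;
            delay.tv_sec  = (time_t)(remaining / 1000000000LL);
            delay.tv_nsec = (long)(remaining % 1000000000LL);
            nanosleep(&delay, NULL);   /* may return early if interrupted */
            remaining = t->last_ns + t->min_gap_ns - example_now_ns();
        }
    }
    t->last_ns = example_now_ns();
    t->started = 1;
}
```

    The application would call example_throttle_wait before each request it sends to the gateway server, so that bursts of key presses are spread out instead of being sent back to back.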
  • Hello Toby Pan,

    I have tested the code in more detail and now understand the problem better.

    The logs below will help you understand it.

    ERR:: Interrupted system call
    [ERR] npi_ipc_errno 0x02030100
    Exit socket while loop
    cat: /proc/7128/cmdline: No such file or directory
    pid 7128 is not there
    count is 0, not 4
    kill -SIGUSR2 6985
    caught SIGUSR2, a server other than NWKMGR died!
    [12:50:12.281,096] [NWK_MGR/LSTN] CONNECT: Disconnected from client GATEWAY (connection # 7)
    waiting for GATEWAY SERVER to exit
    [12:50:12.624,424] [Z_STACK/LSTN] ERROR  : SRSP Cond Wait timed out!
    [12:50:12.624,610] [Z_STACK/LSTN] ERROR  : apicSendSynchData() failed getting response
    [12:50:12.625,185] [Z_STACK/LSTN] CONNECT: Disconnected from client GATEWAY (connection # 6)
    tracker exiting
    [12:50:14.625,804] [Z_STACK/LSTN] ERROR  : SRSP Cond Wait timed out!
    [12:50:14.626,238] [Z_STACK/LSTN] ERROR  : apicSendSynchData() failed getting response
    [12:50:17.398,019] [NWK_MGR/LSTN] CONNECT: Disconnected from client OTASRVR (connection # 8)
    waiting for OTA SERVER to exit
    ZLSZNP_arm: no process found
    waiting for Zstack linux to exit
    NPI_lnx_arm_server: no process found
    waiting for NPI to exit
    NETWORK MANAGER exited with code 140 on Fri May 17 12:50:17 IST 2019
    ./zigbeeHAgw: line 583: kill: (7173) - No such process
    making sure there are no lingering servers...

    In the logs above, if you look at the line npi_ipc_errno 0x02030100 and trace this error in the npi_lnx_error.h file,

    you will see that it maps to

    #define NPI_LNX_ERROR_UART_SEND_FRAME_FAILED_TO_WRITE        0x02030100

    How does this cause the SRSP timeout condition?

    Also, how can we solve or handle this error so that our Zigbee servers do not crash or return delayed responses?

    Regards,

    Shiv Patil.

  • Yes, it seems this happens in the NPI server.

    The non-success return value of npi_write causes the npi_lnx_ipc.c loop to break. This is the loop which runs the NPI server. When one server dies, zigbeeHAgw will kill the other servers.
    You can try handling it based on the actual error value returned by npi_write (which is a wrapper for the write function).

    // this is in npi_sendframe in npi_lnx_uart.c
    if (npi_write(pBuf, frlen) < 0) {
        perror("ERR:");
        free(pBuf);
        npi_ipc_errno = NPI_LNX_ERROR_UART_SEND_FRAME_FAILED_TO_WRITE;
        return NPI_LNX_FAILURE;
    }
    // this is in main() of npi_lnx_ipc.c
    else
    {
        //                          debug_
        printf("[ERR] npi_ipc_errno 0x%.8X\n", npi_ipc_errno);
        // Everything about the error can be found in the message, and in npi_ipc_errno:
        childThread = ((npiMsgData_t *) npi_ipc_buf[0])->cmdId;
        sprintf(toNpiLnxLog, "Child thread with ID %d in module %d reported error:\t%s",
                NPI_LNX_ERROR_THREAD(childThread),
                NPI_LNX_ERROR_MODULE(childThread),
                (char *)(((npiMsgData_t *) npi_ipc_buf[0])->pData));
        //                          printf("%s\n", toNpiLnxLog);
        writeToNpiLnxLog(toNpiLnxLog);
    }
    break;
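
    Since the log shows "Interrupted system call" (errno EINTR), one possible way to handle that specific error is to retry the write instead of treating it as fatal. The sketch below is a generic POSIX pattern and not the actual npi_write implementation: write_all_retry is a hypothetical helper, and it assumes a blocking file descriptor (it does not handle EAGAIN on a non-blocking port).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes all len bytes of buf to fd.
 * Retries when write() is interrupted by a signal (EINTR) and
 * continues after partial writes.
 * Returns 0 on success, -1 on any other error (errno is left set). */
int write_all_retry(int fd, const unsigned char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL || len == 0);

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before writing: try again */
            return -1;           /* real error: let the caller decide */
        }
        done += (size_t)n;       /* partial write: send the remainder */
    }
    return 0;
}
```

    With a helper like this, only errors other than EINTR would reach the failure path that sets npi_ipc_errno and ends the NPI server loop. Whether that is sufficient for this setup would still need to be verified on the target.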

  • Can I enable SPI communication between the CC2538 and the BeagleBone?

  • In Z-Stack 3.0.x, ZNP over SPI is deprecated so you cannot use it.

  • You can reference the SPI module on the HA 1.2.2a ZNP from Z-STACK-ARCHIVE and port the functionality to your Z-Stack 3.0 ZNP accordingly.

    Regards,
    Ryan

  • Thank You Ryan, 

    I will work on it.

    I would also like to know that,

    Does the Zigbee Linux Gateway have SPI support? (i.e. can we use it as it is, just by enabling the flags and GPIOs,)

    OR

    do we need to make modifications in the code?

  • SPI functionality is built into the HA 1.2.2a ZNP but modifications are required for the Z-Stack 3.0.  We already had this conversation: https://e2e.ti.com/support/wireless-connectivity/zigbee-and-thread/f/158/t/784775/

    Regards,
    Ryan

  • Yes, I remember it.

    I am asking about the Zigbee Linux Gateway application.

    Do we need to make modifications to the Zigbee Linux Gateway application along with the Z-Stack 3.0 application code,

    or can we use it just by making changes in NPI_Gateway.cfg?

  • Ah, thank you for clarifying.  I believe SPI support has remained in the Zigbee 3.0 Linux gateway application from the Z-Stack 1.2.2a implementation; however, as this is unsupported, you will have to change NPI_Gateway.cfg and test this for yourself.

    Regards,
    Ryan