Rerouting not working properly

Matt Warshawsky

Other Parts Discussed in Thread: CC2530, Z-STACK

We have a CC2530-based system with a single gateway and multiple routers and sensor modules. All communication is between sensors and the gateway, never between sensors. Routers are also set up as sensors, if only to ensure connectivity. The system works; however, it does not reliably reroute when a router goes offline, i.e. a sensor connected through a router will not reroute to maintain connectivity to the gateway and only comes back online once the router it was originally connected to comes back. This was tested in rather close proximity to ensure that there were alternative routes.

I am using the z-stack simple API with only a slight modification to get signal strengths from incoming packets. This was built up from the sample code. Somewhere along the line I must have set some parameter incorrectly, or am doing something out of order so that the routing tables are not correct. I am hoping that someone can take a look at my stripped down code and tell me what I missed. There is no issue with initial connectivity, just rerouting when a node goes offline. I'm guessing either I have a parameter incorrect, or my error handling in some of the events is not correct.

The sensor and router use almost identical code. Both send a periodic report back to the gateway. The router functionality is handled by the Z-stack simple API. Since this forum doesn't appear to have a good way to paste code with proper indentation, I'm attaching the reduced code stripped to the essentials.

Sensor + Router code:

2500.sensorRouter.c

void bindAll(bool create) 
{
   zb_BindDevice( create, CMD_ID_SENSOR_GATEWAY, (uint8 *)NULL );  
}

void zb_HandleOsalEvent( uint16 event )
{
    if(event & SYS_EVENT_MSG)
    {
  
    }
  
    if( event & ZB_ENTRY_EVENT )
    {     
      uint8 startupOption;
      zb_ReadConfiguration(ZCD_NV_STARTUP_OPTION, sizeof(uint8), &startupOption);
      if (!(startupOption & ZCD_STARTOPT_DEFAULT_NETWORK_STATE)) 
      {
         startupOption |= ZCD_STARTOPT_DEFAULT_NETWORK_STATE;
         zb_WriteConfiguration(ZCD_NV_STARTUP_OPTION, sizeof(uint8), &startupOption);
         zb_SystemReset();
      }
  
#ifdef _REPEATER
    uint8 logicalType;
    zb_ReadConfiguration( ZCD_NV_LOGICAL_TYPE, sizeof(uint8), &logicalType );
    if ( logicalType != ZG_DEVICETYPE_ROUTER )
    {
      logicalType = ZG_DEVICETYPE_ROUTER;
       zb_WriteConfiguration(ZCD_NV_LOGICAL_TYPE, sizeof(uint8), &logicalType);
       zb_SystemReset();
    }
#endif    
      osal_start_timerEx(sapi_TaskID, EVT_REPORT, 50);    
      zb_StartRequest();
    }
    if (event & EVT_START_REQUEST) {
      zb_StartRequest();
    }
    if ( event & EVT_REPORT )
    {
      // Re-arm the periodic report timer
      osal_start_timerEx(sapi_TaskID, EVT_REPORT, 100);
    }
    if ( event & MY_FIND_COLLECTOR_EVT )
    {
      // Delete previous binding
      if ((appState==APP_REPORT) || (appState == APP_SLEEP))
      {
          bindAll(FALSE);
      }
    
      // Find and bind to a collector device
      bindAll(TRUE);
  
    }
}

void zb_StartConfirm( uint8 status )
{
  // If the device successfully started, change state to running
  if ( status == ZB_SUCCESS )
  {
#ifdef _REPEATER
#else    
    zb_AllowBind(0);	    
#endif
    // Set event to bind to a collector
    osal_set_event( sapi_TaskID, MY_FIND_COLLECTOR_EVT );
  }
  else {
    osal_start_timerEx( sapi_TaskID, EVT_START_REQUEST, 2000 );    
  }
}

void zb_BindConfirm( uint16 commandId, uint8 status )
{
  static uint8 failCount = 0;
  if( status == ZB_SUCCESS )
  {    
    osal_set_event( sapi_TaskID, EVT_REPORT );
    failCount = 0;      
  }
  else
  {
    failCount++;
    if (failCount > 3) {
       zb_SystemReset();
    }
    osal_start_timerEx( sapi_TaskID, MY_FIND_COLLECTOR_EVT, myBindRetryDelay );
  }
}

void zb_SendDataConfirm( uint8 handle, uint8 status )
{
  static uint8 registerFailCount = 0;
  if(status != ZB_SUCCESS)
  {    
    registerFailCount++;
    if (registerFailCount > 3) {      
      zb_SystemReset();      
    }    
  }
  else
  {
    registerFailCount = 0;
  }
}

Gateway code:

7144.gateway.c

void zb_HandleOsalEvent( uint16 event )
{
    uint8 logicalType;

    if(event & SYS_EVENT_MSG)
    {

    }

    if( event & ZB_ENTRY_EVENT )
    {
        
	// Force the device type to coordinator
        zb_ReadConfiguration( ZCD_NV_LOGICAL_TYPE, sizeof(uint8), &logicalType );
        if ( logicalType != ZG_DEVICETYPE_COORDINATOR )
        {
          logicalType = ZG_DEVICETYPE_COORDINATOR;
           zb_WriteConfiguration(ZCD_NV_LOGICAL_TYPE, sizeof(uint8), &logicalType);
           zb_SystemReset();
        }

        // Start the device
        zb_StartRequest();
    }

    if ( event & MY_START_EVT )
    {
      	zb_StartRequest();
    }

    if ( event & MY_FIND_COLLECTOR_EVT ) {}
}

void zb_StartConfirm( uint8 status )
{
  
    // If the device successfully started, change state to running
    if ( status == ZB_SUCCESS )
    {      
        zb_AllowBind( 0xFF );	    // Permanently enable allow bind mode       
    }
    else
    {
        // Try again later with a delay
        osal_start_timerEx( sapi_TaskID, MY_START_EVT, myStartRetryDelay );
    }
}

Config:

3288.f8wConfig.cfg

Sensor also has the following flags:

NWK_AUTO_POLL
HOLD_AUTO_START
REFLECTOR
POWER_SAVING
NV_INIT
DEVICE_LOGICAL_TYPE=ZG_DEVICETYPE_ENDDEVICE
ZIGBEEPRO
HAL_LED=FALSE
HAL_KEY=FALSE
HAL_SPI=FALSE
HAL_IRGEN=FALSE
HAL_UART_DMA_RX_MAX=128
HAL_UART_DMA_TX_MAX=128
HAL_UART=TRUE
INT_HEAP_LEN=2048

and in Enddev.cfg:

-DCPU32MHZ // CC2530s Run at 32MHz
-DROOT=__near_func // MAC/ZMAC code in NEAR

/* MAC Settings */
-DMAC_CFG_TX_DATA_MAX=3
-DMAC_CFG_TX_MAX=6
-DMAC_CFG_RX_MAX=3

Router has the following flags:

HOLD_AUTO_START
BUILD_ALL_DEVICES
REFLECTOR
NV_INIT
DEVICE_LOGICAL_TYPE=ZG_DEVICETYPE_ROUTER
ZIGBEEPRO
HAL_LED=FALSE
HAL_KEY=FALSE
HAL_SPI=FALSE
HAL_IRGEN=FALSE
HAL_UART_DMA_RX_MAX=128
HAL_UART_DMA_TX_MAX=128
HAL_UART=TRUE
INT_HEAP_LEN=2048
_REPEATER

Gateway has the following flags:

HOLD_AUTO_START
BUILD_ALL_DEVICES
REFLECTOR
NV_INIT
DEVICE_LOGICAL_TYPE=ZG_DEVICETYPE_COORDINATOR
ZDO_COORDINATOR
RTR_NWK
CONCENTRATOR_ENABLE=true
ZIGBEEPRO
HAL_LED=FALSE
HAL_KEY=FALSE
HAL_SPI=FALSE
HAL_IRGEN=FALSE
HAL_UART_DMA_RX_MAX=128
HAL_UART_DMA_TX_MAX=128
HAL_UART=TRUE
INT_HEAP_LEN=2048

over 13 years ago

0 YiKai Chen over 13 years ago

Guru 735695 points

Hi,

I do not see anything wrong with your settings. A Zigbee mesh network needs some time to rebuild its routing tables once a router goes missing from the network. Did you wait a while to see whether the device finds a new route to report through?

0 Matt Warshawsky over 13 years ago in reply to YiKai Chen

Prodigy 45 points

Thanks for the reply.

Yes, I waited 5 to 10 minutes. What is "some time"? I have seen cases where it takes a few minutes to recover. For example, if I cycle power on the gateway, it takes maybe a minute or two for the closest nodes to connect, then another minute or two for each successive level. But I waited much longer than this for the mesh to recover from this routing change.

For a bit I had the gateway (concentrator) calling NLME_RouteDiscoveryRequest() every 15 seconds or so (with option 3) to try to maintain the routing tables. This seemed to be the recommendation from section 5.4.1 of the Z-Stack Dev Guide for many-to-one routing arrangements like mine. Is this required? Or perhaps this is creating a single routing option, and then once a router dies the sensor doesn't know any other way to get to the gateway?

0 YiKai Chen over 13 years ago in reply to Matt Warshawsky

Guru 735695 points

Yes, it might take many minutes. So, if you wait 5 minutes or more, does your device come back? It is recommended to call NLME_RouteDiscoveryRequest() periodically to maintain the routing table.

0 Matt Warshawsky over 13 years ago in reply to YiKai Chen

Prodigy 45 points

No, it won't come back until I power the router it was originally connected to back up.

How often should NLME_RouteDiscoveryRequest() be called?

Just in case, I'm having the hardware guys check the antennas. Perhaps there is a bad solder joint or something causing the signal to be reduced to only a few feet.

0 YiKai Chen over 13 years ago in reply to Matt Warshawsky

Guru 735695 points

Hi,

If you call NLME_RouteDiscoveryRequest() every minute, does the lost end device come back?

0 Derek over 13 years ago in reply to YiKai Chen

Mastermind 9802 points

YiKai Chen said:

Yes, it might take many minutes. So, if you wait 5 minutes or more, does your device come back? It is recommended to call NLME_RouteDiscoveryRequest() periodically to maintain the routing table.

This is very useful information, thanks. Why does it take so long for the route to recover? I'm testing this on a very simple network consisting of one coordinator and two routers. Router A's parent is the Coordinator, and Router B is communicating through Router A. All is working fine, just sending data straight to the Coordinator. Then when I remove Router A, it takes a while for Router B to find a new route back to the Coordinator. Router B isn't displaying any messages, and I'm seeing its message get ACKd correctly, but the Coordinator isn't displaying it. I also see on the packet sniffer that after each ACK there is a Route Request being broadcast from the Router but no response from the Coordinator (the only other device on the network).

0 TheDarkSide over 13 years ago in reply to Derek

TI__Genius 16570 points

Hi Derek,

Did you check the Link Status messages in the sniffer from before and after removing Router A? In the Link Status messages you should see who the neighbors of the coordinator and of Router B are, and the link cost associated with each direction.

Can you please attach the logs?

Thanks,

TheDarkSide

0 Derek over 13 years ago in reply to TheDarkSide

Mastermind 9802 points

2766.Route Request Failure - Never gets a Route Reply 01_16_2013__01_16AM.zip

Hello,

Thanks for helping. Attached is a Ubiqua trace showing the two devices communicating properly, then when I turn off one the other is constantly sending out Route Requests. The device is also sending messages to the Coordinator, (and they're ack'd) but these aren't actually making it up to the application level. I'm using ZNP, nothing fancy. What could cause this?

Thanks,

Derek

0 Matt Warshawsky over 13 years ago in reply to YiKai Chen

Prodigy 45 points

No, and actually I realized I provided the wrong code. The proper code has the gateway calling that function every 15 seconds:

NLME_RouteDiscoveryRequest( 0xFFFC, 0x03, 0 );

Also, the sensor code is changed very slightly. During startup, instead of making sure ZCD_STARTOPT_DEFAULT_NETWORK_STATE is set, it makes sure it's cleared.
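
The periodic many-to-one request described in this post could be structured roughly as below. This is only a sketch under stated assumptions: the event bit `EVT_MTO_ROUTE_REQ`, the handler name `gatewayHandleEvents`, and the two recording stand-ins are hypothetical, added so the fragment builds without the Z-Stack headers. On target, the stand-ins would be replaced by NLME_RouteDiscoveryRequest() and osal_start_timerEx().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical application event bit and period (not from the posted code). */
#define EVT_MTO_ROUTE_REQ  0x0100u
#define MTO_PERIOD_MS      15000u

/* Argument values taken from the call quoted in this post. */
#define DST_ADDRESS        0xFFFCu
#define MTO_OPTIONS        0x03u
#define RADIUS             0u

/* Recording stand-ins so the sketch compiles off target. */
static unsigned discoveryRequests;
static uint16_t pendingTimerEvent;
static uint16_t pendingTimerMs;

static uint8_t routeDiscoveryRequest(uint16_t dst, uint8_t options, uint8_t radius)
{
    (void)dst; (void)options; (void)radius;
    discoveryRequests++;
    return 0; /* stand-in always reports success */
}

static uint8_t startTimer(uint16_t event, uint16_t timeoutMs)
{
    assert(timeoutMs > 0);
    pendingTimerEvent = event;
    pendingTimerMs = timeoutMs;
    return 0;
}

/* Issue one request per timer expiry, re-arm the timer, and
 * return the event bits that were not handled here. */
uint16_t gatewayHandleEvents(uint16_t events)
{
    if (events & EVT_MTO_ROUTE_REQ) {
        routeDiscoveryRequest(DST_ADDRESS, MTO_OPTIONS, RADIUS);
        startTimer(EVT_MTO_ROUTE_REQ, MTO_PERIOD_MS);
        events &= (uint16_t)~EVT_MTO_ROUTE_REQ;
    }
    return events;
}
```

Re-arming from inside the handler keeps a single timer outstanding, so a missed expiry delays the next request rather than stacking several.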

0 Ilya Averin over 13 years ago in reply to Derek

Expert 1815 points

Hi Derek,

Could you try sending data from the coordinator to Router B (with Router A off)? Maybe the link is asymmetrical.

Regards,

Ilya

0 Derek over 13 years ago in reply to Ilya Averin

Mastermind 9802 points

Hello Ilya,

I added an option for Router B to use APS acking on request, and these acks aren't coming through in this case. They would be coming back from the Coordinator to Router B. The environment is all devices on a desk; it should be a pretty good environment. But with APS acking, at least the lack of an ack now generates an error, which I can detect and, if detected, restart Router B. The problem I was having was that the messages were getting MAC-acked by the Coordinator, so my application wasn't able to detect that an error had occurred. In the packet trace you can also see that Router B keeps sending out Route Requests with no response from the Coordinator.
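
The detect-and-restart workaround described above could be sketched as a small counter that the send-confirm callback feeds. The `AckMonitor` type, the `ackMonitorUpdate` name, and the configurable threshold are assumptions made for illustration; the post only says that a missing APS ack is detected and followed by a restart, which corresponds to a threshold of 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for APS-level delivery failures. */
typedef struct {
    uint8_t consecutiveFailures; /* missing APS acks in a row */
    uint8_t threshold;           /* restart once this many are seen */
} AckMonitor;

/* Feed one send result into the monitor.
 * Returns true when the caller should restart (or rejoin) the device. */
bool ackMonitorUpdate(AckMonitor *m, bool apsAcked)
{
    assert(m != NULL && m->threshold > 0);

    if (apsAcked) {
        m->consecutiveFailures = 0;   /* any success clears the streak */
        return false;
    }
    if (m->consecutiveFailures < UINT8_MAX) {
        m->consecutiveFailures++;     /* saturate instead of wrapping */
    }
    return m->consecutiveFailures >= m->threshold;
}
```

A threshold above 1 trades slower detection for tolerance of an occasional lost frame, which may matter on a busier network than the desk setup described here.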

Has anyone else seen this, or knows how to solve it?

Thanks,

Derek

0 OD over 13 years ago in reply to Derek

TI__Expert 3050 points

Hi Derek,

From the sniffer log file and your description, it looks like the issue is that the MAC layer is not properly passing packets to the network layer at the coordinator. The packets are acked at the MAC level, but fail to be processed at the NWK level and above:

- The Link Status messages from 0x467C are not processed by the coordinator (its tx cost stays 0 even though 0x467C specifies the rx cost from 0x0000 as 1)
- According to your post, aps ack is never received when requested.

The route request being ignored is actually by design, as the current code ignores route requests if the txCost for the router from which we got the request is 0. As can be seen in the Link Status messages from 0x0000, the txCost (outgoing cost) to 0x467C is 0. For some reason, it never gets updated (mac to nwk issue?)
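
The gating rule described here can be modeled in a few lines of C. This is only a sketch: the `Neighbor` struct and both function names are hypothetical and much simpler than the actual Z-Stack neighbor table, and treating a sender that is not in the table the same as one with a zero cost is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified neighbor entry. */
typedef struct {
    uint16_t shortAddr;
    uint8_t  txCost;  /* outgoing cost toward the neighbor; 0 = not known */
    uint8_t  rxCost;  /* incoming cost measured locally */
} Neighbor;

/* Linear lookup; returns NULL when addr is not a known neighbor. */
static const Neighbor *findNeighbor(const Neighbor *table, size_t count,
                                    uint16_t addr)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].shortAddr == addr) {
            return &table[i];
        }
    }
    return NULL;
}

/* Models the described behavior: a route request is dropped when the
 * outgoing cost toward the router it arrived from is still 0. */
bool acceptsRouteRequest(const Neighbor *table, size_t count, uint16_t sender)
{
    const Neighbor *n = findNeighbor(table, count, sender);
    return n != NULL && n->txCost != 0;
}
```

Under this model, a coordinator whose entry for 0x467C never leaves txCost 0 would keep dropping that router's route requests, which matches the missing Route Replies in the trace.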

If a many to one route is broken, the device facing a broken route is expected to issue a network status command, rather than sending route requests. Something may be wrong with the many-to-one configuration.

To be able to assist you further, could you please answer / provide the following:
- I assume the issue described by you and by Matt Warshawsky is the same issue. Is it?
- What version/release of Z-Stack do you use? As a general practice, I'd recommend using the latest release, since this issue may have been solved in it.
- Can you please provide a more complete log? Please include all the following: from the initial network formation, with aps ack enabled, through the problem being observed when a router is shut down, then link is OK again when the router is back on, and including periodic many-to-one route discovery messages from the concentrator.

Regards,

Oded.

0 Derek over 13 years ago in reply to OD

Mastermind 9802 points

Hello Oded,

I will run the tests and get the results back to you. Can you please contact me offline regarding this issue? There is some information which I can't make public.

Regards,

Derek

0 Derek over 13 years ago in reply to OD

Mastermind 9802 points

Hello Oded,

I'm setting up and running more tests. To answer your questions:

1. We are not using many to one or source routing, just standard routing.

2. Using Z-Stack 2.5.1, ZNP, on CC2530.

Your conclusion that it's a MAC to NWK issue makes sense. This is disconcerting, as it means that there is a bug in the stack. Ugh. Please contact me offline so I can send you additional application information.

Regards,

Derek

0 Derek over 13 years ago in reply to Derek

Mastermind 9802 points

Hello,

I reproduced the problem on TI hardware using the CC2530ZNP mini-kit hardware, using the mini-kit firmware.

I used the Basic Comms - Router firmware for the routers unmodified with the exception of using APS ACK instead of MAC ACK.

I used the Basic Comms-Coordinator firmware on the coordinator.

Firmware on all devices was brand new Z-Stack 2.5.1, ZNP, using TestHex, and the following #defines:

SECURE=1

ASSERT_RESET

POWER_SAVING

CC2530_MK

The cfg files were not changed, and there were no other changes to the Z-Stack.

Tests 1-5 succeeded; Test 6 failed. There are Ubiqua packet captures for each, along with a description. This is a high priority issue for the client.

Thanks,

Derek

0755.For TI.zip

0 Rui Zhang69778 over 13 years ago in reply to YiKai Chen

Genius 4525 points

You just need to set ROUTE_EXPIRY_TIME to a suitable value; then routing table entries will be released when they expire, and a new route discovery will have to be issued.
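
As a sketch of what that setting might look like, assuming it goes in f8wConfig.cfg in the same -D style as the other compile options in this thread (the value 30, understood as seconds, is only illustrative and should be checked against the comments in the config file of the stack release in use):

```
-DROUTE_EXPIRY_TIME=30
```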

Rui

0 Ilya Averin over 13 years ago in reply to Rui Zhang69778

Expert 1815 points

Hi Derek,

I'm not sure that the overflow of the routing table might be a reason, as the data exchange between two neighbor nodes doesn't involve routing functionality.

If I understand correctly, the node (say A) generates the acks (say to node B) at the MAC level but not at the APS level. Could you say whether you see the Link Status messages from node B?

Regards,

Ilya

0 Derek over 13 years ago in reply to Ilya Averin

Mastermind 9802 points

Hello,

When the problem happens, I see Link Status messages from all nodes being transmitted correctly. I also see the device (0xCB1C) sending out Route Request messages with no corresponding Route Replies. Message frequency is low; each of the two routers sends out a message once every 2 seconds, so I doubt there are collisions going on.

Regards,

Derek

0 OD over 13 years ago in reply to Derek

TI__Expert 3050 points

Hi Derek,

I was able to reproduce the issue on my setup today. I'll continue investigating this one, and let you know the outcome.

Regards,

Oded.

0 Ilya Averin over 13 years ago in reply to Derek

Expert 1815 points

Hi Derek,

I wonder whether you see the coordinator listing the router of interest as a neighbor in its Link Status messages?

Regards,

Ilya

0 Derek over 13 years ago in reply to Ilya Averin

Mastermind 9802 points

Hello,

Ilya, below is the sequence of Link Status messages in relation to the event when I ran a test (Oded, this is Test 15). The router that gets kicked offline is 0x7B40; the other router is 0x819A.

11:42:57 - things working normally; Link status from coordinator shows 0x7B40 with inc cost of 0x0, outgoing cost of 0x0; and 0x819A has inc cost of 0x1, out of 0x1. 0x7B40 is routing through 0x819A successfully

11:43:07 - things working normally; Link Status from 0x7B40 (the affected node) show coord (incoming cost 0x1, outgoing 0x0) and the other router 0x819A (inc 0x1, out 0x1)

11:43:08 - Link status from other router 0x819A show coord (inc 0x1, out 0x1) and 0x7B40 (inc 0x1, out 0x1).

Based on these messages the coordinator does see the affected node, and vice-versa.

11:43:08 - Right about here is when I restart 0x819A. First I see four messages from 0x7B40 to 0x819A not getting ackd (because parent 0x819A is being restarted) and the last one from 0x7B40 to 0x0000 is getting MAC ACKd.

11:43:10 - 0x7B40 is sending out Route Requests but I don't see any reply from anyone else

11:43:12 - Link status from Coordinator indicates 0x7B40 (inc 0x0, out 0x0) and 0x819A (inc 0x1, out 0x1) - obviously 0x819A hasn't been pruned yet.

11:43:12 - Association Request/Response from the restarted node, now it's 0xFAD5, and device announces being broadcast

11:43:14 - another Route request from 0x7B40 being broadcast, with no response from anyone. 0x7B40 continues to send a message to 0x0000 and it is getting MAC ACKd, but not getting passed on to the application.

Oded, Thanks for working on this. While it sucks that we have this issue, I'm glad to know that I wasn't imagining things. :)

0 Ilya Averin over 13 years ago in reply to Derek

Expert 1815 points

Hi Derek,

As follows from your post, the coordinator doesn't see the router 0x7B40 as a neighbor, but the router 0x7B40 does see the coordinator (...coordinator shows 0x7B40 with inc cost of 0x0, outgoing cost of 0x0, ..... Link Status from 0x7B40 show coord incoming cost 0x1, outgoing 0x0).
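
The comparison being made here, reading each side's Link Status entry about the other, can be written down as a small check. The `LinkCosts` type and `linkUsableBothWays` name are hypothetical; the sketch assumes that a cost of 0 in a Link Status entry means "unknown" and that valid costs do not exceed 7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One Link Status entry, as reported by one node about one neighbor. */
typedef struct {
    uint8_t inCost;   /* cost of frames arriving from the neighbor */
    uint8_t outCost;  /* cost of frames sent toward the neighbor */
} LinkCosts;

/* Assumed encoding: 0 = unknown, 1..7 = known link cost. */
static bool costsKnown(LinkCosts c)
{
    assert(c.inCost <= 7 && c.outCost <= 7);
    return c.inCost != 0 && c.outCost != 0;
}

/* The link counts as usable in both directions only when each node
 * reports known costs, in and out, for the other one. */
bool linkUsableBothWays(LinkCosts aAboutB, LinkCosts bAboutA)
{
    return costsKnown(aAboutB) && costsKnown(bAboutA);
}
```

With the values quoted from the trace, the coordinator and 0x7B40 pair (0/0 on one side, 1/0 on the other) fails this check, while the coordinator and 0x819A pair (1/1 on both sides) passes.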

Are you sure that after the restart of the router 0x819A, the router 0x7B40 gets MAC acks from the coordinator and not from the router 0x819A (0xFAD5)?

Regards,

Ilya

0 Derek over 13 years ago in reply to Ilya Averin

Mastermind 9802 points

Hi Ilya,

Following 0x819A's restart (0xFAD5), I see that the messages are from 0x7B40 to 0x0000. MAC Src. and NWK Src. are 0x7B40; MAC Dest and NWK Dest are 0x0000.

The following ACK has no Source Addr or Dest Addr, so it's hard to know exactly who it came from. But since the ACK'd message was to destination 0x0000, I doubt that 0x819A / 0xFAD5 would be acknowledging it, right?

0 Ilya Averin over 13 years ago in reply to Derek

Expert 1815 points

Hi Derek,

yes, if the MAC dest was 0x0000, the MAC ack could be obtained from the corresponding node only (i.e. the coordinator). However, as I wrote in my previous post, it seems the coordinator doesn't see the router 0x7B40 at the network level (the corresponding 'in' and 'out' costs are equal to 0). At the same time the router 0x7B40 sees the coordinator as the 'in' cost is equal to 1.

Regards,

Ilya

0 OD over 13 years ago in reply to Ilya Averin

TI__Expert 3050 points

Hi Derek,

Just wanted to update you that I'm working on a fix for this issue. I'll share it after having it verified.

Regards,

Oded.
