ZACCEL hangs/stuck forever

Zaphod

Hi,

I've been experiencing an issue with ZACCEL for some time: the router hangs forever and requires a hard restart before it can operate again.

Today I've finally managed to drill down to the location and call stack when that happens:

1) I am calling zb_SendDataRequest (sapi.c) with a SAPI command to send
2) zb_SendDataRequest calls zaccelRPC (zaccel.c)
3) zaccelRPC calls spiSREQ (spi.c)
4) spiSREQ runs a while (!getSRDY2()) loop
5) getSRDY2 (spi.c) always returns false, thus the ZACCEL is stuck forever

Has anyone encountered this, or does anyone know of this problem with ZACCEL? Is there a resolution?

Thanks in advance,

Zaphod

over 17 years ago

0 Dirty Harry over 17 years ago

TI__Mastermind 19350 points

To begin, it is important to realize that the CC2480 is lossy on the SPI as both master and slave even in the perfect, isolated arena of a factory test routine. To be sure, the loss is very small - I remember something on the order of 1 byte in 15,000, but since I don't have the factory test results sheet available, please do not use that number as a fact. In the harsh arena of running the hard real-time requirements of the 802.15.4 MAC and the soft real-time requirements of the ZigBee stack, it might be reasonably expected that the SPI would be more lossy than in factory test. In any case, under the load of the ZASA, the SPI was seen to be lossy and so the RPC driver on both the CC2480 and the host needs to abort a "stuck" state after some reasonable tolerance for latency.

On the CC2480, whenever the RPC state machine is "stuck" for more than 1 second, the watchdog timer expires and resets the board. Because of this, you can never use a debugger on the host to manually step through an RPC transaction - the CC2480 slave will have reset one or more times. The RPC transactions with the CC2480 slave must therefore run at full speed. If the host hits a breakpoint or the user stops the host in a debugger session while an RPC transaction is underway, the CC2480 will surely reset because it detects a "stuck" RPC state. When the host is allowed to run at full speed, getSRDY2() in spi.c cannot always return FALSE. This is because every time the CC2480 resets, it sends an AREQ to tell the host that it has reset and the reason (power / watchdog). Sending this AREQ causes the CC2480 to set its SRDY (slave ready) line low, which causes any loop calling getSRDY2() to succeed at the top on the call to getSRDY1(). That allows the host to pump data over the SPI, which causes the slave to get an SPI Rx complete and set its SRDY high, which in turn causes getSRDY2() to succeed.
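To make the SRDY sequence above easier to follow, here is a small host-side simulation of one lock-step exchange. It is only a sketch: sim_slave_t, sim_srdy_is_low, sim_clock_byte and sim_exchange are hypothetical names invented for illustration, not the ZASA functions, and the real getSRDY1()/getSRDY2() in spi.c may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated slave (hypothetical). SRDY is treated as active low. */
typedef struct {
    bool    srdy_low;   /* true while the slave holds SRDY low */
    uint8_t bytes_left; /* bytes still expected before "Rx complete" */
} sim_slave_t;

/* Read the simulated SRDY line. */
static bool sim_srdy_is_low(const sim_slave_t *s)
{
    return s->srdy_low;
}

/* Host clocks one byte; on the last byte the slave sets SRDY high. */
static void sim_clock_byte(sim_slave_t *s)
{
    if (s->bytes_left > 0 && --s->bytes_left == 0) {
        s->srdy_low = false;
    }
}

/* One exchange: wait for SRDY low, pump len bytes, wait for SRDY high.
 * Each wait gives up after max_polls reads instead of spinning forever. */
bool sim_exchange(sim_slave_t *s, uint8_t len, uint32_t max_polls)
{
    uint32_t polls = 0;

    while (!sim_srdy_is_low(s)) {          /* phase 1: slave ready? */
        if (++polls >= max_polls) return false;
    }

    s->bytes_left = len;
    for (uint8_t i = 0; i < len; i++) {    /* phase 2: pump the data */
        sim_clock_byte(s);
    }

    polls = 0;
    while (sim_srdy_is_low(s)) {           /* phase 3: Rx complete? */
        if (++polls >= max_polls) return false;
    }
    return true;
}
```

If the simulated slave never pulls SRDY low, sim_exchange() returns false after max_polls reads, which is the escape path an unbounded while loop lacks.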

On the host, the algorithm for detecting and aborting a "stuck" RPC transaction should be customized according to the deadlines and SPI use of the host application. There will be many factors to take into account, including whether the host is a multi-slave master, a single or multi-thread program, hard or soft real time deadlines, ultra-low power or mains powered, among others. For ZASA, you will note that the primary design requirement to be met is low power. Thus, the host effects the lock-step RPC protocol by blocking and sleeping at each step. The block and sleep waits for what is a reasonable but arbitrary amount of time for ZASA - empirically, it works. This wait time could be tweaked for a specific host application. But it is important to realize that the CC2480 slave could have a latency in responding to a host MRDY (master ready pulse) upwards of 50 usecs, at least double that for making an SRSP (synchronous command response), and milliseconds to seconds for generating the results of an AREQ (asynchronous command request) - although all latencies are mostly very much less.

The algorithm for detecting and escaping a "stuck" RPC transaction in the ZASA S/W works for ZASA running on the development boards. But it is not the end all solution for every H/W platform nor every host application. If you have modified ZASA or ported to different H/W, it is possible that for your configuration the algorithm is now broken and must be modified accordingly.
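As a rough illustration of bounding the wait on the host side, the sketch below polls a "slave ready" callback a limited number of times and reports a timeout instead of looping forever. All names here (wait_srdy_bounded, srdy_read_fn, rpc_status_t, fake_slave_t, fake_srdy) are hypothetical and not part of ZASA; a real port would read the SRDY pin in the callback, would probably sleep between polls to save power, and would pick the limit from the latencies discussed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hook: returns true once the slave signals ready. */
typedef bool (*srdy_read_fn)(void *ctx);

typedef enum { RPC_OK = 0, RPC_TIMEOUT = 1 } rpc_status_t;

/* Poll read_srdy up to max_polls times, then give up with RPC_TIMEOUT
 * so the caller can abort and recover the RPC transaction. */
rpc_status_t wait_srdy_bounded(srdy_read_fn read_srdy, void *ctx,
                               uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (read_srdy(ctx)) {
            return RPC_OK;
        }
    }
    return RPC_TIMEOUT;
}

/* Fake slave for illustration: reports not-ready for the first
 * polls_until_ready reads, then ready on every read after that. */
typedef struct {
    uint32_t polls_until_ready;
} fake_slave_t;

bool fake_srdy(void *ctx)
{
    fake_slave_t *s = (fake_slave_t *)ctx;
    if (s->polls_until_ready == 0) {
        return true;
    }
    s->polls_until_ready--;
    return false;
}
```

On RPC_TIMEOUT the caller decides what recovery means for its application, for example dropping the request and resynchronizing with the slave.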

Are you clocking the SPI slave too fast or on the edge? The CC2480 max is 4 Mbps.

Is your SPI connection noisy or clean? Common ground?
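For the clock-rate question, a quick way to sanity-check a host configuration is to compute the smallest integer divider that keeps the SPI clock at or below the 4 Mbps limit mentioned above. This is a sketch under assumptions: spi_min_divider and CC2480_SPI_MAX_HZ are hypothetical names, and many SPI peripherals only support specific divider values (often powers of two), so the result may need rounding up to the next supported setting.

```c
#include <stdint.h>

/* 4 Mbps limit quoted above, expressed as an SPI clock in Hz. */
#define CC2480_SPI_MAX_HZ 4000000u

/* Smallest integer divider d (d >= 1) such that src_hz / d <= 4 MHz.
 * Uses ceiling division; assumes src_hz is well below UINT32_MAX. */
uint32_t spi_min_divider(uint32_t src_hz)
{
    uint32_t div = (src_hz + CC2480_SPI_MAX_HZ - 1u) / CC2480_SPI_MAX_HZ;
    return (div == 0u) ? 1u : div;
}
```

For example, a 16 MHz source clock needs a divider of at least 4, while a 25 MHz source needs at least 7.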

0 Zaphod over 17 years ago in reply to Dirty Harry

Prodigy 155 points

Hey Dirty Harry,

Thanks for the profound answer - it was very interesting to read and learn!

The platform I am currently using is the same sample board that comes with the evaluation kit, and I have not modified it at all. We have designed our own board, but it has not been manufactured yet.

Also, we've made only very minor changes to the ZACCEL itself, apart from the higher-level application layer that we wrote, and I doubt that layer is very CPU intensive.

Referring to the debugging point you raised, it may very well be that debugging caused this issue to occur, since we did detect this situation via a debugging session.

Thanks again!

Zaphod.
