Rui Zhang has posted about this problem on this forum, but I thought I would sum up our current investigation of this mystery to see if it rings a bell with anyone. At this point, I am willing to believe the problem is in TI's code, our code, the hardware design, or manufacturing. As you can tell, we have really narrowed it down! Any assistance is greatly appreciated.
The problem: in 3 instances out of about 50 device-years of runtime (200 devices for 3 months), we saw end devices that, after days or weeks of normal operation, entered a state of altered sleep behavior that persists until the device is reset. When such a device goes into P2 sleep, it apparently wakes up only after something like 80x the programmed sleep time. We have observed the sleep time to exceed the 511 s limit allowed by the sleep timer's 24-bit range.
Our end devices spend most of their time in P2 sleep; we use the internal RC oscillator for 32kHz timing. They are CC2430 devices running ZStack 1.4.3-1.2.1.
We have been able to experiment with the 3 devices showing this bug and watch their behavior both with a packet sniffer and an oscilloscope.
With device #1, we reset it by removing and replacing batteries. It resumed normal functioning and has not demonstrated the bug again.
With device #2, we issued a ZigBee end-device reset. Since the device was still polling its router, it received the reset once we timed it carefully. It resumed normal functioning and has not demonstrated the bug again.
With device #3, we retrieved it from the field without resetting it and have carefully observed its behavior.
In P0 mode the device functions properly; if we orphan it, for example, it will execute all correct activities to try to find a new parent and rejoin the network. Timings are correct during this process. When the device is associated and awakes from P2 sleep, it successfully communicates with other devices.
In normal operation with no external stimulus the device should wake from P2 and poll its parent for data every 8.9s. By observing RF activity with a packet sniffer and power state with an oscilloscope, we see that the device instead sleeps for about 750s before waking from P2 and polling. By all accounts, sleeping for more than 511s without waking up should be strictly impossible due to the range limitations of the 24-bit sleep timer. However, this is what we observe. Based on analysis of the source code and the behavior of other devices, we believe that the value written to the sleep timer corresponds to about 8.9 sec, which is how we arrive at our "something like 80x".
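As a sanity check on the numbers above: a 24-bit sleep-timer compare value clocked at a nominal 32.768 kHz spans at most 2^24 / 32768 = 512 s, and 750 s / 8.9 s is a stretch of roughly 84x. The sketch below just does that arithmetic; the nominal clock rate is an assumption (the calibrated RC frequency will differ slightly), and the function names are illustrative.

```c
#include <stdint.h>

/* Assumed nominal sleep-clock rate in Hz; the calibrated RC rate differs slightly. */
#define SLEEP_CLK_HZ 32768.0

/* Longest period (seconds) that a compare value of 'bits' bits (< 32) can represent. */
double sleep_timer_max_s(unsigned bits)
{
    return (double)((uint32_t)1u << bits) / SLEEP_CLK_HZ;
}

/* How many times longer the observed sleep was than the programmed one. */
double stretch_factor(double observed_s, double programmed_s)
{
    return observed_s / programmed_s;
}
```

For example, `sleep_timer_max_s(24)` gives 512.0 and `stretch_factor(750.0, 8.9)` gives about 84.3, which is where the "something like 80x" estimate comes from.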
It has been suggested that perhaps there is some trouble with lock of the 32 MHz XOSC on the transition to P0, which causes extra startup delay. Possible; however, we have seen that the actual sleep time seems proportional to the intended sleep time (this is difficult to pin down exactly, though), which would not be expected if a fixed startup delay were the cause.
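One way to make the "proportional vs. fixed startup delay" argument concrete is to fit observed = slope x intended + offset over several (intended, observed) pairs: a slow sleep clock shows up as a large slope with a near-zero offset, while a late XOSC lock shows up as a slope near 1 with a large offset. This is a minimal ordinary-least-squares sketch; `fit_sleep_times` and `line_fit` are hypothetical names, and the input data would have to come from sniffer or scope measurements.

```c
#include <stddef.h>

/* Result of a least-squares fit: observed = slope * intended + offset. */
typedef struct {
    double slope;   /* multiplicative stretch */
    double offset;  /* fixed extra delay, seconds */
} line_fit;

/* Fit n (intended, observed) pairs in seconds.
 * Requires n >= 2 and at least two distinct intended values. */
line_fit fit_sleep_times(const double *intended, const double *observed, size_t n)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sx  += intended[i];
        sy  += observed[i];
        sxx += intended[i] * intended[i];
        sxy += intended[i] * observed[i];
    }
    double dn = (double)n;
    line_fit f;
    f.slope  = (dn * sxy - sx * sy) / (dn * sxx - sx * sx);
    f.offset = (sy - f.slope * sx) / dn;
    return f;
}
```

With made-up data where every observed period is 80x the intended one, the fit returns a slope near 80 and an offset near 0; with observed = intended + 5 s it returns a slope near 1 and an offset near 5 s.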
My immediate suspicion points to the 32 kHz RC clock, perhaps a catastrophic calibration failure. Otherwise I can't explain a sleep of more than 511 s. However, I don't see how this would persist for more than one P2 sleep cycle, as the calibration disable bit is not set. I would think that recalibration after emerging from P2 would fix this hypothetical calibration problem. But what we see is that the state persists until the device is power cycled or soft reset.
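To put the calibration-failure hypothesis in numbers: if the sleep code programs a fixed tick count for 8.9 s at the nominal rate and the sleep actually lasts 750 s, the sleep clock would have to be running at only a few hundred hertz. A minimal sketch, assuming a nominal 32.768 kHz clock and simple rounding of the tick count (the real stack's tick computation may differ); the names are illustrative.

```c
#include <stdint.h>

/* Assumed nominal sleep-clock rate in Hz. */
#define NOMINAL_HZ 32768.0

/* Ticks a driver would program for 'seconds' at the nominal rate (rounded). */
uint32_t ticks_for_period(double seconds)
{
    return (uint32_t)(seconds * NOMINAL_HZ + 0.5);
}

/* Clock rate that would make 'ticks' last 'observed_s' seconds. */
double implied_clock_hz(uint32_t ticks, double observed_s)
{
    return (double)ticks / observed_s;
}
```

`ticks_for_period(8.9)` gives 291635 ticks, and `implied_clock_hz(291635, 750.0)` gives about 389 Hz, i.e. roughly 1/84 of the nominal rate.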
Any advice appreciated!
Thanks,
Bill Gribble
Hi Bill and Rui also I guess ;)
First of all, Rui has written many good questions on this forum, but unfortunately answers are still missing :(
Your problem description definitely rings a bell for me.
Here's what we have seen with the CC2430 (only twice):
1) Our 30-second sleep cycle suddenly becomes a ~3600-second cycle (~120 x 30).
2) We can press a button during this erroneous cycle. It starts an LED blink that is ~120x too slow (LED on --> PM2 --> LED off).
3) While the button is held down, our device uses only PM0 and everything works fine. After it is released, PM2 is entered again and the erroneous cycle continues.
Both times the problem occurred, the temperature was about -30 °C. Indoors this has never happened.
I have a few questions:
1) Are you still struggling with this one? If not, I would be extremely glad to hear what the root cause of the problem was!
2)
>> "We have observed the sleep time to exceed the 511s limit allowed by the sleep timer's 24-bit range."
>> "By observing RF activity with a packet sniffer and power state with an oscilloscope, we see that the device instead sleeps for about 750s before waking from P2 and polling."
--> Are you 100% sure that the CC2430 is CONTINUOUSLY in PM2 for 750 s?
3) Are there any external interrupts in your application?
-Antero
Early silicon revs of the CC2430 had a bug where they wouldn't sleep properly. Check the rev of your chip vs. the errata to see if it's been fixed. And switch to the CC2530 for goodness' sake, it's way better and cheaper than the 2430.
--Derek
fixituntilitsbroken.blogspot.com
Hi Derek,
Our CC2430s are >= rev. E (which means rev. E or rev. F). We are familiar with AN044, but as it says, it is for CC2430 revision D.
Here is a relevant post from another member: http://e2e.ti.com/support/low_power_rf/f/158/t/156639.aspx
As I said, unfortunately there is no answer to that one, for example.
At the moment, changing to the CC2530 is not possible, so we must do something.
Currently we are starting to test whether the transition from PM0 to PM3 has something to do with this. Our hypothesis is that when the device goes from PM0 to PM3, something awful might happen. After the device wakes up from PM3 it goes to PM0. In PM0 everything works fine, but if we then go to PM2, the sleep time is way too long.
Hello,
Today we got some kind of a catch. The picture is below:
* The picture shows the voltage measured over a shunt resistor in the Vcc line (R is ~3 ohm). The current I can be calculated as Umeas / R.
* The device should blink the LED 30 ms on / 30 ms off.
* Suddenly the LED blinks too slowly (visible in the picture as ~500 ms on / ~500 ms off).
* As you can see, something happens / something wakes up every 30 ms even though our application doesn't know about it!
* This is just speculation, but:
Bill's 8.9 s changed by ~80x
Our 30 ms changed by ~16x
Our 30 s changed by ~120x
Quite close to power-of-two values, don't you think?! The bigger the correct sleep time, the bigger the factor.
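The power-of-two speculation can be checked quickly by snapping each observed factor to the nearest power of two (on a log scale) and looking at the relative error. This is only a sketch: the function names are hypothetical, it assumes factors >= 1, and it avoids libm so it builds without extra linker flags.

```c
/* Nearest power of two to 'factor' (>= 1), judged on a log scale. */
double nearest_pow2(double factor)
{
    double p = 1.0;
    while (p * 2.0 <= factor)   /* largest power of two <= factor */
        p *= 2.0;
    /* factor/p < 2p/factor  <=>  factor^2 < 2p^2: p is closer than 2p */
    return (factor * factor < 2.0 * p * p) ? p : 2.0 * p;
}

/* Relative distance of 'factor' from its nearest power of two (0 = exact). */
double pow2_error(double factor)
{
    double p = nearest_pow2(factor);
    double d = factor - p;
    if (d < 0.0)
        d = -d;
    return d / p;
}
```

With the factors above, 16 maps to 16 exactly, 120 maps to 128 (about 6% off), and 80 maps to 64 (25% off), so the pattern is suggestive rather than conclusive.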
Hi Antero, quite a nice surprise to see activity on this issue after some time!
We have not nailed down the root cause of this problem. We did identify some "marginal" components in our design that "might" contribute to it, but that is completely speculative. We implemented a software mitigation that identifies devices showing these problems and resets them... not ideal, but it seems to work.
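The mitigation itself isn't described here. One possible shape for it, assuming the detection runs on the parent (which sees every data request anyway) and that a reset can only reach the end device in response to one of its polls, as with device #2, is to count consecutive poll gaps that are far longer than the configured interval. `child_poll_state`, `on_child_poll` and both thresholds are hypothetical, not Z-Stack APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-child bookkeeping kept by the parent (hypothetical structure). */
typedef struct {
    uint32_t expected_ms;   /* configured poll interval */
    uint32_t last_poll_ms;  /* time of the previous data request */
    uint8_t  strikes;       /* consecutive overly long gaps */
} child_poll_state;

#define STRETCH_LIMIT 4u    /* gap > 4x the expected interval is a strike */
#define STRIKE_LIMIT  3u    /* strikes before asking for a reset */

/* Call on every data request from the child. Returns true when the parent
 * should answer this poll with a reset command. */
bool on_child_poll(child_poll_state *c, uint32_t now_ms)
{
    uint32_t gap = now_ms - c->last_poll_ms;  /* unsigned math survives wraparound */
    c->last_poll_ms = now_ms;
    if (gap > STRETCH_LIMIT * c->expected_ms) {
        if (c->strikes < 255u)
            c->strikes++;
    } else {
        c->strikes = 0;                       /* a normal gap clears the history */
    }
    return c->strikes >= STRIKE_LIMIT;
}
```

With an 8.9 s poll interval, three consecutive ~750 s gaps trip the detector, and a normal poll afterwards clears the count. Requiring several strikes avoids resetting a device over a single lost poll.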
The marginal components we identified were an incorrect capacitor value on the DCOUPL pin, which could interfere with the internal voltage regulator, and a 32 MHz XOSC part that is right at the edge of the specified loading and drift range.
After in-depth discussion with TI engineers (thanks, guys) we never identified ANY failure mode of the 32 kHz RC oscillator or sleep timer that could cause the behavior we observed.
When we put our device on the scope, we look at the voltage on the DCOUPL pin, which should show us a spike any time the CPU wakes up into P0. We see no activity on this pin at all for the full 800+ seconds between DRDY polling activity.
Please keep us up-to-date on any additional findings! Thanks for your report.
Hello Antero:
It's great to hear about your experience with a similar problem. Actually, our sleep timers are also delayed by around 110-120 times! Almost the same as yours (for instance, the regular 8.9 s polling timer is delayed to around 1100 s). But we have no clear evidence that "the bigger the correct sleep time, the bigger the factor". Bill has described our investigation quite well. We are unable to locate the root cause exactly, but there are two issues we believe may be related to this problem.
1. We have a larger capacitor value on the DCOUPL (pin 42) side than TI recommends, which could potentially extend the discharge period when the device goes to sleep and leave room for the kind of risk described in AN044. However, we have not confirmed whether the problem in AN044 (CC2430 Rev D) has been completely solved or just mitigated. In the latter case, could our larger capacitor value be a problem, since it is outside the tested scenario?
2. The main 32 MHz XTAL of those buggy sensors has shown some distortion. Our HW engineer believes that should be fine for device operation, and our devices operate normally except for the longer sleep timer. However, we use the RCOSC, so the 32 kHz clock used for sleep is calibrated from the XTAL; could this be another potential problem?
3. A reset clears the problem.
I am not sure whether you have seen similar problems. If possible (with your company's permission), I'd like to follow up with you on several questions about your description.
1. Which Z-Stack version do you use? Although we strongly suspect this is a chip-level issue, we are still keeping an eye on the SW side, and we notice TI has changed TIMAC for the CC2430 over several versions. A code comparison shows some of the changes are around the hal_sleep() function.
2. You mentioned the problem happens at -30 °C; our problems happen in an indoor environment, so it is interesting to see whether temperature has an impact. Have you tried bringing a buggy device back to normal temperature to check whether the problem disappears?
3. You mentioned CC2430 Rev F. We have never heard of a Rev F, so we will follow up on this.
Your post and description are greatly appreciated! Hopefully we can push this a little further and find the root cause. Keep in touch.
Best
Rui
Hi Antero:
Following my previous post, could you elaborate a bit more on the scope result?
(1) Which pin do you use to measure the voltage? There are several VCC inputs.
(2) Regarding your claim "something happens / something wakes up every 30ms even though our application doesn't know about it!": does this also happen for the 30 s -> 3600 s sleep timer (i.e., the device wakes up every 30 s, but the application doesn't know about it until 3600 s later)?
(3) Do you have a big while loop in the hal_sleep() function? TI changed/removed the while loop in hal_sleep() in its later TIMAC code (TIMAC 1.3.0 and later).
Thanks
-Rui
Hi there:
>Which Z stack version do you use?
--> Actually we are using TIMAC 1.3.1
>Have you tried to bring the buggy device to normal temperature and check whether the problem disappear?
--> We have brought 7 devices to -30 °C, and 2 of them started to act incorrectly within 2 days. Indoors we have seen only the "catch" I posted yesterday, although the amount of indoor test time is much, much longer. (I will write more about that at the end of this post.)
>You mentioned about CC2430 Rev F. we never heard there is Rev F, so we will follow up this.
--> Yes, there is a CC2430 Rev F, but it is actually the same as CC2430 Rev E. We verified this with TI support.
>Which pin do you use to measure the voltage? there are several VCC inputs.
--> We use the following measurement: by observing the voltage over a 3 ohm shunt resistor, we can calculate how much current our PCB draws and in what kind of cycles. This tells us whether the MCU is on or asleep.
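The shunt measurement boils down to Ohm's law, I = U_meas / R, followed by a threshold to tell active from sleep. A minimal sketch; the function names are illustrative, and the threshold is an assumption that has to be tuned per board, because the shunt sees the whole PCB's current, not only the CC2430's.

```c
/* Current through the shunt in amperes: I = U / R (volts, ohms). */
double shunt_current_a(double u_meas_v, double r_ohm)
{
    return u_meas_v / r_ohm;
}

/* Crude per-sample state guess: 1 if the board draws more than
 * 'threshold_a' amperes (assumed to mean the MCU is awake), else 0. */
int looks_awake(double u_meas_v, double r_ohm, double threshold_a)
{
    return shunt_current_a(u_meas_v, r_ohm) > threshold_a;
}
```

For example, 30 mV across 3 ohm is 10 mA, well above a 1 mA threshold, while 30 uV is 10 uA and reads as asleep.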
>Dose this also happen for 30s->3600s sleep timer(the device wakes up 30s but the application dose not know about it until 3600s later)
--> Unfortunately, that is what we don't know.
>Do you have a big while loop in hal_sleep() function?TI changes/removes the while loop in hal_sleep() for its later TIMAC code(TIMAC -1.3.0 and later)
--> I guess not, because I don't know what you mean.
And more about the "catch".
In this case, too, there might be something wrong with the DCOUPL capacitor (1 uF), because I soldered some test wires to it and might have broken something. The PCB on which this happens doesn't recover even if I power it up again.
Hi again.
You have taken part in the following thread:
http://e2e.ti.com/support/low_power_rf/f/156/t/65196.aspx?PageIndex=1
In TIMAC 1.3.1, function hal_sleep():
/* Power on the MAC; blocks until completion.
 * Osal clock adjustment has been moved to macSleepWakeUp() function
 * to reduce the wake up time from sleep.
 */
MAC_PwrOnReq();
-->
Before macSleepWakeUp() is entered, is there a small time window where something bad might happen if Timer 2 is used for any purpose? (Interrupts are enabled again after waking up from sleep, and, for example, an external interrupt would cause osal_start_timerEx() to be called, etc. I don't know about your application, so I'm just guessing...)
We are using Timer 2 SYNC, and we are now wondering whether this could cause trouble for us...
And reporting more,
I wrote: application doesn't know about it!
--> The application doesn't, but oh yes, the SW does. The SW starts running, but for some reason OSAL (Timer 2) doesn't work properly. So from a higher-level point of view, it seems that time has slowed down. But I guess the sleep itself works as it should.
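A toy model of the "time has slowed down" hypothesis: if a software timer is decremented only by the time the code believes has elapsed, and the time spent asleep is somehow not credited after wake-up, only the awake part of each cycle counts and every timeout stretches by (sleep + awake) / awake. This is purely illustrative and not how OSAL is known to behave; `soft_timer`, its functions, and the 2 ms awake time in the example are all assumptions.

```c
#include <stdint.h>

/* Toy software timer, decremented by the time the code believes has elapsed. */
typedef struct {
    uint32_t remaining_ms;
} soft_timer;

/* Advance by one sleep/wake cycle. If credit_sleep is 0, the sleep time is
 * (hypothetically) lost and only the awake time counts. Returns 1 on expiry. */
int soft_timer_cycle(soft_timer *t, uint32_t sleep_ms, uint32_t awake_ms,
                     int credit_sleep)
{
    uint32_t elapsed = awake_ms + (credit_sleep ? sleep_ms : 0u);
    if (elapsed >= t->remaining_ms) {
        t->remaining_ms = 0;
        return 1;
    }
    t->remaining_ms -= elapsed;
    return 0;
}

/* Number of sleep/wake cycles until a timer of period_ms expires;
 * returns 0 if the timer would never advance. */
uint32_t cycles_to_expire(uint32_t period_ms, uint32_t sleep_ms,
                          uint32_t awake_ms, int credit_sleep)
{
    soft_timer t = { period_ms };
    uint32_t n = 0;
    if (awake_ms + (credit_sleep ? sleep_ms : 0u) == 0u)
        return 0;
    do {
        n++;
    } while (!soft_timer_cycle(&t, sleep_ms, awake_ms, credit_sleep));
    return n;
}
```

With a 30 ms timer, 30 ms sleeps and 2 ms awake per cycle, crediting the sleep expires the timer after 1 cycle; without it, it takes 15 cycles (480 ms of wall time). The stretch depends entirely on the assumed awake time, so this shows a possible mechanism, not the cause.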
Hello, Antero:
Thanks for your fruitful answers. Here are some of my thoughts regarding your experimental results:
---"The pcb in which this happens doesn't recover even if I power it up again.",
So it seems to me that a reset cannot solve this problem? If so, I would strongly suspect some sort of hardware problem.
---Before macSleepWakeUp() is entered there is a little time window where something bad might happen if timer 2 is used for any purposes? (Interrupts are enabled again after waking up from sleep and for example external interrupt
My comments on this are as follows:
1. Actually, we use Timer 4 instead of Timer 2 for the OSAL timer (separating the MAC and OSAL timers), since we use TIMAC 1.2.1. I know later TIMAC code changes this, and TIMAC 1.3.1 uses the MAC timer (Timer 2) as the OSAL timer.
2. MAC_PwrOnReq() is still protected from interrupts; please check the code in hal_sleep(). I believe this code prevents external interrupts:
/* set CC2430 power mode */
HAL_SLEEP_SET_POWER_MODE(halPwrMgtMode);
/* wake up from sleep */
HAL_ENTER_CRITICAL_SECTION(intState);
....
/* Power on the MAC; blocks until completion.
 * Osal clock adjustment has been moved to macSleepWakeUp() function
 * to reduce the wake up time from sleep.
 */
MAC_PwrOnReq();
}
HAL_EXIT_CRITICAL_SECTION(intState);
3. Timer sync after sleep (section 13.7.4.3 in the CC2430 datasheet) is done on the chip side; there is only one line of code on the software side:
#define MAC_RADIO_TIMER_WAKE_UP() st( T2CNF |= RUN; while (!(T2CNF & RUN)); )
There is not much for us to do there.
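What that one line does: set RUN, then busy-wait until RUN reads back as set, which on the real chip happens only once Timer 2 has synchronized to the sleep timer. The `st()` wrapper is the usual do { ... } while (0) trick that makes a multi-statement macro behave like a single statement; TI's HAL headers define a similar `st()`. In the off-target sketch below the register is simulated: `sim_T2CNF`, `read_T2CNF()`, the `RUN` bit value and the poll counter are stand-ins, not the CC2430 SFR definitions.

```c
#include <stdint.h>

/* Make a multi-statement macro behave like one statement. */
#define st(x) do { x } while (0)

#define RUN 0x01u                  /* simulated bit value, arbitrary */
static volatile uint8_t sim_T2CNF; /* simulated stand-in for T2CNF */
static unsigned sim_sync_delay;    /* reads before RUN "reads back" */

/* Simulated register read: RUN stays clear until the fake sync completes. */
static uint8_t read_T2CNF(void)
{
    if (sim_sync_delay > 0u) {
        sim_sync_delay--;
        return (uint8_t)(sim_T2CNF & ~RUN);
    }
    return sim_T2CNF;
}

/* Same shape as MAC_RADIO_TIMER_WAKE_UP(), plus a poll counter. */
#define TIMER_WAKE_UP_SIM(polls) \
    st( sim_T2CNF |= RUN; while (!(read_T2CNF() & RUN)) { (polls)++; } )

/* Run the simulated wake-up; returns how many reads saw RUN still clear. */
unsigned wake_up_and_count_polls(unsigned delay)
{
    unsigned polls = 0;
    sim_T2CNF = 0;
    sim_sync_delay = delay;
    TIMER_WAKE_UP_SIM(polls);
    return polls;
}
```

`wake_up_and_count_polls(5)` returns 5: the loop spins until the simulated sync completes. On hardware, the same loop would spin forever if the sync never completed.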
-----> Application don't, but oh yes SW does. SW starts running but for some reason OSAL (Timer 2) doesn't work properly. So from upper point of you it seems that time has slowed down. But i guess the sleep itself does the work as it should.
If so, a device reset should solve it, since it is a software problem?
--We use a following measurement. By observing the voltage over 3ohm shunt resistor we can calculate how much current our PCB takes in and in which kinf of cycles. This tells us if the MCU is on or in sleep.
I am still not quite sure what you are measuring. If you can specify which CC2430 pin you measure, that would be very helpful.
FYI, we measure the voltage between the DCOUPL pin and GND, as recommended by a TI engineer. The DCOUPL voltage drops when the device enters sleep and rises when the device wakes up; the curve looks exactly like Fig. 1 in AN044.
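Extracting the wake-up period from a DCOUPL capture can be automated: treat each rising crossing of a threshold between the sleep and active levels as a wake-up and measure the spacing. A minimal sketch; the function name, threshold, sample period and trace values are assumptions, and a noisy trace may need hysteresis (two thresholds) rather than the single threshold used here.

```c
#include <stddef.h>

/* Find rising threshold crossings (sleep -> awake) in a sampled trace and
 * store the interval between consecutive wake-ups, in seconds.
 * Returns the number of intervals written (at most max_out). */
size_t wake_intervals(const double *v, size_t n, double threshold,
                      double sample_period_s, double *out, size_t max_out)
{
    size_t count = 0;
    size_t last_rise = 0;
    int have_rise = 0;
    for (size_t i = 1; i < n; i++) {
        if (v[i - 1] < threshold && v[i] >= threshold) {  /* rising edge */
            if (have_rise && count < max_out)
                out[count++] = (double)(i - last_rise) * sample_period_s;
            last_rise = i;
            have_rise = 1;
        }
    }
    return count;
}
```

For a trace sampled every 10 ms with rising edges 5 samples apart, it reports intervals of 0.05 s.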
If possible, could you measure the voltage change on the DCOUPL pin of a buggy device and share your results?
One of my concerns is that your measurement seems to capture the "current our PCB takes in", so the PCB's power consumption would differ from the CC2430's power consumption, right?
Thanks for following up on this topic. If you can address my comments based on your experience, that would be greatly appreciated.
Antero, one important thing is missing from our discussion: do you use the XTAL or the RCOSC as your 32 kHz clock?
*****************************************************
>>---"The pcb in which this happens doesn't recover even if I power it up again.",
>>So it seems to me that the reset can not solve this problem? If so, I will highly suspect it is some sort of hardware problem?
Yes, me too. Below are pictures of the non-working and the working board (this is the DCOUPL pin):
PCB1: not working. PCB2: working (taken from -30 °C).
As picture 1 shows, something is wrong with the HW. It looks to me like the LDO is oscillating. As I said before, in that scenario the MCU wakes up correctly every 30 ms, but OSAL (or Timer 2) seems not to work.
>> MAC_PwrOnReq() is still protected from INT. please check the code in hal_sleep(). I believe these codes will prevent external INT.
Yes, but I am not 100% sure. MAC_PwrOnReq() is a function provided by the TIMAC library. As I understand the description of that function in mac_api.h, MAC_PwrOnReq() only launches the procedure, and after that things happen via events (MAC_PWR_ON_CNF, plus possibly an event in the MAC task?). Of course I cannot be sure, because we have not bought the library source code. The question is at which point macSleepWakeUp() is called, since that is where MAC_RADIO_TIMER_WAKE_UP() is. If I have time today, I will try to debug whether interrupts are enabled or disabled at this point (when the SYNC is done).
>>-----> Application don't, but oh yes SW does. SW starts running but for some reason OSAL (Timer 2) doesn't work properly. So from upper point of you it seems that time has >>slowed down. But i guess the sleep itself does the work as it should.
>>If so, device reset should solve it since it is software problem?
I think so too. But even if it is a HW problem, it would be nice to know why our hal_sleep.c works properly but the OSAL events don't occur as they should.
>>I am still not very sure what you measure. If you can specify which pin you measure in CC2430, that will be greatly helpful.
>>One of my concern is your measurement seems to measure the "current our PCB takes in ", so PCB power consumption should be different from CC2430 power consumption, right?
In certain cases, the way I measured the behavior of the MCU tells us more than just measuring DCOUPL. But now there are pictures of the DCOUPL pin above. Thanks for the tip.
>>Antero, One important missing in our discussion. Do you use XTAL or RCOSC as your 32K clock?
--> We use RCOSC
In TIMAC 1.4.0, hal_sleep.c says:
/* For CC2530, T2 interrupt won’t be generated when the current count is greater than
 * the comparator. The interrupt is only generated when the current count is equal to
 * the comparator. When the CC2530 is waking up from sleep, there is a small window
 * that the count may be grater than the comparator, therefore, missing the interrupt.
 * This workaround will call the T2 ISR when the current T2 count is greater than the
 * comparator. The problem only occurs when POWER_SAVING is turned on, i.e. the 32KHz
 * drives the chip in sleep and SYNC start is used.
 */
macMcuTimer2OverflowWorkaround();
Can somebody tell me whether this is an issue with the CC2430 or not?
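The idea in that comment can be sketched independently of TI's code: an interrupt that fires only on count == compare is lost if the count has already moved past the comparator, so after wake-up the software checks whether the comparator was passed and, if the ISR has not run, calls it by hand. The sketch assumes a hypothetical 24-bit wrapping counter and treats "passed" as being up to half the range ahead; it is not the implementation of macMcuTimer2OverflowWorkaround(), which is not shown in the thread, and it says nothing about whether the CC2430 has the same issue.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 24-bit counter width; values wrap modulo 2^24. */
#define OVF_MASK 0xFFFFFFu

/* True if 'count' has reached or passed 'cmp', treating the counter as
 * modular: up to half the range "after" cmp counts as passed. */
bool compare_passed(uint32_t count, uint32_t cmp)
{
    uint32_t ahead = (count - cmp) & OVF_MASK;
    return ahead < (OVF_MASK + 1u) / 2u;
}

/* After wake-up: should the compare ISR be called manually because the
 * equality match was missed while the counter was resynchronizing? */
bool should_call_isr_manually(uint32_t count, uint32_t cmp, bool isr_already_ran)
{
    return !isr_already_ran && compare_passed(count, cmp);
}
```

For example, count 105 vs. compare 100 is treated as passed, 95 vs. 100 is not, and after a wrap, 2 vs. 0xFFFFFE is still treated as passed.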
Since there is contradictory evidence that leaves us unsure whether this is a HW or SW problem, could you make a special FW that toggles a pin right before/after sleep? Then we would clearly see whether the device gets out of SLEEP. Or turn off all Tx/Rx and other interrupts and just execute the 30 ms cycle.
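The pin-toggle firmware suggested above could be as simple as wrapping the power-mode entry: drive a spare GPIO high just before sleeping and low right after execution resumes, so the scope shows exactly when the CPU is asleep. In this off-target sketch the GPIO is replaced by a log; `debug_pin()`, `instrumented_sleep()` and `stub_enter_sleep()` are hypothetical, and on the target the stub would be replaced by whatever code actually enters PM2.

```c
#include <stddef.h>

/* Off-target stand-in for a spare GPIO: edges are appended to a log. */
#define LOG_MAX 16
static char pin_log[LOG_MAX];
static size_t pin_log_len;

static void debug_pin(int level)
{
    if (pin_log_len < LOG_MAX)
        pin_log[pin_log_len++] = level ? 'H' : 'L';
}

/* Placeholder for the code that really enters the power mode. */
void stub_enter_sleep(void)
{
}

/* Pin high while asleep: rising edge just before sleep, falling edge
 * as soon as execution resumes after wake-up. */
void instrumented_sleep(void (*enter_sleep)(void))
{
    debug_pin(1);
    enter_sleep();
    debug_pin(0);
}
```

Calling `instrumented_sleep(stub_enter_sleep)` leaves "H" then "L" in the log. On the board, a high pulse much longer than the intended sleep would point at the sleep itself, while short, regular pulses with late application events would point at the software timer side.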
Also, the scope picture of the buggy device shows the DCOUPL pin at a high voltage most of the time, which would indicate that it is in PM0? Since the left figure uses a 100 ms scale and the right one a 50 ms scale, it seems both show the same pattern?