CC2674R10: Various Timer Source Rollovers

Stuart Baker

Part Number: CC2674R10

I'm working with a customer on this device who is experiencing a strange issue that occurs after about 12 hours of continuous operation following a reset. Essentially, the device does not go to sleep, and it appears there is a standby inhibit in place. The issue is very reproducible, but they are struggling to identify the root cause. The customer agrees that this is likely an application-specific issue and not an issue with TI platform software. We wanted to look at the various time-based rollover sources to see whether any of them correlates with the failure and might give us some clues.

I don't think there is a correlation, but I am hoping someone can check my work. They are using FreeRTOS + OpenThread + BLE. The SDK version is 7.10.02.23. Their FreeRTOS tick rate is 10 kHz (100 us).

Radio Timer

32-bit, fixed 4-MHz rate
((2^32) / (4 * 1000 * 1000)) / 60 = ~17.9 minutes rollover period

FreeRTOS

32-bit, fixed 10kHz rate
((2^32) / (10 * 1000)) / (60 * 60 * 24) = ~4.97 days rollover period

RTC (used as the wake-up timer in standby)

This piece I'm a little less clear on. The documentation says it is a "70-bit free running counter", but I only see a 32-bit "sub-second" and a 32-bit "second" component. That adds up to only 64 bits. Let's assume that it rolls over after 2^32 seconds.

(2^32) / (60 * 60 * 24 * 365) = ~136 years rollover period

However, I think we need to look at the rollover of the channel events. I think the standby wake-up timer uses the RTC channel 0 event (in tickless idle mode). This appears to use 16 bits for seconds and 16 bits for sub-seconds, which would put the rollover time at approximately:

(2^16) / (60 * 60) = ~18.2 hours rollover period
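
To make the arithmetic above easy to re-check, here is a minimal sketch that recomputes the four rollover periods. The counter widths and rates are simply the ones assumed in this post (they are not read from hardware or from the TRM), and `rollover_seconds` is a hypothetical helper name.

```python
# Recompute the rollover periods quoted above.
# Widths and rates are the assumptions stated in the post, not values
# read from the device or the TRM.

def rollover_seconds(bits: int, rate_hz: float) -> float:
    """Seconds until a free-running counter `bits` wide wraps at `rate_hz`."""
    return (2 ** bits) / rate_hz

# Radio timer: 32-bit counter at 4 MHz
radio_timer_min = rollover_seconds(32, 4_000_000) / 60

# FreeRTOS tick counter: 32-bit counter at 10 kHz
freertos_days = rollover_seconds(32, 10_000) / (60 * 60 * 24)

# RTC seconds field: assumed to wrap after 2^32 seconds
rtc_seconds_years = rollover_seconds(32, 1) / (60 * 60 * 24 * 365)

# RTC channel compare: assumed 16 bits of whole seconds
rtc_channel_hours = rollover_seconds(16, 1) / (60 * 60)

print(f"Radio timer:        {radio_timer_min:.1f} minutes")
print(f"FreeRTOS ticks:     {freertos_days:.2f} days")
print(f"RTC seconds field:  {rtc_seconds_years:.0f} years")
print(f"RTC channel event:  {rtc_channel_hours:.1f} hours")
```

None of these four periods lands near the ~12 hours at which the failure shows up.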

In summary, I'm asking:

Are there any other potential timer rollovers I should be aware of?
Is my math correct on the three rollover sources I have already identified?
1. I'm particularly interested in whether my assessment of the RTC is correct, as this is the most difficult to follow in the source and TRM.

Any other ideas for root cause investigation are of course welcome.

Thanks,

Stuart

3 months ago

0 Ryan Brown1 2 months ago

Hi Stuart,

I will notify Radio driver experts to help answer your rollover questions. Does the issue require RF activity to occur, or will it happen after 12 hours regardless of stack activity? Are you able to enter a debug session during this time and collect the AON_RTC registers? It would also be helpful if you could add RFCC26X2_multiMode.c to your project locally and view RTC timestamps obtained through AONRTCCurrent64BitValueGet, as well as the return of RF_getCurrentTime. Is the customer calling any RTC APIs in their application directly?

Regards,
Ryan

0 Ryan Brown1 2 months ago

Here is some initial feedback:

If the customer is not entering standby when they think they should be, that is probably due to uneven set / release cycles for the standby constraint. This can be read out at runtime: PowerCC26X2_module.constraintCounts[2] is what they should check. We can only enter standby when that counter reaches zero.
TI Drivers do not test or validate their SW with FreeRTOS tick rates other than 1ms. At 100us, we may end up with significant overhead servicing the FreeRTOS tick function. That probably has no impact on the current issue but it is worth seeing whether they can change that to 1ms and see whether it makes a difference (assuming that is simple to do in their application).
The RTC is not exposed to customers directly unless they actively use the second RTC channel. Otherwise, they are accessing the RTC through the ClockP module, which only maintains time in 32-bit system ticks of 10us. Rollover for ClockP thus occurs after 2^32 * 10us * 1s/1000000us * 1m/60s * 1h/60m = 11.93h
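
As a quick check of the ClockP figure, the following sketch redoes the unit conversion and prints the rollover as hours, minutes, and seconds. The 32-bit width and 10 us tick period are the values stated above; the variable names are only illustrative.

```python
# ClockP rollover: 32-bit tick counter, 10 us per system tick
# (both values taken from the reply above).
CLOCKP_COUNTER_BITS = 32
CLOCKP_TICK_US = 10

clockp_rollover_s = (2 ** CLOCKP_COUNTER_BITS) * CLOCKP_TICK_US / 1_000_000
clockp_rollover_hours = clockp_rollover_s / 3600

# Break the whole seconds down into h:m:s for comparison with a wall clock.
hours, remainder = divmod(int(clockp_rollover_s), 3600)
minutes, seconds = divmod(remainder, 60)

print(f"ClockP rollover: {clockp_rollover_hours:.2f} h "
      f"({hours} h {minutes} min {seconds} s)")
```

This is the only period in the thread that sits within a few minutes of the observed ~12 hour failure time.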

Looks like this may be the culprit. Or at least ClockP rollover occurs after 11.93 hours, meaning it is involved. The ClockP implementation for CC2674X uses kernel/freertos/dpl/ClockPCC26X2_freertos.c and kernel/freertos/dpl/TimerPCC26XX_freertos.c. It is almost a direct copy of the old TI-RTOS Clock implementation.

static ClockP_Module_State ClockP_module keeps the state of the clock. You should be able to access that in CCS with the usual reading static variables trick. Or just recompile with the code altered in the SDK to remove the static.

Hope this helps,
Ryan

0 Stuart Baker 2 months ago in reply to Ryan Brown1

Ryan,

These suggestions are very helpful. I've already discussed these briefly with the customer, and we have some time scheduled tomorrow to make a deeper dive investigation.

Ryan Brown1 said:
If the customer is not entering standby when they think they should be, that is probably due to uneven set / release cycles for the standby constraint. This can be read out at runtime: PowerCC26X2_module.constraintCounts[2] is what they should check. We can only enter standby when that counter reaches zero.

Yes, I agree. We think it is very likely that there is an active constraint. We also think it is likely to be tied to one of the UARTs. We've instrumented the UART driver constraints with entry and exit hooks, and the results have made us suspicious of this peripheral driver as the source of a stuck constraint. Of course, application-level logic can still be to blame, which is what we suspect to be the case.

Ryan Brown1 said:
TI Drivers do not test or validate their SW with FreeRTOS tick rates other than 1ms. At 100us, we may end up with significant overhead servicing the FreeRTOS tick function. That probably has no impact on the current issue but it is worth seeing whether they can change that to 1ms and see whether it makes a difference (assuming that is simple to do in their application).

Noted. We are aware of this. There is a potential path to reduce the tick rate to 1kHz. At this time, we think this is very unlikely to be the root cause, but will keep it under consideration.

Ryan Brown1 said:
The RTC is not exposed to customers directly unless they actively use the second RTC channel. Otherwise, they are accessing the RTC through the ClockP module, which only maintains time in 32-bit system ticks of 10us. Rollover for ClockP thus occurs after 2^32 * 10us * 1s/1000000us * 1m/60s * 1h/60m = 11.93h

I think this is the most interesting. I looked through the ClockP driver for the CC2674R10 device. I agree with the rollover math. Most of the customer application timer usage is actually directly on the FreeRTOS timer API (SysTicks, instead of ClockP), which is why they wanted a 10 kHz tick rate. However, I think anything scheduled through the OpenThread API is ported on top of a POSIX timer, which is in turn on top of a ClockP timer. We will confirm this architecture. Furthermore, the customer is using TI Drivers that use ClockP timers under the hood, including the UART driver with its "timeout" APIs.

We will focus our efforts mostly here for now.

Ryan Brown1 said:
Looks like this may be the culprit. Or at least ClockP rollover occurs after 11.93 hours, meaning it is involved. The ClockP implementation for CC2674X uses kernel/freertos/dpl/ClockPCC26X2_freertos.c and kernel/freertos/dpl/TimerPCC26XX_freertos.c. It is almost a direct copy of the old TI-RTOS Clock implementation.

Agree.

Ryan Brown1 said:
static ClockP_Module_State ClockP_module keeps the state of the clock. You should be able to access that in CCS with the usual reading static variables trick. Or just recompile with the code altered in the SDK to remove the static.

Agree. We might consider some instrumentation here as well since it can be difficult to stay attached to the target for ~12 hours, or to attach to a target that has been running for ~12 hours without inadvertently causing a reset.

Thanks,

Stuart

0 Stuart Baker 2 months ago in reply to Stuart Baker

Stuart Baker said:
However, I think anything scheduled through the OpenThread API is ported on top of a POSIX timer, which is in turn on top of a ClockP timer.

According to my review of source/ti/posix/freertos/timer.c, the POSIX timers appear to be implemented on top of the FreeRTOS timers. Therefore, I don't think these are a likely source of any rollover issue relative to the ClockP implementation.

This actually tracks with one of the limitations of POSIX timers on FreeRTOS, in that they do not technically support SIGEV_SIGNAL and only support SIGEV_THREAD, because FreeRTOS timers always run in task (thread) context.

Thanks,

Stuart

+1 Stuart Baker 17 days ago in reply to Stuart Baker

After sitting down with the customer and looking at these likely timer rollover sources, we narrowed the likely candidate down to a ClockP timer, which is used in the UART2 driver for reading with a timeout. It turns out that the customer had set up the following sequence of events:

Call a "blocking" UART read.
1. The result, under the hood, is actually a read with ClockP timeout of maximum time (11.93 hours)
Asynchronously, but later in time, another thread that handles power management for the application disables the UART RX, which removes the low power inhibit.
After 12 hours, the blocking UART read returns (maximum timeout period). Another call of "blocking" UART read is made.
1. This results, under the hood, in the RX being re-enabled with an active low power inhibit.
The asynchronous power management thread is unaware that the UART read returned and was re-armed, thus does not come in later to disable the RX and remove the low power inhibit.
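
The sequence above can be sketched as a toy model of a reference-counted standby inhibit. This is only an illustration of the described behavior: `StandbyConstraint`, `ToyUart`, and their methods are hypothetical names, not the UART2 or Power driver APIs, and the 11.93 hour timeout is represented by a comment rather than by real time.

```python
class StandbyConstraint:
    """Toy reference-counted standby inhibit (hypothetical, not the TI Power driver)."""

    def __init__(self):
        self.count = 0

    def set(self):
        self.count += 1

    def release(self):
        if self.count > 0:
            self.count -= 1

    @property
    def standby_allowed(self):
        return self.count == 0


class ToyUart:
    """Toy UART whose enabled RX path holds one standby constraint."""

    def __init__(self, constraint):
        self.constraint = constraint
        self.rx_enabled = False

    def rx_enable(self):
        if not self.rx_enabled:
            self.rx_enabled = True
            self.constraint.set()

    def rx_disable(self):
        if self.rx_enabled:
            self.rx_enabled = False
            self.constraint.release()

    def start_blocking_read(self):
        # In this model a read always (re-)enables RX if it is off.
        self.rx_enable()


def simulate():
    constraint = StandbyConstraint()
    uart = ToyUart(constraint)
    events = []

    uart.start_blocking_read()          # reader thread: "blocking" read #1
    events.append(("read #1 armed", constraint.count))

    uart.rx_disable()                   # power thread: disable RX to allow standby
    events.append(("power thread disabled RX", constraint.count))

    # ... maximum ClockP timeout (~11.93 h) elapses, read #1 returns ...
    uart.start_blocking_read()          # reader thread: "blocking" read #2
    events.append(("read #2 armed", constraint.count))

    return constraint, events


constraint, events = simulate()
for name, count in events:
    print(f"{name}: constraint count = {count}")
print("standby allowed:", constraint.standby_allowed)
```

In this model the count ends at 1 with nobody left to release it, which matches the stuck standby inhibit seen after roughly 12 hours.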

The customer has modified their application logic to account for this now-known behavior and confirmed that the issue is resolved.

Thanks,

Stuart
