RTOS/LAUNCHXL-CC1310: Task_sleep(1500) never returns?

Michael Moon

Expert 2050 points

Part Number: LAUNCHXL-CC1310
Other Parts Discussed in Thread: SYSBIOS, CC1350

Tool/software: TI-RTOS

Hi,

Today I'm having a problem where my task deadlocks in Task_sleep. I need to wait 15ms for an external chip to complete a measurement, but when I wait using Task_sleep(1500) my task never runs again.
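For reference, Task_sleep() takes its timeout in Clock ticks rather than milliseconds. The value 1500 corresponds to 15 ms only if the tick period is 10 us, which is what the numbers in this post imply. The helper below is a host-side sketch of that conversion; ms_to_ticks is a hypothetical name and the tick period is passed in as an assumed parameter (on target it would come from the Clock module configuration).

```c
#include <stdint.h>

/* Convert milliseconds to Clock ticks, rounding up so the sleep is never
 * shorter than requested. tick_period_us is an assumption: 10 us is the
 * value implied by "15 ms == 1500 ticks" in the post above. */
static uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_period_us)
{
    uint64_t us = (uint64_t)ms * 1000u;
    return (uint32_t)((us + tick_period_us - 1u) / tick_period_us);
}
```

Under that assumption, ms_to_ticks(15, 10) yields the 1500 used in the call.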

If I check in the ROV, it says it's blocked on Task_sleep, and the callstack clearly shows that it's entered Task_sleep and scheduled another task which runs for a little then pends on a semaphore - however TI-RTOS never resumes my task even though the timeout has long since passed!

So far, TI-RTOS has been waiting 15 minutes for a 15 ms timeout to expire.

How can I find out why this is happening?

over 7 years ago

Michael Moon over 7 years ago

With some experimentation, it appears that it only deadlocks if there's another task to switch to. If no other tasks are ready, it just snoozes for 15ms and returns as expected.

Michael Moon over 7 years ago in reply to Michael Moon

Apparently, *all* tasks that use Task_sleep when this issue occurs will deadlock.

They will all show as "blocked by Task_sleep(undefined)" in ROV, and remain that way apparently indefinitely.

I'm using BIOS in flash because I find it really difficult to debug stuff with BIOS in ROM.

I'm using CCS 7.1.0.00016, ti-rtos 2.21.00.06, and TI compiler (although I've experienced essentially the same sort of problems with GNU toolchain as well)

I've tried clean rebuild of project and restarting CCS to no avail - sometimes this actually helps although I've no idea why it would.

Michael Moon over 7 years ago in reply to Michael Moon

I tried swapping Task_sleep for Semaphore_pend(&sem, timeout) but this also deadlocks until/unless the semaphore is posted. The pend never times out despite being passed a timeout value.

Michael Moon over 7 years ago in reply to Michael Moon

When this state occurs, it seems that ti_sysbios_knl_Clock_Module__state__V.ticks never changes. I'm not sure if it's supposed to, or if it's supposed to wake up via Hwi when the RTC hits nextScheduledTick, or if it works some other way.

Here's a dump from the Expression viewer on the Clock_Module symbol:

ti_sysbios_knl_Clock_Module__state__V	struct ti_sysbios_knl_Clock_Module_State__	{ticks=8923581,swiCount=0,timer=0x200044A0 {__fxns=0x00000000 {__base=0x20004D00 ...,__sysp=...,__label=...,swi=...	0x20004664	
	ticks	unsigned int	8923581	0x20004664	
	swiCount	unsigned int	0	0x20004668	
	timer	struct ti_sysbios_interfaces_ITimer___Object *	0x200044A0 {__fxns=0x00000000 {__base=0x20004D00 {base=0xBEBEBEBE {base=???},__sysp=...,__label=...	0x2000466C	
		*(timer)	struct ti_sysbios_interfaces_ITimer___Object	{__fxns=0x00000000 {__base=0x20004D00 {base=0xBEBEBEBE {base=???},__sysp=0x0000BD81 ...,getNumTimers=...,__label=...	0x200044A0	
			__fxns	struct ti_sysbios_interfaces_ITimer_Fxns__ *	0x00000000 {__base=0x20004D00 {base=0xBEBEBEBE {base=???},__sysp=0x0000BD81 {__create=...,getNumTimers=...	0x200044A0	
			__label	unsigned int	1	0x200044A4	
	swi	struct ti_sysbios_knl_Swi_Object *	0x20004634 {qElem={next=0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C ...,prev=...,prev=...,prev=...,prev=...,fxn=...	0x20004670	
		*(swi)	struct ti_sysbios_knl_Swi_Object	{qElem={next=0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C {next=...,prev=...,prev=...,prev=...,prev=...,fxn=...	0x20004634	
			qElem	struct ti_sysbios_knl_Queue_Elem	{next=0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C ...,prev=...,prev=...,prev=...,prev=...,prev=...	0x20004634	
			fxn	void (*)(unsigned int,unsigned int)	0x00005E49	0x2000463C	
			arg0	unsigned int	0	0x20004640	
			arg1	unsigned int	0	0x20004644	
			priority	unsigned int	5	0x20004648	
			mask	unsigned int	32	0x2000464C	
			posted	unsigned short	0	0x20004650	
			initTrigger	unsigned int	0	0x20004654	
			trigger	unsigned int	0	0x20004658	
			readyQ	struct ti_sysbios_knl_Queue_Object *	0x2000462C {elem={next=0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C ...,prev=...,prev=...,prev=...,prev=...}	0x2000465C	
				*(readyQ)	struct ti_sysbios_knl_Queue_Object	{elem={next=0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C {next=...,prev=...,prev=...,prev=...,prev=...}	0x2000462C	
					elem	struct ti_sysbios_knl_Queue_Elem	{next=0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C ...,prev=...,prev=...,prev=...,prev=...,prev=...	0x2000462C	
						next	struct ti_sysbios_knl_Queue_Elem *	0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C {next=...,prev=...,prev=...,prev=...,prev=...	0x2000462C	
						prev	struct ti_sysbios_knl_Queue_Elem *	0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C {next=0x2000462C {next=...,prev=...,prev=...,prev=...,prev=...	0x20004630	
			hookEnv	void * *	0x00000000 {0x20004D00}	0x20004660	
				*(hookEnv)	void *	0x20004D00	0x00000000	
					*(*(hookEnv))	unknown	cannot load from non-primitive location	
	numTickSkip	unsigned int	50	0x20004674	
	nextScheduledTick	unsigned int	8923834	0x20004678	
	maxSkippable	unsigned int	3240050766	0x2000467C	
	inWorkFunc	unsigned short	0	0x20004680	
	startDuringWorkFunc	unsigned short	0	0x20004682	
	ticking	unsigned short	1	0x20004684	
	Object_field_clockQ	struct ti_sysbios_knl_Queue_Object__	{elem={next=0x2000477C {next=0x200041C4 {next=0x200041E8 {next=0x2000420C {next=...,prev=...,prev=...,prev=...,prev=...}	0x20004688	
		elem	struct ti_sysbios_knl_Queue_Elem	{next=0x2000477C {next=0x200041C4 {next=0x200041E8 {next=0x2000420C {next=0x20004230 ...,prev=...,prev=...,prev=...,prev=...,prev=...	0x20004688	
			next	struct ti_sysbios_knl_Queue_Elem *	0x2000477C {next=0x200041C4 {next=0x200041E8 {next=0x2000420C {next=0x20004230 {next=...,prev=...,prev=...,prev=...,prev=...	0x20004688	
			prev	struct ti_sysbios_knl_Queue_Elem *	0x200020C8 {next=0x20004688 {next=0x2000477C {next=0x200041C4 {next=0x200041E8 {next=...,prev=...,prev=...,prev=...,prev=...	0x2000468C

I'm not sure exactly what it's supposed to look like, but it looks sensible enough to me.
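One sanity check the dump does allow: nextScheduledTick (8923834) is only slightly ahead of ticks (8923581). The sketch below does that subtraction in a wrap-safe way and converts it to time. The helper names are made up for illustration, and the 10 us tick period is an assumption inferred from 1500 ticks being used for a 15 ms sleep.

```c
#include <stdint.h>

/* Ticks remaining until `next`. Unsigned subtraction stays correct even
 * if the 32-bit tick counter wraps between `now` and `next`. */
static uint32_t ticks_until(uint32_t next, uint32_t now)
{
    return next - now;
}

/* Convert a tick count to microseconds; tick_period_us is an assumed
 * parameter (10 us in this thread's configuration, by inference). */
static uint64_t ticks_to_us(uint32_t ticks, uint32_t tick_period_us)
{
    return (uint64_t)ticks * tick_period_us;
}
```

With the values from the dump this gives 253 ticks, or about 2.5 ms under the assumed period, which suggests the next wake-up was scheduled sensibly and simply never serviced.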

How can I find out why TI-RTOS's clock suddenly breaks?

Usually I'd just set a hardware watchpoint somewhere, but TI-RTOS source seems very convoluted with huge amounts of symbol redirection and I haven't managed to find out any sensible place to put a watchpoint yet...

I tried putting a breakpoint in Clock_tick but CCS spat errors at me about Clock_doTickFunc(0); not being an executable line of code.

So, how do I debug stuff when (as always) everything looks like TI-RTOS itself is choking?

Michael Moon over 7 years ago in reply to Michael Moon

ROV/BIOS/Scan for errors says:
,ti.sysbios.knl.Clock,Basic,ti.sysbios.knl.Clock@200020c8,N/A,Caught exception in view init code: "./xdctools_3_32_00_06_core/packages/xdc/rov/StructureDecoder.xs", line 518: java.lang.Exception: Target memory read failed at address: 0x2000462c, length: 32 This read is at an INVALID address according to the application's section map. The application is likely either uninitialized or corrupt.
,ti.sysbios.knl.Clock,Module,N/A,N/A,Caught exception in view init code: "./xdctools_3_32_00_06_core/packages/xdc/rov/StructureDecoder.xs", line 518: java.lang.Exception: Target memory read failed at address: 0x2000462c, length: 32 This read is at an INVALID address according to the application's section map. The application is likely either uninitialized or corrupt.

This is straight after a fresh build from clean.

0x200020C8 is inside my task's stack. It holds the value 0x20004300, which is Object_field_clockQ inside ti_sysbios_knl_Clock_Module__state__V (from TI-RTOS), which just has next=0x200043F4 and prev=0x200020C8 (the address listed by BIOS).

These struct ti_sysbios_knl_Queue_Elem seem to be about 8 array elements in a circular linked list, with 0x200020C8 being the odd one out.

I assume Task_sleep has added something to the Clock's queue so it can be woken after the timeout.

Now, why Expression viewer and Memory Browser can see this memory, but ROV spits Java errors about it being inaccessible I have no idea.

Also I have no idea if this is relevant to the problem I'm having, I'm basically clutching at straws here because I'm finding it so difficult to navigate and debug TI-RTOS internals, which all of my numerous issues point to.

Whenever I pause my application, it's always stopped at address 0x10000486 which is in an unmapped part of the internal ROM - even though I've set my cfg to use TI-BIOS in flash for easier debugging.

Is it possible that whatever code is in this piece of ROM is messing up TI-RTOS internal state since the linker might not make room for its RAM usage if I select BIOS in flash?

Then again, I get the same sort of problems with BIOS in ROM, but they're far harder to debug because I get either no symbols at all, or messed up symbols everywhere else since rtos_rom.xem3 has stuff outside the 0x1001nnnn range.

Any ideas? Even suggestions for further debugging steps would be most welcome

ToddMullanix over 7 years ago in reply to Michael Moon

TI__Guru* 96960 points

Michael,

First, do out of the box examples work for you? For example UART Echo.

Do you have power enabled or disabled in your application? It's in PowerCC26XX_config.

Can you disable power management to see if it is related?

You can set a breakpoint in ti_sysbios_knl_Clock_doTick__I to see if the clock tick is happening.

Finally, can you let me know the size and peaks of all the task stacks and the Hwi stack? You can get these from ROV. You may have to enable stack initialization so that ROV can report the peaks. This is done by setting the following in the .cfg file:
var halHwi = xdc.useModule('ti.sysbios.hal.Hwi');
var Task = xdc.useModule('ti.sysbios.knl.Task');
Task.initStackFlag = true;
halHwi.initStackFlag = true;

Todd

Michael Moon over 7 years ago in reply to ToddMullanix

I set a breakpoint on ti_sysbios_knl_Clock_doTick__I while my app was deadlocked, it did not trigger.

When I restarted with the breakpoint set, after hammering away on F8 for a while, the breakpoint stopped triggering. My application continued to run on other interrupt sources (e.g. radio, SCS), since the breaking had upset its timing and it didn't want to sleep anymore.

Curiously, before the breakpoint stopped triggering, it would trigger while pending on radio commands. After it stopped, the radio commands still worked fine but presumably the pend wouldn't time out if they somehow got stuck.

After switching power policy to doWFI, I seem to have regular calls to doTick and I no longer get deadlocks in Task_sleep.

I disabled the breakpoint so it could free-run for a while, and it seems fine.

Surprisingly, after switching back to standbyPolicy, it also isn't deadlocking anymore!

All I've done is set some breakpoints, change power policy, disable the breakpoints and change power policy back, and suddenly everything works fine... what effect could that have, that multiple debugger+target powercycles, a clean build and CCS restart doesn't?

Since it seems to work now, there's not much point in me checking examples?

Checking task stacks is one of the first things I do - as soon as I worked out that ROV requires the Task_struct to be global rather than static or dynamically allocated that is. I don't bother to post here if a stack has overflowed, that would be daft ;)

Michael Moon over 7 years ago in reply to Michael Moon

Aaand just as suddenly it's deadlocking again.

Images: deadlocked, ROV task view

As can be seen, neither task has overflowed its stack and they're both deadlocked in Task_sleep. The sleeps are both less than 100 milliseconds so they certainly shouldn't stop for longer than the several minutes I waited!

The Clock state thing looks ok, with nextScheduledTick being a little ahead of ticks, but doTick never gets called when the deadlock occurs.

So, presumably something somewhere is breaking the timer somehow? How can I find what's happening to it? How/where does the clock choose which timer to use?

I tried to get a dump of GPT0's registers after it deadlocked, but Memory Browser apparently can't read it for some reason.

ROV says the only Timer is the RTC, which is apparently disabled. It's also disabled when timeouts are working fine (after initial startup) so whatever is being used as a timing source isn't appearing in ROV's Timer pane.

Michael Moon over 7 years ago in reply to Michael Moon

Forgot to mention, it's still deadlocking with power policy doWFI, so I don't think it's a power management issue.

All I've been able to find out so far is that whatever is supposed to call Clock_doTick() stops doing so after a minute or two of normal running.

Alan DeMars over 7 years ago in reply to Michael Moon

TI__Mastermind 30830 points

I suspect that interrupts have been disabled.

When it's locked up can you print out the contents of the CTRL_FAULT_BASE_PRI register under the Core registers view:

CTRL_FAULT_BASE_PRI 0x02000000 CM3 Special Registers [Core]

Above is how it should look with interrupts enabled.

CTRL_FAULT_BASE_PRI 0x02002000 CM3 Special Registers [Core]

Above is how it looks with an unbalanced Hwi_disable() call.
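To read those two values field by field, here is a small host-side decoder. It assumes the combined register follows the ARMv7-M debug layout (CONTROL in bits 31:24, FAULTMASK in 23:16, BASEPRI in 15:8, PRIMASK in 7:0); the thread itself does not spell the layout out, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of the combined core register, ASSUMING the ARMv7-M debug layout
 * described in the lead-in. */
typedef struct {
    uint8_t control;
    uint8_t faultmask;
    uint8_t basepri;
    uint8_t primask;
} CoreMasks;

static CoreMasks decode_ctrl_fault_base_pri(uint32_t raw)
{
    CoreMasks m;
    m.control   = (uint8_t)(raw >> 24);
    m.faultmask = (uint8_t)(raw >> 16);
    m.basepri   = (uint8_t)(raw >> 8);
    m.primask   = (uint8_t)raw;
    return m;
}

/* True if any of the three masks could be holding off at least some
 * interrupts (a non-zero BASEPRI only blocks lower-urgency ones). */
static bool interrupts_masked(CoreMasks m)
{
    return (m.primask & 1u) || (m.faultmask & 1u) || (m.basepri != 0u);
}
```

Under that layout, 0x02000000 decodes to all masks clear, while 0x02002000 decodes to BASEPRI = 0x20, i.e. interrupts at or below that priority level are held off.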

Alan

Michael Moon over 7 years ago in reply to Alan DeMars

Hi Alan,

Thanks for your response.

CTRL_FAULT_BASE_PRI is 0x02000000 when deadlocked in Task_sleep, so apparently that's not the problem.

I'm always very careful to balance calls like Hwi_(en|dis)able and similar.

After randomly changing various stuff, I've found that the deadlock seems to vanish if I stop using GateTask to prevent corruption of some inter-task linked lists - I'm worried that these primitives might become corrupted without appropriate locking however.

I've checked that GateTask_enter and GateTask_leave are appropriately matched in all places where they're used, and found no problems.

Usually I'd use function-scoped C++ objects (à la std::unique_lock<mutex>), but I'm just as capable of dropping a _leave before every return or using goto funcname_leave; ... funcname_leave: GateTask_leave(...); return;

Is GateTask known to be broken somehow? I'll try (ab)using a semaphore for inter-task resource sharing instead and see what happens.

Michael Moon over 7 years ago in reply to Michael Moon

Just tried using a semaphore instead of GateTask (pend to take lock, post to release) and I'm still getting deadlocks in Task_sleep.

I'd expect it to deadlock in Semaphore_pend if my code was wrong somehow.

Removing any attempt at preventing multi-thread access to my LL seems to allow things to run fine, ie Task_sleep no longer deadlocks.

Any suggestions for further debugging steps?

Michael Moon over 7 years ago in reply to Michael Moon

Here's something curious.. ROV/BIOS/Scan for errors says:
,ti.sysbios.knl.Clock,Module,N/A,N/A,Caught exception in view init code: "./xdctools_3_32_00_06_core/packages/xdc/rov/StructureDecoder.xs", line 518: java.lang.Exception: Target memory read failed at address: 0x20004998, length: 32 This read is at an INVALID address according to the application's section map. The application is likely either uninitialized or corrupt.

,ti.sysbios.knl.Clock,Basic,ti.sysbios.knl.Clock@20002120,N/A,Caught exception in view init code: "./xdctools_3_32_00_06_core/packages/xdc/rov/StructureDecoder.xs", line 518: java.lang.Exception: Target memory read failed at address: 0x20004998, length: 32 This read is at an INVALID address according to the application's section map. The application is likely either uninitialized or corrupt.

The first one seems to always be there - at least it's present when CCS breaks on main, and when I pause while things are running normally.

The second one only appears when my task deadlocks! (Is this somehow relevant?)

Now, 0x20002120 is near the bottom of one of my task's stacks, why does ROV think there's a Clock there? Nothing shows up in the symbol table except for my task's stack at that address.

Perhaps the clock is somehow trying to use memory that the linker has assigned to something else? How could that possibly occur?

Perhaps that's simply the SP rather than a clock object location? It seems to match the FP when Task_sleep is called, but why would it then appear in ROV like that?

0x20004998 is after xdc_runtime_Main_Module__root__V symbol according to Memory viewer, which has zero troubles reading that memory address, however 'Expressions' suggests that that symbol is only an integer rather than a struct large enough to reach that address. I did a memory dump of the full contents of ram (because CCS's memory search seems to be broken) and 0x20004998 appears nowhere within it - there's no pointer to that address anywhere - so why is ROV/Clock trying to read it? That address isn't inside any known symbol...

If I add symbols from rtos_rom.xem3 (ostensibly an image describing the on-chip ROM), the error in ROV/BIOS/Scan for errors says:
,ti.sysbios.knl.Semaphore,Basic,(0x20004960),pendElems,Error: Problem scanning pend Queue: JavaException: java.lang.Exception: Target memory read failed at address: 0xbebebebe, length: 8
This read is at an INVALID address according to the application's section map. The application is likely either uninitialized or corrupt.

0x20004960 is the address of ti_sysbios_family_arm_m3_Hwi_Module_State_0_excActive__A - why does ROV think there's a Semaphore here? Could that somehow explain the problems I'm seeing?

0xbebebebe is the canary value that TI-RTOS fills memory with for me, evidently there's a wild pointer being picked up here. Perhaps adding rom symbols is irrelevant since I'm (or am supposed to be) using BIOS-in-flash?

I even tried putting all my code into a single task to eliminate any possibility of task-switch-induced problems and I'm still getting deadlocks in Task_sleep.

They seem to randomly disappear for a few runs sometimes, but then return when I change a quite innocuous bit of code here or there...

Alan DeMars over 7 years ago in reply to Michael Moon

Hmm...

Your use of GateTask set off some alarm bells. If not used correctly, GateTask can result in data structures used internally by the Task scheduler becoming fatally corrupted. Specifically, calling any blocking functions (such as Semaphore_pend(), Event_pend(), Task_sleep(), Mailbox_post/pend()) AFTER calling GateTask_enter() will corrupt the Task scheduling data structures.
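The rule above can be illustrated with a small host-side model. This is not TI-RTOS code: SchedModel, model_gate_enter, model_gate_leave and model_may_block are hypothetical names, and the model assumes only what this post states, namely that the Task scheduler is disabled while a GateTask is held and that blocking in that state is illegal.

```c
#include <stdbool.h>

/* Hypothetical model of scheduler-lock nesting; NOT the TI-RTOS
 * implementation, just the rule described in the post. */
typedef struct {
    unsigned depth;   /* number of gate enters currently outstanding */
} SchedModel;

/* Entering the gate disables task scheduling (modelled as a nesting count). */
static void model_gate_enter(SchedModel *s)
{
    s->depth++;
}

/* Leaving the gate undoes one enter; an unbalanced leave is ignored here. */
static void model_gate_leave(SchedModel *s)
{
    if (s->depth > 0u) {
        s->depth--;
    }
}

/* A blocking call (sleep, pend with timeout, ...) is only legal when no
 * gate is held, i.e. when the scheduler is enabled. */
static bool model_may_block(const SchedModel *s)
{
    return s->depth == 0u;
}
```

In application terms, the safe shape is: enter the gate, touch the shared list, leave the gate, and only then call anything that can block.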

There are carefully placed Asserts in the BIOS code that will catch all of these disastrous scenarios at runtime. However, those Asserts are disabled when building your application against the BIOS in ROM in favor of significantly improved performance.

Try building your application again with these lines commented out in your .cfg file:

// var ROM = xdc.useModule('ti.sysbios.rom.ROM');

// ROM.romName = ROM.CC1350;

Also, add this line to your .cfg file to enable the Asserts:

BIOS.assertsEnabled = true;

If your application is violating any of the rules for calling blocking APIs when the Task scheduler is disabled, an Assert will be raised at runtime that will reveal the problem.

Note that this configuration will use more flash and run more slowly than when using the BIOS in ROM.

Alan

Michael Moon over 7 years ago in reply to Alan DeMars

I enabled asserts, but no asserts fired before or during the deadlock condition (unless they show up somewhere other than the console and allow the code to keep running)

I also enabled error-print, fault-print, and BIOS logging, and again nothing unusual in console before or during deadlock.

I have had stack and heap checking enabled the whole time, and even when I check manually in memory browser there appear to be no stack or heap related problems.

Alan DeMars over 7 years ago in reply to Michael Moon

Did you make sure you're not building against the ROM by deleting or commenting out all of the ROM code in your .cfg file?

Can you post the .map file associated with your .out file?

Michael Moon over 7 years ago in reply to Michael Moon

I tried splicing two tasks together - one task is simply a setup routine that initializes an external I2C chip then adds a callback hook to the other task's hook list so the chip can be communicated with at the appropriate time.

After it runs, it returns from the task function and gets terminated. I have of course carefully checked that all initialisation stuff happens in the appropriate order.

Instead, I removed the task creation stuff and called the init routine from the very head of my other task. Suddenly, no more deadlocks.

Then, I set everything back to normal, but added for (;;) Task_sleep(BIOS_WAIT_FOREVER - 1); in the bottom of the task to prevent it terminating. Again, no deadlocks.

Just to confirm that this seems to be triggering the deadlock, I removed the wait forever loop (allowing it to terminate) and ran again, and my other task deadlocked on Task_sleep within 2 minutes.

Is task termination also a known but undocumented cause of internal BIOS state corruption?

In my previous debugging I noticed that the address of Task_exit was placed as the topmost address in the Task stack so returning from task function should simply invoke Task_exit.

The documentation for Task_exit states "Task_exit is automatically called whenever a task returns from its top-level function."

When poking around in the ROV, this task was correctly marked as terminated both before and after my other task deadlocked.

As a final test sequence, I disabled BIOS asserts and logging and re-ran both with forever-wait and with task termination. Now, even with forever-wait, it deadlocked within ~2 minutes. So, Task_exit coupled with no BIOS asserts or BIOS logging?

I re-enabled BIOS logging and still experienced the deadlock.

I re-enabled BIOS asserts and suddenly the deadlock is gone again.

I disabled BIOS logging and deadlock is back.

So with BIOS asserts and logging enabled, I currently don't get the deadlock but I'm sure some other weirdness will crop up when I start trying to actually progress with the development of my project.

... (time passes) ...

Ok, so I rearranged some (ostensibly unrelated) stuff, and the deadlock vanished for a couple of days (despite me reverting all the code to when it started happening and proceeding from there) but this morning it came back, but with a slightly different symptom..

I have a Task_sleep(1500); (hardcoded tick count) call in a device driver in which my code is deadlocking, however when I go and examine the stack around that call during deadlock, it has the value 1074274700 (0x4008218C) which is the address of PRCM_PDCTL1VIMS. All other tasks are either terminated or blocked, and this symptom wasn't happening before.

Apparently, something somewhere inside TI-RTOS is smashing the stack during the top of the Task_sleep() function before it actually sleeps..

As per above, I switched BIOS-in-ROM for BIOS-in-flash again and it seems to run fine now, however whether this somehow actually fixes the problem or merely relocates it somewhere else is a mystery.

I switched from BIOS-in-flash back to BIOS-in-ROM to see if it was just some oddity in the build process, and it eventually deadlocked again. Unfortunately I couldn't examine the stack because the ROV view got stuck on "Loading Packages..."

I did a clean build and things started working again, and just now I've run into the same deadlock with 0x4008218C in Task_sleep's "timeout" variable, yet with a hardcoded value of 1500 in the calling function...

Again, going back to BIOS-in-flash seems to fix things... for now.

It seems to be behaving at the moment (with BIOS-in-flash, it ran overnight without problems), but I've posted my map file as per request - apparently TI forum won't allow me to upload it here, as "map" extension isn't on the whitelist.

For comparison, I also posted the map for BIOS-in-ROM which still seems to reliably deadlock in Task_sleep within a minute or two.

Michael Moon over 7 years ago in reply to Michael Moon

Update:

This problem is still happening, but now it's triggering after a longer time (between 30 minutes and several hours) so it's even more difficult to track down.

Is there really no-one with any ideas on how to debug this further? I'd happily set a watchpoint on TI-RTOS internal structures if I could work out which ones were relevant, but I'm not sure my employer wants me to spend my time reverse engineering TI-RTOS's scheduler and timing system interaction.

I've already tried using various hooks to find out what's happening, but they all seem fine right up until it stops. I've printed out every single pointer my program uses every time I use them, and they all seem fine too. I've removed all instances of dynamic allocation, I've pumped my stacks up until they're almost 3x the required size, I've dug around in TI-RTOS internals, set watchpoints, checked CPU flags.. I've even cut my project down to having only one task, all to no avail.

At this point it's starting to feel like it'd be simpler to rewrite the sections of TI-RTOS I need from scratch, and implement stubs for the few libraries I need..

Karl Wechsler over 7 years ago in reply to Michael Moon

TI__Mastermind 20805 points

This issue looks very similar to this:

e2e.ti.com/.../2152134

Can you check the post from April 3? It has a workaround that works in that setup (change the threshold from 4 to 6).

We are actively debugging that other thread and this seems very similar.

Regards,
Karl-

Michael Moon over 6 years ago in reply to Karl Wechsler

Beautiful, I'm testing that change right now, thanks!

Karl Wechsler over 6 years ago in reply to Michael Moon

Can you let us know how this experiment goes? If this doesn't work, can you send across a test project so that we can reproduce? You can make a private friend request if you don't want to share your project on this forum.

One more question ... do you know which compiler version you are using? We have not seen problems with the timer code for several years. The original implementation was complex but it has been well tested and reliable in the field for a long time.

Thanks,
-Karl-

Michael Moon over 6 years ago in reply to Karl Wechsler

I had one node run for the whole weekend which is a first!

I'll keep an eye on things and report back if I find any more deadlocks that appear to be inside Task_sleep() or Semaphore_pend(... timeout) or related functions, but for now I'm gonna call this tentatively solved since it deadlocked very reliably before.

Karl Wechsler over 6 years ago in reply to Michael Moon

Thanks for reporting back. This issue is being tracked in our bug database as SYSBIOS-383.

Michael Moon over 6 years ago in reply to Karl Wechsler

Hmm all my DNS servers say no such host for that URI, is that an internal bug tracker?

Michael Moon over 6 years ago in reply to Michael Moon

I just had two nodes run overnight, with the third apparently experiencing a lockup in the XDS probe itself and thus losing UART connectivity. Still very promising for calling this issue solved :)

Alan DeMars over 6 years ago in reply to Michael Moon

As a follow-up to close out this thread…

After a root cause analysis, a precise timing scenario was found where a compare margin value of “4” could be insufficient. But a compare margin value of “6” will avoid the problem.

The fix for this issue will be tracked with bug ID: SYSBIOS-383 and will be resolved in the upcoming 3.20 TI-RTOS release.

As a workaround for the time being, you should modify the COMPARE_MARGIN definition in /packages/ti/sysbios/family/arm/cc26xx/Timer.c as below:

#define COMPARE_MARGIN 6

And then rebuild your application.
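The thread does not show the logic in Timer.c, so the following is only a simplified host-side model of why a compare margin exists at all. It assumes that a compare value programmed too close to the current counter value can be missed by the hardware, so the software pushes such values out by a fixed margin. safe_compare is a hypothetical helper, not the function in Timer.c.

```c
#include <stdint.h>

/* Hypothetical helper, NOT the code in Timer.c: choose the value to program
 * into a compare register so that it is at least `margin` counts ahead of
 * the current counter value `now`. */
static uint32_t safe_compare(uint32_t now, uint32_t target, uint32_t margin)
{
    /* Signed distance from now to target; correct across counter wrap. */
    int32_t ahead = (int32_t)(target - now);

    if (ahead < (int32_t)margin) {
        /* Too close, or already in the past: push the match out. */
        return now + margin;
    }
    return target;
}
```

In this model, a target 5 counts ahead is programmed unchanged with a margin of 4 but pushed out to 6 counts ahead with a margin of 6, which is the kind of difference the workaround relies on.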

Michael Moon over 6 years ago in reply to Alan DeMars

Well technically the fix also requires a rebuild of TI-RTOS itself and disabling BIOS-in-ROM as well, not just rebuild of the application, but that's certainly the heart of it.

Alan DeMars over 6 years ago in reply to Michael Moon

The fix works with BIOS in ROM since the affected API is not in the ROM.

No need to rebuild TI-RTOS. When the config phase of the application build process is performed, the change to Timer.c will be incorporated in the generated custom sysbios library that the application links with.

Alan
