Soft lockup when sys_clock_gettime is called too often

Gilles Haverbeke

Other Parts Discussed in Thread: AM3352, AM3359

We have built a custom board with an AM3352 ES2.1 processor. We often experience soft lockup kernel panics that we are unable to explain.

We are running a self-built kernel (3.12.8 with BeagleBone patches) and a modified device tree. We have written a small test program that calls sys_clock_gettime(CLOCK_REALTIME) in an endless loop. If we run this program for a few minutes, the following kernel panic always occurs:

[ 412.053485] BUG: soft lockup - CPU#0 stuck for 22s! [CrashTool:344]
[ 412.060147] Modules linked in: snd_soc_omap snd_pcm_dmaengine ipv6 autofs4
[ 412.067459]
[ 412.069048] CPU: 0 PID: 344 Comm: CrashTool Not tainted 3.12.8 #1
[ 412.075496] task: df49a800 ti: df5b2000 task.ti: df5b2000
[ 412.081231] PC is at __getnstimeofday+0xb4/0xfc
[ 412.086021] LR is at __getnstimeofday+0x98/0xfc
[ 412.090818] pc : [<c0077988>] lr : [<c007796c>] psr: 80000013
[ 412.090818] sp : df5b3f50 ip : 386d4503 fp : df5b3f90
[ 412.102959] r10: a398bf46 r9 : 00e01d0c r8 : b8beec00
[ 412.108487] r7 : 00000000 r6 : fff2c836 r5 : ffffffff r4 : c4653600
[ 412.115384] r3 : d9e8e0db r2 : ebe728b8 r1 : 00000000 r0 : 3b9ac9ff
[ 412.122288] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 412.129836] Control: 10c5387d Table: 9f504019 DAC: 00000015
[ 412.135914] CPU: 0 PID: 344 Comm: CrashTool Not tainted 3.12.8 #1
[ 412.142413] [<c0013cf8>] (unwind_backtrace+0x0/0xe0) from [<c0010f9c>] (show_stack+0x10/0x14)
[ 412.151455] [<c0010f9c>] (show_stack+0x10/0x14) from [<c050f050>] (dump_stack+0x68/0x84)
[ 412.160033] [<c050f050>] (dump_stack+0x68/0x84) from [<c009c598>] (watchdog_timer_fn+0x104/0x164)
[ 412.169435] [<c009c598>] (watchdog_timer_fn+0x104/0x164) from [<c0054134>] (__run_hrtimer.isra.15+0x84/0x100)
[ 412.179933] [<c0054134>] (__run_hrtimer.isra.15+0x84/0x100) from [<c0054980>] (hrtimer_interrupt+0x124/0x298)
[ 412.190436] [<c0054980>] (hrtimer_interrupt+0x124/0x298) from [<c0023118>] (omap2_gp_timer_interrupt+0x20/0x30)
[ 412.201126] [<c0023118>] (omap2_gp_timer_interrupt+0x20/0x30) from [<c00714b0>] (handle_irq_event_percpu+0x2c/0x170)
[ 412.212268] [<c00714b0>] (handle_irq_event_percpu+0x2c/0x170) from [<c0071630>] (handle_irq_event+0x3c/0x5c)
[ 412.222671] [<c0071630>] (handle_irq_event+0x3c/0x5c) from [<c007416c>] (handle_level_irq+0xd8/0xf0)
[ 412.232336] [<c007416c>] (handle_level_irq+0xd8/0xf0) from [<c0070f1c>] (generic_handle_irq+0x20/0x30)
[ 412.242188] [<c0070f1c>] (generic_handle_irq+0x20/0x30) from [<c000e7bc>] (handle_IRQ+0x64/0x8c)
[ 412.251493] [<c000e7bc>] (handle_IRQ+0x64/0x8c) from [<c00085e0>] (omap3_intc_handle_irq+0x60/0x74)
[ 412.261075] [<c00085e0>] (omap3_intc_handle_irq+0x60/0x74) from [<c0011a40>] (__irq_svc+0x40/0x50)
[ 412.270552] Exception stack(0xdf5b3f08 to 0xdf5b3f50)
[ 412.275899] 3f00: 3b9ac9ff 00000000 ebe728b8 d9e8e0db c4653600 ffffffff
[ 412.284552] 3f20: fff2c836 00000000 b8beec00 00e01d0c a398bf46 df5b3f90 386d4503 df5b3f50
[ 412.293198] 3f40: c007796c c0077988 80000013 ffffffff
[ 412.298550] [<c0011a40>] (__irq_svc+0x40/0x50) from [<c0077988>] (__getnstimeofday+0xb4/0xfc)
[ 412.307578] [<c0077988>] (__getnstimeofday+0xb4/0xfc) from [<c0077f70>] (getnstimeofday+0x8/0x24)
[ 412.316974] [<c0077f70>] (getnstimeofday+0x8/0x24) from [<c004fd54>] (posix_clock_realtime_get+0xc/0x14)
[ 412.327010] [<c004fd54>] (posix_clock_realtime_get+0xc/0x14) from [<c0050ee0>] (SyS_clock_gettime+0x28/0x84)
[ 412.337405] [<c0050ee0>] (SyS_clock_gettime+0x28/0x84) from [<c000dee0>] (ret_fast_syscall+0x0/0x30)
[ 412.347064] Kernel panic - not syncing: softlockup: hung tasks
[ 412.353232] CPU: 0 PID: 344 Comm: CrashTool Not tainted 3.12.8 #1
[ 412.359690] [<c0013cf8>] (unwind_backtrace+0x0/0xe0) from [<c0010f9c>] (show_stack+0x10/0x14)
[ 412.368715] [<c0010f9c>] (show_stack+0x10/0x14) from [<c050f050>] (dump_stack+0x68/0x84)
[ 412.377274] [<c050f050>] (dump_stack+0x68/0x84) from [<c050cc58>] (panic+0x84/0x1e0)
[ 412.385472] [<c050cc58>] (panic+0x84/0x1e0) from [<c009c5b8>] (watchdog_timer_fn+0x124/0x164)
[ 412.394499] [<c009c5b8>] (watchdog_timer_fn+0x124/0x164) from [<c0054134>] (__run_hrtimer.isra.15+0x84/0x100)
[ 412.404994] [<c0054134>] (__run_hrtimer.isra.15+0x84/0x100) from [<c0054980>] (hrtimer_interrupt+0x124/0x298)
[ 412.415488] [<c0054980>] (hrtimer_interrupt+0x124/0x298) from [<c0023118>] (omap2_gp_timer_interrupt+0x20/0x30)
[ 412.426170] [<c0023118>] (omap2_gp_timer_interrupt+0x20/0x30) from [<c00714b0>] (handle_irq_event_percpu+0x2c/0x170)
[ 412.437306] [<c00714b0>] (handle_irq_event_percpu+0x2c/0x170) from [<c0071630>] (handle_irq_event+0x3c/0x5c)
[ 412.447707] [<c0071630>] (handle_irq_event+0x3c/0x5c) from [<c007416c>] (handle_level_irq+0xd8/0xf0)
[ 412.457374] [<c007416c>] (handle_level_irq+0xd8/0xf0) from [<c0070f1c>] (generic_handle_irq+0x20/0x30)
[ 412.467221] [<c0070f1c>] (generic_handle_irq+0x20/0x30) from [<c000e7bc>] (handle_IRQ+0x64/0x8c)
[ 412.476521] [<c000e7bc>] (handle_IRQ+0x64/0x8c) from [<c00085e0>] (omap3_intc_handle_irq+0x60/0x74)
[ 412.486095] [<c00085e0>] (omap3_intc_handle_irq+0x60/0x74) from [<c0011a40>] (__irq_svc+0x40/0x50)
[ 412.495565] Exception stack(0xdf5b3f08 to 0xdf5b3f50)
[ 412.500915] 3f00: 3b9ac9ff 00000000 ebe728b8 d9e8e0db c4653600 ffffffff
[ 412.509572] 3f20: fff2c836 00000000 b8beec00 00e01d0c a398bf46 df5b3f90 386d4503 df5b3f50
[ 412.518221] 3f40: c007796c c0077988 80000013 ffffffff
[ 412.523577] [<c0011a40>] (__irq_svc+0x40/0x50) from [<c0077988>] (__getnstimeofday+0xb4/0xfc)
[ 412.532601] [<c0077988>] (__getnstimeofday+0xb4/0xfc) from [<c0077f70>] (getnstimeofday+0x8/0x24)
[ 412.542000] [<c0077f70>] (getnstimeofday+0x8/0x24) from [<c004fd54>] (posix_clock_realtime_get+0xc/0x14)
[ 412.552039] [<c004fd54>] (posix_clock_realtime_get+0xc/0x14) from [<c0050ee0>] (SyS_clock_gettime+0x28/0x84)
[ 412.562442] [<c0050ee0>] (SyS_clock_gettime+0x28/0x84) from [<c000dee0>] (ret_fast_syscall+0x0/0x30)

The weird thing is that we can also boot our kernel on the BeagleBone Black board (AM3359 ES2.0) with exactly the same configuration and device tree, and the problem never occurs there. We can run the endless sys_clock_gettime program there for hours without problems.
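
The source of the test program (CrashTool in the trace) was not posted. A minimal sketch of such a loop, assuming it goes through the libc clock_gettime() wrapper; the function name hammer_clock_gettime and its parameters are hypothetical and chosen only for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper, not the original CrashTool source.
 * Calls clock_gettime(CLOCK_REALTIME) `iterations` times in a tight loop;
 * an iteration count of 0 means "loop forever".
 * The last value read is stored in *last.
 * Returns 0 on success, -1 as soon as one call fails.
 */
int hammer_clock_gettime(unsigned long iterations, struct timespec *last)
{
    unsigned long i;

    assert(last != NULL);

    for (i = 0; iterations == 0 || i < iterations; i++) {
        if (clock_gettime(CLOCK_REALTIME, last) != 0)
            return -1;
    }
    return 0;
}
```

Calling hammer_clock_gettime(0, &ts) from main() gives the endless variant described above; a non-zero count makes the run bounded, which is convenient when comparing boards.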

I have already tried updating to a 3.13.6 kernel, but that makes no difference.

Has anyone ever experienced this behavior, or is anyone able to point us in the right direction?

Thanks in advance.

over 11 years ago

0 Miroslav Kiradzhiyski XID over 11 years ago

TI__Mastermind 25235 points

Hi,

What are the differences between your board and the BBB?

Are you using the same root file system on the two boards?

Best regards,
Miroslav

0 Mark Vermeulen1 over 11 years ago in reply to Miroslav Kiradzhiyski XID

Prodigy 20 points

There are some small differences. The CPU is different: we use the AM3352BZCZ60, versus the AM3359AZCZ100 on the BeagleBone Black.

Another difference that comes to mind is that we use a battery to power the on-chip RTC when normal power is not available. (See attachment)

We use the same microSD card (with the same kernel and root file system) to boot the two boards. We also tried other boards, with the same reproducible result.

Which other hardware component could be related to the hanging of the clock_gettime() call?

Best regards, Mark Vermeulen

0 Gilles Haverbeke over 11 years ago in reply to Mark Vermeulen1

Prodigy 30 points

As you can see my colleague Mark replied with the answers to your questions.

We have been further investigating this problem but we are not yet able to trace the root cause. The delay between starting the "test crash program" and the real crash seems to vary (so far we've seen values between 25 seconds and 250 seconds).

Has anyone got tips on how we can further trace this problem?

0 Gilles Haverbeke over 11 years ago in reply to Gilles Haverbeke

Prodigy 30 points

After more research we finally found the problem.

Because of our custom board architecture, part of the U-Boot initialization was not executed. The missing line was:

do_setup_dpll(&dpll_core_regs, &dpll_core_opp100);

After that, the issue disappeared.
