We have built a custom board with an AM3352 ES2.1 processor. We often experience soft lockup kernel panics that we are unable to explain.
We are running our own kernel build (3.12.8 with BeagleBone patches) and a modified device tree. We wrote a small test program that calls sys_clock_gettime(CLOCK_REALTIME) in an endless loop. If we run this program for a few minutes, the following kernel panic always occurs:
[ 412.053485] BUG: soft lockup - CPU#0 stuck for 22s! [CrashTool:344]
[ 412.060147] Modules linked in: snd_soc_omap snd_pcm_dmaengine ipv6 autofs4
[ 412.067459]
[ 412.069048] CPU: 0 PID: 344 Comm: CrashTool Not tainted 3.12.8 #1
[ 412.075496] task: df49a800 ti: df5b2000 task.ti: df5b2000
[ 412.081231] PC is at __getnstimeofday+0xb4/0xfc
[ 412.086021] LR is at __getnstimeofday+0x98/0xfc
[ 412.090818] pc : [<c0077988>] lr : [<c007796c>] psr: 80000013
[ 412.090818] sp : df5b3f50 ip : 386d4503 fp : df5b3f90
[ 412.102959] r10: a398bf46 r9 : 00e01d0c r8 : b8beec00
[ 412.108487] r7 : 00000000 r6 : fff2c836 r5 : ffffffff r4 : c4653600
[ 412.115384] r3 : d9e8e0db r2 : ebe728b8 r1 : 00000000 r0 : 3b9ac9ff
[ 412.122288] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 412.129836] Control: 10c5387d Table: 9f504019 DAC: 00000015
[ 412.135914] CPU: 0 PID: 344 Comm: CrashTool Not tainted 3.12.8 #1
[ 412.142413] [<c0013cf8>] (unwind_backtrace+0x0/0xe0) from [<c0010f9c>] (show_stack+0x10/0x14)
[ 412.151455] [<c0010f9c>] (show_stack+0x10/0x14) from [<c050f050>] (dump_stack+0x68/0x84)
[ 412.160033] [<c050f050>] (dump_stack+0x68/0x84) from [<c009c598>] (watchdog_timer_fn+0x104/0x164)
[ 412.169435] [<c009c598>] (watchdog_timer_fn+0x104/0x164) from [<c0054134>] (__run_hrtimer.isra.15+0x84/0x100)
[ 412.179933] [<c0054134>] (__run_hrtimer.isra.15+0x84/0x100) from [<c0054980>] (hrtimer_interrupt+0x124/0x298)
[ 412.190436] [<c0054980>] (hrtimer_interrupt+0x124/0x298) from [<c0023118>] (omap2_gp_timer_interrupt+0x20/0x30)
[ 412.201126] [<c0023118>] (omap2_gp_timer_interrupt+0x20/0x30) from [<c00714b0>] (handle_irq_event_percpu+0x2c/0x170)
[ 412.212268] [<c00714b0>] (handle_irq_event_percpu+0x2c/0x170) from [<c0071630>] (handle_irq_event+0x3c/0x5c)
[ 412.222671] [<c0071630>] (handle_irq_event+0x3c/0x5c) from [<c007416c>] (handle_level_irq+0xd8/0xf0)
[ 412.232336] [<c007416c>] (handle_level_irq+0xd8/0xf0) from [<c0070f1c>] (generic_handle_irq+0x20/0x30)
[ 412.242188] [<c0070f1c>] (generic_handle_irq+0x20/0x30) from [<c000e7bc>] (handle_IRQ+0x64/0x8c)
[ 412.251493] [<c000e7bc>] (handle_IRQ+0x64/0x8c) from [<c00085e0>] (omap3_intc_handle_irq+0x60/0x74)
[ 412.261075] [<c00085e0>] (omap3_intc_handle_irq+0x60/0x74) from [<c0011a40>] (__irq_svc+0x40/0x50)
[ 412.270552] Exception stack(0xdf5b3f08 to 0xdf5b3f50)
[ 412.275899] 3f00: 3b9ac9ff 00000000 ebe728b8 d9e8e0db c4653600 ffffffff
[ 412.284552] 3f20: fff2c836 00000000 b8beec00 00e01d0c a398bf46 df5b3f90 386d4503 df5b3f50
[ 412.293198] 3f40: c007796c c0077988 80000013 ffffffff
[ 412.298550] [<c0011a40>] (__irq_svc+0x40/0x50) from [<c0077988>] (__getnstimeofday+0xb4/0xfc)
[ 412.307578] [<c0077988>] (__getnstimeofday+0xb4/0xfc) from [<c0077f70>] (getnstimeofday+0x8/0x24)
[ 412.316974] [<c0077f70>] (getnstimeofday+0x8/0x24) from [<c004fd54>] (posix_clock_realtime_get+0xc/0x14)
[ 412.327010] [<c004fd54>] (posix_clock_realtime_get+0xc/0x14) from [<c0050ee0>] (SyS_clock_gettime+0x28/0x84)
[ 412.337405] [<c0050ee0>] (SyS_clock_gettime+0x28/0x84) from [<c000dee0>] (ret_fast_syscall+0x0/0x30)
[ 412.347064] Kernel panic - not syncing: softlockup: hung tasks
[ 412.353232] CPU: 0 PID: 344 Comm: CrashTool Not tainted 3.12.8 #1
[ 412.359690] [<c0013cf8>] (unwind_backtrace+0x0/0xe0) from [<c0010f9c>] (show_stack+0x10/0x14)
[ 412.368715] [<c0010f9c>] (show_stack+0x10/0x14) from [<c050f050>] (dump_stack+0x68/0x84)
[ 412.377274] [<c050f050>] (dump_stack+0x68/0x84) from [<c050cc58>] (panic+0x84/0x1e0)
[ 412.385472] [<c050cc58>] (panic+0x84/0x1e0) from [<c009c5b8>] (watchdog_timer_fn+0x124/0x164)
[ 412.394499] [<c009c5b8>] (watchdog_timer_fn+0x124/0x164) from [<c0054134>] (__run_hrtimer.isra.15+0x84/0x100)
[ 412.404994] [<c0054134>] (__run_hrtimer.isra.15+0x84/0x100) from [<c0054980>] (hrtimer_interrupt+0x124/0x298)
[ 412.415488] [<c0054980>] (hrtimer_interrupt+0x124/0x298) from [<c0023118>] (omap2_gp_timer_interrupt+0x20/0x30)
[ 412.426170] [<c0023118>] (omap2_gp_timer_interrupt+0x20/0x30) from [<c00714b0>] (handle_irq_event_percpu+0x2c/0x170)
[ 412.437306] [<c00714b0>] (handle_irq_event_percpu+0x2c/0x170) from [<c0071630>] (handle_irq_event+0x3c/0x5c)
[ 412.447707] [<c0071630>] (handle_irq_event+0x3c/0x5c) from [<c007416c>] (handle_level_irq+0xd8/0xf0)
[ 412.457374] [<c007416c>] (handle_level_irq+0xd8/0xf0) from [<c0070f1c>] (generic_handle_irq+0x20/0x30)
[ 412.467221] [<c0070f1c>] (generic_handle_irq+0x20/0x30) from [<c000e7bc>] (handle_IRQ+0x64/0x8c)
[ 412.476521] [<c000e7bc>] (handle_IRQ+0x64/0x8c) from [<c00085e0>] (omap3_intc_handle_irq+0x60/0x74)
[ 412.486095] [<c00085e0>] (omap3_intc_handle_irq+0x60/0x74) from [<c0011a40>] (__irq_svc+0x40/0x50)
[ 412.495565] Exception stack(0xdf5b3f08 to 0xdf5b3f50)
[ 412.500915] 3f00: 3b9ac9ff 00000000 ebe728b8 d9e8e0db c4653600 ffffffff
[ 412.509572] 3f20: fff2c836 00000000 b8beec00 00e01d0c a398bf46 df5b3f90 386d4503 df5b3f50
[ 412.518221] 3f40: c007796c c0077988 80000013 ffffffff
[ 412.523577] [<c0011a40>] (__irq_svc+0x40/0x50) from [<c0077988>] (__getnstimeofday+0xb4/0xfc)
[ 412.532601] [<c0077988>] (__getnstimeofday+0xb4/0xfc) from [<c0077f70>] (getnstimeofday+0x8/0x24)
[ 412.542000] [<c0077f70>] (getnstimeofday+0x8/0x24) from [<c004fd54>] (posix_clock_realtime_get+0xc/0x14)
[ 412.552039] [<c004fd54>] (posix_clock_realtime_get+0xc/0x14) from [<c0050ee0>] (SyS_clock_gettime+0x28/0x84)
[ 412.562442] [<c0050ee0>] (SyS_clock_gettime+0x28/0x84) from [<c000dee0>] (ret_fast_syscall+0x0/0x30)
The weird thing is that we can also boot the same kernel on the BeagleBone Black board (AM3359 ES2.0) with exactly the same configuration and device tree, and the problem never occurs there. We can run the endless sys_clock_gettime program there for hours without problems.
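For anyone who wants to try reproducing this, a test loop of this kind might look roughly like the sketch below. It is not the actual CrashTool source. It assumes the syscall is invoked directly through syscall(SYS_clock_gettime, ...), and both the helper name spin_clock_gettime and its iteration limit are hypothetical, added only so the sketch can terminate.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the original CrashTool code): issue the
 * clock_gettime syscall for CLOCK_REALTIME in a tight loop.
 *
 * iterations == 0 means loop forever, as in the test described above.
 * Returns 0 once the requested number of calls has succeeded, or -1 as
 * soon as one call fails.
 */
int spin_clock_gettime(unsigned long iterations)
{
    struct timespec ts;
    unsigned long done = 0;

    for (;;) {
        /* Assumption: go through syscall() so the kernel entry point is hit directly. */
        if (syscall(SYS_clock_gettime, CLOCK_REALTIME, &ts) != 0)
            return -1;

        if (iterations != 0 && ++done >= iterations)
            return 0;
    }
}
```

Calling spin_clock_gettime(0) from main() gives the endless variant. A finite count is only useful for checking that the loop itself works.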
I have already tried updating to a 3.13.6 kernel, but that made no difference.
Has anyone ever experienced this behavior, or can anyone point us in the right direction?
Thanks in advance.
