Linux/AM3352: Soft lockup issues when inserting/removing SDCard

Part Number: AM3352

Tool/software: Linux

Hi,

We have hit several soft lockups while testing repeated SD card insertion and removal. The lockups occur at random intervals, and the stack trace differs each time, which makes the cause difficult to track down. Note that the issue does not depend on any I/O: it can be reproduced simply by inserting and removing the card.

Our current kernel version is 3.2.61.

Here are two of the kernel oops messages we have collected:

Oops 1:

[ 2424.068061] BUG: soft lockup - CPU#0 stuck for 22s! [swapper:0]
[ 2424.074308] Modules linked in: unifi_sdio(O) mmc_block omap_hsmmc mmc_core smsc95xx usbnet ftdi_sio usbserial g_zero ti81xx nop_usb_xceiv musb_hdrc usb_storage usbcore
[ 2424.090147]
[ 2424.091713] Pid: 0, comm: swapper
[ 2424.096396] CPU: 0 Tainted: G O (3.2.61 #1)
[ 2424.102923] PC is at __do_softirq+0x7c/0x148
[ 2424.107421] LR is at __do_softirq+0x50/0x148
[ 2424.111919] pc : [<c00375a4>] lr : [<c0037578>] psr: 20060113
[ 2424.111926] sp : c0303f08 ip : 00000040 fp : 00000000
[ 2424.124010] r10: 00000000 r9 : 00000000 r8 : c031e058
[ 2424.129511] r7 : c0302000 r6 : 00000000 r5 : c033df00 r4 : 00000002
[ 2424.136383] r3 : c033de80 r2 : c0302000 r1 : 00010001 r0 : c033de80
[ 2424.143258] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 2424.150955] Control: 10c5387d Table: 81a8c019 DAC: 00000015
[ 2424.157050] [<c0012670>] (unwind_backtrace+0x0/0xec) from [<c0066090>] (watchdog_timer_fn+0x120/0x148)
[ 2424.166881] [<c0066090>] (watchdog_timer_fn+0x120/0x148) from [<c004fae0>] (__run_hrtimer.isra.37+0x58/0x10c)
[ 2424.177334] [<c004fae0>] (__run_hrtimer.isra.37+0x58/0x10c) from [<c00502fc>] (hrtimer_interrupt+0x104/0x2fc)
[ 2424.187796] [<c00502fc>] (hrtimer_interrupt+0x104/0x2fc) from [<c0019d54>] (omap2_gp_timer_interrupt+0x2c/0x34)
[ 2424.198437] [<c0019d54>] (omap2_gp_timer_interrupt+0x2c/0x34) from [<c0066ae0>] (handle_irq_event_percpu+0xb0/0x178)
[ 2424.209531] [<c0066ae0>] (handle_irq_event_percpu+0xb0/0x178) from [<c0066bf8>] (handle_irq_event+0x50/0x78)
[ 2424.219893] [<c0066bf8>] (handle_irq_event+0x50/0x78) from [<c0068e0c>] (handle_level_irq+0x8c/0xfc)
[ 2424.229522] [<c0068e0c>] (handle_level_irq+0x8c/0xfc) from [<c0066280>] (generic_handle_irq+0x2c/0x3c)
[ 2424.239340] [<c0066280>] (generic_handle_irq+0x2c/0x3c) from [<c000e7f4>] (handle_IRQ+0x38/0x84)
[ 2424.248602] [<c000e7f4>] (handle_IRQ+0x38/0x84) from [<c0008608>] (omap3_intc_handle_irq+0x7c/0x80)
[ 2424.258136] [<c0008608>] (omap3_intc_handle_irq+0x7c/0x80) from [<c000db80>] (__irq_svc+0x40/0x70)
[ 2424.267571] Exception stack(0xc0303ec0 to 0xc0303f08)
[ 2424.272896] 3ec0: c033de80 00010001 c0302000 c033de80 00000002 c033df00 00000000 c0302000
[ 2424.281512] 3ee0: c031e058 00000000 00000000 00000000 00000040 c0303f08 c0037578 c00375a4
[ 2424.290123] 3f00: 20060113 ffffffff
[ 2424.293803] [<c000db80>] (__irq_svc+0x40/0x70) from [<c00375a4>] (__do_softirq+0x7c/0x148)
[ 2424.302516] [<c00375a4>] (__do_softirq+0x7c/0x148) from [<c0037ad4>] (irq_exit+0x98/0x9c)
[ 2424.311135] [<c0037ad4>] (irq_exit+0x98/0x9c) from [<c000e7f8>] (handle_IRQ+0x3c/0x84)
[ 2424.319480] [<c000e7f8>] (handle_IRQ+0x3c/0x84) from [<c0008608>] (omap3_intc_handle_irq+0x7c/0x80)
[ 2424.329013] [<c0008608>] (omap3_intc_handle_irq+0x7c/0x80) from [<c000db80>] (__irq_svc+0x40/0x70)
[ 2424.338445] Exception stack(0xc0303f70 to 0xc0303fb8)
[ 2424.343763] 3f60: 0007789d 00000000 0007789d 00000000
[ 2424.352379] 3f80: c0302000 c032afc4 c0306e68 c03cd440 c02f9fa8 413fc082 00000000 00000000
[ 2424.360993] 3fa0: a6aaaa47 c0303fb8 c000e918 c000e91c 60060013 ffffffff
[ 2424.367967] [<c000db80>] (__irq_svc+0x40/0x70) from [<c000e91c>] (default_idle+0x24/0x28)
[ 2424.376589] [<c000e91c>] (default_idle+0x24/0x28) from [<c000ea88>] (cpu_idle+0x7c/0xa4)
[ 2424.385122] [<c000ea88>] (cpu_idle+0x7c/0xa4) from [<c02e2708>] (start_kernel+0x260/0x26c)

Oops 2:

[ 528.093248] BUG: soft lockup - CPU#0 stuck for 22s! [sh:2010]
[ 528.099317] Modules linked in: unifi_sdio(O) mmc_block omap_hsmmc mmc_core smsc95xx usbnet ftdi_sio usbserial g_zero ti81xx nop_usb_xceiv musb_hdrc usb_storage usbcore
[ 528.115167]
[ 528.116735] Pid: 2010, comm: sh
[ 528.121694] CPU: 0 Tainted: G O (3.2.61 #1)
[ 528.128228] PC is at __do_softirq+0x84/0x250
[ 528.132728] LR is at __do_softirq+0x6c/0x250
[ 528.137227] pc : [<c0040040>] lr : [<c0040028>] psr: 20060113
[ 528.137236] sp : c3423c58 ip : c3423c58 fp : c3423ca4
[ 528.149322] r10: 00000000 r9 : 00100100 r8 : 00000000
[ 528.154823] r7 : fa200000 r6 : c03f6240 r5 : 00000002 r4 : 00000d20
[ 528.161699] r3 : c03f61c0 r2 : c03d0d20 r1 : c3016180 r0 : c03f61c0
[ 528.168574] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 528.176090] Control: 10c5387d Table: 81b3c019 DAC: 00000015
[ 528.182193] [<c0013c28>] (unwind_backtrace+0x0/0xf8) from [<c027cba8>] (dump_stack+0x20/0x24)
[ 528.191199] [<c027cba8>] (dump_stack+0x20/0x24) from [<c000f6ac>] (show_regs+0x54/0x58)
[ 528.199651] [<c000f6ac>] (show_regs+0x54/0x58) from [<c0077d9c>] (watchdog_timer_fn+0x130/0x158)
[ 528.208925] [<c0077d9c>] (watchdog_timer_fn+0x130/0x158) from [<c005e3d0>] (__run_hrtimer+0x94/0x260)
[ 528.218652] [<c005e3d0>] (__run_hrtimer+0x94/0x260) from [<c005ef7c>] (hrtimer_interrupt+0x120/0x31c)
[ 528.228387] [<c005ef7c>] (hrtimer_interrupt+0x120/0x31c) from [<c001c054>] (omap2_gp_timer_interrupt+0x3c/0x44)
[ 528.239032] [<c001c054>] (omap2_gp_timer_interrupt+0x3c/0x44) from [<c00788ec>] (handle_irq_event_percpu+0x64/0x2bc)
[ 528.250130] [<c00788ec>] (handle_irq_event_percpu+0x64/0x2bc) from [<c0078ba4>] (handle_irq_event+0x60/0x88)
[ 528.260495] [<c0078ba4>] (handle_irq_event+0x60/0x88) from [<c007b184>] (handle_level_irq+0x9c/0x10c)
[ 528.270219] [<c007b184>] (handle_level_irq+0x9c/0x10c) from [<c0077fe4>] (generic_handle_irq+0x3c/0x4c)
[ 528.280125] [<c0077fe4>] (generic_handle_irq+0x3c/0x4c) from [<c000f014>] (handle_IRQ+0x48/0x94)
[ 528.289392] [<c000f014>] (handle_IRQ+0x48/0x94) from [<c0008670>] (omap3_intc_handle_irq+0x8c/0x90)
[ 528.298932] [<c0008670>] (omap3_intc_handle_irq+0x8c/0x90) from [<c000e2c0>] (__irq_svc+0x40/0x70)
[ 528.308372] Exception stack(0xc3423c10 to 0xc3423c58)
[ 528.313697] 3c00: c03f61c0 c3016180 c03d0d20 c03f61c0
[ 528.322318] 3c20: 00000d20 00000002 c03f6240 fa200000 00000000 00100100 00000000 c3423ca4
[ 528.330938] 3c40: c3423c58 c3423c58 c0040028 c0040040 20060113 ffffffff
[ 528.337915] [<c000e2c0>] (__irq_svc+0x40/0x70) from [<c0040040>] (__do_softirq+0x84/0x250)
[ 528.346635] [<c0040040>] (__do_softirq+0x84/0x250) from [<c00406cc>] (irq_exit+0xa8/0xac)
[ 528.355261] [<c00406cc>] (irq_exit+0xa8/0xac) from [<c000f018>] (handle_IRQ+0x4c/0x94)
[ 528.363611] [<c000f018>] (handle_IRQ+0x4c/0x94) from [<c0008670>] (omap3_intc_handle_irq+0x8c/0x90)
[ 528.373148] [<c0008670>] (omap3_intc_handle_irq+0x8c/0x90) from [<c000e2c0>] (__irq_svc+0x40/0x70)
[ 528.382586] Exception stack(0xc3423d00 to 0xc3423d48)
[ 528.387912] 3d00: c32236c4 402ca000 00000000 c3223f94 c1a599f0 c3223f80 c1a59b28 c32236c0
[ 528.396531] 3d20: 00200200 00100100 00000000 c3423d84 c3423d48 c3423d48 c3223f94 c00be488
[ 528.405144] 3d40: 60060013 ffffffff
[ 528.408832] [<c000e2c0>] (__irq_svc+0x40/0x70) from [<c00be488>] (unlink_anon_vmas+0x88/0x1ec)
[ 528.417930] [<c00be488>] (unlink_anon_vmas+0x88/0x1ec) from [<c00b3e74>] (free_pgtables+0x88/0xdc)
[ 528.427379] [<c00b3e74>] (free_pgtables+0x88/0xdc) from [<c00bb454>] (exit_mmap+0x178/0x2c0)
[ 528.436279] [<c00bb454>] (exit_mmap+0x178/0x2c0) from [<c0036cc8>] (mmput+0x44/0xf8)
[ 528.444458] [<c0036cc8>] (mmput+0x44/0xf8) from [<c00d0598>] (flush_old_exec+0x2bc/0x5c0)
[ 528.453090] [<c00d0598>] (flush_old_exec+0x2bc/0x5c0) from [<c0114990>] (load_elf_binary+0x308/0x12fc)
[ 528.462906] [<c0114990>] (load_elf_binary+0x308/0x12fc) from [<c00cf88c>] (search_binary_handler+0x15c/0x3d8)
[ 528.473365] [<c00cf88c>] (search_binary_handler+0x15c/0x3d8) from [<c00d12fc>] (do_execve+0x258/0x2c8)
[ 528.483185] [<c00d12fc>] (do_execve+0x258/0x2c8) from [<c00117d4>] (sys_execve+0x44/0x64)
[ 528.491810] [<c00117d4>] (sys_execve+0x44/0x64) from [<c000e6c0>] (ret_fast_syscall+0x0/0x30)
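
To compare traces like the two above side by side, something like the following Python sketch could be used. It is not part of our test setup: `summarize_oops` and `interrupted_frames` are hypothetical helper names, and the regular expressions assume the ARM backtrace line format shown in these logs. It extracts the function named on the "PC is at" line plus the call chain, and then lists the functions that were running each time an interrupt was taken (the frame right after each `__irq_svc`).

```python
import re

# "PC is at __do_softirq+0x7c/0x148" -> "__do_softirq"
PC_RE = re.compile(r"PC is at ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
# "[<c0012670>] (unwind_backtrace+0x0/0xec)" -> "unwind_backtrace"
FUNC_RE = re.compile(r"\[<[0-9a-f]+>\] \(([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\)")


def summarize_oops(log_text):
    """Return (pc_function, call_chain) for one soft lockup report.

    The chain is ordered innermost frame first, as printed by the kernel.
    Assumes every backtrace line has the form "<callee> from <caller>".
    """
    pc = None
    chain = []
    last_caller = None
    for line in log_text.splitlines():
        pc_match = PC_RE.search(line)
        if pc_match:
            pc = pc_match.group(1)
            continue
        funcs = FUNC_RE.findall(line)
        if len(funcs) == 2:  # callee, caller
            chain.append(funcs[0])
            last_caller = funcs[1]
    if last_caller is not None:
        chain.append(last_caller)  # outermost frame appears only as a caller
    return pc, chain


def interrupted_frames(chain):
    """Functions that were executing when an IRQ was taken."""
    return [chain[i + 1] for i, f in enumerate(chain[:-1]) if f == "__irq_svc"]


# Short excerpt from the first report, used only to demonstrate the helpers.
SAMPLE = """\
[ 2424.102923] PC is at __do_softirq+0x7c/0x148
[ 2424.258136] [<c0008608>] (omap3_intc_handle_irq+0x7c/0x80) from [<c000db80>] (__irq_svc+0x40/0x70)
[ 2424.293803] [<c000db80>] (__irq_svc+0x40/0x70) from [<c00375a4>] (__do_softirq+0x7c/0x148)
[ 2424.302516] [<c00375a4>] (__do_softirq+0x7c/0x148) from [<c0037ad4>] (irq_exit+0x98/0x9c)
"""

if __name__ == "__main__":
    pc, chain = summarize_oops(SAMPLE)
    print("PC function:", pc)
    print("Interrupted in:", interrupted_frames(chain))
```

Running each saved report through `summarize_oops` makes it easier to see what the traces share. In both reports above, for instance, the "PC is at" line points into `__do_softirq`, even though the outer frames differ.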

We have two other oops messages, each with a different stack trace. Can anyone help shed some light on this issue?

Edit: A small addition. Although the kernel version is quite old, I have patched in some changes from more recent versions that make sure block I/O requests are finished properly when the card is removed during I/O.

Regards,

Guilherme