[AM335x] Kernel Crashes on Soft Reset

Other Parts Discussed in Thread: WL1271

Sitara Champs,

We've only seen this twice so far, but we're encountering another strange issue and were hoping to hear your thoughts and/or speculations on how to debug it.  We are using Debian Wheezy and PSP 5.06.

When attempting a soft reboot (using the reboot command), we're encountering a kernel crash.  Here are the logs leading up to the crash and, of course, the backtrace for those who can read it.

We will work on being able to reliably reproduce this issue in the meantime.

Kernel Logs:

[ ok ] Asking all remaining processes to terminate...done.
[ ok ] All processes ended within 1 seconds...done.
[ ok ] Stopping enhanced syslogd: rsyslogd.
[info] Saving the system clock.
[ 1644.324371] wl1271: down
[ 1644.328430] disabling wifi
[ 1672.130645] BUG: soft lockup - CPU#0 stuck for 23s! [hwclock:3384]
[ 1672.137115] Modules linked in: wl12xx_sdio wl12xx mac80211 cfg80211
[ 1672.143676]
[ 1672.145233] Pid: 3384, comm:              hwclock
[ 1672.150146] CPU: 0    Not tainted  (3.2.0suned_01-svn49469 #11)
[ 1672.156341] PC is at __do_softirq+0x58/0x134
[ 1672.160797] LR is at irq_exit+0x98/0xa0
[ 1672.164794] pc : [<c0049ad4>]    lr : [<c004a008>]    psr: 20000113
[ 1672.164825] sp : cf4abe20  ip : cf4abe60  fp : cf4abe5c
[ 1672.176788] r10: c06dc9c0  r9 : cf4aa000  r8 : 3f9235f8
[ 1672.182250] r7 : cf4abeb8  r6 : cf4aa000  r5 : c06dca08  r4 : 00000002
[ 1672.189056] r3 : 00000000  r2 : 00000101  r1 : 0000000a  r0 : cf4abe20
[ 1672.195861] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 1672.203308] Control: 10c5387d  Table: 8f278019  DAC: 00000015
 
Backtrace:
[ 1672.211883] [<c001bae8>] (dump_backtrace+0x0/0x10c) from [<c038379c>] (dump_stack+0x18/0x1c)
[ 1672.220703]  r6:c0504990 r5:00000017 r4:cf4abdd8 r3:00000002
[ 1672.226623] [<c0383784>] (dump_stack+0x0/0x1c) from [<c0019700>] (show_regs+0x4c/0x50)
[ 1672.234924] [<c00196b4>] (show_regs+0x0/0x50) from [<c007df50>] (watchdog_timer_fn+0x110/0x148)
[ 1672.243988]  r4:c068b720 r3:00000002
[ 1672.247772] [<c007de40>] (watchdog_timer_fn+0x0/0x148) from [<c006230c>] (__run_hrtimer.isra.19+0x60/0x13c)
[ 1672.257934] [<c00622ac>] (__run_hrtimer.isra.19+0x0/0x13c) from [<c0062c60>] (hrtimer_interrupt+0xf0/0x27c)
[ 1672.268096]  r7:00000001 r6:c052aa30 r5:00000000 r4:c052aa88
[ 1672.274047] [<c0062b70>] (hrtimer_interrupt+0x0/0x27c) from [<c0025bf8>] (omap2_gp_timer_interrupt+0x34/0x3c)
[ 1672.284393] [<c0025bc4>] (omap2_gp_timer_interrupt+0x0/0x3c) from [<c007e918>] (handle_irq_event_percpu+0x54/0x1c0)
[ 1672.295288] [<c007e8c4>] (handle_irq_event_percpu+0x0/0x1c0) from [<c007eac8>] (handle_irq_event+0x44/0x64)
[ 1672.305480] [<c007ea84>] (handle_irq_event+0x0/0x64) from [<c0080950>] (handle_level_irq+0x94/0x114)
[ 1672.315002]  r6:cf4abeb8 r5:c068eac8 r4:c068ea78 r3:00020000
[ 1672.320922] [<c00808bc>] (handle_level_irq+0x0/0x114) from [<c007e248>] (generic_handle_irq+0x34/0x48)
[ 1672.330657]  r5:00000044 r4:c06ac340
[ 1672.334411] [<c007e214>] (generic_handle_irq+0x0/0x48) from [<c001909c>] (handle_IRQ+0x38/0x8c)
[ 1672.343475] [<c0019064>] (handle_IRQ+0x0/0x8c) from [<c00085c4>] (omap3_intc_handle_irq+0x80/0x88)
[ 1672.352813]  r6:00000044 r5:fa200000 r4:00000004 r3:00000002
[ 1672.358764] [<c0008544>] (omap3_intc_handle_irq+0x0/0x88) from [<c0017dc0>] (__irq_svc+0x40/0x70)
[ 1672.368041] Exception stack(0xcf4abdd8 to 0xcf4abe20)
[ 1672.373321] bdc0:                                                       cf4abe20 0000000a
[ 1672.381835] bde0: 00000101 00000000 00000002 c06dca08 cf4aa000 cf4abeb8 3f9235f8 cf4aa000
[ 1672.390380] be00: c06dc9c0 cf4abe5c cf4abe60 cf4abe20 c004a008 c0049ad4 20000113 ffffffff
[ 1672.398925]  r7:cf4abe0c r6:ffffffff r5:20000113 r4:c0049ad4
[ 1672.404846] [<c0049a7c>] (__do_softirq+0x0/0x134) from [<c004a008>] (irq_exit+0x98/0xa0)
[ 1672.413299] [<c0049f70>] (irq_exit+0x0/0xa0) from [<c00190a0>] (handle_IRQ+0x3c/0x8c)
[ 1672.421478]  r4:c06ac340 r3:00000000
[ 1672.425201] [<c0019064>] (handle_IRQ+0x0/0x8c) from [<c00085c4>] (omap3_intc_handle_irq+0x80/0x88)
[ 1672.434570]  r6:00000044 r5:fa200000 r4:00000004 r3:00000002
[ 1672.440490] [<c0008544>] (omap3_intc_handle_irq+0x0/0x88) from [<c0017dc0>] (__irq_svc+0x40/0x70)
[ 1672.449768] Exception stack(0xcf4abeb8 to 0xcf4abf00)
[ 1672.455017] bea0:                                                       03365b06 00000000
[ 1672.463562] bec0: c06cb628 f9e31000 c06de2b0 c050bd00 cf4abf68 0001f5d2 c052ae40 cf4aa000
[ 1672.472106] bee0: 00000000 cf4abf0c cf4abf10 cf4abf00 c00670c4 c0025cb0 60000013 ffffffff
[ 1672.480621]  r7:cf4abeec r6:ffffffff r5:60000013 r4:c0025cb0
[ 1672.486572] [<c0025c74>] (clocksource_read_cycles+0x0/0x44) from [<c00670c4>] (getnstimeofday+0x6c/0x164)
[ 1672.496582] [<c0067058>] (getnstimeofday+0x0/0x164) from [<c006721c>] (do_gettimeofday+0x1c/0x48)
[ 1672.505828] [<c0067200>] (do_gettimeofday+0x0/0x48) from [<c0048f50>] (sys_gettimeofday+0x24/0xbc)
[ 1672.515167]  r4:00000000
[ 1672.517822] [<c0048f2c>] (sys_gettimeofday+0x0/0xbc) from [<c00181c0>] (ret_fast_syscall+0x0/0x30)
[ 1672.527191]  r5:bec3a6f4 r4:51f170ab
[ 1672.530914] Kernel panic - not syncing: softlockup: hung tasks
[ 1672.537017] Backtrace:
[ 1672.539581] [<c001bae8>] (dump_backtrace+0x0/0x10c) from [<c038379c>] (dump_stack+0x18/0x1c)
[ 1672.548370]  r6:c0504990 r5:00000000 r4:c06cc078 r3:00000002
[ 1672.554321] [<c0383784>] (dump_stack+0x0/0x1c) from [<c03839f0>] (panic+0x7c/0x1ac)
[ 1672.562316] [<c0383974>] (panic+0x0/0x1ac) from [<c007df6c>] (watchdog_timer_fn+0x12c/0x148)
[ 1672.571105]  r3:00000001 r2:00000000 r1:00000001 r0:c0467114
[ 1672.577026]  r7:00000000
[ 1672.579681] [<c007de40>] (watchdog_timer_fn+0x0/0x148) from [<c006230c>] (__run_hrtimer.isra.19+0x60/0x13c)
[ 1672.589874] [<c00622ac>] (__run_hrtimer.isra.19+0x0/0x13c) from [<c0062c60>] (hrtimer_interrupt+0xf0/0x27c)
[ 1672.600036]  r7:00000001 r6:c052aa30 r5:00000000 r4:c052aa88
[ 1672.605957] [<c0062b70>] (hrtimer_interrupt+0x0/0x27c) from [<c0025bf8>] (omap2_gp_timer_interrupt+0x34/0x3c)
[ 1672.616333] [<c0025bc4>] (omap2_gp_timer_interrupt+0x0/0x3c) from [<c007e918>] (handle_irq_event_percpu+0x54/0x1c0)
[ 1672.627227] [<c007e8c4>] (handle_irq_event_percpu+0x0/0x1c0) from [<c007eac8>] (handle_irq_event+0x44/0x64)
[ 1672.637390] [<c007ea84>] (handle_irq_event+0x0/0x64) from [<c0080950>] (handle_level_irq+0x94/0x114)
[ 1672.646911]  r6:cf4abeb8 r5:c068eac8 r4:c068ea78 r3:00020000
[ 1672.652862] [<c00808bc>] (handle_level_irq+0x0/0x114) from [<c007e248>] (generic_handle_irq+0x34/0x48)
[ 1672.662567]  r5:00000044 r4:c06ac340
[ 1672.666320] [<c007e214>] (generic_handle_irq+0x0/0x48) from [<c001909c>] (handle_IRQ+0x38/0x8c)
[ 1672.675415] [<c0019064>] (handle_IRQ+0x0/0x8c) from [<c00085c4>] (omap3_intc_handle_irq+0x80/0x88)
[ 1672.684753]  r6:00000044 r5:fa200000 r4:00000004 r3:00000002
[ 1672.690704] [<c0008544>] (omap3_intc_handle_irq+0x0/0x88) from [<c0017dc0>] (__irq_svc+0x40/0x70)
[ 1672.699951] Exception stack(0xcf4abdd8 to 0xcf4abe20)
[ 1672.705230] bdc0:                                                       cf4abe20 0000000a
[ 1672.713745] bde0: 00000101 00000000 00000002 c06dca08 cf4aa000 cf4abeb8 3f9235f8 cf4aa000
[ 1672.722290] be00: c06dc9c0 cf4abe5c cf4abe60 cf4abe20 c004a008 c0049ad4 20000113 ffffffff
[ 1672.730834]  r7:cf4abe0c r6:ffffffff r5:20000113 r4:c0049ad4
[ 1672.736785] [<c0049a7c>] (__do_softirq+0x0/0x134) from [<c004a008>] (irq_exit+0x98/0xa0)
[ 1672.745208] [<c0049f70>] (irq_exit+0x0/0xa0) from [<c00190a0>] (handle_IRQ+0x3c/0x8c)
[ 1672.753387]  r4:c06ac340 r3:00000000
[ 1672.757141] [<c0019064>] (handle_IRQ+0x0/0x8c) from [<c00085c4>] (omap3_intc_handle_irq+0x80/0x88)
[ 1672.766479]  r6:00000044 r5:fa200000 r4:00000004 r3:00000002
[ 1672.772430] [<c0008544>] (omap3_intc_handle_irq+0x0/0x88) from [<c0017dc0>] (__irq_svc+0x40/0x70)
[ 1672.781677] Exception stack(0xcf4abeb8 to 0xcf4abf00)
[ 1672.786956] bea0:                                                       03365b06 00000000
[ 1672.795501] bec0: c06cb628 f9e31000 c06de2b0 c050bd00 cf4abf68 0001f5d2 c052ae40 cf4aa000
[ 1672.804046] bee0: 00000000 cf4abf0c cf4abf10 cf4abf00 c00670c4 c0025cb0 60000013 ffffffff
[ 1672.812561]  r7:cf4abeec r6:ffffffff r5:60000013 r4:c0025cb0
[ 1672.818511] [<c0025c74>] (clocksource_read_cycles+0x0/0x44) from [<c00670c4>] (getnstimeofday+0x6c/0x164)
[ 1672.828491] [<c0067058>] (getnstimeofday+0x0/0x164) from [<c006721c>] (do_gettimeofday+0x1c/0x48)
[ 1672.837768] [<c0067200>] (do_gettimeofday+0x0/0x48) from [<c0048f50>] (sys_gettimeofday+0x24/0xbc)
[ 1672.847106]  r4:00000000
[ 1672.849761] [<c0048f2c>] (sys_gettimeofday+0x0/0xbc) from [<c00181c0>] (ret_fast_syscall+0x0/0x30)
[ 1672.859100]  r5:bec3a6f4 r4:51f170ab
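
One way to see how long the console was silent before the watchdog fired is to scan a saved copy of the log for the largest jump between consecutive timestamps. The following is a minimal sketch, assuming the log uses the `[ seconds.microseconds]` prefix shown above; the function name `largest_log_gap` and the file name `console.log` are hypothetical.

```shell
# largest_log_gap: reads kernel log lines on stdin and prints the largest
# gap (in seconds) between consecutive "[ seconds.micros]" timestamps,
# followed by the log line that came right after that gap.
largest_log_gap() {
    awk '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the surrounding brackets and convert to a number.
            t = substr($0, 2, RLENGTH - 2) + 0
            if (seen && t - prev > gap) { gap = t - prev; line = $0 }
            prev = t; seen = 1
        }
        END { if (seen) printf "%.3f s before: %s\n", gap, line }
    '
}

# Example (console.log is a hypothetical capture of the serial console):
# largest_log_gap < console.log
```

Fed the lines above, this should point at the roughly 28-second gap between `disabling wifi` (1644.328430) and the soft lockup report (1672.130645).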

 

  • Note that I think the crash in the log happens when trying to write the time to the hardware clock; hwclock is the process that is getting hung and triggering the watchdog.

    Are there any known problems with the hwclock and Linux?

     

  • Hi,

    It does seem like the hwclock task is hanging (or taking a really long time to complete) and thus triggering a soft lockup warning. Is there a chance the /dev/rtc symlink has been unregistered before hwclock has written the system time to it? I'm not a hundred percent sure, but I think hwclock expects /dev/rtc to exist in order to access the RTC through the symlink. (EDIT: Judging by the man page, hwclock should fall back to another method if /dev/rtc is not found. However, I don't know whether the embedded version of hwclock has the same functionality as the desktop version.)

    Reading the man page for hwclock I found this:

    How hwclock Accesses the Hardware Clock
           hwclock uses many different ways to get and set Hardware Clock values.  The most normal way is to do I/O to the device special file /dev/rtc, which is presumed to be driven by the rtc device driver. However,  this  method  is  not  always  available.  For one thing, the rtc driver is a relatively recent addition to Linux.  Older systems don't have it.  Also, though there are versions of the rtc driver that work on DEC Alphas, there appear to be plenty of Alphas on which the rtc driver does not work (a common symptom is hwclock hanging).  Moreover, recent Linux  systems  have  more  generic support for RTCs, even systems that have more than one, so you might need to override the default by specifying /dev/rtc0 or /dev/rtc1 instead.

           On older systems, the method of accessing the Hardware Clock depends on the system hardware.

           On  an ISA system, hwclock can directly access the "CMOS memory" registers that constitute the clock, by doing I/O to Ports 0x70 and 0x71.  It does this with actual I/O instructions and consequently can only do it if running with superuser effective userid.  (In the case of a Jensen Alpha, there is no way for hwclock to execute those I/O instructions, and so it uses instead the /dev/port device special file, which provides almost as low-level an interface to the I/O subsystem).

           This  is a really poor method of accessing the clock, for all the reasons that user space programs are generally not supposed to do direct I/O and disable interrupts.  Hwclock provides it because it is the only method available on ISA and Alpha systems which don't have working rtc device drivers available.

           On an m68k system, hwclock can access the clock via the console driver, via the device special file /dev/tty1.

           hwclock tries to use /dev/rtc.  If it is compiled for a kernel that doesn't have that function or it is unable to open /dev/rtc (or the alternative special file you've defined on the  command  line) hwclock will fall back to another method, if available.  On an ISA or Alpha machine, you can force hwclock to use the direct manipulation of the CMOS registers without even trying /dev/rtc by specifying the --directisa option.

    I know this is mentioned as related to Alpha systems, but I guess if there is any problem with the RTC driver, hwclock would hang on any other system as well.

    You could try running a script that continuously reads or updates the RTC through hwclock to see whether it ever hangs.
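
Such a stress loop could be sketched as follows. This is only an illustration under assumptions: the helper name `stress_cmd` is hypothetical, and it relies on `timeout` from GNU coreutils being installed on the target. Note that if the kernel itself locks up, as in the log above, the timeout may never get a chance to fire, so the serial console remains the more reliable indicator.

```shell
# stress_cmd COUNT LIMIT CMD [ARGS...]
# Runs CMD up to COUNT times, each under a LIMIT-second timeout.
# Stops at the first failure or timeout and reports the iteration.
stress_cmd() {
    count=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        rc=0
        # GNU timeout exits with status 124 when the time limit is hit.
        timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -ne 0 ]; then
            echo "iteration $i: exit status $rc"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count iterations"
}

# Example (as root, on the target): write system time to the RTC 1000 times.
# stress_cmd 1000 10 hwclock --systohc --rtc=/dev/rtc0
```

Passing the command as arguments keeps the loop reusable, for example with `hwclock --show` to exercise the read path instead of the write path.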

    Also reading the hwclock man page (man hwclock) may give you an idea about what could be going wrong. I know the embedded version of hwclock included in the busybox binary is much smaller and trimmed as opposed to the hwclock version on your host PC, but still...

    This is all I can think of.

    Best regards,
    Miroslav

  • Hi Miroslav,

    I'm the customer with the problem.  To add some more data, our system has only /dev/rtc0 and we are using Debian, so we are running the full version of hwclock as opposed to a BusyBox version.

    I've copied the Debian RTC init script to http://pastebin.com/Z4qjbAQD

    I ran a test for about 30 mins where I was calling /etc/init.d/hwclock stop repeatedly, but this did not expose the error unfortunately.

  • Hi Sean,

    Just a quick morning idea: can you try substituting the full-version hwclock binary with a link to "busybox hwclock"? I have no idea if it would work without hanging (probably not if the problem is with the driver), but I guess it's worth a try.

    Best regards,
    Miroslav

  • I think it would be better to try and figure out the root cause from the register dump.  Can TI assist with that?

  • Sean,

    Let's test Miroslav's theory by removing the symlink to the RTC and then calling hwclock either natively or via BusyBox.  This should trigger the lockup condition (hopefully).

  • Note that there is no symlink to any rtc, there is just /dev/rtc0.
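
A quick way to confirm which RTC nodes exist, and whether any of them is a symlink, is sketched below. The function name `list_rtc_nodes` is hypothetical, and the sketch assumes `readlink` is available on the target.

```shell
# list_rtc_nodes [DIR]
# Prints one line per rtc* entry in DIR (default /dev). Symlinks are shown
# with their target; any other entry is shown by name only.
list_rtc_nodes() {
    dir=${1:-/dev}
    found=0
    for node in "$dir"/rtc*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$node" ] || [ -L "$node" ] || continue
        found=1
        if [ -L "$node" ]; then
            echo "$(basename "$node") -> $(readlink "$node")"
        else
            basename "$node"
        fi
    done
    [ "$found" -eq 1 ] || echo "no rtc nodes in $dir"
}

# Example: list_rtc_nodes /dev
```

On a system like the one described here, the expected result would be a single `rtc0` line with no `rtc -> rtc0` entry.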

  • Miroslav,

    Another thing we're going to try is to raise the priority of hwclock.sh in the shutdown sequence so that it runs earlier.  We are still having a hard time reliably reproducing the issue, so this "fix" will be hard to test.  Any additional insight is always appreciated.

     

    Tear Down Sequence:

    root@SunDAC:~# ls -l /etc/rc0.d
    total 4
    lrwxrwxrwx 1 root root  17 Feb 17  2013 K01apache2 -> ../init.d/apache2
    lrwxrwxrwx 1 root root  25 Feb 17  2013 K01board_tweaks.sh -> ../init.d/board_tweaks.sh
    lrwxrwxrwx 1 root root  23 Jun 15 02:44 K01etc-setserial -> ../init.d/etc-setserial
    lrwxrwxrwx 1 root root  14 Aug 20 19:04 K01gpsd -> ../init.d/gpsd
    lrwxrwxrwx 1 root root  25 Jul  3 19:58 K01isc-dhcp-server -> ../init.d/isc-dhcp-server
    lrwxrwxrwx 1 root root  19 Jun 15 02:44 K01setserial -> ../init.d/setserial
    lrwxrwxrwx 1 root root  22 Aug 20 22:00 K01sundac-modem -> ../init.d/sundac-modem
    lrwxrwxrwx 1 root root  17 Aug 20 19:01 K01urandom -> ../init.d/urandom
    lrwxrwxrwx 1 root root  18 Aug 20 19:04 K02sendsigs -> ../init.d/sendsigs
    lrwxrwxrwx 1 root root  17 Aug 20 19:04 K03rsyslog -> ../init.d/rsyslog
    lrwxrwxrwx 1 root root  20 Aug 20 19:04 K04hwclock.sh -> ../init.d/hwclock.sh   <-- change to K01 or K00
    lrwxrwxrwx 1 root root  22 Aug 20 19:04 K04umountnfs.sh -> ../init.d/umountnfs.sh
    lrwxrwxrwx 1 root root  20 Aug 20 19:04 K05networking -> ../init.d/networking
    lrwxrwxrwx 1 root root  18 Aug 20 19:04 K06umountfs -> ../init.d/umountfs
    lrwxrwxrwx 1 root root  20 Aug 20 19:04 K07umountroot -> ../init.d/umountroot
    lrwxrwxrwx 1 root root  14 Aug 20 19:04 K08halt -> ../init.d/halt
    -rw-r--r-- 1 root root 353 Oct 15  2012 README
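
The proposed reordering amounts to renaming the `K04hwclock.sh` link, since sysvinit runs the kill links in lexical order. A minimal sketch follows; the helper name `renumber_kill_link` is hypothetical. On Debian Wheezy the links may be regenerated from the LSB headers of the init scripts when dependency-based boot ordering is in use, so a manual rename like this may not persist.

```shell
# renumber_kill_link DIR NAME NEW_PRIO
# Renames DIR/K??NAME to DIR/K<NEW_PRIO>NAME so that NAME is stopped
# earlier or later in the kill sequence.
renumber_kill_link() {
    dir=$1
    name=$2
    prio=$3
    for link in "$dir"/K[0-9][0-9]"$name"; do
        # An unmatched pattern stays literal and is not a symlink.
        if [ ! -L "$link" ]; then
            echo "no kill link for $name in $dir" >&2
            return 1
        fi
        new="$dir/K$prio$name"
        if [ "$link" = "$new" ]; then
            return 0
        fi
        # Renaming a symlink moves the link itself, not its target.
        mv "$link" "$new"
        return $?
    done
}

# Example (as root): renumber_kill_link /etc/rc0.d hwclock.sh 01
```

Because sysvinit enters runlevel 6 for a reboot and runlevel 0 for a halt, the same change would presumably also be needed in /etc/rc6.d to affect the `reboot` case.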
  • Miroslav,

    We've removed the reference to init.d/hwclock.sh from the de-init sequence (rc0.d) and have not been able to reproduce the issue since.