Kexec hardly ever working.

I am using the new 3.12 kernel with a custom board I am designing based on the am.  I am trying to get kexec to work to reboot a kernel.

Most of the time, I run the following commands:

/nfsutils/kexec -d  -l /mnt/images/kernel/uImage.aragoyocto --dtb /mnt/images/kernel/var-som-am33.dtb --append="root=/dev/ram0 console=ttyO0,115200n8 rootwait=1 rw init=/linuxrc" 

/nfsutils/kexec -e --no-ifdown 

and I get the following statements:

[ 177.699626] Starting new kernel
[ 177.702992] Bye!

Then the processor appears to hang: no further messages reach the console.
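
Before running `kexec -e`, it may be worth confirming that the load step really staged an image. Below is a minimal sketch in Python, assuming the kernel exposes `/sys/kernel/kexec_loaded` (which reads `1` when an image is loaded). The helper name `kexec_image_loaded` is purely illustrative and not part of any tool.

```python
from pathlib import Path


def kexec_image_loaded(sysfs_path="/sys/kernel/kexec_loaded"):
    """Report whether the running kernel has a kexec image staged.

    Returns True or False based on the sysfs flag, or None when the
    file cannot be read (for example, kexec support is not built in).
    The default path is an assumption about the target system.
    """
    try:
        return Path(sysfs_path).read_text().strip() == "1"
    except OSError:
        return None


if __name__ == "__main__":
    state = kexec_image_loaded()
    if state is None:
        print("kexec_loaded not available on this kernel")
    else:
        print("image loaded" if state else "no image loaded")
```

If this reports that no image is loaded after `kexec -l`, the problem is in the load step rather than in the jump to the new kernel.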

However, after many attempts, there were two rare instances where the kernel actually booted, though with errors in the boot messages.  I am not sure whether these errors offer a clue to why kexec usually fails.

Here are two snippets that differ from the log of a regular boot via U-Boot:

Snippet 1

cpu cpu0: cpu0 regulator not ready, retry
 platform cpufreq-cpu0.0: Driver cpufreq-cpu0 requests probe deferral
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 1 at arch/arm/mach-omap2/pm33xx.c:322 am33xx_m3_fw_ready_cb+0x9c/0xe8()
 PM: MPU<->CM3 sync failure
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper Not tainted 3.12.10 #20
 Backtrace: 
 [<c0017a1c>] (dump_backtrace+0x0/0x10c) from [<c0017bb8>] (show_stack+0x18/0x1c)
  r6:00000142 r5:00000009 r4:cd87fe30 r3:00000000
 [<c0017ba0>] (show_stack+0x0/0x1c) from [<c0540710>] (dump_stack+0x20/0x28)
 [<c05406f0>] (dump_stack+0x0/0x28) from [<c0045ff8>] (warn_slowpath_common+0x6c/0x8c)
 [<c0045f8c>] (warn_slowpath_common+0x0/0x8c) from [<c00460bc>] (warn_slowpath_fmt+0x38/0x40)
  r8:c1357d2c r7:00000190 r6:c08e5e08 r5:cdd4de40 r4:00000000
 [<c0046084>] (warn_slowpath_fmt+0x0/0x40) from [<c0033b70>] (am33xx_m3_fw_ready_cb+0x9c/0xe8)
  r3:00000000 r2:c0668050
 [<c0033ad4>] (am33xx_m3_fw_ready_cb+0x0/0xe8) from [<c071a544>] (am33xx_pm_init+0x384/0x45c)
  r5:cdd4de40 r4:00000000
 [<c071a1c0>] (am33xx_pm_init+0x0/0x45c) from [<c0712d38>] (am33xx_init_late+0x18/0x20)
  r8:c070d43c r7:c070a410 r6:c08e4d00 r5:00000007 r4:c0756d80
 [<c0712d20>] (am33xx_init_late+0x0/0x20) from [<c070d460>] (init_machine_late+0x24/0x30)
 [<c070d43c>] (init_machine_late+0x0/0x30) from [<c0008a44>] (do_one_initcall+0xf4/0x154)
 [<c0008950>] (do_one_initcall+0x0/0x154) from [<c070ab98>] (kernel_init_freeable+0xec/0x1b8)
 [<c070aaac>] (kernel_init_freeable+0x0/0x1b8) from [<c053b398>] (kernel_init+0x10/0xec)
 [<c053b388>] (kernel_init+0x0/0xec) from [<c0014878>] (ret_from_fork+0x14/0x3c)
  r4:00000000 r3:00000000
 ---[ end trace 04f5297724f15e86 ]---
 ThumbEE CPU extension supported.
 omap-gpmc 50000000.gpmc: GPMC revision 6.0
 gpmc_mem_init: disabling cs 0 mapped at 0x0-0x1000000

Snippet 2


Random MACID = 4e:d7:24:6a:d3:fa
irq 57: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 1 Comm: swapper Tainted: G W 3.12.10 #20
Backtrace: 
[<c0017a1c>] (dump_backtrace+0x0/0x10c) from [<c0017bb8>] (show_stack+0x18/0x1c)
r6:00000039 r5:cd805e40 r4:cd805e40 r3:00000000
[<c0017ba0>] (show_stack+0x0/0x1c) from [<c0540710>] (dump_stack+0x20/0x28)
[<c05406f0>] (dump_stack+0x0/0x28) from [<c0072464>] (__report_bad_irq+0x28/0xb8)
[<c007243c>] (__report_bad_irq+0x0/0xb8) from [<c00728c0>] (note_interrupt+0x1cc/0x230)
r5:00000000 r4:cd805e40
[<c00726f4>] (note_interrupt+0x0/0x230) from [<c0070b7c>] (handle_irq_event_percpu+0xb0/0x1b8)
[<c0070acc>] (handle_irq_event_percpu+0x0/0x1b8) from [<c0070cb4>] (handle_irq_event+0x30/0x40)
[<c0070c84>] (handle_irq_event+0x0/0x40) from [<c0073068>] (handle_level_irq+0x88/0xdc)
r4:cd805e40 r3:00020000
[<c0072fe0>] (handle_level_irq+0x0/0xdc) from [<c0070420>] (generic_handle_irq+0x28/0x38)
r4:00000039 r3:c0072fe0
[<c00703f8>] (generic_handle_irq+0x0/0x38) from [<c00156e0>] (handle_IRQ+0x38/0x8c)
r4:c08b6a3c r3:00000112
[<c00156a8>] (handle_IRQ+0x0/0x8c) from [<c000879c>] (omap3_intc_handle_irq+0x68/0x7c)
r6:c08e5814 r5:cd87fb38 r4:fa200000 r3:00000080
[<c0008734>] (omap3_intc_handle_irq+0x0/0x7c) from [<c0544340>] (__irq_svc+0x40/0x54)
Exception stack(0xcd87fb38 to 0xcd87fb80)
fb20: 00000001 c08e67c0
fb40: 00000000 00000100 00000002 00000054 c08e6804 c08e6800 cd87e000 00000000
fb60: cd805e74 cd87fbc4 0000000a cd87fb80 c0049598 c00495f8 20000113 ffffffff
r7:cd87fb6c r6:ffffffff r5:20000113 r4:c00495f8
[<c0049580>] (__do_softirq+0x0/0x1ac) from [<c00497cc>] (do_softirq+0x50/0x5c)
[<c004977c>] (do_softirq+0x0/0x5c) from [<c0049a44>] (irq_exit+0x9c/0xf0)
r4:cd87e000 r3:00000000
[<c00499a8>] (irq_exit+0x0/0xf0) from [<c00156e4>] (handle_IRQ+0x3c/0x8c)
r4:c08b6a3c r3:00000112
[<c00156a8>] (handle_IRQ+0x0/0x8c) from [<c000879c>] (omap3_intc_handle_irq+0x68/0x7c)
r6:c08e5814 r5:cd87fc38 r4:fa200000 r3:00000080
[<c0008734>] (omap3_intc_handle_irq+0x0/0x7c) from [<c0544340>] (__irq_svc+0x40/0x54)
Exception stack(0xcd87fc38 to 0xcd87fc80)
fc20: 00000000 00000039
fc40: 40000111 00000000 cd805e40 cddd4800 60000013 00000000 00000039 00000000
fc60: cd805e74 cd87fcac cd87fc60 cd87fc80 c0073240 c0071c38 40000013 ffffffff
r7:cd87fc6c r6:ffffffff r5:40000013 r4:c0071c38
[<c0071a24>] (__setup_irq+0x0/0x49c) from [<c0071f6c>] (request_threaded_irq+0xac/0x138)
[<c0071ec0>] (request_threaded_irq+0x0/0x138) from [<c0073b6c>] (devm_request_threaded_irq+0x58/0x98)
[<c0073b14>] (devm_request_threaded_irq+0x0/0x98) from [<c03aa864>] (cpsw_probe+0x8e4/0xee8)
[<c03a9f80>] (cpsw_probe+0x0/0xee8) from [<c033628c>] (platform_drv_probe+0x20/0x24)
[<c033626c>] (platform_drv_probe+0x0/0x24) from [<c0334f5c>] (driver_probe_device+0x104/0x240)
[<c0334e58>] (driver_probe_device+0x0/0x240) from [<c033512c>] (__driver_attach+0x94/0x98)
r7:00000000 r6:cd8d0a44 r5:c08cb3ac r4:cd8d0a10
[<c0335098>] (__driver_attach+0x0/0x98) from [<c03334d8>] (bus_for_each_dev+0x5c/0x90)
r6:c0335098 r5:c08cb3ac r4:00000000 r3:cd8ca7fc
[<c033347c>] (bus_for_each_dev+0x0/0x90) from [<c0334a4c>] (driver_attach+0x20/0x28)
r6:c08c2fb0 r5:cddcc8c0 r4:c08cb3ac
[<c0334a2c>] (driver_attach+0x0/0x28) from [<c0334588>] (bus_add_driver+0xdc/0x264)
[<c03344ac>] (bus_add_driver+0x0/0x264) from [<c03357a4>] (driver_register+0x80/0xfc)
r8:c07322b0 r7:c070a410 r6:c08e4d00 r5:00000007 r4:c08cb3ac
[<c0335724>] (driver_register+0x0/0xfc) from [<c03364b0>] (__platform_driver_register+0x50/0x64)
r5:00000007 r4:c0756dc0
[<c0336460>] (__platform_driver_register+0x0/0x64) from [<c07322c8>] (cpsw_init+0x18/0x20)
[<c07322b0>] (cpsw_init+0x0/0x20) from [<c0008a44>] (do_one_initcall+0xf4/0x154)
[<c0008950>] (do_one_initcall+0x0/0x154) from [<c070ab98>] (kernel_init_freeable+0xec/0x1b8)
[<c070aaac>] (kernel_init_freeable+0x0/0x1b8) from [<c053b398>] (kernel_init+0x10/0xec)
[<c053b388>] (kernel_init+0x0/0xec) from [<c0014878>] (ret_from_fork+0x14/0x3c)
r4:00000000 r3:00000000
handlers:
[<c03aae68>] cpsw_interrupt
Disabling IRQ #57
cpsw: Random MACID = 2e:f3:a4:ac:86:0b

Perhaps these snippets offer clues as to why booting via kexec results in a hung processor most of the time.
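
To compare a kexec boot against a regular boot, the warning signatures can be pulled out of a saved console log. Below is a rough sketch in Python, assuming the log has been captured as plain text. The function name `summarize_boot_log` and the regular expressions are illustrative only, written to match the message formats shown in the two snippets above.

```python
import re

# Patterns modeled on the messages in the snippets above (assumed formats).
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+?)\(\)")
UNHANDLED_IRQ_RE = re.compile(r"irq (\d+): nobody cared")
DISABLED_IRQ_RE = re.compile(r"Disabling IRQ #(\d+)")


def summarize_boot_log(text):
    """Collect WARNING locations and unhandled/disabled IRQ numbers."""
    summary = {"warnings": [], "unhandled_irqs": [], "disabled_irqs": []}
    for line in text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            # (source file:line, function+offset) of the WARNING
            summary["warnings"].append((match.group(1), match.group(2)))
            continue
        match = UNHANDLED_IRQ_RE.search(line)
        if match:
            summary["unhandled_irqs"].append(int(match.group(1)))
            continue
        match = DISABLED_IRQ_RE.search(line)
        if match:
            summary["disabled_irqs"].append(int(match.group(1)))
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py console.log
    with open(sys.argv[1]) as log_file:
        print(summarize_boot_log(log_file.read()))
```

Running this over one log from a U-Boot boot and one from a kexec boot, then comparing the two summaries, may help show which devices were still active when the new kernel started.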

Thanks for your help.

Brian

  • Hi Brian,

    Implementing a mechanism such as kexec raises two major challenges:

    • Memory of the currently running kernel is overwritten by the new kernel, while the old one is still executing.
    • The new kernel will usually expect all physical devices to be in a well-defined state, as they are after a system reboot, when the system firmware resets them to a "sane" state. Bypassing a real reboot may leave devices in an unknown state, and the new kernel will have to recover from that. On an SoC like the AM335x, this covers all peripherals, the clock structure, and so on.
  • It may be a challenge to get working, but it did work in the prior kernel, so the AM335x is capable of this type of thing.

    I would appreciate help in getting this working again in the yocto-based kernel. I'm willing to put some time in.

    Has anyone got kexec working on the BeagleBone Black?  That would be a good starting place.

    With regard to my first snippet, the MPU<->CM3 sync failure, does anyone have any clue how to get the CM3 clock to reinitialize correctly?

    Thanks,

    Brian

  • Curious, did you get this working in the end?

    I'm exploring a kexec-based boot loader for the BeagleBone Black, but so far I have not managed a successful reboot.

  • Hi Joe

    Yes, I did.  I will have to look into what I had to do to make it work.  It's been a while.

    Thanks,

    Brian

  • Any update on kexec on the AM335x? Did you also test kdump?