This thread has been locked.


am335x watchdog disabled by Linux boot

Hi, I'm having problems with the watchdog on our custom am335x board running SDK 05.07.00.

I enable the watchdog in u-boot just before booting into Linux and want the watchdog to be running to catch any possible faults during Linux boot.

The big problem is that the watchdog timer gets disabled by something during boot, probably by some PM functionality disabling its clock. It is not the omap_wdt driver itself: I have compiled the kernel without the driver and the problem persists.

I have also tried modifying omap_hwmod_init_postsetup in io.c, without success. (This could, however, be due to my lack of understanding of how that is supposed to work.)

How do I disable this behaviour?

/Tor

  • Hi Tor,

As I understand it, the files you are trying to modify are in the Linux kernel, right?

    I found that in U-Boot itself there is a board support file where the watchdog is disabled by default.

    My SDK is version 06.00.00 but you can search in your version for such a file.

The WD is disabled in the board.c file located at

    ti-sdk-am335x-evm-06.00.00.00/board-support/u-boot-2013.01.01-psp06.00.00.00/board/ti/am335x

I found the WD drivers in U-Boot here (in case some other modification is necessary):

    ti-sdk-am335x-evm-06.00.00.00/board-support/u-boot-2013.01.01-psp06.00.00.00/drivers/watchdog

I did not notice a Sitara-specific driver there.

    Hope this helps.

    BR,

    Vidin

  • Hi Vidin,

You are right that I want to modify the kernel.

    I have already modified the u-boot to turn the watchdog back on before booting the Linux kernel.

My big problem is that the Linux kernel disables the watchdog again upon boot, which obviously must be a design flaw/mistake :( Thus I'm trying to find a way to disable this behaviour.

I'm quite sure that either the omap_hwmod code or the pm_runtime code is the culprit here, reinitializing the WDT and then turning it off.

Reading omap_hwmod.c, it says:

     * The OMAP hwmod code also will attempt to reset and idle all on-chip
     * devices upon boot.  The goal here is for the kernel to be
     * completely self-reliant and independent from bootloaders.

Furthermore, there are some references to this in mach-omap2/io.c:

         * Set the default postsetup state for unusual modules (like
         * MPU WDT).
         *
         * The postsetup_state is not actually used until
         * omap_hwmod_late_init(), so boards that desire full watchdog
         * coverage of kernel initialization can reprogram the
         * postsetup_state between the calls to
         * omap2_init_common_infra() and omap_sdrc_init().
         *
         * XXX ideally we could detect whether the MPU WDT was currently
         * enabled here and make this conditional

    I'm however not quite sure how to implement this. Investigating as we speak.

But do I understand you correctly, Vidin, that you have gotten this to work?

    /Tor

  • Hi Tor,

I have not managed to test this WD issue yet; I was only browsing the code for WD functionality. I will also check the Linux init for WD functions and let you know if I find where the WD is controlled during Linux init.

    BR,

    Vidin

  • Hey Tor and Vidin,

    I encountered this issue recently, too.

As Tor mentioned, u-boot disables the watchdog by default, but this behavior can easily be altered by modifying a few lines of code.

The big problem is that the kernel disables the WDT multiple times. Removing the "omap_wdt_disable(wdev);" call in omap_wdt.c is not enough, as there are other PM processes that disable it.

Furthermore, from what I have seen, messing around with omap_hwmod_init_postsetup has no effect either.

The only way I've seen to keep the WDT from being disabled during boot is by gutting the omap2_wd_timer_disable function in wd_timer.c. This is definitely not a nice solution, though, and I am still investigating how to accomplish this better. I'll let you know when I have an update.
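For the record, the gutting amounts to something like this hypothetical stub of arch/arm/mach-omap2/wd_timer.c (in the stock 3.2-era tree, the function issues the 0xAAAA/0x5555 stop keys to WDT_WSPR):

```c
/* arch/arm/mach-omap2/wd_timer.c -- stubbed so the hwmod code's
 * "disable WDT on boot" pass becomes a no-op and the watchdog
 * keeps running through kernel init. */
int omap2_wd_timer_disable(struct omap_hwmod *oh)
{
	/* original code wrote the 0xAAAA/0x5555 stop sequence here */
	return 0;
}
```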

    Regards,

    Josh

  • Hi,

Attached is what I ended up with to get my system working. Unfortunately I had to zip the files up since, strangely enough, I was only allowed to upload one file.

Basically, what I did was change the postsetup state from disabled to enabled, provide an alternative dummy reset function so the TI code does not soft-reset the WDT upon initialization, and finally remove the disable in the WDT driver along with most of the pm_runtime code in the same driver.
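For anyone who cannot grab the zip, the core of the postsetup change is small. This is a hypothetical sketch against the 3.2-era arch/arm/mach-omap2/io.c (function and symbol names taken from that tree):

```c
/* arch/arm/mach-omap2/io.c -- omap_hwmod_init_postsetup(), sketch.
 * The stock code forces the MPU watchdog hwmod into
 * _HWMOD_STATE_DISABLED after setup; leaving it ENABLED keeps the
 * WDT running across kernel initialization. */
static void __init omap_hwmod_init_postsetup(void)
{
	u8 postsetup_state;

	/* Set the default postsetup state for all hwmods */
#ifdef CONFIG_PM_RUNTIME
	postsetup_state = _HWMOD_STATE_IDLE;
#else
	postsetup_state = _HWMOD_STATE_ENABLED;
#endif
	omap_hwmod_for_each(_set_hwmod_postsetup_state, &postsetup_state);

	/* was: postsetup_state = _HWMOD_STATE_DISABLED; */
	postsetup_state = _HWMOD_STATE_ENABLED;
	omap_hwmod_for_each_by_class("wd_timer",
				     _set_hwmod_postsetup_state,
				     &postsetup_state);

	omap_pm_if_early_init();
}
```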

Also provided are the modifications I made to u-boot to re-enable the WDT upon boot, plus some utility code for the WDT.

Feedback would be appreciated on whether this is an acceptable way to solve the problem.

      /Tor

    wdt-patches.zip
  • Hi Tor,

Thanks for the patches, they are working! However, under stress testing, after a few days or weeks of operation we ran into issues where accesses in /sys and /proc started to oops the kernel. I traced the issue back to this patch, specifically to the enabling of the WDIOF_KEEPALIVEPING flag, or possibly WDIOC_SETOPTIONS. It seemed like a scheduling problem, as the oops would happen inside try_to_wake_up, dereferencing an invalid task entry.

After removing this support (see patch) our devices run OK, even under lots of stress testing, and the watchdog still runs through the boot process. We use it with NOWAYOUT as well.
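With the ioctl-based keepalive removed, we kick the watchdog from userspace with plain write()s to /dev/watchdog, which the driver treats as a ping. A minimal sketch (the 10-second interval is an assumption and must stay below the configured WDT timeout):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Opening the device starts the userspace keepalive obligation */
    int fd = open("/dev/watchdog", O_WRONLY);
    if (fd < 0) {
        perror("open /dev/watchdog");
        return 1;
    }
    for (;;) {
        /* any write kicks the timer; no WDIOC_KEEPALIVE ioctl needed */
        if (write(fd, "\0", 1) != 1)
            perror("write");
        sleep(10); /* must be shorter than the WDT timeout */
    }
    /* with NOWAYOUT, closing the device would not stop the watchdog */
}
```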

Here is an example script that crashed the process through the kernel oops after 5-8 hours of continuous running; sometimes running multiple instances helped oops it faster. After the first oops this script would oops again quickly, so it seems to create some corruption inside the kernel. Only a reboot fixed the issue.

    #!/bin/sh
    while true; do
            echo default-on > /sys/class/leds/<any>_led/trigger
            echo 1 > /sys/class/leds/<any>_led/brightness
    done

    == or ==

    while true;
    do
            cat /sys/class/net/eth0/statistics/rx_packets > /dev/null
            echo 0 > /sys/class/gpio/gpio47/value
    done


Really, lots of /sys or /proc accesses would trigger this. On our devices it sometimes took weeks to first manifest, sometimes less than a day; our app reads and writes the /sys filesystem a lot.

Of course, this could be a compiler issue or something else. We are using Yocto/Dora with a kernel based on the TI PSP for the am335x, as of d5720d33bc7c434f9a023dbb62c795538f976b7a, with some additional patches from our SOM vendor.

    The telltale sign is an oops like this, always in try_to_wake_up and with address e5832000 (on our system):

    [94307.627105] Unable to handle kernel paging request at virtual address e5832000
    [94307.634735] pgd = cf4e0000
    [94307.637573] [e5832000] *pgd=00000000
    [94307.641357] Internal error: Oops: 5 [#1]
    [94307.645477] Modules linked in: option usb_wwan
    [94307.650177] CPU: 0    Not tainted  (3.2.0 #1)
    [94307.654785] PC is at try_to_wake_up+0x18/0x8c
    [94307.659362] LR is at wake_up_process+0x18/0x1c
    [94307.664062] pc : [<c0037688>]    lr : [<c0037714>]    psr: 800f0093
    [94307.664062] sp : cf71de90  ip : cf71deb0  fp : cf71deac
    [94307.676147] r10: cf4eac00  r9 : cf46e1c0  r8 : cf71df78
    [94307.681640] r7 : 00000019  r6 : 00000019  r5 : 800f0013  r4 : e5832000
    [94307.688537] r3 : c044fe60  r2 : 00000000  r1 : 0000000f  r0 : e5832000
    [94307.695404] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
    [94307.703002] Control: 10c5387d  Table: 8f4e0019  DAC: 00000015
    [94307.709075] Process monit (pid: 1137, stack limit = 0xcf71c2f0)
    [94307.715301] Stack: (0xcf71de90 to 0xcf71e000)
    [94307.719879] de80:                                     c044fe60 00000019 be952bb4 00000019
    [94307.728515] dea0: cf71debc cf71deb0 c0037714 c003767c cf71decc cf71dec0 c044fda0 c0037708
    [94307.737152] dec0: cf71dedc cf71ded0 c044fdc8 c044fd84 cf71df24 cf71dee0 c00b75b8 c044fdb0
    ...
    [94307.814727] dfe0: 00000003 be952bb0 000234d8 4b6e7eac 200f0010 00000006 00000000 00000000
    [94307.823333] Backtrace:
    [94307.825927] [<c0037670>] (try_to_wake_up+0x0/0x8c) from [<c0037714>] (wake_up_process+0x18/0x1c)
    [94307.835174]  r6:00000019 r5:be952bb4 r4:00000019 r3:c044fe60
    [94307.841186] [<c00376fc>] (wake_up_process+0x0/0x1c) from [<c044fda0>] (__mutex_unlock_slowpath+0x28/0x2c)
    [94307.851257] [<c044fd78>] (__mutex_unlock_slowpath+0x0/0x2c) from [<c044fdc8>] (mutex_unlock+0x24/0x28)
    [94307.861083] [<c044fda4>] (mutex_unlock+0x0/0x28) from [<c00b75b8>] (seq_read+0x408/0x418)
    [94307.869720] [<c00b71b0>] (seq_read+0x0/0x418) from [<c00e1564>] (proc_reg_read+0x44/0x60)
    [94307.878326] [<c00e1520>] (proc_reg_read+0x0/0x60) from [<c009d6c0>] (vfs_read+0xb0/0x140)
    [94307.886962]  r5:be952bb4 r4:cf46e1c0
    [94307.890747] [<c009d610>] (vfs_read+0x0/0x140) from [<c009da78>] (sys_read+0x40/0x74)
    [94307.898895]  r8:00000040 r7:be952bb4 r6:cf46e1c0 r5:00000000 r4:00000000
    [94307.906005] [<c009da38>] (sys_read+0x0/0x74) from [<c00131c0>] (ret_fast_syscall+0x0/0x30)
    [94307.914703]  r8:c0013368 r7:00000003 r6:00086a00 r5:00086bcc r4:000869f0
    [94307.921783] Code: e24cb004 e1a04000 e10f5000 f10c0080 (e5900000)
    [94307.928283] ---[ end trace ab3aa54f6031918a ]---

Just a note in case someone sees an error like this.

    Thanks!

    Kevin