AM335X Watchdog Reboot Detection

We are using the TI SDK kernel 3.14.26. When using the watchdog driver, WDIOC_GETBOOTSTATUS never returns anything other than zero, even though we are certain that the device is being reset by a watchdog trip.

Looking at the probe function in omap_wdt.c, the following lines suggest that this should be possible if the right information is provided:

         if (pdata && pdata->read_reset_sources)
                 rs = pdata->read_reset_sources();
         else
                 rs = 0;
         omap_wdt->bootstatus = (rs & (1 << OMAP_MPU_WD_RST_SRC_ID_SHIFT)) ?
                                 WDIOF_CARDRESET : 0;

Currently the first conditional evaluates to false. It appears that read_reset_sources is not being provided by the platform data. I wouldn't expect this, since it appears to be a simple read of a register equivalent to PRM_RSTST. The AM335x has this register available, so I would expect this to work. Am I missing something?
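
For reference, a minimal user-space sketch of the query being discussed, assuming the standard Linux watchdog API from <linux/watchdog.h>. The helper names (ex_read_bootstatus, ex_is_watchdog_reset) are hypothetical and not part of any SDK:

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Hypothetical helper: query the boot status from an already opened
 * watchdog device. Returns the WDIOF_* flag mask, or -1 on error.
 */
int ex_read_bootstatus(int fd)
{
        int flags = 0;

        if (ioctl(fd, WDIOC_GETBOOTSTATUS, &flags) < 0)
                return -1;
        return flags;
}

/* Hypothetical helper: 1 if the mask reports a watchdog-caused reset. */
int ex_is_watchdog_reset(int flags)
{
        return (flags & WDIOF_CARDRESET) ? 1 : 0;
}
```

A caller would open /dev/watchdog and pass the descriptor to ex_read_bootstatus. Keep in mind that opening the device normally starts the watchdog, so the caller has to keep feeding it or close it in whatever way the driver configuration allows.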

  • Going through the machine files, it appears that the AM33XX sources don't declare or assign a prm_read_reset_sources function like the 44xx and 2xxx devices do. A sample of this would be:

     /**
      * omap44xx_prm_read_reset_sources - return the last SoC reset source
      *
      * Return a u32 representing the last reset sources of the SoC.  The
      * returned reset source bits are standardized across OMAP SoCs.
      */
     static u32 omap44xx_prm_read_reset_sources(void)
     {
             struct prm_reset_src_map *p;
             u32 r = 0;
             u32 v;
     
             v = omap4_prm_read_inst_reg(OMAP4430_PRM_DEVICE_INST,
                                         OMAP4_RM_RSTST);
     
             p = omap44xx_prm_reset_src_map;
             while (p->reg_shift >= 0 && p->std_shift >= 0) {
                     if (v & (1 << p->reg_shift))
                             r |= 1 << p->std_shift;
                     p++;
             }
     
             return r;
     }

    Later that function is assigned to a structure for the platform:

     static struct prm_ll_data omap44xx_prm_ll_data = {
             .read_reset_sources = &omap44xx_prm_read_reset_sources,
             .was_any_context_lost_old = &omap44xx_prm_was_any_context_lost_old,
             .clear_context_loss_flags_old = &omap44xx_prm_clear_context_loss_flags_old,
     };

    The prm33xx files do nothing like this. I believe that this is the reason we aren't getting a reboot cause. Why is this not being done? Am I looking in the wrong place, or is this provided some other way now? The register exists for these values to be read.

  • Hi Matt,

    I will forward this to the SW team.

  • Hi Matt,

    You are correct, the implementation for am335x devices is quite different from the 44xx & 2xxx devices. I am looking into this.

    Have you checked the arch/arm/mach-omap2/prm_common.c & arch/arm/mach-omap2/wdt_timer.c source files? Both are built into the Sitara kernel, and they contain some prm_ll_data initialization and a generic prm_read_reset_sources function.

    Best Regards,
    Yordan
  • Yordan,

    I have looked through the prm_common.c and wdt_timer.c source files. prm_common.c appears to be the abstracted, device-independent library. It relies on the lower-level, device-specific libraries to assign all of the right register values and to place the proper function pointers for device-specific actions into the prm_ll_data struct.

    wdt_timer.c makes use of the prm_common.c functions. Without the low-level assignment, the calls to prm_common for the reset sources will not work.

    wdt_timer.c makes this assignment:

    pdata.read_reset_sources = prm_read_reset_sources;

    prm_read_reset_sources makes this call:

    if (prm_ll_data->read_reset_sources)
            ret = prm_ll_data->read_reset_sources();

    As far as I can tell, read_reset_sources is never assigned in the prm_ll_data for the am33xx devices. I could probably add this myself, but I suspect that I am missing something here. Thank you for your help.
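
    The chain described above can be modeled in a few lines of plain C. This is only a simplified sketch of the indirection, not the kernel code: every ex_ name is hypothetical, the shift value 3 for the watchdog reset source ID is an assumption, and the model returns 0 when nothing is registered, which may differ from the default the real prm_read_reset_sources uses.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_MPU_WD_RST_SRC_ID_SHIFT 3      /* assumed standardized bit */
#define EX_WDIOF_CARDRESET         0x0020 /* value of WDIOF_CARDRESET */

/* Simplified stand-in for the kernel's struct prm_ll_data. */
struct ex_prm_ll_data {
        uint32_t (*read_reset_sources)(void);
};

/* Set by SoC-specific code; stays NULL if the SoC never registers. */
static const struct ex_prm_ll_data *ex_registered;

void ex_prm_register(const struct ex_prm_ll_data *d)
{
        ex_registered = d;
}

/* Generic accessor: falls back to 0 when no callback is available. */
uint32_t ex_prm_read_reset_sources(void)
{
        if (ex_registered && ex_registered->read_reset_sources)
                return ex_registered->read_reset_sources();
        return 0;
}

/* Same masking as the bootstatus computation in the omap_wdt probe. */
int ex_wdt_bootstatus(void)
{
        uint32_t rs = ex_prm_read_reset_sources();

        return (rs & (1u << EX_MPU_WD_RST_SRC_ID_SHIFT)) ?
                EX_WDIOF_CARDRESET : 0;
}

/* Example SoC callback that always reports a watchdog reset. */
uint32_t ex_soc_read_reset_sources(void)
{
        return 1u << EX_MPU_WD_RST_SRC_ID_SHIFT;
}

const struct ex_prm_ll_data ex_soc_ll_data = {
        .read_reset_sources = ex_soc_read_reset_sources,
};
```

    Until ex_prm_register is called with a struct that carries a callback, ex_wdt_bootstatus can only return 0, which matches the symptom seen on the am33xx.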

    Regards,

    Matt Minga

  • Hi Matt,

    Sorry for the delayed response.

    I also couldn't find a prm_read_reset_sources function dedicated specifically to the am33xx. I don't know what the reasons were for not implementing this.

    matt minga said:
    I could probably add this myself, but I suspect that I am missing something here.

    I think you are ok to give it a try and add this code yourself to see if it will work. 

    Best Regards, 

    Yordan

  • Yordan,

    Looking through the sources, this task appeared to be simpler than it is. I wrote a read reset sources function in prm33xx.c and assigned it to the .read_reset_sources member of an instance of the prm_ll_data struct. I then registered that struct using the prm_register call inside a function called am33xx_prm_init. The call was added to am33xx_init_early, defined in io.c. This all follows the examples in the prm3xxx.c files.
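
    A sketch of what such a translation function might look like, written as stand-alone C so it can be tried outside the kernel. It reuses the table-driven loop from the 44xx sample earlier in the thread. All ex_ and EX_ names are hypothetical, and both the PRM_RSTST bit positions and the standardized source IDs are assumptions that should be checked against the AM335x TRM and the kernel's prm.h.

```c
#include <stdint.h>

/* Assumed AM335x PRM_RSTST bit positions (verify against the TRM). */
#define EX_AM33XX_GLOBAL_COLD_RST_SHIFT    0
#define EX_AM33XX_GLOBAL_WARM_SW_RST_SHIFT 1
#define EX_AM33XX_WDT1_RST_SHIFT           4
#define EX_AM33XX_EXTERNAL_WARM_RST_SHIFT  5

/* Assumed standardized OMAP reset source IDs (verify against prm.h). */
#define EX_STD_GLOBAL_COLD_RST_SHIFT 0
#define EX_STD_GLOBAL_WARM_RST_SHIFT 1
#define EX_STD_MPU_WD_RST_SHIFT      3
#define EX_STD_EXTWARM_RST_SHIFT     5

/* Same layout idea as the kernel's struct prm_reset_src_map. */
struct ex_reset_src_map {
        int reg_shift; /* bit position in the SoC register */
        int std_shift; /* bit position in the standardized mask */
};

static const struct ex_reset_src_map ex_am33xx_reset_src_map[] = {
        { EX_AM33XX_GLOBAL_COLD_RST_SHIFT, EX_STD_GLOBAL_COLD_RST_SHIFT },
        { EX_AM33XX_GLOBAL_WARM_SW_RST_SHIFT, EX_STD_GLOBAL_WARM_RST_SHIFT },
        { EX_AM33XX_WDT1_RST_SHIFT, EX_STD_MPU_WD_RST_SHIFT },
        { EX_AM33XX_EXTERNAL_WARM_RST_SHIFT, EX_STD_EXTWARM_RST_SHIFT },
        { -1, -1 }, /* terminator */
};

/* Translate a raw PRM_RSTST value into standardized reset source bits. */
uint32_t ex_am33xx_translate_reset_sources(uint32_t v)
{
        const struct ex_reset_src_map *p = ex_am33xx_reset_src_map;
        uint32_t r = 0;

        while (p->reg_shift >= 0 && p->std_shift >= 0) {
                if (v & (1u << p->reg_shift))
                        r |= 1u << p->std_shift;
                p++;
        }
        return r;
}
```

    In the kernel, the raw value would come from a PRM register read rather than a function argument; the argument is used here only to keep the sketch self-contained.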

    It would appear that the assignment of the read reset sources happens in omap_init_wdt. A call is made to prm_read_reset_sources, which ultimately returns a function pointer to what I defined. The problem is that this never happens because a check for of_have_populated_dt is failing. I am guessing, based on a single comment, that the device tree has not finished being parsed yet. Once this fails, there is never another call to omap_init_wdt, yet the driver's probe function is called and the watchdog is working.

    Is there any insight you can provide on why I might be running into the of_have_populated_dt failure? I do have another route I may be able to take to get the read reset sources, but it breaks from what the prior files have done. It is a bit more brute force: there wouldn't be any abstraction around the raw register access, unlike the more structured approach.

    *Edit - I should also state that the point where the of_have_populated_dt check fails is very early in the boot process. It is timestamped around 0.1912. The call into the watchdog probe is around 1.2768.

    Regards,

    Matt Minga

  • I was hoping to get more information on this issue. I have resumed working on this topic. Thank you.
  • In looking at this again, I may have misstated what I was thinking at the time. It looks like it is failing to register the read_reset_sources function because there is a populated device tree.

    If so, what is the device-tree-appropriate method for registering the read reset sources? I see the prm files have been updated upstream, but nothing for the read reset sources.

    I currently don't have a strong foundation in Linux device drivers. I am trying to avoid adding this feature in such a way that it breaks the driver layering, specifically by adding a device specific call in the omap watchdog probe (bad) or watchdog ioctl (worse).

    I have been able to read the reset source register (PRM_RSTST) from the console of our board using busybox devmem2. The bits of this register change according to the type of reset that occurs. I did notice that those bits do not appear to be cleared unless I do it manually. This makes me think that I am missing something, because I would assume that the OS would pick up and store this value and then clear the register so it is ready to catch the next reset.

    This makes me think I don't have the PRM drivers loaded. Is this possibly the case? Am I missing something in how the reset could be picked up by the watchdog driver? Thank you.

    Regards,
    Matt
  • Hi Matt,

    matt minga said:
    This makes me think I don't have the PRM drivers loaded, is this possibly the case?

    This shouldn't be the case, unless you've modified the kernel configuration when building. By default, PRCM functionality should be built into the kernel image.

    I am digging in the sources to try and figure out a possible solution for your problem, however I'm not quite sure what could be the reason for the described behaviour. 

    matt minga said:
    where is the device tree appropriate method for registering the read reset sources

    This is not done in the device tree. The dts for the wdt is quite simple (see am33xx.dtsi); it declares only the register space, interrupts, hwmod and compatible driver:

    wdt2: wdt@44e35000 {
            compatible = "ti,omap3-wdt";
            ti,hwmods = "wd_timer2";
            reg = <0x44e35000 0x1000>;
            interrupts = <91>;
    };

    I think the focus needs to be on the sources located in the arch/arm/mach-omap2 folder. They describe the machine layer drivers.


    Best Regards,

    Yordan

  • I have modified the kernel configuration, but it has not been expressly modified for the PRCM. Attached is the config.

    .config.zip

    Thank you for the information regarding the device tree. I wanted to be sure that I wasn't missing something that was undocumented. Just to be clear, my long-term goal is to be able to view the reboot causes using the watchdog API. Currently I am unable to do so due to the findings documented in this thread.

    I will continue to look through the mach-omap2 for a possible solution. Thank you for your help.

    Regards,

    Matt

  • I am also having this same exact issue. I need to be able to determine whether the reset was a warm or cold reset from user space by looking at /dev/watchdog. Have you found a solution to this problem?
  • Patrick,

    I wound up having to add it in myself. Unfortunately I was unable to spend the time to implement it in a way that would be a patch-worthy submission following the current Linux driver guidelines. It does work for our product though. I have not checked whether updated versions of the SDK altered the behavior of the driver. I also checked the mainline at the time I did this and didn't see any movement toward making this more functional.

    The value that gets reported when pulling the reset cause from the watchdog driver is just masked to indicate whether a reset occurred or not. That really isn't helpful. To give it the desired granularity I mapped the memory and did a raw register read. I then cleared the watchdog reset cause register, so as to be ready for the next reset event. I then mapped the reset to the possible watchdog reset enumerations and stored it as the boot status. This way accessing the reset cause remained a function of the watchdog driver and I could maintain compatibility with software I use on two different platforms.
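
    A rough sketch of that idea, assuming the PRM_RSTST bits are write-one-to-clear and sit at the positions shown. The thread does not say which WDIOF_* flags were chosen, so the mapping below is purely illustrative, and all ex_ and EX_ names are hypothetical.

```c
#include <stdint.h>
#include <linux/watchdog.h>

/* Assumed AM335x PRM_RSTST bits (verify against the TRM). */
#define EX_RSTST_WDT1_RST          (1u << 4)
#define EX_RSTST_EXTERNAL_WARM_RST (1u << 5)

/*
 * Read the reset status and write the same value back. On a
 * write-one-to-clear register (assumed here) this clears exactly the
 * bits that were set. The pointer must refer to an already mapped
 * register.
 */
uint32_t ex_latch_and_clear(volatile uint32_t *rstst)
{
        uint32_t v = *rstst;

        *rstst = v;
        return v;
}

/* Illustrative mapping from raw reset bits to watchdog boot flags. */
int ex_rstst_to_bootstatus(uint32_t v)
{
        int status = 0;

        if (v & EX_RSTST_WDT1_RST)
                status |= WDIOF_CARDRESET;
        if (v & EX_RSTST_EXTERNAL_WARM_RST)
                status |= WDIOF_EXTERN1;
        return status;
}
```

    In a driver, the result of ex_rstst_to_bootstatus would be stored once at probe time and later returned by WDIOC_GETBOOTSTATUS.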

    I hope this helps you.

    Regards,

    Matt

  • Patrick,

    Your timing is pretty good. I am actually just starting a migration to the latest SDK now. It appears that some plumbing has been added to set things up better for getting at the reset causes information, but ultimately allowing the cause to be queried through the driver has to be added in. Please let me know if you have any questions.

    Regards,

    Matt

  • Hey Matt,

    Thank you for your quick response. I pretty much did what you described in your messages in this post, with the exception of modifying the drivers, since I didn't want to break anything. Due to time constraints I needed to use a workaround to get at this data. I'm currently using "devmem 0x44E00F08 8" to read the PRM_RSTST register from user space, which is where I need the information. It seems to be working fine, but for some reason I can't get devmem to reset the status after I've read it. This is not a problem, since I only need to know whether there was a warm reset or a cold reset. A pure cold reset returns 0x01 and a warm reset returns 0x03 (the warm reset bit plus the cold reset bit that was not cleared). However, I would have preferred just querying GETBOOTSTATUS from /dev/watchdog since it's much cleaner.
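
    The same workaround can be sketched in C instead of calling devmem. This assumes the register address quoted above, the 0x01 and 0x03 values observed on this board, and access to /dev/mem (usually root only). The function and enum names are hypothetical.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define EX_PRM_RSTST_PHYS 0x44E00F08u /* address used with devmem above */
#define EX_RSTST_COLD_BIT (1u << 0)   /* set in the observed 0x01 */
#define EX_RSTST_WARM_BIT (1u << 1)   /* additionally set in 0x03 */

enum ex_reset_kind { EX_RESET_UNKNOWN, EX_RESET_COLD, EX_RESET_WARM };

/* The warm bit wins because the cold bit stays set until cleared. */
enum ex_reset_kind ex_classify_reset(uint32_t rstst)
{
        if (rstst & EX_RSTST_WARM_BIT)
                return EX_RESET_WARM;
        if (rstst & EX_RSTST_COLD_BIT)
                return EX_RESET_COLD;
        return EX_RESET_UNKNOWN;
}

/* Read one 32-bit register via /dev/mem. Returns 0 on success, -1 on error. */
int ex_read_phys_reg32(off_t phys, uint32_t *out)
{
        long page = sysconf(_SC_PAGESIZE);
        off_t base = phys & ~((off_t)page - 1);
        void *map;
        int fd;

        fd = open("/dev/mem", O_RDONLY | O_SYNC);
        if (fd < 0)
                return -1;
        map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, base);
        close(fd); /* the mapping stays valid after close */
        if (map == MAP_FAILED)
                return -1;
        *out = *(volatile uint32_t *)((char *)map + (phys - base));
        munmap(map, (size_t)page);
        return 0;
}
```

    A caller would pass EX_PRM_RSTST_PHYS to ex_read_phys_reg32 and feed the result to ex_classify_reset. Only run this on the target board, since the address is meaningless on other hardware.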

    I'm currently using Buildroot (2/2015 RC2) and the 3.12.10 kernel that came with it. I don't really have the time to upgrade to the latest TI kernel and test it. Everything is working well and I don't want to break anything. I'm curious to see if/when they update the omap2 drivers to include the read_reset_sources() functionality for the AM335X devices. I will keep an eye on this thread. If you can remember to update it that would be awesome.

    Thanks again,

    Patrick
  • Patrick,

    You are welcome. I will keep updating while I continue the migration.

    Just a note on the reboot causes. I think we absolutely had to clear the register because those values are not exclusive. They will stack up. So if you have a cold reset and then a warm I believe we saw both bits set. That may just be our implementation/board, but just something I thought of. Try rebooting your board various ways to see what is happening.

    We are able to write to the register while we are in the driver. I don't know what the limitations of devmem are off the top of my head. We are using the __arm_ioremap method. You may want to set your access size to 32 bits; I'm not sure whether you are getting rejected due to a mismatch with the register size.

    As for messing with the driver, I had the same reservation as you did. I am by no means a kernel hacker. The population of the reset cause is done at the time of the driver load, so there isn't any long-term usage of the code that is inserted. This at least allowed me to sleep at night.

    Regards,
    Matt
  • Patrick,

    It looks like the watchdog driver is still not plumbed correctly for reboot cause detection. The PRM files are set up better to allow this information to be provided, but one still has to define the reset causes function, even though there is a placeholder in the structure for it. When the driver is loaded, a device tree in progress is detected, which stops it from registering the reboot causes function. That registration is what would tie the PRM reset causes function to the watchdog driver. I am hesitant to force this registration. Without it, even if the reset causes function for the PRM were defined, one would have to get the information from the PRM driver. This is less than ideal, since we maintain two different platforms and the code base for our product needs to be compatible with both. Using different devices for board reset causes breaks this compatibility.

    It looks like I will be continuing to use the watchdog driver hack for the time being to provide this compatibility. I perceive it to be low risk since it is only executed during the driver probe.

    Regards,
    Matt