RE: AM623: Watchdog will not reset processor

Other Parts Discussed in Thread: AM623, SK-AM62, SK-AM62B, AM625

continued from https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1338070/am625-how-to-control-the-watchdog 

Hi Nick,

I'm unable to get the watchdog to function in Linux with the V3 patch. The watchdog fails to reset the CPU when it's starved, despite testing with your provided test cases.

I'm currently using SDK version 9.01 and have backported the patch to this version of the driver, as the affected sections of the file haven't changed. I've also tested with the latest stable driver and patch combination without success.

This issue occurs on a custom board with an AM623 processor using SR1.0 HS-FS silicon. I've removed the main_rti2/3/15 instances from the k3-am62-main.dtsi file since this processor is dual-core without a GPU. With this change, Linux recognizes /dev/watchdog0 and /dev/watchdog1.

Interestingly, I could not reproduce the issue on an SK-AM62 EVM with the AM62X SR1.0 GP processor, where the V3 patch works fine. This suggests a potential issue specific to the AM623 variant of the processor.

Could you please try reproducing this issue on an AM623 part using the 9.01 SDK?

Thanks,
Aaron

  • Hello Aaron,

    Thank you for the report. Hmm, I am pretty sure that the SK-AM62B I tested on was also HS-FS silicon. I didn't try running it with the devicetree modifications to make the processor act like an AM623, but off the top of my head I would not expect simply disabling a couple of the watchdog instances to cause the non-disabled RTI instances to misbehave.

    As a reference, you can find the steps documented in the AM62x academy here:
    https://dev.ti.com/tirex/explore/node?node=A__AZ853VXSIrRV0D6eoeZeeg__AM62-ACADEMY__uiYMDcq__LATEST

    This is a weird week - my part of the city lost power and internet today (I had to drive to another city to type this response), I can't access any of my hardware while the power is out, and it looks like power is likely to be out for the next couple of days. Next week I'll be in another country, so I won't have access to my EVMs then either.

    Please do let us know once you've double-checked the steps to modify the devicetree files. If you are seeing any relevant terminal output on boot, please share that as well.

    Please ping the thread next week if I have not replied. I might be able to get one of my team mates to run tests while I'm out of the country.

    Regards,

    Nick

  • Checking with the HW guys, they do not expect any differences in how the watchdog hardware would respond in AM625 vs AM623.

    If you want us to review your devicetree file, feel free to attach that here or send it to us through Michael. I can take a look at the file next week. It might also be interesting to make the same changes to your SK's devicetree file to see if you can replicate the behavior on the starter kit (if you disable all the right devicetree nodes, you should be able to get the AM625 to behave like an AM623).

    Regards,

    Nick

  • Hi Nick,

    Thank you for the feedback. I will try out the suggested approach of making the same changes to the SK's devicetree file to see if I can replicate the behavior on the starter kit. I'll also review my devicetree file and will send it over for review through Michael this week.

    Thanks,

    Aaron

  • Hello Aaron,

    I am still out of the country, but I have been asking around with other team members. One of the developers did report seeing similar behavior to you. I am asking for followup information now (e.g., what tests they ran, what EVMs & SW instances they saw the behavior on, how far into debug they got before they had to pivot to another task, etc).

    If you come across any other useful tests or discover anything else, please let us know. It might take a couple days for me to hear back from the developer, so please ping the thread on Monday if I have not replied by then to make sure your thread stays at the top of my queue.

    Regards,

    Nick

  • Hi Nick,

    Have you heard back from the developer?

    On a related note, I tried to enable the watchdog driver in u-boot but this resulted in a crash/reboot cycle.  I found a similar ticket here:

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1333288/am625-u-boot-crash-when-rti-watchdog-is-enabled

    It looks like we're unable to get the watchdog to function on the AM623 within U-Boot and Linux.

    Could you please escalate this item as it's critical for us to have a working watchdog.  Do you have any AM623 boards that this can be tested on?

    Thanks,

    Aaron

  • Hello Aaron,

    What is the plan?

    I have moved your discussion to a separate thread to help with tracking (previous thread for watchdog not able to be pet, this thread for watchdog not resetting the processor). You can expect to get daily updates here for the next couple of days.

    The developer did not get very far into debug before they had to pivot elsewhere, unfortunately. They are unavailable this week and next week. I might be able to get on the phone with them (we'll see), but most of the debug & test effort is going to be on you and me. Another customer on a slightly different device is also helping with the debug. I'll link to their thread now: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1370422/am62p-am62p 

    I missed your devicetree file while I was out of the country, but I found it now. I'll give another response before signing off for the night.

    Regards,

    Nick

  • Hello Aaron,

    Please create a separate thread for the uboot issue and give me & Michael the link. I'll follow-up offline with the U-Boot people to make sure it gets attention in the near term.

    Regards,

    Nick

  • Hello Aaron,

    Does uboot make the difference? 

    Are you using the version of u-boot within the downloaded TI SDK, or mainline uboot, or something else?

    As discussed on the AM62Px thread, it looks like some of the esm code that I see in AM62x/AM62Px SDK 9.2 uboot is missing from mainline uboot. This applies to AM62x as well - even though the k3-am62x-sk-common.dtsi file is defined in mainline uboot, it is missing the ESM code:

    ti-processor-sdk-linux-am62pxx-evm-09.02.01.09/board-support/ti-u-boot-2023.04+gitAUTOINC+f9b966c674$ grep -r 'esm'
    grep: .git/index: binary file matches
    ...
    arch/arm/dts/k3-am62x-sk-common.dtsi:   mcu_esm: esm@4100000 {
    arch/arm/dts/k3-am62x-sk-common.dtsi:           compatible = "ti,j721e-esm";
    arch/arm/dts/k3-am62x-sk-common.dtsi:           ti,esm-pins = <0>, <1>, <2>, <85>;
    arch/arm/dts/k3-am62x-sk-common.dtsi:   main_esm: esm@420000 {
    arch/arm/dts/k3-am62x-sk-common.dtsi:           compatible = "ti,j721e-esm";
    arch/arm/dts/k3-am62x-sk-common.dtsi:           ti,esm-pins = <160>, <161>, <162>, <163>, <177>, <178>;
    

    That ESM code that I am observing above is in the uboot packaged in AM62x Linux SDK for both 9.0 & 9.2.

    I don't see any other code in uboot or Linux that appears to be setting the ESM outside of the uboot file arch/arm/mach-k3/am625_init.c (although I'm asking around to see if there's something I'm missing). If your uboot devicetree code doesn't define the ESM, perhaps that init code does not initialize the ESM properly?

    If you are not using the version of uboot in the TI SDK download, please try booting the board with the "TI" version of uboot to see if the watchdog starts behaving as expected.

    Can you find a way to replicate your observations on the SK board?

    I would be curious if you could find a way to replicate your observations on the SK board. I will see if I can replicate yall's observations on AM62x or AM62Px tomorrow, but any hints about how I can see what you are seeing would be helpful.

    DTS review 

    I didn't see anything in your Linux devicetree that would make me concerned about watchdog working properly.

    Some minor notes - if you want to discuss more about any of these points, feel free to create a separate thread and we can discuss there.

    1) Does your design generate a PPS output signal from the CPSW Ethernet? If not, the &timesync_router node & K3_TS_OFFSET definition can be removed.

    2) Are you programming the M4F? If not, I can point you to information on how to disable the M4F and free up those DDR memory allocations for the rest of your code.

    3) rtos_ipc_memory_region is not used by Linux - it is just set aside for usage of some of the out-of-the-box examples between M4F and the DM R5F core. If your remote cores (i.e., M4F & DM R5F) are not using that memory, it can be removed.

    Regards,

    Nick

  • Our AM62Px friend is reporting that they are seeing progress after looking into ESM definitions in uboot. I'm asking for more details: 
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1370422/am62p-am62p/5264230#5264230

    So that is the first place I would suggest you take a look today.

    Regards,

    Nick

  • Hi Nick,

    I am using the SDK version of u-boot and not the mainline.  It's from https://git.ti.com/git/ti-u-boot/ti-u-boot.git and is built from the head of branch ti-u-boot-2023.04 with tag 09.01.00.001.

    The esm device tree nodes that are defined in k3-am62x-sk-common.dtsi are already defined in our board-specific top-level DTS file:

    penguin2-r5.dts: mcu_esm: esm@4100000 {
    penguin2-r5.dts: compatible = "ti,j721e-esm";
    penguin2-r5.dts: ti,esm-pins = <0>, <1>, <2>, <85>;
    penguin2-r5.dts: main_esm: esm@420000 {
    penguin2-r5.dts: compatible = "ti,j721e-esm";
    penguin2-r5.dts: ti,esm-pins = <160>, <161>, <162>, <163>, <177>, <178>;

    Our board-specific R5 DTS essentially combines the k3-am62x-r5-sk-common.dtsi and k3-am625-r5-sk.dts into a single file.  I believe this is enough to get the ESM module to initialize.  I believe I validated this a while ago by poking the watchdog registers in u-boot with the following commands:

    mw.l 0xe0000a4 0xa 1
    mw.l 0xe000090 0xA98559DA 1

    This resulted in the board resetting almost immediately.  Would this work with the ESM uninitialized?
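
    For reference, the two writes above can be decoded with a small lookup. This is only a sketch: the register offsets and constant values are assumed to match the upstream Linux rti_wdt driver, and the function name decode_write is made up for illustration. Please check everything against the AM62x TRM.

```python
# Decode U-Boot "mw.l <address> <value>" writes to the RTI0 watchdog.
# ASSUMPTION: offsets and constants below mirror the upstream Linux
# rti_wdt driver; check them against the AM62x TRM before relying on them.

RTI0_BASE = 0x0E000000  # main_rti0: watchdog@e000000

# Register offsets from the RTI base address (assumed, see note above).
RTI_REGS = {
    0x90: "RTIDWDCTRL",      # digital watchdog control (enable)
    0x94: "RTIDWDPRLD",      # preload value (timeout)
    0x98: "RTIWDSTATUS",     # status flags
    0x9C: "RTIWDKEY",        # key register used to pet the watchdog
    0xA0: "RTIDWDCNTR",      # down counter
    0xA4: "RTIWWDRXCTRL",    # reaction on expiry / window violation
    0xA8: "RTIWWDSIZECTRL",  # window size
}

# Values with a known meaning (assumed, see note above).
KNOWN_VALUES = {
    ("RTIDWDCTRL", 0xA98559DA): "enable key: starts the watchdog",
    ("RTIWWDRXCTRL", 0xA): "reaction = NMI/interrupt (routed to the ESM)",
}


def decode_write(address, value, base=RTI0_BASE):
    """Return (register name, meaning) for one 32-bit write."""
    name = RTI_REGS.get(address - base, "unknown")
    meaning = KNOWN_VALUES.get((name, value), "no known meaning")
    return name, meaning


if __name__ == "__main__":
    for addr, val in [(0xE0000A4, 0xA), (0xE000090, 0xA98559DA)]:
        name, meaning = decode_write(addr, val)
        print(f"0x{addr:08x} <- 0x{val:08x}: {name} ({meaning})")
```

    If those assumptions hold, the first write selects the interrupt reaction and the second starts the watchdog with whatever preload value is already in the register.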

    Thanks,
    Aaron

     

  • Hi Nick,

    In regards to the U-boot crash when enabling the RTI watchdog driver, I'll create a new thread once I reproduce it.

    I'm also taking a look into your DTS review comments but as of yet, we aren't using the M4F or R5F.  I'm not sure if that will change in the future but I think it's safe to remove the reserved memory regions.

    I'm uncertain about the PPS output signal and I'll need to check if we use it.

    Thanks,
    Aaron

  • Hello Aaron,

    I went looking through the watchdog clock sources today.

    Summary 

    Do you have a 32kHz oscillator connected to input WKUP_LFOSC0? If so, please verify that the watchdog is getting sourced by that clock source. If not, please show me which clock source you are using. A register dump is ok - I'll show you how I got my register dump below.

    Here's the process I followed 

    It might not be the cleanest approach, but I'm learning as I go and it's 7:20pm on a Friday. I've gotta go home.

    Our hardware team indicated that the watchdog timer should NOT be sourced from the 32kHz RC clock, because that clock source is not necessarily accurate - the period of the waveform can have 50-100% error. "RC clocks are only usually used in "emergencies" when there is no 32KHz clock source.  Because of its error, it should never be relied upon and is typically available just to keep clocking something until you can recover"

    While it LOOKS like our devicetree files are configuring all the AM62x watchdog timer instances to be sourced by the 32kHz RC clock, that is not actually the case:

    k3-am62-main.dtsi
    
        main_rti0: watchdog@e000000 {
            compatible = "ti,j7-rti-wdt";
            reg = <0x00 0x0e000000 0x00 0x100>;
            clocks = <&k3_clks 125 0>;
            power-domains = <&k3_pds 125 TI_SCI_PD_EXCLUSIVE>;
            assigned-clocks = <&k3_clks 125 0>;
            assigned-clock-parents = <&k3_clks 125 2>;
        };
    
        main_rti1: watchdog@e010000 {
            compatible = "ti,j7-rti-wdt";
            reg = <0x00 0x0e010000 0x00 0x100>;
            clocks = <&k3_clks 126 0>;
            power-domains = <&k3_pds 126 TI_SCI_PD_EXCLUSIVE>;
            assigned-clocks = <&k3_clks 126 0>;
            assigned-clock-parents = <&k3_clks 126 2>;
        };
    
        main_rti2: watchdog@e020000 {
            compatible = "ti,j7-rti-wdt";
            reg = <0x00 0x0e020000 0x00 0x100>;
            clocks = <&k3_clks 127 0>;
            power-domains = <&k3_pds 127 TI_SCI_PD_EXCLUSIVE>;
            assigned-clocks = <&k3_clks 127 0>;
            assigned-clock-parents = <&k3_clks 127 2>;
        };
    
        main_rti3: watchdog@e030000 {
            compatible = "ti,j7-rti-wdt";
            reg = <0x00 0x0e030000 0x00 0x100>;
            clocks = <&k3_clks 128 0>;
            power-domains = <&k3_pds 128 TI_SCI_PD_EXCLUSIVE>;
            assigned-clocks = <&k3_clks 128 0>;
            assigned-clock-parents = <&k3_clks 128 2>;
        };
    
        main_rti15: watchdog@e0f0000 {
            compatible = "ti,j7-rti-wdt";
            reg = <0x00 0x0e0f0000 0x00 0x100>;
            clocks = <&k3_clks 130 0>;
            power-domains = <&k3_pds 130 TI_SCI_PD_EXCLUSIVE>;
            assigned-clocks = <&k3_clks 130 0>;
            assigned-clock-parents = <&k3_clks 130 2>;
        };
    

    When we check the TISCI clock documentation:
    https://software-dl.ti.com/tisci/esd/latest/5_soc_doc/am62x/clocks.html

    We can see that clock ID 2 = DEV_RTI1_RTI_CLK_PARENT_CLK_32K_RC_SEL_OUT0 for all of the RTI instances.
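
    To make the lookup above repeatable, here is a rough sketch that pulls the (device ID, clock ID) pairs out of a devicetree property line. The helper names are hypothetical, the device IDs are just the ones from the dtsi excerpt above, and the only clock name it knows is the one stated in this post (clock ID 2 on the RTI devices). Anything else still needs to be looked up in the TISCI documentation.

```python
import re

# Device IDs copied from the k3-am62-main.dtsi excerpt in this post.
RTI_DEVICE_IDS = {
    125: "main_rti0",
    126: "main_rti1",
    127: "main_rti2",
    128: "main_rti3",
    130: "main_rti15",
}

# Matches one "<&k3_clks DEVICE CLOCK>" specifier.
SPEC_RE = re.compile(r"<&k3_clks\s+(\d+)\s+(\d+)>")


def parse_clock_specs(prop_line):
    """Return [(device_id, clock_id), ...] found in a devicetree line."""
    return [(int(dev), int(clk)) for dev, clk in SPEC_RE.findall(prop_line)]


def describe(prop_line):
    """Turn each specifier into a short human-readable description."""
    out = []
    for dev, clk in parse_clock_specs(prop_line):
        node = RTI_DEVICE_IDS.get(dev, f"device {dev}")
        if dev in RTI_DEVICE_IDS and clk == 2:
            # Per this post: clock ID 2 is the CLK_32K_RC_SEL_OUT0 parent.
            clock = "parent CLK_32K_RC_SEL_OUT0"
        else:
            clock = f"clock ID {clk} (look up in the TISCI docs)"
        out.append(f"{node}: {clock}")
    return out


if __name__ == "__main__":
    print(describe("assigned-clock-parents = <&k3_clks 125 2>;"))
```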

    But as per the AM62x clock tree tool, CLK_32K_RC_SEL is actually a clock mux. So we also need to check how that mux is set to see if the 32kHz RC clock is being used, or if another clock is being used.

    Off the top of my head, I'm not sure how to find where that mux is set in the Linux devicetree - I don't see any entries that seem helpful in the TISCI page. So I'm brute-forcing it by checking registers.

    How to verify the clock source? 

    The AM62x SK has a 32kHz oscillator connected to input WKUP_LFOSC0 (which seems to be the same as LFOSC0 in the clock tree tool and the TRM). This is the clock source we want to use for the watchdog timer.

    Let's open up the AM62x clock tree tool: https://www.ti.com/tool/CLOCKTREETOOL

    Navigate to PERIPHERALS > TIMER MODULES > RTI0

    In the lower-right of the screen, I'm going to go to Generated Files > register_dump.rd1

    Whenever I make a change to the clock mux settings, I should see that change reflected in the register dump file.

    I want LFOSC0 as my clock source. So the signal needs to go through CLK_32K_RC_SEL, and MAIN_WWDTCLK0_SEL.

    Let's toggle between different inputs on MAIN_WWDTCLK0_SEL. The register dump will show me what memory address to search for, and what value I should see in that address:

    And then I can do the same for CLK_32K_RC_SEL:

    I can double-check those values against the registers in the TRM:

    and then finally I confirmed the settings on the EVM:

    root@am62xx-evm:~# devmem2 0x108380
    /dev/mem opened.
    Memory mapped at address 0xffffa10b6000.
    Read at address 0x00108380 (0xffffa10b6380): 0x00000001
    root@am62xx-evm:~# devmem2 0x4508058
    /dev/mem opened.
    Memory mapped at address 0xffff9976b000.
    Read at address 0x04508058 (0xffff9976b058): 0x00000003
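
    If you want to compare dumps from several boards, a small parser saves some squinting. This is a sketch only: the function names are made up, and the expected values are simply the two readings from the SK above, not values taken from the TRM.

```python
import re

# Matches devmem2 lines such as:
#   Read at address 0x00108380 (0xffffa10b6380): 0x00000001
READ_RE = re.compile(
    r"Read at address\s+(0x[0-9a-fA-F]+)\s+\(0x[0-9a-fA-F]+\):\s+(0x[0-9a-fA-F]+)"
)

# Values read back on the SK in this post.
EXPECTED = {0x00108380: 0x1, 0x04508058: 0x3}


def parse_devmem2(text):
    """Map physical address -> value for every read in a devmem2 log."""
    return {int(addr, 16): int(val, 16) for addr, val in READ_RE.findall(text)}


def mismatches(text, expected=EXPECTED):
    """Return {address: (expected, actual or None)} for registers that differ."""
    actual = parse_devmem2(text)
    return {
        addr: (want, actual.get(addr))
        for addr, want in expected.items()
        if actual.get(addr) != want
    }


if __name__ == "__main__":
    log = """
    Read at address 0x00108380 (0xffffa10b6380): 0x00000001
    Read at address 0x04508058 (0xffff9976b058): 0x00000003
    """
    print(mismatches(log) or "clock mux registers match the SK")
```

    Paste the terminal output of both devmem2 commands into log. An empty result means the board reads back the same mux settings as the SK.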

    Regards,

    Nick

  • Hi Nick,

    I checked the registers above and I'm retrieving the same values:

    root@penguin2:~# devmem2 0x108380
    /dev/mem opened.
    Memory mapped at address 0xffff85571000.
    Read at address  0x00108380 (0xffff85571380): 0x00000001
    root@penguin2:~# devmem2 0x4508058
    /dev/mem opened.
    Memory mapped at address 0xffff87859000.
    Read at address  0x04508058 (0xffff87859058): 0x00000003
    root@penguin2:~#

    This confirms that we are selecting the external WKUP_LFOSC0 signal.  An RTC (MCP7940N) drives the WKUP_LFOSC0 line via its MFP output.  However, I couldn't detect a 32kHz signal on this pin using my DSO.  This appears to be the culprit!  I'm trying to confirm with a colleague if they're experiencing the same anomaly with a newer HW revision.

    As a further step to identify this as the culprit, I selected the RCOSC clock source to drive RTI0 and this reset the device after roughly 60 seconds (default WD timeout).

    One issue that I can foresee is that the external oscillator source (MCP7940N) requires configuration to drive this signal.  This is being done in Linux using an RTC driver module.  This means that this source isn't available in u-boot (or earlier) unless we add code to configure it via I2C.  Would the internal oscillator be sufficient during the bootloader stages?

    Thanks for pointing me in this direction.  I'll update this post once I know more.

    Thanks,
    Aaron

  • Hello Aaron,

    Glad to hear we are making progress! Hopefully that fixes things.

    Watchdog in uboot

    I won't have time to double-check the uboot drivers tonight, so please take these statements with a grain of salt.

    If you need a watchdog during uboot, I think you could be fine using the internal RC clock, depending on your usecase.

    THING #1: Keep in mind that the RC clock could theoretically be running at twice the speed (or half the speed) that you would expect.

    That's not an exact number - I am not sure if there is a way to really benchmark the worst-case frequency variation. But your usecase would have to be VERY frequency tolerant.

    For one, the watchdog should NOT configure a window (or configure a 100% window) - that way you won't accidentally reset the processor if the RC clock runs slower than the application clock, leading to the application trying to pet the watchdog before the watchdog has opened its hardware window.

    You would also want to make sure there was a big difference between the expected pet frequency and the timeout value. e.g., I would not trust a 20 second timeout with an expected 10 second pet period. I would probably feel comfortable with a 60 second timeout and a 10 second expected pet period, but I would do more research into the expected worst-case behavior of an RC clock, just in case.
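
    To put rough numbers on that, here is a quick sketch of how far the real-world timeout can move when the clock is off. The 0.5x to 2x bounds are just the ballpark from THING #1, not a characterized worst case, and the function names are made up for illustration.

```python
def real_timeout_range(nominal_s, min_ratio=0.5, max_ratio=2.0):
    """Return (shortest, longest) real-time timeout in seconds.

    ratio = actual clock frequency / nominal clock frequency.
    A fast clock (ratio > 1) counts down sooner, so the timeout shrinks.
    ASSUMPTION: the default bounds are a rough guess, not a datasheet figure.
    """
    return nominal_s / max_ratio, nominal_s / min_ratio


def pet_margin(nominal_timeout_s, pet_period_s, min_ratio=0.5, max_ratio=2.0):
    """Seconds of slack between the pet period and the shortest timeout."""
    shortest, _ = real_timeout_range(nominal_timeout_s, min_ratio, max_ratio)
    return shortest - pet_period_s


if __name__ == "__main__":
    for timeout_s, pet_s in [(20, 10), (60, 10)]:
        shortest, longest = real_timeout_range(timeout_s)
        margin = pet_margin(timeout_s, pet_s)
        print(f"{timeout_s}s nominal: {shortest}s to {longest}s real, "
              f"margin with {pet_s}s pets = {margin}s")
```

    With these assumed bounds, the 20 second timeout has no slack at all with 10 second pets, while the 60 second timeout still leaves 20 seconds.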

    THING #2: Keep in mind that the period for a specific watchdog can only be set once per boot.

    If you wanted a different timeout period for your runtime applications, you would need to use a different watchdog.

    THING #3: I'm not sure whether you can actually disable a watchdog once it has started 

    I have not experimented with this, so I might be wrong. But I think Linux might throw a "no way out" error if you try to disable a running watchdog? Off the top of my head I am not entirely sure how this works, or what happens in a usecase like a low power mode transition.

    Please keep us updated - I've got 3 more days before vacation

    I will be working Tuesday - Thursday, but after that I will not have access to my computer until the 2nd half of July. My manager and much of our team will be out of office next week, so it might be difficult for us to continue to support yall if debug needs to continue into next week. I will continue setting aside time specifically to work on your request all 3 of those days.

    Regards,

    Nick

  • another note - by default, the Linux watchdog driver configures a 50% window. So by default, I would say the Linux watchdog driver is NOT compatible with an RC clock source.

  • Hello Aaron,

    Final update: I mentioned that I'm working with other customers on watchdog issues for AM62Ax & AM62Px. It looks like we've solved their issues, but it does NOT look like those threads apply to your usecase (we accidentally copy-pasted AM62x ESM settings into AM62Ax & AM62Px, but the ESM is actually different between devices. That meant that the AM62Ax & AM62Px ESM modules were not receiving the watchdog interrupt to reset the device).

    For future readers, we are working to fix AM62Ax & AM62Px watchdog for SDK 10.0. You can see updates from those customers here:
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1370422/am62p-am62p
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1368655/am62a7-enabling-the-watchdog-and-testing

    Regards,

    Nick

  • I'm not very familiar with windowed watchdogs.  Our Linux use-case for a watchdog would be to open /dev/watchdog and pet it from userland.  For example, the default heartbeat is 60 seconds, petting it every 10 seconds would be ideal.  However, the watchdog driver (rti_wdt) configures a 50% watchdog window.  This would result in us servicing the watchdog outside the open window, resulting in a reset.

    I'm interpreting the correct servicing to happen within the 50% window, so with a 60s timeout, that would be between 30-60 seconds.  Once the user-land process services the watchdog, it reloads the counter and we can service it again between 30-60 seconds from the last service point.  Please let me know if that's correct.

    I'm also curious how the clocks and associated muxes are configured for the RTI during startup.  I know you attempted to look into this along with the associated calls into TISCI and I'm also hitting a dead end here.

  • Hello Aaron,

    Appropriate pet intervals per period

    As long as the hardware watchdog's clock source is within 2% of the Linux driver's clock source (even a 32,500Hz clock source is still within 1% of 32,768Hz), the Linux driver will prevent userspace from petting the watchdog before the window opens.

    So as long as you are using your external RTC instead of the internal RC clock, you would be just fine setting a 10 second pet interval for a 60 second timeout. If the pet happened during the 30 seconds where the window has not opened yet, the Linux driver will just ignore the pet.
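
    Here is a simplified model of that behavior, as a sketch only. It assumes pets before the window opens are dropped as described above, treats a pet landing exactly on the window boundary as accepted, and ignores clock error and Linux overhead. The function name accepted_pets is made up for illustration.

```python
def accepted_pets(timeout_s, window_pct, pet_interval_s, duration_s):
    """Return the times (in seconds) at which a pet reaches the hardware.

    The window is closed for the first (100 - window_pct)% of the timeout
    after each accepted pet. Pets in the closed part are dropped.
    Raises RuntimeError if the watchdog would expire first.
    """
    closed_s = timeout_s * (100 - window_pct) / 100
    last_accepted = 0  # the watchdog is started at t = 0
    accepted = []
    t = pet_interval_s
    while t <= duration_s:
        elapsed = t - last_accepted
        if elapsed >= timeout_s:
            raise RuntimeError(f"watchdog expired before the pet at t={t}s")
        if elapsed >= closed_s:
            accepted.append(t)
            last_accepted = t
        t += pet_interval_s
    return accepted


if __name__ == "__main__":
    # 60 second timeout, 50% window, userspace pets every 10 seconds
    print(accepted_pets(60, 50, 10, 120))
```

    In this model, a 10 second pet interval with a 60 second timeout and a 50% window means the hardware is actually serviced every 30 seconds.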

    If you are curious to see the minimum required pet interval for a specific timeout, please refer to this response on the previous thread:
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1338070/am625-how-to-control-the-watchdog/5185069#5185069

    Where "sw_timeout" would equal 60 seconds in your case, and "max_service_time" would be the pet interval plus any Linux overhead you want to factor in (for example, look at Case 3 which assumes 10 second pet interval and 200ms worst-case Linux overhead).

    How are the clocks & muxes configured during startup? 

    If you look at the signal diagram in the above screenshots from the clock tree tool, the mux MAIN_WWDTCLK0_SEL is represented in the devicetree file by entry "assigned-clock-parents". You can refer to the previous response for how to check the clock IDs in the TISCI documentation.

    As for where the CLK_32K_RC_SEL mux is configured... I am not sure. I'll reach out to the rest of the team to see if anyone knows where we would look.

    Regards,

    Nick

  • TENTATIVE UPDATE

    I am still collecting feedback from other team members. I'm still not sure how that mux is getting set to 4 currently - I would assume that by default, the mux was set to 0 (hardware reset value) or 1 (the first possible input).

    With that said, we should be able to manually set the mux like this:

    &main_rti0 {
            clocks = <&k3_clks 125 0>, <&k3_clks 193 0>;
            assigned-clocks = <&k3_clks 125 0>, <&k3_clks 193 0>;
            assigned-clock-parents = <&k3_clks 125 2>, <&k3_clks 193 4>;
    };

    You should be able to change that k3_clks 193 entry to be set to 1, 2, or 3, and then see the register value of devmem2 0x4508058 change appropriately (with the understanding that in your actual design, you want to set it to 4 in the Linux devicetree, so that it sets the register to 0x3 as it is currently doing).

    Regards,

    Nick

  • Hello Aaron,

    I hope everything is working for you! If you need additional support, feel free to reply back here, or create a new thread if you have a new request.

    Regards,

    Nick