Linux/AM3517: Debugging unknown Watchdog Timer Reset

Part Number: AM3517

Tool/software: Linux

We have a custom AM3517 Sitara-based hardware platform (very similar to the BeagleBoard) running the Angstrom Linux distribution (3.19.2). The SD card is set up with the standard MLO -> U-Boot -> Linux kernel partition scheme. In addition, we have a recovery partition that is used if the primary Linux kernel partition has problems or the system freezes; both cases are triggered by watchdog resets.

We have a number of these platforms deployed in the field. Some of them reboot into the recovery partition because of a watchdog reset after operating normally for weeks, sometimes months. We cannot reproduce the issue in our labs, and there is no pattern to why or when a system reboots. We have added enhanced logging to see whether the watchdog process is being killed, but there is no indication of that, and there are no OOM logs showing the watchdog-feed process being killed. The watchdog-feed process is set up to reset the timer every 3 seconds, and the watchdog timeout is set to 3 minutes. The symptoms all indicate that logging, the kernel, and other processes simply freeze, and the watchdog timer then triggers a reset after 3 minutes.
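For readers unfamiliar with this kind of setup, a feeder with the settings described above could look roughly like the following sketch. It is not the actual watchdog-feed program from this system. It assumes the watchdog is exposed through the standard Linux watchdog character device at `/dev/watchdog` (the path on this board is an assumption), and the names `feed_forever` and `missed_feeds_before_reset` are hypothetical. The ioctl numbers are derived from `<linux/watchdog.h>` using the generic ioctl encoding used on ARM and x86.

```python
import fcntl
import os
import struct
import time

# ioctl request numbers as defined in <linux/watchdog.h>, built the way the
# kernel's _IOR/_IOWR macros build them with the generic ioctl layout
# (direction in bits 30-31, size in bits 16-29, type 'W' in bits 8-15).
_IOC_WRITE, _IOC_READ = 1, 2


def _ioc(direction, nr, size=4):
    return (direction << 30) | (size << 16) | (ord("W") << 8) | nr


WDIOC_KEEPALIVE = _ioc(_IOC_READ, 5)                # _IOR('W', 5, int)
WDIOC_SETTIMEOUT = _ioc(_IOC_READ | _IOC_WRITE, 6)  # _IOWR('W', 6, int)

FEED_INTERVAL_S = 3   # from the post: timer is reset every 3 seconds
TIMEOUT_S = 180       # from the post: watchdog timeout of 3 minutes


def missed_feeds_before_reset(timeout_s=TIMEOUT_S, interval_s=FEED_INTERVAL_S):
    """Number of consecutive feed intervals that fit into one timeout."""
    return timeout_s // interval_s


def feed_forever(device="/dev/watchdog"):
    """Open the watchdog, request the timeout, then feed it periodically.

    Opening the device arms the watchdog. If this process stops feeding
    for TIMEOUT_S seconds, the hardware resets the board.
    """
    fd = os.open(device, os.O_WRONLY)
    # The driver writes back the timeout it actually applied.
    reply = fcntl.ioctl(fd, WDIOC_SETTIMEOUT, struct.pack("i", TIMEOUT_S))
    applied = struct.unpack("i", reply)[0]
    print("watchdog timeout applied: %d s" % applied)
    while True:
        fcntl.ioctl(fd, WDIOC_KEEPALIVE)  # reset the countdown
        time.sleep(FEED_INTERVAL_S)
```

With these numbers, `missed_feeds_before_reset()` is 60, so a reset implies the feeder made no progress for about 60 consecutive intervals. That is consistent with a whole-system stall rather than a single delayed wakeup. Calling `feed_forever()` on real hardware arms the watchdog, so it should only be run on a board where a reset is acceptable.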

We are looking for suggestions and/or ideas on how to troubleshoot this issue. We have access to the CPU JTAG port.

-Prasad

  • Hi Prasad,

    Welcome to the TI E2E community!

    As you mentioned, the issue is not seen on all boards and the watchdog resets follow no consistent pattern, so it is difficult to suggest a fix without more detailed debugging data, logs, and observations shared in this forum.

    Can you please let us know the following?

    1) What end-use application are you running in the field?
    2) How does the custom board differ from the standard EVM/EVK?
    3) Can you please share the logs from before and during the crash?
    4) What logging mechanisms have you used so far?
    5) Is it possible to replace the most frequently rebooting board with another one and use the faulty board for analysis?


    Thanks,
    Prabhuraj
    BlackPepper Technologies