Tool/software: Linux
We have a custom AM3517 Sitara-based hardware platform (very similar to the BeagleBoard) running Angstrom Distribution Linux (version 3.19.2). The SD card is set up with the standard MLO -> U-Boot -> Linux kernel partition scheme. In addition, we have a recovery partition that is used if the primary Linux kernel partition has problems and/or the system freezes; both cases are triggered by watchdog resets.
We have a number of these platforms deployed in the field. The issue is that some of the systems reboot into the recovery partition because of a watchdog reset after operating normally for weeks, sometimes months. We have not been able to replicate this in our labs, and we see no pattern as to why or when the reset occurs. We added enhanced logging to check whether the watchdog-feed process is being killed, but there is no indication of that, and there are no OOM logs showing it being killed. The watchdog-feed process is set up to reset the timer every 3 seconds, and the watchdog timeout is set to 3 minutes. The symptoms all indicate that logging, the kernel and the other processes simply freeze, and the watchdog then resets the system once the 3-minute timeout expires.
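For reference, the feed arrangement described above can be sketched roughly as follows. This is not our actual watchdog-feed program, only a minimal illustration using the standard Linux watchdog ioctl interface. The function names (feed, missed_feeds_before_reset), the /dev/watchdog path and the print calls are assumptions made for the example, and whether the driver reports a watchdog-caused reset through WDIOC_GETBOOTSTATUS depends on the platform.

```python
import fcntl
import os
import struct
import time

# ioctl request encoding from the generic Linux ioctl.h layout (ARM, x86):
# direction in bits 30-31, argument size in bits 16-29, magic in bits 8-15.
_IOC_WRITE, _IOC_READ = 1, 2


def _ioc(direction, nr, size=4, magic=ord("W")):
    return (direction << 30) | (size << 16) | (magic << 8) | nr


# Request numbers as defined in linux/watchdog.h (int-sized arguments).
WDIOC_GETBOOTSTATUS = _ioc(_IOC_READ, 2)
WDIOC_KEEPALIVE = _ioc(_IOC_READ, 5)
WDIOC_SETTIMEOUT = _ioc(_IOC_READ | _IOC_WRITE, 6)
WDIOF_CARDRESET = 0x0020  # bootstatus flag: last reset was caused by the watchdog


def _ioctl_int(fd, request, value=0):
    """Issue a watchdog ioctl that passes/returns a single C int."""
    buf = struct.pack("i", value)
    return struct.unpack("i", fcntl.ioctl(fd, request, buf))[0]


def missed_feeds_before_reset(timeout_s, interval_s):
    """How many consecutive feeds must be missed before the timer expires."""
    return timeout_s // interval_s


def feed(device="/dev/watchdog", timeout_s=180, interval_s=3):
    """Hypothetical feeder: 3-minute timeout, kicked every 3 seconds."""
    fd = os.open(device, os.O_WRONLY)  # opening the device arms the watchdog
    status = _ioctl_int(fd, WDIOC_GETBOOTSTATUS)
    if status & WDIOF_CARDRESET:
        print("previous boot ended in a watchdog reset")
    actual = _ioctl_int(fd, WDIOC_SETTIMEOUT, timeout_s)
    print("watchdog timeout set to %d s" % actual)
    while True:
        _ioctl_int(fd, WDIOC_KEEPALIVE)  # kick the timer
        time.sleep(interval_s)


if __name__ == "__main__":
    feed()
```

With these values, missed_feeds_before_reset(180, 3) is 60, so the feeder (or the whole system) has to be stalled for 60 consecutive feed intervals before the reset fires. That is consistent with a full freeze rather than a brief scheduling delay.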
We are looking for suggestions and/or ideas on how to troubleshoot this issue. We have access to the CPU JTAG port.
-Prasad