TMS570LS3137: Is the watchdog supposed to work within the ESM interrupt context?

jimj2713

Part Number: TMS570LS3137

My desired behavior is that the CPU is reset should an ESM Group2 or Group3 event occur. In my current code, I have a simple while(1) in the ESMInterrupt handler, but the watchdog is not resetting the processor. I suspect that, because the ARM fundamentally has only 2 interrupts (IRQ and FIQ), the watchdog cannot fire while in the ESM exception -- is that true?

Note my watchdog properly works outside the interrupt context.

I have 2 questions then:

1. Should the watchdog be capable of resetting the processor from any interrupt or exception context, and if so, is there a register configuration I missed to allow this?

2. If the watchdog is not an option, is there a way to reset the processor from this context? I tried the CPURSTCR register, but it did not work. Also in this register description, it states "Only the CPU is reset whenever this bit is toggled. There is no system reset", which suggests there is NO way to reset the processor via software.

Thanks,

Jim

over 7 years ago

0 Chuck Davenport over 7 years ago

TI__Guru 59540 points

Hello Jim,

My desired behavior is that the CPU is reset should an ESM Group2 or Group3 event occur. In my current code, I have a simple while(1) in the ESMInterrupt handler, but the watchdog is not resetting the processor. I suspect that, because the ARM fundamentally has only 2 interrupts (IRQ and FIQ), the watchdog cannot fire while in the ESM exception -- is that true?

This is not true. The DWD and DWWD are independent of the CPU and result in either a warm reset or an NMI dependent on configuration. It should not matter if the CPU is processing an interrupt or not since this is one of the scenarios in which the CPU could become "lost."

Also worth noting is that Group2 errors result in an NMI, not an FIQ or IRQ as would be the case with Group 1 errors if so configured. Group 3 errors do not result in an interrupt at all since, in most cases, the CPU will issue an abort in response to these critical fault types. See below from the datasheet.

1. Should the watchdog be capable of resetting the processor from any interrupt or exception context, and if so, is there a register configuration I missed to allow this?

Yes, the DWD or DWWD should trip regardless of whether the CPU is executing code as a result of an ISR, exception, or otherwise. If the DWD is triggering a reset outside of the ISR, then there is no reason I can think of for it not to do the same within the context of the ISR. In either case, the CPU is simply fetching instructions and executing them. The ISR really only impacts program flow.

2. If the watchdog is not an option, is there a way to reset the processor from this context? I tried the CPURSTCR register, but it did not work. Also in this register description, it states "Only the CPU is reset whenever this bit is toggled. There is no system reset", which suggests there is NO way to reset the processor via software.

Please see the System Exception Control Register (SYSECR) described in the TRM in section 2.5.1.47. This is the register used to trigger a reset from software.
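
A minimal sketch of a software reset through SYSECR follows. The register address (0xFFFFFFE0) and the bit position (RESET1 at bit 15, where writing 1 requests the reset) are assumptions written from memory, not taken from this thread, so please check them against section 2.5.1.47 of the TRM. The helper name request_software_reset is hypothetical. The register pointer is passed in as a parameter so the function can also be exercised off-target.

```c
#include <stdint.h>

/* Assumed values; verify against the TMS570LS31x TRM before use. */
#define SYSECR_ADDR        0xFFFFFFE0u  /* assumed address of SYSECR */
#define SYSECR_RESET1_BIT  (1u << 15)   /* assumed: writing 1 requests a software reset */

/* Hypothetical helper: sets the assumed RESET1 bit in the register
 * pointed to by sysecr and leaves all other bits unchanged. */
static inline void request_software_reset(volatile uint32_t *sysecr)
{
    *sysecr |= SYSECR_RESET1_BIT;
}
```

On the target, the call would look like request_software_reset((volatile uint32_t *)SYSECR_ADDR); followed by an empty loop while the reset takes effect. With HALCoGen-generated headers the same register should be reachable as systemREG1->SYSECR.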

In regard to using the DWD for this purpose, is your project a functional safety project? If so, you may want to consider the effect of a common cause failure in the clock subsystem. If such a failure caused the clock to slow or accelerate, the DWD would potentially count at the same degraded clock rate as the CPU, so the system would execute much slower or faster than intended without the watchdog noticing. If the clock failed entirely (stopped), the on-chip WD would also stop counting and would never trigger a reset. If you want to continue using the DWD in a safe manner, it would make sense to provide an external clock using one of the EXTCLKIN pins to drive the RTI. The maximum for this clock is 16.67MHz.

0 Chuck Davenport over 7 years ago

TI__Guru 59540 points

Jim,

I forgot to mention, if this is something you feel fairly confident about and you have a relatively simple CCS project you can send that demonstrates the issue, I can take a look at it on my bench if you can zip the project and post it on this thread.

0 jimj2713 over 7 years ago in reply to Chuck Davenport

Expert 2375 points

I'll see if I can break on this failure and look at the watchdog registers. I should be able to see the count and status change.

0 jimj2713 over 7 years ago in reply to jimj2713

Expert 2375 points

I have some additional information on this issue. I was reproducing this by forcing a lockstep, parity, or double-bit TCRAM error and watching the behavior, but my test code was invoked from within the context of an interrupt (specifically the N2HET-bound interrupt). From this interrupt context, forcing one of these group2 ESM errors causes the CPU to hang and no watchdog fires.

I was building a simple application to send to you to demonstrate this, and my simple app did the same thing, but from the background task (from main.c, not within an interrupt), and the watchdog properly resets the CPU for all of these test cases.

I am not sure if this is a problem anymore and I wanted your input. If you think the watchdog should fire always -- no matter what context the code is in, I should be able to send you an application that demonstrates this. If my original test code was invalid by forcing test-mode parity/lockstep errors from within an interrupt, please tell me so and I will happily change it.

Thanks,
Jim

0 Mukul Bhatnagar over 7 years ago in reply to jimj2713

TI__Guru* 83815 points

Hi Jim
I am auditing a few older forum posts that were potentially not resolved. Since it has been a while since this post was open and eventually locked, I wanted to check if you were able to resolve the issue or need further guidance?

Regards
Mukul

0 jimj2713 over 7 years ago in reply to Mukul Bhatnagar

Expert 2375 points

Mukul,

Thanks for the followup; this issue was not resolved, but we have worked around it in our design. I found during my unit-testing, and it was confirmed when we subjected our boards to radiation testing, that the processor would sometimes halt -- and the watchdog would not reset the processor. I know that the watchdog is properly configured because we also had watchdog events. In our design we had to add an external hardware-watchdog to overcome this. Please let me know if I can provide any additional information.

Jim

0 Sunil Oak over 7 years ago in reply to jimj2713

TI__Mastermind 49120 points

Hi Jim,

What is the configuration of the Digital Windowed Watchdog Reaction Control (RTIWWDRXNCTRL) register in your application? I suspect it is configured to generate a non-maskable interrupt instead of a reset. You may be choosing to generate a reset from this NMI service routine.
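
As a rough illustration of what to look for in that register, the sketch below decodes the reaction field. It assumes the field occupies the low 4 bits, that 0xA selects the NMI and that 0x5 (and any other value) selects a reset. That encoding is recalled from the TRM and is not stated in this thread, so treat it as an assumption. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed encoding of the RTIWWDRXNCTRL reaction field; verify in the TRM. */
#define WWDRXN_MASK  0xFu  /* assumed: reaction field is bits 3:0 */
#define WWDRXN_NMI   0xAu  /* assumed: 0xA selects NMI, anything else reset */

typedef enum {
    WWD_REACTION_RESET,
    WWD_REACTION_NMI
} wwd_reaction_t;

/* Hypothetical helper: classify a raw RTIWWDRXNCTRL value. */
static inline wwd_reaction_t wwd_reaction(uint32_t rtiwwdrxnctrl)
{
    return ((rtiwwdrxnctrl & WWDRXN_MASK) == WWDRXN_NMI)
               ? WWD_REACTION_NMI
               : WWD_REACTION_RESET;
}
```

Reading the register in the debugger at the point of the hang and passing the value through a check like this shows which of the two reactions is actually configured.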

Note that if the CPU is already in an FIQ mode due to some other ESM group2 error being intentionally caused, all new FIQs are automatically blocked since the architecture does not support nested interrupts. That is why the watchdog NMI is not serviced until the CPU first exits from the current FIQ.
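
To confirm this on a hung target, it can help to decode the CPSR value shown by the debugger. The sketch below uses the standard ARM CPSR layout (mode in bits 4:0 with 0x11 meaning FIQ mode, F bit at bit 6, I bit at bit 7). The function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define CPSR_MODE_MASK  0x1Fu       /* processor mode field, bits 4:0 */
#define CPSR_MODE_FIQ   0x11u       /* FIQ mode encoding */
#define CPSR_F_BIT      (1u << 6)   /* set: FIQs are masked */
#define CPSR_I_BIT      (1u << 7)   /* set: IRQs are masked */

/* True if the CPSR value indicates the CPU is executing in FIQ mode. */
static inline bool cpsr_in_fiq_mode(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) == CPSR_MODE_FIQ;
}

/* True if further FIQs are currently masked. */
static inline bool cpsr_fiq_masked(uint32_t cpsr)
{
    return (cpsr & CPSR_F_BIT) != 0u;
}
```

If both checks are true while the CPU sits in the while(1) loop, a watchdog configured for NMI will stay pending and never be serviced, which matches the behavior described above.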

Regards,
Sunil

0 Mukul Bhatnagar over 7 years ago in reply to jimj2713

TI__Guru* 83815 points

Hi Jim
Ok thanks. Let me follow up with the team to see if they can provide any additional pointers on this.

0 jimj2713 over 7 years ago in reply to Mukul Bhatnagar

Expert 2375 points

Sunil,
The RTIWWDRXNCTRL register is set to a 5, which should cause a reset. I think the problem is explained by your second comment -- if the CPU is already in an FIQ context (caused by some ESM group2 event), AND if that event cannot be returned from (like a lockstep or double-bit memory error), then the original FIQ prevents the watchdog reset?
Jim

0 Sunil Oak over 7 years ago in reply to jimj2713

TI__Mastermind 49120 points

Jim,

A watchdog reset is not prevented by an FIQ. A watchdog reset causes a system reset, which in turn also disables the watchdog until it is enabled by the application again.

Are all reset sources enabled in your application? Your initial description of the CPU being halted indicates a failure in the clocking. This would cause a system reset upon detection of an oscillator fault, or a PLL slip. These two features can be disabled though, so it would help to check if they are enabled as system reset sources.
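
A sketch of how these checks might look is below. The bit positions are assumptions recalled from the TRM and HALCoGen headers rather than from this thread: ROS (reset on PLL slip) at bit 31 and ROF (reset on oscillator fail) at bit 23 of PLLCTL1, and the OSCRST and WDRST flags at bits 14 and 13 of SYSESR. All function names are hypothetical, and the values should be verified before relying on them.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed bit positions; verify against the TRM. */
#define PLLCTL1_ROS    (1u << 31)  /* assumed: reset on PLL slip enable */
#define PLLCTL1_ROF    (1u << 23)  /* assumed: reset on oscillator fail enable */
#define SYSESR_OSCRST  (1u << 14)  /* assumed: last reset from osc fail / PLL slip */
#define SYSESR_WDRST   (1u << 13)  /* assumed: last reset from watchdog */

/* Is a PLL slip configured to cause a system reset? */
static inline bool reset_on_pll_slip_enabled(uint32_t pllctl1)
{
    return (pllctl1 & PLLCTL1_ROS) != 0u;
}

/* Is an oscillator failure configured to cause a system reset? */
static inline bool reset_on_osc_fail_enabled(uint32_t pllctl1)
{
    return (pllctl1 & PLLCTL1_ROF) != 0u;
}

/* Did the most recent reset come from the watchdog? */
static inline bool last_reset_was_watchdog(uint32_t sysesr)
{
    return (sysesr & SYSESR_WDRST) != 0u;
}

/* Did the most recent reset come from the clock monitor? */
static inline bool last_reset_was_osc_fail(uint32_t sysesr)
{
    return (sysesr & SYSESR_OSCRST) != 0u;
}
```

Checking PLLCTL1 after initialization shows whether the clock faults are enabled as reset sources, and reading SYSESR early in startup shows which source caused the previous reset.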

Regards,
Sunil
