TMS320F28062: ISR not clearing interrupt flag and other strange behavior after power cycle

Traver Gumaer1

Part Number: TMS320F28062

I have a project using the 28062 to control a large power regulator. Part of the testing is interrupting the input power to the device at intervals as long as 5 seconds.

1st issue:

The real-time interrupt routine would stop running shortly after a power cycle (with the background code still running). I seem to have fixed this by changing the lines that clear the PWM interrupt and set PIEACK. These lines were originally running out of flash even though the ISR was in RAM.

The original code used my own driver functions, which I forgot to move to RAM when I moved the ISR:

pwm_clear_event(PWM_3, EVENT_INT);

interrupts_pieack(PIE_GROUP3);

I replaced these with the usual:

EPwm3Regs.ETCLR.all = 1;

PieCtrlRegs.PIEACK.all = M_INT3;

Is this some sort of pipeline issue?
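
A minimal sketch of the inline acknowledge described above, with the ISR body and both acknowledge writes kept in one RAM-resident function. The register structs are host-side stand-ins (hypothetical), not the TI headers. The section name "ramfuncs" is an assumption and should match whatever the linker command file defines. Nothing here establishes whether flash placement of the helpers was the actual cause.

```c
#include <stdint.h>

/* Host-side stand-ins for the device registers (hypothetical; on target
 * these would be EPwm3Regs and PieCtrlRegs from the TI headers). */
typedef struct { volatile uint16_t ETCLR; } mock_epwm_t;
typedef struct { volatile uint16_t PIEACK; } mock_pie_t;

static mock_epwm_t mock_epwm3;
static mock_pie_t  mock_pie;

#define MOCK_ETCLR_INT  0x0001u  /* event-trigger INT flag clear bit */
#define MOCK_M_INT3     0x0004u  /* PIE group 3 acknowledge bit      */

/* With the TI compiler, place the function in a RAM-loaded section.
 * The section name is an assumption. */
#ifdef __TMS320C2000__
#pragma CODE_SECTION(epwm3_isr_body, "ramfuncs")
#endif
void epwm3_isr_body(void)
{
    /* ... control-loop work would go here ... */

    /* Acknowledge inline, so the ISR does not call into helper
     * functions that may still be linked to flash. */
    mock_epwm3.ETCLR = MOCK_ETCLR_INT;
    mock_pie.PIEACK  = MOCK_M_INT3;
}
```

An alternative with the same effect would be to keep the driver helpers and give each of them the same section placement as the ISR.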

2nd issue:

Immediately after I seemed to fix the first problem, another problem appeared. All interrupts were running (I can see this by toggling external pins), but some functions stopped working. For example, I have a display that gets data over CAN. All data to the display stops even though the pin signal tells me this loop is still running. This function happens to be in RAM. I will test moving it to flash tomorrow and see if this particular function still has a problem. Right now this is only a theory.

I realize these are strange issues, but has anyone seen problems like these? The BOR circuit should be taking care of any brown-out problems, but it looks to me like some code, especially the RAM functions, is getting corrupted during these power-loss tests.

over 8 years ago

0 Adam Dunhoft over 8 years ago

TI__Expert 7885 points

Hello Traver,

If the BOR is tripped, the device will reset. Can you scope the VDDIO, VDD and XRSn signals? This should tell you if the BOR is tripping.

Best Regards,
Adam Dunhoft

0 Adam Haun over 8 years ago

TI__Expert 8950 points

Traver,

Do you set up the flash wait states before enabling the interrupt?

Does your interrupt for CAN come from the CAN module itself, or are you using a timer interrupt?

0 Traver Gumaer1 over 8 years ago in reply to Adam Haun

Intellectual 320 points

The interrupt is not driven by the CAN module; it is an ePWM CTR_ZERO-triggered interrupt. That issue seems to be fixed by moving the interrupt clear directly into the ISR (instead of a function call).

I caught a strange power-on event on the 3.3 V rail, which is why I was looking into the BOR, but many lock-ups have occurred with a clean, quick power start-up as well.

The main problem is, after a power cycle, my device (a regulator) randomly shuts off. At the same time, the CAN freezes. Sometimes the CAN freeze occurs without the shutdown. Because I am removing input power, the debugger disconnects making this very difficult to troubleshoot. When I reconnect, the MCU resets and everything is fine. If I connect and look at memory before starting the MCU, I see memory locations such as the controller output and the voltage feedback are corrupt (they have some random huge value in them). This explains the shutdown because the regulator thinks it is overvoltage. I cannot investigate the CAN registers, because when I connect, the registers are all reset. Maybe I can look further by saving some values to the CAN shadow.

I thought I had found the issue because in comparing with an older version of the code, I had changed the DELAY_US() function to my own function using Timer 2. I noticed a case where I called the function before setting the timer PRD. Moving the setting of the PRD made the problem go away, but when I looked closer, given my settings, the delay function was NEVER CALLED.

Now I am in a tough spot where moving a single line of code causes the failure, but there is NO LOGICAL REASON.

0 Traver Gumaer1 over 8 years ago in reply to Traver Gumaer1

Intellectual 320 points

This may be a red herring, but here is what I have done to make the problem appear and disappear.
In main, I was accidentally calling the microsecond_delay() function (which uses Timer 2) before the Timer 2 period was set:

system_init();
gpio_init();
adc_init();

timer_set_period(CPU_TIMER_2, 90UL);

The change below seems to make the problem disappear:

system_init();

gpio_init();
timer_set_period(CPU_TIMER_2, 90UL);
adc_init();

All the relevant functions are below. My question is WHY DOES THIS WORK? Removing both microsecond_delay() calls completely also works, so this seems to be unrelated to whether the delay actually occurs. Is there a problem in issuing a timer reload when the PRD is 0? Why would a call to this function cause a system-wide failure? The older version of the code used the TI-provided DELAY_US() and has never exhibited the issue through hundreds of power-failure tests.

ADC INIT FUNCTION:
void adc_init(void)
{
    EALLOW;

    adc_regs.ADCCTL1.bit.ADCBGPWD = 1U;
    adc_regs.ADCCTL1.bit.ADCREFPWD = 1U;
    adc_regs.ADCCTL1.bit.ADCPWDN = 1U;
    adc_regs.ADCCTL1.bit.ADCENABLE = 1U;
    adc_regs.ADCCTL1.bit.ADCREFSEL = 0U;

    microsecond_delay(1000);

    adc_regs.ADCCTL2.bit.CLKDIV2EN = 1U;

    microsecond_delay(1000);

    // The ACQPS bits set the sampling window, in ADC clocks. Total conversion
    // time is ACQPS + 1 + 13.
    adc_regs.ADCSOC0CTL.bit.ACQPS = 8U;
    adc_regs.ADCSOC1CTL.bit.ACQPS = 8U;
    adc_regs.ADCSOC2CTL.bit.ACQPS = 8U;
    adc_regs.ADCSOC3CTL.bit.ACQPS = 8U;
    adc_regs.ADCSOC4CTL.bit.ACQPS = 8U;
    adc_regs.ADCSOC5CTL.bit.ACQPS = 8U;
    adc_regs.ADCSOC6CTL.bit.ACQPS = 8U;
    adc_regs.ADCSOC7CTL.bit.ACQPS = 8U;
    adc_regs.ADCSOC8CTL.bit.ACQPS = 8U;
    adc_regs.ADCSOC9CTL.bit.ACQPS = 8U;
. . .

DELAY AND TIMER FUNCTIONS:
void microsecond_delay(uint32_t microsecond_count)
{
    timer_reload(CPU_TIMER_2);

    while(microsecond_count > 0)
    {
        while(timer_check_intflag(CPU_TIMER_2) == 0U);
        timer_clear_intflag(CPU_TIMER_2);

        microsecond_count--;
    }

    return;
}

uint16_t timer_check_intflag(uint16_t timer_number)
{
    uint16_t flag_status;

    switch(timer_number)
    {
        case CPU_TIMER_0:
        {
            flag_status = CpuTimer0Regs.TCR.bit.TIF;
            break;
        }
        case CPU_TIMER_1:
        {
            flag_status = CpuTimer1Regs.TCR.bit.TIF;
            break;
        }
        case CPU_TIMER_2:
        {
            flag_status = CpuTimer2Regs.TCR.bit.TIF;
            break;
        }
        default:
        {
            flag_status = 0u;
            break;
        }
    }

    return flag_status;
}

void timer_clear_intflag(uint16_t timer_number)
{
    switch(timer_number)
    {
        case CPU_TIMER_0:
        {
            CpuTimer0Regs.TCR.bit.TIF = 1U;
            break;
        }
        case CPU_TIMER_1:
        {
            CpuTimer1Regs.TCR.bit.TIF = 1U;
            break;
        }
        case CPU_TIMER_2:
        {
            CpuTimer2Regs.TCR.bit.TIF = 1U;
            break;
        }
        default:
        {
            break;
        }
    }
}

void timer_reload(uint16_t timer_number)
{
    switch(timer_number)
    {
        case CPU_TIMER_0:
        {
            CpuTimer0Regs.TCR.bit.TRB = 1U;
            break;
        }
        case CPU_TIMER_1:
        {
            CpuTimer1Regs.TCR.bit.TRB = 1U;
            break;
        }
        case CPU_TIMER_2:
        {
            CpuTimer2Regs.TCR.bit.TRB = 1U;
            break;
        }
        default:
        {
            break;
        }
    }
}

void timer_set_period(uint16_t timer_number, uint32_t period)
{
    switch(timer_number)
    {
        case CPU_TIMER_0:
        {
            CpuTimer0Regs.PRD.all = period;
            break;
        }
        case CPU_TIMER_1:
        {
            CpuTimer1Regs.PRD.all = period;
            break;
        }
        case CPU_TIMER_2:
        {
            CpuTimer2Regs.PRD.all = period;
            break;
        }
        default:
        {
            break;
        }
    }
}
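
As an illustration only, here is a sketch of a polled delay that cannot spin forever if its timer was never configured or its flag never sets. It does not answer whether a reload with PRD at 0 is harmful on the device. The names guarded_delay, delay_timer_t and the stub functions are hypothetical, the timer is modeled as a plain struct rather than the real CpuTimer2Regs, and a period of 0 is used as the "not configured" marker purely for the example (the actual reset value of the period register should be checked in the device documentation).

```c
#include <stdint.h>

#define DELAY_OK              0
#define DELAY_NOT_CONFIGURED  1
#define DELAY_TIMED_OUT       2

/* Hypothetical stand-in for one CPU timer: the period that was programmed
 * and a function that returns 1 (and clears the flag) once per expiry. */
typedef struct {
    uint32_t period;
    int (*poll_and_clear_flag)(void);
} delay_timer_t;

/* Waits for `ticks` timer expiries, but refuses to start if the period was
 * never set and gives up if the flag does not appear within
 * `max_polls_per_tick` polls. Single exit point. */
int guarded_delay(const delay_timer_t *timer, uint32_t ticks,
                  uint32_t max_polls_per_tick)
{
    int status = DELAY_OK;

    if (timer->period == 0u) {
        status = DELAY_NOT_CONFIGURED;
    }

    while ((status == DELAY_OK) && (ticks > 0u)) {
        uint32_t polls = 0u;

        /* Check the poll budget first so an exhausted budget never
         * consumes (and clears) a flag. */
        while ((polls < max_polls_per_tick) &&
               (timer->poll_and_clear_flag() == 0)) {
            polls++;
        }

        if (polls >= max_polls_per_tick) {
            status = DELAY_TIMED_OUT;
        } else {
            ticks--;
        }
    }

    return status;
}

/* Host-side test doubles (hypothetical): a flag that never sets, and one
 * that sets on every third poll. */
static int stub_flag_never(void) { return 0; }

static int stub_flag_every_third(void)
{
    static unsigned calls = 0u;
    calls++;
    return ((calls % 3u) == 0u) ? 1 : 0;
}
```

A caller such as adc_init() could then check the returned status during bring-up instead of silently hanging or silently skipping the delay.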

0 Adam Haun over 8 years ago in reply to Traver Gumaer1

TI__Expert 8950 points

I'm still looking into this, but do note that the RAM on the F2806x is not automatically cleared on reset. If you're expecting valid values in memory on program start-up, you'll have problems.

0 Traver Gumaer1 over 8 years ago in reply to Adam Haun

Intellectual 320 points

I am not expecting RAM to be set at power up. Since the test I am running is a power loss, the debugger will obviously not stay connected to investigate the problem. After the power has come back on and I see the issue appear, I can re-connect the debugger, but then the MCU resets clearing all the registers. Since RAM is unaffected by the reset, I am attempting to use it as a troubleshooting tool.
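
A rough sketch of that idea: a small snapshot record with a marker and checksum, so that after reconnecting it is possible to tell firmware-written data from power-up garbage. All names (debug_record_t, DBG_MAGIC, the field names) are hypothetical, and on target the variable would have to sit in a section that the startup code does not initialize, which is assumed here to be arranged in the linker command file.

```c
#include <stdint.h>

#define DBG_MAGIC 0xC0DEu   /* arbitrary marker value (hypothetical) */

typedef struct {
    uint16_t magic;       /* DBG_MAGIC once the firmware has written a record */
    uint16_t can_status;  /* snapshot of a status word of interest            */
    uint16_t can_errors;  /* snapshot of an error counter of interest         */
    uint16_t checksum;    /* 16-bit sum of the three fields above             */
} debug_record_t;

/* Assumption: on target this lives in RAM that is not initialized at
 * start-up, so it survives a debugger-induced reset. */
static debug_record_t debug_record;

static uint16_t debug_checksum(const debug_record_t *r)
{
    return (uint16_t)(r->magic + r->can_status + r->can_errors);
}

/* Called from the running firmware whenever a fresh snapshot is wanted. */
void debug_record_save(uint16_t can_status, uint16_t can_errors)
{
    debug_record.magic      = DBG_MAGIC;
    debug_record.can_status = can_status;
    debug_record.can_errors = can_errors;
    debug_record.checksum   = debug_checksum(&debug_record);
}

/* Returns 1 if the record looks like something the firmware wrote,
 * 0 if it is more likely uninitialized or corrupted memory. */
int debug_record_is_valid(void)
{
    return (debug_record.magic == DBG_MAGIC) &&
           (debug_record.checksum == debug_checksum(&debug_record));
}
```

The checksum also gives a cheap hint when the record itself has been overwritten by whatever is corrupting nearby memory.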

At least part of my problem is that the RAM locations holding my feedback values for the regulator are corrupt. It looks like a classic case of an array index going wild, but there are no arrays anywhere near these memory locations, and this does not explain the CAN freeze. The CAN is harder to troubleshoot because the registers are all reset when I reconnect.

If this were a pointer issue or something of the sort, I would expect much more random behavior, and I would expect the issue to happen much more frequently than it does. It is unnerving because I know that unless I find the root cause, it is bound to show up again.

Do you know of any debugger that would allow you to connect without causing a processor reset? The device resets with both the XDS100 and XDS510 even though I have the reset-on-connect box unchecked.

0 Adam Haun over 8 years ago in reply to Adam Haun

TI__Expert 8950 points

It doesn't sound like a watchdog problem, but you might double-check that the watchdog is disabled or fed during this sequence. You could also verify that the timer interrupts aren't enabled before the timer is set up.

0 Adam Haun over 8 years ago in reply to Traver Gumaer1

TI__Expert 8950 points

Sorry; there was some crosstalk. I just saw your most recent message.

You can prevent the debugger from resetting on connection by editing the GEL file. In CCS, go to Tools -> GEL Files. This will open a frame with a list of GEL files. Right-click on the one that's listed, then choose Open. The syntax is basically C. You'll need to comment out the call to GEL_Reset() in OnTargetConnect(). Once you've done that, save the file, right-click on the name of the GEL file in the GEL Files frame, then choose Reload. You should now be able to connect without resetting the MCU.

If you need to load symbols or other code without resetting, comment out the call to GEL_Reset() in OnPreFileLoaded().

0 Traver Gumaer1 over 8 years ago in reply to Adam Haun

Intellectual 320 points

Thanks for all the information. Right before I read your last message, I had a streak of luck and was able to cause the failure with the debugger connected and solve the problem.

My issue was indeed a pointer overrun trashing memory. I have a serial state machine for communication with a voltage sensor on another PCB in the system. One state in the serial state machine had an error where it was checking the data length and was supposed to return from the function if the length was out of range. About a year ago, I started modifying code for MISRA C compliance. Since MISRA stipulates only one return from a function, I modified the serial code from its original form and made a mistake.

This problem appearing depended on many things:
1) The external sensor had to be connected and sending messages.
2) The transmitter and receiver had to get out of sync in a specific way.
3) The incorrect data length value would have to be larger than the holding array for the data.

This is why it was so rare. I imagine I saw this at power up because the two ends of serial communication powered up at different times. Even then it was still rare. Not all versions of the product have this external sensor and never would have exhibited the problem.
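
The faulty code is not shown in this thread, so the following is only an illustration of the pattern involved: a length check that still protects the buffer after a function has been restructured to a single exit point. The names rx_frame_t, rx_store_payload and RX_PAYLOAD_MAX are hypothetical, and 16-bit words are used for the payload on the assumption of a C28x-style target.

```c
#include <stdint.h>
#include <stddef.h>

#define RX_PAYLOAD_MAX 8u   /* hypothetical holding-array size, in words */

typedef struct {
    uint16_t payload[RX_PAYLOAD_MAX];
    uint16_t payload_len;
} rx_frame_t;

/* Copies a received payload into the frame. There is one return statement,
 * and the claimed length is validated before any write happens.
 * Returns 1 if the payload was stored, 0 if it was rejected. */
int rx_store_payload(rx_frame_t *frame, const uint16_t *data,
                     uint16_t claimed_len)
{
    int accepted = 0;

    if ((frame != NULL) && (data != NULL) &&
        (claimed_len <= RX_PAYLOAD_MAX)) {
        uint16_t i;

        for (i = 0u; i < claimed_len; i++) {
            frame->payload[i] = data[i];
        }
        frame->payload_len = claimed_len;
        accepted = 1;
    }
    /* An out-of-range length falls through with nothing written. */

    return accepted;
}
```

The point of the structure is that the copy sits inside the guarded branch, so removing an early return cannot leave the copy reachable with an unchecked length.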

Editing the GEL file to prevent reset will be a very valuable tool for troubleshooting in the future.

Thanks,

Traver

0 Adam Haun over 8 years ago in reply to Traver Gumaer1

TI__Expert 8950 points

Wow, that's a nasty bug! Glad you were able to find it!
