TMS320F28388D: NMI by uncorrectable Error in CM

Simon Schoch

Part Number: TMS320F28388D


The thread below has been closed, so I'm asking here if there are any results from the internal discussion.

e2e.ti.com/.../tms320f28388d-nmi-by-uncorrectable-error-in-cm

over 1 year ago

0 Simon Schoch over 1 year ago

Prodigy 70 points

As a temporary solution, we simply increased the Flash Wait States (RWAIT) to 4, assuming that this would solve the problem. However, even with this setting, an ECC error occurred once on a target. This should not happen as it generates a reset. I would be glad of any input on how the problem can be definitively solved.

0 Ibukun Olumuyiwa over 1 year ago in reply to Simon Schoch

TI__Genius 11071 points

Hi Simon,

This is strange, and at this point I wonder if there is some other factor that is contributing to this issue. What is the clock source being used to power the PLL? I wonder if some irregularity in the source clock could be causing this. And can we also confirm the PLL configuration and clock dividers used?

Thanks,
Ibukun

0 Simon Schoch over 1 year ago in reply to Ibukun Olumuyiwa

Prodigy 70 points

Hi Ibukun

In the actual case where we have observed the ECC error, the circuit is implemented with a quartz crystal:

In newer board versions, where we have also observed the ECC error, an oscillator is used:

Here is the configuration from the software at 120 MHz:

//
// Multipliers and dividers to configure 120MHz AUXPLL output from 16MHz XTAL
//
#define AUX_IMULT       IMULT_120
#define AUX_REFDIV      REFDIV_4
#define AUX_ODIV        ODIV_4
#define AUX_DIV         AUXPLLRAWCLK_BY_1

#define SYSCTL_DCC_BASE1        1


  //
  // Set up AUXPLL control and clock dividers needed for CMCLK
  //  AUXPLLCLK = (XTAL_OSC) * (IMULT) / ((REFDIV) * (ODIV) * (AUXPLLDIV))
  //
  InitAuxPll(XTAL_OSC, AUX_IMULT, AUX_REFDIV, AUX_ODIV, AUX_DIV, SYSCTL_DCC_BASE1);


/**
 * @def TARGET_CMCLKDIV
 *
 * Clock divider for CM clock
 * CM clock = CM divider source clock / 1
 */
#define TARGET_CMCLKDIV 0U

/**
 * @def TARGET_CMDIVSRCSEL
 *
 * Source Auxiliary PLL
 */
#define TARGET_CMDIVSRCSEL 0U

  /* Set the CM Clock to run at 120MHz.
     The CM Clock is a fractional multiple of the AUXPLL Clock (120 MHz) from
     which the USB Clock (60 MHz) is derived. */
  // Configures the divider & the source
  ClkCfgRegs.CMCLKCTL.bit.CMCLKDIV = TARGET_CMCLKDIV;
  ClkCfgRegs.CMCLKCTL.bit.CMDIVSRCSEL = TARGET_CMDIVSRCSEL;

I think the configuration of the Clk for the CM at 120 MHz is correct.
Is there anything else that could be optimized?
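
The arithmetic behind the 120 MHz figure can be checked with a small host-side sketch. The helper name `aux_pll_clk_hz` is hypothetical and not part of the TI support library, and it assumes that `IMULT_120`, `REFDIV_4`, `ODIV_4` and `AUXPLLRAWCLK_BY_1` stand for the plain values 120, 4, 4 and 1:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a TI API.
 * All arguments are plain multiply/divide values, not register encodings.
 */
static uint32_t aux_pll_clk_hz(uint32_t osc_hz, uint32_t imult,
                               uint32_t refdiv, uint32_t odiv,
                               uint32_t auxplldiv)
{
    /* A zero divider is a caller error. */
    assert(refdiv != 0U && odiv != 0U && auxplldiv != 0U);

    /* VCO frequency: oscillator times IMULT, divided by REFDIV. */
    uint64_t vco_hz = (uint64_t)osc_hz * imult / refdiv;

    /* PLL output: VCO divided by ODIV, then by the AUXPLL clock divider. */
    return (uint32_t)(vco_hz / odiv / auxplldiv);
}
```

With a 16 MHz source, `aux_pll_clk_hz(16000000U, 120U, 4U, 4U, 1U)` evaluates to 120000000, which matches the intended 120 MHz CM clock under these assumptions.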

Best regards
Simon

0 Ibukun Olumuyiwa over 1 year ago in reply to Simon Schoch

TI__Genius 11071 points

Hello Simon,

Just an observation -- probably doesn't apply to this case: for the XTAL circuit, the load capacitors C_L1 and C_L2 slightly exceed the max spec of 24pF.

Here's one thing we can try -- shot in the dark: Instead of configuring ODIV to 4, configure ODIV to 2 and then set your CMCLKDIV to 1 (divide by 2). I'm wondering if doing so will help filter out any potential glitches in the raw PLL clock output.

Best regards,
Ibukun

0 Simon Schoch over 1 year ago in reply to Ibukun Olumuyiwa

Prodigy 70 points

Hi Ibukun

Thanks for the input.
ODIV was configured to 2 and CMCLKDIV to 1. However, the ECC error still occurred.

In addition, I have enabled XCLKOUT on both board versions and output AUXPLLRAWCLK / 8. The measured jitter is very low on all versions and the duty cycle is 50%. I therefore rule out a problem with the clock.

When errors occur, the UNC_ERR_ADDR_LOW or UNC_ERR_ADDR_HIGH register points to the address range of the stack.

Best regards

Simon

0 Ibukun Olumuyiwa over 1 year ago in reply to Simon Schoch

TI__Genius 11071 points

Hello Simon,

Thanks, that is good info. I agree we should be able to rule out the clock.

If the address is pointing to the stack, then that implies the error occurred in RAM, not Flash. This is also highly unusual (RAM is zero wait states), but in this case the error should be reflected in the CM_MEMORYERROR_REGS (UCERRFLG, UCM4EADDR etc.). Do you see any error indications in any of the CM memory error registers?

Best regards,
Ibukun

0 Simon Schoch over 1 year ago in reply to Ibukun Olumuyiwa

Prodigy 70 points

Hi Ibukun

In the event of an ECC error, an NMI is triggered. The following registers are read out in the interrupt routine:
UCERRFLG and UCM4EADDR are 0,
UNC_ERR_ADDR_LOW or UNC_ERR_ADDR_HIGH is set to a value close to 0x1FFFCF30 (stack),
bit UNC_ERR_H or UNC_ERR_L in the ERR_STATUS register (FLASH0ECC_BASE 0x400FA600U) is set,
the return address in the current case is always the same: 0x00235D3C.
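
To make the RAM-versus-Flash question explicit, the addresses above can be sorted with a small classifier. This is only a sketch: the function and enum names are hypothetical, and the address ranges are assumptions (CM Flash taken as 512 KB starting at 0x00200000, CM local RAM taken as 0x1FFFC000 to 0x1FFFFFFF) that should be verified against the memory map in the device data sheet:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CM address ranges; verify against the data sheet memory map. */
#define CM_FLASH_START 0x00200000U
#define CM_FLASH_END   0x0027FFFFU
#define CM_RAM_START   0x1FFFC000U
#define CM_RAM_END     0x1FFFFFFFU

enum cm_region { CM_REGION_FLASH, CM_REGION_RAM, CM_REGION_OTHER };

/* Hypothetical helper: report which assumed region an address falls in. */
static enum cm_region cm_classify_addr(uint32_t addr)
{
    if (addr >= CM_FLASH_START && addr <= CM_FLASH_END) {
        return CM_REGION_FLASH;
    }
    if (addr >= CM_RAM_START && addr <= CM_RAM_END) {
        return CM_REGION_RAM;
    }
    return CM_REGION_OTHER;
}
```

Under these assumed ranges, the reported error address 0x1FFFCF30 classifies as RAM while the return address 0x00235D3C classifies as Flash. That is the inconsistency in this case: the Flash ECC logic flags the error, but the latched address lies in the stack.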

I have inserted a NOP before the line of C code at address 0x00235D3C. As a result, the error no longer occurred for several hours.

Here is the ASM listing with the inserted NOP on line 1814:

Best regards

Simon

0 Ibukun Olumuyiwa over 1 year ago in reply to Simon Schoch

TI__Genius 11071 points

Hello Simon,

Thanks, I got that. Could you read the CM_MEMORYERROR_REGS when the error happens and show their contents? I want to establish for sure that this is a Flash ECC error and not a RAM ECC error.

Best regards,
Ibukun

0 Simon Schoch over 1 year ago in reply to Ibukun Olumuyiwa

Prodigy 70 points

Hello Ibukun

I'm not in the office at the moment. I'll do that as soon as I'm back in the office, which will take a few days.

Best regards

Simon

0 Simon Schoch over 1 year ago in reply to Ibukun Olumuyiwa

Prodigy 70 points

Hello Ibukun,

I'm back now and tried to run this firmware in debug mode to read the full set of CM_MEMORYERROR_REGS registers, but unfortunately the ECC error does not occur in debug mode.
I also tried to record some additional registers of interest (ERR_STATUS, ERR_POS, ERR_CNT, ERR_INTFLG) in Release mode when the error occurs, but with that change the ECC error no longer occurred either. However, the ECC error still occurs in the original build.

In the first case where the ECC error occurred, the CM_MEMORYERROR_REGS registers could be read out in debug mode. It looks like this was not a RAM error. Here is the link to this first case:
TMS320F28388D: NMI by uncorrectable Error in CM - C2000 microcontrollers forum - C2000 microcontrollers - TI E2E support forums

best regards

Simon

0 Simon Schoch over 1 year ago in reply to Simon Schoch

Prodigy 70 points

Hello Ibukun

Do you have any ideas on how we can make progress here?

Best regards
Simon

0 Ibukun Olumuyiwa over 1 year ago in reply to Simon Schoch

TI__Genius 11071 points

Hello Simon,

Sorry about the delayed response. I was out of office last week, so I have been catching up.

Not sure if we've covered this before, but could we review the board schematic? Want to rule out any potential power-related issues.

Beyond that, I think the next best option would be if we could get a sample .out file that we can either test or try to simulate on our end.

Best regards,
Ibukun

0 Simon Schoch over 1 year ago in reply to Ibukun Olumuyiwa

Prodigy 70 points

Hello Ibukun

I have sent you the board schematic in a private message.

Simon

0 Ibukun Olumuyiwa over 1 year ago in reply to Simon Schoch

TI__Genius 11071 points

Hello Simon,

Just following up on this. I do not see any issues with the schematic.

Now I have recently come across a situation that was similar to this, and the customer was able to resolve it by using a CMCLKDIV = 2 instead of 1. I know we've tried a few combinations before, but can you check this option? (Reduce the ODIV, and increase the CMCLKDIV instead).

Best regards,
Ibukun

0 Simon Schoch over 1 year ago in reply to Ibukun Olumuyiwa

Prodigy 70 points

Hello Ibukun

Thank you for following this up.

I took the software version where the problem occurred after about 20 minutes and set CMCLKDIV = 2, IMULT = 90 and ODIV = 0 (instead of CMCLKDIV = 0, IMULT= 120 and ODIV = 3). The CMCLK is 120MHz again.

The problem reappeared after 30 minutes (ECC_UNC_ERR_ADDR_LOW 0x1FFFCF30).
This does not seem to solve the problem.
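
For reference, the three divider combinations tried in this thread can be compared side by side. The sketch below uses hypothetical names and rests on several assumptions: the ODIV and CMCLKDIV register fields encode "divide by field + 1" (consistent with ODIV = 3 for ODIV_4 and CMCLKDIV = 1 for divide by 2 as quoted above), REFDIV stays at 4 in every case, and the source clock is 16 MHz:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one clock setup, using register field values. */
struct cm_clk_cfg {
    uint32_t imult;          /* PLL multiplier */
    uint32_t refdiv;         /* reference divide value (assumed 4 throughout) */
    uint32_t odiv_field;     /* assumed: divide = field + 1 */
    uint32_t cmclkdiv_field; /* assumed: divide = field + 1 */
};

/* Combinations reported in the thread, in the order they were tried. */
static const struct cm_clk_cfg tried[] = {
    { 120U, 4U, 3U, 0U },   /* original: ODIV /4, CMCLKDIV /1 */
    { 120U, 4U, 1U, 1U },   /* ODIV /2, CMCLKDIV /2 */
    {  90U, 4U, 0U, 2U },   /* ODIV /1, CMCLKDIV /3 */
};

/* PLL output after ODIV, before the CM clock divider. */
static uint32_t pll_out_hz(uint32_t osc_hz, const struct cm_clk_cfg *c)
{
    assert(c->refdiv != 0U);
    return (uint32_t)((uint64_t)osc_hz * c->imult / c->refdiv
                      / (c->odiv_field + 1U));
}

/* Resulting CM clock. */
static uint32_t cmclk_hz(uint32_t osc_hz, const struct cm_clk_cfg *c)
{
    return pll_out_hz(osc_hz, c) / (c->cmclkdiv_field + 1U);
}
```

Under these assumptions all three entries give a 120 MHz CM clock, while the PLL output ahead of the CM divider is 120 MHz, 240 MHz and 360 MHz respectively. The error was observed with each of them, so the split between ODIV and CMCLKDIV does not appear to be the deciding factor.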

Is there any other idea how to solve the problem?

Best regards

Simon

0 Ibukun Olumuyiwa over 1 year ago in reply to Simon Schoch

TI__Genius 11071 points

Hello Simon,

Our next best option is to try to do a design simulation of your code. Are you able to share a .out file that we can use to investigate?

Thanks,
Ibukun

0 Simon Schoch over 1 year ago in reply to Ibukun Olumuyiwa

Prodigy 70 points

Hello Ibukun,

To build an .out file with the ECC error that runs on the development board takes some work on our part. At the moment we don't have the time. I'll get back to you as soon as we can do it.

Best regards

Simon
