Tool/software:
The thread below has been closed, so I'm asking here if there are any results from the internal discussion.
e2e.ti.com/.../tms320f28388d-nmi-by-uncorrectable-error-in-cm
As a temporary workaround, we increased the Flash wait states (RWAIT) to 4, assuming that this would solve the problem. However, even with this setting, an ECC error occurred once on a target. This must not happen, because the error triggers a reset. I would be glad of any input on how the problem can be solved definitively.
Hi Simon,
This is strange, and at this point I wonder if some other factor is contributing to this issue. What clock source is being used to drive the PLL? I wonder if some irregularity in the source clock could be causing this. Can we also confirm the PLL configuration and the clock dividers used?
Thanks,
Ibukun
Hi Ibukun
In the actual case where we have observed the ECC error, the circuit is implemented with a quartz crystal:

In newer board versions, where we have also observed the ECC error, an oscillator is used:

Here is the configuration from the software at 120 MHz:
//
// Multipliers and dividers to configure 120MHz AUXPLL output from 16MHz XTAL
//
#define AUX_IMULT IMULT_120
#define AUX_REFDIV REFDIV_4
#define AUX_ODIV ODIV_4
#define AUX_DIV AUXPLLRAWCLK_BY_1
#define SYSCTL_DCC_BASE1 1 1
//
// Set up AUXPLL control and clock dividers needed for CMCLK
// AUXPLLCLK = (XTAL_OSC * IMULT) / (REFDIV * ODIV * AUXPLLDIV)
//
InitAuxPll(XTAL_OSC, AUX_IMULT, AUX_REFDIV, AUX_ODIV, AUX_DIV, SYSCTL_DCC_BASE1);
/**
* @def TARGET_CMCLKDIV
*
* Clock divider for CM clock
* CM clock = CM source clock / 1
*/
#define TARGET_CMCLKDIV 0U
/**
* @def TARGET_CMDIVSRCSEL
*
* Source Auxiliary PLL
*/
#define TARGET_CMDIVSRCSEL 0U
/* Set the CM Clock to run at 120MHz.
The CM Clock is a fractional multiple of the AUXPLL Clock (120 MHz) from
which the USB Clock (60 MHz) is derived. */
// Configures the divider & the source
ClkCfgRegs.CMCLKCTL.bit.CMCLKDIV = TARGET_CMCLKDIV;
ClkCfgRegs.CMCLKCTL.bit.CMDIVSRCSEL = TARGET_CMDIVSRCSEL;
I think the configuration of the Clk for the CM at 120 MHz is correct.
Is there anything else that could be optimized?
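As a quick arithmetic check of the settings above, here is a minimal sketch that evaluates the formula from the code comment. The helper names `auxpll_clk_hz` and `cm_clk_hz` are hypothetical and not part of C2000Ware, and the arguments are assumed to be the real divider values (4 means divide by 4), not register encodings.

```c
#include <stdint.h>

/* Hypothetical helper (not a C2000Ware function).
 * AUXPLLCLK = (XTAL * IMULT) / (REFDIV * ODIV * AUXPLLDIV),
 * with every divider passed as its real value, not its register encoding. */
static uint32_t auxpll_clk_hz(uint32_t xtal_hz, uint32_t imult,
                              uint32_t refdiv, uint32_t odiv,
                              uint32_t auxplldiv)
{
    /* 64-bit intermediate so xtal_hz * imult cannot overflow. */
    uint64_t vco_hz = ((uint64_t)xtal_hz * imult) / refdiv;
    return (uint32_t)(vco_hz / ((uint64_t)odiv * auxplldiv));
}

/* Hypothetical helper: CMCLK = source clock / real divider value. */
static uint32_t cm_clk_hz(uint32_t src_hz, uint32_t cm_divider)
{
    return src_hz / cm_divider;
}
```

With a 16 MHz crystal, IMULT 120, REFDIV 4, ODIV 4 and AUXPLLDIV 1, `auxpll_clk_hz` gives 120 MHz, and a CM divider of 1 leaves CMCLK at 120 MHz, which matches the intent stated in the comments.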
Best regards
Simon
Hello Simon,
Just an observation, which probably doesn't apply to this case: for the XTAL circuit, the load capacitors CL1 and CL2 slightly exceed the max spec of 24pF.
Here's one thing we can try -- shot in the dark: Instead of configuring ODIV to 4, configure ODIV to 2 and then set your CMCLKDIV to 1 (divide by 2). I'm wondering if doing so will help filter out any potential glitches in the raw PLL clock output.
Best regards,
Ibukun
Hi Ibukun
Thanks for the input.
ODIV was configured to 2 and CMCLKDIV to 1. However, the ECC error still occurred.
In addition, I have activated the XCLKOUT on both board versions and output the AUXPLLRAWCLK / 8. The measured jitter is very low in all versions and the duty cycle is 50%. I therefore exclude a problem with the CLK.
When errors occur, the UNC_ERR_ADDR_LOW or UNC_ERR_ADDR_HIGH register points to the address range of the stack.
Best regards
Simon
Hello Simon,
Thanks, that is good info. I agree we should be able to rule out the clock.
If the address is pointing to the stack, then that implies the error occurred in RAM, not Flash. This is also highly unusual (RAM is zero wait states), but in this case the error should be reflected in the CM_MEMORYERROR_REGS (UCERRFLG, UCM4EADDR etc.). Do you see any error indications in any of the CM memory error registers?
Best regards,
Ibukun
Hi Ibukun
In the event of an ECC error, an NMI interrupt is triggered. The following registers are read out in the interrupt routine:
UCERRFLG and UCM4EADDR are 0,
UNC_ERR_ADDR_LOW or UNC_ERR_ADDR_HIGH are set to values close to 0x1FFFCF30 (stack),
the UNC_ERR_H or UNC_ERR_L bit in the ERR_STATUS register (FLASH0ECC_BASE 0x400FA600U) is set,
the return address in the current case is always at the same address 0x00235D3C.
I have inserted a NOP before this line of the C code with the address 0x00235D3C. As a result, the error no longer occurred for several hours.
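The readings above can be cross-checked in software. The following is a minimal, hardware-independent sketch: the struct, enum, and function names are hypothetical, the register values are assumed to have been copied out in the NMI routine already, and the bit mask and memory ranges are parameters that must be taken from the device documentation rather than from this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the registers read in the NMI routine. */
typedef struct {
    uint32_t flash_err_status;    /* ERR_STATUS of the Flash ECC block */
    uint32_t flash_unc_addr_low;  /* UNC_ERR_ADDR_LOW                  */
    uint32_t flash_unc_addr_high; /* UNC_ERR_ADDR_HIGH                 */
    uint32_t ram_ucerrflg;        /* UCERRFLG (CM_MEMORYERROR_REGS)    */
    uint32_t ram_ucm4eaddr;       /* UCM4EADDR (CM_MEMORYERROR_REGS)   */
} ecc_snapshot_t;

typedef enum { ECC_SRC_NONE, ECC_SRC_FLASH, ECC_SRC_RAM, ECC_SRC_BOTH } ecc_source_t;

/* Which ECC logic reported the uncorrectable error?
 * flash_unc_mask is the OR of the UNC_ERR_L and UNC_ERR_H bit masks
 * (caller-supplied; the bit positions are not defined here). */
static ecc_source_t ecc_source(const ecc_snapshot_t *s, uint32_t flash_unc_mask)
{
    bool flash = (s->flash_err_status & flash_unc_mask) != 0u;
    bool ram   = s->ram_ucerrflg != 0u;
    if (flash && ram) { return ECC_SRC_BOTH; }
    if (flash)        { return ECC_SRC_FLASH; }
    if (ram)          { return ECC_SRC_RAM; }
    return ECC_SRC_NONE;
}

/* True if addr lies in [base, base + size). Unsigned wraparound makes
 * addresses below base fail the comparison as well. */
static bool addr_in_range(uint32_t addr, uint32_t base, uint32_t size)
{
    return (addr - base) < size;
}
```

For the values reported here, `ecc_source` would return `ECC_SRC_FLASH` while `addr_in_range` on the logged address 0x1FFFCF30 would fail for any Flash range that contains the return address 0x00235D3C, which makes the inconsistency explicit: the Flash ECC logic flags the error, but the logged address is not a Flash address.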
Here is the ASM listing with the built-in NOP on line 1814:

Best regards
Simon
Hello Simon,
Thanks, I got that. Could you read the CM_MEMORYERROR_REGS when the error happens and show their contents? I want to establish for sure that this is a Flash ECC error and not a RAM ECC error.
Best regards,
Ibukun
Hello Ibukun
I'm not in the office at the moment. I'll do that as soon as I'm back in the office, which will take a few days.
Best regards
Simon
Hello Ibukun,
Now I'm back and tried to run this firmware in debug mode to read the entire CM_MEMORYERROR_REGS registers, but unfortunately the ECC error does not occur in debug mode.
I also tried to record some additional, interesting registers in Release mode in the event of an error, but the ECC error no longer occurred here either (ERR_STATUS, ERR_POS, ERR_CNT, ERR_INTFLG). However, the ECC error still occurs in the original build.
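One way to capture such registers without a debugger attached is to copy them into a small record that survives the reset and is validated on the next boot. The sketch below is only an illustration under assumptions: all names are hypothetical, the four values are assumed to have been read from ERR_STATUS, ERR_POS, ERR_CNT and ERR_INTFLG by the caller, and placing the record in RAM that is not initialized at startup is toolchain-specific and not shown. As the observations above suggest, any added code can shift the code layout and may itself make the error disappear.

```c
#include <stdbool.h>
#include <stdint.h>

#define ECC_LOG_MAGIC 0x45434331u /* arbitrary marker chosen for this sketch */
#define ECC_LOG_REGS  4u          /* ERR_STATUS, ERR_POS, ERR_CNT, ERR_INTFLG */

/* Hypothetical record intended for RAM that is not cleared at startup. */
typedef struct {
    uint32_t magic;
    uint32_t regs[ECC_LOG_REGS];
    uint32_t checksum;
} ecc_log_t;

/* Simple XOR checksum over the marker and the stored register values. */
static uint32_t ecc_log_sum(const ecc_log_t *log)
{
    uint32_t sum = log->magic;
    for (uint32_t i = 0u; i < ECC_LOG_REGS; i++) {
        sum ^= log->regs[i];
    }
    return sum;
}

/* Called from the NMI routine with values already read from hardware. */
static void ecc_log_store(ecc_log_t *log, const uint32_t regs[ECC_LOG_REGS])
{
    for (uint32_t i = 0u; i < ECC_LOG_REGS; i++) {
        log->regs[i] = regs[i];
    }
    log->magic = ECC_LOG_MAGIC;
    log->checksum = ecc_log_sum(log);
}

/* Called after reset: true only if a complete record was written. */
static bool ecc_log_valid(const ecc_log_t *log)
{
    return (log->magic == ECC_LOG_MAGIC) && (log->checksum == ecc_log_sum(log));
}
```

The marker and checksum are there so that random RAM contents after a power cycle are not mistaken for a logged error.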
In the first case where the ECC error occurred, the CM_MEMORYERROR_REGS registers could be read out in debug mode. It looks like this was not a RAM error. Here is the link to this first case:
TMS320F28388D: NMI by uncorrectable Error in CM (C2000 microcontrollers forum, TI E2E support forums)
Best regards
Simon
Hello Ibukun
Do you have any ideas on how we can make progress here?
Best regards
Simon
Hello Simon,
Sorry about the delayed response. I was out of office last week, so I have been catching up.
Not sure if we've covered this before, but could we review the board schematic? Want to rule out any potential power-related issues.
Beyond that, I think the next best option would be if we could get a sample .out file that we can either test or try to simulate on our end.
Best regards,
Ibukun
Hello Simon,
Just following up on this. I do not see any issues with the schematic.
Now I have recently come across a situation that was similar to this, and the customer was able to resolve it by using a CMCLKDIV = 2 instead of 1. I know we've tried a few combinations before, but can you check this option? (Reduce the ODIV, and increase the CMCLKDIV instead).
Best regards,
Ibukun
Hello Ibukun
Thank you for following this up.
I took the software version where the problem occurred after about 20 minutes and set CMCLKDIV = 2, IMULT = 90 and ODIV = 0 (instead of CMCLKDIV = 0, IMULT = 120 and ODIV = 3). The CMCLK is 120 MHz again.
The problem reappeared after 30 minutes (ECC_UNC_ERR_ADDR_LOW 0x1FFFCF30).
This does not seem to solve the problem.
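For reference, the three clock setups tried in this thread can be compared side by side. This is a minimal sketch under stated assumptions: the register fields are taken to encode the divider as field value plus one (consistent with "CMCLKDIV to 1 (divide by 2)" and ODIV = 3 for ODIV_4 above), REFDIV is assumed to have stayed at divide by 4, AUXPLLDIV at divide by 1, and IMULT at 120 for the second setup. The type and function names are hypothetical.

```c
#include <stdint.h>

/* One clock setup, expressed as register field values.
 * Assumed encoding for all three dividers: divider = field + 1. */
typedef struct {
    uint32_t imult;        /* integer multiplier       */
    uint32_t refdiv_fld;   /* reference divider field  */
    uint32_t odiv_fld;     /* output divider field     */
    uint32_t cmclkdiv_fld; /* CMCLKCTL.CMCLKDIV field  */
} cm_clk_cfg_t;

/* CMCLK in Hz for a given crystal frequency, with AUXPLLDIV assumed to be 1. */
static uint32_t cmclk_from_fields(uint32_t xtal_hz, const cm_clk_cfg_t *c)
{
    uint64_t vco_hz = ((uint64_t)xtal_hz * c->imult) / (c->refdiv_fld + 1u);
    uint64_t pll_hz = vco_hz / (c->odiv_fld + 1u);
    return (uint32_t)(pll_hz / (c->cmclkdiv_fld + 1u));
}

/* The setups reported in the thread (16 MHz crystal). */
static const cm_clk_cfg_t tried_cfgs[] = {
    {120u, 3u, 3u, 0u}, /* original: ODIV /4, CMCLKDIV /1           */
    {120u, 3u, 1u, 1u}, /* first suggestion: ODIV /2, CMCLKDIV /2   */
    { 90u, 3u, 0u, 2u}, /* second suggestion: ODIV /1, CMCLKDIV /3  */
};
```

Under these assumptions all three entries evaluate to a 120 MHz CMCLK, so the experiments changed only where the division takes place, not the final CM clock frequency.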
Is there any other idea how to solve the problem?
Best regards
Simon
Hello Simon,
Our next best option is to try to do a design simulation of your code. Are you able to share a .out file that we can use to investigate?
Thanks,
Ibukun
Hello Ibukun,
To build an .out file with the ECC error that runs on the development board takes some work on our part. At the moment we don't have the time. I'll get back to you as soon as we can do it.
Best regards
Simon