TMS320F28386S: Debugging the NMI on CM core issue

Part Number: TMS320F28386S

Good evening!

In my project I use both the CPU1 and CM cores, and I have started getting NMIs on the CM core. Looking through the registers, I found that the NMI cause is a flash uncorrectable error: the FLASHECC registers show the UNC_ERR_L bit set in ERR_STATUS, and UNC_ERR_ADDR is sometimes 0 and sometimes holds an address on the stack. The ERRORLOG and DIAGERRORLOG registers are all 0. When I exit the NMI handler, I see that the NMI occurred in the same code locations each time, but there is nothing suspicious in that code; it is just a loop through a const array of structs looking for the needed entry. So I believe this fault is a side effect of code executed earlier, but I do not understand how to find the cause.

UPDATE: Disabling flash ECC makes the problem go away, and the code shows no other problems afterwards. But that is not an acceptable fix.

  • Hi,

    Does CMNMIFLG show the FLUNCERR bit set?

    Can you find the uncorrectable error using the CM_FLASH_ECC_REGS and information below:  

    Also, looking through other threads on this forum, I found this one: https://e2e.ti.com/support/microcontrollers/c2000-microcontrollers-group/c2000/f/c2000-microcontrollers-forum/823580/ccs-tms320f28388d-flash-uncorrectable-error-nmi-flag-fluncerr 

    Thanks

  • Does CMNMIFLG show the FLUNCERR bit set?

    Yes, I got NMI with this flag.

    As I said before, the UNC_ERR_ADDR_LOW register sometimes contains 0 and sometimes an address in the stack area, which is in C0RAM. The UNC_ERR_L and UNC_ERR_INTFLG bits are set.

  • Hi Oleg,

    there is nothing suspicious in this code, this is just a loop through some const array of structs finding the needed entry. So I believe this fault is some side effect of my code executed before, but I do not understand how to find the cause.

    When you comment out the code in this loop, do you still get the NMI/Flash uncorrectable error?

    Exiting the NMI handler I see the NMI occurred in the same code locations

    How are you checking the error code location after exiting the NMI handler?

    I would suggest the following:

    • Check your section alignment in the linker command file (use ALIGN(8))
    • Verify proper flash wait states for your operating frequency
    • Check your system clock stability to make sure there are no glitches or issues related to the clock

    Best Regards,

    Marlyn
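The alignment suggestion above can be checked against the section start addresses listed in the map file. This is a minimal sketch, assuming the CM core is byte-addressed so that ALIGN(8) means an 8-byte boundary; the helper names are hypothetical and are not part of any TI library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when addr sits on an 8-byte boundary,
 * which is what ALIGN(8) requests on a byte-addressed core. */
static bool is_aligned8(uint32_t addr)
{
    return (addr & 0x7u) == 0u;
}

/* Hypothetical helper: round addr up to the next 8-byte boundary,
 * i.e. where an ALIGN(8) section would be placed. */
static uint32_t align_up8(uint32_t addr)
{
    return (addr + 7u) & ~UINT32_C(7);
}
```

Feeding each flash section's start address from the map file through is_aligned8() shows quickly whether the linker command file change took effect.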

  • When you comment out the code in this loop, do you still get the NMI/Flash uncorrectable error?

    Unfortunately, as the code has grown, I can no longer reproduce the issue.

    How are you checking the error code location after exiting the NMI handler?

    I set a breakpoint in the NMI handler; when it is hit, I step past the forever loop and return from the handler, which reveals the code where the NMI occurred.
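The same information can be captured without a debugger attached by reading the frame that the Cortex-M4 pushes on exception entry. The sketch below assumes the basic 8-word frame (no floating-point state) and that the handler has already obtained the stack pointer that was active at entry, which on a real target needs a small assembly shim that is not shown. The function names are hypothetical.

```c
#include <stdint.h>

/* Word offsets inside the basic Cortex-M exception stack frame:
 * r0, r1, r2, r3, r12, lr, pc, xpsr (ascending addresses). */
enum {
    FRAME_R0 = 0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR
};

/* Hypothetical helper: return address stacked on exception entry.
 * This is the next instruction to run after the handler returns, so it
 * can be slightly past the access that raised the error. */
static uint32_t interrupted_pc(const uint32_t *frame)
{
    return frame[FRAME_PC];
}

/* Hypothetical helper: link register value at the time of the exception,
 * useful for identifying the caller of the interrupted function. */
static uint32_t interrupted_lr(const uint32_t *frame)
{
    return frame[FRAME_LR];
}
```

Storing these two values in a global before entering the forever loop gives the same answer as stepping out of the handler by hand.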

    Check your section alignment in the linker command file (use ALIGN(8))

    Right, there was no ALIGN keyword; this was the default .cmd file for the device.

    Check your system clock stability to make sure there are no glitches or issues related to the clock

    The program now runs on an F28388D controlCARD, so I think the clock is OK.

    Verify proper flash wait states for your operating frequency

    What is the required wait-state value for 125 MHz? It is currently 2.

  • The device was reset by the NMI once, after a couple of hours, but I had no breakpoint in the NMI handler. I will wait. After I catch it again, I'll try commenting out the code where it happens and wait some more. So ALIGN(8) did not help.
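For failures that take hours to appear and end in a reset, a small record that survives the reset avoids having to keep a breakpoint armed. This is a sketch only: it assumes the structure is placed in a RAM region that is neither zeroed by startup code nor cleared by the reset, which depends on the linker command file and the device's RAM initialization and is not verified here. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NMI_LOG_MAGIC 0x4E4D494Cu  /* ASCII "NMIL", an arbitrary marker */

/* Hypothetical record kept across resets. */
typedef struct {
    uint32_t magic;          /* NMI_LOG_MAGIC once the record is initialized */
    uint32_t count;          /* number of NMIs recorded so far */
    uint32_t last_pc;        /* stacked return address of the latest NMI */
    uint32_t last_err_addr;  /* error address register value at that time */
} NmiLog;

static bool nmi_log_valid(const NmiLog *log)
{
    return log->magic == NMI_LOG_MAGIC;
}

/* Called from the NMI handler: initialize on first use, then update. */
static void nmi_log_record(NmiLog *log, uint32_t pc, uint32_t err_addr)
{
    if (!nmi_log_valid(log)) {
        log->magic = NMI_LOG_MAGIC;
        log->count = 0u;
    }
    log->count++;
    log->last_pc = pc;
    log->last_err_addr = err_addr;
}
```

After the next reset, the application can check nmi_log_valid() at startup and print or inspect the stored values.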

  • Hi Oleg,

    What is the required wait-state value for 125 MHz? It is currently 2.

    For 125MHz, the required wait state should be 3. Can you please set RWAIT to 3?

    After I catch it again, I'll try commenting out the code where it happens and wait some more.

    Yes, please let me know if after changing the wait states you run into this issue again.

    Best Regards,

    Marlyn
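The wait-state requirement stated above can be turned into a startup sanity check. The sketch below encodes only the single data point given in this thread (125 MHz needs RWAIT = 3); it is deliberately conservative for lower clocks and refuses to judge higher ones, for which the data sheet must be consulted. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The only data point confirmed in this thread: 125 MHz needs RWAIT = 3. */
#define THREAD_CLOCK_HZ 125000000u
#define THREAD_RWAIT    3u

/* Hypothetical check: true when the configured wait states are known to be
 * enough. Clocks above 125 MHz are outside the data point and fail the
 * check; lower clocks are held to the same value, which may be stricter
 * than the data sheet requires. */
static bool rwait_sufficient(uint32_t clock_hz, uint32_t configured_rwait)
{
    if (clock_hz > THREAD_CLOCK_HZ) {
        return false;
    }
    return configured_rwait >= THREAD_RWAIT;
}
```

With the original setting, rwait_sufficient(125000000u, 2u) fails, which matches the advice to raise RWAIT to 3.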

  • Wait states changed to 3, same effect.

    Also, I sometimes get errors when reloading the program:

    Cortex_M4_0: Error occurred during flash operation: Timed out waiting for target to halt while executing pwrite_en.alg
    Cortex_M4_0: Error occurred during flash operation: Timed out waiting for target to halt while executing pwrite_en.alg
    Cortex_M4_0: Error occurred during flash operation: Timed out waiting for target to halt while executing pwrite_en.alg
    Cortex_M4_0: Error occurred during flash operation: Timed out waiting for target to halt while executing pwrite_en.alg
    Cortex_M4_0: Trouble Removing Breakpoint with the Action "Remain Halted" at 0x20004118: (Error -2044 @ 0x96) Internal error: Requested breakpoint does not exist. Restart the application. If error persists, please report the error. (Emulation package 20.0.0.3178) 
    Cortex_M4_0: Error occurred during flash operation: Timed out waiting for target to halt while executing erasew.alg
    Cortex_M4_0: Flash Programmer: Error erasing Sector 1. Operation Cancelled (1).
    Cortex_M4_0: File Loader: Memory write failed: Unknown error
    Cortex_M4_0: GEL: File: C:\WorkspaceECOM\CD_CM\Debug\CD_CM.out: Load failed.
    Cortex_M4_0: Error occurred during flash operation: Timed out waiting for target to halt while executing pwrite_en.alg
    Cortex_M4_0: Error occurred during flash operation: Timed out waiting for target to halt while executing pwrite_en.alg
    Cortex_M4_0: Trouble Removing Breakpoint with the Action "Remain Halted" at 0x20004118: (Error -2044 @ 0x96) Internal error: Requested breakpoint does not exist. Restart the application. If error persists, please report the error. (Emulation package 20.0.0.3178) 
    

    But a second reload (Ctrl-Alt-R) succeeds.

    I will try to comment out code now.

  • Commenting out the code removes the issue. The code is as follows.

    I have a const resource table in an externally generated file. The record is defined as

    typedef struct {
    	char name[256];
    	int id;
    	int offset;
    	int size;
    	int stype;
    	int width;
    	int height;
    } EVEMAP;
    

    Table:

    const EVEMAP edfmap[] = {
    {"default-fl.blob", 0, 0, 4096, 0, 0, 0},
    {"NotoSansMono-VariableFont.glyph", 1, 4096, 24320, 0, 0, 0},
    {"verdana.glyph", 2, 28416, 36480, 0, 0, 0},
    {"NotoSansMono-VariableFont.xfont", 3, 64896, 311, 0, 0, 0},
    {"NotoSansMono-VariableFont.xfont.padding", 4, 65207, 9, 0, 0, 0},
    {"verdana.xfont", 5, 65216, 311, 0, 0, 0},
    {"verdana.xfont.padding", 6, 65527, 9, 0, 0, 0},
    {"LED-fail.raw", 7, 65536, 256, 37815, 32, 32},
    {"LED-off.raw", 8, 65792, 256, 37815, 32, 32},
    {"LED-on.raw", 9, 66048, 256, 37815, 32, 32},
    {"advion.raw", 10, 66304, 8640, 37815, 480, 72},
    {"back-icon.raw", 11, 74944, 256, 37815, 32, 32},
    {"bs-icon.raw", 12, 75200, 256, 37815, 32, 32},
    {"cancel-icon.raw", 13, 75456, 256, 37815, 32, 32},
    {"ecom-logo.raw", 14, 75712, 8640, 37815, 480, 72},
    {"eth.raw", 15, 84352, 256, 37815, 32, 32},
    {"expand-icon.raw", 16, 84608, 144, 37815, 24, 24},
    {"expand-icon.raw.padding", 17, 84752, 48, 0, 0, 0},
    {"gilson.raw", 18, 84800, 8640, 37815, 480, 72},
    {"home-icon.raw", 19, 93440, 256, 37815, 32, 32},
    {"io-icon.raw", 20, 93696, 256, 37815, 32, 32},
    {"lcprocess.raw", 21, 93952, 8640, 37815, 480, 72},
    {"leak-icon.raw", 22, 102592, 256, 37815, 32, 32},
    {"left-icon.raw", 23, 102848, 256, 37815, 32, 32},
    {"lilichro.raw", 24, 103104, 8640, 37815, 480, 72},
    {"menu-icon.raw", 25, 111744, 256, 37815, 32, 32},
    {"okay-icon.raw", 26, 112000, 256, 37815, 32, 32},
    {"remote-icon.raw", 27, 112256, 256, 37815, 32, 32},
    {"right-icon.raw", 28, 112512, 256, 37815, 32, 32},
    {"semba.raw", 29, 112768, 8640, 37815, 480, 72},
    {"shift-icon.raw", 30, 121408, 256, 37815, 32, 32},
    {"shrink-icon.raw", 31, 121664, 144, 37815, 24, 24},
    {"shrink-icon.raw.padding", 32, 121808, 48, 0, 0, 0},
    {"welch-logo.raw", 33, 121856, 8640, 37815, 480, 72},
    {"wufeng.raw", 34, 130496, 8640, 37815, 480, 72},
    {"xenon-off.raw", 35, 139136, 256, 37815, 32, 32},
    {"xenon-on.raw", 36, 139392, 256, 37815, 32, 32},
    {"zivak.raw", 37, 139648, 8640, 37815, 480, 72},
    {"zoomall-icom.raw", 38, 148288, 144, 37815, 24, 24},
    {"zoomall-icom.raw.padding", 39, 148432, 48, 0, 0, 0},
    {"version.txt", 42, 148544, 40, 0, 0, 0},
    {"version.txt.padding", 43, 148584, 24, 0, 0, 0},
    {""}};
    

    The NMI is caught in the resource lookup routine:

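    #include <string.h>  // strcmp

    // Look up a resource by name; returns 0 when no entry matches.
    // edfmap is const, so the returned pointer is const as well.
    const EVEMAP *findEDFObject(const char *name)
    {
    	int i = 0;
    	while(edfmap[i].name[0]) // the table ends with an empty-name entry
    	{
    		if(strcmp(edfmap[i].name, name) == 0) // <- NMI
    		{
    			return edfmap + i;
    		}
    		i++;
    	}
    	return 0;
    }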
    

    Or here:

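    // Select the bitmap with the given id; returns 0 when no entry matches.
    // edfmap is const, so the returned pointer is const as well.
    const EVEMAP *flashBitmapById(uint16_t id)
    {
        return edfmap + 5; // early return: the loop below is unreachable while this line is present
    	int i = 0;
    	while(edfmap[i].name[0]) // the table ends with an empty-name entry
    	{
    		if(edfmap[i].id == id) // <- NMI
    		{
    			Cmd_SetBitmap(RAM_FLASH + edfmap[i].offset / 32, edfmap[i].stype, edfmap[i].width, edfmap[i].height);
    			return edfmap + i;
    		}
    		i++;
    	}
    	return 0;
    }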
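To rule out the lookup logic itself, the two loops can be exercised on a host with a cut-down table. This sketch uses a shortened record and three entries copied from the table above; the Demo names are hypothetical, and an explicit element count replaces the empty-name sentinel so the loop bound does not depend on table contents.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down record: name shortened from 256 bytes, only id kept. */
typedef struct {
    char name[32];
    int id;
} DemoEntry;

static const DemoEntry demo_map[] = {
    {"default-fl.blob", 0},
    {"verdana.glyph", 2},
    {"LED-on.raw", 9},
};

#define DEMO_COUNT (sizeof demo_map / sizeof demo_map[0])

/* Linear search by name; returns NULL when nothing matches. */
static const DemoEntry *demo_find_by_name(const char *name)
{
    for (size_t i = 0; i < DEMO_COUNT; i++) {
        if (strcmp(demo_map[i].name, name) == 0) {
            return &demo_map[i];
        }
    }
    return NULL;
}

/* Linear search by id; returns NULL when nothing matches. */
static const DemoEntry *demo_find_by_id(int id)
{
    for (size_t i = 0; i < DEMO_COUNT; i++) {
        if (demo_map[i].id == id) {
            return &demo_map[i];
        }
    }
    return NULL;
}
```

If the host version behaves correctly, as expected for code this simple, attention can stay on the flash read path rather than on the search.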
    

  • Hi Oleg,

    Are you modifying flash in the flashBitmapById() function? If so, make sure you are following the flash programming guidelines highlighted in the flash API guide for this device.

    Kind regards,

    Skyler

  • Hello.

    You see all the code above.

    These functions are for flash operations on an external device; the internal flash is never modified.

  • Hi Oleg,

    Oops, I meant to ask about the Cmd_SetBitmap() function, not the flashBitmapById() function :/

    For now, I will assume that it doesn't modify internal flash either. Can you verify that internal flash isn't being modified at any point during run-time? Does this error occur if you move these functions to a different area of flash?

    Kind regards,

    Skyler

  • Hi Oleg, 

    Can you try moving the code to a different area of flash?

    Kind regards,

    Skyler

  • I confirm that internal flash is NEVER modified at any point in the code.
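One way to back this up with data is to checksum the table at startup and again after an NMI. If the value is unchanged, the stored bits are intact and the error is more likely in the read path than in the flash contents. The sketch below is a plain bitwise CRC-32 (the common reflected form with polynomial 0xEDB88320); it is a software routine, not a TI API, and on the target the read pass itself may trigger the NMI while ECC is enabled.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected form, polynomial 0xEDB88320,
 * initial value 0xFFFFFFFF, final inversion. */
static uint32_t crc32_bytes(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            /* XOR in the polynomial only when the low bit is set. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

On the target, the call would be crc32_bytes(edfmap, sizeof edfmap), compared against the value computed on the first boot after programming.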

  • The code is constantly being modified, so it moves around within the flash area. Sometimes it takes hours to catch the issue, sometimes it shows up within a minute. But it always happens when the ECC check is on, and it happens exactly in this place. BTW, the counter "i" is somewhere in the middle of the array when it happens.
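Since the loop index is known at the time of the NMI, the flash addresses being read can be worked out from it. The sketch below mirrors the EVEMAP layout with fixed-width fields, assuming a 4-byte int and no padding (256 + 6 × 4 = 280 bytes per entry), and assumes 64-bit ECC granularity. The table base address comes from the map file; the helper names are hypothetical.

```c
#include <stdint.h>

/* Mirror of EVEMAP with fixed-width fields (assumes 4-byte int). */
typedef struct {
    char name[256];
    int32_t id, offset, size, stype, width, height;
} EveMapLayout;

/* Address of entry `index`, given the table's base address. */
static uint32_t entry_addr(uint32_t table_base, uint32_t index)
{
    return table_base + index * (uint32_t)sizeof(EveMapLayout);
}

/* Inverse: which entry contains `addr` (addr must be >= table_base). */
static uint32_t entry_index_for_addr(uint32_t table_base, uint32_t addr)
{
    return (addr - table_base) / (uint32_t)sizeof(EveMapLayout);
}

/* Start of the 8-byte word that contains `addr`
 * (assumes ECC is computed per aligned 64-bit word). */
static uint32_t ecc_word_base(uint32_t addr)
{
    return addr & ~UINT32_C(7);
}
```

Comparing entry_addr(base, i) for the observed i against any flash address reported in the error registers shows whether the two agree.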

  • Hi Oleg,

    Let me look into this more and provide an update tomorrow.

    Kind regards,

    Skyler

  • Hi Oleg,

    Can you make sure that the function triggering the NMI is not within the flash address range specified in the Prefetching Beyond Valid Memory section of the Errata? Here is another E2E thread discussing how this issue could appear. Alternatively, you could disable prefetch with the Flash_disableDataCache() and Flash_disableProgramCache() functions from the cm_driverlib.

    Kind regards,

    Skyler
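The errata check above comes down to an address-range overlap test using values from the map file. This is a sketch under stated assumptions: the end of flash and the size of the affected region are parameters that must be taken from the errata and data sheet, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does [start, start + size) overlap the last
 * guard_bytes of flash, i.e. [flash_end_excl - guard_bytes, flash_end_excl)?
 * flash_end_excl is the first address past the end of flash. */
static bool overlaps_tail_guard(uint32_t start, uint32_t size,
                                uint32_t flash_end_excl, uint32_t guard_bytes)
{
    uint32_t guard_start = flash_end_excl - guard_bytes;
    uint32_t end_excl = start + size;

    return (end_excl > guard_start) && (start < flash_end_excl);
}
```

The same test applies to the const table, since data reads can be prefetched as well as code.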

  • Yes, I am sure: it is in the default .text section, which is located far from the end of flash. In any case, the default .cmd file already defines the special section that covers this erratum.

  • Hi Oleg,

    Okay, thank you for the info. There is an ongoing internal debug effort for an issue that I believe is the same as the one you are seeing. I'll provide an update when we determine the root cause.

    Kind regards,

    Skyler

  • Hi Oleg,

    1. Does the code continue to execute normally after the NMI if you handle the interrupt? We want to confirm if this issue breaks the functionality of the code. 
    2. Can you provide some more code? It would be helpful for our debug if you could send the project or at least the function that calls the findEDFObject() and flashBitmapById() functions.
    3. Can you also send the map file of the project? If not, can you provide the addresses at which these two functions are loaded/run from?

    Kind regards,

    Skyler

  • Yes, it continues to execute normally until the next NMI.

    Tomorrow I will provide the rest.

  • Hi Oleg,

    Okay, thank you!

    -Skyler