TMS320F28374D: NMI(FLUNCERR) happening from time to time

Part Number: TMS320F28374D
Other Parts Discussed in Thread: UNIFLASH

Hi everybody,

We are facing a rarely occurring NMI (FLUNCERR) on a TMS320F28374D.

We are experiencing several MCU resets due to the NMI watchdog and, after a preliminary investigation, it seems that the cause is an uncorrectable flash ECC error.

Looking at the internal registers, we found that, after a reset event, the FLUNCERR bit in the NMISHDFLG register was set.
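
As a minimal sketch of that check, the value read from the NMI shadow flag register (NMISHDFLG in the device headers) can be decoded with a small helper. The bit positions are an assumption based on the F2837xD header files and should be verified against the Technical Reference Manual; `nmi_cause_is_flash_ecc` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed NMI shadow flag bit masks (verify against the F2837xD TRM). */
#define NMI_FLG_NMIINT    (1u << 0)
#define NMI_FLG_CLOCKFAIL (1u << 1)
#define NMI_FLG_RAMUNCERR (1u << 2)
#define NMI_FLG_FLUNCERR  (1u << 3)

/* Hypothetical helper: returns true if the captured shadow flags report
 * an uncorrectable flash ECC error. */
bool nmi_cause_is_flash_ecc(uint16_t nmishdflg)
{
    return (nmishdflg & NMI_FLG_FLUNCERR) != 0u;
}
```

On the target, the argument would be the raw register value read early after reset, before the flags are cleared.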

Additional information:

  1. We are making heavy use of both CPUs to implement two independent controllers.
  2. GSRAM banks 8/9/10/11 are assigned to CPU2.

What can I do? How can I fix this?

Regards

Carlo

  • Carlo,

    Is it happening on every device?  If yes, it can be an issue from programming an incorrect ECC. Please check and let me know.

    You can check the address where the error happened (see the flash ECC registers) and see whether or not the contents match your production image (assuming the location did not get programmed again during the application run time or in the field).
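
    One way to do the comparison is on a readback of the flash contents against the production image. The sketch below assumes both are available as arrays of 16-bit words; `first_mismatch` is a hypothetical helper, not part of any TI tool.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Returns the index of the first 16-bit word that differs between the
     * device readback and the reference image, or `words` if they match. */
    size_t first_mismatch(const volatile uint16_t *device,
                          const uint16_t *image, size_t words)
    {
        size_t i;
        for (i = 0; i < words; i++) {
            if (device[i] != image[i]) {
                return i;
            }
        }
        return words;
    }
    ```

    A return value equal to `words` means the two buffers are identical; any smaller value is the word offset to compare against the reported error address.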

    Thanks and regards,

    Vamsi

  • Hi Vamsi,

    First of all thanks for your answer.

    It seems to be happening on more or less every device. It usually happens several hours after power-on (typically after 8-10 hours), but it can also happen after a few minutes.

    About programming: we have our own boot loader; the problem arises when we are in the application.

    The boot loader is programmed using a JTAG connection, Blackhawk and UniFlash.

    The application is programmed using proprietary protocols over an RS485 connection.

    Regards

    Emanuele

  • Hi,

    I have also recompiled the project to work without the bootloader and the problem is still present; the device reset after 10 hours.

    Regards

    Emanuele

  • Emanuele,

    Thank you for the info.

    Are you seeing the ECC failure at the same location in all the devices?

    If yes, it might be that the ECC is programmed incorrectly by your custom loader.

    In the failing devices, you can simply load code into RAM to read the entire Flash with ECC check enabled, and monitor the ECC registers to see the address at which the error occurs.  You can narrow down the issue once you find the error location.
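
    A minimal sketch of such a read sweep is shown below. It assumes the start address and length of the flash region are supplied by the caller (they are not given in this thread and depend on the device memory map); `flash_read_sweep` is a hypothetical name.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Reads every 16-bit word in [start, start + words) through a volatile
     * pointer so the compiler cannot drop the reads. The sum is returned
     * only to give the reads an observable result. */
    uint32_t flash_read_sweep(const volatile uint16_t *start, size_t words)
    {
        uint32_t sum = 0;
        size_t i;
        for (i = 0; i < words; i++) {
            sum += start[i];
        }
        return sum;
    }
    ```

    On the target this would run from RAM with ECC checking enabled, and the ECC error address registers would be inspected afterwards. If NMI is enabled, an uncorrectable error during the sweep is expected to raise the NMI rather than return normally.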

    Does your application program flash at run time as well?

    Is the issue seen in CPU1 Flash or CPU2 Flash?

    Thanks and regards,
    Vamsi

  • Vamsi,

    I have 3 devices under test at the moment.

    The next time a fault occurs, I should be able to read the address that generates the problem.

    This problem is visible on CPU1.

    About program load: we are experiencing this issue also without our bootloader.

    We can recompile the code to work without the custom bootloader and load it using UniFlash or CCS; for this reason I think we can exclude our custom loader as the root cause.

    Regards

    Emanuele

  • Emanuele,

    Thank you for the info.  It is important to know that the application failed even when it is programmed with CCS or UniFlash.  As discussed earlier, I feel it is important to know the address where the ECC error is occurring.  This helps you to identify the root cause.

    Here are a few things to consider:

    1. Did you make any recent changes to your application?  You need to concentrate on those changes since the application started to fail recently.  

    2. Once you program a device using CCS or UniFlash (and before running the application), load simple code into RAM to read the entire Flash with ECC enabled.  This helps to determine whether the error appears after the application executes or even before that.

    3. Do you have any checksum test (not at runtime) for your application to make sure that the application image in Flash is intact?  If yes, did checksum test pass on the failed devices?  You might want to run the test even on the devices immediately after programming.

    4. Does your application embed Flash API in it?  If yes, does the application call Flash API to erase and/or program the Flash at run time?

    5. Do you program DCSM USER OTP?  If yes, please check OTP as well for #2 and #3 listed above.

    6. You may already know this, but just in case: In your application linker cmd file, do you have any initialized sections mapped to RAM?  If yes, please note that all the initialized sections should be mapped to Flash.  If needed, they can be copied to RAM at runtime using memcpy() before accessing/executing them from RAM.  

    7. Please make sure that the flash wait-states are configured correctly as per the operating frequency.
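
    For item 7, the required number of wait states can be estimated from the system clock. The sketch below assumes the rule RWAIT = ceil(SYSCLK / 50 MHz) - 1, which should be checked against the flash wait-state table in the device datasheet; `flash_min_wait_states` is a hypothetical helper.

    ```c
    #include <stdint.h>

    /* Assumption: each wait state covers up to 50 MHz of SYSCLK, so
     * RWAIT = ceil(SYSCLK / 50 MHz) - 1. Verify against the datasheet. */
    #define FLASH_MAX_HZ_PER_WAIT_STATE 50000000UL

    uint16_t flash_min_wait_states(uint32_t sysclk_hz)
    {
        if (sysclk_hz == 0u) {
            return 0u;
        }
        /* Integer ceiling division, then subtract one. */
        return (uint16_t)((sysclk_hz + FLASH_MAX_HZ_PER_WAIT_STATE - 1u)
                          / FLASH_MAX_HZ_PER_WAIT_STATE - 1u);
    }
    ```

    Under that assumption a 200 MHz system clock needs 3 wait states, which can be compared with the RWAIT value actually configured in the flash control registers.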

    Thanks and regards,
    Vamsi   

  • Hi Vamsi,

    All points checked, with no success.

    Could you kindly elaborate on point 6?

    It is not so clear.

    Thank you,

    Best regards,

    Carlo

  • Hi Vamsi,

    I'm still waiting for a new NMI event on the setups under test.

    About the Flash configuration, here is a screenshot of the internal registers.

    Regards

    Emanuele

  • Carlo,

    Can you explain what "no success" means?  Please provide more details.

    In the application linker command file, is there anything mapped to RAM?  It is OK to map uninitialized sections (e.g. .ebss, .stack, .esysmem) to RAM.  However, initialized sections (like .text, .cinit, .econst, .switch, .reset) should be mapped to Flash only.  If you need to execute any code from RAM, you still need to map it to Flash for load and later copy it to RAM at runtime using memcpy().  You know about ramfuncs (or .TI.ramfunc), correct?
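
    The load-then-copy step can be sketched as follows. The wrapper `copy_section_to_ram` is a hypothetical name; on the target, its arguments would come from linker-defined symbols whose names depend on the project's linker command file.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Copies a section from its load address (Flash) to its run address
     * (RAM). Must be called before anything in the section is used. */
    void copy_section_to_ram(void *run_start, const void *load_start,
                             size_t size)
    {
        memcpy(run_start, load_start, size);
    }
    ```

    In TI's example projects the equivalent call is typically written as `memcpy(&RamfuncsRunStart, &RamfuncsLoadStart, (size_t)&RamfuncsLoadSize);` and runs early in startup, but the symbol names should be checked against your own linker command file.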

    Thanks and regards,

    Vamsi

  • Emanuele,

    Ok, please let us know once you find out the address where it is failing and what that address belongs to in your application.  

    Regarding the Flash configuration:  I reviewed it and it looks fine. Next time, if you want us to review anything like this, please take a clear snapshot showing all the bit-field values in hex format. That makes it easy to review.  I converted all the decimal values to hex and reviewed them this time.  Thank you for understanding.

    Thanks and regards,

    Vamsi

  • Hi,

    We experienced a new fault in one of the setups under test.

    We captured the address that generated the fault in this case:

    • register address 0x5FB06 UNC_ERR_ADDR_LOW = 0x0000000
    • register address 0x5FB08 UNC_ERR_ADDR_HIGH = 0x000701FC

    Regards

    Emanuele

  • Emanuele,

    That specific address is located in a section of TI-OTP Flash that includes intentional ECC errors so that safety-oriented applications can verify the functionality of the ECC logic.

    -Tommy

  • Tommy,

    The question is now why this happens.

    1. Could it be a problem with a pointer?
    2. Could it be a flash programming issue?

    I'm preparing another test:

    I will add 2 watchpoints (read/write) at location 0x701FC with the Blackhawk connected. I hope to find where this wrong access happens.

    The NMI is not raised in this condition, but if there is a memory access to this location, the CPU should halt.

    There is another interesting point; I found another post with more or less the same problem:

    https://e2e.ti.com/support/microcontrollers/c2000/f/171/t/767132?TMS320F28375D-Rarely-occuring-NMI-FLUNCERR-on-28375D

    Same event and same error address: 0x701FC...

    Do you know how they solved it? There is no other information.

    Regards

    Emanuele

     

  • Emanuele,

    Happy to have worked with you and your team on a long, productive debug call.  I hope it helped you to identify the debug solutions.  As aligned with your team offline, I am considering this closed.

    Here are some of the details for anyone that may refer to this post:

    1. The NMI event did not happen with the debugger connected because NMI was left disabled.  When working with the debugger, before testing the application after load, we suggest that users do a debug reset, free-run the boot ROM (BROM), restart and then execute.  This helps bring the execution close to standalone execution.  If BROM is not executed, NMI will not get enabled.

    2. This issue has nothing to do with the flash programming API or tools.  There is no issue identified with the quality of the programmed content.  The customer is not using the flash API.

    3. ECC is programmed and ECC-check is enabled in the application correctly.

    4. The user application is not reading TI-OTP intentionally; the read happens due to a design issue in the application.  The application will be updated to not read the intentional ECC error locations in C28x TI-OTP (0x701F8 - 0x701FF).  We program them in TI-OTP for any customers that would like to check the health of the SECDED logic at run-time without actually inserting errors in the application image.
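
    When triaging a captured error address, a quick range check against the locations in item 4 can tell an intentional ECC test location from a real fault in the application image. This is only a sketch; the macro and function names are hypothetical, and the range is the one quoted above.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Intentional ECC error locations in C28x TI-OTP, as quoted above. */
    #define TI_OTP_ECC_TEST_FIRST 0x701F8UL
    #define TI_OTP_ECC_TEST_LAST  0x701FFUL

    /* Returns true if addr lies inside the intentional-error range. */
    bool is_intentional_ecc_error_addr(uint32_t addr)
    {
        return (addr >= TI_OTP_ECC_TEST_FIRST) &&
               (addr <= TI_OTP_ECC_TEST_LAST);
    }
    ```

    The address captured in this thread, 0x701FC, falls inside this range, which points to a stray read by the application rather than corrupted flash content.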

    Thanks and regards,
    Vamsi