TMS320F28379D: Unknown reason for non-maskable watchdog reset

Part Number: TMS320F28379D

Dear TI experts,

I am trying to figure out the reason for a non-maskable watchdog reset (NMIWDRSn). Both CPU1 and CPU2 receive NMIWDRSn. CPU1 receives it because of an NMI on CPU2, but there is no obvious reason for the NMI on CPU2, since the NMISHDFLG register of CPU2 reads 0x0. Do you have any idea why CPU2 gets a non-maskable watchdog reset?

Thanks,

Sieghard

  • Sieghard,

    I am researching this issue. How often do you see this?

  • Sieghard,

    Can you try to log the BOOT STATUS flags at startup? If the device reboots after an NMIWD reset, it is possible for the boot ROM to clear recoverable NMI errors. The bits are defined in the TRM.

    -Tommy

  • Hi Tommy,

    I wrote an NMI handler that stores the NMI cause in the HIBBOOTMODE register and acknowledges the NMI. As a result, no NMIWD reset is triggered and the cause of the NMI can be seen: the "FLUNC" bit (flash uncorrectable error) is set on CPU2. Since CPU2 is responsible for the non-volatile error handling in my application, I erased the corresponding flash sectors, after which no more NMIs are triggered on CPU2.

    I have two questions:

    Is it possible that a voltage drop during an active flash write cycle can cause an uncorrectable flash error?

    Is it possible to trigger non-maskable interrupts deliberately? I need this to test the NMI handler.

    I also had a look at the BOOT-STATUS word you proposed, and it looks OK to me on both CPUs. Unfortunately it reflects the status of a correctly working system, with no NMI pending, since the suspicious flash sectors have already been erased.

    Best regards,

    Sieghard
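The handler described above can be sketched roughly as follows. This is a host-side model, not device code: the `NmiRegs` struct, the helper names, and the bit positions are assumptions made for illustration, and on the device the registers are memory-mapped and must be taken from the TRM. The point is only the sequence: save the cause somewhere that survives the reset, then acknowledge the NMI so the NMI watchdog does not expire.

```c
#include <stdint.h>

/* Assumed bit positions in the NMI flag register; verify against the TRM. */
#define NMI_FLAG_NMIINT   0x0001u
#define NMI_FLAG_FLUNCERR 0x0008u

/* Host-side stand-in for the memory-mapped registers (hypothetical layout). */
typedef struct {
    uint16_t nmiflg;       /* latched NMI causes */
    uint32_t hibbootmode;  /* reported in the thread to survive an NMIWD reset */
} NmiRegs;

/* Models a write-1-to-clear access: every bit set in mask is cleared. */
static void nmi_flag_clear(NmiRegs *regs, uint16_t mask)
{
    regs->nmiflg = (uint16_t)(regs->nmiflg & (uint16_t)~mask);
}

/* Hypothetical handler body: record the cause, then acknowledge the NMI. */
static void nmi_handler(NmiRegs *regs)
{
    uint16_t cause = regs->nmiflg;

    regs->hibbootmode = cause;  /* keep the cause for inspection after reboot */

    /* This sketch clears the individual cause bits first and NMIINT last. */
    nmi_flag_clear(regs, (uint16_t)(cause & ~NMI_FLAG_NMIINT));
    nmi_flag_clear(regs, NMI_FLAG_NMIINT);
}
```

With `nmiflg` holding NMIINT and FLUNCERR on entry, the model leaves the saved word equal to those two bits and the flag register empty.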

  • Hi Hareesh,

    I saw this reproducibly, every time the hardware booted. I found out that the reason for the NMI is an uncorrectable flash error on CPU2. An NMI handler stores the NMI flag register in the HIBBOOTMODE register, which is not altered by an NMIWD reset.

    Best regards,

    Sieghard
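Once the flag word has been saved across the reset, it still has to be decoded at startup. A minimal sketch of such a decoder is shown below; the table is deliberately partial, and the bit positions and the name `describe_nmi_cause` are assumptions for illustration that should be checked against the NMIFLG description in the TRM.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint16_t    mask;
    const char *name;
} NmiFlagName;

/* Partial table with assumed bit positions; extend it from the TRM. */
static const NmiFlagName kNmiFlagNames[] = {
    { 0x0001u, "NMIINT"    },
    { 0x0002u, "CLOCKFAIL" },
    { 0x0004u, "RAMUNCERR" },
    { 0x0008u, "FLUNCERR"  },
};

/* Writes the names of all recognised flags in 'saved' into 'out',
 * separated by spaces, and returns how many names were written.
 * A name that does not fit completely is dropped. */
static int describe_nmi_cause(uint16_t saved, char *out, size_t out_len)
{
    size_t used = 0;
    int count = 0;

    if (out_len == 0) {
        return 0;
    }
    out[0] = '\0';

    for (size_t i = 0; i < sizeof kNmiFlagNames / sizeof kNmiFlagNames[0]; ++i) {
        if ((saved & kNmiFlagNames[i].mask) == 0u) {
            continue;
        }
        int n = snprintf(out + used, out_len - used, "%s%s",
                         count ? " " : "", kNmiFlagNames[i].name);
        if (n < 0 || (size_t)n >= out_len - used) {
            out[used] = '\0';  /* discard the partially written name */
            break;
        }
        used += (size_t)n;
        ++count;
    }
    return count;
}
```

Calling it on the word read back from the persistent register at boot gives a readable cause for a log or debug console.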

  • Sieghard,

    Please check whether this applies to your debugging: search for "Why is a NMI (due to double bit errors) occurring when Fapi_setActiveFlashBank() is called on F2837xD CPU2?" in the flash wiki at https://processors.wiki.ti.com/index.php/C2000_Flash_FAQ

    Regarding forcing an NMI: yes, it can be forced. Please look at NMIFLGFRC in the TRM.

    Thanks and regards,

    Vamsi

  • Sieghard Kotz said:
    Is it possible that a voltage drop during an active flash write cycle can cause an uncorrectable flash error?

    My understanding is that unstable voltage during flash programming can result in data corruption, so it seems reasonable that a voltage drop during a flash write might result in uncorrectable errors.

    Sieghard Kotz said:
    Is it possible to trigger non-maskable interrupts deliberately? I need this to test the NMI handler.

    In addition to Vamsi's recommendation, there is also an STL_Flash_detectECCError() function that is used in the diagnostics library to generate real Flash ECC errors.

  • Sieghard,

    Regarding the voltage dip: did you notice a flash operation (erase/program/verify) failure? If there is a voltage dip during flash programming, the flash operation should fail. Since you did not indicate any such failure, and based on the information given in this thread, I suspected that ECC was not programmed correctly for the main array (which would cause errors when it is read). Hence, I first wanted to rule out any application design issue before debugging the voltage dip.

    Regarding the NMI ISR check: are you trying to make sure that the NMI ISR is designed and configured correctly to catch and identify the specific reason for an NMI, or are you trying to implement a run-time diagnostic test for the SECDED module? Please clarify.

    Thanks and regards,

    Vamsi

  • Hi Vamsi,

    I am trying to ensure that the NMI ISR is working correctly. Thanks for the hint about the NMIFLGFRC register; it works fine and the NMI ISR is executed as expected, although one has to set NMIE to '1' in the NMICFG register when loading the code with the debugger. I assume the boot ROM sequence sets NMIE to '1'. This explains why I was able to work with the debugger without getting an NMI.

    Regarding the voltage drop: I did not notice any erase/program/verify failure. The suspicious flash sectors do not contain any executable code. They contain information about the system that is recorded when certain conditions are met. I assume that a power cycle during a write or erase operation on these flash sectors caused an ECC error.

    Thanks for your help,

    Sieghard
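The interaction between NMIE and NMIFLGFRC reported above can be illustrated with a small host-side model. The struct, the function name `force_nmi`, and the bit positions are assumptions for illustration only. In particular, the model simply ignores a forced flag while NMIE is clear, which is a simplification; what the hardware actually latches in that case should be checked in the TRM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against the TRM. */
#define NMICFG_NMIE       0x0001u
#define NMI_FLAG_NMIINT   0x0001u
#define NMI_FLAG_FLUNCERR 0x0008u

/* Host-side stand-in for the NMI module registers (hypothetical layout). */
typedef struct {
    uint16_t nmicfg;  /* holds the NMIE enable bit */
    uint16_t nmiflg;  /* latched NMI causes */
} NmiForceRegs;

/* Models a write to NMIFLGFRC. Returns true if the model raises an NMI.
 * Simplification: with NMIE clear the write has no effect at all. */
static bool force_nmi(NmiForceRegs *regs, uint16_t flags)
{
    if ((regs->nmicfg & NMICFG_NMIE) == 0u) {
        return false;  /* matches the debugger-load case from the thread */
    }
    regs->nmiflg = (uint16_t)(regs->nmiflg | flags | NMI_FLAG_NMIINT);
    return true;
}
```

A handler test built on this idea would first set NMIE explicitly, because a debugger load may skip the boot ROM sequence that normally enables it, and only then force the flag of interest.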

  • Hi Tommy,

    Thanks for the hint to use the STL_Flash_detectECCError() function. I need to check that. Basically, the system's behaviour is now much clearer to me: you need an NMI ISR that deals with incoming NMIs. Storing the NMIFLG register in the NMI ISR showed me that the reason for the NMI was a FLUNCERR (flash uncorrectable error). After I erased the flash sectors that I suspected of causing the problem, no more NMIs were fired.

    Thanks for your support and best regards,

    Sieghard

  • Sieghard,

    Yes, BROM enables the NMIE bit.

    Thanks and regards,

    Vamsi