TMS320F28377S: Code placed at end of flash bank 0 causes NMI interrupt (in 0x000B FFF0 - 0x000B FFFF range)

Part Number: TMS320F28377S


Hi 

We have a product in production that is causing us some problems. It is based on the TI 28377S microcontroller.

We are using TI v18.1.4.LTS toolchain, SYS/BIOS 6.75.0.15, XDCtools 3.51.1.18_core.

With the JTAG connected, everything seems to work fine, so maybe the GEL file is doing some magic that our code is not. The error comes and goes with code changes, so it is not always located in this piece of code.

I have managed to reproduce the error by placing a while loop early in the boot process (WD disabled), then connecting to the running target and continuing the "boot" process. This let me catch the NMI interrupt and look at the call stack. Below is a picture of this:

As you may have noticed, the failure seems to occur around the memcpy in the getCrcFromSection() function. This normally works if I use the JTAG with the standard 28377s GEL file to initialize the CPU, so this code should be fine.

What I have noticed is that this code is placed across the 0x000C 0000 boundary, which is the beginning of flash bank 1. Currently we have one big flash range in our linker cmd file that covers both flash bank 0 and 1 (this will probably change). I have looked at the errata for the CPU (sprz422i), specifically the advisory "Memory: Prefetching Beyond Valid Memory", and according to it there are no limitations on the boundary between these two flash banks for this CPU.

We have initialized both flash banks with InitFlash_Bank1() and InitSysCtrl() (the latter calls InitFlash_Bank0()).

I have tried setting a breakpoint just before sectionFromName(), and the entire call stack and all variables seem fine. When I step over sectionFromName()/memcpy(), the NMI interrupt is triggered and the call stack looks as pictured earlier.

Is what we are seeing a pipeline issue related to the above mentioned errata or something else?

An update:
I have tried manually placing the getCrcFromSection() function before and after the boundary between flash bank 0 and 1.

If I completely avoid the 0x000B FFF0 - 0x000B FFFF range, which should NOT be necessary for this CPU according to the errata, everything seems to be working fine.

If I place part of the getCrcFromSection() function in the above range, an NMI exception occurs.

Is the errata wrong?

  • Hi Mads Lind,

    Thank you for bringing this to our attention.

    The GEL file disables ECC.  Did you notice an ECC error and/or ITRAP when this failure occurs?  Also, please check the values of RESC, NMIFLG and the flash ECC registers (UNC_ERR_ADDR_LOW and UNC_ERR_ADDR_HIGH in the TRM).

    I will check with the advisory author to confirm whether the advisory applies to the end of bank0 or not (even though another bank follows immediately, the flash wrapper is different; I will ask in this context).  I hope you configured the wait states correctly for both banks.

    Thanks and regards,

    Vamsi

  • Hi Vamsi

    Many thanks for the quick response.

    I just checked the registers you suggested.

    NMIFLG = 0x0009:   NMIINT and FLUNCERR (Flash Uncorrectable Error NMI Flag)

    RESC = 0xC0000000: TRSTn_pin_status and XRSn_pin_status are 1. I believe this is normal for a normal power-on.

    Flash0EccRegs:
    UNC_ERR_ADDR_LOW = 0x000C0000
    UNC_ERR_ADDR_HIGH = 0

    Flash1EccRegs:
    UNC_ERR_ADDR_LOW = 0
    UNC_ERR_ADDR_HIGH = 0

    So does it look like flash bank 0 reports a fault at the start address of flash bank 1?
    Maybe because, as you say, the flash wrapper is different?

    Both flash banks are configured correctly.
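
    The register values above can be decoded mechanically. The following is a minimal sketch, assuming the two NMIFLG masks implied by the reported value (0x0009 = NMIINT plus FLUNCERR); the type and function names are hypothetical and not part of any TI header, and the masks should be checked against the TRM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks inferred from the reported value 0x0009 = NMIINT | FLUNCERR.
 * Assumption: verify against the NMIFLG description in the TRM. */
#define NMIFLG_NMIINT    0x0001u
#define NMIFLG_FLUNCERR  0x0008u

/* First address of flash bank 1, as stated earlier in this thread. */
#define FLASH_BANK1_START UINT32_C(0x000C0000)

/* Hypothetical summary type for illustration only. */
typedef struct {
    bool nmi_latched;          /* NMIINT bit set */
    bool flash_uncorrectable;  /* FLUNCERR bit set */
    bool addr_is_bank1_start;  /* reported address equals start of bank 1 */
} nmi_summary_t;

static nmi_summary_t summarize_nmi(uint16_t nmiflg, uint32_t unc_err_addr_low)
{
    nmi_summary_t s;
    s.nmi_latched         = (nmiflg & NMIFLG_NMIINT) != 0u;
    s.flash_uncorrectable = (nmiflg & NMIFLG_FLUNCERR) != 0u;
    s.addr_is_bank1_start = (unc_err_addr_low == FLASH_BANK1_START);
    return s;
}

/* Usage check with the values reported in this post. */
static void check_reported_values(void)
{
    nmi_summary_t s = summarize_nmi(0x0009u, UINT32_C(0x000C0000));
    assert(s.nmi_latched && s.flash_uncorrectable && s.addr_is_bank1_start);
}
```

    With the values from this post, all three fields come out true: an NMI is latched, it is flagged as a flash uncorrectable error, and the address reported by the bank 0 wrapper is the first word of bank 1.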

  • Mads Lind,

    Thank you for checking those registers.

    Instead of executing the application, can you load simple code into RAM to read the entire flash (both banks) with ECC enabled?  Please don't erase the application in flash. Keep it as is and just check whether ECC errors show up for reads.

    This helps to debug this further.

    Thanks and regards,
    Vamsi

  • Hi Vamsi

    I added a function in ram to the application, so not a separate application. Hope this is good enough.

    I executed it after running InitSysCtrl etc., so that the ramfuncs have been copied from flash to RAM.

    #ifdef __TI_COMPILER_VERSION__
        #if __TI_COMPILER_VERSION__ >= 15009000
            #pragma CODE_SECTION(".TI.ramfunc");
        #else
            #pragma CODE_SECTION("ramfuncs");
        #endif
    #endif
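    void flashReadTest()
    {
        // Word addresses spanning flash bank 0 and bank 1 (end is exclusive).
        volatile uint16_t *flash_start = (uint16_t *) 0x00080000;
        volatile uint16_t *flash_end   = (uint16_t *) 0x00100000;
    
        uint32_t sum = 0;   // only there to consume each read
    
    //    ESTOP0;
    
        // Data-read every 16-bit word; the pointers are volatile so no read is optimized away.
        while (flash_start != flash_end)
        {
            sum += *flash_start++;
        }
    
    //    ESTOP0;
    }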

    The function executed without any problems.

  • Mads Lind,

    Thank you for the test and result.

    Is ECC check enabled during this test execution for both flash wrappers?

    While we discuss this with our design team, for now please avoid the last 256 bits of bank0 in your linker cmd file.
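
    As a quick sanity check of what "the last 256 bits" means in word addresses, here is a small sketch. It assumes 16-bit addressable words (as on the C28x) and that bank 0 ends at 0x000BFFFF, one word before the bank 1 start address mentioned in this thread; the macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_WORD   16u    /* the C28x addresses memory in 16-bit words */
#define RESERVED_BITS   256u   /* size of the region to avoid, per the advice above */
#define RESERVED_WORDS  (RESERVED_BITS / BITS_PER_WORD)

/* Assumption: last word of bank 0, since bank 1 starts at 0x000C0000. */
#define BANK0_LAST_ADDR UINT32_C(0x000BFFFF)

/* First word address of the reserved tail, given the bank's last address. */
static uint32_t reserved_tail_first(uint32_t bank_last_addr)
{
    return bank_last_addr - (uint32_t)(RESERVED_WORDS - 1u);
}

/* Usage check: 256 bits are 16 words, so the tail starts at 0x000BFFF0. */
static void check_bank0_tail(void)
{
    assert(RESERVED_WORDS == 16u);
    assert(reserved_tail_first(BANK0_LAST_ADDR) == UINT32_C(0x000BFFF0));
}
```

    This gives 0x000B FFF0 - 0x000B FFFF, the same range named in the title of this thread.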

    Thanks and regards,
    Vamsi

  • Hi Vamsi

    Yes, ECC is enabled for both flash banks.
    So the conclusion could be that data reads are OK for the entire flash area, but program execution is not OK at the end of flash bank 0.

    I have now taken measures to avoid the area in our linker file.

    Many thanks for quick support :-)

  • Mads Lind,

    Thank you for the confirmation.

    I will update here once I get more details from our design team.  I am closing this post for now.

    Clarification: 

    For fetches: When prefetch is enabled, FMC0 fetches beyond the bank0 space into bank1, which is not accessible to FMC0, and hence the fetch fails.

    Data reads: Data cache fills do not go beyond the current 128-bit flash word (aligned on a 128-bit boundary).  Hence, data reads should not cause an issue.

    The above is what I wanted to confirm from your experiment: that this issue is due to prefetch beyond bank0 by FMC0 and not a real ECC error (the ECC error that you got is not an actual ECC error; it is due to the prefetch returning an incorrect opcode).
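
    The data-read case can be illustrated with a bit of address arithmetic. This is only a sketch: it assumes a 128-bit flash word is 8 consecutive 16-bit words aligned on an 8-word boundary, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one 128-bit flash word = 8 words of 16 bits each. */
#define FLASH_LINE_WORDS 8u

/* First word address of the aligned 128-bit flash word containing addr. */
static uint32_t flash_line_first(uint32_t addr)
{
    return addr & ~(uint32_t)(FLASH_LINE_WORDS - 1u);
}

/* Last word address of the aligned 128-bit flash word containing addr. */
static uint32_t flash_line_last(uint32_t addr)
{
    return flash_line_first(addr) + (uint32_t)(FLASH_LINE_WORDS - 1u);
}

/* Usage check: a data read of the very last word of bank 0 fills from
 * 0x000BFFF8 to 0x000BFFFF and never touches bank 1 at 0x000C0000. */
static void check_last_word_of_bank0(void)
{
    assert(flash_line_first(UINT32_C(0x000BFFFF)) == UINT32_C(0x000BFFF8));
    assert(flash_line_last(UINT32_C(0x000BFFFF)) < UINT32_C(0x000C0000));
}
```

    Because the bank boundary is itself aligned to 8 words, no aligned 128-bit fill can straddle it, which matches the read test passing while instruction prefetch fails.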

    Best regards,

    Vamsi

     

  • FYI for others that may refer to this post: The last 256 bits of both bank0 and bank1 should be reserved in this device.  We will update the errata to reflect this.
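
    For anyone who wants to check a map file against this, the rule can be expressed as a small overlap test. This is a sketch under assumptions: the two reserved ranges are derived from the bank addresses used in this thread (bank 0 ending at 0x000BFFFF, bank 1 ending at 0x000FFFFF), and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive word-address range. */
typedef struct {
    uint32_t first;
    uint32_t last;
} addr_range_t;

/* Assumption: last 16 words (256 bits) of each flash bank. */
static const addr_range_t reserved_tails[] = {
    { UINT32_C(0x000BFFF0), UINT32_C(0x000BFFFF) },  /* end of bank 0 */
    { UINT32_C(0x000FFFF0), UINT32_C(0x000FFFFF) },  /* end of bank 1 */
};

/* True if a section of length_words starting at start touches a reserved tail. */
static bool overlaps_reserved_tail(uint32_t start, uint32_t length_words)
{
    size_t i;
    uint32_t last;

    if (length_words == 0u) {
        return false;  /* an empty section cannot overlap anything */
    }
    last = start + (length_words - 1u);
    for (i = 0; i < sizeof reserved_tails / sizeof reserved_tails[0]; i++) {
        if (start <= reserved_tails[i].last && last >= reserved_tails[i].first) {
            return true;
        }
    }
    return false;
}

/* Usage check: a function straddling the bank boundary is flagged. */
static void check_straddling_function(void)
{
    assert(overlaps_reserved_tail(UINT32_C(0x000BFFE0), 0x40u));
    assert(!overlaps_reserved_tail(UINT32_C(0x000C0000), 0x40u));
}
```

    Running each code section's start address and length from the map file through such a check would flag any placement like the getCrcFromSection() case discussed earlier in the thread.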

    Thanks and regards,
    Vamsi

  • Mads Lind,

    Do you have any further questions on this?  Or can I close this post?  Please let me know.

    Thanks and regards,
    Vamsi

  • Everything is great, so please go ahead and close this :-)

    And again many thanks for the quick response.

  • Mads Lind,

    Thank you for the confirmation.

    Best regards,
    Vamsi