RM42: flash erasure causes system reset

Other Parts Discussed in Thread: HALCOGEN

Hi guys,

I'm developing a bootloader for an RM42-based board. The flow is inspired by TI's SPI-bootloader application note (spna194), but I have a problem:

After the first firmware upload, the system resets during the flash erase step in the bootloader.

I hit a breakpoint at address 0x00000000 (resetEntry), i.e. the system resets because of an exception/hard fault.

I would like your help with diagnosing the error :)


I have tried looking at the ESM-registers at the time of the reset. 

  • ESMSR1 is 0x0000 0080 (group1, bit7)
  • ESMSSR2 is 0x0001 0000 (group2, bit16)
  • ESMEPSR is 0x0000 0000 (error pin driven low)
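
For anyone decoding such status words by hand, here is a minimal sketch that lists which channel bits are set in a 32-bit ESM status value. The function names are made up for illustration, and the sketch only extracts bit positions; what each channel means on the RM42 has to be looked up in the datasheet.

```c
#include <stdint.h>
#include <stdio.h>

/* Collect the indices of all set bits in a 32-bit ESM status word.
 * Returns the number of channels written to out[] (at most 32).
 * Hypothetical helper, not part of any TI library. */
int esm_flagged_channels(uint32_t status, int out[32])
{
    int count = 0;
    for (int bit = 0; bit < 32; ++bit) {
        if (status & (UINT32_C(1) << bit)) {
            out[count++] = bit;
        }
    }
    return count;
}

/* Print one line per flagged channel for the given ESM group. */
void esm_print_group(int group, uint32_t status)
{
    int channels[32];
    int n = esm_flagged_channels(status, channels);
    for (int i = 0; i < n; ++i) {
        printf("ESM group %d: channel %d flagged\n", group, channels[i]);
    }
}
```

Calling `esm_print_group(1, 0x00000080u)` and `esm_print_group(2, 0x00010000u)` reports channel 7 and channel 16 respectively, matching the values listed above.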

As far as I can tell from the datasheet, this could indicate an unrecoverable ECC error. That seems plausible to me for the following reasons:

  • when erasing, I don't think I clear the ECC bits, so reading back to check whether the erase was successful (à la Fapi_Erase_Check) will generate a lot of ECC errors.
  • before the first erasure, the application area is programmed to 1's (0xFF, 0xFF...), so the ECC bits are correct until I erase them for the first time.

I have tried disabling Flash ECC when erasing/verifying, to make sure the read-backs don't generate ECC errors, but I am not sure this is the right avenue.

When doing the erase, IRQ and FIQ are disabled, the system is in privileged mode, and I've built my bootloader from TI example code from Zhaohong (i.e. the program/erase flow follows the instructions in the application note). Also, I'm using CCS 5.5.0 to generate the ECC bits automatically.

How can I diagnose a system reset that occurs during flash erasure? 

If you need more information to help me (or code snippets) I'll gladly provide it :)

Kind regards

Mikkel Johnsen

  • Mikkel,

    I would like some more information.

    Where is the code that is doing the erase executing from (RAM, Flash)?

    What Bank and sectors are you trying to erase that is causing the reset?

  • Hi John,

    You helped me get the bootloader up and running in this topic: http://e2e.ti.com/support/microcontrollers/hercules/f/312/t/297699.aspx just FYI


    >Where is the code that is doing the erase executing from (RAM, Flash)?

    The bootloader module is copied to RAM, so the erase is performed from RAM. I can't fit all of the bootloader in RAM, so only the module that handles the Flash is executed from RAM. 

    >What Bank and sectors are you trying to erase that is causing the reset?

    The application resides at 0x8000 and when I erase the flash, I erase all sectors above 0x7FFF, so bank 0, sectors 4 - 14.

    I've tried inserting debug variables and checking the values through the debugger upon reset, and sometimes the reset occurs when erasing sector 5, sometimes sector 6, etc. The data ends up being erased after a few resets though.

  • Hello Mikkel,

    The reason you are seeing the abort is due to the fact that the system vectors (reset, abort, etc) start at address 0.  When the erase is executing on Bank 0, it is not available for reading.  If any kind of interrupt occurs during the erase, the attempt to read the vector will cause the system reset you are seeing.

    Ways to work around this are doing a RAM / Flash memory swap or possibly overlaying the vectors with RAM using the POM module (this is a working theory that has not been completely fleshed out here).

  • Hi John,

    I'm not sure I understand you correctly, so let me clarify:

    I disable IRQ while erasing and programming and I haven't used the FIQ-channel in the bootloader, so I think I've guarded myself against that possibility.

    >If any kind of interrupt occurs during the erase, the attempt to read the vector will cause the system reset

    Is this still an issue with IRQs disabled while erasing/programming?

    The error doesn't occur while programming, only erasing. 

  • I am not sure on the interrupt issue.  I have asked someone else to respond on that.  I have, however, thought of one more scenario that would cause this reset.  Any attempt to execute code from Bank 0 while the erase is executing will also cause this issue.

  • Hi Mikkel,

    Also to John's point on the interrupts - the FIQ is non-maskable on these products.  Once you enable the FIQ you can't disable it again except by going through another reset.  


    Would it be possible for you to send us the value of the CPSR register when you're beginning to execute the bootloader code that you have from RAM so that we can confirm the FIQ is disabled?

  • @John Hall

    >Any attempt to execute code from Bank 0 while the erase is executing will also cause this issue.

    I've tried my best to encapsulate the flash-facing code from the rest of the system, so that I only execute code from RAM during the flash erase/programming. I'm not loading everything to RAM, just the module that handles the flash.

    The programming step is done in the same manner, and it hasn't failed yet. I am pretty sure that I am not executing any code that resides in flash while writing and erasing, but can I check some register post-mortem to diagnose the reset any further ?


    @Anthony F. Seely

    >Once you enable the FIQ you can't disable it again except by going through another reset.  

    I can confirm that FIQ is never enabled and thus permanently disabled in the bootloader :)

    >Would it be possible for you to send us the value of the CPSR register when you're beginning to execute the bootloader code that you have from RAM

    The value of CPSR after branching to RAM (before disabling IRQ) is: 0x600001F3

    After disabling IRQ CPSR is: 0x400001F3
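
For readers who want to decode CPSR values like these, here is a minimal sketch based on the standard ARM CPSR layout (A = bit 8, I = bit 7, F = bit 6, T = bit 5, mode = bits 4:0). The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR fields that matter for interrupt masking (hypothetical helper type). */
typedef struct {
    bool async_abort_masked; /* A bit (bit 8): 1 = asynchronous aborts masked */
    bool irq_masked;         /* I bit (bit 7): 1 = IRQ disabled */
    bool fiq_masked;         /* F bit (bit 6): 1 = FIQ disabled */
    bool thumb;              /* T bit (bit 5): 1 = Thumb state */
    uint32_t mode;           /* M[4:0], e.g. 0x13 = Supervisor */
} cpsr_fields;

/* Split a raw CPSR value into the fields above. */
cpsr_fields cpsr_decode(uint32_t cpsr)
{
    cpsr_fields f;
    f.async_abort_masked = (cpsr & (UINT32_C(1) << 8)) != 0;
    f.irq_masked         = (cpsr & (UINT32_C(1) << 7)) != 0;
    f.fiq_masked         = (cpsr & (UINT32_C(1) << 6)) != 0;
    f.thumb              = (cpsr & (UINT32_C(1) << 5)) != 0;
    f.mode               = cpsr & UINT32_C(0x1F);
    return f;
}
```

Both 0x600001F3 and 0x400001F3 decode to I = 1, F = 1, T = 1 and mode 0x13 (Supervisor). The two values differ only in bit 29, which is the carry condition flag, not an interrupt mask bit.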

  • After you issue the erase, are you waiting for it to finish or do you check that at a later time?

    Are you running with the debugger connected viewing the memory in Flash Bank 0 and the memory window tries to refresh itself?

  • Mikkel,

    OK, so it looks like you already had both IRQ and FIQ disabled, and disabling IRQ didn't actually change anything.

    So that I think rules out interrupts as a source of the problem.

    Do you think that somewhere your code might be branching back into flash or reading data from flash?

    Can you functionally separate your code so it 'should' always stay executing from the RAM?  If so then I think with some watchpoints we could trap the place where it accesses the flash...  Have you experimented with the watchpoint capability of the device at all before?

     

  • Mikkel,

    It seems like a watchdog servicing issue. Do you have the digital watchdog enabled in your system? Do you service the watchdog as part of the flash erase routine? I believe the API does have functions that allow you to periodically service the watchdog so that you don't get a system reset in the middle of a flash erase/program process.

    You can check the cause of the system reset by reading the SYSESR register as part of your reset handler. This register is at address 0xffffffe4.

    Regards,

    Sunil

  • @John Hall

    >Are you running with the debugger connected viewing the memory in Flash Bank 0 and the memory window tries to refresh itself?

    No, this occurs even with the JTAG disconnected.


    >After you issue the erase, are you waiting for it to finish or do you check that at a latter time?

    To the best of my knowledge, I wait for the operation to complete and for the Flash chip to become ready again, before continuing.
    As we have discussed in the forum post I linked to above, the erase flow is like this (simplified):

    Fapi_initializeAPI()
    Fapi_setActiveFlashBank()

    For each sector to erase {
        while( Fapi_checkFsmForReady() == Fapi_Status_FsmBusy ); /* block until ready */
        Fapi_issueAsyncCommandWithAddress()
    }

    while( Fapi_checkFsmForReady() == Fapi_Status_FsmBusy ); /* block until ready */
    status = Flash_Erase_Check() /* read back to verify erase completed */
    return status

    I can post the actual code I am using as well, but I thought a simplification would be lighter on the eyes :) ?
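
To make the ordering in the flow above concrete, here is a small host-side sketch that simulates it. Everything in it is hypothetical: `sim_fsm`, `sim_fsm_busy()` and `sim_issue_erase()` are stand-ins for the flash state machine and are not the F021 API. The point is only the sequence: poll until ready, issue one erase, and poll again after the last sector before anything reads the flash.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the flash state machine: it reports busy for a
 * fixed number of polls after each erase command. */
typedef struct {
    int busy_polls_left;     /* polls remaining until the FSM is ready */
    int erases_issued;       /* number of erase commands accepted */
    int commands_while_busy; /* protocol violations: command sent while busy */
} sim_fsm;

/* Returns true while the simulated erase is still in progress. */
bool sim_fsm_busy(sim_fsm *f)
{
    if (f->busy_polls_left > 0) {
        f->busy_polls_left--;
        return true;
    }
    return false;
}

/* Issue a simulated sector erase; the address is ignored by the simulation. */
void sim_issue_erase(sim_fsm *f, uint32_t sector_start)
{
    (void)sector_start;
    if (f->busy_polls_left > 0) {
        f->commands_while_busy++;
    }
    f->erases_issued++;
    f->busy_polls_left = 3;
}

/* Erase `count` sectors in order. Returns 0 if no command was ever issued
 * while the FSM was busy, -1 otherwise. */
int erase_sectors(sim_fsm *f, const uint32_t *sector_starts, int count)
{
    for (int i = 0; i < count; ++i) {
        while (sim_fsm_busy(f)) { /* block until ready */ }
        sim_issue_erase(f, sector_starts[i]);
    }
    while (sim_fsm_busy(f)) { /* block until the last erase completes */ }
    return (f->commands_while_busy == 0) ? 0 : -1;
}
```

On the target, the loop body and any data it reads would also have to live in RAM for the duration of the erase.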

    @Anthony F. Seely
    >Do you think that somewhere your code might be branching back into flash or reading data from flash?
    No. I'm using what I can from the provided bootloader examples and all the code that is running when erasing/programming is copied to RAM at startup.

    >Have you experimented with the watchpoint capability of the device at all before?
    Not at all. Can you help me setup a watchpoint for access to flash ?

    @Sunil Oak
    >It seems like a watchdog servicing issue. Do you have the digital watchdog enabled in your system?
    The watchdog is not enabled for the bootloader.

    >You can check the cause of the system reset by reading the SYSESR register as part of your reset handler. This register is at address 0xffffffe4.

    Thanks a lot, I will look into that :)

    Thanks for your help so far :)

    Kind regards

    /Mikkel

  • Hello Mikkel,

    Fapi_checkFsmForReady() will not block until the erase is done.  You would need to write this as:

    v1.x style

    while(Fapi_checkFsmForReady() == Fapi_Status_FsmBusy);

    v2.x style

    while(FAPI_CHECK_FSM_READY_BUSY ==  Fapi_Status_FsmBusy);

  • @John Hall

    >Fapi_checkFsmForReady() will not block until the erase is done.  You would need to write this as:

    Yeah, I am doing that already, sorry if the pseudo-code above didn't reflect that properly. I've edited it for clarity.

    I am using Fapi 1.51 BTW.

  • Do you know if the reset is occurring while waiting for the erase to complete (while loop) or after the erase completes?

  • @John Hall

    >Do you know if the reset is occurring while waiting for the erase to complete (while loop) or after the erase completes?

    I made a debug-version of Fapi_Block_Erase where I set a static variable to an increasing value at every step, and when the system reset I read it back through the debugger/memory browser in CCS.

    do {
        dbg_trace_var = 10;
        while( Fapi_checkFsmForReady() == Fapi_Status_FsmBusy )
        {
          dbg_trace_var = 11;
        }
        dbg_trace_var = 12;
        Fapi_issueAsyncCommandWithAddress(Fapi_EraseSector, eraseStartAddr);
        dbg_trace_var = 13;
        remaining -= flash_sector[j++].length;
        eraseStartAddr = flash_sector[j].start;
    } while(remaining > 0);
    dbg_trace_var = 14;

    The variable 'dbg_trace_var' is 13 after a reset - which means:

    The crash occurs right after calling Fapi_issueAsyncCommandWithAddress() - unless I am mistaken? :)


    @Sunil Oak
    >You can check the cause of the system reset by reading the SYSESR register as part of your reset handler. This register is at address 0xffffffe4.

    After a reset, the 32-bit value at 0xFFFFFFE4 is 0x00000018, which indicates some kind of software reset as far as I can tell?
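
A value like this can be decoded with a small table lookup. The sketch below is hedged: the bit positions (15 = power-on, 14 = oscillator failure, 13 = watchdog, 5 = CPU reset, 4 = software reset, 3 = external reset) are an assumption based on the SYSESR layout as I recall it from the Hercules documentation and HALCoGen headers, so please verify them against the RM42 TRM. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t mask;
    const char *name;
} reset_flag;

/* ASSUMED SYSESR bit assignments; check the device TRM before relying on them. */
static const reset_flag k_reset_flags[] = {
    { UINT32_C(1) << 15, "power-on reset" },
    { UINT32_C(1) << 14, "oscillator failure reset" },
    { UINT32_C(1) << 13, "watchdog reset" },
    { UINT32_C(1) << 5,  "CPU reset" },
    { UINT32_C(1) << 4,  "software reset" },
    { UINT32_C(1) << 3,  "external reset (nRST)" },
};

/* Write a comma-separated list of the reset causes flagged in `value` into
 * buf and return how many causes were written. */
int sysesr_describe(uint32_t value, char *buf, size_t len)
{
    int found = 0;
    size_t used = 0;
    if (len > 0) {
        buf[0] = '\0';
    }
    for (size_t i = 0; i < sizeof k_reset_flags / sizeof k_reset_flags[0]; ++i) {
        if ((value & k_reset_flags[i].mask) == 0) {
            continue;
        }
        int n = snprintf(buf + used, len - used, "%s%s",
                         found ? ", " : "", k_reset_flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            break; /* output buffer too small; stop here */
        }
        used += (size_t)n;
        ++found;
    }
    return found;
}
```

Under these assumed bit positions, 0x00000018 decodes to two flags: software reset and external reset.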

  • If the crash is occurring where you think it is, then it is something external to the Flash System.

    Brainstorming here:

    Are you using the MPU?

    Do you have any kind of periodic routine executing (NHET, DMA, EMIF, etc)?

  • Mikkel,

    The SYSESR value indicates a system reset that is caused by the application software itself. Do you cause a system reset in your code by writing to the SYSECR (exception control register)? How is this function triggered? I suspect that this reset is asserted in response to the ECC livelock detected (indicated by ESM group2 channel 16).

    If so, then you will have to ensure that the ECC checking is either disabled or that the ECC values are maintained correct while ECC checking is enabled.

    Regards, Sunil

  • @Sunil Oak

    >How is this function triggered? I suspect that this reset is asserted in response to the ECC livelock detected (indicated by ESM group2 channel 16).

    You are right about that. The reset is caused by a software reset in the prefetch hard fault handler. I suspect the hard-fault is from an ECC error as well. I've tried disabling ECC before erase and re-enabling it afterwards, but I still get the hard-fault.

    What I did to disable/re-enable ECC checks was to call

    _coreDisableFlashEcc_()

    ... before beginning the erase/verify-cycle and calling

    _coreEnableFlashEcc_()

    ... before returning from the erase-function

    Do I need to do anything else in order to disable Flash ECC ?

    It makes sense to me that the error could be an ECC error. When I've programmed the application area for the first time and I want to erase the application-area before the second firmware-upload, I get the error.

  • Mikkel,

    These are the correct functions to use for disabling/enabling ECC. You should enable ECC checking only after all of flash has its corresponding ECC locations correctly programmed as well. Erasing the flash means that both the data and ECC locations are erased (all Fs), so any read from this memory with ECC checking enabled will cause ECC errors.

    Do you have reads from the flash memory after it has been erased? Are these reads intentional or for servicing some interrupt? You do need to track this down.

    Also, you can delay enabling the ECC checking until the second firmware has been uploaded.

    Regards, Sunil

  • @Sunil Oak

    >Do you have reads from the flash memory after it has been erased? Are these reads intentional or for servicing some interrupt? You do need to track this down.

    After I erase a sector, I verify the erasure by reading the sector one word at a time to see if there are any 0-bits (i.e. not all Fs). 

    However I disable interrupts and flash ECC before reading back the erased area, so I'm not sure what's causing the error. The ECC error seems plausible, but if I disable flash ECC entirely in the startup-code, I still get the error.
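
The word-by-word blank check described above can be sketched roughly as follows. This is a generic illustration with a hypothetical function name, not the code from the TI example; on the target it would need to run from RAM, and reads of erased flash are subject to the ECC behaviour discussed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan `words` 32-bit words starting at `start`.
 * Returns the index of the first word that contains a 0 bit (i.e. is not
 * 0xFFFFFFFF), or `words` if the whole range reads back as erased. */
size_t first_non_blank_word(const volatile uint32_t *start, size_t words)
{
    for (size_t i = 0; i < words; ++i) {
        if (start[i] != UINT32_C(0xFFFFFFFF)) {
            return i;
        }
    }
    return words;
}
```

A caller would treat a return value equal to `words` as a successful erase and anything smaller as the offset of the first failing word.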

    I see a couple of possibilities:

    1. accessing flash while erasing
    2. ECC error while verifying erasure
    3. MPU is causing the error


    1) I explicitly don't execute code from flash while erasing (which is where the error occurs). I also have both IRQ and FIQ disabled, so I don't think an interrupt is accessing the flash either. I do not run ALL my code from RAM however, so after the erasure, I return to executing code from flash. So I read from the flash after each successive sector erase, after waiting for the Flash to get ready/not busy.

    As far as I can tell from my discussions with John Hall, this is allowed behaviour ?

    2) I've tried both disabling/enabling flash ECC before/after erasing (with no effect). I've also tried disabling flash ECC as the first thing in the startup code (sys_startup.c) and never re-enabling it, also no cigar :(

    If it is not enough to call _coreDisableFlashEcc_() can you help me disable ECC to a point where I can verify whether or not the hard fault comes from an ECC error ?

    3) I haven't actively enabled the MPU, but a lot of my driver code is auto-generated via HALCoGen, so I'm not entirely sure to be honest. I don't see any configuring of the MPU in the startup code either.

    All of my code runs in privileged mode. Can the MPU still cause errors if the code runs in privileged mode?

  • Mikkel,

    It is hard to continue to debug this issue without having access to your code. There is obviously some point in the software where the CPU fetches code from flash for which ECC checking has been enabled and the correct ECC codes are not programmed.

    When you get the prefetch abort, can you check the uncorrectable error address captured in the flash interface module? This will help you more with the analysis of what location is being addressed resulting in the double-bit ECC error. The flash uncorrectable error address is captured in location 0xfff87020.

    Regards, Sunil

  • Found the problem :)

    I looked through the forums here and found a guy with a similar sounding problem. Sunil advised him to check his CP15 registers, so I did the same. On reaching the hard-fault handler, CP15_INSTRUCTION_FAULT_ADDRESS contained an address from flash - so as John Hall predicted, the problem was caused by reading from flash while trying to erase it. As far as I can tell, this is an example of a precise data abort am I right ? 

    I then set a data watchpoint on that address and reached it inside a piece of code I had not expected.

    It turned out the linker script placed the .const section of the bootloader code in flash. After editing the linker script and confirming the new placement, my problem disappeared.

    So in short:

    To find the cause of a (precise) abort exception, one can check the CP15_INSTRUCTION_FAULT_ADDRESS register as it might contain a valuable clue :)
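
As a follow-up to the tip above, here is a minimal sketch for classifying a captured fault address, or for sanity-checking that a table used by the flash routines really sits in RAM. On a Cortex-R4 the fault address registers can be read with `MRC p15, 0, <Rd>, c6, c0, 0` (data) and `MRC p15, 0, <Rd>, c6, c0, 2` (instruction). The region bases and sizes below are assumptions for illustration only; take the real values from the device datasheet and your linker command file. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t size;
} mem_region;

/* ASSUMED memory map, for illustration only. */
const mem_region ASSUMED_FLASH_BANK0 = { UINT32_C(0x00000000), UINT32_C(0x00060000) };
const mem_region ASSUMED_RAM         = { UINT32_C(0x08000000), UINT32_C(0x00008000) };

/* True if addr lies inside region r. */
bool region_contains(mem_region r, uint32_t addr)
{
    return addr >= r.base && (addr - r.base) < r.size;
}

/* True if the whole range [start, start + len) lies inside region r. */
bool range_in_region(mem_region r, uint32_t start, uint32_t len)
{
    if (!region_contains(r, start)) {
        return false;
    }
    return len <= r.size - (start - r.base);
}
```

For example, a fault address for which `region_contains(ASSUMED_FLASH_BANK0, addr)` is true during an erase points at an access to the bank being erased, and `range_in_region(ASSUMED_RAM, ...)` could be used in a debug build to check the address and size of each constant table the flash module touches.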