MSPM0G1507: Problem with eeprom emulation type B

Part Number: MSPM0G1507
Other Parts Discussed in Thread: TIOL1123

Hi,

I'm using the EEPROM emulation Type B from the MSPM0 SDK (version 2_03_00_07) and I've run into the following problem:

After my application has run on the device for a day, or sometimes two or three, the next restart resets all of the "EEPROM values" in the device to their default values. The application reads these values at startup and doesn't modify them during these tests, which last one or more days. So the EEPROM emulation is accessed only once, at startup. The application reads and writes data via multiple UARTs.

What I've ruled out so far:

1. Accessing the last 8 bytes of the flash: https://e2e.ti.com/support/microcontrollers/arm-based-microcontrollers-group/arm-based-microcontrollers/f/arm-based-microcontrollers-forum/1370047/mspm0g3507-flashded-issue-when-accessing-the-last-8-bytes-of-the-flash-memor?tisearch=e2e-sitesearch&keymatch=FLASHDED# -> I encountered the DED and moved the EEPROM emulation down one page -> see the attached linker cmd

2. An outdated SDK -> I'm using the newest version, in which the EEPROM emulation is now CRC checked

3. Exceeding the write cycles of the flash -> as mentioned, I only read the values from the emulation

4. Incorrect flash wait states -> see the attached SysConfig file

I've also attached a faulty hex file -> I read out the flash reserved for EEPROM emulation after the problem happened.

Can you help me to pinpoint the problem?

Thanks!

Best regards

Steffen

EEPROM B Problem.zip

  • Hi Steffen,

    So this problem only occurs when you perform a reset on the device? And sometimes you need to run the example for a day or two, then perform a reset and you see it occur?

    I see you mention that the application code does not touch the EEPROM emulation region after startup. Does your application perform any other flash writes / erases and is it possible that the addresses have just gone wrong?

    Also you mention that the flash is reset to default values -> want to be sure that the EEPROM emulation addresses (and the ECC values if applicable to your region) are all 0xFFs? Just want to be sure about what operation is changing the values here.

    I am also wondering when you first write your EEPROM values into the EEPROM emulation region. Are the original values that are placed there done when you first flash the device? Or do they get written by the application on startup, then overwritten (or erased) when you perform the steps above?

  • Hi Dylan,

    thanks for getting back to me! Regarding your questions:

    1. I described that poorly: the app runs for a work day and the power supply is switched off at the end of the day. The next day I switch the power supply back on, and sometimes the problem occurs and sometimes not. Then the cycle starts again.

    2. The only flash writes occur at first startup -> default values are written -> then I change all values from default to different values via a UART interface and then the app is running. Regarding addresses: have you checked my linker file?

    3. I've attached a faulty EEPROM flash region above. The bytes are all 0xFF except for 4 bytes that are zero.

    4. See 2. I've also attached the user code which handles the EEPROM emulation, and the EEPROM emulation itself with the settings in the header.

    Best regards

    Steffen

  • Hi Dylan,

    an additional info:

    The init function of the EEPROM emulation checks if there is a valid group, and if that's the case it relocates it to another flash address. I think that's because of the wear leveling. So a write and an erase happen at each restart. Maybe that is my problem?!

  • Hi Dylan,

    I've attached the EEPROM section of the flash after the first startup. I've compared that to the docs for the EEPROM emulation.

    You clearly see the group header followed by the 4 data items.

    What I don't understand is that, according to the docs, the data items are sorted as end of write flag, data identifier and then the actual data. But the attached readout looks more like data identifier, end of write flag and then the actual data. Are the docs wrong or the implementation?

    What I will do next:

    I've read out the EEPROM flash section at the first startup. I will keep the device running for today, and at the end of the day I will read out the section again and compare it. Maybe something else is writing to the flash section and corrupting it.

    Thanks!

    Best regards

    Steffen

    StartUpOk.zip

  • first day: no changes on eeprom flash section during a runtime of 6h

  • Unsolicited: Have you considered the effects of slow power ramp and/or bounce on the RST pin? Over the years I've become leery of doing anything irreversible (non-idempotent) in the first (say) second or so after startup. If nothing else, it's not unusual for a debugger to cause your program to run (briefly) before getting control of the MCU and restarting it.

    A quick(?) experiment might be to insert an arbitrary delay at the beginning of main(), and power-cycle repeatedly. Start with maybe 1 second and then reduce it to something manageable.

  • My point about the flash addresses was to ensure that when you interact with the EEPROM emulation region in your application code, make sure that you always have the correct address. I see your attached linker, memory dumps, etc. My question was basically just are you checking to make sure data is written where it is supposed to be, and the correct addresses are erased when you use an erase command. Just a simple thing to check over.

    As for the contents of flash, I see your point about the 4 bytes being zero, it looks like the eeprom emulation region is being initialized properly as far as I can tell. To your point about copying the data to a new sector on startup - it is possible that the error occurs here, but the default library has error checking and should return an error in the event that one of these operations fails, so you should be able to tell if this happens, as long as you monitor the error flags and save them somewhere, stop execution, or something else.

    To be totally clear - what should happen after a reset is that the device checks the EEPROM emulation area and copies the data over to the new EEPROM emulation area, so your data shouldn't be lost, it should just be in a new place, unless it was corrupted. Again then you should see an error of some sort.

    For the order of information in the data item - are you considering that the flash memory is little endian? I think the docs line up with the data.
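    To illustrate the byte-order point with a minimal sketch: assume, purely for illustration, a 32-bit header word whose upper 16 bits hold an end of write flag and whose lower 16 bits hold a data identifier. The field widths and the names `make_header` and `store_le32` are hypothetical and not taken from the SDK. Read as a word, the flag comes first; in a byte-wise dump of little-endian memory, the identifier bytes come first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: flag in bits 31..16, identifier in bits 15..0. */
uint32_t make_header(uint16_t end_of_write_flag, uint16_t identifier)
{
    return ((uint32_t)end_of_write_flag << 16) | identifier;
}

/* Lay a word out the way a little-endian core stores it in memory:
 * least significant byte at the lowest address. */
void store_le32(uint32_t word, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = (uint8_t)(word & 0xFFu);
    out[1] = (uint8_t)((word >> 8) & 0xFFu);
    out[2] = (uint8_t)((word >> 16) & 0xFFu);
    out[3] = (uint8_t)((word >> 24) & 0xFFu);
}
```

    With a flag of 0xABCD and an identifier of 0x1234, the word reads 0xABCD1234, but the dump shows the bytes 34 12 CD AB, so the identifier appears before the flag.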

    You might also want to add some breakpoint or while(1) loop inside of an error check so if you detect an error you get stuck and can see it with the debugger if you haven't already

  • Hi Dylan,

    I´ve checked what happens in case of an error:

    EEPROM_TypeB_init() returns 0 -> CHECK_ONE_ACTIVE_GROUP, but the first read of a value returns an error. gEEPROMTypeBSearchFlag is zero, so the value couldn't be found. I try the read twice: if the first one fails, I give it a second try.

    Regarding the docs: sorry, you are right! ARM is little endian.

    Regarding addresses: I've double checked everything: linker and settings -> same addresses. What about the placeholder array I'm using to check if the EEPROM emulation fits into the specified section ".eeprom_data"?

    Can you please check the addresses too?

    Thanks!

    Best regards

    Steffen

  • Another note:

    Why does the EEPROM emulation use the dl_flashctl API for erasing and writing flash contents but not for reading them?

    For reading, a bare pointer is used instead of a read-verify from dl_flashctl. Is there a specific reason for that?

  • Hi Bruce,

    thanks for the suggestion. I'll try that. Maybe I can poll a flag to see if the flash is ready?

    What's odd is that if I restart the device repeatedly (1 s off, 10 s on) the error doesn't occur. It only occurs if I run the device for several hours.

  • Okay the CHECK_ONE_ACTIVE_GROUP value looks correct for the checkFormat() API, that just means it detected that one group is active with valid data. So that is fine.

    Also to be clear about the first read of a value returning an error - when you attempt to read a given data item and it returns the error, I assume you also go check the memory to see if the data item is actually there? If not then next time around with the debugger see if you can stop before the initialization function and step through it to see if the values are corrupted when being copied, or if they are wrong before the init.

    About the array, it looks like you are pre-initializing 2*1*1024 bytes, so 2kB. I can also see that you've reserved 2kB in your linker so that seems fine to me. The addresses are also fine, with your header file and linker matching.
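    The size arithmetic above can also be pinned down at compile time. This is only a sketch: the macro names are hypothetical placeholders (not the names used in the SDK header), and the numbers are the ones from this thread, namely 2 groups of 1 sector with 1024 bytes each against a 2 kB linker region.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values are the ones discussed in this thread. */
#define EXAMPLE_GROUP_COUNT        2u
#define EXAMPLE_SECTORS_PER_GROUP  1u
#define EXAMPLE_SECTOR_SIZE_BYTES  1024u
#define EXAMPLE_RESERVED_BYTES     2048u /* size of the region reserved in the linker file */

#define EXAMPLE_FOOTPRINT_BYTES \
    (EXAMPLE_GROUP_COUNT * EXAMPLE_SECTORS_PER_GROUP * EXAMPLE_SECTOR_SIZE_BYTES)

/* Fails the build if the emulation would no longer fit the reserved region. */
static_assert(EXAMPLE_FOOTPRINT_BYTES <= EXAMPLE_RESERVED_BYTES,
              "EEPROM emulation does not fit into the reserved flash region");

/* Returns the number of flash bytes the emulation occupies. */
uint32_t example_footprint_bytes(void)
{
    return EXAMPLE_FOOTPRINT_BYTES;
}
```

    If the group or sector count in the header is changed later without also growing the linker region, a check like this would fail the build instead of letting the emulation run past its section.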

    I am not sure why the pointer is used instead of readVerify; both should cause the ECC to be checked automatically by the device. I do not suspect this is the source of the issue. If you prefer, you are welcome to switch these to a readVerify.

    I see your comment about restarting repeatedly not causing the issue. Another thing that I am curious about is: do you run the device all day, check the flash contents before powering down, then observe the error condition the next day? Do you ever run it all day, check it, power it down and then check again? Basically what I am asking is: does the device need to be powered off for a few hours? I would suspect it does not but... curious.

    My main suspicion is that the flash data is damaged upon startup at the beginning of the day, but I still expect that the root cause is that something in the initialization sequence is erasing the values. So related to this is the first test I mention in this reply. You could try placing a software __BKPT(0); in the init sequence so that the code doesn't execute beyond that line until you connect a debugger.

  • Regarding your questions:

    1. When I first read a value (after EEPROM init) I only check the search flag from the EEPROM API. How can I attach to the target with the debugger without reprogramming its contents? I already asked that question: https://e2e.ti.com/support/microcontrollers/arm-based-microcontrollers-group/arm-based-microcontrollers/f/arm-based-microcontrollers-forum/1470125/mspm0g1507-attach-to-running-target-with-ccs20-aka-theia/5645662#5645662 -> waiting for an answer.

    Currently what I do is print out debug information via UART. I can read the whole eeprom section and print it out via UART. This I can do before init and read.
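    A minimal sketch of such a dump routine, under stated assumptions: `hex_dump` and the `put_char_fn` hook are hypothetical names, and the hook stands in for whatever blocking UART transmit call the application uses. On the target the `data` pointer would be the start of the EEPROM emulation region; `text_sink` and `sink_put` are only a host-side stand-in so the formatting can be checked without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Output hook: on the target this would forward one character to the UART. */
typedef void (*put_char_fn)(char c, void *ctx);

/* Print `len` bytes as two-digit hex values, 16 per line. */
void hex_dump(const uint8_t *data, size_t len, put_char_fn put_char, void *ctx)
{
    static const char digits[] = "0123456789ABCDEF";
    assert(data != NULL && put_char != NULL);
    for (size_t i = 0; i < len; i++) {
        put_char(digits[data[i] >> 4], ctx);
        put_char(digits[data[i] & 0x0Fu], ctx);
        /* Newline after every 16th byte and after the last byte, space otherwise. */
        int end_of_line = ((i % 16u) == 15u) || (i + 1u == len);
        put_char(end_of_line ? '\n' : ' ', ctx);
    }
}

/* Host-side stand-in for the UART: collects the output in a text buffer. */
typedef struct {
    char *buf;
    size_t cap;
    size_t len;
} text_sink;

void sink_put(char c, void *ctx)
{
    text_sink *s = (text_sink *)ctx;
    if (s->len + 1u < s->cap) {
        s->buf[s->len++] = c;
        s->buf[s->len] = '\0';
    }
}
```

    Calling the routine once before and once after the emulation init gives two dumps that can be compared line by line.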

    2. Yes, I run the device all day and read its contents before powering down. Then I compare that with the contents at start. So far there is no change. I've also encountered the problem after immediately powering it up again. So something happens at startup. I have also programmed my power supply to switch the device off for a second and on for 10 s, but after one hour of that the problem hadn't occurred...

    Thanks!

    Steffen

  • You should be able to attach to a running target by using the steps shown in the guide that Diego had linked. I just gave this a try on my end and it worked. 

    I did it by right-clicking on the target config and selecting to launch project-less debug, then expanded the threads tab and right clicked on CORTEX_M0P then selected to connect. After this I used the Run->Load->Load Symbols, then in the dialogue box that appears I hit browse then selected my project's .out file. Now I can debug step through my code. In the other thread did you mean that you can't find a symbol / .out file? To me the steps appear to be working.

    Anyway, if you are able to read out the flash via UART before init and read, that's fine; it sounds like you should be able to do the same test. Just ensure that you have some breakpoint or pause before init and read to see when the data is corrupted.

    Understood on the point about it occurring after working on it all day, powering down then immediately powering up again. Agreed that it does sound like the issue is on startup. Interesting notes here. Maybe the issue appears after you've filled the groups / copied them over one or more times.

  • Thanks. I've done the UART print and it works fine. Now I'm waiting for it to happen again...

  • Hi Dylan,

    so far nothing. I've changed the following:

    1. 1 ms startup delay after SYSCFG_DL_init()

    2. Printout of the EEPROM flash contents before and after emulation init

    3. No debugger attached

    1 and 2 result in a longer startup delay. So maybe Bruce was right, or the attached debugger is responsible for the problem.

    I've removed 2 and will report back. I'm taking back the changes one by one.

    Best regards

    Steffen

  • Hi Dylan,

    I think I've found the problem: sometimes on power-up I get the attached voltage profile. My brown-out level is the default one, so it's around 1.5/1.6 V.

    What I think happens:

    1. System starts out of POR

    2. Voltage rises above brown-out level 0

    3. EEPROM emulation starts moving data from one section to the other

    4. In the middle of a flash write or erase operation the voltage falls below brown-out level 0

    5. Data is corrupted

    The attached voltage profile does not happen all the time. Only when hot plugging.

    I think the solution is to set the brown-out level to 3 (~2.9 V). I've got a 3.3 V power supply provided by the TIOL1123.

    In addition to that I will add a 10ms delay because the voltage ramp up will take about 8ms after getting out of POR.

    What do you think?

    Thanks!

    Best regards

    Steffen

  • Hi Dylan,

    an additional question:

    When I change the BOR level from 0 to 3 in SysConfig there is a hint which states:

    Set the brown-out reset (BOR) threshold level. A BOR0 violation will force a re-boot, this is the minimum allowed threshold. A BOR1-BOR3 violation generates an interrupt.

    If I understand that correctly, a BOR3 violation will generate only an interrupt, not a POR. So I have to act on that. How should I do that? There is no example for that case.

    Thanks!

    Best regards

    Steffen

  • Hi Steffen, 

    The root cause assessment makes sense to me, if you get a reset during a flash operation then flash data can be corrupted. When the supply voltage is still ramping and is just above the BOR threshold, then a flash operation begins, the device draws extra current and can pull the voltage down a bit.

    Setting the BOR threshold higher should help to ensure that no flash operation is executed if the supply voltage is very close to the minimum operating voltage. Delaying a bit could also help to ensure that the voltage has ramped up before executing code.

    As for your question about what to do in the event of a BOR level 1-3: you are correct that a BOR3 will generate only an interrupt, not a reset. In the interrupt you will need to define the behaviors necessary for your system. Based on what we've discussed here, it sounds like you should check the status of the flash controller to ensure that there are no ongoing operations. By jumping to the interrupt handler you are also stopping the next flash command from being issued to the controller. 

    When you are writing the interrupt handler you need to consider two possible cases:

    1) The power supply returns to safe operating levels. Here you likely just want to delay the next flash operation until the voltage has come back up. Clear the interrupt flag, wait a little longer if you see a need for it, and return from the handler. If the voltage dips back down again, you will simply jump back to the handler.

    2) The power supply does not return to safe operating levels. Here you want to ensure that you don't request any new flash operations so that none are executing when the voltage supply passes the minimum threshold. Once the ongoing operation completes (if there was one) your flash should be ok to enter the power down state. Beyond that it is up to you how you want to gracefully power down.

  • Hi Dylan,

    thanks for the info.

    What I´ve done:

    1. Added a 10 ms startup delay before eeprom_init() -> 10 ms is longer than all dips mentioned above

    2. Changed the BOR level to 3 and added a BOR interrupt like this:

    void NMI_Handler(void) {
        /* Dispatch on the pending NMI source. */
        switch (DL_SYSCTL_getPendingNonMaskableInterrupt()) {
            case DL_SYSCTL_NMI_BORLVL:
                /* Supply fell below the configured BOR level: restart via POR. */
                DL_SYSCTL_resetDevice(DL_SYSCTL_RESET_POR);
                break;

            default:
                break;
        }
    }

    Is that correct? Or do I have to somehow check in the interrupt whether I'm above or below the BOR level?

    I´ve also found another thread

    https://e2e.ti.com/support/microcontrollers/arm-based-microcontrollers-group/arm-based-microcontrollers/f/arm-based-microcontrollers-forum/1472223/mspm0g1107-using-eeprom-emulation-type-b-if-the-device-browns-out-during-a-flash-write-we-lose-several-data-items/5657280?tisearch=e2e-sitesearch&keymatch=mspm0%20brown%20out#5657280

    with the same problem, and the root cause is the regrouping behaviour of the EEPROM emulation at startup. Your colleague has already analysed the problem and it will be fixed in a future SDK update.

    Thanks!

    Best regards!

    Steffen

  • My only comment is in 2) you define the NMI handler, then check for the reset reason, if the reason was a BOR then you assert a POR - this was not quite what I expected, do you have reasoning for this? This would mean that any time you pass below the BOR threshold (of any level) you will undergo a POR. This would cause a similar problem to before.

    Inside of the handler you may just want to continue to clear the BOR flag and poll it again to check whether you are still below the BOR threshold. If the power supply rises back up, the flag will not be re-asserted and you can exit the interrupt handler. If the power supply stays in the BOR0 to BORx region (where x is the BOR level you configured) then you would then poll the flag forever. You may want to figure out a different behavior for this. If the device goes below BOR0, then it resets automatically and you do not need to assert a reset using software.

    The other functionality I mentioned earlier is to poll the flash controller status register for the command done signal, and then wait (possibly using the clear and poll loop above) which would complete the ongoing flash command then wait until the power supply comes back up again.
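    The clear-and-poll idea can be sketched as follows. This is only an outline under assumptions, not driverlib code: the `bor_hooks` callbacks are hypothetical stand-ins for "flash command still running", "BOR flag set" and "clear BOR flag", which on the target would wrap the corresponding flash controller and SYSCTL accessors. The loop is bounded by `max_polls` so that a supply that stays between BOR0 and the configured level does not trap the handler forever. The `sim_*` functions are a host-side stand-in for the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hardware hooks; on the target these would wrap register accesses. */
typedef struct {
    bool (*flash_busy)(void *ctx);     /* true while a flash command is executing */
    bool (*bor_flag_set)(void *ctx);   /* true if the BOR event flag is set */
    void (*clear_bor_flag)(void *ctx); /* clears the BOR event flag */
    void (*short_wait)(void *ctx);     /* brief delay between polls */
    void *ctx;
} bor_hooks;

/* Returns true if the supply recovered within max_polls, false otherwise. */
bool wait_for_supply_recovery(const bor_hooks *h, unsigned max_polls)
{
    assert(h != NULL);
    /* Let an ongoing flash command finish before doing anything else. */
    while (h->flash_busy(h->ctx)) {
    }
    for (unsigned i = 0; i < max_polls; i++) {
        h->clear_bor_flag(h->ctx);
        h->short_wait(h->ctx);
        if (!h->bor_flag_set(h->ctx)) {
            return true; /* flag was not re-asserted: supply is back above the level */
        }
    }
    return false; /* still low: the caller decides how to shut down gracefully */
}

/* Host-side simulation: flash busy for busy_left polls, supply low for low_left waits. */
typedef struct {
    unsigned busy_left;
    unsigned low_left;
    bool flag;
} sim_supply;

bool sim_flash_busy(void *c)
{
    sim_supply *s = (sim_supply *)c;
    if (s->busy_left > 0u) {
        s->busy_left--;
        return true;
    }
    return false;
}

bool sim_flag(void *c) { return ((sim_supply *)c)->flag; }

void sim_clear(void *c) { ((sim_supply *)c)->flag = false; }

void sim_wait(void *c)
{
    sim_supply *s = (sim_supply *)c;
    if (s->low_left > 0u) { /* supply still low: the flag gets re-asserted */
        s->low_left--;
        s->flag = true;
    }
}
```

    What to do when the function returns false is application specific, for example refusing any further flash operations and preparing for power down.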

  • My intention was to delay the EEPROM init call until 10 ms after reset. That was the longest time I've seen until the voltage reached 3.3 V. So if a BOR is asserted in that time, the interrupt is called and the device restarts. Then the 10 ms will run again from the start. So the device basically restarts until the voltage is above the BOR3 level, and then when the 10 ms are up the EEPROM init is called.

    Is that a false assumption?
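    The intended behaviour can be modelled on a host with a small sketch. Everything here is an assumption made for illustration: the supply is given as one sample per millisecond in millivolts, every sample below the threshold is treated as a BOR event that restarts the delay, and `first_safe_init_ms` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the sample index (in ms) at which the EEPROM init would run, i.e. the
 * first point where the supply has stayed at or above threshold_mv for delay_ms
 * consecutive samples. Returns -1 if that never happens within the trace. */
long first_safe_init_ms(const uint16_t *supply_mv, size_t n,
                        uint16_t threshold_mv, size_t delay_ms)
{
    assert(supply_mv != NULL);
    size_t good = 0;
    for (size_t t = 0; t < n; t++) {
        /* A dip below the threshold restarts the delay, like the reset in the handler. */
        good = (supply_mv[t] >= threshold_mv) ? good + 1u : 0u;
        if (good >= delay_ms) {
            return (long)t;
        }
    }
    return -1;
}
```

    For an invented trace of 1000, 2000, 3000, 2500, 3000, 3100, 3300, 3300 mV with a 2900 mV threshold and a 3 ms delay, the dip to 2500 mV restarts the count and the init would run at index 6.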

    One additional question:

    Do I have to enable the NMI, or is it always enabled because it's non-maskable?

    Thanks!

    Best regards

    Steffen

  • I see your point with the code snippet above, I think that would work, but if you experience a BOR during runtime then you will definitely undergo a POR as well, which may or may not be acceptable. That part is up to you, I think your stated goal with this will work, you will likely undergo multiple resets as the power ramps on, but if you find this works best then I think that is ok. So I don't think you NEED to change this now that you've explained, at this point it is up to you to decide if this is the desired way to handle the original issue.

    For your additional question: you don't need to enable anything; since it is non-maskable, it is effectively always enabled. As long as you have an NMI handler defined in your application, you will jump there when an NMI occurs.