MSPM0G3107: MSPM0G3107SDGS20 flash issues

Part Number: MSPM0G3107
Other Parts Discussed in Thread: SYSCONFIG, SEGGER, UNIFLASH

We have started seeing issues in our product when updating the software on MCUs with certain date codes.

Some units cannot program all parts of the application flash when we update our boot or application code using our update mechanism, in which we erase and write certain sectors. We see the issue at certain addresses written during the update process. Some units work fine, some work from time to time, and some do not work at all.

We have disabled the watchdog, and the errors can appear in the middle of the written data, so the operation has not been interrupted or restarted.

When we write a sector and read the same sector back, the contents do not match. Rewriting does not help either. Reading back the flash with the debugger shows that it has not been written.

We can write and read the flash with the debugger without any errors.

We see the issues for some specific batches of TI MCUs and our PCBs. The PCBs we are seeing issues with use MCUs with date codes between W41 and W43 2025.

On a bad unit we get several (32, for example) DL_SYSCTL_NMI_IIDX_FLASH_DED interrupts during normal boot. After that, when trying to write sectors to the flash, we see 20+ additional interrupts.

On known good boards, with date codes from before and after W41-W43, we do not see any of these interrupts.

See the files uploaded to the secure files area for the generated TI configuration of the MCU. More information will be provided as we find it and when requested.

Why does this error appear on some units and not on others, and how should we handle it if this is a known bug?

  • Hi Daniel,

    A FLASH_DED would happen if there's a code read in the ECC region that has improper ECC code. The SYSCTL->DEDERRADDR will return the memory address of where the error occurred. If the address is in the main region, check where the memory is getting written to in your code.

    The files I received were only the SYSCONFIG configuration files. There should be no issues there, as there are no flash writes (except the bootconfig file, but you'd have other issues if NONMAIN was wrong).

    Are you using the DMA during these flash operations? The MSPM0G31xx is a single-bank device, so you need to use the flash functions that have fromRAM in their names, as they execute from RAM.

    The only region that would be different in your devices would be the Factory Constants section. The Factory Constants has calibration data on the device.

  • Adding an observation on our board that is common to both working and non-working boards:
    When reviewing the schematic we found that the load caps on the HFXT crystal felt kind of low; we expected something in the 18 pF range, as defined by the crystal in place. We tried changing to 18 pF, and then the board didn't even start. Measuring CLK_OUT (with a <1 pF FET probe) shows a rather poor waveform compared to other designs.
    The setting of HFXTRSEL is unclear to me at the moment, but it should be 1h or 2h. Could a bad setting have this result? Could something else be fishy here and result in bad flash timing?

    More shots in uploaded files.....
    BR/Fredrik

  • The 0x1 or 0x2 would change the drive strength for the MCU. You're on the border between the two, so I'd say it depends on the capacitance.

    For your crystal, what is the load? https://www.ti.com/lit/slaa322 is a good document to go over crystal considerations. Although the document is for low-frequency crystals and the MSP430 devices, the techniques are MCU- and crystal-agnostic. You'll see similar formulas in different documents spread around the industry, with the only difference being how they describe the parasitic capacitance.

    You typically want the load caps to be twice the value specified in your crystal's datasheet, minus 2 pF. Small differences in the capacitors would change the frequency of the crystal.

  • It is a 12 pF crystal, so either 18 pF or 22 pF load caps are expected. But could this really affect our experience with the flash?
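
The load capacitor rule of thumb discussed above follows from the two load capacitors appearing in series across the crystal, with the stray capacitance of pins and traces added on top. A minimal sketch in C, assuming equal capacitors on both legs. The function name and the stray capacitance values used below (1 pF and 3 pF) are illustrative assumptions, not values from the crystal or MCU datasheet:

```c
#include <assert.h>

/* Per-leg load capacitor for a crystal with two equal capacitors C1 = C2 = C.
 * The crystal sees C/2 (series combination) plus the stray capacitance:
 *     CL = C / 2 + Cstray   =>   C = 2 * (CL - Cstray)
 * c_stray_pf is an assumption about the board (pin plus trace capacitance). */
double load_cap_per_leg_pf(double cl_pf, double c_stray_pf)
{
    assert(cl_pf > c_stray_pf);   /* otherwise no positive capacitor value exists */
    return 2.0 * (cl_pf - c_stray_pf);
}
```

For a 12 pF crystal this gives 22 pF with 1 pF of stray capacitance and 18 pF with 3 pF, which are the two candidate values mentioned above.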

  • Are you using the HFXT to change to a higher frequency through SYSPLL or is this with the external crystal directly to the MCLK?

    There are flash wait states which you need to set based on the frequency, and you could run into issues if the HFXT is not stabilized or set properly. You can take one of the units, comment out the clock configuration, and see if using the SYSOSC fixes the issue. If it does, then you know the HFXT was the cause.

    Is it easily repeatable on one of the units where you're having the flash DED error?

  • I have uploaded the syscfg file as well, and yes, we are using SYSPLL to change to a higher frequency. I hope the syscfg file is up to date so that it matches the generated ti_msp_dl_config.c file we uploaded yesterday. I will try changing to SYSOSC instead.

  • Same issue when changing sysPLLRef to use DL_SYSCTL_SYSPLL_REF_SYSOSC (and updating pDiv from DL_SYSCTL_SYSPLL_PDIV_2 to DL_SYSCTL_SYSPLL_PDIV_4 since SYSOSC is 32 MHz and our external oscillator is running at 16 MHz).

    SYSPLL clocks the rest of the system at 80 MHz.
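
The PDIV change described above keeps the frequency at the PLL input the same for both reference sources, which is why the rest of the PLL setup can stay untouched. A small sketch of that arithmetic; pll_input_hz is a hypothetical helper, not a DriverLib function:

```c
#include <assert.h>
#include <stdint.h>

/* Frequency presented to the PLL after the reference pre-divider (PDIV). */
uint32_t pll_input_hz(uint32_t ref_hz, uint32_t pdiv)
{
    assert(pdiv != 0u);   /* a divider of zero is not meaningful */
    return ref_hz / pdiv;
}
```

Both pll_input_hz(32000000, 4) for SYSOSC and pll_input_hz(16000000, 2) for the 16 MHz external reference give 8 MHz.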

  • Hi. So it seems that the HFXT is not the root cause, as the problem is still there when using the internal oscillator.

    We have now also tried lowering the SYSPLL to 40 MHz (ULP to 20 MHz) with 2 wait states on flash, and we still experience the issue.

    We have uploaded a detailed description of our flash update process and the ECC errors that occur.
    Please review!

    Now our ideas are running out.
    It is still very strange that we produced several thousand boards before date code W40/41 without this problem, and that the problems are even bigger with W43.
    Shall we upload the circuit date codes again to the current SharePoint?

    If I understood correctly, the only changes/updates to the die were made when moving on from the "X" samples. When was this?


  • Hi Fredrik,

    I have already shared the device images with date codes from W27, W40/41 with the team internally, no need for you to upload.

    Thanks,

    / Wolfgang

  • Hi Daniel,

    Could you check whether the 0.47 µF capacitor is placed properly at the MCU VCORE pin on the suspect boards? Please also check that the voltage on VCORE is normal (around 1.35 V).

    To test whether this issue is related to the PCB hardware or the MCU silicon (I believe it is not related to software, since you run the same software on previous and current products), could you do an A-B-A test: swap the MCUs of a suspect unit and an "always good" unit, and see whether the issue follows the suspect MCU or the suspect PCB?

    I believe Wolfgang has shared the guidance for checking the MCU lifecycle with you; please share that information with us. In addition, could you check the value of the MCU trace ID (at address 0x41C40000) of the suspect devices? Thanks.

  • Hi Daniel,

    Another question: for the suspect devices, are the abnormal flash regions always at the same addresses, or do the addresses differ from time to time?

    For example, download a program to the MCU and read it back (the program should not include any flash operations, so that the flash content stays the same before and after programming). If the readback content does not equal what you wrote on the suspect devices, could you check whether the corrupted region addresses are always the same or change between programming runs?
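
One way to answer the question of whether the corrupted addresses are stable is to log where the readback differs from the programmed image on every run. A minimal host-side sketch, assuming both images are available as byte buffers; mismatch_report_t and compare_image are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t first_offset;   /* offset of the first differing byte (valid if count > 0) */
    size_t last_offset;    /* offset of the last differing byte (valid if count > 0)  */
    size_t count;          /* number of differing bytes                               */
} mismatch_report_t;

/* Compare the image that was programmed with what was read back. */
mismatch_report_t compare_image(const uint8_t *expected, const uint8_t *readback, size_t len)
{
    mismatch_report_t r = { 0, 0, 0 };
    assert(expected != NULL && readback != NULL);
    for (size_t i = 0; i < len; i++) {
        if (expected[i] != readback[i]) {
            if (r.count == 0) {
                r.first_offset = i;   /* remember where the corruption starts */
            }
            r.last_offset = i;
            r.count++;
        }
    }
    return r;
}
```

Comparing first_offset and last_offset across several programming runs of the same device would show whether the failing region moves.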

  • Hi Pengfei,

    We have checked VCORE: the correct component (470 nF) is mounted, and the voltage looks like below:

    "good" board 

      

    "bad" board

    We will also check while erasing/flashing.

    Layout as below: 470 nF capacitor in red, via directly to the GND plane.

  • We have also done the swap test: we swapped the processors between a known "good" and a known "bad" board and performed an automated test cycle (200 iterations) of updating the SW before and after swapping.
    The problem clearly follows the MCU.
    We will now erase the MCU again with the debugger and retest updating the SW, just to ensure that the flash is fully erased.

  • Additional findings:
    It seems that on bad boards, we are sometimes not able to do the step "Erase program area" once the area has been written into once.
    Our erase function is as simple as can be and obviously works fine on thousands of other boards.

    bool EraseSector( uint32 sector )
    {
        bool result = true;
        DL_FlashCTL_unprotectSector( FlashCfg_Instance.handle, sector * 1024, DL_FLASHCTL_REGION_SELECT_MAIN );
        DL_FLASHCTL_COMMAND_STATUS cmdStatus = DL_FlashCTL_eraseMemoryFromRAM( FlashCfg_Instance.handle, sector * 1024, DL_FLASHCTL_COMMAND_SIZE_SECTOR );
        if ( !DL_FlashCTL_waitForCmdDone( FlashCfg_Instance.handle ) )
            result = false;
        if ( cmdStatus != DL_FLASHCTL_COMMAND_STATUS_PASSED )
            result = false;
        return result;
    }

    This function never reports fail, but when reading back with an even simpler function, it does not come back as 0xFF in all of the bytes.

     

    bool CheckIntervalIsErased( const AddressInterval* interval )
    {
        bool result = true;
        const uint8* pFlash = (const uint8*)interval->startAddr;
        const uint32 length = interval->endAddr - interval->startAddr;
        for ( uint32 i = 0; i < length; i++ )
            if ( pFlash[i] != 0xFF )
                result = false;
        return result;
    }

    When we trusted the EraseSector-function and tried writing to what we were led to think was erased memory, we then get ECC-errors, and of course not the correct data written which makes sense when the erase was unsuccessful. The debugger seems to be able to erase the flash most of the time.

    On some chips, the Segger cannot erase some sectors it seems. 0x2000 is in the middle of the Boot and remains when using "Erase chip". It gets erased when using "Erase sectors", but 0x1F300 sometimes still remains.

     

    A question that arises is whether erasing the ECC parity bits comes with some sort of timing, locking or acknowledgement requirements. One hypothesis is that the parity bits somehow end up blocking the erase.

  • Example of MCU Trace ID from "bad" boards:

    addressHi: 37225204 -> 0x23802F4 (0xF4023802)
    addressLo: 733511727 -> 0x2BB8802F (0x2F80B82B)

    addressHi: 37255342
    addressLo: 733511727

    addressHi: 5381272
    addressLo: 733511727

    addressHi: 37231734
    addressLo: 733511727


    Boards with production week W43:

    addressHi: 37000949
    addressLo: 733511727 

    addressHi: 37017804
    addressLo: 733511727 

    addressHi: 37256370
    addressLo: 733511727
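
The values in parentheses in the first Trace ID pair above are the same 32-bit words with their byte order reversed. A small sketch of that conversion, with swap_bytes32 as a hypothetical helper name, can be used to cross-check the remaining decimal values:

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
uint32_t swap_bytes32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

For example, swap_bytes32(37225204) yields 0xF4023802 and swap_bytes32(733511727) yields 0x2F80B82B, matching the values given for the first pair.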

  • Fredrik,

    Good input. Especially interesting that the swap test showed the issue follows the MCU.

    In your last post, the "bad" boards are clear, but what about the W43 boards: are those also "bad", or are they "good"?

    Can you share a couple of "good" board MCU Trace IDs, if the above are all "bad"?

    Thanks,

    / Wolfgang

  • Hi Fredrik,

    Thanks for your significant update.

    It is a great finding that some flash addresses could not be fully erased by "Erase chip" or by "Erase sectors". If we write data to a flash region that is not in the erased state and then read it out, it will trigger an ECC error. There are two items we need your help to check:

    • Do you configure any static write protection in NONMAIN?
    • Can the flash regions (such as 0x2000 and 0x1F300) be erased by a mass erase from Uniflash? And do you find any other address regions that cannot be erased?

    Could you share the values of the bootcfg and lifecycle registers by following the guidance in "MSPM0 Check Lifecycle Steps.pdf" on TI Drive? It needs an XDS110 debugger, and you can contact me directly here if you run into any problems with the steps.

  • Hi Pengfei,

    We have locked sector 1 that is populated during board production and we don't have any issues with that sector.

    Sometimes the bad chips end up in a state where the Segger cannot read back or erase the flash and only the TI debugger can recover it.
    We have observed the non-erased sectors on the addresses described above. We have started writing a stress test program to try writing and erasing and checking sectors and segments of the flash and will come back with the results of that.

  • Hi Daniel,

    Before your test, could you first get the lifecycle information of the suspect devices? Thank you.

  • This morning we got information from TI via our EMS partner that there is a SysPLL issue with MSPM0Gxxx products. As this was a TI "Selective Disclosure", I will upload it to SharePoint.
    To us it seems relevant?
    Is it OK to discuss here?

  • Hi Fredrik,

    The SYSPLL issue is that the SYSPLL clock does not run at the proper frequency, and it occurs at a very low rate.

    As you have tested with all the clock configuration disabled and the MCU running from SYSOSC, I think the issue you are facing is not related to this SYSPLL issue.

    Could you get the bootcfg and lifecycle register value for me by following the guidance on "MSPM0 Check Lifecycle Steps.pdf" in sharepoint? Thank you.

  • Just had a case when Segger debugger did not erase 0x2000 to 0x9100. Tried twice. When using "Erase sectors", the flash was erased there too.
    I feel that when we are using the JTAG-interface, there should not be anything that would interfere from how we have set up clocks or anything else programmatically? The MCU should surely use some sort of pre-defined clock speed, interface speed, etc. regardless of what the SW previously set up? There are no locked sectors on this chip.

     

    I am not an expert on the JTAG-interface, but I would expect the debugger to be able to issue a "mass erase" and have it fully completed by the MCU. That the troublesome MCUs are hard to erase even with debugger, not just with our software, is an indication that there is something wonky with the flash and the internal erase command (if that is what happens) cannot perform its operations every time either. Those operations must use some sort of clock too? Or is it clocked by the debugger HW?

  • I got one board that was not able to update the application and got these values. Will test with some more boards.
    CFGAP_BOOTDIAG = 0x00000007
    CFGAP_LIFECYCLE = 0x00000096

  • Daniel,

    Flash operations are clocked and managed internally in the flash engine and should not be dependent on the JTAG interface clock or timing. Once initiated by a JTAG command, the flash operation is autonomous.

    I recall you mentioned that the Segger uses the target VCC, i.e. 3.0 V, while connected, whereas the TI XDS110 debugger provides target power of 3.3 V, and that flashing was more successful with that debug probe?

    Can you try the same operation with the Segger probe as above, but raise VCC to 3.3 V and see if that makes a difference?

    Thanks,

    / Wolfgang

  • Hi Daniel,

    Yes, when firmware is downloaded by a debugger such as J-Link or XDS110, the flash programming routine runs with the default clock setting, using SYSOSC for the CPU. As an alternative test, could you check whether an abnormal device can be mass erased by XDS110 + Uniflash, as shown below? Note that you need to connect the NRST pin between the XDS110 and the MCU.

    And for the erase function below, could you try always calling DL_FlashCTL_executeClearStatus(..) before any flash operation?

    bool EraseSector( uint32 sector )
    {
        bool result = true;
        DL_FlashCTL_unprotectSector( FlashCfg_Instance.handle, sector * 1024, DL_FLASHCTL_REGION_SELECT_MAIN );
        DL_FLASHCTL_COMMAND_STATUS cmdStatus = DL_FlashCTL_eraseMemoryFromRAM( FlashCfg_Instance.handle, sector * 1024, DL_FLASHCTL_COMMAND_SIZE_SECTOR );
        if ( !DL_FlashCTL_waitForCmdDone( FlashCfg_Instance.handle ) )
            result = false;
        if ( cmdStatus != DL_FLASHCTL_COMMAND_STATUS_PASSED )
            result = false;
        return result;
    }
  • We still used the PLL, but clocked by the SYSOSC instead of the crystal. Should we also test using only SYSOSC then? We might need some tuning of the CAN to get that working, I guess.

  • Hi Daniel,

    Were these values read from abnormal devices? The values you are showing mean the MCU is in a normal state. If the other suspect devices also have the same bootdiag and lifecycle values, I think we can focus more on the hardware or software configuration for this issue.

  • Tried to add the DL_FlashCTL_executeClearStatus command before DL_FlashCTL_unprotectSector, but still no success :-(

  • I have tried several boards that fail to update, and they all have the same lifecycle values.
    CFGAP_BOOTDIAG = 0x00000007
    CFGAP_LIFECYCLE = 0x00000096

  • Uploaded the batch code list to sharepoint

  • We have found that devices that fail to erase the flash can succeed if the "faulty" sector is erased repeatedly. One particular device that we have a lot of issues with was successfully erased after 22 retries of erasing sector 124 (0x0001F000-0x0001F3FF)...
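
The sector numbers and addresses in this thread are related by the 1 KB sector size implied by the "sector * 1024" in EraseSector. A small sketch of that mapping; the helper names are hypothetical:

```c
#include <stdint.h>

#define FLASH_SECTOR_SIZE 1024u   /* matches the "sector * 1024" used in EraseSector */

/* First byte address of a sector. */
uint32_t sector_start(uint32_t sector)
{
    return sector * FLASH_SECTOR_SIZE;
}

/* Last byte address of a sector. */
uint32_t sector_end(uint32_t sector)
{
    return sector_start(sector) + FLASH_SECTOR_SIZE - 1u;
}

/* Sector number that contains a given byte address. */
uint32_t sector_of(uint32_t address)
{
    return address / FLASH_SECTOR_SIZE;
}
```

sector_start(124) is 0x1F000 and sector_end(124) is 0x1F3FF, so the address 0x1F300 reported earlier lies in sector 124, while 0x2000 is the start of sector 8.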

  • Hi Daniel,

    So you disabled all interrupts before the flash erase operation, and still not all bytes in the sector were erased in this test, right?

  • Hi Daniel,

    Do you know the time taken by a single erase operation? We could measure it by, for example, toggling a GPIO before and after the erase command.

  • Hi,
    Just to close the question above about VCC during the erase phase:
    We couldn't see any dips during erase. Yellow is 3.0 V and green is VCORE.

  • Thank you for this update.

    Could you please help us get the information below when an erase failure occurs?

    1. Enter debug mode on the abnormal device and issue an erase command for the "faulty" sector. After the API has finished, check the FLASHCTL registers STATPCNT, STATCMD and STATADDR when the erase fails.

    2. Give us two screenshots of the "faulty" flash sector, before and after the flash erase. We want to see what the erase failure looks like. And your test program is always the code below, right? EraseSector returns success and CheckIntervalIsErased returns failure.

    bool EraseSector( uint32 sector )
    {
        bool result = true;
        DL_FlashCTL_unprotectSector( FlashCfg_Instance.handle, sector * 1024, DL_FLASHCTL_REGION_SELECT_MAIN );
        DL_FLASHCTL_COMMAND_STATUS cmdStatus = DL_FlashCTL_eraseMemoryFromRAM( FlashCfg_Instance.handle, sector * 1024, DL_FLASHCTL_COMMAND_SIZE_SECTOR );
        if ( !DL_FlashCTL_waitForCmdDone( FlashCfg_Instance.handle ) )
            result = false;
        if ( cmdStatus != DL_FLASHCTL_COMMAND_STATUS_PASSED )
            result = false;
        return result;
    }

    bool CheckIntervalIsErased( const AddressInterval* interval )
    {
        bool result = true;
        const uint8* pFlash = (const uint8*)interval->startAddr;
        const uint32 length = interval->endAddr - interval->startAddr;
        for ( uint32 i = 0; i < length; i++ )
            if ( pFlash[i] != 0xFF )
                result = false;
        return result;
    }

  • Hi Daniel and Fredrik,

    Sorry, another question:

    1. Apart from firmware updates, do you erase or program flash in your application for data storage?

    2. Would it be acceptable to temporarily use a flash BLANKVERIFY command (the DL_FlashCTL_blankVerifyFromRAM() API) to check the blank state of a sector after an erase operation, to make sure the flash content is fully erased before programming it? It adds a step that verifies the erase result, but it makes the erase take more time for the "faulty" region.

  • Hi Pengfei,

    Here you have two screenshots. I have updated to use the DL_FlashCTL_blankVerifyFromRAM operation instead, but still the same issue.

    Added memory watches for ECC corrected data, uncorrected data and ECC checksums as well in the screenshots.

    Before erase:

    After erase:

    To answer your second question: yes, we are using two sectors for settings as well, but we have not seen issues with those sectors (yet). We rarely update those settings after production.

  • Hi Daniel,

    Thank you so much for your update. Sorry for requesting more tests; let me clarify the tests and data we need to analyze the situation:

    1. STATPCNT, STATCMD and STATADDR register values after erasing: For the "After erase" screenshot you showed, could you please give us an image of the FLASHCTL registers STATPCNT, STATCMD and STATADDR just after the erase command has executed (for the failed erase case)? In the current image you had already applied the blank verify, so these registers relate to the blank verify command. (Please also let me know how many erases were needed in this case to make the erase successful.) Thank you.

    2. Erase time in the failure case: Could you measure the time taken by a single erase in this failure case (preferably each single erase until the erase succeeds), for example by enabling a timer or by toggling a GPIO?

    3. The result returned by the DL_FlashCTL_readVerifyFromRAM64() API for the corrupted flash region. (In your previous test the erase command returned a pass and the blank verify returned a fail, right?)

    As for the blank verify proposal I mentioned before, one idea is to use the process below for erasing:

    1. Set the flash word index flashWordIdx = 0.
    2. Erase the flash sector.
    3. Blank verify the flash words inside this erased sector, starting from flashWordIdx. The flash word index increases by one each time a blank verify passes.
    4. If all the flash words in this sector pass the blank verify (flashWordIdx >= 128), the sector has been erased successfully. If flash word flashWordIdx does not pass the blank verify, go back to step 2.

    It will increase the time for a sector erase, but please check whether it is acceptable as a temporary workaround. We have asked your team for some failing units for analysis, but we want to check whether this method could work for the existing production.
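
The four steps above can be sketched as a host-testable C routine. This is only an illustration under stated assumptions: the erase and blank verify operations are passed in as callbacks (on target they would wrap DL_FlashCTL_eraseMemoryFromRAM and DL_FlashCTL_blankVerifyFromRAM), the retry limit max_erases is an addition that is not part of the proposal, and the fake_* functions simulate a sector that silently ignores the first few erase commands. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE_BYTES 1024u
#define FLASH_WORD_BYTES  8u
#define WORDS_PER_SECTOR  (SECTOR_SIZE_BYTES / FLASH_WORD_BYTES)   /* 128 */

typedef bool (*erase_sector_fn)(void *ctx, uint32_t sector_addr);
typedef bool (*blank_verify_fn)(void *ctx, uint32_t word_addr);

/* Returns the number of erase commands issued on success, or 0 if the
 * sector is still not blank after max_erases attempts. */
uint32_t erase_sector_verified(uint32_t sector_addr, uint32_t max_erases,
                               erase_sector_fn erase, blank_verify_fn blank_verify,
                               void *ctx)
{
    assert(erase != NULL && blank_verify != NULL);
    assert(sector_addr % SECTOR_SIZE_BYTES == 0u);

    uint32_t word_idx = 0;                                 /* step 1 */
    for (uint32_t erases = 1; erases <= max_erases; erases++) {
        /* Step 2. The returned status is not trusted here, because the
         * thread reports a "pass" for erases that did not erase. */
        (void)erase(ctx, sector_addr);
        /* Step 3: continue verifying from the last word that failed. */
        while (word_idx < WORDS_PER_SECTOR &&
               blank_verify(ctx, sector_addr + word_idx * FLASH_WORD_BYTES)) {
            word_idx++;
        }
        if (word_idx >= WORDS_PER_SECTOR) {                /* step 4 */
            return erases;
        }
    }
    return 0;
}

/* Host-side stand-in for a sector that ignores some erase commands. */
typedef struct {
    uint8_t  data[SECTOR_SIZE_BYTES];
    uint32_t erases_ignored;   /* erase commands that will still do nothing */
} fake_sector_t;

bool fake_erase(void *ctx, uint32_t sector_addr)
{
    fake_sector_t *s = ctx;
    (void)sector_addr;
    if (s->erases_ignored > 0u) {
        s->erases_ignored--;   /* reports success but erases nothing */
        return true;
    }
    memset(s->data, 0xFF, sizeof s->data);
    return true;
}

bool fake_blank_verify(void *ctx, uint32_t word_addr)
{
    const fake_sector_t *s = ctx;
    uint32_t off = word_addr % SECTOR_SIZE_BYTES;
    for (uint32_t i = 0; i < FLASH_WORD_BYTES; i++) {
        if (s->data[off + i] != 0xFF) {
            return false;
        }
    }
    return true;
}
```

Because the word index is not reset between attempts, words that already verified blank are not checked again, so the extra time is spent mainly on the part of the sector that actually fails.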

  • Christer has measured erasing timing:

    We have measured timings to erase the program area of the flash using a testpin and scope. Screenshots added to shared folder.

     

    Code run for each sector:

     

    Raise pin
    DL_FlashCTL_unprotectSector
    DL_FlashCTL_eraseMemoryFromRAM
    DL_FlashCTL_waitForCmdDone( IFlashCfg_Instance.handle )
    Lower pin

     

    For empty sectors except one, erasing one sector takes 53.8 us. Seems consistent.

     

    For empty sectors except one, erasing the whole ProgramArea, raising pin before loop, lowering after loop
    - Good chip: 8.88 ms
    - Bad chip:  9.32 ms when writing to an address in flash that works on the chip
    - Bad chip:  About 5 ms when writing to an address in flash that does not work on the chip

     

    We do not exit the loop early in any case, so on bad chip with bad address it seems the chip just stops erasing for whatever reason?

     

    For some filled sectors, it takes 
    - Good chip: 194 ms, 4 ms per sector,

     

    When erasing on a good chip there is some sort of I-don't-know-what happening towards the end.
    Not seeing that on the bad chip.
    Not sure what to expect or look for.

  • Hi Pengfei,

    I think I have found something, sorry for the breakpoint handling :-)

    When erasing sectors containing data, STATPCNT is set to 1 when done, and to 0 if nothing needed to be erased.

    But when I get to sector 124, STATPCNT also returns 0 even though the sector is not erased. In the screenshot above it took 13 tries until it was erased, and at that point STATPCNT returned 1. What is the STATPCNT register used for?

    Below you can see one of these retries where the flash is not erased, but STATPCNT is set to 0 (as if it was already erased).

  • Hi Daniel,

    Just to double check: for sector 124, STATPCNT returned 0 on the first 12 tries and the content was not fully erased, and on the 13th try STATPCNT returned 1 and the content was erased, right?

    This register holds the number of erase pulses sent by the flash controller for the erase.
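
Combining the pulse count with a blank check gives a compact way to label each erase attempt in a log. A minimal sketch based only on the observations in this thread; the enum and function names are hypothetical, and the ERASE_INCOMPLETE case (pulses sent, sector still not blank) was not observed here and is included only for completeness:

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    ERASE_DONE,           /* pulses were sent and the sector reads blank          */
    ERASE_NOT_NEEDED,     /* no pulses sent, the sector was already blank         */
    ERASE_NOT_EXECUTED,   /* no pulses sent although the sector is not blank      */
    ERASE_INCOMPLETE      /* pulses were sent but the sector is still not blank   */
} erase_outcome_t;

/* pulse_count is the STATPCNT value read right after the erase command;
 * sector_is_blank is the result of a separate blank check of the sector. */
erase_outcome_t classify_erase(uint32_t pulse_count, bool sector_is_blank)
{
    if (pulse_count > 0u) {
        return sector_is_blank ? ERASE_DONE : ERASE_INCOMPLETE;
    }
    return sector_is_blank ? ERASE_NOT_NEEDED : ERASE_NOT_EXECUTED;
}
```

The failing tries on sector 124 described above would be classified as ERASE_NOT_EXECUTED, and the final successful try as ERASE_DONE.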

  • That is correct

  • Thank you for confirming.

    And for erase tries 1-12, you could see that the other content was erased, except at 0x1F300, right?

  • I have uploaded an updated flash driver, Flash_updated.c, according to Pengfei's suggestions. Please take a look at it.

  • I have now uploaded some images and videos in "MSBT6 media.zip". Please let us know if you need anything else.
    Setup.jpg - Image of the setup with debugger and CAN dongle.
    Magnetic board back.jpg - Our magnetic sensor board
    Magnetic board front.jpg - Our magnetic sensor board
    Flash layout.mp4 - Description of our flash layout
    Clocktree.mp4 - Theia project and the clock tree
    EraseCommand_53_retries.mp4 - Debug video that requires 53 retries before the flash sector has been erased...

  • We’ve uploaded the SysPLL workaround proposal.

    Please review it during the day so we can align in a sync meeting tomorrow morning (Dallas time).
  • Hi Emanuel,

    The SYSPLL verify looks good. I'd expect some other functions which would do the actual setting up of SYSPLL?

  • I have uploaded the latest flash driver, with some debug parts for catching the number of retries in the EraseSectorInterval operation, "Flash_2025_12_04_with_retries_debug.c".

    Is it the FLASHCTL_STATCMD_CMDINPROGRESS_MASK in flashctl->GEN.STATCMD you want us to check?

  • Hi Daniel,

    Yes, that is what I want you to check. My theory is that the command is not getting processed, and checking this bit lets us validate whether that assumption is correct.

  • We have tried some testing of the clock setup. All tests were using the shared Theia project as a base.

    Each test included 100 write / erase cycles for different clockspeeds (PDIV:4, QDIV: 32 - 40):

    ULPCLK (CPU / 2)    Percentage of test cycles with failed erase
    32 MHz              0%
    34 MHz              20%
    36 MHz              24%
    38 MHz              75%
    40 MHz              85%

    Other tested combinations with similar results:

    - Changed HFCLK to SYSOSC as PLL input -> NOK

    - SYSPLL0 clocked down to 40MHz with UDIV:1 -> NOK ( If setting UDIV:2 with ULPCLK at 20MHz -> More stable )

    - Changing from SYSPLL0 to SYSPLL2X -> NOK

    As previously discussed, the debugger seems to get more unstable at higher clock speeds. The Segger gets more unstable, and we have also observed connection issues in Theia.

    The attached image is from using only the Texas debugger; no Segger was used on this board. In the worst case an erase was needed to get back on track!


    In the datasheet it seems like PD0 is used for both FLASHCTL and DEBUG.


    From our side it seems like the instability is related to PD0. Is there any way to confirm this theory?