Mysteriously erased internal flash on a C2000 CPU

Other Parts Discussed in Thread: TMS320F2812

We are using a TMS320F2812 processor. It is programmed with a boot loader in the internal flash and an application in the external flash.

We program the boot loader into the internal flash using Code Composer. Code Composer includes a tool called the "F2812 on chip flash programmer". This tool allows one to unlock the CPU (if the passcode is known), program the flash, and optionally lock the flash afterward. We DO NOT LOCK THE CPU after programming the internal flash.

As I understand it, this is the ONLY method of programming internal flash on this CPU.

Once we have loaded the boot loader into internal flash, when it executes, it reads executable application data from a CANBUS device on the board and stores the data in the external flash, one word at a time. Then, if the checksum is valid, it jumps to the beginning of the application.
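A minimal sketch of the boot loader's verify-then-jump decision, in C. The thread does not say which checksum the boot loader uses, so this assumes a simple 16-bit additive checksum stored as the last word of the image; `image_checksum_ok` is a hypothetical name. On the target, the boot loader would branch to the application entry point only when this returns true; the entry address and jump mechanism are not shown here.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if the 16-bit additive checksum of
 * words[0..n-2] equals words[n-1]. Storing the checksum as the final
 * word is an assumption, not the format used in this thread. */
static bool image_checksum_ok(const uint16_t *words, size_t n)
{
    if (words == NULL || n < 2) {
        return false;                       /* nothing to verify */
    }
    uint16_t sum = 0;
    for (size_t i = 0; i < n - 1; ++i) {
        sum = (uint16_t)(sum + words[i]);   /* wraps modulo 2^16 */
    }
    return sum == words[n - 1];
}
```

A failed check would leave the boot loader waiting for a new image instead of jumping into a partially written application.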

On 3 CPUs (out of about 300 that we have programmed so far), we have had a strange occurrence that we simply cannot explain: the CPU suddenly stops communicating with its controller. As part of the post-mortem, we looked at the contents of the internal flash and found that the entire content had been overwritten with zeroes.

Here are screenshots of, first, the “good” code and, second, an example of the erased flash.

I discussed this with a TI technical representative several months ago and was given two conflicting diagnoses:

1. Noise on the ADCREFP and ADCREFM pins.

2. Our application program may have gone off the rails and started writing to internal flash, so we should carefully examine our application.

We have programmed our external flash perhaps a thousand times so far, and we have programmed the internal flash about 300 times. We have never accidentally programmed the internal flash while programming the external flash (even when we were developing and debugging the external flash programming code about 4 years ago). Nor has this ever happened in a controlled environment (our lab). It has only happened in the field, during a power cycle of the CPU.

On very rare occasions (perhaps once in a thousand times), when we power cycle a previously good CPU that had been running debugged, mature code, it fails to boot and is found to be dead with the internal flash erased. The code that was running was not attempting to program anything during this aborted boot; it was just restarting its external flash application. Reprogramming our CPUs requires connecting a special piece of hardware that feeds the code to the CPU through a CANBUS channel. That hardware was not connected to the device, so no flash programming code was ever executed.

And furthermore, we DO NOT EVER program internal flash from our code, so:

            How can I possibly program internal flash on the TMS320F2812 from my application?

            I thought that was impossible.

In fact, if there were a way to program the internal flash from our code, exploiting it would be a boon to us, because right now the only way we know to do this is from Code Composer. Doing this from Code Composer is very cumbersome and gives our technicians some difficulty in loading the code.

  • A couple of thoughts: I can think of two reasons the flash would read back 0x0000 even if it isn't really programmed that way.

    •  The code security module (CSM) is locked (it locks after each reset). One way to check whether it is locked is to look at L0 RAM (0x8000): L0 will also appear as all 0x0000. If it appears locked, open a memory window to the password location 0x3F7FF8-0x3F7FFF and refresh it. If the password is all 0xFFFF, this will unlock the CSM.
    •  The flash voltage is not being supplied: VDD3VFL is used for flash reads, and if it is not powered, the flash will read back all 0x0000.

    Regards,

    -Lori
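The two readback checks in Lori's first suggestion can be expressed as small C helpers. This is a host-testable sketch with hypothetical names: the arrays stand in for words you would read from the password locations (0x3F7FF8-0x3F7FFF) and from L0 RAM (0x8000), for example through a `volatile const uint16_t *` on the target or from a Code Composer memory window. An all-zero L0 window is consistent with a locked CSM but is not proof of one.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Number of 16-bit password words at 0x3F7FF8-0x3F7FFF. */
#define CSM_PWL_WORDS 8u

/* True only if every password word reads 0xFFFF. Per the post above,
 * that is the case in which refreshing the password window unlocks
 * the CSM. */
static bool csm_password_is_blank(const uint16_t pwl[CSM_PWL_WORDS])
{
    for (size_t i = 0; i < CSM_PWL_WORDS; ++i) {
        if (pwl[i] != 0xFFFFu) {
            return false;
        }
    }
    return true;
}

/* Heuristic from the post above: while the CSM is locked, L0 RAM reads
 * back as 0x0000. Returns true if a non-empty window is entirely zero. */
static bool window_reads_all_zero(const uint16_t *words, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (words[i] != 0x0000u) {
            return false;
        }
    }
    return n > 0;
}
```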

  • Hi Frank,

    the only other thing that comes to mind is the flash control registers. Check the FPWR register.

    Lori, it seems that in "System Controls and Interrupts Reference Guide" for each family within C2000 the power on values for FPWR registers are wrong

    Regards, Mitja

  • Mitja:

    "Lori, it seems that in "System Controls and Interrupts Reference Guide" for each family within C2000 the power on values for FPWR registers are wrong"

    Sorry to not know offhand what the FPWR registers are, nor what their values should be.

    Where can I find the "correct" values for FPWR registers?


    You should understand that our boot loader was put into internal flash by Code Composer, not by our application code.

    Is it likely that this automated method got these registers wrong? The program actually ran fine for a fairly long time, so it seems very unlikely that any register whose value could have prevented correct programming actually had an incorrect value: the program was intact and ran.

    - Frank

  • Mitja Nemec said:
    Lori, it seems that in "System Controls and Interrupts Reference Guide" for each family within C2000 the power on values for FPWR registers are wrong

    Hi Mitja,

    I think what we probably need to do is add a note to the description mentioning that the boot ROM will wake the flash up. At reset the flash is asleep, but the boot ROM accesses it, which turns it on automatically.

    -L

  • Frank Rudolph said:

    Sorry to not know offhand what the FPWR registers are, nor what their values should be.

    Where can I find the "correct" values for FPWR registers?

    The FPWR registers control the power state of the flash (sleep, standby, active). Any access to the flash will automatically wake it up and change the value of FPWR. Even a debugger access will do this, so I'm not sure this fits the problem in your case.

    FPWR is described in the system control reference guide; for the 281x this is www.ti.com/lit/SPRU078

    Frank Rudolph said:
    It is likely that this automated method got these registers wrong?

    Even if the power to the flash was turned off, any access to it will automatically power it back up.

    Frank Rudolph said:
    And the program actually ran fine for a fairly long time. So it is very unlikely that any registers that might have prevented correct programming by having incorrect values actually have incorrect values, because the program was intact and ran.

    This is strange. It is one reason I am wondering about the flash power pin: could it be damaged somehow? If you work with a distributor, you could contact them about sending a device back to TI for failure analysis.

    Regards,

    Lori
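To interpret an FPWR value read in a memory window, the power-mode field can be decoded as below. This sketch assumes the encoding of FPWR bits 1:0 as I understand it from SPRU078 (00 sleep, 01 standby, 10 reserved, 11 active); check it against the guide before relying on it. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Flash power modes in FPWR bits 1:0 (assumed encoding; verify in SPRU078). */
typedef enum {
    FLASH_PWR_SLEEP    = 0u,  /* pump and bank sleep (lowest power) */
    FLASH_PWR_STANDBY  = 1u,  /* pump and bank standby */
    FLASH_PWR_RESERVED = 2u,  /* reserved, no effect */
    FLASH_PWR_ACTIVE   = 3u   /* pump and bank active (highest power) */
} flash_pwr_mode;

/* Extract the power-mode field, ignoring the upper bits of the register. */
static flash_pwr_mode fpwr_mode(uint16_t fpwr)
{
    return (flash_pwr_mode)(fpwr & 0x3u);
}

/* Human-readable label for logging or a debug printout. */
static const char *fpwr_mode_name(flash_pwr_mode m)
{
    switch (m) {
    case FLASH_PWR_SLEEP:   return "sleep";
    case FLASH_PWR_STANDBY: return "standby";
    case FLASH_PWR_ACTIVE:  return "active";
    default:                return "reserved";
    }
}
```

As Lori notes, any access (including a debugger read of the flash) wakes the flash, so a value read after such an access reflects the woken state, not the state at the moment of failure.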

  • Lori Heustess said:

    The FPWR registers control the power state of the flash (sleep, standby, active).  Any access to the flash will automatically wake it up and change the value of FPWR.  Even a debugger access - so I'm not sure this fits the problem in your case.

    Thanks for that. I had already decided in my own mind that any configuration related to reading or writing flash was an unlikely culprit, because the design has tens of thousands of run-time hours logged so far; if that were the cause, it should have turned up before now!

    Lori Heustess said:

    This is strange.  It is one reason I am wondering about the flash power pin - if it could be damaged somehow?   If you work with a distributor you could contact them about sending a device back to TI for failure analysis.

    I have asked the electronics team to investigate the power circuit for the flash (attached graphic).

    I proposed that something (a solder bridge, a failed capacitor film, or the like) may have caused some of VDD's 3 volts to be dropped across the series resistor, making the voltage seen by the flash module too low.

    The only fly in that ointment is that at least one of the 3 failed units was reprogrammed successfully. That points to this being, at most, an intermittent problem.

    It seems a stretch to me that an intermittent cause would remain solidly in place right up until a reprogramming attempt was made (and succeeded).

    So I guess I am still mystified by this one.

    Is it possible that an intermittent failure of VDD3VFL might somehow latch the value of FPWR registers into a state that 

    1. actually reprogrammed flash to all zeroes OR

    2. remained in a state that made it look programmed with zeroes until a new attempt at programming it "fixed" the problem?

    Thanks - Frank
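Frank's series-resistor hypothesis is plain Ohm's law: the pin sees the supply voltage minus the load current times whatever resistance is in series. The sketch below (hypothetical names; any numbers used with it are illustrative, not measurements) makes one consequence visible: the drop scales with current, so a marginal joint could in principle look fine at light load and sag only when the flash draws more. The minimum acceptable pin voltage should come from the device data sheet; it is not given in this thread.

```c
#include <stdbool.h>

/* Ohm's law: voltage left at the VDD3VFL pin after the drop across a
 * series element of resistance r_series_ohm carrying i_load_a amperes. */
static double vdd3vfl_at_pin(double v_supply, double i_load_a, double r_series_ohm)
{
    return v_supply - i_load_a * r_series_ohm;
}

/* True if the pin voltage is at or above v_min (taken from the data sheet). */
static bool vdd3vfl_within_limit(double v_pin, double v_min)
{
    return v_pin >= v_min;
}
```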

  • Frank Rudolph said:

    Is it possible that an intermittent failure of VDD3VFL might somehow latch the value of FPWR registers into a state that 

    1. actually reprogrammed flash to all zeroes OR

    2. remained in a state that made it look programmed with zeroes until a new attempt at programming it "fixed" the problem?

    Frank,

    A flash programmer like Code Composer Studio will not be able to program a locked device. All of the flash control registers are also protected by the code security module. The first thing Code Composer will try to do is unlock the device; if it cannot do so, you will get an error. Code Composer works by loading the programming algorithms into unsecure RAM, which cannot access secure memory or registers (including the flash control registers) unless the device is unlocked.

    I will ask some of my colleagues if they have other thoughts on the subject.

    -Lori

  • Frank,

    Bit 0 of the CSMSCR register (address 0xAEF) defines whether the device is secure. Can you check this bit? If it is 0, the device is not secure. If it is 1, the device is secure, and that would explain why you are seeing all zeros in the flash. If it is secure, I'm not sure how to explain that you were able to reprogram one of the devices using the CCS plugin, or how it got secured in the first place, but let's take this one step at a time and determine whether the devices are locked or truly programmed to all zeros.

    BTW, there are other ways to program the internal flash besides the plugin. TI provides an API (http://focus.ti.com/docs/toolsw/folders/print/sprc125.html) that can be used in different ways: 1) an application can embed this API in its code to reprogram devices in the field; 2) a tool can download this API onto the device using one of our boot loader options (for instance, I think Elprotronic (http://www.elprotronic.com/) uses the SCI and BP Micro (http://www.bpmmicro.com/index.html) uses the parallel I/O); or 3) an emulator can download this API via the JTAG port (CCS plugin, SDFlash - http://www.spectrumdigital.com/, Blackhawk - http://www.blackhawk-dsp.com/).

    The only 2 ways I can think of for the flash to actually be reprogrammed unintentionally are 1) if the API was embedded in your software (I'm assuming this is not the case, since it does not seem like you knew you could do this), or 2) if rogue code somehow modified the undocumented flash registers and started programming erroneous data. Both of these seem unlikely, especially given that the entire flash is zeros. The fact that the entire flash is zeros makes me highly suspect that the device is somehow erroneously secured. So again, let's answer that question first and then move to the next step.

    Regards,
    Dave Foley
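Dave's first diagnostic step, reading bit 0 (SECURE) of CSMSCR at 0xAEF, can be sketched in C together with the triage he describes: an all-zero flash readback on a secured device is explained by the CSM masking reads, while the same readback on an unsecured device points to the flash truly reading as zeros (or, per Lori's earlier post, to a VDD3VFL problem). The names are hypothetical; for example, the value 0x70 reported later in this thread has bit 0 clear, so `csm_is_secure(0x70)` returns false.

```c
#include <stdint.h>
#include <stdbool.h>

/* CSMSCR bit 0 (SECURE): 1 = device secure, 0 = not secure. */
#define CSMSCR_SECURE_BIT 0x0001u

static bool csm_is_secure(uint16_t csmscr)
{
    return (csmscr & CSMSCR_SECURE_BIT) != 0u;
}

/* Hypothetical triage of a flash readback, following this thread. */
typedef enum {
    READBACK_OK,              /* flash does not read as all zeros */
    READBACK_MASKED_BY_CSM,   /* zeros explained by a secured device */
    READBACK_ZEROS_UNSECURED  /* zeros on an unsecured device: truly
                                 zero, or a flash supply problem */
} readback_diagnosis;

static readback_diagnosis diagnose_readback(uint16_t csmscr, bool flash_all_zero)
{
    if (!flash_all_zero) {
        return READBACK_OK;
    }
    return csm_is_secure(csmscr) ? READBACK_MASKED_BY_CSM
                                 : READBACK_ZEROS_UNSECURED;
}
```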

  • Frank,

    I am curious as to why you have that series resistor in the VDD3VFL rail. Although the graphic shows its value to be 0 ohms, its presence could potentially cause issues (such as bad solder or an incorrect resistor value). Have you verified that VDD3VFL is 3.3 V?

  • HareeshJ said:

    I am curious as to why you have that series resistor in the VDD3VFL rail. Although the graphic shows its value to be 0 ohms, its presence could potentially cause issues (such as bad solder, incorrect resistor value etc) . Have you verified the value of VDD3VFL to be 3.3v?

    Yes, we have done this. The connection is a soldered wire. We have measured its resistance, which is very near zero, and the voltage to the VDD3VFL pin is 3.3V to a very tight tolerance.

    We have approximately 250 of these in service, running the same application code and the same boot-loader code. Collectively they have had about 200,000 to 300,000 hours of run-time, thousands of CPU restarts, and several hundred flash-programming attempts. The calendar time stretches over about 4 years.

    Out of all this, we have seen 3 instances of erased internal flash, and all have been within the last year.

    I have run out of ideas. All the suggestions given above are interesting, but nothing has given us a key to solving the problem yet.

    - Frank

  • Frank,

    Have you checked bit 0 of the CSMSCR register (address 0xAEF) to see whether the device is secure? Regardless, I think the devices have to be submitted to TI for failure analysis. Please let me know if you need the details to do this.

  • I powered up Code Composer and viewed address 0xAEF in a memory window; it contains 0x70.

    The only bits set in this register are reserved bits.

    The CPU is NOT secure.

    I assumed that the flash programming tool took care of all aspects of writing the internal flash code.

    Are we supposed to have our program set bit 0 of this register before running, to protect the CPU from having its internal flash erased?

    Incidentally, we now have not 3, but 4 of these devices that have failed in this fashion. The 4th one failed a few days ago. All symptoms are the same.

    It is most perplexing that these devices would run for years with unchanged firmware and the same hardware design, and suddenly we start seeing instances of intermittent failures of this kind.

    - Frank

  • Frank Rudolph said:
    Are we supposed to have our program set the 0th bit of this register before running the program to protect the CPU from having its internal flash erased??????

    Frank,

    No, this bit is set whenever the CSM locks, which happens automatically on reset. No action is needed from your code.

    Frank Rudolph said:
    It is most perplexing that these devices would run for years with unchanged firmware and the same hardware design, and suddenly we start seeing instances of intermittent failures of this kind.

    I agree.   You should have received information on how to submit the device for failure analysis.  If you have not, please let us know.  Hopefully that will reveal something.

    Best regards,

    Lori

  • Lori Heustess said:

    You should have received information on how to submit the device for failure analysis.  If you have not, please let us know.

    Sorry, we have not received any information so far.

    - Frank

  • Frank Rudolph said:

    Sorry, we have not received any information so far.

    - Frank

    I have followed up - you will receive it today.

    Regards

    Lori

  • Thank you, Lori. We have received the email and are working on filling out the questionnaire now.

    - Frank