CC2541: BLE121LR production devices losing flash

Part Number: CC2541

Hi BLE Team,

I have a customer who is having some trouble with their CC2541-based BLE121LR module. A handful of their clients have reported that the CC2541 has lost its flash memory, and the customer believes this is because the core SDK section is corrupt. Is there any way to help them figure out why the flash is being corrupted? They are unable to replicate the failure, but they have had 10 unique devices with the same problem and are able to re-flash the units without issue.

Thanks,

Barend

  • Hi Barend,

    Have they been able to read the flash content of the devices? If not, what symptoms indicate flash corruption?

    Regards,
    Fredrik
  • Hi Fredrik,

    I emailed you the contents of a failed flash. They say that when they analyzed it, "the core SDK section has been corrupted". Please let me know if we need any more information from them. I will also loop them in to this post to help facilitate and expedite discussion.

    Thanks!
    Barend
  • Hi Fredrik,

    I understand Barend already sent you at least one of the failed flash dumps. I am one of the firmware engineers on this project trying to figure out the issue. We have clients that are experiencing this actively so the urgency is rising for us.

    We are having trouble reproducing the issue on demand, but we have now seen it happen in the field a significant number of times. Looking at the dump, we see that part of the SDK flash region is corrupted, but not the entire flash.

    Our initial suspicion has been around voltage, as that is the most typical cause of flash corruption we could think of. Additionally, the fact that we can re-flash the unit and it begins working correctly again leads us to believe there is no permanent damage.

    Do you have other suggestions where we can look for answers to this issue?

    As a note, it is a small percentage of our units that seem to exhibit this behavior as other units have been in the field for quite a while with no issues and are running the same firmware.

    Thanks,
    Mark.
  • Hi Mark,

    Can you please explain what the SDK flash region is? What part of the code resides here?

    There have never been any cases of flash corruption on the CC2541 caused by malfunction of the IC itself, for example triggered by voltage. In the few cases where undesired flash operation has been experienced, it was caused by run-away code due to stack overflow. Additionally, an unstable system clock, for example caused by poor layout of the crystal oscillator, can also cause a run-away function pointer.

    Questions:
    - Do you have code that writes to and/or erases flash?
    - Are the flash pages write-protected where applicable?
    - It sounds like the "corruption" happens in the same place, is that correct?
    - Do you have any information on what happens to the product right before the problem appears?
    - What is the application and what environment is it used in?


    Regards,
    Fredrik
  • Hi Fredrik,

    This module is the BLE121LR from Bluegiga (now SiLabs), but it is based on the CC2541. The corruption has happened in two different areas, at least in the devices we have checked. Others may differ again; we don't know at this time. We have asked our support team to send us all devices that exhibit this issue in the future for diagnosis.

    Bluegiga has defined the flash regions as:

    0x00000 boot loader, 4 KB

    0x01000 Core Software, 92 KB

    0x18000 Program/Flexible Flash space

    Our corruption has occurred in the Core Software region but, as the hex dumps show, not necessarily in exactly the same location.
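
    A minimal sketch of how a failed dump could be compared with a known-good image, with each differing span attributed to one of the regions listed above. It assumes both images are available as raw binary of equal length (not Intel HEX). The names `region_of`, `diff_ranges`, and `summarize` are illustrative only and are not part of any Bluegiga or TI tool. The end of the last region is left open because its size is not stated here.

```python
from typing import List, Optional, Tuple

# Region map as quoted in this post. The last region has no stated end,
# so it is left open (None) rather than guessed.
REGIONS: List[Tuple[str, int, Optional[int]]] = [
    ("boot loader", 0x00000, 0x01000),             # 4 KB
    ("Core Software", 0x01000, 0x18000),           # 92 KB
    ("Program/Flexible Flash", 0x18000, None),
]


def region_of(addr: int) -> str:
    """Return the name of the region that contains addr."""
    for name, start, end in REGIONS:
        if addr >= start and (end is None or addr < end):
            return name
    raise ValueError(f"address out of range: {addr:#x}")


def diff_ranges(good: bytes, bad: bytes) -> List[Tuple[int, int]]:
    """Return half-open [start, end) address ranges where the images differ."""
    if len(good) != len(bad):
        raise ValueError("images must have the same length")
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for addr, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            if start is None:
                start = addr          # a new differing run begins
        elif start is not None:
            ranges.append((start, addr))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, len(good)))  # run extends to the end of the image
    return ranges


def summarize(good: bytes, bad: bytes) -> List[Tuple[int, int, str]]:
    """Label each differing range with the region of its first byte.

    A range that crosses a region boundary is reported under the region
    where it starts; this sketch does not split it.
    """
    return [(s, e, region_of(s)) for s, e in diff_ranges(good, bad)]
```

    Running `summarize` over every returned unit would show whether the damaged spans cluster at particular addresses or page boundaries, which may help narrow down whether an erase or a write is involved.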

    Answers:

    - We do not write or erase flash ourselves, however, we are unsure what Bluegiga may be doing under the covers. 

    - We do not know if Bluegiga has write protected these sections of flash, but suspect not.  

    - Corruption happens in the same flash region but different specific locations. 

    - We are trying to ascertain what is happening right before, but unfortunately our user base is not technical so we are getting limited data here. 

    - The BLE module is in a medical device used in hospitals in general floor areas. As far as we know, none of our units have been near MRI/CT equipment, as they are generally used for patient monitoring within patient hospital rooms. The product is deployed in India, and we have seen power be an issue at times. However, our design uses a DC power supply with a battery, so we do not feel mains stability is a factor right now. Environmentally, not all hospital areas are air-conditioned, and humidity and temperature can be high at times, although we have designed and tested the device to work within this environment and it is within spec for all components.

    We have also opened a support case with Bluegiga, but after reviewing other forum posts about the CC2541 and flash, we wanted to get feedback from TI as well. If there are any steps you feel we can take to help diagnose it, we are open to any feedback or suggestions.

    Thanks,

    Mark.

  • Hi all,

    We are also having issues similar to the ones you are describing, on a CC2541-based design.

    Were you able to find the root cause of your issue and solve it?

    Thanks,
    Mohamed
  • Hi Mohamed,

    No, the issue still persists for us. I feel we would stand a better chance if we were dealing only with the CC2541 by itself, but our module is from Bluegiga (now Silabs), and they use an interpreted scripting language on the module. Given TI's feedback, that makes me highly suspicious that the scripting layer or the module design may be the root cause. But we have not had any luck getting answers from Silabs.

    We are working on a band-aid approach to rewrite the last flash program when a failure is detected (we have another MCU with space to store the program and to detect the failure). While not ideal, it is the only solution we have, given the lack of answers at this time.
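
    The detection half of such a band-aid could be sketched roughly as follows, in Python for readability rather than as host-MCU firmware. It assumes the host can read the module's flash back as raw bytes (the read-back mechanism is not described in this thread) and that it keeps a per-page CRC table of the known-good image. `PAGE_SIZE`, `page_crcs`, and `pages_to_rewrite` are hypothetical names, and the 2 KB page size is an assumption that should be checked against the datasheet.

```python
import zlib
from typing import List

# Assumption: 2 KB flash pages; confirm against the device datasheet.
PAGE_SIZE = 2048


def page_crcs(image: bytes, page_size: int = PAGE_SIZE) -> List[int]:
    """Return one CRC-32 per page of the image, in page order."""
    return [
        zlib.crc32(image[offset:offset + page_size])
        for offset in range(0, len(image), page_size)
    ]


def pages_to_rewrite(
    reference_crcs: List[int],
    readback: bytes,
    page_size: int = PAGE_SIZE,
) -> List[int]:
    """Return indices of pages whose CRC differs from the stored reference."""
    actual = page_crcs(readback, page_size)
    if len(actual) != len(reference_crcs):
        raise ValueError("read-back image and reference cover different page counts")
    return [
        index
        for index, (expected, got) in enumerate(zip(reference_crcs, actual))
        if expected != got
    ]
```

    An empty list means the read-back matches the stored image. Logging the returned page indices before re-flashing would also preserve evidence of where each failure occurred.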

    For us, we are moving to a different module for the next revision of the product so a band-aid will work for us given it is temporary. Sorry I couldn't be of more help.

    Mark