CC3000 NVRAM Exhaustion after only 1600 connection cycles… reconnect required due to other CC3000 bugs

Other Parts Discussed in Thread: CC2591, CC2520

We have recently encountered a situation where we have large areas of corrupted data in the NVRAM chip of the CC3000 module.

Our investigation shows a very worrying result: products containing CC3000 will fail in the field if they have a reconnect cycle.

Attached is a zip file containing a dump of the NVRAM contents of two devices: one that was fully functional (after having had its profiles flushed) and one that suddenly stopped working after having been deployed in the field for only a few weeks. Comparison of these NVRAM dumps clearly shows 32 bytes of (assumed) corrupted data in every 128-byte block of the WLAN config area. The same corruption can also be seen in the WLAN config shadow area.

Having now read a previous thread on the E2E forum pertaining to an exhausted NVRAM, we have monitored and analysed the I2C activity of the NVRAM together with the commands on the SPI bus from the host. Also attached are some of the results we have found.

Analysis of the NVRAM I2C write cycles shows that during a startup sequence of simple-link start, smart config, then wlan connect, addresses 0x28 to 0x2B were each written 42 times and addresses 0x10 to 0x13 were each written 63 times!

Because the CC3000 periodically crashes (for various reasons, mostly reported on the forums already), we use a 1-minute reconnection timer to “get back on air” whenever the TCP socket isn’t up.

We found the big hitters were:
    set_connection_policy(...) (21 writes to 0x10..0x13)
    netapp_dhcp(...) (21 writes to 0x28..0x2B)
    netapp_set_timers(...) (21 writes to 0x28..0x2B)

Even if we optimise down to just the wlan_ioctl_set_connection_policy(...) call (21 writes to 0x10..0x13), if the access point were to fail and we repeated this cycle just once every minute, the NVRAM would exhaust its (alleged) 100,000 write cycles in less than four days! Extending to one connection per hour, our product would have a lifetime of only about six months. That’s long enough to get many units in the field and have them all come back dead! A product nightmare!

Presuming the 100,000 count is incorrect, a 1,000,000-cycle EEPROM would last a touch over a month in the field on the 1-minute cycle. This is similar to the timeline of our troubled unit.
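
The lifetime figures above can be reproduced with a few lines of arithmetic. The following is only a sketch: the function names are made up for illustration, and the endurance figures are the assumed 100,000 and 1,000,000 write cycles quoted in this post, not values taken from a datasheet.

```c
#include <stdio.h>

/* Number of reconnect cycles until one EEPROM address reaches its
 * rated endurance.
 *   endurance_writes: rated write cycles per cell (assumed figure)
 *   writes_per_cycle: writes landing on the same address per reconnect */
double cycles_to_exhaust(double endurance_writes, double writes_per_cycle)
{
    return endurance_writes / writes_per_cycle;
}

/* Minutes until exhaustion when one reconnect happens every
 * period_minutes. */
double minutes_to_exhaust(double endurance_writes,
                          double writes_per_cycle,
                          double period_minutes)
{
    return cycles_to_exhaust(endurance_writes, writes_per_cycle)
           * period_minutes;
}

double minutes_to_days(double minutes)
{
    return minutes / (60.0 * 24.0);
}

/* Print the scenarios discussed in the post. */
void print_lifetime_table(void)
{
    printf("100k endurance, 63 writes/cycle: %.0f cycles\n",
           cycles_to_exhaust(100000.0, 63.0));
    printf("100k endurance, 21 writes, every minute: %.1f days\n",
           minutes_to_days(minutes_to_exhaust(100000.0, 21.0, 1.0)));
    printf("100k endurance, 21 writes, every hour:   %.1f days\n",
           minutes_to_days(minutes_to_exhaust(100000.0, 21.0, 60.0)));
    printf("1M endurance,   21 writes, every minute: %.1f days\n",
           minutes_to_days(minutes_to_exhaust(1000000.0, 21.0, 1.0)));
}
```

Calling print_lifetime_table() gives roughly 3.3 days, 198 days and 33 days for the three 21-write scenarios, and just under 1600 cycles for the 63-write case in the thread title.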

To me, this is a wholly unacceptable situation. I do not think a 1-minute re-connection attempt is unreasonable, but I would certainly expect a device to last more than a few days or so in the not uncommon situation of access point failure.

  1. Why does it even need to write the EEPROM in this scenario?
  2. Why have TI chosen such a low-endurance EEPROM, if there is actually a good reason to write the EEPROM?
  3. What do we do to fix this?

Surely, this must have been considered during the product’s development. If this situation of writing to the same address 21 times was known to the TI development engineers, alarm bells should have been ringing over the suitability of the EEPROM. If this was not known by the development engineers, alarm bells should be ringing higher up in TI!

Can TI not see how this is going to cause a problem for a Wi-Fi device targeted at the Internet of Things? For small low-power Wi-Fi boards which might turn on and off regularly? For devices connected to a cellular hotspot, where the hotspot cycles regularly, etc.?

Discounting the fact that there doesn’t seem to be a need to write the EEPROM at all, it’s obvious from the I2C analysis what is happening here – blocks of data are being written to different EEPROM locations at a fairly sensible rate, but the low-address range is being used as some kind of pointer/status area: address 0x10 has some status value written to it (presumably to indicate that an area is currently being written), data is being written to the EEPROM then address 0x10 is being written to again to change its status back. The very next write command is to address 0x10 again changing the status back to the same value as before, ready for the next block write. This is repeated for every block of data written to the EEPROM.
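
To illustrate why this pattern concentrates wear on one address, here is a small host-side model. It is purely a sketch of the behaviour inferred from the I2C traces: the EEPROM size, the flag values and all names are assumptions, and it says nothing about how the CC3000 firmware is actually implemented.

```c
#include <stdint.h>
#include <string.h>

#define EEPROM_SIZE  256u   /* illustrative size, not the real part */
#define STATUS_ADDR  0x10u  /* hot address observed on the I2C bus */

typedef struct {
    uint8_t  cell[EEPROM_SIZE];
    uint32_t writes[EEPROM_SIZE];   /* per-address write counter */
} eeprom_model_t;

void eeprom_model_init(eeprom_model_t *e)
{
    memset(e, 0, sizeof *e);
}

/* addr is a uint8_t, so it can never index outside the 256-byte model. */
void eeprom_model_write(eeprom_model_t *e, uint8_t addr, uint8_t value)
{
    e->cell[addr] = value;
    e->writes[addr]++;
}

/* One data write wrapped in the flag-before / flag-after pattern. */
void guarded_write(eeprom_model_t *e, uint8_t data_addr, uint8_t value)
{
    eeprom_model_write(e, STATUS_ADDR, 0x01);  /* assumed "busy" flag */
    eeprom_model_write(e, data_addr, value);
    eeprom_model_write(e, STATUS_ADDR, 0x00);  /* assumed "idle" flag */
}

/* Write n_blocks values to consecutive, distinct data addresses. */
void write_blocks(eeprom_model_t *e, uint8_t first_addr, uint8_t n_blocks)
{
    for (uint8_t i = 0; i < n_blocks; i++) {
        guarded_write(e, (uint8_t)(first_addr + i), 0xA5);
    }
}
```

In this model, writing ten blocks touches each data address once but the status address twenty times, so the status cell wears out long before the data cells do.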

If there is such a ridiculously low lifetime on the EEPROM chip, this limitation should be explicitly stated in the documentation, in a large font.

The next problem is, of course, there isn’t any decent documentation; everything has to be done via the wiki, forums and a small number of sample programs if we want to know what’s going on inside the CC3000 module at all. We can no longer even get access to an actual TI engineer directly; this is how TI now treat their customers who buy and develop with their products every day. It’s just shocking.

As a company, we have now put many hundreds of hours of work into ongoing trouble-shooting of problem after problem with this CC3000 device. Initially, we believed the CC3000 to be a fully-developed device from TI that would easily integrate into our design (which already contained TI’s CC2520, TI CC2591 RF transceiver and a TI MSP430). How I wish I could reconsider the decision knowing what I know now.

I am of the firm opinion that the CC3000 simply does not work as intended and should not be sold in its current state.

I’d like to ask other members of the E2E community if they have successfully developed a robust product incorporating the CC3000 device that is fit for sale to customers who live in the real world and expect a decent level of functionality and a warranty with their purchase. I’d genuinely like to hear about the experiences of other engineers who have developed with this product, and looking at other forum threads, I certainly don’t think we’re alone in having these issues.

In some cases, the TI staff posting on the E2E forums have been helpful in solving developers’ issues, but in many other cases, the threads just seem to peter out and are not satisfactorily resolved. What do we need to do to bring our problems with this CC3000 to the attention of someone who can actually do something about them? Will someone please tell me!

cc3000_nvram_dumps.zip
  • Hi Ian,

    Thank you for your post.

    We are analyzing your memory dump and working to try and recreate it in-house.

    From reviewing your use case description, we suspect that the issues might have been caused by the numerous writes to the EEPROM, possibly exceeding the part's limit.

    Functions such as set_connection_policy(), netapp_dhcp() and netapp_set_timers() do indeed write to the EEPROM.

    They are meant to be used in the configuration phase of your application, not tens of thousands of times in a short period of time.

    There is no need to set the connection policy on every device wake-up; it is recommended to set it once and let the device automatically connect to an existing profile.

    Can you please elaborate on your SW flow a little bit, so we can understand why you were required to call these functions so frequently?

    Thanks in advance,

    Alon

  • Alon,

    Suppose you want to scan for available networks and connect to one of them. The only way to start and stop a scan is by calling wlan_ioctl_set_scan_params(...), which writes to the EEPROM every time you call it.

    So, how can writing to the EEPROM be avoided then?

  • Alon Srednizki said:

    Can you please elaborate on your SW flow a little bit, so we can understand why you were required to call these functions so frequently?

    Thanks in advance,

    Alon

    Hi Alon

    Thanks for your reply. I appreciate your attention to this issue.

    Since we now know that those functions are writing to the EEPROM, we obviously want to reduce their use. In this use case, we have a device that:

    1. powers up
    2. configures smart config to determine whether to change any values
    3. waits 30 seconds
    4. configures to connect to the access point
      a. simple_link_start
      b. set connection policy (in case the first one didn't work, or we're in smart config mode, or the CC3000 is in an incorrect state)
      c. set dhcp (we have had DHCP corruption, meaning we now call this again)
      d. set timers (this is probably superfluous)
    5. connects TCP socket
    6. waits a few minutes to see whether TCP connected and traffic flow is established. If not,
      a. turn off the CC3000 and back on (we have found that the CC3000 locks up on occasion, requiring a power cycle)
      b. return to step 4 above (overall solution to the ARP problem, the CC3000 lockup, various other no-comms issues)

     

    Now, 4b may be overkill - we could perhaps send this only every few times around the loop, to limit the damage.

    I can't see a "read dhcp" method (maybe read one of the NVRAM files?), so I can't skip 4c.

    4d probably is overkill.
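
    The "every few times around the loop" idea for 4b to 4d could look something like the sketch below. Everything here is hypothetical: the struct, the function names and the two callbacks are placeholders for whatever the host firmware uses to restart the radio and to send the EEPROM-writing configuration commands. None of it is part of the CC3000 host driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t attempts;        /* reconnect attempts so far */
    uint32_t reconfig_every;  /* re-send config on every Nth attempt */
    uint32_t config_writes;   /* how often the config step was due */
} reconnect_state_t;

/* Hypothetical hooks supplied by the application. */
typedef void (*step_fn)(void);

/* Returns true when this attempt should re-send the EEPROM-writing
 * configuration commands. The first attempt always does. */
bool should_reconfigure(reconnect_state_t *s)
{
    /* Treat 0 as "every attempt" to avoid dividing by zero. */
    uint32_t n = s->reconfig_every ? s->reconfig_every : 1u;
    bool due = (s->attempts % n) == 0u;

    s->attempts++;
    if (due) {
        s->config_writes++;
    }
    return due;
}

/* One pass through step 4: always restart the radio, but only rewrite
 * the configuration when the throttle says it is due. */
void reconnect_attempt(reconnect_state_t *s,
                       step_fn restart_radio,
                       step_fn write_config)
{
    if (restart_radio != NULL) {
        restart_radio();
    }
    if (should_reconfigure(s) && write_config != NULL) {
        write_config();
    }
}
```

    With reconfig_every set to 5, twenty reconnect attempts trigger the configuration step four times instead of twenty. That cuts the wear by a factor of five, but it only postpones the problem rather than removing it.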

    So, unless we can get the bugs under control, we still need to send these commands to get "back on air". Do you have another option?

    Incidentally, we have initial successful tests on the "nuclear option". That is, putting an external level converter and I2C FRAM on the CC3000. This seems like a very large hammer to fix an underlying design problem, however. Is this workaround the final solution to the problem (including Ivor's issue)?

     

    Best regards,
    Ian.

  • Hi Alon,

    Have you been able to make any progress with this yet? This is still a big problem for us.

    Thanks,
    Ian.

  • This is really more on the topic of hearing "about the experiences of other engineers who have developed with this product" than on your specific issue. Which probably won't help you with your issues, sorry...

    My two use cases are a) connect to AP to register presence and b) connect to AP to update and exchange registration data. The former is similar in concept to active RFID, with some current head scratching about battery types. The latter is a future project (maybe).

    I've nowhere near your amount of hours put in, but so far I've found the documentation ok, if in error in a few places. But I solved a number of issues with a firmware update (wlan_add_profile specifically), which puts me in a quandary - to ship these in a system I now have to put in place a process to update the firmware first. Which could be quite a problem, as patching hardware was never in the work flow, so it'll have to be done via the microcontroller. Which doesn't have the space. I'm a bit unhappy...

    At least my code process is simpler though:

    - Initialise by clearing all profiles, setting an AP profile and setting connection policy. This is done once only or until I need to change profile.

    - Each wake cycle is: wlan_init, wlan_start, [wait for connect] wlan_stop. All over in 1.5 seconds.

    - Change profile is done via the external PIC microcontroller (serial console).

    - From your post I'm putting in a mosfet to pull power altogether.

    So in this scenario I should be safe. I did notice from example code (Adafruit) that they were calling set_connection_policy() and netapp_dhcp() a lot. On every startup, in fact. Knowing that EEPROM is finite but not knowing how the internal implementation was performed (until your posting), I wrote my code to avoid that. Just as well, evidently.
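
    One way to enforce "configure once, or only when something changes" on the host side is to remember the last configuration that was written and skip the write when nothing differs. This is only a sketch under assumptions: the config fields, the cache struct and the write callback are invented for illustration, and the cache would have to live in the host's own non-volatile storage to survive a reset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical settings the host wants the radio to hold. All fields
 * are uint32_t so the struct has no padding and memcmp is safe. */
typedef struct {
    uint32_t connection_policy;
    uint32_t dhcp_timeout_s;
    uint32_t arp_timeout_s;
} radio_config_t;

typedef struct {
    bool           valid;         /* false until the first write */
    radio_config_t last_written;  /* copy of what was last sent */
    uint32_t       nvram_writes;  /* config writes actually issued */
} config_cache_t;

/* Hypothetical hook that sends the EEPROM-writing commands. */
typedef void (*write_config_fn)(const radio_config_t *cfg);

bool config_needs_write(const config_cache_t *c,
                        const radio_config_t *wanted)
{
    return !c->valid ||
           memcmp(&c->last_written, wanted, sizeof *wanted) != 0;
}

/* Issues the write only when the wanted config differs from the cached
 * copy. Returns true if a write was issued. */
bool apply_config_if_changed(config_cache_t *c,
                             const radio_config_t *wanted,
                             write_config_fn write_fn)
{
    if (!config_needs_write(c, wanted)) {
        return false;
    }
    if (write_fn != NULL) {
        write_fn(wanted);
    }
    c->last_written = *wanted;
    c->valid = true;
    c->nvram_writes++;
    return true;
}
```

    Note that this only helps if the module can be trusted to keep what was written; it does not address the corruption cases described earlier in the thread.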

    After your post and reviewing the API, I think using NVRAM (i.e. EEPROM) is problematic in many cases, and that's a TI design flaw. The one mentioned by Ivor really stands out - if you have to scan each time, your lifetime is automatically finite, which is crazy. I find it alarming that it's in no way obvious why wlan_ioctl_set_scan_params() would have to write to EEPROM. TI should have drawn a line between parameters that persist through a reset and everything else, and made sure that commands that write to EEPROM were distinct and necessary.

    Looking at your use case, the only thing I can think of is a jumper pin on the controlling microcontroller. Only then would it perform a smart config which is when all your config changes become necessary. But I suppose this rather defeats the point of smart config as a physical connection now becomes necessary. Alternatively, the system you are connecting to could, over a socket, instruct the CC3000 to do a smart config on next boot. Neither is particularly attractive but you may not have a choice :-(

    Edit: And here's another non-obvious fact. The CC3000 does not hi-Z its SPI_DOUT line when CS is not asserted. Great. Cheers for that.