We have recently found large areas of corrupted data in the NVRAM (EEPROM) chip of the CC3000 module.
Our investigation has produced a very worrying result: products containing the CC3000 will fail in the field if they go through repeated reconnect cycles.
Attached is a zip file containing NVRAM dumps from two devices: one that was fully functional (after having had its profiles flushed) and one that suddenly stopped working after being deployed in the field for only a few weeks. Comparing these dumps clearly shows 32 bytes of (assumed) corrupted data in every 128-byte block of the WLAN config area. The same corruption also appears in the WLAN config shadow area.
Having since read a previous E2E forum thread about an exhausted NVRAM, we monitored and analysed the I2C traffic to the NVRAM together with the host's commands on the SPI bus. Some of our results are also attached.
Analysis of the NVRAM I2C write cycles showed that during a single startup sequence (SimpleLink start, Smart Config, then WLAN connect), addresses 0x28 to 0x2B were written 42 times each and addresses 0x10 to 0x13 were written 63 times each!
Because the CC3000 periodically crashes (for various reasons, most of them already reported on the forums), we used a 1-minute reconnection timer that reruns the connection sequence to "get back on air" whenever the TCP socket isn't up.
We found the big hitters were:
- wlan_ioctl_set_connection_policy(...) (21 writes to 0x10..0x13)
- netapp_dhcp(...) (21 writes to 0x28..0x2B)
- netapp_set_timers(...) (21 writes to 0x28..0x2B)
Even if we optimise the cycle down to just wlan_ioctl_set_connection_policy(...) (21 writes to 0x10..0x13), repeating it once every minute while the access point is down would exhaust the NVRAM's (alleged) 100,000 write cycles in less than four days (100,000 / 21 ≈ 4,760 reconnects, or about 3.3 days)! At one reconnection per hour, our product would have a lifetime of only about six months. That's long enough to get many units into the field and have them all come back dead. A product nightmare!
Even if the 100,000 figure is wrong and the part is actually rated for 1,000,000 cycles, it would last only a touch over a month (about 33 days) in the field on the 1-minute cycle. This is similar to the timeline of our failed unit.
To me, this is a wholly unacceptable situation. I do not think a 1-minute reconnection attempt is unreasonable, but I would certainly expect a device to last more than a few days in the not uncommon situation of an access point failure.
- Why does it even need to write the EEPROM in this scenario?
- Why have TI chosen such a low-endurance EEPROM, if there is actually a good reason to write the EEPROM?
- What do we do to fix this?
Surely this must have been considered during the product's development. If the TI development engineers knew that a single call writes the same address 21 times, alarm bells should have been ringing over the suitability of the EEPROM. If they did not know, alarm bells should be ringing higher up in TI!
Can TI not see how this is going to cause problems for a Wi-Fi device targeted at the Internet of Things? For small, low-power Wi-Fi boards that may be switched on and off regularly? For devices connected through cellular hotspots that cycle regularly, and so on?
Setting aside the fact that there doesn't seem to be any need to write the EEPROM at all, the I2C analysis makes it clear what is happening here: blocks of data are written to different EEPROM locations at a fairly sensible rate, but the low-address range is used as some kind of pointer/status area. Address 0x10 has a status value written to it (presumably to mark that an area is being written), the data block is written, and then address 0x10 is written again to change its status back. The very next write command goes to address 0x10 yet again, setting it to the same value as before, ready for the next block write. This is repeated for every block of data written to the EEPROM, so the status bytes accumulate write cycles far faster than the data area they describe.
If the EEPROM really has such a ridiculously low lifetime, this limitation should be stated explicitly in the documentation, in a large font.
The next problem, of course, is that there isn't any decent documentation; if we want to know what's going on inside the CC3000 module at all, everything has to be pieced together from the wiki, the forums and a small number of sample programs. We can no longer even get access to a TI engineer directly; this is how TI now treat the customers who buy and develop with their products every day. It's just shocking.
As a company, we have now put many hundreds of hours of work into troubleshooting problem after problem with this CC3000 device. Initially, we believed the CC3000 to be a fully developed device from TI that would easily integrate into our design (which already contained TI's CC2520 RF transceiver, a CC2591 RF front end and an MSP430). How I wish I could revisit that decision, knowing what I know now.
I am of the firm opinion that the CC3000 simply does not work as intended and should not be sold in its current state.
I'd like to ask other members of the E2E community whether they have successfully developed a robust product incorporating the CC3000 that is fit for sale to customers who live in the real world and expect a decent level of functionality and a warranty with their purchase. I'd genuinely like to hear about the experiences of other engineers who have developed with this product; judging by other forum threads, I certainly don't think we're alone in having these issues.
In some cases, the TI staff posting on the E2E forums have been helpful in solving developers' issues, but in many other cases the threads just seem to peter out without a satisfactory resolution. What do we need to do to bring our problems with the CC3000 to the attention of someone who can actually do something about them? Will someone please tell me!