BQ40Z80: flash corruption during INIT

Part Number: BQ40Z80

Hi there,

this problem is a follow-up to this one: https://e2e.ti.com/support/power-management-group/power-management/f/power-management-forum/1402001/bq40z80-bms-field-reject---no-access-to-memory---eeprom-erased/5414983?tisearch=e2e-sitesearch&keymatch=bq40z80%20frank#5414983

Sealing the device did not improve the situation.

We have done further testing and found that

- the device stays in INIT for about 2 seconds (the INIT bit of OperationStatus is set)

- if we then remove power to the device (remove PACK+) it will brick itself

- when bricked, we see 0xFF in most responses from SBS commands that report flash contents (for example, 0x20 ManufacturerName)

- when bricked, it is no longer possible to unseal the device (assumption: all keys are also all 0xFF)
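The 0xFF symptom in the bullets above can be checked mechanically on a test bench. The following is a minimal sketch, assuming the raw bytes of an SBS response have already been read by whatever SMBus adapter is in use. `erased_fraction` and `looks_erased` are hypothetical helper names, not part of any TI tool, and the 0.9 threshold is an arbitrary choice.

```python
def erased_fraction(payload: bytes) -> float:
    """Return the fraction of bytes in payload that equal 0xFF."""
    if not payload:
        return 0.0
    return sum(1 for b in payload if b == 0xFF) / len(payload)


def looks_erased(payload: bytes, threshold: float = 0.9) -> bool:
    """Heuristic: treat a response as erased flash if it is mostly 0xFF."""
    return erased_fraction(payload) >= threshold


if __name__ == "__main__":
    # Hypothetical payloads; replace with bytes actually read from the device.
    healthy = b"\x05Texas"
    bricked = bytes([0xFF] * 8)
    print(looks_erased(healthy), looks_erased(bricked))
```

A check like this only flags the symptom. It says nothing about which flash write or erase caused it.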

The device has a minimal configuration, as we only use it as a normal Li-ion protection device with cell balancing. All the advanced features such as IT, Gauging, and Charging are disabled (Mfg Status Init = 0x0010).

This application uses a removable balancing connection that also provides GND to the chip. Because of this, and because we cannot guarantee bounce-free contact on insertion, we cannot ensure a stable supply voltage to the chip.

We also have failures in the field, which suggests that there might be more failure modes.

The findings above suggest that the chip writes to flash during INIT (and maybe also during operation), and that these writes corrupt memory and brick the device when power is removed at the "wrong moment".

How can we configure the device such that it will not write any data to flash during INIT and normal operation (ACTIVE, SLEEP)?

Can we get more info about what exactly it is writing in which situation?

Is it possible to mitigate this with added supply capacitance?

Is there logic in the firmware that stops flash write activity _before_ supply voltages become critical (both INIT and operation)?

Is there a flash write protection mechanism that inhibits these "background" writes?

Maybe a different firmware image that we can load onto the devices?
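For the supply-capacitance question above, a rough hold-up estimate can be sketched with the ideal-capacitor relation t = C * (V_start - V_min) / I. This is only bookkeeping under stated assumptions: the duration of a flash write is not known from this thread, the 25 V start voltage is a placeholder taken from the bench supply mentioned later, and the 3 mA load and 1.7 V cutoff are the approximate figures reported later in the thread. ESR, leakage, and DC-bias derating of ceramic capacitors are ignored. The function names are hypothetical.

```python
def holdup_time_s(capacitance_f: float, v_start: float, v_min: float, load_a: float) -> float:
    """Time for an ideal capacitor to fall from v_start to v_min at a constant load current."""
    if load_a <= 0:
        raise ValueError("load_a must be positive")
    if v_start <= v_min:
        return 0.0
    return capacitance_f * (v_start - v_min) / load_a


def required_capacitance_f(hold_s: float, v_start: float, v_min: float, load_a: float) -> float:
    """Capacitance needed to hold the supply above v_min for hold_s seconds."""
    if v_start <= v_min:
        raise ValueError("v_start must be above v_min")
    return load_a * hold_s / (v_start - v_min)


if __name__ == "__main__":
    # Placeholder values, not measured device data.
    t = holdup_time_s(22e-6, v_start=25.0, v_min=1.7, load_a=3e-3)
    print(f"ideal hold-up: {t * 1e3:.1f} ms")
```

An estimate like this describes a single clean power removal. It does not cover repeated bouncing around the cutoff voltage, which is the condition the later posts in this thread report as still bricking the device with 22 µF fitted.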

Any help will be greatly appreciated, as at the moment the field failure rate is significant, on the order of 5%.

Regards Frank

Here is a dump of the device configuration:

  0x00, 0x00, 0x00, 0x00, 0xAC, 0x0D, 0x1E, 0x44, 0x00, 0x04, 0x20, 0x00, 0x00, 0x00, 0x00, 0x96, // 4AC0
  0x00, 0xAF, 0x00, 0xE4, 0x77, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 4AD0
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x16, 0xEA, 0x00, 0xAC, 0x0D, 0xD0, 0x07, 0x05, 0x00, 0x00, // 4AE0
  0x0A, 0x00, 0x00, 0x00, 0x00, 0xD0, 0x07, 0x64, 0x64, 0x00, 0x00, 0xDE, 0x0A, 0x6E, 0x0C, 0x05, // 4AF0
  0x05, 0x04, 0x0A, 0x14, 0xA0, 0x05, 0x1E, 0x00, 0x3C, 0x0A, 0x00, 0x3C, 0xD0, 0x00, 0x00, 0x02, // 4B00
  0x00, 0x04, 0x64, 0x00, 0x10, 0x04, 0x0A, 0x00, 0x14, 0x28, 0x3C, 0x50, 0x5A, 0x0A, 0x00, 0x14, // 4B10
  0x28, 0x3C, 0x50, 0x5A, 0x2C, 0x01, 0xB0, 0x01, 0x0A, 0x00, 0x81, 0x00, 0x31, 0x00, 0x03, 0x0A, // 4B20
  0x00, 0x1E, 0x32, 0x03, 0x0A, 0x10, 0x0B, 0x3C, 0x0C, 0xB8, 0x0B, 0x10, 0x0E, 0x02, 0x03, 0x2D, // 4B30
  0xF2, 0x87, 0x33, 0xB5, 0xB6, 0x65, 0x99, 0x6A, 0xF1, 0x16, 0x24, 0x47, 0x3B, 0xE7, 0x20, 0xC1, // 4B40
  0x2A, 0x73, 0x1D, 0x01, 0xFA, 0x89, 0x61, 0x89, 0x67, 0x27, 0xDC, 0xCD, 0x4F, 0xB5, 0x42, 0x13, // 4B50
  0x4A, 0xD4, 0x99, 0x11, 0xD7, 0xC4, 0xA6, 0x7D, 0x46, 0x30, 0x11, 0xC0, 0x18, 0x40, 0x38, 0x5A, // 4B60
  0x0A, 0x00, 0x00, 0x00, 0xB8, 0x0B, 0x1C, 0x0C, 0x00, 0x05, 0x68, 0x10, 0x04, 0x10, 0x64, 0x5F, // 4B70
  0x80, 0x0C, 0xE4, 0x0C, 0x06, 0x08, 0x68, 0x10, 0x04, 0x10, 0x64, 0x5F, 0x00, 0x6F, 0x01, 0x02, // 4B80
  0x02, 0x02, 0x02, 0x03, 0x50, 0x46, 0x00, 0x00, 0x50, 0x5F, 0x3C, 0x64, 0x00, 0xCE, 0xD1, 0xFF, // 4B90
  0x01, 0x1A, 0x00, 0x07, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xE8, 0x03, // 4BA0
  0x96, 0x00, 0x64, 0x00, 0x14, 0x64, 0x64, 0x01, 0x03, 0x0C, 0x05, 0xD0, 0x07, 0x02, 0xD7, 0x3F, // 4BB0
  0x01, 0x0C, 0xC4, 0x09, 0x0A, 0x54, 0x0B, 0x00, 0x00, 0x02, 0x00, 0x00, 0x90, 0x10, 0x90, 0x10, // 4BC0
  0x90, 0x10, 0x90, 0x10, 0x90, 0x10, 0x01, 0xA0, 0x0F, 0xA0, 0x0F, 0xA0, 0x0F, 0xA0, 0x0F, 0xA0, // 4BD0
  0x0F, 0x00, 0x0A, 0x0F, 0x40, 0x1F, 0x05, 0x00, 0x00, 0x00, 0xE8, 0x03, 0x0F, 0x40, 0xA2, 0x05, // 4BE0
  0xD0, 0x8A, 0x01, 0x18, 0xFC, 0x0F, 0x00, 0x0A, 0x0F, 0x00, 0x0A, 0x05, 0x0F, 0x00, 0x0A, 0x05, // 4BF0
  0x0F, 0x00, 0x0A, 0x05, 0x0F, 0x6E, 0x0C, 0x02, 0x5A, 0x0C, 0x04, 0x0D, 0x02, 0xF0, 0x0C, 0xCC, // 4C00
  0x0D, 0x02, 0x9A, 0x0D, 0xAC, 0x0A, 0x02, 0xC0, 0x0A, 0xE4, 0x09, 0x02, 0xF8, 0x09, 0x0A, 0xD0, // 4C10
  0x07, 0x08, 0x07, 0x08, 0x07, 0x02, 0x00, 0xC4, 0x09, 0xD0, 0x07, 0xF0, 0xD2, 0x02, 0x00, 0x2C, // 4C20
  0x01, 0x02, 0x00, 0x5A, 0x2C, 0x01, 0x1E, 0x0C, 0xFE, 0xF4, 0x01, 0x02, 0x64, 0x00, 0x02, 0x32, // 4C30
  0x00, 0x02, 0x0A, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x98, 0x08, 0x05, 0x94, 0x11, 0x05, 0x10, // 4C40
  0x27, 0x05, 0xF0, 0xD8, 0x05, 0x36, 0x0D, 0x05, 0x94, 0x0E, 0x05, 0xB8, 0x08, 0x05, 0xC8, 0x00, // 4C50
  0xC8, 0x00, 0x96, 0x00, 0x02, 0x78, 0x00, 0x14, 0x02, 0xAC, 0x0D, 0x0A, 0x00, 0xF4, 0x01, 0x05, // 4C60
  0x64, 0x00, 0x74, 0x0E, 0x32, 0x00, 0xC8, 0x00, 0x05, 0x2C, 0x01, 0x90, 0x01, 0x02, 0x00, 0x00, // 4C70
  0x02, 0x05, 0x00, 0x05, 0xFB, 0xFF, 0x05, 0x05, 0x00, 0x05, 0x64, 0x02, 0x05, 0x64, 0x05, 0x05, // 4C80
  0x00, 0xAC, 0x0A, 0x24, 0x0B, 0x74, 0x0B, 0xA6, 0x0B, 0xD8, 0x0B, 0xD2, 0x0C, 0x0A, 0x00, 0xA0, // 4C90
  0x0F, 0x84, 0x00, 0x60, 0x01, 0x08, 0x01, 0x68, 0x10, 0xBC, 0x07, 0xA4, 0x0F, 0xB0, 0x0B, 0x68, // 4CA0
  0x10, 0xBC, 0x07, 0xA4, 0x0F, 0xB0, 0x0B, 0xA0, 0x0F, 0xF4, 0x03, 0xBC, 0x07, 0xD8, 0x05, 0x04, // 4CB0
  0x10, 0xCC, 0x09, 0x88, 0x11, 0xC0, 0x0D, 0x58, 0x00, 0x2C, 0x00, 0xE8, 0x03, 0xE8, 0x03, 0x10, // 4CC0
  0x0E, 0xA0, 0x0F, 0x00, 0x32, 0x4B, 0x01, 0x32, 0x00, 0x5F, 0x0A, 0x00, 0x0A, 0x96, 0x00, 0x50, // 4CD0
  0x28, 0x00, 0x14, 0x5E, 0x01, 0x3C, 0x46, 0x00, 0x28, 0xA0, 0x0C, 0x68, 0x10, 0x2C, 0x01, 0x19, // 4CE0
  0x00, 0xB8, 0x0B, 0xFA, 0x00, 0x40, 0x00, 0x4B, 0x00, 0x28, 0x01, 0x01, 0xC0, 0x0D, 0x68, 0x10, // 4CF0
  0x0C, 0x03, 0x01, 0x1C, 0x01, 0x06, 0x03, 0x70, 0xFF, 0x42, 0x53, 0xFF, 0x64, 0x98, 0x08, 0xFA, // 4D00
  0x64, 0x00, 0x32, 0x00, 0x0A, 0x00, 0x01, 0x3C, 0x0A, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 4D10
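To compare a dump like the one above against a dump from another device, it helps to turn the text back into bytes. The following is a minimal sketch, assuming every row has the form shown above (comma-separated `0x..` bytes followed by a `//` comment) and taking the hex value in the comment at face value as the row's start address. `parse_dump` and `trailing_ff` are hypothetical helper names.

```python
import re

# One row: a run of "0x..," tokens, then "// <hex address>".
LINE_RE = re.compile(r"^\s*((?:0x[0-9A-Fa-f]{2},\s*)+)//\s*([0-9A-Fa-f]+)\s*$")

# Last row of the dump above, used as sample input.
SAMPLE = (
    "  0x64, 0x00, 0x32, 0x00, 0x0A, 0x00, 0x01, 0x3C, "
    "0x0A, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, // 4D10"
)


def parse_dump(text: str) -> dict:
    """Map each row's start address to the bytes listed on that row."""
    rows = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank lines and anything that is not a dump row
        tokens = re.findall(r"0x([0-9A-Fa-f]{2})", m.group(1))
        rows[int(m.group(2), 16)] = bytes(int(tok, 16) for tok in tokens)
    return rows


def trailing_ff(data: bytes) -> int:
    """Count the 0xFF bytes at the end of data."""
    return len(data) - len(data.rstrip(b"\xff"))


if __name__ == "__main__":
    for addr, data in parse_dump(SAMPLE).items():
        print(f"{addr:04X}: {len(data)} bytes, {trailing_ff(data)} trailing 0xFF")
```

With two dumps parsed this way, the rows can be compared address by address to see which regions differ between a working and a failing unit.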

  • Hi there, any update? Client is very unhappy and we need a solution as quickly as possible. Regards Frank

  • Hi Frank,

    What is the voltage at this time, and what is the Valid Update Voltage parameter set to? If the voltage is below the Valid Update Voltage, the device should not allow data flash updates.

    Can you also share an .srec from one of the affected devices if possible?

    Regards,

    Anthony

  • Hi Anthony,

    we only have small capacitors on the supply lines: 220p at BAT, 1n at VCC, 10n at PACK, and 2u2 at PBI. As we are dealing with contact bouncing, the voltages drop very quickly.

    The devices are bricked, with the UNSEAL keys all 0xFF. This means that the memory cannot be accessed any longer, as UNSEAL is impossible. I was able to use the one-byte SBS commands to gather some of the data, and most of it is 0xFF. You can see my results in the original post that I have linked above.

    What you write is promising though. Can you confirm that there is a voltage check before a flash write starts? If yes, would a larger capacitor help to sustain the write cycle? How long is such a write cycle? Do you have a recommendation for a capacitor value? Where should it be located? I assume a capacitor across the battery power leads would help?

    Regards Frank

  • Hi Anthony,

    also, if possible, can you answer any of the other questions that I had? Especially, is it possible to keep the device from writing to flash entirely? We are just using it as a normal BMS with voltage, current and temperature protection. No fancy stuff like charging or gauging, no lifetime, nothing that would actually require flash writes. Here are the questions:

    How can we configure the device such that it will not write any data to flash during INIT and normal operation (ACTIVE, SLEEP)?

    Can we get more info about what exactly it is writing in which situation?

    Is it possible to mitigate this with added supply capacitance? >>> in progress, need write cycle duration, current consumption, (or capacitance value), and correct place to put the capacitor.

    Is there logic in the firmware that stops flash write activity _before_ supply voltages become critical (both INIT and operation)?

    Is there a flash write protection mechanism that inhibits these "background" writes?

    Maybe a different firmware image that we can load onto the devices?

    Regards Frank

  • Hi Anthony,

    the situation is not resolved; I need answers to my questions above.

    Regards Frank

  • Hi all,

    we're giving up on this, with the conclusion that it is not possible to stop the BQ40Z80 from bricking itself when the supply contacts bounce. We have added a 22u/35V/X5R capacitor right across the battery terminals (on the BMS PCB), and created a contact bouncing simulator that plays back actually recorded contact behavior. We modulate the playback speed to cover a wide range of scenarios. This simulator can easily brick the IC after a few minutes of playback. In the field, the BMS solution is connected to the pack via a connector (JST XH), and this is an eBike application in which micro contact bounce at this connector is possible.

    The below screenshots show battery voltage in yellow, and input current (from a 25V bench PSU, with a MOSFET based switch in series) via a 1k series resistor.

    This shows an overview of the sequence. Left of center shows the normal operating pattern: short wakeup every 250ms, longer wakeup every second. Right of center shows the bricked condition.

    Below is a zoom of the IC bricking itself. The cursor is at 1.7V (the undervoltage cutoff of the IC). There is a spike that contains multiple bounces (zoomed in the following picture), followed by a longer powered period. We think that the supply voltage hovering around 1.7V during the bounces confuses the IC into thinking that the flash contents need to be repaired. The IC startup behavior differs from normal, and we think that the large current spike (approx. 3mA) at the right shows a flash erase. After this event, the IC is bricked. We read all 0xFF from the flash regions that are still accessible, as explained in the first post above. The entire situation is repeatable. We cannot avoid contact bounce, especially not during manufacture, where we solder connections. The contact bounce in the field will be suppressed by replacing the balancer connector with a permanent solder connection. But unless there is a way to tell the IC not to do what it does here, we truly think that this is a severe design flaw.

    Regards Frank