bqWizard consistently destroying devices. I am probably going to regret recommending TI to my customer

Other Parts Discussed in Thread: BQWIZARD, BQ78PL116

Despite my hesitation regarding a previous instance of bqWizard destroying a BQ78PL116 device (I posted about it here a few months ago) we decided to go ahead and build our new board around this device. I assumed that because so many other folk were successfully using this software, it must be just a minor glitch, and I was for the most part able to use it with the eval board aside from that one event.

Now I am regretting this. I have today seen bqWizard destroy two '116's, on two different boards (of the three we had made for the rev 1 run; the third had a faulty PCB).

Basically, therefore, it destroyed both of the working boards I had available to me. While we can of course remove the chip and put another one on, it's a fiddly procedure which costs our company money in both the price of the chip (not extravagant, but still $, particularly as I have to import them) and, more particularly, our time in doing so (which is the real cost). And of course, there's no point in reworking them if it's just going to happen again, which seems likely given the above experience.

It also obliges me to inform our customer (for whom we have designed this board) that I cannot guarantee that they won't suffer the same fate if they go into full production, given it's their responsibility to program the devices.

Surely, TI, you can do better than this? You have what pretty clearly seems to be shoddy software that has the capability to destroy your hardware, and yet you REQUIRE us to use this software (or the API it is based on) to set the devices up! It is unacceptable that a crash in your software should irretrievably lock the chip.

Surely, surely, there is a way to recover it? I have had you tell me before that the answer is "no; there's no way to unseal it", but this just seems insane. The chip's firmware sees an inconsistency in its stored data, and its response is to say "meh, I give up, replace me?!"

At this point I am really really regretting recommending TI to my customers; it was my decision to point them at this chip and it really looks like I'm going to have a lot of egg on my face if I can't resolve this. There is simply no way I can say to them that they should go ahead and build 5,000 or 10,000 of these if I can't be confident in the procedure to configure the devices.

TI, please address this issue with your software; either change it so it doesn't crash, or if that's too hard make it so that when it does crash it doesn't seal the device! Or if even that is too hard for your expert software engineers, tell us, your customers, how to unseal the chip when the preceding events occur! I am sure that if you really wanted to, you could find out from your firmware engineers how to do this.

I'll attach the screenshots I took today of the crash. Both crashes were identical; they occurred when I tried to load the pack configuration onto the BQ on the newly-built boards.

The crash (keep in mind this happened twice on two different boards).
(I have no idea what modal form the message refers to; as far as I can tell no other dialog
was open and I took particular care the second time around to ensure nothing was amiss).

This is what it says when I click 'ok' after the run-time error.

When I click 'ok' again, I get the 'bqWizard has stopped working' message.

When I re-start bqWizard, the device is sealed. Time to schedule the board for re-work. Thanks TI!

  • Chris

    I have not seen most of the messages that you have posted, nor have I seen them reported from other users. What are you doing when the device fails? The earlier post indicated that there had been a software crash and your symptoms point towards flash memory corruption that the device cannot recover from. The device would normally jump to the bootloader routine, if the firmware gets corrupted.

    The PL116 is normally used for high cell count applications. Is there a particular requirement that drove you to use this device for a 3-cell application? We have better devices available for applications for 2 to 4 series cells.

    Regards
    Tom

  • ThomasCosby said:
    What are you doing when the device fails?

    Connect board, start bqWizard, it happily detects device at address 0x16, loads and sits ready for me to do something. It's a new board so the first step is to load the pack configuration onto the device (this is the same configuration I have successfully used with the dev board). During the process of loading the configuration (after about 5 or 10 seconds) the error dialog appears and the device is hosed.

    That's all there is to it.

    The earlier post indicated that there had been a software crash

    Yes, a crash in bqWizard, just like this one. It also occurred during loading of pack configuration and destroyed the device on the dev board (I later replaced it and it worked without problem after that). This time I decided to take screenshots since I was pretty annoyed.

    Is there a particular requirement that drove you to use this device for a 3-cell application?

    My client has an existing, reasonably large customer base with a family of portable devices manufactured by them that use a common battery pack containing an earlier battery controller design (12 years old). This design does not handle cell balancing very well, and their attempts to move to higher capacity cells with it didn't work out very well because of this.

    They retained me to design a replacement controller that manages balance much better, maximizes battery life and closely monitors battery behavior (the latter because some brands of cell they were using performed worse than others, and they want to collect detailed information on these, which was not possible with the old controller).

    The devices are used in an industrial environment where battery changes during shifts are disruptive, and hence reliability and quality are more important than low complexity or low cost. Given that TI promotes the charge pump cell balancing on the '116 as being more efficient, and having seen no reason not to use it for a 3S application, it appeared to be a suitable choice given the constraints. The decision to proceed with the investigation into the '116 (which led to the contract to develop this board) was made prior to the device going NRD, so there was no reason not to use it, and as far as I am concerned, aside from the issue with the bqWizard software, it is still a suitable choice.

    This leaves me with the issue I described. Your software, if it happens to crash during the load of pack configuration, will for all intents and purposes destroy the '116 by leaving it sealed, since as previously advised in this forum, once sealed it cannot be unsealed and the device must be replaced.

    I repeat my previous assertion that this is unacceptable behavior: either the software should not crash, or if it does, it should not leave the device unrecoverable.

    I am at a loss to understand how a company with the reputation of Texas Instruments is willing to promote and sell hardware that can be damaged by its own software in this way. If it was the end user's fault that's one thing, but in this case it's TI software talking to TI hardware via a TI GPIO adapter. I've seen the fault on both your own EVM and my own board design, so it's target-agnostic.

    If TI cannot find a way to resolve this we are going to be left having to re-design the board (at our own expense), which is going to leave me with a permanent distaste for TI products. I simply cannot in good faith provide my client with a design for manufacturing that relies on a software product that has a known fault capable of destroying the hardware.

  • Chris

    Are you using a .dat file to load the configuration into the device? Can you send me the file or files that you are loading, and I will see if I can find something that may be causing the corruption problem.

    Best regards
    Tom

  • ThomasCosby said:
    Are you using a .dat file to load the configuration into the device? Can you send me the file or files that you are loading, and I will see if I can find something that may be causing the corruption problem.

    Yes I am using a dat file. It's nothing special; just a basic setup generated for evaluating the pack we are using (no profiling or aux chem). This has successfully been loaded onto the EVM in the past.

    Please PM me your email address and I'll send it to you.

    I might try loading it onto one of our EVMs again with our I2C monitor attached. At least this time I will be able to capture the sequence of communication events if it crashes.

  • Chris

    Please attach the .dat file to this forum entry. I do not publish my email address in the open forum.

    Regards

    Tom

  • ThomasCosby said:
    I do not publish my email address in the open forum.

    Which is why I asked you to PM me the address.

  • Chris

    I attempted to load your .dat file into a device and had the same results that you reported. The .dat file is corrupted, so please do not use it anymore. I would recommend replacing the device and creating a new .dat file. If you want to send your .ppcsv, .tmap, .aux, .cal or any other files that you are using to create the .dat file, then I can test them as well. Maybe one of them is also corrupted.


    Best regards
    Tom

  • ThomasCosby said:
    I attempted to load your .dat file into a device and had the same results that you reported. The .dat file is corrupted

    I see. Regardless of how it became corrupt (presumably after it was last successfully used) the point remains that your software is written in such a way that it crashes under these circumstances, and further, that such a crash for all intents and purposes destroys the target device.

    May I suggest that perhaps this outcome is not, shall we say, exactly 'user-friendly'??

    Given TI is not a backyard company, I would reasonably surmise that it has the engineering resources to write software that handles bad input more gracefully, should it wish to.

    Please pursue this issue and ask those responsible to rectify this severe bug by having bqWizard gracefully handle bad files rather than crashing and destroying the device, as in the meantime I and all your customers are still left with a situation whereby apparently the only warning we get of a bad .DAT file is for it to nuke the target.

    Alternatively, perhaps your firmware folks could change the '116 so that it does not end up sealed when bqWizard crashes during configuration.
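
To illustrate the kind of graceful handling requested above, here is a minimal sketch in Python of a pre-flight check that refuses to start programming if the configuration file no longer matches a digest recorded when it was last known to be good. This is not how bqWizard or the TI API works; the names `preflight_check`, `load_config_safely` and `write_to_device` are hypothetical, and the internal layout of a .dat file is not described in this thread, so the check only compares bytes and does not parse the contents.

```python
import hashlib
from pathlib import Path
from typing import Callable


class ConfigFileError(Exception):
    """Raised when a configuration file fails pre-flight validation."""


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def preflight_check(path: Path, expected_sha256: str) -> bytes:
    """Validate the whole file before any device write is attempted."""
    if not path.is_file():
        raise ConfigFileError(f"{path} does not exist")
    data = path.read_bytes()
    if not data:
        raise ConfigFileError(f"{path} is empty")
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise ConfigFileError(f"{path} digest mismatch; file may be corrupted")
    return data


def load_config_safely(path: Path, expected_sha256: str,
                       write_to_device: Callable[[bytes], None]) -> bool:
    """Call write_to_device only if the file passes validation.

    Returns True if the write was attempted, False if it was refused.
    write_to_device is a hypothetical stand-in for the real programming step.
    """
    try:
        data = preflight_check(path, expected_sha256)
    except ConfigFileError as err:
        print(f"Refusing to program device: {err}")
        return False
    write_to_device(data)
    return True
```

The digest would be recorded (for example with `sha256_of`) right after a load that is known to have succeeded. A check like this only catches a file that has changed since then; it would not catch a file that was already bad when it was generated.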