BQ40Z50-R2: SMBus stuck during SREC read

Part Number: BQ40Z50-R2
Other Parts Discussed in Thread: BQSTUDIO, BQ33100

We've encountered a problem while reading the SREC file from a BQ40Z50-R2.

Essentially, another device on the board tried to communicate during the first few seconds of the read, and the read attempt failed. However, the BQ is unable to recover from this. It gets stuck in (presumably) ROM mode with the SMBus clock line held low. It won't accept any commands or respond in any way. As there is no /MCLR or /RST pin on the device, the only way we can free it is to unsolder the board from the battery pack and reconnect it (i.e. power-cycle the BQ).

For initial prototypes we have access to the solder terminals, but once our device is sealed within a battery pack there will be no way to power-cycle or reset the BQ. If we suspend all other comms while bqStudio communicates, the read works fine, but it's still quite plausible that something could interrupt the comms (dodgy connection, operator forgets to kill all other comms, etc.), and that could render the pack useless.

Firstly, to preserve my sanity during the prototype stage, is there another way to recover from this fault that doesn't require unsoldering the PCB?

Secondly, can this be fixed?

Thanks

  • Hello David,
    We strongly recommend against having another device communicate on the comm lines, especially while programming the gauge. This almost always causes problems.

    To recover your device, power cycle the IC by disconnecting the batteries and then reconnecting them. You could also try shorting the PBI pin to ground to see if that helps recover the device.
    thanks
    Onyx
  • Thanks for the response.

    We do aim to suspend the other device when using bqStudio, but inevitably we sometimes forget. I'll try the PBI pin if it happens again.

    I am more surprised and concerned that the BQ is unable to recover from this event. There doesn't seem to be any collision detection or even a timeout. I would have expected it to fail the process more gracefully than just latching the clock low.

    We're migrating from the BQ20z95 which has 2 MRST pins. If there's a place to suggest features, I'd suggest a similar RST pin. There's surely a good reason why such a pin exists on most if not all processors.

    Thanks

  • Thanks for the feedback. I have passed this on to the responsible parties.
    thanks
    Onyx
  • Hi David,

    Can you provide a scope capture or bus log of this happening, as it sounds like you may be able to reproduce it reliably? The device should not latch up, and that is a concern; the bus should time out and recover. The log would hopefully show whether the other traffic was sending an unexpected command that we may be sensitive to, or whether a collision resulted in some type of unexpected condition on the bus.

    Thanks,

    Terry
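
For anyone post-processing a scope or logic-analyser export to check this, here is a minimal sketch that flags intervals where the clock stays low for longer than the SMBus T_TIMEOUT,MIN of 25 ms. The sample format (a time-sorted list of `(time_s, scl_level)` pairs) and the function name are assumptions made for illustration, not something bqStudio or any TI tool produces.

```python
from typing import List, Tuple

# SMBus T_TIMEOUT,MIN: a device should reset its interface once it
# sees the clock held low for longer than this.
T_TIMEOUT_MIN_S = 0.025


def find_clock_low_timeouts(
    samples: List[Tuple[float, int]],
    limit_s: float = T_TIMEOUT_MIN_S,
) -> List[Tuple[float, float]]:
    """Return (start_time, duration) for each SCL-low interval over limit_s.

    `samples` is a hypothetical capture format: (time_s, level) pairs,
    sorted by time, where level is 0 (low) or 1 (high).
    """
    violations = []
    low_start = None
    for t, level in samples:
        if level == 0 and low_start is None:
            low_start = t  # falling edge: start timing the low period
        elif level == 1 and low_start is not None:
            duration = t - low_start  # rising edge: low period ended
            if duration > limit_s:
                violations.append((low_start, duration))
            low_start = None
    # The clock may still be low when the capture ends (a stuck bus).
    if low_start is not None:
        duration = samples[-1][0] - low_start
        if duration > limit_s:
            violations.append((low_start, duration))
    return violations
```

A clock that goes low and never comes back within the capture shows up as a single entry whose duration runs to the last sample.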

  • Hi David,

    Thank you for the scope shot. Please try modifying the resistance of your pull-up resistors to increase/decrease the rise time of the signal.

    Please also adjust the value of the series resistors.

    Let me know if this resolves the issue.

    Sincerely,
    Bryan Kahler

  • Hi David,

    After further review of the logs, I have a few more questions.

    What value pull up resistors are you currently using?

    When SREC fails, what message(s) do you receive from bqStudio?

    If another device communicates on the bus while the SREC is being programmed, blocks of data may be corrupted, causing the checksums not to match. Those blocks are then not written to the device, which ultimately leaves the device stuck in ROM mode.

    Please do not communicate with another device on the bus when programming the SREC.

    During prototyping, I highly recommend adding test pads for pins such as RST or removing soldermask in an appropriate location to enable easy access to those signal lines.

    Sincerely,
    Bryan Kahler
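
To illustrate the failure mode described above, here is a minimal simulation of block-wise programming in which a block is committed only if its checksum matches. The 8-bit additive checksum, the `transfer` callback and the function names are all hypothetical stand-ins; the gauge's real ROM-mode protocol and checksum are not given in this thread.

```python
from typing import Callable, Dict, List, Tuple


def checksum8(block: bytes) -> int:
    """Hypothetical 8-bit additive checksum (NOT the gauge's real one)."""
    return sum(block) & 0xFF


def program_image(
    blocks: List[bytes],
    transfer: Callable[[int, bytes], bytes],
) -> Tuple[Dict[int, bytes], List[int]]:
    """Send each block over `transfer` and commit it only if it arrives intact.

    `transfer(index, block)` models the bus and returns the bytes the
    device actually received, possibly corrupted by other traffic.
    Returns (committed blocks by index, indices of rejected blocks).
    """
    committed: Dict[int, bytes] = {}
    rejected: List[int] = []
    for index, block in enumerate(blocks):
        expected = checksum8(block)
        received = transfer(index, block)
        if checksum8(received) == expected:
            committed[index] = received
        else:
            rejected.append(index)  # block is dropped, image is incomplete
    return committed, rejected
```

Any non-empty `rejected` list corresponds to an incomplete image, which in the scenario above is what would leave the device unable to leave ROM mode.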
  • Hi David,

    To be clear, is the red line the clock signal? If the gauge was locking it low, what caused it to toggle high and low later when the processor tried to communicate? The gauge had to release it for that to occur. Is it possible the processor was the one locking the line low?

    We do see comm issues in the log you sent, presumably due to your multi-master setup. Sometimes the master stops clocking in the rest of the data from the gauge, so the gauge is interrupted in ROM mode. The host then tries to access MAC subcommands, which the gauge NACKs because it isn't in firmware mode yet. Cleaning up the multi-master setup may therefore also resolve the issue you're seeing.

    Thanks,

    Terry
  • We are seeing similar issues using the BQ33100. To simulate the fault, we intermittently short the SMBus data line during normal operation, creating the sort of bus contention you cite. When the fault starts, some entity keeps either the data or the clock line pulled low permanently. We suspect it is our host processor, since a reset of the host returns everything to normal. The SMBus spec requires all entities on the bus to correct the situation when a stuck bus is detected. In our case, we think the host may be at fault, not the BQ33100.
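
As a rough sketch of what host-side recovery could look like when the data line is the one being held, the code below uses the clock-pulse technique commonly used on I2C buses: release SDA, pulse SCL up to nine times until SDA goes high, then issue a STOP. The GPIO callbacks (`read_sda`, `set_scl`, `set_sda`, `delay`) and the `FakeBus` model are hypothetical and exist only for illustration. Note that this cannot help when the clock line itself is held low by another device, which is the symptom reported earlier in this thread.

```python
from typing import Callable


def recover_stuck_sda(
    read_sda: Callable[[], int],
    set_scl: Callable[[int], None],
    set_sda: Callable[[int], None],
    delay: Callable[[], None],
    max_pulses: int = 9,
) -> bool:
    """Try to free a data line held low by a device stuck mid-byte.

    All four callbacks are hypothetical bit-banged GPIO helpers.
    Returns True if SDA was released and a STOP was issued.
    """
    set_sda(1)  # make sure the host itself is not driving SDA
    for _ in range(max_pulses):
        if read_sda() == 1:
            break
        set_scl(0)
        delay()
        set_scl(1)  # each rising edge lets the device shift out one more bit
        delay()
    if read_sda() != 1:
        return False  # still stuck: only a reset or power cycle is left
    # STOP condition: SDA rises while SCL is high.
    set_sda(0)
    delay()
    set_scl(1)
    delay()
    set_sda(1)
    delay()
    return True


class FakeBus:
    """Toy model: a device holds SDA low until it has seen N clock pulses."""

    def __init__(self, clocks_needed: int) -> None:
        self.remaining = clocks_needed
        self.scl = 1
        self.host_sda = 1

    def set_scl(self, level: int) -> None:
        if self.scl == 0 and level == 1 and self.remaining > 0:
            self.remaining -= 1  # count rising edges
        self.scl = level

    def set_sda(self, level: int) -> None:
        self.host_sda = level

    def read_sda(self) -> int:
        # Open-drain line: low if either side pulls it low.
        return 0 if (self.remaining > 0 or self.host_sda == 0) else 1
```

With `bus = FakeBus(3)`, calling `recover_stuck_sda(bus.read_sda, bus.set_scl, bus.set_sda, lambda: None)` frees the line after three pulses; a device that needs more than nine pulses is reported as still stuck.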