BQ79616-Q1: BQ79616 stack chips not responding to queries or reset attempts

Part Number: BQ79616-Q1
Other Parts Discussed in Thread: BQ79616

We have encountered an issue in the field that we have been unable to rectify via software, and we are struggling to identify the actual cause. Our implementation of the BQ79616 uses a host PCB carrying the MCU and a single BQ79616 acting as a transceiver, plus a number of stack boards, each with a single BQ79616. This design does not use the ring architecture, and the BQ79616 acting as the host transceiver has only COMMH+ and COMMH- broken out to a connector. The implementation in question uses a single host and two stack boards. The system is sealed within the battery enclosure and is generally physically inaccessible. External communication with the host MCU is accomplished via either CAN bus or serial. Additionally, since the BQ79616 residing on the host board is powered from the board's power supply (13.6V nominal), it undergoes a POR with each power cycle of the host board.

The problem condition is that the two stack chips are completely unresponsive to any communication attempts to read, write, address, or reset the devices. The condition manifested itself specifically when the host board was powered up by an external supply to receive a firmware update. The issue occurred before the update was performed and has persisted both after the new host MCU firmware was installed and after rolling back to the previous working version.

We are unable to replicate this issue through bench testing. We have confirmed by scoping the UART and COMMH/COMML lines that the HW_RESET procedure propagates from the MCU to the base device and through the stack as expected on the bench. Attached are some images of the scope data for reference. The firmware on the host MCU contains no provisions for manipulating the COMMH/COMML transceivers of any devices directly, and given that the condition occurred during an otherwise normal startup, it seems exceedingly unlikely that the stack chips would have had any transceivers disabled. The most likely explanation seems to be that the base device is not actually forwarding any commands to the stack chips, rather than the stack chips being truly unresponsive; however, a HW_RESET ping does nothing to remedy the issue, nor does manually enabling the COMMH/COMML transceivers via the COMM_CTRL registers on the base device.
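For anyone reproducing the command traffic described above, here is a minimal sketch of how a command frame can be assembled. It is not the firmware used in this system. The frame layout (init byte, optional device address, 16-bit register address, data, CRC) and the CRC-16 parameters (polynomial 0x8005, initial value 0xFFFF, sent low byte first) are assumptions taken from the BQ79616 datasheet, the names `crc16` and `build_frame` are hypothetical, and register 0x0309 in the usage line is assumed to be CONTROL1; check all of these against the register map before relying on them.

```python
def crc16(data: bytes) -> int:
    """CRC-16, polynomial 0x8005 (bit-reflected form 0xA001), initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


# Request types placed in bits 6:4 of the init byte (assumed from the datasheet).
SINGLE_READ, SINGLE_WRITE = 0, 1
STACK_READ, STACK_WRITE = 2, 3
BROADCAST_READ, BROADCAST_WRITE = 4, 5


def build_frame(req_type: int, reg: int, payload: bytes, dev_addr=None) -> bytes:
    """Assemble one command frame.

    For writes, payload is the data to write (1 to 8 bytes).
    For reads, payload is a single byte holding (bytes to read - 1).
    """
    if not 1 <= len(payload) <= 8:
        raise ValueError("payload must be 1 to 8 bytes")
    # Bit 7 set marks a command frame; bits 2:0 hold (payload length - 1).
    init = 0x80 | (req_type << 4) | (len(payload) - 1)
    body = bytes([init])
    # Only single-device requests carry a device address byte.
    if req_type in (SINGLE_READ, SINGLE_WRITE):
        if dev_addr is None:
            raise ValueError("single-device requests need dev_addr")
        body += bytes([dev_addr])
    body += reg.to_bytes(2, "big") + bytes(payload)
    # The CRC covers every preceding byte and is appended low byte first.
    return body + crc16(body).to_bytes(2, "little")


# Broadcast write of 0x01 to register 0x0309.
print(build_frame(BROADCAST_WRITE, 0x0309, b"\x01").hex())  # d00309010f74
```

Comparing frames built this way against the bytes captured on the scope is one way to confirm that what the MCU sends to the base device is well formed before suspecting the forwarding path.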

The base BQ79616 does communicate with the host MCU via UART without issue and the registers can be read from and written to as expected. As mentioned, neither a HW_RESET ping nor a POR results in any change to this behavior. We have tried both of these reset methods followed by a double wake ping after the HW_RESET period, as well as manually enabling the VIF through the COMM_CTRL debug registers, all without any change in behavior. Given that these devices are inaccessible, we are unable to physically probe any of the LDO or other electrical signals. 
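To separate "no reply at all" from "a reply that arrived damaged" when polling the base and stack devices, a check along these lines can help. This is only a sketch under stated assumptions: it assumes a response frame consists of an init byte whose low seven bits hold (data length - 1), a device address byte, a two-byte register address, the data, and a two-byte CRC sent low byte first, which is how the datasheet appears to describe it. `check_response` and its return strings are hypothetical names.

```python
def crc16(data: bytes) -> int:
    """CRC-16, polynomial 0x8005 (bit-reflected form 0xA001), initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def check_response(raw: bytes, data_len: int) -> str:
    """Classify the bytes received after a read request for data_len bytes."""
    if len(raw) == 0:
        return "timeout"          # nothing came back at all
    expected = data_len + 6       # init + address + 2 register bytes + data + 2 CRC bytes
    if len(raw) < expected:
        return "short"            # the reply started but did not complete
    frame = raw[:expected]
    # Running the CRC over the data plus its appended CRC leaves zero when intact.
    if crc16(frame) != 0:
        return "crc_error"
    if (frame[0] & 0x7F) + 1 != data_len:
        return "length_mismatch"  # valid frame, but not the size that was requested
    return "ok"
```

A consistent "timeout" on stack reads alongside "ok" on single-device reads of the base would match the behavior described here, whereas "short" or "crc_error" results would point toward a signal integrity problem on the daisy chain instead.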

At this point, the only physical condition we can think of that could have occurred is a temporary Vbat voltage below the 9V minimum on the base device. This would have been momentary, but it could have happened during the connection to the external power supply and possibly also during the device's internal startup routine. Is anyone aware of any other condition or state in which a device may become unresponsive and unrecoverable through various reset attempts, or in which a base device may become unable to communicate with stack devices despite functional UART communication with a host?

This is a scope image of a stack device on the bench receiving the HW_RESET tone and shutting down the LDOIN regulator. In bench testing, all devices respond to the HW_RESET tones, or to the ping in the case of the base device.

  • Hi,

    The most likely culprit is some issue with the hardware configuration. Can you send the schematic of the 616 devices so we can double-check the configuration for each of these devices?

    Best,

    Nancy

  • Here are the schematics for the Host and Module boards.

      

  • Hi,

    Thanks for sharing the schematics. It looks like U301 is the stack device because of the way RX is connected. Is this the case? If so, TX should be left floating for stack devices. Is IC1 for the isolation components the same as U301? Also, how is U604 being used? That is not quite clear to me.

    Best,

    Nancy

  • Correct, U301 is the stack device and U604 is the base device. We are using U604 in lieu of a BQ79600 due to availability at the time of the design and manufacture of the board. U604 is essentially acting as a UART interface.

  • Hi Nancy, I just wanted to follow up on this and ask what sort of condition you think could cause this behavior. I do see that the TX line is pulled to CVDD on the stack devices and is not floating. We will correct this in the future, but these boards have been operating normally for months without any issue, so I am wondering what may have created the issue if it is related to that resistor.

  • Hi,

    I am not sure why the device would be working and then stop working with this hardware configuration. In theory, this should not have worked from the start. Perhaps some physical issue with the PCB (think coupling, a bad solder joint, or thermal damage) impacted the system. Can you check two things?

    1. Can you perform a connectivity check on R320?

    2. Is it possible to depopulate R320 and observe the behavior? This should solve the current issue. 

    Best,

    Nancy

  • We are in the process of getting the hardware back to our facility to perform further diagnostics. Per the documentation, the Tx line is pulled to CVDD internally during sleep and active modes so I suspect the additional resistor to CVDD externally is not affecting the normal function of the device in those modes.

    While this could certainly be a hardware issue, it is odd that it would manifest only in this specific occurrence and that we cannot replicate it. Additionally, there are hundreds of systems with this exact hardware configuration that are currently operating without any issue.

    Is there anything that you could think of that could cause the stack device commH/commL to become disabled during a shutdown to active state transition? These stack devices would have been in a shutdown state because of a long communication timeout and then never responded again once woken up. 

  • 1) Both sides of R320 measure 4.91V to VC0 when in shutdown mode as does CVDD. Both sides of R320 measure 4.98V to VC0 when in active mode as does CVDD. 

    2) Can you elaborate at all on why you think this would manifest and persist at a seemingly uneventful time when the same PCBs have been functioning without issue for multiple years? Additionally, the base chip is communicating with the MCU without issue, so R320 is not preventing UART communications, but the stack chips are not communicating via UART anyway. I do not have the affected hardware back in hand yet to try this.

    Some additional questions:

    1) Would CVDD voltage on the Tx pin on a stack device prevent it from transitioning from shutdown to active?

    2) Is there a UART fault state that would disable the commH and commL transceivers?

  • Hi,

    Thanks for checking that. I am not sure why functioning boards would suddenly stop working. As for the CVDD voltage, the CVDD pin powers the communication blocks within the device, which could interfere with the communication functionality. As far as I know, there is not a UART fault state that would prevent transmitting/receiving. 

    Best,

    Nancy