TMS320F28034: Possible cause(s) of I2C bus hang-up

Murdock Taylor

Part Number: TMS320F28034

We have seen a problem in production units using the TMS320F28034 in which the I2C bus hangs with a slave holding SDA low. Resetting the TMS320F28034 does not clear the problem, but powering the whole board down and then back up does.

The I2C bus in this application has only 3 parts: the TMS320F28034 (master) and 2 slaves (an I2C EEPROM and an I2C accelerometer). Everything about the I2C bus is nominal: pull-up resistors (6.8k), bus capacitance (30-40pF), SCL clock speed (161kHz), supply voltage for master, slaves, and I2C (3.3V), rise times on SCL and SDA, I2C high (~3.3V), I2C low (~0V), etc.

We found some TI documentation (I2C Tips, http://processors.wiki.ti.com/index.php/I2C_Tips) indicating that if the I2C master is reset while a slave is sending data on the bus, the bus can hang. The TMS320F28034 reset pin (/XRS) has a 1k pull-up to 3.3V and a 0.1uF cap to GND. We were able to hang the I2C bus manually by triggering a reset in the middle of an I2C transfer in which the serial I2C EEPROM read out its whole memory. (If the reset occurs while the slave is sending a "0" and the clock goes away, the slave is left holding the SDA line low.)

The TMS320F28034 expects the supply voltage to be 3.3V +/- 10%, while the I2C slaves will operate with a supply voltage from 5.5V down to 1.9V. A glitch on the 3.3V supply could therefore cause the TMS320F28034 to do a POR/BOR reset while the I2C slaves continue to operate normally. The board layout does not provide the master with a way to do a power-on reset of the I2C slaves.

In trying to debug this problem and/or reproduce it, we have managed to manually trigger a reset of the TMS320F28034 while it was in the process of reading an I2C EEPROM memory dump and we caught the EEPROM in the middle of sending a "0". The result is the I2C bus is hung with the slave holding the SDA line down.

The only ways we can think of for this problem to occur "normally" would be a POR/BOR reset, a watchdog reset, an NMI reset (attempting to write to protected memory, etc.), or a memory leak or errant pointer writing over the I2C control registers.

Is there any way to operate the I2C master pathologically to cause it to hang up the I2C bus? If the I2C peripheral is disabled in the middle of an I2C read, could this mimic the microcontroller being reset and hang up the bus? (IRS bit in I2CMDR (I2C Mode Register) written with "0" while the I2C peripheral is reading data from slaves.) The I2C registers are not write protected so errant software could overwrite them.

A POR/BOR reset would involve glitches on the supply voltage inputs. This seems unlikely: the power looks clean, and a glitch would need to make the 3.3V regulator drop out far enough for the TMS320F28034 to do a POR/BOR but not far enough to power-on-reset the slaves, and it would have to occur in the middle of an I2C read while the slave is sending a "0". We were not able to trigger the reset with EMI (we used a variable-speed DC drill, and you could watch the brushes arc). A watchdog reset or NMI type of reset would seem to be symptomatic of errant software, such as a memory leak or errant pointer. None of these would seem to occur often enough to have a good chance of landing at the magic point needed to hang the I2C bus. If an errant pointer or memory leak managed to write over the I2C Mode Register and hang the I2C bus first, and only later caused a watchdog reset or NMI type of reset, the reset itself might not be the actual cause of the problem we are seeing but just a coincidence.

Any observations and suggestions would be welcome. Please let me know if these speculations are delusional or not. We are trying to determine if the source of the problem is in the HW or SW in order to focus our efforts.

over 6 years ago

0 Kevin Allen18 over 6 years ago

TI__Mastermind 41535 points

Hi Murdock,

From your description it sounds like a POR while reading from the EEPROM is possible. Some questions to help me better understand the situation below:

1. Are all of your devices on the same board (F28034 master and the two slave devices) and therefore all powered from the same source? Is it possible to test with only one slave on the I2C bus to see if it's specific to the EEPROM/Accelerometer?

2. Are you running your application code from flash? If so, do you have access to JTAG to debug your program within CCS? It would be helpful to check the status of the I2C related registers at different times, assuming a POR isn't happening and the debugger can actually stay connected.

3. Are you able to capture logic waveforms on the I2C bus or probe with an oscilloscope? The waveforms would be helpful for seeing what's happening on the bus before/after the SDA line is being held.

If a POR is indeed occurring regularly and holding the I2C bus (SDA line) as a result, it's probably better to narrow down the cause and fix it rather than countering it within software. However, there are some ways of recovering within software as mentioned in the I2C tips you linked.

1. For master devices that mux the SCL/SDA pins with GPIO, the easiest thing is to configure the pins for GPIO operation and toggle SCL until the slave releases SDA. At this point you should be able to resume normal operation.

2. Many master devices don't mux SCL/SDA with GPIO since the I2C I/O cells are often special open drain cells. A workaround has been reported to work even on these devices. By configuring the I2C for "free data format" and then reading a byte the I2C will immediately start sending clocks to input data (rather than trying to send an address). This can be used to free up the bus.

3. Some slave devices can reset their I2C interface when the bus is hanging (e.g. the LTC4151, after 33 ms). Switch it on if that is not the default behavior.

Option (3) might be a good practice to try first, seeing if setting the IRS bit frees up the I2C bus. Otherwise option (1) could be implemented within your application after a POR.
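Recovery option (1) can be sketched as follows. This is only a minimal, hardware-independent sketch: the `recov_pins_t` hooks and the `toy_*` slave model are hypothetical names introduced for illustration, not part of any TI driver. On a real F28034 the hooks would wrap the GPIO mux, direction, and data registers for the SCL/SDA pins, and the limit of 9 clocks is an assumption (one byte plus the acknowledge slot). Whether clocking alone frees a particular slave has to be confirmed on the bench.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pin hooks; on hardware they would wrap the GPIO registers. */
typedef struct {
    void (*scl_write)(void *ctx, bool level);  /* drive SCL high or low     */
    bool (*sda_read)(void *ctx);               /* sample SDA as an input    */
    void (*half_bit_delay)(void *ctx);         /* wait half an SCL period   */
    void *ctx;
} recov_pins_t;

/* Pulse SCL until the slave releases SDA.
 * Returns the number of pulses sent, or -1 if SDA is still low
 * after max_clocks pulses. */
int i2c_clock_until_released(const recov_pins_t *pins, int max_clocks)
{
    int pulses = 0;

    assert(pins != NULL);
    while (!pins->sda_read(pins->ctx)) {
        if (pulses == max_clocks) {
            return -1;                         /* give up, bus still hung   */
        }
        pins->scl_write(pins->ctx, false);
        pins->half_bit_delay(pins->ctx);
        pins->scl_write(pins->ctx, true);
        pins->half_bit_delay(pins->ctx);
        pulses++;
    }
    return pulses;
}

/* Toy slave model for host-side testing only: it holds SDA low until it
 * has seen bits_left falling edges on SCL. A real slave may behave
 * differently. */
typedef struct {
    bool scl;
    int  bits_left;
} toy_slave_t;

void toy_scl_write(void *ctx, bool level)
{
    toy_slave_t *s = (toy_slave_t *)ctx;
    if (s->scl && !level && s->bits_left > 0) {
        s->bits_left--;                        /* one bit shifted out       */
    }
    s->scl = level;
}

bool toy_sda_read(void *ctx)
{
    return ((toy_slave_t *)ctx)->bits_left == 0;
}

void toy_delay(void *ctx)
{
    (void)ctx;                                 /* no timing in the model    */
}
```

A caller would run this once after reset, before enabling the I2C peripheral, and treat a return value of -1 as a fault that needs a stronger recovery sequence or a power cycle.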

Hope this helps,
Kevin

0 Murdock Taylor over 6 years ago in reply to Kevin Allen18

Prodigy 30 points

Kevin

Thanks for responding.

The I2C bus is only 2-3 inches long and all on the same PCB. All of the devices on the I2C bus are powered by the same 3.3V LDO regulator. This board is already in production, and as a consequence the JTAG pins are not brought out on the PCB, so unfortunately there is no way to attach to JTAG. The board is normally enclosed in a metal box. We have a custom-designed CAN bus bootloader to allow the production units to be programmed following test via CAN. We can read registers and monitor some points on the board via this CAN connection, but it is similar to debugging with a UART connection in that it is not real time. The application runs out of on-chip flash and responds to CAN inquiries in the background. We have opened the metal enclosure on several units and probed various test points.

We have not seen any electrical noise on the input to the 3.3V LDO regulator or any electrical noise on the 3.3V signal, so we think that the POR/BOR reset is unlikely (although possible).

Resetting the Microcontroller in the Middle of I2C Communications Hung the I2C Bus

(SCL is in Yellow on the bottom and SDA is in Red on the top)

Reset of the Microcontroller Occurs on the Right and SCL is OK (high), but SDA is left asserted (low)

This scope photo shows how we were able to hang up the I2C bus by manually resetting the TMS320F28034 during an I2C read of a serial EEPROM.

We were unable to create a free form packet to send out and reset the I2C bus using the I2C peripheral. The I2C peripheral would see the I2C bus as busy and not transmit the reset packet. We ended up having to disable the I2C peripheral, turning the SCL and SDA pins to GPIO, setting them as inputs to confirm that SDA was low, then converting them to GPIO outputs and bit-banging a packet: [START] [1] [1] [1] [1] [1] [1] [1] [1] [NACK] [STOP]. This was found to reset the I2C bus. Just sending clock pulses did not do it. The following scope photo shows the hung I2C bus, the bit-banged I2C reset, I2C bus unhung, and I2C communications started again.

I2C Reset Routine Clearing a Hung I2C Bus (SDA low (red) & SCL high (yellow))

Note: The I2C routine uses GPIO pins to bit-bang so the SCL bit-bang frequency is a little slower (~100kHz) than the microcontroller’s SCL (~160kHz). The I2C Reset Routine changes the SDA and SCL pins from I2C function to GPIO’s and sends out the routine. GPIO outputs are not open drain (like for I2C) and you can see where the master is driving the SDA line while the slave is holding it low (see the small pulses). Eventually the slave releases the SDA line and it goes high. At the end it changes the GPIO’s to inputs and reads the bus (both signals high). At that point it then changes the GPIO pins back to the I2C peripheral functions SCL and SDA and enables the I2C peripheral again.
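The reset packet described above ([START], eight "1" bits, [NACK], [STOP]) could be sketched roughly like this. It is not the routine from the scope photos: all names (`i2c_bitbang_t`, `i2c_bus_reset_sequence`, the `toy_*` model) are hypothetical, and the sketch assumes open-drain emulation, where a line is either driven low or released to the pull-up, instead of push-pull GPIO outputs. That choice avoids the master fighting the slave on SDA, but whether it frees these particular slaves as reliably as the push-pull version is untested here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bus hooks. released == true means "make the pin an input
 * and let the pull-up raise the line"; false means "drive the pin low". */
typedef struct {
    void (*scl_set)(void *ctx, bool released);
    void (*sda_set)(void *ctx, bool released);
    bool (*scl_get)(void *ctx);
    bool (*sda_get)(void *ctx);
    void (*half_bit_delay)(void *ctx);
    void *ctx;
} i2c_bitbang_t;

/* One clock period: change SDA while SCL is low, then raise SCL. */
static void clock_bit(const i2c_bitbang_t *b, bool sda_released)
{
    b->scl_set(b->ctx, false);
    b->sda_set(b->ctx, sda_released);
    b->half_bit_delay(b->ctx);
    b->scl_set(b->ctx, true);
    b->half_bit_delay(b->ctx);
}

/* Sends [START] 0xFF [NACK] [STOP]; returns true if both lines read
 * high afterwards, i.e. the bus looks idle again. */
bool i2c_bus_reset_sequence(const i2c_bitbang_t *b)
{
    int i;

    assert(b != NULL);

    /* START: SDA goes low while SCL is high. */
    b->sda_set(b->ctx, true);
    b->scl_set(b->ctx, true);
    b->half_bit_delay(b->ctx);
    b->sda_set(b->ctx, false);
    b->half_bit_delay(b->ctx);

    /* Eight "1" data bits plus the NACK slot: SDA released for 9 clocks. */
    for (i = 0; i < 9; i++) {
        clock_bit(b, true);
    }

    /* STOP: SDA goes high while SCL is high. */
    b->scl_set(b->ctx, false);
    b->sda_set(b->ctx, false);
    b->half_bit_delay(b->ctx);
    b->scl_set(b->ctx, true);
    b->half_bit_delay(b->ctx);
    b->sda_set(b->ctx, true);
    b->half_bit_delay(b->ctx);

    return b->scl_get(b->ctx) && b->sda_get(b->ctx);
}

/* Toy wired-AND bus for host-side testing only: the slave holds SDA low
 * until it has seen slave_bits_left falling edges on SCL. */
typedef struct {
    bool scl;
    bool sda_master;
    int  slave_bits_left;
} toy_bus_t;

void toy_scl_set(void *ctx, bool released)
{
    toy_bus_t *t = (toy_bus_t *)ctx;
    if (t->scl && !released && t->slave_bits_left > 0) {
        t->slave_bits_left--;
    }
    t->scl = released;
}

void toy_sda_set(void *ctx, bool released)
{
    ((toy_bus_t *)ctx)->sda_master = released;
}

bool toy_scl_get(void *ctx)
{
    return ((toy_bus_t *)ctx)->scl;
}

bool toy_sda_get(void *ctx)
{
    toy_bus_t *t = (toy_bus_t *)ctx;
    return t->sda_master && t->slave_bits_left == 0;
}

void toy_wait(void *ctx)
{
    (void)ctx;
}
```

On the target, the call would sit between disabling the I2C peripheral and switching the pins back to their SCL/SDA functions, and a false return would be reported as a fault instead of re-enabling the peripheral.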

At this point, we do not see a likely HW cause of this problem. Granted, we do see how a POR/BOR while operating could cause this problem if it occurred at the magic point of an I2C slave sending a "0", but we have not been able to see or cause such a problem on a board operating within its design parameters.

Since the I2C control registers are not write protected and might be overwritten by errant SW, is it possible to operate the I2C peripheral in a manner to cause such a problem?

If you were in the middle of reading data on the I2C bus and the I2C peripheral was disabled by writing a "0" into the IRS bit in the I2CMDR (I2C Mode Register) would this cause the I2C peripheral to stop in the middle of an I2C transmission/reception or would the I2C peripheral complete what it was doing before it could be disabled?

We are trying to determine whether a SW error (such as a memory leak or errant pointer) that occurs irregularly might overwrite an I2C control register and cause the same problem as a reset. It turns out that the I2C software module used in the SW sets up the I2C control registers every time the routine is called (rather than only once on initialization). Disabling the I2C peripheral would hang the bus only if a slave were sending a "0" at that moment. If the SW goes further into the weeds, the microcontroller might be reset by an NMI type reset or by the internal watch-dog (expecting to be touched every 6-12ms). In that case the reset itself would not be what hangs the bus; overwriting the I2C control register at the right instant may have caused it. Because this is a production unit, the system does not record what type of reset occurred or how often resets occur. Unless the application is instrumented and monitored, periodic resets by the watch-dog may go unnoticed. Hanging the I2C bus, however, results in a system fault and would be noticed.
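One way to instrument the reset cause could look like the sketch below. It assumes the watchdog flag is bit 7 (WDFLAG) of the WDCR register, which should be checked against the F2803x system control guide. It also assumes the counters live in RAM that the C startup code does not initialize (hence the validity marker) and that they are reported over the existing CAN link. `reset_log_t`, `reset_log_update`, and `RESET_LOG_MAGIC` are hypothetical names. The register read, and the write that clears the flag afterwards, are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RESET_LOG_MAGIC 0xC0DEu  /* arbitrary marker: "counters are valid"   */
#define WDCR_WDFLAG     0x0080u  /* assumed: WDCR bit 7, set by a WD reset   */

typedef struct {
    uint16_t magic;              /* RESET_LOG_MAGIC once initialized         */
    uint16_t watchdog_resets;    /* resets with WDFLAG set at boot           */
    uint16_t other_resets;       /* everything else (XRS pin, POR/BOR, ...)  */
} reset_log_t;

/* Call once early in boot with the WDCR value read before it is modified. */
void reset_log_update(reset_log_t *log, uint16_t wdcr_at_boot)
{
    assert(log != NULL);

    /* After a power cycle the RAM holds garbage, so start from zero. */
    if (log->magic != RESET_LOG_MAGIC) {
        log->magic = RESET_LOG_MAGIC;
        log->watchdog_resets = 0u;
        log->other_resets = 0u;
    }

    if ((wdcr_at_boot & WDCR_WDFLAG) != 0u) {
        log->watchdog_resets++;
    } else {
        log->other_resets++;
    }
}
```

Comparing the watchdog count against the number of hung-bus faults seen in the factory would show whether the two are correlated.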

Murdock

0 Kevin Allen18 over 6 years ago in reply to Murdock Taylor

TI__Mastermind 41535 points

Hi Murdock,

It will be difficult to really know what's going on within the device's I2C module without having access to the register values. Maybe debugging over the CAN connection could be useful if you can periodically retrieve some register values (Like I2CSTR and I2CMDR).

The scope shot you provided is from your forced POR test, correct? Does the I2C hung case in your actual application look the same as what is shown, SCL stays high and is no longer driven by the master? Could be the I2C module is getting reset during the communication (I2CMDR.IRS being set to 0) since you said the registers are being touched each time the routine is being called.

1. Can you share everything that happens within your I2C routine and when in your application it's being called? What I2C registers are being set each time the routine is run, and to what values? Are they being set mid-communication? It would also help if you could provide your initial I2C init.

2. Probing the XRS pin while your application is running would give insight into whether or not watchdog resets are occurring (or if something else is driving it low).

When toggling SCL by itself on/off, you saw SDA stay low and it did not release? Only when you toggled both SDA/SCL it worked? Seems odd.

3. Can you share what accelerometer and EEPROM you are using? They handle up to fast-mode at least?

best,
Kevin

0 Kevin Allen18 over 6 years ago in reply to Kevin Allen18

TI__Mastermind 41535 points

Murdock,

I'd like to add some more info.

Resetting the I2C peripheral (setting I2CMDR.IRS to 0 and then 1) in between transfers will clear the BB bit, and the module will not know the state of the bus (i.e. whether it's busy or not) until a START or STOP condition is detected. I imagine the same response is seen when the I2C module is reset in the middle of receiving a byte, though I'd have to test to be certain. The byte you are receiving would likely be lost as well, which is worse.

There are some steps listed at the end of section 5.4 "I2C Status Register (I2CSTR)" in the F2803x I2C guide for re-discovering the state of the bus and correctly setting the BB bit, assuming the resetting of the I2C peripheral is occurring between transfers. There is a slight correction that needs to be made in a newer revision however, discussed in this prior E2E post: e2e.ti.com/.../696954

http://www.ti.com/lit/sprufz9

All of this being said, it's probably best practice not to reset the I2C module unless necessary. And if you do, it should be performed in-between transfers with the steps in the document implemented.
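A driver that rewrites I2CMDR on every call could guard the IRS bit along these lines. This is only a sketch under assumptions: the bit masks (IRS as bit 5 of I2CMDR, BB as bit 12 of I2CSTR) should be verified against the F2803x I2C guide, and `i2c_snapshot_t` and `i2c_mdr_guard` are hypothetical names. It does not implement the bus re-discovery steps from section 5.4 and does nothing against stray writes from an errant pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define I2CMDR_IRS 0x0020u  /* assumed: module enable bit, 0 = held in reset */
#define I2CSTR_BB  0x1000u  /* assumed: bus busy flag                        */

/* Register values sampled by the caller just before the write. */
typedef struct {
    uint16_t i2cmdr;
    uint16_t i2cstr;
} i2c_snapshot_t;

/* Returns the value that is safe to write to I2CMDR: the request is
 * passed through unchanged unless the module is enabled and a transfer
 * is in progress, in which case IRS is forced to stay set so the module
 * is not reset mid-byte. */
uint16_t i2c_mdr_guard(const i2c_snapshot_t *now, uint16_t requested_mdr)
{
    assert(now != NULL);

    if ((now->i2cmdr & I2CMDR_IRS) != 0u &&
        (now->i2cstr & I2CSTR_BB) != 0u) {
        return (uint16_t)(requested_mdr | I2CMDR_IRS);
    }
    return requested_mdr;
}
```

Because BB is only meaningful while the module has been tracking the bus, the guard trusts it only when IRS is already set in the snapshot.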

Best,
Kevin

0 Murdock Taylor over 6 years ago in reply to Kevin Allen18

Prodigy 30 points

Kevin:

Thanks for the info.

The slave I2C devices are a Microchip 24LC16BT-E/OT serial EEPROM and a Freescale MMA8452Q serial accelerometer. Both slaves can work up to an SCL of 400kHz and we do not believe either can or does clock stretching.

The problem where the I2C bus was hung up has only been seen when completed products are being tested by automated test equipment in the factory. The problem has not been seen when the boards themselves are tested. In the final equipment the board is sealed in a metal box and has several cables coming out -- so there is no access to any test points on the board.

The I2C routines used were developed by a 3rd party; they are polled (not interrupt driven) and operate in the background. The higher priority foreground tasks can run whenever they want, which makes it hard to check the various I2C control bits to tell whether a communication had problems. For example, if the NACK bit is set, telling whether it was caused by the master or the slave requires keeping track of the context of what you are doing, and this is apparently not done. The I2C routines found it easier to just clear all of the control bits every time an I2C communication is started rather than clear specific ones and look at the others, along with the context, to see if a problem occurred. The I2C routines do not appear to be very robust and do not do much error checking or error handling.

We have been able to confirm that changing the IRS bit to "0" in the I2CMDR (I2C Mode Register) to disable the I2C peripheral in the middle of a read from a slave where the bit being read is a "0" will hang up the I2C bus -- the slave will be left holding the SDA line low. When the I2C peripheral is reenabled, the I2C bus will look "busy".

The software has portions of "C" code that were automatically generated from LabView plus a number of low level drivers in "C" (many of which were written by 3rd parties) that actually play with TMS320F28034 peripherals and specific external devices. All of this is incorporated together using CCS. There is no RTOS, but a scheduler with interrupt driven higher priority tasks. Debugging at this point is complicated since the only way to communicate with a completed system is via the CAN bus and since the production version of the board does not have JTAG. The watch-dog is enabled and will trip in 13ms unless it is touched. The SW was tested prior to release, but before a major SW effort could be taken to look into this we needed to make sure it was not HW related.

We have been able to reproduce the symptom (hung I2C bus) pathologically by forcing a reset at the magic point (while reading from an I2C slave that is sending a "0"), but have not been able to trigger a reset during operation within the design parameters. Currently we do not see the problem as HW related.

Having a working SW routine to reset a hung I2C bus is at least a first step.

Thanks.

Murdock
