AM625: I2C controller becomes stuck with SCL low

Matthias Schiffer

Part Number: AM625

On our TQMa62xx SoM, we've observed the I2C controller becoming stuck under high system load (simultaneous stress test of CPU, RAM and various peripherals). We've noticed multiple symptoms of this state:

SCL is permanently pulled low by the AM62 I2C controller
A spurious call to omap_i2c_receive_data(). In the current kernel (ti-linux-kernel 09.00.00.009-rt) this causes a buffer overrun and usually a crash, as the function never checks buf_len before writing to the buffer. I assume the cause is a spurious RRDY interrupt, as that code path passes omap->threshold as num_bytes without checking the remaining buffer size. As part of our debugging, we've patched our kernel to detect this case and print a warning instead of crashing.
Besides BB, the XUDF and BF flags were set in I2C_STAT on at least one occurrence of the issue (the flags may not have been the same every time). I haven't been able to get a proper trace of how the register values changed when the issue occurred, so I'm not sure whether there is actually an underflow before the other symptoms become visible.
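
To illustrate the kind of buf_len check meant in the second symptom, here is a minimal standalone sketch. It is not the actual driver code or our patch: struct rx_state, bounded_receive() and counting_fifo() are hypothetical names, and the FIFO is modeled as a callback instead of a read of the data register. Draining and discarding surplus bytes is only one possible policy; a real fix might prefer to abort the transfer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-transfer receive state. */
struct rx_state {
	uint8_t *buf;     /* next free byte in the caller's buffer */
	size_t buf_len;   /* bytes still expected for this message */
};

/* Hypothetical FIFO accessor; a real driver would read the data register. */
typedef uint8_t (*read_fifo_fn)(void *ctx);

/*
 * Read num_bytes from the FIFO but store at most st->buf_len of them,
 * even if the caller asks for more (as a spurious RRDY that passes the
 * FIFO threshold as num_bytes would).
 * Returns the number of surplus bytes that were read and discarded.
 */
static size_t bounded_receive(struct rx_state *st, size_t num_bytes,
			      read_fifo_fn read_fifo, void *ctx)
{
	size_t dropped = 0;

	while (num_bytes--) {
		uint8_t byte = read_fifo(ctx);

		if (st->buf_len == 0) {
			/* No room left: count the byte instead of writing it. */
			dropped++;
			continue;
		}
		*st->buf++ = byte;
		st->buf_len--;
	}
	return dropped;
}

/* Demo FIFO that returns 0, 1, 2, ... on successive reads. */
static uint8_t counting_fifo(void *ctx)
{
	uint8_t *next = ctx;

	return (*next)++;
}
```

A nonzero return value is the point where a warning could be printed instead of overrunning the buffer.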

Once the controller is in this state, omap_i2c_wait_for_bb() will always time out and no further transfers are possible. As SCL is low, omap_i2c_recover_bus() fails with -EBUSY.

Recovery from the stuck state is possible if transfers are stopped for long enough for the controller to be suspended by runtime PM. I could also get it unstuck by manually triggering a Soft Reset of the controller using devmem.
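
For reference, the soft reset sequence can be sketched as below. This is a sketch under assumptions, not a definitive implementation: the bit positions (SRST in I2C_SYSC, RDONE in I2C_SYSS, I2C_EN in I2C_CON) and the ordering follow my reading of the reset path in the mainline i2c-omap driver and should be checked against the AM62x TRM. Register access goes through hypothetical callbacks (struct i2c_reg_ops) with a small mock behind them, so the example runs without touching hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed register set and bits; verify against the TRM before use. */
enum i2c_reg { REG_SYSC, REG_SYSS, REG_CON, REG_COUNT };
#define SYSC_SOFTRESET (1u << 1)
#define SYSS_RESETDONE (1u << 0)
#define CON_EN         (1u << 15)

/* Hypothetical register accessors (MMIO in a driver, a mock here). */
struct i2c_reg_ops {
	uint16_t (*read)(void *ctx, enum i2c_reg reg);
	void (*write)(void *ctx, enum i2c_reg reg, uint16_t val);
};

/* Returns true once RESETDONE is seen, false after max_polls failed reads. */
static bool i2c_soft_reset(const struct i2c_reg_ops *ops, void *ctx,
			   unsigned int max_polls)
{
	uint16_t sysc = ops->read(ctx, REG_SYSC);   /* saved for restore */
	uint16_t con = ops->read(ctx, REG_CON);

	/* Disable the controller, request the reset, then enable it again. */
	ops->write(ctx, REG_CON, (uint16_t)(con & ~CON_EN));
	ops->write(ctx, REG_SYSC, SYSC_SOFTRESET);
	ops->write(ctx, REG_CON, CON_EN);

	/* A real implementation would sleep between polls. */
	while (!(ops->read(ctx, REG_SYSS) & SYSS_RESETDONE)) {
		if (max_polls-- == 0)
			return false;
	}

	ops->write(ctx, REG_SYSC, sysc);            /* restore saved config */
	return true;
}

/* Mock: RESETDONE appears after polls_left SYSS reads while CON_EN is set. */
struct mock_i2c {
	uint16_t regs[REG_COUNT];
	unsigned int polls_left;
};

static uint16_t mock_read(void *ctx, enum i2c_reg reg)
{
	struct mock_i2c *m = ctx;

	if (reg == REG_SYSS && (m->regs[REG_CON] & CON_EN)) {
		if (m->polls_left == 0)
			return SYSS_RESETDONE;
		m->polls_left--;
		return 0;
	}
	return m->regs[reg];
}

static void mock_write(void *ctx, enum i2c_reg reg, uint16_t val)
{
	struct mock_i2c *m = ctx;

	m->regs[reg] = val;
}
```

The mock does not model the reset clearing the other registers; it only exercises the ordering and the timeout.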

Some of the issues should be easy to solve in software, by resetting the controller when recovery is impossible, and improving handling of buf_len (synchronization between omap_i2c_xfer_msg() and omap_i2c_xfer_data() also seems questionable - I suspect the barrier() in omap_i2c_xfer_msg() might be too weak if the IRQ is handled on a different core).
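
To make the ordering concern concrete: barrier() only constrains the compiler, so it does not by itself order stores as seen from another core. The sketch below is a userspace analogy using C11 atomics, not kernel code; struct xfer, xfer_publish() and xfer_consume() are hypothetical names. In the kernel, the corresponding primitives would be smp_store_release() and smp_load_acquire() (or an smp_wmb()/smp_rmb() pair). Whether the driver actually needs this is exactly the open question above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transfer descriptor shared between submitter and handler. */
struct xfer {
	uint8_t *buf;
	size_t buf_len;
	atomic_bool armed;   /* set last, with release semantics */
};

/* Submitter side: fill in the descriptor, then publish it. */
static void xfer_publish(struct xfer *x, uint8_t *buf, size_t len)
{
	x->buf = buf;
	x->buf_len = len;
	/* Release: the two stores above are visible before armed is. */
	atomic_store_explicit(&x->armed, true, memory_order_release);
}

/*
 * Handler side: store one received byte.
 * Returns 1 if the byte was stored, 0 if no transfer is armed or the
 * buffer is already full.
 */
static size_t xfer_consume(struct xfer *x, uint8_t byte)
{
	/* Acquire: pairs with the release store in xfer_publish(). */
	if (!atomic_load_explicit(&x->armed, memory_order_acquire))
		return 0;
	if (x->buf_len == 0)
		return 0;

	*x->buf++ = byte;
	x->buf_len--;
	return 1;
}
```

The handler never touches buf or buf_len before it has observed armed, so it cannot act on a half-written descriptor.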

I'm concerned about the spurious omap_i2c_receive_data() call however, as it might also lead to data corruption. Is this caused by the hardware misbehaving, or is there a different explanation?

Our Device Tree configuration:

&main_i2c0 {
	pinctrl-names = "default";
	pinctrl-0 = <&main_i2c0_pins>;
	clock-frequency = <400000>;
	status = "okay";
	
	/* Not shown: several devices including PMIC, EEPROMs, RTC, ... */
};

Regards,
Matthias

over 1 year ago

Nick Saulnier over 1 year ago

Hello Matthias,

Is this a stress case that you are able to replicate on a TI EVM? That could help us to replicate your observations.

Regards,

Nick
