Linux: I2C crashes under heavy load, Linux 3.14.57-ti-r78



Tool/software: Linux

Hello, I'm having a problem with an AM335x SoC (BeagleBone Black) when doing high-frequency I2C reads. I am using Ubuntu Linux with kernel 3.14.57-ti-r78. I have an MPU6050 accelerometer/gyroscope connected to the i2c-1 bus and read it with the in-kernel driver (interrupt-driven). It works pretty well, but when I set the sampling frequency high enough, I start getting the following errors:

Mar  7 13:28:16 DL-EM01 kernel: [  362.893943] omap_i2c 4802a000.i2c: SDA is stuck low, driving 9 pulses on SCL
Mar  7 13:28:19 DL-EM01 kernel: [  365.784033] omap_i2c 4802a000.i2c: SDA is stuck low, driving 9 pulses on SCL
Mar  7 13:28:21 DL-EM01 kernel: [  368.651978] omap_i2c 4802a000.i2c: controller timed out
Mar  7 13:28:23 DL-EM01 kernel: [  369.674012] omap_i2c 4802a000.i2c: SDA is stuck low, driving 9 pulses on SCL
Mar  7 13:28:26 DL-EM01 kernel: [  372.554164] omap_i2c 4802a000.i2c: SDA is stuck low, driving 9 pulses on SCL
Then the BeagleBone usually freezes. I have set the I2C bus to 400 kHz and made sure there is a decent amount of idle time on the bus, so could the frequent interrupts be causing the problem?
I've tried forcing the CPU frequency with cpufreq-set -f 1GHz, but that only seems to make matters worse. Oddly enough, forcing it to 300MHz works better, though the crashes still happen within a few hours.
Any help would be appreciated.
Thank you

  • Hi Eric,

    I am not very familiar with this kernel release (kernel 3.14.57-ti-r78). From the log, I see that you're experiencing the situation described in Section 3.1.16, "Bus clear", of the I2C-bus specification (cache.nxp.com/.../UM10204.pdf).

    Can you cross-check the I2C driver against the one in the latest TI SDK (kernel 4.4.32):
    www.ti.com/.../PROCESSOR-SDK-AM335X
    Or with the latest mainline i2c-omap.c driver (from kernel.org)?

    Best Regards,
    Yordan
  • So, for fun, I tried patching the mainline i2c-omap.c into my kernel version. Now I get this in dmesg:

    [ 174.950636] omap_i2c 4802a000.i2c: Arbitration lost
    [ 175.965322] Unable to handle kernel NULL pointer dereference at virtual address 0000003c
    [ 175.965416] pgd = c0004000
    [ 175.965446] [0000003c] *pgd=00000000
    [ 175.965500] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
    [ 175.965531] Modules linked in: usb_f_acm u_serial usb_f_ecm g_multi usb_f_mass_storage usb_f_rndis u_ether libcomposite xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo cfg80211 bnep bluetooth 6lowpan_iphc binfmt_misc pruss_remoteproc wlcore_sdio ti_am335x_adc uio_pdrv_genirq uio inv_mpu6050 industrialio_triggered_buffer kfifo_buf industrialio
    ...

    My guess is that the recovery routine tries (and fails) to get a pointer, which causes the crash; probably a struct type that is not compatible with the old kernel.

    But why does it need recovery in the first place? What would cause it to crash like that?
  • OK, I fixed the kernel panic and the recovery works now. It seems to be stable at a 500 Hz sample rate; I will continue testing...

  • Thanks for updating the thread.

    Best Regards,
    Yordan
  • No problem! I'm still encountering the crashes, though. The recovery works better than before, but not always, and it eventually runs into problems. The error messages are pretty much the same as before: sometimes it recovers and I can continue reading data from the MPU6050, sometimes it recovers but the MPU6050 stops working, and, rarely, the system crashes completely.

    I think the BeagleBone Black simply can't handle so many hardware interrupts. I guess I'm stuck poking the MPU6050 from userspace for now.

  • Okay, so I had an idea: what if I let data accumulate in the MPU6050's FIFO before reading, to reduce the amount of work done in ISRs? Now it only performs a read every 40 interrupts, which has greatly reduced CPU load, and I can clearly see the read intervals on my oscilloscope with a long pause in between (reads happen around 14 times per second).

    I was all excited, but I still sometimes get an "Arbitration lost" message, after which the ISR for the I2C module consumes 100% CPU and the system hangs. I noticed this comment in the driver code (i2c-omap.c):

    /*
     * REVISIT: We should abort the transfer on signals, but the bus goes
     * into arbitration and we're currently unable to recover from it.
     */

    I'm still clueless as to what is causing the arbitration loss. Maybe a bug in the MPU6050 itself? It wouldn't be such a big deal if the driver could recover from it, but it looks like I'm SOL here :(

    It almost seems like a signal-integrity issue, because I occasionally get a NACK from the MPU6050 too, but I've completely isolated the MPU6050 on the bus and still get the errors...

  • So, I dug through the i2c-omap driver code a bit and realized that, while it calls omap_i2c_reset() on a timeout, it simply reports the error on an arbitration loss. I made it call the reset function on arbitration loss as well, and it seems to land on its feet. It's been running for 5 days now without a single crash, and I can still read I2C data. It's a hell of a party in the log, though, especially since I added my own debug messages:

    Mar 21 09:46:29 DL-EM01 kernel: [499343.068071] omap_i2c 4802a000.i2c: Arbitration lost
    Mar 21 09:46:29 DL-EM01 kernel: [499343.068149] Bus recovery done!
    Mar 21 09:46:29 DL-EM01 kernel: [499343.187437] Bus is stuck busy, recovering
    Mar 21 09:46:29 DL-EM01 kernel: [499343.187484] Bus recovery done!
    Mar 21 09:46:29 DL-EM01 kernel: [499343.187504] Error during data reception! -11Did not read expected number of bytes from FIFO, flushing
    Mar 21 09:46:29 DL-EM01 kernel: [499343.307432] Bus is stuck busy, recovering
    Mar 21 09:46:29 DL-EM01 kernel: [499343.307462] Bus recovery done!
    Mar 21 09:46:29 DL-EM01 kernel: [499343.407467] omap_i2c 4802a000.i2c: controller timed out
    Mar 21 09:46:29 DL-EM01 kernel: [499343.407518] inv-mpu6050 1-0069: int_enable failed -110
    Mar 21 09:46:40 DL-EM01 kernel: [499354.349696] omap_i2c 4802a000.i2c: Arbitration lost
    Mar 21 09:46:40 DL-EM01 kernel: [499354.349773] Bus recovery done!
    Mar 21 09:46:40 DL-EM01 kernel: [499354.349789] Error during data reception! -11Did not read expected number of bytes from FIFO, flushing
    Mar 21 09:46:41 DL-EM01 kernel: [499354.627969] MPU6050 FIFO reset success!
    Mar 21 09:47:14 DL-EM01 kernel: [499387.968782] Got error -121 during address transmit, retrying....
    Mar 21 09:48:04 DL-EM01 kernel: [499437.710552] omap_i2c 4802a000.i2c: Arbitration lost
    Mar 21 09:48:04 DL-EM01 kernel: [499437.710719] Bus recovery done!
    Mar 21 09:48:04 DL-EM01 kernel: [499437.710735] Error during data reception! -11Did not read expected number of bytes from FIFO, flushing
    Mar 21 09:48:04 DL-EM01 kernel: [499437.987913] MPU6050 FIFO reset success!
    Mar 21 09:49:25 DL-EM01 kernel: [499519.319412] Error during data reception! -121Did not read expected number of bytes from FIFO, flushing
    Mar 21 09:49:26 DL-EM01 kernel: [499519.577962] MPU6050 FIFO reset success!
    Mar 21 09:49:34 DL-EM01 kernel: [499528.209595] Error during data reception! -121Did not read expected number of bytes from FIFO, flushing
    Mar 21 09:49:35 DL-EM01 kernel: [499528.467959] MPU6050 FIFO reset success!
    Mar 21 09:49:53 DL-EM01 kernel: [499547.261610] Error during data reception! -121Did not read expected number of bytes from FIFO, flushing
    Mar 21 09:49:54 DL-EM01 kernel: [499547.517967] MPU6050 FIFO reset success!
    Mar 21 09:50:08 DL-EM01 avahi-daemon[594]: Received response from host 192.168.1.81 with invalid source port 62466 on interface 'eth0.0'
    Mar 21 09:50:20 DL-EM01 kernel: [499574.001976] Got error -121 during address transmit, retrying....
    Mar 21 09:50:46 DL-EM01 kernel: [499600.236504] Error during data reception! -121Did not read expected number of bytes from FIFO, flushing
    Mar 21 09:50:47 DL-EM01 kernel: [499600.487982] MPU6050 FIFO reset success!

    I'd really love to know where all these errors are coming from. It almost seems like there's a signal issue with the I2C bus itself, but connecting the MPU6050 directly to the BBB didn't help. Any AM335x experts out there want to chime in?

    Thank you

  • Hello Eric,

    I wanted to inform you that we may be experiencing the same issue.

    We are running a 4.4 bone-rt kernel with three I2C slaves attached to i2c1. When we short SDA or SCL to ground, we randomly get kernel freezes that sometimes resolve and sometimes do not.

  • Hello Eric.

    During your tests, did you try to reproduce the same behaviour on the I2C-2 bus?
    We are experiencing similar issues, but only on the I2C-1 bus.

    Regards.
  • Hello,

    I believe we only tested on the I2C-1 bus, the pins for I2C-2 were being used for something else IIRC.

    -Eric