AM62A7: I2C Controller lock-up due to the Void Message on I2C bus

Part Number: AM62A7

Hi, 

SDK: 09.01.00

We are using the AM62A7 processor with SDK 09.01.00. We have integrated the AR0235 camera sensor on the I2C-2 bus. Occasionally (1 or 2 times out of 100) we see the I2C timeout error shown below.

[    2.975722] omap_i2c 20020000.i2c: controller timed out
[    2.975747] ar0235 2-0036: failed to read chip id 1850

We captured Saleae logs to identify the issue and found what appears to be a void message on the I2C bus. The void message condition is described in the AM62A7 TRM.

In the Saleae logs of the I2C-2 bus (where the image sensor is connected), we found a waveform similar to the void message condition.

I am attaching the Saleae logs for your reference. The void message condition appears in the last waveform, at the 396th second, which is when the I2C timeout error occurred.

4188.logs.zip

Please suggest how we can avoid the void message on the I2C line.

Regards,

Jay

  • Hello Jay,

    I'll be working with you from the software side.

    For future readers, the hardware side of the discussion is over here. The latest I have asked for is a test to verify that the SDA is actually getting pulled low by the AM62Ax, and not by anything else on the I2C bus: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1446977/am62a7-i2c-bus-timeout-issue/5561441#5561441

    What could be going on from a software side?

    I do not have an immediate answer for you. So this response will be a bit long as I work through my thought process. I'll provide a summary of actions at the end.

    We'll be talking about the driver
    drivers/i2c/busses/i2c-omap.c

    I assume you are using interrupts with the I2C driver, right? If so, I would expect polling = false, as per omap_i2c_xfer_irq:

    static int
    omap_i2c_xfer_irq(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
    {
            return omap_i2c_xfer_common(adap, msgs, num, false);
    }
    
    static int
    omap_i2c_xfer_polling(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
    {
            return omap_i2c_xfer_common(adap, msgs, num, true);
    }
    

    So that means the "controller timed out" error is from here in the driver:

    /*
     * Low level master read/write transaction.
     */
    static int omap_i2c_xfer_msg(struct i2c_adapter *adap,
                                 struct i2c_msg *msg, int stop, bool polling)
    {
    ...
            /*
             * REVISIT: We should abort the transfer on signals, but the bus goes
             * into arbitration and we're currently unable to recover from it.
             */
            if (!polling) {
                    timeout = wait_for_completion_timeout(&omap->cmd_complete,
                                                          OMAP_I2C_TIMEOUT);
            }
            ...
            if (timeout == 0) {
                    dev_err(omap->dev, "controller timed out\n");
                    omap_i2c_reset(omap);
                    __omap_i2c_init(omap);
                    return -ETIMEDOUT;
            }
    

    timeout == 0 means that the wait expired before the IRQ thread got a result other than -EAGAIN from omap_i2c_xfer_data and made the omap_i2c_complete_cmd function call:

    static irqreturn_t
    omap_i2c_isr_thread(int this_irq, void *dev_id)
    {
            int ret;
            struct omap_i2c_dev *omap = dev_id;
    
            ret = omap_i2c_xfer_data(omap);
            if (ret != -EAGAIN)
                    omap_i2c_complete_cmd(omap, ret);
    
            return IRQ_HANDLED;
    }
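
    As a rough illustration of that completion logic, here is a user-space model (not kernel code). The names model_isr_completes and model_xfer_msg are hypothetical, and the real driver does more after the wait (error handling, reset). The sketch only assumes what the excerpts above show: the command completes when omap_i2c_xfer_data returns something other than -EAGAIN, and otherwise the waiter gives up with -ETIMEDOUT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model, not driver code.
 * rets[i] stands for what omap_i2c_xfer_data() returned on the i-th
 * interrupt that fired before the timeout expired.
 */
static bool model_isr_completes(const int *rets, size_t n_irqs, int *result)
{
    assert(result != NULL);
    assert(rets != NULL || n_irqs == 0);

    for (size_t i = 0; i < n_irqs; i++) {
        /* The ISR thread only completes the command when ret != -EAGAIN. */
        if (rets[i] != -EAGAIN) {
            *result = rets[i];
            return true;
        }
    }
    return false; /* no interrupt at all, or every pass returned -EAGAIN */
}

/* Models the waiter: no completion before the deadline means -ETIMEDOUT. */
static int model_xfer_msg(const int *rets, size_t n_irqs)
{
    int result = 0;

    if (!model_isr_completes(rets, n_irqs, &result))
        return -ETIMEDOUT; /* the "controller timed out" path */
    return result;
}
```

    Both "no interrupt ever fired" and "every interrupt pass returned -EAGAIN" end in the same -ETIMEDOUT, which is why the error message alone cannot tell the two cases apart.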
    

    What is happening where?

    Let's refer to the TRM I2C Low-Level Programming Models:

    Configure Target Address and the Data Control Register

    In controller mode, configure the target address register by programming the I2C_SA[9-0] SA bit field and
    the number of data bytes (I2C data payload) associated with the transfer by programming the I2C_CNT[15-0]
    DCOUNT bit field.

    That happens in omap_i2c_xfer_msg:

    omap_i2c_write_reg(omap, OMAP_I2C_SA_REG, msg->addr);
    /* make sure writes to omap->buf_len are ordered */
    barrier();
    omap_i2c_write_reg(omap, OMAP_I2C_CNT_REG, omap->buf_len);
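
    To make the field widths from the TRM text concrete (SA is 10 bits, DCOUNT is 16 bits), here is a small user-space sketch. The struct and function names are made up for illustration and are not the real register map; the 0x36 address mentioned afterwards comes from the "ar0235 2-0036" log line in the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SA_MASK     0x03FFu  /* I2C_SA[9:0], per the TRM text above */
#define MODEL_DCOUNT_MAX  0xFFFFu  /* I2C_CNT[15:0] DCOUNT */

/* Hypothetical stand-in for the two registers; not the real register map. */
struct model_i2c_regs {
    uint16_t sa;
    uint16_t cnt;
};

/* Returns 0 on success, -1 if a value does not fit its bit field. */
static int model_config_target(struct model_i2c_regs *regs,
                               uint16_t addr, size_t len)
{
    assert(regs != NULL);

    if (addr & ~MODEL_SA_MASK)
        return -1;
    if (len > MODEL_DCOUNT_MAX)
        return -1;

    regs->sa = addr;            /* target address first ... */
    regs->cnt = (uint16_t)len;  /* ... then the payload byte count */
    return 0;
}
```

    For example, model_config_target(&regs, 0x36, 2) would describe a 2-byte transfer to the sensor address seen in the log.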
    

    I assume we are initiating a transfer.

    Initiate a Transfer

    Poll the I2C_IRQSTATUS_RAW[12] BB bit. If it is cleared to 0 (bus not busy), configure the I2C_CON[0] STT
    and I2C_CON[1] STP bits. To initiate a transfer, the I2C_CON[0] STT bit must be set to 1, and it is not
    mandatory to set the I2C_CON[1] STP bit to 1.

    This also happens in omap_i2c_xfer_msg:

            w = OMAP_I2C_CON_EN | OMAP_I2C_CON_MST | OMAP_I2C_CON_STT;
            ...
            /* other configuration here */
            ...
            /*
             * NOTE: STAT_BB bit could became 1 here if another master occupy
             * the bus. IP successfully complete transfer when the bus will be
             * free again (BB reset to 0).
             */
            omap_i2c_write_reg(omap, OMAP_I2C_CON_REG, w);
    

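    I don't see any polling of the BB bit in omap_i2c_xfer_msg right before this write. As far as I can tell, the bus-busy wait happens earlier, in omap_i2c_xfer_common (omap_i2c_wait_for_bb), so please check that in your kernel tree. I'm not sure whether the gap between that check and this write could cause issues in a multi-master setup, but I assume that is fine in your use case.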

    Transmit Data

    Poll the I2C_IRQSTATUS_RAW[4] XRDY bit, or use the XRDY interrupt (the I2C_IRQENABLE_SET[4]
    XRDY_IE bit must be set to 1) to write data to the I2C_DATA register.
    ...
    interrupt subroutine sequence

    This is the part where we finally circle back to where your error is coming from. omap_i2c_xfer_msg starts the countdown. If the ISR has not completed the transmit within a second, it times out.

            /*
             * REVISIT: We should abort the transfer on signals, but the bus goes
             * into arbitration and we're currently unable to recover from it.
             */
            if (!polling) {
                    timeout = wait_for_completion_timeout(&omap->cmd_complete,
                                                          OMAP_I2C_TIMEOUT);
            } 

    Where OMAP_I2C_TIMEOUT is defined here:

    i2c-omap.c:#define OMAP_I2C_TIMEOUT (msecs_to_jiffies(1000))
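
    How many jiffies that is depends on the kernel's HZ setting, but it always works out to about one second. Below is a simplified user-space model of the conversion (round up to whole ticks). The function name is hypothetical and the HZ values in the usage note are only examples; the real msecs_to_jiffies also handles overflow and other corner cases.

```c
#include <assert.h>

/* Simplified model: milliseconds to timer ticks, rounded up. */
static unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
    assert(hz > 0);
    return (unsigned long)(((unsigned long long)ms * hz + 999ULL) / 1000ULL);
}
```

    With an assumed HZ of 250, model_msecs_to_jiffies(1000, 250) gives 250 ticks, and with HZ of 100 it gives 100 ticks: one second of wall time in both cases.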

    Meanwhile, we use omap_i2c_isr_thread to grab the XRDY interrupt, and omap_i2c_xfer_data to go through the interrupt subroutine sequence.

    So I would expect us to time out either if the ISR is never called, or if omap_i2c_xfer_data just spins indefinitely.

  • Next debug steps

    It looks like omap_i2c_xfer_data keeps looping for as long as stat (the status bits masked by the enabled interrupts) is nonzero:

                    bits = omap_i2c_read_reg(omap, OMAP_I2C_IE_REG);
                    stat = omap_i2c_read_reg(omap, OMAP_I2C_STAT_REG);
                    stat &= bits;
    

    Are you comfortable adding print statements to the kernel driver and recompiling? I would be curious to see the value of stat. You could add another dev_err right below the "controller timed out\n" line with that information.
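
    Once you have a raw stat value from the log, a small user-space helper like the one below could turn it into bit names. This is a sketch under assumptions: the bit positions are as I recall them from the OMAP_I2C_STAT_* defines in i2c-omap.c (only BB at bit 12 and XRDY at bit 4 are confirmed by the TRM text above), so please verify them against your kernel tree. The dev_err line in the comment is likewise an untested suggestion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Possible extra print in the driver, placed right after the
 * "controller timed out" message and before the reset (untested sketch):
 *
 *   dev_err(omap->dev, "stat=0x%04x ie=0x%04x\n",
 *           omap_i2c_read_reg(omap, OMAP_I2C_STAT_REG),
 *           omap_i2c_read_reg(omap, OMAP_I2C_IE_REG));
 */

struct stat_bit {
    uint16_t mask;
    const char *name;
};

/* Assumed bit layout; check against OMAP_I2C_STAT_* in your tree. */
static const struct stat_bit stat_bits[] = {
    { 1u << 14, "XDR"  }, { 1u << 13, "RDR"  }, { 1u << 12, "BB"   },
    { 1u << 11, "ROVR" }, { 1u << 10, "XUDF" }, { 1u << 9,  "AAS"  },
    { 1u << 8,  "BF"   }, { 1u << 4,  "XRDY" }, { 1u << 3,  "RRDY" },
    { 1u << 2,  "ARDY" }, { 1u << 1,  "NACK" }, { 1u << 0,  "AL"   },
};

/* Writes names of set bits, '|'-separated, into out; returns the length. */
static size_t decode_stat(uint16_t stat, char *out, size_t out_len)
{
    size_t used = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof(stat_bits) / sizeof(stat_bits[0]); i++) {
        const char *name = stat_bits[i].name;
        size_t name_len = strlen(name);
        size_t sep = used ? 1 : 0;

        if (!(stat & stat_bits[i].mask))
            continue;
        if (used + sep + name_len + 1 > out_len)
            break;                              /* stop rather than overflow */
        if (sep)
            out[used++] = '|';
        memcpy(out + used, name, name_len + 1); /* copies the NUL too */
        used += name_len;
    }
    return used;
}
```

    For instance, a logged value of 0x1010 would decode to "BB|XRDY" under this assumed layout: the bus is busy and the controller is waiting for transmit data.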

    It's bad practice to leave print statements inside ISRs in production code, but as far as I know the kernel build does not stop you from adding one for debugging. So you could also try adding a print statement inside the ISR to see whether it is ever reached.

    Regards,

    Nick