AM623: I2C1_SCL stuck low

Part Number: AM623
Other Parts Discussed in Thread: TLV320AIC3101

Test method:

To test whether a failed device on the I2C bus can hang the system, I created a test scenario that alternately accesses one existing device and one non-existent device.

#1. In the normal case, the waveform is as below (a data stage follows the slave address and ACK):

#2. The failing waveform is as below. After the address is sent to the existing device, the bus is held after the address and ACK stage. A data stage should follow, as in the normal access above.

#3. It reproduces reliably, taking anywhere from 1 hour to 14 hours.

#4. Captured the failure 4 times. All occurred while accessing the existing device, after the address stage, as in the waveform captured above.

#5. Tested with accesses to 2 existing devices; the failure has not reproduced so far.
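
For readers who want to reproduce the test method, here is a minimal user-space sketch built on the Linux i2c-dev interface. It is not the test code used in this thread. The names `probe_result`, `classify_errno` and `probe_addr` are hypothetical, and the errno mapping is an assumption: NACK is treated as `ENXIO` or `EREMOTEIO`, and a busy bus or controller timeout as `ETIMEDOUT`. Check the mapping against the adapter driver in use.

```c
#include <errno.h>
#include <linux/i2c-dev.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of any TI or kernel API. */
enum probe_result { PROBE_ACK, PROBE_NACK, PROBE_TIMEOUT, PROBE_OTHER };

/* Map the errno of a failed transfer to a coarse category (assumed mapping). */
static enum probe_result classify_errno(int err)
{
    switch (err) {
    case ENXIO:
    case EREMOTEIO:
        return PROBE_NACK;     /* no device answered: expected for the absent address */
    case ETIMEDOUT:
        return PROBE_TIMEOUT;  /* typically bus busy or controller timeout */
    default:
        return PROBE_OTHER;
    }
}

/* Try to read one byte from addr on an already opened /dev/i2c-N descriptor. */
static enum probe_result probe_addr(int fd, int addr)
{
    unsigned char byte;

    /* I2C_SLAVE fails with EBUSY if a kernel driver already owns the address. */
    if (ioctl(fd, I2C_SLAVE, addr) < 0)
        return classify_errno(errno);
    if (read(fd, &byte, 1) == 1)
        return PROBE_ACK;
    return classify_errno(errno);
}
```

A test loop would open the bus device node with `open()` and `O_RDWR`, call `probe_addr()` alternately with the address of the present device and an unused address, and stop on the first `PROBE_TIMEOUT` so the bus state can be captured on a scope.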

Analysis:

#1. Replaced the series resistor with a wire, then disconnected the wire after reproducing the failure so that each side could be probed separately. SCL is low on the AM62x side and high on the slave side, which at least proves that SCL is not being held by the slave.

#2. After the I2C bus hangs, the I2C1_SYSTEST register reads 0x60; software detects BUSY and returns.

#3. Writing I2C_SYSC[SRST] recovers the bus, but the failure then reproduces more easily.
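
To make the 0x60 reading in #2 easier to interpret, here is a small decoding sketch. The bit positions are an assumption taken from the `OMAP_I2C_SYSTEST_*_FUNC` definitions in the mainline i2c-omap.c driver (SCL_I_FUNC bit 8, SCL_O_FUNC bit 7, SDA_I_FUNC bit 6, SDA_O_FUNC bit 5) and should be confirmed against the AM62x TRM. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Functional-mode line state bits (assumed layout, see the note above). */
#define SYSTEST_SCL_I_FUNC (1u << 8)  /* level sampled on the SCL pin */
#define SYSTEST_SCL_O_FUNC (1u << 7)  /* level the controller drives on SCL */
#define SYSTEST_SDA_I_FUNC (1u << 6)  /* level sampled on the SDA pin */
#define SYSTEST_SDA_O_FUNC (1u << 5)  /* level the controller drives on SDA */

struct systest_lines {
    bool scl_in, scl_out, sda_in, sda_out;  /* true = high/released */
};

static struct systest_lines decode_systest(uint32_t v)
{
    struct systest_lines l = {
        .scl_in  = (v & SYSTEST_SCL_I_FUNC) != 0,
        .scl_out = (v & SYSTEST_SCL_O_FUNC) != 0,
        .sda_in  = (v & SYSTEST_SDA_I_FUNC) != 0,
        .sda_out = (v & SYSTEST_SDA_O_FUNC) != 0,
    };
    return l;
}

/* The controller itself pulls SCL low when its SCL output bit reads 0. */
static bool scl_held_by_controller(uint32_t v)
{
    return (v & SYSTEST_SCL_O_FUNC) == 0;
}
```

Under that assumed layout, 0x60 decodes to SDA input and output both high, and SCL input and output both low. That means the controller itself is driving SCL low, which is consistent with the probing result in #1.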

  • updated the original description

  • Hello Tony,

    Thank you.

    Not sure if there is a question here.

    It seems to me that you have summarized your findings, and they look good.

    Is my understanding correct?

    Regards,

    Sreenivasa

  • I just added more background. The issue is why SCL is stuck low, and it still needs analysis by a BU expert.

    I am not saying that accessing a non-existent device is the cause. That access returns NACK and won't cause a hang.

  • Hello Tony,

    Thank you.

    Regards,

    Sreenivasa

  • Just a reminder: this is not resolved and needs deep analysis.

  • Hello Tony,

    Thank you.

    Let me review the inputs.

    Regards,

    Sreenivasa

  • Hello Tony,

    Can you please confirm whether the schematic above is the customer's schematic?

    The device expert suggested adding a 22R or 33R resistor and performing some tests.

    Regards,

    Sreenivasa

  • Hello Tony,

    1) Can I get you to confirm how long these "hangs" are?

    10-100msec?

    1 sec?

    Indefinitely?

    2) What version of Linux SDK is the customer using? Are they using RT Linux or regular Linux?

    Keep in mind that clock stretching is normal and expected, where either the I2C peripheral or the I2C controller holds the SCL low while it finishes processing.

    During Linux boot time, I have observed the regular Linux scheduler (i.e., not RT Linux) causing the SCL line to be low for >30 msec, very rarely longer than 100 msec. This is expected with a non-Real Time operating system.

    3) Please note that testing the voltage on either side of the serial resistor would only be useful if you have a pullup resistor on both sides of the serial resistor. 

    The test is not valid if you simply removed R1 and then probed A and B separately, since there is no pullup on the AM62x side.

    Regards,

    Nick

  • "Hang" means indefinitely. No data can be sent on the I2C bus because SCL is stuck low.

    It is Linux SDK 9.2 on an HS-FS AM6232.

    It is not a performance, latency, or throughput issue. SCL stuck low is not a rare I2C issue; searching for "SCL stuck low" on this forum returns many similar posts:

    https://e2e.ti.com/search?q=SCL%20stuck%20low&category=forum

    I also have the I2C tips from the old wiki page. The bus can be recovered by resetting the I2C module, but the hang easily occurs again, so we need to analyze why it happens and what causes it.

    /cfs-file/__key/communityserver-discussions-components-files/791/I2C-Tips-_2D00_-Texas-Instruments-Wiki.pdf

    Regarding "The test is not valid if you simply removed R1 and then probed A and B separately, since there is no pullup on the AM62x side":

    The test result does prove that SCL is not held LOW by the external device.

  • Hello Tony,

    OK, fair point on the external device probably not holding SCL low. I am not sure how removing a resistor during runtime could affect the bus, so based on that single test I am still not comfortable saying that the AM62x is definitely holding SCL low.

    Does the dotted line box on the schematic indicate a separate daughter card?

    Will continue conversation about code offline.

    Regards,

    Nick

  • Nick,

    I ran the test code on 2 AM62B-SK boards for over 30 hours and have not reproduced the issue yet, but it reproduces reliably on the custom board within 10+ hours. I will communicate with the customer.

  • Hello Tony

    Thank you for the updates.

    Regards,

    Sreenivasa

  • Hello Tony,

    Thank you for the updates.

    If an infinite loop is occurring somewhere in the code, it could be helpful to add print statements to try to see what the driver is doing. Let me know if you want some sample code on how to do that.

    Regards,

    Nick

  • Hi Nick,

    There are 2 kinds of I2C pulled-low issue.

    #1. Pulled low by an external slave, which results in a system hang.

    The customer found an easy way to reproduce this: short SDA and SCL to LOW. This generates numerous timeout or NACK interrupts, so the thread never gets out of the 1 s wait_for_completion_timeout, and the system behaves as if it has hung.

    Changing the dtsi main_i2cn property from interrupts = <GIC_SPI 161 IRQ_TYPE_LEVEL_HIGH> to IRQ_TYPE_EDGE_RISING solved the problem.

    #2. Pulled low by the host itself.

    Resetting the I2C module recovers the bus, but the issue reproduces again very soon (<5 minutes).

    As it only reproduced on the custom board with 2 slaves on 1 I2C bus, and did not reproduce on the AM62x SK, the customer is now respinning the board to connect only one I2C device per I2C bus in order to move forward.

  • Hello Tony,

    Hmm, interesting. Off the top of my head I am not sure why changing from IRQ_TYPE_LEVEL_HIGH to IRQ_TYPE_EDGE_RISING would make a difference...

    Was the customer able to add prints and trace what code was running in the situation where the AM62x host was the one pulling the signal low?

    Regards,

    Nick

  • Is this what you wanted?

  • Hello Tony,

    First off,

    responding to #1 in your message on April 28:

    I am not sure I understand. The customer is saying that in this case, their code is hanging indefinitely in this part of the i2c-omap.c driver?

    /*
     * Low level master read/write transaction.
     */
    static int omap_i2c_xfer_msg(struct i2c_adapter *adap,
                                 struct i2c_msg *msg, int stop, bool polling)
    {
    ...
            if (!polling) {
                    timeout = wait_for_completion_timeout(&omap->cmd_complete,
                                                          OMAP_I2C_TIMEOUT);
            } else {
                    do {
                            omap_i2c_wait(omap);
                            ret = omap_i2c_xfer_data(omap);
                    } while (ret == -EAGAIN);
    
                    timeout = !ret;
            }
    
            if (timeout == 0) {
                    dev_err(omap->dev, "controller timed out\n");
                    omap_i2c_reset(omap);
                    __omap_i2c_init(omap);
                    return -ETIMEDOUT;
            }
    

    The reason that is confusing to me is that the entire point of wait_for_completion_timeout is that the function will wait either until the transfer completes, OR until OMAP_I2C_TIMEOUT (which is set to 1 sec in the default driver). So this function should at most hang for 1 second, continue to the "controller timed out" part of that code snippet, and reset the I2C bus.

    Regarding my earlier question, "Was the customer able to add prints and trace what code was running in the situation where the AM62x host was the one pulling the signal low?":

    I was looking for the customer to add print statements inside the I2C driver i2c-omap so that we could follow where the code was going when the hang occurred. That would tell us where in TI's I2C driver the code was hanging (if the code was actually hanging there).

    Is the customer's output telling us that something is going wrong in the AIC3101 driver, and that the code is hanging there before the lower level i2c-omap driver even gets involved?

    I am not totally sure what is going on with that driver. I see bindings documentation for 
    Documentation/devicetree/bindings/sound/tlv320aic31xx.txt
    "ti,tlv320aic310x" - Generic TLV320AIC31xx with mono speaker amp

    However, I cannot find the driver itself for TLV320AIC3101 in Linux SDK 9.2, 10.1, or 11.0. I can send this thread over to the AIC3101 team to comment on their driver if you like.

    Regards,

    Nick