TUSB4041I: failed SMBus transactions

Part Number: TUSB4041I

Hello team,

This is an interesting issue. My customer is seeing failed SMBus transactions: the part fails to respond even though power is correct and it is held out of reset. They have watched the SMBus message leave our processor on a scope, and it looks flawless from both an electrical and a protocol perspective; they just never get an ACK on SDA. On some boots the part comes up fine and has no issues; on other boots it fails to respond. Once it is in this state, a pin reset won’t recover it, no matter how many times they try or how long they wait. It takes a full board power cycle to (potentially) recover.

This issue seems to show up on specific devices: some, but not all of them. They first saw it on a single device many months ago and eventually replaced the USB hub, which completely solved the issue. They assumed a bad or damaged part and considered it resolved. Now many devices in our board run are failing with the same symptoms as before.

They have written a few test scripts to stress-test the part. One particularly interesting result is that when this failure occurs, the part can respond over SMBus for a brief time before going silent. The following test excerpt is from a script that cycles the reset pin, immediately performs 60 sequential reads, and repeats. The pseudocode for this script looks like this:

For 1 to 100:
     Assert reset
     Wait 1 second
     De-assert reset
     For 1 to 60:
           Read register looking for “01 51 04 40” (TI signature)
           Print * if found or X if failed/NAK’d.
           Wait 1 millisecond
     Done
Done
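
For anyone who wants to reproduce this, here is a rough Python sketch of the same loop. It is not the customer's actual script. The hardware access is passed in as two callables, `set_reset` and `read_signature`, which are hypothetical names; you would back them with whatever GPIO and SMBus access your platform provides. A read that raises `OSError` is treated as a NAK, which is an assumption about how the bus layer reports failures.

```python
import time
from typing import Callable, List

# Signature bytes quoted in the post: "01 51 04 40".
TI_SIGNATURE = bytes([0x01, 0x51, 0x04, 0x40])


def run_reset_poll_test(
    set_reset: Callable[[bool], None],      # hypothetical: True asserts reset
    read_signature: Callable[[], bytes],    # hypothetical: one SMBus read
    cycles: int = 100,
    reads_per_cycle: int = 60,
    reset_hold_s: float = 1.0,
    read_gap_s: float = 0.001,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Cycle reset, poll the signature, and return one '*'/'X' row per cycle."""
    rows = []
    for cycle in range(cycles):
        set_reset(True)
        sleep(reset_hold_s)
        set_reset(False)
        marks = []
        for _ in range(reads_per_cycle):
            try:
                ok = read_signature() == TI_SIGNATURE
            except OSError:
                # Assumption: a NAK or bus error surfaces as OSError.
                ok = False
            marks.append("*" if ok else "X")
            sleep(read_gap_s)
        rows.append(f"{cycle}: " + " ".join(marks))
    return rows
```

Passing `sleep` in as a parameter lets the loop be exercised against a fake device without real delays.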

 

The result looks like this:

[root@impinj-13-ae-3d /tmp]# cat hub-polltime-test.log
Begin USB Hub Reset Test
0: * X X * * * * * * * * * * * * * * X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
1: * X X * * * * * * * * * * * * * * * X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
2: * X X * * * * * * * * * * * * * * * X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
3: * X X * * * * * * * * * * * * * * X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
4: * X X * * * * * * * * * * * * * * X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
5: * X X * * * * * * * * * * * * * X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
6: * X X * * * * * * * * * * * * * X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
7: * X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
8: * X X * * * * * * * * * * * * * X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
9: * X X * * * * * * * * * * * * * * X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
10: * X X * * * * * * * * * * * * * * X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X        Test done. Failed 11/11 times

They haven’t characterized the time our processor spends on each command, so each column isn’t exactly 1 ms, but it’s close. This shows that when pulled out of reset, the part responds once immediately, fails the next two reads, and then responds until approximately 17 milliseconds have elapsed. A minority of the time it never responds again after the first read (like row 7). All they normally do in software is write the config active bit to enable upstream data. The other defaults are fine for them. If our service manages to do this within that 17 ms window, the part continues to pass USB data; you might never know it was in this state.
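
To put a number on that window for each row, a small helper like the one below can be run over the log. This is only a sketch: `silent_from` is a made-up name, and it assumes the row format shown above (an index, a colon, then space-separated `*` and `X` marks). Since each column is only roughly 1 ms, the returned read index is an approximate time in milliseconds, not an exact one.

```python
from typing import Optional, Tuple


def silent_from(row: str) -> Tuple[int, Optional[int]]:
    """Return (row index, index of the first read after which nothing responds).

    The second value is None when the last read in the row still succeeded.
    """
    label, _, body = row.partition(":")
    # Keep only the pass/fail marks; this ignores any trailing summary text.
    marks = [tok for tok in body.split() if tok in ("*", "X")]
    last_ok = max((i for i, m in enumerate(marks) if m == "*"), default=-1)
    first_silent = last_ok + 1
    return int(label), (first_silent if first_silent < len(marks) else None)


if __name__ == "__main__":
    for line in ["0: * X X * * * X X X", "7: * X X X", "2: * * *"]:
        print(silent_from(line))
```

A row that goes silent right after the first read, like row 7, comes back with index 1, while a fully passing row comes back with `None`.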

But this doesn’t fail consistently. A device can fail for hours in a row and then, after a full power cycle, pass for hours. It’s pretty random.

[root@impinj-13-ae-3d /tmp]# cat hub-polltime-test2.log
Begin USB Hub Reset Test

0: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

1: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

2: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

3: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

4: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

5: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

6: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

7: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

8: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

9: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

10: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *        Test done. Failed 0/11 times

Also note that this has failed both in userspace Linux and in the bootloader, so they don’t think it’s software or driver related. Currently, one of their build variants takes longer to configure our PMIC, and that variant seems to fail on units that don’t otherwise fail.

Thanks!

Errol