This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

Issue with musb-hdrc and sd-card reader

Other Parts Discussed in Thread: AM3352

Hi,

We've been experiencing an issue writing to a sd-card reader connected to a self powered USB hub. USB path is Sitara->hub->sd-card reader. The behaviour is that after a while of continuously writing to the sd-card eventually it seems that the sd-card device appears to lock-up and an error correction attempt kicks in ("reset high-speed USB device number 3 using musb-hdrc"). See dmesg at bottom for full dmesg output.

We used to see this problem once every couple of days. Since upgrading to the linux-3.2.0-psp04.06.00.11 kernel and upgrading to AM3352 silicon rev 2.1 we see this problem less frequently (once in a few weeks). The problem seems to only occur when we have more than one device connected to the USB hub although it is hard to confirm this because the problem is so intermittent.

We are certain that the problem is in the sitara or in the driver software. The reason I say this is that when the system gets into this state we tried issuing hard resets to all the connected devices in the chain  (ie: we see the "reset high-speed USB device number 3 using musb-hdrc" immediately).

Is there a way to reset the USB controller in isolation? Can this be done in a way that the linux USB stack will correctly re-enumerate the devices and remain happy?

Our issue looks to be identical to this post:

http://e2e.ti.com/support/arm/sitara_arm/f/791/t/332492.aspx?pi301021=2

I have some concerns about what happened in this case. The user reported that by disabling DMA mode the problem was resolved. My concern is that the reason this helps is that when dma is disabled the interval between transactions will naturally be longer. I'm sure this is just masking the real problem.

Is there another way to simply add a delay between transactions while DMA is still enabled?

Also, in this post is a quote from the video encoder manufacturer noting some issues in the packets being sent from the controller: 

"It looks like you are getting corrupted USB packets indicated by the packet discontinuity messages having a bogus sequence number, and a packet that must have misinterpreted as a end-of-stream packet. ... It could be that  host USB hardware has problems when using more than 1 device at a time, causing packet corruption"

This certainly rings true in both our cases, eg: the other user noted that streaming to the sd-card from RAM did not reproduce the problem but streaming from another USB device did.

Is there a known fix in any newer drivers that sounds related to this?

Characteristic dmesg output looks like this:

[ 684.158257] usb 1-1.1: reset high-speed USB device number 3 using musb-hdrc
[ 694.358279] usb 1-1.1: reset high-speed USB device number 3 using musb-hdrc
[ 710.558256] usb 1-1.1: reset high-speed USB device number 3 using musb-hdrc
[ 710.758229] usb 1-1.1: reset high-speed USB device number 3 using musb-hdrc
[ 720.958253] usb 1-1.1: reset high-speed USB device number 3 using musb-hdrc
[ 721.080640] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 721.087739] sd 0:0:0:0: [sda] Unhandled error code
[ 721.092804] sd 0:0:0:0: [sda] Result: hostbyte=0x05 driverbyte=0x00
[ 721.099516] sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 05 33 68 00 00 08 00
[ 721.107086] end_request: I/O error, dev sda, sector 340840
[ 721.112892] Buffer I/O error on device sda1, logical block 41581
[ 721.119265] EXT4-fs warning (device sda1): ext4_end_bio:244: I/O error writing to inode 30 (offset 0 size 4096 starting block 42606)
[ 721.131876] sd 0:0:0:0: rejecting I/O to offline device
[ 721.137385] sd 0:0:0:0: [sda] killing request
[ 721.142015] sd 0:0:0:0: [sda] Unhandled error code
[ 721.147060] sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
[ 721.153771] sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 34 57 58 00 00 40 00
[ 721.161353] end_request: I/O error, dev sda, sector 3430232
[ 721.167442] JBD2: Detected IO errors while flushing file data on sda1-8
[ 721.174766] sd 0:0:0:0: rejecting I/O to offline device
[ 721.187021] sd 0:0:0:0: rejecting I/O to offline device
[ 721.192566] Buffer I/O error on device sda1, logical block 41582
[ 721.198906] EXT4-fs warning (device sda1): ext4_end_bio:244: I/O error writing to inode 30 (offset 0 size 4096 starting block 42607)
[ 721.213004] Aborting journal on device sda1-8.
[ 721.217814] sd 0:0:0:0: rejecting I/O to offline device
[ 721.223412] JBD2: I/O error detected when updating journal superblock for sda1-8.
[ 721.258357] EXT4-fs (sda1): previous I/O error to superblock detected
[ 721.265290] sd 0:0:0:0: rejecting I/O to offline device
[ 721.288206] EXT4-fs error (device sda1): ext4_journal_start_sb:327: Detected aborted journal
[ 721.297112] EXT4-fs (sda1): Remounting filesystem read-only
[ 721.303005] EXT4-fs (sda1): previous I/O error to superblock detected
[ 721.309862] sd 0:0:0:0: rejecting I/O to offline device
[ 721.385271] EXT4-fs (sda1): previous I/O error to superblock detected
[ 721.392258] sd 0:0:0:0: rejecting I/O to offline device
[ 721.397835] EXT4-fs error (device sda1): ext4_put_super:818: Couldn't clean up the journal

Thank you,

Evan

  • Hi Evan,

    I'll ask the factory team USB experts to look into your issue.

    Best regards,
    Miroslav

  • Hi,

    We had a call with Bin Liu regarding this issue, this is the follow up to this call.

    There was a test I ran that reproduces a similar issue to this one. It involves cat and appending a large file from a USB stick to the USB SD-card reader and simultaneously doing the reverse. This is the test and messages seen:

    while $1; do cat /sd-card/uImage >> /usb/test.tmp; done &
    while $1; do cat /usb/uImage >> /sd-card/test.tmp; done &


    [ 407.798375] usb 1-1.2: reset high-speed USB device number 4 using musb-hdrc
    [ 407.939064] sd 1:0:0:0: [sdc] Media Changed
    [ 407.943526] sd 1:0:0:0: [sdc] Result: hostbyte=0x00 driverbyte=0x08
    [ 407.950270] sd 1:0:0:0: [sdc] Sense Key : 0x6 [current]
    [ 407.955981] Info fld=0x0
    [ 407.958660] sd 1:0:0:0: [sdc] ASC=0x28 ASCQ=0x0
    [ 407.963533] sd 1:0:0:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 00 33 8e a0 00 00 f0 00
    [ 407.971117] end_request: I/O error, dev sdc, sector 3378848
    [ 407.977335] FAT-fs (sdc1): FAT read failed (blocknr 3307)
    [ 407.989725] FAT-fs (sdc1): FAT read failed (blocknr 3305)

    Bin suggested disabling DMA. When DMA was disabled I could not reproduce the issue seen in this test, although as expected everything was significantly slower. One other suggestion was to try to issue a babble recovery via the musb procfs file, which would reset the controller logic. In the above test case this recovered the bus and the devices were re-enumerated. It's not clear how similar this test case is to the intermittent bug we see.

    Finally, Bin asked to see the layout and configuration of our bus in the case of the first failure we see:

    [fe80::d2ff:50ff:fe3b:ee26]# cat /sys/kernel/debug/usb/devices

    T: Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=480 MxCh= 1
    B: Alloc= 0/800 us ( 0%), #Int= 0, #Iso= 0
    D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1
    P: Vendor=1d6b ProdID=0002 Rev= 3.02
    S: Manufacturer=Linux 3.2.0 musb-hcd
    S: Product=MUSB HDRC host driver
    S: SerialNumber=musb-hdrc.0
    C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
    I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
    E: Ad=81(I) Atr=03(Int.) MxPS= 4 Ivl=256ms

    T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 3
    D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1
    P: Vendor=0424 ProdID=2660 Rev= a.a0
    C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 2mA
    I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
    E: Ad=81(I) Atr=03(Int.) MxPS= 1 Ivl=256ms

    T: Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 3 Spd=480 MxCh= 0
    D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
    P: Vendor=0424 ProdID=4040 Rev= 2.01
    S: Manufacturer=Generic
    S: Product=Ultra Fast Media Reader
    S: SerialNumber=000000266001
    C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr= 96mA
    I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
    E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=125us
    E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

    T: Bus=01 Lev=02 Prnt=02 Port=01 Cnt=02 Dev#= 5 Spd=480 MxCh= 0
    D: Ver= 2.00 Cls=02(comm.) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
    P: Vendor=1bc7 ProdID=0021 Rev=16.70
    S: Manufacturer=Telit wireless solutions
    S: Product=Telit Wireless Module
    S: SerialNumber=351579050102845
    C:* #Ifs=14 Cfg#= 1 Atr=e0 MxPwr=100mA
    I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=cdc_acm
    E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=1ms
    I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_acm
    E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 2 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=cdc_acm
    E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=1ms
    I:* If#= 3 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_acm
    E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 4 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=cdc_acm
    E: Ad=85(I) Atr=03(Int.) MxPS= 64 Ivl=1ms
    I:* If#= 5 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_acm
    E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 6 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=cdc_acm
    E: Ad=87(I) Atr=03(Int.) MxPS= 64 Ivl=1ms
    I:* If#= 7 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_acm
    E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=08(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 8 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=cdc_acm
    E: Ad=89(I) Atr=03(Int.) MxPS= 64 Ivl=1ms
    I:* If#= 9 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_acm
    E: Ad=8a(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=0a(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#=10 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=cdc_acm
    E: Ad=8b(I) Atr=03(Int.) MxPS= 64 Ivl=1ms
    I:* If#=11 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_acm
    E: Ad=8c(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=0c(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#=12 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=cdc_acm
    E: Ad=8d(I) Atr=03(Int.) MxPS= 64 Ivl=1ms
    I:* If#=13 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_acm
    E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=0e(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms

    Thanks,

    Evan

  • Evan,

    Evan Carson said:

    There was a test I ran that reproduces a similar issue to this one. It involves cat and appending a large file from a USB stick to the USB SD-card reader and simultaneously doing the reverse. This is the test and messages seen:

    while $1; do cat /sd-card/uImage >> /usb/test.tmp; done &
    while $1; do cat /usb/uImage >> /sd-card/test.tmp; done &

    I am able to replicate the issue with 4.6.0.11 kernel using this test, but the issue does not happen with the SDK7.0 kernel. Is it possible for you to migrate to SDK7.0 kernel?

    Our SystemTest engineer for USB is on vacation. Once she is back, I will check with her if the multiple MSC access problem is a known issue in 4.6.0.11 kernel.


  • Hi Bin,

    Thank you for the quick reply.

    Long term we do plan to migrate to the new kernel but because of the device-tree changes required we can't do this right away. I'm doing some testing with DMA disabled in our application to see if the performance is suitable.

    When we do migrate to the newer kernel you had mentioned some additional patches that were available against the USB driver. Could you please point me to where I can find these?

    Thanks again,

    Evan

  • Evan,

    I am monitoring the linux-usb community mailing list, so I see people submitting patches and those have been applied into each newer rc kernel.

    I am also maintaining the wiki http://processors.wiki.ti.com/index.php/Sitara_Linux_SDK_MUSB_Issues to list all the known USB patches for SDK7.0 kernel. I have not updated it for a while, but I will try to keep it up to date.

  • Thank you for this info, I will check the rc kernel for updated driver when the time comes.

    It looks like for now the workaround (turning off DMA) will work for our application. In future we plan to add peripherals to the bus this will drive our switch to the newer kernel.

    Thanks again,

    Evan