Hi,
We've been experiencing an issue writing to an SD card reader connected to a self-powered USB hub. The USB path is Sitara->hub->SD card reader. After a period of continuous writing to the SD card, the SD card device appears to lock up and an error recovery attempt kicks in ("reset high-speed USB device number 3 using musb-hdrc"). See the dmesg output at the bottom of this post.
We used to see this problem once every couple of days. Since upgrading to the linux-3.2.0-psp04.06.00.11 kernel and upgrading to AM3352 silicon rev 2.1 we see this problem less frequently (once in a few weeks). The problem seems to only occur when we have more than one device connected to the USB hub although it is hard to confirm this because the problem is so intermittent.
We are certain that the problem is in the Sitara or in the driver software. The reason I say this is that when the system gets into this state, we tried issuing hard resets to all the connected devices in the chain, and we still see the "reset high-speed USB device number 3 using musb-hdrc" message immediately.
Is there a way to reset the USB controller in isolation? Can this be done in a way that the linux USB stack will correctly re-enumerate the devices and remain happy?
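For reference, a minimal sketch of what a controller-level reset might look like from user space, using the generic Linux driver-model `bind`/`unbind` sysfs files. The driver directory name `musb-hdrc` and the device name `musb-hdrc.0` are assumptions (check `ls /sys/bus/platform/drivers/` on the target), and whether the musb driver in this kernel tolerates being unbound and re-bound cleanly is not something this sketch verifies.

```python
import os
import time


def rebind_driver(device, driver="musb-hdrc",
                  bus_root="/sys/bus/platform/drivers", settle_s=1.0):
    """Unbind, then re-bind, `device` from `driver` via sysfs.

    `device` and `driver` are assumed names; they must match what the
    running kernel actually exposes under `bus_root`.
    """
    driver_dir = os.path.join(bus_root, driver)
    for action in ("unbind", "bind"):
        # Writing the device name to the file triggers the action.
        with open(os.path.join(driver_dir, action), "w") as f:
            f.write(device)
        if action == "unbind":
            # Give the USB stack time to tear down the child devices.
            time.sleep(settle_s)
    return driver_dir
```

It would need to run as root, for example `rebind_driver("musb-hdrc.0")`, with any filesystem on the SD card unmounted first.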
Our issue looks to be identical to this post:
http://e2e.ti.com/support/arm/sitara_arm/f/791/t/332492.aspx?pi301021=2
I have some concerns about what happened in this case. The user reported that disabling DMA mode resolved the problem. My concern is that this helps only because, with DMA disabled, the interval between transactions is naturally longer. I'm sure this is just masking the real problem.
Is there another way to simply add a delay between transactions while DMA is still enabled?
Also, in this post is a quote from the video encoder manufacturer noting some issues in the packets being sent from the controller:
"It looks like you are getting corrupted USB packets indicated by the packet discontinuity messages having a bogus sequence number, and a packet that must have misinterpreted as a end-of-stream packet. ... It could be that host USB hardware has problems when using more than 1 device at a time, causing packet corruption"
This certainly rings true in both our cases, eg: the other user noted that streaming to the sd-card from RAM did not reproduce the problem but streaming from another USB device did.
Is there a known fix in any newer drivers that sounds related to this?
Characteristic dmesg output looks like this:
[ 684.158257] usb 1-1.1: reset high-speed USB device number 3 using musb-hdrc
[ 694.358279] usb 1-1.1: reset high-speed USB device number 3 using musb-hdrc
[ 710.558256] usb 1-1.1: reset high-speed USB device number 3 using musb-hdrc
[ 710.758229] usb 1-1.1: reset high-speed USB device number 3 using musb-hdrc
[ 720.958253] usb 1-1.1: reset high-speed USB device number 3 using musb-hdrc
[ 721.080640] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 721.087739] sd 0:0:0:0: [sda] Unhandled error code
[ 721.092804] sd 0:0:0:0: [sda] Result: hostbyte=0x05 driverbyte=0x00
[ 721.099516] sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 05 33 68 00 00 08 00
[ 721.107086] end_request: I/O error, dev sda, sector 340840
[ 721.112892] Buffer I/O error on device sda1, logical block 41581
[ 721.119265] EXT4-fs warning (device sda1): ext4_end_bio:244: I/O error writing to inode 30 (offset 0 size 4096 starting block 42606)
[ 721.131876] sd 0:0:0:0: rejecting I/O to offline device
[ 721.137385] sd 0:0:0:0: [sda] killing request
[ 721.142015] sd 0:0:0:0: [sda] Unhandled error code
[ 721.147060] sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
[ 721.153771] sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 34 57 58 00 00 40 00
[ 721.161353] end_request: I/O error, dev sda, sector 3430232
[ 721.167442] JBD2: Detected IO errors while flushing file data on sda1-8
[ 721.174766] sd 0:0:0:0: rejecting I/O to offline device
[ 721.187021] sd 0:0:0:0: rejecting I/O to offline device
[ 721.192566] Buffer I/O error on device sda1, logical block 41582
[ 721.198906] EXT4-fs warning (device sda1): ext4_end_bio:244: I/O error writing to inode 30 (offset 0 size 4096 starting block 42607)
[ 721.213004] Aborting journal on device sda1-8.
[ 721.217814] sd 0:0:0:0: rejecting I/O to offline device
[ 721.223412] JBD2: I/O error detected when updating journal superblock for sda1-8.
[ 721.258357] EXT4-fs (sda1): previous I/O error to superblock detected
[ 721.265290] sd 0:0:0:0: rejecting I/O to offline device
[ 721.288206] EXT4-fs error (device sda1): ext4_journal_start_sb:327: Detected aborted journal
[ 721.297112] EXT4-fs (sda1): Remounting filesystem read-only
[ 721.303005] EXT4-fs (sda1): previous I/O error to superblock detected
[ 721.309862] sd 0:0:0:0: rejecting I/O to offline device
[ 721.385271] EXT4-fs (sda1): previous I/O error to superblock detected
[ 721.392258] sd 0:0:0:0: rejecting I/O to offline device
[ 721.397835] EXT4-fs error (device sda1): ext4_put_super:818: Couldn't clean up the journal
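As a cross-check on the log above, the failing commands can be decoded by hand. Opcode 0x2a is SCSI WRITE(10), where bytes 2-5 hold the big-endian logical block address and bytes 7-8 the transfer length in blocks. The sketch below is only an illustration (the helper name `decode_write10` is made up here); the host byte names are the standard Linux SCSI mid-layer codes.

```python
# Host byte values seen in the log, as named in the Linux SCSI headers.
HOST_BYTES = {0x01: "DID_NO_CONNECT", 0x05: "DID_ABORT"}


def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: start block
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: block count
    return lba, blocks


print(decode_write10("2a 00 00 05 33 68 00 00 08 00"))  # (340840, 8)
print(decode_write10("2a 00 00 34 57 58 00 00 40 00"))  # (3430232, 64)
```

The decoded addresses match the sectors reported in the two "end_request: I/O error" lines, so the first write (8 blocks) was aborted during error recovery and the second (64 blocks) was rejected after the device had already been taken offline.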
Thank you,
Evan