This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

musb-hdrc storage problem

I have an AM3354-based board with an SMSC 4-port hub IC.  A self-powered desktop 4-port hub is plugged into one of the ports.  I have 4 USB mpeg encoders plugged into the desktop hub.  I'm streaming from four cameras.  This all works fine.

The trouble happens when I try and record the video to an SD Card.  If I record using the mmc socket, everything works fine.  However, if I try and record using a USB SD card reader (plugged into one of the SMSC's 4 ports), then I get the following problem within a few minutes.  Any ideas?  I'm using a patched AM33x v3.2-staging kernel from Arago.

[  377.120715] usb 1-1.2: reset high-speed USB device number 3 using musb-hdrc
[  387.320702] usb 1-1.2: reset high-speed USB device number 3 using musb-hdrc
[  403.531115] usb 1-1.2: reset high-speed USB device number 3 using musb-hdrc
[  403.740716] usb 1-1.2: reset high-speed USB device number 3 using musb-hdrc
[  413.940681] usb 1-1.2: reset high-speed USB device number 3 using musb-hdrc
[  414.063350] sd 0:0:0:0: Device offlined - not ready after error recovery
[  414.070468] sd 0:0:0:0: [sda] Unhandled error code
[  414.075558] sd 0:0:0:0: [sda]  Result: hostbyte=0x05 driverbyte=0x00
[  414.082312] sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 24 bf c0 00 00 c0 00
[  414.090217] end_request: I/O error, dev sda, sector 2408384
[  414.096272] sd 0:0:0:0: rejecting I/O to offline device
[  414.101773] sd 0:0:0:0: killing request
[  414.105864] sd 0:0:0:0: rejecting I/O to offline device
[  414.111361] sd 0:0:0:0: [sda] killing request
[  414.115969] sd 0:0:0:0: rejecting I/O to offline device
...
[  414.838917] sd 0:0:0:0: rejecting I/O to offline device
[  414.844827] sd 0:0:0:0: [sda] Unhandled error code
[  414.849893] sd 0:0:0:0: [sda]  Result: hostbyte=0x01 driverbyte=0x00
[  414.856650] sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 24 c0 80 00 00 40 00
[  414.864418] end_request: I/O error, dev sda, sector 2408576
[  414.871824] sd 0:0:0:0: rejecting I/O to offline device
...
[  415.425457] sd 0:0:0:0: rejecting I/O to offline device
[  415.437045] FAT-fs (sda1): unable to read inode block for updating (i_pos 131188)
[  415.460655] sd 0:0:0:0: rejecting I/O to offline device
...
[  416.241209] sd 0:0:0:0: rejecting I/O to offline device
[  416.247108] FAT-fs (sda1): unable to read inode block for updating (i_pos 131188)
[  416.273638] sd 0:0:0:0: rejecting I/O to offline device
...
[  416.350794] sd 0:0:0:0: rejecting I/O to offline device
[  416.361045] FAT-fs (sda1): FAT read failed (blocknr 4697)
[  416.367093] sd 0:0:0:0: rejecting I/O to offline device
[  416.372757] FAT-fs (sda1): FAT read failed (blocknr 4692)
[  416.555313] sd 0:0:0:0: rejecting I/O to offline device
[  416.561677] FAT-fs (sda1): FAT read failed (blocknr 4697)
[  416.567787] sd 0:0:0:0: rejecting I/O to offline device
[  416.573705] FAT-fs (sda1): FAT read failed (blocknr 4692)
[  418.020756] sd 0:0:0:0: rejecting I/O to offline device
[  418.026356] FAT-fs (sda1): bread failed in fat_clusters_flush
[  421.381803] sd 0:0:0:0: rejecting I/O to offline device
...
[  422.129512] sd 0:0:0:0: rejecting I/O to offline device
[  422.140967] FAT-fs (sda1): unable to read inode block for updating (i_pos 131192)
[  422.160227] sd 0:0:0:0: rejecting I/O to offline device
...

  • Jason,

    Are you able to hook up a usb analyzer to capture a trace while the reset happens?

    Does the issue also happen if only one usb encoder is plugged?

    I guess you have usb dma enabled, ('zcat /proc/config.gz | grep CPPI' will tell it), have you tried to disable usb dma to see if the issue still happens? The usb throughput will be much lower when dma is disabled, I think it probably will not be able to support all 4 usb encoders when you run the test.

  • Jason Clay said:
    I'm using a patched AM33x v3.2-staging kernel from Arago.

    What patches have you applied?

  • Bin Liu said:

    Are you able to hook up a usb analyzer to capture a trace while the reset happens?

    I don't have such equipment.

    Bin Liu said:

    Does the issue also happen if only one usb encoder is plugged?

    No, although I get occasional video flicker, which I don't get when writing to the mmc socket.

    Bin Liu said:

    I guess you have usb dma enabled, ('zcat /proc/config.gz | grep CPPI' will tell it), have you tried to disable usb dma to see if the issue still happens? The usb throughput will be much lower when dma is disabled, I think it probably will not be able to support all 4 usb encoders when you run the test.

    root@sbc:~# zcat /proc/config.gz | grep CPPI
    CONFIG_USB_TI_CPPI41_DMA_HW=y
    # CONFIG_USB_TI_CPPI_DMA is not set
    CONFIG_USB_TI_CPPI41_DMA=y

    I have not tried with DMA disabled.  I could try one encoder with DMA disabled, to see if the video flicker goes away?

  • Bin Liu said:

    I'm using a patched AM33x v3.2-staging kernel from Arago.

    What patches have you applied?

    [/quote]

    A few board-specific patches, none of which modify the usb stuff.  And the following patch for fixing musb handling of zero-length packets:

    3583.0001-usb-musb-cppi41-zlp-rx-handling-in-generic-rndis-mod.patch.txt

  • Jason,

    Your initial log tells the SD card I/O failed. I could think of two possible reasons.

    1. usb transfer failure causes MSC/SCSI failure. It could be usb dma issue while handling certain pattern of usb packets. testing with dma disabled will narrow it down a little bit. But only the usb analyser trace can tell what happened.

    2. MSC/SCSI layer times out due to priority scheduling. '/sys/block/sda/device/timeout' defines the default timeout value. Is it possible to give higher priority to the SD card read/write thread/process than encoders?

  • Bin Liu said:

    1. usb transfer failure causes MSC/SCSI failure. It could be usb dma issue while handling certain pattern of usb packets. testing with dma disabled will narrow it down a little bit. But only the usb analyser trace can tell what happened.

    With dma disabled and one encoder, the flicker seems to go away.  But with all encoders, the original issue occurs.

    Bin Liu said:

    2. MSC/SCSI layer times out due to priority scheduling. '/sys/block/sda/device/timeout' defines the default timeout value. Is it possible to give higher priority to the SD card read/write thread/process than encoders?

    Are you talking about changing priority in my application somehow?

    I forgot to mention that I tried streaming live video with no recording, but I simulated writing 1GB to the SD card with something like this:

    dd count=1k bs=1M if=/dev/zero of=/mnt/CARD/test.img

    and the issue did not occur.

    I'm using gstreamer for the live streaming and recording.

  • Jason Clay said:
    But with all encoders, the original issue occurs.

    Ok, this tells the issue is not related to usb dma, since it also happens with dma disabled.

    I think we can try to debug this with usbmon trace. Please refer to Documentation/usb/usbmon.txt and capture a usbmon trace.

    I am not fluent in interpreting usbmon trace, since I use analyzer all the time. So you could also post this issue and the usbmon trace to linux-usb mailing list, there are many people who are master of usbmon, I believe they can give some comments about the trace.

    Jason Clay said:
    Are you talking about changing priority in my application somehow?

    Yes, but I think we can focus on usbmon trace for now.

  • usbmon output attached

    Also sent to linux-usb as suggested.

    8738.1.mon.out.tar.gz

  • Thanks Jason, could you please also provide 'lsusb' output so I know which device is the SD card reader?

  • I don't have lsusb but here are the devices according to /sys/kernel/debug/usb/devices:

    MUSB host:
    T:  Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=480  MxCh= 1

    HUB:
    T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=480  MxCh= 4

    USB storage:
    T:  Bus=01 Lev=02 Prnt=02 Port=01 Cnt=01 Dev#=  3 Spd=480  MxCh= 0

    HUB:
    T:  Bus=01 Lev=02 Prnt=02 Port=03 Cnt=02 Dev#=  4 Spd=480  MxCh= 4

    encoder:
    T:  Bus=01 Lev=03 Prnt=04 Port=00 Cnt=01 Dev#=  5 Spd=480  MxCh= 0

    encoder:
    T:  Bus=01 Lev=03 Prnt=04 Port=01 Cnt=02 Dev#=  6 Spd=480  MxCh= 0

    encoder:
    T:  Bus=01 Lev=03 Prnt=04 Port=02 Cnt=03 Dev#=  7 Spd=480  MxCh= 0

    encoder:
    T:  Bus=01 Lev=03 Prnt=04 Port=03 Cnt=04 Dev#=  8 Spd=480  MxCh= 0

  • Jason,

    The usbmon trace does not have much traffic on address #3 which is the SD card reader. The log only has the info about after the SD card reader has been reset, which is normally after the SD card read/write failure. Did you start cat 0u too late?

    The SCSI layer by default has 30 sec timeout. You can try 'echo 2 > /sys/block/sda/device/timeout' to shorten it to 2 sec. If the SD card reset was caused by SCSI errors, 2sec timeout gives better chance to capture the SCSI errors.

    By the way, I have not seen your post on linux-usb mailing list yet.

  • Attached is new usbmon output.  Here the devices are:

    #4 - encoder
    #5 - usb storage
    #2 - hub
    #1 - host

    6082.1.mon.tar.gz

    I've simplified my test somewhat.  I'm using a capture application from the encoder manufacturer instead of gstreamer.  I'm trying just one encoder plugged into the smsc hub.  Same as before, I can capture and write to the MMC socket indefinitely, with no trouble.  When I capture and write to USB storage, it works for at most a minute, and then the capture app seems to timeout, like it's being starved or something.

    I think my previous usbmon output was too big an attachment for the linux-usb mailing list.  I can try again with my simplified output, if we can't solve things here.

  • So I repeated my simplified test (with one encoder using a command-line capture utility) with DMA disabled, and the problem does not occur.

  • Jason,

    I am still on something else right now and have not got time to look at the trace yet. But is it captured with DMA disabled or enabled? I don't think the usbmon traces correctly capture the usb traffic with DMA enabled, so only a usb analyzer trace can tell the truth in this case. So you'd better to replicate the issue with DMA disabled if you don't have an analyzer.

  • The trace is with DMA enabled.  So far I cannot replicate the issue with DMA disabled, but I will continue to try.  By the way, here are comments from the encoder manufacturer:

    "It looks like you are getting corrupted USB packets indicated by the packet discontinuity messages having a bogus sequence number, and a packet that must have misinterpreted as a end-of-stream packet.  

    The lastpacket condition shuts down the processing thread, which is causing the application poll to timeout and stop.

    It could be that  host USB hardware has problems when using more than 1 device at a time, causing packet corruption."

  • I've been running a couple of hours now with 4 encoders and writing to USB storage, and the issue has not occurred.  I'm not sure what happened earlier when I thought the issue occurred even with DMA disabled.

    But it seems with DMA disabled the issue never happens, and furthermore it seems 4 encoders isn't too much bandwidth.  So this might be workable for me, for now.  What are the disadvantages of having DMA disabled?

  • Jason,

    The only issue with DMA disabled would be high CPU load, if that does not affect your application, I think you are ok.

    I just looked the usbmon trace which has one encoder. As you said MSC runs fine in this case, but in Line 5622, the encoder sent a video frame which size is 15686 bytes, which is way much bigger than normal size - around 1K.

    Then the controller sent a SETUP packet (Line 5654). I have to check the UVC Spec to understand what this SETUP packet means. This is not something I use in my daily work, I am unable to integrate it right now.