USB bulk data transfer speed improvement

Hi all,

Currently, my company is working on a project that uses the DM8148, which connects to an optical module via the USB0 port. The module enumerates as a USB vendor-specific class (0xFF) device and transfers image files to the Linux platform running on the DM8148 ARM subsystem.

We are using the TI EZSDK package "ti-ezsdk_dm814x-evm_5_05_02_00".

A single image file of up to 10MB is transferred, and we found that it takes 550msec to transfer this amount of data to Linux on the ARM, which does not meet the project requirements.

Is anyone able to advise how I could improve the transfer speed?

Or could you highlight which areas I could look into?

Does increasing the Receive Endpoint FIFO size and enabling double buffering help?

Thank you for any help extended.

Regards

May
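
For context on the numbers above, the following is a minimal sketch of the arithmetic, not a measurement. It assumes that "10MB" means 10 x 1024 x 1024 bytes and uses the USB 2.0 high-speed framing limits (8000 microframes per second, at most 13 bulk packets of 512 bytes per microframe). The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* USB 2.0 high-speed framing constants. */
#define HS_MICROFRAMES_PER_SEC   8000u  /* one microframe every 125 us */
#define HS_BULK_PKTS_PER_UFRAME  13u    /* max 512-byte bulk packets per microframe */
#define HS_BULK_MAX_PACKET       512u   /* bytes */

/* Theoretical bulk payload ceiling of a high-speed bus, in bytes per second. */
static uint32_t hs_bulk_ceiling_bytes_per_sec(void)
{
    return HS_MICROFRAMES_PER_SEC * HS_BULK_PKTS_PER_UFRAME * HS_BULK_MAX_PACKET;
}

/* Average throughput in bytes per second for `bytes` moved in `ms` milliseconds. */
static double throughput_bytes_per_sec(uint64_t bytes, double ms)
{
    assert(ms > 0.0);
    return (double)bytes * 1000.0 / ms;
}

/* Shortest possible transfer time in milliseconds at the bulk ceiling. */
static double min_transfer_ms(uint64_t bytes)
{
    return (double)bytes * 1000.0 / (double)hs_bulk_ceiling_bytes_per_sec();
}
```

Under these assumptions, 10MB in 550msec is about 19 MB/s, a little over a third of the roughly 53 MB/s bus ceiling, and the same file could not arrive in less than about 197msec on a high-speed link.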

  • Hi May,

    Can you try improving the USB speed by applying all the USB-related patches from the latest Linux kernel tree:

    http://arago-project.org/git/projects/?p=linux-omap3.git;a=shortlog;h=refs/heads/ti81xx-master

    See also the below wiki page:

    http://processors.wiki.ti.com/index.php/TI81XX_PSP_04.04.00.02_Feature_Performance_Guide#USB_Driver

    BR
    Pavel

  • Hi Pavel,

    Thank you for your responses.

    I have applied the USB-related patches dated from 2012-05-09 to 2013-07-05.

    The changes didn't help in improving the speed.

    Is a speed improvement reported with all these patches?

    Please advise does increasing the Receive Endpoint FIFO size and enabling double buffering help?

    Thanks.

    Regards

    May

  • Hi Pavel,

    From my quick test, the transfer became slower after applying all the patches when transferring an image of 10MB.

    Prior to the patches the transfer took ~550msec; with them it takes ~700msec, slower by 150msec.

    Regards

    May

  • May,

    Lee, May Fong said:
    Please advise does increasing the Receive Endpoint FIFO size and enabling double buffering help?

    Yes, that might help. See DM814x TRM, sections 25.7.1.2 Bulk Transfers and 25.7.2.2 Bulk Transfer.

    Check also if you are in USB HS (High Speed) mode and using the USB internal DMA. You can also try increasing the Cortex-A8 ARM frequency (to 1GHz).

    Regards,
    Pavel
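
One way to check the high-speed suggestion from user space on the host is the sysfs "speed" attribute of the USB device, which reports the negotiated rate in Mbit/s (480 for high speed, 12 for full speed). The sketch below is only an illustration: the function names are made up, and the path you pass in (for example something like /sys/bus/usb/devices/1-1/speed) depends on where the module sits in the bus topology.

```c
#include <assert.h>
#include <stdio.h>

/* Read the negotiated bus speed in Mbit/s from a sysfs "speed" attribute.
 * Returns a negative value if the file cannot be opened or parsed. */
static double usb_speed_mbps(const char *speed_attr_path)
{
    double mbps = -1.0;
    FILE *f;

    assert(speed_attr_path != NULL);
    f = fopen(speed_attr_path, "r");
    if (!f)
        return -1.0;
    if (fscanf(f, "%lf", &mbps) != 1)
        mbps = -1.0;
    fclose(f);
    return mbps;
}

/* Non-zero when the attribute reports 480 Mbit/s, i.e. high speed. */
static int is_high_speed(const char *speed_attr_path)
{
    return usb_speed_mbps(speed_attr_path) == 480.0;
}
```

If this reports 12 instead of 480, the link has fallen back to full speed and no driver tuning will reach the required transfer time.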

  • Hi Pavel,

    Would like to find out from you if there's a limitation on the TI Linux USB SW driver or TI DM8148 USB subsystem regarding the USB Request Block (URB) transfer buffer length?

    Currently, MAX_USBFS_BUFFER_SIZE, defined in "linux-2.6.37-psp04.04.00.01/drivers/usb/core/devio.c", is set to 16,384 bytes.

    With URB buffer length of 16KB to receive 10MB from USB port, we are incurring a lot of I/O activities.

    We intend to increase the buffer length to 32KB and even up to 128KB but need to find out if there's any limitation on the DM8148 chipset regarding this size.

    And also in the TI benchmark data under this link http://processors.wiki.ti.com/index.php/TI81XX_PSP_04.04.00.02_Feature_Performance_Guide#Performance_Benchmarks_.28DMA_mode.29

    Referring to the table "USB Host DMA-Read Performance Values", column "Buffer Size (in KBytes)".

    What does "Buffer Size" refer to? Does it refer to the URB buffer length?

    Thank you.

    Regards

    May
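
To put the URB buffer length question in numbers, here is a small sketch of the request count. It assumes the application uses the asynchronous usbfs interface, where each URB needs at least one USBDEVFS_SUBMITURB and one USBDEVFS_REAPURB ioctl, and that "10MB" means 10 x 1024 x 1024 bytes. The helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Number of URBs needed to move total_bytes with urb_len bytes per URB. */
static size_t urbs_needed(size_t total_bytes, size_t urb_len)
{
    assert(urb_len > 0);
    return (total_bytes + urb_len - 1) / urb_len;  /* round up */
}

/* Lower bound on ioctl calls: one submit plus one reap per URB. */
static size_t min_ioctls(size_t total_bytes, size_t urb_len)
{
    return 2 * urbs_needed(total_bytes, urb_len);
}
```

With 16KB URBs the 10MB image needs 640 URBs (at least 1280 ioctls); 32KB halves that to 320 URBs, and 128KB brings it down to 80.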

  • May,

    Lee, May Fong said:

    Would like to find out from you if there's a limitation on the TI Linux USB SW driver or TI DM8148 USB subsystem regarding the USB Request Block (URB) transfer buffer length?

    Currently, MAX_USBFS_BUFFER_SIZE, defined in "linux-2.6.37-psp04.04.00.01/drivers/usb/core/devio.c", is set to 16,384 bytes.

    With URB buffer length of 16KB to receive 10MB from USB port, we are incurring a lot of I/O activities.

    This is not a DM814x USBSS limitation. It is related to the USB driver design.

    Lee, May Fong said:
    We intend to increase the buffer length to 32KB and even up to 128KB but need to find out if there's any limitation on the DM8148 chipset regarding this size.

    You can try to increase the max size from 16K to 32K and more, but I do not think this will give you better performance. Note also that you will probably need to configure the other side of the USB transfer to match (not only the DM814x USB).

    I think you will improve the performance by increasing the Cortex-A8 ARM frequency to 1GHz and using double packet buffering.

    Lee, May Fong said:

    And also in the TI benchmark data under this link http://processors.wiki.ti.com/index.php/TI81XX_PSP_04.04.00.02_Feature_Performance_Guide#Performance_Benchmarks_.28DMA_mode.29

    Referring to the table "USB Host DMA-Read Performance Values", column "Buffer Size (in KBytes)".

    What does "Buffer Size" refer to? Does it refer to the URB buffer length?

    I will check this with the USB team.

    Regards,
    Pavel

  • Hi Pavel,

    Thank you for your responses.

    I forgot to mention that we tried overclocking the DM8148 to 1GHz. The improvement was around 50msec, which is not sufficient for our requirement, and we are concerned about reliability if we overclock the DM8148.

    And I also had enabled double buffering via setting the fifo_mode = 6 in "struct musb_hdrc_config" and also at the following code snippet:

    ....

    } else if (cpu_is_ti81xx()) {

          musb_config.fifo_mode = 6;  //original 4;

    ...

    }

    In "linux-2.6.37-psp04.04.00.01/drivers/usb/musb/musb_core.c", "mode_6_cfg" is defined:

    /* mode 6 - fits in 32KB */
    static struct musb_fifo_cfg __devinitdata mode_6_cfg[] = {
    { .hw_ep_num = 1, .style = FIFO_TX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 1, .style = FIFO_RX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 2, .style = FIFO_TX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 2, .style = FIFO_RX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 3, .style = FIFO_TX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 3, .style = FIFO_RX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 4, .style = FIFO_TX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 4, .style = FIFO_RX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 5, .style = FIFO_TX, .maxpacket = 256, },
    { .hw_ep_num = 5, .style = FIFO_RX, .maxpacket = 64, },
    { .hw_ep_num = 6, .style = FIFO_TX, .maxpacket = 256, },
    { .hw_ep_num = 6, .style = FIFO_RX, .maxpacket = 64, },

    I verified that this mode was used by enabling the debug printout.

    Without the debug printouts, I verified the timing but no improvement was observed.

    Are you able to advise if I missed out anything on the enabling of the double buffering?

    As for increasing the buffer length beyond 16KB, the reason is that I saw a lot of I/O commands between the application (user space) and kernel space to submit and release URBs.

    Initially, I suspected the copying of data from kernel space to user space during "processcompl", but this copying actually takes a very short time.

    Hence the thought of increasing the buffer length to reduce the number of URB requests, which in turn will reduce the amount of I/O activity.

    Regards

    May
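
As a rough cross-check of the "fits in 32KB" comment, the FIFO RAM consumed by the quoted rows can be added up. This is a simplified user-space sketch, not the driver's own accounting: the struct is an illustrative mirror of the table entries, it assumes endpoint 0 reserves 64 bytes and that a double-buffered entry takes twice its maxpacket, and it counts only the twelve rows shown above because the listing is truncated.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative mirror of a FIFO table entry (not the kernel struct). */
enum buf_mode { BUF_SINGLE = 0, BUF_DOUBLE = 1 };

struct fifo_entry {
    unsigned      hw_ep_num;
    unsigned      maxpacket;   /* bytes */
    enum buf_mode mode;
};

#define EP0_FIFO_BYTES 64u  /* assumption: endpoint 0 reserves 64 bytes */

/* Sum the FIFO RAM a table would consume; double buffering doubles an entry. */
static unsigned fifo_bytes_used(const struct fifo_entry *tbl, size_t n)
{
    unsigned total = EP0_FIFO_BYTES;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += tbl[i].maxpacket * (tbl[i].mode == BUF_DOUBLE ? 2u : 1u);
    return total;
}

/* The twelve mode 6 rows quoted above, TX then RX for each endpoint. */
static const struct fifo_entry mode_6_rows[] = {
    {1, 512, BUF_DOUBLE}, {1, 512, BUF_DOUBLE},
    {2, 512, BUF_DOUBLE}, {2, 512, BUF_DOUBLE},
    {3, 512, BUF_DOUBLE}, {3, 512, BUF_DOUBLE},
    {4, 512, BUF_DOUBLE}, {4, 512, BUF_DOUBLE},
    {5, 256, BUF_SINGLE}, {5, 64, BUF_SINGLE},
    {6, 256, BUF_SINGLE}, {6, 64, BUF_SINGLE},
};
```

Passing mode_6_rows and a count of 12 to fifo_bytes_used() gives the bytes claimed by the visible part of the table, which can then be compared against the FIFO RAM size documented in the TRM.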

  • May,

    Lee, May Fong said:

    And I also had enabled double buffering via setting the fifo_mode = 6 in "struct musb_hdrc_config" and also at the following code snippet:

    ....

    } else if (cpu_is_ti81xx()) {

          musb_config.fifo_mode = 6;  //original 4;

    ...

    }

    In "linux-2.6.37-psp04.04.00.01/drivers/usb/musb/musb_core.c", "mode_6_cfg" is defined:

    /* mode 6 - fits in 32KB */
    static struct musb_fifo_cfg __devinitdata mode_6_cfg[] = {
    { .hw_ep_num = 1, .style = FIFO_TX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 1, .style = FIFO_RX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 2, .style = FIFO_TX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 2, .style = FIFO_RX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 3, .style = FIFO_TX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 3, .style = FIFO_RX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 4, .style = FIFO_TX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 4, .style = FIFO_RX, .maxpacket = 512, .mode = BUF_DOUBLE,},
    { .hw_ep_num = 5, .style = FIFO_TX, .maxpacket = 256, },
    { .hw_ep_num = 5, .style = FIFO_RX, .maxpacket = 64, },
    { .hw_ep_num = 6, .style = FIFO_TX, .maxpacket = 256, },
    { .hw_ep_num = 6, .style = FIFO_RX, .maxpacket = 64, },

    I verified that this mode was used by enabling the debug printout.

    Without the debug printouts, I verified the timing but no improvement was observed.

    Are you able to advise if I missed out anything on the enabling of the double buffering?

    I think you are enabling the double buffering correctly. See the below link for some info:

    http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/100/t/7608.aspx

    Also check that your setup matches the DM814x TRM USBSS chapter regarding double (packet) buffering, both functionality and register settings.

    I will also check with the USB team regarding their opinion.

    Regards,
    Pavel

  • Below is the feedback from one of our USB experts:

    Since USB is a packet-oriented interface that causes an interrupt on every single packet transferred on the bus, it is sensitive to interrupt latency and buffer loading/fetching time. Since the buffer load/fetch time depends mostly on the DMAs to the USB controller (excluding slowly responding software in userspace), there are three things that could be done to increase the throughput:

    1. Decrease the interrupt latency on the ARM side: set the OPP to the max CPU frequency.

    2. Reduce the number of interrupts generated by USB.
    Enable double packet buffering (DPB) on both the RX and TX endpoints you use (take care about ZLPs in DPB mode - read all USB controller errata sheets). Keep in mind that enabling DPB will double the FIFO size required by that endpoint. YOU MUST reduce some other endpoint's FIFO size to ensure proper configuration, otherwise it may lead to unpredictable behaviour of the USB controller.

    Something like the following (edit your own configuration; this is only an example showing a mode 4 modification on EP1 and EP2). Be sure to edit the endpoints you are using for bulk transfers:

    /* mode 4 - fits in 16KB */
    static struct musb_fifo_cfg __devinitdata mode_4_cfg[] = {
    -{ .hw_ep_num =  1, .style = FIFO_TX,   .maxpacket = 512, },
    -{ .hw_ep_num =  1, .style = FIFO_RX,   .maxpacket = 512, },
    -{ .hw_ep_num =  2, .style = FIFO_TX,   .maxpacket = 512, },
    -{ .hw_ep_num =  2, .style = FIFO_RX,   .maxpacket = 512, },
    +{ .hw_ep_num =  1, .style = FIFO_TX,   .maxpacket = 512, .mode = BUF_DOUBLE, },
    +{ .hw_ep_num =  1, .style = FIFO_RX,   .maxpacket = 512, .mode = BUF_DOUBLE, },
    +{ .hw_ep_num =  2, .style = FIFO_TX,   .maxpacket = 512, .mode = BUF_DOUBLE, },
    +{ .hw_ep_num =  2, .style = FIFO_RX,   .maxpacket = 512, .mode = BUF_DOUBLE, },

    And "steal" the appropriate amount of memory from another endpoint (ISO EPs are very convenient for that, because of their large packet sizes):

    -{ .hw_ep_num = 13, .style = FIFO_RXTX, .maxpacket = 4096, },
    +{ .hw_ep_num = 13, .style = FIFO_RXTX, .maxpacket = 2048, },

    3. Reduce the latency timer in the device descriptor to 1, like this (the example is taken from the MTP driver; edit the appropriate descriptors on the device side!):

    static struct usb_endpoint_descriptor mtp_intr_desc = {
            .bLength                = USB_DT_ENDPOINT_SIZE,
            .bDescriptorType        = USB_DT_ENDPOINT,
            .bEndpointAddress       = USB_DIR_IN,
            .bmAttributes           = USB_ENDPOINT_XFER_INT,
            .wMaxPacketSize         = __constant_cpu_to_le16(INTR_BUFFER_SIZE),
    -       .bInterval              = 6,
    +       .bInterval              = 1,
    };
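
To illustrate what the bInterval change in item 3 means in time, here is a minimal sketch. It assumes a high-speed interrupt endpoint, for which the USB 2.0 specification defines the polling period as 2^(bInterval - 1) microframes of 125 us each. The function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Polling period in microseconds of a high-speed interrupt endpoint:
 * 2^(bInterval - 1) microframes of 125 us each; bInterval must be 1..16. */
static uint32_t hs_interrupt_period_us(uint8_t bInterval)
{
    assert(bInterval >= 1 && bInterval <= 16);
    return 125u << (bInterval - 1);
}
```

Under that assumption, bInterval = 6 corresponds to a 4 ms polling period and bInterval = 1 to 125 us. On full-speed interrupt endpoints bInterval is instead counted directly in 1 ms frames.
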
  • Hi Pavel,

    Thank you for your help.

    Of the 3 changes suggested by your USB experts, I have tried items 1 and 2 but observed no significant improvement in the USB transfer speed.

    For item 1 (setting the OPP to the max CPU frequency to decrease the interrupt latency on the ARM side), our DM8148 chipset's max CPU frequency is 720MHz, and we are already running at this speed.

    We tried to overclock it to 1GHz and observed a 50 ms improvement but still not sufficient.

    As for item 3 "Reduce the latency timer in the device descriptor", I believe I need to change this bInterval parameter on the device side which I do not have access to hence I could not verify if this helps.

    Please advise if my understanding is incorrect. Thanks.

    A question for your USB experts: if these 3 suggestions have been implemented and no significant improvement in the USB transfer speed is observed, does this mean that the USB transfer speed has reached its limit for the DM8148 platform?

    And for this question on the TI benchmark data:

    And also in the TI benchmark data under this link http://processors.wiki.ti.com/index.php/TI81XX_PSP_04.04.00.02_Feature_Performance_Guide#Performance_Benchmarks_.28DMA_mode.29

    Referring to the table "USB Host DMA-Read Performance Values", column "Buffer Size (in KBytes)".

    What does "Buffer Size" refer to? Does it refer to the URB buffer length?


    Are you aware what this buffer size refers to?

    Regards

    May

  • I suggest that you first, without any changes, take bus traces and analyze where the time is lost. Analyze the traffic pattern and the turnaround response time between host and gadget for every USB transfer. Analyze whether the time is lost on the host side or the gadget side.

    I don't think it is a driver-related issue. If data is transferred in bulk, e.g. 64KB for every URB, the inter-packet gap (for 512-byte packets) should be very small. You can measure how much time the bus/DMA takes to transfer the data to/from memory.

    Regards

    Ravi
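
Before a bus trace is available, the figures already reported in this thread give a rough reference point for the inter-packet timing. The sketch below is an averaged estimate, not a trace measurement: it assumes "10MB" means 10 x 1024 x 1024 bytes, 512-byte bulk packets, and the high-speed limit of 13 bulk packets per 125 us microframe. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define BULK_MAX_PACKET 512u  /* bytes per high-speed bulk packet */

/* Average wall-clock time per 512-byte packet, in microseconds. */
static double avg_us_per_packet(uint64_t total_bytes, double total_ms)
{
    uint64_t packets = (total_bytes + BULK_MAX_PACKET - 1) / BULK_MAX_PACKET;

    assert(packets > 0);
    return total_ms * 1000.0 / (double)packets;
}

/* Best case on a high-speed bus: 13 bulk packets share one 125 us microframe. */
static double ideal_us_per_packet(void)
{
    return 125.0 / 13.0;
}
```

For 10MB in 550msec this averages to roughly 27 us per packet against a best case of about 9.6 us, so on average around 17 us per packet is being spent somewhere other than on the wire. A bus trace would show whether that gap sits on the host or the device side.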