C6748 USB throughput

Other Parts Discussed in Thread: OMAP-L138, AM1808

Hello,
a customer reports that the USB data throughput is very slow.

GPIO_SET_DATA8 = LITERAL_BIT_12;
USBBufferWrite(&s_sTxBuffer, &pOutDataBuffer[Offset], sc_uiBytesPerTxPacket);
GPIO_CLR_DATA8 = LITERAL_BIT_12;

Transmitting a 1024-byte packet over endpoint 2 takes 550 us (about 1.8 MBytes/s). The result is the same with USB 1.1 and USB 2.0.
The host PC shows the correct USB version. Is there any setting in the g_pBulkDeviceDescriptor entry that would increase the data rate?
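
For reference, the 1.8 MBytes/s figure follows directly from the GPIO pulse width. The sketch below shows the arithmetic only; the function names are made up for this illustration, and 1 MByte = 10^6 bytes is assumed.

```c
#include <stdint.h>

/* Throughput in bytes per second for `bytes` moved in `elapsed_us`
 * microseconds. Returns 0.0 for a zero-length interval. */
double throughput_bytes_per_sec(uint32_t bytes, uint32_t elapsed_us)
{
    if (elapsed_us == 0u) {
        return 0.0;
    }
    return (double)bytes * 1e6 / (double)elapsed_us;
}

/* The same figure in MBytes/s (assumption: 1 MByte = 10^6 bytes). */
double throughput_mbytes_per_sec(uint32_t bytes, uint32_t elapsed_us)
{
    return throughput_bytes_per_sec(bytes, elapsed_us) / 1e6;
}
```

With the values from the post, throughput_mbytes_per_sec(1024, 550) comes out at roughly 1.86, which matches the quoted 1.8 MBytes/s.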

Regards
Holger

  • Hi Holger,

    There is no configuration available in g_pBulkDeviceDescriptor to speed up the data rate.

    Please refer to the thread below on the same topic.

    http://e2e.ti.com/support/embedded/starterware/f/790/p/180775/674580.aspx

  • Hi Holger,

    The bInterval field of the endpoint descriptor can be modified to improve the speed.

    Bus Speed   Maximum Latency     bInterval units
    High        125 usec - 4 sec    125 usec
    Full        1 - 255 msec        1 msec
    Low         10 - 255 msec       1 msec

    From the table it can be seen that the smallest possible interrupt latency that can be achieved between a device and the host is 125 usec. The device requests its interrupt latency by setting the bInterval field of the endpoint descriptor for the corresponding interrupt endpoint. For full- and low-speed endpoints the requested latency is (bInterval) x (bInterval units); for high-speed endpoints the USB 2.0 specification encodes it as 2^(bInterval-1) x 125 usec, which is why the high-speed range reaches about 4 sec.
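
As a rough illustration of the table, the sketch below converts a bInterval value into the requested polling period for an interrupt endpoint. The enum and function names are hypothetical. The ranges follow the table above, and the high-speed case uses the 2^(bInterval-1) x 125 usec encoding from the USB 2.0 specification (bInterval 1 to 16).

```c
#include <stdint.h>

typedef enum { USB_SPEED_LOW, USB_SPEED_FULL, USB_SPEED_HIGH } usb_speed_t;

/* Polling period in microseconds requested by an interrupt endpoint.
 * Returns 0 if bInterval is outside the range listed for that speed. */
uint32_t interrupt_period_us(usb_speed_t speed, uint8_t bInterval)
{
    switch (speed) {
    case USB_SPEED_HIGH:
        /* 2^(bInterval-1) microframes of 125 us each, bInterval 1..16. */
        if (bInterval < 1u || bInterval > 16u) {
            return 0u;
        }
        return (uint32_t)125u << (bInterval - 1u);
    case USB_SPEED_FULL:
        /* bInterval frames of 1 ms each, 1..255. */
        if (bInterval < 1u) {
            return 0u;
        }
        return (uint32_t)bInterval * 1000u;
    case USB_SPEED_LOW:
        /* bInterval frames of 1 ms each; the table lists 10..255. */
        if (bInterval < 10u) {
            return 0u;
        }
        return (uint32_t)bInterval * 1000u;
    }
    return 0u;
}
```

Note that these periods describe interrupt endpoints. For the bulk endpoints discussed in this thread, bInterval does not schedule transfers in the same way.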

  • Hello,
    the customer still gets a data throughput of only 2 MByte/s with USB 2.0. Theoretically he should get 60 MByte/s!

    Is there an example project for a USB 2.0 bulk device? (The one in StarterWare is only for USB 1.1.)

    Is the USB DMA (CPPI DMA engine) enabled by default? Would it speed up the data throughput?

    Regards
    Holger

  • Hi Holger,

    The StarterWare driver does not set the high-speed enable bit in the POWER register. Please see the code snippet from StarterWare below.

    Path: ..\Texas Instruments\pdk_C6748_2_0_0_0\C6748_StarterWare_1_20_03_03\drivers\usb.c

    Function:

    void USBDevConnect(unsigned int ulBase)
    {
        /* Check the arguments. */
        ASSERT(ulBase == USB0_BASE);

        /* Enable connection to the USB bus. */
        HWREGB(ulBase + USB_O_POWER) |= USB_POWER_SOFTCONN;
    }

    Path: ..\Texas Instruments\pdk_C6748_2_0_0_0\C6748_StarterWare_1_20_03_03\include\hw\hw_usb.h

    #define USB_POWER_ISOUP 0x00000080 // Isochronous Update
    #define USB_POWER_SOFTCONN 0x00000040 // Soft Connect/Disconnect
    #define USB_POWER_RESET 0x00000008 // RESET Signaling
    #define USB_POWER_RESUME 0x00000004 // RESUME Signaling
    #define USB_POWER_SUSPEND 0x00000002 // SUSPEND Mode
    #define USB_POWER_PWRDNPHY 0x00000001 // Power Down PHY

    As per OMAP-L138 TRM Section 35.4.22, 

    Bit 5, HSEN: When set, the USB controller will negotiate for high-speed mode when the device is reset by the hub. If not set, the device will only operate in full-speed mode.

    Suggestion:

    By enabling HSEN, we can achieve USB high speed.
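
A minimal sketch of the suggested bit manipulation, operating on a plain value rather than the real register so it can be tried off-target. The EXAMPLE_ macro names are hypothetical. Only the bit position (bit 5, from the TRM excerpt) and the SOFTCONN value (from the hw_usb.h listing) come from this post.

```c
#include <stdint.h>

/* Bit 5 of POWER per the TRM excerpt; the macro name is hypothetical. */
#define EXAMPLE_POWER_HSEN     0x20u
/* Same value as USB_POWER_SOFTCONN in the hw_usb.h listing. */
#define EXAMPLE_POWER_SOFTCONN 0x40u

/* Return the POWER value with HSEN set and all other bits unchanged. */
uint8_t power_with_hsen(uint8_t power)
{
    return (uint8_t)(power | EXAMPLE_POWER_HSEN);
}

/* Nonzero if HSEN is set in the given POWER value. */
int power_hsen_is_set(uint8_t power)
{
    return (power & EXAMPLE_POWER_HSEN) != 0u;
}
```

On the target, the equivalent would presumably be a read-modify-write through HWREGB(ulBase + USB_O_POWER), in the same style as USBDevConnect, done before the soft connect.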


    Holger:

    Is the USB DMA (CPPI DMA Engine) enabled per default? Would it speed up the data throughput?

    I will get back to you shortly on this.

    Thanks.

  • Hello,
    the customer tested with the HSEN bit set, but it makes no difference (it is also set by default!). He already gets data rates higher than USB full speed anyway.
    Please let me know the details of the USB DMA.

    Regards
    Holger

  • Holger said:
    Is the USB DMA (CPPI DMA Engine) enabled per default? Would it speed up the data throughput?

    1. No, it is not enabled by default. The cppi41dma.c driver has APIs for DMA configuration. (..\Texas Instruments\pdk_C6748_2_0_0_0\C6748_StarterWare_1_20_03_03\drivers\cppi41dma.c)

    2. The  usb_dev_msc example uses the CPPI DMA for transfer.

    The existing StarterWare USBLib Bulk Class is implemented to support only PIO mode for endpoint I/O.

    Please refer to the StarterWare thread below for details.

    http://e2e.ti.com/support/embedded/starterware/f/790/t/184201.aspx

    Below are a few pointers you can use to improve the throughput on the StarterWare USBLib side:
    - Using CPPI-DMA for I/O of the Bulk EndPoint (and EDMA to transfer anywhere within the application) can increase the transfer rate.
    - Setting the maximum packet size to 512 will also improve the throughput.

  • Hello,
    reading the register with
    unsigned int x = HWREGB(ulBase + USB_O_POWER);
    returned the value 0x70, so bit 5 (HSEN) is set. It is set in the function USBDCDInit(...), so this should be OK.
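
To make the 0x70 reading easier to interpret, here is a small decoder sketch. The bit values for ISOUP, SOFTCONN, RESET, RESUME, SUSPEND and PWRDNPHY are taken from the hw_usb.h listing earlier in the thread, and bit 5 (HSEN) from the TRM excerpt. Naming bit 4 "HSMODE" (high-speed mode negotiated) is an assumption based on the usual layout of this type of controller and should be checked against the TRM.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t mask;
    const char *name;
} power_bit_t;

/* POWER register bits, most significant first. */
static const power_bit_t k_power_bits[] = {
    { 0x80u, "ISOUP" },
    { 0x40u, "SOFTCONN" },
    { 0x20u, "HSEN" },
    { 0x10u, "HSMODE" },   /* assumption: not in the listing above */
    { 0x08u, "RESET" },
    { 0x04u, "RESUME" },
    { 0x02u, "SUSPEND" },
    { 0x01u, "PWRDNPHY" },
};

/* Write the names of all set bits into `out`, separated by '|'.
 * Returns the number of names written in full; decoding stops early
 * if `out` is too small for the next name. */
int power_decode(uint8_t value, char *out, size_t out_len)
{
    size_t used = 0;
    size_t i;
    int count = 0;

    if (out == NULL || out_len == 0u) {
        return 0;
    }
    out[0] = '\0';
    for (i = 0; i < sizeof k_power_bits / sizeof k_power_bits[0]; i++) {
        if ((value & k_power_bits[i].mask) != 0u) {
            int n = snprintf(out + used, out_len - used, "%s%s",
                             count > 0 ? "|" : "", k_power_bits[i].name);
            if (n < 0 || (size_t)n >= out_len - used) {
                break; /* buffer full */
            }
            used += (size_t)n;
            count++;
        }
    }
    return count;
}
```

Under that assumption, 0x70 decodes to SOFTCONN, HSEN and HSMODE, which would suggest that high speed was not only enabled but actually negotiated with the host.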

    The customer also found the functions "USBEndpointDMAEnable" and "USBEndpointDMADisable". Calling them didn't change anything. Is there any documentation on the CPPI DMA API?
    E.g. what does "USB_EP_DEV_IN" mean? Is this USB nomenclature? (IN endpoint: client -> host, or "..DEV_IN"?)

    The customer's maximum throughput is now 4.5 MByte/s. He has to reach 20 MByte/s. Isn't there any example project that reaches >= 20 MByte/s?

    Regards
    Holger

  • Hi,

    > Is there any documentation on the CPPI DMA API?

    The API documentation can be found under the StarterWare installation path.

    > what does "USB_EP_DEV_IN"? Is this USB nomenclature? (IN-Endpoint: Client->Host or "..DEV_IN" )

    Yes, you are correct. On an IN endpoint, data flows from device to host.

    > Isn't there any example project where we reach >=20MByte/s?

    I could not find any example projects. Did the suggestions in the earlier post help improve the performance?

    Thanks.

  • Hello,
    with the current suggestions the customer could increase the data throughput to 4.5 MByte/s. But he needs 20 MByte/s!

    Regards
    Holger

  • Hi Holger,

    As suggested by Rajashekaran, your best bet would be to use the usb_dev_msc example and strip off the MSC layer. This would involve some work, but your performance bottleneck would be reduced. My guess is that with the CPPI DMA configured in transparent mode you would get a raw performance of around 12-13 MB/s with the stripped-down MSC code (at least this is what I got on AM335x).

    The bulk_dev example runs in PIO mode, so you would find it difficult to push its performance.

  • Hi Holger,

    I have a few questions.

    1. What is the CPU speed they are using?

    2. Is there any limitation in increasing the CPU speed if they are operating at lower frequency?

    I am not able to download the Windows host driver from  StellarisWare download page. The provided link is not accessible. 

    Wiki Link: http://processors.wiki.ti.com/index.php/StarterWare_USB#Running_the_Example_Application

    If you have those files please upload them.

    Thanks.

  • Hello,
    they use the 456 MHz Q100 derivative.

    > 2. Is there any limitation in increasing the CPU speed if they are operating at lower frequency?
    What do you mean by this question?

    > My guess is with the CPPI DMA configured in transparent mode of operation you would get a RAW performance
    > of around 12- 13 MB/s with the stripped down MSC code ( At least this is what i got in AM335x  ).
    By the way, is there a working CPPI DMA driver?
    Did you actually measure 12-13 MByte/s? Has this been verified?

    You will find the Windows USB driver attached.

    Regards
    Holger


    libusb-win32-bin-1.2.6.zip
  • Hi Holger

    I see that you have edited your previous response to say that the customer is using 456 MHz, not 300 MHz. I think this clarifies Raja's Q2. The team just wanted to make sure that the customer is not using a Q3 part or otherwise limited to running the DSP at a maximum of 300 MHz.

    Thanks.

  • Hi Holger,

    > 2. Is there any limitation in increasing the CPU speed if they are operating at lower frequency?
    What do you mean with this question?

    [Raja] The throughput is improved around 2MB/s with 456MHz CPU speed.

    > My guess is with the CPPI DMA configured in transparent mode of operation you would get a RAW performance of around 12- 13 MB/s with the stripped down MSC code ( At least this is what i got in AM335x ).
    Btw, is there a running CPPI DMA driver?

    [Holger] Did you already measure 12-13MByte/s? Has this been verified?

    [Raja] I measured the throughput of the USB device mass storage class example (usb_dev_msc); it was around 9.1-9.3 MB/s at 456 MHz CPU speed.
    I then modified usb_dev_msc to measure raw performance, which was around 12-13 MB/s.
    I am working on creating a USB bulk device (usb_dev_bulk) example with CPPI DMA enabled.

    [Holger] You find the Win USB driver attached.

    [Raja] Thanks. I have downloaded the USB Windows driver & application from below links.
    http://www.ti.com/tool/sw-usb-windrivers
    http://www.ti.com/tool/sw-usb-win&DCMP=STELLARIS&

    Thanks.

  • Hi Holger,

    I have enabled CPPI DMA for TX and RX transfers in the USB bulk driver and the usb_dev_bulk example (modified files attached).

    I can receive data from the host successfully with DMA, but I am facing some issues when transmitting back to the host (currently debugging).

    Build Procedure:

    1. While building the usblib (usblib_c674x_c6748), enable the BULK_DMA_MODE & DMA_MODE compiler predefined symbols. 

    2. While building the usb bulk example (usb_dev_bulk_c674x_c6748_lcdkC6748), enable the BULK_DMA_MODE compiler predefined symbol.

     

  • Hi Holger,

    A few updates to the earlier discussion points.

    Setting the maximum packet size to 512 will also improve the throughput:

    usb_dev_bulk was tested with a maximum packet size of 256. The following changes are required for a maximum packet size of 512.

    1. Update the following macros in usbdbulk.c as below

        #define DATA_IN_EP_MAX_SIZE 512

        #define DATA_OUT_EP_MAX_SIZE 512

    2. Update the following macro in usb_bulk_structs.c

        #define BULK_BUFFER_SIZE 512

    3. Rebuild the usblib and usb bulk examples.

    4. The USB Host application should be modified to transmit / receive the 512 bytes packet.


    EDMA to transfer anywhere within the application:

    1. No update is required in the StarterWare USB driver.

    2. Integrate the StarterWare EDMA example with the USB application (replace memcpy with DMA transfers).


    Note: With reference to the previous post, please delete DEBUG from the compiler predefined symbols when building usblib and the example.

    DEBUG enables console UART debug messages.

    Thanks.

  • Hi Holger,

    The below numbers were achieved by running the CPU at 456MHz with USB 512 bytes max transfer size.

    1. The final throughput with usb_dev_msc(mass storage class) file system write is 9.1-9.3MB/s.

    2. The raw USB transfer throughput with the MSC layer stripped is around 12.3-12.9MB/s.

    I used usb_dev_msc as a reference and implemented usb_dev_bulk with CPPI DMA at high speed.

    I am working on further performance optimization.

    Thanks.

  • Hi Holger,

    I have fixed the USB packet transmit issue reported earlier and attached the updated source.

    Enable BULK_DMA_MODE compiler predefined symbol in usblib & usb_dev_bulk, no change in the build procedure. 

    Holger said:

    > I referred usb_dev_msc and implemented usb_dev_bulk with CPPI DMA with high speed. 
    > I am working on further optimization for performance. 
    I understand that you are working on the usb_dev_bulk SW with CPPI DMA but don't have throughput data yet. Is this correct?
    When do you think you will have first results?

    Yes, your understanding is correct. So far I could measure throughput only for the usb_dev_msc example.

    I do not have a USB host application that can measure throughput for usb_dev_bulk. The TI wiki links do not help with measuring usb_dev_bulk throughput either.

    Thanks.

  • Hello,
    the customer measures the throughput with the following methods:
    - setting a GPIO high before receiving or sending a 512-byte packet, and setting it low afterwards
    - average data throughput: continuous receiving and sending, with data amount / time measured by the PC (e.g. 500 kB per frame, 8 fps => 4 MB per second). Of course this data rate is lower than for single data packets.
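
The two methods give different kinds of numbers: the GPIO pulse gives a burst rate for one packet, the frame count gives a sustained rate. A minimal sketch of how the two could be related is shown below. The function names are hypothetical, and treating the ratio as "fraction of time spent inside the transfer call" is only a rough interpretation.

```c
#include <stdint.h>

/* Sustained throughput in bytes/s for a stream of fixed-size frames. */
uint64_t sustained_bytes_per_sec(uint32_t bytes_per_frame,
                                 uint32_t frames_per_sec)
{
    return (uint64_t)bytes_per_frame * frames_per_sec;
}

/* Ratio of sustained rate to per-packet burst rate. A value well
 * below 1.0 hints that time is lost between packets rather than
 * during them. Returns 0.0 if the burst rate is not positive. */
double active_fraction(uint64_t sustained_bps, double burst_bps)
{
    if (burst_bps <= 0.0) {
        return 0.0;
    }
    return (double)sustained_bps / burst_bps;
}
```

For the example in the post, sustained_bytes_per_sec(500000, 8) gives 4000000 bytes/s, i.e. the quoted 4 MB per second.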

    Regards
    Holger

  • Hi Holger,

    Thanks.

    I have integrated GPIO handling in usb_dev_bulk example and it is working fine. 

    However, the Stellaris Windows USB host examples do not support 512-byte transfers (they support 256 bytes) or continuous packet transfers.

    I am getting some unexpected throughput numbers and am currently working out the appropriate place for the GPIO toggle (with 256-byte transfers).

    In the meantime, is it possible for you to check it at your end?

  • Hi Holger,

    Got 5.7-6.0 MB/s Write throughput on usb_dev_bulk(with CPPI DMA) with 256 Bytes USB Packet Size & 456MHz CPU speed. 

    Please find the updated source files(with GPIO integrated) in the attachment.

    Note: 

    • I could check only for 256 packet size because host application does not support 512 bytes.
    • I am running these benchmarks in C6748 LCDK.
    • Updated the gel file to change the CPU frequency(456MHz).

    Link to GEL file: 3326.5141.456MHZ_C6748_LCDK.gel

  • Hi Rajasekaran,

          Thanks for your test results. It's very useful.

    BTW, is there any test result for Usb_host_msc?

     

  • Hi Gary Wu,

    We don't have any performance data on USB Starterware for host or device. 

    I have worked on usb_dev_bulk example to improve USB Starterware performance(Customer Issue). 
    The device USB performance is measured on Linux and BIOS USB drivers.
    The performance data on Linux can be found at the following wiki links (03.21.00.04 and 03.22.00.02 PSP releases).
    Please refer to the datasheet at the link below for BIOS USB performance data.
    Thanks.
  • Rajasekaran,

    I did not find the attachment with your modified files. Would you please share it again?

  • Hi Tony Tang,

    Please accept my friend request. I will send the source files in e-mail.

    Note: This source has not been officially released. 

    Thanks.

  • Hi Tony Tang,

    Have you received the file? I have sent via e2e conversations.

    If not received, please provide your e-mail id to send.

    Thanks.

  • Hi Rajasekaran,

        I did not find the attachment either. Could you send it to me?

        My E-mail address:qiyejun@aliyun.com

        Thank you very much!

    Best regards

    June

  • hi rajasekaran,

    currently i'm trying to implement support of cppi41dma into my project, but facing some problems...

    would you mind sending me your code, so i can test it here and search for my error...

    thank you very much

    best regards

    peter

  • Peter and all,

    I would suggest against using the CPPI 4.1 DMA, because after a year of development I was still unable to ensure it worked error free. Under light loads it seems to run fine, but under stress testing it would occasionally lock up. On closer inspection in the debugger, all CPPI queues would be empty, including the ones I had allocated as free queues and the Tx and Rx queues, as if a (hardware?) race condition were occurring somehow.

    I have however developed a working USB CDC-ACM driver(serial port) that can easily reach >20MB/s using only the 300MHz clock, with rates often above 30MB/s. The company I work for is now licencing the driver and I would recommend against going through the heartache of trying to get a high speed USB driver working, and especially not using the CPPI41DMA.

    If you want any more information please feel free to contact me:

    peter.myerscough-jackopson@macltd.com

    or see

    http://www.macltd.com/usbdriver

    (Includes a link to our manual and API documentation)

  • Hey Rajasekaran,

    I have successfully run the usb_dev_bulk example (with CPPI DMA) with a 512-byte USB packet size and 456 MHz CPU speed, but the USB transfer speed (read/write) is only about 1 MB/s.

    I guess the reason for the low speed lies in the 'memcpy' calls in USBDBulkPacketWrite and USBDBulkPacketRead; these could be replaced by EDMA.

    On the other hand, enlarging the transfer size is another way to improve performance, but when my host application initiates a bulk write of 1024 bytes, the result is an error.

    usb_bulk_write(dev, EP_IN, data_in, 1024, 5000);

    I attempted to modify the endpoint information as follows:

    #ifdef DMA_MODE
        endpointInfo epInfo[]=
        {
            {
                USB_EP_TO_INDEX(USB_EP_1),
                CPDMA_DIR_RX,
                CPDMA_MODE_SET_GRNDIS,
            },
            {
                USB_EP_TO_INDEX(USB_EP_1),
                CPDMA_DIR_TX,
                CPDMA_MODE_SET_GRNDIS,
            }
        };

    #endif

    But the result is still error, so please give me some help.

    Best Regards,

    Sting Wang

  • Hi,

    Please create a new thread since old threads will be given lesser attention compared to the new one.

    Thanks & regards,

    Sivaraj K

  • Hi Sting Wang,

    1. Your understanding is correct. You need to replace memcpy with EDMA transfers.

    2. Also refer to the usb_dev_msc application for how to enable DMA transfers on the endpoints.

    I modified usb_dev_bulk for throughput testing by using usb_dev_msc as a reference.

    Thanks.

  • Hi Rajasekaran,

    I have also modified usb_dev_bulk for throughput testing. My result is 54 Mbit/sec. But as I understand from your posts, you have achieved about 96 Mbit/sec. Can you share your test source code?
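
Since the thread switches between Mbit/s and MByte/s at this point, a small conversion sketch may help when comparing figures. The function names are made up for this illustration, and decimal units are assumed throughout.

```c
/* Convert a rate in Mbit/s to MByte/s (8 bits per byte). */
double mbit_to_mbyte(double mbit_per_sec)
{
    return mbit_per_sec / 8.0;
}

/* Convert a rate in MByte/s to Mbit/s. */
double mbyte_to_mbit(double mbyte_per_sec)
{
    return mbyte_per_sec * 8.0;
}
```

With these, 54 Mbit/s is 6.75 MByte/s and 96 Mbit/s is 12 MByte/s, which lines up with the 12-13 MB/s raw figure reported earlier. The 20 MByte/s target corresponds to 160 Mbit/s, and the 480 Mbit/s high-speed signalling rate to the 60 MByte/s quoted near the top of the thread.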

    Thanks.

  • hi kemal,

    how did you reach the 54mbit/s? would you mind sharing some example code?

    thank you for your help

  • Peter,

    If you are struggling to reach 54Mbit/s, may I suggest my company's driver. It can easily reach 20MBytes/s (20MBytes/s == 160Mbit/s).

    In an application that was continuously streaming data from the UPP we managed to get it working at above 30MBytes/s, with a 300MHz DSP part.

    It is written for the DSP core, runs under DSP/BIOS, and operates as between one and three CDC-ACM ports. We have a demo version available, and the documentation is freely downloadable from our website.

    http://www.macltd.com/usbdriver

    The licencing for the driver is quite reasonable and should save you a significant amount of time. We spent a long time getting it working and performing well.


    Peter Myerscough-Jackopson

  • Hi Peter,

    I have integrated the usb_dev_bulk example with CPPI DMA. I changed the USB packet size to 1024 bytes and the transfer size to 65536 bytes, so the maximum USB throughput (read/write) is now about 280/200 Mbit/s. But I tested on AM1808, not C67x. I will try to send sample code as soon as possible.

    Kemal

  • Kemal,


    Do be aware that the bulk endpoint for a USB2 link has a maximum official size of 512 bytes, so although the hardware supports this larger packet size I do believe this is non-standard. I would also encourage you to soak test your use of the CPPI DMA. I found it to occasionally lock up and lose my scheduled transfers from all the hardware queues.
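
As a hedged illustration of the point about packet sizes: the USB 2.0 specification allows bulk endpoints a maximum packet size of 8, 16, 32 or 64 bytes at full speed and exactly 512 bytes at high speed. The helper names below are hypothetical and only encode that rule, plus the number of packets a transfer would need.

```c
#include <stdint.h>

typedef enum { BULK_FULL_SPEED, BULK_HIGH_SPEED } bulk_speed_t;

/* Nonzero if wMaxPacketSize is a value the USB 2.0 specification
 * allows for a bulk endpoint at the given speed. */
int bulk_max_packet_size_is_standard(bulk_speed_t speed,
                                     uint16_t wMaxPacketSize)
{
    if (speed == BULK_HIGH_SPEED) {
        return wMaxPacketSize == 512u;
    }
    return wMaxPacketSize == 8u || wMaxPacketSize == 16u ||
           wMaxPacketSize == 32u || wMaxPacketSize == 64u;
}

/* Number of packets needed to move transfer_len bytes, counting a
 * trailing short packet. Returns 0 if max_packet is 0. */
uint32_t bulk_packet_count(uint32_t transfer_len, uint16_t max_packet)
{
    if (max_packet == 0u) {
        return 0u;
    }
    return transfer_len / max_packet +
           ((transfer_len % max_packet) != 0u ? 1u : 0u);
}
```

By this rule a 1024-byte bulk packet size at high speed is outside the specification, and a 65536-byte transfer with standard 512-byte packets needs 128 packets.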

    Peter

  • Peter,


    Thanks for your suggestion and concern. I have done some stress testing but no soak testing. My work is actually about discovering the limits, so there will be some bugs, and improvements will take some more work.

    Kemal

  • Hi,

    I have shared a test project for AM1808 USB bulk mode with CPPI DMA:

    http://e2e.ti.com/support/embedded/starterware/f/790/t/373260.aspx

    Kemal

  • Thanks very much. I have solved my problem by referring to your project.

  • Hi June,

    Thank you for the update. Please help us to close this thread.

     

     

  • Hi
    Where can I get a copy of this Bulk example that uses DMA?
    Thanks
    Dhar

  • Hi Dhar,

    I've attached the USB library file which is modified for USB bulk example with DMA enabled.

    starterware_1_20_04_01.zip

  • Hi

    I used the files you attached

    defined BULK_DMA_MODE

    defined DMA_MODE

    set the USB_POWER_HS_MODE flag

    still only getting 1mbps

    Processor running at 300mhz

    Not much else happening in the system, I can increase the number of data blocks if I reduce the size per block, so it points to a raw throughput issue.

  • It appears the function USBBufferWrite hogs the processor until the data transfer is complete.

    I was expecting this call to be fast after switching to DMA mode.