Linux/OMAPL137-HT: Performance of USB

Part Number: OMAPL137-HT

Tool/software: Linux

Hi, 

I would like to ask about USB performance on the OMAPL137 platform. According to the LSP 02.20 Linux Drivers Datasheet (table on pg 24), the maximum speed attained on CDC devices is 36Mbps. Why is it so low? Shouldn't USB high-speed give a throughput of around 480Mbps?

Thanks,

AQ

  • Hi,

    First and foremost, 480Mbps includes the protocol packets as well. If you consider only the data and not the protocol (which could account for 8-10%), the rate will be less than 480Mbps. Next is the type of transfer. A Bulk transfer will be slower than an isochronous transfer. The USB numbers could have been given for ISO, but CDC uses Bulk, which is another reason for a lower speed than specified. Then there is the CDC protocol overhead, which should not be a big one though. These are the few I could think of at the USB protocol level.

    The major reason is that even if the USB controller is 2.0 compliant and can send/receive data at the specified speed, the actual consumer/dispatcher of the data is an embedded controller. So if the OMAPL137 is not able to send data fast enough to the USB controller, the USB speed will be reduced further. It will be less than what a high-performance desktop could achieve when interfaced to the same USB 2.0 compliant controller.
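
For reference, the protocol-level ceiling discussed in this reply can be put into numbers. This is only a rough sketch: it assumes the commonly cited USB 2.0 high-speed limit of 13 bulk packets of 512 bytes per 125 us microframe, and the constant and function names are illustrative, not taken from the thread or from any driver.

```python
# Rough ceiling for high-speed bulk payload throughput.
# Assumption: at most 13 bulk packets of 512 bytes fit in one 125 us microframe.
SIGNALING_RATE_MBPS = 480.0        # USB 2.0 high-speed signaling rate
MICROFRAMES_PER_S = 8000           # one microframe every 125 us
BULK_PACKET_BYTES = 512            # maximum high-speed bulk packet size
MAX_BULK_PACKETS_PER_UFRAME = 13   # assumed per-microframe packet limit


def bulk_ceiling_mbps() -> float:
    """Best-case bulk payload rate in Mbps under the assumptions above."""
    bytes_per_s = MAX_BULK_PACKETS_PER_UFRAME * BULK_PACKET_BYTES * MICROFRAMES_PER_S
    return bytes_per_s * 8 / 1e6


def protocol_overhead_fraction() -> float:
    """Share of the signaling rate that is not available for bulk payload."""
    return 1.0 - bulk_ceiling_mbps() / SIGNALING_RATE_MBPS


print(f"bulk payload ceiling: {bulk_ceiling_mbps():.1f} Mbps")
print(f"protocol overhead:    {protocol_overhead_fraction():.1%}")
```

Under these assumptions the protocol overhead comes out close to the 8-10% estimate given above, so it does not by itself account for a 36Mbps result.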
  • Dwarakesh,

    With respect, the answer this user is looking for is whether the "OMAPL137 is not able to send data faster". You have said that if the device is slow, the USB rate will be slow, but that is exactly the question we are asking. We want to know if the device is too slow, or the peripheral is too slow, or the Linux drivers are too slow, or if any improvements can be made. It certainly seems like a 456MHz processor could easily keep up with a 5-6MByte/s data rate.

    Getting 36Mbps out of the max 480Mbps is unreasonably low if the drivers claim to support high-speed USB.

    You and your team are very knowledgeable on this device and the Linux support, so if you can expand on the reasons for the lower speed results, it will be greatly appreciated.

    Have you seen support for USB through TI-RTOS on the OMAPL137? Perhaps getting rid of the Linux overhead would improve the performance greatly.

    Thank you for your quick response. Any new information will also be greatly appreciated.

    Regards,
    RandyP
  • Thanks a lot Randy for the expansion here. Appreciate it. 

    I understand that USB would have some overhead and CDC would have some overhead. Even if we assume these to be 20%, the effective published transfer rate would be 43Mbps. This is still 10 times slower than what I would expect. So, as Randy pointed out, I wanted to know where the bottleneck is: Linux, arm9, arm clock, peripheral clock, USB driver, device driver, edma?

    Would appreciate the help

    Thanks,

    AQ
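
The gap described in this post can be checked with a few lines of arithmetic. The sketch below is one possible reading of the 20% assumption, applying it to the 480Mbps signaling rate rather than to the measured figure; the function and constant names are made up for illustration.

```python
# Figures quoted in the thread.
SIGNALING_RATE_MBPS = 480.0   # USB 2.0 high-speed signaling rate
MEASURED_CDC_MBPS = 36.0      # CDC figure from the LSP 02.20 datasheet table
ASSUMED_OVERHEAD = 0.20       # combined USB + CDC overhead assumed in the post


def payload_ceiling_mbps(signaling_mbps: float, overhead: float) -> float:
    """Payload rate left after removing a fractional protocol overhead."""
    return signaling_mbps * (1.0 - overhead)


def shortfall_factor(ceiling_mbps: float, measured_mbps: float) -> float:
    """How many times lower the measured rate is than the ceiling."""
    return ceiling_mbps / measured_mbps


ceiling = payload_ceiling_mbps(SIGNALING_RATE_MBPS, ASSUMED_OVERHEAD)
print(f"payload ceiling: {ceiling:.0f} Mbps")
print(f"measured:        {MEASURED_CDC_MBPS:.0f} Mbps ({MEASURED_CDC_MBPS / 8:.1f} MB/s)")
print(f"shortfall:       {shortfall_factor(ceiling, MEASURED_CDC_MBPS):.1f}x")
```

Either way of applying the 20% leaves roughly a factor of ten between the published CDC figure and what the bus could carry.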

  • Hi,

    1. "Linux, arm9, arm clock, peripheral clock, USB driver, device driver, edma?"

    Me: All of these, since they are either code running on the OMAPL137 or part of the slower OMAPL137 hardware. But the bulk of it is the rate at which the OMAP controller sends data to the USB controller; that is what slows down the USB performance.

    2. "With respect, the answer this user is looking for is whether the "OMAPL137 is not able to send data faster""

    Me: That's the point I tried to convey. The OMAPL137 is not able to send data faster. Why it is slow is down to a combination of the reasons asked about in point 1.

    3. We want to know if the device is too slow, or the peripheral is too slow, or if the Linux drivers are too slow.

    Me: The OMAPL137 is slower than a PC, so the device is slow. The USB peripheral cannot be the slow part, since the same device, e.g. a pen drive, gives me a higher number with the dd command when I connect it to a PC. The Linux drivers on the OMAPL137 are slow because the OMAPL137 is slower.

    4. Getting 36Mbps out of the max 480Mbps is unreasonably low if the drivers claim to support high-speed USB.

    Me: Drivers never claim that directly, since the driver could be made to run on a very slow controller. The claim comes from USB 2.0. So a USB 2.0 controller can/should provide close to 480Mbps. But what if the USB 2.0 controller is fed with data at a very slow rate compared to a PC/desktop running at a higher speed?

    5. Have you seen support for USB through TI-RTOS on the OMAPL137? Perhaps getting rid of the Linux overhead would improve the performance greatly.

    Me: Bare-metal code could have even less overhead than TI-RTOS. But it would only avoid the software overhead (a small improvement); it won't improve the performance significantly, since bare-metal code and TI-RTOS both still run on the same hardware.
  • Hi Dwarakesh, 

    Thanks for the reply.

    Your reply is based on the premise that OMAPL137 is slow and is not able to feed the USB Controller fast enough. So I am trying to do the math here, assuming I am in bulk transfer mode. Correct me where I fall short. 

    To get 60MB/s (480Mb/s) out of the USB controller, it has to be fed with a 32-bit word every 66ns.

    An OMAPL137 clocked at 456MHz executes one instruction every 2.2ns. To get the job done, it has about 30 instruction cycles per word. Should bare-metal code have issues transferring a 32-bit word in 66ns?

    Thanks,

    AQ
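
The cycle budget in this post can be reproduced as follows. This is a minimal sketch using only the figures quoted above; like the post, it assumes one instruction per clock cycle, and the names are illustrative.

```python
BUS_RATE_BYTES_PER_S = 60e6   # 480 Mb/s expressed in bytes per second, as in the post
WORD_BYTES = 4                # one 32-bit word
CPU_CLOCK_HZ = 456e6          # OMAPL137 clock quoted in the thread


def word_period_ns(rate_bytes_per_s: float, word_bytes: int) -> float:
    """Time between consecutive words at the given byte rate."""
    return word_bytes / rate_bytes_per_s * 1e9


def cycle_time_ns(clock_hz: float) -> float:
    """Duration of one CPU clock cycle."""
    return 1e9 / clock_hz


def cycles_per_word(rate_bytes_per_s: float, word_bytes: int, clock_hz: float) -> float:
    """CPU cycles available to move one word (assumes 1 instruction per cycle)."""
    return word_period_ns(rate_bytes_per_s, word_bytes) / cycle_time_ns(clock_hz)


budget = cycles_per_word(BUS_RATE_BYTES_PER_S, WORD_BYTES, CPU_CLOCK_HZ)
print(f"word period: {word_period_ns(BUS_RATE_BYTES_PER_S, WORD_BYTES):.1f} ns")
print(f"cycle time:  {cycle_time_ns(CPU_CLOCK_HZ):.2f} ns")
print(f"budget:      {budget:.1f} cycles per 32-bit word")
```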

  • Hi,

    It's getting interesting.

    My concern regarding your math is the assumption "To get the job done, it gets to execute 30 instruction cycles."

    I would agree with your math if it were a MOV instruction from R1 to R2.

    But that's not the case when moving a 32-bit value to the USB controller.

    There are other hardware components like DDR and DMA involved. I would like to point out that DDR and DMA work at their own frequencies and not at 456MHz. Assuming the speed is dictated by the slowest of the three (CPU, DDR, EDMA), you may need to consider the speed of the DDR (maybe 150MHz), though only for memory-related operations, not for all instructions.

    So the code involved, even assuming a bare-metal case, would be:
    1. Read/Load the 32-bit value
    2. Configure the USB controller
    3. Configure the DMA controller
    4. Time taken for the DMA to actually run (the CPU could be free to perform the next instruction, but assume only a one-byte transfer)
    5. Function entry/exit overhead (I could see a minimum of 8 instructions being added for context save etc., on top of the function's own code)

    We need to consider the instructions for all 5 points above. All 5 steps need to be done whether it is 1 byte or 512 bytes, if they are sent separately. Also, points 1 and 4 execute at DDR speed, not ARM speed.
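
The effect of a fixed per-transfer cost (steps 2, 3 and 5) on top of a per-word cost (steps 1 and 4) can be illustrated with a toy model. The cycle counts below are hypothetical placeholders chosen only to show the shape of the trade-off; they are not measurements of the OMAPL137 or of any driver.

```python
CPU_CLOCK_HZ = 456e6     # clock quoted in the thread
SETUP_CYCLES = 200       # hypothetical fixed cost per transfer: steps 2, 3 and 5
CYCLES_PER_WORD = 12     # hypothetical memory-bound cost per 32-bit word: steps 1 and 4


def effective_mbps(payload_bytes: int,
                   setup_cycles: int = SETUP_CYCLES,
                   cycles_per_word: int = CYCLES_PER_WORD,
                   clock_hz: float = CPU_CLOCK_HZ) -> float:
    """Payload throughput when every transfer pays the fixed setup cost once."""
    words = -(-payload_bytes // 4)  # round up to whole 32-bit words
    total_cycles = setup_cycles + words * cycles_per_word
    seconds = total_cycles / clock_hz
    return payload_bytes * 8 / seconds / 1e6


for size in (4, 64, 512):
    print(f"{size:4d} bytes per transfer -> {effective_mbps(size):7.1f} Mbps")
```

In this model the fixed cost dominates small transfers and is amortized over larger ones, which is why sending data in small separate pieces is so much slower than sending it in bulk.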

  • Hi Dwarakesh,

    Why would we need to configure the USB and DMA controllers in the loop? I assume we are doing a bulk transfer, and these things should be set in motion before we start the move. Also, let's not assume that we use EDMA, so as not to complicate the process. We can assume that the processor is responsible for the transfer 100% of the time.

    So now the only bottleneck I see in this scenario is the DDR (actually, in this case it would be SDRAM). You identified correctly that the DDR would be running at a lower speed and that loading contents from the DDR would take longer than a few instruction cycles.

    If we do this on a DSP, we can make use of the fast DSP RAM. So running bare-metal code on the DSP may help.

    Thanks,
    AQ
  • 1. DMA is not there to complicate things but to free up the CPU. There is definitely an advantage to using DMA; otherwise, why would they have DMA in the chip? It helps to transfer bulk data. Since you took the example of a 32-bit word, it appears to complicate things. But that's not the case most of the time, since bulk data is sent in actual use cases like Mass Storage.
    2. If the USB is not configured every time, won't the USB controller run freely and transfer junk data even when we have nothing to send or don't intend to send? The same holds true for a free-running DMA sending junk data when not intended.
    3. I have no idea about DSP RAM or how we configure it. Do you mean to say the DSP has access to the USB controller? Can you explain this further?

    From what you describe as the design of the software, it looks like there is no other piece of code running on the ARM and its sole purpose is to transfer data to USB. Unfortunately, the ARM9 in the OMAPL137 is an application processor and it is expected to do more work than this.
  • Hi Dwarakesh, 

    1. DMA is not there to complicate things but to free up the CPU. There is definitely an advantage to using DMA; otherwise, why would they have DMA in the chip? It helps to transfer bulk data. Since you took the example of a 32-bit word, it appears to complicate things. But that's not the case most of the time, since bulk data is sent in actual use cases like Mass Storage.

    I understand. 


    2. If the USB is not configured every time, won't the USB controller run freely and transfer junk data even when we have nothing to send or don't intend to send? The same holds true for a free-running DMA sending junk data when not intended.

    That should not be the case. 


    3. I have no idea about DSP RAM or how we configure it. Do you mean to say the DSP has access to the USB controller? Can you explain this further?

    You can configure DSP RAM using a linker command file, and yes, the DSP has access to the USB controller.



    From what you describe as the design of the software, it looks like there is no other piece of code running on the ARM and its sole purpose is to transfer data to USB. Unfortunately, the ARM9 in the OMAPL137 is an application processor and it is expected to do more work than this.

    Yes, that was as intended. 

    I think the conversation is not going anywhere, so I'll leave it here. 

    Thanks for the help, 

    AQ

  • Hi,

    Thanks, it was nice discussing this with you.

  • Likewise Dwarakesh :)