
MSP430F5xxx - is the Windows driver the bottleneck for USB communication?

Other Parts Discussed in Thread: MSP430F5522, MSP430F5438

Hi,

I want to transfer data from an SD card (connected via SPI to the MSP430F5522) out the USB port.  I was going down the path of implementing a CDC (serial interface), but I came across this post:

http://social.msdn.microsoft.com/Forums/en-US/netfxbcl/thread/30e830f7-74a5-4833-bb98-d467a496dcfa/

That post implies that the Windows USB driver is the weak link and will max out at 115.2 kbaud.  I want to get a couple of Mbit/s.  Which transfer mode should I look at?  HID?  Something else?

Thanks,

Reza

  • From the MSP430 USB Developers Package User's Guide (http://www.ti.com/tool/msp430usbdevpack)

    Full-speed USB is rated 12Mbps. This is a theoretical maximum, and it includes protocol overhead – so it isn’t possible for a practical application to achieve this rate for data payload. Further, USB is a system involving many components. Any of these components is capable of reducing the bandwidth.

    The following factors can all affect bandwidth:

    • Host application. USB is very host-driven, which means the host initiates all data transfers, whether sending or receiving. If the host application doesn’t initiate transfers often enough, data will be slower.
    • Bus loading. CDC uses bulk transfers, which have the potential to reach the highest data rates by using any spare bandwidth. However, the tradeoff to this is that bandwidth scarcity will cause slowed transfers.
    • Software on the USB device. If the bus is fast (that is, if the factors mentioned above aren’t limiting), the device’s application will become the bottleneck in the system. This is because it must take time to process data received and prepare data for sending.

    As a benchmark, the CDC API stack can achieve 788KB/sec (6.3Mbps) under the following conditions:

    • 8MHz CPU master clock (MCLK)
    • Send data from the host to the device
    • The MSP430 application calls USBCDC_rejectData() for any data it receives.

    The purpose of the last item is to nearly eliminate the device application as a factor. If the application rejects the data, it doesn't spend any time moving it. In contrast, any real application must handle the data, and thus will get considerably less than 788KB/sec – probably closer to the 200-500KB/sec range (see the handler sketch after this reply).

    If bandwidth is a priority, DMA should be used. (This can be enabled using the Descriptor Tool.) For an application in which bandwidth is primarily limited by MCU data handling, bandwidth is roughly linear to MCLK. Therefore, running at the maximum MCLK can often get bandwidth closer to the 788KB/sec “ideal”.

    Hope this helps.

    Regards,

    Mo.
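
To make the benchmark conditions above concrete, here is a minimal sketch of what the device-side receive path looks like in the two cases described. It assumes the CDC API of the MSP430 USB Developers Package (the USBCDC_handleDataReceived() event plus USBCDC_rejectData() and USBCDC_receiveDataInBuffer(), with CDC0_INTFNUM from the generated descriptors); names and exact signatures vary between stack versions, so treat it as illustrative rather than drop-in code.

```c
/*
 * Minimal sketch (not from the thread) of the device-side receive path for
 * the two cases described above. It assumes the CDC API of the MSP430 USB
 * Developers Package (USBCDC_handleDataReceived event, USBCDC_rejectData,
 * USBCDC_receiveDataInBuffer, CDC0_INTFNUM); exact names and signatures
 * differ between stack versions, so treat this as illustrative only.
 */
#include <stdint.h>
#include "USB_API/USB_CDC_API/UsbCdc.h"
#include "USB_config/descriptors.h"

#define BENCHMARK_ONLY 0                 /* 1 = measure the bus/driver ceiling */

static uint8_t rxBuf[1024];              /* application receive buffer */
static volatile uint8_t dataPending;     /* set by the event, read by main loop */

/* Event handler, called by the stack when CDC data has arrived. */
uint8_t USBCDC_handleDataReceived(uint8_t intfNum)
{
#if BENCHMARK_ONLY
    /* Discard everything: the device application costs almost no CPU time,
       so the measured rate approaches the ~788 KB/s figure quoted above. */
    USBCDC_rejectData(intfNum);
    return 0;                            /* nothing for the main loop to do */
#else
    (void)intfNum;
    dataPending = 1;                     /* a real app must copy the data out */
    return 1;                            /* wake the main loop from LPM */
#endif
}

/* Polled from the main loop: this copying (and whatever the app then does
   with the bytes) is why practical rates drop to roughly 200-500 KB/s and
   scale roughly with MCLK. */
void pollCdcRx(void)
{
    if (dataPending) {
        uint16_t count;
        dataPending = 0;
        count = USBCDC_receiveDataInBuffer(rxBuf, sizeof(rxBuf), CDC0_INTFNUM);
        (void)count;                     /* hand the bytes to the SD/SPI code */
    }
}
```

Which branch runs, and what the application does with the bytes afterwards, is exactly the "software on the USB device" factor listed above.
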

  • Here are some more benchmark results:

    MSC Performance (Double Buffering)

    Block Size (bytes)   Read (mb/sec)   Write (mb/sec)
    4096                 0.20            0.18
    8192                 0.22            0.18
    16284                0.23            0.16
    32768                0.24            0.18
    65536                0.24            0.15
    131072               0.23            0.17
    262144               0.24            0.19
    524288               0.24            0.19

    MSC Performance

    Block Size (bytes)   Read (mb/sec)   Write (mb/sec)
    4096                 0.14            0.14
    8192                 0.19            0.18
    16284                0.19            0.17
    32768                0.19            0.18
    65536                0.19            0.17
    131072               0.19            0.18
    262144               0.19            0.18
    524288               0.19            0.17

    Note: VDBENCH application used for MSC benchmarking.

    CDC Performance

    Direction   Speed (KBps)   CPU Freq (MHz)
    RX           99.917         1.000
    RX          197.082         2.000
    RX          340.978         4.000
    RX          511.363         8.000
    RX          510.828        12.000
    RX          511.550        16.000
    RX          511.663        20.000
    RX          511.538        25.000
    TX           77.723         1.000
    TX          157.463         2.000
    TX          258.933         4.000
    TX          511.376         8.000
    TX          511.825        12.000
    TX          511.813        16.000
    TX          512.000        20.000
    TX          512.000        25.000

    Note: Transfer size for the CDC benchmarks is 2048 arrays of 1024 bytes each.

  • Thanks for the info and data.  I'm curious, though: is there some other mode (HID?) that can transfer data faster from the MCU to the PC (running Windows)?  I'm still curious whether the Windows USB CDC driver poses a bottleneck for receiving data.  Also, is there sample code that demonstrates the double buffering, and was that for data from the MCU or to the MCU?  And why are the MSC figures (max 0.19 MB/sec = 194 KB/sec) so much slower than the CDC figures of around 512 KBps?

    Thanks again,

    Reza

  • The problem with the oh-so-good USB (unbelievably silly bus) is that its high speed only shows up in burst transfers.
    So if you can send a whole kB of data in one package, it gets sent fast. If you send 1024 * 1 byte, it will take ages.
    When writing to or reading from mass storage devices, you usually transfer 512 bytes at a time (a whole sector), which is fine. But tunneling a serial connection through USB means that data needs to be sent when the application sends it, not when the application has produced enough data for a block transfer (the driver doesn't know whether the application will ever send another byte at all).
    So for each single byte, the full overhead of setting up a USB transfer hits the performance.

    Some more sophisticated drivers wait a short time (a few ms) before sending a packet, so if the application sends a continuous stream of data, the driver packs it into one transfer. However, while this increases throughput, it adds to the transfer latency (the first byte is sent when the nth byte is provided by the application, not immediately).
    So it's a tradeoff.
    Really good drivers let you specify the maximum latency (the collecting timeout), so you can choose between latency and throughput yourself. (A small device-side sketch of the same packing tradeoff follows this reply.)

    There is a reason why gamers refused to use USB mice for gaming. Even though a serial mouse event takes only 3 or 4 bytes at 1200 Bd (26 ms max), the protocol overhead of USB causes a much higher latency with a USB mouse. Too much to play online shooters against people with a serial mouse.

    To draw a picture: imagine a flatbed truck packed with 1 TB hard drives full of data. The truck drives from New York to L.A. When it arrives, the throughput is still immense (some gigabytes per second). However, it takes several days before the first byte arrives. That is the latency. And if you send just a single byte, it still takes days to arrive. For each byte.
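
As a rough illustration of that packing tradeoff, here is a small device-side sketch that collects bytes into one block and sends the whole block, rather than issuing one USB transfer per byte. USBCDC_sendData() and CDC0_INTFNUM follow the MSP430 USB Developers Package naming and are assumptions here; adapt them to whatever stack you use.

```c
/*
 * Device-side sketch (not from the thread) of the packing tradeoff described
 * above: collect bytes into one block and send the whole block, instead of
 * issuing one USB transfer per byte. USBCDC_sendData and CDC0_INTFNUM follow
 * the MSP430 USB Developers Package naming; adapt to your stack version.
 */
#include <stdint.h>
#include "USB_API/USB_CDC_API/UsbCdc.h"
#include "USB_config/descriptors.h"

#define TX_BLOCK_SIZE 1024u              /* one "package" worth of data */

static uint8_t  txBuf[TX_BLOCK_SIZE];
static uint16_t txCount;

/* Send whatever has accumulated as a single transfer. Call this from a
 * periodic timer (every few ms) as well, so a lone byte never waits forever:
 * that periodic call is the "collecting timeout" latency bound. */
void cdcFlush(void)
{
    if (txCount > 0) {
        /* USBCDC_sendData starts a background transfer; real code must wait
           for (or check) completion before reusing txBuf, or ping-pong
           between two buffers (the "double buffering" mentioned earlier). */
        USBCDC_sendData(txBuf, txCount, CDC0_INTFNUM);
        txCount = 0;
    }
}

/* Queue one byte; flush only when a full block is ready (throughput). */
void cdcPutByte(uint8_t b)
{
    txBuf[txCount++] = b;
    if (txCount == TX_BLOCK_SIZE) {
        cdcFlush();
    }
}
```

Making TX_BLOCK_SIZE larger buys throughput; calling cdcFlush() more often from the timer buys latency. That is exactly the tradeoff the reply describes for host-side drivers.
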

  • Hi Mo, just poking an old topic I know, but I wanted to know if your MSC benchmarks are in megabits/sec or megabytes/sec.

    Cheers
  • I guess MByte/sec. I did some benchmarks, but didn't test MSC. The USB hardware module on the MSP430 can go up to 1 MByte/sec ...

    http://forum.43oh.com/topic/2775-msp430-usb-benchmark/

  • Hi Paul,

    Paul Young said:
    Hi Mo, just poking an old topic I know, but I wanted to know if your MSC benchmarks are in megabits/sec or megabytes/sec.

    Cheers

    Those should be MByte/s.

    Hope this helps.

    Mo.

  • Thanks Mo, so does that mean that when you recorded an MSC read of 0.24 MB/s you were getting a data rate of 1.92 Mbps?
  • Awesome! Neeeeeeext question: does anyone know of anybody who got faster than Mo's benchmark? (Using an MSP to implement an MSC device with an SD card via SPI.)
  • Quick sanity check, Zrno: in your post, are you saying you got up to 0.97 MB/s, as opposed to Mo's benchmark of 0.2 MB/s?
  • No, my benchmark is related to CDC and bulk transfers. Mo's 0.2 MByte/sec is related to MSC, which I never used.

    Here is a short description of my USB stack, developed for my SBW+ flasher. Later it was changed to streaming, because there was no need to go over 200 KByte/sec.

    The TI USB stack is an example of how things can be done, but it is not the fastest implementation. From my point of view, as far as this topic is concerned, the MSP430 USB hardware module (with the MSP430 working as a device under any OS) is definitely not the bottleneck.

  • If we use newer MSP430s which support a higher clock rate, then we can transfer the data faster than Mo's benchmark.
  • Mo's benchmark was already done on a device with the USB hardware module and a 25 MHz MCLK. There is no newer MSP430 with a higher MCLK, and the MSP432 has no USB hardware module.
  • The theoretical maximum SPI speed (master mode) on MSPs with the USCI module is one bit per clock cycle.
    The plain SPI transfer rate at a 25 MHz clock is therefore 25 Mbps, which is also the maximum clock speed supported by SD cards in serial mode.
    Assuming the overhead of sending the transfer command is negligible (~1%), the limiting factor is the SD card controller.
    Most controllers first complete a flash write before accepting data for the next sector; few can do a background write in serial mode. And when reading, few read directly from flash; most first copy from flash to an internal buffer and only then start sending.
    This is different in parallel mode, where larger transfer sizes are combined with background write operations or read prefetch.

    With my hand-optimized busy-waiting code, I got about 1.25 MB/s write and 1.75 MB/s read throughput on a 16 MHz MSP430F5438 (a sketch of such a loop follows this reply). On a 25 MHz A-type device the results would be somewhat (but likely not 50%) better. I did this test in 2010, when experimenting with the then-new 5438, so the exact results (as well as the test code) are lost.

    One could try to use parallel mode. But this would require organizing the data as 4-bit nibbles in memory and sending them to a port using DMA. It would also require clever use of a timer PWM output as the clock signal, and would still take 2 clock cycles per 4 bits. Synchronizing the DMA and the timer would be difficult (starting the timer through DMA triggered by another timer while the CPU goes into LPM0).
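
For reference, this is roughly what such a busy-waiting inner loop looks like on a 5xx-series USCI SPI port. It is a sketch, not the original 2010 test code: register names (UCB0IFG, UCB0TXBUF, UCB0RXBUF) are the standard USCI_B0 names from the MSP430F5xx family user's guide, and all SD-card command handling (CMD17, start token, CRC) is omitted.

```c
/*
 * Sketch of a busy-waiting SPI read inner loop on USCI_B0 (MSP430F5438 class
 * device). Not the original test code; SD-card command handling is omitted.
 */
#include <msp430.h>
#include <stdint.h>

/* Clock in one 512-byte sector payload, one dummy write per received byte. */
static void spi_read_block(uint8_t *dst)
{
    uint16_t i;
    for (i = 0; i < 512; i++) {
        while (!(UCB0IFG & UCTXIFG)) ;  /* wait until TX buffer is free   */
        UCB0TXBUF = 0xFF;               /* clock out a dummy byte         */
        while (!(UCB0IFG & UCRXIFG)) ;  /* wait for the received byte     */
        dst[i] = UCB0RXBUF;             /* reading RXBUF clears UCRXIFG   */
    }
}
```

At a 16 MHz SPI clock the wire itself allows at most 2 MB/s (8 clocks per byte), so the 1.75 MB/s read figure quoted above is already close to that ceiling; a hand-optimized version would overlap the dummy write with reading the previous byte to get even closer.
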
