
    Real World USB and Ethernet Transfer Rates

    This question is not answered
    Henry3374
    Posted by Henry3374
    on Apr 29 2012 22:26 PM
    Prodigy, 90 points

    Running an LM3S9B92 with FreeRTOS + lwIP, my Ethernet transfer rate when copying large files into SDRAM over TCP/IP is around 2MB/sec at 100MHz and 1.7MB/sec at 80MHz.  Without FreeRTOS I can get around 4.6MB/sec, so there seems to be quite a bit of room for improvement there.  We have a gigabit Ethernet switch and I can easily copy >100MB/sec from HD to HD over the network, so the network itself is not the bottleneck.

    Running FreeRTOS + the TI USB stack with slightly optimized USB bulk transfers, I can copy around 900kB/sec into SDRAM at 100MHz and 730kB/sec at 80MHz.  In my experience with an FTDI FT245R I could only get about 900kB/sec out of that as well.  Is it possible to get closer to the 1.5MB/sec theoretical maximum (12Mbit/sec full speed)?

    How far off are these from what other people have been able to achieve running some sort of RTOS?   

    We are trying to figure out how much a rewrite using uDMA would improve performance, and whether it is worth the extra engineering effort.

    Also, a big tip for USB users: whatever you do, don't use the TI USBRead API function call for bulk transfers.  We didn't realize it was implemented as a loop that reads one byte per call.  We replaced this with two memcpys (one up to the end of the queue, one from its start) directly out of the USBBuffer ring buffer and it was at least 2-3x faster.  We are also rewriting this to copy directly out of the USB FIFO into SDRAM, skipping the additional copy into the USBBuffer.
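
    The two-memcpy read described above can be sketched roughly as follows. This is a minimal illustration, not the usblib code: the `ring_t` struct and `ring_read()` are hypothetical names, and the layout is an assumption rather than the actual USBBuffer structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ring buffer; NOT the usblib USBBuffer layout. */
typedef struct {
    uint8_t *data;     /* backing storage */
    size_t   size;     /* capacity in bytes */
    size_t   read_idx; /* index of the next byte to read */
    size_t   count;    /* bytes currently stored */
} ring_t;

/* Copy up to len bytes out of the ring using at most two memcpy calls. */
size_t ring_read(ring_t *rb, uint8_t *dst, size_t len)
{
    assert(rb->size > 0 && rb->read_idx < rb->size);

    if (len > rb->count) {
        len = rb->count;
    }

    /* First chunk: from the read index up to the end of the storage. */
    size_t first = rb->size - rb->read_idx;
    if (first > len) {
        first = len;
    }
    memcpy(dst, rb->data + rb->read_idx, first);

    /* Second chunk: whatever wrapped around to the start (may be zero). */
    memcpy(dst + first, rb->data, len - first);

    rb->read_idx = (rb->read_idx + len) % rb->size;
    rb->count -= len;
    return len;
}
```

    Compared with a per-byte loop, the wrap-around check happens once per read instead of once per byte, which is presumably where most of the speedup comes from.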

    All Replies
    • TI Alex
      Posted by TI Alex
      on Apr 30 2012 10:27 AM
      Expert, 8210 points

      Henry,

      These are good questions and points but unfortunately we do not have any true performance data. Typically the uDMA helps with throughput since it offloads the transferring of data from the processor.

      Regards,
      Alex 

    • Aaron Gage
      Posted by Aaron Gage
      on May 22 2012 16:41 PM
      Intellectual, 470 points

      I am seeing about 960KB/s on an LM3S3748 at 50MHz (transferring data back to the host) using the usblib driver.  The host is getting a NAK on the IN transaction after almost every good packet that is sent.  This implies that the Stellaris is the bottleneck.  It seems to take about 55-65usec to transmit each new packet according to a USB protocol analyzer.

      My code is copying from a few different buffers (32 bits at a time) into a set of fully aligned and packed packet buffers.  These are then copied (32 bits at a time) into the endpoint FIFO.  The FIFO is also double-buffered.  I doubt that the time necessary to copy 16 words, twice, plus the time to update and evaluate some loop variables, plus the time to update and evaluate some buffer pointers is really 50usec (2500 processor cycles).  I suspect that the extra cycles are being consumed by the usblib driver, or by something in the hardware itself as it signals to USB.

      I do not believe that uDMA could possibly help here because it is only allowed access to SRAM when the processor core is not using it.  Doing this through 32-bit programmed I/O instead of DMA should be the fastest possible way to move the bytes because the processor is not subordinate to the uDMA controller.  There is nothing running at a higher priority than the USB interrupt that would delay the processor core.  Some gains could probably be made by tweaking the usblib driver, but I think that 1MB/s may be all that can realistically be achieved in a real application.

    • Aaron Gage
      Posted by Aaron Gage
      on May 23 2012 10:46 AM
      Intellectual, 470 points

      I have taken a little bit more time to study the actual behavior of the USB controller on the LM3S3748.  I was seeing 55-65usec per 64-byte packet on a USB protocol analyzer previously and wanted to know where that time was being spent.

      I configured a pin as a GPIO and wrote a new ISR for the USB0 interrupt.  This ISR raises the GPIO pin, then invokes the usblib handler (USB0DeviceIntHandler()), then lowers the GPIO pin.  All of my buffer management is handled in the ISR (invoked through the endpoint event handler interface to usblib), so it will occur between these two events.

      What I see on an oscilloscope is that the ISR is active for about 19usec, then there is a gap of about 45usec before it is active again.  That is, there is approximately 64usec between USB interrupts, the core is actively servicing the interrupt for 30% of the time (partly due to my buffer management code), and the other 70% of the time is spent by the hardware transferring the packet.  This timing appears to be pretty consistent.

      Of the 19usec to service the interrupt when there is data to transfer, about 2.5-3.5usec of this is the usblib driver doing its work before it invokes my endpoint handler.  This means that my buffer management and copying into the FIFO is consuming about 16usec per packet.  I found this by moving my GPIO pin toggle up to the entry point for the endpoint handler.

      When I do not have any data to transfer and the bus is idle, the 1kHz ticks on the bus only require about 2usec to service.

      Assuming all of the 12Mb/s USB bandwidth is available, each USB packet should require (64*8 bits)/(12 bits/microsecond) = 42.67usec to fully transfer, which seems to be a major part of the 45usec transfer time.

      Thus, I think that what I'm seeing is the maximum possible throughput (for a real-world application, meaning that there is a little work being done to fill each packet) if loading the packet and transmitting it are done sequentially.
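
      As a rough cross-check of the numbers above, the sketch below compares loading and transmitting one after the other with loading the next packet while the current one is on the wire. The function names are made up for illustration, and the model deliberately ignores the IN token, ACK, and SoF overhead, so the overlapped figure is only an optimistic bound.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second when one packet completes every period_us microseconds. */
uint32_t bytes_per_sec(uint32_t packet_bytes, uint32_t period_us)
{
    assert(period_us > 0);
    return (uint32_t)(((uint64_t)packet_bytes * 1000000u) / period_us);
}

/* Load the FIFO, then transmit, strictly one after the other. */
uint32_t sequential_rate(uint32_t packet_bytes, uint32_t load_us,
                         uint32_t wire_us)
{
    return bytes_per_sec(packet_bytes, load_us + wire_us);
}

/* Load the next packet while the current one is being transmitted;
 * the slower of the two activities sets the pace. */
uint32_t overlapped_rate(uint32_t packet_bytes, uint32_t load_us,
                         uint32_t wire_us)
{
    uint32_t period = (load_us > wire_us) ? load_us : wire_us;
    return bytes_per_sec(packet_bytes, period);
}
```

      With the measured 19usec of ISR time and 45usec of transfer time, `sequential_rate(64, 19, 45)` gives 1000000 bytes/s, close to the 960KB/s observed, while `overlapped_rate(64, 19, 45)` gives 1422222 bytes/s before protocol overhead.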

      The odd thing is that I thought I had properly enabled double-buffering so that the ISR and the USB transfers could run concurrently which should increase the total throughput much closer to 12MBit/sec (since the time to service the ISR is less than the time to transfer a packet).  This was done by calling USBFIFOConfigSet() using USB_FIFO_SIZE_64_DB which appears to push a 1 to the double buffer bit in USBTXFIFOSZ.  I'm going to keep working at this to see if something else is disabling the double buffering.

    • Aaron Gage
      Posted by Aaron Gage
      on May 23 2012 16:38 PM
      Intellectual, 470 points

      I am all but convinced that double buffering is not working when it is supposed to be.  I've toggled another pin on either side of the code that actually writes to the FIFO, and put all of that code into a loop that keeps trying to write while TXRDY in USBTXCSRLn is cleared (which is supposed to happen automatically when double-buffered).  What I should see if it is double-buffering is two back-to-back FIFO writes the first time the ISR is invoked, followed by one FIFO write per ISR along with a boost in speed (because transmitting and loading would be concurrent).
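
      The expected pattern (two writes on the first interrupt, then one per interrupt) can be illustrated with a toy host-side model. This is only a sketch of the behaviour described in this post, not of the actual USB controller: `fifo_model_t` and the `model_*` functions are hypothetical, and TXRDY is simplified to "no room left in the FIFO".

```c
#include <assert.h>

/* Toy model of an IN endpoint FIFO: depth 1 = single buffered,
 * depth 2 = double buffered. */
typedef struct {
    int depth;   /* packets the FIFO can hold */
    int queued;  /* packets currently waiting to be sent */
} fifo_model_t;

/* In this model TXRDY reads as set only when there is no room left. */
int model_txrdy(const fifo_model_t *f)
{
    return f->queued >= f->depth;
}

/* What the ISR loop does: keep loading packets while TXRDY is clear.
 * Returns how many packets were written during this invocation. */
int model_isr_load(fifo_model_t *f)
{
    int written = 0;
    while (!model_txrdy(f)) {
        f->queued++;
        written++;
    }
    return written;
}

/* The hardware finishes sending one packet, freeing one slot. */
void model_packet_sent(fifo_model_t *f)
{
    assert(f->queued > 0);
    f->queued--;
}
```

      With `depth` set to 2 the first call to `model_isr_load()` writes two packets and each later call (after one `model_packet_sent()`) writes one; with `depth` set to 1 every call writes exactly one, which matches what the oscilloscope showed.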

      I tried increasing the FIFO sizes to 128 bytes while keeping the _DB flag in case there was some confusion there.  I tried making both the IN and OUT endpoints double-buffered.  I re-checked that the USBTXDPKTBUFDIS register is all cleared and that the DPB bit in USBTXFIFOSZ is set for all IN endpoints.  All of the packet sizes are limited to 64 bytes (which is normal for Full Speed).  I re-checked the manual to be sure that Bulk endpoints support double-buffering.  I re-checked the errata (from November 2011) and saw nothing mentioned.

      I am unable to use uDMA for this because of an erratum for the LM3S3748 (since not all of my packets are totally filled), and I don't see how it could be any faster than programmed I/O.  I am essentially polling TXRDY in USBTXCSRLn because I wasn't sure if the interrupt would be called correctly.  Even for this most basic case, double buffering does not seem to work.

      I might try changing the bits in USBTXDPKTBUFDIS to see if that makes any difference (since the manual seems to be contradictory in how it describes that register), but other than that there is not much else to try.

    • Stellaris Dave
      Posted by Stellaris Dave
      on May 30 2012 11:24 AM
      Expert, 5260 points

      Aaron,

      If you want to increase your throughput then you need to use the uDMA. However, as you mentioned, there is an erratum on the LM3S3748 which prevents you from using the uDMA if your packet sizes are not multiples of the USB FIFO size. Essentially what you are seeing is that you are unable to fill the FIFO fast enough, which is what is causing all of the NAKs on the host side. If you can somehow guarantee that your packet size is a multiple of the USB FIFO size then you would be able to configure and use the uDMA and would see an increase in data rate.

      -Dave


    • Aaron Gage
      Posted by Aaron Gage
      on May 30 2012 12:13 PM
      Intellectual, 470 points

      Greetings Dave --

      My memory configuration has a large (4K) RAM buffer, essentially acting as a FIFO, between my data producer and the USB peripheral's FIFO.

      By bringing out some signals to an oscilloscope, I have found that I can produce data approximately four times faster than the USB peripheral FIFO can consume it.  Thus, I disagree with the statement that "you are unable to fill the FIFO fast enough."  For every packet that I see getting loaded into the USB FIFO, the rest of the device produces five that sit in my RAM FIFO waiting for USB_TXCSRL1_TXRDY.

      At one point, I was polling the FIFO status continuously to try to stuff another packet in as soon as the USB_TXCSRL1_TXRDY appeared.  This made no difference.  I never saw it consume a second packet until it had fully transmitted the first.  I should have about 45 usec to copy in the next packet while the first one transmits, and I doubt that uDMA could do this faster than polling (especially since uDMA can be suspended due to core accesses to RAM, whereas polling cannot).

      Is there any sample code that proves that double-buffering works, uDMA or otherwise?  I would very much like to compare my settings.  I am tempted to write a trivial test program to demonstrate that it does not and post it here.

      Do I need to wait for TXRDY to clear before stuffing in the second packet?  The description in the datasheet says "TXRDY is also automatically cleared prior to loading a second packet into a double-buffered FIFO."  Does this mean that if I start writing a second packet while it is set it will automatically clear until the second packet is fully written?  This interpretation would disagree with another statement that "After the first packet is loaded, TXRDY is immediately cleared and an interrupt is generated."  However, I'm just grasping at straws here.

    • Aaron Gage
      Posted by Aaron Gage
      on May 31 2012 13:02 PM
      Intellectual, 470 points

      I think I've made a little progress.  I created a simple USB test program to try to show this issue in no uncertain terms.  In doing so, I made the following discovery:

      The double buffering bit in USBTXFIFOSZ is being cleared any time the device detects a disconnect event.

      I set the main loop of the firmware to simply keep checking whether the conditions for double buffering were being met, and for a while, they are.  Then, suddenly, they aren't.  When the device is allowed to connect to a Windows machine, the double buffering seems to stay enabled right up until I do a disconnect.  At that point, it disappears and never recovers.  When the device is allowed to connect to a Linux machine, it appears that Linux is doing some initial probe that ends with a disconnect, so the double buffering is gone before the device becomes available for my test code.

      My new goal is to figure out why the double buffering is being disabled (due to an automatic purge of the FIFO on disconnect, for instance, or some programmatic thing in the driver), and/or whether I can re-enable it whenever this happens.  I then need to test whether double buffering actually works, once I can be sure it is enabled.

      If anyone happens to know why this setting is lost on a disconnect, it might save me some time.

    • Aaron Gage
      Posted by Aaron Gage
      on May 31 2012 16:53 PM
      Intellectual, 470 points

      I think I've got it solved.  Part of it was a bug in my code (go figure) that was using the default FIFO configuration and then also manually setting up the FIFOs -- this meant that as soon as anything interesting happened the default configuration got re-applied which disabled double buffering.

      The other part is a bug in StellarisWare.  I have confirmed this in version 7243 and I am now downloading 8555 to see if it exists there too.

      Here's the bug:

      In driverlib, usb.c, USBEndpointDataSend() function, there is a test that looks like this:

          //
          // Don't allow transmit of data if the TxPktRdy bit is already set.
          //
          if(HWREGB(ulBase + USB_O_CSRL0 + ulEndpoint) & USB_CSRL0_TXRDY)
          {
              return(-1);
          }

      Note that this function can be used for any endpoint (so ulEndpoint is a multiple of 0x10, starting from 0x00 for EP 0).

      This applies the CSRL0_TXRDY mask (a.k.a. 0x02) for both endpoint 0 (correct) and endpoints 1, 2, and 3 (definitely not correct).

      In endpoints 1, 2, and 3, this refers to register USBTXCSRLn and bit 0x02 is the FIFONE (FIFO Not Empty) flag which gets set as soon as there is anything in the FIFO.  To properly check whether there is room for another packet, this should be checking bit 0x01 (TXRDY).

      In other words, if anyone does manage to enable double buffering, and then actually try to do it, any attempt to call USBEndpointDataSend() will fail out with an error and the packet is basically dropped.
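
      To make the mask mix-up concrete, here is a small host-side sketch that applies both tests to plain register values. The `DEMO_` constants are local copies of the bit values quoted in this post (the real definitions live in the StellarisWare headers), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the bit values quoted in the post. */
#define DEMO_CSRL0_TXRDY    0x02u  /* endpoint 0: TXRDY */
#define DEMO_TXCSRL_TXRDY   0x01u  /* endpoints 1-3: TXRDY */
#define DEMO_TXCSRL_FIFONE  0x02u  /* endpoints 1-3: FIFO not empty */

/* The original test: applies the endpoint 0 mask to every endpoint. */
int send_blocked_original(uint8_t csrl)
{
    return (csrl & DEMO_CSRL0_TXRDY) != 0;
}

/* Corrected test: pick the mask that matches the endpoint's register. */
int send_blocked_fixed(unsigned endpoint, uint8_t csrl)
{
    assert(endpoint <= 3);  /* the post discusses endpoints 0 through 3 */
    uint8_t mask = (endpoint == 0) ? DEMO_CSRL0_TXRDY : DEMO_TXCSRL_TXRDY;
    return (csrl & mask) != 0;
}
```

      For an endpoint 1 register value of `DEMO_TXCSRL_FIFONE` (one packet queued, TXRDY clear), `send_blocked_original()` reports the endpoint as blocked while `send_blocked_fixed(1, ...)` allows the second packet, which is the case double buffering depends on.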

      By moving the TXRDY test into the code just above it that checks whether this is endpoint 0, I get the following, which seems to work:

          //
          // Get the bit position of TxPktRdy based on the endpoint.
          //
          if(ulEndpoint == USB_EP_0)
          {
              ulTxPktRdy = ulTransType & 0xff;
              //
              // Don't allow transmit of data if the TxPktRdy bit is already set.
              //
              if(HWREGB(ulBase + USB_O_CSRL0 + ulEndpoint) & USB_CSRL0_TXRDY)
              {
                  return(-1);
              }
          }
          else
          {
              ulTxPktRdy = (ulTransType >> 8) & 0xff;
              //
              // Don't allow transmit of data if the TxPktRdy bit is already set.
              // For endpoints 1-3 this is bit 0x01 of USBTXCSRLn; bit 0x02 there
              // is FIFONE.
              //
              if(HWREGB(ulBase + USB_O_CSRL0 + ulEndpoint) & USB_TXCSRL1_TXRDY)
              {
                  return(-1);
              }
          }

      This is not the most elegant way to do it but it seems to fix the problem.  I have verified on an oscilloscope that two packets are loaded the first time around and then one more is loaded on each subsequent interrupt, which is exactly what I was expecting to see.

      I am going to try to make this adjustment in my main project and see what the impact on throughput is since double-buffering now seems to be working. I'm also going to capture some USB traffic to make sure that the NAKs are gone.  The time between packets now appears to be about 56usec which is longer than I thought it should be but is definitely shorter than before.

      It's not the first bug I've found in StellarisWare and it probably won't be the last, but at least I have the source to muddle through.

      Can someone at TI confirm that this is a bug that [is | was] being tracked and [will get | has been] fixed?  The last time I posted a bug I had to call special attention to it to get it fixed so I want to make sure someone notices.

      I have a suspicion that Dave's post earlier was probably almost correct -- uDMA may do this properly because the TXRDY bit should be wired directly to the correct spot in the uDMA controller so this error does not affect it.  It's only when this is done in software, relying on the USBEndpointDataSend() function, that the problem appears.  It isn't that the software can't keep up with the rate of data transfer, but rather, that the driverlib code would scrap anything that came too fast and force the caller to wait until the FIFO was completely empty to load another packet.

    • Stellaris Dave
      Posted by Stellaris Dave
      on Jun 01 2012 10:30 AM
      Expert, 5260 points

      Aaron,

      You are indeed correct. This is now a confirmed bug and is currently being tracked. Thank you for your followup post and work on this issue.

      -Dave


    • Aaron Gage
      Posted by Aaron Gage
      on Jun 01 2012 12:29 PM
      Intellectual, 470 points

      Thank you Dave.

      Now, back to the original question about real-world performance:

      With double-buffering enabled, I no longer see a NAK on every other packet like I had before.  The USB host seems to request each subsequent packet reliably every 55usec.  Each packet transaction looks like this:

      1. IN request: 3.2 usec
      2. DATA packet: 46.2 usec
      3. ACK packet: 5.6 usec

      This means there is a minimum of 16% overhead just for the host to request and ACK the data.

      There is additional overhead related to how USB is signaled.  Every 1ms, there is a pulse sent from the host that indicates Start of Frame (SoF).  Devices are not allowed to begin transmitting if they cannot finish within a certain window of the SoF.  This appears to add about 65usec of idle time between data packets before the next IN request.  There are about 18 full packets that can be transmitted per millisecond, minus the SoF, which leaves 17 good slots for packets per millisecond.  This works out to (17 * 64 * 1000) = 1.088MByte/s maximum.
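
      The frame budget above can be written out as a short calculation. This is only a sketch of the arithmetic in this post; the function names and the `lost_slots` parameter (slots given up around the SoF) are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_US 1000u  /* one full-speed frame, SoF to SoF */

/* Whole transactions that fit into one frame. */
uint32_t slots_per_frame(uint32_t transaction_us)
{
    assert(transaction_us > 0);
    return FRAME_US / transaction_us;
}

/* Bytes per second when lost_slots of those slots go unused each frame. */
uint32_t frame_limited_rate(uint32_t packet_bytes, uint32_t transaction_us,
                            uint32_t lost_slots)
{
    uint32_t slots = slots_per_frame(transaction_us);
    uint32_t usable = (slots > lost_slots) ? slots - lost_slots : 0;
    return usable * packet_bytes * (1000000u / FRAME_US);
}
```

      With a 55usec transaction, `slots_per_frame(55)` is 18, and `frame_limited_rate(64, 55, 1)` gives the 1088000 bytes/s figure quoted above.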

      While streaming some data from my LM3S3748 (50MHz), I find that I can transfer 4432 bytes in about 4040usec on average.  This works out to about 1.1MByte/s (within the timing error of a computer in this case, not an oscilloscope, so this may be off a bit).

      To wrap all of this up, I think that you can achieve 1-1.1MB/s (8Mbit/s) via USB on the Stellaris.  This assumes the following:

      • Double buffering on the endpoint FIFO is enabled, and either the bug I described above is fixed or uDMA is used
      • The software on the Stellaris can provide a full 64-byte packet every 55usec or faster (preferably much faster)
      • The host keeps up with the transfer
      • The bus is not in contention

      For the sake of comparison, at usb.org they say that 900kbit/s is typical:

      http://www.usb.org/developers/usbfaq#band1

      The USB specification, chapter 5, gives the theoretical maximum for bulk transfers with 64 bytes per packet as 1216000 bytes per second.

      It looks like the Stellaris lands a little bit above typical and about 90% of theoretical maximum.  It might be possible to increase the speed a little more by tuning things, but that's beyond the scope of what I wanted to do.
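
      For anyone repeating the measurement, the comparison against the theoretical figure is a two-line calculation. The helper names below are hypothetical; the inputs are the numbers from this post (4432 bytes in 4040usec, and 1216000 bytes/s from the specification, which corresponds to 19 packets of 64 bytes per 1ms frame).

```c
#include <assert.h>
#include <stdint.h>

/* Average rate in bytes per second for a timed transfer. */
uint32_t measured_rate(uint64_t bytes, uint64_t elapsed_us)
{
    assert(elapsed_us > 0);
    return (uint32_t)((bytes * 1000000u) / elapsed_us);
}

/* Rate as a whole-number percentage of a reference rate (rounded down). */
uint32_t percent_of(uint32_t rate, uint32_t reference)
{
    assert(reference > 0);
    return (uint32_t)(((uint64_t)rate * 100u) / reference);
}
```

      `measured_rate(4432, 4040)` comes out at 1097029 bytes/s, and `percent_of()` of that against 1216000 is 90, in line with the "about 90%" estimate.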

    • Henry3374
      Posted by Henry3374
      on Jun 01 2012 12:49 PM
      Prodigy, 90 points

      We rewrote the USB stack to allow direct copy from the USB FIFO to SDRAM, and even without DMA or special buffering we can get ~910kB/sec sustained for transfers of 4MB-30MB.  Direct copy to internal memory (not SDRAM) could probably gain another 10% easily.

      The problem with slow speed is in the TI supplied USBRead functions.  If you look at source code it calls some function usbreadone or similar in a loop.  This is extremely inefficient.  Rewriting this with 1 or 2 lines to memcpy gives around 700kB/sec.  To move from 700->900+ requires major rewrite to eliminate the USBRingBuffer and replace with direct copy.
