USB + SD MSC performance on C5535

Other Parts Discussed in Thread: TMS320C5535

I know that this topic was already created here:

http://e2e.ti.com/support/dsp/c5000/f/109/t/243837

But since then I have not seen any workaround that improves the performance of "CSL_USB_MSC_dmaExample". My performance results are very similar.

I'm using an SD card rated at 20MB/s for read and 8MB/s for write. But with the TI eZdsp5535 hardware and the "CSL_USB_MSC_dmaExample" project (with the "CSL_MMCSD_MULTISECT_TXFER" improvement turned on) I get about 1.3MB/s for read and 1.0MB/s for write.

In the post mentioned above, Ming explained that they had achieved about 10MB/s on read in the past. He said that he could share this project, but he didn't.

I found a similar statement here:

http://processors.wiki.ti.com/index.php/C5000_Chip_Support_Library

Here is a quote:

For a 100Mhz CPU, with 50mhz SD clock, with class 4 and class 4 cards, we typically see performance in the 3 to 5Mbps range (CSL version 2.50).
To improve performance, consider using raw read/write modes. You may also want to look at modifying the FAT updates to cache the writes to improve performance. In doing such changes, it may be possible to achieve 10Mbps. 

I believe they meant MBps, not Mbps. In that case the performance of "CSL_USB_MSC_dmaExample" should already have improved, because "write ahead" was implemented there.

1. The first question concerns that post: why did they reject the improved scheme of submitting 8kB and keep the 512-byte CDMA transfers? It looks like they accepted the drop in performance with the current implementation.

2. The second question. If they tried to improve performance, why didn't they try a data pipeline? I mean performing USB read and SD write simultaneously and, vice versa, SD read and USB write. It should greatly improve the performance. I actually tried it and it didn't work. CDMA and SD DMA work fine separately, but together they stop working. I saw two cases. In the first, SD DMA is running and I start a CDMA operation; CDMA stops working after a while. For example, with 16 CDMA transfers of 512 bytes each queued while SD DMA was in progress, CDMA stopped after the 13th transfer and there were no more CDMA interrupts. The code hung in the CompleteTransfer() loop. In the second case, CDMA was running and I started an SD DMA operation; both CDMA and SD DMA stopped working.

3. The third question. Assume that we have achieved the 10MB/s performance suggested in the post above and on the wiki page. Where is the bottleneck? With a 50MHz SD bus we can get about 20MB/s (at least I saw the C5535 read a 32KB block from SD in 1.7-2.1 ms), and with USB HS we can get 30MB/s or more. Why was 10MB/s your threshold?

I ran the following experiment: I commented out the SD write call in the AppWriteMedia() function:

/* mediaError = MMC_writeNSectors(ATALogicalUnit[lunNo].pAtaMediaState,(AtaUint16*)srcptr, LBA, 1); */
mediaError = ATA_ERROR_NONE;

Then I measured the write speed with the SD operation stubbed out. I got 10MB/s!

Analyzing the USB traffic, I see that most 512-byte USB transactions take about 30-40 us, and only rarely 10 us. With the few card readers that I have, I see 10 us for most USB transactions.

So it looks like USB is the bottleneck.

The maximum performance that I could achieve on the eZdsp5535 was 4MB/s on write and 7MB/s on read.

Is there any hope of considering the C5535 as a candidate for a low-power card reader?
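
As a rough cross-check of the timings above, the raw data rate implied by a transfer is just bytes divided by time. The sketch below uses a hypothetical helper, throughput_mb_per_s() (it is not part of CSL), and decimal megabytes. It ignores command, status, and inter-transaction overhead, so real throughput will be lower.

```c
/* Hypothetical helper (not part of CSL): raw data rate in MB/s
 * (1 MB = 10^6 bytes) for `bytes` bytes moved in `microseconds` us.
 * Bytes per microsecond is numerically equal to MB/s. */
static double throughput_mb_per_s(double bytes, double microseconds)
{
    return bytes / microseconds;
}
```

With the numbers above, 512 bytes in 40 us gives 12.8 MB/s, in 30 us about 17.1 MB/s, and in 10 us 51.2 MB/s. A 32KB (32768-byte) SD read in 1.7-2.1 ms corresponds to roughly 15.6-19.3 MB/s.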
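
To illustrate why the pipelining from question 2 should help, here is a simplified model with two hypothetical helpers (not CSL functions). It assumes each stage runs at a constant rate and ignores all protocol and setup overhead. Without overlap, every byte costs the USB time plus the SD time; with perfect overlap, the slower stage alone sets the rate.

```c
/* Simplified model, not measured data: both rates in MB/s. */

/* No overlap: each chunk is received over USB, then written to SD.
 * Time per byte adds up, so the rates combine like parallel resistors. */
static double sequential_rate(double usb_mb_s, double sd_mb_s)
{
    return (usb_mb_s * sd_mb_s) / (usb_mb_s + sd_mb_s);
}

/* Ideal overlap: USB fills one buffer while SD drains the other,
 * so the slower of the two stages limits the throughput. */
static double pipelined_rate(double usb_mb_s, double sd_mb_s)
{
    return (usb_mb_s < sd_mb_s) ? usb_mb_s : sd_mb_s;
}
```

Taking 10 MB/s for USB alone and the card's rated 8 MB/s write speed, the sequential model gives about 4.4 MB/s, which is in the same range as the 4MB/s I measured on write, while the ideal pipelined case would be 8 MB/s.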

Thanks,

Denis

  • We recommend using 512 bytes. We also recommend allocating a dedicated receive data buffer. The required buffer size depends heavily on the application (for example, a Windows host). This is documented as Advisory 2.0.10 in the Errata.
    Regards.
  • Hi Denis,

    I was able to dig out the USB MSC DMA code. This code was developed quite some time ago, and even the CSL version it uses is pretty old. But it is a good reference for achieving more than 10MB/s of bandwidth.

    Please note that there will be limited or no support from us for this code, and it needs to be used "as is".

    The changes can be easily understood (look for the defines "MULTICDMA" and "CDMANUM" in the csl_usb_msc_dma.c file) and need to be incorporated into your present code. The above-mentioned macro determines the number of sectors read from or written to the SD card at a time. More information can be found in the readme file in the example folder.

    Hope this helps you to proceed further.

    Regards

    Vasanth

    USB_Example_USB_MSC_DMA.zip

  • Thank you for your answers.

    to Steve:

    I had already read Advisory 2.0.10 in the TMS320C5535 Errata. It is not relevant to the problems that I encountered; I didn't see the "Starvation" issue in my case. To avoid it, I implemented the CDMA transfers in the same manner as in "CSL_USB_MSC_dmaExample": before any USB Rx transaction I configure the DMA descriptors, then call USB_dmaRxStart(), and when the interrupt occurs I call USB_dmaRxStop().

    to Vasanth:

    Thank you for your explanations and the code example that you have sent. I checked the code under "MULTICDMA" define. This is exactly how I implemented CDMA transfers.

    Actually, I implemented it in two ways. The first is the same as yours: configure N descriptors with N 512-byte buffers before the transfer is started, then call USB_dmaRxStop() after the N-th CDMA interrupt. In the second, I configure one descriptor and one 512-byte buffer before starting the transfer. After each CDMA interrupt I configure the next descriptor and the next 512-byte buffer, until I have received N interrupts, and then I call USB_dmaRxStop(). Tx transfers are implemented in the same way.

    Both methods give me the same performance: 7 MB/s read and 4 MB/s write for N = 16 or 32.

    Thanks,

    Denis

  • Hi Denis,
    That is about what we got (60Mb/s) for read. One more thing you can try is to change the SD card access from 512-byte block read/write to 8KB, which is 16 512-byte CDMA descriptors. If you add them as one linked request, the number of interrupts drops from 16 to 1. The USB interrupt processing is very time consuming. Of course, you will have to do some buffer management on the SD card read. Some of the read-ahead may be wasted, but the benefit is obvious. In fact, the time difference between reading a 512-byte block and reading an 8KB block is very small, due to the nature of the SD card read operation.
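
A minimal sketch of the read-ahead buffer management described above, under stated assumptions: all names (read_ahead_cache, cache_read_sector, demo_read_sectors) are hypothetical and not CSL APIs, and it is written as portable host C with 8-bit bytes. On the C55x, where the smallest addressable unit is 16 bits, the buffers would be arrays of 16-bit words instead. A real implementation would also need to clamp the read at the end of the card and invalidate the cache on writes.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_BYTES  512u
#define CACHE_SECTORS 16u   /* 16 x 512 bytes = 8KB, as suggested above */

/* Hypothetical media callback: reads `count` sectors starting at `lba`
 * into `dst`. Returns 0 on success. */
typedef int (*read_sectors_fn)(uint32_t lba, uint32_t count, uint8_t *dst);

typedef struct {
    uint8_t         data[CACHE_SECTORS * SECTOR_BYTES];
    uint32_t        first_lba;    /* LBA of the first cached sector */
    int             valid;        /* non-zero once data[] holds real sectors */
    uint32_t        media_reads;  /* number of multi-sector reads issued */
    read_sectors_fn read_sectors;
} read_ahead_cache;

static void cache_init(read_ahead_cache *c, read_sectors_fn fn)
{
    c->first_lba = 0;
    c->valid = 0;
    c->media_reads = 0;
    c->read_sectors = fn;
}

/* Returns one 512-byte sector. On a miss, fetches CACHE_SECTORS sectors
 * in a single media read so that following sequential requests hit. */
static int cache_read_sector(read_ahead_cache *c, uint32_t lba, uint8_t *dst)
{
    if (!c->valid || lba < c->first_lba ||
        lba - c->first_lba >= CACHE_SECTORS) {
        if (c->read_sectors(lba, CACHE_SECTORS, c->data) != 0) {
            c->valid = 0;
            return -1;
        }
        c->first_lba = lba;
        c->valid = 1;
        c->media_reads++;
    }
    memcpy(dst, c->data + (lba - c->first_lba) * SECTOR_BYTES, SECTOR_BYTES);
    return 0;
}

/* Stand-in for the SD driver, for host testing only: fills every sector
 * with the low 8 bits of its LBA. */
static int demo_read_sectors(uint32_t lba, uint32_t count, uint8_t *dst)
{
    uint32_t i;
    for (i = 0; i < count; i++) {
        memset(dst + i * SECTOR_BYTES, (int)((lba + i) & 0xFFu), SECTOR_BYTES);
    }
    return 0;
}
```

With this structure, 16 sequential single-sector requests from the host cost one 8KB media read instead of 16 separate 512-byte reads. A random-access request costs one 8KB read, most of which is wasted, which is the trade-off mentioned above.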
    Best regards, Ming
  • Hi Ming,

    I appreciate your help.

    Actually I implemented SD read and write with multiple sectors (or blocks) operation. I tried 16, 32 and 64 blocks. Indeed there is a significant performance gain.

    I also have some updates from my side. I found out how to solve the problem of the many NAK packets in the USB traffic during write operations.

    I noticed that the USB NYET feature was disabled in the CSL library. I don't know the reason, and the code does not explain why it was disabled:

    #if 0 ///MW, need set the DISNYET (bit 12) too
        /* Enable the USB DMA for Rx operation */
        usbRegisters->PERI_RXCSR_INDX |= CSL_USB_PERI_RXCSR_DMAEN_MASK;
    #else
        /* Enable the USB DMA for Rx operation (bit 13) and set DISNYET (bit 12) */
        usbRegisters->PERI_RXCSR_INDX |= (CSL_USB_PERI_RXCSR_DMAEN_MASK|CSL_USB_PERI_RXCSR_DISNYET_MASK);
    #endif

    When I enabled NYET, I achieved 8 MB/s on write. With the SD write commented out, I saw 19 MB/s of USB bandwidth. Quite a nice result!

    One more thing. My colleague also ordered an eZdsp5535 development board and tested the firmware with my changes on his side. He got quite different results: 11 MB/s for read and 8 MB/s for write using the same SD card. So his read performance was much better, 11 MB/s vs the 7 MB/s that I saw. I checked the USB traces from his runs and saw that each USB IN transaction took less time, about 20 us. In my own measurements I tried different USB cables of different lengths and qualities; the shortest was 1 foot long. The results didn't change. So the difference could be caused by the development board hardware.

    So all in all we now have 11MB/s read and 8MB/s write, and we are still looking for ways to increase the performance. The next step is to make USB and SD work in parallel.
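
One way to express "enabling NYET" is sketched below. This is an illustrative, host-testable model rather than the actual CSL code: the mask values are derived only from the bit numbers in the comments above (DMAEN = bit 13, DISNYET = bit 12), rxcsr_enable_dma() is a hypothetical helper, and clearing DISNYET explicitly (instead of just not setting it) is an assumption.

```c
#include <stdint.h>

/* Assumed from the comments in the CSL snippet above:
 * bit 13 = DMAEN, bit 12 = DISNYET. */
#define RXCSR_DMAEN_MASK   ((uint16_t)(1u << 13))
#define RXCSR_DISNYET_MASK ((uint16_t)(1u << 12))

/* Hypothetical helper: returns the new RXCSR value with Rx DMA enabled.
 * allow_nyet != 0 clears DISNYET so the endpoint may answer with NYET;
 * allow_nyet == 0 sets DISNYET, matching the #else branch above. */
static uint16_t rxcsr_enable_dma(uint16_t rxcsr, int allow_nyet)
{
    rxcsr |= RXCSR_DMAEN_MASK;
    if (allow_nyet) {
        rxcsr &= (uint16_t)~RXCSR_DISNYET_MASK;
    } else {
        rxcsr |= RXCSR_DISNYET_MASK;
    }
    return rxcsr;
}
```

On the target, the returned value would be written back to usbRegisters->PERI_RXCSR_INDX for the selected endpoint.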

    Regards,

    Denis