SPI EDMA performance

Hi

I'm having trouble getting good SPI performance with the C6748 SOM on the Logic EVM. I've configured the DSP as SPI master at 7 Mbit/s (which turns out to be 7.35 Mbit/s) using DMA. When my application calls GIO_write(), all data bytes are clocked out at 7.35 Mbit/s and there is no gap between individual bytes. However, there is a 53 us delay between CS going low and the actual start of the transfer. Once the transfer is finished, there is another 40 us delay before CS returns high. I've explained this in a little more detail (including scope shots) here: http://e2e.ti.com/support/embedded/f/355/t/98047.aspx#342705.
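To put those delays in perspective, here is a rough calculation sketch. It is not taken from the driver: the function name is made up for illustration, and it assumes the 53 us and 40 us delays are a fixed cost per transaction and that the payload is clocked out back to back.

```c
#include <assert.h>
#include <stddef.h>

/* Effective SPI throughput in Mbit/s for a single transaction.
 * Assumption: the CS setup and CS release delays are fixed per
 * transaction and the payload runs without gaps at bit_rate_mbps. */
double spi_effective_mbps(size_t n_bytes, double bit_rate_mbps,
                          double cs_setup_us, double cs_release_us)
{
    assert(n_bytes > 0 && bit_rate_mbps > 0.0);
    double bits = 8.0 * (double)n_bytes;
    double payload_us = bits / bit_rate_mbps;  /* 1 Mbit/s == 1 bit per us */
    double total_us = cs_setup_us + payload_us + cs_release_us;
    return bits / total_us;
}
```

With the numbers above, `spi_effective_mbps(4, 7.35, 53.0, 40.0)` comes out at roughly 0.33 Mbit/s, while a 1024-byte transfer reaches about 6.8 Mbit/s, so the overhead mainly hurts short transfers.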

It also seems that I'm not the only one noticing this problem; for example, http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/p/12370/48359.aspx#48359 and http://e2e.ti.com/support/embedded/f/355/p/79653/283111.aspx#283111 seem like similar issues.

I've tracked the delays down to EDMA3_LOG_EVENT() calls, of which several happen in the time between CS going low and start of the clock.

What I'd like to know now:

  1. Are the delays that I see normal?
  2. Are there any methods to reduce these delays? For example:
    1. Disabling the EDMA3 logging with some configuration setting
    2. Recompiling EDMA3 drivers without logging?
    3. Cache settings? How can I see my current cache settings and what would the recommended cache settings be?

In case recompiling EDMA3 drivers is recommended: which project should I use? So far I've found these projects:

  • edma3_drv_bios_c6748_sample_lib
  • edma3_drv_bios_lib
  • edma3_drv_bios_lib_c674
  • edma3_drv_bios_c674_XDC
  • edma3_drv_bios_lib_XDC
  • edma3_rm_bios_c6748_lib
  • edma3_rm_bios_c6748_sample_lib

Or should I recompile all of them?

Cheers

Admar

PS: For the record: I have the Logic EVM with C6748 SOM, psp drivers 01_30_00_05, BIOS 5.41.09.34 and edma3_lld_01_10_00_01 (at least, I think it is using that one).

  • I've just confirmed that the UART driver examples also suffer from poor DMA performance. I have a setup here with a PC connected to UART2 on the C6748 on the Logic EVM, using 921600 baud 8N1.

    Transmitting data from the dsp to the pc seems to work fine. I have not done any extensive benchmarks yet, but so far it seems fast. (But I expect there are some delays at start and end of the transaction. However, since I transfer a block of bytes at once, these delays form a small penalty.)

    Transmitting data from the pc to the dsp, on the other hand, is tricky: if the dsp uses DMA for the UART handling, I have to insert delays of at least 250 ms between bytes. I think this is because of the nature of my communication protocol (variable length, so I have to receive one byte per transaction). For a complete frame of a few bytes (say, 24), all these delays add up to a considerable time.
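    As a rough illustration of how those delays add up, the following sketch computes the time a frame spends on the wire for 8N1 framing plus a fixed gap between bytes. The function name is hypothetical and the model ignores any FIFO or driver behaviour.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Time in microseconds to move n_bytes over a UART with 8N1 framing
     * (1 start + 8 data + 1 stop = 10 bit times per byte), plus a fixed
     * gap inserted between consecutive bytes. */
    double uart_frame_time_us(size_t n_bytes, double baud, double gap_us)
    {
        assert(n_bytes > 0 && baud > 0.0 && gap_us >= 0.0);
        double byte_us = 10.0 * 1e6 / baud;
        return (double)n_bytes * byte_us + (double)(n_bytes - 1) * gap_us;
    }
    ```

    At 921600 baud, `uart_frame_time_us(24, 921600.0, 0.0)` is about 260 us, whereas the same 24 bytes with a 250 ms gap (`gap_us = 250000.0`) take about 5.75 s.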

    If the dsp uses interrupt method for UART handling, I do not have to insert any delays at all.

    I haven't looked at the GIO_submit() routines that I use to send and receive bytes, but I wouldn't be surprised if they eventually call EDMA3_LOG_EVENT() as well.

    Anyone from TI who can shed some light on this?

    Admar

  • Hi Admar,

    I would like to know a few more things about your setup. When you configure the UART in DMA mode of operation:

    How many bytes are you reading?

    What is the RX trigger level set to?

    What happens if you do not insert any delays between bytes? Is the data read being corrupted, or are you reading repeated characters?

    The BIOS PSP we provide with the application has a different setup. We operate the UART only up to 115200 baud and in 3 modes: POLLED mode, INTERRUPT mode, and DMA mode. The sample application simply reads 1000 bytes from the serial console and writes them back to the serial console in both INTERRUPT and DMAINTERRUPT mode. I hope you do not see any issues with this.

    The UART driver is tested at various baud rates from 2400 up to 115200. It is also tested for continuous data transfers of 1K, 8K and 50K bytes to the UART serial console in both interrupt and DMA modes.

     

    Thanks & regards,

    Raghavendra 

  • Hi Raghavendra,

    Thanks for getting back to me.

    About the number of bytes that I am reading: just 1 at a time. The protocol contains messages of variable length, so I have to at least partially decode it before I know how long the message will be. To make things easier for me, I've implemented it right now to always read one byte at a time, though I could change that to read the remainder of the message once the length is known. That still leaves me with delay issues for the first part of the message where the length is not known.
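    The "read the remainder once the length is known" idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the two-byte header layout, the `read_fn` callback standing in for a blocking driver read, and the memory-backed reader, which exists only so the logic can be tried off-target.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define HDR_LEN 2u  /* hypothetical header: [type][payload length] */

    /* Stand-in for a blocking driver read; returns 0 on success. */
    typedef int (*read_fn)(void *ctx, uint8_t *dst, size_t n);

    /* Receive one frame in two transactions instead of one per byte.
     * Returns the total frame length, or -1 on error or if buf is too small. */
    int recv_frame(read_fn rd, void *ctx, uint8_t *buf, size_t cap)
    {
        assert(rd != NULL && buf != NULL);
        if (cap < HDR_LEN || rd(ctx, buf, HDR_LEN) != 0)
            return -1;
        size_t payload = buf[1];
        if (HDR_LEN + payload > cap)
            return -1;
        if (payload > 0 && rd(ctx, buf + HDR_LEN, payload) != 0)
            return -1;
        return (int)(HDR_LEN + payload);
    }

    /* Memory-backed reader used only to exercise recv_frame off-target. */
    struct mem_src { const uint8_t *data; size_t len, pos; unsigned calls; };

    int mem_read(void *ctx, uint8_t *dst, size_t n)
    {
        struct mem_src *s = (struct mem_src *)ctx;
        if (s->len - s->pos < n)
            return -1;              /* not enough data left */
        memcpy(dst, s->data + s->pos, n);
        s->pos += n;
        s->calls++;                 /* one call models one driver transaction */
        return 0;
    }
    ```

    For a frame with a 3-byte payload this needs two read transactions instead of five, so the per-transaction setup cost is paid twice per frame rather than once per byte. The header read itself still pays that cost.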

    About the RX trigger: I don't know what that is, and I don't remember changing that so I guess it is the same as in the sample application.

    If I don't insert delays, the DSP misses a few bytes and eventually stalls (though that might be caused by my packet decoding algorithm which has lost sync).

    The sample application works fine, and indeed it reads 1000 bytes at a time. However, I suspect there is quite some delay (several ms) before it is actually capable of receiving those bytes. I have not tried it (and I don't have time to do that atm), but I suspect that if you change the code to 1000 times reading a single byte (instead of one time reading 1000 bytes), the example will fail since the setup time to read a byte is too long.

    I know that the UART driver is not tested for speeds higher than 115200 baud. However, I don't think the high baud rate is causing me issues here since if I use INTERRUPT mode, things work fine. Besides that, I see similar issues with DMA mode in SPI communication, which also has these long delays before actually doing an SPI transfer. Again, if I use INTERRUPT mode for SPI communication, the delays (and thus the issues) are gone.

    I'm sorry that I don't have time to investigate things further this week. I'll get back to this project next week.

    Best regards

    Admar

  • Hi Admar,

    Since you are facing an issue while transmitting from the UART, please refer to the following thread:

    http://e2e.ti.com/support/embedded/f/355/p/92464/322006.aspx#322006

    This might help.

     

    Thanks & regards,

    Raghavendra

  • Admar,

     

    As you have mentioned, we have also observed the delay between CS going low and the actual start of the clock. And, once transfer is finished, there is another delay between end of clock and the CS returning to high.

     

    After debugging through the SPI driver code, we have found that some small changes in the SPI driver (software) can reduce the latency between CS going low and the start of the clock.

    Solution:

     

    In the 'Spi_edma.c' file, we write to "*spiDat1Cfg" which would assert the CS HOLD line, and then the EDMA configuration is done for Rx and Tx events. After EDMA configuration, the EDMA transfer is enabled for both Rx and Tx. Because of the EDMA settings being done between asserting the CS HOLD line and enabling transfer, a considerable amount of delay could be expected between CS going low and the start of the clock. So to minimize this to a certain extent, we can try writing to "*spiDat1Cfg" just prior to enabling the transfer (preferably before EDMA enable for Tx event).
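    The reordering described above can be modelled with a small sketch. This is not the driver code: the step names are hypothetical labels rather than identifiers from Spi_edma.c, and the relative order of the Rx and Tx steps is an assumption. The function only counts how many steps run while CS is already low but the Tx channel is not yet enabled.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical labels for the steps described in the post. */
    enum spi_step {
        WRITE_SPIDAT1_CFG,  /* asserts CS (CS HOLD) */
        CONFIG_EDMA_RX,
        CONFIG_EDMA_TX,
        ENABLE_EDMA_RX,
        ENABLE_EDMA_TX      /* after this the transfer can start */
    };

    /* Count the steps executed between asserting CS and enabling Tx.
     * Returns -1 if either marker is missing or they are out of order. */
    int steps_between_cs_and_tx(const enum spi_step *seq, size_t n)
    {
        assert(seq != NULL);
        int cs_at = -1;
        for (size_t i = 0; i < n; i++) {
            if (seq[i] == WRITE_SPIDAT1_CFG && cs_at < 0)
                cs_at = (int)i;
            else if (seq[i] == ENABLE_EDMA_TX)
                return (cs_at < 0) ? -1 : (int)i - cs_at - 1;
        }
        return -1;
    }

    /* Order as described for the unmodified driver. */
    const enum spi_step original_order[] = {
        WRITE_SPIDAT1_CFG, CONFIG_EDMA_RX, CONFIG_EDMA_TX,
        ENABLE_EDMA_RX, ENABLE_EDMA_TX
    };

    /* Suggested order: write spiDat1Cfg just before enabling Tx. */
    const enum spi_step modified_order[] = {
        CONFIG_EDMA_RX, CONFIG_EDMA_TX, ENABLE_EDMA_RX,
        WRITE_SPIDAT1_CFG, ENABLE_EDMA_TX
    };
    ```

    In this model the original order leaves three steps inside the CS-low window and the modified order leaves none, which is why the delay before the clock starts should shrink.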

     

    Please refer to the difference report, 6232.Spi_edma.c_file _Difference_ Report.html, attached to this post (the right side of the report is the modified code; only the required portion of the file is included in the report). Make the corresponding changes in the SPI driver file Spi_edma.c.

     

    In our observation, this reduces the delay. Do let me know the result.

     

    Thanks and Regards,

    Raghavendra

  • Hi Raghavendra,


    We have a similar question, but related to the McBSP performance of the PSP BIOS driver.
    Do you have similar tips for the McBSP BIOS driver?

    Looking at the performance figures in the BIOS PSP 1.30.01 datasheet below, the performance seems quite low, with a high CPU load (up to 70%):
    http://software-dl.ti.com/dsps/dsps_public_sw/psp/BIOSPSP/01_30_01/content/OMAPL138_BIOSPSP_Datasheet.pdf
    Do you have more information on the test conditions?
       Was it with data in SDRAM or internal memory?
       Was it in loopback mode or connected to another device?
       Was it measured, as for the Linux driver, using a second OMAP-L138 board?
       Were the performance figures measured using the provided example?

    Also, do you know what the bottleneck is for those performance figures?

    Thanks in advance and best regards,

    Anthony

     

  • Hi Anthony,

    AnBer said:
    Looking at the performance in the below BIOS PSP 1.30.01 datasheet document it seems that the performances are quite low

    If you take a quick look at the datasheet, for a bit clock of 3 MHz you are getting a throughput of ~3 Mbps (2893.52 kbps), which is good. :) Yes, the CPU load is quite high, though.

    AnBer said:
    Do you have more information on the test conditions?

    It's a board-to-board transfer test (master/slave configuration), and the write performance is captured.

    In order to increase the throughput, you can increase the bit clock by increasing the word length (max 32 bits), the sample rate, or the number of channels (max 128).

    E.g., configure a bit clock of 10 MHz to get a throughput of ~10 Mbps.
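    How the three parameters combine into a bit clock can be sketched as follows, assuming a frame with no idle bits, so that each sample period carries one word per channel. The function names are made up for illustration and are not part of the McBSP driver API.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Bit clock in Hz for a frame without idle bits: every sample period
     * carries `channels` words of `word_len_bits` bits each. */
    uint64_t mcbsp_bit_clock_hz(uint32_t word_len_bits, uint32_t channels,
                                uint32_t sample_rate_hz)
    {
        assert(word_len_bits >= 1 && word_len_bits <= 32);
        assert(channels >= 1 && channels <= 128);
        return (uint64_t)word_len_bits * channels * sample_rate_hz;
    }

    /* Fraction of the bit clock that shows up as measured throughput. */
    double link_efficiency(double throughput_kbps, double bit_clock_khz)
    {
        assert(bit_clock_khz > 0.0);
        return throughput_kbps / bit_clock_khz;
    }
    ```

    With illustrative values of 32-bit words, 2 channels and a 48 kHz sample rate, `mcbsp_bit_clock_hz(32, 2, 48000)` gives 3,072,000 Hz. The datasheet figure quoted above (2893.52 kbps at a 3 MHz bit clock) corresponds to `link_efficiency(2893.52, 3000.0)`, about 0.96.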

    Hope this helps.

     

    Thanks & regards,

    Raghavendra