
SPI EDMA Timings

I have noticed significant delays when using SPI1 with EDMA to transmit 24-bit data. With data being output continuously, I see 64uS before CS goes low and 37uS before it goes high again. The data itself takes only 3uS to transfer.

I found a thread, "SPI EDMA Performance" which discussed the same problem and there was a modification to "Spi_edma.c" which was recommended. I made this mod and rebuilt the library but did not notice any difference in timings.

Are these timings to be expected? If so they put a severe limitation on the use of SPI with EDMA.

Any suggestions would be appreciated.

  • Iain,

    I guess you are referring to this post: http://e2e.ti.com/support/embedded/f/355/p/99324/361191.aspx#361191 Am I right?

    After changing the driver, make sure that you rebuild the driver as well as the application. (I just want to rule out all the possible errors.)

    I assume the SPI word length is 8 bits and 3 SPI words are being transferred.

    Please mention the PSP version and the EVM being used.

    Could you please make sure that no other delays are being introduced, such as c2TDelay, t2CDelay, etc.?

    Is it possible to check whether the slave device is causing the delay? If possible, can you share the slave device data sheet?

    Please try the same experiment in interrupt mode of operation and verify the delays.

    Let me know your observation.

    Thanks and Regards,

    Sandeep K

  • Hi Sandeep,

    Yes, I am referring to that post.

    I built both debug and release drivers although I am only using release. The debug driver adds additional delays.

    The SPI word length is 8 and 3 words are being transferred.

    I am using psp 1.30.00.05 & edma_lld 1.10.00.01 on the OMAP L138/C6748 Zoom EVM.

    I was using the default delays but I have now set both to zero to make sure.

    The equivalent timings using interrupt mode are 21uS to set up and 12uS CS active.

    I have attached the DAC datasheet.

    Regards Iain

    nano quad AD5624R_5644R_5664R.pdf
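For readers following along, a 24-bit DAC frame sent as three 8-bit SPI words can be sketched as below. This is only an illustration: the field layout (2 don't-care bits, 3 command bits, 3 address bits, 16 data bits, MSB first) is an assumption about this DAC family and should be verified against the attached datasheet, and `dac_frame` and `frame_to_words` are hypothetical helper names, not PSP functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed frame layout (verify against the datasheet):
 * bits 23..22 don't care, 21..19 command, 18..16 address, 15..0 data. */
uint32_t dac_frame(unsigned command, unsigned address, uint16_t data)
{
    assert(command < 8u && address < 8u);
    return ((uint32_t)command << 19) | ((uint32_t)address << 16) | data;
}

/* Split a 24-bit frame into three 8-bit SPI words, most significant first. */
void frame_to_words(uint32_t frame, uint8_t words[3])
{
    words[0] = (uint8_t)(frame >> 16);
    words[1] = (uint8_t)(frame >> 8);
    words[2] = (uint8_t)frame;
}
```

The three words have to go out back to back with CS held low, which is why gaps between the byte transfers matter for this device.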
  • Iain,

    Thanks for the info, I will go through the data sheet.

    If you have adopted the modification provided in the post "SPI EDMA Performance", then there should be at least some change in the delay between CS hold going low and the data start. Since we have moved the CS hold to just before the EDMA enable for the Tx (since the Tx empty interrupt will be generated first), there will be a reduction in that delay.

    In the case of the data stop and the CS going high, there could be some time spent in the EDMA ISR and then inside the EDMA callback (in the SPI driver) before the CS hold is reset. So the modification resets the CS hold immediately when it enters the Rx or Tx callback.

    Just for clarification's sake, could you please verify your changes against the recommended changes?

    Thanks and Regards,

    Sandeep K

  • Sandeep,

    Just to confirm, I have double-checked the recommended changes and they are correct.

    Regards Iain

  • Sandeep,

    Have you any more information for me on this problem? Should I be considering dumping the psp and doing it all myself?

    I have also found that there is no change in timings between GIO_write and GIO_submit. I would have expected the latter to finish quicker since it doesn't have to wait for the callback.

    Regards Iain

  • Iain,

    We have tested the proposed change in our setup, which has the SPI flash on the C6748 EVM. Is it possible for you to test this change with the application (read/write with the SPI flash) which comes with the PSP, so that we can eliminate the slave device interference?

    Note: Before testing the PSP application, please build the driver library and then the sample application.

    Let me know the result.

    Iain Scott said:
    I have also found that there is no change in timings between GIO_write and GIO_submit. I would have expected the latter to finish quicker since it doesn't have to wait for the callback.

     

    GIO_write() is a synchronous API and GIO_submit() is the asynchronous API (if a callback is used). So by using GIO_submit() you can save time in the application by not blocking for the I/O completion. But this cannot reduce the gap between the CS hold going low and the data start, or between the data end and the CS hold going high.

    BTW, how are you measuring the timing? I guess you are using the oscilloscope.

    Thanks and Regards,

    Sandeep K

  • Sandeep,

    I did as you suggested with the psp application. I monitored the 2 GIO_writes in the Spi_Flash_WriteEnable function.

    The first CS went low for less than 1uS. There was a gap of 89uS before the second CS went low for 22uS. Timings measured by scope.

    This illustrates perfectly the point I am trying to make. Between the first GIO_write and the second there are only a few structure variables being set yet it takes 89uS before the next CS low. Is all this time lost in the GIO_write?

    Similarly there are only 2 bytes being transferred at 20MHz, again <1uS, yet the CS is low for 22uS. This time must also be lost in GIO_write.

    These times are not that different to those I am measuring with the DAC ie 103uS total.
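The arithmetic behind these figures can be checked with a small sketch. It uses only the numbers quoted above (2 bytes at 20 MHz, 22 us CS low, 89 us gap); the function names are hypothetical and the code runs on any host, not just the DSP.

```c
#include <assert.h>
#include <stdio.h>

/* Time on the wire, in microseconds, for n_bits clocked at clock_hz. */
double wire_time_us(unsigned n_bits, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)n_bits / clock_hz * 1e6;
}

/* Fraction of an observed interval that is actual data transfer. */
double wire_fraction(double wire_us, double observed_us)
{
    assert(observed_us > 0.0);
    return wire_us / observed_us;
}

/* Print the comparison for the flash test figures quoted above. */
void report_flash_timing(void)
{
    const double wire_us   = wire_time_us(16u, 20e6); /* 2 bytes at 20 MHz    */
    const double cs_low_us = 22.0;                    /* measured CS low time */
    const double gap_us    = 89.0;                    /* measured gap         */

    printf("wire time           : %.2f us\n", wire_us);
    printf("share of CS low     : %.1f %%\n",
           100.0 * wire_fraction(wire_us, cs_low_us));
    printf("share of gap + CS low: %.1f %%\n",
           100.0 * wire_fraction(wire_us, cs_low_us + gap_us));
}
```

Sixteen bits at 20 MHz occupy 0.8 us, so the data itself accounts for only a few percent of the 22 us that CS is low.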

    I think the library mod is a bit of a red herring because it may change the CS timings but it does nothing for the time duration of GIO_write.

    Please let me know your thoughts.

    Regards Iain

  • Iain,

    Iain Scott said:

    I did as you suggested with the psp application. I monitored the 2 GIO_writes in the Spi_Flash_WriteEnable function.

    The two GIO_write() calls are entirely different. One is writing the "Write enable command", which obviously takes less time. The other one is writing a "Read status register command" and then reading the status from the SPI flash.

     

    Iain Scott said:

    This illustrates perfectly the point I am trying to make. Between the first GIO_write and the second there are only a few structure variables being set yet it takes 89uS before the next CS low. Is all this time lost in the GIO_write?

    Yes, GIO_write() comes into the picture during the 89us, which is the gap between the two transactions, i.e. between the completion of the first GIO_write() and the start of the second GIO_write().

    I think there is a confusion here, I was under the impression that you are worried about the gap between the

    1. CS going low and then the data start

    2. Data end and the CS going up.

    Note: The fix provided is meant to reduce the above two gaps.

    But as per your comment, you are worried about the time gap between the two successive GIO_write(). Am I right?

     

    Iain Scott said:

    Similarly there are only 2 bytes being transferred at 20MHz, again <1uS, yet the CS is low for 22uS. This time must also be lost in GIO_write.

    No, GIO_write() does not come into the picture between CS going low and then high (i.e. during the 22us).

    Explanation:

    When there is a request to the driver for read/write [i.e GIO_write()], the following sequence will be executed,

    1. Driver configures the EDMA parameter.

    2. Just before enabling the EDMA, the CS hold will be enabled.[Fix]

    3. EDMA enabled.

    4. Wait for the EDMA callback.

    Note: The EDMA callback will be generated whenever the EDMA completes its transaction. On occurrence of the EDMA completion interrupt, the EDMA completion handler will parse through the IPR register to get the appropriate "tcc" for which the EDMA has completed. Once it has the appropriate "tcc", the registered callback of the device (SPI) will be called, and in the callback the device (SPI) specific configurations are made to complete the I/O.

    5. In the (Tx/Rx) callback, CS hold will be disabled. Then device specific configurations are made to complete the I/O.

    In the above sequence, the software overhead is always constant whatever the length of the data. In my opinion, it is better to measure the CS low time for a larger number of data words, which will narrow down the software overhead.

    And also, if the transaction involves the command and the response from the slave device, then the total duration depends on how fast the slave device responds to the command.
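The effect of a constant per-request overhead on transfers of different lengths can be sketched as follows. The figures passed in are assumptions for illustration: roughly 100 us of fixed overhead (the order of magnitude reported in this thread) and an 8 MHz clock (implied by 24 bits taking 3 us). The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Fraction of a transaction spent on fixed per-request software overhead,
 * given the payload size, the SPI clock, and that fixed overhead. */
double overhead_fraction(unsigned n_bytes, double clock_hz, double overhead_us)
{
    double wire_us;
    assert(clock_hz > 0.0 && overhead_us >= 0.0);
    wire_us = (double)n_bytes * 8.0 / clock_hz * 1e6;
    return overhead_us / (overhead_us + wire_us);
}

/* Print the overhead share for a few payload sizes. */
void print_overhead_table(double clock_hz, double overhead_us)
{
    static const unsigned sizes[] = { 3u, 32u, 256u, 4096u };
    unsigned i;
    for (i = 0; i < sizeof sizes / sizeof sizes[0]; ++i) {
        printf("%5u bytes: overhead is %5.1f %% of the transaction\n",
               sizes[i],
               100.0 * overhead_fraction(sizes[i], clock_hz, overhead_us));
    }
}
```

Calling `print_overhead_table(8e6, 100.0)` shows the overhead dominating a 3-byte transfer and shrinking to a small share for multi-kilobyte transfers, which is the case EDMA suits best.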

     

    Thanks and Regards,

    Sandeep K

  • Sandeep,

    Sandeep Krishnaswamy said:

    But as per your comment, you are worried about the time gap between the two successive GIO_write(). Am I right?

    Yes you are right.

    I have an application where I have to send 2 DAC outputs within a 1mS time sample as well as other processing. If, as you say, the software overhead is constant to set up EDMA then each will take over 100uS to transmit 3uS of data. I have to use EDMA because of 24bit data which means I also have to use EDMA for the ADC which has to be on the same SPI channel. Consequently that performance will also be compromised. I have measured interrupt mode as running 3 times faster than EDMA. 
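As a rough budget check, the share of the 1 ms sample period taken by the DAC writes can be computed as below. The inputs are the figures quoted earlier in this thread (about 103 us per EDMA transfer, and 21 us + 12 us = 33 us in interrupt mode); `budget_share` is a hypothetical helper name.

```c
#include <assert.h>

/* Share of one sample period consumed by n transfers of a given cost. */
double budget_share(unsigned n_transfers, double per_transfer_us,
                    double period_us)
{
    assert(period_us > 0.0);
    return (double)n_transfers * per_transfer_us / period_us;
}
```

With these numbers, `budget_share(2u, 103.0, 1000.0)` is about 0.21, so two DAC writes alone take roughly a fifth of each sample period before any ADC transfers or other processing are counted.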

    Sandeep Krishnaswamy said:

    In the above sequence, the software overhead is always constant whatever the length of the data. In my opinion, it is better you measure the CS low for higher number of data transaction, which will narrow down the software overhead.

    Those two statements seem to be contradictory. If the predominant factor in an EDMA transaction is in the set up and callback then the duration of the CS select must be irrelevant.

    I am getting the impression that there is no solution to this.

    Regards Iain

  • Iain,

    Iain Scott said:

    Sandeep Krishnaswamy said:

    In the above sequence, the software overhead is always constant whatever the length of the data. In my opinion, it is better you measure the CS low for higher number of data transaction, which will narrow down the software overhead.

    Those two statements seem to be contradictory. If the predominant factor in an EDMA transaction is in the set up and callback then the duration of the CS select must be irrelevant.

    Sorry for the poor wording :( . What I intended to say was that if the data length is larger, the software overhead becomes negligible in comparison.

    Iain Scott said:
    I am getting the impression that there is no solution to this

     

    No, there are a couple of things we can do to reduce the gap between two successive requests.

    1. If you are sending only 3 bytes at a time, then you can try the polled or interrupt mode. The software overhead is ordered polled mode < interrupt mode < EDMA mode, but the CPU load is ordered polled mode > interrupt mode > EDMA mode.

    2. If you are concerned about the CPU load, then you can use interrupt or EDMA mode, but instead of the synchronous GIO_write() call you need to use the asynchronous GIO_submit() call with a callback function.

    Eg:

    /* Declare two callbacks for two requests - refer to ti/bios/include/gio.h for type definitions */
    struct GIO_AppCallback spiCallback1;
    struct GIO_AppCallback spiCallback2;

    spiCallback1.fxn = &callback_fxn1;
    spiCallback1.arg = &callback_arg1; /* note that this is a pointer to a user (reference) variable */
    spiCallback2.fxn = &callback_fxn2;
    spiCallback2.arg = &callback_arg2; /* note that this is a pointer to a user (reference) variable */

    /* First request: dataparam1 points to its own data parameter structure and buffers */
    dataparam1->inBuffer   = loopRead1;
    dataparam1->outBuffer  = loopWrite1;
    dataparam1->bufLen     = 3u;
    dataparam1->dataFormat = Spi_DataFormat_0;
    dataparam1->flags      = Spi_CSHOLD;
    dataparam1->chipSelect = SPI_CHIPSELECT_SPIFLASH;

    GIO_submit(spiHandle, IOM_WRITE, dataparam1, &size, &spiCallback1);
    /* This would return IOM_PENDING/IOM_COMPLETED and hence check for anything else which indicates error. The actual status of the transaction comes up in the callback */

    /* Second request: dataparam2 points to a separate structure with separate buffers */
    dataparam2->inBuffer   = loopRead2;
    dataparam2->outBuffer  = loopWrite2;
    dataparam2->bufLen     = 3u;
    dataparam2->dataFormat = Spi_DataFormat_0;
    dataparam2->flags      = Spi_CSHOLD;
    dataparam2->chipSelect = SPI_CHIPSELECT_SPIFLASH;

    GIO_submit(spiHandle, IOM_WRITE, dataparam2, &size, &spiCallback2);
    /* This would return IOM_PENDING/IOM_COMPLETED and hence check for anything else which indicates error. The actual status of the transaction comes up in the callback */

     

    Please also note that these are asynchronous calls, which means that they just queue the packet to the driver and return (the actual end of the transaction is indicated in the callback). The user therefore needs to provide separate buffer-related arguments for each GIO_submit() call, otherwise the packet may get corrupted.

     

    This way, since the driver is pre-queued with requests the loading of these buffers happens faster in the driver layers and there is no need to wait in the application for completion of each single packet and then queueing the next.
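Why each queued request needs its own buffer can be illustrated with a small host-side mock. This is a sketch only: `mock_driver`, `request`, `mock_submit` and `mock_complete_one` are hypothetical stand-ins written for this example and are not the DSP/BIOS GIO or PSP SPI driver API. The mock reads a request's buffer only when the request is serviced, so a buffer reused before completion would send the wrong data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 4u

/* Hypothetical stand-ins; these are NOT the DSP/BIOS GIO or PSP types. */
typedef void (*done_fn)(void *arg, int status);

typedef struct {
    const unsigned char *tx; /* caller-owned, must stay valid until done */
    size_t len;
    done_fn done;
    void *arg;
} request;

typedef struct {
    request pending[MAX_PENDING];
    size_t head, count;
    unsigned char wire[64];  /* records what "went out", for illustration */
    size_t wire_len;
} mock_driver;

/* Queue a request and return immediately; 0 on success, -1 if full. */
int mock_submit(mock_driver *d, const request *r)
{
    assert(d != NULL && r != NULL);
    if (d->count == MAX_PENDING) return -1;
    d->pending[(d->head + d->count) % MAX_PENDING] = *r;
    d->count++;
    return 0;
}

/* Stand-in for the completion interrupt: service the oldest request,
 * notify its owner, and leave the next one at the front of the queue. */
void mock_complete_one(mock_driver *d)
{
    request r;
    assert(d != NULL);
    if (d->count == 0) return;
    r = d->pending[d->head];
    d->head = (d->head + 1) % MAX_PENDING;
    d->count--;
    if (d->wire_len + r.len <= sizeof d->wire) {
        memcpy(d->wire + d->wire_len, r.tx, r.len);
        d->wire_len += r.len;
    }
    if (r.done) r.done(r.arg, 0);
}

/* Example completion callback: counts successfully finished requests. */
void count_done(void *arg, int status)
{
    if (status == 0) ++*(int *)arg;
}
```

In use, the application would submit both requests up front with distinct buffers, and then learn of each completion through its callback rather than blocking between the two.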

    Please try this with the interrupt as well as the EDMA mode and let me know the timing details.

    Thanks and Regards,

    Sandeep K

  • Sandeep,

    Sandeep Krishnaswamy said:

    1. If you are sending only 3 bytes at a time, then you can try the polled or Interrupt mode. Where software overhead for the polled mode < interrupt mode < EDMA mode. But with respect to CPU load, polled mode > interrupt mode > EDMA mode.

    I've been down this route already. If you use interrupt mode there are gaps between the byte transfers, which the DAC rejects, hence the reason for using DMA in the first place. I guess this is due to CPU loading.

    Sandeep Krishnaswamy said:

    2. If you are concerned about the CPU load, then you can use interrupt or EDMA mode but instead of synchronous (GIO_write()) call, you need to use the Asynchronous (GIO_submit()) call with the callback function.

    This is how I intended to do it anyway, however if you look at one of my earlier posts you will see I commented on the fact that whether using GIO_write or GIO_submit the timing between samples was much the same.

    So does that mean I am out of any other options?

    Regards Iain

  • Sandeep,

    I think we have probably taken this as far as we can now but I have one more question.

    Is this EDMA overhead all in the software library and not in the hardware?

    Regards Iain

  • Iain,

     

    Iain Scott said:

    Is this EDMA overhead all in the software library and not in the hardware?

    As per my understanding, it is not the hardware overhead.

    But in your case it is not only the EDMA. It is the combined effect of the EDMA (which is always inside the CS hold low period), the SPI driver (EDMA configuration, the EDMA callback and signalling the GIO layer), DSP/BIOS (GIO_write()/GIO_submit() and returning to the application once signalled by the driver) and the application.

    So if we want to reduce the time gap, we need to measure the delay introduced in each of the layers and then try to reduce it.

    Thanks and Regards,

    Sandeep K
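Measuring the delay layer by layer could be approached with a simple timestamp probe like the sketch below. It is a host-runnable illustration using the standard `clock()` function, whose resolution is far too coarse for microsecond work. On the target one would presumably substitute a CPU cycle counter or toggle a GPIO pin and watch it on the scope, which is an assumption to check against the compiler and device documentation. The `probe` type and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define MAX_MARKS 8u

typedef struct {
    const char *label[MAX_MARKS];
    clock_t stamp[MAX_MARKS];
    unsigned count;
} probe;

/* Record a labelled timestamp; marks beyond MAX_MARKS are dropped. */
void probe_mark(probe *p, const char *label)
{
    assert(p != NULL);
    if (p->count < MAX_MARKS) {
        p->label[p->count] = label;
        p->stamp[p->count] = clock();
        p->count++;
    }
}

/* Print the elapsed time between consecutive marks, in microseconds. */
void probe_report(const probe *p)
{
    unsigned i;
    assert(p != NULL);
    for (i = 1; i < p->count; ++i) {
        double us = (double)(p->stamp[i] - p->stamp[i - 1]) * 1e6
                    / CLOCKS_PER_SEC;
        printf("%-20s -> %-20s %10.1f us\n",
               p->label[i - 1], p->label[i], us);
    }
}
```

Marks placed at the application call, the driver entry, the EDMA enable and the callback would show which layer contributes most of the gap.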

  • Sandeep,

    Sandeep K said:

    So if we want to reduce the time gap, we need to measure the delay introduced in each of the layers and then try to reduce it.

    Would the PSP development team @ TI be prepared to look at this?

    Regards Iain

  • Hi Iain Scott,

    Iain Scott said:

    Would the PSP development team @ TI be prepared to look at this?

    Even if we measure how much time BIOS takes and how much time it takes to configure EDMA, it may not help us to solve the problem. As I see from the previous posts, you are transferring 3 bytes at a time. For such transfers, using EDMA mode may not be efficient. Have you tried other modes as well?

    Regards,

    Shanmuga

  • Hi Shanmuga,

    I have tried interrupt mode and it is more efficient. However, as I explained earlier, there are gaps in the transmission which the DAC device rejects. The only way to drive a 24 bit or more SPI device is through EDMA.

    I think we have discussed this for long enough now and the conclusion would appear to be that the BIOS/PSP libraries are too inefficient for this type of application.

    Regards Iain

  • Dear David Friedland,

    I notice you have verified this thread as answered. The questions have never been answered properly.

    The issue has now been addressed by direct manipulation of the C6748 registers without PSP. What was a 100uS overhead has now been reduced to 800nS.

    It is clear that the "bloatware merchants" have taken over the world. This is supposed to be real time development. I can't believe anyone can consider this acceptable.

    If this is the best support that TI can provide I would give it zero stars for effort.

    Regards Iain Scott

  • Iain,

    Thanks for the "frank" feedback.  I don't actually own the PSP software per se, but as the forum moderator, I do try to mark threads as "answered" if the original poster has not done this, and the thread seems to be terminated, and there is no further posting for at least a week.  I occasionally get this wrong and the original submitter will take me to task, as you've done here.  This is actually valuable, as it flags the thread as needing further elevation, rather than just letting it simmer for some indefinite time. 

    I will forward your feedback up the chain to see if we can get a better resolution, at least in the longer term.

    Thanks again,

    Dave