OMAPL138 SPI-EDMA

Risto Hedman

Other Parts Discussed in Thread: OMAPL138, ADS8688

Hello,

In my application I have two ADS8688 converters daisy-chained to an OMAPL138. To read the ADCs I need to transfer 6 bytes. To read all 8 channels I need to do 8 x 6 B transfers, with CS going high between each 6 B transfer. I'm using EDMA to transfer each 6 B set. I'm using a single EDMA PaRAM set, which I reactivate after each 6 B transfer in an EDMA interrupt function - works fine.

Unfortunately I noticed that my EDMA activation cycle takes some 12 us, while the 6 B transfer itself takes some 5.5 us. I've prepared vectors of 8 x 6 B to transfer to SIMO and to receive from SOMI; between transfers I just advance these pointers by 6 B and pass the new pointer values to the EDMA start routine, so the actual data preparation does not take time.

I would like to ask the experts if there is a way to shorten my EDMA preparation time, maybe by using several PaRAM sets, taking into account that I need to bring CS up between transfers. Naturally, an alternative would be to use SPI interrupts without EDMA.

I'm using a SYS/BIOS Hwi, but I don't think it plays any role here. The OMAPL138 is currently running at 300 MHz, which I'm planning to increase to 456 MHz, but I don't think that will help significantly. All code and data are in L2.

Best Regards,

Risto

over 10 years ago

0 Sivaraj Kuppuraj over 10 years ago

TI__Mastermind 35645 points

Hi,

Thanks for your post.

It would be a good idea to shorten the EDMA preparation time by using multiple PaRAM sets. There is also a feature for breaking up large transfers, intermediate transfer chaining (ITCCHEN). With it enabled, we can prevent a large transfer from locking out other transfers of the same priority level for the duration of the transfer.

For example, a large transfer on queue 0 from internal memory to external memory through the EMIF would starve other EDMA3 transfers on the same queue. In addition, this large high-priority transfer would keep the EMIF from servicing other, lower-priority transfers for a long time. When a large transfer is considered high priority, it should be split into multiple smaller transfers.

An example may illustrate this. To move a single large block of memory (16 KB), the EDMA3 would perform an A-synchronized transfer. The element count is set to a reasonable value, where "reasonable" is derived from the time it takes to move that smaller amount of data. Assume 1 KB is a reasonably small transfer in this example. The EDMA3 is then set up to transfer 16 arrays of 1 KB each, for a total of 16 KB.

There are transfer chaining examples in the OMAPL138 TRM (spruh77a). Please refer to it.

Thanks & regards,

Sivaraj K


0 RandyP over 10 years ago

TI__Guru* 84110 points

Risto,

If I understand correctly, you set up the EDMA3 to transfer 6B. Something triggers the transfer, and after the transfer completes you get an interrupt. In the ISR, you set up the EDMA3 PARAM for another transfer of 6B, and something triggers the transfer, and so on. Is this correct?

What triggers the transfer (event like SPI or timer, or manual by the CPU writing to ESR, or chaining from some other transfer)?

Which CPU are you using, the ARM9 or the DSP C674x? My guess would be the DSP since you are using SYS/BIOS, but you could be using SYS/BIOS with the ARM9, I suppose.

What software are you using for setting up the EDMA3 PARAM? EDMA3 LLD or register CSL or other software?

How fast do you want the EDMA activation time to be? I expect it will drop from 12us to 8us when you increase the clock speed, while the transfer time will still be 5.5us.

What is the minimum time that CS needs to be high between transfers? Does it automatically go high after the last byte is read and will then go low when the next byte is read?

Could you show your code used to do the activation of the EDMA3 in your ISR, please? And also what you use to trigger the channel to start, if that is done with code. It could help if you also post the hex values in the 8 PARAM registers.

My guess is you are using EDMA3 LLD functions, which are very easy to use and very readable: they set up source parameters, then destination parameters, then individual settings in the OPT register, and so on. The reason for guessing this is that those functions do read-modify-write operations on the PARAM, and this is very slow. That is never a problem when the EDMA3 PARAM is set up during application initialization and is not needed again during normal operation. But when the R/M/W operations are done during real-time operation, each read and each write can require 20 clock cycles, and that adds up quickly when you do this multiple times on the same registers.

One 'simple' solution is to create a shadow struct or array of the 8 PARAM registers and put the values into that shadow struct/array using the multiple R/M/W functions. Then you can use a single LLD function that writes the full set of 8 registers to the PARAM at one time, which will be much faster. If there are registers that do not change (like ACNT, BCNT, CCNT, SRC Addr, others), you will not need to change those even in the shadow struct/array. Depending on your full setup of values, you might be able to reduce the time more, but I will wait until you have a chance to answer the questions above before speculating more.
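The shadow idea above could look roughly like the following. This is only a sketch, not code from the thread: the names `param_shadow_t` and `param_write_all` are made up for illustration, and the caller is assumed to pass a pointer to the start of the PaRAM set in question. The word order follows the standard EDMA3 PaRAM layout of eight 32-bit words.

```c
#include <stdint.h>

/* One EDMA3 PaRAM set is eight 32-bit words (32 bytes). */
#define PARAM_WORDS 8u

/* Hypothetical shadow copy kept in ordinary (fast) memory. */
typedef struct {
    uint32_t opt;          /* transfer options                        */
    uint32_t src;          /* source address                          */
    uint32_t a_b_cnt;      /* BCNT in bits 31:16, ACNT in bits 15:0   */
    uint32_t dst;          /* destination address                     */
    uint32_t src_dst_bidx; /* DSTBIDX in 31:16, SRCBIDX in 15:0       */
    uint32_t link_bcntrld; /* BCNTRLD in 31:16, LINK in 15:0          */
    uint32_t src_dst_cidx; /* DSTCIDX in 31:16, SRCCIDX in 15:0       */
    uint32_t ccnt;         /* CCNT in bits 15:0                       */
} param_shadow_t;

/* Write the whole shadow to a PaRAM set with eight plain stores,
 * no read-modify-write on the PaRAM itself. */
static void param_write_all(volatile uint32_t *param,
                            const param_shadow_t *shadow)
{
    const uint32_t *w = (const uint32_t *)shadow;
    unsigned i;

    for (i = 0; i < PARAM_WORDS; ++i)
        param[i] = w[i];
}
```

In the ISR, only the fields that change between transfers (for example `src` and `dst`) would be updated in the shadow before calling `param_write_all`; everything else is filled in once at initialization.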

You are right that the SYS/BIOS Hwi will not have any role in this activation time delay.

Regards,
RandyP

0 Risto Hedman over 10 years ago in reply to RandyP

Expert 1770 points

Hi Randy,

Thank you for your reply and sorry about many "guesses" you had to do due to the limited information from my side.

The first 6B transfer is triggered by a DSP timer Hwi every 50us (-> 20kHz); the EDMA Hwi(s) trigger the next 7. The CS high time for the ADC is 30ns (min). I've written my own EDMA functions and they use Starterware. I've not optimized them, as in my previous hardware I could transfer all 8 x 6B with a single EDMA.

If you don't mind, I will take a time-out on this issue to test the transfer with SPI interrupts without EDMA. I realized that I can transfer 16-bit words, so I need (only) 3 interrupts for 6 bytes (24 in total for all transfers).

If EDMA turns out to be the solution, what do you think about "linking transfers", so that we would have 8 primary PaRAM sets and 8 secondary PaRAM sets? PaRAM set 9 would be linked to PaRAM set 1, 10 to 2, 11 to 3, etc. Would it work so that when I trigger a transfer by software, e.g. with PaRAM set 4, that set is automatically reloaded with the content of PaRAM set 12 when exhausted? If so, would the time needed for the PaRAM set load be 0? If linking does not work this way, then the shadow you proposed is the best solution, as the PaRAM sets do not change in my app. If the shadow is loaded with an EDMA block transfer, its CPU usage is minimal.
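For reference, the linking question above can be modelled on a host PC as below. This is a simplified sketch under stated assumptions, not verified on hardware: it assumes the usual EDMA3 convention that PaRAM set n sits at offset 0x4000 + n * 0x20 from the channel controller base, that the LINK field holds that 16-bit offset, and that 0xFFFF is the null link. The function names are made up. Note that linking only reloads the PaRAM set when it is exhausted; it does not by itself trigger the next transfer.

```c
#include <stdint.h>

#define PARAM_WORDS       8u       /* 32-bit words per PaRAM set         */
#define PARAM_BASE_OFFSET 0x4000u  /* assumed PaRAM offset in the EDMA3CC */
#define PARAM_SET_SIZE    0x20u    /* bytes per PaRAM set                */
#define PARAM_LINK_NULL   0xFFFFu  /* "no link"                          */
#define LINK_WORD         5u       /* LINK lives in bits 15:0 of word 5  */

/* Value to put in the LINK field so that a set reloads from set 'n'. */
static uint16_t param_link_value(unsigned n)
{
    return (uint16_t)(PARAM_BASE_OFFSET + n * PARAM_SET_SIZE);
}

/* Host-side model of what happens when 'set' is exhausted:
 * a null link leaves a null set, otherwise the linked set is copied in.
 * Returns 0 on success, -1 if the link points outside the table. */
static int simulate_link_reload(uint32_t pa[][PARAM_WORDS],
                                unsigned n_sets, unsigned set)
{
    uint32_t link = pa[set][LINK_WORD] & 0xFFFFu;
    unsigned from, i;

    if (link == PARAM_LINK_NULL) {
        for (i = 0; i < PARAM_WORDS; ++i)
            pa[set][i] = 0u;
        pa[set][LINK_WORD] = PARAM_LINK_NULL;
        return 0;
    }
    from = (unsigned)(link - PARAM_BASE_OFFSET) / PARAM_SET_SIZE;
    if (link < PARAM_BASE_OFFSET || from >= n_sets)
        return -1;
    for (i = 0; i < PARAM_WORDS; ++i)
        pa[set][i] = pa[from][i];
    return 0;
}
```

With the numbering from the post, set 4 would carry `param_link_value(12)` in its LINK field. If set 12 also links to set 12, the copy that lands in set 4 links to set 12 again, so the reload repeats on every exhaustion without CPU involvement.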

Thank You,

Risto

0 RandyP over 10 years ago in reply to Risto Hedman

TI__Guru* 84110 points

Risto,

In case you come back from your SPI-interrupt testing, assuming that you want to continue looking at using EDMA, I will make a few comments now.

To make sure I understand the new information: When you get a timer interrupt, the DSP goes to the timer ISR and from there manually triggers the first 6B transfer. You then get an EDMA interrupt after each 6B transfer, and in that EDMA ISR you change the PARAM values and manually trigger another DMA transfer until all 8 transfers have been done. Please correct me if any of this is wrong or if I am leaving something out.

How fast do you want the EDMA activation time to be? I expect it will drop from 12us to 8us when you increase the clock speed, while the transfer time will still be 5.5us.
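The numbers behind this estimate can be checked with a few lines of C. This is just back-of-the-envelope arithmetic using the figures quoted in the thread (12 us setup, 5.5 us transfer, 300 MHz and 456 MHz, 8 transfers); it assumes the setup time scales inversely with the CPU clock, and the function names are illustrative.

```c
/* Scale a CPU-bound time measured at one clock rate to another rate,
 * assuming the time is inversely proportional to the clock. */
static double scale_us(double t_us, double f_old_mhz, double f_new_mhz)
{
    return t_us * f_old_mhz / f_new_mhz;
}

/* Total time for n back-to-back transfers, each consisting of a
 * CPU setup phase followed by the SPI transfer itself. */
static double total_us(int n, double setup_us, double xfer_us)
{
    return n * (setup_us + xfer_us);
}
```

With the figures from this thread, `scale_us(12.0, 300.0, 456.0)` gives roughly 7.9 us of setup per transfer. Eight transfers then take `total_us(8, 12.0, 5.5)` = 140 us at 300 MHz and about 107 us at 456 MHz, both well above the 50 us period, so a faster clock alone does not close the gap.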

Does CS automatically go high after the last byte is read and will then go low when the next byte is read? How fast is SPI running, in other words will 1 SPI clock of CS high be adequate for the 30ns minimum requirement?

Could you show your code used to do the activation of the EDMA3 in your ISR, please? It could help if you also post the hex values in the 8 PARAM registers. It is difficult to discuss a solution with linking without understanding the transfers and how they will work.

For example, does each set of 6B go to the same location as the last set of 6B? Or does each full set of 8x6B go into a sequential array of 48B? Are there several 48B arrays that you move through, like ping-pong buffers? Or are there 8 large arrays that you add 6B to each every 50us?

Showing the actual values you put into the PARAM for your transfers will help to clear this up as much as a text description. The code would also help but may be unclear if there are variables and constants used that are not clear.

Regards,
RandyP

0 Risto Hedman over 9 years ago in reply to RandyP

Expert 1770 points

Hi Randy,

Sorry it took so long to answer - I've been heavily loaded with some other, even more important, problems.

I finally had time to test this SPI-EDMA question and found a reasonable solution. Below I try to answer your questions and explain my solution.

I'm transmitting (TX) conversion commands (chX-cc in the table below) to two 8-channel AD converters from a pre-built vector. The ADCs are of type ADS8688 and operate in daisy-chained mode, where SIMO from the DSP goes to both ADCs and thus they get the very same command.

When several ADS8688s are daisy-chained, the SOMI line from the last ADS to the DSP outputs 16-bit words with content NA-AD1-AD2-AD3-..., where 'NA' corresponds to the conversion command on SIMO and AD1, AD2, ... are the conversion results from the selected channels of ADS-1, ADS-2, ADS-3, ... Conversion results flow to ping-pong buffers RX1 and RX2.

The problem when using EDMA is chip select (CS). It has to be up prior to the conversion cycle, go down for the duration of the conversion cycle (conversion command out and one conversion result from each ADS) and then rise again before the next cycle. In my app I need to perform 8 cycles in 50us, and EDMA is the only way to handle the timing.

In my solution I (pre)build a 32-bit (unsigned long) TX vector where the upper 16 bits are for CS control and the lower 16 bits for the conversion command. Every fourth TX word, named cs below, is for chip select steering only. EDMA transfers this vector to SPIDAT1 with params
sPaRAMsetTx.aCnt = 4;
sPaRAMsetTx.bCnt = 32;
sPaRAMsetTx.destAddr = SOC_SPI_1_REGS + SPI_SPIDAT1;
sPaRAMsetTx.srcBIdx = 4;
sPaRAMsetTx.destBIdx = 0;
sPaRAMsetTx.linkAddr = 0xffff;
sPaRAMsetTx.bCntReload = 0;
sPaRAMsetTx.srcCIdx = 0;
sPaRAMsetTx.destCIdx = 0;
sPaRAMsetTx.cCnt = 1;
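Building such a TX vector could look roughly like this. It is a sketch only: the function name, the choice of zero as filler data for the 'na' and 'cs' words, and the two control values passed in are assumptions, not taken from the original code. The actual upper 16 bits (CSHOLD, CSNR and so on in SPIDAT1) and the ADS8688 command words must come from the respective data sheets.

```c
#include <stdint.h>

#define N_CHANNELS      8u
#define WORDS_PER_CYCLE 4u   /* command, 2 result slots, CS release */
#define TX_WORDS        (N_CHANNELS * WORDS_PER_CYCLE)

/* Fill the 32-word TX vector.
 *  cmd[]        : 16-bit conversion command per channel (from data sheet)
 *  ctrl_active  : upper 16 bits of SPIDAT1 that keep the ADC CS asserted
 *  ctrl_release : upper 16 bits used for the CS-steering word
 * Data bits of the non-command words are zero here (an assumption). */
static void build_tx_vector(uint32_t tx[TX_WORDS],
                            const uint16_t cmd[N_CHANNELS],
                            uint16_t ctrl_active,
                            uint16_t ctrl_release)
{
    const uint32_t active  = (uint32_t)ctrl_active  << 16;
    const uint32_t release = (uint32_t)ctrl_release << 16;
    unsigned ch;

    for (ch = 0; ch < N_CHANNELS; ++ch) {
        uint32_t *w = &tx[ch * WORDS_PER_CYCLE];

        w[0] = active | cmd[ch]; /* chX-cc: conversion command        */
        w[1] = active;           /* na: clocks out first result word  */
        w[2] = active;           /* na: clocks out second result word */
        w[3] = release;          /* cs: chip select steering only     */
    }
}
```

Because the vector never changes at run time, it can be built once at start-up; the EDMA then only needs its source address reset for each 50us cycle.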

And receiving to RX1 or RX2 with params
sPaRAMsetRx.opt = (EDMA3_CHA_SPI1_RX << EDMA3CC_OPT_TCC_SHIFT) & EDMA3CC_OPT_TCC;
sPaRAMsetRx.srcAddr = SOC_SPI_1_REGS + SPI_SPIBUF;
sPaRAMsetRx.aCnt = 2;
sPaRAMsetRx.bCnt = 32;
sPaRAMsetRx.destAddr = 0; // Fill here PING or PONG when activating EDMA
sPaRAMsetRx.srcBIdx = 0;
sPaRAMsetRx.destBIdx = 2;
sPaRAMsetRx.linkAddr = 0xffff;
sPaRAMsetRx.bCntReload = 0;
sPaRAMsetRx.srcCIdx = 0;
sPaRAMsetRx.destCIdx = 0;
sPaRAMsetRx.cCnt = 1;

One extra 16-bit transfer is needed for CS, but the solution is simple, initialisation time is minimal and it is not hungry for core resources. Below are the TX, ping and pong vector contents.

Index   TX       RX1 (or RX2)
[0]     ch1-cc   na
[1]     na       ad1-hi
[2]     na       ad2-lo
[3]     cs       na
[4]     ch2-cc   na
[5]     na       ad1-hi
[6]     na       ad2-lo
[7]     cs       na
...
[28]    ch8-cc   na
[29]    na       ad1-hi
[30]    na       ad2-lo
[31]    cs       na
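Given the RX layout in the table, pulling the results out of a completed ping or pong buffer is a simple strided copy. The sketch below assumes 16-bit RX words (aCnt = 2, as in the RX params) and uses made-up names; which ADC each slot belongs to should be checked against the daisy-chain order on the board.

```c
#include <stdint.h>

#define N_CHANNELS      8u
#define WORDS_PER_CYCLE 4u
#define RX_WORDS        (N_CHANNELS * WORDS_PER_CYCLE)

/* Copy the two result words of each 4-word cycle out of an RX buffer.
 * Indices 4k+1 and 4k+2 carry data; 4k+0 and 4k+3 are 'na'. */
static void unpack_rx(const uint16_t rx[RX_WORDS],
                      uint16_t slot1[N_CHANNELS],
                      uint16_t slot2[N_CHANNELS])
{
    unsigned ch;

    for (ch = 0; ch < N_CHANNELS; ++ch) {
        slot1[ch] = rx[ch * WORDS_PER_CYCLE + 1u];
        slot2[ch] = rx[ch * WORDS_PER_CYCLE + 2u];
    }
}

/* Ping-pong selection: return the buffer the next EDMA run should fill,
 * alternating between the two on every call. */
static uint16_t *next_rx_buffer(uint16_t *rx1, uint16_t *rx2,
                                unsigned *toggle)
{
    uint16_t *buf = (*toggle == 0u) ? rx1 : rx2;

    *toggle ^= 1u;
    return buf;
}
```

The pointer returned by `next_rx_buffer` is what would go into `sPaRAMsetRx.destAddr` before each activation, while `unpack_rx` runs on the buffer that was filled in the previous cycle.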

Regards,

Risto
