
EDMA_setChannel stalls?

 
Hi, 

We have a custom board with a C6415T running at 1GHz and an FPGA 
attached to it using the 64-bit EMIF bus. 

I'm using an HWI, triggered by an external interrupt pin connected to 
the FPGA, to post a SWI function that starts an EDMA transfer toward the FPGA 
using the CSL function EDMA_setChannel.
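
In outline, the trigger path looks like this (a simplified sketch; the SWI 
object and handle names are placeholders for the ones in our BIOS 
configuration and EDMA_open() call):

    #include <std.h>
    #include <swi.h>
    #include <csl.h>
    #include <csl_edma.h>

    extern SWI_Obj startXferSwi;    /* SWI object created in the BIOS config */
    extern EDMA_Handle hEdmaCha;    /* channel opened elsewhere with EDMA_open() */

    /* HWI function, plugged to the external interrupt pin from the FPGA */
    void fpgaHwi(void)
    {
        SWI_post(&startXferSwi);    /* defer the transfer start to SWI level */
    }

    /* SWI function: manually trigger the EDMA channel */
    void startXferSwiFxn(void)
    {
        EDMA_setChannel(hEdmaCha);  /* set the channel bit in the Event Set Register */
    }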

The data transfer happens and I get an EDMA_INT interrupt when the transfer 
is complete. But looking at what happens, I'm concerned about performance 
issues I might encounter later on. 

Below is a screenshot taken on the scope that is attached to some LEDs. 

Green is the HWI triggered by the FPGA (low means active) 
Violet is the EDMA_setChannel function call (inside a SWI posted by the HWI, 
low means running) 
Pink is the EDMA completion ISR (rescheduling the SWI if required)

What is happening inside EDMA_setChannel that it takes 55 µs? 

The only thing the function does is write a bit in an internal 
register (the Event Set Register). Could the write stall if the queue for 
the corresponding EDMA priority is full, or for any other reason? Or is 
the write always this slow? Any suggestions on how to improve this 
(while keeping the CPU-triggered transfer)? 
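
(As far as I can tell, the call boils down to a single register write, i.e. 
essentially the line below; chanNum is illustrative, channels 32-63 would go 
to ESRH instead, and the register name should be checked against csl_edma.h.)

    EDMA_RSET(ESR, 1u << chanNum);  /* manual equivalent for a channel < 32 */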

Best Regards, 

Christophe 
  • Dear Christophe,

     

    Below are some suggestions and comments regarding the issue you mentioned:

     

    - I believe that within the SWI routine, as soon as you start the channel by physically writing to the EDMA, the transfer will start and progress in parallel with the CPU.

     

    This means the duration of the SWI routine should not have a direct influence on the EDMA transfer duration from that point on. Of course, it is still better to keep this code as short as possible, so that you don't run the risk of missing events in case you want to use the same EDMA channel repeatedly over time.

     

    I assume that for your test you have no other tasks or HWIs running besides the ones you mentioned. Otherwise, other activity such as HWIs (which are higher priority) or higher-priority SWIs could block the SWI, with the net effect of stretching the measured duration of the routine.

     

     

    Having said that, there are several things which might be related to the duration you measured, such as:

     

    - The location of the BIOS code and objects, and of the application code. For HWIs/SWIs, having this code located in external memory, for example, would slow things down, since it may require some cache bring-in activity before the CPU can execute the code. Note that accesses would be even slower if the external memory range is not marked as cacheable, or if L2 is set up as all RAM (the cache user's guide gives details on the miss penalties). A sketch of how to pin the critical routines in internal RAM follows below.
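
    For example, one common approach (a sketch, assuming you create a dedicated code section and map it to internal RAM in your linker command file; the section name and the function names from your post are illustrative) is:

        /* Place the time-critical routines in a dedicated code section */
        #pragma CODE_SECTION(fpgaHwi, ".fastcode")
        #pragma CODE_SECTION(startXferSwiFxn, ".fastcode")

        /* ...and map that section to internal memory in the linker command
           file, e.g.:

           SECTIONS
           {
               .fastcode > ISRAM
           }
        */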

     

    - BIOS itself requires some time for the context switching between HWI and SWI. In the BIOS install directory you can find benchmarks, together with a document describing how the number of CPU cycles needed in several runtime scenarios was measured. This is also interesting for getting a better feel for the overhead introduced by the OS.

     

    To help you understand how the BIOS scheduler works, and the interaction between the system timer and the BIOS functions, I recommend looking at app note spra829 (DSP/BIOS timers and benchmarking tips). It gives some good insight into how the statistics are created for periodically driven objects and how the timer is used. You can also time the call yourself, as sketched below.
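
    For instance, you could bracket the call with the high-resolution BIOS clock (a sketch; hEdmaCha is the illustrative handle from your post, and the tick count can be converted to CPU cycles with CLK_cpuCyclesPerHtime()):

        #include <std.h>
        #include <clk.h>
        #include <csl_edma.h>

        extern EDMA_Handle hEdmaCha;

        Uint32 timeSetChannel(void)
        {
            Uint32 t0 = CLK_gethtime();   /* high-resolution timer ticks */
            EDMA_setChannel(hEdmaCha);
            return CLK_gethtime() - t0;   /* elapsed high-resolution ticks */
        }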

     

    - Note as well that in CCS you have a BIOS viewer which can be used to track the scheduler and the BIOS objects, so that you can also check the event sequence and the durations in ticks. Otherwise, you can use STS objects (see the sketch below), although looking at a scope is of course also a good way to check.
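
    For example, with an STS object created in the configuration tool (swiDurationSts is an illustrative name), you can accumulate the duration of the SWI and read the result in the CCS statistics view:

        #include <std.h>
        #include <clk.h>
        #include <sts.h>
        #include <csl_edma.h>

        extern STS_Obj swiDurationSts;    /* STS object from the config tool */
        extern EDMA_Handle hEdmaCha;

        void startXferSwiFxn(void)
        {
            STS_set(&swiDurationSts, CLK_gethtime());   /* mark start time  */
            EDMA_setChannel(hEdmaCha);                  /* work to measure  */
            STS_delta(&swiDurationSts, CLK_gethtime()); /* accumulate delta */
        }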

     

    - See as well spru610c (the C64x internal memory reference guide). On a longer-term view, you have the possibility of tuning some parameters with respect to EDMA and cache activity, such as the EDMAWEIGHT and L2ALLOC registers, for example. The priority queue allocations can also be seen in tables 4-3 and 4-4 of the EDMA user's guide. This might be useful for tuning the system in a more complex scenario where multiple transfers are ongoing (user-driven EDMA, plus EDMA activity from the cache controller and other masters such as HPI/PCI).

     

    In this sense, a very good reference for how to tune the system is spraa02 (C64x EDMA performance data).

     

     

    - For manually triggered transfers, it would probably be better to use QDMA, which is a bit more efficient than EDMA for this. Of course, depending on the complexity of the transfer, QDMA might not be the best choice, since it only does 1D-to-1D transfers; but if you can break up the EDMA transfer into smaller basic QDMA transfers, this would probably give you better performance. A sketch follows below.
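
    A minimal sketch of a one-shot QDMA block move through the CSL (the options word is illustrative only; build it with EDMA_OPT_RMK() in real code and check spru234c for the exact bit fields):

        #include <csl.h>
        #include <csl_edma.h>

        /* One-shot 1D block move via QDMA; submitting the configuration
           starts the transfer immediately, no ESR write is needed. */
        void qdmaBlockMove(Uint32 src, Uint32 dst, Uint16 numElems)
        {
            EDMA_Config qCfg;

            qCfg.opt = 0x41200000;  /* illustrative options: priority, element
                                       size, frame sync, completion code */
            qCfg.src = src;         /* source address      */
            qCfg.cnt = numElems;    /* element count (1D)  */
            qCfg.dst = dst;         /* destination address */
            qCfg.idx = 0;           /* no indexing needed  */
            qCfg.rld = 0;           /* link/reload unused for QDMA */

            EDMA_qdmaConfig(&qCfg); /* write the QDMA registers and submit */
        }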

     

    - Note that the EDMA can also react directly to external events from the GPIO pins (see the EDMA user's guide, spru234c). This way the transfer would be triggered directly by the external event, without having to pass through the ISR > SWI chain. Within the EDMA ISR you could still signal back to the application that the transfer has finished. See the sketch after this paragraph.
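
    In sketch form, assuming one of the GPIO-driven events (EDMA_CHA_GPINT4 is an illustrative channel name; check csl_edma.h for the event list of your device, and note the GPIO pin itself must also be configured as an event source):

        #include <csl.h>
        #include <csl_edma.h>

        EDMA_Handle hGpioCha;
        extern EDMA_Config chaCfg;  /* opt/src/cnt/dst/idx set up elsewhere */

        void setupEventTriggeredChannel(void)
        {
            hGpioCha = EDMA_open(EDMA_CHA_GPINT4, EDMA_OPEN_RESET);
            EDMA_config(hGpioCha, &chaCfg);
            EDMA_enableChannel(hGpioCha);  /* from now on, the GPIO event
                                              itself starts the transfer */
        }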

     

    There is also an interesting app note, spraa36 (external programming of C64x EDMA for low-overhead data transfers), which shows a way to have an external master (like an FPGA, for example) schedule EDMA transfers towards the DSP (which owns the EDMA).

     

    - One last option which comes to mind is using PDT (peripheral device transfer), which is particularly suited when the purpose is to transfer data between two endpoints on the EMIF (like an FPGA writing to / reading from SDRAM). Be aware, though, that there is a small silicon issue with one specific configuration of PDT transfers - see the silicon errata for the details.

     

    Hope this helps

     

  • Hi,

    Thanks for your very informative answer. I will supply more information soon.

    Regards,
    Christophe