Hello!
I'd like to ask for advice on the following topic. In our system we use a C6414 DSP. It is connected to an FPGA through EMIFA in 64-bit mode. We have to move OFDM data to and from the FPGA about 2 to 4 times every 100 us or so. We were using regular REG_WRITE/REG_READ to do the job. Obviously, that consumes most of the DSP time: one 2KB transfer takes about 17 us.
Now I'm trying to use the EDMA mechanism instead. Because I have to swap left/right side data when downloading to the FPGA buffer, I use chaining of EDMA transfers. To reduce the number of required operations, all necessary channels are configured in advance. Then, on the FPGA interrupt, I only issue EDMA_setChannel() for the right side data. On completion it initiates the chained left side transfer.
To monitor performance I use the following mechanism. There is a test point register in the FPGA. I just set/clear a certain bit in it and watch that bit with an oscilloscope.
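For illustration, the test point macros could look roughly like the sketch below. This is not the actual definition from our code: the bit mask and the register location are hypothetical, and a plain variable stands in for the FPGA register so the sketch also builds on a host.

```c
#include <stdint.h>

/* Hypothetical test point bit; the real mask depends on the FPGA register map. */
#define TX_OFDM_TP  (1u << 0)

/* On the target this pointer would refer to the FPGA test point register in
   EMIFA space. A plain variable stands in here (assumption for illustration). */
static volatile uint32_t tp_shadow_reg;
static volatile uint32_t *const tp_reg = &tp_shadow_reg;

/* Read-modify-write of one bit in the test point register. */
#define SET_TP(mask)  (*tp_reg |= (uint32_t)(mask))    /* drive test point high */
#define CLR_TP(mask)  (*tp_reg &= ~(uint32_t)(mask))   /* drive test point low  */
```

Note that a read-modify-write like this costs one EMIFA read plus one EMIFA write per macro, so the macros themselves add some time to every measurement.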
So my observation is that it takes about 17 us to transfer a 2KB block. I measure that from the CE signal at the FPGA. At the same time, submitting the EDMA transfer takes 4.4 us. Namely, this piece of code
SET_TP(TX_OFDM_TP); // Set test point High
EDMA_setChannel(dma->hdl_tx_ch1_r); // Trigger event
CLR_TP(TX_OFDM_TP); // Clear test point
takes 4.4 us to execute. That is definitely much better, but I wonder why it takes so long.
I've also tried using QDMA for the right side transfer, with an EDMA transfer chained for the left side data, like this:
SET_TP(TX_OFDM_TP); // Set test point High
EDMA_qdmaConfig(&dma->cfg_tx_ch1_r); // Setup QDMA
CLR_TP(TX_OFDM_TP); // Clear test point
But it takes the same 4.4 us to submit. I even set up the QDMA registers manually, with the same result.
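For illustration, a manual QDMA submission could look roughly like the sketch below. It is not the exact code used here. The word offsets follow the QDMA register layout as I understand it from spru234 (registers starting at 0x02000000 on C64x, pseudo registers 0x20 bytes above), so treat them as assumptions and check them against the guide. The parameter struct and function name are hypothetical, and the register base is passed in as a pointer so the sketch can also run against a plain array on a host.

```c
#include <stdint.h>

/* Word offsets from the QDMA register base (assumed from spru234). */
enum {
    QDMA_OPT   = 0,  /* options          */
    QDMA_SRC   = 1,  /* source address   */
    QDMA_CNT   = 2,  /* transfer count   */
    QDMA_DST   = 3,  /* destination      */
    QDMA_IDX   = 4,  /* index            */
    QDMA_S_OPT = 8   /* pseudo register: a write here submits the transfer */
};

/* Hypothetical parameter set; this is not the CSL EDMA_Config type. */
typedef struct {
    uint32_t opt, src, cnt, dst, idx;
} qdma_params_t;

/* Write the four plain registers first, then the options word through the
   pseudo register so that the final write is the one that submits. */
static void qdma_submit(volatile uint32_t *base, const qdma_params_t *p)
{
    base[QDMA_SRC]   = p->src;
    base[QDMA_CNT]   = p->cnt;
    base[QDMA_DST]   = p->dst;
    base[QDMA_IDX]   = p->idx;
    base[QDMA_S_OPT] = p->opt;
}
```

On the target the base would be something like (volatile uint32_t *)0x02000000, again assuming the address above is right. That makes five register writes per submission.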
So I wonder why it takes so long for what is claimed to take "only one to five CPU cycles" according to spru234. I suspected a big overhead from the EMIFA accesses for the test point writes, so I tried
SET_TP(TX_OFDM_TP); // Set test point High
x = y; // Some minor stuff
CLR_TP(TX_OFDM_TP); // Clear test point
and it took only 260 ns. So it looks like configuring the EDMA/QDMA itself takes that long.
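To put the measured numbers in perspective, here is a small calculation sketch. The constants are the figures quoted in this post; the helper names are made up for illustration, and the CPU clock is left as a parameter because it is an assumption, not something measured here.

```c
#include <stdint.h>

/* Figures taken from the measurements described in this post. */
#define BLOCK_BYTES     2048u   /* one 2KB block                      */
#define BLOCK_TIME_NS   17000u  /* about 17 us per block              */
#define SUBMIT_TIME_NS  4400u   /* SET_TP, trigger/submit, CLR_TP     */
#define TP_OVERHEAD_NS  260u    /* SET_TP, x = y, CLR_TP              */
#define BUS_BYTES       8u      /* EMIFA in 64-bit mode               */

/* Number of 64-bit bus accesses needed for one block. */
static uint32_t bus_accesses(void)  { return BLOCK_BYTES / BUS_BYTES; }

/* Average time per 64-bit access, rounded down to whole ns. */
static uint32_t ns_per_access(void) { return BLOCK_TIME_NS / bus_accesses(); }

/* Submit time with the test point overhead subtracted. */
static uint32_t net_submit_ns(void) { return SUBMIT_TIME_NS - TP_OVERHEAD_NS; }

/* Convert ns to CPU cycles; cpu_mhz is an assumed clock supplied by the caller. */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t cpu_mhz)
{
    return ns * cpu_mhz / 1000u;
}
```

One block is 256 accesses of 64 bits, so about 66 ns per access, and the submit costs about 4140 ns once the test point overhead is subtracted. With an assumed 600 MHz clock (only an example value, substitute the real one) that is about 2484 cycles, far from the "one to five CPU cycles" figure.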
Just in case it might be important: in our configuration we don't use L2 cache. All L2 memory is split between SRAM and heap.
Would appreciate any advice. Thanks in advance.