DM6446 DSP 2D2D DMA Slowness Issue

IanC

For my DaVinci project I've run into a serious performance problem. For my initial testing I had a planar video buffer, YUV (888), laid out as y0y1... u0u1... v0v1..., coming in. For each row in the input buffer (less 7) I DMA'd 7 rows of the Y data into internal SRAM, ran a conv7x7 algorithm on it, and then DMA'd the output row back. I got the performance up to a reasonable point. Then I switched over to the real data, an interleaved YUV (422) vy0uy1 buffer, so I had to change my DMA to transfer every other byte from the input buffer to the internal SRAM buffer, run the conv7x7 function on it, and DMA it back. To get this to work I had to switch from a 1D1D transfer to a 2D2D transfer. The performance went out the door. The following timings are for 10 video frames:

1D1D DMA's 0.074097s 7x7 i8_c8

DMA .058274s 7x7 i8_c8

2D2D DMA 0.926687s

As you can see, with the 1D1D DMA the performance was pretty good; changing to 2D2D really brought it to a screeching halt. Here are the DMA parameters for both transfers. Hopefully I'm doing something wrong. If it's not possible to do DMAs like this quickly, is it possible to tell the DaVinci Video Processing Front End (VPFE) to deliver the data planar rather than interleaved? I couldn't see anything in the manual.

Parameters for 1D DMA

   params.elementSize = width;       /* one full row per element */
   params.numElements = 11;          /* rows per transfer */
   params.srcElementIndex = width;   /* source offset between rows */
   params.dstElementIndex = width;   /* destination offset between rows */
   params.srcFrameIndex = 0;
   params.dstFrameIndex = 0;
   params.waitId = 0;
   params.srcAddr = (void*)in1Ptr;
   params.dstAddr = (void*)dma1Ptr;
   ACPY3_configure(dma1, &params, 0);

Parameters for 2D DMA

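   params.transferType = ACPY3_2D2D;
   params.elementSize = 1;           /* one byte per element */
   params.numElements = 11*width;    /* one element per output byte */
   params.srcElementIndex = 2;       /* take every other source byte */
   params.dstElementIndex = 1;       /* pack bytes contiguously in the destination */
   params.numFrames = 1;
   params.srcFrameIndex = 0;
   params.dstFrameIndex = 0;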
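
For a rough sense of scale, the two parameter sets above can be compared by how many elements each one asks the DMA to move. The sketch below is plain C, not ACPY3 code: `element_count` is a hypothetical helper, and the idea that per-element overhead dominates the transfer time is an assumption, not something confirmed in this thread.

```c
#include <stddef.h>

/* Hypothetical helper (not part of ACPY3): number of elements needed to
   deliver rows * width bytes when each element carries element_size bytes. */
static size_t element_count(size_t rows, size_t width, size_t element_size)
{
    return (rows * width) / element_size;
}
```

With an assumed width of 720 bytes, `element_count(11, 720, 720)` gives 11 elements for the 1D setup and `element_count(11, 720, 1)` gives 7920 elements for the 2D setup. That difference may be part of why the measured time rose from 0.074097 s to 0.926687 s.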

BTW - Is there a limit to the number of DMA channels an algorithm can have? I've been using 2 but when I switched to 3 I couldn't get it to not crash, ended up going back to 2.

Thanks,

over 15 years ago

Prateek Bansal over 15 years ago

TI__Expert 4510 points

Hi Ian,

I would suggest looking at Appendix D of the application note below -

focus.ti.com/lit/an/spraae7b/spraae7b.pdf

This has example code that uses ACPY3 for color space conversion from interleaved YUV data to planar data. The algorithm implemented is a rotate algorithm that switches the U and V planes; however, you can easily replace the rotate with your convolution algorithm. To do the rotate, the application note converts the interleaved data into planar data and then swaps the planes. This should give you an example of the exact configuration needed to get it running.

Prateek

IanC over 15 years ago in reply to Prateek Bansal

Prodigy 30 points

Prateek

I looked at the example and it just uses 1D1D DMAs to move the data into a buffer, then splits up the data manually using intrinsic functions.

I'll test it to see how fast it is.
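
The approach described above (1D1D DMA of the interleaved rows, then a CPU-side split) can be sketched in portable C. This is a minimal illustration, not the application note's code: it uses a plain loop instead of intrinsics, `extract_luma` is a hypothetical name, and it assumes the vy0uy1 byte order from the original post, so the Y samples sit at odd byte offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the luma bytes out of an interleaved 4:2:2 row that was already
   brought into a local buffer by a contiguous (1D1D) transfer.
   Assumed byte order: V Y0 U Y1 ..., so Y is at offsets 1, 3, 5, ... */
static void extract_luma(const uint8_t *src, uint8_t *dst, size_t num_pixels)
{
    size_t i;
    for (i = 0; i < num_pixels; ++i) {
        dst[i] = src[2 * i + 1];   /* skip the chroma byte before each Y */
    }
}
```

The output row can then be passed to the 7x7 convolution as if it had come from the planar buffer. Whether this beats the 2D2D transfer on the DSP would need to be measured.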
