DM6446 DSP 2D2D DMA Slowness Issue

For my DaVinci project I've run into a serious performance problem. For my initial testing I had a planar YUV (888) video buffer (y0y1... u0u1... v0v1...) coming in. For each row in the input buffer (-7) I DMA'd 7 rows of the Y data into internal SRAM, ran a conv7x7 algorithm on them, and then DMA'd the output row back. I got the performance up to a reasonable point. Then I switched over to the real data, an interleaved YUV (422) vy0uy1 buffer, so I had to change my DMA to transfer every other byte from the input buffer to the internal SRAM buffer, run the conv7x7 function on it, and DMA it back. To get this to work I had to switch from a 1D1D transfer to a 2D2D transfer, and the performance went out the door. The following timings are for 10 video frames:

1D1D DMA's 0.074097s 7x7 i8_c8

DMA .058274s 7x7 i8_c8

2D2D DMA 0.926687s

As you can see, with the 1D1D DMA the performance was pretty good; changing to 2D2D really brought it to a screeching halt. Here are the DMA parameters for both transfers; hopefully I'm doing something wrong. If it's not possible to do DMAs like this quickly, is it possible to tell the DaVinci front end (VPFE) to deliver the data planar rather than interleaved? I couldn't see anything in the manual.

 Parameters for 1D DMA

   params.elementSize = width;
   params.numElements = 11;
   params.srcElementIndex = width;
   params.dstElementIndex = width;
   params.srcFrameIndex = 0;
   params.dstFrameIndex = 0;  
   params.waitId = 0;
   params.srcAddr = (void*)in1Ptr;
   params.dstAddr = (void*)dma1Ptr;
   ACPY3_configure(dma1, &params, 0);

Parameters for 2D DMA

   params.transferType = ACPY3_2D2D;
   params.elementSize = 1;
   params.numElements = 11*width;
   params.srcElementIndex = 2;
   params.dstElementIndex = 1;
   params.numFrames = 1;
   params.srcFrameIndex = 0;
   params.dstFrameIndex = 0;
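
For readers following along, here is a minimal host-side sketch of what the 2D parameters above appear to describe. It is a plain C model, not the ACPY3 API: the `Model2D` struct and `model_2d_copy` function are hypothetical names, and the model assumes that each element is moved as its own contiguous burst of `elementSize` bytes. Under that assumption, `elementSize = 1` with `numElements = 11*width` means 11*width separate one-byte bursts, which may be why the transfer is so much slower than moving 11 bursts of `width` bytes each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 2D-to-2D frame; these are not ACPY3 types. */
typedef struct {
    size_t    element_size;       /* contiguous bytes per element        */
    size_t    num_elements;       /* number of elements in the frame     */
    ptrdiff_t src_element_index;  /* byte stride between source elements */
    ptrdiff_t dst_element_index;  /* byte stride between dest elements   */
} Model2D;

/* Copies one frame and returns how many separate contiguous bursts
 * were needed (one per element in this model). */
static size_t model_2d_copy(const Model2D *p, const uint8_t *src, uint8_t *dst)
{
    size_t bursts = 0;

    assert(p != NULL && src != NULL && dst != NULL);
    for (size_t i = 0; i < p->num_elements; ++i) {
        memcpy(dst + (ptrdiff_t)i * p->dst_element_index,
               src + (ptrdiff_t)i * p->src_element_index,
               p->element_size);
        ++bursts;
    }
    return bursts;
}
```

In this model, picking every other byte out of an 8-pixel interleaved row takes 8 bursts of 1 byte, while copying the same 16-byte row contiguously takes a single burst.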

BTW, is there a limit to the number of DMA channels an algorithm can have? I've been using 2, but when I switched to 3 I couldn't get it to stop crashing and ended up going back to 2.

Thanks,

  • Hi Ian,

    I would suggest looking at Appendix D of the application note below -

    focus.ti.com/lit/an/spraae7b/spraae7b.pdf

    This has example code that uses ACPY3 for color space conversion from interleaved YUV data to planar data. The algorithm implemented is a rotate algorithm that switches the U and V planes; however, you can easily replace the rotate with your convolution algorithm. To do this, the application note converts the interleaved data into planar data and then switches the planes. This should give you an example of the exact configuration needed to get it running.

    Prateek

  • Prateek

    I looked at the example and it's just using 1D1D DMAs to move the data into a buffer, then manually splitting up the data using intrinsic functions.

    I'll test it to see how fast it is.
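
    The approach described above (move the interleaved row in with a contiguous transfer, then separate the bytes on the CPU) can be sketched in portable C as follows. This is only an illustration: the function name `extract_luma_row` is hypothetical, it uses a plain loop rather than the intrinsics in the application note, and it assumes 2 bytes per pixel with luma at a fixed offset (1 for the vy0uy1 order given in the question).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the luma bytes out of one interleaved 4:2:2 row.
 * Assumes 2 bytes per pixel, with Y at byte offset y_offset (0 or 1)
 * inside each 2-byte pair. */
static void extract_luma_row(const uint8_t *interleaved, uint8_t *luma,
                             size_t num_pixels, size_t y_offset)
{
    assert(y_offset < 2);
    for (size_t i = 0; i < num_pixels; ++i) {
        luma[i] = interleaved[2 * i + y_offset];
    }
}
```

    Run over a row that is already in internal SRAM, this produces the same packed luma row that the strided 2D2D transfer was meant to deliver.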