MATLAB generated code optimization for DM6437

Hi

 

I'm working on the DM6437. I generate code from MATLAB for video capture and display. The code sections for interleaving and deinterleaving take more than 30% of the CPU load. This code is in the "mw_ycbcr422int_to_pl.c" and "mw_ycbcr422pl_to_int.c" files. How can this code be optimized? I have to run a complex algorithm on the DM6437 that itself takes more than 35% of the CPU load, and when 30% is already taken by interleaving and deinterleaving, it becomes difficult to meet real-time deadlines.

Is there any solution to this?

 

Regards

Saira

  • Saira,

    This question does not really seem to be BIOS related, so I'm moving this thread to the DM64 forum in hopes that it will get a faster response there.

  • I assume you mean converting from YUV 4:2:2 interleaved (a single buffer for all three components) to planar (a separate buffer for each of the three components).

    Are you using interleaved for capture/display, and planar for image processing?  If you are only using the luminance (Y) component for processing, you could use the EDMA to extract the luma component, process it in place, then use the EDMA to return the processed luma to the interleaved buffer.

    If you need all three components in planar form to do your processing, you may have to get more creative.  If your algorithm works on one component at a time, you could use the EDMA in three separate passes to process each component as above.  If you have to access all three components at the same time in your processing, you may have to split your algorithm into a block-based processing approach (move a piece of the image over, process it in place, move it back, repeat until the whole image is done) to get better performance from your cache.

    If you have a large image (much greater than your L1D cache size), you will find that your processing time is dominated by the time it takes to move memory into and out of the cache unless you arrange your algorithm to take advantage of the cache, L1D SRAM, or the EDMA.  This is especially true when you have more than one source and/or destination for image data for each access.  I say this only guessing at the nature of the code you need to optimize.
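
The luma-only, block-based flow described above can be sketched in plain C. This is only an illustration under stated assumptions, not the MATLAB-generated code or a TI API: it assumes UYVY byte order (Cb, Y0, Cr, Y1), which should be checked against the actual capture configuration, and all function names, the tile size, and the placeholder `process_tile` are hypothetical. On the DSP, the two strided copy loops are the part an EDMA transfer would take over, with the tile buffer placed in on-chip memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: UYVY order (Cb, Y0, Cr, Y1), so luma sits at odd byte offsets. */
#define BYTES_PER_PIXEL 2
#define LUMA_OFFSET     1

/* Hypothetical tile size, chosen so one tile fits easily in on-chip memory. */
#define TILE_W 64
#define TILE_H 16

/* Gather the luma samples of one tile into a compact buffer.
 * This strided copy stands in for an EDMA transfer on the DSP. */
void luma_tile_extract(const uint8_t *frame, size_t frame_w,
                       size_t x0, size_t y0, size_t tile_w, size_t tile_h,
                       uint8_t *tile)
{
    assert(x0 + tile_w <= frame_w);
    for (size_t r = 0; r < tile_h; r++) {
        const uint8_t *src =
            frame + ((y0 + r) * frame_w + x0) * BYTES_PER_PIXEL + LUMA_OFFSET;
        for (size_t c = 0; c < tile_w; c++)
            tile[r * tile_w + c] = src[c * BYTES_PER_PIXEL];
    }
}

/* Scatter the processed luma back into the interleaved frame.
 * Chroma bytes are never touched. */
void luma_tile_writeback(uint8_t *frame, size_t frame_w,
                         size_t x0, size_t y0, size_t tile_w, size_t tile_h,
                         const uint8_t *tile)
{
    assert(x0 + tile_w <= frame_w);
    for (size_t r = 0; r < tile_h; r++) {
        uint8_t *dst =
            frame + ((y0 + r) * frame_w + x0) * BYTES_PER_PIXEL + LUMA_OFFSET;
        for (size_t c = 0; c < tile_w; c++)
            dst[c * BYTES_PER_PIXEL] = tile[r * tile_w + c];
    }
}

/* Placeholder for the real algorithm: inverts luma in place. */
void process_tile(uint8_t *tile, size_t count)
{
    for (size_t i = 0; i < count; i++)
        tile[i] = (uint8_t)(255 - tile[i]);
}

/* Move a tile over, process it in place, move it back, repeat. */
void process_frame_luma(uint8_t *frame, size_t frame_w, size_t frame_h)
{
    static uint8_t tile[TILE_W * TILE_H]; /* would be placed in fast SRAM */

    for (size_t y0 = 0; y0 < frame_h; y0 += TILE_H) {
        size_t th = (frame_h - y0 < TILE_H) ? frame_h - y0 : TILE_H;
        for (size_t x0 = 0; x0 < frame_w; x0 += TILE_W) {
            size_t tw = (frame_w - x0 < TILE_W) ? frame_w - x0 : TILE_W;
            luma_tile_extract(frame, frame_w, x0, y0, tw, th, tile);
            process_tile(tile, tw * th);
            luma_tile_writeback(frame, frame_w, x0, y0, tw, th, tile);
        }
    }
}
```

Calling `process_frame_luma(frame, width, height)` on a capture buffer would run the placeholder over every luma sample while leaving Cb and Cr where they are, so no full planar conversion is needed. With real EDMA transfers, the usual next step is to double-buffer the tile so that the next transfer overlaps with processing of the current one.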