DM814x - DMA - Data Copied Incorrectly

We have been struggling for a while to figure out why the processing of Ethernet telegrams sometimes gets delayed quite a bit (350-500 ms), and we have now realized that it happens because we are creating copies of the frame data we receive from the processing chain (mcfw)!
 
We have two different output paths from the processing chain:
       1) JPEG bitstream,
       2) YUV420 frames.
When we copy the bitstream data (~1 MB), it takes 1-3 ms per image. But when we copy the YUV420 frame data (~7.5 MB), it takes more than 350 ms (sometimes even 500 ms)!
 
We use the standard memcpy to perform the copy operation on both the bitstream and the frame data, but for some reason the frame data is MUCH slower to copy.

 

So my questions are:
- What is the difference between the bitstream data and the frame data from the MCFW processing chain?
  • I believe this is related to memory access – such as caching – is this correct?
  • Can we modify the frame output processing so it will be faster (just like the bitstream data)?
- Is there any way to speed up the memcpy in general?
  • I believe that DMA would be useful – but when I tried to use the OSA_dma code from the "Stream" demo/example in my own code, the data was copied incorrectly.
  • Is there anything special that needs to be done in order to use the OSA_dma functionality in my code?
  • Do I have to allocate the buffer memory using CMEM or something like that in order to get it to work?
What is the difference between the bitstream data and the frame data from the MCFW processing chain?

     - Bitstream buffers are allocated from a shared region that is cached on A8.

     - Frame buffers are allocated from a shared region which is not mapped on A8. Only the physical address of the buffer is exported and the application does mmap to get a virtual pointer for CPU access.

    The timing difference in memcpy is due to the frame buffers being non-cached. The only way to speed up memcpy is to mmap the frame buffers as cached. I believe mmap takes attributes to enable caching, and since it is the application that is mmapping the buffers, it can map them as cached.

    The application has to handle cache coherence. This is done by mcfw for bitstream buffers, but if the application enables caching for frame buffers, it should ensure it does Cache_inv before reading and Cache_wbInv after writing to the frame buffers.

    DMA is the correct method to avoid all these problems. We have been using DMA for the bitstream copy by default in the mcfw demos and we have not seen any issues with DMA usage. Bitstream content is more sensitive to errors, as they will result in decoder errors.

    What errors are you seeing with DMA use:

     - Is the content completely wrong, or do only some errors occur?

     - I assume you are passing the physical address exported by mcfw to the DMA engine for the frame buffer.

     - The destination buffer to which the frame data is copied should be physically contiguous. Are you ensuring that in your application?

     - The application should ensure the DMA kernel module is inserted (this happens by default if load.sh is invoked) and that OSA_dmaInit() is invoked from the application before any DMA APIs are used.