DVRRDK question on allocating buffer in DSP

Other Parts Discussed in Thread: SYSBIOS

I want to create a buffer the size of one frame on the DSP, so that I can use it for processing and temporary storage.

Following the SCD module, I tried allocating the buffer with Utils_memAlloc(frame_size, SharedRegion_getCacheLineSize(SYSTEM_IPC_SR_CACHED));

But the buffer is then cached, so I have to call ti_sysbios_family_c64p_Cache_inv and ti_sysbios_family_c64p_Cache_wb, which seems to add a lot of DSP load.

I tried calling just Cache_inv and Cache_wb with no other processing in between, and the DSP load increased by 30%. Is this normal? I am passing 16ch D1 + 16ch CIF to the DSP, but only the D1 channels call Cache_inv and Cache_wb in my test.

Or should I allocate the buffer from somewhere else? Thanks.

  • Thomas,

    You need to take care of cache_inv and cache_wb only if you have enabled cache on the DSP side. This can be verified from the cfg file FC_RMAN_IRES_c6xdsp.cfg. Please check which MAR bits are enabled. By default the complete region is cached for the DSP.

    Utils_memAlloc() tries to allocate memory from SR2 first; if none is available there, it looks in the Tiler region and/or the (SR1) Bitstream buffer. You can check the memory map and set the MAR bits for the (SR2) Frame Buffer Region accordingly. To verify whether a region is cached or uncached, check the MAR bit status of the region from which the memory is allocated. Please see the AlgLink_ScdVACreate() function in the scdLink_alg.c file for more details.

    If you are requesting frame memory, you can use a simpler API, Utils_memFrameAlloc(). It's easy to use.
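To make the MAR check above concrete, here is a minimal sketch of how a buffer address maps to its MAR register. It assumes the C674x megamodule layout (MAR0 at 0x01848000, one MAR per 16 MB, cacheability in bit 0); the helper names and the example address 0x80000000 are hypothetical and not part of DVRRDK. It only does the address arithmetic and decodes a MAR value you have read yourself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed C674x layout: MAR0 at 0x01848000, each MAR covers 16 MB. */
#define MAR_BASE_ADDR   0x01848000u
#define MAR_REGION_BITS 24u      /* 2^24 bytes = 16 MB per MAR */
#define MAR_COUNT       256u     /* MAR0..MAR255 cover the 4 GB space */
#define MAR_PC_MASK     0x1u     /* bit 0 (PC): 1 = region is cacheable */

/* Index of the MAR that controls the given address (hypothetical helper). */
unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> MAR_REGION_BITS);
}

/* Memory-mapped address of that MAR register. */
uint32_t mar_reg_addr(uint32_t addr)
{
    unsigned idx = mar_index(addr);
    assert(idx < MAR_COUNT);
    return MAR_BASE_ADDR + 4u * idx;
}

/* Decode the cacheability bit from a MAR value read elsewhere. */
int mar_is_cacheable(uint32_t mar_value)
{
    return (mar_value & MAR_PC_MASK) != 0u;
}

/* Print which MAR covers a buffer, e.g. the pointer an allocator returned. */
void mar_describe(uint32_t buf_addr)
{
    printf("buffer 0x%08lX -> MAR%u at 0x%08lX\n",
           (unsigned long)buf_addr, mar_index(buf_addr),
           (unsigned long)mar_reg_addr(buf_addr));
}
```

On the target you would read the register at the computed address (for example in the debugger's memory view) and pass its value to mar_is_cacheable().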

  • Why are you doing cache invalidate operations? If you use it as a temporary buffer, there is no need to do any cache coherency operation. SCD does cache coherency operations because the buffer is sent to the A8.

  • Thanks Ritesh, I will try that.

    Badri, I don't want cache coherency operations. But I followed the SCD module to allocate the buffer, and it requires cache handling; otherwise the buffer is not updated. So I am asking which is the better way to allocate a temporary buffer.

  • Allocating a buffer from SharedRegion 1 does _not_ mean you have to do any cache coherency operation. Remove the cache coherency code in your case, as it is not meaningful. Cache management is done automatically by the c674 core. If only the c674 is accessing the buffer you allocate, _do_ _not_ do cache coherency operations, as they will unnecessarily waste CPU cycles. As I mentioned, cache coherency is required in the SCD case because the buffer is sent to the A8, which will not happen in your case. Cache coherency has nothing to do with buffer allocation.
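The point above can be illustrated with a toy, host-side model of a single write-back cache line. This is only a sketch: ToyCache and all function names are hypothetical, the line is 8 bytes instead of a real L2 line, and the model allocates on writes for simplicity. It shows that the CPU always sees its own writes without any cache operation, while another initiator (A8, EDMA) sees stale DDR until a write-back, and the CPU sees stale cache until an invalidate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 8   /* toy line size, far smaller than a real L2 line */

typedef struct {
    uint8_t ddr[LINE_BYTES];   /* backing memory seen by EDMA / other cores */
    uint8_t line[LINE_BYTES];  /* the DSP's cached copy */
    int valid;                 /* cached copy is present */
    int dirty;                 /* cached copy is newer than ddr */
} ToyCache;

void toy_init(ToyCache *c) { memset(c, 0, sizeof *c); }

/* On a miss, bring the line in from DDR. */
void toy_fill(ToyCache *c)
{
    if (!c->valid) {
        memcpy(c->line, c->ddr, LINE_BYTES);
        c->valid = 1;
        c->dirty = 0;
    }
}

/* CPU accesses always go through the cached copy. */
uint8_t cpu_read(ToyCache *c, unsigned i)
{
    assert(i < LINE_BYTES);
    toy_fill(c);
    return c->line[i];
}

void cpu_write(ToyCache *c, unsigned i, uint8_t v)
{
    assert(i < LINE_BYTES);
    toy_fill(c);
    c->line[i] = v;
    c->dirty = 1;   /* DDR is now stale until write-back or eviction */
}

/* Another initiator (EDMA, A8) bypasses the DSP cache entirely. */
uint8_t dma_read(const ToyCache *c, unsigned i) { return c->ddr[i]; }
void dma_write(ToyCache *c, unsigned i, uint8_t v) { c->ddr[i] = v; }

/* Models Cache_wb: push dirty data out to DDR. */
void toy_cache_wb(ToyCache *c)
{
    if (c->valid && c->dirty) {
        memcpy(c->ddr, c->line, LINE_BYTES);
        c->dirty = 0;
    }
}

/* Models Cache_inv: drop the cached copy so the next read refetches. */
void toy_cache_inv(ToyCache *c)
{
    c->valid = 0;
    c->dirty = 0;
}
```

In this model a buffer touched only through cpu_read() and cpu_write() never needs toy_cache_wb() or toy_cache_inv(); the calls matter only once dma_read() or dma_write() enters the picture.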

  • Sorry for asking so much, as I am not familiar with cache issues.

    If I need to use DMA to copy between this buffer and the frame buffer, is a cache operation needed? Thanks.

  • If you read from a buffer in DDR that is in a cache-enabled region, it will be fetched into the c674 L2/L1 cache. All further references to that buffer will go to the cache.

    If the contents of the buffer are then modified in DDR by another initiator like EDMA, the contents of DDR (the actual updated contents) and the contents of the L2/L1 cache are no longer coherent.

    The c674 has no h/w support for snoop and invalidate between DDR and L2 cache, so you have to do Cache_inv programmatically before reading the buffer contents.

     

    If you write to a buffer in DDR that is in a cache-enabled region, the data will be written into the c674 L2/L1 cache and the line will be marked as dirty. The data will be updated in DDR only when the line is evicted by the cache controller.

    If the contents of the buffer are then read from DDR by another initiator like EDMA, the contents of DDR (stale data) and the contents of the L2/L1 cache (updated data) are no longer coherent.

    The c674 has no h/w support for snoop and invalidate between DDR and L2 cache, so you have to do Cache_wb programmatically before the EDMA reads the buffer contents.

    Based on the above, you can decide whether you need a cache coherency operation or not. It is preferable for the CPU not to touch the buffer in DDR at all. Optimized algorithms generally have a ping-pong buffer in L2 SRAM.

    A line of data is fetched from DDR to L2 SRAM by DMA, processed, and written back to DDR by DMA.

    While that processing is happening, the next line is fetched in parallel into the pong buffer.

    This way the CPU wait time for the DMA transfer to complete is minimized, and there is no need to do any cache coherency operation on the buffer in DDR, as it is never accessed by the CPU directly.
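The ping-pong scheme described above can be sketched as follows. This is a host-side illustration only: dma_copy() is a hypothetical stand-in that uses memcpy, so the transfers here are synchronous, whereas a real EDMA transfer would run in parallel with process_line() and would need a wait-for-completion before each buffer is reused. LINE_BYTES, process_line() and the placement of l2_buf in L2 SRAM are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 16   /* hypothetical line length, kept tiny for the example */

/* Stand-in for an EDMA transfer; on hardware this would be asynchronous. */
void dma_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Hypothetical per-line kernel: invert every pixel in place. */
void process_line(uint8_t *line, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        line[i] = (uint8_t)(255u - line[i]);
}

/* Process `lines` rows of `frame` (in DDR) through two on-chip line buffers. */
void process_frame_pingpong(uint8_t *frame, size_t lines)
{
    /* On the DSP these two buffers would be placed in L2 SRAM. */
    static uint8_t l2_buf[2][LINE_BYTES];
    size_t i;

    assert(frame != NULL);
    if (lines == 0)
        return;

    /* Prime the ping buffer with the first line. */
    dma_copy(l2_buf[0], frame, LINE_BYTES);

    for (i = 0; i < lines; i++) {
        uint8_t *cur  = l2_buf[i & 1];
        uint8_t *next = l2_buf[(i + 1) & 1];

        /* Start fetching the next line; on hardware this overlaps
         * with the processing of the current line below. */
        if (i + 1 < lines)
            dma_copy(next, frame + (i + 1) * LINE_BYTES, LINE_BYTES);

        process_line(cur, LINE_BYTES);

        /* Write the result back to DDR by DMA, never by CPU stores. */
        dma_copy(frame + i * LINE_BYTES, cur, LINE_BYTES);
    }
}
```

Because the CPU only ever reads and writes l2_buf, the frame in DDR is never pulled into the cache, which is why no Cache_inv or Cache_wb call appears anywhere in the loop.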