Cache Prefetch function for C6747.

Hi All,

Is there any cache prefetch function available for the C6747 core?

I have enabled 32 KB L1P and L1D caches and a 64 KB L2 cache. In the profile report that I generated with the CCS code coverage profiler, the cache read miss counts of a few functions are high. My idea is that by prefetching the buffers used by these functions into the cache (copying them from external memory to cache), I can reduce the cache read miss count. Is that possible? Is there a CSL (Chip Support Library) API available for this?

Thanks in advance.

Regards,

Paul

 

  • Hi Paul,

    Thanks for your post.

    Please find the list of available APIs for the Cache module in the C6000 API reference guide below; see Tables 2-1, 2-2, and 2-3 for the cache APIs and macros:

    http://www.ti.com/lit/ug/spru401j/spru401j.pdf

    Also, there are rCSL examples (CACHE_dspLib_fft) that show cache initialization using macros to configure the L1, L2, and MAR bits, which improves the performance of code located in external memory:

    http://processors.wiki.ti.com/index.php/QuickStartOMAPL1x_rCSL#rCSL_Examples

    The CSL library for C6747 & C6748 comes as part of the BIOS PSP package, which also includes the user guide, datasheet, etc. The BIOS PSP product download page is below:

    http://software-dl.ti.com/dsps/dsps_public_sw/psp/BIOSPSP/01_30_01/index_FDS.html

    Thanks & regards,

    Sivaraj K

  • Hi Sivaraj,

    Thanks very much for your reply.

    I didn't find any cache prefetch function in the CSL or DSP/BIOS API sets. In the 'TMS320C674x DSP Cache User's Guide', however, I found a 'touch' function for cache prefetch.

    5706.touch.asm

    When I used this function, both the cache read misses and the MCPS of that particular function went down. But the 'touch' function itself consumes many cycles, so the gain from the reduced cache misses is lost in 'touch'.
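
    For readers without the attachment, the idea behind 'touch' can be sketched in portable C: read one byte from every cache line of the buffer, so that each line is allocated in cache before the processing loop needs it. This is only an illustration and not the attached assembly routine. The name `touch_sketch` is hypothetical, and the 64-byte line size is an assumption based on the C674x L1D line size.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Assumption: 64-byte cache lines (the C674x L1D line size). */
    #define LINE_BYTES 64u

    /*
     * Hypothetical portable stand-in for an assembly touch() routine.
     * Reads one byte from each cache line overlapped by
     * [array, array + length) and returns the number of lines read.
     */
    size_t touch_sketch(const void *array, size_t length)
    {
        const volatile uint8_t *p = (const volatile uint8_t *)array;
        size_t offset = 0;
        size_t lines = 0;

        assert(array != NULL || length == 0);

        while (offset < length) {
            uint8_t sink = p[offset];  /* volatile read forces the access */
            (void)sink;
            lines++;
            /* Step to the first byte of the next cache line. */
            offset += LINE_BYTES - (size_t)((uintptr_t)(p + offset) % LINE_BYTES);
        }
        return lines;
    }
    ```

    For a 4096-byte, line-aligned buffer this loop performs 4096 / 64 = 64 reads, one per line. Each read that misses still stalls until the line arrives, which is consistent with 'touch' being expensive when the data sits in external memory.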

    From the TI forum ( http://e2e.ti.com/support/dsp/omap_applications_processors/f/42/t/130932.aspx ), I found that to benefit from the 'touch' function the data should be in L2 SRAM. As specified in that post, I transferred my buffers from external memory (SDRAM) to L2 SRAM using DMA. For the buffer transfer I used the 'DAT_copy()' API from CSL (I got an example application for DAT_copy() DMA transfers from 'edma3lld_01_11_02_05\examples\CSL2_DAT_DEMO').

    I am attaching the source code of my DMA transfer function.

    /* special magic DMA transfer IDs */
    #define DAT_XFRID_WAITALL   0xFFFFFFFF
    #define L2_RAM_BUFFER       0x00820000
    #define BUFFERSIZE_BYTES    4096
    
    
    /* Copying from external memory to internal memory using DMA */
    void DMA_Copy(float * restrict PrevRealL,
                  float * restrict PrevImagL,
                  float * restrict PrevRealR,
                  float * restrict PrevImagR,
                  float * restrict PrevRealL_RAM,
                  float * restrict PrevImagL_RAM,
                  float * restrict PrevRealR_RAM,
                  float * restrict PrevImagR_RAM)
    {
    	/* Cache write back */
    	BCACHE_wb(PrevRealL, BUFFERSIZE_BYTES, 1);
    	/* DMA transfer */
    	DAT_copy(PrevRealL, PrevRealL_RAM, BUFFERSIZE_BYTES);
    
    	BCACHE_wb(PrevImagL, BUFFERSIZE_BYTES, 1);
    	DAT_copy(PrevImagL, PrevImagL_RAM, BUFFERSIZE_BYTES);
    
    	BCACHE_wb(PrevRealR, BUFFERSIZE_BYTES, 1);
    	DAT_copy(PrevRealR, PrevRealR_RAM, BUFFERSIZE_BYTES);
    
    	BCACHE_wb(PrevImagR, BUFFERSIZE_BYTES, 1);
    	DAT_copy(PrevImagR, PrevImagR_RAM, BUFFERSIZE_BYTES);
    }
    
    void Dummy_function(float * restrict PrevRealL,
                        float * restrict PrevImagL,
                        float * restrict PrevRealR,
                        float * restrict PrevImagR)
    {
    
    	float * restrict PrevRealL_RAM = (float *)(L2_RAM_BUFFER); 
    	float * restrict PrevImagL_RAM = (float *)(L2_RAM_BUFFER + (1 * BUFFERSIZE_BYTES));
    	float * restrict PrevRealR_RAM = (float *)(L2_RAM_BUFFER + (2 * BUFFERSIZE_BYTES));
    	float * restrict PrevImagR_RAM = (float *)(L2_RAM_BUFFER + (3 * BUFFERSIZE_BYTES));
    
    	/* Copying from external memory to internal memory using DMA */
    	DMA_Copy(PrevRealL,PrevImagL,PrevRealR,PrevImagR,
    			PrevRealL_RAM,PrevImagL_RAM,PrevRealR_RAM,PrevImagR_RAM);
    			
    	/* .......
    	   some processing loop
    	   ....... */

    	/* Wait for the completion of DMA transfer */
    	DAT_wait(DAT_XFRID_WAITALL);
    
    	/* Cache prefetch */
    	touch((void *)PrevRealL_RAM,BUFFERSIZE_BYTES);
    	touch((void *)PrevImagL_RAM,BUFFERSIZE_BYTES);
    	touch((void *)PrevRealR_RAM,BUFFERSIZE_BYTES);
    	touch((void *)PrevImagR_RAM,BUFFERSIZE_BYTES);
    
    	/* .........
    	   processing loop that uses the cached L2 SRAM buffers */
    	   
    }
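
    The address arithmetic in Dummy_function can be checked on a host machine before running on the target. The sketch below is a hedged illustration and not part of the original code: `staging_addr`, `staging_end`, and `staging_is_line_aligned` are hypothetical helpers, and the 128-byte line size is an assumption (the C674x L2 line size; any multiple of it is also a multiple of the 64-byte L1D line). It computes where each of the four buffers lands and checks that no buffer shares a cache line with its neighbour.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define L2_RAM_BUFFER     0x00820000u  /* staging base used in the post */
    #define BUFFERSIZE_BYTES  4096u        /* size of each buffer */
    #define NUM_BUFFERS       4u           /* RealL, ImagL, RealR, ImagR */
    #define L2_LINE_BYTES     128u         /* assumption: C674x L2 line size */

    /* Start address of staging buffer n (0 .. NUM_BUFFERS - 1). */
    uint32_t staging_addr(uint32_t n)
    {
        assert(n < NUM_BUFFERS);
        return L2_RAM_BUFFER + n * BUFFERSIZE_BYTES;
    }

    /* First address past the last staging buffer. */
    uint32_t staging_end(void)
    {
        return L2_RAM_BUFFER + NUM_BUFFERS * BUFFERSIZE_BYTES;
    }

    /* Returns 1 if every buffer starts and ends on a cache-line boundary. */
    int staging_is_line_aligned(void)
    {
        return (L2_RAM_BUFFER % L2_LINE_BYTES == 0u) &&
               (BUFFERSIZE_BYTES % L2_LINE_BYTES == 0u);
    }
    ```

    With the values from the post, the four buffers occupy 0x00820000 through 0x00823FFF and each spans a whole number of lines. Whether that range is actually free depends on the linker command file and on the configured L2 cache size, which this sketch does not check.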

    Now the cycles consumed by the 'touch' function are reduced considerably. But this gain is again lost in the DMA_Copy() function, as its cycle count is high. The MCPS of a few other functions also increased.

    Can you figure out the reason for this?

     Thanks & Regards,

    Paul