L1D cache invalidate latency



Hi!

In our project, an FPGA writes to the MSMC of a C6678 through PCIe. L1D and L1P are enabled, and the L2 cache is disabled. In a C6678 SWI routine, we do an L1D cache invalidate on part of the MSMC RAM. The data in MSMC RAM is aligned to 128 bytes.

We used the SYS/BIOS function:

Cache_inv((void *)(pcie_bar1_channel+(4*512*task_struct[i].pcie_num)), 64*4, Cache_Type_L1D, TRUE);

The latency of this call varies from 2000 to 10000 clocks (measured with Timestamp_get32()).
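Since the spread between runs is the interesting number here, it can help to collect many samples and track the minimum, maximum and mean rather than looking at single readings. The sketch below is plain host-runnable C; `latency_stats`, `stats_add` and the other names are hypothetical, and the start/end values are assumed to be raw reads of a 32-bit cycle counter such as `Timestamp_get32()`. The unsigned subtraction keeps each delta correct even if the counter wraps once between the two reads.

```c
#include <assert.h>
#include <stdint.h>

/* Running min/max/total over latency samples, in timer ticks. */
typedef struct {
    uint32_t min;
    uint32_t max;
    uint64_t total;
    uint32_t count;
} latency_stats;

static void stats_init(latency_stats *s)
{
    s->min = UINT32_MAX;
    s->max = 0;
    s->total = 0;
    s->count = 0;
}

/* Unsigned 32-bit subtraction is modulo 2^32, so the delta stays correct
   even if the counter wrapped once between the two reads. */
static uint32_t elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Record one measurement given the counter values read before and after. */
static void stats_add(latency_stats *s, uint32_t start, uint32_t end)
{
    uint32_t d = elapsed(start, end);
    if (d < s->min) s->min = d;
    if (d > s->max) s->max = d;
    s->total += d;
    s->count++;
}

/* Mean of all recorded samples; at least one sample is required. */
static uint32_t stats_mean(const latency_stats *s)
{
    assert(s->count > 0);
    return (uint32_t)(s->total / s->count);
}
```

On the target, `start` and `end` would be the counter values read immediately before and after the `Cache_inv()` call, accumulated across many SWI invocations before the statistics are inspected.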

To avoid the SYS/BIOS overhead, I wrote my own inline function:

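#define L1DIBAR_ptr ((volatile unsigned int *)0x01844048u) /* L1D invalidate base address register */
#define L1DIWC_ptr  ((volatile unsigned int *)0x0184404Cu) /* L1D invalidate word count register */

/* a_bar: start address of the block; size: block length in 32-bit words (L1DIWC counts words) */
static inline void InvalidateL1D_wait(unsigned int a_bar, unsigned int size)
{
    *L1DIBAR_ptr = a_bar;  /* set the block base address */
    *L1DIWC_ptr  = size;   /* writing the word count starts the invalidate */
    _mfence();             /* stall until outstanding memory operations complete */
    _mfence();
}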

This did not improve things much: the latency of my function still varies from 800 to 9000 clocks for an invalidate length of 64*4.
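One detail worth checking when comparing the two measurements: SYS/BIOS `Cache_inv()` takes its length in bytes, while the L1DIWC register counts 32-bit words. So 64*4 passed to `Cache_inv()` is 256 bytes, whereas 64*4 written directly to L1DIWC would request 1024 bytes. The sketch below converts a byte range into the word count L1DIWC expects and counts how many cache lines the range touches. The helper names are hypothetical, and the 64-byte L1D line size is an assumption marked in the code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed L1D line size in bytes (C66x L1D is taken to use 64-byte lines). */
#define L1D_LINE_BYTES 64u

/* L1DIWC takes a count of 32-bit words; round a byte count up to whole words. */
static uint32_t l1diwc_words(uint32_t byte_cnt)
{
    return (byte_cnt + 3u) >> 2;
}

/* Number of line_bytes-sized cache lines overlapped by [addr, addr + byte_cnt). */
static uint32_t lines_touched(uint32_t addr, uint32_t byte_cnt, uint32_t line_bytes)
{
    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1u)) == 0); /* power of two */
    if (byte_cnt == 0)
        return 0;
    uint32_t first = addr & ~(line_bytes - 1u);                   /* line of first byte */
    uint32_t last  = (addr + byte_cnt - 1u) & ~(line_bytes - 1u); /* line of last byte */
    return (last - first) / line_bytes + 1u;
}
```

With the 128-byte alignment described in the question, a 256-byte block always covers exactly four 64-byte lines, so `lines_touched(addr, 256u, L1D_LINE_BYTES)` returns 4 and `l1diwc_words(256u)` returns 64.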

Why can a cache invalidate take the CPU so long?

Thanks!

  • When you do a targeted invalidation (rather than a global invalidation), the cache controller has to search L1D for the cache lines that belong to the specified memory range and then invalidate those lines individually. That range can be much larger than the cache itself, so the time depends on the range rather than on the cache size. How long the full invalidation takes therefore varies with the size of the block being invalidated and with how many of its lines are currently resident in L1D.

    Other system activity may also increase this time. The MFENCE instruction makes sure the CorePac does not continue executing until these operations are complete (details on MFENCE are in the TMS320C66x CPU and Instruction Set Reference Guide).

    Best Regards,

    Chad
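The point that a targeted invalidate costs time per line of the block, not per line of the cache, can be illustrated with a deliberately simplified software model. This is not TI's cache-controller logic. It assumes a 32 KB, 2-way set-associative L1D with 64-byte lines, walks the block one line at a time, checks both ways of the indexed set, and counts how many lines were looked up and how many were actually resident and invalidated. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed L1D geometry for the model: 32 KB, 2-way, 64-byte lines -> 256 sets. */
#define LINE_BYTES 64u
#define WAYS       2u
#define SETS       (32u * 1024u / (LINE_BYTES * WAYS))

typedef struct {
    bool     valid[SETS][WAYS];
    uint32_t tag[SETS][WAYS];
} l1d_model;

typedef struct {
    uint32_t lines_checked;     /* one set lookup per line in the block */
    uint32_t lines_invalidated; /* lookups that found a resident line */
} inv_cost;

static void l1d_reset(l1d_model *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* Make the line containing addr resident; uses an empty way if one exists,
   otherwise overwrites way 0 (no LRU in this model). */
static void l1d_fill(l1d_model *c, uint32_t addr)
{
    uint32_t set = (addr / LINE_BYTES) % SETS;
    uint32_t tag = addr / (LINE_BYTES * SETS);
    uint32_t victim = 0;
    for (uint32_t w = 0; w < WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag)
            return; /* already resident */
        if (!c->valid[set][w])
            victim = w;
    }
    c->valid[set][victim] = true;
    c->tag[set][victim] = tag;
}

/* Targeted invalidate of [addr, addr + byte_cnt): visits every line of the block. */
static inv_cost l1d_invalidate_block(l1d_model *c, uint32_t addr, uint32_t byte_cnt)
{
    inv_cost cost = {0, 0};
    if (byte_cnt == 0)
        return cost;
    uint32_t first = addr & ~(LINE_BYTES - 1u);
    uint32_t last  = (addr + byte_cnt - 1u) & ~(LINE_BYTES - 1u);
    for (uint32_t a = first; ; a += LINE_BYTES) {
        uint32_t set = (a / LINE_BYTES) % SETS;
        uint32_t tag = a / (LINE_BYTES * SETS);
        cost.lines_checked++;
        for (uint32_t w = 0; w < WAYS; w++) {
            if (c->valid[set][w] && c->tag[set][w] == tag) {
                c->valid[set][w] = false;
                cost.lines_invalidated++;
            }
        }
        if (a == last)
            break;
    }
    return cost;
}
```

In this model the number of lookups depends only on the block size (a 1 MB block costs 16384 lookups even though the modelled cache holds just 512 lines), while the number of actual invalidations depends on which lines happen to be resident. Real hardware timing would also include the system effects mentioned in the reply.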

  • Thanks! In other words, the latency of a targeted cache invalidate is not deterministic in CPU cycles, and in the worst case it will take (total addressable RAM space)/(cache line length) cycles to complete?
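For scale, counting lines from the block size and counting them from the whole address space give very different numbers. Assuming a 64-byte L1D line (L = 64), a line-aligned block of B = 256 bytes as in the question, and, purely for illustration, a full 32-bit address space for the second count:

$$
N_{\text{block}} = \left\lceil \frac{B}{L} \right\rceil = \frac{256}{64} = 4,
\qquad
N_{\text{space}} = \frac{2^{32}}{2^{6}} = 2^{26} = 67\,108\,864 .
$$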