Hello,
I’m running a test that transfers 4096 bytes of data from a 16-bit SDRAM (through the EMIF) to internal SRAM on a C6414.
In the first test, I disable the L2 cache and the SDRAM is not cacheable. I run the memory transfer using the DAT module of the CSL (which runs a DMA transfer). I get a data throughput close to the EMIF bandwidth. I can see in the profiler that 1 EDMA transfer occurs.
In the second test, I enable the L2 cache (256K) and the SDRAM is configured as cacheable. I run the memory transfer using memcpy. Compared to the first test, it takes twice as many cycles to execute the same transfer. I was indeed expecting a longer transfer time, because 1 DMA has to be performed for every 128 bytes (one per cache line), which introduces an overhead. I can indeed see in the profiler that the cache management issues 32 EDMA requests (4096/128). In the profiler, I get cpu.stall.mem.l2.cache.miss.read=8653 and CPU.stall.mem.L1D=9101 out of the ~10000 cycles of the memcpy.
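To make these numbers concrete, here is a small host-side C sketch of the arithmetic. It does not touch the DSP at all. The function names are just mine for illustration, it assumes the buffer is aligned on a cache line and that every line misses in L2, and the 10000-cycle total is only the approximate figure quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Number of cache-line fills needed to read `bytes` bytes, assuming a
   line-aligned buffer where every line misses (rounds up). */
unsigned line_fills(unsigned bytes, unsigned line_bytes)
{
    assert(line_bytes > 0);
    return (bytes + line_bytes - 1) / line_bytes;
}

/* Share of `total_cycles` spent stalled, as a truncated integer percentage. */
unsigned stall_percent(unsigned stall_cycles, unsigned total_cycles)
{
    assert(total_cycles > 0);
    return (100u * stall_cycles) / total_cycles;
}

/* Print the figures from the second test (memcpy with L2 cache enabled). */
void report(void)
{
    const unsigned bytes = 4096, line = 128;   /* transfer size, L2 line size */
    const unsigned total = 10000;              /* approximate memcpy cycles   */
    const unsigned l2_miss = 8653, l1d = 9101; /* profiler stall counters     */
    unsigned fills = line_fills(bytes, line);

    printf("line fills           : %u\n", fills);
    printf("L2 read-miss stalls  : %u%% of cycles\n", stall_percent(l2_miss, total));
    printf("L1D stalls           : %u%% of cycles\n", stall_percent(l1d, total));
    printf("stall cycles per fill: %u\n", l2_miss / fills);
}
```

Calling report() from a main() prints the per-line breakdown. With these inputs, roughly 270 stall cycles are spent on each 128-byte line fill, so almost the whole memcpy is spent waiting on L2 read misses rather than copying.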
I don't really understand the factor of 2 between the two configurations (cache vs. EDMA).
Can anyone help me understand this?
Thanks in advance.
Laurent.