Other Parts Discussed in Thread: OMAP3530
Hi,
I am posting for a customer here since the topic might be interesting for all DSP optimizers using OMAP3530. CCSv4.2.01.4 is being used.
We are optimizing IVA2/DSP code on OMAP3530 that was built with C6Run, and the simulator is the only tool we have for studying cache behaviour. The application has been stripped down so that it can be loaded into the simulator, and the remaining DSP algorithm is almost the same as in the real application. The algorithm is tiny in code size but makes heavy use of many data buffers. As a result, the contents of the L1P cache almost never change, but roughly 25% of L1D cache accesses are misses. We need to get this figure way down.
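To illustrate what a miss rate of this size can look like, here is a toy cache model in C. It is only a sketch and not a model of our actual algorithm: the geometry (2 ways, 64-byte lines, 32 KB), the LRU replacement, and all names (`cache_model_t`, `cache_read`, `stream_miss_percent`) are assumptions made for illustration and should be checked against the device documentation. It counts read misses only and ignores writes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed geometry, for illustration only. */
#define LINE_BYTES 64u
#define NUM_SETS   256u            /* 64 B * 2 ways * 256 sets = 32 KB */

typedef struct {
    uint32_t tag[NUM_SETS][2];     /* this sketch is hard-wired to 2 ways */
    uint8_t  valid[NUM_SETS][2];
    uint8_t  victim[NUM_SETS];     /* way to replace on the next miss */
    unsigned long accesses;
    unsigned long misses;
} cache_model_t;

void cache_reset(cache_model_t *c)
{
    memset(c, 0, sizeof *c);
}

/* Model one read access at byte address addr. */
void cache_read(cache_model_t *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;
    uint32_t set  = line % NUM_SETS;
    uint32_t tag  = line / NUM_SETS;
    unsigned way;

    c->accesses++;
    for (way = 0; way < 2; way++) {
        if (c->valid[set][way] && c->tag[set][way] == tag) {
            c->victim[set] = (uint8_t)(1u - way);   /* hit: other way is now LRU */
            return;
        }
    }
    c->misses++;
    way = c->victim[set];                           /* miss: replace the LRU way */
    c->tag[set][way]   = tag;
    c->valid[set][way] = 1;
    c->victim[set]     = (uint8_t)(1u - way);
}

/* Read `bytes` bytes once, front to back, one access per elem_bytes.
 * Returns the miss rate in percent for a cold cache. */
double stream_miss_percent(uint32_t base, uint32_t bytes, uint32_t elem_bytes)
{
    static cache_model_t c;
    uint32_t off;

    cache_reset(&c);
    for (off = 0; off < bytes; off += elem_bytes)
        cache_read(&c, base + off);
    if (c.accesses == 0)
        return 0.0;
    return 100.0 * (double)c.misses / (double)c.accesses;
}
```

In this model, a single pass over a buffer with one access per 16 bytes touches each 64-byte line four times, so one access in four is a miss. Whether our real access pattern looks anything like that is exactly what we are trying to find out.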
Here are a number of related questions that need to be answered:
Q1: When running the profiler, the L1D cache, L1P cache and CPU stall statistics are visible, but the L2 statistics seem to be all zeros. Is the L2 cache not modeled?
Q2: By what means can the simulator's L2 cache be configured to provide statistics?
Q3: What is the best practice for using L2 in a heavy data processing scenario? Should it be configured as L2 SRAM or as L2 data cache? Are there general guidelines to follow?
Q4: The Cache Ram Viewer window always shows up empty, even though the L1D cache statistics show that the cache is in use. Is this not modeled, or is it broken?
Q5: The penalty cycles for L1D cache misses are important, and we would like to change them if possible to get closer to the real HW, which obviously takes many more cycles for the same code than the simulator does. How can the L1D cache miss penalty values be configured in the simulator?
Q6: How can we get a measure of the L1D cache miss penalty on the real chip (in time or DSP cycles)?
Q7: When we check the "Code Coverage enable" box in Profile Setup, CCS crashes. Why?
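To make Q6 more concrete, the kind of measurement we have in mind could be sketched as below. This is only a rough sketch under stated assumptions, not tested on the target: it assumes the C64x+ time stamp counter can be read through `TSCL` from `c6x.h` when building with the TI compiler, and it falls back to `clock()` elsewhere so that the code still compiles on a host. The function names (`chase_build`, `chase_run`, `chase_ticks_per_hop`) are hypothetical, and the stride is meant to be set to the L1D line size (we assume 64 bytes, i.e. 16 words). The idea is to chain dependent loads through a buffer, time them once with a buffer that fits in L1D and once with one that does not, and take the difference per load as an estimate of the miss penalty.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#ifdef _TMS320C6X
#include <c6x.h>
/* Assumption: TSCL is the low half of the free-running time stamp
 * counter, and the first write to it starts the counter. */
static void     ticks_start(void) { TSCL = 0; }
static uint32_t ticks_now(void)   { return TSCL; }
#else
/* Host fallback so the sketch compiles; the resolution is far too coarse
 * for real measurements. */
static void     ticks_start(void) { }
static uint32_t ticks_now(void)   { return (uint32_t)clock(); }
#endif

/* Link buf[0], buf[stride], buf[2*stride], ... into a ring of indices.
 * Returns the number of nodes in the ring. */
size_t chase_build(uint32_t *buf, size_t words, size_t stride_words)
{
    size_t n = words / stride_words;
    size_t i;

    for (i = 0; i < n; i++)
        buf[i * stride_words] = (uint32_t)(((i + 1) % n) * stride_words);
    return n;
}

/* Follow the ring for `hops` dependent loads; returns the final index. */
uint32_t chase_run(const volatile uint32_t *buf, unsigned long hops)
{
    uint32_t idx = 0;

    while (hops--)
        idx = buf[idx];
    return idx;
}

/* Average ticks per dependent load. *sink receives the final index so the
 * compiler cannot discard the loop. */
double chase_ticks_per_hop(uint32_t *buf, size_t words, size_t stride_words,
                           unsigned long hops, uint32_t *sink)
{
    uint32_t t0, t1;
    size_t n = chase_build(buf, words, stride_words);

    *sink = chase_run(buf, (unsigned long)n);   /* warm-up pass over the ring */
    ticks_start();
    t0 = ticks_now();
    *sink = chase_run(buf, hops);
    t1 = ticks_now();
    return (double)(uint32_t)(t1 - t0) / (double)hops;
}
```

The estimate would then be `chase_ticks_per_hop()` for a buffer several times larger than the L1D cache minus the same call for a buffer that fits inside it. We assume the result depends on where the large buffer is placed (L2 SRAM, L2 cache backed by external memory, or uncached external memory), so each placement would have to be measured separately. Is this a reasonable approach, or is there a better-supported way?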
Thanks for advising us, so that we can get a better grip on the situation.
/Magnus Aman