I went through the CCS tutorials on application code tuning today, and after completing them I tried the tools out on my own code. They worked well until I got to the cachetune tool. After trying in vain to use it, I found buried in the help files that (apparently) cachetune is only supported for c6400 (among others), but not for c64+ simulators.

I am also a little confused by the results I see from the profiler. The summary tab tells me that 75% of my cycle count is L1P stalls, but none of the functions show any L1P misses in the main tab of the profiler. Am I doing something obviously wrong, and is there any way to get more insight into my cache optimization problems?