I made a small change in my code base and took a serious execution-time hit. I would like to rule out L1P cache problems as the culprit, or fix them if they are the culprit. My program is larger than 32 kB, but the "critical" section would easily fit in the 32 kB of L1P cache. I would like to take a handful of files and have the linker place their output sequentially, starting on a 32 kB boundary, and then place the rest of the code after that. I am using a DM6435 without DSP/BIOS, compiled with no debug information and -o3 optimization. What is the best way to accomplish this?
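
To make the layout concrete, here is a rough sketch of the kind of linker command file fragment I have in mind. It is untested: the memory range name (L2RAM), the output section name (.text_critical), and the object file names are all placeholders, and I am assuming the critical code ends up in the .text input sections of those files.

```
SECTIONS
{
    /* Hypothetical: collect .text from the critical files into one
       output section that starts on a 32 kB (0x8000) boundary.
       File names and the L2RAM memory range are placeholders. */
    .text_critical :
    {
        critical_a.obj (.text)
        critical_b.obj (.text)
    } > L2RAM, ALIGN(0x8000)

    /* All remaining .text input sections go here, wherever the
       linker finds room in the same memory range. */
    .text : {} > L2RAM
}
```

The idea is that the critical code occupies one contiguous block smaller than the L1P cache, so its lines should not evict each other, while the remaining code is placed wherever it fits.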