I am trying to optimize my L1P cache performance. I manually assigned the functions in my processing loop of interest, along with the library routines those functions call, to a special section that I aligned to a 32kB boundary. I am still seeing severe performance degradation from build to build after minor changes, so I also want to move my interrupt vector table into this 32kB block. My linker command file is currently set up like this:
MEMORY
{
    ...
    DDR2 : origin = 0x80000000, len = 0x10000000 // 256 MB
}
SECTIONS
{
    ...
    .vectors > DDR2 align 0x400 // interrupt table must be 1024 byte aligned
    .text > DDR2
    .FAST_CODE_SECTION > DDR2 align 0x8000 // want fast code aligned on 32kB cache boundary
    {
        rts64plus.lib<divi.obj>
        rts64plus.lib<divu.obj>
        IQmath_c64xPlus.lib
    }
}
I tried carving a separate 32kB memory region out of the beginning of DDR2 and assigning both .vectors and .FAST_CODE_SECTION to it with the required alignments, but that completely broke my program when I tried to run it. Is there a way to place the .vectors section "inside" .FAST_CODE_SECTION, while keeping both required alignments, using the style of the example above?
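To make the question concrete, here is an untested sketch of what I have in mind, written in the same notation as my file above. The *(.vectors) line is my guess at how to pull the vector table in as the first input section. I am assuming the linker keeps input sections in the order listed, so the table would start exactly on the 32kB boundary. Since 0x8000 is a multiple of 0x400, that address would also satisfy the 1024-byte requirement. The separate .vectors line would be removed.

SECTIONS
{
    ...
    .text > DDR2
    .FAST_CODE_SECTION > DDR2 align 0x8000 // 32kB aligned, so the start is also 1024 byte aligned
    {
        *(.vectors)             // assumption: listed first, so placed at the start of the block
        rts64plus.lib<divi.obj>
        rts64plus.lib<divu.obj>
        IQmath_c64xPlus.lib
    }
}

If something like this is legal, I would check the resulting address of the vector table in the map file to confirm it really landed on the boundary.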