I have been comparing the cycle counts of some very simple functions and am surprised at how many cycles a fixed-point multiply takes compared to simple integer math. One thought is that my program cache is not handling this properly, but I'm not sure. The loop is set up so that the compiler should be able to pipeline the operations, and that seems to work with the integer math. I don't know what is actually happening, so I could use some help.
This code executes in an average of 56 cycles with NUM_TAPS set to 16:
STS_set(&stsGenericTime3, CLK_gethtime());
iq10Rccoeff = 0;
for (i = 0; i < NUM_TAPS; i++)
iq10Rccoeff += iq10ImgNormVal[i] * iq10Tmpl[i];
STS_delta(&stsGenericTime3, CLK_gethtime());
This code executes in an average of 591 cycles with NUM_TAPS set to 16:
STS_set(&stsGenericTime3, CLK_gethtime());
iq10Rccoeff = 0;
for (i = 0; i < NUM_TAPS; i++)
iq10Rccoeff += _IQ10mpy(iq10ImgNormVal[i], iq10Tmpl[i]);
STS_delta(&stsGenericTime3, CLK_gethtime());
That is roughly a 10x increase (591 vs. 56 cycles). Any suggestions on how to improve this? My optimization options are set as follows:
Optimize for Speed (-mf): 5
Opt Level: File (-o3)
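
For reference, here is a minimal portable sketch of what the two loops above compute. It assumes that `_IQ10mpy(a, b)` behaves like a 64-bit product shifted right by 10 with no rounding or saturation, which is my understanding of the IQmath multiply and not something confirmed here. The names `q10_mpy`, `dot_plain`, `dot_q10` and `dot_q10_shift_once` are hypothetical and exist only for this illustration. The point is that the plain `*` loop leaves its sum in Q20 (and keeps only the low 32 bits of each product), while the `_IQ10mpy` loop needs the full 64-bit product per tap, so the two loops are not doing the same amount of work. The last variant, which accumulates in 64 bits and shifts once at the end, is only one possible thing to try and has not been timed on the target.

```c
#include <stdint.h>

#define NUM_TAPS 16

/* Hypothetical stand-in for _IQ10mpy (assumption: 64-bit product,
 * right shift by 10, no rounding, no saturation). Right-shifting a
 * negative value is implementation-defined in C; most compilers use
 * an arithmetic shift. */
static int32_t q10_mpy(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 10);
}

/* Same arithmetic as the first (fast) loop: plain 32-bit multiply.
 * With Q10 inputs the sum is scaled as Q20, not Q10, and each
 * product can overflow 32 bits for large inputs. */
static int32_t dot_plain(const int32_t *x, const int32_t *h, int n)
{
    int32_t acc = 0;
    int i;
    for (i = 0; i < n; i++)
        acc += x[i] * h[i];
    return acc;
}

/* Same arithmetic as the second (slow) loop: one Q10 multiply per
 * tap, so every iteration forms a 64-bit product and shifts it. */
static int32_t dot_q10(const int32_t *x, const int32_t *h, int n)
{
    int32_t acc = 0;
    int i;
    for (i = 0; i < n; i++)
        acc += q10_mpy(x[i], h[i]);
    return acc;
}

/* Possible alternative (untested on the target): accumulate the full
 * 64-bit products and shift once at the end. The result can differ
 * from dot_q10 in the low bits because truncation happens once
 * instead of once per tap. */
static int32_t dot_q10_shift_once(const int32_t *x, const int32_t *h, int n)
{
    int64_t acc = 0;
    int i;
    for (i = 0; i < n; i++)
        acc += (int64_t)x[i] * (int64_t)h[i];
    return (int32_t)(acc >> 10);
}
```

As a quick check of the scaling: 2.0 in Q10 is 2048 and 1.5 in Q10 is 1536, so `q10_mpy(2048, 1536)` gives 3072 (3.0 in Q10), whereas the plain product 2048 * 1536 is 3145728, which is 3.0 in Q20.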