Hi,
To make our C6678 code faster, we have replaced a Dot_product() function with its DSPLIB equivalent, as shown below:
for loop ... {
#if 1
L_tmp = DSP_dotprod ((short *)y2_fx,(short *)y2_fx,L_SUBFR);
#else
L_tmp = Dot_product(y2_fx, y2_fx, L_SUBFR);
#endif
}
We added the DSP_dotprod() source code as a static inline function. With this change the loop always qualifies, and the result is faster performance (as expected) in some cases, but dramatically slower performance in others.
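To make the setup concrete, here is a minimal sketch of the kind of static inline routine being described. It is a plain C illustration of the operation being computed, not the actual DSPLIB source; the name dotprod_sketch and its parameter names are hypothetical.

```c
/* Hypothetical stand-in for an inlined 16-bit dot product.
 * This is NOT the DSPLIB implementation, only an illustration of
 * the operation: sum of x[i] * y[i] accumulated in a 32-bit int. */
static inline int dotprod_sketch(const short *x, const short *y, int n)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; i++) {
        /* Widen to int before multiplying so the product is 32-bit. */
        sum += (int)x[i] * (int)y[i];
    }
    return sum;
}
```

In the loop above it would be called the same way as the DSPLIB version, e.g. dotprod_sketch(y2_fx, y2_fx, L_SUBFR), assuming y2_fx points to 16-bit data.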
What intermediate files or report information can the c66x compiler generate so that I can get some idea of what is making the difference? If I should post specific code examples (e.g. a "fast" one vs. a "slow" one), please let me know.
--
Thanks!
Regards,
Sarvani Chadalapaka
HPC Systems Engineer
Signalogic Inc.