Hi,
I am trying to benchmark three FFT routines (two fixed-point, DSP_fft16x16() and DSP_fft16x32(), and one floating-point, DSPF_sp_fftSPxSP()) on a C6A8167 DSP running at 1.5 GHz. My input data are real, of length 2N (N = 16384), so I pack them into a complex sequence of length N for these FFTs. After taking the N-point complex FFT, I run a split function to recover the FFT of the original real sequence. I am using TI compiler v7.3.1 with -O3 optimization and COFF output.
The execution time of the floating-point split function (shown below) increased from 0.844 ms to 1.801 ms after I used some intrinsics (_dotp2/_dotpn2/_rotl) in one of the fixed-point split routines. If I write that fixed-point split routine in standard ANSI C instead, the floating-point split routine below runs in about 0.844 ms again.
This is strange, because the floating-point routine itself did not change. What could be happening?
Thanks,
Zhao
/*
 * Split DSPF_sp_fftSPxSP results
 */
void
split_SPxSP(int n, float *X, float *A, float *B, float *Y)
{
    int k;
    float Tr, Ti;

    X[2*n] = X[0];      // real
    X[2*n+1] = X[1];    // imag
    for (k=0; k<n; k++)
    {
        Tr = X[2*k]*A[2*k] - X[2*k+1]*A[2*k+1] + X[2*n - 2*k]*B[2*k] + X[2*n - 2*k +1]*B[2*k+1];
        Y[2*k] = Tr;
        Ti = X[2*k+1]*A[2*k] + X[2*k]*A[2*k+1] + X[2*n - 2*k]*B[2*k+1] - X[2*n - 2*k +1]*B[2*k];
        Y[2*k+1] = Ti;
    }
}