
Floating point routine doubles execution time when using intrinsic functions elsewhere in program



Hi,

I am trying to benchmark three FFT routines (two fixed-point, DSP_fft16x16() and DSP_fft16x32(), and one floating-point, DSPF_sp_fftSPxSP()) on a C6A8167 DSP running at 1.5 GHz. Since my input data are real and of size 2N (N = 16384), I need to convert them into a complex sequence of size N for these FFTs. After taking the size-N complex FFT, I perform a split function to get back the FFT of the original real sequence. I am using TI compiler v7.3.1 with -O3 optimization and COFF output.
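For reference, the real-to-complex conversion is just a matter of treating consecutive sample pairs as (real, imaginary) components. A minimal sketch (the helper name is my own; the even/odd packing is the convention the split step below assumes):

```c
#include <assert.h>

/* Sketch: pack 2N real samples into N complex (re, im) pairs, treating
   even-indexed samples as real parts and odd-indexed samples as
   imaginary parts. */
static void pack_real_as_complex(int n, const float *x, float *X)
{
    /* The memory layout is identical, so this is effectively a copy;
       the loop is written out to show the intended interpretation. */
    for (int k = 0; k < n; k++) {
        X[2*k]     = x[2*k];     /* Re X[k] = x[2k]   */
        X[2*k + 1] = x[2*k + 1]; /* Im X[k] = x[2k+1] */
    }
}
```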

The floating-point split function (shown below) increased its execution time from 0.844 ms to 1.801 ms after I used some intrinsics (_dotp2/_dotpn2/_rotl) in one of the fixed-point split routines. If I use plain ANSI C for that fixed-point split routine instead, the floating-point split routine below runs in about 0.844 ms.

This problem is strange. I wonder what is happening.

Thanks,

Zhao

/*
 * Split DSPF_sp_fftSPxSP results
 */
void split_SPxSP(int n, float *X, float *A, float *B, float *Y)
{
    int k;
    float Tr, Ti;

    X[2*n]     = X[0];  /* real */
    X[2*n + 1] = X[1];  /* imag */

    for (k = 0; k < n; k++)
    {
        Tr = X[2*k]*A[2*k] - X[2*k+1]*A[2*k+1]
           + X[2*n - 2*k]*B[2*k] + X[2*n - 2*k + 1]*B[2*k+1];
        Y[2*k] = Tr;

        Ti = X[2*k+1]*A[2*k] + X[2*k]*A[2*k+1]
           + X[2*n - 2*k]*B[2*k+1] - X[2*n - 2*k + 1]*B[2*k];
        Y[2*k+1] = Ti;
    }
}