Floating point routine doubles execution time when using intrinsic functions elsewhere in program

Zhao Li

Hi,

I am trying to benchmark three FFT routines on a C6A8167 DSP running at 1.5 GHz: two fixed-point routines, DSP_fft16x16() and DSP_fft16x32(), and one floating-point routine, DSPF_sp_fftSPxSP(). Since my input data are real of size 2N (N = 16384), I need to convert them into a complex sequence of size N for these FFTs. After taking the size-N complex FFTs, I need to perform a split function to recover the FFT of the original real sequence. I am using TI compiler v7.3.1 with -O3 optimization and COFF output.
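For reference, the pack-and-split step is commonly written as follows, with g the real input of length 2N and X the N-point FFT of the packed sequence x. The post does not define the coefficient tables A and B, so the definitions below are an assumption; they are chosen because they reproduce exactly the arithmetic in split_SPxSP, including the wrap-around X(N) = X(0):

$$
\begin{aligned}
x(m) &= g(2m) + j\,g(2m+1), \qquad m = 0,\dots,N-1,\\
G(k) &= X(k)\,A(k) + X^{*}(N-k)\,B(k), \qquad k = 0,\dots,N-1,\\
A(k) &= \tfrac{1}{2}\bigl(1 - j\,W_{2N}^{k}\bigr), \qquad B(k) = \tfrac{1}{2}\bigl(1 + j\,W_{2N}^{k}\bigr), \qquad W_{2N} = e^{-j\pi/N}.
\end{aligned}
$$

In real and imaginary parts this gives $A_r(k) = \tfrac{1}{2}(1 - \sin\tfrac{\pi k}{N})$, $A_i(k) = -\tfrac{1}{2}\cos\tfrac{\pi k}{N}$, $B_r(k) = \tfrac{1}{2}(1 + \sin\tfrac{\pi k}{N})$ and $B_i(k) = \tfrac{1}{2}\cos\tfrac{\pi k}{N}$.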

The execution time of the floating-point split function (shown below) increased from 0.844 ms to 1.801 ms after I used some intrinsic functions (_dotp2/_dotpn2/_rotl) in one of the fixed-point split routines. If I write that fixed-point split routine in standard ANSI C instead, the floating-point split routine runs in about 0.844 ms.

This is strange, since the floating-point split routine itself did not change. I wonder what is happening.

Thanks,

Zhao

/* Split DSPF_sp_fftSPxSP results.
   X must have room for 2*n + 2 floats, because X(N) is written below. */
void split_SPxSP(int n, float *X, float *A, float *B, float *Y)
{
    int k;
    float Tr, Ti;

    X[2*n]   = X[0];   // real: X(N) = X(0), needed for the k = 0 term
    X[2*n+1] = X[1];   // imag

    for (k = 0; k < n; k++)
    {
        Tr = X[2*k]*A[2*k] - X[2*k+1]*A[2*k+1]
           + X[2*n - 2*k]*B[2*k] + X[2*n - 2*k + 1]*B[2*k+1];
        Y[2*k] = Tr;

        Ti = X[2*k+1]*A[2*k] + X[2*k]*A[2*k+1]
           + X[2*n - 2*k]*B[2*k+1] - X[2*n - 2*k + 1]*B[2*k];
        Y[2*k+1] = Ti;
    }
}

over 13 years ago

Yimin Zhang, over 13 years ago


Hi,

Maybe a change in your memory map (code and data placement shifting when the fixed-point routine changed) caused the cycle increase.

regards,

Yimin
