DSPF_sp_fftSPXSP() multi-pass implementation problem

James Wang

Other Parts Discussed in Thread: OMAP-L138

Recently I ran into a problem when using the multi-pass form of DSPF_sp_fftSPxSP() on the OMAP-L138.

When I use the single-pass DSPF_sp_fftSPxSP(), the FFT output is correct. When I use the multi-pass implementation, the result is definitely wrong. Can anybody help me figure it out? Are there any other restrictions on the input arguments of this function? Thanks a million!

BTW:

The dsplib version is: dsplib_c674x_3_4_0_0

The CCS version is: 5.5.0.00077

Part of the code is as follows:

#define N 1024

#pragma DATA_ALIGN(x_ref, 8);
float x_ref[2*N];

#pragma DATA_ALIGN(x_sp, 8);
float x_sp[2*N];

#pragma DATA_ALIGN(w_sp, 8);
float w_sp[2*N];

#pragma DATA_ALIGN(y_sp, 8);
float y_sp[2*N];

unsigned char brev[64] = {
0x0, 0x20, 0x10, 0x30, 0x8, 0x28, 0x18, 0x38,
0x4, 0x24, 0x14, 0x34, 0xc, 0x2c, 0x1c, 0x3c,
0x2, 0x22, 0x12, 0x32, 0xa, 0x2a, 0x1a, 0x3a,
0x6, 0x26, 0x16, 0x36, 0xe, 0x2e, 0x1e, 0x3e,
0x1, 0x21, 0x11, 0x31, 0x9, 0x29, 0x19, 0x39,
0x5, 0x25, 0x15, 0x35, 0xd, 0x2d, 0x1d, 0x3d,
0x3, 0x23, 0x13, 0x33, 0xb, 0x2b, 0x1b, 0x3b,
0x7, 0x27, 0x17, 0x37, 0xf, 0x2f, 0x1f, 0x3f
};

/* Generate complex signal,saved in x_sp */
SigGenerator();

/* Generate twiddle factors, stored in w_sp */
gen_twiddle_fft_sp(w_sp, N);

/* first stage */
DSPF_sp_fftSPxSP(N, &x_sp[0], &w_sp[0], &y_sp[0], brev, N/4, 0, N);

/* second stage */
/* y_sp is the array for FFT output */
DSPF_sp_fftSPxSP(N/4, &x_sp[2*3*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 3*N/4, N);
DSPF_sp_fftSPxSP(N/4, &x_sp[2*2*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 2*N/4, N);
DSPF_sp_fftSPxSP(N/4, &x_sp[2*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, N/4, N);
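
The brev table above matches the 6-bit bit reversal of the indices 0..63 (for example, index 1 maps to 0x20 and index 8 maps to 0x4). As a cross-check, here is a minimal sketch that generates the same table. The names reverse_bits and make_brev are hypothetical helpers for illustration, not DSPLIB functions.

```c
/* Reverse the low `bits` bits of v (hypothetical helper, not part of DSPLIB). */
static unsigned char reverse_bits(unsigned v, int bits)
{
    unsigned r = 0;
    int b;

    for (b = 0; b < bits; b++) {
        r = (r << 1) | (v & 1u);   /* shift the lowest bit of v into r */
        v >>= 1;
    }
    return (unsigned char)r;
}

/* Fill a 64-entry table with the 6-bit bit-reversed value of each index. */
void make_brev(unsigned char table[64])
{
    unsigned i;

    for (i = 0; i < 64; i++)
        table[i] = reverse_bits(i, 6);
}
```

Comparing the generated table against the hand-written one is a quick way to rule out a typo in brev as the cause of a wrong result.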

Thanks again

Yours,

James

over 11 years ago

0 Sivaraj Kuppuraj over 11 years ago

TI__Mastermind 35645 points

Hi James,

Thanks for your post.

Basically, for an N-point FFT with N = 256, a single-pass implementation would be the best choice, but if you go beyond that (N = 512, 1024, etc.), you should use a multi-pass implementation (breaking up a large FFT into several sub-FFTs). For more details, please refer to section 3.3.3.2 of the document below.

http://www.ti.com/lit/an/spra947a/spra947a.pdf

Please refer to the C67x DSP library programmer's reference guide below to check that the algorithm implementation in your code complies with it.

http://www.ti.com/lit/ug/spru657c/spru657c.pdf

In that document, please refer to the DSPF_sp_fftSPxSP function on page 49.

In addition, please refer to the E2E threads below, which should give you better clarity:

http://e2e.ti.com/support/development_tools/compiler/f/343/t/185941.aspx

http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/274765

http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/95526.aspx

Thanks & regards,

Sivaraj K


0 James Wang over 11 years ago in reply to Sivaraj Kuppuraj

Prodigy 40 points

Hi Sivaraj, thanks for your reply.
Unfortunately, I have read almost every thread related to DSPF_sp_fftSPxSP in the forum, but I have not found a correct answer. Does the multi-pass DSPF_sp_fftSPxSP function work correctly on C67x rather than C674x? If any specialist knows, please let me know.
Thanks a lot!

0 Sivaraj Kuppuraj over 11 years ago in reply to James Wang

TI__Mastermind 35645 points

Hi James,

Thanks for your update.

Of course, the multi-pass implementation does work, and performance-wise you should see fewer L1D cache miss cycles and a lower cycle count than with single pass. However, a multi-pass implementation requires multiple stages, and each stage requires multiple calls to the same function. The document spra947a.pdf describes the right way to implement multi-pass, and decrementing the counter in the for loop is the correct implementation for a multi-pass FFT.

=============

n=2048,float x[2048*2],float y[2048*2],float w[2048]

// stage one

DSPF_sp_fftSPxSP( n, &x[0], &w[0], y, brev, n/16, 0, n );

// stage two

for( i=0;i<16;i++){

DSPF_sp_fftSPxSP( n/16, &x[2*(15-i)*n/16], &w[2*n*15/16], &y[0], brev, 2, 2*(15-i)*n/16, n );

}

============

This was made to work with the following modifications:

==================

n=2048,float x[2048*2],float y[2048*2],float w[2048]

// stage one

DSPF_sp_fftSPxSP( n, &x[0], &w[0], y, brev, n/16, 0, n );

// stage two

for( i=15;i>=0;i--){

DSPF_sp_fftSPxSP( n/16, &x[2*(15-i)*n/16], &w[2*n*15/16], &y[0], brev, 2, 2*(15-i)*n/16, n );

}

============
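
The twiddle offsets used in this thread, &w[2*n*15/16] for n = 2048 split into 16 sub-FFTs and &w_sp[2*3*N/4] for N = 1024 split into 4, can be derived by counting how many twiddle floats the first pass has already consumed. The sketch below assumes the layout written by the DSPLIB-style twiddle generator: six floats per radix-4 butterfly, with n/4 butterflies in the first stage, n/16 in the second, and so on. That layout is an assumption not confirmed in this thread, and twiddle_offset is a hypothetical helper name.

```c
#include <stddef.h>

/*
 * Count the twiddle floats used by the radix-4 stages that run before the
 * sub-FFTs of size n_sub start (hypothetical helper, not a DSPLIB function).
 * Assumes 6 floats per butterfly and stage sizes n/4, n/16, ...
 */
size_t twiddle_offset(size_t n, size_t n_sub)
{
    size_t offset = 0;
    size_t butterflies = n / 4;   /* distinct twiddle triples in the current stage */
    size_t span = n;              /* transform size handled at the current stage */

    while (span > n_sub) {
        offset += 6 * butterflies;  /* three complex twiddles per butterfly */
        butterflies /= 4;
        span /= 4;
    }
    return offset;
}
```

Under that assumption, twiddle_offset(1024, 256) gives 1536, which equals 2*3*N/4, and twiddle_offset(2048, 128) gives 3840, which equals 2*n*15/16.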

Thanks & regards,

Sivaraj K

-------------------------------------------------------------------------------------------------------

Please click the Verify Answer button on this post if it answers your question.

-------------------------------------------------------------------------------------------------------

0 James Wang over 11 years ago in reply to Sivaraj Kuppuraj

Prodigy 40 points

Thanks Sivaraj!

I have tried the approach you gave above, but it still does not work.

However, when I take the following approach, it works correctly, so I guess there may be some restrictions on the input arguments of the DSPF_sp_fftSPxSP() API. Can anybody help me figure it out? Thanks a million.

//stage one
DSPF_sp_fftSPxSP_cn(N, x_sp, w_sp, y_sp, brev, N/4, 0, N);
//stage two
DSPF_sp_fftSPxSP(N/4, &x_sp[0], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 0, N);
DSPF_sp_fftSPxSP(N/4, &x_sp[2*1*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 1*N/4, N);
DSPF_sp_fftSPxSP(N/4, &x_sp[2*2*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 2*N/4, N);
DSPF_sp_fftSPxSP(N/4, &x_sp[2*3*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 3*N/4, N);

The following does not work:

DSPF_sp_fftSPxSP(N, x_sp, w_sp, y_sp, brev, N/4, 0, N);
//stage two
DSPF_sp_fftSPxSP(N/4, &x_sp[0], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 0, N);
DSPF_sp_fftSPxSP(N/4, &x_sp[2*1*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 1*N/4, N);
DSPF_sp_fftSPxSP(N/4, &x_sp[2*2*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 2*N/4, N);
DSPF_sp_fftSPxSP(N/4, &x_sp[2*3*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 3*N/4, N);

Yours,
James

0 Jarosław Wojtuń over 10 years ago in reply to James Wang

Prodigy 80 points

Hello

I have the same problem.

Does anybody know whether a multi-pass FFT is possible?
If yes, please say how.

Single pass works correctly:
DSPF_sp_fftSPxSP(N, input, w, output, brev, 4, 0, N);//N=1024
and it takes 12815 clock cycles on the OMAP-L137.

When I use Multi pass:
DSPF_sp_fftSPxSP_cn( 1024, input,   w,     output, brev, 256, 0,   1024 ); // natural C

// stage two
DSPF_sp_fftSPxSP( 256, input,   w+2*768, output, brev, 4,   0,   1024 );
DSPF_sp_fftSPxSP( 256, input+2*256, w+2*768, output, brev, 4,   256, 1024 );
DSPF_sp_fftSPxSP( 256, input+2*512, w+2*768, output, brev, 4,   512, 1024 );
DSPF_sp_fftSPxSP( 256, input+2*768, w+2*768, output, brev, 4,   768, 1024 );

I get the correct answer, but it takes 133283 clock cycles.

Yours,

Jarek
