
DSPF_sp_fftSPXSP() multi-pass implementation problem

Other Parts Discussed in Thread: OMAP-L138

I recently ran into a problem using the multi-pass form of DSPF_sp_fftSPxSP() on the OMAP-L138.

With a single-pass call to DSPF_sp_fftSPxSP(), the FFT output is correct. With the multi-pass implementation, the result is definitely wrong. Can anybody help me figure out why? Are there any other restrictions on the input arguments of this function? Thanks a million!

BTW:

The dsplib version is: dsplib_c674x_3_4_0_0

CCS version is : 5.5.0.00077

Part of the code is as follows:

#define N 1024

#pragma DATA_ALIGN(x_ref, 8);
float x_ref[2*N];

#pragma DATA_ALIGN(x_sp, 8);
float x_sp[2*N];

#pragma DATA_ALIGN(w_sp, 8);
float w_sp[2*N];

#pragma DATA_ALIGN(y_sp, 8);
float y_sp[2*N];

unsigned char brev[64] = {
0x0, 0x20, 0x10, 0x30, 0x8, 0x28, 0x18, 0x38,
0x4, 0x24, 0x14, 0x34, 0xc, 0x2c, 0x1c, 0x3c,
0x2, 0x22, 0x12, 0x32, 0xa, 0x2a, 0x1a, 0x3a,
0x6, 0x26, 0x16, 0x36, 0xe, 0x2e, 0x1e, 0x3e,
0x1, 0x21, 0x11, 0x31, 0x9, 0x29, 0x19, 0x39,
0x5, 0x25, 0x15, 0x35, 0xd, 0x2d, 0x1d, 0x3d,
0x3, 0x23, 0x13, 0x33, 0xb, 0x2b, 0x1b, 0x3b,
0x7, 0x27, 0x17, 0x37, 0xf, 0x2f, 0x1f, 0x3f
};

/* Generate complex signal,saved in x_sp */
SigGenerator();

/* Generate twiddle factors, stored in w_sp */
gen_twiddle_fft_sp(w_sp, N);

/* first stage */
DSPF_sp_fftSPxSP(N, &x_sp[0], &w_sp[0], &y_sp[0], brev, N/4, 0, N);

/* second stage */
/* y_sp is the array for FFT output */
DSPF_sp_fftSPxSP(N/4, &x_sp[2*3*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 3*N/4, N);
DSPF_sp_fftSPxSP(N/4, &x_sp[2*2*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 2*N/4, N);
DSPF_sp_fftSPxSP(N/4, &x_sp[2*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, N/4, N);
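
For readers checking the brev table above: its 64 entries are the indices 0..63 with their 6 bits reversed, so it can be regenerated rather than typed by hand. The following is a minimal sketch of that; the names reverse6 and gen_brev6 are hypothetical helpers for illustration and are not part of dsplib.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the low 6 bits of v, e.g. 000001b becomes 100000b (0x20). */
unsigned char reverse6(unsigned char v)
{
    unsigned char r = 0;
    int bit;
    for (bit = 0; bit < 6; bit++) {
        r = (unsigned char)((r << 1) | (v & 1u));
        v >>= 1;
    }
    return r;
}

/* Fill a 64-entry table so that table[i] is i with its 6 bits reversed.
 * Hypothetical helper, not a dsplib function. */
void gen_brev6(unsigned char table[64])
{
    size_t i;
    for (i = 0; i < 64; i++) {
        table[i] = reverse6((unsigned char)i);
    }
}
```

Calling gen_brev6(brev) at start-up should reproduce the literal table in the post, which is one way to rule out a typo in it.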

Thanks again

Yours,

James

  • Hi James,

    Thanks for your post.

    For an N-point FFT with N = 256, a single-pass implementation is the best choice, but if you go beyond that (N = 512, 1024, etc.), you should use a multi-pass implementation (breaking a large FFT into several sub-FFTs). For more details, please refer to section 3.3.3.2 of the document below.

    http://www.ti.com/lit/an/spra947a/spra947a.pdf

    Please refer to the C67x DSP Library Programmer's Reference Guide to check that the algorithm implementation in your code complies with the document below.

    http://www.ti.com/lit/ug/spru657c/spru657c.pdf

    In that document, please see the DSPF_sp_fftSPxSP function on page 49.

    In addition, the E2E threads below may give you more clarity:

    http://e2e.ti.com/support/development_tools/compiler/f/343/t/185941.aspx

    http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/274765

    http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/95526.aspx

    Thanks & regards,

    Sivaraj K


  • Hi Sivaraj, thanks for your reply.
    Unfortunately, I have read almost every thread related to DSPF_sp_fftSPxSP in the forum, but I have found no correct answer. Does the DSPF_sp_fftSPxSP multi-pass function work correctly on C67x but not on C674x? If any specialist knows, please let me know.
    Thanks a lot!
  • Hi James,

    Thanks for your update.

    Of course, the multi-pass implementation does work. Performance-wise, it also incurs fewer L1D cache-miss cycles and a lower cycle count than single pass. However, a multi-pass implementation requires multiple stages, and each stage requires multiple calls to the same function. The document spra947a.pdf describes the right way to implement multi-pass, and decrementing the counter in the for loop is the correct implementation for a multi-pass FFT.

    =============

    n=2048,float x[2048*2],float y[2048*2],float w[2048]

    // stage one

    DSPF_sp_fftSPxSP( n, &x[0], &w[0], y, brev, n/16, 0, n );

    // stage two

    for( i=0;i<16;i++){

        DSPF_sp_fftSPxSP( n/16, &x[2*(15-i)*n/16], &w[2*n*15/16], &y[0], brev, 2, 2*(15-i)*n/16, n );

    }

    ============

    And it worked after the following modifications:

    ==================

    n=2048,float x[2048*2],float y[2048*2],float w[2048]

    // stage one

    DSPF_sp_fftSPxSP( n, &x[0], &w[0], y, brev, n/16, 0, n );

    // stage two

    for( i=15;i>=0;i--){

         DSPF_sp_fftSPxSP( n/16, &x[2*(15-i)*n/16], &w[2*n*15/16], &y[0], brev, 2, 2*(15-i)*n/16, n );

    }

    ============
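
    One detail worth checking in calls like these is the radix argument (2 for the n/16 = 128-point sub-FFTs here, 4 for the 1024-point and 256-point transforms elsewhere in this thread). As far as I understand the dsplib documentation, it should be 4 when the transform size is a power of 4 and 2 otherwise. The sketch below encodes that rule under that assumption; final_radix and is_power_of_4 are hypothetical helper names, not dsplib functions.

    ```c
    #include <assert.h>

    /* Returns 1 if n is a power of 4 (1, 4, 16, 64, ...), otherwise 0.
     * Assumes unsigned is at most 32 bits wide. */
    int is_power_of_4(unsigned n)
    {
        if (n == 0u || (n & (n - 1u)) != 0u) {
            return 0;                      /* zero or not a power of 2 */
        }
        /* A power of 4 has its single set bit in an even bit position. */
        return (n & 0x55555555u) != 0u;
    }

    /* Hypothetical helper: radix argument for a sub-FFT of n_sub points,
     * assuming the rule "4 if n_sub is a power of 4, else 2". */
    int final_radix(unsigned n_sub)
    {
        return is_power_of_4(n_sub) ? 4 : 2;
    }
    ```

    With this rule, final_radix(2048 / 16) gives 2 and final_radix(1024 / 4) gives 4, which matches the values used in the calls in this thread.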

    Thanks & regards,

    Sivaraj K


  • Thanks Sivaraj!

    I have tried the approaches you gave above, but it still does not work.

    However, the following approach works correctly, so I guess there may be some restrictions on the input arguments of the DSPF_sp_fftSPxSP() API. Can anybody help me figure it out? Thanks a million.

    //stage one
    DSPF_sp_fftSPxSP_cn(N, x_sp, w_sp, y_sp, brev, N/4, 0, N);
    //stage two
    DSPF_sp_fftSPxSP(N/4, &x_sp[0], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 0, N);
    DSPF_sp_fftSPxSP(N/4, &x_sp[2*1*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 1*N/4, N);
    DSPF_sp_fftSPxSP(N/4, &x_sp[2*2*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 2*N/4, N);
    DSPF_sp_fftSPxSP(N/4, &x_sp[2*3*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 3*N/4, N);

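    The following does not work (the only difference is that stage one uses the optimized DSPF_sp_fftSPxSP instead of DSPF_sp_fftSPxSP_cn):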

    DSPF_sp_fftSPxSP(N, x_sp, w_sp, y_sp, brev, N/4, 0, N);
    //stage two
    DSPF_sp_fftSPxSP(N/4, &x_sp[0], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 0, N);
    DSPF_sp_fftSPxSP(N/4, &x_sp[2*1*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 1*N/4, N);
    DSPF_sp_fftSPxSP(N/4, &x_sp[2*2*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 2*N/4, N);
    DSPF_sp_fftSPxSP(N/4, &x_sp[2*3*N/4], &w_sp[2*3*N/4], &y_sp[0], brev, 4, 3*N/4, N);
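
    The four stage-two calls above follow one pattern: the input pointer advances in floats (two per complex sample), the twiddle pointer stays at 2*3*N/4, and the offset argument is counted in complex samples. The sketch below only restates that index arithmetic so it can be checked in isolation; SubFftArgs and stage_two_args are hypothetical names introduced for illustration, and nothing here calls dsplib.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical container for the arguments that change per call. */
    typedef struct {
        size_t x_index;  /* index into the float input array (2 floats per sample) */
        size_t w_index;  /* index into the float twiddle array */
        int    offset;   /* sub-FFT start in complex samples from the main FFT start */
    } SubFftArgs;

    /* Arguments for quarter q (0..3) when an n-point FFT is split into
     * four n/4-point sub-FFTs, mirroring the calls in the post above. */
    SubFftArgs stage_two_args(int n, int q)
    {
        SubFftArgs a;
        a.x_index = (size_t)(2 * q * (n / 4));
        a.w_index = (size_t)(2 * 3 * (n / 4));
        a.offset  = q * (n / 4);
        return a;
    }
    ```

    A loop over q = 0..3 could then issue DSPF_sp_fftSPxSP(N/4, &x_sp[a.x_index], &w_sp[a.w_index], y_sp, brev, 4, a.offset, N) with a = stage_two_args(N, q), which keeps the float indices and the sample offset from being mixed up.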

    Yours,
    James

  • Hello

    I have the same problem.

    Does anybody know whether a multi-pass FFT is possible?
    If yes, please explain how.

    Single pass works correctly:
    DSPF_sp_fftSPxSP(N, input, w, output, brev, 4, 0, N);//N=1024
    and it takes 12815 clock cycles on the OMAP-L137.

    When I use multi-pass:
    DSPF_sp_fftSPxSP_cn( 1024, input,   w,     output, brev, 256, 0,   1024 ); // stage one, natural C version

    // stage two
    DSPF_sp_fftSPxSP(  256, input,   w+2*768, output, brev, 4,   0,   1024 );
    DSPF_sp_fftSPxSP(  256, input+2*256, w+2*768, output, brev, 4,   256, 1024 );
    DSPF_sp_fftSPxSP(  256, input+2*512, w+2*768, output, brev, 4,   512, 1024 );
    DSPF_sp_fftSPxSP(  256, input+2*768, w+2*768, output, brev, 4,   768, 1024 );

    I get the correct answer, but the cycle count is 133283.

    Yours,

    Jarek