Real FFT Benchmarking

Hi,

I am wondering about the benchmarking for 1024 point FFT given at the following wiki page for real FFT.

http://processors.wiki.ti.com/index.php/Efficient_FFT_Computation_of_Real_Input#Computing_a_Length_N.2F2_Complex_FFT_from_a_Length_N_Real_Input_Sequence

Did you count the clock cycles for the DSPF_sp_fftSPxSP and FFT_Split functions, or how was this benchmark made?

Your quick reply will be highly appreciated.

Regards,

BAS

 

  • BAS,

    After looking at the main.c file, located here: http://processors.wiki.ti.com/images/f/f1/Real_fft_demo_c674x.zip

    It looks like the FFT function is benchmarked by cycle counts, defined by:

    printf("Clocks for %d Complex FFT = %d\n", N, (stop-start)-overhead);

    So the reported figure is the processor time elapsed between start = clock(); and stop = clock();, minus the measurement overhead.

    "The clock() function returns the processor time since the program started, or -1 if that information is unavailable. To convert the return value to seconds, divide it by CLOCKS_PER_SEC."

     

    In the case of the ComplexFFT, yes it is just DSPF_sp_fftSPxSP ().

    In the case of the RealFFT, it is both DSPF_sp_fftSPxSP() and FFT_Split().

  • Hi Michael,

    I computed the real FFT, but both the start and stop clock values are zero. The following message is printed after the program runs:

    Clocks for 512 Real FFT using N/2 Complex FFT= 0

    What could be wrong ?

     

  • I would try printing start, stop, and overhead separately. That way you can see for yourself why that printout is being evaluated to 0. That should narrow it down to the root issue.

  • Does the clock in the Profile tab need to be enabled for the clock() function to work?

    I enabled the clock from the Profile tab and now get the following count from the printf call:

    Clocks for 1024 Real FFT using N/2 Complex FFT= 1225498

    N = 1024 in my code, but the clock count seems far too high compared to the wiki page.

    I need help to reproduce the same profiling results.

    Thanks.

  • The benchmark cycle counts were obtained using the C674 cycle-accurate simulator, which treats all memory access as if it were L1 RAM (i.e. instant access).  If you are running the algorithm with data buffers in external memory (i.e. DDR), then you will generally see higher cycle counts.  You can mitigate this in two different ways:

    1. Move the data buffers to internal memory
    2. Enable cache for the external memory containing the data buffers and/or application text

    Using these methods, you should be able to achieve performance similar to the benchmarks.

    Hope this helps.

  • Joe,

    I have moved data and code to IRAM section. I have also enabled L2 Cache by enabling Configure L2 Memory Settings and chosen 4-way cache for L2 Mode and 0x0001 for L2 MAR0-15 from Global Setting Properties of DSP BIOS.

    I used clock option under Profile tab. Now I am getting 39,631 Clock Cycles.

    Benchmark for 512 point Real FFT is 3181 Cycle Count on wiki page.

    You can also see part of my .cmd file.

    /* MODULE MEM */
    -stack 0x400
    MEMORY {
       IRAM        : origin = 0x0,         len = 0x30000
       CACHE_L2    : origin = 0x30000,     len = 0x10000
       SDRAM       : origin = 0x80000000,  len = 0x1000000
    }

    Need your help to improve that benchmark.

    BAS

  • BAS,

    Have you enabled L1 cache?  If the code and data are in L2RAM, then L1 cache can still improve performance.  Also, what DSP device are you using?  I want to double check that your memory map is set up correctly.

  • Joe,

    I am using C6713 DSK. I could be wrong but I think L1 is always enabled for c671x devices. I only enabled L2 memory setting. You can see all the settings of Memory in the snapshot given below.

    http://i52.tinypic.com/208ivdg.jpg

    One more thing I noticed in the DSP/BIOS settings: in Global Setting Properties, DSP Speed in MHz (CLKOUT) is 50.000. Does that mean my DSP is not running at 225 MHz?

    Waiting for your quick help.

    BAS

     

  • BAS,

    Typically the clock speed setting in DSP/BIOS is just used to translate cycle or timer counts into real time units like milliseconds.  It is not actually used to program the device PLL.  If you're only looking at cycle counts, then this setting shouldn't affect you.

    One thing to note is that the benchmarks on the real FFT wiki page were measured using the C674x DSPLIB, which gives performance that's not available on C6713.  The newer library on the C674 DSP core would give you about a 2x boost in performance for this algorithm.

  • Joe,

    First I tried with C674x DSPLIB but it didn't support C6713 DSK. Therefore, I tried with C67x DSPLIB.

    Is my current setting as shown in the snapshot OK ?

    Is there anything else that can be done to improve the clock count on the C6713 DSK?

    Thanks.

  • BAS,

    If your program text and data buffers are all placed inside L2RAM, then turning on L2 cache will have no effect.  However, you may need to set the MAR bits to enable caching of L2 memory (i.e. using L1 cache).  I don't think this should be necessary, but I'm not 100% sure.  To test it, you can try setting the MAR bits (in the global settings page of your BIOS configuration utility) to all 0xFFFFFFFF.

  • Joe,

    Everything is placed inside IRAM. I tested setting the MAR bits to all 0xFFFF, but it had no effect: still the same cycle count of about 40,000.

    Now what?

  • BAS,

    Are you timing the entire real FFT process or just the call to the DSPLIB FFT function?  If it's the former, then you may see a significant performance boost by simply rebuilding your application with optimization (i.e. remove -g option from compiler input and add -o3).

  • Joe,

    I tried profiling both ways. With your suggestion (removing the -g option from the compiler input and adding -o3) I now get 32,452 clock cycles for the DSPLIB FFT function, a small improvement but not much.

    How much improvement can we expect for the real FFT code on the C6713 DSK?

    Inside main.c, a comment says that once the w (twiddle), A, and B tables are generated they can be saved as .const and need not be generated every time. Does that mean saving the variables A, B, and w so that they are not regenerated on every run? If I manage that, I am sure I can cut more clock cycles from the algorithm.

    I don't know how to save them so please help me with this as well.

    Appreciate your help.

    Regards.

  • Joe,

    still waiting for your reply...

  • Joe,

    Will you reply to this post ? My project is still pending..

  • BAS,

    Sorry for my delay in replying.  You should definitely be able to save some time by not re-calculating the w, A, and B tables every time you run your application.  Here's an example of how you can generate these tables in a separate application:

    float w[1024], A[1024], B[1024];
    int i;

    // generate twiddle factor, A, and B arrays
    // (not shown)

    printf("twiddle factor array:\r\nconst float w[1024] = {\r\n");
    for (i = 0; i < 1024; i+=2)
        printf("    %f, %f,\r\n", w[i], w[i + 1]);
    printf("};\r\n\r\n");

    printf("A coef array:\r\nconst float A[1024] = {\r\n");
    for (i = 0; i < 1024; i+=2)
        printf("     %f, %f,\r\n", A[i], A[i + 1]);
    printf("};\r\n\r\n");

    printf("B coef array:\r\nconst float B[1024] = {\r\n");
    for (i = 0; i < 1024; i+=2)
        printf("     %f, %f,\r\n", B[i], B[i + 1]);
    printf("};\r\n\r\n");

    This code will print your pre-generated arrays to the console.  You can then copy those array definitions to your "real" application source code and use the pre-generated array values instead of calculating them at run time.  Note that the above assumes a 512-point complex FFT (i.e. N = 1024 real input samples).

    Hope this helps.

  • Joe,

    Thanks for your help.

    I generated the arrays using the code you wrote, saved them in a .h file, and added it to the project. I am getting the following error, probably because of the word const used with w, A and B.

    line 151: error: argument of type "const float *" is incompatible with parameter of type "float *"'

    Line 151 has the following call: FFT_Split (N / 2, pTemp, A, B, pRFFT_Out);

    I removed the word const and used extern instead. The code compiles fine but does not work and shows the following message.

    1024 Value of N is not supported

    This message is generated because rad and nTemp shows strange values and code gets into the portion where that print message has been written.

    In the main.c file, I did the following things,

    #include "CONST.h" // saved w, A and B arrays..

    float w[N];
    float A[N];
    float B[N];

    disabled these functions,

        //tw_gen (w, N / 2);
        //split_gen (A, B, N / 2);

    and calling only these ones,

       twiddle = (float *) w;

       DSPF_sp_fftSPxSP

       FFT_Split

    I am sorry but I still need your help.

    Thanks.

     

  • BAS,

    That's a strange error message to encounter.  It should not be possible for that message to be displayed unless you changed the code near the top of the Real_FFT() function that determines what radix to apply to the DSPF_sp_fftSPxSP() function.  Please check to make sure that you still have this block of code near the beginning of Real_FFT():

    int i, rad, nTemp = N / 2;
    float *twiddle;
    int start, stop, overhead;

    if (nTemp == 16 || nTemp == 64 || nTemp == 256 || nTemp == 1024 || nTemp == 4096 || nTemp == 16384)
        rad = 4;
    else if (nTemp == 8 || nTemp == 32 || nTemp == 128 || nTemp == 512 || nTemp == 2048 || nTemp == 8192)
        rad = 2;
    else
    {
        printf ("%d Value of N is not supported \n", N);
        exit (0);
    }

    If this code is in place, then it should be impossible to get the error message that you describe.  I would recommend stepping through the code to see if you can tell how you end up at this printf() call.  If nTemp is being overwritten, did you insert some additional code before the if/else block (above)?

    Hope this helps.

  • Joe,

    I am surprised myself. I haven't made any changes to the code. Stepping through the code showed that the value of nTemp changes after the DSPF_sp_fftSPxSP() function executes, so on the next pass it goes into the function and ends up at the printf() call.

    Yes, there is some code before the if/else block, but it simply copies data from the ADC to DSP memory. I ran the code with those lines disabled and still get the same printf() message.

  • BAS,

    If calling the FFT function causes nTemp to change, that could indicate that it is overwriting memory outside of the input and output buffers, which would be a problem.  Are your input and output buffers properly aligned?  I believe that these functions require that the data buffers (including w and brev) be aligned by 8 bytes.  You can do this with the DATA_ALIGN pragma.  For example, here's how you could align an input data buffer:

    #pragma DATA_ALIGN(x_buf, 8);
    float x_buf[N] = { ... };

  • Joe,

    Data buffers, w and brev are aligned by 8 bytes.

    I solved the problem: my code works once I re-enable the lines given below.

    #pragma DATA_ALIGN(pRFFT_InvOut, 8);
    float pRFFT_InvOut[N];

    I am performing only the FFT, not the IFFT, so I had disabled those lines; with them enabled again, nTemp no longer changes.

    How can enabling those lines solve my problem when I am not using the pRFFT_InvOut variable at all?

    Looking forward to clarify my doubts.

  • BAS,

    If you're not calling the IFFT function, then presumably the alignment of your IFFT output buffer shouldn't matter.  Are you sure that you aren't using this buffer for some other purpose?  One other possibility is that the alignment of this buffer indirectly sets the alignment of some other buffer or variable.  You can examine any effect of this nature by comparing the .map file generated with and without this buffer being properly aligned.  Take a look at other symbols that may change position with this change to see whether any of them seems more relevant.

    Regardless, I'm glad that your application is working correctly.

  • Joe,

    I looked at the symbols, and their addresses do change depending on whether the IFFT buffer is aligned. I also created a new project, and there it works fine without the IFFT alignment. I am happy that it is working now, but I would like to know what could cause this kind of behaviour.

    Thanks.

  • BAS,

    The problem is likely due to some buffer overrun somewhere in your application.  This could be caused by misaligned data buffers (especially buffers used by the FFT and IFFT functions), or it could be some other part of your application entirely.  This sort of problem is often difficult to debug.  If you want to get to the bottom of it, I recommend the following procedure:

    1. Revert your application to a state where the variables in question are being overwritten with bad values
    2. Reload/restart the application
    3. Put the variables in question in a watch window or memory window (you want to be able to view the values no matter where the program counter is)
    4. Step through the application until the "bad" values appear in memory
    5. The program's previous instruction (function call, etc.) is likely the culprit

    This can take a while depending on the complexity of your application, but it's the only way to figure out what is causing the problem.

    Hope this helps.

  • I appreciate your help Joe.

    I will try to see if those debugging techniques get me to the core of the problem.

    Regards.