6474 & 6670 DSPLIB FFT function performance comparison

Hey,

I have been trying to migrate a large-scale project from the 6474 to the 6670, and I have been taking measurements to estimate the performance gain from the device change. However, the FFT library gives very similar results on both: the FFTs take nearly the same time on the 6474 and the 6670. Is this expected, or am I doing something wrong?

ilke

  • Ilke,

    I wouldn't expect this. Is this C code or assembly? How is the original code optimized (i.e., how close to theoretical throughput are you getting)? Does it use intrinsics? Have you updated the intrinsics to use the C66x-specific instructions?

    Regards,

    Dan
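The question about theoretical throughput can be made concrete with a rough operation count. The sketch below assumes a plain radix-2 decomposition with one full complex multiply (four real multiplies) per butterfly; DSPLIB's fft32x32 uses its own mixed-radix structure, so this is only a ballpark. The multiplies-per-cycle figure is a parameter you supply for your target, not a value taken from either device's datasheet.

```c
#include <assert.h>

/* log2 of a power of two (assumes n is a power of two). */
static unsigned log2_pow2(unsigned n)
{
    unsigned k = 0;
    assert(n != 0u && (n & (n - 1u)) == 0u);
    while (n > 1u) {
        n >>= 1;
        ++k;
    }
    return k;
}

/* Radix-2 FFT: N/2 butterflies per stage, log2(N) stages. */
static unsigned long radix2_butterflies(unsigned n)
{
    return (unsigned long)(n / 2u) * log2_pow2(n);
}

/* Assumption: every butterfly does one full complex multiply
 * (4 real multiplies); trivial twiddles are not special-cased. */
static unsigned long radix2_real_mults(unsigned n)
{
    return 4ul * radix2_butterflies(n);
}

/* Multiply-bound lower limit on cycles for a HYPOTHETICAL sustained
 * rate of 32x32 multiplies per cycle on the target. */
static double mult_bound_cycles(unsigned n, double mults_per_cycle)
{
    return (double)radix2_real_mults(n) / mults_per_cycle;
}

/* Fraction of the multiply bound achieved by a measured run. */
static double efficiency(unsigned n, double mults_per_cycle,
                         double measured_cycles)
{
    return mult_bound_cycles(n, mults_per_cycle) / measured_cycles;
}
```

For N = 1024 this gives 5120 butterflies and 20480 real multiplies. Dividing the multiply-bound estimate by a measured kernel cycle count gives a rough efficiency figure that can be compared across the two devices.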

  • Dan,

    I am using the fft32x32 function in the DSPLIB provided for the 6670. I assumed the library source is already optimized for the 6670; is this not the case?

    Regards,

    ilke

  • Ilke,

    You hadn't mentioned that this was DSPLIB code. Yes, this code should already be optimized for the 6670. Are you getting your performance numbers specifically from the cycles consumed by the function kernel, or is this an overall number for the entire application?

    You're linking the DSPLIB binaries, right? Or are you using the source code? I don't suspect it should matter, but if you aren't using the libs, make sure your optimization level (at least for those files) is set to -o2 or, preferably, -o3. I suspect you've already done that.

    Also, where is the FFT data? I'm assuming it's in internal memory. How big are the FFTs you are using? I would expect a larger performance increase for a larger FFT. Also, be sure that the compiler is building specifically for the C66x. If you've chosen C6670 as your project device, this should be taken care of for you.

    I will try to generate a small example and run the analysis on both libs to see whether there is a difference.

    Regards,

    Dan
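One way to confirm that a build really targets the C66x is to check the compiler's predefined macros. This sketch assumes TI's C6000 compiler defines `_TMS320C6X` for all C6000 targets, `_TMS320C6600` when building for C66x (`-mv6600`) and `_TMS320C6400_PLUS` for C64x+ (the 6474's core); verify these names against the user guide for your compiler version. On a host compiler it simply reports that it is not a C6000 build.

```c
/* Optional hard guard: stop a C6000 build that is not targeting C66x. */
#if defined(_TMS320C6X) && !defined(_TMS320C6600)
#error "This project is being built for a pre-C66x target"
#endif

/* Reports which C6000 ISA this translation unit was compiled for.
 * C66x is checked first because a C66x build may also define the
 * older C64x+ macro. */
static const char *build_target(void)
{
#if defined(_TMS320C6600)
    return "C66x";
#elif defined(_TMS320C6400_PLUS)
    return "C64x+";
#else
    return "not a C6000 build";
#endif
}
```

Printing `build_target()` once at startup in both projects is a cheap way to rule out a misconfigured build before comparing cycle counts.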

  • Dan,

    I'm getting the performance numbers from the cycles consumed by the DSPLIB function itself. That's what intrigued me: if it were the entire application, the result might have been code-specific, but this is a library function.

    I am linking the DSP libraries, so I believe the optimization level doesn't matter, since the libraries are precompiled. But I have also tried higher optimization levels, just to be sure.

    The FFT data is in internal memory, and the FFT size is 1024. I have also selected the 6670 as the device in the compiler options.

    I would be grateful if you could analyze this case. I'm looking forward to hearing your results.

    Regards,

    ilke
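To compare only the FFT kernel across the two devices, a small timing harness can keep setup and cold-cache effects out of the number. This is a sketch under several assumptions: on the DSP it reads the C64x+/C66x time stamp counter through the `TSCL`/`TSCH` registers declared in TI's `<c6x.h>`, it relies on the `_TMS320C6X` predefined macro, and on other compilers it falls back to `clock()`, which measures host CPU time rather than DSP cycles. `sum_kernel` and `struct sum_ctx` are hypothetical stand-ins; on the target you would wrap your fft32x32 call in a function with the same `kernel_fn` shape.

```c
#include <stdint.h>
#include <time.h>

#if defined(_TMS320C6X)
#include <c6x.h>  /* TI compiler: declares the TSCL/TSCH control registers */
#endif

typedef void (*kernel_fn)(void *ctx);

/* Start the cycle counter (DSP only; any write to TSCL starts it). */
static void start_counter(void)
{
#if defined(_TMS320C6X)
    TSCL = 0;
#endif
}

/* Read a free-running counter: the 64-bit TSCH:TSCL cycle counter on
 * C64x+/C66x, or clock() as a coarse host stand-in. */
static uint64_t read_counter(void)
{
#if defined(_TMS320C6X)
    uint32_t lo = TSCL;  /* reading TSCL latches TSCH */
    uint32_t hi = TSCH;
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)clock();
#endif
}

/* Time only the kernel call: one untimed warm-up call so cold caches
 * are not charged to the kernel, then keep the fastest of `runs` timed
 * calls. Counter read overhead is included; runs == 0 returns UINT64_MAX. */
static uint64_t bench_min(kernel_fn kernel, void *ctx, unsigned runs)
{
    uint64_t best = UINT64_MAX;
    unsigned i;

    start_counter();
    kernel(ctx);  /* warm-up, not timed */
    for (i = 0; i < runs; ++i) {
        uint64_t t0 = read_counter();
        uint64_t t1;
        kernel(ctx);
        t1 = read_counter();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* HYPOTHETICAL stand-in kernel used only to illustrate the harness. */
struct sum_ctx {
    const int32_t *x;
    unsigned n;
    int64_t acc;
    unsigned calls;
};

static void sum_kernel(void *p)
{
    struct sum_ctx *c = (struct sum_ctx *)p;
    int64_t s = 0;
    unsigned i;
    for (i = 0; i < c->n; ++i)
        s += c->x[i];
    c->acc = s;
    c->calls++;
}
```

Taking the minimum of several runs after a warm-up call is a common choice because it filters out interrupts and first-touch cache misses. If the working set (input, output and twiddle buffers) does not fit in L1D, the steady-state number will still include L2 traffic, which is worth keeping in mind when comparing devices.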