Hi,
I'm using the Device Cycle Accurate simulator (little endian) for the C6747 DSP.
I use the profiler to benchmark the fixed-point FFT/IFFT routines of the C64x+ DSPLIB.
With a 64-point FFT, I get 182 cycles for DSP_fft16x16 and 158 cycles for DSP_ifft16x16.
But when I look at spruec5.pdf (TMS320C64x+ DSP Big-Endian DSP Library Programmer’s Reference), the benchmark formula for the FFT/IFFT is (6 * nx/8 + 19) * ceil[log4(nx) - 1] + 8*nx/8 + 30 cycles. For a 64-point FFT, I should get 224 cycles. In sprueb8b.pdf (TMS320C64x+ DSP Little-Endian DSP Library Programmer’s Reference), the benchmarks are given in a table, with 242 cycles for DSP_fft16x16 (SA case, i.e. the serial assembly implementation).
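To make the comparison easier to check, here is a minimal sketch in C that evaluates the formula exactly as quoted above. The function names (log2_pow2, fft16x16_formula_cycles) are made up for this post, and the restriction to powers of two of at least 16 is my assumption, not something taken from the PDF. It uses ceil(x - 1) = ceil(x) - 1 and integer arithmetic so no floating-point log is needed.

```c
#include <assert.h>

/* Integer log2 of a power of two, e.g. 64 -> 6. */
static int log2_pow2(int nx)
{
    int bits = 0;
    while (nx > 1) {
        nx >>= 1;
        ++bits;
    }
    return bits;
}

/* Cycle estimate from the formula as quoted in this post:
 *   (6*nx/8 + 19) * ceil(log4(nx) - 1) + 8*nx/8 + 30
 * Assumption (mine): nx is a power of two and at least 16. */
static int fft16x16_formula_cycles(int nx)
{
    assert(nx >= 16 && (nx & (nx - 1)) == 0);

    /* log4(nx) = log2(nx) / 2, so ceil(log4(nx)) = (log2(nx) + 1) / 2
     * with integer division; subtracting 1 gives ceil(log4(nx) - 1). */
    int stages = (log2_pow2(nx) + 1) / 2 - 1;

    return (6 * nx / 8 + 19) * stages + 8 * nx / 8 + 30;
}
```

Calling fft16x16_formula_cycles(64) with the formula exactly as written gives 228 rather than 224, so one of the constants may differ slightly between the PDF and what is quoted here. Either way, the estimate is well above the 182 cycles reported by the profiler.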
I link my code with dsplib.a64P (which uses DSP_fft16x16_sa.sa). My data are 8-byte aligned and the code is in internal memory (no L2 cache).
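For reference, the alignment claim can be checked at run time with something like the sketch below. is_aligned is a hypothetical helper written for this post, not part of DSPLIB. On the target, the buffers themselves would typically be aligned with the TI compiler's DATA_ALIGN pragma, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if pointer p is a multiple of `alignment` bytes, else 0.
 * Hypothetical helper for illustration; alignment must be non-zero. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

A check such as is_aligned(x, 8) on the input, output and twiddle buffers before the FFT call would confirm that they all start on an 8-byte boundary.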
I'm surprised to get a better result in the profiler than in the documentation. What could be wrong? Do the documented results and formula depend on the target architecture (parallel execution capabilities) and on the compilation options?
Regards.
Laurent.