Clock cycle details for FFT computation on LCDK: TMS320C6748

Hi, I'm using the TMS320C6748 LCDK, which has a C6748 DSP on it.

I need a very fast FFT, so rather than using my own implementation, I decided to use the well-optimized FFT in the C674x DSPLIB. The benchmarks section of spru657.pdf (the manual for the TMS320C6000 DSP Library) says that a call to DSPF_sp_fftSPxSP(...) for a 64-point FFT takes only 598 clock cycles:

Benchmarks

cycles = 3 * ceil(log4(N)-1) * N + 21 * ceil(log4(N)-1) + 2*N + 44

e.g., N = 1024, cycles = 14464
e.g., N = 512, cycles = 7296
e.g., N = 256, cycles = 2923
e.g., N = 128, cycles = 1515
e.g., N = 64, cycles = 598
Code size (in bytes): 1440
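
As a quick sanity check on the figures quoted above, the benchmark formula can be evaluated directly. This is only a minimal sketch: `fft_benchmark_cycles` is a hypothetical helper name (it is not part of DSPLIB), and it assumes N is a power of two with N >= 4.

```c
/* Cycle estimate from the quoted benchmark formula:
 *   cycles = 3*ceil(log4(N)-1)*N + 21*ceil(log4(N)-1) + 2*N + 44
 * Assumes n is a power of two and n >= 4.
 * fft_benchmark_cycles is a hypothetical helper, not a DSPLIB function. */
unsigned long fft_benchmark_cycles(unsigned long n)
{
    unsigned int log2n = 0;
    unsigned long stages;

    while ((1UL << (log2n + 1)) <= n) {
        log2n++;                      /* log2n = floor(log2(n)) */
    }

    /* log4(n) = log2(n)/2, so ceil(log4(n) - 1) = ceil(log2(n)/2) - 1 */
    stages = (log2n + 1) / 2 - 1;

    return 3UL * stages * n + 21UL * stages + 2UL * n + 44UL;
}
```

With this helper, `fft_benchmark_cycles(64)` evaluates to 598 and `fft_benchmark_cycles(1024)` to 14464, matching the table, so the quoted numbers are consistent with the formula.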

But when I call DSPF_sp_fftSPxSP(...) in my example, the CCS Clock tool measures a whopping 23,926 CPU cycles for a 64-point FFT. Why is there such a disparity? Am I missing something?

Can anyone kindly explain how I can get a 64-point FFT that uses the fewest clock cycles possible, or at least comes close to the 598 clock cycles given in the manual?

Thank you.

Vikram

  • Vikram,

    How are you measuring the execution time? The most reliable way is to use the TSC counter, capturing its value before and after the routine runs. Search the forum for TSCL and TSCH to find some discussions of its use.

    Do you have all of the L1 cache turned on?

    Are your program and data in L2 SRAM?

    What numbers do you get if you run the routine several times in a row? The cache will be in a different state for each run: cold, warm, and filled.

    To get the best possible speed, turn off all the cache and place all of your test program code in L1P SRAM and all of your data in L1D SRAM. Then see what the numbers look like.

    The document you are quoting is for the C67x family, so the numbers may not be exact for the C674x family; they could be better or worse, but they should be close. Much closer than what you are seeing.

    Regards,
    RandyP

  • Hi Randy,

    I'm working step by step through what you have suggested.

    I have come across a problem with the TSCL and TSCH registers: they are both zero all the time while my program is running. Any idea what I'm missing?

    Vikram

  • Hi Vikram,

    Possibly the timer hasn't been started yet?

    See here:

      // start time stamp counter with a dummy-write
      // runs forever once started
      TSCL=0;

    Regards,
    Joern.
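
Putting the suggestions in this thread together, a measurement harness might look roughly like the sketch below. It assumes the TI C6000 compiler, whose `c6x.h` header declares the `TSCL` and `TSCH` control registers. The helper names `tsc_start`, `tsc_read`, `measure_runs`, and `routine_under_test` are hypothetical and used only for illustration. The `#else` branch is a host-side fallback, so the sketch also builds off-target; there it reports processor time rather than DSP cycles.

```c
#include <stdint.h>

#ifdef _TMS320C6X
#include <c6x.h>                 /* TI compiler header declaring TSCL and TSCH */

void tsc_start(void)
{
    TSCL = 0;                    /* a write to TSCL starts the free-running counter */
}

uint64_t tsc_read(void)
{
    uint32_t lo = TSCL;          /* read TSCL first: this latches TSCH */
    uint32_t hi = TSCH;
    return ((uint64_t)hi << 32) | lo;
}
#else
#include <time.h>                /* host fallback so the sketch builds off-target */

void tsc_start(void)
{
}

uint64_t tsc_read(void)
{
    return (uint64_t)clock();    /* processor time, not DSP cycles */
}
#endif

/* Hypothetical stand-in: replace the body with the DSPF_sp_fftSPxSP(...) call. */
void routine_under_test(void)
{
    volatile int sink = 0;
    int i;

    for (i = 0; i < 64; i++) {
        sink += i;
    }
}

/* Time `runs` back-to-back calls of fn; cycles[0] is the cold-cache run. */
void measure_runs(void (*fn)(void), uint64_t *cycles, int runs)
{
    uint64_t t0, t1, overhead;
    int i;

    t0 = tsc_read();
    t1 = tsc_read();
    overhead = t1 - t0;          /* cost of the two counter reads themselves */

    for (i = 0; i < runs; i++) {
        t0 = tsc_read();
        fn();
        t1 = tsc_read();
        cycles[i] = (t1 - t0 > overhead) ? (t1 - t0 - overhead) : 0;
    }
}
```

A typical use would be to call `tsc_start()` once at startup and then `measure_runs(routine_under_test, cycles, 5)`. Comparing `cycles[0]` with the later entries shows how much of the measured time comes from a cold cache rather than from the routine itself.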