Finding the right High Performance FFT processor

Other Parts Discussed in Thread: TMS320C5517, TCI6638K2K, TMS320C5505, TCI6630K2L

I am designing a high-end audio calibration device, something like an active equalizer. It will use real-time FFT analysis. I would like an FFT processor capable of 9600 data points, but I have not seen anything over 8192 (TCI6638K2K, discontinued and too slow). Ideally it would process all data points within 1 usec, and in no more than 2.5 usec. I would also like it to be as simple to implement and as standalone as possible. So far I have been referred to the TMS320C5505 and TMS320C5517, both of which are limited to the range of 8-1024 data points. The TCI6630K2L has also been mentioned, but it is in pre-production and, from what I can gather from the datasheet, also handles only 1024 data points. I am open to alternative ways to process the information. Any help is appreciated.

  • Hi,

    We're looking into this. Feedback will be posted here.

    Best Regards,
    Yordan
  • It has been nearly a month. Has anyone found any information yet?

  • From a theory point of view, assuming you are working with real-valued input (because you say your application is audio-like), a 9600-point FFT requires on the order of 2.5 * 9600 * log2(9600) = 317,492 arithmetic operations. To do that every microsecond, the hardware needs about 317,492 operations / 1 usec = ~317.5 GOPS of achieved (not peak) throughput, which is more than the nameplate throughput of any DSP currently listed at www.ti.com/.../overview.html. The looser 2.5 microsecond target still needs ~127 GOPS, and would remain extremely challenging (or perhaps impossible) because the work would have to be split across cores and inter-core communication delays eat into the budget.

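    If successive input blocks are related to each other, optimizations are possible. For example, if successive FFTs almost entirely overlap (each new window drops the oldest sample and adds one new sample), the problem becomes enormously easier with a sliding-window FFT algorithm (also called the sliding DFT), which updates the previous result instead of recomputing it from scratch.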

    When you say you want to have the FFT complete within 1 (or 2.5) usec, do you mean latency for each FFT, or average throughput with potentially multiple FFTs being calculated in parallel? Single-FFT latency under a microsecond (for N=9600 and non-sliding-window FFTs) would be challenging or impossible on most types of commercial hardware -- I would look towards high-end CPUs or very custom FPGA designs for that. Several unrelated FFTs running in parallel could be achieved on a broader selection of platforms.
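
To make the two ideas in the reply above concrete, here is a minimal Python sketch, not a recommendation for any particular part. It reproduces the 2.5 * N * log2(N) operation estimate and shows the textbook sliding DFT update, in which each bin is refreshed from its previous value when one sample leaves the window and one enters. The names `fft_ops_estimate`, `SlidingDFT`, and `direct_dft` are made up for this illustration, and the one-sample hop is an assumption about how the windows overlap.

```python
import cmath
import math
from collections import deque


def fft_ops_estimate(n):
    """Rough operation count for a real-input FFT, using the 2.5*N*log2(N)
    rule of thumb quoted in the thread (an estimate, not a measurement)."""
    return 2.5 * n * math.log2(n)


def direct_dft(window):
    """Reference O(N^2) DFT of one window; used only to check the sliding update."""
    n = len(window)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * m / n) for m, x in enumerate(window))
        for k in range(n)
    ]


class SlidingDFT:
    """Sliding DFT with a hop of one sample (assumed overlap pattern).

    Update rule per bin k:
        X_k(new) = (X_k(old) - oldest_sample + newest_sample) * exp(+j*2*pi*k/N)
    """

    def __init__(self, n):
        self.n = n
        # Window starts as all zeros, so all bins start at zero as well.
        self.window = deque([0.0] * n, maxlen=n)
        self.bins = [0j] * n
        self.twiddles = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

    def push(self, sample):
        """Slide the window by one sample and update every bin in O(N)."""
        oldest = self.window[0]
        self.window.append(sample)  # maxlen discards the oldest sample
        delta = sample - oldest
        self.bins = [(b + delta) * w for b, w in zip(self.bins, self.twiddles)]
        return self.bins


if __name__ == "__main__":
    n = 9600
    ops = fft_ops_estimate(n)
    print(f"full FFT estimate: {ops:,.0f} operations per transform")
    print(f"needed at 1 usec:  {ops / 1e-6 / 1e9:.1f} GOPS")
    print(f"needed at 2.5 usec: {ops / 2.5e-6 / 1e9:.1f} GOPS")
    print(f"sliding DFT: {n} complex multiply-adds per new sample (all bins)")
```

Each `push` costs one complex multiply-add per bin, so updating all N bins is O(N) per incoming sample rather than O(N log N) per window, and it is cheaper still if only a few bins are needed. In a long-running system the recursive update accumulates rounding error, so practical implementations usually add a damping factor or periodically recompute the bins directly; that is left out of this sketch.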