Other Parts Discussed in Thread: C2000WARE
I need example code for a 1024-point FFT (both forward and inverse), and I need to evaluate whether there is enough CPU power for my product's DSP requirements before settling on the proper part for production.
TI literature suggests that using VCU0 for the FFT will yield a speed-up by a factor of 4 or 5, but a counterexample in a related post seemed to dash those hopes.
Q1) To clarify something: if I use the C28x VCU for the FFT, is IQmath the required format, or is that just your generic term for fixed-point math?
Q2) Since I start with real data, process it, then run an iFFT back to real data, would the RealFFT be most efficient, or should I try doing the ComplexFFT on 2 sets of data simultaneously?
Q3) Another point of confusion: is the CLA_FFT1024 library call the same as using the VCU, or does it run on an actual CLA block?
Q4) If I can use 16-bit IQmath with VCU0, why wouldn't it be faster than the FPU?
Q5) Given that VCU0 has a 3-deep (?) pipeline, would it be more efficient to use 5 stages of 4-point (radix-4) butterflies instead of 10 stages of 2-point (radix-2) butterflies?
Q6) Where can I get code for a 1024-point FFT that uses C28x+VCU0+TMU?
Q7) Or am I stuck using the 32-bit FPU? If so, where can I get example code for a 1024-point FFT (C28x+FPU)? Actually, both would be better so I can compare.
Q8) Why does the example code in the library only go up to 256/512-point FFTs? Is a 1024-point FFT not suitable for the C28x+VCU0 math or hardware?
Q9) I assume that to get the best performance I'd need to run the FFT routines from RAM. Can you suggest a good example of how to do this?
Thanks, Dan