TMS320C5535: HWAFFT and 32-bit FFTs

Part Number: TMS320C5535

I am developing a new audio product for which we would like to use the HWAFFT hardware on the C5515 series DSP.

However, I have found that the available scaling options leave the FFT output with a poor S/N ratio. This seems to stem from the fact that the FFT increases the dynamic range of the data, i.e., if you input full-scale 16-bit data, the FFT will overflow, causing corruption. To avoid this, there is an option to divide the butterfly output by 2 at each processing stage. However, for my 1024-point FFT this results in 30 dB of added noise, thereby destroying the S/N.
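
To make the overflow and the divide-by-2 option concrete, here is a minimal C sketch of one radix-2 butterfly on Q15 data with the twiddle factor fixed at 1. The function names are hypothetical and this is not the HWAFFT's actual datapath: the sketch clips (saturates) on overflow, whereas the thread does not say whether the hardware saturates or wraps. A 1024-point FFT has log2(1024) = 10 such stages, so the divide-by-2 path discards one LSB ten times over.

```c
#include <stdint.h>

/* Clip a 32-bit intermediate to the 16-bit range. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Radix-2 butterfly on Q15 data with twiddle = 1: a' = a + b, b' = a - b.
 * The sum of two full-scale values needs 17 bits, so without scaling the
 * result must be clipped (or, on hardware that wraps, it is corrupted).
 * With scale_by_2 set, the 17-bit result is halved by an arithmetic shift
 * (rounding toward minus infinity) before it is stored, so it always fits
 * in 16 bits, but its least significant bit is discarded. */
static void butterfly_real_q15(int16_t *a, int16_t *b, int scale_by_2)
{
    int32_t sum  = (int32_t)*a + *b;
    int32_t diff = (int32_t)*a - *b;

    if (scale_by_2) {
        sum  >>= 1;   /* assumes arithmetic right shift for negative values */
        diff >>= 1;
    }
    *a = sat16(sum);
    *b = sat16(diff);
}
```

With 30000 + 30000 the unscaled sum (60000) has to be clipped to 32767, while the scaled path returns 30000 exactly; with 3 + 0 the scaled path returns 1, i.e. the half LSB is gone. That per-stage loss is the noise the divide-by-2 option adds.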

To overcome this, I have tried using the 32-bit FFT provided in DSPLIB. This works very nicely; however, it is very slow to execute by comparison.

So my question is: Is it possible to use the HWAFFT hardware to speed up a 32-bit FFT? 

Any advice is GREATLY appreciated!!

  • Hi John,

    I've forwarded this to the C55x FFT experts. Their feedback should be posted here.

    BR
    Tsvetolin Shulev
  • Great. Thanks very much!

  • Hi Tsvetolin, Any progress on this? I'd really appreciate some support on this.

  • John,

    I'll ping the experts.

    BR
    Tsvetolin Shulev
  • Hi John,

    Sorry for the wait.

    The HWAFFT can certainly improve the execution time of any FFT computation (e.g., DSPLIB) that currently runs on the CPU core alone.
    The CPU and HWAFFT accelerator can both be used (not in parallel) to compute an FFT. For example, you could compute a 1024-pt FFT by breaking it down into several 32-pt FFTs that run on the HWAFFT. This approach is more common with FFTs larger than 1024-pt, since 1024-pt is the largest FFT the HWAFFT can compute.
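
    As a rough illustration of that decomposition, the sketch below builds an N = n1 × n2 point DFT from short transforms using the standard Cooley-Tukey "four-step" split: n2 transforms of length n1, a twiddle multiply by W_N^(j2·k1), then n1 transforms of length n2, with the output landing at index k1 + n1·k2. For N = 1024 and n1 = n2 = 32 that is 64 short transforms. It is a host-side, floating-point sketch with hypothetical function names; small_dft is a naive stand-in for the short FFT engine, whereas on the C5535 those calls would be 16-bit HWAFFT invocations and the twiddle step would run on the CPU.

    ```c
    #include <complex.h>
    #include <stddef.h>
    #include <stdlib.h>

    static const double PI = 3.14159265358979323846;

    /* Naive m-point DFT, standing in for a short FFT engine such as the HWAFFT.
     * Reads in[n * in_stride], writes out[k * out_stride]. */
    static void small_dft(const double complex *in, size_t in_stride,
                          double complex *out, size_t out_stride, size_t m)
    {
        for (size_t k = 0; k < m; k++) {
            double complex acc = 0;
            for (size_t n = 0; n < m; n++)
                acc += in[n * in_stride]
                     * cexp(-2.0 * PI * I * (double)(n * k) / (double)m);
            out[k * out_stride] = acc;
        }
    }

    /* N = n1 * n2 point DFT from short transforms (four-step Cooley-Tukey).
     * Input index:  n = n2 * j1 + j2   (j1 < n1, j2 < n2)
     * Output index: k = k1 + n1 * k2   (k1 < n1, k2 < n2)
     * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
    static int fft_from_short_ffts(const double complex *x, double complex *X,
                                   size_t n1, size_t n2)
    {
        size_t N = n1 * n2;
        double complex *y = malloc(N * sizeof *y);   /* y[k1 * n2 + j2] */
        if (y == NULL)
            return -1;

        /* Step 1: n2 transforms of length n1 over x[n2 * j1 + j2]. */
        for (size_t j2 = 0; j2 < n2; j2++)
            small_dft(x + j2, n2, y + j2, n2, n1);

        /* Step 2: twiddle multiply by W_N^(j2 * k1) (done on the CPU). */
        for (size_t k1 = 0; k1 < n1; k1++)
            for (size_t j2 = 0; j2 < n2; j2++)
                y[k1 * n2 + j2] *= cexp(-2.0 * PI * I * (double)(j2 * k1) / (double)N);

        /* Step 3: n1 transforms of length n2; result lands at X[k1 + n1 * k2]. */
        for (size_t k1 = 0; k1 < n1; k1++)
            small_dft(y + k1 * n2, 1, X + k1, n1, n2);

        free(y);
        return 0;
    }
    ```

    Feeding it a complex tone at bin 100 of a 1024-point frame should put (essentially) all of the energy, magnitude 1024, in X[100]. Note that each 16-bit short transform would still be subject to the same scaling trade-off discussed above.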

    The HWAFFT implementation of the 32-pt FFT (edit: originally written as "32-bit") uses two double stages and one single stage of computation, i.e., 2 × 2 + 1 = 5 radix-2 stages, since 32 = 2^5.
    If the simple divide-by-two scaling after each stage is insufficient, you can add your own dynamic scaling code between stages. This would require scanning the data arrays to check whether scaling is needed, scaling only when it is, and keeping track of the total scaling applied. There may be strategic places where dynamic scaling makes sense. For finer granularity, the FFT can be rewritten to use more (or only) single stages instead of back-to-back double stages.
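
    A minimal sketch of such a between-stage check is block floating point: scan the block, halve it only if some value exceeds a threshold, and count the halvings in an exponent. The function name, the interleaved re/im layout, and the threshold are assumptions. The threshold has to come from the worst-case growth of the stage(s) that follow: one radix-2 butterfly with a complex twiddle can grow a real or imaginary component by up to 1 + √2, and a double stage by more.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Block-floating-point guard to run between FFT stages (sketch).
     * data:      n int16 values, interleaved as re, im, re, im, ...
     * threshold: largest magnitude that can safely enter the next stage(s)
     * exponent:  running count of halvings; true value = data * 2^(*exponent)
     * Returns 1 if the block was halved, 0 if it was left untouched. */
    static int bfp_scale_if_needed(int16_t *data, size_t n, int16_t threshold,
                                   int *exponent)
    {
        int needs_scaling = 0;

        /* The scan: this is the extra per-stage cost discussed in this thread. */
        for (size_t i = 0; i < n; i++) {
            int32_t v = data[i];
            if (v > threshold || v < -(int32_t)threshold) {
                needs_scaling = 1;
                break;
            }
        }
        if (!needs_scaling)
            return 0;

        /* Halve the whole block so all values share one exponent.
         * Assumes arithmetic right shift for negative values. */
        for (size_t i = 0; i < n; i++)
            data[i] = (int16_t)(data[i] >> 1);
        (*exponent)++;
        return 1;
    }
    ```

    After the last stage, the true spectrum is the 16-bit output multiplied by 2^exponent. With fixed divide-by-2 scaling the exponent is always log2(N); here it only grows when the data actually demands it, so smaller signals keep more of their resolution.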

    Scanning the arrays is computationally expensive, which is why divide-by-two scaling was chosen (it trades SNR for lower computation).
    One downside to modifying the HWAFFT code is that it will no longer execute from ROM. The modified code runs from RAM instead, but it isn't large.

    Alternatively, you could pre-scale (divide) the input signal by FFT_LENGTH and disable scaling to make sure it will not overflow. This will also impact SNR on a fixed-point processor like the C5000. Ultimately, the decision to scale depends on knowledge of the input signal.
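
    For a real-valued input such as an audio stream, that pre-scaling amounts to a right shift by log2(FFT_LENGTH). Each bin is a sum of N samples weighted by twiddle factors of magnitude at most 1, so dividing the input by N keeps the unscaled FFT essentially within full scale. The sketch below (function name hypothetical) also makes the SNR cost visible: for N = 1024 the shift is 10 bits, so a full-scale 16-bit sample keeps only its top 6 bits.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Divide Q15 samples by N = 2^log2n before an unscaled FFT.
     * The low log2n bits of every sample are discarded, which is where the
     * SNR loss comes from. Assumes an arithmetic right shift for negative
     * values (true on common compilers, but implementation-defined in C). */
    static void prescale_by_fft_length(int16_t *x, size_t count, unsigned log2n)
    {
        for (size_t i = 0; i < count; i++)
            x[i] = (int16_t)(x[i] >> log2n);
    }
    ```

    For example, with log2n = 10 a full-scale 32767 becomes 31 and a quiet sample of 1023 becomes 0, i.e. it disappears entirely.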

    If you like, I can put you in contact with a 3rd party that has modified the HWAFFT with dynamic scaling.

    Find the code here: http://www.ti.com/lit/an/sprabb6b/sprabb6b.zip
    I'm sure you've read the HWAFFT appnote (or TRM chapter) already: http://www.ti.com/lit/an/sprabb6b/sprabb6b.pdf

    Hope this helps,
    Mark

  • Hi Mark,

    Thanks for your response. You mention the 'HWAFFT implementation of the 32-bit FFT...'. However, as far as I can see, this hasn't been implemented; there is certainly no mention of it in the appnote.

    I think perhaps you are confusing it with the 32-point (32-bin) FFT?

    I am interested in a 32-bit FFT, i.e., I would like the FFT result to have 32 bits of resolution (or 64 bits if you include the imaginary part) rather than the usual 16 bits. DSPLIB implements this in the function cfft32_NOSCALE. This function works great but is very slow. Is it possible to speed it up with the HWAFFT?

    Your help is greatly appreciated!

    Best

    John

  • Hi John,

    Yep, I meant 32-pt (or 32-bin), not 32-bit. Sorry for the confusion.

    The HWAFFT cannot accept input data or twiddle factors wider than 16-bit real and 16-bit imaginary parts.

    I tried to find some trick to allow it, but was unable.

    32-bit data types are used for intermediate multiply outputs in the butterflies, but the data is rounded back to 16-bit after the adds.
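
    The sketch below models that behavior for one radix-2 butterfly: the twiddle products are formed at full 32-bit (Q30) precision, the adds use a wider accumulator, and each result is then rounded back to a 16-bit Q15 value. It is an illustration under assumptions (round-half-up, saturation, hypothetical names), not the HWAFFT's documented datapath, but it shows why the low-order bits of the products cannot survive into the output.

    ```c
    #include <stdint.h>

    /* Q15 complex value: 16-bit real and 16-bit imaginary parts. */
    typedef struct { int16_t re, im; } cq15;

    /* Round a Q30 accumulator to Q15 (round half up) and saturate to 16 bits.
     * Assumes an arithmetic right shift for negative values. */
    static int16_t q30_to_q15(int64_t acc)
    {
        acc = (acc + (1 << 14)) >> 15;
        if (acc > INT16_MAX) return INT16_MAX;
        if (acc < INT16_MIN) return INT16_MIN;
        return (int16_t)acc;
    }

    /* Radix-2 butterfly a' = a + w*b, b' = a - w*b on 16-bit data.
     * The products w*b are exact 32-bit (Q30) values and the adds use a
     * 64-bit accumulator, but every output is rounded back to 16 bits, so the
     * low 15 bits of each product never reach the next stage. */
    static void butterfly_twiddle_q15(cq15 *a, cq15 *b, cq15 w)
    {
        int32_t pr = (int32_t)b->re * w.re - (int32_t)b->im * w.im;  /* Q30 */
        int32_t pi = (int32_t)b->re * w.im + (int32_t)b->im * w.re;  /* Q30 */
        int64_t ar = (int64_t)a->re * 32768;                         /* Q15 -> Q30 */
        int64_t ai = (int64_t)a->im * 32768;

        cq15 top = { q30_to_q15(ar + pr), q30_to_q15(ai + pi) };
        cq15 bot = { q30_to_q15(ar - pr), q30_to_q15(ai - pi) };
        *a = top;
        *b = bot;
    }
    ```

    For example, multiplying a 1-LSB input by a twiddle of 0.25 (8192 in Q15) produces a nonzero 32-bit product, yet both butterfly outputs round to 0.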

    I think your best chance is to use the 16-bit HWAFFT with intelligent dynamic scaling: check whether the data has exceeded a threshold and scale only when necessary.

    Is your application low power? If not, consider a floating-point processor or one with a higher clock frequency, like the C6000. The C55xx is a native 16-bit machine, so it can be slow at 32-bit arithmetic.

    DSPLIB is already hand-optimized assembly, but it may be worth optimizing further through proper memory placement: make sure you are leveraging DARAM (dual-access RAM) where instructions can take advantage of it.

    Hope this helps,
    Mark
  • Thanks, Mark. Not what I hoped for, but helpful to have this confirmed!

    Best

    John