My customer has implemented a 4096-point real FFT and wanted to check its execution time. Here is the scenario:
"I have written FFT code with the input and output buffers and twiddle
factors in external memory. I have now moved it all into fast on-chip RAM,
combining RAML4 and RAML5 for the input buffer and RAML6 and RAML7 for the
output buffer, and moving what was previously in RAML5 to RAML2. I left the
twiddle factors and window functions in Flash. I got exactly the same result.
Now I want to compare the execution time of the two configurations.
I used GPIO32 as a timing marker and measured the time between pulses on
the oscilloscope.
A 4096-point real FFT using only on-chip memory took 3.65 ms without
any window function and 4.15 ms with the window multiply.
With the input/output buffers in external memory and the window function
applied, it took 15.75 ms.
"
Why does the same FFT code take so much longer with the buffers in external (fast) RAM than in on-chip RAM? The XINTF bus is slower, but the difference in timing (15.75 ms versus 4.15 ms, roughly 3.8 times longer) still seems quite large.
Any thoughts or feedback would be appreciated.
Regards,
Pradeep Shinde
DCAT, Dallas