Hi,
I'm using the DSP on the OMAP3530 for FFT calculations (C64x+ DSPLIB). Because of the large FFT size (32K points), the calculation cannot be performed in DSP internal memory, so the input, output, and twiddle factor buffers are located in external RAM allocated with CMEM. Timing analysis shows ~60 ms to compute this FFT on the DSP with the data in external memory. For comparison, I timed an FFT from FFmpeg that is optimized for ARM/NEON, and it computes the same size FFT in 6.9 ms (32-bit float for NEON, 16-bit int for the DSP). My initial guess is that the relatively poor DSP performance comes from having to access external memory, but the NEON is also working on data in external memory, so why is the difference in performance so large?
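For reference, here is a rough sketch of the buffer footprint behind the numbers above. It assumes interleaved complex samples (real, imag) and, for the 16-bit case, a twiddle table the same size as the data buffer. The helper names are made up for illustration and are not DSPLIB or FFmpeg APIs.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only; not part of DSPLIB or FFmpeg. */

/* Number of FFT points from the post (32K). */
#define FFT_POINTS ((size_t)32 * 1024)

/* Bytes needed for one buffer of n interleaved complex samples,
 * where each real or imaginary component is bytes_per_component wide. */
static size_t complex_buffer_bytes(size_t n, size_t bytes_per_component)
{
    return n * 2u * bytes_per_component;
}

/* Total bytes for input + output + twiddle buffers, ASSUMING the twiddle
 * table has the same size as one data buffer. */
static size_t fft_working_set_bytes(size_t n, size_t bytes_per_component)
{
    return 3u * complex_buffer_bytes(n, bytes_per_component);
}
```

Under these assumptions, `complex_buffer_bytes(FFT_POINTS, 2)` gives 128 KiB per buffer for the 16-bit DSP case and `fft_working_set_bytes(FFT_POINTS, 2)` gives 384 KiB in total, while the 32-bit float NEON case uses 256 KiB per data buffer. So the NEON is moving twice as much data per buffer and is still faster.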
Thanks,
Rick