I'm seeing some odd timing results comparing FFT performance between the ARM/Neon and DSP cores of the OMAP3530.
- FFT size: 32768 points
- ARM/NEON-optimized FFmpeg timing: 10.4 ms (32-bit floating point, complex)
- DSP timing: 60 ms (16-bit fixed point, complex)
With an FFT this large, the data cannot be placed in the DSP's internal memory. Is there a logical explanation for why the NEON would appear to outperform the DSP?
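For reference, here is a rough sketch of the buffer footprint in each case. It assumes a single in-place buffer with interleaved real and imaginary parts (my assumption, not something I have verified in either library), and it does not count twiddle tables or scratch space. The function and variable names are just for illustration.

```python
FFT_SIZE = 32768  # number of complex points, as in the measurements above


def complex_buffer_bytes(num_points: int, bytes_per_component: int) -> int:
    """Bytes for one buffer of complex samples (real + imaginary part each)."""
    return num_points * 2 * bytes_per_component


# Assumed layouts: 16-bit fixed point on the DSP, 32-bit float on the NEON.
dsp_bytes = complex_buffer_bytes(FFT_SIZE, 2)
neon_bytes = complex_buffer_bytes(FFT_SIZE, 4)

print(f"DSP buffer:  {dsp_bytes // 1024} KiB")   # 128 KiB
print(f"NEON buffer: {neon_bytes // 1024} KiB")  # 256 KiB
```

So even the 16-bit data set is 128 KiB before any working storage, which is why I am running the DSP FFT out of external memory.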
Thanks.