Dear all,
Using the BSP 2.2 from LogicPD, we've executed the c6run example named cfft, with the following results:
DM-37x# ./cfft_arm
N=16,nTimes=100: 0.001007 s
N=32,nTimes=100: 0.002045 s
N=64,nTimes=100: 0.004791 s
N=128,nTimes=100: 0.010772 s
N=256,nTimes=100: 0.024628 s
N=512,nTimes=100: 0.055725 s
N=1024,nTimes=100: 0.124604 s
N=2048,nTimes=100: 0.272583 s
N=4096,nTimes=100: 0.596222 s
N=8192,nTimes=100: 1.29132 s
N=16384,nTimes=100: 2.74329 s
DM-37x# ./cfft_dsp
N=16,nTimes=100: 0.126648 s
N=32,nTimes=100: 0.14206 s
N=64,nTimes=100: 0.177978 s
N=128,nTimes=100: 0.260376 s
N=256,nTimes=100: 0.451263 s
N=512,nTimes=100: 0.872955 s
N=1024,nTimes=100: 1.81073 s
N=2048,nTimes=100: 3.87048 s
N=4096,nTimes=100: 8.3595 s
N=8192,nTimes=100: 18.1759 s
N=16384,nTimes=100: 39.5378 s
We are surprised to see that the ARM (with no NEON acceleration) is faster than the DSP. We know that the two processors run at different clock speeds, but we had believed that the DSP's numerical performance was higher than the ARM's. Have you been able to reproduce this behavior? Could it be caused by a problem in the software configuration?
c6run version: 0.98.03.03
dsplink: 1.65.01.05
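To put a number on the gap, here is a small sketch that computes the DSP/ARM time ratio for each N from the figures pasted above. It is plain Python run on a host PC, not part of the c6run example, and the names `arm`, `dsp`, and `slowdown` are made up for illustration.

```python
# Timings (seconds for nTimes=100) copied from the cfft_arm and cfft_dsp runs above.
arm = {
    16: 0.001007, 32: 0.002045, 64: 0.004791, 128: 0.010772,
    256: 0.024628, 512: 0.055725, 1024: 0.124604, 2048: 0.272583,
    4096: 0.596222, 8192: 1.29132, 16384: 2.74329,
}
dsp = {
    16: 0.126648, 32: 0.14206, 64: 0.177978, 128: 0.260376,
    256: 0.451263, 512: 0.872955, 1024: 1.81073, 2048: 3.87048,
    4096: 8.3595, 8192: 18.1759, 16384: 39.5378,
}


def slowdown(arm_times, dsp_times):
    """Return {N: dsp_time / arm_time} for every N present in both runs."""
    return {n: dsp_times[n] / arm_times[n] for n in sorted(arm_times) if n in dsp_times}


ratios = slowdown(arm, dsp)
for n, r in ratios.items():
    print(f"N={n}: DSP/ARM = {r:.1f}x")
```

With these figures the DSP build is roughly 126 times slower than the ARM build at N=16 and still roughly 14 times slower at N=16384, so the gap narrows with N but never closes.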
Thanks and Best Regards,
Joaquim Duran