Hi!
I recently started using the C6Run utility, and I got very strange results
when I built the included examples. It looks like the ARM core is
much faster than the DSP core, around 10 times faster (for the Fourier
transform). How can this happen? I want to understand what I am doing
wrong. Where is the bottleneck? Should I modify some configuration
files, or pass some options during compilation? Maybe increase the memory size
for the DSP?
I use the latest C6Run, as well as the other recommended dependent tools. Processor: OMAP3530. Linux distribution: Angstrom, on a Tsunami board from Technexion.
Below are the results of running the example (this is c6runlib; for c6runapp
the results are essentially the same).
root@taodemo:~/TAO_INSTALL/examples/c6runlib/emqbit# ./cfft_arm
N=16,nTimes=100: 0.001342 s
N=32,nTimes=100: 0.002167 s
N=64,nTimes=100: 0.005249 s
N=128,nTimes=100: 0.012237 s
N=256,nTimes=100: 0.027558 s
N=512,nTimes=100: 0.062409 s
N=1024,nTimes=100: 0.138458 s
N=2048,nTimes=100: 0.307709 s
N=4096,nTimes=100: 0.675507 s
N=8192,nTimes=100: 1.4874 s
N=16384,nTimes=100: 3.2832 s
root@taodemo:~/TAO_INSTALL/examples/c6runlib/emqbit# ./cfft_dsp
N=16,nTimes=100: 0.084748 s
N=32,nTimes=100: 0.096069 s
N=64,nTimes=100: 0.120972 s
N=128,nTimes=100: 0.180298 s
N=256,nTimes=100: 0.317017 s
N=512,nTimes=100: 0.622894 s
N=1024,nTimes=100: 1.30252 s
N=2048,nTimes=100: 2.79202 s
N=4096,nTimes=100: 6.03702 s
N=8192,nTimes=100: 13.1281 s
N=16384,nTimes=100: 28.6032 s
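
To make the comparison concrete, the DSP/ARM slowdown for each transform size can be computed directly from the timings above. This is only a small sketch in Python. The names `arm`, `dsp`, and `slowdown` are illustrative and not part of C6Run or the examples. The numbers are copied from the two runs as posted.

```python
# Timings in seconds for nTimes=100, copied from the cfft_arm and cfft_dsp runs above.
arm = {
    16: 0.001342, 32: 0.002167, 64: 0.005249, 128: 0.012237,
    256: 0.027558, 512: 0.062409, 1024: 0.138458, 2048: 0.307709,
    4096: 0.675507, 8192: 1.4874, 16384: 3.2832,
}
dsp = {
    16: 0.084748, 32: 0.096069, 64: 0.120972, 128: 0.180298,
    256: 0.317017, 512: 0.622894, 1024: 1.30252, 2048: 2.79202,
    4096: 6.03702, 8192: 13.1281, 16384: 28.6032,
}


def slowdown(arm_times, dsp_times):
    """Return {N: dsp_time / arm_time} for every N present in both runs."""
    return {n: dsp_times[n] / arm_times[n] for n in sorted(arm_times) if n in dsp_times}


ratios = slowdown(arm, dsp)

if __name__ == "__main__":
    for n, r in ratios.items():
        print(f"N={n}: DSP is {r:.1f}x slower than ARM")
```

With these numbers the ratio is not constant: it is roughly 63x at N=16 and falls to roughly 8.7x at N=16384, so the "around 10 times" figure applies to the larger transforms.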