I am working on optimizing FFT code using ARM NEON. My target is a BeagleBoard-xM, and I am running my program bare-metal, directly on the board without any operating system. The board is supposed to run at 1 GHz, but I am nowhere near that frequency. At the moment I am struggling with a basic understanding of NEON. Can anyone please help me with this?
The following are the sample programs I ran.
Loop code:
Unrolled loop code:
The following are the results I measured at different frequencies:

The above does not make any sense to me. Why do I get a different number of cycles per instruction at different frequencies?
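To make clear what I mean by cycles per instruction, here is a minimal sketch of how that figure can be derived from a measured run time. It is not taken from my actual program: the function name `cycles_per_instruction` and its parameters are hypothetical, and it assumes the core clock frequency and the number of executed instructions are known exactly.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the original program).
 * Derives cycles per instruction from a measured run:
 *   cycles = elapsed_seconds * clock_hz
 *   CPI    = cycles / instruction_count
 * Assumes clock_hz is the real core frequency during the run.
 */
double cycles_per_instruction(double elapsed_seconds,
                              double clock_hz,
                              unsigned long long instruction_count)
{
    /* Preconditions: a run must execute at least one instruction
       and take a non-negative amount of time at a positive clock. */
    assert(instruction_count > 0);
    assert(elapsed_seconds >= 0.0);
    assert(clock_hz > 0.0);

    double cycles = elapsed_seconds * clock_hz;
    return cycles / (double)instruction_count;
}
```

For example, a run of 1,000,000 instructions that takes 2 ms at 500 MHz works out to 1,000,000 cycles, so `cycles_per_instruction(0.002, 500e6, 1000000ULL)` gives roughly 1.0. If the instructions themselves cost the same at every clock, I would expect this value to stay about constant across frequencies.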