StarterWare benchmarking

I ran the trivial NEON/VFP benchmark (as seen here) on a BeagleBone, both as a Linux user process and as a simple StarterWare program.  The StarterWare program uses MMU startup code similar to the other StarterWare examples, and I added some code to initialize the FPU.  Both builds use gcc (2011.09-69/arm-none-gnueabi for StarterWare, 2010q1-202/arm-none-linux-gnueabi for Linux) with similar CFLAGS:  -O3 -mcpu=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp

The StarterWare build takes about 2.1x as long to run the benchmark.  (The Linux time, ~0.79s, is consistent with the table and the 720MHz MPU rate.)  Replacing the float computations with ints gives a similar result (2.5x slower on StarterWare).  Coding the multiply as a vmulq_f32 intrinsic helped a little, but it helped the Linux version by about the same amount, so the gap between the two stayed the same.
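For reference, here is a minimal sketch of the kind of vmulq_f32 multiply I mean. It is not the exact benchmark loop: the function name mul_f32 and the array layout are made up for illustration, and the scalar fallback is only there so the same file also builds on a host without NEON.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper (not from the benchmark): dst[i] = a[i] * b[i]. */
void mul_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && a != NULL && b != NULL);

#if HAVE_NEON
    /* Multiply four floats per iteration using 128-bit Q registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vmulq_f32(va, vb));
    }
#endif

    /* Scalar loop: handles the tail, or everything when NEON is absent. */
    for (; i < n; ++i)
        dst[i] = a[i] * b[i];
}
```

Built with the CFLAGS above, the NEON path is the one that gets compiled on both targets, so the generated multiply code should be essentially the same for the StarterWare and Linux builds.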

I'm hoping a StarterWare expert can chime in and help me here.  I'm sure I've overlooked something, as I'm quite new to all this.

Thanks all,

G