Starterware benchmarking

gombold

I ran the trivial NEON/VFP benchmark (as seen here) on a BeagleBone, both as a Linux user process and as a simple StarterWare program. The StarterWare program uses MMU startup code similar to that of the other StarterWare examples, plus some code I added to initialize the FPU. Both were built with gcc (versions 2011.09-69/arm-none-gnueabi and 2010q1-202/arm-none-linux-gnueabi) using similar CFLAGS: -O3 -mcpu=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp

The StarterWare build takes about 2.1x as long to run the benchmark. (The Linux time, ~0.79s, is consistent with the table and a 720MHz MPU clock.) Replacing the float computations with ints yields a similar result (2.5x slower on StarterWare). Coding the multiply as a vmulq_f32 intrinsic helped a little, but it helped the Linux version by about the same amount.
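For reference, here is a minimal sketch of the kind of loop I mean when I say the multiply was coded as a vmulq_f32 intrinsic. It is not the actual benchmark source: the names mul_f32 and run_benchmark, the array size N, and the input values are made up for illustration, and the scalar fallback is only there so the same file also builds on a machine without NEON.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

#define N 1024  /* illustrative size, assumed to be a multiple of 4 */

static float a[N], b[N], dst[N];

/* Element-wise multiply: dst[i] = a[i] * b[i]. n must be a multiple of 4. */
static void mul_f32(float *out, const float *x, const float *y, size_t n)
{
    size_t i;
#if HAVE_NEON
    /* NEON path: four floats per iteration via vmulq_f32. */
    for (i = 0; i < n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);
        float32x4_t vy = vld1q_f32(y + i);
        vst1q_f32(out + i, vmulq_f32(vx, vy));
    }
#else
    /* Scalar fallback for targets without NEON. */
    for (i = 0; i < n; i++)
        out[i] = x[i] * y[i];
#endif
}

/* Fill the inputs, run the multiply `reps` times, and return a checksum
 * so the result can be sanity-checked. Timing is left to the caller,
 * since the timer available differs between Linux and bare metal. */
static float run_benchmark(unsigned reps)
{
    size_t i;
    unsigned r;
    float sum = 0.0f;

    for (i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f;
    }
    for (r = 0; r < reps; r++)
        mul_f32(dst, a, b, N);
    for (i = 0; i < N; i++)
        sum += dst[i];
    return sum;
}
```

The same source can be compiled for both environments with the CFLAGS above. Only the timing wrapper around run_benchmark would need to differ between the Linux and StarterWare builds.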

I'm hoping a StarterWare expert can chime in and help me here. I'm sure I've overlooked something, as I'm quite new to all this.

Thanks all,

over 13 years ago
