Hi All,
I've been trying to sort this out for a couple of days now and I'm beaten. I've got a simple suite of benchmarks called nbench ( http://www.tux.org/~mayer/linux/bmark.html ) that I'm running on the OMAP3EVM to compare it with some other potential processors for a new product. However, I cannot seem to get GCC to make use of the Neon coprocessor! I'm using GCC 4.3.2 (CodeSourcery 2008q3).
These are my current compiler flags:
CFLAGS = -s -save-temps -static -Wall -O3 -march=armv7-a -mtune=cortex-a8 -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -ftree-vectorize -fomit-frame-pointer -ffast-math
The annoying thing is that in one function in one file, a float * float multiply does come out as a vmul.f32 in the generated assembly. However, for other float * float multiplies in the same file it has just used fmul!
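For comparison, here is a minimal sketch of the two kinds of multiply involved. A lone float * float outside a loop is a single scalar operation, so there is nothing for -ftree-vectorize to pack into a vector and it stays a scalar floating-point multiply. A counted loop over float arrays is the shape the auto-vectorizer looks for. GCC's ARM documentation also notes that floating-point loops are only vectorized onto NEON when -funsafe-math-optimizations is in effect (which -ffast-math turns on), because NEON is not fully IEEE 754 compliant. The function names and the `restrict` qualifiers are assumptions for illustration, not code taken from nbench.

```c
#include <stddef.h>

/* A single scalar multiply: one value in, one value out. There are no
 * neighbouring operations to combine into a NEON vector, so this is
 * compiled as one scalar floating-point multiply. */
float scale_one(float x, float k)
{
    return x * k;
}

/* A simple counted loop over float arrays: the pattern the GCC
 * auto-vectorizer (-ftree-vectorize) targets. 'restrict' promises the
 * arrays do not overlap, removing a common reason the vectorizer gives
 * up. 'restrict' and the declaration in the for-init need C99, so
 * GCC 4.3 would need -std=c99 or -std=gnu99. Whether vmul.f32 is
 * actually emitted depends on compiler version and flags; check the .s
 * file kept by -save-temps. */
void scale_array(float *restrict out, const float *restrict in,
                 float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```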
I'm sure this chip has more floating-point power than it's demonstrating at the moment, but if I can't demonstrate it then we can't really take it seriously. Can anyone offer any hints as to where I'm going wrong? Does GCC only use Neon for certain kinds of multiply, or for certain types of variables?
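On the question of variable types: on ARMv7 the NEON unit handles integers and single-precision floats only, so a loop over double cannot be put onto NEON and stays on the VFP unit, while the same loop over float is a vectorization candidate. A sketch with hypothetical function names; on GCC 4.x, adding -ftree-vectorizer-verbose=2 should report which loops were vectorized and why the others were not.

```c
#include <stddef.h>

/* NEON candidate: 32-bit float lanes, four per 128-bit Q register. */
void mul_f32(float *restrict out, const float *restrict a,
             const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* Not a NEON candidate on ARMv7: NEON has no double-precision
 * arithmetic, so this loop can only use scalar VFP multiplies. */
void mul_f64(double *restrict out, const double *restrict a,
             const double *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}
```

If the hot loops in the benchmark work on double, that alone would explain seeing scalar multiplies rather than vmul.f32.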
Thanks in advance for any assistance,
--
Olly