Hello,
I just reviewed the NEON/VFP benchmark ( http://processors.wiki.ti.com/index.php/StarterWare_NeonVFP_Benchmark ) and the results look very strange: the NEON and VFP execution times are identical.
Could you please comment on how these results should be interpreted?
For example, the following routine:
static void _VectorFloatMultiply(float *vectorA, float *vectorB, float *result)
{
    unsigned int index = 0u;

    for(index = 0; index < VECTOR_SIZE; index++)
    {
        result[index] = vectorA[index] * vectorB[index];
    }
}
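is called 100000 times with VECTOR_SIZE = 200, so that is 100000 x 200 = 20 million multiplications in total.
The benchmark result for this is about 1 second, i.e. roughly 20 million multiplies per second (20 MMACS).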
Something looks wrong here, doesn't it?