StarterWare NeonVFP Benchmark

Hello,

I just reviewed the NEON/VFP benchmark ( http://processors.wiki.ti.com/index.php/StarterWare_NeonVFP_Benchmark ) and the results look very strange: the NEON and VFP execution times are identical.

Could you please comment on how to interpret these results?

For example, the following routine:

static void _VectorFloatMultiply(float *vectorA, float *vectorB, float *result)
{
    unsigned int index = 0u;

    for(index = 0; index < VECTOR_SIZE; index++)
    {
        result[index] = vectorA[index] * vectorB[index];
    }

}

is called 100000 times with VECTOR_SIZE = 200,

so that is 100000 x 200 = 20 million multiplications (20 MMACS), and the benchmark result is about 1 second.

Something looks wrong here, doesn't it?
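The arithmetic behind that estimate can be sketched in a few lines of C. This is only an illustration: the call count, vector size, and 1-second run time come from the post, while the helper names and the 600 MHz clock used in the note below are hypothetical and not taken from the benchmark.

```c
#include <assert.h>

#define VECTOR_SIZE 200u     /* from the post */
#define CALL_COUNT  100000u  /* from the post */

/* Total element-wise multiplies performed across all calls. */
static unsigned long total_multiplies(unsigned long calls,
                                      unsigned long vector_size)
{
    return calls * vector_size;
}

/* Throughput in millions of multiplies per second for a measured run time. */
static double mmults_per_second(unsigned long multiplies, double seconds)
{
    assert(seconds > 0.0);
    return (double)multiplies / seconds / 1e6;
}

/* Average CPU cycles spent per multiply.
 * cpu_hz is an ASSUMED core clock, not a value from the benchmark. */
static double cycles_per_multiply(double cpu_hz, unsigned long multiplies,
                                  double seconds)
{
    assert(multiplies > 0u);
    return cpu_hz * seconds / (double)multiplies;
}
```

With the figures from the post, `total_multiplies(CALL_COUNT, VECTOR_SIZE)` is 20000000 and a 1-second run gives 20 million multiplies per second. If the core ran at a hypothetical 600 MHz, that would be 30 cycles per multiply on average, which is why the result looks slow for vectorized code.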


  • Hi Sergey,

    The basic idea behind the neonVFPbenchmark example is to let users add their own functions
    and obtain performance timing metrics for them. The procedure for obtaining these numbers is described at the link below:
    http://processors.wiki.ti.com/index.php/StarterWare_NeonVFP

    The performance numbers for the GCC compiler are calculated for floating-point addition, subtraction,
    and math library functions written using NEON intrinsics. For floating-point addition and subtraction,
    the compiler relies on its auto-vectorization feature to generate the NEON instructions; this might not always give the
    expected performance, and it varies with the compiler used.
    When NEON intrinsic functions are used, the performance numbers table shows a significant improvement.
    For better performance, it is recommended to use NEON assembly or intrinsics.
    It also depends on the input data used: the input data for floating-point addition and multiplication in our application might not be vectorized correctly by the compiler.
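As an illustration of the intrinsics approach, here is a minimal sketch of the multiply routine from the question written with NEON intrinsics. It is not the StarterWare implementation: the function name and the explicit length parameter are hypothetical, and the scalar fallback is included only so the same file also builds on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Element-wise multiply: result[i] = a[i] * b[i] for i in [0, n).
 * Hypothetical helper, not part of StarterWare. */
static void vector_float_multiply(const float *a, const float *b,
                                  float *result, size_t n)
{
    size_t i = 0;

    assert(a != NULL && b != NULL && result != NULL);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four floats per iteration in 128-bit Q registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(result + i, vmulq_f32(va, vb));
    }
#endif

    /* Scalar loop: handles the tail on NEON builds, and the whole
     * array when NEON is not available. */
    for (; i < n; i++) {
        result[i] = a[i] * b[i];
    }
}
```

Because the NEON instructions are requested explicitly, this version does not depend on whether the compiler's auto-vectorizer decides to transform the loop. Whether it actually runs faster on a given board still has to be measured.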

    Additionally, you can go through the following references and make the necessary changes to get better NEON performance:
    Cortex-A8 Technical Reference Manual
    http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344k/index.html
        4.2.1. NEON data alignment section, on the ARM Information Center (infocenter.arm.com)
    NEON™ Support in Compilation Tools Development Article
        1.4.3. Optimizing for vectorization, on the ARM Information Center (infocenter.arm.com)
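The kind of change those references point toward can be sketched as follows, assuming a C11 compiler: the pointers are marked `restrict` so the compiler knows the arrays do not overlap, and the buffers are aligned to 16 bytes. The function and buffer names are hypothetical. Separately, the GCC documentation for `-mfpu` notes that on 32-bit ARM the auto-vectorizer does not generate NEON floating-point operations unless `-funsafe-math-optimizations` is also given; whether the StarterWare build sets that flag is not stated in this thread, so treat it as something to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VECTOR_SIZE 200u  /* from the question; a multiple of 4 */

/* 16-byte aligned buffers (C11 _Alignas); names are hypothetical. */
static _Alignas(16) float buf_a[VECTOR_SIZE];
static _Alignas(16) float buf_b[VECTOR_SIZE];
static _Alignas(16) float buf_out[VECTOR_SIZE];

/* restrict promises the compiler that the three arrays never overlap,
 * which removes one common obstacle to auto-vectorization. */
static void vector_float_multiply_restrict(const float *restrict a,
                                           const float *restrict b,
                                           float *restrict result)
{
    assert(a != NULL && b != NULL && result != NULL);

    /* Fixed trip count known at compile time. */
    for (size_t i = 0; i < VECTOR_SIZE; i++) {
        result[i] = a[i] * b[i];
    }
}
```

The loop body is unchanged from the question; only the information available to the compiler differs. Inspecting the generated assembly for NEON multiply instructions is the reliable way to confirm whether the loop was vectorized.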
       
    Regards
    Anant Pai