AM4379: Sitara processors FLOPS performance

Part Number: AM4379
Other Parts Discussed in Thread: AM3358, AM5708

Hello,

I am trying to compare the floating-point performance (FLOPS) of the AM3358, AM4379 and AM5708, but the figures are not clear to me.

Could someone please share the FLOP per clock cycle of these CPUs?

What is the performance difference between VFP and NEON?

Thanks.

Best regards,

Julhio.

  • Hi Julhio,
    Benchmarks for our devices running Processor SDK Linux can be found here: processors.wiki.ti.com/.../Processor_SDK_Linux_Kernel_Performance_Guide

    The Linpack benchmark measures the floating-point performance you are looking for.
  • Thanks for your answer.

    I checked the benchmark, but it did not answer my questions.

    Does anyone know who could help me with this?

    Thanks
  • What are your doubts exactly? Is it VFP versus NEON? That particular point shouldn't matter, as the gcc compiler automatically optimizes for you. Plus, they are different co-processors for different tasks: NEON is a SIMD unit for parallel calculations, while VFP is just a single floating-point unit.

    A few additional details: from the benchmark wiki I provided, you can assume the AM335x EVM runs at 800 MHz, the AM437x EVM at 1000 MHz, and the AM572x is dual-core at 1500 MHz. You can divide the benchmark results by those clock rates to find FLOP performance per clock cycle. I forgot to mention, the Whetstone benchmark also measures floating-point performance, but it is single-threaded, so don't divide by two on that one.

  • Thanks for the explanation.

    My doubt is this: for example, in the benchmark below, there is a result in MFLOPS for the NEON benchmark:

    With this information I can simply compare, for example, the Raspberry Pi with a TI DSP (not considering the different applications and hardware of each), whose floating-point performance is given in FLOPS.

    As for floating-point performance, it isn't clear to me how to convert a Whetstone benchmark result into FLOPS. Can you help me with this?

    Thank you !

  • We don't have any benchmarks that test solely the NEON unit. You should just look at the Linpack benchmark for a quick comparison. From the link you provided, I see NBench and Linpack tests run on the RPi, and the wiki I provided has benchmark results for those.

    Linpack test from your link
    Pi 2 - 299.93 MFLOPS (quad A7 @ 900 MHz)
    Pi 3 - 462.07 MFLOPS (quad A53 @ 1200 MHz)

    Linpack from the Wiki I provided
    AM335x - 57.22 MFLOPS (single A8 @ 800 MHz)
    AM437x - 137.33 MFLOPS (single A9 @ 1000 MHz)
    AM57xx - 686.67 MFLOPS (dual A15 @ 1500 MHz)

    It does not look like we have comparable benchmarks for the C66x DSP. ARM Cortex cores are generally the same in any SoC; memory bandwidth is where you will find differences, because it depends on the interconnect built around the core. If you can find NEON benchmarks for Cortex cores in any other SoC, then you can expect the same performance in ours.
  • This comparison test using mostly Android includes a direct comparison between A9 and A15 with and without NEON in some floating point tests. Looks like A9 runs about twice as fast per CPU and A15 runs about four times as fast per CPU when using NEON. Look at the second table.

  • Sorry, I had not seen the Linpack test.
    I understand now.
    Thank you!
  • There is a considerable difference, and it leaves the processor free for other tasks.
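
As a rough sketch of the division suggested earlier in the thread, the Linpack figures quoted for the TI parts can be turned into FLOP per clock cycle per core. The function name `flop_per_cycle` and the table layout are made up for illustration. The code also assumes that the AM57xx Linpack result covers both A15 cores, which is what the "divide by two" remark implies but which the thread does not state outright. For a single-threaded result such as Whetstone, pass `cores=1`.

```python
# Linpack results quoted in this thread: (MFLOPS, cores, clock in MHz).
# Assumption: the AM57xx figure is the combined result of both A15 cores.
LINPACK = {
    "AM335x (A8)": (57.22, 1, 800),
    "AM437x (A9)": (137.33, 1, 1000),
    "AM57xx (A15)": (686.67, 2, 1500),
}


def flop_per_cycle(mflops, cores, clock_mhz):
    """Return floating-point operations per clock cycle per core.

    MFLOPS / MHz cancels the factor of 1e6, leaving FLOP per cycle;
    dividing by the core count gives the per-core figure.
    """
    if cores < 1 or clock_mhz <= 0:
        raise ValueError("cores must be >= 1 and clock_mhz must be > 0")
    return mflops / (cores * clock_mhz)


if __name__ == "__main__":
    for name, (mflops, cores, clock_mhz) in LINPACK.items():
        value = flop_per_cycle(mflops, cores, clock_mhz)
        print(f"{name}: {value:.3f} FLOP/cycle/core")
```

These are measured Linpack throughputs, not theoretical peaks, so they depend on the compiler flags and memory system of the EVM as well as on the core itself.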