AM4379: Sitara processors FLOPS performance

Julhio

Intellectual 310 points

Part Number: AM4379
Other Parts Discussed in Thread: AM3358, AM5708

Hello,

I am trying to compare the floating-point performance (FLOPS) of the AM3358, AM4379 and AM5708, but it is not very clear to me.

Could someone please share the FLOP per clock cycle of these CPUs?

What is the performance difference between using VFP and NEON?

Thanks.

Best regards,

Julhio.

over 8 years ago

0 Biser Gatchev-XID over 8 years ago

TI__Guru 393215 points

The factory team have been notified. They will respond here.

0 Ahmad_Rashed over 8 years ago

TI__Genius 13567 points

Hi Julhio,
Benchmarks for our devices running Processor SDK Linux can be found here: processors.wiki.ti.com/.../Processor_SDK_Linux_Kernel_Performance_Guide

The Linpack benchmark measures the floating-point performance you are looking for.

0 Julhio over 8 years ago in reply to Ahmad_Rashed

Intellectual 310 points

Thanks for your answer.

I looked at the benchmark, but it didn't answer my questions.

Does anyone know who could help me with this?

Thanks

0 Ahmad_Rashed over 8 years ago

TI__Genius 13567 points

What are your questions exactly? Is it VFP versus NEON? That particular point shouldn't matter, as the gcc compiler automatically optimizes for you. Plus, they are different co-processors for different tasks: NEON is a SIMD unit for parallel calculations, while VFP is a scalar floating-point unit.

A few additional details: from the benchmark wiki I provided, you can assume the AM335x EVM runs at 800 MHz, the AM437x EVM at 1000 MHz, and the AM572x is dual-core at 1500 MHz. You can divide the benchmark results by those clock rates to find FLOP per clock cycle. I forgot to mention that the Whetstone benchmark also measures floating-point performance, but it is single-threaded, so don't divide by two on that one.

0 Julhio over 8 years ago in reply to Ahmad_Rashed

Intellectual 310 points

Thanks for the explanation.

My question is this: for example, the benchmark below reports a result in MFLOPS for the NEON test:

With this information I can directly compare, for example, the Raspberry Pi with a TI DSP whose floating-point performance is specified in FLOPS (setting aside the differences in application and hardware).

As for floating-point performance, it isn't clear to me how to convert a Whetstone result into FLOPS. Can you help me with this?

Thank you !

0 Ahmad_Rashed over 8 years ago in reply to Julhio

TI__Genius 13567 points

We don't have any benchmarks that test solely the NEON unit. You should just look at the Linpack benchmark for a quick comparison. From the link you provided, I see NBench and Linpack tests run on the RPi, and the wiki I provided has benchmark results for both.

Linpack test from your link
Pi 2 - 299.93 MFLOPS (quad A7 @ 900 MHz)
Pi 3 - 462.07 MFLOPS (quad A53 @ 1200 MHz)

Linpack from the Wiki I provided
AM335x - 57.22 MFLOPS (single A8 @ 800 MHz)
AM437x - 137.33 MFLOPS (single A9 @ 1000 MHz)
AM57xx - 686.67 MFLOPS (dual A15 @ 1500 MHz)
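
As a rough sketch of the "divide by the clock" arithmetic suggested earlier in the thread, the Linpack figures above can be turned into FLOP per cycle per core. The numbers are the ones quoted in this thread; the helper name `flop_per_cycle` is made up for illustration, and dividing the AM57xx result by two assumes the Linpack run actually used both cores, which the thread implies but does not confirm.

```python
# Linpack results and clock rates as quoted in this thread:
# (MFLOPS, clock in MHz, number of cores assumed to be in use)
RESULTS = {
    "AM335x": (57.22, 800, 1),
    "AM437x": (137.33, 1000, 1),
    "AM57xx": (686.67, 1500, 2),  # assumption: both A15 cores contributed
}


def flop_per_cycle(mflops, clock_mhz, cores=1):
    """MFLOPS divided by MHz is FLOP per cycle; divide by cores for a per-core value."""
    return mflops / (clock_mhz * cores)


for name, (mflops, mhz, cores) in RESULTS.items():
    print(f"{name}: {flop_per_cycle(mflops, mhz, cores):.3f} FLOP/cycle/core")
```

With these inputs the values come out at roughly 0.07, 0.14 and 0.23 FLOP per cycle per core. These are measured benchmark ratios, not theoretical peak rates, so they depend on compiler flags and memory behaviour as much as on the VFP or NEON hardware.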

It does not look like we have comparable benchmarks for the C66x DSP. ARM Cortex cores are generally the same in any SoC and memory bandwidth is where you will find differences due to what interconnect is built around the core. If you can find NEON benchmarks for Cortex cores in any other SoC then you can expect the same performance in ours.

0 netrover over 8 years ago in reply to Ahmad_Rashed

Intellectual 850 points

This comparison test, run mostly on Android, includes a direct comparison between the A9 and A15 with and without NEON in some floating-point tests. It looks like the A9 runs about twice as fast per CPU and the A15 about four times as fast per CPU when using NEON. See the second table.