AM3358 FPU speed

qxc

Assuming I'm doing only basic calculations (addition, subtraction, division, multiplication) and only with 32-bit integers or floats: what loss in speed should I expect for floating-point operations using the FPU of the Sitara processor, compared to the same calculations done in plain fixed point, where no FPU is required?

over 10 years ago

0 Biser Gatchev-XID over 10 years ago

TI__Guru 393215 points

The AM335X Linux SDK supports hard floating point. FP calculations are performed by the VFP coprocessor included in the Cortex-A8 core.

0 qxc over 10 years ago in reply to Biser Gatchev-XID

Genius 5820 points

OK, that's clear so far. The main question is about speed: are 32-bit floating-point calculations that use the VFP faster or slower than similar fixed-point operations (basic arithmetic operations only)? And if the VFP is slower: by what factor?

0 Jeff L over 10 years ago in reply to qxc

TI__Expert 5960 points

Hi Hans,

The VFP in any ARM Cortex-A8 is not fully pipelined, so basic floating-point operations (add, mult, sub) will take 9 - 12 cycles, whereas the equivalent fixed-point operations will complete in 1 - 2 cycles. These numbers assume the pipeline is full. Keep in mind that NEON is fully pipelined, so if you can utilize it you can get 32-bit floating-point SIMD operations (mult, add, sub) out in 1 - 2 cycles. But NEON does not support double-precision float.
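To make the comparison concrete, here is a minimal sketch of the same dot product written once with 32-bit floats and once in fixed point. The Q16.16 format and all function names are assumptions chosen for illustration; nothing here comes from the TI SDK, and actual cycle counts would have to be measured on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Q16.16 fixed point: an assumed format, not one named in this thread. */
typedef int32_t q16_16;
#define Q16_ONE 65536.0f

static q16_16 q16_from_float(float x) { return (q16_16)(x * Q16_ONE); }
static float  q16_to_float(q16_16 x)  { return (float)x / Q16_ONE; }

/* Multiply: widen to 64 bits so the product cannot overflow, then rescale.
   Right-shifting a negative value is implementation-defined in C; most
   compilers use an arithmetic shift. */
static q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
}

/* Dot product with 32-bit floats (handled by VFP or NEON on the target). */
static float dot_float(const float *a, const float *b, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

/* The same dot product in fixed point (integer instructions only). */
static q16_16 dot_q16(const q16_16 *a, const q16_16 *b, int n)
{
    q16_16 acc = 0;
    for (int i = 0; i < n; ++i)
        acc += q16_mul(a[i], b[i]);
    return acc;
}

/* Hypothetical usage: both versions agree on exactly representable inputs. */
void dot_example(void)
{
    const float fa[3] = {1.0f, 2.0f, 3.0f};
    const float fb[3] = {4.0f, 5.0f, 6.0f};
    q16_16 qa[3], qb[3];
    for (int i = 0; i < 3; ++i) {
        qa[i] = q16_from_float(fa[i]);
        qb[i] = q16_from_float(fb[i]);
    }
    assert(dot_float(fa, fb, 3) == 32.0f);
    assert(q16_to_float(dot_q16(qa, qb, 3)) == 32.0f);
}
```

Timing both loops over a large array on the AM335x itself is the only reliable way to get the actual factor for a given compiler and option set. Note also that the fixed-point version has a limited range (about +/-32768 in Q16.16) and can overflow in the accumulator.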

0 qxc over 10 years ago in reply to Jeff L

Genius 5820 points

Jeff,

I'm using only 32 bit floats, so that is not a problem.

For TI's ARM compiler, SIMD/NEON is used when the option "--neon" is set and "--float-support=VFPv3" is used, correct?

Cheers

Hans

0 Jeff L over 10 years ago in reply to qxc

TI__Expert 5960 points

Hans,

For TI's compiler you can enable NEON: (This is somewhat old data and I haven't checked it in a while)

"-o3 -mv7a8 --neon -mf "

see this link for more info: http://processors.wiki.ti.com/index.php/Cortex-A8

However, since NEON is SIMD, the compiler will not automatically utilize NEON for every add, mult, or sub in your code. You will find that the compiler will auto-vectorize some simple for loops, but in general you need to either write your own NEON assembly or use an existing library such as NE10: http://projectne10.github.io/Ne10/
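As an illustration of what a "simple for loop" means here, the sketch below shows the general shape that vectorizing compilers tend to handle: independent iterations, unit-stride access, and no aliasing between input and output. The function name is made up for this example, and whether a particular TI compiler version actually emits NEON instructions for it is an assumption that should be checked in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = a[i] * b[i] + c for 32-bit floats.
   `restrict` promises the arrays do not overlap, which removes one
   common obstacle to auto-vectorization. */
void vec_mul_add(float *restrict out,
                 const float *restrict a,
                 const float *restrict b,
                 float c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i] + c;
}

/* Hypothetical usage with values that are exact in binary floating point. */
void vec_mul_add_example(void)
{
    const float a[5] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
    const float b[5] = {2.0f, 2.0f, 2.0f, 2.0f, 2.0f};
    float out[5];
    vec_mul_add(out, a, b, 0.5f, 5);
    assert(out[0] == 2.5f);
    assert(out[4] == 10.5f);
}
```

Loops with data-dependent branches, calls to non-inlined functions, or a dependency from one iteration to the next are much less likely to be vectorized, which is where hand-written NEON code or a library becomes the practical route.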
