Generating NEON instructions for Floating Point Operations on AM335x

Tim Kraus

Other Parts Discussed in Thread: AM3359, AM3352, OMAP3503

Hello Forum,

I am using CCS v5.4 and Industrial SDK 1.0.0.8 to develop some applications on an AM3359 (ICE Evaluation board) with SYS/BIOS.

I created a new CCS C-Project based on the "Typical" SYS/BIOS configuration template and added some code with Single Precision Floating Point operations.
With the standard settings, the compiler generated instructions for the VFP unit and not for the faster NEON engine (because of IEEE 754 conformance, I guess).

The following (useless) test function works well as a demonstration:

float tstfn (float k)
{
    float x = 10005.7;
    float y = -22.055;
    float z = 0;
    z = (x + k) * y;
    return z;
}

The Disassembly is:

229       {
          tstfn:
80005c54:   F1AD0D10 SUB.W           R13, R13, #16
80005c58:   ED8D0A00 FSTS            S0, [R13, #0]
230         float x = 10005.7;
80005c5c:   483E     LDR             R0, $C$FL1
80005c5e:   9001     STR             R0, [SP, #0x4]
231         float y = -22.055;
80005c60:   483E     LDR             R0, $C$FL2
80005c62:   9002     STR             R0, [SP, #0x8]
232         float z = 0;
80005c64:   483E     LDR             R0, $C$FL3
80005c66:   9003     STR             R0, [SP, #0xC]
233         z = (x + k) * y;
80005c68:   EDDD0A00 FLDS            S1, [R13, #0]
80005c6c:   ED9D0A01 FLDS            S0, [R13, #4]
80005c70:   EE300A80 FADDS           S0, S1, S0
80005c74:   EDDD0A02 FLDS            S1, [R13, #8]
80005c78:   EE200A80 FMULS           S0, S1, S0
80005c7c:   ED8D0A03 FSTS            S0, [R13, #12]
234         return z;
80005ee4:   ED9D0A03 FLDS            S0, [R13, #12]
235       }

I'm using the TI v5.0.4 compiler, downloaded with CCS, with the following options:

-mv7A8 --code_state=16 --float_support=VFPv3 --abi=eabi -me -Ooff -g --define=am3359 --define=omap3503 --define=am3352 --diag_warning=225 --display_error_number --diag_wrap=off --neon

As you can see, optimizations are disabled (enabling them does not change anything; the compiler still emits VFP opcodes) and the --neon option is added.

My question is: How must the Compiler settings be changed to get NEON instructions?

-> I am aware of the wiki page: http://processors.wiki.ti.com/index.php/Cortex-A8, which says that the "-mf" directive should also be added. But the Compiler settings do not show this option. Using the GNU compiler instead doesn't work with the SYS/BIOS libs.

-> Referring to this post http://e2e.ti.com/support/arm/sitara_arm/f/791/t/216069.aspx, I see in the Debug View of CCS that in "ARM Advanced Features" the "NEON enabled" box is set.

I hope someone can help me enable NEON floating-point execution on the AM335x with the TI ARM compiler in a SYS/BIOS application.

Regards, Tim

over 12 years ago

0 Ki over 12 years ago


Hi Tim,

Sorry for the delay on this. I will move this thread to the compiler forum, where the experts can help you best.

Thanks

0 George Mock over 12 years ago


Please see this wiki article. You'll see that you need to change --opt_level=off (you wrote the equivalent -Ooff) to --opt_level=2 or higher. And you need to use --opt_for_speed=3 or higher.
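Applied to the option set from the original post, that change might look like the line below. This is only a sketch: it takes the options exactly as Tim posted them and swaps -Ooff for the two optimization settings named above. Everything else is left unchanged, and it has not been checked whether all of the remaining options are still needed.

```text
-mv7A8 --code_state=16 --float_support=VFPv3 --abi=eabi -me --opt_level=2 --opt_for_speed=3 -g --define=am3359 --define=omap3503 --define=am3352 --diag_warning=225 --display_error_number --diag_wrap=off --neon
```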

Changing to those options does not result in NEON instructions for your simple example. That is because of a lack of optimization opportunity. When I changed the code to this ...

void tstfn(float *x, float *y, float * restrict z, float *k, int length)
{
    int i;

    for (i = 0; i < length; i++)
        z[i] = (x[i] + k[i]) * y[i];
}

That change, along with the option changes, results in NEON instructions.

Why the restrict on the z pointer? This wiki article describes restrict in detail. In this case, it tells the compiler that the memory written through z is accessed only through z, so the stores to z cannot overlap the loads from x, y, or k. That allows the compiler to reorder the memory accesses, for example loading several elements before storing any results, and thus arrange things so that NEON instructions can be used.
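To illustrate the aliasing point, here is a minimal sketch of what the compiler has to assume when restrict is missing. The names tstfn_may_alias and overlap_demo are hypothetical and used only for this illustration. The loop is the same as above but without restrict, and it is called with z deliberately overlapping x (z = x + 1), so every iteration reads the value the previous iteration stored.

```c
/* Hypothetical variant of the loop WITHOUT restrict: z is allowed to overlap x. */
void tstfn_may_alias(float *x, float *y, float *z, float *k, int length)
{
    int i;

    for (i = 0; i < length; i++)
        z[i] = (x[i] + k[i]) * y[i];
}

/* Hypothetical helper: runs the loop with z = x + 1 and returns the last element. */
float overlap_demo(void)
{
    float buf[5] = {1.0f, 0.0f, 0.0f, 0.0f, 0.0f};
    float y[4]   = {2.0f, 2.0f, 2.0f, 2.0f};
    float k[4]   = {1.0f, 1.0f, 1.0f, 1.0f};

    /* x = buf, z = buf + 1: the store to z[i] changes x[i + 1]. */
    tstfn_may_alias(buf, y, buf + 1, k, 4);

    /* Element by element: buf[1] = 4, buf[2] = 10, buf[3] = 22, buf[4] = 46. */
    return buf[4];
}
```

Executed one element at a time, overlap_demo returns 46. If the four x values were all loaded before any store, as a four-wide vector operation would do, buf[4] would come out as (0 + 1) * 2 = 2 instead. Without restrict the compiler must preserve the first result, which limits how freely it can reorder loads and stores; with restrict on z, a call like this one is no longer valid, and the reordering becomes legal.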

Thanks and regards,

-George
