Hello, I have a BeagleBone AI board with the AM5729 and want to build highly optimized floating-point code for this platform using GCC. I will eventually be writing C66x-specific code, but for now I just want to build for the ARM Cortex-A15 and its NEON FPU, with auto-vectorization. My code is already optimized for auto-vectorization on x86 platforms. I am looking for the best compiler flags to use for this CPU architecture.
What I believe are the best flags for my purpose:
-mcpu=cortex-a15 -mfloat-abi=hard -mfpu=neon-vfpv4 -mtune=cortex-a15 -funsafe-math-optimizations -O3 -ffast-math -fno-strict-aliasing
In particular, I'm somewhat unsure about selecting neon-vfpv4 for -mfpu. Is this the highest-performance option for the AM5729?
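One way to see what a given -mfpu setting actually turns on is to test the compiler's predefined feature macros. The snippet below is only a minimal sketch, assuming a GCC toolchain that defines the ACLE macros __ARM_NEON and __ARM_FEATURE_FMA; the function names built_with_neon and built_with_fma are hypothetical helpers made up for illustration.

```c
/* Sketch: report which ARM floating-point features the chosen
 * compiler flags enabled. Assumes the toolchain defines the ACLE
 * feature macros; the helper names are hypothetical. */

/* Returns 1 if the translation unit was compiled with NEON enabled. */
int built_with_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if fused multiply-add instructions are available,
 * which is what I would expect VFPv4 to add over plain -mfpu=neon. */
int built_with_fma(void)
{
#ifdef __ARM_FEATURE_FMA
    return 1;
#else
    return 0;
#endif
}
```

Calling both helpers from a small main() and comparing a build with -mfpu=neon against one with -mfpu=neon-vfpv4 should show whether the FMA macro is the only difference. On a non-ARM host both helpers simply return 0.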
The -funsafe-math-optimizations flag appears to be required to enable NEON for auto-vectorization. I understand that this may mean a loss of precision, with denormal numbers being flushed to 0 (is this a factor with the AM5729?). Are there any other flags required to enable auto-vectorization?
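To make the question concrete, here is a minimal sketch of the kind of loop I would expect the vectorizer to handle. It is a hypothetical test kernel (the name saxpy and the file name saxpy.c are made up for illustration), not my actual code. The compile line in the comment assumes an arm-linux-gnueabihf-gcc cross toolchain and uses GCC's -fopt-info-vec option, which reports the loops that were vectorized.

```c
#include <stddef.h>

/* Hypothetical test kernel: y[i] = a * x[i] + y[i].
 *
 * Assumed compile line (the toolchain prefix is an assumption):
 *   arm-linux-gnueabihf-gcc -std=c99 -O3 -mcpu=cortex-a15 \
 *       -mfloat-abi=hard -mfpu=neon-vfpv4 \
 *       -funsafe-math-optimizations -fopt-info-vec -c saxpy.c
 *
 * The restrict qualifiers promise the compiler that x and y do not
 * overlap, so it does not need runtime alias checks before
 * vectorizing the loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Building this file once with and once without -funsafe-math-optimizations and comparing the -fopt-info-vec reports should show whether that flag is really what gates NEON vectorization for float loops.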
Best regards,
Element Green