Trying to get hardware floating point code generated for LM4F120

Tim Coddington

I'm at a point in my test project where I need to generate more optimized code, so I'm trying to configure and verify that hardware floating point code is being generated for my LM4F120.

My situation: Stellaris Launchpad LM4F120, CCS 5.2, Linux, Stellaris libs

After examining the assembly view of my code in the debugger, it was obvious to me that VFP instructions were not being generated. I searched for clarification of the different "--float_support" options but was not successful, so I tried building with each one and got warnings about certain options only being compatible with certain ARM architectures, e.g. A8. However, I hit upon "vfplib" and "fpalib". When I recompiled, I got the following error:

fatal error #16016: file
"/home/tim/Stellaris/usblib/ccs-cm4f/Debug/usblib-cm4f.lib<usbdhandler.obj>"
was built with VFP coprocessor support while a previously seen file was not;
combining incompatible files

I interpret this to mean the usblib.lib has not been compiled with the -float_support set to "vfplib".

I get the same error when I attempt to use "fpalib".

I don't recall building the Stellarisware libs when they were installed. Is there a way to do this and produce compatible libraries?

Which option is appropriate, and how do I get past this? What is your suggestion?

Thanks

over 12 years ago

0 Cody Addison over 12 years ago

TI__Expert 4540 points

Tim,

You should use the FPv4SPD16 argument to the --float_support option to generate VFP instructions for Cortex-M4. This is actually the default for Cortex-M4, so if the option is not specified the compiler will still generate VFP instructions. The vfplib and fpalib arguments are for software floating point. The vfp and fpa in the names refer to the order of 32-bit words for double precision values. The vfplib option matches that of VFP hardware.

If you are still failing to generate VFP instructions, make sure you are working with single precision (float) variables and make sure to use the 'f' suffix on floating point literals. The Cortex-M4F only supports single precision operations in hardware. If you are using the 5.0 compiler, you can use the --float_operations_allowed=32 option to have the compiler generate errors if 64-bit operations are being performed.
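To illustrate the point about the 'f' suffix, here is a minimal sketch. The helper names are hypothetical and do not come from any TI header. It relies only on standard C++ rules: an unsuffixed literal such as 0.5 has type double, so multiplying a float by it produces a double expression, which is the kind of operation the single-precision hardware cannot handle.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only.

// 0.5 is a double literal, so x is promoted and the product has type double.
std::size_t product_size_unsuffixed(float x) { return sizeof(x * 0.5); }

// 0.5f is a float literal, so the product stays in single precision.
std::size_t product_size_suffixed(float x) { return sizeof(x * 0.5f); }
```

The first helper reports the size of a double and the second the size of a float, which is a quick way to spot where an expression has silently left single precision.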

0 Tim Coddington over 12 years ago in reply to Cody Addison

Intellectual 440 points

Thanks for your response, but I have a few follow-up questions.

codya said:
This is actually the default for Cortex-M4 so if the option is not selected the compiler will generate VFP instructions.

I interpret this statement as: FPv4SPD16 is the default, but regardless when IT IS set as the option for --float_support, the compiler WILL NOT generate hardware floating point, i.e. VFP instructions, so if I want VFP instructions I SHOULD NOT select it.

I sense this is not what you meant to say? If I interpreted it right, which option should I select for VFP?

-----------

codya said:
If you are still failing to generate VFP instructions, make sure you are working with single precision (float) variables

That's disappointing. I assumed I could use double and float and it would make suitable conversions and optimized when possible. Gee. Seems like I should have stayed with the compiler writing/tailoring part of my career. I do have some double types in use. You are suggesting I squeeze into floats instead to get FPU optimizations?

Please hurry and reply. I'm hoping you were having an "opposites" day :-)

Thanks

0 Cody Addison over 12 years ago in reply to Tim Coddington

TI__Expert 4540 points

Tim Coddington said:
"I interpret this statement as: FPv4SPD16 is the default, but regardless when IT IS set as the option for --float_support, the compiler WILL NOT generate hardware floating point, i.e. VFP instructions, so if I want VFP instructions I SHOULD NOT select it. "

What I was trying to say is that if the --float_support option is not specified, FPv4SPD16 will be used for Cortex-M4. The compiler will generate VFP instructions if the --float_support option is not used or if it is specified as --float_support=FPv4SPD16.

Tim Coddington said:
"That's disappointing. I assumed I could use double and float and it would make suitable conversions and optimized when possible. Gee. Seems like I should have stayed with the compiler writing/tailoring part of my career. I do have some double types in use. You are suggesting I squeeze into floats instead to get FPU optimizations?"

You should use the float type if you are trying to achieve the best floating point performance on Cortex-M4. There is more information in this wiki article. The article references Cortex-R4, but the information is the same for Cortex-M4 and even more relevant since Cortex-M4 only has single precision hardware.

0 Tim Coddington over 12 years ago in reply to Cody Addison

Intellectual 440 points

codya, thanks for hanging in here. I've been trying to follow your and the wiki's advice but no joy.

Here's what I've done:

o I replaced all doubles with floats and made sure that all decimal values used the 'f' suffix.

o I added "--float_operations_allowed=32" to the link command line in CCS 5.2, but the build process did NOT baulk when I had doubles, etc., so I'm not sure it's taking effect. Note that for some reason I don't see this option available as a pull-down in the Properties dialog of the CCS project, which is odd since most others are there.

I think I know what to look for in the assembly view of my CCS debug sessions, but I'm not sure. I always break on a call to sqrt() and look at the assembly view to verify that a subroutine call has not been implicitly added for sqrt(), and I look for a mnemonic beginning with V that is obviously related to sqrt. Is this reasonable?

o The wiki mentioned using sqrtf() and __sqrt(), but it wasn't clear from my read that these would generate FPU opcodes. When I tried using sqrt() for all instances, the compiler generated code that seemed to include a subroutine call to sqrt, which clearly was not optimized. When I tried to use __sqrt() it was undefined.

What do I need to do in my CCS project to resolve intrinsic symbols like __sqrt()?

Again thanks

0 Cody Addison over 12 years ago in reply to Tim Coddington

TI__Expert 4540 points

I think you are using an older compiler version, most likely 4.9. Can you confirm this? The compiler version is different from the CCS version. The __sqrt intrinsic and the --float_operations_allowed option are available in the 5.0 release. You should be able to get the latest compiler version through Help->"Install New Software". Also, you should use the __sqrtf() intrinsic to perform a single precision square root.

0 Tim Coddington over 12 years ago in reply to Cody Addison

Intellectual 440 points

Yes. The compiler version I'm using is 4.9.5 (installed with CCS 5.2). I've tried several ways to upgrade the compiler but none have been successful. I've tried the one you hinted at, and another one based on instructions elsewhere that has you follow Preferences->Install/Update. I've also tried to find an update to CCS. I think it may be due to my platform being Linux.

Do you have any other suggestions how I might get VFP code generation on Linux CCS?

Thanks

0 Cody Addison over 12 years ago in reply to Tim Coddington

TI__Expert 4540 points

The 4.9.5 compiler will generate VFP instructions for all single precision arithmetic operations (+,-,*,/). It does use the VSQRT instruction in the sqrtf() function; however, there are several checks on the input before the VSQRT instruction is executed, which make the function fairly slow. The 5.0 compiler improved the performance of the sqrtf() function by reducing the checks before the VSQRT instruction and by providing the __sqrtf() intrinsic to avoid the call overhead entirely. If you need better square root performance then you will need to upgrade to the 5.0 compiler. I see you have another thread here asking how to upgrade to the latest compiler version in Linux CCS. Someone else should comment there with instructions on how to upgrade.
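A small single-precision function like the following can serve as a test case when checking the disassembly. This is only a sketch: the function name is hypothetical, and which instructions actually appear depends on the compiler version and options. With --float_support=FPv4SPD16, the multiplies and the add would be expected to show up as VFP instructions, while sqrtf() would appear as a library call.

```cpp
#include <math.h>

// Hypothetical test function: every operand is a float, so the arithmetic
// should not need any double-precision helper routines.
float vector_length(float x, float y)
{
    float sum = x * x + y * y;   // single-precision multiplies and add
    return sqrtf(sum);           // single-precision library square root
}
```

Calling vector_length(3.0f, 4.0f) and stepping through it in the debugger gives a compact place to look for V-prefixed mnemonics.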

Cody

0 Tim Coddington over 12 years ago in reply to Cody Addison

Intellectual 440 points

I've now got the latest CCS version installed (including the upgraded compiler) and I've compiled a simple code fragment to look at the code generation. But it does not resolve the __sqrt() operation. What lib do I need to include in my project to resolve this, or what might be the reason it's not getting resolved?

I created a new project in CCS 5.4 instead of copying the old one from CCS 5.2. -code_state is "16", -float_support is "FPv4SPD16", and -fp_mode is "relaxed"

Here's my example code to test:

========

#include <math.h>
int main(void) {
    float x, y = 0.0f;
    for (int i = 0; i < 4; i++) {
        y = (float)i * i;
        x = __sqrt(y); // two underscore characters
    }
    return 0;
}

==========

Error:

"../main.cpp", line 10: error #20: identifier "__sqrt" is undefined

Thanks

0 Cody Addison over 12 years ago in reply to Tim Coddington

TI__Expert 4540 points

You must use __sqrtf() on Cortex-M4 because it only supports single precision. There is no VSQRT.F64 instruction so we do not allow the double precision intrinsic.

0 Tim Coddington over 12 years ago in reply to Cody Addison

Intellectual 440 points

OK. Got it. I use a high-resolution display, and for some reason the e2e.ti.com web page text displays very small, so I did not catch the "f" on the end. Thanks for pointing that out.

I've made the appropriate edit to the code:

#include <math.h>

int main(void) {
    float x, y = 0.0f;
    for (int i = 0; i < 4; i++) {
        y = (float)i * i;
        x = __sqrtf(y);
    }
    return 0;
}

BUT, I get the same error. It appears I need to include another header file. Here's my build output:

**** Build of configuration Debug for project ccs54-test ****

/opt/ti/ccs5.4/ccsv5/utils/bin/gmake -k all
Building file: ../main.cpp
Invoking: ARM Compiler
"/opt/ti/ccs5.4/ccsv5/tools/compiler/arm_5.0.4/bin/armcl" -mv7M4 --code_state=16 --float_support=FPv4SPD16 --abi=eabi -me -O0 --opt_for_speed=0 --fp_mode=relaxed -g --include_path="/opt/ti/ccs5.4/ccsv5/tools/compiler/arm_5.0.4/include" --diag_warning=225 --display_error_number --diag_wrap=off --preproc_with_compile --preproc_dependency="main.pp" "../main.cpp"
"../main.cpp", line 10: error #20: identifier "__sqrtf" is undefined
"../main.cpp", line 7: warning #552-D: variable "x" was set but never used

>> Compilation failure
1 error detected in the compilation of "../main.cpp".
gmake: *** [main.obj] Error 1
gmake: Target `all' not remade because of errors.

What headers (and libraries) resolve the intrinsic operations? Also, just to make sure--is it 1 or 2 underscores beforehand?

Thanks

0 Cody Addison over 12 years ago in reply to Tim Coddington

TI__Expert 4540 points

I am not able to reproduce this. You should not need any header files to use __sqrtf because it is an intrinsic and built into the compiler. It is 2 underscores. Is that the problem? If not, we'll have to do some more investigation.

0 Tim Coddington over 12 years ago in reply to Cody Addison

Intellectual 440 points

I have always used 2 underscores. Are there any particular code snippets that you'd like me to try and build?

Thanks

0 Cody Addison over 12 years ago in reply to Tim Coddington

TI__Expert 4540 points

Tim,

I have been able to reproduce the issue and filed SDSCM00047236 to track the issue. The problem is a C vs. C++ issue. In C++ the __sqrt[f] intrinsics are defined in the standard namespace. This is a bug. As a workaround you can invoke them as std::__sqrtf(x). The fix will be in the 5.0.5 compiler release. Once fixed you will need to remove the namespace prefix.
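One way to keep the workaround in a single place is a small wrapper. This is a sketch under stated assumptions: fast_sqrtf is a hypothetical name, the __TI_COMPILER_VERSION__ macro and its 5000005 encoding for release 5.0.5 are assumed from TI's version-macro convention, the code is assumed to be built as C++ with the TI ARM compiler, and all 5.0.x releases before 5.0.5 are assumed to share the namespace bug. On any other compiler it falls back to the standard sqrtf().

```cpp
#include <math.h>

// Hypothetical wrapper; not part of any TI header.
static inline float fast_sqrtf(float x)
{
#if defined(__TI_COMPILER_VERSION__) && (__TI_COMPILER_VERSION__ >= 5000005)
    return __sqrtf(x);        // intrinsic in the global namespace (assumed fixed in 5.0.5)
#elif defined(__TI_COMPILER_VERSION__) && (__TI_COMPILER_VERSION__ >= 5000000)
    return std::__sqrtf(x);   // workaround for the C++ namespace bug (SDSCM00047236)
#else
    return sqrtf(x);          // portable fallback: library call, no intrinsic
#endif
}
```

Callers then use fast_sqrtf(y) everywhere, and the namespace prefix disappears automatically once the compiler is upgraded.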
