Hi,
Some years ago, at our company, we ran a simple test meant to compare the performance of code generated by the TI (CCS 5.1) and IAR (EW 5.40.6) compilers for floating-point computations on the MSP430F5438 MCU. In both setups we used full optimization for speed. The code under test was:
volatile float i;
WDTCTL = WDTPW + WDTHOLD; // Stop watchdog timer
PIN_LED_PROGRUN_INIT; // Init port pin for output
i = 3.14f;
for (;;)
{
LED_PROGRUN_TOGGLE; // Toggle pin using exclusive-OR
i = i * 3.14f * 4.124f; // do some float operations
if (i > 0.0f)
i = i / 3.14f;
}
The result was that the IAR-generated code was five times faster, and we decided in favor of IAR EW. I wanted to reconsider that decision and repeated the test a few days ago using IAR 6.4 and CCS 6.1.0 with TI compiler v4.4.3. I tried different compiler switches and optimization levels, always with the same result as a few years ago: the pin toggles about every 200 µs with the IAR-generated code and about every 1 ms with the TI-generated code.
In the assembly I can see that the IAR compiler calls special floating-point routines (which evidently make use of the integer hardware multiplier), while the TI compiler does not. I also tried the Crossworks compiler, which was as fast as the IAR compiler.
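For reference, a self-contained variant of the test that can be dropped into an empty CCS or IAR project might look like the sketch below; the pin assignment (P1.0) and the two macro bodies are my assumptions for illustration, not the original project's definitions:

#include <msp430.h>

/* Assumed pin mapping (P1.0) - the original macro definitions were not posted */
#define PIN_LED_PROGRUN_INIT  (P1DIR |= BIT0)
#define LED_PROGRUN_TOGGLE    (P1OUT ^= BIT0)

int main(void)
{
    volatile float i;

    WDTCTL = WDTPW + WDTHOLD;       // Stop watchdog timer
    PIN_LED_PROGRUN_INIT;           // Init port pin for output

    i = 3.14f;
    for (;;)
    {
        LED_PROGRUN_TOGGLE;         // Toggle pin using exclusive-OR
        i = i * 3.14f * 4.124f;     // do some float operations
        if (i > 0.0f)
            i = i / 3.14f;
    }
}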
It seems that there has been no progress in the development of the TI compiler for MSP430 with respect to floating-point computations over the last few years, and I would like to ask:
1. Does anybody know a way of boosting the performance of basic floating-point arithmetic, comparable to what TI did with mathlib for the basic floating-point functions (and I'm not thinking of hand-tuning it in assembler myself)?
2. Is there any activity going on to improve the TI-generated code? It is a commercial compiler and we would be willing to pay for it if it could measure up to the competition.
Kind regards,
Filip
P.S.
Trying the GCC compiler is not an option for me, since I have had big trouble using it with 20-bit addressing.
FiSz said: Does anybody know a way of boosting the performance of basic floating-point arithmetic, comparable to what TI did with mathlib for the basic floating-point functions (and I'm not thinking of hand-tuning it in assembler myself)?
The mathlib libraries are for processors with built-in floating point instructions. MSP430 has no floating point instructions.
FiSz said: Is there any activity going on to improve the TI-generated code?
The compiler is undergoing general improvement. In your specific case, performance is dominated by the speed of the routines for basic floating point operations like multiply and divide. Unfortunately, no work is being done to make those routines faster.
Thanks and regards,
-George
Hi George
> The mathlib libraries are for processors with built-in floating point instructions. MSP430 has no floating point instructions.
I meant mathlib for MSP430, i.e. www.ti.com/.../mspmathlib, which is for devices without an FPU.
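As far as I understand it, that library transparently replaces selected math.h functions once it is linked in, but it does not touch the basic arithmetic operators, which the compiler lowers to its own RTS helper routines. A rough sketch of the distinction (the function names are just examples):

#include <math.h>

float transcendental(float x)
{
    return sqrtf(x) + sinf(x);      /* math.h calls - candidates for
                                       mspmathlib's optimized replacements */
}

float basic(float x)
{
    return x * 3.14f + x / 4.124f;  /* plain operators - compiled into calls to
                                       the RTS float multiply/divide helpers,
                                       which mspmathlib does not replace    */
}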
> Unfortunately, no work is being done to make those routines faster.
This is the main reason why people are not using the TI compiler commercially (besides code size).
I found some rumors on the Internet about optimized assembler routines for floating-point computations (TI's floating point package?, e.g. osdir.com/.../msg00148.html), but these references are quite old by now and lead no further. Do you know of a way to improve the performance of the TI compiler?
FiSz said: I meant mathlib for MSP430, i.e. www.ti.com/.../mspmathlib, which is for devices without an FPU.
That library does not come from the compiler development team. I have no special knowledge about it. If you want to know more about future plans for this library, I recommend you start a new thread in the MSP device forum. Or, if you prefer, I can move this thread into that forum.
FiSz said: Do you know of a way to improve the performance of the TI compiler?
With regard to the performance of floating point operations ... Unfortunately, there is nothing to suggest.
Thanks and regards,
-George
That code uses a single-precision float for the result, but its constants are implicitly double-precision, so with strict floating-point semantics the calculation is performed in double precision.
FiSz said: volatile float i;
WDTCTL = WDTPW + WDTHOLD; // Stop watchdog timer
PIN_LED_PROGRUN_INIT; // Init port pin for output
i = 3.14;
for (;;)
{
LED_PROGRUN_TOGGLE; // Toggle pin using exclusive-OR
i = i * 3.14 * 4.124; // do some float operations
if (i > 0.0)
i = i / 3.14;
}
If you add an "f" suffix to the constants to make them explicitly single-precision, does that change the relative speed of the program with the different compilers?
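For clarity, here is a minimal sketch of the promotion rule (illustrative functions, not the project code):

/* With strict IEEE semantics an unsuffixed constant is a double, so the float
   operand is widened and the arithmetic is done in double precision. */
float scale_double(float x) { return x * 3.14;  }  /* double multiply plus conversions */
float scale_single(float x) { return x * 3.14f; }  /* stays in single precision        */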
With the TI MSP430 v15.12.1 compiler, making the constants explicitly single-precision reduced the size of the program from 4094 to 798 bytes of flash, but I didn't measure the change in execution time:
volatile float i;
WDTCTL = WDTPW + WDTHOLD; // Stop watchdog timer
PIN_LED_PROGRUN_INIT; // Init port pin for output
i = 3.14f;
for (;;)
{
LED_PROGRUN_TOGGLE; // Toggle pin using exclusive-OR
i = i * 3.14f * 4.124f; // do some float operations
if (i > 0.0f)
i = i / 3.14f;
}
Thanks, Chester, for having a look at it. I also tried defining the floats explicitly as single precision, and I also used the compiler switches for relaxed mode and float_operations_allowed=16. The speed did not improve at all, though.
Thanks George,
I guess the mspmathlib library would be the most suitable place to provide improved floating-point routines for basic arithmetic as well.
If you can move the thread to a more suitable forum then I'd be grateful.
Regards,
Filip
(Disclaimer, I am coming from a C6000 background, so some of this may not be applicable to the MSP platform)
Ideally, there would be (at least) two flavors of basic arithmetic floating point routines available:
The first is a standards-conforming (C, C++, and/or IEEE 754) version. These would be delivered with the RTS (C runtime library) targeting a particular platform, and would be optimized for that platform. For example, mpyf on C66 would use the floating-point multiply instruction of that architecture, whereas on C64 it would do additional work because of the lack of float instructions. These routines are the ones that are automatically used when you write an operation like
float f = 1.23f * 4.56f;
Just in case, it might be good to confirm that you are linking in the most specific RTS for your platform. Make sure that the Runtime support library setting either explicitly targets the specific MSP430 RTS, or is set to <automatic> ("libc.a") while you have also selected the most specific Device Family and Variant to help it make the correct automatic choice.
The second flavor of basic arithmetic floating point routines would be ones that attempt to be as fast as possible, and in order to do so are willing to be non-conforming. Such trade-offs might include reduced precision, failure to handle certain error states, etc. These would be supplied in a supplemental library and would be explicitly called like
float f = fastMultiply(1.23f, 4.56f);
As an extension of this second flavor, such libraries can sometimes be linked in so that they replace the conforming RTS routines with the faster, non-conforming versions from the library. So an expression like
float f = 1.23f * 4.56f;
executes as if it were
float f = fastMultiply(1.23f, 4.56f);
with the pros and cons that come with it.
I think this is some of what MATHLIB for some of the C6x platforms is doing. It does not appear to be what MSPMATHLIB is doing.
As an additional thing to double-check, make sure that you are targeting the most specific platform possible with the "Target processor version" (--silicon_version, -mv) compiler option. In my CCS setup, this is a related but separate setting from the Device Variant setting that I mentioned previously.
One additional comment: I know your example is probably not the actual use case you are interested in, but I'll just mention that multiplication by the reciprocal can sometimes be faster than division, so instead of
i = i / 3.14;
you could try something like
i *= 0.318f;
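If the divisor is a compile-time constant, one way to avoid both the run-time division and the rounding of 0.318f is to let the compiler fold the exact reciprocal, for example (a sketch, not from the original code):

/* The reciprocal is evaluated at compile time, so only a float multiply
   remains at run time. Note the result is generally not bit-identical to
   a true division by 3.14f. */
static const float INV_3_14F = 1.0f / 3.14f;

float scale_down(float x)
{
    return x * INV_3_14F;
}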
Hi Charles,
thank you for coming back to me. And yes - if the floating-point performance were good enough, I would certainly consider using the TI compiler at our company. At the moment we are rather dependent on the IAR compiler because of this issue.
Regards,
Filip
I think you might reconsider the requirements for your project. Do you really need floating point?
Most things can be done with integer arithmetic / scaled math, or fixed point. Even for complex routines like the FFT, integer implementations exist.
Time-critical routines and elaborate floating-point emulation don't go well together, especially on 8-bit or 16-bit MCUs.
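As an illustration of what scaled/fixed-point math can look like, here is a generic Q16.16 sketch (not tied to the benchmark project):

#include <stdint.h>
#include <stdio.h>

/* Q16.16 fixed point: the stored value is the real value times 65536 */
typedef int32_t q16_t;

#define Q16(x)        ((q16_t)((x) * 65536.0))              /* constants only (folded at compile time) */
#define Q16_MUL(a, b) ((q16_t)(((int64_t)(a) * (b)) >> 16))
#define Q16_TO_DBL(a) ((double)(a) / 65536.0)

int main(void)
{
    q16_t v = Q16(3.14);                 /* 3.14 in Q16.16                  */
    v = Q16_MUL(v, Q16(4.124));          /* ~12.949, using only integer ops */
    printf("%f\n", Q16_TO_DBL(v));
    return 0;
}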
However, there are many other developers who are not so keen on getting away from floating-point computations or on porting lots of old (and working) code...
There is nothing wrong with floating point - if it fits your project requirements. Hobbyists are usually quite relaxed about that (including me!).
I vividly remember a (commercial) project with a Microchip PIC18, where even the library code for float was prohibitively large. Some cycle-by-cycle PWM control loops and a PID controller had to be realized, with a 1ms cycle time. Using floating point was not even considered.
Porting code might be somewhat different. But I would start by selecting a platform that meets the performance requirements with minimal code changes. Replacing a float-based algorithm with an integer-based one usually does not qualify as "porting", but rather as re-implementation ...
BTW, there is something I want to mention, in regard to your first post:
i = 3.14;
I hope you realize that "3.14" is a double constant, and not a float (single precision) constant. That, in turn, involves double-to-float conversions. To have float constants, you need the "f" suffix, i.e.:
i = 3.14f;
The impact of this difference is usually significant.
Unfortunately, this is not the problem. The problem has actually already been identified: the TI compiler does not call dedicated routines that would use the integer hardware multiplier for floating-point computations.
I followed this thread and had already realized this.
Second, I had a compiler flag set which forces all floating-point numbers to be treated as single precision.
This is IMHO only the second-best option, because it is hidden somewhere in the project options, and such settings most probably get lost when porting. And, in contrast to the sizes of the integer (C) types, "float" and "double" are well defined by the IEEE standard. Providing such a forcing option seems a bad choice to me (on the part of the toolchain vendor).
How about a platform with native floating-point support, like the MSP432?
f. m. said: This is IMHO only the second-best option, because it is hidden somewhere in the project options, and such settings most probably get lost when porting.
...and when posted on the forum too (hence the repeated questions about the missing 'f' suffix).
In fact, I strongly suggest editing the source to note this, or putting the suffixes in place. That would prevent anyone from testing the code as is - without the appropriate compiler option - and getting even worse results!
In fact, I strongly suggest editing the source to note this, or putting the suffixes in place.
I fully agree if the code is your own.
Things are usually slightly different for "foreign" code like communication stacks (USB, CAN, TCP/IP), where any modification of the source should be avoided. I have such code in front of me, with 4 or 5 different typedefs for the very same basic int type, each from a different comm stack we use ...
I filed CODEGEN-1339 in the SDOWP system to track this issue regarding the floating point multiply operation being slow. This is not a bug, but a performance issue. At this time I cannot say whether, or when, it might be addressed. You are welcome to follow it with the SDOWP link below in my signature.
Thanks and regards,
-George