Hi,
Some years ago, at our company, we ran a simple test meant to compare the performance of code generated by the TI (CCS 5.1) and IAR (EW 5.40.6) compilers for floating-point computations on the MSP430F5438 MCU. In both setups we used full optimization for speed. The code under test was:
volatile float i;
WDTCTL = WDTPW + WDTHOLD; // Stop watchdog timer
PIN_LED_PROGRUN_INIT; // Init port pin for output
i = 3.14f;
for (;;)
{
LED_PROGRUN_TOGGLE; // Toggle pin using exclusive-OR
i = i * 3.14f * 4.124f; // do some float operations
if (i > 0.0f)
i = i / 3.14f;
}
The result was that the IAR-generated code was five times faster, and we decided in favor of IAR EW. I wanted to reconsider that decision and repeated the test a few days ago using IAR 6.4 and CCS 6.1.0 with TI compiler v4.4.3. I tried different compiler switches and optimization levels, always with the same result as a few years ago: the pin toggles about every 200 µs with the IAR-generated code and about every 1 ms with the TI-generated code.
In the assembly I can see that the IAR compiler calls special floating-point routines (which evidently make use of the integer hardware multiplier), while the TI compiler does not. I also tried the Crossworks compiler, which was as fast as the IAR compiler.
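For reference, a self-contained variant of the test that can be dropped into an empty CCS or IAR project might look like the sketch below; the pin assignment (P1.0) and the two macro bodies are my assumptions for illustration, not the original project's definitions:

#include <msp430.h>

/* Assumed pin mapping (P1.0) - the original macro definitions were not posted */
#define PIN_LED_PROGRUN_INIT  (P1DIR |= BIT0)
#define LED_PROGRUN_TOGGLE    (P1OUT ^= BIT0)

int main(void)
{
    volatile float i;

    WDTCTL = WDTPW + WDTHOLD;       // Stop watchdog timer
    PIN_LED_PROGRUN_INIT;           // Init port pin for output

    i = 3.14f;
    for (;;)
    {
        LED_PROGRUN_TOGGLE;         // Toggle pin using exclusive-OR
        i = i * 3.14f * 4.124f;     // do some float operations
        if (i > 0.0f)
            i = i / 3.14f;
    }
}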
It seems that there has been no progress in the development of the TI compiler for MSP430 with respect to floating-point computations over the last few years, and I would like to ask:
1. Does anybody know a way of boosting the performance of basic floating-point arithmetic, comparable to what TI did with mathlib for the basic floating-point functions (and I'm not thinking of hand-tuning it in assembler myself)?
2. Is there any activity going on to improve the TI-generated code? It is a commercial compiler and we would be willing to pay for it if it could measure up to the competition.
Kind regards,
Filip
P.S.
Trying the GCC compiler is not an option for me, since I have had big trouble using it with 20-bit addressing.
FiSz said: Does anybody know a way of boosting the performance of basic floating-point arithmetic, comparable to what TI did with mathlib for the basic floating-point functions (and I'm not thinking of hand-tuning it in assembler myself)?
The mathlib libraries are for processors with built-in floating point instructions. MSP430 has no floating point instructions.
FiSz said: Is there any activity going on to improve the TI-generated code?
The compiler is undergoing general improvement. In your specific case, performance is dominated by the speed of the routines for basic floating point operations like multiply and divide. Unfortunately, no work is being done to make those routines faster.
Thanks and regards,
-George
Hi George
> The mathlib libraries are for processors with built-in floating point instructions. MSP430 has no floating point instructions.
I meant mathlib for MSP430, i.e. www.ti.com/.../mspmathlib, which is for devices without an FPU.
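As far as I understand it, that library transparently replaces selected math.h functions once it is linked in, but it does not touch the basic arithmetic operators, which the compiler lowers to its own RTS helper routines. A rough sketch of the distinction (the function names are just examples):

#include <math.h>

float transcendental(float x)
{
    return sqrtf(x) + sinf(x);      /* math.h calls - candidates for
                                       mspmathlib's optimized replacements */
}

float basic(float x)
{
    return x * 3.14f + x / 4.124f;  /* plain operators - compiled into calls to
                                       the RTS float multiply/divide helpers,
                                       which mspmathlib does not replace    */
}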
> Unfortunately, no work is being done to make those routines faster.
This is the main reason why people are not using the TI compiler commercially (besides code size).
I found some rumors on the Internet about optimized assembler routines for floating-point computations (TI's floating point package?, e.g. osdir.com/.../msg00148.html), but these references are quite old by now and lead no further. Do you know of a way to improve the performance of the TI compiler?
FiSz said: I meant mathlib for MSP430, i.e. www.ti.com/.../mspmathlib, which is for devices without an FPU.
That library does not come from the compiler development team. I have no special knowledge about it. If you want to know more about future plans for this library, I recommend you start a new thread in the MSP device forum. Or, if you prefer, I can move this thread into that forum.
FiSz said: Do you know of a way to improve the performance of the TI compiler?
With regard to the performance of floating point operations ... Unfortunately, there is nothing to suggest.
Thanks and regards,
-George
That code uses a single-precision float for the result, but its constants are implicitly double-precision, so with strict floating-point semantics the calculation is performed in double precision.
FiSz said: volatile float i;
WDTCTL = WDTPW + WDTHOLD; // Stop watchdog timer
PIN_LED_PROGRUN_INIT; // Init port pin for output
i = 3.14;
for (;;)
{
LED_PROGRUN_TOGGLE; // Toggle pin using exclusive-OR
i = i * 3.14 * 4.124; // do some float operations
if (i > 0.0)
i = i / 3.14;
}
If you add an "f" suffix to the constants to make them explicitly single-precision, does that change the relative speed of the program with the different compilers?
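For clarity, here is a minimal sketch of the promotion rule (illustrative functions, not the project code):

/* With strict IEEE semantics an unsuffixed constant is a double, so the float
   operand is widened and the arithmetic is done in double precision. */
float scale_double(float x) { return x * 3.14;  }  /* double multiply plus conversions */
float scale_single(float x) { return x * 3.14f; }  /* stays in single precision        */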
With the TI MSP430 v15.12.1 compiler, making the constants explicitly single-precision reduced the size of the program from 4094 to 798 bytes of flash, but I didn't measure the change in execution time:
volatile float i;
WDTCTL = WDTPW + WDTHOLD; // Stop watchdog timer
PIN_LED_PROGRUN_INIT; // Init port pin for output
i = 3.14f;
for (;;)
{
LED_PROGRUN_TOGGLE; // Toggle pin using exclusive-OR
i = i * 3.14f * 4.124f; // do some float operations
if (i > 0.0f)
i = i / 3.14f;
}
Thanks, Chester, for having a look at it. I also tried defining the floats explicitly as single precision, and I also used the compiler switches for relaxed mode and float_operations_allowed=16. The speed did not improve at all, though.
Thanks George,
I guess the mspmathlib library would be the most suitable place to provide improved floating-point routines for basic arithmetic as well.
If you can move the thread to a more suitable forum then I'd be grateful.
Regards,
Filip
(Disclaimer, I am coming from a C6000 background, so some of this may not be applicable to the MSP platform)
Ideally, there would be (at least) two flavors of basic arithmetic floating point routines available:
The first is a standards-conforming (C, C++, and/or IEEE 754) version. These would be delivered with the RTS (C runtime library) targeting a particular platform, and would be optimized for that platform. For example, mpyf on C66 would use the floating-point multiply instruction of that architecture, whereas on C64 it would do additional work because of the lack of float instructions. These routines are the ones that are automatically used when you write an operation like
float f = 1.23f * 4.56f;
Just in case, it might be good to confirm that you are linking in the most specific RTS for your platform. Make sure that the Runtime support library setting either explicitly targets the specific MSP430 RTS, or is set to <automatic> ("libc.a") while you have also selected the most specific Device Family and Variant to help it make the correct automatic choice.
The second flavor of basic arithmetic floating point routines would be ones that attempt to be as fast as possible, and in order to do so are willing to be non-conforming. Such trade-offs might include reduced precision, failure to handle certain error states, etc. These would be supplied in a supplemental library and would be explicitly called like
float f = fastMultiply(1.23f, 4.56f);
As an extension of this second flavor, such libraries can sometimes be linked in so that they replace the conforming RTS routines with the faster, non-conforming versions from the library. So an expression like
float f = 1.23f * 4.56f;
executes as if it were
float f = fastMultiply(1.23f, 4.56f);
with the pros and cons that come with it.
I think this is some of what MATHLIB for some of the C6x platforms is doing. It does not appear to be what MSPMATHLIB is doing.
As an additional thing to double-check, make sure that you are targeting the most specific platform possible with the "Target processor version" (--silicon_version, -mv) compiler option. In my CCS setup, this is a related but separate setting from the Device Variant setting that I mentioned previously.
One additional comment: I know your example is probably not the actual use case you are interested in, but I'll just mention that multiplication by the reciprocal can sometimes be faster than division, so instead of
i = i / 3.14;
you could try something like
i *= 0.318f;
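If the divisor is a compile-time constant, one way to avoid both the run-time division and the rounding of 0.318f is to let the compiler fold the exact reciprocal, for example (a sketch, not from the original code):

/* The reciprocal is evaluated at compile time, so only a float multiply
   remains at run time. Note the result is generally not bit-identical to
   a true division by 3.14f. */
static const float INV_3_14F = 1.0f / 3.14f;

float scale_down(float x)
{
    return x * INV_3_14F;
}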
Hi Charles,
thank you for coming back to me. And yes - if the floating-point performance were good enough, I would certainly consider using the TI compiler at our company. At the moment we are rather dependent on the IAR compiler because of this issue.
Regards,
Filip
I think you might reconsider the requirements for your project. Do you really need floating point?
Most things can be done with integer arithmetic / scaled math, or fixed point. Even for complex routines like the FFT, integer implementations exist.
Time-critical routines and elaborate floating-point emulation don't go well together, especially on 8-bit or 16-bit MCUs.
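As an illustration of what scaled/fixed-point math can look like, here is a generic Q16.16 sketch (not tied to the benchmark project):

#include <stdint.h>
#include <stdio.h>

/* Q16.16 fixed point: the stored value is the real value times 65536 */
typedef int32_t q16_t;

#define Q16(x)        ((q16_t)((x) * 65536.0))              /* constants only (folded at compile time) */
#define Q16_MUL(a, b) ((q16_t)(((int64_t)(a) * (b)) >> 16))
#define Q16_TO_DBL(a) ((double)(a) / 65536.0)

int main(void)
{
    q16_t v = Q16(3.14);                 /* 3.14 in Q16.16                  */
    v = Q16_MUL(v, Q16(4.124));          /* ~12.949, using only integer ops */
    printf("%f\n", Q16_TO_DBL(v));
    return 0;
}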
However, there are many other developers who are not so keen on getting away from floating-point computations or on porting lots of old (and working) code...
There is nothing wrong with floating point - if it fits your project requirements. Hobbyists are usually quite relaxed about that (including me!).
I vividly remember a (commercial) project with a Microchip PIC18, where even the library code for float was prohibitively large. Some cycle-by-cycle PWM control loops and a PID controller had to be realized, with a 1ms cycle time. Using floating point was not even considered.
Porting code might be somewhat different. But I would start by selecting a platform that meets the performance requirements with minimal code changes. Replacing a float-based algorithm with an integer-based one usually does not qualify as "porting", but rather as re-implementation ...
BTW, there is something I want to mention, in regard to your first post:
i = 3.14;
I hope you realize that "3.14" is a double constant, and not a float (single precision) constant. That, in turn, involves double-to-float conversions. To have float constants, you need the "f" suffix, i.e.:
i = 3.14f;
The impact of this difference is usually significant.
Unfortunately, this is not the problem. The problem has actually already been identified: the TI compiler does not call dedicated routines that would use the integer hardware multiplier for floating-point computations.
I followed this thread and had already realized this.
Second, I had a compiler flag set which forces all floating-point numbers to be treated as single precision.
This is IMHO only the second-best option, because it is hidden somewhere in the project options, and such settings most probably get lost when porting. And, in contrast to the sizes of the integer (C) types, "float" and "double" are well defined by the IEEE standard. Providing such a forcing option seems a bad choice to me (on the part of the toolchain vendor).
How about a platform with native floating-point support, like the MSP432?
f. m. said: This is IMHO only the second-best option, because it is hidden somewhere in the project options, and such settings most probably get lost when porting.
...and when posted on the forum too (hence the repeated questions about the missing 'f' suffix).
In fact, I strongly suggest editing the source to note this, or putting the suffixes in place. That would prevent anyone from testing the code as is - without the appropriate compiler option - and getting even worse results!
In fact, I strongly suggest editing the source to note this, or putting the suffixes in place.
I fully agree if the code is your own.
Things are usually slightly different for "foreign" code like communication stacks (USB, CAN, TCP/IP), where any modification of the source should be avoided. I have such code in front of me, with 4 or 5 different typedefs for the very same basic int type, each from a different comm stack we use ...
I filed CODEGEN-1339 in the SDOWP system to track this issue regarding the floating point multiply operation being slow. This is not a bug, but a performance issue. At this time I cannot say whether, or when, it might be addressed. You are welcome to follow it with the SDOWP link below in my signature.
Thanks and regards,
-George