Comparison between 32 bit int, float and 64 bit int calculation regarding execution time - 64 bit int that slow?

Dennis Eichmann

Other Parts Discussed in Thread: MSP430G2553, MSP430G2955, IAR-KICKSTART

Hi everyone!

Just for fun I tried a few implementations of calculating an ADC reading as a percentage of its maximum value of 65535. Normally, whenever possible, I try to avoid floating point calculations and use integer ones instead, and in most cases I'm fine with 32 bit ones. They are faster and generate less code, but you have to be OK with a larger calculation error.

64 bit integers are data types I do not use very often (I cannot remember when I last used one). To get higher precision, I wanted to know whether the 64 bit calculation is still faster and smaller than the float one. I did not expect the result I got.

This is my simple calculation in this example:

16 bit ADC, so a maximum value of 65535.

Now I read an ADC value of 38205 and want to calculate the percentage of the full ADC span, so one would calculate:

(38205 * 100%) / 65535 = 58.2970931....%

Then I want to recalculate the original value from this percentage:

(58.2970931% * 65535) / 100% = 38205

Here are three ways for the calculation (without any math library, just to see the difference) - optimization is completely off:

Variant A:

// Calculation in 32 bit integer
// 10000 means 100.00%

uint32_t adc_result = 38205;
uint32_t test;

test = ((adc_result * 10000) / 65535); // Results in 5829  - means 58.29%
test = ((test * 65535) / 10000);       // Results in 38200 - loss of 5

My existing code with this calculation generates 12,184 bytes of code, 34 bytes of data and 414 bytes of RAM.

The execution of these two lines needs 1,267 clock cycles.

Variant B:

// Calculation in float

uint32_t adc_result = 38205;
float    test;

test = ((adc_result * 100.0) / 65535); // Results in 58.29709
test = ((test * 65535) / 100.0);       // Results in 38205.0 - no loss

My existing code with this calculation generates 12,376 bytes (+ 192) of code, 34 bytes of data and 414 bytes of RAM.

The execution of these two lines needs 12,306 clock cycles (almost 10x more than 32 bit integer).

Variant C:

// Calculation in 64 bit integer
// 100000 means 100.000%

uint32_t adc_result = 38205;
uint64_t test;

test = ((adc_result * 100000) / 65535); // Results in 58297 - means 58.297%
test = ((test * 65535) / 100000);       // Results in 38204 - loss of 1

My existing code with this calculation generates 13,624 bytes (+ 1248 compared to float) of code, 34 bytes of data and 414 bytes of RAM.

The execution of these two lines needs 19,297 clock cycles (+ 6991 compared to float).

For all three types of calculation you can improve the speed slightly by using 65536 instead of 65535 and writing the division as a bit shift by 16, but the relation between the calculations stays the same.
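The 65536 shortcut can be sketched as follows for the 32 bit integer case. This is a minimal illustration, not the code that was measured, and the function names are made up for this example. Note that dividing by 65536 instead of 65535 changes the scale slightly, so a full-scale reading of 65535 yields 9999 rather than 10000.

```c
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* Exact scale: divide by 65535 (a software division on a CPU without a divider). */
uint32_t percent_x100_div(uint32_t adc)
{
    return (adc * 10000u) / 65535u;
}

/* Approximate scale: treat full scale as 65536, so the division
   becomes a right shift by 16. */
uint32_t percent_x100_shift(uint32_t adc)
{
    return (adc * 10000u) >> 16;
}
```

For the example reading of 38205 both functions return 5829; they only start to differ close to full scale.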

In the last example, when declaring adc_result as uint64_t as well, the code size becomes 13,656 bytes and the execution time is 37,410 clock cycles!
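One possible explanation for the nearly doubled cycle count, offered as an assumption rather than something confirmed from the generated code: with a 32 bit adc_result, the first line is evaluated entirely in 32 bit arithmetic and only the result is widened to 64 bits, so only the second line pays for a 64 bit multiply and divide. With a 64 bit adc_result, both lines do. The sketch below (hypothetical function names) also shows a side effect of the 32 bit version: adc * 100000 wraps around for readings above 42949.

```c
#include <stdint.h>

/* First line of Variant C with a 32 bit input: multiply and divide are
   done in 32 bits, the result is widened only on return. */
uint64_t percent_x1000_narrow(uint32_t adc)
{
    return (adc * 100000u) / 65535u;             /* wraps if adc > 42949 */
}

/* Same line with the input widened first: multiply and divide are done
   in 64 bits, so the whole 16 bit ADC range is safe. */
uint64_t percent_x1000_wide(uint32_t adc)
{
    return ((uint64_t)adc * 100000u) / 65535u;
}
```

Both versions return 58297 for the example reading of 38205, but only the widened one returns 100000 for a full-scale reading of 65535.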

Honestly I wouldn't have expected the result of the 64 bit calculation to be that slow (and also that large in code size). And since I'm not that deep in how this calculation is done internally, I was hoping that someone could tell me why 64 bit takes so much longer.

Most of the clock cycles go into the second line of the calculation.

Dennis

over 9 years ago

0 Clemens Ladisch over 9 years ago

Guru 317480 points

I wouldn't be surprised if the runtime of your compiler (whatever it is) simply converts 64-bit integers to/from floating-point values to do the actual computations.

0 Ilmars over 9 years ago in reply to Clemens Ladisch

Guru 46710 points

>I wouldn't be surprised if the runtime of your compiler (whatever it is) simply converts 64-bit integers to/from floating-point values to do the actual computations.

I would be very surprised instead, because that would mean an unforgivable loss of resolution. Note that a 64 bit double has only 52 fraction bits; the rest is used for the exponent and sign.

Dennis, this is ok. Double float (64 bit) is expected to be slower than 64bit integer, not 32bit float.
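The resolution argument can be checked on a PC. A minimal sketch, assuming an IEEE 754 64 bit double and using a made-up function name: any 64 bit integer that needs more than 53 significant bits no longer survives a round trip through double.

```c
#include <stdint.h>

/* Returns 1 if v survives a round trip through double unchanged.
   Intended for values well below 2^64: values near UINT64_MAX may round
   up to 2^64, which is out of range for the conversion back. */
int survives_double(uint64_t v)
{
    volatile double d = (double)v;   /* volatile: force storage as a real double */
    return (uint64_t)d == v;
}
```

With this, survives_double(1ULL << 53) is 1, while survives_double((1ULL << 53) + 1) is 0, because the lowest bit no longer fits into the significand.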

0 Alexey Bagaev over 9 years ago

Genius 5505 points

Hi Dennis,

If you want to beat the compiler's performance, you should learn assembly language and the specific bitwise properties of the math involved. It is hard, but most of your code will then need no RAM at all, or just a small fraction of what the compiler uses. If you have integer data that needs to be converted to display digits, in assembly you can program such an operation in 12 simple instructions, which will cost you about 50 clock cycles (depending on the architecture) for a 4-decimal-digit value, instead of the hundreds of thousands of cycles other conversion algorithms take. With an understanding of bitwise math properties you will realize that you need additional hardware to simplify such computations even further, and that simple MSP430G-series microcontrollers cannot do that.

Regards,
Alexey

0 Robert Cowsill over 9 years ago

Guru 16361 points

That's a surprising result. Is this on an MSP430 with no hardware multiplier? I could imagine the cost of a 32x32=>64 software multiply might exceed the equivalent calculation in single-precision floating point.
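To illustrate why a software multiply gets expensive as the operands get wider, here is a minimal shift-and-add sketch. It is a generic textbook scheme with a made-up function name, not the routine used by any particular compiler runtime. The loop runs once per bit of the multiplier, and on a 16 bit CPU every 64 bit add or shift inside it has to be done as four word operations.

```c
#include <stdint.h>

/* Shift-and-add multiply: one loop pass per significant bit of b. */
uint64_t mul32x32_64(uint32_t a, uint32_t b)
{
    uint64_t product = 0;
    uint64_t addend = a;            /* widened so the shifts keep the high bits */

    while (b != 0) {
        if (b & 1u) {
            product += addend;      /* four word add with carry on a 16 bit CPU */
        }
        addend <<= 1;               /* four word shift on a 16 bit CPU */
        b >>= 1;
    }
    return product;
}
```

For example, mul32x32_64(58297, 65535) returns 3820493895, the intermediate product of the second line of Variant C.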

0 old_cow_yellow over 9 years ago in reply to Clemens Ladisch

Guru 58965 points

I am using an oldish IAR Kickstart. I tried to run your tests on an MSP430G2553. What surprised me is the Flash and RAM usage.

First, I surrounded your test codes like this:

#include <msp430.h>
#include <stdint.h>
void main(void) {
TA0CTL = TASSEL1|MC1|TACLR;

// your code here

TA0CTL = 0;
}

I use the default settings of IAR Kickstart with no optimization.

I got the number of cycles needed by reading TA0R after TA0 is stopped. I got the amount of Flash needed by using the debugger to examine the contents of the entire Flash memory. And I got the amount of RAM needed by filling the entire RAM with different known constants before running the code and examining the contents afterwards.

My results are as follows:

Code       Result     Cycles needed     Flash needed    RAM needed
uint32_t   gained 3   1,341 (0x0541)    226 (0x00E2)    18 (0x0012)
float      correct    3,142 (0x0C46)    856 (0x0358)    20 (0x0014)
uint64_t   lost 1     10,015 (0x271F)   660 (0x0294)    64 (0x0040)

0 Dennis Eichmann over 9 years ago in reply to old_cow_yellow

Guru 74080 points

Thanks Clemens, Ilmars, Alexey, Robert and OCY for your input!

Clemens Ladisch said:
I wouldn't be surprised if the runtime of your compiler (whatever it is) simply converts 64-bit integers to/from floating-point values to do the actual computations.

That would be very interesting and a bad solution in my opinion. I use CCS 6.1.1.00022 with TI compiler version 4.4.7, eabi (ELF).

Ilmars said:
Because that would mean unforgivable resolution loss.

Agree to that. If I use integer data types I expect them to behave like integers.

Ilmars said:
Double float (64 bit) is expected to be slower than 64bit integer, not 32bit float.

I would have expected the integer calculations (even 64 bit) to be faster than any floating point operation.

Alexey Bagaev said:
If you want to beat compiler performance you should learn assembly language

Although knowing about assembly isn't that bad, this wasn't my intention. I was just wondering about the results.

Robert Cowsill said:
Is this on an MSP430 with no hardware multiplier?

Sorry for the missing information - it is an MSP430G2955, so no hardware multiplier available.

old_cow_yellow said:
I surrounded your test code

I now did the same and created a new program, only having these few lines - it now looks like this (optimization off):

#include "msp430g2955.h"
#include "stdint.h"

uint32_t adc_res = 38205;
uint32_t test;

void main( void )
{
  WDTCTL = (WDTPW | WDTHOLD);	                 // Stop watchdog timer
  TA0CTL = (TASSEL_2 | ID_0 | MC_2 | TACLR);     // SMCLK, divider 1, continuous mode, clear

  // Test code begin
  test = ((adc_res * 10000) / 65535);            // 10000 equals 100.00%
  test = ((test * 65535) / 10000);               // 32 bit calculation
  // Test code end

  TA0CTL = 0;                                    // Stop timer
	
  while( 1 );
}

This is 562 (code), 32 (data) and expected RAM usage of 114 (uninitialized data + stack)

Clock cycles after setting TA0CTL to 0: 1260

Here with float:

#include "msp430g2955.h"
#include "stdint.h"

uint32_t adc_res = 38205;
float    test;

void main( void )
{
  WDTCTL = (WDTPW | WDTHOLD);	                 // Stop watchdog timer
  TA0CTL = (TASSEL_2 | ID_0 | MC_2 | TACLR);     // SMCLK, divider 1, continuous mode, clear

  // Test code begin
  test = ((adc_res * 100.0) / 65535);            // Calculation in float
  test = ((test * 65535) / 100.0);
  // Test code end

  TA0CTL = 0;                                    // Stop timer
	
  while( 1 );
}

This is 4604 (code), 32 (data) and expected RAM usage of 114 (uninitialized data + stack)

Clock cycles after setting TA0CTL to 0: 12289

Now with 64 bit integer:

#include "msp430g2955.h"
#include "stdint.h"

uint32_t adc_res = 38205;
uint64_t test;

void main( void )
{
  WDTCTL = (WDTPW | WDTHOLD);	                 // Stop watchdog timer
  TA0CTL = (TASSEL_2 | ID_0 | MC_2 | TACLR);     // SMCLK, divider 1, continuous mode, clear

  // Test code begin
  test = ((adc_res * 100000) / 65535);           // 100000 equals 100.000%
  test = ((test * 65535) / 100000);              // 64 bit calculation
  // Test code end

  TA0CTL = 0;                                    // Stop timer
	
  while( 1 );
}

This is 2058 (code), 32 (data) and expected RAM usage of 118 (uninitialized data + stack)

Clock cycles after setting TA0CTL to 0: 19285

Now adc_res is 64 bits as well:

#include "msp430g2955.h"
#include "stdint.h"

uint64_t adc_res = 38205;                        // <--- NOW 64 BITS
uint64_t test;

void main( void )
{
  WDTCTL = (WDTPW | WDTHOLD);	                 // Stop watchdog timer
  TA0CTL = (TASSEL_2 | ID_0 | MC_2 | TACLR);     // SMCLK, divider 1, continuous mode, clear

  // Test code begin
  test = ((adc_res * 100000) / 65535);           // 100000 equals 100.000%
  test = ((test * 65535) / 100000);              // 64 bit calculation
  // Test code end

  TA0CTL = 0;                                    // Stop timer
	
  while( 1 );
}

This is 1996 (code), 32 (data) and expected RAM usage of 126 (uninitialized data + stack)

Clock cycles after setting TA0CTL to 0: 37387

And here is double:

#include "msp430g2955.h"
#include "stdint.h"

uint32_t adc_res = 38205;
double   test;

void main( void )
{
  WDTCTL = (WDTPW | WDTHOLD);	                 // Stop watchdog timer
  TA0CTL = (TASSEL_2 | ID_0 | MC_2 | TACLR);     // SMCLK, divider 1, continuous mode, clear

  // Test code begin
  test = ((adc_res * 100.0) / 65535);
  test = ((test * 65535) / 100.0);               // Calculation in double
  // Test code end

  TA0CTL = 0;                                    // Stop timer
	
  while( 1 );
}

This is 4154 (code), 32 (data) and expected RAM usage of 118 (uninitialized data + stack)

Clock cycles after setting TA0CTL to 0: 21915

This looks completely different to yours regarding elapsed clock cycles, OCY! Only the 32 bit one looks similar. Is IAR making that large difference? That would be a very poor result for CCS. Or do we compare apples with pears?

Dennis

0 f. m. over 9 years ago in reply to Dennis Eichmann

Guru 11940 points

Not to be nitpicking, but:

Here with float:

#include "msp430g2955.h"
#include "stdint.h"

uint32_t adc_res = 38205;
float    test;

void main( void )
{
  ...

  // Test code begin
  test = ((adc_res * 100.0) / 65535);            // Calculation in float
  test = ((test * 65535) / 100.0);
  // Test code end

  ...
	
  while( 1 );
}

Most toolchains I know translate constants like "100.0" to double. Thus, your code most probably contains some unnecessary int-to-double and double-to-float conversions.
Unless this is somehow overridden in the project settings... Using an "f" suffix (like "100.0f") would be clearer.
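One way to check this on a PC compiler, sketched under the assumption that C11 _Generic is available (it may not be in an embedded toolchain). The macro and function names are made up for illustration. Without the suffix, the expression adc * 100.0 has type double, so the multiply and the following divide run in double precision before the result is narrowed to float on assignment; with 100.0f the expression stays in float.

```c
#include <stdint.h>

/* Evaluates to 1 if the expression has type double, 0 otherwise.
   _Generic only inspects the type; the expression is not evaluated. */
#define IS_DOUBLE(x) _Generic((x), double: 1, default: 0)

/* Literal without suffix: uint32_t * double gives double. */
int unsuffixed_is_double(uint32_t adc)
{
    return IS_DOUBLE(adc * 100.0);
}

/* Literal with f suffix: uint32_t * float gives float. */
int suffixed_is_double(uint32_t adc)
{
    return IS_DOUBLE(adc * 100.0f);
}
```

Here unsuffixed_is_double(38205) returns 1 and suffixed_is_double(38205) returns 0.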

0 Dennis Eichmann over 9 years ago in reply to f. m.

Guru 74080 points

f. m. said:
Most toolchains I know translate constants like "100.0" to double.

Good catch!

You are absolutely right, this one is completely different:

#include "msp430g2955.h"
#include "stdint.h"

uint32_t adc_res = 38205;
float    test;

void main( void )
{
  WDTCTL = (WDTPW | WDTHOLD);	                 // Stop watchdog timer
  TA0CTL = (TASSEL_2 | ID_0 | MC_2 | TACLR);     // SMCLK, divider 1, continuous mode, clear

  // Test code begin
  test = ((adc_res * 100.0f) / 65535);
  test = ((test * 65535) / 100.0f);              // Calculation in float
  // Test code end

  TA0CTL = 0;                                    // Stop timer
	
  while( 1 );
}

This is 846 bytes of code only (was 4604). And it is 1526 cycles compared to 12289 without the f suffix. This now matches OCY's values for float.

What a difference! I didn't know that - thanks a lot! So only the 64 bit calculations increase the processing cycles and the code size that much.

Dennis

0 Ilmars over 9 years ago in reply to Dennis Eichmann

Guru 46710 points

Dennis Eichmann said:

Ilmars

Double float (64 bit) is expected to be slower than 64bit integer, not 32bit float.

I would have expected the integer calculations (even 64 bit) to be faster than any floating point operation.

You think floating point is evil, but actually it is not :) Most of the cycles (on a fixed point CPU) go into the multiply operation anyway. Normalizing the result and adding the exponents does not add that much overhead.

0 Dennis Eichmann over 9 years ago in reply to Ilmars

Guru 74080 points

Yes, actually it is not too bad, that's right. Unfortunately this is my lack of knowledge about how those operations are performed on hardware level.

0 Alexey Bagaev over 9 years ago in reply to Dennis Eichmann

Genius 5505 points

Although the MSP430 is based on very old MCU technology, some hardware additions can, in specific situations, raise its performance efficiency even above that of an MCU based on the ARM Cortex M4F, which can execute a 32-bit floating point multiplication in as little as 1 clock cycle.

0 f. m. over 9 years ago in reply to Ilmars

Guru 11940 points

You think floating point is evil, actually it is not :)

But one should keep in mind that single precision floating point is not very accurate - it has just a 24 bit mantissa. If you don't take advantage of the big "swing" of the exponent, you stand to lose, compared to 32 bit integer.

Sometimes (e.g. for FIR/IIR algorithms), single-precision can even break the implementation.
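The 24 bit mantissa limit is easy to reproduce on a PC. A minimal sketch, assuming IEEE 754 single precision and using a made-up function name: a float accumulator that is incremented by 1 stops growing once it reaches 2^24 = 16777216, while a 32 bit integer would keep counting.

```c
#include <stdint.h>

/* Adds 1.0f to a float accumulator n times and returns the sum. */
float accumulate_ones(uint32_t n)
{
    volatile float acc = 0.0f;   /* volatile: round every intermediate sum to float */
    uint32_t i;

    for (i = 0; i < n; i++) {
        acc += 1.0f;
    }
    return acc;
}
```

Calling accumulate_ones(16777226) returns 16777216.0f: the last ten additions are lost because 16777217 is not representable in single precision.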

0 Alexey Bagaev over 9 years ago

Genius 5505 points

I see someone completely disagree with my statements. That is OK. I have what I want, that's quite enough for me.
Cheers!

0 Dennis Eichmann over 9 years ago in reply to Alexey Bagaev

Guru 74080 points

Alexey Bagaev said:
I see someone completely disagree with my statements. That is OK. I have what I want, that's quite enough for me.

Alexey, what do you mean?

0 Alexey Bagaev over 9 years ago in reply to Dennis Eichmann

Genius 5505 points

Dennis Eichmann said:
Alexey, what do you mean?

This is about TI forums, not this thread in particular. Don't worry.

0 Ilmars over 9 years ago in reply to Alexey Bagaev

Guru 46710 points

Alexey Bagaev said:

Dennis Eichmann

Alexey, what do you mean?

This is about TI forums, not this thread in particular. Don't worry.

Honestly, I did not get it either. Alexey, would you please explain what you were talking about? :)

0 Alexey Bagaev over 9 years ago in reply to Ilmars

Genius 5505 points

Ilmars said:

Honestly, I did not get it either. Alexey, would you please explain what you were talking about? :)

Mostly I don't have any interest in forums, except for some questions, which are really rare. But someone decided that my help is not valuable at all (I discovered it through an indirect indicator). So I decided not to provide FREE consulting either. (I just remember some swindlers and betrayers who tried to sell rubbish consulting for around $500 by spamming their bluffing business by e-mail. They ruined my life in the past.)

I really respect TI as a company. They provide me with exceptional electronics which in most cases exceed expectations. (Of course, there are some minor issues too.) They recommended that I not get involved in these forums but connect offline instead. So this is what I did.

Alexey

0 Dennis Eichmann over 9 years ago in reply to Alexey Bagaev

Guru 74080 points

So you are talking about me not being interested in your help? If so, this is not true; it simply wasn't what I was asking for. It is surely true that using assembly might give you the fastest result, but I do not know assembly. It would be great if I did, but for the moment I'm still a C programmer. And as I also said, I'm not that experienced in how these operations are done at the hardware level. This is why I asked about the differences between the given ways of calculating the value. As it turned out, the difference was caused by my lack of knowledge regarding the compiler I use. I didn't know that 100.0 would be treated as double - now I do. But this does not mean I'm not thankful for your input.

0 Alexey Bagaev over 9 years ago in reply to Dennis Eichmann

Genius 5505 points

Dennis, this is not about you or anyone in this thread. Don't wind up.

0 Alexey Bagaev over 9 years ago in reply to Dennis Eichmann

Genius 5505 points

Just there are many people who can't repay you with anything else than evil because they are evil, but there are a few who will exceed your expectations in good because they are amazing.

0 f. m. over 9 years ago in reply to Alexey Bagaev

Guru 11940 points

Just there are many people who can't repay you with anything else than evil because they are evil, but there are a few who will exceed your expectations in good because they are amazing.

While it digresses into the area of philosophy, still a pearl of wisdom. Is it yours, or a quote ?

0 Alexey Bagaev over 9 years ago in reply to f. m.

Genius 5505 points

It is not a quote, but rather life statistics, and partly the well-known philosophical statement that evil people can repay only with evil.

0 Alexey Bagaev over 9 years ago in reply to f. m.

Genius 5505 points

f.m., by the way, on YouTube there are some documentaries about the Nazis having developed and tested nuclear weaponry in Germany before the USA. The USSR later got the documentation for developing a nuclear bomb through Russian spies, and that documentation was provided by a German ex-Nazi scientist in the USA (there is evidence given by one of the Russian nuclear bomb developers). Hitler mostly didn't use chemical and biological weaponry on the battlefield in WWII because he had been a soldier in WWI and was aware of the signed convention against using them. Later, a nuclear rocket engine was developed in the USA, and 30 years later by the USSR using the same principles and design. Today Russia is planning to use nuclear rocket engines in manned space exploration in the year 2019. No one says what will happen if a nuclear rocket engine gets damaged after activation. But everyone understands.

0 f. m. over 9 years ago in reply to Alexey Bagaev

Guru 11940 points

...on YouTube there are some documentaries about the Nazis having developed and tested nuclear weaponry in Germany before the USA.

I've read quite a lot of books about this topic before the YouTube times. And as a matter of fact, the heads of the Peenemünde team (A4/A9, ... rockets) went to the US (von Braun, Dornberger), while the second-level engineers (the actual makers) went to the Soviet Union.

And there is this rumor / "conspiracy theory" amongst non-aligned historians that Hitler&co. had nuclear weapons in fall 1944, but the top echelon sold them (and the advanced missile, jet, guidance and radar technology) to the Allied Forces for a free passage to South America.

As another matter of fact, that publicly exhibited part of "skull", allegedly from Hitler, had been DNA tested a few years ago, and found to be from a female.

To paraphrase Napoleon here: "History is the lie the powerful of this world agreed upon."

The more I learn, the more I understand how little I know ...

Today Russia is planning to use nuclear rocket engines in manned space exploration in the year 2019. No one says what will happen if a nuclear rocket engine gets damaged after activation. But everyone understands.

Agreed. But viewing "space agencies" as a front cover for military space operations, such a decision would make more sense. Just a guess...

0 Alexey Bagaev over 9 years ago in reply to Alexey Bagaev

Genius 5505 points

Alexey Bagaev said:
With an understanding of bitwise math properties you will realize that you need additional hardware to simplify such computations even further, and that simple MSP430G-series microcontrollers cannot do that.

This was my mistake. Yes, there is (at least one) algorithm that can do it in a short way and can be implemented efficiently even on MSP430G-series MCUs.

Alexey

0 Ilmars over 9 years ago in reply to Alexey Bagaev

Guru 46710 points

Alexey Bagaev said:
Yes, there is (at least one) algorithm that can do it in a short way and can be implemented efficiently even on MSP430G-series MCUs.

Unless you are building a device that will be sold in huge quantities (> 1 million), where you need to save every penny on the CPU and/or battery price, in the modern-day electronics economy it is not feasible to hand-optimize code. If compiler optimizations do not do the job and software libraries do not help either, then you had better just change your CPU to one that meets the computation requirements without any tricks. By the way, there's fierce competition in the low power segment; the MSP432 is starting so slowly that I am afraid it is losing the battle.

0 Alexey Bagaev over 9 years ago in reply to Ilmars

Genius 5505 points

Ilmars said:
If compiler optimizations do not do the job and software libraries do not help either, then you had better just change your CPU to one that meets the computation requirements without any tricks.

I still believe in tricks. Most of my complex research projects consume so little energy that they all rely on 1F super capacitors and can last from some hours to some days (with a graphical user interface, connected external sensors, wireless or wired connections, a power supply gauge, a high precision RTC and so on). TI provides me with electronics that can last for centuries on AA batteries, if it's done well. So I also try hard. :)

Alexey

0 Alexey Bagaev over 9 years ago in reply to Ilmars

Genius 5505 points

Ilmars said:
By the way there's fierce competition in low power segment, msp432 is starting so slow that I am afraid it is losing battle

I worked on a mobile phone program project which was successful in terms of performance but not power consumption. For example, performance on my 4-year-old mobile device with a 2-core ARM processor was quite good - an intensive computation benchmark was only 2-4 times slower than on my 2-core Intel notebook. But when I gave the presentation, the battery was depleted in less than half an hour. :( It was a small lesson to me.

Regards,

Alexey

0 Ilmars over 9 years ago in reply to Alexey Bagaev

Guru 46710 points

>Most of my complex research projects consume so little energy that they all rely on 1F super capacitors
Well.. researchers often do not care about price/performance and consumer product economics (note I mentioned huge quantities, thus consumer products). Often researchers/scientists unnecessarily overspend on development, perhaps just because they love to "research" and solve challenges. Disclaimer: I am talking about my experience, not you.

0 Alexey Bagaev over 9 years ago in reply to Ilmars

Genius 5505 points

Ilmars said:
Unless you are building a device that will be sold in huge quantities (> 1 million), where you need to save every penny on the CPU and/or battery price, in the modern-day electronics economy it is not feasible to hand-optimize code.

I assume my challenges can cause difficulties for me and others. But what I do relies more on intuition and intention than on economics. So, anyone, please don't try to fit it into your business plan. I am not a competitor to any of you. I do my projects alone, to fit my personal requirements, without even any moral support from outside (except from TI).

Alexey

0 Alexey Bagaev over 9 years ago in reply to Ilmars

Genius 5505 points

Ilmars said:
msp432 is starting so slow that I am afraid it is losing battle

MSP432 rev. B had many issues. Some of them were quite relevant to ultra-low-power operation: the inability of TIMER-A to work in LPM3 (it cannot be sourced from the A-clock), a voltage supervisor that consumed more than 1000 times more energy than on the MSP430, LPM0 consuming more than 4 times more power than on the MSP430, flash working with 0 wait states only up to 12 MHz, the absence of primary data buffers in some peripherals, despite having a 10% wider working voltage range, peripheral states (including flash) having to comply with strict requirements, global interrupts being disabled while flash is being written, and so on.

High performance is good, but when MSP432 claiming low power consumption, it should comply with such specific rules.

Alexey.

0 f. m. over 9 years ago in reply to Alexey Bagaev

Guru 11940 points

High performance is good, ...

Reminds me of the approach of Cortex M low-energy devices (AFAIK originating from EnergyMicro): high performance for a short duration, and then back to a deep-sleep mode with nA/uA current draw.

The MSP432 would be a good device for such an approach, having implemented MSP430-compatible peripheral blocks (another thread suggests you are already thinking about that). However, my experiences with pilot-run silicon in general are not that glorious ...

0 Alexey Bagaev over 9 years ago in reply to Ilmars

Genius 5505 points

Ilmars said:
researchers often do not care about price/performance and consumer product economics

Today I think about Pokemon GO case study, so humanity in power evaluated the value of this product in more than 25 billion US $. Probably because this is the price of indirect global espionage solutions.

0 f. m. over 9 years ago in reply to Alexey Bagaev

Guru 11940 points

Today I think about Pokemon GO case study,... . Probably because this is the price of indirect global espionage solutions.

IMHO "Facebook" would be an even better case study. It is an acknowledged fact that Zuckerberg's company was startup-financed by a CIA front-end (the agency that is "rumored" to control most of the world-wide drug traffic, to finance it's black-budget operations). Facebook's daily business might be a loss (like it's stock market launch was - for the clueless investors), but the private information shared by the naive/careless millions is priceless for the three-letter-acronym agencies.

And there is this rumor (actually more than that) that Mark Zuckerberg's name is actually not "Mark Zuckerberg", and he is a grandson of the Rockefellers ...

...so humanity in power ...

And there is an ongoing discussion in many circles if those in power are actual humans.

Again digressed quite far, no link to 32-bit / 64-bit calculations ...

0 Alexey Bagaev over 9 years ago in reply to f. m.

Genius 5505 points

f. m. said:
Again digressed quite far, no link to 32-bit / 64-bit calculations ...

Ok, let me link all of this to the main theme of this forum. When I said that my first project was 2-4 times slower on the mobile phone than on the PC, it was because the CPU clock was 50% lower and the platform was 32-bit, compared to the faster PC clock and 64-bit calculations. The algorithm of that program was officially based entirely on NSA technology for the purpose of computer security and requires a 64-bit platform for calculations by default (the variables were 64 bits wide). Some additional constants for the calculation need 256-bit variables or more in order not to lose precision, so they were just preprogrammed. What else? I switched to microcontrollers not only because they have lower latencies and lower power, but also because they don't have any operating systems. (For security reasons). :)

Alexey

0 Alexey Bagaev over 9 years ago in reply to f. m.

Genius 5505 points

f. m. said:
Again digressed quite far, no link to 32-bit / 64-bit calculations ...

PS: Raspberry Pi 3 platform is based on 64-bit platform and full of powerful peripherals, but it doesn't look interesting to me.

Alexey

0 Alexey Bagaev over 9 years ago in reply to f. m.

Genius 5505 points

f. m. said:
Zuckerberg's company was startup-financed by a CIA front-end

f.m., don't worry, as Clemens L. said earlier, "we are not the NSA, we cannot see your code". I'm just an advanced user who wants a little bit of the private life guaranteed by the constitution.

0 f. m. over 9 years ago in reply to Alexey Bagaev

Guru 11940 points

There are some "rumors" (again...) that Intel's CPUs contain certain "unwanted" cores and functionalities, similar to what you described. Including a kill switch to fuze the chip and "simulate" an EMP.

Raspberry Pi 3 platform is based on 64-bit platform and full of powerful peripherals, but it doesn't look interesting to me.

For the kind of reason you mentioned in the post before ?

I've read other reviews of the new Raspberry Pi, stating it reaches its maximum - and heavily advertised - performance only for a few seconds. Then it throttles down, for thermal reasons. I would feel cheated if I had one ... ;-)

0 Alexey Bagaev over 9 years ago in reply to f. m.

Genius 5505 points

f. m. said:

And there is an ongoing discussion in many circles if those in power are actual humans.

Again digressed quite far, no link to 32-bit / 64-bit calculations ...

I wanted to stop, but today I received an e-mail with an article honoring the success of Pokémon Go, UC Berkeley MBA graduate - John Hanke. I assume such a university hadn't given lectures in common and civil laws, if he didn't pay for that part.

0 f. m. over 9 years ago in reply to Alexey Bagaev

Guru 11940 points

I wanted to stop,...

Me too, but just can't resist ...

...the success of Pokémon Go, ...

Probably doesn't mean the many people that have themselves "nominated for the Darwin Award" ...

...UC Berkeley MBA graduate - John Hanke.

According to my (limited) experience, "business" (as in "MBA") and technical expertise are mutually exclusive. I guess hagiographers at work. His story smells like Facebook, somehow.

... lectures in common and civil laws, ...

Man's law is rather fluid, and can be changed quickly. We are going to see soon.

0 Ilmars over 9 years ago in reply to f. m.

Guru 46710 points

Let's pull this thread back to the source? :)

What about the Pokemon GO application being the most battery-hungry ever seen, even though it is not packed with 3D animation? Reviews say that the app discharges the battery in approximately an hour - even with the fancy AR mode off - but Waze, which is very similar from a graphics, GPS and communication point of view, lasts at least 4...5 times longer on the same hardware.

This happens when a manufacturer does not optimize the code/product at all and releases it in a hurry. You never know the consumer mindset behind some product's popularity :) ANY other mobile application would be thrown away in shame, but not this. Go figure.

0 f. m. over 9 years ago in reply to Ilmars

Guru 11940 points

Unfortunately, I cannot contribute much in that direction - I don't have a smartphone.

For reasons mentioned earlier in this thread. And daily seeing people using it confirms my choice ...

0 Ilmars over 9 years ago in reply to f. m.

Guru 46710 points

To contribute to the discussion about when to optimize code and how much, and what impact optimization (or the lack of it) will have on a product, you do not need to have a smartphone. You can post in this forum using a computer as well.

0 Alexey Bagaev over 9 years ago in reply to Ilmars

Genius 5505 points

Ilmars said:
ANY other mobile application would be thrown away in shame, but not this.

My application can compute the NSA's 64-bit security algorithm on a 32-bit 1 GHz single-core Cortex-A, analyzing data from an SD card at a speed of more than 100 Mbit/sec, for files of any size restricted only by the disk file system, on a 4-year-old phone. I don't feel shame about this. This program passed 9 different worldwide certifications for accessibility. (The program doesn't have rights to access any file other than the one pointed to by the user; this is restricted by the operating system.)

0 Katie Pier over 9 years ago in reply to f. m.

TI__Guru* 78271 points

Hi guys,

Let's try to keep the threads on topic - it makes it easier for people to find answers when they are searching for info about the thread topic later.
Thanks!
-Katie

0 old_cow_yellow over 9 years ago

Guru 58965 points

All is quiet now, I hope this may be a good time to reflect on your original post.

(1)

Dennis Eichmann said:
... without any math library, just to see the difference ...

I think you were using some kind of library to do them. Could you confirm?

(2)

Dennis Eichmann said:
... optimization is completely off ...

The library is pre-compiled, or pre-assembled. You cannot change its optimization level. I think you can only change the optimization of the setup code, i.e. the code for loading arguments and unloading returned values. Please try to compare code size and execution speed with different optimization levels.

(3)

Dennis Eichmann said:
... I read an ADC value of 38205 and want to calculate the percentage of the full ADC span, so one would calculate:
(38205 * 100%) / 65535 = 58.2970931....%

None of your variant A, B, or C got very close to that result. Variant B was the closest with 58.29709%. Variants A and C were unreasonably off the mark.
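Part of the gap in Variant A comes from the integer division truncating the result. A minimal sketch, assuming rounding to nearest is acceptable and using made-up helper names: adding half the divisor before dividing turns 58.29% into 58.30% for the example reading, which is the correctly rounded value of 58.2970931%. This is only an illustration in C and is unrelated to the assembly routines mentioned in this post.

```c
#include <stdint.h>

/* Variant A as posted: the division truncates (rounds down). */
uint32_t percent_x100_trunc(uint32_t adc)
{
    return (adc * 10000u) / 65535u;
}

/* Same calculation with round-to-nearest: add half the divisor first.
   For adc <= 65535 the intermediate value still fits into 32 bits. */
uint32_t percent_x100_round(uint32_t adc)
{
    return (adc * 10000u + 32767u) / 65535u;
}
```

For adc = 38205 the truncating version returns 5829 and the rounding version returns 5830; both return 0 for adc = 0 and 10000 for adc = 65535.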

I do not know how to write code in C, so I tried to use assembly code instead -- and made my assembly code callable from C. The prototypes are:

#include <stdint.h>
uint32_t ocy_mlpy50000_div65535_mlpy2(uint32_t x);
uint32_t ocy_mlpy1000000000_div65535(uint32_t x);

Where the first subroutine can be regarded as variant D. I calculated the *50000/65535 part with 16-bit shift-and-add, and the final *2 part with a 32-bit shift. It took 34 words (68 bytes) and 36 MCLKs.

The second subroutine can be regarded as variant E. I calculated the entire thing with 32-bit shift-and-add. It took 138 words (276 bytes) and 140 MCLKs.

I made a test as follows:

#include <msp430.h>
#include <stdint.h>

#include "ocy_asm4c.h"
volatile uint32_t adc_result;
volatile uint32_t A32;
volatile float Bfl;
volatile uint64_t C64;
volatile uint32_t D16;
volatile uint32_t E32;
void main(void)
{
  //adc_result = 0;                                           // test case adc=0
  //adc_result = 65535;                                       // test case adc=65535
  adc_result = 38205;                                       // test case 38205
  {
    A32 = ((adc_result * 10000) / 65535);
    Bfl = (((float)adc_result * 100.f) / 65535.f);
    C64 = (((int64_t)adc_result * 100000) / 65535);
    D16 = ocy_mlpy50000_div65535_mlpy2 (adc_result);
    E32 = ocy_mlpy1000000000_div65535 (adc_result);
    __no_operation();                                       // set breakpoint here to inspect
  }
}

For test case adc=0, all 5 variants resulted in 0%. For test case adc=65535, all 5 variants resulted in 100%.

For test case adc=38205, the results are:

variant A: 58.29%

variant B: 58.29709%

variant C: 58.297%

variant D: 58.297%

variant E: 58.2970931%

My assembly code is very simple and stupid and I use IAR-Kickstart. I did not include it here, knowing that no one would want to look at it.
