Performance: Starterware on Beaglebone vs. Microcontroller

Thomas Laudan

Good morning everybody,

At the moment I am testing Starterware on the Beaglebone. I'm wondering how much better its performance is compared to a microcontroller like the Arduino Due at 84 MHz or the STM32F4 Discovery board with a Cortex-M4 (168 MHz). I know this question has many aspects, such as whether the controller has an FPU.

The Sitara processor with its Cortex-A8 on the Beaglebone is clocked at 700 MHz and has an FPU, so I guess it should outpace the microcontroller boards enormously.

Has anyone made a direct comparison so far?

over 12 years ago

qxc over 12 years ago

Genius 5820 points

I have not done a direct comparison or benchmark on the hardware, only a comparison based on the features of both. On that basis the BBB is of course much faster; nevertheless, some things depend on what exactly you want to do.

For me the BBB offered enough power to implement a (very specific) serial protocol purely in software (by bit-banging some GPIOs), which would not have been possible on an Arduino.

Thomas Laudan over 12 years ago in reply to qxc

Prodigy 75 points

I made a comparison between Beaglebone+Starterware, Beaglebone+Linux, and the ST Nucleo board (Cortex-M3, 72 MHz, no FPU). As an example, I did some integer addition. The program goes like this:

int i;
int a = 0;

while (1)
{
    toggle_gpio(20);    /* flip the output pin once per pass */

    /* workload being timed: 35000 integer additions */
    for (i = 0; i < 35000; i++)
    {
        a = a + i;
    }
}

Then I measured the frequency of that GPIO pin and got the following results:

Nucleo board: 32.6 Hz
Beaglebone+Starterware: 413 Hz
Beaglebone+Linux: 458 Hz

Running the for loop just once per pass (condition i<1), I measured:

Nucleo board: 67.8 kHz
Beaglebone+Starterware: 12.5 MHz (the GPIO reaches its maximum; the output is no longer a clean square wave)
Beaglebone+Linux: 6 kHz (the GPIO driver can't go faster on Linux)

qxc over 12 years ago in reply to Thomas Laudan

Genius 5820 points

Thomas Laudan said:
Beaglebone+Starterware: 413 Hz
Beaglebone+Linux: 458 Hz

The value for Beaglebone+Starterware is way too low. I'd guess you haven't enabled the caches in this case? That would boost it by some orders of magnitude.

Thomas Laudan over 12 years ago in reply to qxc

Prodigy 75 points

You're right. Starterware can go much faster. I forgot to enable optimization in the compiler options. Using level 3, I measured around 7 kHz instead of 413 Hz with Starterware.

Besides activating VFPv3 and Neon in the compiler options, note that this document

http://www.ti.com/lit/ug/spnu151i/spnu151i.pdf
(page 32 at the bottom, next to --neon)

states that at least optimization level 2 is needed if you want the compiler to use the Neon FPU to speed up your code.

qxc over 12 years ago in reply to Thomas Laudan

Genius 5820 points

I don't know if this would make a difference for your short test, but in my application I found that optimisation level 4 gives another speed boost. There I have an alive signal (blinking LED) which became about two times faster just by switching from level 3 to 4.

Unfortunately, at this level the compiler sometimes seems to damage some code, and lwIP needs to be modified slightly to get it to compile...

Thomas Laudan over 12 years ago in reply to qxc

Prodigy 75 points

What do you mean by "lwIP needs to be modified"?

I found out that at a high optimization level you can't really debug anymore, and in my tests I sometimes really had to watch out: once I was just doing sqrt(3.1415) 35000 times, and of course the compiler recognized that I was doing pointless operations whose results were never used ^^

qxc over 12 years ago in reply to Thomas Laudan

Genius 5820 points

Thomas Laudan said:
What do you mean by "lwIP needs to be modified"?

During linking with optimisation level 4, it complains about different function types for malloc()/free(). This is simply because an "#include <stdlib.h>" is missing at the place where lwIP creates its own macros that map to malloc()/free().

Thomas Laudan said:
I found out that at a high optimization level you can't really debug anymore, and in my tests I sometimes really had to watch out: once I was just doing sqrt(3.1415) 35000 times, and of course the compiler recognized that I was doing pointless operations whose results were never used

Yes, debugging is quite difficult at this level because you no longer step through the code as you wrote it but as it looks after optimisation, and that looks quite different. And in the worst case you can see what I meant by "sometimes the code is damaged". But in my opinion, for really time-critical applications it is worth the effort of putting up with this optimisation level, since the increase in speed is sometimes really high.

Another possibility is to set this optimisation level for some functions only, which can be done with

#pragma FUNCTION_OPTIONS(timeCriticalFunctionName,"--opt_level=4 --opt_for_speed=5")

There is also a compiler option for optimising the code for cache usage, which seems to be useful too (but I have no experience with its effect on speed).
