Performance: Starterware on Beaglebone vs. Microcontroller

Good morning everybody,

At the moment I am testing out Starterware on the Beaglebone. I'm wondering how much better its performance is compared to a microcontroller like the Arduino Due at 84 MHz or the STM32F4 Discovery board with a Cortex-M4 (168 MHz). I know that this question has many aspects, such as whether the controller has an FPU.

The Sitara processor on the Beaglebone, with its Cortex-A8, is clocked at 700 MHz and has an FPU, so I guess it should outperform the microcontroller boards by a wide margin.

Has anyone made a direct comparison so far?

  • I have not done a direct comparison or benchmark on the hardware, only a comparison based on the features of both. On that basis the BBB is of course much faster, although some things depend on what exactly you want to do.

    For me, the BBB offered enough power to implement a (very specific) serial protocol purely in software (by bit-banging some GPIOs), which would not have been possible on an Arduino.

  • I made a comparison between the Beaglebone+Starterware, Beaglebone+Linux and the ST Nucleo board (Cortex-M3, 72 MHz, no FPU). For example, I did some integer addition. The program looks like this:

    int i;
    int a = 0;

    while(1)
    {
        toggle_gpio(20);

        for(i=0; i<35000; i++)
        {
            a = a + i;
        }
    }

    Then I measured the frequency of that GPIO pin and got the following results:

    Nucleoboard: 32.6 Hz
    Beaglebone+Starterware: 413 Hz
    Beaglebone+Linux: 458 Hz

    Running the for-loop body just once (by setting i<1), I measured:

    Nucleoboard: 67.8 kHz
    Beaglebone+Starterware: 12.5 MHz (the GPIO reaches its maximum here; the output is no longer a clean square wave)
    Beaglebone+Linux: 6 kHz (the GPIO driver can't go faster on Linux)
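
    To compare the boards per loop iteration rather than per pin frequency, the measurements can be converted as in the sketch below. It assumes that toggle_gpio() flips the pin once per pass of the while(1) body, so one full period of the measured signal covers two passes, and it ignores the cost of the toggle itself. The function names are made up for this illustration.

```c
#include <assert.h>

/* Seconds spent in one pass of the while(1) body.
 * Assumption: the pin is flipped once per pass, so one full period of the
 * measured square wave spans two passes. */
double seconds_per_pass(double pin_hz)
{
    assert(pin_hz > 0.0);
    return 1.0 / (2.0 * pin_hz);
}

/* Average nanoseconds per for-loop iteration.
 * The time taken by the GPIO toggle itself is ignored. */
double ns_per_iteration(double pin_hz, long iterations)
{
    assert(iterations > 0);
    return seconds_per_pass(pin_hz) * 1e9 / (double)iterations;
}

/* Approximate CPU cycles per iteration for a core clock supplied by the caller. */
double cycles_per_iteration(double pin_hz, long iterations, double core_hz)
{
    return ns_per_iteration(pin_hz, iterations) * 1e-9 * core_hz;
}
```

    Under those assumptions, ns_per_iteration(32.6, 35000) gives roughly 438 ns per iteration for the Nucleo board (about 32 cycles at 72 MHz), and ns_per_iteration(413.0, 35000) gives roughly 35 ns for Starterware.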

  • Thomas Laudan said:
    Beaglebone+Starterware: 413 Hz
    Beaglebone+Linux: 458 Hz

    The value for Beaglebone+Starterware is way too low. I'd guess you haven't enabled the caches in this case? Enabling them would boost it by orders of magnitude.

  • You're right, Starterware can go much faster. I forgot to enable optimization in the compiler options. With optimization level 3, I measured around 7 kHz instead of 413 Hz with Starterware.

    Besides activating VFPv3 and Neon in the compiler options, this document

    http://www.ti.com/lit/ug/spnu151i/spnu151i.pdf
    (page 32, at the bottom, next to --neon)

    says that at least optimization level 2 is needed if you want the compiler to use the Neon unit to speed up your code.

  • I don't know whether this would make a difference for your short test, but in my application I found that optimisation level 4 gives another speed boost. I have an alive signal (a blinking LED) which became about twice as fast just by switching from level 3 to level 4.

    Unfortunately, at this level the compiler sometimes seems to damage some code, and lwIP needs to be modified slightly to get it to compile...

  • What do you mean by "lwIP needs to be modified"?

    I found out that at a high optimization level you can't really debug anymore, and in my tests I sometimes really had to watch out: once I was just computing sqrt(3.1415) 35000 times, and of course the compiler recognized that I was doing operations that served no purpose ^^
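
    One common way to keep a benchmark loop like this alive at high optimization levels is to make the accumulator volatile, so that every read and write counts as an observable side effect. This is only a minimal sketch, not the code used for the measurements in this thread; busy_sum is a made-up name, and the accumulator is unsigned so that wrap-around stays well defined.

```c
/* Sums 0..n-1 into a volatile accumulator. Accesses to a volatile object
 * must not be removed by the compiler, so the loop is still executed even
 * when the result looks unused to the optimizer. */
unsigned int busy_sum(unsigned int n)
{
    volatile unsigned int a = 0;  /* unsigned: overflow wraps, no undefined behaviour */
    unsigned int i;

    for (i = 0; i < n; i++) {
        a = a + i;                /* one volatile read and one volatile write per pass */
    }
    return a;
}
```

    Note that volatile also forces a memory access on every pass, so the loop measures something slightly different from a register-only addition.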

  • Thomas Laudan said:
    What do you mean by "lwIP needs to be modified"?

    When linking with optimisation level 4, the linker complains about different function types for malloc()/free(). The cause is simply that an "#include <stdlib.h>" is missing at the place where lwIP creates its own macros that map to malloc()/free().
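
    The kind of fix meant here can be sketched as follows. The macro and function names (my_mem_malloc, my_mem_free, copy_buffer) are hypothetical stand-ins and not the actual lwIP identifiers; the point is only that the header declaring malloc()/free() is included before the macros are used, so the calls see the real prototypes instead of implicit declarations.

```c
#include <stdlib.h>   /* declares malloc() and free() with their real prototypes */
#include <string.h>   /* memcpy() */

/* Hypothetical stand-ins for macros that map a stack's allocator
 * onto the C library heap. */
#define my_mem_malloc malloc
#define my_mem_free   free

/* Returns a heap copy of len bytes from src, or NULL if allocation fails.
 * The caller releases the copy with my_mem_free(). */
void *copy_buffer(const void *src, size_t len)
{
    void *dst = my_mem_malloc(len);

    if (dst != NULL) {
        memcpy(dst, src, len);
    }
    return dst;
}
```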

    Thomas Laudan said:
    I found out that at a high optimization level you can't really debug anymore, and in my tests I sometimes really had to watch out: once I was just computing sqrt(3.1415) 35000 times, and of course the compiler recognized that I was doing operations that served no purpose

    Yes, debugging is quite difficult at this level because you no longer step through the code as you wrote it, but as it looks after optimisation, and that looks quite different. In the worst case you can see what I meant by "sometimes the code is damaged". But in my opinion, for really time-critical applications it is worth the effort of putting up with this optimisation level, since the increase in speed is sometimes really large.

    Another possibility is to set this optimisation level for individual functions only, which can be done with

    #pragma FUNCTION_OPTIONS(timeCriticalFunctionName,"--opt_level=4 --opt_for_speed=5")
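
    In context this might look like the sketch below. The function scale_samples is a made-up example, and placing the pragma directly before the function definition is an assumption; the option string is the one quoted above. Compilers other than TI's will ignore the unknown pragma, possibly with a warning.

```c
/* TI-compiler-specific pragma: requests the given options for this one
 * function only (assumed to be placed before the definition). */
#pragma FUNCTION_OPTIONS(scale_samples, "--opt_level=4 --opt_for_speed=5")
void scale_samples(int *samples, int count, int gain)
{
    int i;

    /* Hypothetical time-critical inner loop: multiply every sample by gain. */
    for (i = 0; i < count; i++) {
        samples[i] *= gain;
    }
}
```

    The rest of the file then keeps whatever optimisation level is set in the project options, which should make it easier to debug the non-critical parts.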

    There is also a compiler option for optimising the code for cache usage, which seems useful too (but I don't have any experience with its effect on speed).