
C++ vs C performance

Hi there,

Does anybody out there have measurements comparing the performance of code written in C++ against the same code written in C?

I saw the info under http://processors.wiki.ti.com/index.php/Overview_of_C++_Support_in_TI_Compilers

But I would like to know if anybody has actually benchmarked C++ performance.

Thanks,

B.

  • Moving this to the CCS forum.

  • Actually the best place for this is the Compiler forum (sorry for the double move).

  • Behzad Lajevardi said:
    I would like to know if anybody really benchmarked C++ performance.

    I'm not aware of any studies done by anyone at TI.  This is a topic with a surprising amount of depth.  The best collection of information on the topic I know is this technical report on C++ performance.  Any summary I might give can easily be challenged by someone with a different perspective.  With that in mind, I'll hazard this opinion: of course you can write C++ code that is as efficient as C.  But you have to understand the performance aspects of the code you write.  If you pay no attention to performance, you can easily get into trouble.

    Thanks and regards,

    -George

  • Hi George,

    Thanks for taking the time to respond to my question.  I briefly reviewed the articles you sent me.  I was specifically interested in numbers rather than ideas on how to program in C++.

    What I would like to know is whether anybody has compiled a piece of code as C and as C++ and compared the run-time performance of the two.  Many years ago, the common wisdom was that C++ did not perform as well as C.  Whether that has improved in recent years depends on the compiler.  So I am wondering how much slower C++ is compared to C.

    regards,

  • If you compile a C program as C++ without changing the code, you'll usually get exactly the same performance.  There are a few corner cases where the C++ language is subtly different, but the typical program will not be affected.  Some C++ features such as virtual functions are indeed expensive to implement, giving C++ a reputation for being slower than C, but if you avoid those expensive features, C++ can be as efficient as C.

  • Thanks for the clarification.

  • I've been programming in C++ on MCU platforms for over 4 years. My experience with respect to performance is:

    Usage of Language
    Usually what people associate with C++ is lots of virtual methods and lots of heap usage, because C teaching talks a lot about functions while C++ books talk a lot about "new" and "delete". So when you say "I am using C++ on an MCU", you are faced with the claim that "it has poor performance".

    If you are careful with the heap (this also applies to C) and with templates, you are mostly safe. Well-designed C++ code is much clearer than C code, so once you develop the right habits it is much less buggy.

    Sometimes it even performs better than C, because of habit. Think about a UART implementation. If the MCU has more than one UART, C developers tend to use a variable as a handle and a switch block to tell the ports apart. I have run into the following implementation many times:

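    typedef int uart_t;        // assumed: UART index used as a handle
    void uart_open0(void);     // per-port implementations, defined elsewhere
    void uart_open1(void);

    void uart_open(uart_t uart)
    {
      switch (uart)            // run-time dispatch on the handle
      {
        case 0: uart_open0(); break;
        case 1: uart_open1(); break;
      }
    }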
    

    If you differentiate with polymorphism instead, performance improves: on a call to Uart::open, the compiler loads the virtual method table (VMT) pointer (1 instruction), then loads the address of the right function (1 instruction). That's it. A switch costs a lot more than that.

    By the way, it is a bit more complicated on fast platforms that rely on a flash accelerator. The VMT read usually falls outside the cache, so keep this in mind if you call lots of virtual methods.

    Inline functions are really nice. They are the equivalent of macros, but strongly typed. I really recommend inline functions over macros: they are much easier to trace and they perform really well. The power of C++ is that if you see calling a function is the bottleneck, you just mark it inline and see the performance improvement. If you want to do that in C (at least in C89; C99 added inline), you have to work with ugly, bug-prone macros.

    Templates are really handy if you use them right. The concern here is not performance but final image size: for each instantiation (each distinct set of template arguments), the compiler generates a new copy of the class. If you keep template methods short and inline, you are in luck: the performance is really nice, comparable to C macros. The paradigm is not comparable, though; templates are superior.

    Exceptions are really nice, but on an MCU platform they add tons of checks, save call-tree information to unwind back through, etc. Do not use exceptions; use return values instead.

    To sum up, C++ is a double-edged sword. It gives you tools that help your coding, both paradigm-wise and performance-wise; however, it also makes it very easy to ruin things.

    The Compiler
    As I said, it cuts both ways. As far as I can see, compiler developers for MCU platforms spend more time on the C language; C++ is the stepchild. I was working on ARM until last year, and I had to make lots of tweaks to my library to deal with performance issues caused by TI's C++ compiler.

    TI's compiler tries to be safe with objects and performs unnecessary checks on object pointers. For instance, suppose you declare a class A that has no virtual methods, and a class B that derives from A and adds one virtual method:

    class A
    {
     //..
    };
    
    class B : public A
    {
    public:
      virtual void foo() {}   // the only virtual method: B gets a vptr, A does not
    };
    
    void playWithA(A* a)
    {
    }
    
    B b;
    int main()
    {
      playWithA(&b);
    }

    If you call a function that accepts an A* with a B* object, the compiler first checks whether the argument is NULL: if it is, it passes 0; if not, it passes the adjusted pointer. So you unknowingly add a comparison, a load and jump, and an addition before the call. :)

    This behavior is understandable: the A part sits at an offset inside B, and if &b were NULL, its cast to A* would also have to be NULL. Keil's compiler does not check this; it just calls playWithA as playWithA((A*)(((char*)&b) + 4)); and trusts the coder. I prefer this, because I never pass a NULL object through a cast to a function. Calling a function with NULL is a C habit; I am sure such a person is not good at C++. That said, there are rare cases where I pass NULL:

    Spi::shift(void* txData, void* rxData, ShortL length);

    As you can see, the parameters are void*, not class pointers. If txData is NULL, Spi just sends a predetermined constant; if rxData is NULL, Spi discards what it receives. ShortL is a 16-bit integer on 8- and 16-bit platforms and a 32-bit integer on 32-bit platforms.

    Another example concerns return value optimization: TI's compiler only applies it when the target function is not virtual.


    Value v = a.getValue();
    

    If getValue() is not virtual, the compiler allocates space on the stack, passes a reference to it to getValue, and getValue fills in the data and returns. Surprisingly, TI's compiler can even inline that function, which is very nice. However, if getValue() is virtual, the same habit of checking references shows up again: before the call, space is allocated and a reference is passed to getValue; getValue also allocates space on its own stack, fills it in, then tests whether the given reference is not NULL and calls memcpy. :)

    A simple two-word return becomes a headache. My solution to this was:

    Value v; // Value has an empty constructor, so the compiler adds no unnecessary zeroing
    a.getValue(v);
    

    Not neat, but this time the compiler works as intended, and the return value stays free for an error code, which is what I usually return.

    I made many more tweaks to my code to suit TI's compiler; TI changed my coding style. George says that unless one creates a support base, they won't change that kind of behavior. He's a kind guy, but what I understand from what he said is: suck it up. So I won't go any further into the other tweaks I had to make. :)

    Conclusion
    If you are careful and look at the generated assembly code, C++ is much more powerful than C and adds no performance degradation. With some change of paradigm, it may even help your performance.

    I hope this helps..

  • Hi Deniz,

    I really appreciate you taking the time to detail some of the advantages and disadvantages you have encountered in the last few years.

    I was wondering whether you could recommend optimization flags that you find useful for improving performance. Any dos and don'ts?

    thanks,

     

  • First of all, why are you concerned about performance? I am obsessed with performance in my core library, which has the hardware abstraction and operating system built in: all the IRQ, task switching and timing code is in it. But on the application side I don't care much. Using an operating system also helps, because low-priority tasks are preempted by high-priority tasks.

    Performance has been an issue only in rare projects, such as ones with low processing power but high throughput. I once unexpectedly had to compress/decompress audio on a CC430 while communication with other peripherals was ongoing and data was being received/transmitted over RF. I had to tweak the compression section and run a thread in the background to keep up with incoming and outgoing data. :) Apart from such singular problems, performance has not been an issue.

    BTW, I have seen people who claim C++ performs badly while using floating-point math to do simple stuff. :)

    Putting aside the "why", I usually do not like "do"s and "don't"s. I only care about what works and what does not, and I say "be careful". Sometimes a thing only creates problems, but sometimes it is the solution to another problem. So it's up to you.

    While C++ adds lots of useful stuff, such as namespaces, templates, operator overloading, member hiding, etc., at its core C++ is C with help from the compiler. It basically helps in two ways:

    1- Instead of writing doSomethingWith(object) as in C, in C++ you write object.doSomething(). You do this a lot in C: call a function with an argument to work on. Behind the scenes they are essentially the same; even the assembly output is the same, because the compiler passes the object for you. But paradigm-wise they are different, and writing object.doSomething() opens up a lot of opportunities.

    2- To dispatch on the kind of object, instead of creating a table of functions and holding a pointer to that table yourself, you let the compiler do it. Essentially only pros create function tables in C; novice programmers do it with a switch statement, and they often don't even use sequential enumerations that would let the compiler generate a jump table.

    So that's the major difference between C and C++, and I don't see any performance bottleneck in it. I've been programming in C++ nearly since I learned programming. At that time there was debate about performance, but the problem was mostly in the paradigm, not the language.

    Mostly, string handling was the problem. :) In C++, programmers use a string class, which works with the heap. For instance, in one really bad implementation, if you write s = s1 + s2;, the + operator allocates space on the heap and copies s1 and s2 into it, then the = operator also allocates space on the heap and copies the result of the + operator, and finally the + operator's copy is destroyed. That is 2 copies, 2 heap allocations and 1 heap deallocation.

    A C coder usually allocates static memory for the string, then writes strcat(strcpy(s, s1), s2); which is just copying. It is not optimal either: strcpy and strcat return s itself rather than the end of the string, so strcat has to search for the end. But it is not a big deal to write a simple copy and concatenate function that returns the end of the destination.

    So the C++ implementation cannot match the C equivalent, right? Not quite. The problem is the ugly implementation. My implementation builds a tree of operators and runs that operator tree inside the = operator.

    Think about this:
    StringBuffer<80> s;
    s = s1 + (s2 + s3);

    StringBuffer derives from String and has an 80-character buffer, so there is no heap usage. (s2 + s3) creates a StringStringAdditionOperator on the stack that holds s2 and s3 as parameters. s1 + (...) creates a StringOperatorAdditionOperator on the stack that holds s1 and the previously created operator. So there is a tree. The = operator is then fed the root StringOperator object; it calculates the position of each string and copies them into the destination.

    Of course, because of the StringOperator objects it is again not as efficient as strcat (actually it is more efficient, since strcat searches for the end). The problem with C++ is that you can overload the operator but cannot tell the compiler to reverse the order of evaluation. If it allowed that, this could be much more efficient.

    But why not this:
    s.copy(s1).append(s2).append(s3)?
    Voilà. This is always optimized. s.copy(s1) is called on s, but the appends are different: s.copy copies s1 into s and returns a StringAppender object that holds a pointer to s and to its end, so each append is very efficient. The beauty of this is that the application developer does not have to think about performance each time; the library developer provides the efficient version. Look at this; can you read what it does?

    	// Date Time, YYYYMMDDHHMM
    	UShort i;
    	if (token.getNext().isNull() || (token.getLength() != 12)
    			|| (token.getSubString(0, 4).toInteger(i) < 0))
    		return ErrorInvalidFormat;
    	DateTime dt;
    	dt.setYear(i);
    	if (token.getSubString(4, 2).toInteger(i) < 0)
    		return ErrorInvalidFormat;
    	dt.setMonth(i);
    	if (token.getSubString(6, 2).toInteger(i) < 0)
    		return ErrorInvalidFormat;
    	dt.setDay(i);
    	if (token.getSubString(8, 2).toInteger(i) < 0)
    		return ErrorInvalidFormat;
    	dt.setHour(i);
    	if (token.getSubString(10, 2).toInteger(i) < 0)
    		return ErrorInvalidFormat;
    	dt.setMinute(i);
    	dt.setSecond(0);
    

    Could you imagine writing that in C? The code does not copy anything; it just handles the data in place. Would it be more efficient in C? I don't think so. And I am pretty sure a C coder would copy each part into a buffer to work on it. Habit again. :)

    More importantly, would it be as easy to read?


    Suggestions
    - Be careful with exceptions. If they are not mandatory, avoid them.
    - Be careful adding new virtual methods: once a method is virtual, the compiler keeps it in the image (the virtual table refers to it) even if it is never called.
    - Try to handle data in place. This also applies to C.
    - In C++ they say "don't use global variables". However, on an MCU, using globals helps reduce code size and also increases performance: to access a global, the compiler just loads a constant address into a register. Otherwise, it has to read the address of the holding object first, then do a relative read to reach the member.
    - If performance is an issue when working with pointers stored in objects, try loading them into local variables. The compiler tends to re-read/re-write the same member every time, because it cannot always prove that other writes through pointers leave it unchanged. This depends on the optimization level, but I have seen it fail.

    For instance, from my past trials:

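    #include <cstddef>	// NULL

    // Declarations assumed for this excerpt; the original post shows only add().
    struct DoubleLinkItem
    {
    	DoubleLinkItem* next;
    	DoubleLinkItem* previous;
    };
    class DoubleLinkedListBase
    {
    public:
    	DoubleLinkedListBase() : items(NULL) {}
    	void add(DoubleLinkItem* item);
    private:
    	DoubleLinkItem* items;	// head of a circular list, NULL when empty
    };

    // Appends item at the tail of the circular list.
    void DoubleLinkedListBase::add(DoubleLinkItem* item)
    {
    	DoubleLinkItem* items = this->items;	// load the field once into a local
    	if (items == NULL)
    	{
    		item->next = item;
    		item->previous = item;
    		this->items = item;
    	}
    	else
    	{
    		DoubleLinkItem* p = items->previous;	// current tail
    		item->previous = p;
    		item->next = items;
    		p->next = item;
    		items->previous = item;
    	}
    }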
    

    This also applies to C. I explicitly loaded this->items into items and items->previous into p. This was from one of my templates that is called frequently. If it had not been the bottleneck, I would not have cared. :)

    I name fields without a _ prefix or similar, which is what makes the trick above possible: if I write items = this->items, items is now local. I just have to be careful when argument or local variable names are the same as field names. It is like driving on a bad road versus a smooth road: you become more careful when you use the same name. :) A _ makes names cryptic. Just a suggestion; it's up to you.

    - Be careful with the heap; C++ coders tend to use it frequently. Preallocate the space instead. C++ actually helps you here with constructors: a constructor means the compiler guarantees a predetermined method is called before you use an object, for globals even before main runs.

    - This is not an optimization but a pain reduction: if your globals use each other, do not rely on the order in which the compiler calls their constructors (across source files the order is unspecified). Assume it is random.

    As for flags: no special compiler switches are required for C++ on TI's compiler. C++ is C with compiler help.

  • Thank you very much for the detailed explanation.  The last time I wrote C or C++ was about 16-17 years ago.  Since then I have written programs in Smalltalk, Java and C#, so I am much more comfortable with OO in general.

    I had a discussion about C vs C++ performance back then.  I was told that just using a C++ compiler on some architectures made C++ code about 10 slower than C.  Many things have improved since then, so I wanted to be sure that choosing C++ for an embedded environment would not be the wrong decision.

  • Go with C++. Even if you don't use OO features, function overloading, inline functions and namespaces are reason enough to choose C++ over C.


    I remember those debates. Most of the problem was the implementation, though. Even today, just try using the standard library and see how bad it is performance-wise. Add to that the developers' character: they are difficult people, usually strongly biased on any subject.

    Now I am biased. :)