Ineffective access of virtual table

David Henri

Hi,

I'm coding in C++ and we defined several objects which inherit from the same interface. This interface is an abstract class which lists several pure virtual functions that need to be defined by the derived classes.

The problem I'm facing is the following: the compiler doesn't generate efficient code when calling one of those overridden virtual functions. In terms of cycles, if I declare my function virtual in my interface I measure 44 cycles; if I remove the virtual keyword (which means it is simple inheritance and the function to be called is known at compile time), it takes 4 cycles. It may not seem like much, but those 40 extra cycles represent 0.41% of CPU time (at a 16.4 kHz loop), and if I have 25 of those calls I'm losing 10.25% of my whole CPU time!

The following scenario shows what I'm doing:

class Base {
public:
    virtual void foo( ) = 0;
};

class Derived : public Base {
public:
    void foo( ) { /* Do something */ }
};

void CallFoo(Base* p) {
    p->foo( );   // Calls the derived version
}

int main( ) {
    Derived a;
    CallFoo(&a);
}

Please let me know if it is a known issue (limitation). I'm using TMS570Ls2 with compiler 4.9.0.

Thank you.

over 14 years ago

0 David Henri over 14 years ago

Intellectual 410 points

Correction,

After more investigation I realize the problem is not entirely related to the use of virtual. In fact it seems a bit worse: it is related to the use of a function pointer (which is also what the virtual table uses). I tried using a function pointer instead of calling the virtual function and I get the same bad result.

Does anyone know if there is any way (an option) to reduce the number of cycles used by a function pointer call (which seems to be a BLX assembly instruction)?

Thanks.

0 George Mock over 14 years ago in reply to David Henri

TI__Guru**** 250340 points

Go here http://www.parashift.com/c++-faq-lite/virtual-functions.html#faq-20.3 for a nice discussion of how virtual function calls are implemented, and the related overhead. The overhead is not that much, but it does include two extra memory reads. I suspect most of those 40 cycles are spent waiting on memory to provide the results of those reads. Does that make sense? If I'm right, then the only solution I see is to arrange for the data objects and associated v-tables to be in faster memory.
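To make the two extra memory reads concrete, here is a minimal hand-written model of virtual dispatch. It is only a sketch: real compilers generate the v-table layout themselves and may differ in detail, and every name in it (`Object`, `VTable`, `derivedFoo`, `callFoo`, `runDemo`) is hypothetical.

```cpp
struct Object;                       // forward declaration

struct VTable {                      // one table per class
    void (*foo)(Object* self);       // slot for the virtual function foo
};

struct Object {
    const VTable* vptr;              // hidden pointer stored in every object
    int counter;
};

// Plays the role of Derived::foo.
void derivedFoo(Object* self) { self->counter++; }

// Plays the role of the v-table the compiler would emit for Derived.
const VTable derivedVTable = { &derivedFoo };

// Rough equivalent of p->foo() when foo is virtual.
void callFoo(Object* p)
{
    const VTable* table = p->vptr;         // memory read 1: v-pointer from the object
    void (*target)(Object*) = table->foo;  // memory read 2: function address from the v-table
    target(p);                             // indirect call through a register
}

// Small driver: returns the counter after two dispatched calls.
int runDemo()
{
    Object a = { &derivedVTable, 0 };
    callFoo(&a);
    callFoo(&a);
    return a.counter;
}
```

Both reads must complete before the branch target is known, which is why the speed of the memory holding the object and the v-table matters.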

Thanks and regards,

-George

0 David Henri over 14 years ago in reply to George Mock

Intellectual 410 points

Hi George,

I understand the cost of a virtual function call (I'm doing exactly what the example in your link does). But as my second post states, I'm getting a very similar result (about 40 cycles wasted) when I do a simple function pointer call without any virtual function.

I don't know if it is poor optimization in the compiler, if it is the best the compiler can do, if it is a matter of the build options I selected, or if it is memory access time as you suggested.

Here's what I tried the second time to avoid the virtual function (it is not a replacement for the virtual function, it is only a function pointer test to see where my cycles go):

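class FuncPtrTest
{
public:
    void (FuncPtrTest::*funcPtr)(void);   // pointer to a member function
    int i;

    void FuncToPointAt(void) { i++; }

    // Standard C++ requires the qualified name to take a member function's address.
    void init(void) { i = 0; funcPtr = &FuncPtrTest::FuncToPointAt; }

    // A pointer to member must be invoked on an object with ->* (or .*).
    void Process(void) { (this->*funcPtr)(); }
};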

Thank you

0 George Mock over 14 years ago in reply to David Henri

TI__Guru**** 250340 points

At this point, your question is about the behavior of the CPU itself. You want to know why a call to a named function like "BL fxn" is so much faster than a call through a register like "BLX V9". I would move this thread to the TMS570 Forum, except I'm concerned that, what with all the C++, virtual functions, etc., your actual question would be misinterpreted, if not outright overlooked. So I recommend you start a new thread in that forum, narrowly focused on this one question about the CPU. Leave all the C++ stuff out of it for now.

Thanks and regards,

-George

0 David Henri over 14 years ago in reply to George Mock

Intellectual 410 points

George,

I didn't check whether a standard function call uses the BL or the BLX instruction. At this point I have noticed that a call through a function pointer seems to use the BLX instruction. Whether the extra cycles are due to the BLX instruction, I don't know for sure.

But since I'm in the compiler forum: why did the compiler decide to use BLX instead of BL? Also, how can I know which version of BLX is used, since there are BLX(1) and BLX(2)? Finally, the X in BLX means an exchange between instruction sets (ARM and Thumb), but how do I know which instruction set I'm using, and which part of my C code is ARM and which is Thumb?

Thanks.

0 George Mock over 14 years ago in reply to David Henri

TI__Guru**** 250340 points

Please tell me exactly which compiler options you are using. I want to be sure we are looking at the same generated code.

Thanks and regards,

-George

0 David Henri over 14 years ago in reply to George Mock

Intellectual 410 points

George,

I'm using the TMS570LS2 processor with TI ARM compiler version v4.9.0. Do you need more information?

David.

0 George Mock over 14 years ago in reply to David Henri

TI__Guru**** 250340 points

Please show me exactly how the compiler is invoked, including the build options.

Thanks and regards,

-George

0 David Henri over 14 years ago in reply to George Mock

Intellectual 410 points

George,

Here's my console output:

"C:/Program Files/Texas Instruments/CCS v5.1/ccsv5/tools/compiler/tms470/bin/cl470" -mv7R4 -mt -g -O4 --opt_for_speed=5 --symdebug:dwarf_version=3 --gcc --define=CPU_FREQ_140MHZ=1 --include_path="C:/Program Files/Texas Instruments/CCS v5.1/ccsv5/tools/compiler/tms470/include" --include_path="C:\Workspaces\controle\projects\M4-MO4\dev\V0.0.2\Tm4ToolLib\Utilities" --include_path="C:\Workspaces\controle\projects\M4-MO4\dev\V0.0.2\Tms570Lib" --include_path="C:\Workspaces\controle\projects\M4-MO4\dev\V0.0.2\Tms570Lib\Bsp" --include_path="C:\Workspaces\controle\projects\M4-MO4\dev\V0.0.2\Tms570Lib\Drivers" --diag_warning=225 --enum_type=packed --optimize_with_debug --abi=eabi --code_state=16 --float_support=VFPv3D16 --preproc_with_compile --preproc_dependency="Drivers/DrvAdc.pp" --obj_directory="Drivers" "../Drivers/DrvAdc.cpp"

Thanks

0 George Mock over 14 years ago in reply to David Henri

TI__Guru**** 250340 points

David Henri said:
why did the compiler decide to use BLX instead of BL?

I'm confident this explanation is correct, but I was not able to confirm it today. I'll chance it for now, and get back to you if I find out I made an error ...

For normal, non-indirect, function calls, the compiler issues a BL name_of_function instruction, and then the linker changes it to BLX if it turns out a change in code state (either ARM-to-Thumb or Thumb-to-ARM) is needed. An indirect call through a register must use BLX. Whether a state change is needed cannot be known until run time.
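The difference between the two kinds of call can be sketched in C++ as follows. This is an illustration only: the names (`increment`, `callDirect`, `callIndirect`, `handler`, `runBoth`) are hypothetical, and the instructions mentioned in the comments are what the explanation above leads one to expect, not verified compiler output. An optimizer may also inline the direct call entirely.

```cpp
int counter = 0;

void increment() { ++counter; }

// Direct call: the target is a fixed symbol, so the compiler can emit a
// branch to a label (BL increment), which the linker may later turn into
// BLX if a code-state change is needed.
void callDirect() { increment(); }

// Indirect call: the target is only known at run time, so its address is
// loaded into a register and branched to (BLX <register>).
// `volatile` keeps the optimizer from resolving the pointer at compile time.
void (* volatile handler)() = &increment;

void callIndirect() { handler(); }

// Runs one call of each kind and returns the resulting counter value.
int runBoth()
{
    counter = 0;
    callDirect();
    callIndirect();
    return counter;
}
```

Comparing the generated assembly for `callDirect` and `callIndirect` is one way to confirm which branch instruction each form produces with a given set of build options.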

David Henri said:
how can I know which version of BLX is used, since there are BLX(1) and BLX(2)?

I'll have to get back to you on that.

David Henri said:
Finally, the X in BLX means an exchange between instruction sets (ARM and Thumb), but how do I know which instruction set I'm using, and which part of my C code is ARM and which is Thumb?

Your command line options include both -mt and --code_state=16. Both options mean the same thing: use Thumb instructions.

Thanks and regards,

-George

0 David Henri over 14 years ago in reply to George Mock

Intellectual 410 points

George,

Two last questions:

When you say the -mt option means I'm using Thumb, does it automatically mean Thumb2?

So if I use the Thumb compiler option (-mt), is it possible that I'm still using ARM instructions through a special keyword in assembly code? From what I have been told, a very small piece of code is written in assembly with the ARM keyword. I also heard that some TI instructions/configuration have to be done in ARM mode, so the whole program cannot be only in Thumb.

Thank you very much for your time.

0 George Mock over 14 years ago in reply to David Henri

TI__Guru**** 250340 points

David Henri said:
When you say the -mt option means I'm using Thumb, does it automatically mean Thumb2?

You are building for Cortex-R4 with the build switch -mv7R4. So, yes, -mt means Thumb2. I forgot that Cortex devices support Thumb2.

David Henri said:
So if I use the Thumb compiler option (-mt), is it possible that I'm still using ARM instructions through a special keyword in assembly code?

It is possible to build some C files with -mt, and others without, then link them together. The compiler and hardware work together to handle the transitions from ARM to Thumb(2) and back. It is also possible to have a single assembly file contain both ARM and Thumb(2) code.

Thanks and regards,

-George
