Part Number: CC1312R
Tool/software: TI C/C++ Compiler
Hi,
I am investigating the difference in the generated assembly code when a simple dispatch function is written in two versions:
- switch/case
- if/else
I use the ARM GNU Linaro 9.2.1 compiler and an Arm Cortex-M4F processor (CC1312).
For comparison, I also checked the results with the TI v20.2.1.LTS compiler.
I use optimization levels O3 (GNU) and O4 (TI), respectively.
Version 1 - switch/case
uint32_t read1(const uint32_t index) const {
    switch (index) {
        case 0:
            return field1.read1();
        case 1:
            return field2.read1();
        case 2:
            return field3.read1();
        ...
        ...
    }
    return 0;
}
When the dispatch function is written as a switch/case statement, the generated code is optimal. It compiles to a jump table (TBB or TBH) when the number of branches is high enough, or to a series of compare instructions (CMP) when the number of branches is low.
000013f8: F2008146 bhi.w unknown
000013fc: E8DFF013 tbh [pc, r3, lsl #1]
00001400: 0142 lsls r2, r0, #5
00001402: 0140 lsls r0, r0, #5
00001404: 013E lsls r6, r7, #4
00001406: 013C lsls r4, r7, #4
Version 2 - if/else
uint32_t read1(const uint32_t index) const {
    if (index == 0) {
        return field1.read1();
    }
    else if (index == 1) {
        return field2.read1();
    }
    else if (index == 2) {
        return field3.read1();
    }
    ...
    ...
    return 0;
}
Unfortunately, when the same code is rewritten as an if/else chain, it compiles to a series of compare (CMP) instructions regardless of the number of branches. This approach is very inefficient. For 20 branches it takes roughly 1.5 times as many cycles as the jump table generated for switch/case (973,134 vs. 660,117 cycles in the results below).
289 else if (index == 1) {
0000142c: 2B01 cmp r3, #1
0000142e: D06F beq unknown
292 else if (index == 2) {
00001430: 2B02 cmp r3, #2
00001432: F00080DD beq.w unknown
295 else if (index == 3) {
00001436: 2B03 cmp r3, #3
00001438: F00080E1 beq.w unknown
298 else if (index == 4) {
0000143c: 2B04 cmp r3, #4
0000143e: F00080EE beq.w unknown
The index sequence has no gaps (0,1,2,3,4,5,....), so in theory a jump table is the optimal solution for almost any number of branches.
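To illustrate what a table-based dispatch over a gap-free index range amounts to, here is a minimal hand-written sketch. It is not the code from my project: `Field`, `Device`, the stored values and the `readFieldN` helpers are hypothetical stand-ins, and a table of function pointers is an indirect call, which is not the same thing as the TBB/TBH branch table the compiler emits for switch/case.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the real field types (not shown in this post).
struct Field {
    std::uint32_t value;
    std::uint32_t read1() const { return value; }
};

struct Device {
    Field field1{10};
    Field field2{20};
    Field field3{30};

    // One small accessor per branch so that all table entries share a signature.
    static std::uint32_t readField1(const Device& d) { return d.field1.read1(); }
    static std::uint32_t readField2(const Device& d) { return d.field2.read1(); }
    static std::uint32_t readField3(const Device& d) { return d.field3.read1(); }

    std::uint32_t read1(const std::uint32_t index) const {
        using Reader = std::uint32_t (*)(const Device&);
        // Indices 0, 1, 2 map directly to table slots because there are no gaps.
        static constexpr Reader table[] = {&readField1, &readField2, &readField3};
        constexpr std::size_t count = sizeof(table) / sizeof(table[0]);

        // A single bounds check replaces the chain of comparisons.
        if (index >= count) {
            return 0;
        }
        return table[index](*this);
    }
};
```

With this layout the cost of a lookup does not grow with the number of branches, which is the behaviour I would like to get from the if/else version as well.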
The results are as follows:
/**
 * comparison - 10000 times invokes:
 *
 * No of cycles (size) - M4F (TI CC1312):
 *              -------------------------------------
 *              |      SWITCH      |       IF       |
 *              -------------------------------------
 *              |      GNU Linaro 9.2.1 - -O3       |
 * -------------------------------------------------
 * 2 elements   |     280,008      |    280,008     |
 *              |     (27873)      |    (27873)     |
 * -------------------------------------------------
 * 20 elements  |     660,117      |    973,134     |
 *              |     (28241)      |    (28273)     |
 * -------------------------------------------------
 */
Am I missing something?
I would expect the generated code to be similar for both versions.
Why does this happen?
Is there a compiler flag that deals with such situations?
What can I do to force the compiler to generate optimal code for version 2?
The if/else version is crucial for me because it lets me introduce more generalization into the project using C++ template metaprogramming.
How can I keep the if/else version and still get code based on a jump/branch table?
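To show the kind of generalization I have in mind, here is a minimal sketch, assuming C++17 is available. `Field<V>` and `Group` are hypothetical names used only for illustration. A fold expression over an index sequence expands into a chain of `index == I` comparisons, which behaves like the hand-written if/else version above.

```cpp
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <utility>

// Hypothetical field type that simply returns a compile-time constant.
template <std::uint32_t V>
struct Field {
    std::uint32_t read1() const { return V; }
};

// Hypothetical container that dispatches read1() over any number of fields.
template <typename... Fields>
struct Group {
    std::tuple<Fields...> fields;

    std::uint32_t read1(const std::uint32_t index) const {
        return dispatch(index, std::index_sequence_for<Fields...>{});
    }

private:
    template <std::size_t... I>
    std::uint32_t dispatch(const std::uint32_t index,
                           std::index_sequence<I...>) const {
        std::uint32_t result = 0;  // returned when no index matches
        // Expands to (index == 0 ? ...) || (index == 1 ? ...) || ...
        // The || short-circuits at the first match, like an if/else chain.
        (void)((index == I ? (result = std::get<I>(fields).read1(), true)
                           : false) || ...);
        return result;
    }
};
```

For example, `Group<Field<10>, Field<20>, Field<30>>` would return 20 for `read1(1)` and 0 for any index outside 0..2. The number of branches is then determined by the template arguments, which is why I cannot simply write a switch/case by hand.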
I have attached a minimal reproducible example below.
7043.main.cpp
Any help is appreciated.
/Adam