I am using the TI C2000 compiler with the following optimization flags:
/opt/ti/ti-cgt-c2000_22.6.1.LTS/bin/cl2000 \
    --issue_remarks --gen_opt_info=2 -v28 -ml -O4 -op=3 \
    --c_src_interlist --auto_inline --verbose_diagnostics \
    --advice:performance=all --opt_for_speed=5 \
    --preproc_with_compile --keep_asm \
    -I/opt/ti/ti-cgt-c2000_22.6.1.LTS/include \
    main.cpp
However, at the `-O4` optimization level the generated code is less efficient than I expected in the two cases below, and I would like to ask whether there are any recommendations or insights for addressing them.
Static Table Optimization
In the first example, foo() indexes a static const table[]. I expected the compiler to optimize this table away and remove the unnecessary memory access. However, the generated code still loads the table address, adds the index through ACC, and reads the element from memory, even for values of a within the bounds of the table. In comparison, GCC at optimization level -O1 handles this more efficiently. Is there any way to ensure that the table is optimized away?
/*
MOVZ AR6,AL ; [CPU_ALU] |4|
MOVL XAR4,#_table$1 ; [CPU_ARAU] |6|
SETC SXM ; [CPU_ALU]
MOVL ACC,XAR4 ; [CPU_ALU] |6|
ADD ACC,AR6 ; [CPU_ALU] |6|
MOVL XAR4,ACC ; [CPU_ALU] |6|
MOV AL,*+XAR4[0] ; [CPU_ALU] |6|
LRETR ; [CPU_ALU]
*/
int foo(char a) {
static const int table[] = { 1,2,3,4,5 };
return table[a];
}
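To make concrete what I mean by optimizing the table away: for in-range indices the entries happen to satisfy table[a] == a + 1, so the lookup could in principle be replaced by arithmetic. Below is a minimal sketch of that equivalence. The names foo_table and foo_arith are hypothetical and used only for illustration, and I am not claiming that GCC or any other compiler performs exactly this rewrite.

```cpp
// Same lookup as foo() above, renamed so both versions can coexist.
int foo_table(char a) {
    static const int table[] = { 1, 2, 3, 4, 5 };
    return table[a];
}

// Hypothetical hand-written equivalent (assumption: 0 <= a <= 4).
// No table and no memory access, just an add.
int foo_arith(char a) {
    return a + 1;
}
```

The two functions agree only for indices 0 through 4. Outside that range the table version is undefined behavior, so the comparison is meaningful only for in-range values.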
Auto Inline
In the second example, read() is a trivial function that is called only once, so I expected it to be inlined, especially given the --auto_inline flag. However, the compiler does not inline it, whereas GCC at -O1 does so automatically. Is there a reason why this function is not inlined by the C2000 compiler even at -O4, and are there additional flags that would ensure it is?
I also observed that the compiler generates a call to memcpy(), apparently to copy the returned struct, which is not ideal for performance. The struct holds only two floats, which could be moved with a few simple instructions, so I was surprised to see the call. How can I avoid it, or is there a setting that optimizes this pattern better?
/*
ADDB SP,#4 ; [CPU_ARAU]
MOV PL,#65012 ; [CPU_ALU] |2|
MOVZ AR4,SP ; [CPU_ALU] |8|
MOV PH,#16180 ; [CPU_ALU] |2|
MOVL ACC,XAR6 ; [CPU_ALU] |7|
MOVL *-SP[4],P ; [CPU_ALU] |2|
SUBB XAR4,#4 ; [CPU_ARAU] |8|
MOV PL,#65012 ; [CPU_ALU] |2|
MOV PH,#16180 ; [CPU_ALU] |2|
MOVZ AR5,AR4 ; [CPU_ALU] |8|
MOVL *-SP[2],P ; [CPU_ALU] |2|
B $C$L1,EQ ; [CPU_ALU] |8|
MOVL XAR4,ACC ; [CPU_ALU] |8|
MOVB ACC,#4 ; [CPU_ALU] |8|
LCR #_memcpy ; [CPU_ALU] |8| << Whoooh!
SUBB SP,#4 ; [CPU_ARAU]
LRETR ; [CPU_ALU]
*/
struct float2 {
float2(float x, float y) : x(x), y(y) {}
float x, y;
};
// Should be auto inlined (called once)
float2 read() { return float2(0.707f, 0.707f); }
float test() {
float2 v = read();
return v.x + v.y;
}
I would appreciate any insights or suggestions on improving the optimization for these cases with the TI C2000 compiler.