I am using the TI C2000 compiler with the following optimization flags:
    /opt/ti/ti-cgt-c2000_22.6.1.LTS/bin/cl2000 \
        --issue_remarks --gen_opt_info=2 -v28 -ml -O4 -op=3 \
        --c_src_interlist --auto_inline --verbose_diagnostics \
        --advice:performance=all --opt_for_speed=5 \
        --preproc_with_compile --keep_asm \
        -I/opt/ti/ti-cgt-c2000_22.6.1.LTS/include \
        main.cpp
However, I have noticed a few inefficiencies in the code optimization behavior at the `-O4`
optimization level, and I wanted to ask if there are any recommendations or insights to address these.
Static Table Optimization
In the first example, the function foo() contains a static table[]. I expected the compiler to optimize this table away and remove the unnecessary memory access. However, the table is still read from memory, even though the value of a is within the bounds of the table. In comparison, GCC at optimization level -O1 would handle this more efficiently. Is there any way to ensure that the table is optimized away?
    /*
            MOVZ      AR6,AL                ; [CPU_ALU] |4|
            MOVL      XAR4,#_table$1        ; [CPU_ARAU] |6|
            SETC      SXM                   ; [CPU_ALU]
            MOVL      ACC,XAR4              ; [CPU_ALU] |6|
            ADD       ACC,AR6               ; [CPU_ALU] |6|
            MOVL      XAR4,ACC              ; [CPU_ALU] |6|
            MOV       AL,*+XAR4[0]          ; [CPU_ALU] |6|
            LRETR                           ; [CPU_ALU]
    */
    int foo(char a)
    {
        static const int table[] = { 1,2,3,4,5 };
        return table[a];
    }
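For comparison, here is a minimal sketch of two source-level variants that may be worth testing against the listing above. Both function names are hypothetical and not part of the original code. The first relies on the observation that table[i] equals i + 1 for the five entries shown, so it is only valid if callers really keep the index in 0..4. The second keeps the table but takes an unsigned index, on the assumption that the SETC SXM in the listing is there to sign-extend the char index. Whether either changes what cl2000 emits has not been verified here.

```cpp
// Hypothetical variants of foo(); the names are for illustration only.

// Variant 1: closed form instead of a lookup.
// Valid only because table[i] == i + 1 for 0 <= i <= 4, and only if
// the caller guarantees that range (an assumption, not checked here).
int foo_closed_form(unsigned int a)
{
    return static_cast<int>(a) + 1;
}

// Variant 2: same table, but with an unsigned index so the compiler
// does not have to sign-extend the index before forming the address.
int foo_unsigned_index(unsigned int a)
{
    static const int table[] = { 1, 2, 3, 4, 5 };
    return table[a];
}
```

Since a is a run-time value, some load is needed in the second variant in any case. Only the address computation might get shorter.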
Auto inline
In the second example, the read() function is simple and should ideally be inlined, especially given the --auto_inline flag. However, the compiler does not seem to inline this function. GCC at -O1 inlines it automatically. Is there a reason why this function is not inlined by the C2000 compiler even with -O4, and are there additional flags that can ensure this?
I also observed that the compiler generates an unnecessary call to memcpy(), which is not ideal for performance. The code only moves a few values around, which could be done with simpler instructions, so I was surprised to see the memcpy call. How can I avoid this, or is there a setting that optimizes this pattern better?
    /*
            ADDB      SP,#4                 ; [CPU_ARAU]
            MOV       PL,#65012             ; [CPU_ALU] |2|
            MOVZ      AR4,SP                ; [CPU_ALU] |8|
            MOV       PH,#16180             ; [CPU_ALU] |2|
            MOVL      ACC,XAR6              ; [CPU_ALU] |7|
            MOVL      *-SP[4],P             ; [CPU_ALU] |2|
            SUBB      XAR4,#4               ; [CPU_ARAU] |8|
            MOV       PL,#65012             ; [CPU_ALU] |2|
            MOV       PH,#16180             ; [CPU_ALU] |2|
            MOVZ      AR5,AR4               ; [CPU_ALU] |8|
            MOVL      *-SP[2],P             ; [CPU_ALU] |2|
            B         $C$L1,EQ              ; [CPU_ALU] |8|
            MOVL      XAR4,ACC              ; [CPU_ALU] |8|
            MOVB      ACC,#4                ; [CPU_ALU] |8|
            LCR       #_memcpy              ; [CPU_ALU] |8|   << Whoooh!
            SUBB      SP,#4                 ; [CPU_ARAU]
            LRETR                           ; [CPU_ALU]
    */
    struct float2 {
        float2(float x, float y) : x(x), y(y) {}
        float x, y;
    };

    // Should be auto inlined (called once)
    float2 read() { return float2(0.707f, 0.707f); }

    float test()
    {
        float2 v = read();
        return v.x + v.y;
    }
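One variant that may be worth comparing is sketched below. It rests on two assumptions. First, the listing appears to be the out-of-line body of read() rather than test(): it checks a return-slot pointer and copies 4 words, which matches two 32-bit floats in 16-bit words. Second, because read() has external linkage, the compiler has to keep such a copy even if it inlines the call. The names float2_pod, read_local and test_local are hypothetical. Whether cl2000 then drops the memcpy call has not been verified.

```cpp
// Hypothetical variant of the float2 example; names are illustrative.

// Plain aggregate: no user-provided constructor, so the object can be
// brace-initialized and copied as two floats.
struct float2_pod {
    float x, y;
};

namespace {

// Internal linkage (anonymous namespace): no other translation unit
// can call this function, so an out-of-line copy is not required
// once every call in this file has been inlined.
inline float2_pod read_local()
{
    float2_pod v = { 0.707f, 0.707f };
    return v;
}

} // namespace

float test_local()
{
    const float2_pod v = read_local();
    return v.x + v.y;
}
```

The sketch sticks to C++03 syntax on purpose, so it does not depend on newer language features being available in the toolchain.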
I would appreciate any insights or suggestions on improving the optimization for these cases with the TI C2000 compiler.