Other Parts Discussed in Thread: CONTROLSUITE, C2000WARE
Tool/software: TI C/C++ Compiler
I would like to confirm whether we are seeing a compiler issue.
The following function from our example works without issues when optimization is off:
#pragma CODE_SECTION(RFFT_f32_sincostable_TMU0, ".TI.ramfunc")
void RFFT_f32_sincostable_TMU0(RFFT_F32_STRUCT_Handle fft)
{
    float delta_phi = 0.125;
    uint16_t k = 1;
    uint16_t i, j;
    float *dst = fft->CosSinBuf;
    for (i = 3; i <= fft->FFTStages; i++)
    {
        float phi = delta_phi;
        for (j = 1; j <= k; j++)
        {
            *dst++ = __cospuf32(phi);
            *dst++ = __sinpuf32(phi);
            phi += delta_phi;
        }
        *dst++ = 0.0;
        *dst++ = 1.0;
        k = (k * 2) + 1;
        delta_phi = delta_phi * 0.5;
    }
}
However, with optimization enabled (-O3 with --opt_for_speed=5), the data written to the buffer appears to be corrupted. These are the compiler options:
-v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --fp_mode=strict
The compiler version is TI v16.9.5.LTS.
Looking at the disassembly, there seem to be too few instructions between each TMU operation and the read of its result to satisfy the pipeline. Since COSPUF32 and SINPUF32 are 4p instructions, we expected four instructions before the result is retrieved, but the optimized code appears to be one short:
$C$L12:
        COSPUF32  R1H,R3H            ; [CPU_] |286|
        SINPUF32  R0H,R3H            ; [CPU_] |287|
        NOP                          ; [CPU_]
        NOP                          ; [CPU_]
        MOV32     *XAR5++,R1H        ; [CPU_] |286|
     || ADDF32    R3H,R3H,R2H        ; [CPU_] |289|
        MOV32     *XAR5++,R0H        ; [CPU_] |287|
The listing above is the result with optimization. Without optimization, the compiler inserts the necessary number of NOPs and the code runs without issues.
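For reference, the count can be made explicit with a small host-side sketch that walks the optimized listing and reports how many issue slots sit between a TMU instruction and the first read of its destination register. This is a simplified model, not a pipeline simulator: the IssueSlot type and slots_before_read function are hypothetical, the parallel pair (||) is treated as a single slot, and only the floating-point registers relevant here are tracked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One issue slot of the disassembly. A parallel pair (||) is one slot. */
typedef struct {
    const char *text;   /* instruction text as shown in the listing */
    const char *reads;  /* space-separated FPU registers the slot reads */
} IssueSlot;

/* The optimized loop body from the listing above, transcribed by hand. */
const IssueSlot optimized_loop[6] = {
    { "COSPUF32 R1H,R3H",                        "R3H" },
    { "SINPUF32 R0H,R3H",                        "R3H" },
    { "NOP",                                     "" },
    { "NOP",                                     "" },
    { "MOV32 *XAR5++,R1H || ADDF32 R3H,R3H,R2H", "R1H R3H R2H" },
    { "MOV32 *XAR5++,R0H",                       "R0H" },
};

/* Returns the number of issue slots strictly between slot `producer`
 * and the first later slot that reads `reg`, or -1 if `reg` is never
 * read after the producer. */
int slots_before_read(const IssueSlot *s, size_t n,
                      size_t producer, const char *reg)
{
    assert(s != NULL && reg != NULL && producer < n);
    for (size_t i = producer + 1; i < n; i++) {
        if (strstr(s[i].reads, reg) != NULL)
            return (int)(i - producer - 1);
    }
    return -1;
}
```

Called as slots_before_read(optimized_loop, 6, 0, "R1H") for the COSPUF32 result and slots_before_read(optimized_loop, 6, 1, "R0H") for the SINPUF32 result, the sketch reports three slots in both cases, which is the count described above as one short of four. Whether three slots satisfy the TMU pipeline would need to be checked against the instruction set reference.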
Can you please comment?
Thank you!