Tool/software: TI C/C++ Compiler
I would like to confirm whether we have a compiler issue or not.
The following function from our example works without issues when optimization is off:
#pragma CODE_SECTION(RFFT_f32_sincostable_TMU0, ".TI.ramfunc")
void RFFT_f32_sincostable_TMU0(RFFT_F32_STRUCT_Handle fft)
{
    float delta_phi = 0.125;
    uint16_t k = 1;
    uint16_t i, j;
    float *dst = fft->CosSinBuf;
    for (i = 3; i <= fft->FFTStages; i++)
    {
        float phi = delta_phi;
        for (j=1; j <= k; j ++)
        {
            *dst++ = __cospuf32(phi);
            *dst++ = __sinpuf32(phi);
            phi += delta_phi;
        }
        *dst++ = 0.0;
        *dst++ = 1.0;
        k = (k * 2) + 1;
        delta_phi = delta_phi * 0.5;
    }
}
However, when building with -O3 and --opt_for_speed=5, the data seems corrupted. These are the settings:
-v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --fp_mode=strict
The compiler is version TI v16.9.5.LTS.
Looking at the disassembled code, there seem to be fewer instructions between the TMU operations and the reads of their results than the pipeline requires. With the sine and cosine operators (4p instructions), we should expect 4 instructions before retrieving the results, but we are short one:
$C$L12:
        COSPUF32  R1H,R3H          ; [CPU_] |286|
        SINPUF32  R0H,R3H          ; [CPU_] |287|
        NOP                        ; [CPU_]
        NOP                        ; [CPU_]
        MOV32     *XAR5++,R1H      ; [CPU_] |286|
||      ADDF32    R3H,R3H,R2H      ; [CPU_] |289|
        MOV32     *XAR5++,R0H      ; [CPU_] |287|
The above is the result with optimization. Without optimization, the compiler adds the necessary number of NOPs and the code runs without issues.
Can you please comment?
Thank you!
Richard,
The attached code demonstrates the issue. Indeed, the number of cycles between the cos and sin instructions does not point to a pipeline issue. However, changing the optimization level does affect the results and creates unexpected errors. Any slight modification to the code changes the structure of the assembly file, so the problem easily goes away. With the attached code, however, the results are consistent: the errors can be turned on and off by changing the optimization level. I am not sure how to explain this phenomenon. I tested with no optimization versus this:
-v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5
I will take a look. Sounds like a latency might be violated on either a back branch or fall-through.
Anna
Indeed the compilation depends on controlSUITE being installed. I added the -ppo option for the compiler, but I now get an error. The console output is attached for your review. Anything I am missing?
Thanks for the help!
**** Build of configuration Debug for project TMU_Test **** "C:\\TI\\ccsv7_3\\ccsv7\\utils\\bin\\gmake" -k -j 4 all -O 'Building file: ../F2837xD_GlobalVariableDefs.c' 'Invoking: C2000 Compiler' "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_GlobalVariableDefs.c" 'Finished building: ../F2837xD_GlobalVariableDefs.c' ' ' 'Building file: ../F2837xD_Gpio.c' 'Invoking: C2000 Compiler' "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE 
--define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_Gpio.c" 'Finished building: ../F2837xD_Gpio.c' ' ' 'Building file: ../F2837xD_SysCtrl.c' 'Invoking: C2000 Compiler' "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_SysCtrl.c" 'Finished building: ../F2837xD_SysCtrl.c' ' ' 'Building file: ../main.c' 'Invoking: C2000 Compiler' "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" 
--advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../main.c" 'Finished building: ../main.c' ' ' 'Building file: ../F2837xD_GlobalVariableDefs.c' 'Building file: ../F2837xD_Gpio.c' 'Invoking: C2000 Compiler' "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_GlobalVariableDefs.c" 'Invoking: C2000 Compiler' "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" 
--include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_Gpio.c" 'Building file: ../F2837xD_SysCtrl.c' 'Invoking: C2000 Compiler' 'Building file: ../main.c' "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_SysCtrl.c" 'Finished building: ../F2837xD_GlobalVariableDefs.c' 'Finished building: ../F2837xD_Gpio.c' 'Invoking: C2000 Compiler' "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" 
--include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../main.c" ' ' ' ' 'Finished building: ../F2837xD_SysCtrl.c' ' ' 'Finished building: ../main.c' ' ' 'Building target: TMU_Test.out' 'Invoking: C2000 Linker' "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing -z -m"TMU_Test.map" --heap_size=0x400 --stack_size=0x1000 --warn_sections -i"C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/lib" -i"C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --reread_libs --diag_wrap=off --display_error_number --xml_link_info="TMU_Test_linkInfo.xml" --rom_model -o "TMU_Test.out" "./F2837xD_GlobalVariableDefs.obj" "./F2837xD_Gpio.obj" "./F2837xD_SysCtrl.obj" "./main.obj" "../2837xD_FLASH_lnk_cpu1.cmd" "../F2837xD_Headers_nonBIOS_cpu1.cmd" -lrts2800_fpu32.lib >> ERROR: no source files, nothing to do 'Finished building target: TMU_Test.out' ' ' **** Build Finished ****
Hi Anna,
I also had difficulty reproducing the issue at first. Anything we modify in the code seems to fix it. For actually confirming the errors, one has not only to maintain the original code structure, but also collect the information from the console that is being fed through printf's and populate an Excel spreadsheet to compare the results. The assembly code you show is not identical to mine, although very similar. I see that some of the registers being used are different, such as the one in the instruction MOVIZ. Here is my assembly code, collected recently:
C$L1:
008054: E801E000 MOVIZ R0, #0x3c00
29 *dst++ = __cospuf32(phi);
008056: E2790019 COSPUF32 R1H, R3H
30 *dst++ = __sinpuf32(phi);
008058: E278001C SINPUF32 R4H, R3H
28 *dst++ = phi;
00805a: E801E002 MOVIZ R2, #0x3c00
00805c: E80810C8 MOVXI R0H, #0x219
00805e: E010DB85 ADDF32 R3H, R3H, R0H
|| MOV32 *XAR5++, R3H
008060: E80810CA MOVXI R2H, #0x219
29 *dst++ = __cospuf32(phi);
008062: E2030185 MOV32 *XAR5++, R1H
008064: E2790019 COSPUF32 R1H, R3H
30 *dst++ = __sinpuf32(phi);
008066: E2780018 SINPUF32 R0H, R3H
008068: E2030485 MOV32 *XAR5++, R4H
28 *dst++ = phi;
00806a: E014DB85 ADDF32 R3H, R3H, R2H
|| MOV32 *XAR5++, R3H
29 *dst++ = __cospuf32(phi);
00806c: E2030185 MOV32 *XAR5++, R1H
32 phi += delta_phi;
00806e: E801E001 MOVIZ R1, #0x3c00
29 *dst++ = __cospuf32(phi);
008070: E279001D COSPUF32 R5H, R3H
30 *dst++ = __sinpuf32(phi);
008072: E278001C SINPUF32 R4H, R3H
008074: E2030085 MOV32 *XAR5++, R0H
32 phi += delta_phi;
008076: E80810C9 MOVXI R1H, #0x219
28 *dst++ = phi;
008078: E2030385 MOV32 *XAR5++, R3H
29 *dst++ = __cospuf32(phi);
00807a: E012DD85 ADDF32 R3H, R3H, R1H
|| MOV32 *XAR5++, R5H
30 *dst++ = __sinpuf32(phi);
00807c: E2030485 MOV32 *XAR5++, R4H
C$L2:
00807e: E2AF05BE MOV32 R5H, *--SP, UNCF
These are all my compiler options:
-v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing
Thanks!
Does the exact set of preprocessed files, options, and compiler version 16.9.5.LTS cause the errors?
Do you still get the errors if all of this is the same, only with --opt_for_speed=2 instead of --opt_for_speed=5?
Are you compiling on Windows or Linux?
I am not making modifications to the preprocessed files you provided. I am only examining the assembly for a compiler bug and am unable to run the program.
Here is the code you posted. There is nothing wrong with the loop in terms of latency violations. For each multi-cycle instruction, I have noted the number of delay slots between it and the first use of its output register. According to the pipeline information encoded in the compiler, these are all sufficient:
The ADDF32 takes 2 delay cycles prior to its use by a TMU instruction
The TMU instructions take 2 delay cycles prior to the use of their output
C$L1: delay slots before use of output register
008054: E801E000 MOVIZ R0, #0x3c00
29 *dst++ = __cospuf32(phi);
008056: E2790019 COSPUF32 R1H, R3H 5
30 *dst++ = __sinpuf32(phi);
008058: E278001C SINPUF32 R4H, R3H 7
28 *dst++ = phi;
00805a: E801E002 MOVIZ R2, #0x3c00
00805c: E80810C8 MOVXI R0H, #0x219
00805e: E010DB85 ADDF32 R3H, R3H, R0H 2
|| MOV32 *XAR5++, R3H
008060: E80810CA MOVXI R2H, #0x219
29 *dst++ = __cospuf32(phi);
008062: E2030185 MOV32 *XAR5++, R1H
008064: E2790019 COSPUF32 R1H, R3H 3
30 *dst++ = __sinpuf32(phi);
008066: E2780018 SINPUF32 R0H, R3H 6
008068: E2030485 MOV32 *XAR5++, R4H
28 *dst++ = phi;
00806a: E014DB85 ADDF32 R3H, R3H, R2H 2
|| MOV32 *XAR5++, R3H
29 *dst++ = __cospuf32(phi);
00806c: E2030185 MOV32 *XAR5++, R1H
32 phi += delta_phi;
00806e: E801E001 MOVIZ R1, #0x3c00
29 *dst++ = __cospuf32(phi);
008070: E279001D COSPUF32 R5H, R3H 4
30 *dst++ = __sinpuf32(phi);
008072: E278001C SINPUF32 R4H, R3H 4
008074: E2030085 MOV32 *XAR5++, R0H
32 phi += delta_phi;
008076: E80810C9 MOVXI R1H, #0x219
28 *dst++ = phi;
008078: E2030385 MOV32 *XAR5++, R3H
29 *dst++ = __cospuf32(phi);
00807a: E012DD85 ADDF32 R3H, R3H, R1H 2 (back edge of loop)
|| MOV32 *XAR5++, R5H
30 *dst++ = __sinpuf32(phi);
00807c: E2030485 MOV32 *XAR5++, R4H
C$L2:
00807e: E2AF05BE MOV32 R5H, *--SP, UNCF
Try single-stepping through your program to find at what point your data gets corrupted. If you can identify the exact instruction where your data gets corrupted, we can see if there is a code generation error at some other point in the program. Or there could be a memory corruption issue somewhere.
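Before single-stepping, it can help to narrow down which table entry goes wrong first. The following is a minimal sketch under stated assumptions, not code from the project: it assumes the table contents from a known-good build and from the failing build have been captured into two float arrays (for example from the printf output), and the function name first_mismatch is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Returns the index of the first entry where actual differs from
 * expected by more than tol, or n if all entries match.
 * A NaN in either array is reported as a mismatch. */
size_t first_mismatch(const float *expected, const float *actual,
                      size_t n, float tol)
{
    assert(expected != NULL && actual != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Written as !(<=) so that NaN comparisons count as a mismatch. */
        if (!(fabsf(expected[i] - actual[i]) <= tol)) {
            return i;
        }
    }
    return n;
}
```

The returned index identifies the store whose value is wrong first, which gives a concrete place in the loop to set a breakpoint.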
I was able to reproduce the error with a different preprocessed file. The difference was that phi was not saved back to *dst. The generated assembly looked like the original posted in the forum:
RPTB $C$L3,AR6 ; [CPU_] |19259|
; repeat block starts ; []
$C$L2:
.dwpsn file "main.pp",line 19261,column 13,is_stmt,isa 0
COSPUF32 R1H,R3H ; [CPU_] |19261|
.dwpsn file "main.pp",line 19262,column 13,is_stmt,isa 0
SINPUF32 R0H,R3H ; [CPU_] |19262|
NOP ; [CPU_]
NOP ; [CPU_]
.dwpsn file "main.pp",line 19261,column 13,is_stmt,isa 0
MOV32 *XAR5++,R1H ; [CPU_] |19261|
|| ADDF32 R3H,R3H,R2H ; [CPU_] |19264|
.dwpsn file "main.pp",line 19262,column 13,is_stmt,isa 0
MOV32 *XAR5++,R0H ; [CPU_] |19262|
; repeat block ends ; []
Here, there is a latency violation on the back edge of the RPTB: there should be 2 delay slots between the ADDF32 into R3H and the COSPUF32 at the top of the RPTB, but there is only 1. I have filed CODEGEN-3878 to track this issue.
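The back-edge count can be illustrated with a small host-side model. This is only a sketch of the counting convention used in this thread, not how the compiler tracks latencies internally: the Slot type, the register numbering and the function delay_slots_before_use are all hypothetical, and the loop body is a hand transcription of the RPTB listing above. The required minimum of 2 delay slots between ADDF32 and a TMU instruction is taken from the statement earlier in this post.

```c
#include <assert.h>
#include <stddef.h>

#define REG_BIT(n) (1u << (n))          /* bit for register RnH */
#define ADDF32_TO_TMU_MIN_DELAY 2       /* from the rule stated in the thread */

/* One issue slot of the loop body (parallel instructions share a slot). */
typedef struct {
    const char *text;    /* instruction text, for reference only */
    int         writes;  /* RnH written by a multi-cycle op, -1 if none */
    unsigned    reads;   /* bitmask of RnH registers read in this slot */
} Slot;

/* Hand transcription of the RPTB body (hypothetical encoding). */
static const Slot rptb_body[] = {
    { "COSPUF32 R1H,R3H",                        1, REG_BIT(3) },
    { "SINPUF32 R0H,R3H",                        0, REG_BIT(3) },
    { "NOP",                                    -1, 0 },
    { "NOP",                                    -1, 0 },
    { "MOV32 *XAR5++,R1H || ADDF32 R3H,R3H,R2H", 3,
      REG_BIT(1) | REG_BIT(2) | REG_BIT(3) },
    { "MOV32 *XAR5++,R0H",                      -1, REG_BIT(0) },
};

/* Counts the slots between the producer and the next slot that reads its
 * output register, wrapping around the back edge of the loop.
 * Returns -1 if no slot in the body reads the register. */
int delay_slots_before_use(const Slot *body, size_t n, size_t producer)
{
    assert(body != NULL && producer < n && body[producer].writes >= 0);

    unsigned reg = REG_BIT(body[producer].writes);
    for (size_t step = 1; step <= n; step++) {
        size_t idx = (producer + step) % n;   /* wrap = back edge */
        if (body[idx].reads & reg) {
            return (int)(step - 1);
        }
    }
    return -1;
}
```

For the transcribed body, delay_slots_before_use(rptb_body, 6, 4) returns 1 for the ADDF32, which is below ADDF32_TO_TMU_MIN_DELAY. The COSPUF32 and SINPUF32 in slots 0 and 1 each have 3 delay slots before their results are stored.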
Thanks,
Anna