Hello,
I have migrated a project for the F28335 DSP from CCS3.3 to the new CCS6.
But I noticed something strange. Although I migrated everything, including the necessary .lib files, the operations involving floating-point values became a few microseconds slower.
I have the compiler optimized for speed, use the v6.4.2 C2000 compiler, and link rts2800_fpu32_fast_supplement.lib, rts2800_fpu32.lib and IQmath_fpu32.lib.
Do I need some other .lib, or a particular compiler/linker option enabled?
Best regards, Jorge
Hi Vishal,
Thank you for the tips, I'll try that and report back.
Regarding the math routines, I'm running them from RAM.
Best regards, Jorge Lopes
Do you have interrupts running? The way I benchmark these functions is to turn on the clock feature in CCS (I think you will find it under the Run menu in the Debug perspective). The clock shows up as a little clock icon in the bottom right corner of the CCS window. Then I run the code up to the actual call of the function. So if I were benchmarking the divide function, I would keep stepping until this point:
LCR #_FS$$DIV
or something like that; the syntax may not be exactly correct. Once I'm at the call instruction, I reset the clock by double-clicking the clock icon, then step over the LCR instruction and check the cycle count. Can you try this and see whether you get the cycle counts listed in the user's guide? If you have interrupts running, disable them and try again to make sure the interrupts aren't responsible for the slowdown.
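If you would rather take the measurement in code than with the CCS clock, one common approach is to read a free-running CPU timer immediately before and after the call, with interrupts disabled. This is a hedged sketch: the on-target part (shown only as a comment) assumes the TI DSP2833x header files (CpuTimer0Regs, DINT/EINT) and a CPU Timer 0 configured with period 0xFFFFFFFF and no prescale, so one tick equals one SYSCLKOUT cycle. The helper names elapsed_down and net_ticks are hypothetical; only the tick arithmetic is meant to be portable.

```c
#include <assert.h>
#include <stdint.h>

/* CPU Timer 0 on the C28x counts DOWN. Assuming a period of 0xFFFFFFFF,
 * unsigned (modulo 2^32) subtraction gives the elapsed ticks even if
 * the counter wrapped between the two reads. */
static uint32_t elapsed_down(uint32_t start, uint32_t end)
{
    return start - end;
}

/* Subtract the cost of the timing code itself, measured once by timing
 * an empty region. A result below the overhead indicates a measurement bug. */
static uint32_t net_ticks(uint32_t start, uint32_t end, uint32_t overhead)
{
    uint32_t t = elapsed_down(start, end);
    assert(t >= overhead);
    return t - overhead;
}

/* On-target usage (sketch, assumes DSP2833x headers; not compiled here):
 *
 *   DINT;                              // mask interrupts
 *   t0 = CpuTimer0Regs.TIM.all;
 *   y  = sqrt(x);
 *   t1 = CpuTimer0Regs.TIM.all;
 *   EINT;                              // re-enable interrupts
 *   cycles = net_ticks(t0, t1, overhead);
 */
```

Disabling interrupts around the two reads serves the same purpose as the advice above: an ISR firing in between would be counted as part of the function.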
Hello Lori,
I have updated the compiler to version 6.4.6, but that didn't fix it.
I've tried different versions of the compiler with similar results.
Best regards, Jorge Lopes
Hello Vishal, how are you?
No, I haven't solved my problem.
I've made some extra tests to get more information about the situation.
I made a simple routine that runs math functions inside an interrupt and compiled it in CCS v3.3 and in CCS v6.1:
sqrt(sin(2.3f));
sqrt(sin(3.4f));
sqrt(sin(5.6f));
sqrt(sin(6.7f));
sqrt(sin(7.8f));
sqrt(sin(8.9f));
sqrt(sin(10.11f));
sqrt(asin(2.3f));
sqrt(atan(3.4f));
sqrt(cos(5.6f));
sqrt(6.7f/45.6f);
sqrt(acos(7.8f));
sqrt(atan(8.9f));
sqrt(exp(10.11f));
I've tested the execution time for this routine and got the following results:
Execution time in CCS v3.3 = 13 µs
Execution time in CCS v6.1 = 30 µs
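To compare these numbers with the cycle counts in the fastRTS user's guide, it helps to convert them to CPU cycles. The thread does not state the clock frequency, so this sketch assumes the F28335's maximum SYSCLKOUT of 150 MHz; at that rate 13 µs is 1950 cycles and 30 µs is 4500 cycles, about 2.3 times as many. The function name us_to_cycles is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a measured time in microseconds to CPU cycles.
 * ASSUMPTION: sysclk_mhz is the actual core clock (150 MHz is the
 * F28335 maximum; the real project may run slower). */
static uint32_t us_to_cycles(double us, double sysclk_mhz)
{
    assert(us >= 0.0 && sysclk_mhz > 0.0);
    return (uint32_t)(us * sysclk_mhz + 0.5);  /* round to nearest cycle */
}
```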
I have the following configuration on the CCS v3.3 project:
Compiler: v5.0.1
Compiler options:
-pm -pdsw225 -pden -o3 -fr"$(Proj_dir)\DSP\2833x_FLASH" -i"$(Proj_dir)\DSP\include" -i"$(Proj_dir)\DSP\librarias\lib\PowerLIB" -i"$(Proj_dir)\SOURCE\include" -d"_DEBUG" -d"FLASH" -mf -v28 --float_support=fpu32
Linker options:
-b -c -m".\DSP\2833x_FLASH\Conversor.map" -o".\DSP\2833x_FLASH\Conversor.out" -stack0x400 -w -i"$(Proj_dir)\DSP\lib" -priority
Linker order:
rts2800_fpu32_fast_supplement.lib
F28335_FLASH_Conversor.cmd
I have the following configuration on the CCS v6.1 project:
Compiler: v6.4.6
Compiler options:
-v28 --float_support=fpu32 --vcu_support=vcu0 -O2 --opt_for_speed=2 --fp_reassoc=off --fp_mode=relaxed --include_path="C:/ti/ccsv6/tools/compiler/ti-cgt-c2000_6.4.6/include" --include_path="C:/SVN/dsp/branches/_CC6/DSP/include" --include_path="C:/SVN/dsp/branches/_CC6/SOURCE/include" -g --float_operations_allowed=all --define="_DEBUG" --define="FLASH" --diag_warning=225 --display_error_number --issue_remarks --diag_wrap=on --c_src_interlist --obj_directory="C:/SVN/dsp/branches/_CC6/_DSP/F2833x_FLASH"
Linker options:
-v28 --float_support=fpu32 --vcu_support=vcu0 -O2 --opt_for_speed=2 --fp_reassoc=off --fp_mode=relaxed -g --float_operations_allowed=all --define="_DEBUG" --define="FLASH" --diag_warning=225 --display_error_number --issue_remarks --diag_wrap=on --c_src_interlist --obj_directory="C:/SVN/dsp/branches/_CC6/_DSP/F2833x_FLASH" -z -m"C:/SVN/dsp/branches/_CC6/DSP/2833x_FLASH/Conversor.map" --stack_size=0x400 --warn_sections -i"C:/ti/ccsv6/tools/compiler/ti-cgt-c2000_6.4.6/DSP/lib" -i"C:/ti/ccsv6/tools/compiler/ti-cgt-c2000_6.4.6/lib" -i"C:/ti/ccsv6/tools/compiler/ti-cgt-c2000_6.4.6/include" -i"C:/SVN/dsp/branches/_CC6/_DSP" -i"C:/SVN/dsp/branches/_CC6" --priority --reread_libs --display_error_number --xml_link_info="_DSP_linkInfo.xml" --no_sym_merge --rom_model
Linker order:
rts2800_fpu32_fast_supplement.lib
rts2800_fpu32.lib
IQmath_fpu32.lib
Hi Jorge,
This is probably caused by differences in the compiler that make the code bloat. To be sure, you need to benchmark the actual square root function itself and not the whole ISR. In my previous post I suggested using the CCS clock to do this. You would enable the clock (Run -> Clock) and then, in the disassembly window, run to the point where it calls the square root function:
LCR #_sqrt
Then you would do two things:
1. Make sure it's actually calling the fastRTS library.
2. Double-click the clock icon (bottom right corner of CCS) to reset it, then step over this line (F6) and see how many cycles it takes and whether that matches what the fastRTS user's guide says. That way we rule out the math routines as the cause.
If the execution time for a single sqrt matches the user's guide numbers, then it's probably the newer compiler that is generating more code than before.
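Step 2 amounts to a simple comparison once the cycles are measured. This sketch is hypothetical: guide_cycles stands for whatever figure the fastRTS user's guide lists for the function (not reproduced here), and tolerance_pct is an arbitrary allowance for call overhead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the measured cycle count is no more than tolerance_pct
 * percent above the user's guide figure. A result far above the guide
 * figure suggests the call is not resolving to the fastRTS version. */
static bool matches_guide(uint32_t measured_cycles, uint32_t guide_cycles,
                          uint32_t tolerance_pct)
{
    assert(guide_cycles > 0u);
    /* 64-bit products avoid overflow for large counts. */
    return (uint64_t)measured_cycles * 100u <=
           (uint64_t)guide_cycles * (100u + tolerance_pct);
}
```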
Jorge,
Also, did you check the .map file? Can you post it for us to review?
Thank you
Lori
Yes, I can:
For the v6.4.6 compiler:
For the v5.0.1 compiler:
Hello Lori and Vishal,
That was the problem indeed.
I removed rts2800_fpu32.lib from the project and got the same execution times in both versions of CCS.
The linker was pulling the math functions from the wrong .lib for this project.
Thank you very much!
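Removing a library changed the timing because the standard RTS library and the fast supplement both define functions such as sqrt, and only one definition gets linked. The TI linker's --priority option, used in both projects above, documents a first-definition-wins rule: an unresolved reference is satisfied by the first library that contains a definition for it. The toy model below illustrates only that rule. The library contents are made up for the example, the names struct lib and resolve are hypothetical, and it is not a reconstruction of what the linker did in this project (the .map file is the authoritative record of which library each function came from).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not the real linker: a library is a name plus the
 * symbols it defines. Contents used with it are hypothetical. */
struct lib {
    const char *name;
    const char *const *syms;
    size_t nsyms;
};

/* Return the name of the first library, in search order, that defines
 * sym (the rule documented for --priority), or NULL if none does. */
static const char *resolve(const struct lib *libs, size_t nlibs,
                           const char *sym)
{
    assert(sym != NULL);
    for (size_t i = 0; i < nlibs; ++i)
        for (size_t j = 0; j < libs[i].nsyms; ++j)
            if (strcmp(libs[i].syms[j], sym) == 0)
                return libs[i].name;
    return NULL;
}
```

In this model, whichever library is searched first supplies sqrt; if the standard library comes first, its version wins even though the fast supplement also defines one, and removing it leaves the fast supplement as the only provider.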