Tool/software: TI C/C++ Compiler
Hi,
I want to understand how the compiler handles an assignment between two struct variables. I have two struct variables, a and b, of the same type, and I am doing a = b.
If I look at the asm file generated by the compiler (I used C28x compiler v18.9), I see that the assignment gets translated to a couple of MOV and RPT instructions. I tried with optimization both enabled and disabled.
However, when I compile the same code using a different makefile (same compiler version), I see that it calls the memcpy function instead. The makefile is quite large and spans multiple files, so I am unable to share it.
I want to understand in which cases the struct assignment uses the memcpy function. Is it controlled by some compiler/linker flag, or are there other dependencies?
Thanks,
Veena
What you're seeing is the compiler inlining the call to memcpy. You should see a RPT over a PREAD or PWRITE instruction.
Check to see if you are using the --rpt_threshold option, which can prevent inlining of memcpy loops for large structures.
If both your source and destination structs are volatile, this will also prohibit inlining with PREAD or PWRITE.
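For illustration, here is a minimal sketch of the two forms being discussed. The `Record` type and the function names are hypothetical, since the real struct from dhry_2.c is not shown in this thread. Both functions produce the same observable result; which instructions the compiler emits for the assignment (an inline RPT copy or a call to memcpy) depends on the build options and is not something the C source decides.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record type; the thread does not show the real struct. */
typedef struct {
    int  id;
    long count;
    char tag[8];
} Record;

/* Plain struct assignment: the compiler chooses how to lower this,
   either as an inline copy or as a call to memcpy. */
void copy_by_assignment(Record *dst, const Record *src)
{
    *dst = *src;
}

/* Explicit memcpy: same observable result for two objects of one type. */
void copy_by_memcpy(Record *dst, const Record *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* Returns 1 when both copies hold the same member values as the source. */
int copies_match(void)
{
    Record src = { 7, 50000L, "dhry" };
    Record a, b;

    copy_by_assignment(&a, &src);
    copy_by_memcpy(&b, &src);

    return a.id == src.id && b.id == src.id
        && a.count == src.count && b.count == src.count
        && strcmp(a.tag, src.tag) == 0 && strcmp(b.tag, src.tag) == 0;
}
```

Compiling a file like this with --keep_asm under each set of options and comparing the generated code for `copy_by_assignment` is one way to see which lowering a given build produces.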
Hi,
I searched all the makefiles for any use of rpt_threshold and couldn't find one. I tried adding --rpt_threshold=256, but it did not help.
I also searched for the --no_rpt and -mi options and found neither. I haven't explicitly declared the variables as volatile.
Below is the console output for the build that uses memcpy:
/opt/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS//bin/cl2000 -I=/home/veena/Argo/benchmarks/ti/vxlib/src/vx/dhrystone -I=/home/veena/Argo/benchmarks/ti/vxlib/src/vx/dhrystone/c28 -I=/home/veena/Argo/benchmarks/ti/vxlib/src/common/c2x -I=/home/veena/Argo/benchmarks/ti/vxlib/src/common/c2x/driverlib -I=/opt/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS//include -D=REG=register -D=DEVICE_CLOCK -D=NO_OF_RUNS=50000 --abi=eabi --display_error_number --emit_warnings_as_errors --diag_remark=10068 --opt_level=off -g --silicon_version=28 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 --fp_mode=relaxed --gen_func_subsections=on --keep_asm --rpt_threshold=256 --preproc_with_compile --preproc_dependency=/home/veena/Argo/benchmarks/out/C28/debug/module/dhrystone/dhry_2.dep -fr=/home/veena/Argo/benchmarks/out/C28/debug/module/dhrystone/ -fs=/home/veena/Argo/benchmarks/out/C28/debug/module/dhrystone/ -ft=/home/veena/Argo/benchmarks/out/C28/debug/module/dhrystone/ -eo=.obj -fc=/home/veena/Argo/benchmarks/ti/vxlib/src/vx/dhrystone/dhry_2.c
Below is the console output from my own Makefile, which produces the RPT version:
/opt/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS//bin/cl2000 -v28 -ml -mt --keep_asm --abi=eabi --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -DCPU1 -D_LAUNCHXL_F28379D --gen_func_subsections --fp_mode=relaxed --display_error_number --diag_remark=10068 --diag_suppress=10063 --diag_warning=225 --obj_directory=objs -DDEVICE_CLOCK -DNO_OF_RUNS=50000 -I/opt/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS//include -I../../common/c2x/driverlib -I../../common/c2x -DREG=register dhry_1.c dhry_2.c ../../common/c2x/device.c ../../common/c2x/driverlib/sysctl.c c28/strcmp.asm c28/strcpy.asm -z --heap_size=0x800 --stack_size=0x400 -I/opt/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS//lib -m"dhrystone.map" ../../../concerto/c28/lnk.cmd -llibc.a -o dhry.out
I couldn't find any major difference in the compiler options.
Thanks,
Veena
Veena Kamath said: On a similar note, is there a way to inline the strcpy function with the RPT || PREAD instruction?
Use optimization --opt_level=2 or greater, use option --opt_for_speed=3 or greater, and don't forget to include string.h.
You won't get a PREAD instruction, but it will be an optimized inlined loop.
Veena Kamath said: I couldn't find any major difference in the compiler options.
The difference is the -mt (--unified_memory) option. With it, you get the RPT||PREAD; without it, you do not.
You also need -ml (--large_memory_model), but that's the default now.
Veena Kamath said: I used -O3 and --opt_for_speed=4 and I still see "LCR strcpy" in the generated assembly code.
Please show me all of the options for that test case.
I am unable to find any combination of compiler options that causes a call to strcpy to be inlined, so I filed the entry CODEGEN-5705 in the SDOWP system to have this investigated. It is not reported as a bug, because the generated code is correct. It is reported as a performance issue, since the generated code could be faster. You are welcome to follow it with the SDOWP link below in my signature.
Thanks and regards,
-George
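For readers comparing the two forms in the generated assembly, here is a minimal sketch of a library strcpy call next to a hand-written copy loop. The function names are hypothetical and are not taken from the Dhrystone sources. This only shows that the two are functionally equivalent; whether the compiler generates faster code for the loop than for "LCR strcpy" is an assumption that would need to be checked in the --keep_asm output for your options.

```c
#include <assert.h>
#include <string.h>

/* Library call: string.h is included so the compiler sees the
   standard declaration of strcpy. */
char *copy_with_strcpy(char *dst, const char *src)
{
    return strcpy(dst, src);
}

/* Hand-written equivalent: copies characters up to and including
   the terminating null, then returns the destination pointer. */
char *copy_with_loop(char *dst, const char *src)
{
    char *p = dst;

    assert(dst != NULL && src != NULL);
    while ((*p++ = *src++) != '\0') {
        /* empty body: the copy happens in the loop condition */
    }
    return dst;
}
```

As with strcpy itself, the caller must make sure the destination buffer is large enough to hold the source string and its terminator.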