Compiler/C2000-CGT: assignment of structs

Part Number: C2000-CGT
Other Parts Discussed in Thread: C2000WARE

Tool/software: TI C/C++ Compiler

Hi,

I want to understand how the compiler handles an assignment between two struct variables. I have two struct variables, a and b, of the same type, and I am doing a = b.

If I look at the asm file generated by the compiler (I used C28x compiler v18.9), I see that the assignment gets translated to a couple of MOV and RPT instructions. I tried with optimization both enabled and disabled.

However, when I compile the same code using a different makefile (same compiler version), I see that it uses the memcpy function instead. That makefile is quite large and spans multiple files, so I am unable to share it.

I want to understand in which cases struct assignment uses the memcpy function. Is it controlled by some compiler/linker flag, or are there other dependencies?

Thanks,

Veena
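For reference, the situation described above can be sketched as follows. This is only an illustration: the `Record` type and the two function names are hypothetical and do not come from the Dhrystone sources. Both functions copy the same object, and the C language leaves it to the compiler whether the assignment is expanded inline or turned into a memcpy call.

```c
#include <string.h>

/* Hypothetical record type, used only for illustration. */
typedef struct {
    int  id;
    long count;
    char name[16];
} Record;

/* Plain struct assignment: the compiler may expand this inline
   (for example as a block move) or emit a call to memcpy. */
void copy_by_assignment(Record *dst, const Record *src)
{
    *dst = *src;
}

/* Explicit memcpy of the same object; the members end up identical. */
void copy_by_memcpy(Record *dst, const Record *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

Compiling a file like this with --keep_asm under each set of options and comparing the generated assembly for `copy_by_assignment` is one way to narrow down which option changes the result.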

  • What you're seeing is the compiler inlining the call to memcpy.  You should see a RPT over a PREAD or PWRITE instruction.

    Check to see if you are using the --rpt_threshold option, which can prevent inlining of memcpy loops for large structures.

    If both your source and destination structs are volatile, this will also prohibit inlining with PREAD or PWRITE.

  • Hi,

    I searched all the makefiles for any use of rpt_threshold and couldn't find a reference to it. I tried adding --rpt_threshold=256. It did not help.

    I also searched for the --no_rpt and -mi options. I haven't explicitly declared the variables as volatile.

    Below is the console output for the build that uses memcpy:

    /opt/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS//bin/cl2000 -I=/home/veena/Argo/benchmarks/ti/vxlib/src/vx/dhrystone -I=/home/veena/Argo/benchmarks/ti/vxlib/src/vx/dhrystone/c28 -I=/home/veena/Argo/benchmarks/ti/vxlib/src/common/c2x -I=/home/veena/Argo/benchmarks/ti/vxlib/src/common/c2x/driverlib -I=/opt/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS//include -D=REG=register -D=DEVICE_CLOCK -D=NO_OF_RUNS=50000 --abi=eabi --display_error_number --emit_warnings_as_errors --diag_remark=10068 --opt_level=off -g --silicon_version=28 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 --fp_mode=relaxed --gen_func_subsections=on --keep_asm --rpt_threshold=256 --preproc_with_compile --preproc_dependency=/home/veena/Argo/benchmarks/out/C28/debug/module/dhrystone/dhry_2.dep -fr=/home/veena/Argo/benchmarks/out/C28/debug/module/dhrystone/ -fs=/home/veena/Argo/benchmarks/out/C28/debug/module/dhrystone/ -ft=/home/veena/Argo/benchmarks/out/C28/debug/module/dhrystone/ -eo=.obj -fc=/home/veena/Argo/benchmarks/ti/vxlib/src/vx/dhrystone/dhry_2.c

    Below is the console output from my makefile (this build uses RPT):

    /opt/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS//bin/cl2000 -v28 -ml -mt --keep_asm --abi=eabi --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -DCPU1 -D_LAUNCHXL_F28379D --gen_func_subsections --fp_mode=relaxed --display_error_number --diag_remark=10068 --diag_suppress=10063 --diag_warning=225 --obj_directory=objs -DDEVICE_CLOCK -DNO_OF_RUNS=50000 -I/opt/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS//include -I../../common/c2x/driverlib -I../../common/c2x -DREG=register dhry_1.c dhry_2.c ../../common/c2x/device.c ../../common/c2x/driverlib/sysctl.c c28/strcmp.asm c28/strcpy.asm -z --heap_size=0x800 --stack_size=0x400 -I/opt/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS//lib -m"dhrystone.map" ../../../concerto/c28/lnk.cmd -llibc.a -o dhry.out

    I couldn't find any major difference in the compiler options.

    Thanks,

    Veena

  • On a similar note, is there a way to inline the strcpy function with the RPT || PREAD instruction?

    Regards,
    Veena
  • Veena Kamath said:
    On a similar note, is there a way to inline the strcpy function with the RPT || PREAD instruction?

    Use optimization --opt_level=2 or greater, use option --opt_for_speed=3 or greater, and don't forget to include string.h. 

    You won't get a PREAD instruction, but it will be an optimized inlined loop.

  • Hi,

    I used -O3 and --opt_for_speed=4 and I still see "LCR strcpy" in the generated assembly code.

    Thanks,
    Veena
  • Veena Kamath said:
    I couldn't find any major difference in the compiler options.

    The difference is the -mt (--unified_memory) option.  With it, you get the RPT||PREAD; without it, you do not.

    You also need -ml (--large_memory_model), but that's the default now.

  • Veena Kamath said:
    I used -O3 and --opt_for_speed=4 and I still see "LCR strcpy" in the generated assembly code.

    Please show me all of the options for that test case.

  • Hi,

    This is the build command I found in the console:

    "D:/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=4 --fp_mode=relaxed --include_path="C:/Users/a0132123/workspace_v827/test4/dhrystone" --include_path="D:/Gitorious/benchmarks/ti/vxlib/src/vx/dhrystone" --include_path="C:/Users/a0132123/workspace_v827/test4/dhrystone/device" --include_path="D:/ti/c2000/C2000Ware_1_00_03_00/driverlib/f2837xs/driverlib" --include_path="D:/Gitorious/benchmarks/ti/vxlib/src/common" --include_path="D:/ti/ccsv8/tools/compiler/ti-cgt-c2000_18.9.0.STS/include" --advice:performance=all --define=DEVICE_CLOCK --define=NO_OF_RUNS=1000000 --define=REG=register --define=_LAUNCHXL_F28377S --define=CPU1 --diag_suppress=10063 --diag_warning=225 --diag_wrap=off --display_error_number --gen_func_subsections=on --abi=coffabi --disable_inlining -k --asm_listing --preproc_with_compile --preproc_dependency="dhry_1.d_raw" "../dhry_1.c"

    Thanks,
    Veena
  • I am unable to find any combination of compiler options that causes a call to strcpy to be inlined, so I filed the entry CODEGEN-5705 in the SDOWP system to have this investigated. The entry does not report a bug, because the generated code is correct. Instead, it reports a performance issue, since the generated code could be faster. You are welcome to follow it with the SDOWP link below in my signature.

    Thanks and regards,

    -George

  • Thanks George.
    I added the macro shown below. This helped me get better results using inlined code:
    #define strcpy(d,s) memcpy(d,s,sizeof(s))

    Thanks and Regards,
    Veena
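One caveat about the macro above is worth spelling out. Because the size comes from `sizeof(s)`, the macro copies the whole source object, which is only meaningful when `s` is an array (or a string literal) and the destination is at least as large. If `s` is a `char *`, `sizeof(s)` is the size of the pointer, not the length of the string. The sketch below illustrates the difference; the names `STRCPY_SIZEOF`, `STRCPY_STRLEN`, `bytes_for_array`, and `bytes_for_pointer` are hypothetical, and the `strlen`-based variant is only one possible alternative, with no claim that the compiler inlines it the same way.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the macro in the post: the size comes from sizeof(s). */
#define STRCPY_SIZEOF(d, s) memcpy((d), (s), sizeof(s))

/* Hypothetical alternative: copy the characters plus the terminator.
   Note that s is evaluated twice, so it must have no side effects. */
#define STRCPY_STRLEN(d, s) memcpy((d), (s), strlen(s) + 1)

/* Source is an array: sizeof yields the size of the whole array. */
size_t bytes_for_array(void)
{
    char src[8] = "abc";
    return sizeof(src);          /* 8, regardless of the string length */
}

/* Source is a pointer: sizeof yields the pointer size instead. */
size_t bytes_for_pointer(const char *s)
{
    return sizeof(s);            /* same as sizeof(const char *) */
}
```

In Dhrystone the strings passed to strcpy are literals copied into fixed-size buffers, which is presumably why the `sizeof` form works there; for general code the length has to come from somewhere else.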