Compiler/TMS320F28379D: TMU pipeline issue when compiling with optimization level different than zero

Part Number: TMS320F28379D
Other Parts Discussed in Thread: CONTROLSUITE, C2000WARE

Tool/software: TI C/C++ Compiler

I would like to confirm whether we have a compiler issue or not. 

The following function from our example works without issues when optimization is off:

#pragma CODE_SECTION(RFFT_f32_sincostable_TMU0, ".TI.ramfunc")
void RFFT_f32_sincostable_TMU0(RFFT_F32_STRUCT_Handle fft)
{
    float delta_phi = 0.125;
    uint16_t k = 1;
    uint16_t i, j;
    float *dst = fft->CosSinBuf;

    for (i = 3; i <= fft->FFTStages; i++)    {
        float phi = delta_phi;

        for (j=1; j <= k; j ++)
        {
            *dst++ = __cospuf32(phi);
            *dst++ = __sinpuf32(phi);

            phi += delta_phi;
        }

        *dst++ = 0.0;
        *dst++ = 1.0;

        k = (k * 2) + 1;

        delta_phi = delta_phi * 0.5;
    }
}

However, with optimization enabled (-O3 with --opt_for_speed=5), the data seems corrupted. These are the settings:

-v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --fp_mode=strict

The compiler is version TI v16.9.5.LTS. 

Looking at the disassembled code, there seem to be fewer instructions than are needed to avoid violating the pipeline. With the sine and cosine instructions (4p instructions), we should expect 4 instructions before retrieving the results, but we are one short:

$C$L12:    
        COSPUF32  R1H,R3H               ; [CPU_] |286| 
        SINPUF32  R0H,R3H               ; [CPU_] |287| 
        NOP       ; [CPU_] 
        NOP       ; [CPU_] 

        MOV32     *XAR5++,R1H           ; [CPU_] |286| 
||      ADDF32    R3H,R3H,R2H           ; [CPU_] |289| 

        MOV32     *XAR5++,R0H           ; [CPU_] |287| 

The above is the result with optimization. Without optimization, the compiler adds the necessary number of NOPs and the code runs without issues.

Can you please comment?

Thank you!

  • Lenio,

    Both SINPUF32 and COSPUF32 require 4 pipeline cycles, which is achieved with three single-cycle instructions between the instruction and the first use of its destination register. Here is an example for COSPUF32 from the TMU user's guide:

    COSPUF32 R2H,R1H ; R2H = COSPU(fraction(R1H))
    NOP ; pipeline delay
    NOP ; pipeline delay
    NOP ; pipeline delay
    MOV32 @CosValue,R2H ; Cos Value=cos(Radian Value)

    The three intervening NOPs between COSPUF32 and R2H being used in MOV32 guarantee 4 pipeline slots. In the disassembly code in your post there are three instructions (i.e. four pipeline cycles) between COSPUF32 and the instruction which next uses its destination register:

    COSPUF32 R1H,R3H
    SINPUF32 R0H,R3H ; <-- cycle 1
    NOP ; <-- cycle 2
    NOP ; <-- cycle 3

    MOV32 *XAR5++,R1H ; <-- cycle 4
    || ADDF32 R3H,R3H,R2H

    MOV32 *XAR5++,R0H

    The same is true for SINPUF32 (destination register R0H). To me it looks OK. I think the problem is somewhere else.

    Regards,

    Richard
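
The counting rule described above (three issue slots between a 4-cycle TMU instruction and the first read of its destination register) can be checked mechanically. The following is only a minimal sketch: the `Slot` record, `slot_reads`, `delay_slots` and the `posted_loop` table are hypothetical names made up for illustration, not part of any TI tool. A parallel pair such as `MOV32 || ADDF32` is modeled as a single issue slot, and the scan is linear, so it does not look across the loop's back branch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified model of one issue slot (illustration only). */
typedef struct {
    const char *text;    /* mnemonic(s), kept for readability only */
    const char *dst;     /* register written, "" if none */
    const char *src[3];  /* registers read, "" if unused */
} Slot;

/* Returns 1 if the slot reads the given (non-empty) register name. */
static int slot_reads(const Slot *s, const char *reg)
{
    for (int i = 0; i < 3; i++)
        if (strcmp(s->src[i], reg) == 0)
            return 1;
    return 0;
}

/* Number of issue slots strictly between code[producer] and the first
   later slot that reads its destination register.
   Returns -1 if the producer writes no register or is never read. */
int delay_slots(const Slot *code, int n, int producer)
{
    assert(producer >= 0 && producer < n);
    if (code[producer].dst[0] == '\0')
        return -1;
    for (int i = producer + 1; i < n; i++)
        if (slot_reads(&code[i], code[producer].dst))
            return i - producer - 1;
    return -1;
}

/* The loop body from the first post, one entry per issue slot. */
static const Slot posted_loop[] = {
    { "COSPUF32 R1H,R3H",     "R1H", { "R3H", "",    ""    } },
    { "SINPUF32 R0H,R3H",     "R0H", { "R3H", "",    ""    } },
    { "NOP",                  "",    { "",    "",    ""    } },
    { "NOP",                  "",    { "",    "",    ""    } },
    { "MOV32 R1H || ADDF32",  "R3H", { "R1H", "R3H", "R2H" } },
    { "MOV32 *XAR5++,R0H",    "",    { "R0H", "",    ""    } },
};
```

With this table, `delay_slots(posted_loop, 6, 0)` and `delay_slots(posted_loop, 6, 1)` both evaluate to 3, which agrees with the cycle count given above for COSPUF32 and SINPUF32.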
  • Richard,

    The attached code demonstrates the issue. Indeed, the number of cycles between the cos and sin instructions does not explain a pipeline issue. However, changing the optimization level does affect the results and creates unexpected errors. Any slight modification to the code changes the structure of the assembly file, so the problem easily goes away. With the attached code, however, the results are consistent: the errors can be turned on and off according to the optimization level. I am not sure how to explain this phenomenon. I tested with no optimization versus this:

    -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5

    TMU_Test.zip

  • Lenio,

    Thank you for submitting a compact test case. I have been able to reproduce the issue, which is present also in the latest CGT version. It appears to be an issue relating to pointer indexing and timing, but I need to get it into the hands of the compiler team. I'm going to move this post to the C compiler forum.

    I believe if you extend the repeat loop by adding a NOP at the end it will fix the issue. Your inner loop will then look like this:

    for (uint16_t j = 1; j <= k; j++)
    {
        *dst++ = __cospuf32(phi);
        *dst++ = __sinpuf32(phi);

        phi += delta_phi;
        asm(" NOP");
    }

    Could you try that and let me know if it works? It pushes out the loop by an additional cycle but on my machine fixes the issue. Thanks.

    Regards,

    Richard
  • Hi Richard,

    Thanks for looking into this. I have tried your suggestion and it does fix the problem. Actually, it is harder to make the problem manifest itself than it is to fix it. Any assembly instruction inserted between the sin and cos commands makes the issue go away. I look forward to hearing from the compiler team, as we are curious to understand what the problem really is.

    Thank you!
  • I will take a look.  Sounds like a latency might be violated on either a back branch or fall-through.

    Anna

  • I can't compile the attached test case because the include files are missing. Please submit preprocessed source files. Compile with -ppo and send the *.pp files.

    Thanks,
    Anna
  • Indeed the compilation depends on controlSUITE being installed. I added the -ppo option for the compiler, but I now get an error. The console output is attached for your review. Anything I am missing?

    Thanks for the help!

    TMU_test console dump.txt
    **** Build of configuration Debug for project TMU_Test ****
    "C:\\TI\\ccsv7_3\\ccsv7\\utils\\bin\\gmake" -k -j 4 all -O
    'Building file: ../F2837xD_GlobalVariableDefs.c'
    'Invoking: C2000 Compiler'
    "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_GlobalVariableDefs.c"
    'Finished building: ../F2837xD_GlobalVariableDefs.c'
    ' '
    'Building file: ../F2837xD_Gpio.c'
    'Invoking: C2000 Compiler'
    "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_Gpio.c"
    'Finished building: ../F2837xD_Gpio.c'
    ' '
    'Building file: ../F2837xD_SysCtrl.c'
    'Invoking: C2000 Compiler'
    "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_SysCtrl.c"
    'Finished building: ../F2837xD_SysCtrl.c'
    ' '
    'Building file: ../main.c'
    'Invoking: C2000 Compiler'
    "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../main.c"
    'Finished building: ../main.c'
    ' '
    'Building file: ../F2837xD_GlobalVariableDefs.c'
    'Building file: ../F2837xD_Gpio.c'
    'Invoking: C2000 Compiler'
    "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_GlobalVariableDefs.c"
    'Invoking: C2000 Compiler'
    "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_Gpio.c"
    'Building file: ../F2837xD_SysCtrl.c'
    'Invoking: C2000 Compiler'
    'Building file: ../main.c'
    "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../F2837xD_SysCtrl.c"
    'Finished building: ../F2837xD_GlobalVariableDefs.c'
    'Finished building: ../F2837xD_Gpio.c'
    'Invoking: C2000 Compiler'
    "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing "../main.c"
    ' '
    ' '
    'Finished building: ../F2837xD_SysCtrl.c'
    ' '
    'Finished building: ../main.c'
    ' '
    'Building target: TMU_Test.out'
    'Invoking: C2000 Linker'
    "C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --preproc_only --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing -z -m"TMU_Test.map" --heap_size=0x400 --stack_size=0x1000 --warn_sections -i"C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/lib" -i"C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --reread_libs --diag_wrap=off --display_error_number --xml_link_info="TMU_Test_linkInfo.xml" --rom_model -o "TMU_Test.out" "./F2837xD_GlobalVariableDefs.obj" "./F2837xD_Gpio.obj" "./F2837xD_SysCtrl.obj" "./main.obj" "../2837xD_FLASH_lnk_cpu1.cmd" "../F2837xD_Headers_nonBIOS_cpu1.cmd" -lrts2800_fpu32.lib
    >> ERROR: no source files, nothing to do
    'Finished building target: TMU_Test.out'
    ' '
    **** Build Finished ****

  • No, I believe that's the common error message when you do preprocess only because it doesn't actually compile. Look in the directory (probably Debug) and you should see files with the same source names but .pp extensions.
  • Ok, let me know if the attached does the trick!

    TMU_test-pp.zip

  • Can you tell me if the behavior is correct when --opt_for_speed=2 instead of 5 but keeping the optimization level the same (3)?
    That is with these flags:
    -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=2

    I know you said it works when optimization is disabled-- I'm assuming that means with -Ooff?
    Thanks,
    Anna
  • I'm a little baffled. When I compile with v.16.9.5.LTS with the options listed above:
    -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --fp_mode=strict

    I do not see assembly that matches the disassembly you posted above. There are no NOPs at all. At --opt_for_speed=5 we unroll the loop and have multiple iterations in a single repeat block. I've analyzed the assembly for this function with -O3 --opt_for_speed=5 and -O3 --opt_for_speed=2 (the default opt_for_speed setting) and I don't see any latency violations including on the back branch of the repeat block.

    I would like to make sure I have the same options you are using, please verify if your test case produces this assembly for the options provided. You can add -s to keep the source-interlisted assembly file.


    Here is the assembly for -O3 --opt_for_speed=2 for the repeat block:

    ; repeat block starts ; []
    $C$L1:
    .dwpsn file "main.c",line 18242,column 13,is_stmt,isa 0
    MOVIZ R2H,#15360 ; [CPU_] |18242|
    .dwpsn file "main.c",line 18239,column 13,is_stmt,isa 0
    COSPUF32 R1H,R3H ; [CPU_] |18239|
    .dwpsn file "main.c",line 18240,column 13,is_stmt,isa 0
    SINPUF32 R0H,R3H ; [CPU_] |18240|
    .dwpsn file "main.c",line 18238,column 13,is_stmt,isa 0
    MOV32 *XAR5++,R3H ; [CPU_] |18238|
    .dwpsn file "main.c",line 18242,column 13,is_stmt,isa 0
    MOVXI R2H,#537 ; [CPU_] |18242|
    .dwpsn file "main.c",line 18239,column 13,is_stmt,isa 0

    MOV32 *XAR5++,R1H ; [CPU_] |18239|
    || ADDF32 R3H,R3H,R2H ; [CPU_] |18242|

    .dwpsn file "main.c",line 18240,column 13,is_stmt,isa 0
    MOV32 *XAR5++,R0H ; [CPU_] |18240|
    ; repeat block ends ; []


    And here it is for -O3 --opt_for_speed=5:

    ; repeat block starts ; []
    $C$L1:
    .dwpsn file "main.c",line 18238,column 13,is_stmt,isa 0
    MOVIZ R0H,#15360 ; [CPU_] |18238|
    .dwpsn file "main.c",line 18239,column 13,is_stmt,isa 0
    COSPUF32 R1H,R3H ; [CPU_] |18239|
    .dwpsn file "main.c",line 18240,column 13,is_stmt,isa 0
    SINPUF32 R4H,R3H ; [CPU_] |18240|
    .dwpsn file "main.c",line 18238,column 13,is_stmt,isa 0
    MOVIZ R2H,#15360 ; [CPU_] |18238|
    MOVXI R0H,#537 ; [CPU_] |18238|

    ADDF32 R3H,R3H,R0H ; [CPU_] |18238|
    || MOV32 *XAR5++,R3H ; [CPU_] |18238|

    MOVXI R2H,#537 ; [CPU_] |18238|
    .dwpsn file "main.c",line 18239,column 13,is_stmt,isa 0
    MOV32 *XAR5++,R1H ; [CPU_] |18239|
    COSPUF32 R1H,R3H ; [CPU_] |18239|
    .dwpsn file "main.c",line 18240,column 13,is_stmt,isa 0
    SINPUF32 R0H,R3H ; [CPU_] |18240|
    MOV32 *XAR5++,R4H ; [CPU_] |18240|
    .dwpsn file "main.c",line 18238,column 13,is_stmt,isa 0
    MOV32 *XAR5++,R3H ; [CPU_] |18238|

    MOV32 *XAR5++,R1H ; [CPU_] |18239|
    || ADDF32 R3H,R3H,R2H ; [CPU_] |18238|

    .dwpsn file "main.c",line 18242,column 13,is_stmt,isa 0
    MOVIZ R2H,#15360 ; [CPU_] |18242|
    MOVXI R2H,#537 ; [CPU_] |18242|
    .dwpsn file "main.c",line 18239,column 13,is_stmt,isa 0
    COSPUF32 R4H,R3H ; [CPU_] |18239|
    .dwpsn file "main.c",line 18240,column 13,is_stmt,isa 0
    SINPUF32 R1H,R3H ; [CPU_] |18240|
    MOV32 *XAR5++,R0H ; [CPU_] |18240|
    .dwpsn file "main.c",line 18238,column 13,is_stmt,isa 0
    MOV32 *XAR5++,R3H ; [CPU_] |18238|
    .dwpsn file "main.c",line 18239,column 13,is_stmt,isa 0

    ADDF32 R3H,R3H,R2H ; [CPU_] |18242|
    || MOV32 *XAR5++,R4H ; [CPU_] |18239|

    .dwpsn file "main.c",line 18240,column 13,is_stmt,isa 0
    MOV32 *XAR5++,R1H ; [CPU_] |18240|
    ; repeat block ends ; []


    Please confirm you are seeing the same or verify the options being used.

    Thanks,
    Anna
  • Hi Anna, 

    I also had difficulty reproducing the issue at first. Anything we modify in the code seems to fix it. To actually confirm the errors, one has to not only maintain the original code structure, but also collect the values printed to the console through printf calls and populate an Excel spreadsheet to compare the results. The assembly code you show is very similar to mine, but not identical. Some of the registers used are different, such as the one in the MOVIZ instruction. Here is my assembly code, collected recently:

            C$L1:
    008054:   E801E000    MOVIZ        R0, #0x3c00
    29                  *dst++ = __cospuf32(phi);
    008056:   E2790019    COSPUF32     R1H, R3H
    30                  *dst++ = __sinpuf32(phi);
    008058:   E278001C    SINPUF32     R4H, R3H
    28                  *dst++ = phi;
    00805a:   E801E002    MOVIZ        R2, #0x3c00
    00805c:   E80810C8    MOVXI        R0H, #0x219
    00805e:   E010DB85    ADDF32       R3H, R3H, R0H
                       || MOV32        *XAR5++, R3H
    008060:   E80810CA    MOVXI        R2H, #0x219
    29                  *dst++ = __cospuf32(phi);
    008062:   E2030185    MOV32        *XAR5++, R1H
    008064:   E2790019    COSPUF32     R1H, R3H
    30                  *dst++ = __sinpuf32(phi);
    008066:   E2780018    SINPUF32     R0H, R3H
    008068:   E2030485    MOV32        *XAR5++, R4H
    28                  *dst++ = phi;
    00806a:   E014DB85    ADDF32       R3H, R3H, R2H
                       || MOV32        *XAR5++, R3H
    29                  *dst++ = __cospuf32(phi);
    00806c:   E2030185    MOV32        *XAR5++, R1H
    32                  phi += delta_phi;
    00806e:   E801E001    MOVIZ        R1, #0x3c00
    29                  *dst++ = __cospuf32(phi);
    008070:   E279001D    COSPUF32     R5H, R3H
    30                  *dst++ = __sinpuf32(phi);
    008072:   E278001C    SINPUF32     R4H, R3H
    008074:   E2030085    MOV32        *XAR5++, R0H
    32                  phi += delta_phi;
    008076:   E80810C9    MOVXI        R1H, #0x219
    28                  *dst++ = phi;
    008078:   E2030385    MOV32        *XAR5++, R3H
    29                  *dst++ = __cospuf32(phi);
    00807a:   E012DD85    ADDF32       R3H, R3H, R1H
                       || MOV32        *XAR5++, R5H
    30                  *dst++ = __sinpuf32(phi);
    00807c:   E2030485    MOV32        *XAR5++, R4H
            C$L2:
    00807e:   E2AF05BE    MOV32        R5H, *--SP, UNCF

    These are all my compiler options:

    -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O3 --opt_for_speed=5 --include_path="C:/Users/a0274085/Documents/Customers/Crown/Jim Leonard/TMU_Test" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_headers/include" --include_path="C:/TI/controlSUITE/device_support/F2837xD/v210/F2837xD_common/include" --include_path="C:/TI/ccsv7_3/ccsv7/tools/compiler/ti-cgt-c2000_16.9.5.LTS/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/headers/include" --include_path="C:/ti/c2000/C2000Ware_1_00_02_00/device_support/f2837xd/common/include" --advice:performance=all --define=_INLINE --define=Version2 --define=_FLASH --define=CPU1 -g --c99 --diag_warning=225 --diag_wrap=off --display_error_number -k --asm_listing

    Thanks!

  • Does the exact set of preprocessed files, options, and compiler version 16.9.5.LTS cause the errors?

    Do you still get the errors if all of this is the same, only with --opt_for_speed=2 instead of --opt_for_speed=5?

    Are you compiling on Windows or Linux?

    I am not making modifications to the preprocessed files you provided.  I am only examining the assembly for a compiler bug and am unable to run the program.

  • I will try the preprocessed files in a different workspace.

    I still see the errors with --opt_for_speed=2.

    I am compiling with Windows.
  • If the only differences were the register numbers, no need to recompile the preprocessed files-- I compiled on Linux and that could account for small differences such as register names. As long as the instructions were the same, I'm not concerned. However, the code you originally posted with the NOPs looks completely different-- I assume that was either different source code or different options?

    Please try -O2, -O1 and -Ooff (no --opt_for_speed setting specified) and let me know which cause errors and which don't.

    Thanks,
    Anna
    Here is the code you posted. There is nothing wrong with the loop in terms of latency violations. For each multi-cycle instruction, I have noted how many delay slots precede the first use of its output register. According to the information we have encoded about the pipeline, these are all accurate:
    The ADDF32 takes 2 delay cycles prior to its use by a TMU instruction
    The TMU instructions take 2 delay cycles prior to the use of their output


    C$L1:                                                                                  delay slots before use of output register
    008054: E801E000 MOVIZ R0, #0x3c00
    29 *dst++ = __cospuf32(phi);
    008056: E2790019 COSPUF32 R1H, R3H                        5
    30 *dst++ = __sinpuf32(phi);
    008058: E278001C SINPUF32 R4H, R3H                         7
    28 *dst++ = phi;
    00805a: E801E002 MOVIZ R2, #0x3c00
    00805c: E80810C8 MOVXI R0H, #0x219
    00805e: E010DB85 ADDF32 R3H, R3H, R0H                   2
    || MOV32 *XAR5++, R3H
    008060: E80810CA MOVXI R2H, #0x219
    29 *dst++ = __cospuf32(phi);
    008062: E2030185 MOV32 *XAR5++, R1H
    008064: E2790019 COSPUF32 R1H, R3H                        3
    30 *dst++ = __sinpuf32(phi);
    008066: E2780018 SINPUF32 R0H, R3H                          6
    008068: E2030485 MOV32 *XAR5++, R4H
    28 *dst++ = phi;
    00806a: E014DB85 ADDF32 R3H, R3H, R2H                   2
    || MOV32 *XAR5++, R3H
    29 *dst++ = __cospuf32(phi);
    00806c: E2030185 MOV32 *XAR5++, R1H
    32 phi += delta_phi;
    00806e: E801E001 MOVIZ R1, #0x3c00
    29 *dst++ = __cospuf32(phi);
    008070: E279001D COSPUF32 R5H, R3H                     4
    30 *dst++ = __sinpuf32(phi);
    008072: E278001C SINPUF32 R4H, R3H                       4
    008074: E2030085 MOV32 *XAR5++, R0H
    32 phi += delta_phi;
    008076: E80810C9 MOVXI R1H, #0x219
    28 *dst++ = phi;
    008078: E2030385 MOV32 *XAR5++, R3H
    29 *dst++ = __cospuf32(phi);
    00807a: E012DD85 ADDF32 R3H, R3H, R1H               2 (back edge of loop)
    || MOV32 *XAR5++, R5H
    30 *dst++ = __sinpuf32(phi);
    00807c: E2030485 MOV32 *XAR5++, R4H
    C$L2:
    00807e: E2AF05BE MOV32 R5H, *--SP, UNCF



    Try single-stepping through your program to find at what point your data gets corrupted. If you can identify the exact instruction where your data gets corrupted, we can see if there is a code generation error at some other point in the program. Or there could be a memory corruption issue somewhere.
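
As a complement to single-stepping, the first corrupted table entry can also be located by comparing the buffer from a failing build against the same buffer captured from a build that behaves correctly (for example, the one without optimization). This is only a sketch under that assumption: `first_mismatch` and its tolerance parameter are hypothetical names, and how the two buffers are captured (printf dump, memory export) is left open.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element where got[] differs from ref[]
   by more than tol, or -1 if all n elements agree.
   The comparison is written as !(d <= tol) so that a NaN in either
   buffer is also reported as a mismatch. */
ptrdiff_t first_mismatch(const float *ref, const float *got,
                         size_t n, float tol)
{
    assert(ref != NULL && got != NULL);
    for (size_t i = 0; i < n; i++) {
        float d = got[i] - ref[i];
        if (d < 0.0f)
            d = -d;
        if (!(d <= tol))
            return (ptrdiff_t)i;
    }
    return -1;
}
```

Because the loop in the original post writes a cosine value and then a sine value on each pass, the parity of the returned index would indicate which of the two results is affected first, and the index itself shows how far into the table the loop ran before the values diverged.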

  • I was able to reproduce the error with a different preprocessed file.  The difference was that phi was not saved back to *dst.  The generated assembly looked like the original posted in the forum:

    RPTB $C$L3,AR6 ; [CPU_] |19259|
    ; repeat block starts ; []
    $C$L2:
    .dwpsn file "main.pp",line 19261,column 13,is_stmt,isa 0
    COSPUF32 R1H,R3H ; [CPU_] |19261|
    .dwpsn file "main.pp",line 19262,column 13,is_stmt,isa 0
    SINPUF32 R0H,R3H ; [CPU_] |19262|
    NOP ; [CPU_]
    NOP ; [CPU_]
    .dwpsn file "main.pp",line 19261,column 13,is_stmt,isa 0

    MOV32 *XAR5++,R1H ; [CPU_] |19261|
    || ADDF32 R3H,R3H,R2H ; [CPU_] |19264|

    .dwpsn file "main.pp",line 19262,column 13,is_stmt,isa 0
    MOV32 *XAR5++,R0H ; [CPU_] |19262|
    ; repeat block ends ; []

    Here, there is a latency violation on the back edge of the RPTB: there should be 2 delay slots between the ADDF32 into R3H and the COSPUF32 at the top of the RPTB, but there is only 1.  I have filed CODEGEN-3878 to track this issue.

    Thanks,

    Anna
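
The back-edge violation described above only shows up when the loop body is treated as circular, so that the last instructions are followed by the first ones again. A minimal sketch of that check follows. The `LoopSlot` record, `loop_slot_reads`, `loop_delay_slots` and the `rptb_body` table are hypothetical names for illustration only; the model treats a parallel pair as one issue slot and ignores the case where a register is overwritten before it is read.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified model of one issue slot (illustration only). */
typedef struct {
    const char *text;    /* mnemonic(s), kept for readability only */
    const char *dst;     /* register written, "" if none */
    const char *src[3];  /* registers read, "" if unused */
} LoopSlot;

/* Returns 1 if the slot reads the given (non-empty) register name. */
static int loop_slot_reads(const LoopSlot *s, const char *reg)
{
    for (int i = 0; i < 3; i++)
        if (strcmp(s->src[i], reg) == 0)
            return 1;
    return 0;
}

/* Delay slots between body[producer] and the next reader of its
   destination register, following the back edge of the loop: after the
   last slot the scan continues at slot 0. One full trip is examined,
   ending at the producer itself. Returns -1 if there is no reader. */
int loop_delay_slots(const LoopSlot *body, int n, int producer)
{
    assert(n > 0 && producer >= 0 && producer < n);
    if (body[producer].dst[0] == '\0')
        return -1;
    for (int k = 1; k <= n; k++) {
        const LoopSlot *s = &body[(producer + k) % n];
        if (loop_slot_reads(s, body[producer].dst))
            return k - 1;
    }
    return -1;
}

/* The repeat block shown above, one entry per issue slot. */
static const LoopSlot rptb_body[] = {
    { "COSPUF32 R1H,R3H",     "R1H", { "R3H", "",    ""    } },
    { "SINPUF32 R0H,R3H",     "R0H", { "R3H", "",    ""    } },
    { "NOP",                  "",    { "",    "",    ""    } },
    { "NOP",                  "",    { "",    "",    ""    } },
    { "MOV32 R1H || ADDF32",  "R3H", { "R1H", "R3H", "R2H" } },
    { "MOV32 *XAR5++,R0H",    "",    { "R0H", "",    ""    } },
};
```

For this body, `loop_delay_slots(rptb_body, 6, 4)` evaluates to 1: only the final MOV32 separates the ADDF32 write of R3H from the COSPUF32 read at the top of the block, one short of the 2 delay slots stated above. Modeling one extra slot at the end of the body raises the count to 2, which is consistent with the NOP workaround reported earlier in the thread.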