TMS320F28377D: Questions about compiler optimization

Part Number: TMS320F28377D
Other Parts Discussed in Thread: CCSTUDIO

Hi Champs,

I am asking this on behalf of our customer, who has some questions about compiler optimization.

① The customer looked at C:\ti\ccs1100\ccs\eclipse\plugins\com.ti.ccstudio.usersguide.doc_11.0.0.202109231134\html\images\c28x_compiler_optimization.pdf, which contains the page shown below. The customer checked the assembly code of several loops and found that the assembly results of int variables and uint variables are the same regardless of whether optimization is turned on or not. I didn't find an explanation for this part either. Could you please help check why the document prefers signed over unsigned?

② After enabling optimization, is it possible that the execution order of a C program will change? For example, with optimization turned on, in the following code, is it possible that the function call will be executed before statement1 or after statement2? If the function being called is an inline function or an intrinsic function, can this still happen?

③ The TMS320C28x Optimizing C/C++ Compiler v21.12.0.STS User's Guide mentions that special care should be taken when using assembly statements. Its coverage of O3 optimization and the --program_level_compile option is more detailed. How can problems be avoided at O2 and below? Is there a design guide? In addition, do intrinsic functions have the same problem as assembly statements?

Could you please kindly help answer these customer questions? Thanks!

Best Regards,

Julia

  • the assembly results of int variables and uint variables are the same regardless of whether optimization is turned on or not

    This is probably less of an issue with recent versions of the compiler.  But I found one case where it makes a difference.

    Consider this source code ...

    #include <stdint.h>
    
    int32_t mac_signed_counter(int16_t *p1, int16_t *p2, int_fast16_t length)
    {
       int_fast16_t i;
       int32_t result = 0;
    
       _nassert((intptr_t)p1 % 2 == 0);
       _nassert((intptr_t)p2 % 2 == 0);
    
       #pragma MUST_ITERATE(,,2)
       for (i = 0; i < length; i++)
          result += (int32_t) p1[i] * p2[i];
    
       return result;
    }
    
    int32_t mac_unsigned_counter(int16_t *p1, int16_t *p2, int_fast16_t length)
    {
       uint_fast16_t i;
       int32_t result = 0;
    
       _nassert((intptr_t)p1 % 2 == 0);
       _nassert((intptr_t)p2 % 2 == 0);
    
       #pragma MUST_ITERATE(,,2)
       for (i = 0; i < length; i++)
          result += (int32_t) p1[i] * p2[i];
    
       return result;
    }

    The two functions are the same, except for the name of the function, and the type used for the loop counter variable i.  For the function mac_unsigned_counter, here is the central instruction sequence ...

            MOVZ      AR5,AL                ; [CPU_ALU] 
            MOV       P,#0                  ; [CPU_ALU] |28| 
            MOVL      ACC,XAR6              ; [CPU_ALU] 
            SUBB      XAR5,#1               ; [CPU_ARAU] 
            RPT       AR5
    ||      MAC      P,*XAR4++,*XAR7++     ; [CPU_ALU] |28| 
            ADDL      ACC,P                 ; [CPU_ALU] 
            MOVL      XAR6,ACC              ; [CPU_ALU] 

    At the start of this sequence, the variable length is in the register AL.  The performance of this code is dominated by the RPT || MAC instructions, which execute length times.

    Compare that to the same instruction sequence for the function mac_signed_counter ...

            MOVB      AH,#1                 ; [CPU_ALU] 
            ADD       AH,AL                 ; [CPU_ALU] 
            ASR       AH,1                  ; [CPU_ALU] 
            ADDB      AH,#-1                ; [CPU_ALU] 
            MOVZ      AR5,AH                ; [CPU_ALU] 
            MOVB      ACC,#0                ; [CPU_ALU] 
            RPT       AR5
    ||      DMAC     ACC:P,*XAR4++,*XAR7++ ; [CPU_ALU] |13| 
            ADDL      P,ACC                 ; [CPU_ALU]

    Again, at the start of this sequence, the variable length is in the register AL.  Note how it is right shifted by 1 (divided by 2), before it is copied to AR5.  The RPT || DMAC loop executes half as many times.  But because it is a DMAC (dual MAC) instruction, the same number of MAC operations are executed.  
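
To make the "same number of MAC operations" point concrete, here is a minimal portable C sketch of the two loop shapes. It is only an illustration of the arithmetic, not the compiler's actual transformation. The names mac_scalar and mac_pairwise are hypothetical, and the even-length assumption stands in for the #pragma MUST_ITERATE(,,2) in the original source.

```c
#include <assert.h>
#include <stdint.h>

/* One multiply-accumulate per pass, like the RPT || MAC sequence. */
int32_t mac_scalar(const int16_t *p1, const int16_t *p2, int_fast16_t length)
{
   int32_t result = 0;
   int_fast16_t i;

   for (i = 0; i < length; i++)
      result += (int32_t) p1[i] * p2[i];

   return result;
}

/* Two multiply-accumulates per pass, like the RPT || DMAC sequence.
   The loop runs length / 2 times and keeps two partial sums, which
   are added together at the end. */
int32_t mac_pairwise(const int16_t *p1, const int16_t *p2, int_fast16_t length)
{
   int32_t even = 0;
   int32_t odd = 0;
   int_fast16_t i;

   /* Assumption: length is a multiple of 2, as MUST_ITERATE(,,2) promises. */
   assert(length % 2 == 0);

   for (i = 0; i < length; i += 2) {
      even += (int32_t) p1[i] * p2[i];
      odd  += (int32_t) p1[i + 1] * p2[i + 1];
   }

   return even + odd;
}
```

Both functions perform length multiplies in total. The pairwise form simply needs half as many passes through the loop, which is the saving the DMAC version shows.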

    is it possible that the function call will be executed before statement1 or after statement2?

    That depends on what is computed in statement1 and statement2.  If those operations are entirely on local variables, and are otherwise independent of the function call, then instructions from those statements could execute before or after the call.  While that is possible, it is rare in practice.  Typically, code in those statements is very much related to the function call, and must be executed before (statement1) or after (statement2) the call.  
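
As a rough illustration of that distinction, the sketch below contrasts statements that are independent of a call with statements that depend on it. All names (scale, call_count, independent, dependent) are hypothetical, and whether a given compiler actually moves anything is not something this example can show. It only marks where movement would be allowed without changing the result.

```c
#include <assert.h>
#include <stdint.h>

static int32_t call_count = 0;   /* global state that the call modifies */

int32_t scale(int32_t x)
{
   call_count++;
   return 3 * x;
}

/* statement1 and statement2 use only locals and do not depend on the
   call, so their instructions could legally land on either side of it. */
int32_t independent(int32_t a, int32_t b)
{
   int32_t t = a + b;       /* statement1 */
   int32_t r = scale(a);    /* function call */
   int32_t u = t * 2;       /* statement2 */

   return r + u;
}

/* Here both statements read the global that the call writes, so the
   first read must happen before the call and the second after it. */
int32_t dependent(int32_t a)
{
   int32_t before = call_count;   /* statement1 */
   int32_t r = scale(a);          /* function call */
   int32_t after = call_count;    /* statement2 */

   assert(after == before + 1);   /* holds only if the order is kept */
   return r + (after - before);
}
```

In either case the observable result is the same as if the statements ran in source order, which is why such reordering rarely matters in practice.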

    How can problems be avoided at O2 and below? Is there a design guide?

    The restrictions related to asm statements are the same, without regard to the level of optimization used.  

    Do intrinsic functions have the same problem as assembly statements?

    No.  The compiler does not know the effect of an asm statement, which is why so much care must be taken.  The compiler knows everything about the effects of an intrinsic, and can thus optimize it in context with the code around it.

    Thanks and regards,

    -George

  • Hi George,

    Thank you for your reply! Following up on it, the customer still has some questions:

    About question ②: if what lies between statement 1 and statement 2 is assembler, surely the compiler won't adjust the execution order? Or is it still possible for it to adjust the order?

    About question ③: under O3 optimization, there are --program_level_compile and --call_assumptions=n to help users deal with the mixed use of C/C++ and assembly. Under O2 and below, is manually checking the assembly result the only option?

    The customer also has some new questions:

    (1) With optimization enabled, will it cause an error if the code has function aliasing? Is there anything that needs special attention?

    (2) Regarding the use of volatile, the customer wants to know if there is a suggested check list that helps users check volatile-related code. I've compiled some tips with the customer and would like to check with you whether there are more suggestions:

        1). If the global variable is modified in the interrupt service routine and is also referenced by other functions, check whether it should be volatile-qualified.
        2). Variables shared between tasks in a multitasking environment should be volatile-qualified.
        3). Memory-mapped hardware registers should be volatile-qualified.
        4). The loop invariant should be volatile-qualified to prevent it from being optimized into a conditional judgment statement by the compiler.

    Best Regards,

    Julia

  • if what lies between statement 1 and statement 2 is assembler, surely the compiler won't adjust the execution order?

    Since this is all C code, the only way to get "assembler" between statements is by using an asm statement.  The compiler does not know what an asm statement does.  So, the instructions from each statement will not move across it.

    With optimization enabled, will it cause an error if the code has function aliasing?

    I'm not sure what you mean.  Please show an example.

    under O3 optimization, there are --program_level_compile and --call_assumptions=n to help users deal with the mixed use of C/C++ and assembly. Under O2 and below, is manually checking the assembly result the only option?

    The use of the word "assembly" here is different.  I presume you mean functions written in hand-coded assembly.  If you compile with --opt_level=2 or lower, all of the optimizations attempted by the compiler are independent of what may happen in any other function in the system, including those hand-coded in assembly.

    Regarding the use of volatile, the customer wants to know if there is a suggested check list

    No.  The C99 standard for C states "An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects."  You list some typical cases.  But I know of no comprehensive list of such cases.

    If the global variable is modified in the interrupt service routine

    Use volatile.

    Variables shared between tasks in a multitasking environment

    Use volatile.

    Memory-mapped hardware registers

    Use volatile.
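
A minimal sketch of the interrupt case, assuming a hypothetical ISR named adc_isr that publishes a sample through two shared variables (data_ready and sample are also made-up names). On a host machine the "ISR" is just an ordinary function call, so this only shows the shape of the code, not real interrupt behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared between the ISR and the main-line code, so both are volatile. */
static volatile bool data_ready = false;
static volatile uint16_t sample = 0;

/* Hypothetical interrupt service routine: stores a value, then sets the flag. */
void adc_isr(void)
{
   sample = 0x1234u;
   data_ready = true;
}

/* Polls the flag up to max_polls times.  Because data_ready is volatile,
   the compiler must re-read it from memory on every pass instead of
   hoisting the read out of the loop. */
bool read_sample(uint16_t *out, uint32_t max_polls)
{
   uint32_t n;

   assert(out != 0);

   for (n = 0; n < max_polls; n++) {
      if (data_ready) {
         data_ready = false;
         *out = sample;
         return true;
      }
   }

   return false;
}
```

Note that volatile only forces each access to really happen. It does not by itself make a multi-word update atomic, so data wider than one access may still need interrupts disabled or a similar guard around it.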

    The loop invariant should be volatile-qualified to prevent it from being optimized into a conditional judgment statement by the compiler.

    I'm not sure what you mean. Please show an example.

    Thanks and regards,

    -George

  • Hi George,

    Thank you for your reply! Let me clarify the customer's questions. I don't know how to quote your answer, so I will just repost the customer's two remaining questions:

    ① At optimization level O2 and below, when a C program calls hand-coded assembly, or hand-coded assembly calls the C program, is it necessary to check the assembly code after compilation? (According to your previous answer, my understanding is no?)

    ② The customer wants to know: if optimization is enabled while function aliasing exists, will the optimization cause problems?

    Thanks for your help!

    Best Regards,

    Julia

  • At optimization level O2 and below, when a C program calls hand-coded assembly, or hand-coded assembly calls the C program, is it necessary to check the assembly code after compilation?

    No.  This answer presumes the hand coded assembly functions observe all the constraints given in the sub-chapter titled Interfacing C and C++ With Assembly Language of the C28x compiler manual.

    A more general comment regarding optimization ... There is no reason to use --opt_level=3 and related options like --call_assumptions.  Use --opt_level=4 instead.  It is much easier to use.  For further details, please search for the sub-chapter titled Link-Time Optimization in the same manual.  The option --opt_level=3 remains because it is much older, and existing projects may use it.

    The customer wants to know: if optimization is enabled while function aliasing exists, will the optimization cause problems?

    No.

    I don't know how to quote your answer

    Highlight some text in an earlier reply, then click on the box that says Quote.

    Thanks and regards,

    -George