Weird (?) behavior when inlining a function with volatile access

Hi.


I have a problem with how (and when) the TI compiler will inline functions that access volatile objects. In particular, I want to wrap some hardware register accesses on a C64x+. The actual code is lengthy with register base addresses added, arguments checked, etc. For demonstration purposes, I've reduced the program to:

#include <stdint.h>

static inline uint32_t Get(volatile uint32_t *ptr)
{
  return *ptr;
}

int Test(uint32_t *ptr)
{
  int i;
  for(i=0; i<128; ++i)
  {
    if( Get(ptr) == 0 ) return 1;
  }
  return 0;
}

Unfortunately, the compiler refuses to inline "Get", and the assembler output is:

;******************************************************************************
;* FUNCTION NAME: Test                                                        *
;*                                                                            *
;*   Regs Modified     : A0,A1,A3,A4,A5,B3                                    *
;*   Regs Used         : A0,A1,A3,A4,A5,B3,SP                                 *
;*   Local Frame Size  : 0 Args + 0 Auto + 0 Save = 0 byte                    *
;******************************************************************************
Test:
;** --------------------------------------------------------------------------*
           CALL    .S1     Get               ; |31| 
           MV      .L1     A4,A5             ; |27| 
           MV      .L1     A5,A4             ; |31| 
           MV      .L1X    B3,A1             ; |27| 
           MVK     .S1     0x80,A3           ; |29| 
;*----------------------------------------------------------------------------*
;*   SOFTWARE PIPELINE INFORMATION
;*      Disqualified loop: Loop contains a call
;*      Disqualified loop: Loop contains non-pipelinable instructions
;*----------------------------------------------------------------------------*
$C$L1:    
           ADDKPC  .S2     $C$RL0,B3,0       ; |31| 
$C$RL0:    ; CALL OCCURS {Get} {0}           ; |31| 
;** --------------------------------------------------------------------------*

           CMPEQ   .L1     A4,0,A4           ; |31| 
||         SUB     .S1     A3,1,A3           ; |29| 

           SUB     .L1     A4,1,A4           ; |31| 
           AND     .L1     A4,A3,A0          ; |29| 

   [ A0]   B       .S1     $C$L1             ; |29| 
|| [ A0]   MV      .D1     A5,A4             ; |31| 
|| [!A0]   CMPEQ   .L1     A4,0,A4           ; |33| 

   [ A0]   CALL    .S1     Get               ; |31| 
   [!A0]   RETNOP  .S2X    A1,3              ; |34| 
           ; BRANCHCC OCCURS {$C$L1}         ; |29| 
;** --------------------------------------------------------------------------*
           NOP             2
           ; BRANCH OCCURS {A1}              ; |34| 

My understanding is that the volatile parameter inhibits inlining (although I honestly have no clue as to why that would be).
Therefore, I made the parameter non-volatile and instead used a cast to make the volatile access:

static inline uint32_t Get(uint32_t *src)
{
  return *(volatile uint32_t*)src;
}

This gets inlined, but unfortunately the volatile access seems to be optimized away, as the load instruction was pulled outside the loop!

;******************************************************************************
;* FUNCTION NAME: Test                                                        *
;*                                                                            *
;*   Regs Modified     : A3,A4,A5,A6,B0,B1,B4                                 *
;*   Regs Used         : A3,A4,A5,A6,B0,B1,B3,B4                              *
;*   Local Frame Size  : 0 Args + 0 Auto + 0 Save = 0 byte                    *
;******************************************************************************
Test:
;** --------------------------------------------------------------------------*
           LDW     .D1T1   *A4,A6            ; |29| 
           MVK     .L2     0x1,B1
           NOP             1
;*----------------------------------------------------------------------------*
;*   SOFTWARE PIPELINE INFORMATION
;*
;*      Loop found in file               : C:/Test.c
;*      Loop source line                 : 29
;*      Loop opening brace source line   : 30
;*      Loop closing brace source line   : 32
;*      Known Minimum Trip Count         : 1                    
;*      Known Maximum Trip Count         : 128                    
;*      Known Max Trip Count Factor      : 1
;*      Loop Carried Dependency Bound(^) : 2
;*      Unpartitioned Resource Bound     : 1
;*      Partitioned Resource Bound(*)    : 1
;*      Resource Partition:
;*                                A-side   B-side
;*      .L units                     1*       0     
;*      .S units                     0        0     
;*      .D units                     0        0     
;*      .M units                     0        0     
;*      .X cross paths               0        1*    
;*      .T address paths             0        0     
;*      Long read paths              0        0     
;*      Long write paths             0        0     
;*      Logical  ops (.LS)           0        0     (.L or .S unit)
;*      Addition ops (.LSD)          2        3     (.L or .S or .D unit)
;*      Bound(.L .S .LS)             1*       0     
;*      Bound(.L .S .D .LS .LSD)     1*       1*    
;*
;*      Searching for software pipeline schedule at ...
;*         ii = 2  Schedule found with 5 iterations in parallel
;*      Done
;*
;*      Loop will be splooped
;*      Collapsed epilog stages       : 4
;*      Collapsed prolog stages       : 0
;*      Minimum required memory pad   : 0 bytes
;*
;*      Minimum safe trip count       : 1
;*----------------------------------------------------------------------------*
$C$L1:    ; PIPED LOOP PROLOG
   [ B1]   SPLOOPW 2       ;10               ; (P) 
;** --------------------------------------------------------------------------*
$C$L2:    ; PIPED LOOP KERNEL
           NOP             1
           CMPEQ   .L1     A6,0,A3           ; |27| (P) <0,1> 

           SPMASK          S2
||         MVK     .S2     0x80,B4           ; |29| 
||         SUB     .S1     A3,1,A4           ; |27| (P) <0,2>  ^ 

           SUB     .L2     B4,1,B4           ; |29| (P) <0,3> 

   [ B1]   MV      .L1     A4,A5             ; |29| (P) <0,4>  ^ 
||         AND     .L2X    A4,B4,B0          ; |29| (P) <0,4>  ^ 

   [!B0]   ZERO    .S2     B1                ; |29| (P) <0,5>  ^ 
           NOP             2
           NOP             1
           SPKERNEL 0,0
;** --------------------------------------------------------------------------*
$C$L3:    ; PIPED LOOP EPILOG
;** --------------------------------------------------------------------------*
           RETNOP  .S2     B3,4              ; |34| 
           CMPEQ   .L1     A5,0,A4           ; |33| 
           ; BRANCH OCCURS {B3}              ; |34| 

Finally, if I replace this last inline function by a macro (which to my understanding should not make a difference!), it seems to work:

#define Get(src) (*(volatile uint32_t*)src)

produces:

;******************************************************************************
;* FUNCTION NAME: Test                                                        *
;*                                                                            *
;*   Regs Modified     : A0,A3,A4,A5,A6,A7,B0,B1                              *
;*   Regs Used         : A0,A3,A4,A5,A6,A7,B0,B1,B3                           *
;*   Local Frame Size  : 0 Args + 0 Auto + 0 Save = 0 byte                    *
;******************************************************************************
Test:
;** --------------------------------------------------------------------------*
           MVK     .L2     0x1,B1
;*----------------------------------------------------------------------------*
;*   SOFTWARE PIPELINE INFORMATION
;*
;*      Loop found in file               : C:/Test.c
;*      Loop source line                 : 29
;*      Loop opening brace source line   : 30
;*      Loop closing brace source line   : 32
;*      Known Minimum Trip Count         : 1                    
;*      Known Maximum Trip Count         : 128                    
;*      Known Max Trip Count Factor      : 1
;*      Loop Carried Dependency Bound(^) : 9
;*      Unpartitioned Resource Bound     : 2
;*      Partitioned Resource Bound(*)    : 2
;*      Resource Partition:
;*                                A-side   B-side
;*      .L units                     1        0     
;*      .S units                     0        0     
;*      .D units                     1        0     
;*      .M units                     0        0     
;*      .X cross paths               0        0     
;*      .T address paths             1        0     
;*      Long read paths              0        0     
;*      Long write paths             0        0     
;*      Logical  ops (.LS)           0        0     (.L or .S unit)
;*      Addition ops (.LSD)          4        5     (.L or .S or .D unit)
;*      Bound(.L .S .LS)             1        0     
;*      Bound(.L .S .D .LS .LSD)     2*       2*    
;*
;*      Searching for software pipeline schedule at ...
;*         ii = 9  Schedule found with 2 iterations in parallel
;*      Done
;*
;*      Loop will be splooped
;*      Collapsed epilog stages       : 1
;*      Collapsed prolog stages       : 0
;*      Minimum required memory pad   : 0 bytes
;*
;*      Minimum safe trip count       : 1
;*----------------------------------------------------------------------------*
$C$L1:    ; PIPED LOOP PROLOG
   [ B1]   SPLOOPW 9       ;18               ; (P) 
;** --------------------------------------------------------------------------*
$C$L2:    ; PIPED LOOP KERNEL
           NOP             4

           SPMASK          L1
||         MV      .L1     A4,A6

   [ B1]   LDW     .D1T1   *A6,A4            ; |31| (P) <0,5>  ^ 
           NOP             2

           SPMASK          S1,L2
||         MVK     .S1     0x80,A5           ; |29| 
||         MV      .L2     B1,B0

           NOP             1
           CMPEQ   .L1     A4,0,A7           ; |31| <0,10>  ^ 

           SUB     .L1     A5,1,A5           ; |29| <0,11> 
||         SUB     .S1     A7,1,A7           ; |31| <0,11>  ^ 

   [ B0]   MV      .L1     A7,A3             ; |29| <0,12> 
||         AND     .S1     A7,A5,A0          ; |29| <0,12>  ^ 

   [!A0]   ZERO    .L2     B1                ; |29| <0,13>  ^ 
           MV      .L2     B1,B0             ; |29| <0,14> Split a long life(pre-sched)
           NOP             1
           NOP             1
           SPKERNEL 0,0
;** --------------------------------------------------------------------------*
$C$L3:    ; PIPED LOOP EPILOG
;** --------------------------------------------------------------------------*
           RETNOP  .S2     B3,4              ; |34| 
           CMPEQ   .L1     A3,0,A4           ; |33| 
           ; BRANCH OCCURS {B3}              ; |34| 

To me, the second case looks like a compiler bug, but I am not certain. I would like to minimize the number of macros in my code, so I'd really prefer an inline function. Does anyone know how to achieve that safely?

Thanks a lot

Markus

  • Markus Moll said:

    My understanding is that the volatile parameter inhibits inlining (although I honestly have no clue as to why that would be).

    It does, and it has been documented as such for as long as I've worked here, but I'm also not sure why.

    Markus Moll said:

    Therefore, I made the parameter non-volatile and instead used a cast to make the volatile access:

    There is a known problem with volatile accesses in which the symbol is not volatile and all the accesses are done through cast pointers.  It's SDSCM00050849 and is fixed in 7.3.18, 7.4.12, and 8.0.1.

    Markus Moll said:

    Finally, if I replace this last inline function by a macro (which to my understanding should not make a difference!), it seems to work:

    A macro is simply a textual substitution in the source code.  There is no call, no call boundary, no inlining.  That can make a considerable difference in behavior.

    This version could also be subject to CQ50849, but it appears the conditions weren't quite met.
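
    Because the substitution is purely textual, the macro as posted also inherits an operator-precedence hazard: its parameter is not parenthesized. The sketch below is only an illustration; the names `GET_AS_POSTED`, `GET_PAREN`, and the two helper functions are hypothetical and do not come from the original code.

```c
#include <stdint.h>

/* Same body as the macro in the question: the parameter is pasted in as-is. */
#define GET_AS_POSTED(src) (*(volatile uint32_t*)src)

/* Hypothetical hardened variant: the parameter is parenthesized. */
#define GET_PAREN(src) (*(volatile uint32_t *)(src))

/* Expands to (*(volatile uint32_t*)p + 1):
 * the cast and dereference bind to p alone, so this reads p[0] and adds 1. */
static uint32_t read_second_as_posted(uint32_t *p)
{
    return GET_AS_POSTED(p + 1);
}

/* Expands to (*(volatile uint32_t *)(p + 1)): this reads p[1]. */
static uint32_t read_second_paren(uint32_t *p)
{
    return GET_PAREN(p + 1);
}
```

    An inline function does not have this problem, because its argument is evaluated as an expression before the body runs.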

  • Thank you for the quick reply!

    pf said:

    Therefore, I made the parameter non-volatile and instead used a cast to make the volatile access:

    There is a known problem with volatile accesses in which the symbol is not volatile and all the accesses are done through cast pointers.  It's SDSCM00050849 and is fixed in 7.3.18, 7.4.12, and 8.0.1.

    Ah, good to know. I'm using 7.4.7, so that makes sense.
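
    For code that may be built with several compiler versions, a check along these lines could flag releases not known to contain the fix. This is a sketch under assumptions: `TI_VER` and `volatile_cast_fix_known` are hypothetical names, the encoding follows the documented `__TI_COMPILER_VERSION__` format (major * 1000000 + minor * 1000 + patch), and treating every release from 8.0.1 onward as fixed is my assumption rather than something stated in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: encode a version the way __TI_COMPILER_VERSION__
 * is documented to be encoded, e.g. 7.4.7 -> 7004007. */
#define TI_VER(major, minor, patch) \
    ((major) * 1000000L + (minor) * 1000L + (patch))

/* Returns 1 only for versions named in this thread as containing the fix
 * for SDSCM00050849 (7.3.18, 7.4.12, 8.0.1) or later patches on the same
 * branch. ASSUMPTION: everything from 8.0.1 onward is treated as fixed.
 * All other versions are reported as "not known fixed". */
static int volatile_cast_fix_known(long v)
{
    if (v >= TI_VER(8, 0, 1)) {
        return 1;
    }
    if (v >= TI_VER(7, 4, 12) && v < TI_VER(7, 5, 0)) {
        return 1;
    }
    if (v >= TI_VER(7, 3, 18) && v < TI_VER(7, 4, 0)) {
        return 1;
    }
    return 0;
}
```

    On the target, the argument would be `__TI_COMPILER_VERSION__`; with 7.4.7 the function returns 0, which matches what I am seeing.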

    pf said:

    This version could also be subject to CQ50849, but it appears the conditions weren't quite met.

    Before I meet the conditions next time, could you tell me where to find this issue? (I usually try the SDO-WEB database in ClearQuest, but it's not in there.)

    In the meantime, I think I found a working solution, which is to define the function as

    static inline uint32_t Get(uint32_t *ptr)
    {
      volatile uint32_t *tmp = ptr;
      return *tmp;
    }
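
    For reference, here is the reduced test case with that workaround as one compilable unit. A host-side check of this code only confirms the return values; whether the load really stays inside the loop still has to be verified in the assembler output of the target compiler.

```c
#include <stdint.h>

/* Workaround from this post: the access goes through a local pointer
 * that is declared volatile, instead of through a cast expression. */
static inline uint32_t Get(uint32_t *ptr)
{
    volatile uint32_t *tmp = ptr;
    return *tmp;
}

/* Polls the location up to 128 times; returns 1 as soon as it reads 0,
 * and 0 if it never does. */
int Test(uint32_t *ptr)
{
    int i;
    for (i = 0; i < 128; ++i)
    {
        if (Get(ptr) == 0) return 1;
    }
    return 0;
}
```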

    Thanks again

    Markus

  • Markus Moll said:
    Before I meet the conditions next time, could you tell me where to find this issue? (I usually try the SDO-WEB database in ClearQuest, but it's not in there.)

    I'm sorry, SDSCM00050849 is not yet published.  The defect's description of the conditions is almost exactly as described:

    pf said:
    There is a known problem with volatile accesses in which the symbol is not volatile and all the accesses are done through cast pointers.