weird (?) behavior when inlining function with volatile access

Markus Moll

Hi.

I have a problem with how (and when) the TI compiler will inline functions that access volatile objects. In particular, I want to wrap some hardware register accesses on a C64x+. The actual code is lengthy with register base addresses added, arguments checked, etc. For demonstration purposes, I've reduced the program to:

#include <stdint.h>

static inline uint32_t Get(volatile uint32_t *ptr)
{
  return *ptr;
}

int Test(uint32_t *ptr)
{
  int i;
  for(i=0; i<128; ++i)
  {
    if( Get(ptr) == 0 ) return 1;
  }
  return 0;
}

Unfortunately, the compiler does not inline "Get", and the assembler output is:

;******************************************************************************
;* FUNCTION NAME: Test                                                        *
;*                                                                            *
;*   Regs Modified     : A0,A1,A3,A4,A5,B3                                    *
;*   Regs Used         : A0,A1,A3,A4,A5,B3,SP                                 *
;*   Local Frame Size  : 0 Args + 0 Auto + 0 Save = 0 byte                    *
;******************************************************************************
Test:
;** --------------------------------------------------------------------------*
           CALL    .S1     Get               ; |31| 
           MV      .L1     A4,A5             ; |27| 
           MV      .L1     A5,A4             ; |31| 
           MV      .L1X    B3,A1             ; |27| 
           MVK     .S1     0x80,A3           ; |29| 
;*----------------------------------------------------------------------------*
;*   SOFTWARE PIPELINE INFORMATION
;*      Disqualified loop: Loop contains a call
;*      Disqualified loop: Loop contains non-pipelinable instructions
;*----------------------------------------------------------------------------*
$C$L1:    
           ADDKPC  .S2     $C$RL0,B3,0       ; |31| 
$C$RL0:    ; CALL OCCURS {Get} {0}           ; |31| 
;** --------------------------------------------------------------------------*

           CMPEQ   .L1     A4,0,A4           ; |31| 
||         SUB     .S1     A3,1,A3           ; |29| 

           SUB     .L1     A4,1,A4           ; |31| 
           AND     .L1     A4,A3,A0          ; |29| 

   [ A0]   B       .S1     $C$L1             ; |29| 
|| [ A0]   MV      .D1     A5,A4             ; |31| 
|| [!A0]   CMPEQ   .L1     A4,0,A4           ; |33| 

   [ A0]   CALL    .S1     Get               ; |31| 
   [!A0]   RETNOP  .S2X    A1,3              ; |34| 
           ; BRANCHCC OCCURS {$C$L1}         ; |29| 
;** --------------------------------------------------------------------------*
           NOP             2
           ; BRANCH OCCURS {A1}              ; |34|

My understanding is that the volatile parameter inhibits inlining (although I honestly have no clue as to why that would be).
Therefore, I made the parameter non-volatile and instead used a cast to make the volatile access:

static inline uint32_t Get(uint32_t *src)
{
  return *(volatile uint32_t*)src;
}

This gets inlined, but unfortunately the volatile access seems to be optimized away, as the load instruction was pulled outside the loop!

;******************************************************************************
;* FUNCTION NAME: Test                                                        *
;*                                                                            *
;*   Regs Modified     : A3,A4,A5,A6,B0,B1,B4                                 *
;*   Regs Used         : A3,A4,A5,A6,B0,B1,B3,B4                              *
;*   Local Frame Size  : 0 Args + 0 Auto + 0 Save = 0 byte                    *
;******************************************************************************
Test:
;** --------------------------------------------------------------------------*
           LDW     .D1T1   *A4,A6            ; |29| 
           MVK     .L2     0x1,B1
           NOP             1
;*----------------------------------------------------------------------------*
;*   SOFTWARE PIPELINE INFORMATION
;*
;*      Loop found in file               : C:/Test.c
;*      Loop source line                 : 29
;*      Loop opening brace source line   : 30
;*      Loop closing brace source line   : 32
;*      Known Minimum Trip Count         : 1                    
;*      Known Maximum Trip Count         : 128                    
;*      Known Max Trip Count Factor      : 1
;*      Loop Carried Dependency Bound(^) : 2
;*      Unpartitioned Resource Bound     : 1
;*      Partitioned Resource Bound(*)    : 1
;*      Resource Partition:
;*                                A-side   B-side
;*      .L units                     1*       0     
;*      .S units                     0        0     
;*      .D units                     0        0     
;*      .M units                     0        0     
;*      .X cross paths               0        1*    
;*      .T address paths             0        0     
;*      Long read paths              0        0     
;*      Long write paths             0        0     
;*      Logical  ops (.LS)           0        0     (.L or .S unit)
;*      Addition ops (.LSD)          2        3     (.L or .S or .D unit)
;*      Bound(.L .S .LS)             1*       0     
;*      Bound(.L .S .D .LS .LSD)     1*       1*    
;*
;*      Searching for software pipeline schedule at ...
;*         ii = 2  Schedule found with 5 iterations in parallel
;*      Done
;*
;*      Loop will be splooped
;*      Collapsed epilog stages       : 4
;*      Collapsed prolog stages       : 0
;*      Minimum required memory pad   : 0 bytes
;*
;*      Minimum safe trip count       : 1
;*----------------------------------------------------------------------------*
$C$L1:    ; PIPED LOOP PROLOG
   [ B1]   SPLOOPW 2       ;10               ; (P) 
;** --------------------------------------------------------------------------*
$C$L2:    ; PIPED LOOP KERNEL
           NOP             1
           CMPEQ   .L1     A6,0,A3           ; |27| (P) <0,1> 

           SPMASK          S2
||         MVK     .S2     0x80,B4           ; |29| 
||         SUB     .S1     A3,1,A4           ; |27| (P) <0,2>  ^ 

           SUB     .L2     B4,1,B4           ; |29| (P) <0,3> 

   [ B1]   MV      .L1     A4,A5             ; |29| (P) <0,4>  ^ 
||         AND     .L2X    A4,B4,B0          ; |29| (P) <0,4>  ^ 

   [!B0]   ZERO    .S2     B1                ; |29| (P) <0,5>  ^ 
           NOP             2
           NOP             1
           SPKERNEL 0,0
;** --------------------------------------------------------------------------*
$C$L3:    ; PIPED LOOP EPILOG
;** --------------------------------------------------------------------------*
           RETNOP  .S2     B3,4              ; |34| 
           CMPEQ   .L1     A5,0,A4           ; |33| 
           ; BRANCH OCCURS {B3}              ; |34|
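
Read back into C, the listing above appears to correspond to something like the following sketch: the single LDW sits ahead of the loop, and the loop body only re-tests the value already held in a register. `Test_hoisted` is a hypothetical name used here for illustration; this is a hand-written paraphrase of the assembler output, not something produced by the compiler or any tool.

```c
#include <stdint.h>

/* Hypothetical C paraphrase of the listing: the load that the source
   asks for on every iteration is performed once, before the loop. */
int Test_hoisted(uint32_t *ptr)
{
  uint32_t value = *ptr;  /* corresponds to the lone LDW ahead of the loop */
  int i;
  for (i = 0; i < 128; ++i)
  {
    if (value == 0) return 1;  /* re-tests the cached value, no new load */
  }
  return 0;
}
```

For a hardware register that can change while the loop is running, this is not equivalent to the original source, which is why the missing load inside the loop matters.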

Finally, if I replace this last inline function by a macro (which to my understanding should not make a difference!), it seems to work:

#define Get(src) (*(volatile uint32_t*)(src))

produces:

;******************************************************************************
;* FUNCTION NAME: Test                                                        *
;*                                                                            *
;*   Regs Modified     : A0,A3,A4,A5,A6,A7,B0,B1                              *
;*   Regs Used         : A0,A3,A4,A5,A6,A7,B0,B1,B3                           *
;*   Local Frame Size  : 0 Args + 0 Auto + 0 Save = 0 byte                    *
;******************************************************************************
Test:
;** --------------------------------------------------------------------------*
           MVK     .L2     0x1,B1
;*----------------------------------------------------------------------------*
;*   SOFTWARE PIPELINE INFORMATION
;*
;*      Loop found in file               : C:/Test.c
;*      Loop source line                 : 29
;*      Loop opening brace source line   : 30
;*      Loop closing brace source line   : 32
;*      Known Minimum Trip Count         : 1                    
;*      Known Maximum Trip Count         : 128                    
;*      Known Max Trip Count Factor      : 1
;*      Loop Carried Dependency Bound(^) : 9
;*      Unpartitioned Resource Bound     : 2
;*      Partitioned Resource Bound(*)    : 2
;*      Resource Partition:
;*                                A-side   B-side
;*      .L units                     1        0     
;*      .S units                     0        0     
;*      .D units                     1        0     
;*      .M units                     0        0     
;*      .X cross paths               0        0     
;*      .T address paths             1        0     
;*      Long read paths              0        0     
;*      Long write paths             0        0     
;*      Logical  ops (.LS)           0        0     (.L or .S unit)
;*      Addition ops (.LSD)          4        5     (.L or .S or .D unit)
;*      Bound(.L .S .LS)             1        0     
;*      Bound(.L .S .D .LS .LSD)     2*       2*    
;*
;*      Searching for software pipeline schedule at ...
;*         ii = 9  Schedule found with 2 iterations in parallel
;*      Done
;*
;*      Loop will be splooped
;*      Collapsed epilog stages       : 1
;*      Collapsed prolog stages       : 0
;*      Minimum required memory pad   : 0 bytes
;*
;*      Minimum safe trip count       : 1
;*----------------------------------------------------------------------------*
$C$L1:    ; PIPED LOOP PROLOG
   [ B1]   SPLOOPW 9       ;18               ; (P) 
;** --------------------------------------------------------------------------*
$C$L2:    ; PIPED LOOP KERNEL
           NOP             4

           SPMASK          L1
||         MV      .L1     A4,A6

   [ B1]   LDW     .D1T1   *A6,A4            ; |31| (P) <0,5>  ^ 
           NOP             2

           SPMASK          S1,L2
||         MVK     .S1     0x80,A5           ; |29| 
||         MV      .L2     B1,B0

           NOP             1
           CMPEQ   .L1     A4,0,A7           ; |31| <0,10>  ^ 

           SUB     .L1     A5,1,A5           ; |29| <0,11> 
||         SUB     .S1     A7,1,A7           ; |31| <0,11>  ^ 

   [ B0]   MV      .L1     A7,A3             ; |29| <0,12> 
||         AND     .S1     A7,A5,A0          ; |29| <0,12>  ^ 

   [!A0]   ZERO    .L2     B1                ; |29| <0,13>  ^ 
           MV      .L2     B1,B0             ; |29| <0,14> Split a long life(pre-sched)
           NOP             1
           NOP             1
           SPKERNEL 0,0
;** --------------------------------------------------------------------------*
$C$L3:    ; PIPED LOOP EPILOG
;** --------------------------------------------------------------------------*
           RETNOP  .S2     B3,4              ; |34| 
           CMPEQ   .L1     A3,0,A4           ; |33| 
           ; BRANCH OCCURS {B3}              ; |34|

To me, the second case looks like a compiler bug, but I am not certain. I would like to minimize the amount of macros in my code, so I'd really prefer an inline function. Does anyone know how to safely achieve that?

Thanks a lot

Markus

over 11 years ago

pf over 11 years ago

TI__Expert 4930 points

Markus Moll said:

My understanding is that the volatile parameter inhibits inlining (although I honestly have no clue as to why that would be).

It has, and has been documented as such, for as long as I've worked here, but I'm also not sure why.

Markus Moll said:

Therefore, I made the parameter non-volatile and instead used a cast to make the volatile access:

There is a known problem with volatile accesses in which the symbol is not volatile and all the accesses are done through cast pointers. It's SDSCM00050849 and is fixed in 7.3.18, 7.4.12, and 8.0.1.

Markus Moll said:

Finally, if I replace this last inline function by a macro (which to my understanding should not make a difference!), it seems to work:

A macro is simply a textual substitution in the source code. There is no call, no call boundary, no inlining. That can make a considerable difference in behavior.
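
As a rough illustration of the textual substitution (a sketch, not checked against the TI preprocessor output), this is the reduced test case with the macro in place. The comment inside the loop shows what the condition looks like once the macro has been expanded; the macro argument is parenthesized as a general precaution.

```c
#include <stdint.h>

/* Macro form: the preprocessor pastes the cast and dereference
   directly into the caller, so there is no function left to inline. */
#define Get(src) (*(volatile uint32_t *)(src))

int Test(uint32_t *ptr)
{
  int i;
  for (i = 0; i < 128; ++i)
  {
    /* After preprocessing: if ((*(volatile uint32_t *)(ptr)) == 0) return 1; */
    if (Get(ptr) == 0) return 1;
  }
  return 0;
}
```

The optimizer therefore sees the volatile access as part of Test from the start, rather than as the body of a separate function that is merged in later.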

This version could also be subject to CQ50849, but it appears the conditions weren't quite met.

Markus Moll over 11 years ago in reply to pf

Expert 1830 points

Thank you for the quick reply!

pf said:

There is a known problem with volatile accesses in which the symbol is not volatile and all the accesses are done through cast pointers. It's SDSCM00050849 and is fixed in 7.3.18, 7.4.12, and 8.0.1.

Ah, good to know. I'm using 7.4.7, so that makes sense.

pf said:

This version could also be subject to CQ50849, but appears the conditions weren't quite met.

Before I meet the conditions next time, could you tell me where to find this issue? (I usually try the SDO-WEB database in ClearQuest, but it's not in there.)

In the meantime, I think I found a working solution, which is to define the function as

static inline uint32_t Get(uint32_t *ptr)
{
  volatile uint32_t *tmp = ptr;
  return *tmp;
}
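
Dropped into the reduced test case from the first post, this variant would look roughly as follows. This is only a sketch of the source; whether Get is inlined and the load stays inside the loop still has to be confirmed from the assembler output of the compiler version in use.

```c
#include <stdint.h>

/* Variant from this post: the volatile qualifier lives on a local
   pointer object rather than on the parameter or in a cast. */
static inline uint32_t Get(uint32_t *ptr)
{
  volatile uint32_t *tmp = ptr;
  return *tmp;
}

/* Reduced test case from the first post, unchanged. */
int Test(uint32_t *ptr)
{
  int i;
  for (i = 0; i < 128; ++i)
  {
    if (Get(ptr) == 0) return 1;
  }
  return 0;
}
```

A quick check is to look for the LDW inside the piped loop kernel, as in the macro listing, rather than ahead of the loop.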

Thanks again

Markus

Archaeologist over 11 years ago in reply to Markus Moll

TI__Guru* 84285 points

Markus Moll said:
Before I meet the conditions next time, could you tell me where to find this issue? (I usually try the SDO-WEB database in ClearQuest, but it's not in there.)

I'm sorry, SDSCM00050849 is not yet published. The defect's description of the conditions is almost exactly as described:

pf said:
There is a known problem with volatile accesses in which the symbol is not volatile and all the accesses are done through cast pointers.