7.4.6 C compiler producing code that triggers memory protection fault

One of the developers here has come across a problem with the following (simplified) code snippet:

while((Index = GetIndexInBitField(aEnabledBitfield)) < MAX_NUM_INDICES)
{
   pThisParams = &(CtrlParams->aParam[ Index ]);

   pThisParams->pCtrlAddr = GetBuffAddr(Absolute, 0);
   
   /* asm("   NOP 1"); */
   
   pThisParams->pCtrlAddr->Index = 10;
}

GetIndexInBitField() is an inline function that includes a loop. pThisParams->pCtrlAddr is NULL before this loop and GetBuffAddr() returns a non-NULL value. As shown, the last line triggers a memory protection fault with the FAR being 0x00000000. If the commented-out asm statement is included, the code runs normally.

Looking at the assembler produced in the faulting case, the issue seems clear: the code reads the contents of pThisParams->pCtrlAddr before that field has been written. With the asm statement in place, the compiler completes the write before the asm statement and does not perform the read until after it.

[...]

$C$RL70:   ; CALL OCCURS {GetBuffAddr} {0}  ; |1568|
$C$DW$L$ulc_FgrCtrl$74$E:
;** --------------------------------------------------------------------------*
$C$DW$L$ulc_FgrCtrl$75$B:
;          EXCLUSIVE CPU CYCLES: 14
           MVK     .S1     148,A3            ; |1568|
           MPY32   .M1     A3,A11,A5         ; |1568|
           MVKL    .S2     ControlParams,B4
           MVKH    .S2     ControlParams,B4
           MVKL    .S1     ControlParams,A31
           MVKH    .S1     ControlParams,A31
           ADD     .L2X    A5,B4,B4          ; |1572|
           ADDK    .S2     192,B4            ; |1572|
           LDW     .D2T2   *B4,B5            ; |1572|  <-- Loads pThisParams->pCtrlAddr from memory *** before it has been updated *** => this will read NULL (0x0)


           ADD     .L1     A31,A5,A3         ; |1568|
           MVK     .L2     10,B7             ; |1572|
           MVK     .S2     0x10,B30          ; |190|
           ADDAW   .D2     SP,110,B31

           STW     .D2T2   B7,*B5            ; |1572| <-- Writes to address read above => Memory protection fault
||         SUB     .L2     B30,2,B7
||         ADDK    .S1     192,A3            ; |1568|
||         MVC     .S2     B30,RILC

$C$DW$L$ulc_FgrCtrl$75$E:
;*----------------------------------------------------------------------------*
;*   SOFTWARE PIPELINE INFORMATION

[...]

;*----------------------------------------------------------------------------*
$C$L73:    ; PIPED LOOP PROLOG
;          EXCLUSIVE CPU CYCLES: 7
        .dwpsn  file "ctrl_utils.h",line 190,column 0,is_stmt,isa 0

           SPLOOPD 2       ;8                ; (P)
||         STW     .D1T1   A4,*A3            ; |1568| <-- Writes pointer returned by GetBuffAddr to memory... too late
||         ADD     .L2     8,B31,B5
||         MVC     .S2     B7,ILC

;** --------------------------------------------------------------------------*
$C$L74:    ; PIPED LOOP KERNEL

[...]


I've spent a couple of hours trying to reproduce this without requiring our codebase and have not been able to. However, if there are any questions that I can answer to help you resolve this issue, please let me know. In the meantime, I've been able to refactor the loop body to get the compiler to produce correct code...
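
A refactoring of this kind could look like the following sketch (an illustration only, not necessarily the change actually made). The pointer returned by GetBuffAddr() is held in a local, here given the hypothetical name pBuf, and the dereference goes through that local, so the generated code has no need to reload pThisParams->pCtrlAddr from memory. The struct layouts, the small MAX_NUM_INDICES, the InitParam() wrapper and the GetBuffAddr() stub are all placeholders so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUM_INDICES 4u   /* hypothetical small size for the sketch */

typedef struct { unsigned int Index; } CtrlBuf;                 /* placeholder type */
typedef struct { CtrlBuf *pCtrlAddr; } Param;                   /* placeholder type */
typedef struct { Param aParam[MAX_NUM_INDICES]; } CtrlParamsT;  /* placeholder type */

static CtrlBuf s_Buffers[MAX_NUM_INDICES];

/* Stub standing in for the real GetBuffAddr(), which lives in another file. */
static CtrlBuf *GetBuffAddr(unsigned int Absolute, unsigned int Elem)
{
    (void)Elem;
    return &s_Buffers[Absolute % MAX_NUM_INDICES];
}

/* Loop body rewritten so that the store through the returned pointer
 * does not depend on re-reading the field that was just written. */
static void InitParam(CtrlParamsT *CtrlParams, unsigned int Index,
                      unsigned int Absolute)
{
    Param   *pThisParams = &CtrlParams->aParam[Index];
    CtrlBuf *pBuf        = GetBuffAddr(Absolute, 0);

    assert(pBuf != NULL);            /* the fault address was 0, so check */
    pThisParams->pCtrlAddr = pBuf;   /* publish the pointer               */
    pBuf->Index = 10;                /* write via the local copy          */
}
```

Whether this avoids the problem on a given compiler version is something only the generated assembler can confirm.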

Regards,

SPH

  • Are any of these pointers declared with the "restrict" keyword?

    This does look like a bug, but without a test case, there's pretty much no hope of fixing it.

  • Hi,

    No, the restrict keyword is not used in this code.

    I understand that it is difficult to debug without a test case but I am happy to run the compiler with debug switches or run debug builds of the compiler, if that helps...

    Cheers,

    SPH.

  • This is such a narrow view into the test case that I'm not sure I can recommend a way to proceed.  At the very least, tell us your compilation options.  Try adding -s and --gen_opt_info=2.  --gen_opt_info should generate a long, detailed optimizer information file.  Try compiling at -o0, -o1, -o2, and -o3.  Does it work if you use --disable_software_pipeline?  Are there any interesting pointer casts or unions in your code?

  • Let's consider what's happening and what could cause it.

    The store to the field is happening after the load from the field.  That tells me that the compiler did not recognise that they're the same address.  In the simplified excerpt, there's no way that could happen except through some unknown compiler bug.

    What was simplified away?  Is the store of a different type than the load, thanks to a pointer cast?  Is the struct not really a struct, but rather a pointer to an array that has been cast to a pointer-to-struct?

    We have seen these kinds of things done before.  First, they're violations of the "strict aliasing rule."  Second, they're a great way to fake out the compiler.  One is not supposed to access the same object using two different types (with certain exceptions).

    George's question about "restrict" is in this same vein:  "restrict" indicates that two objects cannot be aliased, and the compiler will believe the annotation even when it causes trouble.
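
    To illustrate the kind of pattern meant here, the sketch below contrasts a type-punned store with a plain typed store. The names (Buf, Params, StorePunned, StoreTyped) are hypothetical and are not taken from the code in this thread. The punned variant is disabled with #if 0 because its behaviour is undefined; it is shown only to make clear what to look for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t Index; } Buf;      /* hypothetical buffer type */
typedef struct { Buf *pCtrlAddr; } Params;   /* hypothetical params type */

#if 0
/* Violates the strict aliasing rule: the field has type 'Buf *' but is
 * written through a 'uintptr_t' lvalue. An optimizer is entitled to assume
 * that this store does not alias the later 'Buf *' load, and may then
 * schedule the load ahead of the store. */
static void StorePunned(Params *p, Buf *b)
{
    *(uintptr_t *)&p->pCtrlAddr = (uintptr_t)b;
    p->pCtrlAddr->Index = 10u;
}
#endif

/* Well defined: the store and the load both use the declared field type,
 * so the compiler must treat them as accesses to the same object. */
static void StoreTyped(Params *p, Buf *b)
{
    assert(p != NULL && b != NULL);
    p->pCtrlAddr = b;
    p->pCtrlAddr->Index = 10u;
}
```

    If anything resembling the disabled variant turns up in the preprocessed source, that would be the first thing to fix.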

    Can you perhaps preprocess the file and then show exactly what this segment looks like?

  • Hi Guys,

    Thanks for your replies (and apologies for taking a while to find the time to progress this!)

    The issue is still present with -O2 and --disable_software_pipeline but not with -O1. --gen_opt_info=2 doesn't generate any information referring to this loop.

    The preprocessor output for this loop is:

          while((Index = GetIndexInBitField(aEnabledBitfield)) < (1000))
          {
             pThisParams = &(CtrlParams->aParam[ Index ]);
    
             
     
             pThisParams->pCtrlAddr = GetBuffAddr(Absolute, 0); 
    
             
    
             pThisParams->pCtrlAddr->Index = 10; 
          }
    

    In terms of types:

    • Index is a local unsigned int
    • aEnabledBitfield is a local array of unsigned int
    • pThisParams is a local pointer to a typedef, which is a complex structure (with nested structures)
    • CtrlParams is a local pointer to a different typedef, which is a complex structure (with nested structures). It points to a static structure within this file
    • GetBuffAddr() is a function defined outside this file
    • Absolute is a local unsigned int

    None of these use the restrict keyword.

    Note that GetIndexInBitField() is an inline function and is in the preprocessed output as (where U32 is an unsigned int):

    static inline U32 GetIndexInBitField(U32 Bitfield[])
    {
       U32 Index   = (1000);
       U32 WordIndex = 0;
       U32 WordFound = (((((1000)) + (32) - 1) / (32)));
       U32 IndexInWord = 0;
    
       #pragma UNROLL(2)
    
       for(WordIndex=0; WordIndex<(((((1000)) + (32) - 1) / (32))); WordIndex++)
       {
          if (Bitfield[WordIndex])
          {
             WordFound = _min2(WordIndex, WordFound);
          }
       }
    
       if (WordFound < (((((1000)) + (32) - 1) / (32))))
       {
          IndexInWord = (Bitfield[WordFound] & 0x1) ? 0 : _norm(_bitr(Bitfield[WordFound] & 0xFFFFFFFE)) +1;
          Index = IndexInWord + (WordFound << 5);
          (Bitfield[WordFound]) &= (~( 1 << ( IndexInWord ) ));
       }
    
       return Index;
    }
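
    For readers without the C6000 intrinsics to hand, here is a portable sketch of what this function appears to compute: it returns the index of the lowest set bit in the bitfield and clears that bit, or returns the limit (1000) when no bit is set. The name GetIndexInBitFieldPortable and the two macros are hypothetical. The sketch assumes that _bitr reverses the bits of a word, that _norm counts redundant sign bits (so _norm(_bitr(x)) + 1 is the number of trailing zeros of x when bit 0 is clear), and that _min2 acts as a plain minimum for these small values; these assumptions should be checked against the compiler manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NUM_INDICES 1000u
#define NUM_WORDS ((MAX_NUM_INDICES + 32u - 1u) / 32u)   /* 32 words */

/* Portable equivalent (under the assumptions stated above): find the first
 * non-zero word, locate its lowest set bit, clear it, return its index. */
static uint32_t GetIndexInBitFieldPortable(uint32_t Bitfield[])
{
    uint32_t WordIndex;

    assert(Bitfield != NULL);

    for (WordIndex = 0; WordIndex < NUM_WORDS; WordIndex++)
    {
        uint32_t Word = Bitfield[WordIndex];

        if (Word != 0u)
        {
            uint32_t IndexInWord = 0;

            /* Count trailing zeros; terminates because Word is non-zero. */
            while ((Word & 1u) == 0u)
            {
                Word >>= 1;
                IndexInWord++;
            }

            /* Side effect: the caller's array is modified here. */
            Bitfield[WordIndex] &= ~(1u << IndexInWord);
            return IndexInWord + (WordIndex << 5);
        }
    }

    return MAX_NUM_INDICES;   /* no bit set */
}
```

    The point relevant to this thread is the side effect: every call modifies the array passed in, so the call cannot be treated as loop-invariant.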
    

    Note that when I said that the code was simplified, I meant that various more complex expressions have been replaced with constants (the 0 and the 10) but the issue remains. Unfortunately any attempt to simplify it further - so that it could reasonably be sent to TI - has failed.

    Thanks,

    SPH.

  • From studying your assembly code, it seems that the optimizer has hoisted both the call to GetIndexInBitField and the call to GetBuffAddr out of the loop, apparently believing that they are both loop-invariant pure functions.

    1. Can the variable aEnabledBitfield be modified anywhere in the loop you show?
    2. Does GetIndexInBitField depend on any state other than its argument aEnabledBitfield?
    3. Does GetIndexInBitField have any side effects?
    4. Can the variable Absolute be modified anywhere in the loop you show?
    5. Does GetBuffAddr depend on any state other than its argument Absolute?
    6. Does GetBuffAddr have any side effects?

  • From reading the definition of GetIndexInBitField you posted, it would appear that 2 is no, and 3 is yes.  I may have been reading that assembly code snippet incorrectly; it may be the case that the optimizer inlined two copies of GetIndexInBitField.

  • I still can't produce assembly code that looks like what you've shown.  Please post your complete compilation options.  Also, please add the compiler option -os and show us a bit more of the assembly code, from just before the call to GetBuffAddr to after the loop.

  • Hi Archaeologist,

    The command line being used is: cl6x -qq --gcc -O3 -mi200 -mv6600 -mt -mw -oi -pdr <various include paths> <various defines> --abi=eabi -k -fr <temp dir> -fs <temp dir> -ft <temp dir> <filename>

    Regarding your questions above:

    1. Yes, as you spotted GetIndexInBitField returns the index of the least-significant set bit and clears that bit in its parameter (so aEnabledBitfield is modified on every call)
    2. As you spotted, no
    3. Again, as you spotted, yes (see 1.)
    4. No
    5. Yes, there is a circular buffer of arrays of structures (all in another file, as is the definition of this function) and this function looks up the circular buffer entry using the first parameter and, then, looks up the array element using the second parameter, returning a pointer. Note that all this state is private to another file.
    6. No

    Regarding point 5, I've made this refer to a function that I've declared but not defined (so no state dependence and no side effects) and the assembler produced is the same.

    OK, here is the assembler produced with -os. The call immediately before the while loop is a memcpy (to copy the bitfield that is iterated over, and destroyed by, the loop), so I've started from that call...

               CALLP   .S2     memcpy,B3
    ||         ADDAW   .D1X    SP,112,A4         ; |1566|
    ||         MVK     .S1     0x80,A6           ; |1566|

    $C$RL69:   ; CALL OCCURS {memcpy} {0}        ; |1566|
    ;** --------------------------------------------------------------------------*
    ;          EXCLUSIVE CPU CYCLES: 2
               MVK     .S2     0x10,B5           ; |219|

               SUB     .L2     B5,2,B5
    ||         ADDAW   .D2     SP,110,B4

    ;*----------------------------------------------------------------------------*
    ;*   SOFTWARE PIPELINE INFORMATION
    ;*
    ;*      Loop found in file               : fg_ctrl.c
    ;*      Loop inlined from                : ctrl_utils.h
    ;*      Loop source line                 : 219
    ;*      Loop opening brace source line   : 220
    ;*      Loop closing brace source line   : 225
    ;*      Loop Unroll Multiple             : 2x
    ;*      Known Minimum Trip Count         : 16                    
    ;*      Known Maximum Trip Count         : 16                    
    ;*      Known Max Trip Count Factor      : 16
    ;*      Loop Carried Dependency Bound(^) : 2
    ;*      Unpartitioned Resource Bound     : 1
    ;*      Partitioned Resource Bound(*)    : 2
    ;*      Resource Partition:
    ;*                                A-side   B-side
    ;*      .L units                     0        0     
    ;*      .S units                     0        0     
    ;*      .D units                     1        1     
    ;*      .M units                     0        0     
    ;*      .X cross paths               0        0     
    ;*      .T address paths             1        1     
    ;*      Long read paths              0        0     
    ;*      Long write paths             0        0     
    ;*      Logical  ops (.LS)           2        0     (.L or .S unit)
    ;*      Addition ops (.LSD)          2        0     (.L or .S or .D unit)
    ;*      Bound(.L .S .LS)             1        0     
    ;*      Bound(.L .S .D .LS .LSD)     2*       1     
    ;*
    ;*      Searching for software pipeline schedule at ...
    ;*         ii = 2  Schedule found with 4 iterations in parallel
    ;*
    ;*      Register Usage Table:
    ;*          +-----------------------------------------------------------------+
    ;*          |AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA|BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB|
    ;*          |00000000001111111111222222222233|00000000001111111111222222222233|
    ;*          |01234567890123456789012345678901|01234567890123456789012345678901|
    ;*          |--------------------------------+--------------------------------|
    ;*       0: |   ***                          |*   **                          |
    ;*       1: |   ***                          |*   **                          |
    ;*          +-----------------------------------------------------------------+
    ;*
    ;*      Done
    ;*
    ;*      Loop will be splooped
    ;*      Collapsed epilog stages       : 0
    ;*      Collapsed prolog stages       : 0
    ;*      Minimum required memory pad   : 0 bytes
    ;*
    ;*      For further improvement on this loop, try option -mh8
    ;*
    ;*      Minimum safe trip count       : 1 (after unrolling)
    ;*      Min. prof. trip count  (est.) : 2 (after unrolling)
    ;*
    ;*      Mem bank conflicts/iter(est.) : { min 0.000, est 0.000, max 0.000 }
    ;*      Mem bank perf. penalty (est.) : 0.0%
    ;*
    ;*
    ;*      Total cycles (est.)         : 6 + min_trip_cnt * 2 = 38        
    ;*----------------------------------------------------------------------------*
    ;*       SETUP CODE
    ;*
    ;*                  MV              B5,B4
    ;*                  ADD             12,B4,B4
    ;*                  ADD             8,B5,B5
    ;*
    ;*        SINGLE SCHEDULED ITERATION
    ;*
    ;*        $C$C1902:
    ;*   0              LDW     .D2T2   *B5++(8),B0       ; |223|
    ;*   1              LDW     .D2T2   *B4++(8),B0       ; |223|
    ;*   2              NOP             3
    ;*   5      [ B0]   MIN2    .L1     A3,A4,A4          ; |223|  ^
    ;*   6      [ B0]   MIN2    .L1     A5,A4,A4          ; |223|  ^
    ;*     ||           ADD     .D1     2,A5,A5           ; |219|
    ;*     ||           ADD     .S1     2,A3,A3           ; |219|
    ;*     ||           SPBR            $C$C1902
    ;*   7              NOP             1
    ;*   8              ; BRANCHCC OCCURS {$C$C1902}      ; |219|
    ;*----------------------------------------------------------------------------*
    $C$L71:    ; PIPED LOOP PROLOG
    ;          EXCLUSIVE CPU CYCLES: 7
        .dwpsn    file "ctrl_utils.h",line 219,column 0,is_stmt,isa 0

               SPLOOPD 2       ;8                ; (P)
    ||         MVC     .S2     B5,ILC
    ||         ADD     .L2     8,B4,B5

    ;** --------------------------------------------------------------------------*
    $C$L72:    ; PIPED LOOP KERNEL
    $C$DW$L$ulc_FgrCtrl$74$B:
        .dwpsn    file "ctrl_utils.h",line 220,column 0,is_stmt,isa 0
    ;          EXCLUSIVE CPU CYCLES: 2

               SPMASK          L2
    ||         ADD     .L2     12,B4,B4
    ||         LDW     .D2T2   *B5++(8),B0       ; |223| (P) <0,0>

               LDW     .D2T2   *B4++(8),B0       ; |223| (P) <0,1>
               NOP             2

               SPMASK          L1,S1
    ||         ZERO    .L1     A3                ; |219|
    ||         MV      .S1X    B10,A4            ; |212|

               SPMASK          S1
    ||         MVK     .S1     0x1,A5
    || [ B0]   MIN2    .L1     A3,A4,A4          ; |223| (P) <0,5>  ^

        .dwpsn    file "ctrl_utils.h",line 225,column 0,is_stmt,isa 0

               SPKERNEL 0,0
    ||         ADD     .S1     2,A3,A3           ; |219| <0,6>
    ||         ADD     .D1     2,A5,A5           ; |219| <0,6>
    || [ B0]   MIN2    .L1     A5,A4,A4          ; |223| <0,6>  ^

    $C$DW$L$ulc_FgrCtrl$74$E:
    ;** --------------------------------------------------------------------------*
    $C$L73:    ; PIPED LOOP EPILOG
    ;          EXCLUSIVE CPU CYCLES: 5
    ;** 210    -----------------------    Index = K$217;  // [109]
    ;** 219    -----------------------    K$226 = 32u;  // [109]
    ;**      -----------------------    goto g60;
               MVK     .S2     0x3e8,B4          ; |1518|
               MVK     .S1     32,A6             ; |227|
               NOP             1
               MVK     .S1     0x20,A10          ; |219|
               NOP             1
    ;** --------------------------------------------------------------------------*
    ;          EXCLUSIVE CPU CYCLES: 7

               MV      .S1X    SP,A3             ; |230| Register A/B partition copy
    ||         CMPLTU  .L1     A4,A6,A0          ; |227|

       [ A0]   MVK     .S1     448,A5            ; |230|
    ||         ADDAW   .D1     A3,A4,A3          ; |230|
    || [!A0]   B       .S2     $C$L74

               ADD     .L1     A5,A3,A5          ; |230|
       [ A0]   LDW     .D1T1   *A5,A3            ; |230|
               NOP             3
               ; BRANCHCC OCCURS {$C$L74}  
    ;** --------------------------------------------------------------------------*
    ;          EXCLUSIVE CPU CYCLES: 9
    ;**    -----------------------g59:
    ;** 230    -----------------------    C$205 = &aEnabledBitField[WordFound];  // [109]
    ;** 230    -----------------------    C$206 = *C$205;  // [109]
    ;** 230    -----------------------    (C$206&1u) ? (C$207 = 0u) : (C$207 = _norm((int)_bitr(C$206&0xfffffffeu))+1u);  // [109]
    ;** 230    -----------------------    Index = (WordFound<<5)+C$207;  // [109]
    ;** 231    -----------------------    *C$205 = ~(1u<<C$207)&C$206;  // [109]
    ;** 230    -----------------------    K$226 = 32u;  // [109]

               MVK     .L2     1,B4              ; |231|
    ||         SHL     .S2X    A4,5,B31          ; |230|

               AND     .L1     -2,A3,A6          ; |230|
               BITR    .M1     A6,A6             ; |230|
               AND     .L1     1,A3,A0           ; |230|
               NORM    .L1     A6,A6             ; |230|

       [!A0]   ADD     .L1     1,A6,A6           ; |230|
    || [ A0]   ZERO    .S1     A6                ; |230|

               SHL     .S1X    B4,A6,A7          ; |231|

               ANDN    .L1     A3,A7,A3          ; |231|
    ||         ADD     .L2X    A6,B31,B4         ; |230|

               STW     .D1T1   A3,*A5            ; |231|
    ;** --------------------------------------------------------------------------*
    $C$L74:    
    ;          EXCLUSIVE CPU CYCLES: 7
    ;**    -----------------------g60:
    ;** 234    -----------------------    Index = Index;  // [109]
    ;* 1568    -----------------------    if ( Index > K$269 ) goto g68;
    ;**      -----------------------    #pragma LOOP_FLAGS(5120u)
               CMPGTU  .L2     B4,B12,B0         ; |1568|
       [ B0]   BNOP    .S1     $C$L79,4          ; |1568|
               MV      .L1X    B4,A11            ; |234|
               ; BRANCHCC OCCURS {$C$L79}        ; |1568|
    ;** --------------------------------------------------------------------------*
    ;**   BEGIN LOOP $C$L75
    ;** --------------------------------------------------------------------------*
    $C$L75:    
    $C$DW$L$ulc_FgrCtrl$79$B:
    ;          EXCLUSIVE CPU CYCLES: 6
    ;**    -----------------------g62:
    ;* 1574    -----------------------    C$204 = Index*148;
    ;* 1574    -----------------------    *(C$204+(struct $fake35 **)K$180+192) = GetBuffAddr(Absolute, 0u);
    ;* 1578    -----------------------    (**((struct $fake36 *)K$180+C$204+192)).Index = 10u;
    ;** 212    -----------------------    WordFound = K$226;  // [109]
    ;** 219    -----------------------    L$4 = 16;  // [109]
    ;**      -----------------------    U$381 = 1u;
    ;**      -----------------------    U$378 = &aEnabledBitField[-2];
    ;** 219    -----------------------    WordIndex = 0u;  // [109]
    ;**      -----------------------    #pragma MUST_ITERATE(16, 16, 16)
    ;**      -----------------------    #pragma UNROLL(1)
    ;**      -----------------------    // LOOP BELOW UNROLLED BY FACTOR(2)
    ;**      -----------------------    #pragma LOOP_FLAGS(4099u)
    ;**    -----------------------g63:
    ;** 223    -----------------------    (*(U$378 += 2)) ? (WordFound = _min2((int)WordIndex, (int)WordFound)) : WordFound;  // [109]
    ;** 223    -----------------------    (U$378[1]) ? (WordFound = _min2((int)U$381, (int)WordFound)) : WordFound;  // [109]
    ;** 219    -----------------------    U$381 += 2u;  // [109]
    ;** 219    -----------------------    WordIndex += 2u;  // [109]
    ;** 219    -----------------------    if ( !__builtin_expect((long)!(L$4 = L$4-1), 0L) ) goto g63;  // [109]
    ;** 227    -----------------------    if ( WordFound < 32u ) goto g66;  // [109]
    $C$DW$657    .dwtag  DW_TAG_TI_branch
        .dwattr $C$DW$657, DW_AT_low_pc(0x00)
        .dwattr $C$DW$657, DW_AT_name("GetBuffAddr")
        .dwattr $C$DW$657, DW_AT_TI_call
               CALL    .S1     GetBuffAddr       ; |1574|
               ADDKPC  .S2     $C$RL70,B3,3      ; |1574|

               ZERO    .L2     B4                ; |1574|
    ||         MV      .L1X    B11,A4            ; |1574|

    $C$RL70:   ; CALL OCCURS {GetBuffAddr} {0}   ; |1574|
    $C$DW$L$ulc_FgrCtrl$79$E:
    ;** --------------------------------------------------------------------------*
    $C$DW$L$ulc_FgrCtrl$80$B:
    ;          EXCLUSIVE CPU CYCLES: 14
               MVK     .S1     148,A3            ; |1574|
               MPY32   .M1     A3,A11,A5         ; |1574|
               MVKL    .S2     s_ControlParams,B4
               MVKH    .S2     s_ControlParams,B4
               MVKL    .S1     s_ControlParams,A31
               MVKH    .S1     s_ControlParams,A31
               ADD     .L2X    A5,B4,B4          ; |1578|
               ADDK    .S2     192,B4            ; |1578|
               LDW     .D2T2   *B4,B5            ; |1578|
               ADD     .L1     A31,A5,A3         ; |1574|
               MVK     .L2     10,B7             ; |1578|
               MVK     .S2     0x10,B30          ; |219|
               ADDAW   .D2     SP,110,B31

               STW     .D2T2   B7,*B5            ; |1578|
    ||         SUB     .L2     B30,2,B7
    ||         ADDK    .S1     192,A3            ; |1574|
    ||         MVC     .S2     B30,RILC

    $C$DW$L$ulc_FgrCtrl$80$E:
    ;*----------------------------------------------------------------------------*
    ;*   SOFTWARE PIPELINE INFORMATION
    ;*
    ;*      Loop found in file               : fg_ctrl.c
    ;*      Loop inlined from                : ctrl_utils.h
    ;*      Loop source line                 : 219
    ;*      Loop opening brace source line   : 220
    ;*      Loop closing brace source line   : 225
    ;*      Loop Unroll Multiple             : 2x
    ;*      Known Minimum Trip Count         : 16                    
    ;*      Known Maximum Trip Count         : 16                    
    ;*      Known Max Trip Count Factor      : 16
    ;*      Loop Carried Dependency Bound(^) : 2
    ;*      Unpartitioned Resource Bound     : 1
    ;*      Partitioned Resource Bound(*)    : 2
    ;*      Resource Partition:
    ;*                                A-side   B-side
    ;*      .L units                     0        0     
    ;*      .S units                     0        0     
    ;*      .D units                     1        1     
    ;*      .M units                     0        0     
    ;*      .X cross paths               0        0     
    ;*      .T address paths             1        1     
    ;*      Long read paths              0        0     
    ;*      Long write paths             0        0     
    ;*      Logical  ops (.LS)           2        0     (.L or .S unit)
    ;*      Addition ops (.LSD)          2        0     (.L or .S or .D unit)
    ;*      Bound(.L .S .LS)             1        0     
    ;*      Bound(.L .S .D .LS .LSD)     2*       1     
    ;*
    ;*      Searching for software pipeline schedule at ...
    ;*         ii = 2  Schedule found with 4 iterations in parallel
    ;*
    ;*      Register Usage Table:
    ;*          +-----------------------------------------------------------------+
    ;*          |AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA|BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB|
    ;*          |00000000001111111111222222222233|00000000001111111111222222222233|
    ;*          |01234567890123456789012345678901|01234567890123456789012345678901|
    ;*          |--------------------------------+--------------------------------|
    ;*       0: |   ***                          |*   **                          |
    ;*       1: |   ***                          |*   **                          |
    ;*          +-----------------------------------------------------------------+
    ;*
    ;*      Done
    ;*
    ;*      Loop will be splooped
    ;*      Collapsed epilog stages       : 0
    ;*      Collapsed prolog stages       : 0
    ;*      Minimum required memory pad   : 0 bytes
    ;*
    ;*      For further improvement on this loop, try option -mh8
    ;*
    ;*      Minimum safe trip count       : 1 (after unrolling)
    ;*      Min. prof. trip count  (est.) : 2 (after unrolling)
    ;*
    ;*      Mem bank conflicts/iter(est.) : { min 0.000, est 0.000, max 0.000 }
    ;*      Mem bank perf. penalty (est.) : 0.0%
    ;*
    ;*
    ;*      Total cycles (est.)         : 6 + min_trip_cnt * 2 = 38        
    ;*----------------------------------------------------------------------------*
    ;*       SETUP CODE
    ;*
    ;*                  MV              B5,B4
    ;*                  ADD             12,B4,B4
    ;*                  ADD             8,B5,B5
    ;*
    ;*        SINGLE SCHEDULED ITERATION
    ;*
    ;*        $C$C1866:
    ;*   0              LDW     .D2T2   *B5++(8),B0       ; |223|
    ;*   1              LDW     .D2T2   *B4++(8),B0       ; |223|
    ;*   2              NOP             3
    ;*   5      [ B0]   MIN2    .L1     A3,A4,A4          ; |223|  ^
    ;*   6      [ B0]   MIN2    .L1     A5,A4,A4          ; |223|  ^
    ;*     ||           ADD     .D1     2,A5,A5           ; |219|
    ;*     ||           ADD     .S1     2,A3,A3           ; |219|
    ;*     ||           SPBR            $C$C1866
    ;*   7              NOP             1
    ;*   8              ; BRANCHCC OCCURS {$C$C1866}      ; |219|
    ;*----------------------------------------------------------------------------*
    $C$L76:    ; PIPED LOOP PROLOG
    ;          EXCLUSIVE CPU CYCLES: 7
        .dwpsn    file "ctrl_utils.h",line 219,column 0,is_stmt,isa 0

               SPLOOPD 2       ;8                ; (P)
    ||         STW     .D1T1   A4,*A3            ; |1574|
    ||         ADD     .L2     8,B31,B5
    ||         MVC     .S2     B7,ILC

    ;** --------------------------------------------------------------------------*
    $C$L77:    ; PIPED LOOP KERNEL
    $C$DW$L$ulc_FgrCtrl$82$B:
        .dwpsn    file "ctrl_utils.h",line 220,column 0,is_stmt,isa 0
    ;          EXCLUSIVE CPU CYCLES: 2

               SPMASK          L2
    ||         ADD     .L2     12,B31,B4
    ||         LDW     .D2T2   *B5++(8),B0       ; |223| (P) <0,0>

               LDW     .D2T2   *B4++(8),B0       ; |223| (P) <0,1>
               NOP             2

               SPMASK          L1,S1
    ||         ZERO    .L1     A3                ; |219|
    ||         MV      .S1     A10,A4            ; |212|

               SPMASK          S1
    ||         MVK     .S1     0x1,A5
    || [ B0]   MIN2    .L1     A3,A4,A4          ; |223| (P) <0,5>  ^

        .dwpsn    file "ctrl_utils.h",line 225,column 0,is_stmt,isa 0

               SPKERNEL 0,0
    ||         ADD     .S1     2,A3,A3           ; |219| <0,6>
    ||         ADD     .D1     2,A5,A5           ; |219| <0,6>
    || [ B0]   MIN2    .L1     A5,A4,A4          ; |223| <0,6>  ^

    $C$DW$L$ulc_FgrCtrl$82$E:
    ;** --------------------------------------------------------------------------*
    $C$L78:    ; PIPED LOOP EPILOG
    ;          EXCLUSIVE CPU CYCLES: 5
    ;** 210    -----------------------    Index = K$217;  // [109]
    ;** 219    -----------------------    K$226 = 32u;  // [109]
    ;**      -----------------------    goto g67;
    ;**    -----------------------g66:
    ;** 230    -----------------------    C$201 = &aEnabledBitField[WordFound];  // [109]
    ;** 230    -----------------------    C$202 = *C$201;  // [109]
    ;** 230    -----------------------    (C$202&1u) ? (C$203 = 0u) : (C$203 = _norm((int)_bitr(C$202&0xfffffffeu))+1u);  // [109]
    ;** 230    -----------------------    Index = (WordFound<<5)+C$203;  // [109]
    ;** 231    -----------------------    *C$201 = ~(1u<<C$203)&C$202;  // [109]
    ;** 230    -----------------------    K$226 = 32u;  // [109]
    ;**    -----------------------g67:
    ;** 234    -----------------------    Index = Index;  // [109]
    ;* 1568    -----------------------    if ( Index < K$217 ) goto g62;
               MVK     .S2     32,B7             ; |227|

               MVK     .S1     0x3e8,A11         ; |1518|
    ||         MVK     .S2     448,B5            ; |230|

               NOP             1
               MVK     .S1     0x3e8,A30         ; |1518|
               NOP             1
    ;** --------------------------------------------------------------------------*
    $C$DW$L$ulc_FgrCtrl$84$B:
    ;          EXCLUSIVE CPU CYCLES: 26
               MVK     .L1     1,A3              ; |231|
               MV      .L2X    A4,B6

               ADDAW   .D2     SP,B6,B4          ; |230|
    ||         CMPLTU  .L2     B6,B7,B1          ; |227|

               ADD     .L2     B5,B4,B4          ; |230|
       [ B1]   LDW     .D2T2   *B4,B13           ; |230|
               SHL     .S1X    B6,5,A31          ; |230|
       [!B1]   ZERO    .L2     B2                ; |230|
       [ B1]   MVK     .S1     0x20,A10          ; |230|
       [!B1]   MVK     .S1     0x20,A10          ; |219|
       [ B1]   AND     .L2     1,B13,B2          ; |230|
               MV      .L2     B2,B0             ; |230|
       [!B1]   MVK     .L2     0x1,B0            ; |230|
       [!B0]   AND     .L2     -2,B13,B5         ; |230|
       [!B0]   BITR    .M2     B5,B5             ; |230|
       [ B2]   ZERO    .L2     B10               ; |230|
       [!B0]   NORM    .L2     B5,B5             ; |230|
       [!B0]   ADD     .L2     1,B5,B10          ; |230|
       [ B1]   SHL     .S2X    A3,B10,B5         ; |231|

       [ B1]   ADD     .L1X    B10,A31,A11       ; |230|
    || [ B1]   ANDN    .L2     B13,B5,B5         ; |231|

               CMPLTU  .L1     A11,A30,A0        ; |1568|
    || [ B1]   STW     .D2T2   B5,*B4            ; |231|

       [ A0]   BNOP    .S1     $C$L75,5          ; |1568|
               ; BRANCHCC OCCURS {$C$L75}        ; |1568|
    $C$DW$L$ulc_FgrCtrl$84$E:
    ;** --------------------------------------------------------------------------*

    Thanks for your help,

    SPH.

  • At this time, I can only believe that pf's earlier post is right on the nose.  The compiler is clearly not able to determine that the load and store are to the same address.  Comparing the optimizer comments from your version:

    *(C$204+(struct $fake35 **)K$180+192) = GetBuffAddr(Absolute, 0u);
    (**((struct $fake36 *)K$180+C$204+192)).Index = 10u;

    and from my cobbled-together test case:

    ((struct $fake0 **)K$56)[37*Index+83] = C$4 = GetBuffAddr(0u, 0u);
    (*C$4).Index = 10u;

    shows that the optimizer was able to figure it out in my case, but not in yours.  Note the existence of "struct $fake35" and "struct $fake36."  These are automatically-generated names for a struct used in a typedef. Notice that in your case K$180 is cast to (struct $fake35 **) once and to (struct $fake36 *) once.  This is not necessarily wrong, but it is suspect.  Also note that those expressions add (C$204+192) to differently-typed pointers, which seems suspicious.  Please show me the optimizer comments where C$204 and K$180 are assigned.

    I suspect the difference is in the definition of the types involved.  Look carefully at those types and ask these questions:

    • Is there any pointer casting going on, perhaps an implicit pointer cast?
    • Is there any casting from an array to a pointer, or vice versa?
    • Are there any interesting keywords in use such as const, restrict, packed, etc?

    For grins, here is my source code:

    #include <string.h>   /* memset */
    
    typedef unsigned int U32;
    
    #define SZ (((((1000)) + (32) - 1) / (32)))
    
    static inline U32 GetIndexInBitField(U32 Bitfield[])
    {
        U32 Index   = (1000);
        U32 WordIndex = 0;
        U32 WordFound = SZ;
        U32 IndexInWord = 0;
    #pragma UNROLL(2)
        for(WordIndex=0; WordIndex < SZ; WordIndex++)
        {
            if (Bitfield[WordIndex])
            {
                WordFound = _min2(WordIndex, WordFound);
            }
        }
        if (WordFound < SZ)
        {
            IndexInWord = (Bitfield[WordFound] & 0x1) ? 0 : _norm(_bitr(Bitfield[WordFound] & 0xFFFFFFFE)) +1;
            Index = IndexInWord + (WordFound << 5);
            (Bitfield[WordFound]) &= (~( 1 << ( IndexInWord ) ));
        }
        return Index;
    }
    
    typedef struct
    {
        U32 Index;
    } A;
    
    typedef struct
    {
        char d[144];
        A *pCtrlAddr;
    } Parm;
    
    typedef struct
    {
        char d[185];
        Parm aParam[1000];
    } C;
    
    A *GetBuffAddr(U32, U32);
    
    C ControlParams;
    
    void func()
    {
        U32 Index = 0;
        U32 Absolute = 0;
        U32 aEnabledBitfield[32];
    
        Parm *pThisParams;
    
        memset(aEnabledBitfield, -1, sizeof(aEnabledBitfield));
    
        C *CtrlParams = &ControlParams;
    
        while((Index = GetIndexInBitField(aEnabledBitfield)) < (1000))
        {
            pThisParams = &(CtrlParams->aParam[ Index ]);
            pThisParams->pCtrlAddr = GetBuffAddr(Absolute, 0);
            pThisParams->pCtrlAddr->Index = 10;
        }
    }
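
    The test case above relies on the C6000 intrinsics _bitr and _norm, so it will not build on a host compiler as-is. A rough sketch of portable stand-ins, assuming _bitr reverses the 32 bits of its argument and _norm returns the number of redundant sign bits (so _norm(1) is 30); the names BitrModel, NormModel and LowestSetBit are hypothetical and not part of the original code:

```c
#include <stdint.h>

/* Hypothetical model of _bitr: reverse the order of the 32 bits. */
static uint32_t BitrModel(uint32_t x)
{
    uint32_t r = 0;
    int i;
    for (i = 0; i < 32; ++i)
    {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Hypothetical model of _norm: count the bits below the sign bit
   that are equal to the sign bit (redundant sign bits). */
static uint32_t NormModel(uint32_t x)
{
    uint32_t sign = x >> 31;
    uint32_t n = 0;
    int bit;
    for (bit = 30; bit >= 0 && ((x >> bit) & 1u) == sign; --bit)
    {
        ++n;
    }
    return n;
}

/* Same expression as IndexInWord in GetIndexInBitField():
   index of the lowest set bit of a non-zero word. */
static uint32_t LowestSetBit(uint32_t word)
{
    return (word & 0x1u) ? 0u
                         : NormModel(BitrModel(word & 0xFFFFFFFEu)) + 1u;
}
```

    With these models the expression reduces to "find the lowest set bit", which is what lets the loop hand out each enabled index exactly once.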

  • OK, first things first. Here is where K$180 gets set; C$204 gets set just before it is used in my last post:

    ;**    -----------------------g27:
    ;* 1480    -----------------------    C$214 = &s_ControlParams;
    ;* 1480    -----------------------    (*(struct $fake89 *)C$214).Param.RetainedRsp = GetDataRsp(Absolute, C$214+12);
    ;* 1481    -----------------------    U$121 = s_SemiStaticParam.IsInfoPresent;
    ;* 1480    -----------------------    K$180 = C$214;
    ;* 1481    -----------------------    if ( !U$121 ) goto g30;
    ;* 1481    -----------------------    if ( !s_ControlParams.Params.IsRspPresent ) goto g30;
    ;* 1483    -----------------------    s_ControlParams.pDataReadyRsp = (struct $fake5 *)s_ControlParams.Params.RetainedRsp+672;
    ;* 1484    -----------------------    goto g31;

    I tried adding explicit tags to the structures within the typedefs so that the struct pointed to by pThisParams has "tag_1" and pThisParams->pCtrlParams has "tag_2", and I get the following:

    ;* 1574    -----------------------    C$204 = Index*148;
    ;* 1574    -----------------------    *(C$204+(struct ag_2 **)K$180+192) = GetBuffAddr(Absolute, 0u);
    ;* 1578    -----------------------    (**((struct ag_1 *)K$180+C$204+192)).Index = 10u;

    Regarding "interesting" keywords, const, restrict and packed are not used in this code; though CtrlParams is pointing to a typedef'ed struct declared "static". There are no explicit casts and, as far as I can see, all types are consistent so I do not believe that there are any implicit casts taking place.

    Cheers,

    SPH.

  • Before looking at this too deeply, are the optimizer comments verbatim from the assembly file?   The type "struct ag_2" should be "struct tag_2."  If it is verbatim, it's wrong, but probably not a bug.

  • Yes, I noticed that too: my first cut had struct tags that differed only in the first letter, so I couldn't tell them apart! It appears that the first letter is omitted in the optimizer comment. Those 3 lines were definitely cut-and-pasted directly from the assembler...

    SPH.

  • For the record, the name truncation is only a problem with the optimizer comments; it doesn't affect the execution, and I've been told it's fixed in later branches.

  • This issue is now being tracked as SDSCM00050359.

  • SDSCM00050359 turns out to be a duplicate of SDSCM00049647, which is fixed in the upcoming 7.4.9 release.  I do not know the schedule for that release.

    The only workaround I know of is to compile at a lower optimisation level, -o0 or -o1.  The problem is an artifact of using a very large struct;  the compiler modifies some accesses, particularly x->a.b, to create a base pointer with manageable offsets.  In so doing, it sometimes fails to clean up properly and thereby misses an alias.
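
    If lowering the level for the whole build is not acceptable, it may be possible to lower it for just the affected function. A sketch only, assuming the compiler version in use supports the FUNCTION_OPTIONS pragma and the --opt_level option as described in the TI compiler user's guide (this has not been checked against 7.4.6); ProcessEntry is a hypothetical stand-in for the function that contains the faulting loop:

```c
typedef unsigned int U32;

typedef struct
{
    U32 Index;
} A;

#ifdef __TI_COMPILER_VERSION__
/* Assumption: FUNCTION_OPTIONS is available in this compiler version.
   The pragma has to appear before the function it names. */
#pragma FUNCTION_OPTIONS(ProcessEntry, "--opt_level=1")
#endif

/* Hypothetical stand-in: store a pointer, then use it via memory. */
void ProcessEntry(A **ppCtrlAddr, A *pBuffer)
{
    *ppCtrlAddr = pBuffer;         /* write the pointer first       */
    (*ppCtrlAddr)->Index = 10;     /* then read it back and use it  */
}
```

    The guard on __TI_COMPILER_VERSION__ keeps the file buildable with other compilers, which would otherwise see an unknown pragma.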

    It might be helpful to introduce a temporary variable, as in "t = x->a;  ... t.b", but I haven't tried that;  I don't have a compilable test case to experiment with.

  • Thanks. As I said in my original post, I've been able to refactor the code to work around this problem (by using a temporary variable for the return value from GetBuffAddr()), so we aren't desperate for a fix.

    Thanks,

    SPH.
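
    For anyone landing on this thread later, the refactoring described above might look roughly like the sketch below. The types are taken from the test case earlier in the thread; the GetBuffAddr() stub, s_Buffer and InitParam are hypothetical and only there so the example is self-contained:

```c
#include <stddef.h>

typedef unsigned int U32;

typedef struct
{
    U32 Index;
} A;

typedef struct
{
    char d[144];
    A *pCtrlAddr;
} Parm;

/* Hypothetical stub standing in for the real GetBuffAddr(). */
static A s_Buffer;

static A *GetBuffAddr(U32 Absolute, U32 Offset)
{
    (void)Absolute;
    (void)Offset;
    return &s_Buffer;
}

/* Keep the returned pointer in a local and dereference the local,
   so no reload of pThisParams->pCtrlAddr from memory is needed. */
static void InitParam(Parm *pThisParams, U32 Absolute)
{
    A *pCtrlAddr = GetBuffAddr(Absolute, 0);

    pThisParams->pCtrlAddr = pCtrlAddr;
    pCtrlAddr->Index = 10;
}
```

    This mirrors what the optimizer did by itself in the working test case, where the return value was kept in C$4 and used for both the store and the dereference.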