I have a simple loop that the compiler cannot software-pipeline efficiently because of a high loop-carried dependency bound. I boiled the problem down to this very short demonstration loop:
void LoopCarriedDependency(float *restrict * restrict list)
{   int i;
    float * restrict p1;
    float * restrict p2;
    for (i=0; i<10; i++ )
    {   p1 = *list;
        p2 = *list++;
        *p2 = *p1 + 2;
    }
}
Of course, the compiler assumes that the list array could contain entries pointing to the same memory location, which creates a dependency, but I want to tell the compiler that this will never be the case. I put the restrict keyword wherever the compiler accepted it, but without success. How can I remove the dependency, knowing that all list entries point to different locations?
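To make the aliasing concern concrete, here is a minimal sketch of the case the compiler has to guard against. The function name add_two_to_each and the count parameter n are hypothetical, introduced only for this illustration; the loop body is the same load, add, store as in the demonstration loop, written without restrict.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: same load/add/store as the
 * demonstration loop, but with an explicit element count and no restrict. */
void add_two_to_each(float **list, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        float *p = list[i];  /* p1 and p2 are the same pointer in the original */
        *p = *p + 2.0f;      /* load through the entry, add 2, store back */
    }
}
```

If every entry of list points to a different float, each target is incremented once and the iterations are independent. If two entries point to the same float, that float is incremented twice, so the second load must see the first store. Unless the compiler can rule that out, it has to keep each iteration's load after the previous iteration's store, which appears to be the load, add, store chain marked with ^ in the listing.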
Here is the assembly output:
;*----------------------------------------------------------------------------*
;* SOFTWARE PIPELINE INFORMATION
;*
;* Loop found in file : ../test.c
;* Loop source line : 5
;* Loop opening brace source line : 6
;* Loop closing brace source line : 9
;* Known Minimum Trip Count : 10
;* Known Maximum Trip Count : 10
;* Known Max Trip Count Factor : 10
;* Loop Carried Dependency Bound(^) : 10
;* Unpartitioned Resource Bound : 2
;* Partitioned Resource Bound(*) : 3
;* Resource Partition:
;* A-side B-side
;* .L units 0 0
;* .S units 0 0
;* .D units 0 3*
;* .M units 0 0
;* .X cross paths 0 0
;* .T address paths 0 0
;* Logical ops (.LS) 0 1 (.L or .S unit)
;* Addition ops (.LSD) 0 0 (.L or .S or .D unit)
;* Bound(.L .S .LS) 0 1
;* Bound(.L .S .D .LS .LSD) 0 2
;*
;* Searching for software pipeline schedule at ...
;* ii = 10 Schedule found with 2 iterations in parallel
;*
;* Register Usage Table:
;* +-----------------------------------------------------------------+
;* |AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA|BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB|
;* |00000000001111111111222222222233|00000000001111111111222222222233|
;* |01234567890123456789012345678901|01234567890123456789012345678901|
;* |--------------------------------+--------------------------------|
;* 0: | | **** |
;* 1: | | *** |
;* 2: | | *** |
;* 3: | | *** |
;* 4: | | **** |
;* 5: | | *** |
;* 6: | | *** |
;* 7: | | *** |
;* 8: | | *** |
;* 9: | | *** |
;* +-----------------------------------------------------------------+
;*
;* Done
;*
;* Loop will be splooped
;* Collapsed epilog stages : 0
;* Collapsed prolog stages : 0
;* Minimum required memory pad : 0 bytes
;*
;* Minimum safe trip count : 1
;* Min. prof. trip count (est.) : 3
;*
;* Mem bank conflicts/iter(est.) : { min 0.000, est 0.000, max 0.000 }
;* Mem bank perf. penalty (est.) : 0.0%
;*
;*
;* Total cycles (est.) : 10 + min_trip_cnt * 10 = 110
;*----------------------------------------------------------------------------*
;* SINGLE SCHEDULED ITERATION
;*
;* $C$C104:
;* 0 LDW .D2T2 *B6++(4),B5 ; [B_D64P] |6|
;* 1 NOP 4 ; [A_L674]
;* 5 LDW .D2T2 *B5(0),B4 ; [B_D64P] |6| ^
;* 6 NOP 4 ; [A_L674]
;* 10 ADDSP .L2 B7,B4,B4 ; [B_L674] |6| ^
;* 11 NOP 3 ; [A_L674]
;* 14 STW .D2T2 B4,*B5(0) ; [B_D64P] |6| ^
;* || SPBR $C$C104 ; []
;* 15 NOP 5 ; [A_L674]
;* 20 ; BRANCHCC OCCURS {$C$C104} ; [] |5|
;*----------------------------------------------------------------------------*