Hello,
I am starting to optimize some C source code for speed and have been trying to understand the software pipeline feedback that the compiler emits in the generated assembly.
I am getting a loop-carried dependency bound much higher than I expected. The pointer parameter is restrict-qualified, yet the reported loop-carried dependency bound is 4, where I would expect zero.
The function in question takes a pointer to a floating-point array and the array's size, and returns the floating-point sum of the array's elements.
float array_sum ( const float * restrict In ,   /* Input array */
                  unsigned Size )               /* size */
{
    int i = 0;        /* Loop Counter */
    float x = 0.0;    /* Running sum of array */

    _nassert( (int) In % 8 == 0);   /* Input pointer is 64-bit aligned */
    #pragma MUST_ITERATE (2, )      /* Loop must execute at least twice */
    for ( i = 0; i < Size; i++ )
    {
        x += In[i];
    }
    return x;
}
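In case anyone wants to check the arithmetic off-target, here is a minimal host-compilable sketch of the same function. It assumes the TI-specific `_nassert` and `MUST_ITERATE` hints can simply be dropped, since they only inform the optimizer and do not change the result. The name `array_sum_ref` is made up for this sketch and is not part of my project.

```c
/* Host-side reference for array_sum: same arithmetic as the original,
   with the TI-specific optimizer hints removed (assumption: they do not
   affect the computed value). array_sum_ref is a hypothetical name. */
float array_sum_ref(const float *restrict in, unsigned size)
{
    float x = 0.0f;   /* running sum of the array */
    unsigned i;       /* loop counter, unsigned to match size */

    for (i = 0; i < size; i++) {
        x += in[i];   /* each add needs the previous value of x */
    }
    return x;
}
```

Calling it on a small array of exactly representable values, such as {1, 2, 3, 4}, gives a quick sanity check before comparing against the DSP build.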
;*----------------------------------------------------------------------------*
;* SOFTWARE PIPELINE INFORMATION
;*
;* Loop found in file : ../foo.c
;* Loop source line : 300
;* Loop opening brace source line : 301
;* Loop closing brace source line : 303
;* Known Minimum Trip Count : 2
;* Known Max Trip Count Factor : 1
;* Loop Carried Dependency Bound(^) : 4
;* Unpartitioned Resource Bound : 1
;* Partitioned Resource Bound(*) : 1
;* Resource Partition:
;* A-side B-side
;* .L units 0 0
;* .S units 0 0
;* .D units 1* 0
;* .M units 0 0
;* .X cross paths 0 0
;* .T address paths 1* 0
;* Long read paths 0 0
;* Long write paths 0 0
;* Logical ops (.LS) 1 0 (.L or .S unit)
;* Addition ops (.LSD) 0 0 (.L or .S or .D unit)
;* Bound(.L .S .LS) 1* 0
;* Bound(.L .S .D .LS .LSD) 1* 0
;*
;* Searching for software pipeline schedule at ...
;* ii = 4 Schedule found with 2 iterations in parallel
;*
;* Register Usage Table:
;* +-----------------------------------------------------------------+
;* |AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA|BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB|
;* |00000000001111111111222222222233|00000000001111111111222222222233|
;* |01234567890123456789012345678901|01234567890123456789012345678901|
;* |--------------------------------+--------------------------------|
;* 0: | ** | |
;* 1: | *** | |
;* 2: | ** | |
;* 3: | ** | |
;* +-----------------------------------------------------------------+
;*
;* Done
;*
;* Loop will be splooped
;* Collapsed epilog stages : 0
;* Collapsed prolog stages : 0
;* Minimum required memory pad : 0 bytes
;*
;* Minimum safe trip count : 1
;* Min. prof. trip count (est.) : 3
;*
;* Mem bank conflicts/iter(est.) : { min 0.000, est 0.000, max 0.000 }
;* Mem bank perf. penalty (est.) : 0.0%
;*
;*
;* Total cycles (est.) : 4 + trip_cnt * 4
;*----------------------------------------------------------------------------*
Any suggestions?