Hi,
I have some code that basically does a mergesort-style merge of two input arrays (it's actually raw data not represented by a C data structure, which is why I rely on pointers):
int x;
for (x = 0; x < 4; x++) {
    short* restrict leftMin = &min0_0[x];
    short* restrict leftMinIdx = &minIdx0_0[x];
    short* restrict rightMin = &min0_1[x];
    short* restrict rightMinIdx = &minIdx0_1[x];
    short* restrict validMin = &validMin0[x];
    short* restrict validMinIdx = &validMin0Idx[x];
    int i;
    #pragma UNROLL(4)
    for (i = 0; i < 4; i++) {
        if (*leftMin < *rightMin) {
            *validMin = *leftMin;
            *validMinIdx = *leftMinIdx;
            leftMin += yStep;
            leftMinIdx += yStep;
        } else {
            *validMin = *rightMin;
            *validMinIdx = *rightMinIdx;
            rightMin += yStep;
            rightMinIdx += yStep;
        }
        validMin += valStep;
        validMinIdx += valStep;
    }
}
Compiling this code generates a software-pipelined loop (the outer one) that takes 25 cycles per iteration instead of the expected <20. The culprit seems to be the high Loop Carried Dependency Bound:
;* Loop found in file : ../main.c
;* Loop source line : 49
;* Loop opening brace source line : 49
;* Loop closing brace source line : 78
;* Known Minimum Trip Count : 4
;* Known Maximum Trip Count : 4
;* Known Max Trip Count Factor : 4
;* Loop Carried Dependency Bound(^) : 23
;* Unpartitioned Resource Bound : 16
;* Partitioned Resource Bound(*) : 16
;* Resource Partition:
;*                                  A-side   B-side
;* .L units                            2        2
;* .S units                            1        0
;* .D units                           16*      16*
;* .M units                            0        0
;* .X cross paths                     10        4
;* .T address paths                   16*      16*
;* Long read paths                     0        0
;* Long write paths                    0        0
;* Logical ops (.LS)                   0        0     (.L or .S unit)
;* Addition ops (.LSD)                22       16     (.L or .S or .D unit)
;* Bound(.L .S .LS)                    2        1
;* Bound(.L .S .D .LS .LSD)           14       12
;*
;* Searching for software pipeline schedule at ...
;* ii = 23 Did not find schedule
;* ii = 24 Did not find schedule
;* ii = 25 Schedule found with 2 iterations in parallel
However, what I don't understand is how the loop can be dependency bound. It does not actually pass any data to the next iteration, and the pointers are all annotated with the restrict keyword.
I would be grateful for a pointer to what's going wrong here.
Thanks
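In case it helps to reproduce the problem, the merge step for a single column (one value of x) boils down to roughly the following standalone sketch. The function name merge_column, the const qualifiers, the count parameter and the assert are assumptions added for illustration only; the real code works directly on the raw buffers shown above.

```c
#include <assert.h>

/* Hypothetical standalone version of the inner merge loop for one column.
 * The function name, const qualifiers and count parameter are assumptions
 * for illustration; they are not part of the original code. */
void merge_column(const short *restrict leftMin,
                  const short *restrict leftMinIdx,
                  const short *restrict rightMin,
                  const short *restrict rightMinIdx,
                  short *restrict validMin,
                  short *restrict validMinIdx,
                  int yStep, int valStep, int count)
{
    int i;

    /* Strides must be positive, otherwise the pointers never move forward. */
    assert(yStep > 0 && valStep > 0);

    for (i = 0; i < count; i++) {
        if (*leftMin < *rightMin) {
            /* Left element is smaller: emit it and advance the left pointers. */
            *validMin = *leftMin;
            *validMinIdx = *leftMinIdx;
            leftMin += yStep;
            leftMinIdx += yStep;
        } else {
            /* Otherwise emit the right element and advance the right pointers. */
            *validMin = *rightMin;
            *validMinIdx = *rightMinIdx;
            rightMin += yStep;
            rightMinIdx += yStep;
        }
        /* The output pointers advance on every step. */
        validMin += valStep;
        validMinIdx += valStep;
    }
}
```

For example, with left values {1, 4, 6, 9}, right values {2, 3, 7, 8}, both strides set to 1 and count set to 4, this writes 1, 2, 3, 4 to the output column.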