Hello,
I am using OpenMP 2.x on a KeyStone 1 C6678 device. I have modified the OpenMP example program hello_with_make to use an OpenMP parallel for pragma:
#pragma DATA_SECTION (count,"DDR3")
int count[8] = {0,0,0,0,0,0,0,0};
#define H_LEN 5
#define X_LEN 1000
int x[X_LEN+H_LEN-1];
int h[H_LEN];
int y[X_LEN+H_LEN-1];
int main (int argc, char *argv[]) {
    int h_len = H_LEN;
    int x_len = X_LEN;
    int conv_len = h_len + x_len - 1;
    test_variable |= 0x02;
    for(int i=0; i < X_LEN; i++) x[i] = i+1;
    for(int i=X_LEN; i < X_LEN + H_LEN; i++) x[i] = 0;
    for(int i = 0; i < H_LEN; i++) h[i] = H_LEN - i + 1;
    test_variable |= 0x04;
    timer_start[DNUM] = _itoll(TSCH, TSCL);
    #pragma omp parallel for
    for (int i = 0; i < conv_len; i++) {
        int sum = 0;
        for (int j = 0; j < h_len; j++) {
            count[DNUM]++;
            sum += h[j] * x[i + j];
        }
        y[i] = sum;
    }
    timer_end = _itoll(TSCH, TSCL);
    frame_cyc[DNUM] = timer_end - timer_start[DNUM];
    test_variable |= 0x08;
    return 0;
}
With OpenMP.numCores = 4, I get frame_cyc[0] = 0x121ab and count[0] = 0x139c, with all other counts 0.
With OpenMP.numCores = 8, I get frame_cyc[0] = 0x12296 and count[0] = 0x139c, again with all other counts 0. I have tried using private(i, j) and was able to get count[] values that made sense, but I'd like to use the "int i" and "int j" declarations inside the for statements, as given in the OpenMP tutorials.
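As a sanity check on these numbers: the loop nest runs conv_len * h_len = (1000 + 5 - 1) * 5 = 5020 inner iterations in total, and 5020 is 0x139c. So count[0] = 0x139c means core 0 executed every iteration in both configurations, which would also be consistent with the two frame_cyc values being nearly identical. A minimal sketch of that arithmetic (the helper name total_inner_iterations is hypothetical and not part of the TI example):

```c
#include <stdio.h>

/* Hypothetical helper: total number of inner-loop iterations executed by
 * the convolution loop nest above, summed over all cores. */
long total_inner_iterations(long x_len, long h_len)
{
    long conv_len = x_len + h_len - 1;  /* outer loop trip count */
    return conv_len * h_len;            /* inner loop runs h_len times per outer iteration */
}

/* Prints the expected total in hex so it can be compared with count[]. */
void print_expected_total(void)
{
    printf("expected total = 0x%lx\n", total_inner_iterations(1000, 5));
}
```

If the work were actually being shared, the count[] entries of the participating cores should add up to this total instead of all of it landing in count[0].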
I am changing the number of cores used in the cfg file. I looked at the thread
https://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/265543 but couldn't find a solution for this issue.
So far, I have never gotten any TSC-based profiling values that make sense.
Why does the 8-core run show a higher frame_cyc count than the 4-core run? What else can I try to debug this problem?
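One check that might help narrow this down is sketched below. It counts inner iterations per OpenMP thread using the standard omp_get_thread_num() call instead of DNUM, so it does not depend on how threads map to cores. This is plain C under stated assumptions and has not been verified on the C6678 toolchain: the function name count_iterations_per_thread, the MAX_THREADS bound, and the num_threads(8) clause are all illustrative choices. Under the OpenMP data-sharing rules, the loop variable of the parallel for and any variable declared inside the loop body are private, so no private clause is used here.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define H_LEN 5
#define X_LEN 1000
#define CONV_LEN (X_LEN + H_LEN - 1)
#define MAX_THREADS 8   /* assumption: one thread per C6678 core */

/* Returns the OpenMP thread number, or 0 when built without OpenMP. */
static int current_thread(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Runs a loop nest with the same shape as the convolution, records how many
 * inner iterations each thread executed, and returns the overall total. */
long count_iterations_per_thread(long per_thread[MAX_THREADS])
{
    long total = 0;

    for (int t = 0; t < MAX_THREADS; t++)
        per_thread[t] = 0;

    /* num_threads(8) caps the team size so the thread id stays below MAX_THREADS. */
    #pragma omp parallel for num_threads(8)
    for (int i = 0; i < CONV_LEN; i++) {
        int t = current_thread();        /* private: declared inside the loop */
        for (int j = 0; j < H_LEN; j++)
            per_thread[t]++;             /* each thread updates only its own slot */
    }

    for (int t = 0; t < MAX_THREADS; t++)
        total += per_thread[t];
    return total;
}
```

If every entry except per_thread[0] stays at 0, that would suggest the loop is not being divided among threads at all (for example, because the pragma is not being honored in the build), rather than a data-sharing problem with i and j.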
Thanks
Anish
Signalogic