OpenMP BenchMarking

Hello,

I am using OpenMP 2.x on a KeyStone I C6678 device. I have modified the OpenMP example program hello_with_make to use an OpenMP parallel for pragma:

#pragma DATA_SECTION (count,"DDR3")
int count[8] = {0,0,0,0,0,0,0,0};

#define H_LEN 5
#define X_LEN 1000 

int x[X_LEN+H_LEN-1]; int h[H_LEN]; int y[X_LEN+H_LEN-1];

int main (int argc, char *argv[]) {

int h_len = H_LEN; int x_len = X_LEN; int conv_len = h_len + x_len - 1;

test_variable |= 0x02;

for(int i=0; i < X_LEN; i++)   x[i] = i+1;
for(int i=X_LEN; i < X_LEN + H_LEN; i++)  x[i] = 0;
for(int i = 0; i < H_LEN; i++) h[i] = H_LEN - i + 1;

test_variable |= 0x04;

timer_start[DNUM] = _itoll(TSCH, TSCL);

#pragma omp parallel for
for (int i = 0; i < conv_len; i++) {
       int sum = 0;
       for (int j = 0; j < h_len; j++) {
              count[DNUM]++;
              sum += h[j] * x[i + j];
      }
      y[i] = sum;
}

timer_end = _itoll(TSCH, TSCL);
frame_cyc[DNUM] = timer_end - timer_start[DNUM];

test_variable |= 0x08;

return 0;
}

With OpenMP.numCores = 4 I am getting frame_cyc[0] as 0x121ab and count[0] as 0x139c and all other counts as 0.

With OpenMP.numCores = 8, frame_cyc[0] is 0x12296 and count[0] is 0x139c, with all other counts 0. I have tried using private(i, j) and was able to get count[] values that made sense, but I'd like to use the "int i" and "int j" loop declarations shown in the OpenMP tutorials.

I am changing the number of cores used in the cfg file. I looked at the thread

https://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/265543 but couldn't find a solution for this issue.

So far, I have never got any TSC-based profiling values that make sense.

Why does the 8-core run show a higher frame_cyc count than the 4-core run? What else can I try to debug this problem?

Thanks
Anish
Signalogic 
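A quick arithmetic check on the numbers in the question may help narrow things down. The sketch below computes how many times the inner loop body runs for the posted X_LEN and H_LEN; the function names are hypothetical and exist only for this illustration. The total comes out to 1004 * 5 = 5020 = 0x139c, which is exactly the reported count[0]. That is consistent with every iteration having been counted on core 0 (that is, the loop not being distributed), which would also fit frame_cyc barely changing between 4 and 8 cores. It is a consistency check, not a diagnosis.

```c
#include <stdint.h>

/* Hypothetical helper: total inner-loop iterations of the posted loop,
   i.e. conv_len outer iterations times h_len inner iterations. */
uint32_t expected_inner_iterations(uint32_t x_len, uint32_t h_len)
{
    uint32_t conv_len = x_len + h_len - 1;   /* 1000 + 5 - 1 = 1004 */
    return conv_len * h_len;                 /* 1004 * 5 = 5020 = 0x139c */
}

/* Hypothetical helper: approximate per-core share if the outer loop were
   split evenly across num_cores (integer division, remainder ignored). */
uint32_t expected_share_per_core(uint32_t total, uint32_t num_cores)
{
    return total / num_cores;
}
```

If the loop were being split evenly, each count[] entry would be expected near 5020 / 4 = 1255 for four cores, or about 627 for eight, rather than 0x139c on core 0 and zero elsewhere.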

  • Hi Anish,
    Thank you for the post. We will get back to you.
  • Hi Anish,

    It seems you didn't enable support for OpenMP 3.0 (--openmp, --omp) in Project Properties -> C6000 Compiler -> Advanced Options -> Advanced Optimizations.

    And what compiler version are you using?

    Please post your complete CCS project so I can try to replicate the issue you observed. You probably have timer variables defined and TSC enabled somewhere else.

    Regards, Garrett

  • Garrett,

    I am not using CCS; I am using the TI dsp_utils program to download the code. I have modified the OpenMP example program hello_with_make and have the --omp flag for OpenMP support in the Makefile. My compiler version is cgt-c6000 8.0.1.

    We noticed that with the code below, we get a correct frame_cyc:

    timer_start[DNUM] = _itoll(TSCH, TSCL);

    #ifdef _CIM
    #pragma cim parallel for
    #else
    #pragma omp parallel for private(i,j)
    #endif
    for (i = 0; i < conv_len; i++) {
           int sum = 0;
           for (j = 0; j < h_len; j++) sum += h[j] * x[i + j];
           y[i] = sum;
    }

    timer_end = _itoll(TSCH, TSCL);
    frame_cyc[DNUM] = timer_end - timer_start[DNUM];

    With 8 cores frame_cyc is 0x00030196, and with 4 cores frame_cyc is 0x0005c376. Removing the count[DNUM]++ statement, which was inside the parallel for loop, gives correct results.

    Thanks 
    Anish

    ** The _CIM case applies when used with the CIM Hyperpiler, which separates the source into x86 and c66x code streams for use with a server + c66x accelerator card
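If an iteration count is still wanted for debugging, one option is to avoid writing to a shared array inside the hot loop and let OpenMP combine per-thread totals instead. The following is a minimal sketch using the standard reduction clause, not the code from this thread: the function names are hypothetical, the total is a single scalar rather than a per-core count[DNUM] array, and x is padded by an extra H_LEN - 1 zeros (an adjustment made for this sketch) so that x[i + j] stays in bounds for every i and j. Whether shared-memory writes were actually what skewed the timing on the C6678 is an assumption; the thread only reports that removing the counter fixed the numbers.

```c
#define H_LEN    5
#define X_LEN    1000
#define CONV_LEN (X_LEN + H_LEN - 1)

/* Input padded with zeros so x[i + j] is valid for all i < CONV_LEN, j < H_LEN. */
int x[CONV_LEN + H_LEN - 1];
int h[H_LEN];
int y[CONV_LEN];

/* Fill the inputs with the same values as the posted program. */
void init_inputs(void)
{
    for (int i = 0; i < CONV_LEN + H_LEN - 1; i++)
        x[i] = (i < X_LEN) ? i + 1 : 0;
    for (int j = 0; j < H_LEN; j++)
        h[j] = H_LEN - j + 1;
}

/* Runs the loop from the thread and returns the number of inner-loop
   iterations executed. Each thread accumulates into its own private copy
   of total; OpenMP sums the copies when the parallel region ends. */
long convolve_counted(void)
{
    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < CONV_LEN; i++) {
        int sum = 0;
        for (int j = 0; j < H_LEN; j++) {
            total++;                 /* private per-thread counter */
            sum += h[j] * x[i + j];
        }
        y[i] = sum;
    }
    return total;
}
```

Calling init_inputs() and then convolve_counted() should return CONV_LEN * H_LEN regardless of the number of cores. If the pragma is ignored (for example, when building without OpenMP support), the loop simply runs serially and gives the same result.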