I'm testing a disparity calculation program with OpenMP runtime 1.2.0.5 and 2.1.17.1, and I'm seeing some performance issues.
Without any #pragma omp directives, the code takes 0.15s.
With OpenMP 1.2, it takes 0.5s on 1 core and 0.06s on 8 cores.
With OpenMP 2.1, the time is the same on 1 core but 0.3s on 8 cores, and the result does not change whether I place the data in DDR or MSMC.
I need some help with the following:
1. Why might using OpenMP on 1 core be slower than not using OpenMP at all?
2. Why might OpenMP 2.1 perform worse than OpenMP 1.2 on 8 cores when the memory settings are the same?
3. The OpenMP 2.1 specification says the OpenMP runtime needs some non-cached MSMC but does not mention program data, whereas in OpenMP 1.x data placed in MSMC had to be non-cached. Is this still necessary in 2.x? If so, can you give an example of how to do that?
4. I wrote another program using the "omp atomic" and "omp critical" directives and saw no difference between them, but in general atomic should be much faster than critical. Does the C6678 not support atomic?
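To make question 4 concrete, here is a minimal sketch of the kind of atomic versus critical comparison I mean. The function names count_with_atomic and count_with_critical are made up for this post and are not from my actual program; both simply increment a shared counter n times, once guarded by "omp atomic" and once by "omp critical".

```c
/* Hypothetical helper: n increments of a shared counter, each protected
   by "omp atomic". */
long count_with_atomic(long n)
{
    long counter = 0;
    long k;
    #pragma omp parallel for
    for (k = 0; k < n; k++) {
        #pragma omp atomic
        counter++;
    }
    return counter;
}

/* Hypothetical helper: the same update, protected by "omp critical". */
long count_with_critical(long n)
{
    long counter = 0;
    long k;
    #pragma omp parallel for
    for (k = 0; k < n; k++) {
        #pragma omp critical
        {
            counter++;
        }
    }
    return counter;
}
```

Both functions should return n. Timing the two calls for a large n is one way to see whether atomic and critical behave differently on a given runtime; the sketch itself says nothing about how the C6678 runtime implements either directive.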
Summary of the code:
    data1 = char[size];
    data2 = char[size];
    output = malloc(sizeof(char)*size);
    #pragma omp parallel for private(i,j) shared(data1,data2,output) collapse(2) schedule(static) num_threads(8)
    for (i=height;i>0;i--) {
        for (j=Width;j>0;j--) {
            output[i*Width+j] = dosomething(data1,data2);
        }
    }
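For anyone who wants to reproduce this, the loop summarized above can be written as a compilable function along the following lines. This is only a sketch under assumptions: pixel_cost is a hypothetical stand-in for dosomething (here just an absolute difference, not my real disparity cost), compute_output is a made-up wrapper name, and the buffers are assumed to hold height*width elements. Under that assumption the indices in my summary would run one past the end of the buffer, so the sketch uses 0-based bounds instead.

```c
/* Hypothetical stand-in for dosomething(): absolute difference of the two
   input values at index idx. The real disparity cost is not shown here. */
static unsigned char pixel_cost(const unsigned char *data1,
                                const unsigned char *data2, int idx)
{
    int d = (int)data1[idx] - (int)data2[idx];
    return (unsigned char)(d < 0 ? -d : d);
}

/* Fills output[0 .. height*width-1]; every (i, j) pair is independent, so
   the two loops can be collapsed and split statically across threads. */
void compute_output(const unsigned char *data1, const unsigned char *data2,
                    unsigned char *output, int height, int width)
{
    int i, j;
    #pragma omp parallel for private(i, j) shared(data1, data2, output) \
            collapse(2) schedule(static) num_threads(8)
    for (i = 0; i < height; i++) {
        for (j = 0; j < width; j++) {
            int idx = i * width + j;   /* row-major index, always in range */
            output[idx] = pixel_cost(data1, data2, idx);
        }
    }
}
```

Calling compute_output(data1, data2, output, height, Width) with and without the pragma is how the timings above would be compared; if the compiler is run without OpenMP enabled, the pragma is ignored and the loop runs serially.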
The .cfg file is taken from the example in the OpenMP package.
The tools used are: CCS 6.1.1, compiler 8.1.0, XDC 3.25.6.96, IPC 1.24.3.32, C6678 PDK 1.1.2.6, SYS/BIOS 6.33.6.50, UIA 1.2.0.7
Thanks,
Han