The efficiency of OpenMP

The pink line is the run without OpenMP;

the red line is the run with one thread using OpenMP;

the blue line is the run with two threads.

 

[C66xx_0] ROT#1 = [ 7699483 ] cycles


Hello World from thread = 0
Number of threads = 1
ROT#OMP = [ 7691248 ] cycles


Hello World from thread = 0

Number of threads = 2
ROT#OMP = [ 7699744 ] cycles
[C66xx_1] Hello World from thread = 1
ROT#OMP = [ 0 ] cycles


[C66xx_0] OMP TIME = [ 874812474 ] cycles


ROT#2 = [ 7699211 ] cycles

 

My question is: why is there no difference in run time between one thread and two threads?

How can I get the full benefit of OpenMP?

  • Hi ZhengTian

    Could you please share your code so we can see how you are measuring the timing?

    How many cores are you using when you run this?

  • The following is the source file.

    I loaded the .out file onto two cores.

    I ran the program with both one thread and two threads, as you can see in the code.

    /******************************************************************************
    * OpenMP Example - Matrix-vector multiplication - C/C++ Version
    * FILE: omp_matvec.c
    * DESCRIPTION:
    *   This example multiplies all row i elements of matrix A with vector
    *   element b(i) and stores the summed products in vector c(i).  A total is
    *   maintained for the entire matrix.  Performed by using the OpenMP loop
    *   work-sharing construct.  The update of the shared global total is
    *   serialized by using the OpenMP critical directive.
    * SOURCE: Blaise Barney  5/99
    * LAST REVISED:
    ******************************************************************************/
    
    #include <ti/omp/omp.h>
    
    
    #include <string.h>
    #include <assert.h>
    #include <stdio.h>
    #include <time.h>
    #include <stdint.h>
    #include <xdc/std.h>
    #include <xdc/runtime/System.h>
    #include <ti/sysbios/BIOS.h>
    #include <xdc/runtime/Log.h>
    #include <xdc/runtime/Timestamp.h>
    #include <math.h>
    
    #include <ti/ipc/MultiProc.h>
    
    
    #define NTHREADS  1
    
    
    #define FFT_MAX_L 2048
    #define PI 3.14159265358979323846
    
    #define	ROT_LOOP
    
    #define	IMAGESIZE	((128)*(128))
    
    
    //#pragma DATA_SECTION(sin_table,"L2SRAM");
    float sin_table[360] = {0.0, 0.0174524, 0.0348995, 0.052336, 0.0697565, 0.0871557, 0.104528, 0.121869, 0.139173, 0.156434, 0.173648, 0.190809, 0.207912, 0.224951, 0.241922, 0.258819, 0.275637, 0.292372, 0.309017, 0.325568, 0.34202, 0.358368, 0.374607, 0.390731, 0.406737, 0.422618, 0.438371, 0.45399, 0.469472, 0.48481, 0.5, 0.515038, 0.529919, 0.544639, 0.559193, 0.573576, 0.587785, 0.601815, 0.615662, 0.62932, 0.642788, 0.656059, 0.669131, 0.681998, 0.694658, 0.707107, 0.71934, 0.731354, 0.743145, 0.75471, 0.766044, 0.777146, 0.788011, 0.798636, 0.809017, 0.819152, 0.829038, 0.838671, 0.848048, 0.857167, 0.866025, 0.87462, 0.882948, 0.891007, 0.898794, 0.906308, 0.913545, 0.920505, 0.927184, 0.93358, 0.939693, 0.945519, 0.951057, 0.956305, 0.961262, 0.965926, 0.970296, 0.97437, 0.978148, 0.981627, 0.984808, 0.987688, 0.990268, 0.992546, 0.994522, 0.996195, 0.997564, 0.99863, 0.999391, 0.999848, 1.0, 0.999848, 0.999391, 0.99863, 0.997564, 0.996195, 0.994522, 0.992546, 0.990268, 0.987688, 0.984808, 0.981627, 0.978148, 0.97437, 0.970296, 0.965926, 0.961262, 0.956305, 0.951057, 0.945519, 0.939693, 0.93358, 0.927184, 0.920505, 0.913545, 0.906308, 0.898794, 0.891007, 0.882948, 0.87462, 0.866025, 0.857167, 0.848048, 0.838671, 0.829038, 0.819152, 0.809017, 0.798636, 0.788011, 0.777146, 0.766044, 0.75471, 0.743145, 0.731354, 0.71934, 0.707107, 0.694658, 0.681998, 0.669131, 0.656059, 0.642788, 0.629321, 0.615661, 0.601815, 0.587785, 0.573576, 0.559193, 0.544639, 0.529919, 0.515038, 0.5, 0.48481, 0.469472, 0.453991, 0.438371, 0.422618, 0.406737, 0.390731, 0.374607, 0.358368, 0.34202, 0.325568, 0.309017, 0.292372, 0.275637, 0.258819, 0.241922, 0.224951, 0.207912, 0.190809, 0.173648, 0.156434, 0.139173, 0.121869, 0.104528, 0.0871559, 0.0697565, 0.052336, 0.0348995, 0.0174525, 0.0, -0.0174524, -0.0348994, -0.052336, -0.0697564, -0.0871556, -0.104528, -0.121869, -0.139173, -0.156434, -0.173648, -0.190809, -0.207912, -0.224951, -0.241922, -0.258819, -0.275637, -0.292372, 
-0.309017, -0.325568, -0.34202, -0.358368, -0.374607, -0.390731, -0.406737, -0.422618, -0.438371, -0.453991, -0.469472, -0.484809, -0.5, -0.515038, -0.529919, -0.544639, -0.559193, -0.573576, -0.587785, -0.601815, -0.615661, -0.62932, -0.642788, -0.656059, -0.669131, -0.681998, -0.694658, -0.707107, -0.71934, -0.731354, -0.743145, -0.75471, -0.766045, -0.777146, -0.788011, -0.798635, -0.809017, -0.819152, -0.829038, -0.838671, -0.848048, -0.857167, -0.866025, -0.87462, -0.882948, -0.891006, -0.898794, -0.906308, -0.913545, -0.920505, -0.927184, -0.93358, -0.939693, -0.945519, -0.951056, -0.956305, -0.961262, -0.965926, -0.970296, -0.97437, -0.978148, -0.981627, -0.984808, -0.987688, -0.990268, -0.992546, -0.994522, -0.996195, -0.997564, -0.99863, -0.999391, -0.999848, -1.0, -0.999848, -0.999391, -0.99863, -0.997564, -0.996195, -0.994522, -0.992546, -0.990268, -0.987688, -0.984808, -0.981627, -0.978148, -0.97437, -0.970296, -0.965926, -0.961262, -0.956305, -0.951056, -0.945519, -0.939693, -0.93358, -0.927184, -0.920505, -0.913545, -0.906308, -0.898794, -0.891007, -0.882948, -0.87462, -0.866025, -0.857167, -0.848048, -0.83867, -0.829038, -0.819152, -0.809017, -0.798636, -0.788011, -0.777146, -0.766044, -0.75471, -0.743145, -0.731354, -0.71934, -0.707107, -0.694659, -0.681998, -0.669131, -0.656059, -0.642788, -0.629321, -0.615661, -0.601815, -0.587785, -0.573577, -0.559193, -0.544639, -0.529919, -0.515038, -0.5, -0.48481, -0.469471, -0.453991, -0.438371, -0.422618, -0.406737, -0.390731, -0.374607, -0.358368, -0.34202, -0.325568, -0.309017, -0.292372, -0.275638, -0.258819, -0.241922, -0.224951, -0.207912, -0.190809, -0.173648, -0.156434, -0.139173, -0.121869, -0.104529, -0.087156, -0.0697564, -0.052336, -0.0348996, -0.0174526};
    
    //#pragma DATA_SECTION(cos_table,"L2SRAM");
    float cos_table[360] = {1.0, 0.999848, 0.999391, 0.99863, 0.997564, 0.996195, 0.994522, 0.992546, 0.990268, 0.987688, 0.984808, 0.981627, 0.978148, 0.97437, 0.970296, 0.965926, 0.961262, 0.956305, 0.951057, 0.945519, 0.939693, 0.93358, 0.927184, 0.920505, 0.913545, 0.906308, 0.898794, 0.891007, 0.882948, 0.87462, 0.866025, 0.857167, 0.848048, 0.838671, 0.829038, 0.819152, 0.809017, 0.798636, 0.788011, 0.777146, 0.766044, 0.75471, 0.743145, 0.731354, 0.71934, 0.707107, 0.694658, 0.681998, 0.669131, 0.656059, 0.642788, 0.62932, 0.615662, 0.601815, 0.587785, 0.573576, 0.559193, 0.544639, 0.529919, 0.515038, 0.5, 0.48481, 0.469472, 0.453991, 0.438371, 0.422618, 0.406737, 0.390731, 0.374607, 0.358368, 0.34202, 0.325568, 0.309017, 0.292372, 0.275637, 0.258819, 0.241922, 0.224951, 0.207912, 0.190809, 0.173648, 0.156434, 0.139173, 0.121869, 0.104529, 0.0871558, 0.0697565, 0.052336, 0.0348995, 0.0174524, 0.0, -0.0174524, -0.0348995, -0.0523359, -0.0697565, -0.0871558, -0.104528, -0.121869, -0.139173, -0.156434, -0.173648, -0.190809, -0.207912, -0.224951, -0.241922, -0.258819, -0.275637, -0.292372, -0.309017, -0.325568, -0.34202, -0.358368, -0.374607, -0.390731, -0.406737, -0.422618, -0.438371, -0.45399, -0.469472, -0.48481, -0.5, -0.515038, -0.529919, -0.544639, -0.559193, -0.573576, -0.587785, -0.601815, -0.615661, -0.62932, -0.642788, -0.656059, -0.669131, -0.681998, -0.694658, -0.707107, -0.71934, -0.731354, -0.743145, -0.754709, -0.766044, -0.777146, -0.788011, -0.798635, -0.809017, -0.819152, -0.829037, -0.838671, -0.848048, -0.857167, -0.866025, -0.87462, -0.882948, -0.891006, -0.898794, -0.906308, -0.913545, -0.920505, -0.927184, -0.93358, -0.939693, -0.945519, -0.951056, -0.956305, -0.961262, -0.965926, -0.970296, -0.97437, -0.978148, -0.981627, -0.984808, -0.987688, -0.990268, -0.992546, -0.994522, -0.996195, -0.997564, -0.99863, -0.999391, -0.999848, -1.0, -0.999848, -0.999391, -0.99863, -0.997564, -0.996195, -0.994522, -0.992546, -0.990268, -0.987688, 
-0.984808, -0.981627, -0.978148, -0.97437, -0.970296, -0.965926, -0.961262, -0.956305, -0.951057, -0.945519, -0.939693, -0.93358, -0.927184, -0.920505, -0.913545, -0.906308, -0.898794, -0.891007, -0.882948, -0.87462, -0.866025, -0.857167, -0.848048, -0.838671, -0.829038, -0.819152, -0.809017, -0.798635, -0.788011, -0.777146, -0.766044, -0.75471, -0.743145, -0.731354, -0.71934, -0.707107, -0.694659, -0.681998, -0.669131, -0.656059, -0.642788, -0.62932, -0.615662, -0.601815, -0.587785, -0.573576, -0.559193, -0.544639, -0.529919, -0.515038, -0.5, -0.48481, -0.469472, -0.453991, -0.438371, -0.422618, -0.406737, -0.390731, -0.374607, -0.358368, -0.34202, -0.325568, -0.309017, -0.292372, -0.275638, -0.258819, -0.241922, -0.224951, -0.207912, -0.190809, -0.173648, -0.156435, -0.139173, -0.12187, -0.104528, -0.0871557, -0.0697565, -0.0523361, -0.0348998, -0.0174523, 0.0, 0.0174523, 0.0348993, 0.0523357, 0.0697566, 0.0871557, 0.104528, 0.121869, 0.139173, 0.156435, 0.173648, 0.190809, 0.207911, 0.224951, 0.241922, 0.258819, 0.275637, 0.292371, 0.309017, 0.325568, 0.34202, 0.358368, 0.374606, 0.390731, 0.406737, 0.422618, 0.438371, 0.45399, 0.469472, 0.48481, 0.5, 0.515038, 0.529919, 0.544639, 0.559193, 0.573576, 0.587785, 0.601815, 0.615662, 0.62932, 0.642788, 0.656059, 0.66913, 0.681998, 0.694658, 0.707107, 0.71934, 0.731354, 0.743145, 0.75471, 0.766044, 0.777146, 0.788011, 0.798636, 0.809017, 0.819152, 0.829037, 0.838671, 0.848048, 0.857167, 0.866025, 0.87462, 0.882948, 0.891007, 0.898794, 0.906308, 0.913545, 0.920505, 0.927184, 0.93358, 0.939693, 0.945518, 0.951057, 0.956305, 0.961262, 0.965926, 0.970296, 0.97437, 0.978148, 0.981627, 0.984808, 0.987688, 0.990268, 0.992546, 0.994522, 0.996195, 0.997564, 0.99863, 0.999391, 0.999848};
    
    
    //#pragma DATA_SECTION(MidRawData,"MSMCSRAM");
    unsigned short MidRawData[16384];
    
    //#pragma DATA_SECTION(ROT_result,"MSMCSRAM");
    unsigned short ROT_result[16384];
    
    
    
    
    void main()
    {
    
    	int angle=50;
    
    /****************************** read image file **********************/
    	FILE *fp;
    	char *input_file = "1.raw";
    	size_t	read_cnt=0;
    	fp = fopen (input_file, "rb");
    	if (fp == NULL)
    	{
    	  printf ("Failed to open file %s\n", input_file);
    	  return;
    	}
    	read_cnt = fread(MidRawData,2,IMAGESIZE,fp);
    
    /******************************************************************/
    
           int nthreads, tid;
    
    
           nthreads = NTHREADS;
    
       	int numProcs, i, n;
       	numProcs = MultiProc_getNumProcessors();
    
    
           uint16_t j;
           uint32_t start,  end, totalTime;
           uint32_t start1, end1;
    
    
    	  	int	x, y;
    	  	int	angle_tmp;
    	  	float	nx, ny;
    
    
           TSC_enable();
    
           start = TSC_read();
    #ifdef ROT_LOOP
    
    	  	for(y = 0; y < 128; ++y)
    	  	{
    	  		for(x = 0; x < 128; ++x)
    	  		{  
    	  //					
    	  //					nx = (x - 128 / 2) * cos(angle * a2r) + (y - 128 / 2) * sin(angle * a2r);
    	  //					ny = 0 - (x - 128 / 2) * sin(angle * a2r) + (y - 128 / 2) * cos(angle * a2r);
    
    	  			if(angle >= 0)
    	  				angle_tmp = angle % 360;
    	  			if(angle < 0)
    	  				angle_tmp = 360 - (0 - angle) % 360;
    	  			nx = (x - 64) * cos_table[angle_tmp] + (y - 64) * sin_table[angle_tmp];
    	  			ny = 0 - (x - 64) * sin_table[angle_tmp] + (y - 64) * cos_table[angle_tmp];
    	  			if((nx + 128 / 2) >= 0 && (nx + 128 / 2) < 128 && (ny + 128/2) >= 0 && (ny + 128/2) < 128)
    	  			{
    	  				ROT_result[(int)(ny + 128/2) * 128 + (int)(nx + 128 / 2)] = MidRawData[y * 128 + x];
    	  				if(((ny + 128/2) + 1) < 128)
    	  					ROT_result[(int)((ny + 128/2) + 1) * 128 + (int)(nx + 128 / 2)] = MidRawData[y * 128 + x];
    	  				if(((nx + 128 / 2) + 1) < 128)
    	  					ROT_result[(int)(ny + 128/2) * 128 + (int)(nx + 128 / 2) + 1] = MidRawData[y * 128 + x];
    	  				if(((ny + 128/2) + 1) < 128 && ((nx + 128 / 2) + 1) < 128)
    	  					ROT_result[((int)(ny + 128/2) + 1) * 128 + (int)(nx + 128 / 2) + 1] = MidRawData[y * 128 + x];
    	  			}
    	  		}
    	  	}
    #else
           ROT_func(MidRawData, 50, ROT_result);
    #endif
           end = TSC_read();
           printf( "ROT#1 = [ %u ] cycles \n", ( end - start ) );
    
    
           totalTime = TSC_read();
           /* Fork a team of threads giving them their own copies of variables */
    		for (n = 1; n <= numProcs; n++)
    		{
    			omp_set_num_threads(1);
    	#pragma omp parallel private(nthreads, tid)
    		   {
    
    
    				  /* Obtain thread number */
    				  tid = omp_get_thread_num();
    				  printf("Hello World from thread = %d\n", tid);
    
    				  if (tid == 0){
    						 nthreads = omp_get_num_threads();
    						 printf("Number of threads = %d\n", nthreads);
    					 }
    
    				  // Hanning Window
    				  uint32_t startMp0, endMp0;
    				  startMp0 = TSC_read();
    	#ifdef ROT_LOOP
    
    				    #pragma omp parallel for private(x,y, angle_tmp, nx, ny)
    				  	for(y = 0; y < 128; ++y)
    				  	{
    				  		for(x = 0; x < 128; ++x)
    				  		{
    				  //					
    				  //					nx = (x - 128 / 2) * cos(angle * a2r) + (y - 128 / 2) * sin(angle * a2r);
    				  //					ny = 0 - (x - 128 / 2) * sin(angle * a2r) + (y - 128 / 2) * cos(angle * a2r);
    
    				  			
    				  			if(angle >= 0)
    				  				angle_tmp = angle % 360;
    				  			if(angle < 0)
    				  				angle_tmp = 360 - (0 - angle) % 360;
    				  			nx = (x - 64) * cos_table[angle_tmp] + (y - 64) * sin_table[angle_tmp];
    				  			ny = 0 - (x - 64) * sin_table[angle_tmp] + (y - 64) * cos_table[angle_tmp];
    				  			if((nx + 128 / 2) >= 0 && (nx + 128 / 2) < 128 && (ny + 128/2) >= 0 && (ny + 128/2) < 128)
    				  			{
    				  				ROT_result[(int)(ny + 128/2) * 128 + (int)(nx + 128 / 2)] = MidRawData[y * 128 + x];
    				  				if(((ny + 128/2) + 1) < 128)
    				  					ROT_result[(int)((ny + 128/2) + 1) * 128 + (int)(nx + 128 / 2)] = MidRawData[y * 128 + x];
    				  				if(((nx + 128 / 2) + 1) < 128)
    				  					ROT_result[(int)(ny + 128/2) * 128 + (int)(nx + 128 / 2) + 1] = MidRawData[y * 128 + x];
    				  				if(((ny + 128/2) + 1) < 128 && ((nx + 128 / 2) + 1) < 128)
    				  					ROT_result[((int)(ny + 128/2) + 1) * 128 + (int)(nx + 128 / 2) + 1] = MidRawData[y * 128 + x];
    				  			}
    				  		}
    				  	}
    
    	#else
    				  ROT_func(MidRawData, 50, ROT_result);
    	#endif
    				  endMp0 = TSC_read();
    				  printf( "ROT#OMP = [ %u ] cycles \n", ( endMp0 - startMp0 ) );
    
    		   }
    
           }  /* All threads join master thread and disband */
    
           printf( "OMP TIME = [ %u ] cycles \n", ( TSC_read() - totalTime ) );
    
           start1= TSC_read();
    #ifdef ROT_LOOP
    
    	  //#pragma omp parallel for
    	  	for(y = 0; y < 128; ++y)
    	  	{
    	  		for(x = 0; x < 128; ++x)
    	  		{  
    	  //					
    	  //					nx = (x - 128 / 2) * cos(angle * a2r) + (y - 128 / 2) * sin(angle * a2r);
    	  //					ny = 0 - (x - 128 / 2) * sin(angle * a2r) + (y - 128 / 2) * cos(angle * a2r);
    
    	  			
    	  			if(angle >= 0)
    	  				angle_tmp = angle % 360;
    	  			if(angle < 0)
    	  				angle_tmp = 360 - (0 - angle) % 360;
    	  			nx = (x - 64) * cos_table[angle_tmp] + (y - 64) * sin_table[angle_tmp];
    	  			ny = 0 - (x - 64) * sin_table[angle_tmp] + (y - 64) * cos_table[angle_tmp];
    	  			if((nx + 128 / 2) >= 0 && (nx + 128 / 2) < 128 && (ny + 128/2) >= 0 && (ny + 128/2) < 128)
    	  			{
    	  				ROT_result[(int)(ny + 128/2) * 128 + (int)(nx + 128 / 2)] = MidRawData[y * 128 + x];
    	  				if(((ny + 128/2) + 1) < 128)
    	  					ROT_result[(int)((ny + 128/2) + 1) * 128 + (int)(nx + 128 / 2)] = MidRawData[y * 128 + x];
    	  				if(((nx + 128 / 2) + 1) < 128)
    	  					ROT_result[(int)(ny + 128/2) * 128 + (int)(nx + 128 / 2) + 1] = MidRawData[y * 128 + x];
    	  				if(((ny + 128/2) + 1) < 128 && ((nx + 128 / 2) + 1) < 128)
    	  					ROT_result[((int)(ny + 128/2) + 1) * 128 + (int)(nx + 128 / 2) + 1] = MidRawData[y * 128 + x];
    	  			}
    	  		}
    	  	}
    #else
           ROT_func(MidRawData, 50, ROT_result);
    #endif
           end1= TSC_read();
    
           printf( "ROT#2 = [ %u ] cycles \n", ( end1 - start1 ) );
    
    
    }
    
    
    
    

  • 1) I suggest you first carefully go through the details posted in this thread... it seems that measuring timing with OpenMP can be quite tricky!

         http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/265543.aspx

    2) Make sure you have all of the latest software versions (OpenMP, MCSDK, etc.) installed.

    3) Is optimization level 3 (-O3) turned on in the project properties? I suggest enabling it.

    4) Which device are you using? (Real hardware and not a simulator, correct?)

    5) Have you verified your cycle measurements with a simpler algorithm? Perhaps:

    • First: benchmark a simple for() loop on a single core without OpenMP
    • Second: benchmark that same loop on a single core WITH OpenMP (using omp_set_num_threads(1))
    • Third: benchmark the simple loop on two, four, and eight cores
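
    A minimal sketch of the three benchmarking steps above, written against standard OpenMP on a host compiler rather than the TI runtime. The names fill_inputs, bench_serial, bench_omp and wall_seconds are hypothetical helpers, not part of any TI package, and the wall-clock timer is only a stand-in for the cycle counter used in the posted code (on the C66x you would presumably read the TSC instead).

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define N_ELEMS 1000000

static float in_a[N_ELEMS], in_b[N_ELEMS], out_c[N_ELEMS];

/* Hypothetical helper: wall-clock seconds via C11 timespec_get.
 * Assumption: replace with a cycle counter on the target device. */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Fill the inputs with small, exactly representable values. */
void fill_inputs(void)
{
    for (int i = 0; i < N_ELEMS; ++i) {
        in_a[i] = (float)(i % 7);
        in_b[i] = 2.0f;
    }
}

/* Step 1: the plain loop, no OpenMP involved. */
double bench_serial(void)
{
    double t0 = wall_seconds();
    for (int i = 0; i < N_ELEMS; ++i)
        out_c[i] = in_a[i] * in_b[i] + 1.0f;
    return wall_seconds() - t0;
}

/* Steps 2 and 3: the same loop, divided among `threads` threads by a
 * single combined parallel-for construct. If the file is compiled
 * without OpenMP support the pragma is ignored and the loop runs
 * serially. */
double bench_omp(int threads)
{
    assert(threads >= 1);
    double t0 = wall_seconds();
    #pragma omp parallel for num_threads(threads)
    for (int i = 0; i < N_ELEMS; ++i)
        out_c[i] = in_a[i] * in_b[i] + 1.0f;
    return wall_seconds() - t0;
}
```

    Calling fill_inputs() once and then comparing bench_serial(), bench_omp(1) and bench_omp(2) gives the three data points. The first OpenMP call usually includes thread start-up cost, so it may be worth timing a second call as well.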
  • ZhengTian,

    I looked at your code. For the two-thread case, your code calls omp_set_num_threads(1) and then uses nested #pragma omp parallel constructs inside a for loop. As Chris suggested, I would start with a simpler example if your objective is only to compare one-thread vs. two-thread performance.

    There are hello world and matrix multiplication examples that come with the OMP package that TI provides. You can create a project based on one of them via File -> New -> CCS Project: select Generic C66x under 'Device' and the relevant OMP example under 'Project templates and examples'.
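
    To illustrate why the nesting matters, here is a small sketch in standard OpenMP (not TI-specific; rows_nested and rows_shared are hypothetical names). It counts how often each row is processed. With a "parallel for" nested inside a "parallel" region, every thread of the outer team runs the whole loop, so adding threads repeats the work instead of dividing it. That may be why the one-thread and two-thread cycle counts in the question are almost identical.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define ROWS 128

/* Nested pattern: a `parallel for` inside a `parallel` region.
 * Each thread of the outer team executes the complete loop, so every
 * row is processed once per outer thread.
 * Returns the number of times row 0 was processed. */
int rows_nested(int threads, int hits[ROWS])
{
    assert(threads >= 1);
    for (int y = 0; y < ROWS; ++y)
        hits[y] = 0;

    #pragma omp parallel num_threads(threads)
    {
        #pragma omp parallel for
        for (int y = 0; y < ROWS; ++y) {
            #pragma omp atomic
            hits[y]++;
        }
    }
    return hits[0];
}

/* Work-sharing pattern: one combined `parallel for`.
 * The rows are divided among the threads, so every row is processed
 * exactly once regardless of the thread count. */
int rows_shared(int threads, int hits[ROWS])
{
    assert(threads >= 1);
    for (int y = 0; y < ROWS; ++y)
        hits[y] = 0;

    #pragma omp parallel for num_threads(threads)
    for (int y = 0; y < ROWS; ++y) {
        #pragma omp atomic
        hits[y]++;
    }
    return hits[0];
}
```

    If the per-thread printf calls are still wanted, an alternative is to keep the outer "parallel" region and use "#pragma omp for" (without "parallel") on the loop inside it.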

    The e2e post that Chris pointed to provides more details on timing measurement.

    Also, please make sure that you have correctly configured the number of processors in the .cfg file using OpenMP.setNumProcessors.