Hi,
In the past, I saw poor performance with the matvec sample on the OMP 1.x runtime.
I recently learned that TI has moved support from OMP 1.x to OMP 2.x, so I re-checked matvec performance with the new OMP 2.x runtime.
I followed the porting guide described below:
I have now successfully rebuilt the required environment for OMP 2.x.
Following the user's guide delivered with openmp_dsp_2_01_16_03, I created the attached OpenMP 2.x CCS project and ran it on a C6678 EVM. The source code is based on the matvec sample in the openmp_dsp_2_01_16_03 package, but I changed it slightly so I could compare a core 0 only run against a full (all-core) run.
sample_openmp_matvec_keystone.zip
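The essence of my change is sketched below (illustrative only; the attached project is the authoritative version, and the function names, input data, and TSCL-based cycle counting here are my simplifications):

#include <stdio.h>
#include <c6x.h>    /* TSCL: C66x time-stamp counter, low word */
#include <omp.h>

#define SIZE 1000   /* enlarged from the sample's default of 10 */

float a[SIZE][SIZE], b[SIZE], c[SIZE];

/* c = A * b; rows are split across cores when 'parallel' is nonzero */
static void matvec(int parallel)
{
    int i, j;
    #pragma omp parallel for private(j) if(parallel)
    for (i = 0; i < SIZE; i++) {
        float sum = 0.0f;
        for (j = 0; j < SIZE; j++)
            sum += a[i][j] * b[j];
        c[i] = sum;
    }
}

int main(void)
{
    unsigned int t0, t1;
    float total;
    int i, j;

    TSCL = 0;                           /* any write starts the counter */

    for (i = 0; i < SIZE; i++) {        /* deterministic input data */
        b[i] = (float)i;
        for (j = 0; j < SIZE; j++)
            a[i][j] = (float)(i + j);
    }

    t0 = TSCL; matvec(0); t1 = TSCL;    /* core 0 only */
    printf("Core0 Only : %u cycles\n", t1 - t0);

    t0 = TSCL; matvec(1); t1 = TSCL;    /* all cores */
    printf("Full core : %u cycles\n", t1 - t0);

    /* serial checksum here for simplicity; the sample's may differ */
    for (total = 0.0f, i = 0; i < SIZE; i++)
        total += c[i];
    printf("sum of all c[] = %.2f\n", total);
    return 0;
}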
The following is the CCS console log (results):
Core0 Only : 964790 cycles
sum of all c[] = 250500235264.00
Full core : 2754332 cycles
sum of all c[] = 250500251648.00
The two "sum of all c[]" values agree apart from the last bits (the gap of 16384 is exactly one single-precision ulp at this magnitude, which I assume comes from a different accumulation order in the parallel run), so the OpenMP runtime itself is working correctly. Its performance, however, is still much worse than the core 0 only case. Please note that I changed the definition of SIZE from the default (10) to a larger value (say, 1000) to give the OpenMP runtime more work to amortize its overhead, but I still see bad benchmark numbers.
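On the sum difference: my sketch above uses a serial checksum, but if the sample's checksum is accumulated with an OpenMP reduction, as guessed below, the order in which the per-thread partial sums are combined is unspecified, which is enough to move the result by one ulp:

/* Guess at a reduction-based checksum: each thread accumulates a
   private partial sum, and the partials are combined in an unspecified
   order, so the low bits can differ from a serial left-to-right sum. */
float checksum(const float *v, int n)
{
    float sum = 0.0f;
    int i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}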
Now my questions are:
- Do you have any other OpenMP sample code that demonstrates a benefit (i.e., faster execution than a single-core run)?
- Our customers may consider using OpenMP, but as my matvec results show, performance can end up worse. Do you have any guidelines for getting better execution times from applications built on OpenMP?
- At link time, I saw the following warnings:
warning #10247-D: creating output section ".tdata" without a SECTIONS specification
warning #10247-D: creating output section ".tbss" without a SECTIONS specification
This is because .tdata and .tbss are not explicitly mapped to existing memory in the .cfg file. How should we handle these warnings?
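I assume the sections could be placed explicitly from the application .cfg with Program.sectMap, along the lines of the sketch below (MSMCSRAM is my guess at a suitable segment in the evm6678 platform memory map; DDR3 may be appropriate instead), but I am not sure this is the recommended handling for these thread-local sections:

/* Sketch: explicit placement of the OpenMP thread-local sections.
   Segment name assumes the ti.platforms.evm6678 memory map. */
Program.sectMap[".tdata"] = "MSMCSRAM";
Program.sectMap[".tbss"]  = "MSMCSRAM";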
Best Regards,
Naoki Kawada