PSDK: 03.02.05
Kernel: 4.4.32
CPU: AM5728
I tested with the TI example vecadd_openmp.
Result for OpenCL + OpenMP (dual core):
./vecadd_openmp
DEVICE: TI Multicore C66 DSP
[core 0] i:0
[core 0] i:100
[core 0] i:200
[core 0] i:300
[core 0] i:400
[core 0] i:500
[core 0] i:600
[core 0] i:700
[core 0] i:800
[core 0] i:900
[core 0] i:1000
[core 0] i:1100
[core 0] i:1200
[core 0] i:1300
[core 0] i:1400
[core 0] i:1500
[core 0] i:1600
[core 0] i:1700
[core 0] i:1800
[core 0] i:1900
[core 1] i:4100
[core 0] i:2000
[core 1] i:4200
[core 0] i:2100
[core 1] i:4300
[core 0] i:2200
[core 1] i:4400
[core 0] i:2300
[core 1] i:4500
[core 0] i:2400
[core 1] i:4600
[core 0] i:2500
[core 1] i:4700
[core 0] i:2600
[core 1] i:4800
[core 0] i:2700
[core 1] i:4900
[core 0] i:2800
[core 1] i:5000
[core 0] i:2900
[core 1] i:5100
[core 0] i:3000
[core 1] i:5200
[core 0] i:3100
[core 1] i:5300
[core 0] i:3200
[core 1] i:5400
[core 0] i:3300
[core 1] i:5500
[core 0] i:3400
[core 1] i:5600
[core 0] i:3500
[core 1] i:5700
[core 0] i:3600
[core 1] i:5800
[core 0] i:3700
[core 1] i:5900
[core 0] i:3800
[core 1] i:6000
[core 0] i:3900
[core 1] i:6100
[core 0] i:4000
[core 1] i:6200
Write BufA : Queue to Submit: 23 us
Write BufA : Submit to Start : 59 us
Write BufA : Start to End : 145 us
Write BufB : Queue to Submit: 177 us
Write BufB : Submit to Start : 77 us
Write BufB : Start to End : 131 us
Kernel : Queue to Submit: 5 us
Kernel : Submit to Start : 134 us
Kernel : Start to End : 4823 us
Read BufDst : Queue to Submit: 4926 us
Read BufDst : Submit to Start : 245 us
Read BufDst : Start to End : 146 us
PASS!
Result for OpenCL + OpenMP End
Result for OpenCL (with the "#pragma omp parallel for" commented out, single core):
root@am57xx-evm:/vecadd_openmp# ./vecadd_openmp
DEVICE: TI Multicore C66 DSP
[core 0] i:0
[core 0] i:100
[core 0] i:200
[core 0] i:300
[core 0] i:400
[core 0] i:500
[core 0] i:600
[core 0] i:700
[core 0] i:800
[core 0] i:900
[core 0] i:1000
[core 0] i:1100
[core 0] i:1200
[core 0] i:1300
[core 0] i:1400
[core 0] i:1500
[core 0] i:1600
[core 0] i:1700
[core 0] i:1800
[core 0] i:1900
[core 0] i:2000
[core 0] i:2100
[core 0] i:2200
[core 0] i:2300
[core 0] i:2400
[core 0] i:2500
[core 0] i:2600
[core 0] i:2700
[core 0] i:2800
[core 0] i:2900
[core 0] i:3000
[core 0] i:3100
[core 0] i:3200
[core 0] i:3300
[core 0] i:3400
[core 0] i:3500
[core 0] i:3600
[core 0] i:3700
[core 0] i:3800
[core 0] i:3900
[core 0] i:4000
[core 0] i:4100
[core 0] i:4200
[core 0] i:4300
[core 0] i:4400
[core 0] i:4500
[core 0] i:4600
[core 0] i:4700
[core 0] i:4800
[core 0] i:4900
[core 0] i:5000
[core 0] i:5100
[core 0] i:5200
[core 0] i:5300
[core 0] i:5400
[core 0] i:5500
[core 0] i:5600
[core 0] i:5700
[core 0] i:5800
[core 0] i:5900
[core 0] i:6000
[core 0] i:6100
[core 0] i:6200
[core 0] i:6300
[core 0] i:6400
[core 0] i:6500
[core 0] i:6600
[core 0] i:6700
[core 0] i:6800
[core 0] i:6900
[core 0] i:7000
[core 0] i:7100
[core 0] i:7200
[core 0] i:7300
[core 0] i:7400
[core 0] i:7500
[core 0] i:7600
[core 0] i:7700
[core 0] i:7800
[core 0] i:7900
[core 0] i:8000
[core 0] i:8100
Write BufA : Queue to Submit: 22 us
Write BufA : Submit to Start : 57 us
Write BufA : Start to End : 161 us
Write BufB : Queue to Submit: 192 us
Write BufB : Submit to Start : 176 us
Write BufB : Start to End : 127 us
Kernel : Queue to Submit: 3 us
Kernel : Submit to Start : 52 us
Kernel : Start to End : 6973 us
Read BufDst : Queue to Submit: 7000 us
Read BufDst : Submit to Start : 242 us
Read BufDst : Start to End : 162 us
PASS!
root@am57xx-evm:/vecadd_openmp#
Result for OpenCL (with the "#pragma omp parallel for" commented out) End
I want to know why the "Kernel : Start to End" times are almost the same. As I understand it, if I comment out the "#pragma omp parallel for", only one DSP core does the computation, so the time should be nearly twice that of the dual-core version.