Hi,
I'm trying to profile the time taken by the OpenCV background-subtraction algorithm (BackgroundSubtractorMOG2) on an AM5728-based platform.
I have enabled DSP acceleration, and I get the following debug logs from the OpenCL program:
...
[core 1] TIDSP Modified MOG2 clk=15294248 frame_row=480 frame_col=640 (80a580 80c980 808180) prune=-0.000100
[core 0] TIDSP Modified MOG2 clk=15297946 frame_row=480 frame_col=640 (80a580 80c980 808180) prune=-0.000100
The above logs seem to indicate that processing one frame took approx. 25.5 ms on each 600 MHz DSP core (15294248 cycles / 600000000 Hz).
My understanding is that the OpenCL kernels should run in parallel on the two DSP cores, so the total time for the algorithm to process the entire image (640x480 resolution) should be about 25.5 ms.
However, when I profile the time taken by the algorithm to process the image on ARM side, it is almost double (~51 ms).
Are the OpenCL kernels on the two DSP cores somehow being serialized? Or am I missing something here?
Regards,
Manu