OpenCV/OpenCL performance issue

Other Parts Discussed in Thread: AM5728

Hi,

I'm trying to get a feel for the image-processing capability of the C66x DSP on the AM5728 (SDK: am57xx-evm-linux-sdk-arago-src-03.01.00.06).

The SDK image includes an OpenCL demo for vadd, and that demo reports a DSP-versus-CPU speedup of only 0.98. That is what raised my question about the DSP's capability.

The SDK currently has the demo application "Processor SDK Demos Video Analytics" (wiki.tiprocessors.com/.../Processor_SDK_Demos_Video_Analytics),
but as the document shows, OpenCV runs on the CPU there and OpenCL is used only for some wave modeling. So it gives me no indication of OpenCV/OpenCL performance for full image processing.

I tried the OpenCV OpenCL samples but was only able to run bgfg_segm, and that gave me the same performance on both CPU and OpenCL.
So in the end I wrote the simplest possible test tool to see what is going on.

The tool does the following:
1. Creates a 1920x1080 YUV I422 cv::UMat image with a selectable memory allocation type: device, shared, or host
2. Creates an output UMat of the same dimensions for the RGB result
3. Converts YUV I422 to RGB (OpenCV has an OpenCL kernel for this conversion: modules/imgproc/src/opencl/cvtcolor.cl)
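
For reference, the conversion in step 3 boils down to per-pixel-pair arithmetic like the sketch below. This is not OpenCV's implementation. It assumes the packed UYVY byte order (U0 Y0 V0 Y1), which is what COLOR_YUV2RGB_Y422 is an alias for in OpenCV, and the common floating-point BT.601 limited-range coefficients. The names uyvyToRgb and clampToByte are made up for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round a floating-point channel value and clamp it to 0..255.
static std::uint8_t clampToByte(double v)
{
    return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, std::round(v))));
}

// Convert packed UYVY (assumed byte order: U0 Y0 V0 Y1) to interleaved RGB.
// The coefficients are the usual BT.601 limited-range values (an assumption,
// not taken from the OpenCV kernel).
std::vector<std::uint8_t> uyvyToRgb(const std::vector<std::uint8_t>& uyvy)
{
    std::vector<std::uint8_t> rgb;
    rgb.reserve(uyvy.size() / 4 * 6);
    for (std::size_t i = 0; i + 3 < uyvy.size(); i += 4) {
        // One U/V pair is shared by two luma samples.
        const double u = uyvy[i] - 128.0;
        const double v = uyvy[i + 2] - 128.0;
        const std::uint8_t lumas[2] = { uyvy[i + 1], uyvy[i + 3] };
        for (std::uint8_t yRaw : lumas) {
            const double y = 1.164 * (yRaw - 16.0);
            rgb.push_back(clampToByte(y + 1.596 * v));              // R
            rgb.push_back(clampToByte(y - 0.813 * v - 0.391 * u));  // G
            rgb.push_back(clampToByte(y + 2.018 * u));              // B
        }
    }
    return rgb;
}
```

A reference like this could also be used to spot-check a few pixels of the OpenCL output, allowing for small rounding differences against OpenCV's fixed-point arithmetic.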


The result is quite unexpected:
1. The first color conversion triggers some kind of OpenCL compilation (cg66x...) and freezes the tool for 30 seconds
2. Subsequent color conversions are much slower than the CPU version: 452 ms on OpenCL vs 17 ms on CPU
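
To put those two timings in perspective, here is a rough back-of-the-envelope sketch of the effective throughput, taking both figures as milliseconds per frame. Each frame reads 2 bytes per pixel (CV_8UC2) and writes 3 bytes per pixel (CV_8UC3), about 10.4 MB in total, which works out to roughly 610 MB/s on the CPU against roughly 23 MB/s through OpenCL. The function names are hypothetical.

```cpp
#include <cstdint>
#include <iostream>

// Bytes touched per frame: 2 bytes/pixel read (CV_8UC2) + 3 bytes/pixel written (CV_8UC3).
std::uint64_t bytesPerFrame(std::uint64_t width, std::uint64_t height)
{
    return width * height * (2 + 3);
}

// Effective throughput in MB/s (1 MB = 1e6 bytes) for a given per-frame time in ms.
double throughputMBps(std::uint64_t bytes, double frameMs)
{
    return (bytes / 1e6) / (frameMs / 1e3);
}

// Print the throughput implied by the two measurements reported in this post.
void printThroughput()
{
    const std::uint64_t bytes = bytesPerFrame(1920, 1080);
    std::cout << "Bytes per frame: " << bytes << "\n";
    std::cout << "CPU    (17 ms):  " << throughputMBps(bytes, 17.0) << " MB/s\n";
    std::cout << "OpenCL (452 ms): " << throughputMBps(bytes, 452.0) << " MB/s\n";
}
```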

Any idea why this happens?

(PS: I ran the OpenCL setup script, source /usr/share/OpenCV/titestsuite/setupEnv.sh, to enable OpenCL, and did not run it for the CPU measurement.)

Source code for the tool:

#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <string>
#include <time.h>    // clock_gettime, CLOCK_MONOTONIC

#include "opencv2/core.hpp"
#include "opencv2/core/ocl.hpp"
#include "opencv2/core/utility.hpp"
#include "opencv2/imgproc.hpp"   // cv::cvtColor
#include "opencv2/videoio.hpp"
#include "opencv2/highgui.hpp"
#include "opencv2/video.hpp"

using namespace std;
using namespace cv;


int main(int argc, const char** argv)
{



    CommandLineParser cmd(argc, argv,
        "{ m memtype  | host        | memory type to use: device|host|shared }"
        "{ h help     |             | print help message }");

    if (cmd.has("help"))
    {
        cout << "Usage : " << argv[0] << " [options]" << endl;
        cout << "Available options:" << endl;
        cmd.printMessage();
        return EXIT_SUCCESS;
    }

    string memTypeStr=cmd.get<string>("memtype");
    cv::Size dimension(1920,1080);

    cout<<"test #1. Convert YUV422 to RGB format with Umat"<<endl;
    cout<<"         Memory type: "<<memTypeStr<<", Dimension: {"<<dimension.width<<","<<dimension.height<<"}"<<endl;

    cv::UMat inImage;
    cv::Mat  testImage;
    cv::UMat outImage;
    // Map the command-line string to a UMat allocation flag.
    UMatUsageFlags memType=USAGE_DEFAULT;
    if(memTypeStr == "device")
        memType=USAGE_ALLOCATE_DEVICE_MEMORY;
    else if(memTypeStr == "shared")
        memType=USAGE_ALLOCATE_SHARED_MEMORY;
    else if(memTypeStr == "host")
        memType=USAGE_ALLOCATE_HOST_MEMORY;
    else {
        cerr<<"Unknown memory type '"<<memTypeStr<<"'. Default will be used" << endl;
        memType=USAGE_DEFAULT;
    }

    inImage.create(dimension.height,dimension.width,CV_8UC2, memType);
    testImage.create(dimension.height,dimension.width,CV_8UC2);
    outImage.create(dimension,CV_8UC3, memType);

    // Fill the YUV image with some data (values wrap around at 256)
    for(int y=0; y<testImage.rows; y++){
        for(int x=0; x<testImage.cols; x++){
            testImage.at<Vec2b>(y,x)[0]=static_cast<uchar>(x);
            testImage.at<Vec2b>(y,x)[1]=static_cast<uchar>(y);
        }
    }
    cout<<"Copy to Umat..."<<endl;

    testImage.copyTo(inImage);
    cv::ocl::finish();

    cout<<"Convert I422 to RGB first run..."<<endl;
    struct timespec tmStart;
    clock_gettime(CLOCK_MONOTONIC,&tmStart);

    cv::cvtColor(inImage,outImage,COLOR_YUV2RGB_Y422);
    cv::ocl::finish();

    cout<<"Convert I422 to RGB again.."<<endl;
    struct timespec tmStart2;
    clock_gettime(CLOCK_MONOTONIC,&tmStart2);
    int runs=10;

    for(int i=0; i<runs; i++){
    	cv::cvtColor(inImage,outImage,COLOR_YUV2RGB_Y422);
    	cv::ocl::finish();
    }

    struct timespec tmEnd;
    clock_gettime(CLOCK_MONOTONIC,&tmEnd);

    // Convert to milliseconds in 64-bit arithmetic so tv_sec*1000 cannot overflow a 32-bit long.
    unsigned long long startMs=tmStart.tv_nsec/1000000ull+tmStart.tv_sec*1000ull;
    unsigned long long start2Ms=tmStart2.tv_nsec/1000000ull+tmStart2.tv_sec*1000ull;
    unsigned long long endMs=tmEnd.tv_nsec/1000000ull+tmEnd.tv_sec*1000ull;

    cout<<"STATISTIC:"<<endl;
    cout<<"First Run: "<<start2Ms-startMs<<" ms"<<endl;
    cout<<"Following runs (average): "<<(double)(endMs-start2Ms)/runs<<" ms"<<endl;

    return EXIT_SUCCESS;
}
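
The measurement pattern used in the tool (time the first call separately, then average the following calls) can also be written with std::chrono, which avoids the manual timespec arithmetic. The sketch below is only an illustration: RunTimes and timeRuns are hypothetical names, and the workload is passed in as a callable so the helper itself has no OpenCV dependency.

```cpp
#include <chrono>
#include <functional>

struct RunTimes {
    double firstMs;    // first call, including any one-time setup such as kernel compilation
    double averageMs;  // mean duration of the calls that follow
};

// Call `work` once, then `runs` more times, and report both timings in milliseconds.
RunTimes timeRuns(const std::function<void()>& work, int runs)
{
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    const Clock::time_point t0 = Clock::now();
    work();
    const Clock::time_point t1 = Clock::now();
    for (int i = 0; i < runs; ++i)
        work();
    const Clock::time_point t2 = Clock::now();

    RunTimes result;
    result.firstMs = Ms(t1 - t0).count();
    result.averageMs = runs > 0 ? Ms(t2 - t1).count() / runs : 0.0;
    return result;
}
```

In the tool above, `work` would be a lambda that calls cv::cvtColor(inImage,outImage,COLOR_YUV2RGB_Y422) followed by cv::ocl::finish(), so that each measurement waits for the queued OpenCL work to complete.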

Update: I figured out the first-run time issue. It seems the OpenCV/OpenCL logic performs so-called online kernel compilation. Adding the caching option described at http://downloads.ti.com/mctools/esd/docs/opencl/environment_variables.html#envvar-TI_OCL_CACHE_KERNELS addresses it.
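
The variable is normally exported in the shell before starting the program. If that is inconvenient, it could in principle be set from inside the process, as in the sketch below. This is untested on the target: it assumes the TI runtime reads the variable when OpenCL is first used, and the value "Y" is an assumption on my part (check the linked page for the accepted values). enableKernelCache is a hypothetical helper name.

```cpp
#include <stdlib.h>  // setenv, getenv (POSIX)

// Set TI_OCL_CACHE_KERNELS for this process unless it is already defined.
// Intended to be called at the very start of main(), before any OpenCL or
// cv::UMat work. The value "Y" is an assumption, not taken from the TI docs.
bool enableKernelCache()
{
    const int overwrite = 0;  // keep a value the user exported in the shell
    return setenv("TI_OCL_CACHE_KERNELS", "Y", overwrite) == 0;
}
```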