AM57X: OpenCV tests

Part Number: PROCESSOR-SDK-AM57X

Hi Team,

I have a question from a customer below. I sent the attached file to Ron, but he recommended I post it on the forums. Can you please review their inquiry and let me know if you have any questions?

Thanks for the help!

_____

"I've made some headway in evaluating the AM57x hardware offload. I've gotten a test OpenCV/CL application running and I'm getting some strange results. I used the instructions found here to assert OpenCL was working correctly, and ran my benchmark. I'm seeing that hardware offload is slower in every case my contrived benchmark runs (see attached logs).

My tests included 160x120, 320x240, and 1920x1080 frames at uint8, uint16, and float32 bit depth, run both with and without hardware offload. The attached logs were captured while the CPU was underclocked to 1.0 GHz. I'm currently running the same tests at 1.2 and 1.5 GHz and will have those results in a day or so.

I'm running all my tests on a Phytec phyCORE-AM57x. Phytec uses the latest release of Project Arago in their RDK. I'm fairly certain my DSP slowdown is not caused by my hardware, though I'd love to be proved wrong.

Can you help me understand these results? Are there certain filters I should always use the CPU for? What benchmarks did TI run to get these results? Can I get access to those benchmarking tools to verify the claims on TI's website? Is there anything else I can provide to help find the root cause of this slowdown?"

-------

  • Hi,

    The software team have been notified. They will respond here.
  • Sorry for the delay...I am escalating this one...
  • Sorry for the delay in getting a response on this issue.


    We reached out to the developer for his comments and here are our responses:

    None of the kernels used by the project you ran have been specifically implemented for the DSP; they use generic OpenCL instead. Worse DSP performance compared with the A15 is therefore consistent with our own observations: on such code the A15 runs faster than the C66 (C66 at 700 MHz vs. A15 at 1.5 GHz).
    Our benchmarking of DSP offload required manual tuning of several DSP kernels to demonstrate DSP gains over the A15. That is the only way to achieve higher performance on the DSP, because the OpenCV functions have not been written to run optimally on the DSP architecture. The OpenCV tests must be run from the command line:
    •    First, clone github.com/.../testdata into /usr/share/OpenCV/testdata on the target filesystem.
    •    Then, from /usr/share/OpenCV/titestsuite, set up the environment variables on the target using setupEnv.sh.
    •    Finally, run ./runtests.
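The three steps above can be wrapped in one function, shown here as a minimal POSIX sh sketch rather than a TI-provided script. Assumptions: the testdata repository URL is elided in this thread, so it must be passed in as the first argument; the override variables `OPENCV_TESTDATA_DIR` and `TI_TESTSUITE_DIR` are hypothetical and default to the paths given above; and `setupEnv.sh` is sourced rather than executed so that the variables it exports are still set when `./runtests` starts.

```shell
# Hypothetical wrapper for the three steps above. Source this file, then call
# run_ti_testsuite <testdata-repo-url>; the URL is not given in the thread.
run_ti_testsuite() (
    # The ( ) body runs in a subshell, so cd, set -e and the variables
    # exported by setupEnv.sh do not leak into the calling shell.
    set -e
    repo=$1
    testdata=${OPENCV_TESTDATA_DIR:-/usr/share/OpenCV/testdata}
    suite=${TI_TESTSUITE_DIR:-/usr/share/OpenCV/titestsuite}

    # Step 1: clone the test data once; skip if a clone is already present
    if [ ! -d "$testdata/.git" ]; then
        git clone "$repo" "$testdata"
    fi

    # Step 2: source (not execute) setupEnv.sh so its exports stay visible
    cd "$suite"
    . ./setupEnv.sh

    # Step 3: run the suite with that environment
    ./runtests
)
```

After sourcing the file, call `run_ti_testsuite` with the testdata repository URL from the original link. Running everything in a subshell keeps a failed step from leaving your login shell in a different directory.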

    The benchmark results referred to in the E2E post were collected using the version of the OpenCV performance tests (code.opencv.org/.../HowToUsePerfTests) available at that time.
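Those figures came from OpenCV's performance tests, not from accuracy tests such as `Imgproc_GaussianBlur.accuracy`. As a hedged sketch only: assuming the image also contains an `opencv_perf_imgproc` binary next to `opencv_test_imgproc` (not confirmed in this thread; set `BIN_DIR` otherwise), a CPU/DSP pair of perf runs could look like this. `--perf_min_samples` is an OpenCV perf-framework option; `run_perf`, `BIN_DIR`, and `DSP_DEVICE` are illustrative names.

```shell
# Illustrative only: assumes opencv_perf_imgproc is installed in BIN_DIR.
BIN_DIR=${BIN_DIR:-/usr/share/OpenCV/samples/bin}
DSP_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'

# run_perf <cpu|dsp> <gtest-filter>
run_perf() {
    target=$1
    filter=$2
    if [ "$target" = dsp ]; then
        # Same device string the test scripts in this thread use for the C66
        export OPENCV_OPENCL_DEVICE="$DSP_DEVICE"
    else
        unset OPENCV_OPENCL_DEVICE
    fi
    # --perf_min_samples: minimum number of timed samples per test case
    "$BIN_DIR/opencv_perf_imgproc" --gtest_filter="$filter" \
        --perf_min_samples=20 --gtest_output="xml:perf_${target}.xml"
}
```

Call `run_perf cpu '*GaussianBlur*'` and then `run_perf dsp '*GaussianBlur*'`; each call sets or clears `OPENCV_OPENCL_DEVICE` itself, so the order does not matter.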

    Hope this helps.


    Regards,

    Rahul

  • In addition to what Rahul has mentioned, you may want to take a look at processors.wiki.ti.com/.../OpenCV, section "OpenCV OpenCL related framework details: how to add new DSP kernel". By manually coding specific DSP kernels, it is possible to achieve better performance.
    So far, we have included only a few manually optimized kernels, primarily to show the optimization method and how kernels are integrated into the existing OpenCV framework.

    The OpenCV source for AM572x can be found at git.ti.com/.../tiopencvrelease_3.1.

    You will need a Yocto setup established (as described at the link above) and the source cloned. After modifying the code, you can compile with:

    ARAGO_BRAND=processor-sdk MACHINE=am57xx-evm bitbake opencv --force -c compile
    ARAGO_BRAND=processor-sdk MACHINE=am57xx-evm bitbake opencv


  • I'm still trying to reproduce the benchmarks provided by the marketing material. I'm just trying to focus on Gaussian blur for now. When I run Imgproc_GaussianBlur with OpenCL enabled it still runs slower than when OpenCL is disabled. Should I be running the OCL_Filter/GaussianBlurTest instead? How can I compare Imgproc_GaussianBlur with OCL_Filter/GaussianBlurTest? My test script is below. My results are attached.

    #!/bin/sh

    export OPENCV_TEST_DATA_PATH=/home/root/testdata

    unset OPENCV_OPENCL_DEVICE
    /usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_cpu.xml

    export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'
    /usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_dsp.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-06T18:51:37" time="3.496" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl="disabled" name="AllTests">
      <testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="3.495">
        <testcase name="accuracy" status="run" time="3.495" classname="Imgproc_GaussianBlur" />
      </testsuite>
    </testsuites>
    

    <?xml version="1.0" encoding="UTF-8"?>
    <testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-06T18:51:41" time="21.775" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl_platform_0_device_0="(Platform=TI AM57x)(Type=unknown)(Name=TI Multicore C66 DSP)(Version=OpenCL 1.1 TI )" cv_ocl_current_deviceType="TI_DSP" cv_ocl_current_deviceName="TI Multicore C66 DSP" cv_ocl_current_deviceVersion="OpenCL 1.1 TI " cv_ocl_current_maxComputeUnits="2" cv_ocl_current_maxWorkGroupSize="1024" cv_ocl_current_localMemSize="131072" cv_ocl_current_maxMemAllocSize="150994944" cv_ocl_current_haveDoubleSupport="1" cv_ocl_current_hostUnifiedMemory="1" cv_ocl_current_AmdBlas="0" cv_ocl_current_AmdFft="0" cv_ocl_current_preferredVectorWidthChar="8" cv_ocl_current_preferredVectorWidthShort="4" cv_ocl_current_preferredVectorWidthInt="2" cv_ocl_current_preferredVectorWidthLong="2" cv_ocl_current_preferredVectorWidthFloat="2" cv_ocl_current_preferredVectorWidthDouble="1" name="AllTests">
      <testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="21.775">
        <testcase name="accuracy" status="run" time="21.774" classname="Imgproc_GaussianBlur" />
      </testsuite>
    </testsuites>
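To compare the two reports above numerically, the total `time` attribute can be pulled out of each `<testsuites>` element. A minimal POSIX sh sketch, assuming the one-line `<testsuites ... time="...">` layout gtest produced here; `gtest_xml_time` and `compare_reports` are hypothetical helper names.

```shell
# Print the total run time (seconds) from a gtest XML report.
# Matches only the <testsuites ...> element, not <testsuite> or <testcase>.
gtest_xml_time() {
    sed -n 's/.*<testsuites [^>]*[[:space:]]time="\([0-9.]*\)".*/\1/p' "$1"
}

# compare_reports <cpu.xml> <dsp.xml>: print both totals and the DSP/CPU ratio
compare_reports() {
    cpu=$(gtest_xml_time "$1")
    dsp=$(gtest_xml_time "$2")
    # awk does the floating-point division that POSIX sh cannot
    awk -v c="$cpu" -v d="$dsp" \
        'BEGIN { printf "CPU %.3f s, DSP %.3f s, DSP/CPU = %.2f\n", c, d, d / c }'
}
```

`compare_reports gauss_cpu.xml gauss_dsp.xml` prints both totals and their ratio; for the two reports above the ratio is about 6.2, i.e. the DSP run took roughly six times as long.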
    

  • Appreciate the help and getting this escalated!
  • Please note that OpenCL kernels are compiled on the fly: the cl6x DSP compiler runs on the A15 at run time, which adds a significant delay (from a few seconds up to 1-2 minutes).

    So if a kernel is executed only a few times within the same program, the overall time is dominated by compilation. This compilation is needed only once, however: if the kernel is invoked multiple times, subsequent calls should take only the execution time.

    If you set the TI_OCL_CACHE_KERNELS environment variable (http://downloads.ti.com/mctools/esd/docs/opencl/environment_variables.html#envvar-TI_OCL_CACHE_KERNELS), subsequent runs should not trigger kernel compilation.

    We will also take a look at this particular test case.
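A minimal sketch of that idea, assuming the `opencv_test_imgproc` binary and DSP device string used in this thread and that `TI_OCL_CACHE_KERNELS` behaves as described above; the function `warm_then_measure` and the `TEST_BIN` override are hypothetical. The first, untimed run absorbs the kernel compilation, and only the second run writes an XML report.

```shell
TEST_BIN=${TEST_BIN:-/usr/share/OpenCV/samples/bin/opencv_test_imgproc}

# warm_then_measure <gtest-filter> <xml-report>
warm_then_measure() {
    export TI_OCL_CACHE_KERNELS=Y
    export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'
    # Warm-up run: compiles and caches the kernels; console output discarded
    "$TEST_BIN" --gtest_filter="$1" > /dev/null
    # Measured run: its report should now reflect execution time only
    "$TEST_BIN" --gtest_filter="$1" --gtest_output="xml:$2"
}
```

For example, `warm_then_measure 'Img*Gauss*' gauss_dsp.xml`. The function exports both variables into the calling shell, so unset `OPENCV_OPENCL_DEVICE` before any CPU run that follows.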

  • Please find below a modified script with the above variable set, and run it at least twice:

    #!/bin/sh
    export TI_OCL_CACHE_KERNELS=Y
    export TI_OCL_KEEP_FILES=Y
    export OPENCV_BUILDDIR=/usr/share/OpenCV/samples
    export OPENCV_TEST_DATA_PATH=/usr/share/OpenCV/testdata

    echo "===========================================CPU========================================================"
    unset OPENCV_OPENCL_DEVICE
    /usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_cpu.xml
    echo "===========================================DSP========================================================"
    export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'
    /usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_dsp.xml

    If TI_OCL_CACHE_KERNELS is not set, the DSP test takes close to 20 seconds. With this environment variable set, it drops to just over 3 seconds on the second and every subsequent run:

    root@am57xx-evm:/usr/share/OpenCV/titestsuite# cat gauss_cpu.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T02:38:45" time="3.328" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl="disabled" name="AllTests">
      <testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="3.327">
        <testcase name="accuracy" status="run" time="3.327" classname="Imgproc_GaussianBlur" />
      </testsuite>
    </testsuites>
    root@am57xx-evm:/usr/share/OpenCV/titestsuite# cat gauss_dsp.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T02:38:49" time="3.356" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl_platform_0_device_0="(Platform=TI AM57x)(Type=unknown)(Name=TI Multicore C66 DSP)(Version=OpenCL 1.1 TI )" cv_ocl_current_deviceType="TI_DSP" cv_ocl_current_deviceName="TI Multicore C66 DSP" cv_ocl_current_deviceVersion="OpenCL 1.1 TI " cv_ocl_current_maxComputeUnits="2" cv_ocl_current_maxWorkGroupSize="1024" cv_ocl_current_localMemSize="131072" cv_ocl_current_maxMemAllocSize="150994944" cv_ocl_current_haveDoubleSupport="1" cv_ocl_current_hostUnifiedMemory="1" cv_ocl_current_AmdBlas="0" cv_ocl_current_AmdFft="0" cv_ocl_current_preferredVectorWidthChar="8" cv_ocl_current_preferredVectorWidthShort="4" cv_ocl_current_preferredVectorWidthInt="2" cv_ocl_current_preferredVectorWidthLong="2" cv_ocl_current_preferredVectorWidthFloat="2" cv_ocl_current_preferredVectorWidthDouble="1" name="AllTests">
      <testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="3.355">
        <testcase name="accuracy" status="run" time="3.355" classname="Imgproc_GaussianBlur" />
      </testsuite>
    </testsuites>

  • Setting TI_OCL_CACHE_KERNELS and TI_OCL_KEEP_FILES helps a lot, but I don't see the 1.9x speedup the marketing material promises. I've added --gtest_repeat=5 to each test to try to make use of the cached kernels. Attached are the XML outputs of my test script, my test script, and my logs. Why are the A15 numbers so bad on the marketing site? I've never seen a Gaussian blur take 700 ms. My DSPs are a bit better than reported on the marketing site (340 ms instead of 360 ms). I've also added Imgproc_Filter2D tests to check their performance. The DSP and the CPU perform nearly identically at 340 ms, but the marketing material claims the DSP runs at 240 ms. What am I missing?

    http://www.ti.com/lsds/ti/processors/technology/libraries/open-cv-libraries.page

    #!/bin/sh
    export TI_OCL_CACHE_KERNELS=Y
    export TI_OCL_KEEP_FILES=Y
    export OPENCV_BUILDDIR=/usr/share/OpenCV/samples
    export OPENCV_TEST_DATA_PATH=/home/root/testdata
    
    echo "===========================================CPU========================================================"
    unset OPENCV_OPENCL_DEVICE
    /usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_cpu.xml --gtest_repeat=5
    /usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Filter2* --gtest_output=xml:filter2d_cpu.xml --gtest_repeat=5
    
    echo "===========================================DSP========================================================"
    export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'
    /usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_dsp.xml --gtest_repeat=5
    /usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Filter2* --gtest_output=xml:filter2d_dsp.xml --gtest_repeat=5
    

    ===========================================CPU========================================================
    CTEST_FULL_OUTPUT
    OpenCV version: 3.1.0
    OpenCV VCS version: unknown
    Build type: release
    Parallel framework: tbb
    CPU features: neon
    OpenCL is disabled
    
    Repeating all tests (iteration 1) . . .
    
    Note: Google Test filter = Img*Gauss*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_GaussianBlur
    [ RUN      ] Imgproc_GaussianBlur.accuracy
    [       OK ] Imgproc_GaussianBlur.accuracy (3388 ms)
    [----------] 1 test from Imgproc_GaussianBlur (3389 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3391 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 2) . . .
    
    Note: Google Test filter = Img*Gauss*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_GaussianBlur
    [ RUN      ] Imgproc_GaussianBlur.accuracy
    [       OK ] Imgproc_GaussianBlur.accuracy (3341 ms)
    [----------] 1 test from Imgproc_GaussianBlur (3342 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3346 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 3) . . .
    
    Note: Google Test filter = Img*Gauss*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_GaussianBlur
    [ RUN      ] Imgproc_GaussianBlur.accuracy
    [       OK ] Imgproc_GaussianBlur.accuracy (3342 ms)
    [----------] 1 test from Imgproc_GaussianBlur (3344 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3347 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 4) . . .
    
    Note: Google Test filter = Img*Gauss*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_GaussianBlur
    [ RUN      ] Imgproc_GaussianBlur.accuracy
    [       OK ] Imgproc_GaussianBlur.accuracy (3387 ms)
    [----------] 1 test from Imgproc_GaussianBlur (3387 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3390 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 5) . . .
    
    Note: Google Test filter = Img*Gauss*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_GaussianBlur
    [ RUN      ] Imgproc_GaussianBlur.accuracy
    [       OK ] Imgproc_GaussianBlur.accuracy (3384 ms)
    [----------] 1 test from Imgproc_GaussianBlur (3384 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3384 ms total)
    [  PASSED  ] 1 test.
    CTEST_FULL_OUTPUT
    OpenCV version: 3.1.0
    OpenCV VCS version: unknown
    Build type: release
    Parallel framework: tbb
    CPU features: neon
    OpenCL is disabled
    
    Repeating all tests (iteration 1) . . .
    
    Note: Google Test filter = Img*Filter2*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_Filter2D
    [ RUN      ] Imgproc_Filter2D.accuracy
    [       OK ] Imgproc_Filter2D.accuracy (3464 ms)
    [----------] 1 test from Imgproc_Filter2D (3464 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3464 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 2) . . .
    
    Note: Google Test filter = Img*Filter2*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_Filter2D
    [ RUN      ] Imgproc_Filter2D.accuracy
    [       OK ] Imgproc_Filter2D.accuracy (3436 ms)
    [----------] 1 test from Imgproc_Filter2D (3436 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3438 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 3) . . .
    
    Note: Google Test filter = Img*Filter2*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_Filter2D
    [ RUN      ] Imgproc_Filter2D.accuracy
    [       OK ] Imgproc_Filter2D.accuracy (3479 ms)
    [----------] 1 test from Imgproc_Filter2D (3480 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3482 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 4) . . .
    
    Note: Google Test filter = Img*Filter2*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_Filter2D
    [ RUN      ] Imgproc_Filter2D.accuracy
    [       OK ] Imgproc_Filter2D.accuracy (3480 ms)
    [----------] 1 test from Imgproc_Filter2D (3482 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3485 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 5) . . .
    
    Note: Google Test filter = Img*Filter2*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_Filter2D
    [ RUN      ] Imgproc_Filter2D.accuracy
    [       OK ] Imgproc_Filter2D.accuracy (3404 ms)
    [----------] 1 test from Imgproc_Filter2D (3405 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3409 ms total)
    [  PASSED  ] 1 test.
    ===========================================DSP========================================================
    CTEST_FULL_OUTPUT
    OpenCV version: 3.1.0
    OpenCV VCS version: unknown
    Build type: release
    Parallel framework: tbb
    CPU features: neon
    OpenCL Platforms: 
        TI AM57x
            unknown: TI Multicore C66 DSP (OpenCL 1.1 TI )
    Current OpenCL device: 
        Type = TI_DSP
        Name = TI Multicore C66 DSP
        Version = OpenCL 1.1 TI 
        Compute units = 2
        Max work group size = 1024
        Local memory size = 128 kB 
        Max memory allocation size = 144 MB 
        Double support = Yes
        Host unified memory = Yes
        Has AMD Blas = No
        Has AMD Fft = No
        Preferred vector width char = 8
        Preferred vector width short = 4
        Preferred vector width int = 2
        Preferred vector width long = 2
        Preferred vector width float = 2
        Preferred vector width double = 1
    
    Repeating all tests (iteration 1) . . .
    
    Note: Google Test filter = Img*Gauss*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_GaussianBlur
    [ RUN      ] Imgproc_GaussianBlur.accuracy
    [       OK ] Imgproc_GaussianBlur.accuracy (3408 ms)
    [----------] 1 test from Imgproc_GaussianBlur (3408 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3408 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 2) . . .
    
    Note: Google Test filter = Img*Gauss*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_GaussianBlur
    [ RUN      ] Imgproc_GaussianBlur.accuracy
    [       OK ] Imgproc_GaussianBlur.accuracy (3372 ms)
    [----------] 1 test from Imgproc_GaussianBlur (3372 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3372 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 3) . . .
    
    Note: Google Test filter = Img*Gauss*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_GaussianBlur
    [ RUN      ] Imgproc_GaussianBlur.accuracy
    [       OK ] Imgproc_GaussianBlur.accuracy (3377 ms)
    [----------] 1 test from Imgproc_GaussianBlur (3377 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3377 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 4) . . .
    
    Note: Google Test filter = Img*Gauss*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_GaussianBlur
    [ RUN      ] Imgproc_GaussianBlur.accuracy
    [       OK ] Imgproc_GaussianBlur.accuracy (3395 ms)
    [----------] 1 test from Imgproc_GaussianBlur (3395 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3396 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 5) . . .
    
    Note: Google Test filter = Img*Gauss*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_GaussianBlur
    [ RUN      ] Imgproc_GaussianBlur.accuracy
    [       OK ] Imgproc_GaussianBlur.accuracy (3377 ms)
    [----------] 1 test from Imgproc_GaussianBlur (3377 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3377 ms total)
    [  PASSED  ] 1 test.
    CTEST_FULL_OUTPUT
    OpenCV version: 3.1.0
    OpenCV VCS version: unknown
    Build type: release
    Parallel framework: tbb
    CPU features: neon
    OpenCL Platforms: 
        TI AM57x
            unknown: TI Multicore C66 DSP (OpenCL 1.1 TI )
    Current OpenCL device: 
        Type = TI_DSP
        Name = TI Multicore C66 DSP
        Version = OpenCL 1.1 TI 
        Compute units = 2
        Max work group size = 1024
        Local memory size = 128 kB 
        Max memory allocation size = 144 MB 
        Double support = Yes
        Host unified memory = Yes
        Has AMD Blas = No
        Has AMD Fft = No
        Preferred vector width char = 8
        Preferred vector width short = 4
        Preferred vector width int = 2
        Preferred vector width long = 2
        Preferred vector width float = 2
        Preferred vector width double = 1
    
    Repeating all tests (iteration 1) . . .
    
    Note: Google Test filter = Img*Filter2*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_Filter2D
    [ RUN      ] Imgproc_Filter2D.accuracy
    [       OK ] Imgproc_Filter2D.accuracy (3498 ms)
    [----------] 1 test from Imgproc_Filter2D (3498 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3503 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 2) . . .
    
    Note: Google Test filter = Img*Filter2*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_Filter2D
    [ RUN      ] Imgproc_Filter2D.accuracy
    [       OK ] Imgproc_Filter2D.accuracy (3410 ms)
    [----------] 1 test from Imgproc_Filter2D (3411 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3415 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 3) . . .
    
    Note: Google Test filter = Img*Filter2*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_Filter2D
    [ RUN      ] Imgproc_Filter2D.accuracy
    [       OK ] Imgproc_Filter2D.accuracy (3471 ms)
    [----------] 1 test from Imgproc_Filter2D (3471 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3474 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 4) . . .
    
    Note: Google Test filter = Img*Filter2*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_Filter2D
    [ RUN      ] Imgproc_Filter2D.accuracy
    [       OK ] Imgproc_Filter2D.accuracy (3458 ms)
    [----------] 1 test from Imgproc_Filter2D (3458 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3459 ms total)
    [  PASSED  ] 1 test.
    
    Repeating all tests (iteration 5) . . .
    
    Note: Google Test filter = Img*Filter2*
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from Imgproc_Filter2D
    [ RUN      ] Imgproc_Filter2D.accuracy
    [       OK ] Imgproc_Filter2D.accuracy (3411 ms)
    [----------] 1 test from Imgproc_Filter2D (3412 ms total)
    
    [----------] Global test environment tear-down
    [==========] 1 test from 1 test case ran. (3416 ms total)
    [  PASSED  ] 1 test.
    

    <?xml version="1.0" encoding="UTF-8"?>
    <testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T16:49:01" time="3.384" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl="disabled" name="AllTests">
      <testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="3.384">
        <testcase name="accuracy" status="run" time="3.384" classname="Imgproc_GaussianBlur" />
      </testsuite>
    </testsuites>
    

    <?xml version="1.0" encoding="UTF-8"?>
    <testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T16:49:36" time="3.377" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl_platform_0_device_0="(Platform=TI AM57x)(Type=unknown)(Name=TI Multicore C66 DSP)(Version=OpenCL 1.1 TI )" cv_ocl_current_deviceType="TI_DSP" cv_ocl_current_deviceName="TI Multicore C66 DSP" cv_ocl_current_deviceVersion="OpenCL 1.1 TI " cv_ocl_current_maxComputeUnits="2" cv_ocl_current_maxWorkGroupSize="1024" cv_ocl_current_localMemSize="131072" cv_ocl_current_maxMemAllocSize="150994944" cv_ocl_current_haveDoubleSupport="1" cv_ocl_current_hostUnifiedMemory="1" cv_ocl_current_AmdBlas="0" cv_ocl_current_AmdFft="0" cv_ocl_current_preferredVectorWidthChar="8" cv_ocl_current_preferredVectorWidthShort="4" cv_ocl_current_preferredVectorWidthInt="2" cv_ocl_current_preferredVectorWidthLong="2" cv_ocl_current_preferredVectorWidthFloat="2" cv_ocl_current_preferredVectorWidthDouble="1" name="AllTests">
      <testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="3.377">
        <testcase name="accuracy" status="run" time="3.377" classname="Imgproc_GaussianBlur" />
      </testsuite>
    </testsuites>
    

    <?xml version="1.0" encoding="UTF-8"?>
    <testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T16:49:18" time="3.409" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl="disabled" name="AllTests">
      <testsuite name="Imgproc_Filter2D" tests="1" failures="0" disabled="0" errors="0" time="3.405">
        <testcase name="accuracy" status="run" time="3.404" classname="Imgproc_Filter2D" />
      </testsuite>
    </testsuites>
    

    <?xml version="1.0" encoding="UTF-8"?>
    <testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T16:49:53" time="3.416" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl_platform_0_device_0="(Platform=TI AM57x)(Type=unknown)(Name=TI Multicore C66 DSP)(Version=OpenCL 1.1 TI )" cv_ocl_current_deviceType="TI_DSP" cv_ocl_current_deviceName="TI Multicore C66 DSP" cv_ocl_current_deviceVersion="OpenCL 1.1 TI " cv_ocl_current_maxComputeUnits="2" cv_ocl_current_maxWorkGroupSize="1024" cv_ocl_current_localMemSize="131072" cv_ocl_current_maxMemAllocSize="150994944" cv_ocl_current_haveDoubleSupport="1" cv_ocl_current_hostUnifiedMemory="1" cv_ocl_current_AmdBlas="0" cv_ocl_current_AmdFft="0" cv_ocl_current_preferredVectorWidthChar="8" cv_ocl_current_preferredVectorWidthShort="4" cv_ocl_current_preferredVectorWidthInt="2" cv_ocl_current_preferredVectorWidthLong="2" cv_ocl_current_preferredVectorWidthFloat="2" cv_ocl_current_preferredVectorWidthDouble="1" name="AllTests">
      <testsuite name="Imgproc_Filter2D" tests="1" failures="0" disabled="0" errors="0" time="3.412">
        <testcase name="accuracy" status="run" time="3.411" classname="Imgproc_Filter2D" />
      </testsuite>
    </testsuites>
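With `--gtest_repeat`, each XML report above holds only the last iteration (gauss_cpu.xml reports 3.384 s, matching iteration 5's 3384 ms), so averaging the repeats needs the console log. A hedged awk sketch, assuming the log keeps gtest's `[       OK ] name (N ms)` lines and the `====CPU====` / `====DSP====` markers echoed by the script; `summarize_gtest_log` is a hypothetical name.

```shell
# summarize_gtest_log <console-log>: mean time per section (CPU/DSP) and test
summarize_gtest_log() {
    awk '
        /=+CPU=+/ { sec = "CPU" }
        /=+DSP=+/ { sec = "DSP" }
        /\[ +OK +\] / {
            # Fields: "[" "OK" "]" name "(3388" "ms)"
            ms = $5
            gsub(/[()]/, "", ms)
            key = (sec == "" ? $4 : sec " " $4)
            sum[key] += ms
            n[key]++
        }
        END {
            for (k in sum) printf "%s: %d runs, mean %.1f ms\n", k, n[k], sum[k] / n[k]
        }
    ' "$1" | sort
}
```

Run over the console log above, it should print:

    CPU Imgproc_Filter2D.accuracy: 5 runs, mean 3452.6 ms
    CPU Imgproc_GaussianBlur.accuracy: 5 runs, mean 3368.4 ms
    DSP Imgproc_Filter2D.accuracy: 5 runs, mean 3449.6 ms
    DSP Imgproc_GaussianBlur.accuracy: 5 runs, mean 3385.8 ms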
    

  • Could you please confirm the A15 and C66 clock frequencies during your tests?

  • The A15 is clocked at 1.5 GHz. How can I check the C66 clock?
  • Please do:
    omapconf show opp
    You should see something like:
    ...
    |----------------------------------------------------------------------------------|
    |                        | Temperature | Voltage | Frequency     | OPerating Point |
    |----------------------------------------------------------------------------------|
    | VDD_CORE / VDD_CORE0   | 61C / 141F  | 0.000 V |               | NOM             |
    | L3                     |             |         | 266 MHz       |                 |
    | DMM                    |             |         | 266 MHz       |                 |
    | EMIF1                  |             |         | 266 MHz       |                 |
    | EMIF2                  |             |         | 266 MHz       |                 |
    | LP-DDR2                |             |         | 532 MHz       |                 |
    | L4                     |             |         | 266 MHz       |                 |
    | IPU1                   |             |         | 425 MHz       |                 |
    | Cortex-M4 Cores        |             |         | 212 MHz       |                 |
    | IPU2                   |             |         | 425 MHz       |                 |
    | Cortex-M4 Cores        |             |         | 212 MHz       |                 |
    | DSS                    |             |         | 192 MHz       |                 |
    | BB2D                   |             |         | (354 MHz) (1) |                 |
    |                        |             |         |               |                 |
    | VDD_MPU / VDD_CORE1    | 62C / 143F  | 0.980 V |               | NOM             |
    | MPU (CPU1 ON)          |             |         | 1000 MHz      |                 |
    |                        |             |         |               |                 |
    | VDD_GPU / VDD_CORE2    | 62C / 143F  | 1.020 V |               | HIGH            |
    | GPU                    |             |         | 532 MHz       |                 |
    |                        |             |         |               |                 |
    | VDD_DSPEVE / VDD_CORE3 | 61C / 141F  | 1.050 V |               | UNKNOWN         |
    | DSP1                   |             |         | 750 MHz       |                 |
    | DSP2                   |             |         | 750 MHz       |                 |
    | EVE1                   |             |         | (0 MHz) (1)   |                 |
    | EVE2                   |             |         | (0 MHz) (1)   |                 |
    |                        |             |         |               |                 |
    | VDD_IVA / VDD_CORE4    | 61C / 141F  | 1.800 V |               | HIGH            |
    | IVA                    |             |         | (532 MHz) (1) |                 |
    |                        |             |         |               |                 |
    |----------------------------------------------------------------------------------|

    Notes:
    (1) Module is disabled, rate may not be relevant.
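To pull out only the clocks that matter for the A15 vs. C66 comparison, the output can be filtered. A small sketch (the `opp_summary` name is hypothetical). Per note (1), a frequency in parentheses belongs to a module that was disabled when omapconf sampled it, so it may help to take the reading while the DSP test is running.

```shell
# Keep only the A15 (MPU) and C66 (DSP1/DSP2) rows of `omapconf show opp`
# output read from stdin, squeezing the column padding for readability.
opp_summary() {
    grep -E '^[[:space:]]*\|[[:space:]]*(MPU|DSP[12])[[:space:]]' | tr -s ' '
}
```

Usage: `omapconf show opp | opp_summary`. To sample under load, start the DSP test in the background with `&`, wait a moment, run the command, then `wait` for the test to finish.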
  • |----------------------------------------------------------------------------------|
    |                        | Temperature | Voltage | Frequency     | OPerating Point |
    |----------------------------------------------------------------------------------|
    | VDD_CORE / VDD_CORE0   | 44C / 111F  | 0.000 V |               | NOM             |
    | L3                     |             |         | 266 MHz       |                 |
    | DMM                    |             |         | 266 MHz       |                 |
    | EMIF1                  |             |         | 266 MHz       |                 |
    | EMIF2                  |             |         | 266 MHz       |                 |
    | LP-DDR2                |             |         | 532 MHz       |                 |
    | L4                     |             |         | 266 MHz       |                 |
    | IPU1                   |             |         | (425 MHz) (1) |                 |
    | Cortex-M4 Cores        |             |         | (212 MHz) (1) |                 |
    | IPU2                   |             |         | (425 MHz) (1) |                 |
    | Cortex-M4 Cores        |             |         | (212 MHz) (1) |                 |
    | DSS                    |             |         | 192 MHz       |                 |
    | BB2D                   |             |         | (354 MHz) (1) |                 |
    |                        |             |         |               |                 |
    | VDD_MPU / VDD_CORE1    | 40C / 104F  | 1.030 V |               | NOM             |
    | MPU (CPU1 ON)          |             |         | 1000 MHz      |                 |
    |                        |             |         |               |                 |
    | VDD_GPU / VDD_CORE2    | 42C / 107F  | 1.020 V |               | NOM             |
    | GPU                    |             |         | 425 MHz       |                 |
    |                        |             |         |               |                 |
    | VDD_DSPEVE / VDD_CORE3 | 44C / 111F  | 0.980 V |               | UNKNOWN         |
    | DSP1                   |             |         | (600 MHz) (1) |                 |
    | DSP2                   |             |         | (600 MHz) (1) |                 |
    | EVE1                   |             |         | (0 MHz) (1)   |                 |
    | EVE2                   |             |         | (0 MHz) (1)   |                 |
    |                        |             |         |               |                 |
    | VDD_IVA / VDD_CORE4    | 43C / 109F  | 1.800 V |               | NOM             |
    | IVA                    |             |         | (388 MHz) (1) |                 |
    |                        |             |         |               |                 |
    |----------------------------------------------------------------------------------|

    Looks like the DSPs are running at 600MHz
  • The C66 clock may have been modified in the device tree (DT) of the Phytec board.
    The C66 frequency is fixed, i.e. not adapted during execution, whereas the A15 uses DVFS to tune its frequency based on load.
    omapconf shows only the current state.

    You can check some statistics in /sys/devices/system/cpu/cpu0/cpufreq/stats/ (A15 frequencies and transitions, if adaptive clocking is used).

    Please check the script git.ti.com/.../optimize-benchmark.sh.
    With it, you can set the A15 to run at maximum speed instead of letting the CPU governor scale its frequency (DVFS).
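A minimal stand-in for the two things described above, using the standard Linux cpufreq sysfs files; it is not the content of optimize-benchmark.sh, which is not reproduced in this thread. The `CPUFREQ_DIR` override and both function names are hypothetical, writing the governor needs root and a kernel built with the `performance` governor, and it assumes both A15 cores share cpu0's frequency policy (as the single "MPU" line in the omapconf output suggests).

```shell
# Standard cpufreq sysfs location for CPU0
CPUFREQ_DIR=${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}

show_a15_freq_stats() {
    echo "governor: $(cat "$CPUFREQ_DIR/scaling_governor")"
    # One line per frequency in kHz, followed by the time spent there
    # since boot (in clock ticks, usually 10 ms each)
    cat "$CPUFREQ_DIR/stats/time_in_state"
}

pin_a15_max() {
    # The performance governor holds the CPU at its maximum allowed frequency
    echo performance > "$CPUFREQ_DIR/scaling_governor"
}
```

Typical use: run `pin_a15_max` as root, take `show_a15_freq_stats` before and after the benchmark, and compare the counters to confirm the run was spent at the top frequency.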
  • I don't see any references to the C66 clock in the Phytec device trees. Am I missing something?

    stash.phytec.com/.../am57xx-phycore-rdk.dts
    stash.phytec.com/.../am57xx-phycore-som.dtsi

    I'll try to repro with the performance governor later today. I'll report back the results.
  • The DSP clock value may be defined in U-Boot; I have forwarded the question to colleagues more familiar with this topic.
  • Our latest Processor Linux SDK release, 3.3.0.4 (1Q2017), sets this to a higher clock, as do a few earlier releases.

    This is defined in the U-Boot configuration file. Please check whether your U-Boot config file contains:

    CONFIG_DRA7_DSPEVE_OPP_HIGH=y
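A quick way to check for that line, as a sketch (the function name `check_dspeve_opp` is hypothetical). It only inspects the file; the running board reflects the option only if the U-Boot image that actually boots was built from that configuration.

```shell
# check_dspeve_opp <config-file>: report whether DSP/EVE OPP_HIGH is enabled.
# Works on a U-Boot defconfig or a generated .config.
check_dspeve_opp() {
    if grep -q '^CONFIG_DRA7_DSPEVE_OPP_HIGH=y$' "$1"; then
        echo "CONFIG_DRA7_DSPEVE_OPP_HIGH=y (DSP/EVE OPP_HIGH enabled)"
    else
        echo "CONFIG_DRA7_DSPEVE_OPP_HIGH not enabled"
        return 1
    fi
}
```

For example, `check_dspeve_opp configs/am57xx_phycore_rdk_defconfig` from the U-Boot source tree.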

  • Below is the U-Boot configuration for my development board. CONFIG_DRA7_DSPEVE_OPP_HIGH is set to y.

    CONFIG_ARM=y
    CONFIG_OMAP54XX=y
    CONFIG_TARGET_AM57XX_PHYCORE_RDK=y
    CONFIG_DM_SERIAL=y
    CONFIG_DM_GPIO=y
    CONFIG_ARMV7_LPAE=y
    CONFIG_SPL_STACK_R_ADDR=0x82000000
    CONFIG_DEFAULT_DEVICE_TREE="am57xx-phycore-rdk"
    CONFIG_SPL=y
    CONFIG_SPL_STACK_R=y
    CONFIG_HUSH_PARSER=y
    CONFIG_CMD_BOOTZ=y
    # CONFIG_CMD_IMLS is not set
    CONFIG_CMD_ASKENV=y
    # CONFIG_CMD_FLASH is not set
    CONFIG_CMD_MMC=y
    CONFIG_CMD_SPI=y
    CONFIG_CMD_I2C=y
    CONFIG_CMD_USB=y
    CONFIG_CMD_DFU=y
    CONFIG_CMD_GPIO=y
    # CONFIG_CMD_SETEXPR is not set
    CONFIG_CMD_DHCP=y
    CONFIG_CMD_MII=y
    CONFIG_CMD_PING=y
    CONFIG_CMD_REGULATOR=y
    CONFIG_CMD_EXT2=y
    CONFIG_CMD_EXT4=y
    CONFIG_CMD_EXT4_WRITE=y
    CONFIG_CMD_FAT=y
    CONFIG_CMD_FS_GENERIC=y
    CONFIG_OF_CONTROL=y
    CONFIG_DM=y
    CONFIG_DM_MMC=y
    CONFIG_SPI_FLASH=y
    CONFIG_SPI_FLASH_BAR=y
    CONFIG_SYS_NS16550=y
    CONFIG_USB=y
    CONFIG_USB_DWC3=y
    CONFIG_USB_DWC3_GADGET=y
    CONFIG_USB_DWC3_OMAP=y
    CONFIG_USB_DWC3_PHY_OMAP=y
    CONFIG_USB_GADGET=y
    CONFIG_USB_GADGET_DOWNLOAD=y
    CONFIG_G_DNL_MANUFACTURER="Texas Instruments"
    CONFIG_G_DNL_VENDOR_NUM=0x0451
    CONFIG_G_DNL_PRODUCT_NUM=0xd022
    CONFIG_ERRNO_STR=y
    CONFIG_FIT=y
    CONFIG_SPL_OF_LIBFDT=y
    CONFIG_SPL_LOAD_FIT=y
    CONFIG_OF_LIST="am57xx-phycore-rdk"
    CONFIG_OF_BOARD_SETUP=y
    CONFIG_DRA7_DSPEVE_OPP_HIGH=y
    CONFIG_DRA7_IVA_OPP_HIGH=y
    CONFIG_DRA7_GPU_OPP_HIGH=y
    CONFIG_DISK=y
    CONFIG_DWC_AHCI=y
    CONFIG_DM_ETH=y
    CONFIG_DM_PMIC=y
    CONFIG_PMIC_PALMAS=y
    CONFIG_DM_REGULATOR=y
    CONFIG_CMD_TIME=y
    CONFIG_DM_I2C=y
    CONFIG_DM_SPI=y
    CONFIG_DM_SPI_FLASH=y
    CONFIG_SPI_FLASH_STMICRO=y
    CONFIG_TI_QSPI=y
    CONFIG_CMD_SF=y

    /opt/PHYTEC_BSPs/yocto_ti/build/arago-tmp-external-linaro-toolchain/work/am57xx_phycore_rdk-linux-gnueabi/u-boot-phytec/2016.05+git_v2016.05-phy2-r1/git/configs/am57xx_phycore_rdk_defconfig