AM57X: OpenCV tests

Omid (TI)

Hi Team,

I have a question from a customer (below). I sent the attached file to Ron, but he recommended that I post it on the forums. Can you please review their inquiry and let me know if you have any questions?

Thanks for the help!

_____

"I've made some headway in evaluating the AM57x hardware offload. I've gotten a test OpenCV/CL application running and I'm getting some strange results. I used the instructions found here to assert OpenCL was working correctly, and ran my benchmark. I'm seeing that hardware offload is slower in every case my contrived benchmark runs (see attached logs).

My tests included 160x120, 320x240, and 1920x1080 frames at uint8, uint16 and float32 bit depth run both with and without hardware offload. The attached logs were captured while the CPU was underclocked to 1.0GHz. I'm currently running the same tests on 1.2 and 1.5GHz and will have those results in a day or so.

I'm running all my tests on a Phytec phyCORE-AM57x. Phytec uses the latest release of Project Arago in their RDK. I'm fairly certain my DSP slowdown is not caused by my hardware, though I'd love to be proved wrong.

Can you help me understand these results? Are there certain filters I should always use the CPU for? What benchmarks did TI run to get these results? Can I get access to those benchmarking tools to verify the claims on TI's website? Is there anything else I can provide to help find the root cause of this slowdown?"

-------

over 8 years ago

0 Biser Gatchev-XID over 8 years ago

TI__Guru**** 393215 points

Hi,

The software team have been notified. They will respond here.

0 Rogerio Almeida over 8 years ago in reply to Biser Gatchev-XID

TI__Mastermind 26205 points

Sorry for the delay...I am escalating this one...

0 Rahul Prabhu over 8 years ago in reply to Rogerio Almeida

TI__Guru** 116170 points

Sorry for the delay in getting a response on this issue.

We reached out to the developer for his comments and here are our responses:

None of the particular kernels (from the project that you ran) has been specifically implemented for the DSP; they use standard OpenCL instead. Worse performance on the DSP than on the A15 is therefore consistent with our observations: the A15 runs faster than the C66 (note also the clock difference, C66 @ 700 MHz vs. A15 @ 1.5 GHz).
The benchmarking that we have done on offload to the DSP required us to manually tune several DSP kernels in order to demonstrate DSP gains over the A15. That is the only way to achieve higher performance on the DSP, because the OpenCV functions have not been written to run optimally on the DSP architecture. OpenCV tests must be run from the command line:
•   First, clone github.com/.../testdata to /usr/share/OpenCV/testdata (on the target FS).
•   Then, from /usr/share/OpenCV/titestsuite, set up the environment variables (on the target FS) using setupEnv.sh.
•   Finally, run ./runtests.
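
The three steps above could be wrapped roughly as follows. This is only a sketch: the function name is made up for illustration, the test-data repository URL is elided in this post and therefore has to be passed in by the reader, and sourcing setupEnv.sh (rather than executing it) is an assumption.

```shell
#!/bin/sh
# run_ti_opencv_tests TESTDATA_GIT_URL   (hypothetical helper name)
# The body runs in a subshell, so the caller's working directory and
# environment are left untouched.
run_ti_opencv_tests() (
    url=${1:-}
    if [ -z "$url" ]; then
        echo "usage: run_ti_opencv_tests <testdata-git-url>" >&2
        exit 2
    fi
    # Step 1: fetch the test data onto the target file system.
    git clone "$url" /usr/share/OpenCV/testdata || exit 1
    # Step 2: set up the environment variables.
    # Assumption: setupEnv.sh is meant to be sourced so its exports persist.
    cd /usr/share/OpenCV/titestsuite || exit 1
    . ./setupEnv.sh
    # Step 3: run the test suite.
    ./runtests
)
```

Call it on the target with the full repository URL as its only argument.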

The benchmark results (referred to in the e2e post) were collected using the version of the OpenCV perf tests (code.opencv.org/.../HowToUsePerfTests) available at that time.

Hope this helps.

Regards,

Rahul

0 DjDjS over 8 years ago in reply to Rahul Prabhu

Intellectual 750 points

In addition to what Rahul has mentioned, you may want to take a look at processors.wiki.ti.com/.../OpenCV, paragraph "OpenCV OpenCL related framework details: how to add new DSP kernel". By manually coding specific DSP kernels, it is possible to achieve better performance.
So far we have included only a few manually optimized kernels, primarily to show the optimization method and how a kernel is integrated into the existing OpenCV framework.

OpenCV source for AM572x can be found at git.ti.com/.../tiopencvrelease_3.1.

You would need the Yocto setup established (as described in the above link) and the source cloned. After modifying the code, you can compile with:

ARAGO_BRAND=processor-sdk MACHINE=am57xx-evm bitbake opencv --force -c compile
ARAGO_BRAND=processor-sdk MACHINE=am57xx-evm bitbake opencv

0 Bradford Barr over 8 years ago in reply to DjDjS

Prodigy 70 points

I'm still trying to reproduce the benchmarks provided by the marketing material. I'm just trying to focus on Gaussian blur for now. When I run Imgproc_GaussianBlur with OpenCL enabled it still runs slower than when OpenCL is disabled. Should I be running the OCL_Filter/GaussianBlurTest instead? How can I compare Imgproc_GaussianBlur with OCL_Filter/GaussianBlurTest? My test script is below. My results are attached.

#!/bin/sh

export OPENCV_TEST_DATA_PATH=/home/root/testdata

unset OPENCV_OPENCL_DEVICE
/usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_cpu.xml

export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'
/usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_dsp.xml

gauss_cpu.xml:

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-06T18:51:37" time="3.496" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl="disabled" name="AllTests">
  <testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="3.495">
    <testcase name="accuracy" status="run" time="3.495" classname="Imgproc_GaussianBlur" />
  </testsuite>
</testsuites>

gauss_dsp.xml:

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-06T18:51:41" time="21.775" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl_platform_0_device_0="(Platform=TI AM57x)(Type=unknown)(Name=TI Multicore C66 DSP)(Version=OpenCL 1.1 TI )" cv_ocl_current_deviceType="TI_DSP" cv_ocl_current_deviceName="TI Multicore C66 DSP" cv_ocl_current_deviceVersion="OpenCL 1.1 TI " cv_ocl_current_maxComputeUnits="2" cv_ocl_current_maxWorkGroupSize="1024" cv_ocl_current_localMemSize="131072" cv_ocl_current_maxMemAllocSize="150994944" cv_ocl_current_haveDoubleSupport="1" cv_ocl_current_hostUnifiedMemory="1" cv_ocl_current_AmdBlas="0" cv_ocl_current_AmdFft="0" cv_ocl_current_preferredVectorWidthChar="8" cv_ocl_current_preferredVectorWidthShort="4" cv_ocl_current_preferredVectorWidthInt="2" cv_ocl_current_preferredVectorWidthLong="2" cv_ocl_current_preferredVectorWidthFloat="2" cv_ocl_current_preferredVectorWidthDouble="1" name="AllTests">
  <testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="21.775">
    <testcase name="accuracy" status="run" time="21.774" classname="Imgproc_GaussianBlur" />
  </testsuite>
</testsuites>

0 Omid (TI) over 8 years ago in reply to Rogerio Almeida

TI__Intellectual 1470 points

Appreciate the help and getting this escalated!

0 Siby James over 8 years ago in reply to Omid (TI)

TI__Expert 5140 points

+Siby

0 DjDjS over 8 years ago in reply to Bradford Barr

Intellectual 750 points

Please note that OpenCL kernels are compiled on the fly, so there is a significant delay due to online compilation (the cl6x DSP compiler runs on the A15): from a few seconds up to 1-2 minutes.

So if a kernel is executed only a few times (within the same program), the overall time is dominated by the compilation time. This time is needed only once, however. If the kernel is invoked multiple times, subsequent calls should take only the execution time.

If you set the TI_OCL_CACHE_KERNELS variable (http://downloads.ti.com/mctools/esd/docs/opencl/environment_variables.html#envvar-TI_OCL_CACHE_KERNELS), subsequent calls should not trigger kernel compilation.

We will also take a look at this particular test case.
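
To illustrate how a one-off compile dominates short runs, the average cost per call can be amortized over the number of invocations. The helper name `amortized_ms` is hypothetical, and the figures are illustrative only, loosely based on the uncached (about 21.8 s) and cached (about 3.4 s) DSP test times reported in this thread; they are not a measurement of the compile step itself.

```shell
#!/bin/sh
# amortized_ms COMPILE_MS EXEC_MS CALLS
# Average cost per call when the kernel is compiled once and then reused.
# Hypothetical helper for illustration; integer milliseconds only.
amortized_ms() {
    compile_ms=$1
    exec_ms=$2
    calls=$3
    echo $(( (compile_ms + exec_ms * calls) / calls ))
}

# Illustrative numbers: ~18.4 s one-off compile, ~3.4 s per run.
for n in 1 5 100; do
    echo "calls=$n avg_ms=$(amortized_ms 18400 3400 "$n")"
done
```

With a single call the compile time makes up most of the total; by 100 calls the average is close to the bare execution time.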

0 DjDjS over 8 years ago in reply to DjDjS

Intellectual 750 points

Please find below the modified script with the above variable set. Run the script at least twice:

#!/bin/sh
export TI_OCL_CACHE_KERNELS=Y
export TI_OCL_KEEP_FILES=Y
export OPENCV_BUILDDIR=/usr/share/OpenCV/samples
export OPENCV_TEST_DATA_PATH=/usr/share/OpenCV/testdata

echo "===========================================CPU========================================================"
unset OPENCV_OPENCL_DEVICE
/usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_cpu.xml
echo "===========================================DSP========================================================"
export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'
/usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_dsp.xml

If TI_OCL_CACHE_KERNELS is not set, the DSP test time is close to 20 seconds. If this environment variable is set, it drops to just over 3 seconds (on the second and any subsequent run):

root@am57xx-evm:/usr/share/OpenCV/titestsuite# cat gauss_cpu.xml
<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T02:38:45" time="3.328" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl="disabled" name="AllTests">
<testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="3.327">
<testcase name="accuracy" status="run" time="3.327" classname="Imgproc_GaussianBlur" />
</testsuite>
</testsuites>
root@am57xx-evm:/usr/share/OpenCV/titestsuite# cat gauss_dsp.xml
<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T02:38:49" time="3.356" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl_platform_0_device_0="(Platform=TI AM57x)(Type=unknown)(Name=TI Multicore C66 DSP)(Version=OpenCL 1.1 TI )" cv_ocl_current_deviceType="TI_DSP" cv_ocl_current_deviceName="TI Multicore C66 DSP" cv_ocl_current_deviceVersion="OpenCL 1.1 TI " cv_ocl_current_maxComputeUnits="2" cv_ocl_current_maxWorkGroupSize="1024" cv_ocl_current_localMemSize="131072" cv_ocl_current_maxMemAllocSize="150994944" cv_ocl_current_haveDoubleSupport="1" cv_ocl_current_hostUnifiedMemory="1" cv_ocl_current_AmdBlas="0" cv_ocl_current_AmdFft="0" cv_ocl_current_preferredVectorWidthChar="8" cv_ocl_current_preferredVectorWidthShort="4" cv_ocl_current_preferredVectorWidthInt="2" cv_ocl_current_preferredVectorWidthLong="2" cv_ocl_current_preferredVectorWidthFloat="2" cv_ocl_current_preferredVectorWidthDouble="1" name="AllTests">
<testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="3.355">
<testcase name="accuracy" status="run" time="3.355" classname="Imgproc_GaussianBlur" />
</testsuite>
</testsuites>
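
For a quick side-by-side of the two reports, something like the following may help. It is a minimal sketch: the helper names `testcase_time` and `ratio` are made up for illustration, and it assumes each report holds a single `<testcase>` element, as in the files above.

```shell
#!/bin/sh
# testcase_time FILE
# Print the time="..." attribute of each <testcase> element in a
# Google Test XML report (the <testsuite>/<testsuites> lines are ignored).
testcase_time() {
    sed -n 's/.*<testcase [^>]*time="\([0-9.]*\)".*/\1/p' "$1"
}

# ratio A B
# Print A/B to three decimals (awk does the floating-point division).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a / b }'
}

# Usage on the target, after both runs have written their reports:
#   cpu=$(testcase_time gauss_cpu.xml)
#   dsp=$(testcase_time gauss_dsp.xml)
#   echo "CPU/DSP time ratio: $(ratio "$cpu" "$dsp")"
```

A ratio close to 1 means the two runs took about the same wall-clock time; a value above 1 would mean the DSP run was faster.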

0 Bradford Barr over 8 years ago in reply to DjDjS

Prodigy 70 points

Setting TI_OCL_CACHE_KERNELS and TI_OCL_KEEP_FILES helps a lot, but I don't see the 1.9x speedup the marketing material promises. I've added --gtest_repeat=5 to each test to try to use the cached kernels. Attached are the XML outputs of my test script, my test script and my logs. Why are the A15 numbers so bad on the marketing site? I've never seen a Gaussian blur take 700 ms. My DSPs are a bit better than reported on the marketing site (340 instead of 360 ms). I've also added Imgproc_Filter2D tests to check their performance. The DSP and the CPU perform near identically at 340 ms, but the marketing material claims the DSP runs at 240 ms. What am I missing?

http://www.ti.com/lsds/ti/processors/technology/libraries/open-cv-libraries.page

script.txt:

#!/bin/sh
export TI_OCL_CACHE_KERNELS=Y
export TI_OCL_KEEP_FILES=Y
export OPENCV_BUILDDIR=/usr/share/OpenCV/samples
export OPENCV_TEST_DATA_PATH=/home/root/testdata

echo "===========================================CPU========================================================"
unset OPENCV_OPENCL_DEVICE
/usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_cpu.xml --gtest_repeat=5
/usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Filter2* --gtest_output=xml:filter2d_cpu.xml --gtest_repeat=5

echo "===========================================DSP========================================================"
export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'
/usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Gauss* --gtest_output=xml:gauss_dsp.xml --gtest_repeat=5
/usr/share/OpenCV/samples/bin/opencv_test_imgproc --gtest_filter=Img*Filter2* --gtest_output=xml:filter2d_dsp.xml --gtest_repeat=5

2604.test.txt:

===========================================CPU========================================================
CTEST_FULL_OUTPUT
OpenCV version: 3.1.0
OpenCV VCS version: unknown
Build type: release
Parallel framework: tbb
CPU features: neon
OpenCL is disabled

Repeating all tests (iteration 1) . . .

Note: Google Test filter = Img*Gauss*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_GaussianBlur
[ RUN      ] Imgproc_GaussianBlur.accuracy
[       OK ] Imgproc_GaussianBlur.accuracy (3388 ms)
[----------] 1 test from Imgproc_GaussianBlur (3389 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3391 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 2) . . .

Note: Google Test filter = Img*Gauss*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_GaussianBlur
[ RUN      ] Imgproc_GaussianBlur.accuracy
[       OK ] Imgproc_GaussianBlur.accuracy (3341 ms)
[----------] 1 test from Imgproc_GaussianBlur (3342 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3346 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 3) . . .

Note: Google Test filter = Img*Gauss*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_GaussianBlur
[ RUN      ] Imgproc_GaussianBlur.accuracy
[       OK ] Imgproc_GaussianBlur.accuracy (3342 ms)
[----------] 1 test from Imgproc_GaussianBlur (3344 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3347 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 4) . . .

Note: Google Test filter = Img*Gauss*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_GaussianBlur
[ RUN      ] Imgproc_GaussianBlur.accuracy
[       OK ] Imgproc_GaussianBlur.accuracy (3387 ms)
[----------] 1 test from Imgproc_GaussianBlur (3387 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3390 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 5) . . .

Note: Google Test filter = Img*Gauss*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_GaussianBlur
[ RUN      ] Imgproc_GaussianBlur.accuracy
[       OK ] Imgproc_GaussianBlur.accuracy (3384 ms)
[----------] 1 test from Imgproc_GaussianBlur (3384 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3384 ms total)
[  PASSED  ] 1 test.
CTEST_FULL_OUTPUT
OpenCV version: 3.1.0
OpenCV VCS version: unknown
Build type: release
Parallel framework: tbb
CPU features: neon
OpenCL is disabled

Repeating all tests (iteration 1) . . .

Note: Google Test filter = Img*Filter2*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Filter2D
[ RUN      ] Imgproc_Filter2D.accuracy
[       OK ] Imgproc_Filter2D.accuracy (3464 ms)
[----------] 1 test from Imgproc_Filter2D (3464 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3464 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 2) . . .

Note: Google Test filter = Img*Filter2*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Filter2D
[ RUN      ] Imgproc_Filter2D.accuracy
[       OK ] Imgproc_Filter2D.accuracy (3436 ms)
[----------] 1 test from Imgproc_Filter2D (3436 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3438 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 3) . . .

Note: Google Test filter = Img*Filter2*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Filter2D
[ RUN      ] Imgproc_Filter2D.accuracy
[       OK ] Imgproc_Filter2D.accuracy (3479 ms)
[----------] 1 test from Imgproc_Filter2D (3480 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3482 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 4) . . .

Note: Google Test filter = Img*Filter2*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Filter2D
[ RUN      ] Imgproc_Filter2D.accuracy
[       OK ] Imgproc_Filter2D.accuracy (3480 ms)
[----------] 1 test from Imgproc_Filter2D (3482 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3485 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 5) . . .

Note: Google Test filter = Img*Filter2*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Filter2D
[ RUN      ] Imgproc_Filter2D.accuracy
[       OK ] Imgproc_Filter2D.accuracy (3404 ms)
[----------] 1 test from Imgproc_Filter2D (3405 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3409 ms total)
[  PASSED  ] 1 test.
===========================================DSP========================================================
CTEST_FULL_OUTPUT
OpenCV version: 3.1.0
OpenCV VCS version: unknown
Build type: release
Parallel framework: tbb
CPU features: neon
OpenCL Platforms: 
    TI AM57x
        unknown: TI Multicore C66 DSP (OpenCL 1.1 TI )
Current OpenCL device: 
    Type = TI_DSP
    Name = TI Multicore C66 DSP
    Version = OpenCL 1.1 TI 
    Compute units = 2
    Max work group size = 1024
    Local memory size = 128 kB 
    Max memory allocation size = 144 MB 
    Double support = Yes
    Host unified memory = Yes
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 8
    Preferred vector width short = 4
    Preferred vector width int = 2
    Preferred vector width long = 2
    Preferred vector width float = 2
    Preferred vector width double = 1

Repeating all tests (iteration 1) . . .

Note: Google Test filter = Img*Gauss*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_GaussianBlur
[ RUN      ] Imgproc_GaussianBlur.accuracy
[       OK ] Imgproc_GaussianBlur.accuracy (3408 ms)
[----------] 1 test from Imgproc_GaussianBlur (3408 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3408 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 2) . . .

Note: Google Test filter = Img*Gauss*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_GaussianBlur
[ RUN      ] Imgproc_GaussianBlur.accuracy
[       OK ] Imgproc_GaussianBlur.accuracy (3372 ms)
[----------] 1 test from Imgproc_GaussianBlur (3372 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3372 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 3) . . .

Note: Google Test filter = Img*Gauss*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_GaussianBlur
[ RUN      ] Imgproc_GaussianBlur.accuracy
[       OK ] Imgproc_GaussianBlur.accuracy (3377 ms)
[----------] 1 test from Imgproc_GaussianBlur (3377 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3377 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 4) . . .

Note: Google Test filter = Img*Gauss*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_GaussianBlur
[ RUN      ] Imgproc_GaussianBlur.accuracy
[       OK ] Imgproc_GaussianBlur.accuracy (3395 ms)
[----------] 1 test from Imgproc_GaussianBlur (3395 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3396 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 5) . . .

Note: Google Test filter = Img*Gauss*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_GaussianBlur
[ RUN      ] Imgproc_GaussianBlur.accuracy
[       OK ] Imgproc_GaussianBlur.accuracy (3377 ms)
[----------] 1 test from Imgproc_GaussianBlur (3377 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3377 ms total)
[  PASSED  ] 1 test.
CTEST_FULL_OUTPUT
OpenCV version: 3.1.0
OpenCV VCS version: unknown
Build type: release
Parallel framework: tbb
CPU features: neon
OpenCL Platforms: 
    TI AM57x
        unknown: TI Multicore C66 DSP (OpenCL 1.1 TI )
Current OpenCL device: 
    Type = TI_DSP
    Name = TI Multicore C66 DSP
    Version = OpenCL 1.1 TI 
    Compute units = 2
    Max work group size = 1024
    Local memory size = 128 kB 
    Max memory allocation size = 144 MB 
    Double support = Yes
    Host unified memory = Yes
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 8
    Preferred vector width short = 4
    Preferred vector width int = 2
    Preferred vector width long = 2
    Preferred vector width float = 2
    Preferred vector width double = 1

Repeating all tests (iteration 1) . . .

Note: Google Test filter = Img*Filter2*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Filter2D
[ RUN      ] Imgproc_Filter2D.accuracy
[       OK ] Imgproc_Filter2D.accuracy (3498 ms)
[----------] 1 test from Imgproc_Filter2D (3498 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3503 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 2) . . .

Note: Google Test filter = Img*Filter2*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Filter2D
[ RUN      ] Imgproc_Filter2D.accuracy
[       OK ] Imgproc_Filter2D.accuracy (3410 ms)
[----------] 1 test from Imgproc_Filter2D (3411 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3415 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 3) . . .

Note: Google Test filter = Img*Filter2*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Filter2D
[ RUN      ] Imgproc_Filter2D.accuracy
[       OK ] Imgproc_Filter2D.accuracy (3471 ms)
[----------] 1 test from Imgproc_Filter2D (3471 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3474 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 4) . . .

Note: Google Test filter = Img*Filter2*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Filter2D
[ RUN      ] Imgproc_Filter2D.accuracy
[       OK ] Imgproc_Filter2D.accuracy (3458 ms)
[----------] 1 test from Imgproc_Filter2D (3458 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3459 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 5) . . .

Note: Google Test filter = Img*Filter2*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Imgproc_Filter2D
[ RUN      ] Imgproc_Filter2D.accuracy
[       OK ] Imgproc_Filter2D.accuracy (3411 ms)
[----------] 1 test from Imgproc_Filter2D (3412 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3416 ms total)
[  PASSED  ] 1 test.

6332.gauss_cpu.xml:

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T16:49:01" time="3.384" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl="disabled" name="AllTests">
  <testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="3.384">
    <testcase name="accuracy" status="run" time="3.384" classname="Imgproc_GaussianBlur" />
  </testsuite>
</testsuites>

4278.gauss_dsp.xml:

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T16:49:36" time="3.377" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl_platform_0_device_0="(Platform=TI AM57x)(Type=unknown)(Name=TI Multicore C66 DSP)(Version=OpenCL 1.1 TI )" cv_ocl_current_deviceType="TI_DSP" cv_ocl_current_deviceName="TI Multicore C66 DSP" cv_ocl_current_deviceVersion="OpenCL 1.1 TI " cv_ocl_current_maxComputeUnits="2" cv_ocl_current_maxWorkGroupSize="1024" cv_ocl_current_localMemSize="131072" cv_ocl_current_maxMemAllocSize="150994944" cv_ocl_current_haveDoubleSupport="1" cv_ocl_current_hostUnifiedMemory="1" cv_ocl_current_AmdBlas="0" cv_ocl_current_AmdFft="0" cv_ocl_current_preferredVectorWidthChar="8" cv_ocl_current_preferredVectorWidthShort="4" cv_ocl_current_preferredVectorWidthInt="2" cv_ocl_current_preferredVectorWidthLong="2" cv_ocl_current_preferredVectorWidthFloat="2" cv_ocl_current_preferredVectorWidthDouble="1" name="AllTests">
  <testsuite name="Imgproc_GaussianBlur" tests="1" failures="0" disabled="0" errors="0" time="3.377">
    <testcase name="accuracy" status="run" time="3.377" classname="Imgproc_GaussianBlur" />
  </testsuite>
</testsuites>

filter2d_cpu.xml:

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T16:49:18" time="3.409" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl="disabled" name="AllTests">
  <testsuite name="Imgproc_Filter2D" tests="1" failures="0" disabled="0" errors="0" time="3.405">
    <testcase name="accuracy" status="run" time="3.404" classname="Imgproc_Filter2D" />
  </testsuite>
</testsuites>

filter2d_dsp.xml:

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2017-05-31T16:49:53" time="3.416" cv_version="3.1.0" cv_vcs_version="unknown" cv_build_type="release" cv_parallel_framework="tbb" cv_cpu_features="neon" cv_ocl_platform_0_device_0="(Platform=TI AM57x)(Type=unknown)(Name=TI Multicore C66 DSP)(Version=OpenCL 1.1 TI )" cv_ocl_current_deviceType="TI_DSP" cv_ocl_current_deviceName="TI Multicore C66 DSP" cv_ocl_current_deviceVersion="OpenCL 1.1 TI " cv_ocl_current_maxComputeUnits="2" cv_ocl_current_maxWorkGroupSize="1024" cv_ocl_current_localMemSize="131072" cv_ocl_current_maxMemAllocSize="150994944" cv_ocl_current_haveDoubleSupport="1" cv_ocl_current_hostUnifiedMemory="1" cv_ocl_current_AmdBlas="0" cv_ocl_current_AmdFft="0" cv_ocl_current_preferredVectorWidthChar="8" cv_ocl_current_preferredVectorWidthShort="4" cv_ocl_current_preferredVectorWidthInt="2" cv_ocl_current_preferredVectorWidthLong="2" cv_ocl_current_preferredVectorWidthFloat="2" cv_ocl_current_preferredVectorWidthDouble="1" name="AllTests">
  <testsuite name="Imgproc_Filter2D" tests="1" failures="0" disabled="0" errors="0" time="3.412">
    <testcase name="accuracy" status="run" time="3.411" classname="Imgproc_Filter2D" />
  </testsuite>
</testsuites>
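
The XML reports above appear to record only the final repetition (for example, 3.384 s for the CPU Gaussian test matches iteration 5 in the console log), so the per-iteration console figures are the ones worth averaging. A minimal sketch for doing that follows; the helper name `mean_ok_ms` is made up for illustration, and it assumes the CPU and DSP halves of the console output have been saved to separate files.

```shell
#!/bin/sh
# mean_ok_ms TEST_NAME < LOG
# Average the "(N ms)" figures on the "[       OK ] TEST_NAME (N ms)" lines
# of a Google Test console log read from standard input.
mean_ok_ms() {
    awk -v name="$1" '
        $1 == "[" && $2 == "OK" && $4 == name {
            ms = $5
            gsub(/[^0-9]/, "", ms)   # "(3388" -> "3388"
            sum += ms
            n++
        }
        END { if (n) printf "%.1f\n", sum / n }'
}

# Usage (cpu.log and dsp.log are hypothetical file names):
#   mean_ok_ms Imgproc_GaussianBlur.accuracy < cpu.log
#   mean_ok_ms Imgproc_GaussianBlur.accuracy < dsp.log
```

For the five Gaussian blur iterations logged above this gives 3368.4 ms on the CPU and 3385.8 ms with the DSP selected, which puts a number on how close the two runs are.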

0 Bradford Barr over 8 years ago in reply to Bradford Barr

Prodigy 70 points

Any updates?

0 DjDjS over 8 years ago in reply to Bradford Barr

Intellectual 750 points

Could you please confirm the A15 and C66 clock frequencies during your tests?

0 Bradford Barr over 8 years ago in reply to DjDjS

Prodigy 70 points

The A15 is clocked at 1.5 GHz. How can I check the C66 clock?

0 DjDjS over 8 years ago in reply to Bradford Barr

Intellectual 750 points

Please do:
omapconf show opp
You should see something like:
...
|-----------------------------------------------------------------------------------|
| | Temperature | Voltage | Frequency | OPerating Point |
|-----------------------------------------------------------------------------------|
| VDD_CORE / VDD_CORE0 | 61C / 141F | 0.000 V | | NOM |
| L3 | | | 266 MHz | |
| DMM | | | 266 MHz | |
| EMIF1 | | | 266 MHz | |
| EMIF2 | | | 266 MHz | |
| LP-DDR2 | | | 532 MHz | |
| L4 | | | 266 MHz | |
| IPU1 | | | 425 MHz | |
| Cortex-M4 Cores | | | 212 MHz | |
| IPU2 | | | 425 MHz | |
| Cortex-M4 Cores | | | 212 MHz | |
| DSS | | | 192 MHz | |
| BB2D | | | (354 MHz) (1) | |
| | | | | |
| VDD_MPU / VDD_CORE1 | 62C / 143F | 0.980 V | | NOM |
| MPU (CPU1 ON) | | | 1000 MHz | |
| | | | | |
| VDD_GPU / VDD_CORE2 | 62C / 143F | 1.020 V | | HIGH |
| GPU | | | 532 MHz | |
| | | | | |
| VDD_DSPEVE / VDD_CORE3 | 61C / 141F | 1.050 V | | UNKNOWN |
| DSP1 | | | 750 MHz | |
| DSP2 | | | 750 MHz | |
| EVE1 | | | (0 MHz) (1) | |
| EVE2 | | | (0 MHz) (1) | |
| | | | | |
| VDD_IVA / VDD_CORE4 | 61C / 141F | 1.800 V | | HIGH |
| IVA | | | (532 MHz) (1) | |
| | | | | |
|-----------------------------------------------------------------------------------|

Notes:
(1) Module is disabled, rate may not be relevant.

0 Bradford Barr over 8 years ago in reply to DjDjS

Prodigy 70 points

|-----------------------------------------------------------------------------------|
| | Temperature | Voltage | Frequency | OPerating Point |
|-----------------------------------------------------------------------------------|
| VDD_CORE / VDD_CORE0 | 44C / 111F | 0.000 V | | NOM |
| L3 | | | 266 MHz | |
| DMM | | | 266 MHz | |
| EMIF1 | | | 266 MHz | |
| EMIF2 | | | 266 MHz | |
| LP-DDR2 | | | 532 MHz | |
| L4 | | | 266 MHz | |
| IPU1 | | | (425 MHz) (1) | |
| Cortex-M4 Cores | | | (212 MHz) (1) | |
| IPU2 | | | (425 MHz) (1) | |
| Cortex-M4 Cores | | | (212 MHz) (1) | |
| DSS | | | 192 MHz | |
| BB2D | | | (354 MHz) (1) | |
| | | | | |
| VDD_MPU / VDD_CORE1 | 40C / 104F | 1.030 V | | NOM |
| MPU (CPU1 ON) | | | 1000 MHz | |
| | | | | |
| VDD_GPU / VDD_CORE2 | 42C / 107F | 1.020 V | | NOM |
| GPU | | | 425 MHz | |
| | | | | |
| VDD_DSPEVE / VDD_CORE3 | 44C / 111F | 0.980 V | | UNKNOWN |
| DSP1 | | | (600 MHz) (1) | |
| DSP2 | | | (600 MHz) (1) | |
| EVE1 | | | (0 MHz) (1) | |
| EVE2 | | | (0 MHz) (1) | |
| | | | | |
| VDD_IVA / VDD_CORE4 | 43C / 109F | 1.800 V | | NOM |
| IVA | | | (388 MHz) (1) | |
| | | | | |
|-----------------------------------------------------------------------------------|

Looks like the DSPs are running at 600MHz
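
To pull a single row out of that table, a small filter along these lines could be used. It is a sketch only: the helper name `opp_freq` is hypothetical, and it assumes the pipe-separated layout shown above, with the frequency in the fourth data column. Per omapconf's own note (1), a rate printed in parentheses belongs to a module that was disabled when the command ran, so the 600 MHz reading may not reflect the clock during a benchmark run.

```shell
#!/bin/sh
# opp_freq MODULE < omapconf-output
# Print the Frequency column for the row whose first column is MODULE.
opp_freq() {
    awk -F'|' -v module="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($2) == module { print trim($5) }'
}

# On the target:
#   omapconf show opp | opp_freq DSP1
#   omapconf show opp | opp_freq DSP2
```

Running it while the DSP test is in progress (for example from a second shell) would show whether the row loses its parentheses under load.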

0 DjDjS over 8 years ago in reply to Bradford Barr

Intellectual 750 points

The C66 clock may have been modified in the device tree (DT) of the Phytec board.
The C66 frequency is fixed, i.e. it is not adapted during execution, whereas the A15 uses DVFS to tune its frequency based on load.
omapconf shows the current state only.

There are some stats you can check under /sys/devices/system/cpu/cpu0/cpufreq/stats/ (A15 frequencies and transitions, if adaptive clocking is used).

Please check this script: git.ti.com/.../optimize-benchmark.sh
You can set the A15 to run at maximum speed and disable the CPU governor's frequency scaling (DVFS).
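
For reference, pinning the A15 to its highest frequency through the standard Linux cpufreq sysfs interface could look roughly like this. It is not the linked optimize-benchmark.sh (its contents are not shown here), just a sketch; the helper name `set_governor` is hypothetical, and the optional second argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# set_governor GOVERNOR [CPU_SYSFS_ROOT]
# Write GOVERNOR to every cpuN/cpufreq/scaling_governor under the root
# (default: /sys/devices/system/cpu). Needs root on a real target.
set_governor() {
    governor=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue      # skip when the glob matched nothing
        echo "$governor" > "$f"
    done
}

# On the target, before benchmarking:
#   set_governor performance
#   cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
#   cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
```

With the performance governor selected, the reported current frequency should stay at the maximum instead of following the load.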

0 Bradford Barr over 8 years ago in reply to DjDjS

Prodigy 70 points

I don't see any references to the C66 clock in the Phytec device trees. Am I missing something?

stash.phytec.com/.../am57xx-phycore-rdk.dts
stash.phytec.com/.../am57xx-phycore-som.dtsi

I'll try to repro with the performance governor later today. I'll report back the results.

0 DjDjS over 8 years ago in reply to Bradford Barr

Intellectual 750 points

The DSP clock value could be defined in U-Boot. I have forwarded the question to colleagues more familiar with this topic.

0 DjDjS over 8 years ago in reply to DjDjS

Intellectual 750 points

Our latest Processor Linux SDK release, 3.3.0.4 (1Q2017), has this set to the higher clock, as do a few earlier releases.

This feature is defined in the U-Boot configuration file. Please check whether your U-Boot config file contains:

CONFIG_DRA7_DSPEVE_OPP_HIGH=y
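
A quick way to check for that line is a plain grep on the defconfig. The helper name `has_config` and the file path in the usage comment are hypothetical; substitute the defconfig from your own U-Boot tree.

```shell
#!/bin/sh
# has_config FILE OPTION
# Succeed if FILE contains the exact line OPTION=y
# (Kconfig-style defconfig or .config); "# OPTION is not set" does not match.
has_config() {
    grep -q "^$2=y\$" "$1"
}

# Example (hypothetical path):
#   has_config configs/my_board_defconfig CONFIG_DRA7_DSPEVE_OPP_HIGH \
#       && echo "DSP/EVE OPP_HIGH is enabled in this defconfig"
```

This only confirms what the configuration file requests; the clock actually in effect still has to be read back on the running board.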

0 Bradford Barr over 8 years ago in reply to DjDjS

Prodigy 70 points

Below is the configuration for my development board. CONFIG_DRA7_DSPEVE_OPP_HIGH is set to y.

CONFIG_ARM=y
CONFIG_OMAP54XX=y
CONFIG_TARGET_AM57XX_PHYCORE_RDK=y
CONFIG_DM_SERIAL=y
CONFIG_DM_GPIO=y
CONFIG_ARMV7_LPAE=y
CONFIG_SPL_STACK_R_ADDR=0x82000000
CONFIG_DEFAULT_DEVICE_TREE="am57xx-phycore-rdk"
CONFIG_SPL=y
CONFIG_SPL_STACK_R=y
CONFIG_HUSH_PARSER=y
CONFIG_CMD_BOOTZ=y
# CONFIG_CMD_IMLS is not set
CONFIG_CMD_ASKENV=y
# CONFIG_CMD_FLASH is not set
CONFIG_CMD_MMC=y
CONFIG_CMD_SPI=y
CONFIG_CMD_I2C=y
CONFIG_CMD_USB=y
CONFIG_CMD_DFU=y
CONFIG_CMD_GPIO=y
# CONFIG_CMD_SETEXPR is not set
CONFIG_CMD_DHCP=y
CONFIG_CMD_MII=y
CONFIG_CMD_PING=y
CONFIG_CMD_REGULATOR=y
CONFIG_CMD_EXT2=y
CONFIG_CMD_EXT4=y
CONFIG_CMD_EXT4_WRITE=y
CONFIG_CMD_FAT=y
CONFIG_CMD_FS_GENERIC=y
CONFIG_OF_CONTROL=y
CONFIG_DM=y
CONFIG_DM_MMC=y
CONFIG_SPI_FLASH=y
CONFIG_SPI_FLASH_BAR=y
CONFIG_SYS_NS16550=y
CONFIG_USB=y
CONFIG_USB_DWC3=y
CONFIG_USB_DWC3_GADGET=y
CONFIG_USB_DWC3_OMAP=y
CONFIG_USB_DWC3_PHY_OMAP=y
CONFIG_USB_GADGET=y
CONFIG_USB_GADGET_DOWNLOAD=y
CONFIG_G_DNL_MANUFACTURER="Texas Instruments"
CONFIG_G_DNL_VENDOR_NUM=0x0451
CONFIG_G_DNL_PRODUCT_NUM=0xd022
CONFIG_ERRNO_STR=y
CONFIG_FIT=y
CONFIG_SPL_OF_LIBFDT=y
CONFIG_SPL_LOAD_FIT=y
CONFIG_OF_LIST="am57xx-phycore-rdk"
CONFIG_OF_BOARD_SETUP=y
CONFIG_DRA7_DSPEVE_OPP_HIGH=y
CONFIG_DRA7_IVA_OPP_HIGH=y
CONFIG_DRA7_GPU_OPP_HIGH=y
CONFIG_DISK=y
CONFIG_DWC_AHCI=y
CONFIG_DM_ETH=y
CONFIG_DM_PMIC=y
CONFIG_PMIC_PALMAS=y
CONFIG_DM_REGULATOR=y
CONFIG_CMD_TIME=y
CONFIG_DM_I2C=y
CONFIG_DM_SPI=y
CONFIG_DM_SPI_FLASH=y
CONFIG_SPI_FLASH_STMICRO=y
CONFIG_TI_QSPI=y
CONFIG_CMD_SF=y

/opt/PHYTEC_BSPs/yocto_ti/build/arago-tmp-external-linaro-toolchain/work/am57xx_phycore_rdk-linux-gnueabi/u-boot-phytec/2016.05+git_v2016.05-phy2-r1/git/configs/am57xx_phycore_rdk_defconfig
