OpenCL TI platform example on EVMK2H boards

Hello

I encountered a problem when running OpenCL code on the EVMK2H.

I ran the TI OpenCL example that prints platform information, and the results show the device operating at a frequency of 0.059 GHz.

As far as I know, the 66AK2H12/14 SoC has eight DSP cores that can operate at 1.2 GHz.

I also ran the same OpenCL code on both platforms, and the execution time on the TI DSP is noticeably higher than on an Intel CPU.

So I think some setting may need to be changed so that the DSP cores run at a higher frequency.

  • Please help me. Thank you.
  • You might be hitting an overflow bug in the DSP speed computation. It has since been fixed, but not in your OpenCL version. Can you try the enclosed program? Put it in a dspspeed directory alongside your platforms example, borrow a Makefile from one of the other examples, then build and run it. It should report the correct DSP speed.

    MCSDK 3.1.4.7 is the last release under the MCSDK name and will no longer be updated. MCSDK has since been merged into Processor SDK. As of today, the latest Processor SDK release for the K2H EVM, which includes the latest OpenCL release, is 2.0.2.11. It can be found here:
    www.ti.com/.../overview.page
    www.ti.com/.../PROCESSOR-SDK-K2H
    software-dl.ti.com/.../index_FDS.html



    $ cat dspspeed.cpp
    /******************************************************************************
    * Copyright (c) 2013-2014, Texas Instruments Incorporated - http://www.ti.com/
    * All rights reserved.
    *
    * Redistribution and use in source and binary forms, with or without
    * modification, are permitted provided that the following conditions are met:
    * * Redistributions of source code must retain the above copyright
    * notice, this list of conditions and the following disclaimer.
    * * Redistributions in binary form must reproduce the above copyright
    * notice, this list of conditions and the following disclaimer in the
    * documentation and/or other materials provided with the distribution.
    * * Neither the name of Texas Instruments Incorporated nor the
    * names of its contributors may be used to endorse or promote products
    * derived from this software without specific prior written permission.
    *
    * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
    * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
    * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
    * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
    * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
    * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
    * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
    * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
    * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
    * THE POSSIBILITY OF SUCH DAMAGE.
    *****************************************************************************/
    #define __CL_ENABLE_EXCEPTIONS
    #include <CL/cl.hpp>
    #include <iostream>
    #include <vector>
    #include <cstdlib>
    #include <cstring>   // memset, strlen
    #include <cassert>
    #include <signal.h>
    using namespace cl;

    const char * kernStr = "kernel void dspspeed(global float* buf) \n"
    "{\n"
    " const unsigned DSP_PLL = 122880000;\n"
    " char *BOOTCFG_BASE_ADDR = (char*)0x02620000;\n"
    " char *CLOCK_BASE_ADDR = (char*)0x02310000;\n"
    " int MAINPLLCTL0 = (*(int*)(BOOTCFG_BASE_ADDR + 0x350));\n"
    " int MULT = (*(int*)(CLOCK_BASE_ADDR + 0x110));\n"
    " int OUTDIV = (*(int*)(CLOCK_BASE_ADDR + 0x108));\n"
    "\n"
    " unsigned mult = 1 + ((MULT & 0x3F) | ((MAINPLLCTL0 & 0x7F000) >>6));\n"
    " unsigned prediv = 1 + (MAINPLLCTL0 & 0x3F);\n"
    " unsigned output_div = 1 + ((OUTDIV >> 19) & 0xF);\n"
    " float speed = (float)DSP_PLL * mult / prediv / output_div;\n"
    " buf[0] = speed / 1e9; \n"
    "}\n";

    const int size = 4;
    cl_float ary[1];

    int main(int argc, char *argv[])
    {
        /*---------------------------------------------------------------------
        * Route SIGABRT and SIGTERM through exit() so that cleanup runs and the
        * dsp is reset properly
        *--------------------------------------------------------------------*/
        signal(SIGABRT, exit);
        signal(SIGTERM, exit);
        memset(ary, 0, size);

        try
        {
            // Build the kernel for the DSP (accelerator) devices
            Context context(CL_DEVICE_TYPE_ACCELERATOR);
            std::vector<Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
            Buffer buf (context, CL_MEM_WRITE_ONLY, size);
            Program::Sources source(1, std::make_pair(kernStr, strlen(kernStr)));
            Program program = Program(context, source);
            program.build(devices);

            // Run a single work-item on the first device
            CommandQueue Q (context, devices[0]);
            Kernel K (program, "dspspeed");
            KernelFunctor devset = K.bind(Q, NDRange(1), NDRange(1));

            devset(buf).wait(); // call the kernel and wait for completion

            // Blocking read of the computed speed back to the host
            Q.enqueueReadBuffer(buf, CL_TRUE, 0, size, ary);
        }
        catch (const Error& err)
        { std::cerr << "ERROR: " << err.what() << "(" << err.err() << ")" << std::endl; }

        std::cout << "DSP speed is " << ary[0] << " GHz" << std::endl;
    }
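
    To make the kind of overflow mentioned above concrete, here is a minimal host-side sketch. It is not the code from the platforms example, and the multiplier and divider values are hypothetical numbers chosen only to force a 32-bit wrap; they are not read from hardware and are not meant to reproduce the 0.059 GHz reading. It evaluates the same formula as the kernel above (reference clock * mult / prediv / output_div), once in 32-bit integer arithmetic and once in floating point (double here, whereas the kernel uses float).

    ```cpp
    #include <cstdint>
    #include <iostream>

    // Same reference clock constant as DSP_PLL in the kernel above.
    const uint32_t kRefHz = 122880000;

    // Integer-only version: kRefHz * mult is evaluated in 32 bits and can wrap
    // around before the divisions are applied.
    double speed_ghz_u32(uint32_t mult, uint32_t prediv, uint32_t outdiv)
    {
        uint32_t hz = kRefHz * mult / prediv / outdiv;
        return hz / 1e9;
    }

    // Floating-point version: the product is formed in double, so it does not wrap.
    double speed_ghz_fp(uint32_t mult, uint32_t prediv, uint32_t outdiv)
    {
        double hz = static_cast<double>(kRefHz) * mult / prediv / outdiv;
        return hz / 1e9;
    }

    int main()
    {
        // Hypothetical values (assumption): 122880000 * 40 exceeds 2^32.
        const uint32_t mult = 40, prediv = 2, outdiv = 2;
        std::cout << "32-bit integer: " << speed_ghz_u32(mult, prediv, outdiv) << " GHz\n";
        std::cout << "floating point: " << speed_ghz_fp(mult, prediv, outdiv) << " GHz\n";
        return 0;
    }
    ```

    With these assumed values the integer version reports roughly 0.155 GHz while the floating-point version reports 1.2288 GHz, which is why a wrapped intermediate product can show up as an implausibly low clock.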
  • Thanks for your help.
    When I run your dspspeed.cpp on my board, the console shows that the DSP is operating at 1.2 GHz.
    But could you tell me why the average DSP execution time is 7 to 10 times longer than the Intel CPU execution time?
    (I tried many example codes and measured execution time both with clGetEventProfilingInfo and with clock_gettime.)
    What can I do to optimize performance and resolve this problem, so that I can achieve real-time signal processing on the EVMK2H?


    Thank you
  • I would recommend starting with the OpenCL documentation, specifically the section on optimization.

    As Yuan suggested in the previous post, it would also be beneficial to update to the most recent Processor SDK (2.0.2.11, link in Yuan's post).

    Ajay