OpenCL runtime does not free __local variables

It seems that the new OpenCL runtime of the K2H platform does not deallocate __local memory after use.  Consider two different kernels:

__kernel void foo(...)
{
  __local char some_array[767 * 1024];

  ...
}

and


__kernel void bar(...)
{
  __local char other_array[2 * 1024];

  ...
}

The memory is allocated from the 768 KB L2 area. After foo() has finished, the runtime could deallocate "some_array" so that bar() can reuse the memory. But that is not how it is implemented: the application runs out of memory and aborts. Is there a fix or workaround for this, other than not allocating anything at all and just using memory starting at 0x00800000?

Thanks,  John Romein

    Hi John,

        That is a correct observation. Right now we don't do anything special with __local variables declared in the kernel program: they are placed in the kernel binary and loaded into L2 along with it, instead of being allocated in L2 for each kernel invocation. This is one of the limitations of our current implementation.
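To put numbers on the difference, here is a toy model in C++. It is not the runtime's code; it assumes the 768 KB L2 budget and the foo()/bar() sizes from the question, and the function names are hypothetical. With static placement the sizes add up to 769 KB, which does not fit; with per-invocation allocation of kernels run one after another, only the largest size (767 KB) has to fit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed budget: the 768 KB L2 area described in the question.
constexpr std::size_t kL2Budget = 768 * 1024;

// Current behaviour: every kernel's __local arrays are part of the loaded
// binary at the same time, so their sizes add up.
std::size_t staticRequirement(const std::vector<std::size_t>& localSizes) {
    std::size_t total = 0;
    for (std::size_t s : localSizes) total += s;
    return total;
}

// Per-invocation allocation, kernels run one after another on one core:
// only one kernel's buffer is live at a time, so the peak is the largest size.
std::size_t perInvocationPeak(const std::vector<std::size_t>& localSizes) {
    if (localSizes.empty()) return 0;
    return *std::max_element(localSizes.begin(), localSizes.end());
}

bool fitsInL2(std::size_t bytes) { return bytes <= kL2Budget; }

// foo() needs 767 KB and bar() needs 2 KB.
void checkExample() {
    const std::vector<std::size_t> sizes = {767 * 1024, 2 * 1024};
    assert(staticRequirement(sizes) == 769 * 1024);  // 769 KB: too much
    assert(!fitsInL2(staticRequirement(sizes)));
    assert(perInvocationPeak(sizes) == 767 * 1024);  // 767 KB: fits
    assert(fitsInL2(perInvocationPeak(sizes)));
}
```

The model ignores any L2 the runtime itself may reserve, so the real headroom is likely smaller than it suggests.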

        There is a workaround: instead of declaring __local variables inside the kernel (especially large arrays), declare __local pointer arguments to the kernel and set each argument's size from the host application. For example, in your kernel program:

    __kernel void foo(__local char *some_array, ...)
    {
        ...
    }

    __kernel void bar(__local char *other_array, ...)
    {
        ...
    }

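    In your host OpenCL application, do the following (I am using the C++ bindings):

    ...
    // foo, argument 0: reserve 767 KB of __local memory for this launch.
    // __local() is cl::__local from cl.hpp (cl::Local in newer headers).
    foo_kernel.setArg(0, __local(767*1024*sizeof(cl_char)));
    ...
    Q.enqueueNDRangeKernel(foo_kernel, ...);
    ...
    // bar, argument 0: only 2 KB of __local memory is needed.
    bar_kernel.setArg(0, __local(2*1024*sizeof(cl_char)));
    ...
    Q.enqueueNDRangeKernel(bar_kernel, ...);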

    Local arguments to the kernel are allocated and freed for each kernel invocation. As a result, multiple kernel tasks can run concurrently, each with its own local argument buffers, because each core has its own L2 cache.
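A similar toy model of per-invocation allocation, again hypothetical and not TI's implementation: each core gets its own pool, and an invocation reserves its __local argument sizes when it starts and releases them when it ends. CoreL2Pool, begin(), end() and the 768 KB per-core budget are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: one L2 pool per core. An invocation reserves the sizes of its
// __local arguments when it starts and gives them back when it finishes.
class CoreL2Pool {
public:
    explicit CoreL2Pool(std::size_t capacity) : capacity_(capacity) {}

    // Reserve this invocation's __local argument sizes; false if they don't fit.
    bool begin(const std::vector<std::size_t>& localArgSizes) {
        assert(inUse_ == 0 && "model allows one invocation per core at a time");
        std::size_t need = 0;
        for (std::size_t s : localArgSizes) need += s;
        if (need > capacity_) return false;
        inUse_ = need;
        return true;
    }

    // The invocation finished: its __local arguments are released.
    void end() { inUse_ = 0; }

    std::size_t inUse() const { return inUse_; }

private:
    std::size_t capacity_;
    std::size_t inUse_ = 0;
};

// The scenario from the reply; returns true if every step fits.
bool exampleScenario() {
    constexpr std::size_t kL2 = 768 * 1024;  // assumed per-core budget
    std::vector<CoreL2Pool> cores(2, CoreL2Pool(kL2));

    // foo() then bar() on core 0: bar reuses the memory foo released.
    bool fooOk = cores[0].begin({767 * 1024});
    cores[0].end();
    bool barOk = cores[0].begin({2 * 1024});
    cores[0].end();

    // Two 767 KB tasks at once on cores 0 and 1: each core has its own pool.
    bool task0 = cores[0].begin({767 * 1024});
    bool task1 = cores[1].begin({767 * 1024});
    cores[0].end();
    cores[1].end();

    return fooOk && barOk && task0 && task1 &&
           cores[0].inUse() == 0 && cores[1].inUse() == 0;
}
```

The single-invocation-per-core restriction keeps the sketch short; it is a simplification, not a claim about how the runtime schedules tasks.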


    Thanks for bringing this up!


    - Yuan