OpenCL resize the DSP L1D cache size


Hi Champions,

I'm using OpenCL to implement an algorithm. The original algorithm uses 16 KB of L1D as scratch memory, so I call the CSL to resize the L1D cache inside the OpenCL kernel, but it reports a segmentation fault (11). Is this behavior expected?

Is it feasible for a customer to use L1D as SRAM in OpenCL?

Zhan.

  • Zhan,

    OpenCL by default configures L1D to be all cache.  It should be possible for the DSP kernel that is dispatched to dynamically reconfigure the cache as scratchpad.  The start of the DSP kernel would need to writeback L1D, reconfigure the size, run your algorithm, and then reconfigure the size back to all cache before the DSP kernel ends.

    Again, this should be possible, but I have not implemented any examples using this technique and so it is untested.
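
    The sequence described above could be sketched roughly as follows. This is a host-side illustration only and has not been run on the DSP: `writeback_invalidate_l1d`, `set_l1d_cache_kb`, and `L1dScratchGuard` are hypothetical names that merely record the order of operations, and a 32 KB L1D is assumed. In a real kernel the two stand-in functions would be replaced by the corresponding CSL cache calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins that only log the order of cache operations.
// On the DSP these would be replaced by the real CSL cache calls.
static std::vector<std::string> g_cache_log;

void writeback_invalidate_l1d() { g_cache_log.push_back("wbinv"); }
void set_l1d_cache_kb(int kb) { g_cache_log.push_back("size=" + std::to_string(kb)); }

// Scope guard: write back L1D and shrink the cache on entry,
// then restore the default (all cache) when the scope ends.
class L1dScratchGuard {
public:
    explicit L1dScratchGuard(int cache_kb_while_active, int cache_kb_default = 32)
        : default_kb_(cache_kb_default) {
        writeback_invalidate_l1d();               // flush dirty lines before resizing
        set_l1d_cache_kb(cache_kb_while_active);  // remaining L1D becomes scratchpad
    }
    ~L1dScratchGuard() {
        set_l1d_cache_kb(default_kb_);            // back to all cache before the kernel ends
    }
    L1dScratchGuard(const L1dScratchGuard&) = delete;
    L1dScratchGuard& operator=(const L1dScratchGuard&) = delete;

private:
    int default_kb_;
};

// Stand-in for the body of the DSP kernel.
void run_kernel_body() {
    L1dScratchGuard guard(16);           // 16 KB cache, 16 KB scratch (assumes 32 KB L1D)
    g_cache_log.push_back("algorithm");  // the algorithm would use the scratchpad here
}
```

    Using a scope guard is one possible design choice: the cache is restored to all cache even if the algorithm returns early.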

  • Hi Alan,

    Thanks for your comments. I modified the code accordingly, and it works. I have another question: when I use OpenCL profiling, the OpenCL documentation says the profiling result is a 64-bit value in nanoseconds. Does it come from the DSP TSCL and TSCH registers, or does it use the ARM generic timer?

    Zhan 

  • Zhan,

    Which example were you trying? In the multinode_fftdemo code, the OpenCL profiling only measures the DSP performance. It does not include the overhead of dispatching the kernel to the DSP.

    regards,

    David

  • The profiling timestamps in OpenCL events are created on the ARM, using

    clock_gettime(CLOCK_MONOTONIC, ...)

    If you query an event for

    CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END,

    the delta between those two timestamps is the time from the decision on the ARM to dispatch the kernel to the time the ARM receives confirmation that the kernel is complete, so it does include the dispatch overhead.
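
    As a rough illustration of how a 64-bit nanosecond timestamp can be derived from clock_gettime(CLOCK_MONOTONIC, ...) and how a start/end delta is computed, here is a minimal host-side sketch. The helper names `to_nanoseconds`, `monotonic_now_ns`, and `elapsed_ns` are hypothetical and are not part of the OpenCL runtime; the runtime's actual implementation may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Convert a POSIX timespec into a single 64-bit nanosecond count.
std::uint64_t to_nanoseconds(const timespec& ts) {
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ULL
         + static_cast<std::uint64_t>(ts.tv_nsec);
}

// Read the monotonic clock and return it in nanoseconds.
std::uint64_t monotonic_now_ns() {
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return to_nanoseconds(ts);
}

// Delta between two timestamps, analogous to END minus START for an event.
std::uint64_t elapsed_ns(std::uint64_t start_ns, std::uint64_t end_ns) {
    return end_ns - start_ns;
}
```

    Taking one reading before the work and one after it, then passing both to `elapsed_ns`, gives a wall-clock interval on the host, which is why a delta measured this way includes dispatch overhead as well as DSP execution time.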

  • I use L1D as SRAM in C++ code that I call from an OpenCL kernel, and it works fine.  For example:

      CACHE_setL1DSize(CACHE_L1_8KCACHE);

      typedef struct {
        int array1[4096], array2[2048];
      } L1_SRAM;

      L1_SRAM &l1_sram = * (L1_SRAM *) 0x00F00000;
      assert(sizeof l1_sram <= 24 * 1024);

      // use it here, e.g., memset(l1_sram.array1, 0, sizeof l1_sram.array1);

      CACHE_setL1DSize(CACHE_L1_MAXIM3); // at end of function

    John