OpenCL resize the DSP L1D cache size

Z.X

Hi Champions,

I'm using OpenCL to implement an algorithm. The original algorithm uses 16 KB of L1D as scratch memory, so I call the CSL to resize the L1D cache inside the OpenCL kernel, but this reports a segmentation fault (11). Is this behavior expected?

Is it feasible for a customer to use L1D as SRAM in OpenCL?

Zhan.

over 11 years ago

0 award over 11 years ago

TI__Intellectual 1400 points

Zhan,

OpenCL by default configures L1D to be all cache. It should be possible for the DSP kernel that is dispatched to dynamically reconfigure the cache as scratchpad. The start of the DSP kernel would need to writeback L1D, reconfigure the size, run your algorithm, and then reconfigure the size back to all cache before the DSP kernel ends.

Again, this should be possible, but I have not implemented any examples using this technique and so it is untested.
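
The sequence described above (write back L1D, shrink the cache, run the algorithm, restore the cache) could be sketched roughly as follows. This is an untested illustration, not TI's reference code. The CSL header path, CACHE_wbAllL1d, CACHE_WAIT and CACHE_L1_16KCACHE are assumed from the CSL naming seen elsewhere in this thread, and run_algorithm, kernel_body and the l1d_* wrappers are hypothetical names. On a non-DSP build the wrappers only record the call order, so the sequence can be checked on a host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

#ifdef _TMS320C6X
// DSP build. Assumed CSL header path and calls; verify against your PDK.
#include <ti/csl/csl_cacheAux.h>
static void l1d_writeback()     { CACHE_wbAllL1d(CACHE_WAIT); }
static void l1d_set_cache_16k() { CACHE_setL1DSize(CACHE_L1_16KCACHE); }
static void l1d_set_cache_max() { CACHE_setL1DSize(CACHE_L1_MAXIM3); }
// L1D SRAM base address as used later in this thread.
static unsigned char *const scratch =
    reinterpret_cast<unsigned char *>(0x00F00000);
#else
// Host build: stand-ins that only log the call order.
static std::string call_log;
static unsigned char host_scratch[16 * 1024];
static void l1d_writeback()     { call_log += "wb;"; }
static void l1d_set_cache_16k() { call_log += "16k;"; }
static void l1d_set_cache_max() { call_log += "max;"; }
static unsigned char *const scratch = host_scratch;
#endif

// 16 KB of scratchpad, matching the size asked about in the question.
static const std::size_t kScratchBytes = 16 * 1024;

// Hypothetical placeholder for the real algorithm: fills the scratchpad
// with 1s and returns the byte sum.
static unsigned run_algorithm(unsigned char *buf, std::size_t n) {
    std::memset(buf, 1, n);
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += buf[i];
    return sum;
}

// Body of the function called from the OpenCL kernel (hypothetical name).
unsigned kernel_body() {
    l1d_writeback();      // 1. write back dirty L1D lines
    l1d_set_cache_16k();  // 2. keep 16 KB as cache, expose the rest as SRAM
    unsigned result = run_algorithm(scratch, kScratchBytes);  // 3. use it
    l1d_set_cache_max();  // 4. restore the cache before the kernel ends
    return result;
}
```

Any results the host needs should be copied out of the scratchpad before step 4, since that region becomes cache again.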

0 Z.X over 11 years ago in reply to award

TI__Expert 7680 points

Hi Alan,

Thanks for your comments. I modified the code based on them, and it works. I have another question about OpenCL profiling: according to the OpenCL documentation, the profiling result is a 64-bit value in nanoseconds. Is it derived from the DSP TSCL and TSCH registers, or does it use the ARM generic timer?

Zhan

0 dzhou over 11 years ago in reply to Z.X

TI__Genius 9065 points

Zhan,

Which example were you trying? In the multinode_fftdemo code, the OpenCL profiling measures only the DSP performance. It does not include the overhead of dispatching the kernel to the DSP.

regards,

David

0 award over 11 years ago in reply to Z.X

TI__Intellectual 1400 points

The profiling timestamps in OpenCL events are created from the ARM timer, using

clock_gettime(CLOCK_MONOTONIC, ...)

If you query an event for

CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END,

the delta between those two timestamps runs from the moment the ARM decides to dispatch the kernel to the moment the ARM receives confirmation that the kernel is complete, so it does include the dispatch overhead.
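
As a rough illustration of how such a timestamp and delta are formed, the sketch below converts a CLOCK_MONOTONIC reading to a 64-bit nanosecond count and subtracts two readings. The helper names to_ns, monotonic_ns and elapsed_ns are hypothetical, and this is not the OpenCL runtime's actual code. In a real host program the two values would normally come from clGetEventProfilingInfo rather than from direct clock reads.

```cpp
#include <cassert>
#include <stdint.h>
#include <time.h>

// Convert a timespec to a single 64-bit nanosecond count.
static uint64_t to_ns(const timespec &ts) {
    return static_cast<uint64_t>(ts.tv_sec) * 1000000000ULL +
           static_cast<uint64_t>(ts.tv_nsec);
}

// Read the monotonic clock, the same clock source named above.
static uint64_t monotonic_ns() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return to_ns(ts);
}

// Delta between a start and an end timestamp, in nanoseconds.
// With OpenCL events, start and end would be the values returned for
// CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END.
static uint64_t elapsed_ns(uint64_t start, uint64_t end) {
    return end - start;
}
```

Because both timestamps are taken on the ARM side, a delta computed this way covers dispatch and completion signalling as well as the DSP execution time itself.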

0 John Romein over 11 years ago in reply to award

Intellectual 300 points

I use it in C++ code that I call from an OpenCL kernel, and it works fine. For example:

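// Shrink the L1D cache to 8 KB so that the rest of L1D can be used as SRAM.
CACHE_setL1DSize(CACHE_L1_8KCACHE);

typedef struct {
    int array1[4096], array2[2048];  // 16 KB + 8 KB = 24 KB
} L1_SRAM;

// Overlay the struct on the L1D SRAM region, which starts at 0x00F00000.
L1_SRAM &l1_sram = * (L1_SRAM *) 0x00F00000;
assert(sizeof l1_sram <= 24 * 1024);

// use it here, e.g., memset(l1_sram.array1, 0, sizeof l1_sram.array1);

CACHE_setL1DSize(CACHE_L1_MAXIM3); // at end of function: restore the maximum cache size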

John
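
The size budget behind the assert in the snippet above can be checked with a small helper like the one below. It is only a sketch: the 32 KB total L1D size is an assumption about the C66x DSP, and L1Sram, l1d_sram_bytes and fits_in_l1d_sram are hypothetical names. The struct uses int32_t so the byte counts do not depend on the size of int.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Assumed total L1D size on the C66x DSP.
static const std::size_t kL1DTotalBytes = 32 * 1024;

// Same layout as the struct in the post above, with explicit 32-bit elements.
struct L1Sram {
    int32_t array1[4096];  // 16 KB
    int32_t array2[2048];  //  8 KB
};

// Bytes left as SRAM when cache_bytes of L1D are configured as cache.
inline std::size_t l1d_sram_bytes(std::size_t cache_bytes) {
    return kL1DTotalBytes - cache_bytes;
}

// True if an object of object_bytes fits in the SRAM left by that cache size.
inline bool fits_in_l1d_sram(std::size_t object_bytes,
                             std::size_t cache_bytes) {
    return cache_bytes <= kL1DTotalBytes &&
           object_bytes <= l1d_sram_bytes(cache_bytes);
}
```

Under that assumption, the 24 KB struct fits with an 8 KB cache but would not fit if 16 KB were kept as cache.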
