Hi,
We are working on the KeyStone II 66AK2H12 EVM, running a convolution of a 32x32 image on the DSP cores using OpenCL. We store the image and the mask in MSMC for better performance, but we do not see the improvement we expected. For comparison, we also kept the image and mask in DDR, and the execution time was about the same. Our understanding is that MSMC has a lower access time than DDR, so keeping the image and mask in MSMC should improve performance.
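For reference, this is a plain C sketch of the computation we mean by "convolution". It is not our actual kernel. The function name conv2d_ref, the float pixel type, the odd square mask width, zero padding at the borders and same-size output are all assumptions for illustration. A CPU reference like this can be used to check the DSP output.

```c
#include <stddef.h>

/* Hypothetical CPU reference: 2-D filter of a w x h image with a k x k mask
   (k assumed odd). Pixels outside the image count as zero, and the output has
   the same size as the input. The mask is applied without flipping
   (correlation), which equals true convolution for symmetric masks. */
static void conv2d_ref(const float *img, int w, int h,
                       const float *mask, int k, float *out)
{
    int r = k / 2; /* mask radius */
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int my = 0; my < k; ++my) {
                for (int mx = 0; mx < k; ++mx) {
                    int iy = y + my - r;
                    int ix = x + mx - r;
                    /* Skip taps that fall outside the image (zero padding). */
                    if (iy >= 0 && iy < h && ix >= 0 && ix < w)
                        acc += img[iy * w + ix] * mask[my * k + mx];
                }
            }
            out[y * w + x] = acc;
        }
    }
}
```

With an all-ones 3x3 image and an all-ones 3x3 mask, the centre pixel sums 9 taps, an edge pixel 6 and a corner pixel 4.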
We use the following calls to allocate the image, mask, and output buffers in MSMC:
Output = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_MSMC_TI, mem_size_A, NULL, &err);
Image = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR | CL_MEM_USE_MSMC_TI, mem_size_A, h_A, &err);
Mask = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR | CL_MEM_USE_MSMC_TI, mem_size_B, h_B, &err);
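The calls above do not show how mem_size_A and mem_size_B are computed. This is a minimal sketch, assuming float pixels and a hypothetical 3x3 mask. The 32x32 size comes from our setup, and the output is the same size as the image because Output is also created with mem_size_A. The names IMG_W, IMG_H, MASK_W, image_bytes and mask_bytes are illustrative only.

```c
#include <stddef.h>

/* Image size from our setup; MASK_W and the float type are assumptions. */
#define IMG_W 32
#define IMG_H 32
#define MASK_W 3 /* hypothetical mask width */

/* Bytes for the image buffer; Output uses the same size (mem_size_A). */
static size_t image_bytes(void)
{
    return (size_t)IMG_W * IMG_H * sizeof(float);
}

/* Bytes for the square mask buffer (mem_size_B). */
static size_t mask_bytes(void)
{
    return (size_t)MASK_W * MASK_W * sizeof(float);
}
```

Usage would be mem_size_A = image_bytes(); and mem_size_B = mask_bytes(); before the clCreateBuffer calls. Under these assumptions each 32x32 buffer is 4096 bytes.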
The measured execution times are:
execution time with CL_MEM_USE_MSMC_TI = 338 microseconds
execution time without CL_MEM_USE_MSMC_TI = 329 microseconds
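To put the two timings side by side, here is a small sketch of the relative-change arithmetic. The function name percent_change is hypothetical, and the inputs are the two timings above with DDR as the baseline.

```c
/* Relative change of `measured` with respect to `baseline`, in percent.
   A positive result means `measured` is slower than `baseline`. */
static double percent_change(double baseline, double measured)
{
    return 100.0 * (measured - baseline) / baseline;
}
```

percent_change(329.0, 338.0) is about 2.74, so the MSMC run was roughly 2.7% slower than the DDR run rather than faster.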
Is there anything else we need to do when creating MSMC buffers? Also, do you have any other suggestions for improving performance?
Thanks,
Faizan