Linux/AM5728: OpenCL matmpy performance

Part Number: AM5728

Tool/software: Linux

Hi,

Our customer is evaluating the following OpenCL sample from the Processor SDK Linux:

example-applications/opencl-examples-1.1.14.10/matmpy

They changed the array size (DIM) to see how the performance changes.
Up to DIM = 512, the DSP and CPUx2 show the same performance, but beyond that the DSP performance becomes extremely poor.
With DIM = 1024 or higher, the DSP was sometimes slower than CPUx1.

Profiling shows that, regardless of DIM, the Queue→Submit interval takes a long time and accounts for 70 to 90% of the total time.
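
For reference, this is a minimal sketch of how the Queue→Submit share can be computed from the four standard OpenCL event profiling timestamps. It assumes the timestamps were already read with clGetEventProfilingInfo on a queue created with CL_QUEUE_PROFILING_ENABLE. The struct and function names are hypothetical and not part of the matmpy example.

```cpp
#include <cassert>
#include <cstdint>

// Four timestamps (in nanoseconds) for one OpenCL command, as reported by
// clGetEventProfilingInfo with CL_PROFILING_COMMAND_QUEUED, _SUBMIT,
// _START and _END. The struct itself is a hypothetical helper.
struct EventTimes {
    std::uint64_t queued;
    std::uint64_t submit;
    std::uint64_t start;
    std::uint64_t end;
};

// Percentage of the QUEUED -> END interval that is spent between
// QUEUED and SUBMIT. Returns 0 when the whole interval has zero length.
double queue_to_submit_percent(const EventTimes& t) {
    // The timestamps are expected to be in non-decreasing order.
    assert(t.queued <= t.submit && t.submit <= t.start && t.start <= t.end);
    const std::uint64_t total = t.end - t.queued;
    if (total == 0) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(t.submit - t.queued)
                 / static_cast<double>(total);
}
```

With illustrative (not measured) values such as queued = 0, submit = 800, start = 900, end = 1000, the function returns 80, which would fall in the 70 to 90% range described above.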

Please tell me the reason for this.
Also, is there any workaround?

Best Regards,
Shigehiro Tsuda