Part Number: AM5728
Tool/software: Linux
Hi,
Our customer is evaluating the following OpenCL sample from the Processor SDK Linux:
example-applications/opencl-examples-1.1.14.10/matmpy
They varied the matrix dimension (DIM) and measured how performance changed.
Up to DIM = 512, the DSP performs about the same as CPUx2. Beyond that, DSP performance drops sharply:
with DIM = 1024 or higher, the DSP was sometimes even slower than CPUx1.
Profiling shows that the Queue→Submit phase takes 70 to 90% of the total time, regardless of DIM.
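For reference, a Queue→Submit share like this is typically derived from the four OpenCL event timestamps (CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START, _END) returned by clGetEventProfilingInfo on a command queue created with CL_QUEUE_PROFILING_ENABLE. The sketch below assumes "the total" means the QUEUED-to-END span of one kernel event. The struct and function names are hypothetical helpers, and plain uint64_t values stand in for the cl_ulong nanosecond timestamps so the arithmetic can be checked without an OpenCL runtime.

```c
#include <stdint.h>

/* Hypothetical container for one event's timestamps (nanoseconds).
 * In real code each field would be filled by clGetEventProfilingInfo
 * with CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END. */
typedef struct {
    uint64_t queued;
    uint64_t submit;
    uint64_t start;
    uint64_t end;
} kernel_timestamps;

/* Percentage of the QUEUED->END span spent between QUEUED and SUBMIT.
 * Returns 0.0 when the span is empty, avoiding a division by zero. */
static double queue_to_submit_percent(const kernel_timestamps *t)
{
    uint64_t total = t->end - t->queued;
    if (total == 0)
        return 0.0;
    return 100.0 * (double)(t->submit - t->queued) / (double)total;
}

/* Percentage of the same span spent actually executing (START->END). */
static double execution_percent(const kernel_timestamps *t)
{
    uint64_t total = t->end - t->queued;
    if (total == 0)
        return 0.0;
    return 100.0 * (double)(t->end - t->start) / (double)total;
}
```

For example, timestamps {0, 800, 900, 1000} give a Queue→Submit share of 80% and an execution share of 10%, which is the kind of split described above.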
Could you explain why this happens?
Also, is there a workaround?
Best Regards,
Shigehiro Tsuda