Hi, I am evaluating the performance of OpenCL on the 66AK platform and have run into a problem. I wrote a kernel function that simply calls an FFT function provided by TI, and I measured its execution time for different FFT sizes (numbers of points). I then compared this against running the same FFT directly on the DSP cores without OpenCL, and found that the OpenCL version is much slower.

I am very puzzled, because the function is the same and the input and output data are all allocated in L3. I hope somebody can help me. Thanks!