Hi all,

On the EVMK2H12, I used "cblas_sgemm" from the Processor-SDK to multiply large matrices. I know the interface has been optimized with OpenCL to run on the 8 DSP cores, but the result does not meet my performance requirement.

I've checked the performance for M=N=K=1000: it takes 0.027 s, the same as TI states (www.ti.com/.../linear-algebra-libraries.page). However, for large dimensions such as M=10, N=200,000, K=30, it takes 0.277 s. Maybe this kind of calculation has already reached the performance limit of the DSP cores. So I think it could be improved if the input data format were short or fp16 (half-precision floating point), which is 2 bytes long instead of 4 bytes like float.

Does anyone know whether the SDK supports sgemm with a 2-byte data format? I didn't find anything when searching the SDK manual and cblas.h. If not, does TI have any plans for this?

BTW, I have used the fp16 data format for matrix calculations on the NVIDIA platform, where it improves performance significantly (NVIDIA supports fp16 in its CUDA libraries).

Thanks very much for any help on this topic!
In reply to jianzhongxu:
Thanks for your information, jianzhong!
Everything is clear on my side now.