TDA4VMXEVM: C7x MMA performance

Part Number: TDA4VMXEVM

What is the performance of the MMA when using small 3x3 kernels on uchar64 data? I am trying to figure out how well it will work for image processing, for example running a 3x3 convolution kernel on a 1024x1024 black-and-white uchar image. From the spec, the MMA looks like a 64x64 * 64x1 matrix-vector multiply for 8-bit data, but I didn't see a way to use it efficiently for smaller kernels. At 100% utilization of the MACs I would expect 1024*1024*(3*3)/(64*64) = 2304 clock cycles.
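To make the arithmetic behind that estimate explicit, here is a minimal Python sketch of the calculation. The function name `ideal_mma_cycles` and its `utilization` parameter are my own illustrative names, not part of MMALIB or any TI API, and the model assumes one full 64x64 block of 8-bit MACs per clock with no data-movement or setup overhead.

```python
import math


def ideal_mma_cycles(width, height, k_w, k_h,
                     mma_rows=64, mma_cols=64, utilization=1.0):
    """Estimate clock cycles for a k_w x k_h convolution over an image.

    Assumption: the MMA retires mma_rows * mma_cols 8-bit MACs per cycle.
    `utilization` is a hypothetical fraction of those MACs doing useful
    work (1.0 = the ideal case from the question).
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    total_macs = width * height * k_w * k_h      # one MAC per tap per pixel
    macs_per_cycle = mma_rows * mma_cols * utilization
    return math.ceil(total_macs / macs_per_cycle)


if __name__ == "__main__":
    # Ideal case from the question: 1024x1024 image, 3x3 kernel.
    print(ideal_mma_cycles(1024, 1024, 3, 3))
```

The `utilization` argument is only there to show how quickly the estimate grows if the array cannot be kept full. For example, a purely hypothetical mapping in which only 9 of every 64 vector elements carry filter taps would be modeled as `utilization=9/64`; I don't know what the real hardware achieves.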

I saw some performance numbers in the MMALIB documentation (psdk_rtos_auto_j7_06_02_00_21/mmalib_01_01_00_00/docs/user_guide/performance_summary.html), but the FIR filter results seem to be for larger operations that fit the MMA vector width.