Hi,
I understand from SPRUIP0.pdf that the MMA has an A vector, a B matrix, and a C matrix. For 8-bit elements, A is a 64-element vector, and B and C are 64x64 matrices (ignoring the multiple instances of B and C). The calculations supported by the MMA unit are:
C_{f|b} = (A × B_f)
C_{f|b} = -(A × B_f)
C_{f|b} = C_f + (A × B_f)
C_{f|b} = C_f - (A × B_f)
My questions:
1. The product of vector A and matrix B should be a 64-element vector, so why is C a 64x64 matrix? Or does each result fill just one of its rows?
2. When I need to replace the B matrix, only 512 bits can be loaded at a time. Does that mean it has to be loaded in 64 separate transfers?
3. How do I specify which of the calculations listed above to perform?
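To make questions 1 and 2 concrete, here is a small plain-Python model of how I currently read the document. It is only a sketch of the arithmetic, not the MMA programming interface: the names vec_mat, mma_row, and negate are my own, and the idea that one A vector fills a single row of C is my assumption, which is exactly what I would like confirmed.

```python
# Plain-Python model of my reading of the MMA arithmetic (not TI's API).
N = 64           # elements per A vector for 8-bit data (from SPRUIP0)
ELEM_BITS = 8    # element width in bits
LOAD_BITS = 512  # bits transferred per B load


def vec_mat(a, b):
    """Return the 1 x N row vector a x b, where b is an N x N matrix."""
    return [sum(a[k] * b[k][j] for k in range(N)) for j in range(N)]


def mma_row(a, b, c_row=None, negate=False):
    """Hypothetical helper covering the four listed operations for one row.

    c_row is None  -> C = (A x B)      or C = -(A x B)
    c_row is given -> C = C + (A x B)  or C = C - (A x B)
    """
    prod = vec_mat(a, b)
    if negate:
        prod = [-p for p in prod]
    if c_row is None:
        return prod
    return [c + p for c, p in zip(c_row, prod)]


# Question 1: one A vector gives N results, which I assume is one row of C.
a = [1] * N
b = [[(r + c) % 3 for c in range(N)] for r in range(N)]
row = mma_row(a, b)                            # C = A x B
acc = mma_row(a, b, c_row=row)                 # C = C + A x B
zero = mma_row(a, b, c_row=row, negate=True)   # C = C - A x B

# Question 2: size of a full B matrix divided by the bits moved per load.
b_bits = N * N * ELEM_BITS
loads_for_b = b_bits // LOAD_BITS
print(len(row), loads_for_b)
```

If this model is right, a 64x64 C would hold the results of 64 different A vectors against the same B, and a full B would indeed take 64 loads of 512 bits each.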
Thanks and regards.