In the related ticket it is stated that:
"Yes your understanding is correct, MMAv1 (as part of TDA4VM device) processes 64 output channels together and performs one multiplication for 64 spatial positions per channel in single cycle.
So it means 64x64 = 4K MAC (or 8K OPs) every cycle and at 1 GHz operating frequency gives 8 TOPS."
I would like to verify the following:
What happens if only 32 output channels are required? Would two channels be processed at the same time?
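
For reference, this is how I read the arithmetic in the quoted reply, as a small Python sketch. The function name `peak_tops` is my own, and the 32-channel figure assumes the unused half of the array simply sits idle, which is an assumption on my side and exactly the point I would like confirmed.

```python
# Peak-throughput arithmetic based on the figures in the quoted reply.
# peak_tops is a hypothetical helper name, not part of any TI API.

def peak_tops(output_channels, spatial_positions, clock_hz):
    """Return peak TOPS, counting one MAC as two operations."""
    macs_per_cycle = output_channels * spatial_positions
    ops_per_cycle = 2 * macs_per_cycle  # multiply + accumulate
    return ops_per_cycle * clock_hz / 1e12

CLOCK_HZ = 1_000_000_000  # 1 GHz, as stated in the quote

# Full array: 64 output channels x 64 spatial positions per cycle.
full = peak_tops(64, 64, CLOCK_HZ)

# ASSUMPTION: with only 32 output channels the other 32 lanes are idle.
half = peak_tops(32, 64, CLOCK_HZ)

print(f"64 output channels: {full} TOPS")
print(f"32 output channels (idle-lane assumption): {half} TOPS")
```

If the hardware can instead pack additional work into the unused lanes, the second figure would not apply.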