TDA4VM: Hypothesis - You can use C7x DSP and MMA concurrently

Part Number: TDA4VM

  • Is it possible for the MMA and C7x DSP to operate simultaneously without issues, or are there limitations due to how resources are shared between these two IPs? Could you please explain?

  • Can the maximum performance for the MMA (xx TOPS) and the DSP (xx GFLOPS) be achieved concurrently during benchmarking?

  • Hello,

    The MMA is tightly coupled with the C7x core, and the two do not operate independently of each other. Because of this, performance has to be assessed case by case, depending on the algorithm implemented and the optimization techniques used. Instructions for both the C7x and the MMA are pipelined together in an execution loop.

    Best,

    Asha

  • Thank you so much, Asha. Can you please suggest a use case where they work concurrently? TI advertises DSP GFLOPS and MMA TOPS separately, so I am wondering whether we can demonstrate the advertised peak performance of both IPs simultaneously.

  • Hi,

    Let me clarify my previous post, where I described the C7x+MMA architecture as tightly coupled and not independent. Instructions that exercise the accelerator and those that do not (purely C7x instructions) map to the same functional units on the hardware. The maximum performance numbers given for the C7x and the MMA each assume access to all required functional units at the same time. In that respect, you will not be able to reach the GFLOPS metric for the C7x and the TOPS metric for the MMA simultaneously, because the compiler pipelines both kinds of instructions together.

    Best,

    Asha
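
As a rough illustration of the point above, here is a toy model in Python. It is only a sketch, not TI data or a TI tool. It assumes that every issue slot on the shared functional units goes either to an MMA-driving instruction or to a pure C7x instruction, so the two shares add up to at most 1. The function name `achieved_fractions`, the parameter `mma_share`, and the linear split are all assumptions made for illustration. Real results depend on the algorithm and on how the compiler schedules the loop.

```python
def achieved_fractions(mma_share):
    """Toy model (assumption, not a TI formula).

    mma_share: fraction of shared issue slots given to MMA-driving
               instructions, between 0.0 and 1.0.
    Returns a tuple (fraction of MMA peak, fraction of C7x peak).
    """
    if not 0.0 <= mma_share <= 1.0:
        raise ValueError("mma_share must be between 0 and 1")
    # Whatever is not spent on MMA instructions is left for pure C7x work.
    return mma_share, 1.0 - mma_share


# Sweep a few splits to show that both fractions cannot be 100% at once.
for share in (0.0, 0.25, 0.5, 0.75, 1.0):
    mma, c7x = achieved_fractions(share)
    print(f"MMA share {share:.2f}: MMA at {mma:.0%} of peak, C7x at {c7x:.0%} of peak")
```

Under this simplified assumption, pushing the MMA to its full TOPS figure leaves no slots for pure C7x floating-point work, and the reverse also holds, which is why the two advertised peaks are separate best-case numbers.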