• Resolved

TMS320C6748: Number of clock cycles for floating-point operations

Part Number: TMS320C6748


In the TMS320C6748 manual it is written that:

– 2 SP × SP → SP Per Clock

– 2 SP × SP → DP Every Two Clocks

– 2 SP × DP → DP Every Three Clocks

– 2 DP × DP → DP Every Four Clocks

But when I program in assembly, I see that MPYSP, which multiplies floating-point numbers, takes 4 clock cycles. Why, then, is it written that two single-precision multiplies can be done every clock cycle? How can I achieve that? Please help. Thanks in advance.

My requirement is to get the dot product of two floating-point arrays, where one is byte aligned and the other is not byte aligned.

With Regards


  • Hi,

    I've notified the design team. They will post their feedback directly here.

    Best Regards,



  • Guru 83140 points


    If you would like us to comment on specific excerpts from a C6748 document, please tell us the document name/number and exactly where the excerpt is found. General comments are not so easy to find when we have so many C6748 manuals and documents to search.

    You will probably get the best answer to your question by looking through the TMS320C6000 DSP Optimization Workshop material. You can download the student guide and look through it for figures and explanations of the C6000 pipeline architecture. The C6000 and C674x are complex VLIW processors with a lot of capability when programmed to take best advantage of their architecture.

    We strongly recommend that you write your program in C and use the compiler optimization switches and #pragmas to achieve your optimization goals. Writing in assembly is very difficult without a thorough understanding of the underlying architecture and the behavior of each individual instruction. The C compiler will generate very optimized code, and at the very least, you can use the assembly output of the compiler to help you learn the best way to write your own assembly code if you prefer that.

    The C674x CPU & Instruction Reference Guide has the details of the processor architecture and the information on the pipeline and usage for each instruction. For the MPYSP instruction, for example, it shows that the instruction is a "4 cycle" instruction with "3 delay slots" but has 1 cycle functional unit latency. You will learn the differences between these three terms and how to use them in the processor when you study the workshop material and the C674x CPU & Instruction Reference Guide.

    The C6748 processor can actually do two (2) MPYSP instructions per clock cycle.
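To make the difference between latency and throughput concrete, here is a rough cycle-count model in C. It is only a sketch: the function names are hypothetical, and it counts multiply issue slots only, ignoring loads, adds, and loop prolog/epilog. It assumes each multiply unit can accept a new MPYSP every cycle and that each result appears after the stated number of delay slots.

```c
#include <assert.h>

/* Cycles needed if every multiply waits for the previous result
 * before the next one is issued (no pipelining).
 * Each multiply costs 1 issue cycle plus its delay slots. */
unsigned long serial_cycles(unsigned long n, unsigned delay_slots)
{
    return n * (1UL + delay_slots);
}

/* Cycles needed if independent multiplies are issued back to back
 * on `units` multiply units (assumption: one new issue per unit per cycle).
 * Only the last issue group pays the delay slots. */
unsigned long pipelined_cycles(unsigned long n, unsigned units,
                               unsigned delay_slots)
{
    assert(units > 0);
    if (n == 0) {
        return 0;
    }
    return (n + units - 1) / units + delay_slots;
}
```

Under this model, 1000 multiplies with 3 delay slots take 4000 cycles when issued one at a time, but about 503 cycles when issued back to back on two units, which approaches the "2 per clock" figure for long loops.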

    You may also find the dot-product-specific instructions useful. These can be used from within C code through the C intrinsics documented in the C Compiler User Guide.

    But please consider writing your application in C first. Get it working functionally, then measure performance and use C-based optimizations until you get the performance your application requires.
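As a starting point for the C-first approach, a single-precision dot product might look like the sketch below. The function name is hypothetical, and no TI-specific pragmas or intrinsics are used, so it makes no alignment assumption beyond normal float alignment for either array. The `restrict` qualifiers and the two independent accumulators are there to give an optimizing compiler room to overlap the multiplies. Whether the TI compiler actually schedules two MPYSP per cycle for this loop should be checked in its assembly output.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two float arrays of length n.
 * `restrict` promises the compiler that a and b do not overlap.
 * Two accumulators split the addition chain into two independent
 * chains, so consecutive multiplies do not depend on each other. */
float dot_product(const float *restrict a, const float *restrict b, size_t n)
{
    float sum0 = 0.0f;
    float sum1 = 0.0f;
    size_t i;

    assert(n == 0 || (a != NULL && b != NULL));

    /* Main loop: two elements per iteration. */
    for (i = 0; i + 1 < n; i += 2) {
        sum0 += a[i] * b[i];
        sum1 += a[i + 1] * b[i + 1];
    }

    /* Leftover element when n is odd. */
    if (i < n) {
        sum0 += a[i] * b[i];
    }

    return sum0 + sum1;
}
```

Note that splitting the sum across two accumulators changes the order of the floating-point additions, so the result can differ from a strictly sequential sum in the last bits.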

