Part Number: TMS320C6748
The TMS320C6748 manual states:
– 2 SP × SP → SP Per Clock
– 2 SP × SP → DP Every Two Clocks
– 2 SP × DP → DP Every Three Clocks
– 2 DP × DP → DP Every Four Clocks
But when I program in assembly, I see that MPYSP, which multiplies floating-point numbers, takes 4 clock cycles. Why, then, does the manual say that two single-precision numbers can be multiplied every clock cycle? How can I achieve that? Please help. Thanks in advance.
My requirement is to compute the dot product of two floating-point arrays, where one is byte aligned and the other is not byte aligned.
Please make sure you read the forum guidelines first.
If you would like us to comment on specific excerpts from a C6748 document, please tell us the document name/number and exactly where the excerpt is found. General comments are not so easy to find when we have so many C6748 manuals and documents to search.
You will probably get the best answer to your question by looking through the TMS320C6000 DSP Optimization Workshop material. You can download the student guide and look through it for figures and explanations of the C6000 pipeline architecture. The C6000 and C674x are complex VLIW processors with a lot of capability when programmed to take best advantage of their architecture.
We strongly recommend that you write your program in C and use the compiler optimization switches and #pragmas to achieve your optimization goals. Writing in assembly is very difficult without a thorough understanding of the underlying architecture and the behavior of each individual instruction. The C compiler will generate very optimized code, and at the very least, you can use the assembly output of the compiler to help you learn the best way to write your own assembly code if you prefer that.
The C674x CPU & Instruction Reference Guide has the details of the processor architecture and the information on the pipeline and usage for each instruction. For the MPYSP instruction, for example, it shows that the instruction is a "4 cycle" instruction with "3 delay slots" but has 1 cycle functional unit latency. You will learn the differences between these three terms and how to use them in the processor when you study the workshop material and the C674x CPU & Instruction Reference Guide.
The C6748 processor can actually issue two (2) MPYSP instructions per clock cycle.
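To make the difference between latency and throughput concrete, here is a minimal sketch of the arithmetic in C. It is an idealized model, not a cycle-accurate one: it assumes every multiply is independent of the others, that a new multiply can be issued on each unit every cycle, and that nothing else stalls the pipeline. The function name `pipelined_cycles` and its parameters are made up for illustration.

```c
/* Idealized cycle count for a stream of independent pipelined multiplies.
 * Assumptions (not taken from a TI document): each unit accepts one new
 * operation per cycle, each result is ready `latency` cycles after issue,
 * and there are no stalls or dependencies between operations. */
unsigned long pipelined_cycles(unsigned long n_ops,
                               unsigned long units,
                               unsigned long latency)
{
    if (n_ops == 0 || units == 0 || latency == 0) {
        return 0; /* nothing to do, or a meaningless configuration */
    }

    /* Cycles needed just to issue all operations, rounded up. */
    unsigned long issue_cycles = (n_ops + units - 1) / units;

    /* The last operation issued still needs (latency - 1) more cycles
     * before its result is available. */
    return issue_cycles + latency - 1;
}
```

Under this model, a single MPYSP (4-cycle instruction) takes 4 cycles, but 100 independent multiplies on one unit take about 103 cycles, and about 53 cycles when spread over two units. That is roughly one or two results per clock once the pipeline is full, which is what the "per clock" figures describe.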
And you may find the Dot-product-specific instructions useful. Those can be used from within C code using the C intrinsics, found in the C Compiler User Guide.
But please consider writing your application in C first. Get it working functionally, then measure performance and use C-based optimizations until you get the performance your application requires.
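As a starting point, a plain C dot product might look like the sketch below. It is only an illustration under a few assumptions: the function name `dot_product_sp` is hypothetical, the arrays do not overlap (hence `restrict`), and the two separate accumulators are there to give the compiler independent multiply-add chains that it can schedule in parallel. Whether the compiler actually software-pipelines the loop this way depends on your optimization options, so check the generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Single-precision dot product of two non-overlapping arrays.
 * Two accumulators keep the even and odd products independent, so
 * consecutive multiplies do not have to wait on each other. */
float dot_product_sp(const float *restrict a,
                     const float *restrict b,
                     int n)
{
    assert(a != NULL && b != NULL && n >= 0);

    float sum0 = 0.0f; /* accumulates even-indexed products */
    float sum1 = 0.0f; /* accumulates odd-indexed products  */
    int i;

    for (i = 0; i + 1 < n; i += 2) {
        sum0 += a[i] * b[i];
        sum1 += a[i + 1] * b[i + 1];
    }

    /* Handle the final element when n is odd. */
    if (i < n) {
        sum0 += a[i] * b[i];
    }

    return sum0 + sum1;
}
```

Note that splitting the sum into two accumulators changes the order of the floating-point additions, so the result can differ in the last bits from a single-accumulator loop.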