I plan to use the TMS320C6678 to run an algorithm. According to the technical documentation, the TMS320C6678 can perform 256 16x16-bit fixed-point multiplies or 64 floating-point multiplies per clock cycle. My question is how to achieve this: by using particular instructions such as MPY, or by setting up the pipeline properly?
Thanks a lot.
There are several techniques you can use to optimize your code and achieve optimum performance. You can go through the C6000 DSP Optimization Guide at http://www.ti.com/lit/an/sprabf2/sprabf2.pdf for more details. There is also a workshop that TI hosts on C6000 DSP optimization. You can find details and register for the workshop at http://focus.ti.com/docs/training/catalog/events/event.jhtml?sku=4DW102260 or you can download the workshop collateral from http://processors.wiki.ti.com/index.php/TMS320C6000_DSP_Optimization_Workshop
Thank you very much for your reply. I find the document you provided quite useful. However, I think the programming optimization approaches it describes are far from enough to reach the degree of parallelism that the TMS320C6678 is capable of. I still don't know how to perform 256 multiplies in one CPU cycle. Is there an example that illustrates this?
This can only be achieved with special instructions such as CMATMPY. But I think it is just a reference figure if that instruction does not fit your application. In other words, it depends on whether you can get that much data into the proper registers before execution, whether your algorithm has dependencies between the multiplications (e.g. multiplier #0 needs the result of multiplier #1), and so on.
So, generally speaking, I don't think it is very meaningful to dig deeply into the peak multiplier capability; it is better to make the optimizations that are suitable and feasible for your target application. This is just my opinion, and further discussion is welcome.
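The dependency point can be illustrated with a small sketch in portable C. This is a host-side illustration, not C66x-specific code, and the function names are hypothetical. In the first loop every product is independent, so parallel multipliers can in principle all be kept busy. In the second, each multiply needs the previous result, so the multiplies must run one after another regardless of how many multipliers exist.

```c
#include <stdint.h>
#include <stddef.h>

/* Independent products: no iteration reads another iteration's result,
 * so the multiplies can be spread across parallel multipliers. */
void products_independent(const int16_t *a, const int16_t *b,
                          int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int32_t)a[i] * (int32_t)b[i];
}

/* Dependent chain: every multiply consumes the previous product,
 * so the multiplies are forced to execute serially.
 * The caller must keep the running product within int64_t range. */
int64_t product_chain(const int16_t *a, size_t n)
{
    int64_t acc = 1;
    for (size_t i = 0; i < n; i++)
        acc *= a[i];
    return acc;
}
```

Only code shaped like the first loop can approach the peak multiply rate quoted in the datasheet.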
How can I use instructions such as CMATMPY and FMPYSP? When the main framework is written in C, how do I insert these instructions?
Hi, please refer to my reply in another thread: http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/170468.aspx
Is there any instruction that can perform sixteen 16x16-bit signed real-valued multiplies in one clock cycle? The CMATMPY instruction you mentioned performs a complex conjugate matrix multiply, which does not fit my application. What I want to implement is the following:
s1(1)*s2(1)=d1; s1(2)*s2(2)=d2; s1(3)*s2(3)=d3; s1(4)*s2(4)=d4; s1(5)*s2(5)=d5; s1(6)*s2(6)=d6; s1(7)*s2(7)=d7; s1(8)*s2(8)=d8;
s1(9)*s2(9)=d9; s1(10)*s2(10)=d10; s1(11)*s2(11)=d11; s1(12)*s2(12)=d12; s1(13)*s2(13)=d13; s1(14)*s2(14)=d14; s1(15)*s2(15)=d15; s1(16)*s2(16)=d16;
Can the above sixteen multiplies be implemented using a SIMD instruction?
Thanks very much!
Suppose that all the data is ready before the multiplication.
s1(1) -> A16l, s1(2) -> A16h
s1(3) -> A17l, s1(4) -> A17h
s2(1) -> A18l, s2(2) -> A18h
s2(3) -> A19l, s2(4) -> A19h
s1(9) -> B16l, s1(10) -> B16h
s1(11) -> B17l, s1(12) -> B17h
s2(9) -> B18l, s2(10) -> B18h
s2(11) -> B19l, s2(12) -> B19h
Then execute the MPY operation as:
So I'm afraid you need 2 cycles to complete the calculation using the DMPY2 instruction.
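To make the cycle count concrete, here is a minimal host-side sketch in portable C. It is not C66x code. It models a four-lane packed 16x16 multiply in the role DMPY2 plays above, using the same packing as the register mapping (the lower-numbered element goes in the low half). The function names `pack4`, `model_quad_mpy16`, and `mpy16x16_sixteen` are hypothetical, and the assumption that the A-side and B-side multipliers each issue one such operation per cycle is taken from the 2-cycle estimate above, not from measurement.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar reference: d[i] = s1[i] * s2[i], signed 16x16 -> 32 bits. */
void mpy16_ref(const int16_t *s1, const int16_t *s2, int32_t *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = (int32_t)s1[i] * (int32_t)s2[i];
}

/* Pack four 16-bit values into 64 bits, element 0 in the lowest 16 bits
 * (mirrors s1(1) -> A16l, s1(2) -> A16h, s1(3) -> A17l, s1(4) -> A17h). */
uint64_t pack4(const int16_t *v)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++)
        r |= (uint64_t)(uint16_t)v[lane] << (16 * lane);
    return r;
}

/* Hypothetical model of one packed operation: four signed 16x16
 * multiplies, four 32-bit results. This is NOT a TI intrinsic. */
void model_quad_mpy16(uint64_t a, uint64_t b, int32_t out[4])
{
    for (int lane = 0; lane < 4; lane++) {
        int16_t x = (int16_t)(uint16_t)(a >> (16 * lane));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * lane));
        out[lane] = (int32_t)x * (int32_t)y;
    }
}

/* Sixteen element-wise products built from quad operations.
 * Returns the number of quad operations that were needed. */
int mpy16x16_sixteen(const int16_t s1[16], const int16_t s2[16],
                     int32_t d[16])
{
    int ops = 0;
    for (int i = 0; i < 16; i += 4) {
        model_quad_mpy16(pack4(&s1[i]), pack4(&s2[i]), &d[i]);
        ops++;
    }
    return ops;
}
```

Sixteen products take four quad operations, so at two operations per cycle the total is the two cycles mentioned above. On the device itself you would use the compiler's intrinsics, or let the compiler vectorize the scalar loop, and then check the generated assembly.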