    How to parallelly utilize DSP's MAC resources to do multiplications in one cycle?

    This question is not answered
    Yang Lu99085
    Posted by Yang Lu99085
    on Mar 06 2012 05:01 AM
    Intellectual590 points

    Hello,

    I plan to use the TMS320C6678 to run an algorithm. According to the technical documentation, the TMS320C6678 can perform 256 16x16-bit fixed-point multiplies or 64 floating-point multiplies per clock cycle. My question is: how do I achieve this? By using a particular instruction such as MPY, or by setting up the pipeline properly?

    Thanks a lot.

    All Replies
    • Uday
      Posted by Uday
      on Mar 07 2012 09:44 AM
      Expert3730 points

      Hello Yang,

      There are several techniques you can use to optimize your code and achieve optimum performance. You can go through the C6000 DSP Optimization Guide at http://www.ti.com/lit/an/sprabf2/sprabf2.pdf for more details. There is also a workshop that TI hosts on C6000 DSP optimization. You can find details and register for the workshop at http://focus.ti.com/docs/training/catalog/events/event.jhtml?sku=4DW102260 or you can download the workshop collateral from http://processors.wiki.ti.com/index.php/TMS320C6000_DSP_Optimization_Workshop

      - Uday

    • Yang Lu99085
      Posted by Yang Lu99085
      on Mar 10 2012 01:09 AM
      Intellectual590 points

      Hello,

      Thank you very much for your reply. I find the document you provided quite useful. However, I think the programming optimization approaches it describes are far from enough on their own to reach the degree of parallelism the TMS320C6678 is capable of. I still don't know how to perform 256 multiplications in one CPU cycle. Is there an example that illustrates this?

      Best regards,

      Yang

    • Allen Lee
      Posted by Allen Lee
      on Mar 12 2012 02:18 AM
      Genius3500 points

      Hi Yang,

      This can only be achieved through special instructions such as CMATMPY. However, I think the number is just a reference figure if that instruction doesn't fit your application. In other words, it depends on whether you can get that much data into the proper registers before execution, whether your algorithm has dependencies between the multiplications (e.g. multiplier #0 needs the result of multiplier #1), and so on.

      So generally speaking, I don't think it is very meaningful to dig deeply into the peak multiplier capability; it is better to pursue optimizations that are suitable and feasible for your target application. This is just my opinion; further discussion is welcome.

      Allen
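
As a rough illustration of where the 256 figure in the question comes from, the sketch below breaks the peak number down per core. The breakdown is an assumption that is not stated in this thread: 8 C66x cores, 2 .M units per core, and at most 16 16x16 multiplies per .M unit per cycle when a CMATMPY-class instruction is issued. The function names are hypothetical.

```c
/* Assumed C6678 figures (not taken from this thread):
 * 8 cores, 2 .M units per core, and up to 16 16x16 multiplies per
 * .M unit per cycle for a CMATMPY-class instruction. */
enum { CORES = 8, M_UNITS_PER_CORE = 2, MPY16_PER_M_UNIT = 16 };

/* Peak 16x16 multiplies per cycle on a single core. */
static int peak_mpy16_per_core(void)
{
    return M_UNITS_PER_CORE * MPY16_PER_M_UNIT;
}

/* Peak 16x16 multiplies per cycle summed over all cores. */
static int peak_mpy16_per_device(void)
{
    return CORES * peak_mpy16_per_core();
}
```

Under these assumptions, the headline number is a device-wide total: a single core tops out well below it, and only when every .M unit issues the densest multiply instruction on every cycle.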

    • may may92122
      Posted by may may92122
      on Mar 12 2012 22:57 PM
      Expert1030 points

      Hi Allen,

      How can I use instructions such as CMATMPY and FMPYSP? When the main framework is written in C, how do I insert these instructions?

      Thanks,

      May
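
One common route from C is a compiler intrinsic. The sketch below is not taken from the linked thread. It swaps in the simpler MPY2 instruction, through the TI compiler's `_mpy2ll` intrinsic, instead of the CMATMPY or FMPYSP asked about, and it assumes that the `_TMS320C6600` macro is predefined when building for C66x. The helper names `pack2` and `mpy2_pair` are hypothetical. A portable fallback keeps the code buildable on a host compiler.

```c
#include <stdint.h>

#if defined(_TMS320C6600)
#include <c6x.h>   /* TI intrinsic declarations (assumed target build) */
#endif

/* Pack two signed 16-bit values into one 32-bit word (hi:lo). */
static uint32_t pack2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Two signed 16x16 multiplies on packed operands:
 * out[0] = lo16(a) * lo16(b), out[1] = hi16(a) * hi16(b). */
static void mpy2_pair(uint32_t a, uint32_t b, int32_t out[2])
{
#if defined(_TMS320C6600)
    /* Assumption: the TI compiler maps _mpy2ll to one MPY2 instruction. */
    long long p = _mpy2ll((int)a, (int)b);
    out[0] = (int32_t)_loll(p);
    out[1] = (int32_t)_hill(p);
#else
    /* Portable fallback with the same lane-wise semantics. */
    out[0] = (int32_t)(int16_t)(a & 0xFFFFu) * (int32_t)(int16_t)(b & 0xFFFFu);
    out[1] = (int32_t)(int16_t)(a >> 16) * (int32_t)(int16_t)(b >> 16);
#endif
}
```

The same pattern should carry over to wider instructions, but the exact intrinsic names and operand types need to be checked against the compiler documentation for the target.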

    • Allen Lee
      Posted by Allen Lee
      on Mar 12 2012 23:26 PM
      Genius3500 points

      Hi, please refer to my reply in another thread: http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/170468.aspx

    • Yang Lu99085
      Posted by Yang Lu99085
      on Mar 28 2012 01:52 AM
      Intellectual590 points

      Hi Allen,

      Is there any instruction that can perform sixteen 16x16-bit signed real-valued multiplications per clock cycle? The CMATMPY instruction you mentioned performs a complex conjugate matrix multiply, which does not fit my application. What I want to implement is the following:

      s1(1)*s2(1)=d1;  s1(2)*s2(2)=d2;  s1(3)*s2(3)=d3;  s1(4)*s2(4)=d4;  s1(5)*s2(5)=d5;  s1(6)*s2(6)=d6;  s1(7)*s2(7)=d7;  s1(8)*s2(8)=d8;

      s1(9)*s2(9)=d9;  s1(10)*s2(10)=d10;  s1(11)*s2(11)=d11;  s1(12)*s2(12)=d12;  s1(13)*s2(13)=d13;  s1(14)*s2(14)=d14;  s1(15)*s2(15)=d15;  s1(16)*s2(16)=d16; 

      Can these sixteen multiplications be implemented with a SIMD instruction?

      Thanks very much!

      Sincerely,

      Yang

    • Allen Lee
      Posted by Allen Lee
      on Mar 28 2012 04:14 AM
      Genius3500 points

      Hi Yang,

      Suppose that all the data is ready before the multiplication:

      s1(1) -> A16l, s1(2) -> A16h

      s1(3) -> A17l, s1(4) -> A17h

      s2(1) -> A18l, s2(2) -> A18h

      s2(3) -> A19l, s2(4) -> A19h

      s1(9) -> B16l, s1(10) -> B16h

      s1(11) -> B17l, s1(12) -> B17h

      s2(9) -> B18l, s2(10) -> B18h

      s2(11) -> B19l, s2(12) -> B19h

      ……

      Then execute the multiplies as two execute packets:

         DMPY2 A17:A16,A19:A18,A23:A22:A21:A20
      || DMPY2 B17:B16,B19:B18,B23:B22:B21:B20

         DMPY2 A25:A24,A27:A26,A31:A30:A29:A28
      || DMPY2 B25:B24,B27:B26,B31:B30:B29:B28

      So I'm afraid you need two cycles to complete the calculation using the DMPY2 instruction.

       

      Allen
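
The schedule above can be modeled in portable C to check the arithmetic. This is a minimal sketch, not TI code: `dmpy2_model` and `mul16_elementwise` are hypothetical names, and each DMPY2 is assumed to produce four signed 16x16 products as 32-bit results. As in the register assignment above, the first half of the data goes to the A side (.M1) and the second half to the B side (.M2).

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 4 };  /* 16-bit lanes per DMPY2 (assumed) */

/* Model of one DMPY2: four lane-wise signed 16x16 -> 32-bit products. */
static void dmpy2_model(const int16_t *a, const int16_t *b, int32_t *d)
{
    for (int i = 0; i < LANES; ++i)
        d[i] = (int32_t)a[i] * (int32_t)b[i];
}

/* Element-wise d[i] = s1[i] * s2[i] for n values, n a multiple of 8.
 * Returns the number of execute packets issued, counting one DMPY2 on
 * .M1 in parallel with one on .M2 per packet. */
static unsigned mul16_elementwise(const int16_t *s1, const int16_t *s2,
                                  int32_t *d, size_t n)
{
    size_t half = n / 2;
    unsigned packets = 0;
    for (size_t i = 0; i < half; i += LANES) {
        dmpy2_model(s1 + i, s2 + i, d + i);                      /* .M1 */
        dmpy2_model(s1 + half + i, s2 + half + i, d + half + i); /* .M2 */
        ++packets;
    }
    return packets;
}
```

For n = 16 the model issues two packets, which matches the two cycles quoted above. It counts issue slots only and does not model multiplier pipeline latency or the loads needed to fill the registers.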
