    Data in register

    This question is not answered
    Arun (Intellectual, 840 points)
    Posted on Apr 28 2012 00:02 AM

    Hello,

    I am asking my question again because I have not yet received a satisfying solution. I am using the C6678 and have run matrix-matrix multiplication on it in both single and double precision. I have tried keeping the data in L1, L2, MSMC and DDR3 using CCS and RTSC. Now I want to measure performance and count cycles with my data placed as close to the core as possible, that is, in registers and nothing else. I am not talking about large matrices; I am only interested in the smallest sizes, say 2 by 2 or even 3 by 3.

    Do I need to write linear assembly for that? I am not sure. I have C code for matrix-matrix multiplication that I wrote myself. Can anybody please help me with this? I have been looking for a solution but have not found a satisfying one. Please provide some links (please don't provide RTSC links) or documents if you don't have a solution. I want to put my data in registers and compute the matrix multiplication there.

    Hope I will get some solutions this time.

    Thanks and Regards,
    Arun 

    All Replies
    • one and zero (Expert, 6825 points)
      Posted on Apr 30 2012 01:00 AM

      Hi Arun,

      C has the register keyword. The register storage-class specifier asks the compiler to keep the variable being declared in a CPU register (if possible) to speed up access. For example:

      register int i;

      Kind regards,

      one and zero

       

      Please click the Verify Answer button on this post if it answers your question.

      You can also follow me on Twitter: http://twitter.com/oneandzeroTI

      Do you want to read interesting multicore articles? Check out our Multicore Mix

       

    • Arun (Intellectual, 840 points)
      Posted on Apr 30 2012 01:25 AM

      Hello One and Zero!

      I thought nobody would reply to this question. People told me it is not possible, or that we can only keep data in registers for a short time, but I wonder why; I would think we can keep data there for as long as we want. Anyway,

      Many thanks for your reply. Actually, I am not sure how to do that. Do you know of any examples where this has been done, so I can have a look and understand? The C6678 DSP has two sides, A and B, with 32 registers on each side. With CCS, for L1, L2 and the other memories, we can see in the auto-generated linker file which part has been loaded where. Will I be able to see the same thing for registers?

      Thanks and Regards,
      Arun 

    • HRi (Mastermind, 7270 points)
      Posted on Apr 30 2012 01:43 AM

      Hi Arun,

      As you know, the C66x has 64 32-bit registers. The best way to use them is through linear assembly or assembly.

      BR,

      HR

    • one and zero (Expert, 6825 points)
      Posted on Apr 30 2012 01:55 AM

      Hi Arun,

      The register keyword does not let you control which registers are used, nor does it guarantee that the compiler actually uses a register. It only tells the compiler that you think the variable should be kept in a register, and the compiler will try to allocate one. So it is a recommendation to the compiler to use a register.

      The usage is straightforward: just put the register keyword in front of your variable declaration. See also:

      http://www.geeksforgeeks.org/archives/4346

      In case you want to manually control the register allocation you have to go to assembly.

      Kind regards,

      one and zero

       

    • Arun (Intellectual, 840 points)
      Posted on Apr 30 2012 11:07 AM

      Hello One and Zero,

      Thanks for your reply. Yes, I was aware that there are 64 32-bit registers in total across sides A and B. I also had doubts about the register keyword in C code. However, I am not interested in controlling which registers are used; I only want to make sure that every calculation happens in registers. I read somewhere that I need to use linear assembly for that.

      Anyway, can you provide some example assembly code? I have no prior experience with it. I know the approach, but I have never written any assembly.

      Thanks and regards,
      Arun 

    • one and zero (Expert, 6825 points)
      Posted on May 02 2012 03:31 AM

      Hi Arun,

      I'd recommend staying with C, since our compiler does an excellent job of optimization. You can also do a lot at the C level to optimize your code so that it fits the C6000 architecture best. Please have a look at the TMS320C6000 Programmer’s Guide.

      In case you want to educate yourself in linear assembly, Chapter 5 of the Programmer's Guide will be helpful; it also shows code examples.

      Kind regards,

      one and zero

       

    • one and zero (Expert, 6825 points)
      Posted on May 02 2012 07:35 AM

      ... I forgot to mention the very useful application report Hand-Tuning Loops and Control Code on the TMS320C6000.

      It is a bit old, covers older compiler versions, and only goes up to the C64x+ architecture, but the fundamentals and principles still apply today, including on C66x.

      Kind regards,

      one and zero

       

    • Arun (Intellectual, 840 points)
      Posted on May 03 2012 01:14 AM

      Hello One and Zero,

      Thank you very much for all these links and the knowledge. I would also like to stick with C only. The problem is that I want to access and work with registers and, as far as I have read and understood, dealing with registers requires moving to linear assembly. As I have never written any linear assembly before, I am hesitant, but I do not see any other option.

      Yes, I am working on optimization. I am also working through the new paper TI has published on SGEMM and DGEMM on the C6678, and I am trying to optimize the SGEMM kernel from it. Let's see how far I can get. I will keep you guys busy with my questions.

      Thank You very much for all your support and help! I appreciate it. 

    • one and zero (Expert, 6825 points)
      Posted on May 03 2012 02:18 AM

      Hi Arun,

      I'm sure you're interested in the paper Unleashing DSPs for General-Purpose HPC, which describes how to implement GEMM on a C6678 in C using intrinsics.

      I hope that helps and gives you some more ideas  ...

      Kind regards,

      one and zero

       

    • Arun (Intellectual, 840 points)
      Posted on May 03 2012 11:11 AM

      Hello One and Zero,

      Yes, that is the paper I was talking about. I have already read it and am working on it. Since you have mentioned it, let me ask a couple of questions about it.

      1. I have one major doubt about the kernel. Why do we need the kernel code? Can we not write our own simple matrix-matrix multiplication code, optimize and parallelize it, and then change the memory locations based on the chunks we create and send, in whatever way we want to do the multiplication? The kernel code is already quite hard to understand.

      2. We all know there is an onboard emulator on the C6678, which is very slow. So I think we need an external emulator to achieve the results mentioned in the paper. Based on what I understood from the paper, I tried using the same kernel code and optimizing it, but unfortunately I got very poor results, about 1% or 2% of what they report. I know I have not understood it properly, but still.

      3. They also say nothing about registers, which is what I am most interested in right now. I want to start from the very first level, then move on to the next memory level and see the difference.

      Thanks and Regards,
      Arun 

    • one and zero (Expert, 6825 points)
      Posted on May 04 2012 07:48 AM

      Hi Arun,

      1. I'm not quite sure what your question is. Of course you could write your own kernel.

      2. The benchmarking result does not depend on the emulator you're using.

      3. If you want to look at a real linear (or serial) assembly implementation, look in DSPLIB: there is an FFT implemented that way (DSPF_sp_fftSPxSP.sa).

      Kind regards,

      one and zero

       

    • Arun (Intellectual, 840 points)
      Posted on May 05 2012 00:25 AM

      Hello One and Zero,

      Thanks for your reply.

      1. What I mean about the kernel is: can't I write my own code in C, parallelize it, optimize it, and then change or configure the memory accordingly? Do I really need something like a kernel?

      2. I installed DSPLIB for Linux, went to the installation folder, and looked under packages and then src, where there are some code examples. In the folder you mentioned there is nothing with a .sa extension, and I did not find any linear assembly; it is all C code. Could you attach the folder for me? I would appreciate your help.

      I am trying hard to understand Linear Assembly for C6678.

      Thanks and regards,
      Arun 
