    Help running simple executable on F28069 Piccolo controlSTICK

    This question is answered
    Stephen Moore
    Posted by Stephen Moore
    on Mar 05 2012 12:21 PM
    Prodigy250 points

    I want to benchmark the F28069 Piccolo with this simple floating-point [5x5] matrix multiplication. This code is copied from the TI benchmark sample and modified for [5x5] matrix size.

    The code apparently runs for 100 iterations, but freezes at 1000 iterations and over. I would like to perform 100,000 loops.

    #include <stdio.h>
    #include <math.h>

    void main(void) {
        /* j is long: int is only 16 bits on the C28x, so an int counter cannot reach 100000 */
        long j;
        int m, n, p;

        float m3[5][5] = { {0.0 , 0.0 , 0.0 , 0.0 , 0.0},{0.0 , 0.0 , 0.0 , 0.0 , 0.0},{0.0 , 0.0 , 0.0 , 0.0 , 0.0},{0.0 , 0.0 , 0.0 , 0.0 , 0.0},{0.0 , 0.0 , 0.0 , 0.0 , 0.0} };
        const float m1[5][5] = { {0.0001, 0.001, 0.01, 0.1, 1},{0.001, 0.01, 0.1, 1, 10},{0.01, 0.1, 1, 10, 100},{0.1, 1.0, 10, 100, 1000},{1, 10, 100, 1000, 10000} };
        const float m2[5][5] = { {0.0001, 0.001, 0.01, 0.1, 1},{0.001, 0.01, 0.1, 1, 10},{0.01, 0.1, 1, 10, 100},{0.1, 1.0, 10, 100, 1000},{1, 10, 100, 1000, 10000} };

        printf("Benchmark Program \n");
        printf("Starting \n");

        /* Repeat the 5x5 multiply m3 = m1 * m2 */
        for(j = 0; j < 100000; j++) {
            for(m = 0; m < 5; m++) {
                for(p = 0; p < 5; p++) {
                    m3[m][p] = 0;
                    for(n = 0; n < 5; n++) {
                        m3[m][p] += m1[m][n] * m2[n][p];
                    }
                }
            }
        }

        printf("Ending \n");
    }

    Screenshot attached. Any comments?

    All Replies
    • Stephen Moore
      Posted by Stephen Moore
      on Mar 12 2012 09:09 AM
      Prodigy250 points

      Thanks again. What about this "make" problem? I get the compiler error in some circumstances. In other circumstances, I'll get this Make error:

      After two weeks of this, I'm either going to find some support or abandon the effort. My other candidates (STM32F4/Keil and LCP78xx/CodeRed) have been running real-time state-space system simulations for a week. I can't even get the Piccolo to blink an LED now.

    • Trey German
      Posted by Trey German
      on Mar 12 2012 09:41 AM
      Genius14510 points

      Once you set up the project to work with the compiler version you have, you will not get the managed make error.  Also, it looks like you didn't put this example in the right place, which will prevent it from finding all the support files it needs (that's why all the files have "!" icons).  This project needs to reside in c:\ti\controlsuite\device_support\f2806x\version\f2806x_examples\.

      I understand your frustration.  I've been working with CCS for years now, so it's all second nature to me.  The reason CCS is so complex is that it has to support so many different architectures.  Everything from the smallest MSP430 to the biggest C6000 multicore DSP is programmed through CCS, so everything has to be very extensible, which adds a lot of complexity.

      Trey

      Trey German

      C2000 Applications

      If a post answers your question, please mark it with the "verify answer" button.
      Visit these helpful C2000 Links!
      C2000 TI Wiki Pages
      TI Forum Sitemap
      ControlSUITE
      C2000 Getting Started
      CLA FAQs
      Workshop Material!
    • Stephen Moore
      Posted by Stephen Moore
      on Mar 12 2012 10:13 AM
      Prodigy250 points

      OK, I made a totally fresh installation of ControlSUITE and CCS. I went through the registry and hard drive and cleaned out all of the TI garbage from previous installations and rebooted a couple times. I installed ControlSuite and CCS, opened CCS and followed instructions to the letter. See screenshots of green "checkmarks" in the TI Resource Explorer. It imports the example project, builds the project, and sets up the debugger (Steps 1-3). I left the properties dialog on the screen, set to the correct hardware and USB emulator, and also the compiler version. It throws 27 errors and does not produce an output file to proceed to Step 4. Notice that I have not performed any independent actions here, I'm allowing the wizard to guide the process of loading a demonstration example.

    • Trey German
      Posted by Trey German
      on Mar 12 2012 10:23 AM
      Genius14510 points

      Stephen,

      Sorry to hear you're still having trouble.  The Resource Explorer project import is designed to make things very easy for users, but I guess that hasn't been true for you.  Would you mind posting the errors?  I suspect an issue with linked resources in the project.

      The way our controlSUITE software is architected right now, if the right versions of everything aren't installed the projects break.  We are fully aware of this issue and the frustration it causes new users and we are developing a new controlSUITE structure which will fix many of these problems.  Later this year we will be releasing this new controlSUITE architecture and problems like this won't happen any more.

      Trey

    • Stephen Moore
      Posted by Stephen Moore
      on Mar 12 2012 12:26 PM
      Prodigy250 points

      Yahoo. With a little (more) perseverance, I got the program running again. The compiler was adjusted to v6.0.2. Interestingly, it won't compile for FLASH, but it will run from RAM. The run took 47 seconds (same as before). Suspecting that the FPU was not being used, I looked at the runtime support library under Properties; it was set to rts2800_ml.lib. I changed this to rts2800_fpu32.lib, and now it won't compile anymore. Screenshots and error log attached for the rts2800_fpu32.lib build.

      [code]<Linking>
      "../F2806x_RAM_BlinkingLED.CMD", line 49: error: BEGIN memory range has already
         been specified
      "../F2806x_RAM_BlinkingLED.CMD", line 49: error: BEGIN memory range overlaps
         existing memory range BEGIN
      "../F2806x_RAM_BlinkingLED.CMD", line 51: error: RAMM0 memory range has already
         been specified
      "../F2806x_RAM_BlinkingLED.CMD", line 51: error: RAMM0 memory range overlaps
         existing memory range RAMM0
      "../F2806x_RAM_BlinkingLED.CMD", line 52: error: progRAM memory range overlaps
         existing memory range RAML0_L3
      >> Compilation failure
      "../F2806x_RAM_BlinkingLED.CMD", line 54: error: FPUTABLES memory range has
         already been specified
      "../F2806x_RAM_BlinkingLED.CMD", line 54: error: FPUTABLES memory range
         overlaps existing memory range FPUTABLES
      "../F2806x_RAM_BlinkingLED.CMD", line 55: error: IQTABLES memory range has
         already been specified
      "../F2806x_RAM_BlinkingLED.CMD", line 55: error: IQTABLES memory range overlaps
         existing memory range IQTABLES
      "../F2806x_RAM_BlinkingLED.CMD", line 56: error: IQTABLES2 memory range has
         already been specified
      "../F2806x_RAM_BlinkingLED.CMD", line 56: error: IQTABLES2 memory range
         overlaps existing memory range IQTABLES2
      "../F2806x_RAM_BlinkingLED.CMD", line 57: error: IQTABLES3 memory range has
         already been specified
      "../F2806x_RAM_BlinkingLED.CMD", line 57: error: IQTABLES3 memory range
         overlaps existing memory range IQTABLES3
      "../F2806x_RAM_BlinkingLED.CMD", line 59: error: BOOTROM memory range has
         already been specified
      "../F2806x_RAM_BlinkingLED.CMD", line 59: error: BOOTROM memory range overlaps
         existing memory range BOOTROM
      "../F2806x_RAM_BlinkingLED.CMD", line 61: error: RESET memory range has already
         been specified
      "../F2806x_RAM_BlinkingLED.CMD", line 61: error: RESET memory range overlaps
         existing memory range RESET
      "../F2806x_RAM_BlinkingLED.CMD", line 66: error: RAMM1 memory range has already
         been specified
      "../F2806x_RAM_BlinkingLED.CMD", line 66: error: RAMM1 memory range overlaps
         existing memory range RAMM1
      error #10010: errors encountered during linking; "BlinkingLED.out" not built
      gmake: *** [BlinkingLED.out] Error 1
      gmake: Target `all' not remade because of errors.

      **** Build Finished ****[/code]

    • Trey German
      Posted by Trey German
      on Mar 12 2012 12:48 PM
      Verified Answer
      Verified by Stephen Moore
      Genius14510 points

      CCSv5 does some weird things that are meant to make running projects easier, but in some cases they break things.  What happened here is that when you imported the CCSv4 project into CCSv5, it automatically added a linker command file for the 06x device you are using, but your project already had a linker command file.  The two files define the same memory ranges, which is why the linker is complaining about overlaps.  To fix this, either remove the F2806x_RAM_BlinkingLED.cmd file from the project or remove the F2806x_ram_lnk.cmd file in the build properties.  Your switch to the FPU run-time support library had nothing to do with the above errors.

      Also, I believe switching to the FPU run time support library ought to solve the speed issue.

      Regards,

      Trey

    • Stephen Moore
      Posted by Stephen Moore
      on Mar 12 2012 15:28 PM
      Prodigy250 points

      That was the final bit. Thanks for your continued responsiveness.

      2.103 seconds for 5.94 MFLOPS. Does that sound consistent with design capability?

      I'm a little nervous about the TI CCS, but at least we now understand the hardware capabilities.
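
For reference, the 5.94 MFLOPS figure can be reproduced from the loop counts in the first post. The sketch below assumes that each multiply-accumulate in the inner loop is counted as one floating-point operation; that counting convention is inferred from the numbers, not stated in the thread, and counting the multiply and the add separately doubles the result. The function names are hypothetical.

```c
/* Multiply-accumulates executed by the benchmark:
   outer iterations x (dim x dim output elements) x dim terms per element. */
double mac_count(long iterations, int dim)
{
    return (double)iterations * dim * dim * dim;
}

/* Millions of operations per second for a measured run time in seconds. */
double mega_ops_per_second(double ops, double seconds)
{
    return ops / seconds / 1.0e6;
}
```

With the values from the thread, `mega_ops_per_second(mac_count(100000L, 5), 2.103)` comes out at about 5.94, and passing `2.0 * mac_count(100000L, 5)` instead gives about 11.89.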

    • Trey German
      Posted by Trey German
      on Mar 13 2012 10:26 AM
      Genius14510 points

      Stephen,

      The core is capable of much more than 5.94 MFLOPS.  If you hand-coded the assembly, you could theoretically reach 160 MFLOPS, since we have a single-cycle parallel multiply-and-add instruction.  That said, MFLOPS is more of a marketing number, because it depends heavily on how the code is written: assembly, C, unrolled loops, optimizations, etc.  Your question has spurred some internal discussion among the floating-point experts, and I expect they will reply to this post soon.

      Regards,
      Trey
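
To illustrate how much the way the code is written can matter, the inner loop of the benchmark from the first post can be unrolled by hand and accumulated in a local variable instead of writing through `m3[m][p]` on every term. This is only a sketch in portable C with hypothetical function names; whether it is actually faster on the C28x depends on the compiler version and options, which this thread does not establish.

```c
#define DIM 5

/* Reference version: the same triple loop as the benchmark in the first post. */
void matmul_loop(const float a[DIM][DIM], const float b[DIM][DIM],
                 float c[DIM][DIM])
{
    int m, n, p;
    for (m = 0; m < DIM; m++) {
        for (p = 0; p < DIM; p++) {
            c[m][p] = 0.0f;
            for (n = 0; n < DIM; n++) {
                c[m][p] += a[m][n] * b[n][p];
            }
        }
    }
}

/* Hand-unrolled version: the five terms of each dot product are written out
   and summed in a local accumulator, so there is no inner loop counter and
   only one store to c[m][p] per output element. */
void matmul_unrolled(const float a[DIM][DIM], const float b[DIM][DIM],
                     float c[DIM][DIM])
{
    int m, p;
    for (m = 0; m < DIM; m++) {
        for (p = 0; p < DIM; p++) {
            float acc = a[m][0] * b[0][p];
            acc += a[m][1] * b[1][p];
            acc += a[m][2] * b[2][p];
            acc += a[m][3] * b[3][p];
            acc += a[m][4] * b[4][p];
            c[m][p] = acc;
        }
    }
}
```

Both functions add the terms in the same order, so they should produce the same result; timing each one at the same optimization level is the only way to know which is faster on a given toolchain.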

    • Lori Heustess
      Posted by Lori Heustess
      on Mar 13 2012 11:13 AM
      Guru50925 points

      Stephen,

      I suspect the compiler is not doing as well as it could.  Here are a few things to try out.

      Understand that as the compiler generates more optimal code, there is a tradeoff with debug capability.  When you start out, you may want the most debug capability available.  In this case the compiler options will likely be limited to -g and -mt.  You would then increase optimization from there.

      1. Start with -g -mt (symbolic debug + unified memory).  Both can be found on the basic options tab of the project options (in CCS 5).
      2. Next you can add -mn (optimize with debug).  This is on the runtime model tab.  It re-enables some optimizations that -g disabled but still allows you to debug fairly well.
      3. The next step would be to turn on some optimization level.  -o2 is often a good balance.  This can be found on the basic options tab.
      4. Next you could try -o3 or -o4 optimization.  These may or may not improve the benchmark.
      5. Finally you can remove -g.  This can severely limit debug capability, so it is often done only on a particular file with code you need highly optimized.

      There are some more details of these tips on this wiki page:

      http://processors.wiki.ti.com/index.php/C28x_Code_Generation_Tips_and_Tricks#Optimization

      Regards

      Lori
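
When stepping through the optimization levels above, it helps to time the benchmark in a way that also consumes its result, so that a higher optimization level cannot simply discard the arithmetic. The sketch below is one way to do that in standard C. The function names are hypothetical, and the use of `clock()` is an assumption: on the target, its availability and resolution depend on the runtime and debugger setup, and a hardware timer may be a better choice.

```c
#include <time.h>

#define DIM 5

/* Runs the 5x5 multiply from the first post `iterations` times and returns
   the sum of the result matrix, so the work has an observable effect.
   This guards against the computation being dropped entirely; it does not
   stop a compiler from noticing that every iteration is identical. */
float run_benchmark(long iterations)
{
    /* m1 and m2 are identical in the original post, so one table is reused. */
    static const float m1[DIM][DIM] = {
        {0.0001f, 0.001f, 0.01f, 0.1f, 1.0f},
        {0.001f, 0.01f, 0.1f, 1.0f, 10.0f},
        {0.01f, 0.1f, 1.0f, 10.0f, 100.0f},
        {0.1f, 1.0f, 10.0f, 100.0f, 1000.0f},
        {1.0f, 10.0f, 100.0f, 1000.0f, 10000.0f}
    };
    float m3[DIM][DIM] = { { 0.0f } };
    float checksum = 0.0f;
    long j;   /* long, because int is 16 bits on the C28x */
    int m, n, p;

    for (j = 0; j < iterations; j++) {
        for (m = 0; m < DIM; m++) {
            for (p = 0; p < DIM; p++) {
                m3[m][p] = 0.0f;
                for (n = 0; n < DIM; n++) {
                    m3[m][p] += m1[m][n] * m1[n][p];
                }
            }
        }
    }

    for (m = 0; m < DIM; m++) {
        for (p = 0; p < DIM; p++) {
            checksum += m3[m][p];
        }
    }
    return checksum;
}

/* Returns the elapsed processor time in seconds and stores the checksum. */
double time_benchmark(long iterations, float *checksum_out)
{
    clock_t start = clock();
    *checksum_out = run_benchmark(iterations);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Printing the checksum alongside the time for each build also gives a quick sanity check that the different optimization levels still compute the same result.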

    • John Connor
      Posted by John Connor
      on Apr 06 2012 13:09 PM
      Prodigy165 points

      Stephen Moore wrote:

      STM32F4 (168MHz and FPU)         8.6 seconds
      NXP mbed LPC1768 (96MHz)        16.2 seconds
      LPCXpresso LPC1769 (120MHz)     19.4 seconds
      Piccolo F28069 (80MHz and FPU)  43.7 seconds

      After an unsuccessful attempt to run the CoreMark benchmark on the C2000 (CoreMark doesn't like the lack of an 8-bit data type on the C2000), I tried to run the code from the first post. Here are my results:

      1. Code in Flash - default waitstates
      -O0 - 85.149 seconds
      -O2 - 51.782 seconds
      -O4 - 51.782 seconds

      2. Code in Flash - minimum waitstates
      -O0 - 11.594 seconds
      -O2 - 5.603 seconds
      -O4 - 5.602 seconds

      3. Code in SRAM
      -O0 - 11.241 seconds
      -O2 - 5.414 seconds
      -O4 - 5.414 seconds

      I don't have an STM32F4 to retest the code, but is it possible that the Piccolo at 80 MHz executes floating-point code much faster than the 168 MHz STM?

      It is also interesting how not properly initialized flash gives you very crappy performance 0:-)
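
To put the timings above in proportion, the sketch below computes two ratios from the posted numbers: how much slower flash is with default wait states than with minimum wait states, and how configured flash compares with SRAM. The struct and function names are hypothetical; the data is copied from the three result lists in this post.

```c
#include <stdio.h>

/* One row per optimization level, times in seconds as posted above. */
struct timing {
    const char *opt;
    double flash_default;   /* code in flash, default wait states */
    double flash_min;       /* code in flash, minimum wait states */
    double sram;            /* code in SRAM */
};

static const struct timing results[] = {
    { "-O0", 85.149, 11.594, 11.241 },
    { "-O2", 51.782,  5.603,  5.414 },
    { "-O4", 51.782,  5.602,  5.414 },
};

/* Slowdown caused by leaving the flash wait states at their defaults. */
double waitstate_penalty(const struct timing *t)
{
    return t->flash_default / t->flash_min;
}

/* Run time in properly configured flash relative to SRAM. */
double flash_vs_sram(const struct timing *t)
{
    return t->flash_min / t->sram;
}

/* Prints both ratios for every optimization level. */
void print_ratios(void)
{
    size_t i;
    for (i = 0; i < sizeof results / sizeof results[0]; i++) {
        printf("%s: default/min wait states = %.1fx, flash/SRAM = %.3f\n",
               results[i].opt,
               waitstate_penalty(&results[i]),
               flash_vs_sram(&results[i]));
    }
}
```

On these numbers, unconfigured flash costs roughly a factor of 7 to 9, while flash with minimum wait states is within about 3 to 4 percent of SRAM.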

    • John Bennett
      Posted by John Bennett
      on Apr 12 2012 03:35 AM
      Prodigy110 points

      Just a reply to agree with you about the flash.

      I unwittingly left out the example flash initialisation routines when creating my software and spent a good day scratching my head wondering why it was taking something like 15 clock cycles to do a single assembler instruction.

      Once I put the flash wait-state setup code back in, performance was back to 1 instruction per clock cycle and all was great :-)

      Almost not worth putting code in SRAM, the flash is so quick when set up properly.

    • Stephen Moore
      Posted by Stephen Moore
      on Apr 12 2012 10:29 AM
      Prodigy250 points

      Trey German wrote: "That being said MFLOPs is more of a marketing number because it really depends on how the code is written: assembly, c, loops unrolled, optimizations, etc."

      I'm using FLOPS as my benchmark number, based on code that was derived from TI benchmarking application note.

      The performance is highly dependent on the compiler settings. The sensitivity is extreme: different settings yield differences of more than an order of magnitude in performance.

      John Connor wrote: "I don't have STM32F4 to retest the code, but is it possible that piccolo on 80MHz is executing floating point code much faster than 168MHz STM?"

      In general, the F28069 is running FPU faster than the STM32, although the STM could be running slow because of similar optimization issues.

      Trey German wrote: "If you hand coded assembly you could actually theoretically get up to 160 MFLOPs as we have a parallel add multiply instruction that is single cycle."

      The F28069/CCS system is very sensitive and tricky. I'm concerned that what could be profitable software development time will be spent figuring out the sensitivities of the TI system. We could spend forever tweaking settings instead of writing revenue-generating code. As you mention, hand-coding the most math-intensive routines (matrix multiplication, dot products, or matrix inversions) may be the best way to go. Hand-coded assembly would basically stop the FPU-heavy routines from being CPU throughput hogs and alleviate our timing worries.
