TI Home » TI E2E Community » Support Forums » Digital Signal Processors (DSP) » C6000 Multicore DSP » Keystone Multicore Forum (C66, 66A, AM5) » Optimization of code
    Optimization of code

    This question is not answered
    Posted by Chunjian Li
    on Apr 11 2012 08:21 AM
    Intellectual580 points

    Hi there,

    I have two threads that share data: thread A writes to a global variable and thread B reads from it. It works fine when I build "Debug", or "Release" with -o1. But thread B does not see the updated data if I build "Release" with the -o2 option. Any suggestions?

    PS: The global variable is declared as volatile. I have also tried disabling the L1D and L2 caches, which didn't help.

    BR

    C.J.

    6678
    All Replies
    • Karthik Ramana Sankar
      Posted by Karthik Ramana Sankar
      on Apr 11 2012 09:10 AM
      Intellectual1695 points

      Hi Chunjian,

      In which memory endpoint (L2, MSMC, or DDR3) is this global variable allocated? Do you have synchronization between thread A and thread B, i.e., does thread B wait until thread A has finished writing the shared variable before reading it? Instead of disabling the L1D and L2 caches, try making the memory region that contains the shared variable non-cacheable by configuring the corresponding MAR (memory attribute register) appropriately.

      Thanks,

      Karthik

    • Chunjian Li
      Posted by Chunjian Li
      on Apr 11 2012 09:21 AM
      Intellectual580 points

      Hi Karthik,

      It is located in the L2SRAM.

      I did try setting the MAR to 0 for the memory range in which the variable is located. It seems to work only if the variable is in DDR3 and I make the whole of DDR3 non-cacheable. I am not sure I used it correctly. This is what I put in the code:

      This would work:

      Cache_setMar((xdc_Ptr *)0x80000000, 0x10000000, 0); 

      But this won't work:

      Cache_setMar((xdc_Ptr *)0x88000000, 0x02000000, 0);

      Do you have experience with this?

    • Karthik Ramana Sankar
      Posted by Karthik Ramana Sankar
      on Apr 11 2012 20:21 PM
      Intellectual1695 points

      Hi Chunjian,

      I do not know how Cache_setMar() is implemented, and I've not used it before. However, there are a total of 256 MAR registers for each CorePac, and each MAR register sets the cacheability attributes of only one 16 MB memory section. For more details, refer to the C66x CorePac User's Guide. A related E2E post that might be helpful:

      http://e2e.ti.com/support/embedded/bios/f/355/t/177410.aspx

      Thanks,

      Karthik
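
The MAR layout described above (256 registers, 16 MB each) implies that a byte address maps to MAR index (address >> 24). The following is a minimal C sketch of that arithmetic only, not the Cache_setMar() implementation. The helper names mar_index and mar_range are hypothetical, and the code computes indices without touching any hardware register.

```c
#include <stdint.h>

/* Each MAR covers one 16 MB region (2^24 bytes), so MAR n corresponds to
 * addresses [n << 24, ((n + 1) << 24) - 1]. */
#define MAR_REGION_SHIFT 24u

/* Hypothetical helper: index of the MAR that covers a byte address. */
static unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> MAR_REGION_SHIFT);
}

/* Hypothetical helper: first and last MAR index touched by the range
 * [base, base + size). size must be non-zero. */
static void mar_range(uint32_t base, uint32_t size,
                      unsigned *first, unsigned *last)
{
    *first = mar_index(base);
    *last  = mar_index(base + (size - 1u));
}
```

Applied to the two Cache_setMar() ranges quoted earlier in the thread, mar_range(0x80000000, 0x10000000, ...) gives MAR128 through MAR143, and mar_range(0x88000000, 0x02000000, ...) gives MAR136 through MAR137.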

    • Alberto Chessa
      Posted by Alberto Chessa
      on Apr 12 2012 01:29 AM
      Genius3740 points

      Hi,

      It is not possible to change the cacheability of the L2 and MSMC memories (the first 15 MAR registers are read-only). So far, I have never had a problem enabling the cache for all of DDR or only part of it (within the constraint of the MAR granularity, which is 32M).

      Anyway, multiple threads running on the same core should never have cache coherency problems: both threads access data through the same caches and memory interfaces. You should look elsewhere to resolve your problem. Maybe you can extract a small code snippet that shows exactly how you declare and define the shared variable, and how you access it and synchronize the threads. Do you use a semaphore? Do the threads poll the shared variable continuously? Or do they simply sleep and then poll the variable?

    • Chunjian Li
      Posted by Chunjian Li
      on Apr 12 2012 03:45 AM
      Intellectual580 points

      Hi Alberto,

      As you said, it is likely something other than cacheability that causes this problem, since it occurs only when I build Release with the -o2 option; it is fine when built as Debug or as Release with -o1.

      Both threads run on the same core. The global variable is declared as:

      extern volatile int32_t sample_counter;

      Thread B is a HWI routine, which processes the data coming from the converter and counts the number of samples. Thread A polls the variable sample_counter and waits until it reaches a certain number:

      while (sample_counter<number_of_samples){}

      I suppose cache coherence is not a problem, because the variable is in L2SRAM (0x00800000)?

      The volatile keyword should keep the compiler from doing the wrong thing, I suppose?
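
As a host-runnable illustration of the polling pattern described above, here is a minimal sketch in standard C. It is a stand-in under stated assumptions, not the code from this project: isr_converter_sim and the tick callback are hypothetical names that replace the real hardware interrupt. The sketch only shows the intended volatile semantics (the counter is re-read from memory on every loop test) and cannot reproduce the -o2 behaviour seen on the C66x.

```c
#include <stdint.h>

/* Shared between the (simulated) interrupt routine and the polling thread.
 * volatile requires a fresh load of the variable on every read. */
volatile int32_t sample_counter = 0;

/* Hypothetical stand-in for the HWI routine: counts one sample per call. */
void isr_converter_sim(void)
{
    sample_counter++;
}

/* Polls until the counter reaches number_of_samples and returns the final
 * count. The tick callback is a hypothetical hook that stands in for
 * asynchronous interrupts so the sketch can run on a host; on the target
 * the loop body would be empty and the HWI would update the counter. */
int32_t wait_for_samples(int32_t number_of_samples, void (*tick)(void))
{
    while (sample_counter < number_of_samples) {
        tick();
    }
    return sample_counter;
}
```

Calling wait_for_samples(5, isr_converter_sim) returns once the simulated routine has counted five samples. Because the callback is an opaque function call, a host compiler would reload the counter here even without volatile; on the target, with an empty loop body, the volatile qualifier is what obliges the compiler to reload it.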

    • Chunjian Li
      Posted by Chunjian Li
      on Apr 12 2012 05:05 AM
      Intellectual580 points

      I have also tried putting the variable in L1SRAM (0x00F00000), with the same result: -o1 works, -o2 doesn't.

    • Alberto Chessa
      Posted by Alberto Chessa
      on Apr 13 2012 01:10 AM
      Genius3740 points

      Hi,

      So you are using SYS/BIOS, and Thread B is not a SYS/BIOS thread but a simple routine that processes the sample and then returns. I suppose you have already verified that Thread B runs correctly and that sample_counter is incremented as expected. I also suppose you have already verified that number_of_samples holds the correct value.

      So far, I see no reason for that behaviour. Tight loops can be critical, since the CPU cannot service interrupts while branching, but I suppose that is not the case here, since CGT 7.2.4+ with "-o2" can generate a "Software pipeline loop" that reloads sample_counter from memory at every iteration and is interruptible.

      You should try setting a breakpoint in the middle of the loop (directly in the assembly), just before the load of sample_counter, and examine the context. You should find instructions like:

         [ B0]   LDW     .D2T2   *+DP(sample_counter),B5   ; load the value
                 NOP             4  ; wait for the value to be ready in register B5
                 CMPLT   .L2     B5,B6,B4          ; compare with your threshold   <--- put the breakpoint here and look at register value and sample_counter value

        .....

      In my example, B5 holds sample_counter and B6 holds number_of_samples (not reloaded at every iteration). Your code could be a bit different.




    • Alberto Chessa
      Posted by Alberto Chessa
      on Apr 13 2012 01:27 AM
      Genius3740 points

      Hi,

      I forgot to say that I think it is better not to place the variable in L1, since L1 is used as cache by default (or did you disable it?). Since cache coherency should not be the problem, I suggest putting the variable in MSMC so that it works regardless of your L1/L2 configuration.

    • Chunjian Li
      Posted by Chunjian Li
      on Apr 16 2012 02:30 AM
      Intellectual580 points

      OK, I found something that surprised me. The thread B I mentioned above is a HWI. When I put a breakpoint at the beginning of it, the breakpoint is never reached. That means the thread is not running at all. How can the -O2 option break my HWI when it works fine with -O1?

    • Alberto Chessa
      Posted by Alberto Chessa
      on Apr 16 2012 02:57 AM
      Genius3740 points

      Chunjian Li

      OK, I found something that surprised me. The thread B I mentioned above is a HWI. When I put a breakpoint at the beginning of it, the breakpoint is never reached. That means the thread is not running at all. How can the -O2 option break my HWI when it works fine with -O1?

      This should not be the case. Try adding some NOPs to the busy loop of thread A. Just write:

        while (sample_counter<number_of_samples) { __asm("  nop\n  nop\n nop\n nop\n nop\n nop\n nop\n  nop"); }

      This way you lose some loop optimization, but you can better isolate the problem. If it works, it means the optimizer generated code that cannot service the interrupt inside the Thread A loop; otherwise you have to look somewhere else.

      Which version of the compiler are you using?

    • Chunjian Li
      Posted by Chunjian Li
      on Apr 16 2012 08:53 AM
      Intellectual580 points

      I have now removed thread A, so that only thread B is running. The ISR still won't run with the -o2 option, while it runs fine with -o1.

      The HWI was set up statically in the cfg file.

      I also tried to create the HWI dynamically from another thread, using the following code, but it won't run even with no optimization.

      Hwi_Handle hwi0;
      Hwi_Params hwiParams;
      Error_Block eb;

      Error_init(&eb);
      Hwi_Params_init(&hwiParams);
      hwiParams.arg = 0;
      hwiParams.enableInt = 1;
      hwiParams.eventId = 84;
      hwiParams.priority = 8;
      hwi0 = Hwi_create(4, (Hwi_FuncPtr)isrConverter, &hwiParams, &eb);
      if (hwi0 == NULL) {
          System_abort("Hwi create failed");
      }

      Is this the correct way of creating the HWI?