Ask question about Loop Carried Dependency Bound

Shi Tianqi
Posted by Shi Tianqi
on Apr 15 2012 04:17 AM

Hi,

     I'm using CCS v4.2.4. I wrote a short test in linear assembly, in the file test_pipe.sa,

which simply implements: for (i=0; i<N; i++) p_u[i] = p_u[i] + p_v[i]
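For reference, the same computation in plain C might look like the sketch below. The function names are hypothetical, and `short` elements are assumed because the linear assembly uses halfword loads and stores (LDH/STH). Because j is decremented before it is used, the linear-assembly loop in this post visits indices N-1 down to 0, so the second function mirrors that order.

```c
/* Forward form, as written in the post:
   for (i = 0; i < N; i++) p_u[i] = p_u[i] + p_v[i]; */
void add_forward(short *p_u, int N, const short *p_v)
{
    for (int i = 0; i < N; i++)
        p_u[i] = (short)(p_u[i] + p_v[i]);
}

/* Downward form mirroring the first linear-assembly loop.
   j starts at N and is decremented (only while nonzero) before the
   loads, so the indices visited are N-1, N-2, ..., 0.
   Assumes N >= 1, which the .trip 16 directive promises. */
void add_downward(short *p_u, int N, const short *p_v)
{
    int j = N;                               /* MV N, j            */
    do {
        if (j != 0)
            j = j - 1;                       /* [j] SUB j, 1, j    */
        p_u[j] = (short)(p_u[j] + p_v[j]);   /* LDH, LDH, ADD, STH */
    } while (j != 0);                        /* [j] B loop         */
}
```

When p_u and p_v do not overlap, both functions leave the same values in p_u; only the order in which elements are visited differs.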

 

_test: .cproc p_u, N, p_v
                     .reg j, u, ref_u, break_flag
                     .no_mdep
         MV N, j
loop:    .trip 16
       [j]  SUB j, 1, j
            LDH *+p_u[j], u
            LDH *+p_v[j], ref_u
            ADD  u, ref_u, u
            STH u, *+p_u[j]
        [j] B loop
                     .endproc

 

After compiling with --debug_software_pipeline, the compiler reports Loop Carried Dependency Bound(^) : 0

Then I made a slight modification to the code:

_test: .cproc p_u, N, p_v
                     .reg j, u, ref_u, break_flag
                     .no_mdep
         MV N, j
loop:    .trip 16
       [j]  SUB j, 1, j
            LDH *+p_u[j], u
            LDH *+p_v[j], ref_u
            ADD  u, ref_u, u
            STH u, *+p_u[j]
            ZERO break_flag
       [!j] MVK 1, break_flag
       [!break_flag] B loop
                     .endproc
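The only change from the first version is the exit test, which now goes through break_flag. A hypothetical C model of both control flows (function names are illustrative, and the loop body is reduced to a trip counter) suggests the two loops run the same number of iterations for any N >= 1:

```c
/* Hypothetical model of the first loop: exit when j reaches 0. */
int trip_count_v1(int N)
{
    int j = N;
    int trips = 0;
    do {
        if (j != 0)
            j = j - 1;          /* [j] SUB j, 1, j              */
        trips++;                /* loads, ADD and STH go here   */
    } while (j != 0);           /* [j] B loop                   */
    return trips;
}

/* Hypothetical model of the second loop: exit through break_flag. */
int trip_count_v2(int N)
{
    int j = N;
    int break_flag;
    int trips = 0;
    do {
        if (j != 0)
            j = j - 1;          /* [j] SUB j, 1, j              */
        trips++;                /* loads, ADD and STH go here   */
        break_flag = 0;         /* ZERO break_flag              */
        if (!j)
            break_flag = 1;     /* [!j] MVK 1, break_flag: a conditional
                                   write, so when j != 0 the value
                                   written by ZERO must survive  */
    } while (!break_flag);      /* [!break_flag] B loop         */
    return trips;
}
```

The computation is unchanged; what the second version adds is a chain of writes and reads on break_flag in every iteration.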

This time the compiler reports Loop Carried Dependency Bound(^) : 3

That doesn't make sense to me. Can somebody tell me where the dependency is?

The attached file is the .asm file the compiler produced for the second test code:

 1738.test_pipe.txt

Thanks

Shi Tianqi

 

 

 

 

C64
All Replies
  • Luke Postema
    Posted by Luke Postema
    on Apr 16 2012 11:25 AM

    Shi,

    A loop carried dependency means that the results of one iteration of the loop are required as inputs in the next iteration of the loop. It might not look like you have such a dependency, but the compiler is protecting against the case when the arrays p_v and p_u overlap.

    Are you using the "restrict" keyword to describe p_v and p_u? I think that will fix your problem.
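To illustrate why the compiler is cautious: unless told otherwise, it must assume p_u and p_v may overlap. The sketch below (function names and the MAX_N limit are hypothetical) compares the sequential loop with a "load everything first, then store" reordering, which is roughly what overlapping iterations in a software pipeline amounts to. The two agree for disjoint arrays but not when p_v points one element before p_u. Declaring the parameters with C99 `restrict` is the programmer's promise that such overlap never happens; in linear assembly, the .no_mdep directive already used in the original post makes a similar promise.

```c
#define MAX_N 64   /* hypothetical size limit for this sketch */

/* Sequential semantics of the C source: iteration i sees every store
   made by iterations 0..i-1. */
void add_seq(short *p_u, int n, const short *p_v)
{
    for (int i = 0; i < n; i++)
        p_u[i] = (short)(p_u[i] + p_v[i]);
}

/* All loads hoisted above all stores, as if every iteration overlapped.
   Only equivalent to add_seq when p_u and p_v do not alias.
   Assumes n <= MAX_N. */
void add_hoisted(short *p_u, int n, const short *p_v)
{
    short sum[MAX_N];
    for (int i = 0; i < n; i++)
        sum[i] = (short)(p_u[i] + p_v[i]);  /* every load happens first */
    for (int i = 0; i < n; i++)
        p_u[i] = sum[i];                    /* then every store         */
}

/* With restrict, the caller promises the arrays never overlap, so the
   compiler may schedule later loads ahead of earlier stores. Calling it
   with overlapping arrays is undefined behavior. */
void add_restrict(short *restrict p_u, int n, const short *restrict p_v)
{
    for (int i = 0; i < n; i++)
        p_u[i] = (short)(p_u[i] + p_v[i]);
}
```

With buf = {1, 1, 1, 1, 1}, p_u = buf + 1 and p_v = buf, the sequential loop carries each new sum into the next iteration and ends at buf[4] == 5, while the hoisted version ends at buf[4] == 2.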

    Luke Postema

    DSP Engineer      D3 Engineering

    www.d3engineering.com

  • George Mock
    Posted by George Mock
    on Apr 16 2012 16:17 PM

    Please see if this app note is helpful.  -George


    TI C/C++ Compiler Forum Moderator
    Please click Verify Answer on the best reply to your question.
    The Compiler Wiki answers most common questions.
    Track an issue with SDOWP. Enter your bug id in the "Find Record ID" box.

  • Shi Tianqi
    Posted by Shi Tianqi
    on Apr 19 2012 05:09 AM

    Hi,

        Thanks for the quick reply.

        The "restrict" keyword: can I ask what exactly that is?

        I already use .no_mdep in the linear assembly, and I compile with the -mt option.

        And as I wrote, for the first sample code the compiler reports a loop dependency bound of 0,

        but for the second sample code it reports 3; please refer to the attached file.

    I cannot figure out how the two instructions marked with ^ (instruction 1 and instruction 2) in the listing below are loop dependent,

    or why the loop dependency bound is 3 when those two instructions take only 2 cycles.

    ;** --------------------------------------------------------------------------*
    ;          EXCLUSIVE CPU CYCLES: 7
    ;
    ; _test: .cproc p_u, N, p_v
    ;                      .reg j, u, ref_u, break_flag
    ;                      .no_mdep
    ; loop:    .trip 16
               MV      .L1X    N,N'              ; |2| 
               MV      .L1X    N,j               ; |2| 
               MV      .L1X    N,j$5             ; |2| 
     
       [ j$1]  ADD     .L1X    0xffffffff,N,j    ; |7| 
    ||         ZERO    .L2     break_flag        ; |13| (P) <0,2> 
    ||         MVC     .S2     CSR,B16
    ||         MV      .S1     p_v',p_v          ; |2| 
     
               MV      .L1     j,j$1             ; |7| 
    ||         LDH     .D1T2   *+p_v[j],ref_u    ; |10| (P) <0,4> 
    ||         MVK     .L2     0x1,B1
    ||         MV      .S1     p_u'',p_u         ; |2| 
    ||         MV      .S2X    p_u'',p_u'        ; |2| 
    ||         MV      .D2     N,j$2             ; |2| 
     
       [ j$1]  ADD     .L1     0xffffffff,j,j    ; |7| (P) <1,1>  ^   instruction 1
    ||         AND     .L2     -2,B16,B4
    ||         LDH     .D1T1   *+p_u[j],u        ; |9| (P) <0,3> 
    ||         MV      .S1     j,j$5             ; |7| 
    || [!j]    MVK     .S2     0x1,break_flag    ; |15| (P) <0,3> 
    || [ j$5]  ADD     .D2     0xffffffff,N,j$2  ; |7| 
     
               MV      .L1     j,j$6             ; |7| (P) <1,2>  ^ Split a long life(pre-sched)   instruction 2
    ||         ZERO    .L2     break_flag        ; |13| (P) <1,2> 
    ||         MVC     .S2     B4,CSR            ; interrupts off
    || [ break_flag] ZERO .D2  B1                ; |15| (P) <0,5> 
     
     
    ;*       SETUP CODE
    ;*
    ;*                  MVK             0x1,B1
    ;*                  MV              A1,B8
    ;*                  MV              A3,B5
    ;*                  MV              B1,B2
    ;*                  MV              A1,A0
    ;*
    ;*        SINGLE SCHEDULED ITERATION
    ;*                                                           
    ;*        $C$C29:
    ;*   0              NOP             1
    ;*   1      [ A0]   ADD     .S1     0xffffffff,A1,A1  ; |7|  ^ instruction 1
    ;*   2              MV      .L1     A0,A2             ; |7| Split a long life(pre-sched)
    ;*     ||           MV      .D1     A1,A0             ; |7|  ^ Split a long life(pre-sched)   instruction 2
    ;*     ||           ZERO    .S2     B0                ; |13| 
    ;*   3      [ A2]   ADD     .S2     0xffffffff,B8,B8  ; |7| Define a twin register
    ;*     ||   [ B1]   LDH     .D1T1   *+A3[A1],A4       ; |9| 
    ;*     ||   [!A1]   MVK     .D2     0x1,B0            ; |15| 
    ;*   4      [ B1]   LDH     .D1T2   *+A5[A1],B6       ; |10| 
    ;*   5      [ B0]   ZERO    .L2     B1                ; |15| 
    ;*   6              MV      .L2     B8,B9             ; |7| Split a long life(pre-sched)
    ;*   7              MV      .S2     B9,B7             ; |7| Split a long life(pre-sched)
    ;*   8              MV      .D2     B1,B4             ; |15| Split a long life(pre-sched)
    ;*     ||   [ B1]   B       .S1     $C$C29            ; |16| 
    ;*   9              ADD     .L1X    A4,B6,A6          ; |11| 
    ;*  10      [ B2]   STH     .D2T1   A6,*+B5[B7]       ; |12| 
    ;*     ||           MV      .L2     B4,B2             ; |15| Split a long life(pre-sched)
    ;*  11              NOP             3
    ;*  14              ; BRANCHCC OCCURS {$C$C29}        ; |16| 
     
     
     
     
     
    Thanks

    Shi Tianqi

     

  • George Mock
    Posted by George Mock
    on Apr 19 2012 14:01 PM

    In C code, indexed addressing (e.g. p_u[i]) is preferred because it is a bit easier for the compiler to know such references cannot overlap, i.e. are not aliases.  Such code is often turned into auto-increment addressing (e.g. *A1++) in the generated assembly, because that requires only one register.  You should do the same thing, even in linear assembly.  Rewrite your linear assembly to use auto-increment addressing instead of indexed addressing, and I think most of your problems will go away.
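A C sketch of the addressing style described above: instead of recomputing p_u + j each iteration, each array is walked with its own post-incremented pointer, the C analogue of *A1++ addressing, and the loop counter is kept separate from the addresses. The function and pointer names are hypothetical. Using separate read and write pointers into p_u is an assumption of this sketch: in a pipelined schedule the load of p_u[i] can be issued well before its store (cycles 3 and 10 of the single scheduled iteration in the listing above), so one pointer per access stream keeps them independent.

```c
/* Hypothetical pointer-increment version of p_u[i] = p_u[i] + p_v[i]. */
void add_autoinc(short *p_u, int N, const short *p_v)
{
    const short *rd_u = p_u;   /* load stream,  like LDH *rd_u++      */
    short       *wr_u = p_u;   /* store stream, like STH ..., *wr_u++ */
    const short *rd_v = p_v;   /* second load stream                  */

    for (int cnt = N; cnt > 0; cnt--) {   /* counter only, no addressing */
        short u   = *rd_u++;
        short ref = *rd_v++;
        *wr_u++ = (short)(u + ref);
    }
}
```

The counter cnt no longer feeds any address computation, so the only loop-carried chains left in this sketch are the pointer increments and the counter decrement.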

    Thanks and regards,

    -George


  • Shi Tianqi
    Posted by Shi Tianqi
    on Apr 19 2012 22:15 PM

    Hi, George,

        Thanks for your reply.

        I'm tuning the code so that the compiler can do a better job of software pipelining it.

        My goal is to reduce the LOOP CARRIED DEPENDENCY BOUND to zero,

        but sometimes it's hard to find where the loop dependency is; no matter what I do, the LOOP CARRIED DEPENDENCY BOUND stays the same.

        So I started to suspect whether the compiler reports the right thing.

        So I wrote some very simple test code, and I found that the compiler's report doesn't seem to make sense, at least for the code I posted.

    Can you help me understand how the instructions marked with (^) above are loop dependent?

        Thanks.

    Shi Tianqi

  • Archaeologist
    Posted by Archaeologist
    on Apr 20 2012 08:09 AM
    Verified Answer
    Verified by Shi Tianqi

    Shi Tianqi

                ZERO break_flag
           [!j] MVK 1, break_flag
           [!break_flag] B loop

    The output of the ZERO instruction is considered an input to the MVK instruction because the MVK is a conditional write: when [!j] is false, break_flag must keep the value ZERO wrote. Therefore, the ZERO must finish before the MVK starts; this is one cycle.

    The output of MVK is read by the branch, so clearly it must come before the branch.  This is the second cycle.

    The branch reads break_flag, so we must make sure that the next iteration of the loop does not clobber this value before the branch reads it.  The compiler considers this a write-after-read hazard between the branch and the ZERO in the next iteration.  This is the third cycle, and it closes the cycle in the loop-carried dependence graph.
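The three edges described above form a cycle with a total latency of 3 that spans one loop iteration. A recurrence bound of this kind is usually computed as the largest ratio of total latency to total iteration distance over all cycles in the dependence graph. The sketch below computes it for a small hand-built graph by trying candidate initiation intervals and checking for positive cycles with a longest-path Floyd-Warshall pass. It assumes the one-cycle-per-edge costs used in the explanation; the node names, latencies, and types are a simplified, hypothetical model, not the compiler's internal data.

```c
#include <limits.h>

#define NODES 3
#define NEG_INF (INT_MIN / 4)   /* "no path" marker, never added to */

enum { ZERO_I = 0, MVK_I = 1, B_I = 2 };   /* hypothetical node ids */

typedef struct {
    int from, to;
    int latency;    /* cycles the consumer must wait                 */
    int distance;   /* iterations apart: 0 = same, 1 = next iteration */
} Edge;

/* Returns 1 if some cycle has sum(latency - ii * distance) > 0, i.e.
   the candidate initiation interval ii is too small for a recurrence. */
static int has_positive_cycle(const Edge *e, int n_edges, int ii)
{
    int d[NODES][NODES];
    for (int i = 0; i < NODES; i++)
        for (int j = 0; j < NODES; j++)
            d[i][j] = NEG_INF;
    for (int k = 0; k < n_edges; k++) {
        int w = e[k].latency - ii * e[k].distance;
        if (w > d[e[k].from][e[k].to])
            d[e[k].from][e[k].to] = w;
    }
    /* Floyd-Warshall, maximizing path weight instead of minimizing. */
    for (int k = 0; k < NODES; k++)
        for (int i = 0; i < NODES; i++)
            for (int j = 0; j < NODES; j++)
                if (d[i][k] > NEG_INF && d[k][j] > NEG_INF &&
                    d[i][k] + d[k][j] > d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
    for (int i = 0; i < NODES; i++)
        if (d[i][i] > 0)
            return 1;
    return 0;
}

/* Smallest ii with no positive cycle; 0 if the graph has no cycles.
   Assumes every cycle spans at least one iteration (distance >= 1),
   otherwise the search would never end. */
int recurrence_bound(const Edge *e, int n_edges)
{
    int ii = 0;
    while (has_positive_cycle(e, n_edges, ii))
        ii++;
    return ii;
}

/* The break_flag chain from the explanation above. */
const Edge break_flag_graph[3] = {
    { ZERO_I, MVK_I,  1, 0 },   /* ZERO -> MVK: conditional write reads old value */
    { MVK_I,  B_I,    1, 0 },   /* MVK -> B: branch reads break_flag              */
    { B_I,    ZERO_I, 1, 1 },   /* B -> next iteration's ZERO: write after read   */
};
```

For this graph, recurrence_bound(break_flag_graph, 3) returns 3: at ii = 2 the cycle weight is 1 + 1 + (1 - 2) = 1 > 0, and at ii = 3 it drops to 0.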

  • Shi Tianqi
    Posted by Shi Tianqi
    on Apr 22 2012 21:39 PM

    Hi, Archaeologist,

    Now I understand: an instruction like j = j - 1 is loop dependent, so every instruction related to j is on the loop-carried path.

    Thanks all for the kind reply

    Shi Tianqi
