IQmath Cosine Function Pipelining Issue

Hi.

The TMS320C64x+ IQmath Library User's Guide states that the IQNcos function takes 54 cycles when stand-alone and only 4 cycles when pipelined (Table 3-7, page 18).

Bullet 2 of the Notes section on page 19 also states:

  • The pipelined loop cycles mentioned are measured with only a single function being called in a loop for 1024 iterations. This figure also includes cycles for loading the input and storing output data. A combination of functions may not yield scalable performance. If a large number of functions are used in a loop, the loop may not schedule.

So, I have been trying to verify this with a simple loop, following the software pipelining rules and the relevant compiler options described in the TMS320C6000 Programmer's Guide to help the compiler pipeline the loop. Here is the portion of the code that contains the loop:

 

#pragma MUST_ITERATE (1024, 1024, 2);

for( i=0;  i<1024; i++)
{
  test[i]=_IQcos(x);
}

 

with the following declared:

int i;
_iq test[1024];

 

Global Q is set at default 24, and Basic Compiler options are:

Target Version: C64x+
Optimize for Size (-ms): No
Optimize for Speed (-mf): 5
Opt Level: Function (-o2)
Program Level Opt: -pm

 

 

When I profile the loop by counting clock cycles using the clock function in CCS3.3, the result is about 57 000 clock cycles for the loop to complete. This calculates to about 55 clock cycles for each iteration, which is very close to the stand-alone figure of 54 mentioned above.



The compiler is clearly NOT pipelining the loop... So, can anyone please tell me what I am doing wrong?

Regards.

Estian.

 


  • I forgot to add that I am trying this on the EVMC6474 Development board.

    Estian.

  • Due to the formatting errors above as well as a few mistakes, I am posting this again:

     

     

    Hi.

    The TMS320C64x+ IQmath Library User's Guide states that the IQNcos function takes 54 cycles when stand-alone and only 4 cycles when pipelined (Table 3-7, page 18).

    Bullet 2 of the Notes section on page 19 also states:

    • The pipelined loop cycles mentioned are measured with only a single function being called in a loop for 1024 iterations. This figure also includes cycles for loading the input and storing output data. A combination of functions may not yield scalable performance. If a large number of functions are used in a loop, the loop may not schedule.

    So, I have been trying to verify this with a simple loop, following the software pipelining rules and the relevant compiler options described in the TMS320C6000 Programmer's Guide to help the compiler pipeline the loop. I am using CCS3.3. Here is the portion of the code that contains the loop:

    #pragma MUST_ITERATE (1024, 1024, 2);

    for( i=0;  i<1024; i++)

    {

    test[i] = _IQcos(x);

    }

    with the following declared:

    int i;

    _iq test[1024];

    _iq x = _FtoIQ(3.14159);

     

    Global Q is set at default 24, and Basic Compiler options are:

    Target Version: C64x+

    Optimize for Size (-ms): No

    Optimize for Speed (-mf): 5

    Opt Level: Function (-o2)

    Program Level Opt: -pm

     

    When I profile the loop by counting clock cycles using the clock function in CCS3.3, the result is about 57 000 clock cycles for the loop to complete. This calculates to about 55 clock cycles for each iteration, which is very close to the stand-alone figure of 54 mentioned above.

    My conclusion is that the compiler is clearly NOT pipelining the loop... So, can anyone please tell me how to get the compiler to pipeline this loop?

    Regards.

    Estian.
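The per-iteration figure in the post above is a measured total divided by the trip count. A minimal sketch of that measurement in portable C follows, assuming only the standard `clock()` from `<time.h>`. The names `work_under_test`, `measure_ticks`, `clock_overhead` and `cycles_per_iteration` are hypothetical, and the loop body is a stand-in rather than the `_IQcos` call.

```c
#include <assert.h>
#include <time.h>

#define N 1024

/* Hypothetical stand-in for the loop under test; it does not call IQmath. */
static volatile long sink;

void work_under_test(void)
{
    long acc = 0;
    int i;
    for (i = 0; i < N; i++) {
        acc += i;
    }
    sink = acc;  /* keep the result live so the loop is not optimized away */
}

/* Ticks elapsed across one call of fn, as reported by clock(). */
long measure_ticks(void (*fn)(void))
{
    clock_t start = clock();
    fn();
    return (long)(clock() - start);
}

/* Ticks elapsed between two back-to-back clock() calls (measurement overhead). */
long clock_overhead(void)
{
    clock_t start = clock();
    return (long)(clock() - start);
}

/* Average cost of one iteration after removing the measurement overhead. */
double cycles_per_iteration(long total, long overhead, long iterations)
{
    assert(iterations > 0);
    return (double)(total - overhead) / (double)iterations;
}
```

With the numbers from the post, `cycles_per_iteration(57000, 0, 1024)` comes to roughly 55.7, which is in line with the stand-alone figure and nowhere near the pipelined one.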

     

     

     

  • You might want to consider that the loop reads the same variable (x) on every iteration. The compiler might be assuming that the iterations depend on each other and therefore cannot be overlapped.

    Also, although this is likely not the key issue, the compiler only pipelines loops whose counter counts down (TMS320C6000 Programmer’s Guide). The compiler can generally convert the counter to count down by itself, but it would be a small change to test.
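Both suggestions in this reply can be tried in one small change: give every iteration its own input element, and write the loop so that it counts down. The sketch below is host-portable C, not the original code. `q24_t`, `stand_in_q24` and `apply_down_counting` are hypothetical names, the stand-in operation squares a Q24 value instead of calling `_IQcos`, and the TI-specific `MUST_ITERATE` pragma is left out so the block compiles anywhere. Whether this changes the schedule on the C64x+ is not established in this thread.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q24_t;  /* hypothetical stand-in for the library's _iq type */

/* Hypothetical stand-in for the per-element operation: squares a Q24 value.
   It is not _IQcos. */
q24_t stand_in_q24(q24_t a)
{
    return (q24_t)(((int64_t)a * a) >> 24);
}

/* Each iteration reads its own input and writes its own output.
   restrict tells the compiler that in and out do not overlap,
   and the counter runs from n down to 1. */
void apply_down_counting(const q24_t *restrict in, q24_t *restrict out, int n)
{
    int i;
    assert(n >= 0);
    for (i = n; i > 0; i--) {
        *out++ = stand_in_q24(*in++);
    }
}
```

On the target, the stand-in would be replaced by the IQmath call and the pragma from the original post would go back in front of the loop.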

  • The problem could also be a matter of inlining. I had a similar problem with a different IQmath function not too long ago, and from the run times it seemed to pipeline only when the IQmath_inline.h file was included rather than the regular IQmath.h. If this is indeed the reason it's not pipelining, you might have to talk to a TI employee about getting inline support for the cos function, as it does not appear to be included in IQmath_inline.h.
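The inlining point can be illustrated without the library. As I read the TMS320C6000 Programmer's Guide, a loop that contains a function call is not software pipelined unless the call is inlined. The sketch below puts the two situations side by side, under stated assumptions: `q24_t`, `halve_q24`, `loop_with_opaque_call` and `loop_with_inline_body` are hypothetical names, the operation is a trivial stand-in for an IQmath routine, and the call through a function pointer stands in for a library routine whose body the compiler cannot see.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q24_t;            /* hypothetical stand-in for _iq */
typedef q24_t (*q24_fn)(q24_t);

/* Body visible at the call site, so an optimizer is free to inline it. */
static inline q24_t halve_q24(q24_t a)
{
    return a / 2;
}

/* Calls through a pointer: the loop body contains a real function call. */
void loop_with_opaque_call(q24_fn f, const q24_t *restrict in,
                           q24_t *restrict out, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        out[i] = f(in[i]);
    }
}

/* Same loop, but the operation can be inlined into the loop body. */
void loop_with_inline_body(const q24_t *restrict in,
                           q24_t *restrict out, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        out[i] = halve_q24(in[i]);
    }
}
```

Both loops produce the same results. The difference is only in what the compiler is allowed to do with them, which is why comparing run times with and without the inline header is a reasonable test.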

  • Please see the posting here for information on the IQmath inline source.

     
