[DM6437] Slow IQ math execution

Other Parts Discussed in Thread: CCSTUDIO

I have been comparing the execution cycles of some very simple functions, and I am very surprised at how many more cycles the IQ math version needs compared to simple integer math. One thought is that my program cache is not handling this properly, but I'm not sure. The code is set up so that the compiler should be able to pipeline the operations, and that seems to work with the integer math. I don't know what is actually happening, so I could use some help, please.

This code executes in an average of 56 cycles with NUM_TAPS set to 16:

STS_set(&stsGenericTime3, CLK_gethtime());      /* start timing */
iq10Rccoeff = 0;
for (i = 0; i < NUM_TAPS; i++)
    iq10Rccoeff += iq10ImgNormVal[i] * iq10Tmpl[i];  /* plain integer multiply-accumulate */
STS_delta(&stsGenericTime3, CLK_gethtime());    /* record elapsed time */

This code executes in an average of 591 cycles with NUM_TAPS set to 16:

STS_set(&stsGenericTime3, CLK_gethtime());      /* start timing */
iq10Rccoeff = 0;
for (i = 0; i < NUM_TAPS; i++)
    iq10Rccoeff += _IQ10mpy(iq10ImgNormVal[i], iq10Tmpl[i]);  /* IQmath Q10 multiply */
STS_delta(&stsGenericTime3, CLK_gethtime());    /* record elapsed time */

That is a 10x increase.  Any suggestions on how to improve this?  My optimization options are set as follows:

Optimize for Speed (-mf): 5

Opt Level: File (-o3)

  • Edwin,

    Which device are you using?  Are the functions custom or are they from a TI library?

    -Tommy

  • Tommy, this is running on a 6437. The only thing that changes between the two versions is the multiply: one uses plain integer multiplication and the other uses _IQ10mpy from the library. The rest of the code is identical and is shown only to give an idea of the loop it sits in. Can the function _IQ10mpy be pipelined?

  • Anyone? This seems like a reasonable question, so please make some suggestions.

    Ed

  • Edwin,

    Keep in mind that you are replacing a few operations with a call to a function defined in a different file, and the compiler can't software-pipeline a loop that contains a function call. I'm not exactly sure how to get library functions pulled out and "inlined". As a quick test, you might try program mode compilation, or you could select "inline functions called only once" on the compiler Advanced 2 tab (assuming your test program calls it only once). It should be possible, since the IQmath documentation lists significant performance improvements for pipelined calls (20 cycles standalone, 1 cycle pipelined).
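A minimal sketch of why inlining matters, assuming nothing about IQmath's internals: when the multiply is a `static inline` function whose body is visible in the same file, the loop body contains no call, so the compiler is free to software-pipeline it. The names `q10_mul_inline` and `q10_dot` are hypothetical, `NUM_TAPS` is taken from the question, and the truncating shift is an assumption (the library may round or saturate differently).

```c
#include <stdint.h>

#define NUM_TAPS 16  /* value used in the question */

/* Hypothetical inline Q10 multiply: the body is visible here, so the compiler
   can expand it inside the loop instead of emitting a function call.
   Right shift of a negative value is arithmetic on common compilers. */
static inline int32_t q10_mul_inline(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 10);
}

/* Q10 dot product with no function calls left in the loop body. */
int32_t q10_dot(const int32_t *restrict x, const int32_t *restrict h)
{
    int32_t acc = 0;
    int i;
    for (i = 0; i < NUM_TAPS; i++)
        acc += q10_mul_inline(x[i], h[i]);
    return acc;
}
```

With every x[i] equal to 1.0 (1024 in Q10) and every h[i] equal to 0.5 (512 in Q10), the sum over 16 taps is 8.0, i.e. 8192 in Q10.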

  • I am trying to learn more about pipelining, and I wasn't sure whether _IQ10mpy (for example) was a function call or a macro. In a nutshell, I think I need to find a way to get this loop pipelined, either with the IQmath functions or, if that isn't possible, by writing my own source that does the same thing and can be pipelined.

    Your suggestion to use "inline functions called only once" had no effect on the execution times. I'm not clear on what you mean by program mode compilation, but I will try it if you can explain what it means. I can't find any source code to help me understand how difficult it is to multiply Q format numbers together. Are you aware of any source that should have come with the library? Mine is V213.

    Ed
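Multiplying Q format numbers is not difficult in principle. A minimal sketch, not the IQmath source: the raw product of two Q10 values has 20 fraction bits, so it is formed in 64 bits and shifted right by 10 to return to Q10. `q10_mul` and `to_q10` are hypothetical names, and the truncating shift is an assumption. This also means the plain `*` in the first loop of the question accumulates Q20 products rather than Q10 ones, so the two loops being timed do not compute the same value.

```c
#include <stdint.h>

/* Hypothetical Q10 multiply: Q10 * Q10 = Q20, then shift back down to Q10.
   The 64-bit intermediate avoids overflowing the Q20 product.
   Right shift of a negative value is arithmetic on common compilers. */
static inline int32_t q10_mul(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;  /* Q20 */
    return (int32_t)(prod >> 10);            /* back to Q10, truncating */
}

/* Convert a double to Q10 for illustration (truncates toward zero). */
static int32_t to_q10(double x)
{
    return (int32_t)(x * 1024.0);
}
```

For example, 1.5 (1536 in Q10) times 2.25 (2304 in Q10) gives 3456, which is 3.375 in Q10.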

  • Program mode compilation lets the compiler see the entire project at once, so it can optimize beyond the file level. It is the last line of the compiler Basic tab: select program mode compilation. You should also make sure C64x+ is listed as your target version on that same tab, and select No Debug for the debug level on the second line.

    You shouldn't have to write your own source... if this doesn't work a TI employee should chime in and tell you how to inline this call.

     

  • Check out the dotprod example included in the library... they have an IQ_math_inlined header file that is probably what you are looking for.

  • Hello, yes, IQMath supports inlining. Refer to table 3-7 in C:\CCStudio_v3.3\c64plus\IQmath_v213\docs\SPRUGG9.pdf. As was pointed out earlier, you need to include the header file C:\CCStudio_v3.3\c64plus\IQmath_v213\include\IQmath_inline.h and define the symbol _INLINE_IQMATH.

    Gagan

  • I have looked at the dotprod example and have tried to do as you suggested by adding the following:

    #define _INLINE_IQMATH

    #ifdef _INLINE_IQMATH
          #include <IQmath_inline.h>
    #else
          #include <IQmath.h>
    #endif

    But I keep getting these errors for two of the functions I use, and I'm not sure what to do about them:

    "C:\TI\IQMath\IQmath_v213\include\IQmath.h", line 1205: error: function "_IQNdiv" was referenced but not defined
    "C:\TI\IQMath\IQmath_v213\include\IQmath.h", line 2874: error: function "_IQNsqrt" was referenced but not defined

    Oddly enough, I don't get this error for _IQmpy???

  • Hmm, I checked the IQmath_inline.h file, and it doesn't have functions for div and sqrt (but it does have code for IQnmpy). The IQmath.h file, however, expects someone to define those functions, since the two error lines you mentioned declare prototypes for them. I did notice that SPRUGG9 page 8 says IQmath_inline.h "Includes source code for certain IQMath APIs to enable inlining". I guess you found the certain APIs that are not included. You can comment out the

    #ifdef _INLINE_IQMATH

    static inline I32_IQ _IQNdiv(I32_IQ num, I32_IQ den, Uword32 qfmt);

    #else

    and the corresponding #endif line at the two places you are getting errors.  That will get it to build, but then you won't have inlined versions of those functions.  My question is, benchmark numbers are provided for those functions in sprugg9, so where is the code/library to support their use?
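For reference only, a minimal sketch of what a Q format divide computes; this is not TI's `_IQNdiv`, whose source the thread notes is not shipped. `qn_div` is a hypothetical name. Pre-scaling the numerator by 2^qfmt keeps the quotient in the same Q format. As far as I know the C64x+ has no integer divide instruction, so the 64-bit `/` below becomes a call into the compiler's run-time support library and would still block software pipelining; it illustrates the arithmetic, not a fix for the inlining problem.

```c
#include <stdint.h>

/* Hypothetical Q-format divide for two values sharing format qfmt.
   Multiplying (rather than left-shifting) avoids undefined behavior for a
   negative numerator. den must be nonzero; no saturation is applied, and
   C division truncates toward zero. */
static inline int32_t qn_div(int32_t num, int32_t den, unsigned qfmt)
{
    int64_t scaled = (int64_t)num * ((int64_t)1 << qfmt);
    return (int32_t)(scaled / den);
}
```

In Q10, 3.0 / 2.0 is qn_div(3072, 2048, 10), which returns 1536 (1.5).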

  • OK, thanks for looking into it. I don't see the point of getting it to compile without the inlined functions: either there would be no performance change or it wouldn't work at all. I'm only using IQdiv and IQsqrt at this time.

    I eliminated IQmpy, which I was actually using to square a number. I can't "prove" this, but after evaluating thousands of calculations I see an error of < 0.02% when I substitute the first lines of code below for the second. It also pipelines, and it reduced the cycles in my loop from 714 to 136:

    Int32  lTmp;
    lTmp = iqImgNormVal[i] >> GLOBAL_Q/2;   /* drop GLOBAL_Q/2 fraction bits before squaring */
    iqImgSumSqNormVal += (lTmp * lTmp);     /* 32-bit square, pipelines */

    /* original line being replaced: */
    iqImgSumSqNormVal += _IQmpy(iqImgNormVal[i], iqImgNormVal[i]);
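A minimal sketch comparing the substitution above with a full-precision square, to show where its error comes from. `GLOBAL_Q` is not given in the thread, so 24 is an assumption here, and the function names are hypothetical. The shifted version discards the low GLOBAL_Q/2 bits of the operand before multiplying, so for nonnegative inputs it never exceeds the exact result, and the relative error shrinks as the operand grows. The trick also needs GLOBAL_Q to be even: for odd GLOBAL_Q, 2 * (GLOBAL_Q/2) is GLOBAL_Q - 1 and the result lands in the wrong Q format.

```c
#include <stdint.h>

/* Assumption: GLOBAL_Q is not stated in the thread; 24 is for illustration. */
#define GLOBAL_Q 24

/* Reference square: full 64-bit product, then drop GLOBAL_Q fraction bits. */
static int32_t q_square_exact(int32_t x)
{
    return (int32_t)(((int64_t)x * x) >> GLOBAL_Q);
}

/* Substitution from the post: drop GLOBAL_Q/2 bits first, then square in
   32 bits. The product has 2 * (GLOBAL_Q/2) fraction bits. |x| must be small
   enough that the square fits the Q format, or t * t overflows int32.
   Right shift of a negative value is arithmetic on common compilers. */
static int32_t q_square_shifted(int32_t x)
{
    int32_t t = x >> (GLOBAL_Q / 2);
    return t * t;
}
```

For x = 1.0 both return 2^24. For x = 2^24 + 4095 (just above 1.0), the exact square is 16785406 while the shifted one is 16777216, a difference of 8190, or about 0.05% for this particular input.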

  • Edwin, sorry for the trouble. As you have realized, the IQMath release provides source for only a few kernels. The source for the other kernels is released only under NDA. Do you have a local TI contact you work with? Could you ask them for that access?

    Thanks,
    Gagan