Cycle count of _divi(int, int) on a TMS320C6472

Other Parts Discussed in Thread: TMS320C6472

Hi,

I'm using a TMS320C6472, and I'm trying to obtain the cycle count of the integer division _divi(int, int).

I checked different documents but none discuss the cycle counts.

I tried to obtain the cycle count through a benchmark, but the number of cycles obtained seems too high (76 cycles).

Has anyone benchmarked this instruction, or can anyone give me pointers on how to do it?

thanks.

khaled.

  • Khaled,
    Can you send me the code that you use for the benchmark?

    Do you use the version in the run-time support library or in the IQMATH library?

    (http://www.ti.com/tool/SPRC542)

    The cycle count of course depends on the location of the code and data, cache issues, how often you call the function, etc., so we need your benchmark code.

    Thanks

    Ran

  • Hi,

    This is the main code:

    #include <time.h>
    #include <stdint.h>      //definitions like uint32_t, ...
    #include <stdlib.h>
    #include <std.h>
    #include <log.h>        //LOG_printf

    #include "IQArithmetic.h"


    //-----------------------------------------------------------------------
    //define
    //-----------------------------------------------------------------------

    //-----------------------------------------------------------------------
    // Redefinition : Variables used for Debug
    //-----------------------------------------------------------------------

    //-----------------------------------------------------------------------
    //local variables
    //-----------------------------------------------------------------------  
    extern far LOG_Obj LOG_Debug;

    //------------------------------------------------------------------------
    //  Kernel-specific array alignment requirements.                           
    //------------------------------------------------------------------------

    #pragma DATA_ALIGN(f_dat,     8);  // Double-Word aligned.
    #pragma DATA_ALIGN(tab_left,  8);  // Double-Word aligned.
    #pragma DATA_ALIGN(tab_right, 8);  // Double-Word aligned.
    #pragma DATA_ALIGN(res_add,   8);  // Double-Word aligned.
    #pragma DATA_ALIGN(res_sub,   8);  // Double-Word aligned.
    #pragma DATA_ALIGN(res_mult,  8);  // Double-Word aligned.
    #pragma DATA_ALIGN(res_div,   8);  // Double-Word aligned.
    #pragma DATA_ALIGN(left,       8);  // Double-Word aligned.
    #pragma DATA_ALIGN(right,       8);  // Double-Word aligned.
    #pragma DATA_ALIGN(resOfadd,  8);  // Double-Word aligned.
    #pragma DATA_ALIGN(resOfsub,  8);  // Double-Word aligned.
    #pragma DATA_ALIGN(resOfmult, 8);  // Double-Word aligned.
    #pragma DATA_ALIGN(resOfdiv,  8);  // Double-Word aligned.

     
    float f_dat[N];

    _iq  left ;
    _iq     right;
    _iq     resOfadd, resOfsub, resOfmult, resOfdiv;
    _iq     tab_left[N], tab_right[N];
    _iq  res_add[N], res_sub[N], res_mult[N], res_div[N];

    //------------------------------------------------------------------------
    //  Prototypes for local functions                                       
    //-------------------------------------------------------------------------

    int main()
    {
     
        //clock_t t_overhead, t_start, t_stop;
        float     x;
        int     i;
        
        //int t_mult;
        //--------------------------------------------------------------------
        //  Generate the input vectors.                                        
        // -------------------------------------------------------------------

        x = 0.5;
        for(i = 0; i < N; i++)
        {
            f_dat[i]       =  x;
            tab_left[i]      = _FtoIQ(f_dat[i]);
            tab_right[i]  = _FtoIQ(2 * f_dat[i]);
            x              += 1;
        }


        left         = _FtoIQ(1.8);          // float convert  to IQ
        right         = _FtoIQ(2.4);          // float convert  to IQ

        LOG_printf(&LOG_Debug,"first term of left tab in fp format = %0.2f\n",
                                                         _IQtoF(tab_left[0]));

        LOG_printf(&LOG_Debug,"first term of right tab in fp format = %0.2f\n",
                                                         _IQtoF(tab_right[0]));
        
        
        
        //---------------------------------------------------------------------
        //Addition
        //---------------------------------------------------------------------
        resOfadd  = IQAdd(left,right); //call IQAdd
        LOG_printf(&LOG_Debug,"IQ addition result, in float format= %0.2f\n",
                                                               _IQtoF(resOfadd));

        IQAdd_a(N,  tab_left, tab_right, res_add);
        LOG_printf(&LOG_Debug," first term of the IQ add, in float format= %0.2f\n",
                                                           _IQtoF(res_add[0]));          

        //---------------------------------------------------------------------
        //Subtraction
        //---------------------------------------------------------------------
        resOfsub  = IQSub(left,right); //call IQsub
        LOG_printf(&LOG_Debug,"IQ subtraction result, in float format= %0.2f\n",
                                                             _IQtoF(resOfsub));

        IQSub_a(N,  tab_left, tab_right, res_sub);
        LOG_printf(&LOG_Debug," first term of the IQ sub, in float format= %0.2f\n",
                                                           _IQtoF(res_sub[0]));          

        //---------------------------------------------------------------------
        //Multiplication
        //---------------------------------------------------------------------
        resOfmult  = IQMult(left,right); //call IQMult
        LOG_printf(&LOG_Debug,"IQ mult result, in float format= %0.2f\n",
                                                            _IQtoF(resOfmult));

        IQMult_a(N,  tab_left, tab_right, res_mult);
        LOG_printf(&LOG_Debug," first term of the IQ mult, in float format= %0.2f\n",
                                                          _IQtoF(res_mult[0]));          

        //---------------------------------------------------------------------
        //Division
        //---------------------------------------------------------------------
        resOfdiv  = IQDivi(left,right); //call IQDivi
        LOG_printf(&LOG_Debug,"IQ division result, in float format= %0.2f\n",
                                                                _IQtoF(resOfdiv));

        IQDiv_a(N,  res_mult, tab_right, res_div);
        LOG_printf(&LOG_Debug," first term of the IQ div, in float format= %0.2f\n",
                                                           _IQtoF(res_div[0]));          

        
        return (0);
    }

    The source code for IQDivi and  IQDiv_a is as follows:

    #pragma CODE_SECTION(IQDivi,"AdaptiveCode")
    _iq IQDivi(_iq left, _iq right){

        return _IQdiv(left, right);
    }


    #pragma CODE_SECTION(IQDiv_a,"AdaptiveCode")
    void IQDiv_a(int size_a,
                    _iq *restrict num1_p,
                    _iq *restrict num2_p,
                    _iq *restrict resDiv_p) {

          int16_t i;
        #pragma MUST_ITERATE(N,N,N)
        for( i=0 ; i < size_a ; i++)
        {
            resDiv_p[i] = _IQdiv(num1_p[i], num2_p[i]);
            
        }
    }

  • Hi,

    For your information, I included IQmath_c64x+.lib, IQmath_RAM_c64x+.lib, IQmath.h and IQmath_inline.h in my project.

    I tried to enable pipelining by including IQmath_inline.h, but I got an error stating that the inline division is not defined in IQmath_inline.h. I think that I need the "C64XPLUS-IQMATHSRC" source code library to enable pipelining. Is that correct?

    These are the results obtained (in cycles):

    Functions | Inlined and Pipelined | Inlined and Not Pipelined | Inlined and Not Pipelined Expected results
    add       | 1.5                   | 6                         | 1
    sub       | 1.5                   | 6                         | 1
    mult      | 1.53                  | 8                         | 1
    div       | 73                    | 73                        | 11.1

    Finally, would it be possible for you to send us the test bench that TI used? This way, I can take a look at the .pjt, .cmd, and .tcf files and try to figure out what is different.

    Thanks,

  • Will you post the complete project, including all include files, link command, etc.?

    Ran

  • One (or two) more things.

    As far as I know, there is no TI standard benchmark code.

    The way I do it is to run the thing that I want to benchmark many times, reading the timer before and after the code.

    Then I subtract the overhead time associated with reading the timer and divide by the number of times the operation was run.
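
    The method described above can be sketched roughly as follows. This is only an illustration, not TI benchmark code: read_timer(), ticks_per_op() and benchmark_division() are hypothetical names, and clock() is used as a portable stand-in for the timer (on a C64x+ target the TSCL time-stamp register would be the more usual choice). Note that the loop overhead is still included in the figure.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical timer hook. clock() is only a portable stand-in; on a C64x+
   target one would typically read the TSCL time-stamp register instead. */
uint32_t read_timer(void)
{
    return (uint32_t)clock();
}

/* Arithmetic of the method: elapsed ticks minus the timer-read overhead,
   divided by the repetition count. Unsigned subtraction keeps the
   difference correct across a single 32-bit counter wrap. */
double ticks_per_op(uint32_t t_start, uint32_t t_stop,
                    uint32_t overhead, uint32_t reps)
{
    uint32_t elapsed = t_stop - t_start;

    if (reps == 0u)
        return 0.0;
    if (elapsed < overhead)      /* measurement noise: clamp at zero */
        elapsed = overhead;
    return (double)(elapsed - overhead) / (double)reps;
}

/* volatile operands and result keep the compiler from folding the
   division away or hoisting it out of the loop */
volatile int32_t numerator = 1800000;
volatile int32_t denominator = 7;
volatile int32_t sink;

/* Times `reps` integer divisions; loop overhead is included in the result. */
double benchmark_division(uint32_t reps)
{
    uint32_t t0, t1, overhead, i;

    /* Two back-to-back reads estimate the cost of reading the timer. */
    t0 = read_timer();
    t1 = read_timer();
    overhead = t1 - t0;

    t0 = read_timer();
    for (i = 0; i < reps; i++)
        sink = numerator / denominator;
    t1 = read_timer();

    return ticks_per_op(t0, t1, overhead, reps);
}
```

    Calling benchmark_division(1000) returns the average number of timer ticks per division. Running the loop once before the timed run is a common way to reduce cache effects on the first pass.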

    And yes, it makes sense that in order to inline, the compiler needs the source.

    By the way, what version of CCS do you use?

    Ran

  • Hi,

    Enclosed is the full project.

    We use CCS v3.3.

    The project IQArithmetic.pjt is located in the path: ..\c64xplus-iqmath_Benchmarks\example\IQ_Arithmetic

    Thanks,

    6558.c64xplus-iqmath_Benchmarks.zip

  • I suggest that you ask for the source code library.

    The request form is on the following page:

    http://www.ti.com/tool/sprc542

    There is a form that needs to be filled in, and if you meet the criteria, you will get the source code.

    The source code uses a look-up table and Newton-Raphson iterations to calculate a/b.

    It is a generic function, so there are provisions for certain cases.

    If you know more about your data, you may build your own scheme and use shortcuts to improve performance.
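
    To give an idea of what a Newton-Raphson division scheme looks like, here is a minimal sketch. It is not the IQmath source: the Q16.16 format, the names q16_t, recip_q30() and q16_div(), and the linear seed 48/17 - (32/17)*d (used here in place of a look-up table) are all assumptions made for illustration.

```c
#include <stdint.h>

#define Q 16                    /* hypothetical Q16.16 format, not TI's _iq */
typedef int32_t q16_t;

/* Reciprocal of d, where d is Q30 in [0.5, 1.0). Result is Q30, close to 1/d. */
int64_t recip_q30(int64_t d)
{
    const int64_t two = (int64_t)2 << 30;
    /* Linear seed x0 = 48/17 - (32/17)*d stands in for a look-up table. */
    int64_t x = (((int64_t)48 << 30) / 17)
              - (((((int64_t)32 << 30) / 17) * d) >> 30);
    int i;

    /* Newton-Raphson step x = x*(2 - d*x); the error roughly squares each pass. */
    for (i = 0; i < 3; i++)
        x = (x * (two - ((d * x) >> 30))) >> 30;
    return x;
}

/* a / b in Q16.16, truncating, saturating on overflow and on b == 0. */
q16_t q16_div(q16_t a, q16_t b)
{
    int neg = (a < 0) != (b < 0);
    uint32_t ua = (a < 0) ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = (b < 0) ? 0u - (uint32_t)b : (uint32_t)b;
    int p = 31;
    int64_t d, x;
    uint64_t q;

    if (ub == 0u)
        return neg ? INT32_MIN : INT32_MAX;

    while (((ub >> p) & 1u) == 0u)      /* p = index of the highest set bit */
        p--;

    /* Normalise so the highest set bit lands on bit 29: d in [0.5, 1) as Q30. */
    d = (p <= 29) ? ((int64_t)ub << (29 - p)) : ((int64_t)ub >> (p - 29));
    x = recip_q30(d);

    /* |b| = d_real * 2^(p+1), so |a/b| in Q16 is (|a| * x) >> (31 + p - Q). */
    q = ((uint64_t)ua * (uint64_t)x) >> (31 + p - Q);
    if (q > (uint64_t)INT32_MAX)
        q = (uint64_t)INT32_MAX;
    return neg ? -(q16_t)q : (q16_t)q;
}
```

    The fixed iteration count and the sign, zero and saturation handling are the kind of generic provisions that could be trimmed if the operand range is known in advance, for example if the divisor is always positive and already normalised.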

     

    Ran

     

  • Hi Ran,

    We did try to get the C library but had no success. I'm not sure what criteria we need to meet; it's a simple form.

    khaled.

  • This is beyond my level.

    I suggest you talk with the TI business developer or sales representative who works with you and try to push the issue.

    By the way, there is an email address on the download page. You can try sending an email to ask how to get the source.

  • Hi Ran,

    For your information, I requested and received the source code library "C64XPLUS-IQMATHSRC".

    First, when compiling the project, it seems that something is wrong in IQmath_inline_all.h, in _atoIQN():

    const I32_IQ c1 ((I32_IQ)(0xffffffff))

    const I64_IQ c2 ((I64_IQ)(0xffffffff80000000))

    const I64_IQ c3 ((I64_IQ)(0x7fffffff))

    const I32_IQ c4 ((I32_IQ)(0x80000000))

    const unsigned int c5 ((0xffffffff))

    const unsigned int c6 ((0x80000000))

     

    I managed to compile the project by making these changes:

    const I32_IQ c1 = ((I32_IQ)(0xffffffff));

    const I64_IQ c2 = ((I64_IQ)(0xffffffff80000000));

    const I64_IQ c3 = ((I64_IQ)(0x7fffffff));

    const I32_IQ c4 = ((I32_IQ)(0x80000000));

    //const unsigned int c5 = ((0xffffffff));

    //const unsigned int c6 = ((0x80000000));

     

    After applying the changes, I tried to obtain the cycle count through a benchmark by including IQmath_inline_all.h, but the number of cycles obtained for division still seems too high (53 cycles instead of 11.1 cycles).

    Could you take another look at the problem? I have enclosed the full project for your reference.

    The project IQArithmetic.pjt is located in the path: ..\c64xplus-iqmath_Benchmarks\example\IQ_Arithmetic

    Thanks,

     

    1830.c64xplus-iqmath_Benchmarks.zip

     

  • I urgently need a solution to this cycle-count mismatch problem.

    I have reached a point where I will not be able to continue working on this project unless I'm sure that I can meet the published 11.1 cycles/division.

    khaled.