cycle count _divi(int, int) on a TMS320C6472

khaled saab

Genius 3155 points

Other Parts Discussed in Thread: TMS320C6472

Hi,

I'm using a TMS320C6472, and I'm trying to obtain the cycle count of the integer division _divi(int, int).

I checked several documents, but none of them discuss the cycle counts.

I tried to obtain the cycle count through a benchmark, but the number of cycles obtained seems too high (76 cycles).

Has anyone benchmarked this instruction, or can anyone give me pointers on how to do it?

thanks.

khaled.

over 13 years ago

0 ran35366 over 13 years ago

TI__Genius 12805 points

Khaled,
Can you send me the code that you use for the benchmark?

Do you use the version in the run time support library or in the IQMATH library?

(http://www.ti.com/tool/SPRC542)

The cycle count of course depends on the location of the code and data, cache issues, how often you call the function, etc., so we need your benchmark code.

Thanks

Ran

0 khaled saab over 13 years ago in reply to ran35366

Genius 3155 points

Hi,

This is the main code:

#include <time.h>
#include <stdint.h>     //definitions like uint32_t, ...
#include <stdlib.h>
#include <std.h>
#include <log.h>       //LOG_printf

#include "IQArithmetic.h"

//-----------------------------------------------------------------------
//define
//-----------------------------------------------------------------------

//-----------------------------------------------------------------------
// Redefinition : Variables used for Debug
//-----------------------------------------------------------------------

//-----------------------------------------------------------------------
//local variables
//-----------------------------------------------------------------------
extern far LOG_Obj LOG_Debug;

//------------------------------------------------------------------------
// Kernel-specific array alignment requirements.
//------------------------------------------------------------------------

#pragma DATA_ALIGN(f_dat,     8); // Double-Word aligned.
#pragma DATA_ALIGN(tab_left, 8); // Double-Word aligned.
#pragma DATA_ALIGN(tab_right, 8); // Double-Word aligned.
#pragma DATA_ALIGN(res_add,   8); // Double-Word aligned.
#pragma DATA_ALIGN(res_sub,   8); // Double-Word aligned.
#pragma DATA_ALIGN(res_mult, 8); // Double-Word aligned.
#pragma DATA_ALIGN(res_div,   8); // Double-Word aligned.
#pragma DATA_ALIGN(left,       8); // Double-Word aligned.
#pragma DATA_ALIGN(right,       8); // Double-Word aligned.
#pragma DATA_ALIGN(resOfadd, 8); // Double-Word aligned.
#pragma DATA_ALIGN(resOfsub, 8); // Double-Word aligned.
#pragma DATA_ALIGN(resOfmult, 8); // Double-Word aligned.
#pragma DATA_ALIGN(resOfdiv, 8); // Double-Word aligned.

float f_dat[N];

_iq left ;
_iq    right;
_iq    resOfadd, resOfsub, resOfmult, resOfdiv;
_iq    tab_left[N], tab_right[N];
_iq res_add[N], res_sub[N], res_mult[N], res_div[N];

//------------------------------------------------------------------------
// Prototypes for local functions
//-------------------------------------------------------------------------

int main()
{

    //clock_t t_overhead, t_start, t_stop;
    float x;
    int   i;

    //int t_mult;
    //--------------------------------------------------------------------
    // Generate the input vectors.
    // -------------------------------------------------------------------

    x = 0.5;
    for(i = 0; i < N; i++)
    {
        f_dat[i]     = x;
        tab_left[i]  = _FtoIQ(f_dat[i]);
        tab_right[i] = _FtoIQ(2 * f_dat[i]);
        x           += 1;
    }

    left  = _FtoIQ(1.8);         // float convert to IQ
    right = _FtoIQ(2.4);         // float convert to IQ

   LOG_printf(&LOG_Debug,"first term of left tab in fp format = %0.2f\n",
                                                   _IQtoF(tab_left[0]));

   LOG_printf(&LOG_Debug,"first term of right tab in fp format = %0.2f\n",
                                                   _IQtoF(tab_right[0]));



   //---------------------------------------------------------------------
   //Addition
   //---------------------------------------------------------------------
   resOfadd = IQAdd(left,right); //call IQAdd
    LOG_printf(&LOG_Debug,"IQ addition result, in float format= %0.2f\n",
                                                          _IQtoF(resOfadd));

   IQAdd_a(N, tab_left, tab_right, res_add);
   LOG_printf(&LOG_Debug," first term of the IQ add, in float format= %0.2f\n",
                                                       _IQtoF(res_add[0]));

   //---------------------------------------------------------------------
   //Subtraction
   //---------------------------------------------------------------------
   resOfsub = IQSub(left,right); //call IQSub
    LOG_printf(&LOG_Debug,"IQ subtraction result, in float format= %0.2f\n",
                                                       _IQtoF(resOfsub));

   IQSub_a(N, tab_left, tab_right, res_sub);
   LOG_printf(&LOG_Debug," first term of the IQ sub, in float format= %0.2f\n",
                                                       _IQtoF(res_sub[0]));

   //---------------------------------------------------------------------
   //Multiplication
   //---------------------------------------------------------------------
   resOfmult = IQMult(left,right); //call IQMult
    LOG_printf(&LOG_Debug,"IQ mult result, in float format= %0.2f\n",
                                                        _IQtoF(resOfmult));

   IQMult_a(N, tab_left, tab_right, res_mult);
   LOG_printf(&LOG_Debug," first term of the IQ mult, in float format= %0.2f\n",
                                                      _IQtoF(res_mult[0]));

   //---------------------------------------------------------------------
   //Division
   //---------------------------------------------------------------------
   resOfdiv = IQDivi(left,right); //call IQDivi
    LOG_printf(&LOG_Debug,"IQ division result, in float format= %0.2f\n",
                                                            _IQtoF(resOfdiv));

   IQDiv_a(N, res_mult, tab_right, res_div);
   LOG_printf(&LOG_Debug," first term of the IQ div, in float format= %0.2f\n",
                                                       _IQtoF(res_div[0]));


    return (0);
}

The source code for IQDivi and IQDiv_a is as follows:

#pragma CODE_SECTION(IQDivi,"AdaptiveCode")
_iq IQDivi(_iq left, _iq right){

   return _IQdiv(left, right);
}

#pragma CODE_SECTION(IQDiv_a,"AdaptiveCode")
void IQDiv_a(int size_a,
               _iq *restrict num1_p,
               _iq *restrict num2_p,
               _iq *restrict resDiv_p) {

   int16_t i;
   #pragma MUST_ITERATE(N,N,N)
   for( i=0 ; i < size_a ; i++)
   {
       resDiv_p[i] = _IQdiv(num1_p[i], num2_p[i]);

   }
}

0 khaled saab over 13 years ago in reply to ran35366

Genius 3155 points

Hi,

For your information, I included IQmath_c64x+.lib, IQmath_RAM_c64x+.lib, IQmath.h and IQmath_inline.h in my project.

I tried to enable pipelining by including IQmath_inline.h, but I got an error stating that the inline division is not defined in IQmath_inline.h. I think that I need the "C64XPLUS-IQMATHSRC" source code library to enable pipelining. Is that correct?

These are the results I obtained (in cycles):

Functions   Inlined and Pipelined   Inlined and Not Pipelined   Inlined and Not Pipelined Expected results
add         1.5                     6                           1
sub         1.5                     6                           1
mult        1.53                    8                           1
div         73                      73                          11.1

Finally, would it be possible for you to send us the test bench that TI used? That way, I can take a look at the .pjt, .cmd, and .tcf files and try to figure out what is different.

Thanks,

0 ran35366 over 13 years ago in reply to khaled saab

TI__Genius 12805 points

Will you post the complete project, including all include files, link command, etc.?

Ran

0 ran35366 over 13 years ago in reply to ran35366

TI__Genius 12805 points

One (or two) more things:

As far as I know, there is no TI standard benchmark code.

The way I do it is to run the code that I want to benchmark many times, reading the timer before and after.

Then I subtract the overhead associated with reading the timer and divide by the number of times the operation was run.
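The procedure described above can be sketched roughly as follows. This is only an illustration, not TI benchmark code: it assumes the standard C clock() function as the timer (the units it returns depend on the toolchain and target setup), and the names avg_ticks_per_op and benchmark_op are made up for this example. Note that the loop and function-call overhead are still included in the result.

```c
#include <assert.h>
#include <time.h>

/* Average cost of one operation, given raw timer readings.
 * start/stop: timer values read before and after the loop.
 * overhead:   cost of reading the timer itself (two back-to-back reads).
 * runs:       number of times the operation was executed. */
double avg_ticks_per_op(clock_t start, clock_t stop, clock_t overhead, long runs)
{
    assert(runs > 0);
    return (double)(stop - start - overhead) / (double)runs;
}

/* Hypothetical harness: times `runs` calls of `op` using clock(). */
double benchmark_op(void (*op)(void), long runs)
{
    clock_t t0, t1, overhead;
    long i;

    /* Measure the cost of reading the timer with nothing in between. */
    t0 = clock();
    t1 = clock();
    overhead = t1 - t0;

    /* Time the operation many times so that one-off effects average out. */
    t0 = clock();
    for (i = 0; i < runs; i++) {
        op();
    }
    t1 = clock();

    return avg_ticks_per_op(t0, t1, overhead, runs);
}
```

For example, if the timer reads 100 before and 7500 after 100 runs, with a timer-read overhead of 100, the average is (7500 - 100 - 100) / 100 = 73 ticks per operation.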

And yes, it makes sense that the compiler needs the source in order to inline the function.

By the way, which version of CCS do you use?

Ran

0 khaled saab over 13 years ago in reply to ran35366

Genius 3155 points

Hi,

Enclosed is the full project.

We use CCS v3.3.

The project, IQArithmetic.pjt, is located in the path: ..\c64xplus-iqmath_Benchmarks\example\IQ_Arithmetic

Thanks,

6558.c64xplus-iqmath_Benchmarks.zip

0 ran35366 over 13 years ago in reply to khaled saab

TI__Genius 12805 points

I suggest that you ask for the source code library.

The request form is on the following page:

http://www.ti.com/tool/sprc542

There is a form that needs to be filled out, and if you meet the criteria, you will get the source code.

The source code uses a look-up table and Newton-Raphson iterations to calculate a/b.

It is a generic function, so it includes provisions for certain cases.

If you know more about your data, you can build your own scheme and use shortcuts to improve performance.
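To illustrate the general idea of a table-seeded Newton-Raphson division, here is a minimal sketch. It is not the IQmath source: the Q24 format, the 8-entry seed table, the three iterations, and the name q24_div_nr are all assumptions chosen for this example, and there is no saturation or rounding. The reciprocal 1/m of the normalized divisor m in [0.5, 1) is refined with x = x * (2 - m * x), which roughly doubles the number of correct bits per iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical seed table: 1/midpoint of each 1/16-wide slice of [0.5, 1),
 * stored in Q30. Slice k has midpoint (17 + 2k)/32, so the seed is
 * 2^30 * 32 / (17 + 2k) = 2^35 / (17 + 2k). */
static const uint32_t recip_seed_q30[8] = {
    (uint32_t)(0x800000000ULL / 17), (uint32_t)(0x800000000ULL / 19),
    (uint32_t)(0x800000000ULL / 21), (uint32_t)(0x800000000ULL / 23),
    (uint32_t)(0x800000000ULL / 25), (uint32_t)(0x800000000ULL / 27),
    (uint32_t)(0x800000000ULL / 29), (uint32_t)(0x800000000ULL / 31)
};

/* Q24 division a/b. The caller must ensure b != 0 and that the
 * quotient fits in a signed 32-bit Q24 value (no saturation here). */
int32_t q24_div_nr(int32_t a, int32_t b)
{
    uint32_t ua, d;
    uint64_t x, p, t;
    int n = 0, i, negative;

    assert(b != 0);
    negative = (a < 0) != (b < 0);
    ua = (uint32_t)(a < 0 ? -(int64_t)a : (int64_t)a);
    d  = (uint32_t)(b < 0 ? -(int64_t)b : (int64_t)b);

    /* Normalize |b| so bit 31 is set: d / 2^32 = m lies in [0.5, 1). */
    while ((d & 0x80000000u) == 0) {
        d <<= 1;
        n++;
    }

    /* Look up the initial estimate of 1/m using the 3 bits below the MSB. */
    x = recip_seed_q30[(d >> 28) & 7u];

    /* Newton-Raphson refinement: x = x * (2 - m * x), all in Q30. */
    for (i = 0; i < 3; i++) {
        p = ((uint64_t)d * x) >> 32;   /* m * x in Q30 */
        t = ((uint64_t)2 << 30) - p;   /* 2 - m * x in Q30 */
        x = (x * t) >> 30;
    }

    /* x ~ 2^62 / d, so |a|/|b| in Q24 is ua * x / 2^(38 - n). */
    x = ((uint64_t)ua * x) >> (38 - n);
    return negative ? -(int32_t)x : (int32_t)x;
}
```

With the values from the test program, q24_div_nr(30198988, 40265318) (1.8 and 2.4 in Q24) should land within a few LSBs of 0.75 in Q24. A version tuned to a known, narrower input range could use a larger table and fewer iterations.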

Ran

0 khaled saab over 13 years ago in reply to ran35366

Genius 3155 points

Hi Ran,

We did try to get the C source library, but without success. I'm not sure what criteria we need to meet; it's a simple form.

khaled.

0 ran35366 over 13 years ago in reply to khaled saab

TI__Genius 12805 points

This is beyond my level.

I suggest you talk with the TI business developer or sales representative who works with you and try to push the issue.

By the way, there is an email address on the download page. You can try sending an email to ask how to get the source.

0 khaled saab over 13 years ago in reply to ran35366

Genius 3155 points

Hi Ran,

For your information, I requested and received the source code library "C64XPLUS-IQMATHSRC".

First, when compiling the project, it seems that something is wrong in _atoIQN() in IQmath_inline_all.h:

const I32_IQ c1 ((I32_IQ)(0xffffffff))

const I64_IQ c2 ((I64_IQ)(0xffffffff80000000))

const I64_IQ c3 ((I64_IQ)(0x7fffffff))

const I32_IQ c4 ((I32_IQ)(0x80000000))

const unsigned int c5 ((0xffffffff))

const unsigned int c6 ((0x80000000))

I managed to compile the project by making these changes:

const I32_IQ c1 = ((I32_IQ)(0xffffffff));

const I64_IQ c2 = ((I64_IQ)(0xffffffff80000000));

const I64_IQ c3 = ((I64_IQ)(0x7fffffff));

const I32_IQ c4 = ((I32_IQ)(0x80000000));

//const unsigned int c5 = ((0xffffffff));

//const unsigned int c6 = ((0x80000000));

After applying the changes, I tried to obtain the cycle count through a benchmark by including IQmath_inline_all.h, but the number of cycles obtained for division still seems too high (53 cycles instead of 11.1 cycles).

Could you take another look at the problem? I have enclosed the full project for your reference.

The project, IQArithmetic.pjt, is located in the path: ..\c64xplus-iqmath_Benchmarks\example\IQ_Arithmetic

Thanks,

1830.c64xplus-iqmath_Benchmarks.zip

0 khaled saab over 13 years ago in reply to khaled saab

Genius 3155 points

I urgently need a solution to this cycle-count mismatch.

I have reached a point where I cannot continue working on this project unless I'm sure that I can meet the published 11.1 cycles/division.

khaled.
