Hi,
I'm using the TMS320C6472, and I'm trying to obtain the cycle count of the integer division _divi(int, int).
I checked several documents, but none of them discuss the cycle counts.
I tried to obtain the cycle count through a benchmark, but the number of cycles obtained seems too high (76 cycles).
Has anyone benchmarked this instruction, or can anyone give me pointers on how to do it?
thanks.
khaled.
Khaled, can you send me the code that you use for the benchmark?
Do you use the version in the run-time support library or in the IQMATH library?
(http://www.ti.com/tool/SPRC542)
The cycle count of course depends on the location of the code and data,
cache issues, how often you call the function, etc., so we need your benchmark code.
Thanks
Ran
This is the main code:
#include <time.h>
#include <stdint.h>   //definitions like uint32_t, ...
#include <stdlib.h>
#include <std.h>
#include <log.h>      //LOG_printf
#include "IQArithmetic.h"

//-----------------------------------------------------------------------
//define
//-----------------------------------------------------------------------

//-----------------------------------------------------------------------
// Redefinition : Variables used for Debug
//-----------------------------------------------------------------------

//-----------------------------------------------------------------------
//local variables
//-----------------------------------------------------------------------
extern far LOG_Obj LOG_Debug;

//------------------------------------------------------------------------
// Kernel-specific array alignment requirements.
//------------------------------------------------------------------------
#pragma DATA_ALIGN(f_dat, 8);      // Double-Word aligned.
#pragma DATA_ALIGN(tab_left, 8);   // Double-Word aligned.
#pragma DATA_ALIGN(tab_right, 8);  // Double-Word aligned.
#pragma DATA_ALIGN(res_add, 8);    // Double-Word aligned.
#pragma DATA_ALIGN(res_sub, 8);    // Double-Word aligned.
#pragma DATA_ALIGN(res_mult, 8);   // Double-Word aligned.
#pragma DATA_ALIGN(res_div, 8);    // Double-Word aligned.
#pragma DATA_ALIGN(left, 8);       // Double-Word aligned.
#pragma DATA_ALIGN(right, 8);      // Double-Word aligned.
#pragma DATA_ALIGN(resOfadd, 8);   // Double-Word aligned.
#pragma DATA_ALIGN(resOfsub, 8);   // Double-Word aligned.
#pragma DATA_ALIGN(resOfmult, 8);  // Double-Word aligned.
#pragma DATA_ALIGN(resOfdiv, 8);   // Double-Word aligned.

float f_dat[N];
_iq left;
_iq right;
_iq resOfadd, resOfsub, resOfmult, resOfdiv;
_iq tab_left[N], tab_right[N];
_iq res_add[N], res_sub[N], res_mult[N], res_div[N];

//------------------------------------------------------------------------
// Prototypes for local functions
//------------------------------------------------------------------------

int main()
{
    //clock_t t_overhead, t_start, t_stop;
    float x;
    int i;
    //int t_mult;

    //--------------------------------------------------------------------
    // Generate the input vectors.
    //--------------------------------------------------------------------
    x = 0.5;
    for(i = 0; i < N; i++)
    {
        f_dat[i] = x;
        tab_left[i] = _FtoIQ(f_dat[i]);
        tab_right[i] = _FtoIQ(2 * f_dat[i]);
        x += 1;
    }

    left = _FtoIQ(1.8);   // float convert to IQ
    right = _FtoIQ(2.4);  // float convert to IQ

    LOG_printf(&LOG_Debug,"first term of left tab in fp format = %0.2f\n", _IQtoF(tab_left[0]));
    LOG_printf(&LOG_Debug,"first term of right tab in fp format = %0.2f\n", _IQtoF(tab_right[0]));

    //---------------------------------------------------------------------
    //Addition
    //---------------------------------------------------------------------
    resOfadd = IQAdd(left,right);  //call IQAdd
    LOG_printf(&LOG_Debug,"IQ addition result, in float format= %0.2f\n", _IQtoF(resOfadd));
    IQAdd_a(N, tab_left, tab_right, res_add);
    LOG_printf(&LOG_Debug," first term of the IQ add, in float format= %0.2f\n", _IQtoF(res_add[0]));

    //---------------------------------------------------------------------
    //Substraction
    //---------------------------------------------------------------------
    resOfsub = IQSub(left,right);  //call IQsub
    LOG_printf(&LOG_Debug,"IQ substraction result, in float format= %0.2f\n", _IQtoF(resOfsub));
    IQSub_a(N, tab_left, tab_right, res_sub);
    LOG_printf(&LOG_Debug," first term of the IQ sub, in float format= %0.2f\n", _IQtoF(res_sub[0]));

    //---------------------------------------------------------------------
    //Multiplication
    //---------------------------------------------------------------------
    resOfmult = IQMult(left,right);  //call IQMult
    LOG_printf(&LOG_Debug,"IQ mult result, in float format= %0.2f\n", _IQtoF(resOfmult));
    IQMult_a(N, tab_left, tab_right, res_mult);
    LOG_printf(&LOG_Debug," first term of the IQ mult, in float format= %0.2f\n", _IQtoF(res_mult[0]));

    //---------------------------------------------------------------------
    //Division
    //---------------------------------------------------------------------
    resOfdiv = IQDivi(left,right);  //call IQDivi
    LOG_printf(&LOG_Debug,"IQ division result, in float format= %0.2f\n", _IQtoF(resOfdiv));
    IQDiv_a(N, res_mult, tab_right, res_div);
    LOG_printf(&LOG_Debug," first term of the IQ div, in float format= %0.2f\n", _IQtoF(res_div[0]));

    return (0);
}
The source code for IQDivi and IQDiv_a is as follows:
#pragma CODE_SECTION(IQDivi,"AdaptiveCode")
_iq IQDivi(_iq left, _iq right)
{
    return _IQdiv(left, right);
}

#pragma CODE_SECTION(IQDiv_a,"AdaptiveCode")
void IQDiv_a(int size_a, _iq *restrict num1_p, _iq *restrict num2_p, _iq *restrict resDiv_p)
{
    int16_t i;

    #pragma MUST_ITERATE(N,N,N)
    for( i=0 ; i < size_a ; i++)
    {
        resDiv_p[i] = _IQdiv(num1_p[i], num2_p[i]);
    }
}
For your information, I included IQmath_c64x+.lib, IQmath_RAM_c64x+.lib, IQmath.h and IQmath_inline.h in my project.
I tried to enable pipelining by including IQmath_inline.h, but I got an error stating that the inline division is not defined in IQmath_inline.h. I think that I need the "C64XPLUS-IQMATHSRC" source code library to enable pipelining. Is that correct?
These are the results obtained. Each row lists its values in the order they were given:
Columns: Functions, Inlined and Pipelined, Inlined and Not Pipelined, Inlined and Not Pipelined Expected results
add: 1.5, 6, 1
sub: (no values given)
mult: 1.53, 8
div: 73, 11.1
Finally, would it be possible for you to send us the test bench that TI used? That way, I can take a look at the .pjt, .cmd, and .tcf files and try to figure out what is different.
Thanks,
Will you post the complete project, including all include files, link command, etc.?
One (or two) more things:
As far as I know, there is no TI standard benchmark code.
The way I do it is to run the thing that I want to benchmark many times and read the timer before and after the code.
Then I subtract the overhead associated with reading the timer and divide by the number of times the operation ran.
And yes, it makes sense that in order to inline, the compiler needs the source.
By the way, what version of CCS do you use?
Enclosed is the full project.
We use CCS v3.3.
The project, IQArithmetic.pjt, is located in the path: ..\c64xplus-iqmath_Benchmarks\example\IQ_Arithmetic
6558.c64xplus-iqmath_Benchmarks.zip
I suggest that you ask for the source code library.
The request form is on the following page:
http://www.ti.com/tool/sprc542
There is a form that needs to be filled in, and if you meet the criteria, you will get the source code.
The source code uses a look-up table and Newton-Raphson iterations to calculate a/b.
It is a generic function, so there are provisions for certain special cases.
If you know more about your data, you may be able to build your own scheme and use shortcuts to improve performance.
Hi Ran,
We did try to get the C library, but with no success. I'm not sure what criteria we need to meet; it's a simple form.
This is outside of my level.
I suggest you talk with the TI business developer or sales representative who works with you and try to push the issue.
By the way, there is an email address on the download page. You can try sending an email to ask how to get the source.
For your information, I requested and received the source code library "C64XPLUS-IQMATHSRC".
First, when compiling the project, it seems that something is wrong in IQmath_inline_all.h, in _atoIQN():
const I32_IQ c1 ((I32_IQ)(0xffffffff))
const I64_IQ c2 ((I64_IQ)(0xffffffff80000000))
const I64_IQ c3 ((I64_IQ)(0x7fffffff))
const I32_IQ c4 ((I32_IQ)(0x80000000))
const unsigned int c5 ((0xffffffff))
const unsigned int c6 ((0x80000000))
I managed to compile the project by making these changes:
const I32_IQ c1 = ((I32_IQ)(0xffffffff));
const I64_IQ c2 = ((I64_IQ)(0xffffffff80000000));
const I64_IQ c3 = ((I64_IQ)(0x7fffffff));
const I32_IQ c4 = ((I32_IQ)(0x80000000));
//const unsigned int c5 = ((0xffffffff));
//const unsigned int c6 = ((0x80000000));
After applying the changes, I tried to obtain the cycle count through a benchmark by including IQmath_inline_all.h, but the number of cycles obtained for division still seems too high (53 cycles instead of 11.1 cycles).
Could you take another look at the problem? I have enclosed the full project for your reference.
1830.c64xplus-iqmath_Benchmarks.zip
I urgently need a solution to this cycle count mismatch.
I have reached a point where I will not be able to continue working on this project unless I'm sure that I can meet the published 11.1 cycles/division.