Problem in benchmarking with CLK_gethtime()

Hi,

I am benchmarking my code with CLK_gethtime(), and I have a problem when nesting the time marks.

Here is my code:

unsigned int times[3];
unsigned int periods[3];

times[0] = CLK_gethtime();

    times[1] = CLK_gethtime();
    // code block 1
    periods[1] = CLK_gethtime() - times[1];

    times[2] = CLK_gethtime();
    // code block 2
    periods[2] = CLK_gethtime() - times[2];

periods[0] = CLK_gethtime() - times[0];

 

As a result, periods[0] should be approximately equal to the sum of periods[1] and periods[2]. However, I found that periods[0] is much greater, roughly twice the sum, which means about half of the total period is unaccounted for.

 

So I investigated the assembly code and found that the code was not executed in the same order as in the C code. This is understandable, because the compiler tries to optimize the code for performance. I tried different optimization options (-O3, -O0, and no -O option) when compiling, but the result was the same.

 

Has anyone met this problem, and how can I solve it for benchmarking?

How can I disable optimization so that the actual execution order matches the C code? Here is my compile command; the optimization option (-O0 below) is the part I varied as mentioned above:

cl6x -c  -oe -qq -pdsw225 -k -pm -mw -mt -ss -os -O0 --no_compress --mem_model:data=far --disable:sploop  -pdr -pden -pds=681 -pds=452 -pds=195  -mv64p -eo.o64P -ea.s64P  -D_DEBUG_=1 -DDO_INTRINSIC  -Dxdc_target_name__=C64P -Dxdc_target_types__=ti/targets/std.h -Dxdc_bld__profile_debug -Dxdc_bld__vers_1_0_6_0_11 -g -DIDMA3_USEFULLPACKAGEPATH -DACPY3_USEFULLPACKAGEPATH -DUSE_ACPY3....

 

I have disabled the interrupts during the function call by using:

#pragma FUNC_INTERRUPT_THRESHOLD( AutoEnhance, -1)

 

Thank you in advance.

Kevin

  • When you build without optimization, the execution order of the assembly closely follows that of the original source. But you say you get the same answer whether or not you build with optimization. This suggests that the optimized execution order is correct. Something else is wrong.

    I notice you are not accounting for the overhead of calling the clock function two times.  You need to add code like this ...

        start = CLK_gethtime();
        stop  = CLK_gethtime();
        overhead = stop - start;
    

    Then subtract that overhead when computing your clock deltas ...

        stop = CLK_gethtime();
        periods[0] = stop - times[0] - overhead;
    

    I don't know if this will solve the entire problem.  But it should help.

    Thanks and regards,

    -George

     

  • Hi, George:

     

    Thanks for your advice.

    I accounted for the timing overhead, which is around 14 in my case. The difference between periods[0] and the sum of periods[1] and periods[2] shrinks when the overhead is taken into account, but a gap remains. Here is the data I obtained:

    overhead: 14

    periods[1]: 33

    periods[2]: 16

    periods[0]: 95

     

    The equation is supposed to be:

    periods[0] = periods[1] + periods[2] + 2 * overhead

     

    With the numbers above, the right-hand side is 33 + 16 + 2 * 14 = 77, so a difference of 18 exists.

     

    I am not sure whether the gap is reasonable. Could anyone post the data they obtained?

     

    Thanks again.

    Kevin
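
The bookkeeping discussed in the posts above can be replayed against a simulated clock to count how many timer reads fall inside each interval. This is a minimal host-side sketch, not DSP/BIOS code: sim_gethtime(), sim_block(), run_nested() and the fixed read cost of 14 ticks are assumptions made for illustration, standing in for CLK_gethtime() and the real code blocks.

```c
#include <stdint.h>

/* Assumption: every timer read costs a fixed READ_COST ticks. */
enum { READ_COST = 14 };

uint32_t sim_ticks;                 /* simulated free-running counter */

/* Hypothetical stand-in for CLK_gethtime(). */
uint32_t sim_gethtime(void) { sim_ticks += READ_COST; return sim_ticks; }

/* Hypothetical stand-in for a code block that takes `cost` ticks. */
void sim_block(uint32_t cost) { sim_ticks += cost; }

uint32_t periods[3];
uint32_t overhead;

void run_nested(uint32_t cost1, uint32_t cost2)
{
    uint32_t times[3], stop;

    /* Calibrate: two back-to-back reads give the cost of timing itself. */
    times[0] = sim_gethtime();
    stop     = sim_gethtime();
    overhead = stop - times[0];

    times[0] = sim_gethtime();              /* outer start */

    times[1] = sim_gethtime();
    sim_block(cost1);                       /* code block 1 */
    stop = sim_gethtime();
    periods[1] = stop - times[1] - overhead;

    times[2] = sim_gethtime();
    sim_block(cost2);                       /* code block 2 */
    stop = sim_gethtime();
    periods[2] = stop - times[2] - overhead;

    stop = sim_gethtime();                  /* outer stop */
    periods[0] = stop - times[0] - overhead;
}
```

Calling run_nested(33, 16) in this model gives periods[1] = 33, periods[2] = 16 and periods[0] = 105, that is, the inner sum plus four read costs, because the two inner starts and two inner stops all sit inside the outer interval. Real hardware need not be this additive (stores and instruction scheduling also contribute), so the model only suggests that the outer period can exceed the inner sum by more than the timer overhead counted once or twice.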

  • There is no real way to avoid the overhead. There will always be overhead in saving the results. Your best bet is to minimize the overhead relative to the measurement. Do this by running the code segments to be measured inside a loop that repeats a large number of times, then divide the result by the number of repetitions. The overhead per repetition is reduced by that factor as well.
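
The loop approach described above might look like the sketch below. The names avg_ticks, timer_fn and work_fn are hypothetical; on the target, the timer argument would be a thin wrapper around CLK_gethtime(). The fake_now() and fake_work() functions are host-side stand-ins (14 ticks per timer read, 33 ticks per call of the workload) so the sketch can be tried off-target.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*timer_fn)(void);   /* e.g. a wrapper around CLK_gethtime() */
typedef void (*work_fn)(void);        /* the code segment to be measured */

/* Average ticks per call of work(), timed once around `reps` repetitions.
 * The single start/stop pair is shared by all repetitions, so its cost
 * per repetition shrinks as reps grows. */
uint32_t avg_ticks(timer_fn now, work_fn work, uint32_t reps)
{
    uint32_t start, total, i;

    assert(reps > 0);
    start = now();
    for (i = 0; i < reps; i++)
        work();
    total = now() - start;   /* unsigned subtraction tolerates one counter wrap */
    return total / reps;
}

/* Host-side stand-ins (assumptions for illustration only). */
uint32_t fake_ticks;
uint32_t fake_now(void)  { fake_ticks += 14; return fake_ticks; }
void     fake_work(void) { fake_ticks += 33; }
```

With these stand-ins, avg_ticks(fake_now, fake_work, 1) returns 47 (33 ticks of work plus 14 of timer cost), while avg_ticks(fake_now, fake_work, 1000) returns 33, since the timer cost is spread over 1000 repetitions. On real hardware the loop itself adds a little cost per iteration, and repeated runs may see warmer caches than a single cold run, so the average is an estimate rather than an exact figure.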