Hello,
I am experimenting with the C66x DSP on the EVMK2GX board and would like to measure function execution times for my application. However, when I measure the actual cycle counts using the driver examples provided with the library, they are consistently about 25% higher than the theoretical cycle counts I calculate with the formulas from the Texas Instruments document "Test Results DSPLIB 3.4.0.0 C66x" (which comes with the library installation). For example:
Function | DSPF_sp_mat_submat_copy_cplx | DSPF_sp_mat_trans_cplx | DSPF_sp_mat_mul_gemm_cplx |
Cycles formula (theoretical) | 5270 | 3780 | 55700 |
EVMK2GX (measured) | 6692 | 4631 | 68989 |
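To make the gap concrete, the measured-to-formula ratio can be computed for each column of the table. The following is only a minimal host-side sketch using the numbers above; the helper names `cycle_ratio` and `print_cycle_ratios` are my own and are not part of DSPLIB.

```c
#include <stdio.h>

/* Ratio of measured cycles to the cycles predicted by the formula.
 * Hypothetical helper, not a DSPLIB function. */
double cycle_ratio(long formula_cycles, long measured_cycles)
{
    return (double)measured_cycles / (double)formula_cycles;
}

/* Print the ratio for each function listed in the table above. */
void print_cycle_ratios(void)
{
    static const char *names[] = {
        "DSPF_sp_mat_submat_copy_cplx",
        "DSPF_sp_mat_trans_cplx",
        "DSPF_sp_mat_mul_gemm_cplx"
    };
    static const long formula[]  = { 5270, 3780, 55700 };  /* from the cycles formulas */
    static const long measured[] = { 6692, 4631, 68989 };  /* measured on the EVMK2GX  */
    int i;

    for (i = 0; i < 3; i++) {
        printf("%-30s ratio = %.3f\n", names[i],
               cycle_ratio(formula[i], measured[i]));
    }
}
```

Calling `print_cycle_ratios()` from a small host program gives ratios of roughly 1.27, 1.23, and 1.24, so the overhead is close to 25% in all three cases rather than a fixed number of cycles.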
I modified an example to do the measurements:
void main(void) {
    clock_t t_overhead, t_start, t_stop, t_opt;

    /* ------------------------------------------------------------------- */
    /* Compute the overhead of calling clock twice to get timing info      */
    /* ------------------------------------------------------------------- */
    /* Initialize timer for clock */
    TSCL = 0, TSCH = 0;
    t_start = _itoll(TSCH, TSCL);
    t_stop = _itoll(TSCH, TSCL);
    t_overhead = t_stop - t_start;

    /* ------------------------------------------------------------------- */
    /* Generate random inputs in range (-10, 10).                          */
    /* ------------------------------------------------------------------- */
    UTIL_fillRandSP(ptr_x1, 2 * NR1 * NC1, 10.0);
    DSPF_sp_mat_submat_copy_cplx(ptr_x1, NR1, NC1, 0, NR1, ptr_x2, 1);

    /* ------------------------------------------------------------------- */
    /* Measure the cycle count                                             */
    /* ------------------------------------------------------------------- */
    t_start = _itoll(TSCH, TSCL);
    t_start = _itoll(TSCH, TSCL); /* to remove L1P miss overhead */
    DSPF_sp_mat_submat_copy_cplx(ptr_x1, NR1, NC1, 0, NR1, ptr_x2, 1);
    t_stop = _itoll(TSCH, TSCL);
    t_opt = t_stop - t_start - t_overhead;

    printf("\tNR = %.2d\tNC1 = %.2d\tNC2 = %.2d\toptC: %d\n", NR1, NC1, NC2, t_opt);
}
All data is stored in L2 memory (see attached linker file). The documentation states that there might be differences due to cache misses and similar effects. However, in my case the discrepancy is not a fixed offset: across all functions, the measured cycle count is about 1.25 times the count given by the cycles formula.
What might be the issue?