
Trouble with cycle count on C6418 EVM

Hi, I'm using a C6418 EVM and I'm having trouble getting correct cycle counts. I get two different results from Target/clock/view and from TIMER_getCount(hTimer). The one that seems correct is Target/clock/view, but I can't use that with full optimization. I'm using DSP/BIOS version 5.41.02.14.

Another problem I'm having is that the processor takes far too long to execute simple code. For instance, a for loop like this:

short x[32];
int i;

for (i = 0; i < 32; i++) {
    x[i] = x[i] << 2;
}

takes approximately 1500 cycles according to Target/clock/view. That can't be right.

The cycle-count code I'm using is:

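#include <csl.h>
#include <csl_timer.h>

#define CYCLES_PER_TIMERTICK 8   /* CPU cycles per timer tick (internal clock = CPU/8 on C64x) */

static TIMER_Handle hTimer;
static Uint32 cycle_start, cycle_stop, cycle_overhead;

void cycle_count_init(void)
{
    hTimer = TIMER_open(TIMER_DEVANY, 0);
    /* ctl = 0x2C0: start the counter on the internal clock; period = maximum */
    TIMER_configArgs(hTimer, 0x000002C0, 0xFFFFFFFF, 0x00000000);

    /* Compute the overhead of calling the timer. */
    cycle_start = TIMER_getCount(hTimer);
    /* Read again to remove the L1P miss overhead of the first call. */
    cycle_start = TIMER_getCount(hTimer);
    cycle_stop  = TIMER_getCount(hTimer);
    cycle_overhead = cycle_stop - cycle_start;
}

void cycle_count_start(void)
{
    cycle_start = TIMER_getCount(hTimer);
}

void cycle_count_stop(unsigned int *cycle_ave)
{
    int cycle_diff;      /* signed, so the check below can catch "negative" results */
    unsigned int temp;

    cycle_stop = TIMER_getCount(hTimer);
    cycle_diff = (int)(cycle_stop - cycle_start - cycle_overhead);

    if (cycle_diff > 0) {   /* avoid negative cycle counts */
        temp = (unsigned int)cycle_diff * CYCLES_PER_TIMERTICK;
        /* running average: 5% new sample, 95% previous average */
        *cycle_ave = (unsigned int)(temp * 0.05 + (*cycle_ave) * 0.95);
    }
}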
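
The arithmetic in cycle_count_stop can be checked on a host PC before trusting it on the board. The following is a minimal sketch under stated assumptions: the helper names ticks_to_cycles and ema_update are hypothetical, the factor of 8 is taken from the question's CYCLES_PER_TIMERTICK, and the 0.05/0.95 weights are reproduced as an integer update (1/20 of the difference) so no floating point is involved.

```c
#include <stdint.h>

#define CYCLES_PER_TIMERTICK 8u   /* as in the question: one timer tick = 8 CPU cycles */

/* CPU cycles between two raw timer reads, minus the measured call overhead.
   Unsigned subtraction gives the right tick count even if the 32-bit counter
   wrapped once between the reads. Returns 0 when the difference is not larger
   than the overhead, instead of a huge wrapped value. */
uint32_t ticks_to_cycles(uint32_t start, uint32_t stop, uint32_t overhead)
{
    uint32_t elapsed = stop - start;
    if (elapsed <= overhead)
        return 0;
    return (elapsed - overhead) * CYCLES_PER_TIMERTICK;
}

/* Hypothetical helper: running average with weight 1/20 (0.05) for the newest
   sample, i.e. avg = 0.05 * sample + 0.95 * avg, in truncating integer
   arithmetic so no software floating point is needed on the fixed-point C64x. */
uint32_t ema_update(uint32_t avg, uint32_t sample)
{
    int64_t diff = (int64_t)sample - (int64_t)avg;
    return (uint32_t)((int64_t)avg + diff / 20);
}
```

Because the integer version truncates, it can lag the floating-point formula by up to one cycle per update, which is negligible for counts in the hundreds.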

Is there something I'm doing wrong?

  • Magnus,

    The cycle count you get may not be that far off, depending on the setup used (whether data/code are located in internal L2 SRAM or in SDRAM, whether the cache is enabled, CPU/EMIF speed, etc.).

    - To start with, place your basic example in L2 SRAM and verify the CPU clock speed. Then do the benchmarks.
    The L2 SRAM access time (L1D miss, L2 SRAM hit) is described in Table 4, page 26, of the C64x+ Cache User's Guide (SPRU610).

    Note that it is difficult to benchmark small portions of code because of the pipeline. It is better to benchmark a complete loop that runs a large number of times (n) and then divide the total cycle count by n.

    - The CCS 3.30 simulators (for example, C6416 Cycle Accurate or DM642 Cycle Accurate) may also be useful for analyzing the cache with the Cache Tune utility (see the Cache tuning section under CCS Help > Tutorial > Application Code Tuning).
    There is no C6418 simulator, but for cache analysis either of those two simulators would be fine.

    Hope it helps.

    A.
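
The two suggestions above (keep the kernel in L2 SRAM, and time many iterations rather than one) can be combined into a small harness. This is a sketch, not TI's recommended code: the section names .l2data/.l2text, the memory segment name IRAM, and the helpers read_ticks, average_ticks and bench_shift_loop are all hypothetical, and read_ticks uses the C library clock() as a stand-in so the file also builds on a host. The DATA_SECTION/CODE_SECTION pragmas are TI C6000 compiler pragmas and are only compiled when _TMS320C6X is defined.

```c
#include <stdint.h>
#include <time.h>

#define LEN 32

/* Hypothetical section names: ".l2data" and ".l2text" only land in L2 SRAM if
   the linker command file maps them there, e.g.
       SECTIONS { .l2data > IRAM    .l2text > IRAM }
   where IRAM stands for the internal-memory segment of your configuration. */
#if defined(_TMS320C6X)
#pragma DATA_SECTION(x, ".l2data")
#endif
short x[LEN];

/* Stand-in timestamp so the sketch also builds on a host PC. On the EVM,
   return TIMER_getCount(hTimer) instead. */
static uint32_t read_ticks(void)
{
    return (uint32_t)clock();
}

#if defined(_TMS320C6X)
#pragma CODE_SECTION(shift_loop, ".l2text")
#endif
/* The kernel from the question. */
void shift_loop(short *buf, int len)
{
    int i;
    for (i = 0; i < len; i++)
        buf[i] = (short)(buf[i] << 2);
}

/* Average ticks per call from one start/stop pair around n calls.
   Unsigned subtraction stays correct across a single counter wrap. */
uint32_t average_ticks(uint32_t start, uint32_t stop, uint32_t n)
{
    return n ? (stop - start) / n : 0;
}

/* Time n back-to-back calls so that the timer-read overhead and the pipeline
   fill/drain are spread over many iterations instead of dominating one. */
uint32_t bench_shift_loop(uint32_t n)
{
    uint32_t k, start, stop;
    start = read_ticks();
    for (k = 0; k < n; k++)
        shift_loop(x, LEN);
    stop = read_ticks();
    return average_ticks(start, stop, n);
}
```

On the board, multiply the returned tick count by the ticks-to-cycles factor (8 in the question's setup) to get CPU cycles per call of the loop.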