F28377S benchmarking: timers seem to be off by a factor of 8. What am I doing wrong?

Dear C2000 Champs,

I'm benchmarking some code on the F28375S device and trying to use a timer to measure performance. The results I'm getting seem to be off by a factor of 8. I'm likely doing something wrong but not sure what.

At first, I tried using one of the CPU timers. When those results didn't look right, I discovered the 64-bit free-running IPC counter. I tried using that and the results are the same.

I have pasted my main.c below. To simplify the experiment, I replaced the function I ultimately want to benchmark with a simple series of asm(" NOP") statements. I checked the disassembly and it does contain a series of back-to-back NOPs. I capture a timestamp just before and just after the NOPs and then take the delta. In both cases, I end up with what looks like 8 cycles per NOP instruction. This doesn't seem right; I would have expected 1 cycle per NOP. Could it be due to running out of flash or something like that?

I first ran this with 20 NOPs and measured 176 cycles.

I next ran it with 40 NOPs and measured 336 cycles.

If you take the difference between these two, it should give you the time to run 20 NOPs with the overhead of reading the timers cancelled out. 336-176=160 cycles for 20 NOPs, which comes out to exactly 8 cycles per NOP.
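The arithmetic above can be checked with a short sketch. The helper names are hypothetical, and the inputs are the two measurements quoted above. It assumes the cost per NOP and the fixed timer-read overhead are the same in both runs.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles per instruction from two runs that differ only in how many
   instructions sit between the two timestamp reads. The fixed overhead
   of reading the timer cancels out in the subtraction. */
uint32_t cycles_per_insn(uint32_t cycles_a, uint32_t n_a,
                         uint32_t cycles_b, uint32_t n_b)
{
    assert(n_b > n_a && cycles_b >= cycles_a);
    return (cycles_b - cycles_a) / (n_b - n_a);
}

/* Fixed overhead left in one measurement once the per-instruction
   cost has been removed. */
uint32_t fixed_overhead(uint32_t cycles, uint32_t n, uint32_t per_insn)
{
    assert(cycles >= n * per_insn);
    return cycles - n * per_insn;
}
```

With the numbers above, cycles_per_insn(176, 20, 336, 40) gives 8, and fixed_overhead() gives 16 cycles for both the 20-NOP and the 40-NOP run, so the two measurements are consistent with each other.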


Can anybody explain this?

Thanks

main.c

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#include "InverterControl.h"

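// Parenthesize the dereference so the macros are safe inside larger expressions
#define REG16(addr) (*(volatile uint16_t *)(addr))
#define REG32(addr) (*(volatile uint32_t *)(addr))
// No trailing semicolon in the macro body; the call site supplies it.
// Note: CPU timer TIM counts down, so with this variant the delta would be t0 - t1.
//#define TIMESTAMP()   REG32(0x0C00 + 0x00 /* TIM */)
#define TIMESTAMP()   REG32(0x0005000C /* IPCCOUNTERL */)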

int main(void) {

  volatile uint32_t t0, t1, dt;

  // setup CPU timer 0, will be used for timestamping our code
  REG32(0x0C00 + 0x02 /* PRD */) = 0x0FFFFFFF;
  REG32(0x0C00 + 0x00 /* TIM */) = 0x00000000;
  REG32(0x0C00 + 0x04 /* TCR */) = 0x00000000;
    
  //InverterControlInit();

  t0 = TIMESTAMP();
  //InverterRun_ISR();
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  asm(" NOP");
  t1 = TIMESTAMP();

  dt = (t1 - t0);

  printf("dt=%lu\n", dt);

  while (1) {asm(" NOP");}

  return 0;
}
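One detail worth checking in the listing above is the direction of the counter being read. The following is a minimal sketch, assuming the IPC counter counts up and the CPU timer TIM register counts down from PRD toward zero (worth confirming in the device reference manual). The helper names are hypothetical.

```c
#include <stdint.h>

/* Elapsed ticks for a counter that counts UP (assumed here for the
   low word of the IPC counter). Unsigned subtraction gives the right
   answer even if the 32-bit value wraps once between the two reads. */
uint32_t elapsed_up(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Elapsed ticks for a counter that counts DOWN (assumed here for the
   CPU timer TIM register). Valid as long as the timer does not reach
   zero and reload from PRD between the two reads. */
uint32_t elapsed_down(uint32_t start, uint32_t end)
{
    return start - end;
}
```

In the listing, dt = t1 - t0 matches elapsed_up(), which fits the IPC counter variant of TIMESTAMP(). If the commented-out CPU timer variant is used instead, the subtraction would need to be reversed as in elapsed_down().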

  • Brad,

    It looks like you did not configure the Flash wait states and the Flash data cache, so you are not getting the performance the device can deliver.

    Check a couple of things:
    1) Flash Wait-states (RWAIT field in FRDCNTL register) should be 3 at 200MHz.
    2) Enable data cache in the Flash read path (bit 0 in FRD_INTF_CTRL should be set.)

    Once you configure the above registers as suggested, you should see 1 cycle per NOP. If you leave the default configuration (15 wait states and data cache disabled), it will take 8 cycles per NOP, as you measured. You can use the functions InitFlash_Bank0() and InitFlash_Bank1() provided in the F2837xS_SysCtrl.c file to configure these registers.

    Also note that Flash bank 1 in F2837xS devices incurs one extra wait state compared to Flash bank 0, even for prefetched data. In your case, it looks like you are executing from Flash bank 0, but I wanted to let you know since you might run into it at some point.

    Thanks and regards,
    Vamsi
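The two settings from the reply above can be summarized in a small host-side model. This is only a sketch: the struct and function names are hypothetical stand-ins, not the register layout or API from the TI device headers, and the values are the ones quoted in the reply (defaults of 15 wait states with the data cache off, and 3 wait states with the data cache on at 200MHz).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two settings discussed in the reply.
   This is NOT the real register layout from the TI device headers. */
typedef struct {
    uint16_t rwait;          /* models the RWAIT field in FRDCNTL */
    uint16_t data_cache_en;  /* models the data cache enable in FRD_INTF_CTRL */
} FlashSettingsModel;

/* Defaults as described in the reply: 15 wait states, data cache off. */
FlashSettingsModel flash_model_defaults(void)
{
    FlashSettingsModel m = { 15u, 0u };
    return m;
}

/* Apply the values recommended in the reply for a 200MHz system clock. */
void flash_model_configure_200mhz(FlashSettingsModel *m)
{
    assert(m != NULL);
    m->rwait = 3u;
    m->data_cache_en = 1u;
}
```

On the target, the actual register writes are done by InitFlash_Bank0() and InitFlash_Bank1() in F2837xS_SysCtrl.c, as mentioned in the reply, so application code normally calls those rather than writing the fields directly.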
  • Brad,

    Did you get 1 cycle per NOP with data cache enabled and 3 Flash wait-states @ 200MHz?  

    Thanks and regards,

    Vamsi

  • Yes, I added a call to InitSysCtrl() and now I get the expected results: a single cycle per NOP. Thanks for your help. I can now benchmark my real code with confidence.