I'm trying to use clock() to time loops on a functional simulator for the c66, run via loadti.bat on Cygwin, and I'm getting results that don't make sense. When I run the same code on an older simulator, I get the expected results. I am timing timed_loop(), which contains one trivial loop and no conditional code.
#include <stdio.h>
#include <time.h>

clock_t t_overhead;   /* cost of a back-to-back clock() pair */

clock_t timed_loop(int size)
{
    clock_t t_start, t_stop;
    int i;

    t_start = clock();
#pragma MUST_ITERATE(1)
    for (i = 0; i < size; i++)
    {
        asm(" NOP");
    }
    t_stop = clock();
    return t_stop - t_start - t_overhead;
}

int main(void)
{
    int j;
    clock_t t_start, t_stop;

    /* Calibrate the overhead of calling clock() itself. */
    t_start = clock();
    t_stop = clock();
    t_overhead = t_stop - t_start;

    for (j = 0; j < 10; j++)
    {
        clock_t runtime = timed_loop(1000);
        /* clock_t is cast explicitly so the argument matches %d. */
        printf("%d: #iter=%d: cyc: %d cyc (%.1f cyc/iter)\n",
               j, 1000, (int) runtime, (float) runtime / (float) 1000);
    }
    return 0;
}
The testcase is compiled using cl6x 7.4.2 with no special flags (i.e., "cl6x bug.c -z lnk.cmd rts6200.lib"), so it can be simulated on any target.
Using loadti.bat with a ccxml file for the c674, I get the expected results (i.e., on Cygwin, executing "loadti.bat -c c674.ccxml a.out"):
0: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
1: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
2: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
3: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
4: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
5: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
6: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
7: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
8: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
9: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
With the same executable and a c66 little-endian ccxml file, I get incorrect results (e.g., "loadti.bat -c c66_sim_le.ccxml a.out"):
0: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
1: #iter=1000: cyc: 13015 cyc (13.0 cyc/iter)
2: #iter=1000: cyc: 2329 cyc (2.3 cyc/iter)
3: #iter=1000: cyc: 1367 cyc (1.4 cyc/iter)
4: #iter=1000: cyc: 1367 cyc (1.4 cyc/iter)
5: #iter=1000: cyc: 1367 cyc (1.4 cyc/iter)
6: #iter=1000: cyc: 1367 cyc (1.4 cyc/iter)
7: #iter=1000: cyc: 1367 cyc (1.4 cyc/iter)
8: #iter=1000: cyc: 1367 cyc (1.4 cyc/iter)
9: #iter=1000: cyc: 1367 cyc (1.4 cyc/iter)
5287.ccxml.zip contains the ccxml files used above.
I am observing this behavior with various versions of the loop I am timing. The only loops for which I can repeatedly get correct timing using c66 simulation are software pipelined loops. They yield the expected timing results, even when I time them multiple times (as I did with the loop above).
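To make the discrepancy easy to spot when trying other loop variants, the per-run cycle counts can be summarized instead of compared by eye. The following is only a minimal sketch in portable C: the names run_summary and summarize_runs are hypothetical helpers that are not part of the testcase above, and it assumes the runtimes have already been collected into an array of long.

```c
#include <stddef.h>

/* Hypothetical helper type, not part of the original testcase. */
typedef struct {
    long min_cycles;   /* smallest cycle count seen across the runs */
    long max_cycles;   /* largest cycle count seen across the runs  */
    int  stable;       /* 1 if every run reported the same count    */
} run_summary;

/* Summarize n per-run cycle counts. Assumes n is at least 1. */
static run_summary summarize_runs(const long *cycles, size_t n)
{
    run_summary s;
    size_t i;

    s.min_cycles = cycles[0];
    s.max_cycles = cycles[0];
    for (i = 1; i < n; i++) {
        if (cycles[i] < s.min_cycles) s.min_cycles = cycles[i];
        if (cycles[i] > s.max_cycles) s.max_cycles = cycles[i];
    }
    s.stable = (s.min_cycles == s.max_cycles);
    return s;
}
```

Fed the ten values printed above, the c674 run would come back with stable set to 1, while the c66 run would come back with stable set to 0 and a min/max of 1367 and 13015.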
Is this a known bug? Is there a workaround?