Hello
I want to use the TMS320C6472 for real-time simulation, so I want to check how long one core takes to multiply a 20x20 matrix by a 20x1 column vector. All the data are of type 'int'. I am using the evm6472.gel file that comes with the kit. For now I want to use a single core for the calculation, and I am using core 3. In the GEL file the PLL1 multiplier is set to 25 so that the core runs at 625 MHz.
My linker command file contains:
-c
-lrts64plus.lib
-heap 0xa000
-stack 0x2000
/* Memory Map 1 - the default */
MEMORY
{
L1D: o = 00f00000h l = 00008000h
L1P: o = 00e00000h l = 00008000h
L2: o = 00800000h l = 00098000h
}
SECTIONS
{
.text > L2
.stack > L2
.bss > L2
.cinit > L2
.cio > L2
.const > L2
.data > L2
.switch > L2
.sysmem > L2
.far > L2
}
My main program contains:
#include <stdio.h>
#define CNTLO3 (*((volatile unsigned int *) 0x02610010))
#define CNTHI3 (*((volatile unsigned int *) 0x02610014))
#define PRDLO3 (*((volatile int *) 0x02610018))
#define PRDHI3 (*((volatile int *) 0x0261001C))
#define TCR3 (*((volatile int *) 0x02610020))
#define TGCR3 (*((volatile int *) 0x02610024))
#define EMUMGT_CLKSPD3 (*((volatile int *) 0x02610004))
void main(void)
{
    int sum,ord,i,j,arr[20][20],b[20],x[20];
    PRDHI3=0xFFFFFFFF;
    PRDLO3=0xFFFFFFFF;
    CNTLO3=0x00000000;
    CNTHI3=0x00000000;
    TGCR3=0x00000003;
    TCR3=0x00000000;
    ord=20;
    //generating matrix 'arr' & vector 'b'
    for(i=0;i<ord;i++){
        b[i]=i+1;
        for(j=0;j<ord;j++){
            arr[i][j]=i+j+2;
        }
    }
    TCR3|=0x00000040; //timer starts
    //main matrix by vector multiplication starts
    for(i=0;i<ord;i++){
        sum=0;
        for(j=0;j<ord;j++){
            sum+=(arr[i][j])*b[j];
        }
        x[i]=sum;
    }
    TCR3&=0xFFFFFFBF; //timer stops
    printf("\ncalculation ends");
    printf("\ncounter low:%x",CNTLO3);
    printf("\ncounter high:%x",CNTHI3);
    for(i=0;i<ord;i++)
        printf("\nx[%d]=%d",i,x[i]);
}
Problem:
I get a count of 0xB19, i.e. 2841 in decimal, so the calculation time is 2841*6/625 usec = 27.27 usec. The calculation involves:
20 assignments (sum=0)
400 multiply and accumulate
400 increment (j) and compare
20 increment (i) and compare
With a 625 MHz clock, one clock cycle is 1.6 ns. Even at that speed it takes 27.27 usec to do fewer than 1000 operations, so one operation needs somewhat more than 27.27 ns. Isn't that much longer than it should take? Or am I mistaken somewhere? Please correct me.
The timer is getting the correct frequency. I verified that by generating a timer interrupt every 0xB19 counts with the timer kept in clock mode. The output pin (TIMO2) is available at the HPI DC port, and at that pin I get a 50% duty cycle square wave with a period of about 2*27 usec.
Regards,
AC.