I am trying to use trace on the DM8148 (Cortex-A8) to do some low-level access timing so I can optimize the performance of certain routines. I can't get the trace cycle count to match the cycle counter read on the A8 with the MRC instruction.
Here is my simple test code, which times an access to a CPPI RAM location. The first call to profile_write_time returns 295 ticks (I assume because the instructions are not yet cached), and subsequent calls return 160 to 168 ticks. Yet below is the trace I see for a ~160-tick iteration. As you can see, the 22 cycles for the STR instruction are clearly not 160. Also, the cycle index is occasionally the same number for consecutive instructions. It's as if the cycle index is based on a 100 MHz clock, but I don't know where that would come from. Is there a configuration option for the frequency of the ETB cycle counter? I found some frequency settings for STM, but I'm just accessing the ETB through JTAG.
unsigned int time32(void){
int profile_write_time(register int *address){
printf("%d CPU ticks\n",profile_write_time((int *)0x4A102000u));
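The function bodies are missing from the listing above, so here is a minimal sketch of what they plausibly look like, inferred only from the trace: three reads of the Cortex-A8 cycle counter (the `MRC P15, #0, R0, C9, C13, #0` rows), one store of the address to itself (`STR R12, [R12]`), and a final `t0 + t2 - 2*t1` that subtracts the cost of the timer call. The helper `net_ticks`, the `uint32_t` types, and the non-ARM stub counter are assumptions added for illustration and are not the original code. It also assumes the cycle counter has already been enabled and that the code runs in a privileged mode.

```c
#include <stdint.h>

/* Read the cycle counter via CP15 c9, c13, 0 (the register read by the MRC
   rows in the trace). On non-ARM hosts a stub counter is substituted so the
   arithmetic can still be compiled and exercised; the stub is hypothetical. */
static inline uint32_t time32(void)
{
#if defined(__arm__)
    uint32_t ticks;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(ticks));
    return ticks;
#else
    static uint32_t fake_ticks;      /* host-only stand-in, not real timing */
    fake_ticks += 5u;
    return fake_ticks;
#endif
}

/* Net cost = (t2 - t1) - (t1 - t0). The first interval holds only the
   back-to-back timer calls; the second holds the timer call plus the store.
   Unsigned arithmetic keeps the result correct across counter wraparound. */
static inline uint32_t net_ticks(uint32_t t0, uint32_t t1, uint32_t t2)
{
    return (t2 - t1) - (t1 - t0);
}

/* Time a single store to `address` (assumed to be a valid, writable word). */
uint32_t profile_write_time(volatile uint32_t *address)
{
    uint32_t t0 = time32();
    uint32_t t1 = time32();
    *address = (uint32_t)(uintptr_t)address;   /* matches STR R12, [R12] */
    uint32_t t2 = time32();
    return net_ticks(t0, t1, t2);
}
```

On the target this would be called as in the `printf` line above, with the CPPI RAM address cast to the pointer type.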
| Instruction | Instr Addr | Read Addr | Write Addr | Cycle Index | Cycle Delta |
| --- | --- | --- | --- | --- | --- |
| MOV R12, R0 | 0x80106B54 | | | 697 | 0 |
| BL 0x80106B48 | 0x80106B58 | | | 697 | 3 |
| MRC P15, #0, R0, C9, C13, #0 | 0x80106B48 | | | 700 | 0 |
| BX R14 | 0x80106B4C | | | 700 | 5 |
| MOV R2, R0 | 0x80106B5C | | | 705 | 0 |
| BL 0x80106B48 | 0x80106B60 | | | 705 | 2 |
| MRC P15, #0, R0, C9, C13, #0 | 0x80106B48 | | | 707 | 1 |
| BX R14 | 0x80106B4C | | | 708 | 4 |
| MOV R1, R0 | 0x80106B64 | | | 712 | 22 |
| STR R12, [R12] | 0x80106B68 | | | 734 | 1 |
| | | | 0x4A102000 | 735 | |
| | | | | 735 | 8 |
| BL 0x80106B48 | 0x80106B6C | | | 743 | 1 |
| MRC P15, #0, R0, C9, C13, #0 | 0x80106B48 | | | 744 | 3 |
| SUB R12, R0, R1, LSL #1 | 0x80106B70 | | | 747 | 1 |
| ADD R12, R2, R12 | 0x80106B74 | | | 748 | 0 |
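
For a like-for-like comparison, the same subtraction that the SUB/ADD pair performs on the counter values can be applied to the trace's own Cycle Index column. The sketch below assumes the Cycle Index is a running count sampled at each instruction, and takes the three MRC rows (700, 707, 744) as the trace-side equivalents of the three counter reads. The function names are hypothetical.

```c
#include <stdint.h>

/* Net trace cycles attributed to the store: (i2 - i1) - (i1 - i0), where
   i0, i1, i2 are the Cycle Index values at the three MRC rows. */
static uint32_t trace_net_cycles(uint32_t i0, uint32_t i1, uint32_t i2)
{
    return (i2 - i1) - (i1 - i0);
}

/* CPU ticks per trace cycle, scaled by 100 to stay in integer arithmetic.
   Returns 0 when trace_cycles is 0 to avoid dividing by zero. */
static uint32_t ratio_x100(uint32_t cpu_ticks, uint32_t trace_cycles)
{
    if (trace_cycles == 0u) {
        return 0u;
    }
    return (cpu_ticks * 100u) / trace_cycles;
}
```

With the values from the table, `trace_net_cycles(700, 707, 744)` gives 30 and `ratio_x100(160, 30)` gives 533, so the measured 160 ticks correspond to roughly 5.3 counter ticks per trace cycle rather than one-to-one.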