We wanted to measure the performance of our code by querying an OMAP internal timer.
The DSP CPU runs at 300 MHz, the timer is clocked at 24 MHz.
Executing the following code snippet, in which the timer is queried twice
and the values read are stored in the on-chip shared memory:
*( ( unsigned int * )0x8001c000 ) = TIMER0_ADDR[ TIMER_CNT34 ];
*( ( unsigned int * )0x8001c004 ) = TIMER0_ADDR[ TIMER_CNT34 ];
leads, for example, to the following timer values in shared memory:
0x8001c000: 0x644AEA04 0x644AEA0E
Between the second and the first value there is a difference
of 10 (0x0A) ticks. This means there was a latency of 10*(1/24 MHz) ≈ 417 nsec,
or about 125 DSP cycles at 300 MHz, in the execution of the 2 lines of code.
We would have expected no difference, or a difference of at most 1. It looks
as if the CPU is being stalled by something, perhaps by the data transfer across
the system interconnect.
The associated assembly code is only 4 lines, which do not explain
the effect. Interrupts were disabled during the test.
Do you have an explanation for this observed latency?
Could we have misconfigured our system in a way that leads to such an effect?
(In another thread, I read something like: "... The PRU and DSP config port
are at a similar 'distance' from the SYSCFG module ... Reads will be around
30-40 DSP clock cycles."
Could this be related to what I have observed?)