While running performance tests on the OMAP-L138, we found some timing discrepancies when accessing EMIFA that we cannot explain.
The function we are running is this:
CMP r1,#0
BXEQ lr
MOV r12,#0
GetNextWord:
LDRH r3,[r2,#0]
SUBS r1,r1,#1
STRH r3,[r0,r12]
ADD r12,r12,#2
BNE GetNextWord
BX lr
It takes three arguments: a pointer to the destination buffer (r0), the number of 16-bit words to read (r1), and a fixed source address (r2). The function is executed with interrupts locked.
When the source address is located in EMIFA CS2, the function takes 211us to execute. When the source address is located in shared RAM (0x80000000), the same function takes 23us.
In both cases the destination buffer is located in DDR memory and the number of words to read is 758.
CS2 is configured to produce a read cycle of 70ns, and the EMIFA clock is 100MHz. Turnaround time is set to 2, but that should not matter because we are not switching between reads and writes.
The reads themselves should account for about 53us (70ns*758), plus ~20us for the rest of the loop, so the expected total is ~75us, but we measure 211us.
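To put the gap in per-access terms, here is a small Python sketch of the arithmetic using only the figures quoted above. It assumes the 23us shared-RAM run is a fair estimate of everything in the loop other than the EMIFA read itself, which is only an approximation because that figure also includes the shared-RAM read latency. The variable names are ours and purely illustrative.

```python
# Figures quoted in the post.
N_READS = 758          # number of 16-bit reads
T_EMIFA_US = 211.0     # measured time, source in EMIFA CS2
T_SHARED_US = 23.0     # measured time, source in shared RAM
READ_CYCLE_NS = 70.0   # configured CS2 read cycle
EMIFA_CLK_MHZ = 100.0  # EMIFA clock

# Assumption: the shared-RAM run approximates all loop cost except the EMIFA read.
expected_us = N_READS * READ_CYCLE_NS / 1000.0 + T_SHARED_US

# Time actually attributable to each EMIFA read under that assumption.
per_read_ns = (T_EMIFA_US - T_SHARED_US) * 1000.0 / N_READS

# Portion of each read not explained by the configured 70ns cycle,
# also expressed in EMIFA clock periods (10ns each at 100MHz).
extra_ns = per_read_ns - READ_CYCLE_NS
extra_clocks = extra_ns * EMIFA_CLK_MHZ / 1000.0

print(f"expected total : {expected_us:.1f} us")
print(f"per-read cost  : {per_read_ns:.0f} ns")
print(f"unexplained    : {extra_ns:.0f} ns (~{extra_clocks:.0f} EMIFA clocks)")
```

Under that assumption each EMIFA read costs roughly 248ns instead of the configured 70ns, so about 178ns per access (close to 18 EMIFA clocks) is unaccounted for.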
Is there a "dead" time between read cycles that is introduced by the EMIFA controller?
What takes an extra 136us when EMIFA reads are performed?