I am trying to maximize the throughput to an external 8-bit asynchronous device. I have noticed two problems, but I'm not sure what causes them.
The EMIFA registers are set as follows.
AWCC
MAX_EXT_WAIT: 10, RESERVED: 0, WP: 0, RESERVED: 0
CE2CFG
ASIZE: 0, TA: 0, R_HOLD: 0, R_STROBE: 8, R_SETUP: 0, W_HOLD: 0, W_STROBE: 8, W_SETUP: 0, EW: 1, SS: 0
EMIFA clock is 100 MHz.
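For reference, here is a minimal C sketch of how the CE2CFG field values listed above would combine into one 32-bit register word. The struct and function names are made up for illustration, and the bit positions are an assumption based on the usual TI EMIFA asynchronous configuration register layout, so they should be checked against the reference manual of the actual device.

```c
#include <assert.h>
#include <stdint.h>

/* Field values exactly as listed for CE2CFG in this post. */
struct ce_cfg {
    unsigned ss, ew, w_setup, w_strobe, w_hold;
    unsigned r_setup, r_strobe, r_hold, ta, asize;
};

/* ASSUMPTION: bit positions follow the usual TI EMIFA async layout:
 * SS=31, EW=30, W_SETUP=29:26, W_STROBE=25:20, W_HOLD=19:17,
 * R_SETUP=16:13, R_STROBE=12:7, R_HOLD=6:4, TA=3:2, ASIZE=1:0.
 * Verify against the reference manual before using on hardware. */
static uint32_t ce_cfg_pack(const struct ce_cfg *c)
{
    /* Reject values that do not fit the assumed field widths. */
    assert(c->ss <= 1u && c->ew <= 1u);
    assert(c->w_setup <= 0xFu && c->w_strobe <= 0x3Fu && c->w_hold <= 0x7u);
    assert(c->r_setup <= 0xFu && c->r_strobe <= 0x3Fu && c->r_hold <= 0x7u);
    assert(c->ta <= 0x3u && c->asize <= 0x3u);

    return ((uint32_t)c->ss       << 31) |
           ((uint32_t)c->ew       << 30) |
           ((uint32_t)c->w_setup  << 26) |
           ((uint32_t)c->w_strobe << 20) |
           ((uint32_t)c->w_hold   << 17) |
           ((uint32_t)c->r_setup  << 13) |
           ((uint32_t)c->r_strobe <<  7) |
           ((uint32_t)c->r_hold   <<  4) |
           ((uint32_t)c->ta       <<  2) |
           ((uint32_t)c->asize    <<  0);
}
```

Under that assumed layout, the settings above (EW = 1, W_STROBE = 8, R_STROBE = 8, everything else 0) pack to 0x40800400.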
I've attached two oscilloscope captures of a read access. Blue is chip select, magenta is read, and cyan is wait.
Problem 1:
After Wait is released, another ca. 60 ns are spent doing nothing. No matter how I change the CE2CFG register, this time does not change. R_STROBE is 8, which means 90 ns (R_STROBE + 1 = 9 cycles of 10 ns each), but the whole Wait time is longer than 90 ns anyway.
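To make the 90 ns figure explicit, here is a small C sketch of the arithmetic. It assumes that each setup, strobe, hold and turnaround phase lasts (field value + 1) EMIFA clock cycles, which is how I read the register description. The function names are only illustrative.

```c
#include <assert.h>

/* Nominal length in ns of one phase programmed with `field`.
 * ASSUMPTION: the EMIFA spends (field + 1) clock cycles per phase.
 * Only clocks with a whole-nanosecond period are accepted here. */
static unsigned phase_ns(unsigned field, unsigned clk_mhz)
{
    assert(clk_mhz > 0u && 1000u % clk_mhz == 0u);
    return (field + 1u) * (1000u / clk_mhz);
}

/* Nominal read access length: setup + strobe + hold, without any
 * extension caused by the external wait signal. */
static unsigned read_cycle_ns(unsigned r_setup, unsigned r_strobe,
                              unsigned r_hold, unsigned clk_mhz)
{
    return phase_ns(r_setup, clk_mhz) +
           phase_ns(r_strobe, clk_mhz) +
           phase_ns(r_hold, clk_mhz);
}
```

With the values from this post, phase_ns(8, 100) gives 90 ns for the strobe, read_cycle_ns(0, 8, 0, 100) gives 110 ns for the whole read, and phase_ns(0, 100) gives 10 ns for TA = 0. Since EW is 1, the strobe can be stretched by Wait, so these numbers are lower bounds rather than what the scope has to show.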
Problem 2:
There are almost 200 ns between two consecutive read accesses. The assembler output looks like this:
.L84:
subs fp, fp, #4
ldrb r2, [r3, #0] @ zero_extendqisi2
ldrb r2, [r3, #0] @ zero_extendqisi2
ldrb r2, [r3, #0] @ zero_extendqisi2
ldrb r2, [r3, #0] @ zero_extendqisi2
bne .L84
The loop is even unrolled, so I don't see how the CPU could be the problem. The odd thing is that between consecutive write accesses only ca. 10 ns are wasted (as expected from TA).
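For anyone who wants to reproduce this, a C loop of roughly the following shape should compile to assembler like the listing above. This is a hypothetical sketch, not my exact source: the function name is made up, and `dev` stands for whatever address the CE2 region is mapped to on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads the same byte address `count` times, four reads per loop
 * iteration, and returns the last value read. The volatile qualifier
 * forces the compiler to emit every single byte load. */
static uint8_t emifa_read_burst(volatile const uint8_t *dev, size_t count)
{
    uint8_t last = 0;

    /* The manual unrolling only works for multiples of four. */
    assert(count % 4u == 0u);

    while (count != 0u) {
        last = *dev;
        last = *dev;
        last = *dev;
        last = *dev;
        count -= 4u;
    }
    return last;
}
```

On a host PC the function can be tried against an ordinary volatile variable. On the target, `dev` would point into the asynchronous chip select region instead.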