I’m using StarterWare on a BeagleBone Black (BBB) with the standard MMU and cache initialization values. With a tight main loop, I’m able to continuously toggle a GPIO every 40 nsec. I’ve read other posts (https://e2e.ti.com/support/arm/sitara_arm/f/791/t/202814) that say L1 to L3 or L4 interconnect bus arbitration introduces additional delays.
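For readers who want to reproduce the measurement, here is a minimal sketch of such a tight loop. It assumes the pin is driven through the GPIO_SETDATAOUT and GPIO_CLEARDATAOUT registers (offsets 0x194 and 0x190 on AM335x); the function name and the base-pointer parameter are hypothetical and are not StarterWare APIs.

```c
#include <stdint.h>

/* AM335x GPIO register offsets, in bytes from the module base. */
#define GPIO_CLEARDATAOUT 0x190u
#define GPIO_SETDATAOUT   0x194u

/* Hypothetical helper: write a 32-bit register at byte offset `off`. */
static inline void reg_wr(volatile uint32_t *base, uint32_t off, uint32_t val)
{
    base[off / 4u] = val;
}

/* Toggle one pin `cycles` times. Each cycle issues two writes and no reads.
 * On hardware, gpio_base would point at the GPIO module
 * (e.g. 0x4804C000 for GPIO1, an assumption about the bank in use). */
void gpio_toggle_loop(volatile uint32_t *gpio_base, unsigned pin, unsigned cycles)
{
    const uint32_t mask = 1u << pin;

    while (cycles--) {
        reg_wr(gpio_base, GPIO_SETDATAOUT, mask);   /* drive pin high */
        reg_wr(gpio_base, GPIO_CLEARDATAOUT, mask); /* drive pin low  */
    }
}
```

Passing the base as a pointer also lets the loop run against a plain array on a host PC to check the register offsets before trying it on the board.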
Also significant: 40 nsec is more than 10x faster than what I can achieve with a single EDMA GPIO write. It takes 500 nsec from the signal that triggers EDMA until the DMA write updates the GPIO (the destination address is GPIO_DATAOUT). During that time, the tight main loop is still continuously updating its GPIO, which indicates the delay is not due to L3/L4 write arbitration. To isolate this further, I changed the EDMA source address from DDR memory to an unused ParamSet (repurposed as a buffer), but the delay is the same.
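As a rough illustration of the transfer described above, the sketch below fills one PaRAM set for a single 4-byte write to GPIO_DATAOUT. The struct follows the eight-word hardware PaRAM layout; the type name, the function name, the choice of GPIO1 (0x4804C13C) and the use of the STATIC option are all assumptions for illustration, not the StarterWare API or necessarily the configuration used in the measurement.

```c
#include <stdint.h>

/* One EDMA3 PaRAM set: eight 32-bit words, in hardware order. */
typedef struct {
    uint32_t opt;          /* transfer options                   */
    uint32_t src;          /* source address                     */
    uint32_t a_b_cnt;      /* ACNT [15:0],    BCNT [31:16]       */
    uint32_t dst;          /* destination address                */
    uint32_t src_dst_bidx; /* SRCBIDX [15:0], DSTBIDX [31:16]    */
    uint32_t link_bcntrld; /* LINK [15:0],    BCNTRLD [31:16]    */
    uint32_t src_dst_cidx; /* SRCCIDX [15:0], DSTCIDX [31:16]    */
    uint32_t ccnt;         /* CCNT [15:0]                        */
} edma_param_t;

#define EDMA_OPT_STATIC    (1u << 3)    /* PaRAM not updated after the TR */
#define EDMA_LINK_NULL     0xFFFFu      /* no linked PaRAM set            */
#define GPIO1_DATAOUT_ADDR 0x4804C13Cu  /* assumption: GPIO1 bank is used */

/* Hypothetical helper: describe one A-synchronized, 4-byte transfer.
 * Fill a local copy, then copy it into the PaRAM RAM of the channel. */
void edma_fill_single_word(edma_param_t *p, uint32_t src_addr, uint32_t dst_addr)
{
    p->opt          = EDMA_OPT_STATIC; /* same transfer on every event  */
    p->src          = src_addr;        /* DDR buffer or spare ParamSet  */
    p->a_b_cnt      = (1u << 16) | 4u; /* BCNT = 1, ACNT = 4 bytes      */
    p->dst          = dst_addr;        /* e.g. GPIO1_DATAOUT_ADDR       */
    p->src_dst_bidx = 0u;              /* no stepping between arrays    */
    p->link_bcntrld = EDMA_LINK_NULL;  /* BCNTRLD = 0, null link        */
    p->src_dst_cidx = 0u;
    p->ccnt         = 1u;              /* one frame                     */
}
```

With ACNT = 4 and BCNT = CCNT = 1, each trigger event produces exactly one 32-bit write, so the 500 nsec figure would cover trigger recognition, the source read and the destination write for this one word.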
Not toggling GPIO in the tight main loop had no effect on the EDMA GPIO update time.
I then inserted a single read of an unrelated register (TIMER4 TCRR) in the tight main loop, and the GPIO toggle period increased from 40 nsec to 280 nsec. It appears that a read, or the transition from write to read, inserts a significant delay (flushes?), even when reading directly into a core register (e.g. R1) rather than a DDR memory variable. These delays are very consistent.
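A minimal sketch of the modified loop, under the same assumptions as before (SETDATAOUT/CLEARDATAOUT writes, hypothetical function name). The only change is one peripheral read per cycle; TCRR sits at offset 0x3C in the AM335x DMTimer, and on hardware timer_base would point at TIMER4 (0x48044000).

```c
#include <stdint.h>

/* AM335x register offsets, in bytes from each module base. */
#define GPIO_CLEARDATAOUT 0x190u
#define GPIO_SETDATAOUT   0x194u
#define TIMER_TCRR        0x3Cu

/* Toggle one pin `cycles` times with one timer read per cycle.
 * Returns the last TCRR value so the read cannot be optimized away
 * and stays in a core register rather than going to DDR. */
uint32_t gpio_toggle_with_read(volatile uint32_t *gpio_base,
                               volatile uint32_t *timer_base,
                               unsigned pin, unsigned cycles)
{
    const uint32_t mask = 1u << pin;
    uint32_t last = 0u;

    while (cycles--) {
        gpio_base[GPIO_SETDATAOUT / 4u]   = mask; /* write: pin high      */
        gpio_base[GPIO_CLEARDATAOUT / 4u] = mask; /* write: pin low       */
        last = timer_base[TIMER_TCRR / 4u];       /* read: must return data */
    }
    return last;
}
```

One possible reading, which is an assumption and not something measured here, is that the writes can be issued without waiting for completion while the read has to wait for its data to come back across the interconnect, so the loop stalls once per cycle.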
Based on these observations, shouldn’t EDMA be able to at least match what can be achieved with a programmed loop? What causes EDMA to respond so slowly? EDMA is on L3 and has its own ParamSet memory, so shouldn’t L3 (EDMA) to L4 (GPIO) be faster than L1 (program loop) to L4 (GPIO)?