GPIO Bit Bang Too Fast!

Ray Martin1

I’m using StarterWare on the BeagleBone Black (BBB) with the standard MMU and cache initialization values. With a tight main loop, I’m able to continuously toggle GPIO every 40 nsec. I’ve read other posts (https://e2e.ti.com/support/arm/sitara_arm/f/791/t/202814) that say L1 to L3 or L4 interconnect bus arbitration introduces additional delays.
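For readers who want to reproduce the measurement, a tight toggle loop of this kind might look like the minimal sketch below. It is not the original code from this post: the helper name `gpio_pulse` is hypothetical, and using the GPIO_SETDATAOUT/GPIO_CLEARDATAOUT registers (rather than writing GPIO_DATAOUT directly) is an assumption. The register offsets are the AM335x GPIO module offsets; the base address is passed in so the same function can be pointed at a plain buffer on a host for checking.

```c
#include <stdint.h>

/* AM335x GPIO module register offsets, in bytes. */
#define GPIO_CLEARDATAOUT 0x190u
#define GPIO_SETDATAOUT   0x194u

/*
 * Hypothetical helper: drive the pins in `mask` high then low, `count` times.
 * `gpio` is the base of a GPIO module (for example 0x4804C000 for GPIO1 on
 * the target; which bank the original post used is not stated).
 * Only writes are issued, so nothing in the loop waits on a read.
 */
static void gpio_pulse(volatile uint32_t *gpio, uint32_t mask, uint32_t count)
{
    volatile uint32_t *set = gpio + GPIO_SETDATAOUT / 4u;
    volatile uint32_t *clr = gpio + GPIO_CLEARDATAOUT / 4u;

    while (count--) {
        *set = mask;   /* pins high */
        *clr = mask;   /* pins low  */
    }
}
```

On the target this would be called with the real module base cast to `volatile uint32_t *`, with the loop small enough to run entirely from the instruction cache.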

Also significant: 40 nsec is more than 10x faster than what I can achieve with a single EDMA GPIO write. It takes 500 nsec from the signal that triggers EDMA until the DMA write updates the GPIO (the destination address is GPIO_DATAOUT). During that time, the tight main loop is still continuously updating its GPIO, which indicates the delay is not due to L3/L4 write arbitration. In an attempt to isolate this further, I changed the EDMA source address from DDR memory to an unused ParamSet (repurposed as a buffer), but the delay is the same.

Not toggling GPIO in the tight main loop had no effect on the EDMA GPIO update time.
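To make the EDMA setup concrete, here is a rough sketch of a PaRAM set describing one 32-bit write to GPIO_DATAOUT. It is illustrative only and not the configuration used in this post: the struct and function names are hypothetical, GPIO1 is assumed as the target bank, and OPT is left at 0 as a placeholder (the real value depends on the chaining and completion options in use). The eight-word layout and the ACNT/BCNT/LINK field positions follow the EDMA3 PaRAM description in the AM335x TRM.

```c
#include <stdint.h>

/* One EDMA3 PaRAM set: eight 32-bit words. */
typedef struct {
    uint32_t opt;
    uint32_t src;
    uint32_t a_b_cnt;       /* ACNT in bits 15:0, BCNT in bits 31:16 */
    uint32_t dst;
    uint32_t src_dst_bidx;
    uint32_t link_bcntrld;  /* LINK in bits 15:0, BCNTRLD in bits 31:16 */
    uint32_t src_dst_cidx;
    uint32_t ccnt;
} edma_param_t;

#define GPIO1_BASE      0x4804C000u  /* assumption: bank not named in the post */
#define GPIO_DATAOUT    0x13Cu
#define EDMA_NULL_LINK  0xFFFFu

/* Hypothetical helper: describe a single 4-byte copy from src_addr to GPIO_DATAOUT. */
static void edma_single_word_to_gpio(edma_param_t *p, uint32_t src_addr)
{
    p->opt          = 0u;                      /* placeholder, see note above */
    p->src          = src_addr;
    p->a_b_cnt      = (1u << 16) | 4u;         /* BCNT = 1, ACNT = 4 bytes */
    p->dst          = GPIO1_BASE + GPIO_DATAOUT;
    p->src_dst_bidx = 0u;                      /* single element, no stride */
    p->link_bcntrld = EDMA_NULL_LINK;          /* no linked PaRAM set */
    p->src_dst_cidx = 0u;
    p->ccnt         = 1u;
}
```

The structure is filled in ordinary RAM here and would then be copied into the channel's PaRAM entry. Changing only `src_addr` corresponds to the DDR versus ParamSet source experiment described above.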

I then inserted a single read of an unrelated register (TIMER4 TCRR) in the tight main loop and the GPIO toggle period slowed from 40 nsec to 280 nsec. It appears that a read, or the transition from write to read, inserts a significant delay (flushes?), even when reading directly into a core register (e.g. R1) rather than into a DDR memory variable. These delays are very consistent.
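The experiment above can be sketched as the same toggle loop with one peripheral read added. Going from 40 nsec to 280 nsec puts the cost of that single read at roughly 240 nsec per iteration. One plausible explanation, not confirmed in this thread, is that writes to peripheral space can be posted while a read has to wait for its data to come back across the interconnect. The function name and the position of the read inside the loop are assumptions; 0x3C is the TCRR offset within an AM335x DMTimer module.

```c
#include <stdint.h>

#define GPIO_CLEARDATAOUT 0x190u
#define GPIO_SETDATAOUT   0x194u
#define TIMER_TCRR        0x3Cu   /* DMTimer counter register offset */

/*
 * Hypothetical variant of the toggle loop with one timer read per iteration.
 * `gpio` and `timer` are module base addresses on the target, or plain
 * buffers on a host. Returns the last counter value read (0 if count is 0).
 */
static uint32_t gpio_pulse_with_read(volatile uint32_t *gpio,
                                     const volatile uint32_t *timer,
                                     uint32_t mask, uint32_t count)
{
    uint32_t last = 0u;

    while (count--) {
        gpio[GPIO_SETDATAOUT / 4u] = mask;    /* write: pins high */
        last = timer[TIMER_TCRR / 4u];        /* read: must wait for the data */
        gpio[GPIO_CLEARDATAOUT / 4u] = mask;  /* write: pins low */
    }
    return last;
}
```

Because the value is returned in a register, the sketch matches the observation that the slowdown appears even when the result never touches DDR.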

Based on these observations, shouldn’t EDMA be able to at least match what can be achieved with a programmed loop? What causes EDMA to respond so slowly? EDMA is on L3 and has its own ParamSet memory, so shouldn’t L3 (EDMA) to L4 (GPIO) be faster than L1 (program loop) to L4 (GPIO)?

over 9 years ago

0 Biser Gatchev-XID over 9 years ago

TI__Guru 393215 points

Moving this to the Starterware forum.

0 Lalindra Jayatilleke over 9 years ago

TI__Mastermind 30365 points

Ray,

We are looking into your query and will post a response. Thanks for your patience.

Lali

0 Ray Martin1 over 9 years ago in reply to Lalindra Jayatilleke

Intellectual 290 points

I used one of the PRUs to probe when the event I was using to trigger EDMA actually occurred. In my case, the trigger event was the EQEP PCU flag in the QFLG register, but (according to PRU polling) it wasn’t being set until 250 nsec after EQEP2A_IN transitioned, which is my external timing reference point. This explains half of the 500 nsec EDMA delay I observed earlier. I also noticed that a transfer’s position in my DMA chain added further delay (even though I set TCCMOD_EARLY).

So, like others who have reported delay issues, I conclude that small transfers are not well served by EDMA, not even when the source, destination and trigger event are all peripheral registers.

Fortunately we have the PRUs. They give direct access to everything plus they have their own high speed GPIO so you can create your own “peripheral”. Definite learning curve but worth the investment.

0 Ray Martin1 over 9 years ago in reply to Ray Martin1

Intellectual 290 points

We are now using the PRU to trigger EDMA (pr1_host[7]). Our measurements indicate that the minimum time until the first EDMA transfer in our chain completes is 200 nsec. This sometimes increases to as much as 800 nsec. I think this is the best that can be achieved when crossing the L3/L4 interconnect. For anything faster than this, we use PRU GPIO, which has awesome speed and no latency.

I’m also impressed that ARM code can toggle GPIO every 40 nsec. For small loops, running code entirely from cache and using register variables essentially eliminates execution time.
