Question on behalf of the customer
I recently noticed that there are differences in the access latencies of PRU subcomponents. I had previously assumed that everything inside the PRU can be accessed with a latency of one clock cycle, so an lbbo/lbco instruction fetching at most 4 bytes would finish in 3 cycles (1 cycle opcode execution + 1 cycle component access latency + 1 cycle read delay).
This does not seem to hold for components such as the IEP: I am observing, for example, an access latency of 10 cycles(!) to IEP registers. I am using the following instruction to read the current IEP timer value:
; .asg "C26", C_IEP_BASE
; RA_IEP_TMR_COUNT .set 0x0C
; C26 is configured to point to the "IEP" register block (0x2E000)
lbco &r14, C_IEP_BASE, RA_IEP_TMR_COUNT, 4
and I measure a total latency of 12 cycles with the PRU cycle counter (1 "active" cycle for opcode execution + 11 "stall" cycles). Other subcomponents, such as the "Control" block of the PRU (which contains the PRU cycle counter), also seem to have a larger access latency than expected: there it appears to be 2 cycles (again measured using the PRU cycle counter).
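To make the arithmetic behind these numbers explicit, here is a minimal sketch of the latency model I had assumed (1 cycle opcode execution + access latency + 1 cycle read delay). It is only an illustration of my assumption, not an official description of the PRU pipeline, and the function and constant names are hypothetical.

```python
# Sketch of the assumed latency model; names are illustrative only.
OPCODE_CYCLES = 1       # 1 "active" cycle for opcode execution
READ_DELAY_CYCLES = 1   # 1 cycle read delay (assumed by the model)


def expected_total(access_latency: int) -> int:
    """Total cycles the model predicts for a read of at most 4 bytes."""
    return OPCODE_CYCLES + access_latency + READ_DELAY_CYCLES


def implied_access_latency(active: int, stall: int) -> int:
    """Access latency implied by a measured active + stall cycle count."""
    total = active + stall
    return total - OPCODE_CYCLES - READ_DELAY_CYCLES


if __name__ == "__main__":
    # Original assumption: 1 cycle access latency everywhere.
    print("assumed total:", expected_total(1))
    # IEP measurement from above: 1 active + 11 stall cycles.
    print("implied IEP access latency:", implied_access_latency(1, 11))
```

Under this model, the measured 12 cycles correspond to the 10 cycles of access latency quoted above, compared with the 3 cycles total that I had originally expected.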
Thus, I would greatly appreciate a list of access latencies, not only for the different types of memory but also for the various PRU-internal subcomponents such as CTRL, INTC, IEP, UART, etc.
It would be great to have some official numbers on this from TI.