Hello,
I have been running performance tests on the OMAP-L138 PRU to determine how quickly it can access the various kinds of memory it can reach (PRU DRAM, DDR, memory from the two cores). My tests show that each 32-bit write takes roughly one PRU cycle regardless of the type of memory, while each 32-bit read takes 1-5 PRU cycles depending on the type of memory.
My question is: why are writes faster than reads? I'm guessing that the PRU puts the write request onto the SCR (switched central resource) and continues execution without waiting for a response, whereas for a read it must wait for the data to come back. Am I on the right track here? Is there somewhere I could go to read more about how this works?
In case it's relevant, the PRU code I am using for testing is essentially the following:
LD32 r1, pruCycleCountRegister
SBBO r14, memAddr, 0, 0x40 // or LBBO for read test
SBBO r14, memAddr, 0x40, 0x40 // or LBBO for read test
// many similar SBBOs or LBBOs
LD32 r2, pruCycleCountRegister
// divide (r2 - r1) by the number of 32-bit accesses to get PRU cycles per access
// (each 0x40-byte SBBO/LBBO counts as 16 32-bit accesses)
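To make the arithmetic in that last comment concrete, here is a minimal C sketch of how the per-access figure can be worked out on the host side once the two counter samples have been read back. It is only an illustration, not the exact code I ran: the function name `cycles_per_access` and its parameters are hypothetical, and it assumes the cycle counter ran continuously between the two samples.

```c
#include <stdint.h>

/*
 * Hypothetical helper: average PRU cycles per 32-bit access.
 *   start, end      - cycle counter samples (r1 and r2 in the test above)
 *   num_bursts      - number of SBBO/LBBO instructions between the samples
 *   bytes_per_burst - bytes moved by each instruction (0x40 in the test above)
 * Assumes the counter ran continuously between the two samples.
 */
static double cycles_per_access(uint32_t start, uint32_t end,
                                uint32_t num_bursts, uint32_t bytes_per_burst)
{
    uint32_t elapsed  = end - start;                       /* unsigned 32-bit difference */
    uint32_t accesses = num_bursts * (bytes_per_burst / 4u); /* 4 bytes per 32-bit access */

    if (accesses == 0u) {
        return 0.0; /* nothing measured; avoid dividing by zero */
    }
    return (double)elapsed / (double)accesses;
}
```

For example, with made-up samples of 1000 and 2600 around 100 bursts of 0x40 bytes, this gives 1600 cycles over 1600 accesses, i.e. one cycle per access. Note that the result also includes the cost of taking the second counter sample, so it is slightly pessimistic for short runs.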
Thanks very much to anyone who can help.
-Emily