Compiler/PROCESSOR-SDK-AM437X: PRU shared RAM access efficiency

user5293216

Tool/software: TI C/C++ Compiler

I am currently running some tests on the PRU-ICSS cores, using the PRU to generate PTO (pulse train output):

Our PRU subsystem has four cores, and every core reads data from the shared RAM.

I use the LBBO assembly instruction to read data from the SRAM. For a single core, a read takes just 2 cycles. If all four cores run at the same time, does bus arbitration affect performance?

On the other hand, if I use LBCO with the constant table to access the SRAM, does that improve efficiency?

over 8 years ago

Biser Gatchev-XID over 8 years ago

Are you asking about the 32kB shared data RAM inside the PRU?

user5293216 over 8 years ago in reply to Biser Gatchev-XID

Yes.
The ARM calculates the PTO profile parameters and writes them to the 32 KB shared RAM,
and the PRU-ICSS cores read the parameters and then perform the corresponding move with PTO.

Biser Gatchev-XID over 8 years ago in reply to user5293216

Thanks. I have notified the PRU experts. They will respond here.

melissaw over 8 years ago in reply to Biser Gatchev-XID

Hi,

The PRU-ICSS implements bus arbitration on the command phase, but it does not preempt during the active data phase. Below are additional details about how the PRU-ICSS handles each of these phases.

Command Phase

When multiple cores access the SRAM simultaneously, the SRAM arbitrates between them with the following priority (1 = highest priority, 4 = lowest priority):

1. PRU-ICSS1 PRU0

2. PRU-ICSS1 PRU1

3. PRU-ICSS0 PRU0/1

4. External host (i.e. ARM, eDMA, etc.)

* Please note that this priority applies only to the SRAM. Other resources in the PRU-ICSS may have a different priority order.

Data Phase

For example, if an external host (priority 4) accesses the SRAM one cycle before the PRU-ICSS0 PRU0 core (priority 3) tries to, the host's access blocks the PRU core until it is complete, even though the PRU core has the higher priority. The wait time depends on the amount of data being accessed in that transaction, so limiting SRAM accesses to 4 bytes per transaction will minimize the wait time.

As for the constant table, using LBCO saves one PRU cycle, since you don't have to preload the address into a register. However, if multiple cores access the SRAM at the same time, you would still experience the blocking scenario described above.

Regards,

Melissa
