Compiler/PROCESSOR-SDK-AM437X: PRU shared RAM access efficiency

Part Number: PROCESSOR-SDK-AM437X

Tool/software: TI C/C++ Compiler

Hi

I am currently running some tests on the PRU-ICSS cores, using the PRUs to generate the PTO:

Our PRU subsystem has 4 cores, and every core reads its data from the shared RAM.

I use the LBBO assembly instruction to read the data from the SRAM. With a single core running, each read costs just 2 cycles. If all four cores run at the same time, does bus arbitration affect the performance?

On the other hand, if I use LBCO with the constant table to access the SRAM, does that improve the efficiency?

  • Are you asking about the 32kB shared data RAM inside the PRU?
  • Yes.
    The ARM calculates the PTO profile parameters and writes them to the 32 KB shared RAM.
    The PRU-ICSS cores then fetch the parameters and perform the corresponding moves with the PTO.
  • Thanks. I have notified the PRU experts. They will respond here.
  • Hi,

    The PRU-ICSS implements bus arbitration on the command phase, but it does not preempt during the active data phase.  Below are additional details about how the PRU-ICSS handles each of these phases.

    Command Phase

    When multiple cores access the SRAM simultaneously, the PRU-ICSS arbitrates between them with the following priority (1 = highest priority, 4 = lowest priority):

    1. PRU-ICSS1 PRU0

    2. PRU-ICSS1 PRU1

    3. PRU-ICSS0 PRU0/1

    4. External host (i.e. ARM, eDMA, etc.)

    * Please note this priority is only applicable for the SRAM.  Other resources in the PRU-ICSS may have a different priority order.

    Data Phase

    If an external host (priority 4), for example, accesses the SRAM one cycle before the PRU-ICSS0 PRU0 (priority 3) core tries to access it, the host’s access will block the PRU core (even though the PRU core has a higher priority) until that access is complete.  The wait time will depend on the amount of data being accessed in that transaction.  Limiting the SRAM accesses to 4 bytes per transaction will minimize the wait time.
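The two phases described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not a cycle-accurate description of the hardware: the `sram_req` type and the `arbitrate` and `wait_cycles` helpers are hypothetical names, the array index stands in for the priority (index 0 is highest), and the data-phase length in cycles is an illustrative number that is simply assumed to grow with the amount of data in the transaction.

```c
/* One pending SRAM access. The array index is the priority (0 = highest). */
typedef struct {
    int arrival;  /* cycle at which the requester issues its command       */
    int length;   /* assumed data-phase duration in cycles (illustrative)  */
    int start;    /* cycle at which the access was granted, -1 until then  */
} sram_req;

/*
 * Model of the behaviour described in the reply:
 *  - command phase: when the bus is idle, the highest-priority pending
 *    request wins;
 *  - data phase: once granted, an access runs to completion and is never
 *    preempted, even by a higher-priority requester.
 */
void arbitrate(sram_req req[], int n)
{
    int now = 0;
    int done = 0;

    for (int i = 0; i < n; i++)
        req[i].start = -1;

    while (done < n) {
        int winner = -1;

        /* Lowest index that is pending and has already arrived wins. */
        for (int i = 0; i < n; i++) {
            if (req[i].start < 0 && req[i].arrival <= now) {
                winner = i;
                break;
            }
        }

        if (winner < 0) {   /* nothing pending yet: bus stays idle */
            now++;
            continue;
        }

        req[winner].start = now;
        now += req[winner].length;  /* bus is held for the whole data phase */
        done++;
    }
}

/* Cycles a requester spent stalled before its access was granted. */
int wait_cycles(const sram_req *r)
{
    return r->start - r->arrival;
}
```

In this model, a low-priority request that arrives at cycle 0 with an 8-cycle data phase stalls a higher-priority request arriving at cycle 1 for 7 cycles, whereas the same low-priority request with a 1-cycle data phase does not stall it at all. That is the reasoning behind keeping each transaction short.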

    As for the constant table, the constant table will save one PRU cycle (since you don’t have to pre-load the address in a register).  However, if multiple cores are accessing the SRAM at the same time, then you would still experience the scenario described above.
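To put the one-cycle saving in perspective, the cycle accounting can be sketched as below. The function names are hypothetical, the one-cycle address preload is taken from the reply above, and the 2 cycles per read is the single-core figure reported in the question; arbitration stalls are left out. The sketch also assumes that the LBBO base register is loaded once and then reused for every read, in which case the preload is paid once rather than per access.

```c
/*
 * Total cycles for n back-to-back SRAM reads using LBBO.
 * Assumption: the base address register is loaded once (preload cycles)
 * and then reused for all n reads.
 */
unsigned lbbo_total_cycles(unsigned n, unsigned preload, unsigned per_read)
{
    return (n == 0) ? 0 : preload + n * per_read;
}

/*
 * Total cycles for n back-to-back SRAM reads using LBCO.
 * The base address comes from the constant table, so no preload is needed.
 */
unsigned lbco_total_cycles(unsigned n, unsigned per_read)
{
    return n * per_read;
}
```

With these assumptions a single read costs 3 cycles with LBBO and 2 with LBCO, while 100 reads cost 201 versus 200 cycles. Under this model the constant table helps most when the base address would otherwise have to be reloaded often.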

    Regards,

    Melissa