[FAQ] PRU Arbitration Delay

Nick Saulnier

I am writing a PRU application for a PRU-ICSS device (AM261x, AM263Px, AM263x, AM335x, AM437x, AM57x)**, PRUSS device (AM62x), or a PRU_ICSSG device (AM24x, AM64x, AM65x).

Arbitration delay can add clock cycles to a PRU read or write command. I want to calculate how much arbitration delay my system may experience. How do I do it?

-------------------------------------------------------------------------------------------------------------------------

This FAQ is an update to the previous FAQ [FAQ] PRU: How do I calculate read and write latencies? This FAQ focuses exclusively on the concept of arbitration delay. For information about calculating read & write latencies, refer to [FAQ] PRU Read & Write Latencies.

This FAQ is a work-in-progress! If you are reading this while it still has the work-in-progress label, feel free to create a new e2e thread to chat with us about the latest updates.

** I have only verified this information for the PRU-ICSS design on AM261x, AM263Px, AM263x. I would expect AM335x, AM437x, AM57x to behave similarly, but these are older PRU-ICSS designs, so it is possible that the internal bus structure changed between those devices and the more recent PRU-ICSS devices.

4 months ago

+1 Nick Saulnier 4 months ago

TI__Guru** 110030 points

What kinds of busses are inside the PRU subsystem?

Before we start talking about arbitration delay, we need to understand the bus architecture within the PRU subsystem.

There are two kinds of busses within the PRU architecture: VBUSP and VBUSM. These busses may also be called VBUS, cbass, CBASS, or sometimes just CBA.

No arbitration if accessing different endpoints/outputs

Both VBUSP and VBUSM are "fully switched". That means that multiple initiators can use the CBASS simultaneously, as long as the initiators are accessing different endpoints.

An "initiator" can be a PRU core, an XFR2VBUS instance, or an entity from outside the PRU subsystem.

An endpoint can be a peripheral, a memory region, or even a grouping of registers.

For example, PRU0 can access DMEM0 at the same time that PRU1 is accessing the PRU subsystem's hardware UART.

Separate memories act as separate endpoints. For example, one initiator could access DMEM0, one initiator could access DMEM1, and a third initiator could access SMEM simultaneously.

Accesses to the same endpoint: behavior for VBUSP & VBUSM is different

If multiple initiators attempt to access the same endpoint on the same clock cycle, or if one initiator attempts to access an endpoint while another initiator is already using that endpoint, then arbitration occurs. Arbitration delay is the one or more cycles that an initiator must wait before it is allowed to read from or write to the endpoint.
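
As a rough mental model of the two rules above (different endpoints: no arbitration, same endpoint: arbitration), the sketch below groups same-cycle accesses by endpoint and reports which endpoints would need arbitration. The function name is hypothetical and the initiator/endpoint names are just labels borrowed from the examples in this post. This is not a description of how the hardware is implemented.

```python
from collections import defaultdict


def endpoints_needing_arbitration(accesses):
    """Return {endpoint: [initiators]} for endpoints hit by more than one initiator.

    accesses: list of (initiator, endpoint) pairs issued in the same clock cycle.
    """
    by_endpoint = defaultdict(list)
    for initiator, endpoint in accesses:
        by_endpoint[endpoint].append(initiator)
    # Only endpoints shared by two or more initiators need arbitration.
    return {ep: inits for ep, inits in by_endpoint.items() if len(inits) > 1}


# Different endpoints in the same cycle: nothing to arbitrate.
no_conflict = endpoints_needing_arbitration(
    [("PRU0", "DMEM0"), ("PRU1", "UART")]
)

# Two initiators hit DMEM0 in the same cycle: arbitration on DMEM0 only.
conflict = endpoints_needing_arbitration(
    [("PRU0", "DMEM0"), ("R5F", "DMEM0"), ("PRU1", "SMEM")]
)
```

How long the losing initiator waits depends on whether the endpoint sits on a VBUSP or a VBUSM.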

Arbitration behavior for VBUSP endpoints is different than arbitration behavior for VBUSM endpoints.

We will discuss the details more after looking at the block diagrams.

PRU-ICSS & PRUSS: External access path

This is the path that read & write commands take when accessing peripherals or memory outside of the PRU subsystem.

PRU-ICSS & PRUSS: Internal access path

This is the path that read & write commands take when accessing peripherals or memory within the PRU subsystem.

PRU_ICSSG: External access path

This is the path that read & write commands take when accessing peripherals or memory outside of the PRU subsystem.

PRU_ICSSG: Internal access path

This is the path that read & write commands take when accessing peripherals or memory within the PRU subsystem.

How do simultaneous reads & writes occur on a VBUSP bus?

Each output port allows one in-progress command.

That command can be a read, or a write.

“Head of line” blocking occurs.

Reads can block writes, and vice versa.

There is no "line" that keeps track of the next read or write to execute, because there is no command FIFO per output port on a VBUSP. The highest priority initiator will go next as soon as the output port becomes free. This can lead to starving lower priority initiators. For example, an R5F core trying to write to DMEM0 could be continuously preempted by PRU0 & PRU1.
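
To make the starvation scenario concrete, here is a toy simulation of a single VBUSP output port. It is only a sketch under loud assumptions: the priority order (PRU0 > PRU1 > R5F) is made up for illustration since the real relative priorities are still a TODO, every access is assumed to occupy the port for exactly 1 clock, and the function name is hypothetical.

```python
def simulate_vbusp_port(requests, priority, cycles):
    """Toy model: one VBUSP output port, fixed priority, no command FIFO.

    requests: {initiator: set of cycles in which it issues a 1-clock access}
    priority: list of initiators, highest priority first (ASSUMED order)
    Returns {initiator: list of cycles in which it was granted the port}.
    """
    waiting = set()  # initiators currently stalled on this port
    grants = {name: [] for name in priority}
    for cycle in range(cycles):
        for name in priority:
            if cycle in requests.get(name, ()):
                waiting.add(name)
        # The highest-priority waiting initiator wins, regardless of
        # how long a lower-priority initiator has already been waiting.
        for name in priority:
            if name in waiting:
                waiting.remove(name)
                grants[name].append(cycle)
                break
    return grants


priority = ["PRU0", "PRU1", "R5F"]  # assumption, not verified
requests = {
    "PRU0": set(range(0, 10, 2)),  # cycles 0, 2, 4, 6, 8
    "PRU1": set(range(1, 10, 2)),  # cycles 1, 3, 5, 7, 9
    "R5F": {0},                    # one write requested at cycle 0
}
grants = simulate_vbusp_port(requests, priority, cycles=12)
r5f_wait = grants["R5F"][0] - 0  # clocks the R5F write was held off
```

In this model the R5F request issued at cycle 0 is not granted until the PRU cores stop accessing the port.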

TODO: verify relative priority of each initiator for each bus

A PRU/XFR2VBUS write of N [output_port_data_width]-bit words blocks the VBUSP port for N PRU clocks, until the data has exited the PRU subsystem.

A write to a VBUSP output port blocks BOTH write and read commands from getting executed to that output port until the write has completed the N clock cycles. This is DIFFERENT from VBUSM. On the VBUSM, an ongoing write will not block a read command from starting to execute.

Let's use a 64-Byte write on PRU_ICSSG as an example. Slice 0 VBUSP has an output port data bus width of 32 bits, so a 64-Byte write would block the output port for (64 Bytes x 8 bits/Byte / 32 bits/clock) = 16 clocks. On the other hand, XFR2VBUS TX VBUSP has an output port data bus width of 64 bits, so a 64-Byte write would block the output port for 8 clocks.
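
The same arithmetic as a small helper, in case you want to plug in your own transfer sizes. The function name is hypothetical, and rounding a partial word up to a full clock is my assumption rather than something stated above.

```python
def vbusp_write_block_clocks(num_bytes, port_width_bits):
    """PRU clocks that a write of num_bytes occupies a VBUSP output port."""
    if port_width_bits <= 0:
        raise ValueError("port_width_bits must be positive")
    total_bits = num_bytes * 8
    # Ceiling division: assume a partial word still takes one full clock.
    return -(-total_bits // port_width_bits)


# 64-Byte write through a 32-bit output port (Slice 0 VBUSP in the example).
slice0_clocks = vbusp_write_block_clocks(64, 32)

# 64-Byte write through a 64-bit output port (XFR2VBUS TX VBUSP in the example).
xfr2vbus_clocks = vbusp_write_block_clocks(64, 64)
```

These reproduce the 16-clock and 8-clock figures from the example.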

A read blocks the output port until the read is complete.

How do simultaneous reads & writes occur on a VBUSM bus?

Each output port is “multi-issue”.

This means that each output port allows multiple in-progress read commands.

Write commands are NOT “multi-issue” - an in-progress write will block other writes from occurring.

An output port can have ongoing reads and an ongoing write at the same time.

Each output port can only execute 1 command per clock cycle.

If multiple read or write commands are issued in the same clock cycle, the additional commands go into the command FIFO.

TODO: Verify that the command FIFO means that order of arrival DOES matter - i.e., if a write command from a higher priority initiator lands after a write command from a lower priority initiator is already in the command FIFO, does the lower priority write command still execute first?

TODO: How does the command FIFO work with reads and writes? Is there a separate command fifo for each kind of request? If there is a stalled write command in the FIFO, and then a read command lands in the FIFO, will that stalled write command block the read command from executing?

This means that there CAN be arbitration delay between a read and a write command, or between two read commands, if those commands land at the VBUSM in the same clock cycle. But the arbitration delay is very small. For N simultaneous read or write commands, the worst-case arbitration delay for a read is N-1 clocks.
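
Here is a minimal sketch of where the N-1 figure comes from, assuming the output port starts exactly one of the same-cycle commands per clock. The order in which commands are served is still one of the TODOs above, so the order passed in is purely an assumption, and the function name is hypothetical.

```python
def vbusm_same_cycle_delays(commands_in_service_order):
    """Arbitration delay (in clocks) for commands that reach one VBUSM
    output port in the same clock cycle.

    commands_in_service_order: labels in the order the port starts them
    (ASSUMED order; the real ordering rule is not verified).
    """
    # The port starts one command per clock, so the command in
    # position k waits k clocks before it starts.
    return {label: k for k, label in enumerate(commands_in_service_order)}


delays = vbusm_same_cycle_delays(["PRU0 write", "PRU1 read", "R5F read"])
worst_case = max(delays.values())  # N - 1 for N simultaneous commands
```

With N = 3 commands, the last one to be served waits 2 clocks, which matches N-1.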

A PRU/XFR2VBUS write of N [data_width]-bit words blocks the VBUSM port from starting the next write for N PRU clocks, until the write data has exited the PRU subsystem.

This does NOT block read commands from starting. So an ongoing write will not cause arbitration delay with a read as long as the read does not reach the VBUSM in the same clock cycle as the write command.
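
To compare the two busses side by side, the sketch below estimates how long a read waits behind an earlier write to the same output port. It is a simplified model with a hypothetical function name. It assumes the write occupies the port for one clock per word, that on a same-cycle collision the write is served first (worst case for the read), and it ignores every other initiator.

```python
def read_wait_clocks(bus, write_start, write_words, read_arrival):
    """Clocks a read waits behind one write to the same output port.

    bus: "VBUSP" or "VBUSM"
    write_start: clock in which the write starts
    write_words: number of port-width words in the write (1 clock each)
    read_arrival: clock in which the read reaches the port (>= write_start)
    """
    if read_arrival < write_start:
        raise ValueError("this sketch only models reads that arrive after the write")
    write_end = write_start + write_words
    if bus == "VBUSP":
        # The whole write must finish before the read can start.
        return max(0, write_end - read_arrival)
    if bus == "VBUSM":
        # An ongoing write does not block a read. Only a same-cycle
        # arrival costs a clock, because one command starts per clock.
        return 1 if read_arrival == write_start else 0
    raise ValueError("bus must be 'VBUSP' or 'VBUSM'")


# A 16-word write starts at clock 0, and a read arrives at clock 3.
vbusp_wait = read_wait_clocks("VBUSP", 0, 16, 3)
vbusm_wait = read_wait_clocks("VBUSM", 0, 16, 3)
```

Under these assumptions the read waits 13 clocks on the VBUSP and 0 clocks on the VBUSM.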

Arbitration: What happens if an external initiator tries to access a PRU subsystem endpoint, but is constantly preempted?

For example, what if an R5F tries to write to PRU internal memory, but the write is constantly preempted by PRU core access?

R5F data will NOT get lost or corrupted, though the R5F could get “starved”.
There is no data FIFO in the internal main_cbass, so the data would not be stored there.
Instead, data would start filling up the FIFOs & pipelines in the SoC going back towards the R5F.
If the backpressure fills the path all the way back to the R5F itself, it might start affecting R5F execution (e.g., maybe stalling the next R5F write command until the backpressure releases).
