66AK2H12: Serial RapidIO (SRIO) inbound DirectI/O transfer throughput & priority

Part Number: 66AK2H12

Hi,

I'm using an FPGA connected to a 66AK2H12 SoC with 1 port / 4 lanes at 5 Gbit/s per lane. I've run into a problem when writing data from the FPGA into the SoC's DDR3A using DirectIO packet types (I've tried NWRITE, NWRITE_R, and SWRITE packets -- it doesn't make a difference which).

Right now I'm only trying to write 1 Gbit/sec over that connection. If that is all I do, the SoC keeps up handily. But if I transfer data AND read through the buffer to verify a test pattern, I see SRIO packet payloads dropped (they never make it to DDR3A memory). It seems that contention for DDR3A is holding off the SRIO peripheral's MAU DMA long enough for transfers to fail.
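
To illustrate the kind of check I mean, here is a minimal sketch of the pattern verification; the buffer address, length, and incrementing pattern are placeholders for illustration, not our exact setup:

/* Minimal sketch of the verification step: the FPGA writes an incrementing
 * 32-bit test pattern into DDR3A and the DSP scans the landed buffer for
 * gaps. The address, length, and seed are illustrative placeholders. */
#include <stdint.h>

#define DDR3A_RX_BUF  ((volatile uint32_t *)0x80000000u)  /* placeholder DDR3A address */
#define RX_BUF_WORDS  (1024u * 1024u)                     /* placeholder buffer length */

uint32_t check_test_pattern(uint32_t seed)
{
    uint32_t expected = seed;
    uint32_t errors = 0;
    uint32_t i;

    for (i = 0; i < RX_BUF_WORDS; i++) {
        if (DDR3A_RX_BUF[i] != expected) {
            errors++;   /* a mismatch here means an SRIO payload never landed */
        }
        expected++;
    }
    return errors;
}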

I see performance statistics in a TI document for outbound DirectIO transfers, but there isn't any information there about inbound transfers.

Am I hitting some bandwidth threshold for this type of transfer? Why would there be enough contention on DDR3A to drop SRIO packets? Are there any priorities I can change so that SRIO supersedes other transfers on the memory bus?

Thanks,

Chris

  • Hi,

    I've notified the software team. Can you share which SDK you are using?

    Best Regards,
    Yordan
  • Chris,

    Is it the TI document "Throughput Performance Guide for KeyStone II Devices (Rev. B)" (www.ti.com/lit/an/sprabk5b/sprabk5b.pdf) that you mentioned? Table 19 shows the throughput of a DirectIO NREAD transaction at a PHY line rate of 5 Gbps (SRIO line rate of 4 Gbps) in 1x, 2x, and 4x modes. What is the packet drop rate in your 'transfer data AND read' test case?

    >>are there any priorities I can change to make SRIO supersede any other transfers on the memory bus?

    What are the other transfers on the memory bus? The PRI_ALLOC register described in the 66AK2H12 data sheet may help:

    6.4 Bandwidth Management

    The priority level for operations initiated outside the CorePac by system peripherals is declared through the Priority Allocation Register (PRI_ALLOC).

    9.4 Bus Priorities

    The priority level of all master peripheral traffic is defined at the TeraNet boundary. User-programmable priority registers allow software configuration of the data traffic through the TeraNet. Note that a lower number means higher priority: PRI = 000b = urgent, PRI = 111b = low.
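
    As an illustration only (the base address, offset, and field position below are placeholders; take the actual PRI_ALLOC address and bit layout from the 66AK2H12 data manual), raising a master's priority is just a masked read-modify-write of a memory-mapped register:

    /* Illustration only: raising the SRIO master priority level.
     * KSTONE2_BOOTCFG_BASE, PRI_ALLOC_OFFSET, and the field shift are
     * placeholders -- take the real addresses and bit layout from the
     * 66AK2H12 data manual before using anything like this. */
    #include <stdint.h>

    #define KSTONE2_BOOTCFG_BASE  0x02620000u              /* placeholder base address */
    #define PRI_ALLOC_OFFSET      0x000000FCu              /* placeholder offset       */
    #define SRIO_PRI_SHIFT        0u                       /* placeholder field shift  */
    #define SRIO_PRI_MASK         (0x7u << SRIO_PRI_SHIFT)

    static inline void set_srio_priority(uint8_t pri)
    {
        volatile uint32_t *pri_alloc =
            (volatile uint32_t *)(KSTONE2_BOOTCFG_BASE + PRI_ALLOC_OFFSET);
        uint32_t val = *pri_alloc;

        val &= ~SRIO_PRI_MASK;
        /* Lower value = higher priority: 000b = urgent, 111b = lowest. */
        val |= ((uint32_t)(pri & 0x7u)) << SRIO_PRI_SHIFT;
        *pri_alloc = val;
    }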

    Regards, Garrett

  • Chris,

    >>I've tried NWRITE, NWRITE_R, and SWRITE packets -- doesn't make a difference which

    I had more discussion with team:

    If using NWRITE_R, there should be no packet drop as it always expects a write completion response.
    Did you set the NWRITEs/NWRITE_Rs/SWRITEs and the NREADs to the same priority level during testing? How were you verifying the lost packets? And did you check whether there is any packet drop over a local path?

    If you were using different priorities on the writes and reads, or even just using different LSUs, transactions may get reordered, and the NREAD response may not reflect the correct data even though the memory contents may still be correct.
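
    If the DSP side were the one issuing these, the priority is set per LSU transfer. A rough sketch is below; the SRIO_LSU_TRANSFER structure and CSL_SRIO_SetLSUTransfer() call follow the KeyStone SRIO LLD examples, but please verify the names and fields against your PDK/CSL version:

    /* Sketch only: submitting an NWRITE_R from the DSP through LSU0 with an
     * explicit packet priority, so writes and any NREADs can be kept at the
     * same level. Struct/field names follow the KeyStone SRIO LLD examples;
     * verify them against your PDK/CSL version. */
    #include <string.h>
    #include <stdint.h>
    #include <ti/csl/csl_srio.h>
    #include <ti/csl/csl_srioAux.h>

    void issue_nwrite_r(CSL_SrioHandle hSrio, uint32_t dspAddr,
                        uint32_t rioAddr, uint16_t destId, uint32_t numBytes)
    {
        SRIO_LSU_TRANSFER lsu;

        memset(&lsu, 0, sizeof(lsu));
        lsu.rapidIOMSB = 0;         /* upper 32 bits of the RapidIO address   */
        lsu.rapidIOLSB = rioAddr;   /* lower 32 bits of the RapidIO address   */
        lsu.dspAddress = dspAddr;   /* source buffer in DSP/DDR3 memory       */
        lsu.bytecount  = numBytes;
        lsu.dstID      = destId;
        lsu.priority   = 0;         /* keep writes and reads at one priority  */
        lsu.ftype      = 5;         /* Type 5 packet ...                      */
        lsu.ttype      = 5;         /* ... NWRITE_R (write with response)     */
        lsu.outPortID  = 0;
        lsu.idSize     = 0;         /* 8-bit device IDs                       */

        /* Program LSU0 with this transfer descriptor. */
        CSL_SRIO_SetLSUTransfer(hSrio, 0, &lsu);
    }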

    Regards,
    Garrett
  • Hi Garrett,

    To clarify, I am generating NWRITE, NWRITE_R, and SWRITE packets from the FPGA to the SoC. My understanding is that these packets travel through the MAU on the SoC rather than through the LSUs.

    Also, a correction: reading the DRAM on the SoC during the transfer actually didn't have an effect on throughput or on packets being dropped. That was a red herring created by the test conditions.

    My team has found two scenarios:
    a) When the PBMi watermark levels are left at their defaults, we transfer about 120 MByte/sec of data for roughly 10 seconds to the SoC. At that point the FPGA sees retry requests and has to rewind its internal transmission buffer instead of sending new data.
    b) When the PBMi watermark levels are set to 0, we run the same transfer, but after the same time period we see packets disappear with no apparent errors logged in any registers. When using NWRITE_Rs here, we see no responses on the FPGA for the dropped packets.

    Are there any published throughput measurements for the MAU? Our preliminary tests show a maximum throughput of about 500 MByte/sec over a 4-lane, 5 Gbit/s-per-lane connection, which is only about a quarter of what we'd expect. We also see the issues with the slower data rates discussed above.
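
    For reference, the rough link budget we are comparing against is sketched below (it ignores packet header and flow-control overhead):

    /* Back-of-the-envelope budget for a 4x port at 5 Gbaud per lane
     * (ignores packet header and flow-control overhead). */
    #include <stdio.h>

    int main(void)
    {
        const double lanes      = 4.0;
        const double lane_gbaud = 5.0;         /* raw line rate per lane    */
        const double encoding   = 8.0 / 10.0;  /* 8b/10b coding efficiency  */

        double data_gbps   = lanes * lane_gbaud * encoding;  /* 16 Gbps     */
        double data_mbytes = data_gbps * 1000.0 / 8.0;       /* ~2000 MB/s  */

        printf("Usable link bandwidth before packet overhead: %.0f MByte/sec\n",
               data_mbytes);
        printf("Observed ~500 MByte/sec is about %.0f%% of that\n",
               100.0 * 500.0 / data_mbytes);
        return 0;
    }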

    We are writing to four different DestIDs (and setting the SOC to promiscuous mode) in an attempt to fully utilize the MAU's four DMAs.

    Thanks,
    Chris
  • Chris,

    That is correct: the LSU controls the transmission of DirectIO packets, and the MAU controls the reception of DirectIO packets.

    The 'DDR3 Memory Software Board-to-Board Mode Throughput' values in Table 18 (DirectIO Write Throughput With 5 Gbps PHY) of the "Throughput Performance Guide for KeyStone II Devices" (SPRABK5B) show a much higher throughput than the ~4 Gbps you observed:

    www.ti.com/lit/an/sprabk5b/sprabk5b.pdf

    Regards, Garrett