PCIe Throughput vs. Payload Size

Hello,

Our original question is in the following post: https://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/158004/1771298#1771298

Since that post was already marked as answered, we were told to start a new post to continue the discussion. Thank you all for your help and understanding.

For completeness, the original question was as follows:

We have a similar question on PCIe throughput performance. Our DSP (C6678) is configured as RC and the FPGA as EP. They are connected directly to each other, with no PCIe switch in between. MAX_PAYLD (bits [7:5]) is programmed to 000b, as Steven pointed out. We get a data rate of 720 MB/s with a 64B payload, which is close to the value that Ralf suggests. With a 128B payload, however, we only achieve 695 MB/s, where we expected 780 MB/s. In addition, if we set MAX_PAYLD (bits [7:5]) to 001b, the throughput drops even more drastically. What might be missing here?
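To make the expectation concrete, here is a rough sketch of the usual upper-bound estimate of write throughput versus payload size. It is only a model under assumptions that are not stated in this thread: a Gen2 x2 link, 8b/10b encoding, a 3DW memory-write header, no ECRC, and no DLLP (ACK or flow-control) traffic. The function names are made up for illustration. The MAX_PAYLD decoding follows the standard Max_Payload_Size encoding, where 000b means 128 bytes and 001b means 256 bytes.

```python
# Rough upper bound on PCIe posted-write throughput versus TLP payload size.
# Assumptions (not stated in the thread): Gen2 x2 link, 8b/10b encoding,
# 3DW memory-write header, no ECRC, no DLLP (ACK / flow-control) traffic.

LANES = 2            # assumed link width
GT_PER_S = 5.0e9     # assumed Gen2 signalling rate per lane
ENCODING = 8 / 10    # 8b/10b line coding used by Gen1/Gen2 links

# Per-TLP overhead in bytes: start framing (1) + sequence number (2)
# + 3DW header (12) + LCRC (4) + end framing (1).
TLP_OVERHEAD_BYTES = 1 + 2 + 12 + 4 + 1


def max_payload_bytes(field: int) -> int:
    """Decode the 3-bit Max_Payload_Size field (bits [7:5])."""
    if not 0 <= field <= 5:
        raise ValueError("encodings 110b and 111b are reserved")
    return 128 << field


def link_bandwidth_mb_s() -> float:
    """Raw data bandwidth of the assumed link after line coding, in MB/s."""
    return LANES * GT_PER_S * ENCODING / 8 / 1e6


def write_throughput_mb_s(payload_bytes: int) -> float:
    """Upper bound on payload throughput for back-to-back write TLPs."""
    efficiency = payload_bytes / (payload_bytes + TLP_OVERHEAD_BYTES)
    return link_bandwidth_mb_s() * efficiency


if __name__ == "__main__":
    for size in (64, 128):
        print(f"{size:4d}B payload: {write_throughput_mb_s(size):6.1f} MB/s")
    print("MAX_PAYLD 000b ->", max_payload_bytes(0b000), "bytes")
    print("MAX_PAYLD 001b ->", max_payload_bytes(0b001), "bytes")
```

Under these assumptions the bound is about 762 MB/s for 64B and about 865 MB/s for 128B. Real figures should sit below it because DLLP traffic and receiver-side stalls are ignored, but the model still says the 128B case should be the faster of the two, which is why our measurement looks wrong.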

----

After some discussion, Iding asked us:

In your case, what if you change the DSP memory from DDR to another location, say MSMC or L2? Will this make any difference when writing from the FPGA with a 64-byte or 128-byte payload size?

----

As you suggested, Iding, we ran another test in which the Inbound Address Offset was changed from DDR3 (0x80000000) to MSMC SRAM (0x0C000000). However, this made no difference: we obtained nearly the same results as in the DDR3 case. In both the MSMC and DDR3 cases, the 64B payload achieves 720 MB/s whereas the 128B payload remains at 695 MB/s.

By the way, we checked the related PCIe signals on the FPGA side. By "idle cycles" we mean cycles in which the remote device (the DSP PCIESS) issues a NOT READY signal to the PCIe IP on the FPGA side, so the FPGA has to wait for the READY signal before it can transmit the next TLP. In the 64B payload case, the FPGA is able to send TLPs back to back, and idle cycles occur only very rarely. In the 128B payload case, however, idle cycles appear very often, and the drop in throughput follows directly from them.
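As a rough cross-check of this observation, the shortfall in the 128B case can be turned into an implied stall fraction. The sketch below assumes, purely for illustration, that the entire gap between the expected 780 MB/s and the measured 695 MB/s is caused by NOT READY cycles. The function name is hypothetical.

```python
def implied_stall_fraction(measured_mb_s: float, expected_mb_s: float) -> float:
    """Fraction of transmit time lost, assuming stalls explain the whole shortfall."""
    if expected_mb_s <= 0:
        raise ValueError("expected throughput must be positive")
    if measured_mb_s > expected_mb_s:
        raise ValueError("measured throughput exceeds the expected value")
    return 1.0 - measured_mb_s / expected_mb_s


if __name__ == "__main__":
    # Figures quoted in this thread for the 128B payload case.
    fraction = implied_stall_fraction(measured_mb_s=695.0, expected_mb_s=780.0)
    print(f"implied stall fraction: {fraction:.1%}")
```

With the numbers above this comes to roughly 11% of the time spent waiting for READY. Comparing that estimate with the idle-cycle ratio actually counted on the FPGA side would show whether the stalls alone account for the loss.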

Did we somehow misconfigure another register in the PCIESS, which might indirectly cause such a degradation? Or might it be a HW issue?

Any other suggestions?

Best regards,

Abdulkerim