We have a performance issue with the PCIe lanes of the Shannon DSP.
We are using the following SW library (version number 0x01000003; version string "PCIE LLD Revision: 01.00.00.03").
We have connected a Lattice FPGA with a Lattice PCIe core using a single PCIe lane.
When setting up a DMA data transfer from the FPGA to the DSP via PCIe, the achievable data rate is lower than expected. Looking at the traffic with a logic analyzer, we found that the reason is the rather small number of posted data credits (0x2F) that the DSP's PCIe interface issues to the FPGA.
We have found no documentation about how to influence this.
Any help appreciated.
May I ask what data rate you achieve and what rate you expect?
Also, how are the PCIe link (single lane, but Gen1 or Gen2?) and the EDMA (DBS = 64 B or 128 B, ACNT/BCNT) configured for the transfer?
The Throughput Performance Guide discusses PCIe throughput and the factors affecting it. It gives realistic data rates for the C66x PCIe module with two lanes, Gen2, and different EDMA DBS configurations; you can use those numbers as a baseline for comparison.
The credits are managed automatically. Let us check whether the other factors (section 11.1 in the Throughput Performance Guide) affect your data rate and how we can get better performance. So please share more information about your configuration. Thanks a lot.
Please let me continue with this thread and answer your questions.
Our connection between the FPGA and the DSP is a single lane at 2.5 Gbps (Gen1). We do not use the EDMA; a DMA inside the FPGA transfers 256-byte packets. Throughput is slightly above 1.2 Gbps; I would have expected, or at least hoped for, about 1.6 Gbps or more.
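For context, a rough upper bound for this link can be sketched as follows. This is only an estimate: the 20-byte per-TLP overhead below assumes a 3DW memory-write header without ECRC, and DLLP traffic (ACKs, flow-control updates) is not counted.

```python
# Rough upper-bound estimate for a Gen1 x1 PCIe link carrying 256-byte
# posted writes. The 20-byte per-TLP overhead is an assumption
# (STP + sequence number + 3DW header + LCRC + END, no ECRC).

LINE_RATE_GBPS = 2.5          # Gen1 signalling rate per lane
ENCODING = 8 / 10             # 8b/10b line encoding
PAYLOAD = 256                 # bytes per TLP, as used by the FPGA DMA
TLP_OVERHEAD = 20             # STP(1) + seq(2) + header(12) + LCRC(4) + END(1)

raw_gbps = LINE_RATE_GBPS * ENCODING              # 2.0 Gbps of 8-bit data
efficiency = PAYLOAD / (PAYLOAD + TLP_OVERHEAD)   # ~0.93
print(f"theoretical max ~ {raw_gbps * efficiency:.2f} Gbps")
```

That works out to roughly 1.86 Gbps, so the observed 1.2 Gbps suggests the link is idle a significant fraction of the time, consistent with the credit stalls described below in this thread.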
I read the Throughput Performance Guide, but it only covers the connection of two DSPs where the internal EDMA limits the transfer. That is not the case in our implementation.
Watching the posted data credits offered to my VHDL implementation, only two 256-byte packets are transferred before transmission pauses because not enough credits are available. After the next credit update, the next two packets are transmitted, and so on. Sometimes an update releases fewer credits than expected, but this does not slow the overall transmission further.
The number of posted header credits is always greater than zero and does not influence the transmission.
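The two-packets-then-pause behaviour matches the PCIe flow-control arithmetic (a quick sanity check; data credits are counted in 4 DW = 16-byte units per the PCIe specification):

```python
# Why only two 256-byte packets fit before the link stalls: PCIe data
# credits are counted in 16-byte (4 DW) units, so the advertised
# 0x2F posted data credits cover less than three 256-byte payloads.

CREDIT_UNIT = 16                  # bytes per data credit (PCIe spec: 4 DW)
ADVERTISED = 0x2F                 # posted data credits seen on the analyzer
PAYLOAD = 256                     # bytes per FPGA DMA packet

credits_per_tlp = PAYLOAD // CREDIT_UNIT           # 16 credits per packet
packets_in_flight = ADVERTISED // credits_per_tlp  # 47 // 16 = 2
print(f"0x2F credits = {ADVERTISED * CREDIT_UNIT} bytes "
      f"-> {packets_in_flight} packets of {PAYLOAD} B before stalling")
```

So 0x2F = 47 credits buffer only 752 bytes of payload, which is exactly the two-packet window observed on the analyzer.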
I think the flow control credits in the C66x PCIe module are managed automatically, so there seems to be no way to control the credit count or the update period.
Using two lanes will definitely give you more throughput, and the advertised credits could double as well.
Another thing you could try is reducing the burst size from 256 bytes to 128 bytes, if you can control the DMA in the FPGA, to see whether the smaller credit requirement per transaction helps throughput.
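To illustrate the trade-off with halving the burst size (a sketch; the 20-byte per-TLP overhead assumes a 3DW memory-write header without ECRC, and the 16-byte credit unit is per the PCIe specification):

```python
# Trade-off when halving the FPGA burst size: smaller TLPs spend more
# link bandwidth on headers, but more of them fit into the same credit
# allowance, so transmission stalls less often per credit update.

CREDITS = 0x2F        # posted data credits advertised by the DSP
CREDIT_UNIT = 16      # bytes per data credit (4 DW)
TLP_OVERHEAD = 20     # framing + header + LCRC per TLP (assumption)

for payload in (256, 128):
    efficiency = payload / (payload + TLP_OVERHEAD)
    in_flight = CREDITS // (payload // CREDIT_UNIT)
    print(f"{payload:3d} B: link efficiency {efficiency:.2f}, "
          f"{in_flight} packets per credit window")
```

With 128-byte bursts, five packets fit into the 47-credit window instead of two, at the cost of a few percent of raw link efficiency. Whether this wins overall depends on how quickly the DSP returns credit updates, which is why it is worth measuring.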