Transferring large bursts of data over a PCIe link?

Hakan YAMANYAR

Hello,

I want to read 256 bytes in each TLP instead of a single double word (32 bits).
I examined the document "PCIe Use Cases for KeyStone Devices" (http://www.ti.com/lit/an/sprabk8/sprabk8.pdf).
However, I couldn't find the registers mentioned in its sample pseudo-code.

In the same document, section 2.3 "EDMA Considerations" says: "Better performance is achieved for outbound transfer using the EDMA transfer controller with a larger data burst size."

So, as far as I understand, we have to use EDMA to read or write large bursts of data, such as 128 or 256 bytes, over the PCIe link.
Am I right?

Regards,
Hakan.

over 13 years ago

0 Jay Shu over 13 years ago

TI__Prodigy 680 points

Hakan, the FPGA's "free" core that is available in ISE only supports a 1-DWORD payload; its user's guide mentions this. I'm assuming you are using that PIO design. If you are, you can't do it. I actually tried using EDMA, but any attempt to read or write more than 1 DWORD fails.

I have not played around with changing the MPS (maximum payload size) in the TLP for the DSP.

0 Hakan YAMANYAR over 13 years ago in reply to Jay Shu

Intellectual 265 points

Hi Jay,

We already modified the application VHDL code of the FPGA's PCI core, and its Maximum Payload Size is now 128 bytes.

What we are trying to understand is whether we can handle an incoming completion TLP with a 128-byte payload without using EDMA.

Could you please send us your EDMA sample project, if possible?

Regards,
Hakan.

0 Jay Shu over 13 years ago in reply to Hakan YAMANYAR

TI__Prodigy 680 points

Hakan, I don't think you can handle an incoming TLP larger than 1 DWORD without EDMA. Hopefully someone on the prior thread can answer this 100%.

Jay

0 Hakan YAMANYAR over 13 years ago in reply to Jay Shu

Intellectual 265 points

Jay,

Do I have to use the EDMA3 LLD, or can I use ACPY3 instead, to perform these block transfers with a payload size of 256 bytes from the EP?

The 256-byte transfer also confuses me. The documentation says that the maximum payload size for outbound transfers is 128 bytes.
However, it also says that the maximum payload size for remote read operations is 256 bytes.

Can I read 256 bytes from the EP by means of EDMA?

Regards,
Hakan

0 Steven Ji over 13 years ago in reply to Hakan YAMANYAR

TI__Genius 12065 points

Hakan,

The C66x PCIe supports a maximum payload size of 128 bytes for outbound transfers, meaning transactions that the C66x originates toward the external PCIe device. This applies whether the C66x writes data to or reads data from the external device.

The C66x PCIe also supports a maximum payload size of 256 bytes for inbound transfers, meaning transactions that the external PCIe device originates, whether it writes data to or reads data from the C66x device.

So if you are using the C66x PCIe as the RC and it initiates the data transfer to the FPGA, that is an outbound transfer for the C66x. The C66x CPU cannot generate a payload larger than 32 bits (1 DWORD). The EDMA in the C66x supports a maximum Data Burst Size of 128 bytes (for specific transfer queues), which means the RC can write or read at most 128 bytes per packet to or from the EP.

You can use the EDMA3 CSL/LLD in the C66x MCSDK package to configure the EDMA3, or use another library (such as ACPY3), as long as the package is compatible with C66x devices.

The C66x PCIe can accept a TLP with a 256-byte payload that is originated by the external device. In that case, it is an inbound transfer for the C66x PCIe, and no EDMA inside the C66x is involved. The PCIe master port dispatches the received packets to the memory region inside the C66x. However, an EDMA or other mechanism may be required to initiate the transfer from the external PCIe device.
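
To make the payload limits above concrete, here is a minimal C sketch that counts how many TLPs a transfer would need for each kind of initiator. The enum and function names are hypothetical and only for illustration. The sketch uses the figures quoted in this thread (4 bytes per CPU access, 128 bytes per EDMA burst, 256 bytes inbound) and assumes that the link partner accepts that payload size and that alignment boundaries do not split a packet.

```c
#include <stddef.h>

/* Who originates the PCIe transaction, seen from the C66x.
 * Hypothetical names, not part of any TI library. */
typedef enum {
    INITIATOR_C66X_CPU,   /* outbound, CPU load/store */
    INITIATOR_C66X_EDMA,  /* outbound, EDMA burst */
    INITIATOR_REMOTE      /* inbound, external device is the requester */
} initiator_t;

/* Maximum payload per TLP in bytes, using the figures quoted in this thread. */
static size_t max_payload_bytes(initiator_t who)
{
    switch (who) {
    case INITIATOR_C66X_CPU:  return 4;
    case INITIATOR_C66X_EDMA: return 128;
    case INITIATOR_REMOTE:    return 256;
    }
    return 0; /* unknown initiator */
}

/* Number of TLPs needed to move `total` bytes, rounding up. */
static size_t tlp_count(size_t total, initiator_t who)
{
    size_t mps = max_payload_bytes(who);
    if (mps == 0) {
        return 0;
    }
    return (total + mps - 1) / mps;
}
```

For a 256-byte block, tlp_count() gives 64 packets for the CPU, 2 for EDMA, and 1 for an inbound transfer originated by the external device.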

Sincerely,

Steven

0 Hakan YAMANYAR over 13 years ago in reply to Steven Ji

Intellectual 265 points

Hello Steven,

I am trying to understand the behaviour of the EDMA3 LLD after your suggestions. I thought it would be a good idea to start with the standalone application
under:

${EDMA3_LLD_INSTALL_DIR}\examples\edma3_driver\evm6678\sample_app\

as mentioned in the EDMA3 LLD 02.11.03 Release Notes. I chose this application because my target is a C6678 and I will be using little-endian ordering
on my target. However, I could not build this application after importing it into CCS 5.1. Therefore, I created an empty SYS/BIOS project,
copied the source files into it, and, after changing its *.cfg accordingly, built it without any issues. However, a few points confuse me when I run it in debug mode:

1. Am I supposed to load it onto three different CorePacks, i.e. Core0, Core1 and Core2, and run it that way? If so, can you point me to documentation that describes
how to manage multicore debugging?

2. When I load it onto one CorePack, e.g. Core0, the application prints "[C66xx_0] waiting for interrupt..." to the console and never leaves this stage. This section of the code is in the module "dma_test.c". It seems to wait for a flag variable called irqRaised1 to be cleared. This flag seems to be cleared by a callback function called callback1 in the module common.c. However, I couldn't find where the callback is subscribed anywhere in the project's source files. Isn't there supposed to be such a mechanism in the application to observe whether the DMA transfer has completed? Does it behave like this because I migrated the source files into a new SYS/BIOS project?

Note: I am running this application on an EVM6678 board.

As an additional question, why couldn't I build the original sample project that came with the MCSDK 2.0.5.17 installation, which is the latest version?

Thanks in advance,
Hakan

0 Steven Ji over 13 years ago in reply to Hakan YAMANYAR

TI__Genius 12065 points

Hakan,

I am not very familiar with the EDMA LLD. After briefly browsing the code, I think it is OK to run on a single core ("bypassCore()" and other APIs will manage the resources). The "callback1" function is subscribed in "EDMA3_DRV_requestChannel", which is defined in "edma3_drv_basic.c" in the folder "<EDMA_LLD_DIR>\packages\ti\sdo\edma3\drv\src".
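
The general pattern of subscribing a completion callback when the channel is requested can be sketched in plain C as below. This is only an illustration of the idea: every type and function here (mock_channel_t, mock_request_channel, mock_completion_isr, on_complete, irq_raised) is a hypothetical stand-in, not the EDMA3 LLD API, and the direction of the flag (0 before completion, non-zero after) is an assumption of this sketch.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the EDMA3 LLD types or functions. */
typedef void (*completion_cb_t)(uint32_t tcc, int status, void *app_data);

typedef struct {
    completion_cb_t cb;   /* callback stored when the channel is requested */
    void *app_data;       /* opaque pointer handed back to the callback */
} mock_channel_t;

/* Changed by the callback and polled by the application, in the spirit of
 * irqRaised1. volatile because it is written from interrupt context. */
static volatile int irq_raised = 0;

/* Completion callback: records success (1) or failure (-1). */
static void on_complete(uint32_t tcc, int status, void *app_data)
{
    (void)tcc;
    (void)app_data;
    irq_raised = (status == 0) ? 1 : -1;
}

/* Stand-in for requesting a channel: this is where the callback is subscribed. */
static void mock_request_channel(mock_channel_t *ch, completion_cb_t cb,
                                 void *app_data)
{
    ch->cb = cb;
    ch->app_data = app_data;
}

/* Stand-in for the completion interrupt handler, which invokes the
 * subscribed callback with a success status. */
static void mock_completion_isr(const mock_channel_t *ch, uint32_t tcc)
{
    if (ch->cb != NULL) {
        ch->cb(tcc, 0, ch->app_data);
    }
}
```

In this sketch the application would poll irq_raised after starting the transfer. If the completion interrupt never reaches the handler, the flag never changes and the wait loop never exits, which would look like the "waiting for interrupt..." symptom.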

Alternatively, there is an EDMA CSL example that only requires the CSL library and does not need SYS/BIOS. It may be easier to debug and test. The EDMA example is located at "<MCSDK_DIR>\pdk_C6678_1_0_0_17\packages\ti\csl\example\edma".

The CSL document is located at "<MCSDK_DIR>\pdk_C6678_1_0_0_17\packages\ti\csl\docs\csldocs.chm".

You can give this a try and see if it works for you.

Sincerely,

Steven

0 Hakan YAMANYAR over 13 years ago in reply to Steven Ji

Intellectual 265 points

Hello Steven,

I think I have reached a point with using QDMA for PCIe transfers. I started a new thread on E2E regarding this issue:

http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/158998.aspx

Could you please take a look at it?

Regards,
Hakan

0 Mark Swick over 13 years ago in reply to Steven Ji

Intellectual 565 points

Hello,

I am also interested in this topic. I have been assuming that C66x writes to the PCIe memory space are buffered and transferred with up to 128-byte payloads in the outbound packets.

Is this true, or do I have to use EDMA to achieve 128-byte outbound packets?

Also, how big are the buffers inside the C66x PCIe device feeding the outbound transfers?

Thanks in advance for your help.

Mark

0 Steven Ji over 13 years ago in reply to Mark Swick

TI__Genius 12065 points

Mark,

The data being transferred is not buffered inside the PCIe module. That means each CPU write or EDMA burst is translated into one PCIe packet.

So if you use the CPU to do the transfer, the payload size of each PCIe packet will be 4 bytes (32 bits), while EDMA can achieve higher efficiency, since the EDMA burst size (the payload size of the PCIe packet) can be 64 or 128 bytes.

I suggest using an EDMA transfer queue with a larger burst size for the PCIe transfer if possible.
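
As a rough illustration of why the larger burst helps, the sketch below estimates what fraction of each packet is payload. The 20-byte overhead is an assumption of this sketch (2 framing bytes, 2 sequence-number bytes, a 12-byte 3-DWORD header, and a 4-byte LCRC). It ignores 8b/10b encoding, ACK and flow-control traffic, and 4-DWORD headers, so treat the numbers as ballpark only. The function name is hypothetical.

```c
#include <stddef.h>

/* Assumed per-TLP overhead in bytes:
 * 2 framing + 2 sequence number + 12 header (3 DWORDs) + 4 LCRC. */
#define TLP_OVERHEAD_BYTES 20u

/* Share of each packet that is payload, in percent, rounded down. */
static unsigned payload_efficiency_pct(unsigned payload_bytes)
{
    return (100u * payload_bytes) / (payload_bytes + TLP_OVERHEAD_BYTES);
}
```

Under these assumptions, payload_efficiency_pct() gives about 16% for 4-byte CPU writes, 76% for 64-byte bursts, and 86% for 128-byte bursts.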

Sincerely,

Steven
