66AK2L06: Problem with PCIe DMA read length

Yurii Monakov

Intellectual 660 points

Part Number: 66AK2L06

Hi All!

We have a custom board with a K2L SoC acting as the Root Complex (RC) and an FPGA as the Endpoint (EP).

Software reads data from the EP using both QDMA and the CPU. CPU read transactions work without any issues.

The problem is that when we read an odd number of words with DMA, the read request TLP asks for the next even number of words.

For example, we start an A-synchronized QDMA transfer to read 0x1C bytes (7 DWORDs), but on the FPGA side we receive a TLP with a length field of 8. Every odd length gets rounded up to the next even one.

This extra read is acceptable when we deal with memory. The problem arises when reading from a FIFO, because the extra read can leave the hardware in an inconsistent state.

What could be the root cause of this? I can't find any explanation of this behaviour in the documentation.

Best regards,

Yurii

over 4 years ago

Yurii Monakov over 4 years ago

We program the QDMA PaRAM set as follows:

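	next	= 0xFFFF;				/* 0xFFFF = NULL link */
	abcnt	= CSL_EDMA3_CNT_MAKE(0x1C, 1);		/* ACNT = 0x1C bytes (7 DWORDs), BCNT = 1 */
	bidx	= CSL_EDMA3_BIDX_MAKE(0, 0);		/* no B-dimension stride */
	brld	= CSL_EDMA3_LINKBCNTRLD_MAKE(0xFFFF, 1);	/* LINK = 0xFFFF (none), BCNTRLD = 1 */
	cidx	= CSL_EDMA3_CIDX_MAKE(0, 0);		/* no C-dimension stride */
	opt	= CSL_EDMA3_OPT_MAKE(	CSL_EDMA3_ITCCH_DIS,	/* no intermediate chaining */
					CSL_EDMA3_TCCH_DIS,	/* no final chaining */
					CSL_EDMA3_ITCINT_DIS,	/* no intermediate interrupt */
					CSL_EDMA3_TCINT_EN,	/* interrupt on completion */
					0,			/* TCC = 0 */
					CSL_EDMA3_TCC_NORMAL,	/* normal completion */
					CSL_EDMA3_FIFOWIDTH_NONE,
					CSL_EDMA3_STATIC_EN,	/* static PaRAM set */
					CSL_EDMA3_SYNC_A,	/* A-synchronized */
					CSL_EDMA3_ADDRMODE_INCR,	/* destination: increment */
					CSL_EDMA3_ADDRMODE_INCR);	/* source: increment */

	/* set transfer parameters; DST (PaRAM word 3) is the QDMA
	 * trigger word and is written separately to start the transfer */
	tpcc->PARAMSET[pset].OPT		= opt;
	tpcc->PARAMSET[pset].SRC		= 0x501c0000;	/* PCIe data space (EP) */
	tpcc->PARAMSET[pset].A_B_CNT		= abcnt;
	tpcc->PARAMSET[pset].SRC_DST_BIDX	= bidx;
	tpcc->PARAMSET[pset].LINK_BCNTRLD	= brld;
	tpcc->PARAMSET[pset].SRC_DST_CIDX	= cidx;
	tpcc->PARAMSET[pset].CCNT		= 1;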

The QDMA trigger word is 3 (PARAMSET.DST; the address is aligned on a 4-byte boundary). On the FPGA side this setup results in a TLP with a length of 8, not 7 (0x1C/4).

Yurii

Yurii Monakov over 4 years ago in reply to Yurii Monakov

Another observation: if exactly 1 DWORD is requested over QDMA (ACNT = 0x04), correct TLP with unity length is received on FPGA side.

Yurii

lding over 4 years ago in reply to Yurii Monakov

TI__Guru* 95265 points

Hi,

The EDMA code looks right. It seems that you have a PCIe trace capture. If so, can you upload the PCIe read TLPs for several cases, e.g. ACNT = 0x4, ACNT = 0x1C and ACNT = 0x20?

Edit: I meant just a screenshot.

Regards, Eric

Yurii Monakov over 4 years ago in reply to lding

Hi, Eric!

Thank you for the answer.

Actually, we only have a transaction-level trace coming out of the Xilinx 7-series PCIe IP core, so I can’t say definitively that the observed TLPs are what is “on the wire”.

Regarding your question, the PCIe core outputs:

1. If ACNT is 0x04, then TLP length is 1

2. If ACNT is 0x1C, then TLP length is 8

3. If ACNT is 0x20, then TLP length is also 8

Reading the PCIe data space with the CPU results in a TLP length of 1. In all the described cases the TLP byte enables are a contiguous sequence of 1’s (0xFF).

Best Regards,

Yurii

Yurii Monakov over 4 years ago in reply to lding

> I meant just screenshot

I can only provide ChipScope screenshots. We don’t have a full-blown PCIe sniffer.

lding over 4 years ago in reply to Yurii Monakov

Hi,

Can you run the same EDMA read code, but change the source address (0x501c0000) to a K2L internal memory address, say global L2, MSMC or DDR3A? Read 0x1C bytes and then check your destination buffer: how many bytes of new data did you receive, 0x1C or 0x20? This will tell us whether it is an EDMA problem or a PCIe problem.

Regards, Eric

Yurii Monakov over 4 years ago in reply to lding

Hi, Eric,

Sorry for the late answer, I was on a business trip.

Today I've tested EDMA transfers between MSMC SRAM (0x0C000000) and OCM SRAM (0x70000000):

1. Initialize first 8 words of MSMC SRAM with random data
2. Transfer 7 bytes from 0x0C000000 to 0x70000000
3. Transfer 7 words (0x1C bytes) from 0x0C000000 to 0x70000000
4. Transfer 8 words (0x20 bytes) from 0x0C000000 to 0x70000000
5. Transfer 7 words (0x1C bytes) from 0x501C0000 (PCIe data space) to 0x70000000
6. Transfer 0x1B bytes from 0x501C0000 to 0x70000000

Every single test went fine: no data in OCM SRAM was overwritten beyond the requested length during the transfers.

Event In the last two tests with PCIe data space exactly 0x1C and 0x1B bytes were filled in destination buffer.

So the EDMA controller is working fine. But we are still observing only even TLP sizes on the FPGA side.

Best Regards,
Yurii

lding over 4 years ago in reply to Yurii Monakov

Hi,

Thanks for the various testing. Can you clarify the test connection for below:

5. Transfer 7 words (0x1C bytes) from 0x501C0000 (PCIe data space) to 0x70000000
6. Transfer 0x1B bytes from 0x501C0000 to 0x70000000

If the PCIe data space is usable, it must be mapped to the remote FPGA device. The transfer from 0x501c0000 to 0x7000_0000 is then a PCIe read from the remote FPGA into the K2L's OCM RAM.

>>>>Event In the last two tests with PCIe data space exactly 0x1C and 0x1B bytes were filled in destination buffer.>>>>

>>>>But we are still observing only even TLP sizes on the FPGA side.>>>>

There are PCIe read request packets and completion packets. I know you don't have a scope, but can you clarify whether the even TLP size showing in FPGA is read packet (from K2L) or completion packet (from FPGA)? In other word, can you determine if TI read TLP has wrong data size? Or does the FPGA round up the odd-size request to an even number? And if the size is round up to the even number, why there is no corruption in the destination buffer?

Regards, Eric

Yurii Monakov over 4 years ago in reply to lding

Hi,

> Can you clarify the test connection for below

Yes, that's right. This is a read request from the DSP (RC) to the remote FPGA (EP).

> Event In the last two tests with PCIe data space

I'm sorry, not "Event" but "Even" (as in "actually").

> even TLP size showing in FPGA is read packet (from K2L) or completion packet (from FPGA)

It is the read request packet coming out of the DSP, as observed on the FPGA side. Its size is always even for read requests larger than one 32-bit word, so a 7-word read request from the DSP comes out of the FPGA core with a size of 8.

> In other word, can you determine if TI read TLP has wrong data size?

I can only see the TLPs coming out of the PCIe core.

> And if the size is round up to the even number, why there is no corruption in the destination buffer

Hard to say. Maybe it is because of the transfer controller, which knows exactly (in bytes) what was requested and which portion of the destination buffer is to be modified.

Best Regards,

Yurii

Yurii Monakov over 4 years ago in reply to lding

Hi,

This is the TLP header we receive from the Xilinx PCIe IP core when reading 5 DWORDs (20 bytes) with EDMA3 from address 0x5081C000 (32-bit read request):

TLP DW0: 0x00000006
TLP DW1: 0x000013FF
TLP DW2: 0x5081C000

Reading 6 DWORDs (24 bytes) with EDMA3 from the same address leads to this TLP header:

TLP DW0: 0x00000006
TLP DW1: 0x000000FF
TLP DW2: 0x5081C000

So the only difference is the "Tag" field, yet the destination buffer is updated with exactly 5 and 6 DWORDs respectively.
We can't distinguish between these read requests on the FPGA side using the byte enables.
The FPGA always responds with 6 DWORDs, and this extra read breaks the internal packet FIFOs.

Best Regards,
Yurii

lding over 4 years ago in reply to Yurii Monakov

Hi,

Thanks for the info! Since the First DW BE and Last DW BE fields are both 0xF in both cases (0xFF together), both requests read all 4 bytes of each DWORD with a length of 6 DW. There is no difference from the FPGA side.

The Tag field is a tracking number: when the Completer responds, it must copy this value into the Completion TLP. This allows the Requester to match Completions with its Requests. So whether it is 0x00 or 0x13, it is just a tracking number and can be any 5-bit value.

I have asked our PCIe IP designer why the K2L sends out TLPs with the length field rounded up to the nearest even number.

>>>>TLP header we receive from Xilinx PCIe IP core>>>>

One thing I am not sure about is how reliable the TLP observation from the FPGA IP core is. I understand that you don't have a high-speed scope for PCIe trace capture.

Regards, Eric

lding over 4 years ago in reply to lding

Hi,

Our PCIe IP designer looked at the PCIe specifications for K2L and at the errata from the PCIe IP vendor. We don’t see any limitation that would force an even DWORD length for TLPs.

One possibility is that address + length crosses a DWORD boundary. If this happens, the length is rounded up to the next whole DWORD. However, from this E2E thread, it looks like the address used is aligned to a DWORD boundary.

For a more detailed analysis, I would have to file a ticket with the PCIe IP vendor, which will take a very long time. I also think we need a protocol analyzer capture from you to confirm the TLP length on the bus (to rule out any potential issue on the FPGA side). Is this possible?

Also, I would like to think about a workaround: split each odd-DW EDMA transfer into two, one EDMA transfer with an even number of DWs and another with 1 DW. Given your observation

>>>>Another observation: if exactly 1 DWORD is requested over QDMA (ACNT = 0x04), correct TLP with unity length is received on FPGA side.>>>>

this should work. You can use two EDMA channels chained together, so that completion of the first channel triggers the second.

Let me know if you can provide a PCIe protocol analyzer capture for further investigation.

Regards, Eric

Yurii Monakov over 4 years ago in reply to lding

Hi,

> Is this possible?

Thanks for the support. I don't think we can attach a PCIe protocol analyzer to our board: we have a direct connection between the K2L and the FPGA, and we don't have any KeyStone II dev boards with external connectors.

The only option is to capture low-level bus traffic from the Xilinx PCIe core. I need to check with our hardware developers whether that is possible.

> Also, I would like to think about a workaround

Thank you for the suggestion. We are already using the same quick workaround, except that the final read is a CPU request.

I'll ask on the Xilinx support forums whether this behavior could be explained by PCIe core peculiarities. If they provide a solution or workaround, I will post it here.

Best Regards,

Yurii

lding over 4 years ago in reply to Yurii Monakov

Yurii,

Thanks for the update, and glad to know you have already implemented a workaround. Keep us updated if you hear any explanation from the Xilinx side; I am also interested in it, but unable to investigate further without a PCIe trace.

Regards, Eric
