TMS320C6678: Maximum size of a single EDMA transfer?

Wolfgang Scholz

Part Number: TMS320C6678

Hello,

I am using EDMA for larger data transfers of variable size from DDR3 to DDR3, but also from PCIe mapped memory to DDR3.

By using the CSL, I try to find reasonable values for the three available EDMA loops. Please correct me if I am thinking the wrong way.

My scenario: the largest possible single EDMA transfer with power-of-two sizes
Without chaining or linking, the largest possible single transfer should be AB-synchronized. In this scenario, I am lucky that all my sizes are powers of two:
aCnt = 0x4000;
bCnt = 0x8000;
aCntbCnt = CSL_EDMA3_CNT_MAKE(aCnt, bCnt);
srcDstBidx = CSL_EDMA3_BIDX_MAKE(aCnt, aCnt);
cCnt = 1;
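The counters are unsigned 16-bit values. For aCnt, I also have to make sure that the "stride" (the B index) fits into a signed 16-bit value, so 0x4000 is the largest power of two I can use.
The resulting transfer would be 512 MiB (about 537 MB), which is enough for my scenario.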
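
A minimal sketch of the size arithmetic above, in plain C without any CSL dependency. The helper names `fits_signed16` and `ab_sync_bytes` are hypothetical and only model the field limits described in the post (unsigned 16-bit counts, signed 16-bit B index, cCnt = 1); they do not program the EDMA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a stride fits a signed 16-bit index field. */
static bool fits_signed16(int64_t stride)
{
    return stride >= INT16_MIN && stride <= INT16_MAX;
}

/* Hypothetical helper: bytes moved by one AB-synchronized transfer with
 * cCnt = 1 and contiguous arrays (B index == aCnt).
 * Returns 0 if one of the field limits is violated. */
static uint64_t ab_sync_bytes(uint32_t a_cnt, uint32_t b_cnt)
{
    if (a_cnt == 0 || a_cnt > UINT16_MAX) return 0;  /* aCnt: unsigned 16 bit */
    if (b_cnt == 0 || b_cnt > UINT16_MAX) return 0;  /* bCnt: unsigned 16 bit */
    if (!fits_signed16((int64_t)a_cnt))   return 0;  /* B index = aCnt */
    return (uint64_t)a_cnt * b_cnt;
}
```

With the values from the post, `ab_sync_bytes(0x4000, 0x8000)` gives 536870912 bytes, while an aCnt of 0x8000 is rejected because the matching B index no longer fits into 16 signed bits.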

I was able to test up to 8 MB from DDR3 to DDR3 successfully, but copying from PCIe I seem to encounter a limit at 1 MB. I don't yet know what is limiting this.

Are there reasons not to pack all data into a single transfer?

I have thought about using a cCnt > 1, but I am still working on that.

over 6 years ago

0 Wolfgang Scholz over 6 years ago

Prodigy 110 points

My current working thesis is that copying data via EDMA from PCIe mapped memory does not work for single transfers > 1 MB.
This is why I am setting up a chained transfer using all three EDMA loops now.

I am using cCnt now, so I also need to set the stride in srcDstCidx, which is a 16-bit signed integer. This limits the size of each AB-synchronized transfer (which is now only a partial transfer) to 16 KB, the largest power of two that still fits.
16 KB are easy to reach by the bCnt alone, so I set the aCnt to a very small value of 4 bytes.
aCnt = 0x4;
bCnt = 0x1000;
aCntbCnt = CSL_EDMA3_CNT_MAKE(aCnt, bCnt);
srcDstBidx = CSL_EDMA3_BIDX_MAKE(aCnt, aCnt);
cCnt = 0xFFFF;
srcDstCidx = CSL_EDMA3_BIDX_MAKE(aCnt*bCnt , aCnt*bCnt);

The resulting transfer would be about 1.07 GB (0xFFFF frames of 16 KB, just under 1 GiB).
For practical reasons I make the bCnt and cCnt adjust to the transfer size. I omit the arithmetic code here.
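
The omitted arithmetic could look roughly like the following sketch. It is not the code from the post: the type `edma_plan_t`, the function `plan_transfer`, and the idea of reporting leftover bytes as a separate "tail" are assumptions made for illustration. It keeps aCnt at 4 bytes, caps a frame at 16 KB so the C index fits into 16 signed bits, and derives bCnt and cCnt from the requested size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define A_CNT_BYTES  4u       /* bytes per array, as in the post */
#define B_CNT_MAX    0x1000u  /* arrays per frame: 4 * 0x1000 = 16 KB */
#define C_CNT_MAX    0xFFFFu  /* cCnt is an unsigned 16-bit field */

/* Hypothetical plan for one PaRAM set plus an optional tail transfer. */
typedef struct {
    uint16_t a_cnt;
    uint16_t b_cnt;
    uint16_t c_cnt;
    int16_t  c_idx;       /* frame stride in bytes: a_cnt * b_cnt */
    uint32_t tail_bytes;  /* leftover bytes that need a separate transfer */
} edma_plan_t;

/* Returns false if the size is smaller than one array or too large for a
 * single PaRAM set; in that case *plan must not be used. */
static bool plan_transfer(uint64_t bytes, edma_plan_t *plan)
{
    const uint32_t frame_bytes = A_CNT_BYTES * B_CNT_MAX;  /* 16384 */

    if (plan == NULL || bytes < A_CNT_BYTES)
        return false;

    plan->a_cnt = (uint16_t)A_CNT_BYTES;
    if (bytes < frame_bytes) {
        /* Everything fits into one frame: shrink bCnt, keep cCnt = 1. */
        plan->b_cnt = (uint16_t)(bytes / A_CNT_BYTES);
        plan->c_cnt = 1;
        plan->tail_bytes = (uint32_t)(bytes % A_CNT_BYTES);
    } else {
        /* Full 16 KB frames: grow cCnt, report the remainder as tail. */
        uint64_t frames = bytes / frame_bytes;
        if (frames > C_CNT_MAX)
            return false;
        plan->b_cnt = (uint16_t)B_CNT_MAX;
        plan->c_cnt = (uint16_t)frames;
        plan->tail_bytes = (uint32_t)(bytes % frame_bytes);
    }
    plan->c_idx = (int16_t)(plan->a_cnt * plan->b_cnt);  /* at most 16384 */
    return true;
}
```

For an 8 MB request this yields bCnt = 0x1000 and cCnt = 512 with no tail. Sizes that are not a multiple of the frame size leave a tail that would have to go into a second, smaller transfer.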

I adjusted the PaRAM setup:
option = CSL_EDMA3_OPT_MAKE( CSL_EDMA3_ITCCH_EN, CSL_EDMA3_TCCH_DIS,\
CSL_EDMA3_ITCINT_DIS, CSL_EDMA3_TCINT_EN, channelNum, CSL_EDMA3_TCC_NORMAL,\
CSL_EDMA3_FIFOWIDTH_NONE, CSL_EDMA3_STATIC_DIS, CSL_EDMA3_SYNC_AB,\
CSL_EDMA3_ADDRMODE_INCR, CSL_EDMA3_ADDRMODE_INCR );

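CSL_EDMA3_ITCCH_EN enables intermediate transfer completion chaining: after every iteration of the C loop except the last one, the completion code (here channelNum) triggers the channel again, so the next partial transfer starts without CPU intervention.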

CSL_EDMA3_STATIC_DIS: It took me some time to find out that a chained transfer needs a non-static PaRAM set; otherwise the transfer gets stuck after the first intermediate transfer.
The EDMA KeyStone User Guide sprugs5b says about the STATIC field:
"0h = Set is not static. The PaRAM set is updated or linked after a TR is submitted. A value of 0 should be used for DMA channels and for non-final transfers in a linked list of QDMA transfers.
1h = Set is static. The PaRAM set is not updated or linked after a TR is submitted. A value of 1 should be used for isolated QDMA transfers or for the final transfer in a linked list of QDMA transfers."

The result works fine for transfers from DDR3 to DDR3 again, but from PCIe the transfers still get stuck at > 1 MB.

Thank you for any suggestions.

0 lding over 6 years ago in reply to Wolfgang Scholz

TI__Guru* 95265 points

Hi,

The C6678 has several EDMA Transfer Controllers (TC); some support a default burst size (DBS) of 64 bytes, others a DBS of 128 bytes. Any larger transfer is broken down by the system into smaller packets of 64 or 128 bytes, depending on the DBS limitation. Of course, this is hidden from the user.

Typically you can set CCNT = 1, ACNT = 128, and BCNT to a large number below 65536 for an AB-synchronized transfer. I am not aware of any 1 MB size limit for PCIe + EDMA. There is a PCIe test example for the C6678 with EDMA support: software-dl.ti.com/.../Device_Drivers.html You can change the transfer size there to make it larger than 1 MB.

We also have a PCIe + EDMA transfer example where a Linux PC is the host and we transferred 4 MB: pdk_c667x_2_0_X\packages\ti\boot\examples\pcie\linux_host_loader

Regards, Eric

0 Wolfgang Scholz over 6 years ago in reply to lding

Prodigy 110 points

Thank you, Eric.

I guess you are right that the problem of the 1MB transaction limit lies outside of the EDMA. I am investigating this.

I will report as soon as I have found something.

0 Wolfgang Scholz over 6 years ago in reply to Wolfgang Scholz

Prodigy 110 points

Eric,

comparing my code for PCIe setup with the examples you provided, my attention was drawn to the following lines of code:

From pcie_sample.c:
/* Only required for v0 hw */
obSize.size = pcie_OB_SIZE_8MB;

My code used pcie_OB_SIZE_1MB.

The outbound size is relevant for the C6678, as it has PCIe hardware revision 0:
"There are three revisions of the pcie hardware. The first, v0, in KeyStone devices (C66x, K2x). The second, v1, is in AM57xx devices. The third, v2, is in AM65xx devices." -
software-dl.ti.com/.../Device_Drivers.html

Changing the obSize.size to 8MB solved the problem for me.

While there are actually 32 outbound translation regions of 1MB, the reason why it did not work may be this:
"If a transaction is large enough that it goes past the address translation region, unspecified behavior may occur. [...] So, a memory write, for example, will not automatically go to the next translation region if it starts in the previous one and is bigger than the remaining size in the starting translation region." - sprugs6d PCIe user guide 2.7.1.1

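If the boundary rule quoted above is what matters, the check reduces to simple address arithmetic. The following is a sketch under that assumption only: `crosses_region` is a hypothetical helper, offsets are taken relative to the start of the outbound address window, and nothing here models what the PCIe hardware actually does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the byte range [offset, offset + len)
 * touches more than one translation region of region_size bytes.
 * Offsets are relative to the start of the outbound window. */
static bool crosses_region(uint64_t offset, uint64_t len, uint64_t region_size)
{
    if (len == 0 || region_size == 0)
        return false;  /* nothing to transfer, or no region size given */
    return (offset / region_size) != ((offset + len - 1) / region_size);
}
```

Under this model, a 2 MB transfer starting at offset 0 spans two regions of 1 MB but stays inside a single region of 8 MB.
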
0 Victor Kazmirenko over 6 years ago in reply to Wolfgang Scholz

Guru 13202 points

Hello!

Nice to hear you found a solution.

Meanwhile, I'd like to point out that, according to the PCIe spec, transactions are not allowed to cross a 4 KB boundary. Since a 1 MB translation region is aligned to 4 KB, no transaction can cross a translation region boundary, so the point of 2.7.1.1 escapes me.

0 Wolfgang Scholz over 6 years ago in reply to Victor Kazmirenko

Prodigy 110 points

Hi rrlagic,

concerning sprugs6d 2.7.1.1, I'm not entirely sure that the word "transaction" always means the same thing throughout the document. EDMA transactions are split up internally, just as Eric wrote. I suppose that the EDMA Transfer Controller's default burst size complies with the 4KB PCIe transaction boundary you mention and it is the sub-transactions' size that is relevant here.

Staying inside an address translation region boundary seems to be an additional obligation to me - and it has different constraints.
As an indication, I tried splitting my transaction into separate EDMA transfer calls, each matching the size of a single translation region, and did not succeed. The first call completed, while the second call got stuck without signaling the interrupt, even though no boundary was crossed. Thus, I don't know if 2.7.1.1 really applies here. pcie_OB_SIZE_8MB did.
