C6670 PCIe RC does not receive 256B writes from EP

Other Parts Discussed in Thread: 66AK2E05

Hello!

I am working on a PCIe link between a C6670 DSP and an FPGA. My FPGA design contains a bus-mastering DMA engine that is capable of writing to the DSP. As far as I know, the DSP can handle inbound TLPs with 256B of payload and outbound TLPs with 128B of payload.

I have tried transmitting a 128B TLP from the FPGA to the DSP, and it arrives successfully in the destination buffer. However, if I send a 256B TLP, it does not appear in the destination buffer at all.

The PCIe destination address is 0x90000000, translated to a memory buffer. I can see the TLP appearing on the transaction interface of the FPGA PCIe block, and its fields look legitimate. Can anyone confirm 256B TLP reception on the C6670?

Please suggest: is there any way to debug why that packet was not received?

Another point is the configuration of the PCIe SS. I am reading the DEVICE_CAP and DEV_STAT_CTRL registers.

The former, DEVICE_CAP at 0x21801074, reads as 0x00008001. Its bits [2:0] define MAX_PAYLD_SZ, and there is a value of 1. This value corresponds to a maximum supported payload size of 256B according to the PCIe spec.

Next, the DEV_STAT_CTRL register at 0x21801078 reads as 0x0000281F. Bits [7:5] define the max payload, and that is 1 again, corresponding to 256B. However, bits [14:12] have a value of 2, and they define the max read request size to be 512B. Isn't there a contradiction?

Thanks in advance.

  • Hi,

    The C66x PCIe has the capability to support a maximum payload size of 128 bytes for outbound transfers and 256 bytes for inbound transfers.

    Are you using EDMA for the PCIe transfer? The EDMA in C66x supports a maximum 128-byte data burst size (for a specific transfer queue), which means the RC can write or read at most 128 bytes to/from the EP.

    Thanks,
  • Hello,

    I am making the transfers with the DMA engine on the FPGA side. It has bus-master capability as well. Right now I cannot use the EDMA of the DSP because there is no matching hardware on the other side. I would appreciate a hint on how to debug this.

    And another point: could you please comment on the values of the DEV_STAT_CTRL register?

  • Hi,

    The DEV_STAT_CTRL register value of 0x0000281F corresponds to a value of 0 in bits [7:5] (128 bytes), doesn't it?
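    As a quick sanity check, the fields can be decoded with a few shifts. The field positions are the ones quoted in this thread, the 128 << n size encoding is from the PCIe base specification, and the register values are the ones read from 0x21801074/0x21801078 above:

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* PCIe encodes payload and read-request sizes as 128 << n */
    static unsigned decode_payload_bytes(unsigned enc) { return 128u << enc; }

    int main(void)
    {
        uint32_t device_cap    = 0x00008001u; /* read from DEVICE_CAP,    0x21801074 */
        uint32_t dev_stat_ctrl = 0x0000281Fu; /* read from DEV_STAT_CTRL, 0x21801078 */

        unsigned supported = (device_cap    >> 0)  & 0x7u; /* MAX_PAYLD_SZ, bits [2:0] */
        unsigned payload   = (dev_stat_ctrl >> 5)  & 0x7u; /* MAX_PAYLD,    bits [7:5] */
        unsigned read_req  = (dev_stat_ctrl >> 12) & 0x7u; /* MAX_REQ_SZ,   bits [14:12] */

        assert(decode_payload_bytes(supported) == 256); /* supported payload: 256B */
        assert(decode_payload_bytes(payload)   == 128); /* programmed payload: 128B */
        assert(decode_payload_bytes(read_req)  == 512); /* max read request: 512B */
        return 0;
    }
    ```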

    Please note that there is a performance problem for inbound transfers in keystone devices when the max. payload value in DEV_STAT_CTRL is increased to 256 bytes:
    https://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/158004

    Ralf

  • Hi Ralf,
    You're right, MAX_PAYLD in DEV_STAT_CTRL[7:5] is indeed zero, corresponding to 128B.
    And about MAX_REQ_SZ in DEV_STAT_CTRL[14:12], which is 2 => 512B: does that mean that with the default setup the DSP is allowed to make read requests for 512B at a time, but completions are still expected to be broken into 128B or 256B packets? Not to mention that clause 1.3 (Features) of SPRUGS6D states:

    Maximum remote read request size of 256 bytes

    And thanks for pointing out that performance issue topic. It is three years old already, and we still cannot read that precaution in the manual :-( Actually, it took me effort and fabric space to increase the outbound payload in the FPGA. Now I know it was not worth it.

  • Well, the ultimate cause of the initial problem is that I had to program the 256B supported payload into DEV_STAT_CTRL[7:5], which I had not done.
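    For anyone hitting the same thing, the fix boils down to a read-modify-write of DEV_STAT_CTRL[7:5]. Here is a sketch; the register address is the one from this thread, the encoding 1 => 256B follows the PCIe 128 << n scheme, and the actual MMIO access is shown only as a comment since that part is target-specific:

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define DEV_STAT_CTRL_ADDR 0x21801078u /* address as read in this thread */
    #define PCIE_ENC_256B      1u          /* PCIe encoding: size = 128 << n */

    /* Replace the MAX_PAYLD field, bits [7:5], leaving everything else intact */
    static uint32_t set_max_payload(uint32_t reg, uint32_t enc)
    {
        return (reg & ~(0x7u << 5)) | ((enc & 0x7u) << 5);
    }

    int main(void)
    {
        /* On target this would be a read-modify-write of the memory-mapped
         * register, e.g.:
         *   volatile uint32_t *r = (volatile uint32_t *)DEV_STAT_CTRL_ADDR;
         *   *r = set_max_payload(*r, PCIE_ENC_256B);
         * Here we only check the bit arithmetic against the value from the thread. */
        assert(set_max_payload(0x0000281Fu, PCIE_ENC_256B) == 0x0000283Fu);
        return 0;
    }
    ```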
  • Hi,

    I think you are right, completions for read requests >= 512B have to be broken into smaller packets.

    I also agree with your disappointment; this problem still seems to be undocumented. TI actually recommends increasing the payload size for inbound transfers in sprabk5a, "Throughput Performance Guide":
    "For inbound data transfers, the PCIe peripheral is capable of handling payload sizes larger than 128 bytes, which can further reduce overhead and increase throughput performance."

    Every user has to waste time figuring out that this doesn't work. I think this issue should also be listed in the Silicon Errata.

    Unfortunately, you have already marked this thread as answered, and nobody at TI will read it again...

    Ralf

  • Rejected my answer in the hope that the thread might get some attention.
  • Hi Ralf,

    Thanks for your response and your time. I will check with my team and update the information on TI PCIe related documents or Silicon Errata document.

    Thanks,
  • Hi,

    We are trying something similar. We have connected a Spartan-6 FPGA to a 66AK2E05 (Keystone SoC). The K2 is configured as RP and the PCIe in the FPGA as EP. We are able to do outbound data transfers from the K2 to the FPGA, but we are not able to do the inbound write operation from the FPGA to the K2 (which you have achieved successfully).

    Though I don't have much info on the software side, as far as I know we are configuring:
    IB_START_lo - 0x80000000
    IB_OFFSET - 0x80000000
    and an address register in the FPGA to 0x80000000.

    Can you please let me know the IB register configuration you used? And what was the destination buffer location (was it DDR3 or some other memory space)?

    Best Regards
    Guru
  • Hi,

    From memory, we have to configure: 1) the BAR index for IB translation; 2) the IB LO/HI addresses; 3) the offset; 4) the IB translation region. Here is an excerpt from our code:

    // Inbound Region 0
    ibCfg.ibBar         = PCIE_BAR_IDX;      // Match the BAR that was configured above
    ibCfg.ibStartAddrLo = PCIE_IB_R0_LO_ADDR; // PCIe address the EP targets, low 32 bits
    ibCfg.ibStartAddrHi = PCIE_IB_R0_HI_ADDR; // High 32 bits (zero for 32-bit addressing)
    ibCfg.ibOffsetAddr  = (u_int32) Convert_coreLocaltoGlobalAddr ( (u_int32) pcie_l2_buf.buf ); // Global address of the receive buffer
    ibCfg.region        = PCIE_IB_REGION_0;
    
    pcie_ib_trans_cfg(handle, &ibCfg);
    

    You can easily guess the values of those macro constants, and the address conversion function is copied from the PDK examples. It produces the global address for a given pointer.

    What I can guess is that your IB_OFFSET setting is not correct. You have to put the true address of the receiving buffer there. Keep in mind that three types of addresses are used with PCIe.

    The first is the PCIe data memory window. On the C6670, which is KeyStone I, it is at 0x6000_0000. It can be used directly, and many people do so without translation.

    The second is the somewhat virtual PCIe address space. You may think of accesses to the PCIe data window as being translated into PCIe address space. Many examples use the 0x9000_0000 address for that. This range is not used by any hardware on my device, so it is OK to use it for virtual mapping. Thus, if one defines an OB region (here I am speaking about OB!), an access to the 0x6000_0000 window gets translated to the 0x9000_0000 window according to the OB translation regions.

    So see the sequence: you take some buffer or variable, and that implicitly defines the third address. Copying from this DSP buffer to PCIe space is then like copying from, say, your L2SRAM or DDR3 to the 0x6000_0000 window. The PCIe hardware translates that to the 0x9000_0000 window, so the final receiver gets the data targeted at 0x9000_0000. Beware: many people do not use OB translation at all; I described it just to give an idea of how it works.
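    Conceptually, the OB mapping described above is just base-plus-offset. Here is a toy sketch using the example windows from this post; real hardware divides the data window into fixed-size OB regions with per-region bases, so this only illustrates the idea, not the exact logic:

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Toy model: a CPU access inside the 0x6000_0000 data window comes out
     * on PCIe at the OB region base plus the same offset.
     * 0x60000000 / 0x90000000 are the example addresses from this post. */
    static uint32_t ob_translate(uint32_t cpu_addr, uint32_t ob_base)
    {
        const uint32_t window = 0x60000000u;
        return ob_base + (cpu_addr - window); /* offset within the window is preserved */
    }

    int main(void)
    {
        assert(ob_translate(0x60000000u, 0x90000000u) == 0x90000000u);
        assert(ob_translate(0x60001234u, 0x90000000u) == 0x90001234u);
        return 0;
    }
    ```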

    The transfer in the opposite direction involves something similar in principle, though different in detail. When your EP writes data towards the DSP, it writes to PCIe address space. For instance, that could still be somewhere in the 0x9000_0000 window. However, this time you need to set up the DSP's BAR in such a way that the FPGA's writes match the target BAR. If the BAR accepts the address, it next goes through the IB translation logic. Now we have to map the PCIe address to the real receive buffer. If you have some buffer in L2SRAM, you have to set IB_OFFSET to that buffer's start address, converted to a global address. If your receive buffer is in DDR3 or MSMC SRAM, then its address is already global.

    So, once again: assuming your FPGA is writing to 0x8000_0000 and you want those writes to land in some int buf[MANY], you have to set IB_OFFSET to the (global) address of buf.
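    The local-to-global conversion mentioned above can be sketched like this. It is a simplified version of what the PDK helper does, assuming the KeyStone I memory map, where core-local L2 at 0x0080_0000 aliases to the global address 0x1N80_0000 for core N, while DDR3 and MSMC addresses are already global:

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Simplified local-to-global conversion for KeyStone I (assumption: the
     * core-local range is the one with a zero top byte; everything else is
     * treated as already global, e.g. DDR3 at 0x80000000). */
    static uint32_t local_to_global(uint32_t addr, uint32_t core_num)
    {
        if ((addr & 0xFF000000u) == 0x00000000u)          /* core-local L1/L2 range */
            return addr | (0x10000000u + (core_num << 24)); /* alias for this core */
        return addr;                                       /* already global */
    }

    int main(void)
    {
        assert(local_to_global(0x00800000u, 0) == 0x10800000u); /* core 0 L2 base */
        assert(local_to_global(0x00812345u, 2) == 0x12812345u); /* core 2 L2 buffer */
        assert(local_to_global(0x80000000u, 0) == 0x80000000u); /* DDR3, unchanged */
        return 0;
    }
    ```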

    Hope this helps.

  • Hi,

    Thanks for the response.

    We were able to solve the issue. We could see the data landing in the DDR area, but there is still some ambiguity w.r.t. physical and virtual addresses in the K2. Apart from this, there was an issue with the FPGA logic where the address in the TLP was being sent incorrectly. Once this was corrected, it started working.

    Will post the details once it is concluded.

    Thank You

    Guru