AM5716: PCIe RC inbound data issue

Part Number: AM5716

I have an AM571x configured as the PCIe RC with an FPGA endpoint. I am attempting to send a small packet of data from the FPGA to the DSP L2SRAM over PCIe using bus-mastered writes. The packet is 292 bytes and requires a number of write transactions to complete, depending on the data payload size. I am encountering issues with parts of the packet not being written correctly into L2SRAM. My initial investigation leads me to believe that I may have a problem with destination address alignment.

1. With inbound data payload sizes of 64 or 128 bytes, parts of the packet are dropped when a single transaction crosses certain memory boundaries. Putting a DATA_ALIGN pragma of 128 bytes on the destination buffer address eliminates this problem.

2. With an inbound data payload size of 256 bytes (the supported maximum), I discovered that no write transaction of this size is ever successfully written into L2SRAM, regardless of which DATA_ALIGN pragma I use (up to a 512-byte alignment requirement). The maximum payload size of the EP FPGA device is 512 bytes, and I am not depleting the RC posted header or data credit pools.

According to the AM517x technical reference manual: 

"The PCIe controller maximum inbound payload size is 256 Bytes. The PCIe master port burst
maximum length is 16 words (16 x 64-bit data words = 128 Bytes per burst). Hence a 256 Byte
inbound data payload is converted by the PCIe master port to 2 max-sized bursts (128 Byte each)
towards the device L3_MAIN interconnect PCIe slave port which is 128 Byte-burst compatible."

AND

"Non-aligned bursts can be generated - bursts that are not aligned with their own size (a burst-aligned portion of a 2^N-byte aligned burst starts on a byte address multiple of 2^N, that is, a byte address with the N LSBs at '0')."

From this language, it is unclear to me what the required destination address alignment for PCIe inbound transactions actually is. Regardless, I do not understand why a 256-byte payload fails to be written at all.

Thanks for the help,

Sean

  • Please post which software and version you are using.
  • We are using RTOS Processor SDK for AM57x v4.00.00.04.
  • Hi,

    It looks like the FPGA can generate burst writes of 64, 128, 256, and 512 bytes; these come into the PCIe master port, go through L3_MAIN, and finally land in the L2 of the AM57x DSP.

    On the FPGA side, when you read from the FPGA, is the address aligned to 64 or 128 bytes? What is the RCB setting on the FPGA side?

    On the AM57x side, is the L2 cache enabled for the C66x? If it is, you need to invalidate the affected cache lines to make sure the data shows up; the data may land in several cache lines. For cache coherence, the buffer must be aligned to a multiple of the cache line size (L2 = 128 bytes, L1 = 64 bytes) when operating with the cache enabled.

    So you need to use 128-byte alignment on the DSP side if the cache is enabled and the FPGA generates 64- or 128-byte writes. This matches your observation.

    For the 256-byte maximum inbound size, we don't have an easy way to test it, as it relies on the remote device generating 256-byte read/write bursts. Our current Keystone I/II and Sitara devices can only generate 128-byte maximum outbound transactions, so when two TI devices are connected together we can only verify 128-byte bursts. As you mentioned, some FPGAs can generate even 512-byte bursts; sorry, we don't have such a setup.

    My suggestion is to use 128-byte writes from the FPGA and 128-byte alignment on the DSP side to maximize throughput. 256 bytes should work, but we don't have a setup for trial and debug.

    Regards, Eric
  • Hi,

    Our PCIe HW expert is out of the office; he will be back next week and will update here.

    Regards, Eric
  • Hi Sean,
    Just an update: We are looking at this internally but I don't have an answer for you yet. As Eric mentioned, we do not have any way to validate this in-house so it's taking a bit longer than usual.

    I'll continue to update this thread as info becomes available.
  • Thank you for the update!

  • Hi Sean,
    I apologize, but we still don't have a resolution for this. The problem remains the test case, or more precisely, that we don't have one that allows us to repro the issue.

    I'll continue to update this thread as we progress. As Eric suggests above, our recommendation for now is that you limit writes to 128B.
  • Hi Sean,
    Just an update to let you know that we think we have identified an endpoint that mimics yours. This issue is in active debug.
  • Thank you for keeping me posted. We are still very interested in the outcome here.

    Sean

  • Hi Sean,
    Still working on this. Slowed a bit by a problem with our PCIe protocol analyzer, but the replacement part has been ordered and should be here next week.
  • Hi Sean,
    Still waiting for the analyzer part. The vendor assures me it shipped Monday.
  • Hi Sean,

    Great news, our analyzer is up-and-running and I'm actively looking into your issue. Sorry for the delay.

    I have the AM57xx instrumented (acting as RC) with a LAN EP attached. The bus is configured for MPS=128B and MRRS=512B. I've attached an analyzer trace illustrating an upstream read request of 512B (128 DW) being serviced successfully (TLPs 3719, 3720, 3721, 3722):

    I can also execute bursted upstream writes at MPS without issue (TLPs 3049, 3050):

    I'm using the current Linux SDK from TI.COM.

    Please take a look at the traces and let me know if you have further questions.

  • Thank you for the update. Have you had any luck reproducing the 256-byte inbound data payload issue (MPS=256)? This is what I am most interested in, as the AM571x TRM states that it is supported, but I could not get it to work at all. 128 bytes is working fine for us at the moment, as long as the destination buffer address is properly aligned with a pragma.
  • Hi Sean,
    Sure, let me see if I can get MPS=256 running.
  • Hi Sean,

    Got an OS build with MPS=256/MRRS=512 from the SW team and recorded the resulting transfers. No issues found for either reads or writes.

    Did this answer your question?

  • Can you confirm that the payload data is actually being written into memory at the destination address, specifically for an MPS=256 bus-master write into the AM571x RC? What I was seeing in Code Composer was that the write data from the FPGA was never making it into memory. My destination address is in the L2SRAM of DSP core 0 (in the region starting at 0x40800000). For MPS=128, I see the payload data arrive successfully in the Code Composer memory browser. For MPS=256, the destination buffer remains all 0x0. I have seen nothing to indicate that the actual PCIe transactions aren't taking place. It seems more likely that there is a problem writing the payload data to system memory after the PCIe packets are received.

    I can reproduce this and pass along my findings, but it has been months since I was working on the project and will need some time to get back into it.

  • Sean,
    The application I was running would crash if the data wasn't being delivered. That said, I don't know exactly where the data is being delivered.

    I'll need our SW team to chime in. I'll ping them internally and they will post here.
  • Dave,

    This is a Linux-based test. The RC side is an AM57x running Linux, and the EP side, I believe, is a third-party PCIe Ethernet card. You ran the "ping -l size" command to generate traffic with MPS=256, correct?

    Sean,
    If the Linux host didn't receive the data (that is, the data did not actually land in SoC memory correctly), the ping test would fail. Since ping worked continuously (Dave, correct me if I am wrong), the data should be landing, but it is hard to locate in memory via JTAG since these are virtual Linux addresses.

    Regards, Eric
  • Thank you for the update.

    I suppose that we are a bit unusual in our usage of the AM571x, but we are not running Linux at all. The ARM core is simply used to boot the DSP and we never use it again. Given what I have seen, I suspect that there might be an issue accepting inbound 256-byte payload PCIe packets and writing them into the DSP's memory. As I previously stated, a JTAG probe of the DSP L2SRAM proves that 128-byte payload packets are successfully written into memory, but 256-byte ones are not. Is there any way you could modify your test to attempt to utilize the DSP?

    Sean

  • Sean,

    I think this would be very difficult. The PCIe RTOS driver on the ARM or DSP only applies a static link configuration and enables link training; it has no enumeration process and will not work with a third-party card. Even if link training passed, there is no way in the PCIe RTOS driver to incorporate third-party PCIe-USB/NIC/SATA driver code. Such drivers are normally already available in the Linux or Windows distribution, or can be updated from the internet, but there is no equivalent path for the RTOS driver, so you can't run any data traffic test like "ping". We don't have a plan to do this development work; it would only work for one particular card on RTOS.

    I am not sure whether this is a DSP vs. ARM issue or a destination memory issue:
    1. Keep running PCIe on the DSP with a 256-byte payload, but move the buffer from 0x4080_0000 to 0x8000_0000 or a similar address in DDR. Does this work?
    2. The PCIe test example we have can run on ARM/DSP/M4. You can run it directly on the ARM with JTAG (without using the ARM to boot and load the DSP) to see whether 256-byte payloads landing in DDR work or not.

    Regards, Eric