• Resolved

AM5728: PCIE Performance

Part Number: AM5728

Hello, TI Experts,

 

Our customer plans to use the AM5728 in their product.

They sent us a question about the PCIE performance of the AM5728 under TI-RTOS (PROCESSOR-SDK-RTOS-AM57X).

We found the related wiki page below for K2G (it appears to use Linux).

http://processors.wiki.ti.com/index.php/Processor_SDK_Linux_Kernel_Performance_Guide#PCIe_Driver

 

Question:

Is there a more appropriate document or website covering the PCIE performance of the AM5728 under TI-RTOS (PROCESSOR-SDK-RTOS-AM57X)?

 

If there are any differences or caveats in applying the PCIE performance information on this wiki page to the AM5728, please let us know as well.

 

Best regards,

  • Guru 50440 points
    Hi,

    The link you posted covers the AM572x PCIE operating as a root complex with a third-party 1Gb Ethernet card under Linux, and measures the throughput of that setup.

    For RTOS, we have the PCIE user guide here: software-dl.ti.com/.../Device_Drivers.html. The RTOS test is based on an AM572x----PCIE-----AM572x connection. Because the AM572x RTOS does not include any third-party PCIE driver code, there is no way to test an AM572x RC under RTOS with the same PCIE Ethernet card as the EP.

    In the AM572x----PCIE-----AM572x setup, we have PCIE + EDMA support for PCIE throughput benchmarking. The code is under: pdk_am57xx_1_0_12\packages\ti\drv\pcie\example\sample\src. If you have a two-board setup, you can run the test to benchmark it. Sorry, we didn't document the numbers in the user guide. As I recall from when we ran the test, the actual throughput is about 90% of the theoretical.

    Regards, Eric
  • In reply to lding:

    Hi,

     

    Thank you very much for your kindness.

    I really appreciate your help.

     

    We understand that the performance figures in the link are for the AM572x PCIE as a root complex.

    Our customer needs the PCIE (EP) performance of the AM5728 under TI-RTOS to decide whether to use the AM5728 in their product.

    So your test results would be helpful to the customer.

     

    Question:

    - Could you tell us, for reference, the actual figure behind the "~90% of the theoretical" you mentioned?

    - Could you tell us the test conditions?

         - EVM names for both RC and EP, sample-code file name, number of PCIE lanes (1-lane or 4-lane)

        

    We would appreciate it if you could share the details of your "AM572x----PCIE-----AM572x by TI-RTOS" test results.

    (We will also try running the sample code ourselves if you provide the details.)

     

    Best regards,

  • Guru 50440 points

    In reply to matusan:

    Hi,

    You can see the user guide at software-dl.ti.com/.../Device_Drivers.html. The AM57x SOC has two PCIE lanes and supports GEN1 or GEN2.

    The test setup is AM572x IDK EVM to AM572x IDK EVM, with a PCIE crossover cable. That EVM only has a PCIE x1 connector, so you can only test the x1 configuration. Another setup is AM571x IDK EVM to AM571x IDK EVM. That EVM has a PCIE x4 connector, so you can test the PCIE x2 configuration.

    The sample code is: C:\ti\pdk_am57xx_1_0_x\packages\ti\drv\pcie\example\sample\src\pcie_sample.c. See the function PcieExampleEdmaRC(). It measures write throughput and read throughput. You need to do a little math to convert cycles to seconds based on the CPU speed.

    The PCIE payload sizes are given in TRM 24.9.1.1, PCIe Controllers Key Features: "Maximum outbound payload size of 64 Bytes (the L3 Interconnect PCIe1/2 target ports split bursts of size >64 Bytes into multiple 64 Byte bursts). Maximum inbound payload size of 256 Bytes (internally converted to 128 Byte - bursts)".

    In theory, for GEN2X2 the throughput is: 5.0 Gbps x 2 lanes x 8/10 (bit encoding) x (64 / (64 + PCIE TLP header)). The TLP header is about 24-28 bytes, depending on whether the 4-byte CRC is added. So it is probably about 5.8 Gbps = 730 MBps. With GEN2X1 or GEN1X2, the throughput is halved to about 365 MBps.

    I recall we got 350-360 MBps using the EVM for measurement (GEN2X1).
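
The estimate above can be sketched as a small C helper. This is only an illustration of the arithmetic in this post, not code from the PDK: the function names `pcie_theoretical_gbps` and `gbps_to_mbytes_per_s` are hypothetical, the per-TLP overhead is passed in as a parameter because it is only estimated at 24-28 bytes, and 1 MB is assumed to mean 1,000,000 bytes.

```c
#include <assert.h>

/*
 * Theoretical PCIe payload throughput in Gbps (illustrative helper).
 *   lane_rate_gbps : raw rate per lane (2.5 for GEN1, 5.0 for GEN2)
 *   lanes          : number of lanes in the link
 *   payload_bytes  : payload per TLP (64 for AM57x outbound, per the TRM quote)
 *   overhead_bytes : per-TLP overhead (estimated at 24-28 bytes in this thread)
 */
double pcie_theoretical_gbps(double lane_rate_gbps, unsigned lanes,
                             unsigned payload_bytes, unsigned overhead_bytes)
{
    assert(payload_bytes > 0);

    /* GEN1 and GEN2 links use 8b/10b encoding: 8 data bits per 10 line bits. */
    const double encoding = 8.0 / 10.0;

    /* Fraction of each TLP that is payload rather than overhead. */
    const double efficiency =
        (double)payload_bytes / (double)(payload_bytes + overhead_bytes);

    return lane_rate_gbps * (double)lanes * encoding * efficiency;
}

/* Convert Gbps to MB/s, assuming 1 MB = 1,000,000 bytes. */
double gbps_to_mbytes_per_s(double gbps)
{
    return gbps * 1000.0 / 8.0;
}
```

With a 24-byte overhead, `gbps_to_mbytes_per_s(pcie_theoretical_gbps(5.0, 1, 64, 24))` comes out at roughly 364 MB/s for GEN2X1, and two lanes give roughly 727 MB/s, in line with the rounded 365 and 730 MBps figures above.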

    Regards, Eric
  • In reply to lding:

    Hi,

    Thank you very much for your detailed explanation.

    I really appreciate your help.

    Following your guidance, I was able to run "pcie_sample.c" in the GEN2X1 configuration.

     - AM574xIDK: as RC

     - AM572xIDK: as EP

    Thank you!

    From your explanation, I understand the following:

       - In theory, the GEN2x1 throughput is 365 MBps.

       - TI measured a GEN2x1 throughput of 350-360 MBps on the EVM.

    Question:

     - Could you tell us how to calculate the GEN2x1 throughput measured on the EVM?

        We would like to share the console log of the RC. (Please refer to the attached PDF.)

    We would appreciate it if you could tell us how to calculate the PCIE throughput from the console log.

    Best regards,

    log.pdf

  • Guru 50440 points

    In reply to matusan:

    Hi,

    I don't know whether you ran on the A15, the C66x, or the M4. I will assume you ran it on the A15 at the default 1.0 GHz (this is set up by the GEL file).
    Then the EDMA write of 65536 bytes takes 184931 cycles.

    X = 65536/1048576 = 0.0625 MB
    Y = 184931/1,000,000,000 = 0.000184931 second
    Throughput you obtained: X/Y = 337.96 MB/s.

    In theory: 5.0 Gbps * 8/10 (encoding) * (64/(64+24)) = 2.9091 Gbps = 2909.1 Mbps. Dividing by (1.048576 * 8 bits/byte) gives 346.8 MB/s.

    Please note that you should use either 1M = 1048576 or 1M = 1000000 consistently in the math, so that the comparison between the number you obtained and the theoretical number is fair.

    Your number is pretty good.
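
The conversion above can be sketched in C as follows. This is a minimal illustration of the arithmetic only, not code from pcie_sample.c: the names `measured_mib_per_s` and `theoretical_mib_per_s` are hypothetical, the cycle count is assumed to come from a counter running at the CPU clock (1.0 GHz here), and 1 MB is taken as 1048576 bytes on both sides so the two numbers are comparable.

```c
#include <assert.h>
#include <stdint.h>

/* Use the same definition of 1 MB for measured and theoretical values. */
#define BYTES_PER_MB 1048576.0

/* Measured throughput in MB/s from a byte count, a cycle count and the
 * frequency (in Hz) of the clock that the cycle counter runs on. */
double measured_mib_per_s(uint64_t bytes, uint64_t cycles, double cpu_hz)
{
    assert(cycles > 0 && cpu_hz > 0.0);

    const double seconds = (double)cycles / cpu_hz;   /* Y in the post above */
    const double mbytes  = (double)bytes / BYTES_PER_MB; /* X in the post above */

    return mbytes / seconds;
}

/* Convert a theoretical rate in Gbps (1 Gbps = 1e9 bits/s) to MB/s
 * using the same 1048576-byte MB as the measured value. */
double theoretical_mib_per_s(double gbps)
{
    return gbps * 1e9 / 8.0 / BYTES_PER_MB;
}
```

For the log discussed here, `measured_mib_per_s(65536, 184931, 1e9)` gives about 337.96 MB/s, and `theoretical_mib_per_s(5.0 * 0.8 * 64.0 / 88.0)` gives about 346.8 MB/s. If the test ran on a different core or clock, `cpu_hz` would need to change accordingly.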

    Regards, Eric
  • In reply to lding:

    Hi,

    Thank you very much for your detailed explanation.
    This information is very helpful!
    I understood the calculation.

    Best regards,
  • Part Number: AM5728

    Hello, TI Experts,

     

    Our customer sent us an additional question about debugging "pcie_sample.c" when it is booted from an SD card.

    http://e2e.ti.com/support/processors/f/791/p/741478/2742633#2742633

     

    We were also able to run "pcie_sample.c" from an SD card using the following procedure:

    - Prepare a FAT32 SD card and insert it into a Windows PC.

    - Copy MLO (C:\ti\pdk_am57xx_1_0_11\packages\ti\boot\sbl\binary\evmAM572x\mmcsd\bin\MLO) to the SD card.

    - Copy app (C:\ti\pdk_am57xx_1_0_11\packages\MyExampleProjects\PCIE_idkAM572x_wSoCFile_armExampleProject\Debug\app) to the SD card.

    - Insert the SD card into the TMDXIDK5728 and boot.

     

    Question:

       Is there any way to debug this app on the SD card with CCS, for example using breakpoints?

     

    We would appreciate it if you could tell us the recommended way to debug with CCS a program that the EVM has booted from the SD card.

     

    Best regards,

  • Guru 50440 points

    In reply to matusan:

    Hi,

    The typical way I debug a program running from bootable media:

    1. Add a while loop at the beginning of the program to be debugged, like:

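    /* volatile: the flag is re-read from memory on every pass,
       so a change made from the debugger is seen by the loop */
    volatile unsigned int flag = 1;

    int main(void) {

        /* spin here until the debugger sets flag to 0 */
        while (flag);

        /* the original code below */
        ....
    }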

    2. Use CCS to connect to the A15 core WITHOUT the GEL file (because MLO has already initialized the board) and load the symbols. You should see the program spinning at the while(flag) line above.

    3. Use the CCS memory window to change this flag from 1 to 0. Then you can step through the code to debug it.

    In your case, if you use the AM572x IDK EVM, you should use the MLO for the AM572x IDK as well, not the one for the GP EVM.

    Regards, Eric
  • In reply to lding:

    Hi,

    Thank you for your detailed explanation.
    This information is very helpful!

    We were able to debug the app on the SD card with CCS.
    We are also using the MLO for the AM572x IDK.

    Best regards
  • Part Number: AM5728

    Hello, TI Experts,

     

    Our customer sent us additional questions following on from the E2E thread below.

    https://e2e.ti.com/support/processors/f/791/p/741478/2747658

     

    They would like to know which memory areas are used as the source and destination data locations in the PCIE data transfer demo.

    (DDR3? or OCMC_RAM?)

     

    Question:

    For the read demo from the RC (as in the console log below):

    1: Could you tell us the memory address on the EP that the data is read from?

    2: Could you tell us the memory address on the RC that the data is written to?

    3: Could you tell us which lines in the source code of the "pcie_sample.c" sample project we should refer to for the Src/Dst memory addresses?

    4: For the write demo, we understand that the Src/Dst memory areas are the same as in the read demo. Is this understanding correct?

     

    Best regards,