AM5728: PCIE Performance

Part Number: AM5728

Hello, TI Experts,

 

Our customer plans to use the AM5728 in their product.

They sent us a question about the PCIE performance of the AM5728 under TI-RTOS (PROCESSOR-SDK-RTOS-AM57X).

We found the related wiki page below for K2G (it appears to use Linux).

http://processors.wiki.ti.com/index.php/Processor_SDK_Linux_Kernel_Performance_Guide#PCIe_Driver

 

Question:

Is there a more appropriate document or website covering the PCIE performance of the AM5728 under TI-RTOS (PROCESSOR-SDK-RTOS-AM57X)?

 

If there are any differences or caveats when applying the PCIE performance figures on this wiki page to the AM5728, please let us know as well.

 

Best regards,

  • Hi,

    The link you showed covers the AM572x PCIE as a root complex, working with a third-party 1Gb Ethernet card under Linux to measure throughput.

    For RTOS, we have the PCIE user guide here: software-dl.ti.com/.../Device_Drivers.html. The RTOS test is based on an AM572x----PCIE-----AM572x connection. As the AM572x RTOS doesn't include any third-party PCIE driver code, there is no way to test an AM572x RC under RTOS with the same PCIE Ethernet card as EP.

    In the AM572x----PCIE-----AM572x setup, we have PCIE + EDMA support for PCIE throughput benchmarking. The code is under: pdk_am57xx_1_0_12\packages\ti\drv\pcie\example\sample\src. If you have a two-board setup, you can run the test to benchmark it. Sorry, we didn't document the numbers in the user guide. I recall we ran the test before, and the actual throughput was about 90% of the theoretical.

    Regards, Eric
  • Hi,

     

    Thank you very much for your kindness.

    I really appreciate your help.

     

    We understand that the performance in the link is for the AM572x PCIE as a root complex.

    Our customer needs the PCIE (EP) performance of the AM5728 under TI-RTOS in order to decide whether to use the AM5728 in their product.

    So your test result should be helpful for the customer.

     

    Question:

    - Could you tell us the actual value behind the "~90% of the theoretical" figure you mentioned, for reference?

    - Could you tell us the test conditions?

         - EVM names for both RC and EP, sample-code file name, number of PCIE lanes (1-lane or 4-lane)

        

    We would appreciate it if you could share the details of your test results for "AM572x----PCIE-----AM572x by TI-RTOS".

    (We will also try to run the sample code if you provide the details.)

     

    Best regards,

  • Hi,

    You can see the user guide at software-dl.ti.com/.../Device_Drivers.html. The AM57x SOC has two PCIE lanes and supports GEN1 or GEN2.

    The test setup is AM572x IDK EVM to AM572x IDK EVM, with a PCIE crossover cable. That EVM only has a PCIE x1 connector, so you can only test the x1 configuration. Another setup is AM571x IDK EVM to AM571x IDK EVM; that EVM has a PCIE x4 connector, so you can test the x2 configuration.

    The sample code is: C:\ti\pdk_am57xx_1_0_x\packages\ti\drv\pcie\example\sample\src\pcie_sample.c. See the function PcieExampleEdmaRC(). It measures both write throughput and read throughput. You need to do a little math to convert cycles to seconds based on the CPU speed.

    The PCIE payload sizes are: "Maximum outbound payload size of 64 Bytes (the L3 Interconnect PCIe1/2 target ports split bursts of size >64 Bytes into multiple 64 Byte bursts). Maximum inbound payload size of 256 Bytes (internally converted to 128 Byte - bursts)", TRM 24.9.1.1 PCIe Controllers Key Features.

    In theory, for GEN2 x2 the throughput is: 5.0 Gbps x 2 lanes x 8/10 (encoding) x (64 / (64 + PCIE TLP header)). The TLP header is about 24-28 bytes, depending on whether the 4-byte CRC is added. With 24 bytes, that gives about 5.8 Gbps = 730 MBps. With GEN2 x1 or GEN1 x2, the throughput is halved to 365 MBps.

    I recall we got 350-360 MBps using the EVM for measurement (GEN2 x1).
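
    As a minimal sketch of the estimate above (not taken from the PDK), the C helper below applies the same formula: line rate x lanes x 8/10 encoding x payload / (payload + overhead). The function name pcie_theoretical_mbps and its parameters are hypothetical, and 1 MB is taken here as 1,000,000 bytes.

```c
#include <assert.h>

/*
 * Estimated PCIE payload throughput in MB/s (1 MB = 1,000,000 bytes).
 *   line_rate_gbps : raw rate per lane (2.5 for GEN1, 5.0 for GEN2)
 *   lanes          : number of lanes in the link
 *   payload_bytes  : payload carried per TLP (64 for the outbound case quoted above)
 *   overhead_bytes : per-TLP overhead (24 to 28 bytes in the estimate above)
 * Hypothetical helper for illustration only.
 */
static double pcie_theoretical_mbps(double line_rate_gbps, unsigned lanes,
                                    unsigned payload_bytes, unsigned overhead_bytes)
{
    assert(lanes > 0 && payload_bytes > 0);

    double raw_bps  = line_rate_gbps * 1e9 * lanes;      /* bits on the wire   */
    double data_bps = raw_bps * 8.0 / 10.0;              /* after 8b/10b       */
    double eff      = (double)payload_bytes /
                      (double)(payload_bytes + overhead_bytes);

    return data_bps * eff / 8.0 / 1e6;                   /* bits -> bytes -> MB */
}
```

    With a 24-byte overhead, pcie_theoretical_mbps(5.0, 2, 64, 24) comes to roughly 727 MB/s and the x1 case to roughly 364 MB/s, in line with the rounded 730 and 365 figures. A 28-byte overhead lowers the x1 estimate to about 348 MB/s.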

    Regards, Eric
  • Hi,

    Thank you very much for your detailed explanation.

    I really appreciate your help.

    I was able to run "pcie_sample.c" successfully in the GEN2 x1 configuration by following your guide.

     - AM574xIDK: as RC

     - AM572xIDK: as EP

    Thank you!

    My understanding from your explanation is as follows:

       - In theory, the GEN2 x1 throughput is 365 MBps.

       - TI observed a GEN2 x1 throughput of 350-360 MBps using the EVM for measurement.

    Question:

     - Could you tell us how to calculate the GEN2 x1 throughput measured on the EVM?

        We would like to share the console log of the RC. (Please refer to the attached PDF.)

    We would appreciate it if you could tell us how to calculate the PCIE throughput from the console log.

    Best regards,

    log.pdf

  • Hi,

    I don't know whether you ran it on the A15, the C66x, or the M4. I'll assume you ran it on the A15 at the default 1.0 GHz (this is set up by the GEL file).
    The EDMA write of 65536 bytes then takes 184931 cycles.

    X = 65536/1048576 = 0.0625 MB
    Y = 184931/1,000,000,000 = 0.000184931 second
    Throughput you obtained: X/Y = 337.96 MB/s.

    In theory: 5.0 Gbps * 8/10 (encoding) * (64/(64+24)) = 2.9090 Gbps; divided by (1.048576 * 8 bits/byte), this gives 346.8 MB/s.

    Please use either 1M = 1048576 or 1M = 1000000 consistently in the math, so that the comparison between your measured number and the theoretical number is fair.

    Your number is pretty good.
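
    The cycle-to-throughput conversion above can be sketched in C as follows. This is only an illustration: throughput_mib_per_s is a hypothetical helper, the 1.0 GHz clock is the assumption stated above, and 1 MB is taken as 1048576 bytes to match the X/Y calculation.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_MIB 1048576.0   /* 1M = 1048576, used consistently */

/*
 * Convert a byte count and a cycle count from the console log into MB/s.
 * cpu_hz is the frequency of the counter that produced the cycle count
 * (assumed here to be the 1.0 GHz A15 clock).
 */
static double throughput_mib_per_s(uint32_t bytes, uint32_t cycles, double cpu_hz)
{
    assert(cycles > 0 && cpu_hz > 0.0);

    double seconds = (double)cycles / cpu_hz;        /* Y in the post */
    double mib     = (double)bytes / BYTES_PER_MIB;  /* X in the post */

    return mib / seconds;
}
```

    For the logged values, throughput_mib_per_s(65536, 184931, 1e9) gives about 337.96 MB/s, which is roughly 97% of the 346.8 MB/s theoretical figure computed with the same 1M = 1048576 convention.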

    Regards, Eric
  • Hi,

    Thank you very much for your detailed explanation.
    This information is very helpful!
    I understood the calculation.

    Best regards,
  • Part Number: AM5728

    Hello, TI Experts,

     

    Our customer sent us an additional question about debugging "pcie_sample.c" when booting from an SD card.

    http://e2e.ti.com/support/processors/f/791/p/741478/2742633#2742633

     

    We were also able to run "pcie_sample.c" from an SD card using the procedure below.

    - Prepare a FAT32 SD card and insert it into a Windows PC.

    - Copy MLO (C:\ti\pdk_am57xx_1_0_11\packages\ti\boot\sbl\binary\evmAM572x\mmcsd\bin\MLO) to the SD card.

    - Copy app (C:\ti\pdk_am57xx_1_0_11\packages\MyExampleProjects\PCIE_idkAM572x_wSoCFile_armExampleProject\Debug\app) to the SD card.

    - Insert the SD card into the TMDXIDK5728 and boot.

     

    Question:

       Is there any way to debug this app on the SD card with CCS, for example with breakpoints?

     

    We would appreciate it if you could tell us the recommended way to debug with CCS on the EVM when the program runs from SD boot.

     

    Best regards,

  • Hi,

    The typical way I debug any program running from bootable media:

    1. Add a while loop at the beginning of the program to be debugged, like:

    /* volatile so the compiler re-reads flag on every iteration */
    unsigned int volatile flag = 1;

    void main () {

    while(flag); /* spin here until the debugger clears flag */

    //the original code below
    ....
    }

    2. Use CCS to connect to the A15 core WITHOUT the GEL file (because MLO initializes the board) and load the symbols. You should see the program stuck at the while(flag) location above.

    3. Use the CCS memory window to change this flag from 1 to 0; then you can step through the code to debug.

    In your case, if you use the AM572x IDK EVM, you should also use the MLO for the AM572x IDK, not the one for the GP EVM.

    Regards, Eric
  • Hi,

    Thank you for your detailed explanation.
    This information is very helpful!

    We were able to debug the app on the SD card with CCS.
    We also used the MLO for the AM572x IDK.

    Best regards
  • Part Number: AM5728

    Hello, TI Experts,

     

    Our customer sent us additional questions following the E2E thread below.

    https://e2e.ti.com/support/processors/f/791/p/741478/2747658

     

    They would like to know which memory area is used as the source and destination for the data in the PCIE data transfer demo.

    (DDR3 or OCMC_RAM?)

     

    Question:

    For the read demo from the RC (as in the console log below):

    1: Could you tell us where (which memory address) the data is read from on the EP?

    2: Could you tell us where (which memory address) the data is written to on the RC?

    3: Could you tell us which lines in the source code of the "pcie_sample.c" sample project we should refer to in order to find the Src/Dst memory addresses?

    4: For the write demo, the Src/Dst memory areas are the same as for the read demo. Is this understanding correct?

     

    Best regards,

  • Hi,

    For Q1/2/3:
    When the RC reads data from the EP, the memory address is determined by the inbound translation on the EP side. See the code below in pcie_sample.c:

    ibCfg.ibBar = PCIE_BAR_IDX_EP; /* Match BAR that was configured above*/
    ibCfg.ibStartAddrLo = PCIE_IB_LO_ADDR_EP;
    ibCfg.ibStartAddrHi = PCIE_IB_HI_ADDR_EP;
    ibCfg.ibOffsetAddr = (uint32_t)pcieConvert_CoreLocal2GlobalAddr ((uint32_t)dstBuf.buf);
    ibCfg.region = PCIE_IB_REGION_EP;

    if ((retVal = pcieIbTransCfg(handle, &ibCfg)) != pcie_RET_OK)

    Set a breakpoint here and check the value of ibCfg.ibOffsetAddr. I believe it is in internal memory; please double check.

    Where the data is written on the RC side is determined by the EDMA read function:

    *totalTimePointer=0;
    totalDMATime = 0;

    edmaTransfer(hEdma,(EDMA3_Type) EDMA_TYPE, (unsigned int*) remoteBuf, (unsigned int*) source,
    ACount, BCount, CCount, EDMA3_DRV_SYNC_A,totalTimePointer);

    totalDMATime += *totalTimePointer;
    PCIE_logPrintf("EDMA read %d bytes with %d cycles\n", (PCIE_EXAMPLE_LINE_SIZE*PCIE_EXAMPLE_UINT32_SIZE), (unsigned int)totalDMATime);

    Set a breakpoint and check the remoteBuf value. I believe it is in DDR; please double check.

    For Q4, when the RC writes into the EP, the memory type used is the same as when the RC reads from the EP.

    Regards, Eric
  • Hi,

    Thank you very much for your detailed explanation.

    I really appreciate your help.

    I'd like to share the results of the checks below. (Please refer to the PDF for details.)

     - Set a breakpoint here and check the value of ibCfg.ibOffsetAddr.

        -> We found "ibCfg.ibOffsetAddr=0x81082CD0".

             The address seems to be "DDR".

     - Set a breakpoint, check the remoteBuf value.

       -> We found "remoteBuf=0x21000A00".

            The address seems to be "PCIE_SS1".

    So our understanding is as follows:

    - pcie_sample.c : EDMA read demo

        - The EP reads from DDR and the data is transferred to the RC.

        - The RC writes to PCIE_SS1.

    Question:

     - Is this understanding correct?

     - Could you tell us the next/final destination of the data written to PCIE_SS1 by the RC, as described above?

        -> Is the final destination DDR on the RC?

    Best regards,

    break.pdf

  • Hi,

    Thanks for the test!

    For the first breakpoint, ibCfg.ibOffsetAddr is 2147813632 (integer) = 0x8005_0900 (hex). So on the EP side, the buffer is inside DDR.
    For the second breakpoint, it seems I made a mistake. Please check the value of source instead; I think it should be an address in DDR.
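
    The address checks in this exchange can be sketched as a simple range test. This is only an illustration: the window bounds below are assumptions based on how the addresses were interpreted in this thread (0x2xxx_xxxx as PCIE_SS1, 0x8xxx_xxxx as DDR) and should be confirmed against the memory map in the AM572x TRM. The names classify_addr and pcie_ss1_offset are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED window bounds; confirm against the AM572x TRM memory map. */
#define PCIE_SS1_BASE 0x20000000u
#define PCIE_SS1_END  0x2FFFFFFFu
#define DDR_BASE      0x80000000u

typedef enum { REGION_PCIE_SS1, REGION_DDR, REGION_OTHER } region_t;

/* Report which of the assumed regions a 32-bit SOC address falls into. */
static region_t classify_addr(uint32_t addr)
{
    if (addr >= PCIE_SS1_BASE && addr <= PCIE_SS1_END) {
        return REGION_PCIE_SS1;   /* outbound window: access goes over the link */
    }
    if (addr >= DDR_BASE) {
        return REGION_DDR;        /* local external memory */
    }
    return REGION_OTHER;          /* e.g. on-chip RAM or peripherals */
}

/* Offset of an address inside the assumed PCIE_SS1 window. */
static uint32_t pcie_ss1_offset(uint32_t addr)
{
    assert(classify_addr(addr) == REGION_PCIE_SS1);
    return addr - PCIE_SS1_BASE;
}
```

    Under these assumptions, 0x80050900 and 0x81082CD0 classify as DDR, while remoteBuf = 0x21000A00 falls in the PCIE_SS1 window, which is consistent with it being the RC's view of the remote (EP) buffer rather than a local buffer.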

    Regards, Eric
  • Hi,

    Thank you very much for your kindness.

    I really appreciate your help.

    I checked the value of "source" as shown below. (Please refer to the attached PDF.)

      - source=0x81069400 (it seems to be an address in DDR, as you said.)

    Thank you!

    Our customer also sent us the additional questions below:

    Question:

    - 1: Could you explain the definitions below?

       #define PCIE_IB_LO_ADDR_RC   0x90000000

       #define PCIE_IB_HI_ADDR_RC   0

    - 2: Is there any document or guide for the registers below?

         - They would like to know how to configure those registers and what the recommended configuration is.

         PCIECTRL_PL_IATU_INDEX

         PCIECTRL_PL_IATU_REG_CTRL_1

         PCIECTRL_PL_IATU_REG_CTRL_2

         PCIECTRL_PL_IATU_REG_LOWER_BASE

         PCIECTRL_PL_IATU_REG_UPPER_BASE

         PCIECTRL_PL_IATU_REG_LIMIT

         PCIECTRL_PL_IATU_REG_LOWER_TARGET  

         PCIECTRL_PL_IATU_REG_UPPER_TARGET

    Best regards,

    bp.pdf

  • Hi,

    Thanks for checking this! Yes, the buffer is inside DDR3.

    A1.
    #define PCIE_IB_LO_ADDR_RC 0x90000000
    #define PCIE_IB_HI_ADDR_RC 0

    This translates an incoming PCIE address into an SOC internal memory address. For a 32-bit BAR, PCIE_IB_HI_ADDR_RC is zero (it is used for a 64-bit BAR). For the meaning of PCIE_IB_LO_ADDR_RC, you can refer to www.ti.com/.../sprabk8.pdf section 3.2.3 PCIe Inbound Address Translation Examples. Note: this document is for Keystone I/II devices, but the inbound translation concept also applies to AM57x and is well explained there.

    In your example, you have an incoming PCIE address of 0x9000_0000. Subtract PCIE_IB_LO_ADDR_RC from it, then add ibCfg.ibOffsetAddr = (uint32_t)pcieConvert_CoreLocal2GlobalAddr ((uint32_t)dstBuf.buf); the result is your SOC internal memory address.

    A2. Those registers are explained in the AM572x TRM, section 24.9.7.5.1 PCIe_SS_PL_CONF Register Summary.
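
    The subtract-then-add step described above can be written out as a small C sketch. It only models the arithmetic; it is not the driver or the hardware implementation. The function name pcie_inbound_translate is hypothetical, and the 0x81082CD0 offset used in the usage note is just the ibOffsetAddr value reported earlier in this thread, taken as an example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of 32-bit inbound address translation:
 *   soc_addr = (pcie_addr - ib_start_lo) + ib_offset
 * ib_start_lo corresponds to ibCfg.ibStartAddrLo (e.g. PCIE_IB_LO_ADDR_RC)
 * and ib_offset to ibCfg.ibOffsetAddr in pcie_sample.c.
 */
static uint32_t pcie_inbound_translate(uint32_t pcie_addr,
                                       uint32_t ib_start_lo,
                                       uint32_t ib_offset)
{
    /* Addresses below the window start are outside this inbound region. */
    assert(pcie_addr >= ib_start_lo);

    return (pcie_addr - ib_start_lo) + ib_offset;
}
```

    For example, with ib_start_lo = 0x90000000 and ib_offset = 0x81082CD0, an incoming address of 0x90000000 maps to 0x81082CD0, and 0x90000100 maps to 0x81082DD0.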

    Regards, Eric
  • Hi,

    Thank you very much for your kindness.
    I really appreciate your help.
    I will pass this answer on to the customer.

    Our customer sent us additional questions about "pcie_sample.c".
    They are trying to connect the TMDXIDK5728 (as EP) to their Windows PC (as RC) to measure performance using "pcie_sample.c".
    - They appear to have established the link successfully using a Windows driver they wrote themselves.
    - As the next step, they would like to know the "BARx address information" needed to access the DDR on the TMDXIDK5728 from the Windows PC.

    Question:
    - Could you tell us the BAR address information, as below?
    BAR0:?
    BAR1:?
    BAR2:?
    BAR3:?
    BAR4:?
    BAR5:?

    - What BAR addresses should be set by the RC running on the Windows PC?
    - Are any source code modifications to "pcie_sample.c" needed to run the TMDXIDK5728 (as EP) against their Windows PC (as RC)?
    - Which part of "pcie_sample.c" should we refer to in order to understand the BAR address configuration on the TMDXIDK5728 (as EP)?

    Best regards,
  • Hi,

    We haven't tested the PCIE EP with any host PC, such as a Linux or Windows machine, so there is no code example for it.

    On the PC side, you need a Windows driver that enumerates the PCIE bus; the driver then reads the BAR mask programmed by the AM5728 and allocates memory. Based on the allocated memory, the driver programs BAR0/1/.../5. On the AM5728 side, you only need to program the BAR mask. The Windows PCIE driver needs to program the inbound and outbound translation of the AM5728.

    The reference code for the BAR mask is inside pcieCfgEP(Pcie_Handle handle):
    /* Configure BAR Masks */
    /* First need to enable writing on BAR mask registers */
    if ((retVal = pcieCfgDbi (handle, 1)) != pcie_RET_OK)
    {
        return retVal;
    }

    /* Configure Masks*/
    memset (&getRegs, 0, sizeof(getRegs));
    memset (&setRegs, 0, sizeof(setRegs));
    type0Bar32bitIdx.reg.reg32 = PCIE_BAR_MASK;
    setRegs.type0BarMask32bitIdx = &type0Bar32bitIdx;

    /* BAR 0 */
    type0Bar32bitIdx.idx = 0; /* configure BAR 0*/
    if ((retVal = Pcie_writeRegs (handle, pcie_LOCATION_LOCAL, &setRegs)) != pcie_RET_OK)
    {
        PCIE_logPrintf ("SET BAR MASK register failed!\n");
        return retVal;
    }

    /* BAR 1 */
    type0Bar32bitIdx.idx = 1; /* configure BAR 1*/
    if ((retVal = Pcie_writeRegs (handle, pcie_LOCATION_LOCAL, &setRegs)) != pcie_RET_OK)
    {
        PCIE_logPrintf ("SET BAR MASK register failed!\n");
        return retVal;
    }

    /* Disable DBI writes */
    if ((retVal = pcieCfgDbi (handle, 0)) != pcie_RET_OK)
    {
        return retVal;
    }
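
    For the host side, the step where the driver works out how much memory each BAR requests can be sketched as follows. This follows the generic PCI sizing convention (write all ones to the BAR, read it back, and decode the writable bits); it is not taken from the PDK or from any Windows driver, and bar32_mem_size is a hypothetical name. How the PCIE_BAR_MASK value in pcie_sample.c maps to the size the host sees is not covered here and should be checked in the TRM.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the size of a 32-bit memory BAR from the value read back
 * after the host has written 0xFFFFFFFF to it.
 * Bits [3:0] of a memory BAR are type/prefetch flags, not address bits.
 * Returns 0 if the BAR is not implemented (reads back as zero).
 */
static uint32_t bar32_mem_size(uint32_t readback)
{
    /* Bit 0 set would mean an I/O BAR, which this sketch does not handle. */
    assert((readback & 0x1u) == 0u);

    uint32_t addr_bits = readback & 0xFFFFFFF0u;
    if (addr_bits == 0u) {
        return 0u;
    }
    return ~addr_bits + 1u;   /* lowest writable address bit gives the size */
}
```

    For instance, a readback of 0xFFF00000 decodes to a 1 MB region (0x00100000 bytes), and 0xFFFFF008 (prefetchable flag set) decodes to 4 KB.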

    Regards, Eric
  • Hi,

    Thank you very much for your kindness.
    I really appreciate your help.
    I will send this answer to the customer.

    Best regards,