Problem with DMA transfer over PCIe

Hi everyone:

I am trying to send data from the RC device to the EP device through DMA. I create a 512-word sequence (0 to 511) in srcbuf[512] on the RC device, then call the TransData function to send it. The EP device receives the data, but dstbuf[0-7] are 0 and so are dstbuf[504-511]. The rest, dstbuf[8-503], hold the correct data.

The code is as follows:

 

#pragma DATA_SECTION(dstBuf, ".dstBufSec")

 

#pragma DATA_ALIGN(dstBuf, 256)

/* last element in the buffer is a marker that indicates the buffer status: full/empty */

#define PCIE_EXAMPLE_MAX_CACHE_LINE_SIZE 128

#define PCIE_EXAMPLE_UINT32_SIZE           4 /* preprocessor #if requires a real constant, not a sizeof() */

 

//#define PCIE_EXAMPLE_DSTBUF_BYTES ((PCIE_BUFSIZE_APP + 1) * PCIE_EXAMPLE_UINT32_SIZE)

#define PCIE_EXAMPLE_DSTBUF_BYTES ((PCIE_BUFSIZE_APP ) * PCIE_EXAMPLE_UINT32_SIZE)

#define PCIE_EXAMPLE_DSTBUF_REM (PCIE_EXAMPLE_DSTBUF_BYTES % PCIE_EXAMPLE_MAX_CACHE_LINE_SIZE)

#define PCIE_EXAMPLE_DSTBUF_PAD (PCIE_EXAMPLE_DSTBUF_REM ? (PCIE_EXAMPLE_MAX_CACHE_LINE_SIZE - PCIE_EXAMPLE_DSTBUF_REM) : 0)

struct dstBuf_s {

  //volatile uint32_t buf[PCIE_BUFSIZE_APP + 1];

  volatile uint32_t buf[PCIE_BUFSIZE_APP];

  /* Cache coherence: Must pad to cache line size in order to enable cacheability */

#if PCIE_EXAMPLE_DSTBUF_PAD

  uint8_t padding[PCIE_EXAMPLE_DSTBUF_PAD];

#endif

} dstBuf;
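As a sanity check on the padding macros above, the same arithmetic can be written as a plain function. This is a minimal sketch: the name `dstbuf_pad_bytes` is hypothetical and not part of the PDK example, and the 128-byte line size is copied from PCIE_EXAMPLE_MAX_CACHE_LINE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CACHE_LINE_SIZE 128u  /* same value as PCIE_EXAMPLE_MAX_CACHE_LINE_SIZE */

/* Hypothetical helper: bytes of padding needed so that a buffer of
 * `words` 32-bit words ends on a cache-line boundary. */
static size_t dstbuf_pad_bytes(size_t words)
{
    size_t bytes = words * sizeof(uint32_t);
    size_t rem   = bytes % MAX_CACHE_LINE_SIZE;
    size_t pad   = rem ? (MAX_CACHE_LINE_SIZE - rem) : 0;

    /* Sanity check: buffer plus padding is a whole number of cache lines. */
    assert((bytes + pad) % MAX_CACHE_LINE_SIZE == 0);
    return pad;
}
```

For a 512-word buffer (2048 bytes) no padding is needed. The commented-out `+ 1` variant (513 words, 2052 bytes) would need 124 bytes of padding.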

 

 

void main()

{

..........

 TransData(srcaddr, 0x60000000, 2, 2, 0);

...........

}

 

Function name:

 Bool TransData(Uint32 sAddr, Uint32 dAddr, Uint16 Len_KB, Bool dstIsFifo, Bool intrEn)

 sAddr: source address

dAddr: destination address

Len_KB: length of the data to be sent, in KB

dstIsFifo:

 = 1: destination is a FIFO; = 0: source is a FIFO; any other value: both source and destination are RAM

intrEn:

 = 1: enable interrupt; = 0: disable interrupt

Return value:

 true: transfer succeeded

 false: transfer failed

 

ACNT = 128 bytes, BCNT = length / ACNT, CCNT = 1; transfer mode: AB-synchronized.

What could cause this problem?
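To characterize a failure like this one precisely, the receive buffer can be scanned against the expected ramp pattern. The following is a minimal sketch, not code from the PDK sample; the function name `find_ramp_mismatches` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical diagnostic: compare a received buffer against the ramp
 * pattern 0, 1, 2, ... and report the first and last mismatching index.
 * Returns the number of mismatching words; *first_bad and *last_bad are
 * only written when at least one mismatch is found. */
static size_t find_ramp_mismatches(const volatile uint32_t *buf, size_t words,
                                   size_t *first_bad, size_t *last_bad)
{
    size_t bad = 0;

    assert(buf != NULL && first_bad != NULL && last_bad != NULL);
    for (size_t i = 0; i < words; i++) {
        if (buf[i] != (uint32_t)i) {
            if (bad == 0) {
                *first_bad = i;
            }
            *last_bad = i;
            bad++;
        }
    }
    return bad;
}
```

Note that with a ramp starting at 0, a zero in dstbuf[0] is the expected value, so it cannot be told apart from a word that was never written. A pattern with no zero words (for example, starting the ramp at 1) makes the check unambiguous.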

  • Hi Liang,

    I tried the EP code you posted above but could not reproduce the problem: all 2 KB of data (512 words) were transferred correctly from RC to EP.

    I think you use the following definition based on your code:

    #define PCIE_BUFSIZE_APP 512

     

    In my testing, the RC does not enable outbound address translation. The DMA setup on the RC side is as follows:

     

    paramSetup.option = CSL_EDMA3_OPT_MAKE(FALSE,FALSE,FALSE,TRUE,TEST_TCC,CSL_EDMA3_TCC_NORMAL, \

                            CSL_EDMA3_FIFOWIDTH_NONE,FALSE,CSL_EDMA3_SYNC_AB,CSL_EDMA3_ADDRMODE_INCR,CSL_EDMA3_ADDRMODE_INCR);

    paramSetup.aCntbCnt    = CSL_EDMA3_CNT_MAKE(128,size/128); 

    paramSetup.srcDstBidx  = CSL_EDMA3_BIDX_MAKE(128,128);

     

    paramSetup.srcDstCidx  = CSL_EDMA3_CIDX_MAKE(0,0);    

    paramSetup.cCnt        = 1;

    paramSetup.linkBcntrld = CSL_EDMA3_LINKBCNTRLD_MAKE(CSL_EDMA3_LINK_NULL,0);

    paramSetup.srcAddr     = (Uint32)srcAddr;

    paramSetup.dstAddr     = (Uint32)dstAddr;

    CSL_edma3ParamSetup(paramHandle,&paramSetup);

    Where srcAddr=0x10830000, dstAddr=0x60000000, using EDMA TPCC=2 TPTC=0

     

    On EP side, I enabled inbound address translation as follows:

     

    BAR1 = 0x60000000;

    IB_BAR0 = 1; // using BAR1 on EP

    IB_START0_LO = 0x60000000;

    IB_START0_HI = 0x0; // 32-bit addressing

    IB_OFFSET0 = &dstBuf; // destination buffer address

     

     

    Since I could not re-produce the issue, could you check your code and provide the following info:

    1. How the srcAddr is defined on your RC device

    2. How the inbound translation is defined on your EP device

     

    If you do not mind, you can attach your source code (both RC and EP sides) here. More details could help us figure out the issue. Thanks.

     

    Sincerely,

    Steven

     

     

     

  • Hi Steven:

               Thank you for your reply. The program I have been testing is based on the sample program provided by TI (...pdk_C6678_1_0_0_9_beta2\packages\ti\drv\pcie\example\sample).

    I added some DMA code to it. Both inbound and outbound translation are enabled. srcaddr is located in MSMC; dstbuf is located in L2SRAM.

               Could you please send me your source code? My lab is evaluating the performance of the TMS320C6678 EVM. If possible, my email is leliang2008@gmail.com

  • Hi Steven:

                The following is the linker command (CMD) file:

    -c

    -heap  0x41000

    -stack 0xa000

     

    /* Memory Map 1 - the default */

    MEMORY

    {

        L1PSRAM (RWX)  : org = 0x0E00000, len = 0x7FFF

        L1DSRAM (RWX)  : org = 0x0F00000, len = 0x7FFF 

     

        L2SRAM (RWX)   : org = 0x0800000, len = 0x100000

        MSMCSRAM (RWX) : org = 0xc000000, len = 0x200000

        DDR3 (RWX)     : org = 0x80000000,len = 0x10000000

    }

     

    SECTIONS

    {

        .csl_vect   >       MSMCSRAM

        .text       >       MSMCSRAM

        GROUP (NEAR_DP)

        {

        .neardata

        .rodata 

        .bss

        } load > MSMCSRAM

        .stack      >       MSMCSRAM

        .cinit      >       MSMCSRAM

        .cio        >       MSMCSRAM

        .const      >       MSMCSRAM

        .init_array > L2SRAM     

        .dstBufSec  > L2SRAM

      // .init_array > MSMCSRAM     

       // .dstBufSec  > MSMCSRAM

        .data       >       MSMCSRAM

        .switch     >       MSMCSRAM

        .sysmem     >       MSMCSRAM

        .far        >       MSMCSRAM

        .testMem    >       MSMCSRAM

        .fardata    >       MSMCSRAM

        platform_lib > MSMCSRAM

    }

    If I define srcbuf in L2SRAM, the RC device cannot transfer the data to the EP device.
  • Hi Liang,

    The EDMA needs to work with global addresses. In your linker file, L2SRAM is defined with its local address. Examples of global addresses are as follows:

    Core0 L2: local address 0x00800000  -> global address 0x10800000;

    Core1 L2: local address 0x00800000 -> global address 0x11800000;

    Core2 L2: local address 0x00800000 -> global address 0x12800000;

    ......

    The MSMC SRAM and DDR memory addresses do not need to be converted.

    You can try using global addresses for both the src and dst buffers if they are placed in L2. Alternatively, place both buffers in MSMC SRAM or DDR and see whether the problem is resolved.
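    The mapping listed above can be sketched as a small helper. This is a minimal illustration: the function name `local_to_global` is hypothetical, and treating 0x00800000-0x00FFFFFF as the core-local range is an assumption that should be checked against the device data manual.

```c
#include <assert.h>
#include <stdint.h>

#define LOCAL_MEM_START 0x00800000u  /* start of core-local L2 (from the list above) */
#define LOCAL_MEM_END   0x01000000u  /* assumed end of the core-local range          */
#define GLOBAL_BASE     0x10000000u  /* core 0 global alias base                     */

/* Convert a core-local address to the global alias for `core`.
 * Addresses outside the local range (MSMC, DDR, or already global)
 * are returned unchanged. */
static uint32_t local_to_global(uint32_t addr, uint32_t core)
{
    assert(core < 8);  /* the C6678 has 8 cores */

    if (addr >= LOCAL_MEM_START && addr < LOCAL_MEM_END) {
        return GLOBAL_BASE + (core << 24) + addr;
    }
    return addr;
}
```

    For example, local address 0x00800000 maps to 0x10800000 on core 0 and to 0x11800000 on core 1, while an MSMC address such as 0x0C000000 passes through unchanged.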

     

    Sincerely,

    Steven

  • Hi Steven:

                 Thanks for your advice; the problem I encountered has been solved. Next I will test the data transfer rate.

     

     

     

    Sincerely

    Liang Le 

  • Hi Steven:

               The data transfer rate through DMA that I measured is only 340 MB/s, less than expected (about 650 MB/s). The PCIe SERDES configuration register is set to 0x01c9, and the LN_EN field of the PL_GEN2 register is set to 0x2. Is there anything I missed? I want PCIe to work in x2 mode.

  • Hi Liang,

    The SerDes setup 0x1c9 is for a 100 MHz reference clock. Please check which reference clock your test platform uses, and set the SerDes PLL according to the PCIe user's guide.

    You can try to set DIR_SPD=1 in PL_GEN2 register as well for both RC and EP. You need to make sure both devices are working in Gen2 mode with 2 lanes to achieve higher throughput.
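    To put the measured figure in context, the raw link ceiling for each generation and lane count can be computed from the line rate. This is a back-of-the-envelope sketch with a hypothetical function name; it uses only the standard Gen1/Gen2 line rates (2.5 and 5.0 GT/s per lane) and 8b/10b encoding, and ignores packet and flow-control overhead.

```c
#include <assert.h>

/* Raw PCIe bandwidth in MB/s for Gen1/Gen2 links. Both generations use
 * 8b/10b encoding, so each lane carries 8 data bits per 10 line bits. */
static unsigned pcie_raw_mbytes_per_s(unsigned gen, unsigned lanes)
{
    assert(gen == 1 || gen == 2);

    unsigned mtransfers = (gen == 1) ? 2500u : 5000u;  /* MT/s per lane     */
    unsigned mbits      = mtransfers * 8u / 10u;       /* after 8b/10b      */
    return mbits / 8u * lanes;                         /* bits -> bytes     */
}
```

    Under these assumptions the raw ceilings are 250 MB/s (Gen1 x1), 500 MB/s (Gen1 x2 or Gen2 x1) and 1000 MB/s (Gen2 x2). Protocol overhead reduces the achievable rate below these figures, so a measured 340 MB/s cannot by itself distinguish a Gen1 x2 link from a Gen2 x1 link.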

    Hope it works for you.

     

    Sincerely,

    Steven

  • Hi Steven,

     I have followed your advice, but there is something that puzzles me. Which register decides between one lane and two lanes? Is it the LN_EN field of the PL_GEN2 register? I changed that field from 0 to 1, but the data transfer rate did not change.

    LN_EN    0-1FFh    Lane Enable.

                       1h = ×1

                       2h = ×2

                       Others = Reserved.

     

  • Hi Liang,

    For C66x PCIe devices, based on the spec and test, the default value (reset value) of PCIe registers configuration should support Gen1 (2.5Gbps) x2 lanes.

    To switch between x1 and x2 lanes: before link-up, set LINK_MODE (bits 21-16) in the PL_LINK_CTRL register to 3h (x2 mode, default) or to 1h (x1 mode).

    To switch between Gen1 and Gen2: before link-up, set DIR_SPD (bit 17) in the PL_GEN2 register to 0h (start at Gen1, default) or to 1h (switch to Gen2).

    Please note that after you set DIR_SPD=1 in PL_GEN2, you may not see the change in the memory view (DIR_SPD may still read back as 0 after link-up). But you should be able to see the throughput difference.
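    The two field updates described above amount to ordinary read-modify-write bit operations. The sketch below works on plain 32-bit values so it can be tried on a host; the function names are hypothetical and the register addresses are deliberately not shown. A real driver would read the memory-mapped register, apply the change, and write it back before link training. Only the bit positions stated above (LINK_MODE in bits 21-16, DIR_SPD in bit 17) are used.

```c
#include <assert.h>
#include <stdint.h>

#define LINK_MODE_SHIFT 16u
#define LINK_MODE_MASK  (0x3Fu << LINK_MODE_SHIFT)  /* bits 21-16 of PL_LINK_CTRL */
#define DIR_SPD_MASK    (1u << 17)                   /* bit 17 of PL_GEN2          */

/* Return pl_link_ctrl with LINK_MODE set for 1 or 2 lanes (1h or 3h). */
static uint32_t set_link_mode(uint32_t pl_link_ctrl, unsigned lanes)
{
    assert(lanes == 1 || lanes == 2);

    uint32_t mode = (lanes == 2) ? 0x3u : 0x1u;
    return (pl_link_ctrl & ~LINK_MODE_MASK) | (mode << LINK_MODE_SHIFT);
}

/* Return pl_gen2 with DIR_SPD set (request Gen2) or cleared (stay at Gen1). */
static uint32_t set_dir_spd(uint32_t pl_gen2, int want_gen2)
{
    return want_gen2 ? (pl_gen2 | DIR_SPD_MASK) : (pl_gen2 & ~DIR_SPD_MASK);
}
```

    Masking before OR-ing keeps the other fields of each register intact, which matters because both registers carry unrelated configuration bits.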

    Please also make sure the 2 lanes are physically connected between RC and EP when you are trying x2 lanes testing.

    And please make sure both RC and EP will support Gen2 during the link up, when you are trying Gen2 testing.

    Please let us know if you get the desired throughput with those changes. Thanks.

     

    Sincerely,

    Steven