This thread has been locked.


Reset PCIe, TMDXEVM6678L

Other Parts Discussed in Thread: SYSBIOS

Hi,

I tested TI's PCIe example/sample and the test passed ("OK"), but before I can retry a new test I must do a hard reset [WARM_RST].

Also, the EVM user guide mentions that the reset can be done in software, so how can I do that?

Is there another method to retry the test without [WARM_RST]?

For example, I just want to change the PCIe mode (RC or EP), or the amount of data to send…

  • Delared,

    I think you can disable the PCIe power domain first and re-enable it in each test, which will put the PCIe registers back to their default values so that you can retry the test without doing a hard reset.

    For example, you can add a power-down API ("pciePowerDownCfg()") as follows, which mirrors the power-up API ("pciePowerCfg()") in the original example:

    pcieRet_e pciePowerDownCfg(void)
    {
        /* Turn off the PCIe power domain */
        if (CSL_PSC_getPowerDomainState (CSL_PSC_PD_PCIEX) != PSC_PDSTATE_OFF) {
            /* Disable the domain */
            CSL_PSC_disablePowerDomain (CSL_PSC_PD_PCIEX);
            /* Set the module next state to disabled via MDCTL */
            CSL_PSC_setModuleNextState (CSL_PSC_LPSC_PCIEX, PSC_MODSTATE_DISABLE);
            /* Apply the domain state transition */
            CSL_PSC_startStateTransition (CSL_PSC_PD_PCIEX);
            /* Wait for it to finish */
            while (!CSL_PSC_isStateTransitionDone (CSL_PSC_PD_PCIEX));
        } else {
            System_printf ("Power domain is already disabled.\n");
        }

        return pcie_RET_OK;
    }

    And then you can call this power-down API before powering up the PCIe module in main():

    /* Power down PCIe Module */
    if ((retVal = pciePowerDownCfg()) != pcie_RET_OK) {
        System_printf ("PCIe Power Down failed (%d)\n", (int)retVal);
        exit(1);
    }

    /* Power up PCIe Module */
    if ((retVal = pciePowerCfg()) != pcie_RET_OK) {
        System_printf ("PCIe Power Up failed (%d)\n", (int)retVal);
        exit(1);
    }

    I think in this way you can just reload your new test and retry it without a hard reset. Hope it works for you.

  • Steven,

    According to the PCIe user guide [p. 25], link training should be disabled before any configuration.

    And "Upon reset, the LTSSM_EN is de-asserted automatically by hardware".

    That means that with a hard reset the link training is automatically disabled. So will I get the same result with pciePowerDownCfg()?

    Another question, please.

    In pcie_sample.c [line 226…], a memset is used for initialization:

    /* Configure the size of the translation regions */
    memset (&setRegs, 0, sizeof(setRegs));
    memset (&getRegs, 0, sizeof(getRegs));

    So here setRegs and getRegs are all 0s.

    obSize.size = pcie_OB_SIZE_8MB;
    setRegs.obSize = &obSize;
    if ((retVal = Pcie_writeRegs (handle, pcie_LOCATION_LOCAL, &setRegs)) != pcie_RET_OK)

    OK, OBSIZE is configured with pcie_OB_SIZE_8MB via setRegs.

    /* Setting PL_GEN2 */
    memset (&setRegs, 0, sizeof(setRegs));

    Now setRegs is all 0s again.

    gen2.numFts = 0xF;
    gen2.dirSpd = 0x0;
    gen2.lnEn   = 1;
    setRegs.gen2 = &gen2;
    if ((retVal = Pcie_writeRegs (handle, pcie_LOCATION_LOCAL, &setRegs)) != pcie_RET_OK)

    And GEN2 is configured via the same setRegs!

    So how does OBSIZE keep its old configuration?

    And whenever I call Pcie_writeRegs (handle, pcie_LOCATION_LOCAL, &setRegs), are all the registers updated?

    I need more explanation, please!

  • Steven,

    I have some other questions, please.

    1- When we want to write a configuration into cmdStatus, I sometimes find a Pcie_readRegs() placed before the write!

    e.g. in the function pcieLtssmCtrl(Pcie_Handle handle, uint8_t enable), we want to enable or disable link training, but there is also a Pcie_readRegs() call:

    getRegs.cmdStatus = &cmdStatus;
    if ((retVal = Pcie_readRegs (handle, pcie_LOCATION_LOCAL, &getRegs)) != pcie_RET_OK)
    {
        System_printf ("Read CMD STATUS register failed!\n");
        return retVal;
    }

    And in main() we call the function:

    /* Enable link training */
    if ((retVal = pcieLtssmCtrl(handle, TRUE)) != pcie_RET_OK)
    {
        System_printf ("Failed to Enable Link Training (%d)!\n", (int)retVal);
        exit(1);
    }

    So why do we use Pcie_readRegs() here? Is it necessary to read the contents of cmdStatus before writing to it?

    2- At line 256:

    /* Setting PL_GEN2 */
    memset (&setRegs, 0, sizeof(setRegs));
    gen2.numFts = 0xF;

    According to the PCIe manual, "NUM_FTS is the number of fast training sequences [8 bits] transmitted by a device in order to transition a link from the low-power L0s (standby) to the full-on L0 state".

    But how can I determine the number of FTS I should use, and why 0xF? I need more explanation, please.

  • Delared,

    Powering down the PCIe module will disconnect the PCIe link as well, since the LTSSM_EN bit will be cleared to 0.

    If you have multiple permutations (different data buffer sizes) that need to be tested, you can run them in sequence within one single test. But if you want to switch the operation mode (such as RC => EP), that may require a warm reset or a power down/up of the PCIe module.

    For the "memset" and "Pcie_writeRegs()" question, please take a look at the details of "Pcie_writeRegs()" API in PCIe LLD source code "C:\ti\pdk_C6657_1_1_2_5\packages\ti\drv\pcie\src\pcie.c".

    "......

    if (writeRegs->cmdStatus) {
        pcie_check_result(retVal, pcie_write_cmdStatus_reg (baseAppRegs, writeRegs->cmdStatus));
    }
    if (writeRegs->cfgTrans) {
        pcie_check_result(retVal, pcie_write_cfgTrans_reg (baseAppRegs, writeRegs->cfgTrans));
    }
    if (writeRegs->ioBase) {
        pcie_check_result(retVal, pcie_write_ioBase_reg (baseAppRegs, writeRegs->ioBase));
    }

    ......"

    "setRegs" and "getRegs" are just data structures of PCIe register pointers. The API checks which register pointers have been assigned in "writeRegs" (setRegs), and only those registers are updated when the API is called; pointers left NULL are skipped. So the previous updates remain unchanged.

    Similarly, "setRegs" and "getRegs" point to the same "cmdStatus" register structure. The LLD performs a read-modify-write on this register in order to preserve the other bits that do not need to be modified along with the "ltssmEn" bit.

    Again, if you trace the "Pcie_readRegs()" and "Pcie_writeRegs()" APIs in the LLD source files, you will see that in the end the LLD uses "pcie_setbits()" (defined in C:\ti\pdk_C6657_1_1_2_5\packages\ti\drv\pcie\src\PcieLoc.h) to merge the new field value into the old register value, preserving the other bits.

     


  • Delared,

    As you may already have seen in the PCIe specification, the exchange of Fast Training Sequence (FTS) Ordered Sets is used to achieve bit lock and symbol lock when exiting from the L0s to the L0 power state. During link training at link initialization, the receiver sends the remote transmitter the NUM_FTS field to indicate how many FTS Ordered Sets it must receive to reliably obtain bit and symbol lock. Armed with this information, the transmitter sends at least that many FTS Ordered Sets during exit from the L0s state.

    If you see issues with the device transitioning from the L0s state to the L0 state, you can try increasing the value in the NUM_FTS field to give the receiver more time to obtain bit/symbol lock. But 0xF is typically a good number for the C667x device, and it is also the default value of the NUM_FTS field in the PL_GEN2 register of the C667x PCIe module.

  • Steven,

    1. I want to evaluate the throughput performance using _CSL_tscRead(), but I don't know exactly where I should place the calls in the code in order to count the cycles spent transferring data from DSP1 to DSP2.

    Are there other functions that can help me with this throughput measurement?

    2. The example uses the following cache-invalidation mechanism on reception:

    /* EP waits for the data received from RC */
    do {
        unsigned int key;

        /* Disable Interrupts */
        key = _disable_interrupts();

        /* Clean up the prefetch buffer also. */
        CSL_XMC_invalidatePrefetchBuffer();

        CACHE_invL1d ((void *)dstBuf.buf, PCIE_EXAMPLE_DSTBUF_BYTES, CACHE_FENCE_WAIT);
        CACHE_invL2  ((void *)dstBuf.buf, PCIE_EXAMPLE_DSTBUF_BYTES, CACHE_FENCE_WAIT);

        /* Re-enable Interrupts. */
        _restore_interrupts(key);

    } while (dstBuf.buf[PCIE_BUFSIZE_APP] != PCIE_EXAMPLE_BUF_FULL);

    System_printf ("End Point received data.\n");

    But I want to work without the cache, and when I delete the CACHE_invL1d() and CACHE_invL2() calls I get a compilation ERROR!

    3. The example illustrates only the Memory Write transaction type; how can I test Memory Read transactions using the LLD?

  • Declared,

    1. You can put the timer reading before and after the actual PCIe data transfer section. In the LLD, it could be something as follows:

    /* add dstOffset to pcieBase for data transfer */
    start_time = CSL_tscRead();

    for (i = 0; i < PCIE_BUFSIZE_APP; i++)
    {
        *((volatile uint32_t *)pcieBase + dstOffset/4 + i) = srcBuf[i];
    }

    stop_time = CSL_tscRead();

    Please note that the LLD example is just using CPU writes/reads for the PCIe data transfer, so the payload size in each TLP is not optimal (much less than 128B).

    In order to get higher throughput on PCIe interface, it is suggested to use EDMA to transfer the data to/from the PCIe data space.

    2. If you do want to work without the cache, you need to disable the L1D/L2 caches at the beginning of the test, or clear the MAR bits for the memory regions that would otherwise be cacheable.

    May I ask what kind of error you get during compilation? It may be some other issue besides just deleting the two API calls.

    3. Memory read and write are similar; only the direction of the data transaction differs. It does not depend on the LLD.

    If you want to use CPU reads, it could be something like this:

    for (i = 0; i < PCIE_BUFSIZE_APP; i++)
    {
        dstBuf[i] = *((volatile uint32_t *)pcieBase + srcOffset/4 + i);
    }

    If you want to use EDMA, you just need to exchange the src and dst in the EDMA PaRAM setup according to the Write or Read direction.

    But you need to pay attention to the inbound transaction setup. For example, the IB offset is currently mapped to the destination buffer, since we are writing data from the RC src to the EP dst:

    ibCfg.ibOffsetAddr = (uint32_t)pcieConvert_CoreLocal2GlobalAddr ((uint32_t)dstBuf.buf);

    If you want the RC to read data from the EP src into the RC dst, you need to map the IB offset in the EP to the EP's src buffer instead of the dst buffer used in the write case:

    EP: ibCfg.ibOffsetAddr = (uint32_t)pcieConvert_CoreLocal2GlobalAddr ((uint32_t)srcBuf.buf);

  • Steven,

    Now I'm working without EDMA, but I have some problems:

    1- I can't transfer more than 180 KB! Is that normal because my program is also in L2, or are there other ways to exploit the maximum of my memory space?

    2- I set up my CSL_tscRead() measurement as follows:

    /* add dstOffset to pcieBase for data transfer */
    start_time = CSL_tscRead();

    for (i = 0; i < PCIE_BUFSIZE_APP; i++)
    {
        *((volatile uint32_t *)pcieBase + dstOffset/4 + i) = srcBuf[i];
    }

    stop_time = CSL_tscRead();

    Ncycle = stop_time - start_time;

     

    With this method, will I obtain the measured write/read performance of the PCIe peripheral as shown in the user guide "Throughput Performance Guide for C66x KeyStone Devices"?

    And how can I also take into account:

    • Overhead considerations

    • Packet size considerations

    • 8b/10b encoding, which takes away 20 percent of the raw bandwidth

    If not, is there, for example, a register (flag) in the LLD that I can poll to indicate whether my data has been transferred yet?

    3. With 4 KDW = 163 840 B I had Ncycle = 108 330 => throughput = 1.21 Gbps.

    This is the maximum, because with other amounts of data I got less, for example:

    64 DW => 1.18 Gbps

    40 KDW => 1.16 Gbps

    Do you think this is normal?

    4. I tried to change gen2.dirSpd to 0x1 [PCIe Gen2] or gen2.lnEn = 2 [PCIe x2], but I always obtain the same result (Gen1/Gen2 or x1/x2)!

  • Declared,

    1. It might be because the buffers are allocated in L2 SRAM. You can try DDR memory if you want.

    I think the LLD example enables one outbound translation region with a size of 8MB (OBSIZE), so you should be able to transfer 8MB of data with the current setup.

    For the PCIe module in the C6678 itself, there are 32 outbound regions, so you could transfer 32 * 8MB = 256MB of data without changing the outbound setup.

    Please take a look at section 2.7 in the PCIe user guide or section 3 in the PCIe use case document.

    2. The PCIe throughput mentioned in the performance user guide is measured using EDMA for the data transfer, which delivers the maximum payload size (128 bytes) to the PCIe port and so increases the throughput efficiency. The timing capture mechanism is similar: record the start time before triggering the EDMA, trigger the EDMA, and record the stop time after the transfer completes (for example, by polling the EDMA completion register).

    The "overhead" and "8b/10b encoding" factors are essentially fixed by the PCIe configuration; the major factor in throughput is packet size.

    Using EDMA with a data burst size (DBS) of 128B will give the maximum throughput for the PCIe module in the C6678. Please take a look at the performance user guide and the PCIe use case document for more details.

    3. If you are using the CPU for the data transfer, the payload size of each PCIe packet is probably only 32 bits (4B) or 64 bits (8B), so it cannot reach the higher throughput mentioned in the performance user guide. But your numbers look normal for a PCIe transfer initiated by the CPU.

    4. Please make sure you change both C6678 devices (RC and EP) together for the speed and lane number modifications. Otherwise you will get the minimum setup supported by both ends.

    And you may not even be able to see the difference in your current setup, since the rate at which the CPU pushes data to PCIe is quite slow compared to the PCIe link throughput.

    It will be better to use EDMA for the throughput testing.

  • Steven,

    1. I did the same thing with Memory Read requests, but the throughput looks much poorer than with Memory Write requests! [With the CPU, I obtain 0.1135 Gbps max for 136 840 B, against 1.1662 Gbps with MWr requests.]

    Do you think this difference is normal with CPU (32-bit) accesses?

    2. So now I am trying it with EDMA and DDR memory. May I ask: if I work with outbound address translation disabled (CMD_STATUS[OB_XLT_EN] = 0), what is the maximum size I can transfer, 8MB or 256MB?

    The same question with inbound address translation disabled (CMD_STATUS[IB_XLT_EN] = 0)?

    3. I want to change the memory map using:

    MEMORY {
        DDR3: o=0x80000000, l=0x20000000
    }

    But I get an error: DDR3 memory range overlaps existing…

    So it looks like the example uses the SYS/BIOS configuration, and I don't know how to change it.

    I deleted the BIOS includes (#include <ti/sysbios/BIOS.h> .....), but when I delete pci_sample.cfg I get an error: this project does not contain a buildable RTSC config (.cfg).

  • If the CPU writes or reads 32 bits (4 bytes) at a time, the payload size of each PCIe packet is 4B. There are about 24B of overhead in each TLP and one DLLP (8B) between TLPs in the C667x PCIe data transfer. The efficiency is therefore about 4/(4+24+8) = 11.1%.

    The throughput on a Gen1 x1 lane is about 2.5 Gbps (Gen1) * 1 (lane) * 0.8 (8b/10b) * 11.1% = 0.222 Gbps.

    For Gen2 x2 lanes, it is about 5.0 Gbps (Gen2) * 2 (lanes) * 0.8 (8b/10b) * 11.1% = 0.888 Gbps.

    So your Write results are in about the right range for Gen2 x2 lanes.

    For meaningful throughput testing, you can try EDMA: with a 128B payload in each packet, the efficiency will be about 128/(128+24+8) = 80%.

    The basic rule of a PCIe transfer is that the PCIe address (with or without outbound translation) from the RC must match the BAR base address of the EP.

    If outbound address translation is disabled, the PCIe address driven over the link will be the PCIe data space address of the C667x module, which starts at 0x60000000.

    The outbound translation simply gives the C667x PCIe user the ability to change the PCIe address over the link (to something other than the C667x PCIe data space).

    The user still has a 256MB window for outbound transfers, since the PCIe data space in the C667x is 256MB (0x60000000~0x6FFFFFFF).

    Enabling or disabling inbound address translation does not affect the BAR window size on the EP side. It only gives the EP the ability to forward those PCIe packets to other device memory.

    The BAR mask register setup does affect the BAR window size, which can be more than 256MB. Please take a look at section 2.7.3.1 in the PCIe user guide.

  • Steven,

    I want to change the memory map, so I tried this in the .cmd file:

    MEMORY {
        DDR3: o=0x80000000, l=0x20000000
    }

    But I get an error: DDR3 memory range overlaps existing…

    So it looks like the example uses the SYS/BIOS configuration, and I don't know how to change it.

    I deleted the BIOS includes (#include <ti/sysbios/BIOS.h> .....), but when I delete pci_sample.cfg I get an error: this project does not contain a buildable RTSC config (.cfg).

    Please, how can I do it without using SYS/BIOS?

  • Please, how can I avoid SYS/BIOS?

    The project does not build without the .cfg file!

  • Delared,

    If you can build the original project successfully, do you see the following memory sections in the .map file in your Debug folder?

    MEMORY CONFIGURATION

    name        origin    length    used      unused    attr  fill
    ---------   --------  --------  --------  --------  ----  ----
    L2SRAM      00800000  00100000  00022992  000dd66e  RW X
    MSMCSRAM    0c000000  00100000  00000000  00100000  RW X
    DDR3        80000000  20000000  00000100  1fffff00  RWIX

    This RTSC (BIOS) project already defines DDR3 for the C6678 platform. You can simply use it in the SECTIONS part of the .cmd file, such as:

    SECTIONS
    {
        .init_array > L2SRAM
        //.dstBufSec > L2SRAM
        .dstBufSec  > DDR3
    }

    There are some details of the memory management in the SYS/BIOS user guide, such as the following link:

    http://www.ti.com/lit/ug/spruex3l/spruex3l.pdf

    Or you could create a new CCS project / C project with the PCIe example files and the LLD library yourself. Then you would have full control of the memory map linker file (.cmd).
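    As a sketch, a minimal bare-metal linker command file for such a hand-made project could look like the following. The origins and lengths are copied from the memory map above; the section list is illustrative and not complete, so verify it against your own build:

```
/* hypothetical no-BIOS .cmd sketch for C6678 */
MEMORY
{
    L2SRAM   : o = 0x00800000, l = 0x00100000
    MSMCSRAM : o = 0x0c000000, l = 0x00100000
    DDR3     : o = 0x80000000, l = 0x20000000
}

SECTIONS
{
    .text      > L2SRAM
    .cinit     > L2SRAM
    .stack     > L2SRAM
    .bss       > L2SRAM
    .dstBufSec > DDR3    /* large transfer buffers in DDR3 */
}
```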

  • Steven,

    So now I can build my project without SYS/BIOS; it's OK.

    But may I ask again how I can measure the throughput reliably? Because with your suggestion:

    start_time = CSL_tscRead();

    for (i = 0; i < PCIE_BUFSIZE_APP; i++)
    {
        *((volatile uint32_t *)pcieBase + dstOffset/4 + i) = srcBuf[i];
    }

    stop_time = CSL_tscRead();

    Ncycle = stop_time - start_time;

    When I add an optimization level, for example -O3, I get a different result that sometimes exceeds 5 Gbps!

    Please, is there another method to do this? For example, in the LLD can I test a flag in some register to detect completion?