TDA4VM: PCIe Can Only Enumerate To 1 Lane During the System Startup Phase

Part Number: TDA4VM

hi expert:

    On our cecu, two TDA4 devices are connected chip to chip over PCIe. The PCIe controller in use is pcie1, configured for Gen3, 2 lanes. When PCIe starts enumeration, the link occasionally comes up with only 1 lane. We measured the PCIe eye diagram on the cecu and it looks very good. When only 1 lane was enumerated, we dumped the registers related to pcie1 and serdes_16G1, compared them with a dump from a normal enumeration, and found some differences. We cannot analyze these register differences any further ourselves. Could the TI experts help analyze the possible causes of the problem? Thanks!

The register dump file is attached.

The eye diagram is as follows:


The register comparison is as follows:

  • hi expert:

    One more piece of information: this problem occurs after the system is reset, and we reset using MCU_PORz.

  • Billy, 

    Could you confirm:

    1. What is the sequence when you apply MCU_PORz, in terms of both the RC and EP TDA4s? Are you resetting just one of them, or both? If the latter, what is the sequence?

    2. Will the link train back to 2 lanes if you rescan the bus from the RC side? i.e.

         echo 1 > /sys/bus/pci/devices/0000:00:00.0/remove (replace the device node if using other slots)

         echo 1 > /sys/bus/pci/rescan

    3. Can you also dump the SERDES1 mux registers CTRLMMR_SERDES1_LN0_CTRL and CTRLMMR_SERDES1_LN1_CTRL, just to check that the SERDES lane muxes are not corrupted?
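
For a Linux RC, the remove/rescan check in item 2 could be scripted roughly as below. This is only a sketch under assumptions: it relies on the standard sysfs `remove`, `rescan`, and `current_link_width` attributes, must run as root, and the helper names `read_link_width` and `remove_and_rescan` are made up for illustration. It does not apply to the QNX driver.

```python
from pathlib import Path

SYSFS_PCI_ROOT = "/sys/bus/pci"  # standard Linux location


def read_link_width(device_dir):
    """Return the negotiated lane count that sysfs reports for one PCIe device."""
    return int((Path(device_dir) / "current_link_width").read_text().strip())


def remove_and_rescan(bdf, sysfs_pci_root=SYSFS_PCI_ROOT):
    """Write 1 to <bdf>/remove, then 1 to rescan, mirroring the two echo commands."""
    root = Path(sysfs_pci_root)
    (root / "devices" / bdf / "remove").write_text("1\n")
    (root / "rescan").write_text("1\n")
```

A typical use would be to call `remove_and_rescan("0000:00:00.0")` and then `read_link_width("/sys/bus/pci/devices/0000:00:00.0")` to see whether the link came back at 2 lanes (replace the device node if another slot is used).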

    Also, I tried to download the tar file, but it showed page not found. Could you re-upload it?

    From the snippet you sent, if I assume all other registers not shown are the same, then the SERDES differences are mostly in dynamic registers, and the PCIe differences are mainly x2 vs. x1 and ATU differences. So nothing obvious yet.

    Jian

     

  • pcie1_reg_compare.tar.gz

    hi jian.
        The register dump file has been uploaded again.

    QA:
    1: TDA4#1 is the RC and TDA4#2 is the EP. When MCU_PORz is applied, the signal is sent to the MCU_PORz pins of both TDA4s at the same time.

    2: We are using QNX, and its driver differs somewhat from the Linux driver; there is no remove/rescan. After the driver enables link training, if the value of the status register is not 0x10000117, it writes 0x82010003 to register 0x0d900050. Retraining is repeated up to 10 times.
    3: The CTRLMMR_SERDES1_LN0_CTRL and CTRLMMR_SERDES1_LN1_CTRL registers both read 0x1 when the problem happens.
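
The retrain loop described in answer 2 can be sketched as follows. This is an illustration, not the actual QNX driver: the register address and values are the ones quoted in this thread, while `read_status`, `write_reg`, and `settle_s` are hypothetical stand-ins for the driver's register access and any settling delay it uses. The address of the status register is not given in the thread, so it is hidden behind the `read_status` callable.

```python
import time

LINK_OK_STATUS = 0x10000117  # status value the thread treats as a good link
RETRAIN_REG = 0x0D900050     # register written to trigger retraining
RETRAIN_VALUE = 0x82010003   # value written to RETRAIN_REG
MAX_RETRIES = 10             # retraining is repeated up to 10 times


def retrain_until_ok(read_status, write_reg, settle_s=0.0):
    """Retrain until the status matches LINK_OK_STATUS.

    Returns the number of retrain writes issued, or -1 if the status
    still does not match after MAX_RETRIES attempts.
    """
    for attempt in range(MAX_RETRIES):
        if read_status() == LINK_OK_STATUS:
            return attempt
        write_reg(RETRAIN_REG, RETRAIN_VALUE)
        time.sleep(settle_s)  # hypothetical wait for training to settle
    return MAX_RETRIES if read_status() == LINK_OK_STATUS else -1
```

Passing the register accessors in as callables keeps the loop testable off target, for example with a fake status sequence.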

  • Billy, 

    On 1, can I ask a few more details about your hardware configuration:

      - can you confirm your cold boot case also uses the same sequence, i.e., MCU_PORz is asserted on both devices simultaneously?

      - what is the clocking scheme of both sides, for PCIe refclk?

      - what is the PERST signal connected to, on both sides?

    Not sure if you have control over each MCU_PORz individually. If you do, can you add a delay to the MCU_PORz on the RC side when it is asserted? The goal is to bring up the EP first, let it wait in LTSSM, and then bring up the RC.

    regards

    Jian

  • hi jian:

       Thanks for your suggestion. We tried starting the EP before the RC, and this method seems to work: we tested it for a week and the problem did not reappear, so it seems to be solved. But we still have some questions:

    1. What is the reason why this method works?

    2. How long should the EP/RC start time difference be? Currently we set it to 2 seconds.

    3. We have used the 0x0d900050 register to retrain, but it has no effect. This method was also provided by TI to solve the problem of PCIe enumerating at only 2.5G/1 lane during the TDA4 startup phase. Why doesn't it work? When retraining is triggered through the 0x0d900050 register, does the LTSSM re-enter the Detect state?

    thanks

    billy

  • Billy, 

    Sorry for the extended delay in replying. The reason to start the EP first is to ensure the RC can detect the presence of the EP when it comes up. By default, the SERDES on the EP side is disabled; if the RC boots faster, or there is a race condition, the RC will time out or fail to detect the second lane. Once the EP is detected, the RC initiates link training.

    When you set the EP to boot first, the EP driver enables the SERDES, and PCIe waits in the LTSSM state once the PCIe EP driver is loaded during kernel boot. When the RC comes up later, it first detects the presence of the EP by detecting the SERDES termination.

    The PCIe CEM spec states that the EP must enter LTSSM within 100ms of deassertion of the PERST# signal by the RC. So check your kernel boot log, record the time when the PCIe EP driver starts, and offset it by the time the PCIe driver starts on the RC side; as long as the RC PCIe can reliably see the EP's SERDES enabled, that is the minimum time needed. There is no maximum time, as the EP can wait in LTSSM forever.
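
The offset described above amounts to a simple subtraction, sketched below. The function name, the optional margin, and the example timestamps are hypothetical; the real numbers have to come from the boot logs on both devices, measured from the same reset release.

```python
def min_rc_delay_s(ep_driver_start_s, rc_driver_start_s, margin_s=0.0):
    """Estimate how long the RC should be held back after the EP.

    ep_driver_start_s: time after reset at which the EP PCIe driver starts
                       (and enables its SERDES), read from the EP boot log.
    rc_driver_start_s: time after reset at which the RC PCIe driver starts,
                       read from the RC boot log.
    margin_s:          optional safety margin (an assumption, not from the thread).
    """
    return max(0.0, ep_driver_start_s - rc_driver_start_s + margin_s)


# Hypothetical example: EP driver starts 3.5 s after reset, RC driver at 2.5 s.
delay = min_rc_delay_s(3.5, 2.5, margin_s=0.5)
```

If the RC driver already starts later than the EP driver by more than the margin, the result is zero and no extra delay is needed.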

    Jian