AM5749: PCIe: PCIe scan may fail on PCIe_SS2 when PCIe_SS1 and PCIe_SS2 are in single-lane mode.

Part Number: AM5749
Other Parts Discussed in Thread: XIO2001

Hello,

I'm using an AM5749 on a custom board where I enabled the second PCIe controller (PCIe_SS2) along with PCIe_SS1,
so each controller uses a single PCIe lane.

I reproduced the issue with the TI 5.10 kernel and also with the latest upstream kernel, 6.1.8. The devicetree changes that enable PCIe_SS2 are:

&axi1 {
    status = "okay";
};

&pcie2_rc {
    status = "okay";
};

&pcie2_phy {
    status = "okay";
};

Sometimes (it is relatively easy to reproduce) the second PCI bridge is not identified properly:

# lspci
0000:00:00.0 PCI bridge: Texas Instruments Multicore DSP+ARM KeyStone II SOC (rev 01)
0000:01:00.0 PCI bridge: Texas Instruments XIO2001 PCI Express-to-PCI Bridge
0000:02:00.0 Unassigned class [ff00]: Hilscher GmbH CIFX 50E-DP(M/S)
0001:00:00.0 Non-VGA unclassified device: Texas Instruments Multicore DSP+ARM KeyStone II SOC (rev 01)

The issue only appears on PCIe_SS2.

Indeed, the device class reported by the bus is 0x0:
[ 1212.493316] pci 0001:00:00.0: [104c:8888] type 01 class 0x000000          <<<< ??? KO
[ 1212.493347] pci 0001:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff pref]
[ 1212.493377] pci 0001:00:00.0: reg 0x14: [mem 0x00000000-0x0000ffff pref]
[ 1212.493438] pci 0001:00:00.0: supports D1
[ 1212.493438] pci 0001:00:00.0: PME# supported from D0 D1 D3hot
[ 1212.517578] PCI: bus1: Fast back to back transfers enabled
[ 1212.517608] pci 0001:00:00.0: PCI bridge to [bus 01]

Earlier in the dmesg log there is this line:
[    3.215606] dra7-pcie 51800000.pcie: Phy link never came up

lspci also reports "Invalid class 0000 for header type 01", and there is probably an issue with the PCIe BAR mapping (Memory at <unassigned>):

# lspci -nv
0001:00:00.0 0000: 104c:8888 (rev 01)
        !!! Invalid class 0000 for header type 01
        Flags: fast devsel, IRQ 255
        Memory at <unassigned> (32-bit, prefetchable) [virtual] [size=1M]         <<<<
        Memory at <unassigned> (32-bit, prefetchable) [virtual] [size=64K]        <<<<
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: [disabled]
        Memory behind bridge: [disabled]
        Prefetchable memory behind bridge: [disabled]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Root Port (Slot-), MSI 00
        Capabilities: [100] Advanced Error Reporting

When it's working, the device class is 0x060400:
[    2.316070] pci 0001:00:00.0: [104c:8888] type 01 class 0x060400

and the PCIe link comes up:
[    2.315917] dra7-pcie 51800000.pcie: Link up

# lspci
0000:00:00.0 PCI bridge: Texas Instruments Multicore DSP+ARM KeyStone II SOC (rev 01)
0000:01:00.0 PCI bridge: Texas Instruments XIO2001 PCI Express-to-PCI Bridge
0000:02:00.0 Unassigned class [ff00]: Hilscher GmbH CIFX 50E-DP(M/S)
0001:00:00.0 PCI bridge: Texas Instruments Multicore DSP+ARM KeyStone II SOC (rev 01)
0001:01:00.0 Network controller: Qualcomm Device 1103 (rev 01)

I found a previous report of this kind of issue, but without a clear explanation of the cause or of how it was fixed:
https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1017941/dra722-pcie-either-pcie1-or-pcie2-is-not-detected-properly


I noticed that the LTSSM state differs between PCIe1 and PCIe2 when the issue occurs (read from the PCIECTRL_TI_CONF_DEVICE_CMD register):
# devmem2 0x51002104 w
Read at address  0x51002104 (0xb6f0e104): 0x00000045
# devmem2 0x51802104 w
Read at address  0x51802104 (0xb6f67104): 0x00000000  <<<

When it's working:
# devmem2 0x51002104 w ; devmem2 0x51802104 w
Read at address  0x51002104 (0xb6f73104): 0x00000045
Read at address  0x51802104 (0xb6f7b104): 0x00000045

I tried to remove the device and rescan the PCI bus, without success:
echo "1" > /sys/bus/pci/devices/0001\:00\:00.0/remove; sleep 1; echo "1" > /sys/bus/pci/rescan

On the 6.1.8 kernel, this crashes the system:

[  142.966583] pci 0000:01:00.0: PCI bridge to [bus 02]
[  142.971588] pci 0000:01:00.0:   bridge window [mem 0x20200000-0x202fffff]
[  142.985382] pci 0001:01:00.0: [17cb:1103] type 00 class 0x028000
[  142.991485] pci 0001:01:00.0: reg 0x10: [mem 0x00000000-0x001fffff 64bit]
[  142.998779] pci 0001:01:00.0: PME# supported from D0 D3hot D3cold
[  143.005035] pci 0001:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0001:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
[  143.044342] pci 0001:01:00.0: BAR 0: assigned [mem 0x00200000-0x003fffff 64bit]
[  143.051757] pci 0001:00:00.0: PCI bridge to [bus 01]
[  143.056793] pci 0001:00:00.0:   bridge window [mem 0x00200000-0x003fffff]
[  143.063995] ath11k_pci 0001:01:00.0: BAR 0: assigned [mem 0x00200000-0x003fffff 64bit]
[  143.072052] pci 0001:00:00.0: can't enable device: BAR 0 [mem 0x00000000-0x000fffff pref] not claimed
[  143.081359] pci 0001:00:00.0: Error enabling bridge (-22), continuing
[  143.087890] ath11k_pci 0001:01:00.0: enabling device (0000 -> 0002)

On the 5.10 kernel, where the ath11k driver doesn't support the Qualcomm module, lspci is able to list the module after the rescan, but the PCI bridge is still not correctly detected (Non-VGA unclassified device):

# lspci
0000:00:00.0 PCI bridge: Texas Instruments Multicore DSP+ARM KeyStone II SOC (rev 01)
0000:01:00.0 PCI bridge: Texas Instruments XIO2001 PCI Express-to-PCI Bridge
0000:02:00.0 Unassigned class [ff00]: Hilscher GmbH CIFX 50E-DP(M/S)
0001:00:00.0 Non-VGA unclassified device: Texas Instruments Multicore DSP+ARM KeyStone II SOC (rev 01)
0001:01:00.0 Network controller: Qualcomm Device 1103 (rev 01)


Do you have any clue?

Best regards,
Romain

  • Hello,

    I have some additional information:
    I replaced the Wifi device with an older one and I don't have the issue anymore:

    0000:00:00.0 PCI bridge: Texas Instruments Multicore DSP+ARM KeyStone II SOC (rev 01)
    0000:01:00.0 PCI bridge: Texas Instruments XIO2001 PCI Express-to-PCI Bridge
    0000:02:00.0 Unassigned class [ff00]: Hilscher GmbH CIFX 50E-DP(M/S)
    0001:00:00.0 PCI bridge: Texas Instruments Multicore DSP+ARM KeyStone II SOC (rev 01)
    0001:01:00.0 Network controller: Intel Corporation Centrino Wireless-N 135 (rev c4)

    Indeed, the issue seems to be related to the PCIe speed (Gen1 2.5 GT/s vs Gen2 5 GT/s).
    The PCIe controller used on AM5749 devices supports up to Gen2 speed.

    dra7-pcie 51000000.pcie: PCIe Gen.1 x1 link up  << Gen1 (CIFX)
    dra7-pcie 51800000.pcie: PCIe Gen.2 x1 link up  << Gen2 (Qualcomm)
    dra7-pcie 51800000.pcie: PCIe Gen.1 x1 link up  << Gen1 (n135)

    With the Qualcomm device I also have an additional warning in dmesg:

    pci 0001:01:00.0:  4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0001:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)

    This means that the Qualcomm device (0001:01:00.0) could use up to Gen3 speed but is limited by the PCIe controller (0001:00:00.0).
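
    The two figures in that message follow from the line encoding: Gen1 and Gen2 links use 8b/10b encoding, Gen3 uses 128b/130b. A minimal C sketch of the arithmetic is below. The function name lane_mbps is made up for illustration, and truncating to whole Mb/s is an assumption that happens to reproduce the 7.876 figure.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Usable bandwidth of one lane in Mb/s: raw rate in MT/s scaled by the
 * encoding efficiency (payload bits per encoded bits), truncated.
 */
static uint32_t lane_mbps(uint32_t rate_mtps, uint32_t payload_bits,
                          uint32_t encoded_bits)
{
    assert(encoded_bits != 0);
    return (uint32_t)((uint64_t)rate_mtps * payload_bits / encoded_bits);
}

/* Values taken from the dmesg line quoted above. */
static void check_thread_values(void)
{
    assert(lane_mbps(5000, 8, 10) == 4000);     /* Gen2 x1: 4.000 Gb/s */
    assert(lane_mbps(8000, 128, 130) == 7876);  /* Gen3 x1: 7.876 Gb/s */
}
```

    So the message only states that the endpoint is faster than the root port; by itself it does not indicate a fault.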

    Is there any known issue (glitch?) on the TI PCI bridge when the PCIe device falls back from Gen3 to Gen2?
    I don't have a Gen2-only PCIe device to check whether it's due to the PCIe bandwidth mismatch...

    I checked the PCIe routing on the PCB and it seems to comply with the PCIe Gen2 design rules.

    Anyway, the PCIe bridge class reported as 0x0 (Non-VGA unclassified device) seems dubious.

    Best regards,
    Romain

  • Hello,

    I had a look at the initialization process of the PCI bridge when it fails:

    [    3.636444] pci_init_host_bridge
    [    3.636474] dra7-pcie 51800000.pcie: host bridge /ocp/axi@1/pcie@51800000 ranges:
    [    3.636535] dra7-pcie 51800000.pcie:       IO 0x0030003000..0x0030012fff -> 0x0000000000
    [    3.636566] dra7-pcie 51800000.pcie:      MEM 0x0030013000..0x003fffffff -> 0x0030013000
    [    3.636627] dra7xx_pcie_msi_host_init
    [    3.636688] dra7xx_pcie_host_init
    [    3.636688] dw_pcie_setup
    [    3.636688] dw_pcie_setup PORT_LINK_MODE_1_LANES
    [    3.636688] dra7xx_pcie_establish_link
    [    3.636718] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.636718] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.636718] dra7xx_pcie_establish_link read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.636718] dra7xx_pcie_establish_link write PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x1
    [    3.636718] dw_pcie_wait_for_link
    [    3.636718] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x1
    [    3.636718] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.736816] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0  << FUNDAMENTAL reset ??
    [    3.736816] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.836914] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.836914] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.937011] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.937011] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.037078] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.037078] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.137176] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.137176] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.237274] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.237274] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.337402] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.337432] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.437530] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.437561] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.537658] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.537658] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.637786] dra7-pcie 51800000.pcie: Phy link never came up

    There is something strange during dw_pcie_wait_for_link(): the PCIECTRL_DRA7XX_CONF_DEVICE_CMD register seems to be reset by the SoC while the kernel is still initializing the controller. This could explain why the class is 0x0.

    Here is the same log when the PCI bridge is OK:

    [    3.636596] pci_init_host_bridge
    [    3.636627] dra7-pcie 51800000.pcie: host bridge /ocp/axi@1/pcie@51800000 ranges:
    [    3.636688] dra7-pcie 51800000.pcie:       IO 0x0030003000..0x0030012fff -> 0x0000000000
    [    3.636718] dra7-pcie 51800000.pcie:      MEM 0x0030013000..0x003fffffff -> 0x0030013000
    [    3.636779] dra7xx_pcie_msi_host_init
    [    3.636810] dra7xx_pcie_host_init
    [    3.636840] dw_pcie_setup
    [    3.636840] dw_pcie_setup PORT_LINK_MODE_1_LANES
    [    3.636840] dra7xx_pcie_establish_link
    [    3.636840] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.636871] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.636871] dra7xx_pcie_establish_link read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.636871] dra7xx_pcie_establish_link write PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x1
    [    3.636871] dw_pcie_wait_for_link
    [    3.636871] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x1
    [    3.636871] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.736968] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x45
    [    3.736968] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.736968] dra7-pcie 51800000.pcie: Link up
    [    3.736968] dra7xx_pcie_enable_interrupts
    [    3.736968] dra7xx_pcie_enable_wrapper_interrupts
    [    3.736999] dra7xx_pcie_enable_msi_interrupts
    [    3.736999] dra7xx_pcie_host_init exit

    Is this related to PCI "fundamental resets"?

    For testing purposes, I moved the PCIe Wifi module to the first PCIe controller and was not able to reproduce the issue.
    I don't see any reason why it fails on PCIe_SS2 and not on PCIe_SS1.

    Do we need a PCIe quirk like the one for the Cadence controller, "Retrain Link to work around Gen2 training defect"?
    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1d3efd15e8a43ce2ea91c040f77ef67a0b2b3bc5


    But it doesn't explain why PCIe_SS1 and PCIe_SS2 behave differently on this SoC during the Link training.

    Best regards,
    Romain

  • Hello,

    It seems related to a Link training issue:
    "The LTSSM_EN bit will reset back to 0 if the link training fails"

    See this nice post:
    e2e.ti.com/.../am5746-pcie-5gt-s-link-retry

    Best regards,
    Romain

  • Romain, 

    As you noted in your latest post, PCIe_SS2 failed link training and the LTSSM_EN bit was reset back to 0. In the normal link-up case, the driver first sets the LTSSM_EN bit to 1, then polls the same register and the PCIECTRL_TI_CONF_PHY_CS[LINK_UP] bit. When the LTSSM state reaches 0x11 (L0 state), it proceeds to link negotiation. In the failed case, the driver keeps polling these registers and eventually times out. Could you try the following steps to debug:

    1. Could you confirm that you already checked that PCIe_SS1 is configured as 1 lane in its PCIe_SS1_RC_CFG_DBICS2[MAX_LINK_WIDTH]? This should be configured as 1 lane so that PCIe1 does not try to use both lanes.

    2. Additionally, could you try setting PCIe_SS2_RC_CFG_DBICS2[MAX_LINK_SPEEDS] to 0x1 (Gen 1 only) and see whether the Qualcomm card links stably? I am not sure link speed is the issue, as the Qualcomm card should start out at Gen 1 and then switch to Gen 2, since the RC is Gen 2 capable. But link training never reached the L0 state, so not even Gen 1 started.

    3. In the log of the failed case, the LTSSM was always 0x0 during the continuous polling, indicating that the lane was in the DETECT_IDLE state, i.e. the card was not detected. So as a third debug experiment, could you try adding a delay in the driver code just before the LTSSM_EN bit is set? This gives the WIFI card more time to go into link training. The PCIe spec requires that the card must go to link training within 100 ms of RESET deassertion, so adding a delay guarantees that as a debug step.

    4. As a last experiment, can you try to manually set the LTSSM_EN bit from the Linux command line after the driver has failed link training, then immediately poll the two registers to see whether the link comes up? You will not be able to see the device after that, even if the link is up, because no further enumeration takes place. But this experiment lets us check whether there is a timing issue.

    regards

    Jian

  • Hello Jian,

    Thank you for your reply!

    1. The number of lanes is defined in the devicetree, and by default all PCIe controllers are configured for x1.
    Also, the function dra7xx_pcie_configure_two_lane() is not used in this case.

    About the register, do you mean PCIECTRL_RC_DBICS2_LNK_CAP[MAX_LINK_WIDTH]?

    |---------------|------------|
    | Address (hex) | Data (hex) |
    |---------------|------------|
    | 0x5100107C    | 0x00733C22 |
    | 0x5180107C    | 0x00733C22 |
    |---------------|------------|

    Indeed, MAX_LINK_WIDTH is 0x2 and MAX_LINK_SPEEDS is 0x2.
    But we also have to check PCIECTRL_RC_DBICS2_LNK_CAS[NEG_LW]:

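    |---------------|------------|------------------------|
    | Address (hex) | Data (hex) | Note                   |
    |---------------|------------|------------------------|
    | 0x51001080    | 0x10110008 |                        |
    | 0x51801080    | 0x10110008 | when link-up failed    |
    | 0x51801080    | 0xF0120048 | when link-up succeeded |
    |---------------|------------|------------------------|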

    So PCIe_SS2_RC is still in x1 lane mode at Gen1 speed when it fails.

    I will try to write 0x1 to PCIECTRL_RC_DBICS2_LNK_CAP[MAX_LINK_WIDTH] to force the x1 lane mode.

    2. I already tested with a Gen1-only device (Intel) without any issue. I'll force Gen1 speed with the Qualcomm module by writing 0x1 to PCIECTRL_RC_DBICS2_LNK_CAP[MAX_LINK_SPEEDS].

    3. I already tried adding a delay before setting the LTSSM_EN bit (as suggested in other posts I found on this forum), and I'm still able to reproduce the issue.
    My current workaround is to check whether LTSSM_EN is 0 after calling dw_pcie_wait_for_link(pci) and, if so, to call dra7xx_pcie_establish_link() again until it passes.

    4. This is covered by my workaround: when dw_pcie_wait_for_link(pci) fails and we call dra7xx_pcie_establish_link() again, the link comes up:

    [    3.645965] pci_init_host_bridge
    [    3.645965] dra7-pcie 51800000.pcie: host bridge /ocp/axi@1/pcie@51800000 ranges:
    [    3.646057] dra7-pcie 51800000.pcie:       IO 0x0030003000..0x0030012fff -> 0x0000000000
    [    3.646087] dra7-pcie 51800000.pcie:      MEM 0x0030013000..0x003fffffff -> 0x0030013000
    [    3.646118] dra7xx_pcie_msi_host_init
    [    3.646179] dra7xx_pcie_host_init
    [    3.646209] dw_pcie_setup
    [    3.646209] dw_pcie_setup PORT_LINK_MODE_1_LANES
    [    3.646209] dra7xx_pcie_stop_link read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.646209] dra7xx_pcie_stop_link write PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.647216] dra7xx_pcie_establish_link
    [    3.647216] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.647247] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.647247] dra7xx_pcie_establish_link read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.647247] dra7xx_pcie_establish_link write PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x1
    [    3.647247] dw_pcie_wait_for_link
    [    3.647247] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x1
    [    3.647247] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.747344] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0        <<< reset
    [    3.747344] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.847442] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.847442] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.947570] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.947601] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.047668] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.047698] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.147827] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.147827] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.247924] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.247924] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.348022] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.348022] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.448120] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.448120] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.548248] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.548248] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.648406] dra7-pcie 51800000.pcie: Phy link never came up                                             <<< fail
    [    4.648406] dra7xx_pcie_host_init read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.648406] dra7xx_pcie_stop_link read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.648406] dra7xx_pcie_stop_link write PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.649414] dra7xx_pcie_establish_link                                                                                 <<< retry
    [    4.649444] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.649444] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.649444] dra7xx_pcie_establish_link read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    4.649444] dra7xx_pcie_establish_link write PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x1
    [    4.649444] dw_pcie_wait_for_link
    [    4.649444] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x5      <<< OK
    [    4.649444] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    4.749542] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x45    <<< OK
    [    4.749542] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    4.749572] dra7-pcie 51800000.pcie: Link up                                                                          <<< OK
    [    4.749572] dra7xx_pcie_host_init read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x45
    [    4.749572] dra7xx_pcie_enable_interrupts
    [    4.749572] dra7xx_pcie_enable_wrapper_interrupts
    [    4.749572] dra7xx_pcie_enable_msi_interrupts
    [    4.749572] dra7xx_pcie_host_init exit
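
    The retry workaround described in points 3 and 4 can be sketched in plain C as below. This is not the actual driver patch: struct link_ops, establish_link_with_retry() and the fake_* simulation are hypothetical names invented for this sketch. The function pointers stand in for dra7xx_pcie_stop_link(), dra7xx_pcie_establish_link(), dw_pcie_wait_for_link() and a read-back of LTSSM_EN, and the retry limit is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hooks standing in for the driver functions named in the post. */
struct link_ops {
    void (*stop_link)(void *ctx);       /* clears LTSSM_EN */
    void (*establish_link)(void *ctx);  /* sets LTSSM_EN */
    bool (*wait_for_link)(void *ctx);   /* true once the link is up */
    bool (*ltssm_enabled)(void *ctx);   /* reads back LTSSM_EN */
};

/* Returns the number of attempts used, or -1 if the link never came up. */
static int establish_link_with_retry(const struct link_ops *ops, void *ctx,
                                     int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        ops->stop_link(ctx);
        ops->establish_link(ctx);
        if (ops->wait_for_link(ctx))
            return attempt;
        /* Retry only when the hardware cleared LTSSM_EN (training failed). */
        if (ops->ltssm_enabled(ctx))
            return -1;
    }
    return -1;
}

/* Simulated controller: training fails 'failures_left' times, then passes. */
struct fake_ctrl {
    int failures_left;
    bool ltssm_en;
};

static void fake_stop(void *ctx)
{
    ((struct fake_ctrl *)ctx)->ltssm_en = false;
}

static void fake_establish(void *ctx)
{
    ((struct fake_ctrl *)ctx)->ltssm_en = true;
}

static bool fake_wait(void *ctx)
{
    struct fake_ctrl *f = ctx;

    if (f->failures_left > 0) {
        f->failures_left--;
        f->ltssm_en = false;  /* hardware resets LTSSM_EN on failure */
        return false;
    }
    return true;
}

static bool fake_enabled(void *ctx)
{
    return ((struct fake_ctrl *)ctx)->ltssm_en;
}

static const struct link_ops fake_ops = {
    fake_stop, fake_establish, fake_wait, fake_enabled
};

/* Mirrors the log above: first attempt fails, second one links up. */
static void check_retry(void)
{
    struct fake_ctrl c = { 1, false };

    assert(establish_link_with_retry(&fake_ops, &c, 3) == 2);
}
```

    Bounding the number of attempts keeps the probe from looping forever if no card is fitted.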

    Best regards,
    Romain

  • Romain, 

    On the experiment:

    >>I will try to write 0x1 to PCIECTRL_RC_DBICS2_LNK_CAP[MAX_LINK_WIDTH] to force the x1 lane mode.

    Please make sure to write the same value for BOTH PCIe controllers. I suspected that PCIe1 was trying to take over both SERDES lanes, though you mentioned that dra7xx_pcie_configure_two_lane() was never called.

    PCIECTRL_RC_DBICS2_LNK_CAS[NEG_LW] indicates the negotiated link width. Since the WIFI card is 1 lane, the link will always end up as x1.

    It is interesting that PCIe2 links up when you retry the link training, even though adding the delay did not work. Can you tell me what delay lengths you tried?
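
    For reference, the register values quoted earlier in the thread can be decoded with the standard PCIe field layout: in Link Capabilities, bits [3:0] are the maximum link speed and bits [9:4] the maximum link width; in Link Status, the same bit positions hold the current speed and the negotiated width. The sketch below assumes that the 32-bit LNK_CAS word carries Link Control in its lower half and Link Status in its upper half. The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct link_info {
    unsigned int speed;  /* 1 = 2.5 GT/s, 2 = 5.0 GT/s, 3 = 8.0 GT/s */
    unsigned int width;  /* number of lanes */
};

/* Link Capabilities: bits [3:0] max speed, bits [9:4] max width. */
static struct link_info decode_lnk_cap(uint32_t lnk_cap)
{
    struct link_info info;

    info.speed = lnk_cap & 0xFu;
    info.width = (lnk_cap >> 4) & 0x3Fu;
    return info;
}

/*
 * Link Control/Status dword (assumed layout): Link Status is the upper
 * 16 bits, with current speed in its bits [3:0] and negotiated width in
 * its bits [9:4].
 */
static struct link_info decode_lnk_cas(uint32_t lnk_cas)
{
    uint32_t status = lnk_cas >> 16;
    struct link_info info;

    info.speed = status & 0xFu;
    info.width = (status >> 4) & 0x3Fu;
    return info;
}

/* Values copied from the tables earlier in the thread. */
static void check_thread_values(void)
{
    assert(decode_lnk_cap(0x00733C22u).speed == 2);  /* Gen2 capable */
    assert(decode_lnk_cap(0x00733C22u).width == 2);  /* x2 capable */
    assert(decode_lnk_cas(0x10110008u).speed == 1);  /* failed: Gen1 */
    assert(decode_lnk_cas(0x10110008u).width == 1);
    assert(decode_lnk_cas(0xF0120048u).speed == 2);  /* success: Gen2 */
    assert(decode_lnk_cas(0xF0120048u).width == 1);
}
```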

    regards

    Jian

  • Hi Jian,

    I have some trouble debugging the LTSSM states... Here are all the LTSSM states I could see in my debug log:

    DRA7XX_CONF_DEVICE_CMD 0x1 :               LTSSM_EN
    DRA7XX_CONF_DEVICE_CMD 0x5 : DETECT_ACT  + LTSSM_EN
    DRA7XX_CONF_DEVICE_CMD 0x9 : POLL_ACTIVE + LTSSM_EN
    DRA7XX_CONF_DEVICE_CMD 0x35: RCVRY_LOCK  + LTSSM_EN
    DRA7XX_CONF_DEVICE_CMD 0x39: RCVRY_SPEED + LTSSM_EN
    DRA7XX_CONF_DEVICE_CMD 0x45: L0          + LTSSM_EN
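
    A small decoding helper makes these raw values easier to read. The sketch assumes that bit 0 of the register is LTSSM_EN and that bits [7:2] hold the LTSSM state, which is consistent with every value in the list above (for example 0x45 gives state 0x11, the L0 value mentioned earlier in the thread). The function names are illustrative, and only the states seen in this thread are named.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 0 = LTSSM_EN, bits [7:2] = LTSSM state. */
#define LTSSM_EN          0x1u
#define LTSSM_STATE_SHIFT 2
#define LTSSM_STATE_MASK  0x3Fu

static unsigned int ltssm_state(uint32_t device_cmd)
{
    return (device_cmd >> LTSSM_STATE_SHIFT) & LTSSM_STATE_MASK;
}

static bool ltssm_enabled(uint32_t device_cmd)
{
    return (device_cmd & LTSSM_EN) != 0;
}

/* Names only for the states that appear in this thread. */
static const char *ltssm_name(unsigned int state)
{
    assert(state <= LTSSM_STATE_MASK);
    switch (state) {
    case 0x00: return "DETECT_IDLE";
    case 0x01: return "DETECT_ACT";
    case 0x02: return "POLL_ACTIVE";
    case 0x0D: return "RCVRY_LOCK";
    case 0x0E: return "RCVRY_SPEED";
    case 0x11: return "L0";
    default:   return "unknown";
    }
}

/* Values copied from the list above. */
static void check_thread_values(void)
{
    assert(ltssm_state(0x05) == 0x01);
    assert(ltssm_state(0x09) == 0x02);
    assert(ltssm_state(0x35) == 0x0D);
    assert(ltssm_state(0x39) == 0x0E);
    assert(ltssm_state(0x45) == 0x11);
    assert(ltssm_enabled(0x45) && !ltssm_enabled(0x00));
}
```

    For example, ltssm_name(ltssm_state(0x39)) yields "RCVRY_SPEED".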

    I changed the usleep_range() delay values to read the register more often and forced the loop to exit after LINK_WAIT_MAX_RETRIES.
    Doing so exposes the issue with the dw_pcie_wait_for_link() function.
    Indeed, dw_pcie_wait_for_link() doesn't expect to fail...

    If dw_pcie_wait_for_link() returns while DRA7XX_CONF_DEVICE_CMD = 0x39, the PCIe stack triggers omap_l3_noc errors:

    [    3.681671] omap_l3_noc 44000000.ocp: L3 application error: target 5 mod:1 (unclearable)
    [    3.681701] omap_l3_noc 44000000.ocp: L3 debug error: target 5 mod:1 (unclearable)

    Here is the trace I get when the PCIe controller resets:

    [    3.655151] dra7xx_pcie_probe read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.655181] dra7xx_pcie_probe write PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.655670] dra7-pcie 51800000.pcie: host bridge /ocp/axi@1/pcie@51800000 ranges:
    [    3.655761] dra7-pcie 51800000.pcie:       IO 0x0030003000..0x0030012fff -> 0x0000000000
    [    3.655792] dra7-pcie 51800000.pcie:      MEM 0x0030013000..0x003fffffff -> 0x0030013000
    [    3.655914] dra7xx_pcie_establish_link
    [    3.655914] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.655914] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.656951] dra7xx_pcie_establish_link read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.656982] dra7xx_pcie_establish_link write PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x1
    [    3.656982] dra7xx_pcie_establish_link read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x5
    [    3.656982] dw_pcie_wait_for_link
    [    3.656982] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x5
    [    3.656982] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.658050] dw_pcie_wait_for_link: retry 0
    [    3.658050] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x9
    [    3.658050] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0
    [    3.659088] dw_pcie_wait_for_link: retry 1
    [    3.659088] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.659118] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000     <<< dra7xx_pcie_link_up() return true (BUG)
    [    3.660156] dw_pcie_wait_for_link: retry 2
    [    3.660156] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.660156] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000     <<< but we are stuck in the recovery state
    [    3.661193] dw_pcie_wait_for_link: retry 3
    [    3.661224] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.661224] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000     <<< If dw_pcie_wait_for_link() return now
    [    3.662261] dw_pcie_wait_for_link: retry 4
    [    3.662261] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.662261] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000     <<< we will get omap_l3_noc errors
    [    3.663360] dw_pcie_wait_for_link: retry 5
    [    3.663360] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.663360] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000     <<< omap_l3_noc 44000000.ocp: L3 application error: target 5 mod:1 (unclearable)
    [    3.664428] dw_pcie_wait_for_link: retry 6
    [    3.664428] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.664428] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000     <<< omap_l3_noc 44000000.ocp: L3 debug error: target 5 mod:1 (unclearable)
    [    3.665466] dw_pcie_wait_for_link: retry 7
    [    3.665466] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.665496] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.666534] dw_pcie_wait_for_link: retry 8
    [    3.666534] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.666534] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.667602] dw_pcie_wait_for_link: retry 9
    [    3.667602] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.667602] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.668640] dw_pcie_wait_for_link: retry 10
    [    3.668640] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.668670] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.669708] dw_pcie_wait_for_link: retry 11
    [    3.669708] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.669708] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.670776] dw_pcie_wait_for_link: retry 12
    [    3.670776] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.670776] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.671813] dw_pcie_wait_for_link: retry 13
    [    3.671844] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.671844] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.672882] dw_pcie_wait_for_link: retry 14
    [    3.672882] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.672882] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.673919] dw_pcie_wait_for_link: retry 15
    [    3.673950] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.673950] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.674987] dw_pcie_wait_for_link: retry 16
    [    3.674987] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.674987] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.676025] dw_pcie_wait_for_link: retry 17
    [    3.676055] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.676055] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.677093] dw_pcie_wait_for_link: retry 18
    [    3.677093] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.677093] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.678161] dw_pcie_wait_for_link: retry 19
    [    3.678161] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.678161] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.679199] dw_pcie_wait_for_link: retry 20
    [    3.679229] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.679229] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.680267] dw_pcie_wait_for_link: retry 21
    [    3.680267] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.680267] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.681335] dw_pcie_wait_for_link: retry 22
    [    3.681335] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.681335] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.682373] dw_pcie_wait_for_link: retry 23
    [    3.682403] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x39
    [    3.682403] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x10000
    [    3.683471] dw_pcie_wait_for_link: retry 24
    [    3.683471] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0   <<< here the PCIe controller reset
    [    3.683471] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0       <<< the link goes down :-( BUG

    Game over :-(

    From now on, dra7xx_pcie_link_up() will never return true, and the driver will reach LINK_WAIT_MAX_RETRIES and fail with
    "Phy link never came up".

    We have to do something similar to the keystone PCI DWC implementation:
    static int ks_pcie_link_up(struct dw_pcie *pci)
    {
        u32 val;

        val = dw_pcie_readl_dbi(pci, PCIE_PORT_DEBUG0);
        val &= PORT_LOGIC_LTSSM_STATE_MASK;
        return (val == PORT_LOGIC_LTSSM_STATE_L0);
    }

    We have to wait for the LTSSM state L0 instead of checking for the link-up status.

    Note: The LTSSM state machine reaches the L0 state before going into the Recovery state.

    git.kernel.org/.../pci-keystone.c
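
    Applied to the dra7xx registers, that idea could look like the sketch below. It is a standalone illustration, not a tested driver patch: the two functions take the raw register values as arguments instead of reading hardware, LINK_UP is taken as bit 16 because PHY_CS reads 0x10000 in the logs, and the LTSSM field position (bits [7:2] of CONF_DEVICE_CMD) is an assumption derived from the values in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINK_UP          (1u << 16)  /* PHY_CS reads 0x10000 in the logs */
#define LTSSM_STATE(cmd) (((cmd) >> 2) & 0x3Fu)  /* assumed field position */
#define LTSSM_STATE_L0   0x11u

/* Check that looks at the PHY_CS LINK_UP bit only. */
static bool link_up_phy_only(uint32_t device_cmd, uint32_t phy_cs)
{
    (void)device_cmd;
    return (phy_cs & LINK_UP) != 0;
}

/* Stricter check: LINK_UP must be set and the LTSSM must be in L0. */
static bool link_up_strict(uint32_t device_cmd, uint32_t phy_cs)
{
    return (phy_cs & LINK_UP) != 0 &&
           LTSSM_STATE(device_cmd) == LTSSM_STATE_L0;
}

/* Register pairs copied from the traces in this thread. */
static void check_thread_values(void)
{
    /* Stuck in RCVRY_SPEED: LINK_UP is set but the link is not usable. */
    assert(link_up_phy_only(0x39, 0x10000));
    assert(!link_up_strict(0x39, 0x10000));
    /* Good case: L0 with LINK_UP set. */
    assert(link_up_strict(0x45, 0x10000));
}
```

    Because the LTSSM can pass through L0 before entering Recovery, a single L0 reading may still be too optimistic; sampling the state more than once would be a further refinement.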

    [    3.762664] dra7-pcie 51800000.pcie: Phy link never came up
    [    3.762817] dra7-pcie 51800000.pcie: PCI host bridge to bus 0001:00
    [    3.762817] pci_bus 0001:00: root bus resource [bus 00-ff]
    [    3.762817] pci_bus 0001:00: root bus resource [io  0x0000-0xffff]
    [    3.762847] pci_bus 0001:00: root bus resource [mem 0x30013000-0x3fffffff]
    [    3.762878] pci 0001:00:00.0: [104c:8888] type 01 class 0x000000
    [    3.762908] pci 0001:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff pref]
    [    3.762908] pci 0001:00:00.0: reg 0x14: [mem 0x00000000-0x0000ffff pref]
    [    3.763000] pci 0001:00:00.0: supports D1
    [    3.763000] pci 0001:00:00.0: PME# supported from D0 D1 D3hot
    [    3.771575] PCI: bus0: Fast back to back transfers disabled
    [    3.771575] pci 0001:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring                      <<<< This is bad also
    [    3.771728] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_DEVICE_CMD 0x0
    [    3.771728] dra7xx_pcie_link_up read PCIECTRL_DRA7XX_CONF_PHY_CS 0x0

    Now the question is: Why are we stuck in the recovery state (0x0E: RCVRY_SPEED) ?

  • Romain, 

    Could you confirm if the LTSSM state cycles through values you recorded, or it is stable on the last one 0x45 (L0)?

    If it cycles, I am guessing the link failed to negotiate the link speed. If you happen to have a PCIe protocol analyzer, we should be able to see whether it is trying to link with the WIFI card at Gen 3 and, when that failed, did not drop back to Gen1 but instead went to Recovery -> Detect...

    Also, I am checking the previous notes and am not sure whether there was any outcome from the experiments below:

    >>1. Could you confirm you already checked that PCIe_SS1 is configured as 1xlane, in its PCIe_SS1_RC_CFG_DBICS2[MAX_LINK_WIDTH]

    >>2. Additionally, could you try to set the PCIe_SS2_RC_CFG_DBICS2[MAX_LINK_SPEEDS] to 0x1, Gen 1 only, and see if the Qualcomm card can link

    Sorry for the delays in response. I was not able to get to e2e. 

    Regards

    jian 

  • Romain, 

    Also, one of my colleagues asked whether you are using a common refclk architecture, as we have seen similar link issues solved by a common refclk.

    thanks
    Jian

  • Hello Jian,

    >>Could you confirm if the LTSSM state cycles through values you recorded, or it is stable on the last one 0x45 (L0)?

    The LTSSM state I recorded is 0 at the end, since the controller has reset. The LTSSM state 0x45 (L0) was expected.

    >>the link failed to negotiate link speed

    I agree with that: the link failed to negotiate the link speed.
    Sadly, I don't have a PCIe protocol analyzer, so I can't measure anything on the PCIe bus...

    >>1. Could you confirm you already checked that PCIe_SS1 is configured as 1xlane, in its PCIe_SS1_RC_CFG_DBICS2[MAX_LINK_WIDTH]

    Let me double-check. As far as I can tell, PCIe_SS1 should be configured as 1 lane.

    >>2. Additionally, could you try to set the PCIe_SS2_RC_CFG_DBICS2[MAX_LINK_SPEEDS] to 0x1, Gen 1 only, and see if the Qualcomm card can link

    As a workaround, I limited the PCIe speed from the kernel devicetree:

    /* PCIe lane 1 to Wifi module */
    &pcie2_rc {
        status = "okay";
        gpios = <&gpio4 0 GPIO_ACTIVE_HIGH>;

        /* Limit to PCIe Gen1 speed due to a "Link Training and Status State Machine" (LTSSM)
         * issue. The LTSSM may enter the RCVRY_SPEED state after failing to negotiate
         * PCIe Gen2 speed with the Wifi module. The RCVRY_SPEED state is followed by
         * a reset of the PCIe controller that is not correctly handled or detected by the
         * DWC dra7 PCIe driver.
         */
        max-link-speed = <1>;
    };

    Since then, the module has been detected correctly.
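
    For comparison, the register-level experiment suggested earlier (writing 0x1 to the MAX_LINK_SPEEDS or MAX_LINK_WIDTH field of LNK_CAP) amounts to the field update sketched below. How the kernel actually applies max-link-speed is not shown in this thread, so this is only an illustration of the bit manipulation, using the standard Link Capabilities layout (speed in bits [3:0], width in bits [9:4]). The function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define LNK_CAP_SPEED_MASK  0xFu
#define LNK_CAP_WIDTH_SHIFT 4
#define LNK_CAP_WIDTH_MASK  (0x3Fu << LNK_CAP_WIDTH_SHIFT)

/* Return lnk_cap with the maximum link speed field replaced by 'gen'. */
static uint32_t lnk_cap_set_max_speed(uint32_t lnk_cap, unsigned int gen)
{
    assert(gen >= 1 && gen <= LNK_CAP_SPEED_MASK);
    return (lnk_cap & ~LNK_CAP_SPEED_MASK) | gen;
}

/* Return lnk_cap with the maximum link width field replaced by 'lanes'. */
static uint32_t lnk_cap_set_max_width(uint32_t lnk_cap, unsigned int lanes)
{
    assert(lanes >= 1 && lanes <= 0x3Fu);
    return (lnk_cap & ~LNK_CAP_WIDTH_MASK) | (lanes << LNK_CAP_WIDTH_SHIFT);
}

/* Starting from the LNK_CAP value read earlier in the thread. */
static void check_thread_values(void)
{
    assert(lnk_cap_set_max_speed(0x00733C22u, 1) == 0x00733C21u);
    assert(lnk_cap_set_max_width(0x00733C22u, 1) == 0x00733C12u);
}
```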

    Best regards,
    Romain

  • >>Also one of my colleagues asked if you are using common refclk architecture, as we have seen similar link issues solved by common refclk.

    The refclk is provided by a PI6C557-05LE device.

    CLK0 of PI6C557-05LE is connected to LJCB_CLK of the AM5749
    CLK1 is connected to the PCIe device connected to PCIe_SS1
    CLK2 is connected to the PCIe device connected to PCIe_SS2
    CLK3 is unused.

    With the Qualcomm Wifi module moved to PCIe_SS1, the issue does not occur and the link speed is negotiated to Gen2.

    Best regards,
    Romain