AM5728: AM57 hang during PCIe shutdown

Part Number: AM5728

We're seeing frequent hangs during shutdown/reboot on our TQMa5728/MBa57xx board on ti-rt-linux-5.10.y (tag cicd.2022.11.02.13.24.41-rt), as well as an older branch based on ti-rt-linux-5.4.y. I have not tested other kernel versions.

On the 5.10 kernel, I managed to narrow the issue down to the shutdown of the PCIe controller and PHY:

device_shutdown()
    pci_device_shutdown()
        pcie_portdrv_remove()
            pcie_port_device_remove()
                pci_disable_device()
                    do_pci_disable_device()
                        pci_write_config_word() // A
    platform_drv_shutdown()
        dra7xx_pcie_shutdown()
            dra7xx_pcie_disable_phy()
                phy_power_off()
                    ti_pipe3_power_off()
                        regmap_update_bits() // B

Location B is where the hang actually happens. However, there seems to be some interaction between disabling the controller and powering off the PHY: adding a delay of 1 ms (or a synchronous printk to a serial console) anywhere between locations A and B makes the issue go away, or at least makes it unlikely enough that I haven't been able to observe it anymore, whereas it previously happened in roughly 50% of shutdowns. Adding the delay *before* location A doesn't have any effect.

Other information that might be relevant:

  • Our Device Tree enables both PCIe controllers. The hang can happen during the shutdown of either of them.
  • Nothing is connected to the two PCIe ports.
  • Our config enables PREEMPT_RT.
  • Matthias, 

    I suspect the hang is caused either by writing to a register in a block that has been turned off, or by outstanding transactions in the interconnect while the port is turning off.

    Can you add a check before the //A and //B lines to see which registers they are writing to? It seems that we first disable the PHY, then turn off power, then shut down the PIPE3 interface between the PHY and the controller. So the last regmap operation should be on the PCIe controller, not the PHY.

    On timing, as a quick test, can you add a CPU memory fence instruction after the //A line and see if the issue still happens?

    jian

  • Hi Jian,

    do_pci_disable_device() at //A clears the PCI_COMMAND_MASTER bit in the PCI_COMMAND register. I can see the following values passed to pci_generic_config_write(): where=4 size=2 val=0x0143

    ti_pipe3_power_off() at //B clears the PWRCTL_CMD bits in the CTRL_CORE_PHY_POWER_PCIESS register corresponding to the PHY.

    A few other things happen between //A and //B that I left out of the call tree in my post. For example, dra7xx_pcie_shutdown() calls dra7xx_pcie_stop_link() and pm_runtime_put_sync(): git.ti.com/gitweb

    I tried adding an mb() after //A, however it did not have any effect on the issue. A udelay(1000) at the same location does make the issue disappear.

  • Matthias, 

    Sorry for the delayed response. I was wrapping up some debugging and then took some time off during the holidays.

    The CTRL_CORE_PHY_POWER_PCIESS register is part of the SoC control module, and writing to the PWRCTL_CMD bits should not hang the CPU. Instead, I suspect that the write to the PCI_COMMAND register is issued as a non-posted transaction (to configuration space), which expects to receive a completion TLP from the EP. However, the SERDES buffer was shut down when you called:

        phy_power_off() -> ti_pipe3_power_off()

    Thus the CPU hangs, so a delay after //A is needed. The delay should be slightly larger than the programmed Completion Timeout value.

    Regards

    Jian