PROCESSOR-SDK-K2L: Another PCIe bug in Processor SDK

Part Number: PROCESSOR-SDK-K2L

Hi All!

I've found another bug in the PCIe driver from the Linux Processor SDK (it is present in the latest SDK and in TI git master). This bug affects the initial link training sequence.

The following snippet is from drivers/pci/dwc/pci-keystone.c:

static void ks_pcie_stop_link(struct dw_pcie *pci)
{
	struct keystone_pcie *ks_pcie = to_keystone_pcie(pci);
	u32 val;

	/* Disable Link training */
	val = ks_pcie_app_readl(ks_pcie, CMD_STATUS);
	val &= ~LTSSM_EN_VAL;
	ks_pcie_app_writel(ks_pcie, CMD_STATUS, LTSSM_EN_VAL | val);
}

static int ks_pcie_start_link(struct dw_pcie *pci)
{
	struct keystone_pcie *ks_pcie = to_keystone_pcie(pci);
	struct device *dev = pci->dev;
	u32 val;

	if (dw_pcie_link_up(pci)) {
		dev_dbg(dev, "link is already up\n");
		return 0;
	}

	/* Initiate Link Training */
	val = ks_pcie_app_readl(ks_pcie, CMD_STATUS);
	ks_pcie_app_writel(ks_pcie, CMD_STATUS, LTSSM_EN_VAL | val);

	return 0;
}

As you can see, ks_pcie_stop_link clears LTSSM_EN_VAL in the local variable but then ORs it back in when writing CMD_STATUS, so the bit is never cleared in the register. As a result, link training is initiated only once, during kernel boot.

If the link is unstable during early boot, it will never be established. I found this behavior during low-temperature testing.
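
A minimal sketch of the fix, not the actual patch: the register is mocked with a plain variable so the logic can be compiled outside the kernel. The helpers `app_readl` and `app_writel`, the variable `cmd_status`, and the bit position of `LTSSM_EN_VAL` are assumptions for illustration; only the read-modify-write pattern mirrors the driver code quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define LTSSM_EN_VAL (1u << 0)

/* Hypothetical mock of the CMD_STATUS application register. */
static uint32_t cmd_status;

static uint32_t app_readl(void) { return cmd_status; }
static void app_writel(uint32_t val) { cmd_status = val; }

/* Stop link training: clear LTSSM_EN_VAL and write back the cleared value. */
static void stop_link(void)
{
	uint32_t val = app_readl();

	val &= ~LTSSM_EN_VAL;
	/* The quoted driver code writes (LTSSM_EN_VAL | val) here,
	 * which sets the bit again. */
	app_writel(val);
	assert((app_readl() & LTSSM_EN_VAL) == 0);
}

/* Start link training: set LTSSM_EN_VAL, preserving the other bits. */
static void start_link(void)
{
	app_writel(app_readl() | LTSSM_EN_VAL);
}
```

With this change, a stop followed by a start toggles the enable bit in the register, so link training can be requested again after a failed attempt.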

Best Regards,

Yurii

PS. Unfortunately, the previously reported bug was not fixed in the latest SDK.

  • Hi,

    K2L devices are NRND and have limited support. This is why bug fixes may be delayed.

    Best Regards,
    Yordan
      

  • Actually, there are no plans to fix any bugs in the K2L PSDK, as the device has limited support.

    I will ask a colleague to check whether we can confirm this as a bug on our side.

    Given that you are seeing issues at low temperature, do you suspect a hardware problem or a software problem?

    Regards

    Mukul

  • Mukul Bhatnagar said:
    Actually, there are no plans to fix any bugs in the K2L PSDK, as the device has limited support.

    I thought that peripheral drivers were shared between KS2 devices. These two bugs affect all devices that share the same drivers. Are there any K2L-specific drivers in the kernel and the K2L PSDK?

    Mukul Bhatnagar said:
    Given that you are seeing issues at low temperature, do you suspect a hardware problem or a software problem?

    After fixing the link training bug described above, we have no issues. The link comes up every time the processor starts correctly.

    We had some problems with U-Boot loading, but they were fixed by lowering the SPI frequency.

    Now we face a very rare OS hang after the link comes up: the FPGA side reports the link as active and it stays up, but the CPU is locked up, with no output on the ttys or over Ethernet. The same behavior is observed when reading PCIe data memory while the link is inactive. When the link is down, calling

    devmem 0x50000000 32

    causes the device to hang. I don't know whether it is a hardware or a software problem. Is it safe to read/write the PCIe data region without a link?

    Best Regards,
    Yurii

  • Hi,

    Thanks for reporting this issue! The issue applies to all Keystone II devices. We opened a ticket to track this and it will be assessed and fixed in future Linux releases. Meanwhile, please keep using your workaround.

    "Is it safe to read/write PCIe data region without link?" No, you can't access the PCIE outbound region when the link is down.

    Regards, Eric

  • Hi Eric,

    lding said:
    And we opened a ticket to track this and it will be assessed


    Thank you. There is also the OB_WIN_SIZE issue mentioned in my first message.

    lding said:
    No, you can't access the PCIE outbound region when the link is down


    Can you confirm that accessing the PCIe outbound region when the link is down causes a CPU hang?

    Best Regards,
    Yurii

  • Yurii,

    What you mentioned in https://e2e.ti.com/support/processors/f/791/t/763933 for OB_WIN_SIZE is a valid bug. Sorry, it has been open for a while but was missed for some reason. This bug applies to all Keystone II devices as well. In the code, OB_WIN_SIZE can be 1, 2, 4, or 8 (in units of MB), encoded as 0, 1, 2, 3 in the OB_SIZE register; this is correct. When creating the memory mapping,

    start += OB_WIN_SIZE; is wrong, because it advances the address by that many bytes instead of megabytes. It should be what you suggested: start += (OB_WIN_SIZE << 20);
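
The effect of the shift can be checked with a small standalone sketch. The helper `ob_window_start` and the choice of an 8 MB window are hypothetical; the base address 0x50000000 is the PCIe data region address used earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Window size in MB; the thread lists 1, 2, 4 and 8 as valid values.
 * The value 8 is picked here only for illustration. */
#define OB_WIN_SIZE 8u

/* Hypothetical helper: start address of outbound window `index`. */
static uint64_t ob_window_start(uint64_t base, unsigned int index)
{
	/* Convert MB to bytes before stepping, i.e. OB_WIN_SIZE << 20. */
	uint64_t win_bytes = (uint64_t)OB_WIN_SIZE << 20;

	assert(OB_WIN_SIZE == 1 || OB_WIN_SIZE == 2 ||
	       OB_WIN_SIZE == 4 || OB_WIN_SIZE == 8);
	return base + index * win_bytes;
}
```

With an 8 MB window, consecutive windows start 0x800000 bytes apart, whereas stepping by OB_WIN_SIZE alone would place them only 8 bytes apart.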

    I also asked the Linux driver team to open a ticket and track it.

    "Do you confirm that accessing PCIe outbound region when the link is down causes CPU hangup?" Yes, that is my experience when accessing an outbound region without a PCIE link. It always hangs the CPU. (I used the PCIe RTOS driver or accessed the region directly through the CCS memory window. I am not on the Linux team, but the behavior is the same.)

    Regards, Eric

  • Hi, Yurii,

    We also opened a ticket to track the Keystone II PCIe memory space mapping issue. Thank you for the information. Both of your changes are valid.

    Though TI will fix these bugs in the next release, it will be nice if you can post the changes to the upstream too.

    Rex

  • Eric,

    lding said:
    Yes, that is my experience when accessing an outbound region without a PCIE link

    Thank you for the answer. We will consider adding some kind of watchdog on the FPGA side to handle this failure.

    Yurii

  • Hi, Rex,

    Rex Chang said:
    Though TI will fix these bugs in the next release


    Great to hear that. These fixes are the only modifications I've made to the kernel (besides a custom device tree and defconfig).

    Rex Chang said:
    it will be nice if you can post the changes to the upstream too


    I will, if I can find the time. I've never contributed to the kernel before. The code has been reorganized in recent kernels, so the patches will have to go into different files.

    Best Regards,
    Yurii

  • Yurii,

    It is an open-source kernel, so everyone can contribute. It is not a problem if posting upstream takes too much effort; we'll upstream the change after fixing it.

    Thanks!

    Rex