PROCESSOR-SDK-K2L: Another PCIe bug in Processor SDK

Yurii Monakov

Hi All!

I've found another bug in the PCIe driver from the Linux Processor SDK (it appears in the latest SDK and in TI git master). This bug affects the initial link training sequence.

This code snippet is from drivers/pci/dwc/pci-keystone.c file:

static void ks_pcie_stop_link(struct dw_pcie *pci)
{
	struct keystone_pcie *ks_pcie = to_keystone_pcie(pci);
	u32 val;

	/* Disable Link training */
	val = ks_pcie_app_readl(ks_pcie, CMD_STATUS);
	val &= ~LTSSM_EN_VAL;
	ks_pcie_app_writel(ks_pcie, CMD_STATUS, LTSSM_EN_VAL | val);
}

static int ks_pcie_start_link(struct dw_pcie *pci)
{
	struct keystone_pcie *ks_pcie = to_keystone_pcie(pci);
	struct device *dev = pci->dev;
	u32 val;

	if (dw_pcie_link_up(pci)) {
		dev_dbg(dev, "link is already up\n");
		return 0;
	}

	/* Initiate Link Training */
	val = ks_pcie_app_readl(ks_pcie, CMD_STATUS);
	ks_pcie_app_writel(ks_pcie, CMD_STATUS, LTSSM_EN_VAL | val);

	return 0;
}

As you can see, the ks_pcie_stop_link function never clears the LTSSM_EN_VAL bit: it masks the bit out of val, but then ORs LTSSM_EN_VAL back in when writing CMD_STATUS. So link training is initiated only once, during kernel boot.

If the link is unstable during early boot, it will never be established. I found this behavior during low-temperature testing.
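A minimal sketch of one possible fix, assuming the intent of ks_pcie_stop_link is simply to write back the masked value. The register is mocked with a plain variable so the example compiles on its own; mock_cmd_status, app_readl, app_writel, check_stop_link and the bit position chosen for LTSSM_EN_VAL are illustrative assumptions, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position, for illustration only; use the driver's definition. */
#define LTSSM_EN_VAL (1u << 0)

/* Hypothetical stand-in for the CMD_STATUS application register. */
static uint32_t mock_cmd_status;

static uint32_t app_readl(void) { return mock_cmd_status; }
static void app_writel(uint32_t val) { mock_cmd_status = val; }

/* As posted: the bit is masked out of val, then ORed back in on write. */
static void stop_link_as_posted(void)
{
	uint32_t val = app_readl();

	val &= ~LTSSM_EN_VAL;
	app_writel(LTSSM_EN_VAL | val);
}

/* Sketch of a fix: write back only the masked value. */
static void stop_link_fixed(void)
{
	uint32_t val = app_readl();

	val &= ~LTSSM_EN_VAL;
	app_writel(val);
}

/* Same read-modify-write as the start path in the post. */
static void start_link(void)
{
	app_writel(LTSSM_EN_VAL | app_readl());
}

/* Small self-check comparing the two stop variants. */
static void check_stop_link(void)
{
	mock_cmd_status = 0xA0;	/* arbitrary unrelated bits */
	start_link();
	stop_link_as_posted();
	assert(mock_cmd_status & LTSSM_EN_VAL);		/* bit still set */
	stop_link_fixed();
	assert(!(mock_cmd_status & LTSSM_EN_VAL));	/* bit cleared */
	assert(mock_cmd_status == 0xA0);		/* other bits kept */
}
```

With the bit actually cleared on stop, a later start can trigger a new training attempt instead of relying on the single attempt made at boot.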

Best Regards,

Yurii

PS. Unfortunately, the previously reported bug (the OB_WIN_SIZE issue) was not fixed in the latest SDK.

over 5 years ago

0 Yordan Kovachev over 5 years ago

TI__Guru**** 161600 points

Hi,

K2L devices are NRND (not recommended for new designs) and have limited support. This is why bug fixes may be delayed.

Best Regards,
Yordan

0 Mukul Bhatnagar over 5 years ago in reply to Yordan Kovachev

TI__Guru* 83705 points

Actually, there are no plans to fix any bugs in the K2L PSDK, as the device has limited support.

I will ask a colleague to check whether we can confirm this as a bug on our side.

Given that you are seeing issues at low temperature, do you suspect a hardware problem or a software problem?

Regards

Mukul

0 Yurii Monakov over 5 years ago in reply to Mukul Bhatnagar

Intellectual 660 points

Mukul Bhatnagar said:
Actually, there are no plans to fix any bugs in the K2L PSDK, as the device has limited support.

I thought that peripheral drivers are shared between KS2 devices. These two bugs affect all devices that share the same drivers. Are there any K2L-specific drivers in the kernel and the K2L PSDK?

Mukul Bhatnagar said:
Given that you are seeing issues at low temperature, do you suspect a hardware problem or a software problem?

After fixing the link training bug described above, we have no issues. The link comes up every time (every time the processor starts correctly).

We had some problems with U-Boot loading, but they were fixed by lowering the SPI frequency.

Now we face a very rare OS hang after the link comes up: the FPGA side reports the link as active and it stays up, but the CPU is locked up. There is no output on the ttys or on Ethernet. The same behavior is observed when trying to read PCIe data memory while the link is inactive. When the link is down, calling

devmem 0x50000000 32

causes the device to hang. I don't know whether it is a hardware or a software problem. Is it safe to read/write the PCIe data region without a link?

Best Regards,
Yurii

0 lding over 5 years ago in reply to Yurii Monakov

TI__Guru* 95265 points

Hi,

Thanks for reporting this issue! The issue applies to all Keystone II devices. We opened a ticket to track this, and it will be assessed and fixed in future Linux releases. In the meantime, please use your workaround.

"Is it safe to read/write the PCIe data region without a link?" No, you can't access the PCIe outbound region when the link is down.
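Since outbound accesses are not safe without a link, one option is to check the link state before every access. The following is only a sketch under stated assumptions: link_up_fn, ob_read32_guarded and mock_link_up are hypothetical names, and in the driver the callback could wrap dw_pcie_link_up. The link can still drop between the check and the read, so this narrows the risk window rather than closing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback that reports whether the PCIe link is up. */
typedef bool (*link_up_fn)(void *ctx);

/*
 * Read a 32-bit word from the outbound window only if the link is up.
 * Returns 0 on success, -1 if the access was skipped.
 * The link may still go down between the check and the read.
 */
static int ob_read32_guarded(link_up_fn link_up, void *ctx,
			     const volatile uint32_t *addr, uint32_t *out)
{
	if (!link_up(ctx))
		return -1;

	*out = *addr;
	return 0;
}

/* Mock link state for testing: ctx points to a bool. */
static bool mock_link_up(void *ctx)
{
	return *(bool *)ctx;
}
```

A caller would treat -1 as "link down, do not touch the window" and retry or report the error instead of issuing the read.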

Regards, Eric

0 Yurii Monakov over 5 years ago in reply to lding

Intellectual 660 points

Hi Eric,

lding said:
And we opened a ticket to track this and it will be assessed

Thank you. There is also another issue, with OB_WIN_SIZE, mentioned in my first message.

lding said:
No, you can't access the PCIE outbound region when the link is down

Can you confirm that accessing the PCIe outbound region when the link is down causes a CPU hang?

Best Regards,
Yurii

0 lding over 5 years ago in reply to Yurii Monakov

TI__Guru* 95265 points

Yurii,

What you mentioned in https://e2e.ti.com/support/processors/f/791/t/763933 for OB_WIN_SIZE is a valid bug. Sorry, that thread was open for a while but was missed for some reason. This bug applies to all Keystone II devices as well. In the code, OB_WIN_SIZE can be 1, 2, 4, or 8 (in units of MB), encoded as 0, 1, 2, or 3 in the OB_SIZE register; this part is correct. However, when creating the memory mapping,

start += OB_WIN_SIZE;

is wrong. It should be what you suggested:

start += (OB_WIN_SIZE << 20);
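The arithmetic can be illustrated with a small standalone sketch. The helper names ob_size_field and ob_window_start are hypothetical, and 0x50000000 is used only as an example base address taken from the devmem command earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a window size in MB (1, 2, 4 or 8) as the OB_SIZE field (0..3). */
static unsigned ob_size_field(unsigned size_mb)
{
	unsigned field = 0;

	/* log2 of the size in MB */
	while (size_mb > 1) {
		size_mb >>= 1;
		field++;
	}
	return field;
}

/* Start address of window i: advance by the window size in bytes. */
static uint64_t ob_window_start(uint64_t base, unsigned i, unsigned size_mb)
{
	/* size_mb << 20 converts MB to bytes (1 MB = 1 << 20 bytes) */
	return base + (uint64_t)i * ((uint64_t)size_mb << 20);
}

/* Small self-check of the encoding and the window stride. */
static void check_ob_windows(void)
{
	assert(ob_size_field(8) == 3);
	/* Second 8 MB window starts 0x800000 bytes after the first. */
	assert(ob_window_start(0x50000000u, 1, 8) == 0x50800000u);
}
```

Without the shift, consecutive 8 MB windows would start only 8 bytes apart, so they would overlap almost entirely instead of tiling the outbound region.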

I also asked the Linux driver team to open a ticket and track it.

"Can you confirm that accessing the PCIe outbound region when the link is down causes a CPU hang?" Yes, that is my experience when accessing an outbound region without a PCIe link. It always hangs the CPU. (I used the PCIe RTOS driver or the CCS memory window directly; I am not on the Linux team, but the behavior is the same.)

Regards, Eric

0 Rex Chang over 5 years ago in reply to lding

TI__Guru 50170 points

Hi, Yurii,

We have also opened a ticket to track the Keystone II PCIe memory space mapping issue. Thank you for your information. Both of your changes are valid.

Though TI will fix these bugs in the next release, it would be nice if you could post the changes upstream too.

Rex

0 Yurii Monakov over 5 years ago in reply to lding

Intellectual 660 points

Eric,

lding said:
Yes, that is my experience when accessing an outbound region without a PCIE link

Thank you for the answer. We will consider adding some kind of watchdog on the FPGA side to handle this failure.

Yurii

0 Yurii Monakov over 5 years ago in reply to Rex Chang

Intellectual 660 points

Hi, Rex,

Rex Chang said:
Though TI will fix these bugs in the next release

Great to hear that. These fixes are the only modifications I've made to the kernel (besides a custom device tree and defconfig).

Rex Chang said:
it would be nice if you could post the changes upstream too

I will, if I can find the time. I've never contributed to the kernel before. There has been some code reorganization in recent kernels, so the patches will apply to other files.

Best Regards,
Yurii

0 Rex Chang over 5 years ago in reply to Yurii Monakov

TI__Guru 50170 points

Yurii,

The kernel is open source, so everyone can contribute. It is not a problem if posting upstream takes too much effort; we'll upstream the change after fixing it.

Thanks!

Rex
