PCIESS hangs?

Dennis McLeod

I am using a DM8148 evm and have PCIESS configured as RC. In the ti81xx_pcie_setup routine, as soon as the code enables link training (LTSSM), all subsequent reads from the PCIESS application registers cause the kernel to panic. The error message is:

"Unhandled fault: external abort on non-linefetch (0x1008) at e8820004"

This is triggered when the driver next tries to read the CMD_STATUS register, which happens in disable_bars() -> set_dbi_mode(). What happened in PCIESS to cause an inability to read ioremapped memory?

Thank you in advance!

Dennis

over 11 years ago

0 Pavel Botev over 11 years ago

TI__Guru**** 170625 points

Dennis,

Are you working with EZSDK 5.05.02.00 / PSP 04.04.00.01?

Dennis McLeod said:
I am using a DM8148 evm and have PCIESS configured as RC

Have you followed the wiki page below?

http://processors.wiki.ti.com/index.php/TI81XX_PSP_PCI_Express_Root_Complex_Driver_User_Guide

Dennis McLeod said:
What happened in PCIESS to cause inability to read ioremapped memory???

Does the patch below fix this?

http://arago-project.org/git/projects/?p=linux-omap3.git;a=commit;h=6369801405c5b10cf2d0837ad89b4a826e11615d

BR
Pavel

0 Dennis McLeod over 11 years ago in reply to Pavel Botev

Intellectual 925 points

Hi Pavel,

Thank you for your reply. I did follow the RC driver guide quite closely. I just applied the patch you referred to and there is no change in behavior.

I can provide any register dumps you think might help solve the problem

Another interesting piece of information is that we have a PLX 8605 downstream. If we hold the PLX in reset, the kernel boots fine. Once the kernel is booted (and with PLX still held in reset), we can do "devmem 0x51000004 32 0xa07" to initiate link training and there is no problem. If we then take the PLX out of reset and then initiate link training, the problem reappears. Any accesses to the area at 0x5100xxxx cause the "external abort" error to reappear.
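For reference, the 0xa07 value written by that devmem command can be unpacked as sketched below. This is only an illustration, not driver code. Bit 0 as the LTSSM enable agrees with the logs in this thread (CMD_STATUS reads a00 before and a01 after LTSSM_EN_VAL is set), and bits 11:8 are the idle/standby field. The names given to bits 1 and 2 are assumptions and should be checked against the TRM; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0: link training enable (consistent with a00 -> a01 in the logs). */
#define LTSSM_EN_VAL    (1u << 0)
/* Bits 1 and 2: ASSUMED to be the outbound/inbound translation enables. */
#define OB_XLAT_EN_VAL  (1u << 1)
#define IB_XLAT_EN_VAL  (1u << 2)
/* Bits 11:8: idle/standby mode field (reads 0xa in the logs). */
#define IDLE_STBY_SHIFT 8
#define IDLE_STBY_MASK  (0xfu << IDLE_STBY_SHIFT)

/* Hypothetical container for the decoded fields. */
struct cmd_status_fields {
    bool ltssm_en;
    bool ob_xlat_en;
    bool ib_xlat_en;
    unsigned idle_stby;
};

/* Split a raw CMD_STATUS value into the fields named above. */
static void decode_cmd_status(uint32_t val, struct cmd_status_fields *out)
{
    assert(out != NULL);
    out->ltssm_en   = (val & LTSSM_EN_VAL) != 0;
    out->ob_xlat_en = (val & OB_XLAT_EN_VAL) != 0;
    out->ib_xlat_en = (val & IB_XLAT_EN_VAL) != 0;
    out->idle_stby  = (val & IDLE_STBY_MASK) >> IDLE_STBY_SHIFT;
}
```

Decoding 0xa07 this way gives all three low bits set with the idle/standby field left at 0xa, so the devmem write changes only bits 2:0 relative to the a00 value.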

I would understand if the link training failed. But for it to suddenly have problems accessing memory, this is quite confusing.

0 Pavel Botev over 11 years ago in reply to Dennis McLeod

Dennis,

Can you also apply the patches below? Do they make any difference?

http://arago-project.org/git/projects/?p=linux-omap3.git;a=commit;h=7367129164936713aaa7e832fd4c22e3bc1c3a2a

http://arago-project.org/git/projects/?p=linux-omap3.git;a=commit;h=937f9325a14db8a584af933f9b9b8c51fa34573c

http://arago-project.org/git/projects/?p=linux-omap3.git;a=commit;h=d5352bda02e9af813632e0afa5bd25dcc997b086

http://arago-project.org/git/projects/?p=linux-omap3.git;a=commit;h=3e1bd8effac5332322e1dbe98e2c7535f20c0416

Regards,
Pavel

0 Dennis McLeod over 11 years ago in reply to Pavel Botev

The SERDES_STATUS register in the PCIESS application registers block is undocumented. Can anyone tell us what the bits represent?

I will apply the patches you listed and report back soon.

0 Dennis McLeod over 11 years ago in reply to Pavel Botev

No luck after applying the patches. The same behavior continues.

Just so it is clear, the issue happens instantly after the call to

writel(LTSSM_EN_VAL | readl(reg_virt + CMD_STATUS),
reg_virt + CMD_STATUS);

in ti81xx_pcie_setup(). Any call to readl or writel to the memory pointed to by "reg_virt" will cause the error.

If the writel above is allowed to execute during bootup, it results in a kernel halt due to the "Unhandled fault: external abort on non-linefetch (0x1008) at e8820004" and a reboot is required.

I built a kernel with that line commented out, and it boots fine. Once booted, I can initiate link training with

devmem2 0x51000004 32 0x0a07

Once I do this, any call to devmem2 0x5100xxxx causes the error to be printed to the screen (Unhandled fault: external abort on non-linefetch), but it does not result in a kernel halt.

What's also very interesting, is that if we use devmem2 to initiate link training and then wait a VERY long time (many many minutes later), we can again use devmem2 to query the PCIESS without getting the error.

Please, any input is greatly appreciated :)

Thank you again

0 Pavel Botev over 11 years ago in reply to Dennis McLeod

Dennis,

This is what we have in pcie-ti81xx.c

static int ti81xx_pcie_setup(int nr, struct pci_sys_data *sys)
{

......

    /* 16KB region is sufficient for reg(4KB) + configs(8KB) + IO(4KB) */
    reg_virt = (u32)ioremap_nocache(reg_phys, SZ_16K);

    if (!reg_virt) {
        pr_err(DRIVER_NAME ": PCIESS register memory remap failed\n");
        goto err_ioremap;
    }

    pr_info(DRIVER_NAME ": Register base mapped @0x%08x\n", (int)reg_virt);

......

    __raw_writel(DIR_SPD | __raw_readl(
            reg_virt + SPACE0_LOCAL_CFG_OFFSET + PL_GEN2),
            reg_virt + SPACE0_LOCAL_CFG_OFFSET + PL_GEN2);

......

    if (device_id)
        __raw_writew(device_id, reg_virt + SPACE0_LOCAL_CFG_OFFSET +
                PCI_DEVICE_ID);

......

    __raw_writel(LTSSM_EN_VAL | __raw_readl(reg_virt + CMD_STATUS),
            reg_virt + CMD_STATUS);

    /* 100ms */
    msleep(100);

    __raw_writew(PCI_CLASS_BRIDGE_PCI,
            reg_virt + SPACE0_LOCAL_CFG_OFFSET + PCI_CLASS_DEVICE);

.....

}

These are the messages I have on a successful boot:

ti81xx_pcie: Register base mapped @0xd7020000

This means that the PCIe register base address (physical address) 0x51000000 is mapped to the virtual address 0xD7020000. Do you have something similar?
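To make the mapping concrete, here is a small sketch that turns a fault address from an abort message back into an offset inside the 16KB window. It is an illustration only: the 4KB application register area at offset 0 and the 16KB total come from the driver comment above, while the ordering of the two config areas (local at 0x1000, remote at 0x2000) is an assumption. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCIESS_WINDOW_SIZE 0x4000u /* SZ_16K, as passed to ioremap_nocache() */

/* One enumerator per 4KB area, in window order. */
enum pciess_area {
    AREA_APP_REGS,   /* 0x0000-0x0fff: application registers (CMD_STATUS at 0x004) */
    AREA_LOCAL_CFG,  /* 0x1000-0x1fff: ASSUMED local (RC) config space */
    AREA_REMOTE_CFG, /* 0x2000-0x2fff: ASSUMED remote config space */
    AREA_IO,         /* 0x3000-0x3fff: IO */
    AREA_OUTSIDE     /* not inside the remapped window at all */
};

/*
 * Classify a faulting virtual address relative to the mapped base and
 * report its offset, which can be added to 0x51000000 to get the
 * physical address.
 */
static enum pciess_area classify_fault(uint32_t reg_virt, uint32_t fault_addr,
                                       uint32_t *offset)
{
    uint32_t off;

    assert(offset != NULL);
    /* Unsigned subtraction wraps to a large value if fault_addr < reg_virt. */
    off = fault_addr - reg_virt;
    *offset = off;
    if (off >= PCIESS_WINDOW_SIZE)
        return AREA_OUTSIDE;
    return (enum pciess_area)(off >> 12);
}
```

With a base of 0xe8820000, the fault address 0xe8820004 reported earlier in this thread resolves to offset 0x004 in the application register area, which is CMD_STATUS at physical address 0x51000004.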

Also, do you mean that you have several successful __raw_readl() and __raw_writel() calls before the crash?

Dennis McLeod said:

Just so it is clear, the issue happens instantly after the call to

writel(LTSSM_EN_VAL | readl(reg_virt + CMD_STATUS),
reg_virt + CMD_STATUS);

Does your boot flow crash at the line below?

__raw_writew(PCI_CLASS_BRIDGE_PCI,
reg_virt + SPACE0_LOCAL_CFG_OFFSET + PCI_CLASS_DEVICE);

BR
Pavel

0 Dennis McLeod over 11 years ago in reply to Pavel Botev

This means that the PCIe register base address (physical address) 0x51000000 is mapped to the virtual address 0xD7020000. Do you have something similar?

Yes, that line is printed for us also. However, the virtual address is different. It says "ti81xx_pcie: Register base mapped @0xe8820000". (Probably different because we're on a DM8148 and you're using a DM8168?)

Does your boot flow crash at the line below?

__raw_writew(PCI_CLASS_BRIDGE_PCI,
reg_virt + SPACE0_LOCAL_CFG_OFFSET + PCI_CLASS_DEVICE);

I am sorry, I was incorrect when I said that "any reads/writes to reg_virt window" cause the error. During boot up, the crash happens when application registers are accessed. So in the following code block, the crash happens inside the disable_bars() function. The first thing disable_bars() does is call set_dbi_mode(). The set_dbi_mode() function contains a readl(reg_virt + CMD_STATUS) that blows everything up.

writel(LTSSM_EN_VAL | readl(reg_virt + CMD_STATUS),
reg_virt + CMD_STATUS);

/* 100ms */
msleep(100);

/*
* Identify ourselves as 'Bridge' for enumeration purpose. This also
* avoids "Invalid class 0000 for header type 01" warnings from "lspci".
*
* If at all we want to restore the default class-subclass values, the
* best place would be after returning from pci_common_init ().
*/
writew(PCI_CLASS_BRIDGE_PCI,
reg_virt + SPACE0_LOCAL_CFG_OFFSET + PCI_CLASS_DEVICE);

/*
* Prevent the enumeration code from assigning resources to our BARs. We
* will set up them after the scan is complete.
*/
disable_bars(); // << ----- #### crash happens in here ####

Best Regards,

Dennis

0 Pavel Botev over 11 years ago in reply to Dennis McLeod

Dennis,

Dennis McLeod said:
Yes, that line is printed for us also. However, the virtual address is different. It says "ti81xx_pcie: Register base mapped @0xe8820000". (Probably different because we're on a DM8148 and you're using a DM8168?)

No, I am also using a DM8148 EVM, but with nothing attached to the PCIe. If you remove the PCIe device attached to the DM8148 EVM, do you still get the same virtual address (0xE8820000)?

Can you read the CMD_STATUS register right after the virtual address is set up?

/* 16KB region is sufficient for reg(4KB) + configs(8KB) + IO(4KB) */
reg_virt = (u32)ioremap_nocache(reg_phys, SZ_16K);

if (!reg_virt) {
    pr_err(DRIVER_NAME ": PCIESS register memory remap failed\n");
    goto err_ioremap;
}

pr_info(DRIVER_NAME ": Register base mapped @0x%08x\n", (int)reg_virt);

pcie_ck = clk_get(NULL, "pcie_ck");
if (IS_ERR(pcie_ck)) {
    pr_err(DRIVER_NAME ": Failed to get PCIESS clock\n");
    goto err_clkget;
}

if (clk_enable(pcie_ck))
    goto err_clken;

/* Are you able to read CMD_STATUS here, or does the flow crash as before? */
__raw_readl(reg_virt + CMD_STATUS);

BR
Pavel

0 Dennis McLeod over 11 years ago in reply to Pavel Botev

I placed several reads between the ioremap and the "initiate link training", as you requested. Like this:

if (clk_enable(pcie_ck))
goto err_clken;

printk("pcie: %s %d reading CMD_STATUS\n", __FILE__, __LINE__);
val32 = readl(reg_virt + CMD_STATUS);
printk("pcie: %s %d read CMD_STATUS, value = %x\n", __FILE__, __LINE__, val32);

/*
* TI81xx devices do not support h/w autonomous link up-training to GEN2
* from GEN1 in either EP/RC modes. The software needs to initiate speed
* change.
*/
writel(DIR_SPD | readl(
reg_virt + SPACE0_LOCAL_CFG_OFFSET + PL_GEN2),
reg_virt + SPACE0_LOCAL_CFG_OFFSET + PL_GEN2);

printk("pcie: %s %d reading CMD_STATUS\n", __FILE__, __LINE__);
val32 = readl(reg_virt + CMD_STATUS);
printk("pcie: %s %d read CMD_STATUS, value = %x\n", __FILE__, __LINE__, val32);

.....

printk("pcie: %s %d reading CMD_STATUS\n", __FILE__, __LINE__);
val32 = readl(reg_virt + CMD_STATUS);
printk("pcie: %s %d read CMD_STATUS, value = %x\n", __FILE__, __LINE__, val32);
/*
* Override the default device ID if required - TI81XX devices generally
* come up with ID 0x8888.
*/
if (device_id)
writew(device_id, reg_virt + SPACE0_LOCAL_CFG_OFFSET +
PCI_DEVICE_ID);

printk("pcie: %s %d reading CMD_STATUS\n", __FILE__, __LINE__);
val32 = readl(reg_virt + CMD_STATUS);
printk("pcie: %s %d read CMD_STATUS, value = %x\n", __FILE__, __LINE__, val32);
/*
* Initiate Link Training. We will delay for L0 as specified by
* standard, but will still proceed and return success irrespective of
* L0 status as this will be handled by explicit L0 state checks during
* enumeration.
*/
writel(LTSSM_EN_VAL | readl(reg_virt + CMD_STATUS),
reg_virt + CMD_STATUS);

/* 100ms */
msleep(100);

printk("pcie: %s %d reading CMD_STATUS\n", __FILE__, __LINE__);
val32 = readl(reg_virt + CMD_STATUS); // <--- #### causes crash ####
printk("pcie: %s %d read CMD_STATUS, value = %x\n", __FILE__, __LINE__, val32);

----------------------

Here is the output:

ti81xx_pcie: Invoking PCI BIOS...
ti81xx_pcie: Setting up Host Controller...
ti81xx_pcie: Register base mapped @0xe8820000
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 655 reading CMD_STATUS
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 657 read CMD_STATUS, value = a00
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 668 reading CMD_STATUS
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 670 read CMD_STATUS, value = a00
ti81xx_pcie: forcing link width - x1
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 703 reading CMD_STATUS
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 705 read CMD_STATUS, value = a00
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 714 reading CMD_STATUS
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 716 read CMD_STATUS, value = a00
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 729 reading CMD_STATUS
Unhandled fault: external abort on non-linefetch (0x1008) at 0xe8820004
Internal error: : 1008 [#1]
last sysfs file:
Modules linked in:
CPU: 0 Not tainted (2.6.37+ #2)
PC is at ti81xx_pcie_setup+0x388/0x6c0
LR is at release_console_sem+0x198/0x1ac
pc : [<c005ca3c>] lr : [<c006cf04>] psr: 60000013
sp : e783be40 ip : e783bd78 fp : e783be6c
r10: 00000000 r9 : 00000000 r8 : e786ac9c
r7 : e786ac40 r6 : c04c4610 r5 : e786ac80 r4 : c04c4610
r3 : e8820000 r2 : c049fe00 r1 : 000015ca r0 : 00000042
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel

0 Pavel Botev over 11 years ago in reply to Dennis McLeod

Dennis,

Dennis McLeod said:

/*
* Initiate Link Training. We will delay for L0 as specified by
* standard, but will still proceed and return success irrespective of
* L0 status as this will be handled by explicit L0 state checks during
* enumeration.
*/
writel(LTSSM_EN_VAL | readl(reg_virt + CMD_STATUS),
reg_virt + CMD_STATUS);

/* 100ms */
msleep(100);

printk("pcie: %s %d reading CMD_STATUS\n", __FILE__, __LINE__);
val32 = readl(reg_virt + CMD_STATUS); // <--- #### causes crash ####

Maybe this 100 ms delay is not enough for link training to complete. Can you check the value of DEBUG0[4:0] LTSSM_STATE right after the msleep(100) call? Does it show 0x11?
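One way to test that idea is to poll the state field with a time budget instead of sleeping for a fixed interval. The sketch below is written as plain C so it can be tried off-target. The field position DEBUG0[4:0] and the value 0x11 are the ones mentioned above; the callback types, the helper name and the simulated register are hypothetical, and in the driver the read would be a readl() of DEBUG0 and the delay an msleep(). It only helps if the register read itself completes without an abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LTSSM_STATE_MASK 0x1fu /* DEBUG0[4:0] */
#define LTSSM_STATE_L0   0x11u /* value expected once the link is up */

typedef uint32_t (*reg_read_fn)(void *ctx); /* hypothetical DEBUG0 accessor */
typedef void (*delay_fn)(unsigned ms);      /* hypothetical delay hook */

/*
 * Poll DEBUG0 every step_ms until LTSSM_STATE reads L0 or timeout_ms has
 * elapsed. Returns true if L0 was observed. delay_ms may be NULL when no
 * real waiting is wanted (for example in a simulation).
 */
static bool wait_for_l0(reg_read_fn read_debug0, void *ctx, delay_fn delay_ms,
                        unsigned step_ms, unsigned timeout_ms)
{
    unsigned waited = 0;

    assert(read_debug0 != NULL && step_ms > 0);
    for (;;) {
        if ((read_debug0(ctx) & LTSSM_STATE_MASK) == LTSSM_STATE_L0)
            return true;
        if (waited >= timeout_ms)
            return false;
        if (delay_ms != NULL)
            delay_ms(step_ms);
        waited += step_ms;
    }
}

/* Simulated link: reports L0 only after reads_until_l0 earlier reads. */
struct fake_link {
    unsigned reads_until_l0;
    unsigned reads;
};

static uint32_t fake_read_debug0(void *ctx)
{
    struct fake_link *link = ctx;

    link->reads++;
    /* 0x02 is an arbitrary stand-in for "some state other than L0". */
    return (link->reads > link->reads_until_l0) ? LTSSM_STATE_L0 : 0x02u;
}
```

Calling wait_for_l0(fake_read_debug0, &link, NULL, 10, 100) with a fake_link that comes up after three reads returns true on the fourth read; a link that never comes up makes it return false once the 100 ms budget is spent.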

BR
Pavel

0 Dennis McLeod over 11 years ago in reply to Pavel Botev

I think we are getting closer to a solution. I noticed that on the EVM, the u-boot "bootargs" contained parameters for mem=size@offs entries. I have never needed to do that for our board. Our board has 1GB of ddr, in contrast to the EVM's 2GB ddr. So just for the sake of trying, I added this to u-boot's bootargs:

mem=512M@0x80000000 mem=512M@0xa0000000

It booted! It detected the PLX bridge chip and assigned resources for all 4 ports.

But it only booted once! I cannot get it to boot again. Even with those mem= entries, it is back to behaving as it did before. So I think I have some memory allocation problem. The memory related bootargs parameters are mem=512M@0x80000000 mem=512M@0xa0000000 and vmalloc=256M

I did change the ti81xx_pcie_resources entry in devices.c for "pcie-inbound0" to the following:

{
/* Inbound memory window - DJM: EVM has 2GB ddr, we only have 1GB */
.name = "pcie-inbound0",
.start = PLAT_PHYS_OFFSET,
.end = PLAT_PHYS_OFFSET + SZ_1G - 1,
.flags = IORESOURCE_MEM,
},

Is there anyplace else I need to modify to reflect the smaller memory size?

Maybe the 4 different ports of the PLX switch are getting enumerated and take up too much memory? And that is what is causing the crash? I tried slightly smaller and slightly larger sizes for vmalloc, but it didn't make a difference.

BR,

Dennis

0 Pavel Botev over 11 years ago in reply to Dennis McLeod

Dennis,

Dennis McLeod said:
Our board has 1GB of ddr, in contrast to the EVM's 2GB ddr.

The DM8148 EVM has 1GB DDR3 memory.

Dennis McLeod said:
So I think I have some memory allocation problem.

Dennis McLeod said:
Maybe the 4 different ports of the PLX switch are getting enumerated and take up too much memory? And that is what is causing the crash?

Yes, I also suspect that the root cause for the crash is the PLX switch and the memory it requires.

BR
Pavel

0 Pavel Botev over 11 years ago in reply to Pavel Botev

One possible solution is to shrink some of the regions in the EZSDK 1GB memory map and give that memory to the Linux kernel (through the mem argument):

http://processors.wiki.ti.com/index.php/EZSDK_Memory_Map

Regards,
Pavel

0 Dennis McLeod over 11 years ago in reply to Pavel Botev

Hi Pavel,

I was just reviewing that page also, the EZSDK_Memory_Map. It appears my mem= arguments were incorrectly set. I was passing in the entire ddr range.

So as instructed on that wiki page, I set the bootargs as follows:

mem=364M@0x80000000 mem=320M@0x9fc00000 vmalloc=500M

The board booted again, without problem.
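As a sanity check on mem= values like these, the ranges they describe can be worked out with a short helper. This is a rough sketch under stated assumptions: it accepts only the size@base form with an optional K, M or G suffix (the kernel also accepts a bare size, which this parser rejects), and the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical description of one mem=<size>@<base> region. */
struct mem_region {
    uint64_t base;
    uint64_t size;
};

/*
 * Parse "<size>[K|M|G]@<base>", for example "364M@0x80000000".
 * Returns 0 on success and -1 on malformed input; *out is written
 * only on success.
 */
static int parse_mem_arg(const char *s, struct mem_region *out)
{
    char *end;
    unsigned long long size, base;

    assert(s != NULL && out != NULL);
    size = strtoull(s, &end, 0);
    if (end == s)
        return -1;
    switch (*end) {
    case 'G': case 'g':
        size <<= 10; /* fall through */
    case 'M': case 'm':
        size <<= 10; /* fall through */
    case 'K': case 'k':
        size <<= 10;
        end++;
        break;
    default:
        break;
    }
    if (*end != '@')
        return -1;
    s = end + 1;
    base = strtoull(s, &end, 0);
    if (end == s || *end != '\0')
        return -1;
    out->base = base;
    out->size = size;
    return 0;
}

/* First address past the end of the region. */
static uint64_t region_end(const struct mem_region *r)
{
    return r->base + r->size;
}

/* Nonzero if the two regions share at least one byte. */
static int regions_overlap(const struct mem_region *a,
                           const struct mem_region *b)
{
    return a->base < region_end(b) && b->base < region_end(a);
}
```

For the two arguments above this gives 0x80000000 up to 0x96c00000 and 0x9fc00000 up to 0xb3c00000, which do not overlap and leave the range in between outside the kernel's view.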

Here is part of the boot log:

ti81xx_pcie: forcing link width - x1
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 709 reading CMD_STATUS
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 711 read CMD_STATUS, value = a00
pcie: cmregs PCIE_CFG = 2
pcie: cmregs PCIE_PLLCFG0 = 70007017
pcie: cmregs PCIE_PLLCFG1 = 640010
pcie: cmregs PCIE_PLLCFG2 = 0
pcie: cmregs PCIE_PLLCFG3 = 4008e0
pcie: cmregs PCIE_PLLCFG4 = 609c
pcie: cmregs PCIE_PLLSTATUS = 88cd
pcie: cmregs PCIE_RXSTATUS = 0
pcie: cmregs PCIE_TXSTATUS = 0
pcie: cmregs SERDES_RFCK_CTL = 2
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 744 reading CMD_STATUS
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 746 read CMD_STATUS, value = a00
pcie: cmregs PCIE_CFG = 2
pcie: cmregs PCIE_PLLCFG0 = 70007017
pcie: cmregs PCIE_PLLCFG1 = 640010
pcie: cmregs PCIE_PLLCFG2 = 0
pcie: cmregs PCIE_PLLCFG3 = 4008e0
pcie: cmregs PCIE_PLLCFG4 = 609c
pcie: cmregs PCIE_PLLSTATUS = 88cd
pcie: cmregs PCIE_RXSTATUS = 0
pcie: cmregs PCIE_TXSTATUS = 0
pcie: cmregs SERDES_RFCK_CTL = 2
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 784 reading CMD_STATUS
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 786 read CMD_STATUS, value = a01
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 797 reading CMD_STATUS
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 799 read CMD_STATUS, value = a01
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 806 reading CMD_STATUS
pcie: arch/arm/mach-omap2/pcie-ti81xx.c 808 read CMD_STATUS, value = a01
ti81xx_pcie: Starting PCI scan...
PCI: bus0: Fast back to back transfers disabled
pci 0000:01:00.0: unsupported PM cap regs version (7)
PCI: bus1: Fast back to back transfers enabled
PCI: bus2: Fast back to back transfers enabled
ti81xx_pcie: PCI scan done.
pci 0000:00:00.0: BAR 8: assigned [mem 0x20000000-0x200fffff]
pci 0000:01:00.0: BAR 0: assigned [mem 0x20000000-0x20003fff]
pci 0000:01:00.0: BAR 0: error updating (0x20000000 != 0xffffffff)
pci 0000:01:00.0: BAR 0: set to [mem 0x20000000-0x20003fff] (PCI address [0x20000000-0x20003fff])
pci 0000:01:00.0: PCI bridge to [bus 02-02]
pci 0000:01:00.0: bridge window [io disabled]
pci 0000:01:00.0: bridge window [mem disabled]
pci 0000:01:00.0: bridge window [mem pref disabled]
pci 0000:00:00.0: PCI bridge to [bus 01-02]
pci 0000:00:00.0: bridge window [io disabled]
pci 0000:00:00.0: bridge window [mem 0x20000000-0x200fffff]
pci 0000:00:00.0: bridge window [mem pref disabled]
pci 0000:00:00.0: Refused to change power state, currently in D3
bio: create slab <bio-0> at 0

...

But you're not going to believe this...

It wouldn't boot a second time. When I cycled power, the pcie driver got to the same line and then crashed the same way.

pcie: arch/arm/mach-omap2/pcie-ti81xx.c 784 reading CMD_STATUS
Unhandled fault: external abort on non-linefetch (0x1008) at 0xd7020004

Why would it boot once, but never again?

0 Pavel Botev over 11 years ago in reply to Dennis McLeod

Dennis McLeod said:
It wouldn't boot a second time. When I cycled power, the pcie driver got to the same line and then crashed the same way.

When booting for the second time, have you verified that your boot arguments are the same as for the first (successful) boot? Is it possible that the boot args are wrong on the second boot?

Regards

Pavel

0 Dennis McLeod over 11 years ago in reply to Pavel Botev

I did verify bootargs, yes.

I find it so strange that altering the memory arguments can cause it to boot successfully, but only once.

I just got it to boot again, with the following bootargs:

[root@BWS5F:~]# lspci
00:00.0 Class 0604: 104c:b801
01:00.0 Class 0604: 10b5:8605
02:01.0 Class 0604: 10b5:8605
02:02.0 Class 0604: 10b5:8605
02:03.0 Class 0604: 10b5:8605
[root@BWS5F:~]# cat /proc/cmdline
console=ttyO0,115200n8 earlyprintk mem=364M@0x80000000 mem=93M@0x98000000 mem=320M@0x9fc00000 root=/dev/mtdblock3 rootfstype=jffs2 noinitrd ip=off vmalloc=500M

but as soon as I reboot or power cycle, it crashes with the unhandled fault. This is so confusing..

BR,

Dennis

0 Dennis McLeod over 11 years ago in reply to Dennis McLeod

note: we are not using hdmi/video/dsp at all.

0 Dennis McLeod over 11 years ago in reply to Dennis McLeod

Update:

I saw in ti81xx_pcie_setup there was a late call to hook_fault_code to register ti81xx_pcie_fault().

This happens at the end of ti81xx_pcie_setup() . I moved the registration of the fault handler to BEFORE the function tries to initiate link training (and where it was causing the crash).

printk("pcie: %s %d reading CMD_STATUS\n", __FILE__, __LINE__);
val32 = readl(reg_virt + CMD_STATUS);
printk("pcie: %s %d read CMD_STATUS, value = %x\n", __FILE__, __LINE__, val32);

printk("pcie: calling hook_fault_code just before LTSSM_EN\n");
hook_fault_code(8, ti81xx_pcie_fault, SIGBUS, 0,
"Precise External Abort on non-linefetch");

printk("pcie: %s %d reading CMD_STATUS\n", __FILE__, __LINE__);
val32 = readl(reg_virt + CMD_STATUS);
printk("pcie: %s %d read CMD_STATUS, value = %x\n", __FILE__, __LINE__, val32);

/*
* Initiate Link Training. We will delay for L0 as specified by
* standard, but will still proceed and return success irrespective of
* L0 status as this will be handled by explicit L0 state checks during
* enumeration.
*/
writel(LTSSM_EN_VAL | readl(reg_virt + CMD_STATUS),
reg_virt + CMD_STATUS);

/* 200ms (raised from the original 100ms) */
msleep(200);

printk("pcie: %s %d reading CMD_STATUS\n", __FILE__, __LINE__);
val32 = readl(reg_virt + CMD_STATUS);
printk("pcie: %s %d read CMD_STATUS, value = %x\n", __FILE__, __LINE__, val32);

....

/*
* PCIe access errors that result into OCP errors are caught by ARM as
* "External aborts" (Precise).
*/
// printk("pcie: hooking fault at normal location\n");
// hook_fault_code(8, ti81xx_pcie_fault, SIGBUS, 0,
// "Precise External Abort on non-linefetch");

The ti81xx_pcie_fault function gets called many many times after the LTSSM_EN bit is set. At least now it doesn't crash, but it obviously also doesn't enumerate any PCI devices.

The fact that this fault handler is already hooked in the driver suggests that the driver author(s) also ran into this problem. Was there ever an answer as to why the faults happen?

0 Pavel Botev over 11 years ago in reply to Dennis McLeod

Dennis McLeod said:
The SERDES_STATUS register in the PCIESS application registers block is undocumented. Can anyone tell us what the bits represent?

The 32 bits of this register are mapped to the 16-bit STS_TX and STS_RX buses of the PCIe PHY. These registers/buses are not for customer use, but are occasionally useful for debugging with the factory. The bits should be described as "Reserved" in the TRM:

SERDES_STATUS[31:16] STS_TX
SERDES_STATUS[15:0] STS_RX
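
If you log the register for a factory debug session, the two halves can be separated as sketched below. This only illustrates the split given above; the struct and function names are hypothetical and the meaning of the individual bits stays undocumented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two 16-bit PHY status buses. */
struct serdes_status {
    uint16_t sts_tx; /* SERDES_STATUS[31:16] */
    uint16_t sts_rx; /* SERDES_STATUS[15:0] */
};

/* Split a raw 32-bit SERDES_STATUS value into its TX and RX halves. */
static void split_serdes_status(uint32_t reg, struct serdes_status *out)
{
    assert(out != NULL);
    out->sts_tx = (uint16_t)(reg >> 16);
    out->sts_rx = (uint16_t)(reg & 0xffffu);
}
```

For example, a raw value of 0x12345678 splits into STS_TX = 0x1234 and STS_RX = 0x5678.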

Regards

0 Pavel Botev over 11 years ago in reply to Dennis McLeod

Dennis McLeod said:
note: we are not using hdmi/video/dsp at all.

You can try changing the EZSDK memory map to provide more memory to the Linux kernel. See the E2E threads below for more info:

http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/304274.aspx

http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/294217.aspx

Regards,
Pavel

0 Pavel Botev over 11 years ago in reply to Pavel Botev

Some notes:

Can you dump the DEBUG0 and DEBUG1 registers? addr: 0x51001728 and 0x5100172c. These registers show the status of the link during the training.

How are you connecting the EP? Have you checked the cable modification/clocking scheme wiki page http://processors.wiki.ti.com/index.php/DM816x_C6A816x_AM389x_PCIe_Clocking_Schemes (these schemes are applicable to DM814X too)?

A read access error results in an abort; this cannot be avoided.

The application must know that the region it tries to access is valid, or else be prepared for an abort.

http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/240799.aspx

http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/154734.aspx

http://e2e.ti.com/support/embedded/linux/f/354/t/141733.aspx

http://e2e.ti.com/support/embedded/linux/f/354/t/162993.aspx

BR
Pavel

0 Dennis McLeod over 11 years ago in reply to Pavel Botev

Hi Pavel,

I am reviewing everything you posted. Thank you.

Some questions about modifying the memory map:

1) according to the thread here http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/304274.aspx and here http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/294217.aspx , there was instruction on which elements from the memory map can be modified to 0MB size. My system has 1GB ddr, so the default map http://processors.wiki.ti.com/index.php/EZSDK_Memory_Map#Memory_Map_in_the_current_EZSDK_.28version_5.02_onwards.29 should be what my system uses currently. Our product does not use DSP or any video at all. Will it be ok if I make the following changes?

- CMEM = 0MB

- DSP_ALG_HEAP = 0MB

- IPC_SR_HOST_DSP = 0MB

- DSP_DATA = 0MB

- IPC_SR_MC_HDVICP2_HDVPSS = 0MB

- MC_HDVPSS_INT_HEAP_CACHED = 0MB

- MC_HDVICP2_INT_HEAP_CACHED = 0MB

and

- IPC_SR_FRAME_BUFFERS = 0MB

2) Once I do that, bootargs would be mem=477M mem=508M@0x9fc00000 ?

3) the "firmware loader" and mm_host_util, where do they burn the bin file? Is there embedded flash in the AM387x ? Is it a "one time burn" or does it need to happen every boot?

4) Some items in the memory map (mentioned above) are not present in the sources board-support/media-controller-utils_2_05_00_17/src. For example, you mentioned changing IPC_SR_HOST_DSP and DSP_DATA but they do not exist in the sources. How would I claim those regions?

Thanks Pavel, for all your help!

0 Pavel Botev over 11 years ago in reply to Dennis McLeod

Dennis,

Dennis McLeod said:
Our product does not use DSP or any video at all.

Does this mean you do not need the Cortex-M3, HDVICP2 and HDVPSS cores? If so, the easiest way is simply to remove the HDVICP2 and HDVPSS firmware auto-loading:

targetfs/etc/init.d/load-hd-firmware.sh

case "$1" in
    start)
#      echo "Loading HDVICP2 Firmware"
#      prcm_config_app s
#      modprobe syslink
#      until [[ -e /dev/syslinkipc_ProcMgr && -e /dev/syslinkipc_ClientNotifyMgr ]]
#      do
#          sleep 0.5
#      done
#      firmware_loader $HDVICP2_ID /usr/share/ti/ti-media-controller-utils/dm814x_hdvicp.xem3 start
#      echo "Loading HDVPSS Firmware"
#      firmware_loader $HDVPSS_ID /usr/share/ti/ti-media-controller-utils/dm814x_hdvpss.xem3 start
#      modprobe vpss sbufaddr=0xBFB00000 mode=hdmi:1080p-60 i2c_mode=1
#      modprobe ti81xxfb vram=0:24M,1:16M,2:6M
#      configure_lcd
#      modprobe ti81xxhdmi
#      modprobe tlc59108
      ;;

This removes the firmware load, so you can pass more memory to the Linux kernel.

Dennis McLeod said:

Will it be ok if I make the following changes?

- CMEM = 0MB

- DSP_ALG_HEAP = 0MB

- IPC_SR_HOST_DSP = 0MB

- DSP_DATA = 0MB

- IPC_SR_MC_HDVICP2_HDVPSS = 0MB

- MC_HDVPSS_INT_HEAP_CACHED = 0MB

- MC_HDVICP2_INT_HEAP_CACHED = 0MB

and

- IPC_SR_FRAME_BUFFERS = 0MB

I think this is OK.

Dennis McLeod said:
2) Once I do that, bootargs would be mem=477M mem=508M@0x9fc00000 ?

Correct.

Dennis McLeod said:
3) the "firmware loader" and mm_host_util, where do they burn the bin file? Is there embedded flash in the AM387x ? Is it a "one time burn" or does it need to happen every boot?

You should generate a new bin file:

~/ti-ezsdk_dm814x-evm_5_05_02_00$ make media-controller-utils

Then you should install the new bin file:

~/ti-ezsdk_dm814x-evm_5_05_02_00$ make media-controller-utils_install

The console messages should show you exactly where the new bin file is installed.

Regards

0 Pavel Botev over 11 years ago in reply to Dennis McLeod

For IPC_SR_HOST_DSP, it is used only when remote codec engine is used:

http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/p/160814/589538.aspx#589538

Regards,
Pavel

0 Pavel Botev over 11 years ago in reply to Pavel Botev

For DSP_DATA (0x99500000), it is used only when OMX and/or RPE is used:

ti-ezsdk_dm814x-evm_5_05_02_00/component-sources/omx_05_02_00_48/src/ti/omx/build/MemSegmentDefinition.xs

memory[2] = ["DSP",
{
          name: "DSP",
          base: 0x99500000,
          len: 0x00C00000,    //if you are planning to use OMX without DSP, you can change this to 0x0
          space: "code/data"
}];

ti-ezsdk_dm814x-evm_5_05_02_00/component-sources/rpe_1_00_01_13/examples/dm81xx/dspsubsys.xs

memory[2] = ["DSP",
{
          name: "DSP",
          base: 0x99500000,
          len: 0x00C00000,
          space: "code/data"
}];

Best regards,
Pavel

0 Dennis McLeod over 11 years ago in reply to Pavel Botev

Pavel Botev said:
Can you dump the DEBUG0 and DEBUG1 registers? addr: 0x51001728 and 0x5100172c. These registers show the status of the link during the training.

I can't, actually. After setting the LTSSM_EN bit, readl() will cause the abort.

I am going to pursue the memory map changes, but I am pretty sure it isn't the cause of the problem. The driver calls pci_common_init in the probe() function, then it soon enters the ti81xx_pcie_setup() function to set up the RC. The enumeration of downstream devices doesn't happen until ti81xx_pcie_scan, much later. The problem happens in ti81xx_pcie_setup, so I think that means it is not likely a problem with the downstream device or its BARs/resource allocation since we're not even getting past setting up the RC. Would you agree?

0 Dennis McLeod over 11 years ago in reply to Pavel Botev

I have more info that might help debug the problem. In the ti81xx_pcie_setup function, just before setting the LTSSM_EN bit, I set the smart idle and smart standby to "off". I also still have the hook_fault_code call moved up above there, so it doesn't abort and crash the kernel.

/*
* PCIe access errors that result into OCP errors are caught by ARM as
* "External aborts" (Precise).
*/
printk("pcie: early fault hook register\n");
hook_fault_code(8, ti81xx_pcie_fault, SIGBUS, 0,
"Precise External Abort on non-linefetch");

printk("pcie: removing smart idle caps\n");

val32 = __raw_readl(reg_virt + CMD_STATUS);
val32 = (val32 & ~(0x0f << 8)) | (0x5 << 8); // set bits 11:8 to 0101 (no idle, no standby)
__raw_writel(val32, reg_virt + CMD_STATUS);

/*
* Initiate Link Training. We will delay for L0 as specified by
* standard, but will still proceed and return success irrespective of
* L0 status as this will be handled by explicit L0 state checks during
* enumeration.
*/
__raw_writel(LTSSM_EN_VAL | __raw_readl(reg_virt + CMD_STATUS),
reg_virt + CMD_STATUS);
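
The read-modify-write on bits 11:8 can also be expressed with a small field helper, which makes the resulting values easy to check off-target. This is a sketch only: the field position and the 0x5 value come from the snippet above, LTSSM_EN_VAL as bit 0 matches the a00 to a01 change seen in the earlier logs, and the helper name is made up.

```c
#include <assert.h>
#include <stdint.h>

#define LTSSM_EN_VAL    (1u << 0) /* bit 0, per the a00 -> a01 log values */
#define IDLE_STBY_SHIFT 8         /* idle/standby field occupies bits 11:8 */
#define IDLE_STBY_WIDTH 4

/*
 * Return reg with the width-bit field at shift replaced by value.
 * Bits of value that do not fit in the field are discarded.
 */
static uint32_t update_field(uint32_t reg, unsigned shift, unsigned width,
                             uint32_t value)
{
    uint32_t mask;

    assert(width > 0 && width < 32 && shift + width <= 32);
    mask = ((1u << width) - 1u) << shift;
    return (reg & ~mask) | ((value << shift) & mask);
}
```

Starting from the a00 value seen in the logs, update_field(0xa00, IDLE_STBY_SHIFT, IDLE_STBY_WIDTH, 0x5) yields 0x500, and OR-ing in LTSSM_EN_VAL then gives 0x501.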

The kernel always boots now, and maybe 50% of the time it finds the downstream PLX switch. Nothing changes between reboots, but yet it is only half the time that it "works". Here's some info from the boot log from one of the "successful" boot attempts:

ti81xx_pcie: Register base mapped @0xd7020000
pcie: early fault hook register
pcie: removing smart idle caps
ti81xx_pcie: Starting PCI scan...
PM: Adding info for No Bus:pci0000:00
PM: Adding info for No Bus:0000:00
pci 0000:00:00.0: [104c:b801] type 1 class 0x000604
PCI: bus0: Fast back to back transfers disabled
pci 0000:01:00.0: [10b5:8605] type 1 class 0x000604
pcie: fault hook, addr = d702203c
PCI: bus1: Fast back to back transfers enabled
PCI: bus2: Fast back to back transfers enabled
PM: Adding info for pci:0000:00:00.0
PM: Adding info for pci:0000:01:00.0
PM: Adding info for No Bus:0000:02
PM: Adding info for No Bus:0000:01
pci 0000:00:00.0: BAR 8: assigned [mem 0x20000000-0x201fffff]
pci 0000:00:00.0: BAR 9: assigned [mem 0x20200000-0x203fffff pref]
pci 0000:00:00.0: BAR 7: can't assign io (size 0x1000)
pci 0000:01:00.0: BAR 8: assigned [mem 0x20000000-0x201fffff]
pci 0000:01:00.0: BAR 9: assigned [mem 0x20200000-0x203fffff pref]
pci 0000:01:00.0: BAR 7: can't assign io (size 0x1000)
pci 0000:01:00.0: PCI bridge to [bus 02-02]
pci 0000:01:00.0: bridge window [io disabled]
pci 0000:01:00.0: bridge window [mem 0x20000000-0x201fffff]
pci 0000:01:00.0: bridge window [mem 0x20200000-0x203fffff pref]
pci 0000:00:00.0: PCI bridge to [bus 01-02]
pci 0000:00:00.0: bridge window [io disabled]
pci 0000:00:00.0: bridge window [mem 0x20000000-0x201fffff]
pci 0000:00:00.0: bridge window [mem 0x20200000-0x203fffff pref]
pci 0000:00:00.0: Refused to change power state, currently in D3
pci_bus 0000:00: resource 0 [mem 0x20000000-0x2fffffff]
pci_bus 0000:00: resource 1 [io 0x40000000-0x402fffff]
pci_bus 0000:01: resource 1 [mem 0x20000000-0x201fffff]
pci_bus 0000:01: resource 2 [mem 0x20200000-0x203fffff pref]
pci_bus 0000:02: resource 1 [mem 0x20000000-0x201fffff]
pci_bus 0000:02: resource 2 [mem 0x20200000-0x203fffff pref]
pci 0000:00:00.0: BAR 7: can't assign io (size 0x1000)
pci 0000:01:00.0: BAR 7: can't assign io (size 0x1000)
PM: Adding info for No Bus:default

lspci does show both devices. Oddly enough, even though this looks relatively successful, I still cannot run "devmem2 0x51000004" without it printing this message:

Unhandled fault: Precise External Abort on non-linefetch (0x1018) at 0x40077004
Bus error

Any suggestions on what to look at next? Or what could be going wrong?

BR,

Dennis

0 Dennis McLeod over 11 years ago in reply to Dennis McLeod

Intellectual 925 points

After adjusting the pre-emphasis settings on both ends of the link (between am3874 and PLX 8605), we have been able to achieve link-up most of the time. It is still not reliable, but it's better than it ever has been.

I still have that fault-hook function registered early, which keeps the kernel from crashing.

I also created a script to run when the PCIESS locks up. It resets the PCIESS and thereafter restores access to the 0x5100xxxx window:

[root@am3874:~]# cat /bin/resetpci
#!/bin/sh

devmem 0x48180b10 32 0x09c
sleep 1

devmem 0x48180578 32 0
devmem 0x48180510 32 0
sleep 1

devmem 0x48180510 32 0x02
devmem 0x48180578 32 0x02
sleep 1

devmem 0x48180b10 32 0x001c

echo Done

I am curious about the memory allotted during PCIe enumeration, though. The AM3874 has an embedded PCI-to-PCI bridge whose BAR 8 is:

cat /sys/bus/pci/devices/0000\:00\:00.0/resource
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000020000000 0x00000000200fffff 0x0000000000000200
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000

According to the L3 memory map, PCIe has a window 256 MB in size, from 0x20000000 to 0x2FFFFFFF. Why is the RC only using 0x20000000 to 0x200FFFFF (1 MB)?

Although I don't have any endpoints, PCI enumeration sees the 4 downstream ports of the PLX switch:

cat /sys/bus/pci/devices/0000\:01\:00.0/resource
0x0000000020000000 0x0000000020003FFF 0x0000000000040200
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000

So from that initial 1 MB, the first enumerated device gets a 16 KB chunk. It seems strange that the whole available memory window isn't used for the pool.
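
To double-check the sizes quoted from the sysfs resource files, here is a small sketch that computes the length of an inclusive start/end range. The helper name resource_size_bytes is made up for illustration; the only assumption is that each resource line is "start end flags" with an inclusive end address, which is how the dumps above read.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes of an inclusive [start, end] range
 * as printed in a sysfs `resource` file. An all-zero entry is treated as
 * unused and reports a size of 0.
 */
static uint64_t resource_size_bytes(uint64_t start, uint64_t end)
{
    if (start == 0 && end == 0)
        return 0;
    assert(end >= start);
    return end - start + 1;
}
```

With the values from this post, 0x20000000 to 0x200FFFFF works out to 0x100000 bytes (1 MiB), 0x20000000 to 0x2FFFFFFF to 0x10000000 bytes (256 MiB), and 0x20000000 to 0x20003FFF to 0x4000 bytes (16 KiB).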
