Linux/AM5728: PCI RC failing on non 32 bit aligned access to peripherals

Part Number: AM5728
Other Parts Discussed in Thread: TUSB7340

Tool/software: Linux

We are having a problem with accesses that are not 32-bit aligned to a PCI switch from our AM5728 RC. We have a PEX 8606 underneath the RC, and the switch's header type is not being read correctly. I initially discussed this in e2e.ti.com/.../597740.

A summary illustrating the problem is:

debian@arm:~$ lspci -s 0000:01:00.0 -x

0000:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)

00: b5 10 06 86 40 01 10 00 ba 00 04 06 10 00 01 00

10: 00 00 20 20 00 00 00 00 01 02 07 00 f1 01 00 00

20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00

30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 01 00

 

A byte access to the header type (offset 0x0e) returns zero:

 

debian@arm:~$ sudo setpci -s 0001:02:09.0 e.b

00

 

But a 32-bit access to the containing dword (offset 0x0c) shows the header type as 01:

 

debian@arm:~$ sudo setpci -s 0001:02:09.0 c.l

00010010

 

As a result, the PCI device probe fails as follows with the stock code:

 

[    1.850281] pci 0000:01:00.0: [10b5:8606] type 00 class 0x060400

[    1.850311] pci 0000:01:00.0: ignoring class 0x060400 (doesn't match header type 00)

 

We can modify the kernel driver code, but this operation is so fundamental that something else must be wrong.

Any ideas why the unaligned access is failing?

 

  • I've come up with a possible workaround, but it is a real hack in the PCI driver. I modified pci_bus_read_config_##size in KERNEL/drivers/pci/access.c to force 32-bit aligned accesses using the following macro:

    #define PCI_OP_READ(size, type, len) \
    int pci_bus_read_config_##size \
        (struct pci_bus *bus, unsigned int devfn, int pos, type *value) \
    { \
        int res; \
        unsigned long flags; \
        u32 data = 0; \
        if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
        raw_spin_lock_irqsave(&pci_lock, flags); \
        if (pos & 3) { \
            /* Unaligned: read the containing dword, then shift the field down */ \
            res = bus->ops->read(bus, devfn, pos & 0xfffffffc, 4, &data); \
            data >>= ((pos & 0x3) * 8); \
        } else { \
            /* Aligned: issue the access at its natural width */ \
            res = bus->ops->read(bus, devfn, pos, len, &data); \
        } \
        *value = (type)data; \
        raw_spin_unlock_irqrestore(&pci_lock, flags); \
        return res; \
    }
    This resolved the access problem, and the PCI probe now discovers the switch and the switch ports, but it causes a driver crash.
    Initially I thought it was due to the workaround, but when I investigated the fault I found references to the problem in a number of places. It looks like pci-dra7xx.c has the same bug. The error that occurs with the workaround is:
    [    1.863698] error: hwirq 0x4 is too large for dummy

    This comes up during dra7 initialization and leads to a driver crash with a stack trace in dmesg.

    From what I found, the cause is that PCI interrupts are numbered from 1, not zero, so the irq_domain setup has to use +1. I found a number of other PCI drivers where this problem had been encountered and corrected.

    The specific area in our pci-dra7xx.c code appears to have the same error, so I changed:

    dra7xx->irq_domain = irq_domain_add_linear(pcie_intc_node, 4,
                                               &intx_domain_ops, pp);
    if (!dra7xx->irq_domain) {
        dev_err(dev, "Failed to get a INTx IRQ domain\n");
        return PTR_ERR(dra7xx->irq_domain);
    }
    return 0;
     
    to:
     
    // PCI interrupt lines start at 1 not zero so need to add 1
    dra7xx->irq_domain = irq_domain_add_linear(pcie_intc_node, 4 + 1,
                                               &intx_domain_ops, pp);
    if (!dra7xx->irq_domain) {
        dev_err(dev, "Failed to get a INTx IRQ domain\n");
        return PTR_ERR(dra7xx->irq_domain);
    }
    return 0;
    }
    This change fixed the crash. However, it is a real hack in the PCI driver: it lets us continue development, but it is not a proper fix.
    Any suggestions on why this is occurring would be greatly appreciated.

  • The workaround lets the switch probe successfully; however, a byte read with setpci still fails:

    debian@arm:~$ lspci -s 0000:01:00.0 -x
    0000:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
    00: b5 10 06 86 40 01 10 00 ba 00 04 06 10 00 01 00
    10: 00 00 20 20 00 00 00 00 01 02 07 00 f1 01 00 00
    20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
    30: 00 00 00 00 40 00 00 00 00 00 00 00 d8 01 01 00
    debian@arm:~$ setpci -s 0000:01:00.0 c.l
    00010010
    debian@arm:~$ setpci -s 0000:01:00.0 e.b
    00
  • Hi Chris,

    Thanks for sharing the solution.

    Regards,
    Pavel
  • This problem isn't solved; it is only a hack workaround. Why do we have to hack the Linux PCI driver to do this?
  • As a further example of the continuing problems, we have a TI TUSB7340 PCI hub EP and it has the same kind of access problems:

    debian@arm:~$ sudo setpci -s 0001:07:00.0 e.b
    00
    debian@arm:~$ sudo setpci -s 0001:07:00.0 c.l
    00000010

    There is something wrong with this.
  • Hi Chris,

    Are you using a custom AM572x board? Are you using ti-processor-sdk-linux-am57xx-evm-03.03.00.04?

    Regards,
    Pavel
  • The board design we are using is based on the AM572x EVM.

    We are using the TI linux kernel:

    root@arm:~# uname -a
    Linux arm 4.4.49-ti-r89 #22 SMP Mon Jun 5 17:22:45 EDT 2017 armv7l GNU/Linux

    I've been using the TI AM572x IDK for PCI examples and device tree setup.
  • Chris Welch66 said:
    root@arm:~# uname -a
    Linux arm 4.4.49-ti-r89 #22 SMP Mon Jun 5 17:22:45 EDT 2017 armv7l GNU/Linux

    This kernel is not official and has not been tested. Please try the one that comes with the latest PSDK 3.03 (kernel 4.4.41):

    http://processors.wiki.ti.com/index.php/Processor_SDK_Linux_Kernel_Release_Notes

    Regards,
    Pavel

  • I'm working on that now and should have the results available later today.

  • I've reproduced the problem with the supported TI SDK, ti-processor-sdk-linux-am57xx-evm-03.03.00.04.

    To recap, we have an AM5728 board that is based on the AM5728 EVM design. We've split the PCI into two RCs of one lane each.

    There is a PEX 8606 PCI switch under each PCI RC.

    root@am57xx-evm:~# uname -a
    Linux am57xx-evm 4.4.41-gf9f6f0db2d #6 SMP PREEMPT Wed Jun 7 11:00:14 EDT 2017 armv7l GNU/Linux

    There is a hardware problem with the first PEX 8606 on this particular board so it doesn't show up under the first RC in this example.  

    root@am57xx-evm:~# lspci
    0000:00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
    0001:00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
    0001:01:00.0 Non-VGA unclassified device: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)

    As you can see, the listing doesn't show the ports on the second PEX 8606 which illustrates the problem.

    Here is the dmesg report highlighting the failure:

    [ 0.637531] dra7-pcie 51800000.pcie: PCI host bridge to bus 0001:00
    [ 0.637543] pci_bus 0001:00: root bus resource [bus 00-ff]
    [ 0.637555] pci_bus 0001:00: root bus resource [io 0x10000-0x1ffff] (bus address [0x0000-0xffff])
    [ 0.637564] pci_bus 0001:00: root bus resource [mem 0x30013000-0x3fffffff]
    [ 0.637594] pci 0001:00:00.0: [104c:8888] type 01 class 0x060400
    [ 0.637632] pci 0001:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff]
    [ 0.637653] pci 0001:00:00.0: reg 0x14: [mem 0x00000000-0x0000ffff]
    [ 0.637762] pci 0001:00:00.0: supports D1
    [ 0.637772] pci 0001:00:00.0: PME# supported from D0 D1 D3hot
    [ 0.638001] PCI: bus0: Fast back to back transfers disabled
    [ 0.638156] pci 0001:01:00.0: [10b5:8606] type 00 class 0x060400
    [ 0.638183] pci 0001:01:00.0: ignoring class 0x060400 (doesn't match header type 00)

    The header type is obtained using a byte access to the PEX 8606 header, which always returns zero because offset 0x0e is not aligned to a 32-bit address. The type is actually 1, but the only way to get the correct value is to use a 32-bit aligned access.

    This illustrates the problem using setpci:

    root@am57xx-evm:~# setpci -s 0001:01:00.0 e.b
    00

    However, a 32 bit access shows the correct header type:

    root@am57xx-evm:~# setpci -s 0001:01:00.0 c.l
    00010010

    Using the previously posted hack to the Linux PCI code to force aligned accesses, I can get some limited operation:

    root@am57xx-evm:~$ lspci
    0000:00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
    0001:00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
    0001:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
    0001:02:01.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
    0001:02:04.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
    0001:02:05.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
    0001:02:07.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
    0001:02:09.0 PCI bridge: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
    0001:07:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)

    Byte access to the switch still fails:

    root@am57xx-evm::~$ setpci -s 0001:01:00.0 e.b
    00
    root@am57xx-evm::~$ setpci -s 0001:01:00.0 c.l
    00010010

    However, the TUSB7340 endpoint doesn't have the problem:

    root@am57xx-evm:~$ lspci -s 0001:07:00.0 -x
    0001:07:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
    00: 4c 10 41 82 46 05 10 00 02 30 03 0c 10 00 00 00
    10: 04 00 20 30 00 00 00 00 04 00 21 30 00 00 00 00
    20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    30: 00 00 00 00 40 00 00 00 00 00 00 00 fa 01 00 00

    root@am57xx-evm:~$ setpci -s 0001:07:00.0 0.b
    4c
    root@am57xx-evm:~$ setpci -s 0001:07:00.0 1.b
    10
    root@am57xx-evm:~$ setpci -s 0001:07:00.0 2.b
    41
    root@am57xx-evm:~$ setpci -s 0001:07:00.0 3.b
    82
    root@am57xx-evm:~$ setpci -s 0001:07:00.0 4.b
    46

    Any suggestions would be appreciated.  Thanks.

  • Chris,

    Please check the AM572x silicon errata, specifically i870, PCIe Unaligned Read Access Issue:

    www.ti.com/.../sprz429k.pdf

    Regards,
    Pavel
  • Our understanding is that this erratum applies only to EP operation, not RC. Furthermore, there is already a workaround in pci-dra7xx.c (again applicable only to EPs):

    /*
    * dra7xx_pcie_ep_legacy_mode: workaround for AM572x/AM571x Errata i870
    * @dra7xx: the dra7xx device where the workaround should be applied
    *
    * Access to the PCIe slave port that are not 32-bit aligned will result
    * in incorrect mapping to TLP Address and Byte enable fields. Therefore,
    * byte and half-word accesses are not possible to byte offset 0x1, 0x2, or
    * 0x3.
    *
    * To avoid this issue set PCIE_SS1_AXI2OCP_LEGACY_MODE_ENABLE to 1.
    */

    The fix is activated by setting syscon-legacy-mode in the EP device tree settings, as shown in this TI presentation:

    www.linuxplumbersconf.org/.../ep framework.pdf

    This erratum is documented only for endpoint operation, and the TI configuration example for it is only in the context of an EP. Since we are using the PCI controllers as RCs, it doesn't appear to be the problem we are experiencing.

    Any other suggestions?

  • Chris,

    From what I understand, when using TI PSDK 3.03 you do not have the 8-bit access problem with the TUSB7340 EP; you have the problem only with the PEX 8606 EP. Is that correct? In that case, it seems the problem is not in the PCIe RC driver but in the PEX 8606 (third-party PCIe endpoint), so you might check with the PEX 8606 support team.

    Regards,
    Pavel
  • That is correct. However, we have a similar design based on an i.MX6 ARM processor, and it doesn't have this problem. The PEX 8606 and everything else can be accessed with unaligned 8-bit accesses without any driver modifications or issues.

    This implies that something in the TI ecosystem is potentially causing the problem.

  • For anyone else running into this, and to wrap up the issue: the root cause is i870. The erratum documents it only as an EP issue, doesn't state that the RC is involved, and the patches produced are only for EP usage, but it is also an issue with RC operation.

    I reworked the i870 patch so that it is activated for RC device tree configurations using the syscon-legacy-mode setting, syscon-legacy-mode = <&scm_conf1 0x14 3>; in the RC setup.

    This differs from the original TI code by setting both bits as documented in i870, whereas the original TI fix sets only the PCIE_SS1_AXI2OCP_LEGACY_MODE_ENABLE bit, because it uses syscon-legacy-mode = <&scm_conf1 0x14 2>;

    Here is another posting about this; hopefully the errata document will be updated:

    e2e.ti.com/.../2223582

    Another heads-up: the i870 patch doesn't appear to be in the 4.12 or 4.13 kernels yet.