PCIe issue on the C6A816x EVM board.

Hello,

The PCIe interface on my board is not working: it does not detect the device (an e1000e NIC that works well in another board).

Running lspci returns nothing, and I get the logs below. Any hints? Thanks a lot!

PS: I tried TI816XPSP_04.00.00.07 and TI816XPSP_04.00.00.08; both have the same problem.

ti816x_pcie: Invoking PCI BIOS...
ti816x_pcie: Setting up Host Controller...
ti816x_pcie: Register base mapped @0xf0808000
ti816x_pcie: Setting outbound translation for 0x20000000-0x2fffffff
ti816x_pcie: Starting PCI scan...
pci_bus 0000:00: scanning bus
ti816x_pcie: Reading config[0] for device 0000:00:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:01:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:02:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:03:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:04:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:05:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:06:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:07:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:08:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:09:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:0a:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:0b:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:0c:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:0d:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:0e:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:0f:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:10:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:11:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:12:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:13:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:14:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:15:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:16:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:17:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:18:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:19:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:1a:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:1b:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:1c:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:1d:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:1e:00..
failed. No link/device.
ti816x_pcie: Reading config[0] for device 0000:1f:00..
failed. No link/device.
pci_bus 0000:00: fixups for bus
PCI: bus0: Fast back to back transfers enabled
pci_bus 0000:00: bus scan returning with max=00
pci_bus 0000:00: resource 0 [mem 0x20000000-0x2fffffff]
pci_bus 0000:00: resource 1 [io  0x40000000-0x402fffff]

Best regards

Gary

  • Gary,

    Can you let me know the setting of the SW5-1 (PCIe RST) switch on the EVM? Please make sure it is set to OFF.

       Hemant

  • Sorry, I clicked "Verified Answer" by mistake.

    I tried setting SW5 1-4 in both directions, but the e1000e NIC is still not detected.

    BTW: which side is OFF? The one close to R195, or the other?

    Best regards,

    Gary

  • Gary,

    OFF is towards the PCIe slot, that is, the side farther from R195. The switch bank has "ON" printed on it (in small font) on the opposite side.

    Can you please provide the register value at the following address (keep SW5-1 set to OFF) after you get the Linux shell prompt?

    0x51001728

    (You can build and use the devmem2 utility to read this register.)

    Also please provide the details about the NIC card you are using, such as:

    1) Link width - x1 or x2?

    2) Speed - GEN1?

    3) Connector configuration - x1/x2?

       Hemant

  • Hi Hemant,

    I kept SW5 set to OFF, but reading the specified address produces an error.

    root@c6a816x-evm:~# devmem2

    Usage:  devmem2 { address } [ type [ data ] ]
            address : memory address to act upon
            type    : access operation type : [b]yte, [h]alfword, [w]ord
            data    : data to be written

    root@c6a816x-evm:~# devmem2 0x51001728 b
    /dev/mem opened.
    Unhandled fault: Precise External Abort on non-linefetch (0x1018) at 0x40020728
    Memory mapped at address 0x40020000.
    Bus error

    The NIC is an e1000e. Unfortunately I don't have detailed information about it, but it has a PCIe x1 interface with REVA00, and the 816x EVM board should have a PCIe x2 slot.

    Best regards,

    Gary

  • Sorry, after trying another kernel (built from the Arago tree, not the prebuilt image on the SD card), I got the logs below.

    The value at 0x51001728 keeps changing; it is not stable.

    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x7512
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0xA612
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x9712
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x3003911

  • Gary,

    0x11 is the value we want in the lower 5 bits, so 0x3003911 looks correct. Do the least significant 5 bits remain constant after the last read you provided (the other bits will keep changing)?

    If yes, can you try forcing re-enumeration using:

    echo 1 > /sys/bus/pci/rescan

    Then check the 'lspci' output to see whether the device is detected. If it is, try loading the e1000 driver.

     

    Thanks.

       Hemant
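
The check described in the reply above (look only at the lower 5 bits and expect 0x11) can be sketched as a small shell helper. This is only an illustration: the function names `ltssm_state` and `link_in_l0` are made up for this sketch, and the sample values are the ones already posted in this thread.

```shell
#!/bin/sh
# Extract the lower 5 bits of a raw register value, such as the values
# printed by devmem2 for 0x51001728 in this thread.
ltssm_state() {
    # $1: raw register value, e.g. 0x3003911
    printf '0x%02x\n' $(( $1 & 0x1f ))
}

# Succeeds only when the lower 5 bits equal 0x11, the value the reply
# above says is wanted.
link_in_l0() {
    [ "$(ltssm_state "$1")" = "0x11" ]
}

# Sample values copied from the devmem2 dumps posted earlier.
for v in 0x7512 0xA612 0x9712 0x3003911; do
    if link_in_l0 "$v"; then
        echo "$v -> $(ltssm_state "$v") (wanted value)"
    else
        echo "$v -> $(ltssm_state "$v") (not the wanted value)"
    fi
done
```

On the board, the raw value would come from the last field of the devmem2 "Value at address" line rather than from a hard-coded list.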

  • Hi Hemant,

    The value of the last five bits isn't constant: with the e1000e NIC in the slot it is always either 0x12 or 0x11, and most of the time it is 0x12.

    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x5912
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x9512
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x5F12
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0xBA12
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x4312
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x300EC11
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x3003811
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x3005B11
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x9212
    root@localhost:/root> ./devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x40003000.
    Value at address 0x51001728 (0x40003728): 0x5612

    If I pull out the e1000e NIC, the value is 0x00 or 0x01, and most of the time it is 0x00.

    What do these last bits mean? I can't find a detailed description in the RM. Thanks.

    Best regards,

    Gary
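
Since the state has to be judged over many reads, it may help to tally the lower 5 bits across a saved devmem2 dump instead of eyeballing each line. The sketch below assumes the output format shown in the post above ("Value at address ...: 0x...."); the function name `tally_ltssm` is hypothetical, and the sample lines are copied from that dump.

```shell
#!/bin/sh
# Read devmem2 output on stdin and print "<state> <count>" for each
# distinct value of the lower 5 bits.
tally_ltssm() {
    # Keep only the raw hex value from the "Value at address" lines.
    sed -n 's/^.*Value at address .*: \(0x[0-9A-Fa-f]*\)$/\1/p' |
    while read -r raw; do
        # Mask off everything except the lower 5 bits.
        printf '0x%02x\n' $(( $raw & 0x1f ))
    done | sort | uniq -c | awk '{ print $2, $1 }'
}

# Four of the samples posted above, including the non-matching lines
# devmem2 prints around each value.
tally_ltssm <<'EOF'
/dev/mem opened.
Memory mapped at address 0x40003000.
Value at address 0x51001728 (0x40003728): 0x5912
Value at address 0x51001728 (0x40003728): 0x300EC11
Value at address 0x51001728 (0x40003728): 0x3003811
Value at address 0x51001728 (0x40003728): 0x9212
EOF
```

On the board, something like `for i in 1 2 3 4 5; do ./devmem2 0x51001728; done | tally_ltssm` would feed live reads into the same function.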

     

  • Got it: the field is DEBUG0.LTSSM_STATE, and the value 0x11 indicates that the link is up.

    Now I suspect there is a hardware issue with the PCIe interface, because lspci worked once, showing both the PCI bridge and the e1000e, and at that time the register read 0x11 stably.

    That happened only once, however. After rebooting the board, DEBUG0.LTSSM_STATE went wrong again and PCIe stopped working. :(

  • If the syslink driver loads with the current ezsdk version, it turns off PCIe.

    See this thread for more info (the linked post in particular):

    http://e2e.ti.com:80/support/dsp/integra_dsparm/f/625/p/98131/350156.aspx#350156

    So, either disable loading of the video drivers, or apply the patch and see if that solves the problem.

    Best regards,

    B.J.

  • Gary,

    The link seems to go to the L0s state for some reason; ideally we want it to remain in the L0 state (0x11). It would help if you could provide details such as the card make.

    Since you mentioned that it worked fine at least once, can you try the following (avoid loading any kernel modules, even the e1000 driver)?

    1) Force link training once you get the Linux prompt. This can be done using the following sequence:

    devmem2 0x51000004 w 0xa00

    sleep 1

    devmem2 0x51000004 w 0xa07

    # Now read LTSSM state (dump a few times)

    devmem2 0x51001728

    2) If you don't get L0 with the above sequence, try the following sequence to reset the PCIe module and initiate the link:

    devmem2 0x48180b10 w 0xff

    sleep 1

    devmem2 0x48180b10 w 0x7f

    sleep 1

    devmem2 0x51000004 w 0xa07

    # Now read LTSSM state (dump a few times)

    devmem2 0x51001728

     

    Please note that in either case (whether or not the above works), we need to find details about your card, such as whether it uses the clock from the slot (most likely yes), and then see if we need to tune some timing so that it is detected at boot time.

    Thanks.

       Hemant
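
The two register sequences in the reply above are easy to mistype, so they can be wrapped in a script that prints the commands first. This is a sketch only: `DEVMEM`, `PAUSE`, and the function names are hypothetical knobs introduced here, while the addresses and values are exactly those quoted in the post. By default nothing touches hardware.

```shell
#!/bin/sh
# Dry run by default: each devmem2 command is printed, not executed.
DEVMEM=${DEVMEM:-echo devmem2}   # set DEVMEM=devmem2 (or ./devmem2) on the board
PAUSE=${PAUSE:-1}                # seconds to wait between writes

# Sequence 1 from the post: force link training.
retrain_link() {
    # $DEVMEM is unquoted on purpose so "echo devmem2" splits into words.
    $DEVMEM 0x51000004 w 0xa00
    sleep "$PAUSE"
    $DEVMEM 0x51000004 w 0xa07
}

# Sequence 2 from the post: reset the PCIe module, then initiate the link.
reset_and_start_link() {
    $DEVMEM 0x48180b10 w 0xff
    sleep "$PAUSE"
    $DEVMEM 0x48180b10 w 0x7f
    sleep "$PAUSE"
    $DEVMEM 0x51000004 w 0xa07
}

# Read the LTSSM register a few times, as the post suggests.
dump_ltssm() {
    for i in 1 2 3; do
        $DEVMEM 0x51001728
    done
}

# Show what sequence 1 followed by the register dump would run.
retrain_link
dump_ltssm
```

Running it with `DEVMEM=./devmem2` in the environment would issue the real writes; keeping the default lets the sequence be reviewed before it is applied.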

  • Hi B.J.,

    Thanks. I am using the image built from the Arago tree, not the prebuilt image, and I don't load the syslink module.

    Hi Hemant,

    Thanks for your patient advice.

    I tried your instructions, but unfortunately they failed. I suspect the issue is caused by the PCIe bridge: lspci should show the bridge even if no PCIe device is present, right? But the scan can't detect even the PCIe bridge, which should be device 0000:00:00.

    ti816x_pcie: Invoking PCI BIOS...
    ti816x_pcie: Setting up Host Controller...
    ti816x_pcie: Register base mapped @0xca808000
    ti816x_pcie: Setting outbound translation for 0x20000000-0x2fffffff
    ti816x_pcie: Starting PCI scan...
    pci_bus 0000:00: scanning bus
    ti816x_pcie: Reading config[0] for device 0000:00:00..
    failed. No link/device.

     

    Best regards,

    Gary

  • Gary,

    The PCIe configuration on C6A816x is skipped if no link is established with the peer during enumeration, which is what is happening in your case. Thus, lspci won't show any device (not even 0:0.0).

    Summary of the issue you are facing (correct me if I'm wrong):

    1) Generally, the link never reaches the L0 state and remains in the power-saving L0s state

    2) Even when the link does go to L0, it may immediately drop to L0s, making the device inaccessible

    3) On rare occasions, it can go to L0 and then the card would work fine

    Can you do the following?

    1) Try another PCIe card and see if it gets detected.

    2) Provide the card details, or let me know how many pins you see on the connector. Is it an x1 connector?

    Thanks.

       Hemant

  • Hi Hemant,

    Regarding "On rare occasions, it can go to L0 and then the card would work fine": no. Even when lspci shows the device, the e1000e NIC still doesn't work properly; there are errors when I try to get an IP address through dhclient.

    Following your instructions, I tried another PCIe NIC (a D-Link card that uses the sky2 driver), which has an x1 connector with 18 pins (the same as the e1000e NIC).

    The sky2 card is detected more often than the e1000e, but it also fails with dhclient. Please see the logs below.

    root@localhost:/root> lspci
    00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
    01:00.0 Ethernet controller: D-Link System Inc DGE-560T PCI Express Gigabit Ethernet Adapter (rev 13)
    root@localhost:/root> ifconfig -a
    eth0      Link encap:Ethernet  HWaddr 00:21:91:19:96:11
              BROADCAST MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:0 (0.0 b)  TX bytes:590 (590.0 b)
              Interrupt:48
    eth1      Link encap:Ethernet  HWaddr 64:7B:D4:11:43:D2
              inet addr:128.224.163.209  Bcast:128.224.163.255  Mask:255.255.254.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:13045 errors:0 dropped:0 overruns:0 frame:0
              TX packets:7400 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:12411899 (11.8 MiB)  TX bytes:1507212 (1.4 MiB)
              Interrupt:40 Base address:0x8000
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:8 errors:0 dropped:0 overruns:0 frame:0
              TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:560 (560.0 b)  TX bytes:560 (560.0 b)
    root@localhost:/root> dhclient eth0
    Unknown HZ value! (88) Assume 100.

    Best regards,

    Gary

  • I get lots of "receiver hang detected" messages when running "dhclient eth3". Please see the following:

    sky2 0000:01:00.0: eth3: enabling interface
    ADDRCONF(NETDEV_UP): eth3: link is not ready
    sky2 0000:01:00.0: eth3: Link is up at 100 Mbps, full duplex, flow control both
    ADDRCONF(NETDEV_CHANGE): eth3: link becomes ready
    sky2 0000:01:00.0: eth3: hung mac 0:103 fifo 103 (0:103)
    sky2 0000:01:00.0: eth3: receiver hang detected
    sky2 0000:01:00.0: eth3: Link is up at 100 Mbps, full duplex, flow control both
    sky2 0000:01:00.0: eth3: hung mac 0:70 fifo 194 (0:194)
    sky2 0000:01:00.0: eth3: receiver hang detected
    sky2 0000:01:00.0: eth3: Link is up at 100 Mbps, full duplex, flow control both
    sky2 0000:01:00.0: eth3: hung mac 0:40 fifo 40 (0:40)
    sky2 0000:01:00.0: eth3: receiver hang detected
    sky2 0000:01:00.0: eth3: Link is up at 100 Mbps, full duplex, flow control both
    sky2 0000:01:00.0: eth3: hung mac 0:31 fifo 31 (0:31)
    sky2 0000:01:00.0: eth3: receiver hang detected
    sky2 0000:01:00.0: eth3: Link is up at 100 Mbps, full duplex, flow control both
    sky2 0000:01:00.0: eth3: hung mac 0:20 fifo 20 (0:20)
    sky2 0000:01:00.0: eth3: receiver hang detected
    eth3: no IPv6 routers present
    sky2 0000:01:00.0: eth3: Link is up at 100 Mbps, full duplex, flow control both
    sky2 0000:01:00.0: eth3: hung mac 0:71 fifo 71 (0:71)
    sky2 0000:01:00.0: eth3: receiver hang detected

  • In fact, a receiver hang is already detected during kernel boot, as the following log shows:

    ti816x_pcie: Handling MSI irq 49
    sky2 0000:01:00.0: error interrupt status=0xc0000000
    sky2 0000:01:00.0: PCI hardware error (0x2010)
    sky2 0000:01:00.0: eth0: hung mac 0:53 fifo 53 (0:53)
    sky2 0000:01:00.0: eth0: receiver hang detected

    I also occasionally get the errors below when running dhclient:

    sky2 0000:01:00.0: error interrupt status=0x80000000
    sky2 0000:01:00.0: eth0: ram data read parity error
    sky2 0000:01:00.0: error interrupt status=0x80000000
    sky2 0000:01:00.0: eth0: ram data read parity error

     

  • It looks like the driver uses MSI interrupts by default, while the RC driver doesn't support them yet (up to the last PSP release).

    Try one of the following:

    1) Apply the patch from here, which enables MSI handling support, OR

    2) Disable MSI in the sky2 driver by passing disable_msi=1 as a module parameter when inserting the module

       Hemant

  • Any update?

    Thanks.

       Hemant

  • Sorry for the late reply; I have been tied up with other things these days.

    I have now applied the patch shown below.

    -bash-3.2$ git diff arch/arm/mach-omap2/pcie-ti816x.c

    diff --git a/arch/arm/mach-omap2/pcie-ti816x.c b/arch/arm/mach-omap2/pcie-ti816x.c
    index a9f556e..ed1ce16 100644
    --- a/arch/arm/mach-omap2/pcie-ti816x.c
    +++ b/arch/arm/mach-omap2/pcie-ti816x.c
    @@ -505,14 +505,15 @@ int arch_setup_msi_irq(struct pci_dev *pdev, struct msi_desc *desc)
                            msg.address_hi = 0;
                            msg.address_lo = reg_phys + MSI_IRQ;
     
    -                       pr_debug(DRIVER_NAME ": MSI %d @%#x:%#x\n",
    +                       pr_debug(DRIVER_NAME ": MSI %d @%#x:%#x, irq = %d\n",
                                            msg.data, msg.address_hi,
    -                                       msg.address_lo);
    +                                       msg.address_lo, irq);
     
                            write_msi_msg(irq, &msg);
     
                            set_irq_chip_and_handler(irq, &ti816x_msi_chip,
                                                    handle_level_irq);
    +                       set_irq_flags(irq, IRQF_VALID);
                    }
            }

    After that I inserted the sky2 module in two ways (with "disable_msi=1" and without the parameter), but still no luck: both cases produce the same logs.

    root@localhost:/root> insmod sky2.ko disable_msi=1 (insmod sky2.ko)
    PCI: enabling device 0000:01:00.0 (0140 -> 0142)
    root@localhost:/root> dmesg|grep sky
    sky2: driver version 1.27
    sky2 0000:01:00.0: enabling bus mastering
    sky2 0000:01:00.0: Yukon-2 EC chip revision 2
    sky2 0000:01:00.0: eth1: addr 00:21:91:19:96:11
    root@localhost:/root> dhclient eth1
    sky2 0000:01:00.0: error interrupt status=0x80000000
    sky2 0000:01:00.0: PCI hardware error (0x2010)

    root@localhost:/root> lspci
    00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
    01:00.0 Ethernet controller: D-Link System Inc DGE-560T PCI Express Gigabit Ethernet Adapter (rev 13)

    And this PCIe NIC still doesn't work.

    PS: due to other projects, I may not be able to respond promptly.

    Regards,

    Gary