XIO2001: PC freeze

Other Parts Discussed in Thread: XIO2001

 

Hello,

 

we use the XIO2001 to build a PCIe card and connect the bridge to a PCI core. Using DMA we copy data from the card to the RAM of the PC. About 5 minutes after starting the operation, the PC freezes. No dump file is created. Our PCI traffic looks normal until that point.

 

System:

Windows 7 Embedded Standard 64 Bit

Intel Core i7 610 (Arrandale)

Chipset Ibex Peak (QM57), supports PCIe 2.0

 

We found 2 workarounds:

-Disabling PCIe ASPM completely in the energy options.

-Disabling PCIe ASPM for the XIO2001 in Windows registry

 

We tried several combinations of PCIe ASPM L0s/L1 activation and L0s/L1 latencies, but none had any effect. (According to the PCIe 2.0 spec, L0s may not be disabled.)

 

I saw the earlier post in the forum, "Does the XIO2000A work in PCIe 2.0 slots?"

It is mentioned there that a chipset driver update might fix some PCIe issue, but it didn't work for me.

 

Disabling ASPM is not a clean option.

 

  • Hello Christian,

     

    Is your system going into suspend before the error occurs?

    Can you share your schematics?

    Regards.

  • Check that the N_FTS_ASYNC_CLK and N_FTS_COMMON_CLK values in the Control and Diagnostic Register 2 are set to the correct value. I have seen ASPM fail if these values were incorrectly set to 0. I recommend that you use the default values specified in the datasheet.

    If that doesn't fix the problem, then you will need to provide more info concerning the problem. A PCIe analyzer dump at the time of failure would help a lot.

  • Thanks for the reply. Control and Diagnostic Register 2 has the default value 0x32142000.

     

    The relevant part of the schematics is attached.

    PCIe_Extract.pdf
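
A minimal sketch of how the value reported above could be checked against the "must not be 0" advice. The field positions (N_FTS_ASYNC_CLK in bits 31:24, N_FTS_COMMON_CLK in bits 23:16) are an assumption that happens to fit the default 0x32142000; the thread does not state them, so confirm them in the XIO2001 datasheet. The function name `decode_n_fts` is hypothetical.

```python
def decode_n_fts(reg_value):
    """Split Control and Diagnostic Register 2 into its two N_FTS fields.

    Assumed layout (verify against the datasheet):
      bits 31:24 -> N_FTS_ASYNC_CLK
      bits 23:16 -> N_FTS_COMMON_CLK
    """
    return {
        "N_FTS_ASYNC_CLK": (reg_value >> 24) & 0xFF,
        "N_FTS_COMMON_CLK": (reg_value >> 16) & 0xFF,
    }


if __name__ == "__main__":
    # Value read back from the card in this thread.
    for name, value in decode_n_fts(0x32142000).items():
        note = "zero, ASPM may fail" if value == 0 else "non-zero"
        print(f"{name} = 0x{value:02X} ({note})")
```

Under that assumed layout neither field is zero, which is consistent with the register still holding its default.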
  • The schematics did not reveal any problems. At this point it is difficult to tell if the problem is on the PCIe side or the PCI side. Try monitoring the Correctable Error Status Register in the bridge after you enable ASPM. The only bit in that register that may be set is the Advisory Non-fatal Error Status bit. If any of the other bits are getting set, that may indicate a signal integrity problem on PCIe.
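
A rough sketch of how a raw Correctable Error Status value could be interpreted along the lines of the advice above. The bit positions are the ones defined for the Advanced Error Reporting capability in the PCI Express Base Specification (2.0 era); the helper names are hypothetical, and how the raw value is read from the bridge is left out.

```python
# AER Correctable Error Status bits (PCI Express Base Specification 2.0).
CORRECTABLE_ERROR_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def decode_correctable_errors(status):
    """Return the names of all status bits set in the raw register value."""
    return [name for bit, name in CORRECTABLE_ERROR_BITS.items()
            if status & (1 << bit)]


def unexpected_errors(status):
    """Return every set bit other than Advisory Non-Fatal Error."""
    return [name for name in decode_correctable_errors(status)
            if name != "Advisory Non-Fatal Error"]


if __name__ == "__main__":
    raw = 0x2001  # example value only, not taken from the thread
    print("set bits:  ", decode_correctable_errors(raw))
    print("unexpected:", unexpected_errors(raw))
```

Any non-empty result from `unexpected_errors` would be the kind of indication of a PCIe signal integrity problem described above.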

  •  

    Some more info:

     

    In the Windows driver, the interrupt handler is not called and no PNP/Power IRP is received before the freeze.

    The PC stays in S0 the whole time. Manually activating the PC sleep (S3) or hibernation (S4) works normally.

    The freeze happens only when data is transmitted over PCIe, not when it is idle. Freeze seems to come faster when traffic is higher, e.g. 5 MByte/s.

    Other PCIe hardware like graphics card and network controller in that PC don't have a problem.

     

    Is there a way to check whether the XIO2001 is currently in L0, L0s or L1?

  • No, there is not. Does the problem occur when only L1 is enabled, or does it fail with just L0s enabled? Did you try a different computer? When it comes to ASPM, a lot of cards claim support but don't actually go in and out of L0s or L1 after it has been enabled. Do you know for a fact that these other cards really support L0s and L1? I'm guessing you don't have a PCIe analyzer? I'm still leaning towards a possible signal integrity problem that results in the PCIe link being lost and the bridge getting reset, causing data from the PCI bus to stop.

  • PC freezes with L0s-only and with L0s/L1 enabled in Windows energy options. The card works normally in other PCs, even when they have an Ibex Peak chipset and an Intel core i processor. I don't know whether the other PCIe devices really support ASPM. We don't have a PCIe analyzer available. One more thing: I installed Windows 7 Ultimate on that PC, and it still freezes.

  • Did you try monitoring the Correctable Error Status Register in the bridge after you enable L0s? Are you enabling L0s on the chipset and on the bridge or just on the bridge? If you're enabling L0s on both, try enabling just one then the other. This will help determine if the problem is on the receiver or transmitter. 

  • I didn't take a look in the Correctable Error Status Reg yet. How can I disable ASPM only for the chipset? Can I access the Link Control Access Register of the chipset config space to do it? I would have to write a program to do this, because my tools only work on 32 bit Windows.

  • Yes, you have to set the appropriate ASPM Control bits for both the TI bridge and the chipset bridge. TI offers a free Windows-based program called TopHAT that you can use to read and write registers of PCI devices. It will work with the 64-bit version of Windows, but you will need to disable driver signature enforcement by tapping F8 right before Windows starts and then choosing the option to disable driver signature enforcement. Then install the program and run it by right-clicking on the icon and choosing "Run as administrator".

    Once the program is running, find the XIO2001 in the hierarchy, then double-click on the PCI Express Capability Structure icon and choose the Link Info tab. Towards the bottom there is a drop-down box called ASPM Control. You can use this control to enable L0s, L1, or L0s and L1 on the bridge. You can do the same thing for the chipset by choosing the same PCI Express Capability for the chipset bridge that the XIO2001 is linked with.

    You can also use this tool to look at the Correctable Error Status Register by double-clicking on the Advanced Error Reporting Capability Structure icon and choosing the Correctable Error tab. There is a refresh button that rescans the register when pressed, and you can clear a status bit by un-checking its check box. You can download TopHAT from the FTP drop site.

     

    To retrieve the files from the pickup directory:

      1. Open an ftp session to ftppickup.ti.com.
             ftp ftppickup.ti.com

      2. Login with the userid 'pickup'.
             Name: pickup

      3. Enter the password.
             Password: piurh!

      4. Change directories to the pickup directory.
             ftp> cd /pub/share/aquuxoo

         Note: If you are using a graphical ftp client, you will not
               see the hidden pickup directory name aquuxoo appear
               in the file list.  You will need to use a manual
               'Change Directory' or 'CD' command to change into the
               dropoff directory.

      5. Set the file transfer mode to ascii or binary as necessary.
             ftp> bin

      6. Transfer the file.
             ftp> get <file>

      7. End the ftp session.
             ftp> quit

    The files can also be retrieved with the url:

      ftp://pickup@ftppickup.ti.com/pub/share/aquuxoo

    List of files in the pickup directory 'aquuxoo':
    TopHAT_v_2_1_0.zip
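
For anyone who prefers to script the register change instead of using the drop-down box, the ASPM Control setting described above corresponds to bits 1:0 of the Link Control register in the PCI Express Capability Structure (00b disabled, 01b L0s, 10b L1, 11b L0s and L1, per the PCI Express Base Specification). The following is only a sketch of the bit manipulation; the helper names are hypothetical, and reading or writing the 16-bit register itself is not shown.

```python
# ASPM Control encodings in Link Control bits 1:0 (PCI Express Base Specification).
ASPM_MODES = {"disabled": 0b00, "L0s": 0b01, "L1": 0b10, "L0s+L1": 0b11}
ASPM_MASK = 0x0003


def get_aspm(link_control):
    """Return the ASPM mode name encoded in a Link Control register value."""
    field = link_control & ASPM_MASK
    for name, encoding in ASPM_MODES.items():
        if encoding == field:
            return name


def set_aspm(link_control, mode):
    """Return a new Link Control value with only the ASPM Control field changed."""
    return (link_control & ~ASPM_MASK & 0xFFFF) | ASPM_MODES[mode]


if __name__ == "__main__":
    current = 0x0043  # example value only, not taken from the thread
    print(get_aspm(current))                   # mode currently selected
    print(hex(set_aspm(current, "disabled")))  # value to write back
```

The same two helpers apply to the root port's Link Control register, so each side of the link can be switched independently.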

  • Thanks for the tool.

    Error occurs when root port and TI bridge BOTH have L0s activated in Link Control register. L1 seems to be OK.

    The faulting PC has a Windows behaviour that I don't see on other PCs:

    My default Link Control ASPM setting is overwritten by Windows, depending on the energy options: if I select disabled, it is disabled; if I select moderate savings, it is L0s; if I select maximum savings, it is L0s/L1. Similar things happen on other PCIe root complexes or PCIe devices in this PC. Is that allowed behaviour? On other PCs, the ASPM setting in the Link Control register is always set to disabled.

    I get a Receiver Error (Correctable Error) on my device. 2 incidents per minute with low IO load, 1 incident every 10 seconds with high IO load. This does not happen when I have ASPM deactivated. I also get a PCI Master abort, but only once during initialisation, so I think it's not relevant.

  • So are you saying that the problem never occurs if only L0s is enabled on the root port or only enabled on the TI bridge? Did you try disabling ASPM on the bridge but leave L0s enabled on the root port?

    The behavior you describe for the failing PC is correct behavior for a system that properly supports ASPM. Your other PCs apparently don't support ASPM at all, or they may have it disabled in the BIOS.

    Since you are seeing receiver errors I suspect that you might also be seeing other errors as well and this is what eventually causes the system to freeze, but they are not showing up as frequently. You can use TopHAT to automatically poll a register. Open the PCI Registers and right click on the Correctable Error Status register and add this register to the "watch window". Also click on the Alert check box. Now anytime there is a change in this register a new dialog box will appear alerting you of the change. Use this to determine if any of the other error bits are getting set.

    You might want to find another PC that supports ASPM to check if the problem is just with this one PC or if it happens with others. Another test to try is to poll the correctable error status register on the failing PC with ASPM disabled and let it run overnight to see if you ever get any receiver errors. I'm still leaning towards a signal integrity issue with the layout.

  • Chipset | TI Bridge | Result
    --------+-----------+-------
    None    | None      | OK
    L0s     | None      | OK
    None    | L0s       | OK
    L0s     | L0s       | FAIL

    I added the error reporting registers to the watch window, but beside the errors I already saw, nothing new showed up before the freeze (tried it several times).

    I tried activating L0s on a PC where it is usually turned off and there the PCIe card runs without receiver errors/freeze.

    When running the card overnight in the faulting PC with ASPM deactivated, not a single Receiver Error is generated.

     

  • I don't think this problem can be debugged any further without capturing some analyzer data at the time of failure. It seems the problem may be platform specific. It's odd that the problem only occurs when L0s is enabled on both the bridge and chipset as the receiver and transmitter are independent of each other. I expected the problem would also occur if only L0s was enabled on the chipset and not the bridge.

  • Ok thanks, I will try to get an analyzer log.

  • Ok, I have it! Could a TI engineer please contact me, so I can send it via email? Thanks in advance.

    Here is my address: good_weather at gmx.net

  • Communication is running.

  • Just for the record: The problem was added to the XIO2001 errata sheet in May 2012. It suggests as workaround to deactivate L0s. Thanks for reproducing the problem.