We are using the XIO2001IZAJ PCI Express to PCI bridge chip in one of our designs, and the device has worked as expected for years. One customer using our product is now experiencing sporadic enumeration problems at cold temperature. The symptom is that the unit sometimes fails to enumerate on power-up. The failure rate varies from unit to unit and tends to be higher at cold temperature (below -25 deg C). When the unit fails to enumerate, both the bridge chip and the PCI device behind the bridge are missing (i.e. neither can be seen using lspci). I would have expected to see at least the device ID, but neither device (the bridge or the PCI device behind it) shows up in the list at all.
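To log the failure rate across power cycles, the lspci check could be scripted. The following is only a rough sketch, assuming a Linux host with sysfs mounted; the function name `find_devices` is made up for illustration, and the IDs 104c:8240 are what we believe the XIO2001 reports, so they should be confirmed on a working unit first.

```python
from pathlib import Path

# Assumed IDs: 0x104c is the TI PCI vendor ID; 0x8240 is the device ID we
# believe the XIO2001 reports. Confirm both with lspci -nn on a good unit.
TI_VENDOR_ID = 0x104C
XIO2001_DEVICE_ID = 0x8240


def find_devices(vendor_id, device_id, sysfs_root="/sys/bus/pci/devices"):
    """Return the PCI addresses of all devices matching vendor/device ID."""
    matches = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return matches
    for dev in sorted(root.iterdir()):
        try:
            # sysfs exposes these as hex strings such as "0x104c".
            vendor = int((dev / "vendor").read_text().strip(), 16)
            device = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        if vendor == vendor_id and device == device_id:
            matches.append(dev.name)
    return matches


if __name__ == "__main__":
    found = find_devices(TI_VENDOR_ID, XIO2001_DEVICE_ID)
    print("bridge enumerated at:", found if found else "NOT FOUND")
```

Run once per boot and logged together with the chamber temperature, this would give a failure count per unit rather than an anecdotal "sometimes".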
We suspect that there might be a reset issue. In this customer's system, the PERST and GRST reset lines are connected together. We have been reviewing the power-up sequence requirements described in the XIO2001 datasheet, which specifies that PERST must remain asserted for at least 100 ms after the 3.3 V and 1.5 V supplies are applied and for at least 100 us after the PCIe reference clock is applied. This design holds the device in reset (both PERST and GRST asserted low) for 520 us after the PCIe reference clock starts (i.e. it exceeds the 100 us minimum) and for seconds after power is applied (i.e. it satisfies the 100 ms power supply stabilization requirement). They observe that PCIe output data from the bridge starts approximately 13 ms after reset is released.
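For reference, the margins implied by these numbers can be tabulated with a few lines of Python. This is only a sketch: the two minimums are the datasheet values quoted above, the 520 us figure is the measurement above, and the 2.0 s power hold time is a placeholder for "seconds", not a measured value.

```python
# Minimum PERST hold times, as quoted from the datasheet in this post.
MIN_AFTER_POWER_S = 100e-3   # after the 3.3 V and 1.5 V supplies are applied
MIN_AFTER_REFCLK_S = 100e-6  # after the PCIe reference clock is applied


def reset_margins(after_power_s, after_refclk_s):
    """Return each measured hold time as a multiple of its minimum."""
    return {
        "power": after_power_s / MIN_AFTER_POWER_S,
        "refclk": after_refclk_s / MIN_AFTER_REFCLK_S,
    }


# 520 us is the measured value; 2.0 s is a placeholder for "seconds".
margins = reset_margins(after_power_s=2.0, after_refclk_s=520e-6)
for name, ratio in margins.items():
    status = "OK" if ratio >= 1.0 else "VIOLATION"
    print(f"{name}: {ratio:.1f}x minimum ({status})")
```

The reference clock margin is much smaller than the power margin, so that is probably the measurement most worth repeating at cold temperature.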
The power-up sequence for the processor involves two reset events. During the BIOS boot sequence (Intel firmware), after power is applied, the management engine (ME) makes a number of adjustments (drive strength for DDR, PCIe, etc.). The ME then switches off the power supply and reboots. PERST/GRST is held asserted for several seconds following the reboot (i.e. it satisfies the 100 ms power stabilization requirement in the datasheet).
I am wondering whether connecting PERST and GRST together is related to the problem. The reset lines are routed as inner traces, so we cannot easily experiment with separating them. Any insight into the reset sequence, or any other suggestions on what may be causing the problem, would be greatly appreciated.