First, some background: I called technical support and was told that "TI is migrating support to our E2E (Engineer to Engineer) Community Forums", so it's not clear whether anyone at TI is already working on this. In case they are, the ID is: [SR THREAD ID: 1-RGDCD1]
The issue seems to appear only after the system has been running for some time, so I suspect it has to do with synchronizer failure. In a nutshell, here's what I have found:
The primary bus can be running at 33.33 MHz or 66.66 MHz and the issue occurs either way. In all cases I run the secondary bus at 33.00 MHz using the SEC_ASYNC_CLK as described in this earlier thread. On the secondary side I have an FPGA doing DMA to the host memory. This consists of a long sequence of 16-word (one cache line) memory write bursts to successive addresses.
In the FPGA I was able to see that there was nothing unusual about the transactions occurring just before the lockup. I also had a PCI bus analyzer on the primary bus and saw nothing out of the ordinary there either. Transactions just before the lockup were coming through in a handful of clock cycles, indicating that the bridge's internal FIFO was empty or nearly empty. At lockup the bridge accepted an additional 64 words (256 bytes) of data from the FPGA that never got passed through to the primary bus. All subsequent write attempts by the FPGA were terminated with retry. After the lockup occurs, only resetting the bridge allows further transactions in either direction.
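For anyone checking the numbers, here is a small sketch of the burst arithmetic described above. It assumes a 4-byte data word, which is what "64 words (256 bytes)" implies. The function names are only illustrative and are not taken from any TI tool or from my FPGA design.

```python
# Sizing of the DMA bursts described in the post.
# Assumption: one "word" is 4 bytes, as implied by 64 words = 256 bytes.
BYTES_PER_WORD = 4
WORDS_PER_BURST = 16  # one cache line per memory write burst


def burst_bytes(words_per_burst=WORDS_PER_BURST):
    """Return the size in bytes of one write burst."""
    return words_per_burst * BYTES_PER_WORD


def stranded_data(words_accepted, words_per_burst=WORDS_PER_BURST):
    """Return (bytes, whole bursts) for data the bridge accepted
    on the secondary side but never forwarded to the primary bus."""
    total_bytes = words_accepted * BYTES_PER_WORD
    whole_bursts = words_accepted // words_per_burst
    return total_bytes, whole_bursts


if __name__ == "__main__":
    print("bytes per burst:", burst_bytes())
    print("stranded (bytes, bursts):", stranded_data(64))
```

Under that assumption, the 64 words the bridge swallowed at lockup correspond to four complete cache-line bursts sitting inside the bridge.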
As noted in the thread title, this only happens when the clocks are asynchronous and write combining is enabled in the bridge. I can work around the issue either by turning off write combining or by using the primary clock as the clock source for the secondary bus. Since turning off write combining is the simplest solution for my system, and write combining doesn't appear to improve throughput under these conditions anyway, that is my preferred workaround.
My question is whether there are any known errata for the PCI2060 that might explain this behavior. I did not see any errata for this device on the website.