Hi Richard,
We've done some further investigation of our problem. We believe that the PCIe issue is now fixed, but there is still at least one other lock-up issue that we're finding very difficult to reproduce in testing.
So far, we've been able to investigate six of these CPU lock-ups over JTAG, with varying symptoms.
----
(1) In this case, many of the memory-mapped peripheral registers were not accessible.
Not accessible: I2C, UART, SPI, any SERDES, PLL controllers, NET CoProcessor, Power sleep controller, GPIO, Semaphore system, Queuing system, DDR PHY
However, we could access some systems: CIC0, CIC2, EDMA0, MSMC, SDRAM (these all seem to be on TeraNet3_C).
We then tried accessing the DDR RAM. Some accesses were fine, but the first time we tried to access a CPU3-specific memory region (our first memory access that might have been cached on CPU3) the debug system hung and we couldn't debug any further. We could no longer connect to the JTAG TAP.
----
(2), (3) On these lock-ups, we had more success with the memory-mapped peripheral registers, until we tried to access the USB PHY. Then the debug system hung again and we couldn't debug any further. Before that, we'd already tried a few DDR RAM accesses and several peripherals. Everything we had tried up to that point was OK: NET CoProcessor, all the SerDes, PLL controller, Power sleep controller, I2C, UART, INT controller, CIC, GPIO, BOOTCFG.
We haven’t even enabled the USB subsystem in the code, so it's surprising that it hung the debug system.
----
(4), (5), (6) For the other three lock-ups, we seem to be able to access all of the DDR RAM and all the memory-mapped registers of the peripherals that are enabled. Accessing the USB PHY returns an error (which we'd expect, because it's not enabled), but it doesn't hang the debug system as it did in (2) and (3).
Nothing appears to be obviously wrong over JTAG, except that all of the CPU cores are stuck. The symptoms don't point to any particular subsystem, as they are all in the state we'd expect (accessible if enabled, errors if not enabled). We currently have one machine in this state. Is there anything else you think we could try in order to work out why the CPU cores locked up?
Many thanks for any suggestions you might have,
Tim