One of our developers has been seeing intermittent performance problems on the C6678 and, on analysis, a contributor (or, possibly, a symptom) is that cache coherence operations sometimes take an astonishingly long time to complete. The case that I'm looking at has core 1 performing two cache coherence operations as part of receiving a message from the sRIO peripheral: one on the free memory being added back onto the FDQ and the other on the descriptor.
Both are WB/Inv operations. The first covers 268 bytes in DDR3 and takes ~15 us; the second covers 40 bytes in MSMC shared memory and takes ~12 us!
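To put those figures in perspective, here is a rough back-of-envelope sketch of how few cache lines each operation can touch and how many CPU cycles the measured times represent. The line sizes (64 bytes for L1D, 128 bytes for L2) and the 1 GHz core clock are my assumptions, not measurements from this system, and the helper names are purely illustrative.

```python
# Back-of-envelope sketch only. Assumed values (not measured on this system):
L1D_LINE = 64               # assumed L1D cache line size in bytes
L2_LINE = 128               # assumed L2 cache line size in bytes
CORE_HZ = 1_000_000_000     # assumed 1 GHz core clock


def lines_touched(addr, nbytes, line):
    """Number of cache lines overlapped by the byte range [addr, addr + nbytes)."""
    first = addr // line
    last = (addr + nbytes - 1) // line
    return last - first + 1


def line_range(nbytes, line):
    """(min, max) lines touched over every possible start offset within a line."""
    counts = [lines_touched(offset, nbytes, line) for offset in range(line)]
    return min(counts), max(counts)


def cycles(microseconds, hz=CORE_HZ):
    """CPU cycles elapsed in a whole number of microseconds at the assumed clock."""
    return microseconds * hz // 1_000_000


# The two operations described above: (label, size in bytes, measured time in us)
for label, nbytes, us in (("DDR3 buffer", 268, 15), ("MSMC descriptor", 40, 12)):
    print(label,
          "L1D lines (min, max):", line_range(nbytes, L1D_LINE),
          "L2 lines (min, max):", line_range(nbytes, L2_LINE),
          "cycles:", cycles(us))
```

Under those assumptions each operation covers only a handful of lines yet costs on the order of ten thousand cycles, which is why the times look so far out of proportion to the block sizes.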
For all of this time, 4 of the other cores are idle (for most of it, there is only one other core non-idle!). Core 0 is also performing cache operations at this time: invalidates of just over 4K of MSMC memory where an inbound DirectIO transfer from the sRIO peripheral is expected. That transfer takes longer than expected to turn up; it is not clear at the moment whether this is due to the same root cause as the cache operation time or to congestion elsewhere in the sRIO fabric.
Can anyone suggest why the cache operations are taking so long and, more importantly, how we can ensure that they perform better in the future?
TIA,
SPH.