I am having what I believe to be cache coherency problems in a custom application on a DM6435.
Basically, I am in a loop of acquire, then process:
Pre-loop:
  invalidate cache for buffer1
During loop:
  trigger image (will be placed in buffer1 by the VPFE)
  wait for the image-acquired interrupt from the VPFE
  process image, placing results in buffer2
  invalidate cache for buffer1
  repeat
If I perform a global L2 invalidate, everything appears to work fine. But if I invalidate only the cache for buffer1, I get artifacts in buffer2, but not in buffer1. Here is my routine to perform local invalidates:
void invalidateCacheBlock( Uint32 blockStart, Uint32 blockSize )
{
    Uint32 currentSize;

    while (blockSize > 0) {
        // limit each operation to 2 << 18 bytes (512 KB)
        if (blockSize > (2 << 18)) {
            currentSize = 2 << 18;
        } else {
            currentSize = blockSize;
        }

        CACHE_L2IBAR = blockStart;
        CACHE_L2IWC = currentSize >> 2;    // word count = bytes / 4

        // wait for invalidate operation to complete
        while (CACHE_L2IWC != 0);

        blockSize -= currentSize;
        blockStart += currentSize;
    }
}
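One detail of the routine above that may be worth checking: 2 << 18 is 524288 bytes, so the value written to CACHE_L2IWC can be as large as 131072 words (0x20000), which needs 18 bits. If the word-count field is only 16 bits wide (an assumption here; the register description should be checked), the larger chunks would not fit. The following is a minimal host-side sketch of a smaller chunking scheme, not a drop-in replacement. The names MAX_WORDS_PER_OP, InvalidateOp, recordOp and invalidateInChunks are hypothetical, and the callback stands in for the register write and the wait for completion.

```c
#include <stdint.h>

/* ASSUMPTION: the word-count register accepts at most 16 bits. 0xFF00 is a
 * hypothetical per-operation cap below 0xFFFF; 0xFF00 words * 4 bytes is
 * 261120 bytes, which is a multiple of 128 bytes. */
#define MAX_WORDS_PER_OP 0xFF00u

/* Hypothetical callback standing in for one hardware operation. On the
 * target it would write the base address and word count registers and
 * wait for the word count to read back as zero. */
typedef void (*InvalidateOp)(uint32_t baseAddr, uint32_t wordCount);

/* Host-side bookkeeping so the chunking can be inspected without hardware. */
static uint32_t g_opCount;
static uint32_t g_totalWords;
static uint32_t g_maxWords;
static uint32_t g_lastBase;

/* Host-side stand-in for the hardware operation: it only records the call. */
static void recordOp(uint32_t baseAddr, uint32_t wordCount)
{
    g_opCount++;
    g_totalWords += wordCount;
    if (wordCount > g_maxWords) {
        g_maxWords = wordCount;
    }
    g_lastBase = baseAddr;
}

/* Splits [blockStart, blockStart + blockSize) into operations of at most
 * MAX_WORDS_PER_OP words each and returns the number of operations issued. */
static uint32_t invalidateInChunks(uint32_t blockStart, uint32_t blockSize,
                                   InvalidateOp op)
{
    uint32_t ops = 0;

    while (blockSize > 0) {
        /* remaining size in 32-bit words, rounded up without overflow */
        uint32_t words = (blockSize >> 2) + ((blockSize & 3u) != 0u);
        uint32_t bytes;

        if (words > MAX_WORDS_PER_OP) {
            words = MAX_WORDS_PER_OP;
        }
        bytes = words << 2;
        if (bytes > blockSize) {
            bytes = blockSize;   /* last chunk when size is not a word multiple */
        }

        op(blockStart, words);

        blockStart += bytes;
        blockSize -= bytes;
        ops++;
    }
    return ops;
}
```

Calling invalidateInChunks(start, size, recordOp) on a host machine shows how a buffer would be split; on the target, recordOp would be replaced by a function that performs the actual register accesses.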
I have a few questions:
1) In my loop, I believe I should only have to invalidate buffer1 (there is no DMA activity besides the VPFE filling the buffer). Is this correct, or would I need to invalidate buffer2 for some reason?
2) If 1 is true, is there anything wrong with the routine above for performing a localized invalidate? The frame I am invalidating is 256-byte aligned at both start and end.
3) Invalidating L2 also invalidates L1D, correct?
4) I know that the image is larger than my cache (it consumes more than 100% of it). Is there any speed penalty to performing a global invalidate vs. the local invalidate?
Thanks for any insight
Edit:
I realized that I can't just do a global invalidate; I would have to do a global writeback/invalidate (to save changes to various other data not related to image processing), which I'd rather avoid if possible because of the time it takes.