Bug in CMEM cache IOCTL?

Hi,

I have been working with CMEM on a Cortex-A9 (ARMv7) and have found that the cache operations are not working correctly for the L2 (outer_*) calls.

It looks like the __pa() macro used in the outer_*_range() calls does not produce the correct physical address when the buffer has been allocated/mmap'd from a non-CMA block. This is because the virtp passed to __pa() is a user-space pointer, not a kernel logical address.
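
For anyone else digging into this, here is a rough illustration of why __pa() goes wrong in this path. This is simplified (the real macro goes through __virt_to_phys() and depends on kernel config), but for the flat lowmem case it reduces to a linear offset that is only valid for kernel direct-mapped addresses:

/*
 * Simplified view of __pa() on ARM, roughly:
 *
 *     phys = virt - PAGE_OFFSET + PHYS_OFFSET
 *
 * virtp in this ioctl path is a user-space address returned by mmap(),
 * not a kernel logical address, so __pa(virtp) lands on an unrelated
 * physical range and the L2 maintenance misses the buffer entirely.
 */
outer_inv_range(__pa((u32)virtp), __pa((u32)virtp + size));    /* wrong */

/* The driver already tracked the buffer's physical address (physp) at
 * allocation time, so the outer-cache ops can take it directly: */
outer_inv_range(physp, physp + size);                          /* right */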

Also, for invalidation, ARM's site suggests that the caches be invalidated outside->inside (L2, then L1), which is consistent with dma_sync_single_for_cpu().
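
For reference, the arch/arm DMA code appears to use that same order for the device-to-CPU direction. Roughly paraphrased (not copied verbatim from the kernel, and the helper name here is my own), dma_sync_single_for_cpu() with DMA_FROM_DEVICE boils down to something like:

/* Paraphrased sketch of the kernel's device-to-CPU cache maintenance:
 * invalidate the outer (L2) cache first so stale L2 lines cannot be
 * pulled back into L1, then deal with the inner (L1) cache. */
static void sketch_dev_to_cpu(void *kaddr, unsigned long paddr, size_t size)
{
	outer_inv_range(paddr, paddr + size);           /* L2 (outer) first */
	dmac_unmap_area(kaddr, size, DMA_FROM_DEVICE);  /* then L1 (inner)  */
}

/* Note: the CMEM ioctl path keeps dmac_map_area() for the inner op, as in
 * the original code; the point here is only the outer-before-inner order. */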

The patch below outlines what I had to do to get invalidation to work properly.  Can anyone confirm / check this out?

diff --git a/src/cmem/module/cmemk.c b/src/cmem/module/cmemk.c
index e757345..717547d 100644
--- a/src/cmem/module/cmemk.c
+++ b/src/cmem/module/cmemk.c
@@ -1473,9 +1473,8 @@ alloc:
 			     */
 			    virtp_end = virtp + size;
 #if 1
+			    outer_inv_range(physp, physp + size);
 			    dmac_map_area(virtp, size, DMA_FROM_DEVICE);
-			    outer_inv_range(__pa((u32)virtp),
-					    __pa((u32)virtp_end));
 #else
 			    dma_sync_single_for_device(NULL, (dma_addr_t)physp, size, DMA_FROM_DEVICE);
 #endif
@@ -1749,8 +1748,7 @@ alloc:
 	      case CMEM_IOCCACHEWB:
 #if 1
 		dmac_map_area(virtp, block.size, DMA_TO_DEVICE);
-		outer_clean_range(__pa((u32)virtp),
-		                  __pa((u32)virtp + block.size));
+		outer_clean_range(physp, physp + block.size);
 #else
 		dma_sync_single_for_device(NULL, (dma_addr_t)physp, block.size, DMA_TO_DEVICE);
 #endif
@@ -1761,9 +1759,8 @@ alloc:
 
 	      case CMEM_IOCCACHEINV:
 #if 1
+		outer_inv_range(physp, physp + block.size);
 		dmac_map_area(virtp, block.size, DMA_FROM_DEVICE);
-		outer_inv_range(__pa((u32)virtp),
-		                __pa((u32)virtp + block.size));
 #else
 		dma_sync_single_for_device(NULL, (dma_addr_t)physp, block.size, DMA_FROM_DEVICE);
 #endif
@@ -1775,8 +1772,7 @@ alloc:
 	      case CMEM_IOCCACHEWBINV:
 #if 1
 		dmac_map_area(virtp, block.size, DMA_BIDIRECTIONAL);
-		outer_flush_range(__pa((u32)virtp),
-		                  __pa((u32)virtp + block.size));
+		outer_flush_range(physp, physp + block.size);
 #else
 		dma_sync_single_for_device(NULL, (dma_addr_t)physp, block.size, DMA_TO_DEVICE);
 		dma_sync_single_for_device(NULL, (dma_addr_t)physp, block.size, DMA_FROM_DEVICE);

  • Thank you for the patch; it looks good at first glance. I'm not sure why the __pa() macro is being used when the physp variable is valid and available. I will look into this.

    How did you validate this (both the failing CACHEINV and the fixed one)?

    Michael Williamson said:

    Also, for invalidation, ARM's site suggests that the caches be invalidated outside->inside (L2, then L1), which is consistent with dma_sync_single_for_cpu().

    So, does ARM recommend that order (outside->inside) for just the invalidate operation?

    If you would, please provide a link to the pertinent ARM recommendation.

    Thanks & Regards,

    - Rob


  • See this link regarding cache order (looks like outer->inner is for invalidation only):

    http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka14802.html

    I am validating this on an ARMv7 processor with DMA transfers in a custom application. I am only exercising the CACHEINV ioctl, using the procedure below (a rough user-side sketch of the loop follows after the list).

    1) Fill CMEM area memory with known pattern (0xdeadbeef) using /dev/mem mapping (to ensure write accesses are not cached).

    2) Issue a DMA request that will put data into the area.

    3) Wait for DMA to complete.

    4) Issue CMEM cache invalidate ioctl.

    5) Check data for (0xdeadbeef) patterns using the CMEM mapping.

    6) Goto 1.
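
    In case it helps, here is a rough user-side sketch of one iteration of that loop. The CMEM_* calls follow the public cmem.h API as I understand it (error checking omitted); start_dma()/wait_dma() are placeholders for our custom DMA driver and are not CMEM functions.

    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <cmem.h>                       /* TI CMEM user-space API */

    #define PATTERN 0xdeadbeefu

    extern void start_dma(unsigned long phys, size_t size);   /* placeholder */
    extern void wait_dma(void);                                /* placeholder */

    /* One iteration of the check; assumes CMEM_init() was called at startup.
     * Returns the number of stale (still 0xdeadbeef) words seen. */
    static int check_once(size_t size)
    {
        CMEM_AllocParams params = CMEM_DEFAULTPARAMS;
        params.flags = CMEM_CACHED;                 /* cached CMEM mapping */

        uint32_t *buf = CMEM_alloc(size, &params);  /* cached user mapping */
        unsigned long phys = CMEM_getPhys(buf);

        /* 1) Fill the area with the pattern through a non-cached /dev/mem
         *    mapping so the writes go straight to external RAM. */
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        volatile uint32_t *raw = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, (off_t)phys);
        for (size_t i = 0; i < size / 4; i++)
            raw[i] = PATTERN;

        /* 2)-3) Kick off the DMA into the area and wait for completion. */
        start_dma(phys, size);
        wait_dma();

        /* 4) Invalidate the CPU caches for the buffer (CMEM_IOCCACHEINV). */
        CMEM_cacheInv(buf, size);

        /* 5) Any 0xdeadbeef still visible through the cached CMEM mapping
         *    means the CPU read a stale cache line instead of the DMA data. */
        int stale = 0;
        for (size_t i = 0; i < size / 4; i++)
            if (buf[i] == PATTERN)
                stale++;

        munmap((void *)raw, size);
        close(fd);
        CMEM_free(buf, &params);
        return stale;   /* 6) caller loops back to step 1 */
    }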

    We were observing some 0xDEADBEEF patterns in the data as read by the CPU, but if we mapped the same physical address through /dev/mem (which is mapped NON-CACHEABLE), the external RAM did not contain the 0xDEADBEEF pattern and instead held the expected data words. We also observed that if we disabled caching in the CMEM_alloc call, the 0xDEADBEEF patterns were never reported.

    When we added the fix for the outer_* calls, the issue cleared in the same way as if the caches were disabled.

    I think this was particularly bad for us as the L2 controller we were using has a prefetch block.

    -Mike