During tests with our TQMa64xxL SoM, we noticed that SPI-NOR flash detection in U-Boot is unreliable: the SFDP tables were sometimes not read correctly when using the `sf probe` command.
We determined that the cause is `dma_memcpy()` (called from `cadence_qspi_apb_direct_read_execute()`) not copying the data as expected, which looks like a cache invalidation issue. We found two workarounds:
1. Adding a second `invalidate_dcache_range()` after the transfer fixes the issue:
    --- a/drivers/dma/dma-uclass.c
    +++ b/drivers/dma/dma-uclass.c
    @@ -247,7 +247,11 @@ int dma_memcpy(void *dst, void *src, size_t len)
     	invalidate_dcache_range((unsigned long)dst, (unsigned long)dst +
     				roundup(len, ARCH_DMA_MINALIGN));
     
    -	return ops->transfer(dev, DMA_MEM_TO_MEM, dst, src, len);
    +	ret = ops->transfer(dev, DMA_MEM_TO_MEM, dst, src, len);
    +	invalidate_dcache_range((unsigned long)dst, (unsigned long)dst +
    +				roundup(len, ARCH_DMA_MINALIGN));
    +
    +	return ret;
     }
     
     UCLASS_DRIVER(dma) = {
2. Increasing `ARCH_DMA_MINALIGN` from 64 to 128 also fixes the issue.
Neither of these fixes feels like the correct solution to me, though. As far as I can tell, 64 should be the correct value for the AM64x (although Linux unconditionally sets ARCH_DMA_MINALIGN to 128 on ARM64: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c1132702c71f4b95db9435bac5fdc912881563e0). I would also expect the `invalidate_dcache_range()` call before the transfer to be sufficient, as the buffer is correctly aligned and nothing should touch those cache lines during the transfer, but for some reason this does not seem to be the case.
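To make the difference between the two ARCH_DMA_MINALIGN values concrete, here is a minimal sketch of the range arithmetic that `dma_memcpy()` performs before calling `invalidate_dcache_range()`. The helper names `round_up_to()`, `invalidate_end()` and `is_dma_aligned()` are hypothetical and only mirror the `roundup()` expression from the diff above; the sketch shows which addresses are covered, not why the stale data appears.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the roundup(len, ARCH_DMA_MINALIGN) expression. */
static size_t round_up_to(size_t x, size_t y)
{
	assert(y != 0);
	return ((x + y - 1) / y) * y;
}

/*
 * Hypothetical helper: end address of the range that dma_memcpy() would
 * hand to invalidate_dcache_range() for a given alignment value.
 */
static uintptr_t invalidate_end(uintptr_t dst, size_t len, size_t minalign)
{
	return dst + round_up_to(len, minalign);
}

/* Hypothetical helper: is the destination buffer aligned to minalign? */
static int is_dma_aligned(uintptr_t dst, size_t minalign)
{
	assert(minalign != 0);
	return (dst % minalign) == 0;
}
```

For an assumed destination of 0x80000000 and an assumed length of 16 bytes, `invalidate_end()` yields 0x80000040 with an alignment of 64 and 0x80000080 with 128, so the second workaround invalidates one additional 64-byte line beyond the buffer for short transfers like this.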
Our U-Boot tree is based on the latest ti-u-boot-2021.01 (commit 49beccc18dfd3609b96fed0d13b7ef38bdff57a6).