AM6442: Cache invalidation issue in U-Boot dma_memcpy

Part Number: AM6442

During tests with our TQMa64xxL SoM, we noticed that SPI-NOR flash detection in U-Boot is unreliable - SFDP tables were sometimes not read correctly when using the `sf probe` command.

We determined that the cause is dma_memcpy (called from cadence_qspi_apb_direct_read_execute()) not copying the data as expected, which seems to be a cache invalidation issue. We found two workarounds:

1. Adding a second `invalidate_dcache_range()` after the transfer fixes the issue:

--- a/drivers/dma/dma-uclass.c
+++ b/drivers/dma/dma-uclass.c
@@ -247,7 +247,11 @@ int dma_memcpy(void *dst, void *src, size_t len)
 	invalidate_dcache_range((unsigned long)dst, (unsigned long)dst +
 				roundup(len, ARCH_DMA_MINALIGN));
 
-	return ops->transfer(dev, DMA_MEM_TO_MEM, dst, src, len);
+	ret = ops->transfer(dev, DMA_MEM_TO_MEM, dst, src, len);
+	invalidate_dcache_range((unsigned long)dst, (unsigned long)dst +
+				roundup(len, ARCH_DMA_MINALIGN));
+
+	return ret;
 }
 
 UCLASS_DRIVER(dma) = {

2. Increasing ARCH_DMA_MINALIGN from 64 to 128 also fixes the issue.
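
To make the effect of workaround 2 concrete: dma_memcpy invalidates the range from dst to dst + roundup(len, ARCH_DMA_MINALIGN), so raising the constant widens the invalidated range. The following is only a minimal sketch of that arithmetic in plain C. The function names round_up() and inval_end() are hypothetical helpers for illustration, not U-Boot APIs, and the example address is taken from the debug log posted further down in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as U-Boot's roundup(): smallest multiple of y that is >= x. */
uint64_t round_up(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return ((x + y - 1) / y) * y;
}

/*
 * End address of the range that dma_memcpy() invalidates for a destination
 * buffer at dst with len bytes, given a minimum DMA alignment (hypothetical
 * helper, mirrors dst + roundup(len, ARCH_DMA_MINALIGN)).
 */
uint64_t inval_end(uint64_t dst, size_t len, uint64_t minalign)
{
	return dst + round_up(len, minalign);
}
```

For a 16-byte read to dst = 0xbdef2f00, inval_end() yields 0xbdef2f40 with an alignment of 64 and 0xbdef2f80 with 128, so the larger value invalidates one additional 64-byte line behind the buffer.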

Neither of these fixes feels like the correct solution to me, though. As far as I can tell, 64 should be correct for the AM64x (although Linux unconditionally sets ARCH_DMA_MINALIGN to 128 for ARM64: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c1132702c71f4b95db9435bac5fdc912881563e0). I would also expect the invalidate_dcache_range() call before the transfer to be sufficient, as the buffer is correctly aligned and nothing should touch those cache lines during the transfer, but this doesn't seem to be the case for some reason.
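
One mechanism that would make the pre-transfer invalidate insufficient is the cache line being allocated again (for example by a speculative fetch) while the DMA is still in flight. The sketch below is only a toy software model of a single cache line, not a description of actual AM64x behaviour. All names (toy_mem, toy_fill, toy_transfer, ...) are hypothetical, and toy_invalidate() merely stands in for invalidate_dcache_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64	/* assumed line size, matching ARCH_DMA_MINALIGN = 64 */

/* Toy model: one line of "DRAM" and at most one cached copy of it. */
struct toy_mem {
	uint8_t dram[LINE_SIZE];
	uint8_t cache[LINE_SIZE];
	bool cached;	/* true if the cache holds a (possibly stale) copy */
};

/* CPU or prefetcher access: allocate the line from DRAM on a miss. */
void toy_fill(struct toy_mem *m)
{
	if (!m->cached) {
		memcpy(m->cache, m->dram, LINE_SIZE);
		m->cached = true;
	}
}

/* Stand-in for invalidate_dcache_range(): drop the cached copy. */
void toy_invalidate(struct toy_mem *m)
{
	m->cached = false;
}

/* DMA engine: writes DRAM directly and never updates the cache. */
void toy_dma_write(struct toy_mem *m, uint8_t value)
{
	memset(m->dram, value, LINE_SIZE);
}

/* CPU read of byte 0: served from the cache, filling it first on a miss. */
uint8_t toy_cpu_read(struct toy_mem *m)
{
	toy_fill(m);
	assert(m->cached);
	return m->cache[0];
}

/*
 * Run one modelled transfer and return what the CPU reads afterwards.
 * refill_during_dma models a fetch that re-allocates the line before the
 * DMA data reaches DRAM; invalidate_after models workaround 1.
 */
uint8_t toy_transfer(bool refill_during_dma, bool invalidate_after)
{
	struct toy_mem m = { .cached = false };	/* DRAM starts as all zeros */

	toy_invalidate(&m);		/* invalidate before the transfer */
	if (refill_during_dma)
		toy_fill(&m);		/* old zeros get cached again */
	toy_dma_write(&m, 0x53);	/* DMA lands the new data in DRAM */
	if (invalidate_after)
		toy_invalidate(&m);	/* second invalidate after the transfer */
	return toy_cpu_read(&m);
}
```

In this model, toy_transfer(true, false) returns the stale 0x00, while toy_transfer(true, true) returns the DMA data 0x53. Without a refill during the transfer, both variants return 0x53, which is why the pre-transfer invalidate looks sufficient on paper.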

Our U-Boot tree is based on the latest ti-u-boot-2021.01 (commit 49beccc18dfd3609b96fed0d13b7ef38bdff57a6).

  • Hello Matthias,
    Thanks for reporting the issue and the temporary workarounds.
    Could you share some details to help us investigate?
    - A full boot log showing the "sf probe" errors
    - Is the issue observed on a single board or on multiple boards?
    - Is the issue reproducible consistently or only randomly?
    - Which SPI-NOR flash is used?
    Best,
    -Hong

  • Hello Hong,

    please see the attached log:

    U-Boot SPL 2021.01-tq-g41169928324 (Aug 15 2022 - 16:23:09 +0200)
    Selected configuration for 1 GiB RAM
    SYSFW ABI: 3.1 (firmware rev 0x0008 '8.4.1--v08.04.01 (Jolly Jellyfi')
    SPL initial stack usage: 13408 bytes
    Trying to boot from MMC2
    spl_load_fit_image: Skip load 'dm': image size is 0!
    Starting ATF on ARM64 core...
    
    NOTICE:  BL31: v2.5(release):08.01.00.001-dirty
    NOTICE:  BL31: Built : 23:53:39, May 27 2021
    
    U-Boot SPL 2021.01-tq-g41169928324 (Aug 16 2022 - 11:46:32 +0200)
    SYSFW ABI: 3.1 (firmware rev 0x0008 '8.4.1--v08.04.01 (Jolly Jellyfi')
    Trying to boot from MMC2
    
    
    U-Boot 2021.01-tq-g41169928324 (Aug 16 2022 - 11:46:32 +0200)
    
    SoC:   AM64X SR1.0 GP
    Model: TQ-Systems TQMa64xxL SoM on MBax4xxL carrier board
    DRAM:  1 GiB
    cdns3_bind: not able to bind usb device mode
    MMC:   mmc@fa10000: 0, mmc@fa00000: 1
    Loading Environment from MMC... OK
    In:    serial@2800000
    Out:   serial@2800000
    Err:   serial@2800000
    EEPROM:
      ID: TQMa6442L-P1 REV.0100
      SN: 12345678
      MAC: 02:03:04:05:06:07
      VARD CRC: 4354 (calculated 4354) [OKAY]
      HW REV:   01xx
      RAM:      type 1, 1024 MiB, no ECC
      RTC:      yes
      SPI-NOR:  yes
      eMMC:     yes
      SE:       no
      EEPROM:   type 1, 8 KiB, pagesize 32
    
    Net:   eth0: ethernet@8000000port@1
    Hit any key to stop autoboot:  0 
    => sf probe
    dma_memcpy: 00000000bdef2f00 00000000bdef2f40
    53 46 44 50 06 01 02 ff 00 06 01 10 30 00 00 ff 
    dma_memcpy: 00000000bdef9040 00000000bdef9080
    00 00 00 00 00 00 00 00 e0 88 00 00 00 00 00 00 
    dma_memcpy: 00000000bdef9280 00000000bdef92c0
    00 00 00 00 00 00 00 00 a0 86 00 00 00 00 00 00 10 5a fc bf 00 00 00 00 10 5a fc bf 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
    SF: Detected mx66u51235f with page size 1 Bytes, erase size 64 KiB, total 0 Bytes
    => sf probe
    dma_memcpy: 00000000bdef9b00 00000000bdef9b40
    10 5a fc bf 00 00 00 00 10 5a fc bf 00 00 00 00 
    SF: Detected mx66u51235f with page size 256 Bytes, erase size 64 KiB, total 64 MiB
    => sf probe
    dma_memcpy: 00000000bdefa380 00000000bdefa3c0
    53 46 44 50 06 01 02 ff 00 06 01 10 30 00 00 ff 
    dma_memcpy: 00000000bdefa5c0 00000000bdefa600
    c2 00 01 04 10 01 00 ff 84 00 01 02 c0 00 00 ff 
    dma_memcpy: 00000000bdefa640 00000000bdefa680
    e5 20 fb ff ff ff ff 1f 44 eb 08 6b 08 3b 04 bb fe ff ff ff ff ff 00 ff ff ff 44 eb 0c 20 0f 52 10 d8 00 ff d3 49 c5 00 81 df 04 e3 44 01 07 38 30 b0 30 b0 f7 bd d5 5c 4a 9e 29 ff f0 50 f9 85 
    SF: Detected mx66u51235f with page size 256 Bytes, erase size 64 KiB, total 64 MiB
    => sf probe
    dma_memcpy: 00000000bdefae80 00000000bdefaec0
    53 46 44 50 06 01 02 ff 00 06 01 10 30 00 00 ff 
    dma_memcpy: 00000000bdefb0c0 00000000bdefb100
    00 00 00 00 00 00 00 00 60 68 00 00 00 00 00 00 
    dma_memcpy: 00000000bdefb300 00000000bdefb340
    00 00 00 00 00 00 00 00 20 66 00 00 00 00 00 00 10 5a fc bf 00 00 00 00 10 5a fc bf 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
    SF: Detected mx66u51235f with page size 1 Bytes, erase size 64 KiB, total 0 Bytes
    => sf probe
    dma_memcpy: 00000000bdefa880 00000000bdefa8c0
    53 46 44 50 06 01 02 ff 00 06 01 10 30 00 00 ff 
    dma_memcpy: 00000000bdefa880 00000000bdefa8c0
    c2 00 01 04 10 01 00 ff 84 00 01 02 c0 00 00 ff 
    dma_memcpy: 00000000bdefbec0 00000000bdefbf00
    e5 20 fb ff ff ff ff 1f 44 eb 08 6b 08 3b 04 bb fe ff ff ff ff ff 00 ff ff ff 44 eb 0c 20 0f 52 10 d8 00 ff d3 49 c5 00 81 df 04 e3 44 01 07 38 30 b0 30 b0 f7 bd d5 5c 4a 9e 29 ff f0 50 f9 85 
    SF: Detected mx66u51235f with page size 256 Bytes, erase size 64 KiB, total 64 MiB
    => sf probe
    dma_memcpy: 00000000bdefa880 00000000bdefa8c0
    53 46 44 50 06 01 02 ff 00 06 01 10 30 00 00 ff 
    dma_memcpy: 00000000bdefb540 00000000bdefb580
    c2 00 01 04 10 01 00 ff 84 00 01 02 c0 00 00 ff 
    dma_memcpy: 00000000bdefa880 00000000bdefa8c0
    e5 20 fb ff ff ff ff 1f 44 eb 08 6b 08 3b 04 bb fe ff ff ff ff ff 00 ff ff ff 44 eb 0c 20 0f 52 10 d8 00 ff d3 49 c5 00 81 df 04 e3 44 01 07 38 30 b0 30 b0 f7 bd d5 5c 4a 9e 29 ff f0 50 f9 85 
    SF: Detected mx66u51235f with page size 256 Bytes, erase size 64 KiB, total 64 MiB
    => 
    

    I have extended dma_memcpy with some additional log messages showing the start and end addresses of the destination buffer (i.e. the region that invalidate_dcache_range is called on), as well as the contents of this buffer after the copy.

    The issue occurs randomly. Apparently only the first ~5 runs of `sf probe` after boot are affected. All later calls give correct results, and the buffer contents displayed in my debug messages become consistent.

    We have observed this issue on two of our boards. A third board did not exhibit the issue, but this third board has some hardware modifications causing the driver initialization etc. to differ, leading to different addresses for the kmalloced buffers, which may or may not have an effect on this issue.

    The flash used is a Macronix MX25U51245GXDI00 (the log shows an incorrect model name because of JEDEC ID reuse).

  • Hello Matthias,
    I've shared your findings with our internal SW team for review.
    I'll get back to you once I receive the feedback.
    Best,
    -Hong

  • Hello Matthias,
    The Linux SW team recommends testing the following patch to see whether it helps:
    lore.kernel.org/.../
    Best,
    -Hong

  • Hello Hong,

    the patch is already included in ti-u-boot 2021.01.

  • Hello Matthias,

    Your first solution looks to be a correct one. We should clean the dcache before the operation to prevent
    dirty cache lines from draining out into DRAM while the DMA does its work, and then invalidate afterwards to
    zap any remaining stale lines.

    I don't know why any lines would be allocated during this time frame, though. We should look into that: maybe
    we are writing near this area for some reason, causing a speculative fetch? That might explain why increasing
    ARCH_DMA_MINALIGN helps. Either way, the right order is to invalidate after the DMA operation.

    I have a patch addressing this for several drivers that I'm going to push into ti-u-boot-2021.01 in the next few days.

    Thanks,
    Andrew

  • Hello Matthias,
    There's a fix for DMA/cache operations in U-Boot:
    git.ti.com/.../
    This fix is also included in the TI Linux SDK 8.4 release.
    Could you re-run your test on your setup?
    Best,
    -Hong

  • Hello Hong,

    everything seems to be working correctly now with the new version. Thanks!