This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

AM625: Occasional SPI-NOR write timeout under high load

Part Number: AM625
Other Parts Discussed in Thread: AM6442,

On our TQMa62xx SoM, we are occasionally seeing MTD write timeouts under high CPU and memory load. We are writing to a SPI-NOR flash using mtd_debug write. The following messages are emitted to the kernel log:

[ 2178.795315] cadence-qspi fc40000.spi: Indirect write completion error (-110)
[ 2178.795351] spi-nor spi0.0: operation failed with -110

Our device tree is configured as follows:

&ospi0 {
	pinctrl-names = "default";
	pinctrl-0 = <&ospi0_pins>;
	status = "okay";

	flash@0 {
		compatible = "jedec,spi-nor";
		reg = <0x0>;
		spi-tx-bus-width = <8>;
		spi-rx-bus-width = <8>;
		spi-max-frequency = <84000000>;
		cdns,tshsl-ns = <60>;
		cdns,tsd2d-ns = <60>;
		cdns,tchsh-ns = <60>;
		cdns,tslch-ns = <60>;
		cdns,read-delay = <2>;

		partitions {
			compatible = "fixed-partitions";
			#address-cells = <1>;
			#size-cells = <1>;

			/* Filled by bootloader */
		};
	};
};

We are using a Macronix MX25U51245GXDI00 QuadSPI flash, and I have verified that 1-4-4 Page Writes (opcode 0x3E) are used.

We are seeing the issue with the latest ti-linux-rt 6.1.y (tag 09.00.00.009-rt, CONFIG_PREEMPT_RT is currently disabled in our config however).

The issue usually appears within a few minutes of writing to the flash while stressing the system using stress-ng and stressapptest.

  • We have further analyzed the issue and made the following observations:

    • Accessing the eMMC while the CPU/RAM stress and SPI-NOR writes are running makes the issue much more likely to occur. A simple loop of `cat /dev/mmcblk0 >/dev/null` is usually sufficient to trigger it in less than 2 minutes
    • The issue actually has 2 stages that always occur in the same order:
      • The first error results in the timeout log message and a single corrupted transfer. Subsequent writes are successful again.
      • When the write test is continued after the first error without rebooting until a second error is encountered, no kernel message is printed. Instead, the controllers gets stuck in an invalid state where all subsequent writes are corrupted. It seems like in this state, the first bytes of every transfer are missing (so when writing a file to the NOR flash, all contents of the flash are offset by the missing number of bytes). I've encountered 4 and 8 missing bytes in my tests.
    • The cadence-qspi runtime PM feature introduced in the latest ti-linux has an effect on this issue: When in stage 2 of the issue, and then the write tests are interrupted for long enough that the controller can be powered down, it will recover, and subsequent writes are correct again. On older kernels without runtime PM, a reboot is needed to recover.
  • Hi Matthias,

    looks like you have done some good analysis already! I'm currently tied up some some other work items I need to finish first, so it'll take me a few days before being able to look into this more closely. Please ping me next week in case I haven't been able to respond until then.

    Regards, Andreas

  • Hi Andreas,

    did you already have a chance to have a closer look at the issue?

    Regards, Matthias

  • Hi Matthias, I've not been able to prioritize this yet, it'll take me a few more days to free up some time.

    Regards, Andreas

  • Hi Matthias,

    spi-tx-bus-width = <8>;
    spi-rx-bus-width = <8>;

    The chip you are using is a quad-SPI chip, so this setup isn't really right. Can you please try the below:

    spi-tx-bus-width = <1>;
    spi-rx-bus-width = <4>;

    (Note that 'tx-bus-width' is only set to '1' as this seems to be what the driver supports at the moment, and it was said it may actually be sufficient in many/all use cases due to Flash write speed limitations --> please go with this for now, if it turns out to be a bottleneck we can update the driver).

    Furthermore, your spi-max-frequency = <84000000>; setting seems unusually high. Can you please bring this down to spi-max-frequency = <25000000>; for testing purposes. The max. allowed frequency for the OSPI peripheral module in non-PHY mode is 50MHz. If you are going beyond that (assuming the driver actually lets you and applies this high frequency you are requesting) this for sure will create issues.

    Regards, Andres

  • Hi Andreas,

    thanks for your reply.

    Regarding the bus width: It is set to 8, so that different flash chips can be used with the same Device Tree. This should not cause any issues (it only limits the maximum bus width, but the actual bus width is determined depending on the detected flash chip). The driver supports TX bus width 4 just fine (we fixed that during development of the TQMa64xxL SOM on kernel 5.10), and it is working in both mainline and TI kernels 6.1. As noted in my issue description, I've verified that 1-4-4 writes are used.

    We will try reducing the clock rate (84MHz is the maximum supported by our flash in this mode; I was not aware of this limitation of the AM62x).

    Does the AM64x have the same limitation? I have now found mention of this in the AM625 datasheet, but the same explanatory text on PHY and Tap mode doesn't exist in the AM6442 datasheet at all.

    The limitation described in the text is also inconsistent with the Timing Requirements table, which gives a minimum cycle time of 7ns (1.8V) and 6.03ns (3.3V) for SDR without training in both datasheets.

    Regards,
    Matthias

  • Regarding the bus width: It is set to 8, so that different flash chips can be used with the same Device Tree. This should not cause any issues (it only limits the maximum bus width, but the actual bus width is determined depending on the detected flash chip).

    Understood about your use case.

    The driver supports TX bus width 4 just fine (we fixed that during development of the TQMa64xxL SOM on kernel 5.10),

    (Out of curiosity) How did you fix this? Like shown below? I think this should do the trick but I never got the chance to validate in HW.

    $ git diff
    diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c
    index 957c1a4561ad..d0a398a074aa 100644
    --- a/drivers/spi/spi-cadence-quadspi.c
    +++ b/drivers/spi/spi-cadence-quadspi.c
    @@ -2405,7 +2405,7 @@ static int cqspi_probe(struct platform_device *pdev)
                    dev_err(&pdev->dev, "spi_alloc_master failed\n");
                    return -ENOMEM;
            }
    -       master->mode_bits = SPI_RX_QUAD | SPI_RX_DUAL;
    +       master->mode_bits = SPI_RX_QUAD | SPI_RX_DUAL | SPI_TX_QUAD | SPI_TX_DUAL;
            master->mem_ops = &cqspi_mem_ops;
            master->mem_caps = &cqspi_mem_caps;
            master->dev.of_node = pdev->dev.of_node;

    Does the AM64x have the same limitation? I have now found mention of this in the AM625 datasheet, but the same explanatory text on PHY and Tap mode doesn't exist in the AM6442 datasheet at all.

    It's the same OSPI IP, so asides from occasional slight differences in timing/electrical parameters based on device/char results that usually means it operates and works essentially the same across device families. Device datasheets (as well as TRMs) are not always updated all at the same time, leading to the scenario where some documents may have more information at a given point in time.

    We will try reducing the clock rate (84MHz is the maximum supported by our flash in this mode; I was not aware of this limitation of the AM62x).

    Yes please do that, also with an intentional "slow" frequency like 25MHz as suggested, to get some more data points here.

    The limitation described in the text is also inconsistent with the Timing Requirements table, which gives a minimum cycle time of 7ns (1.8V) and 6.03ns (3.3V) for SDR without training in both datasheets.

    What you are referring to here is known as "OSPI Mode Without PHY Training". While the HW supports this, this is currently not supported by the Linux drivers. In terms of PHY mode operation, my understanding is it currently (as of SDK v9.0) only supports 8-bit DDR mode w/ PHY training, which is very different from what you are looking for.

    Regards, Andreas

  • (Out of curiosity) How did you fix this? Like shown below? I think this should do the trick but I never got the chance to validate in HW.

    In the spimem subsystem, SPI_TX_OCTAL implies SPI_TX_QUAD and SPI_TX_DUAL (handled by spi_check_buswidth_req() in drivers/spi/spi-mem.c), so no such fix was necessary, and the driver has always tried to use 4 TX lanes if the configured bus width and the flash allowed it. This resulted in data corruption in earlier kernels, as the driver did not actually handle these operations correctly; several patches to fix this were applied to mainline Linux in April 2022 (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/spi/spi-cadence-quadspi.c) and have made it to ti-linux-kernel since.

    Regards,
    Matthias

  • Hi Andreas,

    Yes please do that, also with an intentional "slow" frequency like 25MHz as suggested, to get some more data points here.

    I have now run the tests you asked for:

    • Reducing the frequency to 25MHz did not have any effect, we're still seeing the same error within less than 5 minutes of testing
    • Reducing the frequency to 25MHz and setting tx-bus-width to 1 did not any effect
    • With the same kernel and the original setup (1-4-4 writes, 84MHz), we're not seeing any errors on the AM6442 after 30+ minutes of testing; it seems that only the AM62x is affected.

    Regards,
    Matthias

  • Hi Matthias,

    • Reducing the frequency to 25MHz did not have any effect, we're still seeing the same error within less than 5 minutes of testing
    • Reducing the frequency to 25MHz and setting tx-bus-width to 1 did not any effect

    Thanks for running these experiments, I think this will probably rule out some low-level HW issues, but of course we cannot be 100% sure.

    With the same kernel and the original setup (1-4-4 writes, 84MHz), we're not seeing any errors on the AM6442 after 30+ minutes of testing; it seems that only the AM62x is affected.

    This is a good experiment/comparison. One key difference between AM62 and AM64 is that the former has 4x A53 cores, as opposed to two. Perhaps the multi-core processing/kernel scheduling on AM64 has an impact on whatever testing you are doing. I wouldn't really expect it here, but it would be good to run one more experiment as follows. On your AM62 system, isolate the CPU cores 2 and 3 from the Kernel scheduling, using a Kernel command line parameter of optargs='isolcpus=2-3' to make it look a bit like AM64. You can add this parameter to the optargs ENV variable in U-Boot before starting your boot progress, like env set optargs 'isolcpus=2-3', and then checking the Kernel command line in the boot log to make sure this option got applied.

    Also, I suppose on AM62x and AM64 you ran the exact same stress test/load when you evaluated the behavior across platform. Can you share exactly how this test is run? (command parameters, sequencing, etc.). Perhaps this gives us some clues on how to further analyze this.

    Regards, Andreas

  • Hi Andreas,

    I used the following test:

    stress-ng --cpu-load 50 --cpu 2 --timeout 31536000 &
    stressapptest -W -s 31536000 -M 128 -m 1 -C 1 -i 1 &
    while sleep 1; do dd if=/dev/mmcblk0 of=/dev/null bs=4M count=128; done &
    
    while true; do
    	dd if=/dev/urandom of=input.bin bs=128K count=1
    	mtd_debug erase /dev/mtd0 0 131072
    	mtd_debug write /dev/mtd0 0 131072 input.bin
    	mtd_debug read /dev/mtd0 0 131072 output.bin
    	cmp input.bin output.bin
    done
    

    Limiting the test to two cores using isolcpus makes the error much less likely to occur on the AM62x, I've only seen it once in one hour of testing.

    Regards,
    Matthias

  • Hi Matthias,

    ok thanks, that's good input. Let me think about this some and also talk to some folks here internally about possible next steps. Will get back here in a few days.

    Regards, Andreas

  • Matthias,

    I did some more research on this, and it looks like the issue may be related to https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1033521/tda4vm-an-indirect-write-completion-error-occurred-in-the-linux-ospi-driver/3863014

    Digging deeper, there's a ticket in our system (LCPD-24144) that fixed this issue on our previous SDK series (v8.x) based on ti-linux-5.10.y with the following commit:

    $ git show 027f03a8512086e5ef05dc4e4ff53b2628848f95
    commit 027f03a8512086e5ef05dc4e4ff53b2628848f95
    Author: Pratyush Yadav <p.yadav@ti.com>
    Date:   Tue Feb 1 23:24:35 2022 +0530
    
        spi: cadence-quadspi: flush posted register writes before INDAC access
    
        cqspi_indirect_read_execute() and cqspi_indirect_write_execute() first
        set the enable bit on APB region and then start reading/writing to the
        AHB region. On TI K3 SoCs these regions lie on different endpoints. This
        means that the order of the two operations is not guaranteed, and they
        might be reordered at the interconnect level.
    
        It is possible for the AHB write to be executed before the APB write to
        enable the indirect controller, causing the transaction to be invalid
        and the write erroring out. Read back the APB region write before
        accessing the AHB region to make sure the write got flushed and the race
        condition is eliminated.
    
        Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
        Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
    
    diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c
    index 53ba8502b988..a19292818006 100644
    --- a/drivers/spi/spi-cadence-quadspi.c
    +++ b/drivers/spi/spi-cadence-quadspi.c
    @@ -1294,6 +1294,7 @@ static int cqspi_indirect_read_execute(struct cqspi_flash_pdata *f_pdata,
            reinit_completion(&cqspi->transfer_complete);
            writel(CQSPI_REG_INDIRECTRD_START_MASK,
                   reg_base + CQSPI_REG_INDIRECTRD);
    +       readl(reg_base + CQSPI_REG_INDIRECTRD); /* Flush posted write. */
    
            while (remaining > 0) {
                    if (!wait_for_completion_timeout(&cqspi->transfer_complete,
    @@ -1432,6 +1433,8 @@ static int cqspi_indirect_write_execute(struct cqspi_flash_pdata *f_pdata,
            reinit_completion(&cqspi->transfer_complete);
            writel(CQSPI_REG_INDIRECTWR_START_MASK,
                   reg_base + CQSPI_REG_INDIRECTWR);
    +       readl(reg_base + CQSPI_REG_INDIRECTWR); /* Flush posted write. */
    +
            /*
             * As per 66AK2G02 TRM SPRUHY8F section 11.15.5.3 Indirect Access
             * Controller programming sequence, couple of cycles of
    

    Can you please cherry-pick this commit on top of your ti-linux-6.1.y tree as follows and see if this resolves your issue:

    a0797059@dasso:~/git/linux (ti-linux-6.1.y)
    $ git cherry-pick 027f03a8512086e5ef05dc4e4ff53b2628848f95
    Auto-merging drivers/spi/spi-cadence-quadspi.c
    [ti-linux-6.1.y 0a8fd4bd3d90] spi: cadence-quadspi: flush posted register writes before INDAC access
     Author: Pratyush Yadav <p.yadav@ti.com>
     Date: Tue Feb 1 23:24:35 2022 +0530
     1 file changed, 3 insertions(+)

    I'll do some more digging but it appears like this commit has been missed during Linux migration.

    Regards, Andreas

  • Hi Andreas,

    I cherry-picked the patch, and so far, it looks like it solved our issue! Our stress test has been running for 6 hours without errors, compared to 5-10 errors / hour without the patch.

    Thanks a lot,
    Matthias

  • Hi Matthias,

    thanks for the feedback, glad to hear that. The team is working to make sure this patch will make it into our ti-linux-6.1.y integration tree, and with this into the next SDK release. Sorry again for the trouble with this one, this should have been completely avoidable on our end.

    Regards, Andreas