AM62L: SPI communication issues

Part Number: AM62L

AM62L, RT Linux SDK 11.02.08.02, kernel 6.12.57. SPI clock 5.5 MHz, DMA enabled, 1 KB transferred every 2.5 ms.
SPI application thread priority: 55.
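As a rough sanity check on this setup, the time the data alone spends on the wire each period can be estimated. This is a minimal sketch that assumes 1 KB = 1024 bytes, 8 bits per byte, and no inter-word gaps or chip-select overhead. The helper names are illustrative only.

```c
#include <stdint.h>

/* Estimated time (in microseconds) to clock `bytes` out on the SPI bus at
 * `sclk_hz`, assuming 8 bits per byte and no gaps between words. */
double spi_wire_time_us(uint32_t bytes, uint32_t sclk_hz)
{
    return (double)bytes * 8.0 * 1e6 / (double)sclk_hz;
}

/* Fraction of each period during which the bus is busy shifting data. */
double spi_bus_utilization(uint32_t bytes, uint32_t sclk_hz, double period_us)
{
    return spi_wire_time_us(bytes, sclk_hz) / period_us;
}
```

With 1024 bytes at 5.5 MHz this gives about 1489 µs of pure transfer time, roughly 60 % of the 2.5 ms period. DMA setup, interrupt handling, and scheduling latency all come on top of that.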

Disabling DMA for SPI results in very high CPU usage, so I enabled DMA. However, when sending and receiving data via ioctl(), the execution time varies greatly and the jitter is significant. Following the E2E thread "AM62L-EVSE-DEV-EVM: dmaengine: ti: k3-udma: 1 second polling delay" and the ti-linux-kernel commit "spi: spi-omap2-mcspi: Use EOW interrupt for completion when DMA enabled", I modified the SPI driver. The changes are shown in the attached diff file.

Before modification: execution time 1612 to 3602 µs, jitter -764 to +1930 µs.  
After modification: execution time 1641 to 2093 µs, jitter -248 to +200 µs.  
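The ranges above are minimum/maximum values over a test run. A minimal way to accumulate such statistics, assuming one sample per 2.5 ms cycle, is sketched below. The struct and function names are hypothetical and are not taken from the attached test program.

```c
#include <stdint.h>

/* Running min/max of a per-cycle measurement, in microseconds. */
struct range_stat {
    int64_t min_us;
    int64_t max_us;
    uint64_t samples;
};

void range_stat_init(struct range_stat *s)
{
    s->min_us = INT64_MAX;
    s->max_us = INT64_MIN;
    s->samples = 0;
}

void range_stat_add(struct range_stat *s, int64_t value_us)
{
    if (value_us < s->min_us)
        s->min_us = value_us;
    if (value_us > s->max_us)
        s->max_us = value_us;
    s->samples++;
}

/* Peak-to-peak spread; only meaningful after at least one sample. */
int64_t range_stat_spread(const struct range_stat *s)
{
    return s->max_us - s->min_us;
}
```

Feeding in the reported jitter extremes gives a peak-to-peak spread of 2694 µs before the modification (1930 - (-764)) and 448 µs after it (200 - (-248)). The execution-time spread shrinks from 1990 µs to 452 µs.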

The performance after the modification basically meets our requirements. However, with the modified driver, SPI communication is very likely to freeze the whole system: it becomes completely unresponsive and only a power cycle recovers it. This happens regardless of which core the test program runs on.

The SPI test program I used is attached. The execution time is measured by `time_exec*`, and the jitter is measured by `time_jitter*`.
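For readers without the attachment, the sketch below shows one common way to measure execution time and jitter for a 2.5 ms periodic SPI transfer from user space. This is an assumption about the methodology, not the attached program. The thread sleeps until an absolute deadline with `clock_nanosleep(TIMER_ABSTIME)`. Jitter is taken as the measured period minus the nominal period, which, unlike a pure wake-up latency, can be negative. Execution time is the duration of the `SPI_IOC_MESSAGE` ioctl. The helper names and period constant are illustrative.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

#define PERIOD_NS 2500000L /* 2.5 ms cycle, as in the test setup */

/* Difference a - b in nanoseconds. */
int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * 1000000000LL +
           (a->tv_nsec - b->tv_nsec);
}

/* Advance an absolute deadline by ns, keeping tv_nsec normalized. */
void ts_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec++;
    }
}

/* Jitter as measured period minus nominal period (may be negative). */
int64_t period_jitter_ns(const struct timespec *start,
                         const struct timespec *prev_start)
{
    return ts_diff_ns(start, prev_start) - PERIOD_NS;
}

/* Switch the calling thread (Linux semantics) to SCHED_FIFO; needs CAP_SYS_NICE. */
int use_fifo_priority(int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/*
 * One measurement cycle (hypothetical helper): sleep until the absolute
 * deadline, record period jitter, then time a full-duplex spidev transfer.
 * The caller seeds *prev_start one period before the first deadline.
 */
int spi_cycle(int fd, struct timespec *deadline, struct timespec *prev_start,
              const uint8_t *tx, uint8_t *rx, uint32_t len,
              int64_t *jitter_ns, int64_t *exec_ns)
{
    struct spi_ioc_transfer xfer;
    struct timespec start, done;
    int ret;

    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    *jitter_ns = period_jitter_ns(&start, prev_start);
    *prev_start = start;

    memset(&xfer, 0, sizeof(xfer));
    xfer.tx_buf = (uintptr_t)tx;
    xfer.rx_buf = (uintptr_t)rx;
    xfer.len = len;

    /* Execution time covers only the blocking SPI_IOC_MESSAGE call. */
    ret = ioctl(fd, SPI_IOC_MESSAGE(1), &xfer);
    clock_gettime(CLOCK_MONOTONIC, &done);
    *exec_ns = ts_diff_ns(&done, &start);

    ts_add_ns(deadline, PERIOD_NS);
    return ret;
}
```

A caller would open a spidev node with `open(..., O_RDWR)`, call `use_fifo_priority(55)`, initialize `deadline` from `clock_gettime(CLOCK_MONOTONIC, ...)`, and then call `spi_cycle()` in a loop with `len = 1024`, collecting min/max statistics of both outputs. Absolute deadlines keep the period from drifting when a cycle overruns. This matters here because the worst-case execution time before the modification (3602 µs) exceeded the 2.5 ms period.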

Test code: spi_mcu_test.tar.gz

Driver diff: spi-omap2-mcspi_6.12.57.diff 

  • If the SPI test program is not assigned the FIFO scheduling policy, the system does not hang; however, the test program's CPU usage is then extremely high. I enabled kernel debugging features to capture relevant logs in this state.

    The content captured by `taskset -c 1 perf record -g -C 0` is provided in `perf_report_tree.md`.

    The content of `trace_stat/function0` captured via `function_profile_enabled` is provided in `trace_stat_function0.md`.

    The kernel did not report any related error logs.

    captured.tar

  • Compiled test binary:

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/908/spi_5F00_mcu_5F00_arm64.elf

    The updated `spi-omap2-mcspi.c`:

    // SPDX-License-Identifier: GPL-2.0-or-later
    /*
     * OMAP2 McSPI controller driver
     *
     * Copyright (C) 2005, 2006 Nokia Corporation
     * Author:	Samuel Ortiz <samuel.ortiz@nokia.com> and
     *		Juha Yrjola <juha.yrjola@nokia.com>
     */
    
    #include <linux/kernel.h>
    #include <linux/interrupt.h>
    #include <linux/module.h>
    #include <linux/device.h>
    #include <linux/delay.h>
    #include <linux/dma-mapping.h>
    #include <linux/dmaengine.h>
    #include <linux/pinctrl/consumer.h>
    #include <linux/platform_device.h>
    #include <linux/err.h>
    #include <linux/clk.h>
    #include <linux/io.h>
    #include <linux/slab.h>
    #include <linux/pm_runtime.h>
    #include <linux/of.h>
    #include <linux/of_device.h>
    #include <linux/gcd.h>
    
    #include <linux/spi/spi.h>
    
    #include "internals.h"
    
    #include <linux/platform_data/spi-omap2-mcspi.h>
    
    #define OMAP2_MCSPI_MAX_FREQ		48000000
    #define OMAP2_MCSPI_MAX_DIVIDER		4096
    #define OMAP2_MCSPI_MAX_FIFODEPTH	64
    #define OMAP2_MCSPI_MAX_FIFOWCNT	0xFFFF
    #define SPI_AUTOSUSPEND_TIMEOUT		2000
    
    #define OMAP2_MCSPI_REVISION		0x00
    #define OMAP2_MCSPI_SYSSTATUS		0x14
    #define OMAP2_MCSPI_IRQSTATUS		0x18
    #define OMAP2_MCSPI_IRQENABLE		0x1c
    #define OMAP2_MCSPI_WAKEUPENABLE	0x20
    #define OMAP2_MCSPI_SYST		0x24
    #define OMAP2_MCSPI_MODULCTRL		0x28
    #define OMAP2_MCSPI_XFERLEVEL		0x7c
    
    /* per-channel banks, 0x14 bytes each, first is: */
    #define OMAP2_MCSPI_CHCONF0		0x2c
    #define OMAP2_MCSPI_CHSTAT0		0x30
    #define OMAP2_MCSPI_CHCTRL0		0x34
    #define OMAP2_MCSPI_TX0			0x38
    #define OMAP2_MCSPI_RX0			0x3c
    
    /* per-register bitmasks: */
    #define OMAP2_MCSPI_IRQSTATUS_EOW	BIT(17)
    
    #define OMAP2_MCSPI_MODULCTRL_SINGLE	BIT(0)
    #define OMAP2_MCSPI_MODULCTRL_MS	BIT(2)
    #define OMAP2_MCSPI_MODULCTRL_STEST	BIT(3)
    
    #define OMAP2_MCSPI_CHCONF_PHA		BIT(0)
    #define OMAP2_MCSPI_CHCONF_POL		BIT(1)
    #define OMAP2_MCSPI_CHCONF_CLKD_MASK	(0x0f << 2)
    #define OMAP2_MCSPI_CHCONF_EPOL		BIT(6)
    #define OMAP2_MCSPI_CHCONF_WL_MASK	(0x1f << 7)
    #define OMAP2_MCSPI_CHCONF_TRM_RX_ONLY	BIT(12)
    #define OMAP2_MCSPI_CHCONF_TRM_TX_ONLY	BIT(13)
    #define OMAP2_MCSPI_CHCONF_TRM_MASK	(0x03 << 12)
    #define OMAP2_MCSPI_CHCONF_DMAW		BIT(14)
    #define OMAP2_MCSPI_CHCONF_DMAR		BIT(15)
    #define OMAP2_MCSPI_CHCONF_DPE0		BIT(16)
    #define OMAP2_MCSPI_CHCONF_DPE1		BIT(17)
    #define OMAP2_MCSPI_CHCONF_IS		BIT(18)
    #define OMAP2_MCSPI_CHCONF_TURBO	BIT(19)
    #define OMAP2_MCSPI_CHCONF_FORCE	BIT(20)
    #define OMAP2_MCSPI_CHCONF_FFET		BIT(27)
    #define OMAP2_MCSPI_CHCONF_FFER		BIT(28)
    #define OMAP2_MCSPI_CHCONF_CLKG		BIT(29)
    
    #define OMAP2_MCSPI_CHSTAT_RXS		BIT(0)
    #define OMAP2_MCSPI_CHSTAT_TXS		BIT(1)
    #define OMAP2_MCSPI_CHSTAT_EOT		BIT(2)
    #define OMAP2_MCSPI_CHSTAT_TXFFE	BIT(3)
    
    #define OMAP2_MCSPI_CHCTRL_EN		BIT(0)
    #define OMAP2_MCSPI_CHCTRL_EXTCLK_MASK	(0xff << 8)
    
    #define OMAP2_MCSPI_WAKEUPENABLE_WKEN	BIT(0)
    
    /* We have 2 DMA channels per CS, one for RX and one for TX */
    struct omap2_mcspi_dma {
    	struct dma_chan *dma_tx;
    	struct dma_chan *dma_rx;
    	char dma_rx_ch_name[14];
    	char dma_tx_ch_name[14];
    };
    
    /* use PIO for small transfers, avoiding DMA setup/teardown overhead and
     * cache operations; better heuristics consider wordsize and bitrate.
     */
    #define DMA_MIN_BYTES			(12)
    
    
    /*
     * Used for context save and restore, structure members to be updated whenever
     * corresponding registers are modified.
     */
    struct omap2_mcspi_regs {
    	u32 modulctrl;
    	u32 wakeupenable;
    	struct list_head cs;
    };
    
    struct omap2_mcspi {
    	struct completion	txrxdone;
    	struct spi_controller	*ctlr;
    	/* Virtual base address of the controller */
    	void __iomem		*base;
    	unsigned long		phys;
    	/* SPI1 has 4 channels, while SPI2 has 2 */
    	struct omap2_mcspi_dma	*dma_channels;
    	struct device		*dev;
    	struct omap2_mcspi_regs ctx;
    	struct clk		*ref_clk;
    	int			fifo_depth;
    	bool			target_aborted;
    	unsigned int		pin_dir:1;
    	size_t			max_xfer_len;
    	u32			ref_clk_hz;
    	bool			use_multi_mode;
    	bool			last_msg_kept_cs;
    };
    
    struct omap2_mcspi_cs {
    	void __iomem		*base;
    	unsigned long		phys;
    	int			word_len;
    	u16			mode;
    	struct list_head	node;
    	/* Context save and restore shadow register */
    	u32			chconf0, chctrl0;
    };
    
    static inline void mcspi_write_reg(struct spi_controller *ctlr,
    		int idx, u32 val)
    {
    	struct omap2_mcspi *mcspi = spi_controller_get_devdata(ctlr);
    
    	writel_relaxed(val, mcspi->base + idx);
    }
    
    static inline u32 mcspi_read_reg(struct spi_controller *ctlr, int idx)
    {
    	struct omap2_mcspi *mcspi = spi_controller_get_devdata(ctlr);
    
    	return readl_relaxed(mcspi->base + idx);
    }
    
    static inline void mcspi_write_cs_reg(const struct spi_device *spi,
    		int idx, u32 val)
    {
    	struct omap2_mcspi_cs	*cs = spi->controller_state;
    
    	writel_relaxed(val, cs->base + idx);
    }
    
    static inline u32 mcspi_read_cs_reg(const struct spi_device *spi, int idx)
    {
    	struct omap2_mcspi_cs	*cs = spi->controller_state;
    
    	return readl_relaxed(cs->base + idx);
    }
    
    static inline u32 mcspi_cached_chconf0(const struct spi_device *spi)
    {
    	struct omap2_mcspi_cs *cs = spi->controller_state;
    
    	return cs->chconf0;
    }
    
    static inline void mcspi_write_chconf0(const struct spi_device *spi, u32 val)
    {
    	struct omap2_mcspi_cs *cs = spi->controller_state;
    
    	cs->chconf0 = val;
    	mcspi_write_cs_reg(spi, OMAP2_MCSPI_CHCONF0, val);
    	mcspi_read_cs_reg(spi, OMAP2_MCSPI_CHCONF0);
    }
    
    static inline int mcspi_bytes_per_word(int word_len)
    {
    	if (word_len <= 8)
    		return 1;
    	else if (word_len <= 16)
    		return 2;
    	else /* word_len <= 32 */
    		return 4;
    }
    
    static void omap2_mcspi_set_dma_req(const struct spi_device *spi,
    		int is_read, int enable)
    {
    	u32 l, rw;
    
    	l = mcspi_cached_chconf0(spi);
    
    	if (is_read) /* 1 is read, 0 write */
    		rw = OMAP2_MCSPI_CHCONF_DMAR;
    	else
    		rw = OMAP2_MCSPI_CHCONF_DMAW;
    
    	if (enable)
    		l |= rw;
    	else
    		l &= ~rw;
    
    	mcspi_write_chconf0(spi, l);
    }
    
    static void omap2_mcspi_set_enable(const struct spi_device *spi, int enable)
    {
    	struct omap2_mcspi_cs *cs = spi->controller_state;
    	u32 l;
    
    	l = cs->chctrl0;
    	if (enable)
    		l |= OMAP2_MCSPI_CHCTRL_EN;
    	else
    		l &= ~OMAP2_MCSPI_CHCTRL_EN;
    	cs->chctrl0 = l;
    	mcspi_write_cs_reg(spi, OMAP2_MCSPI_CHCTRL0, cs->chctrl0);
    	/* Flush posted writes */
    	mcspi_read_cs_reg(spi, OMAP2_MCSPI_CHCTRL0);
    }
    
    static void omap2_mcspi_set_cs(struct spi_device *spi, bool enable)
    {
    	struct omap2_mcspi *mcspi = spi_controller_get_devdata(spi->controller);
    	u32 l;
    
    	/* The controller handles the inverted chip selects
    	 * using the OMAP2_MCSPI_CHCONF_EPOL bit so revert
    	 * the inversion from the core spi_set_cs function.
    	 */
    	if (spi->mode & SPI_CS_HIGH)
    		enable = !enable;
    
    	if (spi->controller_state) {
    		int err = pm_runtime_resume_and_get(mcspi->dev);
    		if (err < 0) {
    			dev_err(mcspi->dev, "failed to get sync: %d\n", err);
    			return;
    		}
    
    		l = mcspi_cached_chconf0(spi);
    
    		/* Only enable chip select manually if single mode is used */
    		if (mcspi->use_multi_mode) {
    			l &= ~OMAP2_MCSPI_CHCONF_FORCE;
    		} else {
    			if (enable)
    				l &= ~OMAP2_MCSPI_CHCONF_FORCE;
    			else
    				l |= OMAP2_MCSPI_CHCONF_FORCE;
    		}
    
    		mcspi_write_chconf0(spi, l);
    
    		pm_runtime_mark_last_busy(mcspi->dev);
    		pm_runtime_put_autosuspend(mcspi->dev);
    	}
    }
    
    static void omap2_mcspi_set_mode(struct spi_controller *ctlr)
    {
    	struct omap2_mcspi	*mcspi = spi_controller_get_devdata(ctlr);
    	struct omap2_mcspi_regs	*ctx = &mcspi->ctx;
    	u32 l;
    
    	/*
    	 * Choose host or target mode
    	 */
    	l = mcspi_read_reg(ctlr, OMAP2_MCSPI_MODULCTRL);
    	l &= ~(OMAP2_MCSPI_MODULCTRL_STEST);
    	if (spi_controller_is_target(ctlr)) {
    		l |= (OMAP2_MCSPI_MODULCTRL_MS);
    	} else {
    		l &= ~(OMAP2_MCSPI_MODULCTRL_MS);
    
    		/* Enable single mode if needed */
    		if (mcspi->use_multi_mode)
    			l &= ~OMAP2_MCSPI_MODULCTRL_SINGLE;
    		else
    			l |= OMAP2_MCSPI_MODULCTRL_SINGLE;
    	}
    	mcspi_write_reg(ctlr, OMAP2_MCSPI_MODULCTRL, l);
    
    	ctx->modulctrl = l;
    }
    
    static void omap2_mcspi_set_fifo(const struct spi_device *spi,
    				struct spi_transfer *t, int enable)
    {
    	struct spi_controller *ctlr = spi->controller;
    	struct omap2_mcspi_cs *cs = spi->controller_state;
    	struct omap2_mcspi *mcspi;
    	unsigned int wcnt;
    	int max_fifo_depth, bytes_per_word;
    	u32 chconf, xferlevel;
    
    	mcspi = spi_controller_get_devdata(ctlr);
    
    	chconf = mcspi_cached_chconf0(spi);
    	if (enable) {
    		bytes_per_word = mcspi_bytes_per_word(cs->word_len);
    		if (t->len % bytes_per_word != 0)
    			goto disable_fifo;
    
    		if (t->rx_buf != NULL && t->tx_buf != NULL)
    			max_fifo_depth = OMAP2_MCSPI_MAX_FIFODEPTH / 2;
    		else
    			max_fifo_depth = OMAP2_MCSPI_MAX_FIFODEPTH;
    
    		wcnt = t->len / bytes_per_word;
    		if (wcnt > OMAP2_MCSPI_MAX_FIFOWCNT)
    			goto disable_fifo;
    
    		xferlevel = wcnt << 16;
    		if (t->rx_buf != NULL) {
    			chconf |= OMAP2_MCSPI_CHCONF_FFER;
    			xferlevel |= (bytes_per_word - 1) << 8;
    		}
    
    		if (t->tx_buf != NULL) {
    			chconf |= OMAP2_MCSPI_CHCONF_FFET;
    			xferlevel |= bytes_per_word - 1;
    		}
    
    		mcspi_write_reg(ctlr, OMAP2_MCSPI_XFERLEVEL, xferlevel);
    		mcspi_write_chconf0(spi, chconf);
    		mcspi->fifo_depth = max_fifo_depth;
    
    		return;
    	}
    
    disable_fifo:
    	if (t->rx_buf != NULL)
    		chconf &= ~OMAP2_MCSPI_CHCONF_FFER;
    
    	if (t->tx_buf != NULL)
    		chconf &= ~OMAP2_MCSPI_CHCONF_FFET;
    
    	mcspi_write_chconf0(spi, chconf);
    	mcspi->fifo_depth = 0;
    }
    
    static int mcspi_wait_for_reg_bit(void __iomem *reg, unsigned long bit)
    {
    	unsigned long timeout;
    
    	timeout = jiffies + msecs_to_jiffies(1000);
    	while (!(readl_relaxed(reg) & bit)) {
    		if (time_after(jiffies, timeout)) {
    			if (!(readl_relaxed(reg) & bit))
    				return -ETIMEDOUT;
    			else
    				return 0;
    		}
    		cpu_relax();
    	}
    	return 0;
    }
    
    static int mcspi_wait_for_completion(struct omap2_mcspi *mcspi,
    				     struct completion *x)
    {
    	if (spi_controller_is_target(mcspi->ctlr)) {
    		if (wait_for_completion_interruptible(x) ||
    		    mcspi->target_aborted)
    			return -EINTR;
    	} else {
    		wait_for_completion(x);
    	}
    
    	return 0;
    }
    
    static void omap2_mcspi_tx_dma(struct spi_device *spi,
    				struct spi_transfer *xfer,
    				struct dma_slave_config cfg)
    {
    	struct omap2_mcspi	*mcspi;
    	struct omap2_mcspi_dma  *mcspi_dma;
    	struct dma_async_tx_descriptor *tx;
    
    	mcspi = spi_controller_get_devdata(spi->controller);
    	mcspi_dma = &mcspi->dma_channels[spi_get_chipselect(spi, 0)];
    
    	dmaengine_slave_config(mcspi_dma->dma_tx, &cfg);
    
    	tx = dmaengine_prep_slave_sg(mcspi_dma->dma_tx, xfer->tx_sg.sgl,
    				     xfer->tx_sg.nents, DMA_MEM_TO_DEV, DMA_CTRL_ACK);
    	if (tx) {
    		dmaengine_submit(tx);
    	} else {
    		/* FIXME: fall back to PIO? */
    	}
    	dma_async_issue_pending(mcspi_dma->dma_tx);
    	omap2_mcspi_set_dma_req(spi, 0, 1);
    }
    
    static unsigned
    omap2_mcspi_rx_dma(struct spi_device *spi, struct spi_transfer *xfer,
    				struct dma_slave_config cfg,
    				unsigned es)
    {
    	struct omap2_mcspi	*mcspi;
    	struct omap2_mcspi_dma  *mcspi_dma;
    	unsigned int		count, transfer_reduction = 0;
    	struct scatterlist	*sg_out[2];
    	int			nb_sizes = 0, out_mapped_nents[2], ret, x;
    	size_t			sizes[2];
    	u32			l;
    	int			elements = 0;
    	int			word_len, element_count;
    	struct omap2_mcspi_cs	*cs = spi->controller_state;
    	void __iomem		*chstat_reg = cs->base + OMAP2_MCSPI_CHSTAT0;
    	struct dma_async_tx_descriptor *tx;
    	dma_cookie_t dma_rx_cookie = 0;
    	struct dma_tx_state mcspi_dma_rxstate;
    	enum dma_status dma_status;
    
    	mcspi = spi_controller_get_devdata(spi->controller);
    	mcspi_dma = &mcspi->dma_channels[spi_get_chipselect(spi, 0)];
    	count = xfer->len;
    
    	/*
    	 *  In the "End-of-Transfer Procedure" section for DMA RX in OMAP35x TRM
    	 *  it mentions reducing DMA transfer length by one element in host
    	 *  normal mode.
    	 */
    	if (mcspi->fifo_depth == 0)
    		transfer_reduction = es;
    
    	word_len = cs->word_len;
    	l = mcspi_cached_chconf0(spi);
    
    	if (word_len <= 8)
    		element_count = count;
    	else if (word_len <= 16)
    		element_count = count >> 1;
    	else /* word_len <= 32 */
    		element_count = count >> 2;
    
    
    	dmaengine_slave_config(mcspi_dma->dma_rx, &cfg);
    
    	/*
    	 *  Reduce DMA transfer length by one more if McSPI is
    	 *  configured in turbo mode.
    	 */
    	if ((l & OMAP2_MCSPI_CHCONF_TURBO) && mcspi->fifo_depth == 0)
    		transfer_reduction += es;
    
    	if (transfer_reduction) {
    		/* Split sgl into two. The second sgl won't be used. */
    		sizes[0] = count - transfer_reduction;
    		sizes[1] = transfer_reduction;
    		nb_sizes = 2;
    	} else {
    		/*
    		 * Don't bother splitting the sgl. This essentially
    		 * clones the original sgl.
    		 */
    		sizes[0] = count;
    		nb_sizes = 1;
    	}
    
    	ret = sg_split(xfer->rx_sg.sgl, xfer->rx_sg.nents, 0, nb_sizes,
    		       sizes, sg_out, out_mapped_nents, GFP_KERNEL);
    
    	if (ret < 0) {
    		dev_err(&spi->dev, "sg_split failed\n");
    		return 0;
    	}
    
    	tx = dmaengine_prep_slave_sg(mcspi_dma->dma_rx, sg_out[0],
    				     out_mapped_nents[0], DMA_DEV_TO_MEM, DMA_CTRL_ACK);
    
    	if (tx) {
    		dma_rx_cookie = dmaengine_submit(tx);
    	} else {
    		/* FIXME: fall back to PIO? */
    	}
    
    	dma_async_issue_pending(mcspi_dma->dma_rx);
    	omap2_mcspi_set_dma_req(spi, 1, 1);
    
    	ret = mcspi_wait_for_completion(mcspi, &mcspi->txrxdone);
    
    	/*
    	 * Before disabling RX DMA we need to confirm whether DMA RX is complete.
    	 * This polling completes on the first attempt in most cases.
    	 * Note: this loop has no timeout and does not explicitly yield the CPU.
    	 */
    	do {
    		dma_status = dmaengine_tx_status(mcspi_dma->dma_rx, dma_rx_cookie,
    						 &mcspi_dma_rxstate);
    	} while (dma_status != DMA_COMPLETE);
    
    	omap2_mcspi_set_dma_req(spi, 1, 0);
    	if (ret || mcspi->target_aborted) {
    		dmaengine_terminate_sync(mcspi_dma->dma_rx);
    		return 0;
    	}
    
    	for (x = 0; x < nb_sizes; x++)
    		kfree(sg_out[x]);
    
    	if (mcspi->fifo_depth > 0)
    		return count;
    
    	/*
    	 *  Due to the DMA transfer length reduction the missing bytes must
    	 *  be read manually to receive all of the expected data.
    	 */
    	omap2_mcspi_set_enable(spi, 0);
    
    	elements = element_count - 1;
    
    	if (l & OMAP2_MCSPI_CHCONF_TURBO) {
    		elements--;
    
    		if (!mcspi_wait_for_reg_bit(chstat_reg,
    					    OMAP2_MCSPI_CHSTAT_RXS)) {
    			u32 w;
    
    			w = mcspi_read_cs_reg(spi, OMAP2_MCSPI_RX0);
    			if (word_len <= 8)
    				((u8 *)xfer->rx_buf)[elements++] = w;
    			else if (word_len <= 16)
    				((u16 *)xfer->rx_buf)[elements++] = w;
    			else /* word_len <= 32 */
    				((u32 *)xfer->rx_buf)[elements++] = w;
    		} else {
    			int bytes_per_word = mcspi_bytes_per_word(word_len);
    			dev_err(&spi->dev, "DMA RX penultimate word empty\n");
    			count -= (bytes_per_word << 1);
    			omap2_mcspi_set_enable(spi, 1);
    			return count;
    		}
    	}
    	if (!mcspi_wait_for_reg_bit(chstat_reg, OMAP2_MCSPI_CHSTAT_RXS)) {
    		u32 w;
    
    		w = mcspi_read_cs_reg(spi, OMAP2_MCSPI_RX0);
    		if (word_len <= 8)
    			((u8 *)xfer->rx_buf)[elements] = w;
    		else if (word_len <= 16)
    			((u16 *)xfer->rx_buf)[elements] = w;
    		else /* word_len <= 32 */
    			((u32 *)xfer->rx_buf)[elements] = w;
    	} else {
    		dev_err(&spi->dev, "DMA RX last word empty\n");
    		count -= mcspi_bytes_per_word(word_len);
    	}
    	omap2_mcspi_set_enable(spi, 1);
    	return count;
    }
    
    static unsigned
    omap2_mcspi_txrx_dma(struct spi_device *spi, struct spi_transfer *xfer)
    {
    	struct omap2_mcspi	*mcspi;
    	struct omap2_mcspi_cs	*cs = spi->controller_state;
    	unsigned int		count;
    	u8			*rx;
    	const u8		*tx;
    	struct dma_slave_config	cfg;
    	enum dma_slave_buswidth width;
    	unsigned es;
    	void __iomem		*chstat_reg;
    	int			wait_res;
    	int ret;
    
    	mcspi = spi_controller_get_devdata(spi->controller);
    
    	if (cs->word_len <= 8) {
    		width = DMA_SLAVE_BUSWIDTH_1_BYTE;
    		es = 1;
    	} else if (cs->word_len <= 16) {
    		width = DMA_SLAVE_BUSWIDTH_2_BYTES;
    		es = 2;
    	} else {
    		width = DMA_SLAVE_BUSWIDTH_4_BYTES;
    		es = 4;
    	}
    
    	count = xfer->len;
    
    	memset(&cfg, 0, sizeof(cfg));
    	cfg.src_addr = cs->phys + OMAP2_MCSPI_RX0;
    	cfg.dst_addr = cs->phys + OMAP2_MCSPI_TX0;
    	cfg.src_addr_width = width;
    	cfg.dst_addr_width = width;
    	cfg.src_maxburst = 1;
    	cfg.dst_maxburst = 1;
    
    	rx = xfer->rx_buf;
    	tx = xfer->tx_buf;
    
    	mcspi->target_aborted = false;
    	reinit_completion(&mcspi->txrxdone);
    	mcspi_write_reg(spi->controller, OMAP2_MCSPI_IRQENABLE,
    			OMAP2_MCSPI_IRQSTATUS_EOW);
    	if (tx)
    		omap2_mcspi_tx_dma(spi, xfer, cfg);
    
    	if (rx)
    		count = omap2_mcspi_rx_dma(spi, xfer, cfg, es);
    
    	ret = mcspi_wait_for_completion(mcspi, &mcspi->txrxdone);
    	omap2_mcspi_set_dma_req(spi, 0, 0);
    	if (ret || mcspi->target_aborted)
    		return 0;
    	/* for TX_ONLY mode, be sure all words have shifted out */
    	if (tx && !rx) {
    		chstat_reg = cs->base + OMAP2_MCSPI_CHSTAT0;
    		if (mcspi->fifo_depth > 0) {
    			wait_res = mcspi_wait_for_reg_bit(chstat_reg,
    				 OMAP2_MCSPI_CHSTAT_TXFFE);
    			if (wait_res < 0)
    				dev_err(&spi->dev, "TXFFE timed out\n");
    		} else {
    			wait_res = mcspi_wait_for_reg_bit(chstat_reg,
    				OMAP2_MCSPI_CHSTAT_TXS);
    			if (wait_res < 0)
    				dev_err(&spi->dev, "TXS timed out\n");
    		}
    		if (wait_res >= 0 &&
    			(mcspi_wait_for_reg_bit(chstat_reg,
    					OMAP2_MCSPI_CHSTAT_EOT) < 0))
    			dev_err(&spi->dev, "EOT timed out\n");
    	}
    	return count;
    }
    
    static unsigned
    omap2_mcspi_txrx_pio(struct spi_device *spi, struct spi_transfer *xfer)
    {
    	struct omap2_mcspi_cs	*cs = spi->controller_state;
    	unsigned int		count, c;
    	u32			l;
    	void __iomem		*base = cs->base;
    	void __iomem		*tx_reg;
    	void __iomem		*rx_reg;
    	void __iomem		*chstat_reg;
    	int			word_len;
    
    	count = xfer->len;
    	c = count;
    	word_len = cs->word_len;
    
    	l = mcspi_cached_chconf0(spi);
    
    	/* We store the pre-calculated register addresses on stack to speed
    	 * up the transfer loop. */
    	tx_reg		= base + OMAP2_MCSPI_TX0;
    	rx_reg		= base + OMAP2_MCSPI_RX0;
    	chstat_reg	= base + OMAP2_MCSPI_CHSTAT0;
    
    	if (c < (word_len>>3))
    		return 0;
    
    	if (word_len <= 8) {
    		u8		*rx;
    		const u8	*tx;
    
    		rx = xfer->rx_buf;
    		tx = xfer->tx_buf;
    
    		do {
    			c -= 1;
    			if (tx != NULL) {
    				if (mcspi_wait_for_reg_bit(chstat_reg,
    						OMAP2_MCSPI_CHSTAT_TXS) < 0) {
    					dev_err(&spi->dev, "TXS timed out\n");
    					goto out;
    				}
    				dev_vdbg(&spi->dev, "write-%d %02x\n",
    						word_len, *tx);
    				writel_relaxed(*tx++, tx_reg);
    			}
    			if (rx != NULL) {
    				if (mcspi_wait_for_reg_bit(chstat_reg,
    						OMAP2_MCSPI_CHSTAT_RXS) < 0) {
    					dev_err(&spi->dev, "RXS timed out\n");
    					goto out;
    				}
    
    				if (c == 1 && tx == NULL &&
    				    (l & OMAP2_MCSPI_CHCONF_TURBO)) {
    					omap2_mcspi_set_enable(spi, 0);
    					*rx++ = readl_relaxed(rx_reg);
    					dev_vdbg(&spi->dev, "read-%d %02x\n",
    						    word_len, *(rx - 1));
    					if (mcspi_wait_for_reg_bit(chstat_reg,
    						OMAP2_MCSPI_CHSTAT_RXS) < 0) {
    						dev_err(&spi->dev,
    							"RXS timed out\n");
    						goto out;
    					}
    					c = 0;
    				} else if (c == 0 && tx == NULL) {
    					omap2_mcspi_set_enable(spi, 0);
    				}
    
    				*rx++ = readl_relaxed(rx_reg);
    				dev_vdbg(&spi->dev, "read-%d %02x\n",
    						word_len, *(rx - 1));
    			}
    			/* Add word delay between each word */
    			spi_delay_exec(&xfer->word_delay, xfer);
    		} while (c);
    	} else if (word_len <= 16) {
    		u16		*rx;
    		const u16	*tx;
    
    		rx = xfer->rx_buf;
    		tx = xfer->tx_buf;
    		do {
    			c -= 2;
    			if (tx != NULL) {
    				if (mcspi_wait_for_reg_bit(chstat_reg,
    						OMAP2_MCSPI_CHSTAT_TXS) < 0) {
    					dev_err(&spi->dev, "TXS timed out\n");
    					goto out;
    				}
    				dev_vdbg(&spi->dev, "write-%d %04x\n",
    						word_len, *tx);
    				writel_relaxed(*tx++, tx_reg);
    			}
    			if (rx != NULL) {
    				if (mcspi_wait_for_reg_bit(chstat_reg,
    						OMAP2_MCSPI_CHSTAT_RXS) < 0) {
    					dev_err(&spi->dev, "RXS timed out\n");
    					goto out;
    				}
    
    				if (c == 2 && tx == NULL &&
    				    (l & OMAP2_MCSPI_CHCONF_TURBO)) {
    					omap2_mcspi_set_enable(spi, 0);
    					*rx++ = readl_relaxed(rx_reg);
    					dev_vdbg(&spi->dev, "read-%d %04x\n",
    						    word_len, *(rx - 1));
    					if (mcspi_wait_for_reg_bit(chstat_reg,
    						OMAP2_MCSPI_CHSTAT_RXS) < 0) {
    						dev_err(&spi->dev,
    							"RXS timed out\n");
    						goto out;
    					}
    					c = 0;
    				} else if (c == 0 && tx == NULL) {
    					omap2_mcspi_set_enable(spi, 0);
    				}
    
    				*rx++ = readl_relaxed(rx_reg);
    				dev_vdbg(&spi->dev, "read-%d %04x\n",
    						word_len, *(rx - 1));
    			}
    			/* Add word delay between each word */
    			spi_delay_exec(&xfer->word_delay, xfer);
    		} while (c >= 2);
    	} else if (word_len <= 32) {
    		u32		*rx;
    		const u32	*tx;
    
    		rx = xfer->rx_buf;
    		tx = xfer->tx_buf;
    		do {
    			c -= 4;
    			if (tx != NULL) {
    				if (mcspi_wait_for_reg_bit(chstat_reg,
    						OMAP2_MCSPI_CHSTAT_TXS) < 0) {
    					dev_err(&spi->dev, "TXS timed out\n");
    					goto out;
    				}
    				dev_vdbg(&spi->dev, "write-%d %08x\n",
    						word_len, *tx);
    				writel_relaxed(*tx++, tx_reg);
    			}
    			if (rx != NULL) {
    				if (mcspi_wait_for_reg_bit(chstat_reg,
    						OMAP2_MCSPI_CHSTAT_RXS) < 0) {
    					dev_err(&spi->dev, "RXS timed out\n");
    					goto out;
    				}
    
    				if (c == 4 && tx == NULL &&
    				    (l & OMAP2_MCSPI_CHCONF_TURBO)) {
    					omap2_mcspi_set_enable(spi, 0);
    					*rx++ = readl_relaxed(rx_reg);
    					dev_vdbg(&spi->dev, "read-%d %08x\n",
    						    word_len, *(rx - 1));
    					if (mcspi_wait_for_reg_bit(chstat_reg,
    						OMAP2_MCSPI_CHSTAT_RXS) < 0) {
    						dev_err(&spi->dev,
    							"RXS timed out\n");
    						goto out;
    					}
    					c = 0;
    				} else if (c == 0 && tx == NULL) {
    					omap2_mcspi_set_enable(spi, 0);
    				}
    
    				*rx++ = readl_relaxed(rx_reg);
    				dev_vdbg(&spi->dev, "read-%d %08x\n",
    						word_len, *(rx - 1));
    			}
    			/* Add word delay between each word */
    			spi_delay_exec(&xfer->word_delay, xfer);
    		} while (c >= 4);
    	}
    
    	/* for TX_ONLY mode, be sure all words have shifted out */
    	if (xfer->rx_buf == NULL) {
    		if (mcspi_wait_for_reg_bit(chstat_reg,
    				OMAP2_MCSPI_CHSTAT_TXS) < 0) {
    			dev_err(&spi->dev, "TXS timed out\n");
    		} else if (mcspi_wait_for_reg_bit(chstat_reg,
    				OMAP2_MCSPI_CHSTAT_EOT) < 0)
    			dev_err(&spi->dev, "EOT timed out\n");
    
    		/* disable chan to purge rx datas received in TX_ONLY transfer,
    		 * otherwise these rx datas will affect the direct following
    		 * RX_ONLY transfer.
    		 */
    		omap2_mcspi_set_enable(spi, 0);
    	}
    out:
    	omap2_mcspi_set_enable(spi, 1);
    	return count - c;
    }
    
    static u32 omap2_mcspi_calc_divisor(u32 speed_hz, u32 ref_clk_hz)
    {
    	u32 div;
    
    	for (div = 0; div < 15; div++)
    		if (speed_hz >= (ref_clk_hz >> div))
    			return div;
    
    	return 15;
    }
    
    /* called only when no transfer is active to this device */
    static int omap2_mcspi_setup_transfer(struct spi_device *spi,
    		struct spi_transfer *t)
    {
    	struct omap2_mcspi_cs *cs = spi->controller_state;
    	struct omap2_mcspi *mcspi;
    	u32 ref_clk_hz, l = 0, clkd = 0, div, extclk = 0, clkg = 0;
    	u8 word_len = spi->bits_per_word;
    	u32 speed_hz = spi->max_speed_hz;
    
    	mcspi = spi_controller_get_devdata(spi->controller);
    
    	if (t != NULL && t->bits_per_word)
    		word_len = t->bits_per_word;
    
    	cs->word_len = word_len;
    
    	if (t && t->speed_hz)
    		speed_hz = t->speed_hz;
    
    	ref_clk_hz = mcspi->ref_clk_hz;
    	speed_hz = min_t(u32, speed_hz, ref_clk_hz);
    	if (speed_hz < (ref_clk_hz / OMAP2_MCSPI_MAX_DIVIDER)) {
    		clkd = omap2_mcspi_calc_divisor(speed_hz, ref_clk_hz);
    		speed_hz = ref_clk_hz >> clkd;
    		clkg = 0;
    	} else {
    		div = (ref_clk_hz + speed_hz - 1) / speed_hz;
    		speed_hz = ref_clk_hz / div;
    		clkd = (div - 1) & 0xf;
    		extclk = (div - 1) >> 4;
    		clkg = OMAP2_MCSPI_CHCONF_CLKG;
    	}
    
    	l = mcspi_cached_chconf0(spi);
    
    	/* standard 4-wire host mode:  SCK, MOSI/out, MISO/in, nCS
    	 * REVISIT: this controller could support SPI_3WIRE mode.
    	 */
    	if (mcspi->pin_dir == MCSPI_PINDIR_D0_IN_D1_OUT) {
    		l &= ~OMAP2_MCSPI_CHCONF_IS;
    		l &= ~OMAP2_MCSPI_CHCONF_DPE1;
    		l |= OMAP2_MCSPI_CHCONF_DPE0;
    	} else {
    		l |= OMAP2_MCSPI_CHCONF_IS;
    		l |= OMAP2_MCSPI_CHCONF_DPE1;
    		l &= ~OMAP2_MCSPI_CHCONF_DPE0;
    	}
    
    	/* wordlength */
    	l &= ~OMAP2_MCSPI_CHCONF_WL_MASK;
    	l |= (word_len - 1) << 7;
    
    	/* set chipselect polarity; manage with FORCE */
    	if (!(spi->mode & SPI_CS_HIGH))
    		l |= OMAP2_MCSPI_CHCONF_EPOL;	/* active-low; normal */
    	else
    		l &= ~OMAP2_MCSPI_CHCONF_EPOL;
    
    	/* set clock divisor */
    	l &= ~OMAP2_MCSPI_CHCONF_CLKD_MASK;
    	l |= clkd << 2;
    
    	/* set clock granularity */
    	l &= ~OMAP2_MCSPI_CHCONF_CLKG;
    	l |= clkg;
    	if (clkg) {
    		cs->chctrl0 &= ~OMAP2_MCSPI_CHCTRL_EXTCLK_MASK;
    		cs->chctrl0 |= extclk << 8;
    		mcspi_write_cs_reg(spi, OMAP2_MCSPI_CHCTRL0, cs->chctrl0);
    	}
    
    	/* set SPI mode 0..3 */
    	if (spi->mode & SPI_CPOL)
    		l |= OMAP2_MCSPI_CHCONF_POL;
    	else
    		l &= ~OMAP2_MCSPI_CHCONF_POL;
    	if (spi->mode & SPI_CPHA)
    		l |= OMAP2_MCSPI_CHCONF_PHA;
    	else
    		l &= ~OMAP2_MCSPI_CHCONF_PHA;
    
    	mcspi_write_chconf0(spi, l);
    
    	cs->mode = spi->mode;
    
    	dev_dbg(&spi->dev, "setup: speed %d, sample %s edge, clk %s\n",
    			speed_hz,
    			(spi->mode & SPI_CPHA) ? "trailing" : "leading",
    			(spi->mode & SPI_CPOL) ? "inverted" : "normal");
    
    	return 0;
    }
    
    /*
     * Note that we currently allow DMA only if we get a channel
     * for both rx and tx. Otherwise we'll do PIO for both rx and tx.
     */
    static int omap2_mcspi_request_dma(struct omap2_mcspi *mcspi,
    				   struct omap2_mcspi_dma *mcspi_dma)
    {
    	int ret = 0;
    
    	mcspi_dma->dma_rx = dma_request_chan(mcspi->dev,
    					     mcspi_dma->dma_rx_ch_name);
    	if (IS_ERR(mcspi_dma->dma_rx)) {
    		ret = PTR_ERR(mcspi_dma->dma_rx);
    		mcspi_dma->dma_rx = NULL;
    		goto no_dma;
    	}
    
    	mcspi_dma->dma_tx = dma_request_chan(mcspi->dev,
    					     mcspi_dma->dma_tx_ch_name);
    	if (IS_ERR(mcspi_dma->dma_tx)) {
    		ret = PTR_ERR(mcspi_dma->dma_tx);
    		mcspi_dma->dma_tx = NULL;
    		dma_release_channel(mcspi_dma->dma_rx);
    		mcspi_dma->dma_rx = NULL;
    	}
    
    no_dma:
    	return ret;
    }
    
    static void omap2_mcspi_release_dma(struct spi_controller *ctlr)
    {
    	struct omap2_mcspi *mcspi = spi_controller_get_devdata(ctlr);
    	struct omap2_mcspi_dma	*mcspi_dma;
    	int i;
    
    	for (i = 0; i < ctlr->num_chipselect; i++) {
    		mcspi_dma = &mcspi->dma_channels[i];
    
    		if (mcspi_dma->dma_rx) {
    			dma_release_channel(mcspi_dma->dma_rx);
    			mcspi_dma->dma_rx = NULL;
    		}
    		if (mcspi_dma->dma_tx) {
    			dma_release_channel(mcspi_dma->dma_tx);
    			mcspi_dma->dma_tx = NULL;
    		}
    	}
    }
    
    static void omap2_mcspi_cleanup(struct spi_device *spi)
    {
    	struct omap2_mcspi_cs	*cs;
    
    	if (spi->controller_state) {
    		/* Unlink controller state from context save list */
    		cs = spi->controller_state;
    		list_del(&cs->node);
    
    		kfree(cs);
    	}
    }
    
    static int omap2_mcspi_setup(struct spi_device *spi)
    {
    	bool			initial_setup = false;
    	int			ret;
    	struct omap2_mcspi	*mcspi = spi_controller_get_devdata(spi->controller);
    	struct omap2_mcspi_regs	*ctx = &mcspi->ctx;
    	struct omap2_mcspi_cs	*cs = spi->controller_state;
    	struct omap2_mcspi_device_config *cd = spi->controller_data;
    
    	if (!cs) {
    		cs = kzalloc(sizeof(*cs), GFP_KERNEL);
    		if (!cs)
    			return -ENOMEM;
    		cs->base = mcspi->base + spi_get_chipselect(spi, 0) * 0x14;
    		cs->phys = mcspi->phys + spi_get_chipselect(spi, 0) * 0x14;
    		cs->mode = 0;
    		cs->chconf0 = 0;
    		cs->chctrl0 = 0;
    		spi->controller_state = cs;
    		/* Link this to context save list */
    		list_add_tail(&cs->node, &ctx->cs);
    		initial_setup = true;
    	}
    
    	if (!cd) {
    		cd = devm_kzalloc(mcspi->dev, sizeof(*cd), GFP_KERNEL);
    		if (!cd) {
    			if (initial_setup)
    				/* Since the cd allocation failed, cs should free too */
    				omap2_mcspi_cleanup(spi);
    			return -ENOMEM;
    		}
    
    		/* Enables turbo mode as default */
    		cd->turbo_mode = 1;
    		spi->controller_data = cd;
    		dev_dbg(&spi->dev, "%s: enabling TURBO mode\n",
    				__func__);
    	}
    
    	ret = pm_runtime_resume_and_get(mcspi->dev);
    	if (ret < 0) {
    		if (initial_setup)
    			omap2_mcspi_cleanup(spi);
    
    		return ret;
    	}
    
    	ret = omap2_mcspi_setup_transfer(spi, NULL);
    	if (ret && initial_setup)
    		omap2_mcspi_cleanup(spi);
    
    	pm_runtime_mark_last_busy(mcspi->dev);
    	pm_runtime_put_autosuspend(mcspi->dev);
    
    	return ret;
    }
    
    static irqreturn_t omap2_mcspi_irq_handler(int irq, void *data)
    {
    	struct omap2_mcspi *mcspi = data;
    	u32 irqstat;
    
    	irqstat	= mcspi_read_reg(mcspi->ctlr, OMAP2_MCSPI_IRQSTATUS);
    	if (!irqstat)
    		return IRQ_NONE;
    
    	/* Disable IRQ and wakeup target xfer task */
    	mcspi_write_reg(mcspi->ctlr, OMAP2_MCSPI_IRQENABLE, 0);
    	if (irqstat & OMAP2_MCSPI_IRQSTATUS_EOW) {
    		complete_all(&mcspi->txrxdone);
    		mcspi_write_reg(mcspi->ctlr, OMAP2_MCSPI_IRQSTATUS, OMAP2_MCSPI_IRQSTATUS_EOW);
    	}
    
    	return IRQ_HANDLED;
    }
    
    static int omap2_mcspi_target_abort(struct spi_controller *ctlr)
    {
    	struct omap2_mcspi *mcspi = spi_controller_get_devdata(ctlr);
    
    	mcspi->target_aborted = true;
    	complete_all(&mcspi->txrxdone);
    
    	return 0;
    }
    
    static int omap2_mcspi_transfer_one(struct spi_controller *ctlr,
    				    struct spi_device *spi,
    				    struct spi_transfer *t)
    {
    
    	/* We only enable one channel at a time -- the one whose message is
    	 * -- although this controller would gladly
    	 * arbitrate among multiple channels.  This corresponds to "single
    	 * channel" host mode.  As a side effect, we need to manage the
    	 * chipselect with the FORCE bit ... CS != channel enable.
    	 */
    
    	struct omap2_mcspi		*mcspi;
    	struct omap2_mcspi_dma		*mcspi_dma;
    	struct omap2_mcspi_cs		*cs;
    	struct omap2_mcspi_device_config *cd;
    	int				par_override = 0;
    	int				status = 0;
    	u32				chconf;
    
    	mcspi = spi_controller_get_devdata(ctlr);
    	mcspi_dma = mcspi->dma_channels + spi_get_chipselect(spi, 0);
    	cs = spi->controller_state;
    	cd = spi->controller_data;
    
    	/*
    	 * The target driver could have changed spi->mode in which case
    	 * it will be different from cs->mode (the current hardware setup).
    	 * If so, set par_override (even though it's not a parity issue) so
    	 * omap2_mcspi_setup_transfer will be called to configure the hardware
    	 * with the correct mode on the first iteration of the loop below.
    	 */
    	if (spi->mode != cs->mode)
    		par_override = 1;
    
    	omap2_mcspi_set_enable(spi, 0);
    
    	if (spi_get_csgpiod(spi, 0))
    		omap2_mcspi_set_cs(spi, spi->mode & SPI_CS_HIGH);
    
    	if (par_override ||
    	    (t->speed_hz != spi->max_speed_hz) ||
    	    (t->bits_per_word != spi->bits_per_word)) {
    		par_override = 1;
    		status = omap2_mcspi_setup_transfer(spi, t);
    		if (status < 0)
    			goto out;
    		if (t->speed_hz == spi->max_speed_hz &&
    		    t->bits_per_word == spi->bits_per_word)
    			par_override = 0;
    	}
    
    	chconf = mcspi_cached_chconf0(spi);
    	chconf &= ~OMAP2_MCSPI_CHCONF_TRM_MASK;
    	chconf &= ~OMAP2_MCSPI_CHCONF_TURBO;
    
    	if (t->tx_buf == NULL)
    		chconf |= OMAP2_MCSPI_CHCONF_TRM_RX_ONLY;
    	else if (t->rx_buf == NULL)
    		chconf |= OMAP2_MCSPI_CHCONF_TRM_TX_ONLY;
    
    	if (cd && cd->turbo_mode) {
    		/* Turbo mode is for more than one word.
    		 * Also if the speed is set as above 25MHz then turbo mode should stay enabled.
    		 */
    		if ((t->len > ((cs->word_len + 7) >> 3)) || (t->speed_hz > 25000000))
    			chconf |= OMAP2_MCSPI_CHCONF_TURBO;
    	}
    
    	mcspi_write_chconf0(spi, chconf);
    
    	if (t->len) {
    		unsigned	count;
    
    		if ((mcspi_dma->dma_rx && mcspi_dma->dma_tx) &&
    		    spi_xfer_is_dma_mapped(ctlr, spi, t))
    			omap2_mcspi_set_fifo(spi, t, 1);
    
    		omap2_mcspi_set_enable(spi, 1);
    
    		/* RX_ONLY mode needs dummy data in TX reg */
    		if (t->tx_buf == NULL)
    			writel_relaxed(0, cs->base
    					+ OMAP2_MCSPI_TX0);
    
    		if ((mcspi_dma->dma_rx && mcspi_dma->dma_tx) &&
    		    spi_xfer_is_dma_mapped(ctlr, spi, t))
    			count = omap2_mcspi_txrx_dma(spi, t);
    		else
    			count = omap2_mcspi_txrx_pio(spi, t);
    
    		if (count != t->len) {
    			status = -EIO;
    			goto out;
    		}
    	}
    
    	omap2_mcspi_set_enable(spi, 0);
    
    	if (mcspi->fifo_depth > 0)
    		omap2_mcspi_set_fifo(spi, t, 0);
    
    out:
    	/* Restore defaults if they were overridden */
    	if (par_override) {
    		par_override = 0;
    		status = omap2_mcspi_setup_transfer(spi, NULL);
    	}
    
    	omap2_mcspi_set_enable(spi, 0);
    
    	if (spi_get_csgpiod(spi, 0))
    		omap2_mcspi_set_cs(spi, !(spi->mode & SPI_CS_HIGH));
    
    	if (mcspi->fifo_depth > 0 && t)
    		omap2_mcspi_set_fifo(spi, t, 0);
    
    	return status;
    }
    
    static int omap2_mcspi_prepare_message(struct spi_controller *ctlr,
    				       struct spi_message *msg)
    {
    	struct omap2_mcspi	*mcspi = spi_controller_get_devdata(ctlr);
    	struct omap2_mcspi_regs	*ctx = &mcspi->ctx;
    	struct omap2_mcspi_cs	*cs;
    	struct spi_transfer	*tr;
    	u8 bits_per_word;
    
    	/*
    	 * The conditions are strict, it is mandatory to check each transfer of the list to see if
    	 * multi-mode is applicable.
    	 */
    	mcspi->use_multi_mode = true;
    
    	if (mcspi->last_msg_kept_cs)
    		mcspi->use_multi_mode = false;
    
    	list_for_each_entry(tr, &msg->transfers, transfer_list) {
    		if (!tr->bits_per_word)
    			bits_per_word = msg->spi->bits_per_word;
    		else
    			bits_per_word = tr->bits_per_word;
    
    		/*
    		 * Check if this transfer contains only one word.
    		 */
    		if (bits_per_word < 8 && tr->len == 1) {
    			/* multi-mode is applicable, only one word (1..7 bits) */
    		} else if (bits_per_word >= 8 && tr->len == bits_per_word / 8) {
    			/* multi-mode is applicable, only one word (8..32 bits) */
    		} else {
    			/* multi-mode is not applicable: more than one word in the transfer */
    			mcspi->use_multi_mode = false;
    		}
    
    		if (list_is_last(&tr->transfer_list, &msg->transfers)) {
    			/* Check if transfer asks to keep the CS status after the whole message */
    			if (tr->cs_change) {
    				mcspi->use_multi_mode = false;
    				mcspi->last_msg_kept_cs = true;
    			} else {
    				mcspi->last_msg_kept_cs = false;
    			}
    		} else {
    			/* Check if transfer asks to change the CS status after the transfer */
    			if (!tr->cs_change)
    				mcspi->use_multi_mode = false;
    		}
    	}
    
    	omap2_mcspi_set_mode(ctlr);
    
    	/* In single mode only a single channel can have the FORCE bit enabled
    	 * in its chconf0 register.
    	 * Scan all channels and disable them except the current one.
    	 * A FORCE can remain from a last transfer having cs_change enabled
    	 *
    	 * In multi mode all FORCE bits must be disabled.
    	 */
    	list_for_each_entry(cs, &ctx->cs, node) {
    		if (msg->spi->controller_state == cs && !mcspi->use_multi_mode) {
    			continue;
    		}
    
    		if ((cs->chconf0 & OMAP2_MCSPI_CHCONF_FORCE)) {
    			cs->chconf0 &= ~OMAP2_MCSPI_CHCONF_FORCE;
    			writel_relaxed(cs->chconf0,
    					cs->base + OMAP2_MCSPI_CHCONF0);
    			readl_relaxed(cs->base + OMAP2_MCSPI_CHCONF0);
    		}
    	}
    
    	return 0;
    }
    
    static bool omap2_mcspi_can_dma(struct spi_controller *ctlr,
    				struct spi_device *spi,
    				struct spi_transfer *xfer)
    {
    	struct omap2_mcspi *mcspi = spi_controller_get_devdata(spi->controller);
    	struct omap2_mcspi_dma *mcspi_dma =
    		&mcspi->dma_channels[spi_get_chipselect(spi, 0)];
    
    	if (!mcspi_dma->dma_rx || !mcspi_dma->dma_tx)
    		return false;
    
    	if (spi_controller_is_target(ctlr))
    		return true;
    
    	ctlr->dma_rx = mcspi_dma->dma_rx;
    	ctlr->dma_tx = mcspi_dma->dma_tx;
    
    	return (xfer->len >= DMA_MIN_BYTES);
    }
    
    static size_t omap2_mcspi_max_xfer_size(struct spi_device *spi)
    {
    	struct omap2_mcspi *mcspi = spi_controller_get_devdata(spi->controller);
    	struct omap2_mcspi_dma *mcspi_dma =
    		&mcspi->dma_channels[spi_get_chipselect(spi, 0)];
    
    	if (mcspi->max_xfer_len && mcspi_dma->dma_rx)
    		return mcspi->max_xfer_len;
    
    	return SIZE_MAX;
    }
    
    static int omap2_mcspi_controller_setup(struct omap2_mcspi *mcspi)
    {
    	struct spi_controller	*ctlr = mcspi->ctlr;
    	struct omap2_mcspi_regs	*ctx = &mcspi->ctx;
    	int			ret = 0;
    
    	ret = pm_runtime_resume_and_get(mcspi->dev);
    	if (ret < 0)
    		return ret;
    
    	mcspi_write_reg(ctlr, OMAP2_MCSPI_WAKEUPENABLE,
    			OMAP2_MCSPI_WAKEUPENABLE_WKEN);
    	ctx->wakeupenable = OMAP2_MCSPI_WAKEUPENABLE_WKEN;
    
    	omap2_mcspi_set_mode(ctlr);
    	pm_runtime_mark_last_busy(mcspi->dev);
    	pm_runtime_put_autosuspend(mcspi->dev);
    	return 0;
    }
    
    static int omap_mcspi_runtime_suspend(struct device *dev)
    {
    	int error;
    
    	error = pinctrl_pm_select_idle_state(dev);
    	if (error)
    		dev_warn(dev, "%s: failed to set pins: %i\n", __func__, error);
    
    	return 0;
    }
    
    /*
     * When SPI wakes up from off-mode, CS is in the active state. If it was in
     * the inactive state when the driver was suspended, force it back to the
     * inactive state at wake-up.
     */
    static int omap_mcspi_runtime_resume(struct device *dev)
    {
    	struct spi_controller *ctlr = dev_get_drvdata(dev);
    	struct omap2_mcspi *mcspi = spi_controller_get_devdata(ctlr);
    	struct omap2_mcspi_regs *ctx = &mcspi->ctx;
    	struct omap2_mcspi_cs *cs;
    	int error;
    
    	error = pinctrl_pm_select_default_state(dev);
    	if (error)
    		dev_warn(dev, "%s: failed to set pins: %i\n", __func__, error);
    
    	/* McSPI: context restore */
    	mcspi_write_reg(ctlr, OMAP2_MCSPI_MODULCTRL, ctx->modulctrl);
    	mcspi_write_reg(ctlr, OMAP2_MCSPI_WAKEUPENABLE, ctx->wakeupenable);
    
    	list_for_each_entry(cs, &ctx->cs, node) {
    		/*
    		 * We need to toggle the CS state for OMAP to take this
    		 * change into account.
    		 */
    		if ((cs->chconf0 & OMAP2_MCSPI_CHCONF_FORCE) == 0) {
    			cs->chconf0 |= OMAP2_MCSPI_CHCONF_FORCE;
    			writel_relaxed(cs->chconf0,
    				       cs->base + OMAP2_MCSPI_CHCONF0);
    			cs->chconf0 &= ~OMAP2_MCSPI_CHCONF_FORCE;
    			writel_relaxed(cs->chconf0,
    				       cs->base + OMAP2_MCSPI_CHCONF0);
    		} else {
    			writel_relaxed(cs->chconf0,
    				       cs->base + OMAP2_MCSPI_CHCONF0);
    		}
    	}
    
    	return 0;
    }
    
    static struct omap2_mcspi_platform_config omap2_pdata = {
    	.regs_offset = 0,
    };
    
    static struct omap2_mcspi_platform_config omap4_pdata = {
    	.regs_offset = OMAP4_MCSPI_REG_OFFSET,
    };
    
    static struct omap2_mcspi_platform_config am654_pdata = {
    	.regs_offset = OMAP4_MCSPI_REG_OFFSET,
    	.max_xfer_len = SZ_4K - 1,
    };
    
    static const struct of_device_id omap_mcspi_of_match[] = {
    	{
    		.compatible = "ti,omap2-mcspi",
    		.data = &omap2_pdata,
    	},
    	{
    		.compatible = "ti,omap4-mcspi",
    		.data = &omap4_pdata,
    	},
    	{
    		.compatible = "ti,am654-mcspi",
    		.data = &am654_pdata,
    	},
    	{ },
    };
    MODULE_DEVICE_TABLE(of, omap_mcspi_of_match);
    
    static int omap2_mcspi_probe(struct platform_device *pdev)
    {
    	struct spi_controller	*ctlr;
    	const struct omap2_mcspi_platform_config *pdata;
    	struct omap2_mcspi	*mcspi;
    	struct resource		*r;
    	int			status = 0, i;
    	u32			regs_offset = 0;
    	struct device_node	*node = pdev->dev.of_node;
    	const struct of_device_id *match;
    
    	if (of_property_read_bool(node, "spi-slave"))
    		ctlr = spi_alloc_target(&pdev->dev, sizeof(*mcspi));
    	else
    		ctlr = spi_alloc_host(&pdev->dev, sizeof(*mcspi));
    	if (!ctlr)
    		return -ENOMEM;
    
    	/* the spi->mode bits understood by this driver: */
    	ctlr->mode_bits = SPI_CPOL | SPI_CPHA | SPI_CS_HIGH;
    	ctlr->bits_per_word_mask = SPI_BPW_RANGE_MASK(4, 32);
    	ctlr->setup = omap2_mcspi_setup;
    	ctlr->auto_runtime_pm = true;
    	ctlr->prepare_message = omap2_mcspi_prepare_message;
    	ctlr->can_dma = omap2_mcspi_can_dma;
    	ctlr->transfer_one = omap2_mcspi_transfer_one;
    	ctlr->set_cs = omap2_mcspi_set_cs;
    	ctlr->cleanup = omap2_mcspi_cleanup;
    	ctlr->target_abort = omap2_mcspi_target_abort;
    	ctlr->dev.of_node = node;
    	ctlr->use_gpio_descriptors = true;
    
    	platform_set_drvdata(pdev, ctlr);
    
    	mcspi = spi_controller_get_devdata(ctlr);
    	mcspi->ctlr = ctlr;
    
    	match = of_match_device(omap_mcspi_of_match, &pdev->dev);
    	if (match) {
    		u32 num_cs = 1; /* default number of chipselect */
    		pdata = match->data;
    
    		of_property_read_u32(node, "ti,spi-num-cs", &num_cs);
    		ctlr->num_chipselect = num_cs;
    		if (of_property_read_bool(node, "ti,pindir-d0-out-d1-in"))
    			mcspi->pin_dir = MCSPI_PINDIR_D0_OUT_D1_IN;
    	} else {
    		pdata = dev_get_platdata(&pdev->dev);
    		ctlr->num_chipselect = pdata->num_cs;
    		mcspi->pin_dir = pdata->pin_dir;
    	}
    	regs_offset = pdata->regs_offset;
    	if (pdata->max_xfer_len) {
    		mcspi->max_xfer_len = pdata->max_xfer_len;
    		ctlr->max_transfer_size = omap2_mcspi_max_xfer_size;
    	}
    
    	mcspi->base = devm_platform_get_and_ioremap_resource(pdev, 0, &r);
    	if (IS_ERR(mcspi->base)) {
    		status = PTR_ERR(mcspi->base);
    		goto free_ctlr;
    	}
    	mcspi->phys = r->start + regs_offset;
    	mcspi->base += regs_offset;
    
    	mcspi->dev = &pdev->dev;
    
    	INIT_LIST_HEAD(&mcspi->ctx.cs);
    
    	mcspi->dma_channels = devm_kcalloc(&pdev->dev, ctlr->num_chipselect,
    					   sizeof(struct omap2_mcspi_dma),
    					   GFP_KERNEL);
    	if (mcspi->dma_channels == NULL) {
    		status = -ENOMEM;
    		goto free_ctlr;
    	}
    
    	for (i = 0; i < ctlr->num_chipselect; i++) {
    		sprintf(mcspi->dma_channels[i].dma_rx_ch_name, "rx%d", i);
    		sprintf(mcspi->dma_channels[i].dma_tx_ch_name, "tx%d", i);
    
    		status = omap2_mcspi_request_dma(mcspi,
    						 &mcspi->dma_channels[i]);
    		if (status == -EPROBE_DEFER)
    			goto free_ctlr;
    	}
    
    	status = platform_get_irq(pdev, 0);
    	if (status < 0)
    		goto free_ctlr;
    	init_completion(&mcspi->txrxdone);
    	status = devm_request_irq(&pdev->dev, status,
    				  omap2_mcspi_irq_handler, 0, pdev->name,
    				  mcspi);
    	if (status) {
    		dev_err(&pdev->dev, "Cannot request IRQ\n");
    		goto free_ctlr;
    	}
    
    	mcspi->ref_clk = devm_clk_get_optional_enabled(&pdev->dev, NULL);
    	if (IS_ERR(mcspi->ref_clk)) {
    		status = PTR_ERR(mcspi->ref_clk);
    		dev_err_probe(&pdev->dev, status, "Failed to get ref_clk\n");
    		goto free_ctlr;
    	}
    	if (mcspi->ref_clk)
    		mcspi->ref_clk_hz = clk_get_rate(mcspi->ref_clk);
    	else
    		mcspi->ref_clk_hz = OMAP2_MCSPI_MAX_FREQ;
    	ctlr->max_speed_hz = mcspi->ref_clk_hz;
    	ctlr->min_speed_hz = mcspi->ref_clk_hz >> 15;
    
    	pm_runtime_use_autosuspend(&pdev->dev);
    	pm_runtime_set_autosuspend_delay(&pdev->dev, SPI_AUTOSUSPEND_TIMEOUT);
    	pm_runtime_enable(&pdev->dev);
    
    	status = omap2_mcspi_controller_setup(mcspi);
    	if (status < 0)
    		goto disable_pm;
    
    	status = devm_spi_register_controller(&pdev->dev, ctlr);
    	if (status < 0)
    		goto disable_pm;
    
    	return status;
    
    disable_pm:
    	pm_runtime_dont_use_autosuspend(&pdev->dev);
    	pm_runtime_put_sync(&pdev->dev);
    	pm_runtime_disable(&pdev->dev);
    free_ctlr:
    	omap2_mcspi_release_dma(ctlr);
    	spi_controller_put(ctlr);
    	return status;
    }
    
    static void omap2_mcspi_remove(struct platform_device *pdev)
    {
    	struct spi_controller *ctlr = platform_get_drvdata(pdev);
    	struct omap2_mcspi *mcspi = spi_controller_get_devdata(ctlr);
    
    	omap2_mcspi_release_dma(ctlr);
    
    	pm_runtime_dont_use_autosuspend(mcspi->dev);
    	pm_runtime_put_sync(mcspi->dev);
    	pm_runtime_disable(&pdev->dev);
    }
    
    /* work with hotplug and coldplug */
    MODULE_ALIAS("platform:omap2_mcspi");
    
    static int __maybe_unused omap2_mcspi_suspend(struct device *dev)
    {
    	struct spi_controller *ctlr = dev_get_drvdata(dev);
    	struct omap2_mcspi *mcspi = spi_controller_get_devdata(ctlr);
    	int error;
    
    	error = pinctrl_pm_select_sleep_state(dev);
    	if (error)
    		dev_warn(mcspi->dev, "%s: failed to set pins: %i\n",
    			 __func__, error);
    
    	error = spi_controller_suspend(ctlr);
    	if (error)
    		dev_warn(mcspi->dev, "%s: controller suspend failed: %i\n",
    			 __func__, error);
    
    	return pm_runtime_force_suspend(dev);
    }
    
    static int __maybe_unused omap2_mcspi_resume(struct device *dev)
    {
    	struct spi_controller *ctlr = dev_get_drvdata(dev);
    	struct omap2_mcspi *mcspi = spi_controller_get_devdata(ctlr);
    	int error;
    
    	error = spi_controller_resume(ctlr);
    	if (error)
    		dev_warn(mcspi->dev, "%s: controller resume failed: %i\n",
    			 __func__, error);
    
    	return pm_runtime_force_resume(dev);
    }
    
    static const struct dev_pm_ops omap2_mcspi_pm_ops = {
    	SET_SYSTEM_SLEEP_PM_OPS(omap2_mcspi_suspend,
    				omap2_mcspi_resume)
    	.runtime_suspend	= omap_mcspi_runtime_suspend,
    	.runtime_resume		= omap_mcspi_runtime_resume,
    };
    
    static struct platform_driver omap2_mcspi_driver = {
    	.driver = {
    		.name =		"omap2_mcspi",
    		.pm =		&omap2_mcspi_pm_ops,
    		.of_match_table = omap_mcspi_of_match,
    	},
    	.probe =	omap2_mcspi_probe,
    	.remove_new =	omap2_mcspi_remove,
    };
    
    module_platform_driver(omap2_mcspi_driver);
    MODULE_DESCRIPTION("OMAP2 McSPI controller driver");
    MODULE_LICENSE("GPL");
    

    SPI DMA configuration:

  • Hi Jiannan,

    However, the modified SPI communication has a high probability of completely freezing the system, leaving it totally unresponsive, so that only a power cycle can restart it, regardless of which core.

    Can you please elaborate on this? When you say the system freezes, what happens? Do you see a kernel crash, or something else?

  • Hi Divyansh,

    The customer means the system hangs in this scenario without printing any error log (see the customer's last reply for the logs they captured). This only happens after integrating the above fix and running the SPI application, and it reproduces with high probability. However, the fix resolved the original large SPI jitter, so we need it; we just need to know what else is required to integrate it properly in SDK11.2.

    Thanks,

    Kevin

  • Hi Kevin,

    Is it possible to run the spi_mcu_test program attached in the first post on the AM62L EVM? If so, does the AM62L kernel devicetree need to be patched to enable SPI? Does the test program require a device to be attached to the other side of the SPI bus?

  • Hi Bin,

    Thanks for the support. The AM62L kernel needs to be patched to reproduce this issue. The customer was originally running the spi_mcu_test program in their real application (customer board with SPI connected to an external device), where the issue reproduces easily. To help us reproduce it on the EVM, the customer today disconnected MISO and MOSI on their board (no external SPI device connected). After multiple runs (it does not reproduce every time), they could still reproduce the issue, which suggests it should also be reproducible on an EVM setup.
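    The spi_mcu_test source is only available as an attachment, so its exact logic is not reproduced here. As a hedged starting point for the EVM reproduction, a minimal loop could time one full-duplex spidev transfer per call and optionally run at SCHED_FIFO. The 1 KB length, 5.5 MHz clock, 2.5 ms period, and priority 55 come from the original post; the helper names (fill_xfer, timed_xfer, use_fifo_55) and the single SPI_IOC_MESSAGE(1) per transfer are assumptions, and the spidev node to open (for example a hypothetical /dev/spidevX.Y) depends on the devicetree.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/spi/spidev.h>
#include <sched.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>

#define XFER_LEN 1024u    /* 1 KB per transfer, as in the original post */
#define XFER_HZ  5500000u /* 5.5 MHz, as in the original post */

/* Fill one full-duplex spidev transfer descriptor (hypothetical helper). */
static void fill_xfer(struct spi_ioc_transfer *xfer,
		      const uint8_t *tx, uint8_t *rx, uint32_t len)
{
	memset(xfer, 0, sizeof(*xfer));
	xfer->tx_buf = (uintptr_t)tx;
	xfer->rx_buf = (uintptr_t)rx;
	xfer->len = len;
	xfer->speed_hz = XFER_HZ;
	xfer->bits_per_word = 8;
}

/* Run one transfer; return its duration in microseconds, or -errno on failure. */
static long timed_xfer(int fd, const uint8_t *tx, uint8_t *rx)
{
	struct spi_ioc_transfer xfer;
	struct timespec t0, t1;

	fill_xfer(&xfer, tx, rx, XFER_LEN);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (ioctl(fd, SPI_IOC_MESSAGE(1), &xfer) < 0)
		return -errno;
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (t1.tv_sec - t0.tv_sec) * 1000000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000L;
}

/* Switch the calling thread to SCHED_FIFO priority 55 (needs root/CAP_SYS_NICE). */
static int use_fifo_55(void)
{
	struct sched_param sp = { .sched_priority = 55 };

	return sched_setscheduler(0, SCHED_FIFO, &sp) == 0 ? 0 : -errno;
}
```

    A caller would open the spidev node, call use_fifo_55() once, then call timed_xfer() every 2.5 ms and record the returned durations. With MISO and MOSI disconnected the received data is meaningless, but the transfers should still exercise the same driver path as the customer's test.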

    The customer captured the following error log during one of the runs that reproduced the issue:

    [  140.151496] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
    [  140.151517] rcu:     1-...!: (2 GPs behind) idle=2150/0/0x0 softirq=0/0 fqs=0 (false positive?)
    [  140.151531] rcu:     (detected by 0, t=60002 jiffies, g=1325, q=1 ncpus=2)
    [  140.151541] Sending NMI from CPU 0 to CPUs 1:
    [  140.151553] NMI backtrace for cpu 1 skipped: idling at default_idle_call+0x24/0x34
    [  140.152551] rcu: rcu_preempt kthread timer wakeup didn't happen for 59999 jiffies! g1325 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
    [  140.152559] rcu:     Possible timer handling issue on cpu=0 timer-softirq=40541
    [  140.152563] rcu: rcu_preempt kthread starved for 60002 jiffies! g1325 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
    [  140.152570] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
    [  140.152573] rcu: RCU grace-period kthread stack dump:
    [  140.152575] task:rcu_preempt     state:I stack:0     pid:17    tgid:17    ppid:2      flags:0x00000008
    [  140.152588] Call trace:
    [  140.152592]  __switch_to+0xc8/0x124
    [  140.152602]  __schedule+0x230/0x6c0
    [  140.152610]  schedule+0x30/0xf4
    [  140.152617]  schedule_timeout+0x6c/0xd0
    [  140.152624]  rcu_gp_fqs_loop+0x104/0x414
    [  140.152633]  rcu_gp_kthread+0xdc/0x10c
    [  140.152640]  kthread+0x10c/0x110
    [  140.152650]  ret_from_fork+0x10/0x20
    [  140.152658] rcu: Stack dump where RCU GP kthread last ran:
    [  140.152663] CPU: 0 UID: 0 PID: 302 Comm: spi_mcu_arm64.e Not tainted 6.12.57-ge942a64b06fe #29
    [  140.152671] Hardware name: TM731-Lite (DT)
    [  140.152673] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [  140.152680] pc : am62l_udma_tx_status+0x1b0/0x2e4
    [  140.152691] lr : am62l_udma_tx_status+0x68/0x2e4
    [  140.152698] sp : ffff000003c8f910
    [  140.152701] x29: ffff000003c8f910 x28: 0000000000000000 x27: ffff8000812aa100
    [  140.152710] x26: 0000000000000001 x25: 0000000000000020 x24: 00000000000003e8
    [  140.152718] x23: ffff000001a625c0 x22: ffff000003440430 x21: 0000000000000383
    [  140.152726] x20: ffff000003c8f9f0 x19: ffff000003440390 x18: 00000012a96885d3
    [  140.152735] x17: 0000000000000000 x16: 000000000013fc98 x15: ffff00001ff995c0
    [  140.152743] x14: 02f42a7cd9c8086c x13: 0000000000000000 x12: 0000000000000000
    [  140.152751] x11: 00000000000000c0 x10: 00000000000008f0 x9 : ffff000003c8f7e0
    [  140.152760] x8 : ffff000003f15ed0 x7 : 0000000000000000 x6 : 000000000000002c
    [  140.152767] x5 : ffff000003440448 x4 : 0000000000000000 x3 : 0000000000000000
    [  140.152775] x2 : 0000000000000000 x1 : 0000000000000002 x0 : 0000000018100000
    [  140.152783] Call trace:
    [  140.152785]  am62l_udma_tx_status+0x1b0/0x2e4
    [  140.152792]  omap2_mcspi_rx_dma+0x178/0x448
    [  140.152805]  omap2_mcspi_transfer_one+0x3a0/0xbf8
    [  140.152814]  spi_transfer_one_message+0x384/0x6d8
    [  140.152822]  __spi_pump_transfer_message+0x198/0x4f4
    [  140.152830]  __spi_sync+0x24c/0x32c
    [  140.152837]  spi_sync+0x2c/0x4c
    [  140.152843]  spidev_message+0x240/0x2f8
    [  140.152851]  spidev_ioctl+0x25c/0x3f0
    [  140.152859]  __arm64_sys_ioctl+0xa4/0xe4
    [  140.152869]  el0_svc_common.constprop.0+0x58/0x124
    [  140.152878]  do_el0_svc+0x18/0x20
    [  140.152884]  el0_svc+0x80/0xf0
    [  140.152891]  el0t_64_sync_handler+0x118/0x124
    [  140.152898]  el0t_64_sync+0x14c/0x150

    The customer suspects it may be stuck in the following while loop in omap2_mcspi_rx_dma():

    /*
     * Before disabling RX DMA we need to confirm whether DMA RX is complete.
     * This polling completes on the first attempt itself in most cases.
     */
    do {
    	dma_status = dmaengine_tx_status(mcspi_dma->dma_rx, dma_rx_cookie,
    					 &mcspi_dma_rxstate);
    } while (dma_status != DMA_COMPLETE);
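    The backtrace above shows CPU 0 inside am62l_udma_tx_status() called from omap2_mcspi_rx_dma(), which fits this suspicion. The loop's only exit is DMA_COMPLETE, so if the channel never reports completion the thread spins indefinitely. Running at SCHED_FIFO, such a spin could plausibly keep lower-priority work on that CPU from running, which would be consistent with the rcu_preempt starvation in the stall report; this thread does not confirm that mechanism. The sketch below is a userspace model, not driver code: a hypothetical status callback stands in for dmaengine_tx_status(), and the poll gives up after a deadline so the caller can fail the transfer instead of hanging. In the kernel, the <linux/iopoll.h> helpers such as read_poll_timeout_atomic() are one possible way to express the same bound; whether a timeout addresses the root cause in the UDMA driver, rather than turning a hard hang into a reportable error, is not established here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Simplified stand-in for the dmaengine status values used by the loop. */
enum poll_status { POLL_IN_PROGRESS, POLL_COMPLETE };

/* Hypothetical callback standing in for dmaengine_tx_status(). */
typedef enum poll_status (*status_fn)(void *ctx);

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Poll until get_status() reports completion or timeout_ns elapses.
 * Returns true on completion and false on timeout, so the caller can
 * abort the transfer and report an error instead of spinning forever.
 */
static bool poll_until_complete(status_fn get_status, void *ctx,
				uint64_t timeout_ns)
{
	const uint64_t deadline = now_ns() + timeout_ns;

	do {
		if (get_status(ctx) == POLL_COMPLETE)
			return true;
	} while (now_ns() < deadline);

	/* Final check so a completion landing right at the deadline is not lost. */
	return get_status(ctx) == POLL_COMPLETE;
}

/* Fake status source for testing: reports completion after *ctx more calls. */
static enum poll_status fake_status(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return POLL_IN_PROGRESS;
	}
	return POLL_COMPLETE;
}
```

    The design choice is to bound by elapsed time rather than by iteration count, since the cost of each status read can vary; the timeout value itself would have to be chosen from the expected transfer time.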

    Thanks,

    Kevin

  • Hi Kevin,

    Please revert your SPI driver change and apply the kernel patch attached below to see whether the system still hangs.

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/908/0001_2D00_FROMLIST_2D00_spi_2D00_spi_2D00_omap2_2D00_mcspi_2D00_Use_2D00_EOW_2D00_interrupt.patch

  • Hi Bin,

    This patch also has a problem. Kernel log:

    [  173.118196] cma: __cma_alloc: reserved: alloc failed, req-size: 1024 pages, ret: -12
    [  173.118221] cma: number of available pages: 15@177+15@209+15@241+15@273+15@305+15@337+15@369+47@401+128@640+512@7680=> 792 free of 8192 total pages
    [  174.400698] cma: __cma_alloc: reserved: alloc failed, req-size: 1024 pages, ret: -12
    [  174.400722] cma: number of available pages: 15@177+15@209+15@241+15@273+15@305+15@337+15@369+15@401+128@640+512@7680=> 760 free of 8192 total pages
    [  175.680698] cma: __cma_alloc: reserved: alloc failed, req-size: 1024 pages, ret: -12
    [  175.680723] cma: number of available pages: 15@177+15@209+15@241+15@273+15@305+15@337+15@369+15@401+128@640+512@7680=> 760 free of 8192 total pages
    [  176.960698] cma: __cma_alloc: reserved: alloc failed, req-size: 1024 pages, ret: -12
    [  176.960722] cma: number of available pages: 15@177+15@209+15@241+15@273+15@305+15@337+15@369+15@401+128@640+512@7680=> 760 free of 8192 total pages
    [  178.240698] cma: __cma_alloc: reserved: alloc failed, req-size: 1024 pages, ret: -12
    [  178.240724] cma: number of available pages: 15@177+15@209+15@241+15@273+15@305+15@337+15@369+15@401+128@640+512@7680=> 760 free of 8192 total pages
    [  179.520700] cma: __cma_alloc: reserved: alloc failed, req-size: 1024 pages, ret: -12
    [  179.520725] cma: number of available pages: 15@177+15@209+15@241+15@273+15@305+15@337+15@369+15@401+128@640+512@7680=> 760 free of 8192 total pages

    At the same time, memory usage keeps increasing, and the memory is not released even after the test program exits.
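    The cma: lines list free areas as count@start page runs. Summing them reproduces the reported totals (792, then 760 free pages), and the largest single run is 512 pages. Assuming 4 KiB pages, the 8192-page pool matches the "cma: Reserved 32 MiB" line in the boot log later in this thread, and each failing request of 1024 pages is 4 MiB, which neither the total free space nor the largest run can satisfy. The small helper below, with hypothetical names, does that arithmetic for any such line; it ignores CMA alignment requirements, so a large enough run is necessary but not sufficient for an allocation to succeed.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumption: 4 KiB kernel pages (the arm64 default). */
#define ASSUMED_PAGE_SIZE 4096UL

/* Totals derived from a cma "count@start+count@start+..." free-area list. */
struct cma_summary {
	unsigned long free_pages;  /* sum of all run lengths */
	unsigned long largest_run; /* longest single contiguous free run */
};

/*
 * Parse the list up to "=>" in a line such as
 * "15@177+...+512@7680=> 760 free of 8192 total pages".
 * Returns false if the text does not follow the count@start format.
 */
static bool parse_cma_free_list(const char *s, struct cma_summary *out)
{
	out->free_pages = 0;
	out->largest_run = 0;

	while (*s != '\0' && *s != '=') {
		char *end;
		unsigned long count = strtoul(s, &end, 10);

		if (end == s || *end != '@')
			return false;
		s = end + 1;
		(void)strtoul(s, &end, 10); /* start page: not needed for totals */
		if (end == s)
			return false;
		s = end;

		out->free_pages += count;
		if (count > out->largest_run)
			out->largest_run = count;
		if (*s == '+')
			s++;
	}
	return true;
}

/*
 * A request can only succeed if some single run is at least req_pages long.
 * This ignores CMA alignment, so it is a necessary, not sufficient, condition.
 */
static bool cma_request_may_fit(const struct cma_summary *c,
				unsigned long req_pages)
{
	return c->largest_run >= req_pages;
}
```

    The drop from 792 to 760 free pages between the first two reports is consistent with the growing memory usage described above, but the log alone does not show which allocation is holding the memory.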

  • Hi Jiannan,

    Can you please attach the kernel boot log? It should show the information about the reserved CMA pool.

  • ```
    Starting kernel ...
    
    [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
    [    0.000000] Linux version 6.12.57-gd9943b67bd86 (zhang@zhang-VirtualBox) (aarch64-oe-linux-gcc (GCC) 13.4.0, GNU ld (GNU Binutils) 2.42.0.20240723) #1 SMP PREEMPT_RT Tue Jun 23 10:47:26 CST 2026
    [    0.000000] Machine model: TM731-Lite
    [    0.000000] earlycon: ns16550a0 at MMIO32 0x0000000002800000 (options '')
    [    0.000000] printk: legacy bootconsole [ns16550a0] enabled
    [    0.000000] efi: UEFI not found.
    [    0.000000] OF: reserved mem: 0x0000000080200000..0x00000000809fffff (8192 KiB) nomap non-reusable optee@80200000
    [    0.000000] OF: reserved mem: 0x0000000080000000..0x00000000801fffff (2048 KiB) nomap non-reusable tfa@80000000
    [    0.000000] Zone ranges:
    [    0.000000]   DMA      [mem 0x0000000080000000-0x000000009fffffff]
    [    0.000000]   DMA32    empty
    [    0.000000]   Normal   empty
    [    0.000000] Movable zone start for each node
    [    0.000000] Early memory node ranges
    [    0.000000]   node   0: [mem 0x0000000080000000-0x00000000809fffff]
    [    0.000000]   node   0: [mem 0x0000000080a00000-0x000000009fffffff]
    [    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000009fffffff]
    [    0.000000] cma: Reserved 32 MiB at 0x000000009d600000 on node -1
    [    0.000000] psci: probing for conduit method from DT.
    [    0.000000] psci: PSCIv1.1 detected in firmware.
    [    0.000000] psci: Using standard PSCI v0.2 function IDs
    [    0.000000] psci: MIGRATE_INFO_TYPE not supported.
    [    0.000000] psci: SMC Calling Convention v1.5
    [    0.000000] psci: OSI mode supported.
    [    0.000000] percpu: Embedded 23 pages/cpu s54000 r8192 d32016 u94208
    [    0.000000] Detected VIPT I-cache on CPU0
    [    0.000000] CPU features: detected: GIC system register CPU interface
    [    0.000000] CPU features: kernel page table isolation disabled by kernel configuration
    [    0.000000] CPU features: detected: ARM erratum 845719
    [    0.000000] alternatives: applying boot alternatives
    [    0.000000] Kernel command line: console=ttyS0,115200n8 earlycon=ns16550a,mmio32,0x02800000 ubi.mtd=ospi_nand.rootfs root=ubi0:rootfs rw rootfstype=ubifs rootwait rcu_nocb_poll rcu_nocbs=1 nohz=on nohz_full=1 kthread_cpus=0 irqaffinity=0 isolcpus=managed_irq,domain,1
    [    0.000000] Unknown kernel command line parameters "kthread_cpus=0", will be passed to user space.
    [    0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
    [    0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
    [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 131072
    [    0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
    [    0.000000] software IO TLB: SWIOTLB bounce buffer size adjusted to 0MB
    [    0.000000] software IO TLB: area num 2.
    [    0.000000] software IO TLB: mapped [mem 0x000000009fe4c000-0x000000009fecc000] (0MB)
    [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
    [    0.000000] rcu: Preemptible hierarchical RCU implementation.
    [    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=2.
    [    0.000000] rcu:     RCU_SOFTIRQ processing moved to rcuc kthreads.
    [    0.000000]  No expedited grace period (rcu_normal_after_boot).
    [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
    [    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
    [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
    [    0.000000] GICv3: GIC: Using split EOI/Deactivate mode
    [    0.000000] GICv3: 960 SPIs implemented
    [    0.000000] GICv3: 0 Extended SPIs implemented
    [    0.000000] Root IRQ handler: gic_handle_irq
    [    0.000000] GICv3: GICv3 features: 16 PPIs
    [    0.000000] GICv3: GICD_CTRL.DS=0, SCR_EL3.FIQ=0
    [    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x0000000001840000
    [    0.000000] ITS [mem 0x01820000-0x0182ffff]
    [    0.000000] GIC: enabling workaround for ITS: Socionext Synquacer pre-ITS
    [    0.000000] ITS@0x0000000001820000: Devices Table too large, reduce ids 20->19
    [    0.000000] ITS@0x0000000001820000: allocated 524288 Devices @81400000 (flat, esz 8, psz 64K, shr 0)
    [    0.000000] ITS: using cache flushing for cmd queue
    [    0.000000] GICv3: using LPI property table @0x0000000080c30000
    [    0.000000] GIC: using cache flushing for LPI property table
    [    0.000000] GICv3: CPU0: using allocated LPI pending table @0x0000000080c40000
    [    0.000000] NO_HZ: Full dynticks CPUs: 1.
    [    0.000000] rcu:     Offload RCU callbacks from CPUs: 1.
    [    0.000000] rcu:     Poll for callbacks from no-CBs CPUs.
    [    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
    [    0.000000] arch_timer: cp15 timer(s) running at 200.00MHz (phys).
    [    0.000000] clocksource: arch_sys_counter: mask: 0x3ffffffffffffff max_cycles: 0x2e2049d3e8, max_idle_ns: 440795210634 ns
    [    0.000000] sched_clock: 58 bits at 200MHz, resolution 5ns, wraps every 4398046511102ns
    [    0.000259] Console: colour dummy device 80x25
    [    0.000314] Calibrating delay loop (skipped), value calculated using timer frequency.. 400.00 BogoMIPS (lpj=200000)
    [    0.000325] pid_max: default: 32768 minimum: 301
    [    0.000393] LSM: initializing lsm=capability
    [    0.000543] Mount-cache hash table entries: 1024 (order: 1, 8192 bytes, linear)
    [    0.000554] Mountpoint-cache hash table entries: 1024 (order: 1, 8192 bytes, linear)
    [    0.002366] rcu: Hierarchical SRCU implementation.
    [    0.002380] rcu:     Max phase no-delay instances is 400.
    [    0.069611] Timer migration: 1 hierarchy levels; 8 children per group; 1 crossnode level
    [    0.079098] EFI services will not be available.
    [    0.082170] smp: Bringing up secondary CPUs ...
    [    0.137286] Detected VIPT I-cache on CPU1
    [    0.137382] GICv3: CPU1: found redistributor 1 region 0:0x0000000001860000
    [    0.137398] GICv3: CPU1: using allocated LPI pending table @0x0000000080c50000
    [    0.137449] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
    [    0.151929] smp: Brought up 1 node, 2 CPUs
    [    0.151937] SMP: Total of 2 processors activated.
    [    0.151940] CPU: All CPU(s) started at EL2
    [    0.151944] CPU features: detected: 32-bit EL0 Support
    [    0.151948] CPU features: detected: CRC32 instructions
    [    0.151980] alternatives: applying system-wide alternatives
    [    0.152697] Memory: 448340K/524288K available (9408K kernel code, 1110K rwdata, 3096K rodata, 2048K init, 597K bss, 36676K reserved, 32768K cma-reserved)
    [    0.205687] devtmpfs: initialized
    [    0.594406] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
    [    0.594429] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
    [    0.612459] 28656 pages in range for non-PLT usage
    [    0.612478] 520176 pages in range for PLT usage
    [    0.612729] pinctrl core: initialized pinctrl subsystem
    [    0.628354] DMI not present or invalid.
    [    0.628922] NET: Registered PF_NETLINK/PF_ROUTE protocol family
    [    0.639983] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
    [    0.647468] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
    [    0.647835] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
    [    0.647928] audit: initializing netlink subsys (disabled)
    [    0.648158] audit: type=2000 audit(0.645:1): state=initialized audit_enabled=0 res=1
    [    0.678259] thermal_sys: Registered thermal governor 'step_wise'
    [    0.678538] ASID allocator initialised with 65536 entries
    [    0.694854] /bus@f0000/interrupt-controller@1800000: Fixed dependency cycle(s) with /bus@f0000/interrupt-controller@1800000
    [    0.694949] /bus@f0000/dwc3-usb@f900000/usb@31000000: Fixed dependency cycle(s) with /bus@f0000/i2c@20000000/tps6598x@20/connector
    [    0.694982] /bus@f0000/i2c@20000000/tps6598x@20/connector: Fixed dependency cycle(s) with /bus@f0000/dwc3-usb@f900000/usb@31000000
    [    0.733743] /bus@f0000/dwc3-usb@f900000/usb@31000000: Fixed dependency cycle(s) with /bus@f0000/i2c@20000000/tps6598x@20/connector
    [    0.733904] /bus@f0000/i2c@20000000/tps6598x@20/connector: Fixed dependency cycle(s) with /bus@f0000/dwc3-usb@f900000/usb@31000000
    [    0.760666] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
    [    0.760680] HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
    [    0.760686] HugeTLB: registered 32.0 MiB page size, pre-allocated 0 pages
    [    0.760688] HugeTLB: 0 KiB vmemmap can be freed for a 32.0 MiB page
    [    0.760692] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
    [    0.760695] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
    [    0.760700] HugeTLB: registered 64.0 KiB page size, pre-allocated 0 pages
    [    0.760702] HugeTLB: 0 KiB vmemmap can be freed for a 64.0 KiB page
    [    0.815362] k3-chipinfo 43000014.chipid: Family:AM62LX rev:SR1.1 JTAGID[0x1bba702f] Detected
    [    0.825203] iommu: Default domain type: Translated
    [    0.825216] iommu: DMA domain TLB invalidation policy: strict mode
    [    0.825625] SCSI subsystem initialized
    [    0.826150] usbcore: registered new interface driver usbfs
    [    0.846922] usbcore: registered new interface driver hub
    [    0.846968] usbcore: registered new device driver usb
    [    0.858114] pps_core: LinuxPPS API ver. 1 registered
    [    0.858119] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
    [    0.858133] PTP clock support registered
    [    0.858171] EDAC MC: Ver: 3.0.0
    [    0.868443] scmi_core: SCMI protocol bus registered
    [    0.885381] FPGA manager framework
    [    0.890142] vgaarb: loaded
    [    0.893348] clocksource: Switched to clocksource arch_sys_counter
    [    0.899924] VFS: Disk quotas dquot_6.6.0
    [    0.899954] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
    [    0.918450] NET: Registered PF_INET protocol family
    [    0.923691] IP idents hash table entries: 8192 (order: 4, 65536 bytes, linear)
    [    0.924363] tcp_listen_portaddr_hash hash table entries: 256 (order: 1, 10240 bytes, linear)
    [    0.924388] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
    [    0.924397] TCP established hash table entries: 4096 (order: 3, 32768 bytes, linear)
    [    0.924465] TCP bind hash table entries: 4096 (order: 6, 327680 bytes, linear)
    [    0.945777] TCP: Hash tables configured (established 4096 bind 4096)
    [    0.945939] UDP hash table entries: 256 (order: 2, 24576 bytes, linear)
    [    0.945978] UDP-Lite hash table entries: 256 (order: 2, 24576 bytes, linear)
    [    0.946201] NET: Registered PF_UNIX/PF_LOCAL protocol family
    [    0.946257] PCI: CLS 0 bytes, default 64
    [    0.996003] Initialise system trusted keyrings
    [    0.996209] workingset: timestamp_bits=62 max_order=17 bucket_order=0
    [    1.059394] Key type asymmetric registered
    [    1.059410] Asymmetric key parser 'x509' registered
    [    1.059487] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245)
    [    1.076225] io scheduler mq-deadline registered
    [    1.076230] io scheduler kyber registered
    [    1.076256] io scheduler bfq registered
    [    1.091461] pinctrl-single 4084000.pinctrl: 147 pins, size 588
    [    1.100887] ti-udma-am62l 485c4000.dma-controller: Number of rings: 144
    [    1.100971] ti-udma-am62l 485c4000.dma-controller: Channels: 144 (bchan: 16, tchan + rchan: 128)
    [    1.125418] ti-udma-am62l 485c0000.dma-controller: Number of rings: 112
    [    1.132336] ti-udma-am62l 485c0000.dma-controller: Channels: 97 (tchan + rchan: 97)
    [    1.150442] Serial: 8250/16550 driver, 12 ports, IRQ sharing enabled
    [    1.169601] loop: module loaded
    [    1.174014] megasas: 07.727.03.00-rc1
    [    1.180691] tun: Universal TUN/TAP device driver, 1.6
    [    1.180995] CAN device driver interface
    [    1.190912] VFIO - User Level meta-driver version: 0.3
    [    1.197458] usbcore: registered new interface driver usb-storage
    [    1.203893] UDC core: g_ether: couldn't find an available UDC
    [    1.204175] i2c_dev: i2c /dev entries driver
    [    1.217216] arm-scmi arm-scmi.0.auto: Using scmi_smc_transport
    [    1.217234] arm-scmi arm-scmi.0.auto: SCMI max-rx-timeout: 30ms
    [    1.217433] scmi_protocol scmi_dev.1: Enabled polling mode TX channel - prot_id:16
    [    1.237980] arm-scmi arm-scmi.0.auto: SCMI RAW Mode initialized for instance 0
    [    1.237995] arm-scmi arm-scmi.0.auto: SCMI RAW Mode COEX enabled !
    [    1.238083] arm-scmi arm-scmi.0.auto: SCMI Notifications - Core Enabled.
    [    1.238139] arm-scmi arm-scmi.0.auto: SCMI Protocol v2.0 'TI:' Firmware version 0x0
    [    1.289852] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
    [    1.299788] optee: probing for conduit method.
    [    1.299807] optee: api uid mismatch
    [    1.299813] optee firmware:optee: probe with driver optee failed with error -22
    [    1.300390] NET: Registered PF_INET6 protocol family
    [    1.322327] Segment Routing with IPv6
    [    1.322378] In-situ OAM (IOAM) with IPv6
    [    1.322441] NET: Registered PF_PACKET protocol family
    [    1.322467] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
    [    1.322471] can: controller area network core
    [    1.353178] NET: Registered PF_CAN protocol family
    [    1.353185] can: raw protocol
    [    1.353192] can: broadcast manager protocol
    [    1.353201] can: netlink gateway - max_hops=1
    [    1.353242] Key type dns_resolver registered
    [    1.382869] Loading compiled-in X.509 certificates
    [    1.444108] /bus@f0000/i2c@20000000/tps6598x@20/connector: Fixed dependency cycle(s) with /bus@f0000/dwc3-usb@f900000/usb@31000000
    [    1.444212] omap_i2c 20000000.i2c: bus 0 rev0.12 at 200 kHz
    [    1.476258] rtc-pcf8563 1-0051: registered as rtc0
    [    1.481849] omap_i2c 20020000.i2c: bus 1 rev0.12 at 400 kHz
    [    1.488792] printk: legacy console [ttyS0] disabled
    [    1.489238] 2800000.serial: ttyS0 at MMIO 0x2800000 (irq = 157, base_baud = 3000000) is a 8250
    [    1.503146] printk: legacy console [ttyS0] enabled
    [    1.503146] printk: legacy console [ttyS0] enabled
    [    1.503152] printk: legacy bootconsole [ns16550a0] disabled
    [    1.503152] printk: legacy bootconsole [ns16550a0] disabled
    [    1.529720] 2810000.serial: ttyS5 at MMIO 0x2810000 (irq = 158, base_baud = 3000000) is a 8250
    [    1.540438] 2820000.serial: ttyS3 at MMIO 0x2820000 (irq = 159, base_baud = 3000000) is a 8250
    [    1.551245] 2830000.serial: ttyS4 at MMIO 0x2830000 (irq = 160, base_baud = 3000000) is a 8250
    [    1.562314] spi-nand spi0.0: FORESEE SPI NAND was found.
    [    1.562329] spi-nand spi0.0: 128 MiB, block size: 128 KiB, page size: 2048, OOB size: 64
    [    1.576278] 5 fixed-partitions partitions found on MTD device spi0.0
    [    1.576296] Creating 5 MTD partitions on "spi0.0":
    [    1.576304] 0x000000000000-0x000000080000 : "ospi_nand.tiboot3"
    [    1.596015] 0x000000080000-0x000000200000 : "ospi_nand.tispl"
    [    1.604360] 0x000000200000-0x000000400000 : "ospi_nand.u-boot"
    [    1.612991] 0x000000400000-0x000007fe0000 : "ospi_nand.rootfs"
    [    1.690053] 0x000007fe0000-0x000008000000 : "ospi_nand.phypattern"
    [    1.698191] cadence-qspi fc40000.spi: Pattern not found. Skipping calibration.
    [    1.708732] am65-cpsw-nuss 8000000.ethernet: initializing am65 cpsw nuss version 0x6BA00103, cpsw version 0x6BA80103 Ports: 3 quirks:00000006
    [    1.709175] am65-cpsw-nuss 8000000.ethernet: Use random MAC address
    [    1.748517] davinci_mdio 8000f00.mdio: davinci mdio revision 17.7, bus freq 1000000
    [    1.763823] davinci_mdio 8000f00.mdio: phy[2]: device 8000f00.mdio:02, driver TI DP83826C
    [    1.763839] davinci_mdio 8000f00.mdio: phy[4]: device 8000f00.mdio:04, driver TI DP83826C
    [    1.764029] am65-cpsw-nuss 8000000.ethernet: initialized cpsw ale version 1.5
    [    1.764037] am65-cpsw-nuss 8000000.ethernet: ALE Table size 512, Policers 32
    [    1.780556] am65-cpsw-nuss 8000000.ethernet: CPTS ver 0x4e8a010d, freq:200000000, add_val:4 pps:0
    [    1.811087] am65-cpsw-nuss 8000000.ethernet: set new flow-id-base 96
    [    1.822222] /bus@f0000/i2c@20000000/tps6598x@20/connector: Fixed dependency cycle(s) with /bus@f0000/dwc3-usb@f900000/usb@31000000
    [    1.822389] /bus@f0000/dwc3-usb@f900000/usb@31000000: Fixed dependency cycle(s) with /bus@f0000/i2c@20000000/tps6598x@20/connector
    [    1.849745] ubi0: attaching mtd3
    [    1.856597] g_ether gadget.0: HOST MAC 26:23:27:ae:73:36
    [    1.856686] g_ether gadget.0: MAC 9e:73:0a:30:43:33
    [    1.856889] g_ether gadget.0: Ethernet Gadget, version: Memorial Day 2008
    [    1.856895] g_ether gadget.0: g_ether ready
    [    2.479788] ubi0: scanning is finished
    [    2.497197] ubi0: attached mtd3 (name "ospi_nand.rootfs", size 123 MiB)
    [    2.497216] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
    [    2.497222] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
    [    2.497227] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
    [    2.497232] ubi0: good PEBs: 991, bad PEBs: 0, corrupted PEBs: 0
    [    2.497237] ubi0: user volume: 1, internal volumes: 1, max. volumes count: 128
    [    2.497242] ubi0: max/mean erase counter: 25/10, WL threshold: 4096, image sequence number: 502254411
    [    2.497247] ubi0: available PEBs: 0, total reserved PEBs: 991, PEBs reserved for bad PEB handling: 20
    [    2.497260] ubi0: background thread "ubi_bgt0d" started, PID 91
    [    2.498713] clk: Disabling unused clocks
    [    2.570599] PM: genpd: Disabling unused power domains
    [    2.577436] UBIFS (ubi0:0): Mounting in unauthenticated mode
    [    2.583255] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 92
    [    2.628326] UBIFS (ubi0:0): recovery needed
    [    2.824426] UBIFS (ubi0:0): recovery completed
    [    2.829045] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0, name "rootfs"
    [    2.829056] UBIFS (ubi0:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
    [    2.829062] UBIFS (ubi0:0): FS size: 112881664 bytes (107 MiB, 889 LEBs), max 900 LEBs, journal size 9023488 bytes (8 MiB, 72 LEBs)
    [    2.829071] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
    [    2.829075] UBIFS (ubi0:0): media format: w4/r0 (latest is w5/r0), UUID E66B7C49-4066-461D-B3E4-D352BA294457, small LPT model
    [    2.857599] VFS: Mounted root (ubifs filesystem) on device 0:21.
    [    2.882918] devtmpfs: mounted
    [    2.886060] Freeing unused kernel memory: 2048K
    [    2.886185] Run /sbin/init as init process
    [    3.570649] udevd[120]: starting version 3.2.10
    [    3.765544] random: crng init done
    [    3.841959] udevd[121]: starting eudev-3.2.10
    [    4.233416] mtdblock: MTD device 'ospi_nand.tiboot3' is NAND, please consider using UBI block devices instead.
    [    4.244383] mtdblock: MTD device 'ospi_nand.tispl' is NAND, please consider using UBI block devices instead.
    [    4.261780] mtdblock: MTD device 'ospi_nand.u-boot' is NAND, please consider using UBI block devices instead.
    [    4.277212] mtdblock: MTD device 'ospi_nand.rootfs' is NAND, please consider using UBI block devices instead.
    [    4.293186] mtdblock: MTD device 'ospi_nand.phypattern' is NAND, please consider using UBI block devices instead.
    [    4.629284] tps6598x: version magic '6.12.57-gaa918e1d7c65 SMP preempt_rt mod_unload aarch64' should be '6.12.57-gd9943b67bd86 SMP preempt_rt mod_unload aarch64'
    insmod: can't insert '/lib/modules/tps6598x.ko': invalid module [    4.645088] tps6598x: version magic '6.12.57-gaa918e1d7c65 SMP preempt_rt mod_unload aarch64' should be '6.12.57-gd9943b67bd86 SMP preempt_rt mod_unload aarch64'
    format
    INIT: Entering runlevel: 5
    Configuring network interfaces... [    5.125690] am65-cpsw-nuss 8000000.ethernet eth0: PHY [8000f00.mdio:02] driver [TI DP83826C] (irq=POLL)
    [    5.125714] am65-cpsw-nuss 8000000.ethernet eth0: configuring for phy/rmii link mode
    ifconfig: SIOCSIFNETMASK: Cannot assign requested address
    route: SIOCADDRT: Network is unreachable
    [    5.190360] am65-cpsw-nuss 8000000.ethernet eth1: PHY [8000f00.mdio:04] driver [TI DP83826C] (irq=POLL)
    [    5.190386] am65-cpsw-nuss 8000000.ethernet eth1: configuring for phy/rmii link mode
    
    
    root@am62xx-evm:~# dmesg | grep cma
    [    0.000000] cma: Reserved 32 MiB at 0x000000009d600000 on node -1
    [    0.152697] Memory: 448340K/524288K available (9408K kernel code, 1110K rwdata, 3096K rodata, 2048K init, 597K bss, 36676K reserved, 32768K cma-reserved)
    ```

  • `[  173.118196] cma: __cma_alloc: reserved: alloc failed, req-size: 1024 pages, ret: -12`

    When does this message get printed? When you start your SPI test application, or at some other point?

  • This log appears a short while after the test program is started. From then on, every time the test program is started, it keeps printing this log until the system is rebooted.
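    For scale: with the 4 KiB pages this kernel uses (the boot log shows an order-0 allocation of 4096 bytes), the failing request of 1024 pages is 4 MiB, one eighth of the 32 MiB CMA area reserved at boot, and `ret: -12` is `-ENOMEM`. Because the message keeps recurring until reboot, one way to check whether CMA memory is being leaked is to sample the `CmaTotal` and `CmaFree` fields of `/proc/meminfo` (present when the kernel is built with CMA) before, during, and after a test run. The sketch below does that; `meminfo_kb` and `print_cma_usage` are hypothetical helpers, not part of the attached test program.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Return the value (in kB) of a "Key:   <n> kB" line in
     * /proc/meminfo-style text, or -1 if the key is not present. */
    long meminfo_kb(const char *text, const char *key)
    {
        size_t klen;
        const char *p = text;

        assert(text != NULL && key != NULL);
        klen = strlen(key);
        while (p && *p) {
            /* Match the key at the start of a line, followed by ':' */
            if (strncmp(p, key, klen) == 0 && p[klen] == ':')
                return strtol(p + klen + 1, NULL, 10);
            p = strchr(p, '\n');
            if (p)
                p++;
        }
        return -1;
    }

    /* Read /proc/meminfo and print how much of the CMA area is in use.
     * Returns 0 on success, -1 if the file or the CMA fields are missing. */
    int print_cma_usage(void)
    {
        char buf[8192];
        FILE *f = fopen("/proc/meminfo", "r");
        size_t n;
        long total, free_kb;

        if (!f)
            return -1;
        n = fread(buf, 1, sizeof(buf) - 1, f);
        fclose(f);
        buf[n] = '\0';

        total = meminfo_kb(buf, "CmaTotal");
        free_kb = meminfo_kb(buf, "CmaFree");
        if (total < 0 || free_kb < 0)
            return -1;
        printf("CMA: %ld kB used of %ld kB\n", total - free_kb, total);
        return 0;
    }
    ```

    If the "used" figure climbs across test runs and never returns to its pre-test baseline, that would support the memory-leak interpretation; a stable figure would point elsewhere.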

  • Hi Jiannan,
    I have been trying to reproduce the issue at my end using your application on the default 11.02 SDK.

    With or without the patch given above, if I apply only the following diff to the EVM device tree:

    ```diff
    diff --git a/arch/arm64/boot/dts/ti/k3-am62l3-evm.dts b/arch/arm64/boot/dts/ti/k3-am62l3-evm.dts
    index 03a623418469..308052c1272b 100644
    --- a/arch/arm64/boot/dts/ti/k3-am62l3-evm.dts
    +++ b/arch/arm64/boot/dts/ti/k3-am62l3-evm.dts
    @@ -391,6 +391,15 @@ AM62LX_IOPAD(0x0188, PIN_INPUT, 0) /* (A9) MCASP0_AXR1 */
     		>;
     	};
     
    +	main_spi1_pins_default: main-spi1-pins-default {
    +		pinctrl-single,pins = <
    +			AM62LX_IOPAD(0x008c, PIN_INPUT, 4) /* (H22) SPI1_CLK */
    +			AM62LX_IOPAD(0x0080, PIN_INPUT, 4) /* (K22) SPI1_D0 */
    +			AM62LX_IOPAD(0x0084, PIN_INPUT, 4) /* (J23) SPI1_D1 */
    +			AM62LX_IOPAD(0x0088, PIN_INPUT, 4) /* (K23) SPI1_CS0 */
    +		>;
    +	};
    +
     	pmic_irq_pins_default: pmic-irq-default-pins {
     		pinctrl-single,pins = <
     			AM62LX_IOPAD(0x01e8, PIN_INPUT, 0) /* (C8) EXTINTn */
    @@ -987,3 +996,35 @@ &mcasp0 {
     &wkup_uart0_interconnect {
     	status = "okay";
     };
    +
    +&main_spi1 {
    +	status = "okay";
    +	pinctrl-names = "default";
    +	pinctrl-0 = <&main_spi1_pins_default>;
    +	bootph-all;
    +	ti,pindir-d0-out-d1-in;
    +
    +	dmas = <&main_bcdma 0 0 0xc300 0>, <&main_bcdma 0 0 0x4300 0>;
    +	dma-names = "tx0", "rx0";
    +	
    +	spidev@0 {
    +		spi-max-frequency = <24000000>;
    +		reg = <0>;
    +		compatible = "rohm,dh2228fv";
    +		spi-cs-setup-delay-ns = <5000>;
    +		spi-cs-hold-delay-ns = <5000>;
    +		spi-cs-inactive-delay-ns = <5000>;
    +	};
    +}; 
    +
    +
    +&main_i2c1 {
    +	gpio@23 {
    +		fet_sel {
    +			gpio-hog;
    +			gpios = <1 GPIO_ACTIVE_HIGH>;
    +			output-high;
    +			line-name = "VOUT0_FET_SEL0";
    +		};
    +	};
    +};
    \ No newline at end of file
    ```

    and make no other changes (no DMA driver patches applied), then build and run your example on the target, it still hangs.

    You mentioned earlier that without this patch you see high CPU load but userspace does not hang, and that it only hangs after applying the second patch. Is that correct?

    Am I missing something while trying to replicate your setup?

  • Without any patches, I observe very high CPU usage when DMA is disabled; enabling DMA noticeably improves this, although the transfer timing shows substantial jitter. In both cases the system remains functional and does not hang.

    After applying the patches, the hang can be reproduced. To make it easier to observe, I changed the test program's scheduling policy from FIFO to OTHER, which lets me see exceptionally high CPU usage instead of a hang when the problem occurs.

    Do you mean that you can reproduce the issue on the EVM without applying any patches? I have noticed that certain kernel debug configurations may affect how the problem manifests. For example, the `__cma_alloc` log shown earlier does not appear when debug features such as the tracer are enabled, although the system still hangs.
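    The FIFO versus OTHER comparison above comes down to the scheduling policy of the thread that calls `ioctl()`. Below is a minimal sketch of switching a thread between the two with the standard `pthread_setschedparam()` call; the helper names are hypothetical, and the attached test program may set its policy differently (the original post uses FIFO priority 55). One plausible reason the hang depends on the policy, not confirmed in this thread: on PREEMPT_RT, interrupt handlers run in kernel threads that default to SCHED_FIFO priority 50, so if they share a CPU, a priority-55 FIFO task busy-waiting in the kernel for DMA completion can starve the very interrupt thread that would complete it, whereas a SCHED_OTHER task is preempted by that thread and merely burns CPU.

    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    /* Set the calling thread's scheduling policy and priority.
     * SCHED_OTHER requires priority 0; SCHED_FIFO accepts 1..99 and
     * normally needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO.
     * Returns 0 on success or an errno value. */
    int set_thread_policy(int policy, int priority)
    {
        struct sched_param param;

        memset(&param, 0, sizeof(param));
        param.sched_priority = priority;
        return pthread_setschedparam(pthread_self(), policy, &param);
    }

    /* Print the calling thread's current policy and priority. */
    void print_thread_policy(void)
    {
        struct sched_param param;
        int policy;

        if (pthread_getschedparam(pthread_self(), &policy, &param) != 0)
            return;
        printf("policy=%s priority=%d\n",
               policy == SCHED_FIFO ? "SCHED_FIFO" :
               policy == SCHED_OTHER ? "SCHED_OTHER" : "other",
               param.sched_priority);
    }
    ```

    Calling `set_thread_policy(SCHED_FIFO, 55)` at startup corresponds to the configuration that hangs; `set_thread_policy(SCHED_OTHER, 0)` corresponds to the variant that shows high CPU usage instead.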

  • Hi Jiannan,
    The earlier problem in my setup was a DMA thread number issue. It works now, even with the patch applied (I see low CPU usage).

    To replicate your issue: you mentioned that you do not see it every time, so when do you see it? After multiple reboots and re-running the application, or when re-running the application within the same boot? Typically, how long after boot does it take for you to see the hang or kernel crash?

  • Hi Divyansh,

    It was dma thread number issue earlier. Works now, even with the patch applied (I see low CPU usage).

    Could you please clarify what exactly the DMA thread number issue refers to? I am not entirely clear on that, as this point does not seem to have been brought up previously. I have not configured anything for the kernel's DMA interrupt threads.

    Regarding the reproduction of the problem, the procedure is straightforward: after applying the first patch, simply run the test program directly. In most cases, the issue reproduces immediately, but occasionally the program runs normally without any failure. In such cases, you can exit and rerun the program to trigger the issue, or reboot the system. For the second patch, you need to let it run for a longer period (about 1 to 2 minutes) before the cma log is printed.

  • Hi Jiannan,
    I am using spi1 instead of spi0 in my setup, which requires changing the DMA thread IDs in the `dmas` property. You should not need any change if you are using spi0.

    I used the same boot (no reboot) with the second patch applied and saw no crash logs for over 3 hours. I have run your code example for its full length 7-8 times and still see no issues. CPU usage on each of the two cores is around 5-6% while the application is running.
    I see the following in each of the 7-8 runs I did:

    ```
    root@am62lxx-evm:~/spi_mcu# ./spi_mcu_arm64.elf
    [INFO] Start
    [INFO] spi_mcu_init()
    [WARN] too long trace_jitter.csv! exit...
    [INFO] spi_mcu_deinit()
    [INFO] Exit
    root@am62lxx-evm:~/spi_mcu#
    ```

    I also see SPI0_CLK and SPI0_D0 output while the application is running in the above case.
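    Per-core figures such as the 5-6% quoted here and the very high usage reported with SCHED_OTHER can be compared on the same basis by sampling `/proc/stat` twice. A minimal sketch that computes the busy percentage of one `cpuN` line between two samples, counting idle and iowait as idle and the remaining first eight fields as busy; the function names are hypothetical, and tools such as `top` or `mpstat` report equivalent numbers.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Cumulative busy and total time (in clock ticks) for one CPU. */
    struct cpu_sample {
        unsigned long long busy;
        unsigned long long total;
    };

    /* Parse a "cpuN user nice system idle iowait irq softirq steal" line
     * from /proc/stat. Returns 0 on success, -1 if the line is malformed. */
    int parse_cpu_line(const char *line, struct cpu_sample *s)
    {
        unsigned long long v[8] = {0};
        unsigned long long idle, total = 0;
        char name[16];
        int i;
        int n = sscanf(line, "%15s %llu %llu %llu %llu %llu %llu %llu %llu",
                       name, &v[0], &v[1], &v[2], &v[3],
                       &v[4], &v[5], &v[6], &v[7]);

        /* Need the name plus at least user, nice, system, idle */
        if (n < 5 || strncmp(name, "cpu", 3) != 0)
            return -1;
        for (i = 0; i < 8; i++)
            total += v[i];
        idle = v[3] + v[4]; /* idle + iowait */
        s->busy = total - idle;
        s->total = total;
        return 0;
    }

    /* Busy percentage of one CPU between an earlier and a later sample. */
    double cpu_busy_percent(const struct cpu_sample *a, const struct cpu_sample *b)
    {
        unsigned long long dt = b->total - a->total;

        if (dt == 0)
            return 0.0;
        return 100.0 * (double)(b->busy - a->busy) / (double)dt;
    }
    ```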

  • Hi Divyansh,

    It appears the issue was not reproduced. Could you please try several more times? It should not take several hours: the issue is triggered by starting the program multiple times rather than by running it once for a long time.

    The file trace_jitter.csv logs certain runtime parameters of the SPI:

    • Column 1: timestamp;

    • Columns 2–4: transmission times, representing the current, maximum, and minimum transmission durations, respectively;

    • Columns 5–7: jitter times, representing the current, maximum, and minimum jitter durations, respectively.

    My goal is for the transmission time to stay as stable as possible and for the maximum jitter to be as small as possible. These statistics need a long run to be accurate.
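    A minimal sketch of reading one row of `trace_jitter.csv` with the column layout described above and checking it against limits. It assumes the file is plain comma-separated numbers with times in microseconds and no header row; the struct, function names, and limits are hypothetical, since the actual file format produced by the test program is not shown in this thread.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* One row: timestamp, exec cur/max/min, jitter cur/max/min.
     * Units are assumed to be microseconds. */
    struct jitter_row {
        double timestamp;
        double exec_cur, exec_max, exec_min;
        double jit_cur, jit_max, jit_min;
    };

    /* Parse a comma-separated row; returns 0 on success, -1 otherwise. */
    int parse_jitter_row(const char *line, struct jitter_row *r)
    {
        int n = sscanf(line, "%lf,%lf,%lf,%lf,%lf,%lf,%lf",
                       &r->timestamp,
                       &r->exec_cur, &r->exec_max, &r->exec_min,
                       &r->jit_cur, &r->jit_max, &r->jit_min);
        return n == 7 ? 0 : -1;
    }

    /* Return 1 if the accumulated extremes stay within the limits:
     * execution-time spread (max - min) and worst-case |jitter|. */
    int within_limits(const struct jitter_row *r,
                      double max_exec_spread_us, double max_abs_jitter_us)
    {
        double worst = r->jit_max > -r->jit_min ? r->jit_max : -r->jit_min;

        return (r->exec_max - r->exec_min) <= max_exec_spread_us &&
               worst <= max_abs_jitter_us;
    }
    ```

    For example, the post-modification figures from the original report (execution time 1641 to 2093 µs, jitter -248 to 200 µs) give a spread of 452 µs and a worst-case jitter of 248 µs.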

  • Hi Divyansh, Jiannan,

    There is a misunderstanding in this discussion. Divyansh's latest test was done with Bin's new patch, not the original one.

    Regarding Jiannan's feedback: with the original patch, the hang reproduces easily after a few attempts, with no need to wait. Only with Bin's new patch do they see an error log, after 1 to 2 minutes. So these are two different problems.

    Divyansh, I suggest first trying the original patch to see whether you get the same hang.

    Thanks,

    Kevin

  • I briefly compared the spi-omap2-mcspi.c file from the original patch with the one from the new patch, and there are only two differences:

    1. The original patch changes DMA_MIN_BYTES to 12, whereas the new patch keeps the macro at its default of 160.

    2. The do-while loop that Kevin pointed out, a busy-wait loop and a likely cause of the hang, is removed in the new patch.

    ```c
    do {
            dma_status = dmaengine_tx_status(mcspi_dma->dma_rx, dma_rx_cookie,
                                             &mcspi_dma_rxstate);
    } while (dma_status != DMA_COMPLETE);
    ```
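    The risk in this loop is that it has no exit other than `DMA_COMPLETE`: if completion is delayed or never signalled, the calling thread spins forever. In kernel code the usual alternatives are to sleep on a completion with a timeout (for example `wait_for_completion_timeout()`) or to bound the poll (for example the `read_poll_timeout()` family in `<linux/iopoll.h>`). The user-space sketch below only illustrates the bounded-poll pattern with a deadline on `CLOCK_MONOTONIC`; `poll_until_done` and `countdown_done` are hypothetical names, not driver code.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>
    #include <time.h>

    /* Monotonic time in microseconds. */
    static long long now_us(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
    }

    /* Poll is_done(ctx) until it returns true or timeout_us elapses.
     * Unlike the unbounded do-while, this always terminates.
     * Returns 0 on completion, -ETIMEDOUT on timeout. */
    int poll_until_done(bool (*is_done)(void *ctx), void *ctx, long long timeout_us)
    {
        long long deadline = now_us() + timeout_us;

        for (;;) {
            if (is_done(ctx))
                return 0;
            if (now_us() >= deadline)
                /* Check once more so a completion racing with the
                 * deadline is not reported as a timeout. */
                return is_done(ctx) ? 0 : -ETIMEDOUT;
        }
    }

    /* Example predicate standing in for "status == DMA_COMPLETE":
     * reports completion once the counter in ctx reaches zero. */
    bool countdown_done(void *ctx)
    {
        int *remaining = ctx;

        return --(*remaining) <= 0;
    }
    ```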

  • Hi Bin,

    I have discussed this with the customer; here are some clarifications:

    1. The customer themselves identified the do-while loop in the original patch as suspicious two days ago and removed it. The hang disappeared as expected, but a new issue appeared: the `cma_alloc` error log, which suggests a kernel memory leak.

    2. From the new patch you shared yesterday, they only adopted the removal of the do-while loop and kept DMA_MIN_BYTES at 12 instead of 160. The customer considers 160 too large: their CPU load is already high and they want to use DMA as much as possible. So they tested your new patch with DMA_MIN_BYTES set to 12, and the result was the same as in point 1.

    3. With your new patch and DMA_MIN_BYTES set to 12, the customer found that once the tracer debug option is enabled, the `cma_alloc` error log disappears, but the test program can no longer exit normally (Ctrl+C does not terminate it).

    The customer knows that Divyansh just tested the EVM with your new patch and DMA_MIN_BYTES at 160 and saw no issues. They would like us to also test the original patch: if we see the hang with it, that would show that your new patch with DMA_MIN_BYTES at 160 actually solves the problem, rather than the EVM simply being unable to reproduce it. Meanwhile, the customer will test your new patch with DMA_MIN_BYTES at 160 (not 12) tomorrow.

    Thanks,

    Kevin

  • Hi Bin,

    The DMA_MIN_BYTES parameter does have a significant impact; my earlier change to it was based on the linked resource: [FAQ] AM6x: Optimizing SPI-transfer inter-byte gaps using the DMA in Linux. I applied your patch while keeping DMA_MIN_BYTES at 160 and observed no further CMA-related log entries; the latency jitter stayed around 300 µs. However, when I changed the data length in spi_mcu_output_test() of the test program from 1000 to 159, the ioctl call failed immediately; with 160 it executed normally.

    ```
    [INFO] Start
    [INFO] spi_mcu_init()
    [ERROR] ioctl(SPI_IOC_MESSAGE(2))
    [ERROR] output test FAIL!
    [ERROR] ioctl(SPI_IOC_MESSAGE(2))
    [ERROR] output test FAIL!
    ```

    As a workaround, we currently plan to pad the data with zeros to 160 bytes and will continue product testing with this approach. We are using Codesys, and there have been reports that SPI traffic on core 0 may also affect EtherCAT on core 1; further testing will determine whether this problem persists.
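    The zero-padding workaround amounts to never submitting a transfer shorter than the driver's DMA threshold, since in spi-omap2-mcspi transfers shorter than DMA_MIN_BYTES are done in PIO mode rather than DMA. A minimal sketch, assuming the threshold stays at 160 and that the MCU on the other end ignores trailing zero bytes; padding also lengthens short frames on the wire by roughly 1.45 µs per extra byte at 5.5 MHz. `pad_spi_frame` and `SPI_DMA_MIN_BYTES` are hypothetical names; the returned length is what would go into the `len` field of `struct spi_ioc_transfer`, and the receive buffer must be at least as large.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Assumed to match DMA_MIN_BYTES in the driver (hypothetical name). */
    #define SPI_DMA_MIN_BYTES 160u

    /* Zero-pad buf[len..] up to SPI_DMA_MIN_BYTES so the transfer is long
     * enough to take the DMA path. cap is the buffer capacity in bytes.
     * Returns the length to submit, or 0 if buf is too small. */
    size_t pad_spi_frame(uint8_t *buf, size_t len, size_t cap)
    {
        size_t out = len < SPI_DMA_MIN_BYTES ? SPI_DMA_MIN_BYTES : len;

        if (out > cap)
            return 0;
        if (out > len)
            memset(buf + len, 0, out - len);
        return out;
    }
    ```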

    With respect to the original patch, after changing DMA_MIN_BYTES to 160, multiple test runs did not reproduce the hang issue.