AM62x: Linux UART driver with DMA support

Hi,

We are using the TI chipset below for our project.

SoC:   AM62X SR1.0 GP
Model: Texas Instruments K3 AM625 SoC

I have modified the device tree to use Packet DMA (PKTDMA) for the UART.

&main_uart0 {
pinctrl-names = "default";
pinctrl-0 = <&main_uart0_pins_default>;
dmas = <&main_pktdma 0x4400 0>, <&main_pktdma 0xc400 0>;
dma-names = "rx", "tx";
};

Since CONFIG_SERIAL_8250_DMA is enabled, the RX DMA path used is omap_8250_rx_dma() in 8250_omap.c, not serial8250_rx_dma() in 8250_dma.c.

3.2.2.12. UART — Processor SDK AM62x Documentation

Using printk, I can see that the RX DMA is only set up once the FIFO already has data, and that this is done from the UART IRQ handler.

static void am654_8250_handle_rx_dma(struct uart_8250_port *up, u8 iir,
				     unsigned char status)
{
	/*
	 * Queue a new transfer if FIFO has data.
	 */
	if ((status & (UART_LSR_DR | UART_LSR_BI)) &&
	    (up->ier & UART_IER_RDI)) {
		omap_8250_rx_dma(up);
		serial_out(up, UART_OMAP_EFR2, UART_OMAP_EFR2_TIMEOUT_BEHAVE);
	}
	/* ... RX timeout handling elided ... */
}

I would like to understand whether the DMA operation below is possible on this SoC.

A one-time setup of the UART RX DMA, after which the UART peripheral issues the DMA request to start a transfer whenever the RX FIFO has data, without involving the CPU.

The DMA controller would then keep advancing to the next buffer descriptor until the end and, in cyclic mode, wrap back to the first buffer descriptor.

Of course, the DMA controller needs a way to report progress, i.e. how much data has been transferred so far.
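
In Linux dmaengine terms, I imagine it would look roughly like the sketch below (purely illustrative, assuming the channel supported cyclic mode; "chan", "buf_dma", BUF_SIZE and PERIOD_SIZE are placeholders and error handling is omitted):

struct dma_async_tx_descriptor *desc;
struct dma_tx_state state;
dma_cookie_t cookie;

/* One-time setup: a cyclic descriptor covering the whole ring buffer. */
desc = dmaengine_prep_dma_cyclic(chan, buf_dma, BUF_SIZE, PERIOD_SIZE,
				 DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT);
cookie = dmaengine_submit(desc);
dma_async_issue_pending(chan);

/* Progress reporting: poll the residue instead of taking an interrupt
 * for every FIFO burst; bytes written in the current lap of the ring
 * would be BUF_SIZE - state.residue. */
dmaengine_tx_status(chan, cookie, &state);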

The reason for this requirement is that our measurement data will be streaming continuously from the measurement hardware at a 3.125 Mbps baud rate.

The product needs to sustain 50,000 readings per second, so we do not want to generate 50,000 interrupts per second on the system.

rgds,

kc Wong

  • Hi Kiung,

    Thanks for your query.

    Please expect responses in a few business days.

    Thanks,

    Vaibhav

  • Hello Kiung,

    Please expect delayed responses due to the holidays.

    Thanks for your patience.

    Regards,

    Vaibhav

  • Hi,

    The current 8250_omap UART driver and UDMA driver do not support DMA cyclic mode for the UART interface.

  • Hi support,


    I spent some time tracing the Linux UART driver code in "drivers/tty/serial/8250/8250_omap.c".

    This is my understanding on how it is working.

    When the RX FIFO has data, an interrupt is generated, and only then is a DMA transfer queued in omap_8250_rx_dma() to move the data. At the same time, the RX FIFO interrupt is disabled and RX DMA is enabled.

    https://elixir.bootlin.com/linux/v5.10.145/source/drivers/tty/serial/8250/8250_omap.c#L923

    When the RX timeout interrupt is generated, omap_8250_rx_dma_flush() pauses the DMA if it is still in progress, then copies the data from the DMA buffer to the TTY buffer in __dma_rx_do_complete() and, of course, re-enables the RX FIFO interrupt.

    https://elixir.bootlin.com/linux/v5.10.145/source/drivers/tty/serial/8250/8250_omap.c#L898
    https://elixir.bootlin.com/linux/v5.10.145/source/drivers/tty/serial/8250/8250_omap.c#L810


    So the use model is: start the RX DMA transfer from the RX FIFO interrupt, and move the data from the DMA buffer to an intermediate buffer on the RX timeout interrupt.

    The question is: can we do the RX DMA setup, as in omap_8250_rx_dma(), before the RX FIFO has data? Or must the RX FIFO have data first before the RX DMA can be set up?

    I ask because, in my experience with DMA on other processors, the DMA can be set up before the RX data is actually available.
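
    On those processors the usual dmaengine pattern is to arm the channel first and let the peripheral's DMA request pace the transfer once data arrives, roughly like the sketch below ("chan", "buf" and "len" are placeholders, and error handling is omitted):

    struct scatterlist sg;
    struct dma_async_tx_descriptor *desc;

    sg_init_one(&sg, buf, len);
    dma_map_sg(chan->device->dev, &sg, 1, DMA_FROM_DEVICE);

    desc = dmaengine_prep_slave_sg(chan, &sg, 1, DMA_DEV_TO_MEM,
                                   DMA_PREP_INTERRUPT);
    dmaengine_submit(desc);
    dma_async_issue_pending(chan);   /* armed before any RX data arrives */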

    rgds,

    kc Wong

  • Hi,

    The question is: can we do the RX DMA setup, as in omap_8250_rx_dma(), before the RX FIFO has data?

    I think it is technically possible.

  • Hi Bin Liu,

    Let me explain why we are trying to create a custom serial DMA driver. Hopefully you can help.

    Our project is basically migrating a 10-year-old WinCE application that originally ran on the SPEAr320S.

     

    The SPEAr320S uses a UART at a 3.125 Mbps baud rate to communicate with the measurement board.

     

    The SPEAr320S has a PrimeCell UART (PL011) and a PrimeCell DMA Controller (PL080).

     

    The PL080 has a destination address register that tells the application where the DMA will write next; it basically acts as a DMA write pointer.

     

    A circular buffer is then set up that is accessible by both the application and the DMA for UART reception.

     

    So the application just has to poll that register at a 10 ms interval. The application knows there is new data in the buffer whenever the DMA write pointer differs from the application read pointer, roughly as in the sketch below.
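
    For illustration only, the poll boils down to a ring-buffer distance calculation like this sketch (the pointer values stand in for the register read and the application state; the names are not from the real application code):

    #include <stddef.h>
    #include <stdint.h>

    /* Bytes available between the application read pointer and the DMA
     * write pointer; both are physical addresses inside the ring. */
    static size_t ring_bytes_available(size_t ring_size,
                                       uintptr_t read_ptr, uintptr_t write_ptr)
    {
        if (write_ptr >= read_ptr)
            return write_ptr - read_ptr;
        /* The DMA write pointer has wrapped past the end of the ring. */
        return ring_size - (read_ptr - write_ptr);
    }

    The application then consumes that many bytes starting at its read pointer and advances the pointer, wrapping at the end of the ring.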

     

    This custom serial DMA driver exposes a set of IOCTL commands to the application. Its main purpose is to do UART reception with DMA, avoiding the overhead of the standard file APIs.

     

    Instead, the custom driver uses the DMA driver directly through the dmaengine slave APIs.

    IOCTL commands

    IOCTL_DMAC_ALLOCATE_MEMORY

    IOCTL_DMAC_FREE_MEMORY

    IOCTL_DMAC_START_DMA

    IOCTL_DMAC_WAIT_DMA_COMPLETE

    IOCTL_DMAC_DISABLE_DMA

    IOCTL_DMAC_ABORT_DMA

    IOCTL_DMAC_GET_DMA_STATUS

    The custom driver consists of two main parts:

    1. Memory allocation:
      (1) dma_alloc_coherent() function allocates non-cached physically contiguous memory.
      (2) dma_free_coherent()
    2. DMA operation control using DMA engine slave APIs:
      (1) dma_request_channel()
      (2) dma_release_channel()
      (3) dmaengine_prep_slave_sg()/dmaengine_prep_dma_cyclic()
      (4) dmaengine_submit()
      (5) dma_async_issue_pending()
      (6) dmaengine_tx_status()

     

    The first part is easy. The second part, however, is not as straightforward because of the differences between the DMA controllers.

    Attached at the end is the custom serial DMA driver for the ST SPEAr320S processor, which is working. We also got it working on the NXP i.MX6ULL processor.



    Now we are trying to do the same on the TI AM62x. Below is our progress so far.


    (A) I have added a new device node for the DMAC device in the device tree.

    	dmac {
    		compatible = "dmac_driver";
    		status = "okay";
    		dmas = <&main_pktdma 0x4400 0>, <&main_pktdma 0xc400 0>;
    		dma-names = "rx", "tx";    
    	};


    (B) The DMA channel is requested successfully with the code below.

    static int dmac_open(struct inode *inode, struct file *file)
    {
        struct device *dev = dmac_misc_device.this_device;
        dma_cap_mask_t mask;
        
        struct dma_slave_config rx_conf = {
            .src_addr = (UART_BASE + UART_RX),
            .src_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE,
            .direction = DMA_DEV_TO_MEM,
            .src_maxburst = 1,
            .device_fc = false,
        };
    
    dmac_t *p_dmac = devm_kzalloc(dev, sizeof(dmac_t), GFP_KERNEL);
    if (!p_dmac)
    {
        dev_err(dev, "Failed to allocate memory for dmac_t\n");
        return -ENOMEM;
    }
    
        p_dmac->dev = dev;
    
        file->private_data = p_dmac;
    
        dma_cap_zero(mask);
    dma_cap_set(DMA_SLAVE, mask);
    dma_cap_set(DMA_CYCLIC, mask);
    
        dma_set_coherent_mask(dev, DMA_BIT_MASK(32));
    
        p_dmac->chan = dma_request_chan(p_dmac->dev, "rx");
        if (IS_ERR_OR_NULL(p_dmac->chan))
        {
            dev_err(p_dmac->dev, "Failed to request DMA channel\n");
            return -1;
        }
    
        dev_dbg(p_dmac->dev, "open %s\n", dma_chan_name(p_dmac->chan));
    
        dmaengine_slave_config(p_dmac->chan, &rx_conf);
    
        p_dmac->uart_base = ioremap(UART_BASE, 1024);
        if (!p_dmac->uart_base)
        {
            dev_err(p_dmac->dev, "Failed to ioremap\n");
            return -1;
        }
    
        return 0;
    }

    (C) This is also shown in the dmaengine summary.

    root@p550:~# cat /sys/kernel/debug/dmaengine/summary
    dma0 (485c0100.dma-controller): number of channels: 48
    
    dma1 (485c0000.dma-controller): number of channels: 35
     dma1chan0 | dmac:rx (DEV_TO_MEM, rchan0 [0x4400 -> 0x9000], rflow0, PDMA, Packet mode)
    root@p550:~#

    (D) Now we are trying to start the RX DMA with the code below, but it does not seem to work. This is where we are now ... Hopefully you can help.

        case DMAC_START_DMA:
            {
                start_dma_arg_t k_arg;
    
                struct scatterlist sglist;
                struct dma_async_tx_descriptor *desc;
    
                unsigned int ier;
                unsigned int scr;
    
                if ( copy_from_user(&k_arg , (start_dma_arg_t *)arg, sizeof(k_arg)) )
                {
                    dev_err(p_dmac->dev, "Failed copy from user\n");
                    return -EFAULT;
                }
    
                dev_dbg(p_dmac->dev, "DMAC_START_DMA, size = %ld block_size = %ld physical = 0x%llx\n",
                        k_arg.i_size,
                        k_arg.i_block_size,
                        k_arg.i_physical_address);
    
                {
                    sg_init_table(&sglist, 1);
                    sg_set_page(&sglist, phys_to_page(k_arg.i_physical_address), k_arg.i_size, offset_in_page(k_arg.i_physical_address));
                    sg_dma_address(&sglist) = k_arg.i_physical_address;
                    sg_dma_len(&sglist) = k_arg.i_size;
    
                    desc = dmaengine_prep_slave_sg(p_dmac->chan,
                        &sglist,
                        1,
                        DMA_DEV_TO_MEM,
                        DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
                }
    
    
                if (!desc)
                {
                    dev_err(p_dmac->dev, "Failed to get a descriptor for transaction\n");
                    return -EBUSY;
                }
    
    
                desc->callback = dmac_rx_callback;
                desc->callback_param = p_dmac;
    
                p_dmac->cookie = dmaengine_submit(desc);
                if ( dma_submit_error(p_dmac->cookie) )
                {
                    dev_err(p_dmac->dev, "Failed to submit a descriptor for transaction\n");
                    return -EIO;
                }
                
                dma_async_issue_pending(p_dmac->chan);
    
                ier = readl(p_dmac->uart_base + (UART_IER << REG_SHIFT));
                scr = readl(p_dmac->uart_base + (UART_OMAP_SCR << REG_SHIFT));
    
                /*
                 * Disable RX interrupts to allow the RX DMA completion
                 * callback to run.
                 */
                ier &= ~(UART_IER_RLSI | UART_IER_RDI);
                scr = OMAP_UART_SCR_RX_TRIG_GRANU1_MASK | OMAP_UART_SCR_TX_EMPTY |
                            OMAP_UART_SCR_TX_TRIG_GRANU1_MASK;
                scr |= OMAP_UART_SCR_DMAMODE_1 | OMAP_UART_SCR_DMAMODE_CTL;
    
                writel(ier, p_dmac->uart_base + (UART_IER << REG_SHIFT));
    
                /*
                 * The manual recommends not to enable the DMA mode selector in the SCR
                 * (instead of the FCR) register _and_ selecting the DMA mode as one
                 * register write because this may lead to malfunction.
                 */
                writel(scr & ~OMAP_UART_SCR_DMAMODE_MASK, p_dmac->uart_base + (UART_OMAP_SCR << REG_SHIFT));
                writel(scr, p_dmac->uart_base + (UART_OMAP_SCR << REG_SHIFT));
    
                // p_dmac->physical_chan = dmac_find_physical_chan(p_dmac, p_dmac->data.dma_request);
                if ( p_dmac->physical_chan != INVALID_DMA_CHAN ) p_dmac->running = true;
    
                k_arg.o_dma_chan = p_dmac->physical_chan;
                
                if ( copy_to_user((start_dma_arg_t *)arg, &k_arg, sizeof(k_arg)) )
                {
                    dev_err(p_dmac->dev, "Failed copy to user\n");
                    return -EFAULT;
                }
            }
            break;

    rgds,

    kc Wong

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/fs.h>
    #include <linux/ioctl.h>
    #include <linux/miscdevice.h>
    
    #include <linux/dma-mapping.h>
    #include <linux/dmaengine.h>
    
    #include <linux/completion.h>
    
    // The SPEAr320S has a PrimeCell UART (PL011)
    // and a PrimeCell DMA Controller (PL080).
    #include <linux/amba/pl08x.h>
    #include <linux/amba/pl080.h>
    #include <linux/amba/serial.h>
    
    #include "dmac_driver.h"
    
    #define UART2_BASE 0xA4000000
    #define DMAC_BASE  0xFC400000
    
    // UART Peripheral identification
    #define UARTPeriphID0 0xFE0
    #define UARTPeriphID1 0xFE4
    #define UARTPeriphID2 0xFE8
    #define UARTPeriphID3 0xFEC
    
    // UART Cell identification
    #define UARTPCellID0 0xFF0
    #define UARTPCellID1 0xFF4
    #define UARTPCellID2 0xFF8
    #define UARTPCellID3 0xFFC
    
    // DMAC Peripheral identification
    #define DMACPeriphID0 0xFE0
    #define DMACPeriphID1 0xFE4
    #define DMACPeriphID2 0xFE8
    #define DMACPeriphID3 0xFEC
    
    // DMAC Cell identification
    #define DMACPCellID0 0xFF0
    #define DMACPCellID1 0xFF4
    #define DMACPCellID2 0xFF8
    #define DMACPCellID3 0xFFC
    
    typedef struct dmac
    {
        struct device *dev;
    
        // Memory allocation
        dma_addr_t dma_addr;
        void *cpu_addr;
    
        // DMA control
        struct dma_chan *chan;
        dma_cookie_t cookie;
        bool running;
        void __iomem *dmac_addr;
        int physical_chan;
    
        // IO control
        void __iomem *io_addr;
    
        // Synchronization
        struct completion complete;
    } dmac_t;
    
    static int __init dmac_init(void);
    static void __exit dmac_exit(void);
    
    static int dmac_open(struct inode *inode, struct file *file);
    static int dmac_release(struct inode *inode, struct file *file);
    static ssize_t dmac_read(struct file *filp, char __user *buf, size_t len,loff_t * off);
    static ssize_t dmac_write(struct file *filp, const char *buf, size_t len, loff_t * off);
    static long dmac_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
    static int dmac_mmap(struct file *file_p, struct vm_area_struct *vma);
    
    static void dmac_rx_callback(void *data);
    
    static int dmac_find_physical_chan(dmac_t *p_dmac, phys_addr_t src_addr);
    
    static void dmac_print_register(dmac_t *p_dmac);
    
    static struct file_operations dmac_fops =
    {
        .owner      = THIS_MODULE,
        .read       = dmac_read,
        .write      = dmac_write,
        .open       = dmac_open,
        .release    = dmac_release,
        .unlocked_ioctl = dmac_ioctl,
        .mmap	= dmac_mmap,
    };
    
    static struct miscdevice dmac_misc_device = {
        .minor = MISC_DYNAMIC_MINOR,
        .name = "dmac",
        .fops = &dmac_fops,
    };
    
    static int __init dmac_init(void)
    {
        int error;
    
        pr_info("misc dmac: init\n");
    
        error = misc_register(&dmac_misc_device);
        if (error) {
            pr_err("misc dmac: Failed to  register misc device\n");
            return error;
        }
    
        return 0;
    }
    
    static void __exit dmac_exit(void)
    {
        pr_info("misc dmac: exit\n");
    
        misc_deregister(&dmac_misc_device);
    }
    
    static int dmac_open(struct inode *inode, struct file *file)
    {
        struct device *dev = dmac_misc_device.this_device;
        dma_cap_mask_t mask;
        
        struct dma_slave_config rx_conf = {
            .src_addr = (UART2_BASE + UART01x_DR),
            .src_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE,
            .direction = DMA_DEV_TO_MEM,
            .src_maxburst = 1,
            .device_fc = false,
        };
    
        dmac_t *p_dmac = (dmac_t *) devm_kzalloc(dev, sizeof(dmac_t ), GFP_KERNEL);
        p_dmac->dev = dev;
        p_dmac->physical_chan = INVALID_DMA_CHAN;
    
        file->private_data = p_dmac;
    
        dma_cap_zero(mask);
        dma_cap_set(DMA_SLAVE, mask);
        dma_cap_set(DMA_CYCLIC, mask);
        
        // Request the DMA channel specific for UART2 RX.
        p_dmac->chan = dma_request_channel(mask, pl08x_filter_id, "uart2_rx");
        if (!p_dmac->chan)
        {
            dev_err(p_dmac->dev, "Failed to request DMA channel\n");
            return -1;
        }
    
        dev_dbg(p_dmac->dev, "open %s\n", dma_chan_name(p_dmac->chan));
    
        dmaengine_slave_config(p_dmac->chan, &rx_conf);
    
        p_dmac->io_addr = ioremap(UART2_BASE, 1024);
        if (!p_dmac->io_addr)
        {
            dev_err(p_dmac->dev, "Failed to ioremap\n");
            return -1;
        }
    
        p_dmac->dmac_addr = ioremap(DMAC_BASE, 1024);
        if (!p_dmac->dmac_addr)
        {
            dev_err(p_dmac->dev, "Failed to ioremap\n");
            return -1;
        }
    
        init_completion (&p_dmac->complete);
    
        return 0;
    }
    
    static int dmac_release(struct inode *inode, struct file *file)
    {
        dmac_t *p_dmac = (dmac_t *) file->private_data;
    
        if (p_dmac->chan) dma_release_channel(p_dmac->chan);
    
        if (p_dmac->io_addr) iounmap(p_dmac->io_addr);
    
        if (p_dmac->dmac_addr) iounmap(p_dmac->dmac_addr);
        
        dev_dbg(p_dmac->dev, "release\n");
        return 0;
    }
    
    static ssize_t dmac_read(struct file *file, char __user *buf, size_t len, loff_t *off)
    {
        dmac_t *p_dmac = (dmac_t *) file->private_data;
        
        dev_dbg(p_dmac->dev, "read\n");
        return 0;
    }
    
    static ssize_t dmac_write(struct file *file, const char __user *buf, size_t len, loff_t *off)
    {
        dmac_t *p_dmac = (dmac_t *) file->private_data;
        
        dev_dbg(p_dmac->dev, "write\n");
        return len;
    }
    
    static long dmac_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
    {
        dmac_t *p_dmac = (dmac_t *) file->private_data;
        
        switch (cmd)
        {
        case DMAC_ALLOCATE_MEMORY:
            {
                allocate_memory_arg_t k_arg;
    
                if ( copy_from_user(&k_arg , (allocate_memory_arg_t *)arg, sizeof(k_arg)) )
                {
                    dev_err(p_dmac->dev, "Failed copy from user\n");
                    return -EFAULT;
                }
    
                p_dmac->cpu_addr = dma_alloc_coherent(p_dmac->dev, k_arg.i_size, &p_dmac->dma_addr, GFP_KERNEL);
    
                k_arg.o_kernel_virtual_address = (uint32_t) p_dmac->cpu_addr;
                k_arg.o_physical_address = (uint32_t) p_dmac->dma_addr;
    
                dev_dbg(p_dmac->dev, "DMAC_ALLOCATE_MEMORY, size = %d virtual = 0x%x physical = 0x%x\n",
                        k_arg.i_size,
                        k_arg.o_kernel_virtual_address,
                        k_arg.o_physical_address);
    
                if ( copy_to_user((allocate_memory_arg_t *)arg, &k_arg, sizeof(k_arg)) )
                {
                    dev_err(p_dmac->dev, "Failed copy to user\n");
                    return -EFAULT;
                }
            }
            break;
    
        case DMAC_FREE_MEMORY:
            {
                free_memory_arg_t k_arg;
    
                if ( copy_from_user(&k_arg , (free_memory_arg_t *)arg, sizeof(k_arg)) )
                {
                    dev_err(p_dmac->dev, "Failed copy from user\n");
                    return -EFAULT;
                }
    
                dev_dbg(p_dmac->dev, "DMAC_FREE_MEMORY, size = %d virtual = 0x%x physical = 0x%x\n",
                        k_arg.i_size,
                        k_arg.i_kernel_virtual_address,
                        k_arg.i_physical_address);
                
                dma_free_coherent(p_dmac->dev, k_arg.i_size, (void *) k_arg.i_kernel_virtual_address, (dma_addr_t) k_arg.i_physical_address);
            }
            break;
    
        case DMAC_START_DMA:
            {
                start_dma_arg_t k_arg;
    
                struct scatterlist sglist;
                struct dma_async_tx_descriptor *desc;
    
                unsigned int dmacr;
                unsigned int imsc;
    
                if ( copy_from_user(&k_arg , (start_dma_arg_t *)arg, sizeof(k_arg)) )
                {
                    dev_err(p_dmac->dev, "Failed copy from user\n");
                    return -EFAULT;
                }
    
                dev_dbg(p_dmac->dev, "DMAC_START_DMA, size = %d block_size = %d physical = 0x%x\n",
                        k_arg.i_size,
                        k_arg.i_block_size,
                        k_arg.i_physical_address);
    
                if ( k_arg.i_block_size == 0 )
                {
                    sg_init_table(&sglist, 1);
                    sg_set_page(&sglist, phys_to_page(k_arg.i_physical_address), k_arg.i_size, offset_in_page(k_arg.i_physical_address));
                    sg_dma_address(&sglist) = k_arg.i_physical_address;
                    sg_dma_len(&sglist) = k_arg.i_size;
    
                    desc = dmaengine_prep_slave_sg(p_dmac->chan,
                        &sglist,
                        1,
                        DMA_DEV_TO_MEM,
                        DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
                }
                else
                {
                    desc = dmaengine_prep_dma_cyclic(p_dmac->chan,
                        k_arg.i_physical_address,
                        k_arg.i_size,
                        k_arg.i_block_size,
                        DMA_DEV_TO_MEM,
                        DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
                }
                if (!desc)
                {
                    dev_err(p_dmac->dev, "Failed to get a descriptor for transaction\n");
                    return -EBUSY;
                }
    
                desc->callback = dmac_rx_callback;
                desc->callback_param = p_dmac;
    
                p_dmac->cookie = dmaengine_submit(desc);
                if ( dma_submit_error(p_dmac->cookie) )
                {
                    dev_err(p_dmac->dev, "Failed to submit a descriptor for transaction\n");
                    return -EIO;
                }
                
                dma_async_issue_pending(p_dmac->chan);
    
                dmacr = readw_relaxed(p_dmac->io_addr + UART011_DMACR);
                imsc = readw_relaxed(p_dmac->io_addr + UART011_IMSC);
    
                dmacr |= UART011_RXDMAE; // Enable receive DMA
                imsc &= ~(UART011_RTIM | UART011_RXIM); // Disable receive timeout interrupt and receive interrupt
    
                writew_relaxed(dmacr, p_dmac->io_addr + UART011_DMACR);
                writew_relaxed(imsc, p_dmac->io_addr + UART011_IMSC);
    
                p_dmac->physical_chan = dmac_find_physical_chan(p_dmac, (UART2_BASE + UART01x_DR));
                if ( p_dmac->physical_chan != INVALID_DMA_CHAN ) p_dmac->running = true;
    
                k_arg.o_dma_chan = p_dmac->physical_chan;
                
                if ( copy_to_user((start_dma_arg_t *)arg, &k_arg, sizeof(k_arg)) )
                {
                    dev_err(p_dmac->dev, "Failed copy to user\n");
                    return -EFAULT;
                }
            }
            break;
    
        case DMAC_WAIT_DMA_COMPLETE:
            {
                wait_dma_complete_arg_t k_arg;
    
                uint32_t timeout;
    
                if ( copy_from_user(&k_arg , (wait_dma_complete_arg_t *)arg, sizeof(k_arg)) )
                {
                    dev_err(p_dmac->dev, "Failed copy from user\n");
                    return -EFAULT;
                }
    
                timeout = msecs_to_jiffies(k_arg.i_millisec_wait);
                timeout = wait_for_completion_timeout(&p_dmac->complete, timeout);
    
                k_arg.o_completed_event = false;
                k_arg.o_timeout_event = false;
    
                if (timeout > 0)
                {
                    k_arg.o_completed_event = true;
                    reinit_completion(&p_dmac->complete);
                }
                else if (timeout == 0)
                {
                    k_arg.o_timeout_event = true;
                }
    
                dev_dbg(p_dmac->dev, "DMAC_WAIT_DMA_COMPLETE, millisec_wait = %d completed = %d timeout = %d\n",
                        k_arg.i_millisec_wait,
                        k_arg.o_completed_event,
                        k_arg.o_timeout_event);
    
                if ( copy_to_user((wait_dma_complete_arg_t *)arg, &k_arg, sizeof(k_arg)) )
                {
                    dev_err(p_dmac->dev, "Failed copy to user\n");
                    return -EFAULT;
                }
            }
            break;
    
        case DMAC_DISABLE_DMA:
            if ( p_dmac->running )
            {
                struct dma_tx_state state;
                enum dma_status status;
    
                unsigned int dmacr;
    
                // This pauses activity on the DMA channel without data loss.
                if ( dmaengine_pause(p_dmac->chan) )
                {
                    dev_err(p_dmac->dev, "Failed to pause DMA channel\n");
                }
    
                status = dmaengine_tx_status(p_dmac->chan, p_dmac->cookie, &state);
                if (status != DMA_PAUSED)
                {
                    dev_err(p_dmac->dev, "Failed to pause DMA channel %d\n", status);
                }
    
                dev_dbg(p_dmac->dev, "DMAC_DISABLE_DMA, residue = %d, in_flight_bytes = %d\n", state.residue, state.in_flight_bytes);
                    
                p_dmac->running = false;
                    
                if ( dmaengine_terminate_async(p_dmac->chan) )
                {
                    dev_err(p_dmac->dev, "Failed to terminate a DMA transfer\n");
                }
    
                dmacr = readw_relaxed(p_dmac->io_addr + UART011_DMACR);
                dmacr &= ~UART011_RXDMAE; // Disable receive DMA - incoming data will wait in the FIFO
                writew_relaxed(dmacr, p_dmac->io_addr + UART011_DMACR);
            }
            break;
    
        case DMAC_ABORT_DMA:
            if ( p_dmac->running )
            {
                unsigned int dmacr;
                
                dev_dbg(p_dmac->dev, "DMAC_ABORT_DMA\n");
    
                p_dmac->running = false;
                
                // This causes all activity for the DMA channel to be stopped,
                // and may discard data in the DMA FIFO which hasn’t been fully transferred.
                // No callback functions will be called for any incomplete transfers.
                if ( dmaengine_terminate_async(p_dmac->chan) )
                {
                    dev_err(p_dmac->dev, "Failed to terminate a DMA transfer\n");
                }
    
                dmacr = readw_relaxed(p_dmac->io_addr + UART011_DMACR);
                dmacr &= ~UART011_RXDMAE; // Disable receive DMA - incoming data will wait in the FIFO
                writew_relaxed(dmacr, p_dmac->io_addr + UART011_DMACR);
            }
            break;
    
        case DMAC_GET_DMA_STATUS:
            {
                get_dma_status_arg_t k_arg;
                
                if ( p_dmac->physical_chan == INVALID_DMA_CHAN ) return -EFAULT;
    
                dev_dbg(p_dmac->dev, "DMAC_GET_DMA_STATUS\n");
    
                k_arg.o_destination_address = readl(p_dmac->dmac_addr + PL080_Cx_BASE(p_dmac->physical_chan) + PL080_CH_DST_ADDR);
    
                if ( copy_to_user((get_dma_status_arg_t *)arg, &k_arg, sizeof(k_arg)) )
                {
                    dev_err(p_dmac->dev, "Failed copy to user\n");
                    return -EFAULT;
                }
            }
            break;
    
        default:
            dev_err(p_dmac->dev, "unknown ioctl command\n");
            return -ENOTTY;
        }
        return 0;
    }
    
    static int dmac_mmap(struct file *file, struct vm_area_struct *vma)
    {
        dmac_t *p_dmac = (dmac_t *) file->private_data;
    
        if (!p_dmac->cpu_addr || !p_dmac->dma_addr)
        {
            dev_err(p_dmac->dev, "Memory is not allocated\n");
            return -ENOMEM;
        }
        
        dev_dbg(p_dmac->dev, "mmap, virtual = 0x%x physical = 0x%x\n",
                (uint32_t) p_dmac->cpu_addr,
                (uint32_t) p_dmac->dma_addr);
    
        return dma_mmap_coherent(p_dmac->dev, vma,
                                 p_dmac->cpu_addr, p_dmac->dma_addr,
                                 vma->vm_end - vma->vm_start);
    }
    
    static void dmac_rx_callback(void *data)
    {
        dmac_t *p_dmac = (dmac_t *)data;
        
        dev_dbg(p_dmac->dev, "rx_callback\n");
    
        /* Signal that the DMA transaction completed so the waiting
         * ioctl thread can finish processing. */
        complete(&p_dmac->complete);
    }
    
    static int dmac_find_physical_chan(dmac_t *p_dmac, phys_addr_t src_addr)
    {
        static const int num_chan = 8;
        int chan = 0;
    
        for (; chan < num_chan; ++chan)
        {
            phys_addr_t addr = readl(p_dmac->dmac_addr + PL080_Cx_BASE(chan) + PL080_CH_SRC_ADDR);
            
            if (addr == src_addr) return chan; // found
        }
    
        return INVALID_DMA_CHAN;
    }
    
    __maybe_unused static void dmac_print_register(dmac_t *p_dmac)
    {
        void __iomem *misc_addr;
        unsigned int enb;
        unsigned int cfg;
        
        unsigned int physical_chan = dmac_find_physical_chan(p_dmac, (UART2_BASE + UART01x_DR));
        if (physical_chan == INVALID_DMA_CHAN) return;
    
        // Miscellaneous registers (Misc)
        misc_addr = ioremap(0xFCA80000, 1024);
        if (!misc_addr)
        {
            dev_err(p_dmac->dev, "Failed to ioremap\n");
            return;
        }
    
        enb = readl(misc_addr + 0x2C); // PERIP1_CLK_ENB
        cfg = readl(misc_addr + 0xA0); // DMA_CHN_CFG
    
        dev_dbg(p_dmac->dev, "MISC: PERIP1_CLK_ENB = 0x%x DMA_CHN_CFG = 0x%x\n", enb, cfg);
    
        // DMAC global registers.
        dev_dbg(p_dmac->dev, "DMACIntStatus = 0x%x\n", readl(p_dmac->dmac_addr + PL080_INT_STATUS)); // DMACIntStatus
        dev_dbg(p_dmac->dev, "DMACIntTCStatus = 0x%x\n", readl(p_dmac->dmac_addr + PL080_TC_STATUS)); // DMACIntTCStatus
        dev_dbg(p_dmac->dev, "DMACIntErrorStatus = 0x%x\n", readl(p_dmac->dmac_addr + PL080_ERR_STATUS)); // DMACIntErrorStatus
        dev_dbg(p_dmac->dev, "DMACRawIntTCStatus = 0x%x\n", readl(p_dmac->dmac_addr + PL080_RAW_TC_STATUS)); // DMACRawIntTCStatus
        dev_dbg(p_dmac->dev, "DMACRawIntErrorStatus = 0x%x\n", readl(p_dmac->dmac_addr + PL080_RAW_ERR_STATUS)); // DMACRawIntErrorStatus
        dev_dbg(p_dmac->dev, "DMACEnbldChns = 0x%x\n", readl(p_dmac->dmac_addr + PL080_EN_CHAN)); // DMACEnbldChns
        dev_dbg(p_dmac->dev, "DMACConfiguration = 0x%x\n", readl(p_dmac->dmac_addr + PL080_CONFIG)); // DMACConfiguration
    
        // DMAC channel specific registers.
        dev_dbg(p_dmac->dev, "DMACCnSrcAddr = 0x%x\n", readl(p_dmac->dmac_addr + PL080_Cx_BASE(physical_chan) + PL080_CH_SRC_ADDR)); // DMACCnSrcAddr
        dev_dbg(p_dmac->dev, "DMACCnDestAddr = 0x%x\n", readl(p_dmac->dmac_addr + PL080_Cx_BASE(physical_chan) + PL080_CH_DST_ADDR)); // DMACCnDestAddr
        dev_dbg(p_dmac->dev, "DMACCnLLI = 0x%x\n", readl(p_dmac->dmac_addr + PL080_Cx_BASE(physical_chan) + PL080_CH_LLI)); // DMACCnLLI
        dev_dbg(p_dmac->dev, "DMACCnControl = 0x%x\n", readl(p_dmac->dmac_addr + PL080_Cx_BASE(physical_chan) + PL080_CH_CONTROL)); // DMACCnControl
        dev_dbg(p_dmac->dev, "DMACCnConfiguration = 0x%x\n", readl(p_dmac->dmac_addr + PL080_Cx_BASE(physical_chan) + PL080_CH_CONFIG)); // DMACCnConfiguration
    
        dev_dbg(p_dmac->dev, "DMAC Peripheral ID: 0x%x 0x%x 0x%x 0x%x\n",
                 readl_relaxed(p_dmac->dmac_addr + DMACPeriphID0),  // 0x80
                 readl_relaxed(p_dmac->dmac_addr + DMACPeriphID1),  // 0x10
                 readl_relaxed(p_dmac->dmac_addr + DMACPeriphID2),  // 0x14
                 readl_relaxed(p_dmac->dmac_addr + DMACPeriphID3)); // 0x0A
    
        dev_dbg(p_dmac->dev, "DMAC Cell ID: 0x%x 0x%x 0x%x 0x%x\n",
                 readl_relaxed(p_dmac->dmac_addr + DMACPCellID0),  // 0x0D
                 readl_relaxed(p_dmac->dmac_addr + DMACPCellID1),  // 0xF0
                 readl_relaxed(p_dmac->dmac_addr + DMACPCellID2),  // 0x05
                 readl_relaxed(p_dmac->dmac_addr + DMACPCellID3)); // 0xB1
    
        dev_dbg(p_dmac->dev, "UART Peripheral ID: 0x%x 0x%x 0x%x 0x%x\n",
                 readw_relaxed(p_dmac->io_addr + UARTPeriphID0),  // 0x11
                 readw_relaxed(p_dmac->io_addr + UARTPeriphID1),  // 0x10
                 readw_relaxed(p_dmac->io_addr + UARTPeriphID2),  // 0x24
                 readw_relaxed(p_dmac->io_addr + UARTPeriphID3)); // 0x00
    
        dev_dbg(p_dmac->dev, "UART Cell ID: 0x%x 0x%x 0x%x 0x%x\n",
                 readw_relaxed(p_dmac->io_addr + UARTPCellID0),  // 0x0D
                 readw_relaxed(p_dmac->io_addr + UARTPCellID1),  // 0xF0
                 readw_relaxed(p_dmac->io_addr + UARTPCellID2),  // 0x05
                 readw_relaxed(p_dmac->io_addr + UARTPCellID3)); // 0xB1
    }
    
    module_init(dmac_init);
    module_exit(dmac_exit);
     
    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("Keysight Technologies");
    MODULE_DESCRIPTION("DMAC driver");
    MODULE_VERSION("1:0.0");

  • Hi KC,

    Thanks for your explanation; now I understand that you are working on a custom UART driver. We don't typically provide support for custom driver development, so I glanced over your code but didn't read every single line. My first question is: can you really not use the standard Linux UART driver for your application, so that this custom driver is required?

    The second part, however, is not as straightforward because of the differences between the DMA controllers.

    That shouldn't be the case, because the purpose of the kernel dmaengine framework is to provide a unified API and hide the details of the DMA controllers.

  • Hi Bin Liu,

    Yes, the standard Linux UART driver does not match the use case of this legacy application.

    The application needs direct access to the DMA buffer so that it can use the zero-copy buffer design described in the videos below.


    https://www.youtube.com/watch?v=kRYtkvDbZSg&t=1479s

    https://www.youtube.com/watch?v=yJg-DkyH5CM&t=1029s

    rgds,

    kc Wong

  • Hi KC,

    Understood the requirement.

    Let's discuss details offline.

  • Hi KC,

    Please apply the following kernel patch and test the UART 8250_omap driver in non-DMA mode to see if you still have RX data loss at 3 Mbps.

    This patch basically removes the "UART_RX_TIMEOUT_QUIRK" flag.

    diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
    index 2bc6b982238f..842f3c19192e 100644
    --- a/drivers/tty/serial/8250/8250_omap.c
    +++ b/drivers/tty/serial/8250/8250_omap.c
    @@ -1337,7 +1337,7 @@ static struct omap8250_dma_params am33xx_dma = {
     static struct omap8250_platdata am654_platdata = {
            .dma_params     = &am654_dma,
            .habit          = UART_HAS_EFR2 | UART_HAS_RHR_IT_DIS |
    -                         UART_RX_TIMEOUT_QUIRK | UART_HAS_NATIVE_RS485,
    +                         UART_HAS_NATIVE_RS485,
     };
     
     static struct omap8250_platdata am33xx_platdata = {

  • Hi Bin Liu,

    I removed the "UART_RX_TIMEOUT_QUIRK" flag in non-DMA mode.

    It does not help with our application.

    rgds,

    kc Wong

  • Hi KC,

    Can you please explain how it does not help? What is the symptom in non-DMA mode? Please note that I asked you to use the 8250_omap driver without any other modification, such as your direct memory copy to user space.

  • Hi Bin Liu,

    Yes, that was without any other modification, just the patch you provided, in non-DMA mode.

    Without any custom driver, the application just uses the standard file API to read from the UART port.

    SerialDma::StatusCode SerialDma::read(uint8_t const*& bBytes,
    									  uint32_t& iBytesRead,
    									  uint32_t iMaxBytesToRead)
    	{
    		static volatile uint8_t *pBuffer =  _pDmaMemory;
    		static volatile uint8_t *pBufferEnd = pBuffer + _sAllocInputParms.sizeInBytes;
    		static int fd = (int)_hSerial;
    		
    		bBytes = NULL;
    		iBytesRead = 0;
    		
    		fd_set fds;
    		FD_ZERO(&fds);
    		FD_SET(fd, &fds);
    		
    		int ms = getTimeout();
    		
    		struct timeval timeout;
    		timeout.tv_sec = ms / 1000;
    		timeout.tv_usec = (ms % 1000) * 1000;
    		
    		if (::select(fd + 1, &fds, NULL, NULL, (ms == INFINITE)?NULL:&timeout) > 0)
    		{
    			int n = ::read(fd, (void *)pBuffer, 4096);
    		
    			if (n > 0)
    			{
    				bBytes = (const uint8_t*)pBuffer;
    				iBytesRead = n;
    		
    				pBuffer = pBuffer + n;		
    				if (pBuffer > (pBufferEnd - 4096)) pBuffer =  _pDmaMemory;
    			}
    			
    			return ok;	
    		}
    		else
    		{
    			return ReadTimeout;
    		}
    	}

    At a high level, the application implements the HDLC protocol. With or without the "UART_RX_TIMEOUT_QUIRK" flag, the application still misses an expected HDLC packet.

    High-Level Data Link Control - Wikipedia

    rgds,

    kc Wong

  • Hi KC,

    With or without the "UART_RX_TIMEOUT_QUIRK" flag,

    Please apply the patch to remove this flag. Otherwise, a random byte loss could happen.

    the application still misses an expected HDLC packet.

    I won't be able to debug the HDLC application. Please debug it and let me know what data is wrong in the /dev/ttySx read or write.

    Or you can use the UART test methods mentioned in the link below to test the UART.

    https://dev.ti.com/tirex/explore/node?node=A__AaoTgeq5bZLef17lYvUnsw__AM62-ACADEMY__uiYMDcq__LATEST&placeholder=true

  • Hi Bin Liu,

    Based on a similar idea from another thread, I modified drivers/tty/serial/8250/8250_port.c to direct the UART RX data into the buffer that the application accesses directly, instead of into the TTY buffer, as shown in the patch below for non-DMA mode.

    diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
    index 9d60418e4adb..cf62bace525d 100644
    --- a/drivers/tty/serial/8250/8250_port.c
    +++ b/drivers/tty/serial/8250/8250_port.c
    @@ -37,6 +37,8 @@
     
     #include "8250.h"
     
    +#include "/home/keysight/sambashare/projects/torreys/dmms/Torreys/software/firmware/lkm/dmac_driver.c"
    +
     /* Nuvoton NPCM timeout register */
     #define UART_NPCM_TOR          7
     #define UART_NPCM_TOIE         BIT(7)  /* Timeout Interrupt Enable */
    @@ -1765,7 +1767,11 @@ void serial8250_read_char(struct uart_8250_port *up, unsigned char lsr)
     	if (uart_prepare_sysrq_char(port, ch))
     		return;
     
    -	uart_insert_char(port, lsr, UART_LSR_OE, ch, flag);
    +	if ( IRQ_NO != up->port.irq ||
    +		write_to_dmac_buffer(&ch, 1) < 0 )
    +	{
    +		uart_insert_char(port, lsr, UART_LSR_OE, ch, flag);
    +	}
     }
     EXPORT_SYMBOL_GPL(serial8250_read_char);
     
    

    Redirecting the UART RX data into the application buffer enables us to test with the application. The test results do show a difference between images built with and without the "UART_RX_TIMEOUT_QUIRK" flag.

    With the "UART_RX_TIMEOUT_QUIRK" flag removed, the application receives the expected number of readings from the measurement board; otherwise we see missing readings. So, thanks for that patch, and please keep us informed about when the fix will be released.

    But we do see overruns during testing. I think that indicates we will still eventually need DMA mode to work.

    root@p550:~# cat /proc/tty/driver/serial
    serinfo:1.0 driver revision:
    0: uart:unknown port:00000000 irq:0
    1: uart:8250 mmio:0x02810000 irq:20 tx:80 rx:265 pe:90 RTS|DTR
    2: uart:8250 mmio:0x02860000 irq:21 tx:34965 rx:266 RTS|DTR|DSR
    3: uart:8250 mmio:0x02800000 irq:19 tx:0 rx:1464811229 brk:53845 oe:83 RTS|DTR|DSR
    4: uart:unknown port:00000000 irq:0
    5: uart:unknown port:00000000 irq:0
    6: uart:unknown port:00000000 irq:0
    7: uart:unknown port:00000000 irq:0
    8: uart:unknown port:00000000 irq:0
    9: uart:unknown port:00000000 irq:0
    root@p550:~#
    

    At the same time, we will spend some time debugging why reading from the UART port with the standard file API misses packets, while no packets are missed when the UART RX data is redirected into the application buffer.

    rgds,

    kc Wong

  • Hi KC,

    DMA mode only helps offload the CPU; it won't solve the RX overrun problem.

    If overrun happens, it basically means the RX FIFO read is not fast enough, and hardware flow control is required.
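
    If the measurement board can honor CTS, enabling RTS/CTS flow control from user space is just a termios flag. A minimal sketch, assuming "fd" is an already-open /dev/ttySx descriptor and that the board design routes the UART RTS/CTS pins:

    #include <termios.h>

    static int enable_rtscts(int fd)
    {
        struct termios tio;

        if (tcgetattr(fd, &tio) != 0)
            return -1;
        tio.c_cflag |= CRTSCTS;   /* hardware (RTS/CTS) flow control */
        return tcsetattr(fd, TCSANOW, &tio);
    }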

  • Hi Bin Liu,

    For now we are temporarily using the CPU for interrupt-driven UART reception; at least that lets us proceed with some application testing.


    I believe the AM62x Cortex-A53 core is powerful enough to handle interrupt-driven UART reception. Unfortunately, the CPU core is not dedicated to this single task.

    There may be times when a higher-priority interrupt is running, or when interrupts are disabled inside an OS critical section. So interrupt-driven UART reception on the CPU may not be reliable, as the overruns show.

    That's why a dedicated DMA is needed to receive the real-time measurement readings streaming from the measurement board over the UART.

    The application originally ran on the SPEAr320S under WinCE without requiring hardware flow control, but of course with DMA. We are now porting it to AM62x Linux, and we need the DMA to work without any byte loss.

    rgds,

    kc Wong

  • Hi KC,

    Have you tested with the RT Linux in the Processor SDK?