Linux/AM3358: McASP DMA data drops

Part Number: AM3358


Tool/software: Linux

I am using DMA to service the MCASP and using a known sine wave input to validate the setup. I am getting discontinuities in the wave at random intervals which look like dropped data. When this occurs the MCASP gives no indication of an overrun or DMA error.

I saw the following post which describes similar behavior:

https://e2e.ti.com/support/dsp/omap_applications_processors/f/42/t/150666

The solution for the post above was to fix something that had been missed in the initialization sequence.

In my case I am clocking data into the McASP using the transmit clocks (the ACLKXCTL ASYNC bit is cleared). TRM section 22.3.12.2 indicates the receive registers should be set prior to the transmit registers. However, it also indicates that if an external clock is used, it must be running prior to initialization. If the transmit section clocks the receive section, should it be configured first? At what point in this sequence should RFIFOCTL and REVTCTL be configured?

  • What software is this? Which version?
  • This is a custom driver running on a 4.9 kernel. Due to some interface oddities we can't use the standard davinci-mcasp based sound driver. I understand that this isn't based on the supported TI Linux SDK, but I think the questions related to MCASP configuration steps should be relevant regardless of the underlying software.
  • Hi Andrew,

    The software team has been notified.

    I see you have reviewed Section 22.3.12.2 Transmit/Receive Section Initialization in spruh73p (www.ti.com/.../spruh73p.pdf). This sequence must be followed exactly.

    What memory are your ping/pong buffers located in?

    Have you reviewed these resources? Maybe you can also review the McASP code available in the Processor SDK (www.ti.com/.../processor-sdk-am335x)

    processors.wiki.ti.com/.../AM335x_Audio_Driver's_Guide

    processors.wiki.ti.com/.../McASP_Tips

    processors.wiki.ti.com/.../StarterWare_McASP - shows FIFO setup in sequence

    Regards,
    Mark
  • Mark,

    Thank you for the response. Based on the examples you have linked I am pretty sure the initialization code I am using is correct.

    A little more about my setup. I use the ARM core to configure DMA to move data from the MCASP into PRU shared memory. The DMA callback is used to interrupt the PRU to notify it that data is ready. The PRU manipulates the data and moves it to another buffer in PRU shared memory. After some threshold is reached, the PRU configures and initiates a DMA transfer to move processed data into DDR. MCASP DMA initialization is based off sound/soc/davinci/davinci-mcasp.c and PRU DMA setup is based on examples/am335x/PRU_edmaConfig from the PRU Software Support Package. Both the MCASP and PRU are using EDMA TC0. Communication between the ARM core and PRU is done with Remoteproc/RPMsg.

    This works 99% of the time, but I do keep running into intermittent data loss. The image below shows three dropouts grouped together in an otherwise perfect run of 35000 points.
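
    One way to locate dropouts like these automatically, given that the input is a known sine wave: any uniformly sampled, zero-offset sinusoid satisfies x[n+1] = k * x[n] - x[n-1] with k = 2*cos(2*pi*f/fs), so a dropped block shows up as a large residual. The sketch below is only an illustration. The function name is made up, and the tolerance would have to be chosen from the noise level of the real capture (any DC offset must be removed first).

```c
#include <stddef.h>

/*
 * Count samples where a sine capture breaks the recurrence
 * x[n+1] = k * x[n] - x[n-1], with k = 2*cos(2*pi*f/fs).
 * The indices of the first `max_hits` violations are stored in `hits`.
 * Names are illustrative and not taken from the driver.
 */
size_t find_sine_breaks(const double *x, size_t n, double k,
			double tol, size_t *hits, size_t max_hits)
{
	size_t count = 0;
	size_t i;

	for (i = 1; i + 1 < n; i++) {
		/* residual between the actual and the predicted next sample */
		double err = x[i + 1] - (k * x[i] - x[i - 1]);

		if (err < 0)
			err = -err;
		if (err > tol) {
			if (count < max_hits)
				hits[count] = i + 1;
			count++;
		}
	}
	return count;
}
```

    A single gap normally produces two consecutive violations, because two sample triples straddle the missing block.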

  • Hello Andrew,

    That is a pretty complicated process. Can you isolate the issue to a smaller portion of it? E.g., is the data loss occurring during McASP capture, in the PRU processing step, etc.?

    Regards,
    Nick
  • Nick,

    I have verified the PRU to DDR DMA transfers are operating correctly under representative operating conditions. I did this by zeroing the DDR contents after my program allocates it. McASP to PRU DMA was operating as usual, but I wrote known values and patterns to the outgoing PRU buffer instead of manipulating the incoming data. PRU to DDR DMA was triggered normally. After the acquisition was complete I compared the result with the known input. The data matched what I was expecting.
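
    The known-pattern check described above can be sketched roughly as follows. The incrementing-counter pattern and the function names are assumptions made for illustration. The actual test may have used different values and patterns.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a buffer with an incrementing counter starting at `seed`. */
void pattern_fill(uint32_t *buf, size_t words, uint32_t seed)
{
	size_t i;

	for (i = 0; i < words; i++)
		buf[i] = seed + (uint32_t)i;
}

/*
 * Compare a captured buffer against the same counter pattern.
 * Returns the index of the first mismatching word, or `words`
 * if the whole buffer matches.
 */
size_t pattern_first_mismatch(const uint32_t *buf, size_t words, uint32_t seed)
{
	size_t i;

	for (i = 0; i < words; i++) {
		if (buf[i] != seed + (uint32_t)i)
			return i;
	}
	return words;
}
```

    With the destination zeroed beforehand and a nonzero seed, a region that the DMA never wrote shows up as a mismatch rather than as stale data that happens to look valid.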

    One of my coworkers has helped me validate that the manipulation routine is operating as expected and not corrupting data.

    So that leaves the McASP to PRU DMA transfer. I am already using a known input that does not match my output. But just to try something else I started playing with the PRU cycle counter to time the acquisition. I start the cycle counter at the beginning and read it after the last McASP to PRU transfer is received. What I am seeing is a higher than expected cycle count that is more pronounced on longer runs. I know there will be some additional latency as the PRU has to poll for interrupts, manipulate data, and go interact with EDMA. However my assumption is that this latency is small and should be fairly consistent between acquisitions of different sizes. As an example I did an acquisition which would require 20016 MCASP to PRU DMA transfers but the cycle counter indicated enough time transpired for 20129 transfers. A run requiring 2016 transfers only had time for 2016 transfers. In my mind this indicates some transfers never occurred on the longer acquisition. I do have RDMAERR and ROVRN interrupts enabled but they never get triggered. What else should I try?
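
    The comparison above amounts to converting a PRU cycle count into the number of transfer periods that should have elapsed. A rough sketch of that conversion, with a hypothetical function name and the PRU clock, frame rate, and frames per transfer passed in as parameters:

```c
#include <stdint.h>

/*
 * Number of whole DMA transfer periods that fit in `cycles` PRU clock
 * ticks, given the frame rate and the number of frames per transfer.
 * One transfer lasts frames_per_xfr / frame_rate_hz seconds.
 */
uint32_t transfers_elapsed(uint64_t cycles, uint32_t pru_hz,
			   uint32_t frame_rate_hz, uint32_t frames_per_xfr)
{
	return (uint32_t)((cycles * frame_rate_hz) /
			  ((uint64_t)pru_hz * frames_per_xfr));
}
```

    If the result exceeds the number of transfers actually counted, the difference is the time equivalent of the missing data (20129 - 20016 = 113 transfer periods in the example above). Extra PRU latency inflates the count as well, so this is an upper bound rather than an exact figure.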

  • Hello Andrew,

    1) Could you tell us about how the clocks are set up with the McASP and the audio codec? We need to make sure those have the same clock source so information doesn't go missing between the codec and the McASP.

    2) Could you go more in depth on the McASP to PRU DMA scheme? e.g. are you using ping-pong buffers? How much time does each buffer contain? (e.g. 1ms, 10 ms, etc - the DMA should be able to tell you each time it finishes a transfer to a buffer) Does the PRU code have time to address the data in a buffer before the DMA starts writing to that buffer again?

    Regards,
    Nick
  • I apologize in advance for the novel. I have been trying to figure a way to
    describe my setup without a complete information overload. The following is
    incomplete but I think it should give you a pretty good idea of my setup. I am
    including some code from the kernel module and the PRU which I think are helpful.

    The clock source can be either the internal 24MHz functional clock or an external
    36.864MHz oscillator. As I mentioned earlier we are using the transmit clocks
    AHCLKX/ACLKX/FSX instead of the receive clocks due to pin mux constraints. The
    oscillator is enabled or disabled with a simple gpio line parsed from the device
    tree. I see the same behavior regardless of the clock source and even with a
    board that had the oscillator removed.

    Regarding the DMA setup. I am using a ring buffer located in PRU shared memory
    that contains 16 positions. DMA from the McASP to PRU is configured as a cyclic
    transfer using the standard dmaengine API. For the cyclic transfer each ring
    buffer segment has a dedicated EDMA PaRAM that is linked to the next buffer
    position. These transfers will run until explicitly stopped. I have DMA
    configured to interrupt after each transfer completes. The callback function
    writes the total number of completed dma transfers to a location in PRU shared
    memory and then writes to the PRU INTC SISR register to inform the PRU data is
    ready.
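
    Because the callback publishes a running count of completed transfers, the
    reader side can in principle detect when it has fallen far enough behind for
    the ring to be overwritten. A minimal sketch under that assumption follows.
    The struct and function names are hypothetical and not taken from the actual
    firmware.

```c
#include <stdint.h>

#define RING_SLOTS 16u

struct ring_reader {
	uint32_t consumed;	/* transfers the reader has processed so far */
};

/*
 * Returns the number of unread segments, or -1 if unread data is being
 * (or has been) overwritten. `produced` is the running transfer count
 * written by the DMA callback.
 */
int32_t ring_pending(const struct ring_reader *r, uint32_t produced)
{
	uint32_t lag = produced - r->consumed;	/* unsigned delta, wrap-safe */

	/*
	 * Once lag reaches RING_SLOTS, the transfer in flight is already
	 * writing over the oldest unread segment.
	 */
	if (lag >= RING_SLOTS)
		return -1;
	return (int32_t)lag;
}

/* Slot index holding the next unread segment. */
uint32_t ring_next_slot(const struct ring_reader *r)
{
	return r->consumed % RING_SLOTS;
}
```

    This kind of check catches a slow consumer, but not a transfer that never
    happened, since in that case the published count does not advance either.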

    Each DMA transfer consists of 10 frames containing 6 32-bit tdm slots. Max data
    rate is 144kHz, so for the worst case scenario we get a DMA transfer of 240 bytes
    (10 frames * 6 slots * 4 bytes) every ~69us. With 16 segments in the ring buffer
    old values will be overwritten in ~1.1ms. The PRU is clocked at 200MHz so
    that gives a worst case of ~13888 clock ticks per incoming transfer (~1388 per frame) to decode and
    move data into a ring buffer located in PRU shared memory. When a buffer segment
    containing processed data is full the PRU initiates a DMA transfer to move data
    out of PRU shared memory and into DDR memory.
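
    The timing budget above can be reproduced with a few lines of integer
    arithmetic. The sketch below only restates the numbers given in this post
    (10 frames of 6 32-bit slots per transfer, 16 ring segments, with the frame
    rate and PRU clock as parameters). The macro and function names are made up
    for illustration.

```c
#include <stdint.h>

#define SLOTS_PER_FRAME   6u
#define FRAMES_PER_XFR    10u
#define BYTES_PER_SLOT    4u
#define RING_SEGMENTS     16u

/* Bytes moved by one DMA transfer. */
uint32_t xfr_bytes(void)
{
	return FRAMES_PER_XFR * SLOTS_PER_FRAME * BYTES_PER_SLOT;
}

/* Duration of one transfer in nanoseconds at the given frame rate. */
uint64_t xfr_period_ns(uint32_t frame_rate_hz)
{
	return (1000000000ull * FRAMES_PER_XFR) / frame_rate_hz;
}

/* Time until a ring segment is overwritten, in nanoseconds. */
uint64_t ring_wrap_ns(uint32_t frame_rate_hz)
{
	return xfr_period_ns(frame_rate_hz) * RING_SEGMENTS;
}

/* PRU clock cycles available per incoming transfer. */
uint64_t cycles_per_xfr(uint32_t pru_hz, uint32_t frame_rate_hz)
{
	return ((uint64_t)pru_hz * FRAMES_PER_XFR) / frame_rate_hz;
}
```

    At 144 kHz with a 200 MHz PRU clock this works out to 240 bytes every
    69444 ns, a ring wrap time of about 1.11 ms, and 13888 PRU cycles per
    transfer.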

    The PRU maintains two tables of DDR addresses. When one table is loaded with
    processed data the PRU notifies the ARM core via rpmsg and switches to the other
    table. The ARM core then refills the empty table with valid DDR addresses and
    uses rpmsg to tell the PRU new DDR addresses have been loaded.

    The kernel module controls the McASP with the following functions:

    #define WORDS_PER_FRAME         6
    #define DMA_XFR_FRAMES          10
    #define DMA_XFR_WORD_SIZE       sizeof(int32_t)
    #define DMA_XFR_WORDS           (DMA_XFR_FRAMES * WORDS_PER_FRAME)
    #define DMA_XFR_BYTES           (DMA_XFR_WORDS * DMA_XFR_WORD_SIZE)
    #define DMA_RING_BUFFER_SLOTS   16
    #define DMA_RING_BUFFER_WORDS   (DMA_XFR_WORDS * DMA_RING_BUFFER_SLOTS)
    #define DMA_RING_BUFFER_BYTES   (DMA_XFR_BYTES * DMA_RING_BUFFER_SLOTS)
    
    static void mcasp_start_ctrl_clocks(struct mcasp *asp, u16 ahclkdiv, u16 aclkdiv)
    {
    	/* external clock must be running prior to initialization */
    	u8 use_ext_osc = (asp->clksrc == CLKSRC_XOSC) && asp->xosc.present;
    	if (use_ext_osc) {
    		gpio_set_value(asp->xosc.gpio, 1);
    	} else if (asp->xosc.present) {
    		gpio_set_value(asp->xosc.gpio, 0);
    	}
    
    	/* 1. s/w reset */
    	mcasp_write(asp, GBLCTL_OFFSET, 0);
    
    	/* 2a. set debug behavior */
    	mcasp_set_bits(asp, PWRIDLESYSCONFIG_OFFSET, IDLEMODE(1));
    
    	/* 2b. rx registers */
    	mcasp_write(asp, RMASK_OFFSET, 0xffffffff);
    	mcasp_write(asp, RFMT_OFFSET, XRRVRS | XRSSZ(0xf));
    	mcasp_write(asp, AFSRCTL_OFFSET, XRMOD(8) | FSXRM);
    	mcasp_write(asp, ACLKRCTL_OFFSET, CLKXRP | CLKXRM | CLKXRDIV(aclkdiv));
    	mcasp_write(asp, AHCLKRCTL_OFFSET, HCLKXRM | HCLKXRDIV(ahclkdiv));
    	if (use_ext_osc)
    		mcasp_write(asp, AHCLKXCTL_OFFSET, 0);
    	mcasp_write(asp, RTDM_OFFSET, MCASP_TDM_SLOT_MASK);
    	mcasp_write(asp, RINTCTL_OFFSET, XRDMAERR | ROVRN);
    
    	/* 2c. tx registers */
    	mcasp_write(asp, XMASK_OFFSET, 0xffffffff);
    	mcasp_write(asp, XFMT_OFFSET, XRSSZ(0xf));
    	mcasp_write(asp, AFSXCTL_OFFSET, XRMOD(8) | FSXRM);
    	mcasp_write(asp, ACLKXCTL_OFFSET, CLKXRM | CLKXRDIV(aclkdiv));
    	mcasp_write(asp, AHCLKXCTL_OFFSET, HCLKXRM | HCLKXRDIV(ahclkdiv));
    	if (use_ext_osc)
    		mcasp_write(asp, AHCLKXCTL_OFFSET, 0);
    
    	/* 2d. serializers */
    	mcasp_write(asp, SRCTL_OFFSET(0), DISMOD(2) | SRMOD(2));
    	mcasp_write(asp, SRCTL_OFFSET(1), 0);
    	mcasp_write(asp, SRCTL_OFFSET(2), 0);
    	mcasp_write(asp, SRCTL_OFFSET(3), 0);
    
    	/* 2e. global registers */
    	mcasp_clr_bits(asp, PFUNC_OFFSET, AHCLKX | ACLKX | AFSX);
    	mcasp_set_bits(asp, PDIR_OFFSET, ACLKX | AFSX);
    	if (!use_ext_osc)
    		mcasp_set_bits(asp, PDIR_OFFSET, AHCLKX);
    
    	/* 2f. skip dit registers */
    
    	/* 3. start high frequency clocks */
    	mcasp_set_gblctl_bits(asp, GBLCTL_OFFSET, XHCLKRST | RHCLKRST);
    
    	/* 4. start bit clocks */
    	mcasp_set_gblctl_bits(asp, GBLCTL_OFFSET, XCLKRST | RCLKRST);
    
    	/* 5. setup dma */
    	mcasp_clr_bits(asp, REVTCTL_OFFSET, XRDATDMA);
    	mcasp_write(asp, RFIFOCTL_OFFSET, XRNUMEVT(DMA_XFR_WORDS) | XRNUMDMA(1));
    }
    
    static void mcasp_start(struct mcasp *asp)
    {
    	/* flush then enable RFIFO */
    	mcasp_clr_bits(asp, RFIFOCTL_OFFSET, XRENA);
    	mcasp_set_bits(asp, RFIFOCTL_OFFSET, XRENA);
    
    	/* 6. activate serializers */
    	mcasp_write(asp, XSTAT_OFFSET, 0xFFFF);
    	mcasp_write(asp, RSTAT_OFFSET, 0xFFFF);
    	mcasp_set_gblctl_bits(asp, GBLCTL_OFFSET, RSRCLR);
    
    	mcasp_write(asp, RBUF_OFFSET, 0);
    
    	/* 7. skip */
    
    	/* 8. release state machine from reset */
    	mcasp_set_gblctl_bits(asp, GBLCTL_OFFSET, RSMRST | XSMRST);
    
    	/* 9. release frame sync generator from reset */
    	mcasp_set_gblctl_bits(asp, GBLCTL_OFFSET, RFRST | XFRST);
    }
    
    static void mcasp_stop(struct mcasp *asp)
    {
    	mcasp_write(asp, GBLCTL_OFFSET, 0);
    	mcasp_write(asp, RSTAT_OFFSET, 0xFFFF);
    
    	/*
    	 * AHCLKX is not gated following reset of GBLCTL. To prevent clock leakage,
    	 * reconfigure all mcasp pins to inputs.
    	 */
    	mcasp_write(asp, PDIR_OFFSET, 0);
    
    	if (asp->clksrc == CLKSRC_XOSC && asp->xosc.present) {
    		gpio_set_value(asp->xosc.gpio, 0);
    	}
    }

    The kernel module dma setup looks like the following:

    static void rx_dma_callback(void *data)
    {
    	struct pru_rproc *pru = (struct pru_rproc *) data;
    	pru->dma_xfrs++;
    	writel_relaxed(pru->dma_xfrs, pru->xfr_count);
    	writel_relaxed(pru->sys_event, pru->intc_sisr);
    }
    
    static void rx_dma_stop(struct mcasp *asp)
    {
    	if (asp->dma.chan) {
    		dmaengine_terminate_sync(asp->dma.chan);
    		dma_release_channel(asp->dma.chan);
    		asp->dma.chan = 0;
    	}
    }
    
    static int rx_dma_start(struct mcasp *asp, struct pru_rproc *pru, u32 channels)
    {
    	struct dma_info *dma = &asp->dma;
    	struct dma_slave_config config;
    	struct dma_async_tx_descriptor *desc;
    	dma_cookie_t cookie;
    
    	if (!channels) {
    		dev_err(asp->dev, "no channels specified\n");
    		return -EINVAL;
    	}
    
    	dma->chan = dma_request_chan(asp->dev, "rx");
    	if (IS_ERR(dma->chan)) {
    		int err = PTR_ERR(dma->chan);
    
    		/* never continue with an ERR_PTR channel, whatever the error code */
    		dev_err(asp->dev, "can't get rx dma channel\n");
    		dma->chan = NULL;
    		return err;
    	}
    
    	memset(&config, 0, sizeof(config));
    	config.src_addr = asp->l3_addr + RBUF_OFFSET;
    	config.src_addr_width = DMA_XFR_WORD_SIZE;
    	config.src_maxburst = DMA_XFR_WORDS;
    
    	if (dmaengine_slave_config(dma->chan, &config)) {
    		dev_err(asp->dev, "dmaengine_slave_config failed\n");
    		dma_release_channel(dma->chan);
    		return -EINVAL;
    	}
    
    	desc = dmaengine_prep_dma_cyclic(dma->chan,
    				dma->dest,
    				DMA_RING_BUFFER_BYTES,
    				DMA_XFR_BYTES,
    				DMA_DEV_TO_MEM,
    				DMA_PREP_INTERRUPT);
    
    	if (!desc) {
    		dev_err(asp->dev, "dmaengine_prep_dma_cyclic() failed\n");
    		/* release the channel so a later start can request it again */
    		rx_dma_stop(asp);
    		return -ENOMEM;
    	}
    
    	desc->callback = rx_dma_callback;
    	desc->callback_param = pru;
    	pru->dma_xfrs = 0;
    
    	cookie = dmaengine_submit(desc);
    	if (dma_submit_error(cookie)) {
    		dev_err(asp->dev, "dmaengine_submit()\n");
    		dmaengine_terminate_all(dma->chan);
    		rx_dma_stop(asp);
    		return -EINVAL;
    	}
    
    	dma_async_issue_pending(dma->chan);
    
    	return 0;
    }

    The PRU code that processes the data looks like the following.

    void acquire_data(void)
    {
    	uint32_t queue = ring_buff_queue();
    	if (!acquisition_running()) {
    		acquisition_abort(ERROR_INPUT_OVERRUN);
    		return;
    	}
    
    	while (queue && !ring_buff_finished()) {
    		/* reassemble data and move it from PRU DRAM into PRU SRAM */
    		sram_buff_fill(ring_buff_read(sram_buff_ptr()));
    		if (!acquisition_running())
    			return;
    
    		if (sram_buff_slot_full()) {
    			if (!ddr_table_valid()) {
    				acquisition_abort(ERROR_TABLE_OVERRUN);
    				return;
    			}
    
    			if (sram_buff_xfrs() && !dma_transfer_complete()) {
    				acquisition_abort(ERROR_DMA_STALL);
    				return;
    			}
    
    			/* move data out of PRU SRAM into DDR */
    			dma_start_transfer(sram_buff_addr(), ddr_dst_addr(sram_buff_slot()));
    
    			sram_buff_slot_update();
    			if (sram_buff_slot() == 0)
    				ddr_switch_address_table();
    
    			if (ring_buff_finished() && sram_buff_finished()) {
    				acquisition_stop();
    				dma_finish_transfer();
    				pru_send_arm_acq_complete(ring_buff_xfrs(), sram_buff_xfrs(), cycle_count());
    				return;
    			}
    		} else if (sram_buff_slot_overrun()) {
    			acquisition_abort(ERROR_BUFFER_OVERRUN);
    			return;
    		}
    
    		queue--;
    	}
    }
    
    void parse_command(void)
    {
    	while (pru_rpmsg_receive(&transport, &src, &dst, payload, &len) == PRU_RPMSG_SUCCESS) {
    		if (pru_echoing_cmds()) {
    			pru_rpmsg_send(&transport, dst, src, payload, len);
    		}
    
    		uint32_t *cmd = (uint32_t *) payload;
    		switch (cmd[0]) {
    		case ARM_SEND_PRU_DDR_ADDR_LOADED:
    			ddr_validate_addresses(cmd[1]);
    			break;
    		case ARM_SEND_PRU_ACQ_CONF:
    			ring_buff_init((uint8_t) cmd[1], cmd[2]);
    			sram_buff_init((uint8_t) cmd[1], cmd[3]);
    			ddr_table_init();
    			dma_init();
    			pru_send_arm_mem_addr();
    			acquisition_start();
    			break;
    		case ARM_SEND_PRU_ECHO_CMDS:
    			pru_configure_echo(cmd[1]);
    			break;
    		case ARM_SEND_PRU_RESET:
    			pru_reset();
    			break;
    		}
    	}
    }
    
    void enable_data_ready_irq(void)
    {
    	/* map system event 20 to channel 1 */
    	CT_INTC.CMR5_bit.CH_MAP_20 = 1;
    	/* map channel 1 to host interrupt 1 */
    	CT_INTC.HMR0_bit.HINT_MAP_1 = 1;
    	/* enable system event 20 */
    	CT_INTC.EISR_bit.EN_SET_IDX = SYSEVT_DATA_READY;
    	/* enable host interrupt 1 on system event */
    	CT_INTC.HIEISR_bit.HINT_EN_SET_IDX = 1;
    	/* enable all host interrupts */
    	CT_INTC.GER_bit.EN_HINT_ANY = 1;
    	/* clear any existing system event */
    	CT_INTC.SICR_bit.STS_CLR_IDX = SYSEVT_DATA_READY;
    }
    
    void main(void)
    {
    	/* Allow OCP master port access by the PRU so the PRU can read external memories */
    	CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;
    
    	/* Clear the status of the PRU-ICSS system event that the ARM will use to 'kick' us */
    	CT_INTC.SICR_bit.STS_CLR_IDX = SYSEVT_FROM_ARM;
    
    	/* Make sure the Linux drivers are ready for RPMsg communication */
    	status = &resourceTable.rpmsg_vdev.status;
    	while (!(*status & VIRTIO_CONFIG_S_DRIVER_OK));
    
    	/* Initialize the RPMsg transport structure */
    	pru_rpmsg_init(&transport, &resourceTable.rpmsg_vring0, &resourceTable.rpmsg_vring1, SYSEVT_TO_ARM, SYSEVT_FROM_ARM);
    
    	/* Create the RPMsg channel between the PRU and ARM user space using the transport structure. */
    	while (pru_rpmsg_channel(RPMSG_NS_CREATE, &transport, CHAN_NAME, CHAN_DESC, CHAN_PORT) != PRU_RPMSG_SUCCESS);
    
    	enable_data_ready_irq();
    
    	while (1) {
    		if (__R31 & HOST0_INT) {
    			CT_INTC.SICR_bit.STS_CLR_IDX = SYSEVT_FROM_ARM;
    			parse_command();
    		}
    
    		if (__R31 & HOST1_INT) {
    			CT_INTC.SICR_bit.STS_CLR_IDX = SYSEVT_DATA_READY;
    			acquire_data();
    		}
    	}
    }

  • Hello Andrew,

    1) Codec clock. I'm guessing your McASP is getting its data from an audio codec - is that the case? If so, either the McASP should be the clock master and providing the clock used by the clock slave codec, or the codec needs to be the clock master providing the clock used by the clock slave McASP. If your codec and McASP have different clock sources, you may lose information before the McASP even gets it. Could you verify the clock setup for us?

    2) DMA procedure / PRU setup: I need a bit more time to think about your code and application. I'm not sure yet if sending an interrupt every 69us would cause any problems for the PRU, if larger/fewer buffers would make sense, etc.

    Regards,
    Nick
  • Nick,

    1. Codec clock

    The McASP is always the clock master. However for every acquisition the user can select AHCLKX from one of two sources: internal McASP 24MHz functional clock or an external 36.864MHz oscillator. If the functional clock is selected the McASP provides the codec with AHCLKX. Otherwise the external oscillator feeds both the codec and the McASP. The McASP always provides ACLKX and FSX to the codec. Clock dividers for AHCLKX and ACLKX are dependent on the desired sample rate and operating mode of the codec. The sample rates we use vary widely: ~1kHz to 144kHz. Note that the issue is present even if the external oscillator is removed from the board.

    2. DMA setup

    I have tried increasing the size of the ring buffer by modifying the maximum number of PaRAMs allowed for cyclic DMA. I did this by modifying MAX_NR_SG as defined in drivers/dma/edma.c and reconfiguring my code and linker file for the larger buffer size. I doubled the buffer size to 32 segments of 240 bytes but am still seeing the same behavior.

    At the moment I am attempting to move all of the McASP setup and cyclic DMA code into the PRU. I am wondering if variable latency from the transfer completion interrupt combined with waiting on the scheduler to queue the callback might be tripping me up. If I send the transfer completion interrupt directly to the PRU I should be able to cut a lot of timing uncertainty out of my control loop. This should also help me get more accurate cycle counts between when the acquisition starts and every time the PRU sees available data.
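
    To quantify the timing uncertainty mentioned above, the PRU could record the minimum and maximum spacing between consecutive data-ready events. A small sketch, assuming a 32-bit cycle count is available and passed in as `now`. The type and function names are hypothetical.

```c
#include <stdint.h>

struct jitter_stats {
	uint32_t last;		/* cycle count at the previous event */
	uint32_t min_delta;	/* smallest spacing seen so far */
	uint32_t max_delta;	/* largest spacing seen so far */
	uint32_t samples;	/* number of spacings recorded */
	int primed;		/* nonzero once `last` is valid */
};

void jitter_init(struct jitter_stats *s)
{
	s->last = 0;
	s->min_delta = UINT32_MAX;
	s->max_delta = 0;
	s->samples = 0;
	s->primed = 0;
}

/* Call once per data-ready event with the current cycle count. */
void jitter_update(struct jitter_stats *s, uint32_t now)
{
	if (s->primed) {
		/* unsigned subtraction stays correct across one counter wrap */
		uint32_t delta = now - s->last;

		if (delta < s->min_delta)
			s->min_delta = delta;
		if (delta > s->max_delta)
			s->max_delta = delta;
		s->samples++;
	}
	s->last = now;
	s->primed = 1;
}
```

    A maximum spacing well above the nominal transfer period would point to a late or missing event, while a tight minimum/maximum spread would suggest the loss happens before the data reaches the PRU.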
  • Hello Andrew,

    How is debug going for you? Were you able to resolve the issue?

    Regards,
    Nick
  • Nick,

    Unfortunately no, I haven't been able to resolve the problem. Due to some other work priorities I am going to have to let this sit for a little while.

    My initial question was in regards to the McASP initialization sequence. Mark indicated the sequence contained in the TRM was correct and that is what I am following. From that perspective I don't mind marking this as resolved. I can open a new or related question as necessary when I have time to look at this some more.

  • Hello Andrew,

    Sounds good, I flagged Mark's response as an answer. Good luck!

    Regards,
    Nick