AM623: UDMA cyclic mode for McASP broken by EOP change

Part Number: AM623

Tool/software:

Between SDK versions 09.02.00.009 and 09.02.00.010 the following patch broke audio streaming from McASP:

commit 67d5b24398155d07376b04dd8b3d858d049a5e17
Author: Jai Luthra <j-luthra@ti.com>
Date:   Mon Apr 29 16:06:55 2024 +0530

    dmaengine: ti: k3-udma: Fix teardown for cyclic RX with PDMA

    When receiving data in cyclic mode from PDMA peripherals, where reload
    count is set to infinite, any TR in the set can potentially be the last
    one of the overall transfer. In such cases, the EOP flag needs to be set
    in each TR and PDMA's Static TR "Z" parameter should be set, matching
    the size of the TR.

    This is required for the teardown to function properly and cleanup the
    internal state memory. This only affects platforms using BCDMA and not
    those using UDMA-P, which could set EOP flag in the teardown TR
    automatically.

The UDMA is configured for audio streaming as:

  • PDMA with ACC32, BURST
  • cyclic
  • period_len=65536
  • buf_len=524288

Before this patch audio streaming worked correctly, meaning the output bytes matched the input.

After this patch audio streaming is broken:

  1. Every 65536 bytes in output there is a discontinuity with extra null bytes (observed both 16 and 32 bytes offset)
  2. Due to #1 TDM streams get moved to different channels
  3. Manually tracking changed offsets in data stream still has discontinuity in audio signal, possibly indicating lost samples

Attached file normal_tdm128.wav is a recording using arecord from 09.00.00.010. File bad_tdm128.wav is a recording using arecord from 09.02.00.010. Both recordings are 128 channels with 24 channels streaming audio and the rest zero. Both recordings have a tone input on one channel that will be seen at much smaller magnitude on at least 3 other channels due to proximity.

  • Hi Matthew, 

    Could you run the test with latest 11.1 SDK and share us your observations. 

    Also, appreciate if you could let us know a way to reproduce this on our EVM.

    Best Regards,

    Suren

  • I will attempt to recreate with 11.1. We are using the McASP to capture I2S so replicating on the EVM may require a data source.

    Our sound card is 4 channels of TDM32 streaming from McASP2 with the following device tree configuration:

        sound0: sound@0 {
            compatible = "simple-audio-card";
            simple-audio-card,name = "A2B_TDM";
            simple-audio-card,format = "dsp_a";
            simple-audio-card,mclk-fs = <512>;
            simple-audio-card,bitclock-master = <&sound_slave>;
            simple-audio-card,frame-master = <&sound_slave>;
    
            status="okay";
    
            sound_master: simple-audio-card,cpu {
                sound-dai = <&mcasp2>;
            };
    
            sound_slave: simple-audio-card,codec {
                sound-dai = <&codec_test>;
            };
        };
    	
        codec_test: codec_test {
            #sound-dai-cells = <0>;
            compatible = "linux,snd-soc-dummy";
            status="okay";
          };
    
    &mcasp2 {
        status = "okay";
        #sound-dai-cells = <0>;
    
        pinctrl-names = "default";
        pinctrl-0 = <&mcasp2_pins_default>;
    
        op-mode = <0>;      /* MCASP_IIS_MODE */
        tdm-slots = <32>;
        ext-clk-provider;   /* external clocks provided, override DAI_FMT to be bclk/fs consumer */
        extended-tdm-fs;    /* TDM mode defaults to 1-bit-width frame sync, this flags extends to 1-word-width */
    
        /*    1 : Tx, 2 : Rx */
        serial-dir = <
            2 2 2 2
            0 0 0 0
            0 0 0 0
            1 1 1 1
        >;
    
        tx-num-evt = <0>;
        rx-num-evt = <0>;
    };

    Additionally, the BCDMA configuration works out as follows:

    • buf_len = 524288 (512k)
    • period_len = 65536 (64k)
    • periods = 8
    • 16 transmit requests as 8 groups of 2:
      • icnt0=65528, icnt1=1, dim1=65528
      • icnt0=8, icnt1=1, dim1=8, EOP set
    • PDMA (DEV_TO_MEM) mode with:
      • ACC32=1
      • BURST=1
      • cyclic=1
    • bstcnt set to 32764 Zap (I suddenly realize that this is likely the problem!) Zap
  • The added code to set bstcount appears to be wrong:

    	} else if (uc->ud->match_data->type == DMA_TYPE_BCDMA &&
    		   uc->config.dir == DMA_DEV_TO_MEM && !uc->config.pkt_mode &&
    		   uc->cyclic) {
    		/*
    		 * For cyclic TR mode PDMA must close the packet after every TR
    		 * transfer, as we have to set EOP in each TR to prevent short
    		 * packet errors seen on channel teardown.
    		 */
    		struct cppi5_tr_type1_t *tr_req = d->hwdesc[0].tr_req_base;
    
    		d->static_tr.bstcnt =
    			(tr_req->icnt0 * tr_req->icnt1) / dev_width;

    This sets bstcnt based on a single TR, however the driver in udma_prep_dma_cyclic_tr() may generate one or two TR per cycle based on the size when calling udma_get_tr_counters():

    static int udma_get_tr_counters(size_t len, unsigned long align_to,
    				u16 *tr0_cnt0, u16 *tr0_cnt1, u16 *tr1_cnt0)
    {
    	if (len < SZ_64K) {
    		*tr0_cnt0 = len;
    		*tr0_cnt1 = 1;
    
    		return 1;
    	}
    
    	if (align_to > 3)
    		align_to = 3;
    
    realign:
    	*tr0_cnt0 = SZ_64K - BIT(align_to);
    	if (len / *tr0_cnt0 >= SZ_64K) {
    		if (align_to) {
    			align_to--;
    			goto realign;
    		}
    		return -EINVAL;
    	}
    
    	*tr0_cnt1 = len / *tr0_cnt0;
    	*tr1_cnt0 = len % *tr0_cnt0;
    
    	return 2;
    }

    Thus for an input length of 64k to 128k-1 there will be two TR per transfer and we end up with:

    1. EOP after 65536 bytes (set on second TR)
    2. bstcnt set to 32464, dev_width=2 for a total of 65528
  • Hi Matthew,

    Can you provide a patch for the same, so I can have my team review it and up streamed for the community to use.

    With the above change, are you able to resolve the issue?

    Best Regards,

    Suren

  • With the attached patch for 9.03 the TDM channels do not get mixed up:

    Workaround for bug in commit 67d5b24398155d07376b04dd8b3d858d049a5e17
    Original commit title:
        dmaengine: ti: k3-udma: Fix teardown for cyclic RX with PDMA
    
    The original commit sets EOP at the end of BCDMA transfer and sets bstcnt
    based on the contents of transmit resquest (TR) #0. The driver, however, will
    use two TRs when transfer size is >=64k bytes with EOP set on the second TR. 
    In this case bstcnt should equal the total bytes for both TRs.
    
    diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
    index 4291bbf998ab..81b62df45be2 100644
    --- a/drivers/dma/ti/k3-udma.c
    +++ b/drivers/dma/ti/k3-udma.c
    @@ -3220,6 +3220,11 @@ static int udma_configure_statictr(struct udma_chan *uc, struct udma_desc *d,
     
     		d->static_tr.bstcnt =
     			(tr_req->icnt0 * tr_req->icnt1) / dev_width;
    +
    +		// Data may be in two TR if >=64k
    +		if( !(tr_req->flags & CPPI5_TR_CSF_EOP) ){
    +			d->static_tr.bstcnt += (tr_req[1].icnt0 * tr_req[1].icnt1) / dev_width;
    +		}
     	} else {
     		d->static_tr.bstcnt = 0;
     	}
    

    Note that 11.01 uses the same code but it has been moved to a different file.

    In testing I am still seeing discontinuity in the recorded audio and arecord takes 8 seconds to record 2 seconds of audio. I am working on identifying if this is related to the EOP change or if it is a separate issue.

  • I have two additional questions about the BCDMA process:

    1. Is there a performance impact when the transfer is broken into two transfer requests and the second request is only 8 bytes?
    2. For the application of McASP to PDMA to BCDMA where is the packet length (period_len in k3-udma.c) of 64kB set?
  • Hi Matthew.

    Thanks for sharing the code changes. I have sent it to our SW experts to review and provide comments.

    Also reached out to them on the implications of the code changes on performance as well.

    Did you get to test audio recording on 11.1 SDK? If I understand correctly, stereo recording is working as expected and you are seeing these discontinuity only with mutliple tdm slots/channels?

    Best Regards,

    Suren

  • Hi Matthew,

    Here is the analysis so far:

    ALSA max period size limit is 64K

    UDMA max TR size limit is set to 64K-1 

    Due to extra byte, it is splitting into two TRs which is currently not well supported in UDMA driver. 

    Can you try to reduce the period size and give it a try?

    Best Regards,

    Suren