SK-AM62: ALSA/McASP Linux RT Underflow

Part Number: SK-AM62
Other Parts Discussed in Thread: TLV320AIC3106

Hello TI Team!

I'm trying to get a low-latency audio demo working on the SK-AM62x (GP) using Processor-SDK-Linux-RT (version 9.0.0.3). I have written a userspace application with ALSA that captures 3.5mm mic audio from a connected headset and plays back that same audio (we'll eventually be processing this audio; this is just a proof of concept). This uses the SK-AM62x onboard 3.5mm connector and the TLV320AIC3106 codec. Unfortunately, while this works for a period of time, in its current configuration it always ends in a kernel error from the McASP driver about a transmit buffer underflow, which seems to be unrecoverable.

Some application specific details:

- The playback and capture threads are pthreads with a scheduler policy of SCHED_FIFO and a task priority of 80. The capture thread stores the audio into a buffer and the playback thread writes the stored buffer – right now, there is no audio processing in these threads, so they are not computationally-intensive.

- The playback and capture PCM parameters are: 1 channel, 48000 sample rate, signed 16-bit samples and the period time is 5 ms (period size in frames = 240, buffer size in frames = period size * 2 = 480).
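The period and buffer sizes above follow directly from the sample rate and period time. A small sketch of that arithmetic, assuming interleaved S16 samples (2 bytes each); the helper names are hypothetical and not part of ALSA or the attached application:

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not part of ALSA) for the PCM parameter arithmetic:
 * mono, 48 kHz, S16_LE, 5 ms periods, buffer = 2 periods.
 */

/* Frames in one period: rate [Hz] * period time [us] / 1e6 (truncated). */
static uint32_t frames_per_period(uint32_t rate_hz, uint32_t period_us)
{
    return (uint32_t)(((uint64_t)rate_hz * period_us) / 1000000u);
}

/* Bytes moved per period for interleaved samples. */
static uint32_t bytes_per_period(uint32_t frames, uint32_t channels,
                                 uint32_t bytes_per_sample)
{
    return frames * channels * bytes_per_sample;
}

/* Audio time (in microseconds) held by a completely full ring buffer. */
static uint32_t buffer_time_us(uint32_t buffer_frames, uint32_t rate_hz)
{
    return (uint32_t)(((uint64_t)buffer_frames * 1000000u) / rate_hz);
}
```

With these values, frames_per_period(48000, 5000) gives 240, doubling it gives the 480-frame buffer, bytes_per_period(240, 1, 2) gives 480 bytes, and buffer_time_us(480, 48000) gives 10000 µs, i.e. at most 10 ms of audio queued in each direction.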

As I said, we are going for low latency, thus the small period time. At this time, I believe our questions are:

  1. With the current McASP kernel driver, is there a minimum supported period time? We see that 1 ms is not supported, but no error is thrown with 5 ms, and as I said, this configuration works for up to 23 hours.
  2. Are there other considerations to prevent a transmit buffer underflow, such as a higher thread priority than 80?
  3. This last question is more about recovery than the actual problem - is there any way to recover the McASP driver once it gets into this state? It seems calls to snd_pcm_prepare() correctly place the ALSA handle in the SND_PCM_STATE_PREPARED (instead of SND_PCM_STATE_XRUN), but further attempts to write to that handle fail and produce the same McASP kernel error printout.
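Regarding question 2: whichever priority is chosen, it only applies if the thread attribute explicitly disables scheduler inheritance; otherwise pthread_create() gives the new thread the creating thread's policy and priority. A minimal sketch, assuming POSIX threads on Linux; init_fifo_attr() is a hypothetical helper, not taken from the attached application. Creating a thread with this attribute still needs CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO, otherwise pthread_create() fails with EPERM.

```c
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: configure a pthread attribute object so that a thread
 * created with it runs under SCHED_FIFO at the given priority.
 * Returns 0 on success or a pthread error code.
 */
static int init_fifo_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param param = { .sched_priority = priority };
    int err;

    if ((err = pthread_attr_init(attr)) != 0)
        return err;

    /* Without PTHREAD_EXPLICIT_SCHED the policy/priority below are ignored
     * and the new thread inherits the creator's scheduling. The policy is
     * set before the priority because the priority is range-checked
     * against the policy stored in the attribute. */
    if ((err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0 ||
        (err = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) != 0 ||
        (err = pthread_attr_setschedparam(attr, &param)) != 0) {
        pthread_attr_destroy(attr);
        return err;
    }
    return 0;
}
```

Pass the attribute to pthread_create() for the capture and playback threads and call pthread_attr_destroy() afterwards. This only shows how the priority gets applied; it does not establish that a priority above 80 would prevent the underflow.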

Thanks so much for your help!

  • Hi Andy,

    Can you share the log with the kernel error you are seeing? We are aware of a few issues with DMA transfers, and the team is actively working to resolve them.

    Best Regards,

    Suren

  • Hi Suren,

    Thank you so much for the quick reply! When the error happens, the following two kernel messages appear over and over on UART:

    [ 322.151846] davinci-mcasp 2b10000.audio-controller: Receive buffer overflow
    [ 322.167643] davinci-mcasp 2b10000.audio-controller: Transmit buffer underflow

    Are there additional logs that would be helpful?

    Thanks!

    Andy

  • Hi Andy,

    Can you add this change to the DTS file on your setup, rerun the test, and see if the behavior changes?

    diff --git a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
    index 33768c02d8eb1..2b7b1671448b5 100644
    --- a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
    +++ b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
    @@ -504,8 +504,8 @@ &mcasp1 {
                   0 0 0 0
                   0 0 0 0
            >;
    -       tx-num-evt = <32>;
    -       rx-num-evt = <32>;
    +       tx-num-evt = <0>;
    +       rx-num-evt = <0>;
     };
    

    Best Regards,

    Suren

  • Hi Suren,

    Thanks for the suggestion! I've made the change and am currently running my loopback test again. I will provide an update when I can. As I said in the original post, the current test has been known to run for up to 23 hours before failing, so this may take some time.

    Andy

  • Understood Andy.

    Keep us posted with your findings.

    Best Regards,

    Suren

  • Hi Suren,

    Unfortunately, I don't believe the device tree change to the AFIFO settings has had any effect on the underflow/overflow issue. I ran the test to failure several times (once after about 7 hours, once sometime overnight, and once within minutes of starting the app). The failure is the same as before:

    [ 148.552613] davinci-mcasp 2b10000.audio-controller: Receive buffer overflow
    [ 148.559795] davinci-mcasp 2b10000.audio-controller: Transmit buffer underflow

    There are no messages printed by the kernel indicating other activity right before the error (such as networking status updates), so I don't believe the processor was off doing something unexpected/unplanned. The receive overflow error seems to always appear first.

    If there's additional information that I can gather, please let me know.

    Andy

  • Hi Andy,

    In your experiments, can you confirm whether, after seeing the overflow, you are able to stop and restart arecord?

    Or do you have to reboot the board to get it working again?

    Also, as I pointed out to Jason in a separate thread, the minimum period_size on our setup is 64 because the FIFO is set to 64. I have asked him to try changing it to 32 and see if it helps (latency might improve, but overflows could get worse).
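Converting between frames and time makes the 64-frame minimum easier to compare with the period times discussed earlier: 64 frames at 48 kHz is about 1.33 ms, and 32 frames about 0.67 ms. By the same arithmetic, the 1 ms period from the original post is only 48 frames, which falls below a 64-frame minimum; that is consistent with (though not proof of) 1 ms being rejected. A sketch with hypothetical helper names, taking the 64-frame figure from the reply above as an assumption:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the 64-frame minimum period quoted for this McASP setup.
 * It would change if the FIFO setting were changed (e.g. to 32). */
#define MIN_PERIOD_FRAMES 64u

/* Period time in microseconds for a period of `frames` (truncated). */
static uint32_t frames_to_us(uint32_t rate_hz, uint32_t frames)
{
    return (uint32_t)(((uint64_t)frames * 1000000u) / rate_hz);
}

/* True if a requested period time spans at least MIN_PERIOD_FRAMES frames.
 * Compares rate * us against min_frames * 1e6 to avoid rounding errors. */
static bool period_time_supported(uint32_t rate_hz, uint32_t period_us)
{
    return (uint64_t)rate_hz * period_us >= (uint64_t)MIN_PERIOD_FRAMES * 1000000u;
}
```

frames_to_us(48000, 64) truncates to 1333, the same period_time value ALSA reports in the arecord -v dump later in this thread.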

    I have asked Jason if it's okay to continue our discussion here and close the other thread, since the same topic is being discussed.

    Best Regards,

    Suren

  • Hi Suren,

    I am using my own userspace application that loops audio from the capture device (3.5mm mic) to the playback device (3.5mm headset). I cannot stop and restart my application - I must cycle power on the board in order to get it working again. I have not tried arecord - I can attempt to do that the next time I see a failure.

    Andy

  • I was able to confirm that after my application fails with an overflow, I can use arecord to successfully record mic audio without a board reboot.

  • Thanks Andy,

    Is your application just doing record + playback on the EVM continuously? Could you share the script/application so we can reproduce the issue on our end and help debug?

    Best Regards,

    Suren

  • Hi Suren,

    Yes, once my application starts, it stores the received capture buffer and then plays that back on the same interface and does so continuously. I've attached my code along with pre-built debug and release images (I've been primarily using the debug version for my tests but the same issue happens on the release build as well).

    In order to run this on the SK, the default device tree in Processor SDK 9.0.0.3 will work, but you will need to modify the kernel so that the TLV320AIC3106 codec is supported per your documentation here.

    Andy

     audio_test_sk_minimal.zip

  • Thanks Andy.

    Can you run the command below continuously on your end and see if the issue occurs? It captures for 20 seconds and plays back on the speaker.

    arecord -Dhw:0,0 -r 48000 -c 2 --period-size=64 -d 20 -f S16_LE | aplay -Dhw:0,0

    Best Regards,

    Suren

  • Hi Suren,

    Unfortunately, that test also failed (I did remove the -d 20 option from arecord so that it could run continuously). Here is the console printout:

    root@am62xx-evm:/usr/bin# arecord -Dhw:0,0 -r 48000 -c 2 --period-size=64 -f S16_LE | aplay -Dhw:0,0                                
    Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo                                                         
    Playing WAVE 'stdin' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo                                                           
    [16519.963667] ti-udma 485c0100.dma-controller: chan2 teardown timeout!                                                             
    underrun!!! (at least 595.510 ms long)                                                                                              
    [16519.997883] davinci-mcasp 2b10000.audio-controller: Transmit buffer underflow                                                    
    [16521.019840] ti-udma 485c0100.dma-controller: chan1 teardown timeout! 

    Andy

  • Hi Andy,

    Please allow me a day or two to get back to you on what might be going wrong with McASP.

    Best Regards,

    Suren

  • Hi Suren:
    FYI, I'm also experiencing what is probably the same issue, but in my case I'm using a proprietary audio streaming protocol with an app that writes directly to ALSA.

  • Hi Andy, Bravo,

    We are looking at this issue, and a JIRA ticket has been filed to resolve it. I will post updates here as I receive them.

    Best Regards,

    Suren

  • Thank you for the update Suren! I'll look forward to any updates you can provide.

    Andy

  • Hi Suren,

    I know people are still coming back from the holidays. I was just curious whether there was any update, and if not, perhaps an expected timeframe for such an update? Thank you!

    Andy

  • Hi Andy, 

    Happy New year!

    I appreciate your patience and apologies for the delay:

    Our SW team suggested you try this GStreamer-based pipeline, as they haven't seen any issues running it for 4+ hours so far:

    Pasting the Pipeline here:

    gst-launch-1.0 alsasrc device="hw:0,0" latency-time=5 ! audio/x-raw, format="(string)S16LE", layout="(string)interleaved", rate="(int)48000", channels="(int)2" ! queue ! alsasink device="hw:0,0" latency-time=5 -v

    Let me know how it goes.

    Best Regards,

    Suren

  • Hi Suren,

    Happy new year to you as well!

    I have been running the requested test using GStreamer. It is still running and I have not seen the overflow/underflow errors that I had previously. I will continue to run it over the weekend.

    However, I'm not sold that this is an apples-to-apples comparison. I still need to familiarize myself with GStreamer, but the latency of the GStreamer loopback test is much greater (> 100 ms) than that of the user space application I wrote (and provided above) or alsaloop (approx. 20 ms). Also, the latency-time argument doesn't seem to be having any effect on the observed latency. Is this the proper parameter, and is this much larger latency with GStreamer expected?

    If GStreamer sets up a larger ALSA period size, then I would expect more stability than in previous attempts. But that brings back one of the questions in my original post: is there a minimum ALSA period time for the AM62x/McASP/RT Linux that prevents underflows/overflows?

    For reference, here is the output of the GStreamer test (verbose mode was specified):

    root@am62xx-evm:~# amixer sset PCM 90%                                                                                                                 
    Simple mixer control 'PCM',0                                                                                                        
      Capabilities: pvolume                                                                                                             
      Playback channels: Front Left - Front Right                                                                                       
      Limits: Playback 0 - 127                                                                                                          
      Mono:                                                                                                                             
      Front Left: Playback 114 [90%] [-6.50dB]                                                                                          
      Front Right: Playback 114 [90%] [-6.50dB]                                                                                         
    root@am62xx-evm:~# amixer sset 'Left PGA Mixer Mic3R' on                                                                            
    Simple mixer control 'Left PGA Mixer Mic3R',0                                                                                       
      Capabilities: pswitch pswitch-joined                                                                                              
      Playback channels: Mono                                                                                                           
      Mono: Playback [on]                                                                                                               
    root@am62xx-evm:~# amixer sset 'Right PGA Mixer Mic3R' on                                                                           
    Simple mixer control 'Right PGA Mixer Mic3R',0                                                                                      
      Capabilities: pswitch pswitch-joined                                                                                              
      Playback channels: Mono                                                                                                           
      Mono: Playback [on]                                                                                                               
    root@am62xx-evm:~# amixer sset PGA 90%                                                                                              
    Simple mixer control 'PGA',0                                                                                                        
      Capabilities: cvolume cswitch                                                                                                     
      Capture channels: Front Left - Front Right                                                                                        
      Limits: Capture 0 - 119                                                                                                           
      Front Left: Capture 107 [90%] [53.50dB] [on]                                                                                      
      Front Right: Capture 107 [90%] [53.50dB] [on]                                                                                     
    root@am62xx-evm:~# gst-launch-1.0 alsasrc device="hw:0,0" latency-time=5 ! audio/x-raw, format="(string)S16LE", layout="(string)interleaved", rate="(int)48000", channels="(int)2" ! queue ! alsasink device="hw:0,0" latency-time=5 -v
    Setting pipeline to PAUSED ...
    Pipeline is live and does not need PREROLL ...
    Pipeline is PREROLLED ...
    Setting pipeline to PLAYING ...
    New clock: GstAudioSrcClock
    /GstPipeline:pipeline0/GstAlsaSrc:alsasrc0: actual-buffer-time = 200000
    /GstPipeline:pipeline0/GstAlsaSrc:alsasrc0: actual-latency-time = 1333
    Redistribute latency...
    /GstPipeline:pipeline0/GstAlsaSrc:alsasrc0.GstPad:src: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
    /GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:src: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
    /GstPipeline:pipeline0/GstQueue:queue0.GstPad:sink: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
    /GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:sink: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
    /GstPipeline:pipeline0/GstQueue:queue0.GstPad:src: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
    Redistribute latency...
    /GstPipeline:pipeline0/GstAlsaSink:alsasink0.GstPad:sink: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
    Redistribute latency...                 
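The buffer-time and latency-time properties of GStreamer's ALSA source and sink are specified in microseconds, so latency-time=5 asks for a 5 µs period rather than 5 ms. The verbose output above is consistent with that: actual-latency-time = 1333 corresponds to a 64-frame period at 48 kHz, and actual-buffer-time = 200000 (200 ms, 9600 frames) would account for much of the >100 ms latency observed. A sketch of the conversion with hypothetical helper names; the property values it suggests (e.g. latency-time=5000 buffer-time=10000 on both alsasrc and alsasink for 5 ms periods in a 10 ms buffer) are untested on this board:

```c
#include <stdint.h>

/* GStreamer audio source/sink buffer-time and latency-time are in microseconds. */
static uint64_t ms_to_gst_us(uint32_t ms)
{
    return (uint64_t)ms * 1000u;
}

/* Frames covered by a duration in microseconds, rounded to the nearest frame. */
static uint64_t gst_us_to_frames(uint64_t us, uint32_t rate_hz)
{
    return (us * rate_hz + 500000u) / 1000000u;
}
```

For example, gst_us_to_frames(ms_to_gst_us(5), 48000) is the same 240-frame period used by the original application.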

    Andy

  • As Andy correctly pointed out, this is not a valid test. 

    I have another thread about the same issue, where I shared a small program that triggers it 100% of the time, along with some further investigation into where it comes from:

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1307264/sk-am62-davinci-mcasp-2b10000-audio-controller-transmit-buffer-underflow/4985582

  • Hi Bravo/Andy,

    We also ran the command below using arecord and aplay for 30+ hours and haven't seen the issue yet.

    The earlier runs stopped because of arecord's maximum WAV file size:

    arecord --max-file-time: Default is the maximum size supported by the file format: 2 GiB for WAV files.
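To put the 2 GiB limit in perspective for the 48 kHz, 2-channel, 16-bit stream used in these tests (192000 bytes per second), a sketch of the arithmetic: the limit is reached after roughly 11184 s, about 3.1 hours. The helper name is hypothetical:

```c
#include <stdint.h>

/* Whole seconds of PCM audio that fit in max_bytes (truncated). */
static uint64_t seconds_until_limit(uint64_t max_bytes, uint32_t rate_hz,
                                    uint32_t channels, uint32_t bytes_per_sample)
{
    uint64_t bytes_per_second = (uint64_t)rate_hz * channels * bytes_per_sample;
    return max_bytes / bytes_per_second;
}
```

Calling seconds_until_limit(2ull << 30, 48000, 2, 2) gives 11184.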

    Also refer to the issue reported on Raspberry Pi Stack Exchange: raspberrypi.stackexchange.com/.../arecord-aplay-stop-after-a-while

    arecord -Dhw:0,0 -r 48000 -c 2 --period-size=64 --buffer-size 9600 -f S16_LE -v -f dat -t raw | aplay -Dhw:0,0

    Hope this helps.

    Best Regards,
    Suren




  • Hi Suren,

    If I try to run the command exactly as written, it fails in the manner below:

    root@am62xx-evm:~# arecord -Dhw:0,0 -r 48000 -c 2 --period-size=64 --buffer-size 9600 -f S16_LE -v -f dat -t raw | aplay -Dhw:0,0   
    Recording raw data 'stdin' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo                                                     
    Hardware PCM card 0 'AM62x-SKEVM' device 0 subdevice 0                                                                              
    Its setup is:                                                                                                                       
      stream       : CAPTURE                                                                                                            
      access       : RW_INTERLEAVED                                                                                                     
      format       : S16_LE                                                                                                             
      subformat    : STD                                                                                                                
      channels     : 2                                                                                                                  
      rate         : 48000                                                                                                              
      exact rate   : 48000 (48000/1)                                                                                                    
      msbits       : 16                                                                                                                 
      buffer_size  : 9600                                                                                                               
      period_size  : 64                                                                                                                 
      period_time  : 1333                                                                                                               
      tstamp_mode  : NONE                                                                                                               
      tstamp_type  : MONOTONIC                                                                                                          
      period_step  : 1                                                                                                                  
      avail_min    : 64                                                                                                                 
      period_event : 0                                                                                                                  
      start_threshold  : 1                                                                                                              
      stop_threshold   : 9600                                                                                                           
      silence_threshold: 0                                                                                                              
      silence_size : 0                                                                                                                  
      boundary     : 5404319552844595200                                                                                                
      appl_ptr     : 0                                                                                                                  
      hw_ptr       : 0                                                                                                                  
    Playing raw data 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono                                                                       
    aplay: set_params:1352: Sample format non available                                                                                 
    Available formats:                                                                                                                  
    - S16_LE                                                                                                                            
    - S24_LE                                                                                                                            
    - S32_LE                                                                                                                            
    - S24_3LE                                                                                                                           
    [  572.507728] ti-udma 485c0100.dma-controller: chan2 teardown timeout!   

    If I modify it so that the format is added to aplay and remove the redundant settings from arecord (since -f dat is a shortcut specifying -c, -r and -f), the equivalent command works:

     

    arecord -Dhw:0,0 --period-size=64 --buffer-size 9600 -v -f dat -t raw | aplay -Dhw:0,0 -f dat

    It should be noted that, even though the ALSA period is only 1.3 ms, the latency on this demo is extraordinarily large, even worse than the GStreamer test above. Is there a reason that arecord/aplay are used in conjunction as opposed to just utilizing alsaloop?

    Additionally, you mentioned above that a JIRA was opened for this issue. Could you give us a status report for that bug? It seems that these latest tests we’re running are moving away from the issue that Lisandro and I are experiencing.

    Thank you.

    Andy

  • Hi Suren,

    This issue is gaining visibility in our organization and we are under increased pressure for a resolution. We've asked our FAE to help facilitate a call for alignment on this issue.

    Andy

  • underrun_crash_fix.patch

    Hi Andy, Lisandro,

    Please find attached the patch to fix the crashes seen with underruns. This is a temporary fix that, on my system, at least makes the errors recoverable; more work is needed to understand whether this is a correct/upstreamable fix.

    Thanks,
    Jai

  • Jai:

    Thanks a lot for taking the time to look into this!

    This is definitely a step in the right direction (at least the PCM no longer halts on an xrun), but as you say, there is work to be done.

    What I see is that the DMA teardown issue is still present when calling pcm_drop and pcm_prepare, and that resuming audio playback afterwards introduces a long delay. I will take proper measurements on Monday, but it seems to be around 2 seconds.

    To make this issue easy to reproduce, I have adapted this WAV player program, based on another example I provided to you earlier, so that it forces pcm_drop and pcm_prepare (this version is more flexible with different WAVE formats):

    // A simple C example to play a mono or stereo, 16-bit 44KHz
    // WAVE file using ALSA. This goes directly to the first
    // audio card (ie, its first set of audio out jacks). It
    // uses the snd_pcm_writei() mode of outputting waveform data,
    // blocking.
    //
    // Compile as so to create "alsawave":
    // gcc -o alsawave alsawave.c -lasound
    //
    // Run it from a terminal, specifying the name of a WAVE file to play:
    // ./alsawave MyWaveFile.wav
    
    #include <fcntl.h>  // open(), O_RDONLY
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h> // read(), lseek(), close()
    
    // Include the ALSA .H file that defines ALSA functions/data
    #include <alsa/asoundlib.h>
    
    #pragma pack (1)
    /////////////////////// WAVE File Stuff /////////////////////
    // An IFF file header looks like this
    typedef struct _FILE_head
    {
    	unsigned char	ID[4];	// could be {'R', 'I', 'F', 'F'} or {'F', 'O', 'R', 'M'}
    	unsigned int	Length;	// Length of subsequent file (including remainder of header). This is in
    									// Intel reverse byte order if RIFF, Motorola format if FORM.
    	unsigned char	Type[4];	// {'W', 'A', 'V', 'E'} or {'A', 'I', 'F', 'F'}
    } FILE_head;
    
    
    // An IFF chunk header looks like this
    typedef struct _CHUNK_head
    {
    	unsigned char ID[4];	// 4 ascii chars that is the chunk ID
    	unsigned int	Length;	// Length of subsequent data within this chunk. This is in Intel reverse byte
    							// order if RIFF, Motorola format if FORM. Note: this doesn't include any
    							// extra byte needed to pad the chunk out to an even size.
    } CHUNK_head;
    
    // WAVE fmt chunk
    typedef struct _FORMAT {
    	short				wFormatTag;
    	unsigned short	wChannels;
    	unsigned int	dwSamplesPerSec;
    	unsigned int	dwAvgBytesPerSec;
    	unsigned short	wBlockAlign;
    	unsigned short	wBitsPerSample;
      // Note: there may be additional fields here, depending upon wFormatTag
    } FORMAT;
    #pragma pack()
    
    // Size of the audio card hardware buffer. Here we want it
    // set to 2 * 1024 sample points. This is relatively
    // small in order to minimize latency. If you have trouble
    // with underruns, you may need to increase this and PERIODSIZE
    // (trading off lower latency for more stability).
    #define BUFFERSIZE	(2*1024)
    
    // How many sample points the ALSA card plays per period,
    // i.e. how often the hardware signals that more of its
    // buffer can be filled. Here we want a period of
    // 2 * 64 sample points.
    #define PERIODSIZE	(2*64)
    
    // Handle to ALSA (audio card's) playback port
    snd_pcm_t				*PlaybackHandle;
    
    // Handle to our callback thread
    snd_async_handler_t	*CallbackHandle;
    
    // Points to loaded WAVE file's data
    unsigned char			*WavePtr;
    
    // Size (in frames) of loaded WAVE file's data
    snd_pcm_uframes_t		WaveSize;
    
    // Sample rate
    unsigned short			WaveRate;
    
    // Bit resolution
    unsigned char			WaveBits;
    
    // Number of channels in the wave file
    unsigned char			WaveChannels;
    
    // The name of the ALSA port we output to. In this case, we're
    // directly writing to hardware card 0,0 (ie, first set of audio
    // outputs on the first audio card)
    char SoundCardPortName[100] = "hw:0,0"; 
    
    // For WAVE file loading
    static const unsigned char Riff[4]	= { 'R', 'I', 'F', 'F' };
    static const unsigned char Wave[4] = { 'W', 'A', 'V', 'E' };
    static const unsigned char Fmt[4] = { 'f', 'm', 't', ' ' };
    static const unsigned char Data[4] = { 'd', 'a', 't', 'a' };
    
    /********************** compareID() *********************
     * Compares the passed ID str (ie, a ptr to 4 Ascii
     * bytes) with the ID at the passed ptr. Returns TRUE if
     * a match, FALSE if not.
     */
    
    static unsigned char compareID(const unsigned char * id, unsigned char * ptr)
    {
    	register unsigned char i = 4;
    
    	while (i--)
    	{
    		if ( *(id)++ != *(ptr)++ ) return(0);
    	}
    	return(1);
    }
    
    /********************** waveLoad() *********************
     * Loads a WAVE file.
     *
     * fn =			Filename to load.
     *
     * RETURNS: 0 if success, non-zero if not.
     *
     * NOTE: Sets the global "WavePtr" to an allocated buffer
     * containing the wave data, and "WaveSize" to the size
     * in sample points.
     */
    
    static unsigned char waveLoad(const char *fn)
    {
    	const char				*message;
    	FILE_head				head;
    	register int			inHandle;
    
    	if ((inHandle = open(fn, O_RDONLY)) == -1)
    		message = "didn't open";
    
    	// Read in IFF File header
    	else
    	{
    		if (read(inHandle, &head, sizeof(FILE_head)) == sizeof(FILE_head))
    		{
    			// Is it a RIFF and WAVE?
    			if (!compareID(&Riff[0], &head.ID[0]) || !compareID(&Wave[0], &head.Type[0]))
    			{
    				message = "is not a WAVE file";
    				goto bad;
    			}
    
    			// Read in next chunk header
    			while (read(inHandle, &head, sizeof(CHUNK_head)) == sizeof(CHUNK_head))
    			{
    				// ============================ Is it a fmt chunk? ===============================
    				if (compareID(&Fmt[0], &head.ID[0]))
    				{
    					FORMAT	format;
    
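    					// Read in the fixed part of the fmt chunk
    					if (head.Length < sizeof(FORMAT) ||
    					    read(inHandle, &format.wFormatTag, sizeof(FORMAT)) != sizeof(FORMAT)) break;
    
    					// Skip any extra fmt bytes (e.g. the cbSize field of an 18-byte
    					// chunk) plus the pad byte, so the next chunk header lines up
    					if (head.Length > sizeof(FORMAT))
    						lseek(inHandle, (off_t)(head.Length - sizeof(FORMAT)) + (head.Length & 1), SEEK_CUR);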
    
    					// Can't handle compressed WAVE files
    					if (format.wFormatTag != 1)
    					{
    						message = "compressed WAVE not supported";
    						goto bad;
    					}
    
    					WaveBits = (unsigned char)format.wBitsPerSample;
    					WaveRate = (unsigned short)format.dwSamplesPerSec;
    					WaveChannels = format.wChannels;
    				}
    
    				// ============================ Is it a data chunk? ===============================
    				else if (compareID(&Data[0], &head.ID[0]))
    				{
    					// Size of wave data is head.Length. Allocate a buffer and read in the wave data
    					if (!(WavePtr = (unsigned char *)malloc(head.Length)))
    					{
    						message = "won't fit in RAM";
    						goto bad;
    					}
    
    					if (read(inHandle, WavePtr, head.Length) != head.Length)
    					{
    						free(WavePtr);
    						break;
    					}
    
    					// Store size (in frames)
    					WaveSize = (head.Length * 8) / ((unsigned int)WaveBits * (unsigned int)WaveChannels);
    
    					close(inHandle);
    					return(0);
    				}
    
    				// ============================ Skip this chunk ===============================
    				else
    				{
    					if (head.Length & 1) ++head.Length;  // If odd, round it up to account for pad byte
    					lseek(inHandle, head.Length, SEEK_CUR);
    				}
    			}
    		}
    
    		message = "is a bad WAVE file";
    bad:	close(inHandle);
    	}
    
    	printf("%s %s\n", fn, message);
    	return(1);
    }
    
    
    int reset(void)
    {
    	int err = 0;
    	printf("Reset\n");
    
    	printf("--snd_pcm_drop\n");
    	if ((err = snd_pcm_drop(PlaybackHandle))) {
    		fprintf(stderr, "cannot drop pending frames (%s)\n",
    				snd_strerror(err));
    		goto done;
    	}
    
    	printf("--snd_pcm_prepare\n");
    	if ((err = snd_pcm_prepare(PlaybackHandle))) {
    		fprintf(stderr, "cannot prepare playback audio interface for use (%s)\n",
    				snd_strerror(err));
    		goto done;
    	}
    	printf("End reset\n");
    
    done:
    	return err;
    }
    
    /********************** play_audio() **********************
     * Plays the loaded waveform.
     *
     * NOTE: ALSA sound card's handle must be in the global
     * "PlaybackHandle". A pointer to the wave data must be in
     * the global "WavePtr", and its size (in frames) in "WaveSize".
     */
    static void play_audio(void)
    {
    	snd_pcm_sframes_t frames;	// Signed: snd_pcm_writei() returns a negative error code on failure
    	const snd_pcm_uframes_t chunkFrames = 4096;	// Frames handed to snd_pcm_writei() per call
    	snd_pcm_uframes_t remainingFrames = WaveSize;	// Total number of frames still to be played
    	unsigned char *dataPtr = WavePtr;	// Current position in the wave data
    	int period_reset = 0;
    
    	while (remainingFrames > 0)
    	{
    		snd_pcm_uframes_t framesToWrite = remainingFrames < chunkFrames ? remainingFrames : chunkFrames;
    
    		// Write the frames
    		frames = snd_pcm_writei(PlaybackHandle, dataPtr, framesToWrite);
    
    		// On an error (e.g. -EPIPE after an underrun), try to recover from it
    		if (frames < 0)
    			frames = snd_pcm_recover(PlaybackHandle, (int)frames, 0);
    		if (frames < 0)
    		{
    			printf("Error playing wave: %s\n", snd_strerror((int)frames));
    			break;
    		}
    
    		// Update pointers and counters
    		dataPtr += frames * (WaveBits / 8) * WaveChannels;
    		remainingFrames -= frames;
    
    		// Drop and re-prepare the stream every 5 writes (exercises the reset path)
    		period_reset++;
    		if (period_reset >= 5)
    		{
    			period_reset = 0;
    			reset();
    		}
    	}
    
    	// Wait for playback to completely finish
    	if (remainingFrames == 0)
    		snd_pcm_drain(PlaybackHandle);
    }
    
    /*********************** free_wave_data() *********************
     * Frees any wave data we loaded.
     *
     * NOTE: A pointer to the wave data must be in the global
     * "WavePtr".
     */
    
    static void free_wave_data(void)
    {
    	if (WavePtr) free(WavePtr);
    	WavePtr = 0;
    }
    
    int main(int argc, char **argv)
    {
    	// No wave data loaded yet
    	WavePtr = 0;
    
    	if (argc < 2)
        {
            printf("Usage: %s <WAVE file> [SoundCardPortName]\n", argv[0]);
            printf("Example: %s sound.wav hw:1,0\n", argv[0]);
            return 1;
        }
    
        // Check if sound card is specified
        if (argc > 2)
        {
            // Copy the specified sound card name
            strncpy(SoundCardPortName, argv[2], sizeof(SoundCardPortName));
            SoundCardPortName[sizeof(SoundCardPortName) - 1] = '\0'; // Ensure null-termination
        }
    
    	// Load the wave file
    	if (!waveLoad(argv[1]))
    	{
    		register int		err;
    
    		// Open audio card we wish to use for playback
    		if ((err = snd_pcm_open(&PlaybackHandle, &SoundCardPortName[0], SND_PCM_STREAM_PLAYBACK, 0)) < 0)
    			printf("Can't open audio %s: %s\n", &SoundCardPortName[0], snd_strerror(err));
    		else
    		{
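    			// Map the WAVE bit depth to an ALSA sample format (WAVE PCM data is little-endian)
    			switch (WaveBits)
    			{
    				case 8:
    					err = SND_PCM_FORMAT_U8;
    					break;
    			
    				case 16:
    					err = SND_PCM_FORMAT_S16_LE;
    					break;
    			
    				case 24:
    					err = SND_PCM_FORMAT_S24_3LE;	// 24-bit WAVE samples are packed in 3 bytes
    					break;
    			
    				case 32:
    					err = SND_PCM_FORMAT_S32_LE;
    					break;
    			}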
    		
    			// Set the audio card's hardware parameters (sample rate, bit resolution, etc)
    			if ((err = snd_pcm_set_params(PlaybackHandle, err, SND_PCM_ACCESS_RW_INTERLEAVED, WaveChannels, WaveRate, 1, 500000)) < 0)
    				printf("Can't set sound parameters: %s\n", snd_strerror(err));
    
    			// Play the waveform
    			else
    				play_audio();
    
    			// Close sound card
    			snd_pcm_close(PlaybackHandle);
    		}
    	}
    
    	// Free the WAVE data
    	free_wave_data();
    
    	return(0);
    }
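    As a sanity check of the frame arithmetic used by waveLoad() and play_audio() above, the sketch below isolates it. The helper names are hypothetical and not part of the original program. It assumes byte-aligned PCM sample sizes (8/16/24/32 bits), for which waveLoad()'s (Length * 8) / (bits * channels) equals Length divided by the bytes per frame, and play_audio() advances dataPtr by frames times the bytes per frame.

```c
#include <stddef.h>

/* Bytes in one interleaved frame: one sample per channel.
 * Assumes a byte-aligned sample size (8, 16, 24 or 32 bits). */
static size_t bytes_per_frame(unsigned bits, unsigned channels)
{
    return (size_t)(bits / 8u) * channels;
}

/* Whole frames contained in a "data" chunk of data_len bytes. */
static size_t frames_in_chunk(size_t data_len, unsigned bits, unsigned channels)
{
    size_t bpf = bytes_per_frame(bits, channels);
    return bpf ? data_len / bpf : 0;    /* guard against a zero-sized frame */
}
```

    For example, one second of 16-bit stereo audio at 48 kHz is 192000 bytes, i.e. 48000 frames of 4 bytes each.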

    This is the output on an am62xx-lp-evm with underrun_crash_fix.patch applied:

    ./alsawave piano2.wav 
    Reset
    --snd_pcm_drop
    --snd_pcm_prepare
    [ 2476.733559] ti-udma 485c0100.dma-controller: chan1 teardown timeout!
    End reset
    Reset
    --snd_pcm_drop
    --snd_pcm_prepare
    [ 2479.005665] ti-udma 485c0100.dma-controller: chan1 teardown timeout!
    End reset
    Reset
    --snd_pcm_drop
    --snd_pcm_prepare
    [ 2481.277681] ti-udma 485c0100.dma-controller: chan1 teardown timeout!

    I have also tested the same code on an iMX8: there are no DMA issues on that platform, and calling the reset() function introduces only barely noticeable skips in the audio output, as expected.

    It was also mentioned that the Raspberry Pi had similar issues. I tested a Raspberry Pi 4 with the latest image, and the behaviour is the same as with NXP.

    We will require at least the same behaviour as the iMX8 in order to validate this platform as suitable for our project (several thousand units per year).

  • Hi Andy,

    Here is the complete patch that you need to apply to the kernel released with SDK 9.1.

    alsa-latency-changes.txt
    diff --git a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
    index 6f102b430..316f87de4 100644
    --- a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
    +++ b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
    @@ -555,8 +555,8 @@ &mcasp1 {
     	       0 0 0 0
     	       0 0 0 0
     	>;
    -	tx-num-evt = <32>;
    -	rx-num-evt = <32>;
    +	tx-num-evt = <0>;
    +	rx-num-evt = <0>;
     };
     
     &ospi0 {
    diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
    index 9f0669505..82d20d93b 100644
    --- a/drivers/dma/ti/k3-udma.c
    +++ b/drivers/dma/ti/k3-udma.c
    @@ -1843,7 +1843,8 @@ static int udma_alloc_rx_resources(struct udma_chan *uc)
     
     #define TISCI_BCDMA_TCHAN_VALID_PARAMS (			\
     	TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |	\
    -	TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID)
    +	TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID |      \
    +	TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FDEPTH_VALID)
     
     #define TISCI_BCDMA_RCHAN_VALID_PARAMS (			\
     	TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID)
    @@ -2017,6 +2018,7 @@ static int bcdma_tisci_tx_channel_config(struct udma_chan *uc)
     	req_tx.nav_id = tisci_rm->tisci_dev_id;
     	req_tx.index = tchan->id;
     	req_tx.tx_supr_tdpkt = uc->config.notdpkt;
    +	req_tx.fdepth = 64;
     	if (ud->match_data->flags & UDMA_FLAG_TDTYPE) {
     		/* wait for peer to complete the teardown for PDMAs */
     		req_tx.valid_params |=
    @@ -3489,6 +3491,10 @@ udma_prep_dma_cyclic_tr(struct udma_chan *uc, dma_addr_t buf_addr,
     	u16 tr0_cnt0, tr0_cnt1, tr1_cnt0;
     	unsigned int i;
     	int num_tr;
    +	u32 period_csf = 0;
    +
    +	if (uc->config.ep_type == PSIL_EP_PDMA_XY && dir == DMA_MEM_TO_DEV)
    +		period_csf = CPPI5_TR_CSF_EOP;
     
     	num_tr = udma_get_tr_counters(period_len, __ffs(buf_addr), &tr0_cnt0,
     				      &tr0_cnt1, &tr1_cnt0);
    @@ -3538,8 +3544,10 @@ udma_prep_dma_cyclic_tr(struct udma_chan *uc, dma_addr_t buf_addr,
     		}
     
     		if (!(flags & DMA_PREP_INTERRUPT))
    -			cppi5_tr_csf_set(&tr_req[tr_idx].flags,
    -					 CPPI5_TR_CSF_SUPR_EVT);
    +			period_csf |= CPPI5_TR_CSF_SUPR_EVT;
    +
    +		if (period_csf)
    +			cppi5_tr_csf_set(&tr_req[tr_idx].flags, period_csf);
     
     		period_addr += period_len;
     	}
    diff --git a/include/sound/dmaengine_pcm.h b/include/sound/dmaengine_pcm.h
    index 2df54cf02..62d38fab5 100644
    --- a/include/sound/dmaengine_pcm.h
    +++ b/include/sound/dmaengine_pcm.h
    @@ -36,6 +36,7 @@ snd_pcm_uframes_t snd_dmaengine_pcm_pointer_no_residue(struct snd_pcm_substream
     int snd_dmaengine_pcm_open(struct snd_pcm_substream *substream,
     	struct dma_chan *chan);
     int snd_dmaengine_pcm_close(struct snd_pcm_substream *substream);
    +int snd_dmaengine_pcm_prepare(struct snd_pcm_substream *substream);
     
     int snd_dmaengine_pcm_open_request_chan(struct snd_pcm_substream *substream,
     	dma_filter_fn filter_fn, void *filter_data);
    diff --git a/sound/core/pcm_dmaengine.c b/sound/core/pcm_dmaengine.c
    index 494ec0c20..0304bf5bd 100644
    --- a/sound/core/pcm_dmaengine.c
    +++ b/sound/core/pcm_dmaengine.c
    @@ -349,6 +349,16 @@ int snd_dmaengine_pcm_open_request_chan(struct snd_pcm_substream *substream,
     }
     EXPORT_SYMBOL_GPL(snd_dmaengine_pcm_open_request_chan);
     
    +int snd_dmaengine_pcm_prepare(struct snd_pcm_substream *substream)
    +{
    +	struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream);
    +
    +	dmaengine_synchronize(prtd->dma_chan);
    +
    +	return 0;
    +}
    +EXPORT_SYMBOL_GPL(snd_dmaengine_pcm_prepare);
    +
     /**
      * snd_dmaengine_pcm_close - Close a dmaengine based PCM substream
      * @substream: PCM substream
    diff --git a/sound/soc/soc-generic-dmaengine-pcm.c b/sound/soc/soc-generic-dmaengine-pcm.c
    index 3b99f619e..a43beae62 100644
    --- a/sound/soc/soc-generic-dmaengine-pcm.c
    +++ b/sound/soc/soc-generic-dmaengine-pcm.c
    @@ -318,6 +318,12 @@ static int dmaengine_copy_user(struct snd_soc_component *component,
     	return 0;
     }
     
    +static int dmaengine_pcm_prepare(struct snd_soc_component *component,
    +			  struct snd_pcm_substream *substream)
    +{
    +	return snd_dmaengine_pcm_prepare(substream);
    +}
    +
     static const struct snd_soc_component_driver dmaengine_pcm_component = {
     	.name		= SND_DMAENGINE_PCM_DRV_NAME,
     	.probe_order	= SND_SOC_COMP_ORDER_LATE,
    @@ -327,6 +333,7 @@ static const struct snd_soc_component_driver dmaengine_pcm_component = {
     	.trigger	= dmaengine_pcm_trigger,
     	.pointer	= dmaengine_pcm_pointer,
     	.pcm_construct	= dmaengine_pcm_new,
    +	.prepare	= dmaengine_pcm_prepare,
     };
     
     static const struct snd_soc_component_driver dmaengine_pcm_component_process = {
    @@ -339,6 +346,7 @@ static const struct snd_soc_component_driver dmaengine_pcm_component_process = {
     	.pointer	= dmaengine_pcm_pointer,
     	.copy_user	= dmaengine_copy_user,
     	.pcm_construct	= dmaengine_pcm_new,
    +	.prepare	= dmaengine_pcm_prepare,
     };
     
     static const char * const dmaengine_pcm_dma_channel_names[] = {
    diff --git a/sound/soc/ti/davinci-mcasp.c b/sound/soc/ti/davinci-mcasp.c
    index ca5d1bb6a..73cb9e4bd 100644
    --- a/sound/soc/ti/davinci-mcasp.c
    +++ b/sound/soc/ti/davinci-mcasp.c
    @@ -1475,7 +1475,7 @@ static int davinci_mcasp_hw_rule_min_periodsize(
     	struct snd_interval frames;
     
     	snd_interval_any(&frames);
    -	frames.min = 64;
    +	frames.min = 16;
     	frames.integer = 1;
     
     	return snd_interval_refine(period_size, &frames);
    

    The changes are basically:

    - Disabling the McASP FIFO (tx-num-evt/rx-num-evt = 0),

    - A fix for the xruns that were happening earlier (instability),

    - Reducing the minimum period size from 64 to 32 or 16 frames (the current patch uses 16),

    - A fix for the channel teardown timeout error that used to occur at the end of playback,

    - Reducing the FIFO depth for BCDMA TX (from 196 to 64).
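    For reference, the period-size floor set through frames.min in davinci_mcasp_hw_rule_min_periodsize() maps to period times as computed by the minimal sketch below (period_ns() is a hypothetical helper computing frames / rate). At 48 kHz, 64 frames is about 1.33 ms and 16 frames about 0.33 ms, while the 240-frame period from the original question is 5 ms.

```c
#include <stdint.h>

/* Duration of one period in nanoseconds (period_frames / rate_hz seconds),
 * using integer math and truncating toward zero. rate_hz must be non-zero. */
static uint64_t period_ns(uint32_t period_frames, uint32_t rate_hz)
{
    return (uint64_t)period_frames * 1000000000u / rate_hz;
}
```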

    To test with a period size of 16 frames, the alsaloop application had to be modified:

    diff --git a/alsaloop/alsaloop.c b/alsaloop/alsaloop.c
    index f5f2e37..3e4056f 100644
    --- a/alsaloop/alsaloop.c
    +++ b/alsaloop/alsaloop.c
    @@ -481,7 +481,7 @@ static int parse_config(int argc, char *argv[], snd_output_t *output,
                            break;
                    case 'E':
                            err = atoi(optarg);
    -                       arg_period_size = err >= 32 && err < 200000 ? err : 0;
    +                       arg_period_size = err >= 16 && err < 200000 ? err : 0;
                            break;
                    case 's':
                            err = atoi(optarg);
    

    Attached is the alsaloop utility with the above change: 

    alsaloop

    With all the above changes, we are able to see these latency numbers:

    Let me know if this helps.

    Best Regards,

    Suren

  • The commands that we use are as follows:

    alsaloop -v -t 1600 --period=32 --buffer=2048 -S 0 -f S32_LE (the default sample rate is 48 kHz)

    alsaloop -v -t 3200 --period=16 --buffer=8192 -S 0 -f S16_LE -r 16000 
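    Assuming -t is alsaloop's requested latency in microseconds (as its usage text describes), the two commands above ask for roughly 76 frames (1600 us at 48 kHz) and 51 frames (3200 us at 16 kHz). A minimal sketch of that conversion, with a hypothetical helper name:

```c
#include <stdint.h>

/* Convert a latency in microseconds into frames at rate_hz,
 * rounding down to a whole frame. */
static uint32_t latency_us_to_frames(uint32_t latency_us, uint32_t rate_hz)
{
    return (uint32_t)((uint64_t)latency_us * rate_hz / 1000000u);
}
```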

    Hope these help.

    Best Regards,

    Suren

  • Hi Suren,

    Thank you for all of the assistance on this issue. The patch from Feb. 16 resolves my issue: I am able to repeat your first two tests on our custom hardware (using the two commands below):

    alsaloop -v -t 1600 --period=32 --buffer=2048 -S 0 -f S32_LE
    alsaloop -v -t 4200 --period=32 --buffer=8192 -S 0 -f S16_LE -r 16000

    The 48k test (test 1) ran for 40 hours without any errors. The 16k test (test 2) ran for 24 hours without any errors.

    I have not yet tried tests 3 and 4 with a --period=16 argument, but trust that those will perform similarly.

    There is one pitfall I want anyone following this issue to be aware of. With SDK 9.1, the kernel configuration for Processor SDK RT Linux must be generated with the following command:

    make ARCH=arm64 CROSS_COMPILE=aarch64-oe-linux- defconfig ti_arm64_prune.config ti_rt.config

    The ti_rt.config fragment is not mentioned in the Processor SDK documentation, and without it none of the RT Linux functionality is enabled.

    I will ultimately try to dial in the best period size and buffer size for our products, but we have a baseline that we know to be functional.

    Again Suren, thank you and everyone on the TI team for the help in resolving this matter!

    Best Regards,

    Andy

  • Thank you, Andy.

    Additionally, here are some of the questions you asked offline, along with the responses from our experts:

    > The buffer sizes (in frames) in these latest tests are large. Do they need to be larger than the 4x and 8x period size that we typically see with low-latency ALSA applications?

     

    The buffer size is the size of the ring buffer used. It is important to choose an adequate size, because XRUNs are directly related to it.

    For example:

    1)  During playback, if the application does not write data into the buffer as fast as the hardware reads it (i.e. at the sampling rate), the ring buffer becomes empty, causing an underrun.

    2)  During capture, if the application does not read data from the buffer as fast as the hardware writes it, the ring buffer becomes full and the hardware overwrites unread data, causing an overrun.
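    The underrun case can be illustrated with a toy model. It is hypothetical and not how the ALSA core or the McASP driver is implemented: the hardware drains a fixed number of frames per tick, the application writes whatever fits, and an underrun is flagged when the buffer cannot supply a full tick. With the same two-tick application stall, a 480-frame buffer runs dry while a 960-frame buffer does not.

```c
#include <stdint.h>

/* Toy playback ring buffer (hypothetical model):
 * - buffer_frames: ring buffer capacity
 * - start_fill:    frames already queued before tick 0 (<= buffer_frames)
 * - hw_per_tick:   frames the hardware consumes every tick
 * - app_writes[i]: frames the application offers at tick i; anything that
 *                  does not fit in the free space is simply not written
 * Returns the first tick at which the buffer could not supply hw_per_tick
 * frames (an underrun), or -1 if playback never ran dry. */
static int first_underrun(uint32_t buffer_frames, uint32_t start_fill,
                          uint32_t hw_per_tick,
                          const uint32_t *app_writes, int ticks)
{
    uint32_t fill = start_fill;

    for (int i = 0; i < ticks; i++) {
        uint32_t space = buffer_frames - fill;
        uint32_t written = app_writes[i] < space ? app_writes[i] : space;

        fill += written;
        if (fill < hw_per_tick)
            return i;               /* ring buffer ran dry: underrun */
        fill -= hw_per_tick;        /* hardware reads one tick's worth */
    }
    return -1;
}
```

    The capture overrun is the mirror image: the hardware fills the buffer each tick and data is lost once the application falls behind by more than the buffer size.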

     

    Both situations can occur depending on system factors such as CPU usage, scheduling latency and task priorities. Because of these factors, the application may not get enough CPU time to complete its transfers on time, so an adequately sized ring buffer provides an extra layer of buffering to absorb them. The trade-off, of course, is that a larger buffer uses more memory.

     

    As a rule of thumb, it is generally recommended to start with buffer size = 2x period size and then increase it incrementally (in whole multiples of the period size) until XRUNs stop (or become rare). Note, however, that applications sometimes adjust the buffer size themselves, as explained below.
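    The rule of thumb can be turned into a rough first estimate under a simplifying assumption that is not from the thread: the buffer is full when the application stalls, so it must hold at least the frames the hardware consumes during the worst observed stall. min_periods_for_stall() and buffer_time_us() are hypothetical helpers; observed XRUN behaviour should still drive the final choice.

```c
#include <stdint.h>

/* Smallest buffer, in whole periods, that holds at least the frames the
 * hardware consumes during a stall of stall_us microseconds. Starts at the
 * 2-period rule of thumb and grows one period at a time. Assumes the
 * buffer is full when the stall begins (a simplification). */
static uint32_t min_periods_for_stall(uint32_t period_frames, uint32_t rate_hz,
                                      uint32_t stall_us)
{
    /* Frames consumed during the stall, rounded up. */
    uint64_t stall_frames = ((uint64_t)stall_us * rate_hz + 999999u) / 1000000u;
    uint32_t periods = 2;

    if (period_frames == 0)
        return 0;                   /* invalid input; avoid an endless loop */
    while ((uint64_t)periods * period_frames < stall_frames)
        periods++;
    return periods;
}

/* Time represented by a buffer of `periods` periods, in microseconds. */
static uint64_t buffer_time_us(uint32_t periods, uint32_t period_frames,
                               uint32_t rate_hz)
{
    return (uint64_t)periods * period_frames * 1000000u / rate_hz;
}
```

    For example, with 240-frame periods at 48 kHz, a 12 ms stall needs 3 periods (720 frames, 15 ms of buffer).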

     

    > Do you recommend that the period size in frames be a power of 2 (16, 32, 64, etc.) for stability, or can it be any value? For example, 2 ms at 48 kHz would be 96 frames. Is that an acceptable value?

     

    It is usually advisable. If you don't, the ALSA core framework in the Linux kernel may adjust the value to satisfy its constraints [1]. For example, it ensures that the buffer size is an integer multiple of the period size [2], adjusting it if it is not. The ALSA core also makes sure that period size, period time, rate, buffer size and buffer time are consistent with each other, and adjusts them according to its rules if they are not [2].
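    To make the arithmetic from the question concrete: 2 ms at 48 kHz is 96 frames, which is not a power of two. The sketch below also shows a simplified way a buffer request could be rounded to a whole number of periods. This is an illustration only: the real ALSA core refines all hw_params intervals together [2] and may choose different values. All function names are hypothetical.

```c
#include <stdint.h>

/* Frames in a time span of `ms` milliseconds at rate_hz (rounded down). */
static uint32_t ms_to_frames(uint32_t ms, uint32_t rate_hz)
{
    return (uint32_t)((uint64_t)ms * rate_hz / 1000u);
}

/* Non-zero if x is a power of two (16, 32, 64, ...). */
static int is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Simplified illustration: round a requested buffer size to the nearest
 * whole number of periods (at least one). period_frames must be non-zero. */
static uint32_t snap_buffer_to_periods(uint32_t buffer_frames,
                                       uint32_t period_frames)
{
    uint32_t periods = (buffer_frames + period_frames / 2) / period_frames;

    if (periods == 0)
        periods = 1;
    return periods * period_frames;
}
```

    For example, a 500-frame buffer request with 96-frame periods would be rounded to 5 periods (480 frames) under this simplified rule.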

     

    The second level of adjustment usually comes from the application. For example, alsaloop appears to set the buffer size to 8 * period-size * (requested buffer-size / requested period-size), possibly here [3].

     

    For example, for alsaloop -v --period=64 --buffer=128 -S 0 -f S16_LE -r 48000,

    it sets the buffer size to 1024 frames, i.e. 64 * 8 * (128/64) = 1024.
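    The alsaloop adjustment described above can be written out directly. This is a transcription of the formula as stated in this reply, not alsaloop's actual code (see [3]), and the function name is hypothetical. When the requested buffer is a multiple of the requested period, it reduces to 8x the requested buffer size.

```c
#include <stdint.h>

/* Effective buffer size per the formula quoted above:
 * 8 * period * (requested_buffer / requested_period), integer division.
 * req_period must be non-zero. */
static uint32_t alsaloop_effective_buffer(uint32_t req_period, uint32_t req_buffer)
{
    return 8u * req_period * (req_buffer / req_period);
}
```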

     

    [1] : https://elixir.bootlin.com/linux/latest/source/sound/core/pcm_native.c#L2557

    [2] : https://elixir.bootlin.com/linux/latest/source/sound/core/pcm_native.c

    [3] : https://github.com/alsa-project/alsa-utils/blob/master/alsaloop/pcmjob.c#L164