SK-AM62: ALSA/McASP Linux RT Underflow

Andy Lenhart

Part Number: SK-AM62
Other Parts Discussed in Thread: TLV320AIC3106

Hello TI Team!

I'm trying to get a low latency audio demo working on the SK-AM62x (GP) using Processor-SDK-Linux-RT (version 9.0.0.3). I have written a userspace application with ALSA that captures 3.5mm mic audio from a connected headset and plays back that same audio (we'll eventually be processing this audio - this is just for a proof-of-concept). This is using the SK-AM62x onboard 3.5mm connector and TLV320AIC3106 CODEC. Unfortunately, while this works for a period of time, as currently configured it will always result in a kernel error from the McASP driver regarding a transmit buffer underflow that seems to be unrecoverable.

Some application specific details:

- The playback and capture threads are pthreads with a scheduler policy of SCHED_FIFO and a task priority of 80. The capture thread stores the audio into a buffer and the playback thread writes the stored buffer – right now, there is no audio processing in these threads, so they are not computationally-intensive.

- The playback and capture PCM parameters are: 1 channel, 48000 sample rate, signed 16-bit samples and the period time is 5 ms (period size in frames = 240, buffer size in frames = period size * 2 = 480).
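
The period and buffer sizes above follow directly from the sample rate and the period time. A minimal sketch of that arithmetic in C; the helper names `period_frames` and `period_bytes` are illustrative only and are not part of ALSA:

```c
#include <stdint.h>

/* Hypothetical helper (not an ALSA function): frames in one period
 * for a sample rate in Hz and a period time in microseconds. */
static uint32_t period_frames(uint32_t rate_hz, uint32_t period_us)
{
    /* 64-bit intermediate avoids overflow of rate * time */
    return (uint32_t)(((uint64_t)rate_hz * period_us) / 1000000u);
}

/* Hypothetical helper: bytes occupied by a number of interleaved frames. */
static uint32_t period_bytes(uint32_t frames, uint32_t channels,
                             uint32_t bits_per_sample)
{
    return frames * channels * (bits_per_sample / 8u);
}
```

With the values from this post, `period_frames(48000, 5000)` gives 240 frames, two periods give the 480-frame buffer, and `period_bytes(240, 1, 16)` is 480 bytes per period. A 1 ms period would be only 48 frames.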

As I said, we are going for low latency, thus the small period time. At this time, I believe our questions are:

- With the current McASP kernel driver, is there a minimum period time that is supported? We see that 1 ms is not but no error is thrown with 5 ms and as I said this will work as currently configured for up to 23 hours.

- Are there other considerations to prevent a transmit buffer underflow, such as a higher thread priority than 80?

- This last question is more about recovery than the actual problem - is there any way to recover the McASP driver once it gets into this state? It seems calls to snd_pcm_prepare() correctly place the ALSA handle in the SND_PCM_STATE_PREPARED (instead of SND_PCM_STATE_XRUN), but further attempts to write to that handle fail and produce the same McASP kernel error printout.

Thanks so much for your help!

over 1 year ago

0 Suren Porwar over 1 year ago

TI__Mastermind 20170 points

Hi Andy,

Can you share the log with the kernel error you are seeing? We are aware of a few issues with DMA transfers, and the team is working to resolve them.

Best Regards,

Suren

0 Andy Lenhart over 1 year ago in reply to Suren Porwar

Prodigy 30 points

Hi Suren,

Thank you so much for the quick reply! When the error happens, the following two kernel messages appear over and over on UART:

[ 322.151846] davinci-mcasp 2b10000.audio-controller: Receive buffer overflow
[ 322.167643] davinci-mcasp 2b10000.audio-controller: Transmit buffer underflow

Are there additional logs that would be helpful?

Thanks!

Andy

0 Suren Porwar 12 months ago in reply to Andy Lenhart

TI__Mastermind 20170 points

Hi Andy,

Can you add this change to the DTS file on your setup, run the test again, and see if the behavior changes?

diff --git a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
index 33768c02d8eb1..2b7b1671448b5 100644
--- a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
+++ b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
@@ -504,8 +504,8 @@ &mcasp1 {
               0 0 0 0
               0 0 0 0
        >;
-       tx-num-evt = <32>;
-       rx-num-evt = <32>;
+       tx-num-evt = <0>;
+       rx-num-evt = <0>;
 };

Best Regards,

Suren

0 Andy Lenhart 12 months ago in reply to Suren Porwar

Prodigy 30 points

Hi Suren,

Thanks for the suggestion! I've made the change and am currently running my loopback test again. I will provide an update when I can. As I said in the original post, the current test has been known to run for up to 23 hours before failing, so this may take some time.

Andy

0 Suren Porwar 12 months ago in reply to Andy Lenhart

TI__Mastermind 20170 points

Understood Andy.

Keep us posted with your findings.

Best Regards,

Suren

0 Andy Lenhart 11 months ago in reply to Suren Porwar

Prodigy 30 points

Hi Suren,

Unfortunately, I don't believe the device tree change with regards to the AFIFO has had any effect on the underflow/overflow issue. I ran the test to failure several times (once after about 7 hours, once sometime overnight, and once within minutes of starting the app). The failure is the same as before:

[ 148.552613] davinci-mcasp 2b10000.audio-controller: Receive buffer overflow
[ 148.559795] davinci-mcasp 2b10000.audio-controller: Transmit buffer underflow

There are no messages printed by the kernel indicating other activity right before the error (such as networking status updates), so I don't believe the processor was off doing something unexpected/unplanned. The receive overflow error seems to always appear first.

If there's additional information that I can gather, please let me know.

Andy

0 Suren Porwar 11 months ago in reply to Andy Lenhart

TI__Mastermind 20170 points

Hi Andy,

In your experiments, can you confirm whether you are able to stop and start arecord again after seeing the overflow?

Or do you have to reboot the board in order to get it working?

Also, as I pointed out to Jason in a separate thread, the minimum period_size on our setup is 64 because the FIFO is set to 64. I have asked him to try changing it to 32 to see if it helps (latency might improve, but overflow could get worse).

I have asked Jason if it's okay to continue our discussion here and close the other thread, since the same topic is being discussed.

Best Regards,

Suren

0 Andy Lenhart 11 months ago in reply to Suren Porwar

Prodigy 30 points

Hi Suren,

I am using my own userspace application that loops audio from the capture device (3.5mm mic) to the playback device (3.5mm headset). I cannot stop and restart my application - I must cycle power on the board in order to get it working again. I have not tried arecord - I can attempt to do that the next time I see a failure.

Andy

0 Andy Lenhart 11 months ago in reply to Andy Lenhart

Prodigy 30 points

I was able to confirm that after my application fails with an overflow, I can use arecord to successfully record mic audio without a board reboot.

0 Suren Porwar 11 months ago in reply to Andy Lenhart

TI__Mastermind 20170 points

Thanks Andy,

Is your application just doing record + playback on the EVM continuously? Could you share the script/application so that we can reproduce the issue on our end and help debug?

Best Regards,

Suren

0 Andy Lenhart 11 months ago in reply to Suren Porwar

Prodigy 30 points

Hi Suren,

Yes, once my application starts, it stores the received capture buffer and then plays that back on the same interface and does so continuously. I've attached my code along with pre-built debug and release images (I've been primarily using the debug version for my tests but the same issue happens on the release build as well).

In order to run this on the SK, the default device tree in Processor SDK 9.0.0.3 will work, but you will need to modify the kernel so that the TLV320AIC3106 codec is supported per your documentation here.

Andy

audio_test_sk_minimal.zip

0 Suren Porwar 11 months ago in reply to Andy Lenhart

TI__Mastermind 20170 points

Thanks Andy.

Can you run the below command continuously on your end and see if the issue occurs? In the below command I am capturing for 20 sec and playing back on the speaker.

arecord -Dhw:0,0 -r 48000 -c 2 --period-size=64 -d 20 -f S16_LE | aplay -Dhw:0,0

Best Regards,

Suren

0 Andy Lenhart 11 months ago in reply to Suren Porwar

Prodigy 30 points

Hi Suren,

Unfortunately, that test also failed (I did remove the -d 20 option from arecord so that it could run continuously). Here is the console printout:

root@am62xx-evm:/usr/bin# arecord -Dhw:0,0 -r 48000 -c 2 --period-size=64 -f S16_LE | aplay -Dhw:0,0                                
Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo                                                         
Playing WAVE 'stdin' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo                                                           
[16519.963667] ti-udma 485c0100.dma-controller: chan2 teardown timeout!                                                             
underrun!!! (at least 595.510 ms long)                                                                                              
[16519.997883] davinci-mcasp 2b10000.audio-controller: Transmit buffer underflow                                                    
[16521.019840] ti-udma 485c0100.dma-controller: chan1 teardown timeout!

Andy

0 Suren Porwar 11 months ago in reply to Andy Lenhart

TI__Mastermind 20170 points

Hi Andy,

Allow me a day or two to get back to you on what might be going wrong with McASP.

Best Regards,

Suren

0 Lisandro Bravo 11 months ago in reply to Suren Porwar

Prodigy 91 points

Hi Suren:
FYI, I'm also experiencing what is probably the same issue, but in my case using a proprietary audio streaming protocol with an app that writes directly to ALSA.

0 Lisandro Bravo 11 months ago in reply to Lisandro Bravo

Prodigy 91 points

Can this be somehow related?
https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1222149/am625-operate-codec-show-transmit-buffer-underflow?tisearch=e2e-sitesearch&keymatch=%22Transmit%20buffer%20underflow%22#

0 Suren Porwar 11 months ago in reply to Lisandro Bravo

TI__Mastermind 20170 points

Hi Andy, Bravo,

We are looking at this issue, and a JIRA has been filed to resolve it. I will post updates here as I get them.

Best Regards,

Suren

0 Andy Lenhart 11 months ago in reply to Suren Porwar

Prodigy 30 points

Thank you for the update Suren! I'll look forward to any updates you can provide.

Andy

0 Andy Lenhart 11 months ago in reply to Suren Porwar

Prodigy 30 points

Hi Suren,

I know people are still coming back from the holidays. I was just curious whether there was any update, and if not, perhaps an expected timeframe for such an update? Thank you!

Andy

0 Suren Porwar 11 months ago in reply to Andy Lenhart

TI__Mastermind 20170 points

Hi Andy,

Happy New year!

I appreciate your patience, and apologies for the delay.

Our SW team suggested you try this GStreamer-based pipeline, as they haven't seen any issues running it for 4+ hours so far:

Pasting the Pipeline here:

gst-launch-1.0 alsasrc device="hw:0,0" latency-time=5 ! audio/x-raw, format="(string)S16LE", layout="(string)interleaved", rate="(int)48000", channels="(int)2" ! queue ! alsasink device="hw:0,0" latency-time=5 -v

Let me know how it goes.

Best Regards,

Suren

0 Andy Lenhart 10 months ago in reply to Suren Porwar

Prodigy 30 points

Hi Suren,

Happy new year to you as well!

I have been running the requested test using GStreamer. It is still running and I have not seen the overflow/underflow errors that I had previously. I will continue to run it over the weekend.

However, I'm not sold that this is an apples-to-apples comparison. I still need to familiarize myself with GStreamer, but the latency of the GStreamer loopback test is much greater (> 100 ms) than that of the user space application I wrote (and provided above) or alsaloop (approx. 20 ms). Also, the latency-time argument doesn't seem to be having any effect on the observed latency. Is this the proper parameter, and is this much larger latency with GStreamer expected?

If the ALSA period size that is set up by GStreamer is some larger value, then I would expect more stability than previous attempts. But then that reintroduces one of the questions in my original post - is there a minimum in regards to ALSA period time for the AM62x/McASP/RT Linux to prevent underflows/overflows?

Just to post it, here is the output of the GStreamer test since verbose mode was specified:

root@am62xx-evm:~# amixer sset PCM 90%                                                                                                                 
Simple mixer control 'PCM',0                                                                                                        
  Capabilities: pvolume                                                                                                             
  Playback channels: Front Left - Front Right                                                                                       
  Limits: Playback 0 - 127                                                                                                          
  Mono:                                                                                                                             
  Front Left: Playback 114 [90%] [-6.50dB]                                                                                          
  Front Right: Playback 114 [90%] [-6.50dB]                                                                                         
root@am62xx-evm:~# amixer sset 'Left PGA Mixer Mic3R' on                                                                            
Simple mixer control 'Left PGA Mixer Mic3R',0                                                                                       
  Capabilities: pswitch pswitch-joined                                                                                              
  Playback channels: Mono                                                                                                           
  Mono: Playback [on]                                                                                                               
root@am62xx-evm:~# amixer sset 'Right PGA Mixer Mic3R' on                                                                           
Simple mixer control 'Right PGA Mixer Mic3R',0                                                                                      
  Capabilities: pswitch pswitch-joined                                                                                              
  Playback channels: Mono                                                                                                           
  Mono: Playback [on]                                                                                                               
root@am62xx-evm:~# amixer sset PGA 90%                                                                                              
Simple mixer control 'PGA',0                                                                                                        
  Capabilities: cvolume cswitch                                                                                                     
  Capture channels: Front Left - Front Right                                                                                        
  Limits: Capture 0 - 119                                                                                                           
  Front Left: Capture 107 [90%] [53.50dB] [on]                                                                                      
  Front Right: Capture 107 [90%] [53.50dB] [on]                                                                                     
root@am62xx-evm:~# gst-launch-1.0 alsasrc device="hw:0,0" latency-time=5 ! audio/x-raw, format="(string)S16LE", layout="(string)interleaved", rate="(int)48000", channels="(int)2" ! queue ! alsasink device="hw:0,0" latency-time=5 -v
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstAudioSrcClock
/GstPipeline:pipeline0/GstAlsaSrc:alsasrc0: actual-buffer-time = 200000
/GstPipeline:pipeline0/GstAlsaSrc:alsasrc0: actual-latency-time = 1333
Redistribute latency...
/GstPipeline:pipeline0/GstAlsaSrc:alsasrc0.GstPad:src: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:src: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
/GstPipeline:pipeline0/GstQueue:queue0.GstPad:sink: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:sink: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
/GstPipeline:pipeline0/GstQueue:queue0.GstPad:src: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
Redistribute latency...
/GstPipeline:pipeline0/GstAlsaSink:alsasink0.GstPad:sink: caps = audio/x-raw, format=(string)S16LE, layout=(string)interleaved, rate=(int)48000, channels=(int)2, channel-mask=(bitmask)0x0000000000000003
Redistribute latency...

Andy

0 Lisandro Bravo 10 months ago in reply to Andy Lenhart

Prodigy 91 points

As Andy correctly pointed out, this is not a valid test.

I have another thread pointing to the same issue and I even shared a little program that triggers the issue 100% of the time, plus a bit of further investigation of where this is coming from:

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1307264/sk-am62-davinci-mcasp-2b10000-audio-controller-transmit-buffer-underflow/4985582

0 Suren Porwar 10 months ago in reply to Lisandro Bravo

TI__Mastermind 20170 points

Hi Bravo/Andy,

We also ran the command below using arecord and aplay for 30+ hours and haven't seen the issue yet.

The reason it would stop earlier was the maximum WAV file size in arecord:

arecord --max-file-time: Default is the maximum size supported by the file format: 2 GiB for WAV files.

Also refer to the issue that was reported in the Raspberry Pi forums: raspberrypi.stackexchange.com/.../arecord-aplay-stop-after-a-while

arecord -Dhw:0,0 -r 48000 -c 2 --period-size=64 --buffer-size 9600 -f S16_LE -v -f dat -t raw | aplay -Dhw:0,0
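
As a rough check on the 2 GiB WAV limit mentioned above, the sketch below estimates how long a capture can run before reaching it. It assumes the limit is exactly 2^31 bytes of PCM data and ignores the WAV header; `pcm_seconds_in` is a hypothetical helper name, not an arecord or ALSA function:

```c
#include <stdint.h>

/* Hypothetical helper: whole seconds of interleaved PCM audio that fit
 * in limit_bytes (header overhead ignored, which is an assumption). */
static uint64_t pcm_seconds_in(uint64_t limit_bytes, uint32_t rate_hz,
                               uint32_t channels, uint32_t bits_per_sample)
{
    uint64_t bytes_per_second =
        (uint64_t)rate_hz * channels * (bits_per_sample / 8u);
    return limit_bytes / bytes_per_second;
}
```

For 48 kHz stereo S16_LE, `pcm_seconds_in(2147483648ULL, 48000, 2, 16)` is 11184 seconds, a little over 3 hours, which is why the raw (`-t raw`) output type is used for long runs.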

Hope this helps.

Best Regards,
Suren

0 Andy Lenhart 10 months ago in reply to Suren Porwar

Prodigy 30 points

Hi Suren,

If I try to run the command exactly as written, it fails in the manner below:

root@am62xx-evm:~# arecord -Dhw:0,0 -r 48000 -c 2 --period-size=64 --buffer-size 9600 -f S16_LE -v -f dat -t raw | aplay -Dhw:0,0
data 'stdin' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
card 0 'AM62x-SKEVM' device 0 subdevice 0

: 5404319552844595200

'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono
Sample format non available

ti-udma 485c0100.dma-controller: chan2 teardown timeout!

If I modify it so that the format is added to aplay and remove the redundant settings from arecord (since -f dat is a shortcut specifying -c, -r and -f), the equivalent command works:

arecord -Dhw:0,0 --period-size=64 --buffer-size 9600 -v -f dat -t raw | aplay -Dhw:0,0 -f dat

It should be noted that, even though the ALSA period is only 1.3 ms, the latency on this demo is extraordinarily large, even worse than the GStreamer test above. Is there a reason that arecord/aplay are used in conjunction as opposed to just utilizing alsaloop?
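
For reference, the period and buffer durations implied by that command can be worked out from the frame counts. A minimal sketch in C; `frames_to_us` is a hypothetical helper name, and whether the buffer size is what dominates the observed latency is an assumption, not something measured here:

```c
#include <stdint.h>

/* Hypothetical helper: duration in microseconds of a number of frames
 * at a given sample rate (truncated to whole microseconds). */
static uint64_t frames_to_us(uint64_t frames, uint32_t rate_hz)
{
    return (frames * 1000000u) / rate_hz;
}
```

At 48 kHz, `frames_to_us(64, 48000)` is 1333 us for the period and `frames_to_us(9600, 48000)` is 200000 us for the buffer. These match the `actual-latency-time = 1333` and `actual-buffer-time = 200000` values reported in the GStreamer output earlier in this thread.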

Additionally, you mentioned above that a JIRA was opened for this issue. Could you give us a status report for that bug? It seems that these latest tests we’re running are moving away from the issue that Lisandro and I are experiencing.

Thank you.

Andy

0 Andy Lenhart 10 months ago in reply to Andy Lenhart

Prodigy 30 points

Hi Suren,

This issue is gaining visibility in our organization and we are under increased pressure for a resolution. We've asked our FAE to help facilitate a call for alignment on this issue.

Andy

0 Jai Luthra 10 months ago in reply to Andy Lenhart

TI__Prodigy 400 points

underrun_crash_fix.patch

Hi Andy, Lisandro,

Please find attached the patch to fix the crashes seen with underruns. This is a temporary fix that worked on my system to at least make the errors recoverable - more work is needed to understand if this is a correct/upstreamable fix.

Thanks,
Jai

0 Lisandro Bravo 10 months ago in reply to Jai Luthra

Prodigy 91 points

Jai:

Thanks a lot for taking the time to look into this!

This is definitely a step in the right direction, since at least the PCM doesn't halt on xrun now, but as you say there is more work to be done.

What I can see is that the DMA teardown issue is still present when calling pcm_drop & pcm_prepare, and that resuming audio playback afterwards introduces a long delay. I will take proper measurements on Monday, but it seems to be around 2 seconds.

I have adapted this wav player program to force pcm_drop & pcm_prepare from another example I provided to you guys earlier (this version is more flexible with different wave formats), to make it easy to reproduce this issue:

// A simple C example to play a mono or stereo, 16-bit 44KHz
// WAVE file using ALSA. This goes directly to the first
// audio card (ie, its first set of audio out jacks). It
// uses the snd_pcm_writei() mode of outputting waveform data,
// blocking.
//
// Compile as so to create "alsawave":
// gcc -o alsawave alsawave.c -lasound
//
// Run it from a terminal, specifying the name of a WAVE file to play:
// ./alsawave MyWaveFile.wav

#include <stdio.h>
#include <stdlib.h>
#include <string.h>	// strncpy()
#include <fcntl.h>	// open(), O_RDONLY
#include <unistd.h>

// Include the ALSA .H file that defines ALSA functions/data
#include <alsa/asoundlib.h>

#pragma pack (1)
/////////////////////// WAVE File Stuff /////////////////////
// An IFF file header looks like this
typedef struct _FILE_head
{
	unsigned char	ID[4];	// could be {'R', 'I', 'F', 'F'} or {'F', 'O', 'R', 'M'}
	unsigned int	Length;	// Length of subsequent file (including remainder of header). This is in
									// Intel reverse byte order if RIFF, Motorola format if FORM.
	unsigned char	Type[4];	// {'W', 'A', 'V', 'E'} or {'A', 'I', 'F', 'F'}
} FILE_head;


// An IFF chunk header looks like this
typedef struct _CHUNK_head
{
	unsigned char ID[4];	// 4 ascii chars that is the chunk ID
	unsigned int	Length;	// Length of subsequent data within this chunk. This is in Intel reverse byte
							// order if RIFF, Motorola format if FORM. Note: this doesn't include any
							// extra byte needed to pad the chunk out to an even size.
} CHUNK_head;

// WAVE fmt chunk
typedef struct _FORMAT {
	short				wFormatTag;
	unsigned short	wChannels;
	unsigned int	dwSamplesPerSec;
	unsigned int	dwAvgBytesPerSec;
	unsigned short	wBlockAlign;
	unsigned short	wBitsPerSample;
  // Note: there may be additional fields here, depending upon wFormatTag
} FORMAT;
#pragma pack()

// Size of the audio card hardware buffer. Here we want it
// set to 1024 16-bit sample points. This is relatively
// small in order to minimize latency. If you have trouble
// with underruns, you may need to increase this, and PERIODSIZE
// (trading off lower latency for more stability)
#define BUFFERSIZE	(2*1024)

// How many sample points the ALSA card plays before it calls
// our callback to fill some more of the audio card's hardware
// buffer. Here we want ALSA to call our callback after every
// 64 sample points have been played
#define PERIODSIZE	(2*64)

// Handle to ALSA (audio card's) playback port
snd_pcm_t				*PlaybackHandle;

// Handle to our callback thread
snd_async_handler_t	*CallbackHandle;

// Points to loaded WAVE file's data
unsigned char			*WavePtr;

// Size (in frames) of loaded WAVE file's data
snd_pcm_uframes_t		WaveSize;

// Sample rate
unsigned short			WaveRate;

// Bit resolution
unsigned char			WaveBits;

// Number of channels in the wave file
unsigned char			WaveChannels;

// The name of the ALSA port we output to. In this case, we're
// directly writing to hardware card 0,0 (ie, first set of audio
// outputs on the first audio card)
char SoundCardPortName[100] = "hw:0,0"; 

// For WAVE file loading
static const unsigned char Riff[4]	= { 'R', 'I', 'F', 'F' };
static const unsigned char Wave[4] = { 'W', 'A', 'V', 'E' };
static const unsigned char Fmt[4] = { 'f', 'm', 't', ' ' };
static const unsigned char Data[4] = { 'd', 'a', 't', 'a' };

/********************** compareID() *********************
 * Compares the passed ID str (ie, a ptr to 4 Ascii
 * bytes) with the ID at the passed ptr. Returns TRUE if
 * a match, FALSE if not.
 */

static unsigned char compareID(const unsigned char * id, unsigned char * ptr)
{
	register unsigned char i = 4;

	while (i--)
	{
		if ( *(id)++ != *(ptr)++ ) return(0);
	}
	return(1);
}

/********************** waveLoad() *********************
 * Loads a WAVE file.
 *
 * fn =			Filename to load.
 *
 * RETURNS: 0 if success, non-zero if not.
 *
 * NOTE: Sets the global "WavePtr" to an allocated buffer
 * containing the wave data, and "WaveSize" to the size
 * in sample points.
 */

static unsigned char waveLoad(const char *fn)
{
	const char				*message;
	FILE_head				head;
	register int			inHandle;

	if ((inHandle = open(fn, O_RDONLY)) == -1)
		message = "didn't open";

	// Read in IFF File header
	else
	{
		if (read(inHandle, &head, sizeof(FILE_head)) == sizeof(FILE_head))
		{
			// Is it a RIFF and WAVE?
			if (!compareID(&Riff[0], &head.ID[0]) || !compareID(&Wave[0], &head.Type[0]))
			{
				message = "is not a WAVE file";
				goto bad;
			}

			// Read in next chunk header
			while (read(inHandle, &head, sizeof(CHUNK_head)) == sizeof(CHUNK_head))
			{
				// ============================ Is it a fmt chunk? ===============================
				if (compareID(&Fmt[0], &head.ID[0]))
				{
					FORMAT	format;

					// Read in the remainder of chunk
					if (read(inHandle, &format.wFormatTag, sizeof(FORMAT)) != sizeof(FORMAT)) break;

					// Can't handle compressed WAVE files
					if (format.wFormatTag != 1)
					{
						message = "compressed WAVE not supported";
						goto bad;
					}

					WaveBits = (unsigned char)format.wBitsPerSample;
					WaveRate = (unsigned short)format.dwSamplesPerSec;
					WaveChannels = format.wChannels;
				}

				// ============================ Is it a data chunk? ===============================
				else if (compareID(&Data[0], &head.ID[0]))
				{
					// Size of wave data is head.Length. Allocate a buffer and read in the wave data
					if (!(WavePtr = (unsigned char *)malloc(head.Length)))
					{
						message = "won't fit in RAM";
						goto bad;
					}

					if (read(inHandle, WavePtr, head.Length) != head.Length)
					{
						free(WavePtr);
						break;
					}

					// Store size (in frames)
					WaveSize = (head.Length * 8) / ((unsigned int)WaveBits * (unsigned int)WaveChannels);

					close(inHandle);
					return(0);
				}

				// ============================ Skip this chunk ===============================
				else
				{
					if (head.Length & 1) ++head.Length;  // If odd, round it up to account for pad byte
					lseek(inHandle, head.Length, SEEK_CUR);
				}
			}
		}

		message = "is a bad WAVE file";
bad:	close(inHandle);
	}

	printf("%s %s\n", fn, message);
	return(1);
}


int reset(){
	int err = 0;
	printf("Reset\n");
    
    printf("--snd_pcm_drop\n");
	if ((err = snd_pcm_drop(PlaybackHandle))) {
		fprintf(stderr, "cannot drop pending frames (%s)\n",
				snd_strerror(err));
		goto done;
	}

    printf("--snd_pcm_prepare\n");
	if ((err = snd_pcm_prepare(PlaybackHandle))) {
		fprintf(stderr, "cannot prepare playback audio interface for use (%s)\n",
				snd_strerror(err));
		goto done;
	}
    printf("End reset\n");

done:
	return err;
}

/********************** play_audio() **********************
 * Plays the loaded waveform.
 *
 * NOTE: ALSA sound card's handle must be in the global
 * "PlaybackHandle". A pointer to the wave data must be in
 * the global "WavePtr", and its size of "WaveSize".
 */
static void play_audio(void)
{
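    // Signed type: snd_pcm_writei() returns a negative error code on failure,
    // so an unsigned variable would never satisfy the (frames < 0) checks below
    snd_pcm_sframes_t frames;
    snd_pcm_uframes_t frameSize = 4096; // Set the frame size to 4096
    snd_pcm_uframes_t remainingFrames = WaveSize; // Total number of frames to be played
    unsigned char *dataPtr = WavePtr; // Pointer to the current position in wave data
	int period_reset = 0;

	// Current PCM state (queried here but not used below)
	snd_pcm_state_t pcm_state = snd_pcm_state(PlaybackHandle);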

    while (remainingFrames > 0)
    {
        snd_pcm_uframes_t framesToWrite = remainingFrames < frameSize ? remainingFrames : frameSize;

        // Write the frames
        frames = snd_pcm_writei(PlaybackHandle, dataPtr, framesToWrite);

        // If an error, try to recover from it
        if (frames < 0)
            frames = snd_pcm_recover(PlaybackHandle, frames, 0);
        if (frames < 0)
        {
            printf("Error playing wave: %s\n", snd_strerror(frames));
            break;
        }

        // Update pointers and counters
        dataPtr += frames * (WaveBits / 8) * WaveChannels;
        remainingFrames -= frames;
		
		period_reset++;
		if(period_reset>=5){
            period_reset=0;
			reset();   
        }

    }

    // Wait for playback to completely finish
    if (remainingFrames == 0)
        snd_pcm_drain(PlaybackHandle);
}

/*********************** free_wave_data() *********************
 * Frees any wave data we loaded.
 *
 * NOTE: A pointer to the wave data must be in the global
 * "WavePtr".
 */

static void free_wave_data(void)
{
	if (WavePtr) free(WavePtr);
	WavePtr = 0;
}

int main(int argc, char **argv)
{
	// No wave data loaded yet
	WavePtr = 0;

	if (argc < 2)
    {
        printf("Usage: %s <WAVE file> [SoundCardPortName]\n", argv[0]);
        printf("Example: %s sound.wav hw:1,0\n", argv[0]);
        return 1;
    }

    // Check if sound card is specified
    if (argc > 2)
    {
        // Copy the specified sound card name
        strncpy(SoundCardPortName, argv[2], sizeof(SoundCardPortName));
        SoundCardPortName[sizeof(SoundCardPortName) - 1] = '\0'; // Ensure null-termination
    }

	// Load the wave file
	if (!waveLoad(argv[1]))
	{
		register int		err;

		// Open audio card we wish to use for playback
		if ((err = snd_pcm_open(&PlaybackHandle, &SoundCardPortName[0], SND_PCM_STREAM_PLAYBACK, 0)) < 0)
			printf("Can't open audio %s: %s\n", &SoundCardPortName[0], snd_strerror(err));
		else
		{
			switch (WaveBits)
			{
				case 8:
					err = SND_PCM_FORMAT_U8;
					break;
			
				case 16:
					err = SND_PCM_FORMAT_S16;
					break;
			
				case 24:
					err = SND_PCM_FORMAT_S24;
					break;
			
				case 32:
					err = SND_PCM_FORMAT_S32;
					break;
			}
		
			// Set the audio card's hardware parameters (sample rate, bit resolution, etc)
			if ((err = snd_pcm_set_params(PlaybackHandle, err, SND_PCM_ACCESS_RW_INTERLEAVED, WaveChannels, WaveRate, 1, 500000)) < 0)
				printf("Can't set sound parameters: %s\n", snd_strerror(err));

			// Play the waveform
			else
				play_audio();

			// Close sound card
			snd_pcm_close(PlaybackHandle);
		}
	}

	// Free the WAVE data
	free_wave_data();

	return(0);
}

This is the output on an am62xx-lp-evm with the underrun_crash_fix.patch applied:

./alsawave piano2.wav 
Reset
--snd_pcm_drop
--snd_pcm_prepare
[ 2476.733559] ti-udma 485c0100.dma-controller: chan1 teardown timeout!
End reset
Reset
--snd_pcm_drop
--snd_pcm_prepare
[ 2479.005665] ti-udma 485c0100.dma-controller: chan1 teardown timeout!
End reset
Reset
--snd_pcm_drop
--snd_pcm_prepare
[ 2481.277681] ti-udma 485c0100.dma-controller: chan1 teardown timeout!

I have also tested the same code on an iMX8. There are no DMA issues on that platform, and calling the reset() function introduces barely noticeable skips in the audio output, as expected.

It was also mentioned that Raspberry Pi had similar issues. I tested a Raspberry Pi 4 with the latest image, and the behaviour is the same as with NXP.

We will require at least the same behaviour as the iMX8 in order to validate this platform as suitable for our project (several thousand units per year).

+1 Suren Porwar 9 months ago in reply to Lisandro Bravo

TI__Mastermind 20170 points

Hi Andy,

Here is the complete patch that you need to apply to the kernel released with the 9.1 SDK.

alsa-latency-changes.txt

diff --git a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
index 6f102b430..316f87de4 100644
--- a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
+++ b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
@@ -555,8 +555,8 @@ &mcasp1 {
 	       0 0 0 0
 	       0 0 0 0
 	>;
-	tx-num-evt = <32>;
-	rx-num-evt = <32>;
+	tx-num-evt = <0>;
+	rx-num-evt = <0>;
 };
 
 &ospi0 {
diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
index 9f0669505..82d20d93b 100644
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -1843,7 +1843,8 @@ static int udma_alloc_rx_resources(struct udma_chan *uc)
 
 #define TISCI_BCDMA_TCHAN_VALID_PARAMS (			\
 	TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID |	\
-	TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID)
+	TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_SUPR_TDPKT_VALID |      \
+	TI_SCI_MSG_VALUE_RM_UDMAP_CH_TX_FDEPTH_VALID)
 
 #define TISCI_BCDMA_RCHAN_VALID_PARAMS (			\
 	TI_SCI_MSG_VALUE_RM_UDMAP_CH_PAUSE_ON_ERR_VALID)
@@ -2017,6 +2018,7 @@ static int bcdma_tisci_tx_channel_config(struct udma_chan *uc)
 	req_tx.nav_id = tisci_rm->tisci_dev_id;
 	req_tx.index = tchan->id;
 	req_tx.tx_supr_tdpkt = uc->config.notdpkt;
+	req_tx.fdepth = 64;
 	if (ud->match_data->flags & UDMA_FLAG_TDTYPE) {
 		/* wait for peer to complete the teardown for PDMAs */
 		req_tx.valid_params |=
@@ -3489,6 +3491,10 @@ udma_prep_dma_cyclic_tr(struct udma_chan *uc, dma_addr_t buf_addr,
 	u16 tr0_cnt0, tr0_cnt1, tr1_cnt0;
 	unsigned int i;
 	int num_tr;
+	u32 period_csf = 0;
+
+	if (uc->config.ep_type == PSIL_EP_PDMA_XY && dir == DMA_MEM_TO_DEV)
+		period_csf = CPPI5_TR_CSF_EOP;
 
 	num_tr = udma_get_tr_counters(period_len, __ffs(buf_addr), &tr0_cnt0,
 				      &tr0_cnt1, &tr1_cnt0);
@@ -3538,8 +3544,10 @@ udma_prep_dma_cyclic_tr(struct udma_chan *uc, dma_addr_t buf_addr,
 		}
 
 		if (!(flags & DMA_PREP_INTERRUPT))
-			cppi5_tr_csf_set(&tr_req[tr_idx].flags,
-					 CPPI5_TR_CSF_SUPR_EVT);
+			period_csf |= CPPI5_TR_CSF_SUPR_EVT;
+
+		if (period_csf)
+			cppi5_tr_csf_set(&tr_req[tr_idx].flags, period_csf);
 
 		period_addr += period_len;
 	}
diff --git a/include/sound/dmaengine_pcm.h b/include/sound/dmaengine_pcm.h
index 2df54cf02..62d38fab5 100644
--- a/include/sound/dmaengine_pcm.h
+++ b/include/sound/dmaengine_pcm.h
@@ -36,6 +36,7 @@ snd_pcm_uframes_t snd_dmaengine_pcm_pointer_no_residue(struct snd_pcm_substream
 int snd_dmaengine_pcm_open(struct snd_pcm_substream *substream,
 	struct dma_chan *chan);
 int snd_dmaengine_pcm_close(struct snd_pcm_substream *substream);
+int snd_dmaengine_pcm_prepare(struct snd_pcm_substream *substream);
 
 int snd_dmaengine_pcm_open_request_chan(struct snd_pcm_substream *substream,
 	dma_filter_fn filter_fn, void *filter_data);
diff --git a/sound/core/pcm_dmaengine.c b/sound/core/pcm_dmaengine.c
index 494ec0c20..0304bf5bd 100644
--- a/sound/core/pcm_dmaengine.c
+++ b/sound/core/pcm_dmaengine.c
@@ -349,6 +349,16 @@ int snd_dmaengine_pcm_open_request_chan(struct snd_pcm_substream *substream,
 }
 EXPORT_SYMBOL_GPL(snd_dmaengine_pcm_open_request_chan);
 
+int snd_dmaengine_pcm_prepare(struct snd_pcm_substream *substream)
+{
+	struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream);
+
+	dmaengine_synchronize(prtd->dma_chan);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(snd_dmaengine_pcm_prepare);
+
 /**
  * snd_dmaengine_pcm_close - Close a dmaengine based PCM substream
  * @substream: PCM substream
diff --git a/sound/soc/soc-generic-dmaengine-pcm.c b/sound/soc/soc-generic-dmaengine-pcm.c
index 3b99f619e..a43beae62 100644
--- a/sound/soc/soc-generic-dmaengine-pcm.c
+++ b/sound/soc/soc-generic-dmaengine-pcm.c
@@ -318,6 +318,12 @@ static int dmaengine_copy_user(struct snd_soc_component *component,
 	return 0;
 }
 
+int dmaengine_pcm_prepare(struct snd_soc_component *component,
+			  struct snd_pcm_substream *substream)
+{
+	return snd_dmaengine_pcm_prepare(substream);
+}
+
 static const struct snd_soc_component_driver dmaengine_pcm_component = {
 	.name		= SND_DMAENGINE_PCM_DRV_NAME,
 	.probe_order	= SND_SOC_COMP_ORDER_LATE,
@@ -327,6 +333,7 @@ static const struct snd_soc_component_driver dmaengine_pcm_component = {
 	.trigger	= dmaengine_pcm_trigger,
 	.pointer	= dmaengine_pcm_pointer,
 	.pcm_construct	= dmaengine_pcm_new,
+	.prepare	= dmaengine_pcm_prepare,
 };
 
 static const struct snd_soc_component_driver dmaengine_pcm_component_process = {
@@ -339,6 +346,7 @@ static const struct snd_soc_component_driver dmaengine_pcm_component_process = {
 	.pointer	= dmaengine_pcm_pointer,
 	.copy_user	= dmaengine_copy_user,
 	.pcm_construct	= dmaengine_pcm_new,
+	.prepare	= dmaengine_pcm_prepare,
 };
 
 static const char * const dmaengine_pcm_dma_channel_names[] = {
diff --git a/sound/soc/ti/davinci-mcasp.c b/sound/soc/ti/davinci-mcasp.c
index ca5d1bb6a..73cb9e4bd 100644
--- a/sound/soc/ti/davinci-mcasp.c
+++ b/sound/soc/ti/davinci-mcasp.c
@@ -1475,7 +1475,7 @@ static int davinci_mcasp_hw_rule_min_periodsize(
 	struct snd_interval frames;
 
 	snd_interval_any(&frames);
-	frames.min = 64;
+	frames.min = 16;
 	frames.integer = 1;
 
 	return snd_interval_refine(period_size, &frames);

The changes are basically:

- Disabling the McASP FIFO

- A fix for the xruns that were happening earlier (instability)

- Reducing the minimum period size from 64 frames to 32 or 16 (the current patch uses 16)

- A fix for the chan teardown timeout error that used to happen at the end of playback

- Reducing the FIFO size for BCDMA TX (from 196 to 64)
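
To put the period sizes above into time units, here is a minimal sketch that converts a period size in frames into microseconds. `period_time_us` is a hypothetical helper name used for illustration only; it is not part of ALSA or of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of one period in microseconds, rounded down.
 * frames:  period size in frames
 * rate_hz: sample rate in Hz (must be non-zero) */
uint32_t period_time_us(uint32_t frames, uint32_t rate_hz)
{
    assert(rate_hz > 0);
    /* 64-bit intermediate avoids overflow of frames * 1000000 */
    return (uint32_t)(((uint64_t)frames * 1000000u) / rate_hz);
}
```

At 48 kHz, periods of 64, 32 and 16 frames come to roughly 1333, 666 and 333 microseconds, which is approximately how often a period elapses and the application must be serviced.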

In order to test with a period size of 16 frames, the alsaloop application had to be modified:

diff --git a/alsaloop/alsaloop.c b/alsaloop/alsaloop.c
index f5f2e37..3e4056f 100644
--- a/alsaloop/alsaloop.c
+++ b/alsaloop/alsaloop.c
@@ -481,7 +481,7 @@ static int parse_config(int argc, char *argv[], snd_output_t *output,
                        break;
                case 'E':
                        err = atoi(optarg);
-                       arg_period_size = err >= 32 && err < 200000 ? err : 0;
+                       arg_period_size = err >= 16 && err < 200000 ? err : 0;
                        break;
                case 's':
                        err = atoi(optarg);

Attached is the alsaloop utility with the above change:

alsaloop

With all the above changes, we are able to see these latency numbers:

Let me know if this helps.

Best Regards,

Suren

0 Suren Porwar 9 months ago in reply to Suren Porwar

TI__Mastermind 20170 points

The commands that we use are as follows:

alsaloop -v -t 1600 --period=32 --buffer=2048 -S 0 -f S32_LE (the default sample rate is 48 kHz)

alsaloop -v -t 3200 --period=16 --buffer=8192 -S 0 -f S16_LE -r 16000

Hope these help.

Best Regards,

Suren

0 Andy Lenhart 9 months ago in reply to Suren Porwar

Prodigy 30 points

Hi Suren,

Thank you for all of the assistance on this issue. The patch from Feb. 16 sufficiently resolves my issue, as I am able to repeat your first two tests on our custom hardware (using the two commands below):

alsaloop -v -t 1600 --period=32 --buffer=2048 -S 0 -f S32_LE
alsaloop -v -t 4200 --period=32 --buffer=8192 -S 0 -f S16_LE -r 16000

The 48k test (test 1) ran for 40 hours without any errors. The 16k test (test 2) ran for 24 hours without any errors.

I have not yet tried tests 3 and 4 with a --period=16 argument, but trust that those will perform similarly.

There was one pitfall I want to make anyone following this issue aware of. With SDK 9.1, the default kernel configuration for Processor SDK RT Linux must be built with the following command:

make ARCH=arm64 CROSS_COMPILE=aarch64-oe-linux- defconfig ti_arm64_prune.config ti_rt.config

The ti_rt.config fragment was not mentioned in the Processor SDK documentation, and without it none of the RT Linux functionality is turned on.

I will ultimately try to dial in the best period size and buffer size for our products, but we have a baseline that we know to be functional.

Again Suren, thank you and everyone on the TI team for the help in resolving this matter!

Best Regards,

Andy

0 Mukul Bhatnagar 9 months ago in reply to Andy Lenhart

TI__Guru* 81755 points

Thank you Andy

Additionally, here are some of the questions you asked offline, with the responses from our experts:

> The buffer sizes in frames of these latest tests are large. Do they need to be more than the 4x and 8x period size that we typically see with low-latency ALSA applications?

The buffer size is the size of the ring buffer used. Choosing an adequate size matters because XRUNs are directly related to it.

For example:

1) During playback, if the application does not pass data into the buffer quickly enough compared to the rate at which the hardware is reading (i.e. the sampling rate), the ring buffer becomes empty, causing an under-run.

2) During capture, if the application does not read data from the buffer quickly enough compared to the rate at which the hardware is writing, the ring buffer becomes full and the hardware overwrites the data, causing an over-run.

Both situations can occur depending on system factors such as CPU usage, scheduling latency and task priorities. Because of these factors, the application may not get enough CPU time to complete its transfers on time. An adequately sized ring buffer provides an extra layer of buffering to absorb such delays. The trade-off is that a larger buffer uses more memory.
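
The under-run case in 1) can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: time advances in whole periods, the application refills the buffer completely whenever it runs, and a single stall stands in for scheduling latency. `simulate_underruns` and its parameters are hypothetical names, not ALSA API.

```c
#include <assert.h>

/* Simulate a playback ring buffer in units of whole periods.
 * Each tick the hardware consumes one period. The application tops the
 * buffer up on every tick except during one stall of `stall_ticks` ticks.
 * Returns the number of ticks in which the hardware found the buffer
 * empty (under-runs). */
unsigned simulate_underruns(unsigned buffer_periods, unsigned stall_ticks,
                            unsigned total_ticks)
{
    assert(buffer_periods > 0);

    const unsigned stall_start = 10;  /* arbitrary tick where the stall begins */
    unsigned fill = buffer_periods;   /* start with a full buffer */
    unsigned underruns = 0;

    for (unsigned t = 0; t < total_ticks; t++) {
        int stalled = (t >= stall_start && t < stall_start + stall_ticks);

        if (!stalled)
            fill = buffer_periods;    /* application refills the buffer */

        if (fill > 0)
            fill--;                   /* hardware plays one period */
        else
            underruns++;              /* nothing queued: under-run */
    }
    return underruns;
}
```

Under these assumptions a 2-period buffer rides out a 1-period stall but under-runs twice during a 3-period stall, whereas a 4-period buffer absorbs the same 3-period stall.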

As a rule of thumb, it is generally recommended to start with buffer size = 2x period size and then keep increasing it incrementally (in multiples of the period size) until XRUNs stop or become rare. Note that the buffer size is also sometimes adjusted by applications, as explained below.
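
As a rough starting point for that tuning, the sketch below estimates a buffer size from the worst stall you expect the application to suffer. It is an assumption-laden estimate, not a guarantee: it assumes the buffer is full when the stall begins and keeps one extra period of headroom for the period currently being played. `buffer_frames_for_stall` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest buffer, in frames, that is a whole number of periods (at least
 * 2) and covers a stall of stall_us microseconds plus one period of
 * headroom. */
uint32_t buffer_frames_for_stall(uint32_t period_frames, uint32_t rate_hz,
                                 uint32_t stall_us)
{
    assert(period_frames > 0 && rate_hz > 0);

    /* Frames the hardware consumes during the stall, rounded up */
    uint64_t stall_frames =
        ((uint64_t)stall_us * rate_hz + 999999u) / 1000000u;

    /* Periods needed to cover those frames, rounded up, plus headroom */
    uint64_t periods = (stall_frames + period_frames - 1) / period_frames + 1;

    if (periods < 2)
        periods = 2;                  /* rule-of-thumb minimum: 2 periods */

    return (uint32_t)(periods * period_frames);
}
```

For a 32-frame period at 48 kHz and a 1 ms stall this gives 96 frames (3 periods). Measured XRUN behaviour on the target should still decide the final value.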

> Do you recommend the period size in frames be a power of 2 (16, 32, 64, etc.) for stability, or can it be any value? For example, 2 ms at 48 kHz would be 96 frames per period. Is that an acceptable value?

That is usually the case. If you don't, the Linux kernel ALSA framework will adjust the values as per the handling in the ALSA core [1]. For example, it ensures that the buffer size is an integer multiple of the period size, as seen here [2], and adjusts it if it is not. The ALSA core makes sure that period size, period time, rate, buffer size and buffer time are consistent with each other, and adjusts them as per the rules [2] if they are not.
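
To make the integer-multiple constraint concrete, here is a minimal sketch of the arithmetic. It is not the kernel's refinement code, which may settle on a different nearby value. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Frames in a period of period_us microseconds at rate_hz, rounded down. */
uint32_t frames_for_period_us(uint32_t period_us, uint32_t rate_hz)
{
    return (uint32_t)(((uint64_t)period_us * rate_hz) / 1000000u);
}

/* Round a requested buffer size up to a whole number of periods. */
uint32_t round_up_to_periods(uint32_t buffer_frames, uint32_t period_frames)
{
    assert(period_frames > 0);
    uint32_t periods = (buffer_frames + period_frames - 1) / period_frames;
    return periods * period_frames;
}
```

With these helpers, 2 ms at 48 kHz comes to 96 frames, and a requested buffer of 200 frames would round up to 288 frames (3 periods).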

The second level of adjustment usually comes from the application. For example, alsaloop appears to set the buffer size to 8 * period-size * (requested buffer-size / requested period-size), possibly here [3].

So, for example, for alsaloop -v --period=64 --buffer=128 -S 0 -f S16_LE -r 48000

it sets the buffer size to 64*8*(128/64) = 1024.
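
The same arithmetic as a small sketch. The formula is taken from the description above and has not been checked line by line against pcmjob.c, so treat it as an approximation of what alsaloop does. `alsaloop_buffer_frames` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Buffer size alsaloop is described as choosing:
 * 8 * period * (requested buffer / requested period), integer division. */
uint32_t alsaloop_buffer_frames(uint32_t req_period, uint32_t req_buffer)
{
    assert(req_period > 0);
    return 8u * req_period * (req_buffer / req_period);
}
```

For --period=64 --buffer=128 this returns 1024, matching the value observed above.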

[1] : https://elixir.bootlin.com/linux/latest/source/sound/core/pcm_native.c#L2557

[2] : https://elixir.bootlin.com/linux/latest/source/sound/core/pcm_native.c

[3] : https://github.com/alsa-project/alsa-utils/blob/master/alsaloop/pcmjob.c#L164
