Intermittent McASP failure Issue

Steven Kenny

Other Parts Discussed in Thread: TMS320DM647

Hi Anyone,

I've been struggling with an intermittent issue and am hoping someone might have some suggestions or has seen something like this before.

The setup:

DSPBIOS: v5.33.03
DSPLINK: v1.64
DM648 BIOSPSP: v1.10.03
EDMA3_LLD: v1.05.00
XDC: v3.10.03
CG6X: v6.1.08

GPP is an Atom running Linux with PCI to a DM647 (not DM648)

A custom FPGA is used as the audio source/sink via 4 (2-channel TDM) McASP serialisers.
The FPGA is the McASP master of all serialisers, using the same CLK and SYNC lines,
for a total of 8 In, 8 Out audio channels @ 96kHz.

The DSP Software:

Based on the examples from the BIOSPSP; the only major changes are to the psp_audio file, as we only use the McASP and don't need codec control.
Also the McASP is configured to use 4 serialisers in and 4 serialisers out.

The rest of the code is "above" the DSPBIOS layer.

Pseudo code/outline to follow.

During main()

dsp, edma3 and dsplink initialised.

McASP buffers allocated from IRAM (L2 heap) each 64(samples) * 8(Channels) * 4(UInt32).
I double buffer the In and Out stream.
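
As a sanity check on the sizes above, the arithmetic can be written out as a small host-side sketch. The function names are illustrative only and are not part of the BIOSPSP; the figures (64 samples, 8 channels, 32-bit words, two buffers per direction) are the ones quoted in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes in one McASP buffer: each sample occupies one 32-bit word. */
static size_t mcasp_buffer_bytes(size_t samples_per_channel, size_t channels)
{
    assert(samples_per_channel > 0 && channels > 0);
    return samples_per_channel * channels * sizeof(uint32_t);
}

/* Bytes of L2 used by both streams: 2 buffers (double buffering)
 * for each of 2 directions (input and output). */
static size_t mcasp_total_bytes(size_t samples_per_channel, size_t channels)
{
    return mcasp_buffer_bytes(samples_per_channel, channels) * 2u * 2u;
}
```

With 64 samples and 8 channels this gives 2048 bytes per buffer and 8192 bytes of L2 for all four buffers.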

I create two high priority SWI's, one for McASP Input buffer processing and another SWI for McASP Output buffer processing.

The McASP driver is SIO_create()'d twice, once for SIO_OUTPUT and again for SIO_INPUT, using the SIO_ISSUERECLAIM model.
Both use callbacks that SWI_post() the appropriate SWI on completion of a buffer.

After main returns

A NOTIFY() signal from the GPP is used to start the audio processing.
I SIO_issue() all the buffers to their required drivers.

SWI Pseudo code

Both INPUT and OUTPUT SWI's are very similar

SIO_reclaim() a buffer from associated driver
if (encoder/decoder enabled)
{
CPU copy audio data into/from DDR2 audio channel buffers.
signal data to be processed.
}
SIO_issue() the buffer back to the driver.

Lower priority SWI's and TSK's handle encoding/decoding the McASP audio data(PCM) and sending/receiving this data via DSPLink (MSGQ).

The issue:

At an intermittent point during the infinite issue/reclaim SWI cycle, the input McASP driver's callback gets called immediately after an SIO_issue(), indicating that the buffer is complete.

The SWI's SIO_reclaim() does NOT return an error, so the buffer is processed as valid and then SIO_issue()'d again, at which point it is immediately returned via the callback... repeat...

The output McASP driver is still ticking along as expected.

During my investigation I changed SIO_reclaim() to SIO_reclaimx() and found the pfstatus parameter was being set to -1 when this issue occurs.

With this status information I can now end the loop.
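
A minimal sketch of that loop-breaking check, written as host-side C so it can be tried without the DSP. The names stream_guard_t and guard_should_reissue are hypothetical, and treating any negative status as a failure is an assumption based on the -1 seen above. In the real SWI the status value would be whatever SIO_reclaimx() wrote through pfstatus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-stream bookkeeping kept by the SWI. */
typedef struct {
    int  error_count;  /* failed reclaims seen so far          */
    bool halted;       /* true once re-issuing has been stopped */
} stream_guard_t;

/* Decide whether a reclaimed buffer may be processed and re-issued.
 * status is the value reported for the reclaimed buffer; a negative
 * value is assumed to mean the driver is in an error state. */
static bool guard_should_reissue(stream_guard_t *g, int status)
{
    assert(g != NULL);

    if (g->halted) {
        return false;          /* stay stopped until explicitly reset */
    }
    if (status < 0) {
        g->error_count++;
        g->halted = true;      /* break the issue/callback/reclaim loop */
        return false;
    }
    return true;
}
```

The guard latches: once one bad status has been seen, later buffers are not re-issued even if their status looks fine, which is what stops the immediate-callback loop.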
What I can't figure out is why pfstatus starts always returning -1.

Obviously the driver has entered an error state but I can't figure out what has caused it.

There are a couple of scenarios that can help produce this issue.

If I change the order of the SIO_create() calls, so that I create the SIO_INPUT McASP driver before the SIO_OUTPUT driver, then this issue will usually (45%) happen within 1 minute of start up, with no encoder/decoders enabled.
I originally put this down to an initialisation issue, but it now seems related.
When using a third party encoder library, which requires IRAM buffers, this issue will usually occur within 10 mins.

If I reduce the number of serialisers to just 1 (2 channels) when initialising the McASP layer, this issue does not happen.

From my investigations I feel it is related to an EDMA/L2 RAM timing issue. I'm in the process of using all of L2 as cache for some testing, but I need some form of "FASTRAM" for the library to work, so L1 RAM is what I am currently trying to configure.

Any thoughts or suggestions would be greatly appreciated.

Steven.

over 13 years ago

0 Brad Griffis over 13 years ago

TI__Guru*** 125430 points

Do you know if the EDMA/McASP setup is utilizing the McASP FIFOs?
What's the threshold set to?
How have you configured the MSTPRI registers?

I don't know if it's related, but I once spent a REALLY long time debugging an issue that sounded similar to what you are experiencing. In my case I made the following change:

USE 32-BIT ACCESSES FOR edma3MemCpy

#if 0

/* Local MemCopy function (original byte-wise version) */
void edma3MemCpy(void *dst, const void *src, unsigned int len)
{
    unsigned int i=0u;
    const unsigned char *sr;
    unsigned char *ds;

    assert ((src != NULL) && (dst != NULL));

    sr = (const unsigned char *)src;
    ds = (unsigned char *)dst;

    for( i=0;i<len;i++)
    {
        *ds=*sr;
        ds++;
        sr++;
    }

    return;
}

#else

/* Local MemCopy function (32-bit access version) */
void edma3MemCpy(void *dst, const void *src, unsigned int len)
{
    unsigned int i=0u;
    const unsigned int *sr;
    unsigned int *ds;

    assert ((src != NULL) && (dst != NULL));

    // BJG -- force 32-bit accesses
    // src, dst and len must all be multiples of 4 for this to be valid
    //assert ((((unsigned int)src & 0x3) == 0) && (((unsigned int)dst & 0x3) == 0) && ((len & 0x3) == 0));

    len = len >> 2; // convert "len" from bytes to words

    sr = (const unsigned int *)src;
    ds = (unsigned int *)dst;

    for( i=0;i<len;i++)
    {
        *ds=*sr;
        ds++;
        sr++;
    }

    return;
}

#endif
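
The 32-bit version above is only valid when both pointers and the length are multiples of 4, which is what the commented-out assert is getting at. A small sketch of that precondition as a reusable check follows. The names word_copy_ok and word_copy are made up for illustration and are not part of the EDMA3 LLD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when dst, src and len are all multiples of 4, i.e. the copy
 * can be done purely with aligned 32-bit accesses. */
static bool word_copy_ok(const void *dst, const void *src, size_t len)
{
    return (((uintptr_t)dst | (uintptr_t)src | (uintptr_t)len) & 0x3u) == 0;
}

/* Word-wise copy that refuses (via assert) to run on unaligned input. */
static void word_copy(void *dst, const void *src, size_t len)
{
    uint32_t *d = (uint32_t *)dst;
    const uint32_t *s = (const uint32_t *)src;
    size_t i;

    assert(dst != NULL && src != NULL);
    assert(word_copy_ok(dst, src, len));

    for (i = 0; i < len / 4u; ++i) {
        d[i] = s[i];
    }
}
```

OR-ing the three values together before masking means a single test catches a misaligned pointer or a length with leftover bytes.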

0 Steven Kenny over 13 years ago in reply to Brad Griffis

Prodigy 90 points

Thanks Brad,

The edma3MemCpy change didn't change the problem BUT it got me thinking about other memcpy functions.

I've been using the stdc lib memcpy() in a few SWI's, and memset(), so I changed these to be word copies like the edma3MemCpy() instead.

It seems to have fixed my problem.... not 100% convinced yet :)

I'll verify your response once I've done some more testing and reproduced the issue with stdc lib memcpy back in.

Thanks

Steven

0 Brad Griffis over 13 years ago in reply to Steven Kenny

TI__Guru*** 125430 points

Hi Steve,

Any updates?

Best regards,
Brad

0 Steven Kenny over 13 years ago in reply to Brad Griffis

Prodigy 90 points

Hi Brad,

Hard to verify this one without long soak testing... which we are doing.

But changing all my memcpy()'s and memset()'s seems to have fixed the problem.
That is, my previous test cases, which caused the problem within about 10mins, now last overnight at least.
I may have just moved a timing issue to somewhere else...

I'm going to verify your answer and include my memcpy() and memset() in this post in case anyone wants them.
They seem to have helped when needing to do memcpy()'s or memset()'s within SWI's.

These functions were created from tips/code I got from this web page: http://www.danielvik.com/2010/02/fast-memcpy-in-c.html

Thanks
Steven

#include <stdint.h>

void* memcpy(void* dst, const void* src, unsigned int len)
{
    uint8_t* dst8 = (uint8_t*)dst;
    const uint8_t* src8 = (const uint8_t*)src;

    /* Byte copy to word align dst8 (cases fall through on purpose) */
    switch ( ((unsigned int)dst8) & 0x3 ) {
    case 1: if ( len ) { *dst8++ = *src8++; --len; }
    case 2: if ( len ) { *dst8++ = *src8++; --len; }
    case 3: if ( len ) { *dst8++ = *src8++; --len; }
    }

    /* Store remaining and reduce len to word count */
    unsigned int remaining = len & 0x3;
    len >>= 2;

    if ( len ) {
        uint32_t* dst32 = (uint32_t*)dst8;
        /* Word align src8 and store shift offset value */
        const unsigned int shift = ((unsigned int)src8) & 0x3;
        const uint32_t* src32 = (const uint32_t*)(((unsigned int)src8) & ~0x3);

        if ( shift ) {
            const unsigned int shr = shift * 8;
            const unsigned int shl = (4 * 8) - shr;
            /* dst32 and src32 have different alignments
             * so do some shift magic while copying
             * (this shift direction assumes little-endian byte order) */
            uint32_t srcWord = *src32++;
            uint32_t dstWord;

            while ( len-- ) {
                dstWord = srcWord >> shr;
                srcWord = *src32++;
                dstWord |= srcWord << shl;
                *dst32++ = dstWord;
            }

            /* Convert src32 into src8 accounting for shifted address */
            src8 = (const uint8_t*)(src32 - 1);
            src8 += shift;
        } else {
            /* dst32 and src32 are aligned so simple word copy */
            while ( len-- ) {
                *dst32++ = *src32++;
            }

            /* Convert aligned src32 to src8 */
            src8 = (const uint8_t*)src32;
        }

        /* Convert aligned dst32 to dst8 */
        dst8 = (uint8_t*)dst32;
    }

    /* Byte copy remaining (cases fall through on purpose) */
    switch ( remaining ) {
    case 3: *dst8++ = *src8++;
    case 2: *dst8++ = *src8++;
    case 1: *dst8++ = *src8++;
    }

    return dst;
}

void* memset(void* dst, int c, unsigned int len)
{
    uint8_t* dst8 = (uint8_t*)dst;

    /* Byte fill to word align dst8 (cases fall through on purpose).
     * len is tested before it is decremented so that it cannot wrap
     * around below zero when fewer bytes remain than the alignment gap. */
    switch ( ((unsigned int)dst8) & 0x3 ) {
    case 1: if ( len ) { *dst8++ = c; --len; }
    case 2: if ( len ) { *dst8++ = c; --len; }
    case 3: if ( len ) { *dst8++ = c; --len; }
    }

    unsigned int remaining = len & 0x3;

    if ( len > 3 ) {
        uint32_t* dst32 = (uint32_t*)dst8;
        uint32_t val = c & 0xFF;

        /* Repeat byte into 32bit word */
        val |= val << 8;
        val |= val << 16;

        /* reduce len to word count and fill */
        len >>= 2;
        while ( len-- ) {
            *dst32++ = val;
        }

        /* Convert aligned dst32 to dst8 */
        dst8 = (uint8_t*)dst32;
    }

    /* Byte fill remaining (cases fall through on purpose) */
    switch ( remaining ) {
    case 3: *dst8++ = c;
    case 2: *dst8++ = c;
    case 1: *dst8++ = c;
    }

    return dst;
}
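
Routines like these are easy to get subtly wrong at the edges (zero length, unaligned start, a length shorter than the alignment gap), so a host-side check against the standard library is worth having. The sketch below is only an illustration: word_fill is a hypothetical stand-in with the same byte/word/byte structure, not the function above, and word_fill_mismatches compares it with the C library memset() over every start offset 0..3 and every length 0..16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-wise fill: bytes up to alignment, then whole
 * 32-bit words, then the trailing bytes. */
static void *word_fill(void *dst, int c, size_t len)
{
    uint8_t *d = (uint8_t *)dst;
    const uint32_t val = (uint32_t)(uint8_t)c * 0x01010101u;

    assert(dst != NULL || len == 0);

    while (len > 0 && ((uintptr_t)d & 0x3u) != 0) {  /* align d */
        *d++ = (uint8_t)c;
        --len;
    }
    for (; len >= 4; len -= 4, d += 4) {             /* aligned words */
        *(uint32_t *)d = val;
    }
    while (len > 0) {                                /* leftover bytes */
        *d++ = (uint8_t)c;
        --len;
    }
    return dst;
}

/* Count the (offset, length) cases where word_fill() wrote the wrong
 * bytes or touched memory outside [offset, offset + len). */
static int word_fill_mismatches(void)
{
    uint32_t storage[8];                  /* 32 bytes, word aligned */
    uint8_t *buf = (uint8_t *)storage;
    uint8_t  ref[sizeof storage];
    size_t off, len;
    int bad = 0;

    for (off = 0; off < 4; ++off) {
        for (len = 0; len <= 16; ++len) {
            memset(buf, 0xAA, sizeof storage);  /* background pattern */
            memset(ref, 0xAA, sizeof ref);
            memset(ref + off, 0x5C, len);       /* reference result   */
            word_fill(buf + off, 0x5C, len);
            if (memcmp(buf, ref, sizeof ref) != 0) {
                ++bad;
            }
        }
    }
    return bad;
}
```

Comparing the whole 32-byte buffer, not just the filled region, is what catches writes that stray past the requested range.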

0 Brad Griffis over 13 years ago in reply to Steven Kenny

TI__Guru*** 125430 points

Thanks for the reply and the additional info. If you get a minute please provide another update. Hopefully it will still be going strong after running all weekend!

0 Steven Kenny over 13 years ago in reply to Brad Griffis

Prodigy 90 points

Hi Brad,

Sorry to say my issue is not solved. The memcpy's have helped but have just moved the timing of the issue.

I've narrowed it down a bit by increasing the number of samples per channel requested per McASP buffer request, from 64 to 128.
When buffers of 64(samples) * 8(channels) * 4(32bit) are issued to the McASP, the driver moves into the error state within about 1-2 minutes.
If I increase the samples to 128 the driver rarely shows the error. I've only seen the error once so far, in one day's testing, but once is more than enough.
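
For reference, the time one buffer covers at the 96kHz rate can be worked out as below. This is only arithmetic on the numbers in this thread; the function name is illustrative. The result is the audio duration of one buffer, not a measured driver deadline.

```c
#include <assert.h>

/* Duration of one buffer in microseconds for a given number of samples
 * per channel. All channels are sampled in parallel, so the channel
 * count does not affect the period. */
static double buffer_period_us(unsigned samples_per_channel,
                               unsigned sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return 1.0e6 * (double)samples_per_channel / (double)sample_rate_hz;
}
```

At 96kHz, 64 samples is roughly 667 microseconds per buffer and 128 samples is roughly 1333 microseconds, so doubling the buffer halves how often each SWI must reclaim and re-issue.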

I'll need to break my app down into smaller pieces to get to the bottom of this issue.

Some info in case it rings any bells.

I have two apps that I run on the gpp side, the full audio network streaming and a test app that stores and plays the audio from files.
Both these apps use the same interface library that wraps/hides the DSPLink and DSP app loading. This library just presents interfaces to pull/push the audio to/from the DSP.

Now when the samples are 64 per channel my test app will usually run quite well but still fail after about 10mins, whereas the full app using the same parameters will fail within 1min.
This seems to suggest to me that the gpp has some influence over the McASP failure,
which really can only be the DSPLink transfers.
I'm using MSGQ with ZCPY, the DSP is DMA master for these transfers, and the PCI bus is shared by all peripherals, e.g. the network device.
I've been told (I didn't set this part up, so sorry if I get the terms wrong) that the EDMA has 2 priority-ordered channels and DSPLink usually uses 0, but this has been changed so that DSPLink uses the lower priority channel 1 and McASP uses 0.

So my current thought is that the EDMA driver moves into an error condition caused by a McASP request conflicting with a DSPLink request, and maybe a PCI mastering issue.
This is the area I'll be investigating after finishing up some cleanup work.
I'm not very familiar with the PSP lower layers so it'll take a bit; any suggestions welcome.

I'd also like to point out that if I only enable 1 serialiser (2 channels) then this problem never presents itself. I've been running these tests for the last week with no errors.

On this point, if only 1 serialiser is enabled I can SIO_create() the SIO_INPUT McASP before the SIO_OUTPUT McASP without a problem.
But with more than 1 serialiser enabled I MUST SIO_create() the SIO_OUTPUT driver before the SIO_INPUT, otherwise 45% of the time the DSP moves into this error condition within 2 seconds of SIO_issue()'ing the buffers to the driver.

Any thoughts appreciated.

Steven

0 Brad Griffis over 13 years ago in reply to Steven Kenny

TI__Guru*** 125430 points

Please post your McASP registers, EDMA parameter RAMs (active + links), and EDMA registers. I'd like two dumps actually -- one while things are in a good state and one after things fail.

0 Steven Kenny over 13 years ago in reply to Brad Griffis

Prodigy 90 points

Hi Brad,

Sorry I'll have to get these to you in a couple of weeks, I have to move on to some other high priority work just now.
This bug is a background thing I've been trying to resolve.

Also the sad thing is I've not been doing this work with CCS, just makefiles and tools from CCSv3.3.
I'm building on top of PSP work done last year.

Late next I'll be setting up CCSv4, or maybe I'll try out v5, and get some useful source-level debugging going and get all these registers.

Thanks for your help so far... I'll be back.

Steven

0 Steven Kenny over 13 years ago in reply to Brad Griffis

Prodigy 90 points

Hi Brad,

Looks like we found the cause of the problem.

It seems to be related to a DM647/8 silicon issue.
Advisory 1.1.5, SDMA/IDMA: When DSP Level 2 memory is configured as non-cache (RAM), unexpected blocking and a potential deadlock condition may occur.
See TMS320DM647/DM648 Silicon Errata: SPRZ263G

It comes down to the fact that I was using L2 RAM for my McASP buffers and using the DSP to process these buffers into external RAM, i.e. memcpy().

By changing some L1D cache into RAM and allocating my McASP buffers from L1D the problem has not happened again... so far... it's been a week of testing.

My assumption is that the McASP EDMA transfer would stall/time out because of this erratum and the PSP McASP driver would then move into an error state, always returning the buffers as soon as they are issued from then on.

Hope this helps someone else as well.

Thanks for your help

Steven.

0 Brad Griffis over 13 years ago in reply to Steven Kenny

TI__Guru*** 125430 points

Steven,

Thanks a lot for sharing this valuable information. Most of our devices in the last couple of years have already corrected that particular erratum, so I apologize that it didn't come to mind sooner! That certainly explains the behavior you're seeing. I've run into the issue a few times, and the observed symptom depends on which driver is being used (e.g. some of my customers wrote their own). I've seen this behavior manifest itself as channel swapping/rotation on some other devices because the delay in servicing the McBSP was long enough to cause a sample to be missed.

This erratum is now resolved in all of our current/new devices (good news!). Furthermore, TI now generally builds a FIFO on the front end of all the serial ports to reduce the hard real-time deadlines. So I'm glad you were able to get everything figured out and I wanted to share the good news that for future TI DSP designs this sort of thing will be a bit easier to avoid!

Best regards,
Brad
