OMAP-L137 EVM and c6Flo Audio sample latency.

Brian Flinn

Other Parts Discussed in Thread: OMAP-L137, TLV320AIC3106, CCSTUDIO

Hello All:
I'm trying to get a handle on where all the audio latency is coming from on the OMAP-L137 eval board.
I'm measuring around 28 ms.
I'm running code generated by the C6Flo tool. This latency is present with no processing, just an Audio In block connected to an Audio Out block
in the code generated by C6Flo.
I looked at the buffer sizes (at least the ones I could view in the code) and they don't seem unreasonable; about 128 samples is not very big.
So if we're running at a 48 kHz sample rate, at about 20 µs per sample, that should only be about 2.56 ms... ?
I'm hoping you can assist me in tracking down the source of the latency... Thanks

Brian

over 15 years ago

0 Joe Coombs over 15 years ago

TI__Expert 8890 points

Brian,

You're right, 28 ms does seem like excessive latency for a simple audio system. We made some fairly conservative design choices with the audio blocks for C6Flo, so I have a suggestion that you can try to lower your audio latency.

By default, the audio blocks enqueue 4 buffers each into the input and output interfaces. I believe that reducing this number to 2 will improve your latency. To try this out, you'll just need to make a very simple change to 6 lines in the *_blocks.c source file.

int ti_c6flo_evmc6747_audioin_v1_create(ti_c6flo_evmc6747_audioin_v1_hdl blockp)
{
    // ...
    sio_attrs.nbufs = 2; // was 4
    // ...
    for (i = 0; i < 2; i++) // was 4
    // ...
    for (i = 0; i < 2; i++) // was 4
    // ...
}

int ti_c6flo_evmc6747_audioout_v1_create(ti_c6flo_evmc6747_audioout_v1_hdl blockp)
{
    // ...
    sio_attrs.nbufs = 2; // was 4
    // ...
    for (i = 0; i < 2; i++) // was 4
    // ...
    for (i = 0; i < 2; i++) // was 4
    // ...
}

Please give this a shot and let me know if it improves your audio latency.

0 Brian Flinn over 15 years ago in reply to Joe Coombs

Expert 1180 points

Thanks Joe
That brought it down to 16.45 ms.
Still too high for our needs. We need to be in the sub-2.0 ms range in order to use it.
Any suggestions?
Thanks
Brian

0 Joe Coombs over 15 years ago in reply to Brian Flinn

TI__Expert 8890 points

Brian,

I'm glad that the latency improved with that last change. Let's keep at it until we achieve acceptable latency for your application.

I didn't notice this before, but it looks like your previous calculations were based on a 48 kHz sampling rate. The code generated by C6Flo will actually use a 44.1 kHz sampling rate. To get the total latency for audio pass-through lower than 2 ms, we'll need to reduce the size of your audio buffers. I recommend 32 samples per buffer, which works out to ~ 0.7 ms. We could change this in the graphical tool, but that would overwrite our previous edit to the C source. Fortunately, it's easy to change the buffer size from 128 to 32 in the *_threads.c source file:

// Thread parameter structs
C6Flo_std_thread_obj thread0_obj = {
    /* buffer size (bytes)      = */ 128, // was 512
    /* buffer length (elements) = */ 32, // was 128
    /* buffer alignment (bytes) = */ 128,
    /* thread index             = */ 0
};

Note: the buffer size should be exactly 4 times the buffer length if you're using single-precision floating point data buffers. If you're using Int16, then the size should be 2 times the length. Also, please note that setting the buffer length to less than 16 makes the audio driver unstable.
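
As a rough sketch of the arithmetic in this post, the two relationships (buffer size in bytes versus length in elements, and the time one buffer represents) can be written as small helpers. The function names are hypothetical and are not part of the C6Flo generated code; the 44.1 kHz rate and the 4-byte and 2-byte element sizes are the ones mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed for a buffer of `length` elements.
 * For single-precision float data element_size is 4; for Int16 it is 2. */
static size_t buffer_size_bytes(size_t length, size_t element_size)
{
    return length * element_size;
}

/* Hypothetical helper: time covered by one buffer, in milliseconds. */
static double buffer_period_ms(size_t length, double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return 1000.0 * (double)length / sample_rate_hz;
}
```

With these, buffer_size_bytes(32, 4) gives the 128 bytes used in the struct above, and buffer_period_ms(32, 44100.0) gives roughly 0.73 ms, in line with the ~0.7 ms figure quoted for a 32-sample buffer.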

This change probably won't get you all the way there by itself. I need to do a little more investigating on my side as well. The fact that you're seeing latency more than two times greater than the buffer length (and the fact that we've switched to a simple ping pong buffer setup) leads me to believe that we may be seeing significant latency in the audio hardware. It seems unlikely that our software alone could cause such large latency without dropping frames.

0 Brian Flinn over 15 years ago in reply to Joe Coombs

Expert 1180 points

Thanks Joe

My original *_threads.c source had this:
// Thread parameter structs
C6Flo_std_thread_obj thread0_obj = {
    /* buffer size (bytes)      = */ 1024,
    /* buffer length (elements) = */ 256,
    /* buffer alignment (bytes) = */ 128,
    /* thread index             = */ 0
};

With changes to 128 and 32 now at 4.35 ms...

"The fact that you're seeing latency more than two times greater than the buffer length
(and the fact that we've switched to a simple ping pong buffer setup)
leads me to believe that we may be seeing significant latency in the audio hardware."

Yes, I started looking at the A/D and D/A codec data sheet (tlv320aic3106.pdf) but didn't find anything there.
Maybe I missed something, or it's something else on the board; I don't know.
Do you think it could be the McASP setup, possibly the audio FIFO (AFIFO) on the tx or rx side? Just guessing...
Keep looking and I will too; we're getting close.

0 Brian Flinn over 15 years ago in reply to Brian Flinn

Expert 1180 points

Hi Joe

Not to change the subject, but I did try the audio examples from the pspdrivers_01_20_00\packages\ti\pspiom\examples\evm6747\audio project.

I had to change BUFLEN in audioSample_io.c from 2560 to 32 (note that 16 won't work?). Then I set a breakpoint in the file Mcasp.c in the function mcaspConfigureFifo and manually set chanHandle->enableHwFifo to FALSE for both the transmit and receive setup. With enableHwFifo TRUE the total audio latency is 4 ms; with enableHwFifo FALSE it is 2.55 ms, a difference of 1.45 ms... I haven't been able to find any documentation on "how big" the McASP hardware FIFO actually is. Just food for thought...
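
To put the 1.45 ms difference in perspective, a measured latency can be converted into a sample count. This is only a sketch: the helper name is hypothetical, and it assumes the PSP example runs at 44.1 kHz, which is not confirmed here.

```c
#include <assert.h>

/* Hypothetical helper: express a latency in sample periods.
 * The 44.1 kHz rate passed in by the caller is an assumption. */
static double latency_in_samples(double latency_ms, double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return latency_ms * sample_rate_hz / 1000.0;
}
```

Under that assumption, latency_in_samples(1.45, 44100.0) comes out just under 64 samples, which is about two 32-sample buffers. That says nothing definite about the actual FIFO depth, but it gives a sense of scale.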

Brian

0 Joe Coombs over 15 years ago in reply to Brian Flinn

TI__Expert 8890 points

Brian,

That's an interesting observation. I've been taking some measurements of my own, and this is a summary of what I've found so far.

Even going sample-by-sample, it appears that there is a minimum latency of 0.88 ms on the EVM
    1. Measured using BSL example code (very low-level; no driver or buffers)
    2. It's possible we could reduce (or increase) this latency by reconfiguring the audio codec chip (AIC3106)
    3. I don't recommend going sample-by-sample for actual audio processing; C6000 DSPs are generally more efficient when operating on buffers of data
The "worst case" latency we should ideally see is equal to the board latency (0.88 ms) plus twice the buffer period (i.e. 1 / 44.1 kHz times the length of the buffer)
    1. For a 32-sample buffer, this works out to 2.33 ms
    2. For a 16-sample buffer, it's 1.61 ms
Your change to the PSP driver example application (i.e. disabling the McASP hardware FIFO) gives audio latency very close to the "ideal" latency I calculated above
We can't make the same change to the C6Flo generated code; it appears to be incompatible with some of the other settings we've applied
I found another change we can make to the C6Flo application to improve its latency
    1. Basically, I moved the last for loop from each of the audio _create functions into the respective audio _init functions instead. This means there's less delay between priming the audio input and audio output drivers, which in turn reduces latency as the application continually "refills" the buffers at run time
    2. With this change, the C6Flo app has better latency than the standard PSP example app
    3. The latency is not as good as the PSP example app with the FIFO disabled
The best latency I can achieve with the PSP example app and the C6Flo app are as follows:
    1. PSP app: 1.72 ms (16-element buffers; FIFO disabled)
    2. C6Flo app: 2.40 ms (16-element buffers; code moved from create fxns to init fxns as above)
    3. Note that both are slightly worse than the "ideal" latency of 1.61 ms that we "should" achieve with 16-element buffers
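
The "ideal" latency figures above follow from a one-line formula, sketched here in C. The function and macro names are hypothetical; the 0.88 ms board latency and the 44.1 kHz rate are the values quoted in this post.

```c
#include <assert.h>
#include <stddef.h>

/* Board latency measured on the EVM with the BSL example, in ms. */
#define BOARD_LATENCY_MS 0.88

/* Hypothetical helper: board latency plus twice the buffer period,
 * one period for the input buffer and one for the output buffer. */
static double ideal_latency_ms(size_t buffer_length, double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    double buffer_period_ms = 1000.0 * (double)buffer_length / sample_rate_hz;
    return BOARD_LATENCY_MS + 2.0 * buffer_period_ms;
}
```

Calling ideal_latency_ms(32, 44100.0) gives about 2.33 ms and ideal_latency_ms(16, 44100.0) about 1.61 ms, so the measured 1.72 ms (PSP) and 2.40 ms (C6Flo) sit roughly 0.11 ms and 0.79 ms above the 16-sample estimate.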

This has been an interesting problem to debug. I think that it should still be possible to improve the C6Flo application to get the same latency as the PSP example app, but it will probably take me some time to figure out the appropriate driver settings. I thought I'd share my current progress in the meantime. By the time we're done, I think we'll have enough material to make a pretty interesting article on the embedded processors wiki. We've also identified some opportunities to improve the C6Flo audio in/out blocks.

0 Brian Flinn over 15 years ago in reply to Joe Coombs

Expert 1180 points

Thanks Joe

I think for the time being I'm going to work with the PSP example, as I have other issues with C6Flo, see (C67x Single Core DSP Forum: C6Flo and IIR filters posted by me).

If I set the PSP example to 16-element buffers it does not work for me. I must use at least 32. What am I doing wrong? By "does not work" I mean there is no audio at all.
If you disabled the FIFO, how did you do it? I had to manually set a breakpoint in the Mcasp.c library code and set it to FALSE... I can't seem to backtrack to where the parameter(s) are being set by the PSP app. Please advise...
Where is the BSL example code located?
>“The "worst case" latency we should ideally see is equal to the board latency (0.88 ms) plus twice the buffer period (i.e. 1 / 44.1 kHz times the length of the buffer)” Twice the buffer length because there are 2 of them?

Thanks again Joe we're getting there...

0 Joe Coombs over 15 years ago in reply to Brian Flinn

TI__Expert 8890 points

Brian,

That makes sense; the PSP example is closer to where you want to go. This discussion has already led to a number of improvements in the C6Flo tool, so I thank you for that.

In the PSP example, the buffer length and FIFO usage can be controlled by modifying defines and structs in the audioSample_io.c source file:

#define BUFLEN   16 /* number of samples in the frame */
#define BUFALIGN 128 /* alignment of buffer for L2 cache */

#define BUFSIZE (BUFLEN * sizeof(Ptr))

#define NUM_BUFS 2   /* Num Bufs to be issued and reclaimed */

// ...

Mcasp_ChanParams mcasp_chanparam[Audio_NUM_CHANS]=
{
    {
        0x0001,
        {Mcasp_SerializerNum_0, },
        (Mcasp_HwSetupData *)&mcaspRcvSetup,
        TRUE,
        Mcasp_OpMode_TDM,
        Mcasp_WordLength_32,
        NULL,
        0,
        NULL,
        NULL,
        1,
        Mcasp_BufferFormat_INTERLEAVED,
        FALSE, // (FIFO enable)
        TRUE
    },
    {
        0x0001,
        {Mcasp_SerializerNum_5,},
        (Mcasp_HwSetupData *)&mcaspXmtSetup,
        TRUE,
        Mcasp_OpMode_TDM,
        Mcasp_WordLength_32,
        NULL,
        0,
        NULL,
        NULL,
        1,
        Mcasp_BufferFormat_INTERLEAVED,
        FALSE, // (FIFO enable)
        TRUE
    }
};

Note that you must still use 128-byte alignment, even for small buffer sizes. Let me know if you're still having trouble getting the app to run with 16-element buffers.
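
As an illustration of the alignment point, here is a minimal sketch in standard C11 that declares a 16-element buffer with 128-byte alignment and checks it. This is not how the PSP example allocates its buffers; the helper name, the buffer name, and the 32-bit element type are assumptions made for the example, and an older TI toolchain may need its own alignment mechanism instead of _Alignas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFLEN   16   /* number of samples in the frame */
#define BUFALIGN 128  /* alignment of buffer for L2 cache */

/* Hypothetical helper: returns 1 if ptr is aligned to `alignment` bytes.
 * `alignment` must be a nonzero power of two. */
static int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Only 64 bytes of data (assuming 32-bit samples), but still placed
 * on a 128-byte boundary. */
static _Alignas(BUFALIGN) int32_t demo_buffer[BUFLEN];
```

The buffer occupies less than one 128-byte block, yet is_aligned(demo_buffer, BUFALIGN) still holds, which is the property the note above asks for.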

The BSL examples may be a little tricky to locate, depending on how (or if) they were installed. On my PC, they're located at:

C:\CCStudio_v3.3\boards\evmc6747_v1\dsp\tests

Generally speaking, I would expect them to be located near your GEL file. If you can't find them, you may need to download them from the EVM page on Spectrum Digital's website. (Look for "test code" in the software section.)

My "ideal" latency uses 2 times the buffer period because I want to account for the input and output buffers. While there are 2 input buffers and 2 output buffers, only one of each is "active" at any moment.
