[FAQ] AM2754-Q1: How to create a low latency audio application using McASP and minimal number of audio buffers?

Part Number: AM2754-Q1

I want to create a low-latency audio application that uses a minimal number of audio buffers. The default mcasp-multichannel-playback example for AM275x/AM62Dx uses 4 buffers each for transmit and receive, and these buffers add delay to the input-to-output latency of the system. Is there a way to do this with fewer buffers? What changes are needed to convert the default mcasp-multichannel-playback SDK example into a low-latency one?

  • LOW LATENCY MCASP MULTICHANNEL PLAYBACK IMPLEMENTATION GUIDE
    From 4-Buffer to 2-Buffer Architecture
    -----------------------------------------------------------------------------------------------------------------

    1. OVERVIEW

    ===========================================================

    This guide explains how to convert the default MCASP multichannel playback
    example to a low-latency version optimized for real-time audio processing.

    Default Example:
    - Uses 4 physical buffers for both TX and RX
    - Simple callback-based re-submission
    - Total latency: 4 buffer periods
    - No explicit audio processing hook

    Low Latency Version:
    - Uses only 2 physical buffers (less memory and less delay)
    - Keeps 4 transaction objects queued in the MCASP driver
    - Semaphore-based synchronization for audio processing
    - Total latency: ~2 buffer periods
    - Includes a placeholder for audio processing algorithms

    2. ARCHITECTURE COMPARISON
    -------------------------------------------------

    DEFAULT ARCHITECTURE (4-Buffer System):
    ----------------------------------------------------------------


    Physical Memory:
    RX Buffers: [Buf0] [Buf1] [Buf2] [Buf3]
    TX Buffers: [Buf0] [Buf1] [Buf2] [Buf3]

    Transaction Flow:
    Txn0 -> Buf0, Txn1 -> Buf1, Txn2 -> Buf2, Txn3 -> Buf3
    (One-to-one mapping)

    Callback Behavior:
    - TX callback: Submits same transaction to RX queue
    - RX callback: Submits same transaction to TX queue
    - Processing happens implicitly in callbacks

    Latency: 4 buffer periods (4 × frames_per_buffer / sample_rate)


    LOW LATENCY ARCHITECTURE (2-Buffer + 4-Transaction System):
    ------------------------------------------------------------------------------------------------


    Physical Memory:

    RX Buffers: [Buf0] [Buf1] (Only 2 buffers!)
    TX Buffers: [Buf0] [Buf1] (Only 2 buffers!)

    Transaction Flow:

    Txn0 -> Buf0, Txn1 -> Buf1, Txn2 -> Buf0, Txn3 -> Buf1
    (Transactions alternate between 2 buffers using modulo operation)

    Callback Behavior:

    - TX callback: Empty (no action)
    - RX callback: Signals main thread via semaphore
    - Main thread: Performs audio processing and re-submits transactions

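    Latency: ~2 buffer periods

    - Only 2 buffers per direction cycle through the MCASP DMA queues
    - Transactions n and n+2 share a buffer, so a block written into TX Buf(n%2)
      is played by the next transaction that uses that buffer, two periods later
    - Processing time is small (0.05-0.2 ms, far less than the 1.33 ms buffer period)
    - The C75x DSP processes a block much faster than the DMA transfers it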
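    The two-period figure follows from the buffer aliasing. The sketch below is a hypothetical software model, not driver code, under simplifying assumptions: RX and TX start together, every transaction lasts exactly one buffer period, and the main thread finishes each block within one period. The names sim_played_block, SIM_BUFF_COUNT and SIM_TXN_COUNT are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_BUFF_COUNT  (2U)  /* physical buffers per direction */
#define SIM_TXN_COUNT   (4U)  /* transaction objects per direction */

/* Hypothetical model of the 2-buffer / 4-transaction scheme.
 * txBuf[b] holds the id of the received block currently stored in TX
 * buffer b (-1 = initial silence). Returns the id of the block that is
 * played during TX period `slot`. */
int32_t sim_played_block(uint32_t slot)
{
    int32_t txBuf[SIM_BUFF_COUNT];
    uint32_t t;

    /* The scheme relies on transactions n and n + BUFF_COUNT aliasing. */
    assert((SIM_TXN_COUNT % SIM_BUFF_COUNT) == 0U);

    for (t = 0U; t < SIM_BUFF_COUNT; t++) {
        txBuf[t] = -1;
    }

    for (t = 0U; ; t++) {
        /* Transactions are re-queued in completion order, so period t uses
         * transaction t % TXN_COUNT and therefore buffer
         * (t % TXN_COUNT) % BUFF_COUNT. */
        uint32_t buf = (t % SIM_TXN_COUNT) % SIM_BUFF_COUNT;
        if (t == slot) {
            return txBuf[buf];
        }
        /* RX block t completes at the end of period t; the main thread copies
         * it into the TX buffer with the same index during period t + 1,
         * while TX is reading the other buffer. */
        txBuf[buf] = (int32_t)t;
    }
}
```

    sim_played_block(t) returns -1 (silence) for t = 0 and 1 and t - 2 afterwards, i.e. every captured block is played two buffer periods after it was captured, even though four transactions are queued.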


    Key Advantage:

    The low latency design halves both the audio buffer memory and the
    input-to-output latency while keeping double buffering with pre-queued
    transactions for glitch-free operation.

    3. LATENCY ANALYSIS
    -----------------------------------

    For the 8-channel configuration:
    - Buffer Size: 2048 bytes
    - Sample Rate: 48 kHz
    - Channels: 8 (multi-channel)
    - Bit Depth: 32-bit (4 bytes per sample)

    Bytes per frame = 8 channels × 4 bytes = 32 bytes
    Frames per buffer = 2048 / 32 = 64 frames
    Buffer period = 64 / 48000 = 1.33 ms

    DEFAULT LATENCY:
    Total = 4 buffers × 1.33 ms = 5.33 ms

    LOW LATENCY VERSION:
    Queue Depth: 2 buffers in MCASP DMA queues
    Processing: ~0.05 - 0.2 ms (negligible vs 1.33 ms buffer period)
    Total ≈ 2 buffers × 1.33 ms = 2.67 ms

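    Note: Processing runs in the one-period gap between an RX completion and
    the start of the TX transaction that reads the same buffer, so it adds no
    latency as long as it finishes within one buffer period. On the C75x @ 1 GHz
    it takes far less than that.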

    Latency Reduction: 5.33 - 2.67 ≈ 2.67 ms (50% improvement)
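    The same arithmetic as a small C helper, handy when experimenting with other buffer sizes or channel counts. The function names are hypothetical; the inputs match the 8-channel, 32-bit, 48 kHz configuration above.

```c
#include <assert.h>
#include <stdint.h>

/* Frames per buffer: one frame = one sample for every channel. */
uint32_t frames_per_buffer(uint32_t bufBytes, uint32_t channels,
                           uint32_t bytesPerSample)
{
    uint32_t bytesPerFrame = channels * bytesPerSample;
    assert(bytesPerFrame != 0U);
    return bufBytes / bytesPerFrame;
}

/* Latency in microseconds of `buffers` buffer periods, rounded down.
 * Multiplying before dividing avoids compounding per-period rounding. */
uint32_t latency_us(uint32_t buffers, uint32_t bufBytes, uint32_t channels,
                    uint32_t bytesPerSample, uint32_t sampleRateHz)
{
    uint64_t frames = (uint64_t)buffers *
                      frames_per_buffer(bufBytes, channels, bytesPerSample);
    assert(sampleRateHz != 0U);
    return (uint32_t)((frames * 1000000ULL) / sampleRateHz);
}
```

    With 2048-byte buffers, 8 channels, 4 bytes per sample and 48 kHz, latency_us() gives 1333 µs for one period, 5333 µs for the default 4 buffers and 2666 µs for the 2-buffer version (values are rounded down).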

    4. STEP-BY-STEP IMPLEMENTATION GUIDE
    ----------------------------------------------------------------

    STEP 1: Modify Buffer and Transaction Count Macros
    -----------------------------------------------------------------------------

    #define APP_MCASP_AUDIO_BUFF_COUNT (2U)
    #define APP_MCASP_AUDIO_BUFF_SIZE (2048U)
    
    #define APP_MCASP_AUDIO_TRANSACTION_COUNT (4U)

    EXPLANATION:
    - Reduce physical buffer count from 4 to 2 to save memory
    - Introduce separate transaction count of 4 to maintain queue depth
    - This decouples buffer memory from transaction queue management
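    Two constraints are implied by these macros and can be checked at compile time. The transaction count must be a multiple of the buffer count; with 3 transactions and 2 buffers, for example, two consecutive transactions would point to the same buffer after re-submission. This sketch assumes a C11 compiler (static_assert from <assert.h>) and introduces hypothetical APP_NUM_CHANNELS and APP_BYTES_PER_SAMPLE macros for the 8-channel, 32-bit layout; adjust them to your configuration.

```c
#include <assert.h>
#include <stdint.h>

#define APP_MCASP_AUDIO_BUFF_COUNT         (2U)
#define APP_MCASP_AUDIO_BUFF_SIZE          (2048U)
#define APP_MCASP_AUDIO_TRANSACTION_COUNT  (4U)

/* Hypothetical layout macros for the 8-channel, 32-bit configuration. */
#define APP_NUM_CHANNELS      (8U)
#define APP_BYTES_PER_SAMPLE  (4U)

/* Each buffer must be used by the same number of transactions; otherwise
 * two consecutive transactions alias the same buffer after re-submission. */
static_assert((APP_MCASP_AUDIO_TRANSACTION_COUNT % APP_MCASP_AUDIO_BUFF_COUNT) == 0U,
              "transaction count must be a multiple of the buffer count");

/* A buffer must hold a whole number of frames so channels stay in order. */
static_assert((APP_MCASP_AUDIO_BUFF_SIZE % (APP_NUM_CHANNELS * APP_BYTES_PER_SAMPLE)) == 0U,
              "buffer size must be a whole number of frames");

/* Value the guide stores in MCASP_Transaction.count (BUFF_SIZE / 4). */
uint32_t app_txn_word_count(void)
{
    return APP_MCASP_AUDIO_BUFF_SIZE / APP_BYTES_PER_SAMPLE;
}
```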


    STEP 2: Update Transaction Array Declarations
    --------------------------------------------------------------------

     

    MCASP_Transaction gMcaspAudioTxnTx[APP_MCASP_AUDIO_TRANSACTION_COUNT] = {0};
    MCASP_Transaction gMcaspAudioTxnRx[APP_MCASP_AUDIO_TRANSACTION_COUNT] = {0};

    EXPLANATION:
    - Transaction arrays are now sized by APP_MCASP_AUDIO_TRANSACTION_COUNT (4) rather than the buffer count
    - Allows more transactions than physical buffers


    STEP 3: Add Semaphore and Tracking Variables
    ---------------------------------------------------------------------

    ADD THE FOLLOWING:

    SemaphoreP_Object gBufferReadySem;
    volatile uint32_t transaction_index = 0;
    
    uint8_t *aweRxBufPtr = NULL;
    uint32_t aweRxBufIndex = 0;

    EXPLANATION:
    - SemaphoreP_Object: Binary semaphore for signaling between ISR and main task
    - transaction_index: Tracks which transaction completed (set by RX callback)
    - aweRxBufPtr/aweRxBufIndex: Placeholder variables for future audio
    processing framework integration (e.g., Audio Weaver)


    STEP 4: Initialize Semaphore in Main Function
    -------------------------------------------------------------------

    ADD:
     

    status = SemaphoreP_constructBinary(&gBufferReadySem, 0);
    DebugP_assert(status == SystemP_SUCCESS);

    EXPLANATION:
    - Creates binary semaphore initialized to 0 (not available)
    - Will be posted by RX callback when buffer is ready
    - Main thread will pend on this semaphore


    STEP 5: Modify Transaction Initialization (TX)
    -------------------------------------------------------------------

     

    for (i = 0U; i < APP_MCASP_AUDIO_TRANSACTION_COUNT; i++)
    {
        gMcaspAudioTxnTx[i].buf = (void*) &gMcaspAudioBufferTx[(i)%APP_MCASP_AUDIO_BUFF_COUNT][0];
        gMcaspAudioTxnTx[i].count = APP_MCASP_AUDIO_BUFF_SIZE/4;   /* 32-bit words per buffer */
        gMcaspAudioTxnTx[i].timeout = 0xFFFFFF;
        gMcaspAudioTxnTx[i].args = (void *)(uintptr_t)i;            /* transaction index for the callbacks */
        MCASP_submitTx(mcaspHandle, &gMcaspAudioTxnTx[i]);
    }

    EXPLANATION:
    - Loop now iterates 4 times (APP_MCASP_AUDIO_TRANSACTION_COUNT)
    - Buffer assignment uses modulo: (i)%APP_MCASP_AUDIO_BUFF_COUNT
    * Transaction 0 -> Buffer 0
    * Transaction 1 -> Buffer 1
    * Transaction 2 -> Buffer 0 (wraps around)
    * Transaction 3 -> Buffer 1 (wraps around)
    - args field stores transaction index for tracking in callbacks
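    To see the aliasing concretely, the sketch below repeats the STEP 5 loop against a hypothetical SimTxn structure that mirrors only the buf, count and args fields of MCASP_Transaction, without any driver calls. After initialization, transactions 0 and 2 share one buffer and transactions 1 and 3 share the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_BUFF_SIZE  (2048U)

/* Hypothetical stand-in for MCASP_Transaction: only the fields used here. */
typedef struct {
    void     *buf;
    uint32_t  count;
    void     *args;
} SimTxn;

/* Same loop shape as STEP 5, without the MCASP_submitTx() call. */
void sim_init_txns(SimTxn *txn, uint32_t txnCount,
                   uint8_t (*bufs)[SIM_BUFF_SIZE], uint32_t bufCount)
{
    assert(txn != NULL && bufs != NULL && bufCount != 0U);
    for (uint32_t i = 0U; i < txnCount; i++) {
        txn[i].buf   = &bufs[i % bufCount][0];   /* buffers 0,1,0,1,... */
        txn[i].count = SIM_BUFF_SIZE / 4U;       /* 32-bit words */
        txn[i].args  = (void *)(uintptr_t)i;     /* transaction index */
    }
}
```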


    STEP 6: Modify Transaction Initialization (RX)
    ------------------------------------------------------------------

     

    for (i = 0U; i < APP_MCASP_AUDIO_TRANSACTION_COUNT; i++)
    {
        gMcaspAudioTxnRx[i].buf = (void*) &gMcaspAudioBufferRx[(i)%APP_MCASP_AUDIO_BUFF_COUNT][0];
        gMcaspAudioTxnRx[i].count = APP_MCASP_AUDIO_BUFF_SIZE/4;   /* 32-bit words per buffer */
        gMcaspAudioTxnRx[i].timeout = 0xFFFFFF;
        gMcaspAudioTxnRx[i].args = (void *)(uintptr_t)i;            /* transaction index for the callbacks */
        MCASP_submitRx(mcaspHandle, &gMcaspAudioTxnRx[i]);
    }

    EXPLANATION:
    - Same logic as TX transaction initialization
    - Ensures RX and TX transactions map to corresponding buffer pairs


    STEP 7: Replace User Input Loop with Processing Loop
    ---------------------------------------------------------------------------------

    volatile uint32_t loop = 1U;   /* clear from the debugger to exit */
    while (loop)
    {
        status = SemaphoreP_pend(&gBufferReadySem, SystemP_WAIT_FOREVER);
        if (status == SystemP_SUCCESS)
        {
            /* Read the index once; the RX callback may overwrite it */
            uint32_t idx   = transaction_index;
            void *src_buf  = gMcaspAudioTxnRx[idx].buf;
            void *dest_buf = gMcaspAudioTxnTx[idx].buf;

            /* Discard stale lines so the CPU sees the DMA-written RX data */
            CacheP_inv(src_buf, APP_MCASP_AUDIO_BUFF_SIZE, CacheP_TYPE_ALL);

            memcpy(dest_buf, src_buf, APP_MCASP_AUDIO_BUFF_SIZE);
            // Audio processing algorithm here.

            /* Flush processed TX data to memory so the DMA reads it */
            CacheP_wb(dest_buf, APP_MCASP_AUDIO_BUFF_SIZE, CacheP_TYPE_ALL);

            MCASP_submitRx(mcaspHandle, &gMcaspAudioTxnRx[idx]);
            MCASP_submitTx(mcaspHandle, &gMcaspAudioTxnTx[idx]);
        }
    }

    EXPLANATION:
    - Main thread now enters infinite processing loop
    - Waits on semaphore posted by RX callback
    - When signaled:
    1. Reads the completed transaction index once and gets its buffer pointers
    2. Invalidates RX cache (ensures fresh data from DMA)
    3. Performs audio processing (currently just memcpy as placeholder)
    4. Writes back TX cache (ensures processed data visible to DMA)
    5. Re-submits both RX and TX transactions to MCASP queue

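    NOTE: The loop runs until "loop" is cleared (for example from a debugger) or
    until you add an exit condition. Changing its initial value to 0 would skip
    processing entirely.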
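    The memcpy placeholder can be replaced by any per-block algorithm that reads src_buf and writes dest_buf between the CacheP_inv and CacheP_wb calls. As an illustration only, the hypothetical app_process_block below applies a per-channel gain to interleaved 32-bit samples. It assumes the 8-channel interleaved layout from the latency analysis and treats each 32-bit word as a signed sample, which depends on your McASP data format. In the loop it could be called as app_process_block((const int32_t *)src_buf, (int32_t *)dest_buf, APP_MCASP_AUDIO_BUFF_SIZE / 4U, gains).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APP_NUM_CHANNELS  (8U)      /* assumption: 8 interleaved channels */
#define APP_GAIN_UNITY    (32768)   /* gain scale: 32768 = 1.0 */

/* Hypothetical processing hook: scales each interleaved 32-bit sample by a
 * per-channel gain in the range [0, APP_GAIN_UNITY]. The 64-bit intermediate
 * prevents overflow; the division truncates toward zero. */
void app_process_block(const int32_t *in, int32_t *out, uint32_t numWords,
                       const int32_t gain[APP_NUM_CHANNELS])
{
    assert(in != NULL && out != NULL && gain != NULL);
    for (uint32_t n = 0U; n < numWords; n++) {
        uint32_t ch = n % APP_NUM_CHANNELS;   /* channel of this sample */
        out[n] = (int32_t)(((int64_t)in[n] * gain[ch]) / APP_GAIN_UNITY);
    }
}
```

    Whatever algorithm goes here must finish within one buffer period, otherwise the TX transaction that reads dest_buf starts before the block is ready.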


    STEP 8: Modify TX Callback Function
    ------------------------------------------------------

     

    void mcasp_txcb(MCASP_Handle handle,
                    MCASP_Transaction *transaction)
    {
        /* Empty: re-submission is handled by the main thread */
        (void)handle;
        (void)transaction;
    }

    EXPLANATION:
    - TX callback no longer needs to do anything
    - Transaction re-submission now handled by main thread
    - Keeps ISR execution time minimal


    STEP 9: Modify RX Callback Function
    ------------------------------------------------------

    volatile uint32_t rxcb_count = 0;
    
    void mcasp_rxcb(MCASP_Handle handle,
                    MCASP_Transaction *transaction)
    {
        (void)handle;
        transaction_index = (uint32_t)(uintptr_t)transaction->args;
        rxcb_count++;
        SemaphoreP_post(&gBufferReadySem);
    }

    EXPLANATION:
    - Extracts transaction index from transaction->args field
    - Increments callback counter (useful for debugging/profiling)
    - Posts semaphore to wake up main processing thread
    - Does NOT submit transactions (main thread handles this)
    - Minimal ISR execution for real-time performance
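    Because gBufferReadySem is binary and transaction_index holds only the most recent value, a block that is not picked up before the next RX completion is skipped without any error. One way to notice this during development is to compare rxcb_count with a count kept by the main thread. The AppDeadlineStats type and app_check_deadline function below are hypothetical; in the STEP 7 loop they would be called right after SemaphoreP_pend returns, with a snapshot of rxcb_count. This detects skipped blocks only; it does not measure processing time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping kept by the main thread. */
typedef struct {
    uint32_t handled;    /* RX completions accounted for so far */
    uint32_t skipped;    /* completions that were never processed */
} AppDeadlineStats;

/* Call after SemaphoreP_pend() returns, with a snapshot of rxcb_count.
 * Returns false if no new block completed since the last call (a post that
 * raced with the previous wake-up). */
bool app_check_deadline(AppDeadlineStats *s, uint32_t rxcbSnapshot)
{
    assert(s != NULL);
    uint32_t pending = rxcbSnapshot - s->handled;  /* wrap-safe unsigned math */
    if (pending == 0U) {
        return false;
    }
    s->skipped += pending - 1U;   /* only the newest block gets processed */
    s->handled  = rxcbSnapshot;
    return true;
}
```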

    5. KEY CONCEPTS AND DESIGN DECISIONS
    -----------------------------------------------------------------


    5.1 Why 2 Buffers + 4 Transactions?
    ----------------------------------------------------
    - 2 buffers provide true double-buffering (one filling, one processing)
    - 4 transactions ensure MCASP always has work queued (prevents underruns)
    - Modulo mapping allows multiple transactions to share same buffer
    - Result: Memory efficiency + robust queue depth

    5.2 Transaction vs Buffer Relationship
    -------------------------------------------------------
    In MCASP driver architecture:
    - Buffer: Physical memory location for DMA data transfer
    - Transaction: Descriptor containing buffer pointer, size, and metadata
    - Multiple transactions can point to the same buffer at different times
    - Driver maintains separate TX and RX queues of transaction objects

    Transaction Flow:
    [Submit] -> [Queued] -> [Active] -> [Complete] -> [Callback] -> [Re-submit]

    5.3 Cache Coherency Management
    --------------------------------------------------
    On C75x DSP with cache:
    - CacheP_inv(): Invalidate cache before reading DMA-written data (RX)
    - CacheP_wb(): Write back cache after writing data for DMA (TX)
    - Critical for correct data transfer between CPU and peripherals
    - Without proper cache operations, you'll see stale or corrupted data
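    Cache maintenance operates on whole cache lines, so a buffer that shares a line with other data can corrupt that data when the line is invalidated or written back. This is why the full example aligns the buffers to 256 bytes and uses a 2048-byte size. The hypothetical helper below checks a range before it is passed to CacheP_inv or CacheP_wb; the line size is a caller-supplied parameter because its value here is an assumption, so take the actual figure from the device documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if [addr, addr + size) starts and ends on a cache-line
 * boundary, so invalidating or writing it back cannot affect neighbouring
 * data. lineSize is supplied by the caller and must be a power of two. */
bool app_cache_range_ok(const void *addr, uint32_t size, uint32_t lineSize)
{
    assert(lineSize != 0U && (lineSize & (lineSize - 1U)) == 0U);
    uintptr_t mask = (uintptr_t)lineSize - 1U;
    return (((uintptr_t)addr & mask) == 0U) && (((uintptr_t)size & mask) == 0U);
}
```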

    5.4 Semaphore-Based Synchronization
    --------------------------------------------------------
    Why use semaphore instead of direct callback processing?
    - Separates ISR context from processing context
    - Keeps interrupts unblocked while processing runs (it must still finish within one buffer period)
    - Enables integration of complex audio algorithms
    - Provides clear synchronization point for debugging

    5.5 The "args" Field Trick
    -------------------------------------
    MCASP_Transaction structure includes a void* args field:
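    - Used here to store the transaction index (cast through uintptr_t so the conversion compiles cleanly when pointers are wider than 32 bits)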
    - Allows callback to identify which transaction completed
    - Main thread can then access correct buffer pair
    - Essential for maintaining transaction-to-buffer mapping
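    A pair of hypothetical helpers keeps the index-to-pointer conversion in one place. Going through uintptr_t makes both casts explicit and avoids truncation warnings on targets with pointers wider than 32 bits. Index 0 is stored as a null pointer, which is fine as long as no code treats a null args as "no transaction".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for the MCASP_Transaction args trick. */
static inline void *app_index_to_args(uint32_t index)
{
    return (void *)(uintptr_t)index;
}

static inline uint32_t app_args_to_index(const void *args)
{
    assert((uintptr_t)args <= UINT32_MAX);  /* must hold an encoded index */
    return (uint32_t)(uintptr_t)args;
}
```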

    6. FULL CODE EXAMPLE
    --------------------------------------

    The complete mcasp_playback.c is included below for reference. To try it, replace the file of the same name in the mcasp_multichannel_playback example.

    /*
     *  Copyright (C) 2024 Texas Instruments Incorporated
     *
     *  Redistribution and use in source and binary forms, with or without
     *  modification, are permitted provided that the following conditions
     *  are met:
     *
     *    Redistributions of source code must retain the above copyright
     *    notice, this list of conditions and the following disclaimer.
     *
     *    Redistributions in binary form must reproduce the above copyright
     *    notice, this list of conditions and the following disclaimer in the
     *    documentation and/or other materials provided with the
     *    distribution.
     *
     *    Neither the name of Texas Instruments Incorporated nor the names of
     *    its contributors may be used to endorse or promote products derived
     *    from this software without specific prior written permission.
     *
     *  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
     *  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
     *  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
     *  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
     *  OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
     *  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
     *  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
     *  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
     *  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
     *  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
     *  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     */
    
    #include <stdint.h>
    #include <string.h>
    #include <kernel/dpl/DebugP.h>
    #include <kernel/dpl/AddrTranslateP.h>
    #include <kernel/dpl/CacheP.h>
    #include <kernel/dpl/ClockP.h>
    #include <kernel/dpl/SemaphoreP.h>
    #include <drivers/i2c.h>
    #include <drivers/gpio.h>
    #include <drivers/mcasp.h>
    #include <board/ioexp/ioexp_tca6424.h>
    #include <drivers/pinmux.h>
    #include "ti_drivers_config.h"
    #include "ti_drivers_open_close.h"
    #include "ti_board_open_close.h"
    
    /* ========================================================================== */
    /*                           Macros & Typedefs                                */
    /* ========================================================================== */
    
    /* Audio buffer settings */
    #define APP_MCASP_AUDIO_BUFF_COUNT  (2U)
    #define APP_MCASP_AUDIO_BUFF_SIZE   (2048U)
    
    #define APP_MCASP_AUDIO_TRANSACTION_COUNT  (4U)
    
    /* ========================================================================== */
    /*                           Global Variables                                 */
    /* ========================================================================== */
    
    /* Create buffers for transmit and Receive */
    uint8_t gMcaspAudioBufferTx[APP_MCASP_AUDIO_BUFF_COUNT][APP_MCASP_AUDIO_BUFF_SIZE] __attribute__((aligned(256)));
    uint8_t gMcaspAudioBufferRx[APP_MCASP_AUDIO_BUFF_COUNT][APP_MCASP_AUDIO_BUFF_SIZE] __attribute__((aligned(256)));
    
    /* Create transaction objects for transmit and Receive */
    MCASP_Transaction   gMcaspAudioTxnTx[APP_MCASP_AUDIO_TRANSACTION_COUNT] = {0};
    MCASP_Transaction   gMcaspAudioTxnRx[APP_MCASP_AUDIO_TRANSACTION_COUNT] = {0};
    
    
    SemaphoreP_Object gBufferReadySem;
    volatile uint32_t transaction_index = 0;
    
    uint8_t *aweRxBufPtr = NULL;
    uint32_t aweRxBufIndex = 0;
    
    /* ========================================================================== */
    /*                        Extern Function Declaration                         */
    /* ========================================================================== */
    int32_t Board_codecConfig(void);
    int32_t Board_clockgenConfig(I2C_Handle handle, uint8_t devAddr);
    
    void mcasp_playback_main(void *args)
    {
        int32_t     status = SystemP_SUCCESS;
        uint32_t    i;
        MCASP_Handle    mcaspHandle;
    
        I2C_Handle      i2cHandle;
        i2cHandle = gI2cHandle[CONFIG_I2C0];
    
    #if defined (SOC_AM275X)
        Pinmux_PerCfg_t i2cPinmuxConfig[] =
        {
            {
                PIN_GPIO1_72,
                ( PIN_MODE(1) | PIN_INPUT_ENABLE | PIN_PULL_DIRECTION  )
            },
            {PINMUX_END, 0U}
        };
    
        Pinmux_config(i2cPinmuxConfig, PINMUX_DOMAIN_ID_MAIN);
    #endif
    
        /* Configure clock generator for getting the external HCLK */
        status = Board_clockgenConfig(i2cHandle, 0x68);
        DebugP_assert(status == SystemP_SUCCESS);
    
        ClockP_usleep(100);
    
        /* Open MCASP driver after enabling the HCLK */
        gMcaspHandle[0] = MCASP_open(0, &gMcaspOpenParams[0]);
        if(NULL == gMcaspHandle[0])
        {
            DebugP_logError("MCASP open failed for instance 0 !!!\r\n");
            DebugP_assert(false);
        }
    
        ClockP_usleep(100);
    
        /* Configure codec */
        status = Board_codecConfig();
        DebugP_assert(status == SystemP_SUCCESS);
    
        DebugP_log("[MCASP] Audio playback example started.\r\n");
    
        mcaspHandle = MCASP_getHandle(CONFIG_MCASP0);
        status = SemaphoreP_constructBinary(&gBufferReadySem, 0);
        DebugP_assert(status == SystemP_SUCCESS);
    
        /* Prepare and submit audio transaction transmit objects */
        for (i = 0U; i < APP_MCASP_AUDIO_TRANSACTION_COUNT; i++)
        {
            gMcaspAudioTxnTx[i].buf = (void*) &gMcaspAudioBufferTx[(i)%APP_MCASP_AUDIO_BUFF_COUNT][0];
            gMcaspAudioTxnTx[i].count = APP_MCASP_AUDIO_BUFF_SIZE/4;
            gMcaspAudioTxnTx[i].timeout = 0xFFFFFF;
            gMcaspAudioTxnTx[i].args = (void *)(uintptr_t)i;
            MCASP_submitTx(mcaspHandle, &gMcaspAudioTxnTx[i]);
        }
    
        /* Prepare and submit audio transaction receive objects */
        for (i = 0U; i < APP_MCASP_AUDIO_TRANSACTION_COUNT; i++)
        {
            gMcaspAudioTxnRx[i].buf = (void*) &gMcaspAudioBufferRx[(i)%APP_MCASP_AUDIO_BUFF_COUNT][0];
            gMcaspAudioTxnRx[i].count = APP_MCASP_AUDIO_BUFF_SIZE/4;
            gMcaspAudioTxnRx[i].timeout = 0xFFFFFF;
            gMcaspAudioTxnRx[i].args = (void *)(uintptr_t)i;
            MCASP_submitRx(mcaspHandle, &gMcaspAudioTxnRx[i]);
        }
    
        /* Trigger McASP receive operation */
        status = MCASP_startTransferRx(mcaspHandle);
        DebugP_assert(status == SystemP_SUCCESS);
    
        /* Trigger McASP transmit operation */
        status = MCASP_startTransferTx(mcaspHandle);
        DebugP_assert(status == SystemP_SUCCESS);
    
    
    
        volatile uint32_t loop = 1U;   /* clear from the debugger to exit */
        while (loop)
        {
            status = SemaphoreP_pend(&gBufferReadySem, SystemP_WAIT_FOREVER);
            if (status == SystemP_SUCCESS)
            {
                /* Read the index once; the RX callback may overwrite it */
                uint32_t idx   = transaction_index;
                void *src_buf  = gMcaspAudioTxnRx[idx].buf;
                void *dest_buf = gMcaspAudioTxnTx[idx].buf;

                /* Discard stale lines so the CPU sees the DMA-written RX data */
                CacheP_inv(src_buf, APP_MCASP_AUDIO_BUFF_SIZE, CacheP_TYPE_ALL);

                memcpy(dest_buf, src_buf, APP_MCASP_AUDIO_BUFF_SIZE);
                // Audio processing algorithm here.

                /* Flush processed TX data to memory so the DMA reads it */
                CacheP_wb(dest_buf, APP_MCASP_AUDIO_BUFF_SIZE, CacheP_TYPE_ALL);

                MCASP_submitRx(mcaspHandle, &gMcaspAudioTxnRx[idx]);
                MCASP_submitTx(mcaspHandle, &gMcaspAudioTxnTx[idx]);
            }
        }
    
        MCASP_stopTransferTx(mcaspHandle);
        MCASP_stopTransferRx(mcaspHandle);
    
        DebugP_log("Exiting demo\r\n");
    }
    
    void mcasp_txcb(MCASP_Handle handle,
                    MCASP_Transaction *transaction)
    {
        /* Empty: re-submission is handled by the main thread */
        (void)handle;
        (void)transaction;
    }
    
    volatile uint32_t rxcb_count = 0;
    void mcasp_rxcb(MCASP_Handle handle,
                    MCASP_Transaction *transaction)
    {
        (void)handle;
        transaction_index = (uint32_t)(uintptr_t)transaction->args;
        rxcb_count++;
        SemaphoreP_post(&gBufferReadySem);
    }