AM335x McASP waveform mangled with DMA transfer to buffer

Hello folks. We have used the AM335x in a hardware design for an arbitrary waveform module (mostly based on the ICEv2.1). Presently, we have the board properly playing back waveform data on 8 channels using the McASP. We use the following playback design:

  1. Data is loaded from the SD card into a series of 8 individual waveform buffers in the heap.
  2. DMA to McASP is set up with a pair of ping-pong buffers and linked DMA parameters to spool the data directly from the already-sorted buffer to the audio DAC. On parameter swap, an ISR is fired to load the buffer that is not being used. Buffer length is sufficient for 5ms of playback at 192 kHz.
  3. During initial setup, or when the ISR is fired, we calculate a series of linked transfers (between 8 and 16 DMA parameters, depending on whether we are wrapping around to the beginning again) to sort the data from the individual waveform buffers (which may be different lengths) into the corresponding packed order in the playback ping-pong buffer.
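
The sorting step in item 3 can be pictured as a strided copy. The following is a minimal CPU-side sketch of the same address pattern, not the EDMA code itself; `interleave_channel`, `NUM_CHANNELS` and `SLOT_BYTES` are illustrative names, with the two constants assumed to match `CS4384_NUM_CHANNELS` and `MCASP_FIFO_WIDTH` from the listing later in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_CHANNELS 8   /* assumed: channel slots per frame (CS4384_NUM_CHANNELS) */
#define SLOT_BYTES   4   /* assumed: bytes per channel slot (MCASP_FIFO_WIDTH) */

/* Copy `records` samples of `sample_bytes` each from one channel's linear
 * buffer into that channel's slot of every frame in the packed buffer.
 * The addresses follow the same pattern as a transfer with
 * ACNT = sample_bytes, source BIDX = sample_bytes and
 * destination BIDX = NUM_CHANNELS * SLOT_BYTES. */
static void interleave_channel(uint8_t *packed, const uint8_t *src,
                               size_t channel, size_t records,
                               size_t sample_bytes)
{
    for (size_t r = 0; r < records; ++r) {
        uint8_t *dst = packed + r * NUM_CHANNELS * SLOT_BYTES
                              + channel * SLOT_BYTES;
        memcpy(dst, src + r * sample_bytes, sample_bytes);
    }
}
```

Calling this once per loaded channel fills one ping-pong buffer; bytes of a slot beyond `sample_bytes` are left untouched, which is why the buffers are zeroed first.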

As per above, if we load the wave data once and then start playback, everything works properly. However, we wanted to extend to dynamically generate a wavefile based on a user-defined frequency/period/etc. over EtherCAT, which can change at any time. The approach I came up with was:

  1. Create a 'scratch area' for the wave data; initially 3 buffer widths wide (extended to 4 as I debugged further). This data is sorted into the ping-pong buffers using the exact same algorithm as for the complete wave files above.
  2. When the ISR sets up the DMA transfers for the wave playback, set a flag to tell the foreground EtherCAT frame (at 1ms framerate) to load more data. It already has to keep track of where the NEXT update is to start from.
  3. As soon as the foreground thread sees the flag, it will clear the flag and calculate the next slice of the waveform in the scratch area. It will be writing the segment ahead of what is used for playback, so there should be no overlap. I later changed it to work two buffers ahead of DMA to be paranoid. Since the foreground thread runs at 1 ms, it is essentially guaranteed to be able to complete this calculation before the next 5 ms ISR.
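
As a rough illustration of the write-ahead bookkeeping in item 3, the sketch below computes where the foreground code would write next. `write_ahead_pos` is a hypothetical helper, not a function from the project; it assumes the scratch area is a whole number of buffer widths, as in the design described above.

```c
#include <stddef.h>

/* Return the byte offset at which the foreground code should write next:
 * `ahead` buffer widths past the position the DMA will read from next,
 * wrapped into the scratch area.
 * Assumes scratch_bytes is a whole multiple of buffer_bytes and
 * playback_pos < scratch_bytes. */
static size_t write_ahead_pos(size_t playback_pos, size_t buffer_bytes,
                              size_t scratch_bytes, size_t ahead)
{
    return (playback_pos + ahead * buffer_bytes) % scratch_bytes;
}
```

With the numbers from this thread (960 records of 2 bytes per buffer width, 4 buffer widths of scratch), one channel's buffer width is 1920 bytes and the scratch area is 7680 bytes, so a DMA position of 5760 with `ahead = 1` wraps to offset 0.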

What I'm actually seeing happening is only some variable-sized segments of my waveform actually get transferred correctly from my scratch area to the ping-pong buffer. The corrupted areas appear to keep the initial data I loaded on startup. I also noticed that when the updates were occurring, it was always the same sections of the waveform that would update; the sections left at the default would ALWAYS stay at default.

As part of my testing, I actually slowed down my update routine so I would only write the next segment to the scratch area once a second, while repeatedly DMAing the 'stable' 4-buffer segment of the waveform. This has no effect on the behaviour. The following is an oscilloscope capture of the output waveform (yellow) while a reference 50 Hz pre-calculated sine wave is being played on channel 2 (blue).

The test waveform is a very simple sawtooth wavefile which will count from 0 to 19200 at a rate of 1 per sample, then loop back to zero. You can see above the initial section of the waveform is mostly present, with the 'correct' updated waveform only partially visible. Some segments are more complete than others, but the general trend is visible.
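
For reference, a test pattern of this kind can be generated one segment at a time as in the sketch below. `sawtooth_fill` is an illustrative name, and the sketch assumes the ramp covers 0 to period - 1 before wrapping, which may differ by one sample from the actual test code.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `out` with `count` sawtooth samples that rise by 1 per sample and
 * wrap to 0 after `period` samples. `phase` is the position within the
 * period at which this segment starts; the return value is the phase at
 * which the next segment should start.
 * Assumes 0 < period <= 32768 so every value fits in int16_t. */
static size_t sawtooth_fill(int16_t *out, size_t count,
                            size_t phase, size_t period)
{
    for (size_t i = 0; i < count; ++i) {
        out[i] = (int16_t)((phase + i) % period);
    }
    return (phase + count) % period;
}
```

Unlike a plain `i + phase` fill, this version also handles a period boundary that falls in the middle of a segment.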


I stopped the code with the debugger and inspected both the waveform buffer and the playback buffers. A cute side-effect of the fact that playback is strictly ping-pong DMA is that the output would continue to loop on the contents of the two buffers. Here is the reference scope trace of these two buffers to correspond with the later plots I generated of the data dumps:

The sine wave actually provides a very good guide for the buffer boundaries; the rising section of the sine is one buffer, while the falling section is another. I've aligned those with the scope division lines. You will see that the first buffer has a large number of glitchy data points, but it does appear to contain more of the updated waveform. At first glance, the second buffer appears to contain almost none of the waveform, until you notice that the one spike is just before the end of playback; that is the only data sample correctly updated in that part of the file.

The waveform buffer exactly matches what I wrote into it. The two ping-pong buffers, however, *mostly* match the output. I say mostly, because there are blocks of the data that the debugger appeared unable to read and wrote zeroes. Viewing in a memory window, no errors were reported; it just looked like an 'actual' zero. The debugger also seemed to read zeroes for ALL channels in the output buffer, not just the one I was having problems generating. However, with the scope data to prove what actually went to the Audio DAC -- especially the uninterrupted sine segment -- I can confirm that the data must have been there still.

Since the waveform contains all 4 buffer widths, it also contains data that has not yet been loaded to the ping-pong buffers, as illustrated in the plot. I also combined both ping-pong buffer contents into a single plot to show a direct comparison with the oscilloscope view. I'm also attaching the Excel file for reference; it contains a highlight where the buffer boundary exists.

Combined Data.xlsx

So, any ideas what might be happening? This seems to me to be something to do with cache, but I'm not sure how this ends up with the CPU and the DMA getting a different view of the same location in memory. The error appears to persist as well; when it was slowed down to 1 second writes, the DMA would have plenty of additional transfers to get the written data, but always saw the same result.


I will also be adding a second post with specific code examples for the above architecture. That post may not get completed until tomorrow, however.

  • Here is the promised code writeup. Note that I've stripped out some debug printouts to detect certain error conditions to keep the below listing as brief as I can make it. First, a series of #defines and definitions needed by the later code snippets:

    static uint32_t mcaspEdmaTxParameters[] = {8, 10};
    
    #define MCASP_NUM_SER                         (4)
    
    #define MCASP_BUFFER1_EDMA_PARAM 70
    #define MCASP_BUFFER2_EDMA_CH_PARAM 20
    #define MCASP_TRANSFER_EDMA_CH_PARAM 21
    
    #define MCASP_TRANSFER_EDMA_PARAM_START 71
    #define MCASP_TRANSFER_EDMA_MAX_PARAM 16
    
    #define MCASP_FIFO_WIDTH                      (4)
    #define CS4384_NUM_CHANNELS 8
    #define CS4384_BUFFERSIZE 30720
    #define CS4384_BUFFERRECORDS (CS4384_BUFFERSIZE / (MCASP_FIFO_WIDTH * CS4384_NUM_CHANNELS))
    
    #define WAVE_FORMAT_PCM 1
    #define ARB_SAMPLE_RATE 192000
    #define ARB_BYTE_WIDTH 2
    #define ARB_STORAGE_SIZE (CS4384_BUFFERRECORDS * 4 * ARB_BYTE_WIDTH)
    
    typedef struct wavfmtchunk
    {
        int16_t wFormatTag;
        uint16_t nChannels;
        uint32_t nSamplesPerSec;
        uint32_t nAvgBytesPerSec;
        uint16_t nBlockAlign;
        uint16_t wBitsPerSample;
        uint16_t cbSize;
        uint16_t wValidBitsPerSample;
        uint32_t dwChannelMask;
    
        // First two bytes of subformat are the data format code
        uint16_t SubFormatType;
        uint8_t SubFormat[14];
    } wavfmtchunk;
    
    typedef struct wavefile
    {
        wavfmtchunk format;
    
        size_t size;
        uint8_t * data;
    
        size_t playbackpos;
        size_t waveformpos;
    
        #ifdef DEBUG_ARB_UNSYNC_LOAD
        size_t loadpos;
        #endif
    
        bool needNextSegment;
    
    } wavefile;
    
    typedef struct DevCS4384
    {
        uint32_t sampleRate;
        uint16_t bitDepth;
        uint16_t slotSize;
        uint32_t playbackStartFrame;
        bool playbackEnabled;
    
        bool channelEnabled[CS4384_NUM_CHANNELS];
        bool channelLoaded[CS4384_NUM_CHANNELS];
        float channelVolume[CS4384_NUM_CHANNELS];
        wavefile chWaveData[CS4384_NUM_CHANNELS];
    
        I2CDev * i2cInstance;
        uint32_t mcASPinstance;
        uint32_t mclkRateHz;
    
        Gpio resetPin;
        Gpio clockEn;
    
        uint8_t buffer1[CS4384_BUFFERSIZE];
        uint8_t buffer2[CS4384_BUFFERSIZE];
    
        uint32_t isr1cnt;
        uint32_t isr2cnt;
        uint32_t transfercnt;
    
        uint32_t expectedcount;
    
        uint8_t lastMuteCommand;
        uint8_t lastChannelVolume[CS4384_NUM_CHANNELS];
    
    } DevCS4384;
    

    Here is the initial configuration of the EDMA channels and interrupt:

    static void DevCS4384EDMAInit(DevCS4384 * dev)
    {
        int32_t status;
    
        EDMAInit(SOC_EDMA30CC_0_REGS, 0);
    
        EDMAChConfig_t chConfig = {
                .region = 0,
                .paramIdx = mcaspEdmaTxParameters[dev->mcASPinstance],
                .queueNum = 0,
                .enableEvt = false,
                .enableIntr = true
        };
    
        // Allocate the channels for DMA transfers as used by the audio device
        status = EDMAChConfig(SOC_EDMA30CC_0_REGS, EDMA_CH_TYPE_DMA,
                mcaspEdmaTxParameters[dev->mcASPinstance], &chConfig);
    
        chConfig.paramIdx = MCASP_TRANSFER_EDMA_CH_PARAM;
        status = EDMAChConfig(SOC_EDMA30CC_0_REGS, EDMA_CH_TYPE_DMA,
                MCASP_TRANSFER_EDMA_CH_PARAM, &chConfig);
    
        chConfig.paramIdx = MCASP_BUFFER2_EDMA_CH_PARAM;
        status = EDMAChConfig(SOC_EDMA30CC_0_REGS, EDMA_CH_TYPE_DMA,
                MCASP_BUFFER2_EDMA_CH_PARAM, &chConfig);
    
        // Configure interrupt handler:
        Hwi_Params playbackParams;
        Hwi_Params_init(&playbackParams);
        playbackParams.enableInt = true;
        playbackParams.arg = (UArg)dev;
        playbackParams.priority = 254; // Lower the priority of this HWI...
    
        Hwi_create(SYS_INT_EDMACOMPINT, (Hwi_FuncPtr) PlaybackIsr, &playbackParams, NULL);
    }
    

    Here is configuring the ping-pong buffer and McASP EDMA parameters. Note this also uses DevCS4384LoadBuffer() to load the buffers. This is the same method the ISR uses and will be shown later.

    void DevCS4384PreparePlayback(DevCS4384 * dev, bool waitfortransfer)
    {
        int i;
        uint32_t status;
    
        dev->transfercnt = 0;
        for (i = 0; i < CS4384_NUM_CHANNELS; ++i)
        {
            dev->chWaveData[i].playbackpos = 0;
        }
    
        // Zero out and load the initial contents of the two playback buffers
        memset(dev->buffer1, 0, CS4384_BUFFERSIZE);
        memset(dev->buffer2, 0, CS4384_BUFFERSIZE);
    
        DevCS4384LoadBuffer(dev, 0, waitfortransfer);
        DevCS4384LoadBuffer(dev, 1, waitfortransfer);
    
        // Prepare the DMA settings for the playback
        {
            // Also note that any unused parameters will be zeroed by the compiler (e.g. CIDX).
            EDMAParamDataConfig_t bufferConfig = {
                .addrMode = DMA_XFER_DATA_ADDR_MODE_INC,
                .addrOff = {
                    .addr = (uint32_t)dev->buffer1,
                    .bCntIdx = MCASP_FIFO_WIDTH * MCASP_NUM_SER,
                },
                .size = {
                    .aCnt = MCASP_FIFO_WIDTH * MCASP_NUM_SER,
                    .bCnt = CS4384_BUFFERSIZE / (MCASP_FIFO_WIDTH * MCASP_NUM_SER),
                    .cCnt = 1
                },
                .syncType = EDMA_PARAM_SYNC_TYPE_A
            };
            EDMAParamDataConfig_t destinationConfig = {
                .addrMode = DMA_XFER_DATA_ADDR_MODE_INC,
                .addrOff = {
                    .addr = mcaspDataRegisters[dev->mcASPinstance],
                }
            };
            EDMAParamConfig_t param = {
                .pSrc = &bufferConfig,
                .pDst = &destinationConfig,
                .privType = EDMA_PARAM_PRIV_LVL_USER,
                .privId = 0,
                .enableLink = true,
                .enableStatic = false,
                .linkAddr = MCASP_BUFFER2_EDMA_CH_PARAM,
                .enableChain = false,
                .chainMask = EDMA_PARAM_XFER_TRIGGER_MASK_NONE,
                .tccMode = EDMA_PARAM_TCC_MODE_NORMAL,
                .tcc = mcaspEdmaTxParameters[dev->mcASPinstance],
                .intrMask = EDMA_PARAM_XFER_TRIGGER_MASK_COMPLETE
            };
    
            status = EDMAParamConfig(SOC_EDMA30CC_0_REGS, mcaspEdmaTxParameters[dev->mcASPinstance], &param);
    
            // Buffer 1 has the same contents as the initial PaRAM:
            status = EDMAParamConfig(SOC_EDMA30CC_0_REGS, MCASP_BUFFER1_EDMA_PARAM, &param);
    
            // Buffer 2 needs to change the link and interrupt channels as well as source address
            bufferConfig.addrOff.addr = (uint32_t)dev->buffer2;
            param.linkAddr = MCASP_BUFFER1_EDMA_PARAM;
            param.tcc = MCASP_BUFFER2_EDMA_CH_PARAM;
    
            status = EDMAParamConfig(SOC_EDMA30CC_0_REGS, MCASP_BUFFER2_EDMA_CH_PARAM, &param);
        }
    }
    

    Here is the ISR itself. As above, it uses DevCS4384LoadBuffer() to sort/load the next segment into the idle ping-pong buffer.

    static void PlaybackIsr(DevCS4384 * dev)
    {
        uint32_t value = 0;
    
        // Check which EDMA interrupt bits are set. This is a somewhat simplified implementation to only
        // check the channels we intend to use, and to not necessarily loop and clear all if we get
        // another set during the ISR. The ISR will immediately re-run in that case anyhow.
        value = EDMAIntrStatus(SOC_EDMA30CC_0_REGS, 0, EDMA_CH_SET_0_31);
    
        if (value & (1 << MCASP_TRANSFER_EDMA_CH_PARAM))
        {
            // Just count the number of transfer completes.
            //This is used for the wait for transfer complete feature
            ++dev->transfercnt;
            EDMAIntrClear(SOC_EDMA30CC_0_REGS, 0, MCASP_TRANSFER_EDMA_CH_PARAM);
        }
        if (value & 1 << mcaspEdmaTxParameters[dev->mcASPinstance])
        {
            // Buffer 1 finished transfer and buffer 2 is now in use. Re-load buffer 1
            ++dev->isr1cnt;
            DevCS4384LoadBuffer(dev, 0, false);
            EDMAIntrClear(SOC_EDMA30CC_0_REGS, 0, mcaspEdmaTxParameters[dev->mcASPinstance]);
        }
        if (value & 1 << MCASP_BUFFER2_EDMA_CH_PARAM)
        {
            // Buffer 2 finished transfer and buffer 1 is now in use. Re-load buffer 2
            ++dev->isr2cnt;
            DevCS4384LoadBuffer(dev, 1, false);
            EDMAIntrClear(SOC_EDMA30CC_0_REGS, 0, MCASP_BUFFER2_EDMA_CH_PARAM);
        }
    }
    

    Here is the load buffer method. It's a bit long, but it has to figure out what wave format and sizes to use to sort the data correctly into the ping-pong playback buffers. It also needs to detect rollover and set up a second transfer to handle the remaining data in that case.

    static void DevCS4384LoadBuffer(DevCS4384 * dev, uint_fast8_t buffernum, bool waitfortransfer)
    {
        uint32_t beforecnt = dev->transfercnt;
        int i;
        int paramnum = 0;
        int curparam = 0;
        wavefile * wave;
        uint8_t * buffer;
        uint32_t status;
    
        // Prepare initial EDMA PaRAM set with common options
        EDMAParamDataConfig_t indata = {
            .addrMode = DMA_XFER_DATA_ADDR_MODE_INC,
            .syncType = EDMA_PARAM_SYNC_TYPE_A,
            .size = { .cCnt = 1 }
        };
        EDMAParamDataConfig_t outdata = {
            .addrMode = DMA_XFER_DATA_ADDR_MODE_INC,
            .addrOff = {
                .bCntIdx = MCASP_FIFO_WIDTH * CS4384_NUM_CHANNELS
            }
        };
        EDMAParamConfig_t param = {
            .pSrc = &indata,
            .pDst = &outdata,
            .privType = EDMA_PARAM_PRIV_LVL_USER,
            .privId = 0,
            .enableLink = true,
            .enableStatic = false,
            .enableChain = true,
            .chainMask = EDMA_PARAM_XFER_TRIGGER_MASK_INTERMEDIATE | EDMA_PARAM_XFER_TRIGGER_MASK_COMPLETE,
            .tccMode = EDMA_PARAM_TCC_MODE_NORMAL,
            .tcc = MCASP_TRANSFER_EDMA_CH_PARAM,
            .intrMask = EDMA_PARAM_XFER_TRIGGER_MASK_COMPLETE
        };
    
        // Stop any existing transfer so we can start another
        status = EDMATransferStop(SOC_EDMA30CC_0_REGS, 0, MCASP_TRANSFER_EDMA_CH_PARAM,
                DMA_XFER_TRIGGER_TYPE_MANUAL);
    
        // Count the number of PaRAM needed
        for (i = 0; i < CS4384_NUM_CHANNELS; ++i)
        {
            if (dev->channelLoaded[i])
            {
                wave = &dev->chWaveData[i];
                if (wave->size - wave->playbackpos < CS4384_BUFFERRECORDS * wave->format.nBlockAlign)
                    // Transfer will need to wrap around
                    paramnum += 2;
                else
                    // Transfer will not wrap around
                    ++paramnum;
            }
        }
    
        // Select the correct output buffer for DMA.
        if (buffernum == 0)
            buffer = dev->buffer1;
        else
            buffer = dev->buffer2;
    
        // Prepare the actual PaRAM data.
        for (i = 0; i < CS4384_NUM_CHANNELS; ++i)
        {
            if (dev->channelLoaded[i])
            {
                wave = &dev->chWaveData[i];
    
                indata.addrOff.addr = (uint32_t)(wave->data + wave->playbackpos);
                indata.addrOff.bCntIdx = wave->format.nBlockAlign;
                indata.size.aCnt = wave->format.nBlockAlign;
                outdata.addrOff.addr = (uint32_t)(buffer + i * MCASP_FIFO_WIDTH);
    
                if (wave->size - wave->playbackpos
                        < CS4384_BUFFERRECORDS * wave->format.nBlockAlign)
                {
                    // Transfer will need to wrap around. Configure first PaRAM now:
                    indata.size.bCnt = (wave->size - wave->playbackpos) / wave->format.nBlockAlign;
    
                    // Link to the next param
                    param.linkAddr = MCASP_TRANSFER_EDMA_PARAM_START + curparam;
    
                    status = EDMAParamConfig(SOC_EDMA30CC_0_REGS, DevCS4384BufferPaRAM(curparam), &param);
                    ++curparam;
    
                    // Now configure second PaRAM
                    indata.addrOff.addr =(uint32_t)wave->data;
                    outdata.addrOff.addr = (uint32_t)(buffer + i * MCASP_FIFO_WIDTH + indata.size.bCnt * outdata.addrOff.bCntIdx);
                    indata.size.bCnt = CS4384_BUFFERRECORDS - indata.size.bCnt;
                    wave->playbackpos = indata.size.bCnt * wave->format.nBlockAlign;
    
                    // Link to the next PaRAM. If this is last, disable the link and final chain
                    param.linkAddr = MCASP_TRANSFER_EDMA_PARAM_START + curparam;
                    if (curparam >= paramnum - 1)
                    {
                        param.enableLink = false;
                        param.chainMask = EDMA_PARAM_XFER_TRIGGER_MASK_INTERMEDIATE;
                    }
    
                    status = EDMAParamConfig(SOC_EDMA30CC_0_REGS, DevCS4384BufferPaRAM(curparam), &param);
                    ++curparam;
                }
                else
                {
                    // Transfer will not wrap around
                    indata.size.bCnt = CS4384_BUFFERRECORDS;
                    wave->playbackpos += indata.size.bCnt * wave->format.nBlockAlign;
    
                    // Link to the next PaRAM. If this is last, disable the link and final chain
                    param.linkAddr = MCASP_TRANSFER_EDMA_PARAM_START + curparam;
                    if (curparam >= paramnum - 1)
                    {
                        param.enableLink = false;
                        param.chainMask = EDMA_PARAM_XFER_TRIGGER_MASK_INTERMEDIATE;
                    }
    
                    status = EDMAParamConfig(SOC_EDMA30CC_0_REGS, DevCS4384BufferPaRAM(curparam), &param);
                    ++curparam;
    
                    // If the transfer aligned exactly with the end of the wave file without needing a wrap-around, reset to the start
                    if (wave->size - wave->playbackpos == 0)
                    {
                        wave->playbackpos = 0;
                    }
                }
    
                // Set a flag for the foreground thread to calculate more data for the wavefile
                if (!waitfortransfer)
                {
                    wave->needNextSegment = true;
                }
            }
        }
    
        status = EDMATransferStart(SOC_EDMA30CC_0_REGS, 0, MCASP_TRANSFER_EDMA_CH_PARAM,
                DMA_XFER_TRIGGER_TYPE_MANUAL);
    
        dev->expectedcount = beforecnt + paramnum;
    
        if (waitfortransfer)
        {
            while (dev->transfercnt < dev->expectedcount )
            {
                // Wait for a transfer complete ISR for each PaRAM in the link set.
            }
        }
    }
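
The wrap-around decision inside the load routine can be isolated as plain arithmetic. The sketch below is a simplified restatement under the same assumptions as the listing (the wave data is at least one buffer long and its size is a multiple of the block alignment); `wrap_split` and `split_transfer` are illustrative names, not part of the project.

```c
#include <stddef.h>

typedef struct {
    size_t first_records;   /* records copied from playback_pos to the end of the data */
    size_t second_records;  /* records copied from the start of the data (0 if no wrap) */
    size_t next_pos;        /* byte offset at which the following load starts */
} wrap_split;

/* Split one buffer load of `records` records into at most two pieces,
 * depending on whether the read would run past the end of the wave data. */
static wrap_split split_transfer(size_t size, size_t playback_pos,
                                 size_t block_align, size_t records)
{
    wrap_split s;
    size_t remaining = (size - playback_pos) / block_align;

    if (remaining < records) {
        /* Wraps: finish the tail, then continue from offset 0. */
        s.first_records = remaining;
        s.second_records = records - remaining;
        s.next_pos = s.second_records * block_align;
    } else {
        /* No wrap; reset to 0 if the load ends exactly at the end. */
        s.first_records = records;
        s.second_records = 0;
        s.next_pos = playback_pos + records * block_align;
        if (s.next_pos == size) {
            s.next_pos = 0;
        }
    }
    return s;
}
```

A non-zero `second_records` corresponds to the case where the listing consumes two PaRAM sets for one channel.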
    

    Here is the wave data initialization method for arbitrary waveform playback. This is what creates the 4-buffer wide scratch area and sets the playback parameters so the above algorithms will know what bit width/sampling rate we are using.

    int32_t waveArbInit(wavefile * wave)
    {
        int32_t status = E_FAIL;
    
        memset(wave, 0, sizeof(wavefile));
    
        wave->format.wFormatTag = WAVE_FORMAT_PCM;
        wave->format.nChannels = 1;
        wave->format.nSamplesPerSec = ARB_SAMPLE_RATE;
        wave->format.nAvgBytesPerSec = ARB_SAMPLE_RATE * ARB_BYTE_WIDTH;
        wave->format.wBitsPerSample = ARB_BYTE_WIDTH * 8;
        wave->format.nBlockAlign = ARB_BYTE_WIDTH;
    
        wave->data = waveMalloc(ARB_STORAGE_SIZE);
    
        if ( wave->data != NULL )
        {
            wave->size = ARB_STORAGE_SIZE;
            status = S_PASS;
        }
    
        return status;
    }
    

    Note that the above uses a custom malloc-alike function called waveMalloc. I want to be able to wipe out the 'heap' and start over when the slave transitions from Pre-Op to Safe-Op, so I've allocated my own very large array (254 MB) using the regular malloc function on the heap, and I'm allocating space from that array to each wave file with waveMalloc.

    Here is the function that is called to generate a new segment of the arbitrary waveform. This function is called from my main thread when wave->needNextSegment is true for the corresponding wave storage area.

    void waveArbGenSine(wavefile * wave, int numbuffers, float amplitude, float period)
    {
        wave->needNextSegment = false;
    
        size_t numsamples = numbuffers * CS4384_BUFFERRECORDS;
        size_t loadpos = wave->playbackpos;
    
        // For normal operation (not initialization startup), load one buffer ahead of the DMA
        if (1 == numbuffers)
        {
            loadpos += CS4384_BUFFERRECORDS * ARB_BYTE_WIDTH;
            if (loadpos >= wave->size) loadpos = 0;
        }
    
        volatile int16_t * overlay = (int16_t *)&wave->data[loadpos];
        size_t i;
        size_t periodSamples = ARB_SAMPLE_RATE * period;
    
        // The following loop assumes the local wave storage is aligned with
        // the playback buffer size (which should be the case by design).
        // As such, it does not attempt to detect wave file rollover
        for (i = 0; i < numsamples; ++i)
        {
            overlay[i] = (i + wave->waveformpos);
        }
        
        wave->waveformpos = (wave->waveformpos + numsamples) % periodSamples;
    }
    

  • Is that Linux or bare metal?

    Are the buffers cacheable (fast) or non-cacheable (slow)?

    If the buffers are cacheable, where is the DMA mapping/unmapping code?

  • It is almost bare metal: running SYS/BIOS version 6.42.03.35 with the EtherCAT stack.

    I've placed the buffers in the DDR3 memory along with everything else. Cache is enabled in the sys/bios config file, and I believe SYS/BIOS is supposed to be handling it in general (though I guess not very well!). I didn't write any DMA code myself for cache; all my DMA code is above.

    Here is my sys/bios app.cfg script for reference:

    var Defaults = xdc.useModule('xdc.runtime.Defaults');
    var Diags = xdc.useModule('xdc.runtime.Diags');
    var Error = xdc.useModule('xdc.runtime.Error');
    var Main = xdc.useModule('xdc.runtime.Main');
    var Memory = xdc.useModule('xdc.runtime.Memory');
    var SysMin = xdc.useModule('xdc.runtime.SysMin');
    var System = xdc.useModule('xdc.runtime.System');
    var Text = xdc.useModule('xdc.runtime.Text');
    
    var BIOS = xdc.useModule('ti.sysbios.BIOS');
    var Clock = xdc.useModule('ti.sysbios.knl.Clock');
    var Swi = xdc.useModule('ti.sysbios.knl.Swi');
    var Task = xdc.useModule('ti.sysbios.knl.Task');
    var Mailbox = xdc.useModule('ti.sysbios.knl.Mailbox');
    var Semaphore = xdc.useModule('ti.sysbios.knl.Semaphore');
    var Hwi = xdc.useModule('ti.sysbios.hal.Hwi');
    var Timer = xdc.useModule('ti.sysbios.hal.Timer');
    var HeapStd = xdc.useModule('xdc.runtime.HeapStd');
    var Cache = xdc.useModule('ti.sysbios.family.arm.a8.Cache')
    var Mmu = xdc.useModule('ti.sysbios.family.arm.a8.Mmu');
    
    var SemihostSupport = xdc.useModule('ti.sysbios.rts.gnu.SemiHostSupport');
    var ReentSupport = xdc.useModule('ti.sysbios.rts.gnu.ReentSupport');
    var ti_sysbios_hal_Cache = xdc.useModule('ti.sysbios.hal.Cache');
    var GIO = xdc.useModule('ti.sysbios.io.GIO');
    var memmap = prog.cpu.memoryMap;
    
    // Enable the cache
    Cache.enableCache = true;
    
    // Correct DDR3 size to the proper 256 MB (2 Gbit)
    memmap["DDR3"].len = 0x10000000;
    
    
    /* 
     * Program.argSize sets the size of the .args section. 
     * The examples don't use command line args so argSize is set to 0.
     */
    Program.argSize = 0x0;
    
    /*
     * Minimize exit handler array in System.  The System module includes
     * an array of functions that are registered with System_atexit() to be
     * called by System_exit().
     */
    System.maxAtexitHandlers = 1;       
    
    /*
     * The BIOS module will create the default heap for the system.
     * Specify the size of this default heap.
     * Note: This will generate a warning with the std heap, but will not have any effect.
     See e2e.ti.com/.../53428 for possible alternate solution
     */
    BIOS.heapSize = 32000; //32000
    
    
    /* System stack size (used by ISRs and Swis) */
    Program.stack = 0x2000;
    
    /* System heap size. Allocate 255 MB, leaving 1 MB for the application */
    prog.heap = 0xFF00000;
    
    /* Circular buffer size for System_printf() */
    SysMin.bufSize = 0x200;
    
    Swi.common$.namedInstance = true;
    
    /*
     * Build a custom BIOS library.  The custom library will be smaller than the
     * pre-built "instrumented" (default) and "non-instrumented" libraries.
     *
     * The BIOS.logsEnabled parameter specifies whether the Logging is enabled
     * within BIOS for this custom build.  These logs are used by the RTA and
     * UIA analysis tools.
     *
     * The BIOS.assertsEnabled parameter specifies whether BIOS code will
     * include Assert() checks.  Setting this parameter to 'false' will generate
     * smaller and faster code, but having asserts enabled is recommended for
     * early development as the Assert() checks will catch lots of programming
     * errors (invalid parameters, etc.)
     */
    BIOS.libType = BIOS.LibType_Custom;
    BIOS.customCCOpts = BIOS.customCCOpts.replace(" -g ","");
    BIOS.customCCOpts += "-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard -mabi=aapcs -O3 -Wunused -Wunknown-pragmas -ffunction-sections -fdata-sections -g -Dti_sysbios_Build_useHwiMacros -Dfar= -D__DYNAMIC_REENT__"
    
    BIOS.assertsEnabled = false;
    Clock.tickPeriod = 1000;
    Hwi.dispatcherSwiSupport = true;
    Hwi.dispatcherTaskSupport = true;
    Hwi.dispatcherAutoNestingSupport = true;
    Hwi.initStackFlag = false;
    Hwi.checkStackFlag = false;
    BIOS.logsEnabled = false;
    BIOS.swiEnabled = true;
    Task.enableIdleTask = false;
    Task.initStackFlag = false;
    Task.checkStackFlag = false;
    BIOS.cpuFreq.lo = 600000000;
    
    Program.linkTemplate = java.lang.System.getenv("IA_SDK_HOME") + "/protocols/ethercat_slave/ecat_appl/ecat_appl.xdt";
    BIOS.rtsGateType = BIOS.NoLocking;
    
    Program.sectMap[".c_int00"] = new Program.SectionSpec();
    Program.sectMap[".c_int00"].loadAddress = 0x80000000;
    

  • Hmm.

    Do you have virtual addresses? If you allocate a buffer with malloc(), is this a physically contiguous space?

    You need to do some cache maintenance with DMA.

    Sending data to hardware:

    memset(buffer, content, size);

    cache_flush(buffer, size);

    do_dma(buffer);

    Receiving data from hardware:

    do_dma(buffer);

    cache_invalidate(buffer, size);

    memcpy(dest, buffer, size);

    I cannot be more specific, because I only know how to do it in Linux. But the procedure is always the same...
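
To see why the order above matters, here is a deliberately simplified toy model of a write-back cache. It is only an illustration: every name in it is made up, it tracks single bytes instead of cache lines, and it never evicts anything on its own (a real cache does, at unpredictable times, which would explain why some parts of a buffer reach RAM and others do not).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MEM_BYTES 64

typedef struct {
    uint8_t ram[MEM_BYTES];    /* what a DMA engine reads */
    uint8_t cache[MEM_BYTES];  /* CPU-side copies of written bytes */
    bool    dirty[MEM_BYTES];  /* true while a byte exists only in the cache */
} toy_mem;

static void toy_init(toy_mem *m)
{
    memset(m, 0, sizeof *m);
}

/* CPU writes land in the cache and are not yet visible in RAM. */
static void cpu_write(toy_mem *m, size_t addr, uint8_t value)
{
    m->cache[addr] = value;
    m->dirty[addr] = true;
}

/* The CPU (and a debugger going through it) sees the cached value. */
static uint8_t cpu_read(const toy_mem *m, size_t addr)
{
    return m->dirty[addr] ? m->cache[addr] : m->ram[addr];
}

/* DMA bypasses the cache and reads RAM directly. */
static uint8_t dma_read(const toy_mem *m, size_t addr)
{
    return m->ram[addr];
}

/* Write-back ("flush") copies dirty bytes out to RAM. */
static void cache_writeback(toy_mem *m, size_t addr, size_t len)
{
    for (size_t i = addr; i < addr + len && i < MEM_BYTES; ++i) {
        if (m->dirty[i]) {
            m->ram[i] = m->cache[i];
            m->dirty[i] = false;
        }
    }
}
```

In this model, after `cpu_write()` the CPU reads back the new value while `dma_read()` still returns the old one; only after `cache_writeback()` do the two views agree.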

  • I don't believe I have virtual addresses, but I'm not sure exactly what the MMU does behind my back :). The buffer is indeed a continuous space.

    I actually did some reading on the SYS/BIOS API calls and experiments while you were posting that. I had to add the following call after my buffer fill:
    Cache_wb(<address>, <size>, Cache_Type_ALL, false);

    I'm a bit surprised that the memory manager is doing such a poor job keeping the cache and RAM in sync. Per my above testing, I let the memory sit for 1 second before writing the next segment, and it was never fully written out to RAM. Either way, I'm unstuck and can continue with my development, so thanks for pointing me in the right direction :)
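
A sketch of where such a write-back could sit relative to the buffer fill is shown below. It is an assumption-laden illustration rather than the project's code: `fill_and_publish`, `WRITE_BACK` and the `USE_SYSBIOS_CACHE` build flag are hypothetical, the host branch only counts bytes so the sketch compiles without SYS/BIOS, and the target branch passes TRUE for the wait argument (the call above passes false) so that the function returns only after the write-back has completed.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef USE_SYSBIOS_CACHE            /* hypothetical flag for the target build */
#include <xdc/std.h>
#include <ti/sysbios/hal/Cache.h>
#define WRITE_BACK(ptr, bytes) \
    Cache_wb((Ptr)(ptr), (SizeT)(bytes), Cache_Type_ALL, TRUE)
#else
/* Host stand-in: no cache maintenance, just a byte counter for testing. */
static size_t written_back_bytes;
#define WRITE_BACK(ptr, bytes) ((void)(ptr), written_back_bytes += (bytes))
#endif

/* Fill one segment with a ramp starting at `first`, then push it out of
 * the data cache so a later DMA read of the same memory sees the new data. */
static void fill_and_publish(int16_t *segment, size_t samples, int16_t first)
{
    for (size_t i = 0; i < samples; ++i) {
        segment[i] = (int16_t)(first + (int)i);
    }
    WRITE_BACK(segment, samples * sizeof segment[0]);
}
```

The point of the ordering is that the write-back comes after the last CPU store to the segment and before the DMA transfer that reads it is started.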

  • Don't blame the memory manager. It is up to you to control the cache if you are doing DMA. This cannot be done automatically. It's the same on ALL high-performance CPUs today.

    Glad to hear that you solved the problem.