Regarding multiple DMA transfers on DM6446

Hi,

I want to avoid spending time waiting for a DMA transfer to finish, and at the same time I don't want to use an ISR to service the DMA interrupt.  Instead I am doing the following:

1. Fire DMA1, DMA2, DMA3 and DMA4. Instead of checking the TCC field of any of the fired DMAs (1-4), the DSP does some other processing, and in the middle of that processing I check the TCC field of each DMA (1-4); there is a rough code sketch of this pattern after question 4. Does this cause any problem?

2. My source is in DDR and my destination is in L2 SRAM/L1 SRAM. Are there any potential cache stalls for these transfers if I fire DMA (1-4) one after the other?

3. Do you think this approach is far better than waiting for each DMA (the one that is already triggered/fired) to finish before firing/triggering another DMA?

4. What are the trade-offs of each method (i.e. waiting for each DMA to finish before I fire another DMA, versus the method I described in step 1)?
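
To make question 1 concrete, here is roughly the pattern I have in mind, written only as a sketch. edma_trigger_channel(), edma_transfer_complete() and edma_clear_tcc() are hypothetical placeholders for whatever EDMA register accesses or driver calls are actually used to trigger a channel, test the pending bit for its TCC, and clear that bit; they are not real CSL functions.

```c
/* Sketch: fire several EDMA channels, do unrelated work, then poll for
 * completion; no ISR is involved.  The edma_* helpers are hypothetical. */

#define NUM_TRANSFERS 4

extern void edma_trigger_channel(int channel);    /* start the transfer         */
extern int  edma_transfer_complete(int tcc);      /* test the TCC's pending bit */
extern void edma_clear_tcc(int tcc);              /* clear the bit for reuse    */
extern void do_other_processing(void);

void fire_and_poll(const int channel[NUM_TRANSFERS], const int tcc[NUM_TRANSFERS])
{
    int i;
    int done[NUM_TRANSFERS] = { 0 };
    int remaining = NUM_TRANSFERS;

    /* Fire all four transfers back to back. */
    for (i = 0; i < NUM_TRANSFERS; i++) {
        edma_trigger_channel(channel[i]);
    }

    /* Let the CPU do other work while the EDMA moves the data. */
    do_other_processing();

    /* Now poll the completion status of each transfer. */
    while (remaining > 0) {
        for (i = 0; i < NUM_TRANSFERS; i++) {
            if (!done[i] && edma_transfer_complete(tcc[i])) {
                edma_clear_tcc(tcc[i]);
                done[i] = 1;
                remaining--;
            }
        }
    }
}
```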

 

 

  • 1. I do not see any issue with reading the DMA parameters while the DMAs are in progress; modifying the values while a transfer is in flight may cause problems, however.

    2. I am not entirely sure what you mean by cache stalls. The data transferred by DMA is not recognized by the DSP's cache, so you need to perform a cache invalidate to ensure it sees the newly arrived data and is not looking at stale cached data. If you mean stalling the CPU, then you could potentially see delays, since the CPU may be competing with the DMA for bandwidth, particularly on the DDR interface.

    3. As long as the transfers can be made independently and do not need to happen in order, this is a fine solution. You may also want to consider using chaining for your DMA transfers, so that each transfer triggers the following one instead of starting all of them simultaneously; this would allow the transfers to happen in order without using an ISR (there is a rough sketch after this list).

    4. If you wait on each transfer and use an ISR, you spend some extra CPU time, but the advantage is more control, since you get to write the ISR (for example, having the ISR unblock a thread to begin processing the first piece of data). The advantage of the first option is that you need no ISR and all transfers can begin immediately; however, not all of them will necessarily complete in order, and you may have to wait for all of them to complete before you begin using the data.
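
    To illustrate the chaining idea in point 3, here is a rough sketch. edma_configure_transfer(), edma_chain_to() and edma_trigger_channel() are hypothetical helpers, not actual CSL or driver functions; on the DM6446's EDMA3, chaining is set up through the TCC and TCCHEN bits in the OPT word of each channel's PaRAM entry, so check the EDMA chapter of the device documentation for the exact fields.

    ```c
    /* Sketch: chain four EDMA transfers so that the completion of one
     * automatically triggers the next.  All edma_* helpers are hypothetical. */

    extern void edma_configure_transfer(int channel, const void *src,
                                        void *dst, unsigned bytes);
    extern void edma_chain_to(int channel, int next_channel); /* OPT: TCC = next, TCCHEN = 1 */
    extern void edma_trigger_channel(int channel);

    void start_chained_transfers(const void *src[4], void *dst[4],
                                 const unsigned bytes[4])
    {
        int ch;

        /* Program source/destination/count for channels 1..4. */
        for (ch = 1; ch <= 4; ch++) {
            edma_configure_transfer(ch, src[ch - 1], dst[ch - 1], bytes[ch - 1]);
        }

        /* Completion of each channel triggers the next one. */
        edma_chain_to(1, 2);
        edma_chain_to(2, 3);
        edma_chain_to(3, 4);

        /* Only the first transfer is started manually; when channel 4's TCC
         * reports complete, all four transfers have finished, in order. */
        edma_trigger_channel(1);
    }
    ```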

  • I think you've got the right idea.  You want the CPU to do other work while the DMA does the transfer.  Otherwise you might as well do a CPU copy.

    There's no problem kicking off multiple transfers at once, provided there are no dependencies between the data.

    If the CPU is operating on data buffers located in INTERNAL memory then you don't need to worry about cache coherence as it will be maintained by the hardware. 

    If the CPU is operating on buffers in EXTERNAL memory then you need to take some extra precautions (there's a short code sketch after this list):

    • Align your buffers to an L2 line boundary (128 bytes)
    • Make sure you allocate buffers whose length is a multiple of an L2 line (128 bytes)
    • Do a "block invalidate" before using the CPU to read from the buffer after it has been filled by the DMA.
    • Do a "block writeback" after using the CPU to write to a buffer that is about to be grabbed by the DMA.
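
    Here is a minimal sketch of how the allocation and the two coherence calls could look. I'm assuming DSP/BIOS and its BCACHE module here (BCACHE_inv / BCACHE_wb); if you're not using DSP/BIOS, substitute the equivalent cache calls from your CSL. The section name ".ddrbufs" is only an example; place the buffers wherever your linker command file maps external memory.

    ```c
    /* Buffers in EXTERNAL memory shared between the CPU and the EDMA:
     * aligned to an L2 line (128 bytes) and sized in whole lines. */

    #include <std.h>      /* DSP/BIOS types (TRUE, ...) */
    #include <bcache.h>   /* BCACHE_inv(), BCACHE_wb()  */

    #define BUF_SIZE (10 * 128)                   /* multiple of a 128-byte line */

    #pragma DATA_ALIGN(inBuf, 128)
    #pragma DATA_ALIGN(outBuf, 128)
    #pragma DATA_SECTION(inBuf, ".ddrbufs")       /* example DDR section name */
    #pragma DATA_SECTION(outBuf, ".ddrbufs")
    unsigned char inBuf[BUF_SIZE];
    unsigned char outBuf[BUF_SIZE];

    void touch_buffers_safely(void)
    {
        /* The DMA has just filled inBuf: invalidate it so the CPU reads the
         * fresh data from DDR instead of a stale cached copy. */
        BCACHE_inv(inBuf, BUF_SIZE, TRUE);

        /* ... CPU reads inBuf and writes results into outBuf ... */

        /* Write back outBuf so the CPU's results reach DDR before the DMA
         * is allowed to read the buffer. */
        BCACHE_wb(outBuf, BUF_SIZE, TRUE);
    }
    ```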

    If you have any follow-up questions please give some additional details such as whether this will be an EDMA or QDMA transfer and if there are any special data dependencies.

    Brad

  • Thanks for the detailed reply.  I am a little bit confused about: 

    • Do a "block invalidate" before using the CPU to read from the buffer after it has been filled by the DMA.
    • Do a "block writeback" after using the CPU to write to a buffer that is about to be grabbed by the DMA.

    Can you explain your suggestions in somewhat more detail? Also, if I trigger multiple data transfers at once and they all use the same DDR interface (with different source and destination buffers), but I don't wait for them to complete:

    1. Do they transfer the data in parallel or sequentially (one transfer starting after the completion of the other, in chained fashion)?

    2. Are there any major stalls while the DMAs are transferring data?

    Next, regarding EDMA vs. QDMA: I will be using EDMA (to take advantage of the larger number of possible logical channels in EDMA vs. QDMA) with 8 logical channels.

     

  • Here's an example to demonstrate.  Let's say you're doing some audio filtering.  The data comes in through the McBSP and gets transferred to an IN buffer by the EDMA.  The CPU filters the IN buffer and writes the results to an OUT buffer.  Finally the EDMA transfers the OUT buffer back out the McBSP.  (I am ignoring ping/pong buffers for simplicity.)  Here's the order things would need to happen, with a rough code sketch after the list:

    1. EDMA transfers data from McBSP to DDR, filling the IN buffer.
    2. BEFORE READING THE "IN" BUFFER, the CPU does a "block invalidate" (see Cache User's Guide) on IN.
    3. CPU filters the data from the IN buffer with the results being written to the OUT buffer.
    4. AFTER WRITING THE "OUT" BUFFER, the CPU does a "block writeback" (see Cache User's Guide) on OUT.
    5. EDMA transfers data from OUT buffer, located in DDR, to the McBSP.
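
    In code, one frame of that sequence could look roughly like this. wait_for_in_transfer(), filter() and start_out_transfer() are placeholders for your own EDMA/McBSP handling, and BCACHE_inv / BCACHE_wb are the DSP/BIOS block cache operations (use whichever cache API your system provides). IN and OUT are assumed to live in external memory, aligned to and sized in 128-byte L2 lines.

    ```c
    /* Per-frame ordering for the McBSP -> IN -> filter -> OUT -> McBSP example. */

    #include <std.h>      /* TRUE */
    #include <bcache.h>   /* BCACHE_inv(), BCACHE_wb() */

    #define FRAME_BYTES 1024                 /* a multiple of the 128-byte L2 line */

    extern unsigned char IN[FRAME_BYTES];    /* filled by the EDMA from the McBSP  */
    extern unsigned char OUT[FRAME_BYTES];   /* drained by the EDMA to the McBSP   */

    extern void wait_for_in_transfer(void);  /* step 1 finishes here (placeholder) */
    extern void filter(const unsigned char *in, unsigned char *out, int n);
    extern void start_out_transfer(void);    /* kicks off step 5 (placeholder)     */

    void process_one_frame(void)
    {
        /* 1. EDMA moves a frame from the McBSP into IN (in DDR). */
        wait_for_in_transfer();

        /* 2. Block-invalidate IN so the CPU reads the fresh DDR data. */
        BCACHE_inv(IN, FRAME_BYTES, TRUE);

        /* 3. CPU filters IN into OUT. */
        filter(IN, OUT, FRAME_BYTES);

        /* 4. Block-writeback OUT so the EDMA sees the CPU's results in DDR. */
        BCACHE_wb(OUT, FRAME_BYTES, TRUE);

        /* 5. EDMA moves OUT from DDR back out through the McBSP. */
        start_out_transfer();
    }
    ```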

    Again, I want to emphasize that user-initiated cache coherence operations are only necessary when both the CPU and a DMA are operating on a buffer in EXTERNAL memory.

    Brad