TDA4VM: (C7X) Using UDMA Multi-Channels (DRU) simultaneously

Part Number: TDA4VM


Hi,

SDK version: SDK-RTOS-J721E-EVM-08_05_00_11

We are trying to trigger multiple transfers, each on a separate channel (with DRU enabled). Instead of the appUdmaCopyNDWait function we use a custom function, appUdmaCopyNDIsFinished, which checks the transfer status without blocking until the transfer finishes. The goal is to avoid iterating through multiple copy operations on a single channel, in the hope of getting better performance out of the UDMA DRU channels.

The copy operations work correctly; however, there was no performance improvement compared to the classic loop version that triggers each transfer and waits for it.

Is this the expected behaviour of the DMA, or is there a way to fix our solution and get better performance out of it?

  • Hi Amine,

    I did not quite follow. Which performance do you expect to improve here? Also, what does checking the transfer status without waiting for it to finish mean here? Are you waiting on the TR response or something?

    The DRU, when used with multiple channels, can give better DDR bandwidth, but I am not sure what performance improvement you are expecting here.

    Regards,

    Brijesh  

  • Hi Brijesh,

    Here is a better explanation of our code:

    • We are using the app_udma driver provided with the SDK.
    • We are using 8 DRU channels (channel_idx >= (APP_UDMA_ND_CHANNELS_MAX / 2))
    • We are trying to perform 8 copy operations at once.

    Here is a state machine for our code:

    • STEP1:
      • for each channel: Initialize the copy operation using appUdmaCopyNDInit
    • STEP2:
      • for each channel: Trigger the copy operation using appUdmaCopyNDTrigger
    • STEP3:
      • for each channel: Check the copy operation status using appUdmaCopyNDIsFinished
        • If the operation is done and there is remaining data to copy on the given channel:
          • Initialize and trigger (appUdmaCopyNDInit + appUdmaCopyNDTrigger) the copy operation for the remaining data and stay in STEP3.
        • If the operation is done and there is no remaining data, the copy is complete on the given channel; stay in STEP3.
      • If all copy operations are done on all channels, return.
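
    The state machine above can be sketched in C roughly as follows. This is a host-side simulation only, not the actual code: the sim_* names, the 3-poll completion latency and the chunk size are assumptions made for illustration, and the real version would call appUdmaCopyNDInit, appUdmaCopyNDTrigger and the custom appUdmaCopyNDIsFinished (whose signatures are not reproduced here) in place of the stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CH 8u

/* Hypothetical stand-in for one DRU channel; NOT the app_udma handle type. */
typedef struct {
    uint32_t remaining;   /* bytes not yet submitted on this channel     */
    uint32_t in_flight;   /* bytes in the currently triggered transfer   */
    uint32_t polls_left;  /* simulated completion latency, in polls      */
    uint32_t copied;      /* bytes completed so far                      */
    bool     busy;        /* true while a transfer is outstanding        */
} sim_channel_t;

/* Stand-in for appUdmaCopyNDInit + appUdmaCopyNDTrigger (assumption). */
static void sim_init_and_trigger(sim_channel_t *ch, uint32_t chunk)
{
    ch->in_flight  = (ch->remaining < chunk) ? ch->remaining : chunk;
    ch->remaining -= ch->in_flight;
    ch->polls_left = 3u;  /* arbitrary simulated latency */
    ch->busy       = true;
}

/* Stand-in for the custom, non-blocking appUdmaCopyNDIsFinished. */
static bool sim_is_finished(sim_channel_t *ch)
{
    if (ch->polls_left > 0u) {
        ch->polls_left--;
        return false;
    }
    return true;
}

/* Copies `total` bytes on each of `n` channels in pieces of at most
 * `chunk` bytes. Returns the number of polling passes over the channels. */
static uint32_t copy_all_polling(sim_channel_t ch[], size_t n,
                                 uint32_t total, uint32_t chunk)
{
    uint32_t passes = 0u;
    size_t   active = 0u;

    assert(chunk > 0u);

    /* STEP1 + STEP2: initialize and trigger every channel up front. */
    for (size_t i = 0; i < n; i++) {
        ch[i] = (sim_channel_t){ .remaining = total };
        if (total > 0u) {
            sim_init_and_trigger(&ch[i], chunk);
            active++;
        }
    }

    /* STEP3: poll each busy channel without blocking on any of them. */
    while (active > 0u) {
        passes++;
        for (size_t i = 0; i < n; i++) {
            if (!ch[i].busy || !sim_is_finished(&ch[i])) {
                continue;
            }
            ch[i].copied += ch[i].in_flight;
            if (ch[i].remaining > 0u) {
                sim_init_and_trigger(&ch[i], chunk);  /* more data: re-arm */
            } else {
                ch[i].busy = false;                   /* channel is done   */
                active--;
            }
        }
    }
    return passes;
}
```

    Calling copy_all_polling(ch, NUM_CH, 400u * 1024u, 64u * 1024u) leaves 400 * 1024 bytes marked as copied on every simulated channel. In this simulation the channels progress independently of each other, which real DRU channels sharing the same DDR may not do.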

    We implemented this state machine to avoid performing the 8 copy operations sequentially and waiting for each one in turn, that is:

    • For each channel:
      • Initialize the copy operation using appUdmaCopyNDInit
      • Trigger the copy operation using appUdmaCopyNDTrigger
      • Wait for the operation to finish using appUdmaCopyNDWait
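
    For comparison, the sequential trigger-and-wait loop can be sketched the same way. Again this is a simulation under assumptions: the sim_* functions and the 3-poll latency are hypothetical stand-ins, and sim_wait only illustrates a blocking wait built on a status check; it is not a description of how appUdmaCopyNDWait is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DRU channel; NOT the app_udma handle type. */
typedef struct {
    uint32_t polls_left;  /* simulated completion latency, in polls */
    uint32_t copied;      /* bytes completed on this channel        */
} sim_channel_t;

/* Stand-in for appUdmaCopyNDInit + appUdmaCopyNDTrigger (assumption). */
static void sim_trigger(sim_channel_t *ch)
{
    ch->polls_left = 3u;  /* arbitrary simulated latency */
}

/* Non-blocking status check on one channel. */
static bool sim_is_finished(sim_channel_t *ch)
{
    if (ch->polls_left > 0u) {
        ch->polls_left--;
        return false;
    }
    return true;
}

/* Blocking wait: spins until the channel reports completion.
 * Returns how many status checks were needed. */
static uint32_t sim_wait(sim_channel_t *ch)
{
    uint32_t polls = 1u;  /* counts the final, successful check */
    while (!sim_is_finished(ch)) {
        polls++;
    }
    return polls;
}

/* Sequential version: channel i+1 is not triggered until channel i
 * has finished. Returns the total number of status checks. */
static uint32_t copy_all_sequential(sim_channel_t ch[], size_t n,
                                    uint32_t bytes)
{
    uint32_t total_polls = 0u;
    for (size_t i = 0; i < n; i++) {
        ch[i].copied = 0u;
        sim_trigger(&ch[i]);
        total_polls += sim_wait(&ch[i]);
        ch[i].copied = bytes;
    }
    return total_polls;
}
```

    The only structural difference from the polling state machine is that no transfer is triggered while another one is still outstanding.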

    By triggering all of these operations at once and not waiting for them to finish one by one, we hoped to get better copy performance from the DRU channels.

    I hope this gives you a better understanding of my previous comment.

    Regards

  • Hi Amine,

    OK, what is the size of the data that you are transferring? Are the sizes different for each transfer? Also, what performance difference do you see between the parallel and sequential transfers?

    Regards,

    Brijesh

  • Hi Brijesh,

    We are copying 400 * 1024 bytes on each channel, using 8 DRU channels.

    Parallel transfer: 513 us
    Sequential transfer: 555 us

    So the performance is nearly the same for both.

    It should be noted that all copy operations are done from DDR to DDR.
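
    To put these numbers in perspective, the effective copy throughput can be worked out from the figures above. The helper names below are made up for this illustration. With 8 * 400 * 1024 = 3,276,800 bytes in total, the parallel case comes to roughly 6.39 GB/s and the sequential case to roughly 5.90 GB/s, so the parallel version takes about 7.6% less time. Since every byte is both read from and written to DDR, the DDR traffic is twice the copied size.

```c
#include <assert.h>
#include <stdint.h>

/* Total payload across all channels, in bytes. */
static uint64_t total_bytes(uint32_t channels, uint32_t bytes_per_channel)
{
    return (uint64_t)channels * bytes_per_channel;
}

/* Effective copy throughput in MB/s (1 MB = 10^6 bytes).
 * Bytes per microsecond is numerically equal to MB per second. */
static double throughput_mb_per_s(uint64_t bytes, double elapsed_us)
{
    assert(elapsed_us > 0.0);
    return (double)bytes / elapsed_us;
}

/* Percentage of time saved by the parallel version
 * relative to the sequential one. */
static double time_saved_percent(double sequential_us, double parallel_us)
{
    assert(sequential_us > 0.0);
    return 100.0 * (sequential_us - parallel_us) / sequential_us;
}
```

    For example, throughput_mb_per_s(total_bytes(8u, 400u * 1024u), 513.0) gives the parallel figure and time_saved_percent(555.0, 513.0) gives the relative saving.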

    Regards