TDA4VM: (C7X) Using UDMA Multi-Channels (DRU) simultaneously

Amine Hamidi

Part Number: TDA4VM

Hi,

SDK version: SDK-RTOS-J721E-EVM-08_05_00_11

We are trying to trigger multiple transfers, each on a separate channel (with DRU enabled). Instead of the appUdmaCopyNDWait function, we use a custom function, appUdmaCopyNDIsFinished, which checks the transfer status without blocking until the transfer finishes. We did this to avoid iterating through multiple copy operations on a single channel, in the hope of getting better performance out of the UDMA DRU channels.

The copy operations work correctly; however, there was no performance improvement compared to the classic loop of trigger-and-wait.

Is this expected behaviour for the DMA, or is there a way to fix our solution and get better performance out of it?

over 2 years ago

Brijesh Jadav over 2 years ago

TI Guru 484355 points

Hi Amine,

I did not follow. Which performance do you expect to improve here? I also did not understand what checking the transfer status without waiting for it to finish means here. Are you waiting on the TR response, or something else?

Using the DRU with multiple channels can give better DDR bandwidth, but I am not sure what performance improvement you are expecting here.

Regards,

Brijesh

Amine Hamidi over 2 years ago

Prodigy 145 points

Hi Brijesh,

Here is a better explanation of our code:

We are using the app_udma driver provided with the SDK.
We are using 8 DRU channels (channel_idx >= (APP_UDMA_ND_CHANNELS_MAX / 2))
We are trying to perform 8 copy operations at once.
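
For readers following along, the channel-selection condition above can be sketched as below. This is only an illustration under stated assumptions: the fallback value 16 for APP_UDMA_ND_CHANNELS_MAX is assumed so the sketch compiles without the SDK headers, and is_dru_channel and pick_dru_channels are hypothetical helper names, not part of app_udma.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in value used only when the SDK header is not included. */
#ifndef APP_UDMA_ND_CHANNELS_MAX
#define APP_UDMA_ND_CHANNELS_MAX (16u)
#endif

/* Hypothetical helper: true when the index is in the upper half of the
 * ND channel range, which is the condition quoted in the post. */
static bool is_dru_channel(uint32_t channel_idx)
{
    return (channel_idx >= (APP_UDMA_ND_CHANNELS_MAX / 2u)) &&
           (channel_idx < APP_UDMA_ND_CHANNELS_MAX);
}

/* Hypothetical helper: writes up to `wanted` DRU channel indices into out[]
 * and returns how many were written. */
static uint32_t pick_dru_channels(uint32_t out[], uint32_t wanted)
{
    uint32_t count = 0u;
    uint32_t idx;

    assert(out != NULL);
    for (idx = 0u; (idx < APP_UDMA_ND_CHANNELS_MAX) && (count < wanted); idx++) {
        if (is_dru_channel(idx)) {
            out[count++] = idx;
        }
    }
    return count;
}
```

With the assumed value of 16, asking for 8 channels yields indices 8 through 15.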

Here is a state machine for our code:

STEP1:
- for each channel: Initialize the copy operation using appUdmaCopyNDInit
STEP2:
- for each channel: Trigger the copy operation using appUdmaCopyNDTrigger
STEP3:
- for each channel: Check the copy operation status using appUdmaCopyNDIsFinished
  - If the operation is done and there is remaining data to copy on the given channel:
    - Initialize and trigger (appUdmaCopyNDInit + appUdmaCopyNDTrigger) the copy operation for the remaining data and stay in STEP3.
  - If the operation is done and there is no remaining data, mark the given channel as complete and stay in STEP3.
- If the copy operations are complete on all channels, return.
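
The state machine above can be sketched roughly as follows. This is a host-side illustration, not the poster's code: the two mock_ functions are hypothetical stand-ins for appUdmaCopyNDInit + appUdmaCopyNDTrigger and for the custom appUdmaCopyNDIsFinished (whose implementation is not shown in this thread), and the chan_t bookkeeping struct and the "3 polls per transfer" completion model are assumptions made only so the control flow can run without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-channel bookkeeping; not an SDK type. */
typedef struct {
    size_t   remaining;   /* bytes not yet submitted on this channel */
    size_t   chunk;       /* maximum bytes per submitted transfer */
    bool     busy;        /* a transfer is currently in flight */
    unsigned polls_left;  /* mock only: polls until the transfer completes */
    unsigned submitted;   /* number of transfers submitted so far */
} chan_t;

/* Mock stand-in for appUdmaCopyNDInit + appUdmaCopyNDTrigger. */
static void mock_init_and_trigger(chan_t *c)
{
    size_t n = (c->remaining < c->chunk) ? c->remaining : c->chunk;
    c->remaining -= n;
    c->busy = true;
    c->polls_left = 3u;   /* assumption: every transfer needs 3 polls */
    c->submitted++;
}

/* Mock stand-in for the custom, non-blocking appUdmaCopyNDIsFinished. */
static bool mock_is_finished(chan_t *c)
{
    if (c->polls_left > 0u) {
        c->polls_left--;
    }
    return c->polls_left == 0u;
}

/* Runs STEP1..STEP3 over all channels; returns the number of polling passes. */
static unsigned copy_all_parallel(chan_t ch[], unsigned num_ch)
{
    unsigned active = 0u, passes = 0u, i;

    assert(ch != NULL);

    /* STEP1 + STEP2: initialize and trigger the first transfer everywhere. */
    for (i = 0u; i < num_ch; i++) {
        if (ch[i].remaining > 0u) {
            mock_init_and_trigger(&ch[i]);
            active++;
        }
    }

    /* STEP3: poll without blocking; re-arm channels that still have data. */
    while (active > 0u) {
        passes++;
        for (i = 0u; i < num_ch; i++) {
            if (!ch[i].busy || !mock_is_finished(&ch[i])) {
                continue;
            }
            ch[i].busy = false;
            if (ch[i].remaining > 0u) {
                mock_init_and_trigger(&ch[i]);
            } else {
                active--;   /* this channel is complete */
            }
        }
    }
    return passes;
}
```

On the target, the mocks would be replaced by the real driver calls; the loop structure is the only part this sketch is meant to show.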

We implemented this state machine to avoid running the 8 copy operations sequentially and waiting for each one of them, that is:

for each channel:
- Initialize the copy operation using appUdmaCopyNDInit
- Trigger the copy operation using appUdmaCopyNDTrigger
- Wait for the operation to finish using appUdmaCopyNDWait

By triggering all of these operations at once, rather than waiting for them to finish one by one, we were hoping to get better copy performance from the DRU channels.

I hope this gives you a better understanding of my previous comment.

Regards

Brijesh Jadav over 2 years ago in reply to Amine Hamidi

TI Guru 484355 points

Hi Amine,

OK, what is the size of the data that you are transferring? Are the sizes different for each transfer? Also, what performance difference do you see between the parallel and sequential transfers?

Regards,

Brijesh

Amine Hamidi over 2 years ago in reply to Brijesh Jadav

Prodigy 145 points

Hi Brijesh,

We are copying 400 * 1024 bytes on each channel, using 8 DRU channels.

Parallel transfer:   513 us
Sequential transfer: 555 us

So the performance is nearly the same for both.

It should be noted that all copy operations are done from DDR to DDR.
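
To put the two timings in perspective, the figures above can be converted into aggregate throughput. The sketch below is plain arithmetic on the numbers quoted in this post; the function names are hypothetical. It assumes the timings cover all 8 transfers together, and it counts payload bytes only. Since every copy is DDR to DDR, each payload byte is both read from and written to DDR, so the traffic seen by the memory is roughly twice the payload.

```c
#include <assert.h>

/* Total payload in bytes: number of channels times bytes per channel. */
static unsigned long total_bytes(unsigned long channels,
                                 unsigned long bytes_per_channel)
{
    return channels * bytes_per_channel;
}

/* Bytes divided by microseconds is numerically MB/s (10^6 bytes per second). */
static double throughput_mb_per_s(unsigned long bytes, double elapsed_us)
{
    assert(elapsed_us > 0.0);
    return (double)bytes / elapsed_us;
}

/* Percentage of the baseline time saved by the new measurement. */
static double time_saved_percent(double baseline_us, double new_us)
{
    assert(baseline_us > 0.0);
    return 100.0 * (baseline_us - new_us) / baseline_us;
}
```

Calling these with 8 channels, 400 * 1024 bytes per channel, and the 513 us and 555 us timings gives a payload of 3,276,800 bytes, about 6388 MB/s for the parallel case versus about 5904 MB/s for the sequential case, and a time saving of roughly 7.6%.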

Regards
