CCS: Does dmautils support Scatter-Gather DMA on TDA4?

Haibo Xu

Intellectual 345 points

Tool/software: Code Composer Studio

Hello TI,

I am trying to gather data from many separate (non-contiguous) addresses in memory.

Should I use the library "pdk/packages/ti/drv/udma/lib/j721e/c66xdsp_1/debug/udma.ae66" to do this?

If so, how can I build the example for the C66 DSP?

Thanks very much.

over 5 years ago

0 Anshu Jain over 5 years ago

TI__Guru 56820 points

Hi,

Can you explain your data transfer a bit more so that we can suggest alternatives? Do you want to transfer multiple blocks of data from random locations with a single trigger, or does every DMA trigger need to fetch data from a different location?

Regards,

Anshu

0 Haibo Xu over 5 years ago in reply to Anshu Jain

Intellectual 345 points

Hi Jain,

For example, suppose the source data is laid out as follows:

0 1 2 3 4 5 6 7 8 9 10 11 [src memory]

I want to produce the destination memory from the source like this:

0 3 6 9 1 4 7 10 2 5 8 11 [dst memory]

That is, each next element is read 3 positions after the current one (a stride of 3).

Thanks very much.
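The reordering described in this post can be written down as a plain strided gather. The following is a minimal CPU-side sketch in C, not a DMA configuration; `strided_gather` is a hypothetical helper name and is not part of the PDK or dmautils.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PDK function): read every `stride`-th
 * element of src, one starting offset ("phase") at a time, and pack
 * the results contiguously into dst.
 * count must be a multiple of stride. */
static void strided_gather(const uint8_t *src, uint8_t *dst,
                           size_t count, size_t stride)
{
    assert(stride > 0 && count % stride == 0);
    size_t per_phase = count / stride;  /* elements collected per pass */
    size_t out = 0;

    for (size_t phase = 0; phase < stride; phase++) {
        for (size_t i = 0; i < per_phase; i++) {
            dst[out++] = src[phase + i * stride];  /* jump by stride */
        }
    }
}
```

With `src` holding 0 through 11, `count = 12`, and `stride = 3`, `dst` ends up as 0 3 6 9 1 4 7 10 2 5 8 11, which matches the layout in the post.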

0 Anshu Jain over 5 years ago in reply to Haibo Xu

TI__Guru 56820 points

Hi,

This transfer is not a scatter-gather transfer. Transferring such small pieces of data will not be efficient; it might be better to handle this in the processing of the data.

Regards,

Anshu

0 Haibo Xu over 5 years ago in reply to Anshu Jain

Intellectual 345 points

Hi,

This is just an example.

Actually, my data is an 874*774*3-byte image.

The original format is RGB...RGB...RGB.

I need to convert it to RRR...GGG...BBB.

So I need to gather all channel 0 bytes together to form the R channel,

all channel 1 bytes together to form the G channel,

and all channel 2 bytes together to form the B channel.

I am currently trying this in the function

App_udmaTrpdInit

and I think I need to change:

pTr->icnt0 = 2;//length;
pTr->icnt1 = 3;//1U;
pTr->icnt2 = 1;
pTr->icnt3 = 1;
pTr->dim1 = 1;
pTr->dim2 = 3;
pTr->dim3 = 1;

Can you give me any advice?

Thanks very much.
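One way to sanity-check settings like these before running them on hardware is to model the address walk in software. The sketch below assumes the usual reading of the transfer-record fields: `icnt0` is the number of contiguous bytes in the innermost run, `icnt1` to `icnt3` are the outer loop counts, and `dim1` to `dim3` are the byte offsets between consecutive iterations of each outer loop. `TrModel` and `tr_model_copy` are hypothetical names, not the PDK structure or API. Only the source side is modeled, and the destination is written linearly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the source side of a 4-D transfer record.
 * Field names mirror the post; this is NOT the PDK structure. */
typedef struct {
    uint32_t icnt0, icnt1, icnt2, icnt3; /* loop counts, icnt0 in bytes */
    int32_t  dim1, dim2, dim3;           /* byte strides of loops 1..3  */
} TrModel;

/* Walk the source as four nested loops and write the bytes linearly
 * to dst. Returns the number of bytes written. */
static size_t tr_model_copy(const TrModel *tr, const uint8_t *src,
                            uint8_t *dst)
{
    size_t out = 0;

    for (uint32_t i3 = 0; i3 < tr->icnt3; i3++) {
        for (uint32_t i2 = 0; i2 < tr->icnt2; i2++) {
            for (uint32_t i1 = 0; i1 < tr->icnt1; i1++) {
                /* Start address of this innermost run. */
                const uint8_t *run = src + (ptrdiff_t)i3 * tr->dim3
                                         + (ptrdiff_t)i2 * tr->dim2
                                         + (ptrdiff_t)i1 * tr->dim1;
                for (uint32_t i0 = 0; i0 < tr->icnt0; i0++) {
                    dst[out++] = run[i0];
                }
            }
        }
    }
    return out;
}
```

Under this model, de-interleaving N RGB pixels byte by byte corresponds to `icnt0 = 1`, `icnt1 = N`, `icnt2 = 3`, `icnt3 = 1`, `dim1 = 3`, `dim2 = 1`, `dim3 = 0`. For N = 4 and a source of 0 through 11, the model writes 0 3 6 9 1 4 7 10 2 5 8 11. Note that this pattern forces the innermost run down to a single byte.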

0 Anshu Jain over 5 years ago in reply to Haibo Xu

TI__Guru 56820 points

Hi,

What you are trying to do will be highly inefficient with DMA. As mentioned earlier, you should try to do this de-interleaving in your processing.

Regards,

Anshu

0 Haibo Xu over 5 years ago in reply to Anshu Jain

Intellectual 345 points

Hi,

Thanks very much for your advice.

In fact, I have already modified the udma_memcpy_test.c example to do this job.

So you think this is highly inefficient?

Can you describe "de-interleaving in your processing" in more detail?

I have no idea how to do it.

Thanks very much.

0 Haibo Xu over 5 years ago in reply to Anshu Jain

Intellectual 345 points

Hi,

Thanks for your reply.

I get the image (RGB) from the camera, so I have no chance to "do this de-interleaving in your processing".

So I think I can only do it with DMA. Do you agree?

Thanks very much.

0 Anshu Jain over 5 years ago in reply to Haibo Xu

TI__Guru 56820 points

Hi,

If ICNT0 is very small, the transfer is expected to be inefficient. Can you explain what you want to do with the data after the DMA transfer you described earlier?

Regards,

Anshu

0 Haibo Xu over 5 years ago in reply to Anshu Jain

Intellectual 345 points

Hi Anshu,

"What you want to do with the data after you do the dma?"

I will use this data as input to a deep learning network for inference.

I get the image from the camera, and its format is RGB RGB RGB ...

But the deep learning network needs the image in the format RRR... GGG... BBB...

So over the last 2 days I modified udma_memcpy_test.c.

I modified icnt0/1/2, dim1/2/3, dicnt0/1/2, and ddim1/2/3.

I finally got the right result.

But the function "App_memcpyTest" costs 73,410,505 clock cycles,

and the DSP frequency is 1.35 GHz,

so I calculate that it takes more than 50 ms, which is obviously too slow and highly inefficient.

Can you tell me what the problem is? My data size is 874*774*3 bytes.

I think the result should be less than 5 ms.

Thanks very much.
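The timing figures in this post can be checked with a little arithmetic. The sketch below only restates the numbers given above (73,410,505 cycles, 1.35 GHz, 874*774*3 bytes); the function names are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to milliseconds at a given clock frequency. */
static double cycles_to_ms(uint64_t cycles, double freq_hz)
{
    assert(freq_hz > 0.0);
    return (double)cycles / freq_hz * 1e3;
}

/* Average number of cycles spent per byte transferred. */
static double cycles_per_byte(uint64_t cycles, uint64_t bytes)
{
    assert(bytes > 0);
    return (double)cycles / (double)bytes;
}
```

`cycles_to_ms(73410505, 1.35e9)` comes to roughly 54.4 ms. The image is 874*774*3 = 2,029,428 bytes, so the measured cost is about 36 cycles per byte. The 5 ms target would allow 6,750,000 cycles in total, or roughly 3.3 cycles per byte.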

0 Anshu Jain over 5 years ago in reply to Haibo Xu

TI__Guru 56820 points

Hi Haibo,

If it is for a deep learning network, then it should be taken care of as part of the pre-processing of the input image to the network. As mentioned earlier, using DMA for this will be much more inefficient than doing it as part of the processing.

Regards,

Anshu
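The thread does not show what "as part of the processing" would look like in code. One possible reading is a de-interleave loop that runs on the core as a pre-processing step before the network. The following is a plain scalar C sketch under that assumption; `rgb_interleaved_to_planar` is a hypothetical name, and the loop is not tuned for the C66x (no intrinsics, no block-wise staging of the image in on-chip memory).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split interleaved RGB (RGBRGB...) into three
 * planes stored back to back in dst (RRR...GGG...BBB...).
 * dst must hold 3 * num_pixels bytes and must not overlap src. */
static void rgb_interleaved_to_planar(const uint8_t *src, uint8_t *dst,
                                      size_t num_pixels)
{
    assert(src != NULL && dst != NULL);
    uint8_t *r = dst;                   /* plane 0: red   */
    uint8_t *g = dst + num_pixels;      /* plane 1: green */
    uint8_t *b = dst + 2 * num_pixels;  /* plane 2: blue  */

    /* Read the source sequentially, one pixel (3 bytes) per step. */
    for (size_t i = 0; i < num_pixels; i++) {
        r[i] = src[3 * i + 0];
        g[i] = src[3 * i + 1];
        b[i] = src[3 * i + 2];
    }
}
```

For the image in this thread, `num_pixels` would be 874 * 774 = 676,476. The source is read in long contiguous runs here, in contrast to the byte-sized innermost runs that the replies describe as inefficient for DMA.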
