TDA3LX: more efficiently lookup action

SJay

Expert 1615 points

Part Number: TDA3LX

Hi,

If I need to perform a lookup over the entire image,

is there any way to do it more efficiently?

The lookup has been simplified to the following code:

int idx = 0;
for (i = 0; i < imageH; i++)
{
    for (j = 0; j < imageW; j++)
    {
        outPutImg[idx] = Lut[idx];
        idx++;
    }
}

We perform this task on the DSP core.

Together with our other algorithms, DSP core utilization has reached 99%.

Is there a more efficient way to perform table lookups, perhaps using the EVE or something else?

Our image size may be 1280x720 or larger.

SJay

over 3 years ago

0 Victor Kazmirenko over 3 years ago

Guru 13042 points

Hello!

I don't know for sure what other processing you are doing there, but at least subframe extraction should be doable by EDMA alone, without the CPU. You can configure a completion interrupt so that the DSP is notified once the extraction is done and can proceed with the actual processing algorithm.

0 SJay over 3 years ago in reply to Victor Kazmirenko

Expert 1615 points

Hi rrlagic,

I'm not quite sure what you mean.

Maybe I gave too much information.

To put it simply, if I execute the code below on the DSP, it takes 100 ms:

int idx = 0;
for (i = 0; i < imageH; i++)
{
    for (j = 0; j < imageW; j++)
    {
        outPutImg[idx] = Lut[idx];
        idx++;
    }
}

I want to shorten the execution time of this code.

How can I improve it?

Is there any method, such as using EDMA or an EVE kernel?

SJay

0 Victor Kazmirenko over 3 years ago in reply to SJay

Guru 13042 points

Hello!

In the original post you stated that the DSP is pretty busy doing other jobs.

Then what I suggest is to offload the subframe extraction work to the EDMA controller. It will not happen faster. In fact, it may even take a little longer, depending on memory traffic in your system. What is important is that this work consumes no DSP cycles and can run in parallel, in the background, with other jobs.

The example you show looks too stripped down to me. Really, if 'outPutImg' and 'Lut' are both linear arrays, why do you process them in nested loops? So I presume there is some more logic behind the scenes, and I'm just saying that EDMA can handle 2D transfers, like subframe extraction/insertion. If it is merely a linear array copy, EDMA can still be used to offload this job from the CPU. So whatever time the actual transfer takes, it will have no impact on DSP load. You'll need to issue just a few configuration writes.

Just for this case there is a QDMA unit in each EDMA controller. Configuring a QDMA transfer takes from 1 to 8 dword writes by the processor.

I'm stepping on the thin ice of guessing now, so if you're not familiar with DMA, let me give you a very small example. Suppose we have 2 linear arrays to copy, like

unsigned char src[SIZE];
unsigned char dst[SIZE];

Then you program the QDMA controller with the following settings

QDMA_OPT = options_bitfield;
QDMA_SRC = src;
QDMA_AB_CNT = MAKE_A_B_CNT_MACRO(SIZE, 0);
QDMA_DST = dst;
// zero to unused regs

There are some more details on which config word triggers the transfer, but in general moving a block of data is that simple.
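To make the 2D case (subframe extraction) mentioned above more concrete, here is a minimal sketch in C. It is only a software model that runs on the CPU so the addressing can be checked on a PC. The struct `Dma2dDesc` and the functions `dma2d_subframe` and `dma2d_model_run` are hypothetical names invented for this illustration, not TI driver APIs or register definitions. The field names only echo the general idea of a per-row byte count, a row count, and per-row strides. Real hardware setup, including which write triggers the transfer, has to come from the device documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of a 2D DMA descriptor.
 * This is NOT a TI driver structure or a register map. */
typedef struct {
    const uint8_t *src;      /* first byte of the subframe in the source image */
    uint8_t       *dst;      /* first byte of the destination buffer           */
    uint16_t       acnt;     /* bytes per row (one contiguous run)             */
    uint16_t       bcnt;     /* number of rows                                 */
    int16_t        src_bidx; /* byte stride between rows in the source         */
    int16_t        dst_bidx; /* byte stride between rows in the destination    */
} Dma2dDesc;

/* Describe extraction of a w x h subframe at (x0, y0) from an 8-bit image
 * with row pitch src_pitch into a tightly packed destination buffer. */
static Dma2dDesc dma2d_subframe(const uint8_t *img, int src_pitch,
                                int x0, int y0, int w, int h, uint8_t *dst)
{
    Dma2dDesc d;

    /* Assumed limits of this model: counts fit in 16 bits,
     * strides fit in signed 16 bits. */
    assert(w > 0 && w <= 32767);
    assert(h > 0 && h <= 65535);
    assert(src_pitch >= w && src_pitch <= 32767);

    d.src      = img + (ptrdiff_t)y0 * src_pitch + x0;
    d.dst      = dst;
    d.acnt     = (uint16_t)w;
    d.bcnt     = (uint16_t)h;
    d.src_bidx = (int16_t)src_pitch;
    d.dst_bidx = (int16_t)w;
    return d;
}

/* Emulates on the CPU what a DMA engine would do with the descriptor:
 * copy bcnt rows of acnt bytes, stepping each pointer by its own stride. */
static void dma2d_model_run(const Dma2dDesc *d)
{
    for (uint16_t row = 0; row < d->bcnt; row++) {
        memcpy(d->dst + (ptrdiff_t)row * d->dst_bidx,
               d->src + (ptrdiff_t)row * d->src_bidx,
               d->acnt);
    }
}
```

On a real system the loop in `dma2d_model_run` is the part the DMA engine would execute in the background. The DSP would only fill in the descriptor and then wait for the completion interrupt.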

0 SJay over 3 years ago in reply to Victor Kazmirenko

Expert 1615 points

Hi rrlagic,

Thank you for your comments.

In fact, I have used EDMA, and I also know that it is suitable for moving large arrays.

I found that my sample code was a bit wrong.

It should be changed as below:

int idx = 0;
for (i = 0; i < imageH; i++)
{
    for (j = 0; j < imageW; j++)
    {
        outPutImg[idx] = inPutTexture[Lut[idx]];
        idx++;
    }
}

As shown in the code,

each value is fetched from inPutTexture at the position given by the lookup table,

so EDMA is not suitable for this operation.

0 Victor Kazmirenko over 3 years ago in reply to SJay

Guru 13042 points

Well,

I felt the example was too simple. Yet again, why do you use nested loops over a linear array?

Next, what are the contents of the LUT? Does it move back and forth over the input?

0 SJay over 3 years ago in reply to Victor Kazmirenko

Expert 1615 points

Hi rrlagic,

In fact, I am developing an AVM system on an RTOS that does not use the GPU.

So, if it is not designed around a lookup table,

it is difficult for the AVM system to run in real time.

The lookup table is used to obtain the texture pixel value corresponding to each model point coordinate.

SJay

0 Victor Kazmirenko over 3 years ago in reply to SJay

Guru 13042 points

Hello!

If your processing is merely extracting a subset of elements from a larger array, poor performance could be related to the order in which they are read. If your indexing sequence, i.e. the LUT, is sorted in ascending order, then the performance of your system is probably limited by data read/cache speed. If, however, the LUT is not ordered, you may consider rearranging both the LUT and the reference pattern so that the input data is read sequentially, which hopefully may improve caching performance. Yet again, that depends heavily on the design of the LUT.
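One possible reading of the rearrangement idea, as a minimal sketch in C: pair every LUT entry with its output position, sort the pairs once offline by source index, and then walk the sorted table for each frame so the texture is read in ascending order. The names `RemapEntry`, `remap_build_sorted` and `remap_apply` are hypothetical, and 8-bit pixels with 32-bit indices are assumed. Note the trade-off: the reads become sequential but the writes become scattered, and the table doubles in size, so whether this actually helps has to be measured on the target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One remap entry: read texture[src], write output[dst]. */
typedef struct {
    uint32_t src;
    uint32_t dst;
} RemapEntry;

/* Order entries by source index, then by destination index for ties. */
static int cmp_by_src(const void *a, const void *b)
{
    const RemapEntry *ea = (const RemapEntry *)a;
    const RemapEntry *eb = (const RemapEntry *)b;
    if (ea->src != eb->src)
        return (ea->src > eb->src) - (ea->src < eb->src);
    return (ea->dst > eb->dst) - (ea->dst < eb->dst);
}

/* Done once, offline or at start-up: expand the LUT into (src, dst)
 * pairs and sort them so that src is ascending. */
static void remap_build_sorted(const uint32_t *lut, uint32_t n,
                               RemapEntry *entries)
{
    assert(lut != NULL && entries != NULL);
    for (uint32_t k = 0; k < n; k++) {
        entries[k].src = lut[k];
        entries[k].dst = k;
    }
    qsort(entries, n, sizeof entries[0], cmp_by_src);
}

/* Done per frame: same result as output[k] = texture[lut[k]],
 * but the texture is now read in ascending address order. */
static void remap_apply(const RemapEntry *entries, uint32_t n,
                        const uint8_t *texture, uint8_t *output)
{
    for (uint32_t k = 0; k < n; k++)
        output[entries[k].dst] = texture[entries[k].src];
}
```

If sequential writes matter more than sequential reads on a given memory system, the original LUT order already provides that, so the two variants are worth benchmarking against each other.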

0 SJay over 3 years ago in reply to Victor Kazmirenko

Expert 1615 points

Hi rrlagic,

OK, I understand the advice you gave.

I will design and implement according to this idea.

Thanks!

Just curious, are you a TI FAE?

Because your account doesn't look like one, but this is the first time I have received a reply from a non-TI FAE :O

0 Victor Kazmirenko over 3 years ago in reply to SJay

Guru 13042 points

Hi again,

Glad to be of help.

No, I am not formally affiliated with TI. I'm another developer like you who just has experience with TI processors.

0 Brijesh Jadav over 3 years ago in reply to Victor Kazmirenko

TI Guru 398160 points

Any further questions on this thread? Let us know if we can close it.

Regards,

Brijesh

0 SJay over 3 years ago in reply to Brijesh Jadav

Expert 1615 points

Hi,

You can close it.

If there are any further questions, I will create another thread. Thanks.

0 Brijesh Jadav over 3 years ago in reply to SJay

TI Guru 398160 points

Thank you, closing this thread.
