TDA3LX: More efficient lookup operation

Part Number: TDA3LX

Hi,

I need to perform a lookup over the entire image.

Is there a more efficient way to do this?

The lookup has been simplified to the following code:

int idx = 0;
for(i=0; i<imageH; i++)
{
    for(j=0; j<imageW; j++)
    {
        outPutImg[idx] = Lut[idx];
        idx++;
    }
}

We run this task on the DSP core.

With the other algorithms included, DSP utilization has reached 99%.

Is there a more efficient way to perform the table lookup, perhaps using the EVE?

Our image size may be 1280x720 or larger.

SJay

  • Hello!

    I don't know for sure what other processing you are doing there, but at least subframe extraction should be doable by EDMA alone, without the CPU. You can configure a completion interrupt so that the DSP is notified once the extraction is done and can proceed with the actual processing algorithm.

  • Hi rrlagic, 

    I'm not quite sure what you mean.

    Maybe I gave too much information.

    To put it simply, if I execute the code below on the DSP, it takes 100 ms:

    int idx = 0;
    for(i=0; i<imageH; i++)
    {
        for(j=0; j<imageW; j++)
        {
            outPutImg[idx] = Lut[idx];
            idx++;
        }
    }

    I want to shorten the execution time of this code.

    How can I improve it?

    Is there any method, such as using EDMA or an EVE kernel?

    SJay

  • Hello!

    In your original post you stated that the DSP is pretty busy doing other jobs.

    What I suggest, then, is to offload the subframe extraction work to the EDMA controller. It will not be faster. In fact, it may even take a little longer, depending on memory traffic in your system. What is important is that this work consumes no DSP cycles and can run in parallel, in the background, with other jobs.

    The example you show looks too stripped down to me. Really, if 'outPutImg' and 'Lut' are both linear arrays, why do you process them in nested loops? So I presume there is some more logic behind the scenes, and I am just pointing out that EDMA can handle 2D transfers, like subframe extraction/insertion. If it is merely a linear array copy, EDMA can still be used to offload this job from the CPU. Whatever time the actual transfer takes, it will have no impact on DSP load. You'll need to issue just a few configuration writes.

    Just for this case there is a QDMA unit in each EDMA controller. Configuring a QDMA transfer takes between 1 and 8 dword writes by the processor.

    I'm stepping onto the thin ice of guessing now, so in case you're not familiar with DMA, let me give you a very small example. Suppose we have two linear arrays to copy, like

    unsigned char src[SIZE];
    unsigned char dst[SIZE];

    Then you program the QDMA controller with the following settings

    QDMA_OPT = options_bitfield;
    QDMA_SRC = src;
    QDMA_AB_CNT = MAKE_A_B_CNT_MACRO(SIZE, 1);  // ACNT = SIZE bytes, BCNT = 1 array
    QDMA_DST = dst;
    // CCNT = 1 (a zero count makes the transfer move no data); zero to unused regs

    There are some more details, such as which config word triggers the transfer, but in general moving a block of data is that simple.
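To make the 2D transfer idea above concrete, here is a minimal CPU-side sketch of what a subframe extraction does in terms of the usual EDMA3 parameters: ACNT (bytes per row), BCNT (number of rows), and the source and destination row strides. The `Transfer2D` struct and `transfer2d_run` function are hypothetical names made up for illustration. They are not a TI register layout or driver API, and they only model the data movement that the DMA engine would perform in the background.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: field names borrow EDMA3 terminology,
 * but this is NOT the hardware PaRAM layout. */
typedef struct {
    const unsigned char *src; /* first byte of the subframe in the source image */
    unsigned char *dst;       /* first byte of the destination buffer */
    size_t acnt;              /* bytes per row (one contiguous array) */
    size_t bcnt;              /* number of rows to move */
    size_t src_bidx;          /* byte stride between rows in the source */
    size_t dst_bidx;          /* byte stride between rows in the destination */
} Transfer2D;

/* Software reference of the transfer: copy bcnt rows of acnt bytes each,
 * stepping by the given strides. A DMA engine would do this without the CPU. */
static void transfer2d_run(const Transfer2D *t)
{
    size_t row;
    for (row = 0; row < t->bcnt; row++) {
        memcpy(t->dst + row * t->dst_bidx,
               t->src + row * t->src_bidx,
               t->acnt);
    }
}
```

For example, extracting a 3x2 window that starts at column 2, row 1 of a 6-pixel-wide 8-bit image would use `src = img + 1*6 + 2`, `acnt = 3`, `bcnt = 2`, `src_bidx = 6`, and `dst_bidx = 3`.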

  • Hi rrlagic,

    Thank you for your comments.

    In fact, I have used EDMA before, and I know it is well suited to moving large arrays.

    I found that my sample code was not quite right.

    It should be changed as below:

    int idx = 0;
    for(i=0; i<imageH; i++)
    {
        for(j=0; j<imageW; j++)
        {
            outPutImg[idx] = inPutTexture[Lut[idx]];
            idx++;
        }
    }

    As the code shows, each output value is fetched from inPutTexture at the index given by the lookup table, so EDMA is not suitable for this operation.
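The corrected loop is a gather: each output pixel is read from the texture at an index taken from the table. Because both `outPutImg` and `Lut` are walked linearly, the two nested loops can be written as one flat loop. The sketch below assumes 8-bit pixels and 32-bit table entries, which the thread does not state, and uses the hypothetical name `lut_gather`. The C99 `restrict` qualifiers promise that the buffers do not overlap, which may help an optimizing compiler schedule the loop; whether it actually runs faster on a given DSP has to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Gather: out[k] = texture[lut[k]] for k in [0, count).
 * Element types are assumptions (8-bit pixels, 32-bit indices).
 * The caller must guarantee that every lut[k] is a valid texture index
 * and that the three buffers do not overlap. */
static void lut_gather(uint8_t *restrict out,
                       const uint8_t *restrict texture,
                       const uint32_t *restrict lut,
                       size_t count)
{
    size_t k;
    for (k = 0; k < count; k++) {
        out[k] = texture[lut[k]];
    }
}
```

For a full frame, `count` would be `imageW * imageH`.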

  • Well,

    I felt the example was too simple. Yet again, why do you use nested loops over a linear array?

    Next, what are the contents of the LUT? Does it move back and forth over the input?

  • Hi rrlagic,

    In fact, I am developing an AVM system on an RTOS that does not use a GPU.

    So if it is not designed around a lookup table, it is difficult for the AVM system to run in real time.

    The lookup table is used to obtain the texture pixel value corresponding to each model point coordinate.

    SJay

  • Hello!

    If your processing is merely extracting a subset of the elements of a larger array, poor performance could be related to the order in which they are read. If your indexing sequence, i.e. the LUT, is sorted in ascending order, then the performance of your system is probably limited by data read/cache speed. If, however, the LUT is not ordered, you may consider rearranging both the LUT and the reference pattern so that the input data is read sequentially, which hopefully improves caching performance. Yet again, that depends heavily on the design of the LUT.
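One possible reading of the rearrangement suggested above, as a sketch only: pair every table entry with its output position, sort the pairs once by source index, and then walk the sorted list for each frame. Reads from the texture then proceed in ascending order, at the cost of scattered writes and a table twice the size. The names `LutEntry`, `lut_build_sorted`, and `lut_gather_sorted` are hypothetical, and the element types (8-bit pixels, 32-bit indices) are assumptions. Whether this beats the plain gather depends on the LUT contents and the cache configuration, so it needs to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t src; /* index into the input texture */
    uint32_t dst; /* index into the output image */
} LutEntry;

/* qsort comparator: ascending by source index. */
static int cmp_by_src(const void *a, const void *b)
{
    const LutEntry *x = (const LutEntry *)a;
    const LutEntry *y = (const LutEntry *)b;
    return (x->src > y->src) - (x->src < y->src);
}

/* One-time setup: record where each LUT value goes, then sort by source. */
static void lut_build_sorted(LutEntry *entries, const uint32_t *lut, size_t count)
{
    size_t k;
    for (k = 0; k < count; k++) {
        entries[k].src = lut[k];
        entries[k].dst = (uint32_t)k;
    }
    qsort(entries, count, sizeof entries[0], cmp_by_src);
}

/* Per frame: texture reads are in ascending order, output writes are scattered. */
static void lut_gather_sorted(uint8_t *out, const uint8_t *texture,
                              const LutEntry *entries, size_t count)
{
    size_t k;
    for (k = 0; k < count; k++) {
        out[entries[k].dst] = texture[entries[k].src];
    }
}
```

The result is identical to `out[k] = texture[lut[k]]`; only the order of memory accesses changes. The sort is done once, when the table is built, not per frame.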

  • Hi rrlagic,

    OK, I understand the advice you gave.

    I will design and implement it along these lines.

    Thanks!

    Just curious, are you a TI FAE?

    Your account doesn't look like one, and this is the first time I've received a reply from someone who isn't a TI FAE :O

  • Hi again,

    Glad to be of help.

    No, I am not formally affiliated with TI. I am just another developer like you who has had some experience with TI processors.

  • Any further questions on this thread? Let us know if we can close it.

    Regards,

    Brijesh

  • Hi,

    You can close it.

    If I have any further questions, I will create another thread. Thanks.

  • Thank you, closing this thread.