Why is Codec Engine access to ARM memory so slow?

HW: DM8168

SW: EZSDK 5.5.1.4 + Codec Engine

I only run the following algorithm:

for (i = 0; i < 270; i++)
{
    itmp = i * 480;                   /* row offset into dstbuffer */
    for (j = 0; j < 480; j++)
    {
        dstbuffer[itmp + j] = 0;
    }
}

 

dstbuffer is allocated on the ARM side.

This takes 3 ms on the DSP, but when the same algorithm runs on the ARM it takes only 0.5 ms.

Why is the DSP's performance slower than the ARM's?
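
For reference, one way such a number could be measured on the DSP side is to read a timestamp around the loop. This is only a sketch, assuming the xdc.runtime Timestamp module is available in the codec build; the conversion to microseconds depends on the reported timestamp frequency:

#include <xdc/std.h>
#include <xdc/runtime/Types.h>
#include <xdc/runtime/Timestamp.h>
#include <xdc/runtime/Log.h>
#include <xdc/runtime/Diags.h>

static Void time_clear_loop(char *dstbuffer)
{
    Types_FreqHz freq;
    UInt32 start, stop, i, j, itmp;

    Timestamp_getFreq(&freq);          /* timestamp ticks per second (freq.lo) */
    start = Timestamp_get32();

    for (i = 0; i < 270; i++) {
        itmp = i * 480;                /* row offset into dstbuffer */
        for (j = 0; j < 480; j++) {
            dstbuffer[itmp + j] = 0;
        }
    }

    stop = Timestamp_get32();

    /* Elapsed time in microseconds; assumes freq.lo holds the tick rate
     * and ignores 32-bit counter wraparound for such a short interval. */
    Log_print1(Diags_USER1, "[+1] clear loop: %u us",
               (IArg)((stop - start) / (freq.lo / 1000000)));
}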

  • Dear Margarita:

    I modified the codec in the video_copy example.

    int MY_algorithm(char *orgbuffer, char *dstbuffer, int iwidth, int iheight)
    {
        unsigned int i = 0;
        unsigned int j = 0;
        unsigned int itmp = 0;

        /* orgbuffer, iwidth, and iheight are not used in this test */

        for (i = 0; i < 270; i++)
        {
            itmp = i * 480;               /* row offset into dstbuffer */
            for (j = 0; j < 480; j++)
            {
                dstbuffer[itmp + j] = 0;
            }
        }

        return 0;
    }

     

    XDAS_Int32 VIDEOCOPY_TI_process(IVIDDEC_Handle h, XDM_BufDesc *inBufs,
            XDM_BufDesc *outBufs, IVIDDEC_InArgs *inArgs, IVIDDEC_OutArgs *outArgs)
    {
        XDAS_Int32 curBuf;
        XDAS_Int32 minSamples;

        Log_print5(Diags_ENTRY, "[+E] COLORCONVERT_TI_process(0x%lx, 0x%lx, 0x%lx, "
                "0x%lx, 0x%lx)",
                (IArg)h, (IArg)inBufs, (IArg)outBufs, (IArg)inArgs, (IArg)outArgs);

        /* validate arguments - this codec only supports "base" xDM. */
        if ((inArgs->size != sizeof(*inArgs)) ||
                (outArgs->size != sizeof(*outArgs))) {

            Log_print2(Diags_ENTRY, "[+E] COLORCONVERT_TI_process, unsupported size "
                    "(0x%lx, 0x%lx)", (IArg)(inArgs->size), (IArg)(outArgs->size));

            return (IVIDDEC_EFAIL);
        }

        /* outArgs->bytesConsumed reports the total number of bytes consumed */
        outArgs->bytesConsumed = 0;

        /*
         * A couple constraints for this simple "copy" codec:
         *   - Given a different number of input and output buffers, only
         *     decode (i.e., copy) the lesser number of buffers.
         *   - Given a different size of an input and output buffers, only
         *     decode (i.e., copy) the lesser of the sizes.
         */
        for (curBuf = 0; (curBuf < inBufs->numBufs) &&
                (curBuf < outBufs->numBufs); curBuf++) {

            /* there's an available in and out buffer, how many samples? */
            minSamples = inBufs->bufSizes[curBuf] < outBufs->bufSizes[curBuf] ?
                    inBufs->bufSizes[curBuf] : outBufs->bufSizes[curBuf];

            /* process the data: read input, produce output */
            //memcpy(outBufs->bufs[curBuf], inBufs->bufs[curBuf], minSamples);
            MY_algorithm(inBufs->bufs[curBuf], outBufs->bufs[curBuf], 1920, 1080);

            Log_print1(Diags_USER2, "[+2] COLORCONVERT_TI_process> "
                    "Processed %d bytes.", (IArg)(minSamples));
            outArgs->bytesConsumed += minSamples;
        }

        /* Fill out the rest of the outArgs struct */
        outArgs->extendedError = 0;
        outArgs->decodedFrameType = 0;     /* TODO */
        outArgs->outputID = inArgs->inputID;
        outArgs->displayBufs.numBufs = 0;  /* important: indicate no displayBufs */

        return (IVIDDEC_EOK);
    }

    Then I ran the video_copy example with this algorithm.

    Then I wrote an app that calls the MY_algorithm function on the ARM side, not on the DSP side.

    I found that the DSP runs much slower than the ARM.
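
    For comparison, the ARM-side figure can be obtained by calling MY_algorithm directly in a small Linux app. This is only a sketch of that kind of test, assuming MY_algorithm is compiled into the app and timed with gettimeofday():

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    int MY_algorithm(char *orgbuffer, char *dstbuffer, int iwidth, int iheight);

    int main(void)
    {
        char *dst = malloc(1920 * 1080);
        struct timeval t0, t1;
        long usec;

        gettimeofday(&t0, NULL);
        MY_algorithm(NULL, dst, 1920, 1080);   /* same loop as on the DSP */
        gettimeofday(&t1, NULL);

        usec = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
        printf("MY_algorithm on ARM: %ld us\n", usec);

        free(dst);
        return 0;
    }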

    How can I speed up memory access on the DSP side?
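
    For reference, one thing that affects this is whether the external memory region holding dstbuffer is marked cacheable on the DSP. The sketch below assumes SYS/BIOS on the C674x DSP; the base address and size are placeholders for the actual shared-region values from the platform memory map:

    #include <xdc/std.h>
    #include <ti/sysbios/family/c64p/Cache.h>

    /* Placeholder values: the real base address and size come from the
     * platform memory map for the shared (ARM-allocated) buffer region. */
    #define SHARED_REGION_BASE  ((Ptr)0x90000000)
    #define SHARED_REGION_SIZE  (16 * 1024 * 1024)

    Void enableDspCacheForSharedRegion(Void)
    {
        /* Mark the external region cacheable via its MAR bits so the DSP's
         * L1/L2 caches are used when the codec touches dstbuffer. */
        Cache_setMar(SHARED_REGION_BASE, SHARED_REGION_SIZE, Cache_Mar_ENABLE);

        /* If the ARM also reads or writes this buffer, cache coherence then
         * has to be maintained explicitly (e.g. writeback/invalidate around
         * the process() call). */
    }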