Why is Codec Engine access to ARM memory so slow?

HW: DM8168

SW: EZSDK 5.5.1.4 + Codec Engine

I only run the following algorithm:

for (i = 0; i < 270; i++)
{
    itmp = i * 480;                   /* row offset into dstbuffer */
    for (j = 0; j < 480; j++)
    {
        dstbuffer[itmp + j] = 0;
    }
}

 

dstbuffer is allocated on the ARM side.

This takes 3 ms on the DSP, but when the same algorithm runs on the ARM it takes only 0.5 ms.

Why is the DSP's performance slower than the ARM's?
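
For reference, one way such a number could be measured on the DSP side is to read a timestamp around the loop. This is only a sketch, assuming the xdc.runtime Timestamp module is available in the codec build; the conversion to microseconds depends on the reported timestamp frequency:

#include <xdc/std.h>
#include <xdc/runtime/Types.h>
#include <xdc/runtime/Timestamp.h>
#include <xdc/runtime/Log.h>
#include <xdc/runtime/Diags.h>

static Void time_clear_loop(char *dstbuffer)
{
    Types_FreqHz freq;
    UInt32 start, stop, i, j, itmp;

    Timestamp_getFreq(&freq);          /* timestamp ticks per second (freq.lo) */
    start = Timestamp_get32();

    for (i = 0; i < 270; i++) {
        itmp = i * 480;                /* row offset into dstbuffer */
        for (j = 0; j < 480; j++) {
            dstbuffer[itmp + j] = 0;
        }
    }

    stop = Timestamp_get32();

    /* Elapsed time in microseconds; assumes freq.lo holds the tick rate
     * and ignores 32-bit counter wraparound for such a short interval. */
    Log_print1(Diags_USER1, "[+1] clear loop: %u us",
               (IArg)((stop - start) / (freq.lo / 1000000)));
}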

  • Dear Margarita:

    I modified the codec in the video_copy example.

    int MY_algorithm(char *orgbuffer, char *dstbuffer, int iwidth, int iheight)
    {
        unsigned int i = 0;
        unsigned int j = 0;
        unsigned int itmp = 0;

        /* orgbuffer, iwidth, and iheight are not used in this test */

        for (i = 0; i < 270; i++)
        {
            itmp = i * 480;               /* row offset into dstbuffer */
            for (j = 0; j < 480; j++)
            {
                dstbuffer[itmp + j] = 0;
            }
        }

        return 0;
    }

     

    XDAS_Int32 VIDEOCOPY_TI_process(IVIDDEC_Handle h, XDM_BufDesc *inBufs,
            XDM_BufDesc *outBufs, IVIDDEC_InArgs *inArgs, IVIDDEC_OutArgs *outArgs)
    {
        XDAS_Int32 curBuf;
        XDAS_Int32 minSamples;

        Log_print5(Diags_ENTRY, "[+E] COLORCONVERT_TI_process(0x%lx, 0x%lx, 0x%lx, "
                "0x%lx, 0x%lx)",
                (IArg)h, (IArg)inBufs, (IArg)outBufs, (IArg)inArgs, (IArg)outArgs);

        /* validate arguments - this codec only supports "base" xDM. */
        if ((inArgs->size != sizeof(*inArgs)) ||
                (outArgs->size != sizeof(*outArgs))) {

            Log_print2(Diags_ENTRY, "[+E] COLORCONVERT_TI_process, unsupported size "
                    "(0x%lx, 0x%lx)", (IArg)(inArgs->size), (IArg)(outArgs->size));

            return (IVIDDEC_EFAIL);
        }

        /* outArgs->bytesConsumed reports the total number of bytes consumed */
        outArgs->bytesConsumed = 0;

        /*
         * A couple constraints for this simple "copy" codec:
         *   - Given a different number of input and output buffers, only
         *     decode (i.e., copy) the lesser number of buffers.
         *   - Given a different size of an input and output buffers, only
         *     decode (i.e., copy) the lesser of the sizes.
         */
        for (curBuf = 0; (curBuf < inBufs->numBufs) &&
                (curBuf < outBufs->numBufs); curBuf++) {

            /* there's an available in and out buffer, how many samples? */
            minSamples = inBufs->bufSizes[curBuf] < outBufs->bufSizes[curBuf] ?
                    inBufs->bufSizes[curBuf] : outBufs->bufSizes[curBuf];

            /* process the data: read input, produce output */
            //memcpy(outBufs->bufs[curBuf], inBufs->bufs[curBuf], minSamples);
            MY_algorithm(inBufs->bufs[curBuf], outBufs->bufs[curBuf], 1920, 1080);

            Log_print1(Diags_USER2, "[+2] COLORCONVERT_TI_process> "
                    "Processed %d bytes.", (IArg)(minSamples));
            outArgs->bytesConsumed += minSamples;
        }

        /* Fill out the rest of the outArgs struct */
        outArgs->extendedError = 0;
        outArgs->decodedFrameType = 0;     /* TODO */
        outArgs->outputID = inArgs->inputID;
        outArgs->displayBufs.numBufs = 0;  /* important: indicate no displayBufs */

        return (IVIDDEC_EOK);
    }

    Then I ran the video_copy example with this algorithm.

    Then I wrote an app that calls the MY_algorithm function on the ARM side, not on the DSP side.

    I found that the DSP runs much slower than the ARM.
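
    For comparison, the ARM-side figure can be obtained by calling MY_algorithm directly in a small Linux app. This is only a sketch of that kind of test, assuming MY_algorithm is compiled into the app and timed with gettimeofday():

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    int MY_algorithm(char *orgbuffer, char *dstbuffer, int iwidth, int iheight);

    int main(void)
    {
        char *dst = malloc(1920 * 1080);
        struct timeval t0, t1;
        long usec;

        gettimeofday(&t0, NULL);
        MY_algorithm(NULL, dst, 1920, 1080);   /* same loop as on the DSP */
        gettimeofday(&t1, NULL);

        usec = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
        printf("MY_algorithm on ARM: %ld us\n", usec);

        free(dst);
        return 0;
    }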

    How can I speed up memory access on the DSP side?
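
    For reference, one thing that affects this is whether the external memory region holding dstbuffer is marked cacheable on the DSP. The sketch below assumes SYS/BIOS on the C674x DSP; the base address and size are placeholders for the actual shared-region values from the platform memory map:

    #include <xdc/std.h>
    #include <ti/sysbios/family/c64p/Cache.h>

    /* Placeholder values: the real base address and size come from the
     * platform memory map for the shared (ARM-allocated) buffer region. */
    #define SHARED_REGION_BASE  ((Ptr)0x90000000)
    #define SHARED_REGION_SIZE  (16 * 1024 * 1024)

    Void enableDspCacheForSharedRegion(Void)
    {
        /* Mark the external region cacheable via its MAR bits so the DSP's
         * L1/L2 caches are used when the codec touches dstbuffer. */
        Cache_setMar(SHARED_REGION_BASE, SHARED_REGION_SIZE, Cache_Mar_ENABLE);

        /* If the ARM also reads or writes this buffer, cache coherence then
         * has to be maintained explicitly (e.g. writeback/invalidate around
         * the process() call). */
    }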