memcpy is so slow

Changsheng Li

When my 138 board starts, the boot log shows:

ARM Clock : 456000000 Hz
DDR Clock : 198000000 Hz

In my code, processing the frame data requires some memcpy calls, and I find that memcpy is very slow on the DSP side. In my program, the ARM communicates with the DSP through ListMP. On the DSP, I copy a D1 4:2:0 frame from the ARM side into a DSP buffer.

I don't know what is wrong. Can anyone help me?

Thanks.

over 9 years ago

Titusrathinaraj Stalin over 9 years ago

TI__Guru** 116100 points

Please refer to the following posts, which describe problems similar to yours.

http://e2e.ti.com/support/embedded/linux/f/354/t/146992

http://e2e.ti.com/support/dsp/omap_applications_processors/f/447/p/29260/101628#101628

http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/129433

Shankari G over 9 years ago

TI__Mastermind 43955 points

Hi Changsheng Li,

We would like to understand how you measure the execution time of memcpy. Did you use the profile clock option in CCS, which measures the instruction cycles between lines of code?

Also, please tell us the name of the package you are using and the data rate of the memory copy.

In which memory segment do the source and destination buffers lie?

Regards,
Shankari
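For reference, one way to answer the measurement question on the ARM Linux side is a wall-clock timer around the copy. This is a minimal sketch assuming a POSIX system with `clock_gettime(CLOCK_MONOTONIC)`; the helper names `time_memcpy_us` and `mb_per_s` are hypothetical. On the DSP side a different timer would be needed, for example the C674x time-stamp counter.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: elapsed wall-clock time in microseconds for
 * `reps` memcpy calls of `len` bytes from src to dst. */
static double time_memcpy_us(void *dst, const void *src, size_t len, int reps)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++)
        memcpy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e6 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
}

/* Hypothetical helper: throughput in MB/s (1 MB = 1e6 bytes).
 * Bytes per microsecond is numerically equal to MB per second. */
static double mb_per_s(size_t len, int reps, double us)
{
    return us > 0.0 ? (double)len * reps / us : 0.0;
}
```

Timing several repetitions of the same copy makes short copies measurable; reporting MB/s alongside milliseconds also gives the "data rate of memory copy" asked for above.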


Changsheng Li over 9 years ago in reply to Shankari G

Expert 2445 points

Hi Titus and Shankari G:
Thanks very much!
In my project, I am using MCSDK.
I capture video frames on the ARM side and use ListMP (SysLink) to send the D1 frame data to the DSP side. I find that on both the ARM and the DSP side, memcpy takes about 40 ms to copy a 608 KB buffer.

The memory segment is SHARED_REGION_1.
Thanks.
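To put the 40 ms figure in perspective, here is the arithmetic, assuming "D1" means PAL 720×576 (an assumption, but consistent with the 608k buffer size and with the 288 = 576/2 chroma rows in the conversion loop later in the thread) and using the 456 MHz ARM clock from the boot log:

```latex
\begin{align*}
\text{frame size} &= 720 \times 576 \times 1.5 = 622080\ \text{bytes} \approx 607.5\ \text{KiB} \\
\text{throughput} &\approx \frac{622080\ \text{bytes}}{0.040\ \text{s}} \approx 15.6\ \text{MB/s} \\
\text{ARM cycles per byte} &\approx \frac{0.040\ \text{s} \times 456 \times 10^{6}\ \text{Hz}}{622080\ \text{bytes}} \approx 29
\end{align*}
```

Roughly 29 CPU cycles per copied byte is far too many for a plain copy, which suggests the buffers are being accessed uncached or otherwise through a slow path rather than memcpy itself being at fault.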

Changsheng Li over 9 years ago in reply to Changsheng Li

Expert 2445 points

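On the ARM side or the DSP side, I convert 4:2:2 UV to 4:2:0 UV:

/* ss: interleaved 4:2:2 UV data, 720 bytes per row (U, V, U, V, ...)
   uu, vv: planar 4:2:0 U and V output, 360 bytes per row each */
for (i = 0; i < 288; i++)       /* keep every other chroma row of 576 */
{
    for (j = 0; j < 360; j++)   /* 360 U/V pairs per row */
    {
        *uu++ = *ss++;
        *vv++ = *ss++;
    }
    ss += 720;                  /* skip the next (odd) chroma row */
}

When I run this code, it takes about 58 ms. That is very slow!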

Shankari G over 9 years ago in reply to Changsheng Li

TI__Mastermind 43955 points

Hi Changsheng Li,

I thought you were using the "memcpy" library function.

Have you tried the memcpy function instead of this routine?

Regards,

Shankari


Changsheng Li over 9 years ago in reply to Shankari G

Expert 2445 points

No, I can't. Because the 4:2:2 UV data is interleaved, when converting to 4:2:0 I must copy the UV data one byte at a time.
Thanks.

Norman Wong over 9 years ago in reply to Changsheng Li

Guru 26430 points

Some suggestions:
1) Turn up the compiler optimization.
2) Access memory in the largest width possible. Accessing slow 32-bit wide memory 8 bits at a time is inefficient.
3) Declare the most often used variables with the register keyword. If you have enough free registers, all your variables stay out of slow memory. Maximum compiler optimization might do this for you if the compiler knows your loop count.
4) Enable data and instruction caching.
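Suggestion 2 still applies to interleaved UV data: each 32-bit word holds two U/V pairs, so the loop can read one word and write two bytes to each plane at once instead of four single-byte reads. This is a minimal sketch assuming a little-endian target and an even number of pairs per row; `uv_split_row_u32` and `uv422_to_420_u32` are hypothetical names, and the fixed-size `memcpy` calls are there only to express unaligned-safe word loads and stores (an optimizing compiler normally turns them into single load/store instructions).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split one row of interleaved 4:2:2 chroma
 * (U0 V0 U1 V1 ...) into planar U and V using one 32-bit load and
 * two 16-bit stores per two pairs. Assumes little-endian byte order
 * and an even pairs_per_row. */
static void uv_split_row_u32(const uint8_t *src, uint8_t *u, uint8_t *v,
                             int pairs_per_row)
{
    for (int j = 0; j < pairs_per_row; j += 2) {
        uint32_t w;
        uint16_t u2, v2;
        memcpy(&w, src + 2 * j, sizeof w);  /* bytes in memory: U0 V0 U1 V1 */
        u2 = (uint16_t)((w & 0xFFu) | ((w >> 8) & 0xFF00u));         /* U0, U1 */
        v2 = (uint16_t)(((w >> 8) & 0xFFu) | ((w >> 16) & 0xFF00u));  /* V0, V1 */
        memcpy(u + j, &u2, sizeof u2);
        memcpy(v + j, &v2, sizeof v2);
    }
}

/* Same traversal as the byte loop in the thread: 288 output rows of
 * 360 pairs, skipping every other 720-byte input row (no averaging). */
static void uv422_to_420_u32(const uint8_t *ss, uint8_t *uu, uint8_t *vv)
{
    for (int i = 0; i < 288; i++) {
        uv_split_row_u32(ss, uu, vv, 360);
        ss += 2 * 720;  /* this row plus the skipped row */
        uu += 360;
        vv += 360;
    }
}
```

The output is byte-for-byte identical to the original loop; only the number of memory accesses changes. Whether it is faster in practice depends on caching being enabled (suggestion 4), since uncached accesses to shared DDR will dominate either way.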

Your YUV422 to YUV420p code does not appear complete: it is missing the Y plane and the averaging of U and V from two rows.
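A fuller conversion would copy the Y plane and average each vertical pair of chroma samples. This is a minimal sketch assuming a 4:2:2 semi-planar input (a Y plane followed by an interleaved UV plane with one UV row per luma row, as the loop in the thread implies) and even width and height; `yuv422sp_to_yuv420p` is a hypothetical name, and rounding with `(a + b + 1) >> 1` is one common choice, not the only one.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: 4:2:2 semi-planar (Y plane + interleaved UV plane)
 * to planar 4:2:0 (separate Y, U, V planes). Chroma is averaged over
 * each pair of rows with rounding. Assumes even width and height. */
static void yuv422sp_to_yuv420p(const uint8_t *y_in, const uint8_t *uv_in,
                                uint8_t *y_out, uint8_t *u_out, uint8_t *v_out,
                                int width, int height)
{
    /* Luma is unchanged: a single bulk copy. */
    memcpy(y_out, y_in, (size_t)width * height);

    for (int row = 0; row < height; row += 2) {
        const uint8_t *top = uv_in + (size_t)row * width;  /* even chroma row */
        const uint8_t *bot = top + width;                  /* odd chroma row  */
        for (int j = 0; j < width / 2; j++) {
            *u_out++ = (uint8_t)((top[2 * j] + bot[2 * j] + 1) >> 1);
            *v_out++ = (uint8_t)((top[2 * j + 1] + bot[2 * j + 1] + 1) >> 1);
        }
    }
}
```

For the D1 frame in this thread, the call would be `yuv422sp_to_yuv420p(y, uv, y420, u420, v420, 720, 576)`, producing 360×288 U and V planes. The averaging loop reads both chroma rows, so it touches twice as much source data as the row-skipping loop; the word-wide access idea from suggestion 2 can be applied to it the same way.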
