Dear all:
Basically, most image-processing code works on RGB888, so the first step is a format conversion from YCbCr422 to RGB888. Why is this conversion so slow with the code below? It takes almost 1 second to perform, which rules it out for a real-time application. We tried raising the optimization level in the Build Options, but the improvement was limited. Does anyone have suggestions for improving the efficiency? The demo code is below.
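For reference, the demo code below implements the BT.601 video-range equations (luma nominally 16 to 235, chroma centered on 128), with the coefficients taken directly from the code. Each 4-byte group Cb0 Y0 Cr0 Y1 produces two pixels that share the same Cb0 and Cr0, and each result has to be clamped to 0..255:

$$
\begin{aligned}
R &= 1.164\,(Y - 16) + 1.596\,(C_r - 128)\\
G &= 1.164\,(Y - 16) - 0.813\,(C_r - 128) - 0.392\,(C_b - 128)\\
B &= 1.164\,(Y - 16) + 2.017\,(C_b - 128)
\end{aligned}
$$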
#include <stdint.h>

/* Uint8/Int32 normally come from the TI target headers; these typedefs
   only make the function build on its own. Drop them if already defined. */
typedef uint8_t Uint8;
typedef int32_t Int32;

/* Clamp before the cast: converting an out-of-range float to an
   unsigned char is undefined behavior in C. */
static Uint8 clamp_u8(float v)
{
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (Uint8)v;
}

/* Packed YCbCr 4:2:2 (bytes Cb0 Y0 Cr0 Y1) -> packed RGB888 (R, G, B per pixel). */
void ycbcr2rgb(Uint8* src, Int32 width, Int32 height, Uint8* des)
{
    Int32 byte_count_line_rgb = width*3;
    Int32 byte_count_line_yuv = width*2;
    Int32 i, j, k;
    Uint8 temp[4];
    float p1, p2, p3;
    for (j = 0; j < height; j++) //convert from DDR2 buffer_in to buffer_out
    {
        k = 0;
        for (i = 0; i < byte_count_line_yuv; i = i+4) //1 line, 2 pixels per iteration
        {
            temp[0] = src[j*byte_count_line_yuv+i];   //cb0
            temp[1] = src[j*byte_count_line_yuv+i+1]; //y0
            temp[2] = src[j*byte_count_line_yuv+i+2]; //cr0
            temp[3] = src[j*byte_count_line_yuv+i+3]; //y1
            p1 = (temp[1]-16)*1.164+(temp[2]-128)*1.596;                     //r0
            p2 = (temp[1]-16)*1.164-(temp[2]-128)*0.813-(temp[0]-128)*0.392; //g0
            p3 = (temp[1]-16)*1.164+(temp[0]-128)*2.017;                     //b0
            des[j*byte_count_line_rgb+k]   = clamp_u8(p1); //r0
            des[j*byte_count_line_rgb+k+1] = clamp_u8(p2); //g0
            des[j*byte_count_line_rgb+k+2] = clamp_u8(p3); //b0
            p1 = (temp[3]-16)*1.164+(temp[2]-128)*1.596;                     //r1
            p2 = (temp[3]-16)*1.164-(temp[2]-128)*0.813-(temp[0]-128)*0.392; //g1
            p3 = (temp[3]-16)*1.164+(temp[0]-128)*2.017;                     //b1
            des[j*byte_count_line_rgb+k+3] = clamp_u8(p1); //r1: second pixel occupies k+3..k+5
            des[j*byte_count_line_rgb+k+4] = clamp_u8(p2); //g1
            des[j*byte_count_line_rgb+k+5] = clamp_u8(p3); //b1
            k = k+6;
        }
    }
}
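A likely contributor to the slowness: literals such as 1.164 and 1.596 are double constants, so each expression is evaluated in double precision and then narrowed to float. If the DSP core has no floating-point hardware (as with the C64x+ cores in many DaVinci devices), each of those operations becomes a run-time library call, and loops containing calls are generally not software-pipelined by the C6000 compiler, which would also explain why a higher optimization level helps so little. Below is a minimal fixed-point sketch, assuming the same packed Cb Y0 Cr Y1 byte order as the demo, an even width and no padding between lines. The coefficients are the demo's values multiplied by 256 and rounded (298, 409, 208, 100, 516); the function name ycbcr422_to_rgb888_q8 is hypothetical. It uses only 32-bit integer multiplies, adds and shifts, computes the chroma terms once per pixel pair, and walks the buffers with incrementing pointers.

```c
#include <stdint.h>

/* Undo the x256 scaling and clamp to 0..255. Negative values are caught
   before the shift, so no negative number is ever right-shifted. */
static inline uint8_t clamp_q8(int32_t v)
{
    if (v < 0) return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Fixed-point YCbCr 4:2:2 (Cb Y0 Cr Y1) -> RGB888, BT.601 video range.
   Assumes width is even and lines are stored back to back. */
void ycbcr422_to_rgb888_q8(const uint8_t *restrict src, int32_t width,
                           int32_t height, uint8_t *restrict dst)
{
    int32_t groups = (width / 2) * height;   /* 4-byte groups = pixel pairs */
    for (int32_t n = 0; n < groups; n++) {
        int32_t cb = src[0] - 128;
        int32_t cr = src[2] - 128;
        int32_t y0 = 298 * (src[1] - 16);    /* 1.164 * 256 */
        int32_t y1 = 298 * (src[3] - 16);
        /* Chroma terms (+128 for rounding) are shared by both pixels. */
        int32_t rc = 409 * cr + 128;              /* 1.596 * 256 */
        int32_t gc = -100 * cb - 208 * cr + 128;  /* 0.392, 0.813 * 256 */
        int32_t bc = 516 * cb + 128;              /* 2.017 * 256 */
        dst[0] = clamp_q8(y0 + rc);
        dst[1] = clamp_q8(y0 + gc);
        dst[2] = clamp_q8(y0 + bc);
        dst[3] = clamp_q8(y1 + rc);
        dst[4] = clamp_q8(y1 + gc);
        dst[5] = clamp_q8(y1 + bc);
        src += 4;
        dst += 6;
    }
}
```

Because the scaled coefficients are rounded and the result is rounded rather than truncated, outputs can differ from the floating-point version by one code value. The output order is R, G, B; swap dst[0]/dst[2] and dst[3]/dst[5] if BGR is needed.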
http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/99/p/121390/434701.aspx#434701
Someone suggested that using DMA could relieve the memory read/write bottleneck. Can anyone answer this question?
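On the DMA suggestion: the idea is to let the DSP convert data that already sits in on-chip memory while the DMA engine brings the next block of lines in from DDR2 and writes converted lines back out. The sketch below shows only that ping-pong structure. dma_copy_start and dma_wait_all are hypothetical placeholders implemented with memcpy so the code runs anywhere (so nothing actually overlaps here); on a DaVinci device they would map to EDMA3 transfers, for example through TI's EDMA3 Low Level Driver or the ACPY3 module in Framework Components, whose actual APIs would need to be checked. LINES_PER_BLOCK and MAX_WIDTH are assumed values, and the static buffers would have to be linked into internal RAM (for example with the TI compiler's DATA_SECTION pragma) for this to pay off.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINES_PER_BLOCK 4     /* assumption: tune so the buffers fit on-chip */
#define MAX_WIDTH       1920  /* assumption: widest supported line */

/* Hypothetical DMA interface. On the target, start would queue a transfer
   and wait_all would block until all queued transfers finish. Here memcpy
   stands in, so the copies are synchronous. */
static void dma_copy_start(void *dst, const void *src, size_t n) { memcpy(dst, src, n); }
static void dma_wait_all(void) { }

/* Ping-pong input buffers plus one output buffer (meant for internal RAM). */
static uint8_t in_buf[2][LINES_PER_BLOCK * MAX_WIDTH * 2];
static uint8_t out_buf[LINES_PER_BLOCK * MAX_WIDTH * 3];

static inline uint8_t clamp_q8(int32_t v)
{
    if (v < 0) return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* BT.601 fixed-point conversion (coefficients x256) of Cb Y0 Cr Y1 groups. */
static void convert_pairs(const uint8_t *s, uint8_t *d, int32_t pairs)
{
    for (int32_t p = 0; p < pairs; p++, s += 4, d += 6) {
        int32_t cb = s[0] - 128, cr = s[2] - 128;
        int32_t rc = 409 * cr + 128;
        int32_t gc = -100 * cb - 208 * cr + 128;
        int32_t bc = 516 * cb + 128;
        for (int k = 0; k < 2; k++) {          /* k = 0 uses Y0, k = 1 uses Y1 */
            int32_t y = 298 * (s[1 + 2 * k] - 16);
            d[3 * k]     = clamp_q8(y + rc);
            d[3 * k + 1] = clamp_q8(y + gc);
            d[3 * k + 2] = clamp_q8(y + bc);
        }
    }
}

/* Returns 0 on success, -1 if width is odd, non-positive or too wide. */
int ycbcr2rgb_blocked(const uint8_t *src, int32_t width, int32_t height, uint8_t *dst)
{
    if (width <= 0 || width > MAX_WIDTH || (width & 1)) return -1;
    size_t in_line = (size_t)width * 2, out_line = (size_t)width * 3;
    int cur = 0;

    /* Prime the pipeline by fetching the first block. */
    int32_t lines = height < LINES_PER_BLOCK ? height : LINES_PER_BLOCK;
    if (lines > 0) dma_copy_start(in_buf[cur], src, lines * in_line);

    for (int32_t first = 0; first < height; first += LINES_PER_BLOCK) {
        lines = height - first < LINES_PER_BLOCK ? height - first : LINES_PER_BLOCK;
        dma_wait_all();  /* current block has arrived; out_buf is free again */

        int32_t next = first + LINES_PER_BLOCK;
        if (next < height) {  /* fetch the next block while this one is converted */
            int32_t nlines = height - next < LINES_PER_BLOCK ? height - next : LINES_PER_BLOCK;
            dma_copy_start(in_buf[cur ^ 1], src + next * in_line, nlines * in_line);
        }
        convert_pairs(in_buf[cur], out_buf, lines * (width / 2));
        dma_copy_start(dst + first * out_line, out_buf, lines * out_line);
        cur ^= 1;
    }
    dma_wait_all();
    return 0;
}
```

DMA mainly pays off once the per-pixel arithmetic is cheap; while the loop is dominated by emulated floating-point operations, faster memory access is unlikely to change the total time much.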
Best regards,
Alan