dm6437 YUV422 to RGB888, problem

Bing Lee

Other Parts Discussed in Thread: TVP5150

Hello, everyone,

I am doing image processing on the DM6437.

I stored one frame in DDR and want to convert it from YUV (4:2:2) to RGB888, but when the image is displayed on the screen, it looks like this:

The code I used is as follows:

Int16 MedianFilter_test( )

{

int height = 480;

int width = 720;

tvp5150_init();

vpfe_init( 0x82000000, 720, 480);

_wait(3000000);

VPFE_CCDC_PCR=0x0000000;

for(i = 0; i < height; i++)

{

k = width*i*2;

l = width*i*2;

for(j = 0; j < width/2; j++)

{

cb = *((Uint8*)0x82000000 + 2 + 4*j + k);

y0 = *((Uint8*)0x82000000 + 1 + 4*j + k);

y1 = *((Uint8*)0x82000000 + 3 + 4*j + k);

cr = *((Uint8*)0x82000000 + 4*j + k);

r0 = (1.00000 * y0) + (0.00000 * cb) + (1.40200 * cr);

g0 = (1.00000 * y0) - (0.34414 * cb) - (0.71444 * cr);

b0 = (1.00000 * y0) + (1.72200 * cb) + (0.00000 * cr);

r1 = (1.00000 * y1) + (0.00000 * cb) + (1.40200 * cr);

g1 = (1.00000 * y1) - (0.34414 * cb) - (0.71444 * cr);

b1 = (1.00000 * y1) + (1.72200 * cb) + (0.00000 * cr);

if(r0 > 255) r0 = 255;

else if(r0 < 0) r0 = 0;

if(g0 > 255) g0 = 255;

else if(g0 < 0) g0 = 0;

if(b0 > 255) b0 = 255;

else if(b0 < 0) b0 = 0;

if(r1 > 255) r1 = 255;

else if(r1 < 0) r1 = 0;

if(g1 > 255) g1 = 255;

else if(g1 < 0) g1 = 0;

if(b1 > 255) b1 = 255;

else if(b1 < 0) b1 = 0;

*((Uint8*)0x83000000 + 6*j + l) = r0;

*((Uint8*)0x83000000 + 1 + 6*j + l) = g0;

*((Uint8*)0x83000000 + 2 + 6*j + l) = b0;

*((Uint8*)0x83000000 + 3 + 6*j + l) = r1;

*((Uint8*)0x83000000 + 4 + 6*j + l) = g1;

*((Uint8*)0x83000000 + 5 + 6*j + l) = b1;

}

vpbe_init( 0x83000000, 720, 480,0); // Setup Back-End

return 0;

}

Could anyone help me?

Thank you very much!

Bing Lee

over 14 years ago

0 AaronYang over 14 years ago

Intellectual 320 points

It seems that you have made a mistake in the color channels. YUV422 has many storage formats, such as planar, semi-planar, interleaved, and so on.

Why don't you use the VLIB library to do such things?

0 SANG-YONG over 14 years ago

TI__Intellectual 1870 points

I expect you need to remove the offset from Cb and Cr.

Instead of cb and cr, please use (cb-128) and (cr-128).

Also, to check the order of the data, you can set the coefficients for Cb and Cr to 0 and check whether you get a proper grey image. You also need to check whether Y0 and Y1 are swapped. You can do a similar thing to check whether Cb and Cr are swapped.
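A minimal sketch of the offset and the grey-image check described above, assuming the full-range form of the equations used in the original post. The function names, the `use_chroma` flag, and the `stdint.h` types (standing in for `Uint8`) are illustrative and not from the thread. The coefficients are the usual full-range BT.601 values (0.71414 and 1.772), which differ slightly from the 0.71444 and 1.72200 typed in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate result to the 0..255 range of one RGB888 channel,
 * rounding to the nearest integer. */
static uint8_t clamp_u8(float v)
{
    if (v < 0.0f)   return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/* Convert one Y/Cb/Cr sample to R, G, B.
 * Cb and Cr are stored with a +128 bias, so the bias is removed first.
 * With use_chroma == 0 the chroma terms are dropped, which should give
 * a clean grey image if the Y bytes are being read from the right place. */
static void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                         int use_chroma, uint8_t rgb[3])
{
    float d_cb = use_chroma ? (float)cb - 128.0f : 0.0f;
    float d_cr = use_chroma ? (float)cr - 128.0f : 0.0f;

    assert(rgb != NULL);

    rgb[0] = clamp_u8((float)y + 1.402f * d_cr);                     /* R */
    rgb[1] = clamp_u8((float)y - 0.34414f * d_cb - 0.71414f * d_cr); /* G */
    rgb[2] = clamp_u8((float)y + 1.772f * d_cb);                     /* B */
}
```

If the grey image looks right but the colors are still wrong with `use_chroma` set to 1, the remaining suspects are the Cb/Cr byte positions.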

regards,

Sang-Yong

0 Alan Yi over 14 years ago in reply to SANG-YONG

Intellectual 410 points

Dear all:

Basically, most image processing is done in RGB888, so the first step is format conversion from YCbCr422 to RGB888. Why is the efficiency so low when we convert from YCbCr422 to RGB888? The conversion takes almost 1 second, so it is far from usable in a real-time application. We tried the optimization levels in the build options, but the improvement was limited. Does anyone have a suggestion for improving the efficiency? Below is the demo code.

void ycbcr2rgb(Uint8* src, Int32 width, Int32 height, Uint8* des)

{

Int32 byte_count_line_rgb= width*3;

Int32 byte_count_line_yuv=width*2;

Int32 i,j,k;

Uint8 temp[4];

float p1 , p2 , p3;

for(j=0 ; j <height ; j++) //convert from DDR2 buffer_in to buffer_out

{

k=0;

for(i=0; i< byte_count_line_yuv ; i=i+4 )//1 line

{

temp[0] = *( Uint8*)(src+j*byte_count_line_yuv+i); //cb0

temp[1] = *( Uint8*)(src+j*byte_count_line_yuv+i+1); //y0

temp[2] = *( Uint8*)(src+j*byte_count_line_yuv+i+2); //cr0

temp[3] = *( Uint8*)(src+j*byte_count_line_yuv+i+3); //y1

p1=(temp[1]-16)*1.164+(temp[2]-128)*1.596; //b0

p2=(temp[1]-16)*1.164-(temp[2]-128)*0.813-(temp[0]-128)*0.392; //g0

p3=(temp[1]-16)*1.164+(temp[0]-128)*2.017; //r0

*(Uint8*)(des+j*byte_count_line_rgb+k)=(Uint8)p1 ;//b0;

*(Uint8*)(des+j*byte_count_line_rgb+k+1)=(Uint8)p2 ;//g0;

*(Uint8*)(des+j*byte_count_line_rgb+k+2)=(Uint8)p3 ;//r0;

p1=(temp[3]-16)*1.164+(temp[2]-128)*1.596; //b1

p2=(temp[3]-16)*1.164-(temp[2]-128)*0.813-(temp[0]-128)*0.392; //g1

p3=(temp[3]-16)*1.164+(temp[0]-128)*2.017; //r1

*(Uint8*)(des+j*byte_count_line_rgb+k)=(Uint8)p1 ;//b1;

*(Uint8*)(des+j*byte_count_line_rgb+k+1)=(Uint8)p2 ;//g1;

*(Uint8*)(des+j*byte_count_line_rgb+k+2)=(Uint8)p3 ;//r1;

k=k+6;

}

Best regards,

Alan

0 SANG-YONG over 14 years ago in reply to Alan Yi

TI__Intellectual 1870 points

You may need to optimize the code more. You can also approximate the floating-point operations, for example by pre-defining integer coefficients like (int)(1.164 x 256 + 0.5) and using >>8 during the operation. But I guess the main bottleneck comes from memory read/write.
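A rough sketch of the pre-scaled coefficient idea, using the limited-range equations from the demo code above. The names `C_Y`, `C_RV`, `C_GV`, `C_GU`, `C_BU`, and `ycbcr_to_rgb_q8` are made up for illustration. The channels are named by the standard BT.601 assignment, where 1.596 scales Cr for red and 2.017 scales Cb for blue. The code assumes that `>>` on a negative value is an arithmetic shift, which is implementation-defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Coefficients multiplied by 256 (8 fractional bits), rounded to nearest. */
static const int32_t C_Y  = (int32_t)(1.164 * 256 + 0.5);  /* 298 */
static const int32_t C_RV = (int32_t)(1.596 * 256 + 0.5);  /* 409 */
static const int32_t C_GV = (int32_t)(0.813 * 256 + 0.5);  /* 208 */
static const int32_t C_GU = (int32_t)(0.392 * 256 + 0.5);  /* 100 */
static const int32_t C_BU = (int32_t)(2.017 * 256 + 0.5);  /* 516 */

/* Clamp an integer result to one 8-bit channel. */
static uint8_t clamp_u8(int32_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Integer-only conversion of one sample. The "+ 128" rounds to nearest
 * before the ">> 8" removes the 8 fractional bits. */
static void ycbcr_to_rgb_q8(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
    int32_t yy = C_Y * ((int32_t)y - 16);
    int32_t db = (int32_t)cb - 128;
    int32_t dr = (int32_t)cr - 128;

    assert(rgb != NULL);

    rgb[0] = clamp_u8((yy + C_RV * dr + 128) >> 8);             /* R */
    rgb[1] = clamp_u8((yy - C_GV * dr - C_GU * db + 128) >> 8); /* G */
    rgb[2] = clamp_u8((yy + C_BU * db + 128) >> 8);             /* B */
}
```

With 8 fractional bits the largest intermediate value is well under 2^18, so 32-bit arithmetic has plenty of headroom.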

You need to use DMA if you are not using it now. I hope other experts chime in if you don't know how to use DMA.

regards,

Sang-Yong

0 Alan Yi over 14 years ago in reply to SANG-YONG

Intellectual 410 points

Dear Sang-Yong:

To summarize what you mentioned above, there are several ways to reduce the processing time:

1. Use pre-defined coefficients

2. Use the >> operator

3. Use DMA

As we understand it, items 1 and 2 reduce the computation time. But how can DMA relieve the bottleneck on memory read/write? We found DMA complicated and hard to use flexibly. Could you explain it in more detail?

We are starting to doubt whether the DM6437 can be used for real-time image processing. Do you have any suggestions?

Best regards,

Alan

0 SANG-YONG over 14 years ago in reply to Alan Yi

TI__Intellectual 1870 points

Alan,

I am not an expert on DMA use.

Could you post a new question about DMA, so it can be answered by a different owner?

regards,

Sang-Yong

0 Alan Yi over 14 years ago in reply to SANG-YONG

Intellectual 410 points

Dear Sang-Yong:

Thanks, anyway.

Best regards,

Alan

0 Norman Wong over 14 years ago in reply to Alan Yi

Guru 26430 points

Alan,

Here's a possible hand-optimized version of your code. The main objective is to avoid doing the same computation more than once. The compiler is probably doing the same already. You could probably trade off accuracy for speed by using fixed-point math. It depends on your processor.

void ycbcr2rgb
( const Uint8 *src,   /* Input in YCbCr format. No alignment limitation. */
int          width, /* Width in pixels */
int          height,/* Height in pixels */
Uint8       *des    /* Output in RGB888 format. No alignment limitation. */
)
{
int   i;
int   j;

int   icb;
int   iy0;
int   icr;
int   iy1;

float fcb;
float fy0;
float fcr;
float fy1;

float frc;
float fgc;
float fbc;

float fr;
float fg;
float fb;

width /= 2; /* Width now means count of two pixels. */

for(j=0; j <height; j++)
{
     for(i=0; i< width; i++) //1 line
     {
       /* Read out two pixels cb,y0,cr,y1 */
       icb = *src++;
       iy0 = *src++;
       icr = *src++;
       iy1 = *src++;

       /* Offset values */
       icb -= 128;
       iy0 -= 16;
       icr -= 128;
       iy1 -= 16;

       /* Convert to float */
       fcb = (float)icb;
       fy0 = (float)iy0;
       fcr = (float)icr;
       fy1 = (float)iy1;

       /* Scale the Y values */
       fy0 *= 1.164F;
       fy1 *= 1.164F;

       /* Calc chroma values for RGB */
       fbc = fcr*1.596F;
       fgc = fcr*0.813F + fcb*0.392F;
       frc = fcb*2.017F;

       /* Calc first RGB pixel */
       fb = fy0 + fbc;
       fg = fy0 - fgc;
       fr = fy0 + frc;

       /* Store first RGB pixel. */
      *des++ = (Uint8)fb;
      *des++ = (Uint8)fg;
      *des++ = (Uint8)fr;

       /* Calc second RGB pixel */
       fb = fy1 + fbc;
       fg = fy1 - fgc;
       fr = fy1 + frc;

       /* Store second RGB pixel */
      *des++ = (Uint8)fb;
      *des++ = (Uint8)fg;
      *des++ = (Uint8)fr;
    }
}
}

A note about your code: the second RGB value is overwriting the first value. The indices should be 3, 4, 5, e.g.:

                *(Uint8*)(des+j*byte_count_line_rgb+k+3)=(Uint8)p1 ;//b1;
                *(Uint8*)(des+j*byte_count_line_rgb+k+4)=(Uint8)p2 ;//g1;
                *(Uint8*)(des+j*byte_count_line_rgb+k+5)=(Uint8)p3 ;//r1;

No guarantees it will work.

0 Alan Yi over 14 years ago in reply to Norman Wong

Intellectual 410 points

Dear Norman:

Thanks for your suggestion. I'll try it. By the way, do you have any thoughts on using the DM6437 for image processing? For example, is it suitable for image recognition tasks such as lane departure warning (LDW) and forward collision warning (FCW)?

You are right about the error in the code. I made a mistake while copying it, thanks.

Best regards,

Alan

0 Alan Yi over 14 years ago in reply to Alan Yi

Intellectual 410 points

Dear Norman:

After trying it, we found the improvement limited. Just as Sang-Yong said, the bottleneck is probably memory read/write, so we should focus on DMA. Still, we appreciate you chiming in, thanks!

Best regards,

Alan

0 Norman Wong over 14 years ago in reply to Alan Yi

Guru 26430 points

Sorry, I don't know anything about the DM6437 or image processing. I am not quite sure how DMA can help with the calculation part. DMA should help with moving data from memory to a peripheral. As far as I can tell, the DM6437 does not have a floating-point unit. Software floating-point operations are very expensive. Fixed-point code is a bit tricky. Get it wrong and you'll get weird pictures. Here's some totally untested code to illustrate fixed-point math.

/* Fixed point 32 bit = 16 bits whole + 16 bits fraction*/
void ycbcr2rgb
( const Uint8 *src,   /* Input in YCbCr format. No alignment limitation. */
int          width, /* Width in pixels */
int          height,/* Height in pixels */
Uint8       *des    /* Output in RGB888 format. No alignment limitation. */
)
{
int   i;
int   j;

Int32 icb;
Int32 iy0;
Int32 icr;
Int32 iy1;

Int32 irc;
Int32 igc;
Int32 ibc;

Int32 ir;
Int32 ig;
Int32 ib;

const Int32 k1_164 = 0x000129FB; /* 1.164 */
const Int32 k1_596 = 0x00019892; /* 1.596 */
const Int32 k0_813 = 0x0000D01F; /* 0.813 */
const Int32 k0_392 = 0x00006459; /* 0.392 */
const Int32 k2_017 = 0x0002045A; /* 2.017 */
const Int32 k128   = 0x00800000; /* 128 */
const Int32 k16    = 0x00100000; /* 16 */

width /= 2; /* Width now means count of two pixels. */

for(j=0; j <height; j++)
{
     for(i=0; i< width; i++) //1 line
     {
       /* Read out two pixels cb,y0,cr,y1 */
       icb = *src++;
       iy0 = *src++;
       icr = *src++;
       iy1 = *src++;

       /* Convert from int to fixed-point */
       icb <<= 16;
       iy0 <<= 16;
       icr <<= 16;
       iy1 <<= 16;

       /* Offset values */
       icb -= k128;
       iy0 -= k16;
       icr -= k128;
       iy1 -= k16;

       /* Scale the Y values */
       iy0 *= k1_164;
       iy1 *= k1_164;

       /* Calc chroma values for RGB */
       ibc = (icr*k1_596)>>16;
       igc = ((icr*k0_813)>>16) + ((icb*k0_392)>>16);
       irc = (icb*k2_017)>>16;

       /* Calc first RGB pixel */
       ib = iy0 + ibc;
       ig = iy0 - igc;
       ir = iy0 + irc;

       /* Convert from fixed point to int */
       ib >>= 16;
       ig >>= 16;
       ir >>= 16;

       /* Bound to range 0-255 */
       if(ib < 0) ib = 0; else if(ib > 255) ib = 255;
       if(ig < 0) ig = 0; else if(ig > 255) ig = 255;
       if(ir < 0) ir = 0; else if(ir > 255) ir = 255;

       /* Store first RGB pixel. */
      *des++ = (Uint8)ib;
      *des++ = (Uint8)ig;
      *des++ = (Uint8)ir;

       /* Calc second RGB pixel */
       ib = iy1 + ibc;
       ig = iy1 - igc;
       ir = iy1 + irc;

       /* Convert to int */
       ib >>= 16;
       ig >>= 16;
       ir >>= 16;

       /* Bound to range 0-255 */
       if(ib < 0) ib = 0; else if(ib > 255) ib = 255;
       if(ig < 0) ig = 0; else if(ig > 255) ig = 255;
       if(ir < 0) ir = 0; else if(ir > 255) ir = 255;

       /* Store second RGB pixel */
      *des++ = (Uint8)ib;
      *des++ = (Uint8)ig;
      *des++ = (Uint8)ir;
    }
}
}

0 Alan Yi over 14 years ago in reply to Norman Wong

Intellectual 410 points

Dear Norman:

One important thing you mentioned is fixed-point processing. The other is the large amount of image data, which needs to be moved from DDR2 to internal cache for processing, with the result then moved back to DDR2. So we want to know how to use DMA to accelerate the format conversion.
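The DDR2-to-internal-memory flow described above can be outlined as line-by-line block processing. This is only a sketch under stated assumptions: `memcpy` stands in for the DMA transfers (a real DM6437 implementation would program the EDMA controller and would usually overlap transfers with computation using ping-pong buffers), the static arrays stand in for buffers placed in on-chip RAM, and `convert_line`, `process_frame`, and `MAX_WIDTH` are hypothetical names. It shows the data flow only and makes no claim about speed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_WIDTH 720  /* widest line the working buffers can hold */

/* Stand-ins for working buffers placed in fast on-chip memory. */
static uint8_t line_in[MAX_WIDTH * 2];   /* one line of packed 4:2:2 */
static uint8_t line_out[MAX_WIDTH * 3];  /* one line of 3 bytes per pixel */

/* Placeholder per-line kernel. To keep the sketch short it copies luma
 * into all three output bytes (grey); the real YCbCr to RGB conversion
 * would go here. Input byte order is assumed to be Cb Y0 Cr Y1, so the
 * Y samples sit at odd byte offsets. */
static void convert_line(const uint8_t *in, uint8_t *out, int width)
{
    int i;
    for (i = 0; i < width; i++) {
        uint8_t y = in[2 * i + 1];
        out[3 * i]     = y;
        out[3 * i + 1] = y;
        out[3 * i + 2] = y;
    }
}

/* Process a frame one line at a time: fetch a line into the working
 * buffer, convert it there, then write the result back out. */
static void process_frame(const uint8_t *src, uint8_t *dst,
                          int width, int height)
{
    int row;

    assert(width > 0 && width <= MAX_WIDTH && width % 2 == 0);

    for (row = 0; row < height; row++) {
        /* "DMA in": external memory to working buffer */
        memcpy(line_in, src + (size_t)row * width * 2, (size_t)width * 2);

        convert_line(line_in, line_out, width);

        /* "DMA out": working buffer back to external memory */
        memcpy(dst + (size_t)row * width * 3, line_out, (size_t)width * 3);
    }
}
```

The point of the structure is that the inner kernel only ever touches the small working buffers, so the slow external accesses are confined to bulk transfers that a DMA engine could perform in the background.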

By the way, the last code you provided fails to display correctly; maybe the calculated result is wrong. We are not familiar with fixed-point math, so we do not know what is wrong with the code. Do you have any comments?

Best regards,

Alan

0 Norman Wong over 14 years ago in reply to Alan Yi

Guru 26430 points

DSP architecture is still new to me. On the ARM side, the data would remain in DDR2.

Fixed-point math is tricky because of possible overflow or underflow in the calculations. I arbitrarily chose a 16.16 format for the example. I've found 22.10 works for YCbCr to RGB conversion on past projects. But it depends on the calculations and the range of the numbers involved. My YCbCr format, RGB format, and conversion were different from yours, so I can't say exactly what the correct precision is. You could try 20.12 to see if it makes any difference. Change all the shifts from 16 to 12. Shift all the constants right by 4 bits.
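One likely reason the 16.16 listing misbehaves: after `<<= 16` a sample is already close to 2^24, and multiplying it by a 16.16 coefficient such as 0x000129FB overflows a 32-bit integer. The scaled Y terms are also never shifted back by 16, while the chroma terms are. A hedged sketch that sidesteps both problems keeps the samples as plain integers and scales only the coefficients, here with 10 fractional bits as in the 22.10 suggestion. `FRAC_BITS`, `FX`, `fx_to_u8`, and `ycbcr2rgb_fixed` are illustrative names, and this has not been tried on a DM6437. Bytes are written in the same order as the earlier listings; the variable names follow the standard BT.601 assignment, where 1.596 scales Cr for red and 2.017 scales Cb for blue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAC_BITS 10                        /* 22.10 fixed point */
#define FX(x)     ((int32_t)((x) * (1 << FRAC_BITS) + 0.5))
#define FX_HALF   (1 << (FRAC_BITS - 1))    /* 0.5, for rounding */

/* Round a fixed-point value to an integer and clamp it to 0..255.
 * Assumes >> on a negative value is an arithmetic shift. */
static uint8_t fx_to_u8(int32_t v)
{
    v = (v + FX_HALF) >> FRAC_BITS;
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* src: packed Cb Y0 Cr Y1, 2 bytes per pixel; des: 3 bytes per pixel. */
void ycbcr2rgb_fixed(const uint8_t *src, int width, int height, uint8_t *des)
{
    const int32_t k1_164 = FX(1.164);
    const int32_t k1_596 = FX(1.596);
    const int32_t k0_813 = FX(0.813);
    const int32_t k0_392 = FX(0.392);
    const int32_t k2_017 = FX(2.017);
    int pairs = width / 2;   /* each 4-byte group holds two pixels */
    int i, j;

    assert(src != NULL && des != NULL && width % 2 == 0);

    for (j = 0; j < height; j++) {
        for (i = 0; i < pairs; i++) {
            /* Samples stay plain integers; only the products are scaled. */
            int32_t cb = (int32_t)*src++ - 128;
            int32_t y0 = ((int32_t)*src++ - 16) * k1_164;
            int32_t cr = (int32_t)*src++ - 128;
            int32_t y1 = ((int32_t)*src++ - 16) * k1_164;

            /* Chroma contributions are shared by both pixels. */
            int32_t rc = cr * k1_596;
            int32_t gc = cr * k0_813 + cb * k0_392;
            int32_t bc = cb * k2_017;

            *des++ = fx_to_u8(y0 + rc);
            *des++ = fx_to_u8(y0 - gc);
            *des++ = fx_to_u8(y0 + bc);

            *des++ = fx_to_u8(y1 + rc);
            *des++ = fx_to_u8(y1 - gc);
            *des++ = fx_to_u8(y1 + bc);
        }
    }
}
```

With 10 fractional bits the largest intermediate magnitude stays below 2^20, so nothing comes close to the 32-bit limit. Changing `FRAC_BITS` to 12 gives the 20.12 variant mentioned above.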

The need for real time sounds like you are streaming from sensor to display. Maybe reduce the image size to a minimum. Decimate your image before the conversion. Sometimes the sensor or LCD has both YCbCr and RGB options. I've seen some HW where the sensor can be directly connected to the LCD.

0 Alan Yi over 14 years ago in reply to Norman Wong

Intellectual 410 points

Dear Norman:

Thanks for your good suggestions, we will keep them in mind.

Best regards,

Alan
