Hi,
I am implementing computer vision algorithms on the DM6446 processor, specifically on the DM6446 EVM.
I am very pleased with the C64x+ DSP; unfortunately, I can't say the same about the ARM side.
The time it takes just to read the data and display it is huge, and I really mean huge.
Basically, I read images from the camera, send them to the DSP, get 3 images back and display them on the LCD.
I am using DMAI.
Here are some examples:
/* Get a captured buffer from the capture device driver */
if (Capture_get(hCapture, &cBuf) < 0) {
    printf("Failed to get capture buffer\n");
    goto cleanup;
}   /* takes 45 us */
On the other hand:
/* Get a display buffer from the display device driver */
if (Display_get(hDisplay, &dBuf) < 0) {
    printf("Failed to get display buffer\n");
    goto cleanup;
}   /* takes 36464 us */
Why does it take so long to get a buffer from the display? I have configured 3 buffers for the display.
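For reference, the microsecond figures in this post can be collected with a small wall-clock helper like the one below. This is a minimal sketch assuming a Linux userspace on the ARM; `elapsed_us` is a hypothetical helper, not part of DMAI. Note that gettimeofday() measures wall-clock time, so any time a call spends blocked inside the driver is included in the result.

```c
#include <sys/time.h>

/* Hypothetical helper (not part of DMAI): microseconds elapsed between
 * two gettimeofday() samples. Wall-clock time, so time spent blocked in
 * a driver call is counted too. */
static long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (long)(end->tv_sec - start->tv_sec) * 1000000L
         + (long)(end->tv_usec - start->tv_usec);
}
```

Usage: call gettimeofday(&t0, NULL) right before the call under test (for example Display_get(hDisplay, &dBuf)) and gettimeofday(&t1, NULL) right after it, then print elapsed_us(&t0, &t1). clock_gettime(CLOCK_MONOTONIC, ...) is an alternative that is not affected by system clock adjustments.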
Next, I downscale the image by 2 in each dimension and extract only the Y channel with this function:
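char ExtractReduce_Y(unsigned char* p_src_img, unsigned char* p_dst_img, int src_w, int src_h, int bpp)
{
    /* Reject a NULL source or destination, and in-place operation */
    if (!p_src_img || !p_dst_img)
        return 1;
    if (p_src_img == p_dst_img)
        return 1;
    int i = 0;
    int j = 0;
    int k = 0;
    int width_step = src_w * bpp;   /* bytes per source row */
    /* Keep every other row; with bpp = 2 (packed UYVY 4:2:2), byte 1 of
       each 4-byte pixel pair is luma, so this keeps every other pixel */
    for (j = 0; j < src_h; j += 2) {
        for (i = 1; i < width_step; i += 4) {
            p_dst_img[k] = p_src_img[i + j * width_step];
            k++;
        }
    }
    return 0;
}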
For a 720x576 source image this takes 25458 us.
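A minimal sketch of an equivalent variant that computes each row's start address once instead of evaluating `i + j * width_step` for every output byte. It assumes, as the loop above implies, a packed UYVY 4:2:2 source with bpp = 2; `extract_reduce_y_rows` is a hypothetical name, and whether it is measurably faster on the ARM926 depends on the compiler and, above all, on whether the capture buffer is cached.

```c
#include <stddef.h>

/* Hypothetical variant of ExtractReduce_Y (not from the original code).
 * Assumes a packed UYVY source (bpp = 2): byte 1 of every 4-byte
 * U Y0 V Y1 group is Y0. Keeps every other row and every other pixel,
 * so the output is (src_w / 2) x (src_h / 2) bytes, e.g. 720x576 -> 360x288. */
static int extract_reduce_y_rows(const unsigned char *src, unsigned char *dst,
                                 int src_w, int src_h, int bpp)
{
    if (src == NULL || dst == NULL || src == dst)
        return 1;

    const int width_step = src_w * bpp;           /* bytes per source row */

    for (int j = 0; j < src_h; j += 2) {
        /* Row start computed once per row instead of once per pixel */
        const unsigned char *row = src + (size_t)j * width_step;
        for (int i = 1; i < width_step; i += 4)
            *dst++ = row[i];                      /* Y0 of each pixel pair */
    }
    return 0;
}
```

The output is byte-for-byte identical to the original loop; only the address arithmetic changes.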
Then I call the DSP.
Next, I want to display the DSP output.
So I take the 360x288 single-channel image, convert it to YCbCr 4:2:2 and copy it into the display buffer with these two functions:
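char Gray1CToYCrCb(unsigned char* p_src_img, unsigned char* p_dst_img, int size)
{
    /* Reject a NULL source or destination, and in-place operation */
    if (!p_src_img || !p_dst_img)
        return 1;
    if (p_src_img == p_dst_img)
        return 1;
    int i;
    /* Each gray pixel becomes 2 bytes: chroma = 128 (neutral), then luma */
    for (i = 0; i < size; i++) {
        p_dst_img[2 * i]     = 128;
        p_dst_img[2 * i + 1] = p_src_img[i];
    }
    return 0;
}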
#include <string.h>   /* memcpy */

char PutImageLCD(unsigned char* p_src_img, int src_w, int src_h, unsigned char* p_img_lcd, int lcd_w, int lcd_h, int x, int y, int bpp)
{
    /* bpp: bytes per pixel; lcd_h is currently unused */
    if (!p_src_img || !p_img_lcd)
        return 1;
    int i;
    int lcd_width_nbytes = lcd_w * bpp;                                         /* LCD row width in bytes */
    unsigned char* p_img_lcd_aux = p_img_lcd + x * bpp + lcd_width_nbytes * y;  /* top-left corner of the destination window */
    int src_img_width_nbytes = src_w * bpp;                                     /* source row width in bytes */
    for (i = 0; i < src_h; i++) {
        memcpy(p_img_lcd_aux, p_src_img, src_img_width_nbytes);   /* copy one horizontal line */
        p_img_lcd_aux += lcd_width_nbytes;                          /* advance to the next LCD line */
        p_src_img += src_img_width_nbytes;                          /* advance to the next source line */
    }
    return 0;
}
Together, these two functions take 25445 us.
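One idea, sketched here under assumptions rather than tested on the EVM: fuse the conversion and the copy into a single pass so the intermediate 2-bytes-per-pixel image is never written and then re-read. It assumes the LCD buffer is packed 4:2:2 with 2 bytes per pixel, matching what Gray1CToYCrCb produces; `put_gray_as_uyvy` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical fused version of Gray1CToYCrCb + PutImageLCD (not from the
 * original code). Assumes a packed 4:2:2 LCD buffer with 2 bytes per pixel
 * and writes each gray pixel as (chroma = 128, luma = gray) directly at
 * position (x, y), with no intermediate buffer. */
static int put_gray_as_uyvy(const unsigned char *gray, int src_w, int src_h,
                            unsigned char *lcd, int lcd_w, int x, int y)
{
    if (gray == NULL || lcd == NULL)
        return 1;

    const size_t lcd_stride = (size_t)lcd_w * 2;          /* bytes per LCD row */
    unsigned char *row = lcd + (size_t)y * lcd_stride + (size_t)x * 2;

    for (int r = 0; r < src_h; r++) {
        for (int c = 0; c < src_w; c++) {
            row[2 * c]     = 128;                         /* neutral chroma */
            row[2 * c + 1] = gray[c];                     /* luma */
        }
        gray += src_w;                                    /* next gray row */
        row  += lcd_stride;                               /* next LCD row */
    }
    return 0;
}
```

The bytes written are the same as calling Gray1CToYCrCb followed by PutImageLCD with bpp = 2; the difference is one write pass over the display buffer instead of a write, a re-read and a second write.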
My question is: am I missing something, or are these times normal?
Are these buffers being cached?
It seems odd that the ARM can't handle even the capture and display part in real time.
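To put numbers on "real time": assuming the 720x576 input is PAL video at 25 frames/s (an assumption, since the frame rate is not stated above), the per-frame budget compared with the ARM-side times measured above, excluding the DSP call, is:

$$
\begin{aligned}
T_{\text{frame}} &= \frac{1}{25\ \text{Hz}} = 40\,000\ \mu\text{s} \\
T_{\text{ARM}} &= 45 + 36\,464 + 25\,458 + 25\,445 = 87\,412\ \mu\text{s} \approx 2.2\,T_{\text{frame}} \\
T_{\text{ARM}} - T_{\text{Display\_get}} &= 87\,412 - 36\,464 = 50\,948\ \mu\text{s} > T_{\text{frame}}
\end{aligned}
$$

So even without the Display_get wait, the two pixel loops alone exceed one frame period.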
Any pointers on how I can bring these processing times down, so that my system runs in real time, would be highly appreciated.
Thanks in advance.
Filipe Alves
P.S. These timings are from a Release build with -O2 optimization.