DM6446 ARM side way too slow

Hi,

 

I am implementing computer vision algorithms on the DM6446 processor, specifically on the DM6446 EVM.

I am very pleased with the C64x+ DSP; unfortunately, I can't say the same about the ARM side.

The time it takes just to read the data and display it is huge, and I really mean huge.

Basically, I read images from the camera, send them to the DSP, get 3 images back and display them on the LCD.

I am using DMAI.

Let's see some examples:

 

        /* Get a captured buffer from the capture device driver */
        if (Capture_get(hCapture, &cBuf) < 0) {
            printf("Failed to get capture buffer\n");
            goto cleanup;
        } /* -> takes 45 us */

On the other hand:

        /* Get a display buffer from the display device driver */
        if (Display_get(hDisplay, &dBuf) < 0) {
            printf("Failed to get display buffer\n");
            goto cleanup;
        } /* -> takes 36464 us */

 

Why is it taking so long to grab the buffer from the display? I have defined 3 buffers for the display.

 

Next, I reduce the image size and extract only the Y channel with this function:

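char ExtractReduce_Y(unsigned char *p_src_img, unsigned char *p_dst_img, int src_w, int src_h, int bpp)
{
    /* Reject the call if either pointer is NULL */
    if (!p_src_img || !p_dst_img)
        return 1;

    /* In-place operation is not supported */
    if (p_src_img == p_dst_img)
        return 1;

    int i = 0;
    int j = 0;
    int k = 0;
    int width_step = src_w * bpp;   /* bytes per source line */

    /* Take every second line, and byte 1 of every 4-byte group in it */
    for (j = 0; j < src_h; j += 2) {
        for (i = 1; i < width_step; i += 4) {
            p_dst_img[k] = p_src_img[i + j * width_step];
            k++;
        }
    }
    return 0;
}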

For an original image of 720x576 pixels, it takes 25458 us.
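To make the indexing concrete, here is a minimal sketch of the same reduction on a tiny 4x2 frame. It assumes the capture buffer is packed UYVY (U Y0 V Y1 ...) with 2 bytes per pixel, which the post does not state explicitly. The names `extract_reduce_y` and `extract_reduce_y_demo` are hypothetical and not part of DMAI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical re-statement of the reduction: from every second line, keep
 * byte 1 of each 4-byte group. Assumes packed UYVY input with bpp == 2. */
int extract_reduce_y(const unsigned char *src, unsigned char *dst,
                     int src_w, int src_h, int bpp)
{
    if (src == NULL || dst == NULL || src == dst)
        return 1;

    const int width_step = src_w * bpp;            /* bytes per source line */
    for (int row = 0; row < src_h; row += 2) {
        const unsigned char *line = src + (size_t)row * width_step;
        for (int i = 1; i < width_step; i += 4)
            *dst++ = line[i];
    }
    return 0;
}

/* Tiny 4x2 frame: each line is U Y0 V Y1 U Y2 V Y3 (assumed UYVY). */
void extract_reduce_y_demo(void)
{
    const unsigned char src[16] = {
        100, 10, 200, 11, 101, 12, 201, 13,        /* line 0 */
        102, 20, 202, 21, 103, 22, 203, 23         /* line 1 (skipped) */
    };
    unsigned char dst[2] = { 0, 0 };

    assert(extract_reduce_y(src, dst, 4, 2, 2) == 0);
    assert(dst[0] == 10 && dst[1] == 12);          /* Y0 and Y2 of line 0 */
}
```

Under that assumption, a 720x576 frame shrinks to 360x288 luma samples, which matches the image size mentioned later in the post.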


Then I call the DSP.

Next, I want to display the DSP output.

So I take the 360x288-pixel, single-channel image, convert it to YCrCb and send it to the display with these two functions.

 

 

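char Gray1CToYCrCb(unsigned char *p_src_img, unsigned char *p_dst_img, int size)
{
    /* Reject the call if either pointer is NULL */
    if (!p_src_img || !p_dst_img)
        return 1;

    /* In-place operation is not supported */
    if (p_src_img == p_dst_img)
        return 1;

    int i;
    int j;
    /* Each gray pixel becomes two bytes: neutral chroma (128), then luma */
    for (i = 0, j = 0; i < size; i++, j += 2) {
        p_dst_img[j] = 128;
        p_dst_img[j + 1] = p_src_img[i];
    }
    return 0;
}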

 

char PutImageLCD(unsigned char *p_src_img, int src_w, int src_h, unsigned char *p_img_lcd, int lcd_w, int lcd_h, int x, int y, int bpp)
{
    // bpp - bytes per pixel
    if (!p_src_img || !p_img_lcd)
        return 1;
    int i;
    int lcd_width_nbytes = lcd_w * bpp;     // LCD line width in bytes
    unsigned char *p_img_lcd_aux = p_img_lcd + x * bpp + lcd_width_nbytes * y;  // pointer to the top-left corner (origin)
    int src_img_width_nbytes = src_w * bpp; // source line width in bytes
    for (i = 0; i < src_h; i++)
    {
        memcpy(p_img_lcd_aux, p_src_img, src_img_width_nbytes);    // copies 1 horizontal line
        p_img_lcd_aux += lcd_width_nbytes;                         // points to the next LCD line
        p_src_img += src_img_width_nbytes;                         // points to the next source line
    }
    return 0;
}

Together, these two functions take 25445 us.
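As a small worked example of what these two steps produce, the sketch below expands two gray pixels to the packed chroma/luma layout and copies the result into a 4x2 frame at pixel (1, 1). The function names and the tiny frame size are hypothetical, chosen only to keep the byte offsets easy to follow. No clipping is done, just like in the functions above.

```c
#include <assert.h>
#include <string.h>

/* Expand 8-bit gray to packed 2-bytes-per-pixel data: 128 (neutral chroma), then luma. */
int gray_to_packed422(const unsigned char *gray, unsigned char *dst, int size)
{
    if (gray == NULL || dst == NULL || gray == dst)
        return 1;
    for (int i = 0; i < size; i++) {
        dst[2 * i]     = 128;       /* neutral chroma */
        dst[2 * i + 1] = gray[i];   /* luma */
    }
    return 0;
}

/* Copy a src_w x src_h image into a frame that is lcd_w pixels wide,
 * with its top-left corner at pixel (x, y). The caller must ensure it fits. */
int put_image(const unsigned char *src, int src_w, int src_h,
              unsigned char *frame, int lcd_w, int x, int y, int bpp)
{
    if (src == NULL || frame == NULL)
        return 1;
    const int frame_stride = lcd_w * bpp;          /* bytes per frame line */
    const int src_stride = src_w * bpp;            /* bytes per source line */
    unsigned char *out = frame + x * bpp + y * frame_stride;
    for (int row = 0; row < src_h; row++) {
        memcpy(out, src, (size_t)src_stride);
        out += frame_stride;
        src += src_stride;
    }
    return 0;
}

void put_image_demo(void)
{
    const unsigned char gray[2] = { 10, 20 };
    unsigned char packed[4];
    unsigned char frame[4 * 2 * 2] = { 0 };        /* 4x2 pixels, 2 bytes each */

    assert(gray_to_packed422(gray, packed, 2) == 0);
    assert(put_image(packed, 2, 1, frame, 4, 1, 1, 2) == 0);

    /* Byte offset = x*bpp + y*stride = 1*2 + 1*8 = 10 */
    assert(frame[10] == 128 && frame[11] == 10);
    assert(frame[12] == 128 && frame[13] == 20);
    assert(frame[9] == 0 && frame[14] == 0);       /* neighbours untouched */
}
```

For the 360x288 image, this amounts to writing about 207 KB for the conversion and copying the same amount again into the display buffer, so where those buffers live and whether they are cached matters a lot for the timing.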

 

My question is: am I missing something, or are these normal times?

Are these buffers being cached?

It's weird that the ARM can't handle at least the read and display parts in real time.

Any pointers on how I can bring these processing times down, so that my system runs in real time, would be highly appreciated.

 

Thanks in advance.

 

Filipe Alves

 

P.S. These are release-build values with -O2 optimization.

 

 

  • Is anyone willing to take 5 minutes to analyse this, please? It should be a straightforward answer for those experienced with the board.

    There is no need to analyse whether my functions are optimized or wrong; I just want to discuss whether it is normal to take 36 ms just to grab the display frame, when the buffer from the camera is grabbed in 0.045 ms.

    Best Regards

     

    Filipe Alves

  • Filipe Alves49699 said:

    Why is it taking so long to grab the buffer from the display? I have defined 3 buffers for the display.

    - I am not sure whether you use V4L2 or FBdev for the display, but I would guess that getting an already displayed buffer should be quicker.

    Are you really using 3 display buffers? Double-check that the application has allocated memory for all 3 buffers. Check as well how many buffers are queued in the display driver queues before you request one.

    The System_Integration_using_Linux_Workshop gives some details on the mechanism of the Linux video drivers:
    http://processors.wiki.ti.com/index.php/OMAP%E2%84%A2/DaVinci%E2%84%A2_System_Integration_using_Linux_Workshop
    It might be a good place to look to understand the behavior.


    - As a test, you could try to measure the delay directly at the V4L2/FBdev level. There are PSP V4L2 and FBdev loopback examples available:
    dvsdk_2_00_00_22/PSP_02_00_00_140/examples/dm644x/v4l2
    dvsdk_2_00_00_22/PSP_02_00_00_140/examples/dm644x/fbdev
    (I am assuming that you use DVSDK 2.00)
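For the measurement itself, a generic timing wrapper along these lines could be dropped around the buffer-dequeue call in one of those loopback examples. This is only a sketch: `time_call_us` and `sleep_2ms` are hypothetical names, and the 2 ms sleep merely stands in for the real driver call, which is not shown here. It uses POSIX `clock_gettime` with `CLOCK_MONOTONIC`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Run fn(arg) once and return the elapsed monotonic time in microseconds. */
long time_call_us(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long long ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                 + (long long)(t1.tv_nsec - t0.tv_nsec);
    return (long)(ns / 1000);
}

/* Hypothetical stand-in workload: sleeps 2 ms in place of the real driver call. */
void sleep_2ms(void *unused)
{
    (void)unused;
    struct timespec req = { .tv_sec = 0, .tv_nsec = 2000000L };
    nanosleep(&req, NULL);
}

void time_call_demo(void)
{
    long us = time_call_us(sleep_2ms, NULL);
    assert(us >= 2000);     /* the sleep lasts at least the requested 2 ms */
}
```

Timing the dequeue at the driver level and comparing it with the time measured around `Display_get` should show whether the 36 ms is spent waiting in the driver or is added by the layers above it.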

    Hope it helps.

    Anthony

  • Filipe Alves49699 said:

    My question is, am I missing something, or are these normal times?

    Are these buffers being cached?

     

    - For Codec Engine, some information about cache management is available:
            http://processors.wiki.ti.com/index.php/Cache_Management
    However, for DMAI I am not sure if/how cache coherency is handled.

    - Maybe you have the option of having the DSP perform the color conversion as well, rather than having the ARM do it. I think there are some optimized functions for color conversion in the IMG lib.

     

    Best regards,

    Anthony