How can I speed up "VLIB_mixtureOfGaussiansS32"?

Hi,

I am using a DM6437 to track a moving object, for example an aeroplane, but the function "VLIB_mixtureOfGaussiansS32" takes too much time: more than 40 ms. That is too long for controlling an autonomous plane. How can I speed up "VLIB_mixtureOfGaussiansS32"?

Thanks

Dabo Guo

 

  • Dabo,

    Could you let us know your video resolution and which DM64x part you are using?

     

    Assuming you are using a 720x480 image and a part with a 400 MHz DSP, the numbers seem about right.

    If you look at the VLIB offering, there are two Gaussian modeling functions: VLIB_mixtureOfGaussiansS32 and VLIB_mixtureOfGaussiansS16. I believe the two functions are basically the same, but the 32-bit function provides higher accuracy for the mean and variance of the Gaussian model. Are you using the 16-bit function? The performance difference between the two functions is 8 cycles/pixel, which could be a significant improvement (about 7 ms) for a 720x480 image. Further improvements can potentially be achieved by optimizing memory bandwidth, using DMA to move data from external to internal memory.
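As a rough illustration of the DMA idea, here is a minimal sketch of strip-wise processing through two small working buffers. Everything in it is an assumption for illustration only: `fetch_strip` is a hypothetical stand-in that uses a plain `memcpy` where a real system would start an EDMA transfer, `process_strip` is a placeholder kernel (it just sums pixels) rather than a VLIB call, and `STRIP_ROWS` and `MAX_WIDTH` are arbitrary values. On the DSP the two buffers would be placed in internal memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRIP_ROWS 8    /* hypothetical strip height */
#define MAX_WIDTH  720  /* hypothetical maximum line width */

/* Two strip buffers; on the DSP these would live in internal RAM. */
static unsigned char strip_buf[2][STRIP_ROWS * MAX_WIDTH];

/* Hypothetical stand-in for a DMA transfer. It is synchronous here;
 * a real version would start the transfer and return immediately. */
static void fetch_strip(unsigned char *dst, const unsigned char *src,
                        size_t bytes)
{
    memcpy(dst, src, bytes);
}

/* Placeholder kernel: sums the pixels of one strip. */
static unsigned long process_strip(const unsigned char *strip, size_t bytes)
{
    unsigned long sum = 0;
    size_t i;
    for (i = 0; i < bytes; i++) {
        sum += strip[i];
    }
    return sum;
}

/* Walks the image strip by strip, fetching strip n+1 into the idle
 * buffer before strip n is processed, then swapping buffers. */
unsigned long process_image_in_strips(const unsigned char *image,
                                      int width, int height)
{
    unsigned long total = 0;
    int cur = 0;
    int y, rows_now, rows_next, next_y;

    assert(width > 0 && width <= MAX_WIDTH && height > 0);

    /* Prime the first buffer. */
    rows_now = (height < STRIP_ROWS) ? height : STRIP_ROWS;
    fetch_strip(strip_buf[cur], image, (size_t)rows_now * (size_t)width);

    for (y = 0; y < height; y += STRIP_ROWS) {
        rows_now = (height - y < STRIP_ROWS) ? (height - y) : STRIP_ROWS;
        next_y = y + STRIP_ROWS;

        /* Fetch the next strip into the other buffer, if there is one. */
        if (next_y < height) {
            rows_next = (height - next_y < STRIP_ROWS) ? (height - next_y)
                                                       : STRIP_ROWS;
            fetch_strip(strip_buf[cur ^ 1],
                        image + (size_t)next_y * (size_t)width,
                        (size_t)rows_next * (size_t)width);
        }

        /* With a real asynchronous DMA, this work would overlap the
         * transfer above, followed by a wait before the swap. */
        total += process_strip(strip_buf[cur],
                               (size_t)rows_now * (size_t)width);
        cur ^= 1;
    }
    return total;
}
```

Because the copy here is synchronous, the sketch only shows the buffer structure; any actual gain would come from replacing `fetch_strip` with an asynchronous transfer so that the kernel reads from internal memory while the next strip is in flight.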

    Regards,

    Viet

  • Viet,

                Thanks for your answer.

    I am using a DM6437 and the image size is 640*576. By your estimate, that amounts to about 5 ms per frame. But according to the VLIB reference guide (2.0), on-chip memory performance has been measured as 31.30 cycles/pixel for VLIB_mixtureOfGaussiansS16 and 39.13 cycles/pixel for VLIB_mixtureOfGaussiansS32. By those figures, a frame will take about 20 ms with the 16-bit function and 25 ms with the 32-bit function. Additionally, I am using the evaluation board; video is captured by the VPFE and displayed by the VPBE.
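The estimates above can be reproduced with a small helper. This is only a sketch: the function name `frame_time_ms` is made up for illustration, and the 600 MHz clock used in the note below is an assumption (it is not stated in this thread, but the 20 ms and 25 ms figures are consistent with roughly that speed).

```c
#include <assert.h>

/* Estimated time in milliseconds to process one frame, given the frame
 * size, a cycles-per-pixel figure, and the DSP clock in MHz.
 * Assumes the cycles-per-pixel figure holds for the whole frame, which
 * ignores stalls from external memory accesses. */
double frame_time_ms(int width, int height,
                     double cycles_per_pixel, double clock_mhz)
{
    double total_cycles;

    assert(width > 0 && height > 0 && clock_mhz > 0.0);

    total_cycles = (double)width * (double)height * cycles_per_pixel;

    /* clock_mhz * 1000 is the number of cycles per millisecond. */
    return total_cycles / (clock_mhz * 1000.0);
}
```

With an assumed 600 MHz clock, `frame_time_ms(640, 576, 31.30, 600.0)` gives about 19.2 ms and `frame_time_ms(640, 576, 39.13, 600.0)` gives about 24.0 ms. The 8 cycles/pixel difference on a 720x480 image at 400 MHz works out to about 6.9 ms.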

    The main routine is as follows:

         /* Get the newest captured frame from the VPFE */
         FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);
         src = frameBuffPtr->frame.frameBufferPtr;
         /* Extract the luma plane from the captured frame */
         VLIB_extractLumaFromUYUV(src, 640, 640, 576, imageData);
         /* Foreground detection, mask clean-up, then component labeling */
         test_gaussian_mixture_models(imageData, currentMeans, currentVars, currentWgts, compIndex, intBuffer, height, width, fgMask);
         test_dilate_and_erode(fgMask, height, width, imageTempData, imageOutputData);
         test_connected_components_labeling(UartHandle, imageOutputData, height, width, primaryBuff1, primaryBuff2, overFlowBuff1, overFlowBuff2, handle, src);
         /* Hand the frame to the VPBE for display */
         FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

    Best regards,

    Dabo Guo

  • Hi all,

          I am looking forward to your answers.  

    Video is captured by the VPFE and displayed by the VPBE, so the data is transported by EDMA. I hope VLIB 2.1 is better optimized than VLIB 2.0, so that the speed can be greatly improved.

       Waiting.......

    Dabo Guo