Integral Image using Vision Library from TI (TI Vlib)

Other Parts Discussed in Thread: SPRC589

Dear All:

Currently, I use wide VGA (752 x 480) images and would like to use the Vlib Integral Image block in Simulink.

When computing the integral image of the current frame with TI Vlib, the values of the last column of the previous integral image are progressively added.

Is there a way to avoid this? Continued addition will require a large amount of memory for the integral image. If not, what is the advantage of doing so?

My second question: the input to the Vlib Integral Image block is a vector in which elements 1 ... 752 form the first row of the image, elements 752 * (n-1) + 1 ... 752 * n form the n-th row, and so on.

I would like to down-sample the image by a factor of 2 simply by dropping every second pixel in both rows and columns. However, as the input to Vlib is sample based, is there a way to down-sample the images for a real-time implementation?

Please let me know if you have used the Vlib block in auto-generated code for the Integral Image and tested it on the DM6437.

Looking forward to your assistance, Chandra

  •  

    Chandra,

    The reason for progressive adding is to enable block based kernel operation.

    Your image is large, so it sits in DDR, and DDR memory accesses are slow.

    So a small block of the image is transferred by DMA into local memory and the integral image is computed there. For the next block, we need the last line of the previous block's integral image.

    Please refer to the following link for more information on double buffering.

    http://en.wikipedia.org/wiki/Multiple_buffering

    In your case, you can resolve the issue by setting the lastline buffer to zero using memset.
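
    As a rough illustration of the carried last line and the memset reset, here is a minimal sketch in plain C. It is a reference routine written for this explanation, not the VLIB kernel: the names integral_block, start_new_frame and last_line are hypothetical, and the layout (row-major 8-bit input, 32-bit output) is an assumption.

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical reference routine, NOT the VLIB kernel.
     * Computes the integral image of one block of rows x cols 8-bit pixels.
     * last_line holds the bottom integral row of the block above and is
     * overwritten with this block's bottom row for the next call.
     * Assumes rows >= 1 and row-major storage. */
    static void integral_block(const uint8_t *in, int cols, int rows,
                               uint32_t *last_line, uint32_t *out)
    {
        for (int r = 0; r < rows; ++r) {
            uint32_t row_sum = 0;  /* running sum along the current row */
            const uint32_t *above =
                (r == 0) ? last_line : out + (size_t)(r - 1) * cols;
            for (int c = 0; c < cols; ++c) {
                row_sum += in[(size_t)r * cols + c];
                out[(size_t)r * cols + c] = above[c] + row_sum;
            }
        }
        /* Carry the bottom row over to the next block. */
        memcpy(last_line, out + (size_t)(rows - 1) * cols,
               (size_t)cols * sizeof *last_line);
    }

    /* Clear the carry at the start of every frame so that sums do not
     * accumulate from one frame into the next. */
    static void start_new_frame(uint32_t *last_line, int cols)
    {
        memset(last_line, 0, (size_t)cols * sizeof *last_line);
    }
    ```

    Calling start_new_frame once per frame, before the first block, keeps the values bounded by the sum over a single frame.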

     

    From what I understand from your second question, you have to downsample the input image and then call the integral image function.

    Downsampling can be implemented easily by two nested for loops.
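
    A minimal sketch of the two nested loops, assuming a row-major 8-bit image and plain decimation (every second pixel is dropped, no filtering). The function name downsample_by_2 is made up for this example.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Keep every second pixel in both directions.
     * src is a row-major width x height 8-bit image;
     * dst must hold (width / 2) x (height / 2) pixels. */
    static void downsample_by_2(const uint8_t *src, int width, int height,
                                uint8_t *dst)
    {
        int out_w = width / 2;
        int out_h = height / 2;
        for (int y = 0; y < out_h; ++y) {
            for (int x = 0; x < out_w; ++x) {
                dst[(size_t)y * out_w + x] =
                    src[(size_t)(2 * y) * width + 2 * x];
            }
        }
    }
    ```

    For a 752 x 480 input this produces a 376 x 240 image, which can then be passed to the integral image function.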

     

    Please let me know if you require more clarification.

     

    Regards

    Senthil

     

     

     

  • Thanks Senthil,

    As you rightly pointed out, a large image has to sit in DDR, which is slow.

    However, in Simulink, while using the TI Vlib Integral Image block, how can I divide my data stream into, let's say, L parts, compute the integral image for one part, and then update the next block?

    If I have an (N x M) vector data stream, I would finally like to have an integral image which is an N x M matrix. To save time and avoid DDR access, I want to divide the data into L chunks of (N x M)/L elements each, find the integral image for the first chunk, then take the next chunk, find its integral image, add the last column from the previous block's integral image, and so on.

    Please assist me in this and give me guidelines to do so.

    Regards: Chandra

     

     

  • Hi Senthil:

    Can you please tell me how I can set the lastline buffer to zero using memset when using the Simulink Vlib block?

    Is it possible to change parameters for the Vlib Simulink blocks?

    Thanks and I look forward to your assistance,

    Regards: Chandra

  •  

    Chandra,

    Please read the README.txt file in the simulink folder to understand how to rebuild the Simulink blocks.

    You can include the memset in the source file wrap_VLIB_integralImage8.c and then you can rebuild.

    You can also modify and change parameters inside this file within the scope of the VLIB function.

     

    Regards

    Senthil

  •  

    Chandra,

    Please take a look at the block processing examples in the IMGLIB Simulink Package.

    http://focus.ti.com/docs/toolsw/folders/print/sprc589.html

    You can reuse these examples and plug in VLIB blocks.

    Regards

    Senthil

  • Thanks Senthil, I will read the .txt file and also the suggested IMGLIB package

  • Hi Senthil:

    I worked on your suggestions and realized that only the inputs and outputs of the function VLIB_integralImage8 can be modified; we cannot modify the processing done by the function.

    From your experience, can you tell me what algorithm is used by the TI Vlib Integral Image block? Is it the same as the one proposed by Branislav in his publication "Integral Image Optimizations for Embedded Vision Applications"?

    I am not able to achieve the 2.3 cycles/pixel mentioned in the Vlib API manual. Considering the on-chip memory of 32 KB, I did the following:

    1. Used the entire image (752 x 480)

    2. Performed Integral Image using Vlib block with rows = 752 and columns = 5 (752 x 5 x 8 < 32 KB).

    This would require just 3 microseconds per block. If I have a for-loop (96 blocks for 480 lines) and ignore data transfer, it would take roughly 0.3 ms plus the time needed to do the sum over all columns.

    I would now like to take your advice on this:

    1. How much is the data transfer time from the DDR to DSP cache?

    2. Is this strategy the best way to have reduced time with Vlib Integral Image or is there any other possible solution?

    3. Should I divide my image into (752 x 5 pixels) and do Integral Image on this block of image or load the entire image and do Integral Image using just the block?

    Best Regards and many thanks: Chandra S Dhir

  •  

    Chandra,

    VLIB Integral Image uses the same implementation discussed in Branislav's publication.

    I think it is very difficult to achieve the maximum performance of the DSP using Simulink, because it is hard to integrate DMA and similar mechanisms there.

    If you use DMA optimally and do ping-pong buffering, you should be able to get pretty close to the performance quoted in VLIB.

    You should divide the image into smaller blocks that fit in the local memory and process it block by block (as in your point 3). In addition, you should also do ping-pong buffering.
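
    The control flow of block-by-block processing with ping-pong input buffers could look roughly like the sketch below. Everything here is an assumption made for illustration: dma_start and dma_wait are hypothetical stand-ins (memcpy does the copy immediately, so nothing actually overlaps on a host), integral_block is a plain C reference rather than the VLIB kernel, and the 752 x 5 block size is taken from the question above.

    ```c
    #include <stdint.h>
    #include <string.h>

    #define COLS       752
    #define BLOCK_ROWS 5
    #define BLOCK_PIX  (COLS * BLOCK_ROWS)

    /* Hypothetical stand-ins: on the DSP these would start an EDMA
     * transfer and wait for its completion. */
    static void dma_start(void *dst, const void *src, size_t bytes)
    {
        memcpy(dst, src, bytes);
    }
    static void dma_wait(void) { /* nothing to wait for in this sketch */ }

    /* Plain C block integral image with a carried last line
     * (reference code, not the VLIB kernel). */
    static void integral_block(const uint8_t *in, int cols, int rows,
                               uint32_t *last_line, uint32_t *out)
    {
        for (int r = 0; r < rows; ++r) {
            uint32_t row_sum = 0;
            const uint32_t *above =
                (r == 0) ? last_line : out + (size_t)(r - 1) * cols;
            for (int c = 0; c < cols; ++c) {
                row_sum += in[(size_t)r * cols + c];
                out[(size_t)r * cols + c] = above[c] + row_sum;
            }
        }
        memcpy(last_line, out + (size_t)(rows - 1) * cols,
               (size_t)cols * sizeof *last_line);
    }

    /* Assumes rows is a multiple of BLOCK_ROWS. */
    void integral_image_pingpong(const uint8_t *ddr_in, int rows,
                                 uint32_t *ddr_out)
    {
        static uint8_t  in_buf[2][BLOCK_PIX];  /* ping and pong input blocks */
        static uint32_t out_buf[BLOCK_PIX];
        static uint32_t last_line[COLS];
        int blocks = rows / BLOCK_ROWS;

        memset(last_line, 0, sizeof last_line);  /* reset carry per frame */
        dma_start(in_buf[0], ddr_in, sizeof in_buf[0]);
        for (int b = 0; b < blocks; ++b) {
            int cur = b & 1;
            dma_wait();  /* block b is now in in_buf[cur] */
            if (b + 1 < blocks) {
                /* Fetch the next block while the current one is processed. */
                dma_start(in_buf[cur ^ 1],
                          ddr_in + (size_t)(b + 1) * BLOCK_PIX,
                          sizeof in_buf[0]);
            }
            integral_block(in_buf[cur], COLS, BLOCK_ROWS, last_line, out_buf);
            /* The write-back would also be a DMA transfer on the target. */
            memcpy(ddr_out + (size_t)b * BLOCK_PIX, out_buf, sizeof out_buf);
        }
    }
    ```

    On the target, the transfer of block b + 1 would run in the background while block b is being processed, which is where the time saving comes from.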

    Regards,

    Senthil

  • Hi Senthil:

     

    Thanks for your reply; I appreciate your help.

    I would like to know more about the compiler optimization options (as suggested in Branislav's paper).

    Is there an example that you could share with me for my understanding?

     

    Best Regards: Chandra S Dhir

  •  

    Chandra,

    The compiler optimizations can be set here:
    Right click on the CCS project -> Build Options -> Compiler tab -> Basic.
    There is an Opt Level field - O3 is the highest level of compiler optimization.

    Please refer to spru425a - TMS320C6000 Optimizing C Compiler Tutorial for more details on compiler optimization.

    They have code snippets and small examples.

    Please let me know if you need something more.

    Regards

    Senthil

  • Hi Senthil,

    Does VLIB support square integral calculation?

    Thank you,

    Best regards,

    Tri