Frame rate degradation when adding VLPB to OMX chain

Hi TI experts,

I was running frame rate tests by counting the frames arriving at the display (measured on the EBD of the display) with the following configuration:
capture->DEI(scale down)->DSP(VLPB)->DEI(scale up)->display.
The input was 1080p. The buffer was scaled down by 1/4 horizontally and 1/4 vertically (1/16 in total), then scaled up by 4 horizontally and 4 vertically (16 in total). The frame rate measured in this configuration was approximately 59 frames per second.

Now I am using an OMX chain of capture->DSP(VLPB)->display, where capture is 1080p in 422 format and the DSP performs only a loop-back. The area where the OMX buffers are allocated is cacheable by the DSP, and VLPB is compiled with -O3 for maximum performance. To my disappointment, the frame rate dropped from 59 frames per second to 30 frames per second. Evidently the buffer size at the DSP is what makes the difference.
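
To put numbers on the two buffer sizes being compared, here is a rough sketch. It assumes packed 422 at 2 bytes per pixel in both chains (the format after the DEI scale-down is not stated above, so that part is a guess), and the helper name is made up for illustration.

```python
def frame_bytes(width, height, bytes_per_pixel=2):
    """Bytes in one frame. 2 bytes/pixel is an assumption (packed 4:2:2)."""
    return width * height * bytes_per_pixel


# Full 1080p frame, as the DSP sees it in capture->DSP->display.
full = frame_bytes(1920, 1080)

# Frame after the x1/4 horizontal and x1/4 vertical scale-down.
scaled = frame_bytes(1920 // 4, 1080 // 4)

print("full frame bytes:  ", full)
print("scaled frame bytes:", scaled)
print("size ratio:        ", full // scaled)
```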

My questions are:

1. How can this be? Does the frame rate depend on the buffer size?
2. The data buffers are located in a shared memory area to avoid data copying, so only one copy occurs: the input buffer to the output buffer. Should copying a single HD buffer take so much time in a cacheable area?

Thanks,
Gabi 

  • Gabi,

    Copying data with the DSP CPU is costly and can degrade performance. The amount of data copied grows or shrinks with the buffer size, so you would see the fps vary accordingly.

    Regards

    Vimal

  • Vimal,

    Copying one char from the input buffer to the output buffer on the DSP, with both buffers in a cacheable area, takes what? 1, 2, 3 cycles?
    So copying a 1080p image in 422 format (about 4 MB) should take what? 10M to 12M cycles. The DSP is running at 800 MHz, so if my assumptions are correct it can copy about 66 HD images per second. Where is the degradation coming from? Reducing the copy size to a few lines instead of a full image didn't give better performance.
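
The back-of-the-envelope estimate above can be written out as a small sketch. It only restates the arithmetic in this post (800 MHz clock, 1 to 3 cycles per byte, roughly 12M cycles per frame) and assumes the copy is purely compute-bound, which is exactly the assumption in question. The function names are hypothetical.

```python
def fps_from_cycles(clock_hz, cycles_per_frame):
    """Frames per second if each frame costs a fixed number of CPU cycles."""
    return clock_hz / cycles_per_frame


def copy_fps(clock_hz, frame_size_bytes, cycles_per_byte):
    """Frames per second for a byte-by-byte copy with a fixed per-byte cost."""
    return fps_from_cycles(clock_hz, frame_size_bytes * cycles_per_byte)


CLOCK_HZ = 800e6                # DSP clock quoted in the post
FRAME_BYTES = 1920 * 1080 * 2   # 1080p, 2 bytes/pixel (422), about 4 MB

# Per-byte costs of 1, 2 and 3 cycles, as guessed in the post.
for cycles_per_byte in (1, 2, 3):
    fps = copy_fps(CLOCK_HZ, FRAME_BYTES, cycles_per_byte)
    print(cycles_per_byte, "cycle(s)/byte ->", round(fps, 1), "fps")

# The "12M cycles per frame" figure from the post.
print("12M cycles/frame ->", round(fps_from_cycles(CLOCK_HZ, 12e6), 1), "fps")
```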

    Gabi

  • Hi Gabi,

    It is not just the DSP clock that determines the data copy speed; it is typically the DDR interface that limits the data transfer. For comparison, try applying the same logic to copying frames on a PC running at GHz clock rates. The cache is useful if you access the data and modify it; otherwise you are still reading from and writing to DDR! A DDR access can be in the range of 70-80 cycles!
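
As a rough illustration of why the DDR side can dominate, here is a minimal memory-bound model. Everything beyond the 70-80 cycle figure is an assumption: it charges one full stall per access, counts a read pass and a write pass, ignores any overlap or prefetch, and treats the number of bytes fetched per stall (`bytes_per_access`, a hypothetical parameter) as a free variable because the real value depends on the cache configuration.

```python
def memory_bound_fps(clock_hz, frame_size_bytes, stall_cycles,
                     bytes_per_access, passes=2):
    """Frames per second if every memory access stalls the CPU.

    passes=2 assumes one read pass plus one write pass over the frame.
    bytes_per_access is how much data each stall brings in (assumption).
    """
    accesses = passes * frame_size_bytes / bytes_per_access
    return clock_hz / (accesses * stall_cycles)


CLOCK_HZ = 800e6
FRAME_BYTES = 1920 * 1080 * 2
STALL_CYCLES = 75  # middle of the 70-80 cycle range mentioned above

# Stall amortized over a 64-byte line (assumed line size).
per_line = memory_bound_fps(CLOCK_HZ, FRAME_BYTES, STALL_CYCLES, 64)

# Stall paid on every 4-byte word, as if the data were effectively uncached.
per_word = memory_bound_fps(CLOCK_HZ, FRAME_BYTES, STALL_CYCLES, 4)

print("stall per 64-byte line:", round(per_line, 1), "fps")
print("stall per 4-byte word: ", round(per_word, 1), "fps")
```

The spread between the two cases is the point: under this model the achievable rate depends far more on how many bytes each DDR access brings in than on the per-byte CPU cost.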

    Nonetheless, as you mentioned, copying fewer lines did not help, which is a bit strange. If you do not copy at all, do you still see the same result? Is the A8 or the DSP waiting for a buffer?

    Regards

    Vimal

  • Hi Vimal,

    The OMX chain is managed by the A8: capture->DSP->display. The DSP therefore waits for the input buffer to become valid, which the A8 signals. The DSP then copies the data, and the A8 waits for the DSP output buffer. I wonder whether the A8->DSP and DSP->A8 connections themselves are large time consumers, and whether the time they consume is in some way proportional to the buffer size (but not because of the data copying). As I mentioned earlier, when I had the OMX chain capture->DEI(scale down)->DSP->DEI(scale up)->display, where the buffer after scale down is 1/16 the size of the capture buffer, the fps was much higher.
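
One way to frame the "proportional to buffer size" question is to split the frame period into a fixed part and a per-byte part using the two measurements in this thread. The sketch below assumes a purely linear model and 2 bytes per pixel for both buffers; with only two data points it is a framing device, not evidence. It also assumes the 59 fps reading is limited by the chain itself, which may not hold if something else (for example the capture rate) caps it. The function name is hypothetical.

```python
def fit_fixed_and_per_byte(size_a, fps_a, size_b, fps_b):
    """Solve period = fixed + per_byte * size from two (size, fps) points."""
    period_a = 1.0 / fps_a
    period_b = 1.0 / fps_b
    per_byte = (period_b - period_a) / (size_b - size_a)
    fixed = period_a - per_byte * size_a
    return fixed, per_byte


SCALED_BYTES = 480 * 270 * 2    # buffer after the 1/16 scale-down (assumed 422)
FULL_BYTES = 1920 * 1080 * 2    # full 1080p buffer

fixed_s, per_byte_s = fit_fixed_and_per_byte(SCALED_BYTES, 59, FULL_BYTES, 30)

print("fixed cost per frame (ms):", round(fixed_s * 1e3, 2))
print("size-dependent cost per byte (ns):", round(per_byte_s * 1e9, 2))
print("size-dependent cost at 1080p (ms):", round(per_byte_s * FULL_BYTES * 1e3, 2))
```

Measuring a third buffer size, or the same chain with the copy disabled, would show whether the linear model holds at all.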

    Thanks,
    Gabi