
Any tips on reducing latency of the encodedecode demo?



Hi, I have a DM368 EVM rev G and I was wondering if anyone has tips or suggestions on reducing the latency of the encodedecode demo supplied with the board.

The majority of the delay seems to come from frame buffering between the capture driver and the encoder, which we've calculated to be about 6 frames.  Has anyone else verified that this is where the majority of the latency comes from?  I've tried adjusting the following constants to reduce the amount of buffering:

NUM_CAPTURE_BUFS in capture.c, set to 3 by default.

CAPTURE_PIPE_SIZE  in video.c, set to 4 by default.  

DISPLAY_PIPE_SIZE in video.c, set to 7 by default.

NUM_DISPLAY_BUFS in display.c, set to 4 by default.

I can reduce CAPTURE_PIPE_SIZE to 3, DISPLAY_PIPE_SIZE to 6 and NUM_DISPLAY_BUFS to 3 and it seems to work fine.  However, the latency is still noticeable, as I believe these changes only reduce the amount of buffering by a frame or so.

Besides a slice-level implementation, which we are considering, are there any other recommendations for reducing the latency?  Ideally, we would like ~15-20ms of latency from capture through to display, if that's even achievable with the DM368 and its software libraries.

Thank you in advance

Derek

  • Hi Derek,

    I would suggest that we divide this problem into a couple of parts:

    1 - Capture -> Encoder

    2 - Capture -> Display

    In the capture-to-encode path, only the number of capture buffers can add delay. In the implementation, we queue NUM_CAPTURE_BUFS (= 3) empty buffers into the V4L2 layer of the capture driver. Once VIDIOC_STREAMON is called, the user can start calling VIDIOC_DQBUF. This should return a frame after 33 msec, and the data can go instantly to the encoder thread using Fifo_put(). So, I would say that in steady state there is only one frame of delay between capture and encoder.
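
    In rough C, that sequence looks something like the sketch below. It is illustrative only, not the exact demo code: error handling, mmap of the buffers and the encoder hand-off are omitted.

        #include <string.h>
        #include <sys/ioctl.h>
        #include <linux/videodev2.h>

        #define NUM_CAPTURE_BUFS 3

        /* Illustrative V4L2 capture flow only; error handling, mmap of the
         * buffers and the encoder hand-off are omitted. */
        static void captureLoop(int fd)
        {
            struct v4l2_requestbuffers req;
            struct v4l2_buffer buf;
            enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            int i;

            /* Ask the driver for NUM_CAPTURE_BUFS buffers. */
            memset(&req, 0, sizeof(req));
            req.count  = NUM_CAPTURE_BUFS;
            req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            req.memory = V4L2_MEMORY_MMAP;
            ioctl(fd, VIDIOC_REQBUFS, &req);

            /* Queue all empty buffers before starting the stream. */
            for (i = 0; i < NUM_CAPTURE_BUFS; i++) {
                memset(&buf, 0, sizeof(buf));
                buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
                buf.memory = V4L2_MEMORY_MMAP;
                buf.index  = i;
                ioctl(fd, VIDIOC_QBUF, &buf);
            }

            ioctl(fd, VIDIOC_STREAMON, &type);

            for (;;) {
                /* Blocks roughly one frame time (~33 ms) in steady state. */
                memset(&buf, 0, sizeof(buf));
                buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
                buf.memory = V4L2_MEMORY_MMAP;
                ioctl(fd, VIDIOC_DQBUF, &buf);

                /* ...hand the filled frame to the encoder thread here... */

                /* Requeue the buffer so the driver never starves. */
                ioctl(fd, VIDIOC_QBUF, &buf);
            }
        }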

    I would say you should try decoupling the capture->encoder path from the rest of the system and compare the timestamp of the captured data with that of the encoded output data.

    Actually, the encodedecode demo is not well suited for checking the latency of capture->encode or capture->display, because in this demo encode is tied to decode, which in turn is tied to display. So until the display provides free buffers, the decoder cannot give a free buffer to the encoder, which in turn means the encoder has to wait before it can consume the captured buffer.

    I would recommend using the encode demo to check the latency of the above two paths, and then we can focus on each of the two use cases one by one.

    BTW, I am going ahead and closing the other post (http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/100/p/118457/421215.aspx#421215), which is the same as this one. We will track the latency issue in this thread itself.

    Regards,

    Anshuman

    BTW, I assume you are still using the capture in "On-the-fly" mode. That is the default setting in the standard DMAI code.

    Regards,

    Anshuman

  • Hi, thanks for the response.

    So, what I'm seeing is that during priming of the capture and display threads, the capture thread runs through 2 iterations of its main loop, calling Fifo_put twice to send raw buffers to the video thread.  After that, the display thread is primed.

    Once the capture thread and display thread are primed, the display thread is unpaused and the video thread begins execution of its main loop.  The video thread makes a call at the bottom of the loop to get a buffer from the display thread.  However, the display thread is busy in a call to Display_get, which returns some 132 ms later, before it calls Fifo_put to pass the buffer to the video thread.  Meanwhile, the capture thread has gone through 4 iterations of its main loop, grabbing a raw video buffer from the capture driver and passing it to the video thread for encoding.  So, I believe there is a latency of 4 frames here.  Is this correct?

    My calculation of the frames of latency between the capture driver and the encoder is the 2 frames at the beginning during priming, plus the 4 frames that are passed to the video thread while the video thread waits for a buffer from the display thread.  So, it seems that the encoder lags capture by about 6 frames.  Is that not the case?

    Yesterday, I added some code to put the capture thread to sleep for those 4 frames during the call to Display_get.  I'll work on verifying whether it had any effect on the latency.
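
    Roughly, the hold-off I added amounts to the following quick sketch; where exactly it sits in the capture loop is something I'm still experimenting with, so it's not part of the stock demo.

        #include <unistd.h>

        /* Quick sketch: stall the capture loop for roughly 4 frame periods
         * (4 x ~33.3 ms) while the video thread is still blocked in
         * Display_get() during priming. Placement in the loop is my own
         * experiment, not part of the stock demo. */
        static void holdOffCapture(void)
        {
            usleep(4 * 33333);   /* ~133 ms */
        }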

    I will also look into "On-the-fly" mode, as I'm not familiar with the term.

    Thanks for the help so far, much appreciated.

    Derek

  • Hi again, I spent some time yesterday looking at the encode demo and tracking the latency there. 

    As you mentioned, the encode demo behaves differently from the encodedecode demo: the issue of buffering 4 additional frames before the encoder disappears.  The video thread no longer waits for a buffer and seems to be relatively in sync with the capture driver, i.e. about a one-frame (33 ms) delay between them.  I believe putting the capture thread to sleep for those 4 frames in the encodedecode demo did in fact reduce the latency; however, it seems that we won't be using the encodedecode demo in the end.

    We will most likely be using the encode demo on one DM368 and passing the encoded data through EMIF to our FPGA and proprietary communication protocol.  The encoded video will be transmitted over our proprietary communication protocol to an FPGA on the other end, connected to a second DM368 through EMIF.  The second DM368 will read the encoded stream from the FPGA and run it through the decode demo.  So perhaps testing the latency with the encodedecode demo is not really helpful in this case. 

    A couple of new questions:

    1) You mentioned that the only thing affecting delay in the capture-to-encode path is the number of capture buffers, currently set to 3.  I believe this makes the delay from the camera through encoding 3 frames of capture buffering + 1 frame between capture and encode + encode time (measured at 14.2 ms) = 147.5 ms.  Is that correct?  Is there any way to reduce the capture buffer latency?  I tried changing the number of capture buffers to 1, but there's no video output when I do that and the ARM load drops to 0%.

    2) The capture thread passes raw frames straight through to the display driver in the encode demo.  The latency from capture to display is then 3 frames of capture buffering + 3 frames of display buffering = 200 ms.  Is that correct?  There is a noticeable latency when I wave my hand in front of the camera while running the encode demo.

    3) Does changing from "On-the-fly" mode improve the latency?  Is there any supporting documentation describing "On-the-fly" mode?

    Thanks a lot

    Derek

  • Derek,

    Good analysis. Give me some time and I will reply back to you (by tomorrow).

    Regards,

    Anshuman

  • Anshuman, thanks for your assistance.  Any insights would be greatly appreciated!

    Best Regards

    Derek

  • Hi Derek,

    Derek Richardson said:

    1) You mentioned that the only thing affecting delay in the capture-to-encode path is the number of capture buffers, currently set to 3.  I believe this makes the delay from the camera through encoding 3 frames of capture buffering + 1 frame between capture and encode + encode time (measured at 14.2 ms) = 147.5 ms.  Is that correct?  Is there any way to reduce the capture buffer latency?  I tried changing the number of capture buffers to 1, but there's no video output when I do that and the ARM load drops to 0%.

    We queue 3 frames in the capture driver, but as soon as you call VIDIOC_DQBUF you can get a captured frame. This frame can be sent directly to the encoder thread using Fifo_put(). This means there is a delay of only one frame, equivalent to 33 ms. To this you can add the encoding time and also the time to send the data over EMIF. Please look at the capture.c file in the <DVSDK_Demos>/dm365/encode folder and refer to captureThrFxn.
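
    The core of that loop looks roughly like the sketch below. It is a trimmed sketch in the spirit of captureThrFxn, not the actual demo code; the function, handle and FIFO names here are placeholders, and error handling and priming are omitted.

        #include <xdc/std.h>
        #include <ti/sdo/dmai/Dmai.h>
        #include <ti/sdo/dmai/Buffer.h>
        #include <ti/sdo/dmai/Capture.h>
        #include <ti/sdo/dmai/Fifo.h>

        /* Trimmed sketch of the capture --> encoder hand-off in the spirit
         * of captureThrFxn. Error handling and priming are omitted and the
         * handle names are placeholders. */
        Void captureToEncodeLoop(Capture_Handle hCapture,
                                 Fifo_Handle hEncInFifo, Fifo_Handle hEncOutFifo)
        {
            Buffer_Handle hCapBuf, hFreeBuf;

            while (TRUE) {
                /* Blocks about one frame time (~33 ms); wraps VIDIOC_DQBUF. */
                Capture_get(hCapture, &hCapBuf);

                /* Hand the filled frame to the encoder thread right away. */
                Fifo_put(hEncInFifo, hCapBuf);

                /* Take back a buffer the encoder has finished with and
                 * requeue it so the driver never runs out of empty buffers. */
                Fifo_get(hEncOutFifo, &hFreeBuf);
                Capture_put(hCapture, hFreeBuf);
            }
        }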

    Derek Richardson said:

    2) The capture thread passes raw frames straight through to the display driver in the encode demo.  The latency from capture to display is then 3 frames of capture buffering + 3 frames of display buffering = 200 ms.  Is that correct?  There is a noticeable latency when I wave my hand in front of the camera while running the encode demo.

    The display driver is primed with buffers and hence can add delay from capture to display. Assuming the display driver was not primed with any buffers, the steady-state flow is that capture gives a YUV frame to the encoder and the encoder returns the earlier frame, which is then passed to the display. So the chain is something like this: CAPTURE-->ENCODER-->DISPLAY.
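
    Schematically, the per-frame hand-off in steady state is something like the sketch below. The names are placeholders, priming, buffer copies and error handling are left out, and it assumes the captured buffers themselves are recycled through the display, so treat it as an illustration of the chain rather than the demo code.

        #include <xdc/std.h>
        #include <ti/sdo/dmai/Dmai.h>
        #include <ti/sdo/dmai/Buffer.h>
        #include <ti/sdo/dmai/Capture.h>
        #include <ti/sdo/dmai/Display.h>
        #include <ti/sdo/dmai/Fifo.h>

        /* Sketch of the steady-state CAPTURE --> ENCODER --> DISPLAY chain,
         * as seen from the capture thread. Assumes capture and display share
         * the same buffers; priming, copies and error handling are omitted. */
        Void captureDisplayLoop(Capture_Handle hCapture, Display_Handle hDisplay,
                                Fifo_Handle hEncInFifo, Fifo_Handle hEncOutFifo)
        {
            Buffer_Handle hCapBuf, hEncDoneBuf, hDispDoneBuf;

            while (TRUE) {
                Capture_get(hCapture, &hCapBuf);       /* fresh YUV frame from the driver  */
                Fifo_put(hEncInFifo, hCapBuf);         /* CAPTURE --> ENCODER              */

                Fifo_get(hEncOutFifo, &hEncDoneBuf);   /* earlier frame, already encoded   */
                Display_put(hDisplay, hEncDoneBuf);    /* ENCODER --> DISPLAY              */

                Display_get(hDisplay, &hDispDoneBuf);  /* a buffer the display is done with */
                Capture_put(hCapture, hDispDoneBuf);   /* recycle it to the capture driver */
            }
        }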

    In steady state, the worst-case delay between capture and display can be equal to the depth of the capture queue, in this case 3 frames. An additional one-frame delay can happen because the frame might not be available to the display driver at the time of the VD signal. So I don't foresee a delay longer than 133 msec, assuming display priming is off.

    Derek Richardson said:

    3) Does changing from "On-the-fly" mode improve the latency?  Is there any supporting documentation describing "On-the-fly" mode?

    In "On-The-Fly" mode, the ISIF hardware module directly feeds the data to the resizer module. There is no buffering in between. This means there is no delay in the capture driver due to the hardware.

    In "One-shot" mode, ISIF dumps raw data to the DDR and then resizer picks up this raw data and does its resize operation. This means a delay of 33msec gets added to the overall data flow.

    You can get the details of these modes in the VPFE user guide http://focus.ti.com/general/docs/lit/getliterature.tsp?literatureNumber=sprufg8c&fileType=pdf

    To check the option for On-The-Fly in the code, refer to the <DMAI>/packages/ti/sdo/dmai/linux/dm365/capture.c file and look for the attrs->onTheFly variable. It will give you the details of how it is used in the system.
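
    For example, something along these lines when creating the capture device. This is a sketch only; the Capture_Attrs_DM365_DEFAULT name and the numBufs field follow the usual DMAI pattern and are assumptions, so please verify the exact fields against your DMAI tree.

        #include <xdc/std.h>
        #include <ti/sdo/dmai/Dmai.h>
        #include <ti/sdo/dmai/BufTab.h>
        #include <ti/sdo/dmai/Capture.h>

        /* Sketch: request "On-The-Fly" capture when creating the capture
         * device. Capture_Attrs_DM365_DEFAULT and numBufs are assumed from
         * the usual DMAI pattern; verify against your DMAI version. */
        Capture_Handle openCapture(BufTab_Handle hBufTab)
        {
            Capture_Attrs cAttrs = Capture_Attrs_DM365_DEFAULT;

            cAttrs.onTheFly = TRUE;    /* ISIF feeds the resizer directly */
            cAttrs.numBufs  = 3;       /* i.e. NUM_CAPTURE_BUFS           */

            return Capture_create(hBufTab, &cAttrs);
        }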


    Regards,

    Anshuman

    PS: Please mark this post as verified, if you think it has answered your question. Thanks.

  • Thanks Anshuman,

    So, from your descriptions, the latency from capture to encode is minimal, equal to a frame delay plus the encode time.  This makes the delay 33 ms capture + 33 ms capture -> encode + 14.2 ms encode = ~80.2 ms.

    Thanks, the description of the data flow from capture to display makes sense.  I verified that the code does as you described: the capture thread sends the captured frame to the video thread, the video thread passes the frame to the encoder and then returns it to the capture thread, at which point the capture thread calls Display_put.  So, the delay here is 33 ms capture + 33 ms capture -> encode + 33 ms encode -> display + 33 ms display = ~132 ms.

    I can't seem to disable display priming.  As mentioned, changing the defaults of NUM_DISPLAY_BUFS = 3 and NUM_CAPTURE_BUFS = 3 to anything other than 3 seems to cause the demo to stop working.  So display priming is on in this case.  Not sure how much additional delay is added, if any.

    I verified that "On-The-Fly" mode defaults to TRUE, as previously mentioned.  It sounds like running in "On-The-Fly" mode adds no additional delay.  So, we'll continue running in this mode.


    My original question was what can be done to reduce the latency in the video processing pipeline, as the delay between the camera and the display is noticeable when running the demos.  We would essentially like a more real-time response.

    Other than slice level data transfers, which we are currently investigating and will most likely attempt to implement, is there anything else that can be done to reduce the latency further?

    Thanks a lot for all the insights!  Very much appreciated.

    Best Regards

    Derek