Streaming video to h264 decoder

Hi there, I have a question about how to stream video to the H.264 decoder.

We have 2 DM368EVM revG's now and I'm writing encoded video out the EMIF port on one into an external FPGA.  The FPGA transmits the encoded video over GigE to a second FPGA.  From the second FPGA, I read the video into a buffer through EMIF.

I've modified the file read calls in Loader_prime and Loader_readData so that, instead of reading from a file, they copy the video data arriving over EMIF into the ring buffer.
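
For anyone following along, the change is conceptually something like the sketch below; emif_bytes_ready() and emif_read() are placeholders for my own EMIF code, not DMAI functions, and the real Loader.c differs in the details:

    #include <stddef.h>

    /* Hypothetical helpers for my EMIF receive buffer -- my own placeholders,
     * not DMAI or demo functions. */
    extern size_t emif_bytes_ready(void);               /* bytes waiting in the EMIF buffer */
    extern size_t emif_read(void *dst, size_t nbytes);  /* copy out and advance my pointer  */

    /* Drop-in replacement for the fread() call inside Loader_readData():
     * returns the number of bytes copied into the loader's ring buffer,
     * just like fread() would. */
    static size_t readFromEmif(void *dst, size_t maxBytes)
    {
        size_t avail = emif_bytes_ready();
        size_t n = (avail < maxBytes) ? avail : maxBytes;
        return emif_read(dst, n);
    }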

I managed to get video streaming through from end to end this morning, which was great to finally see!

The main issue is that the latency I'm seeing is rather horrendous.  I believe it has to do with the decoder requesting a 2MB buffer of encoded video as input when Loader_prime is called.  At a 1MBps bit rate, I'm seeing around 15 seconds of latency end to end.

Is there a way to cut down the size of the buffer?  Or is there some other configuration that needs to be done in order to reduce the latency in the stream?  The ideal situation would be that as the frames are read in through EMIF, the decoder is able to grab them immediately and begin decoding...

Thanks!

Derek

  • I managed to get the latency down to around 1 second end to end.

    In the DMAI Loader.c Loader_prime function, rather than waiting for "hLoader->readSize" (2MB) worth of data to get buffered in, I simply waited for 30K, which appears to be slightly larger than the first encoded frame at 1MBps encoding.
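
    Roughly, the change amounts to the sketch below (PRIME_BYTES, primeRingBuffer() and readFromEmif() are my own names, and the loop is paraphrased rather than the literal DMAI source):

        #include <stddef.h>

        extern size_t readFromEmif(void *dst, size_t maxBytes);  /* EMIF copy helper from my earlier post */

        /* Only the exit condition of the fill loop really changed; the real
         * DMAI code uses its own ring-buffer module. */
        #define PRIME_BYTES (30 * 1024)   /* ~one encoded frame, instead of hLoader->readSize (2MB) */

        static size_t primeRingBuffer(char *ringWritePtr)
        {
            size_t buffered = 0;

            /* was: while (buffered < hLoader->readSize) */
            while (buffered < PRIME_BYTES) {
                buffered += readFromEmif(ringWritePtr + buffered, PRIME_BYTES - buffered);
            }
            return buffered;
        }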

    Also, in Loader_readData, rather than checking whether the delta value (difference between write and read pointers in the ring buffer) is less than hLoader->readSize + hLoader->readAhead, I simply checked if it's less than 100K. 
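
    The corresponding check ends up looking roughly like this (again my own names, not the literal Loader.c code):

        #include <stddef.h>

        /* Only top the ring buffer up to ~100K ahead of the reader instead of
         * readSize + readAhead. */
        #define MAX_AHEAD_BYTES (100 * 1024)

        static int needMoreData(size_t delta)   /* delta = write pointer - read pointer */
        {
            /* was: return delta < hLoader->readSize + hLoader->readAhead; */
            return delta < MAX_AHEAD_BYTES;
        }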

    I suppose the decoder requires the input buffer to be 2MB in size, but it doesn't seem to need the buffer to be filled with that much video on the first decode call after Loader_prime.

    With some additional modifications, I'm sure I can improve the latency further.

    Cheers

    Derek

  • Hello,

    The reason so much data is read (and kept in memory) by the Loader is that the decode demo doesn't know how large the encoded frame is. The app asks the codec for the worst case, and ~2MB is the H.264 spec answer.

    However, if you already know the size of the encoded frame, this is not necessary. You could, for example, put a "number of bytes" packet in your protocol to tell the Loader how big the next frame is, since from what I gather above you control both the encode and the decode side.
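
    Purely as an illustration, such a header could be as simple as the following (the struct and field names are just an example, not something DMAI or the demo defines):

        #include <stdint.h>

        /* Purely illustrative framing -- neither DMAI nor the demos define this. */
        typedef struct {
            uint32_t magic;       /* fixed marker so the receiver can resynchronise */
            uint32_t frameBytes;  /* size of the encoded frame that follows         */
        } FrameHeader;

    The sender writes the header followed by frameBytes of encoded data; the receiver reads the header first and then knows exactly how much data to hand to the Loader.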

    The only requirement from the decoder is that a full frame is passed, or it will fail.

    Regards, Niclas

  • Hi Niclas, thank you for the reply

    Yes, we control the encode on one side and the decode on the other.  The modification to the encode demo was fairly simple: the encoded frames are simply written out the EMIF port.  I also write the frame size out as a header, so that the receiver knows how many encoded bytes to read out of the FPGA in order to reassemble the frame.  Since I already have the number of bytes available, I could use that to tell the Loader how big the next frame is.

    The decode side has proven a bit more complicated.  I've added an additional thread to the decode demo, responsible for reading the encoded frames in through EMIF as they arrive and placing them in a ring buffer I created.  I can't seem to run faster than a 1MBps encoding rate; beyond that, the additional thread can't keep up with the frame processing.  I'm hoping to speed this up, as the image resolution is rather poor.
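
    The shape of that thread is roughly the sketch below (emif_read(), ringbuf_put() and the buffer size are my own names and choices, and error handling is omitted):

        #include <stdint.h>
        #include <stddef.h>

        extern size_t emif_read(void *dst, size_t nbytes);      /* blocking EMIF read (my code)  */
        extern void   ringbuf_put(const void *src, size_t n);   /* write into my ring buffer     */

        /* Read the size header, then the frame payload, then publish the frame
         * to the ring buffer that my modified loader drains. */
        static void *emifRxThread(void *arg)
        {
            static unsigned char frame[512 * 1024];   /* worst-case encoded frame I expect */
            uint32_t frameBytes;

            (void)arg;
            for (;;) {
                emif_read(&frameBytes, sizeof(frameBytes));   /* size header first         */
                if (frameBytes == 0 || frameBytes > sizeof(frame))
                    continue;                                 /* bad header -- skip/resync */
                emif_read(frame, frameBytes);                 /* then the encoded payload  */
                ringbuf_put(frame, frameBytes);               /* hand off to the loader    */
            }
            return NULL;
        }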

    Thanks again for the insights.

    Cheers

    Derek

  • Hi again,

    I'm still working on reducing the end-to-end latency and I have some questions with regards to that.

    With the modifications I made to Loader_readData, the loader thread is able to keep up with the frames as they arrive, so there appears to be at most one frame of latency, worst case, between the thread I use to read frames in from the FPGA and the loader thread.  This is good news.

    The issue I'm experiencing now is that, over time, there are more calls to Loader_readData from the loader thread than calls to Loader_getFrame from the video thread; in other words, the loop in the loader thread runs faster than the loop in the video thread.  The end-to-end latency therefore gets worse and worse over time.  After running for 10 minutes, there were 96 frames still in the ring buffer, which works out to around 3 seconds of latency.  It gets worse from there: after 15 minutes, there were 171 frames in the ring buffer.
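
    Just to make those numbers concrete, assuming the stream is roughly 30 fps:

        latency ≈ frames queued in ring buffer / frame rate
        96 frames  / 30 fps ≈ 3.2 s   (after 10 minutes)
        171 frames / 30 fps ≈ 5.7 s   (after 15 minutes)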

    This morning I tried modifying Loader_getFrame to drop frames until there was only a single frame of delay between the writer and the reader in the ring buffer.  Doing so caused the decoder to report a bit error.

    I was wondering whether the H.264 encoded frames are dependent on each other.  My guess is that they are in some way, since the bit error showed up when I tried dropping a few frames, unless I somehow messed up the pointers, which has caused bit errors for me before.  Are the encoded frames dependent on each other?
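
    If they are, my plan is to only restart decoding at an IDR frame after dropping data.  Something along these lines is what I'd try (assuming Annex-B 00 00 00 01 start codes; this is just a sketch on my part, not code from the demo, and the SPS/PPS NAL units ahead of the IDR would still be needed too):

        #include <stddef.h>

        /* Scan a buffer for an IDR slice (NAL type 5) and return its offset,
         * or -1 if none is found.  Decoding would only be restarted here. */
        static long findIdrOffset(const unsigned char *buf, size_t len)
        {
            size_t i;

            for (i = 0; i + 4 < len; i++) {
                if (buf[i] == 0 && buf[i+1] == 0 && buf[i+2] == 0 && buf[i+3] == 1) {
                    unsigned nalType = buf[i + 4] & 0x1F;   /* low 5 bits of the NAL header */
                    if (nalType == 5)                       /* 5 = IDR slice                */
                        return (long)i;
                }
            }
            return -1;   /* no IDR in this buffer */
        }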

    Either way, I need to figure out a way to speed up the video thread slightly, so that the reader can keep up with the writer in the ring buffer.

    Any insights here would be greatly appreciated!

    Thanks

    Derek