
Question about synchronizing audio and video on the EVMDM365

Other Parts Discussed in Thread: CC8520

Hello everybody!

In my application (the encoder captures video and audio, encodes the data, and sends it to the decoder, which decodes the data and plays out video and audio), I need to synchronize the audio data with the video data. On the encoder, I have two threads, one for video encoding (H.264) and one for audio encoding (G.711). Both data packets are sent at the same time to the decoder, which then uses two threads for decoding. When I display the video, the audio is always at least 0.5-1 second behind the video. I don't understand why the audio latency is so high. I use DVSDK 3.10 with the EVMDM365 board for both encoder and decoder. As the audio driver I use the Sound module from DMAI. Maybe the audio driver has buffered 0.5-1 second of old data when I start to read from the device? Thanks for any help!

Regards,

Matthias

  • Hi Matthias, 

    I am working on a similar application, but in my case it is the video encoding that causes the major delay.

    You say that the audio data and video data are sent at the same time, but how are the two threads synchronized? Is it possible that you send fewer audio frames than are available? In that case the audio delay would increase over time.

    Maybe your audio buffer is too long. I acquire audio frames of 20 ms, so I get a frame ready to be transmitted every 20 ms. If you use a fifo (DMAI provides this kind of data structure), be sure to send all the available buffers to the network.

    If both audio and video are transmitted via RTP, I think you can use RTP timestamps to synchronize them.
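
    For example, here is a minimal sketch of that idea (illustrative only, not taken from any particular RTP stack): stamp both streams from one monotonic clock, scaled to each payload's RTP clock rate (90 kHz for H.264, 8 kHz for G.711).

    #include <stdint.h>
    #include <time.h>

    /* Convert a shared monotonic capture clock into RTP timestamp units. */
    uint32_t rtp_timestamp(uint32_t clock_rate_hz)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        uint64_t ticks = (uint64_t)ts.tv_sec * clock_rate_hz
                       + (uint64_t)ts.tv_nsec * clock_rate_hz / 1000000000u;
        return (uint32_t)ticks;   /* RTP timestamps wrap modulo 2^32 */
    }

    /* Usage: stamp each video packet with rtp_timestamp(90000) and each
       audio packet with rtp_timestamp(8000), taken at capture time. */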

  • Hi, MW51194.
    Hi, Peregrinus.

    My suggestion is to temporarily drop audio encoding. Try pure PCM sound first.
    You can play native PCM sound directly from your application using alsa-lib.
    That way you can bypass DMAI for audio, which gives you the opportunity to control the time shift of the sound data effectively.
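
    For example, here is a minimal playback sketch with alsa-lib (the device name, rate, and latency here are assumptions; adjust them for the EVMDM365):

    #include <alsa/asoundlib.h>

    /* Play nframes of 16-bit mono PCM through the default ALSA device. */
    int play_pcm(const short *samples, long nframes)
    {
        snd_pcm_t *pcm;
        int err;

        if ((err = snd_pcm_open(&pcm, "default",
                                SND_PCM_STREAM_PLAYBACK, 0)) < 0)
            return err;

        /* 16-bit LE mono at 8 kHz, matching the G.711 capture rate */
        if ((err = snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                                      SND_PCM_ACCESS_RW_INTERLEAVED,
                                      1, 8000, 1 /* allow resampling */,
                                      100000 /* 100 ms max latency */)) < 0) {
            snd_pcm_close(pcm);
            return err;
        }

        snd_pcm_sframes_t n = snd_pcm_writei(pcm, samples, nframes);
        if (n < 0)
            n = snd_pcm_recover(pcm, n, 0);   /* handle under-runs */

        snd_pcm_drain(pcm);
        snd_pcm_close(pcm);
        return (int)n;
    }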

    ---
    I'm also interested in the topic of synchronized video+audio playback.
    My recording application currently uses the MKV container for video+audio storage.
    The MKV file is played back on the host. An MKV cluster has special features for marking audio and video frames with timestamps, so host players have no problems with synchronized playback.
    In my case, there were some problems getting the AAC audio frames placed correctly into the MKV clusters. There were no problems with synchronized playback on the host side (using standard multimedia players), but the MKV clusters were very complex because the H.264 and AAC frame durations were different. So I dropped AAC and switched to pure PCM for sound. Everything was good, simple, and clear.
    Soon I plan to start working on video+audio playback on the device side.
    My application should be able to read an MKV file and play back the H.264 video together with the audio. Again, I plan to use pure PCM (using alsa-lib, without DMAI involvement).

    PS: Important note. Personally, I don't use two threads! I placed the code that works with ALSA inside the video thread. In my opinion, it is simpler and clearer.

    ---
    best regards, senchuss

  • Hi,

    thank you very much for your answers!

    The audio and video threads are synchronized with the Fifo module from DMAI. When a buffer is full of encoded data, the thread sends the buffer via fifo to a writer thread (as in the demo application). The writer thread waits until the second thread has also sent its buffer via fifo. The writer then appends the audio data to the end of the video data and sends all the data via DMA to the EMIF interface. An FPGA on the other side takes the data, creates an MPEG-like transport stream, and sends it via radio link to the decoder. After the DMA transfer is complete, the writer thread sends the video buffer via fifo back to the video thread and the audio buffer via fifo back to the audio thread.
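
    As a rough sketch, one writer iteration looks like this (the handle names are my own, mirroring the encode-demo Fifo pattern rather than the exact demo code):

    #include <ti/sdo/dmai/Dmai.h>
    #include <ti/sdo/dmai/Fifo.h>
    #include <ti/sdo/dmai/Buffer.h>

    /* One iteration of the writer thread's fifo handshake. */
    Int writerIteration(Fifo_Handle hVidOutFifo, Fifo_Handle hAudOutFifo,
                        Fifo_Handle hVidInFifo,  Fifo_Handle hAudInFifo)
    {
        Buffer_Handle hVidBuf, hAudBuf;

        /* Wait for one encoded buffer from each encoder thread */
        if (Fifo_get(hVidOutFifo, &hVidBuf) < 0) return -1;
        if (Fifo_get(hAudOutFifo, &hAudBuf) < 0) return -1;

        /* ... append the audio data after the video data, start the DMA
           to the EMIF interface, wait for DMA completion ... */

        /* Return the buffers so the encoder threads can refill them */
        if (Fifo_put(hVidInFifo, hVidBuf) < 0) return -1;
        if (Fifo_put(hAudInFifo, hAudBuf) < 0) return -1;
        return 0;
    }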

    I use 720p video at a 30 fps frame rate, so I have approx. 33 ms between two writes to the FPGA. With an 8 kHz sampling rate, there are 264 samples of audio data in 33 ms of audio (264 * 125 us = 33 ms). I set the ring buffer of the audio device to 528 bytes (264 16-bit samples) via sAttrs.bufSize. I also tried lower values, e.g. 32 bytes or so, but the audio delay stays nearly the same. The input buffer size for the Sound_read function is also 528 bytes for 264 16-bit samples.

    On the receiver side it is the reverse. A loader thread receives a buffer on the EMIF interface with video data and attached audio data. It then fills an audio buffer with the attached data and sends the video buffer to the video thread and the audio buffer to the audio thread. After that, it waits until it gets free buffers back from the threads to fill with incoming data from the EMIF interface.

    The audio delay does not increase with time. It is about 0.5-1 s, while the video delay is about 150 ms.

    The use of a container is very interesting. I have thought about using an MP4 container when recording the video data to memory, because many players have problems playing back raw H.264 data. Unfortunately, I didn't find any information about creating an MP4 container. Is there a library available that I can use to put the incoming video frames into a container?

    I will try the variant with one common thread for video and audio. I'm sure it is simpler, but I had thought that during the video process call, the application could switch to the audio thread and do the read/process operations while the video thread is still processing. But I will try it! Again, thank you for your answers!

    Kind regards,

    Matthias

  • Hi, Matthias.

    "Unfortunately, I didn't find any information about creating a mp4 container. Is there a library available, which I can use to put the incoming video frames in a container?"

    I couldn't find any either. That's why I chose Matroska.

    http://matroska.org/technical/specs/index.html

    There are not many words in the text, and nothing superfluous. Everything is described very briefly, but all the specs are present in tables.

    There are two libs:
    * The standard "pair": libebml & libmatroska
    * An alternative: yamka

    In my view, both of them are complex for newbies, especially if you are looking at MKV for the first time and can't yet understand how the data is organized.

    The right solution is to study the MKV structure and write some C code from scratch!
    For example, in my case I wrote my own simple MKV lib with the necessary functions over a two-month period. In my humble opinion, it is sometimes faster to write simple code yourself than to figure out how to use a complex library.

    So... here is some brief info.

    The basis of everything is the EBML block:
    <ID>
    <LEN>
    <DATA>

    ID is a unique set of bytes that specifies the kind of block.
    LEN is the size of the data inside the block.
    DATA is the data :-)

    A primitive parser does the following tasks:
    * Read the ID of the current block
    * Read the LEN field of the current block
    * Read (or skip) LEN bytes of the current block's DATA
    * Read the ID of the next block
    * Read the LEN field of the next block
    * Read (or skip) LEN bytes of the next block
    * etc.

    The MKV file is just a set of EBML blocks.
    Only EBML blocks and nothing more at the low level.
    All of the blocks sit right next to each other (without holes).
    So this principle is very clean!
    It is a bit confusing that the number of bytes in the ID and LEN fields may vary, but this is described at the top of the specs (see "EBML principle").
    The most significant bits are the key to how many bytes are used.
    Big-endianness is used everywhere.
    The bit order is also BE, i.e. the first (left) bit is Bit0, and the last (right) bit of a byte is Bit7.
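
    As a quick illustration of that length rule, here is a sketch of a "decode_length"-style function (my own sketch, not taken from any library):

    #include <stdint.h>

    /* Decode an EBML variable-length LEN field: the position of the
       first 1-bit in the first byte gives the field width in bytes. */
    uint64_t ebml_decode_len(const uint8_t *buf, int *width)
    {
        uint8_t first = buf[0];
        int w = 1;
        uint8_t mask = 0x80;

        while (w < 8 && !(first & mask)) {   /* count leading zero bits */
            w++;
            mask >>= 1;
        }

        uint64_t val = first & (uint8_t)(mask - 1);  /* strip the marker bit */
        for (int i = 1; i < w; i++)
            val = (val << 8) | buf[i];       /* remaining bytes, big-endian */

        *width = w;
        return val;                          /* all value bits set = "unknown" */
    }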

    The EBML blocks have a hierarchy (Level 0, Level 1, Level 2, etc.):
    <ID>
    <LEN>
    <DATA> ---> this can include other EBML blocks of a lower hierarchy level.

    After writing some functions for EBML (i.e. "byte_reorder", "encode_length", "decode_length" and so on), you can take the next step: working with the segment_id section, the segment_tracks section, and the clusters.

    You can use predefined byte arrays to quickly write fixed data parts.
    For example, here is the MKV header:
    const char MKV_Header[] = {
        0x1A, 0x45, 0xDF, 0xA3,                 /* EBML magic */
        0xA3,                                   /* header length = 35 */
        0x42, 0x86, 0x81, 0x01,                 /* EBMLVersion = 1 */
        0x42, 0xF7, 0x81, 0x01,                 /* EBMLReadVersion = 1 */
        0x42, 0xF2, 0x81, 0x04,                 /* EBMLMaxIDLength = 4 */
        0x42, 0xF3, 0x81, 0x08,                 /* EBMLMaxSizeLength = 8 */
        0x42, 0x82, 0x88, 0x6D, 0x61, 0x74,
        0x72, 0x6F, 0x73, 0x6B, 0x61,           /* DocType = "matroska" */
        0x42, 0x87, 0x81, 0x02,                 /* DocTypeVersion = 2 */
        0x42, 0x85, 0x81, 0x02                  /* DocTypeReadVersion = 2 */
    };
    And so on.

    A very, very useful tool is MKVINFO!
    Download and install the 'mkvtoolnix' package; it contains the "mkvinfo" program.
    Try to open an MKV file with it (use small MKV files to avoid long parsing times!).
    Click "Options/Show all elements" and you can browse the data structure, including clusters and so on. It is very useful for debugging! You will create your MKV files and can always check them for consistency using 'mkvinfo'.
    Another useful tool is 'mkvmerge': it can produce MKV files from video and audio data streams (including raw H.264). You can try to generate a generic MKV file using 'mkvmerge': simply open a file with the ".264" extension and press the "mux" button. Then you can analyze the file structure using 'mkvinfo'.
    Note: 'mkvinfo' can show the sizes of elements. This size is the full size of the EBML block (id+len+data).

    So, at first, try to create your own MKV file using only the basic sections:
    * EBML Header
    * Segment (with len = 0x01FFFFFFFFFFFFFF = -1 = unknown segment length)
    ** Info
    ** Tracks (single video track)
    and a few clusters (with only H.264 frames inside).
    I.e. without cues, metaseek and the others. If the file plays correctly, try the next features:
    add an audio track and subtitles, and also the metaseek and cues sections.

    After 3-4 months you will become an MKV pro!     I believe :-)
    You will have your own library, or you will already know clearly how to use the standard libraries for MKV.

    But here is one problem for your case: G.711.
    There is a "CodecID" field inside the Tracks section.
    For H.264 : CodecID="V_MPEG4/ISO/AVC"
    For PCM   : CodecID="A_PCM/INT/LIT"
    For AAC   : CodecID="A_AAC/MPEG4/LC"
    Which CodecID is used for G.711?
    I don't know. It could be a problem.

    Also, I am not advertising the advantages of MKV over other containers here! Other containers like AVI, MP4, MOV are also good; I just was not immediately able to find proper documentation for them.
    And maybe my advice to use MKV is entirely misplaced?
    I am only describing my own experience here.
    What type of container you should use, you have to decide for yourself.
    It would be great if forum users would write about their own experience with different containers here!

    Good luck!
    ---
    best regards, senchuss

  • Here is a single-thread solution based on the encode demo.
    Assume 25 fps (a 40 ms video frame period).
    It would be nice to set the H.264 encoding parameters so that there is an I-frame every 25 frames.

    File is: "writer.c"
    Function is: "Void *writerThrFxn(Void *arg)"

    BEFORE the main cycle (while (TRUE) {):
    1. Generate the MKV header.
    2a. Write the Segment ID ([18][53][80][67]).
    2b. Write the "unknown length" 0x01FFFFFFFFFFFFFF (64 bit), or you can use a shorter "unknown" length such as 0x03FFFFFFFFFFFF (56 bit), and so on.
    3. Write the Info section (all data is known, except the duration).
    3a. Write the ID [15][49][A9][66] and write the known length.
    3b. Put in the necessary subfields:
    * SegmentUID= (random 16 bytes, or any value you wish)
    * TimecodeScale=1000000 (by default = 1 ms)
    * MuxingApp=(your string)
    * WritingApp=(your string)
    3c. (Optional, but highly recommended) Reserve some bytes for the "Duration" field using a Void (0xEC) block.

    4. Write the Tracks section (all data known):
    * TrackNumber=1
    * TrackUID= (random 4 bytes, or any value you wish)
    * TrackType=1 (video)
    * FlagEnabled=1
    * FlagDefault=1
    * FlagForced=1
    * FlagLacing=0
    * MinCache=1
    * DefaultDuration=40000000 (40 ms, specified in nanoseconds)
    * TrackTimecodeScale=1.0
    * MaxBlockAdditionID=0
    * Language=(your 3-char language id)
    * CodecID=V_MPEG4/ISO/AVC
    * CodecDecodeAll=1
    * (Video) [E0]
    ** FlagInterlaced=(1 or 0)
    ** PixelWidth=?
    ** PixelHeight=?
    * TrackNumber=2
    * TrackUID= (random 4 bytes, or any value you wish)
    * TrackType=2 (audio)
    * FlagEnabled=1
    * FlagDefault=1
    * FlagForced=1
    * FlagLacing=0
    * MinCache=1
    * DefaultDuration=40000000 (40 ms, specified in nanoseconds)
    * TrackTimecodeScale=1.0
    * MaxBlockAdditionID=0
    * Language=(your 3-char language id)
    * CodecID=A_PCM/INT/LIT
    * CodecDecodeAll=1
    * (Audio) [E1]
    ** SamplingFrequency=(your sampling freq)
    ** Channels=(1=mono, 2=stereo) -- assume mono for simplicity
    ** BitDepth=16 (for PCM)

    INSIDE THE MAIN CYCLE:

    5. "Open" a cluster (write [1F][43][B6][75] and write the "unknown" length).
    5.1. Get and store 25 video and 25 audio frames:
    5.1.1. Using the DMAI function Buffer_getUserPtr(hOutBuf), get the current H.264 frame and store it in the cluster using the Block ([A1]) specs.
    5.1.2. Using the ALSA function snd_pcm_readi(), get the current audio frame (the samples for 40 ms of sound) and store it in an audio block.
    5.2. "Close" the cluster (after 25 vid+aud frames are written): simply seek back to the "unknown" LEN field of the cluster and replace it with the known length! Then seek to the end of the cluster again (see the sketch after this list).
    6. Repeat from step 5 until exit.
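
    Here is a sketch of the open/close trick from steps 5 and 5.2 (illustrative C; it assumes the cluster LEN was written as a full 8-byte "unknown" value):

    #include <stdio.h>
    #include <stdint.h>

    static long clusterLenPos;   /* file offset of the cluster LEN field */

    void clusterOpen(FILE *f)
    {
        const uint8_t id[]   = { 0x1F, 0x43, 0xB6, 0x75 };   /* Cluster ID */
        const uint8_t ulen[] = { 0x01, 0xFF, 0xFF, 0xFF,
                                 0xFF, 0xFF, 0xFF, 0xFF };   /* unknown len */
        fwrite(id, 1, sizeof id, f);
        clusterLenPos = ftell(f);         /* remember where LEN lives */
        fwrite(ulen, 1, sizeof ulen, f);
    }

    void clusterClose(FILE *f)
    {
        long end = ftell(f);
        uint64_t dataLen = end - (clusterLenPos + 8);  /* bytes after LEN */
        uint8_t len[8];
        int i;

        len[0] = 0x01;                    /* 8-byte EBML length marker */
        for (i = 7; i >= 1; i--) {        /* 56 value bits, big-endian */
            len[i] = dataLen & 0xFF;
            dataLen >>= 8;
        }
        fseek(f, clusterLenPos, SEEK_SET);
        fwrite(len, 1, sizeof len, f);    /* patch in the known length */
        fseek(f, end, SEEK_SET);          /* back to the end for cluster N+1 */
    }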

    Each cluster contains 25 video frames (the first is an H.264 I-frame) and 25 audio frames with the same total duration of 40 ms each.
    Each BlockDuration=40.
    There are no problems(!) with reading a video H.264 frame (via Buffer_getUserPtr()) and then reading an audio PCM frame from ALSA within the same iteration of the same thread (get 1 video frame, then get 1 audio frame). Both are buffered!
    Each cluster has a unique timestamp (timecode). See the "Timecode" field in the cluster for details.
    If you want, you can put the video frames and audio frames into different clusters. As you wish.

    When the work is done, you need to (optionally):

    7. Seek to the "unknown" LEN field of the Segment and replace it with the known size of the segment data.
    8. Seek to the "Void" field reserved for the Segment Duration and replace it with an EBML block containing the KNOWN DURATION in milliseconds (recommended); a sketch follows below.

    This produces a fully playable MKV file, but without "rewind" (FF).

    If you want "rewind" feature (fast jump to known position of movie), you need to implement CUES and METASEEK. It's more complex.

    Cues is a table where each cluster's position and timecode are stored. Using the cues table, the player can jump to a specified cluster quickly: it can read a specified "TIME" and quickly jump to the corresponding "POS". I.e. the player can easily find the desired cluster.

    Also, the player should be able to locate any of the "Level 1" sections inside the segment of the MKV file. This is done with the METASEEK table.
    The metaseek table includes the position of every Level-1 section, i.e. Info (single), Tracks (single), Cues (single), and Cluster (each one!).

    So... here is the slightly more complex solution:
    1. Store the MKV header.
    2. Reserve space with a Void (0xEC) block for the future "top metaseek" table at the top of the segment.
    3. Store the "Info" and "Tracks" sections and write their positions to a temporary metaseek file.
    4. Write the clusters! Store their positions in the temporary metaseek file. Also store each cluster's position and timecode in a temporary cues file.
    5. When the work is done, add the cues data at the end and store the cues position in the metaseek file.
    6. Store the data from the temporary metaseek file as the "bottom metaseek" at the end.
    7. Seek to the top of the segment and replace the "Void" block with a "top metaseek" that links to the "bottom metaseek" table.

    It works!

    I have been using MKV for more than a year now. I LIKE IT! :-)
    Flexible, clean, useful!

    Hope it helps somebody!
    I'd be glad to see any comments here in this topic.

    ---
    best regards, senchuss

  • "...and sends it via radio link to the decoder."

    Which type of radio transmitter did you use?

    For example, one mad idea is to use a CC8520-like device to transmit the video over an audio stream.

    What is your opinion? Can an H.264 video stream be transmitted over an I2S data flow?

    Is it possible or not?


  • Hi senchuss,

    thank you for your answers!

    Apparently there is not very much information about the MP4 container, so I think I will check out the link to the Matroska specs you sent me. You put a lot of information in your post, and I'm sure I will need some time to really understand the structures of a Matroska file.

    My radio transmission is realized via a proprietary COFDM module. You "only" need to put an MPEG-TS into the TX and you get an MPEG-TS out of the RX. As I was saying, I realized the creation of the MPEG packets with the help of an FPGA.

    I looked at the datasheet of the CC8520 and the I2S specs, but I can't give useful advice. Apparently I2S is meant for audio data only, but the possible data rate seems high enough to use it for video. I'm not sure whether it is possible to define only one channel so that you just have to hand the video buffers to the I2S device driver. Maybe multiple channels can be handled with different DMA settings (like the use of offsets or so). So if that is your idea, I would give it a try!

    Best regards,

    Matthias

  • Hi, Matthias.

    Thank you for your answer and comments, too. One last word about the Matroska (MKV) specs: the main table contains a huge number of block types, and it looks scary. I want to assure everyone that most of the blocks are not used in real life. You do not need to study all of them, just some.

  • Hi senchuss,

    I will post any progress with the container format, but it will take some time because there are several other things to do. But thank you again for your info!

    Concerning the audio delay problem, I have rewritten my application so that I don't use the Sound module of DMAI but the ALSA user API to access the sound driver. I changed some values; the important one in my opinion was the start_threshold value (I set it to 1 and NOT to the buffer size as in DMAI). Now the sound starts as soon as the first periodSize samples are put into the ring buffer.
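
    In case it helps someone, the change looks roughly like this (standard alsa-lib sw_params calls; the handle setup is omitted):

    #include <alsa/asoundlib.h>

    /* Start playback as soon as one frame is queued, instead of waiting
       until the whole ring buffer is full (the DMAI Sound default). */
    int setLowStartThreshold(snd_pcm_t *pcm)
    {
        snd_pcm_sw_params_t *swp;
        int err;

        snd_pcm_sw_params_alloca(&swp);
        if ((err = snd_pcm_sw_params_current(pcm, swp)) < 0)
            return err;
        if ((err = snd_pcm_sw_params_set_start_threshold(pcm, swp, 1)) < 0)
            return err;
        return snd_pcm_sw_params(pcm, swp);
    }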

    My problem now is that the speech delay is not constant but increases with time (to approx. 500 ms). Then, after reaching the highest delay, the delay goes down again so that the audio is back in sync with the video. I think it has something to do with varying frame times in the video thread (they are not a constant 33 ms for a 30 fps frame rate). Since I read from and write to the sound driver in the video thread, I have to keep that in mind. Maybe I should not read a constant number of samples from the sound driver every frame, but adjust the amount to the actual measured frame time. I will try that, roughly as sketched below. Please let me know if you have another approach to my problem. Thank you!
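
    My current idea looks roughly like this (untested, just the direction I plan to try): derive the number of samples to read per video frame from the measured frame period instead of a fixed 264.

    #include <time.h>

    static struct timespec prev;   /* initialize once before the loop */

    /* Samples to read this frame, e.g. samplesThisFrame(8000). */
    int samplesThisFrame(int rateHz)
    {
        struct timespec now;
        long us;

        clock_gettime(CLOCK_MONOTONIC, &now);
        us = (now.tv_sec - prev.tv_sec) * 1000000L
           + (now.tv_nsec - prev.tv_nsec) / 1000L;
        prev = now;
        return (int)((long long)rateHz * us / 1000000LL);
    }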

    Best regards,

    Matthias