This thread has been locked.

DM365 Decodes multiple H.264 streams

Dear all,

I'm a newbie to TI digital media processors.  Right now, I have a project that requires decoding 4 H.264 streams at 720p resolution (preferably) simultaneously.  I've studied the documents in the DVSDK for a couple of days.  I can only find that we can create multiple H.264 decoder instances for this purpose, but I can't find any information on whether the DM365 has sufficient horsepower to decode 4 H.264 720p streams.  Can anyone kindly share their experience with the DM365?

Also, after a couple of days of study, I still can't get a clear picture of the programming model or how to start my project with the DVSDK.  Could anyone kindly point me to a tutorial or overview of the DVSDK?

Thank you so much in advance!

Regards

Chitat

  • Hi Chitat,

    Please refer to the H.264 decoder datasheet supplied with the DVSDK. We support only one stream of H.264 decode at 720p@30fps on the DM365. If you are willing to compromise on frame rate, more streams can be added.

    The best place to start your project is the decode demo in the DM365 DVSDK; begin there and enhance it for your requirements.

     

    Regards,

    Anshuman

  • Anshuman,

    Thank you so much!

    Based on the average result in the datasheet, DM365 can support either

       1. one stream of H.264 720p@30fps decode, or

       2. three streams of H.264 VGA@30fps decode, or

       3. three streams of H.264 D1@30fps decode,

    if I interpreted it correctly.  Am I right?

    I found some terms in Tables 2 & 3 that I can't understand.  Could you kindly tell me what ARM926 PER FRAME and DECODE PER FRAME (ARM926 and ARM968) mean?  I know that the DM365 is based on an ARM926 core, but what is the ARM968 for in the DM365?

    Also, do the AVERAGE and PEAK mean that the testing conditions were under average load and peak load respectively?

    Regards,

    Chitat

     

     

     

  • Hi Chitat,

    Your calculations are theoretically correct.

    One rule of thumb: when you add multiple decoder instances, there is switching and reloading of code for each instance, which brings down performance. I would suggest expecting a drop of at least 10%.
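
    A back-of-the-envelope sketch of this rule of thumb. All numbers here are illustrative, not datasheet figures: it assumes the engine's throughput can be expressed in macroblocks per second and that multiple instances cost roughly 10% in code-reload/switching overhead, as suggested above.

    ```c
    /* Illustrative multi-instance decode budget check.
     * Assumption: one 720p@30fps stream saturates the engine, and
     * running multiple instances costs ~10% of throughput. */
    #include <assert.h>
    #include <stdio.h>

    /* Macroblocks per frame for a given resolution (16x16 blocks). */
    static long mb_per_frame(int width, int height)
    {
        return (long)((width + 15) / 16) * ((height + 15) / 16);
    }

    /* Effective per-second budget after the multi-instance penalty. */
    static double effective_budget(double single_stream_budget, int n_instances)
    {
        double penalty = (n_instances > 1) ? 0.90 : 1.00; /* ~10% drop */
        return single_stream_budget * penalty;
    }

    int main(void)
    {
        /* If one 720p@30fps stream saturates the engine, the budget is: */
        long budget = mb_per_frame(1280, 720) * 30;   /* 108000 macroblocks/s */
        double multi = effective_budget((double)budget, 3);

        /* Three VGA@30fps streams nominally need the same amount: */
        long need = 3 * mb_per_frame(640, 480) * 30;  /* 108000 macroblocks/s */

        printf("budget with overhead: %.0f, needed: %ld\n", multi, need);
        /* With the 10% penalty, three full VGA streams slightly exceed
           the theoretical budget -- hence "expect a drop of at least 10%". */
        assert(multi < (double)need);
        return 0;
    }
    ```

    This is only a sizing heuristic; the datasheet's measured per-resolution figures are authoritative.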

     

    If you look at the DM365 block diagram, you will notice the HDVICP hardware engine dedicated to H.264 encode/decode. It has an ARM968 as its control processor.

    In the datasheet, the ARM926 load for the encoder or decoder covers code loading and some interrupt handling for each frame of encode/decode. Otherwise, the ARM926 is free while the codec is running.

     

    Regards,

    Anshuman

  • Hi Anshuman,

    Thank you so much!

    Also, I checked the H.264 encoder datasheet, and it mentions that the DM365 is able to encode an H.264 720p@30fps video stream.

    Do you know whether or not the DM365 is able to encode an H.264 720p@30fps stream and decode an H.264 720p@30fps stream simultaneously?

    Regards,

    Chitat

  • Chitat,

    If you look at the datasheet, it explicitly states the maximum performance of the encoder and decoder. I also mentioned in an earlier post that the encoder and decoder use the same hardware engine. So, precisely, we cannot have 720p@30fps encode and decode simultaneously.

     

    Regards,

    Anshuman

  • Anshuman is correct: both cannot be done simultaneously, since they use the same hardware. However, note that depending on the frame size and refresh rate you require, you may be able to time-multiplex both.  Our software architecture is such that you can create an instance of encode and an instance of decode under a single application, but only one will be able to take ownership of the hardware resources to process a frame at any point in time; after one (say, encode) is done, you can call the other (decode), and so on.  In fact, this is what the encodedecode demo included in the DVSDK does.
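
    The take-turns idea above can be sketched with plain pthreads: a single mutex stands in for the one HDVICP engine, and the encode/decode threads alternate ownership one frame at a time. The frame-processing bodies here are placeholders, not DVSDK or Codec Engine APIs.

    ```c
    /* Sketch of time-multiplexing one hardware engine between an
     * encode thread and a decode thread.  The mutex models exclusive
     * ownership of the HDVICP; the counters stand in for real
     * VIDENC1/VIDDEC2 process calls. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t hdvicp_lock = PTHREAD_MUTEX_INITIALIZER;
    static int frames_encoded, frames_decoded;

    static void process_frame(const char *which, int *counter)
    {
        (void)which;                       /* label only, unused here */
        pthread_mutex_lock(&hdvicp_lock);  /* only one owner at a time */
        ++*counter;                        /* stand-in for a process() call */
        pthread_mutex_unlock(&hdvicp_lock);
    }

    static void *encode_thread(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100; i++)
            process_frame("encode", &frames_encoded);
        return NULL;
    }

    static void *decode_thread(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100; i++)
            process_frame("decode", &frames_decoded);
        return NULL;
    }

    int main(void)
    {
        pthread_t enc, dec;
        pthread_create(&enc, NULL, encode_thread, NULL);
        pthread_create(&dec, NULL, decode_thread, NULL);
        pthread_join(enc, NULL);
        pthread_join(dec, NULL);
        printf("encoded %d, decoded %d frames\n", frames_encoded, frames_decoded);
        return 0;
    }
    ```

    Whether a given frame-size/frame-rate combination fits is purely a question of whether both workloads' per-frame times sum to less than the frame interval.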

  • Anshuman and Juan,

    Thank you so much!  I got it.

    In other words, the built-in co-processor can only encode and decode H.264 video streams simultaneously at a lower resolution or frame rate, say two VGA@30fps streams, or 720p@12fps, etc., and only if the firmware is carefully engineered to time-multiplex the encode and decode processes.  Am I right?

    I'm right now studying the sample code for decoding H.264 videos.  It seems that it just decodes the H.264 test videos without displaying the decoded video on either the composite or component video outputs.  Am I right?  Is there any sample code exemplifying this in the DVSDK or elsewhere?

    Regards,

    Chitat

  • What sort of synchronization method is used for two threads to gain access to the same hardware for encode and decode?

  • I'm not sure whether decoding works like encoding, but there is only one encoding unit, and it can encode only one image at a time, so there is no gain in using multiple threads to encode a single frame. But since hardware encoding can be very fast, there is often enough time to encode the same image (or different images) into multiple streams. For example, PAL runs at 25 fps, which means there is a 40 ms interval between progressive frames. If encoding at a chosen resolution takes 10 ms, you can theoretically encode 4 streams in real time, or more realistically 3 streams.

    But note that several parts of the "encoding" process can be parallelized, since they are performed by different hardware units, so there is a real gain if you use multiple threads the right way. For example, video capture, video resize, and encoding are all performed by different units, so with multiple threads you can encode one frame while resizing the next. This makes real-time encoding of multiple streams achievable.
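
    A minimal sketch of that capture-while-encoding overlap, using generic POSIX counting semaphores rather than DMAI. The two-slot ring, the "capture", and the "encode" bodies are all placeholders: while the consumer works on frame N, the producer is already filling the other slot with frame N+1.

    ```c
    /* Producer/consumer pipeline sketch: capture thread fills buffers,
     * encode thread drains them, with classic empty/full semaphores. */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    #define NFRAMES 8

    static int ring[2];            /* two-slot buffer ring */
    static sem_t empty, full;      /* free-buffer / filled-buffer counts */
    static int encoded_sum;

    static void *capture(void *arg)
    {
        (void)arg;
        for (int i = 1; i <= NFRAMES; i++) {
            sem_wait(&empty);           /* wait for a free buffer */
            ring[i % 2] = i;            /* "capture" frame i */
            sem_post(&full);            /* hand it to the encoder */
        }
        return NULL;
    }

    static void *encode(void *arg)
    {
        (void)arg;
        for (int i = 1; i <= NFRAMES; i++) {
            sem_wait(&full);            /* wait for a captured frame */
            encoded_sum += ring[i % 2]; /* "encode" it */
            sem_post(&empty);           /* return the buffer */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t cap, enc;
        sem_init(&empty, 0, 2);         /* both buffers start free */
        sem_init(&full, 0, 0);
        pthread_create(&cap, NULL, capture, NULL);
        pthread_create(&enc, NULL, encode, NULL);
        pthread_join(cap, NULL);
        pthread_join(enc, NULL);
        printf("encoded %d frames, checksum %d\n", NFRAMES, encoded_sum);
        return 0;
    }
    ```

    With two buffers, the capture stage can run at most one frame ahead of the encoder, which is exactly the overlap that hides the resize/capture time behind the encode time.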

    Synchronization can be achieved in the usual way, using thread semaphores and other primitives, or with DMAI's functions.

    So if decoding works the same way as encoding, there is only one unit, but it can be fast enough to work on several streams, allowing real-time decoding of several streams even though it is not actually decoding them in parallel.

     

  • Hi John,

    I believe your question is specifically w.r.t. encoding and decoding some content in different threads. If so, Codec Engine does that scheduling and locking, based on the scratch group Id specified in the .cfg file of the application. Each codec/algorithm has a scratch group Id. For example, if you make a process call for the H.264 encoder from one thread and a process call for the H.264 decoder from another, CE will serialize these calls, letting one go through at a time while the other thread is put to sleep. The application need not worry about it.
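
    A hypothetical .cfg excerpt illustrating the scratch group Id idea. The module paths, engine name, and codec names below are placeholders that vary by codec package; only the `groupId` field is the point.

    ```javascript
    /* Hypothetical application .cfg fragment (XDC config script).
     * Module paths are illustrative -- check your codec package. */
    var Engine  = xdc.useModule('ti.sdo.ce.Engine');
    var H264ENC = xdc.useModule('ti.sdo.codecs.h264enc.ce.H264ENC');
    var H264DEC = xdc.useModule('ti.sdo.codecs.h264dec.ce.H264DEC');

    /* Giving both codecs the same groupId tells Codec Engine they share
     * scratch resources, so their process() calls are serialized
     * automatically -- one runs while the other thread sleeps. */
    var myEngine = Engine.create("encodedecode", [
        { name: "h264enc", mod: H264ENC, local: true, groupId: 1 },
        { name: "h264dec", mod: H264DEC, local: true, groupId: 1 },
    ]);
    ```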

    If your question is w.r.t. capture and encode in parallel, then of course you are going to get a lot of advantage, as encoding happens on the hardware accelerators and the ARM is free to do any capture processing. But in this case, the application has to ensure synchronization of buffers from capture to encoder. This is shown in the DVSDK demos.

     

    Regards,

    Anshuman