H264 Decoder Performance on DM6467

I've been using the MPEG2 decoder for years on a 720 MHz DM6467 system.  Recently the satellite input has changed to H264 in my application.

According to the spec. sheet, H264 decode should easily run in real time on a 720 MHz device for 1080i inputs.

However, the performance I'm able to obtain is far from real time.  A simple 1080i decode is taking 32.5 ms per picture.  To reach real time, that number has to be below 16.7 ms per picture.
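The 16.7 ms figure follows from the picture rate. As a rough sketch of that arithmetic, the helpers below compute the per-picture budget and compare a measured decode time against it. The function names are made up for illustration, and the 60000/1001 (about 59.94) fields/s rate used in the note after the code is an assumption; the post only states the 16.7 ms target.

```c
#include <assert.h>

/* Time available per picture, in ms, at a given picture rate (pictures/s). */
static double budget_ms(double pictures_per_second)
{
    assert(pictures_per_second > 0.0);
    return 1000.0 / pictures_per_second;
}

/* Measured decode time as a percentage of the per-picture budget. */
static double load_pct(double measured_ms, double pictures_per_second)
{
    return 100.0 * measured_ms / budget_ms(pictures_per_second);
}

/* 1 if the measured time fits within the budget, 0 otherwise. */
static int is_real_time(double measured_ms, double pictures_per_second)
{
    return measured_ms <= budget_ms(pictures_per_second);
}
```

At the assumed rate, budget_ms(60000.0 / 1001.0) is roughly 16.68 ms, so the 32.5 ms measurement is roughly 195% of the budget and is_real_time() returns 0 for it.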

I've pared down the application so that the only thing I'm doing is a call to VIDDEC2_process().  From my perspective there's no further optimization I can do.  I suspect I've got a problem in the configuration of the codec combo being used, but I haven't been able to figure out the magic settings. 

Is there a troubleshooting procedure for understanding why performance numbers in the spec. sheet are not being met?  If not, is there a demo application that runs on the DM6467 for profiling H264 performance?  Any pointers greatly appreciated!
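For anyone wanting to reproduce per-picture numbers like the ones above, one possible approach is to wrap the decode call in a small wall-clock timer and track the mean and worst case. This is only a sketch: DecodeFn, DecodeTiming, timed_decode() and stub_decode() are hypothetical names, and the stub stands in for an application wrapper around a single VIDDEC2_process() call. It assumes a POSIX system with clock_gettime(), and the time it reports includes any framework overhead around the codec, not just the codec itself.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Running statistics for per-picture decode time. */
typedef struct {
    unsigned long count;
    double total_ms;
    double worst_ms;
} DecodeTiming;

/* Hypothetical signature: decodes one picture, returns 0 on success. */
typedef int (*DecodeFn)(void *ctx);

/* Record one measurement. */
static void timing_add(DecodeTiming *t, double ms)
{
    assert(t != NULL);
    t->count++;
    t->total_ms += ms;
    if (ms > t->worst_ms)
        t->worst_ms = ms;
}

/* Mean time per picture, or 0 if nothing has been recorded yet. */
static double timing_mean_ms(const DecodeTiming *t)
{
    return t->count ? t->total_ms / (double)t->count : 0.0;
}

/* Milliseconds elapsed between two monotonic timestamps. */
static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1000.0
         + (double)(b->tv_nsec - a->tv_nsec) / 1.0e6;
}

/* Time a single decode call and fold the result into the statistics. */
static int timed_decode(DecodeFn fn, void *ctx, DecodeTiming *t)
{
    struct timespec t0, t1;
    int status;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    status = fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    timing_add(t, elapsed_ms(&t0, &t1));
    return status;
}

/* Placeholder for the real decode wrapper; does no work. */
static int stub_decode(void *ctx)
{
    (void)ctx;
    return 0;
}
```

In use, the application would call timed_decode() once per picture with its own wrapper in place of stub_decode(), then compare timing_mean_ms() and worst_ms against the 16.7 ms target. Tracking the worst case matters because a single late picture is enough to break real-time playback even when the mean looks fine.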

  • On closer examination of codec combo configuration files, I found that L2 was configured for no cache.  Reconfiguring for 32 kB of cache, I'm now getting a decode time of 14.3 ms per picture which agrees with the data sheet.

    While the decode time is much better, there is still a performance problem.  When I enable VDCE in parallel with decode, performance degrades to 18 ms per picture.  Again this is too slow for real time operation.  Also, the exact same unit test for MPEG2 shows a degradation in decode time of less than 1 ms. 
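To put the figures from this reply side by side, the sketch below computes the headroom against the per-picture budget and the relative slowdown. The helper names are illustrative only; the inputs in the note after the code (16.7 ms budget, 14.3 ms alone, 18 ms with VDCE) are the numbers quoted in the thread.

```c
#include <assert.h>

/* Margin left in the per-picture budget; negative means over budget. */
static double headroom_ms(double budget_ms, double measured_ms)
{
    return budget_ms - measured_ms;
}

/* Percentage increase in decode time when extra load is added. */
static double slowdown_pct(double baseline_ms, double loaded_ms)
{
    assert(baseline_ms > 0.0);
    return 100.0 * (loaded_ms - baseline_ms) / baseline_ms;
}
```

With those inputs, decode alone leaves about 2.4 ms of headroom, while decode with VDCE running is about 1.3 ms over budget, a slowdown of roughly 26%. In other words, VDCE can cost the decoder at most about 2.4 ms per picture before real time is lost.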

  • Perusing the codec FAQ on real time problems, I see suggestions for improving performance by distributing EDMA transfers across all of the transfer controllers (TCs).  Something like this seems worth investigating in my application.  Where in the codec combo framework can I apply and verify the recommendation to "Try to distribute the transfers on the TCs such the load is balanced between these TCs. Usually, it happens that one TC is much more loaded than the other TCs, and hence this particular TC could become a bottleneck."?

    In isolation, H264 decode is running in real time at around 30 ms per frame.  However, as soon as I start adding the rest of the required data transfers, decode time quickly degrades.  The application requires AC3 decode (on the DSP), H264 decode, VDCE 420->422, 422->VPIF, and TSIF capture.  The biggest drop in performance seems to happen when VDCE is enabled.

  • Joe,

    Could you please email me so that I can put you in touch with the codec team to respond to your query?  Please include the current performance numbers for the various use cases, along with the expected numbers.