DM368 H.264 encoder performance at 1080P
I am working on a DM368 platform, encoding 1080P video input from a TVP7002. I have seen in a few threads on this forum that the encoder timing of the DM368 at 1080P should be ~27-28 ms. See here for an example:
However, I am not able to get below ~29.1 ms per encode, and this is on a system running nothing but the encoder in a single thread, encoding an image filled with black. Below is a log of stats gathered across Venc1_process calls: 9000 is the total byte count for a 30-frame period and 300 is the average byte count per frame, so the encoder is doing very little work.
877 1970/01/01 00:04:48 Info main.cpp (1197) H264 - Queue 0; generated 9000 ( 300 avg) in 873953 us ( 29131 avg)
877 1970/01/01 00:04:49 Info main.cpp (1197) H264 - Queue 0; generated 9000 ( 300 avg) in 873803 us ( 29126 avg)
876 1970/01/01 00:04:50 Info main.cpp (1197) H264 - Queue 0; generated 9000 ( 300 avg) in 873847 us ( 29128 avg)
877 1970/01/01 00:04:50 Info main.cpp (1197) H264 - Queue 0; generated 9000 ( 300 avg) in 873654 us ( 29121 avg)
Here are some notes about my device and test setup.
432 MHz ARM
340 MHz DDR
In the VPFE setup code I am setting these registers:
MSTPRI0 = 0x00440022
MSTPRI1 = 0x00000555 // Or default, makes no difference
Edit: I forgot to include notes about RSZ_DMA_RZA. I have tried setting RSZ_DMA_RZA to the recommended value of 0x20, but this results in quite poor encoder performance (on the order of 32.5 ms). After a lot of testing I set it to 0x50, which is how I got the 29.1 ms values above. However, at 0x50 I am experiencing periodic capture loss, which is not surprising. Any suggestions here would be very welcome.
I believe the DDR timings are set up properly, but I am not certain. I do know that if I use the default UBL that comes with the DVSDK, my results are much worse (~32 ms per frame).
In my .cfg file I am enabling cache usage for the encoder:
var H264ENC = xdc.useModule('ti.sdo.codecs.h264enc.ce.H264ENC');
H264ENC.useCache = true;
H.264 encoder version 02.20.00.05. I have also tried version 02.30.00.04.
Here are the relevant encoder setup parameters. Most other parameters are at their defaults.
encoderParams.videncParams.encodingPreset = XDM_HIGH_SPEED;
encoderParams.encQuality = 2; // tried 3 as well, no difference
encoderParams.transform8x8FlagIntraFrame = 0;
encoderParams.meAlgo = 0;
encoderParams.rcAlgo = 5; // or 0, no difference
encoderParams.videncParams.rateControlPreset = IVIDEO_USER_DEFINED;
encoderDynamicParams.resetHDVICPeveryFrame = 0;
encoderDynamicParams.videncDynamicParams.intraFrameInterval = 0;
encoderDynamicParams.idrFrameInterval = 30;
encoderDynamicParams.maxDelay = 1000;
encoderDynamicParams.videncDynamicParams.targetBitRate = 4000000;
encoderDynamicParams.videncDynamicParams.inputWidth = 1920;
encoderDynamicParams.videncDynamicParams.inputHeight = 1088;
Does anybody have any ideas on how to further reduce the time taken for encoding?
I don't see any problem in your parameters. There are a few points I would suggest checking. You are using rcAlgo = 5 or 0, which means CBR/Custom CBR.
Do you really need CBR?
In CBR there is a trade-off between two things: frame rate and bitrate.
By default the codec will drop frames to maintain the target bitrate, which is the requirement in CBR.
If you want a high frame rate instead, try moving your QP limits toward the maximum:
rcQMax = 48;
rcQMin = 0~9;
rcQMaxI = 58;
rcQMinI = 0~9;
If instead you must meet the target bitrate at any cost, you will have to compromise on frame rate.
You cannot get both at the same time.
If CBR is not a strict requirement, go for VBR/CVBR (rcAlgo = 3 or 1).
One more point I would like to add:
If encodingPreset is XDM_HIGH_SPEED, encQuality should be 3,
and if encodingPreset is XDM_HIGH_QUALITY, encQuality should be 2.
Also make sure enableDDRbuff = 0 and sliceMode = 0 for better performance.
The above suggestions are purely based upon my understanding.
Thank you for your detailed and thoughtful response. I do not always need CBR, but will need it if the customer sets the options to be CBR, so I would like to get the performance to the maximum in either CBR or VBR mode. Also, the bit rate I am using is high enough not to cause frame dropping most of the time in CBR mode.
I have tried changing the QP parameters such as you suggest, but did not see a change in performance. Thanks for the advice regarding encQuality; I have changed it. Regarding enableDDRbuff and sliceMode, both were already set to 0 but I will make sure not to change them in the future.
Thanks and regards,
Does TI not take these performance issues seriously? I have noticed many similar queries on this board and many of them never get answers, or even a single response.
I hope you understand how frustrating that is.
Sorry we missed this thread. 29.1 ms is the expected number for 1080p on DM368 (ARM @ 432 MHz, DDR @ 340 MHz). You can also confirm this against the data in the datasheet. We claim 35 fps for encQuality = 3 (HIGH_SPEED mode), which is the same thing you are getting.
Thank you very much for the response.
I do have another question regarding this issue. In my real application I am not encoding plain black images, so I am also capturing video using the ISIF->IPIPEIF->RSZ->DDR chain. I have tweaked the RSZ_DMA_RZA register according to advice I have seen in other threads. I noticed that the recommended value is 0x20 for larger frame sizes, but 0x20 leaves less headroom for the encoder and I get ~32.4 ms per encode. I used 0x50 to achieve the 29.1 ms numbers above, but this leads to periodic capture loss. Do you have any other advice in this area? If I also enable video streaming, the encode time increases beyond 33 ms, at which point I am dropping frames. I'd really like to get a solid 30 fps out of this hardware, but it seems very difficult.
We have asked the applications team to look into your query.
Can you please check your MASTER PRIORITY?
1) Master priority: The master priority registers need to be modified as below. For details on the meaning of these registers, please refer to http://focus.ti.com/general/docs/lit/getliterature.tsp?literatureNumber=sprufg5a&fileType=pdf
Suggested value: MSTPRI0 (0x01C4003C) = 0x440011
2) DDR timing: The values of the registers below can affect system performance, so review them and keep them at their most optimized values. There is no specific recommended value, because the values depend on the DDR used in the product. For details, please refer to: http://focus.ti.com/general/docs/lit/getliterature.tsp?literatureNumber=sprufi2&fileType=pdf
DDR->SDTIMR
DDR->SDTIMR2
DDR->SDBCR
DDR->SDRCR
3) Resizer registers: Optimized settings for the resizer's DDR access. For details on the meaning of these registers, please refer to http://focus.ti.com/general/docs/lit/getliterature.tsp?literatureNumber=sprufg8c&fileType=pdf
0x01C7:0420h (DMA_RZA SDRAM Request Minimum Interval for RZA) = 0x20
0x01C7:0424h (DMA_RZB SDRAM Request Minimum Interval for RZB) = 0x20
4) OSD and other related DMA transfers: These should happen on TC1/Queue 1.
Thank you for the response.
1. We had previously set MSTPRI0 to 0x440022 based on recommendations in a thread last year. I have tried changing it to 0x440011 but have found performance to be better with 0x440022. We also set MSTPRI1 to 0x555.
2. I have gone through the DDR register set up from scratch and recalculated all the fields. There were some incorrect calculations in SDTIMR which I have since corrected, and this change definitely helped overall throughput. Thanks!
3. I have already set DMA_RZA and DMA_RZB, although for my current testing I have only been using Resizer A and have left Resizer B disabled.
4. I don't have analog video output enabled for this testing, but am very interested in optimizing its usage of DMA. We have modified the VPFE to correctly set up the buffers in order to avoid DMA copy for analog output, but is there another case here to worry about? I also use DMA via DMAI accelerated FrameCopy to perform graphics overlay (which is disabled at the moment) and would be interested to optimize this as well. However, I am unable to determine where I can set a specific TC to use for any DMA operations. I understand it will be different for various components, but do you have a starting point or any specific area I can look at?
Thank you very much for the help and suggestions.