DM368 H.264 encoder performance at 1080P

Hi all,

I am working on a DM368 platform, encoding 1080P video input from a TVP7002.  I have seen in a few threads on this forum that the encoder timing of the DM368 at 1080P should be ~27-28 ms.  See here for an example:

http://e2e.ti.com/support/embedded/multimedia_software_codecs/f/356/t/99812.aspx

However, I am not able to get below ~29.1 ms for encoding, and this is for a system running nothing but the encoder in a single thread, encoding an image that has been filled with black.  Here is a log of stats gathered across Venc1_process calls.  9000 is the total byte count over a 30-frame period and 300 is the average byte count per frame, so the encoder is really not doing much work at all.

   877 1970/01/01 00:04:48 Info    main.cpp            (1197) H264 - Queue 0; generated   9000 (   300 avg) in  873953 us ( 29131 avg)
   877 1970/01/01 00:04:49 Info    main.cpp            (1197) H264 - Queue 0; generated   9000 (   300 avg) in  873803 us ( 29126 avg)
   876 1970/01/01 00:04:50 Info    main.cpp            (1197) H264 - Queue 0; generated   9000 (   300 avg) in  873847 us ( 29128 avg)
   877 1970/01/01 00:04:50 Info    main.cpp            (1197) H264 - Queue 0; generated   9000 (   300 avg) in  873654 us ( 29121 avg)
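
As a rough check of the averages in the log, the sketch below recomputes the per-frame figures from the window totals and converts the average encode time into the frame rate the encoder alone could sustain. The function names are made up for illustration and are not part of main.cpp or the DVSDK.

```c
#include <assert.h>
#include <stdint.h>

/* Average per-frame value over a reporting window.
 * Integer division truncates, which matches the averages shown in the log. */
uint32_t window_avg(uint32_t total, uint32_t frames)
{
    assert(frames > 0);
    return total / frames;
}

/* Frame rate the encoder alone could sustain at a given average
 * per-frame encode time in microseconds. */
double max_fps(uint32_t avg_encode_us)
{
    assert(avg_encode_us > 0);
    return 1e6 / (double)avg_encode_us;
}
```

For the first log line, window_avg(873953, 30) gives 29131 us, and max_fps(29131) works out to a little over 34 frames per second.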

Here are some notes about my device and test setup.

432 MHz ARM
340 MHz DDR

In VPFE setup code I am setting these registers:
MSTPRI0 = 0x00440022
MSTPRI1 = 0x00000555 // Or default, makes no difference

Edit: I forgot to include notes about RSZ_DMA_RZA.  I have tried setting RSZ_DMA_RZA to the recommended value of 0x20, but this results in quite poor encoder performance (on the order of 32.5 ms).  After a lot of testing I set it to 0x50, which is how I was able to get the 29.1 ms values above.  At 0x50, though, I am experiencing periodic capture loss, which is not surprising.  Any suggestions here would be very welcome.

I believe the DDR timings are set up properly, but I am not certain.  I do know if I use the default UBL that comes with the DVSDK my results are much worse (~32 ms per frame).

In my .cfg file I am enabling cache usage for the encoder:
var H264ENC  = xdc.useModule('ti.sdo.codecs.h264enc.ce.H264ENC');
H264ENC.useCache = true;

H.264 encoder version 02.20.00.05.  I have also tried version 02.30.00.04.

Here are the relevant encoder setup parameters.  Most other parameters are at their defaults.

encoderParams.videncParams.encodingPreset = XDM_HIGH_SPEED;
encoderParams.encQuality = 2; // tried 3 as well, no difference
encoderParams.transform8x8FlagIntraFrame = 0;
encoderParams.meAlgo = 0;
encoderParams.rcAlgo = 5; // or 0, no difference
encoderParams.videncParams.rateControlPreset = IVIDEO_USER_DEFINED;

encoderDynamicParams.resetHDVICPeveryFrame = 0;
encoderDynamicParams.videncDynamicParams.intraFrameInterval = 0;
encoderDynamicParams.idrFrameInterval = 30;
encoderDynamicParams.maxDelay = 1000;
encoderDynamicParams.videncDynamicParams.targetBitRate = 4000000;
encoderDynamicParams.videncDynamicParams.inputWidth = 1920;
encoderDynamicParams.videncDynamicParams.inputHeight = 1088;

Does anybody have any ideas on how to further reduce the time taken for encoding?

Thanks,

Chris

  • Hi Chris,

    I don't see any problem in your parameters. There are a few points I would suggest checking. You are using rcAlgo = 5 or 0, which means CBR or custom CBR.

    Are you really interested in CBR?

    In CBR you have to decide which matters more: frame rate or bitrate.

    By default the codec will try to drop frames to maintain the target bitrate, which is the requirement in CBR.

    But if you want a high frame rate, then try moving your QP limits towards the maximum:

    rcQMax = 48; rcQMin = 0~9; rcQMaxI = 58; rcQMinI = 0~9;

    But if you want to meet the target bitrate at any cost, then you have to compromise on frame rate.

    You cannot get both at the same time.

    If CBR is not a strict requirement, then go for VBR/CVBR (rcAlgo = 3 or 1).

    Apart from this, there is one more point I would like to add.

    If encodingPreset is XDM_HIGH_SPEED, encQuality should be 3,

    and if encodingPreset is XDM_HIGH_QUALITY, encQuality should be 2.

    Also make sure enableDDRbuff = 0 and sliceMode = 0 to get better performance.
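
The preset pairing and the two flags suggested above can be sketched as a small helper. The enum and struct here are hypothetical stand-ins that only mirror the field names mentioned in this reply; the real constants and parameter structures come from the XDM and H.264 encoder headers and are not reproduced here.

```c
#include <assert.h>

/* Hypothetical stand-ins for the codec's preset constants. */
typedef enum { PRESET_HIGH_QUALITY, PRESET_HIGH_SPEED } preset_t;

/* Hypothetical container mirroring the field names used in the reply. */
typedef struct {
    preset_t encodingPreset;
    int encQuality;
    int enableDDRbuff;
    int sliceMode;
} enc_settings_t;

/* Build the combination suggested in the reply for a given preset. */
enc_settings_t suggested_settings(preset_t preset)
{
    enc_settings_t s;

    assert(preset == PRESET_HIGH_QUALITY || preset == PRESET_HIGH_SPEED);
    s.encodingPreset = preset;
    s.encQuality = (preset == PRESET_HIGH_SPEED) ? 3 : 2; /* 3 with high speed, 2 with high quality */
    s.enableDDRbuff = 0; /* suggested for better performance */
    s.sliceMode = 0;     /* suggested for better performance */
    return s;
}
```

In a real application the same values would be copied into the encoder's own parameter structures before the codec instance is created.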

    The above suggestions are purely based upon my understanding.

     

  • In reply to sujit mahapatro:

    Hi Sujit,

    Thank you for your detailed and thoughtful response.  I do not always need CBR, but will need it if the customer sets the options to be CBR, so I would like to get the performance to the maximum in either CBR or VBR mode.  Also, the bit rate I am using is high enough not to cause frame dropping most of the time in CBR mode.

    I have tried changing the QP parameters such as you suggest, but did not see a change in performance.  Thanks for the advice regarding encQuality; I have changed it.  Regarding enableDDRbuff and sliceMode, both were already set to 0 but I will make sure not to change them in the future.

    Thanks and regards,

    Chris

  • Does TI not take these performance issues seriously?  I have noticed many similar queries on this board and many of them never get answers, or even a single response.

    I hope you understand how frustrating that is.

    Thanks,

    Chris

  • In reply to Chris Richardson77843:

    Hi Chris,

    Sorry we missed this thread. 29.1 ms is the expected number for 1080p on the DM368 (ARM @ 432 MHz, DDR @ 340 MHz). You can also confirm this against the data in the datasheet. We claim 35 fps for encQuality = 3 or HIGH_SPEED mode, which is the same thing you are getting.

    Thanks,

    Veeranna

  • In reply to Veeranna Hanchinal:

    Hi Veeranna,

    Thank you very much for the response.

    I do have another question regarding this issue.  In my real application I can't encode simple black images, so I am also capturing video using the ISIF->IPIPEIF->RSZ->DDR chain.  I have tweaked the RSZ_DMA_RZA register according to some advice I have seen in other threads.  I noticed that the recommended value is 0x20 for larger frame sizes, but using 0x20 leaves less room for the encoder and I get ~32.4 ms per encode.  I have used the value 0x50 to achieve the 29.1 ms numbers above, but this is leading to periodic capture loss.  Do you have any other advice in this area?  If I also enable video streaming, the encode time increases beyond 33 ms, at which point I am dropping frames.  I'd really like to get a solid 30 FPS out of this hardware but it seems very difficult.
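
To make the margins in the post above concrete, here is a minimal sketch of the per-frame time budget at a target frame rate and the headroom left after encoding. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Time available per frame at a target rate, in microseconds (rounded down). */
uint32_t frame_budget_us(uint32_t fps)
{
    assert(fps > 0);
    return 1000000u / fps;
}

/* Headroom left after encoding one frame.
 * A negative result means the encoder cannot keep up and frames will be dropped. */
int32_t headroom_us(uint32_t fps, uint32_t encode_us)
{
    return (int32_t)frame_budget_us(fps) - (int32_t)encode_us;
}
```

At 30 FPS the budget is 33333 us, so an encode time of 29131 us leaves about 4.2 ms of headroom, 32400 us leaves under 1 ms, and anything above 33.3 ms goes negative.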

    Thank you,

    Chris

  • In reply to Chris Richardson77843:

    Hi Chris,

    We have asked the App team to look into your query.

    Thanks,

    Veeranna

  • In reply to Veeranna Hanchinal:

    Can you please check your MASTER PRIORITY?

    1) MASTER Priority: The master priority registers need to be modified
    as below. For details on the meaning of this register, please refer to
    http://focus.ti.com/general/docs/lit/getliterature.tsp?literatureNumber=sprufg5a&fileType=pdf

    Suggested value: MSTPRI0 (0x1c4003C) = 0x440011

    2) DDR timing: The values of the registers below can affect the
    performance of the system. Hence, it is advised to review these
    registers and keep them at their most optimized values. There
    is no specific recommended value, as the value will change depending
    on the DDR used in the product. For details, please refer to:
    http://focus.ti.com/general/docs/lit/getliterature.tsp?literatureNumber=sprufi2&fileType=pdf
    DDR->SDTIMR
    DDR->SDTIMR2
    DDR->SDBCR
    DDR->SDRCR

    3) Resizer registers: Optimized settings for DDR access of the resizer. For
    details on the meaning of these registers, please refer to
    http://focus.ti.com/general/docs/lit/getliterature.tsp?literatureNumber=sprufg8c&fileType=pdf


    0x01C7:0420h (DMA_RZA SDRAM Request Minimum Interval for RZA) = 0x20
    0x01C7:0424h (DMA_RZB SDRAM Request Minimum Interval for RZB) = 0x20


    4) OSD and other related DMA transfer: This should happen on TC1/QUEUE1.
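
For quick experiments with the register values listed above, one option is to poke them from user space through /dev/mem. The following is only a sketch under that assumption: the addresses and values are the ones quoted in this post, the helper names are made up, and in a product these writes would normally live in the kernel driver or boot code instead.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Register addresses as quoted in the post above. */
#define MSTPRI0_ADDR     0x01C4003Cu
#define RSZ_DMA_RZA_ADDR 0x01C70420u
#define RSZ_DMA_RZB_ADDR 0x01C70424u

/* mmap needs a page-aligned base, so split the physical address.
 * page_size must be a power of two. */
uint32_t page_base(uint32_t phys, uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1u)) == 0);
    return phys & ~(page_size - 1u);
}

uint32_t page_offset(uint32_t phys, uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1u)) == 0);
    return phys & (page_size - 1u);
}

/* Write one 32-bit register through /dev/mem (requires root).
 * Returns 0 on success, -1 on failure. */
int write_reg32(uint32_t phys, uint32_t value)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    uint32_t page = (uint32_t)ps;

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    void *map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, (off_t)page_base(phys, page));
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    volatile uint32_t *reg =
        (volatile uint32_t *)((uint8_t *)map + page_offset(phys, page));
    *reg = value;

    munmap(map, page);
    return 0;
}
```

With these helpers, the suggested settings would be applied as write_reg32(MSTPRI0_ADDR, 0x00440011u), write_reg32(RSZ_DMA_RZA_ADDR, 0x20u) and write_reg32(RSZ_DMA_RZB_ADDR, 0x20u), checking each return value.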

  • In reply to Raghu Kudva:

    Hi Raghu,

    Thank you for the response.

    1. We had previously set MSTPRI0 to 0x440022 based on recommendations in a thread last year.  I have tried changing it to 0x440011 but have found performance to be better with 0x440022.  We also set MSTPRI1 to 0x555.

    2. I have gone through the DDR register set up from scratch and recalculated all the fields.  There were some incorrect calculations in SDTIMR which I have since corrected, and this change definitely helped overall throughput.  Thanks!

    3. I have already set RZA_DMA and RZB_DMA.  Though for my testing purposes now I have only been using ResizerA and left ResizerB disabled.

    4. I don't have analog video output enabled for this testing, but am very interested in optimizing its usage of DMA.  We have modified the VPFE to correctly set up the buffers in order to avoid DMA copy for analog output, but is there another case here to worry about?  I also use DMA via DMAI accelerated FrameCopy to perform graphics overlay (which is disabled at the moment) and would be interested to optimize this as well.  However, I am unable to determine where I can set a specific TC to use for any DMA operations.  I understand it will be different for various components, but do you have a starting point or any specific area I can look at?

    Thank you very much for the help and suggestions.

    Chris