
DM368 1080p H.264 single stream not able to reach 30 fps

Hi

I'm encountering a frame rate issue with a DM368 1080p H.264 single stream.

1: There is only a single H.264 stream, no MJPEG and no CVBS. VIDENC1_process() takes 32 to 33 ms per frame, so it is very tough to output the stream at 30 fps. Is this measured time expected on a DM368 board?

My DVSDK: dvsdk_3_10_00_19 H264

Codec lib: H264ENC.version.02.10.00.09

2: If I add CVBS (some DMA copies are then executed), VIDENC1_process() takes 35 to 36 ms. I don't know why the CVBS DMA copy affects encode performance.
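To make the margin concrete, here is a minimal sketch of the frame-budget arithmetic behind these numbers. It assumes a strictly serial loop in which VIDENC1_process() is the only cost per frame; the function names are illustrative and not part of the DVSDK.

```c
#include <assert.h>

/* Time available per frame, in milliseconds, at a given frame rate. */
double frame_budget_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Highest frame rate a strictly serial loop can sustain when each encode
 * call takes encode_ms (assumption: capture and streaming overhead ignored). */
double max_fps(double encode_ms)
{
    assert(encode_ms > 0.0);
    return 1000.0 / encode_ms;
}

/* Headroom left in the frame period; negative means the loop cannot keep up. */
double headroom_ms(double fps, double encode_ms)
{
    return frame_budget_ms(fps) - encode_ms;
}
```

At 30 fps the budget is about 33.3 ms, so a 33 ms encode leaves roughly 0.3 ms for everything else, and a 36 ms encode caps the rate at roughly 27.8 fps.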

 

  • Can you share your encoder settings/configuration?

  • Ritesh,

    Thanks for your timely reply. I appreciate it.

    1080settings-configuration.zip
  • Can you make the changes below in the config and check:

    meAlgo = 0;

    transform8x8FlagIntraFrame = 0;

    resetHDVICPeveryFrame = 0;

  • Hi Ritesh

    Thanks.

    I made some modifications based on your suggestion.

    However, the VIDENC1_process() time is still about 33 ms.

  • Codec performance (the time taken by VIDENC1_process()) varies with DDR loading. When you add CVBS, it might be delaying the codec's DDR transfers, hence the performance drop you see. I'll get back to you on this at the earliest. The changes I suggested should have improved performance by 0.2 to 0.3 ms.

  • Thanks for your clarification.

    Now, my critical question is whether my 32 to 33 ms time is right for DM368 1080p encoding.

    If it is right, that means DM368 1080p performance has a problem; otherwise I'll continue to check on my side, including software and hardware.
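    Since the question is whether the measured time itself can be trusted, here is a minimal sketch of one way to take the measurement: wall-clock time around the call, accumulated over many frames. It assumes a POSIX system with clock_gettime(); `encode_fn`, `timing_stats` and `dummy_encode` are hypothetical names, and the callback stands in for the real VIDENC1_process() call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback that wraps one encode call, e.g. VIDENC1_process(). */
typedef void (*encode_fn)(void *ctx);

/* Running min/max/sum of per-frame times in milliseconds. */
typedef struct {
    double min_ms;
    double max_ms;
    double sum_ms;
    unsigned count;
} timing_stats;

/* Wall-clock duration of one call, in milliseconds. Wall time is used
 * because the caller may sleep while the codec hardware is working. */
double time_call_ms(encode_fn fn, void *ctx)
{
    struct timespec t0, t1;
    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1000.0 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}

/* Fold one sample into the running statistics. */
void stats_add(timing_stats *s, double ms)
{
    assert(s != NULL);
    if (s->count == 0 || ms < s->min_ms) s->min_ms = ms;
    if (s->count == 0 || ms > s->max_ms) s->max_ms = ms;
    s->sum_ms += ms;
    s->count++;
}

/* Mean of all samples added so far. */
double stats_mean(const timing_stats *s)
{
    assert(s != NULL && s->count > 0);
    return s->sum_ms / s->count;
}

/* Stand-in for the real encode call so the sketch builds off-target. */
void dummy_encode(void *ctx) { (void)ctx; }
```

    Reporting min, max and mean over a few hundred frames, rather than a single sample, makes it easier to compare against figures such as the ones quoted in this thread.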

  • Hi William,

    Ideally, codec library 2.10.00.09 should clock ~30.5 ms on a DM368 board (ARM @ 432 MHz, DDR @ 340 MHz).

    I had a look at your encoder settings. They look fine, and I hope you have taken in the changes mentioned by Ritesh.

    Also, make sure that there is no frame copy happening with CVBS. CVBS can be implemented without a frame copy by changing the pipeline.

    But first we need to achieve the desired performance with CVBS off and MJPEG off. Can you confirm the two things below:

    1. Is the MJPEG encoder being created? If so, don't create it. Create only the H264 encoder and measure the performance.

    2. Are the frequency settings the same as mentioned above?

    Rgds, mahant

     


  • Hi Mahant

    I really appreciate it. Your information is very useful to me.

    I made the changes mentioned by Ritesh. There is no obvious improvement. I will try your suggestions as well.

    1. Is the MJPEG encoder being created? If so, don't create it. Create only the H264 encoder and measure the performance.

    William: MJPEG is created. I'll disable it and measure the performance.

    2. Are the frequency settings the same as mentioned above?

    William: In my evaluation, all settings are the same and are set only at H264 create time. There is no dynamic change.

    Either way, I'll let you know.

  • In addition, my CVBS is done by DMA, not by a frame copy.

    When the CVBS DMA is executed, it makes the encoder take about 3 ms more.

  • William,

    As Mahant said, you should avoid doing any frame copy (even if it is DMA based) on DM36x. CVBS can be implemented without a frame copy. A frame copy introduces 2 x D1 x 30 frames of extra DDR bandwidth per second, which affects codec performance. The resizer can output D1 resolution, which can be read directly by the OSD. This is the more optimized way. It has been implemented in the TI reference IPNC, and we have seen DM368 giving 30 fps with CVBS.
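    The "2 x D1 x 30" figure can be worked out as below. This is a rough sketch that assumes D1 means 720x480 (NTSC) and that the copy costs one DDR read plus one DDR write per frame; the bits-per-pixel value depends on the buffer format actually used (12 for YUV 4:2:0, 16 for YUV 4:2:2), which the thread does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one frame in bytes for a packed or planar YUV format. */
uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    return (uint64_t)width * height * bits_per_pixel / 8u;
}

/* Extra DDR traffic of a per-frame copy: one read plus one write of each
 * frame, fps times per second. */
uint64_t copy_bandwidth_bytes_per_sec(uint32_t width, uint32_t height,
                                      uint32_t bits_per_pixel, uint32_t fps)
{
    return 2u * frame_bytes(width, height, bits_per_pixel) * fps;
}
```

    Under these assumptions the copy adds about 31 MB/s of DDR traffic for 4:2:0 data and about 41 MB/s for 4:2:2 data, all of it competing with the codec's own DDR transfers.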

    Also, there are a few system settings which can improve the overall performance. Please see:

    1)       DDR timing: The values of the registers below can affect the performance of the system. Hence, it is advised to review these registers and keep them at their most optimized values. There is no specific recommended value, as the value depends on the DDR used in the product. For details, please refer to: http://focus.ti.com/general/docs/lit/getliterature.tsp?literatureNumber=sprufi2&fileType=pdf

     

    DDR->SDTIMR

    DDR->SDTIMR2

    DDR->SDBCR

    DDR->SDRCR

     

    2)       Master priority: The master priority register needs to be modified as below. For details on the meaning of this register, please refer to http://focus.ti.com/general/docs/lit/getliterature.tsp?literatureNumber=sprufg5a&fileType=pdf

     

    MSTPRI0 : 0x1c4003C = 0x440022

     

    3)       Resizer registers: Optimized settings for the resizer's DDR access. For details on the meaning of these registers, please refer to http://focus.ti.com/general/docs/lit/getliterature.tsp?literatureNumber=sprufg8c&fileType=pdf

     

    0x01C7:0420h (DMA_RZA SDRAM Request Minimum Interval for RZA) = 0x40

    0x01C7:0424h (DMA_RZB SDRAM Request Minimum Interval for RZB) = 0x40

     

     

    4)       OSD related DMA transfer: This should happen on TC1/QUEUE1.
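    The fixed register values from items 2 and 3 can be kept in one table and applied through a single write routine, as in the sketch below. The addresses and values are the ones listed above; the DDR timing registers are left out because their values depend on the DDR part. `reg_write_fn`, `apply_tuning` and `log_write` are hypothetical names, and the actual write mechanism (for example a memory-mapped I/O write in the kernel or bootloader) is assumed to be supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t addr;
    uint32_t value;
} reg_setting;

/* Values quoted in this thread; DDR timing registers are board specific. */
static const reg_setting k_tuning[] = {
    { "MSTPRI0", 0x01C4003Cu, 0x00440022u },
    { "DMA_RZA", 0x01C70420u, 0x00000040u },
    { "DMA_RZB", 0x01C70424u, 0x00000040u },
};

/* Hypothetical hook: how a 32-bit register write is actually performed. */
typedef void (*reg_write_fn)(uint32_t addr, uint32_t value, void *ctx);

/* Apply every entry of the table; returns the number of writes issued. */
size_t apply_tuning(reg_write_fn write_reg, void *ctx)
{
    size_t n = sizeof k_tuning / sizeof k_tuning[0];
    size_t i;
    assert(write_reg != NULL);
    for (i = 0; i < n; i++)
        write_reg(k_tuning[i].addr, k_tuning[i].value, ctx);
    return n;
}

/* Dry-run writer that records the writes instead of touching hardware. */
typedef struct {
    uint32_t addr[8];
    uint32_t value[8];
    size_t count;
} reg_log;

void log_write(uint32_t addr, uint32_t value, void *ctx)
{
    reg_log *log = (reg_log *)ctx;
    assert(log != NULL && log->count < 8);
    log->addr[log->count] = addr;
    log->value[log->count] = value;
    log->count++;
}
```

    Routing the writes through a callback lets the table be checked with the dry-run writer before it is pointed at real hardware.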


    regards

    Yashwant

  • Yashwant,

    I'm very glad to receive your very useful and detailed information. I'll apply these optimizations next week and let you know.

    Many thanks.

  • Hi,

    DM368 encoding capability is still troubling me.

    I made some improvements according to your suggestions above. My test results are as follows:

    1: DM368 encoding, VIDENC1_process() time:

    (1) Single H264 encoding + CVBS (DMA): 35 to 36 ms

    (2) Single H264 encoding: 31 to 32 ms

    2: DM368_ex encoding, VIDENC1_process() time:

    (1) Single H264 encoding + CVBS (DMA): 31 to 32 ms

    (2) Single H264 encoding: 29 ms

    I have several questions:

    1: Do you think it is possible to improve my H264 encoding performance? If so, how?

    2: Do you have DM368_ex encoding test data? What is the expected VIDENC1_process() time?

    3: Is it possible to support 1080P (H264) + VGA (MJPEG) on DM368_ex?

  • Hi William,

    Let me answer your questions first:

    1: Do you think it is possible to improve my H264 encoding performance? If so, how?

    >> If you are using the ver 2.0 or ver 2.1 codec, the above performance data for single H.264 encoding looks fine. The bigger problem is the performance with CVBS, and this is because you are using DMA for that. Why can't you avoid this DMA, as we suggested earlier?

    2: Do you have DM368_ex encoding test data? What is the expected VIDENC1_process() time?

    >> On DM368_ex, H.264 encode (new ver 2.2) takes 27.4 ms to encode one 1080P frame with streaming and CVBS on. Please note that CVBS in this case does not use DMA. The display directly picks up the data from the DDR location where the resizer dumps it. We will shortly be uploading the ver 2.2 codec to the codecs download page. In case you want it early, please ask your local TI contact.

    3: Is it possible to support 1080P (H264) + VGA (MJPEG) on DM368_ex?

    >> Yes, it is very much possible to have 1080P (H264) + VGA (MJPEG) on DM368_ex. We have this use case running on the TI IPNC.

     

    regards

    Yashwant

  • Hi Yashwant,

    Thanks for your confirmation and answers.

    1. I just got the 2.2 codec from our local TI FAE, and the 1080P P-frame encode time (approximately 27 to 28 ms, about 27.6 ms) is almost consistent with your results. But an I frame takes approximately 33 ms, which is a bigger problem for implementing real-time 1080P (H264, 30 fps) + VGA (MJPEG, 30 fps).

    2. We are considering your CVBS suggestion.

  • Hi Yashwant,

    Another question:

    I'm implementing 1080P H264 + VGA MJPEG encoding.

    With the 2.2 codec, when doing 1080p H264 single stream encoding (still keeping the dual stream capture_get), the H264 encode time is 27 to 28 ms.

    However, when doing H264 and MJPEG dual stream encoding, the H264 encode time is 32 to 33 ms. My H264 and MJPEG encoding is implemented serially.

    I don't know why MJPEG affects H264 encoding performance. Can you give me some suggestions?

  • Hi William,

    - Regarding the loss in performance when running with JPEG, can you please check the value of "resetHDVICPeveryFrame" in the dynamic parameters? For the JPEG + H.264 combo, it is recommended to set it to 2. Please see page 4-50 of the H.264 encoder user guide for details.

    - Regarding I-frame performance: if you have IntraFrameInterval set to 30, the slightly poorer performance of the I frame should not affect the overall performance. You will see the effect only when IntraFrameInterval is very low, like 2 or 5, which is generally not the case.
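    The I-frame point can be checked with a short calculation. The sketch below averages the encode time over one intra period, using the figures William reported (about 33 ms per I frame, about 27.6 ms per P frame); the function name is illustrative, and the result only helps in practice if the pipeline has enough buffering to absorb the single slow frame.

```c
#include <assert.h>

/* Mean encode time per frame over one intra period: one I frame followed
 * by (intra_interval - 1) P frames. */
double gop_average_ms(unsigned intra_interval, double i_frame_ms, double p_frame_ms)
{
    assert(intra_interval >= 1);
    return (i_frame_ms + (intra_interval - 1) * p_frame_ms) / intra_interval;
}
```

    With IntraFrameInterval = 30 the average comes to about 27.8 ms, barely above the P-frame time, while with an interval of 2 it rises to about 30.3 ms.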

    regards

    Yashwant

  • Hi Yashwant,

    Thanks for your very useful information.

    It works fine in our system with H264 + MJPEG VGA at 30 fps without CVBS,

    but there are still several problems blocking us.

    Our requirement is H264 1080P + MJPEG VGA + CVBS.

    The resizer can output D1 resolution, which can be read directly by the OSD. The data can be sent to CVBS for display or to the MJPEG buffer for encoding.

    However, how can it be sent to CVBS and MJPEG simultaneously?

    If it's possible, how can I do it without DMA or a frame copy, in other words, without affecting H264 encoding performance?

  • Do you have any ideas about the questions above?

  • William,

    Are you saying that your MJPEG and CVBS resolutions are different? If yes, then this optimization is not possible. To feed the same data to both the OSD and MJPEG, both need to have the same resolution.

    regards

    Yashwant

  • Yashwant,

    Thanks for your reply.

    My MJPEG and CVBS resolutions are the same.

    My older solution is:

    1: The first capture YUV data is fed to H264 for encoding.

    2: The second capture YUV data is sent to CVBS (DMA copy) and to MJPEG for encoding.

    Based on the clarification above, the CVBS DMA copy affects H264 encoder performance.

    Now I want to adopt your suggestion, in which CVBS reads the data directly from the OSD. But how does MJPEG get its source data? From the OSD as well? How do I do it?

  • Hi William,

    The image processing pipeline in the DM36x IPNC, in its simplest form, looks as shown below:

    Capture -> Encode -> Stream
        |
       \/
    Display

    The display path is branched off from the Capture node.
    In the Display node, we have a set of display buffers into which we copy the frames from the capture buffer.
    The display buffers' addresses are used for switching the display buffers in the VENC ISR.

    You can avoid this frame copy if you use the capture buffers' addresses directly for switching the display buffers in the VENC ISR.
    Here "directly" means that the Display node doesn't hold the capture buffers; it doesn't call the 'getFullBuf' and 'putEmptyBuf' routines.
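    A minimal sketch of that idea follows. It is not the IPNC code: `display_ctx`, `capture_frame_done` and `venc_isr_switch` are hypothetical names, and `display_addr_reg` is a plain variable standing in for whatever register holds the display window address. The only point shown is that the ISR programs the address of the newest capture buffer instead of the address of a separately filled display buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CAPTURE_BUFS 3

typedef struct {
    uint32_t capture_phys[NUM_CAPTURE_BUFS]; /* physical addresses of capture buffers */
    volatile int latest;                     /* newest complete frame, -1 if none yet */
    volatile uint32_t display_addr_reg;      /* stand-in for the display address register */
} display_ctx;

/* Called by the capture side when buffer idx holds a complete frame. */
void capture_frame_done(display_ctx *d, int idx)
{
    assert(d != NULL && idx >= 0 && idx < NUM_CAPTURE_BUFS);
    d->latest = idx;
}

/* Called from the VENC ISR: point the display at the newest capture buffer.
 * No pixels are copied; only an address is written. */
void venc_isr_switch(display_ctx *d)
{
    assert(d != NULL);
    if (d->latest >= 0)
        d->display_addr_reg = d->capture_phys[d->latest];
}
```

    One design consideration with this approach is that the capture side should not refill a buffer while it is still being scanned out, so the number of capture buffers and their recycling order need to account for the display.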

    regards,
    Anand