[DM8168] Video Display Latency

Other Parts Discussed in Thread: TVP5158

DM Champ,

I'm working with a customer who is using 2x DM816x for encode, transport and display. The application will be used for robotic control, so glass-to-glass latency is critical. The following latency measurements have been very helpful in determining the latency to expect in the end system. All the capture, encode and decode latencies are appealing except for the final Scale/Chroma Con. Display Delay (T7–T5). This time is roughly double the capture and encode latency. Can you kindly explain the reason for this large latency, with a breakdown? Is there a way to bypass or accelerate this stage?

DM8168 Latency Measurements:
  • Hi Philipp

    I think the shift issue is caused by the startX and startY offsets generated by the decoder not being handled in the display link. The decoder allocates a slightly bigger buffer than the actual picture size and uses the remainder as a scratch area, so a 32-pixel-wide vertical strip at the left of the image and a 24-pixel-high horizontal strip at the top need to be removed when displaying. This was handled in SWMS; now that you bypass SWMS, the same logic needs to be added at the input side of the display link or the output side of the decoder link. It is mainly a matter of buffer pointer manipulation:

    new buff address = original address + startY * pitch + startX

    where startX = 32 and startY = 24.

    This should be done for both the luma and chroma buffer pointers.

    Please refer to the SWMS function SwMsLink_drvModifyFramePointer().

    regards, shiju
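
The pointer arithmetic described above can be sketched in plain C. This is a minimal illustration, not RDK code: the helper names are hypothetical, samples are assumed to be 8 bits wide, and the chroma plane is assumed to be 4:2:0 semi-planar (interleaved CbCr), which is why only its row count is halved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an RDK function): byte offset from the start of a
 * decoder output luma plane to the first visible pixel.
 * Assumes one byte per luma sample. */
static size_t luma_offset(uint32_t pitch, uint32_t startX, uint32_t startY)
{
    return (size_t)startY * pitch + startX;
}

/* Assumption: 4:2:0 semi-planar chroma (interleaved CbCr). The chroma plane
 * has half as many rows as luma, but each row still covers the same number
 * of bytes per line, so only the vertical offset is halved. */
static size_t chroma_offset(uint32_t pitch, uint32_t startX, uint32_t startY)
{
    return (size_t)(startY / 2) * pitch + startX;
}

/* Returns the plane pointer advanced past the decoder's scratch area. */
static uint8_t *skip_scratch_area(uint8_t *plane, size_t offset)
{
    return plane + offset;
}
```

With startX = 32 and startY = 24, a pitch of 1984 bytes (an example value only) gives a luma offset of 47648 bytes and a chroma offset of 23840 bytes.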

  • Ah yes, that looks promising. I will investigate this. Thanks Shiju!

  • Your advice was dead-on Shiju! Thanks!

    Unfortunately, I was unable to mirror the mosaicer code properly, as the display link seems to be missing the startX, startY, etc. information. Or did I miss something there?

    I ended up with the following patch to remove the extra data:

    diff --git a/mcfw/src_bios6/links_m3vpss/display/displayLink_drv.c b/mcfw/src_bios6/links_m3vpss/display/displayLink_drv.c
    index 82e60c4..2ad5ace 100755
    --- a/mcfw/src_bios6/links_m3vpss/display/displayLink_drv.c
    +++ b/mcfw/src_bios6/links_m3vpss/display/displayLink_drv.c
    @@ -1553,21 +1553,36 @@ Int32 DisplayLink_drvProcessData(DisplayLink_Obj * pObj)
                 {
                     Bool frameReject;
                     UInt32 pitch0,pitch1;
    +                Int32 offset0, offset1;
    +                Int32 bytesPerPixel = 1;

                     pitch0 = pObj->displayFormat.pitch[0];
                     pitch1 = pObj->displayFormat.pitch[1];
    +
                     if (DisplayLink_drvDoInputPitchDoubling(pObj))
                     {
                         pitch0 /= 2;
                         pitch1 /= 2;
                     }

    +                // The swms used to take care of adjusting the memory pointers.
    +                // Now we have to do it ourselves.
    +                // TODO Clean up this hack
    +                offset0 = pitch0 * 24 + 32 * bytesPerPixel;
    +                offset1 = pitch1 * 24 / 2 + 32 * bytesPerPixel;
    +
                     UTILS_assert(DISPLAY_LINK_BLANK_FRAME_CHANNEL_NUM !=
                                  pFrame->channelNum);
    +
                     pFrame->addr[1][0] =
                         (UInt8 *) pFrame->addr[0][0] + pitch0;
                     pFrame->addr[1][1] =
                         (UInt8 *) pFrame->addr[0][1] + pitch1;
    +
    +                // Adjust the image address pointers
    +                pFrame->addr[0][0] = (UInt8 *) pFrame->addr[0][0] + offset0;
    +                pFrame->addr[0][1] = (UInt8 *) pFrame->addr[0][1] + offset1;
    +
         #ifdef SYSTEM_DEBUG_DISPLAY_RT
                     Vps_printf(" %d: DISPLAY: Queue %d frames\n", Utils_getCurTimeInMsec(),
                                displayFrameList.numFrames);

  • Just to summarize the success of this thread:

    The multich_vdec_vdis chain used to look like this:
    file -> decoder -> mpSclr -> dup -> swms -> display (without ISR queueing)

    The chain now looks like this:
    file -> decoder -> display (with ISR queueing, and 420 input)

    Now I have the latencies between injecting a frame into the decoder and seeing the pixel on the screen as follows:

    (@30fps feed on a 60Hz screen)

          Top-left of screen   Center of screen   Bottom-right of screen
    Min   31ms                 38ms               45ms
    Max   49ms                 55ms               63ms

    To the best of my knowledge, the difference between a column's Min and Max is the time between screen refreshes (~17ms at 60Hz).
    The difference between the top-left and the bottom-right of the screen is the time it takes the screen to update all its pixels (~14ms, the scan time?).
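
The arithmetic behind these two observations can be checked with a short sketch. The numbers are the measurements quoted above; the function names are only illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Measured latencies in ms: top-left, center, bottom-right of the screen. */
static const int min_ms[3] = {31, 38, 45};
static const int max_ms[3] = {49, 55, 63};

/* Time between two screen refreshes, in milliseconds. */
static double refresh_period_ms(double refresh_hz)
{
    return 1000.0 / refresh_hz;
}

/* Spread between worst and best case at one screen position. */
static int jitter_ms(int pos)
{
    return max_ms[pos] - min_ms[pos];
}

/* Bottom-right minus top-left within one row of the table. */
static int scan_time_ms(const int row[3])
{
    return row[2] - row[0];
}

/* Prints the derived figures next to the nominal refresh period. */
static void print_summary(void)
{
    printf("refresh period at 60 Hz: %.1f ms\n", refresh_period_ms(60.0));
    for (int pos = 0; pos < 3; pos++)
        printf("position %d: Max - Min = %d ms\n", pos, jitter_ms(pos));
    printf("scan time (Min row): %d ms\n", scan_time_ms(min_ms));
    printf("scan time (Max row): %d ms\n", scan_time_ms(max_ms));
}
```

The Max - Min spread comes out at 17 to 18 ms for every position, close to the 16.7 ms refresh period, and both rows give a 14 ms difference between the top-left and bottom-right corners.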

  • Shiju,

    When you say

    Shiju Sivasankaran said:

    In RDK 3.5,  420 path display is supported for both 816x and  810x, but not supported for 814x.  

    I assume that it is not due to a hardware limitation for 814x, but rather just a software issue? What would it take to support 420 path display for 814x?

    Regards,

    John Whittington

  • John

    Yes, you are correct. There is no HW limitation; this is only a SW limitation. The next RDK release, 4.0, will support 420 display on all three platforms, including DM814x.

    regards, shiju

  • Philipp

    Great! thanks for the update.

    Please note that these changes in the display link work only when the decode link is the previous link. Only the decoder populates these values as 32 and 24; all other links set them to 0.

    As your chain is working fine now, maybe you can improve this fix by taking startX and startY from the input frame's appData rather than hard-coding them.

    The input frame has a field "appData" of type System_FrameInfo, and this data structure has fields such as

    pFrameInfo->rtChInfo.width

    pFrameInfo->rtChInfo.height

    pFrameInfo->rtChInfo.startX

    pFrameInfo->rtChInfo.startY

    pFrameInfo->rtChInfo.pitch[]

    These values are updated and sent along with every frame, so you can use them directly to modify the frame pointer.

    This is only a suggestion; please ignore it if you are comfortable with your current modification.

    regards, shiju
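
A rough sketch of this suggestion follows. The struct types below are simplified stand-ins invented for illustration; they model only the fields named in the post, and the real System_FrameInfo and frame definitions in the RDK may be laid out differently. The chroma handling again assumes a 4:2:0 semi-planar buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the RDK types (assumption: the real ones differ). */
typedef struct {
    uint32_t width, height;
    uint32_t startX, startY;
    uint32_t pitch[2];
} RtChInfoSketch;

typedef struct {
    RtChInfoSketch rtChInfo;
} FrameInfoSketch;

typedef struct {
    uint8_t *addr[2];   /* [0] = luma plane, [1] = chroma plane */
    void    *appData;   /* points to a FrameInfoSketch, or NULL */
} FrameSketch;

/* Advances both plane pointers past the decoder's scratch area using the
 * per-frame values instead of hard-coded 32 and 24.
 * Returns 0 on success, -1 if the frame carries no per-frame info. */
static int adjust_frame_pointers(FrameSketch *frame)
{
    const RtChInfoSketch *ch;

    if (frame == NULL || frame->appData == NULL)
        return -1;

    ch = &((const FrameInfoSketch *)frame->appData)->rtChInfo;

    /* Luma: skip startY full rows, then startX bytes. */
    frame->addr[0] += (size_t)ch->startY * ch->pitch[0] + ch->startX;
    /* Chroma (4:2:0 semi-planar assumed): half as many rows to skip. */
    frame->addr[1] += (size_t)(ch->startY / 2) * ch->pitch[1] + ch->startX;
    return 0;
}
```

Because links other than the decoder set startX and startY to 0, the same function leaves their frames untouched, which removes the dependency on the decoder being the previous link.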

  • Hi Shiju,

    That's a great suggestion, thank you. I was hoping for parameters like that.

    I will modify the code to use the supplied parameters.

    Thanks again,
    Philipp

  • hi, Mishael

    I am using the DM814x for cap+sclr+nsf+enc and want to know the total latency. I see in this post that you measured the total latency using DVRRDK. My DVRRDK is version 3.0, and I think the measurement method should be similar. Could you tell me in detail how to measure the total latency? I don't know how to test it at all...


    regards,

    oguri 

  • hi,Shiju

    I'm using DVRRDK on 814x for cap+sclr+enc and want to know the total latency. Is there a clear way to get it? My print info is below:

     [m3vpss ]
     [m3vpss ]  *** Capture Driver Advanced Statistics ***
     [m3vpss ]
     [m3vpss ]  VIP Parser Reset Count : 0
     [m3vpss ]
     [m3vpss ]      |   Total    Even     Odd  Total  Even   Odd  Min /  Max  Min /  Max Dropped Fid Repeat Frame Error Y/C
     [m3vpss ]  CH  |  Fields  Fields  Fields    FPS   FPS   FPS       Width      Height  Fields      Count (Desc Error Y/C)
     [m3vpss ]  ------------------------------------------------------------------------------------------------------------
     [m3vpss ]  000 |  101869  101869       0     72    72     0 1922 / 1922 1080 / 1080      11          0 0/0 (0/0)
     [m3vpss ]
     [m3vpss ]  VIP Capture Port 0 | DescMissMatch1 = 0, DescMissMatch2 = 0 , DescMissMatch3 = 0
     [m3vpss ]
     [m3vpss ]  *** Capture List Manager Advanced Statistics ***
     [m3vpss ]
     [m3vpss ]  List Post Count        : 176837
     [m3vpss ]  List Stall Count       : 0
     [m3vpss ]  List Post Time (ms)    : Max = 0, Min = 0, Avg = 0, Total = 0
     [m3vpss ]  INTC Timeout Count     : (0, 0) (Min timeout value = 987, 991)
     [m3vpss ]  Descriptor miss found count : 0
     [m3vpss ]
     [m3vpss ]
     [m3vpss ]  VIP and VPDMA registers,
     [m3vpss ]  VIP0 : FIQ_STATUS  : 0x4810551c = 0x00004400
     [m3vpss ]  VIP1 : FIQ_STATUS  : 0x48105a1c = 0x00000000
     [m3vpss ]  VPDMA: LIST_BUSY   : 0x4810d00c = 0x00020000
     [m3vpss ]
     [m3vpss ]
     [m3vpss ]  1480578: CAPTURE: Fields = 101858 (fps = 72, CPU Load = 0)
     [m3vpss ]  1480578: CAPTURE: Num Resets = 0 (Avg 0 ms per reset)
     [m3vpss ]  1480579: SYSTEM  : FREE SPACE : System Heap      = 183328 B, Mbx = 10238 msgs)
     [m3vpss ]  1480579: SYSTEM  : FREE SPACE : SR0 Heap         = 4450048 B (4 MB)
     [m3vpss ]  1480579: SYSTEM  : FREE SPACE : Frame Buffer     = 135377792 B (129 MB)
     [m3vpss ]  1480579: SYSTEM  : FREE SPACE : Bitstream Buffer = 86836096 B (82 MB)
     [m3vpss ]
     [m3vpss ]  *** [Scalar0   ] SCLR Statistics ***
     [m3vpss ]
     [m3vpss ]  Elasped Time           : 1415 secs
     [m3vpss ]  Total Fields Processed : 101858
     [m3vpss ]  Total Fields FPS       : 249 FPS
     [m3vpss ]
     [m3vpss ]
     [m3vpss ]  CH  | In Recv In Reject In Process Out Skip Out User Out Latency
     [m3vpss ]  Num | FPS     FPS       FPS        FPS    FPS   Skip FPS Min / Max
     [m3vpss ]  ---------------------------------------------------------------------
     [m3vpss ]    0 |      71         0         71  71        0        0   4   9
     [m3vpss ]
     [m3vpss ]
     [m3vpss ]  *** [NSF0] NSF Statistics ***
     [m3vpss ]
     [m3vpss ]  Elasped Time           : 1415 secs
     [m3vpss ]  Total Fields Processed : 101858
     [m3vpss ]  Total Fields FPS       : 368 FPS
     [m3vpss ]
     [m3vpss ]
     [m3vpss ]  CH  | In Recv In Reject In Process Out User Out Out
     [m3vpss ]  Num | FPS     FPS       FPS        FPS Skip FPS Skip FPS
     [m3vpss ]  ------------------------------------------------
     [m3vpss ]    0 |      71         0         71  71        0        0
     [m3vpss ]
     [m3vpss ]  1480583: SYSTEM  : FREE SPACE : Tiler Buffer     = 55 B (0 MB)  - TILER OFF
     [m3vpss ]  1482251: DISPLAY: HDDAC(BP0) : 71 fps, Latency (Min / Max) = ( 0 / 14 ), Callback Interval (Min / Max) = ( 13 / 14 )
     !!!
     [m3vpss ]  1482251: DISPLAY: UNDERFLOW COUNT: HDMI(BP0) 102009, HDDAC(BP0) 0, DVO2(BP1) 102009, SDDAC(SEC1) 189995
     [m3vpss ]  1482251: SYSTEM  : FREE SPACE : System Heap      = 183328 B, Mbx = 10239 msgs)
     [m3vpss ]  1482251: SYSTEM  : FREE SPACE : SR0 Heap         = 4450048 B (4 MB)
     [m3vpss ]  1482251: SYSTEM  : FREE SPACE : Frame Buffer     = 135377792 B (129 MB)
     [m3vpss ]  1482252: SYSTEM  : FREE SPACE : Bitstream Buffer = 86836096 B (82 MB)
     [m3vpss ]  1482252: SYSTEM  : FREE SPACE : Tiler Buffer     = 55 B (0 MB)  - TILER OFF
    Failed video writing buffer
     [m3video]      1485587: HDVICP-ID:0
     [m3video] All percentage figures are based off totalElapsedTime
     [m3video]               totalAcquire2wait :0 %
     [m3video]               totalWait2Isr :28 %
     [m3video]               totalIsr2Done :0 %
     [m3video]               totalWait2Done :28 %
     [m3video]               totalDone2Release :0 %
     [m3video]               totalAcquire2Release :28 %
     [m3video]               totalAcq2acqDelay :71 %
     [m3video]               totalElapsedTime in msec : 1419652
     [m3video]               numAccessCnt: 1226604
     [m3video]              IVA-FPS :     864
     [m3video]              Average time spent per frame in microsec:     323
     [m3video]      1485588: HDVICP-ID:1
     [m3video] All percentage figures are based off totalElapsedTime
     [m3video]               totalAcquire2wait :0 %
     [m3video]               totalWait2Isr :0 %
     [m3video]               totalIsr2Done :0 %
     [m3video]               totalWait2Done :0 %
     [m3video]               totalDone2Release :0 %
     [m3video]               totalAcquire2Release :0 %
     [m3video]               totalAcq2acqDelay :0 %
     [m3video]               totalElapsedTime in msec :       0
     [m3video]               numAccessCnt:       0
     [m3video]              IVA-FPS :       0
     [m3video]              Average time spent per frame in microsec:       0
     [m3video]      1485589: HDVICP-ID:2
     [m3video] All percentage figures are based off totalElapsedTime
     [m3video]               totalAcquire2wait :0 %
     [m3video]               totalWait2Isr :0 %
     [m3video]               totalIsr2Done :0 %
     [m3video]               totalWait2Done :0 %
     [m3video]               totalDone2Release :0 %
     [m3video]               totalAcquire2Release :0 %
     [m3video]               totalAcq2acqDelay :0 %
     [m3video]               totalElapsedTime in msec :       0
     [m3video]               numAccessCnt:       0
     [m3video]              IVA-FPS :       0
     [m3video]              Average time spent per frame in microsec:       0
     [m3video]
     [m3video]  *** ENCODE Statistics ***
     [m3video]
     [m3video]  Elasped Time           : 1419 secs
     [m3video]
     [m3video]
     [m3video]  CH  | In Recv In Skip In User  Out Latency
     [m3video]  Num | FPS     FPS     Skip FPS FPS Min / Max
     [m3video]  --------------------------------------------
     [m3video]    0 |      72       0        0  72  12 /  20
     [m3video]
     [m3video] Multi Channel Encode Average Submit Batch Size
     [m3video] Max Submit Batch Size : 24
     [m3video] IVAHD_0 Average Batch Size : 1
     [m3video] IVAHD_0 Max achieved Batch Size : 1
     [m3video]
     [m3video] Multi Channel Encode Batch break Stats
     [m3video] Total Number of Batches created: 102213
     [m3video] All numbers are based off total number of Batches created
     [m3video]       Batch breaks due to batch sizeexceeding limit: 0 %
     [m3video]       Batch breaks due to ReqObj Que being empty: 100 %
     [m3video]       Batch breaks due to changed resolution class: 0 %
     [m3video]       Batch breaks due to interlace and progressivecontent mix: 0 %
     [m3video]       Batch breaks due to channel repeat: 0 %
     [m3video]       Batch breaks due to different codec: 0 %
     [m3vpss ]
     [m3vpss ]  1487259: LOAD: CPU: 14.0% HWI: 2.6%, SWI:1.7%
     [m3vpss ]
     [m3vpss ]  1487259: LOAD: TSK: IPC_OUT_M30         : 1.8%
     [m3vpss ]  1487259: LOAD: TSK: CAPTURE             : 0.8%
     [m3vpss ]  1487259: LOAD: TSK: NSF0                : 1.4%
     [m3vpss ]  1487260: LOAD: TSK: DISPLAY0            : 1.1%
     [m3vpss ]  1487260: LOAD: TSK: DUP0                : 0.4%
     [m3vpss ]  1487260: LOAD: TSK: Scalar0             : 1.8%
     [m3vpss ]  1487260: LOAD: TSK: MISC                : 2.4%
     [m3vpss ]
     [m3video]
     [m3video]  1487678: LOAD: CPU: 8.3% HWI: 1.0%, SWI:1.0%
     [m3video]
     [m3video]  1487679: LOAD: TSK: IPC_IN_M30          : 0.6%
     [m3video]  1487679: LOAD: TSK: IPC_BITS_OUT0       : 1.0%
     [m3video]  1487679: LOAD: TSK: ENC0                : 1.8%
     [m3video]  1487679: LOAD: TSK: ENC_PROCESS_TSK_0   : 2.5%
     [m3video]  1487679: LOAD: TSK: MISC                : 0.4%
     [m3video]
     [c6xdsp ]
     [c6xdsp ]  1150785: LOAD: CPU: 0.2% HWI: 0.0%, SWI:0.0%
     [c6xdsp ]
     [c6xdsp ]  1150785: LOAD: TSK: MISC                : 0.2%
     [c6xdsp ]
     *** Encode Bitstream Received Statistics ***
     Elased time = 1707.2 secs
     CH | Bitrate (Kbps) | FPS | Key-frame FPS | Width (max/min) | Height (max/min) | Latency (max/min)
     --------------------------------------------------------------------------------------------------
      0 |        3824.81 | 60.0 |           2.0 |   720 /    720 |    576 /    576  |     20 /     12

    regards,
    oguri
  • Hello

    DVR RDK 3.0 does not have proper hooks to measure the end-to-end latency. Please use RDK 3.5, where capture inserts a timestamp into each frame as it is captured. IpcBitsIn (HLOS) is the A8-side link that receives the encoded frames. There you can call the AVSync API Avsync_getWallTime(), which returns the current system time.

    The latency is then

    latency = ((UInt32)(Avsync_getWallTime()) - pInFrame->timeStamp);

    regards, shiju
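
As a sketch of how this subtraction could be turned into running statistics: the code below is standalone C with hypothetical names, and it assumes the wall time and the frame timestamp are 32-bit values in the same unit. The unsigned subtraction stays correct even when the counter wraps around between capture and reception.

```c
#include <assert.h>
#include <stdint.h>

/* Running latency statistics (hypothetical helper type, not an RDK type). */
typedef struct {
    uint32_t min;
    uint32_t max;
    uint64_t total;
    uint32_t count;
} LatencyStats;

/* Latency of one frame. Unsigned arithmetic is modulo 2^32, so the result
 * is still right if wall_time has wrapped past zero since time_stamp. */
static uint32_t frame_latency(uint32_t wall_time, uint32_t time_stamp)
{
    return (uint32_t)(wall_time - time_stamp);
}

static void latency_stats_init(LatencyStats *s)
{
    s->min = UINT32_MAX;
    s->max = 0;
    s->total = 0;
    s->count = 0;
}

/* Folds one measured latency into the running min / max / total. */
static void latency_stats_add(LatencyStats *s, uint32_t latency)
{
    if (latency < s->min) s->min = latency;
    if (latency > s->max) s->max = latency;
    s->total += latency;
    s->count++;
}

/* Average latency, or 0 if nothing has been recorded yet. */
static uint32_t latency_stats_avg(const LatencyStats *s)
{
    return s->count ? (uint32_t)(s->total / s->count) : 0;
}
```

In the receiving link, each frame's value from the formula above would be passed to latency_stats_add(), and the min, max and average printed periodically.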

  • Shiju,

    I have another question.

    I want to change the buffer size or the number of buffers in the buffer list in order to change the total latency. Can these be changed on the M3 side? If so, how should I do it?

  • Shun

    Can you elaborate on how you plan to reduce the latency by reducing the buffer size? Are you thinking of slice-based display? If yes, it is not supported by either the display link or the display driver. The number of buffers, or the display queue depth, can be controlled at build time.

    regards, shiju

  • shiju,

    My chain is cap+nsf+enc. I want to reduce the total latency by reducing the number of buffers of each link. Is this possible?

    You said above that the display queue can be controlled at build time. Can the queues of the other links be controlled at build time too, and if so, how?

  • Hi

    For all the links used in your chain (such as cap, NSF and enc) you can specify the number of output buffers in the use case file. The RDK picks the default value only if this is not specified.

    regards, shiju

  • Shiju,

    Where is the use case file? Could you tell me the directory and how to change a use case file? I don't know where or how to specify the number of output buffers.

    Could you tell me a little more about that?

    Thanks!

    Shun

  • Hi,

    DVR RDK demo usecase files are under the dvr_rdk\mcfw\src_linux\mcfw_api\usecases\ directory. For example, see the file multich_progressive_4d1_vcap_venc_vdec_vdis.c under

    dvr_rdk\mcfw\src_linux\mcfw_api\usecases\ti810x

    In this file you will see:

    capturePrm.numBufsPerCh = NUM_CAPTURE_BUFFERS; - To set the number of capture link output buffers

    nsfPrm2.numBufsPerCh = NUM_NSF_BUFFERS; - To set the number of NSF link output buffers

    deiPrm.numBufsPerCh[DEI_LINK_OUT_QUE_DEI_SC] = NUM_DEI_ENC_BUFFERS; - To set the number of DEI link output buffers

    encPrm.numBufPerCh[0] = NUM_ENCODE_D1_BUFFERS; - To set the number of ENC link output buffers

    encPrm.numBufPerCh[1] = NUM_ENCODE_CIF_BUFFERS; - To set the number of ENC link output buffers

    decPrm.chCreateParams[i].numBufPerCh = NUM_DECODE_BUFFERS; - To set the number of DEC link output buffers

    regards, shiju

  • Shiju Sivasankaran said:

    capturePrm.numBufsPerCh = NUM_CAPTURE_BUFFERS; - To set the number of capture link output buffers

    nsfPrm2.numBufsPerCh = NUM_NSF_BUFFERS; - To set the number of NSFlink output buffers

    deiPrm.numBufsPerCh[DEI_LINK_OUT_QUE_DEI_SC] = NUM_DEI_ENC_BUFFERS; - To set the number of DEI link output buffers

    encPrm.numBufPerCh[0] = NUM_ENCODE_D1_BUFFERS; - To set the number of ENC link output buffers

    encPrm.numBufPerCh[1] = NUM_ENCODE_CIF_BUFFERS; - To set the number of ENC link output buffers

    decPrm.chCreateParams[i].numBufPerCh = NUM_DECODE_BUFFERS; - To set the number of DEC link output buffer

    Hi Shiju

    This is John.

    I have a brief question.

    I'd like to know whether the number of buffers of those links (cap / nsf / dei / etc.) affects or adds to the latency from capture to display on screen.

         

  • Hi Shiju,

    I just tested with a simple chain (cap -> display).

    The captured frames and the display are both configured for 1080p@60.

    I see little difference in total latency when I change capturePrm.numBufsPerCh between 6 and 13.

    Does that result match your expectation?