AM57X: Not able to get full frame rate with Wayland

Other Parts Discussed in Thread: AM5728

Hi,

I am trying to decode 2 streams and display them using HDMI on AM5728.

Processor SDK 02.00.00 is used.

The display application is written using waylandsink as a reference:

https://git.ti.com/glsdk/gst-plugins-bad0-10/blobs/33bed0d3793ded68234eb3d25e020b50ab11de69/ext/wayland/gstwaylandsink.c#line642

We observe that the display throughput (frame rate) drops.

One video stream is 1080p60 and the other is 1080p30. The time taken to display a frame using Wayland varies, and the average display time goes as high as 33 ms for the 1080p60 channel.
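
To put the 33 ms figure in context, the per-frame time budget at a given frame rate, and the frame rate implied by a measured display time, can be worked out as in the sketch below. This is only an illustration: the function names are made up for this example, and it assumes frames are displayed one after another, so that the average display time directly bounds the achievable frame rate.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def sustained_fps(display_time_ms: float) -> float:
    """Frame rate that can be sustained if every frame takes display_time_ms.

    Assumes frames are displayed serially (no overlap between frames).
    """
    return 1000.0 / display_time_ms


if __name__ == "__main__":
    print(f"Budget at 60 fps: {frame_budget_ms(60):.2f} ms per frame")
    print(f"33 ms per frame sustains about {sustained_fps(33):.1f} fps")
```

Under that assumption, a 33 ms average display time is roughly twice the budget of a 60 fps stream.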

The ARM load is high, around 70%, and my guess is that this may be causing the issue.

With the ARM load at around 70%, even single-channel 1080p60 display throughput drops to 57-58 fps.

Frame tearing is also observed in this case. Is any delay required before the buffers are freed and returned to the decoders for reuse?

Please let us know how this can be solved.

Thank you in advance.

Regards,

Shridhar

  • Hi,

    I will ask the software team to comment.
  • AM572x can handle a single-channel 1080p60 decode but cannot do 1080p at 90 fps. If you are doing the dual decode (1080p60 + 1080p30) simultaneously, then the bottleneck is the codec. Also, can you clarify whether you are using the GStreamer framework?

  • Hi Manisha,

    We are using our own media framework and not gstreamer.

    The 1080p60 decode runs on IVAHD and the other 1080p30 decode runs on ARM (soft codec).

    Hence the decoder is not the bottleneck in this case.

    Based on our experiments, we have narrowed the problem down to the display.

    Please let us know in case you need more information.

    Thanks

    Regards,

    Shridhar

  • Did you check the display rate with a single-channel 60 fps decode (no soft 1080p30 decode on ARM)?
  • In addition, can you also check with 1080p60 decode with GStreamer and let us know the frame rate observed? We need to see if the problem is with framework / test stream / something else.

  • Anand,

    Thanks for your response

    We did the following experiments with gstreamer to narrow down the cause

    Experiment 1:
    The following GStreamer pipelines were run simultaneously:
    File read -> 720p HEVC Decode on ARM -> display using wayland
    File read -> 1080p60 H264 Decode on IVAHD -> display using wayland
    Observation: Frame skips and tearing are observed on both outputs.

    To understand whether the issue is with composition or load, the following experiment was also done

    Experiment 2:
    The following GStreamer pipelines were run simultaneously:
    File read -> 720p HEVC Decode on ARM -> Fakesink
    File read -> 1080p60 H264 Decode on IVAHD -> display using wayland
    Observation:
    When we run only the hardware decode pipeline, we see smooth playback (no issues). The moment we start the software decode pipeline, frame skips and tearing are observed on the 1080p60 output.
    When we run the software decode pipeline, we see the CPU load shoot up to 70%.

    From the above observations, and from the profiling we have done with our own framework, it looks like Wayland becomes a throughput bottleneck as the ARM load increases, resulting in frame skips and tearing.

    We think this is a bug in Wayland that can be fixed. Please let us know your thoughts and your plans for addressing the issue.

    A few of our use cases can load the ARM up to 80%. Should we reconsider using Wayland for such scenarios? How would we use Qt in such cases?

    Regards,
    Apoorva
  • Apoorva,

    Did you experiment with process priorities, setting higher priorities for Weston and the IVAHD decode gst-pipeline than for the ARM decode process, to see if the issue still happens? If it does, the next step is to run PVRTune to understand where the bottleneck is. You can find information on PVRTune in the DDK documents.
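
As a rough illustration of the priority experiment suggested above, the sketch below lowers the nice value of every process named `weston`. It is a minimal sketch with assumptions: the helper names are hypothetical, matching processes by the name in `/proc/<pid>/comm` is just one way to find them (Linux only), and the value -10 is an arbitrary example rather than a recommended setting. Lowering a nice value below its current level normally requires root privileges.

```python
import os
from typing import List


def pids_by_name(name: str) -> List[int]:
    """Return PIDs whose /proc/<pid>/comm matches name (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited between listing and reading; skip it.
            continue
    return pids


def set_nice(pid: int, nice_value: int) -> int:
    """Set the nice value of pid and return the value now in effect."""
    os.setpriority(os.PRIO_PROCESS, pid, nice_value)
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    # "weston" as the compositor process name and -10 are example values.
    for pid in pids_by_name("weston"):
        try:
            print(f"pid {pid}: nice is now {set_nice(pid, -10)}")
        except PermissionError:
            print(f"pid {pid}: raising priority needs root privileges")
```

The same `set_nice` call could be applied to the PID of the hardware decode pipeline, with a higher (less favourable) nice value for the software decode process.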
  • Manisha,

    We tried changing the nice values of the Weston server and the IVAHD decode pipeline. We see some improvement in performance, but playback is still jerky, with frame skips and frame tearing.

    Regards,
    Apoorva
  • I personally apologize for the delay. I have escalated this to the software management team, and they will take action to make sure this post is answered. Sorry.
  • We are able to reproduce this problem on the AM57xx EVM, and the log file was sent to the graphics development team today for further analysis.
    We will create more test cases and add debug code to identify the root cause of this issue, and we will get back to you when there is any key progress.
    We apologize for the delay and are treating this as a high-priority issue.

    Best regards,

    Eric Ruei

  • Hi Eric,

    Thanks for your response.

    From our analysis, in addition to ARM load, we think DDR bandwidth could also be aggravating the issue (multiple decodes plus a 1080p60 display with Wayland; Wayland composition adds significantly to the DDR load).

    1. The following information on DDR bandwidth availability and usage will help us evaluate the possibilities on this platform.
    a) DDR bandwidth available for various peripherals (SGX, DSS, HDVICP etc) after protocol overheads
    b) DDR bandwidth usage for H.264 encoding/decoding using HDVICP
    c) DDR bandwidth usage by Weston for composing 2 1080p30 videos onto a 1080p60 display

    Please note it seems impossible to instrument these using software. Any data you can provide will be extremely valuable

    Regards,
    Apoorva
  • a) DDR bandwidth available for various peripherals (SGX, DSS, HDVICP etc) after protocol overheads

    DDR bandwidth for VIP, DSS and VPE is determined by the actual resolution and frame rate of the frames being read and written. The only exception is the deinterlacer in VPE: it reads three fields and writes one frame, along with 0.5 bytes of metadata per pixel for the luma buffer only.
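
As a worked example of the statement above, the bandwidth of a single read or write stream can be estimated from resolution, bytes per pixel and frame rate. This is only a sketch: the function name is hypothetical, the pixel formats (NV12 at 1.5 bytes per pixel, ARGB8888 at 4 bytes per pixel) are assumptions chosen for illustration, and 1 MB is taken as 10^6 bytes. It does not model the VPE deinterlacer exception.

```python
def frame_bandwidth_mb_s(width: int, height: int,
                         bytes_per_pixel: float, fps: float) -> float:
    """DDR bandwidth in MB/s (1 MB = 1e6 bytes) for one stream of frames.

    Counts a single pass over each frame (one read or one write).
    """
    return width * height * bytes_per_pixel * fps / 1e6


if __name__ == "__main__":
    # Assumed formats: NV12 = 1.5 bytes/pixel, ARGB8888 = 4 bytes/pixel.
    nv12 = frame_bandwidth_mb_s(1920, 1080, 1.5, 60)
    argb = frame_bandwidth_mb_s(1920, 1080, 4, 60)
    print(f"1080p60 NV12:     {nv12:.1f} MB/s")
    print(f"1080p60 ARGB8888: {argb:.1f} MB/s")
```

A buffer that is both written by one block and read by another contributes this amount once for each pass.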

    b) DDR bandwidth usage for H.264 encoding/decoding using HDVICP

    H.264 decoder DDR bandwidth (Mbytes/sec)

    Resolution    P-only frames @ 30 fps    B-only frames @ 30 fps
    QCIF          6                         10
    CIF           23                        41
    VGA           70                        124
    D1            79                        140
    720P          207                       368
    1080P         468                       833

    H.264 encoder DDR bandwidth (Mbytes/sec)

    Tools enabled:
      Growing Window only
      Search range (vertical: 32, horizontal: 144)
      4MV (i.e., minBlockSize is 8x8)
      Maximum bitrate ~25 Mbps for 1080p

    Resolution    P frames @ 30 fps    B frames @ 30 fps
    QCIF          7.5                  7.5
    CIF           25.7                 28.7
    VGA           75.6                 82.5
    D1            84.8                 91.9
    720P          221                  240.9
    1080P         495                  541
  • Effective DDR throughput can be taken as about 55% of the theoretical peak when DDR is in non-interleaved mode and about 65% when it is in interleaved mode.
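
The efficiency figures above can be turned into a rough bandwidth budget, as in the sketch below. Several things here are assumptions rather than data from this thread: the percentages are read as fractions of the theoretical peak, the peak value of 4264 MB/s (a 32-bit interface at 1066 MT/s) is a placeholder that should be replaced with the figure for the actual board, and the function names are hypothetical. The consumer figures in the example are the 1080P B-only decoder number from the table above and a 1080p60 display read at an assumed 4 bytes per pixel.

```python
# Fraction of theoretical peak assumed usable, per the figures quoted above.
EFFICIENCY = {"non_interleaved": 0.55, "interleaved": 0.65}


def effective_ddr_mb_s(peak_mb_s: float, mode: str) -> float:
    """Usable DDR bandwidth in MB/s for the given interleaving mode."""
    return peak_mb_s * EFFICIENCY[mode]


def fits_budget(consumers_mb_s, peak_mb_s: float, mode: str) -> bool:
    """True if the summed consumer bandwidth is within the usable budget."""
    return sum(consumers_mb_s) <= effective_ddr_mb_s(peak_mb_s, mode)


if __name__ == "__main__":
    peak = 4264.0  # Placeholder peak: 4 bytes x 1066 MT/s; not a datasheet value.
    consumers = [833.0, 497.664]  # 1080P B-only decode, 1080p60 display read
    for mode in EFFICIENCY:
        budget = effective_ddr_mb_s(peak, mode)
        print(f"{mode}: budget {budget:.1f} MB/s, "
              f"fits: {fits_budget(consumers, peak, mode)}")
```

A real budget would also need to include compositor (SGX) traffic, ARM software decode traffic and any other masters on the bus, none of which are quantified in this thread.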
  • Thanks Manisha,

    Can you let us know whether these DDR numbers for the HDVICP codecs are for Tiler mode or non-Tiler mode? Can you also provide the numbers for the other mode?
  • DDR bandwidth remains the same in TILER and non-TILER modes.