DRA744: High CPU usage with glimagesink

Part Number: DRA744
Other Parts Discussed in Thread: TVP5158

We have an analog TV decoder (for cameras) that produces raw video in YUYV format. The application uses Qt, which in turn depends on GStreamer. To put video on the display, the stream has to be converted to a format acceptable to the sink element. We're experimenting with the 'qmlglsink' element (as an alternative to QtMultimedia) so that we can control the GStreamer pipeline used by the application. qmlglsink seems to use glimagesink internally (or at least it behaves in a very similar way). glimagesink accepts only RGBA format, so colorspace conversion is needed. Here is a GStreamer pipeline that closely simulates the behavior in our Qt app:

gst-launch-1.0 -e videotestsrc ! 'video/x-raw, format=YUY2' ! glupload ! glimagesink sync=false

That basically works, but it causes very high CPU usage: about 130% out of a possible 200% on our dual-core processor. This pipeline, on the other hand, doesn't cause the same problem:

gst-launch-1.0 -e videotestsrc ! glupload ! glimagesink sync=false

It takes only about 20% CPU.

So presumably most of the extra CPU usage is due to glcolorconvert (which is used implicitly for the YUY2 --> RGBA colorspace conversion, as shown by the GStreamer logs). But that CPU usage makes no sense, because the GL elements are supposed to offload this processing from the CPU to the GPU hardware, aren't they?

What are we doing wrong?

This is on TI Arago Yocto 3.1/dunfell, with Linux kernel 5.10. (By the way, qmlglsink / glimagesink do not work by default on this distribution. We needed several gstreamer patches before these pipelines would even start up at all. I can share them on request.)

  • Clarification: if I remove sync=false from the pipeline examples above, it makes a big difference: CPU usage drops from about 130% to about 65% in the first case, and from 20% to about 10% in the second. But the sync property seems to be required to avoid much worse trouble when we bring in the actual camera:

    gst-launch-1.0 -e v4l2src device=/dev/video1 ! 'video/x-raw, format=YUY2' ! glupload ! glimagesink sync=false

    That pipeline also "works" but uses about 180% CPU by the same measure as above. Incidentally, it produces this message (although videotestsrc with YUY2 for some reason does not):

    CreateImageSharedFromDmaBufs: Unsupported DRI FourCC (fourcc = 0x38385247)

    Whereas, if I try /dev/video1 without sync=false, I get a whole lot of messages like this:

    WARNING: from element /GstPipeline:pipeline0/GstGLImageSinkBin:glimagesinkbin0/GstGLImageSink:sink: A lot of buffers are being dropped.
    Additional debug info:
    ../gstreamer-1.16.3/libs/gst/base/gstbasesink.c(3003): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstGLImageSinkBin:glimagesinkbin0/GstGLImageSink:sink:
    There may be a timestamping problem, or this computer is too slow.

    and even worse performance in that case than any other.

  • Here's the leading page of a perf report from the /dev/video1 pipeline. This makes it look like all the action is happening inside the PVR user-mode driver. It is closed-source, so I cannot see what functions are called in there.

    Can you help??

    # To display the perf.data header info, please use --header/--header-only options.
    #
    #
    # Total Lost Samples: 0
    #
    # Samples: 135K of event 'cycles'
    # Event count (approx.): 33671791528
    #
    # Children Self Command Shared Object Symbol
    # ........ ........ ............... .................................. .......................................................
    #
    51.93% 0.00% gstglcontext [unknown] [.] 0x0000051d
    |
    ---0x51d
    |
    |--41.63%--0xb508f170
    |
    --10.29%--0xb508f144

    41.63% 0.00% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0xb508f170
    |
    ---0xb508f170

    41.37% 41.37% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0x0001f170
    |
    ---0x51d
    0xb508f170

    10.29% 0.00% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0xb508f144
    |
    ---0xb508f144

    10.20% 10.20% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0x0001f144
    |
    ---0x51d
    0xb508f144

    9.74% 0.00% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0xb50c2872
    |
    ---0xb50c2872

    9.73% 9.73% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0x00052872
    |
    ---0xb50c2872

    7.97% 0.00% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0xb50c287c
    |
    ---0xb50c287c

    7.96% 7.96% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0x0005287c
    |
    ---0xb50c287c

    6.79% 0.00% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0xb50c2880
    |
    ---0xb50c2880

    6.78% 6.78% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0x00052880
    |
    ---0xb50c2880

    3.62% 0.00% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0xb50c289a
    |
    ---0xb50c289a

    3.62% 3.62% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0x0005289a
    |
    ---0xb50c289a

    2.62% 0.00% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0xb50c288e
    |
    ---0xb50c288e

    2.62% 2.62% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0x0005288e
    |
    ---0xb50c288e

    2.33% 0.00% gstglcontext libGLESv2_PVR_MESA.so.1.17.4948957 [.] 0xb50c2782
    |
    ---0xb50c2782
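
The self-time entries in the report above can be summarized with a short script. This is only a sketch: the percentages and offsets are copied from the report, while the helper names (`load_base`, `group_by_region`) and the 0x100-byte bucket size are arbitrary choices made for illustration. It suggests that the samples cluster in two narrow address ranges inside libGLESv2_PVR_MESA.so, both consistent with the same load base.

```python
from collections import defaultdict

# (self %, offset inside libGLESv2_PVR_MESA.so) copied from the perf report.
SELF_SAMPLES = [
    (41.37, 0x0001F170),
    (10.20, 0x0001F144),
    (9.73, 0x00052872),
    (7.96, 0x0005287C),
    (6.78, 0x00052880),
    (3.62, 0x0005289A),
    (2.62, 0x0005288E),
]


def load_base(runtime_addr, lib_offset):
    """Address at which the library would be mapped, given one address pair."""
    return runtime_addr - lib_offset


def group_by_region(samples, region_size=0x100):
    """Sum self-time per region_size-aligned bucket of library offsets."""
    totals = defaultdict(float)
    for percent, offset in samples:
        totals[offset // region_size * region_size] += percent
    return {hex(start): round(total, 2) for start, total in sorted(totals.items())}


# Both hot spots imply the same mapping base for the library.
print(hex(load_base(0xB508F170, 0x0001F170)))
print(hex(load_base(0xB50C2872, 0x00052872)))
print(group_by_region(SELF_SAMPLES))
```

Without symbols this says nothing about what the two regions actually do; it only narrows down where to look once source or debug information is available.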

  • Hello,

    Can you please check which formats the device can output using the following command, and report the results back to us?

    v4l2-ctl -D -d 1 --list-formats-ext

    Regards,

    Erick

  • Thanks for your response!

    We wrote the V4L2 driver for the capture device ourselves. The device is a TW2964, a four-channel-multiplexed (BT.656) analog TV decoder, similar to the TVP5158 (from the TI reference design).

    # v4l2-ctl -D -d 1 --list-formats-ext
    Driver Info:
    Driver name : vip
    Card type : vip
    Bus info : platform:vip2:vin3a:stream1
    Driver version : 5.10.100
    Capabilities : 0x85200001
    Video Capture
    Read/Write
    Streaming
    Extended Pix Format
    Device Capabilities
    Device Caps : 0x05200001
    Video Capture
    Read/Write
    Streaming
    Extended Pix Format
    ioctl: VIDIOC_ENUM_FMT
    Type: Video Capture

    [0]: 'UYVY' (UYVY 4:2:2)
    Size: Discrete 720x288
    Size: Discrete 720x288
    [1]: 'YUYV' (YUYV 4:2:2)
    Size: Discrete 720x288
    Size: Discrete 720x288
    [2]: 'VYUY' (VYUY 4:2:2)
    Size: Discrete 720x288
    Size: Discrete 720x288
    [3]: 'YVYU' (YVYU 4:2:2)
    Size: Discrete 720x288
    Size: Discrete 720x288

  • Hello Michael,

    Could you try using "format=YVYU" and let us know your results?

    Regards,

    Erick

  • WARNING: erroneous pipeline: could not link videotestsrc0 to gluploadelement0, gluploadelement0 can't handle caps video/x-raw, format=(string)YVYU

    Actually, according to the TW2964 datasheet, the BT.656 format on the wire looks more like UYVY than YUYV (=YUY2). But if I use format=UYVY, I get exactly the same behavior as before (no change in CPU usage).

  • Hello,

    Unfortunately, that did not work.

    I've talked with some of my colleagues internally, and they pointed out that the following may be happening:

    1) GStreamer uploads the YUY2 data as GL_RG texels and parses them back out when converting from YUV to RGB; please see here:

    https://github.com/GStreamer/gst-plugins-base/blob/master/gst-libs/gst/gl/gstglcolorconvert.c#L399

    2) SGX544 does support RG, but not GR:

    https://github.com/GStreamer/gst-plugins-base/blob/master/gst-libs/gst/gl/egl/gsteglimage.c#L463

    GStreamer should do this flip in the shader.

    Since the above test did not work, it looks like you might need to make a modification to GStreamer to get this working. Please let me know your thoughts.
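
To make the RG/GR point concrete, here is a minimal Python sketch of how the same packed YUY2 bytes map onto two-channel texels under each DRM fourcc. It assumes the channel layouts given in the comments of the kernel's drm_fourcc.h (GR88 is "[15:0] G:R", RG88 is "[15:0] R:G", both little endian) and the usual YUY2 byte order Y0 U Y1 V. The function names and sample values are made up for illustration, and this is not GStreamer's shader code.

```python
def split_texels(row, fourcc):
    """Return one (r, g) pair per 16-bit texel of a packed row."""
    if len(row) % 2:
        raise ValueError("row must hold whole 2-byte texels")
    texels = []
    for i in range(0, len(row), 2):
        byte0, byte1 = row[i], row[i + 1]
        if fourcc == "GR88":    # [15:0] G:R, little endian: byte 0 is R
            texels.append((byte0, byte1))
        elif fourcc == "RG88":  # [15:0] R:G, little endian: byte 0 is G
            texels.append((byte1, byte0))
        else:
            raise ValueError(f"unhandled fourcc {fourcc!r}")
    return texels


def swizzle_gr(texels):
    """Swap the two channels, as a shader could with a '.gr' swizzle."""
    return [(g, r) for r, g in texels]


# One YUY2 macropixel: Y0 U Y1 V (arbitrary illustration values).
row = bytes([10, 20, 30, 40])
print(split_texels(row, "GR88"))              # luma arrives in the r channel
print(split_texels(row, "RG88"))              # luma arrives in the g channel
print(swizzle_gr(split_texels(row, "RG88")))  # same as the GR88 view again
```

In other words, importing the buffer as RG88 instead of GR88 would only be equivalent if the conversion shader also swapped the two channels it reads.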

    Regards,

    Erick

  • "SGX544 does support RG, but not GR". Great idea! (I wasn't aware of that.)

    First thing I tried was (naively) just to replace DRM_FORMAT_GR88 with DRM_FORMAT_RG88 on line 451 of gsteglimage.c.

    I see no visible effect, and no change in CPU usage. Here's what shows up in the log now:

    0:00:01.305725098 1694 0x1cc5b0 DEBUG gleglimage gsteglimage.c:457:_drm_rgba_fourcc_from_info: Getting DRM fourcc for YUY2 plane 0
    0:00:01.305786133 1694 0x1cc5b0 DEBUG gleglimage gsteglimage.c:543:gst_egl_image_from_dmabuf: fourcc RG88 (943212370) plane 0 (720x288)
    CreateImageSharedFromDmaBufs: Unsupported DRI FourCC (fourcc = 0x38384752)

    As you can see, it does react to the change. However it gives me the same complaint as before, so SGX does not seem to like RG88 any better than GR88. (I am pretty sure that the function CreateImageSharedFromDmaBufs comes from the SGX/PVR user-mode libraries, even though we don't have source code, since this function also exists in the community Mesa codebase. Compare the last line with my earlier comment above, where it stated "0x38385247".)
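
The fourcc values in these messages can be decoded with a few lines of Python, which confirms that the driver rejected GR88 before the change and RG88 after it. This is a minimal sketch; the helper name `fourcc_to_str` is made up for illustration, and it relies only on the standard convention that a fourcc stores its first character in the lowest byte.

```python
def fourcc_to_str(code):
    """Decode a 32-bit fourcc: the first character is the lowest byte."""
    return code.to_bytes(4, "little").decode("ascii")


# Values taken from the log lines quoted in this thread.
for value in (0x38385247, 0x38384752, 943212370):
    print(f"0x{value:08x} -> {fourcc_to_str(value)}")
```

The decimal value 943212370 from the GStreamer debug line is the same number as 0x38384752, so both logs refer to the same format.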

    Do you buy this? How sure are you that it does support RG? If it really does, then what else could be wrong? (Some config issue; something about the kernel driver maybe?) Or if it actually doesn't support RG either, is there another direction we could take? [Perhaps it would prefer a "packed" instead of "semiplanar" format: I'm not sure why gstreamer goes for multi-planar when the data coming from VIP apparently is packed. But the code here is pretty deliberate about choosing the RG or GR formats for any YUV source... so that refactoring it would seem to be, err, complicated. :-]

  • Michael,

    Yes, something still seems to be missing or unexplained. Note that, unfortunately, we no longer support the SGX drivers, and this seems to require a deeper dive to find which configuration you need to change to get this working.

    We do have the option of sharing the SGX UM Libs source code if you think it would be useful for you (together with the kernel driver, of course). Please let me know if you would like to initiate this process.

    Regards,

    Erick

  • Yes! Let's do that! Absolutely. What do we have to sign?

  • Michael,

    Let me get this process started for you. I will message you with the details.

    Regards,

    Erick