AM57x VPE DMA-BUF output to GPU with MMAP buffer slow

Other Parts Discussed in Thread: AM5728

I've successfully configured the AM57xx VPE to output BGRA32 images. It is in streaming mode and queues and dequeues frames without problems. I want to take the output DMA-BUF from the VPE and feed it directly to the SGX544 GPU using OpenGL. I can achieve this, but passing the mmap'd pointer directly is very slow.

If I submit the VPE mmap'd output directly to the GPU, the glTexImage2D upload takes ~54 ms! If I memcpy the output to a separate malloc'd buffer first, the upload takes 2 ms. If I use NEON to copy the VPE output to a different buffer, the glTexImage2D upload also takes about 2 ms.

Here is my code:

// Perform actual memory mapping of VPE output
int vpeSize = vpeOutPutBuffer->bo[0]->size;
vpeMmapFrame = (char *)mmap(0, vpeSize, PROT_READ, MAP_PRIVATE, vpeOutPutBuffer->fd[0], vpeOutPutBuffer->bo[0]->offset);
assert(vpeMmapFrame != MAP_FAILED);

//neonPermuteARGBtoBGRA((uchar*)vpeMmapFrame,(uchar*)m_Rgba,pixelCount);
memcpy(m_Rgba, vpeMmapFrame, pixelCount * 4);
//m_Rgba = vpeMmapFrame;

// Activate texture unit 1 and submit VL frame to it
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, m_VLTextureLCD);
glTexImage2D(GL_TEXTURE_2D, 0, GL_BGRA_EXT, vlWidth, vlHeight, 0, GL_BGRA_EXT, GL_UNSIGNED_BYTE, m_Rgba);

glUniform1i(m_VLDataUniformLCD, 1);

I do not see an OpenGL extension for directly accepting DMA-BUFs. 

Why is this so slow? How can I directly pass the DMA-BUF from the VPE to OpenGL efficiently? Thanks!

  • I will ask the software team to look at this.
  • Thanks Biser.

    I've seen this presentation:

    elinux.org/.../DMA_Buffer_Sharing-_An_Introduction.pdf

    On page 15, it says: "eglImage extension for dma_buf providing for userspace CPU access with adequate fencing/barriers to deal with asynchronous command submission to GPU"

    Another example of the desired functionality is this presentation on the i.MX6:

    http://events.linuxfoundation.org/sites/events/files/slides/slides_4.pdf

    /* Dequeue */
    ...

    glBindTexture (GL_TEXTURE_2D, textures[0]);
    /* Physical and Virtual addresses */
    glTexDirectVIVMap(GL_TEXTURE_2D, width, height, GL_VIV_NV12,
                      &buffers[buf.index].start, &(buffers[buf.index].offset));
    glTexDirectInvalidateVIV(GL_TEXTURE_2D);

    /* Queue */
    ...
    • No more memcpy()
    • GPU knows the physical address in RAM

    Update - I've found the following patch in the latest (April 2016) Processor SDK:

    From f56d36a2323f04bb4936a106efdcf8c4754e2233 Mon Sep 17 00:00:00 2001
    From: Anand Balagopalakrishnan <anandb@ti.com>
    Date: Sun, 24 Jan 2016 14:16:10 +0530
    Subject: [PATCH 1/1] configure: disable kmscube
    
    kmscube support requires proprietary EGL extensions EGL_RAW_VIDEO_TI_DMABUF
    which is not supported in DDK 1.14.
    
    Future versions of DDK will support the standard EGL extension
    EXT_image_dma_buf_import which is ratified by Khronos. This will require
    corresponding changes to the application.
    
    Till then, disable kmscube in omapdrmtest.
    
    Signed-off-by: Anand Balagopalakrishnan <anandb@ti.com>
    ---
     configure.ac |    2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)
    
    diff --git a/configure.ac b/configure.ac
    index c026733..05180a6 100644
    --- a/configure.ac
    +++ b/configure.ac
    @@ -51,7 +51,7 @@ AM_CONDITIONAL(ENABLE_V4L2_DMABUF, [test "x$HAVE_V4L2_DMABUF" = xyes])
    
     # Check optional KMSCUBE:
     AC_ARG_ENABLE([kmscube], AS_HELP_STRING([--disable-kmscube], [disable kmscube display support]))
    -AS_IF([test "x$enable_kmscube" != "xno"], [PKG_CHECK_EXISTS(gbm egl glesv2, [HAVE_KMSCUBE=yes], [HAVE_KMSCUBE=no])])
    +HAVE_KMSCUBE="no"
     if test "x$HAVE_KMSCUBE" = "xyes"; then
            AC_DEFINE(HAVE_KMSCUBE, 1, [Have KMSCUBE support])
            PKG_CHECK_MODULES(GBM, gbm)
    -- 
    1.7.9.5
    

    It seems that it is not available but will be in the future. When will the DMA-BUF import extension be available? Thank you.

  • The DMA-BUF import extension will be available by the end of 4Q 2016.
  • Hi,

    Is there any update on DMA-BUF support? It seems that the DDK 1.14 from the PSDK doesn't have this extension.
    Thanks a lot.

    Best regards

  • I am glad to know that you were able to find the patch. Hope that solves your problem. 

  • Hi,

    The new interface works as expected, but there seems to be a large memory leak with it.

    I am creating the image with:

    image = eglCreateImageKHR(display, EGL_NO_CONTEXT,
    EGL_LINUX_DMA_BUF_EXT, (EGLClientBuffer)NULL, attr);
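The `attr` list is not shown in this post. For readers following along, here is a minimal sketch of how a single-plane attribute list for EGL_EXT_image_dma_buf_import is typically laid out. The helper names (`build_dmabuf_attribs`, `DMABUF_ATTRIB_COUNT`, `FOURCC_CODE`) are hypothetical, and the token values are copied from the Khronos headers only so the sketch compiles without <EGL/eglext.h>. Using fourcc 'AR24' (DRM_FORMAT_ARGB8888) for the VPE's BGRA32 output is an assumption about what the SGX driver accepts.

```c
#include <assert.h>
#include <stdint.h>

#ifndef EGL_NONE
#define EGL_NONE   0x3038
#define EGL_HEIGHT 0x3056
#define EGL_WIDTH  0x3057
#endif

#ifndef EGL_LINUX_DMA_BUF_EXT
#define EGL_LINUX_DMA_BUF_EXT          0x3270
#define EGL_LINUX_DRM_FOURCC_EXT       0x3271
#define EGL_DMA_BUF_PLANE0_FD_EXT      0x3272
#define EGL_DMA_BUF_PLANE0_OFFSET_EXT  0x3273
#define EGL_DMA_BUF_PLANE0_PITCH_EXT   0x3274
#endif

/* Same packing as fourcc_code() in drm_fourcc.h. */
#define FOURCC_CODE(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Six key/value pairs plus the EGL_NONE terminator. */
#define DMABUF_ATTRIB_COUNT 13

/* Fill `attr` with the attributes describing one single-plane dma-buf.
 * `pitch` is the line stride in bytes; the plane offset is fixed at 0. */
static void build_dmabuf_attribs(int32_t attr[DMABUF_ATTRIB_COUNT], int fd,
                                 int width, int height, int pitch,
                                 uint32_t fourcc)
{
    int i = 0;

    attr[i++] = EGL_WIDTH;                     attr[i++] = width;
    attr[i++] = EGL_HEIGHT;                    attr[i++] = height;
    attr[i++] = EGL_LINUX_DRM_FOURCC_EXT;      attr[i++] = (int32_t)fourcc;
    attr[i++] = EGL_DMA_BUF_PLANE0_FD_EXT;     attr[i++] = fd;
    attr[i++] = EGL_DMA_BUF_PLANE0_OFFSET_EXT; attr[i++] = 0;
    attr[i++] = EGL_DMA_BUF_PLANE0_PITCH_EXT;  attr[i++] = pitch;
    attr[i++] = EGL_NONE;

    assert(i == DMABUF_ATTRIB_COUNT);
}
```

The filled array would then be passed as the last argument of the eglCreateImageKHR call shown above. On a platform where EGLint is 32 bits wide, `int32_t` matches it; with the real EGL headers included, declare the array as `EGLint` instead.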

    I need to create multiple images (about 16) for a video stream.

    When the stream is done, I am releasing all the images:

    eglDestroyImageKHR(display, image);
    which returns TRUE.

    But if I compare /proc/vmallocinfo before and after, I see these extra allocations, and as you can see, there are quite a lot of them:

    0xf5c15000-0xf5c1a000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c1a000-0xf5c1f000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c1f000-0xf5c24000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c24000-0xf5c26000 8192 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=1 vmalloc
    0xf5c26000-0xf5c28000 8192 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=1 vmalloc
    0xf5c28000-0xf5c2d000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c2d000-0xf5c32000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c32000-0xf5c34000 8192 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=1 vmalloc
    0xf5c34000-0xf5c39000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c39000-0xf5c3e000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c3e000-0xf5c43000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c43000-0xf5c45000 8192 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=1 vmalloc
    0xf5c45000-0xf5c4a000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c4a000-0xf5c4f000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c4f000-0xf5c51000 8192 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=1 vmalloc
    0xf5c51000-0xf5c56000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c56000-0xf5c5b000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c5b000-0xf5c60000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c60000-0xf5c62000 8192 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=1 vmalloc
    0xf5c62000-0xf5c67000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c67000-0xf5c6c000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c6c000-0xf5c6e000 8192 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=1 vmalloc
    0xf5c6e000-0xf5c73000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c73000-0xf5c78000 20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5c78000-0xf5c7a000 8192 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=1 vmalloc
    0xf5c94000-0xf5c97000 12288 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=2 vmalloc
    0xf5c97000-0xf5c99000 8192 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=1 vmalloc

    A simple loop that creates and destroys an EGL_LINUX_DMA_BUF_EXT image should show you the problem.
    If I execute the exact same code without creating the image, but with the dmabufs still allocated and used, the leak disappears. So the leak really seems to be linked to eglDestroyImageKHR not working as expected.
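The before/after comparison can be automated by summing the entries in /proc/vmallocinfo that belong to the PowerVR kernel module. The function below is a minimal sketch that assumes each line has the form `<start>-<end> <size> <caller> ...` as in the listing above; `sum_vmalloc` and `struct vmalloc_usage` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct vmalloc_usage {
    unsigned long entries; /* number of matching lines */
    unsigned long bytes;   /* sum of their size fields */
};

/* Scan a /proc/vmallocinfo-style stream and total every line that
 * contains `tag`, for example "[pvrsrvkm]". */
static struct vmalloc_usage sum_vmalloc(FILE *in, const char *tag)
{
    struct vmalloc_usage u = {0, 0};
    char line[512];
    unsigned long size;

    assert(in != NULL && tag != NULL);

    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, tag) == NULL)
            continue;
        /* Skip the address range, then read the size in bytes. */
        if (sscanf(line, "%*s %lu", &size) == 1) {
            u.entries++;
            u.bytes += size;
        }
    }
    return u;
}
```

Calling it on `fopen("/proc/vmallocinfo", "r")` before and after a create/destroy cycle and subtracting the two results gives the growth per cycle. Reading that file usually requires root.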

    Do you have any fix/workaround?

    Thanks a lot
  • Hi,

    I simplified my code to isolate one leak per playback. I need to keep the OpenGL context open, since the rest of my application is running in OpenGL.

    In a nutshell, I am doing the following (I am working with only one buffer):

    Creation:

    m_imageHandle = eglCreateImageKHR(display, EGL_NO_CONTEXT,
                    EGL_LINUX_DMA_BUF_EXT, (EGLClientBuffer)NULL, attr);

    Release:

        eglDestroyImageKHR(display, m_imageHandle);
        m_imageHandle = 0;


    And for every create/destroy, I have one leak (/proc/vmallocinfo):

    20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc

    So after two cycles for example (/proc/vmallocinfo):

      20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
      20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc

    Funny thing, after three cycles, two new leaks are showing up:

    0xf56f2000-0xf56f7000   20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5716000-0xf571b000   20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf572f000-0xf5734000   20480 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=4 vmalloc
    0xf5734000-0xf5736000    8192 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=1 vmalloc
    0xf5736000-0xf5738000    8192 _VMallocWrapper+0x88/0xcc [pvrsrvkm] pages=1 vmalloc

    For the following cycles, sometimes there is only one leak of 20480 bytes, and sometimes one of 20480 and one of 8192. I am not sure what the pattern is.

    As explained before, if I run the exact same code without creating an image, there is no leak, so the leak really seems to be linked to the DMA-BUF image, not to the rest of my application.

    Do you have any feedback?
    Thanks a lot.

    Best regards,

  • We shall dig into this and get back.
  • Jean-Baptiste,

    I looked into this issue. This is valid behaviour of the DDK, which caches the memory information for future use.

    This cannot be considered as a leak as the memory will be cleared in 2 situations:

    1. When memory is low, and stale caches are present

    2. Application exit, where these caches are invalidated.

    Regards,

    Subhajit

  • Hi,

    Thanks a lot for your feedback. However, we are facing the following crash, which I attribute to this "leak", since the crash disappears when the EGLImage is not created.

    After 3 successful playbacks of my stream (very "stable", always 3), I get the following error and a crash:

    [ 129.564636] omapdrm omapdrm.0: could not remap: -12 (3)
    [ 129.569918] PVR_K:(Error): DmaBufImportAndAcquirePhysAddr: dma_buf_map_attachment failed: -307216192
    [ 129.579259] PVR_K:(Error): PVRSRVMapDmaBufKM: Failed to get dma-buf phys addr
    [ 129.586543] PVR_K:(Error): PVRSRVMapDmaBufBW: Failed to map dma-buf handle

    The only thing that seems to "clear" the situation is closing the app between playbacks, which is obviously not a solution here.

    After each playback:
    -> EGLImages are destroyed
    -> DMA-BUFs are freed and closed
    -> The DRI file descriptor is closed (I hoped it would help to clear the cache, but no change)

    But this doesn't seem to be enough.

    As explained before:
    -> If I disable the creation of the EGLImage and run the exact same code, the issue disappears.
    -> If I create only one EGLImage per playback (rather than my usual 16 DMA-BUFs/EGLImages), I can do 16x more playback cycles.

    All of this really makes me think of a memory leak, or memory not being cleared.

    Is there a way to force the cache to clear in this situation? Or any other ideas?

    Thanks a lot.

    Best regards
  • Hi. Did you just copy all the libraries to the AM5728?
    Could you share some details on how to make it work?
    Regards
  • Hi - Is there any example code showing how to use DMABUF import to OpenGL on AM57XX?
  • Hi Philip,

    Please refer to http://git.ti.com/glsdk/omapdrmtest/blobs/master/util/display-kmscube.c for gbm_bo_import() usage.

    The filevpedisplay application can be used to test this.

    With the VPE's output in RGBA format, filevpedisplay uses gbm_bo_import() and creates an RGB texture on kmscube with the command below.

    target$ filevpedisplay /usr/share/ti/video/airshow_p352x288.yuv 352 288 nv12 720 480 abgr32 0 0 352 288 0 1 --kmscube --connector 32 --fps 10

    Let me know if you have any issues.

    Ramprasad


  • Hi Ramprasad,

    Thank you for the response. 

    When I run omapdrmtest, I get the error "get_buffers not supported". Please see below. The EGL extension print shows that the DMABUF import functionality is supported: 

    EGL_EXT_image_dma_buf_import

    Any idea what I am doing wrong? Thanks.


    root@fam57xx:/home/flk# ./filevpedisplay framebuff_low_res.yuv 640 480 yuyv 640 480 abgr32 0 0 640 480 0 1 --kmscube --connector 32 --fps 1
    vpe:/dev/video0 open success!!!
    Forcing playback rate at 1 fps.
    Chosen Connector ID = 32
    failed to load module: /usr/lib/gbm/gbm_dri.so: cannot open shared object file: No such file or directory
    failed to load module: /usr/lib/gbm/gbm_gallium_drm.so: cannot open shared object file: No such file or directory
    loaded module : gbm_pvr.so
    found valid GBM backend : gbm_pvr.so
    Using display 0x1 with EGL version 1.4
    EGL Version "1.4 build 1.14@3699939 (MAIN)"
    EGL Vendor "Imagination Technologies"
    EGL Extensions "EGL_IMG_client_api_ogl EGL_KHR_image EGL_KHR_image_base EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_vg_parent_image EGL_IMG_cl_image EGL_KHR_fence_sync EGL_IMG_context_priority EGL_IMG_hibernate_process EGL_IMG_image_plane_attribs EGL_KHR_surfaceless_context EGL_KHR_wait_sync EGL_KHR_create_context EGL_WL_bind_wayland_display EGL_EXT_image_dma_buf_import"
    GL Extensions "GL_OES_rgb8_rgba8 GL_OES_depth24 GL_OES_vertex_half_float GL_OES_texture_float GL_OES_texture_half_float GL_OES_element_index_uint GL_OES_mapbuffer GL_OES_fragment_precision_high GL_OES_compressed_ETC1_RGB8_texture GL_OES_EGL_image GL_OES_EGL_image_external GL_OES_required_internalformat GL_OES_depth_texture GL_OES_get_program_binary GL_OES_packed_depth_stencil GL_OES_standard_derivatives GL_OES_vertex_array_object GL_OES_egl_sync GL_OES_texture_npot GL_OES_surfaceless_context GL_EXT_discard_framebuffer GL_EXT_blend_minmax GL_EXT_multi_draw_arrays GL_EXT_multisampled_render_to_texture GL_EXT_shader_texture_lod GL_EXT_texture_format_BGRA8888 GL_EXT_occlusion_query_boolean GL_EXT_texture_rg GL_EXT_draw_buffers GL_EXT_shader_framebuffer_fetch GL_IMG_shader_binary GL_IMG_texture_compression_pvrtc GL_IMG_texture_compression_pvrtc2 GL_IMG_texture_npot GL_IMG_texture_format_BGRA8888 GL_IMG_read_format GL_IMG_program_binary GL_IMG_uniform_buffer_object GL_IMG_multisampled_render_to_texture GL_KHR_debug"
    Display initialized, Render thread created
    vpe i/p: G_FMT: width = 640, height = 480, 4cc = YUYV
    vpe o/p: G_FMT: width = 640, height = 480, 4cc = BGR4
    get_buffers not supported!
    Done!!!

  • Hi Philip,
    That error is not an issue. get_buffers is required only for the KMS display, not for kmscube or Wayland.
    Did you observe your framebuff_low_res.yuv as a texture on kmscube?

    Ram
  • Hi Ram,

    I specified the same HDMI display (DRM connector_id 36) on which my own test application correctly displays frames using glTexImage2D, but omapdrmtest does not render to it.

    When I launch filevpedisplay, I can see that the HDMI display comes out of "sleep mode" because the monitor activity LED turns blue but the screen remains black. When the filevpedisplay app exits, the HDMI monitor goes back into sleep mode.

    Any ideas what is going on? Thank you.

    ./filevpedisplay framebuff_low_res.yuv 640 480 yuyv 640 480 abgr32 0 0 640 480 0 1 --kmscube --connector 36 --fps 1
  • The kmscube standalone application seems to work just fine and displays a rotating cube on the screen - but it does not demonstrate the dmabuf import:

    root@fam57xx:/home/flk# kmscube -c 36
    
    trying to load module omapdrm...success.
    ### Display [0]: CRTC = 34, Connector = 32
           Mode chosen [800x480] : Clock => 33000, Vertical refresh => 53, Type => 72
           Horizontal => 800, 840, 968, 1184, 0
           Vertical => 480, 490, 492, 527, 0
    ### Display [1]: CRTC = 38, Connector = 36
           Mode chosen [1680x1050] : Clock => 146250, Vertical refresh => 60, Type => 72
           Horizontal => 1680, 1784, 1960, 2240, 0
           Vertical => 1050, 1053, 1059, 1089, 0
    ### Primary display => ConnectorId = 36, Resolution = 1680x1050
    failed to load module: /usr/lib/gbm/gbm_dri.so: cannot open shared object file: No such file or directory
    failed to load module: /usr/lib/gbm/gbm_gallium_drm.so: cannot open shared object file: No such file or directory
    loaded module : gbm_pvr.so
    found valid GBM backend : gbm_pvr.so
    Using display 0x1 with EGL version 1.4
    EGL Version "1.4 build 1.14@3699939 (MAIN)"
    EGL Vendor "Imagination Technologies"
    EGL Extensions "EGL_IMG_client_api_ogl EGL_KHR_image EGL_KHR_image_base EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_vg_parent_image EGL_IMG_cl_image EGL_KHR_fence_sync EGL_IMG_context_priority EGL_IMG_hibernate_process EGL_IMG_image_plane_attribs EGL_KHR_surfaceless_context EGL_KHR_wait_sync EGL_KHR_create_context EGL_WL_bind_wayland_display EGL_EXT_image_dma_buf_import"

  • Hi Ramprasad,

    Okay, I found the cause of the black screen. The master branch of omapdrmtest seems to have a bug on line 295 of filevpedisplay.c:

    if (do_read ("Y plane", fin, srcBuffers[index],
         vpe->src.size) <= 0)

    I believe this should read:

    if (do_read ("Y plane", fin, srcBuffers[index],
         vpe->src.size) < 0)

    With "<=", the program exits even though do_read returns 0 on a successful read.
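To illustrate the return-value convention described above, here is a minimal sketch. `read_frame` is a hypothetical stand-in that returns 0 on success and -1 on failure; it is not the actual do_read from omapdrmtest, whose body is not reproduced in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for do_read(): returns 0 when exactly `size`
 * bytes were read into `buf`, and -1 on a short read or error. */
static int read_frame(FILE *in, void *buf, size_t size)
{
    assert(in != NULL && buf != NULL);
    return fread(buf, 1, size, in) == size ? 0 : -1;
}

/* Count the complete frames in `in`. Because 0 means success, the
 * failure test must be `< 0`; testing `<= 0` would stop at the first
 * frame that was read correctly. */
static int count_frames(FILE *in, void *buf, size_t frame_size)
{
    int frames = 0;

    for (;;) {
        if (read_frame(in, buf, frame_size) < 0)
            break;
        frames++;
    }
    return frames;
}
```

A trailing partial frame is treated as end of input and is not counted.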

    After fixing that, I see a cube spinning on the HDMI display. The remaining problem is that the cube is green most of the time, but for 1 second every 10 seconds it shows the image that I am feeding to it on the command line. I have attached images below. Why is it green most of the time? Thanks.

  • Hi Philip,

    I am not sure what your input to the VPE looks like.

    There is an NV12 raw file, airshow_p352x288.yuv, in /usr/share/ti/video of the root filesystem.

    Can you please try this command and let me know if you observe proper results?

    filevpedisplay /usr/share/ti/video/airshow_p352x288.yuv 352 288 nv12 720 480 abgr32 0 0 352 288 0 1 --kmscube --fps 10 --connector 36

    Ram

  • Hi Ram,

    I want to try that file (airshow_p352x288.yuv), but I cannot find it on my Ubuntu dev system. Would you please attach it to this thread so I can download and test it?

    I have attached my YUYV file (framebuff_low_res.yuv). It is a raw dump of a YUYV frame captured through the VIP at 640x480. Would you please try it on your system with the filevpedisplay application?

    NOTE: I had to rename the attachment from *.yuv to *.123 in order to attach it to this posting.

    framebuff_low_res.123

    Thank you,

    Phil

  • Hi Philip,

    framebuff.yuv has only one frame.

    Please try the attached NV12

    airshow_p352x288.zip

    Ram

  • Hi Ram,

    Thank you for the support; it works. I didn't realize that the airshow example file contained several frames. Now it looks as expected with my YUV frame.

    I can take it from here as an example.

    Thank you.

    Phil