I've successfully configured the AM57XX's VPE to output BGRA32 images. It is in streaming mode, queuing and dequeuing frames just fine. I want to take the output DMA-BUF from the VPE and feed it directly to the SGX 544 GPU using OpenGL. I can achieve this, but passing the mmap'd pointer directly is very slow.
If I submit the VPE's mmap'd output directly to the GPU, the glTexImage2D upload takes ~54 ms! If I first memcpy the output to a separate malloc'd buffer, the upload takes about 2 ms. If I use NEON to copy the VPE output to a different buffer, the glTexImage2D upload also takes about 2 ms.
Here is my code:
    // Perform actual memory mapping of VPE output
    int vpeSize = vpeOutPutBuffer->bo[0]->size;
    vpeMmapFrame = (char *)mmap(0, vpeSize, PROT_READ, MAP_PRIVATE,
                                vpeOutPutBuffer->fd[0], vpeOutPutBuffer->bo[0]->offset);
    assert(vpeMmapFrame != MAP_FAILED);

    //neonPermuteARGBtoBGRA((uchar*)vpeMmapFrame,(uchar*)m_Rgba,pixelCount);
    memcpy(m_Rgba, vpeMmapFrame, pixelCount*4);
    //m_Rgba = vpeMmapFrame;

    // Activate texture unit 1 and submit VL frame to it
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, m_VLTextureLCD);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_BGRA_EXT, vlWidth, vlHeight, 0,
                 GL_BGRA_EXT, GL_UNSIGNED_BYTE, m_Rgba);
    glUniform1i(m_VLDataUniformLCD, 1);
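For anyone who wants to reproduce the timings quoted above, here is a minimal sketch of one way to measure a single step. It is an illustration under assumptions, not the exact code behind my numbers: `elapsed_ms` and `time_copy_ms` are hypothetical helper names, the clock is assumed to be `CLOCK_MONOTONIC`, and the timed operation is a plain memcpy rather than the GL upload.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: milliseconds between two CLOCK_MONOTONIC samples. */
static double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1e3 +
           (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Hypothetical helper: time one memcpy of `bytes` bytes from src to dst.
 * With src pointing at the mmap'd VPE output, this isolates the cost of
 * reading that mapping from the cost of the texture upload itself. */
static double time_copy_ms(void *dst, const void *src, size_t bytes)
{
    struct timespec t0, t1;

    assert(dst != NULL && src != NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, bytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return elapsed_ms(&t0, &t1);
}
```

The same pair of `clock_gettime` calls can be placed around `glTexImage2D`. Since the driver may queue GL work, a `glFinish()` before the second sample is probably needed for the figure to include the whole upload.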
I do not see an OpenGL extension for directly accepting DMA-BUFs.
Why is this so slow? How can I directly pass the DMA-BUF from the VPE to OpenGL efficiently? Thanks!