OpenMAX: How does VFPC DEI (deinterlace) component copy 1 input buffer to 2 output buffers?

I have 4 questions (all relating to 1080p video frames):

1) How does VFPC DEI (deinterlace) component copy 1 input buffer to 2 output buffers? Is it using EDMA on the ARM side? Or is it all done quickly by the media controller (as I'd expect)?

2) Are there any components that can take 1 input buffer and copy it to 3 output buffers?

3) How many total DEI components can I have running at a time?

4) What is the difference between "OMX.TI.VPSSM3.VFPC.DEIMDUALOUT" and "OMX.TI.VPSSM3.VFPC.DEIHDUALOUT"?

Thanks,
Ralph

  • Seriously, can no one from TI answer my questions?

    I've already read this, and some of what it says conflicts with the capture_encode source code.

    Ralph

  • Hi,

     

    Here are the answers to your questions,

    1, Hardware supports one input and two outputs

    2, no

    3, 2

    4, these are two instances.

     

    Thx,

    Brijesh

  • Hi Brijesh,

    Thank you so much for the answers. If you can only have a maximum of 2 outputs on a DEI component, why does the DEI component header definition have 16 input and 16 output ports?

    Thanks,
    Ralph

  • Ralph,

    Those are the ports of the DEI OMX component, not the DEI hardware ports.

    Regards

    Vimal

  • Brijesh Jadav said:
    3, 2

    Does it mean that no more than 2 DEI OMX components can run in an OMX chain?

    Also, can you please say a word about the new undocumented OMX component called OMX.TI.DUCATI.VIDSPLC?
    See also: http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/p/186134/737139.aspx#737139

    Thanks,
    Gabi 

  • Gabi Gvili said:
    Does it mean that no more than 2 DEI OMX components can run in an OMX chain?

    Yes. This is because the DEI components literally represent the hardware on the DM8168. You need to sign an NDA to get a diagram of the inner hardware of the DM8168 which might help you understand things better (it helped me... a bit).

    I can't see a "VIDSPLC" component in my special VPSS diagram though. That'll be a question for TI.

    Ralph

  • Hi Ralph,

    Thanks for your reply, you always answer quickly, unlike TI employees ;).
    When you say no more than 2 DEIs can run together at the same time, do you mean that one is DEIMDUALOUT and the other DEIHDUALOUT?
    Or do you mean that you can use 2 DEIMDUALOUT and 2 DEIHDUALOUT?

    Regarding the VIDSPLC you can look at the new OMX component at:
    ezsdk/component-sources/omx_05_02_00_38/src/ti/omx/comp/vsplc/ 

    Thanks,
    Gabi 

  • Ha, thanks! I know how annoying it is to have to wait days for an answer! ;-)

    I mean 2 DEIs in total. I have had 2xDEIM working, and also 1xDEIH plus 1xDEIM. It all seems to be the same. I cannot work out what the difference between the two is. Judging by the magic diagram I have, there is none.

    Searching for "vsplc" in the EZSDK and on ti.com yields literally no useful text references apart from the header file in the EZSDK, which I think you've already seen. Hopefully the hardware has been upgraded and this gives us another splitter component come the new EZSDK.

    Ralph

  • Ralph,

    There is no splitter component in the hardware. So if a buffer needs to be copied, EDMA should be used. If the same physical buffer can be used for two different purposes (i.e. it is read-only), the sharing should be managed in software by keeping reference counts.

    Regards

    Vimal
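
A minimal sketch of the reference-counting approach Vimal describes, assuming one read-only frame is handed to several consumers and only recycled once the last one has finished. The `SharedFrame` class and the `on_last_release` callback are hypothetical names for illustration and are not part of the OMX or EZSDK APIs; in a real chain the callback would be whatever returns the buffer to its producer.

```python
import threading


class SharedFrame:
    """A read-only frame buffer shared by several consumers (hypothetical helper)."""

    def __init__(self, payload, on_last_release):
        self.payload = payload                    # consumers must treat this as read-only
        self._on_last_release = on_last_release   # called once, when nobody holds the frame
        self._refs = 0
        self._lock = threading.Lock()             # consumers may run on different threads

    def acquire(self):
        """Register one more consumer and return the frame for convenience."""
        with self._lock:
            self._refs += 1
        return self

    def release(self):
        """Drop one reference; recycle the buffer when the last consumer is done."""
        with self._lock:
            if self._refs <= 0:
                raise RuntimeError("release() called more often than acquire()")
            self._refs -= 1
            last = self._refs == 0
        if last:
            self._on_last_release(self)
```

Each consumer calls `acquire()` before reading and `release()` when finished, so no copy (and no EDMA transfer) is needed as long as nobody writes to the buffer.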

  • Hi Ralph,

    Thank you very much for your answers. I really appreciate them because in this magical thread, many questions left unanswered for months are being answered, so I will push my luck a little further.
    Regarding the OMX VLPB component on the A8: is it possible to instruct it to allocate its buffers in shared region 2, so that the A8 can be used as part of the OMX chain? And what is TI's recommendation for using the A8 for efficient video processing? Please see also:

    http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/t/210109.aspx

    http://e2e.ti.com/support/embedded/linux/f/354/p/206982/744392.aspx#744392

    Gabi

  • Hi Gabi,

    I had a look at the VLPB source and decided not to use that for my ARM "component". I'm trying to do all ARM stuff in the application code rather than in a separate VLPB component.

    If I need to access the shared memory region 2 my plan is to use the CMEMK module to map it to userspace and access it that way by specifying pool sizes that match the sizes of the allocated buffers. Sounds a bit of a bodge but I'm hoping it will work.

    There might be caching issues, but it seems the easiest way to map it for access by my application.

    Ralph

  • Hi Ralph,

    Thank you for your quick response.
    My guess is that you are using OMX for capture and display of video frames. If that is the case and you are working with EZSDK 5.04, you don't need CMEM to map this area into Linux user space; you can use the function DomxCore_mapUsrVirtualAddr2phy(). Please see:
    http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/p/203642/734670.aspx#734670

    I have actually created an OMX "VLPB" A8 component using EDMA myself, but the problem as I see it is that video buffer processing in this area (shared region 2) performs very poorly, and making shared region 2 cacheable doesn't seem to help. Please refer to: http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/p/206929/735841.aspx#735841

    My next step will be to benchmark A8 performance on a buffer allocated by Linux in user space versus A8 performance on a buffer allocated in shared region 2.

    Good luck,
    Gabi 

  • Great, thanks for the virtual2physical function tip. Might not need CMEMK after all, but then again if I'm using Codec Engine to do DSP processing I might do....

    Edit: just realised my previous comment about decode_display was wrong so I've deleted it - in that case the memory being read is allocated by ARM Linux and is not raw 1080p frames but only compressed H264 ones. I suppose it could give you a guideline of what to expect in Linux though; in decode_display it is reading 60x50kB frames per second and according to my profiling most of 40% utilisation of the CPU is spent parsing the H264 frames byte-by-byte. Say 3MB per second at 25% utilisation and if you're using full utilisation that gives ~12MB/s read speed, i.e. 2x1080p raw video frames every second (assuming RGB 24 bit). Maybe 1 frame per second isn't so far off what you'd expect....

    It will be interesting to hear how your test goes with the buffer allocated by Linux vs. that allocated by the media controller in shared region 2. I expect Linux will be much faster. I haven't had a chance to dig into the OpenMAX source code yet regarding this particular issue of caching, but I expect I will in the near future.

    Ralph
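
The back-of-envelope estimate in Ralph's edit can be reproduced as follows. This is only a sketch of the arithmetic as stated in the post: it assumes decimal units (1 kB = 1000 bytes), that read rate scales linearly with CPU utilisation, and a 1920x1080 frame at 3 bytes per pixel. The function names are made up for illustration.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def extrapolated_rate(bytes_per_second, cpu_fraction):
    """Scale a rate measured at partial CPU load up to 100% load (linear assumption)."""
    return bytes_per_second / cpu_fraction


measured = 60 * 50_000                           # 60 compressed frames/s at about 50 kB each
full_load = extrapolated_rate(measured, 0.25)    # the post's "25% utilisation" figure
rgb_1080p = frame_bytes(1920, 1080, 3)           # RGB 24 bit
frames_per_second = full_load / rgb_1080p

print(f"{measured / 1e6:.0f} MB/s measured, {full_load / 1e6:.0f} MB/s extrapolated")
print(f"{rgb_1080p} bytes per frame, {frames_per_second:.2f} raw frames/s")
```

The result comes out just under 2 raw frames per second, which matches the "2x1080p raw video frames every second" figure in the post.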

  • Hi Ralph,

    Well, I have inserted CMEM with an overlap in the Linux user space memory area (LINUX_MEM_1) in order to allocate my working buffers there, and used EDMA to copy from shared region 2 to my buffers. Processing from this area, I now get an amazing rate of 27 fps.
    The video is not very good, I think because of cache problems. Anyway, I will continue to work on it.

    Gabi

  • Hi Gabi,

    that's really good to hear.

    (By the way, my previous post, even after rewriting, is dodgy as that does not take into account network read speed. Wish I hadn't written the middle paragraph at all now!)

    So are you saying that the ARM is able to read through 27x1080p frames per second? What format are your frames in, YUV 420 or RGB 24 bit?

    Ralph

  • Hi Ralph,

    I was working with 1080p YUV 422 (27 fps). I have now moved to 720p YUV 422, which is what I really need, and I get 60 fps. However, there are problems with the video: I think I have taken care of the cache issues, but now I see frame synchronization problems.

    Gabi
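
For comparison, the raw data rates behind the two figures Gabi reports can be worked out as below. This sketch assumes YUV 4:2:2 at 2 bytes per pixel and frame sizes of 1920x1080 and 1280x720; the thread does not state the exact pixel layout, so treat both as assumptions.

```python
def raw_rate_bytes_per_s(width, height, bytes_per_pixel, fps):
    """Bytes per second the CPU must read to touch every pixel of every frame."""
    return width * height * bytes_per_pixel * fps


rate_1080p27 = raw_rate_bytes_per_s(1920, 1080, 2, 27)
rate_720p60 = raw_rate_bytes_per_s(1280, 720, 2, 60)

print(f"1080p @ 27 fps: {rate_1080p27 / 1e6:.1f} MB/s")
print(f" 720p @ 60 fps: {rate_720p60 / 1e6:.1f} MB/s")
```

Under these assumptions both modes work out to roughly 110 MB/s, which would be consistent with the A8 processing being limited by memory throughput rather than by per-frame overhead.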

  • Hi Ralph,

    Can you please take a look at the diagram of the inner hardware of the DM8168 and answer another question?
    Is it possible to use 2 DEIs and one or more scalers (OMX.TI.VPSSM3.VFPC.INDTXSCWB)  together?

    Thanks,
    Gabi 

  • Hi Gabi,

    It turns out the magic diagram is in the TRM and not in my NDA docs. See page 375 of the TRM.

    I have used 2 DEIs without any problems, 1 feeding into the other.

    Ralph

  • Gabi Gvili said:

    Is it possible to use 2 DEIs and one or more scalers (OMX.TI.VPSSM3.VFPC.INDTXSCWB)  together?

    I have successfully used combinations of DEIMs and INDTXSCWBs.  From what I can tell, you can have up to two DEIMs and up to four INDTXSCWBs, as long as you don't have more than four of both in total (when you try to make a fifth, the VPSS firmware crashes, so I don't know whether this is actually a hardware restriction).

    Both 1x DEIM + 3x INDTXSCWB and 2x DEIM + 2x INDTXSCWB have been working correctly for me.  However, note that I am only using the 420 output port on the DEIs, so the 422 port shouldn't be eating some scaler resource which it might additionally use if enabled.

    - Mark
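
The limits Mark reports can be written down as a small pre-flight check. This is a sketch of the empirically observed behaviour only, not a documented hardware rule; the constants come straight from the post above, the function name is hypothetical, and counting DEIH instances the same way as DEIM is an assumption based on the earlier "2 DEI in total" replies.

```python
MAX_DEI = 2        # DEI components (observed)
MAX_SCALERS = 4    # INDTXSCWB components (observed)
MAX_TOTAL = 4      # DEI + INDTXSCWB together; a fifth crashed the VPSS firmware


def within_observed_limits(num_dei, num_scalers):
    """Return True if the combination stays inside the limits reported in this thread."""
    if num_dei < 0 or num_scalers < 0:
        raise ValueError("component counts must be non-negative")
    return (
        num_dei <= MAX_DEI
        and num_scalers <= MAX_SCALERS
        and num_dei + num_scalers <= MAX_TOTAL
    )
```

For example, `within_observed_limits(1, 3)` and `within_observed_limits(2, 2)` cover the two combinations reported as working, while `within_observed_limits(2, 3)` would be rejected before any component handle is created.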

    Thanks a lot, Mark. This information is very valuable to me.

    Gabi