
TMS320DM8168: How to make DM8168 run video capture, processing and streaming with high performance

Part Number: TMS320DM8168
Other Parts Discussed in Thread: TVP7002

Hi everybody,
I'm working on a DM8168 kit with the EZSDK. I want to build an application that does video capture, image processing, and video streaming. Could anyone give me some advice on designing the software architecture?

Requirements:
+ Capture from a tvp7002 (analog HD camera): 1280x720 frames
+ Run image processing on the DSP using some special algorithms
+ Stream the resulting frames using gstreamer (25 fps)

Below is my current architecture:
+ On the ARM: 2 applications
Main app: the main application (captures video, communicates with the DSP core and the streamer process)
LiveStreamer: runs a gstreamer pipeline with gstappsrc
+ On the DSP core (DSP app): runs the image processing algorithms

About communication among applications:
+ Main app and DSP app: use syslink notify and a shared region.
I chose the shared region IPC_SR_COMMON (addr = 0x9F700000, length = 2 MB). The main app captures a frame into the shared region and notifies the DSP app to run the algorithms. The DSP app runs the algorithms and notifies the main app when processing finishes.

+ Main app and LiveStreamer app: in LiveStreamer I use memory mapping to read data directly from the physical address of the shared region (0x9F700000); a sketch of this mapping follows below. LiveStreamer runs a pipeline with a GstAppSrc to push frame data into.
I use a separate process to run gstreamer because I can't make the Qt main loop in the main application and gstreamer work together. Moreover, I can start/stop gstreamer easily without worrying about memory management.
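
Roughly, the mapping in LiveStreamer looks like this (a minimal sketch with error handling trimmed; note that a /dev/mem mapping opened with O_SYNC is normally uncached on ARM):

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define SR_PHYS 0x9F700000UL      /* physical address of IPC_SR_COMMON */
#define SR_SIZE (2 * 1024 * 1024) /* 2 MB */

static uint8_t *map_shared_region(void)
{
    /* On 32-bit builds, compile with -D_FILE_OFFSET_BITS=64 so the
     * high physical offset fits in off_t. */
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *va = mmap(NULL, SR_SIZE, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, SR_PHYS);
    close(fd); /* the mapping stays valid after close() */
    return (va == MAP_FAILED) ? NULL : (uint8_t *)va;
}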

Currently my system works, but with bad performance: the video streams at only 8 fps and the ARM load is 100%.

These are the 3 main problems:
+ In LiveStreamer: GstAppSrc needs a newly created GstBuffer to copy the frame data into:
GstBuffer *buffer = gst_buffer_new();

uint8_t *tempBuffer = (uint8_t*)malloc(app->length);
memcpy(tempBuffer, app->imgBuf, app->length); // copy frame data into the new buffer

GST_BUFFER_MALLOCDATA(buffer) = tempBuffer; // appsrc will g_free() this when done
GST_BUFFER_SIZE(buffer) = app->length;
GST_BUFFER_DATA(buffer) = GST_BUFFER_MALLOCDATA(buffer);
GstFlowReturn ret = gst_app_src_push_buffer(app->appSrc, buffer); // takes ownership of buffer

I can't find any solution to push the data without allocating a new buffer.


+ GstAppSrc reads data from the shared region and copies it into a new GstBuffer.
memcpy on the ARM is very slow (100 ms to copy a 1280x720 frame).

+ The shared region is cached. If I run only the image processing and turn LiveStreamer off, the algorithms on the DSP run faster. If I enable LiveStreamer (which uses memory mapping), the algorithms run 6-10 ms slower. Moreover, the IPC_SR_COMMON region is very small; how can I create a large shared region for the Host, MC-HDVICP2, and MC-HDVPSS?

Could anyone give me some advice to solve the above problems, or suggest another software architecture for my system?

  • Hello,

    Kiet Phi77 said:
    region IPC_SR_COMMON is very small, how can I create large shared region for Host and MC-HDVICP2 and MC-HDVPSS.

    Please check this wiki page:

    http://processors.wiki.ti.com/index.php/EZSDK_Memory_Map#H.2FW_and_S.2FW_Limitations_To_Consider_For_Deciding_Memory_Map

    Regarding the other questions, I will get back to you when I have some suggestions.

    BR
    Margarita

  • Hi Margarita,

    You have helped me in many threads. Thanks for your support.

    I will try rebuilding a new memory map later. Currently I'm testing with gray frames; I want to achieve real-time performance. I'm looking forward to hearing from you.

  • Hello,

    The appsrc element runs on the A8, so it seems normal to me for the ARM load to reach 100% and for frames to drop (~8 fps), especially since you have a memcpy.
    Could you provide more information about what the main app and the LiveStreamer app are doing?
    Per my understanding, the main app performs capture->streaming, and for the second one you wrote a plugin with an element that runs on the DSP for some processing.

    BR
    Margarita
  • Here is my system diagram.

    1. Running on A8

    Main application: 

    • Capture from /dev/video0 using the V4L2 API
    • Communicate with the DSP app on the DSP core (syslink notify)
    • Communicate with LiveStreamer (using a unix signal)
    • Some other features

    LiveStreamer: runs the gstreamer pipeline and uses a GstAppSrc element to push frame data (example code: http://amarghosh.blogspot.com/2012/01/gstreamer-appsrc-in-action.html). I modified the function read_data(gst_app_t *app): in it, LiveStreamer waits for a signal from the main app ( sigwait(&waitset, &sig) ), then reads data from the shared region and pushes the frame data into the GstAppSrc using gst_app_src_push_buffer; a sketch of this loop follows.
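
    A minimal sketch of that modified loop (SIGUSR1 and push_frame() are illustrative stand-ins for the actual signal number and the copy-and-push step; gst_app_t is from the example code above):

    #include <pthread.h>
    #include <signal.h>

    static void read_data(gst_app_t *app)
    {
        sigset_t waitset;
        int sig;

        sigemptyset(&waitset);
        sigaddset(&waitset, SIGUSR1);               /* assumed signal number */
        pthread_sigmask(SIG_BLOCK, &waitset, NULL); /* block it so sigwait() can claim it */

        for (;;) {
            sigwait(&waitset, &sig); /* sleep until the main app signals a new frame */
            push_frame(app);         /* read from shared region, push into GstAppSrc */
        }
    }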

    The main app captures a frame into the shared region and notifies the DSP app to run the algorithms. The DSP app runs the algorithms and notifies the main app when it finishes processing. The main app then sends a signal to LiveStreamer so that it reads the frame data and pushes it to the pipeline.
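
    On the A8 side, the notify handshake looks roughly like this (a minimal sketch using SysLink's Notify API; LINE_ID, EVENT_FRAME, and the "DSP" processor name are illustrative values, and real code should wait on a semaphore instead of spinning):

    #include <ti/syslink/Std.h> /* SysLink standard types */
    #include <ti/ipc/MultiProc.h>
    #include <ti/ipc/Notify.h>

    #define LINE_ID     0u
    #define EVENT_FRAME 5u /* illustrative; the lowest event ids are reserved */

    static volatile int frameDone = 0;

    /* Callback invoked when the DSP notifies us that processing finished */
    static Void dspDoneCb(UInt16 procId, UInt16 lineId, UInt32 eventId,
                          UArg arg, UInt32 payload)
    {
        frameDone = 1;
    }

    void init_notify(void)
    {
        UInt16 dspId = MultiProc_getId("DSP");
        Notify_registerEvent(dspId, LINE_ID, EVENT_FRAME, dspDoneCb, 0);
    }

    void process_frame_on_dsp(UInt32 frameOffset)
    {
        UInt16 dspId = MultiProc_getId("DSP");

        /* Tell the DSP a frame is ready; the payload carries the frame's
         * offset within the shared region. */
        Notify_sendEvent(dspId, LINE_ID, EVENT_FRAME, frameOffset, FALSE);

        while (!frameDone) /* simplified busy-wait */
            ;
        frameDone = 0;
        /* now signal LiveStreamer, e.g. kill(streamerPid, SIGUSR1) */
    }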

    2. Running on Core DSP

    DSP app: runs the image processing algorithms

    In a nutshell, I have a shared region between the main app and the DSP to hold frame data. When capturing from /dev/video0, I must copy the frame into the shared region; when the DSP finishes processing the frame, LiveStreamer must also copy the frame data from the shared region into a new GstBuffer to push into the GstAppSrc.

  • Hello,

    Thanks for the diagram.
    Did you read this manual?
    www.freedesktop.org/.../gstreamer-GstBuffer.html

    BR
    Margarita
  • Hi,

    I have read the documentation about GstBuffer and the appsrc element. appsrc has an internal queue; we need to create a new GstBuffer and push it in, and appsrc also frees the GstBuffer's memory automatically.

    If I point the GstBuffer's data pointer directly at the frame buffer (shared region), the pipeline can't run.

    I didn't get you completely,
    but it might help to use:
    buffer = gst_buffer_new();
    GST_BUFFER_DATA(buffer) = ptr_of_shared_memory_buffer;
    You can even assign a free function to the buffer, through which you can know when you are done with that frame buffer.
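
    For example (a minimal sketch for GStreamer 0.10.22+, which adds GstBuffer's free_func; note the buffer size must be set as well, and frame_done() is a placeholder for your own "buffer can be reused" handling):

    #include <gst/gst.h>
    #include <gst/app/gstappsrc.h>

    /* Called when the pipeline no longer needs the buffer: the shared
     * region frame may be reused from this point on. */
    static void frame_done(gpointer data)
    {
        /* e.g. notify the main app */
    }

    static GstFlowReturn push_shared_frame(GstAppSrc *appsrc,
                                           guint8 *shared_ptr, guint length)
    {
        GstBuffer *buffer = gst_buffer_new();

        GST_BUFFER_DATA(buffer) = shared_ptr;       /* no memcpy: point at shared region */
        GST_BUFFER_SIZE(buffer) = length;
        GST_BUFFER_MALLOCDATA(buffer) = shared_ptr; /* passed to free_func... */
        GST_BUFFER_FREE_FUNC(buffer) = frame_done;  /* ...instead of g_free() */

        return gst_app_src_push_buffer(appsrc, buffer); /* takes ownership */
    }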
  • Thanks.

    I have modified my code: I create a new GstBuffer with GST_BUFFER_DATA(buffer) = ptr_of_shared_memory_buffer. I also added an empty free function so the GstBuffer does not free the memory automatically.

    The pipeline can run, but system performance is worse: ~3 fps (with memcpy it was ~8 fps). Even though I don't use any memcpy, performance is still bad.

    You don't need to help me fix the current problems if this software architecture is bad; can you suggest another system design instead?

    Many examples on the DM8168 only include capture and streaming (dual capture + streaming, or capture + encode to file). Can anyone show me other examples that include capture, image processing on the DSP, and streaming of the frame results? I want to know how to share data among the v4l2 capture buffers, the DSP/ARM shared region, and the gstappsrc buffer.

  • Hello,

    I see that you are capturing through the tvp7002.
    As you know, the gstreamer v4l2src element that you are using runs on the A8; the memcpy and the streaming elements do as well.
    But we have another element for capture through the tvp7002: an OMX component that runs on the media controller (M3 VPSS).
    You could check the VFCC component in the OMX user guide.
    Unfortunately we do not have a gstreamer element that uses VFCC.
    But as you can see, the gstreamer omx elements are also based on OMX components, like omx_h264enc -> VENC.
    My point is that some community members have used/written an element called omx_camera, which is based on the OMX VFCC component instead of v4l2src. That means the capture element runs on the VPSS, not on the A8.
    You could search the forum for more information.
    But this omx_camera element is not supported by TI.

    In addition, we have omx examples like capture_encode. There is also the VLPB example, which runs on the DSP (refer to the OMX user guide). But there is no OMX component for streaming.

    Hope some of this helps you.

    Best Regards,
    Margarita
  • Hi Margarita,

    I'm trying to use high-level APIs, and gstreamer is the only high-level framework for developing encode and streaming.

    I capture frames using the V4L2 API, similar to the saloopback example. Capture and display run very smoothly, ARM load 0%. I'm only stuck on sharing data among the A8 app, the DSP app, and the gstreamer app. LiveStreamer runs on the A8 with an appsrc; I need to create a new GstBuffer and copy the frame data into it, which is the solution the gstreamer documentation recommends. Currently I can make the GstBuffer point to the shared memory, but the whole system gets slower, and this is not a safe solution.

    Could I use EDMA on the A8 to copy frame data faster?

    I can't find any examples or reference designs for a system that runs capture, image processing, encode, and streaming on the DM8168. Am I trying to design an over-complex system that is not suitable for the DM8168?

    I will buy the RidgeRun Professional SDK; hopefully they can help me solve the current problem.

    Thanks for your support!!!

  • Hello,

    In the EZSDK we have an EDMA demo; you could take a look at it:
    .../example-applications/linux-driver-examples-psp04.04.00.01

    BR
    Margarita
  • Thanks for your help! If I can make my system run faster, I will post my solution here.