DM3730 gstreamer performance

Other Parts Discussed in Thread: DM3730

I'm working on an application to transcode streaming video using the DM3730. The input video is H.264, 720x480 at 30 fps. The gstreamer pipeline isn't able to keep up, and I'm not sure if it's a limitation of the DM3730 or a problem with my gstreamer pipeline.

Here is my pipeline:

gst-launch udpsrc multicast-group=239.255.0.1 port=1841 ! mpegtsdemux ! video/x-h264 ! h264parse ! queue ! TIViddec2 numOutputBufs=12 ! queue ! TIVidenc1 codecName=h264enc engineName=codecServer rateControlPreset=3 bitRate=358400 framerate=30/1 contiguousInputFrame=true ! dmaiperf print-arm-load=true engine-name=codecServer ! queue ! rtph264pay name=pay0 ! udpsink host=239.255.0.2 port=8000

And the output:

Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
INFO:
gsttidmaiperf.c(302): gst_dmaiperf_start (): /GstPipeline:pipeline0/GstDmaiperf:dmaiperf0:
Printing DSP load every 1 second...
Setting pipeline to PLAYING ...
New clock: MpegTSClock
INFO:
Timestamp: 0:09:40.920562747; bps: 0; fps: 0; CPU: 0; DSP: 79; mem_seg: DDR2; base: 0x87c42e80; size: 0x20000; maxblocklen: 0xd858; used: 0x127a8; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x464190; used: 0x1b9bad0; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
INFO:
Timestamp: 0:09:41.979125980; bps: 69202; fps: 15; CPU: 16; DSP: 97; mem_seg: DDR2; base: 0x87c42e80; size: 0x20000; maxblocklen: 0xd858; used: 0x127a8; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x464190; used: 0x1b9bad0; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
INFO:
Timestamp: 0:09:42.999542239; bps: 13252; fps: 14; CPU: 7; DSP: 98; mem_seg: DDR2; base: 0x87c42e80; size: 0x20000; maxblocklen: 0xd858; used: 0x127a8; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x464190; used: 0x1b9bad0; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
INFO:
Timestamp: 0:09:44.017120363; bps: 12517; fps: 14; CPU: 10; DSP: 97; mem_seg: DDR2; base: 0x87c42e80; size: 0x20000; maxblocklen: 0xd858; used: 0x127a8; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x464190; used: 0x1b9bad0; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
INFO:
Timestamp: 0:09:45.052001956; bps: 20894; fps: 14; CPU: 7; DSP: 98; mem_seg: DDR2; base: 0x87c42e80; size: 0x20000; maxblocklen: 0xd858; used: 0x127a8; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x464190; used: 0x1b9bad0; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;

I've looked through the example gstreamer pipelines, but I couldn't find any examples of using hardware accelerated decoding and encoding in the same pipeline.

Any advice would be appreciated.

Brian

  • Hello,

    What is the software release that you are using here?

    Could you provide more details about the use case?

    I will check it and I will let you know.

    Best Regards,

    Margarita

  • Hello,

    I tried something similar to your pipeline:

    gst-launch  tcpclientsrc host=192.168.1.1 typefind=true  ! queue  ! mpegvideoparse   ! queue ! TIViddec2 numOutputBufs=12 ! queue ! TIVidenc1 codecName=h264enc engineName=codecServer rateControlPreset=3 bitRate=358400 framerate=30/1 contiguousInputFrame=true ! dmaiperf print-arm-load=true engine-name=codecServer ! queue ! rtph264pay name=pay0 ! tcpserversink host=192.168.1.2

    I do not have a problem with low fps; in my case it is 24, and the DSP load is around 50%:

    Timestamp: 0:24:45.520141602; bps: 40585; fps: 24; CPU: 5; DSP: 49; mem_seg: DDR2; base: 0x87c2d100; size: 0x20000; maxblocklen: 0xd730; used: 0x128d0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x19cbe90; used: 0x633de8; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
    INFO:
    Timestamp: 0:24:46.521026611; bps: 43067; fps: 24; CPU: 9; DSP: 51; mem_seg: DDR2; base: 0x87c2d100; size: 0x20000; maxblocklen: 0xd730; used: 0x128d0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x19cbe90; used: 0x633de8; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
    INFO:
    Timestamp: 0:24:47.521667479; bps: 33164; fps: 24; CPU: 9; DSP: 55; mem_seg: DDR2; base: 0x87c2d100; size: 0x20000; maxblocklen: 0xd730; used: 0x128d0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x19cbe90; used: 0x633de8; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
    INFO:
    Timestamp: 0:24:48.523132324; bps: 37068; fps: 23; CPU: 6; DSP: 56; mem_seg: DDR2; base: 0x87c2d100; size: 0x20000; maxblocklen: 0xd730; used: 0x128d0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x19cbe90; used: 0x633de8; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
    INFO:
    Timestamp: 0:24:49.524261473; bps: 22113; fps: 23; CPU: 12; DSP: 51; mem_seg: DDR2; base: 0x87c2d100; size: 0x20000; maxblocklen: 0xd730; used: 0x128d0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x19cbe90; used: 0x633de8; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
    INFO:
    Timestamp: 0:24:50.524383545; bps: 39781; fps: 24; CPU: 2; DSP: 49; mem_seg: DDR2; base: 0x87c2d100; size: 0x20000; maxblocklen: 0xd730; used: 0x128d0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x19cbe90; used: 0x633de8; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;

    You can try to address the issue by enlarging the buffers. (If that doesn't help, you will have to change the codecs or their parameters and/or lower the resolution.) You could try enlarging the queues as well as the buffers of your sources, e.g. ! queue max-size-buffers=X max-size-time=X max-size-bytes=X !. Even if it is not necessary, increasing the buffer sizes doesn't do any harm.

    Keep in mind that big buffers increase the pipeline's latency.
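
    As an illustrative sketch, Brian's pipeline with explicitly sized queues might look like the following. The queue limits are placeholder values to tune, not tested recommendations, and the pipeline assumes the same TI DMAI elements as the original:

    ```shell
    # Same pipeline as the first post, with explicit queue limits.
    # max-size-buffers=100 is an illustrative starting point; setting
    # max-size-time/bytes to 0 disables those limits so only the buffer
    # count bounds the queue. Tune for your stream and memory budget.
    gst-launch udpsrc multicast-group=239.255.0.1 port=1841 ! mpegtsdemux ! \
      video/x-h264 ! h264parse ! \
      queue max-size-buffers=100 max-size-time=0 max-size-bytes=0 ! \
      TIViddec2 numOutputBufs=12 ! \
      queue max-size-buffers=100 max-size-time=0 max-size-bytes=0 ! \
      TIVidenc1 codecName=h264enc engineName=codecServer rateControlPreset=3 \
        bitRate=358400 framerate=30/1 contiguousInputFrame=true ! \
      dmaiperf print-arm-load=true engine-name=codecServer ! \
      queue max-size-buffers=100 max-size-time=0 max-size-bytes=0 ! \
      rtph264pay name=pay0 ! udpsink host=239.255.0.2 port=8000
    ```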

    Best Regards,

    Margarita

  • I am using Logic PD's BSP. It is a Linux 3.0.0 kernel, and is based on DVSDK 4.02.

    The application I am working on reduces the bitrate of a video stream. The input video is H.264 encoded at approximately 1.5 Mbps, 720x480, and is out of my control. I have more flexibility with the output video; the requirement is a low bitrate, 300-400 kbps. I can change the codec and resolution if needed to meet the bitrate requirement.

    I've found gstreamer's test video source is a close approximation of my input video. I used the following pipeline to generate the input video for my first post.

    gst-launch-0.10 -v videotestsrc ! videorate ! videoscale ! video/x-raw-yuv,framerate=30/1,width=720,height=480 ! timeoverlay font-desc="Verdana bold 50px" ! x264enc bitrate=1200 byte-stream=true !  mpegtsmux ! udpsink host=239.255.0.1 port=1841

    I was surprised how low your DSP utilization was compared to mine. I made sure my CPU governor was set to "performance" and the clock speed was at 1 GHz throughout the test. Is there an equivalent governor for the DSP that I should check?

    I will try changing the buffer sizes and the codec of the output video. I'm also going to try resizing the video after it's decoded, but it was my understanding that the TIVidResize element was not available on the DM3730.
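
    For reference, here is how I verified the ARM governor and clock; this uses the standard Linux cpufreq sysfs interface (paths assume cpu0 on the single-core OMAP3). As far as I can tell there is no equivalent cpufreq node for the C64x+ DSP:

    ```shell
    # Standard cpufreq sysfs interface on the ARM side.
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # expect "performance"
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq   # in kHz; 1000000 = 1 GHz
    # The DSP clock is not managed by cpufreq; it is fixed by the BSP/bootloader.
    ```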

  • Hello,

    I am using a DM3730 EVM with Linux kernel 2.6.37 and DVSDK 4.03.

    Brian Hemmersmeier said:
    I will try changing the buffer sizes and the codec of the output video.

    Let me know the result.

    Best Regards,

    Margarita

  • I've tweaked my gstreamer pipeline to produce an output video at a lower resolution and frame rate.

    gst-launch udpsrc multicast-group=239.255.0.1 port=1841 ! video/mpegts ! mpegtsdemux ! video/x-h264 ! TIViddec2 codecName=h264dec engineName=codecServer  numOutputBufs=3 framerate=30/1 ! queue ! videorate ! videoscale method=0 ! video/x-raw-yuv,framerate=10/1,height=240,width=320 ! TIVidenc1 codecName=h264enc engineName=codecServer rateControlPreset=3 bitRate=358400 genTimeStamps=true framerate=10/1 contiguousInputFrame=false ! dmaiperf engine-name=codecServer print-arm-load=true ! rtph264pay name=pay0 ! udpsink host=239.255.0.2 port=8000

    Setting pipeline to PAUSED ...
    Pipeline is live and does not need PREROLL ...
    INFO:
    gsttidmaiperf.c(302): gst_dmaiperf_start (): /GstPipeline:pipeline0/GstDmaiperf:dmaiperf0:
    Printing DSP load every 1 second...
    Setting pipeline to PLAYING ...
    New clock: MpegTSClock
    INFO:
    Timestamp: 0:55:45.630676276; bps: 0; fps: 0; CPU: 0; DSP: 25; mem_seg: DDR2; base: 0x87c42e80; size: 0x20000; maxblocklen: 0xd840; used: 0x127c0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x28a710; used: 0x1d75550; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
    INFO:
    Timestamp: 0:55:46.699462897; bps: 23856; fps: 9; CPU: 52; DSP: 86; mem_seg: DDR2; base: 0x87c42e80; size: 0x20000; maxblocklen: 0xd840; used: 0x127c0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x28a710; used: 0x1d75550; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
    INFO:
    Timestamp: 0:55:47.772338873; bps: 18107; fps: 13; CPU: 74; DSP: 84; mem_seg: DDR2; base: 0x87c42e80; size: 0x20000; maxblocklen: 0xd840; used: 0x127c0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x28a710; used: 0x1d75550; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
    INFO:
    Timestamp: 0:55:49.549469001; bps: 2409; fps: 1; CPU: 18; DSP: 4; mem_seg: DDR2; base: 0x87c42e80; size: 0x20000; maxblocklen: 0xd840; used: 0x127c0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x28a710; used: 0x1d75550; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
    INFO:
    Timestamp: 0:55:50.646240240; bps: 17755; fps: 11; CPU: 81; DSP: 68; mem_seg: DDR2; base: 0x87c42e80; size: 0x20000; maxblocklen: 0xd840; used: 0x127c0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x28a710; used: 0x1d75550; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;
    INFO:
    Timestamp: 0:55:51.664520270; bps: 17668; fps: 13; CPU: 70; DSP: 79; mem_seg: DDR2; base: 0x87c42e80; size: 0x20000; maxblocklen: 0xd840; used: 0x127c0; mem_seg: DDRALGHEAP; base: 0x85a00000; size: 0x2000000; maxblocklen: 0x28a710; used: 0x1d75550; mem_seg: L1DSRAM; base: 0x10f04000; size: 0x10000; maxblocklen: 0x0; used: 0x10000;

    Playback is now pretty smooth, but there are some situations where the pipeline doesn't produce any video at all.

    If I start my gstreamer pipeline and then begin streaming video over UDP, the pipeline works fine. If I start streaming the UDP video before starting my gstreamer pipeline, no output video is produced. I've captured the output with DMAI_DEBUG=2.

    @0x000117c8:[T:0x4030b000] ti.sdo.dmai - [Dmai] Dmai log level set to '2'. Note that calling CERuntime_init after this point may cause unexpected change to DMAI tracing behavior.
    Setting pipeline to PAUSED ...
    Pipeline is live and does not need PREROLL ...
    INFO:
    gsttidmaiperf.c(302): gst_dmaiperf_start (): /GstPipeline:pipeline0/GstDmaiperf:dmaiperf0:
    Printing DSP load every 1 second...
    Setting pipeline to PLAYING ...
    0:00:00.343444822  2842    0x16050 WARN                     bin gstbin.c:2378:gst_bin_do_latency_func:<pipeline0> failed to query latency
    New clock: MpegTSClock
    @0x00087626:[T:0x43fb0490] ti.sdo.dmai - [Vdec2] Video decoder instance created
    @0x000878e4:[T:0x43fb0490] ti.sdo.dmai - [Vdec2] Made XDM_SETPARAMS control call
    @0x000879d8:[T:0x43fb0490] ti.sdo.dmai - [Vdec2] Made XDM_GETBUFINFO control call
    @0x00087bfe:[T:0x43fb0490] ti.sdo.dmai - [Buffer] Alloc Buffer of size 3225600 at 0x40bd2000 (0x83700000 phys)
    @0x00087e23:[T:0x43fb0490] ti.sdo.dmai - [BufTab] Allocating BufTab for 3 buffers
    @0x00087f17:[T:0x43fb0490] ti.sdo.dmai - [Buffer] Alloc Buffer of size 1843200 at 0x417d4000 (0x83a14000 phys)
    @0x00087f91:[T:0x43fb0490] ti.sdo.dmai - [Buffer] Alloc Buffer of size 1843200 at 0x41b51000 (0x83bd6000 phys)
    @0x0008800b:[T:0x43fb0490] ti.sdo.dmai - [Buffer] Alloc Buffer of size 1843200 at 0x43fb1000 (0x83d98000 phys)
    @0x007a6516:[T:0x43fb0490] ti.sdo.dmai - [Buffer] Set user pointer 0x40bd2000 (physical 0x83700000)
    @0x007a863a:[T:0x43fb0490] ti.sdo.dmai - [Vdec2] VIDDEC2_process() ret -1 inId 0 inUse 0 consumed 113733
    @0x007a86b4:[T:0x43fb0490] ti.sdo.dmai - [Vdec2] VIDDEC2_process() non-fatal error 0x1000
    @0x007a8710:[T:0x43fb0490] ti.sdo.dmai - [Buffer] Set user pointer 0x40bedc45 (physical 0x8371bc45)
    @0x007acd84:[T:0x43fb0490] ti.sdo.dmai - [Vdec2] VIDDEC2_process() ret 0 inId 0 inUse 0 consumed 4966
    @0x007acef2:[T:0x43fb0490] ti.sdo.dmai - [Vdec2] Made XDM_GETBUFINFO control call
    @0x007acf2f:[T:0x43fb0490] ti.sdo.dmai - [BufTab] Trying to chunk BufTab with 3 buffers of size 1843200 aligned on 4096 to 4 buffers of size 691200
    @0x007acf6c:[T:0x43fb0490] ti.sdo.dmai - [Buffer] Set user pointer 0x41b51000 (physical 0x83bd6000)
    @0x007acf8a:[T:0x43fb0490] ti.sdo.dmai - [Buffer] Set user pointer 0x41bfa000 (physical 0x83c7f000)
    @0x007acfa9:[T:0x43fb0490] ti.sdo.dmai - [Buffer] Set user pointer 0x43fb1000 (physical 0x83d98000)
    @0x007acfc7:[T:0x43fb0490] ti.sdo.dmai - [BufTab] New chunked BufTab has 4 buffers, still need 0 buffers
    @0x007ad005:[T:0x43fb0490] ti.sdo.dmai - [Buffer] Set user pointer 0x40beefab (physical 0x8371cfab)
    @0x007b1c6e:[T:0x43fb0490] ti.sdo.dmai - [Vdec2] VIDDEC2_process() ret 0 inId 0 inUse 0 consumed 5134
    @0x007b1ce8:[T:0x43fb0490] ti.sdo.dmai - [Buffer] Set user pointer 0x40bf03b9 (physical 0x8371e3b9)

    The log messages from Buffer and Vdec2 repeat continuously, but there are never any logs from Venc1. I also never see the output from dmaiperf. What could cause this pipeline to fail only in some situations?

    Brian

  • Hello,

    Did you observe the same behaviour before the changes?

    Also, could you add some queue elements with properties to the pipeline? Check my previous post.

    Best Regards,

    Margarita

  • Logic PD's BSP sets the DSP frequency to 260 MHz. I don't believe this is documented anywhere, but I ran into similar problems and saw obvious performance improvements after changing the clock to 800 MHz. Unfortunately, it seems Linux's cpufreq governor/driver doesn't touch the DSP clock.

    Here's what I did, and some cautions.

    http://e2e.ti.com/support/embedded/linux/f/354/p/286591/999622.aspx#999622