This thread has been locked.


HDVPSS Freezing when DDR is Overused



Hey all. Hoping someone can point me in the right direction on how to further debug this issue.

First some background:

We have a system using the DM8168. We are using it to encode dual 60 FPS video streams using GStreamer. 4x 2Gb word-wide DDR chips are being used (2 per bank), resulting in 1 GB of memory total. With new timing numbers set up in U-Boot for these chips, the system boots and works properly, and mtest in U-Boot passes with no errors.

The issue we are now facing is that when doing a dual 60 FPS encode (single 60 FPS and dual 30 FPS work fine), the VPSS stops feeding video if other applications hog memory for too long. I am fairly confident it is DDR bandwidth, not CPU, since running dual 30 FPS at high bitrates (essentially forcing the CPU to 100%) causes no issues.
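For scale, here is my own back-of-the-envelope sketch of the raw capture traffic alone (the helper name and numbers are mine; the real DDR load is several times higher once encoder reference reads/writes, the OS, and everything else are included):

```python
# Rough lower bound on VIP capture traffic for NV12 1080p streams.
# NV12 is 12 bits/pixel, i.e. 1.5 bytes/pixel.
BYTES_PER_PIXEL_NV12 = 1.5
FRAME_BYTES = int(1920 * 1080 * BYTES_PER_PIXEL_NV12)  # bytes per 1080p NV12 frame

def capture_mb_per_s(streams: int, fps: int) -> float:
    """MB/s written to DDR by capture alone (ignores all encoder traffic)."""
    return streams * fps * FRAME_BYTES / 1e6

print(capture_mb_per_s(2, 60))  # dual 1080p60
print(capture_mb_per_s(2, 30))  # dual 1080p30
```

Dual 60 FPS doubles the raw capture writes versus dual 30 FPS, which lines up with the freeze only appearing in the dual-60 case.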

We originally had an issue simply initializing the dual 60 FPS streams (video would simply never start working [VPSS crash right away]), but this was 'fixed'/hacked by initializing one stream, stopping it, and then initializing two streams very slowly (a sleep between every GStreamer init call). I believe this original issue is still the main issue we are facing; I have simply bypassed it for a short time.

Essentially, once the dual 60 FPS encode is streaming, if I do enough operations in the background (multiple SNMP gets/sets, many ioctls to a SPI driver, network socket communications), the streaming will eventually stop with no errors. It is as if DDR bandwidth is being overloaded by these other calls and the VPSS can't recover.

Analyzing my GStreamer pipelines, I see that video is essentially no longer being captured and hence everything has stopped. Linux and everything else in the system responds with no issues whatsoever. I am confident video is still being fed to the VIP from our source.

Once the video layer gets into this frozen state it is no longer recoverable. A full reboot is required.

I ran './loggerSMDump.out 0x9E400000 0x100000 all' and no errors were shown. It just kept printing, and when the frozen state happened it simply stopped printing (no errors or anything). The last lines were:
N:Video P:1 #:29405 T:0000012b76a2d25b M:xdc.runtime.Main S:StartInstance: HDVICP_1
N:Video P:1 #:29406 T:0000012b76dbc74b M:xdc.runtime.Main S:StopInstance: HDVICP_2
N:Video P:1 #:29407 T:0000012b76ebd24b M:xdc.runtime.Main S:StartInstance: HDVICP_2
N:Video P:1 #:29408 T:0000012b76f9ef69 M:xdc.runtime.Main S:StopInstance: HDVICP_1
N:Video P:1 #:29409 T:0000012b774e187f M:xdc.runtime.Main S:StopInstance: HDVICP_2

Which again is what prints the whole time.

dmesg shows nothing.

The only errata I see relevant is 2.1.32 (RGB to YUV or YUV to RGB Inline Within HDVPSS VIP May Lead to VIP Path Lockup if DDR Bandwidth is Overconsumed) but we are not doing any color space conversions in the chip.

 

I am at a loss at this point because I do not know how to debug each part of the VPSS layer.

Is there any way to reset each individual component of the VPSS layer so I can determine what part is crashing?

Are there any other debugging utilities that can help me narrow down what is truly happening?

Is there any way to see if my DDR isn't configured optimally? I have software leveling enabled and went through the JTAG process with CCS, though I can't confirm it is perfect. We also used the Excel sheet to generate the DDR timing numbers for these newer chips, and again mtest in U-Boot passed.

We are running the DDR at 796 MHz.

Any insight would be greatly appreciated. I have been trying things for weeks now to no avail, and I really need someone who understands the lower VPSS layer to assist me.

If any other info is needed please let me know and I will provide it.

Thanks in advance.

  • Hello,

    What is the use case (pipeline) that you are using here?

    Also, what software release are you using?

    BR

    Margarita

  • The pipeline is essentially (note I build it programmatically, hence the non-exact syntax):


    V4L2src /dev/video0 always-copy=false --> Video Caps (x-raw-yuv,NV12,1920x1080,60,8 buffers) --> omxbufferalloc (8 buffers) --> omxh264enc --> mpegtsmux-->udpsink

    V4L2src /dev/video2 always-copy=false --> Video Caps (x-raw-yuv,NV12,1920x1080,60,8 buffers) --> omxbufferalloc (8 buffers) --> omxh264enc --> mpegtsmux-->udpsink
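    In gst-launch-0.10 syntax, one of the two branches would look roughly like the following (a sketch only: the host/port values are placeholders, the buffer-count property name on omxbufferalloc is from memory, and the exact syntax in our programmatic code differs slightly):

```shell
# Approximate gst-launch-0.10 form of one branch (host/port are placeholders)
gst-launch-0.10 v4l2src device=/dev/video0 always-copy=false \
  ! 'video/x-raw-yuv,format=(fourcc)NV12,width=1920,height=1080,framerate=60/1' \
  ! omxbufferalloc numBuffers=8 \
  ! omx_h264enc \
  ! mpegtsmux \
  ! udpsink host=192.168.1.100 port=5000
```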

    The VPSS freezing happens regardless of bitrate and number of buffers. I've tried many different combinations, to no avail. Video always stops after a certain amount of time. That is why I don't think it's directly related to the software, and more likely related to the VPSS crashing due to something like memory. But again, I do not have any way to properly debug this.

    Using:

    EZSDK 5_05_02_00

    OpenMAX IL 0.3

    Gstreamer 0.10

  • Hello Chris,

    Are you using dual capture?

    BR
    Margarita
  • Yes, I am doing dual capture. The two pipelines listed are actually a single pipeline.

  • Hello,

    So, you observed this hang only in the case of dual capture, correct?

    The default EZSDK does not support dual capture.

    BR

    Margarita

  • Apologies Margarita.


    For the dual capture part of gstreamer we use the RidgeRun SDK. 

    Again, I believe the issue is related to the DDR not being configured optimally, causing the VPSS to fault. I'd really like some assistance on how to more properly debug the VPSS layer (resetting individual components, etc.), and potentially better ways to verify that the DDR is running optimally (mtest passes, but maybe something more intensive?).

  • Hello,

    In this case I would recommend you check this with RidgeRun, since the default EZSDK does not support dual capture. We do not have the RidgeRun SDK, so we cannot reproduce this issue or investigate it.

    I would also recommend you try almost the same use case but replace v4l2src with the videotestsrc element; let me know whether it passes or fails.

    BR
    Margarita
  • I am in talks with RidgeRun, but we have both slowly concluded it's probably a memory issue, potentially with the configuration of the memory.

    I have tried videotestsrc x2, but the issue there is that generating two 1080p60 videotestsrcs is a giant CPU hog, and hence the throughput is not the same. videotestsrc also doesn't use omxbufferalloc, which is an important part of the pipeline. Hence the test isn't 1:1.

    I have piped raw videotestsrc data to a file and then used filesrc to read the data (that way the CPU isn't hogged as much), but again this isn't using omxbufferalloc, and that is where memory is hit hardest with the video capture.
  • Hello,

    You could set the fps to 30 with the videotestsrc element; just pass it in the caps.
    Set the pattern property of the videotestsrc element to something simple to reduce the CPU load, and also set the property is-live=true on the videotestsrc element.
    You are right that the pipeline will not be exactly the same (missing v4l2src and omxbufferalloc), but if it works with dual videotestsrc elements you could try to narrow the problem down to the missing elements.
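    Something like this for each of the two branches (a sketch; the caps and sink values are placeholders to adapt to your setup):

```shell
# videotestsrc stand-in for one capture branch (placeholder sink values)
gst-launch-0.10 videotestsrc is-live=true pattern=black \
  ! 'video/x-raw-yuv,format=(fourcc)NV12,width=1920,height=1080,framerate=30/1' \
  ! omx_h264enc \
  ! mpegtsmux \
  ! udpsink host=127.0.0.1 port=5000
```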

    Is RidgeRun able to reproduce this problem on their side, or is it observed only on yours with the custom board?

    BR
    Margarita
  • Margarita,


    I haven't had issues getting dual 30 FPS streams working. As mentioned in my original post, I have dual 1080p 30 FPS working without issue and can run as many items as I want without it freezing.


    Only when performing dual 60 FPS do issues arise, and only when running many items in the background.
    So both 60 FPS streams come through, but then simply stop when heavy background operations are performed while streaming.

    The only way to fix this is to reboot, or to rmmod all the video coprocessor drivers (vpss, ti81xxvin, syslink, cmemk) and insmod them again.

    Without reinitializing all these drivers, this freezing/"no video input detected" state persists even after restarting my GStreamer-based application.

    Hence why I wanted to know if there is any way to reset just individual parts of the VPSS, so I can at least figure out which part is locking up.
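    For reference, my recovery sequence is roughly the following (module names as on my system; the exact firmware reload step depends on the EZSDK init scripts, so treat this as a sketch):

```shell
# Tear down the video stack (consumers first), then bring it back up.
rmmod ti81xxvin
rmmod vpss
rmmod syslink
rmmod cmemk

insmod cmemk.ko      # plus the usual cmemk pool arguments
insmod syslink.ko
# ...reload the coprocessor firmware here via the EZSDK firmware-load script...
insmod vpss.ko
insmod ti81xxvin.ko
```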


    RidgeRun says they haven't had issues on the eval board, but again, they aren't running my exact software in the background.

  • Margarita,

    I just determined that this isn't strictly related to 60 FPS or dual streams; it is solely related to the bitrate at which the encoded stream(s) are output.

    I originally thought it was solely a dual-60 FPS issue, but that was only because, when doing dual 60 FPS with a variable bitrate, the encoder simply can't constrain the bitrate. If the stream is 1080p60, the minimum bitrate sent out is always at least 6 Mbps, regardless of what the actual target bitrate is set to.

    And it seems that if the total bitrate is 15 Mbps or above, the VPSS is inclined to freeze.

    So when running a single 1080p60 stream at 15-25 Mbps, the VPSS will always eventually freeze.

    Hence the VPSS lockup is unrelated to the video input; it is solely related to the amount of data being moved and hence how overloaded my DDR is.

    Any ideas? Again, I do not know if this is related solely to my board or not; I am still determining that.

  • Hello,

    What about if you set the bitrate to 5 Mbps or lower for 1080p60?
    The recommended bitrate for 1080p60 is 10 Mbps, but keep in mind that you are performing streaming (network bandwidth).


    BR
    Margarita
  • Hey Margarita,

    Where are the recommended values for various resolutions?

    Also if the recommended value for 1080p60 is 10Mbps, does that mean it gets split in half for dual capture/encode (5Mbps each)?

    My bigger issue here is that when the system gets into this "locked-up state", it freezes and can't recover. And with these higher resolutions, it is hard to contain the bitrate when there is high motion on the screen (even with a constant bitrate setup).

    So if high motion arises (often in our environment), this results in a sudden bitrate spike and hence a freeze.

    -Chris

  • Margarita,


    I implemented a constrained constant bitrate setup to ensure the bitrate doesn't bleed higher.

    I have determined the following:

    • 1x 720p60 at 10 Mbps doesn't lock up (unless heavy background operations are being performed)
    • 1x 720p60 at 15 Mbps usually locks up after 10 min
    • 1x 720p60 at 20 Mbps usually locks up after 5 min
    • 2x 720p60 at a combined 10 Mbps doesn't lock up (unless heavy background operations are being performed)
    • 2x 720p60 at a combined 12 Mbps usually locks up after 2 min
    • 2x 720p60 at a combined 20 Mbps locks up right away

    Again, the lock-up situation is when the VPSS layer essentially stops processing video, with no errors.

  • Hello,

    Could you confirm that the HDVICP (H.264 encoder) is hanging? You can check this if you have the overlay package, by adding traces to see whether the VIDENC2_process() function in omx_venc.c returns or not.

    Best Regards,

    Margarita

  • Hey,


    I will try and check that out. 

    I was able to narrow down that it seems related to the mpegtsmux. I had previously tested without the mux, but I ended up doing a faulty test (other issues at that time), so it wasn't accurate.

    But it does seem that removing the mux and replacing it with a qtmux doesn't show the issue.

    Obviously I need this mpegtsmux though. I think the mux simply is not keeping up with the number of frames the encoder is feeding it, and eventually the encoder freezes up. I believe it isn't the mpegtsmux itself locking up, because restarting the application doesn't alleviate the issue (only resetting the full VPSS drivers and firmware does).

    So basically it seems my only options are to somehow get this mpegtsmux to be more optimized, or to find a way for the encoder to not get into that state.

    I put a very large leaky queue between the two elements, and this prevented the lock-up (since the encoder could keep passing data), but this queue leaks packets instead of frames. Hence the displayed data looks awful.

    I know on previous SoCs (DM368) we didn't have this issue, since the encoder would just drop a frame if the mux couldn't accept it, which visually looks like a stutter, but that is fine.

    It seems here it is doing the same thing (frame drops), but eventually it just stops, and that is what I'm trying to alleviate.

  • Hello,

    I see that in your pipeline there is no h264parse element between the h264enc and mpegtsmux. Could you add it?

    BR
    Margarita
  • The mux we wrote relies on the H.264 data being a bytestream, but I don't believe the standard mpegtsmux does.

    As a test, I ran it. It still required a leaky queue in between to get video to feed, and when it dropped, it still dropped packets instead of frames.

    Is there any way to have the pipeline be something like:
    h264enc --> h264parse (to get full frames) --> queue (which will drop only full frames) --> mux
    ?

    That way the h264enc won't overflow (and hence the VPSS state machine won't lock up), but the mux still receives full frames, just at a slower pace that it can handle?
  • Having the queue drop full frames still doesn't help, since it will drop random P-frames, and since P-frames reference each other, artifacts are still highly prevalent in the video.

    It seems the only ways to fix this are to:
    • Have the encoder not lock up when mpegtsmux doesn't keep up.
    • Find a way to have mpegtsmux keep up. Optimization/priority of some kind?
    • Have the frames be dropped dynamically (so mpegtsmux can catch up) in a way that doesn't create so many artifacts on screen.
  • Hello,

    Could you confirm whether your board hang is because the encoder hangs?

    You could check the H264_Encoder_HDVICP2_UserGuide.pdf for the allowFrameSkip settings.

    mpegtsmux is not a TI element; you could search the net for similar issues with mpegtsmux. The mpegtsmux is part of the bad plugins; in the default EZSDK we have gst-plugins-bad-0.10.21.

    BR
    Margarita

  • I do not know how to confirm 100% that the encoder is the one hanging, but I have made that assumption based on the following:

    • When the freeze happens, encoding no longer works even after an application restart.
      • After the freeze, running v4l2src --> perf --> fakesink shows data still being sent all the way through.
      • After the freeze, running v4l2src --> omxbufferalloc --> perf --> fakesink shows data still being sent all the way through.
      • But after the freeze, running v4l2src --> omxbufferalloc --> omx_h264enc --> perf --> fakesink shows data stopping.
      • Hence the omx_h264enc element is where data stops traversing.
    • The only way to recover is to unload the vpss, ti81xxvin, and firmware drivers and reload them all.
      • The h264enc element then works again, since this essentially forces a reset.
    • Putting a leaky queue between the mux and encoder prevents the crash (since the encoder always has space to write to).
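    Written out as gst-launch-0.10 commands, the isolation tests above were roughly as follows (sketches; perf is the RidgeRun measurement element, and the device/property values are from my setup):

```shell
# After the freeze: capture alone still flows
gst-launch-0.10 v4l2src device=/dev/video0 always-copy=false ! perf ! fakesink

# After the freeze: capture + OMX buffer allocation still flows
gst-launch-0.10 v4l2src device=/dev/video0 always-copy=false ! omxbufferalloc ! perf ! fakesink

# After the freeze: adding the encoder stops all data
gst-launch-0.10 v4l2src device=/dev/video0 always-copy=false ! omxbufferalloc ! omx_h264enc ! perf ! fakesink
```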

    Basically, mpegtsmux not keeping up is an issue, but the main issue is the encoder freezing when this happens. I don't think we can ever get the mux to the point where it keeps up 1:1 with the encoder; it just doesn't seem possible. So the only other solution is to prevent the encoder from freezing, or have it drop frames in a manner that doesn't cause horrid visual loss (hence frames would have to be dropped pre-encode).

    I enabled allowFrameSkip and the freeze still happened. 

  • Hello,

    Could you confirm that the HDVICP (H.264 encoder) is hanging? You can check this if you have the overlay package, by adding traces to see whether the VIDENC2_process() function in omx_venc.c returns or not.
    You could also try increasing the number of encoder output buffers.

    I cannot check this issue on my side with the v4l2src element. Do you observe it when using videotestsrc? If yes, provide me with steps to check it on my side.

    BR
    Margarita