Hey all. I'm hoping someone can point me in the right direction on how to debug this issue further.
First, some background:
We have a system based on the DM8168 that encodes two 60 FPS video streams with GStreamer. It uses four 2Gb word-wise DDR chips (two per bank), for 1GB of memory in total. With new timing values set up in U-Boot for these chips, the system boots and works properly, and mtest in U-Boot passes with no errors.
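As a quick sanity check on the memory figures, the sketch below reproduces the capacity arithmetic. It assumes "word wise" means x16 parts, so two chips side by side form a 32-bit bank; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total capacity in bytes from the chip count and the
 * per-chip density in gigabits (1 Gbit = 2^30 bits, 8 bits per byte). */
static uint64_t ddr_total_bytes(unsigned chips, unsigned gbit_per_chip)
{
    assert(chips > 0 && gbit_per_chip > 0);
    return (uint64_t)chips * gbit_per_chip * (1ULL << 30) / 8;
}

/* Hypothetical helper: data-bus width of one bank, assuming the chips in a
 * bank sit side by side (assumed x16 parts -> 16 bits each). */
static unsigned bank_bus_width_bits(unsigned chips_per_bank, unsigned bits_per_chip)
{
    return chips_per_bank * bits_per_chip;
}
```

With four 2Gb chips, `ddr_total_bytes(4, 2)` is 2^30 bytes (1GB), and two assumed x16 chips per bank give a 32-bit bank.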
The issue we now face is that during dual 60 FPS encode (single 60 FPS and dual 30 FPS work fine), the VPSS stops feeding video if other applications consume memory bandwidth for too long. I am fairly confident the cause is DDR bandwidth rather than CPU, since running dual 30 FPS at high bitrates (which essentially drives the CPU to 100%) causes no issues.
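To put rough numbers on the bandwidth argument, the sketch below estimates the raw bytes per second that capture alone writes to DDR. The resolution and pixel format are not stated in this post, so 1920x1080 NV12 (YUV 4:2:0, 1.5 bytes per pixel) is an assumption, and the function name is hypothetical. Encoder reference-frame and bitstream traffic come on top of this figure.

```c
#include <assert.h>
#include <stdint.h>

/* Raw capture write rate in bytes/s for `streams` identical streams.
 * bpp_x2 is twice the bytes per pixel so that 4:2:0 stays an integer:
 * NV12 / YUV420SP -> 3, YUYV / YUV422I -> 4. */
static uint64_t capture_bytes_per_sec(uint32_t width, uint32_t height,
                                      uint32_t bpp_x2, uint32_t fps,
                                      uint32_t streams)
{
    assert(bpp_x2 > 0);
    uint64_t frame_bytes = (uint64_t)width * height * bpp_x2 / 2;
    return frame_bytes * fps * streams;
}
```

Under those assumptions, dual 60 FPS comes to 373,248,000 bytes/s of capture writes, while single 60 FPS and dual 30 FPS both come to half that (186,624,000 bytes/s). That doubling is consistent with only the dual 60 FPS case failing.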
Originally we had trouble even initializing dual 60 FPS (video never started; the VPSS crashed right away). We "fixed" (really, hacked around) this by initializing one stream, stopping it, and then initializing both streams very slowly, with a sleep between every GStreamer init call. I believe this original issue is still the root of the problem we face now; I have only bypassed it temporarily.
Essentially, once dual 60 FPS is encoding and streaming, if I run enough operations in the background (multiple SNMP gets/sets, many ioctls to a SPI driver, network socket communication), streaming eventually stops with no errors. It is as if these other operations overload the DDR bandwidth and the VPSS cannot recover.
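One way to make the freeze reproducible without the SNMP, SPI and network traffic is a synthetic DDR load. The sketch below is a hypothetical memcpy-based hog, not a TI tool. Running it alongside the encoder, and then comparing against a pure CPU spin loop, could help separate bandwidth pressure from CPU load.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy `buf_bytes` back and forth between two heap buffers `passes` times.
 * Buffers far larger than the ARM caches (e.g. 16 MiB) push the traffic out
 * to DDR. Returns the total bytes copied, or 0 if allocation fails. */
static uint64_t ddr_hog(size_t buf_bytes, unsigned passes)
{
    assert(buf_bytes > 0);
    unsigned char *a = malloc(buf_bytes);
    unsigned char *b = malloc(buf_bytes);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return 0;
    }
    memset(a, 0x5A, buf_bytes);
    uint64_t copied = 0;
    for (unsigned i = 0; i < passes; i++) {
        /* alternate direction so both buffers see read and write traffic */
        if (i % 2 == 0)
            memcpy(b, a, buf_bytes);
        else
            memcpy(a, b, buf_bytes);
        copied += buf_bytes;
    }
    if (passes > 0) {
        /* read the result so the compiler keeps the copies */
        volatile unsigned char sink = b[buf_bytes - 1];
        (void)sink;
    }
    free(a);
    free(b);
    return copied;
}
```

For example, calling `ddr_hog(16u << 20, 1000)` in a background process while both 60 FPS streams run would show whether raw memory traffic alone triggers the stall.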
Analyzing my GStreamer pipelines, I see that video is no longer being captured, so everything downstream has stopped. Linux and everything else in the system respond normally. I am confident our source is still feeding video to the VIP.
Once the video layer gets in this freeze state it is no longer recoverable. A full reboot is required.
I ran './loggerSMDump.out 0x9E400000 0x100000 all' and it showed no errors. It just kept printing, and when the freeze happened it simply stopped printing (no errors or anything else). The last lines were:
N:Video P:1 #:29405 T:0000012b76a2d25b M:xdc.runtime.Main S:StartInstance: HDVICP_1
N:Video P:1 #:29406 T:0000012b76dbc74b M:xdc.runtime.Main S:StopInstance: HDVICP_2
N:Video P:1 #:29407 T:0000012b76ebd24b M:xdc.runtime.Main S:StartInstance: HDVICP_2
N:Video P:1 #:29408 T:0000012b76f9ef69 M:xdc.runtime.Main S:StopInstance: HDVICP_1
N:Video P:1 #:29409 T:0000012b774e187f M:xdc.runtime.Main S:StopInstance: HDVICP_2
These are the same messages it prints the whole time.
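Because the log simply stops rather than reporting an error, the timing between the last entries may say more than their content. A minimal parser for the `T:` field is sketched below. It assumes `T:` is a monotonically increasing hexadecimal tick count whose rate is not stated here, so gaps are reported in raw ticks; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Extract the hex timestamp that follows " T:" in one loggerSMDump line.
 * Returns 0 on success, -1 if the field is missing or malformed. */
static int parse_log_timestamp(const char *line, uint64_t *out)
{
    const char *t = strstr(line, " T:");
    if (t == NULL)
        return -1;
    char *end;
    unsigned long long v = strtoull(t + 3, &end, 16);
    if (end == t + 3 || *end != ' ')
        return -1;
    *out = (uint64_t)v;
    return 0;
}

/* Largest increase between consecutive timestamps, in raw ticks.
 * Non-increasing pairs (e.g. a counter wrap) are skipped. */
static int max_tick_gap(const char *const *lines, size_t n, uint64_t *gap)
{
    uint64_t prev = 0, best = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t;
        if (parse_log_timestamp(lines[i], &t) != 0)
            return -1;
        if (i > 0 && t > prev && t - prev > best)
            best = t - prev;
        prev = t;
    }
    *gap = best;
    return 0;
}
```

On the five lines above, the largest gap is 0x542916 ticks, between `StopInstance: HDVICP_1` and the final `StopInstance: HDVICP_2`. Comparing that with the typical gap earlier in the dump could show whether the HDVICP calls slowed down before the freeze.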
dmesg shows nothing.
The only relevant erratum I found is 2.1.32 (RGB to YUV or YUV to RGB Inline Within HDVPSS VIP May Lead to VIP Path Lockup if DDR Bandwidth is Overconsumed), but we are not doing any color space conversion on the chip.
I am at a loss at this point because I do not know how to debug each part of the VPSS layer.
Is there any way to reset each individual component of the VPSS layer so I can determine which part is crashing?
Are there any other debugging utilities that can help me narrow down what is truly happening?
Is there any way to check whether my DDR configuration is properly optimized? I have software leveling enabled and went through the JTAG process with CCS, though I can't confirm the result is perfect. We also used the Excel sheet to generate the DDR timing values for these newer chips, and again mtest in U-Boot passed.
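A passing mtest shows basic functionality but not margin. A common complement is to run classic pattern tests while the video load is active, since marginal leveling or timings tend to fail only under heavy traffic. The sketch below shows a walking-ones data test and an own-address test; it is a generic illustration, not the CCS or U-Boot procedure. From Linux user space the caches absorb much of this traffic, so it is most meaningful on uncached memory or on a buffer far larger than the caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walking ones: write each single-bit pattern to one word and read it back,
 * which catches stuck or shorted data lines. Returns 0 on pass, otherwise
 * the first failing pattern. */
static uint32_t walking_ones(volatile uint32_t *word)
{
    for (uint32_t p = 1; p != 0; p <<= 1) {
        *word = p;
        if (*word != p)
            return p;
    }
    return 0;
}

/* Own-address test: fill each word with its index and verify, then repeat
 * with the bitwise complement. Returns the first bad index, or n on pass. */
static size_t own_address(volatile uint32_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint32_t)i;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != (uint32_t)i)
            return i;
    for (size_t i = 0; i < n; i++)
        buf[i] = ~(uint32_t)i;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != ~(uint32_t)i)
            return i;
    return n;
}
```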
We are running the DDR at 796 MHz.
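For scale, the theoretical peak of one DDR interface follows from the clock: two transfers per clock times the bus width in bytes. The sketch assumes a 32-bit bus per bank (two x16 parts), which is inferred from the chip description rather than a confirmed board detail; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical peak of one DDR interface in bytes/s: two transfers per clock
 * (double data rate) times the bus width in bytes. Sustained throughput is
 * lower because of refresh, bank conflicts and read/write turnarounds. */
static uint64_t ddr_peak_bytes_per_sec(uint32_t clk_mhz, uint32_t bus_bits)
{
    assert(bus_bits % 8 == 0);
    return (uint64_t)clk_mhz * 1000000ULL * 2 * (bus_bits / 8);
}

/* A demand expressed as a share of the peak, in tenths of a percent. */
static uint32_t load_per_mille(uint64_t demand, uint64_t peak)
{
    assert(peak > 0);
    return (uint32_t)(demand * 1000 / peak);
}
```

At 796 MHz with an assumed 32-bit bus this gives 6,368,000,000 bytes/s per interface. Two hypothetical 1080p NV12 60 FPS capture streams (373,248,000 bytes/s) make `load_per_mille` return 58, about 5.9% of that peak, which suggests that if bandwidth is the culprit, the combined encoder and background traffic, or how the bus arbitrates between initiators, may matter more than the capture writes alone.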
Any insight would be greatly appreciated. I have been trying things for weeks to no avail, and I really need someone who understands the lower VPSS layer to help.
If any other info is needed please let me know and I will provide it.
Thanks in advance.