I have a GStreamer `appsrc` which feeds into the following end-to-end pipeline:
    appsrc name=appsrc \
      ! tiovxisp dcc-isp-file=$DCC_PATH/dcc_viss.bin sink_0::dcc-2a-file=$DCC_PATH/dcc_2a.bin \
      ! v4l2h264enc ! fpsdisplaysink video-sink=fakesink signal-fps-measurements=true sync=false
Unfortunately, when I feed frames into the front of this pipeline at 60 fps (format=bggr, framerate=60/1, width=2056, height=2464), I only get about 49 FPS at the output. I instrumented the pipeline with queues (`queue leaky=2`) and `tiperfoverlay`, and it appears that `tiovxisp` slows down to ≤50 FPS when used in conjunction with `v4l2h264enc`.
I've tried various `output-io-mode` and `capture-io-mode` settings on `v4l2h264enc`, as well as different `pool-size` values on `tiovxisp`.
However, when I break up the pipeline, I am able to hit 60 fps or greater (tiovxisp can hit 100 fps):
    # 100 FPS, tiovxisp only, WORKS at full FPS even when other pipeline is running
    appsrc name=appsrc ! video/x-bayer,format=bggr,framerate=100/1,height=2056,width=2464 \
      ! tiovxisp dcc-isp-file=$DCC_PATH/dcc_viss.bin sink_0::dcc-2a-file=$DCC_PATH/dcc_2a.bin \
      ! fpsdisplaysink video-sink=fakesink signal-fps-measurements=true sync=false

    # 60 FPS v4l2h264enc only, WORKS at full FPS even when other pipeline is running
    # needs videorate since AM69 can't generate 100 fps in videotestsrc
    videotestsrc ! video/x-raw,framerate=20/1 ! videorate \
      ! video/x-raw,framerate=60/1,height=2056,width=2464,format=NV12 ! v4l2h264enc \
      ! fpsdisplaysink video-sink=fakesink signal-fps-measurements=true sync=false

    # Also 60 FPS from custom appsrc, WORKS at full FPS
    appsrc name=appsrc ! video/x-raw,framerate=60/1,height=2056,width=2464,format=NV12 \
      ! v4l2h264enc ! fpsdisplaysink video-sink=fakesink signal-fps-measurements=true sync=false
My CPU utilization is very low and my memory bandwidth isn't unreasonably high. Most confusingly, I can run two of the above pipelines at the same time in separate processes without slowing down (e.g. the `tiovxisp`-only pipeline at 100 fps plus the `v4l2h264enc` pipeline at 60 fps)! So I don't believe it's cache misses or DMA bandwidth limitations.
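For reference, the rates quoted above can be restated as per-frame time budgets. The sketch below is plain arithmetic on the numbers in this post. The "fully serialized" figure is a hypothetical bound only: it assumes the ISP and encoder never overlap and take exactly 1/100 s and 1/60 s per frame, which the measurements here do not establish. The helper names are made up for illustration.

```python
def frame_interval_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def serialized_fps(*stage_fps: float) -> float:
    """Throughput if the stages ran strictly back to back with no overlap.

    Hypothetical bound: each stage is assumed to take exactly 1/fps seconds.
    """
    return 1.0 / sum(1.0 / f for f in stage_fps)


# Rates taken from the post: 60 fps input, ~49 fps observed end to end,
# 100 fps for tiovxisp alone, 60 fps for v4l2h264enc alone.
target_ms = frame_interval_ms(60)
observed_ms = frame_interval_ms(49)

print(f"target   : {target_ms:.2f} ms/frame")
print(f"observed : {observed_ms:.2f} ms/frame (+{observed_ms - target_ms:.2f} ms)")
print(f"fully serialized ISP + encoder: {serialized_fps(100, 60):.1f} fps")
```

The observed ~49 fps sits between the fully serialized bound (37.5 fps) and the 60 fps target, which is about 3.7 ms of extra time per frame. That is only a way of framing the shortfall, not a diagnosis.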
`tiperfoverlay` output from the failing ~50 fps pipeline (end to end):
    CPU: mpu: TOTAL LOAD = 9.21
    CPU: c7x_1: TOTAL LOAD = 0.00
    CPU: c7x_2: TOTAL LOAD = 0.00
    CPU: c7x_3: TOTAL LOAD = 0.00
    CPU: c7x_4: TOTAL LOAD = 0.00
    HWA: VISS: LOAD = 37.16 % ( 250 MP/s )
    DDR: READ BW: AVG = 2445 MB/s, PEAK = 2445 MB/s
    DDR: WRITE BW: AVG = 2120 MB/s, PEAK = 2120 MB/s
    DDR: TOTAL BW: AVG = 4565 MB/s, PEAK = 4565 MB/s
    TEMP: thermal_zone0(MCU_R5F) = 55.00 C
    TEMP: thermal_zone1(MCU) = 55.43 C
    TEMP: thermal_zone2(GPU) = 51.93 C
    TEMP: thermal_zone3(C7x) = 55.00 C
    TEMP: thermal_zone4(CPU) = 53.25 C
    TEMP: thermal_zone5(C7x) = 55.00 C
    TEMP: thermal_zone6(DDR) = 56.52 C
    FPS: 48
Working partial pipeline (v4l2h264enc only @ 60 fps):
    CPU: mpu: TOTAL LOAD = 13.10
    CPU: c7x_1: TOTAL LOAD = 0.00
    CPU: c7x_2: TOTAL LOAD = 0.00
    CPU: c7x_3: TOTAL LOAD = 0.00
    CPU: c7x_4: TOTAL LOAD = 0.00
    DDR: READ BW: AVG = 2577 MB/s, PEAK = 2577 MB/s
    DDR: WRITE BW: AVG = 1699 MB/s, PEAK = 1699 MB/s
    DDR: TOTAL BW: AVG = 4276 MB/s, PEAK = 4276 MB/s
    TEMP: thermal_zone0(MCU_R5F) = 55.00 C
    TEMP: thermal_zone1(MCU) = 55.00 C
    TEMP: thermal_zone2(GPU) = 52.15 C
    TEMP: thermal_zone3(C7x) = 55.22 C
    TEMP: thermal_zone4(CPU) = 54.13 C
    TEMP: thermal_zone5(C7x) = 54.78 C
    TEMP: thermal_zone6(DDR) = 57.38 C
Partial (tiovxisp only @ 100 fps):
    CPU: mpu: TOTAL LOAD = 7.50
    CPU: c7x_1: TOTAL LOAD = 0.00
    CPU: c7x_2: TOTAL LOAD = 0.00
    CPU: c7x_3: TOTAL LOAD = 0.00
    CPU: c7x_4: TOTAL LOAD = 0.00
    HWA: VISS: LOAD = 72.58 % ( 488 MP/s )
    DDR: READ BW: AVG = 2301 MB/s, PEAK = 2301 MB/s
    DDR: WRITE BW: AVG = 2617 MB/s, PEAK = 2617 MB/s
    DDR: TOTAL BW: AVG = 4918 MB/s, PEAK = 4918 MB/s
    TEMP: thermal_zone0(MCU_R5F) = 54.78 C
    TEMP: thermal_zone1(MCU) = 55.43 C
    TEMP: thermal_zone2(GPU) = 51.71 C
    TEMP: thermal_zone3(C7x) = 54.78 C
    TEMP: thermal_zone4(CPU) = 52.81 C
    TEMP: thermal_zone5(C7x) = 55.00 C
    TEMP: thermal_zone6(DDR) = 56.52 C
    FPS: 95
Two independent pipelines (`tiovxisp` @ 100 fps in one + `v4l2h264enc` @ 60 fps in another) running simultaneously:
    CPU: mpu: TOTAL LOAD = 20.69
    CPU: c7x_1: TOTAL LOAD = 0.00
    CPU: c7x_2: TOTAL LOAD = 0.00
    CPU: c7x_3: TOTAL LOAD = 0.00
    CPU: c7x_4: TOTAL LOAD = 0.00
    HWA: VISS: LOAD = 70.35 % ( 473 MP/s )
    DDR: READ BW: AVG = 3992 MB/s, PEAK = 3992 MB/s
    DDR: WRITE BW: AVG = 3783 MB/s, PEAK = 3783 MB/s
    DDR: TOTAL BW: AVG = 7775 MB/s, PEAK = 7775 MB/s
    TEMP: thermal_zone0(MCU_R5F) = 55.43 C
    TEMP: thermal_zone1(MCU) = 56.95 C
    TEMP: thermal_zone2(GPU) = 52.15 C
    TEMP: thermal_zone3(C7x) = 56.52 C
    TEMP: thermal_zone4(CPU) = 54.13 C
    TEMP: thermal_zone5(C7x) = 55.22 C
    TEMP: thermal_zone6(DDR) = 57.81 C
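As a cross-check on the overlay dumps above, the VISS throughput figures can be converted into an implied frame rate. This is a rough sketch that assumes `tiperfoverlay` reports MP/s as 10^6 pixels per second and that every frame is 2464 x 2056 pixels as in the caps; `viss_fps` is a hypothetical helper, not part of any TI tool.

```python
# Frame geometry from the caps in the post; the pixel count is the same
# whichever of the two dimensions is the width.
PIXELS_PER_FRAME = 2464 * 2056


def viss_fps(megapixels_per_s: float, pixels_per_frame: int = PIXELS_PER_FRAME) -> float:
    """Frame rate implied by a VISS throughput figure (assumes 1 MP = 1e6 pixels)."""
    return megapixels_per_s * 1e6 / pixels_per_frame


# VISS throughput values reported by tiperfoverlay in the dumps above.
readings = {
    "end to end (failing)": 250,
    "tiovxisp only": 488,
    "two independent pipelines": 473,
}

for name, mp_s in readings.items():
    print(f"{name}: {mp_s} MP/s -> {viss_fps(mp_s):.1f} fps")
```

Under those assumptions, 250 MP/s works out to roughly 49 fps, in line with the `FPS: 48` reported for the failing pipeline, and 488 MP/s to roughly 96 fps, in line with `FPS: 95` for the `tiovxisp`-only case. That is consistent with VISS itself processing frames at the reduced rate in the end-to-end pipeline.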
Is there any reason tiovxisp would slow down when attached to a v4l2h264enc element in the same pipeline?
This is running on processor-sdk-linux-am69a (09_01_00)
EDIT: In the hope of replicating this without my `appsrc` and hardware, here are some pipelines that should show the issue with just the Processor SDK:
    # 1) end-to-end, <50 FPS performance
    gst-launch-1.0 videotestsrc ! video/x-bayer,framerate=15/1 ! videorate \
      ! video/x-bayer,framerate=60/1,height=2056,width=2464,format=bggr \
      ! tiovxisp dcc-isp-file=/opt/imaging/imx219/linear/dcc_viss.bin \
        sink_0::dcc-2a-file=/opt/imaging/imx219/linear/dcc_2a.bin \
        sink_0::pool-size=16 src::pool-size=16 \
      ! v4l2h264enc capture-io-mode=dmabuf ! tiperfoverlay dump=true overlay=false \
      ! fpsdisplaysink video-sink=fakesink signal-fps-measurements=true -ve

    # 2) only tiovxisp, ~60 fps
    gst-launch-1.0 videotestsrc ! video/x-bayer,framerate=15/1 ! videorate \
      ! video/x-bayer,framerate=60/1,height=2056,width=2464,format=bggr \
      ! tiovxisp dcc-isp-file=/opt/imaging/imx219/linear/dcc_viss.bin \
        sink_0::dcc-2a-file=/opt/imaging/imx219/linear/dcc_2a.bin \
        sink_0::pool-size=16 src::pool-size=16 \
      ! tiperfoverlay dump=true overlay=false \
      ! fpsdisplaysink video-sink=fakesink signal-fps-measurements=true -ve

    # 3) only v4l2h264enc, ~60 fps
    gst-launch-1.0 videotestsrc ! video/x-raw,framerate=15/1 ! videorate \
      ! video/x-raw,framerate=60/1,height=2056,width=2464,format=NV12 \
      ! v4l2h264enc capture-io-mode=dmabuf ! tiperfoverlay dump=true overlay=false \
      ! fpsdisplaysink video-sink=fakesink signal-fps-measurements=true -ve

    #########
    # Note that pipelines 2 & 3 can run simultaneously on the AM69 without any issue
    # while pipeline 1 cannot achieve the expected rate