[FAQ] TDA4VH-Q1: Analysis on Latency Wave521CL CODEC IP

Part Number: TDA4VH-Q1

What information is available on encoder/decoder latency for the Wave521CL CODEC IP?

  • The FAQ entries and analyses below provide insight into latency metrics for the Wave521CL codec:

    How to characterize latency tolerance?

    The VPU can sustain real-time performance under memory access latency as long as the number of cycles per CTU
    stays below 500. The VPU is designed to be less sensitive to pipeline delay, especially at peak bitrate,
    where such delay might otherwise cause a performance drop. Using decoupling techniques and inter-pipe queues, it
    can hide this kind of delay and maintain high performance in any situation.

    C&M designed the WAVE521CL hardware to meet its target performance (4K at 60 fps with a 400 MHz bus clock and a 500 MHz core clock) with both worst-case read and write latency fixed at 500 bus clock cycles. Under this design constraint, the codec meets the target performance in almost all cases.
    This also means that the WAVE521CL can meet the target performance when latency rises momentarily. However, any potential performance drop is hard to quantify because it varies from stream to stream.
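For a sense of scale, the worst-case latency figure can be converted from cycles to time. This is a minimal sketch, assuming the 500 cycles are counted at the 400 MHz bus clock quoted above. The constant and function names are illustrative only. The result is the delay of a single worst-case memory access, not the latency of a whole frame.

```python
# Figures taken from the paragraph above; the names are illustrative.
BUS_CLOCK_HZ = 400e6          # 400 MHz bus clock
WORST_LATENCY_CYCLES = 500    # worst-case read/write latency, in bus cycles
TARGET_FPS = 60               # 4K60 target


def cycles_to_us(cycles, clock_hz):
    """Convert a cycle count at a given clock frequency to microseconds."""
    return cycles / clock_hz * 1e6


LATENCY_US = cycles_to_us(WORST_LATENCY_CYCLES, BUS_CLOCK_HZ)
FRAME_PERIOD_US = 1e6 / TARGET_FPS

print(f"worst-case access latency: {LATENCY_US:.2f} us")
print(f"frame period at {TARGET_FPS} fps: {FRAME_PERIOD_US:.1f} us")
```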

    Attached here is the latest Excel test report from our IP vendor, showing timings for different encode and decode stream use cases:
    5140.V4L2_test_report_v8.8.xlsx

    How can end-to-end latency be determined?

    End-to-end latency includes input delay, network delay, delay due to the video standard specification, display delay, and so on. Because many of these depend on the rest of the system rather than on the codec, our Linux test cases only look at latency from the v4l2h265/v4l2h264 elements with no display.

    Specific decoder latency information:

    At the same input frequency, the decoder is 10% faster than the encoder. You can also refer to the TRM, section 6.6.2.1 Performance, which specifies the decoder for 3840x2160@60 with a 450 MHz input frequency, while the encoder handles the same resolution with 500 MHz. For the decoder, the timing of the first displayed frame also depends on the bitstream header syntax, so display delay can differ for each bitstream.

    How to measure CODEC latency information (Linux):

    To measure hardware accelerator latency from a GStreamer pipeline, follow these steps:

    1. Copy this parse_gst_tracers.py file onto the board:
      4213.parse_gst_tracers.py
      #!/usr/bin/python3
      
      import sys
      import re
      import time
      import signal
      import threading
      import os
      
      stats = {}
      
      # raw strings avoid invalid-escape warnings for "\(" on newer Python versions
      pattern_latency = r"time=\(guint64\)([0-9]*)"
      pattern_ts = r"ts=\(guint64\)([0-9]*)"
      pattern_element = r"element=\(string\)(.*?),"
      
      if len(sys.argv) > 1 and os.path.isfile(sys.argv[1]):
      	fp = open(sys.argv[1], 'r')
      else:
      	print("[ERROR] Trace file does not exist")
      	print("Usage: ./parse_gst_tracers.py ./path/to/gst_trace/trace.log")
      	exit()
      
      stop = False
      def signal_handler(sig, frame):
      	print("Ctrl-C")
      	global stop
      	stop = True
      signal.signal(signal.SIGINT, signal_handler)
      
      header =  "|element             latency      out-latency      out-fps     frames     |"
      divider = "+-------------------------------------------------------------------------+"
      def report():
      	while(not stop):
      		os.system('clear')
      		print(divider)
      		print(header)
      		print(divider)
      		for e in stats:
      			print('|' + e.ljust(20), ("%0.2f"%(stats[e][0]/1000000)).ljust(13), ("%0.2f"%(stats[e][2]/1000000)).ljust(17), str(stats[e][3]).ljust(12), str(stats[e][4]).ljust(10), '|', sep="")
      		print(divider)
      		time.sleep(1)
      reporting_thread = threading.Thread(target=report)
      reporting_thread.start()
      
      while (not stop):
      	l = fp.readline() # read line
      	if not l:
      		time.sleep(0.5)
      		continue
      
      	# find instances of latency in this line
      	t = re.findall(pattern_latency, l)
      	if (not t):
      		continue
      	if (not t[0]):
      		continue
      
      	latency = int(t[0])
      
      	# get the current time stamp
      	t = re.findall(pattern_ts, l)
      	if (not t):
      		continue
      	if (not t[0]):
      		continue
      
      	time_stamp = int(t[0])
      
      	# retrieve the current element
      	t = re.findall(pattern_element, l)
      	if (not t):
      		continue
      	element = t[0]
      
      	# see if this element is currently in stats dict
      	if element not in stats:
                      #                [latency[0], curr-timestamp[1], out-latency[2], out-fps[3], num-frames[4]]
      		stats[element] = [0,          0,                 0,              0,          0            ]; # set up if just adding
      
      	stats[element][4] += 1 # increase number of frames
      
      	# calculate latency of the element
      	stats[element][0] = ((stats[element][4] - 1) * stats[element][0] + latency)/stats[element][4]
      	if (stats[element][1]):
      		stats[element][2] = ((stats[element][4] - 1) * stats[element][2] + time_stamp - stats[element][1])/stats[element][4]
      		if stats[element][2] > 0: # avoid division by zero when timestamps are identical
      			stats[element][3] = int(1000000000/stats[element][2])
      	stats[element][1] = time_stamp
      
    2. Run the GStreamer pipeline with debug logs enabled and written to a file.
      1. Example: GST_DEBUG_FILE=gst_trace.log GST_DEBUG_NO_COLOR=1 GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=element):v4l2" gst-launch-1.0 filesrc location=/bbb1080p30.yuv ! rawvideoparse width=1920 height=1080 format=nv12 framerate=30/1 colorimetry=bt709 ! v4l2h264enc ! filesink location=/bbb1080p30.264
    3. Run the Python script on the gst_trace.log file:
      1. python3 parse_gst_tracers.py gst_trace.log
      2. Example Output:

        The v4l2h264enc0 out-latency describes the actual latency of the hardware accelerator.
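To illustrate what the parser extracts from each tracer line, the sketch below applies the same three regular expressions to a few synthetic lines. The sample lines and their values are made up for illustration and were not captured from a board. They are shaped only to match the fields the script looks for (element, time, ts). The `summarize` helper is a hypothetical name, not part of parse_gst_tracers.py.

```python
import re

# Same regexes as parse_gst_tracers.py, written as raw strings.
PATTERN_LATENCY = r"time=\(guint64\)([0-9]*)"
PATTERN_TS = r"ts=\(guint64\)([0-9]*)"
PATTERN_ELEMENT = r"element=\(string\)(.*?),"

# Synthetic tracer lines with made-up values (NOT real board output).
SAMPLE_LINES = [
    "element-latency, element=(string)v4l2h264enc0, src=(string)src, "
    "time=(guint64)8000000, ts=(guint64)1000000000;",
    "element-latency, element=(string)v4l2h264enc0, src=(string)src, "
    "time=(guint64)10000000, ts=(guint64)1033000000;",
    "element-latency, element=(string)v4l2h264enc0, src=(string)src, "
    "time=(guint64)12000000, ts=(guint64)1066000000;",
]


def summarize(lines):
    """Return per-element mean latency, mean output interval, fps and frame count."""
    samples = {}  # element name -> list of (latency_ns, timestamp_ns)
    for line in lines:
        lat = re.search(PATTERN_LATENCY, line)
        ts = re.search(PATTERN_TS, line)
        elem = re.search(PATTERN_ELEMENT, line)
        if not (lat and ts and elem) or not lat.group(1) or not ts.group(1):
            continue  # skip lines that are not complete latency records
        samples.setdefault(elem.group(1), []).append(
            (int(lat.group(1)), int(ts.group(1)))
        )

    summary = {}
    for element, rows in samples.items():
        latencies = [row[0] for row in rows]
        stamps = [row[1] for row in rows]
        # Time between consecutive output buffers of the same element.
        intervals = [b - a for a, b in zip(stamps, stamps[1:])]
        mean_interval = sum(intervals) / len(intervals) if intervals else 0
        summary[element] = {
            "latency_ms": sum(latencies) / len(latencies) / 1e6,
            "interval_ms": mean_interval / 1e6,
            "fps": int(1e9 / mean_interval) if mean_interval > 0 else 0,
            "frames": len(rows),
        }
    return summary


RESULT = summarize(SAMPLE_LINES)
for name, values in RESULT.items():
    print(name, values)
```

This sketch averages the interval over the gaps between frames, whereas the script's running average divides by the frame count. The two should converge for long traces, but the script's out-latency reads low for the first few frames.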