PROCESSOR-SDK-J721E: How is the FPS in the TIOVX app calculated?

Part Number: PROCESSOR-SDK-J721E


  May I ask why the FPS printed by the print_perf function in app.c of the TIOVX app does not match the average performance printed for any individual node or for the whole graph? (See the image below.)

  From the code, I guess that the FPS is measured from the first step (reading the input image) to the final step (displaying the output on the screen), but why are the printed numbers a little off?

  P.S.: I did not change anything in the default performance measurement code, except that I print the perf of every node in the graph instead of only tidl_node. I am running the object detection config with my own model.

  • Hi,

    Is this an out-of-the-box demo from the SDK? If yes, which one?

    The FPS is calculated on the host (the A72 side of the application), and the value depends on where you have placed the total_perf start and end.

    The other perf stats for nodes and graphs come from the TIOVX framework.

    Regards,

    Nikhil

  • Thank you for your answer! If you mean the out-of-the-box demo in the SDK's vision_apps directory, then no, it is not. I built this one from the edgeai-tiovx-apps directory, following this GitHub link: https://github.com/TexasInstruments/edgeai-tiovx-apps, and I run it with the object_detection.yaml config. I believe the code that updates the performance numbers is at the end of the while loop in /opt/edgeai-tiovx-apps/apps/src/app.c:

    /* Main loop that runs till interrupt */
    while(run_loop)
    {
        bool skip = false;

    #if defined(TARGET_OS_LINUX)
        /***********************************************************************
         * V4L2 Sources *
         ***********************************************************************/

        for(i = 0; i < num_input_blocks; i++)
        {
            if (LINUX_CAM == input_blocks[i].input_info->source)
            {
                /*
                 * Dqueue v4l2 buffers and if not NULL store in valid buffers
                 * which will be used later to enqueue and dqueue in openvx graph
                 */
                v4l2_dq_bufs[i] = v4l2_capture_dqueue_buf(input_blocks[i].v4l2_obj.v4l2_capture_handle);
                if(NULL != v4l2_dq_bufs[i])
                {
                    v4l2_valid_bufs[i] = v4l2_dq_bufs[i];
                }
            }
        }

        for(i = 0; i < num_input_blocks; i++)
        {
            if (LINUX_CAM == input_blocks[i].input_info->source &&
                NULL == v4l2_valid_bufs[i])
            {
                /*
                 * Skip the loop if any v4l2 buffer is not there
                 */
                skip = true;
                break;
            }
        }

        for(i = 0; i < num_input_blocks; i++)
        {
            in_buf_pool = input_blocks[i].input_pad->buf_pool;
            if (LINUX_CAM == input_blocks[i].input_info->source)
            {
                /*
                 * If all v4l2 buffers are present, enqueue and dequeue to
                 * openvx graph.
                 */
                if(!skip)
                {
                    tiovx_modules_enqueue_buf(v4l2_valid_bufs[i]);
                    inbuf = tiovx_modules_dequeue_buf(in_buf_pool);
                    v4l2_capture_enqueue_buf(input_blocks[i].v4l2_obj.v4l2_capture_handle, inbuf);
                    v4l2_valid_bufs[i] = NULL;

                    /* AEWB processing for linux */
                    linux_h3a_buf_pool = input_blocks[i].v4l2_obj.h3a_pad->buf_pool;
                    linux_aewb_buf_pool = input_blocks[i].v4l2_obj.aewb_pad->buf_pool;
                    linux_h3a_buf = tiovx_modules_dequeue_buf(linux_h3a_buf_pool);
                    linux_aewb_buf = tiovx_modules_dequeue_buf(linux_aewb_buf_pool);
                    aewb_process(input_blocks[i].v4l2_obj.aewb_handle, linux_h3a_buf, linux_aewb_buf);
                    tiovx_modules_enqueue_buf(linux_h3a_buf);
                    tiovx_modules_enqueue_buf(linux_aewb_buf);
                }

                /*
                 * Else, enqueue back the dequeue buffer from v4l2 capture
                 */
                else if(NULL != v4l2_dq_bufs[i])
                {
                    v4l2_capture_enqueue_buf(input_blocks[i].v4l2_obj.v4l2_capture_handle, v4l2_dq_bufs[i]);
                }
            }
        }

        if(skip)
        {
            continue;
        }
    #endif

        /***********************************************************************
         * Other Sources *
         ***********************************************************************/
        for(i = 0; i < num_input_blocks; i++)
        {
            in_buf_pool = input_blocks[i].input_pad->buf_pool;
            if (RTOS_CAM == input_blocks[i].input_info->source)
            {
                /* Dequeue and the enqueue buffers */
                inbuf = tiovx_modules_dequeue_buf(in_buf_pool);
                tiovx_modules_enqueue_buf(inbuf);
            }

    #if defined(TARGET_OS_LINUX)
            else if (H264_VID == input_blocks[i].input_info->source)
            {
                /* Dequeue from graph, enqueue to v4l2 for decode,
                 * then dequeue from v4l2 and enqueue to graph
                 */
                inbuf = tiovx_modules_dequeue_buf(in_buf_pool);
                v4l2_decode_enqueue_buf(input_blocks[i].v4l2_obj.v4l2_decode_handle, inbuf);
                inbuf = v4l2_decode_dqueue_buf(input_blocks[i].v4l2_obj.v4l2_decode_handle);
                if(NULL == inbuf)
                {
                    run_loop = false;
                    skip = true;
                    break;
                }
                tiovx_modules_enqueue_buf(inbuf);
            }
    #endif

            else if (RAW_IMG == input_blocks[i].input_info->source)
            {
                /* Dequeue the buffer, send to read image thread for filling
                 * the buffer and the enqueue it
                 */
                inbuf = tiovx_modules_dequeue_buf(in_buf_pool);

                pthread_mutex_lock(&r_thread_lock[i]);
                r_thread_data[i].read_image_buf = inbuf;
                pthread_mutex_unlock(&r_thread_lock[i]);

                tiovx_modules_enqueue_buf(inbuf);
            }
        }

        if(skip)
        {
            break;
        }

        for(i = 0; i < num_output_blocks; i++)
        {
            if(NULL != output_blocks[i].output_pad)
            {
                /* Dequeue and enqueue output buffer if pad is present */
                out_buf_pool = output_blocks[i].output_pad->buf_pool;
                outbuf = tiovx_modules_dequeue_buf(out_buf_pool);
                if (IMG_DIR == output_blocks[i].output_info->sink)
                {
                    /* Update write image thread buffer */
                    pthread_mutex_lock(&w_thread_lock[i]);
                    w_thread_data[i].write_image_buf = outbuf;
                    pthread_mutex_unlock(&w_thread_lock[i]);
                }
    #if defined(TARGET_OS_LINUX)
                else if (LINUX_DISPLAY == output_blocks[i].output_info->sink)
                {
                    /* Render the dequeued buffer on kms display */
                    kms_display_render_buf(output_blocks[i].kms_obj.kms_display_handle,
                                           outbuf);
                }
                else if (H264_ENCODE == output_blocks[i].output_info->sink)
                {
                    v4l2_encode_enqueue_buf(output_blocks[i].v4l2_obj.v4l2_encode_handle, outbuf);
                    outbuf = v4l2_encode_dqueue_buf(output_blocks[i].v4l2_obj.v4l2_encode_handle);
                    if(NULL == outbuf)
                    {
                        run_loop = false;
                        break;
                    }
                }
    #endif
                tiovx_modules_enqueue_buf(outbuf);
            }

            /* Dequeue and enqueue perf overlay buffer */
            if(NULL != output_blocks[i].perf_overlay_pad)
            {
                perf_overlay_buf_pool = output_blocks[i].perf_overlay_pad->buf_pool;
                perf_overlay_buf = tiovx_modules_dequeue_buf(perf_overlay_buf_pool);
                update_perf_overlay((vx_image)perf_overlay_buf->handle, &perf_stats_handle);    <--------------   THIS ONE TO UPDATE PERF?
                // printf("FPS test: %d\n", perf_stats_handle.stats.fps);
                if(cmd_args->verbose)
                {
                    print_perf(&graph, &perf_stats_handle);
                }
                tiovx_modules_enqueue_buf(perf_overlay_buf);
            }
            // printf("Output name: %s\n", output_blocks[i].output_info->name);
        }
    }
    So could you explain why the FPS does not match any of the printed node or graph times, as in the image above?
  • Hi,

    Sorry, I was referring to the vision_apps-based flow.

    Let me connect you with the Edge AI experts to get a response on the edgeai-tiovx-based implementation.

    Regards,

    Nikhil

  • Yes please. Thank you!

  • Hi Han,

    FPS is calculated from the number of times update_perf_overlay is called.
    This function is assumed to be called once per frame.

    Regards
    Rahul T R
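
The counting approach described above can be sketched as follows. This is a minimal illustration, not the edgeai-tiovx-apps code: the type fps_counter_t and the functions fps_counter_init and fps_counter_tick are hypothetical names, and the sketch assumes the call count is divided by the elapsed wall-clock time over a fixed window, which the thread does not confirm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call-counting FPS meter (illustration only). */
typedef struct {
    double   window_start_s; /* time at which the current window began */
    unsigned frames;         /* calls counted in the current window */
    double   fps;            /* most recently computed frames per second */
} fps_counter_t;

/* Start the first measurement window at time now_s (seconds). */
static void fps_counter_init(fps_counter_t *c, double now_s)
{
    assert(c != NULL);
    c->window_start_s = now_s;
    c->frames = 0;
    c->fps = 0.0;
}

/*
 * Call once per frame. When at least window_s seconds have elapsed,
 * the rate is recomputed as calls / elapsed time and a new window starts.
 * Nothing here measures how long any single frame took to process.
 */
static void fps_counter_tick(fps_counter_t *c, double now_s, double window_s)
{
    double elapsed;

    assert(c != NULL);
    c->frames++;
    elapsed = now_s - c->window_start_s;
    if (elapsed >= window_s) {
        c->fps = c->frames / elapsed;
        c->frames = 0;
        c->window_start_s = now_s;
    }
}
```

A meter like this only sees how often frames complete, so it reports 60 FPS whenever a frame finishes every 1/60 s, regardless of how long each frame spent inside the graph.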

  • Thank you for your answer! But can you explain why the numbers printed out in the above images are not related?

    For example, the Graph time is 46.879 ms -> FPS = 1 / 46.879 * 1000 ≈ 21, which is different from the 60 FPS printed.

    I am confused about which nodes in the graph mark the start and end of the FPS calculation. Or, if they are not related, can you explain why in detail?

  • Hi Han,

    FPS and latency are different measurements.
    46.8 ms is the average latency of the graph.
    Since the graph runs in a pipelined fashion, with different
    nodes running in parallel, the FPS is higher than 1/latency.

    Regards
    Rahul T R  
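
The difference between latency and throughput can be sketched with a simple idealized model. It assumes each stage runs on its own core or accelerator with enough buffers to keep every stage busy, and it ignores queueing delays. The function names are hypothetical and are not part of TIOVX.

```c
#include <assert.h>
#include <stddef.h>

/* Latency of one frame: it has to pass through every stage in turn. */
static double pipeline_latency_ms(const double *stage_ms, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(stage_ms != NULL);
    for (i = 0; i < n; i++) {
        sum += stage_ms[i];
    }
    return sum;
}

/*
 * Steady-state throughput of an ideal pipeline: a new frame completes
 * every time the slowest stage finishes, so that stage sets the rate.
 */
static double pipeline_fps(const double *stage_ms, size_t n)
{
    double slowest = 0.0;
    size_t i;

    assert(stage_ms != NULL && n > 0);
    for (i = 0; i < n; i++) {
        if (stage_ms[i] > slowest) {
            slowest = stage_ms[i];
        }
    }
    assert(slowest > 0.0);
    return 1000.0 / slowest;
}

/* Little's law: average frames in flight = throughput * latency. */
static double frames_in_flight(double fps, double latency_ms)
{
    return fps * latency_ms / 1000.0;
}
```

With made-up stage times of 10, 16, 12 and 8 ms, the latency is 46 ms (1/latency is about 21.7 FPS), yet the ideal pipelined rate is 62.5 FPS. Applying frames_in_flight to the numbers in this thread, 60 FPS at 46.879 ms latency corresponds to about 2.8 frames in the graph at once on average.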

  • Hi Rahul,

    Thank you for your answer!