
SK-TDA4VM: How to run a multi-graph pipeline with edgeai-tiovx-apps in a multi-threaded environment?

Part Number: SK-TDA4VM



Hello, we have been trying to implement a 2D bit-block transfer module using TIOVX. To do this,

we referenced multi_graph_test.c from the edgeai-tiovx-apps package on GitHub. The specific path to the source code is the following:

https://github.com/TexasInstruments/edgeai-tiovx-apps/blob/develop/tests/app_tiovx_linux_multi_graph_test.c

We have successfully gotten results from the code we devised, which consists of four graphs (Color Convert - Multi Scaler - Mosaic - Color Convert, to be exact).

The thing is, the source code contains the following comment above its main loop. From my understanding, this means that to achieve optimal performance,
we need to somehow decouple the two graphs and run the enqueue/dequeue calls below on a separate thread for each graph.

    /*
     * Note that in the below loop, graph 1 and 2 are running sequentially.
     * To run them in parallel, you need to decouple them by creating
     * separate threads for enqueue/dequeue of each graph.
     */
    for (int i = 0; i < APP_NUM_ITERATIONS; i++) {
        //Graph1 Execution
        inbuf1 = v4l2_decode_dqueue_buf(v4l2_decode_handle);
        outbuf1 = tiovx_modules_acquire_buf(out_buf_pool1);
        tiovx_modules_enqueue_buf(inbuf1);
        tiovx_modules_enqueue_buf(outbuf1);
        inbuf1 = tiovx_modules_dequeue_buf(in_buf_pool1);
        outbuf1 = tiovx_modules_dequeue_buf(out_buf_pool1);
        v4l2_decode_enqueue_buf(v4l2_decode_handle, inbuf1);

        //Graph2 Execution
        inbuf2 = tiovx_modules_acquire_buf(in_buf_pool2);
        outbuf2 = tiovx_modules_acquire_buf(out_buf_pool2);
        //Swap input2 mem with output1 to feed graph1 out to graph2
        tiovx_modules_buf_swap_mem(outbuf1, inbuf2);
        tiovx_modules_release_buf(outbuf1);
        tiovx_modules_enqueue_buf(inbuf2);
        tiovx_modules_enqueue_buf(outbuf2);
        inbuf2 = tiovx_modules_dequeue_buf(in_buf_pool2);
        outbuf2 = tiovx_modules_dequeue_buf(out_buf_pool2);
        kms_display_render_buf(kms_display_handle, outbuf2);
        tiovx_modules_release_buf(inbuf2);
        tiovx_modules_release_buf(outbuf2);
    }
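
To make the question concrete, the sketch below is roughly what we have in mind: the enqueue/dequeue loop of each graph moves to its own pthread, and graph1's output buffer is handed to the graph2 thread through a small mutex/condvar handoff. Everything in it is our own guess rather than code from edgeai-tiovx-apps: the BufHandoff type and its push/pop helpers are hypothetical, the Buf type name and the visibility of the pools, handles, and APP_NUM_ITERATIONS from the loop above are assumptions, and it also assumes graph1's output buffer may be released from the graph2 thread, which is part of what we are asking below.

    #include <pthread.h>
    #include <stddef.h>

    /* Hypothetical single-slot handoff between the graph1 and graph2 threads */
    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        Buf            *buf;   /* graph1 output waiting to be consumed by graph2 */
    } BufHandoff;

    static void buf_handoff_push(BufHandoff *h, Buf *buf)
    {
        pthread_mutex_lock(&h->lock);
        while (h->buf != NULL)                      /* wait until graph2 consumed the previous buffer */
            pthread_cond_wait(&h->cond, &h->lock);
        h->buf = buf;
        pthread_cond_signal(&h->cond);
        pthread_mutex_unlock(&h->lock);
    }

    static Buf *buf_handoff_pop(BufHandoff *h)
    {
        pthread_mutex_lock(&h->lock);
        while (h->buf == NULL)                      /* wait for graph1 to produce a buffer */
            pthread_cond_wait(&h->cond, &h->lock);
        Buf *buf = h->buf;
        h->buf = NULL;
        pthread_cond_signal(&h->cond);
        pthread_mutex_unlock(&h->lock);
        return buf;
    }

    /* Thread for graph1: the graph1 half of the loop above */
    static void *graph1_thread(void *arg)
    {
        BufHandoff *handoff = (BufHandoff *)arg;
        for (int i = 0; i < APP_NUM_ITERATIONS; i++) {
            Buf *inbuf1  = v4l2_decode_dqueue_buf(v4l2_decode_handle);
            Buf *outbuf1 = tiovx_modules_acquire_buf(out_buf_pool1);
            tiovx_modules_enqueue_buf(inbuf1);
            tiovx_modules_enqueue_buf(outbuf1);
            inbuf1  = tiovx_modules_dequeue_buf(in_buf_pool1);
            outbuf1 = tiovx_modules_dequeue_buf(out_buf_pool1);
            v4l2_decode_enqueue_buf(v4l2_decode_handle, inbuf1);
            buf_handoff_push(handoff, outbuf1);     /* hand graph1 output over to graph2 */
        }
        return NULL;
    }

    /* Thread for graph2: the graph2 half of the loop above */
    static void *graph2_thread(void *arg)
    {
        BufHandoff *handoff = (BufHandoff *)arg;
        for (int i = 0; i < APP_NUM_ITERATIONS; i++) {
            Buf *outbuf1 = buf_handoff_pop(handoff);
            Buf *inbuf2  = tiovx_modules_acquire_buf(in_buf_pool2);
            Buf *outbuf2 = tiovx_modules_acquire_buf(out_buf_pool2);
            tiovx_modules_buf_swap_mem(outbuf1, inbuf2);   /* feed graph1 out to graph2 in */
            tiovx_modules_release_buf(outbuf1);
            tiovx_modules_enqueue_buf(inbuf2);
            tiovx_modules_enqueue_buf(outbuf2);
            inbuf2  = tiovx_modules_dequeue_buf(in_buf_pool2);
            outbuf2 = tiovx_modules_dequeue_buf(out_buf_pool2);
            kms_display_render_buf(kms_display_handle, outbuf2);
            tiovx_modules_release_buf(inbuf2);
            tiovx_modules_release_buf(outbuf2);
        }
        return NULL;
    }

    /* In main(), after both graphs and their buffer pools are set up: */
    BufHandoff handoff = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, NULL };
    pthread_t t1, t2;
    pthread_create(&t1, NULL, graph1_thread, &handoff);
    pthread_create(&t2, NULL, graph2_thread, &handoff);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);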

Here are my questions:
1. What exactly is meant by 'creating separate threads'?
   Is the API compatible with, and thread safe for, pthread, std::thread, and the like?

2. An OpenVX graph should always operate with a vx_context, shouldn't it? So, if I really
   create several different threads to enqueue/dequeue the data for each graph, how should
   I manage the context(s)?

3. Is there any example/tutorial/sample code that I could reference to implement a multi-threaded
   setup with multiple graphs in TIOVX?

  • Hi Sejin,

    1. Yes, it is compatible with pthread.

    2. Yes, an OpenVX graph is associated with a context. When you initialize a graph using the tiovx_modules_initialize_graph() API, it creates a context for that graph; each graph object has its own context (refer to the GraphObj struct).

    The context is released by calling the tiovx_modules_clean_graph() API with the graph object as the argument.

    3. Refer to the app_tiovx_linux_multi_capture_display_test.c file, which has a pthread implementation for the same graph; it can be extended to multiple graphs by passing the graph objects as thread arguments, as in the rough sketch below.
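
    A rough, untested outline of that extension: run_single_graph() is just a placeholder for the per-graph enqueue/dequeue loop from your snippet, and node/buffer-pool setup is omitted.

        #include <pthread.h>
        #include "tiovx_modules.h"   /* GraphObj and the tiovx_modules_* APIs */

        /* Placeholder: the enqueue/dequeue loop for one graph, as in your snippet */
        static void run_single_graph(GraphObj *graph)
        {
            (void)graph;
        }

        /* Each thread receives its own graph object as the thread argument */
        static void *graph_run_thread(void *arg)
        {
            GraphObj *graph = (GraphObj *)arg;
            run_single_graph(graph);
            return NULL;
        }

        static void run_graphs_in_parallel(GraphObj *graph1, GraphObj *graph2)
        {
            pthread_t t1, t2;

            /* tiovx_modules_initialize_graph() creates a separate context
             * inside each GraphObj (see the GraphObj struct) */
            tiovx_modules_initialize_graph(graph1);
            tiovx_modules_initialize_graph(graph2);
            /* ... add nodes, create buffer pools, verify the graphs ... */

            pthread_create(&t1, NULL, graph_run_thread, graph1);
            pthread_create(&t2, NULL, graph_run_thread, graph2);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);

            /* Releases each graph's context along with its graph object */
            tiovx_modules_clean_graph(graph1);
            tiovx_modules_clean_graph(graph2);
        }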

    Regards,
    Gokul

  • Thank you very much for the fast response.
    I tested the multi-threaded/multi-graph approach, and it turned out that for relatively small graphs such as our use case,
    using a single graph with a single thread seems to be more performant.