
TDA4VM: Questions about synchronization when implementing multiple-instance pipelining in the OpenVX graph system

Part Number: TDA4VM

Hello, I'm working on the TDA4VM, developing a recognition process.

I'm currently studying OpenVX and TIOVX, and I have a question about graph pipelining using multiple instances.

As I understand pipelining in the OpenVX framework, the framework needs to create multiple instances of the graph. In that case, does OpenVX automatically schedule the multiple instances?

For example, if the input frames at times T and T+1 are fed into separate graph instances and the output for the T+1 input is generated first, does the OpenVX framework handle scheduling automatically, or should developers implement their own synchronization for this case?

If developers need to implement it themselves, could you provide an example for this case?

  • Hi,

    Is this a single node graph? 

    Are the data at T and T+1 given to some node in the graph?

    Are there dependencies among these graphs?

    Regards,

    Nikhil

    Hi, thanks for the reply.

    I'll answer your questions below.

    Q1. Is this a single node graph?

    A1. No, my graph consists of 7 nodes, and I am using the tivxSetGraphPipelineDepth API to create the instances (the pipeline depth is 7).

    Q2. Are the data at T and T+1 given to some node in the graph?

    A2. My example is when image data at time T is inserted into the first node of the first graph instance and T+1 image data is used for the first node of the second graph instance. 

    Q3. Are there dependencies among these graphs?

    A3. As you can see from my A1, there do not seem to be any dependencies between the graph instances.

    So, to ask the question again: when using multiple graph instances via TI's GraphPipelineDepth API, what if the second graph instance, using the T+1 image data, outputs its results faster than the first graph instance, using the T image data? I would like to know whether TI automatically performs synchronization, and if not, whether there is any related example code or documentation. (I think this situation can occur depending on the operating conditions of each node.)

    Thanks for your reply again.

    Regards,

    Mincheol.

  • Hi Mincheol,

    From the explanation above, by different graph instances you mean the same graph but in different pipeline stages, as shown below, right?

    TIOVX User Guide: Graph Pipelining in TIOVX

    So, to ask the question again: when using multiple graph instances via TI's GraphPipelineDepth API, what if the second graph instance, using the T+1 image data, outputs its results faster than the first graph instance, using the T image data? I would like to know whether TI automatically performs synchronization, and if not, whether there is any related example code or documentation. (I think this situation can occur depending on the operating conditions of each node.)

    For a given node, the execution time basically does not vary much from iteration to iteration. But even if it does vary, this can be taken care of by setting the buffer depth (not the pipeline depth) on the output of the node, so that the node gets another buffer to fill while the next node consumes the buffer the first node has already filled.

    This is taken care of by the framework.
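    For reference, a minimal sketch of how these two knobs are set in TIOVX (the graph and node variable names and the output parameter index here are illustrative, not from your graph):

```c
/* Sketch, assuming an already-constructed TIOVX graph "graph" with an
 * ISP node "isp_node". Pipeline depth = number of graph instances;
 * buffer depth = number of buffers on one node output. */
tivxSetGraphPipelineDepth(graph, 3);                /* 3 pipelined graph instances */
tivxSetNodeParameterNumBufByIndex(isp_node, 1, 4);  /* 4 buffers on output index 1 */
```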

    Regards,

    Nikhil

  • Hi Nikhil

    Thanks for your reply.

    Yes, when I said different graph instances, I meant the pipelining you mentioned.

    The example I gave is not the general case but a very specific one.

    For example, assume that the inference time of a DSP node is twice that of an ISP node, and that the DSP node executes conditionally, based on a specific flag signal. The data at time T1 is inserted into the DSP node, and the data at time T2 is inserted into the ISP node (each node belonging to a different graph instance). Then, at time T3, the T1 data is still being processed in the DSP node, while the T2 data, having finished the ISP node and skipped the DSP node because of the flag signal, is used as input to the CPU node. If my assumptions up to this point are correct, the T2 data will finish processing before the T1 data.

    Does the TIOVX framework automatically control synchronization in this case, or does the developer have to write synchronization code separately? If the latter, is there any example code provided for reference?

    Regards,

    Mincheol.

  • Hi,

    Then, at time T3, the T1 data is still being processed in the DSP node, while the T2 data, having finished the ISP node and skipped the DSP node because of the flag signal, is used as input to the CPU node. If my assumptions up to this point are correct, the T2 data will finish processing before the T1 data.

    If the data flow is ISP -> DSP -> CPU, then the flow cannot bypass the DSP while the DSP is working on the T1 data, unless you have a direct connection from ISP -> CPU.

    If you look at the diagram above, you can see the buffers in between. The buffer depth is set by the application based on the execution times of the nodes, i.e. if the ISP is faster and the DSP is slower, there would be a certain number of buffers between them.

    So, once the DSP is working on the data of T1 and the ISP is working on the data of T2, when the ISP is done with T2 it places the result in a buffer and starts working on the data of T3.

    The data of T2 then waits in the buffer for the DSP to take it. Once the DSP is done with the data of T1, it passes the result to the CPU and takes the data of T2 from the buffer, while the CPU works on the data of T1.

    The above flow is under the assumption that your graph flow is ISP -> DSP -> CPU.
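    The buffer hand-off described above can be sketched as a toy calculation (plain C, not TIOVX code; the function name and timing model are hypothetical). The DSP can start frame i only when it has finished frame i-1 and the ISP has placed frame i in the buffer, so frames always leave the DSP in arrival order, even though the ISP runs ahead:

```c
#include <stddef.h>

/* Toy two-stage pipeline model (hypothetical, not TIOVX code).
 * Frame i leaves the ISP at time (i + 1) * isp_time; the DSP takes it
 * from the buffer once the DSP has finished the previous frame. */
void pipeline_finish_times(int isp_time, int dsp_time,
                           size_t n, int *dsp_done)
{
    int prev_dsp = 0;                            /* DSP idle at t = 0     */
    for (size_t i = 0; i < n; i++) {
        int isp_done = (int)(i + 1) * isp_time;  /* frame lands in buffer */
        int start = prev_dsp > isp_done ? prev_dsp : isp_done;
        prev_dsp = start + dsp_time;             /* DSP busy on frame i   */
        dsp_done[i] = prev_dsp;                  /* FIFO completion order */
    }
}
```

    With isp_time = 1 and dsp_time = 2, three frames finish the DSP at t = 3, 5, and 7: the faster ISP simply fills the buffer while the slower DSP drains it in order.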

    Regards,

    Nikhil

  • Hi, Nikhil

    Thank you for the detailed answer.

    According to the answer above, is a graph instance for a pipeline a kind of virtual concept for running a pipeline on a graph in OpenVX?

    What I mean is: in the diagram above, I understand that either the three instances are not running perfectly in parallel, or the graph instances do run in parallel but the entire flow is synchronized by a controller, so that it behaves like a typical pipeline. Is that correct?

    Or, for example, if I set the pipeline depth to 3, does the framework dynamically schedule the nodes of the graph without internally creating 3 graph instances (i.e. multi-threaded)?

    Could you please explain in a little more detail how the graph pipeline works internally with the concept of multiple graph instances?

    Regards,

    Mincheol.

  • Hi Mincheol,

    This is not a virtual concept; we literally have 3 graph instances.

    But the reason T2 does not reach the CPU node before T1 is that the DSP node in each of the graph instances runs on the same target/task. The task running on the DSP will not be retriggered until it is done with T1, so T2 waits for T1 to complete on the DSP and is then processed.
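    This point can be illustrated with a toy comparison (plain C, not TIOVX code; the function names are hypothetical). With one shared task per node, frames complete in enqueue order; only if each instance had its own independent task could a cheap frame overtake an expensive one:

```c
#include <stddef.h>

typedef struct {
    int id;    /* frame identifier, e.g. T1 = 1, T2 = 2   */
    int cost;  /* simulated processing time of this frame */
} frame_t;

/* Shared-task model (what TIOVX does): one task serves every pipeline
 * instance of a node, so frames finish strictly in enqueue order. */
void shared_task_order(const frame_t *q, size_t n, int *done)
{
    for (size_t i = 0; i < n; i++)
        done[i] = q[i].id;       /* FIFO: completion == enqueue order */
}

/* Hypothetical fully-parallel model: if every instance had its own
 * task, frames starting together would finish in order of cost, so a
 * cheap later frame could overtake an expensive earlier one. */
void parallel_order(const frame_t *q, size_t n, int *done)
{
    frame_t tmp[16];             /* assumes n <= 16 for this sketch */
    for (size_t i = 0; i < n; i++)
        tmp[i] = q[i];
    for (size_t i = 0; i < n; i++) {      /* selection sort by cost */
        size_t min = i;
        for (size_t j = i + 1; j < n; j++)
            if (tmp[j].cost < tmp[min].cost)
                min = j;
        frame_t t = tmp[min]; tmp[min] = tmp[i]; tmp[i] = t;
        done[i] = tmp[i].id;
    }
}
```

    With T1 costing 10 and T2 costing 5, the shared-task model completes T1 then T2, while the parallel model would complete T2 first, which is the overtaking the question worries about.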

    Regards,

    Nikhil

  • Hello Nikhil,

    I fully understand that part, thank you. Just to confirm my understanding: each node has its own task (thread) created on the target core, and that task is shared across the graph instances. Is that correct?

    Additionally, I have another question about this execution model. As I understand it, the first node in the graph is executed through the Enqueue & Dequeue API calls, and each subsequent connected node is executed with the output of the previous node acting as its trigger. Is this right? If so, does each task start whenever its buffer is not empty? The most confusing point for me is what exactly the trigger for a connected node is and how it works.

    Thanks,

    Mincheol.

  • Hello,

    This thread is assigned to our engineer in the India office. Due to a regional holiday, half of our team is out of the office. Please expect a 1~2 day delay in responses.

    Apologies for the delay, and thank you for your patience.

    Thanks.

  • Hi,

    I fully understand that part, thank you. Just to confirm my understanding: each node has its own task (thread) created on the target core, and that task is shared across the graph instances. Is that correct?

    Yes. By each node here, I mean the ISP, DSP, and CPU nodes. Across the pipeline, each runs on the same task (i.e. the 3 DSP nodes across the pipeline run on the same task).

    As I understand it, the first node in the graph is executed through the Enqueue & Dequeue API calls, and each subsequent connected node is executed with the output of the previous node acting as its trigger. Is this right?

    Yes.

    If so, does each task start whenever its buffer is not empty? The most confusing point for me is what exactly the trigger for a connected node is and how it works.

    As soon as a buffer is available, it triggers the connected node.

    For example, if the ISP has completed buffer "A", the DSP node is triggered and processes buffer "A". Meanwhile, if the ISP completes buffer "B" while the DSP node is processing "A", "B" is placed in the buffer pool, and as soon as the DSP is done with "A", since "B" is available in the buffer pool, the DSP is triggered again.

    So a node is triggered as soon as a buffer is available in its buffer pool (whose size is the buffer depth).
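    The application-side view of this trigger mechanism can be sketched as follows (assuming the OpenVX pipelining extension as used by TIOVX; graph construction, object creation, and error checking are omitted, and the parameter indices, reference arrays, and counts are illustrative). The application only enqueues and dequeues references at the graph parameters; node-to-node triggering via the buffer pools is internal to the framework:

```c
/* q[0] / q[1] describe the input and output graph parameters and their
 * reference lists (filled in during graph setup, omitted here). */
vx_graph_parameter_queue_params_t q[2];

vxSetGraphScheduleConfig(graph, VX_GRAPH_SCHEDULE_MODE_QUEUE_AUTO, 2, q);
tivxSetGraphPipelineDepth(graph, 3);
vxVerifyGraph(graph);

for (uint32_t frame = 0; frame < num_frames; frame++) {
    vx_reference in_done, out_done;
    vx_uint32 num;
    /* Enqueueing a ready input reference is what triggers the first node. */
    vxGraphParameterEnqueueReadyRef(graph, 0, &in_refs[frame % 3], 1);
    vxGraphParameterEnqueueReadyRef(graph, 1, &out_refs[frame % 3], 1);
    /* Dequeue blocks until the graph has finished with a reference. */
    vxGraphParameterDequeueDoneRef(graph, 0, &in_done, 1, &num);
    vxGraphParameterDequeueDoneRef(graph, 1, &out_done, 1, &num);
}
```

    (A real application would enqueue pipeline-depth frames before the first dequeue to keep the pipeline full; the strictly alternating loop above is only for brevity.)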

    Regards,

    Nikhil

  • Hi Nikhil,

    Thanks to your explanation, I now understand it completely.

    Best regards,

    Mincheol.