J784S4XEVM: some question about openvx pipeline depth

rz liu

Part Number: J784S4XEVM

Hello,

The TIOVX user guide describes the pipeline depth, but I don't understand why it needs multiple graph instances. Does it mean that only one node in an entity can run at any time?

I think the analogy of a production line in a factory applies here. Workers are equivalent to nodes or targets. Multiple workers collaborate on the production line to complete a product. They work in a pipelined fashion, but there is only one production line instance.

Thanks.

over 2 years ago

Nikhil Dasan over 2 years ago

TI__Guru* 85196 points

Hi,

rz liu said:
The TIOVX user guide has a description of the pipeline depth, but I don't understand why does it need multiple graph instance, it is only one node in an entity can run at any time?

The point here is that a second frame cannot start graph execution on the same graph until the first graph execution has completed.

Hence, to increase throughput, we have to use multiple instances of the graph, so that once a node has processed a frame, it can take in the next frame instead of waiting until all the nodes have completed the first frame.
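To make the throughput argument concrete, here is a toy Python model (a sketch, not TIOVX code). It assumes each node runs on its own target and processes one frame at a time, frames enter in order, and a graph instance accepts a new frame only after its previous frame has left the last node. The function name `frame_finish_times` is hypothetical.

```python
from typing import List


def frame_finish_times(num_frames: int, durations: List[int], depth: int) -> List[int]:
    """Toy model of a pipelined graph (assumptions, not TIOVX internals):
    - each node runs on its own target and processes one frame at a time,
    - frames enter in order; frame f reuses the instance freed by frame f - depth,
    - a graph instance accepts a new frame only after its previous frame
      has left the last node.
    Returns the time at which each frame leaves the last node."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    finish: List[List[int]] = []  # finish[f][j]: time frame f leaves node j
    for f in range(num_frames):
        row: List[int] = []
        for j, d in enumerate(durations):
            ready = row[j - 1] if j > 0 else 0               # input from the previous node
            target_free = finish[f - 1][j] if f > 0 else 0   # node j done with frame f - 1
            instance_free = 0
            if j == 0 and f >= depth:
                instance_free = finish[f - depth][-1]        # instance freed by frame f - depth
            row.append(max(ready, target_free, instance_free) + d)
        finish.append(row)
    return [row[-1] for row in finish]


if __name__ == "__main__":
    # Three nodes of one time unit each, six frames.
    print(frame_finish_times(6, [1, 1, 1], depth=1))  # [3, 6, 9, 12, 15, 18]
    print(frame_finish_times(6, [1, 1, 1], depth=3))  # [3, 4, 5, 6, 7, 8]
```

With a single instance, a frame leaves the graph every 3 time units; with 3 instances, after the initial 3-unit latency, one frame leaves every unit.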

You can also refer to the documentation below:

The OpenVX Graph Pipelining, Streaming, and Batch Processing Extension to OpenVX 1.1 and 1.2 (khronos.org)

Regards,

Nikhil

rz liu over 2 years ago in reply to Nikhil Dasan

Prodigy 131 points

Hi,

I did not understand:

The point here is that a second frame cannot start graph execution on the same graph until the first graph execution has completed.

If there are 3 nodes in the graph, could it work like this?

On T1:
  +---------+      +---------+      +---------+
  |         |      |         |      |         |
  | frame 3 |----->| frame 2 |----->| frame 1 |
  |         |      |         |      |         |
  +---------+      +---------+      +---------+
   Node1(A72)       Node2(C71)       Node3(MCU)
   
On T2:
  +---------+      +---------+      +---------+
  |         |      |         |      |         |
  | frame 4 |----->| frame 3 |----->| frame 2 |
  |         |      |         |      |         |
  +---------+      +---------+      +---------+
   Node1(A72)       Node2(C71)       Node3(MCU)

If this works, only one graph instance is needed. Is there any problem with this?
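The timeline in the diagram can be checked with a small Python model (a sketch, not TIOVX code). It assumes every node takes one time unit on its own target and that a graph instance takes a new frame only after its previous frame has left Node3. In this model the pipeline depth bounds how many frames can be in flight at once, so the diagram's picture (three frames in flight at T1 and T2) corresponds to depth 3, while depth 1 leaves two of the three nodes idle at every step. `node_occupancy` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def node_occupancy(num_frames: int, num_nodes: int, depth: int) -> List[Tuple[Optional[int], ...]]:
    """For each time step, the 1-based frame each node is processing, or None
    if the node is idle. Assumes num_frames >= 1 and unit-time nodes."""
    start: List[List[int]] = []  # start[f][j]: step at which frame f runs on node j
    for f in range(num_frames):
        row: List[int] = []
        for j in range(num_nodes):
            t = row[j - 1] + 1 if j > 0 else 0           # wait for the previous node's output
            if f > 0:
                t = max(t, start[f - 1][j] + 1)           # node busy with the previous frame
            if j == 0 and f >= depth:
                t = max(t, start[f - depth][-1] + 1)      # wait for a free graph instance
            row.append(t)
        start.append(row)

    timeline: List[Tuple[Optional[int], ...]] = []
    for t in range(start[-1][-1] + 1):
        slots: List[Optional[int]] = [None] * num_nodes
        for f, row in enumerate(start):
            for j, s in enumerate(row):
                if s == t:
                    slots[j] = f + 1
        timeline.append(tuple(slots))
    return timeline


if __name__ == "__main__":
    for step, slots in enumerate(node_occupancy(4, 3, depth=3)):
        print(step, slots)
```

With `depth=3`, steps 2 and 3 give `(3, 2, 1)` and `(4, 3, 2)`, which match T1 and T2 in the diagram (Node1, Node2, Node3 from left to right).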

Thanks.

Nikhil Dasan over 2 years ago in reply to rz liu

TI__Guru* 85196 points

Hi,

The standard concept of OpenVX does not support pipelining: a graph can be retriggered only when execution of all its nodes for one frame is complete. Hence, retriggering individual nodes is not possible.

Hence, to allow the graph to be retriggered, TI's extension creates multiple instances of the graph (the number of instances is the pipeline depth), optimally equal to the number of nodes, so that retriggering each graph instance is effectively equivalent to retriggering each node.
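The "depth equal to the number of nodes" guideline can be made concrete with a simplified model (an assumption, not a statement about the TIOVX scheduler): each node runs on its own target with a fixed run time, and each graph instance holds one frame for the whole graph latency. The steady-state rate is then limited both by the instances, `depth / sum(durations)`, and by the slowest node, `1 / max(durations)`. The helper names below are hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def steady_state_throughput(durations: Sequence[int], depth: int) -> Fraction:
    """Frames per time unit in the toy model.
    - Each of the `depth` graph instances is occupied for sum(durations)
      per frame, so instances allow at most depth / sum(durations).
    - The slowest node (own target, one frame at a time) allows at most
      1 / max(durations)."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    return min(Fraction(depth, sum(durations)), Fraction(1, max(durations)))


def smallest_saturating_depth(durations: Sequence[int]) -> int:
    """Smallest depth at which only the slowest node limits throughput
    (integer ceiling of sum / max)."""
    return -(-sum(durations) // max(durations))
```

For three equal nodes, depths 1 to 4 give 1/3, 2/3, 1 and 1 frames per unit: throughput stops improving once the depth equals the node count. When one node dominates (for example durations `[1, 3, 1]`), the same model reaches its maximum rate with fewer instances.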

Hence, we require multiple instances.

Regards,

Nikhil

rz liu over 2 years ago in reply to Nikhil Dasan

Prodigy 131 points

Hi,

That doesn't sound ideal. Won't it waste memory? Why doesn't TI improve its implementation based on the OpenVX function interface?

Thanks.

Nikhil Dasan over 2 years ago in reply to rz liu

TI__Guru* 85196 points

Hi,

Compared to not pipelining, yes, more memory is used, both for multiple buffering and for the node object descriptors.

However, tracking when to execute each node, and on which buffers from which frame, requires control logic to run the right node on the right buffers at the right time. Scaling from non-pipelining to pipelining by replicating the graph/node objects is a scalable and efficient way of reusing the existing logic with minimal memory increase.

To use the analogy of an assembly line, you can consider each factory worker as a separate node. With 3 nodes, there are still only 3 factory workers whether you pipeline or not. The pipeline depth is similar to the number of containers holding the items being worked on. If you only work on 1 item at a time, you only need one container, and when the item is done, you can reuse the container for the next item. If you want to work on 3 items at a time, you need 3 containers, each time reusing the one that has just finished for the next item to start. This container is like a graph context, which we call a graph instance.
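The container analogy can be sketched as a fixed pool of graph contexts that are handed to incoming frames and returned when a frame leaves the last node. This is a hypothetical Python class for illustration only, not a TIOVX API and not how TIOVX manages its instances internally.

```python
from collections import deque
from typing import Deque, Dict, Optional


class GraphInstancePool:
    """Toy model of the container analogy: `depth` graph contexts
    (graph instances) reused as frames finish. Hypothetical, not TIOVX."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("depth must be >= 1")
        self.depth = depth
        self._free: Deque[int] = deque(range(depth))  # idle graph instance indices
        self._in_use: Dict[int, int] = {}             # frame id -> instance index

    def start_frame(self, frame_id: int) -> Optional[int]:
        """Assign a free instance to a new frame, or return None if every
        instance is still busy (the frame has to wait)."""
        if not self._free:
            return None
        instance = self._free.popleft()
        self._in_use[frame_id] = instance
        return instance

    def finish_frame(self, frame_id: int) -> int:
        """Called when the frame has left the last node; its instance
        becomes available for the next frame."""
        instance = self._in_use.pop(frame_id)
        self._free.append(instance)
        return instance
```

In this model the memory cost is one context per instance, so it grows with `depth`, not with the number of frames streamed. With `depth=3`, frames 1 to 3 get instances 0 to 2, frame 4 has to wait, and once frame 1 finishes, frame 4 reuses instance 0.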

Hope this helps clarify your query.

Regards,

Nikhil

rz liu over 2 years ago in reply to Nikhil Dasan

Prodigy 131 points

Hi,

you said:

But a graph instance looks more like an assembly line: it has its own nodes and buffers. Maybe my understanding is wrong.

Thanks.

Nikhil Dasan over 2 years ago in reply to rz liu

TI__Guru* 85196 points

Hi,

A graph instance cannot be compared to an assembly line, because the next input does not come in until the line is empty and has finished processing the current product (i.e. the graph can be retriggered only once all the nodes are done with the execution).

Regards,

Nikhil

rz liu over 2 years ago in reply to Nikhil Dasan

Prodigy 131 points

Hi,

This is exactly my question. Only one worker on the assembly line is busy, and the others are idle and waiting. Isn't that a waste of resources?

Thanks.

Nikhil Dasan over 2 years ago in reply to rz liu

TI__Guru* 85196 points

Nikhil Dasan said:
the graph can be retriggered only once all the nodes are done with the execution

This is how a graph is implemented per the OpenVX standard: it cannot be retriggered until all of its nodes have completed.
Hence, to extend this implementation to pipelining mode (as node-level triggering is not available), TI's implementation uses multiple graph instances.

Scaling from non-pipelining to pipelining by replicating the graph/node objects is a scalable and efficient way of reusing the existing logic with minimal memory increase, and this is what is currently supported in the SDK.

Regards,

Nikhil
