[TIOVX]: run C7x node for n iterations

Dominic Fellner

Hi,

I have a 2-stage processing pipeline where

1) a CNN is processing an image using a TIDL node. The output is a list of objects.

2) for every object, I want to run another CNN to process that data. The length of the list depends on the image input, but has some upper bound M.

I'm thinking of creating a second graph with one TIDL node and run it in a for loop for N times, N depending on the output of the first graph.

Alternatively, I thought of dynamically creating a second graph after I have the information about how many objects there are in this single frame and then deploy and run N TIDL nodes which get the N objects.

Alternatively again, I could have a custom node that is distributing the maximum M outputs to M TIDL nodes and trigger only N nodes in every cycle. Ideally the data would not leave the C7x to reduce overhead.

I could not find a TIOVX demo supporting this. Is there a best practice for how to do this?

Thanks and regards

Dom

over 5 years ago

0 Subhajit Paul over 5 years ago

TI__Expert 7015 points

Dom,

I am checking on this internally and will get back to you.

- Subhajit

0 Shyam Jagannathan over 5 years ago in reply to Subhajit Paul

TI__Genius 10355 points

Dom,

The recommended approach is to have separate TIDL nodes for separate networks, no matter how they are related.

Regards,
Shyam

0 Dominic Fellner over 5 years ago in reply to Shyam Jagannathan

Prodigy 200 points

Hi Shyam,

thanks for your answer.

What is not clear to me is how I can create a TIOVX graph to pass the data accordingly.

The problem is that for every run of the first TIDL node, the second one has to run multiple times, and the number of iterations depends on the output of the first TIDL node and will be different in every frame. It's not clear to me how I can represent that in a TIOVX graph.

My first approach would be to have two separate graphs with a single TIDL node each and for each frame, run the first graph and then run the second one in a for loop for N iterations depending on the output of the first. That might even work, but does not sound very efficient to me.

Possible alternatives to that approach are listed in my first post.

Can you please check back again what is the best practice for my use case?

Thanks and regards

Dom

0 Kedar Chitnis over 5 years ago in reply to Dominic Fellner

TI__Genius 9101 points

hi Dom,

You are right, this cannot be represented as an OpenVX graph. The options you listed are roughly what can be done; let me elaborate and fine-tune them.

Option 1:

"I'm thinking of creating a second graph with one TIDL node and run it in a for loop for N times, N depending on the output of the first graph."

These are essentially the same.

Have two graphs with one TIDL node each, 1st for first network and 2nd for second network.

Invoke these from the A72.

After the 1st TIDL graph is invoked, invoke the 2nd one N times based on the output of the first.

Pro: This is good to verify your network functionally with existing TIDL node.

Con: the repeated IPC between A72 and C7x. Complications in pipelining later.

My recommendation: do this to verify your network functionally.
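A minimal sketch of the Option 1 control flow on the host side, for illustration only. It does not use the TIOVX API: run_detection_graph() and run_per_object_graph() are hypothetical stubs standing in for a blocking execution of each graph (in an OpenVX application that would typically be a call such as vxProcessGraph() on the respective graph), and MAX_OBJECTS = 8 is an assumed value for the upper bound M. The counter only makes the number of A72-to-C7x round trips visible.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJECTS 8   /* upper bound M; the value 8 is an assumption */

/* Hypothetical stand-in for the object list produced by the 1st network. */
typedef struct {
    size_t count;                    /* N: objects in this frame, N <= M */
    int    object_id[MAX_OBJECTS];
} detection_list_t;

/* Counts graph invocations, i.e. host-to-C7x round trips. */
int g_graph_invocations = 0;

/* Stub for "run graph 1". A real application would execute the first
 * graph here and read back its output; this stub synthesizes N objects. */
static void run_detection_graph(int frame, detection_list_t *out)
{
    g_graph_invocations++;
    out->count = (size_t)(frame % (MAX_OBJECTS + 1));
    for (size_t i = 0; i < out->count; i++) {
        out->object_id[i] = frame * 100 + (int)i;
    }
}

/* Stub for "run graph 2" on a single object. */
static int run_per_object_graph(int object_id)
{
    g_graph_invocations++;
    return object_id + 1;            /* placeholder result */
}

/* Option 1: the loop over N lives on the host.
 * Returns the number of objects processed for this frame. */
size_t process_frame(int frame, int results[MAX_OBJECTS])
{
    detection_list_t dets;

    run_detection_graph(frame, &dets);
    assert(dets.count <= MAX_OBJECTS);

    for (size_t i = 0; i < dets.count; i++) {
        results[i] = run_per_object_graph(dets.object_id[i]);
    }
    return dets.count;
}
```

With this structure a frame containing N objects costs 1 + N graph invocations, which is the repeated IPC listed as the con above.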

Option 2:

Write your own TIDL node for the 2nd part of DL processing.

It takes the output of the 1st TIDL node as input, and you put the loop over "N" inside this 2nd TIDL node.

Pro: You can reuse the TIDL node for the 1st part.

Con: you need to write your own node.
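A rough sketch of what the per-frame body of such a custom node could look like, again for illustration only. None of these names come from TIOVX or TIDL: second_stage_node_process() is a hypothetical stand-in for the node's process callback, infer_second_network() is a hypothetical stub for one inference of the 2nd network, and MAX_OBJECTS = 8 is an assumed upper bound M. The point is only that the node is invoked once per frame and the loop over N runs inside it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJECTS 8   /* upper bound M; the value 8 is an assumption */

/* Hypothetical input: object list coming from the 1st TIDL node. */
typedef struct {
    size_t count;                    /* N for this frame */
    int    object_id[MAX_OBJECTS];
} detection_list_t;

/* Hypothetical output: one result per object. */
typedef struct {
    size_t count;
    int    result[MAX_OBJECTS];
} result_list_t;

/* Stub for a single inference of the 2nd network. */
static int infer_second_network(int object_id)
{
    return object_id * 2;            /* placeholder result */
}

/* Hypothetical process callback of the custom node.
 * Called once per frame; the loop over N is inside the node, so the
 * host does not need to trigger each iteration separately.
 * Returns 0 on success, -1 on invalid input. */
int second_stage_node_process(const detection_list_t *in, result_list_t *out)
{
    if (in == NULL || out == NULL || in->count > MAX_OBJECTS) {
        return -1;
    }

    for (size_t i = 0; i < in->count; i++) {
        out->result[i] = infer_second_network(in->object_id[i]);
    }
    out->count = in->count;

    assert(out->count <= MAX_OBJECTS);
    return 0;
}
```

Compared with the host-side loop of option 1, the number of node invocations per frame no longer depends on N.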

Option 3:

Variation of option 2, where you write your own node and do the 1st and 2nd DL processing in the same node.

Pro: Most efficient, though compared with option 2 the savings may be small.

Con: this node is now very specific to this algorithm.

"Alternatively again, I could have a custom node that is distributing the maximum M outputs to M TIDL nodes and trigger only N nodes in every cycle. Ideally the data would not leave the C7x to reduce overhead."

This may not be possible in OpenVX and looks somewhat complicated.

My recommendation is to do option 1, verify everything, then do option 2 or 3 (both are similar), which would just be an optimization step.

regards
Kedar
