[TIOVX]: run C7x node for n iterations

Hi,

I have a 2-stage processing pipeline where

1) a CNN is processing an image using a TIDL node. The output is a list of objects.

2) for every object, I want to run another CNN to process that data. The length of the list depends on the image input, but has some upper bound M.

I'm thinking of creating a second graph with one TIDL node and run it in a for loop for N times, N depending on the output of the first graph.

Alternatively, I thought of dynamically creating a second graph after I have the information about how many objects there are in this single frame and then deploy and run N TIDL nodes which get the N objects.

Alternatively again, I could have a custom node that is distributing the maximum M outputs to M TIDL nodes and trigger only N nodes in every cycle. Ideally the data would not leave the C7x to reduce overhead.

I could not find a TIOVX demo that supports this. Is there a best practice for doing this?

Thanks and regards

Dom

  • Dom,

    I am checking on this internally and will get back to you

    - Subhajit

  • Dom,

    The recommended approach is to have separate TIDL nodes for separate networks, no matter how they are related.


    Regards,
    Shyam

  • Hi Shyam,

    thanks for your answer.

    What is not clear to me is how I can create a TIOVX graph to pass the data accordingly.

    The problem is that for every run of the first TIDL node, the second one has to run multiple times, and the number of iterations depends on the output of the first TIDL node and will be different in every frame. It's not clear to me how I can represent that in a TIOVX graph.

    My first approach would be to have two separate graphs with a single TIDL node each and for each frame, run the first graph and then run the second one in a for loop for N iterations depending on the output of the first. That might even work, but does not sound very efficient to me.

    Possible alternatives to that approach are listed in my first post.

    Can you please check back again what is the best practice for my use case?

    Thanks and regards

    Dom

  • hi Dom,

    You are right, this cannot be represented as a single OpenVX graph. The options you listed are roughly what can be done; let me elaborate on them and fine-tune them.

    Option 1:

    "I'm thinking of creating a second graph with one TIDL node and run it in a for loop for N times, N depending on the output of the first graph.

    Alternatively, I thought of dynamically creating a second graph after I have the information about how many objects there are in this single frame and then deploy and run N TIDL nodes which get the N objects."

    These are essentially the same.

    Have two graphs with one TIDL node each, 1st for first network and 2nd for second network.

    Invoke these from the A72.

    After 1st TIDL graph is invoked, invoke the 2nd one N times based on output of first.

    Pro: This is good to verify your network functionally with existing TIDL node.

    Con: Repeated IPC between the A72 and the C7x, and complications when pipelining later.

    My recommendation: do this first to verify your network functionally.
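
    As a rough illustration of the control flow in Option 1, here is a minimal host-side sketch in C. It is not TIOVX code: `process_frame`, the `run_graph1_fn`/`run_graph2_fn` callbacks, the `object_t` layout and `MAX_OBJECTS` are all hypothetical names chosen for this example. In a real application the two callbacks would presumably set the graph inputs and call `vxProcessGraph()` on two graphs created and verified once at startup. The return value counts host-issued invocations to make the IPC cost (1 + N per frame) visible.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define MAX_OBJECTS 8 /* assumed upper bound M on detections per frame */

    /* Hypothetical detection record produced by the first network. */
    typedef struct { int x, y, w, h; } object_t;

    typedef struct {
        size_t   count;               /* N: number of valid entries this frame */
        object_t items[MAX_OBJECTS];
    } object_list_t;

    /* Stand-ins for "run graph 1" and "run graph 2" (assumption: on target these
     * would wrap vxProcessGraph() on two pre-verified graphs). Return 0 on success. */
    typedef int (*run_graph1_fn)(const void *image, object_list_t *out);
    typedef int (*run_graph2_fn)(const object_t *obj, int *result);

    /* Processes one frame. Returns the number of graph invocations issued from
     * the host (1 + N), or -1 on error. */
    int process_frame(const void *image, run_graph1_fn g1, run_graph2_fn g2,
                      int results[MAX_OBJECTS], size_t *n_out)
    {
        object_list_t list = {0};
        int invocations = 0;

        if (g1(image, &list) != 0) return -1;   /* first network, once per frame */
        invocations++;
        assert(list.count <= MAX_OBJECTS);

        for (size_t i = 0; i < list.count; i++) {  /* second network, N times */
            if (g2(&list.items[i], &results[i]) != 0) return -1;
            invocations++;
        }
        *n_out = list.count;
        return invocations;
    }

    /* Demo stubs so the sketch compiles without the TI SDK. */
    static int demo_graph1(const void *image, object_list_t *out)
    {
        size_t n = *(const size_t *)image;  /* the demo "image" just carries N */
        if (n > MAX_OBJECTS) return -1;
        out->count = n;
        for (size_t i = 0; i < n; i++) {
            out->items[i] = (object_t){ 0, 0, (int)i + 1, 2 };
        }
        return 0;
    }

    static int demo_graph2(const object_t *obj, int *result)
    {
        *result = obj->w * obj->h;  /* placeholder for second-stage inference */
        return 0;
    }
    ```

    With the demo stubs, a frame with three objects results in four host-side invocations, which is the overhead that Options 2 and 3 are meant to remove.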

    Option 2:

    Write your own TIDL node for the 2nd part of the DL processing.

    This node takes the output of the 1st TIDL node as its input, and you put the loop over "N" inside this 2nd node, so the iterations run on the C7x.

    Pro: You can reuse the existing TIDL node for the 1st part.

    Con: you need to write your own node.
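
    A minimal sketch of what the body of such a node might look like, assuming the object list is passed in a fixed-size buffer sized for the upper bound M together with a valid count N. Everything here is hypothetical: `second_stage_process`, `stage1_out_t`, `stage2_out_t`, `infer_fn`, `MAX_OBJECTS` and `FEATURE_LEN` are illustration names, not part of TIOVX or TIDL. In a real node this function would be called from the target kernel's process callback, and `infer` would be replaced by whatever inference call the SDK exposes on the C7x.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_OBJECTS 8  /* assumed upper bound M */
    #define FEATURE_LEN 4  /* assumed per-object input size */

    /* Fixed-capacity buffers: sized for M, with only `count` entries valid. */
    typedef struct {
        uint32_t count;
        float    features[MAX_OBJECTS][FEATURE_LEN];
    } stage1_out_t;

    typedef struct {
        uint32_t count;
        float    score[MAX_OBJECTS];
    } stage2_out_t;

    /* Stand-in for one run of the second network on one object. Returns 0 on success. */
    typedef int (*infer_fn)(const float *in, size_t len, float *score);

    /* Node body: invoked once per frame; the loop over N stays inside the node,
     * so the host issues a single invocation regardless of N. */
    int second_stage_process(const stage1_out_t *in, stage2_out_t *out, infer_fn infer)
    {
        if (in == NULL || out == NULL || infer == NULL) return -1;
        if (in->count > MAX_OBJECTS) return -1;  /* reject a corrupt count */

        out->count = 0;
        for (uint32_t i = 0; i < in->count; i++) {
            if (infer(in->features[i], FEATURE_LEN, &out->score[i]) != 0) return -1;
            out->count = i + 1;  /* only completed results are marked valid */
        }
        assert(out->count <= MAX_OBJECTS);
        return 0;
    }

    /* Demo stub so the sketch compiles without the TI SDK. */
    static int demo_infer(const float *in, size_t len, float *score)
    {
        float sum = 0.0f;
        for (size_t i = 0; i < len; i++) sum += in[i];
        *score = sum;  /* placeholder for real inference */
        return 0;
    }
    ```

    Keeping the buffers at a fixed capacity means the graph can be verified once with static data sizes, while the `count` field carries the per-frame N.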

    Option 3:

    Variation of option 2, where you write your own node and do 1st and 2nd DL processing in same node.

    Pro: Most efficient, though vs option 2 maybe not much savings

    Cons: now this node is very specific to this algorithm

    "Alternatively again, I could have a custom node that is distributing the maximum M outputs to M TIDL nodes and trigger only N nodes in every cycle. Ideally the data would not leave the C7x to reduce overhead."

    This may not be possible in OpenVX and looks somewhat complicated.

    My recommendation is to do option 1, verify everything, then do option 2 or 3 (both are similar), which would just be an optimization step.

    regards
    Kedar