TDA4VM: Scaler node's 4-channel output with different resolutions

Vyom Mishra1

Part Number: TDA4VM

Dear Sir,

I need your suggestion for the following requirement:

Capture node (4-channel NV12 output, H x W) -> Scaler node -> X node

X node expects a 4-channel output from the scaler node, but with:

Cam 0 and Cam 1 at resolution H1 x W1

Cam 2 and Cam 3 at resolution H2 x W2

Please point me to any available references, or suggest the best way to implement this.

Thanks and Regards,

Vyom Mishra

over 1 year ago

Nikhil Dasan over 1 year ago

Hi Vyom,

First of all, for different resolutions, you would require different capture nodes.

That is, since the capture node output is an object array, all sensors configured on one capture node must have the same resolution.

So you would need 2 capture nodes here.

Your X node should then take in the 2 object arrays, extract the individual images from both, and output them as 4 vx_images.
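To make the grouping concrete, here is a plain-C sketch with no OpenVX calls: cameras with identical resolutions share one capture node, and each camera's position inside that node's object array is the item index the X node would later pass to vxGetObjectArrayItem(). The types, MAX_CAMERAS and assign_capture_nodes() are hypothetical, for illustration only.

```c
#include <assert.h>

/* Hypothetical types for illustration; they are not TIOVX types. */
typedef struct {
    int width;
    int height;
} Resolution;

typedef struct {
    int capture_node; /* which capture node (object array) the camera is on */
    int item;         /* index of the camera inside that object array */
} CameraSlot;

#define MAX_CAMERAS 8

/* Group cameras so that each capture node only sees one resolution.
 * Fills slots[cam] for every camera and returns the number of capture
 * nodes (object arrays) required. */
int assign_capture_nodes(const Resolution res[], int num_cameras,
                         CameraSlot slots[])
{
    Resolution node_res[MAX_CAMERAS];
    int node_size[MAX_CAMERAS] = {0};
    int num_nodes = 0;

    assert(num_cameras > 0 && num_cameras <= MAX_CAMERAS);
    for (int cam = 0; cam < num_cameras; cam++) {
        int n = 0;
        /* Look for an existing capture node with the same resolution. */
        while (n < num_nodes && (node_res[n].width != res[cam].width ||
                                 node_res[n].height != res[cam].height)) {
            n++;
        }
        if (n == num_nodes) {
            node_res[num_nodes++] = res[cam]; /* new resolution: new node */
        }
        slots[cam].capture_node = n;
        slots[cam].item = node_size[n]++;
    }
    return num_nodes;
}
```

With Cam 0/Cam 1 at H1 x W1 and Cam 2/Cam 3 at H2 x W2, this returns 2, and camera k is item slots[k].item of capture node slots[k].capture_node.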

Regards,

Nikhil

Vyom Mishra1 over 1 year ago in reply to Nikhil Dasan

Dear Sir,

Thanks for the response.

Note: X node runs in parallel with my normal 4-channel segmentation model pipeline.

As I understand your suggestion:

If I have two different resolutions, I need to create two different capture nodes. Will each capture node capture all 4 channels independently?

Please correct me if I am wrong.

Second, can two scaler nodes be created for the different resolutions? I may need an H x W resolution for the post-processing segmentation model.

Please also give your suggestion for this new case:

4-channel input -> custom Y node -> custom Z node

Y node is a view-conversion node.

Y node is a custom node whose input resolution is the same for all channels, but whose output resolution changes according to the requirement.

How should I integrate the Y and Z nodes?

Please provide your valuable suggestions.

Thanks and Regards,

Vyom Mishra

Nikhil Dasan over 1 year ago in reply to Vyom Mishra1

Hi,

Vyom Mishra1 said:
If I have two different resolutions, I need to create two different capture nodes. Will each capture node capture all 4 channels independently?

Please correct me if I am wrong.

No. Each capture node captures only the cameras allocated to it, based on that node's configuration.

In your case, the 2 cameras with the same resolution would be captured by one capture node, and the other 2 by the second.

Vyom Mishra1 said:
Second, can two scaler nodes be created for the different resolutions? I may need an H x W resolution for the post-processing segmentation model.

Sorry, I did not follow. The scaler node takes in images of one resolution and produces a scaled output. If you have 2 different input resolutions, you would require 2 scaler nodes.

Vyom Mishra1 said:
Y node is a custom node whose input resolution is the same for all channels, but whose output resolution changes according to the requirement.

How should I integrate the Y and Z nodes?

The output resolution of Y would be known at node creation time, right?

Also, is the input to Y an object array, with 4 outputs of different resolutions? If yes, you could give Z 4 inputs, each with the same resolution as the corresponding Y output, to integrate the nodes.
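Because Y's output resolutions are fixed at creation time, Z can simply be declared with four inputs of matching sizes. A plain-C sketch of the consistency check Z's validate callback could perform is shown below; the Dims type and first_mismatched_view() are hypothetical, and in a real kernel the dimensions would be queried from the actual parameters.

```c
#include <assert.h>

#define NUM_VIEWS 4

/* Hypothetical dimension record; in a real kernel these values would be
 * queried from the image or tensor parameters. */
typedef struct {
    int width;
    int height;
} Dims;

/* Check that Z's input v has exactly the resolution Y produces on its
 * output v. Returns the index of the first mismatching view, or -1 if
 * all four views match. */
int first_mismatched_view(const Dims y_out[NUM_VIEWS],
                          const Dims z_in[NUM_VIEWS])
{
    assert(y_out != 0 && z_in != 0);
    for (int v = 0; v < NUM_VIEWS; v++) {
        if (y_out[v].width != z_in[v].width ||
            y_out[v].height != z_in[v].height) {
            return v;
        }
    }
    return -1;
}
```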

Regards,

Nikhil

Vyom Mishra1 over 1 year ago in reply to Nikhil Dasan

Dear Sir,

Thanks for the response!

I have one more query; please advise.

I have four channel outputs from the TIDL node (segmentation masks) belonging to 4 different camera positions. I want to process each of them individually through post-processing that depends on the camera position.

While porting the post-processing, is it possible to extract each channel's output inside a custom kernel and pass it to the respective post-processing function? If yes, are there any references, or can you please suggest an approach?

Do I need to create 4 separate kernels, one per channel, according to the camera position? If yes, how do I synchronize the four outputs of the 4 kernels if the next node expects all four outputs together?

Thanks and Regards,

Vyom Mishra

Nikhil Dasan over 1 year ago in reply to Vyom Mishra1

Hi Vyom,

Vyom Mishra1 said:
I have four channel outputs from the TIDL node (segmentation masks) belonging to 4 different camera positions

By 4 different camera positions, do you mean they have 4 different resolutions? That is, is the TIDL output an object array produced by replicated TIDL nodes, or literally 4 outputs from a single TIDL node?

The current TIDL node in the SDK supports only 1 output per node. If an object array (i.e. with the same resolution for all elements) is provided as input, the TIDL node is replicated using the vxReplicateNode() API, which internally creates one instance of the node per array element, 4 in your case.

Which of these is your use case?

It would be better if you could put this in the form of a block diagram.

Regards,

Nikhil

Vyom Mishra1 over 1 year ago in reply to Nikhil Dasan

Dear Sir,

Please find the block diagram:

Capture -> Scaler -> Pre-processing -> TIDL -> X node

All cameras have the same resolution, and the TIDL output mask for each channel is also the same size, 1024x786.

X node needs to process the TIDL output segmentation masks, which all have the same resolution but come from cameras at different positions on the vehicle (side left, side right, front and rear).

The X node algorithm logic is different for each camera, according to its position.

I assume I can replicate the pipeline up to TIDL, but the X node cannot be replicated.

For this requirement, I see two options:

A. Create a separate custom kernel and node for each channel, and pass each one the TIDL output mask of its channel.

B. Create a single custom kernel and node for all channels, select the TIDL output of each channel separately, and process it inside the algorithm.

For example:

In the algorithm:

int main(void)
{
    leftprocess(tidl_output_left_camera_buffer);
    rightprocess(tidl_output_right_camera_buffer);
    /* ... likewise for the front and rear cameras */
    return 0;
}
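Option B works in a single, non-replicated node if the four masks arrive as four separate parameters in a fixed order: the parameter index then identifies the camera position. A minimal plain-C sketch of that dispatch, assuming a left/right/front/rear ordering, hypothetical function names, and placeholder processing that only tags each mask:

```c
#include <assert.h>

/* Assumed order of the four mask parameters; it must match the graph wiring. */
typedef enum {
    CAM_SIDE_LEFT = 0,
    CAM_SIDE_RIGHT,
    CAM_FRONT,
    CAM_REAR,
    CAM_COUNT
} CamPosition;

typedef void (*PostProcFn)(unsigned char *mask, int len);

/* Placeholder helper: real code would run position-specific logic. */
static void fill(unsigned char *mask, int len, unsigned char value)
{
    for (int i = 0; i < len; i++) {
        mask[i] = value;
    }
}

static void process_side_left(unsigned char *m, int len)  { fill(m, len, 1); }
static void process_side_right(unsigned char *m, int len) { fill(m, len, 2); }
static void process_front(unsigned char *m, int len)      { fill(m, len, 3); }
static void process_rear(unsigned char *m, int len)       { fill(m, len, 4); }

/* One algorithm per camera position, indexed by parameter position. */
static const PostProcFn post_proc_table[CAM_COUNT] = {
    [CAM_SIDE_LEFT]  = process_side_left,
    [CAM_SIDE_RIGHT] = process_side_right,
    [CAM_FRONT]      = process_front,
    [CAM_REAR]       = process_rear,
};

/* Body of a single, non-replicated process callback: masks[pos] is the
 * mask of camera position pos, so the index alone selects the algorithm. */
void process_all_cameras(unsigned char *masks[CAM_COUNT], int mask_len)
{
    for (int pos = 0; pos < CAM_COUNT; pos++) {
        assert(masks[pos] != 0);
        post_proc_table[pos](masks[pos], mask_len);
    }
}
```

In this layout one node invocation sees all four masks at once, so the outputs stay together for the next node without any extra synchronization between separate kernels.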

I would like your opinion on the feasibility of these options, and any references if available; if there are none, please suggest the best way to handle this.

One more question:

If I have a custom node after TIDL, is it possible to separate out the output mask (vx_tensor) of each channel (4 cameras) inside the custom node?

Thanks and Regards,

Vyom Mishra

Nikhil Dasan over 1 year ago in reply to Vyom Mishra1

Hi Vyom,

I think I understand your use case.

Up to the output of the TIDL node, you have an object array, whereas the X node should perform different actions depending on whether the data comes from the left, right, front or rear camera.

Is my understanding correct?

If yes, you have 2 options:

Option 1:

If the TIDL output tensor carries some flag or variable from which you can tell whether the image is from the left, right, front or rear camera, you could implement the X node's process function so that it selects the algorithm based on that variable.
In this case, you could use vxReplicateNode(), since the split happens inside the process function, based on the output tensor from the TIDL node.
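In option 1, every replicated instance runs the same process function, so the camera position has to come from the data itself. A minimal sketch, assuming (hypothetically) that the first byte of each buffer carries a position code; whether your pipeline can provide such a tag is an open assumption, and the branch bodies are placeholders:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical position codes, assumed to be stored in the first byte of
 * each instance's buffer. Providing this tag is up to the pipeline. */
enum {
    POS_SIDE_LEFT = 0,
    POS_SIDE_RIGHT = 1,
    POS_FRONT = 2,
    POS_REAR = 3
};

/* Process function shared by all replicated instances: the algorithm is
 * chosen from the data. Returns the position whose branch ran, or -1 for
 * an unknown tag. */
int process_one_instance(const uint8_t *buf, int len)
{
    assert(buf != 0 && len > 0);
    switch (buf[0]) {
    case POS_SIDE_LEFT:
        /* side-left post-processing would go here */
        return POS_SIDE_LEFT;
    case POS_SIDE_RIGHT:
        /* side-right post-processing would go here */
        return POS_SIDE_RIGHT;
    case POS_FRONT:
        /* front post-processing would go here */
        return POS_FRONT;
    case POS_REAR:
        /* rear post-processing would go here */
        return POS_REAR;
    default:
        return -1;
    }
}
```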

Option 2:

Create 4 nodes, X1, X2, X3 and X4, each with a process function specific to its camera. For this, you could use tivxObjArraySplitNode(), which takes in 1 object array of 4 elements and outputs 4 object arrays of 1 element each.
So after the TIDL node, you can pass the output tensor object array to this node, and then pass each of its outputs to the X1, X2, X3 and X4 nodes.

For reference, tivxObjArraySplitNode is currently used in the multicam demo.

Please check which of the above options is feasible for you and proceed with it.

Regards,

Nikhil

Vyom Mishra1 over 1 year ago in reply to Nikhil Dasan

Dear Sir,

Thanks for the detailed response.

For option 1: how can we separate out each channel of the TIDL output tensor (mask) while knowing the channel's (camera's) position?

For option 2: tivxObjArraySplitNode is only available from SDK 8.6 onward, and my current software is on PSDK 7.

So I will not be able to use this node.

Is there any other way to separate the TIDL output vx_tensor (mask) of each channel?

Thanks and Regards,

Vyom Mishra

Nikhil Dasan over 1 year ago in reply to Vyom Mishra1

Hi Vyom,

You could either port this node to PSDK 7, or use the vxGetObjectArrayItem() API to access each tensor in the object array.

Regards,

Nikhil

Vyom Mishra1 over 1 year ago in reply to Nikhil Dasan

Dear Sir,

Thanks for the reply.

Regarding vxGetObjectArrayItem():

As I understand it, index 0 is always passed to the node, which means object array[0] holds the output of all channels, replicated for num_cameras.

If I use a non-replicated custom node after the TIDL node, as mentioned earlier, and call vxGetObjectArrayItem() with index 1, 2 or 3, will it not always access channel 0 of that item?

Thanks and Regards,

Vyom Mishra

Nikhil Dasan over 1 year ago in reply to Vyom Mishra1

Vyom Mishra1 said:
As I understand it, index 0 is always passed to the node, which means object array[0] holds the output of all channels, replicated for num_cameras.

This is not the case. You pass the 0th item only when you are using a replicated node.

If you are not replicating the node, the 0th item is the only channel that node processes.

Hence, you could pass items 1, 2 and 3 to other nodes and not call vxReplicateNode() for those nodes. In this case, each node processes only the channel passed into it.
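A toy model of the two cases may help; it is plain C, not TIOVX code, and visits[k] simply counts how often item k gets processed. A replicated node is built on item 0 and runs once per item; a non-replicated node only ever sees the one item it was given. The function names are hypothetical.

```c
#include <assert.h>

#define NUM_ITEMS 4

/* Replicated node: created with item 0 as the exemplar, then run once
 * per item of the object array, so every item is processed. */
void run_replicated(int visits[NUM_ITEMS])
{
    for (int k = 0; k < NUM_ITEMS; k++) {
        visits[k]++;
    }
}

/* Non-replicated node: processes only the single item it was given,
 * i.e. the index that was passed to vxGetObjectArrayItem(). */
void run_non_replicated(int item, int visits[NUM_ITEMS])
{
    assert(item >= 0 && item < NUM_ITEMS);
    visits[item]++;
}
```

In the real graph, that means calling vxGetObjectArrayItem(arr, k) for k = 1, 2, 3 and passing each returned tensor to its own node, without calling vxReplicateNode() on it.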

Regards,

Nikhil

Vyom Mishra1 over 1 year ago in reply to Nikhil Dasan

Dear Sir,

Thanks for the response!

Following the same approach, but instead of creating 4 separate nodes, I have initially created a single node after TIDL.

Here is the PyTIOVX wrapper:

import os
from tiovx import *
os.environ['CUSTOM_KERNEL_PATH'] = "/home/PSDK7/tiovx/kernels_j7"
code = KernelExportCode("postprocessing",Core.C66,"CUSTOM_KERNEL_PATH")
code.setCoreDirectory("c66")
kernel = Kernel("postprocessing")
#kernel.setParameter(Type.USER_DATA_OBJECT, Direction.INPUT, ParamState.REQUIRED, "out_args")
kernel.setParameter(Type.TENSOR, Direction.INPUT, ParamState.REQUIRED, "INPUT_SL")
kernel.setParameter(Type.TENSOR, Direction.INPUT, ParamState.REQUIRED, "INPUT_SR")
kernel.setParameter(Type.TENSOR, Direction.INPUT, ParamState.REQUIRED, "INPUT_F")
kernel.setParameter(Type.TENSOR, Direction.INPUT, ParamState.REQUIRED, "INPUT_R")

kernel.setParameter(Type.TENSOR, Direction.OUTPUT, ParamState.REQUIRED, "OUTPUT_SL")
kernel.setParameter(Type.TENSOR, Direction.OUTPUT, ParamState.REQUIRED, "OUTPUT_SR")
kernel.setParameter(Type.TENSOR, Direction.OUTPUT, ParamState.REQUIRED, "OUTPUT_F")
kernel.setParameter(Type.TENSOR, Direction.OUTPUT, ParamState.REQUIRED, "OUTPUT_R")

kernel.setTarget(Target.DSP1)
code.export(kernel)

The input to the custom node is the TIDL output segmentation mask, and the output of the custom node has the same format, with some post-processing applied.

So the output buffers for the custom node are allocated the same way as in TIDL.

But I am getting the following errors at runtime:
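Allocating the outputs "the same way as in TIDL" means sizing each tensor from the TIDL buffer descriptor with padding included, as createOutputTensors() in the module later in this thread does. The arithmetic, as a sketch using a hypothetical OutBufDesc stand-in for the sTIDL_IOBufDesc_t fields it reads, with an assumed per-element size:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-output fields of sTIDL_IOBufDesc_t
 * that createOutputTensors() reads (one output buffer only). */
typedef struct {
    size_t width, height, channels;
    size_t pad_l, pad_r, pad_t, pad_b;
    size_t elem_size; /* assumed bytes per element, e.g. 1 for an 8-bit mask */
} OutBufDesc;

/* Same dimension arithmetic as createOutputTensors(): padding is part of
 * the tensor dimensions. Fills dims[3] and returns the size in bytes. */
size_t padded_tensor_bytes(const OutBufDesc *d, size_t dims[3])
{
    assert(d != NULL && dims != NULL);
    dims[0] = d->width + d->pad_l + d->pad_r;
    dims[1] = d->height + d->pad_t + d->pad_b;
    dims[2] = d->channels;
    return dims[0] * dims[1] * dims[2] * d->elem_size;
}
```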

Verify is next! 
    76.812801 s:  VX_ZONE_ERROR:[ownContextSendCmd:784] Command ack message returned failure cmd_status: -7
    76.812905 s:  VX_ZONE_ERROR:[ownContextSendCmd:818] tivxEventWait() failed.
    76.812931 s:  VX_ZONE_ERROR:[ownNodeKernelInit:526] Target kernel, TIVX_CMD_NODE_CREATE failed
    76.812961 s:  VX_ZONE_ERROR:[ownNodeKernelInit:527] Please be sure the target callbacks have been registered for this core
    76.812997 s:  VX_ZONE_ERROR:[ownNodeKernelInit:528] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
    76.813130 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 3, kernel com.ti.postprocessing.postprocessing ... failed !!!
    76.813369 s:  VX_ZONE_ERROR:[vxVerifyGraph:2010] Node kernel init failed
    76.813397 s:  VX_ZONE_ERROR:[vxVerifyGraph:2064] Graph verify failed
Grapy verify FAILURE

How can I decode these errors?

Thanks and Regards,

Vyom Mishra

Nikhil Dasan over 1 year ago in reply to Vyom Mishra1

Hi Vyom,

The error seems to come from the create function of your node. Please check inside this function.

Also, please check that you have registered the kernel on the target side.

Regards,

Nikhil

Vyom Mishra1 over 1 year ago in reply to Nikhil Dasan

Dear Sir,

Do you mean this function?

static vx_status VX_CALLBACK tivxpostprocessingCreate(
       tivx_target_kernel_instance kernel,
       tivx_obj_desc_t *obj_desc[],
       uint16_t num_params, void *priv_arg)
{
    vx_status status = (vx_status)VX_SUCCESS;

    /* < DEVELOPER_TODO: (Optional) Add any target kernel create code here (e.g. allocating */
    /*                   local memory buffers, one time initialization, etc) > */

    return status;
}

I have not modified it; currently I am testing with an empty kernel to check portability.

I have created the node with the required input and output buffer management.

I have also registered the kernel on the target side in "/home/bsp/vision_apps/apps/basic_demos/app_tirtos/common/app_init.c".

Here is the log again:

Verify is next! 
    84.647542 s:  VX_ZONE_ERROR:[ownContextSendCmd:784] Command ack message returned failure cmd_status: -7
    84.647653 s:  VX_ZONE_ERROR:[ownContextSendCmd:818] tivxEventWait() failed.
    84.647679 s:  VX_ZONE_ERROR:[ownNodeKernelInit:526] Target kernel, TIVX_CMD_NODE_CREATE failed
    84.647711 s:  VX_ZONE_ERROR:[ownNodeKernelInit:527] Please be sure the target callbacks have been registered for this core
    84.647748 s:  VX_ZONE_ERROR:[ownNodeKernelInit:528] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
    84.647887 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 3, kernel com.ti.postprocessing.postprocessing ... failed !!!
    84.648122 s:  VX_ZONE_ERROR:[vxVerifyGraph:2010] Node kernel init failed
    84.648150 s:  VX_ZONE_ERROR:[vxVerifyGraph:2064] Graph verify failed
Grapy verify FAILURE!
[0000000083611008][1][CAMERA_DAEMON][I] FPS: 30.01
    84.664259 s:  VX_ZONE_ERROR:[tivxExportGraphToDot:1414] Invalid parameters or graph node not verifiedApp Verify Graph Done! 
Run  is next! 
Scaler delete done!
Pre Proc delete done!
TIDL delete done!

Thanks and Regards,

Vyom Mishra

Nikhil Dasan over 1 year ago in reply to Vyom Mishra1

Hi,

Vyom Mishra1 said:
I have not modified it; currently I am testing with an empty kernel to check portability.

OK. Since there is nothing in your create function, I suspect that the kernel registration on the target has not been done correctly.

The same target file should have an Add function, which is called when target registration happens. Could you add a log in it and check whether this function is being called?

Which core is the target for this node? Please ensure that this log comes from that core.

After doing this, could you share the full application log along with the vision_apps_init.sh script logs?
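The reason to log from the target core specifically is that target callbacks are registered per core: a kernel added on one core is not known to another. A toy model of that rule (plain C, not TIOVX code; the registry type, function names and core labels are illustrative) shows creating a node on DSP1 failing until the kernel has been added on DSP1 itself, which matches the log's hint to "be sure the target callbacks have been registered for this core".

```c
#include <assert.h>
#include <string.h>

#define MAX_KERNELS 8

/* Toy model: each core keeps its own table of kernels whose target
 * callbacks were added on that core. */
typedef struct {
    const char *core;                 /* label only, e.g. "DSP1" */
    const char *kernels[MAX_KERNELS]; /* kernel names added on this core */
    int count;
} CoreRegistry;

/* Stand-in for the target-side Add function running on one core.
 * Returns 0 on success, -1 if the toy table is full. */
int add_target_kernel(CoreRegistry *r, const char *kernel_name)
{
    assert(r != 0 && kernel_name != 0);
    if (r->count >= MAX_KERNELS) {
        return -1;
    }
    r->kernels[r->count++] = kernel_name;
    return 0;
}

/* Stand-in for node creation on a target core: succeeds only if the
 * kernel was added on that same core. Returns 0 on success, -1 otherwise. */
int create_node_on_core(const CoreRegistry *r, const char *kernel_name)
{
    assert(r != 0 && kernel_name != 0);
    for (int i = 0; i < r->count; i++) {
        if (strcmp(r->kernels[i], kernel_name) == 0) {
            return 0;
        }
    }
    return -1;
}
```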

Regards,

Nikhil

Vyom Mishra1 over 1 year ago in reply to Nikhil Dasan

Dear Sir,

I have written the following module:

#include "app_post_proc_module.h"


static void createOutputTensors(vx_context context, vx_user_data_object config, vx_tensor output_tensors[])
{
    vx_size output_sizes[APP_MODULES_MAX_TENSOR_DIMS];
    vx_map_id map_id_config;

    vx_uint32 id;

    tivxTIDLJ7Params *tidlParams;
    sTIDL_IOBufDesc_t *ioBufDesc;

    vxMapUserDataObject(config, 0, sizeof(tivxTIDLJ7Params), &map_id_config,
                      (void **)&tidlParams, VX_READ_ONLY, VX_MEMORY_TYPE_HOST, 0);

    ioBufDesc = (sTIDL_IOBufDesc_t *)&tidlParams->ioBufDesc;
    for(id = 0; id < ioBufDesc->numOutputBuf; id++) {
        output_sizes[0] = ioBufDesc->outWidth[id]  + ioBufDesc->outPadL[id] + ioBufDesc->outPadR[id];
        output_sizes[1] = ioBufDesc->outHeight[id] + ioBufDesc->outPadT[id] + ioBufDesc->outPadB[id];
        output_sizes[2] = ioBufDesc->outNumChannels[id];

        vx_enum data_type = get_vx_tensor_datatype(ioBufDesc->outElementType[id]);
        output_tensors[id] = vxCreateTensor(context, 3, output_sizes, data_type, 0);
    }

  vxUnmapUserDataObject(config, map_id_config);

  return;
}
vx_status app_init_post_proc(vx_context context, TIDLObj *tidlObj,PostprocObj *postprocObj, char *objName,vx_int32 num_cameras)
{

    int i;
    vx_map_id map_id_config;
    sTIDL_IOBufDesc_t *ioBufDesc;
    tivxTIDLJ7Params *tidlParams;
    vx_status status = VX_SUCCESS;
    status = vxMapUserDataObject(tidlObj->config, 0, sizeof(tivxTIDLJ7Params), &map_id_config,
                    (void **)&tidlParams, VX_READ_ONLY, VX_MEMORY_TYPE_HOST, 0);

    if(status == VX_SUCCESS)
    {
        vx_tensor output_tensors[APP_MODULES_MAX_TENSORS];

        /* Only touch the mapped parameters once the map has succeeded */
        ioBufDesc = (sTIDL_IOBufDesc_t *)&tidlParams->ioBufDesc;
        memcpy(&postprocObj->ioBufDesc, ioBufDesc, sizeof(sTIDL_IOBufDesc_t));

        /* One exemplar tensor per TIDL output, replicated into an object array of num_cameras */
        createOutputTensors(context, tidlObj->config, output_tensors);
        for(i = 0; i < tidlObj->num_output_tensors; i++)
        {
            postprocObj->output_tensor_arr[i] = vxCreateObjectArray(context, (vx_reference)output_tensors[i], num_cameras);
            vxReleaseTensor(&output_tensors[i]);
        }

        /* Unmap only what was actually mapped */
        vxUnmapUserDataObject(tidlObj->config, map_id_config);
    }
    return status;
}
vx_status app_update_postproc(PostprocObj *postprocObj, vx_user_data_object config)
{
	vx_status status = VX_SUCCESS;

	/**
	 * Add if needed later
	 */
	return status;

}

vx_status app_create_graph_post_proc(vx_context context, vx_graph graph, TIDLObj *tidlObj, PostprocObj *postprocObj, vx_object_array input_tensors_arr)
{
    vx_status status = VX_SUCCESS;

    /* One tensor per camera: 0 = side left, 1 = side right, 2 = front, 3 = rear */
    vx_tensor input_tensors[APP_MAX_TENSORS];
    vx_tensor output_tensors[APP_MAX_TENSORS];

    vx_int32 i;

    /* Each vxGetObjectArrayItem() call returns a new reference that must be released */
    for(i = 0; i < 4; i++)
    {
        input_tensors[i] = (vx_tensor)vxGetObjectArrayItem(input_tensors_arr, i);
    }

    /* The node has exactly four outputs, so take the four items of the first
       (here assumed only) TIDL output array. Looping over all output arrays and
       writing output_tensors[0..3] each time would overwrite, and leak, the
       references obtained in earlier iterations. */
    for(i = 0; i < 4; i++)
    {
        output_tensors[i] = (vx_tensor)vxGetObjectArrayItem(postprocObj->output_tensor_arr[0], i);
    }

    postprocObj->node = tivxPostprocNode(graph,
            input_tensors[0], input_tensors[1], input_tensors[2], input_tensors[3],
            output_tensors[0], output_tensors[1], output_tensors[2], output_tensors[3]);

    /* Fails, for example, when the kernel is not registered on the host */
    status = vxGetStatus((vx_reference)postprocObj->node);
    if(status == VX_SUCCESS)
    {
        vxSetNodeTarget(postprocObj->node, VX_TARGET_STRING, TIVX_TARGET_DSP1);
        vxSetReferenceName((vx_reference)postprocObj->node, "PostProcNode");
    }

    /* Not replicated: this single node reads all four channels itself */

    /* Release all eight item references; the node holds its own references */
    for(i = 0; i < 4; i++)
    {
        vxReleaseTensor(&input_tensors[i]);
        vxReleaseTensor(&output_tensors[i]);
    }

    return status;
}
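void app_deinit_post_proc(PostprocObj *postprocObj)
{
    vx_int32 i;

    /* Release the object arrays actually created in app_init_post_proc() */
    for(i = 0; i < postprocObj->ioBufDesc.numOutputBuf; i++)
    {
        vxReleaseObjectArray(&postprocObj->output_tensor_arr[i]);
    }
}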
void app_delete_post_proc(PostprocObj *postprocObj)
{
	if(postprocObj->node != NULL)
	{
		vxReleaseNode(&postprocObj->node);
	}
}

Can you please confirm whether the buffers are mapped as you suggested for the non-replicated node?

Apart from this:

I was facing an issue with kernel registration, so to speed up development I integrated my custom kernel into /vision_apps/kernel/img_proc.

I am again facing a kernel registration issue at runtime:

Scaler graph done!
[SCALER-MODULE-CREATE] Exit! 
Pre proc graph done!
TIDL graph done!
I am inside app_create_graph_post_proc
  Before Output Mapping in Post-proc !
 i=0
 Before tivxPostprocNode !
    118.534026 s:  VX_ZONE_ERROR:[tivxCreateNodeByKernelName:136] Call to vxGetKernelByName failed; kernel may not be registered
tivxPostProcNode Done!
    118.534081 s:  VX_ZONE_ERROR:[vxSetReferenceName:646] Invalid reference
 Post Proc graph done!
BEFORE pipeline
set pipeline setup! 
Begin : vxSetGraphScheduleConfig
End : vxSetGraphScheduleConfig
set node parame num
Pipeline params setup done!
exit create graph
App Create Graph Done! CAM from Daemon 
Verify is next! 
Vyom Grapy verify SUCCESS!

What could be the possible reasons for this?

I ran vision_apps_init.sh, but due to restrictions, I am unable to share the logs.

However, I couldn't find any VX_ZONE errors in the log after running vision_apps_init.sh. (Earlier, with the custom kernel in the TIOVX folder, I was seeing errors related to custom kernel registration on the core; now the custom kernel is in a different location.)

Please suggest any improvements.

Thanks and Regards,

Vyom Mishra

Nikhil Dasan over 1 year ago in reply to Nikhil Dasan

Hi,

Nikhil Dasan said:
The same target file should have an Add function, which is called when target registration happens. Could you add a log in it and check whether this function is being called?

As mentioned above, could you put prints in the Add function on both the host and target sides and ensure that both are printing?

This way we can confirm that the kernel is registered on both the host and target sides.

After that, you shouldn't see the error below:

118.534026 s: VX_ZONE_ERROR:[tivxCreateNodeByKernelName:136] Call to vxGetKernelByName failed; kernel may not be registered
tivxPostProcNode Done!
118.534081 s: VX_ZONE_ERROR:[vxSetReferenceName:646] Invalid reference
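This error comes from the host side: node creation looks the kernel up by name, and the lookup fails if the host never registered it. A toy model of that lookup (plain C, not TIOVX code; the table and function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HOST_KERNELS 8

/* Toy host-side kernel table. */
static const char *host_kernels[MAX_HOST_KERNELS];
static int host_kernel_count = 0;

/* Stand-in for the host-side Add (registration) function.
 * Returns 0 on success, -1 if the toy table is full. */
int host_add_kernel(const char *name)
{
    assert(name != NULL);
    if (host_kernel_count >= MAX_HOST_KERNELS) {
        return -1;
    }
    host_kernels[host_kernel_count++] = name;
    return 0;
}

/* Stand-in for looking a kernel up by name at node creation. Returns NULL
 * if the host never registered it, the case the log reports as
 * "kernel may not be registered". */
const char *host_find_kernel(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < host_kernel_count; i++) {
        if (strcmp(host_kernels[i], name) == 0) {
            return host_kernels[i];
        }
    }
    return NULL;
}
```

The same failure plausibly explains the "Invalid reference" from vxSetReferenceName(): it is handed a node that was never created.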

Regards,

Nikhil
