Other Parts Discussed in Thread: TDA4VM
Hi,
I'm trying to run inference with a custom UNet-style semantic segmentation network on the TDA4VM. I'm using version 09_02_09 of edgeai-tidl-tools, both in the Docker container and on the board. I cannot publicly share the ONNX file of the network, but:
1. It uses the ONNX ops Div, Sub, Conv, Relu, MaxPool, Resize, Concat, Slice, ConvTranspose and ArgMax.
2. No errors are encountered when compiling the model using the TIDLCompilationProvider and the following provider options:
provider_options = {
    'tidl_tools_path': os.environ['TIDL_TOOLS_PATH'],
    'artifacts_folder': artifact_path,
    'advanced_options:quant_params_proto_path': onnx_path[:-len('.onnx')] + '_qparams.prototxt',
    'platform': 'J7',
    'debug_level': 2,
    'tensor_bits': 8,
    'advanced_options:calibration_frames': num_calibration_images,
    'advanced_options:calibration_iterations': 2,
    'accuracy_level': 1,
    'advanced_options:add_data_convert_ops': 1,
}
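For context, here is a minimal sketch of how options like these are typically handed to ONNX Runtime for compilation. It assumes the TI build of onnxruntime that ships with edgeai-tidl-tools; the function names `build_compile_options` and `compile_model` and the `calibration_inputs` argument are hypothetical helpers, not part of my actual script.

```python
import os


def build_compile_options(onnx_path, artifact_path, num_calibration_images):
    """Return TIDL compilation options mirroring the ones listed above."""
    stem, _ext = os.path.splitext(onnx_path)
    return {
        "tidl_tools_path": os.environ["TIDL_TOOLS_PATH"],
        "artifacts_folder": artifact_path,
        "advanced_options:quant_params_proto_path": stem + "_qparams.prototxt",
        "platform": "J7",
        "debug_level": 2,
        "tensor_bits": 8,
        "advanced_options:calibration_frames": num_calibration_images,
        "advanced_options:calibration_iterations": 2,
        "accuracy_level": 1,
        "advanced_options:add_data_convert_ops": 1,
    }


def compile_model(onnx_path, artifact_path, calibration_inputs):
    """Create a compilation session and feed it the calibration frames.

    calibration_inputs: iterable of dicts mapping input name -> array
    (hypothetical argument; the preprocessing is not shown here).
    """
    # Imported lazily because this needs the TI build of onnxruntime.
    import onnxruntime as rt

    calibration_inputs = list(calibration_inputs)
    options = build_compile_options(
        onnx_path, artifact_path, len(calibration_inputs)
    )
    session = rt.InferenceSession(
        onnx_path,
        providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
        provider_options=[options, {}],
        sess_options=rt.SessionOptions(),
    )
    # Each run feeds one calibration frame to the compiler.
    for feed in calibration_inputs:
        session.run(None, feed)
```

Keeping the option dictionary in its own function makes it easy to print or diff the exact values used for a given set of artifacts.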
Below are the relevant parts of the compilation log:
tidl_tools_path = /home/root/tidl_tools artifacts_folder = dnn2s/dnn/tidl_artifacts tidl_tensor_bits = 8 debug_level = 2 num_tidl_subgraphs = 16 tidl_denylist = tidl_denylist_layer_name = tidl_denylist_layer_type = tidl_allowlist_layer_name = model_type = tidl_calibration_accuracy_level = 7 tidl_calibration_options:num_frames_calibration = 10 tidl_calibration_options:bias_calibration_iterations = 1 mixed_precision_factor = -1.000000 model_group_id = 0 power_of_2_quantization = 2 ONNX QDQ Enabled = 0 enable_high_resolution_optimization = 0 pre_batchnorm_fold = 1 add_data_convert_ops = 1 output_feature_16bit_names_list = m_params_16bit_names_list = m_single_core_layers_names_list = reserved_compile_constraints_flag = 1601 ti_internal_reserved_1 = ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options****** Supported TIDL layer type --- Div -- /model/img_norm/Div [... more supported layers ...] Preliminary subgraphs created = 1 Final number of subgraphs created are : 1, - Offloaded Nodes - 94, Total Nodes - 94 INFORMATION -- [TIDL_ResizeLayer] Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION -- [TIDL_ResizeLayer] Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION -- [TIDL_ResizeLayer] Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. 
INFORMATION -- [TIDL_ResizeLayer] Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION -- [TIDL_ResizeLayer] Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION -- [TIDL_ResizeLayer] Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. SUGGESTION -- [TIDL_Deconv2DLayer] Please change to Upsample/Resize if possible. Upsample/Resize will be more efficient. ALLOWLISTING : Argmax Layer: Warning : only keepdims = 1 supported for Argmax Layer - forcing it to be 1 -- file info - tidl_onnxImport.cpp , TIDL_onnxMapArgmaxBaseParams , 2093 Running runtimes graphviz - /home/root/tidl_tools/tidl_graphVisualiser_runtimes.out dnn2s/dnn/tidl_artifacts/allowedNode.txt dnn2s/dnn/tidl_artifacts/tempDir/graphvizInfo.txt dnn2s/dnn/tidl_artifacts/tempDir/runtimes_visualization.svg *** In TIDL_createStateImportFunc *** Compute on node : TIDLExecutionProvider_TIDL_0_0 0, Div, 2, 1, image, /model/img_norm/Div_output_0 [...more nodes...] 
Input tensor name - image Output tensor name - 3 Graph Domain TO version : 18In TIDL_onnxRtImportInit subgraph_name=subgraph_0 Layer 0, subgraph id subgraph_0, name=3 Layer 1, subgraph id subgraph_0, name=image In TIDL_runtimesOptimizeNet: LayerIndex = 96, dataIndex = 95 ************** Frame index 1 : Running float import ************* In TIDL_runtimesPostProcessNet In TIDL_runtimesPostProcessNet 1 In TIDL_runtimesPostProcessNet 2 In TIDL_runtimesPostProcessNet 3 INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_5/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_4/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_3/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_2/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_1/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. 
INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_0/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. SUGGESTION: [TIDL_Deconv2DLayer] /model/segmentation_decoder_depth_to_space/ConvTranspose Please change to Upsample/Resize if possible. Upsample/Resize will be more efficient. **************************************************** ** 7 WARNINGS 0 ERRORS ** **************************************************** In TIDL_runtimesPostProcessNet 4 ************ in TIDL_subgraphRtCreate ************ The soft limit is 2048 The hard limit is 2048 MEM: Init ... !!! MEM: Init ... Done !!! 0.0s: VX_ZONE_INIT:Enabled 0.6s: VX_ZONE_ERROR:Enabled 0.7s: VX_ZONE_WARNING:Enabled 0.3209s: VX_ZONE_INIT:[tivxInit:185] Initialization Done !!! -------------------------------------------- TIDL Memory size requiement (record wise): MemRecNum , Space , Attribute , Alignment , Size(KBytes), BasePtr 0 , DDR Cacheable , Persistent , 128, 19.25 , 0x00000000 1 , DDR Cacheable , Persistent , 128, 0.64 , 0x00000000 2 , DDR Cacheable , Scratch , 128, 16.00 , 0x00000000 3 , DDR Cacheable , Scratch , 128, 4.00 , 0x00000000 4 , DDR Cacheable , Scratch , 128, 56.00 , 0x00000000 5 , DDR Cacheable , Persistent , 128, 1292.66 , 0x00000000 6 , DDR Cacheable , Scratch , 128, 191370.24, 0x00000000 7 , DDR Cacheable , Scratch , 128, 0.12 , 0x00000000 8 , DDR Cacheable , Scratch , 128, 157440.12, 0x00000000 9 , DDR Cacheable , Scratch , 128, 78723.00, 0x00000000 10 , DDR Cacheable , Persistent , 128, 832.70 , 0x00000000 11 , DDR Cacheable , Scratch , 128, 512.25 , 0x00000000 12 , DDR Cacheable , Persistent , 128, 0.12 , 0x00000000 13 , DDR Cacheable , Persistent , 128, 101336.45, 0x00000000 14 , DDR Cacheable , Persistent , 128, 0.00 , 0x00000000 -------------------------------------------- Total memory size requirement 
(space wise): Mem Space , Size(KBytes) DDR Cacheable, 531603.56 -------------------------------------------- NOTE: Memory requirement in host emulation can be different from the same on EVM To get the actual TIDL memory requirement make sure to run on EVM with debugTraceLevel = 2 -------------------------------------------- TIDL init call from ivision API -------------------------------------------- TIDL Memory size requiement (record wise): MemRecNum , Space , Attribute , Alignment , Size(KBytes), BasePtr 0 , DDR Cacheable , Persistent , 128, 19.25 , 0x77d00000 1 , DDR Cacheable , Persistent , 128, 0.64 , 0xa234d000 2 , DDR Cacheable , Scratch , 128, 16.00 , 0xa1cca000 3 , DDR Cacheable , Scratch , 128, 4.00 , 0xa234c000 4 , DDR Cacheable , Scratch , 128, 56.00 , 0x6e7ea000 5 , DDR Cacheable , Persistent , 128, 1292.66 , 0x6e6a6000 6 , DDR Cacheable , Scratch , 128, 191370.24, 0x0051d000 7 , DDR Cacheable , Scratch , 128, 0.12 , 0xa234b000 8 , DDR Cacheable , Scratch , 128, 157440.12, 0xf6b5c000 9 , DDR Cacheable , Scratch , 128, 78723.00, 0xf1e7b000 10 , DDR Cacheable , Persistent , 128, 832.70 , 0x6c05a000 11 , DDR Cacheable , Scratch , 128, 512.25 , 0x5c8c6000 12 , DDR Cacheable , Persistent , 128, 0.12 , 0xa234a000 13 , DDR Cacheable , Persistent , 128, 101336.45, 0xebb84000 14 , DDR Cacheable , Persistent , 128, 0.00 , 0xa1c69000 -------------------------------------------- Total memory size requirement (space wise): Mem Space , Size(KBytes) DDR Cacheable, 531603.56 -------------------------------------------- NOTE: Memory requirement in host emulation can be different from the same on EVM To get the actual TIDL memory requirement make sure to run on EV2024-07-22 19:49:21.058246943 [W:onnxruntime:, execution_frame.cc:835 VerifyOutputSizes] Expected shape from model of {1,384,640} does not match actual shape of {1,1,1,1,384,640} for output 3 M with debugTraceLevel = 2 -------------------------------------------- Alg Init for Layer # - 1 [...more layers...] 
PREEMPTION: Adding a new priority object for targetPriority = 2, handle = 0x7f0977d00000 PREEMPTION: Now total number of priority objects = 1 at priorityId = 2, with new memRec of base = 0x7f09a234a000 and size = 128 PREEMPTION: Requesting context memory addr for handle 0x7f0977d00000, return Addr = 0x7f09482ab1b8 ************ TIDL_subgraphRtCreate done ************ ******* In TIDL_subgraphRtInvoke ******** TIDL_activate is called with handle : 77d00000 Core 0 Alg Process for Layer # - 0, layer type 0 Core 0 Alg Process for Layer # - 1, layer type 39 [...more layers...] TIDL_process is completed with handle : 77d00000 Layer, Layer Cycles,kernelOnlyCycles, coreLoopCycles,LayerSetupCycles,dmaPipeupCycles, dmaPipeDownCycles, PrefetchCycles,copyKerCoeffCycles,LayerDeinitCycles,LastBlockCycles, paddingTrigger, paddingWait,LayerWithoutPad,LayerHandleCopy, BackupCycles, RestoreCycles,Multic7xContextCopyCycles, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, [...more layers...] Sub Graph Stats 582.000000 8441229.000000 2428.000000 ******* TIDL_subgraphRtInvoke done ******** ********** Frame Index 1 : Running float inference ********** Graph Domain TO version : 18******* In TIDL_subgraphRtInvoke ******** Core 0 Alg Process for Layer # - 0, layer type 0 [...more layers...] ******* TIDL_subgraphRtInvoke done ******** ********** Frame Index 2 : Running float inference ********** Graph Domain TO version : 18******* In TIDL_subgraphRtInvoke ******** Core 0 Alg Process for Layer # - 0, layer type 0 [...more of the same...] Processing config file #0 : /home/root/dnn2s/dnn/tidl_artifacts/tempDir/subgraph_0_tidl_io_.qunat_stats_config.txt Freeing memory for user provided Net ----------------------- TIDL Process with REF_ONLY FLOW ------------------------ # 0 . .. T 8395.28 .... ..... ... .... ..... # 1 . .. T 7889.41 .... ..... ... .... ..... # 2 . .. T 7859.25 .... ..... ... .... ..... # 3 . .. T 7913.51 .... ..... ... 
.... ..... # 4 . .. T 7820.78 .... ..... ... .... ..... # 5 . .. T 7886.22 .... ..... ... .... ..... # 6 . .. T 7904.02 .... ..... ... .... ..... # 7 . .. T 7866.40 .... ..... ... .... ..... # 8 . .. T 7957.14 .... ..... ... .... ..... # 9 . .. T 7941.50 .... ..... ... .... ..... Processing config file #0 : /home/root/dnn2s/dnn/tidl_artifacts/tempDir/subgraph_0_tidl_io_.qunat_stats_config.txt Freeing memory for user provided Net ----------------------- TIDL Process with REF_ONLY FLOW ------------------------ # 0 . .. T 5099.36 .... ..... ... .... ..... # 1 . .. T 4679.27 .... ..... ... .... ..... # 2 . .. T 4680.17 .... ..... ... .... ..... # 3 . .. T 4732.99 .... ..... ... .... ..... # 4 . .. T 4708.00 .... ..... ... .... ..... # 5 . .. T 4765.57 .... ..... ... .... ..... # 6 . .. T 4709.29 .... ..... ... .... ..... # 7 . .. T 4685.30 .... ..... ... .... ..... # 8 . .. T 4758.10 .... ..... ... .... ..... # 9 . .. T 4759.16 .... ..... ... .... ..... ------------------ Network Compiler Traces ----------------------------- successful Memory allocation successful Workload Creation 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, Sum of Layer Cycles 0 Sub Graph Stats 622.000000 7787849.000000 1115.000000 ******* TIDL_subgraphRtInvoke done ******** ********** Frame Index 10 : Running fixed point mode for calibration ********** In TIDL_runtimesPostProcessNet In TIDL_runtimesPostProcessNet 1 In TIDL_runtimesPostProcessNet 2 In TIDL_runtimesPostProcessNet 3 Parameters unavailable, running calibration! ~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~ ***************** Calibration iteration number 0 started ************************ Parameters unavailable, running calibration! ~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~ ***************** Calibration iteration number 0 completed ************************ Parameters unavailable, running calibration! 
Output network quant params prototxt file path: dnn2s/dnn/tidl_artifacts/tempDir/subgraph_0_tidl_net_quant_params.prototxt Calibrated Quant Parameters stored in protoTxt format INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_5/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_4/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_3/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_2/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_1/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. INFORMATION: [TIDL_ResizeLayer] /model/segmentation_decoder/up_layer_0/upsample/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize. 
SUGGESTION: [TIDL_Deconv2DLayer] /model/segmentation_decoder_depth_to_space/ConvTranspose Please change to Upsample/Resize if possible. Upsample/Resize will be more efficient. **************************************************** ** 7 WARNINGS 0 ERRORS ** **************************************************** In TIDL_runtimesPostProcessNet 4 ************ in TIDL_subgraphRtDelete ************ TIDL_deactivate is called with handle : 77d00000 PREEMPTION: Removing priroty object with handle = 0x7f0977d00000 and targetPriority = 2, Number of obejcts left are = 0, removed object with base = 0x7f09a234a000 and size =128 MEM: Deinit ... !!! MEM: Alloc's: 25 alloc's of 653532697 bytes MEM: Free's : 25 free's of 653532697 bytes MEM: Open's : 0 allocs of 0 bytes MEM: Deinit ... Done !!!
3. When evaluating in the Docker container using the TIDLExecutionProvider (host emulation), the outputs look good: they are similar to the ground truth and to the float32 results obtained from the ONNX file alone with the CPUExecutionProvider. Below are the relevant parts of the log:
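To put a number on "similar", a comparison along the following lines can be used. This is only a sketch in plain Python: `pixel_agreement` and `per_class_iou` are hypothetical helper names, and the masks are assumed to hold one class index per pixel. Because the masks are flattened first, extra size-1 dimensions in one of them do not matter.

```python
def _flatten(mask):
    """Flatten a nested list (or anything with .tolist(), e.g. a NumPy array)."""
    if hasattr(mask, "tolist"):
        mask = mask.tolist()
    if not isinstance(mask, (list, tuple)):
        return [mask]
    flat = []
    for item in mask:
        flat.extend(_flatten(item))
    return flat


def pixel_agreement(mask_a, mask_b):
    """Fraction of pixels on which two class-index masks agree."""
    a, b = _flatten(mask_a), _flatten(mask_b)
    if len(a) != len(b) or not a:
        raise ValueError("masks must be non-empty and have the same pixel count")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def per_class_iou(mask_a, mask_b):
    """Intersection over union for every class present in either mask."""
    a, b = _flatten(mask_a), _flatten(mask_b)
    if len(a) != len(b):
        raise ValueError("masks must have the same pixel count")
    result = {}
    for cls in sorted(set(a) | set(b)):
        inter = sum(1 for x, y in zip(a, b) if x == cls and y == cls)
        union = sum(1 for x, y in zip(a, b) if x == cls or y == cls)
        result[cls] = inter / union
    return result
```

Running the same two functions on the emulation output and on the board output for the same input image gives a direct measure of how far the two diverge.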
libtidl_onnxrt_EP loaded 0x55a89d3a0540 artifacts_folder = dnn2s/dnn debug_level = 2 target_priority = 0 max_pre_empt_delay = 340282346638528859811704183484516925440.000000 Final number of subgraphs created are : 1, - Offloaded Nodes - 94, Total Nodes - 94 In TIDL_createStateInfer Compute on node : TIDLExecutionProvider_TIDL_0_0 ************ in TIDL_subgraphRtCreate ************ The soft limit is 2048 The hard limit is 2048 MEM: Init ... !!! MEM: Init ... Done !!! 0.0s: VX_ZONE_INIT:Enabled 0.7s: VX_ZONE_ERROR:Enabled 0.8s: VX_ZONE_WARNING:Enabled 0.3224s: VX_ZONE_INIT:[tivxInit:185] Initialization Done !!! -------------------------------------------- TIDL Memory size requiement (record wise): MemRecNum , Space , Attribute , Alignment , Size(KBytes), BasePtr 0 , DDR Cacheable , Persistent , 128, 19.25 , 0x00000000 1 , DDR Cacheable , Persistent , 128, 0.64 , 0x00000000 2 , DDR Cacheable , Scratch , 128, 16.00 , 0x00000000 3 , DDR Cacheable , Scratch , 128, 448.00 , 0x00000000 4 , DDR Cacheable , Scratch , 128, 7968.00 , 0x00000000 5 , DDR Cacheable , Persistent , 128, 24090.16, 0x00000000 6 , DDR Cacheable , Scratch , 128, 15.62 , 0x00000000 7 , DDR Cacheable , Scratch , 128, 24682.50, 0x00000000 8 , DDR Cacheable , Scratch , 128, 79300.12, 0x00000000 9 , DDR Cacheable , Scratch , 128, 29538.38, 0x00000000 10 , DDR Cacheable , Persistent , 128, 832.70 , 0x00000000 11 , DDR Cacheable , Scratch , 128, 512.25 , 0x00000000 12 , DDR Cacheable , Persistent , 128, 0.12 , 0x00000000 13 , DDR Cacheable , Persistent , 128, 25206.34, 0x00000000 14 , DDR Cacheable , Persistent , 128, 0.00 , 0x00000000 -------------------------------------------- Total memory size requirement (space wise): Mem Space , Size(KBytes) DDR Cacheable, 192630.09 -------------------------------------------- NOTE: Memory requirement in host emulation can be different from the same on EVM To get the actual TIDL memory requirement make sure to run on EVM with debugTraceLevel = 2 
-------------------------------------------- TIDL init call from ivision API -------------------------------------------- TIDL Memory size requiement (record wise): MemRecNum , Space , Attribute , Alignment , Size(KBytes), BasePtr 0 , DDR Cacheable , Persistent , 128, 19.25 , 0x1e3b4000 1 , DDR Cacheable , Persistent , 128, 0.64 , 0x489fe000 2 , DDR Cacheable , Scratch , 128, 16.00 , 0x4837b000 3 , DDR Cacheable , Scratch , 128, 448.00 , 0x18005000 4 , DDR Cacheable , Scratch , 128, 7968.00 , 0x10031000 5 , DDR Cacheable , Persistent , 128, 24090.16, 0xc2879000 6 , DDR Cacheable , Scratch , 128, 15.62 , 0x4830d000 7 , DDR Cacheable , Scratch , 128, 24682.50, 0xc105e000 8 , DDR Cacheable , Scratch , 128, 79300.12, 0xbc2ec000 9 , DDR Cacheable , Scratch , 128, 29538.38, 0xba613000 10 , DDR Cacheable , Persistent , 128, 832.70 , 0x02f2d000 11 , DDR Cacheable , Scratch , 128, 512.25 , 0x02eac000 12 , DDR Cacheable , Persistent , 128, 0.12 , 0x489fd000 13 , DDR Cacheable , Persistent , 128, 25206.34, 0xb8d75000 14 , DDR Cacheable , Persistent , 128, 0.00 , 0x489fc000 -------------------------------------------- Total memory size requirement (space wise): Mem Space , Size(KBytes) DDR Cacheable, 192630.09 -------------------------------------------- NOTE: Memory requirement in host e2024-07-22 20:02:01.939497478 [W:onnxruntime:, execution_frame.cc:835 VerifyOutputSizes] Expected shape from model of {1,384,640} does not match actual shape of {1,1,1,1,384,640} for output 3 mulation can be different from the same on EVM To get the actual TIDL memory requirement make sure to run on EVM with debugTraceLevel = 2 -------------------------------------------- Alg Init for Layer # - 2 [...more layers...] 
PREEMPTION: Adding a new priority object for targetPriority = 2, handle = 0x7f381e3b4000 PREEMPTION: Now total number of priority objects = 1 at priorityId = 2, with new memRec of base = 0x7f38489fd000 and size = 128 PREEMPTION: Requesting context memory addr for handle 0x7f381e3b4000, return Addr = 0x7f37f3fbb1b8 ************ TIDL_subgraphRtCreate done ************ ******* In TIDL_subgraphRtInvoke ******** TIDL_activate is called with handle : 1e3b4000 Core 0 Alg Process for Layer # - 2, layer type 29 Processing Layer # - 2 [...more layers...] Sub Graph Stats 520.000000 4272583.000000 1974.000000 ******* TIDL_subgraphRtInvoke done ******** ******* In TIDL_subgraphRtInvoke ******** [...more of the same...] ************ in TIDL_subgraphRtDelete ************ TIDL_deactivate is called with handle : 1e3b4000 PREEMPTION: Removing priroty object with handle = 0x7f381e3b4000 and targetPriority = 2, Number of obejcts left are = 0, removed object with base = 0x7f38489fd000 and size =128 MEM: Deinit ... !!! MEM: Alloc's: 25 alloc's of 227251277 bytes MEM: Free's : 25 free's of 227251277 bytes MEM: Open's : 0 allocs of 0 bytes MEM: Deinit ... Done !!!
However, after transferring the model artifacts to the SoC and running inference there, the outputs look completely wrong (randomly spotted segmentation masks). Below are the relevant parts of the log when running inference on the chip:
[W:onnxruntime:, execution_frame.cc:835 VerifyOutputSizes] Expected shape from model of {1,384,640} does not match actual shape of {1,1,1,1,384,640} for output 3
libtidl_onnxrt_EP loaded 0x27f0a4f0
artifacts_folder = dnn
debug_level = 2
target_priority = 0
max_pre_empt_delay = 340282346638528859811704183484516925440.000000
Final number of subgraphs created are : 1, - Offloaded Nodes - 94, Total Nodes - 94
In TIDL_createStateInfer
Compute on node : TIDLExecutionProvider_TIDL_0_0
************ in TIDL_subgraphRtCreate ************
APP: Init ... !!!
MEM: Init ... !!!
MEM: Initialized DMA HEAP (fd=5) !!!
MEM: Init ... Done !!!
IPC: Init ... !!!
IPC: Init ... Done !!!
REMOTE_SERVICE: Init ... !!!
REMOTE_SERVICE: Init ... Done !!!
245418.251062 s: GTC Frequency = 200 MHz
APP: Init ... Done !!!
245418.254235 s: VX_ZONE_INIT:Enabled
245418.254239 s: VX_ZONE_ERROR:Enabled
245418.254241 s: VX_ZONE_WARNING:Enabled
245418.256859 s: VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-0
245418.256937 s: VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-1
245418.257008 s: VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-2
245418.257065 s: VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-3
245418.257069 s: VX_ZONE_INIT:[tivxInitLocal:136] Initialization Done !!!
245418.261792 s: VX_ZONE_INIT:[tivxHostInitLocal:101] Initialization Done for HOST !!!
************ TIDL_subgraphRtCreate done ************
******* In TIDL_subgraphRtInvoke ********
Layer, Layer Cycles,kernelOnlyCycles, coreLoopCycles,LayerSetupCycles,dmaPipeupCycles, dmaPipeDownCycles, PrefetchCycles,copyKerCoeffCycles,LayerDeinitCycles,LastBlockCycles, paddingTrigger, paddingWait,LayerWithoutPad,LayerHandleCopy, BackupCycles, RestoreCycles,Multic7xContextCopyCycles,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
2, 278984, 160355, 201443, 10115, 14062, 0, 0, 0, 0, 0, 6032, 1, 0, 5227, 0, 0, 0,
[...more layers...]
Sum of Layer Cycles 18261825
Sub Graph Stats 1087.000000 21002.000000 1118.000000
******* TIDL_subgraphRtInvoke done ********
******* In TIDL_subgraphRtInvoke ********
[...more of the same...]
************ in TIDL_subgraphRtDelete ************
245418.851722 s: VX_ZONE_INIT:[tivxHostDeInitLocal:115] De-Initialization Done for HOST !!!
245418.856110 s: VX_ZONE_INIT:[tivxDeInitLocal:204] De-Initialization Done !!!
APP: Deinit ... !!!
REMOTE_SERVICE: Deinit ... !!!
REMOTE_SERVICE: Deinit ... Done !!!
IPC: Deinit ... !!!
IPC: DeInit ... Done !!!
MEM: Deinit ... !!!
DDR_SHARED_MEM: Alloc's: 7 alloc's of 29839696 bytes
DDR_SHARED_MEM: Free's : 7 free's of 29839696 bytes
DDR_SHARED_MEM: Open's : 0 allocs of 0 bytes
MEM: Deinit ... Done !!!
APP: Deinit ... Done !!!
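The VerifyOutputSizes warning appears in all three logs (compilation, emulation and board), with an expected shape of {1,384,640} and an actual shape of {1,1,1,1,384,640}. The sketch below only checks that two such shapes differ by size-1 dimensions, in which case a plain reshape to the expected shape keeps every element in place. The function name is hypothetical, and the check says nothing about whether the warning is related to the wrong outputs on the board.

```python
def differs_only_by_singletons(actual_shape, expected_shape):
    """True if both shapes are equal once all size-1 dimensions are dropped."""
    def strip(shape):
        return [dim for dim in shape if dim != 1]

    return strip(actual_shape) == strip(expected_shape)


# Shapes taken from the warning in the logs.
actual = (1, 1, 1, 1, 384, 640)
expected = (1, 384, 640)

if differs_only_by_singletons(actual, expected):
    # With a NumPy output array, output.reshape(expected) would be safe here.
    print("shapes differ only by size-1 dimensions")
else:
    print("shapes differ in a non-trivial way; inspect the layout")
```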
I've tried generating traces using debug_level=3, which works in emulation mode in the Docker container, but the evaluation on the board just hangs whenever I select that debug level (this also happens with other kinds of networks).
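So that a hang does not block the session indefinitely, the board-side run can be wrapped in a timeout. The following is a minimal sketch using only the Python standard library; `run_with_timeout` is a hypothetical helper, and the command passed to it is whatever launches the inference script on the board.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, returncode, captured output).

    If the process exceeds timeout_s seconds it is killed, and whatever
    output was captured up to that point is returned.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        partial = exc.output or ""
        # Some Python versions return bytes here even with text=True.
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return False, None, partial
    return True, done.returncode, done.stdout


if __name__ == "__main__":
    # Stand-in command; replace it with the actual inference script.
    finished, code, output = run_with_timeout(
        [sys.executable, "-c", "print('inference placeholder')"], 60
    )
    print("finished:", finished, "return code:", code)
    print(output[-2000:])  # tail of the log, where a hang would show up
```

The tail of the captured output shows the last message printed before the timeout, which helps narrow down where the debug_level=3 run stalls.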
Any help would be greatly appreciated.