AM68A: Problems compiling custom semantic segmentation model

T S

Part Number: AM68A
Other Parts Discussed in Thread: TDA4VL,

I've got a custom ONNX semantic segmentation model that I'm trying to compile using 09_02_06_00 edgeai-tidl-tools for the AM68A/TDA4VL platform. I've integrated it into the "model_configs.py" configuration successfully. I can execute the model inference via `python3 ./onnxrt_ep.py -d -m ss-ort-800k-model-f1` and the resulting output looks good.

However, when I try to compile the model and run it on my x86 platform via TIDL emulation, the compiled version produces functionally incorrect output and seems to be only reporting a single class for the entire test image (I expect to see at least 3 classes appear). I ran the model with the "tensor_bits=32" configuration and the resulting output for that was good.
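One quick way to quantify the "single class" symptom is to count the labels present in each predicted map. The following is a minimal sketch, assuming the model output has already been reduced to a 2-D map of integer class IDs (for example by an argmax over the class axis). `class_histogram` and `looks_collapsed` are hypothetical helper names and are not part of edgeai-tidl-tools.

```python
from collections import Counter
from itertools import chain


def class_histogram(label_map):
    """Count how often each class ID appears in a 2-D label map.

    label_map is a list of rows of integer class IDs; a NumPy array can be
    passed as label_map.tolist().
    """
    return Counter(chain.from_iterable(label_map))


def looks_collapsed(label_map, min_classes=3):
    """Return True when fewer than min_classes distinct classes are present."""
    return len(class_histogram(label_map)) < min_classes


# Toy 2x4 maps standing in for real outputs (not data from the actual model).
float_map = [[0, 0, 1, 2], [0, 1, 1, 2]]
int8_map = [[0, 0, 0, 0], [0, 0, 0, 0]]

print(class_histogram(float_map))
print(looks_collapsed(float_map), looks_collapsed(int8_map))
```

Running the same check on the 32-bit, 16-bit and 8-bit outputs for one test image should make it obvious which configuration loses classes.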

I suspect the problem may be associated with the Resize operator in the model. I get warnings about it when I compile it with "tensor_bits=8":

------------------ Network Compiler Traces -----------------------------                                                                                                                                           
NC running for device: 1                                                                                                                                                                                           
Running with OTF buffer optimizations                                                                                                                                                                              
successful Memory allocation                                                                                                                                                                                       
successful Workload Creation                                                                                                                                                                                       
INFORMATION: [TIDL_ResizeLayer] /unpool6/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool7/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool8/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool9/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool10/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
****************************************************                                                                                                                                                               
**          5 WARNINGS          0 ERRORS          **                                                                                                                                                               
****************************************************

The supported operator list (github.com/.../supported_ops_rts_versions.md) shows that only 'symmetric' Resize is supported, and my model uses "asymmetric". I think this may be the problem. I've tried adding "deny_list":"Resize" to my config, but when I do this and compile, the compilation hangs with "terminate called without an active exception". Here's what I see:

root@088f260a8b34:/home/root/shared_with_docker/edgeai-tidl-tools/examples/osrt_python/ort# python3 ./onnxrt_ep.py -m ss-ort-800k-model-f1 -c
Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']

Running 1 Models - ['ss-ort-800k-model-f1']


Running_Model :  ss-ort-800k-model-f1  


Running shape inference on model /home/root/shared_with_docker/deere-models/800k_model_f1.onnx 

Preliminary subgraphs created = 6 
Final number of subgraphs created are : 6, - Offloaded Nodes - 117, Total Nodes - 122 
floating_model: True
 Graph Domain TO version : 17
 ************** Frame index 1 : Running float import ************* 
****************************************************
**                ALL MODEL CHECK PASSED          **
****************************************************

The soft limit is 2048
The hard limit is 2048
MEM: Init ... !!!
MEM: Init ... Done !!!
 0.0s:  VX_ZONE_INIT:Enabled
 0.5s:  VX_ZONE_ERROR:Enabled
 0.7s:  VX_ZONE_WARNING:Enabled
 0.1397s:  VX_ZONE_INIT:[tivxInit:185] Initialization Done !!!

**********  Frame Index 1 : Running float inference **********
terminate called without an active exception

Why is compilation having issues when I try to deny "Resize"? Do I have any options apart from changing up the Resize parameter in the original model? I'm not 100% certain the Resize operator is the source -- I would expect the tool would report a compilation error if it ran into an unsupported operator configuration but I see no such thing. Are there any other debug steps I can perform to drill down to root cause?
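One debug step that may help here is to confirm which `coordinate_transformation_mode` each Resize node actually carries before suspecting it. The sketch below assumes the `onnx` Python package is installed for the `report` helper. `resize_modes` and `report` are hypothetical names, and the stand-in nodes are fabricated only to show the shape of the result.

```python
from types import SimpleNamespace


def resize_modes(nodes):
    """Map each Resize node name to its coordinate_transformation_mode.

    nodes is any iterable of ONNX NodeProto-like objects (op_type, name,
    attribute). When the attribute is absent, the ONNX default for Resize
    (opset 11 and later) is "half_pixel".
    """
    modes = {}
    for node in nodes:
        if node.op_type != "Resize":
            continue
        mode = "half_pixel"
        for attr in node.attribute:
            if attr.name == "coordinate_transformation_mode":
                mode = attr.s.decode("utf-8")  # string attributes are bytes
        modes[node.name] = mode
    return modes


def report(model_path):
    """Load a model with the onnx package and print each Resize mode."""
    import onnx  # imported lazily so resize_modes stays dependency-free

    model = onnx.load(model_path)
    for name, mode in resize_modes(model.graph.node).items():
        print(f"{name}: {mode}")


# Stand-in nodes (not a real model) to illustrate the result.
fake_nodes = [
    SimpleNamespace(op_type="Conv", name="conv0", attribute=[]),
    SimpleNamespace(
        op_type="Resize",
        name="/unpool6/Resize",
        attribute=[
            SimpleNamespace(name="coordinate_transformation_mode", s=b"asymmetric")
        ],
    ),
    SimpleNamespace(op_type="Resize", name="/unpool7/Resize", attribute=[]),
]
print(resize_modes(fake_nodes))
```

On a real model, `report("path/to/model.onnx")` would list every Resize node, which makes it easier to tell whether all of them or only some use the mode in question.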

over 1 year ago

0 T S over 1 year ago

I bumped "tensor_bits" to 16 and tried that. The output looked good, so perhaps the Resize operator is not the culprit?

0 T S over 1 year ago in reply to T S

I decided to proceed with the 16-bit version of my compiled model. The host TIDL emulation output matched what I expected so I copied things over to my AM68A dev board. When I attempt to run the model on the dev kit, however, I encounter VX_ZONE_ERROR messages that repeat until I interrupt the execution. See the following:

root@am68a-sk-amazing-louse:~/edgeai-tidl-tools/examples/osrt_python/ort# python3 ./onnxrt_ep.py -m ss-ort-800k-model-f1-with-argmax                                                                   
Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']

Running 1 Models - ['ss-ort-800k-model-f1-with-argmax']


Running_Model :  ss-ort-800k-model-f1-with-argmax  

libtidl_onnxrt_EP loaded 0x13707930 
Final number of subgraphs created are : 1, - Offloaded Nodes - 123, Total Nodes - 123 
APP: Init ... !!!
MEM: Init ... !!!
MEM: Initialized DMA HEAP (fd=5) !!!
MEM: Init ... Done !!!
IPC: Init ... !!!
IPC: Init ... Done !!!
REMOTE_SERVICE: Init ... !!!
REMOTE_SERVICE: Init ... Done !!!
  2101.729499 s: GTC Frequency = 200 MHz
APP: Init ... Done !!!
  2101.729885 s:  VX_ZONE_INIT:Enabled
  2101.730093 s:  VX_ZONE_ERROR:Enabled
  2101.730227 s:  VX_ZONE_WARNING:Enabled
  2101.731159 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-0 
  2101.731662 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-1 
  2101.732067 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-2 
  2101.732503 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-3 
  2101.732705 s:  VX_ZONE_INIT:[tivxInitLocal:136] Initialization Done !!!
  2101.733599 s:  VX_ZONE_INIT:[tivxHostInitLocal:101] Initialization Done for HOST !!!
  2101.781268 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
  2101.781530 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
  2101.781596 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
  2101.781730 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
  2101.781803 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
  2101.781896 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
  2101.781954 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
TIDL_RT_OVX: ERROR: Verifying TIDL graph ... Failed !!!
TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
  2102.037263 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
  2102.037312 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
  2102.037347 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
  2102.037375 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
  2102.037399 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
  2102.037428 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
  2102.037451 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
  2102.037527 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
  2102.037554 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
  2102.037572 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
ERROR: Running TIDL graph ... Failed !!!
  2102.217000 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
  2102.217049 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
  2102.217067 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
  2102.217080 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
  2102.217091 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
  2102.217111 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
  2102.217122 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
  2102.217177 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
  2102.217188 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
  2102.217199 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
ERROR: Running TIDL graph ... Failed !!!
  2102.399082 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
  2102.399128 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
  2102.399148 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
  2102.399159 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
  2102.399172 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
  2102.399193 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
  2102.399204 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
  2102.399260 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
  2102.399272 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
  2102.399282 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
ERROR: Running TIDL graph ... Failed !!!
  2102.578604 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
  2102.578649 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
  2102.578668 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
  2102.578680 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
  2102.578693 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
  2102.578712 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
  2102.578722 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
  2102.578780 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
  2102.578791 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
  2102.578800 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
ERROR: Running TIDL graph ... Failed !!!
  2102.767335 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
  2102.767381 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
  2102.767395 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
  2102.767406 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
  2102.767417 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
  2102.767435 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
  2102.767444 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
  2102.767495 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
  2102.767505 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
  2102.767511 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
ERROR: Running TIDL graph ... Failed !!!
  2102.946385 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
  2102.946428 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
  2102.946443 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
  2102.946456 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
  2102.946466 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
  2102.946484 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
  2102.946491 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
  2102.946544 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
  2102.946553 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
  2102.946560 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
ERROR: Running TIDL graph ... Failed !!!
^CTraceback (most recent call last):
  File "/home/root/edgeai-tidl-tools/examples/osrt_python/ort/./onnxrt_ep.py", line 328, in <module>
    run_model(model, mIdx)
  File "/home/root/edgeai-tidl-tools/examples/osrt_python/ort/./onnxrt_ep.py", line 239, in run_model
    imgs, output, proc_time, sub_graph_time, height, width  = infer_image(sess, input_images, config)
  File "/home/root/edgeai-tidl-tools/examples/osrt_python/ort/./onnxrt_ep.py", line 115, in infer_image
    imgs.append(Image.open(image_files[i]).convert('RGB').resize((width, height), PIL.Image.LANCZOS))
  File "/usr/lib/python3.10/site-packages/PIL/Image.py", line 2192, in resize
    return self._new(self.im.resize(size, resample, box))
KeyboardInterrupt
  2103.147349 s:  VX_ZONE_INIT:[tivxHostDeInitLocal:115] De-Initialization Done for HOST !!!
  2103.151904 s:  VX_ZONE_INIT:[tivxDeInitLocal:204] De-Initialization Done !!!
APP: Deinit ... !!!
REMOTE_SERVICE: Deinit ... !!!
REMOTE_SERVICE: Deinit ... Done !!!
IPC: Deinit ... !!!
IPC: DeInit ... Done !!!
MEM: Deinit ... !!!
DDR_SHARED_MEM: Alloc's: 8 alloc's of 56453764 bytes 
DDR_SHARED_MEM: Free's : 8 free's  of 56453764 bytes 
DDR_SHARED_MEM: Open's : 0 allocs  of 0 bytes 
MEM: Deinit ... Done !!!
APP: Deinit ... Done !!!

What is causing this issue?

I confirmed that running inference for "ss-8610_onnxrt_ade20k32_edgeai-tv_deeplabv3plus_mobilenetv2_edgeailite_512x512_20210308_outby4_onnx" works on the dev kit, so my baseline setup should be good.

0 T S over 1 year ago in reply to T S

I'm no longer confident that setting "tensor_bits" to 16 actually worked, which may explain the failures when running on the target. I just noticed the following being reported in the final stages of compiling the model with onnxrt_ep.py:

 *****************   Calibration iteration number 4 completed ************************                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
Empty prototxt path, running calibration
                                                                                                         
------------------ Network Compiler Traces -----------------------------                                 
NC running for device: 1
Running with OTF buffer optimizations
successful Memory allocation
ERROR : [file:src/gc_map_df_wl.c, func:generateMultipleWLForGroupConv, line:509] Memory limit exceeded for Workload Creation. Max number of Workload Limit per core is 1536                                        
Could not open /home/root/shared_with_docker/edgeai-tidl-tools/model-artifacts/ss-ort-800k-model-f1-with-argmax/tempDir/out_tidl_net/perfSimInfo.bin                                                               
SUGGESTION: [TIDL_BatchNormLayer] /backbone/0/BatchNormalization 16 bits is not optimal in this release. 
SUGGESTION: [TIDL_BatchNormLayer] /backbone/1/BatchNormalization 16 bits is not optimal in this release.
SUGGESTION: [TIDL_BatchNormLayer] /backbone/2_1/BatchNormalization 16 bits is not optimal in this release.                                                                                                         
INFORMATION: [TIDL_ResizeLayer] /unpool6/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool7/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool8/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool9/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool10/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
WARNING: [TIDL_E_DATAFLOW_INFO_NULL] Network compiler returned with error or didn't executed, this model can only be used on PC/Host emulation mode, it is not expected to work on target/EVM.
****************************************************                                                     
**          9 WARNINGS          0 ERRORS          **                                                     
****************************************************                                                     
                                                    
                                            
Completed_Model :     1, Name : ss-ort-800k-model-f1-with-argmax                  , Total time :  160886.49, Offload Time :     635.30 , DDR RW MBs : 0, Output File : py_out_ss-ort-800k-model-f1-with-argmax_ADE_val_00001801.jpg
                                                                                                         
  
MEM: Deinit ... !!!
MEM: Alloc's: 25 alloc's of 908342445 bytes 
MEM: Free's : 25 free's  of 908342445 bytes 
MEM: Open's : 0 allocs  of 0 bytes 
MEM: Deinit ... Done !!!

Specifically, this line indicates there is a problem: "ERROR : [file:src/gc_map_df_wl.c, func:generateMultipleWLForGroupConv, line:509] Memory limit exceeded for Workload Creation. Max number of Workload Limit per core is 1536"
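Because the summary banner in the output above still reports 0 ERRORS, lines like this are easy to miss in a long compile log. A minimal sketch of a log scan follows, assuming the compile output has been captured as text. The patterns are taken only from the messages quoted in this thread, and `scan_compile_log` is a hypothetical helper, not a TI tool.

```python
import re

# Patterns taken from the compile output quoted in this thread.
PATTERNS = {
    "error": re.compile(r"^\s*ERROR\b"),
    "host_only": re.compile(r"TIDL_E_DATAFLOW_INFO_NULL"),
    "missing_file": re.compile(r"^\s*Could not open\b"),
}


def scan_compile_log(text):
    """Return (line_number, tag, line) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for tag, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, tag, line.strip()))
    return hits


sample = """successful Memory allocation
ERROR : [file:src/gc_map_df_wl.c, func:generateMultipleWLForGroupConv, line:509] Memory limit exceeded for Workload Creation. Max number of Workload Limit per core is 1536
WARNING: [TIDL_E_DATAFLOW_INFO_NULL] Network compiler returned with error or didn't executed, this model can only be used on PC/Host emulation mode, it is not expected to work on target/EVM.
**          9 WARNINGS          0 ERRORS          **
"""

for number, tag, line in scan_compile_log(sample):
    print(number, tag, line[:60])
```

Any `host_only` hit suggests the generated artifacts should not be copied to the dev board, going by the wording of the warning itself.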

I'm running all of this inside the GPU-enabled docker container. The image was created using the Dockerfile from the edgeai-tidl-tools repo (https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/scripts/docker/Dockerfile_GPU). The host machine has 128GB of RAM and an NVIDIA RTX A5000 with 24GB of memory, so I'm not sure why it's encountering a Memory Limit error.

Any ideas what is causing the compilation issue?

+1 T S over 1 year ago in reply to T S

Success! The magic parameter in this case is apparently "advanced_options:quantization_scale_type". Setting this to '4' and setting "tensor_bits" to 8 got things to compile and work both on the host and on the target.

For future reference, here's the "optional_options" configuration for my model in the model_configs.py file:

        'optional_options' : 
        {
            'tensor_bits' : 8,
            'advanced_options:quantization_scale_type': 4,
        },
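For anyone repeating this experiment, it may help to generate one config variant per setting tried in this thread and compile each under its own model name. This is only a sketch: `base_entry`, `variants` and `make_variant` are hypothetical names, and the entry shown omits every field of a real model_configs.py entry except the two option keys quoted above.

```python
import copy

# Hypothetical stand-in for one model_configs.py entry; only the option keys
# quoted in this thread are real, the rest of the entry is omitted.
base_entry = {"optional_options": {"tensor_bits": 8}}

# The settings tried over the course of this thread.
variants = {
    "bits32": {"tensor_bits": 32},
    "bits16": {"tensor_bits": 16},
    "bits8": {"tensor_bits": 8},
    "bits8_scale4": {
        "tensor_bits": 8,
        "advanced_options:quantization_scale_type": 4,
    },
}


def make_variant(entry, overrides):
    """Return a deep copy of entry with optional_options updated."""
    new_entry = copy.deepcopy(entry)
    new_entry.setdefault("optional_options", {}).update(overrides)
    return new_entry


configs = {name: make_variant(base_entry, opts) for name, opts in variants.items()}
for name, entry in configs.items():
    print(name, entry["optional_options"])
```

Keeping the variants side by side makes it easier to compare host emulation output across settings before moving anything to the target.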
