AM68A: Problems compiling custom semantic segmentation model

Part Number: AM68A
Other Parts Discussed in Thread: TDA4VL

I have a custom ONNX semantic segmentation model that I'm trying to compile for the AM68A/TDA4VL platform using edgeai-tidl-tools 09_02_06_00. I've added it to the "model_configs.py" configuration, and running inference via `python3 ./onnxrt_ep.py -d -m ss-ort-800k-model-f1` produces good-looking output.

However, when I compile the model and run it on my x86 host via TIDL emulation, the compiled version produces incorrect output: it appears to report a single class for the entire test image, where I expect at least 3 classes. Running the model with the "tensor_bits=32" configuration gives good output.
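A quick way to confirm the "single class" symptom numerically, rather than by eye, is to count the distinct class IDs in the predicted mask. The sketch below is illustrative only: `class_histogram` and `looks_collapsed` are hypothetical helper names, and the two masks are hand-made stand-ins, not output from the model in this thread.

```python
from collections import Counter


def class_histogram(mask):
    """Count how many pixels were assigned to each class ID in a 2-D mask."""
    return Counter(label for row in mask for label in row)


def looks_collapsed(mask, min_classes=3):
    """Return True when the mask holds fewer distinct classes than expected."""
    return len(class_histogram(mask)) < min_classes


# Hand-made stand-ins for a float (reference) mask and an 8-bit mask.
float_mask = [[0, 0, 1], [2, 1, 1]]
int8_mask = [[0, 0, 0], [0, 0, 0]]

print(looks_collapsed(float_mask))  # False: three classes present
print(looks_collapsed(int8_mask))   # True: only one class present
```

The same check can be run on the masks produced with each "tensor_bits" setting to see at which precision the classes disappear.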

I suspect the problem may be associated with the Resize operator in the model.  I get warnings about it when I compile it with "tensor_bits=8":

------------------ Network Compiler Traces -----------------------------                                                                                                                                           
NC running for device: 1                                                                                                                                                                                           
Running with OTF buffer optimizations                                                                                                                                                                              
successful Memory allocation                                                                                                                                                                                       
successful Workload Creation                                                                                                                                                                                       
INFORMATION: [TIDL_ResizeLayer] /unpool6/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool7/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool8/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool9/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
INFORMATION: [TIDL_ResizeLayer] /unpool10/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
****************************************************                                                                                                                                                               
**          5 WARNINGS          0 ERRORS          **                                                                                                                                                               
****************************************************                                                                                                                                                               

The supported operator list (github.com/.../supported_ops_rts_versions.md) shows that only "symmetric" Resize is supported, and my model uses "asymmetric", so I think this may be the problem. I tried adding "deny_list":"Resize" to my config, but when I do this and compile, the compilation hangs with "terminate called without an active exception". Here's what I see:

root@088f260a8b34:/home/root/shared_with_docker/edgeai-tidl-tools/examples/osrt_python/ort# python3 ./onnxrt_ep.py -m ss-ort-800k-model-f1 -c
Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']

Running 1 Models - ['ss-ort-800k-model-f1']


Running_Model :  ss-ort-800k-model-f1  


Running shape inference on model /home/root/shared_with_docker/deere-models/800k_model_f1.onnx 

Preliminary subgraphs created = 6 
Final number of subgraphs created are : 6, - Offloaded Nodes - 117, Total Nodes - 122 
floating_model: True
 Graph Domain TO version : 17
 ************** Frame index 1 : Running float import ************* 
****************************************************
**                ALL MODEL CHECK PASSED          **
****************************************************

The soft limit is 2048
The hard limit is 2048
MEM: Init ... !!!
MEM: Init ... Done !!!
 0.0s:  VX_ZONE_INIT:Enabled
 0.5s:  VX_ZONE_ERROR:Enabled
 0.7s:  VX_ZONE_WARNING:Enabled
 0.1397s:  VX_ZONE_INIT:[tivxInit:185] Initialization Done !!!

**********  Frame Index 1 : Running float inference **********
terminate called without an active exception

Why does compilation fail when I try to deny "Resize"? Do I have any options apart from changing the Resize parameters in the original model? I'm not 100% certain the Resize operator is the source. I would expect the tool to report a compilation error if it ran into an unsupported operator configuration, but I see no such error. Are there any other debug steps I can perform to drill down to the root cause?
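One way to check which Resize variants the model actually contains is to read the `coordinate_transformation_mode` attribute of each Resize node. This is a minimal sketch, assuming the `onnx` Python package is installed for the loading step. `resize_transform_modes` and `load_and_report` are hypothetical helper names, the fallback to "half_pixel" reflects the ONNX default for this attribute when it is absent, and the demo nodes at the bottom are stand-ins rather than nodes from the real model.

```python
from types import SimpleNamespace


def resize_transform_modes(nodes):
    """Map each Resize node name to its coordinate_transformation_mode."""
    modes = {}
    for node in nodes:
        if node.op_type != "Resize":
            continue
        # ONNX uses "half_pixel" when the attribute is not set explicitly.
        mode = "half_pixel"
        for attr in node.attribute:
            if attr.name == "coordinate_transformation_mode":
                mode = attr.s.decode("utf-8")  # string attributes are bytes
        modes[node.name] = mode
    return modes


def load_and_report(model_path):
    """Load an ONNX file and report the mode of every Resize node."""
    import onnx  # imported here so the helper above works without onnx

    model = onnx.load(model_path)
    return resize_transform_modes(model.graph.node)


# Stand-in nodes that mimic the fields read from onnx.NodeProto.
demo_nodes = [
    SimpleNamespace(
        op_type="Resize",
        name="/unpool6/Resize",
        attribute=[SimpleNamespace(name="coordinate_transformation_mode",
                                   s=b"asymmetric")],
    ),
    SimpleNamespace(op_type="Conv", name="/backbone/0/Conv", attribute=[]),
    SimpleNamespace(op_type="Resize", name="/unpool7/Resize", attribute=[]),
]
print(resize_transform_modes(demo_nodes))
```

For the real model, `load_and_report` would be called with the path to the .onnx file, and the result compared against the modes listed in the supported operator document.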

  • I bumped "tensor_bits" to 16 and tried that.  The output looked good, so perhaps the Resize operator is not the culprit?

  • I decided to proceed with the 16-bit version of my compiled model.  The host TIDL emulation output matched what I expected so I copied things over to my AM68A dev board.  When I attempt to run the model on the dev kit, however, I encounter VX_ZONE_ERROR messages that repeat until I interrupt the execution.  See the following:

    root@am68a-sk-amazing-louse:~/edgeai-tidl-tools/examples/osrt_python/ort# python3 ./onnxrt_ep.py -m ss-ort-800k-model-f1-with-argmax                                                                   
    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    
    Running 1 Models - ['ss-ort-800k-model-f1-with-argmax']
    
    
    Running_Model :  ss-ort-800k-model-f1-with-argmax  
    
    libtidl_onnxrt_EP loaded 0x13707930 
    Final number of subgraphs created are : 1, - Offloaded Nodes - 123, Total Nodes - 123 
    APP: Init ... !!!
    MEM: Init ... !!!
    MEM: Initialized DMA HEAP (fd=5) !!!
    MEM: Init ... Done !!!
    IPC: Init ... !!!
    IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
      2101.729499 s: GTC Frequency = 200 MHz
    APP: Init ... Done !!!
      2101.729885 s:  VX_ZONE_INIT:Enabled
      2101.730093 s:  VX_ZONE_ERROR:Enabled
      2101.730227 s:  VX_ZONE_WARNING:Enabled
      2101.731159 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-0 
      2101.731662 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-1 
      2101.732067 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-2 
      2101.732503 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-3 
      2101.732705 s:  VX_ZONE_INIT:[tivxInitLocal:136] Initialization Done !!!
      2101.733599 s:  VX_ZONE_INIT:[tivxHostInitLocal:101] Initialization Done for HOST !!!
      2101.781268 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
      2101.781530 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
      2101.781596 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
      2101.781730 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
      2101.781803 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
      2101.781896 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
      2101.781954 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
    TIDL_RT_OVX: ERROR: Verifying TIDL graph ... Failed !!!
    TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
      2102.037263 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
      2102.037312 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
      2102.037347 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
      2102.037375 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
      2102.037399 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
      2102.037428 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
      2102.037451 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
      2102.037527 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
      2102.037554 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
      2102.037572 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
      2102.217000 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
      2102.217049 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
      2102.217067 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
      2102.217080 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
      2102.217091 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
      2102.217111 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
      2102.217122 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
      2102.217177 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
      2102.217188 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
      2102.217199 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
      2102.399082 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
      2102.399128 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
      2102.399148 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
      2102.399159 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
      2102.399172 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
      2102.399193 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
      2102.399204 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
      2102.399260 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
      2102.399272 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
      2102.399282 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
      2102.578604 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
      2102.578649 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
      2102.578668 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
      2102.578680 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
      2102.578693 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
      2102.578712 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
      2102.578722 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
      2102.578780 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
      2102.578791 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
      2102.578800 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
      2102.767335 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
      2102.767381 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
      2102.767395 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
      2102.767406 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
      2102.767417 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
      2102.767435 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
      2102.767444 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
      2102.767495 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
      2102.767505 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
      2102.767511 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
      2102.946385 s:  VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
      2102.946428 s:  VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
      2102.946443 s:  VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
      2102.946456 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
      2102.946466 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
      2102.946484 s:  VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
      2102.946491 s:  VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
      2102.946544 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
      2102.946553 s:  VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
      2102.946560 s:  VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
    ^CTraceback (most recent call last):
      File "/home/root/edgeai-tidl-tools/examples/osrt_python/ort/./onnxrt_ep.py", line 328, in <module>
        run_model(model, mIdx)
      File "/home/root/edgeai-tidl-tools/examples/osrt_python/ort/./onnxrt_ep.py", line 239, in run_model
        imgs, output, proc_time, sub_graph_time, height, width  = infer_image(sess, input_images, config)
      File "/home/root/edgeai-tidl-tools/examples/osrt_python/ort/./onnxrt_ep.py", line 115, in infer_image
        imgs.append(Image.open(image_files[i]).convert('RGB').resize((width, height), PIL.Image.LANCZOS))
      File "/usr/lib/python3.10/site-packages/PIL/Image.py", line 2192, in resize
        return self._new(self.im.resize(size, resample, box))
    KeyboardInterrupt
      2103.147349 s:  VX_ZONE_INIT:[tivxHostDeInitLocal:115] De-Initialization Done for HOST !!!
      2103.151904 s:  VX_ZONE_INIT:[tivxDeInitLocal:204] De-Initialization Done !!!
    APP: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... Done !!!
    IPC: Deinit ... !!!
    IPC: DeInit ... Done !!!
    MEM: Deinit ... !!!
    DDR_SHARED_MEM: Alloc's: 8 alloc's of 56453764 bytes 
    DDR_SHARED_MEM: Free's : 8 free's  of 56453764 bytes 
    DDR_SHARED_MEM: Open's : 0 allocs  of 0 bytes 
    MEM: Deinit ... Done !!!
    APP: Deinit ... Done !!!
    

    What is causing this issue? 

    I confirmed that running inference for "ss-8610_onnxrt_ade20k32_edgeai-tv_deeplabv3plus_mobilenetv2_edgeailite_512x512_20210308_outby4_onnx" works on the dev kit, so my baseline setup should be good.

  • I'm no longer confident that compiling with "tensor_bits" set to 16 actually succeeded, which may be why the model fails on the target. I just noticed the following in the last stages of the compile output from onnxrt_ep.py:

     *****************   Calibration iteration number 4 completed ************************                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
    Empty prototxt path, running calibration
                                                                                                             
    ------------------ Network Compiler Traces -----------------------------                                 
    NC running for device: 1
    Running with OTF buffer optimizations
    successful Memory allocation
    ERROR : [file:src/gc_map_df_wl.c, func:generateMultipleWLForGroupConv, line:509] Memory limit exceeded for Workload Creation. Max number of Workload Limit per core is 1536                                        
    Could not open /home/root/shared_with_docker/edgeai-tidl-tools/model-artifacts/ss-ort-800k-model-f1-with-argmax/tempDir/out_tidl_net/perfSimInfo.bin                                                               
    SUGGESTION: [TIDL_BatchNormLayer] /backbone/0/BatchNormalization 16 bits is not optimal in this release. 
    SUGGESTION: [TIDL_BatchNormLayer] /backbone/1/BatchNormalization 16 bits is not optimal in this release.
    SUGGESTION: [TIDL_BatchNormLayer] /backbone/2_1/BatchNormalization 16 bits is not optimal in this release.                                                                                                         
    INFORMATION: [TIDL_ResizeLayer] /unpool6/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
    INFORMATION: [TIDL_ResizeLayer] /unpool7/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
    INFORMATION: [TIDL_ResizeLayer] /unpool8/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
    INFORMATION: [TIDL_ResizeLayer] /unpool9/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
    INFORMATION: [TIDL_ResizeLayer] /unpool10/Resize Any resize ratio which is power of 2 and greater than 4 will be placed by combination of 4x4 resize layer and 2x2 resize layer. For example a 8x8 resize will be replaced by 4x4 resize followed by 2x2 resize.
    WARNING: [TIDL_E_DATAFLOW_INFO_NULL] Network compiler returned with error or didn't executed, this model can only be used on PC/Host emulation mode, it is not expected to work on target/EVM.
    ****************************************************                                                     
    **          9 WARNINGS          0 ERRORS          **                                                     
    ****************************************************                                                     
                                                        
                                                
    Completed_Model :     1, Name : ss-ort-800k-model-f1-with-argmax                  , Total time :  160886.49, Offload Time :     635.30 , DDR RW MBs : 0, Output File : py_out_ss-ort-800k-model-f1-with-argmax_ADE_val_00001801.jpg
                                                                                                             
      
    MEM: Deinit ... !!!
    MEM: Alloc's: 25 alloc's of 908342445 bytes 
    MEM: Free's : 25 free's  of 908342445 bytes 
    MEM: Open's : 0 allocs  of 0 bytes 
    MEM: Deinit ... Done !!!
    

    Specifically, this line indicates there is a problem: "ERROR : [file:src/gc_map_df_wl.c, func:generateMultipleWLForGroupConv, line:509] Memory limit exceeded for Workload Creation. Max number of Workload Limit per core is 1536"

    I'm running all of this inside the GPU-enabled docker container. The image was created using the Dockerfile from the edgeai-tidl-tools repo (https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/scripts/docker/Dockerfile_GPU). The host machine has 128GB of RAM and an NVIDIA RTX A5000 with 24GB of memory, so I'm not sure why it's encountering a memory limit error.

    Any ideas what is causing the compilation issue?  

  • Success!  The magic parameter in this case is apparently "advanced_options:quantization_scale_type".  Setting this to '4' and setting "tensor_bits" to 8 got things to compile and work both on the host and on the target.

    For future reference, here's the "optional_options" configuration for my model in the model_configs.py file:

            'optional_options' : 
            {
                'tensor_bits' : 8,
                'advanced_options:quantization_scale_type': 4,
            },
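For anyone hitting the same symptom: in the 16-bit compile log earlier in this thread, the summary banner reported 0 ERRORS even though the log contained an "ERROR :" line and the TIDL_E_DATAFLOW_INFO_NULL warning saying the artifacts were host-only. A small scan of the captured log can surface those lines before copying artifacts to the board. This is a sketch only: `find_compile_problems` is a hypothetical helper, and the marker strings are simply the ones quoted in this thread, not an exhaustive list from the tools.

```python
# Substrings taken from the compile log quoted in this thread.
HOST_ONLY_MARKERS = (
    "TIDL_E_DATAFLOW_INFO_NULL",
    "Memory limit exceeded for Workload Creation",
)


def find_compile_problems(log_text):
    """Return log lines suggesting the artifacts will not run on target."""
    problems = []
    for line in log_text.splitlines():
        stripped = line.strip()
        # Catch explicit ERROR lines plus known markers anywhere in the line,
        # since the warning can end up appended to another log line.
        if stripped.startswith("ERROR") or any(
            marker in stripped for marker in HOST_ONLY_MARKERS
        ):
            problems.append(stripped)
    return problems


sample_log = """successful Memory allocation
ERROR : [file:src/gc_map_df_wl.c, func:generateMultipleWLForGroupConv, line:509] Memory limit exceeded for Workload Creation. Max number of Workload Limit per core is 1536
SUGGESTION: [TIDL_BatchNormLayer] /backbone/0/BatchNormalization 16 bits is not optimal in this release.
WARNING: [TIDL_E_DATAFLOW_INFO_NULL] Network compiler returned with error or didn't executed, this model can only be used on PC/Host emulation mode, it is not expected to work on target/EVM.
**          9 WARNINGS          0 ERRORS          **
"""

for problem in find_compile_problems(sample_log):
    print(problem)
```

Redirecting the output of the compile run to a file and passing its contents to the helper gives a quick go/no-go check that does not rely on the summary banner.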