SK-AM62A-LP: How to convert a model trained with a custom dataset

Part Number: SK-AM62A-LP

Tool/software:

Hello,

I'm contacting you because I want to convert a model from the model zoo using my own dataset. My dataset is multi-label, i.e. I have several labels (more than 3) per image. I have a checkpoint obtained with a PyTorch model (resnet50). Can I use these weights directly?

I was able to run the example given in the GitHub repository https://github.com/TexasInstruments/edgeai-tensorlab/tree/main?tab=readme-ov-file

But in all the examples there is only 1 label per image, and I don't know how or where I can change that. Do you have any suggestions?

I'm using the r9.1 branch of the repository.

Thanks,

Anaïs

  • Hello Anaïs,

    Good question. I'm interpreting this to mean you have one image, and you want 3 separate labels, such that you are effectively running 3 classifiers on the same input. Is this correct? This implies your model would have multiple outputs.

    The main resnet model from our model zoo would have a 1x1000 (or 1x1001) output for classifying 1000 separate classes. You can certainly modify a model like this to have multiple outputs, and you can start from the same set of pretrained weights (PTH file). This will require you to dig into some of the training code.

     

    Which example, exactly? edgeai-tensorlab organizes several of our repos into one to resolve some challenges with versioning and dependencies. Typically edgeai-modelmaker is used as the top-level tool for training, but it supports a limited set of models (unfortunately not including resnet50).

    Your task looks something like this:

    Summarized, you need to change the model and the dataset handling to allow 3 outputs.
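
    As a rough sketch of the model-side change (this is not our exact training code; the head sizes and checkpoint path below are placeholders), you can keep the pretrained backbone and swap the final fc for multiple classifier heads:

```python
import torch
import torch.nn as nn

class MultiHead(nn.Module):
    """Drop-in replacement for resnet50's final `fc`: one Linear per label."""
    def __init__(self, in_features, classes_per_head):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(in_features, n) for n in classes_per_head)

    def forward(self, x):
        # one logit vector per classification task
        return tuple(head(x) for head in self.heads)

# Hypothetical usage with a torchvision resnet50 and your existing checkpoint:
# model = torchvision.models.resnet50()
# model.load_state_dict(torch.load("checkpoint.pth"))    # load BEFORE replacing fc
# model.fc = MultiHead(model.fc.in_features, (5, 4, 3))  # e.g. 3 tasks
```

    Your loss then becomes a sum over heads (e.g. one CrossEntropyLoss per output), and the dataset's __getitem__ returns a tuple of labels instead of a single one.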

    Edit: I recently posted an FAQ on our model zoo and supporting tools. Your case is an extension of comment #3, in which you are working with a model-zoo model that TI has not actually modified.

    BR,
    Reese

  • Hello Reese,

    Thank you for your response. I'm currently having trouble understanding how to compile my ONNX model to generate the model-artifacts folder. I followed your advice and am now using the GitHub documentation at

    https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/custom_model_evaluation.md .

    I was able to successfully install the Docker for edgeai-tidl-tools.

    From what I understand, the compilation steps are as follows:

    "[...]

    • Update the inference script to compile the model with TIDL acceleration by passing required compilation options. Refer here for detailed documentation on all the required and optional parameters.
    • Run the python code with compilation options using representative input data samples for model compilation and calibration.
      • Default options expects minimum 20 input data samples (calibration_frames) for calibration. User can set as minimum as 1 also for quick model compilation (This may impact the accuracy of fixed point inference).
    • At the end of model compilation step, model-artifacts for inference will be generated in user specified path.
    • Create OSRT inference session with TIDL acceleration option for running inference with generated model artifacts in the above step.
      • User can either update existing python code written for compilation or copy the compilation code to new file and update with accelerated inference option.
    • Refer the below tables for creating OSRT sessions with Compilation and Accelerated inference options.

    "

    However, I am unsure which script I need to run for the compilation step; the first point is not very clear to me. Is it /home/root/examples/osrt_python/ort/onnxrt_ep.py? If so, I'm encountering an error when trying to run this script: "AttributeError: 'InferenceSession' object has no attribute 'get_TI_benchmark_data'".

    Could you confirm whether I am working with the correct Python script?

    Best regards,

    Anaïs

  • Hello Anaïs,

    However, I am unsure which script I need to run for the compilation step; the first point is not very clear to me. Is it /home/root/examples/osrt_python/ort/onnxrt_ep.py? If so, I'm encountering an error when trying to run this script: "AttributeError: 'InferenceSession' object has no attribute 'get_TI_benchmark_data'".

    Yes, you are on the right track. This is the correct script to use within edgeai-tidl-tools. You can run this script with the -c option to compile, the -d option to run on CPU (so no TIDL in any form), or neither to run with TIDL (including emulation of the C7x if you're on an x86 PC).

    "'InferenceSession' object has no attribute 'get_TI_benchmark_data'"

    You may be encountering some similar issue as another active thread: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1458023/processor-sdk-am62a-edgeai-tidl-tool-compile-error

    get_TI_benchmark_data is a function that is only available in the TIDL build of onnxruntime. Perhaps you have the mainline upstream version of onnxruntime installed as well. Can you show me the output of the following command?

    pip3 freeze | grep -i "onnx"

    For example, my python3.10 virtual environment for TIDL 9.2 looks like the following:

    caffe2onnx==1.0.2
    onnx==1.13.0
    onnx_graphsurgeon @ git+https://github.com/NVIDIA/TensorRT@68b5072fdb9df6b6edab1392b02a705394b2e906#subdirectory=tools/onnx-graphsurgeon
    onnxruntime-tidl @ file:///home/reese/1-edgeai/1-ti-tools/1-tidl-tools/10.0-tidl-tools/onnxruntime_tidl-1.14.0%2B10000000-cp310-cp310-linux_x86_64.whl#sha256=5efb894e39d3ca988e0644a1d0e9e34eab34c1a1f374d0085b9900febbb9724d
    onnxsim==0.4.35
    -e git+https://github.com/TexasInstruments/edgeai-tidl-tools@b7b07738bcd9afc7f74580217e81c307668a84ed#egg=tidl_onnx_model_optimizer&subdirectory=scripts/osrt_model_tools/onnx_tools/tidl-onnx-model-optimizer

    Yours should have only onnxruntime-tidl, and not an ordinary onnxruntime. I recommend a virtual environment to keep our version of onnxruntime separate; otherwise, I think the default import will pick up upstream onnxruntime.
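
    If you want to check programmatically (a quick stdlib sketch, not part of our tools), you can list which onnxruntime distributions are installed in the active environment:

```python
from importlib.metadata import distributions

def onnxruntime_packages():
    """Names of installed distributions that look like an onnxruntime build."""
    names = (dist.metadata["Name"] or "" for dist in distributions())
    return sorted({n for n in names if "onnxruntime" in n.lower()})

print(onnxruntime_packages())  # want: ['onnxruntime-tidl'] and nothing else
```

    If both onnxruntime and onnxruntime-tidl show up, uninstall the upstream one inside the virtual environment before retrying.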

    BR,
    Reese

  • Hello Reese,

    I'm glad to hear that I'm on the right track, thank you!
    I'm using Docker to run the scripts and I successfully installed onnxruntime-tidl:

    However, I'm still encountering an issue when running the onnxrt_ep.py script.

    I have exported the TIDL_TOOLS_PATH, and when I check the directory, I can see the libtidl_onnxrt_EP.so file. Additionally, when I print all the environment variables using os.environ, the correct path for TIDL is shown:


    Do you have any suggestions?
    Thanks,

    Anaïs

  • I found the solution: I needed to export LD_LIBRARY_PATH.
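
    For reference, the exports that fixed it for me look like this (the tidl_tools location is just my setup; adjust the path to yours):

```shell
# point TIDL at its tools, then make the dynamic loader see libtidl_onnxrt_EP.so
export TIDL_TOOLS_PATH="${HOME}/tidl_tools"
export LD_LIBRARY_PATH="${TIDL_TOOLS_PATH}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```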

  • I'm reaching out because the inference code runs perfectly on my dataset and custom model. However, when I add the compile argument, I encounter a segmentation fault.

    Do you know what might be causing this?

    Thanks,

    Anaïs

  • Hi Anaïs,

    Glad you were able to resolve some of the pathing issues above -- you found the right solution.

    However, when I add the compile argument, I encounter a segmentation fault.

    Hmm, hard to say based on these logs; I cannot tell at what point this failed.

    It looks like when you run without any option and it tries to use TIDL for inference, it is not finding the right files, such that the whole network runs on CPU. Some of the printouts in the lines before your compile command seem odd, like having 32687 subgraphs for your model (ideally 1, but 16 at maximum due to a software limit). Perhaps you tried to compile before and it failed, but still produced a few intermediate files... otherwise I'd have expected your initial inference to fail immediately due to missing artifacts.

    I'll need more logs from the compile command to suggest a solution. Please run your compilation with the following settings and share the log -- ideally, also share the artifacts, especially the SVGs under artifacts/tempDir.

    • export TIDL_RT_DEBUG=1  # in the Linux env
    • "debug_level": 2  # either in 'optional_options' as part of your model_config, or by setting the global variable in common_utils.py

    It might also be informative to run the compile command through gdb, and share the callstack / backtrace ('bt' in gdb shell) to see where we hit this seg fault.

    BR,
    Reese

  • Hello Reese,

    Thank you for your response. Thanks to your advice, I noticed that I had made a mistake in a folder path used to save the artifacts. However, I am still encountering a segmentation fault; here are the logs:

    error_compile_tidl.txt
    root@908b5978fca7:/home/root/examples/osrt_python/ort# gdb --args python3 onnxrt_ep.py --compile
    GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
    Copyright (C) 2022 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <https://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
        <http://www.gnu.org/software/gdb/documentation/>.
    
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from python3...
    (No debugging symbols found in python3)
    (gdb) run
    Starting program: /usr/bin/python3 onnxrt_ep.py --compile
    warning: Error disabling address space randomization: Operation not permitted
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    [New Thread 0x75e69e000640 (LWP 184)]
    [New Thread 0x75e69b600640 (LWP 185)]
    [New Thread 0x75e69ac00640 (LWP 186)]
    [New Thread 0x75e698200640 (LWP 187)]
    [New Thread 0x75e693800640 (LWP 188)]
    [New Thread 0x75e692e00640 (LWP 189)]
    [New Thread 0x75e68e400640 (LWP 190)]
    [New Thread 0x75e68ba00640 (LWP 191)]
    [New Thread 0x75e689000640 (LWP 192)]
    [New Thread 0x75e686600640 (LWP 193)]
    [New Thread 0x75e683c00640 (LWP 194)]
    [New Thread 0x75e683200640 (LWP 195)]
    [New Thread 0x75e67e800640 (LWP 196)]
    [New Thread 0x75e67be00640 (LWP 197)]
    [New Thread 0x75e679400640 (LWP 198)]
    [New Thread 0x75e676a00640 (LWP 199)]
    [New Thread 0x75e674000640 (LWP 200)]
    [New Thread 0x75e673600640 (LWP 201)]
    [New Thread 0x75e670c00640 (LWP 202)]
    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    /home/root/model-artifacts/model
    
    Running shape inference on model model 
    
    [New Thread 0x75e65f800640 (LWP 203)]
    [New Thread 0x75e65ee00640 (LWP 204)]
    [New Thread 0x75e65e400640 (LWP 205)]
    [New Thread 0x75e65da00640 (LWP 206)]
    [New Thread 0x75e65d000640 (LWP 207)]
    [New Thread 0x75e657e00640 (LWP 208)]
    [New Thread 0x75e657400640 (LWP 209)]
    [New Thread 0x75e656a00640 (LWP 210)]
    [New Thread 0x75e656000640 (LWP 211)]
    [New Thread 0x75e655600640 (LWP 212)]
    [New Thread 0x75e654c00640 (LWP 213)]
    [New Thread 0x75e64be00640 (LWP 214)]
    [New Thread 0x75e64b400640 (LWP 215)]
    tidl_tools_path                                 = /home/root/tidl_tools 
    artifacts_folder                                = /home/root/model-artifacts/model 
    tidl_tensor_bits                                = 8 
    debug_level                                     = 2 
    num_tidl_subgraphs                              = 16 
    tidl_denylist                                   = 
    tidl_denylist_layer_name                        = 
    tidl_denylist_layer_type                         = 
    tidl_allowlist_layer_name                        = 
    model_type                                      =  
    tidl_calibration_accuracy_level                 = 7 
    tidl_calibration_options:num_frames_calibration = 2 
    tidl_calibration_options:bias_calibration_iterations = 5 
    mixed_precision_factor = -1.000000 
    model_group_id = 0 
    power_of_2_quantization                         = 2 
    ONNX QDQ Enabled                                = 0 
    enable_high_resolution_optimization             = 0 
    pre_batchnorm_fold                              = 1 
    add_data_convert_ops                          = 3 
    output_feature_16bit_names_list                 =  
    m_params_16bit_names_list                       =  
    reserved_compile_constraints_flag               = 1601 
    ti_internal_reserved_1                          = 
    
    
     ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options******
    
    Supported TIDL layer type --- [...]
    
    Preliminary subgraphs created = 1 
    Final number of subgraphs created are : 1, - Offloaded Nodes - 124, Total Nodes - 124 
    [Detaching after vfork from child process 216]
    Running runtimes graphviz - /home/root/tidl_tools/tidl_graphVisualiser_runtimes.out /home/root/model-artifacts/model/allowedNode.txt /home/root/model-artifacts/model/tempDir/graphvizInfo.txt /home/root/model-artifacts/model/tempDir/runtimes_visualization.svg 
    *** In TIDL_createStateImportFunc *** 
    Compute on node : TIDLExecutionProvider_TIDL_0_0
      [...]
    
    Input tensor name -  input 
    Output tensor name - 501 
    Output tensor name - output 
    Output tensor name - 499 
    [New Thread 0x75e63be00640 (LWP 221)]
    [New Thread 0x75e63b400640 (LWP 222)]
    [New Thread 0x75e63aa00640 (LWP 223)]
    [New Thread 0x75e63a000640 (LWP 224)]
    [New Thread 0x75e639600640 (LWP 225)]
    [New Thread 0x75e638c00640 (LWP 226)]
    [New Thread 0x75e628200640 (LWP 227)]
    [New Thread 0x75e627800640 (LWP 228)]
    [New Thread 0x75e626e00640 (LWP 229)]
    [New Thread 0x75e626400640 (LWP 230)]
    [New Thread 0x75e625a00640 (LWP 231)]
    [New Thread 0x75e625000640 (LWP 232)]
    [New Thread 0x75e624600640 (LWP 233)]
    [New Thread 0x75e623c00640 (LWP 234)]
    [New Thread 0x75e623200640 (LWP 235)]
    [New Thread 0x75e622800640 (LWP 236)]
    [New Thread 0x75e621e00640 (LWP 237)]
    [New Thread 0x75e621400640 (LWP 238)]
    [New Thread 0x75e620a00640 (LWP 239)]
     Graph Domain TO version : 11In TIDL_onnxRtImportInit subgraph_name=499output501
    Layer 0, subgraph id 499output501, name=501
    Layer 1, subgraph id 499output501, name=output
    Layer 2, subgraph id 499output501, name=499
    Layer 3, subgraph id 499output501, name=input
    In TIDL_runtimesOptimizeNet: LayerIndex = 128, dataIndex = 125 
    WARNING: [...]
    WARNING: [...]
    WARNING: [...]
    
     ************** Frame index 1 : Running float import ************* 
    In TIDL_runtimesPostProcessNet 
    In TIDL_runtimesPostProcessNet 1
    In TIDL_runtimesPostProcessNet 2
    In TIDL_runtimesPostProcessNet 3
    [Detaching after vfork from child process 240]
    [Detaching after vfork from child process 242]
    ****************************************************
    **                ALL MODEL CHECK PASSED          **
    ****************************************************
    
    In TIDL_runtimesPostProcessNet 4
    ************ in TIDL_subgraphRtCreate ************ 
     TIDL_RT_OVX: Set default TIDLRT params done
    Calling appInit() in TIDL-RT!
    The soft limit is 2048
    The hard limit is 2048
    MEM: Init ... !!!
    MEM: Init ... Done !!!
     0.0s:  VX_ZONE_INIT:Enabled
     0.5s:  VX_ZONE_ERROR:Enabled
     0.7s:  VX_ZONE_WARNING:Enabled
    [New Thread 0x75e616a00640 (LWP 249)]
    [New Thread 0x75e616000640 (LWP 250)]
    [New Thread 0x75e615600640 (LWP 251)]
    [New Thread 0x75e614c00640 (LWP 252)]
    [New Thread 0x75e614200640 (LWP 253)]
    [New Thread 0x75e613800640 (LWP 254)]
    [New Thread 0x75e612e00640 (LWP 255)]
    [New Thread 0x75e612400640 (LWP 256)]
    [New Thread 0x75e611a00640 (LWP 257)]
    [New Thread 0x75e611000640 (LWP 258)]
    [New Thread 0x75e610600640 (LWP 259)]
    [New Thread 0x75e60fc00640 (LWP 260)]
    [New Thread 0x75e60f200640 (LWP 261)]
    [New Thread 0x75e60e800640 (LWP 262)]
    [New Thread 0x75e60de00640 (LWP 263)]
    [New Thread 0x75e60d400640 (LWP 264)]
    [New Thread 0x75e60ca00640 (LWP 265)]
    [New Thread 0x75e60c000640 (LWP 266)]
    [New Thread 0x75e60b600640 (LWP 267)]
    [New Thread 0x75e60ac00640 (LWP 268)]
    [New Thread 0x75e60a200640 (LWP 269)]
    [New Thread 0x75e609800640 (LWP 270)]
    [New Thread 0x75e608e00640 (LWP 271)]
    [New Thread 0x75e608400640 (LWP 272)]
     0.6024s:  VX_ZONE_INIT:[tivxInit:185] Initialization Done !!!
    TIDL_RT_OVX: Init ... 
    TIDL_RT_OVX: Mapping config file ...
    TIDL_RT_OVX: Mapping config file ... Done. 37912 bytes
    TIDL_RT_OVX: Tensors, input = 1, output = 3
    Host kernel - 0x75e649c0f658 
    TIDL_RT_OVX: Mapping network file
    TIDL_RT_OVX: Mapping network file... Done 97299008 bytes
    TIDL_RT_OVX: Init done.
    TIDL_RT_OVX: Creating graph ... 
    TIDL_RT_OVX: input_sizes[0] = 896, dim = 224 padL = 0 padR = 0
    TIDL_RT_OVX: input_sizes[1] = 200704, dim = 224 padT = 0 padB = 0
    TIDL_RT_OVX: input_sizes[2] = 3, dim = 3 
    TIDL_RT_OVX: input_sizes[3] = 1, dim = 1 
    TIDL_RT_OVX: input_buffer = 0x75e683232000 150528
    TIDL_RT_OVX: Creating graph ... Done.
    
    --------------------------------------------
    TIDL Memory size requiement (record wise):
    MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr     
    0           , DDR Cacheable       , Persistent  ,  128, 15.25   , 0x00000000
    1           , DDR Cacheable       , Persistent  ,  128, 0.64    , 0x00000000
    2           , DDR Cacheable       , Scratch     ,  128, 16.00   , 0x00000000
    3           , DDR Cacheable       , Scratch     ,  128, 4.00    , 0x00000000
    4           , DDR Cacheable       , Scratch     ,  128, 56.00   , 0x00000000
    5           , DDR Cacheable       , Persistent  ,  128, 930.75  , 0x00000000
    6           , DDR Cacheable       , Scratch     ,  128, 34549.12, 0x00000000
    7           , DDR Cacheable       , Scratch     ,  128, 0.12    , 0x00000000
    8           , DDR Cacheable       , Scratch     ,  128, 4873.25 , 0x00000000
    9           , DDR Cacheable       , Scratch     ,  128, 6500.50 , 0x00000000
    10          , DDR Cacheable       , Persistent  ,  128, 929.20  , 0x00000000
    11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0x00000000
    12          , DDR Cacheable       , Persistent  ,  128, 0.12    , 0x00000000
    13          , DDR Cacheable       , Persistent  ,  128, 95018.69, 0x00000000
    14          , DDR Cacheable       , Persistent  ,  128, 0.08    , 0x00000000
    --------------------------------------------
    Total memory size requirement (space wise):
    Mem Space , Size(KBytes)
    DDR Cacheable, 143405.98
    --------------------------------------------
    NOTE: Memory requirement in host emulation can be different from the same on EVM
          To get the actual TIDL memory requirement make sure to run on EVM with 
          debugTraceLevel = 2
    
    --------------------------------------------
    TIDL init call from ivision API 
    
    --------------------------------------------
    TIDL Memory size requiement (record wise):
    MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr     
    0           , DDR Cacheable       , Persistent  ,  128, 15.25   , 0x9ec8e000
    1           , DDR Cacheable       , Persistent  ,  128, 0.64    , 0xa1b89000
    2           , DDR Cacheable       , Scratch     ,  128, 16.00   , 0x9e009000
    3           , DDR Cacheable       , Scratch     ,  128, 4.00    , 0xa1b88000
    4           , DDR Cacheable       , Scratch     ,  128, 56.00   , 0x92e34000
    5           , DDR Cacheable       , Persistent  ,  128, 930.75  , 0x7be17000
    6           , DDR Cacheable       , Scratch     ,  128, 34549.12, 0xffd77000
    7           , DDR Cacheable       , Scratch     ,  128, 0.12    , 0xa1940000
    8           , DDR Cacheable       , Scratch     ,  128, 4873.25 , 0x5c33d000
    9           , DDR Cacheable       , Scratch     ,  128, 6500.50 , 0xff71d000
    10          , DDR Cacheable       , Persistent  ,  128, 929.20  , 0x79417000
    11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0x86609000
    12          , DDR Cacheable       , Persistent  ,  128, 0.12    , 0xa193f000
    13          , DDR Cacheable       , Persistent  ,  128, 95018.69, 0xf9a52000
    14          , DDR Cacheable       , Persistent  ,  128, 0.08    , 0xa1098000
    --------------------------------------------
    Total memory size requirement (space wise):
    Mem Space , Size(KBytes)
    DDR Cacheable, 143405.98
    --------------------------------------------
    NOTE: Memory requirement in host emulation can be different from the same on EVM
          To get the actual TIDL memory requirement make sure to run on EVM with 
          debugTraceLevel = 2
    
    --------------------------------------------
    Alg Init for Layer # -    1
    [...]
    
    PREEMPTION: Adding a new priority object for targetPriority = 0, handle = 0x75e69ec8e000
    PREEMPTION: Now total number of priority objects = 1 at priorityId = 0,    with new memRec of base = 0x75e6a193f000 and size = 128
    PREEMPTION: Requesting context memory addr for handle 0x75e69ec8e000, return Addr = 0x75e64aa4e678
    TIDL_RT_OVX: Verifying TIDL graph ... Done.
    ************ TIDL_subgraphRtCreate done ************ 
     *******   In TIDL_subgraphRtInvoke  ******** 
    TIDL_RT_OVX: Set default TIDLRT tensor done
    TIDL_RT_OVX: Set default TIDLRT tensor done
    TIDL_RT_OVX: Set default TIDLRT tensor done
    TIDL_RT_OVX: Set default TIDLRT tensor done
    TIDL_RT_OVX: Running Graph ... 
    TIDL_RT_OVX: input_sizes[0] = 896, dim = 224 padL = 0 padR = 0
    TIDL_RT_OVX: input_sizes[1] = 200704, dim = 224 padT = 0 padB = 0
    TIDL_RT_OVX: input_sizes[2] = 3, dim = 3 
    TIDL_RT_OVX: input_sizes[3] = 1, dim = 1 
    TIDL_RT_OVX : Memcpy Input Buffer 
    TIDL_RT_OVX: input_buffer = 0x75e683232000 150528
    TIDL_RT_OVX: memset_out_tensor_tidlrt_tiovx  ... Done.
    TIDL_activate is called with handle : 9ec8e000 
    Core 0 Alg Process for Layer # [...]
    
    Thread 53 "python3" received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x75e616a00640 (LWP 249)]
    0x000075e64889109d in void TIDL_refInnerProductParamBitDepth<float, float, float>(TIDL_Obj*, int, void*, void*, void*, float*, float*, float*, int, tidlInnerProductBuffParams_t*) [clone .isra.0] () from /home/root/tidl_tools/libvx_tidl_rt.so
    
    (gdb) backtrace
    #0  0x000075e64889109d in void TIDL_refInnerProductParamBitDepth<float, float, float>(TIDL_Obj*, int, void*, void*, void*, float*, float*, float*, int, tidlInnerProductBuffParams_t*) [clone .isra.0] ()
       from /home/root/tidl_tools/libvx_tidl_rt.so
    #1  0x000075e648896e75 in TIDL_innerProductRefProcess(TIDL_Obj*, sTIDL_AlgLayer_t*, sTIDL_Layer_t*, sTIDL_InnerProductParams_t*, tidlInnerProductBuffParams_t*, void*, void*, void*) ()
       from /home/root/tidl_tools/libvx_tidl_rt.so
    #2  0x000075e648897f2c in TIDL_innerProductProcessNew(TIDL_NetworkCommonParams*, sTIDL_AlgLayer_t*, sTIDL_Layer_t*, void**, void**, int) () from /home/root/tidl_tools/libvx_tidl_rt.so
    #3  0x000075e6488dbae2 in WorkloadRefExec_Process(TIDL_Obj*, TIDL_NetworkCommonParams*, sWorkloadUnit_t*, sTIDL_AlgLayer_t*, sTIDL_Layer_t*, void**, void**, int, int) ()
       from /home/root/tidl_tools/libvx_tidl_rt.so
    #4  0x000075e648838904 in TIDL_process(IVISION_Obj*, IVISION_BufDescList*, IVISION_BufDescList*, IVISION_InArgs*, IVISION_OutArgs*) () from /home/root/tidl_tools/libvx_tidl_rt.so
    #5  0x000075e648835f7a in tivxKernelTIDLProcess () from /home/root/tidl_tools/libvx_tidl_rt.so
    #6  0x000075e648824411 in ownTargetKernelExecute () from /home/root/tidl_tools/libvx_tidl_rt.so
    #7  0x000075e648822bb7 in ownTargetNodeDescNodeExecuteTargetKernel () from /home/root/tidl_tools/libvx_tidl_rt.so
    #8  0x000075e6488235d9 in ownTargetTaskMain () from /home/root/tidl_tools/libvx_tidl_rt.so
    #9  0x000075e64883137c in tivxTaskMain () from /home/root/tidl_tools/libvx_tidl_rt.so
    #10 0x000075e6a1efcac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
    #11 0x000075e6a1f8da04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
    
    

    It's coming from libvx_tidl_rt.so (the file exists in my tidl_tools path). But when I look in the output path (artifacts folder), several files/folders have been created:

    Does this mean that the compilation has still been completed and that I can use this model on the AM62A board?

    Thanks,

    Anaïs

  • In my previous message, I forgot to mention that I have hidden the architecture details in the log file. I replaced the architecture details with [...].

    Best regards,

    Anaïs

  • Hello,

    Thanks for the information and screenshots, very helpful. The backtrace especially tells me this is happening fairly deep within the TIDL import tool.

    Does this mean that the compilation has still been completed and that I can use this model on the AM62A board?

    No, it looks like compilation did not complete. Those tempDir files are a working directory; once compilation completes, a few of the files are copied back up into the artifacts/ directory. Sometimes these intermediate binaries are sufficient, but I doubt that is the case here.

    From the logs, TIDL hit an error during compilation while trying to run the floating-point implementation of an InnerProduct (or similar matrix-multiplication) layer. This is part of the calibration and quantization process. It's hard to say immediately why this failed.

    Core 0 Alg Process for Layer # [...]

    Is this the last layer that ran, layer 0? It may help to open the ...tidl_net.bin.svg in a browser and send a screenshot of the last layer named in the "Alg Process for Layer #" print. You can hover your mouse over the node -- this will provide more info on that layer only. I'm looking for something like the image at this link: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/tidl_osr_debug.md#example-visualization-1. Feel free to edit out anything that would expose more details than you are comfortable sharing.

    I will take the opportunity to note that the 10.0 SDK and TIDL tools made many improvements to robustness and logging during compilation and inference. If you can, I would suggest upgrading. It is quite likely the issue you are seeing has been resolved in a more recent release.

    BR,
    Reese

  • Hello Reese,

    Thanks for your answer. I’ve hidden the architecture, but the last layer to run is: Core 0 Alg Process for Layer # - 77.
    It’s a dense layer, and the information in the file ...tidl_net.bin.svg for this layer is:

    last_layer_executed.txt
    Layer 77: TIDL_InnerProductLayer "output_netFormat"
    weightdElementSizeInBits=32
    multiCoreMode=TIDL_NOT_MULTI_CORE
    strideOffsetMethod=TIDL_StrideOffsetTopLeft
    activationType=0 numInRows=1 numInCols=2048 numOutCols=6 transA=0 transB=1
    weightsQ=0 weightScale=1.000000 zeroWeightValue=0
    biasScale=1.000000 biasQ=0 inDataQ=0 interDataQ=0
    biasB=0
    weights:0x5ca6a40 bias:0x5cb2a40
    actParams:
       actType=TIDL_NoAct
       slopeScale=1.000000 clipMin/Max=(0.000000,0.000000)
    Inputs:
       [75][2]
    Outputs:
       [77] numDim=0 dims=[1,1,1,2048,1,6] elementType=TIDL_SinglePrecFloat padH/W=[0,0] batchPadH/W=[0,0] numBatchH/W=[1,1]
    pitch=[12288,12288,12288,6,6]
       dataQ=0 roundBits=0
       min/maxValue=(0,0) min/maxTensorValue=(0.000000,0.000000)
       tensorScale=1.000000
    


    However, I noticed that the shape has too many dimensions. I'm not sure why, but starting from the input the dimensions are [1,1,1,3,224,224], and the first two dimensions are not usually present. Do you think the error could be related to this?

    Regarding SDK 10.0, we can't use it. We’ve already tried it, but there are some incompatibilities with Python or certain libraries.


    Best regards,

    Anaïs

  • Hello Anaïs,

    Reese is out this week and won't be able to respond until next week.

    Regards,

    Jianzhong

  • Hi Anaïs,

    Thanks for your patience while I was out.

    Is this the last layer of your network? I notice this layer is called 'output_netFormat', and '_netFormat' is a suffix I know TIDL will sometimes add to a name.

    • Similarly, is this layer an output of the model? Is that output also used as an input to another layer?

    I see that the weight datatype is weightdElementSizeInBits=32, and the output datatype is similarly TIDL_SinglePrecFloat, so we're in floating point at this stage. Ordinarily, a compiled model would have those weights as 8 or 16 bits, with an output type of Char or Short depending on the quantization mode. This supports my theory that the model-import process is failing during the initial phase of calibration, in which it runs in 32-bit mode.

    The 6 dimensions can sometimes cause an issue when there are layers that need to run on the Arm core (unaccelerated), followed by more layers on the C7x with TIDL for acceleration. We use 6-D representations (and several other variables, like pitch) to program the accelerator's data-movement mechanisms, which inherently support 6D. I don't think this is the issue, though.

    Ignoring those first two [1,1,...], are the other dimensions consistent with your model? I have seen error modes in which intermediate tensor shapes are wrong (visible in the SVG), which causes issues later on. Looks like the output should be [2048,1,6] in the original model.

    I would also suggest deny-listing this layer to verify it is the offender. See the doc here:

    If you are comfortable, you can also share a version of your model; random weights are okay. You can share it with me via direct message to protect IP. Alternatively, share a screenshot of the configuration + tensor input/output shapes for this failing layer 77.

    BR,

    Reese

  • Hello Reese,

    Thanks for your answer. I'm okay with sending you an ONNX model in a direct message; I think that will make it easier for us to move forward.
    Thanks,

    Anaïs

  • Understood, I've sent you a message to kick off the process. Please share your model and the model_config python code associated with it. Thanks!