TDA4VM: ONNX Runtime + TIDL: Issue on Element WIse Add operator

F_Fontana

Part Number: TDA4VM

Good morning,

I'm testing ONNX Runtime with the C7x DSP support released in the latest SDK, v7.3.
I have a model that works fine with the standalone TIDL runtime but fails during conversion with ONNX Runtime (with the TIDL patch) in Python, with the following error:

Traceback (most recent call last):
File "/home/ubuntu/OnnxRT.py", line 105, in setup_onnx_session
sess_options=session_options)
File "/home/ubuntu/texas/onnxrt_tidl/capi/onnxruntime_inference_collection.py", line 283, in __init__
self._create_inference_session(providers, provider_options)
File "/home/ubuntu/texas/onnxrt_tidl/capi/onnxruntime_inference_collection.py", line 315, in _create_inference_session
sess.initialize_session(providers, provider_options)
[libprotobuf FATAL /home/a0393754/work/7_03_00_03/protobuf-3.11.3/src/google/protobuf/repeated_field.h:1537] CHECK failed: (index) < (current_size_):
2021-05-14 15:37:53.380394760 [E:onnxruntime:, inference_session.cc:1311 operator()] Exception during initialization: CHECK failed: (index) < (current_size_):

I found that the error is triggered by one specific Add operator: the model contains 5 element-wise Adds, but only the last one triggers it. I can't see why, since it looks just like the others, and the model converts fine with the standalone TIDL ModelImport tool.

Here is the ONNX Runtime session configuration I'm using:

required_options = {
    "tidl_tools_path": "<path_to_tidl_tool_dir>",
    "artifacts_folder": "<path_to_artifacts_dir>",
}
optional_options = {
    "platform": "J7",
    "version": "7.3",
    "tensor_bits": 8,
    "debug_level": 1,
    "max_num_subgraphs": 16,
    "accuracy_level": 0,
    "advanced_options:calibration_frames": 1,
    "advanced_options:calibration_iterations": 3,
    "advanced_options:quantization_scale_type": 1,
    "advanced_options:high_resolution_optimization": 0,
    "advanced_options:pre_batchnorm_fold": 0,
    "ti_internal_nc_flag": 1601,
}
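
For reference, the attached source later in this thread shows `TIDL_populateOptions` receiving every option as a string key/value pair and parsing numbers with `std::stringstream`. Below is a minimal sketch of merging the two dictionaries above into one string-valued mapping. `build_provider_options` and `REQUIRED_KEYS` are hypothetical names for illustration, not part of ONNX Runtime or TIDL, and the paths are placeholders.

```python
REQUIRED_KEYS = ("tidl_tools_path", "artifacts_folder")


def build_provider_options(required, optional=None):
    """Merge required and optional TIDL options into one str -> str dict."""
    missing = [key for key in REQUIRED_KEYS if not required.get(key)]
    if missing:
        raise ValueError("missing required option(s): " + ", ".join(missing))
    merged = dict(optional or {})
    merged.update(required)  # required entries win if a key appears twice
    # Stringify everything, since the C++ side reads each value from a string.
    return {str(key): str(value) for key, value in merged.items()}


options = build_provider_options(
    {
        "tidl_tools_path": "/path/to/tidl_tools",   # placeholder path
        "artifacts_folder": "/path/to/artifacts",   # placeholder path
    },
    {"tensor_bits": 8, "debug_level": 1, "advanced_options:calibration_frames": 1},
)
print(options["tensor_bits"])
```

Failing early on a missing required key mirrors the checks in the C++ code, which rejects an empty `tidl_tools_path` or `artifacts_folder`.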

I was able to extract the operator into a standalone ONNX model; it is attached and should allow you to reproduce the issue.

Is the problem linked to the operator itself (an unsupported combination of parameters, for example), or is there an issue in ONNX Runtime? As said above, the model (and even the Add operator alone) is successfully converted by the TIDL ImportTool.

If you need any other information or file, let me know.

onnxrt_add_issue.zip

Thanks in advance,
Federico

over 5 years ago

0 Anand Pathak over 5 years ago

TI__Genius 9065 points

Hi Federico,

Can you share the entire network you are trying to run?

Regards,

Anand

0 F_Fontana over 5 years ago in reply to Anand Pathak

Prodigy 140 points

Hi Anand,

Sorry for the delay. I still can't share the entire model, but I can at least share a bigger portion of it, if that would help.

I'll share it as soon as possible.

Regards,

Federico

0 Anand Pathak over 5 years ago in reply to F_Fontana

TI__Genius 9065 points

Yes Federico, that will be helpful.

Regards,

Anand

0 F_Fontana over 5 years ago in reply to Anand Pathak

Prodigy 140 points

Hi Anand,

Sorry for the delay. I'm attaching the new model, which shows the same error (see below). Again, it works fine with the TIDL ImportTool.

onnxrt_dim_issue_ext.zip

0.0s: VX_ZONE_INIT:Enabled
0.23s: VX_ZONE_ERROR:Enabled
0.36s: VX_ZONE_WARNING:Enabled

tidl_tools_path = /home/ubuntu/TIDL/tidl_tools
artifacts_folder = /home/ubuntu/models/onnxrt_tidl_artifacts
tidl_tensor_bits = 8
debug_level = 1
num_tidl_subgraphs = 16
tidl_denylist =
tidl_calibration_accuracy_level = 0
tidl_calibration_options:num_frames_calibration = 1
tidl_calibration_options:bias_calibration_iterations = 1
power_of_2_quantization = 3
enable_high_resolution_optimization = 0
pre_batchnorm_fold = 0
output_feature_16bit_names_list =
m_params_16bit_names_list =
reserved_compile_constraints_flag = 1601
Parsing ONNX Model
model_proto 0x7fff6c7312c0

Preliminary subgraphs created = 1
Final number of subgraphs created are : 1, - Offloaded Nodes - 14, Total Nodes - 14
Compile TIDLExecutionProvider_TIDL_0_0
Compiling Sub ONNX Model
0, Conv, 2, 1, 461, 462
1, Conv, 2, 1, 438, 439
2, Conv, 2, 1, 432, 433
3, Resize, 3, 1, 433, 449
4, Add, 2, 1, 449, 450
5, BatchNormalization, 5, 1, 450, 451
6, Relu, 1, 1, 451, 452
7, Conv, 3, 1, 452, 454
8, Relu, 1, 1, 454, 455
9, Conv, 2, 1, 455, 456
10, Resize, 3, 1, 456, 472
11, Add, 2, 1, 472, 473
12, BatchNormalization, 5, 1, 473, 474
13, Relu, 1, 1, 474, 475
Input tensor name - 461
Input tensor name - 449
[libprotobuf FATAL /home/a0393754/work/7_03_00_03/protobuf-3.11.3/src/google/protobuf/repeated_field.h:1537] CHECK failed: (index) < (current_size_):
2021-05-27 10:49:04.383249780 [E:onnxruntime:, inference_session.cc:1311 operator()] Exception during initialization: CHECK failed: (index) < (current_size_):
Traceback (most recent call last):
File "/home/ubuntu/onnxrt.py", line 105, in setup_onnx_session
sess_options=session_options)
File "/home/ubuntu/TIDL/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
self._create_inference_session(providers, provider_options)
File "/home/ubuntu/TIDL/onnxruntime/capi/onnxruntime_inference_collection.py", line 315, in _create_inference_session
sess.initialize_session(providers, provider_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: CHECK failed: (index) < (current_size_):

Regards,

Federico

0 Anand Pathak over 5 years ago in reply to F_Fontana

TI__Genius 9065 points

Hi Federico,

Can you confirm whether your original network has multiple inputs, or whether it is getting partitioned into subgraphs with multiple inputs? I have identified a bug that occurs when a subgraph has multiple inputs, and it is what causes the above example to fail; it is not related to the Add operator as such.

I will fix this bug as part of the upcoming release. I just want to make sure this is the reason your original network is failing before I provide a solution.
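
To illustrate what "multiple inputs to a subgraph" means: a tensor is a subgraph input when some node in the subgraph consumes it, no node in the subgraph produces it, and it is not a constant (initializer). The following is a minimal sketch of that rule on a made-up graph. The node tuples, tensor names and the `subgraph_inputs` helper are hypothetical; they are not taken from TIDL or from the attached model.

```python
def subgraph_inputs(nodes, initializers):
    """Return tensors consumed but not produced inside the subgraph, in first-use order."""
    produced = {out for _op, _ins, outs in nodes for out in outs}
    seen, result = set(), []
    for _op, ins, _outs in nodes:
        for name in ins:
            if name in produced or name in initializers or name in seen:
                continue  # intermediate tensor, constant, or already listed
            seen.add(name)
            result.append(name)
    return result


# Each node is (op_type, inputs, outputs); two branches joined by an Add.
nodes = [
    ("Conv",   ["a", "w0"], ["a1"]),
    ("Conv",   ["b", "w1"], ["b1"]),
    ("Resize", ["b1", "roi", "scales"], ["b2"]),
    ("Add",    ["b2", "a1"], ["sum"]),
    ("Relu",   ["sum"], ["out"]),
]
initializers = {"w0", "w1", "roi", "scales"}

print(subgraph_inputs(nodes, initializers))  # ['a', 'b']
```

Here `b2` feeds the Add but is produced by the Resize, so it is an intermediate tensor and must not be reported as an input.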

Regards,

Anand

0 F_Fontana over 5 years ago in reply to Anand Pathak

Prodigy 140 points

Hi Anand,

Actually, the original model has one input and multiple (3) outputs.

As for the subgraphs: since the model converts successfully with the TIDL ImportTool (all the layers are supported by TIDL), ONNX Runtime compilation creates just a single subgraph.

By the way, I noticed that the names of the input layers identified during conversion are often incorrect, though I'm not sure whether this is just a side effect of another issue. For example, in the terminal output I pasted in the previous message, the input names should be:

432, 438, 461

Instead, just before the error, it prints:

F_Fontana said:
Input tensor name - 461
Input tensor name - 449

while 449 shouldn't be an input.
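
This can be checked mechanically against the log from the previous message. The sketch below assumes that the last two columns of each "Compiling Sub ONNX Model" row are the node's first input and first output tensor; that is a reading of the log, not a documented format. `produced_by` is a hypothetical helper, and `reported_inputs` holds the two names printed before the crash.

```python
LOG_ROWS = """\
0, Conv, 2, 1, 461, 462
1, Conv, 2, 1, 438, 439
2, Conv, 2, 1, 432, 433
3, Resize, 3, 1, 433, 449
4, Add, 2, 1, 449, 450
5, BatchNormalization, 5, 1, 450, 451
6, Relu, 1, 1, 451, 452
7, Conv, 3, 1, 452, 454
8, Relu, 1, 1, 454, 455
9, Conv, 2, 1, 455, 456
10, Resize, 3, 1, 456, 472
11, Add, 2, 1, 472, 473
12, BatchNormalization, 5, 1, 473, 474
13, Relu, 1, 1, 474, 475
"""


def produced_by(log_rows):
    """Map each first-output tensor name to the index of the node that produces it."""
    producers = {}
    for line in log_rows.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        idx, first_out = fields[0], fields[5]
        producers[first_out] = int(idx)
    return producers


reported_inputs = ["461", "449"]
producers = produced_by(LOG_ROWS)
# A reported "input" that some node in the same subgraph produces is suspicious.
suspicious = {name: producers[name] for name in reported_inputs if name in producers}
print(suspicious)  # {'449': 3}
```

Under that reading, 449 is the output of node 3 (Resize), so it is an intermediate tensor rather than a subgraph input.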

Regards,

Federico

0 Anand Pathak over 5 years ago in reply to F_Fontana

TI__Genius 9065 points

Hi Federico,

Yes, the input tensor name is not correct because of the bug I mentioned, but this part of the network you have shared works fine for me if I fix that bug. Are you seeing correct input and output names when you run your entire network?

Regards,

Anand

0 F_Fontana over 5 years ago in reply to Anand Pathak

Prodigy 140 points

Hi Anand,

Indeed, when converting the entire model the input name is wrong!

Could the Add issue be related to the input bug you found? Is there a way for me to test your fix on my side?

Regards,

Federico

0 Anand Pathak over 5 years ago in reply to F_Fontana

TI__Genius 9065 points

Hi Federico,

I am attaching a file with the correction on top of TIDL.02.00.00.07. Change its extension to .cpp and use it to replace "tidl_onnxRtImport_EP.cpp" in ti_dl/utils/tidlModelImport/.

You will need to build the corresponding library. The build has a few dependencies, listed below. These need to be downloaded/built in the psdkra path.

- Protobuf:

Download protobuf: github.com/.../protobuf-cpp-3.11.3.tar.gz

tar -xvzf protobuf-cpp-3.11.3.tar.gz

cd protobuf-3.11.3/

./configure CXXFLAGS=-fPIC --enable-shared=no LDFLAGS="-static"

make

- Flatbuffer:

Refer user guide flatbuffers section:

software-dl.ti.com/.../md_tidl_build_instruction.html

- Onnx:

git clone github.com/.../onnxruntime
cd onnxruntime
git checkout c8e2e3191b2d506d1260069eb3d3fc7c262ec172
git am ../tidl_j7_02_00_00_07/ti_dl/onnxrt_EP/0001-Add-TIDL-compilation-execution-providers.patch

Build libraries from tidl_j7_02_00_00_07 with following command:

make it TIDL_BUILD_ONNX_IMPORT_LIB=1

Let me know if this solves your issue. A corresponding fix is needed in the inference library as well; I will share it if the import goes through fine with this change.

Regards,

Anand

tidl_onnxRtImport_EP.txt

/*
*
* Copyright (c) {2015 - 2017} Texas Instruments Incorporated
*
* All rights reserved not granted herein.
*
* Limited License.
*
* Texas Instruments Incorporated grants a world-wide, royalty-free, non-exclusive
* license under copyrights and patents it now or hereafter owns or controls to make,
* have made, use, import, offer to sell and sell ("Utilize") this software subject to the
* terms herein.  With respect to the foregoing patent license, such license is granted
* solely to the extent that any such patent is necessary to Utilize the software alone.
* The patent license shall not apply to any combinations which include this software,
* other than combinations with devices manufactured by or for TI ("TI Devices").
* No hardware patent is licensed hereunder.
*
* Redistributions must preserve existing copyright notices and reproduce this license
* (including the above copyright notice and the disclaimer and (if applicable) source
* code license limitations below) in the documentation and/or other materials provided
* with the distribution
*
* Redistribution and use in binary form, without modification, are permitted provided
* that the following conditions are met:
*
* *       No reverse engineering, decompilation, or disassembly of this software is
* permitted with respect to any software provided in binary form.
*
* *       any redistribution and use are licensed by TI for use only with TI Devices.
*
* *       Nothing shall obligate TI to provide you with source code for the software
* licensed and provided to you in object code.
*
* If software source code is provided to you, modification and redistribution of the
* source code are permitted provided that the following conditions are met:
*
* *       any redistribution and use of the source code, including any resulting derivative
* works, are licensed by TI for use only with TI Devices.
*
* *       any redistribution and use of any object code compiled from the source code
* and any resulting derivative works, are licensed by TI for use only with TI Devices.
*
* Neither the name of Texas Instruments Incorporated nor the names of its suppliers
*
* may be used to endorse or promote products derived from this software without
* specific prior written permission.
*
* DISCLAIMER.
*
* THIS SOFTWARE IS PROVIDED BY TI AND TI'S LICENSORS "AS IS" AND ANY EXPRESS
* OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL TI AND TI'S LICENSORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
* OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
* OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*
*/

#define ONNX_ML

#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/message.h>
#include <google/protobuf/text_format.h>
using namespace std;
using ::google::protobuf::Message;
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <float.h>
#include <cmath>
#include <stdarg.h>
#include <unistd.h>

#include "onnx/onnx-ml.proto3.pb.h"
#include "tidl_onnxRtImport_EP.h"

using namespace std;
using namespace onnx;

TIDL_OnnxrtEPData * data_ = new TIDL_OnnxrtEPData;

template <typename T>
T LoadSymbol(void *lib, const char* symbol) {
    T sym = reinterpret_cast<T>(dlsym(lib, symbol));
    assert(sym);
    return sym;
}

template <class Tin>
void tidl_onnxrt_find_range( Tin* src, int32_t c, int32_t h, int32_t w, float src_scale, int32_t zero, float &min, float &max)
{
  float curr;
  min = FLT_MAX;
  max = -FLT_MAX;
  int32_t i0, i1, i2;
  for (i0 = 0; i0 < c; i0++) 
  {
    for (i1 = 0; i1 < h; i1++) 
    {
      for (i2 = 0; i2 < w; i2++) 
      {
        curr = ((src[i0 + i1*w*c + i2*c] - zero)*src_scale);
        min = curr < min ? curr : min;
        max = curr > max ? curr : max;
     }
    } 
  }
}

#define MAX_NUM_TIDL_SUBGRAPHS (16)


static bool check_isdir(const char *path) {
    char *real = realpath(path, NULL);
    if(!real)
        return false;

    struct stat st;
    int res = stat(real, &st);
    free(real);  /* realpath() allocated this buffer; release it on every path */
    if(res)
        return false;

    return (st.st_mode & S_IFMT) == S_IFDIR;
}

static bool check_isempty(const char *path) {
    if (!check_isdir(path))
        return false;

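    struct dirent *e;
    DIR *d = opendir(path);

    if(!d)
        return false;

    bool empty = true;
    errno = 0;
    while((e = readdir(d)) != NULL) {
        /* skip the . and .. entries */
        if(!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
            continue;
        empty = false;
        break;
    }

    /* a readdir() failure leaves errno set; treat it as "not empty" */
    if(errno)
        empty = false;

    closedir(d);  /* release the directory stream on every path */
    return empty;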
}

std::vector<std::string> tidl_fillDenyListOption(char * deny_list)
{
    std::vector<std::string> ret;
    char * token = strtok(deny_list, ",");
    while( token != NULL ) 
    {
        /* strip every space from the token in place (handles consecutive spaces too) */
        char * end = std::remove(token, token + strlen(token), ' ');
        *end = '\0';
        std::string itoken;
        std::stringstream(token) >> itoken;
        ret.push_back(itoken);
        token = strtok(NULL, ",");
    }
    return ret;
}

extern "C"
{
bool TIDL_populateOptions(std::vector<std::pair<std::string,std::string>> interface_options)
{
  data_->infer_ops.lib = dlopen("libvx_tidl_rt.so", RTLD_NOW | RTLD_GLOBAL);
  if(! data_->infer_ops.lib)
  {
    printf("Error -   %s \n", dlerror());
  }
  assert(data_->infer_ops.lib);

  data_->infer_ops.TIDLRT_create = LoadSymbol<decltype(data_->infer_ops.TIDLRT_create)>  (data_->infer_ops.lib, "TIDLRT_create");
  data_->infer_ops.TIDLRT_delete = LoadSymbol<decltype(data_->infer_ops.TIDLRT_delete)>  (data_->infer_ops.lib, "TIDLRT_delete");
  data_->infer_ops.TIDLRT_invoke = LoadSymbol<decltype(data_->infer_ops.TIDLRT_invoke)>  (data_->infer_ops.lib, "TIDLRT_invoke");
  data_->infer_ops.TIDLRT_deactivate = LoadSymbol<decltype(data_->infer_ops.TIDLRT_deactivate)>(data_->infer_ops.lib, "TIDLRT_deactivate");
  data_->infer_ops.TIDLRT_setParamsDefault = LoadSymbol<decltype(data_->infer_ops.TIDLRT_setParamsDefault)>(data_->infer_ops.lib, "TIDLRT_setParamsDefault");
  data_->infer_ops.TIDLRT_setTensorDefault = LoadSymbol<decltype(data_->infer_ops.TIDLRT_setTensorDefault)>(data_->infer_ops.lib, "TIDLRT_setTensorDefault");
  data_->infer_ops.TIDLRT_getDdrStats = LoadSymbol<decltype(data_->infer_ops.TIDLRT_getDdrStats)>(data_->infer_ops.lib, "TIDLRT_getDdrStats");

  TIDL_OnnxrtEPData * options = data_;
  for(auto option : interface_options)
  {
    auto key = option.first;
    auto value = option.second;
    if (!strcmp("tidl_tools_path", key.c_str())) 
    {
      options->m_tidl_tools_path = value;
      if(!check_isdir(options->m_tidl_tools_path.c_str())) 
      {
        delete options;
        printf("ERROR : tidl_tools_path not a directory");
        return false;
      }
      // TODO: maybe check for the libs, quants tools, GC tool are contained inside
    }

    if (!strcmp("artifacts_folder", key.c_str())) 
    {
      options->m_artifacts_folder = value;
      if(!check_isdir(options->m_artifacts_folder.c_str())) 
      {
        delete options;

        printf("ERROR : artifacts_folder not a directory");
        return false;
      }
      /*if(!check_isempty(options->m_artifacts_folder.c_str())) {
        delete options;

        printf("ERROR : artifacts_folder s not empty");
        return false;
      }*/
    }

    if (!strcmp("debug_level", key.c_str())) 
    {
      std::stringstream(value) >> options->m_debug_level;
      // TODO: any invalid values? like negative, or beyond supported range?
    }
    
    if (!strcmp("tensor_bits", key.c_str()))
    {
        std::stringstream(value) >> options->m_num_param_bits;

        std::vector<int> valid_num_params{8, 16, 32};
        if(std::find(valid_num_params.begin(), valid_num_params.end(), options->m_num_param_bits) == valid_num_params.end()) 
        {
            delete options;

            printf("ERROR : unsupported tensor_bits \n");
            return false;
        }
    }

    if (!strcmp("max_num_subgraphs", key.c_str())) 
    {
        std::stringstream(value) >> options->m_num_tidl_subgraphs;

        if(options->m_num_tidl_subgraphs > MAX_NUM_TIDL_SUBGRAPHS) 
        {
            delete options;

            printf("ERROR : max_num_subgraphs > MAX_NUM_TIDL_SUBGRAPHS not allowed");
            return false;
        }
    }

    // TODO: fix denylist
    if (strcmp("deny_list", key.c_str()) == 0)  
    {
        try 
        {
          std::string str = value;
          char *cstr = new char[str.length() + 1];
          strcpy(cstr, str.c_str());
          options->m_deny_list = tidl_fillDenyListOption(cstr);
          delete[] cstr;  // allocated with new[], so it must be released with delete[]
        }
        catch(std::string &e) {
            delete options;

            printf("ERROR : could not parse malformed deny_list option");
            return false;
        }
    }

    if (!strcmp("accuracy_level", key.c_str())) 
    {
        std::map<std::string, int> valid_calibs {{"0", 0}, {"1", 7}, {"9", 9}};  // 9 will be mapped to suitable flag based on advanced options
        if(valid_calibs.find(value) == valid_calibs.end()) 
        {
            delete options;

            printf("ERROR : unsupported accuracy_level");
            return false;
        }
        options->m_tidl_calibration_flags = valid_calibs[value];
    }

    if (!strcmp("advanced_options:calibration_frames", key.c_str())) 
    {
        std::stringstream(value) >> options->m_calibration_frames;
        // TODO: any invalid values? like negative, or too many frames?
    }

    if (!strcmp("advanced_options:calibration_iterations", key.c_str())) 
    { 
        std::stringstream(value) >> options->m_calibration_iterations;
        // TODO: any invalid values? like negative, or too many iters?
    }

    if (!strcmp("advanced_options:quantization_scale_type", key.c_str())) 
    { 
        std::map<std::string, int> quantization_scale_type_mapping {{"1", 3}, {"0", 2}};
        if(quantization_scale_type_mapping.find(value) == quantization_scale_type_mapping.end()) 
        {
            delete options;

            printf("ERROR : unsupported quantization_scale_type : specify either '0' or '1'");
            return false;
        }
        options->m_quantization_scale_type = quantization_scale_type_mapping[value];
    }

    if (!strcmp("advanced_options:high_resolution_optimization", key.c_str())) 
    { 
      std::stringstream(value) >> options->m_high_resolution_optimization;
    }

    if (!strcmp("advanced_options:pre_batchnorm_fold", key.c_str())) 
    { 
        std::stringstream(value) >> options->m_pre_batchnorm_fold;
    }

    if (!strcmp("ti_internal_nc_flag", key.c_str())) 
    { 
        std::stringstream(value) >> options->m_compileConstraintsFlag;
    }
    
    if (!strcmp("advanced_options:output_feature_16bit_names_list", key.c_str())) 
    {
      options->m_output_feature_16bit_names_list = value;
    }
    if (!strcmp("advanced_options:params_16bit_names_list", key.c_str())) 
    {
      options->m_params_16bit_names_list = value;
    }

    // below options will be used only if accuracy_level = 9
    if (!strcmp("advanced_options:activation_clipping", key.c_str())) { 
        std::stringstream(value) >> options->m_activation_clipping;
    }
    if (!strcmp("advanced_options:weight_clipping", key.c_str())) { 
        std::stringstream(value) >> options->m_weight_clipping;
    }
    if (!strcmp("advanced_options:bias_calibration", key.c_str())) { 
        std::stringstream(value) >> options->m_bias_calibration;
    }
    if (!strcmp("advanced_options:channel_wise_quantization", key.c_str())) { 
        std::stringstream(value) >> options->m_channel_wise_quantization;
    }
  }

  if(options->m_tidl_calibration_flags == 9) //user defined accuracy level
  {
      options->m_tidl_calibration_flags = options->m_activation_clipping * TIDL_CalibOptionActivationRange +     //default 1
                                        options->m_weight_clipping * TIDL_CalibOptionWeightRange +     //default 1
                                        options->m_bias_calibration * TIDL_CalibOptionBiasCalibration +    //default 1
                                        options->m_channel_wise_quantization * TIDL_CalibOptionPerChannelWeightQuantization;   //default 0
  }
  
  if (options->m_tidl_tools_path.empty()) 
  {
      delete options;

      printf("ERROR : tidl_tools_path must be provided");
      return false;
  }

  if (options->m_artifacts_folder.empty()) 
  {
      delete options;

      printf("ERROR : artifacts_folder must be provided");
      return false;
  }

  options->m_temp_folder = options->m_artifacts_folder + "/tempDir";
  if(mkdir(options->m_temp_folder.c_str(), 0755)) {
      delete options;

      printf("ERROR : mkdir tempDir failed");
      return false;
  }
  if(data_->m_debug_level)
  {
    printf("tidl_tools_path                                 = %s \n", data_->m_tidl_tools_path.c_str());
    printf("artifacts_folder                                = %s \n", data_->m_artifacts_folder.c_str());
    printf("tidl_tensor_bits                                = %d \n", data_->m_num_param_bits);
    printf("debug_level                                     = %d \n", data_->m_debug_level);
    printf("num_tidl_subgraphs                              = %d \n", data_->m_num_tidl_subgraphs);
    printf("tidl_denylist                                   = ");
    for(int i = 0; i < data_->m_deny_list.size(); i++)
    {
      printf("%s   ", data_->m_deny_list[i].c_str());
    }
    printf("\n");
    printf("tidl_calibration_accuracy_level                 = %d \n", data_->m_tidl_calibration_flags);
    printf("tidl_calibration_options:num_frames_calibration = %d \n", data_->m_calibration_frames);
    printf("tidl_calibration_options:bias_calibration_iterations = %d \n", data_->m_calibration_iterations);
    printf("power_of_2_quantization                         = %d \n", data_->m_quantization_scale_type);
    printf("enable_high_resolution_optimization             = %d \n", data_->m_high_resolution_optimization);
    printf("pre_batchnorm_fold                              = %d \n", data_->m_pre_batchnorm_fold);
    printf("output_feature_16bit_names_list                 = %s \n", data_->m_output_feature_16bit_names_list.c_str());
    printf("m_params_16bit_names_list                       = %s \n", data_->m_params_16bit_names_list.c_str());
    printf("reserved_compile_constraints_flag               = %d \n", data_->m_compileConstraintsFlag);
  }
  return true;
}
} //extern "C"

static void copy_file(std::string basename, std::string dstdir, std::string srcdir) {
    std::string src_fname = srcdir + "/" + basename;
    std::string dst_fname = dstdir + "/" + basename;
    int src_fd = open(src_fname.c_str(), O_RDONLY);
    if(src_fd < 0) {
        printf("ERROR : could not open %s \n", src_fname.c_str());
        return;
    }
    int dst_fd = open(dst_fname.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if(dst_fd < 0) {
        printf("ERROR : could not open %s \n", dst_fname.c_str());
        close(src_fd);
        return;
    }
    ssize_t size = lseek(src_fd, 0, SEEK_END);
    lseek(src_fd, 0, SEEK_SET);
    if(size < 0) size = 0;
    std::unique_ptr<char[]> buffer = std::make_unique<char[]>(size);

    /* read the whole file; stop on error or unexpected end of file */
    ssize_t total = 0;
    while(total < size) {
        ssize_t ret = read(src_fd, buffer.get() + total, size - total);
        if(ret <= 0) break;
        total += ret;
    }
    /* write out only what was actually read; stop on error */
    ssize_t done = 0;
    while(done < total) {
        ssize_t ret = write(dst_fd, buffer.get() + done, total - done);
        if(ret <= 0) break;
        done += ret;
    }

    close(src_fd);
    close(dst_fd);
}

int32_t IsNodeSupportedByTIDL(GraphProto&   onnxGraph, FILE *fp, int32_t nodeIx, int32_t debug_level, std::vector<std::string> deny_list, int32_t opSetVersion,
                              bool isObjectDetectionNetwork) 
{
  if(TIDL_onnxAllowlistNode(onnxGraph, nodeIx, debug_level, deny_list, opSetVersion, isObjectDetectionNetwork))
  {
    return true;
  }
  return false;
}


std::vector<std::pair<std::vector<int>, std::pair<std::vector<std::string>, std::vector<std::string>>>> getSubgraphInfo(GraphProto& onnxGraph, std::vector<std::vector<int>> suportedNodeGroups)
{
  std::vector<std::string> nodeOutputs;
  std::vector<std::string> nodeInputs;
  std::pair<std::vector<std::string>, std::vector<std::string>> nodeInputsOutputs;
  std::vector<std::pair<std::vector<int>, std::pair<std::vector<std::string>, std::vector<std::string>>>> info;
  //vector( (subgraph1, (inputs_1, outputs_1)), (subgraph2, (inputs_2, outputs_2)) ) 

  for(int i = 0; i < suportedNodeGroups.size(); i++)
  {
    //save all inputs and outputs of each subgraph
    std::vector<int> subgraph = suportedNodeGroups[i];
    for(int j = 0; j < subgraph.size(); j++)
    {
      for(int l = 0; l < onnxGraph.node(subgraph[j]).input_size(); l++)
      {
        for (int k = 0; k < onnxGraph.value_info_size(); k++)
        {
          if((strcmp(onnxGraph.value_info(k).name().c_str(), onnxGraph.node(subgraph[j]).input(l).c_str()) == 0))
          {
            nodeInputs.push_back(onnxGraph.value_info(k).name());
          }
        }
      }
    }
    for(int j = 0; j < subgraph.size(); j++)
    {
      for(int l = 0; l < onnxGraph.node(subgraph[j]).output_size(); l++)
      {
        nodeOutputs.push_back(onnxGraph.node(subgraph[j]).output(l));
      }
    }
#if 0
    //delete common elements in inputs and outputs - this removes all intermediate linking inputs/outputs, what is left gives subgraph inputs/outputs
    std::sort(nodeInputs.begin(), nodeInputs.end());
    nodeInputs.erase(std::unique(nodeInputs.begin(), nodeInputs.end()), nodeInputs.end());
    std::sort(nodeOutputs.begin(), nodeOutputs.end());
    nodeOutputs.erase(std::unique(nodeOutputs.begin(), nodeOutputs.end()), nodeOutputs.end());

    bool match;
    for(int i = 0; i < nodeInputs.size(); i++)
    {
      match = false;
      for(int j = 0; j < nodeOutputs.size(); j++)
      {
        if(nodeInputs[i].compare(nodeOutputs[j]) == 0)
        {
          match = true;
          auto itr = std::find(nodeInputs.begin(), nodeInputs.end(), nodeInputs[i]);
          if (itr != nodeInputs.end()) nodeInputs.erase(itr);
          itr = std::find(nodeOutputs.begin(), nodeOutputs.end(), nodeOutputs[j]);
          if (itr != nodeOutputs.end()) nodeOutputs.erase(itr);
          j--;
        }
      }
      if(match)
      {
        i--;
      }
    }
#endif
    nodeInputsOutputs = std::make_pair(nodeInputs, nodeOutputs);
    info.push_back(std::make_pair(subgraph, nodeInputsOutputs));
#if 0
    printf("Subgraph inputs \n");
    for(int i = 0; i < nodeInputs.size(); i++)
    {
      printf("%s  \n", nodeInputs[i].c_str());
    }
    printf("Subgraph outputs \n");
    for(int i = 0; i < nodeOutputs.size(); i++)
    {
      printf("%s  \n", nodeOutputs[i].c_str());
    }
#endif
    nodeInputs.clear();
    nodeOutputs.clear();
  }
#if 0
  printf("info.size() = %d \n", info.size());
  for(int i = 0; i < info.size(); i++)
  {
    printf("**** Subgraph %d *****\n", i);
    std::vector<int> subgraph = info[i].first;
    std::vector<std::string> inputs = info[i].second.first;
    std::vector<std::string> outputs = info[i].second.second;
    for(int j = 0; j < subgraph.size(); j++) printf("%d ", subgraph[j]); printf("\n");
    printf("Inputs --- \n");
    for(int j = 0; j < inputs.size(); j++) printf("%s \n ", inputs[j].c_str());
    printf("Outputs --- \n");
    for(int j = 0; j < outputs.size(); j++) printf("%s \n ", outputs[j].c_str());
  }
#endif
  return info;
}


std::vector<std::vector<int>> optimizeGraphPartition(GraphProto& onnxGraph, std::vector<std::vector<int>> suportedNodeGroups)
{
  std::vector<std::pair<std::vector<int>, std::pair<std::vector<std::string>, std::vector<std::string>>>> info;

  std::vector<int> subgraph_i, subgraph_j;
  std::vector<std::string> inputs_i, inputs_j;
  std::vector<std::string> outputs_i, outputs_j;
  bool canMergeInput, canMergeSubgraph, mergeDone;
  mergeDone = false;
  canMergeSubgraph = false;
  
  while(mergeDone == false)
  {
    info = getSubgraphInfo(onnxGraph, suportedNodeGroups);
    for(int i = 0; i < info.size(); i++)
    {
      canMergeSubgraph = false;
      subgraph_i = info[i].first;
      inputs_i = info[i].second.first;
      outputs_i = info[i].second.second;
      for(int j = 0; j < info.size(); j++)
      {
        if(j == i) continue; //do not check subgraph with itself
        canMergeSubgraph = true;
        subgraph_j = info[j].first;
        inputs_j = info[j].second.first;
        outputs_j = info[j].second.second; 
        for(int k = 0; k < inputs_j.size(); k++)
        {
          canMergeInput = false;
          for(int l = 0; l < outputs_i.size(); l++)
          {
            if(inputs_j[k].compare(outputs_i[l]) == 0)  // "all" inputs should be output of another subgraph, else cannot merge subgraphs
            {
              canMergeInput = true;
              continue;
            }
          }
          if(outputs_j.size() == 0) canMergeInput = false;
          if(canMergeInput == false)
          {
            canMergeSubgraph = false;
            break;
          }
        }
        if(inputs_j.size() == 0) canMergeSubgraph = false;
        if(canMergeSubgraph)
        {
          suportedNodeGroups.clear();
          //put all supported nodes in subgraph_i, then delete subgraph_j
          subgraph_i.insert(subgraph_i.end(), subgraph_j.begin(), subgraph_j.end());
          info[i].first = subgraph_i;    //update subgraph_i first, while index i is still valid
          info.erase(info.begin() + j);  //erasing j shifts later entries, so do it last
          for(int m = 0; m < info.size(); m++)
          {
            suportedNodeGroups.push_back(info[m].first);
          }
          break;
        }
      }
      if(canMergeSubgraph) break;
    }
    if(canMergeSubgraph == false)
    {
      mergeDone = true;
    }
  }
  return suportedNodeGroups;
}


extern "C"
{
std::vector<std::vector<int>> TIDL_getSupportedNodes(std::string& data, int32_t opSetVersion)  
{
  ModelProto model_proto;
  model_proto.ParseFromString(data);

  if(data_->m_debug_level)
  {
    printf("Parsing ONNX Model \n");
    printf("model_proto %p \n", &model_proto);
  }

  auto onnxGraph = model_proto.graph();


  std::vector<std::vector<int>> suportedNodeGroups;
  std::vector<int> nodeGroup;

  FILE *fp;
  char fileName[500];

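  snprintf(fileName, sizeof(fileName), "%s/allowedNode.txt", data_->m_artifacts_folder.c_str());
  
  fp = fopen(fileName, "w+");
  if(fp == NULL)
  {
      printf("Could not open %s for writing...exiting !\n", fileName);
      return {{}};  // fp is written to and closed below, so do not continue with a NULL handle
  }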
  //std::set<std::string> tidl_ops_ = {"Conv", "BatchNormalization", "Relu", "Sum", "Concat", /*"MaxPool"*/};
  bool isObjectDetectionNetwork = false;
  for (int i = 0; i < onnxGraph.node_size(); i++)
  {
    if((strcmp(onnxGraph.node(i).op_type().c_str(), "NonMaxSuppression") == 0) || (strcmp(onnxGraph.node(i).op_type().c_str(), "TopK") == 0))
    {
      isObjectDetectionNetwork = true;
    }
  }

  int32_t i, num_subGraphs = 0; 
  for (i = 0; i < onnxGraph.node_size(); i++)
  {
    if (IsNodeSupportedByTIDL(onnxGraph,  fp, i, data_->m_debug_level, data_->m_deny_list, opSetVersion, isObjectDetectionNetwork)) 
    {
      nodeGroup.push_back(i);
    }
    else
    {
      if(!nodeGroup.empty())
      {
        suportedNodeGroups.push_back(nodeGroup);
        nodeGroup.clear();
        num_subGraphs++;
      }
    }
  }
  if(!nodeGroup.empty())
  {
    suportedNodeGroups.push_back(nodeGroup);
    nodeGroup.clear();
    num_subGraphs++;
  }

  printf("\nPreliminary subgraphs created = %d \n", (int)suportedNodeGroups.size());
  
  std::vector<std::vector<int>> suportedNodeGroupsOptimized = optimizeGraphPartition(onnxGraph, suportedNodeGroups);

  int32_t numSuportedNodes = 0;
  for(int i = 0; i < suportedNodeGroupsOptimized.size(); i++)
  {
    std::vector<int> subgraph = suportedNodeGroupsOptimized[i];
    for(int j = 0; j < subgraph.size(); j++)
    {
      fprintf(fp, "%d\n", subgraph[j]);
      numSuportedNodes++;
    }
  }
  fclose(fp);
  printf("Final number of subgraphs created are : %d, - Offloaded Nodes - %d, Total Nodes - %d \n", (int)suportedNodeGroupsOptimized.size(), numSuportedNodes, onnxGraph.node_size());
  
  if(suportedNodeGroupsOptimized.empty())
  {
    return {{}};
  }
  else
  {
    return suportedNodeGroupsOptimized;
  }
}

int32_t TIDLEP_getDdrStats(uint64_t * read, uint64_t * write)
{
  return(data_->infer_ops.TIDLRT_getDdrStats(read, write));
}


int32_t TIDL_isInputConstInGraph(GraphProto& onnGraph, const string name)
{
  int i;
  for (i = 0; i < onnGraph.initializer_size(); i++)
  {
    if ((strcmp(onnGraph.initializer(i).name().c_str(), name.c_str()) == 0))
    {
      return(1);
    }
  }
  for (i = 0; i < onnGraph.node_size(); i++)
  {
    if ((onnGraph.node(i).output_size() > 0) && (strcmp(onnGraph.node(i).output(0).c_str(), name.c_str()) == 0) && (strcmp(onnGraph.node(i).op_type().c_str(), "Constant") == 0))
    {
      return(1);
    }
  }
  return (0);
}


int32_t TIDL_isInputConst(std::string * string_buf, const string name)
{
  ModelProto model_proto;
  model_proto.ParseFromString(*string_buf);
  auto onnxGraph = model_proto.graph();
  return (TIDL_isInputConstInGraph(onnxGraph, name));
}

} //extern C

int32_t onnxProto_PrintProps(GraphProto&   onnxGraph)
{
  int32_t i;
  for (i = 0; i < onnxGraph.node_size(); i++)
  {
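    //Nodes such as Constant have no inputs, so guard index 0 before reading it
    const char * firstInput  = (onnxGraph.node(i).input_size()  > 0) ? onnxGraph.node(i).input(0).c_str()  : "";
    const char * firstOutput = (onnxGraph.node(i).output_size() > 0) ? onnxGraph.node(i).output(0).c_str() : "";
    printf("%3d, %15s, %d, %d, %s, %s\n", i, 
    onnxGraph.node(i).op_type().c_str(), 
    onnxGraph.node(i).input_size(), onnxGraph.node(i).output_size(),
    firstInput, firstOutput);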
  }
  return 0;
}


char* replaceChar(char* string, char c1, char c2, int length) 
{ 
  for (int32_t i = 0; i < length; i++)
  { 
    if (string[i] == c1) 
        string[i] = c2; 
  }
  return string; 
}

int32_t tidl_onnxrtFindOnnxOutputNames(GraphProto&   onnxGraph, char * outList)
{
  int i, j, k, l;
  char tensorName[500];
  char inTensorName[500];
  int outPutSize = 0;
  int node_idx = 0;

  for (i = 0; i < onnxGraph.node_size(); i++)
  {
    outPutSize = onnxGraph.node(i).output_size();
    for (j = 0; j < outPutSize; j++)
    {
      int outDataUsed = 0;
      strcpy((char *)tensorName, onnxGraph.node(i).output(j).c_str());
      for (k = 0; k < onnxGraph.node_size(); k++)
      {
        for (l = 0; l < onnxGraph.node(k).input_size(); l++)
        {
          strcpy((char *)inTensorName, onnxGraph.node(k).input(l).c_str());
          if (strcmp(tensorName, inTensorName) == 0)
          {
            outDataUsed = 1;
            break;
          }
        }
        if (outDataUsed)
          break;
      }
      if (outDataUsed == 0)
      {
        node_idx = i;
        strcat(outList, tensorName);
        //strcat(outList, ",");
      }
    }
  }
  return (node_idx);
}
extern "C"
{
std::vector<int64_t> TIDL_getOutputShape(void * ioBufDescVPtr, int8_t onnxName[])
{
  sTIDL_IOBufDesc_t *ioBufDescPtr = (sTIDL_IOBufDesc_t *)ioBufDescVPtr;
  std::vector<int64_t> nchw_shape;
  for(int i = 0; i < ioBufDescPtr->numOutputBuf; i++)
  {
    if(strcmp((char *)ioBufDescPtr->outDataName[i], (char *)onnxName) == 0)
    {
      nchw_shape = { 1, ioBufDescPtr->outNumChannels[i], ioBufDescPtr->outHeight[i], ioBufDescPtr->outWidth[i]};
    }
  }
  if(nchw_shape.size() == 0)
  {
    printf("Warning : Couldn't find corresponding ioBuf tensor for onnx tensor with matching name \n");
  }

  return nchw_shape;
}
int32_t TIDLEP_getSubGraphStats(OnnxTIDLSubGraphParams * state_subGraph, char **node_name, void **node_data)
{
  sTIDLRT_PerfStats_t * stats = (sTIDLRT_PerfStats_t*)state_subGraph->stats;
  std::vector<uint64_t> *v = new std::vector<uint64_t>();
  v->push_back(uint64_t(stats->cpIn_time_start));
  v->push_back(uint64_t(stats->cpIn_time_end));
  v->push_back(uint64_t(stats->proc_time_start));
  v->push_back(uint64_t(stats->proc_time_end));
  v->push_back(uint64_t(stats->cpOut_time_start));
  v->push_back(uint64_t(stats->cpOut_time_end));
  *node_data = static_cast<void *>(v);
  *node_name = const_cast<char *>(state_subGraph->subGraphName_);
  return 0;
}

} //extern C

int32_t TIDLRT_ReadBinFromFile(const char * fileName, void * addr, int32_t size)
{
  FILE * fptr = NULL;
  fptr = fopen((const char *)fileName, "rb");
  int status = 0;
  if(fptr)
  {
    status = fread(addr, size, 1, fptr);
    fclose(fptr);
    return status;
  }
  else
  {
    printf("Could not open %s file for reading \n",fileName);
  }
  return status;
}

int32_t tidl_subgraph_rt_create(TIDL_OnnxrtEPData* options, char* subGraphName, sTIDL_IOBufDesc_t *ioBufDescPtr, OnnxTIDLSubGraphParams * subgraphParams)
{
  //tfldelegate_printf(options->debug_level, "************ in tidl_subgraph_rt_create ************ \n ");
  int status = 0;
  sTIDLRT_Params_t prms;
  FILE *fp_network;
  FILE *fp_config;
  char network_file[512];
  char config_file[512];
  void *handle = NULL;

  status = data_->infer_ops.TIDLRT_setParamsDefault(&prms);
  
  snprintf(network_file, sizeof(network_file), "%s/%s_tidl_net.bin", options->m_temp_folder.c_str(), subGraphName);
  snprintf(config_file, sizeof(config_file), "%s/%s_tidl_io_1.bin", options->m_temp_folder.c_str(), subGraphName);
  
  fp_network = fopen(&network_file[0], "rb");
  if (fp_network == NULL)
  {
    printf("Invoke  : ERROR: Unable to open network file %s \n", network_file);
    return -1;
  }
  prms.stats = (sTIDLRT_PerfStats_t*)malloc(sizeof(sTIDLRT_PerfStats_t));

  fseek(fp_network, 0, SEEK_END);
  prms.net_capacity = ftell(fp_network);
  fseek(fp_network, 0, SEEK_SET);
  fclose(fp_network);
  prms.netPtr = malloc(prms.net_capacity);
  
  prms.TIDLReadBinFromFile = TIDLRT_ReadBinFromFile;
  status = prms.TIDLReadBinFromFile(&network_file[0], prms.netPtr, prms.net_capacity);
  
  fp_config = fopen(&config_file[0], "rb");
  if (fp_config == NULL)
  {
    printf("Invoke  : ERROR: Unable to open IO config file %s \n", config_file);
    //release the buffers allocated above before leaving
    free(prms.netPtr);
    free(prms.stats);
    return -1;
  }
  fseek(fp_config, 0, SEEK_END);
  prms.io_capacity = ftell(fp_config);
  fseek(fp_config, 0, SEEK_SET);
  fclose(fp_config);
  prms.ioBufDescPtr = malloc(prms.io_capacity);
  status = prms.TIDLReadBinFromFile(&config_file[0], prms.ioBufDescPtr, prms.io_capacity);

  if(options->m_debug_level >= 2)
  {
    prms.traceLogLevel = options->m_debug_level;
    prms.traceWriteLevel = 3;
  }

  status = data_->infer_ops.TIDLRT_create(&prms, &handle);
  
  sTIDL_IOBufDesc_t *ioBufDesc = (sTIDL_IOBufDesc_t *)prms.ioBufDescPtr;
  memcpy(ioBufDescPtr, ioBufDesc, sizeof(sTIDL_IOBufDesc_t));

  subgraphParams->rtInList  = (void *)malloc(ioBufDesc->numInputBuf * sizeof(sTIDLRT_Tensor_t));
  subgraphParams->rtOutList = (void *)malloc(ioBufDesc->numOutputBuf * sizeof(sTIDLRT_Tensor_t));
  subgraphParams->rtHandle    = handle;
  subgraphParams->stats       = prms.stats;

  return status;
}

int32_t tidl_subgraph_rt_delete(TIDL_OnnxrtEPData* options, OnnxTIDLSubGraphParams * subgraphParams)
{
  //tfldelegate_printf(options->debug_level, "************ in tidl_subgraph_rt_delete ************ \n ");
  int status = 0;
  if(subgraphParams->rtHandle)
  {
    status = data_->infer_ops.TIDLRT_deactivate(subgraphParams->rtHandle);
    status = data_->infer_ops.TIDLRT_delete(subgraphParams->rtHandle);
  }
  free(subgraphParams->rtInList);
  free(subgraphParams->rtOutList);
  return status;
}

int32_t tidl_subgraph_rt_invoke(TIDL_OnnxrtEPData* options, sTIDL_IOBufDesc_t *ioBufDescPtr, OnnxTIDLSubGraphParams * subgraphParams)
{
  int status = 0;
  int j = 0;
  onnxRtParams_t * onnxRtParams = &subgraphParams->onnxRtParams;

  void *handle = subgraphParams->rtHandle;
  sTIDLRT_PerfStats_t *stats = (sTIDLRT_PerfStats_t *)subgraphParams->stats;

  sTIDLRT_Tensor_t *in[128];
  sTIDLRT_Tensor_t *out[128];
  sTIDLRT_Tensor_t *ins;
  sTIDLRT_Tensor_t *outs;

  ins = (sTIDLRT_Tensor_t *)subgraphParams->rtInList;
  outs = (sTIDLRT_Tensor_t *)subgraphParams->rtOutList;

  if ((ins == NULL) || (outs == NULL))
  {
    printf("Invoke  : ERROR: Unable to allocate memory for TIDL RT in[] out [] tensor struct\n");
    return -1;
  }
  else
  {
    int32_t currInIdx = 0;
    /* Input tensors property set up */
    for (j = 0; j < onnxRtParams->numNetInData; j++)
    {
      int64_t inElementType = onnxRtParams->inputTensorElementType[currInIdx];
      void * input = onnxRtParams->inputTensorData[currInIdx];

      if (inElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8)
      {
        in[j] = &(ins[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(in[j]);
        in[j]->ptr = (uint8_t *)(input);
        in[j]->zeroPoint = 0; //quantization->zero_point->data[0];
        in[j]->elementType = TIDLRT_Uint8;
        in[j]->scale = 1.0; //1 / quantization->scale->data[0];
        in[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)in[j]->name, (char *)onnxRtParams->inDataNames[j]);
      }
      else if (inElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32)
      {
        in[j] = &(ins[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(in[j]);
        in[j]->ptr = (int32_t *)(input);
        in[j]->zeroPoint = 0; 
        in[j]->elementType = TIDLRT_Int32;
        in[j]->scale = 1.0;
        in[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)in[j]->name, (char *)onnxRtParams->inDataNames[j]);
      }
      else if (inElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64)
      {
        in[j] = &(ins[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(in[j]);
        in[j]->ptr = (int64_t *)(input);
        in[j]->zeroPoint = 0; 
        in[j]->elementType = TIDLRT_Int64;
        in[j]->scale = 1.0;
        in[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)in[j]->name, (char *)onnxRtParams->inDataNames[j]);
      }
      else if (inElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT)
      {
        in[j] = &(ins[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(in[j]);
        in[j]->ptr = (float *)(input);
        in[j]->zeroPoint = 0; 
        in[j]->elementType = TIDLRT_Float32;
        in[j]->scale = 1.0;
        in[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)in[j]->name, (char *)onnxRtParams->inDataNames[j]);
      }
      else
      {
        printf("Invoke : Unsupported input Tensor element type %lld \n", (long long)inElementType);
      }
      currInIdx++;
    }

    /* Output tensors property set up */
    for (j = 0; j < onnxRtParams->numNetOutData; j++)
    {
      void* output = onnxRtParams->outputTensorData[j];
      int64_t outElementType = onnxRtParams->outputTensorElementType[j];
      //printf("Invoke : outElementType = %d, numchOut = %d, outHeight = %d, outWidth = %d \n", outElementType, ioBufDescPtr->outNumChannels[j], ioBufDescPtr->outHeight[j], ioBufDescPtr->outWidth[j]);

      if (outElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8)
      {
        out[j] = &(outs[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(out[j]);
        out[j]->ptr = (uint8_t *)(output);
        out[j]->zeroPoint = 0; //quantization->zero_point->data[0];
        out[j]->elementType = TIDLRT_Uint8;
        out[j]->scale = 1.0; //1 / quantization->scale->data[0];
        out[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)out[j]->name, (char *)onnxRtParams->outDataNames[j]);
      }
      else if (outElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32)
      {
        out[j] = &(outs[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(out[j]);
        out[j]->ptr = (int32_t *)(output);
        out[j]->zeroPoint = 0;
        out[j]->elementType = TIDLRT_Int32;
        out[j]->scale = 1.0;
        out[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)out[j]->name, (char *)onnxRtParams->outDataNames[j]);
      }
      else if (outElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64)
      {
        out[j] = &(outs[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(out[j]);
        out[j]->ptr = (int64_t *)(output);
        out[j]->zeroPoint = 0;
        out[j]->elementType = TIDLRT_Int64;
        out[j]->scale = 1.0;
        out[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)out[j]->name, (char *)onnxRtParams->outDataNames[j]);
      }
      else if (outElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT)
      {
        out[j] = &(outs[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(out[j]);
        out[j]->ptr = (float *)(output);
        out[j]->zeroPoint = 0;
        out[j]->elementType = TIDLRT_Float32;
        out[j]->scale = 1.0;
        out[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)out[j]->name, (char *)onnxRtParams->outDataNames[j]);
      }
      else
      {
        printf("ERROR : Unsupported output tensor element type %lld \n", (long long)outElementType);
      }
    }
  }
  status = data_->infer_ops.TIDLRT_invoke(handle, in, out);

  if(options->m_debug_level > 0)
  {
    //divide by 1000.0 so the result is not truncated by integer division
    double proc_time    = (stats->proc_time_end - stats->proc_time_start)  / 1000.0;
    double cp_in_time   = (stats->cpIn_time_end - stats->cpIn_time_start)  / 1000.0;
    double cp_out_time  = (stats->cpOut_time_end - stats->cpOut_time_start)/ 1000.0;

    printf("Sub Graph Stats %f %f %f \n", cp_in_time, proc_time, cp_out_time);
  }
  return status;
}

float TIDL_onnxrtFindMaxQuantizationScale(float min, float max, int32_t elementSizeInBits)
{
  float absRange = (fabs(max) > fabs(min)) ? fabs(max) : fabs(min);
  absRange = (float)ceil(log((double)absRange) / log((double)2));
  absRange = pow(2.0, (double)absRange);
  float quantPrec;
  if (absRange != 0)
  {
    quantPrec = ((1.0*(1 << (elementSizeInBits - 1))) / absRange);
  }
  else
  {
    quantPrec = 1;
  }

  return quantPrec;
}


void tidl_writeQuantizedInput(GraphProto& onnxGraph, onnxRtParams_t * onnxRtParams, char * inputName, 
                              int32_t isCurrFrameIdx1, int32_t numParamBits, float ** inQuantFactorInput)
{
  if(isCurrFrameIdx1) //remove file at the beginning if it exists, in order to avoid appending contents from previous run
  {
    remove(inputName);
  }
  FILE* fp = fopen(inputName, "ab+");
  
  int32_t w[16];
  int32_t h[16];
  int32_t c[16];
  int32_t currInIdx = 0;
  float * inQuantFactor = *inQuantFactorInput;
  //float outScale = 0.0;
  
  if (fp == NULL) 
  {
    printf("Could not open file to save the input tensors \n");
    return; //fwrite and fclose below would otherwise be called on a NULL stream
  }
  
  for (int j = 0; j < onnxGraph.input_size(); j++) 
  {
    if (TIDL_isInputConstInGraph(onnxGraph, onnxGraph.input(j).name())) 
    {
      continue;
    }
    //TODO: Need to put if based on tensor element type    
    float* input = (float *)onnxRtParams->inputTensorData[currInIdx];

    const auto& tensor_shape = onnxRtParams->tensorShape[currInIdx];

    w[currInIdx] = tensor_shape[3];
    h[currInIdx] = tensor_shape[2];
    c[currInIdx] = tensor_shape[1];

    int32_t tensorSize = w[currInIdx] * h[currInIdx] * c[currInIdx];
    {
      float min, max;
      tidl_onnxrt_find_range((float *)(input), tensor_shape[1],
                            tensor_shape[2], tensor_shape[3], 1.0, 0, min, max);
      //float scale = 1.0;
      //tidl_tflite_data_format_hwc2chw(
      //    (float *)pInputData, (float *)(tensor->data.f),
      //    tensor->dims->data[3], tensor->dims->data[1],
      //    tensor->dims->data[2], 1.0, 1 / scale, 0);
      fwrite(input, 1, tensorSize * (32 / 8), fp);
      inQuantFactor[currInIdx] = TIDL_onnxrtFindMaxQuantizationScale(min, max, (numParamBits-1));
    }
    currInIdx++;
    //free(pInputData);
  }
  fclose(fp);
 // *numpInputs = currInIdx; 

}

void tidl_subgraph_import(GraphProto& onnxGraph, onnxRtParams_t * onnxRtParams, TIDL_OnnxrtEPData* options, 
                          void * subGraphPtr, char* subGraphName, int32_t currFrameIdx)
{
  if(currFrameIdx <= options->m_calibration_frames) //need to copy input of subgraphs only before calibration is done
  {
    char inputName[500];
    snprintf(inputName, sizeof(inputName), "%s/%s_calib_raw_data.bin", options->m_temp_folder.c_str(), subGraphName);

    int32_t isCurrFrameIdx1 = (currFrameIdx == 1) ? 1 : 0;
    int32_t numParamBits = options->m_num_param_bits;
#if 1    
    //int32_t numInputTensors = 0;
    float * inQuantFactorCurrTensor = (float *)malloc(16 * sizeof(float));
    memset(inQuantFactorCurrTensor, 0, 16 * sizeof(float));
    tidl_writeQuantizedInput(onnxGraph, onnxRtParams, inputName, isCurrFrameIdx1, numParamBits, &inQuantFactorCurrTensor);
    //for (int i = 0; i < numInputTensors; i++)
    //{
    //  inQuantFactorAllTensors[/* *currNumInTensors + */i] = inQuantFactorCurrTensor[i];
    //}
    //*currNumInTensors = *currNumInTensors + numInputTensors;
#endif 
    if((currFrameIdx == options->m_calibration_frames) && (numParamBits != 32)) //Have all inputs available now, run calibration
    {
      printf("\n**********  Frame Index %d Running fixed point mode for calibration : subgraph id **********\n", currFrameIdx);
      
      TIDL_onnxRtPostProcessNet(options->m_calibration_frames, options->m_num_param_bits, options->m_tidl_calibration_flags, options->m_calibration_iterations, 
                                    const_cast<char *>(options->m_temp_folder.c_str()), subGraphPtr, inQuantFactorCurrTensor,  subGraphName, options->m_debug_level,
                                    options->m_output_feature_16bit_names_list, options->m_params_16bit_names_list, options->m_high_resolution_optimization, 
                                    options->m_quantization_scale_type,   options->m_compileConstraintsFlag,   options->m_pre_batchnorm_fold);
      free(subGraphPtr); // last frame for calibration, no longer need this subgraph to run import_backend, actual model is saved to net file, to be used for inference
    }
    else if((isCurrFrameIdx1) && (options->m_calibration_frames > 0)) //Run in float mode for N-1 images
    {
      printf("\n**********  Frame Index %d Running float import and float inference **********\n", currFrameIdx);
      TIDL_onnxRtPostProcessNet(1, 32, options->m_tidl_calibration_flags, options->m_calibration_iterations, const_cast<char *>(options->m_temp_folder.c_str()), subGraphPtr, 
                                inQuantFactorCurrTensor, subGraphName, options->m_debug_level, options->m_output_feature_16bit_names_list, options->m_params_16bit_names_list, 0, 
                                options->m_quantization_scale_type,   options->m_compileConstraintsFlag,   options->m_pre_batchnorm_fold);
    }
    else
    {
      printf("\n**********  Frame Index %d Running float inference - currFrameIdx <= numFramesCalibration : subgraph id **********\n", currFrameIdx);
    } 

    if(currFrameIdx == options->m_calibration_frames)
    {
      std::string subGraphId;
      std::stringstream(subGraphName) >> subGraphId;
      copy_file(subGraphId + "_tidl_net.bin", options->m_artifacts_folder, options->m_temp_folder);
      copy_file(subGraphId + "_tidl_io_1.bin", options->m_artifacts_folder, options->m_temp_folder); 
    }
  }
  else 
  {
    printf("\n**********  Frame Index %d Running inference - currFrameIdx > numFramesCalibration : subgraph id **********\n", currFrameIdx);
    //No need to run postProcessNet, run inference directly on the saved graph
  }
}


extern "C"
{
void TIDL_createStateFunc(OnnxTIDLSubGraphParams * state_subGraph, std::string * string_buf, const std::string node_name)
{
  onnxRtParams_t * onnxRtParams = &state_subGraph->onnxRtParams;
  state_subGraph->currFrameIdx_ = 0;
  state_subGraph->subGraphPtr_ = NULL;
  state_subGraph->string_buf = string_buf;

  ModelProto model_proto;
  model_proto.ParseFromString(*string_buf);

  auto onnxGraph = model_proto.graph();


  printf("Compile %s\n", node_name.c_str());

  if(data_->m_debug_level)
  {
    printf("Compiling Sub ONNX Model \n");
    onnxProto_PrintProps(onnxGraph);
  }
  state_subGraph->ioBuffDesc = (void*)malloc(sizeof(sTIDL_IOBufDesc_t));
  assert(state_subGraph->ioBuffDesc);

  int status = 0;
  char outDataNamesList[500] = "";
  tidl_onnxrtFindOnnxOutputNames(onnxGraph, (char*)outDataNamesList);
  strcpy((char*)state_subGraph->subGraphName_, (char*)outDataNamesList);
  //replaceChar works in place; copying the buffer onto itself with strcpy is undefined behavior
  replaceChar((char*)state_subGraph->subGraphName_, '/', '_', strlen((const char*)state_subGraph->subGraphName_));
  
  int32_t currIdx = 0;
  for (int i = 0; i < onnxGraph.input_size(); i++) 
  {    
    if (TIDL_isInputConst(string_buf, onnxGraph.input(i).name())) 
    {
      continue;
    }
    state_subGraph->inputIdx[currIdx++] = i;
  }
  state_subGraph->numInputs = currIdx;
  state_subGraph->numOutputs = onnxGraph.output_size();

  for (int i = 0; i < state_subGraph->numInputs; i++) 
  {      
    printf("Input tensor name -  %s \n", onnxGraph.input(state_subGraph->inputIdx[i]).name().c_str());
    strcpy((char *)onnxRtParams->inDataNames[i],  (char*)onnxGraph.input(state_subGraph->inputIdx[i]).name().c_str());
  }
  for (int i = 0; i < state_subGraph->numOutputs; i++)
  {
    printf("Output tensor name - %s \n", onnxGraph.output(i).name().c_str());
    strcpy((char *)onnxRtParams->outDataNames[i],  onnxGraph.output(i).name().c_str());
  }

  //printf("Compute status : %d \n", status);
}

void TIDL_computeImportFunc(OnnxTIDLSubGraphParams * state_subGraph, std::string * string_buf,int32_t opSetVersion)
{
  ModelProto model_proto;
  model_proto.ParseFromString(*string_buf);

  auto onnxGraph = model_proto.graph(); 
  //printf("Computing Sub ONNX Model \n");
  onnxRtParams_t * onnxRtParams = &state_subGraph->onnxRtParams;

  int32_t status;
  state_subGraph->currFrameIdx_++;

//#ifdef TIDL_IMPORT_ONNX
  if ((state_subGraph->currFrameIdx_ == 1))
  {
    char outDataNamesList[500] = "";
    tidl_onnxrtFindOnnxOutputNames(onnxGraph, (char*)outDataNamesList);
    strcpy((char*)state_subGraph->subGraphName_, (char*)outDataNamesList);
    //replaceChar works in place; copying the buffer onto itself with strcpy is undefined behavior
    replaceChar((char*)state_subGraph->subGraphName_, '/', '_', strlen((const char*)state_subGraph->subGraphName_));

    TIDL_onnxRtImportInit(onnxGraph, onnxRtParams, (char*)state_subGraph->subGraphName_,  data_->m_num_param_bits, 
                          const_cast<char *>(data_->m_tidl_tools_path.c_str()), data_->m_debug_level, opSetVersion);
    for (int i = 0; i < onnxGraph.node_size(); i++) 
    {
      TIDL_onnxRtImportAndLinkNode(onnxGraph, i, data_->m_debug_level);
    }
    TIDL_onnxRtOptimizeNet(data_->m_debug_level);
    TIDL_saveTidlOnnxRtSubGraph(&state_subGraph->subGraphPtr_);
  }
  
  tidl_subgraph_import(onnxGraph, onnxRtParams, data_, state_subGraph->subGraphPtr_, state_subGraph->subGraphName_, state_subGraph->currFrameIdx_);
  status = tidl_subgraph_rt_create(data_, state_subGraph->subGraphName_, (sTIDL_IOBufDesc_t*)state_subGraph->ioBuffDesc, state_subGraph);

  //printf("Compute status : %d \n", status);
}

void TIDL_computeInvokeFunc(OnnxTIDLSubGraphParams * state_subGraph)
{
  int32_t status;
  status = tidl_subgraph_rt_invoke(data_, (sTIDL_IOBufDesc_t*)state_subGraph->ioBuffDesc, state_subGraph);
  status = tidl_subgraph_rt_delete(data_, state_subGraph);
}

} //extern C
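
A note for anyone reading the file above: before optimizeGraphPartition runs, TIDL_getSupportedNodes collects runs of consecutive supported node indices into preliminary groups. A minimal standalone sketch of that first pass follows. It assumes a plain vector of flags stands in for the IsNodeSupportedByTIDL result, and groupConsecutiveSupported is a hypothetical name that does not exist in the TIDL sources.

```cpp
#include <vector>

// Hypothetical helper (not part of TIDL): groups consecutive supported node
// indices, mirroring the first loop of TIDL_getSupportedNodes.
// supported[i] stands in for the result of IsNodeSupportedByTIDL on node i.
std::vector<std::vector<int>> groupConsecutiveSupported(const std::vector<bool>& supported)
{
  std::vector<std::vector<int>> groups;
  std::vector<int> current;
  for (int i = 0; i < static_cast<int>(supported.size()); i++)
  {
    if (supported[i])
    {
      // Extend the run of supported nodes
      current.push_back(i);
    }
    else if (!current.empty())
    {
      // An unsupported node closes the current run
      groups.push_back(current);
      current.clear();
    }
  }
  if (!current.empty())
  {
    // Flush the run that reaches the end of the graph
    groups.push_back(current);
  }
  return groups;
}
```

With flags {true, true, true, false, true, true} this yields two groups, {0, 1, 2} and {4, 5}. The merge pass then only combines two groups when every input of one is an output of the other, as the comment in optimizeGraphPartition states.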

F_Fontana over 4 years ago in reply to Anand Pathak

Hi Anand,

I confirm that I was able to import the original model with the fix provided!

If you can also provide the fix for the inference library, I can test that as well.

Thanks,

Federico

Anand Pathak over 4 years ago in reply to F_Fontana

Hi Federico,

That's great. I am attaching "onnxrt_EP.txt". Change the extension to .cpp and use it to replace the existing file in "ti_dl/onnxrt_EP/src/". Then run the following command from tidl_j7_02_00_00_07 to build it:

make onnxrt_EP TARGET_PLATFORM=PC

Let me know if it works so we can close this thread.

Regards,

Anand
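
Some context on how the two files appear to fit together: the import-side code writes each offloaded node index to allowedNode.txt in the artifacts folder, one index per line, and IsNodeSupportedByTIDL in the attachment below scans a file of integers for a given node index, presumably that same file. A rough standalone sketch of the round trip follows. The helper names (writeAllowedNodes, isNodeAllowed) and the file path are made up for illustration and do not come from the TIDL sources.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: writes one node index per line, in the same "%d\n"
// format the import-side code uses for allowedNode.txt.
bool writeAllowedNodes(const char* path, const std::vector<int>& nodes)
{
  FILE* fp = fopen(path, "w");
  if (fp == NULL)
  {
    return false;
  }
  for (int n : nodes)
  {
    fprintf(fp, "%d\n", n);
  }
  fclose(fp);
  return true;
}

// Hypothetical helper: returns true if nodeIdx is listed in the file.
// The loop stops on the first failed read instead of polling feof().
bool isNodeAllowed(const char* path, int nodeIdx)
{
  FILE* fp = fopen(path, "r");
  if (fp == NULL)
  {
    return false;
  }
  int idx = 0;
  bool found = false;
  while (fscanf(fp, "%d", &idx) == 1)
  {
    if (idx == nodeIdx)
    {
      found = true;
      break;
    }
  }
  fclose(fp);
  return found;
}
```

Checking the return value of fscanf keeps the scan from looping on an empty or malformed file, which is why the sketch does not test feof().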

onnxrt_EP.txt

/*
* Copyright (C) 2020 Texas Instruments Incorporated - http://www.ti.com/
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

//#define ONNX_ML

#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/message.h>
#include <google/protobuf/text_format.h>
using ::google::protobuf::Message;
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <float.h>
#include <cmath>
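#include <stdarg.h>
#include <assert.h>
#include <string.h>
#include <dlfcn.h>
#include <sys/stat.h>
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>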

#include "onnx/onnx-ml.proto3.pb.h"
#include "onnxrt_EP.h"

using namespace std;
using namespace onnx;

TIDL_OnnxrtEPInferOptions * data_ = new TIDL_OnnxrtEPInferOptions;

void onnxrt_printf(int32_t debugLevel, char * format, ...)
{
  va_list args;
  if (debugLevel == 1)
  {
    (void)va_start(args, format);
    (void)vprintf(format, args);
    va_end(args);
  }
}

template <typename T>
T LoadSymbol(void *lib, const char* symbol) {
    T sym = reinterpret_cast<T>(dlsym(lib, symbol));
    assert(sym);
    return sym;
}

static bool check_isdir(const char *path) {
    char *real = realpath(path, NULL);
    if(!real)
        return false;

    struct stat st;
    int res = stat(real, &st);
    free(real); // realpath() allocated this buffer; release it on every path
    if(res)
        return false;

    return (st.st_mode & S_IFMT) == S_IFDIR;
}


extern "C"
{
bool TIDL_populateOptions(std::vector<std::pair<std::string,std::string>> interface_options)
{
  data_->infer_ops.lib = dlopen("libvx_tidl_rt.so", RTLD_NOW | RTLD_GLOBAL);
  if(! data_->infer_ops.lib)
  {
    printf("Error -   %s \n", dlerror());
    return false; // the symbol lookups below need a valid library handle
  }

  data_->infer_ops.TIDLRT_create = LoadSymbol<decltype(data_->infer_ops.TIDLRT_create)>  (data_->infer_ops.lib, "TIDLRT_create");
  data_->infer_ops.TIDLRT_delete = LoadSymbol<decltype(data_->infer_ops.TIDLRT_delete)>  (data_->infer_ops.lib, "TIDLRT_delete");
  data_->infer_ops.TIDLRT_invoke = LoadSymbol<decltype(data_->infer_ops.TIDLRT_invoke)>  (data_->infer_ops.lib, "TIDLRT_invoke");
  data_->infer_ops.TIDLRT_deactivate = LoadSymbol<decltype(data_->infer_ops.TIDLRT_deactivate)>(data_->infer_ops.lib, "TIDLRT_deactivate");
  data_->infer_ops.TIDLRT_setParamsDefault = LoadSymbol<decltype(data_->infer_ops.TIDLRT_setParamsDefault)>(data_->infer_ops.lib, "TIDLRT_setParamsDefault");
  data_->infer_ops.TIDLRT_setTensorDefault = LoadSymbol<decltype(data_->infer_ops.TIDLRT_setTensorDefault)>(data_->infer_ops.lib, "TIDLRT_setTensorDefault");
  data_->infer_ops.TIDLRT_getDdrStats = LoadSymbol<decltype(data_->infer_ops.TIDLRT_getDdrStats)>(data_->infer_ops.lib, "TIDLRT_getDdrStats");

  TIDL_OnnxrtEPInferOptions * options = data_;
  for(auto option : interface_options)
  {
    auto key = option.first;
    auto value = option.second;
    if (!strcmp("debug_level", key.c_str())) 
    {
      std::stringstream(value) >> options->m_debug_level;
      // TODO: any invalid values? like negative, or beyond supported range?
    }
    if (!strcmp("artifacts_folder", key.c_str())) 
    {
      options->m_artifacts_folder = value;
      if(!check_isdir(options->m_artifacts_folder.c_str())) 
      {
        // options aliases the global data_, so it must not be deleted here
        printf("ERROR : artifacts_folder not a directory\n");
        return false;
      }
    }
  }
  if (options->m_artifacts_folder.empty()) 
  {
      // options aliases the global data_, so it must not be deleted here
      printf("ERROR : artifacts_folder must be provided\n");
      return false;
  }

  onnxrt_printf(data_->m_debug_level, "artifacts_folder                                = %s \n", data_->m_artifacts_folder.c_str());
  onnxrt_printf(data_->m_debug_level, "debug_level                                     = %d \n", data_->m_debug_level);

  return true;
}
} //extern "C"

int32_t IsNodeSupportedByTIDL(GraphProto&   onnxGraph, FILE *fp, int32_t nodeIx, int32_t opsetVersion) 
{
  int idx;
  fseek(fp, 0, SEEK_SET);
  // Stop on the first failed read; testing feof() can spin on malformed input and leaves idx unset
  while (fscanf(fp, "%d", &idx) == 1)
  {
    if(nodeIx == idx)
    {
      return true;
    }
  }
  return false;
}

std::vector<std::pair<std::vector<int>, std::pair<std::vector<std::string>, std::vector<std::string>>>> getSubgraphInfo(GraphProto& onnxGraph, std::vector<std::vector<int>> suportedNodeGroups)
{
  std::vector<std::string> nodeOutputs;
  std::vector<std::string> nodeInputs;
  std::pair<std::vector<std::string>, std::vector<std::string>> nodeInputsOutputs;
  std::vector<std::pair<std::vector<int>, std::pair<std::vector<std::string>, std::vector<std::string>>>> info;
  //vector( (subgraph1, (inputs_1, outputs_1)), (subgraph2, (inputs_2, outputs_2)) ) 

  for(int i = 0; i < suportedNodeGroups.size(); i++)
  {
    std::vector<int> subgraph = suportedNodeGroups[i];
    for(int j = 0; j < subgraph.size(); j++)
    {
      for(int l = 0; l < onnxGraph.node(subgraph[j]).input_size(); l++)
      {
        for (int k = 0; k < onnxGraph.value_info_size(); k++)
        {
          if((strcmp(onnxGraph.value_info(k).name().c_str(), onnxGraph.node(subgraph[j]).input(l).c_str()) == 0))
          {
            nodeInputs.push_back(onnxGraph.value_info(k).name());
          }
        }
      }
    }
    for(int j = 0; j < subgraph.size(); j++)
    {
      for(int l = 0; l < onnxGraph.node(subgraph[j]).output_size(); l++)
      {
        nodeOutputs.push_back(onnxGraph.node(subgraph[j]).output(l));
      }
    }
#if 0    
    std::sort(nodeInputs.begin(), nodeInputs.end());
    nodeInputs.erase(std::unique(nodeInputs.begin(), nodeInputs.end()), nodeInputs.end());
    std::sort(nodeOutputs.begin(), nodeOutputs.end());
    nodeOutputs.erase(std::unique(nodeOutputs.begin(), nodeOutputs.end()), nodeOutputs.end());

    bool match;
    for(int i = 0; i < nodeInputs.size(); i++)
    {
      match = false;
      for(int j = 0; j < nodeOutputs.size(); j++)
      {
        if(nodeInputs[i].compare(nodeOutputs[j]) == 0)
        {
          match = true;
          auto itr = std::find(nodeInputs.begin(), nodeInputs.end(), nodeInputs[i]);
          if (itr != nodeInputs.end()) nodeInputs.erase(itr);
          itr = std::find(nodeOutputs.begin(), nodeOutputs.end(), nodeOutputs[j]);
          if (itr != nodeOutputs.end()) nodeOutputs.erase(itr);
          j--;
        }
      }
      if(match)
      {
        i--;
      }
    }
#endif
    nodeInputsOutputs = std::make_pair(nodeInputs, nodeOutputs);
    info.push_back(std::make_pair(subgraph, nodeInputsOutputs));
#if 0
    printf("Subgraph inputs \n");
    for(int i = 0; i < nodeInputs.size(); i++)
    {
      printf("%s  \n", nodeInputs[i].c_str());
    }
    printf("Subgraph outputs \n");
    for(int i = 0; i < nodeOutputs.size(); i++)
    {
      printf("%s  \n", nodeOutputs[i].c_str());
    }
#endif
    nodeInputs.clear();
    nodeOutputs.clear();
  }
#if 0
  printf("info.size() = %d \n", info.size());
  for(int i = 0; i < info.size(); i++)
  {
    printf("**** Subgraph %d *****\n", i);
    std::vector<int> subgraph = info[i].first;
    std::vector<std::string> inputs = info[i].second.first;
    std::vector<std::string> outputs = info[i].second.second;
    for(int j = 0; j < subgraph.size(); j++) printf("%d ", subgraph[j]); printf("\n");
    printf("Inputs --- \n");
    for(int j = 0; j < inputs.size(); j++) printf("%s \n ", inputs[j].c_str());
    printf("Outputs --- \n");
    for(int j = 0; j < outputs.size(); j++) printf("%s \n ", outputs[j].c_str());
  }
#endif
  return info;
}


std::vector<std::vector<int>> optimizeGraphPartition(GraphProto& onnxGraph, std::vector<std::vector<int>> suportedNodeGroups)
{
  std::vector<std::pair<std::vector<int>, std::pair<std::vector<std::string>, std::vector<std::string>>>> info;

  std::vector<int> subgraph_i, subgraph_j;
  std::vector<std::string> inputs_i, inputs_j;
  std::vector<std::string> outputs_i, outputs_j;
  bool canMergeInput, canMergeSubgraph, mergeDone;
  mergeDone = false;
  canMergeSubgraph = false;
  
  while(mergeDone == false)
  {
    info = getSubgraphInfo(onnxGraph, suportedNodeGroups);
    for(int i = 0; i < info.size(); i++)
    {
      canMergeSubgraph = false;
      subgraph_i = info[i].first;
      inputs_i = info[i].second.first;
      outputs_i = info[i].second.second;
      for(int j = 0; j < info.size(); j++)
      {
        if(j == i) continue;
        canMergeSubgraph = true;
        subgraph_j = info[j].first;
        inputs_j = info[j].second.first;
        outputs_j = info[j].second.second; 
        for(int k = 0; k < inputs_j.size(); k++)
        {
          canMergeInput = false;
          for(int l = 0; l < outputs_i.size(); l++)
          {
            if(inputs_j[k].compare(outputs_i[l]) == 0)
            {
              canMergeInput = true;
              break;
            }
          }
          if(outputs_j.size() == 0) canMergeInput = false;
          if(canMergeInput == false)
          {
            canMergeSubgraph = false;
            break;
          }
        }
        if(inputs_j.size() == 0) canMergeSubgraph = false;
        if(canMergeSubgraph)
        {
          suportedNodeGroups.clear();
          subgraph_i.insert(subgraph_i.end(), subgraph_j.begin(), subgraph_j.end());
          /* Update entry i before erasing entry j: erasing first shifts index i whenever j < i */
          info[i].first = subgraph_i;
          info.erase(info.begin() + j);
          for(int m = 0; m < info.size(); m++)
          {
            suportedNodeGroups.push_back(info[m].first);
          }
          break;
        }
      }
      if(canMergeSubgraph) break;
    }
    if(canMergeSubgraph == false)
    {
      mergeDone = true;
    }
  }
  return suportedNodeGroups;
}

extern "C"
{
std::vector<std::vector<int>> TIDL_getSupportedNodes(std::string& data, int32_t opsetVersion)
{
  int32_t numSuportedNodes = 0;
  if (data_->m_debug_level)
  {
    printf("Parsing ONNX Model \n");
  }

  ModelProto model_proto;
  model_proto.ParseFromString(data);
  auto onnxGraph = model_proto.graph();


  std::vector<std::vector<int>> suportedNodeGroups;
  std::vector<int> nodeGroup;

  FILE *fp;
  char fileName[500];

  snprintf(fileName, sizeof(fileName), "%s/allowedNode.txt", data_->m_artifacts_folder.c_str());
  fp = fopen(fileName, "r");
  if(fp == NULL)
  {
      printf("Could not open %s for reading...exiting !\n", fileName);
      /* Return early: fp is passed to IsNodeSupportedByTIDL and fclose below */
      return {{}};
  }

  int32_t i, num_subGraphs = 0; 
  for (i = 0; i < onnxGraph.node_size(); i++)
  {
    if (IsNodeSupportedByTIDL(onnxGraph,  fp, i, opsetVersion)) 
    {       
      nodeGroup.push_back(i);
      numSuportedNodes++;
    }
    else
    {
      if(!nodeGroup.empty())
      {
        suportedNodeGroups.push_back(nodeGroup);
        nodeGroup.clear();
        num_subGraphs++;
      }
    }
  }
  if(!nodeGroup.empty())
  {
    suportedNodeGroups.push_back(nodeGroup);
    nodeGroup.clear();
    num_subGraphs++;
  }
  fclose(fp);

  printf("\nPreliminary subgraphs created = %zu \n", suportedNodeGroups.size());
  
  std::vector<std::vector<int>> suportedNodeGroupsOptimized = optimizeGraphPartition(onnxGraph, suportedNodeGroups);

  printf("Final number of subgraphs created are : %zu, - Offloaded Nodes - %d, Total Nodes - %d \n", suportedNodeGroupsOptimized.size(), numSuportedNodes, onnxGraph.node_size());
  if(suportedNodeGroupsOptimized.empty())
  {
    return {{}};
  }
  else
  {
    return suportedNodeGroupsOptimized;
  }
}

int32_t TIDL_isInputConstInGraph(GraphProto&   onnGraph, const string name)
{
  int i;
  for (i = 0; i < onnGraph.initializer_size(); i++)
  {
    if ((strcmp(onnGraph.initializer(i).name().c_str(), name.c_str()) == 0))
    {
      return(1);
    }
  }
  for (i = 0; i < onnGraph.node_size(); i++)
  {
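    /* Check output_size() first: output(0) on a node without outputs is an out-of-range access */
    if ((onnGraph.node(i).output_size() > 0) && (strcmp(onnGraph.node(i).output(0).c_str(), name.c_str()) == 0) && (strcmp(onnGraph.node(i).op_type().c_str(), "Constant") == 0))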
    {
      return(1);
    }
  }
  return (0);
}


int32_t TIDL_isInputConst(std::string * string_buf, const string name)
{
  ModelProto model_proto;
  model_proto.ParseFromString(*string_buf);
  auto onnxGraph = model_proto.graph();
  return (TIDL_isInputConstInGraph(onnxGraph, name));
}

} //extern C

int32_t onnxProto_PrintProps(GraphProto&   onnxGraph)
{
  int32_t i;
  for (i = 0; i < onnxGraph.node_size(); i++)
  {
    const auto& node = onnxGraph.node(i);
    /* Nodes such as Constant have no inputs, so guard the index-0 accesses */
    printf("%3d, %15s, %d, %d, %s, %s\n", i, 
    node.op_type().c_str(), 
    node.input_size(), node.output_size(),
    node.input_size() > 0 ? node.input(0).c_str() : "",
    node.output_size() > 0 ? node.output(0).c_str() : "");
  }
  return 0;
}

char* replaceChar(char* string, char c1, char c2, int length) 
{ 
  for (int32_t i = 0; i < length; i++)
  { 
    if (string[i] == c1) 
        string[i] = c2; 
  }
  return string; 
}


int32_t tidl_onnxrtFindOnnxOutputNames(GraphProto&   onnxGraph, char * outList)
{
  int i, j, k, l;
  char tensorName[500];
  char inTensorName[500];
  int outPutSize = 0;
  int node_idx = 0;

  for (i = 0; i < onnxGraph.node_size(); i++)
  {
    outPutSize = onnxGraph.node(i).output_size();
    for (j = 0; j < outPutSize; j++)
    {
      int outDataUsed = 0;
      strcpy((char *)tensorName, onnxGraph.node(i).output(j).c_str());
      for (k = 0; k < onnxGraph.node_size(); k++)
      {
        for (l = 0; l < onnxGraph.node(k).input_size(); l++)
        {
          strcpy((char *)inTensorName, onnxGraph.node(k).input(l).c_str());
          if (strcmp(tensorName, inTensorName) == 0)
          {
            outDataUsed = 1;
            break;
          }
        }
        if (outDataUsed)
          break;
      }
      if (outDataUsed == 0)
      {
        node_idx = i;
        strcat(outList, tensorName);
        //strcat(outList, ",");
      }
    }
  }
  return (node_idx);
}

extern "C"
{
std::vector<int64_t> TIDL_getOutputShape(void * ioBufDescVPtr, int8_t onnxName[])
{
  sTIDL_IOBufDesc_t *ioBufDescPtr = (sTIDL_IOBufDesc_t *)ioBufDescVPtr;
  std::vector<int64_t> nchw_shape;
  for(int i = 0; i < ioBufDescPtr->numOutputBuf; i++)
  {
    if(strcmp((char *)ioBufDescPtr->outDataName[i], (char *)onnxName) == 0)
    {
      nchw_shape = { 1, ioBufDescPtr->outNumChannels[i], ioBufDescPtr->outHeight[i], ioBufDescPtr->outWidth[i]};
    }
  }
  if(nchw_shape.size() == 0)
  {
    printf("Warning : Couldn't find corresponding ioBuf tensor for onnx tensor with matching name \n");
  }
  return nchw_shape;
}
int32_t TIDLEP_getSubGraphStats(OnnxTIDLSubGraphParams * state_subGraph, char **node_name, void **node_data)
{
  sTIDLRT_PerfStats_t * stats = (sTIDLRT_PerfStats_t*)state_subGraph->stats;
  std::vector<uint64_t> *v = new std::vector<uint64_t>();
  v->push_back(uint64_t(stats->cpIn_time_start));
  v->push_back(uint64_t(stats->cpIn_time_end));
  v->push_back(uint64_t(stats->proc_time_start));
  v->push_back(uint64_t(stats->proc_time_end));
  v->push_back(uint64_t(stats->cpOut_time_start));
  v->push_back(uint64_t(stats->cpOut_time_end));
  *node_data = static_cast<void *>(v);
  *node_name = const_cast<char *>(state_subGraph->subGraphName_);
  return 0;
}
void TIDL_computeImportFunc(OnnxTIDLSubGraphParams * state_subGraph, std::string * string_buf,int32_t opSetVersion)
{
  printf("Error : This function call is not expected for inference flow \n");
}


} //extern C


int32_t TIDLRT_ReadBinFromFile(const char * fileName, void * addr, int32_t size)
{
  FILE * fptr = NULL;
  fptr = fopen((const char *)fileName, "rb");
  int status = 0;
  if(fptr)
  {
    status = fread(addr, size, 1, fptr);
    fclose(fptr);
    return status;
  }
  else
  {
    printf("Could not open %s file for reading \n",fileName);
  }
  return status;
}

int32_t tidl_subgraph_rt_create(TIDL_OnnxrtEPInferOptions* options, char* subGraphName, sTIDL_IOBufDesc_t *ioBufDescPtr, OnnxTIDLSubGraphParams * subgraphParams)
{
  //tfldelegate_printf(options->debug_level, "************ in tidl_subgraph_rt_create ************ \n ");
  int status = 0;
  sTIDLRT_Params_t prms;
  FILE *fp_network;
  FILE *fp_config;
  char network_file[512];
  char config_file[512];
  void *handle = NULL;

  status = data_->infer_ops.TIDLRT_setParamsDefault(&prms);
  
  /* Bound each write by the size of its own destination buffer */
  snprintf(network_file, sizeof(network_file), "%s/%s_tidl_net.bin", options->m_artifacts_folder.c_str(), subGraphName);
  snprintf(config_file, sizeof(config_file), "%s/%s_tidl_io_1.bin", options->m_artifacts_folder.c_str(), subGraphName);
  
  fp_network = fopen(&network_file[0], "rb");
  if (fp_network == NULL)
  {
    printf("Invoke  : ERROR: Unable to open network file %s \n", network_file);
    return -1;
  }
  prms.stats = (sTIDLRT_PerfStats_t*)malloc(sizeof(sTIDLRT_PerfStats_t));

  fseek(fp_network, 0, SEEK_END);
  prms.net_capacity = ftell(fp_network);
  fseek(fp_network, 0, SEEK_SET);
  fclose(fp_network);
  prms.netPtr = malloc(prms.net_capacity);
  
  prms.TIDLReadBinFromFile = TIDLRT_ReadBinFromFile;
  status = prms.TIDLReadBinFromFile(&network_file[0], prms.netPtr, prms.net_capacity);
  
  fp_config = fopen(&config_file[0], "rb");
  if (fp_config == NULL)
  {
    printf("Invoke  : ERROR: Unable to open IO config file %s \n", config_file);
    return -1;
  }
  fseek(fp_config, 0, SEEK_END);
  prms.io_capacity = ftell(fp_config);
  fseek(fp_config, 0, SEEK_SET);
  fclose(fp_config);
  prms.ioBufDescPtr = malloc(prms.io_capacity);
  status = prms.TIDLReadBinFromFile(&config_file[0], prms.ioBufDescPtr, prms.io_capacity);

  if(options->m_debug_level >= 2)
  {
    prms.traceLogLevel = options->m_debug_level;
    prms.traceWriteLevel = 3;
  }

  status = data_->infer_ops.TIDLRT_create(&prms, &handle);
  
  sTIDL_IOBufDesc_t *ioBufDesc = (sTIDL_IOBufDesc_t *)prms.ioBufDescPtr;
  memcpy(ioBufDescPtr, ioBufDesc, sizeof(sTIDL_IOBufDesc_t));

  subgraphParams->rtInList  = (void *)malloc(ioBufDesc->numInputBuf * sizeof(sTIDLRT_Tensor_t));
  subgraphParams->rtOutList = (void *)malloc(ioBufDesc->numOutputBuf * sizeof(sTIDLRT_Tensor_t));
  subgraphParams->rtHandle    = handle;
  subgraphParams->stats       = prms.stats;

  return status;
}

int32_t tidl_subgraph_rt_delete(TIDL_OnnxrtEPInferOptions* options, OnnxTIDLSubGraphParams * subgraphParams)
{
  //tfldelegate_printf(options->debug_level, "************ in tidl_subgraph_rt_delete ************ \n ");
  int status = 0;
  if(subgraphParams->rtHandle)
  {
    status = data_->infer_ops.TIDLRT_deactivate(subgraphParams->rtHandle);
    status = data_->infer_ops.TIDLRT_delete(subgraphParams->rtHandle);
  }
  free(subgraphParams->rtInList);
  free(subgraphParams->rtOutList);
  return status;
}
int32_t tidl_subgraph_rt_invoke(TIDL_OnnxrtEPInferOptions* options, sTIDL_IOBufDesc_t *ioBufDescPtr, OnnxTIDLSubGraphParams * subgraphParams)
{
  int status = 0;
  int j = 0;
  onnxRtParams_t * onnxRtParams = &subgraphParams->onnxRtParams;
  void *handle = subgraphParams->rtHandle;
  sTIDLRT_PerfStats_t *stats = (sTIDLRT_PerfStats_t *)subgraphParams->stats;

  sTIDLRT_Tensor_t *in[128];
  sTIDLRT_Tensor_t *out[128];
  sTIDLRT_Tensor_t *ins;
  sTIDLRT_Tensor_t *outs;

  ins = (sTIDLRT_Tensor_t *)subgraphParams->rtInList;
  outs = (sTIDLRT_Tensor_t *)subgraphParams->rtOutList;

  if ((ins == NULL) || (outs == NULL))
  {
    printf("Invoke  : ERROR: Unable to allocate memory for TIDL RT in[] out [] tensor struct\n");
    return -1;
  }
  else
  {
    int32_t currInIdx = 0;
    /* Input tensors property set up */
    for (j = 0; j < onnxRtParams->numNetInData; j++)
    {

      int64_t inElementType = onnxRtParams->inputTensorElementType[currInIdx];
      void * input = onnxRtParams->inputTensorData[currInIdx];

      if (inElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8)
      {
        in[j] = &(ins[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(in[j]);
        in[j]->ptr = (uint8_t *)(input);
        in[j]->zeroPoint = 0; //quantization->zero_point->data[0];
        in[j]->elementType = TIDLRT_Uint8;
        in[j]->scale = 1.0; //1 / quantization->scale->data[0];
        in[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)in[j]->name, (char *)onnxRtParams->inDataNames[j]);
      }
      else if (inElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32)
      {
        in[j] = &(ins[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(in[j]);
        in[j]->ptr = (int32_t *)(input);
        in[j]->zeroPoint = 0; 
        in[j]->elementType = TIDLRT_Int32;
        in[j]->scale = 1.0;
        in[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)in[j]->name, (char *)onnxRtParams->inDataNames[j]);
      }
      else if (inElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64)
      {
        in[j] = &(ins[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(in[j]);
        in[j]->ptr = (int64_t *)(input);
        in[j]->zeroPoint = 0; 
        in[j]->elementType = TIDLRT_Int64;
        in[j]->scale = 1.0;
        in[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)in[j]->name, (char *)onnxRtParams->inDataNames[j]);
      }
      else if (inElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT)
      {
        in[j] = &(ins[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(in[j]);
        in[j]->ptr = (float *)(input);
        in[j]->zeroPoint = 0; 
        in[j]->elementType = TIDLRT_Float32;
        in[j]->scale = 1.0;
        in[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)in[j]->name, (char *)onnxRtParams->inDataNames[j]);
      }
      else
      {
        printf("Invoke : Unsupported input Tensor element type %ld \n", inElementType);
      }
      currInIdx++;
    }

    /* Output tensors property set up */
    for (j = 0; j < onnxRtParams->numNetOutData; j++)
    {
      void* output = onnxRtParams->outputTensorData[j];
      int64_t outElementType = onnxRtParams->outputTensorElementType[j];

      if (outElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8)
      {
        out[j] = &(outs[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(out[j]);
        out[j]->ptr = (uint8_t *)(output);
        out[j]->zeroPoint = 0; //quantization->zero_point->data[0];
        out[j]->elementType = TIDLRT_Uint8;
        out[j]->scale = 1.0; //1 / quantization->scale->data[0];
        out[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)out[j]->name, (char *)onnxRtParams->outDataNames[j]);
      }
      else if (outElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32)
      {
        out[j] = &(outs[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(out[j]);
        out[j]->ptr = (int32_t *)(output);
        out[j]->zeroPoint = 0;
        out[j]->elementType = TIDLRT_Int32;
        out[j]->scale = 1.0;
        out[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)out[j]->name, (char *)onnxRtParams->outDataNames[j]);
      }
      else if (outElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64)
      {
        out[j] = &(outs[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(out[j]);
        out[j]->ptr = (int64_t *)(output);
        out[j]->zeroPoint = 0;
        out[j]->elementType = TIDLRT_Int64;
        out[j]->scale = 1.0;
        out[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)out[j]->name, (char *)onnxRtParams->outDataNames[j]);
      }
      else if (outElementType == ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT)
      {
        out[j] = &(outs[j]);
        status = data_->infer_ops.TIDLRT_setTensorDefault(out[j]);
        out[j]->ptr = (float *)(output);
        out[j]->zeroPoint = 0;
        out[j]->elementType = TIDLRT_Float32;
        out[j]->scale = 1.0;
        out[j]->layout = TIDLRT_LT_NCHW;
        strcpy((char *)out[j]->name, (char *)onnxRtParams->outDataNames[j]);
      }
      else
      {
        printf("ERROR : Unsupported output tensor element type %ld \n", outElementType);
      }
    }
  }
  status = data_->infer_ops.TIDLRT_invoke(handle, in, out);

  if(options->m_debug_level > 0)
  {
    /* Divide by 1000.0 so integer timestamps are not truncated before conversion to double */
    double proc_time    = (stats->proc_time_end - stats->proc_time_start)  / 1000.0;
    double cp_in_time   = (stats->cpIn_time_end - stats->cpIn_time_start)  / 1000.0;
    double cp_out_time  = (stats->cpOut_time_end - stats->cpOut_time_start)/ 1000.0;

    printf("Sub Graph Stats %f %f %f \n", cp_in_time, proc_time, cp_out_time);
  }
  return status;
}

extern "C"
{
int32_t TIDLEP_getDdrStats(uint64_t * read, uint64_t * write)
{
  return(data_->infer_ops.TIDLRT_getDdrStats(read, write));
}

void TIDL_createStateFunc(OnnxTIDLSubGraphParams * state_subGraph, std::string * string_buf, const std::string node_name)
{
  onnxRtParams_t * onnxRtParams = &state_subGraph->onnxRtParams;
  state_subGraph->currFrameIdx_ = 0;
  state_subGraph->subGraphPtr_ = NULL;
  state_subGraph->string_buf = string_buf;
  ModelProto model_proto;
  model_proto.ParseFromString(*string_buf);

  auto onnxGraph = model_proto.graph();

  if(data_->m_debug_level)
  {
    printf("Compile %s\n",  node_name.c_str());
    printf("Compiling Sub ONNX Model \n");
    onnxProto_PrintProps(onnxGraph);
  }

  state_subGraph->ioBuffDesc = (void*)malloc(sizeof(sTIDL_IOBufDesc_t));
  assert(state_subGraph->ioBuffDesc);

  int status = 0;
  char outDataNamesList[500] = "";
  tidl_onnxrtFindOnnxOutputNames(onnxGraph, (char*)outDataNamesList);
  strcpy((char*)state_subGraph->subGraphName_, (char*)outDataNamesList);
  /* replaceChar edits the buffer in place; copying it onto itself with strcpy is undefined behavior */
  replaceChar((char*)state_subGraph->subGraphName_, '/', '_', strlen((const char*)state_subGraph->subGraphName_));

  status = tidl_subgraph_rt_create(data_, state_subGraph->subGraphName_, (sTIDL_IOBufDesc_t*)state_subGraph->ioBuffDesc, state_subGraph); 
  
  int32_t currIdx = 0;
  for (int i = 0; i < onnxGraph.input_size(); i++) 
  {    
    if (TIDL_isInputConst(string_buf, onnxGraph.input(i).name())) 
    {
      continue;
    }
    state_subGraph->inputIdx[currIdx++] = i;
  }
  state_subGraph->numInputs = currIdx;
  state_subGraph->numOutputs = onnxGraph.output_size();

  for (int i = 0; i < state_subGraph->numInputs; i++) 
  {      
    onnxrt_printf(data_->m_debug_level, "\nInput tensor name -  %s \n", onnxGraph.input(state_subGraph->inputIdx[i]).name().c_str());
    strcpy((char *)onnxRtParams->inDataNames[i],  (char*)onnxGraph.input(state_subGraph->inputIdx[i]).name().c_str());
  }
  for (int i = 0; i < state_subGraph->numOutputs; i++)
  {
    onnxrt_printf(data_->m_debug_level, "Output tensor name - %s \n", onnxGraph.output(i).name().c_str());
    strcpy((char *)onnxRtParams->outDataNames[i],  onnxGraph.output(i).name().c_str());
  }

  onnxrt_printf(data_->m_debug_level, "Compute status : %d \n", status);
}

void TIDL_computeInvokeFunc(OnnxTIDLSubGraphParams * state_subGraph)
{
  int32_t status;
  status = tidl_subgraph_rt_invoke(data_, (sTIDL_IOBufDesc_t*)state_subGraph->ioBuffDesc, state_subGraph);
  //TODO: call subgraph_rt_delete in destructor for infer
}


} //extern C
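
The merge rule applied in optimizeGraphPartition above can be read in isolation: subgraph j is folded into subgraph i only if j has at least one recorded input, at least one output, and every recorded input of j is produced by i. Below is a minimal standalone sketch of that test. The helper name canMerge is hypothetical and is not part of the original file; plain string vectors stand in for the lists built by getSubgraphInfo.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not in the original source): mirrors the checks
// that optimizeGraphPartition performs on a candidate pair (i, j).
bool canMerge(const std::vector<std::string>& outputs_i,
              const std::vector<std::string>& inputs_j,
              const std::vector<std::string>& outputs_j)
{
  // A subgraph with no recorded inputs or no outputs is never merged.
  if (inputs_j.empty() || outputs_j.empty())
  {
    return false;
  }
  // Every input of j must appear among the outputs of i.
  return std::all_of(inputs_j.begin(), inputs_j.end(),
                     [&outputs_i](const std::string& name)
                     {
                       return std::find(outputs_i.begin(), outputs_i.end(), name) != outputs_i.end();
                     });
}
```

For example, canMerge({"a", "b"}, {"a"}, {"c"}) is true, while canMerge({"a"}, {"a", "x"}, {"c"}) is false because "x" comes from outside subgraph i. Note that getSubgraphInfo records an input only when its name is found in value_info, so tensors missing from value_info do not take part in this test.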

F_Fontana over 4 years ago in reply to Anand Pathak

Hi Anand,

Thanks, this worked on the PC!

To run it on the TDA4 EVM, should I build it for J7 in the same way and replace the libraries on the target?

Thanks,

Federico

Anand Pathak over 4 years ago in reply to F_Fontana

Hi Federico,

The import library is not needed on the EVM. You only need to build the inference library:

make onnxrt_EP TARGET_PLATFORM=TI_DEVICE

Then copy the .so file (onnxrt_EP/out/J7/A72/LINUX/release/libtidl_onnxrt_EP.so.1.0) to /usr/lib on the EVM.

Regards,

Anand
