Hello, I have a few questions, all concerning the attached file. We look forward to good news.
1. For our custom model, we sometimes need Ops that are not supported by TIDL or ONNXRuntime, so we tried introducing an external ONNXRuntime custom-op plugin. The op in question is `grid_sampler`, as implemented in the open-source MMDeploy project, and it compiled normally. Using only the CPU Execution Provider, the model runs correctly both in the emulation environment and on the TDA4 development board. However, when compiling the model in Heterogeneous Execution mode, the following error occurs:

```
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /home/a0230315/workarea/edgeai/onnxruntime/onnxruntime/core/providers/tidl/tidl_execution_provider.cc:179 virtual std::vector<std::unique_ptr<onnxruntime::ComputeCapability> > onnxruntime::TidlExecutionProvider::GetCapability(const onnxruntime::GraphViewer&, const std::vector<const onnxruntime::KernelRegistry*>&) const graph_build.Resolve().IsOK() was false.
```

It is difficult to locate the problem by tracing the ONNXRuntime source code in edgeai-tidl-tools. What are the possible causes, and is there a better approach for custom ops?

2. Following on from the above: we took ResNet-18 and exposed several intermediate layers as additional outputs. When the model has multiple outputs, a Segmentation Fault (core dump) occurs, but the same structure with a single output runs without any such problem. Are our settings wrong, or do the tools provided by TIDL not support multi-input/multi-output models?

3. Also regarding the custom model: we tried to run YOLOv8 (excluding NMS) for inference and found that Concat has some unexpected limitations. For example, Concat cannot be performed along any axis other than axis=1. Why is this? Is it expected behavior?

4.
If we do not add image preprocessing (subtracting the mean, dividing by the std) inside the model, but instead perform it externally, model compilation fails when it reaches the PTQ stage. Is post-training quantization impossible when the input type is float? Do we have to fold the preprocessing into the model?

5. In summary, more advanced operations, such as sampling by coordinates or rearranging tensors, basically cannot pass TIDL's model-compile stage. The same op runs in pure CPU EP mode, but in Heterogeneous Execution it fails even when delegated to the Arm core for execution. Are there detailed support documents we can consult?