Hello, I have a few questions, all concerning the attached file. We look forward to good news.
1. For our custom model, we sometimes need Ops that are not supported by TIDL or ONNXRuntime, so we tried introducing an external ONNXRuntime custom-op plugin. The op in question is `grid_sampler`, as implemented in the open-source MMDeploy project, and it compiled normally. Using only the CPU Execution Provider, the model runs correctly both in the emulation environment and on the TDA4 development board. However, when compiling the model in Heterogeneous Execution mode, the following error occurs:

```
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /home/a0230315/workarea/edgeai/onnxruntime/onnxruntime/core/providers/tidl/tidl_execution_provider.cc:179 virtual std::vector<std::unique_ptr<onnxruntime::ComputeCapability> > onnxruntime::TidlExecutionProvider::GetCapability(const onnxruntime::GraphViewer&, const std::vector<const onnxruntime::KernelRegistry*>&) const graph_build.Resolve().IsOK() was false.
```

It is difficult to locate the problem by tracing the ONNXRuntime source code in edgeai-tidl-tools. What are the possible causes, and is there a better approach for custom ops?

2. Following on from the above: we took ResNet-18 and exposed several intermediate layers as additional outputs. When the model has multiple outputs, a Segmentation Fault (core dump) occurs, but the same structure with a single output runs without any such problem. Are our settings wrong, or do the tools provided by TIDL not support multi-input/multi-output models?

3. Also regarding the custom model: we tried to run YOLOv8 (excluding NMS) for inference and found that Concat has some unexpected limitations. For example, Concat cannot be performed along any axis other than axis=1. Why is this? Is it expected behavior?

4.
If we do not add image preprocessing (subtracting the mean, dividing by the std) inside the model, but instead perform it externally, model compilation fails when it reaches the PTQ stage. Is post-training quantization impossible when the input type is float? Do we have to fold the preprocessing into the model?

5. In summary, more advanced operations, such as sampling by coordinates or rearranging tensors, basically cannot pass TIDL's model-compile stage. The same op runs in pure CPU EP mode, but in Heterogeneous Execution it fails even when delegated to the Arm core for execution. Are there detailed support documents we can consult?