Tool/software:
Hi,
I’m trying to perform inference on a TDA4VM board using a custom ONNX model for semantic segmentation. I’ve tried two approaches:
1. Generate inference artifacts using onnxruntime with the TIDLCompilationProvider and the following provider options:
TIDL_OPTIONS = {
    'tidl_tools_path': get_tidl_tools_path(),
    'artifacts_folder': WORKDIR + 'artifacts/',
    'platform': 'J7',
    'version': '7.2',
    'debug_level': 2,
    'tensor_bits': 8,
    'ti_internal_nc_flag': 1601,
    'advanced_options:calibration_frames': 10,
    'advanced_options:calibration_iterations': 5,
    'accuracy_level': 1,
}
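The compilation call itself follows the standard OSRT flow, roughly as below (simplified from my script; the model path and calibration_frames are stand-ins for what the script actually uses):

import onnxruntime as rt

sess = rt.InferenceSession(
    WORKDIR + 'model.onnx',
    providers=['TIDLCompilationProvider', 'CPUExecutionProvider'],
    provider_options=[TIDL_OPTIONS, {}],
    sess_options=rt.SessionOptions(),
)

# Running the calibration frames through the session performs calibration and
# writes the compiled artifacts into 'artifacts_folder' as a side effect.
input_name = sess.get_inputs()[0].name
for frame in calibration_frames:
    sess.run(None, {input_name: frame})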
This returns no errors and produces the required artifacts, but inference on the board with these artifacts, via onnxruntime and the TIDLExecutionProvider, yields severely degraded accuracy (near zero) for my model.
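On the board, the inference session is created roughly as follows (again simplified; EP_OPTIONS is just my placeholder name here, the artifacts folder is the one copied over from the compilation step, and test_frame stands in for a preprocessed input):

import onnxruntime as rt

EP_OPTIONS = {
    'artifacts_folder': WORKDIR + 'artifacts/',
}
sess = rt.InferenceSession(
    WORKDIR + 'model.onnx',
    providers=['TIDLExecutionProvider', 'CPUExecutionProvider'],
    provider_options=[EP_OPTIONS, {}],
)
input_name = sess.get_inputs()[0].name
prediction = sess.run(None, {input_name: test_frame})[0]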
2. Pre-quantize the model using onnxruntime.quantization.quantize_static, and then try to generate artifacts using the TIDLCompilationProvider with the following provider options:
TIDL_OPTIONS = {
    'tidl_tools_path': get_tidl_tools_path(),
    'artifacts_folder': WORKDIR + 'artifacts/',
    'platform': 'J7',
    'version': '7.2',
    'debug_level': 2,
    'tensor_bits': 8,
    'ti_internal_nc_flag': 1601,
    'advanced_options:prequantized_model': 1,
    'accuracy_level': 9,
}
This also returns no errors, but it does not produce all the required artifacts in the artifacts folder: it produces the allowedNode.txt and onnxrtMetaData.txt files, but no .bin files. The .bin files are present in the tempDir subdirectory of the artifacts folder, but their sizes are about 4x what they should be (indicating float rather than int8 data), and they cannot be used for inference on the board. Note also that running the pre-quantized model on the CPU yields high-accuracy predictions, so I’m confident my network can be quantized to int8 without a substantial loss of accuracy.
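For completeness, the pre-quantization step in approach 2 is along these lines (a sketch; the data reader, input name, and input shape are stand-ins for my actual calibration data):

import numpy as np
from onnxruntime.quantization import (CalibrationDataReader, QuantFormat,
                                      QuantType, quantize_static)

class CalibReader(CalibrationDataReader):
    # Stand-in calibration data reader; my script feeds real frames here.
    def __init__(self, input_name, num_frames=10):
        self._frames = iter(
            {input_name: np.random.rand(1, 3, 256, 256).astype(np.float32)}
            for _ in range(num_frames))

    def get_next(self):
        return next(self._frames, None)

quantize_static(
    WORKDIR + 'model.onnx',
    WORKDIR + 'model_quant.onnx',
    CalibReader('input'),
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
)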
This problem seems to be independent of the actual model I’m using. Below, I’ve attached a minimal example of a randomly initialized two-layer conv net that, when compiled and evaluated according to approach 1 above, yields low-quality predictions, while pre-quantizing it yields much better predictions on the CPU but produces no .bin artifacts at compilation time (a rough sketch of this kind of model is included further below). If you would like to run the script and see the problem yourself:
- For approach one, run:
  - 1a. python3 minimal_workflow.py -c [in the TI dev docker with the standard setup]
  - 1b. scp -r ti_comp_test minimal_workflow.py tda4vm: [i.e. copy the script and model to the board]
  - 1c. python3 minimal_workflow.py -e [on the board]
- For approach two, run:
  - 2a. python3 minimal_workflow.py -p [in the TI dev docker]
  - 2b. ls ti_comp_test/artifacts/
Steps 1c and 2a print accuracy numbers that illustrate the effect of quantization; note that the numbers for 1c are much worse than those for 2a. Step 2b shows that no .bin files are generated when compiling the pre-quantized model.
[Note that step 1a does show some warnings in stdout, but no fatal error is encountered; there are no warnings similar to these for my semantic segmentation model]
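For reference, the model built in the attached script is essentially of this form (a hypothetical sketch rather than the exact code; channel counts and the input size are illustrative, and WORKDIR is the same working directory as above):

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

# Randomly initialized two-layer conv net with a per-pixel (segmentation-style)
# output; the weights are stored as initializers so the model is self-contained.
w1 = numpy_helper.from_array(
    np.random.randn(16, 3, 3, 3).astype(np.float32), 'w1')
w2 = numpy_helper.from_array(
    np.random.randn(4, 16, 3, 3).astype(np.float32), 'w2')
nodes = [
    helper.make_node('Conv', ['input', 'w1'], ['c1'], pads=[1, 1, 1, 1]),
    helper.make_node('Relu', ['c1'], ['r1']),
    helper.make_node('Conv', ['r1', 'w2'], ['output'], pads=[1, 1, 1, 1]),
]
graph = helper.make_graph(
    nodes, 'two_layer_conv',
    [helper.make_tensor_value_info('input', TensorProto.FLOAT, [1, 3, 256, 256])],
    [helper.make_tensor_value_info('output', TensorProto.FLOAT, [1, 4, 256, 256])],
    initializer=[w1, w2])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid('', 11)])
onnx.save(model, WORKDIR + 'model.onnx')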
Any help regarding this would be highly appreciated! I’d prefer to use approach 2 (since this gives me more flexibility regarding the quantization algorithm), but fixing approach 1 would also be a step forward! Thanks for your help!