Hi experts,
We analyzed our model using edge-AI tools on a TDA4VH (32 TOPS); the inference time is about 7.5 ms.
We also used the model analyzer from edge-AI studio, connected to an AM69A (32 TOPS).
However, the results differ a lot: there the inference only takes 2.40 ms.
What might be the cause of this difference?
For edge-AI tools, the sdk version is 09_00_00_06, and I heard that the sdk version of edge-AI studio is 8.06
Thanks.
Hi,
For edge-AI tools, the sdk version is 09_00_00_06, and I heard that the sdk version of edge-AI studio is 8.06
Please make sure that the reference point for the comparison is correct (same SDK version).
I believe there should be a way to select the SDK version in the edge-AI studio tool.
Thanks
Hi,
I asked this earlier, but unfortunately only SDK 8.6 is supported.
We assumed the inference times would be close, so that we could test our model on different SoCs and get inference times we can use as a reference.
But now it seems the inference time cannot be used as a reference.
Hi,
Thanks for the confirmation; yes, it seems the tool is not supporting the 9.0 SDK version yet.
We assumed that the inference time will be close,
No, this is not the case, as every SDK release ships updated C7x firmware and other related SDK components.
I hope the difference between the benchmark numbers is clear now, as it is rooted in the SDK version mismatch.
Is there anything else you need help with?
Thanks
Hi,
But the benchmark difference is large: for the same model, one inference time is 7.5 ms and the other is 2.4 ms.
Is the influence of the SDK version really this big?
Hi,
Could you please share how you are benchmarking the model on target? Which flow are you using here: OpenVX based, GStreamer based, or TIDL-RT via the RTOS SDK?
In case you are using TIDL-RT, can you share the inference config so I can take a look?
Hi,
We use OSRT for model inference in both cases.
https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/tidl_osr_debug.md
here's the config we use:
tensor_bits = 8
debug_level = 0
max_num_subgraphs = 16
accuracy_level = 1
calibration_frames = 3
calibration_iterations = 3
output_feature_16bit_names_list = "bbox_0, bbox_1, bbox_2" #"conv1_2, fire9/concat_1"
params_16bit_names_list = "" #"fire3/squeeze1x1_2"
mixed_precision_factor = -1
quantization_scale_type = 0
high_resolution_optimization = 0
pre_batchnorm_fold = 0
ti_internal_nc_flag = 1601
data_convert = 0
SOC = os.environ["SOC"]
if (quantization_scale_type == 3):
    data_convert = 0
#set to default accuracy_level 1
activation_clipping = 0
weight_clipping = 0
bias_calibration = 1
channel_wise_quantization = 0
tidl_tools_path = os.environ["TIDL_TOOLS_PATH"]
optional_options = {
    # "priority":0,
    # delay in ms
    # "max_pre_empt_delay":10
    "platform":"J7",
    "version":"7.2",
    "tensor_bits":tensor_bits,
    "debug_level":debug_level,
    "max_num_subgraphs":max_num_subgraphs,
    "deny_list":"", #"MaxPool"
    "deny_list:layer_type":"",
    "deny_list:layer_name":"",
    "model_type":"", #OD
    "accuracy_level":accuracy_level,
    "advanced_options:calibration_frames": calibration_frames,
    "advanced_options:calibration_iterations": calibration_iterations,
    "advanced_options:output_feature_16bit_names_list" : output_feature_16bit_names_list,
    "advanced_options:params_16bit_names_list" : params_16bit_names_list,
    "advanced_options:mixed_precision_factor" : mixed_precision_factor,
    "advanced_options:quantization_scale_type": quantization_scale_type,
    #"object_detection:meta_layers_names_list" : meta_layers_names_list, -- read from models_configs dictionary below
    #"object_detection:meta_arch_type" : meta_arch_type, -- read from models_configs dictionary below
    "advanced_options:high_resolution_optimization": high_resolution_optimization,
    "advanced_options:pre_batchnorm_fold" : pre_batchnorm_fold,
    "ti_internal_nc_flag" : ti_internal_nc_flag,
    # below options will be read only if accuracy_level = 9, else will be discarded; for accuracy_level = 0/1, these are preset internally
    "advanced_options:activation_clipping" : activation_clipping,
    "advanced_options:weight_clipping" : weight_clipping,
    "advanced_options:bias_calibration" : bias_calibration,
    "advanced_options:add_data_convert_ops" : data_convert,
    "advanced_options:channel_wise_quantization" : channel_wise_quantization,
    # Advanced options for SOC 'am69a'
    # "advanced_options:inference_mode" : inference_mode,
    # "advanced_options:num_cores" : num_cores
}
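For reference, a minimal sketch of how the timing can be measured through the OSRT onnxruntime flow with the TIDL execution provider; the model path, artifacts folder, input shape and dtype below are placeholders rather than our exact setup, and the option keys follow the edgeai-tidl-tools examples, so they may differ per SDK version:

# Minimal sketch of OSRT timing on target with the TIDL execution provider.
# Assumptions: compiled artifacts already exist, the model has one float32
# input, and the paths below are placeholders.
import time
import numpy as np
import onnxruntime as rt

delegate_options = {
    "artifacts_folder": "model-artifacts/model",   # placeholder path
    "tidl_tools_path": "/path/to/tidl_tools",      # placeholder path
}

sess = rt.InferenceSession(
    "model.onnx",                                  # placeholder model
    providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
    provider_options=[delegate_options, {}],
)

inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
data = np.random.rand(*shape).astype(np.float32)

# Warm up once, then average several runs so both flows are measured the same way.
sess.run(None, {inp.name: data})
start = time.time()
runs = 20
for _ in range(runs):
    sess.run(None, {inp.name: data})
print(f"Average inference time: {(time.time() - start) / runs * 1000:.2f} ms")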
Hi,
A comparison would only make sense if it is done with the same SDK, as it is hard to catch the nuances when the settings of both flows and the underlying SDK components are different.
Still, I suspect this could be because of a few reasons (I don't have full visibility of both flows at my end):
1. Different quantization style
1.1 8/16 bit, mixed precision, etc.
1.2 Calibration image set, calibration iterations, quantization style (symmetric vs asymmetric), other aspects of quantization
2. Arm offload
2.1 Check how many nodes are delegated to Arm and how many to the C7x; moreover, you can check the subgraphs (see the sketch at the end of this reply)
2.2 Are you opting for meta-arch support?
3. Advanced options
Please make sure the above-mentioned things, along with the remaining compilation options, are the same.
Did you try running the same model in the 8.6 OSRT flow? What was the benchmark in that case?
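For example, a rough way to sanity-check the Arm vs. C7x split after compilation; this assumes the compiled artifacts folder contains an allowedNode.txt with one entry per offloaded node, as produced by the edgeai-tidl-tools OSRT compile step, and the paths are placeholders, so adjust the parsing to your SDK version:

# Hedged sketch: compare total ONNX graph nodes vs. nodes offloaded to TIDL/C7x.
# Assumption: "allowedNode.txt" in the artifacts folder lists the offloaded
# nodes one per line (adjust if your SDK writes it differently).
import os
import onnx

model_path = "model.onnx"                    # placeholder path
artifacts_dir = "model-artifacts/model"      # placeholder path

total_nodes = len(onnx.load(model_path).graph.node)

offloaded = 0
allowed_file = os.path.join(artifacts_dir, "allowedNode.txt")
if os.path.exists(allowed_file):
    with open(allowed_file) as f:
        offloaded = sum(1 for line in f if line.strip())

print(f"Total ONNX nodes        : {total_nodes}")
print(f"Offloaded to TIDL (C7x) : {offloaded}")
print(f"Remaining on Arm        : {total_nodes - offloaded}")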
Hi,
I think the above settings are the same.
However, we are unlikely to change our current SDK version, so the testing will stop here.
Thanks for your support.