Hi experts,
We analyzed our model using edge-AI tools on a TDA4VH (32 TOPS); the inference time is about 7.5 ms.
We also used the model analyzer from edge-AI studio, connected to an AM69A (32 TOPS).
However, the results differ a lot: there the inference only takes 2.40 ms.
What might be the cause of this difference?
For edge-AI tools, the sdk version is 09_00_00_06, and I heard that the sdk version of edge-AI studio is 8.06
Thanks.
Hi,
For edge-AI tools, the sdk version is 09_00_00_06, and I heard that the sdk version of edge-AI studio is 8.06
Please make sure that the reference point for the comparison is correct (same SDK version).
I believe there should be a way to select the SDK version in the edge-AI studio tool.
Thanks
Hi,
I asked this earlier, but unfortunately only SDK 8.6 is supported.
We assumed the inference times would be close, so that we could test our model on different SoCs and get inference times we can use as a reference.
But now it seems the inference time cannot be used as a reference.
Hi,
Thanks for the confirmation; yes, it seems the tool is not supporting the 9.0 SDK version yet.
We assumed that the inference time will be close,
No, this is not the case, as every SDK release ships updated C7x firmware and other related SDK components.
I hope the difference between the benchmark numbers is clear now, as it is rooted in the SDK version mismatch.
Is there anything else you need help with?
Thanks
Hi,
But the benchmark difference is large: for the same model, one inference time is 7.5 ms and the other is 2.4 ms.
Is the influence of the SDK version really this big?
Hi,
Could you please share how you are benchmarking the model on target? Which flow are you using here: OpenVX based, GStreamer based, or TIDL-RT via the RTOS SDK?
In case you are using TIDL-RT, can you share the inference config so I can take a look?
Hi,
We use OSRT for model inference in both cases.
https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/tidl_osr_debug.md
here's the config we use:
tensor_bits = 8
debug_level = 0
max_num_subgraphs = 16
accuracy_level = 1
calibration_frames = 3
calibration_iterations = 3
output_feature_16bit_names_list = "bbox_0, bbox_1, bbox_2" #"conv1_2, fire9/concat_1"
params_16bit_names_list = "" #"fire3/squeeze1x1_2"
mixed_precision_factor = -1
quantization_scale_type = 0
high_resolution_optimization = 0
pre_batchnorm_fold = 0
ti_internal_nc_flag = 1601
data_convert = 0
SOC = os.environ["SOC"]
if (quantization_scale_type == 3):
    data_convert = 0
#set to default accuracy_level 1
activation_clipping = 0
weight_clipping = 0
bias_calibration = 1
channel_wise_quantization = 0
tidl_tools_path = os.environ["TIDL_TOOLS_PATH"]
optional_options = {
    # "priority":0,
    # delay in ms
    # "max_pre_empt_delay":10
    "platform":"J7",
    "version":"7.2",
    "tensor_bits":tensor_bits,
    "debug_level":debug_level,
    "max_num_subgraphs":max_num_subgraphs,
    "deny_list":"", #"MaxPool"
    "deny_list:layer_type":"",
    "deny_list:layer_name":"",
    "model_type":"", #OD
    "accuracy_level":accuracy_level,
    "advanced_options:calibration_frames": calibration_frames,
    "advanced_options:calibration_iterations": calibration_iterations,
    "advanced_options:output_feature_16bit_names_list" : output_feature_16bit_names_list,
    "advanced_options:params_16bit_names_list" : params_16bit_names_list,
    "advanced_options:mixed_precision_factor" : mixed_precision_factor,
    "advanced_options:quantization_scale_type": quantization_scale_type,
    #"object_detection:meta_layers_names_list" : meta_layers_names_list, -- read from models_configs dictionary below
    #"object_detection:meta_arch_type" : meta_arch_type, -- read from models_configs dictionary below
    "advanced_options:high_resolution_optimization": high_resolution_optimization,
    "advanced_options:pre_batchnorm_fold" : pre_batchnorm_fold,
    "ti_internal_nc_flag" : ti_internal_nc_flag,
    # below options will be read only if accuracy_level = 9, else will be discarded; for accuracy_level = 0/1, these are preset internally
    "advanced_options:activation_clipping" : activation_clipping,
    "advanced_options:weight_clipping" : weight_clipping,
    "advanced_options:bias_calibration" : bias_calibration,
    "advanced_options:add_data_convert_ops" : data_convert,
    "advanced_options:channel_wise_quantization" : channel_wise_quantization,
    # Advanced options for SOC 'am69a'
    # "advanced_options:inference_mode" : inference_mode,
    # "advanced_options:num_cores" : num_cores
}
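For reference, a minimal sketch of how the timing can be measured through the OSRT onnxruntime flow with the TIDL execution provider; the model path, artifacts folder, input shape and dtype below are placeholders rather than our exact setup, and the option keys follow the edgeai-tidl-tools examples, so they may differ per SDK version:

# Minimal sketch of OSRT timing on target with the TIDL execution provider.
# Assumptions: compiled artifacts already exist, the model has one float32
# input, and the paths below are placeholders.
import time
import numpy as np
import onnxruntime as rt

delegate_options = {
    "artifacts_folder": "model-artifacts/model",   # placeholder path
    "tidl_tools_path": "/path/to/tidl_tools",      # placeholder path
}

sess = rt.InferenceSession(
    "model.onnx",                                  # placeholder model
    providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
    provider_options=[delegate_options, {}],
)

inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
data = np.random.rand(*shape).astype(np.float32)

# Warm up once, then average several runs so both flows are measured the same way.
sess.run(None, {inp.name: data})
start = time.time()
runs = 20
for _ in range(runs):
    sess.run(None, {inp.name: data})
print(f"Average inference time: {(time.time() - start) / runs * 1000:.2f} ms")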
Hi,
A comparison would only make sense if it is done with the same SDK, as it is hard to catch the nuances when the settings of both flows and the underlying SDK components are different.
Still, I suspect this could be because of a few reasons (I don't have full visibility of both flows at my end):
1. Different quantization style
1.1 8/16 bit, mixed precision, etc.
1.2 Calibration image set, calibration iterations, quantization style (symmetric vs asymmetric), other aspects of quantization
2. Arm offload
2.1 Check how many nodes are delegated to Arm and how many to the C7x; moreover, you can check the subgraphs (see the sketch at the end of this reply)
2.2 Are you opting for meta-arch support?
3. Advanced options
Please make sure the above-mentioned things, along with the remaining compilation options, are the same.
Did you try running the same model in the 8.6 OSRT flow? What was the benchmark in that case?
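For example, a rough way to sanity-check the Arm vs. C7x split after compilation; this assumes the compiled artifacts folder contains an allowedNode.txt with one entry per offloaded node, as produced by the edgeai-tidl-tools OSRT compile step, and the paths are placeholders, so adjust the parsing to your SDK version:

# Hedged sketch: compare total ONNX graph nodes vs. nodes offloaded to TIDL/C7x.
# Assumption: "allowedNode.txt" in the artifacts folder lists the offloaded
# nodes one per line (adjust if your SDK writes it differently).
import os
import onnx

model_path = "model.onnx"                    # placeholder path
artifacts_dir = "model-artifacts/model"      # placeholder path

total_nodes = len(onnx.load(model_path).graph.node)

offloaded = 0
allowed_file = os.path.join(artifacts_dir, "allowedNode.txt")
if os.path.exists(allowed_file):
    with open(allowed_file) as f:
        offloaded = sum(1 for line in f if line.strip())

print(f"Total ONNX nodes        : {total_nodes}")
print(f"Offloaded to TIDL (C7x) : {offloaded}")
print(f"Remaining on Arm        : {total_nodes - offloaded}")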
Hi,
I think the above settings are the same.
However, we are unlikely to change our current SDK version, so the testing will stop here.
Thanks for your support.