Hi,
My team and I have been using the edgeai-benchmark tool to compile a custom object detection model for use with the EdgeAI apps on the SK-TDA4VM board. The custom model is based on an efficientdet-lite0 checkpoint that has been finetuned on some of our own data. We have successfully compiled and run similar models pulled from the model zoo using the benchmark tool (e.g. efficientdet-lite0_bifpn_maxpool2x2_relu_ti-lite), and, like the efficientdet models in the model zoo, we changed our model's activation type to relu before we began finetuning.
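For context, the activation switch was done on the training side before finetuning; in the google/automl efficientdet codebase (assumed here purely for illustration), it amounts to something like:

    # sketch of the relu switch before finetuning (google/automl efficientdet
    # codebase assumed; act_type is the hparam controlling the activation there)
    import hparams_config  # from the automl/efficientdet repo

    config = hparams_config.get_efficientdet_config('efficientdet-lite0')
    config.act_type = 'relu'  # replace the default activation with plain relu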
After exporting our model to tflite format and following the instructions at github.com/.../custom_models.md, we created the following pipeline config, which uses coco as both the input and calibration dataset:
'od-xxxx': utils.dict_update(common_cfg,
    preprocess=preproc_transforms.get_transform_tflite((320, 320), (320, 320), backend='cv2'),
    session=tflite_session_type(
        **utils.dict_update(tflite_session_cfg,
            input_mean=(127.0, 127.0, 127.0),
            input_scale=(1.0/128.0, 1.0/128.0, 1.0/128.0)),
        runtime_options=utils.dict_update(runtime_options_tflite_np2, {
            'object_detection:meta_arch_type': 5,
            'object_detection:meta_layers_names_list': f'{settings.models_path}/d0-lite_mod.prototxt'}),
        model_path=f'{settings.models_path}/d0-lite_mod.tflite'),
    postprocess=postproc_transforms.get_transform_detection_tflite(
        normalized_detections=False, ignore_index=0, resize_with_pad=False),
    metric=dict(label_offset_pred=datasets.coco_det_label_offset_90to90(label_offset=0)),
    model_info=dict(metric_reference={'accuracy_ap[.5:.95]%': 31.57}))
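This entry lives in our copy of scripts/benchmark_custom.py as described in custom_models.md; simplified, it is wired up roughly like this ('od-xxxx' stands in for our actual key, and tools.run_accuracy is the entry point in the edgeai-benchmark version we are on):

    # simplified sketch of how the config entry is registered and run
    pipeline_configs = {
        'od-xxxx': utils.dict_update(common_cfg,
            # ...the preprocess/session/postprocess settings shown above...
        ),
    }
    tools.run_accuracy(settings, work_dir, pipeline_configs)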
The prototxt file we are using is mostly a copy of the model zoo prototxts; the only changes are to the top_k, detection threshold, and input dimension values. When attempting to compile the model with this config, we get the error:
DIM Error - For Tensor 0, Dim 1 is 0
This is followed by a segmentation fault. On other runs we get the same dimension error, but with Dim 1 reported as a large negative number instead of 0.
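For reference, the exported model's tensor shapes can be double-checked with the TFLite interpreter to see whether Dim 1 of the input is a real 320 or was exported as dynamic (0/-1); a minimal sketch, assuming plain tensorflow and the model file name from the config above:

    import tensorflow as tf

    # load the exported model and print every input/output tensor shape
    interpreter = tf.lite.Interpreter(model_path='d0-lite_mod.tflite')
    interpreter.allocate_tensors()
    for detail in interpreter.get_input_details():
        print('input :', detail['name'], detail['shape'])
    for detail in interpreter.get_output_details():
        print('output:', detail['name'], detail['shape'])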
In an effort to avoid the crash, we removed the meta layer names list option from the pipeline config above. With that change the script completed without crashing; however, it then auto-generated the following meta pipeline data:
TIDL Meta PipeLine (Proto) File :
Number of OD backbone nodes = 0
Size of odBackboneNodeIds = 0
Number of subgraphs:2 , 306 nodes delegated out of 363 nodes
This is incorrect for a variety of reasons (no OD backbone nodes were counted, two subgraphs were identified, and not all nodes were delegated), and, unsurprisingly, inference produced very poor results:
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Packaging the compiled model and running it on the board results in a very choppy framerate, with no objects ever detected. The same process with an efficientdet-lite1-based model performed similarly, except that it produced nothing but a large number of false positives during inference. These results occur regardless of the configuration in settings.yaml.
Is there something missing in our prototxt or pipeline config that would account for the dimension error or the poor inference results, or could the issue lie in our model and how the TIDL tools interpret it?
Thank you,
Andrew