Hello,
I've been trying to set up edgeai-modelmaker and run the example scripts, but I am running into dependency issues. I can create and activate the pyenv environment without any problems, and ./setup_all.sh in the edgeai-modelmaker repository (with yolov5 enabled) appears to complete successfully; the expected edgeai folders show up in the parent directory. Unfortunately, ./run_modelmaker.sh fails with every config I try, each for a different reason, and I am also hitting problems with edgeai-benchmark when running ./run_benchmarks_pc.sh. Could you please review the output below and help me figure out what is happening? I am following the instructions on the edgeai-modelmaker GitHub page exactly, and I have also retried on a fresh VM image with the same result.
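For reference, this is roughly the sequence of commands I am running (the AM62A target device below is my assumption based on the work_dirs paths in the benchmark output; please correct me if the expected invocation is different):

# environment: pyenv Python 3.6.15, created per the setup instructions
cd ~/edgeai-modelmaker
./setup_all.sh                                         # completes without errors, yolov5 enabled
./run_modelmaker.sh AM62A config_classification.yaml   # training process gets Killed
./run_modelmaker.sh AM62A config_detection.yaml        # ImportError: libcudart.so.11.0
cd ~/edgeai-benchmark
./run_benchmarks_pc.sh AM62A                           # hangs at the infer step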
Output from running with config_classification.yaml:
UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 1, which is smaller than what this
DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid
potential slowness/freeze if necessary.
cpuset_checked))
Creating model
=> The shape of the following weights did not match:
classifier.1.weight
classifier.1.bias
=> WARNING: weights could not be loaded completely.
Start training
./run_modelmaker.sh: line 73: 22019 Killed python ./scripts/run_modelmaker.py $2 --target_device $1
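The "Killed" above is the shell reporting that the training process was terminated, which I suspect is the kernel OOM killer on this VM (the DataLoader warning also suggests only 1 CPU is visible to the process, so it is a small VM). If it helps, something like the following on the VM should confirm whether memory is the problem; these are generic Linux checks, not anything from the modelmaker scripts:

# was the python training process killed by the kernel OOM killer?
sudo dmesg | grep -i -E 'out of memory|killed process'
# how much CPU/RAM does the VM actually have?
nproc
free -h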
Output from running with config_detection.yaml:
dataset split sizes are limited to: {'train': 393, 'val': 107}
loading annotations into memory...
Done (t=0.25s)
creating index...
index created!
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
Run params is at: /home/narayan/edgeai-modelmaker/data/projects/tiscapes2017_driving/run/20230713-140207/yolox_nano_lite/run.yaml
Traceback (most recent call last):
File "./scripts/run_modelmaker.py", line 140, in <module>
main(config)
File "./scripts/run_modelmaker.py", line 76, in main
model_runner.run()
File "/home/narayan/edgeai-modelmaker/edgeai_modelmaker/ai_modules/vision/runner.py", line 152, in run
self.model_training.run()
File "/home/narayan/edgeai-modelmaker/edgeai_modelmaker/ai_modules/vision/training/edgeai_mmdetection/detection.py", line 415, in run
__name__, force_import=True)
File "/home/narayan/edgeai-modelmaker/edgeai_modelmaker/utils/misc_utils.py", line 99, in import_file_or_folder
imported_module = importlib.import_module(basename, package_name or __name__)
File "/root/.pyenv/versions/3.6.15/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/narayan/edgeai-mmdetection/tools/train.py", line 16, in <module>
from mmdet.apis import init_random_seed, set_random_seed, train_detector
File "/home/narayan/edgeai-mmdetection/mmdet/apis/__init__.py", line 2, in <module>
from .inference import (async_inference_detector, inference_detector,
File "/home/narayan/edgeai-mmdetection/mmdet/apis/inference.py", line 7, in <module>
from mmcv.ops import RoIPool
File "/root/.pyenv/versions/py36/lib/python3.6/site-packages/mmcv/ops/__init__.py", line 2, in <module>
from .active_rotated_filter import active_rotated_filter
File "/root/.pyenv/versions/py36/lib/python3.6/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in <module>
['active_rotated_filter_forward', 'active_rotated_filter_backward'])
File "/root/.pyenv/versions/py36/lib/python3.6/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
ext = importlib.import_module('mmcv.' + name)
File "/root/.pyenv/versions/3.6.15/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
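The detection run dies at "from mmcv.ops import RoIPool" because libcudart.so.11.0 (the CUDA 11 runtime) cannot be found. My guess is that the mmcv/torch packages installed by setup_all.sh are CUDA builds while this VM has no CUDA toolkit or driver installed, but I have not confirmed that. These are the checks I can run and report back if useful (assuming the pyenv py36 environment is active):

# is a CUDA 11 runtime visible to the dynamic loader at all?
ldconfig -p | grep libcudart
# which torch/mmcv builds did setup_all.sh install, and do they expect CUDA?
pip list | grep -i -E 'torch|mmcv'
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"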
Output from running ./run_benchmarks_pc.sh in edgeai-benchmark:
download_ok: True
configs to run: ['kd-7040_onnxrt_coco_edgeai-yolov5_yolov5s6_pose_640_ti_lite_54p9_82p2_onnx', 'kd-7050_onnxrt_coco_edgeai-yolov5_yolov5s6_pose_640_ti_lite_54p9_82p2_onnx', 'kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx']
number of configs: 3
TASKS | | 0% 0/3| [< ]
INFO:20230713-164850: starting process on parallel_device - 0 0%| || 0/3 [00:00<?, ?it/s]
INFO:20230713-164856: starting - kd-7040_onnxrt_coco_edgeai-yolov5_yolov5s6_pose_640_ti_lite_54p9_82p2_onnx
INFO:20230713-164856: model_path - /home/narayan/edgeai-yolov5/pretrained_models/models/keypoint/coco/edgeai-yolov5/yolov5s6_pose_640_ti_lite_54p9_82p2.onnx
INFO:20230713-164856: model_file - /home/narayan/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/kd-7040_onnxrt_coco_edgeai-yolov5_yolov5s6_pose_640_ti_lite_54p9_82p2_onnx/model/yolov5s6_pose_640_ti_lite_54p9_82p2.onnx
Downloading 1/1: /home/narayan/edgeai-yolov5/pretrained_models/models/keypoint/coco/edgeai-yolov5/yolov5s6_pose_640_ti_lite_54p9_82p2.onnx
Downloading software-dl.ti.com/.../yolov5s6_pose_640_ti_lite_54p9_82p2.onnx to /home/narayan/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/kd-7040_onnxrt_coco_edgeai-yolov5_yolov5s6_pose_640_ti_lite_54p9_82p2_onnx/model/yolov5s6_pose_640_ti_lite_54p9_82p2.onnx
103481344it [00:14, 7271582.86it/s]
Download done for /home/narayan/edgeai-yolov5/pretrained_models/models/keypoint/coco/edgeai-yolov5/yolov5s6_pose_640_ti_lite_54p9_82p2.onnx
INFO:20230713-164940: starting process on parallel_device - 0
INFO:20230713-164945: starting - kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx
Downloading software-dl.ti.com/.../kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx.tar.gz to /home/narayan/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx.tar.gz
44916736it [00:06, 6948378.06it/s]
Extracting /home/narayan/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx.tar.gz to /home/narayan/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx
INFO:20230713-164953: model_path - /home/narayan/edgeai-modelzoo/models/vision/keypoint/coco/edgeai-yolox/yolox_s_pose_ti_lite_49p5_78p0.onnx
INFO:20230713-164953: model_file - /home/narayan/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx/model/yolox_s_pose_ti_lite_49p5_78p0.onnx
INFO:20230713-164953: running - kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx
INFO:20230713-164953: pipeline_config - {'task_type': 'human_pose_estimation', 'dataset_category': 'cocokpts', 'calibration_dataset': <edgeai_benchmark.datasets.coco_kpts.COCOKeypoints object at 0x7f5fdffaa110>, 'input_dataset': <edgeai_benchmark.datasets.coco_kpts.COCOKeypoints object at 0x7f5fdffaaf10>, 'postprocess': <edgeai_benchmark.postprocess.PostProcessTransforms object at 0x7f5fdfc16f50>, 'preprocess': <edgeai_benchmark.preprocess.PreProcessTransforms object at 0x7f5fc785cc10>, 'session': <edgeai_benchmark.sessions.onnxrt_session.ONNXRTSession object at 0x7f5fc7830050>, 'model_info': {'metric_reference': {'accuracy_ap[.5:.95]%': 49.5}, 'model_shortlist': 10}}
INFO:20230713-164953: infer - kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx - this may take some time...libtidl_onnxrt_EP loaded 0x7f5f753e07e0
^CProcess NoDaemonPoolWorker-4:
Traceback (most recent call last):
File "/root/.pyenv/versions/3.6.15/lib/python3.6/multiprocessing/pool.py", line 720, in next
item = self._items.popleft()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./scripts/benchmark_modelzoo.py", line 74, in <module>
tools.run_accuracy(settings, work_dir)
File "/home/narayan/edgeai-benchmark/edgeai_benchmark/tools/run_accuracy.py", line 88, in run_accuracy
pipeline_runner.run()
File "/home/narayan/edgeai-benchmark/edgeai_benchmark/pipelines/pipeline_runner.py", line 81, in run
return self._run_pipelines_parallel()
File "/home/narayan/edgeai-benchmark/edgeai_benchmark/pipelines/pipeline_runner.py", line 114, in _run_pipelines_parallel
results_list = parallel_exec.run()
File "/home/narayan/edgeai-benchmark/edgeai_benchmark/utils/parallel_run.py", line 87, in run
return self._run_parallel()
File "/home/narayan/edgeai-benchmark/edgeai_benchmark/utils/parallel_run.py", line 107, in _run_parallel
result = results_iterator.__next__(timeout=self.maxinterval)
File "/root/.pyenv/versions/3.6.15/lib/python3.6/multiprocessing/pool.py", line 724, in next
self._cond.wait(timeout)
File "/root/.pyenv/versions/3.6.15/lib/python3.6/threading.py", line 299, in wait
gotit = waiter.acquire(True, timeout)
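For the benchmark run, the ^C above is where I interrupted the process after it sat at the "infer" step for a long time with no visible progress; the IndexError/threading traceback is from that interrupt rather than a crash on its own. If environment details would help, I can collect them with something like:

# versions in the active pyenv environment (Python 3.6.15 per the paths above)
python --version
pip list | grep -i -E 'onnxruntime|torch|mmcv|numpy'
# VM resources and OS
nproc
free -h
cat /etc/os-release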