TDA4VM: EdgeAI-SDK run error: could not use cuda to train the example in EdgeAI-Modelmaker.

zhaoyanguo

Prodigy 60 points

Part Number: TDA4VM

Environment:

I am using:

Processor SDK Linux for TDA4VM, version 080600;

TexasInstruments/edgeai-modelmaker, master branch.

Problem:

When I try to run the Modelmaker example, the printed output is:

Run params is at: /home/ubuntu/data2/ZYG/EdgeAI/edgeai-modelmaker/data/projects/tiscapes2017_driving/run/20230427-103639/yolox_nano_lite/run.yaml
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.3/'
/home/ubuntu/data2/ZYG/EdgeAI/edgeai-mmdetection/mmdet/utils/setup_env.py:33: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
...
Python: 3.6.13 |Anaconda, Inc.| (default, Jun  4 2021, 14:25:59) [GCC 7.5.0]
CUDA available: False
GCC: gcc (GCC) 10.2.0
PyTorch: 1.10.0+cu113

At first I thought that my GPU and CUDA were not working, so I tested the GPU by running a deep learning test project:

As you can see, the GPU works well, and the result suggests that the CUDA environment is available!
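A separate test project may run in a different Python environment than Modelmaker, so a check along the following lines, run inside the Modelmaker environment, could show what PyTorch itself sees there. This is only a sketch: the helper name `cuda_report` is made up for illustration, and it assumes nothing about Modelmaker beyond the fact that it trains with PyTorch, as the log above indicates.

```python
import os


def cuda_report():
    """Collect basic facts about CUDA visibility in the current Python environment."""
    report = {
        # If this variable is set to an empty string, CUDA applications see no GPUs.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


for key, value in cuda_report().items():
    print("%s: %s" % (key, value))
```

If `cuda_available` is True here but the training log still reports `CUDA available: False`, the difference is more likely to come from how the training run is launched or configured than from the driver or the CUDA installation.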

I would like to know how to resolve this problem. Thanks!

over 2 years ago

+1 Pratik Kedar over 2 years ago

TI__Mastermind 24041 points

Hi,

As I can see from the logs you shared, you are building the yolox_nano model. Could you please share the config file with us?

Regards,

Pratik

0 zhaoyanguo over 2 years ago in reply to Pratik Kedar

Prodigy 60 points

This is my config file for the yolox_nano project:

common:
    target_module: 'vision'
    task_type: 'detection'
    target_device: 'TDA4VM'
    # run_name can be any string, but there are some special cases:
    # {date-time} will be replaced with datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    # {model_name} will be replaced with the name of the model
    run_name: '{date-time}/{model_name}'

dataset:
    # enable/disable dataset loading
    enable: True #False
    # max_num_files: [750, 250] #None

    # Object Detection Dataset Examples:
    # -------------------------------------
    # Example 1, (known datasets): 'widerface_detection', 'pascal_voc0712', 'coco_detection', 'udacity_selfdriving', 'tomato_detection', 'tiscapes2017_driving'
    # dataset_name: widerface_detection
    # -------------------------------------
    # Example 2, give a dataset name and input_data_path.
    # input_data_path could be a path to zip file, tar file, folder OR http, https link to zip or tar files
    # for input_data_path these are provided with this repository as examples:
    #    'http://software-dl.ti.com/jacinto7/esd/modelzoo/08_06_00_01/datasets/tiscapes2017_driving.zip'
    #    'http://software-dl.ti.com/jacinto7/esd/modelzoo/08_06_00_01/datasets/animal_detection.zip'
    # -------------------------------------
    # Example 3, give image folders with annotation files (require list with values for both train and val splits)
    # dataset_name: coco_detection
    # input_data_path: ["./data/projects/coco_detection/dataset/train2017",
    #                        "./data/projects/coco_detection/dataset/val2017"]
    # input_annotation_path: ["./data/projects/coco_detection/dataset/annotations/instances_train2017.json",
    #                        "./data/projects/coco_detection/dataset/annotations/instances_val2017.json"]
    # -------------------------------------
    dataset_name: tiscapes2017_driving
    input_data_path: 'http://software-dl.ti.com/jacinto7/esd/modelzoo/08_06_00_01/datasets/tiscapes2017_driving.zip'

training:
    # enable/disable training
    enable: True #False

    # Object Detection model chosen can be changed here if needed
    # options are: 'yolox_nano_lite', 'yolox_tiny_lite', 'yolox_s_lite'
    model_name: 'yolox_nano_lite'

    training_epochs: 15 #30
    # batch_size: 8 #32
    # learning_rate: 0.005
    num_gpus: 0 #0 #1 #4

compilation:
    # enable/disable compilation
    enable: True #False
    tensor_bits: 8 #16 #32

0 zhaoyanguo over 2 years ago in reply to Pratik Kedar

Prodigy 60 points

Thanks for pointing me to the config file. When I changed the parameter from num_gpus: 0 to num_gpus: 1, my GPU worked.

Thank you.
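For anyone who hits the same symptom, the mismatch found above (num_gpus set to 0 on a machine that has a working GPU) can be caught before training starts. The sketch below is not part of Modelmaker: `read_num_gpus`, `visible_cuda_devices` and `check_gpu_setting` are hypothetical helpers, and the config is scanned with a simple regular expression rather than a real YAML parser to keep the example dependency-free.

```python
import re


def read_num_gpus(config_text):
    """Return the value of the first uncommented `num_gpus:` key, or None."""
    for line in config_text.splitlines():
        # Drop trailing comments such as "#0 #1 #4" before matching.
        code = line.split("#", 1)[0]
        match = re.match(r"\s*num_gpus\s*:\s*(\d+)\s*$", code)
        if match:
            return int(match.group(1))
    return None


def visible_cuda_devices():
    """Number of CUDA devices PyTorch can see; 0 if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def check_gpu_setting(config_text, device_count):
    """Hypothetical helper: describe a mismatch between the config and the machine."""
    requested = read_num_gpus(config_text)
    if requested is None:
        return "num_gpus is not set in the config"
    if requested == 0 and device_count > 0:
        return ("num_gpus is 0 but %d CUDA device(s) are visible; "
                "training is expected to run without them" % device_count)
    if requested > device_count:
        return ("num_gpus is %d but only %d CUDA device(s) are visible"
                % (requested, device_count))
    return "ok"


example = """
training:
    model_name: 'yolox_nano_lite'
    num_gpus: 0 #0 #1 #4
"""
print(check_gpu_setting(example, device_count=1))
```

On a real machine, the second argument would come from `visible_cuda_devices()` and the text from the config file that was passed to Modelmaker. With the config posted in this thread and one visible GPU, the check reports the num_gpus: 0 mismatch.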
