TDA4VM: EdgeAI-SDK run error: cannot use CUDA to train the example in EdgeAI-Modelmaker

Part Number: TDA4VM

Environment:

I am using:

Processor SDK Linux for TDA4VM, version 080600;

edgeai-modelmaker: master branch.

Problem:

When I try to run the modelmaker example, the following is printed:

Run params is at: /home/ubuntu/data2/ZYG/EdgeAI/edgeai-modelmaker/data/projects/tiscapes2017_driving/run/20230427-103639/yolox_nano_lite/run.yaml
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.3/'
/home/ubuntu/data2/ZYG/EdgeAI/edgeai-mmdetection/mmdet/utils/setup_env.py:33: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
...
Python: 3.6.13 |Anaconda, Inc.| (default, Jun  4 2021, 14:25:59) [GCC 7.5.0]
CUDA available: False
GCC: gcc (GCC) 10.2.0
PyTorch: 1.10.0+cu113

At first I thought that my GPU and CUDA were not working, so I tested the GPU by running a deep learning test project:

As you can see, the GPU works well, and the result seems to tell me that the CUDA environment is available.

I want to know how to resolve this problem. Thanks!
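
For anyone hitting the same "CUDA available: False" line, a minimal sketch of a check that can be run in the same Python environment as the training is shown here. The helper name cuda_report is hypothetical and not part of edgeai-modelmaker. It only uses standard PyTorch calls, and it also prints CUDA_VISIBLE_DEVICES on the assumption that a launcher may have hidden the GPUs from the process.

```python
import os


def cuda_report():
    """Collect the facts that decide whether PyTorch can see a GPU.

    Hypothetical helper for illustration; not part of edgeai-modelmaker.
    """
    report = {
        # An empty string here hides every GPU from CUDA applications.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": False,
        "cuda_build": None,
        "cuda_available": False,
        "device_count": 0,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; report defaults.
        return report

    report["torch_installed"] = True
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If this reports that CUDA is available when run on its own but the training log still says it is not, the difference most likely comes from how the training process is launched or configured rather than from the driver installation.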

  • Hi,

    As I can see from your shared logs, you are building the yolox_nano model. Could you please share the config file with us?

    Regards,

    Pratik

  • This is my config file for the yolox_nano project:

    common:
        target_module: 'vision'
        task_type: 'detection'
        target_device: 'TDA4VM'
        # run_name can be any string, but there are some special cases:
        # {date-time} will be replaced with datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        # {model_name} will be replaced with the name of the model
        run_name: '{date-time}/{model_name}'
    
    dataset:
        # enable/disable dataset loading
        enable: True #False
        # max_num_files: [750, 250] #None
    
        # Object Detection Dataset Examples:
        # -------------------------------------
        # Example 1, (known datasets): 'widerface_detection', 'pascal_voc0712', 'coco_detection', 'udacity_selfdriving', 'tomato_detection', 'tiscapes2017_driving'
        # dataset_name: widerface_detection
        # -------------------------------------
        # Example 2, give a dataset name and input_data_path.
        # input_data_path could be a path to zip file, tar file, folder OR http, https link to zip or tar files
        # for input_data_path these are provided with this repository as examples:
        #    'http://software-dl.ti.com/jacinto7/esd/modelzoo/08_06_00_01/datasets/tiscapes2017_driving.zip'
        #    'http://software-dl.ti.com/jacinto7/esd/modelzoo/08_06_00_01/datasets/animal_detection.zip'
        # -------------------------------------
        # Example 3, give image folders with annotation files (require list with values for both train and val splits)
        # dataset_name: coco_detection
        # input_data_path: ["./data/projects/coco_detection/dataset/train2017",
        #                        "./data/projects/coco_detection/dataset/val2017"]
        # input_annotation_path: ["./data/projects/coco_detection/dataset/annotations/instances_train2017.json",
        #                        "./data/projects/coco_detection/dataset/annotations/instances_val2017.json"]
        # -------------------------------------
        dataset_name: tiscapes2017_driving
        input_data_path: 'http://software-dl.ti.com/jacinto7/esd/modelzoo/08_06_00_01/datasets/tiscapes2017_driving.zip'
    
    training:
        # enable/disable training
        enable: True #False
    
        # Object Detection model chosen can be changed here if needed
        # options are: 'yolox_nano_lite', 'yolox_tiny_lite', 'yolox_s_lite'
        model_name: 'yolox_nano_lite'
    
        training_epochs: 15 #30
        # batch_size: 8 #32
        # learning_rate: 0.005
        num_gpus: 0 #0 #1 #4
    
    compilation:
        # enable/disable compilation
        enable: True #False
        tensor_bits: 8 #16 #32
    

  • Thanks for reminding me about the config file. When I changed the parameter from num_gpus: 0 to num_gpus: 1, my GPU worked.

    Thank you.
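
The fix in this thread comes down to one value in the training section of the config. As a rough sketch, a pre-flight check like the one here could flag that setting before a long run starts. The function read_num_gpus and the sample text are hypothetical and only mimic the config posted in this thread. It uses a simple line match rather than a YAML parser, so it assumes num_gpus sits on its own uncommented line.

```python
import re


def read_num_gpus(config_text):
    """Return the integer after 'num_gpus:' on the first uncommented line.

    Hypothetical helper for illustration; returns None if no such line exists.
    """
    for line in config_text.splitlines():
        # re.match anchors at the start of the line, so lines that begin
        # with '#' (commented-out settings) are skipped.
        match = re.match(r"\s*num_gpus\s*:\s*(\d+)", line)
        if match:
            return int(match.group(1))
    return None


# Sample text modeled on the training section of the config in this thread.
sample = """
training:
    model_name: 'yolox_nano_lite'
    num_gpus: 0 #0 #1 #4
"""

num_gpus = read_num_gpus(sample)
if num_gpus == 0:
    print("num_gpus is 0: no GPU will be requested for training; "
          "set it to 1 or more to train on a GPU")
```

The trailing "#0 #1 #4" on the num_gpus line is a comment listing alternative values, so only the first number after the colon is read.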