segmentation demo error

Hi,

I am using version 0.17 of both caffe-jacinto and caffe-jacinto-models.

OS: 16.04

CUDA driver: 384.18

CUDA: 9.0

cuDNN: 7.6.5

libprotobuf: 2.6.1 (from protoc --version)

Running 'make all' and 'make pycaffe' in caffe-jacinto succeeded, and 'import caffe' works.

However, an error occurs when I run ./train_cityscapes_segmentation.sh:

Logging output to training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/train-log_2020-08-19_14-08-37.txt
training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial
num_gpus: 1  gpulist: ['0']
training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/l1reg
num_gpus: 1  gpulist: ['0']
training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/sparse
num_gpus: 1  gpulist: ['0']
training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/test
num_gpus: 1  gpulist: ['0']
training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/test_quantize
num_gpus: 1  gpulist: ['0']
I0819 14:08:39.731693 14605 caffe.cpp:902] This is NVCaffe 0.17.0 started at Wed Aug 19 14:08:39 2020
I0819 14:08:39.731797 14605 caffe.cpp:904] CuDNN version: 7605
I0819 14:08:39.731802 14605 caffe.cpp:905] CuBLAS version: 9000
I0819 14:08:39.731806 14605 caffe.cpp:906] CUDA version: 9000
I0819 14:08:39.731808 14605 caffe.cpp:907] CUDA driver version: 9000
I0819 14:08:39.731813 14605 caffe.cpp:908] Arguments:
[0]: /home/dl/works/caffe-jacinto/build/tools/caffe
[1]: train
[2]: --solver=training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial/solver.prototxt
[3]: --weights=../trained/image_classification/imagenet_jacintonet11v2/initial/imagenet_jacintonet11v2_iter_320000.caffemodel
[4]: --gpu
[5]: 0
I0819 14:08:39.753000 14605 gpu_memory.cpp:105] GPUMemory::Manager initialized
I0819 14:08:39.753576 14605 gpu_memory.cpp:107] Total memory: 4235001856, Free: 3848732672, dev_info[0]: total=4235001856 free=3848732672
I0819 14:08:39.753583 14605 caffe.cpp:226] Using GPUs 0
I0819 14:08:39.754065 14605 caffe.cpp:230] GPU 0: GeForce GTX 1050
I0819 14:08:39.754117 14605 solver.cpp:41] Solver data type: FLOAT
I0819 14:08:39.763103 14605 solver.cpp:44] Initializing solver from parameters:
train_net: "training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial/train.prototxt"
test_net: "training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial/test.prototxt"
test_iter: 500
test_interval: 2000
base_lr: 0.01
display: 100
max_iter: 120000
lr_policy: "multistep"
gamma: 0.1
power: 1
momentum: 0.9
weight_decay: 0.0001
snapshot: 10000
snapshot_prefix: "training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial/cityscapes5_jsegnet21v2"
solver_mode: GPU
device_id: 0
random_seed: 33
debug_info: false
train_state {
  level: 0
  stage: ""
}
snapshot_after_train: true
test_initialization: false
stepvalue: 60000
stepvalue: 90000
iter_size: 1
type: "SGD"
I0819 14:08:39.763200 14605 solver.cpp:76] Creating training net from train_net file: training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial/train.prototxt
I0819 14:08:39.763654 14605 net.cpp:457] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy/top1
I0819 14:08:39.763661 14605 net.cpp:457] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy/top5
I0819 14:08:39.763984 14605 net.cpp:80] Initializing net from parameters:
name: "jsegnet21v2_train"
state {
  phase: TRAIN
  level: 0
  stage: ""
}
layer {
  name: "data"
  type: "ImageLabelData"
  top: "data"
  top: "label"
  transform_param {
    mirror: true
    crop_size: 640
    mean_value: 0 

    ... (network structure omitted) ...
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "out_deconv_final_up8"
  bottom: "label"
  top: "loss"
  propagate_down: true
  propagate_down: false
  loss_param {
    ignore_label: 255
    normalization: VALID
  }
}
I0819 14:08:39.764248 14605 net.cpp:110] Using FLOAT as default forward math type
I0819 14:08:39.764269 14605 net.cpp:116] Using FLOAT as default backward math type
I0819 14:08:39.764276 14605 layer_factory.hpp:172] Creating layer 'data' of type 'ImageLabelData'
I0819 14:08:39.764281 14605 layer_factory.hpp:184] Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
I0819 14:08:39.764312 14605 net.cpp:200] Created Layer data (0)
I0819 14:08:39.764317 14605 net.cpp:542] data -> data
I0819 14:08:39.764336 14605 net.cpp:542] data -> label
I0819 14:08:39.764410 14605 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0819 14:08:39.764533 14605 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0819 14:08:39.764617 14605 data_reader.cpp:58] Data Reader threads: 1, out queues: 1, depth: 16
I0819 14:08:39.765130 14619 blocking_queue.cpp:40] Data layer prefetch queue empty
I0819 14:08:39.765652 14605 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0819 14:08:39.766284 14621 db_lmdb.cpp:36] Opened lmdb data/train-image-lmdb
*** Aborted at 1597817319 (unix time) try "date -d @1597817319" if you are using GNU date ***
PC: @     0x7f5d11183f88 caffe::C2TensorProtos::MergePartialFromCodedStream()
*** SIGSEGV (@0x0) received by PID 14605 (TID 0x7f5c8b7fe700) from PID 0; stack trace: ***
    @     0x7f5d0edb54c0 (unknown)
    @     0x7f5d11183f88 caffe::C2TensorProtos::MergePartialFromCodedStream()
    @     0x7f5d10134049 google::protobuf::MessageLite::ParseFromArray()
    @     0x7f5d1120722e caffe::DataReader<>::CursorManager::fetch()
    @     0x7f5d1120f370 caffe::DataReader<>::InternalThreadEntryN()
    @     0x7f5d111a9650 caffe::InternalThread::entry()
    @     0x7f5d111ab6db boost::detail::thread_data<>::run()
    @     0x7f5d059655d5 (unknown)
    @     0x7f5cee9706ba start_thread
    @     0x7f5d0ee874dd clone
    @                0x0 (unknown)
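
The trace shows the crash happening in the data reader thread, inside ParseFromArray, right after data/train-image-lmdb is opened. One thing worth checking in that situation is whether the LMDB actually holds as many records as the image list it was built from. The following is a minimal sketch of such a check, not part of caffe-jacinto. It assumes the Python `lmdb` package is installed, that the list file has one image path per line, and that the conversion writes one record per image; the function names are hypothetical.

```python
def count_list_entries(list_file):
    """Count the non-empty lines (one image path each) in a list file."""
    with open(list_file) as f:
        return sum(1 for line in f if line.strip())


def count_lmdb_entries(lmdb_dir):
    """Return the number of records stored in an LMDB directory."""
    import lmdb  # third-party package, imported lazily (pip install lmdb)

    # Open read-only and without locking so the database is not modified.
    env = lmdb.open(lmdb_dir, readonly=True, lock=False)
    try:
        return env.stat()["entries"]
    finally:
        env.close()


def check_conversion(list_file, lmdb_dir):
    """Compare the list length with the LMDB record count.

    Assumption: the conversion script writes exactly one record per image.
    Returns (expected, actual, matches).
    """
    expected = count_list_entries(list_file)
    actual = count_lmdb_entries(lmdb_dir)
    return expected, actual, expected == actual


# Example call, using the paths that appear in the logs of this thread:
# print(check_conversion("data/train-image-list.txt", "data/train-image-lmdb"))
```

A mismatch between the two counts would suggest that the conversion stopped early and that the LMDB should be regenerated before training.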

  • I really hope to get your help.

  • I have found the cause of the error: the conversion of the training set to LMDB did not complete. The conversion script was killed partway through, as the output below shows:

    2534 /home/dl/works/Dataset/cityscapes/leftImg8bit/train/zurich/zurich_000114_000019_leftImg8bit.png (1024, 2048, 3) (3, 1024, 2048)
    2535 /home/dl/works/Data./tools/utils/create_cityscapes_segmentation_lmdb.sh: line 22: 3365 Killed ./tools/utils/create_segmentation_image_lmdb.py --rand_seed 1 --shuffle --list_file=data/train-image-list.txt --output_dir="data/train-image-lmdb"
    Starting...
    Namespace(height=None, image_dir=None, label=False, label_dict=None, list_file='data/val-image-list.txt', output_dir='data/val-image-lmdb', rand_seed=1, search_string='*.png', shuffle=True, width=None)
    Reading image list file...done
    0 /home/dl/works/Dataset/cityscapes/leftImg8bit/val/lindau/lindau_000037_000019_leftImg8bit.png (1024, 2048, 3) (3, 1024, 2048)
    1 /home/dl/works/Dataset/cityscapes/leftImg8bit/val/munster/munster_000014_000019_leftImg8bit.png (1024, 2048, 3) (3, 1024, 2048)
    2 /home/dl/works/Dataset/cityscapes/leftImg8bit/val/frankfurt/frankfurt_000000_021879_leftImg8bit.png (1024, 2048, 3) (3, 1024, 2048)
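
In the output above, the shell script carries on with the validation set after the training conversion was killed, so the failure is easy to miss. A minimal sketch of a wrapper that stops at the first failed step is shown below. It is only an illustration: `describe_exit` and `run_step` are hypothetical helpers, and the command in the usage comment is copied from the log. A "Killed" message often means the process received SIGKILL, for example from the Linux out-of-memory killer, but the log here does not confirm the reason.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means "terminated by signal -returncode".
        return "killed by " + signal.Signals(-returncode).name
    return "failed with exit code %d" % returncode


def run_step(cmd):
    """Run one conversion step and raise if it did not finish cleanly."""
    result = subprocess.run(cmd)
    status = describe_exit(result.returncode)
    if status != "ok":
        raise RuntimeError("%s: %s" % (" ".join(cmd), status))


# Example call, with the arguments that appear in the log:
# run_step(["./tools/utils/create_segmentation_image_lmdb.py",
#           "--rand_seed", "1", "--shuffle",
#           "--list_file=data/train-image-list.txt",
#           "--output_dir=data/train-image-lmdb"])
```

With a wrapper like this, a killed training conversion would raise an error immediately instead of letting training start on an incomplete LMDB.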

  • Hi Jiandong Gao,

    Thanks for the update.

    Can this thread be closed, or do you have any further questions?

    Regards,

    Praveen