Hi,
I am using version 0.17 of both caffe-jacinto and caffe-jacinto-models.
OS: 16.04
CUDA driver: 384.18
CUDA: 9.0
cuDNN: 7.6.5
libprotobuf: 2.6.1 (from protoc --version)
Running 'make all' and 'make pycaffe' in caffe-jacinto succeeded, and 'import caffe' works in Python.
However, running ./train_cityscapes_segmentation.sh crashes with a SIGSEGV right after the training LMDB is opened. The full output is below.
Logging output to training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/train-log_2020-08-19_14-08-37.txt
training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial
num_gpus: 1 gpulist: ['0']
training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/l1reg
num_gpus: 1 gpulist: ['0']
training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/sparse
num_gpus: 1 gpulist: ['0']
training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/test
num_gpus: 1 gpulist: ['0']
training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/test_quantize
num_gpus: 1 gpulist: ['0']
I0819 14:08:39.731693 14605 caffe.cpp:902] This is NVCaffe 0.17.0 started at Wed Aug 19 14:08:39 2020
I0819 14:08:39.731797 14605 caffe.cpp:904] CuDNN version: 7605
I0819 14:08:39.731802 14605 caffe.cpp:905] CuBLAS version: 9000
I0819 14:08:39.731806 14605 caffe.cpp:906] CUDA version: 9000
I0819 14:08:39.731808 14605 caffe.cpp:907] CUDA driver version: 9000
I0819 14:08:39.731813 14605 caffe.cpp:908] Arguments:
[0]: /home/dl/works/caffe-jacinto/build/tools/caffe
[1]: train
[2]: --solver=training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial/solver.prototxt
[3]: --weights=../trained/image_classification/imagenet_jacintonet11v2/initial/imagenet_jacintonet11v2_iter_320000.caffemodel
[4]: --gpu
[5]: 0
I0819 14:08:39.753000 14605 gpu_memory.cpp:105] GPUMemory::Manager initialized
I0819 14:08:39.753576 14605 gpu_memory.cpp:107] Total memory: 4235001856, Free: 3848732672, dev_info[0]: total=4235001856 free=3848732672
I0819 14:08:39.753583 14605 caffe.cpp:226] Using GPUs 0
I0819 14:08:39.754065 14605 caffe.cpp:230] GPU 0: GeForce GTX 1050
I0819 14:08:39.754117 14605 solver.cpp:41] Solver data type: FLOAT
I0819 14:08:39.763103 14605 solver.cpp:44] Initializing solver from parameters:
train_net: "training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial/train.prototxt"
test_net: "training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial/test.prototxt"
test_iter: 500
test_interval: 2000
base_lr: 0.01
display: 100
max_iter: 120000
lr_policy: "multistep"
gamma: 0.1
power: 1
momentum: 0.9
weight_decay: 0.0001
snapshot: 10000
snapshot_prefix: "training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial/cityscapes5_jsegnet21v2"
solver_mode: GPU
device_id: 0
random_seed: 33
debug_info: false
train_state {
level: 0
stage: ""
}
snapshot_after_train: true
test_initialization: false
stepvalue: 60000
stepvalue: 90000
iter_size: 1
type: "SGD"
I0819 14:08:39.763200 14605 solver.cpp:76] Creating training net from train_net file: training/cityscapes5_jsegnet21v2_2020-08-19_14-08-37/initial/train.prototxt
I0819 14:08:39.763654 14605 net.cpp:457] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy/top1
I0819 14:08:39.763661 14605 net.cpp:457] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy/top5
I0819 14:08:39.763984 14605 net.cpp:80] Initializing net from parameters:
name: "jsegnet21v2_train"
state {
phase: TRAIN
level: 0
stage: ""
}
layer {
name: "data"
type: "ImageLabelData"
top: "data"
top: "label"
transform_param {
mirror: true
crop_size: 640
mean_value: 0
### network structure
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "out_deconv_final_up8"
bottom: "label"
top: "loss"
propagate_down: true
propagate_down: false
loss_param {
ignore_label: 255
normalization: VALID
}
}
I0819 14:08:39.764248 14605 net.cpp:110] Using FLOAT as default forward math type
I0819 14:08:39.764269 14605 net.cpp:116] Using FLOAT as default backward math type
I0819 14:08:39.764276 14605 layer_factory.hpp:172] Creating layer 'data' of type 'ImageLabelData'
I0819 14:08:39.764281 14605 layer_factory.hpp:184] Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
I0819 14:08:39.764312 14605 net.cpp:200] Created Layer data (0)
I0819 14:08:39.764317 14605 net.cpp:542] data -> data
I0819 14:08:39.764336 14605 net.cpp:542] data -> label
I0819 14:08:39.764410 14605 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0819 14:08:39.764533 14605 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0819 14:08:39.764617 14605 data_reader.cpp:58] Data Reader threads: 1, out queues: 1, depth: 16
I0819 14:08:39.765130 14619 blocking_queue.cpp:40] Data layer prefetch queue empty
I0819 14:08:39.765652 14605 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0819 14:08:39.766284 14621 db_lmdb.cpp:36] Opened lmdb data/train-image-lmdb
*** Aborted at 1597817319 (unix time) try "date -d @1597817319" if you are using GNU date ***
PC: @ 0x7f5d11183f88 caffe::C2TensorProtos::MergePartialFromCodedStream()
*** SIGSEGV (@0x0) received by PID 14605 (TID 0x7f5c8b7fe700) from PID 0; stack trace: ***
@ 0x7f5d0edb54c0 (unknown)
@ 0x7f5d11183f88 caffe::C2TensorProtos::MergePartialFromCodedStream()
@ 0x7f5d10134049 google::protobuf::MessageLite::ParseFromArray()
@ 0x7f5d1120722e caffe::DataReader<>::CursorManager::fetch()
@ 0x7f5d1120f370 caffe::DataReader<>::InternalThreadEntryN()
@ 0x7f5d111a9650 caffe::InternalThread::entry()
@ 0x7f5d111ab6db boost::detail::thread_data<>::run()
@ 0x7f5d059655d5 (unknown)
@ 0x7f5cee9706ba start_thread
@ 0x7f5d0ee874dd clone
@ 0x0 (unknown)