
TDA2x

Hi,

I followed the documentation for the semantic segmentation example, but when I run './train_cityscapes_segmentation.sh', training crashes with the following log:

Logging output to training/cityscapes5_jsegnet21v2_2020-08-17_11-30-06/train-log_2020-08-17_11-30-06.txt

training/cityscapes5_jsegnet21v2_2020-08-17_11-30-06/initial
num_gpus: 1 gpulist: ['0']
I0817 11:30:08.093606 25692 caffe.cpp:902] This is NVCaffe 0.17.0 started at Mon Aug 17 11:30:07 2020
I0817 11:30:08.093741 25692 caffe.cpp:904] CuDNN version: 7605
I0817 11:30:08.093746 25692 caffe.cpp:905] CuBLAS version: 9000
I0817 11:30:08.093749 25692 caffe.cpp:906] CUDA version: 9000
I0817 11:30:08.093750 25692 caffe.cpp:907] CUDA driver version: 10000
I0817 11:30:08.093755 25692 caffe.cpp:908] Arguments:
[0]: /home/jiandong/Project/caffe-jacinto/build/tools/caffe
[1]: train
[2]: --solver=training/cityscapes5_jsegnet21v2_2020-08-17_11-30-06/initial/solver.prototxt
[3]: --weights=../trained/image_classification/imagenet_jacintonet11v2/initial/imagenet_jacintonet11v2_iter_320000.caffemodel
[4]: --gpu
[5]: 0
I0817 11:30:08.114403 25692 gpu_memory.cpp:105] GPUMemory::Manager initialized
I0817 11:30:08.114796 25692 gpu_memory.cpp:107] Total memory: 4236312576, Free: 3315728384, dev_info[0]: total=4236312576 free=3315728384
I0817 11:30:08.114804 25692 caffe.cpp:226] Using GPUs 0
I0817 11:30:08.115073 25692 caffe.cpp:230] GPU 0: GeForce GTX 1050 Ti
I0817 11:30:08.115137 25692 solver.cpp:41] Solver data type: FLOAT
I0817 11:30:08.121004 25692 solver.cpp:44] Initializing solver from parameters:
train_net: "training/cityscapes5_jsegnet21v2_2020-08-17_11-30-06/initial/train.prototxt"
test_net: "training/cityscapes5_jsegnet21v2_2020-08-17_11-30-06/initial/test.prototxt"
test_iter: 500
test_interval: 2000
base_lr: 0.01
display: 100
max_iter: 120000
lr_policy: "multistep"
gamma: 0.1
power: 1
momentum: 0.9
weight_decay: 0.0001
snapshot: 10000
snapshot_prefix: "training/cityscapes5_jsegnet21v2_2020-08-17_11-30-06/initial/cityscapes5_jsegnet21v2"
solver_mode: GPU
device_id: 0
random_seed: 33
debug_info: false
train_state {
level: 0
stage: ""
}
snapshot_after_train: true
test_initialization: false
stepvalue: 60000
stepvalue: 90000
iter_size: 4
type: "SGD"
I0817 11:30:08.121160 25692 solver.cpp:76] Creating training net from train_net file: training/cityscapes5_jsegnet21v2_2020-08-17_11-30-06/initial/train.prototxt
I0817 11:30:08.121774 25692 net.cpp:457] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy/top1
I0817 11:30:08.121781 25692 net.cpp:457] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy/top5
I0817 11:30:08.122068 25692 net.cpp:80] Initializing net from parameters:
name: "jsegnet21v2_train"
state {
phase: TRAIN
level: 0
stage: ""
}
layer {
name: "data"
type: "ImageLabelData"
top: "data"
top: "label"
transform_param {
mirror: true
crop_size: 640
mean_value: 0
}
image_label_data_param {
image_list_path: "data/train-image-lmdb"
label_list_path: "data/train-label-lmdb"
batch_size: 4
shuffle: true
threads: 1
backend: LMDB
}
}
layer {
name: "data/bias"
type: "Bias"
bottom: "data"
top: "data/bias"
param {
lr_mult: 0
decay_mult: 0
}
bias_param {
filler {
type: "constant"
value: -128
}
}
}
layer {
name: "conv1a"
type: "Convolution"
bottom: "data/bias"
top: "conv1a"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
bias_term: true
pad: 2
kernel_size: 5
group: 1
stride: 2
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 1
}
}
layer {
name: "conv1a/bn"
type: "BatchNorm"
bottom: "conv1a"
top: "conv1a"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "conv1a/relu"
type: "ReLU"
bottom: "conv1a"
top: "conv1a"
}
layer {
name: "conv1b"
type: "Convolution"
bottom: "conv1a"
top: "conv1b"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
bias_term: true
pad: 1
kernel_size: 3
group: 4
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 1
}
}
layer {
name: "conv1b/bn"
type: "BatchNorm"
bottom: "conv1b"
top: "conv1b"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "conv1b/relu"
type: "ReLU"
bottom: "conv1b"
top: "conv1b"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1b"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "res2a_branch2a"
type: "Convolution"
bottom: "pool1"
top: "res2a_branch2a"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
bias_term: true
pad: 1
kernel_size: 3
group: 1
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 1
}
}
layer {
name: "res2a_branch2a/bn"
type: "BatchNorm"
bottom: "res2a_branch2a"
top: "res2a_branch2a"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "res2a_branch2a/relu"
type: "ReLU"
bottom: "res2a_branch2a"
top: "res2a_branch2a"
}
layer {
name: "res2a_branch2b"
type: "Convolution"
bottom: "res2a_branch2a"
top: "res2a_branch2b"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
bias_term: true
pad: 1
kernel_size: 3
group: 4
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 1
}
}
layer {
name: "res2a_branch2b/bn"
type: "BatchNorm"
bottom: "res2a_branch2b"
top: "res2a_branch2b"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "res2a_branch2b/relu"
type: "ReLU"
bottom: "res2a_branch2b"
top: "res2a_branch2b"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "res2a_branch2b"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "res3a_branch2a"
type: "Convolution"
bottom: "pool2"
top: "res3a_branch2a"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
bias_term: true
pad: 1
kernel_size: 3
group: 1
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 1
}
}
layer {
name: "res3a_branch2a/bn"
type: "BatchNorm"
bottom: "res3a_branch2a"
top: "res3a_branch2a"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "res3a_branch2a/relu"
type: "ReLU"
bottom: "res3a_branch2a"
top: "res3a_branch2a"
}
layer {
name: "res3a_branch2b"
type: "Convolution"
bottom: "res3a_branch2a"
top: "res3a_branch2b"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
bias_term: true
pad: 1
kernel_size: 3
group: 4
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 1
}
}
layer {
name: "res3a_branch2b/bn"
type: "BatchNorm"
bottom: "res3a_branch2b"
top: "res3a_branch2b"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "res3a_branch2b/relu"
type: "ReLU"
bottom: "res3a_branch2b"
top: "res3a_branch2b"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "res3a_branch2b"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "res4a_branch2a"
type: "Convolution"
bottom: "pool3"
top: "res4a_branch2a"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
bias_term: true
pad: 1
kernel_size: 3
group: 1
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 1
}
}
layer {
name: "res4a_branch2a/bn"
type: "BatchNorm"
bottom: "res4a_branch2a"
top: "res4a_branch2a"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "res4a_branch2a/relu"
type: "ReLU"
bottom: "res4a_branch2a"
top: "res4a_branch2a"
}
layer {
name: "res4a_branch2b"
type: "Convolution"
bottom: "res4a_branch2a"
top: "res4a_branch2b"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
bias_term: true
pad: 1
kernel_size: 3
group: 4
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 1
}
}
layer {
name: "res4a_branch2b/bn"
type: "BatchNorm"
bottom: "res4a_branch2b"
top: "res4a_branch2b"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "res4a_branch2b/relu"
type: "ReLU"
bottom: "res4a_branch2b"
top: "res4a_branch2b"
}
layer {
name: "pool4"
type: "Pooling"
bottom: "res4a_branch2b"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 1
stride: 1
}
}
layer {
name: "res5a_branch2a"
type: "Convolution"
bottom: "pool4"
top: "res5a_branch2a"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
bias_term: true
pad: 2
kernel_size: 3
group: 1
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 2
}
}
layer {
name: "res5a_branch2a/bn"
type: "BatchNorm"
bottom: "res5a_branch2a"
top: "res5a_branch2a"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "res5a_branch2a/relu"
type: "ReLU"
bottom: "res5a_branch2a"
top: "res5a_branch2a"
}
layer {
name: "res5a_branch2b"
type: "Convolution"
bottom: "res5a_branch2a"
top: "res5a_branch2b"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
bias_term: true
pad: 2
kernel_size: 3
group: 4
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 2
}
}
layer {
name: "res5a_branch2b/bn"
type: "BatchNorm"
bottom: "res5a_branch2b"
top: "res5a_branch2b"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "res5a_branch2b/relu"
type: "ReLU"
bottom: "res5a_branch2b"
top: "res5a_branch2b"
}
layer {
name: "out5a"
type: "Convolution"
bottom: "res5a_branch2b"
top: "out5a"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
bias_term: true
pad: 4
kernel_size: 3
group: 2
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 4
}
}
layer {
name: "out5a/bn"
type: "BatchNorm"
bottom: "out5a"
top: "out5a"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "out5a/relu"
type: "ReLU"
bottom: "out5a"
top: "out5a"
}
layer {
name: "out5a_up2"
type: "Deconvolution"
bottom: "out5a"
top: "out5a_up2"
param {
lr_mult: 0
decay_mult: 0
}
convolution_param {
num_output: 64
bias_term: false
pad: 1
kernel_size: 4
group: 64
stride: 2
weight_filler {
type: "bilinear"
}
}
}
layer {
name: "out3a"
type: "Convolution"
bottom: "res3a_branch2b"
top: "out3a"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
bias_term: true
pad: 1
kernel_size: 3
group: 2
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 1
}
}
layer {
name: "out3a/bn"
type: "BatchNorm"
bottom: "out3a"
top: "out3a"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "out3a/relu"
type: "ReLU"
bottom: "out3a"
top: "out3a"
}
layer {
name: "out3_out5_combined"
type: "Eltwise"
bottom: "out5a_up2"
bottom: "out3a"
top: "out3_out5_combined"
}
layer {
name: "ctx_conv1"
type: "Convolution"
bottom: "out3_out5_combined"
top: "ctx_conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
bias_term: true
pad: 1
kernel_size: 3
group: 1
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 1
}
}
layer {
name: "ctx_conv1/bn"
type: "BatchNorm"
bottom: "ctx_conv1"
top: "ctx_conv1"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "ctx_conv1/relu"
type: "ReLU"
bottom: "ctx_conv1"
top: "ctx_conv1"
}
layer {
name: "ctx_conv2"
type: "Convolution"
bottom: "ctx_conv1"
top: "ctx_conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
bias_term: true
pad: 4
kernel_size: 3
group: 1
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 4
}
}
layer {
name: "ctx_conv2/bn"
type: "BatchNorm"
bottom: "ctx_conv2"
top: "ctx_conv2"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "ctx_conv2/relu"
type: "ReLU"
bottom: "ctx_conv2"
top: "ctx_conv2"
}
layer {
name: "ctx_conv3"
type: "Convolution"
bottom: "ctx_conv2"
top: "ctx_conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
bias_term: true
pad: 4
kernel_size: 3
group: 1
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 4
}
}
layer {
name: "ctx_conv3/bn"
type: "BatchNorm"
bottom: "ctx_conv3"
top: "ctx_conv3"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "ctx_conv3/relu"
type: "ReLU"
bottom: "ctx_conv3"
top: "ctx_conv3"
}
layer {
name: "ctx_conv4"
type: "Convolution"
bottom: "ctx_conv3"
top: "ctx_conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
bias_term: true
pad: 4
kernel_size: 3
group: 1
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 4
}
}
layer {
name: "ctx_conv4/bn"
type: "BatchNorm"
bottom: "ctx_conv4"
top: "ctx_conv4"
batch_norm_param {
moving_average_fraction: 0.99
eps: 0.0001
scale_bias: true
}
}
layer {
name: "ctx_conv4/relu"
type: "ReLU"
bottom: "ctx_conv4"
top: "ctx_conv4"
}
layer {
name: "ctx_final"
type: "Convolution"
bottom: "ctx_conv4"
top: "ctx_final"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 8
bias_term: true
pad: 1
kernel_size: 3
group: 1
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0
}
dilation: 1
}
}
layer {
name: "ctx_final/relu"
type: "ReLU"
bottom: "ctx_final"
top: "ctx_final"
}
layer {
name: "out_deconv_final_up2"
type: "Deconvolution"
bottom: "ctx_final"
top: "out_deconv_final_up2"
param {
lr_mult: 0
decay_mult: 0
}
convolution_param {
num_output: 8
bias_term: false
pad: 1
kernel_size: 4
group: 8
stride: 2
weight_filler {
type: "bilinear"
}
}
}
layer {
name: "out_deconv_final_up4"
type: "Deconvolution"
bottom: "out_deconv_final_up2"
top: "out_deconv_final_up4"
param {
lr_mult: 0
decay_mult: 0
}
convolution_param {
num_output: 8
bias_term: false
pad: 1
kernel_size: 4
group: 8
stride: 2
weight_filler {
type: "bilinear"
}
}
}
layer {
name: "out_deconv_final_up8"
type: "Deconvolution"
bottom: "out_deconv_final_up4"
top: "out_deconv_final_up8"
param {
lr_mult: 0
decay_mult: 0
}
convolution_param {
num_output: 8
bias_term: false
pad: 1
kernel_size: 4
group: 8
stride: 2
weight_filler {
type: "bilinear"
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "out_deconv_final_up8"
bottom: "label"
top: "loss"
propagate_down: true
propagate_down: false
loss_param {
ignore_label: 255
normalization: VALID
}
}
I0817 11:30:08.122395 25692 net.cpp:110] Using FLOAT as default forward math type
I0817 11:30:08.122406 25692 net.cpp:116] Using FLOAT as default backward math type
I0817 11:30:08.122413 25692 layer_factory.hpp:172] Creating layer 'data' of type 'ImageLabelData'
I0817 11:30:08.122421 25692 layer_factory.hpp:184] Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
I0817 11:30:08.122437 25692 net.cpp:200] Created Layer data (0)
I0817 11:30:08.122442 25692 net.cpp:542] data -> data
I0817 11:30:08.122463 25692 net.cpp:542] data -> label
I0817 11:30:08.122532 25692 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0817 11:30:08.123039 25692 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0817 11:30:08.123042 25712 blocking_queue.cpp:40] Data layer prefetch queue empty
I0817 11:30:08.123098 25692 data_reader.cpp:58] Data Reader threads: 1, out queues: 1, depth: 4
I0817 11:30:08.123612 25692 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0817 11:30:08.124177 25714 db_lmdb.cpp:36] Opened lmdb data/train-image-lmdb
*** Aborted at 1597635008 (unix time) try "date -d @1597635008" if you are using GNU date ***
PC: @ 0x7fa63a96cecf caffe::C2TensorProtos::MergePartialFromCodedStream()
*** SIGSEGV (@0x0) received by PID 25692 (TID 0x7fa5a2263700) from PID 0; stack trace: ***
@ 0x7fa63844a4c0 (unknown)
@ 0x7fa63a96cecf caffe::C2TensorProtos::MergePartialFromCodedStream()
@ 0x7fa63985d892 google::protobuf::MessageLite::ParseFromArray()
@ 0x7fa63a9aa4e1 caffe::DataReader<>::CursorManager::fetch()
@ 0x7fa63a9b71f0 caffe::DataReader<>::InternalThreadEntryN()
@ 0x7fa63ae3f640 caffe::InternalThread::entry()
@ 0x7fa63ae415db boost::detail::thread_data<>::run()
@ 0x7fa62e8045d5 (unknown)
@ 0x7fa61780f6ba start_thread
@ 0x7fa63851c4dd clone

  • Hi 

    caffe-jacinto has not been updated recently, so it may not work with the latest drivers. A couple of details would help narrow down why this crash is occurring:

    (1) What is the OS?

    (2) Which version of Python?

    (3) What is your CUDA version?

    (4) What is your cuDNN version?

  • (1) OS: Ubuntu 16.04

    (2) Python 2.7

    (3) CUDA 9

    (4) cuDNN 7

    (5) CUDA driver 418.78 (I fixed it in the Makefile)

    I0817 11:30:08.093741 25692 caffe.cpp:904] CuDNN version: 7605
    I0817 11:30:08.093746 25692 caffe.cpp:905] CuBLAS version: 9000
    I0817 11:30:08.093749 25692 caffe.cpp:906] CUDA version: 9000
    I0817 11:30:08.093750 25692 caffe.cpp:907] CUDA driver version: 10000

    I only want to run the segmentation demo on my PC and then deploy on TDA2x. Which branch should I use, 0.17 or master?

  • There may be some incompatibility between your driver versions. I am not knowledgeable enough to guess the reason for the issue just by looking at these parameters. I would recommend trying a few versions and seeing whether the issue goes away.

    But do you know why the CUDA version and CUDA driver version are reported to be different in the log? Does it indicate some issue with your drivers?

    I0817 11:30:08.093749 25692 caffe.cpp:906] CUDA version: 9000
    I0817 11:30:08.093750 25692 caffe.cpp:907] CUDA driver version: 10000
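    For what it's worth, one quick way to compare the toolkit version nvcc reports against the installed driver is a small shell check (a sketch only; the /usr/local/cuda path and the presence of nvidia-smi are assumptions about a typical Linux install):

    ```shell
    # Report the CUDA toolkit release (from nvcc) and the driver version
    # (from nvidia-smi), guarding against either tool being absent.
    check_cuda_versions() {
        if [ -x /usr/local/cuda/bin/nvcc ]; then
            /usr/local/cuda/bin/nvcc --version | grep -o 'release [0-9.]*'
        else
            echo "toolkit: nvcc not found under /usr/local/cuda"
        fi
        if command -v nvidia-smi >/dev/null 2>&1; then
            echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
        else
            echo "driver: nvidia-smi not found"
        fi
    }
    check_cuda_versions
    ```

    A driver belonging to a newer CUDA release than the toolkit (driver 10000 vs toolkit 9000 in the log above) is normally fine, since drivers are backward compatible with older toolkits; the reverse combination usually fails.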

    On GitHub we have the branch caffe-0.17, and the master branch is empty. On git.ti.com we have only the master branch. Both of these are mostly the same; you can use either.

    Best regards,

    Manu.

  • Thanks for your reply, I will try the caffe-0.17 branch. My CUDA driver version is 410.78 because my OS has another CUDA 10.0 install for other uses; for caffe-jacinto, my CUDA version is 9.0.

    Output of nvcc --version:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2017 NVIDIA Corporation
    Built on Fri_Sep__1_21:08:03_CDT_2017
    Cuda compilation tools, release 9.0, V9.0.1

    I added /usr/lib/nvidia-410 to CUDA_LIB_DIR in the Makefile, and used Anaconda3 Python 2.7 to make:

    ifneq ("$(wildcard $(CUDA_DIR)/lib64)","")
    CUDA_LIB_DIR += $(CUDA_DIR)/lib64
    CUDA_LIB_DIR += /usr/lib/nvidia-410 /usr/lib/nvidia-396 /usr/lib/nvidia-390 /usr/lib/nvidia-387 /usr/lib/nvidia-384 /usr/lib/nvidia-381 /usr/lib/nvidia-375 /usr/lib/nvidia-367 /usr/lib/nvidia-361 /usr/lib/nvidia-352
    endif
    CUDA_LIB_DIR += $(CUDA_DIR)/lib

  • I don't think using the caffe-0.17 branch will solve the issue. As I said, I suspect issues with your driver or CUDA versions.

    It would be better to try it on another machine where there is only one version of CUDA installed.
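    As a side note, it is easy to check how many CUDA toolkits are installed side by side before deciding (a quick sketch; it assumes the usual /usr/local layout, so adjust the path if your distribution installs CUDA elsewhere):

    ```shell
    # List all CUDA toolkit directories installed side by side, and show
    # which one the default /usr/local/cuda symlink currently points at.
    ls -d /usr/local/cuda* 2>/dev/null || echo "no CUDA toolkits under /usr/local"
    readlink -f /usr/local/cuda 2>/dev/null || true
    ```

    If more than one directory shows up, the toolkit nvcc uses and the one the Makefile links against may differ.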

  • Thanks for your kindness! 

    In addition, will the following errors in make runtest affect the segmentation demo? They all seem to be in SegmentationAccuracy.

  • That runtest error does not affect the segmentation demo. You can go ahead with your evaluation.

    Best regards,

    Manu.

  • Thank you for your prompt reply.

    Best regards,

    Jiandong.

  • Hi Manu,

    Which protobuf version is recommended to run the segmentation demo? This looks like a problem with protobuf; my protobuf version is 3.5.0.

    I0818 09:56:04.259243 21965 net.cpp:110] Using FLOAT as default forward math type
    I0818 09:56:04.259255 21965 net.cpp:116] Using FLOAT as default backward math type
    I0818 09:56:04.259281 21965 layer_factory.hpp:172] Creating layer 'data' of type 'ImageLabelData'
    I0818 09:56:04.259287 21965 layer_factory.hpp:184] Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
    I0818 09:56:04.259304 21965 net.cpp:200] Created Layer data (0)
    I0818 09:56:04.259311 21965 net.cpp:542] data -> data
    I0818 09:56:04.259372 21965 net.cpp:542] data -> label
    I0818 09:56:04.259439 21965 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
    I0818 09:56:04.260056 21965 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
    I0818 09:56:04.260061 21984 blocking_queue.cpp:40] Data layer prefetch queue empty
    I0818 09:56:04.260115 21965 data_reader.cpp:58] Data Reader threads: 1, out queues: 1, depth: 4
    I0818 09:56:04.260507 21965 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
    I0818 09:56:04.261078 21986 db_lmdb.cpp:36] Opened lmdb data/train-image-lmdb
    *** Aborted at 1597715764 (unix time) try "date -d @1597715764" if you are using GNU date ***
    PC: @ 0x7f0df1e2d40f caffe::C2TensorProtos::MergePartialFromCodedStream()
    *** SIGSEGV (@0x0) received by PID 21965 (TID 0x7f0d3d7fe700) from PID 0; stack trace: ***
    @ 0x7f0def9164c0 (unknown)
    @ 0x7f0df1e2d40f caffe::C2TensorProtos::MergePartialFromCodedStream()
    @ 0x7f0df0d29892 google::protobuf::MessageLite::ParseFromArray()
    @ 0x7f0df1e6a0f1 caffe::DataReader<>::CursorManager::fetch()
    @ 0x7f0df1e76e00 caffe::DataReader<>::InternalThreadEntryN()
    @ 0x7f0df22ea090 caffe::InternalThread::entry()
    @ 0x7f0df22ec02b boost::detail::thread_data<>::run()
    @ 0x7f0de5cd05d5 (unknown)
    @ 0x7f0dcecdb6ba start_thread
    @ 0x7f0def9e84dd clone

  • Hi Jiandong, please try with protobuf version 2.5

    You can get a lot of help online regarding Caffe if you search for the specific error that you are getting. Example:

    https://github.com/BVLC/caffe/issues/19

  • I tried protobuf version 2.5 but got:

    .build_release/src/caffe/proto/caffe.pb.h:12:2: error: #error This file was generated by a newer version of protoc

    so I updated to version 2.6.1, but make also failed:

    CXX/LD -o .build_release/tools/get_image_size.bin
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::internal::WireFormatLite::ReadString(google::protobuf::io::CodedInputStream*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteStringMaybeAliased(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::io::CodedOutputStream*)'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::internal::StringTypeHandlerBase::New[abi:cxx11]()'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::Message::GetTypeName[abi:cxx11]() const'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned char*)'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::Message::InitializationErrorString[abi:cxx11]() const'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteBytesMaybeAliased(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::io::CodedOutputStream*)'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::DescriptorPool::FindFileByName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::internal::StringTypeHandlerBase::Delete(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::MessageFactory::InternalRegisterGeneratedFile(char const*, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&))'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::internal::NameOfEnum[abi:cxx11](google::protobuf::EnumDescriptor const*, int)'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteBytes(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::io::CodedOutputStream*)'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::internal::empty_string_[abi:cxx11]'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::Message::DebugString[abi:cxx11]() const'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::internal::WireFormatLite::ReadBytes(google::protobuf::io::CodedInputStream*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)'
    .build_release/lib/libcaffe-nv.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteString(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::io::CodedOutputStream*)'
    collect2: error: ld returned 1 exit status
    Makefile:655: recipe for target '.build_release/tools/get_image_size.bin' failed
    make: *** [.build_release/tools/get_image_size.bin] Error 1

    I compiled successfully with protobuf 3.5 before.
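    Those undefined references typically mean the generated caffe.pb.* files came from one protoc while the linker picked up a different libprotobuf (or one built against a different C++ ABI). One way to sanity-check this is sketched below; check_proto_match is a hypothetical helper, not part of caffe-jacinto, and pkg-config must be able to find your protobuf install:

    ```shell
    # Compare the protoc compiler version against the installed libprotobuf
    # development package; a mismatch between the two is a common cause of
    # "undefined reference" link errors against regenerated caffe.pb.cc.
    check_proto_match() {
        pv="$(protoc --version 2>/dev/null | awk '{print $2}')"
        lv="$(pkg-config --modversion protobuf 2>/dev/null)"
        if [ -n "$pv" ] && [ "$pv" = "$lv" ]; then
            echo "match: $pv"
        else
            echo "mismatch: protoc=${pv:-none} libprotobuf=${lv:-none}"
        fi
    }
    check_proto_match
    ```

    After switching protobuf versions, it is also worth running make clean first, so the stale generated caffe.pb.h / caffe.pb.cc are regenerated by the matching protoc.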

  • Hi,

    Caffe has always been difficult to build - the challenge is to get the versions right. Did you find anything that works from your Google search? I can find a lot of resources when I search for

    caffe protobuf

    Best regards,

    Manu.