TDA2P-ACD: ssdJacintoNetV2 Performance difference

khethan

Part Number: TDA2P-ACD

Dear All,

I have a question about the performance of ssdJacintoNetV2 on TDA2P.

With the pre-trained model (ssdJacintoNetV2, 768x320), the usecase runs at 15 fps, but with a model I trained myself using the script, it drops to about 11 fps.

According to the related deploy files, it looks like the pre-trained model has 5 heads, while the model I trained has 6.

So I reduced the number of heads from 6 to 5 (see the attached deploy file), but the frame rate is still 11 fps.

Is my assumption about the cause wrong? How can I increase the frame rate to 15 fps?

Please see my development environment below.

SDK3.6.0.0

caffe-jacinto 0.17

caffe-jacinto-model 0.17

bootmode : SD boot

// key changes based on train_image_object_detection.sh

model_name=ssdJacintoNetV2

resize_width=768
resize_height=320
crop_width=768
crop_height=320
use_difficult_gt=0
small_objs=1
ker_mbox_loc_conf=1

num_classes=4
chop_num_heads=1

use_batchnorm_mbox=1

Attached file (deploy.prototxt and log file): sparse.zip

Thank you.

BR,

Khethan

over 6 years ago

0 Praveen Eppa1 over 6 years ago

TI__Genius 17580 points

Hi Khethan,

Did you refer to the thread below when importing the model?

Also, are you running only the Detection output layer on the DSP and all the other layers on the EVE cores?

Thanks,

Praveen

0 khethan over 6 years ago in reply to Praveen Eppa1

Intellectual 470 points

Hi Praveen,

I have already tried the above methods.

The parameters that place the Detection output layer on the DSP are as follows.

layersGroupId = 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 0

conv2dKernelType = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
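A quick way to sanity-check such a line is to count how many layers fall into each group. The sketch below is only an illustration: the helper names are made up for this example, and the reading of the group IDs (1 for EVE, 2 for DSP, 0 for the data and output layers) is an assumption based on the question above, not something confirmed in this thread.

```python
from collections import Counter


def parse_int_list(line):
    """Return the integers to the right of '=' in a 'name = v v v ...' line."""
    _, _, values = line.partition("=")
    return [int(tok) for tok in values.split()]


def group_histogram(line):
    """Count how many layers are assigned to each group ID."""
    return Counter(parse_int_list(line))


# Shortened, hypothetical line in the same format as the one posted above.
example = "layersGroupId = 0 1 1 1 2 0"
counts = group_histogram(example)
print(dict(counts))

# Assumed interpretation: exactly one layer (Detection output) is in group 2.
print("layers in group 2:", counts[2])
```

Running the same helper on the full `layersGroupId` line should report a single entry in group 2 if only the Detection output layer is assigned there.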

Thanks for the quick reply.

Khethan

0 khethan over 6 years ago in reply to khethan

Intellectual 470 points

Hi Praveen,

Can you give me more suggestions to help me solve this issue?

BR,

Khethan

0 Praveen Eppa1 over 6 years ago in reply to khethan

TI__Genius 17580 points

Hi Khethan,

I checked with the VSDK experts on this; their comments are below.

1. If the usecase is running on TDA2Px, which has 2 EVEs, the expected output is only around 12 FPS.

2. It seems the new model takes more cycles, which reduces the FPS. To confirm, check the statistics logs (press p after starting the usecase) for the 11 FPS and 15 FPS cases and compare the local latency of the TIDL link. That will show whether the extra time is spent in the new model or in the SDK.
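If the statistics output is captured to a text file, the local latency values can be pulled out with a small script. This is a minimal sketch that assumes the log lines have the form `Local Link Latency : Avg = ... us, Min = ... us, Max = ... us,`; the function name and the sample line are for illustration only.

```python
import re

# Matches the "Local Link Latency" line of the link statistics output.
LATENCY_RE = re.compile(
    r"Local Link Latency\s*:\s*Avg\s*=\s*(\d+)\s*us,"
    r"\s*Min\s*=\s*(\d+)\s*us,\s*Max\s*=\s*(\d+)\s*us"
)


def local_link_latencies(log_text):
    """Return a list of (avg_us, min_us, max_us) tuples found in the log."""
    return [tuple(int(v) for v in m.groups())
            for m in LATENCY_RE.finditer(log_text)]


sample = ("[IPU1-0] 82.468309 s: Local Link Latency : "
          "Avg = 128044 us, Min = 126914 us, Max = 129019 us,")
print(local_link_latencies(sample))
```

Comparing the average values from the two runs shows directly how much longer one model takes per frame.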

Thanks,

Praveen

0 khethan over 6 years ago in reply to Praveen Eppa1

Intellectual 470 points

Hi Praveen,

It turns out that my TDA2P reaches up to 15 fps with the TI VOC-trained model (ssdJacintoNetV2), as shown below.

I have also attached the full log.

### CPU [ EVE2], LinkID [ 49],
[IPU1-0] 82.465595 s:
[IPU1-0] 82.465869 s: [ ALG_TIDL ] Link Statistics,
[IPU1-0] 82.466144 s: ******************************
[IPU1-0] 82.466205 s:
[IPU1-0] 82.466266 s: Elapsed time = 40639 msec
[IPU1-0] 82.466327 s:
[IPU1-0] 82.466357 s: New data Recv = 12.59 fps
[IPU1-0] 82.466449 s:
[IPU1-0] 82.466479 s: Input Statistics,
[IPU1-0] 82.466540 s:
[IPU1-0] 82.466571 s: CH | In Recv | In Drop | In User Drop | In Process
[IPU1-0] 82.466876 s: | FPS | FPS | FPS | FPS
[IPU1-0] 82.466998 s: --------------------------------------------------
[IPU1-0] 82.467089 s: 0 | 7.55 0. 0 0. 0 7.52
[IPU1-0] 82.467211 s:
[IPU1-0] 82.467242 s: Output Statistics,
[IPU1-0] 82.467303 s:
[IPU1-0] 82.467516 s: CH | Out | Out | Out Drop | Out User Drop
[IPU1-0] 82.467608 s: | ID | FPS | FPS | FPS
[IPU1-0] 82.467730 s: ---------------------------------------------
[IPU1-0] 82.467821 s: 0 | 0 7.52 0. 0 0. 0
[IPU1-0] 82.467913 s:
[IPU1-0] 82.468157 s: [ ALG_TIDL ] LATENCY,
[IPU1-0] 82.468248 s: ********************
[IPU1-0] 82.468309 s: Local Link Latency : Avg = 128044 us, Min = 126914 us, Max = 129019 us,
[IPU1-0] 82.468431 s: Source to Link Latency : Avg = 140895 us, Min = 136918 us, Max = 169249 us,

--------------------------------------------------------------------------------------

### CPU [ EVE1], LinkID [ 49],
[IPU1-0] 83.993594 s:
[IPU1-0] 83.993655 s: [ ALG_TIDL ] Link Statistics,
[IPU1-0] 83.994113 s: ******************************
[IPU1-0] 83.994204 s:
[IPU1-0] 83.994235 s: Elapsed time = 42206 msec
[IPU1-0] 83.994326 s:
[IPU1-0] 83.994357 s: New data Recv = 12.36 fps
[IPU1-0] 83.994448 s:
[IPU1-0] 83.994479 s: Input Statistics,
[IPU1-0] 83.994753 s:
[IPU1-0] 83.994814 s: CH | In Recv | In Drop | In User Drop | In Process
[IPU1-0] 83.994906 s: | FPS | FPS | FPS | FPS
[IPU1-0] 83.995058 s: --------------------------------------------------
[IPU1-0] 83.995150 s: 0 | 7.53 0. 0 0. 0 7.51
[IPU1-0] 83.995485 s:
[IPU1-0] 83.995546 s: Output Statistics,
[IPU1-0] 83.995607 s:
[IPU1-0] 83.995638 s: CH | Out | Out | Out Drop | Out User Drop
[IPU1-0] 83.995760 s: | ID | FPS | FPS | FPS
[IPU1-0] 83.995851 s: ---------------------------------------------
[IPU1-0] 83.995912 s: 0 | 0 7.51 0. 0 0. 0
[IPU1-0] 83.996248 s:
[IPU1-0] 83.996278 s: [ ALG_TIDL ] LATENCY,
[IPU1-0] 83.996339 s: ********************
[IPU1-0] 83.996400 s: Local Link Latency : Avg = 128100 us, Min = 126975 us, Max = 128836 us,
[IPU1-0] 83.996522 s: Source to Link Latency : Avg = 141979 us, Min = 137132 us, Max = 178887 us,
[IPU1-0] 83.996644 s:

Regarding your second comment, I think so too.

In my previous post, I already attached the 11 fps log and deploy file.

Please see the results for the model trained on my own dataset below.

I have attached the full log file again.

### CPU [ EVE2], LinkID [ 49],
[IPU1-0] 70.721923 s:
[IPU1-0] 70.721984 s: [ ALG_TIDL ] Link Statistics,
[IPU1-0] 70.722045 s: ******************************
[IPU1-0] 70.722106 s:
[IPU1-0] 70.722137 s: Elapsed time = 28657 msec
[IPU1-0] 70.722198 s:
[IPU1-0] 70.722259 s: New data Recv = 5.82 fps
[IPU1-0] 70.722320 s:
[IPU1-0] 70.722351 s: Input Statistics,
[IPU1-0] 70.722595 s:
[IPU1-0] 70.722656 s: CH | In Recv | In Drop | In User Drop | In Process
[IPU1-0] 70.722717 s: | FPS | FPS | FPS | FPS
[IPU1-0] 70.722839 s: --------------------------------------------------
[IPU1-0] 70.722930 s: 0 | 5.86 0. 0 0. 0 5.79
[IPU1-0] 70.723083 s:
[IPU1-0] 70.723296 s: Output Statistics,
[IPU1-0] 70.723449 s:
[IPU1-0] 70.723479 s: CH | Out | Out | Out Drop | Out User Drop
[IPU1-0] 70.723540 s: | ID | FPS | FPS | FPS
[IPU1-0] 70.723632 s: ---------------------------------------------
[IPU1-0] 70.723693 s: 0 | 0 5.79 0. 0 0. 0
[IPU1-0] 70.724242 s:
[IPU1-0] 70.724303 s: [ ALG_TIDL ] LATENCY,
[IPU1-0] 70.724394 s: ********************
[IPU1-0] 70.724425 s: Local Link Latency : Avg = 171570 us, Min = 170561 us, Max = 173153 us,
[IPU1-0] 70.724760 s: Source to Link Latency : Avg = 350078 us, Min = 181755 us, Max = 395047 us,

-------------------------------------------------

### CPU [ EVE1], LinkID [ 49],
[IPU1-0] 72.238028 s:
[IPU1-0] 72.238089 s: [ ALG_TIDL ] Link Statistics,
[IPU1-0] 72.238180 s: ******************************
[IPU1-0] 72.238241 s:
[IPU1-0] 72.238272 s: Elapsed time = 30215 msec
[IPU1-0] 72.238333 s:
[IPU1-0] 72.238363 s: New data Recv = 5.82 fps
[IPU1-0] 72.238638 s:
[IPU1-0] 72.238668 s: Input Statistics,
[IPU1-0] 72.238729 s:
[IPU1-0] 72.238760 s: CH | In Recv | In Drop | In User Drop | In Process
[IPU1-0] 72.238882 s: | FPS | FPS | FPS | FPS
[IPU1-0] 72.238973 s: --------------------------------------------------
[IPU1-0] 72.239065 s: 0 | 5.85 0. 0 0. 0 5.79
[IPU1-0] 72.239370 s:
[IPU1-0] 72.239431 s: Output Statistics,
[IPU1-0] 72.239492 s:
[IPU1-0] 72.239522 s: CH | Out | Out | Out Drop | Out User Drop
[IPU1-0] 72.239583 s: | ID | FPS | FPS | FPS
[IPU1-0] 72.239675 s: ---------------------------------------------
[IPU1-0] 72.239736 s: 0 | 0 5.79 0. 0 0. 0
[IPU1-0] 72.240285 s:
[IPU1-0] 72.240346 s: [ ALG_TIDL ] LATENCY,
[IPU1-0] 72.240407 s: ********************
[IPU1-0] 72.240468 s: Local Link Latency : Avg = 171477 us, Min = 170012 us, Max = 172879 us,
[IPU1-0] 72.240773 s: Source to Link Latency : Avg = 405571 us, Min = 182121 us, Max = 449460 us,

Related logs and deploy files: ssdJdetv2_fps15vs11.zip
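The two sets of logs can be cross-checked with a short calculation. This is only a rough sketch: it assumes that the two EVEs work on different frames in parallel, so the total frame rate is about the sum of the per-EVE rates, and that each EVE is limited by its average local link latency. The function name is made up for this example.

```python
def fps_from_latency(avg_latency_us, num_eves=2):
    """Estimate total FPS from the average per-frame latency of one EVE.

    Assumption: every EVE processes its own frames back to back, and the
    EVEs run in parallel, so the rates simply add up.
    """
    per_eve_fps = 1_000_000 / avg_latency_us
    return per_eve_fps * num_eves


# Average local link latencies (EVE2) taken from the logs above.
ti_model_fps = fps_from_latency(128044)
own_model_fps = fps_from_latency(171570)

print(f"TI model:  {ti_model_fps:.1f} fps")
print(f"own model: {own_model_fps:.1f} fps")
print(f"latency ratio: {171570 / 128044:.2f}")
```

Under these assumptions the estimates come out at roughly 15.6 fps and 11.7 fps, which is in line with the observed 15 fps and 11 fps. The model trained on my dataset needs about 1.34 times as long per frame.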

Thanks,

BR,

Khethan

0 Praveen Eppa1 over 6 years ago in reply to khethan

TI__Genius 17580 points

Hi Khethan,

If this is the case, then nothing can be done on the usecase side; you may need to work on the model. You can try increasing the sparsity of the model.

Thanks,

Praveen
