CCS: TDA4x: Mobilenet model trained on Padded Input

Vyom Mishra1

Tool/software: Code Composer Studio

Dear Sir,

I am using "tidl_j7_01_00_00_00" for Importing the "mobilenet model".

With reference to the previous post, we resolved the random-box and bounding-box localization issues by changing the threshold and using "quantizationStyle = 3" instead of "2".

But when we run the same model trained on "padded input data", we see a huge mismatch between the PC and target-side results.

I have attached PC and target board output below for comparison:

Target

Please find the import and inference config files attached:

7607.tidl_import_msi_mobilenet_pd.txt

modelType          = 0
inputNetFile       = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mando/fvc/od/new_padded/deploy.prototxt"
inputParamsFile    = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mando/fvc/od/new_padded/mob.caffemodel"
outputNetFile      = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/tidl_models/caffe/tidl_net_msi_mobilenet_pd_padded.bin"
outputParamsFile   = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/tidl_models/caffe/tidl_io_msi_mobilenet_pd_padded"
numParamBits = 12
numFeatureBits = 12
quantizationStyle = 2
inDataFormat = 0
inElementType  = 0 
inWidth = 512
inHeight = 512
inNumChannels = 3
perfSimConfig = "../../test/testvecs/config/import/perfsim_base.cfg"
inData = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/det.txt"
numFrames = 100
postProcType = 2
inFileFormat = 2

4682.tidl_infer_msi_mobilenet_pd.txt

inFileFormat    = 2
postProcType = 2
numFrames   = 1
padInBuffInTB = 1
netBinFile      = "testvecs/config/tidl_models/caffe/tidl_net_msi_mobilenet_pd_padded.bin"
ioConfigFile    = "testvecs/config/tidl_models/caffe/tidl_io_msi_mobilenet_pd_padded1.bin"
outData =   "testvecs/output/msi_mobilenet.bin"
inData  =   "testvecs/config/det.txt"
debugTraceLevel = 1
writeTraceLevel = 0
numFrames = 22
writeOutput = 1
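
The inference config above assigns numFrames twice (1, then 22); this thread does not say which value the tool picks up. As a minimal sketch, assuming the config is plain `key = value` text and that lines starting with `#` are comments (an assumption, not confirmed by the thread), duplicated keys can be spotted like this. The function name `find_duplicate_keys` is hypothetical.

```python
from collections import defaultdict


def find_duplicate_keys(config_text):
    """Return {key: [values...]} for every key assigned more than once."""
    seen = defaultdict(list)
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks, assumed '#' comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        seen[key.strip()].append(value.strip())
    return {k: v for k, v in seen.items() if len(v) > 1}


# Subset of the inference config posted above.
INFER_CFG = """\
inFileFormat    = 2
postProcType = 2
numFrames   = 1
padInBuffInTB = 1
debugTraceLevel = 1
writeTraceLevel = 0
numFrames = 22
writeOutput = 1
"""

duplicates = find_duplicate_keys(INFER_CFG)
print(duplicates)  # {'numFrames': ['1', '22']}
```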

We used "quantizationStyle = 2" for this model because its results are better than those with quantizationStyle = 3.

Confidence threshold: 0.3 (it gives better detection than the other thresholds we tried)

Kindly help us to improve our target side results.

Note: The results for the model trained on normal input were shared in the related post as "results.zip".

They are also attached here for your reference: 0434.results.zip

Thanks and Regards,

Vyom Mishra

over 5 years ago

0 Sankalp Kallakuri22 over 5 years ago

Intellectual 980 points

Gentle Reminder.

0 Subhajit Paul over 5 years ago in reply to Sankalp Kallakuri22

TI__Expert 7015 points

Sankalp Kallakuri, Vyom Mishra,

Can you answer the following questions which will help me understand the problem better?

1. When you say PC results, are these generated by the training framework, or TIDL host emulation?

2. Did you retrain your network using padded inputs?

3. Is the resolution same between training, inference, PC run and target run?

- Subhajit

0 Sankalp Kallakuri22 over 5 years ago in reply to Subhajit Paul

Intellectual 980 points

Hi Subhajit,

Please find answers in square brackets.

1. When you say PC results, are these generated by the training framework, or TIDL host emulation? [Training Framework]

2. Did you retrain your network using padded inputs? [Yes]

3. Is the resolution same between training, inference, PC run and target run? [Yes]

Best Regards,

Sankalp

0 Subhajit Paul over 5 years ago in reply to Sankalp Kallakuri22

TI__Expert 7015 points

Sankalp Kallakuri,

Because of quantization, TIDL does not guarantee that its results will match the results obtained from the training framework.

Can you send the console output of the import process and the layersminmax file?

- Subhajit
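
To illustrate the point about quantization, here is a minimal sketch of a generic symmetric fixed-point quantize/dequantize round trip at 8 and 12 bits (the two bit widths tried in this thread). It is not TIDL's actual quantizer; the function names and the sample values are made up for illustration.

```python
def quantize(values, num_bits):
    """Quantize to signed fixed point with one shared scale, then dequantize."""
    max_abs = max(abs(v) for v in values)
    levels = 2 ** (num_bits - 1) - 1      # e.g. 127 for 8 bits, 2047 for 12 bits
    scale = levels / max_abs              # assumes max_abs > 0
    return [round(v * scale) / scale for v in values]


def max_error(values, num_bits):
    """Largest absolute difference between the float values and their round trip."""
    dequantized = quantize(values, num_bits)
    return max(abs(a - b) for a, b in zip(values, dequantized))


# Hypothetical activation values, not taken from the attached logs.
activations = [0.013, -0.42, 1.7, 3.9, -2.25, 0.0004]

for bits in (8, 12):
    print(bits, max_error(activations, bits))
```

The worst-case error of this scheme is half a quantization step, max_abs / (2 * levels), so it shrinks by roughly 16x when going from 8 to 12 bits. Small values next to a large maximum are the ones that lose the most relative precision.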

0 Sankalp Kallakuri22 over 5 years ago in reply to Subhajit Paul

Intellectual 980 points

Subhajit,

We agree that quantization plays a part in the degradation of the results. We experimented with quantizationStyle as well as numParamBits and numFeatureBits, but did not see any improvement. We shall share the requested files tomorrow.

Best Regards,

Sankalp Kallakuri

0 Vyom Mishra1 over 5 years ago in reply to Subhajit Paul

Genius 4590 points

Dear Subhajit,

Please find the requested file for your reference.

Padded_model.zip

Thanks and Regards,

Vyom Mishra

0 Subhajit Paul over 5 years ago in reply to Vyom Mishra1

TI__Expert 7015 points

Vyom Mishra

I will look into the files provided.

Also, can you look into the "steps to debug" document and match the input and the layer-level outputs between the framework (PC) and TIDL host emulation (PC)?

- Subhajit

0 Sankalp Kallakuri22 over 5 years ago in reply to Subhajit Paul

Intellectual 980 points

Subhajit,

We will take up this activity. We are also sharing the min/max values of the unpadded and padded image models from the PC side [framework output]. The unpadded model worked well on the TDA2x. Both perform poorly on TDA4x.

The unpadded baseline model range file is "Baseline_model_params_range"; the padded model range file is "MDK_COCO_model_params_range".

We are unable to attach files here and will share them via email.

Regards,

Sankalp

0 Subhajit Paul over 5 years ago in reply to Sankalp Kallakuri22

TI__Expert 7015 points

Sankalp Kallakuri,

I have received the files. Let me have a look.

- Subhajit

0 Vyom Mishra1 over 5 years ago in reply to Subhajit Paul

Genius 4590 points

Gentle Reminder!

0 Vyom Mishra1 over 5 years ago in reply to Subhajit Paul

Genius 4590 points

Dear Sir,

We have experimented with an L1-regularized model and are facing similar false positives (FPs) in the detections. Please find the details for your reference:

Parameters used while importing:

Confidence threshold: 0.3

numparambits =12

numfeaturebits = 12

quantizationstyle = 2

Kindly help us to resolve the issue.

Thanks and Regards,

Vyom Mishra

0 Vyom Mishra1 over 5 years ago in reply to Subhajit Paul

Genius 4590 points

Dear Sir,

We have experimented with the configuration as below:

numparambits =8

numfeaturebits =8

quantizationstyle =3

The above parameters gave good results for the unpadded model.

While experimenting with the same parameters for the L1-regularized padded model, we observed:

a) No FPs

b) Missed detections

c) Very low scores for true positives

** The detected bounding-box scores matched the PC scores when numparambits = numfeaturebits = 12

I am sharing some results for your reference:

Kindly do the needful.

Thanks and Regards,

Vyom Mishra

0 kumar.desappan over 5 years ago in reply to Vyom Mishra1

TI__Mastermind 22145 points

Hi, could you follow the steps mentioned below to narrow down the layer where the mismatch is found?

http://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/latest/exports/docs/tidl_j7_01_00_01_00/ti_dl/docs/user_guide_html/md_tidl_fsg_steps_to_debug_mismatch.html
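
The idea of narrowing down the first mismatching layer can be sketched as follows, assuming the per-layer outputs from the framework and from TIDL host emulation have already been loaded as flat float sequences keyed by layer name, in execution order. The helper names, the tolerance, and the numbers are hypothetical; how the traces are dumped and loaded is described in the linked guide and is not shown here.

```python
def compare_layers(reference, candidate):
    """Return (max_abs_diff, mean_abs_diff) for two equally long float sequences."""
    if len(reference) != len(candidate):
        raise ValueError("tensor sizes differ")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    return max(diffs), sum(diffs) / len(diffs)


def first_divergent_layer(ref_layers, cand_layers, tolerance):
    """Walk layers in insertion order and return the first name whose
    maximum absolute difference exceeds the tolerance, or None."""
    for name in ref_layers:
        max_diff, _ = compare_layers(ref_layers[name], cand_layers[name])
        if max_diff > tolerance:
            return name
    return None


# Made-up values for three layer names from the import log; not real traces.
framework = {
    "conv1": [0.10, 0.52, 1.30],
    "conv2_1/dw": [0.75, 2.10, 0.05],
    "conv2_1/sep": [1.20, 0.00, 3.40],
}
tidl_host = {
    "conv1": [0.10, 0.51, 1.31],
    "conv2_1/dw": [0.70, 1.60, 0.05],
    "conv2_1/sep": [0.90, 0.00, 2.10],
}

print(first_divergent_layer(framework, tidl_host, tolerance=0.1))  # conv2_1/dw
```

Once a layer diverges, every layer after it will usually differ as well, which is why only the first offender is reported.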

0 Vyom Mishra1 over 5 years ago in reply to kumar.desappan

Genius 4590 points

Dear Sir,

I have attached the .csv and log.txt files for your reference.

Configuration:

model: L1 Regularized trained on Padded Input data

Confidence Threshold: 0.3

numparambits = numfeaturebits =12

quantizationstyle = 3

tidl_net_l1.bin_paramDebug.csv

26657.log.txt

Num of Layer Detected : 65
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|Out Data Name |Group |#Ins |#Outs |Inbuf Ids |Outbuf Id |In NCHW |Out NCHW |MACS |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|data | 0| -1| 1| x x x x x x x x | 0 | 0 0 0 0 | 1 3 512 512 | 0 |
|conv1 | 0| 1| 1| 0 x x x x x x x | 1 | 1 3 512 512 | 1 16 256 256 | 30408704 |
|conv2_1/dw | 0| 1| 1| 1 x x x x x x x | 2 | 1 16 256 256 | 1 16 256 256 | 11534336 |
|conv2_1/sep | 0| 1| 1| 2 x x x x x x x | 3 | 1 16 256 256 | 1 32 256 256 | 37748736 |
|conv2_2/dw | 0| 1| 1| 3 x x x x x x x | 4 | 1 32 256 256 | 1 32 128 128 | 5767168 |
|conv2_2/sep | 0| 1| 1| 4 x x x x x x x | 5 | 1 32 128 128 | 1 64 128 128 | 35651584 |
|conv3_1/dw | 0| 1| 1| 5 x x x x x x x | 6 | 1 64 128 128 | 1 64 128 128 | 11534336 |
|conv3_1/sep | 0| 1| 1| 6 x x x x x x x | 7 | 1 64 128 128 | 1 64 128 128 | 69206016 |
|conv3_2/dw | 0| 1| 1| 7 x x x x x x x | 8 | 1 64 128 128 | 1 64 64 64 | 2883584 |
|conv3_2/sep | 0| 1| 1| 8 x x x x x x x | 9 | 1 64 64 64 | 1 128 64 64 | 34603008 |
|conv4_1/dw | 0| 1| 1| 9 x x x x x x x | 10 | 1 128 64 64 | 1 128 64 64 | 5767168 |
|conv4_1/sep | 0| 1| 1| 10 x x x x x x x | 11 | 1 128 64 64 | 1 128 64 64 | 68157440 |
|conv4_2/dw | 0| 1| 1| 11 x x x x x x x | 12 | 1 128 64 64 | 1 128 32 32 | 1441792 |
|conv4_2/sep | 0| 1| 1| 12 x x x x x x x | 13 | 1 128 32 32 | 1 256 32 32 | 34078720 |
|conv5_1/dw | 0| 1| 1| 13 x x x x x x x | 14 | 1 256 32 32 | 1 256 32 32 | 2883584 |
|conv5_1/sep | 0| 1| 1| 14 x x x x x x x | 15 | 1 256 32 32 | 1 256 32 32 | 67633152 |
|conv5_2/dw | 0| 1| 1| 15 x x x x x x x | 16 | 1 256 32 32 | 1 256 32 32 | 2883584 |
|conv5_2/sep | 0| 1| 1| 16 x x x x x x x | 17 | 1 256 32 32 | 1 256 32 32 | 67633152 |
|conv5_3/dw | 0| 1| 1| 17 x x x x x x x | 18 | 1 256 32 32 | 1 256 32 32 | 2883584 |
|conv5_3/sep | 0| 1| 1| 18 x x x x x x x | 19 | 1 256 32 32 | 1 256 32 32 | 67633152 |
|conv5_4/dw | 0| 1| 1| 19 x x x x x x x | 20 | 1 256 32 32 | 1 256 32 32 | 2883584 |
|conv5_4/sep | 0| 1| 1| 20 x x x x x x x | 21 | 1 256 32 32 | 1 256 32 32 | 67633152 |
|conv5_5/dw | 0| 1| 1| 21 x x x x x x x | 22 | 1 256 32 32 | 1 256 32 32 | 2883584 |
|conv5_5/sep | 0| 1| 1| 22 x x x x x x x | 23 | 1 256 32 32 | 1 256 32 32 | 67633152 |
|conv5_6/dw | 0| 1| 1| 23 x x x x x x x | 24 | 1 256 32 32 | 1 256 16 16 | 720896 |
|ctx_output1/dw | 0| 1| 1| 23 x x x x x x x | 25 | 1 256 32 32 | 1 256 32 32 | 2883584 |
|conv5_6/sep | 0| 1| 1| 24 x x x x x x x | 26 | 1 256 16 16 | 1 512 16 16 | 33816576 |
|ctx_output1/sep | 0| 1| 1| 25 x x x x x x x | 27 | 1 256 32 32 | 1 512 32 32 | 135266304 |
|ctx_output1/sep/relu_mbox_loc_perm | 0| 1| 1| 27 x x x x x x x | 28 | 1 512 32 32 | 1 24 32 32 | 12582912 |
|conv6/dw | 0| 1| 1| 26 x x x x x x x | 29 | 1 512 16 16 | 1 512 16 16 | 1441792 |
|ctx_output1/sep/relu_mbox_conf_perm | 0| 1| 1| 27 x x x x x x x | 30 | 1 512 32 32 | 1 12 32 32 | 6291456 |
|ctx_output1/sep/relu_mbox_loc_flat | 0| 1| 1| 28 x x x x x x x | 31 | 1 24 32 32 | 1 1 1 24576 | 24576 |
|ctx_output1/sep/relu_mbox_conf_flat | 0| 1| 1| 30 x x x x x x x | 32 | 1 12 32 32 | 1 1 1 12288 | 12288 |
|conv6/sep | 0| 1| 1| 29 x x x x x x x | 33 | 1 512 16 16 | 1 512 16 16 | 67371008 |
|pool6 | 0| 1| 1| 33 x x x x x x x | 34 | 1 512 16 16 | 1 512 8 8 | 131072 |
|pool7 | 0| 1| 1| 34 x x x x x x x | 35 | 1 512 8 8 | 1 512 4 4 | 32768 |
|pool8 | 0| 1| 1| 35 x x x x x x x | 36 | 1 512 4 4 | 1 512 2 2 | 8192 |
|ctx_output2/dw | 0| 1| 1| 33 x x x x x x x | 37 | 1 512 16 16 | 1 512 16 16 | 1441792 |
|ctx_output3/dw | 0| 1| 1| 34 x x x x x x x | 38 | 1 512 8 8 | 1 512 8 8 | 360448 |
|ctx_output4/dw | 0| 1| 1| 35 x x x x x x x | 39 | 1 512 4 4 | 1 512 4 4 | 90112 |
|ctx_output5/dw | 0| 1| 1| 36 x x x x x x x | 40 | 1 512 2 2 | 1 512 2 2 | 22528 |
|ctx_output2/sep | 0| 1| 1| 37 x x x x x x x | 41 | 1 512 16 16 | 1 512 16 16 | 67371008 |
|ctx_output3/sep | 0| 1| 1| 38 x x x x x x x | 42 | 1 512 8 8 | 1 512 8 8 | 16842752 |
|ctx_output4/sep | 0| 1| 1| 39 x x x x x x x | 43 | 1 512 4 4 | 1 512 4 4 | 4210688 |
|ctx_output5/sep | 0| 1| 1| 40 x x x x x x x | 44 | 1 512 2 2 | 1 512 2 2 | 1052672 |
|ctx_output2/sep/relu_mbox_loc_perm | 0| 1| 1| 41 x x x x x x x | 45 | 1 512 16 16 | 1 40 16 16 | 5242880 |
|ctx_output3/sep/relu_mbox_loc_perm | 0| 1| 1| 42 x x x x x x x | 46 | 1 512 8 8 | 1 40 8 8 | 1310720 |
|ctx_output4/sep/relu_mbox_loc_perm | 0| 1| 1| 43 x x x x x x x | 47 | 1 512 4 4 | 1 24 4 4 | 196608 |
|ctx_output5/sep/relu_mbox_loc_perm | 0| 1| 1| 44 x x x x x x x | 48 | 1 512 2 2 | 1 24 2 2 | 49152 |
|ctx_output2/sep/relu_mbox_conf_perm | 0| 1| 1| 41 x x x x x x x | 49 | 1 512 16 16 | 1 20 16 16 | 2621440 |
|ctx_output3/sep/relu_mbox_conf_perm | 0| 1| 1| 42 x x x x x x x | 50 | 1 512 8 8 | 1 20 8 8 | 655360 |
|ctx_output4/sep/relu_mbox_conf_perm | 0| 1| 1| 43 x x x x x x x | 51 | 1 512 4 4 | 1 12 4 4 | 98304 |
|ctx_output5/sep/relu_mbox_conf_perm | 0| 1| 1| 44 x x x x x x x | 52 | 1 512 2 2 | 1 12 2 2 | 24576 |
|ctx_output2/sep/relu_mbox_loc_flat | 0| 1| 1| 45 x x x x x x x | 53 | 1 40 16 16 | 1 1 1 10240 | 10240 |
|ctx_output3/sep/relu_mbox_loc_flat | 0| 1| 1| 46 x x x x x x x | 54 | 1 40 8 8 | 1 1 1 2560 | 2560 |
|ctx_output4/sep/relu_mbox_loc_flat | 0| 1| 1| 47 x x x x x x x | 55 | 1 24 4 4 | 1 1 1 384 | 384 |
|ctx_output5/sep/relu_mbox_loc_flat | 0| 1| 1| 48 x x x x x x x | 56 | 1 24 2 2 | 1 1 1 96 | 96 |
|ctx_output2/sep/relu_mbox_conf_flat | 0| 1| 1| 49 x x x x x x x | 57 | 1 20 16 16 | 1 1 1 5120 | 5120 |
|ctx_output3/sep/relu_mbox_conf_flat | 0| 1| 1| 50 x x x x x x x | 58 | 1 20 8 8 | 1 1 1 1280 | 1280 |
|ctx_output4/sep/relu_mbox_conf_flat | 0| 1| 1| 51 x x x x x x x | 59 | 1 12 4 4 | 1 1 1 192 | 192 |
|ctx_output5/sep/relu_mbox_conf_flat | 0| 1| 1| 52 x x x x x x x | 60 | 1 12 2 2 | 1 1 1 48 | 48 |
|mbox_loc | 0| 5| 1| 31 53 54 55 56 x x x | 61 | 1 1 1 24576 | 1 1 1 37856 | 122880 |
|mbox_conf_flatten | 0| 5| 1| 32 57 58 59 60 x x x | 62 | 1 1 1 12288 | 1 1 1 18928 | 61440 |
|detection_out | 0| 2| 1| 61 62 x x x x x x | 63 | 1 1 1 37856 | 1 1 1 144 | 0 |
|detection_out | 0| 1| -1| 63 x x x x x x x | 0 | 1 1 1 144 | 0 0 0 0 | 0 |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Observations:

a) A double-digit difference is observed between "meanOrigFloat" and "orgmax".

b) Layer IDs with a difference > 5: 4, 6, 10, 12, 14, 16, 18, 20. All of these are depthwise convolutions.

As per these observations, the depthwise convolutions have the maximum quantization loss.

Configuration:

model: L1 Regularized trained on Padded Input data

Confidence Threshold: 0.3

numparambits = numfeaturebits =12

quantizationstyle = 2

tidl_net_msi_l1.bin_paramDebug.csv

8611.log.txt

Num of Layer Detected :  65 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Num|TIDL Layer Name               |Out Data Name                                     |Group |#Ins  |#Outs |Inbuf Ids                       |Outbuf Id |In NCHW                             |Out NCHW                            |MACS       |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    0|TIDL_DataLayer                |data                                              |     0|    -1|     1|  x   x   x   x   x   x   x   x |  0       |       0        0        0        0 |       1        3      512      512 |         0 |
    1|TIDL_ConvolutionLayer         |conv1                                             |     0|     1|     1|  0   x   x   x   x   x   x   x |  1       |       1        3      512      512 |       1       16      256      256 |  30408704 |
    2|TIDL_ConvolutionLayer         |conv2_1/dw                                        |     0|     1|     1|  1   x   x   x   x   x   x   x |  2       |       1       16      256      256 |       1       16      256      256 |  11534336 |
    3|TIDL_ConvolutionLayer         |conv2_1/sep                                       |     0|     1|     1|  2   x   x   x   x   x   x   x |  3       |       1       16      256      256 |       1       32      256      256 |  37748736 |
    4|TIDL_ConvolutionLayer         |conv2_2/dw                                        |     0|     1|     1|  3   x   x   x   x   x   x   x |  4       |       1       32      256      256 |       1       32      128      128 |   5767168 |
    5|TIDL_ConvolutionLayer         |conv2_2/sep                                       |     0|     1|     1|  4   x   x   x   x   x   x   x |  5       |       1       32      128      128 |       1       64      128      128 |  35651584 |
    6|TIDL_ConvolutionLayer         |conv3_1/dw                                        |     0|     1|     1|  5   x   x   x   x   x   x   x |  6       |       1       64      128      128 |       1       64      128      128 |  11534336 |
    7|TIDL_ConvolutionLayer         |conv3_1/sep                                       |     0|     1|     1|  6   x   x   x   x   x   x   x |  7       |       1       64      128      128 |       1       64      128      128 |  69206016 |
    8|TIDL_ConvolutionLayer         |conv3_2/dw                                        |     0|     1|     1|  7   x   x   x   x   x   x   x |  8       |       1       64      128      128 |       1       64       64       64 |   2883584 |
    9|TIDL_ConvolutionLayer         |conv3_2/sep                                       |     0|     1|     1|  8   x   x   x   x   x   x   x |  9       |       1       64       64       64 |       1      128       64       64 |  34603008 |
   10|TIDL_ConvolutionLayer         |conv4_1/dw                                        |     0|     1|     1|  9   x   x   x   x   x   x   x | 10       |       1      128       64       64 |       1      128       64       64 |   5767168 |
   11|TIDL_ConvolutionLayer         |conv4_1/sep                                       |     0|     1|     1| 10   x   x   x   x   x   x   x | 11       |       1      128       64       64 |       1      128       64       64 |  68157440 |
   12|TIDL_ConvolutionLayer         |conv4_2/dw                                        |     0|     1|     1| 11   x   x   x   x   x   x   x | 12       |       1      128       64       64 |       1      128       32       32 |   1441792 |
   13|TIDL_ConvolutionLayer         |conv4_2/sep                                       |     0|     1|     1| 12   x   x   x   x   x   x   x | 13       |       1      128       32       32 |       1      256       32       32 |  34078720 |
   14|TIDL_ConvolutionLayer         |conv5_1/dw                                        |     0|     1|     1| 13   x   x   x   x   x   x   x | 14       |       1      256       32       32 |       1      256       32       32 |   2883584 |
   15|TIDL_ConvolutionLayer         |conv5_1/sep                                       |     0|     1|     1| 14   x   x   x   x   x   x   x | 15       |       1      256       32       32 |       1      256       32       32 |  67633152 |
   16|TIDL_ConvolutionLayer         |conv5_2/dw                                        |     0|     1|     1| 15   x   x   x   x   x   x   x | 16       |       1      256       32       32 |       1      256       32       32 |   2883584 |
   17|TIDL_ConvolutionLayer         |conv5_2/sep                                       |     0|     1|     1| 16   x   x   x   x   x   x   x | 17       |       1      256       32       32 |       1      256       32       32 |  67633152 |
   18|TIDL_ConvolutionLayer         |conv5_3/dw                                        |     0|     1|     1| 17   x   x   x   x   x   x   x | 18       |       1      256       32       32 |       1      256       32       32 |   2883584 |
   19|TIDL_ConvolutionLayer         |conv5_3/sep                                       |     0|     1|     1| 18   x   x   x   x   x   x   x | 19       |       1      256       32       32 |       1      256       32       32 |  67633152 |
   20|TIDL_ConvolutionLayer         |conv5_4/dw                                        |     0|     1|     1| 19   x   x   x   x   x   x   x | 20       |       1      256       32       32 |       1      256       32       32 |   2883584 |
   21|TIDL_ConvolutionLayer         |conv5_4/sep                                       |     0|     1|     1| 20   x   x   x   x   x   x   x | 21       |       1      256       32       32 |       1      256       32       32 |  67633152 |
   22|TIDL_ConvolutionLayer         |conv5_5/dw                                        |     0|     1|     1| 21   x   x   x   x   x   x   x | 22       |       1      256       32       32 |       1      256       32       32 |   2883584 |
   23|TIDL_ConvolutionLayer         |conv5_5/sep                                       |     0|     1|     1| 22   x   x   x   x   x   x   x | 23       |       1      256       32       32 |       1      256       32       32 |  67633152 |
   24|TIDL_ConvolutionLayer         |conv5_6/dw                                        |     0|     1|     1| 23   x   x   x   x   x   x   x | 24       |       1      256       32       32 |       1      256       16       16 |    720896 |
   25|TIDL_ConvolutionLayer         |ctx_output1/dw                                    |     0|     1|     1| 23   x   x   x   x   x   x   x | 25       |       1      256       32       32 |       1      256       32       32 |   2883584 |
   26|TIDL_ConvolutionLayer         |conv5_6/sep                                       |     0|     1|     1| 24   x   x   x   x   x   x   x | 26       |       1      256       16       16 |       1      512       16       16 |  33816576 |
   27|TIDL_ConvolutionLayer         |ctx_output1/sep                                   |     0|     1|     1| 25   x   x   x   x   x   x   x | 27       |       1      256       32       32 |       1      512       32       32 | 135266304 |
   28|TIDL_ConvolutionLayer         |ctx_output1/sep/relu_mbox_loc_perm                |     0|     1|     1| 27   x   x   x   x   x   x   x | 28       |       1      512       32       32 |       1       24       32       32 |  12582912 |
   29|TIDL_ConvolutionLayer         |conv6/dw                                          |     0|     1|     1| 26   x   x   x   x   x   x   x | 29       |       1      512       16       16 |       1      512       16       16 |   1441792 |
   30|TIDL_ConvolutionLayer         |ctx_output1/sep/relu_mbox_conf_perm               |     0|     1|     1| 27   x   x   x   x   x   x   x | 30       |       1      512       32       32 |       1       12       32       32 |   6291456 |
   31|TIDL_FlattenLayer             |ctx_output1/sep/relu_mbox_loc_flat                |     0|     1|     1| 28   x   x   x   x   x   x   x | 31       |       1       24       32       32 |       1        1        1    24576 |     24576 |
   32|TIDL_FlattenLayer             |ctx_output1/sep/relu_mbox_conf_flat               |     0|     1|     1| 30   x   x   x   x   x   x   x | 32       |       1       12       32       32 |       1        1        1    12288 |     12288 |
   33|TIDL_ConvolutionLayer         |conv6/sep                                         |     0|     1|     1| 29   x   x   x   x   x   x   x | 33       |       1      512       16       16 |       1      512       16       16 |  67371008 |
   34|TIDL_PoolingLayer             |pool6                                             |     0|     1|     1| 33   x   x   x   x   x   x   x | 34       |       1      512       16       16 |       1      512        8        8 |    131072 |
   35|TIDL_PoolingLayer             |pool7                                             |     0|     1|     1| 34   x   x   x   x   x   x   x | 35       |       1      512        8        8 |       1      512        4        4 |     32768 |
   36|TIDL_PoolingLayer             |pool8                                             |     0|     1|     1| 35   x   x   x   x   x   x   x | 36       |       1      512        4        4 |       1      512        2        2 |      8192 |
   37|TIDL_ConvolutionLayer         |ctx_output2/dw                                    |     0|     1|     1| 33   x   x   x   x   x   x   x | 37       |       1      512       16       16 |       1      512       16       16 |   1441792 |
   38|TIDL_ConvolutionLayer         |ctx_output3/dw                                    |     0|     1|     1| 34   x   x   x   x   x   x   x | 38       |       1      512        8        8 |       1      512        8        8 |    360448 |
   39|TIDL_ConvolutionLayer         |ctx_output4/dw                                    |     0|     1|     1| 35   x   x   x   x   x   x   x | 39       |       1      512        4        4 |       1      512        4        4 |     90112 |
   40|TIDL_ConvolutionLayer         |ctx_output5/dw                                    |     0|     1|     1| 36   x   x   x   x   x   x   x | 40       |       1      512        2        2 |       1      512        2        2 |     22528 |
   41|TIDL_ConvolutionLayer         |ctx_output2/sep                                   |     0|     1|     1| 37   x   x   x   x   x   x   x | 41       |       1      512       16       16 |       1      512       16       16 |  67371008 |
   42|TIDL_ConvolutionLayer         |ctx_output3/sep                                   |     0|     1|     1| 38   x   x   x   x   x   x   x | 42       |       1      512        8        8 |       1      512        8        8 |  16842752 |
   43|TIDL_ConvolutionLayer         |ctx_output4/sep                                   |     0|     1|     1| 39   x   x   x   x   x   x   x | 43       |       1      512        4        4 |       1      512        4        4 |   4210688 |
   44|TIDL_ConvolutionLayer         |ctx_output5/sep                                   |     0|     1|     1| 40   x   x   x   x   x   x   x | 44       |       1      512        2        2 |       1      512        2        2 |   1052672 |
   45|TIDL_ConvolutionLayer         |ctx_output2/sep/relu_mbox_loc_perm                |     0|     1|     1| 41   x   x   x   x   x   x   x | 45       |       1      512       16       16 |       1       40       16       16 |   5242880 |
   46|TIDL_ConvolutionLayer         |ctx_output3/sep/relu_mbox_loc_perm                |     0|     1|     1| 42   x   x   x   x   x   x   x | 46       |       1      512        8        8 |       1       40        8        8 |   1310720 |
   47|TIDL_ConvolutionLayer         |ctx_output4/sep/relu_mbox_loc_perm                |     0|     1|     1| 43   x   x   x   x   x   x   x | 47       |       1      512        4        4 |       1       24        4        4 |    196608 |
   48|TIDL_ConvolutionLayer         |ctx_output5/sep/relu_mbox_loc_perm                |     0|     1|     1| 44   x   x   x   x   x   x   x | 48       |       1      512        2        2 |       1       24        2        2 |     49152 |
   49|TIDL_ConvolutionLayer         |ctx_output2/sep/relu_mbox_conf_perm               |     0|     1|     1| 41   x   x   x   x   x   x   x | 49       |       1      512       16       16 |       1       20       16       16 |   2621440 |
   50|TIDL_ConvolutionLayer         |ctx_output3/sep/relu_mbox_conf_perm               |     0|     1|     1| 42   x   x   x   x   x   x   x | 50       |       1      512        8        8 |       1       20        8        8 |    655360 |
   51|TIDL_ConvolutionLayer         |ctx_output4/sep/relu_mbox_conf_perm               |     0|     1|     1| 43   x   x   x   x   x   x   x | 51       |       1      512        4        4 |       1       12        4        4 |     98304 |
   52|TIDL_ConvolutionLayer         |ctx_output5/sep/relu_mbox_conf_perm               |     0|     1|     1| 44   x   x   x   x   x   x   x | 52       |       1      512        2        2 |       1       12        2        2 |     24576 |
   53|TIDL_FlattenLayer             |ctx_output2/sep/relu_mbox_loc_flat                |     0|     1|     1| 45   x   x   x   x   x   x   x | 53       |       1       40       16       16 |       1        1        1    10240 |     10240 |
   54|TIDL_FlattenLayer             |ctx_output3/sep/relu_mbox_loc_flat                |     0|     1|     1| 46   x   x   x   x   x   x   x | 54       |       1       40        8        8 |       1        1        1     2560 |      2560 |
   55|TIDL_FlattenLayer             |ctx_output4/sep/relu_mbox_loc_flat                |     0|     1|     1| 47   x   x   x   x   x   x   x | 55       |       1       24        4        4 |       1        1        1      384 |       384 |
   56|TIDL_FlattenLayer             |ctx_output5/sep/relu_mbox_loc_flat                |     0|     1|     1| 48   x   x   x   x   x   x   x | 56       |       1       24        2        2 |       1        1        1       96 |        96 |
   57|TIDL_FlattenLayer             |ctx_output2/sep/relu_mbox_conf_flat               |     0|     1|     1| 49   x   x   x   x   x   x   x | 57       |       1       20       16       16 |       1        1        1     5120 |      5120 |
   58|TIDL_FlattenLayer             |ctx_output3/sep/relu_mbox_conf_flat               |     0|     1|     1| 50   x   x   x   x   x   x   x | 58       |       1       20        8        8 |       1        1        1     1280 |      1280 |
   59|TIDL_FlattenLayer             |ctx_output4/sep/relu_mbox_conf_flat               |     0|     1|     1| 51   x   x   x   x   x   x   x | 59       |       1       12        4        4 |       1        1        1      192 |       192 |
   60|TIDL_FlattenLayer             |ctx_output5/sep/relu_mbox_conf_flat               |     0|     1|     1| 52   x   x   x   x   x   x   x | 60       |       1       12        2        2 |       1        1        1       48 |        48 |
   61|TIDL_ConcatLayer              |mbox_loc                                          |     0|     5|     1| 31  53  54  55  56   x   x   x | 61       |       1        1        1    24576 |       1        1        1    37856 |    122880 |
   62|TIDL_ConcatLayer              |mbox_conf_flatten                                 |     0|     5|     1| 32  57  58  59  60   x   x   x | 62       |       1        1        1    12288 |       1        1        1    18928 |     61440 |
   63|TIDL_DetectionOutputLayer     |detection_out                                     |     0|     2|     1| 61  62   x   x   x   x   x   x | 63       |       1        1        1    37856 |       1        1        1      144 |         0 |
   64|TIDL_DataLayer                |detection_out                                     |     0|     1|    -1| 63   x   x   x   x   x   x   x |  0       |       1        1        1      144 |       0        0        0        0 |         0 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total Giga Macs : 1.0637
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Observations:

a) A double-digit difference is observed between "meanOrigFloat" and "orgmax".

b) Layer IDs with a difference > 5: layers 4, 6, 10, 12, 14, 16, 18 and 20 are depthwise convolutions, and layer 3 is a separable convolution.

As per these observations, the depthwise convolution layers and one separable convolution layer (layer 3) have the maximum quantization loss.

Kindly do the needful.

Thanks and Regards,

Vyom Mishra

0 kumar.desappan over 5 years ago in reply to Vyom Mishra1

TI__Mastermind 22145 points

You are comparing the 12-bit TIDL results with your PC-based reference software (Caffe/TensorFlow), right?

If yes, did you match the first tensor? (The input to the network shall match exactly.)

BTW, did you train the model with padded input, or was the original model trained without a pad?

0 Sankalp Kallakuri22 over 5 years ago in reply to kumar.desappan

Intellectual 980 points

You are comparing the 12-bit TIDL results with your PC-based reference software (Caffe/TensorFlow), right? [Yes]

If yes, did you match the first tensor? (The input to the network shall match exactly.) [https://e2e.ti.com/support/tools/ccs/f/81/p/877757/3253803?tisearch=e2e-sitesearch&keymatch=vyom#3253803]

BTW, did you train the model with padded input, or was the original model trained without a pad? [Yes]

0 Sankalp Kallakuri22 over 5 years ago in reply to Sankalp Kallakuri22

Intellectual 980 points

Gentle reminder.

0 kumar.desappan over 5 years ago in reply to Sankalp Kallakuri22

TI__Mastermind 22145 points

Hi, your answers to the two questions below are not clear. Can you be more specific?

If yes, did you match the first tensor? (The input to the network shall match exactly.)

BTW, did you re-train the model with padded input?

0 Vyom Mishra1 over 5 years ago in reply to kumar.desappan

Genius 4590 points

Dear Sir,

1) We have trained the model with the padded input.

2) First tensor (input to the network):

As observed and reported in a thread on the forum (https://e2e.ti.com/support/tools/ccs/f/81/t/877757):

The first layer dump had a size issue (x2), which was resolved on Mr Subhajit Paul's side, but the code modifications were not shared with us (if possible, kindly share the changes).

A first layer dump of the correct size (512x512x3) was shared by Mr Subhajit on the above-mentioned thread. It was visualized with YUView and found to be the correct input.

But on our side,

the dump is still 512x512x3x2 if numParamBits and numFeatureBits are greater than 8.

As of now, we need the code changes to obtain the correct trace dump sizes so that we can compare them with the PC layer dumps.

Only after that can we comment on the input to the model.
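
One plausible explanation for the doubled dump size, offered here as an assumption and not as a confirmed diagnosis, is that features wider than 8 bits are stored in 2-byte containers, so the same 512x512x3 tensor occupies twice as many bytes. The minimal sketch below infers the element width from the file size and reads the dump accordingly. The function name and file name are hypothetical, and the dump is assumed to be raw, unpadded, and in native byte order.

```python
import numpy as np


def load_fixed_point_dump(path, shape, signed=True):
    """Load a raw fixed-point dump, inferring 1 or 2 bytes per element.

    Assumption: the file holds exactly prod(shape) elements with no padding
    or header, stored in the machine's native byte order.
    """
    n_elems = int(np.prod(shape))
    raw = np.fromfile(path, dtype=np.uint8)
    if n_elems == 0 or raw.size % n_elems != 0:
        raise ValueError(f"file size {raw.size} is not a multiple of {n_elems} elements")

    bytes_per_elem = raw.size // n_elems
    dtypes = {1: (np.int8, np.uint8), 2: (np.int16, np.uint16)}
    if bytes_per_elem not in dtypes:
        raise ValueError(f"unsupported element width: {bytes_per_elem} bytes")

    dtype = dtypes[bytes_per_elem][0 if signed else 1]
    # Reinterpret the bytes as elements of the inferred width, then reshape.
    return raw.view(dtype).reshape(shape)


# Hypothetical usage: a 512x512x3 input dumped with more than 8 feature bits.
# tensor = load_fixed_point_dump("first_layer_dump.bin", (3, 512, 512))
```

If the file size turns out not to be an exact multiple of the element count, the dump probably contains padding or a header, and this sketch does not apply as written.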

Kindly do the needful.

Thanks and Regards,

Vyom Mishra

0 kumar.desappan over 5 years ago in reply to Vyom Mishra1

TI__Mastermind 22145 points

Refer to the page below for debugging accuracy mismatch issues.

Comparing the input tensor to TIDL with Reference

It is important to match the input tensor to the TIDL net with the input tensor of the network that was trained.
Save the input tensor from the training code that you are using in float format.
Use writeTraceLevel = 3 to write the layer-level traces from TIDL to files.
By default, the data-normalizing batchNorm layer is merged into the following convolution layer. So set foldPreBnConv2D = 0 to avoid this.
Compare the output of this batchNorm layer with the input tensor from the training code. Refer to the link:

http://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/latest/exports/docs/tidl_j7_01_01_00_10/ti_dl/docs/user_guide_html/md_tidl_fsg_steps_to_debug_mismatch.html
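
The comparison step described above could be sketched roughly as follows. This is only an illustration: it assumes both the TIDL trace and the training-framework tensor are available as raw float32 files with the same element order and no padding, and the helper names and file names are hypothetical.

```python
import numpy as np


def load_float_dump(path, shape):
    """Read a raw float32 dump (assumed layout: no header, no padding)."""
    return np.fromfile(path, dtype=np.float32).reshape(shape)


def compare_tensors(reference, candidate):
    """Return simple mismatch statistics between two tensors of equal size."""
    reference = np.asarray(reference, dtype=np.float64).ravel()
    candidate = np.asarray(candidate, dtype=np.float64).ravel()
    if reference.size != candidate.size:
        raise ValueError(f"size mismatch: {reference.size} vs {candidate.size}")

    abs_diff = np.abs(reference - candidate)
    # Normalize by the reference range so layers with different scales are comparable.
    ref_range = max(float(np.max(np.abs(reference))), 1e-12)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        "mean_rel_diff_percent": 100.0 * float(abs_diff.mean()) / ref_range,
    }


# Hypothetical usage with a 3x512x512 input tensor:
# ref = load_float_dump("caffe_input_float.bin", (3, 512, 512))
# out = load_float_dump("tidl_batchnorm_trace_float.bin", (3, 512, 512))
# print(compare_tensors(ref, out))
```

A large mismatch at this first tensor usually points to preprocessing differences (mean, scale, channel order, padding), which are worth ruling out before looking at quantization.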

0 Vyom Mishra1 over 5 years ago in reply to kumar.desappan

Genius 4590 points

Dear Sir,

What does "training code" mean?

Does training code mean the PC code only?

Thanks and Regards,

Vyom Mishra

0 kumar.desappan over 5 years ago in reply to Vyom Mishra1

TI__Mastermind 22145 points

The training code means the framework (Caffe, TensorFlow, PyTorch, etc.) used to generate the model used for inference here.

0 Sithara Tresa Chacko over 5 years ago in reply to Subhajit Paul

Expert 1120 points

Dear Sir,

We compared the layer-wise dumps for PC and target.

1. Out of 16 channels for the first convolution layer, 6 mismatches were observed, but the subsequent 45 layers matched visually.

2. For the TIDL_ConvolutionLayer (1, 40, 16, 16) (46th layer), since the resolution is small, we dumped the results into a text file and tried to match them,

but the results do not match. Please find the attached PC and board results for your reference.

a)Board

46board.txt

b)PC

46pc.txt

Kindly help to resolve this.

Regards

Sithara Tresa Chacko

0 Sithara Tresa Chacko over 5 years ago in reply to Sithara Tresa Chacko

Expert 1120 points

Dear Sir,

Comparison between PC and target layer dump results:

1. As the results in the above query did not match, we tried comparing the ctx_output2/sep/relu_mbox_loc_perm layer output from the target with the ctx_output2/sep/relu_mbox_loc layer output from the PC.

We observed only 0.45 percent deviation.

46|TIDL_ConvolutionLayer |ctx_output2/sep/relu_mbox_loc_perm | 0| 1| 1| 42 x x x x x x x | 46 | 1 512 16 16 | 1 40 16 16 | 5242880 |

Is it fine to do so?

2. Considering the above assumption to be right, all other layer results match except the detection_out layer. Multiple bounding boxes are observed for a single pedestrian.

How can we resolve this?

Regards,

Sithara Tresa Chacko

0 kumar.desappan over 5 years ago in reply to Sithara Tresa Chacko

TI__Mastermind 22145 points

Hi Sithara,

Can you share one input image and the expected output from Caffe for the same (along with the layer-level float tensors from Caffe)?

We will try to reproduce this issue at our end using the model that you have shared.

Regards,

Kumar.D

0 Pooja R1 over 5 years ago in reply to kumar.desappan

Intellectual 790 points

Hi,

All the requested inputs have been shared by e-mail with our TI contact, Mr. Karthik R, by Mr. Sankalp Kallakuri.

Regards,

Sithara Tresa Chacko

0 kumar.desappan over 5 years ago in reply to Pooja R1

TI__Mastermind 22145 points

Hi,

The input image file was not found in the shared zip. Could you please share the input image file (JPG/BMP/PNG) corresponding to the output JPG?

0 kumar.desappan over 5 years ago in reply to kumar.desappan

TI__Mastermind 22145 points

Can you try the import config below? We expect the issue to be solved with these parameters:

modelType = 0
inputNetFile = "D:\work\vision\CNN\customers\xx\deploy.prototxt"
inputParamsFile = "D:\work\vision\CNN\customers\xx\mob.caffemodel"
outputNetFile = "../../test/testvecs/config/tidl_models/caffe/tidl_net_msi_mobilenet_pd_padded.bin"
outputParamsFile = "../../test/testvecs/config/tidl_models/caffe/tidl_io_msi_mobilenet_pd_padded"
numParamBits = 8
numFeatureBits = 8
quantizationStyle = 3
inDataFormat = 0
inWidth = 512
inHeight = 512
inNumChannels = 3
perfSimConfig = "../../test/testvecs/config/import/perfsim_base.cfg"
inData = "D:\work\vision\CNN\customers\xx\img_list.txt"
postProcType = 2
inFileFormat = 2
foldPreBnConv2D = 0

0 Sankalp Kallakuri22 over 5 years ago in reply to kumar.desappan

Intellectual 980 points

Hi Kumar,

We have seen improvement in the results after the suggestions made by you. We are, however, still facing some cases where the Caffe output does not match the import tool output.

I have attached a few images to show this phenomenon. What worries us is that the score of the detection is quite high on the Caffe side, but the detection is completely missed on the import tool side.

I will email a few of the images to Karthik. The outputs of the import tool and PC [Caffe] are stitched together; the input images have been sent separately.

Best Regards,

Sankalp Kallakuri

0 kumar.desappan over 5 years ago in reply to Sankalp Kallakuri22

TI__Mastermind 22145 points

Hi Sankalp,

I have tried with the images that you have shared.

16-bit results are matching with the expected output, but the 8-bit mode is missing a few detections as you observed.

Could you please quantify the overall accuracy drop with the 8-bit mode?

The 16-bit inference runtime would be considerably higher compared to 8-bit.

We would recommend using quantization-aware training (QAT) to get the best results in 8-bit mode. Please refer to the link below for more information on the same.

https://github.com/TexasInstruments/jacinto-ai-devkit

Note: This dev-kit is for PyTorch. You may not be able to use it for the current model immediately.

We are also working on improving the import tool to improve the calibration for 8-bit mode. This will be available around 3Q this year.

So, we would recommend using 8-bit mode for now and moving to QAT / improved calibration for 8-bit mode later.
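
To give a feel for why 8-bit results can lose detections that 16-bit results keep, here is a minimal sketch of generic symmetric per-tensor linear quantization. It is an illustration only and does not reproduce TIDL's actual calibration or any of its quantizationStyle options; the function name and the random test data are assumptions made for this example.

```python
import numpy as np


def quantize_dequantize(x, num_bits):
    """Round x onto a symmetric signed grid with num_bits bits and map it back to float."""
    x = np.asarray(x, dtype=np.float64)
    q_max = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits, 32767 for 16 bits
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return x.copy()
    scale = max_abs / q_max                   # float value represented by one integer step
    q = np.clip(np.round(x / scale), -q_max, q_max)
    return q * scale


# Synthetic activations standing in for one layer's output.
rng = np.random.default_rng(0)
activations = rng.normal(size=10_000)

for bits in (8, 16):
    error = np.max(np.abs(activations - quantize_dequantize(activations, bits)))
    print(f"{bits}-bit max abs error: {error:.6f}")
```

The rounding error is bounded by half a quantization step, and the step is roughly 256 times larger at 8 bits than at 16 bits for the same range, so layers with a wide dynamic range (such as the depthwise convolutions discussed earlier in the thread) are the most exposed.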

Regards,

Kumar.D

0 Sithara Tresa Chacko over 5 years ago in reply to kumar.desappan

Expert 1120 points

Dear Sir,

We tried the same with numParamBits and numFeatureBits set to 16 and observed multiple boxes for the same input shared in the previous queries.

The results improve with numParamBits and numFeatureBits set to 8 and quantizationStyle set to 3.

The import configuration file is:

IMPORT.txt

modelType          = 0
inputNetFile       = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/deploy.prototxt"
inputParamsFile    = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mob.caffemodel"
outputNetFile      = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/tidl_models/tidl_net_pd_l1.bin"
outputParamsFile   = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/tidl_models/tidl_io_pd_l1"
numParamBits = 16
numFeatureBits = 16
quantizationStyle = 3
inDataFormat = 0
inElementType  = 0 
inWidth = 512
inHeight = 512
inNumChannels = 3
perfSimConfig = "../../test/testvecs/config/import/perfsim_base.cfg"
perfSimTool = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/utils/perfsim/ti_cnnperfsim.out"
tidlStatsTool = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/tidl_quant_stats_tool.out"
inData = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/l1.txt"
numFrames = 3
foldPreBnConv2D = 0
postProcType = 2
inFileFormat = 2

Kindly suggest what can be done from our side.

Thanks & Regards

Sithara Tresa Chacko

0 kumar.desappan over 5 years ago in reply to Sithara Tresa Chacko

TI__Mastermind 22145 points

Sithara,

We recently fixed a bug in the object detection layer of the 16-bit flow, and we no longer observe these multiple detections.

This bug fix is not part of the 6.2 SDK release.

We will update you on the TIDL patch release date for this soon (after aligning internally). BTW, 16-bit mode is not fully optimized and will have a performance impact.

Could you quantify the accuracy degradation with 8-bit mode?

0 Sankalp Kallakuri22 over 5 years ago in reply to kumar.desappan

Intellectual 980 points

Dear Kumar,

For now, we feel the speed drop with 16-bit is OK for our current model, but the false positives (FPs) that appear are less tolerable.

Keeping this in mind, we will stick to 8-bit until you provide the new SDK.

We will also try to give you accuracy numbers for the 8-bit mode.

Best Regards,

Sankalp

0 Sithara Tresa Chacko over 5 years ago in reply to Sankalp Kallakuri22

Expert 1120 points

Dear Sir,

Thank you for the suggestion to set flip = false.

The new model with flip = false gives better results compared to the previous model. We observed a precision of 93.23571.

Still, in some cases, we observe a localization issue.

Can we improve the model? Please suggest.

Thanks & Regards

Sithara Tresa Chacko

0 kumar.desappan over 5 years ago in reply to Sithara Tresa Chacko

TI__Mastermind 22145 points

Hi Sithara,

Can you share the precision difference between the TIDL inference and the Caffe PC inference?

0 Sithara Tresa Chacko over 5 years ago in reply to kumar.desappan

Expert 1120 points

Hi,

We have observed the following:

Caffe PC side:

Precision: 92.44 and Recall: 76.65

TIDL inference side:

Precision: 93.23 and Recall: 68.66
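
As a rough way to quantify the gap between the two sets of numbers above, the sketch below computes the recall drop and an F1 score for each side. The precision and recall values are the ones reported in this post; using F1 as a summary metric is an assumption made for this example and not something requested in the thread.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Values reported in this thread, in percent.
caffe = {"precision": 92.44, "recall": 76.65}
tidl = {"precision": 93.23, "recall": 68.66}

recall_drop = caffe["recall"] - tidl["recall"]
f1_caffe = f1_score(caffe["precision"], caffe["recall"])
f1_tidl = f1_score(tidl["precision"], tidl["recall"])

print(f"Recall drop: {recall_drop:.2f} points")
print(f"F1 (Caffe): {f1_caffe:.2f}, F1 (TIDL): {f1_tidl:.2f}, drop: {f1_caffe - f1_tidl:.2f}")
```

Precision is nearly unchanged while recall falls by about 8 points, which is consistent with the missed detections reported earlier in the thread.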

Thanks & Regards

Sithara Tresa Chacko

0 kumar.desappan over 5 years ago in reply to Sithara Tresa Chacko

TI__Mastermind 22145 points

We hope the issue is solved with the patch release shared with you.

Please open a new thread if you still face any issues.
