CCS: TDA4x: Mobilenet model trained on Padded Input

Tool/software: Code Composer Studio

Dear Sir,

I am using "tidl_j7_01_00_00_00" to import a MobileNet model.

With reference to the previous post, we resolved the random-box and bounding-box localization issues by changing the threshold and using quantizationStyle = 3 instead of 2.

However, when we run the same model trained on padded input data, we see a large mismatch between the PC and target results.

I have attached the PC and target board outputs below for comparison:

[Image: PC output]

[Image: target output]

Please find the import and inference config files below:

Import config:

modelType          = 0
inputNetFile       = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mando/fvc/od/new_padded/deploy.prototxt"
inputParamsFile    = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mando/fvc/od/new_padded/mob.caffemodel"
outputNetFile      = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/tidl_models/caffe/tidl_net_msi_mobilenet_pd_padded.bin"
outputParamsFile   = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/tidl_models/caffe/tidl_io_msi_mobilenet_pd_padded"
numParamBits = 12
numFeatureBits = 12
quantizationStyle = 2
inDataFormat = 0
inElementType  = 0 
inWidth = 512
inHeight = 512
inNumChannels = 3
perfSimConfig = "../../test/testvecs/config/import/perfsim_base.cfg"
inData = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/det.txt"
numFrames = 100
postProcType = 2
inFileFormat = 2

Inference config:

inFileFormat    = 2
postProcType = 2
numFrames   = 1
padInBuffInTB = 1
netBinFile      = "testvecs/config/tidl_models/caffe/tidl_net_msi_mobilenet_pd_padded.bin"
ioConfigFile    = "testvecs/config/tidl_models/caffe/tidl_io_msi_mobilenet_pd_padded1.bin"
outData =   "testvecs/output/msi_mobilenet.bin"
inData  =   "testvecs/config/det.txt"
debugTraceLevel = 1
writeTraceLevel = 0
numFrames = 22
writeOutput = 1
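Two details in the inference config above are easy to miss: numFrames is set twice (1 and 22), and ioConfigFile has to point at the I/O file written by the importer. A minimal Python sketch for checking an import/inference config pair, assuming the simple `key = value` format shown here (blank lines and lines starting with `#` are skipped, which is itself an assumption) and assuming the importer names the I/O file `<outputParamsFile>1.bin`, which is consistent with the two file names above. The helper names `parse_cfg` and `check_pair` are hypothetical and not part of TIDL.

```python
import os
from collections import defaultdict


def parse_cfg(text):
    """Parse `key = value` lines into {key: [values in file order]}.

    Blank lines and lines starting with '#' are skipped (assumed format).
    Surrounding double quotes are stripped from values.
    """
    entries = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()].append(value.strip().strip('"'))
    return dict(entries)


def check_pair(import_text, infer_text):
    """Return human-readable warnings for an import/inference config pair."""
    imp, inf = parse_cfg(import_text), parse_cfg(infer_text)
    warnings = []
    # Flag keys that appear more than once in either file.
    for name, cfg in (("import", imp), ("inference", inf)):
        for key, values in cfg.items():
            if len(values) > 1:
                warnings.append(f"{name}: '{key}' set {len(values)} times: {values}")
    # Compare file names only; the two configs use different base directories.
    net_out = os.path.basename(imp.get("outputNetFile", [""])[-1])
    net_in = os.path.basename(inf.get("netBinFile", [""])[-1])
    if net_out != net_in:
        warnings.append(f"netBinFile '{net_in}' != outputNetFile '{net_out}'")
    # Assumed naming convention: the importer writes <outputParamsFile>1.bin.
    io_expected = os.path.basename(imp.get("outputParamsFile", [""])[-1]) + "1.bin"
    io_in = os.path.basename(inf.get("ioConfigFile", [""])[-1])
    if io_in != io_expected:
        warnings.append(f"ioConfigFile '{io_in}' != expected '{io_expected}'")
    return warnings
```

Applied to the two configs above, this should report only the duplicated numFrames key; the net and I/O file names are consistent.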






We used quantizationStyle = 2 for this model because its results are better than those with quantizationStyle = 3.

Confidence threshold: 0.3 (it gives better detections than the other thresholds we tried).

Kindly help us improve our target-side results.
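To put a number on the PC/target mismatch at this threshold, one option is to match the two detection lists by IoU and count matched, extra, and missed boxes. A minimal sketch, assuming each detection is a `(label, score, x1, y1, x2, y2)` tuple in pixel coordinates and treating the PC output as the reference; the 0.5 IoU cut-off, the tuple layout, and the function names are illustrative choices, not TIDL settings.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def compare_detections(pc, target, score_thr=0.3, iou_thr=0.5):
    """Greedily match target detections to same-label PC detections.

    Returns (matched, extra_on_target, missed_on_target).
    """
    pc = [d for d in pc if d[1] >= score_thr]
    # Match the highest-scoring target detections first.
    target = sorted((d for d in target if d[1] >= score_thr), key=lambda d: -d[1])
    unmatched_pc = list(pc)
    matched = 0
    for det in target:
        best, best_iou = None, iou_thr
        for ref in unmatched_pc:
            if ref[0] != det[0]:
                continue
            overlap = iou(ref[2:], det[2:])
            if overlap >= best_iou:
                best, best_iou = ref, overlap
        if best is not None:
            unmatched_pc.remove(best)
            matched += 1
    return matched, len(target) - matched, len(unmatched_pc)
```

Running this per image over the test set gives counts that can be compared directly across quantization settings, rather than judging result images by eye.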

Note: Results for the model trained on normal (unpadded) input were shared in the related post as "results.zip".

They are also attached here for reference: 0434.results.zip

Thanks and Regards,

Vyom Mishra

  • Sankalp Kallakuri, Vyom Mishra,

    Can you answer the following questions which will help me understand the problem better?

    1. When you say PC results, are these generated by the training framework, or TIDL host emulation?

    2. Did you retrain your network using padded inputs?

    3. Is the resolution the same between training, inference, PC run and target run?

    - Subhajit

  • Hi Subhajit,

    Please find answers in square brackets.

    1. When you say PC results, are these generated by the training framework, or TIDL host emulation? [Training Framework]

    2. Did you retrain your network using padded inputs? [Yes]

    3. Is the resolution the same between training, inference, PC run and target run? [Yes]

    Best Regards,

    Sankalp 

  • Sankalp Kallakuri,

    TIDL does not guarantee that the results will match those obtained from the training framework, because of quantization.
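How large the quantization error gets depends strongly on the bit width and on the range each tensor has to cover. A minimal sketch of symmetric per-tensor quantization with a max-abs scale (an assumption for illustration; it is not claimed to be TIDL's exact scheme) shows the worst-case error on the same values at 8 and 12 bits.

```python
def quantize_dequantize(values, bits):
    """Symmetric per-tensor quantization: scale by max |v|, round, clip, map back."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def max_abs_error(values, bits):
    """Largest absolute difference between original and quantized values."""
    return max(abs(a - b) for a, b in zip(values, quantize_dequantize(values, bits)))


if __name__ == "__main__":
    # Toy activations (not from this thread): many small values plus one outlier.
    acts = [0.001 * i for i in range(100)] + [40.0]
    for bits in (8, 12):
        print(bits, "bits -> max abs error", max_abs_error(acts, bits))
```

With one large outlier setting the scale, all of the small values round to zero at 8 bits but keep several levels at 12 bits. This is one way a wide activation range can make a model's outputs depend heavily on numParamBits/numFeatureBits.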

    Can you send the console output of the import process and the layersminmax file?

    - Subhajit

  • Subhajit,

    We agree that quantization plays a part in the degradation of the results. We experimented with quantizationStyle as well as numParamBits and numFeatureBits, but saw no improvement. We will share the requested files tomorrow.

    Best Regards,

    Sankalp Kallakuri

  • Dear Subhajit,

    Please find the requested file for your reference.

    Padded_model.zip

    Thanks and Regards,

    Vyom Mishra

  • Vyom Mishra

    I will look into the files provided.

    Also, can you go through the "steps to debug" document and compare the input and layer-level outputs between the framework (PC) and TIDL host emulation (PC)?
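A minimal sketch of that layer-level comparison, assuming both the framework output and the host-emulation output of a layer have been dumped as raw float32 files (native byte order) of the same length and layout. The dump format is an assumption: if the TIDL traces are fixed point, they first need converting to float with the layer's scale (not shown). It reports the maximum absolute difference and a signal-to-noise ratio in dB.

```python
import math
from array import array


def load_float32(path):
    """Read a raw float32 dump (native byte order) into an array('f')."""
    data = array("f")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    return data


def compare_layer(reference, test):
    """Return (max_abs_diff, snr_db) of `test` measured against `reference`."""
    if len(reference) != len(test):
        raise ValueError(f"length mismatch: {len(reference)} vs {len(test)}")
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    max_diff = max((abs(r - t) for r, t in zip(reference, test)), default=0.0)
    if noise == 0:
        snr_db = math.inf
    elif signal == 0:
        snr_db = -math.inf
    else:
        snr_db = 10 * math.log10(signal / noise)
    return max_diff, snr_db
```

For example, call `compare_layer(load_float32(framework_path), load_float32(emulation_path))` for each layer in order; the first layer where the SNR drops sharply is the natural place to start looking.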

    - Subhajit

  • Subhajit,

    We will take this up. We are also sharing the min/max values of the unpadded and padded image models from the PC side (framework output). The unpadded model worked well on TDA2x; both perform poorly on TDA4x.

    The unpadded baseline model's range file is "Baseline_model_params_range"; the padded model's range file is "MDK_COCO_model_params_range".

    I am unable to attach the files here, so I will share them via email.

    Regards,

    Sankalp

  • Sankalp Kallakuri,

    I have received the files. Let me have a look.

    - Subhajit

  • Gentle Reminder!

  • Dear Sir,

    We have experimented with an L1-regularized model and are facing similar false positives (FPs) in the detections. Please find the results below for your reference:

    Parameters used while importing:

    Confidence threshold: 0.3

    numparambits =12

    numfeaturebits = 12

    quantizationstyle = 2

    Kindly help us resolve the issue.

    Thanks and Regards,

    Vyom Mishra

  • Dear Sir,

    We have experimented with the following configuration:

    numparambits =8

    numfeaturebits =8

    quantizationstyle =3

    These parameters gave good results for the unpadded model.

    With the same parameters on the L1-regularized padded model, we observed:

    a) No FPs

    b) Missed detections

    c) Very low scores for true positives

    ** The detected bounding-box scores matched the PC scores when numparambits = numfeaturebits = 12.

    I am sharing some results below for your reference:

    Kindly do the needful.

    Thanks and Regards,

    Vyom Mishra

  • Dear Sir,

    I have attached the .csv and log.txt files for your reference.

    1)

    Configuration:

    Model: L1-regularized, trained on padded input data

    Confidence Threshold: 0.3

    numparambits = numfeaturebits =12

    quantizationstyle = 3

    tidl_net_l1.bin_paramDebug.csv

    Num of Layer Detected :  65 
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Num|TIDL Layer Name               |Out Data Name                                     |Group |#Ins  |#Outs |Inbuf Ids                       |Outbuf Id |In NCHW                             |Out NCHW                            |MACS       |
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        0|TIDL_DataLayer                |data                                              |     0|    -1|     1|  x   x   x   x   x   x   x   x |  0       |       0        0        0        0 |       1        3      512      512 |         0 |
        1|TIDL_ConvolutionLayer         |conv1                                             |     0|     1|     1|  0   x   x   x   x   x   x   x |  1       |       1        3      512      512 |       1       16      256      256 |  30408704 |
        2|TIDL_ConvolutionLayer         |conv2_1/dw                                        |     0|     1|     1|  1   x   x   x   x   x   x   x |  2       |       1       16      256      256 |       1       16      256      256 |  11534336 |
        3|TIDL_ConvolutionLayer         |conv2_1/sep                                       |     0|     1|     1|  2   x   x   x   x   x   x   x |  3       |       1       16      256      256 |       1       32      256      256 |  37748736 |
        4|TIDL_ConvolutionLayer         |conv2_2/dw                                        |     0|     1|     1|  3   x   x   x   x   x   x   x |  4       |       1       32      256      256 |       1       32      128      128 |   5767168 |
        5|TIDL_ConvolutionLayer         |conv2_2/sep                                       |     0|     1|     1|  4   x   x   x   x   x   x   x |  5       |       1       32      128      128 |       1       64      128      128 |  35651584 |
        6|TIDL_ConvolutionLayer         |conv3_1/dw                                        |     0|     1|     1|  5   x   x   x   x   x   x   x |  6       |       1       64      128      128 |       1       64      128      128 |  11534336 |
        7|TIDL_ConvolutionLayer         |conv3_1/sep                                       |     0|     1|     1|  6   x   x   x   x   x   x   x |  7       |       1       64      128      128 |       1       64      128      128 |  69206016 |
        8|TIDL_ConvolutionLayer         |conv3_2/dw                                        |     0|     1|     1|  7   x   x   x   x   x   x   x |  8       |       1       64      128      128 |       1       64       64       64 |   2883584 |
        9|TIDL_ConvolutionLayer         |conv3_2/sep                                       |     0|     1|     1|  8   x   x   x   x   x   x   x |  9       |       1       64       64       64 |       1      128       64       64 |  34603008 |
       10|TIDL_ConvolutionLayer         |conv4_1/dw                                        |     0|     1|     1|  9   x   x   x   x   x   x   x | 10       |       1      128       64       64 |       1      128       64       64 |   5767168 |
       11|TIDL_ConvolutionLayer         |conv4_1/sep                                       |     0|     1|     1| 10   x   x   x   x   x   x   x | 11       |       1      128       64       64 |       1      128       64       64 |  68157440 |
       12|TIDL_ConvolutionLayer         |conv4_2/dw                                        |     0|     1|     1| 11   x   x   x   x   x   x   x | 12       |       1      128       64       64 |       1      128       32       32 |   1441792 |
       13|TIDL_ConvolutionLayer         |conv4_2/sep                                       |     0|     1|     1| 12   x   x   x   x   x   x   x | 13       |       1      128       32       32 |       1      256       32       32 |  34078720 |
       14|TIDL_ConvolutionLayer         |conv5_1/dw                                        |     0|     1|     1| 13   x   x   x   x   x   x   x | 14       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       15|TIDL_ConvolutionLayer         |conv5_1/sep                                       |     0|     1|     1| 14   x   x   x   x   x   x   x | 15       |       1      256       32       32 |       1      256       32       32 |  67633152 |
       16|TIDL_ConvolutionLayer         |conv5_2/dw                                        |     0|     1|     1| 15   x   x   x   x   x   x   x | 16       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       17|TIDL_ConvolutionLayer         |conv5_2/sep                                       |     0|     1|     1| 16   x   x   x   x   x   x   x | 17       |       1      256       32       32 |       1      256       32       32 |  67633152 |
       18|TIDL_ConvolutionLayer         |conv5_3/dw                                        |     0|     1|     1| 17   x   x   x   x   x   x   x | 18       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       19|TIDL_ConvolutionLayer         |conv5_3/sep                                       |     0|     1|     1| 18   x   x   x   x   x   x   x | 19       |       1      256       32       32 |       1      256       32       32 |  67633152 |
       20|TIDL_ConvolutionLayer         |conv5_4/dw                                        |     0|     1|     1| 19   x   x   x   x   x   x   x | 20       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       21|TIDL_ConvolutionLayer         |conv5_4/sep                                       |     0|     1|     1| 20   x   x   x   x   x   x   x | 21       |       1      256       32       32 |       1      256       32       32 |  67633152 |
       22|TIDL_ConvolutionLayer         |conv5_5/dw                                        |     0|     1|     1| 21   x   x   x   x   x   x   x | 22       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       23|TIDL_ConvolutionLayer         |conv5_5/sep                                       |     0|     1|     1| 22   x   x   x   x   x   x   x | 23       |       1      256       32       32 |       1      256       32       32 |  67633152 |
       24|TIDL_ConvolutionLayer         |conv5_6/dw                                        |     0|     1|     1| 23   x   x   x   x   x   x   x | 24       |       1      256       32       32 |       1      256       16       16 |    720896 |
       25|TIDL_ConvolutionLayer         |ctx_output1/dw                                    |     0|     1|     1| 23   x   x   x   x   x   x   x | 25       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       26|TIDL_ConvolutionLayer         |conv5_6/sep                                       |     0|     1|     1| 24   x   x   x   x   x   x   x | 26       |       1      256       16       16 |       1      512       16       16 |  33816576 |
       27|TIDL_ConvolutionLayer         |ctx_output1/sep                                   |     0|     1|     1| 25   x   x   x   x   x   x   x | 27       |       1      256       32       32 |       1      512       32       32 | 135266304 |
       28|TIDL_ConvolutionLayer         |ctx_output1/sep/relu_mbox_loc_perm                |     0|     1|     1| 27   x   x   x   x   x   x   x | 28       |       1      512       32       32 |       1       24       32       32 |  12582912 |
       29|TIDL_ConvolutionLayer         |conv6/dw                                          |     0|     1|     1| 26   x   x   x   x   x   x   x | 29       |       1      512       16       16 |       1      512       16       16 |   1441792 |
       30|TIDL_ConvolutionLayer         |ctx_output1/sep/relu_mbox_conf_perm               |     0|     1|     1| 27   x   x   x   x   x   x   x | 30       |       1      512       32       32 |       1       12       32       32 |   6291456 |
       31|TIDL_FlattenLayer             |ctx_output1/sep/relu_mbox_loc_flat                |     0|     1|     1| 28   x   x   x   x   x   x   x | 31       |       1       24       32       32 |       1        1        1    24576 |     24576 |
       32|TIDL_FlattenLayer             |ctx_output1/sep/relu_mbox_conf_flat               |     0|     1|     1| 30   x   x   x   x   x   x   x | 32       |       1       12       32       32 |       1        1        1    12288 |     12288 |
       33|TIDL_ConvolutionLayer         |conv6/sep                                         |     0|     1|     1| 29   x   x   x   x   x   x   x | 33       |       1      512       16       16 |       1      512       16       16 |  67371008 |
       34|TIDL_PoolingLayer             |pool6                                             |     0|     1|     1| 33   x   x   x   x   x   x   x | 34       |       1      512       16       16 |       1      512        8        8 |    131072 |
       35|TIDL_PoolingLayer             |pool7                                             |     0|     1|     1| 34   x   x   x   x   x   x   x | 35       |       1      512        8        8 |       1      512        4        4 |     32768 |
       36|TIDL_PoolingLayer             |pool8                                             |     0|     1|     1| 35   x   x   x   x   x   x   x | 36       |       1      512        4        4 |       1      512        2        2 |      8192 |
       37|TIDL_ConvolutionLayer         |ctx_output2/dw                                    |     0|     1|     1| 33   x   x   x   x   x   x   x | 37       |       1      512       16       16 |       1      512       16       16 |   1441792 |
       38|TIDL_ConvolutionLayer         |ctx_output3/dw                                    |     0|     1|     1| 34   x   x   x   x   x   x   x | 38       |       1      512        8        8 |       1      512        8        8 |    360448 |
       39|TIDL_ConvolutionLayer         |ctx_output4/dw                                    |     0|     1|     1| 35   x   x   x   x   x   x   x | 39       |       1      512        4        4 |       1      512        4        4 |     90112 |
       40|TIDL_ConvolutionLayer         |ctx_output5/dw                                    |     0|     1|     1| 36   x   x   x   x   x   x   x | 40       |       1      512        2        2 |       1      512        2        2 |     22528 |
       41|TIDL_ConvolutionLayer         |ctx_output2/sep                                   |     0|     1|     1| 37   x   x   x   x   x   x   x | 41       |       1      512       16       16 |       1      512       16       16 |  67371008 |
       42|TIDL_ConvolutionLayer         |ctx_output3/sep                                   |     0|     1|     1| 38   x   x   x   x   x   x   x | 42       |       1      512        8        8 |       1      512        8        8 |  16842752 |
       43|TIDL_ConvolutionLayer         |ctx_output4/sep                                   |     0|     1|     1| 39   x   x   x   x   x   x   x | 43       |       1      512        4        4 |       1      512        4        4 |   4210688 |
       44|TIDL_ConvolutionLayer         |ctx_output5/sep                                   |     0|     1|     1| 40   x   x   x   x   x   x   x | 44       |       1      512        2        2 |       1      512        2        2 |   1052672 |
       45|TIDL_ConvolutionLayer         |ctx_output2/sep/relu_mbox_loc_perm                |     0|     1|     1| 41   x   x   x   x   x   x   x | 45       |       1      512       16       16 |       1       40       16       16 |   5242880 |
       46|TIDL_ConvolutionLayer         |ctx_output3/sep/relu_mbox_loc_perm                |     0|     1|     1| 42   x   x   x   x   x   x   x | 46       |       1      512        8        8 |       1       40        8        8 |   1310720 |
       47|TIDL_ConvolutionLayer         |ctx_output4/sep/relu_mbox_loc_perm                |     0|     1|     1| 43   x   x   x   x   x   x   x | 47       |       1      512        4        4 |       1       24        4        4 |    196608 |
       48|TIDL_ConvolutionLayer         |ctx_output5/sep/relu_mbox_loc_perm                |     0|     1|     1| 44   x   x   x   x   x   x   x | 48       |       1      512        2        2 |       1       24        2        2 |     49152 |
       49|TIDL_ConvolutionLayer         |ctx_output2/sep/relu_mbox_conf_perm               |     0|     1|     1| 41   x   x   x   x   x   x   x | 49       |       1      512       16       16 |       1       20       16       16 |   2621440 |
       50|TIDL_ConvolutionLayer         |ctx_output3/sep/relu_mbox_conf_perm               |     0|     1|     1| 42   x   x   x   x   x   x   x | 50       |       1      512        8        8 |       1       20        8        8 |    655360 |
       51|TIDL_ConvolutionLayer         |ctx_output4/sep/relu_mbox_conf_perm               |     0|     1|     1| 43   x   x   x   x   x   x   x | 51       |       1      512        4        4 |       1       12        4        4 |     98304 |
       52|TIDL_ConvolutionLayer         |ctx_output5/sep/relu_mbox_conf_perm               |     0|     1|     1| 44   x   x   x   x   x   x   x | 52       |       1      512        2        2 |       1       12        2        2 |     24576 |
       53|TIDL_FlattenLayer             |ctx_output2/sep/relu_mbox_loc_flat                |     0|     1|     1| 45   x   x   x   x   x   x   x | 53       |       1       40       16       16 |       1        1        1    10240 |     10240 |
       54|TIDL_FlattenLayer             |ctx_output3/sep/relu_mbox_loc_flat                |     0|     1|     1| 46   x   x   x   x   x   x   x | 54       |       1       40        8        8 |       1        1        1     2560 |      2560 |
       55|TIDL_FlattenLayer             |ctx_output4/sep/relu_mbox_loc_flat                |     0|     1|     1| 47   x   x   x   x   x   x   x | 55       |       1       24        4        4 |       1        1        1      384 |       384 |
       56|TIDL_FlattenLayer             |ctx_output5/sep/relu_mbox_loc_flat                |     0|     1|     1| 48   x   x   x   x   x   x   x | 56       |       1       24        2        2 |       1        1        1       96 |        96 |
       57|TIDL_FlattenLayer             |ctx_output2/sep/relu_mbox_conf_flat               |     0|     1|     1| 49   x   x   x   x   x   x   x | 57       |       1       20       16       16 |       1        1        1     5120 |      5120 |
       58|TIDL_FlattenLayer             |ctx_output3/sep/relu_mbox_conf_flat               |     0|     1|     1| 50   x   x   x   x   x   x   x | 58       |       1       20        8        8 |       1        1        1     1280 |      1280 |
       59|TIDL_FlattenLayer             |ctx_output4/sep/relu_mbox_conf_flat               |     0|     1|     1| 51   x   x   x   x   x   x   x | 59       |       1       12        4        4 |       1        1        1      192 |       192 |
       60|TIDL_FlattenLayer             |ctx_output5/sep/relu_mbox_conf_flat               |     0|     1|     1| 52   x   x   x   x   x   x   x | 60       |       1       12        2        2 |       1        1        1       48 |        48 |
       61|TIDL_ConcatLayer              |mbox_loc                                          |     0|     5|     1| 31  53  54  55  56   x   x   x | 61       |       1        1        1    24576 |       1        1        1    37856 |    122880 |
       62|TIDL_ConcatLayer              |mbox_conf_flatten                                 |     0|     5|     1| 32  57  58  59  60   x   x   x | 62       |       1        1        1    12288 |       1        1        1    18928 |     61440 |
       63|TIDL_DetectionOutputLayer     |detection_out                                     |     0|     2|     1| 61  62   x   x   x   x   x   x | 63       |       1        1        1    37856 |       1        1        1      144 |         0 |
       64|TIDL_DataLayer                |detection_out                                     |     0|     1|    -1| 63   x   x   x   x   x   x   x |  0       |       1        1        1      144 |       0        0        0        0 |         0 |
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total Giga Macs : 1.0637
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    

    Observations:

    a) Double-digit differences are observed between "meanOrigFloat" and "orgmax".

    b) The layers with a difference greater than 5 (layer IDs 4, 6, 10, 12, 14, 16, 18 and 20) are all depthwise convolutions.
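A minimal sketch for pulling such layers out of a `*_paramDebug.csv` file automatically. It assumes the CSV has a header row containing the two columns named in observation (a) plus a layer-ID column; the ID column name `layerId` is a placeholder, so all three names should be adjusted to the actual header. Header names are whitespace-stripped, and the default threshold of 5 matches observation (b).

```python
import csv
import io


def flag_layers(csv_file, col_a="meanOrigFloat", col_b="orgmax",
                id_col="layerId", threshold=5.0):
    """Return [(layer_id, |a - b|)] for rows whose difference exceeds threshold.

    Column names are assumptions; rows with non-numeric values are skipped.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames:
        reader.fieldnames = [name.strip() for name in reader.fieldnames]
    missing = {col_a, col_b, id_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found in CSV header: {sorted(missing)}")
    flagged = []
    for row in reader:
        try:
            diff = abs(float(row[col_a]) - float(row[col_b]))
        except (TypeError, ValueError):
            continue
        if diff > threshold:
            flagged.append((row[id_col], diff))
    return flagged


if __name__ == "__main__":
    # Toy CSV content (not from this thread), only to show the call.
    sample = io.StringIO("layerId,meanOrigFloat,orgmax\n4,0.5,12.0\n5,1.0,1.5\n")
    print(flag_layers(sample))
```

Against the real file, open it with `open(path, newline="")` and pass the file object in place of `sample`.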

    Based on these observations, the depthwise convolution layers show the largest quantization loss.
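One commonly cited reason depthwise layers suffer most is that each output channel has its own small 3x3 kernel, and the weight ranges of those channels can differ by orders of magnitude. With a single scale for the whole layer, channels with small weights get only a few quantization levels. A minimal sketch comparing per-layer and per-channel symmetric 8-bit quantization on toy depthwise weights; whether TIDL 01.00 applies per-channel scales to depthwise layers is not assumed here.

```python
def quantize(values, scale, qmax):
    """Round onto the integer grid given by `scale`, clip, and map back to float."""
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def relative_error(orig, quant):
    """Relative L2 error ||orig - quant|| / ||orig||."""
    num = sum((o - q) ** 2 for o, q in zip(orig, quant)) ** 0.5
    den = sum(o * o for o in orig) ** 0.5
    return num / den if den > 0 else 0.0


def compare_schemes(channels, bits=8):
    """Per channel: (error with one shared layer scale, error with its own scale)."""
    qmax = 2 ** (bits - 1) - 1
    layer_scale = max(abs(w) for ch in channels for w in ch) / qmax or 1.0
    results = []
    for ch in channels:
        ch_scale = max(abs(w) for w in ch) / qmax or 1.0
        results.append((relative_error(ch, quantize(ch, layer_scale, qmax)),
                        relative_error(ch, quantize(ch, ch_scale, qmax))))
    return results


if __name__ == "__main__":
    # Toy 3x3 depthwise kernels (not from this thread): one wide, one tiny.
    big = [0.9, -0.7, 0.5, -0.3, 1.0, 0.2, -0.4, 0.6, -0.8]
    tiny = [w * 0.005 for w in big]
    for i, (per_layer, per_channel) in enumerate(compare_schemes([big, tiny])):
        print(f"channel {i}: per-layer {per_layer:.3f}, per-channel {per_channel:.3f}")
```

Here the wide channel is quantized identically under both schemes, while the tiny channel loses most of its information with the shared scale. This is the kind of range disparity that the per-layer min/max values in the range files can help confirm or rule out.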

    2)

    Configuration:

    Model: L1-regularized, trained on padded input data

    Confidence Threshold: 0.3

    numparambits = numfeaturebits =12

    quantizationstyle = 2

    tidl_net_msi_l1.bin_paramDebug.csv

    Num of Layer Detected :  65 
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Num|TIDL Layer Name               |Out Data Name                                     |Group |#Ins  |#Outs |Inbuf Ids                       |Outbuf Id |In NCHW                             |Out NCHW                            |MACS       |
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        0|TIDL_DataLayer                |data                                              |     0|    -1|     1|  x   x   x   x   x   x   x   x |  0       |       0        0        0        0 |       1        3      512      512 |         0 |
        1|TIDL_ConvolutionLayer         |conv1                                             |     0|     1|     1|  0   x   x   x   x   x   x   x |  1       |       1        3      512      512 |       1       16      256      256 |  30408704 |
        2|TIDL_ConvolutionLayer         |conv2_1/dw                                        |     0|     1|     1|  1   x   x   x   x   x   x   x |  2       |       1       16      256      256 |       1       16      256      256 |  11534336 |
        3|TIDL_ConvolutionLayer         |conv2_1/sep                                       |     0|     1|     1|  2   x   x   x   x   x   x   x |  3       |       1       16      256      256 |       1       32      256      256 |  37748736 |
        4|TIDL_ConvolutionLayer         |conv2_2/dw                                        |     0|     1|     1|  3   x   x   x   x   x   x   x |  4       |       1       32      256      256 |       1       32      128      128 |   5767168 |
        5|TIDL_ConvolutionLayer         |conv2_2/sep                                       |     0|     1|     1|  4   x   x   x   x   x   x   x |  5       |       1       32      128      128 |       1       64      128      128 |  35651584 |
        6|TIDL_ConvolutionLayer         |conv3_1/dw                                        |     0|     1|     1|  5   x   x   x   x   x   x   x |  6       |       1       64      128      128 |       1       64      128      128 |  11534336 |
        7|TIDL_ConvolutionLayer         |conv3_1/sep                                       |     0|     1|     1|  6   x   x   x   x   x   x   x |  7       |       1       64      128      128 |       1       64      128      128 |  69206016 |
        8|TIDL_ConvolutionLayer         |conv3_2/dw                                        |     0|     1|     1|  7   x   x   x   x   x   x   x |  8       |       1       64      128      128 |       1       64       64       64 |   2883584 |
        9|TIDL_ConvolutionLayer         |conv3_2/sep                                       |     0|     1|     1|  8   x   x   x   x   x   x   x |  9       |       1       64       64       64 |       1      128       64       64 |  34603008 |
       10|TIDL_ConvolutionLayer         |conv4_1/dw                                        |     0|     1|     1|  9   x   x   x   x   x   x   x | 10       |       1      128       64       64 |       1      128       64       64 |   5767168 |
       11|TIDL_ConvolutionLayer         |conv4_1/sep                                       |     0|     1|     1| 10   x   x   x   x   x   x   x | 11       |       1      128       64       64 |       1      128       64       64 |  68157440 |
       12|TIDL_ConvolutionLayer         |conv4_2/dw                                        |     0|     1|     1| 11   x   x   x   x   x   x   x | 12       |       1      128       64       64 |       1      128       32       32 |   1441792 |
       13|TIDL_ConvolutionLayer         |conv4_2/sep                                       |     0|     1|     1| 12   x   x   x   x   x   x   x | 13       |       1      128       32       32 |       1      256       32       32 |  34078720 |
       14|TIDL_ConvolutionLayer         |conv5_1/dw                                        |     0|     1|     1| 13   x   x   x   x   x   x   x | 14       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       15|TIDL_ConvolutionLayer         |conv5_1/sep                                       |     0|     1|     1| 14   x   x   x   x   x   x   x | 15       |       1      256       32       32 |       1      256       32       32 |  67633152 |
       16|TIDL_ConvolutionLayer         |conv5_2/dw                                        |     0|     1|     1| 15   x   x   x   x   x   x   x | 16       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       17|TIDL_ConvolutionLayer         |conv5_2/sep                                       |     0|     1|     1| 16   x   x   x   x   x   x   x | 17       |       1      256       32       32 |       1      256       32       32 |  67633152 |
       18|TIDL_ConvolutionLayer         |conv5_3/dw                                        |     0|     1|     1| 17   x   x   x   x   x   x   x | 18       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       19|TIDL_ConvolutionLayer         |conv5_3/sep                                       |     0|     1|     1| 18   x   x   x   x   x   x   x | 19       |       1      256       32       32 |       1      256       32       32 |  67633152 |
       20|TIDL_ConvolutionLayer         |conv5_4/dw                                        |     0|     1|     1| 19   x   x   x   x   x   x   x | 20       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       21|TIDL_ConvolutionLayer         |conv5_4/sep                                       |     0|     1|     1| 20   x   x   x   x   x   x   x | 21       |       1      256       32       32 |       1      256       32       32 |  67633152 |
       22|TIDL_ConvolutionLayer         |conv5_5/dw                                        |     0|     1|     1| 21   x   x   x   x   x   x   x | 22       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       23|TIDL_ConvolutionLayer         |conv5_5/sep                                       |     0|     1|     1| 22   x   x   x   x   x   x   x | 23       |       1      256       32       32 |       1      256       32       32 |  67633152 |
       24|TIDL_ConvolutionLayer         |conv5_6/dw                                        |     0|     1|     1| 23   x   x   x   x   x   x   x | 24       |       1      256       32       32 |       1      256       16       16 |    720896 |
       25|TIDL_ConvolutionLayer         |ctx_output1/dw                                    |     0|     1|     1| 23   x   x   x   x   x   x   x | 25       |       1      256       32       32 |       1      256       32       32 |   2883584 |
       26|TIDL_ConvolutionLayer         |conv5_6/sep                                       |     0|     1|     1| 24   x   x   x   x   x   x   x | 26       |       1      256       16       16 |       1      512       16       16 |  33816576 |
       27|TIDL_ConvolutionLayer         |ctx_output1/sep                                   |     0|     1|     1| 25   x   x   x   x   x   x   x | 27       |       1      256       32       32 |       1      512       32       32 | 135266304 |
       28|TIDL_ConvolutionLayer         |ctx_output1/sep/relu_mbox_loc_perm                |     0|     1|     1| 27   x   x   x   x   x   x   x | 28       |       1      512       32       32 |       1       24       32       32 |  12582912 |
       29|TIDL_ConvolutionLayer         |conv6/dw                                          |     0|     1|     1| 26   x   x   x   x   x   x   x | 29       |       1      512       16       16 |       1      512       16       16 |   1441792 |
       30|TIDL_ConvolutionLayer         |ctx_output1/sep/relu_mbox_conf_perm               |     0|     1|     1| 27   x   x   x   x   x   x   x | 30       |       1      512       32       32 |       1       12       32       32 |   6291456 |
       31|TIDL_FlattenLayer             |ctx_output1/sep/relu_mbox_loc_flat                |     0|     1|     1| 28   x   x   x   x   x   x   x | 31       |       1       24       32       32 |       1        1        1    24576 |     24576 |
       32|TIDL_FlattenLayer             |ctx_output1/sep/relu_mbox_conf_flat               |     0|     1|     1| 30   x   x   x   x   x   x   x | 32       |       1       12       32       32 |       1        1        1    12288 |     12288 |
       33|TIDL_ConvolutionLayer         |conv6/sep                                         |     0|     1|     1| 29   x   x   x   x   x   x   x | 33       |       1      512       16       16 |       1      512       16       16 |  67371008 |
       34|TIDL_PoolingLayer             |pool6                                             |     0|     1|     1| 33   x   x   x   x   x   x   x | 34       |       1      512       16       16 |       1      512        8        8 |    131072 |
       35|TIDL_PoolingLayer             |pool7                                             |     0|     1|     1| 34   x   x   x   x   x   x   x | 35       |       1      512        8        8 |       1      512        4        4 |     32768 |
       36|TIDL_PoolingLayer             |pool8                                             |     0|     1|     1| 35   x   x   x   x   x   x   x | 36       |       1      512        4        4 |       1      512        2        2 |      8192 |
       37|TIDL_ConvolutionLayer         |ctx_output2/dw                                    |     0|     1|     1| 33   x   x   x   x   x   x   x | 37       |       1      512       16       16 |       1      512       16       16 |   1441792 |
       38|TIDL_ConvolutionLayer         |ctx_output3/dw                                    |     0|     1|     1| 34   x   x   x   x   x   x   x | 38       |       1      512        8        8 |       1      512        8        8 |    360448 |
       39|TIDL_ConvolutionLayer         |ctx_output4/dw                                    |     0|     1|     1| 35   x   x   x   x   x   x   x | 39       |       1      512        4        4 |       1      512        4        4 |     90112 |
       40|TIDL_ConvolutionLayer         |ctx_output5/dw                                    |     0|     1|     1| 36   x   x   x   x   x   x   x | 40       |       1      512        2        2 |       1      512        2        2 |     22528 |
       41|TIDL_ConvolutionLayer         |ctx_output2/sep                                   |     0|     1|     1| 37   x   x   x   x   x   x   x | 41       |       1      512       16       16 |       1      512       16       16 |  67371008 |
       42|TIDL_ConvolutionLayer         |ctx_output3/sep                                   |     0|     1|     1| 38   x   x   x   x   x   x   x | 42       |       1      512        8        8 |       1      512        8        8 |  16842752 |
       43|TIDL_ConvolutionLayer         |ctx_output4/sep                                   |     0|     1|     1| 39   x   x   x   x   x   x   x | 43       |       1      512        4        4 |       1      512        4        4 |   4210688 |
       44|TIDL_ConvolutionLayer         |ctx_output5/sep                                   |     0|     1|     1| 40   x   x   x   x   x   x   x | 44       |       1      512        2        2 |       1      512        2        2 |   1052672 |
       45|TIDL_ConvolutionLayer         |ctx_output2/sep/relu_mbox_loc_perm                |     0|     1|     1| 41   x   x   x   x   x   x   x | 45       |       1      512       16       16 |       1       40       16       16 |   5242880 |
       46|TIDL_ConvolutionLayer         |ctx_output3/sep/relu_mbox_loc_perm                |     0|     1|     1| 42   x   x   x   x   x   x   x | 46       |       1      512        8        8 |       1       40        8        8 |   1310720 |
       47|TIDL_ConvolutionLayer         |ctx_output4/sep/relu_mbox_loc_perm                |     0|     1|     1| 43   x   x   x   x   x   x   x | 47       |       1      512        4        4 |       1       24        4        4 |    196608 |
       48|TIDL_ConvolutionLayer         |ctx_output5/sep/relu_mbox_loc_perm                |     0|     1|     1| 44   x   x   x   x   x   x   x | 48       |       1      512        2        2 |       1       24        2        2 |     49152 |
       49|TIDL_ConvolutionLayer         |ctx_output2/sep/relu_mbox_conf_perm               |     0|     1|     1| 41   x   x   x   x   x   x   x | 49       |       1      512       16       16 |       1       20       16       16 |   2621440 |
       50|TIDL_ConvolutionLayer         |ctx_output3/sep/relu_mbox_conf_perm               |     0|     1|     1| 42   x   x   x   x   x   x   x | 50       |       1      512        8        8 |       1       20        8        8 |    655360 |
       51|TIDL_ConvolutionLayer         |ctx_output4/sep/relu_mbox_conf_perm               |     0|     1|     1| 43   x   x   x   x   x   x   x | 51       |       1      512        4        4 |       1       12        4        4 |     98304 |
       52|TIDL_ConvolutionLayer         |ctx_output5/sep/relu_mbox_conf_perm               |     0|     1|     1| 44   x   x   x   x   x   x   x | 52       |       1      512        2        2 |       1       12        2        2 |     24576 |
       53|TIDL_FlattenLayer             |ctx_output2/sep/relu_mbox_loc_flat                |     0|     1|     1| 45   x   x   x   x   x   x   x | 53       |       1       40       16       16 |       1        1        1    10240 |     10240 |
       54|TIDL_FlattenLayer             |ctx_output3/sep/relu_mbox_loc_flat                |     0|     1|     1| 46   x   x   x   x   x   x   x | 54       |       1       40        8        8 |       1        1        1     2560 |      2560 |
       55|TIDL_FlattenLayer             |ctx_output4/sep/relu_mbox_loc_flat                |     0|     1|     1| 47   x   x   x   x   x   x   x | 55       |       1       24        4        4 |       1        1        1      384 |       384 |
       56|TIDL_FlattenLayer             |ctx_output5/sep/relu_mbox_loc_flat                |     0|     1|     1| 48   x   x   x   x   x   x   x | 56       |       1       24        2        2 |       1        1        1       96 |        96 |
       57|TIDL_FlattenLayer             |ctx_output2/sep/relu_mbox_conf_flat               |     0|     1|     1| 49   x   x   x   x   x   x   x | 57       |       1       20       16       16 |       1        1        1     5120 |      5120 |
       58|TIDL_FlattenLayer             |ctx_output3/sep/relu_mbox_conf_flat               |     0|     1|     1| 50   x   x   x   x   x   x   x | 58       |       1       20        8        8 |       1        1        1     1280 |      1280 |
       59|TIDL_FlattenLayer             |ctx_output4/sep/relu_mbox_conf_flat               |     0|     1|     1| 51   x   x   x   x   x   x   x | 59       |       1       12        4        4 |       1        1        1      192 |       192 |
       60|TIDL_FlattenLayer             |ctx_output5/sep/relu_mbox_conf_flat               |     0|     1|     1| 52   x   x   x   x   x   x   x | 60       |       1       12        2        2 |       1        1        1       48 |        48 |
       61|TIDL_ConcatLayer              |mbox_loc                                          |     0|     5|     1| 31  53  54  55  56   x   x   x | 61       |       1        1        1    24576 |       1        1        1    37856 |    122880 |
       62|TIDL_ConcatLayer              |mbox_conf_flatten                                 |     0|     5|     1| 32  57  58  59  60   x   x   x | 62       |       1        1        1    12288 |       1        1        1    18928 |     61440 |
       63|TIDL_DetectionOutputLayer     |detection_out                                     |     0|     2|     1| 61  62   x   x   x   x   x   x | 63       |       1        1        1    37856 |       1        1        1      144 |         0 |
       64|TIDL_DataLayer                |detection_out                                     |     0|     1|    -1| 63   x   x   x   x   x   x   x |  0       |       1        1        1      144 |       0        0        0        0 |         0 |
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total Giga Macs : 1.0637
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
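    The concat sizes in the table can be cross-checked from the per-head rows. The sketch below assumes the standard SSD convention of 4 box offsets per prior box. Under that assumption the loc and conf totals reproduce the mbox_loc (37856) and mbox_conf_flatten (18928) rows, and the conf channels imply 2 classes per prior. That these are background plus pedestrian is an inference, not something the log states.

```python
# Sanity check of the SSD head sizes in the TIDL layer table above.
# Assumption: standard SSD layout, i.e. 4 box offsets per prior box.
BOX_COORDS = 4

# (loc_channels, conf_channels, height, width) per head, copied from the
# *_mbox_loc_perm and *_mbox_conf_perm rows of the table.
HEADS = {
    "ctx_output1": (24, 12, 32, 32),
    "ctx_output2": (40, 20, 16, 16),
    "ctx_output3": (40, 20, 8, 8),
    "ctx_output4": (24, 12, 4, 4),
    "ctx_output5": (24, 12, 2, 2),
}


def check_heads(heads):
    """Return (total_loc, total_conf, total_priors, num_classes)."""
    total_loc = total_conf = total_priors = 0
    num_classes = None
    for name, (loc_c, conf_c, h, w) in heads.items():
        priors_per_cell = loc_c // BOX_COORDS
        classes = conf_c // priors_per_cell
        if num_classes is None:
            num_classes = classes
        elif classes != num_classes:
            raise ValueError(f"{name}: {classes} classes, expected {num_classes}")
        total_loc += loc_c * h * w      # flattened loc elements for this head
        total_conf += conf_c * h * w    # flattened conf elements for this head
        total_priors += priors_per_cell * h * w
    return total_loc, total_conf, total_priors, num_classes


if __name__ == "__main__":
    print(check_heads(HEADS))  # (37856, 18928, 9464, 2)
```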
    

    Observations:

    a) A double-digit difference is observed between "meanOrigFloat" and "orgmax".

    b) Layers with a difference > 5: layers 4, 6, 10, 12, 14, 16, 18 and 20 are depthwise convolutions, and layer 3 is a separable convolution.

    Based on these observations, the depthwise convolution layers, together with one separable convolution layer (layer 3), show the largest quantization loss.
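    One common explanation for depthwise layers showing the largest loss is that their per-channel weight ranges can differ widely, so a single per-tensor scale leaves small-range channels with very few quantization levels. The pure-Python sketch below illustrates this with made-up weights and a generic symmetric quantizer. It is not TIDL's quantizer, and whether this is the cause here would have to be confirmed from the model's actual weight ranges.

```python
# Illustration (not TIDL's quantizer): symmetric linear quantization with
# a per-tensor scale vs. a per-channel scale. Weights are made-up values.


def quantize(values, bits, max_abs):
    """Quantize-dequantize values with a symmetric scale derived from max_abs."""
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def rel_error(ref, approx):
    """Relative L1 error of approx with respect to ref."""
    return sum(abs(r - a) for r, a in zip(ref, approx)) / sum(abs(r) for r in ref)


# Two hypothetical depthwise channels with very different weight ranges.
channels = [[4.0, -3.0, 2.5], [0.01, -0.006, 0.004]]
tensor_max = max(abs(v) for ch in channels for v in ch)

for bits in (8, 12):
    for i, ch in enumerate(channels):
        per_tensor = rel_error(ch, quantize(ch, bits, tensor_max))
        per_channel = rel_error(ch, quantize(ch, bits, max(abs(v) for v in ch)))
        print(f"{bits}-bit ch{i}: per-tensor {per_tensor:.3f}, per-channel {per_channel:.3f}")
```

    With 8 bits and a per-tensor scale, the small channel rounds entirely to zero; a per-channel scale, or more bits, keeps it.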

    Kindly do the needful.

    Thanks and Regards,

    Vyom Mishra

  • You are comparing the 12-bit TIDL results with your PC-based reference software (Caffe/TensorFlow), right?

    If yes, did you match the first tensor? (The input to the network must match exactly.)

    BTW, did you train the model with padded input, or was the original model trained without padding?

  • You are comparing the 12-bit TIDL results with your PC-based reference software (Caffe/TensorFlow), right? [Yes]

    If yes, did you match the first tensor? (The input to the network must match exactly.) [https://e2e.ti.com/support/tools/ccs/f/81/p/877757/3253803?tisearch=e2e-sitesearch&keymatch=vyom#3253803]

    BTW, did you train the model with padded input, or was the original model trained without padding? [Yes]

  • Hi, your answers to the two questions below are not clear. Can you be more specific?

    If yes, did you match the first tensor? (The input to the network must match exactly.)

    BTW, did you re-train the model with padded input?

  • Dear Sir,

    1) We trained the model with the padded input.

    2) First tensor (input to the network)

    As observed and reported in a forum thread (https://e2e.ti.com/support/tools/ccs/f/81/t/877757):

    The first-layer dump had a size issue (x2), which was resolved on Mr Subhajit Paul's side, but the code modifications were not shared with us (if possible, kindly share the changes).

    The first-layer dump of the correct size (512x512x3) was shared by Mr Subhajit on the above-mentioned thread; visualized with YUView, it was found to be the correct input.

    On our side, however, the dump still comes out as 512x512x3x2 when numParamBits and numFeatureBits are greater than 8.

    For now, we need the code changes to obtain the correct trace dump sizes so that we can compare them with the PC layer dumps.

    Only after that can we comment on the input to the model.
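    A 512x512x3x2 size is what one would expect if, for more than 8 bits, each element is written as a 16-bit value instead of an 8-bit one. That is an assumption about the trace format, not something confirmed in this thread. Under that assumption, a dump can be decoded by checking its size against the expected element count, as in this hypothetical helper:

```python
# Hypothetical helper: load a raw TIDL trace dump and infer its element width.
# Assumptions (not confirmed in this thread): the file is a headerless,
# little-endian dump; 8-bit features are stored as 1 byte per element and
# features wider than 8 bits (e.g. numFeatureBits = 12 or 16) as 2 bytes.
import array
import math
import os
import sys


def load_trace(path, shape, signed=True):
    """Return (typecode, values) for a raw dump expected to hold prod(shape) elements."""
    count = math.prod(shape)
    size = os.path.getsize(path)
    if size == count:
        typecode = "b" if signed else "B"   # 1 byte per element
    elif size == 2 * count:
        typecode = "h" if signed else "H"   # 2 bytes per element
    else:
        raise ValueError(f"{path}: {size} bytes does not fit {count} elements of 1 or 2 bytes")
    values = array.array(typecode)
    with open(path, "rb") as f:
        values.fromfile(f, count)
    if typecode in "hH" and sys.byteorder != "little":
        values.byteswap()                   # dump assumed little-endian
    return typecode, values


# Usage (hypothetical file name): a 512x512x3 input dumped with 12-bit
# features would be 1,572,864 bytes and decode as 786,432 int16 values.
# typecode, data = load_trace("trace_layer0.bin", (3, 512, 512))
```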

    Kindly do the needful.

    Thanks and Regards,

    Vyom Mishra 

  • Refer to the page below for debugging accuracy mismatch issues.

    Comparing the input tensor to TIDL with the reference

    • It is important to match the input tensor of the TIDL net with the input tensor of the network that was trained.
    • Save the input tensor from the training code you are using, in float format.
    • Use writeTraceLevel = 3 to write the layer-level traces from TIDL to files.
    • By default, the data-normalizing batchNorm layer is merged into the following convolution layer, so set foldPreBnConv2D = 0 to avoid this.
    • Compare the output of this batchNorm layer with the input tensor from the training code. Refer to the link below:

    http://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/latest/exports/docs/tidl_j7_01_01_00_10/ti_dl/docs/user_guide_html/md_tidl_fsg_steps_to_debug_mismatch.html
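    The comparison step above can be sketched as follows. This assumes both the tensor saved from the training framework and the TIDL trace of the unfolded batchNorm layer have been written as headerless float32 files in the same layout (for example CHW). The file names are hypothetical, and the exact TIDL trace file naming and format are not described in this thread.

```python
# Minimal sketch for comparing a reference input tensor with a TIDL trace.
# Assumptions: raw float32 files, native (little-endian) byte order, same
# layout on both sides; file names below are hypothetical.
import array


def load_f32(path):
    """Read a headerless float32 file into an array of floats."""
    data = array.array("f")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    return data


def compare(ref, test, tol=1e-3):
    """Return (max_abs_diff, mean_abs_diff, count_above_tol) for equal-length tensors."""
    if len(ref) != len(test):
        raise ValueError(f"size mismatch: {len(ref)} vs {len(test)}")
    diffs = [abs(r - t) for r, t in zip(ref, test)]
    return max(diffs), sum(diffs) / len(diffs), sum(d > tol for d in diffs)


# Usage (hypothetical file names):
# ref = load_f32("caffe_input_float.bin")
# tidl = load_f32("tidl_trace_bn_out_float.bin")
# print(compare(ref, tidl))
```

    A size mismatch is reported before any values are compared, which also catches the x2 dump-size issue discussed earlier in the thread.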

  • Dear Sir,

    What does "training code" mean?

    Does training code mean only the PC code?

    Thanks and Regards,

    Vyom Mishra

  • Training code means the framework (Caffe, TensorFlow, PyTorch, etc.) used to generate the model used for inference here.

  • Dear Sir,

    We have compared the layer-wise dumps for PC and target.

    1. Of the 16 channels of the first convolution layer, 6 mismatched, but the subsequent 45 layers matched visually.

    2. For the TIDL_ConvolutionLayer with output (1, 40, 16, 16) (layer 46), the resolution is small, so we dumped the results to a text file and tried to match them.

    However, the results do not match. Please find the attached PC and board results for your reference.

    a)Board 

    46board.txt

    b)PC 

    46pc.txt

    Kindly help us resolve this.

    Regards

    Sithara Tresa Chacko

  • Dear Sir,

    Comparison between PC and target layer dump results:

    1. Since the results in the query above did not match, we compared the ctx_output2/sep/relu_mbox_loc_perm layer output from the target with the ctx_output2/sep/relu_mbox_loc layer output from PC.

    We observed a deviation of only 0.45 percent.
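    Whether this comparison is valid mainly depends on tensor layout. In Caffe SSD models, *_mbox_loc_perm is normally a Permute layer that reorders the *_mbox_loc convolution output from NCHW to NHWC. Here TIDL reports ctx_output2/sep/relu_mbox_loc_perm as a ConvolutionLayer, which suggests the permute was folded away, but that is an assumption. The thread also does not say how the 0.45 percent figure was computed; the sketch below uses one plausible definition (relative L1 deviation) and can reorder an NHWC dump back to NCHW before comparing.

```python
# Sketch: relative L1 deviation between a target dump and a PC reference,
# with an optional NHWC -> NCHW reorder. The metric and layouts are assumptions.


def nhwc_to_nchw(flat, c, h, w):
    """Reorder a flat NHWC (batch 1) tensor into flat NCHW order."""
    out = [0.0] * (c * h * w)
    for y in range(h):
        for x in range(w):
            for ch in range(c):
                out[(ch * h + y) * w + x] = flat[(y * w + x) * c + ch]
    return out


def percent_deviation(test, ref):
    """100 * sum|test - ref| / sum|ref| for equal-length flat tensors."""
    if len(test) != len(ref):
        raise ValueError("size mismatch")
    num = sum(abs(t - r) for t, r in zip(test, ref))
    den = sum(abs(r) for r in ref)
    return 100.0 * num / den


# Usage for the (1, 40, 16, 16) ctx_output2 loc tensor; the loaders are hypothetical:
# target = load_target_dump(...)                        # assumed NCHW
# pc = nhwc_to_nchw(load_pc_dump(...), 40, 16, 16)      # only if the PC dump is NHWC
# print(percent_deviation(target, pc))
```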

    46|TIDL_ConvolutionLayer    |ctx_output2/sep/relu_mbox_loc_perm                |     0|     1|     1| 42   x   x   x   x   x   x   x | 46       |       1      512       16       16 |       1       40       16       16 |   5242880 |

    Is this a valid comparison?

    2. Assuming the above is right, all other layer results match except the detection_out layer, where multiple bounding boxes are observed for a single pedestrian.

    How can we resolve this?
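    Duplicate boxes on a single object are normally removed by non-maximum suppression (NMS) in the detection output stage. To check whether the duplicates overlap enough that NMS should have merged them, their overlap can be computed directly. The sketch below is a generic greedy NMS, not TIDL's detection_out implementation, and the 0.45 threshold is a placeholder for the nms_threshold set in the model's DetectionOutput parameters.

```python
# Standard greedy non-maximum suppression (NMS), for illustration only;
# not TIDL's detection_out implementation. Boxes are
# (xmin, ymin, xmax, ymax, score); the 0.45 threshold is a placeholder.


def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, iou_thresh=0.45):
    """Keep the highest-scoring box, drop boxes overlapping it above iou_thresh, repeat."""
    keep = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_thresh for k in keep):
            keep.append(box)
    return keep


# Two overlapping boxes on one pedestrian and one separate box.
dets = [(0, 0, 10, 20, 0.9), (1, 0, 11, 20, 0.8), (50, 50, 60, 70, 0.7)]
print(greedy_nms(dets))  # [(0, 0, 10, 20, 0.9), (50, 50, 60, 70, 0.7)]
```

    If the duplicate boxes in the target output have an IoU above the configured threshold, they should not survive NMS, which points to a problem in the detection output stage rather than in the earlier layers.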

    Regards,

    Sithara Tresa Chacko

  • Hi Sithara,

    Can you share one input image and the expected Caffe output for it (along with the layer-level float tensors from Caffe)?

    We will try to reproduce this issue at our end using the model you have shared.

    Regards,

    Kumar.D

  • Hi ,

    All the requested inputs have been shared by e-mail with our TI contact, Mr Karthik R, by Mr Sankalp Kallakuri.

    Regards,

    Sithara Tresa Chacko

  • Hi,

    The input image file is not in the zip you shared. Could you please share the input image file (JPG/BMP/PNG) corresponding to the output JPG?

  • Can you try the import config below? We expect these parameters to solve the issue.

    modelType = 0
    inputNetFile = "D:\work\vision\CNN\customers\xx\deploy.prototxt"
    inputParamsFile = "D:\work\vision\CNN\customers\xx\mob.caffemodel"
    outputNetFile = "../../test/testvecs/config/tidl_models/caffe/tidl_net_msi_mobilenet_pd_padded.bin"
    outputParamsFile = "../../test/testvecs/config/tidl_models/caffe/tidl_io_msi_mobilenet_pd_padded"
    numParamBits = 8
    numFeatureBits = 8
    quantizationStyle = 3
    inDataFormat = 0
    inWidth = 512
    inHeight = 512
    inNumChannels = 3
    perfSimConfig = "../../test/testvecs/config/import/perfsim_base.cfg"
    inData = "D:\work\vision\CNN\customers\xx\img_list.txt"
    postProcType = 2
    inFileFormat = 2
    foldPreBnConv2D = 0
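    Compared with the import config at the start of this thread, this one lowers numParamBits and numFeatureBits from 12 to 8, switches quantizationStyle from 2 to 3, and adds foldPreBnConv2D = 0. When iterating on such settings, diffing two configs can help; the sketch below is a hypothetical helper, not part of the TIDL tooling, that parses `key = value` lines (stripping surrounding quotes) and lists what changed.

```python
# Hypothetical helper: parse TIDL-style "key = value" config text and list
# the keys whose values differ. Quotes around values are stripped; lines
# without '=' are ignored. Not part of the TIDL tooling.


def parse_cfg(text):
    """Return a dict of key -> value strings from 'key = value' lines."""
    cfg = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip().strip('"')
    return cfg


def diff_cfg(old, new):
    """Map each key whose value differs (or is missing on one side) to (old, new)."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


original = parse_cfg("numParamBits = 12\nnumFeatureBits = 12\nquantizationStyle = 2\ninWidth = 512")
suggested = parse_cfg("numParamBits = 8\nnumFeatureBits = 8\nquantizationStyle = 3\ninWidth = 512\nfoldPreBnConv2D = 0")
for key, (old, new) in diff_cfg(original, suggested).items():
    print(f"{key}: {old} -> {new}")
```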

  • Hi Kumar,

    We have seen an improvement in the results after your suggestions. However, we still see some cases where the Caffe output does not match the import tool output.

    I have attached a few images that show this. What worries us is that these detections have quite high scores on the Caffe side but are completely missed on the import tool side.

    I will email a few of the images to Karthik. The import tool and PC (Caffe) outputs are stitched together; the input images have been sent separately.

    Best Regards,

    Sankalp Kallakuri

  • Hi Sankalp,

    I have tried with the images that you have shared.

    The 16-bit results match the expected output, but 8-bit mode misses a few detections, as you observed.

    Could you please quantify the overall accuracy drop in 8-bit mode?

    The 16-bit inference runtime would be considerably higher than the 8-bit one.

    We recommend quantization-aware training (QAT) to get the best results in 8-bit mode. Please refer to the link below for more information.

    https://github.com/TexasInstruments/jacinto-ai-devkit

    Note: This dev-kit is for PyTorch, so you may not be able to use it for the current model immediately.

    We are also working on improving the calibration in the import tool for 8-bit mode. This will be available around Q3 this year.

    So, we would recommend using 8-bit mode for now and moving to QAT / improved calibration for 8-bit mode later.

    Regards,

    Kumar.D

  • Dear Sir,

    We tried the same with numParamBits and numFeatureBits set to 16 and observed multiple boxes for the same input shared in the previous queries.

    The results are better with numParamBits and numFeatureBits set to 8 and quantizationStyle = 3.

    The import configuration file is:

    modelType          = 0
    inputNetFile       = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/deploy.prototxt"
    inputParamsFile    = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mob.caffemodel"
    outputNetFile      = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/tidl_models/tidl_net_pd_l1.bin"
    outputParamsFile   = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/tidl_models/tidl_io_pd_l1"
    numParamBits = 16
    numFeatureBits = 16
    quantizationStyle = 3
    inDataFormat = 0
    inElementType  = 0 
    inWidth = 512
    inHeight = 512
    inNumChannels = 3
    perfSimConfig = "../../test/testvecs/config/import/perfsim_base.cfg"
    perfSimTool = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/utils/perfsim/ti_cnnperfsim.out"
    tidlStatsTool = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/tidl_quant_stats_tool.out"
    inData = "/home/sithara/ti/j7/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/l1.txt"
    numFrames = 3
    foldPreBnConv2D = 0
    postProcType = 2
    inFileFormat = 2

    Kindly suggest what we can do on our side.

    Thanks & Regards

    Sithara Tresa Chacko

  • Sithara ,

    We recently fixed a bug in the 16-bit flow of the object detection layer, and we no longer observe these multiple detections.

    This bug fix is not part of the 6.2 SDK release.

    We will update you on the TIDL patch release date soon (after aligning internally). BTW, 16-bit mode is not fully optimized and will have a performance impact.

    Could you quantify the accuracy degradation in 8-bit mode?

  • Dear Kumar,

    For now, we feel the speed drop with 16-bit is acceptable for our current model, but the FPs that appear are less tolerable.

    With this in mind, we will stick to 8-bit until you provide the new SDK.

    We will try to give you accuracy numbers for 8-bit mode as well.

    Best Regards,

    Sankalp

  • Dear Sir,

    Thank you for the suggestion to set flip = false.

    The new model with flip = false gives better results than the previous model. We observed a precision of 93.23571.

    However, in some cases we still observe localization issues.

    How can we improve the model? Please suggest.

    Thanks & Regards

    Sithara Tresa Chacko

  • Hi Sithara,

    Can you share the precision difference between TIDL inference and Caffe inference on PC?

  • Hi,

    We have observed:

    Caffe PC side:

    Precision: 92.44 and Recall: 76.65

    TIDL inference side:

    Precision: 93.23 and Recall: 68.66
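    Relative to Caffe on PC, these numbers put TIDL slightly higher on precision (+0.79 points) but noticeably lower on recall (-7.99 points). In other words, the TIDL model misses more objects, which is consistent with the missed detections reported earlier in the thread. The sketch below shows how such figures are typically derived from detection counts. The TP/FP/FN counts in the test are hypothetical, since the thread reports only the final percentages.

```python
# Minimal sketch: precision/recall from detection counts, and the change
# between two evaluations. Only the percentages below come from the thread.


def precision_recall(tp, fp, fn):
    """Return (precision %, recall %) from true/false positive and false negative counts."""
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def delta(ref, test):
    """Percentage-point change from ref to test, rounded to 2 decimals."""
    return round(test - ref, 2)


caffe = (92.44, 76.65)   # (precision, recall) reported for Caffe on PC
tidl = (93.23, 68.66)    # (precision, recall) reported for TIDL
print("precision change:", delta(caffe[0], tidl[0]))  # 0.79
print("recall change:", delta(caffe[1], tidl[1]))     # -7.99
```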

    Thanks & Regards

    Sithara Tresa Chacko

  • We hope the issue is solved with the patch release shared with you.

    Please open a new thread if you still face any issues.

  • Hi Kumar,

    We have also run into some issues. Can you share the patch with us?

    Thanks,

    FU