PROCESSOR-SDK-AM62A: How to improve the speed of inference

Part Number: PROCESSOR-SDK-AM62A

The 16-bit model's TIDL Giga MACs total is 11.1551 and its inference time is 200 ms. I tried reducing the size of the input image, which lowered the total to 10.2614 Giga MACs. Is the 8-bit model faster? Do you have any suggestions for improving the inference speed? The import config is as follows:

modelType          = 2
numParamBits       = 8
numFeatureBits     = 8
inElementType = 0
calibrationOption = 7
quantizationStyle  = 3
biasCalibrationIterations = 2
inputNetFile       = "../../test/testvecs/models/public/onnx/hyposelr18_infered.onnx"
outputNetFile      = "../../test/testvecs/config/tidl_models/onnx/tidl_net_hylr18_8.bin"
outputParamsFile   = "../../test/testvecs/config/tidl_models/onnx/tidl_io_hylr18_8_"
inFileFormat = 2
inDataNorm  = 1
inMean = 0.0 0.0 0.0
inScale = 0.003921 0.003921 0.003921
resizeWidth = 256
resizeHeight = 256
inDataFormat = 1
inNumChannels = 3
inWidth  = 256
inHeight = 256
inData  =   "../../test/testvecs/config/hrpose.txt"
inDataNamesList = x:0
outDataNamesList = "Identity:0, Identity_1:0"
postProcType = 0

tidl_net_hylr18_8.bin_paramDebug.csv

Num of Layer Detected :  83 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Num|TIDL Layer Name               |Out Data Name                                     |Group |#Ins  |#Outs |Inbuf Ids                       |Outbuf Id |In NCHW                             |Out NCHW                            |MACS       |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    0|TIDL_DataLayer                |x:0_original                                      |     0|    -1|     1|  x   x   x   x   x   x   x   x |  0       |       0        0        0        0        0        0 |       1        1        1        3      256      256 |         0 |
    1|TIDL_ConstDataLayer           |StatefulPartitionedCall/batchnorm_4/sub:0         |     0|    -1|     1|  x   x   x   x   x   x   x   x |  1       |       0        0        0        0        0        0 |       1        1        1       64       64       64 |         0 |
    2|TIDL_ConstDataLayer           |StatefulPartitionedCall/batchnorm_4/mul:0         |     0|    -1|     1|  x   x   x   x   x   x   x   x |  2       |       0        0        0        0        0        0 |       1        1        1       64       64       64 |         0 |
    3|TIDL_ConstDataLayer           |StatefulPartitionedCall/batchnorm_2/sub:0         |     0|    -1|     1|  x   x   x   x   x   x   x   x |  3       |       0        0        0        0        0        0 |       1        1        1       64       64       64 |         0 |
    4|TIDL_ConstDataLayer           |StatefulPartitionedCall/batchnorm_2/mul:0         |     0|    -1|     1|  x   x   x   x   x   x   x   x |  4       |       0        0        0        0        0        0 |       1        1        1       64       64       64 |         0 |
    5|TIDL_ConstDataLayer           |StatefulPartitionedCall/batchnorm_3/sub:0         |     0|    -1|     1|  x   x   x   x   x   x   x   x |  5       |       0        0        0        0        0        0 |       1        1        1       64       64       64 |         0 |
    6|TIDL_ConstDataLayer           |StatefulPartitionedCall/batchnorm_3/mul:0         |     0|    -1|     1|  x   x   x   x   x   x   x   x |  6       |       0        0        0        0        0        0 |       1        1        1       64       64       64 |         0 |
    7|TIDL_ConstDataLayer           |StatefulPartitionedCall/batchnorm_1/sub:0         |     0|    -1|     1|  x   x   x   x   x   x   x   x |  7       |       0        0        0        0        0        0 |       1        1        1       64       64       64 |         0 |
    8|TIDL_ConstDataLayer           |StatefulPartitionedCall/batchnorm_1/mul:0         |     0|    -1|     1|  x   x   x   x   x   x   x   x |  8       |       0        0        0        0        0        0 |       1        1        1       64       64       64 |         0 |
    9|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu:0                    |     0|     1|     1|  0   x   x   x   x   x   x   x |  9       |       1        1        1        3      256      256 |       1        1        1       64      128      128 | 154140672 |
   10|TIDL_PoolingLayer             |StatefulPartitionedCall/maxpool_1:0               |     0|     1|     1|  9   x   x   x   x   x   x   x | 10       |       1        1        1       64      128      128 |       1        1        1       64       64       64 |   2359296 |
   11|TIDL_ConvolutionLayer         |StatefulPartitionedCall/block_2_1_conv_1:0        |     0|     1|     1| 10   x   x   x   x   x   x   x | 11       |       1        1        1       64       64       64 |       1        1        1       64       64       64 | 150994944 |
   12|TIDL_EltWiseLayer             |StatefulPartitionedCall/batchnorm_1/mul_2:0       |     0|     2|     1| 11   8   x   x   x   x   x   x | 12       |       1        1        1       64       64       64 |       1        1        1       64       64       64 |    262144 |
   13|TIDL_EltWiseLayer             |StatefulPartitionedCall/Relu_1:0                  |     0|     2|     1| 12   7   x   x   x   x   x   x | 13       |       1        1        1       64       64       64 |       1        1        1       64       64       64 |    262144 |
   14|TIDL_ConvolutionLayer         |StatefulPartitionedCall/block_2_1_conv_2:0        |     0|     1|     1| 13   x   x   x   x   x   x   x | 14       |       1        1        1       64       64       64 |       1        1        1       64       64       64 | 150994944 |
   15|TIDL_EltWiseLayer             |StatefulPartitionedCall/batchnorm_2/mul_2:0       |     0|     2|     1| 14   4   x   x   x   x   x   x | 15       |       1        1        1       64       64       64 |       1        1        1       64       64       64 |    262144 |
   16|TIDL_EltWiseLayer             |StatefulPartitionedCall/batchnorm_2/Add_1:0       |     0|     2|     1| 15   3   x   x   x   x   x   x | 16       |       1        1        1       64       64       64 |       1        1        1       64       64       64 |    262144 |
   17|TIDL_EltWiseLayer             |StatefulPartitionedCall/Relu_2:0                  |     0|     2|     1| 16  10   x   x   x   x   x   x | 17       |       1        1        1       64       64       64 |       1        1        1       64       64       64 |    262144 |
   18|TIDL_ConvolutionLayer         |StatefulPartitionedCall/block_2_2_conv_1:0        |     0|     1|     1| 17   x   x   x   x   x   x   x | 18       |       1        1        1       64       64       64 |       1        1        1       64       64       64 | 150994944 |
   19|TIDL_EltWiseLayer             |StatefulPartitionedCall/batchnorm_3/mul_2:0       |     0|     2|     1| 18   6   x   x   x   x   x   x | 19       |       1        1        1       64       64       64 |       1        1        1       64       64       64 |    262144 |
   20|TIDL_EltWiseLayer             |StatefulPartitionedCall/Relu_3:0                  |     0|     2|     1| 19   5   x   x   x   x   x   x | 20       |       1        1        1       64       64       64 |       1        1        1       64       64       64 |    262144 |
   21|TIDL_ConvolutionLayer         |StatefulPartitionedCall/block_2_2_conv_2:0        |     0|     1|     1| 20   x   x   x   x   x   x   x | 21       |       1        1        1       64       64       64 |       1        1        1       64       64       64 | 150994944 |
   22|TIDL_EltWiseLayer             |StatefulPartitionedCall/batchnorm_4/mul_2:0       |     0|     2|     1| 21   2   x   x   x   x   x   x | 22       |       1        1        1       64       64       64 |       1        1        1       64       64       64 |    262144 |
   23|TIDL_EltWiseLayer             |StatefulPartitionedCall/batchnorm_4/Add_1:0       |     0|     2|     1| 22   1   x   x   x   x   x   x | 23       |       1        1        1       64       64       64 |       1        1        1       64       64       64 |    262144 |
   24|TIDL_EltWiseLayer             |StatefulPartitionedCall/Relu_4:0                  |     0|     2|     1| 23  17   x   x   x   x   x   x | 24       |       1        1        1       64       64       64 |       1        1        1       64       64       64 |    262144 |
   25|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_5:0                  |     0|     1|     1| 24   x   x   x   x   x   x   x | 25       |       1        1        1       64       64       64 |       1        1        1      128       32       32 |  75497472 |
   26|TIDL_ConvolutionLayer         |StatefulPartitionedCall/batchnorm_7/Add_1:0       |     0|     1|     1| 24   x   x   x   x   x   x   x | 26       |       1        1        1       64       64       64 |       1        1        1      128       32       32 |   8388608 |
   27|TIDL_ConvolutionLayer         |StatefulPartitionedCall/batchnorm_6/Add_1:0       |     0|     1|     1| 25   x   x   x   x   x   x   x | 27       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   28|TIDL_EltWiseLayer             |StatefulPartitionedCall/Relu_6:0                  |     0|     2|     1| 27  26   x   x   x   x   x   x | 28       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |    131072 |
   29|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_7:0                  |     0|     1|     1| 28   x   x   x   x   x   x   x | 29       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   30|TIDL_ConvolutionLayer         |StatefulPartitionedCall/batchnorm_9/Add_1:0       |     0|     1|     1| 29   x   x   x   x   x   x   x | 30       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   31|TIDL_EltWiseLayer             |StatefulPartitionedCall/Relu_8:0                  |     0|     2|     1| 30  28   x   x   x   x   x   x | 31       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |    131072 |
   32|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_9:0                  |     0|     1|     1| 31   x   x   x   x   x   x   x | 32       |       1        1        1      128       32       32 |       1        1        1      256       32       32 | 301989888 |
   33|TIDL_ConvolutionLayer         |StatefulPartitionedCall/batchnorm_12/Add_1:0      |     0|     1|     1| 31   x   x   x   x   x   x   x | 33       |       1        1        1      128       32       32 |       1        1        1      256       32       32 |  33554432 |
   34|TIDL_ConvolutionLayer         |StatefulPartitionedCall/batchnorm_11/Add_1:0      |     0|     1|     1| 32   x   x   x   x   x   x   x | 34       |       1        1        1      256       32       32 |       1        1        1      256       32       32 | 603979776 |
   35|TIDL_EltWiseLayer             |StatefulPartitionedCall/Relu_10:0                 |     0|     2|     1| 34  33   x   x   x   x   x   x | 35       |       1        1        1      256       32       32 |       1        1        1      256       32       32 |    262144 |
   36|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_11:0                 |     0|     1|     1| 35   x   x   x   x   x   x   x | 36       |       1        1        1      256       32       32 |       1        1        1      256       32       32 | 603979776 |
   37|TIDL_ConvolutionLayer         |StatefulPartitionedCall/batchnorm_14/Add_1:0      |     0|     1|     1| 36   x   x   x   x   x   x   x | 37       |       1        1        1      256       32       32 |       1        1        1      256       32       32 | 603979776 |
   38|TIDL_EltWiseLayer             |StatefulPartitionedCall/Relu_12:0                 |     0|     2|     1| 37  35   x   x   x   x   x   x | 38       |       1        1        1      256       32       32 |       1        1        1      256       32       32 |    262144 |
   39|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_13:0                 |     0|     1|     1| 38   x   x   x   x   x   x   x | 39       |       1        1        1      256       32       32 |       1        1        1      512       32       32 |1207959552 |
   40|TIDL_ConvolutionLayer         |StatefulPartitionedCall/batchnorm_17/Add_1:0      |     0|     1|     1| 38   x   x   x   x   x   x   x | 40       |       1        1        1      256       32       32 |       1        1        1      512       32       32 | 134217728 |
   41|TIDL_ConvolutionLayer         |StatefulPartitionedCall/batchnorm_16/Add_1:0      |     0|     1|     1| 39   x   x   x   x   x   x   x | 41       |       1        1        1      512       32       32 |       1        1        1      512       32       32 |2415919104 |
   42|TIDL_EltWiseLayer             |StatefulPartitionedCall/Relu_14:0                 |     0|     2|     1| 41  40   x   x   x   x   x   x | 42       |       1        1        1      512       32       32 |       1        1        1      512       32       32 |    524288 |
   43|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_15:0                 |     0|     1|     1| 42   x   x   x   x   x   x   x | 43       |       1        1        1      512       32       32 |       1        1        1      128       32       32 |  67108864 |
   44|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_16:0                 |     0|     1|     1| 43   x   x   x   x   x   x   x | 44       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   45|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_17:0                 |     0|     1|     1| 44   x   x   x   x   x   x   x | 45       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   46|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_18:0                 |     0|     1|     1| 45   x   x   x   x   x   x   x | 46       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   47|TIDL_EltWiseLayer             |StatefulPartitionedCall/add_7:0                   |     0|     2|     1| 43  46   x   x   x   x   x   x | 47       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |    131072 |
   48|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_19:0                 |     0|     1|     1| 47   x   x   x   x   x   x   x | 48       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   49|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_20:0                 |     0|     1|     1| 48   x   x   x   x   x   x   x | 49       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   50|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_21:0                 |     0|     1|     1| 49   x   x   x   x   x   x   x | 50       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   51|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_22:0                 |     0|     1|     1| 50   x   x   x   x   x   x   x | 51       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   52|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_23:0                 |     0|     1|     1| 51   x   x   x   x   x   x   x | 52       |       1        1        1      128       32       32 |       1        1        1      512       32       32 |  67108864 |
   53|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_24:0                 |     0|     1|     1| 51   x   x   x   x   x   x   x | 53       |       1        1        1      128       32       32 |       1        1        1      512       32       32 |  67108864 |
   54|TIDL_ConvolutionLayer         |StatefulPartitionedCall/bias_add_9:0              |     0|     1|     1| 52   x   x   x   x   x   x   x | 54       |       1        1        1      512       32       32 |       1        1        1       19       32       32 |   9961472 |
   55|TIDL_ConvolutionLayer         |StatefulPartitionedCall/bias_add_11:0             |     0|     1|     1| 53   x   x   x   x   x   x   x | 55       |       1        1        1      512       32       32 |       1        1        1       38       32       32 |  19922944 |
   56|TIDL_ConcatLayer              |StatefulPartitionedCall/concat:0                  |     0|     3|     1| 48  54  55   x   x   x   x   x | 56       |       1        1        1      128       32       32 |       1        1        1      185       32       32 |    189440 |
   57|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_25:0                 |     0|     1|     1| 56   x   x   x   x   x   x   x | 57       |       1        1        1      185       32       32 |       1        1        1      128       32       32 |  24248320 |
   58|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_26:0                 |     0|     1|     1| 57   x   x   x   x   x   x   x | 58       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   59|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_27:0                 |     0|     1|     1| 58   x   x   x   x   x   x   x | 59       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   60|TIDL_EltWiseLayer             |StatefulPartitionedCall/add_8:0                   |     0|     2|     1| 57  59   x   x   x   x   x   x | 60       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |    131072 |
   61|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_28:0                 |     0|     1|     1| 60   x   x   x   x   x   x   x | 61       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |  16777216 |
   62|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_29:0                 |     0|     1|     1| 61   x   x   x   x   x   x   x | 62       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   63|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_30:0                 |     0|     1|     1| 62   x   x   x   x   x   x   x | 63       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   64|TIDL_EltWiseLayer             |StatefulPartitionedCall/add_9:0                   |     0|     2|     1| 61  63   x   x   x   x   x   x | 64       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |    131072 |
   65|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_31:0                 |     0|     1|     1| 64   x   x   x   x   x   x   x | 65       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |  16777216 |
   66|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_32:0                 |     0|     1|     1| 65   x   x   x   x   x   x   x | 66       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   67|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_33:0                 |     0|     1|     1| 66   x   x   x   x   x   x   x | 67       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   68|TIDL_EltWiseLayer             |StatefulPartitionedCall/add_10:0                  |     0|     2|     1| 65  67   x   x   x   x   x   x | 68       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |    131072 |
   69|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_34:0                 |     0|     1|     1| 68   x   x   x   x   x   x   x | 69       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |  16777216 |
   70|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_35:0                 |     0|     1|     1| 69   x   x   x   x   x   x   x | 70       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   71|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_36:0                 |     0|     1|     1| 70   x   x   x   x   x   x   x | 71       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   72|TIDL_EltWiseLayer             |StatefulPartitionedCall/add_11:0                  |     0|     2|     1| 69  71   x   x   x   x   x   x | 72       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |    131072 |
   73|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_37:0                 |     0|     1|     1| 72   x   x   x   x   x   x   x | 73       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |  16777216 |
   74|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_38:0                 |     0|     1|     1| 73   x   x   x   x   x   x   x | 74       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   75|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_39:0                 |     0|     1|     1| 74   x   x   x   x   x   x   x | 75       |       1        1        1      128       32       32 |       1        1        1      128       32       32 | 150994944 |
   76|TIDL_EltWiseLayer             |StatefulPartitionedCall/add_12:0                  |     0|     2|     1| 73  75   x   x   x   x   x   x | 76       |       1        1        1      128       32       32 |       1        1        1      128       32       32 |    131072 |
   77|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_40:0                 |     0|     1|     1| 76   x   x   x   x   x   x   x | 77       |       1        1        1      128       32       32 |       1        1        1      512       32       32 |  67108864 |
   78|TIDL_ConvolutionLayer         |StatefulPartitionedCall/Relu_41:0                 |     0|     1|     1| 76   x   x   x   x   x   x   x | 78       |       1        1        1      128       32       32 |       1        1        1      512       32       32 |  67108864 |
   79|TIDL_ConvolutionLayer         |Identity:0                                        |     0|     1|     1| 77   x   x   x   x   x   x   x | 79       |       1        1        1      512       32       32 |       1        1        1       19       32       32 |   9961472 |
   80|TIDL_ConvolutionLayer         |Identity_1:0                                      |     0|     1|     1| 78   x   x   x   x   x   x   x | 80       |       1        1        1      512       32       32 |       1        1        1       38       32       32 |  19922944 |
   81|TIDL_DataLayer                |Identity:0                                        |     0|     1|    -1| 79   x   x   x   x   x   x   x |  0       |       1        1        1       19       32       32 |       0        0        0        0        0        0 |         0 |
   82|TIDL_DataLayer                |Identity_1:0                                      |     0|     1|    -1| 80   x   x   x   x   x   x   x |  0       |       1        1        1       38       32       32 |       0        0        0        0        0        0 |         0 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total Giga Macs : 10.2614
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
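
To see where the time is likely going, it may help to rank the layers in the table above by their MACS column. In this dump, layers 39 and 41 (the 512-channel convolutions at 32x32) report 1207959552 and 2415919104 MACs, which together is about 35% of the 10.2614 Giga MACs total. Below is a minimal sketch of a parser for this table layout. The function names (`parse_layer_rows`, `top_layers`, `total_gmacs`) are made up for illustration and are not part of the TIDL tools; the column positions are assumed from the dump shown here, and the sample rows are whitespace-condensed copies of rows 9 and 41.

```python
def parse_layer_rows(text):
    """Return (layer_num, layer_type, out_name, macs) for each data row.

    Assumes the pipe-separated layout of the paramDebug dump above:
    column 0 is the layer number and column 10 is the MACS count.
    """
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Skip separators, the header row, and anything that is not a data row.
        if len(cells) < 11 or not cells[0].isdigit() or not cells[10].isdigit():
            continue
        rows.append((int(cells[0]), cells[1], cells[2], int(cells[10])))
    return rows


def top_layers(rows, n=5):
    """Return the n rows with the largest MACS count, heaviest first."""
    return sorted(rows, key=lambda r: r[3], reverse=True)[:n]


def total_gmacs(rows):
    """Sum the MACS column and express it in Giga MACs."""
    return sum(r[3] for r in rows) / 1e9


# Whitespace-condensed excerpt of two rows from the table above.
SAMPLE = """
  Num|TIDL Layer Name|Out Data Name|Group|#Ins|#Outs|Inbuf Ids|Outbuf Id|In NCHW|Out NCHW|MACS|
    9|TIDL_ConvolutionLayer|StatefulPartitionedCall/Relu:0|0|1|1|0 x x x x x x x|9|1 1 1 3 256 256|1 1 1 64 128 128| 154140672 |
   41|TIDL_ConvolutionLayer|StatefulPartitionedCall/batchnorm_16/Add_1:0|0|1|1|39 x x x x x x x|41|1 1 1 512 32 32|1 1 1 512 32 32|2415919104 |
"""

if __name__ == "__main__":
    rows = parse_layer_rows(SAMPLE)
    for num, layer_type, name, macs in top_layers(rows):
        print(f"{num:3d}  {layer_type:24s} {macs / 1e9:8.4f} GMACs  {name}")
    print(f"Total of parsed rows: {total_gmacs(rows):.4f} GMACs")
```

To run it on the full table, read the dump into a string and pass it to `parse_layer_rows` in place of `SAMPLE`. MAC counts are only a proxy for latency, so the ranking is a starting point for deciding which layers to slim down, not a measurement of per-layer time.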