Tool/software: Code Composer Studio
Hi,
We are running a YOLOv3-based model on the target board, and its inference time is relatively poor (see the run results further down).
The infer_config file is as follows:
inFileFormat = 1
numFrames = 1
padInBuffInTB = 1
netBinFile = "testvecs/config/tidl_models/onnx/tidl_net_yolo3ours.bin"
ioConfigFile = "testvecs/config/tidl_models/onnx/tidl_io_yolo3ours_1.bin"
#inData = testvecs/config/detection_list_fir.txt
inData = testvecs/input/image2_magick9.y
outData = testvecs/output/tidl_yolo3ours_onnx_od.bin
debugTraceLevel = 0
writeTraceLevel = 0
writeOutput = 2
The run results are as follows:
Instance created for testvecs/config/infer/public/onnx/tidl_infer_yolo3ours.txt
----------------------- TIDL Process with TARGET DATA FLOW ------------------------
# 0 . .. 0 1.00000 0.00000 255.00000 0
1 7.55960 0.00000 24.47219 0
2 14.64365 0.00000 13.86266 0
3 12.14462 0.00000 11.44540 0
4 12.49913 0.00000 20.16140 0
5 6.24956 0.00000 26.24183 0
6 15.58967 0.00000 15.13823 0
7 8.56469 0.00000 6.65523 0
8 16.13209 0.00000 10.97192 0
9 15.58967 0.00000 15.13823 0
10 23.44829 0.00000 6.56764 0
11 18.91503 0.00000 11.20802 0
12 15.58967 0.00000 15.13823 0
13 18.73857 0.00000 10.40635 0
14 19.95180 0.00000 7.31764 0
15 18.30978 0.00000 11.90620 0
16 18.30978 0.00000 11.90620 0
17 18.00308 0.00000 8.33191 0
18 45.87738 0.00000 4.33765 0
19 18.30978 0.00000 11.90620 0
20 30.01260 0.00000 8.32983 0
21 18.14329 0.00000 14.05479 0
22 18.14329 0.00000 14.05479 0
23 34.96238 0.00000 6.09226 0
24 37.30158 0.00000 6.21958 0
25 18.14329 0.00000 14.05479 0
26 34.89151 0.00000 4.64296 0
27 28.52071 0.00000 6.41639 0
28 9.07164 0.00000 16.31458 0
29 31.45592 0.00000 4.60963 0
30 33.96969 0.00000 6.35861 0
31 9.07164 0.00000 18.18855 0
32 21.71580 0.00000 5.94038 0
33 22.01132 0.00000 9.94943 0
34 9.07164 0.00000 18.18855 0
35 34.63575 0.00000 5.19694 0
36 22.64388 0.00000 9.18571 0
37 4.53582 0.00000 19.62158 0
38 25.06937 0.00000 6.50196 0
39 5.19582 0.00000 6.15879 0
40 45.10648 0.00000 4.98820 0
41 25.06937 0.00000 6.50196 0
42 14.62370 0.00000 8.00071 0
43 35.37612 0.00000 4.91857 0
44 25.06937 0.00000 8.45654 0
45 26.52969 0.00000 5.50327 0
46 52.46389 0.00000 3.96463 0
47 25.06937 0.00000 8.45654 0
48 24.78684 0.00000 5.89022 0
49 28.96082 0.00000 4.69600 0
50 12.53468 0.00000 10.53078 0
51 58.02005 0.00000 4.30886 0
52 56.10837 0.00000 4.18832 0
53 12.53468 0.00000 12.84436 0
54 64.29828 0.00000 3.87258 0
55 28.21528 0.00000 8.04529 0
56 12.53468 0.00000 12.84436 0
57 24.84498 0.00000 4.22621 0
58 34.69289 0.00000 6.51430 0
59 12.53468 0.00000 14.36016 0
60 64.72967 0.00000 3.69228 0
61 21.01660 0.00000 6.18559 0
62 12.53468 0.00000 14.36016 0
63 46.18002 0.00000 5.15374 0
64 9.12569 0.00000 9.42394 0
65 40.84504 0.00000 3.57449 0
66 40.84504 0.00000 5.09242 0
67 52.81095 0.00000 3.82496 0
68 56.50985 0.00000 3.07911 0
69 40.84504 0.00000 5.09242 0
70 37.78695 0.00000 3.41388 0
71 62.11702 0.00000 3.97637 0
72 40.84504 0.00000 5.09242 0
73 66.02151 0.00000 2.62036 0
74 29.40389 0.00000 5.67952 0
75 14.70194 0.00000 5.71353 0
76 7.29751 0.00000 4.38506 0
77 15.00614 0.00000 4.39820 0
78 8.05464 0.00000 5.09023 0
79 15.45412 0.00000 3.81775 0
80 8.62579 0.00000 4.40539 0
81 50.34578 0.00000 3.63486 0
83 50.34578 0.00000 3.63486 0
85 12.53468 0.00000 14.36016 0
87 13.81208 0.00000 5.79203 0
88 20.08617 0.00000 5.47640 0
89 8.07824 0.00000 5.32294 0
90 21.42973 0.00000 5.27305 0
91 15.58251 0.00000 6.03240 0
92 31.39681 0.00000 4.80941 0
94 31.39681 0.00000 4.80941 0
96 15.58967 0.00000 15.13823 0
98 28.53765 0.00000 7.21853 0
99 36.19500 0.00000 5.91242 0
100 16.24935 0.00000 7.20029 0
101 46.74425 0.00000 5.19850 0
102 34.09076 0.00000 5.22136 0
103 28.10185 0.00000 8.71829 0
104 3.29802 -23.04416 5.45783 1
93 19.75225 0.00000 7.44219 0
95 7.30617 -17.24570 7.25414 1
82 29.22105 0.00000 5.61239 0
84 4.93531 -16.61497 4.66030 1
TSC Mega Cycles = 158042.99 ... .... .....
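In case it helps anyone inspect the table above offline, here is a minimal Python sketch that loads the per-layer rows into tuples. We do not know what the columns mean (see question 4 below), so the fields are kept positional and unnamed. The function name `parse_layer_rows` and the row pattern are our own assumptions based only on the rows printed above, not on anything in the TIDL documentation.

```python
import re

# Assumed row shape, taken from the printed log only, e.g.
# "82 29.22105 0.00000 5.61239 0": an integer, three decimals, an integer.
ROW = re.compile(
    r"^\s*(\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(\d+)\s*$"
)

def parse_layer_rows(log_text):
    """Return a list of (int, float, float, float, int) tuples, one per matching row.

    Lines that do not match the assumed shape (banner lines, the
    "TSC Mega Cycles" summary, and so on) are skipped.
    """
    rows = []
    for line in log_text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((
                int(m.group(1)),
                float(m.group(2)),
                float(m.group(3)),
                float(m.group(4)),
                int(m.group(5)),
            ))
    return rows

# Small sample copied from the log above.
sample = """\
103 28.10185 0.00000 8.71829 0
104 3.29802 -23.04416 5.45783 1
TSC Mega Cycles = 158042.99 ... .... .....
"""
rows = parse_layer_rows(sample)
for row in rows:
    print(row)
```

Keeping the fields positional avoids guessing at labels until the column meanings are confirmed.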
Please note that for the PeleeNet example we got results similar to those expected (104.4 ms), as shown in the TIDL user guide.
Some questions:
1. Why does it take so much time to run our model? What might be the root cause?
2. How can we profile the CPU usage and DRAM bandwidth?
3. Is there any way to accelerate the DRAM?
4. What does each column in the per-layer inference log mean? For example, layer 82: 82 29.22105 0.00000 5.61239 0
Thx
Neta