TDA4VM: TIDL-RT inference error when the input channel count is greater than 4

Part Number: TDA4VM


Hello TI:

We have found that when the number of input channels is greater than 4, the inference results of the TIDL model are incorrect.

But when the number of input channels is less than 4, the inference results of the TIDL model match those of the PyTorch model.

Here is our test:

When the input is a 3-channel RGB image, the outputs of the Python-side and TDA4 board-side inference are roughly aligned:

The first 200 values of the PyTorch inference output:

```
tensor([0.2725, 1.8240, 0.9287, 1.0627, 1.4536, 1.3616, 0.9387, 1.0476, 1.8022,
1.8972, 1.7944, 1.9753, 2.0522, 1.6511, 1.3245, 1.2046, 1.6687, 2.0710,
1.8052, 1.4341, 1.5159, 1.3245, 1.4829, 1.6804, 1.4365, 1.7385, 1.6130,
1.3274, 1.6187, 1.7466, 2.0071, 2.1467, 2.0046, 1.9263, 1.8596, 1.9434,
1.7725, 1.8657, 2.3823, 2.6248, 2.2700, 1.6094, 1.6526, 1.5845, 1.5388,
1.8826, 1.5706, 1.2964, 1.7742, 1.6318, 1.6179, 2.2485, 2.1445, 1.4136,
1.3721, 1.9043, 1.7217, 1.6313, 1.8491, 1.7224, 2.1184, 2.7771, 2.4487,
1.9482, 2.0022, 1.9531, 1.7512, 1.6338, 1.5212, 1.1211, 1.2781, 1.3647,
1.5173, 1.9783, 2.0759, 1.4543, 1.0547, 1.2500, 1.0984, 0.9580, 1.2556,
1.5652, 1.2031, 1.4524, 1.6968, 2.1414, 2.0488, 1.4937, 1.6997, 1.7988,
1.3606, 1.0713, 1.5337, 1.0088, 1.2561, 1.0376, 0.4551, 2.0996, 1.4224,
1.6277, 2.4272, 2.2100, 0.6223, 1.2932, 3.0981, 2.4563, 1.1636, 2.5801,
4.0732, 2.2795, 1.2170, 1.1143, 2.1606, 2.6863, 1.8450, 1.2229, 1.4094,
0.9294, 1.5471, 1.7346, 0.7766, 1.6616, 2.5457, 1.8667, 2.5105, 1.5776,
1.5364, 2.8516, 1.8926, 1.4551, 1.8564, 2.5530, 2.6946, 2.3096, 2.2207,
2.2153, 2.1553, 1.9778, 2.7310, 2.2983, 1.2312, 2.0168, 2.3958, 1.2305,
2.1570, 2.5898, 1.5598, 3.1360, 3.1733, 1.5193, 1.4409, 3.0139, 2.2844,
1.8755, 3.4807, 3.0547, 2.7522, 3.6755, 2.7910, 2.1860, 2.1313, 1.6392,
2.4373, 1.9514, 1.8430, 1.5222, 1.5061, 1.6863, 1.9619, 2.7542, 3.1331,
1.2903, 0.5537, 2.2219, 1.8372, 0.5667, 1.0918, 2.1479, 1.3037, 1.6404,
1.3792, 2.1606, 2.7366, 1.1772, 2.5527, 2.2637, 1.3894, 1.1377, 2.1436,
1.0078, 1.2209, 0.3113, 0.2517, 1.6677, 1.6304, 1.9866, 2.1826, 2.0413,
0.8716, 1.7761], device='cuda:0')
```

The first 200 values of the TIDL model inference output:

```
0.272461 1.82422 0.928711 1.0625 1.4541 1.36133 0.938477 1.04688 1.80176 1.89746
1.79395 1.97559 2.05273 1.65137 1.3252 1.20508 1.66895 2.07031 1.80469 1.43359
1.51562 1.3252 1.4834 1.68066 1.43652 1.73828 1.61328 1.32715 1.61816 1.74609
2.00684 2.14648 2.00391 1.92578 1.85938 1.94336 1.77246 1.86523 2.38281 2.62402
2.27051 1.60938 1.65234 1.58496 1.53906 1.88281 1.57031 1.29688 1.77441 1.63184
1.61816 2.24902 2.14453 1.41406 1.37207 1.9043 1.72168 1.63086 1.84961 1.72266
2.11816 2.77734 2.44824 1.94824 2.00195 1.95312 1.75098 1.63379 1.52148 1.12109
1.27832 1.36426 1.51758 1.97852 2.0752 1.4541 1.05566 1.25 1.09863 0.958008
1.25586 1.56543 1.20312 1.45215 1.69629 2.1416 2.04883 1.49414 1.7002 1.79883
1.36133 1.07129 1.5332 1.00879 1.25586 1.03711 0.455078 2.09961 1.42188 1.62793
2.42773 2.20996 0.62207 1.29297 3.09668 2.45703 1.16406 2.58008 4.07324 2.2793
1.2168 1.11426 2.16016 2.68555 1.8457 1.22363 1.40918 0.929688 1.54688 1.73438
0.777344 1.66211 2.5459 1.86719 2.51074 1.57812 1.53613 2.85156 1.89258 1.45508
1.85645 2.55273 2.69434 2.30859 2.2207 2.21484 2.15527 1.97754 2.73047 2.29883
1.23145 2.0166 2.39551 1.23047 2.15723 2.58984 1.55957 3.13574 3.17285 1.51953
1.44141 3.01367 2.28418 1.875 3.48047 3.05371 2.75195 3.67578 2.79102 2.18555
2.13184 1.63867 2.43652 1.95117 1.84277 1.52246 1.50586 1.68555 1.96191 2.75391
3.13281 1.29004 0.553711 2.22168 1.83691 0.566406 1.0918 2.14746 1.30371 1.64062
1.37891 2.16113 2.73633 1.17676 2.55273 2.2627 1.38965 1.1377 2.14355 1.00781
1.2207 0.311523 0.251953 1.66797 1.63086 1.9873 2.18262 2.04102 0.87207 1.77637
```
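
To put a number on "roughly aligned", here is a minimal sketch of how the two dumps can be compared. The helper name `compare_outputs` and the tolerance `atol` are assumptions chosen for illustration, and only the first five values of each 3-channel dump above are used:

```python
def compare_outputs(ref, test, atol=1e-2):
    """Return (max absolute difference, fraction of values within atol)."""
    if len(ref) != len(test):
        raise ValueError("outputs must have the same length")
    diffs = [abs(r - t) for r, t in zip(ref, test)]
    within = sum(1 for d in diffs if d <= atol)
    return max(diffs), within / len(diffs)

# First five values of the PyTorch and TIDL dumps for the 3-channel case
pytorch_vals = [0.2725, 1.8240, 0.9287, 1.0627, 1.4536]
tidl_vals = [0.272461, 1.82422, 0.928711, 1.0625, 1.4541]

max_diff, frac_ok = compare_outputs(pytorch_vals, tidl_vals)
print(f"max abs diff = {max_diff:.6f}, within tolerance = {frac_ok:.0%}")
```

The same check can be run on the full output tensors to see at a glance whether a given input configuration still matches.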

With the same model structure, when only the input is changed to 6 channels (left and right images concatenated), the Python-side and TDA4-side outputs differ greatly and cannot be aligned:

The first 200 values of the PyTorch inference output:

```
tensor([0.0000e+00, 2.7285e+00, 2.8269e+00, 1.6304e+00, 1.3997e+00, 1.3689e+00,
1.4131e+00, 1.3860e+00, 1.3345e+00, 1.3909e+00, 1.4070e+00, 1.3889e+00,
1.3545e+00, 1.3674e+00, 1.3391e+00, 1.3613e+00, 1.3481e+00, 1.3748e+00,
1.4243e+00, 1.4846e+00, 1.4036e+00, 1.4185e+00, 1.3950e+00, 1.3835e+00,
1.3638e+00, 1.3652e+00, 1.4065e+00, 1.3867e+00, 1.3027e+00, 1.3511e+00,
1.3777e+00, 1.3601e+00, 1.3704e+00, 1.3643e+00, 1.3557e+00, 1.3911e+00,
1.4158e+00, 1.4163e+00, 1.3875e+00, 1.4126e+00, 1.4490e+00, 1.4751e+00,
1.5117e+00, 1.5244e+00, 1.5005e+00, 1.4155e+00, 1.4963e+00, 1.4980e+00,
1.4241e+00, 1.4448e+00, 1.4675e+00, 1.4526e+00, 1.4204e+00, 1.4573e+00,
1.4790e+00, 1.4539e+00, 1.4944e+00, 1.4873e+00, 1.4543e+00, 1.5115e+00,
1.5195e+00, 1.4326e+00, 1.4438e+00, 1.5266e+00, 1.4805e+00, 1.4639e+00,
1.4380e+00, 1.4746e+00, 1.4836e+00, 1.5225e+00, 1.5859e+00, 1.5027e+00,
1.4927e+00, 1.6277e+00, 1.6060e+00, 1.4963e+00, 1.5686e+00, 1.5256e+00,
1.4050e+00, 1.4802e+00, 1.4988e+00, 1.5071e+00, 1.4541e+00, 1.4600e+00,
1.4753e+00, 1.5767e+00, 1.4890e+00, 1.5300e+00, 1.5010e+00, 1.4807e+00,
1.5544e+00, 1.5212e+00, 1.4985e+00, 1.5979e+00, 1.6865e+00, 2.4600e+00,
0.0000e+00, 7.2339e-01, 0.0000e+00, 6.0815e-01, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 3.5156e-02, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 3.7598e-02, 0.0000e+00, 0.0000e+00,
0.0000e+00, 1.4404e-02, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 9.6924e-02, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
5.1514e-02, 1.4648e-03, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 2.9053e-02, 0.0000e+00,
0.0000e+00, 1.1248e+00, 3.2275e-01, 4.7437e-01, 4.4336e-01, 3.3008e-01,
3.9404e-01, 4.8340e-01], device='cuda:0')
```

The first 200 values of the TIDL model inference output:

```
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
```

When the ONNX model has more than 4 input channels, we use the TIDL import tool to convert the ONNX model and run inference with the converted model on the TDA4VM board, but the results are incorrect.

Could you give some suggestions on how to convert the model with the TIDL model import tool when the ONNX model has more than 4 input channels?
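
For reference, a minimal sketch of how a 6-channel input of this kind can be built from a left and a right image and flattened to raw bytes. The function name, the placeholder image data, and the planar channel-first (6 x H x W) `uint8` layout are all assumptions for illustration; whether the TIDL import tool expects this layout and element type should be checked against the TIDL documentation:

```python
import numpy as np

def make_six_channel_input(left_hwc, right_hwc):
    """Stack two HxWx3 images into one 6xHxW (planar, channel-first) array."""
    left = np.asarray(left_hwc)
    right = np.asarray(right_hwc)
    if left.ndim != 3 or left.shape[2] != 3 or left.shape != right.shape:
        raise ValueError("expected two HxWx3 images of identical shape")
    stacked = np.concatenate([left, right], axis=2)           # H x W x 6
    return np.ascontiguousarray(stacked.transpose(2, 0, 1))   # 6 x H x W

# Hypothetical placeholder data standing in for the real left/right images
left = np.zeros((4, 8, 3), dtype=np.uint8)
right = np.full((4, 8, 3), 255, dtype=np.uint8)

chw = make_six_channel_input(left, right)
raw_bytes = chw.tobytes()  # could be written to a raw file for the board-side run
print(chw.shape, len(raw_bytes))
```

Feeding the same array to the PyTorch model and to the board-side run rules out a preprocessing mismatch as the cause of the difference.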