TDA2EXEVM: what is the limit on the number of groups in convolution

user6502096

Prodigy 80 points

Part Number: TDA2EXEVM
Other Parts Discussed in Thread: TDA2

Hi, I am using group convolution and the fps is very low. What is the limit on the number of groups in a convolution?

over 4 years ago

0 Praveen Eppa1 over 4 years ago

TI__Genius 17580 points

Hi,

The maximum number of supported groups in a convolution is 1024.

Thanks,

Praveen

0 user6502096 over 4 years ago in reply to Praveen Eppa1

Prodigy 80 points

Hi,

Thank you for your reply.

The number of groups I use in each convolution is less than 1024, but my fps is still very low. I think the problem is in the TIDL import tool.

How should I set conv2dKernelType in tidl_import_JDetNet.txt? Should it be set to 1 only when the conv size is less than 32*32, or are there other conditions?

Thanks

0 Praveen Eppa1 over 4 years ago in reply to user6502096

TI__Genius 17580 points

Hi,

Sorry for the late reply. Please set conv2dKernelType to 1 if the conv size is up to 64*64 and check whether it improves the fps.
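
As a rough illustration of that rule, the Python sketch below builds a conv2dKernelType line from a list of per-layer feature-map sizes. The helper names, the example sizes, and the assumption that the config takes one space-separated value per layer (1 where the map is at most 64x64, 0 otherwise) are illustrative only; please check the TIDL user guide for the exact format.

```python
# Hypothetical helper: pick conv2dKernelType per layer from feature-map size.
# Assumption: 1 for maps up to 64x64 (per the advice above), 0 otherwise.

def kernel_types(layer_sizes, limit=64):
    """Return one kernel-type flag per (height, width) layer size."""
    return [1 if h <= limit and w <= limit else 0 for h, w in layer_sizes]


def config_line(layer_sizes, limit=64):
    """Format the flags as a single space-separated config line."""
    flags = " ".join(str(f) for f in kernel_types(layer_sizes, limit))
    return f"conv2dKernelType = {flags}"


# Made-up example sizes (height, width), not taken from any real model.
example_sizes = [(256, 512), (128, 256), (64, 128), (64, 64), (32, 64), (16, 32)]
print(config_line(example_sizes))
```

Generating the line from the layer sizes avoids hand-counting flags when the network has many layers.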

Thanks.

Praveen

0 user6502096 over 4 years ago in reply to Praveen Eppa1

Prodigy 80 points

Hi,

Thank you for your reply. I have already tried setting conv2dKernelType to 1 where the conv size is up to 64*64, but the fps is still very low.

Are there any other restrictions?

Thanks.

0 Praveen Eppa1 over 4 years ago in reply to user6502096

TI__Genius 17580 points

Hi,

If you are running an SSD model, you may need to modify the parameters below as shown to overcome the low fps:

keep_top_k: 20
confidence_threshold: 0.15

More details are in the thread below.

https://e2e.ti.com/support/processors/f/791/t/689617
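
To see why these two parameters matter, here is a minimal Python sketch of the filtering they control: a higher confidence_threshold discards weak candidates early, and keep_top_k caps how many detections survive, so the post-processing has fewer boxes to handle. This is a simplified illustration with made-up scores and a hypothetical function name; it skips the NMS step that sits between the two filters in a typical SSD detection output layer, and it is not TIDL's code.

```python
# Illustrative only: how confidence_threshold and keep_top_k shrink the
# candidate list. Function name and scores are hypothetical.

def filter_detections(scores, confidence_threshold=0.15, keep_top_k=20):
    """Keep scores above the threshold, then at most keep_top_k of the best."""
    kept = [s for s in scores if s > confidence_threshold]
    kept.sort(reverse=True)
    return kept[:keep_top_k]


# 1000 made-up scores spread evenly over [0, 1).
scores = [i / 1000 for i in range(1000)]

# Loose settings keep many more candidates than the settings suggested above.
loose = filter_detections(scores, confidence_threshold=0.01, keep_top_k=200)
tight = filter_detections(scores, confidence_threshold=0.15, keep_top_k=20)
print(len(loose), len(tight))
```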

Thanks,

Praveen

0 user6502096 over 4 years ago in reply to Praveen Eppa1

Prodigy 80 points

Hi,

Thank you for your reply. Yes, I am running an SSD model. I have already tried setting keep_top_k to 20 and confidence_threshold to 0.15, but the fps is still very low.

Are there any other restrictions, or does the import tool have a problem?

Thanks.

0 Praveen Eppa1 over 4 years ago in reply to user6502096

TI__Genius 17580 points

Hi,

>> Are there any other restrictions, or does the import tool have a problem?

No, there are no other problems in the import tool. Grouped convolutions take more time to process in TIDL, so this could be the reason for the low fps. Kindly try with fewer groups in the convolutions.

Thanks,

Praveen

0 user6502096 over 4 years ago in reply to Praveen Eppa1

Prodigy 80 points

Hi,

Thank you for your reply.

Normally, using group convolution increases the execution speed of a model, so why does using group convolution in TIDL make the fps drop?
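
That expectation comes from simple arithmetic: with g groups, each output channel sees only 1/g of the input channels, so the weight count and the multiply-accumulates both drop by a factor of g. A small Python sketch with made-up layer dimensions (the function name is hypothetical) shows the calculation:

```python
# Illustrative arithmetic: weights and MACs of a k x k convolution layer.

def conv_cost(h_out, w_out, c_in, c_out, k, groups=1):
    """Return (weights, macs) for a k x k convolution, ignoring bias."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output channel connects to c_in / groups input channels.
    weights = c_out * (c_in // groups) * k * k
    # One multiply-accumulate per weight per output position.
    macs = weights * h_out * w_out
    return weights, macs


# Made-up layer: 64x128 output map, 64 -> 64 channels, 3x3 kernel.
dense = conv_cost(64, 128, 64, 64, 3, groups=1)
grouped = conv_cost(64, 128, 64, 64, 3, groups=4)
print(dense, grouped, dense[1] / grouped[1])
```

Fewer MACs only turn into higher fps if the implementation keeps the hardware equally busy for the grouped layers.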

By the way, the ssdJacintoNetV2 I trained has 3.25693e+06 total parameters and 3.6114 total Giga MACs, and it runs at 20 fps after the import tool. The SSD model I designed myself has fewer parameters (253024) and 1.4182 total Giga MACs, yet it runs at only 15 fps after the import tool. Why is it slower?

The settings follow the user guide. Are there any additional restrictions in TIDL or the import tool?

Thanks.

0 Praveen Eppa1 over 4 years ago in reply to user6502096

TI__Genius 17580 points

Hi,

Kindly share the import config file so that we can check it.

Thanks,

Praveen

0 user6502096 over 4 years ago in reply to Praveen Eppa1

Prodigy 80 points

Hi,

I have shared my import config file. Please test it with the import tool and check the fps.

Thanks.

https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/import_5F00_config_5F00_file.7z

0 Praveen Eppa1 over 4 years ago in reply to user6502096

TI__Genius 17580 points

Hi,

The import config file looks fine, and conv2dKernelType is set correctly. The fps reduction is mainly because of uneven and small tensor sizes. The kernels are better optimized for tensor sizes that are multiples of 8. I think the fps you got is the final number for your model.
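
One quick way to look for such layers is to list the tensor dimensions that are not multiples of 8. The Python sketch below does this for a hand-written table of layer shapes. The layer names and shapes are made up, and rounding up to the next multiple of 8 is only a rough proxy for wasted work, not a description of how TIDL handles these tensors internally.

```python
# Illustrative check: which layer shapes have a dimension that is not a
# multiple of 8? Names and shapes below are hypothetical.

def round_up(value, multiple=8):
    """Round value up to the next multiple."""
    return ((value + multiple - 1) // multiple) * multiple


def uneven_layers(shapes, multiple=8):
    """Return {name: (shape, rounded_shape)} for layers with an uneven dim."""
    result = {}
    for name, dims in shapes.items():
        padded = tuple(round_up(d, multiple) for d in dims)
        if padded != tuple(dims):
            result[name] = (tuple(dims), padded)
    return result


# (channels, height, width) per layer, made up for the example.
shapes = {
    "conv_a": (32, 64, 128),
    "conv_b": (20, 10, 19),
    "conv_c": (64, 5, 10),
}
for name, (orig, padded) in uneven_layers(shapes).items():
    print(name, orig, "->", padded)
```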

One last thing you can try is setting "nms_threshold: 0.4" in the deploy.prototxt.
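
For reference, nms_threshold is the overlap (intersection over union) above which a lower-scoring box is treated as a duplicate of a higher-scoring one and dropped, so a lower threshold suppresses more boxes. The Python sketch below is a minimal version of generic greedy NMS with made-up boxes and scores; it is not TIDL's implementation.

```python
# Generic greedy non-maximum suppression, for illustration only.

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, nms_threshold=0.4):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up boxes: 1 and 3 overlap box 0, box 2 is far away.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 6)]
scores = [0.9, 0.8, 0.7, 0.6]
print(nms(boxes, scores, 0.4))
print(nms(boxes, scores, 0.7))
```

With these boxes the lower threshold keeps fewer detections, which is why it can reduce the work done after NMS.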

Thanks,

Praveen

0 user6502096 over 4 years ago in reply to Praveen Eppa1

Prodigy 80 points

Hi,

Thank you for your reply.

In my model, the situation you mentioned starts from the pool3 layer, but this design is the same as ssdJacintoNet. So why is my fps lower than ssdJacintoNet's?

What is the principle behind nms_threshold: 0.4? I have tried setting nms_threshold to 0.4, but the fps has not changed.

Thanks.

0 Praveen Eppa1 over 4 years ago in reply to user6502096

TI__Genius 17580 points

Hi,

>> In my model, the situation you mentioned starts from the pool3 layer, but this design is the same as ssdJacintoNet. So why is my fps lower than ssdJacintoNet's?

Even though this situation starts from the pool3 layer, the earlier layers contain grouped conv layers, which are not well optimized on the TDA2 (as we do SIMD across numchannels), so there is performance degradation in your model.

Thanks,

Praveen

0 user6502096 over 4 years ago in reply to Praveen Eppa1

Prodigy 80 points

Hi,

Thank you for your reply.

>>In the earlier layers there are grouped conv layers which are not well optimized in the TDA2

But the maximum number of groups in any layer is only 4, which is the same as the maximum in ssdJacintoNet. Or is there a limit on the number of groups? If so, what is it?

Thanks.
