
TDA4VEN-Q1: TIDL Model DDR bandwidth Increases Dramatically with Minor Network Change

Part Number: TDA4VEN-Q1


Tool/software:

SDK Version: 10_00_08_00  edgeai tools

Model     Size      Inference Time on TDA4VEN   DDR Bandwidth (14 images per second)
Model 1   21G MAC   41 ms                       1706 Mb/s
Model 2   22G MAC   48.7 ms                     2488 Mb/s

Note: Model2 doesn't introduce any new convolution layers compared to model 1.

DDR bandwidth increased by 45%.
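For sanity, the ~45% figure can be reproduced from the table above (a quick arithmetic sketch; the per-frame traffic numbers are derived from the table, not measured separately):

```python
# Quick sanity check of the DDR bandwidth increase reported above.
bw_model1 = 1706  # Mb/s, Model 1 at 14 images per second
bw_model2 = 2488  # Mb/s, Model 2 at 14 images per second

increase_pct = (bw_model2 - bw_model1) / bw_model1 * 100
print(f"DDR bandwidth increase: {increase_pct:.1f}%")  # ~45.8%, i.e. roughly 45%

# Per-frame DDR traffic at 14 frames per second:
print(f"Model 1: {bw_model1 / 14:.1f} Mb per frame")
print(f"Model 2: {bw_model2 / 14:.1f} Mb per frame")
```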

This issue was tracked by the TI support team as follows:

2024.12.13:   All materials have been submitted to the TI support team.

  • Important Update:

    We have found a way to decrease DDR bandwidth by setting "high_resolution_optimization" to True during model compilation. The updated results are as follows:

    Model     high_resolution_optimization   Size      Inference Time on TDA4VEN   DDR Bandwidth (14 images per second)
    Model 1   False                          21G MAC   41 ms                       1706 Mb/s
    Model 1   True                           21G MAC   53 ms                       2500 Mb/s
    Model 2   False                          22G MAC   48.7 ms                     2488 Mb/s
    Model 2   True                           22G MAC   42.2 ms                     1600 Mb/s

    However, the option only helped Model 2; Model 1 got worse.
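For reference, this kind of option is typically passed through the compile-options dict when compiling with edgeai-tidl-tools. The sketch below is only an illustration: the key name "advanced_options:high_resolution_optimization" and the paths are assumptions based on the edgeai-tidl-tools examples, so please check the osrt_python examples shipped with your SDK version.

```python
# Sketch of enabling the option at compile time via onnxruntime's TIDL
# compilation provider (edgeai-tidl-tools). Key names and paths below are
# assumptions, not verified against SDK 10_00_08_00.
compile_options = {
    "tidl_tools_path": "/path/to/tidl_tools",    # hypothetical path
    "artifacts_folder": "/path/to/artifacts",    # hypothetical path
    "advanced_options:high_resolution_optimization": 1,  # 1 = enable, 0 = disable
}

# Typical usage (commented out; requires onnxruntime with TIDL support):
# import onnxruntime as rt
# sess = rt.InferenceSession(
#     "model.onnx",
#     providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
#     provider_options=[compile_options, {}],
# )
```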

    To conclude, we have the following questions:

    1. Can the TI R&D team advise a better solution to this issue?
    2. Can the TI R&D team advise when we should use the "high_resolution_optimization" option? In other words, what is defined as high resolution?
    3. Can the TI R&D team advise whether there is any potential issue when setting this option to True? (It does not work well with Model 1.)
    4. Does the ONNX version have anything to do with inference time / DDR bandwidth?
  • Hi Yu;

    I have forwarded your question to the R&D team. They will provide answers to you soon.

    Thanks and regards

    Wen Li

  • Hi Yu Chen;

    When and how did you upload the materials?

    Could you please use the same ticket to upload the info, or update the existing ticket? If you open multiple tickets for the same problem, the system gets confused. To get your questions answered quickly, please use one ticket per question. If you have a new question, please open a new ticket.

    Thank you for your patience!

    Wen Li

  • Hi Li,

    I sent email to you, please help check.

    Regards

    Joe

  • Hi Yu,

    As far as I know, high_resolution_optimization is related to dynamic padding and is still an experimental feature.

    The ONNX version has little effect on inference time/DDR bandwidth if the operators perform the same function.

    Regards,

    Adam

  • Hi Yu,
    1. Even though the convolution operators are the same in Model 1 and Model 2, there are differences in the other operators used and in the branches present in each model. The extra branches create different data-dependency patterns, which affect the memory space (DDR vs. on-chip memory) into which each layer's output is written. This eventually affects both the inference time and the DDR bandwidth.
    2. Enabling "high_resolution_optimization" is always expected to give the same or better DDR bandwidth compared to when it is disabled. However, this feature is not stable at this point in time, so it may give unexpected results in certain situations.
    3. No, the ONNX version does not affect inference time/DDR bandwidth.

    Regards
    Febin

  • Hi Febin,

    Thanks for your response.

    1. Yes, Model 2 has a new branch compared to Model 1, but it is relatively small (1G MAC) compared to the entire network. However, the DDR bandwidth has increased by 45%. Do you mean this is mainly due to the increase in the number of outputs? We would like to understand the key factors that significantly increase DDR consumption so that we can use them as guidelines when designing networks.

    2. When you mention that it may "give unexpected results in certain situations," do you mean that the same network could produce unexpected results even if it is correct most of the time? Or do you mean that unexpected results may occur only with certain network structures?

    Looking forward to your insights.

  • Update in today's call:

    1. The high_resolution_optimization option only impacts the specific model it is applied to. So, if the model has been verified OK, it should be OK to use the option.

    2. We still need to analyze the differences between the current models to figure out which factors impact DDR BW. These can then be used to optimize later models.

    @ Febin, please help with further analysis of the models and share what contributes most to the DDR BW. Thanks.

  • Hi Yu,
    DDR bandwidth depends on many aspects of network dynamics. I will try to describe some factors:
    1. Any layer whose output tensor is larger than the on-chip memory will have that tensor written to and read from DDR.
    2. When there are parallel branches in the network, the output tensor of the root layer has to be held in memory until all of its consumer branches have been processed. The root layer's output tensor may be stored on-chip, hogging the memory and forcing some of the layers in each branch (depending on their tensor sizes) to be stored in DDR.
    3. If any convolution layer in the parallel branches has a large weight tensor, it can also contribute significantly to DDR bandwidth.
    Regards
    Febin
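Factor 1 above can be turned into a quick pencil-and-paper check during network design. A minimal sketch, assuming 8-bit quantized tensors and a hypothetical 2 MiB on-chip budget (the layer shapes and the on-chip size are placeholders, not actual TDA4VEN-Q1 values):

```python
# Rough check of which layer outputs are likely to spill to DDR (factor 1).
# The on-chip size and layer shapes below are hypothetical placeholders.
ON_CHIP_BYTES = 2 * 1024 * 1024  # assumed on-chip budget: 2 MiB

def output_bytes(height, width, channels, bytes_per_elem=1):
    """Size of a layer's output tensor (8-bit quantized by default)."""
    return height * width * channels * bytes_per_elem

# Hypothetical layers: (height, width, channels)
layers = {
    "conv1": (512, 256, 32),
    "conv5": (64, 64, 128),
    "head":  (32, 32, 64),
}

for name, (h, w, c) in layers.items():
    size = output_bytes(h, w, c)
    placement = "DDR" if size > ON_CHIP_BYTES else "on-chip"
    print(f"{name}: {size} bytes -> {placement}")
```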

  • Hi Yu,
    We have recently added some optimisations for situations where DDR bandwidth increases when a model has multiple parallel branches.
    I tested your models on top of this optimisation, and I am getting similar inference time and DDR bandwidth for both Models 1 and 2.
    This optimisation will be part of our next SDK release.

    Regards
    Febin

  • Hi Febin,

    That sounds good! I think we can close this issue for now. We will follow up after the next SDK is available.

    Also, thanks for the insights about the factors affecting DDR bandwidth.