
TDA4VEN-Q1: TIDL Model DDR bandwidth Increases Dramatically with Minor Network Change

Part Number: TDA4VEN-Q1


Tool/software:

SDK Version: 10_00_08_00  edgeai tools

Model     Size      Inference Time on TDA4VEN   DDR Bandwidth (14 images per second)
Model 1   21G MAC   41 ms                       1706 Mb/s
Model 2   22G MAC   48.7 ms                     2488 Mb/s

Note: Model2 doesn't introduce any new convolution layers compared to model 1.

DDR bandwidth increased by 45%.
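For sanity, the ~45% figure can be reproduced from the table above (a quick arithmetic sketch; the per-frame traffic numbers are derived from the table, not measured separately):

```python
# Quick sanity check of the DDR bandwidth increase reported above.
bw_model1 = 1706  # Mb/s, Model 1 at 14 images per second
bw_model2 = 2488  # Mb/s, Model 2 at 14 images per second

increase_pct = (bw_model2 - bw_model1) / bw_model1 * 100
print(f"DDR bandwidth increase: {increase_pct:.1f}%")  # ~45.8%, i.e. roughly 45%

# Per-frame DDR traffic at 14 frames per second:
print(f"Model 1: {bw_model1 / 14:.1f} Mb per frame")
print(f"Model 2: {bw_model2 / 14:.1f} Mb per frame")
```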

This issue was tracked by the TI support team as follows:

2024.12.13:   All materials have been submitted to the TI support team.

  • Important Update:

    We have found a way to decrease DDR bandwidth by setting "high_resolution_optimization" to True during model compilation. The updated results are as follows:

    Model     high_resolution_optimization   Size      Inference Time on TDA4VEN   DDR Bandwidth (14 images per second)
    Model 1   False                          21G MAC   41 ms                       1706 Mb/s
    Model 1   True                           21G MAC   53 ms                       2500 Mb/s
    Model 2   False                          22G MAC   48.7 ms                     2488 Mb/s
    Model 2   True                           22G MAC   42.2 ms                     1600 Mb/s

    However, the option only helped Model 2; Model 1 got worse.
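For reference, this kind of option is typically passed through the compile-options dict when compiling with edgeai-tidl-tools. The sketch below is only an illustration: the key name "advanced_options:high_resolution_optimization" and the paths are assumptions based on the edgeai-tidl-tools examples, so please check the osrt_python examples shipped with your SDK version.

```python
# Sketch of enabling the option at compile time via onnxruntime's TIDL
# compilation provider (edgeai-tidl-tools). Key names and paths below are
# assumptions, not verified against SDK 10_00_08_00.
compile_options = {
    "tidl_tools_path": "/path/to/tidl_tools",    # hypothetical path
    "artifacts_folder": "/path/to/artifacts",    # hypothetical path
    "advanced_options:high_resolution_optimization": 1,  # 1 = enable, 0 = disable
}

# Typical usage (commented out; requires onnxruntime with TIDL support):
# import onnxruntime as rt
# sess = rt.InferenceSession(
#     "model.onnx",
#     providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
#     provider_options=[compile_options, {}],
# )
```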

    To conclude, we have the following questions:

    1. Can the TI R&D team advise a better solution to this issue?
    2. Can the TI R&D team advise when we should use the "high_resolution_optimization" option? In other words, what is defined as high resolution?
    3. Can the TI R&D team advise whether there is any potential issue when setting this option to True? (It does not work well with Model 1.)
    4. Does the ONNX version have anything to do with inference time / DDR bandwidth?
  • Hi Yu;

    I have forwarded your question to the R&D team. They will provide answers to you soon.

    Thanks and regards

    Wen Li

  • Hi Yu Chen;

    When and how did you upload the materials?

    Could you please use the same ticket to upload the info, or update the existing ticket? If you open multiple tickets for the same problem, the system gets confused. To get your questions answered quickly, please use one ticket per question. If you have a new question, please open a new ticket.

    Thank you for your patience!

    Wen Li

  • Hi Li,

    I sent email to you, please help check.

    Regards

    Joe

  • Hi Yu,

    As far as I know, high_resolution_optimization is related to dynamic padding and is still an experimental feature.

    The ONNX version has little effect on inference time/DDR bandwidth if the operators perform the same function.

    Regards,

    Adam

  • Hi Yu,
    1. Even though the convolution operators are the same in Model 1 and Model 2, there are differences in the other operators used and in the branches present in each model. The extra branches create different data-dependency patterns, which affect the memory space (DDR vs. on-chip memory) into which each layer's output is written. This eventually affects both the inference time and the DDR bandwidth.
    2. Enabling "high_resolution_optimization" is always expected to give the same or better DDR bandwidth compared to when it is disabled. However, this feature is not stable at this point in time, so it may give unexpected results in certain situations.
    3. No, the ONNX version does not affect inference time/DDR bandwidth.

    Regards
    Febin

  • Hi Febin,

    Thanks for your response.

    1. Yes, Model 2 has a new branch compared to Model 1, but it is relatively small (1G MAC) compared to the entire network. However, the DDR bandwidth has increased by 45%. Do you mean this is mainly due to the increase in the number of outputs? We would like to understand the key factors that significantly increase DDR consumption so that we can use them as guidelines when designing networks.

    2. When you mention that it may "give unexpected results in certain situations," do you mean that the same network could produce unexpected results even if it is correct most of the time? Or do you mean that unexpected results may occur only with certain network structures?

    Looking forward to your insights.

  • Update in today's call:

    1. The high_resolution_optimization option only impacts the specific model it is applied to. So, if the model has been verified OK, it should be OK to use the option.

    2. We still need to analyze the differences between the current models to figure out which factors impact DDR BW. These can then be used to optimize later models.

    @ Febin, please help with further analysis of the models and share what contributes most to the DDR BW. Thanks.

  • Hi Yu,
    DDR bandwidth depends on many aspects of network dynamics. I will try to describe some factors:
    1. Any layer whose output tensor is larger than the on-chip memory will have that tensor written to and read from DDR.
    2. When there are parallel branches in the network, the output tensor of the root layer has to be held in memory until all of its consumer branches have been processed. The root layer's output tensor may be stored on-chip, hogging the memory and forcing some of the layers in each branch (depending on their tensor sizes) to be stored in DDR.
    3. If any convolution layer in the parallel branches has a large weight tensor, it can also contribute significantly to DDR bandwidth.
    Regards
    Febin
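Factor 1 above can be turned into a quick pencil-and-paper check during network design. A minimal sketch, assuming 8-bit quantized tensors and a hypothetical 2 MiB on-chip budget (the layer shapes and the on-chip size are placeholders, not actual TDA4VEN-Q1 values):

```python
# Rough check of which layer outputs are likely to spill to DDR (factor 1).
# The on-chip size and layer shapes below are hypothetical placeholders.
ON_CHIP_BYTES = 2 * 1024 * 1024  # assumed on-chip budget: 2 MiB

def output_bytes(height, width, channels, bytes_per_elem=1):
    """Size of a layer's output tensor (8-bit quantized by default)."""
    return height * width * channels * bytes_per_elem

# Hypothetical layers: (height, width, channels)
layers = {
    "conv1": (512, 256, 32),
    "conv5": (64, 64, 128),
    "head":  (32, 32, 64),
}

for name, (h, w, c) in layers.items():
    size = output_bytes(h, w, c)
    placement = "DDR" if size > ON_CHIP_BYTES else "on-chip"
    print(f"{name}: {size} bytes -> {placement}")
```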

  • Hi Yu,
    We have recently added some optimisations for situations where DDR bandwidth increases when a model has multiple parallel branches.
    I tested your models on top of this optimisation, and I am getting similar inference time and DDR bandwidth for both Models 1 and 2.
    This optimisation will be part of our next SDK release.

    Regards
    Febin

  • Hi Febin,

    That sounds good! I think we can close this issue for now. We will follow up after the next SDK is available.

    Also, thanks for the insights about the factors affecting DDR bandwidth.