TDA4VE-Q1: Efficiency issues of TDA4VE TIDL

Part Number: TDA4VE-Q1
Other Parts Discussed in Thread: TDA4VM

Hi TI experts,

We ran the same model, configuration, and program on TDA4VM (SDK 8.0) and TDA4VE (SDK 9.2) respectively (both boards have 2 GB of DDR), and found that inference on TDA4VE takes nearly 20 ms longer than on TDA4VM. From a configuration perspective, however, the TDA4VE's MMA is an improvement over the TDA4VM's, so the actual test results are the opposite of what we expected. Please help us confirm the reason. Thank you.

Here is the result on TDA4VM (SDK 8.0):

Here is the result on TDA4VE (SDK 9.2):

  • Hello,

    Thank you for posting your question to the forum. 

    However, from the perspective of configuration, TDA4VE has an improvement in MMA compared to TDA4VM

    Would you be able to clarify your statement here?

    TDA4VM (SDK8.0) and TDA4VE (SDK9.2)

    This comparison is a combination of both SDK versions and SoCs which I do not believe is something we would track. Would you be able to see what the behavior is if running TDA4VM with SDK 9.2 to eliminate a variable?

    Best,

    Asha

  • Hello Asha,

    Thanks for your reply.

    Would you be able to clarify your statement here?

    Here is the difference between the two SoCs:

    This comparison is a combination of both SDK versions and SoCs which I do not believe is something we would track. Would you be able to see what the behavior is if running TDA4VM with SDK 9.2 to eliminate a variable?


    We have already completed several projects using TDA4VM (SDK 8.0). For cost and other reasons, we need to move to TDA4VE (SDK 9.2). Although we haven't had any major issues with other features after porting, the running efficiency of this model is not as good as before. We need to solve this problem on TDA4VE, and we may not be able to run the same comparison on TDA4VM. We hope you can help us find the reason for the reduced running efficiency on TDA4VE. Looking forward to your reply.

    Best,

    Sam

  • Hi Asha,

    How is the investigation progressing? Looking forward to your reply.

    Best,

    Sam

  • Hi Sam,

    As we need to locate whether it is SDK problem or hardware problem, could you try running this model on 9.2 on TDA4VM? 

    Regards,

    Adam

  • Hi Adam,

    Thanks for your reply.

    We have aligned internally on this. Given the current project timeline and staffing, and as I mentioned earlier, we have decided to use TDA4VE for cost and other reasons. We are sorry, but we will not be trying SDK 9.2 on TDA4VM.

    In other words, how can we improve the model's runtime efficiency on TDA4VE? We tried mixed quantization, but the model accuracy could not be guaranteed. We have also tried the methods in the TI Deep Learning Product User Guide, but they still do not work. How do we choose which layers need to use 16 bits via _paramDebuG.csv?

    Regards,

    Sam

  • Hi Sam,

    If you can provide the original model, we can test 9.2 on TDA4VM. Check this doc for 16-bit layer configuration: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/tidl_fsg_quantization.md . However, 8-bit is enough for most models. Can you identify which layer produces a large error in 8-bit?
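    For reference, here is a minimal sketch of how specific layers can be moved to 16 bit during import with edgeai-tidl-tools (assuming the ONNX runtime flow; the paths and layer names below are placeholders, not taken from the attached model):

```python
# Sketch: selecting 16-bit layers at import time via edgeai-tidl-tools
# compilation options. Layer names and paths are illustrative placeholders.

def make_compile_options(tidl_tools_path, artifacts_dir, layers_16bit):
    """Build TIDL compilation provider options with selected layers in 16 bit."""
    return {
        "tidl_tools_path": tidl_tools_path,
        "artifacts_folder": artifacts_dir,
        "tensor_bits": 8,  # keep the network in 8 bit overall...
        # ...but force the listed layers' activations and params to 16 bit
        "advanced_options:output_feature_16bit_names_list": ",".join(layers_16bit),
        "advanced_options:params_16bit_names_list": ",".join(layers_16bit),
    }

opts = make_compile_options("/opt/tidl_tools", "./artifacts", ["Conv_12", "Conv_45"])
# The options are then passed as provider_options when creating the session, e.g.:
# import onnxruntime as rt
# sess = rt.InferenceSession("model.onnx",
#         providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
#         provider_options=[opts, {}])
print(opts["advanced_options:output_feature_16bit_names_list"])  # → Conv_12,Conv_45
```

    To my understanding, the corresponding fields in a TIDL-RT import config are `outputFeature16bitNamesList` and `params16bitNamesList`.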

    Regards,

    Adam

  • Hi Adam,


    obs.rar The attached file is our original model. One more thing: we ran the test with 16-bit quantization.

    We would like to confirm:
    1. Can the EVM board use edgeai-tidl-tools to quantize the model? As far as we know, edgeai-tidl-tools can only be used on SK boards, while EVM boards can only use the TIDL-RT importer tool, which is what we have been using. Which tool is recommended?
    2. Can the TIDL-RT importer achieve the same effect as edgeai-tidl-tools? Some options are only available in edgeai-tidl-tools, such as `advanced_options:mixed_precision_factor`.

    Regards,

    Sam

  • Hi Sam,

    I apologize for not being responsive on your thread. It looks like your ultimate goal right now is to improve the accuracy and/or performance of your model with the 9.2 SDK on TDA4VE? I will provide some general guidance on steps that you can take. But to answer your questions first:

    1. Yes, you can use edgeai-tidl-tools on an EVM board - it is not exclusive to the SK boards. The edgeai-tidl-tools folder is not on the default filesystem that gets built when you follow the documentation from Vision Apps, so you will need to copy the folder into your root filesystem. The most recommended tool moving forward is edgeai-tidl-tools, as it is more straightforward to use and to debug issues with.

    2. Most of the functionality remains the same, but features may be named differently. We have a high-level comparison chart here. Both TIDL-RT and edgeai-tidl-tools support mixed-precision import; see the documentation for TIDL-RT and the documentation for edgeai-tidl-tools.

    Some general guidance regarding performance -

    Can you clarify whether you are trying to improve accuracy (numerical values) or runtime performance (how long the model takes to run)? In your original post you refer to frame time; however, the fact that you are exploring mixed precision indicates to me that you are also debugging accuracy.

    In terms of improving accuracy and/or performance, you can look at our documentation on quantization and our overall troubleshooting guide.

    For accuracy issues, follow the process documented in the troubleshooting guide under "Steps to Debug Functional Mismatch in Host Emulation" and "Feature Map Comparison with Reference".

    For runtime performance issues, generate the layer cycles as documented in the troubleshooting guide under "Troubleshooting for performance issues".
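    As a sketch of the runtime side (assuming the edgeai-tidl-tools ONNX runtime flow; the artifacts path is a placeholder), layer-level tracing is typically enabled through the `debug_level` runtime option:

```python
# Sketch: runtime options for the TIDL execution provider with per-layer
# tracing enabled. debug_level >= 1 emits layer-level timing information
# during inference; 0 keeps the output quiet.

def make_runtime_options(artifacts_dir, debug_level=1):
    return {
        "artifacts_folder": artifacts_dir,  # compiled model artifacts from import
        "debug_level": debug_level,
    }

rt_opts = make_runtime_options("./artifacts")
# The options are passed as provider_options when creating the session, e.g.:
# import onnxruntime as rt
# sess = rt.InferenceSession("model.onnx",
#         providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
#         provider_options=[rt_opts, {}])
print(rt_opts["debug_level"])  # → 1
```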

    Best,

    Asha

  • Hi Asha,

    Thanks for your reply and guidance. Our ultimate goal was initially to confirm whether there are fundamental software or hardware differences between TDA4VM and TDA4VE. However, due to the project timeline, we must try to improve the model's performance on TDA4VE, so our current goal is to ensure the accuracy and performance of the model on TDA4VE. We followed the TIDL importer guidelines to attempt mixed quantization, but the accuracy and precision of the test results decreased significantly. We will try using edgeai-tidl-tools for mixed quantization and will share our progress.

    Regards,

    Sam

  • Hi Sam,

    To clarify, you have checked your model with 16-bit quantization. Was this sufficient in terms of accuracy and precision? Just want to make sure we are starting with an appropriate baseline when investigating mixed precision. 

    Best,

    Asha

  • Hi Asha,

    We have recently been experimenting with mixed-precision quantization with Adam. The original purpose of this post was to determine whether the performance of TDA4VE is inferior to TDA4VM; let's focus on that topic. Thanks!

    Regards,

    Sam

  • Hi Sam,

    From a hardware/datasheet spec perspective, I would not expect to see a significant performance drop between TDA4VM and TDA4VE. In both instances you would be running your model on 1 C7x+MMA core. 

    From a software perspective, for the model you provided, to debug further: could you run inference on the 8.0 SDK with debugTraceLevel = 1 and report back the layer-level cycles? This can then be compared with the 9.2 values to see if there are any significant differences.
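    Once both traces exist, the per-layer numbers can be diffed mechanically. A minimal sketch, assuming each trace has already been reduced to a layer-name-to-cycles mapping (the layer names and counts below are made up for illustration, not taken from your model):

```python
# Sketch: comparing per-layer cycle counts from two SDK runs to find
# the layers that regressed the most. Inputs are {layer_name: cycles} dicts.

def top_regressions(cycles_old, cycles_new, n=5):
    """Return the n layers common to both runs with the largest cycle increase."""
    common = set(cycles_old) & set(cycles_new)
    deltas = {name: cycles_new[name] - cycles_old[name] for name in common}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Illustrative numbers only:
sdk80 = {"Conv_1": 120_000, "Conv_2": 340_000, "Concat_1": 15_000}
sdk92 = {"Conv_1": 125_000, "Conv_2": 510_000, "Concat_1": 15_500}
print(top_regressions(sdk80, sdk92, n=2))
# → [('Conv_2', 170000), ('Conv_1', 5000)]
```

    Layers that dominate the delta are the ones worth inspecting first, e.g. for a changed memory placement or kernel selection between SDK versions.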

    Best,

    Asha

  • Hi Asha,

    I have tested their model on TDA4VM and TDA4VE with edgeai-tidl-tools. With default settings, the VM allocates more MSMC than the VE and indeed has better latency. After setting the same MSMC size, the VE is better than the VM. So the customer's problem may simply come from the MSMC size.

    Regards,

    Adam

  • Hi Adam,

    Thank you for helping out with debug for this issue, I appreciate it. 

    If the settings had TDA4VM using more MSMC space than TDA4VE in the comparison, I would not be surprised that latency was higher for TDA4VE, assuming that more DDR memory was used as a result.

    Can we close this thread as a result of your investigation? Then open a new one if experiencing difficulties with mixed precision.

    Best,

    Asha