TDA2EXEVM: TIDL usecase performance

Part Number: TDA2EXEVM


Hi,

I am using PROCESSOR_SDK_VISION_03_03_00_00 and implementing object detection (OD) on TIDL.

I use the jacintoNet 512x512 model from caffe-jacinto-models.

I evaluated the non-sparse model on TDA2x with the TIDL test bench, and the reported TSC cycle count is 44,220,579,576 ≈ 44.22 G cycles per frame.

I assume that if the EVE core operates at 650 MHz, the fps of this model is 650 M / 44.22 G ≈ 0.015 on one EVE (about 68 s per frame).
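
Just to show my arithmetic, here is a minimal sketch of that estimate (assuming the TSC count reported by the test bench is cycles per frame and the EVE runs at 650 MHz; the numbers are simply the ones quoted above):

    /* Rough fps estimate from the TIDL test-bench cycle count (sketch only). */
    #include <stdio.h>

    int main(void)
    {
        const double eveClkHz       = 650e6;          /* assumed EVE clock: 650 MHz      */
        const double cyclesPerFrame = 44220579576.0;  /* TSC cycles reported for 1 frame */

        double secPerFrame = cyclesPerFrame / eveClkHz;   /* about 68 s per frame   */
        double fps         = 1.0 / secPerFrame;           /* about 0.015 fps on EVE */

        printf("%.1f s/frame -> %.3f fps\n", secPerFrame, fps);
        return 0;
    }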

It's very slow.

But if I run this model directly in the OD use-case, the non-sparse model runs at almost 15 fps.

Compared with the original evaluation, that is very fast.

1. Why is there such a big difference? Is my model evaluation on the TIDL test bench wrong?

2. In the use-case, I compared the speed of the sparse and non-sparse models.

The sparsity of the sparse model is 67%.

But in the use-case there is almost no speed-up (both run at almost 15 fps).

Why? Is the sparsity not enough?

Or do I need to modify some configuration inside the use-case?

Thank you!

Best Regards,

Eric Lai 

  • Hi,

    Please check the thread below, where a similar issue is resolved:
    e2e.ti.com/.../681674

    Thanks,
    Praveen
  • Hi Praveen,

    I think this is a different question.
    I evaluated the TSC cycles on the TIDL test bench.
    But when I run the model in the use-case, it is very fast compared with that evaluation.
    I want to understand the difference.
    My second question is about the speed-up of the sparse model over the non-sparse model.
    Please give me more information.
    Thank you.

    Best Regards,
    Eric Lai
  • Hi Eric Lai,

    Did you run all the layers of the SSD network on EVE? In that case it can run very slowly, because the detection output layers are not optimal on EVE.

    So, you need to run the initial part of the network on EVE and the later part on DSP. Please use the layersGroupId parameter in the import config to specify the layer-to-core mapping, EVE (1) or DSP (2), as in the sketch below.
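
    For example, the relevant import config line could look something like this (the layer count and the exact EVE/DSP split point below are purely illustrative; please derive the real layer indices from your own import log):

        layersGroupId = 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 0

    Here 1 maps a layer to EVE, 2 maps it to DSP, and (as I recall) 0 is used for the data layers.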

    Also, please refer to Table 4B in TIDeepLearningLibrary_DataSheet.pdf for the SSD performance numbers.

    Thanks,
    Praveen
  • Hi Praveen,

    Okay, thank you.
    I will set the parameter and test it again.
    I have read the datasheet.
    From Table 1, TIDL Convolution Layer Performance,
    we can see that sparsity can reduce the EVE operation cycles a lot.
    For my models, one is a non-sparse model and the other is a sparse model with almost 70% sparsity.
    But when they run in the use-case, why is there no difference between them?
    I think the sparse model should be faster.

    Best Regards,
    Eric Lai
  • Yes, the sparse model should be faster, but only for larger image sizes. So if the resolution of the image is small (< 64x64), you may not see much difference.

    Could you share the import config file and log file for analysis?

    Thanks,
    Praveen
  • Hi Praveen,

    Here is my log file.

    By the way, for the conv2dKernelType parameter, I tested a lot of combinations:

    setting all to zero, setting all to one, and so on.

    But the performance is almost the same in the use-case.
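
    For example, the variants I tried look roughly like this in the import config (the layer count here is shortened for illustration; there is one entry per layer, and as far as I understand 0 selects the sparse convolution kernel and 1 the dense one):

        conv2dKernelType = 0 0 0 0 0 0 0 0 0 0 0 0
        conv2dKernelType = 1 1 1 1 1 1 1 1 1 1 1 1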

    Thank you.

    Log_file.zip

    Best Regards,

    Eric Lai

  • Hi Eric Lai,

    Thanks for sharing logs.

    From the import logs of the sparse and non-sparse models, I see that there is a change in MACs, so this should affect the performance. See below.
    From the sparse model import_log.txt for layer 2:
    Layer 2 : Max PASS : -2147483648 : 122877 Out Q : 5524 , 123359, TIDL_ConvolutionLayer, PASSED #MMACs = 157.29, 89.78, 98.04, Sparsity : 37.67, 42.92

    From the non-sparse model import_log.txt for layer 2:
    Layer 2 : Max PASS : -2147483648 : 120750 Out Q : 5444 , 121224, TIDL_ConvolutionLayer, PASSED #MMACs = 157.29, 134.15, 144.18, Sparsity : 8.33, 14.71

    So, we can clearly see that there is an increase in MACs and a decrease in sparsity in the non-sparse model compared to the sparse model, and this should have some impact on the performance. Can you please re-run once and double-check the performance in the use-case?
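
    As a rough sanity check of the scale (simple arithmetic on the quoted lines): for layer 2 the effective MACs drop from 134.15 M in the non-sparse model to 89.78 M in the sparse model, roughly a 1.5x reduction in convolution work, so some speed-up would be expected if the convolution layers dominate the frame time.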

    Thanks,
    Praveen
  • Hi Praveen,

    From the import log file, the first value of #MMACs is the same.
    Do you mean the second and third values of #MMACs?
    Do the second and third values represent the actual MACs based on the sparsity?
    I re-ran the use-case for these two models, but the problem is the same.
    You can refer to the above log file to see the use-case log.
    I also modified the parameter TIDL_OD_FPS_OPPNOM in chains_tidlOD.c to a bigger value.
    But it doesn't influence the result.
    Have you tried the influence of sparsity? Does it increase performance a lot?


    Best Regards,
    Eric Lai
  • Hi Praveen,

    From the import log, the first value of #MMACs is the same.
    Do you mean the second and third values of #MMACs?
    Is that the actual MAC count based on the sparsity?
    I re-ran the use-case, but the result is the same.
    You can refer to the use-case log inside the Log_file.zip above.
    Have you tried the influence of sparsity on the OD use-case?
    Does it increase performance a lot?

    Best Regards,
    Eric Lai
  • Hi Praveen,

    Also, I modified the parameter TIDL_OD_FPS_OPPNOM inside chains_tidl_OD.c.
    But it doesn't influence the result.

    Best Regards,
    Eric Lai
  • Hi Eric Lai,

    >> Do you mean the second and third values of #MMACs?
    >> Is that the actual MAC count based on the sparsity?
    Yes, this MAC count is based on the sparsity.

    >> Have you tried the influence of sparsity on the OD use-case?
    No. Can you please share both models? I will run them and check the performance.

    Thanks,
    Praveen
  • Hi Praveen,

    Here are my models.

    By the way, I modified the parameter TIDL_OD_FPS_OPPNOM inside chains_tidl_OD.c.

    But it doesn't influence the result.

    I'm looking forward to your answer.

    Best Regards,

    Eric Lai

    model.zip

  • Hi Eric Lai,

    From the OD use-case logs (TIDL_OD_usecase_log.txt),
    the sparse model time is:
    [IPU1-0] 99.542640 s: Local Link Latency : Avg = 150382 us, Min = 149820 us, Max = 151101 us,
    [IPU1-0] 99.543158 s: Source to Link Latency : Avg = 164672 us, Min = 163698 us, Max = 166260 us,

    and the non-sparse model time is:
    [IPU1-0] 132.031720 s: Local Link Latency : Avg = 263246 us, Min = 262582 us, Max = 264137 us,
    [IPU1-0] 132.032025 s: Source to Link Latency : Avg = 305553 us, Min = 276337 us, Max = 376929 us,

    So, there is a performance difference between sparse and non-sparse. Right?
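
    As a rough back-of-the-envelope check (assuming a fully serial, one-frame-at-a-time flow, which is not how the pipelined use-case actually runs): 1 / 150 ms ≈ 6.6 fps for the sparse model versus 1 / 263 ms ≈ 3.8 fps for the non-sparse model, so the latency numbers alone already show roughly the expected gap even if the reported fps does not.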

    Thanks,
    Praveen
  • Hi Praveen,

    Thank you for your response.

    Yes, I see the difference in the latency.

    And I thought the latency is the time until we get the first output value.

    Am I right?

    But I care more about the fps. Why can I not see an obvious difference in the fps?

    From the figure below, we can see the big difference in the fps.

    Best Regards,

    Eric Lai

  • Hi Eric Lai,

    I do not know much about VSDK profiling details.

    Could you please create a new thread with this TIDL_OD_usecase_log and ask the same question about the difference in fps? The VSDK experts will answer on that new thread.

    Thanks,
    Praveen
  • Hi Praveen,

    Okay, I will ask the question again.
    Thank you so much for your great help.

    Best Regards,
    Eric Lai