TDA2EVM5777: Theoretical Performance Evaluation for Custom Deep Learning Model to run TDA2xx Platform

Part Number: TDA2EVM5777


Hi,

I want to understand both the theoretical and the actual performance evaluation of a DL model. How can I make sure that the designed model will run within the available processing power of EVE?

Assume that a model requires 1.6 GMACs. According to the EVE datasheet, EVE can process 16 MACs per cycle.
So, considering EVE frequency = 535 MHz:
Execution time = (Total MACs required / 16) * (1 / 535 MHz) = 1.6 G / (16 * 535 M) ≈ 0.187 seconds
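The same arithmetic can be written as a small Python sketch. The function and parameter names are only illustrative (they do not come from any TI tool), and the result is a peak-rate figure that assumes 16 MACs in every cycle.

```python
def peak_execution_time_s(total_macs, macs_per_cycle=16, freq_hz=535e6):
    """Ideal execution time, assuming the peak MAC rate is sustained every cycle."""
    cycles = total_macs / macs_per_cycle   # total cycles at the peak rate
    return cycles / freq_hz                # seconds = cycles / (cycles per second)

# 1.6 GMACs at 16 MACs/cycle on a 535 MHz EVE
t = peak_execution_time_s(1.6e9)
print(f"{t:.3f} s")   # prints "0.187 s"
```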

From my previous E2E query, https://e2e.ti.com/support/arm/automotive_processors/f/1021/t/694669

The assumption of 16 MACs per cycle throughout the network is wrong; a different MACs/cycle value should be taken into consideration for each layer. But the datasheet lists MACs per cycle only for specific configurations of the different layers.

If one has to design a custom CNN architecture with a different input size, stride, and kernel size, how can one theoretically calculate the total number of cycles required to run the full network?

Layer No | Layer Type            | Input Shape (N C H W) | Kernel Size | Stride | Output Shape (N C H W) | MACs/Cycle | #MMACs | Total MCycles
1        | TIDL_BatchNormLayer   | 1 3 36 64             |             |        | 1 3 36 64              |            | 0.01   |
2        | TIDL_ConvolutionLayer | 1 3 36 64             | 3           | 1      | 1 10 34 62             |            | 0.57   |
3        | TIDL_PoolingLayer     | 1 10 34 62            | 3           | 2      | 1 10 17 31             |            | 0.01   |
4        | TIDL_ConvolutionLayer | 1 10 17 31            | 3           | 1      | 1 16 15 29             |            | 0.63   |
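For reference, the #MMACs values of the two convolution layers can be reproduced from their shapes. The sketch below assumes a dense convolution (no groups) with no padding and ignores the bias add; the helper names are illustrative only. The pooling layer's output size appears to use a round-up convention and is not covered here.

```python
def conv_output_hw(in_h, in_w, kernel, stride=1, pad=0):
    """Output height/width of a convolution (floor convention, assumed here)."""
    out_h = (in_h + 2 * pad - kernel) // stride + 1
    out_w = (in_w + 2 * pad - kernel) // stride + 1
    return out_h, out_w

def conv_mmacs(c_in, c_out, out_h, out_w, kernel):
    """Mega-MACs of a dense convolution: one kernel x kernel x c_in dot product per output value."""
    return kernel * kernel * c_in * c_out * out_h * out_w / 1e6

# Layer 2: 3 -> 10 channels, 36x64 input, 3x3 kernel, stride 1
layer2_hw = conv_output_hw(36, 64, kernel=3)
layer2_mmacs = conv_mmacs(3, 10, *layer2_hw, kernel=3)

# Layer 4: 10 -> 16 channels, 17x31 input, 3x3 kernel, stride 1
layer4_hw = conv_output_hw(17, 31, kernel=3)
layer4_mmacs = conv_mmacs(10, 16, *layer4_hw, kernel=3)

print(layer2_hw, f"{layer2_mmacs:.2f}")   # (34, 62) 0.57
print(layer4_hw, f"{layer4_mmacs:.2f}")   # (15, 29) 0.63
```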

For the above network, how would one calculate the total cycles required to execute it? What should the MACs per cycle be for each layer? Please also comment on the other layers such as Flatten, DetectionOutput, SoftMax, etc., and their MACs/cycle.

Regards,

Sagar

  • Hi Sagar,

    In the datasheet, we have provided convolution performance numbers for various feature sizes. The MACs/cycle or the megacycle figures can be used to estimate the performance of your convolutions.
    We have also provided MCycles information for other layers in the datasheet.

    Let us know what issues you are facing in estimating cycles for your network.

    Thanks,
    Praveen
  • Hi Praveen,

    Yes, the datasheet provides convolution performance numbers for various feature sizes. But for my custom network I did not find any matching configuration in the datasheet. The datasheet mainly focuses on the network configurations used for SegNet and SSD object detection.

    My point is: how can one generalize the calculation of MACs per cycle?
    Please check the network architecture above. I want to calculate the MACs per cycle (highlighted in yellow). How should I calculate it?

    I need a general rule, for example (assuming 8-bit quantization):
    Conv 3x3 = 11 MACs/cycle
    Conv 5x5 = 7 MACs/cycle
    etc.

    And if you do not have this information, how can one calculate it? Does it mean I have to execute the custom network using JTAG and take those values from the profiling information?

    My main question is: I want to design a custom DL network model, and before running it on the board I need to calculate the MCycles for that network.

    Regards,
    Sagar
  • Hi Sagar,

    Sorry for the late reply.

    You can use the rule below in general for MACs-per-cycle estimation of TIDL layers.
    For all types of convolution layers, you can assume an average of 10 MACs/cycle as an estimate.
    For all other types of layers, you can assume an average of 3 to 4 MACs/cycle as an estimate.

    Thanks,
    Praveen
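
As a rough sketch, the rule of thumb from the last reply could be applied to the four-layer example network like this. The #MMACs values are taken from the table in the question. The constant names, the choice of 3 MACs/cycle (the conservative end of the 3 to 4 range) for non-convolution layers, and the 535 MHz clock are assumptions made for illustration, so the result is an estimate and not a measured figure.

```python
CONV_MACS_PER_CYCLE = 10.0    # average suggested for all convolution layers
OTHER_MACS_PER_CYCLE = 3.0    # conservative end of the suggested 3 to 4 range
EVE_FREQ_HZ = 535e6           # EVE clock used in the original question

# (layer type, #MMACs) from the example table
layers = [
    ("TIDL_BatchNormLayer", 0.01),
    ("TIDL_ConvolutionLayer", 0.57),
    ("TIDL_PoolingLayer", 0.01),
    ("TIDL_ConvolutionLayer", 0.63),
]

def estimate_mcycles(layer_type, mmacs):
    """Estimated mega-cycles for one layer: MMACs divided by the assumed MACs/cycle."""
    is_conv = "Convolution" in layer_type
    rate = CONV_MACS_PER_CYCLE if is_conv else OTHER_MACS_PER_CYCLE
    return mmacs / rate

total_mcycles = sum(estimate_mcycles(name, mmacs) for name, mmacs in layers)
est_time_ms = total_mcycles * 1e6 / EVE_FREQ_HZ * 1e3

for name, mmacs in layers:
    print(f"{name:24s} {mmacs:5.2f} MMACs -> {estimate_mcycles(name, mmacs):.4f} MCycles")
print(f"Total: {total_mcycles:.4f} MCycles, about {est_time_ms:.3f} ms at 535 MHz")
```

With these assumptions the two convolution layers dominate the estimate (0.057 and 0.063 MCycles), and the remaining layers add well under 0.01 MCycles in total.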