TDA2EVM5777: Theoretical Performance Evaluation of a Custom Deep Learning Model Running on the TDA2xx Platform

SagarK

Part Number: TDA2EVM5777

Hi,

I want to understand how to evaluate both the theoretical and the actual performance of a DL model. How can I make sure that a model I design will run within the available processing power of the EVE?

Assume a model requires 1.6 GMACs. According to the EVE datasheet, the EVE can process 16 MACs per cycle.
So, with an EVE frequency of 535 MHz:
Execution time = (total MACs required / 16) * (1 / 535 MHz) s = 1.6 G / (16 * 535 M) ≈ 0.187 s
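The same peak-throughput estimate as a minimal Python sketch. The 16 MACs/cycle and 535 MHz figures are the ones quoted above; the function name is hypothetical. Because it assumes a constant peak rate for the whole network, the result is a lower bound on execution time, not a prediction.

```python
def execution_time_s(total_macs: float, macs_per_cycle: float, freq_hz: float) -> float:
    """Lower-bound execution time assuming a constant MACs/cycle rate for every layer."""
    cycles = total_macs / macs_per_cycle  # cycles needed at the assumed throughput
    return cycles / freq_hz               # seconds = cycles / clock frequency

# 1.6 GMACs on one EVE at the peak 16 MACs/cycle and 535 MHz
t = execution_time_s(1.6e9, 16, 535e6)
print(f"{t:.3f} s")  # prints "0.187 s"
```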

From my previous E2E query (https://e2e.ti.com/support/arm/automotive_processors/f/1021/t/694669), I learned that assuming 16 MACs per cycle throughout the network is wrong, and that a different MACs/cycle value must be considered for each layer. However, the datasheet only lists MACs/cycle for specific layer configurations.

If one designs a custom CNN architecture with different input sizes, strides, and kernel sizes, how can one theoretically calculate the total number of cycles required to run the full network?
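The first step is the output shape of each layer, which follows the usual sliding-window formula out = (in + 2*pad - kernel) / stride + 1, rounded down (or up, in ceil mode). A minimal sketch; the helper name is hypothetical, and the ceil option is included because some frameworks (Caffe pooling, for example) round the pooling output size up.

```python
import math

def out_dim(in_dim: int, kernel: int, stride: int, pad: int = 0, ceil_mode: bool = False) -> int:
    """Output size along one spatial axis of a conv/pooling window (hypothetical helper)."""
    span = in_dim + 2 * pad - kernel  # distance the window can slide
    steps = math.ceil(span / stride) if ceil_mode else span // stride
    return steps + 1

# 3x3, stride-1, unpadded conv on a 36x64 input
print(out_dim(36, 3, 1), out_dim(64, 3, 1))                                  # 34 62
# 3x3, stride-2 pooling on 34x62, rounding up vs. rounding down
print(out_dim(34, 3, 2, ceil_mode=True), out_dim(62, 3, 2, ceil_mode=True))  # 17 31
print(out_dim(34, 3, 2), out_dim(62, 3, 2))                                  # 16 30
```

The 17x31 pooling output in the table is consistent with rounding up.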

Layer  Layer Type             Input (N, C, H, W)  Kernel  Stride  Output (N, C, H, W)  MACs/Cycle  #MMACs  Total MCycles
1      TIDL_BatchNormLayer    1, 3, 36, 64        -       -       1, 3, 36, 64         ?           0.01    ?
2      TIDL_ConvolutionLayer  1, 3, 36, 64        3       1       1, 10, 34, 62        ?           0.57    ?
3      TIDL_PoolingLayer      1, 10, 34, 62       3       2       1, 10, 17, 31        ?           0.01    ?
4      TIDL_ConvolutionLayer  1, 10, 17, 31       3       1       1, 16, 15, 29        ?           0.63    ?
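The #MMACs column for the convolution rows can be reproduced from the shapes: a dense (ungrouped) convolution performs one C_in x K x K dot product per output element, so MACs = N x C_out x H_out x W_out x C_in x K x K. A minimal sketch, assuming no grouping and not counting the bias add; the function name is hypothetical.

```python
def conv_macs(n: int, c_in: int, c_out: int, h_out: int, w_out: int, k: int) -> int:
    """MACs for a dense KxK convolution: one C_in*K*K dot product per output element."""
    return n * c_out * h_out * w_out * c_in * k * k

layer2 = conv_macs(1, 3, 10, 34, 62, 3)   # 569,160 MACs
layer4 = conv_macs(1, 10, 16, 15, 29, 3)  # 626,400 MACs
print(round(layer2 / 1e6, 2), round(layer4 / 1e6, 2))  # 0.57 0.63
```

The BatchNorm row is consistent with one multiply-add per element (1 x 3 x 36 x 64 = 6,912 MACs, about 0.007 MMACs, which rounds to the 0.01 shown).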

For the network above, how should the total cycles required to execute it be calculated? What MACs/cycle value should be used for each layer? Please also comment on other layers such as Flatten, DetectionOutput, SoftMax, etc., and their MACs/cycle.

Regards,

Sagar

over 7 years ago

Praveen Eppa1 over 7 years ago

Hi Sagar,

In the datasheet, we have provided convolution performance numbers for various feature sizes. The MACs/cycle or mega-cycle figures can be used to estimate the performance of your convolutions.
The datasheet also provides MCycles information for other layers.

Let us know what issues you are facing in estimating the cycles for your network.

Thanks,
Praveen

SagarK over 7 years ago in reply to Praveen Eppa1

Hi Praveen,

Yes, the datasheet provides convolution performance numbers for various feature sizes. However, I did not find any configuration in the datasheet that matches the layers of my custom network. The datasheet mainly focuses on the network configurations used for SegNet and SSD object detection.

My point is: how can one generalize the calculation of MACs per cycle?
Please check the network architecture above. I want to calculate the MACs per cycle for each layer (the MACs/CYCLE column of the table). How should I calculate it?

I need a general rule, for example (assuming 8-bit quantization):
Conv 3x3 = 11 MACs/cycle
Conv 5x5 = 7 MACs/cycle
etc.
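A rule of this form could be applied as a lookup from kernel size to MACs/cycle, dividing each layer's MACs by the looked-up rate. A minimal sketch; the 11 and 7 entries are only the illustrative values from the question (not datasheet figures), and the fallback rate and function name are hypothetical.

```python
# Placeholder MACs/cycle per conv kernel size: 11 and 7 are the illustrative
# values from the question above, NOT measured or datasheet figures.
MACS_PER_CYCLE_BY_KERNEL = {3: 11.0, 5: 7.0}

def conv_mcycles(mmacs: float, kernel: int, default_macs_per_cycle: float = 10.0) -> float:
    """Estimated mega-cycles for a conv layer; the fallback rate is an assumption."""
    rate = MACS_PER_CYCLE_BY_KERNEL.get(kernel, default_macs_per_cycle)
    return mmacs / rate

print(round(conv_mcycles(0.57, 3), 4))  # 0.0518
```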

If you do not have this information, how can one calculate it? Does it mean I have to execute the custom network using JTAG and take those values from the profiling information?
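If per-layer cycle counts are obtained by profiling on the board, the effective MACs/cycle of each layer is simply its MACs divided by its measured cycles, and those figures could then be reused when estimating similar layers in a new design. A minimal sketch; the function name is hypothetical and the cycle counts in the profile list are made-up placeholders, not real EVE measurements.

```python
def effective_macs_per_cycle(macs: float, measured_cycles: float) -> float:
    """Achieved throughput of one layer: work done divided by cycles spent."""
    return macs / measured_cycles

# Hypothetical profile: (layer name, MACs, measured cycles). Placeholder numbers only.
profile = [
    ("conv3x3_a", 569_160, 60_000),
    ("conv3x3_b", 626_400, 62_000),
]
for name, macs, cycles in profile:
    print(name, round(effective_macs_per_cycle(macs, cycles), 2))
```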

My main question is this: I want to design a custom DL network model, and before running it on the actual board, I need to calculate the MCycles for that network.

Regards,
Sagar

Praveen Eppa1 over 7 years ago in reply to SagarK

Hi Sagar,

Sorry for the late reply.

In general, the following rule can be used to estimate MACs per cycle for TIDL layers:
For all types of convolution layers, you can assume an average of 10 MACs/cycle.
For all other layer types, you can assume an average of 3 to 4 MACs/cycle.
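Applied to the four-layer network from the question, this rule of thumb gives a first-order estimate: divide each convolution's MMACs by 10, divide every other layer's MMACs by a rate in the 3 to 4 range, and sum. A minimal sketch; the 3.5 midpoint and the 535 MHz clock (taken from the question) are assumptions, and the MMACs values are those from the question's table.

```python
# (layer type, MMACs) for the example network, taken from the question's table
layers = [
    ("TIDL_BatchNormLayer", 0.01),
    ("TIDL_ConvolutionLayer", 0.57),
    ("TIDL_PoolingLayer", 0.01),
    ("TIDL_ConvolutionLayer", 0.63),
]

CONV_MACS_PER_CYCLE = 10.0   # rule-of-thumb average for convolution layers
OTHER_MACS_PER_CYCLE = 3.5   # assumed midpoint of the 3-4 MACs/cycle range
EVE_FREQ_MHZ = 535.0         # EVE clock used in the question

def layer_mcycles(layer_type: str, mmacs: float) -> float:
    """Mega-cycles for one layer using the per-type average rate."""
    rate = CONV_MACS_PER_CYCLE if "Convolution" in layer_type else OTHER_MACS_PER_CYCLE
    return mmacs / rate

total_mcycles = sum(layer_mcycles(t, m) for t, m in layers)
time_ms = total_mcycles / EVE_FREQ_MHZ * 1e3  # MCycles / MHz = seconds, then to ms
print(f"{total_mcycles:.4f} MCycles, {time_ms:.3f} ms")
```

Per-layer overheads are not modelled here, so the figure is only as good as the averages it uses.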

Thanks,
Praveen
