Hi,
I want to understand both the theoretical and the actual performance evaluation of a DL model, and how to make sure a designed model will run within the available processing power of EVE.
Assume that a model requires 1.6 GMACs and that, according to the EVE datasheet, EVE can process 16 MACs per cycle.
So, with an EVE frequency of 535 MHz:
Execution time = (total MACs required / 16) * (1 / 535 MHz) s = 1.6 G / (16 * 535 M) ≈ 0.187 seconds
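To make the arithmetic explicit, here is a minimal Python sketch of this ideal-case estimate. The function name and its arguments are my own, and it assumes all 16 MACs are used on every cycle with no memory or DMA overhead, which is an upper bound on performance rather than a realistic figure.

```python
def ideal_execution_time(total_macs: float, macs_per_cycle: float, freq_hz: float) -> float:
    """Ideal execution time in seconds, assuming full MAC utilization on every cycle."""
    cycles = total_macs / macs_per_cycle  # cycles needed if every cycle does macs_per_cycle MACs
    return cycles / freq_hz               # seconds = cycles / (cycles per second)


# Numbers from this post: 1.6 GMACs, 16 MACs/cycle, 535 MHz
t = ideal_execution_time(1.6e9, 16, 535e6)
print(f"{t:.3f} s")  # prints "0.187 s"
```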
Following up on my previous E2E query: https://e2e.ti.com/support/arm/automotive_processors/f/1021/t/694669
As I understood from that thread, the assumption of 16 MACs per cycle throughout the network is wrong, and a different MACs/cycle figure should be used for each layer. However, the datasheet gives MACs per cycle only for specific layer configurations.
If one has to design a custom CNN architecture with a different input size, stride, and kernel size, how can one theoretically calculate the total number of cycles required to run the full network?
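The kind of per-layer estimate I have in mind can be sketched as below. The MAC count of a convolution layer is the standard out_h * out_w * out_channels * kernel_h * kernel_w * in_channels. Everything else is an assumption for illustration: the `ConvLayer` class, the two example layers, and especially the `macs_per_cycle` values, which are placeholders and not datasheet numbers. Memory transfers and non-convolution layers are not modeled.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    name: str
    in_h: int
    in_w: int
    in_c: int
    out_c: int
    kernel: int
    stride: int = 1
    pad: int = 0
    macs_per_cycle: float = 16.0  # hypothetical effective rate for this layer

    def out_size(self):
        # Standard convolution output-size formula (square kernel, same stride/pad on both axes)
        out_h = (self.in_h + 2 * self.pad - self.kernel) // self.stride + 1
        out_w = (self.in_w + 2 * self.pad - self.kernel) // self.stride + 1
        return out_h, out_w

    def macs(self) -> int:
        out_h, out_w = self.out_size()
        return out_h * out_w * self.out_c * self.kernel * self.kernel * self.in_c

    def cycles(self) -> float:
        return self.macs() / self.macs_per_cycle


def total_time(layers, freq_hz: float) -> float:
    """Sum per-layer cycles and convert to seconds at the given clock."""
    return sum(layer.cycles() for layer in layers) / freq_hz


# Hypothetical two-layer network; the MACs/cycle values are placeholders
layers = [
    ConvLayer("conv1", 224, 224, 3, 32, kernel=3, stride=2, pad=1, macs_per_cycle=8.0),
    ConvLayer("conv2", 112, 112, 32, 64, kernel=3, stride=1, pad=1, macs_per_cycle=16.0),
]
for layer in layers:
    print(layer.name, layer.out_size(), layer.macs(), layer.cycles())
print(f"estimated time: {total_time(layers, 535e6) * 1e3:.2f} ms")
```

What I am missing is the correct `macs_per_cycle` value to plug in for each layer configuration.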
[Table: layer configuration of the network (contents not recoverable)]
For the above network, how should the total number of cycles required to execute it be calculated? What should the MACs per cycle be for each layer? Please also comment on the other layers, such as Flatten, DetectionOutput, and SoftMax, and their MACs/cycle.
Regards,
Sagar