AM62A7-Q1: C7x+MMA versions between AM62A and TDA4VEN

Part Number: AM62A7-Q1
Other Parts Discussed in Thread: AM67A, AM68A, TDA4VM

Hello, 

When viewing the comparison between AM62A and TDA4VEN, I noticed that TDA4VEN supports histograms while AM62A does not. This leads me to believe that these are two different versions of the C7x subsystem.

Questions:

1. What are the versions of the C7x on these devices? (This is not mentioned in the TRM.)

2. What other C7x+MMA differences are there between these versions?

  • Hello,

    Yes, you are correct that these C7x cores are not identical. AM62A uses the C7504, whereas TDA4AEN (architecturally very similar to AM67A) uses the C7524. This is generally not exposed because C7x is a closed architecture whose low-level features are not directly accessible -- the C7xMMA is treated as a black-box AI accelerator.

    That said, there are a few differences that impact the user level:

    • Increased L2 cache size (1.25 MB in AM62A vs. 2.25 MB in TDA4AEN)
    • Histogram / lookup-table support on the AEN, not present in AM62A; this mainly impacts nonlinear activations.
    • A 2nd C7xMMA on the AEN -- each C7xMMA is a 256-bit variant that equates to roughly 2 TOPS per core (rough arithmetic sketched below).
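
    As a rough back-of-the-envelope sketch of where that 2 TOPS figure comes from (my assumptions, not official numbers: a 256-bit datapath implying a 32x32 array of 8-bit MACs, each MAC counted as 2 ops, and a 1.0 GHz clock):

    ```python
    # Rough TOPS arithmetic for one 256-bit C7xMMA variant (assumptions above).
    macs_per_cycle = 32 * 32   # assumed 32x32 array of 8-bit MACs
    ops_per_mac = 2            # one multiply + one accumulate
    clock_hz = 1.0e9           # assumed 1.0 GHz C7x clock

    tops = macs_per_cycle * ops_per_mac * clock_hz / 1e12
    print(f"~{tops:.1f} TOPS per core")  # ~2.0 TOPS; two cores on the AEN -> ~4 TOPS
    ```

    The same arithmetic at a reduced clock (e.g. ~850 MHz) lands near the 1.7 TOPS figure that comes up for AM62A benchmarks later in this thread.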

    BR,
    Reese

  • Thanks for the clarification! 

    Can you clarify what nonlinear activations are?

  • Ah, yes of course.

    I'll first note that there are many good resources online describing what these are and why they exist; they may give a more intuitive explanation than my text below.

    In short, nonlinear activations are generally pointwise functions applied to the output of larger layers (e.g. convolution) that have mixed together many points/values. This creates a sort of 'break' in an otherwise linear algorithm, which lets the network generalize to more complex functions and patterns in the underlying data. In essence, neural networks MUST have these nonlinearities to function on complex data.
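
    To make that concrete, here is a minimal NumPy sketch (illustrative only, nothing device-specific) showing that two linear layers without an activation collapse into one equivalent linear layer, while a pointwise ReLU breaks the collapse:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((4, 8))  # weights of a first linear "layer"
    W2 = rng.standard_normal((3, 4))  # weights of a second linear "layer"
    x = rng.standard_normal(8)

    # Without an activation in between, stacking gains no expressive power:
    print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))  # True

    # A pointwise nonlinearity (ReLU here) breaks that equivalence.
    relu = lambda v: np.maximum(v, 0.0)
    print(np.allclose(W2 @ relu(W1 @ x), (W2 @ W1) @ x))  # False (in general)
    ```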

    Some examples are ReLU, SiLU, sigmoid, and tanh (I'd recommend checking out an image of the x-y plot for each of these). Some are really simple, like ReLU, such that there's almost no extra compute involved. Others, like SiLU or tanh, are more complex to compute -- for these, a lookup table is a good way to accelerate on quantized data. For 8-bit quantization, you basically need a 256-element table from which the nonlinear result is fetched.
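
    As a rough sketch of that lookup-table idea (illustrative only -- the quantization parameters below are made up, and TIDL's internal implementation may differ):

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Hypothetical affine quantization: real_value = scale * (code - zero_point)
    in_scale, in_zp = 0.05, 0
    out_scale, out_zp = 1.0 / 255, -128  # maps sigmoid's [0, 1] output onto int8

    # Precompute the activation once for all 256 possible int8 input codes.
    codes = np.arange(-128, 128)
    lut = np.clip(np.round(sigmoid(in_scale * (codes - in_zp)) / out_scale) + out_zp,
                  -128, 127).astype(np.int8)

    # At runtime the "nonlinear activation" is just one table fetch per element.
    q_in = np.array([-128, -10, 0, 10, 127], dtype=np.int8)
    q_out = lut[q_in.astype(np.int32) + 128]  # shift codes to table indices 0..255
    print(q_out)
    ```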

    Some of these activations are mentioned in our documentation here: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/supported_ops_rts_versions.md#feature-set-comparison-across-devices.

    BR,
    Reese

  • Hi Reese,

    Thanks for the good read; I'll take a deeper dive into this. In the meantime, my customer and I put together the table below. Can you help me fill out the rest?

    | Feature / device | AM62A | TDA4xEN (AM67A) | TDA4VE (AM68A) | TDA4VM |
    | --- | --- | --- | --- | --- |
    | C7x Version | C7504 | C7524 | C7120 | C7100 |
    | MMA Version | rev2 | rev2 | rev2 | rev1 |
    | L2 Cache Size per DSP/MMA | 1.25 MB | 2.25 MB | 512 KB | 512 KB |
    | L3 Cache Size | 64 KB | 256 KB | 4 MB | 8 MB |
    | DDR Bandwidth | 14 GB/s | 14 GB/s | 34 GB/s | 17 GB/s |
    | Histogram / Lookup Table Support | no | yes | yes | yes |
    | Number of C7x/MMA | 1 | 2 | 2 | 1 |
    | TOPS | 1 or 2 | 4 | 8 | 8 |
    | ResNet50 v1 224x224 [fps] | 33.5 | 40 | 184 | 121 |

  • Hi David,

    I updated your table with what I know -- I think all boxes are now covered.

    Worth noting that AM62A's benchmarks nominally run at 1.7 TOPS, so perhaps another ~10% is possible at a higher clock. The reason for this is somewhat historical: the first EVM (and what's in the test farm) uses an older PMIC that can't hit the maximum clock across the entire temperature range.

    Additionally, devices like AM67A/TDA4AEN that have more than one C7x core only use one core for their benchmarks, which is why you don't see ~2x AM62A's performance here (a quick sanity check on this follows below).
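
    As a rough sanity check tying those notes to the table (my arithmetic, with the assumption that fps scales roughly linearly with available TOPS, i.e. the benchmark is compute-bound rather than DDR-bound):

    ```python
    # AM62A's 33.5 fps was measured at a nominal 1.7 TOPS; a single C7524
    # core on TDA4xEN delivers ~2 TOPS, so scale accordingly.
    am62a_fps = 33.5
    scaled = am62a_fps * (2.0 / 1.7)
    print(f"{scaled:.1f} fps")  # ~39.4 fps, close to the 40 fps table entry
    ```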

    BR,

    Reese