AM62A7-Q1: C7x+MMA versions between AM62A and TDA4VEN

Part Number: AM62A7-Q1
Other Parts Discussed in Thread: AM67A, AM68A, TDA4VM

Hello, 

When viewing the comparison between AM62A and TDA4VEN, I noticed that TDA4VEN supports histograms while AM62A does not. This leads me to believe that these are two different versions of the C7x subsystem.

Questions:

1. What are the versions of the C7x on these devices? (This is not mentioned in the TRM.)

2. What other C7x+MMA differences are there between these versions?

  • Hello,

    Yes, you are correct that these C7x cores are not identical. AM62A uses the C7504, whereas TDA4AEN (architecturally very similar to AM67A) uses the C7524. This is generally not exposed because C7x is a closed architecture whose low-level features are not directly accessible -- the C7xMMA is treated as a black-box AI accelerator.

    That said, there are a few differences that impact the user level:

    • Increased L2 cache size (1.25 MB in AM62A vs. 2.25 MB in TDA4AEN)
    • Histogram / lookup-table support on the AEN, not present in AM62A; this mainly impacts nonlinear activations.
    • A 2nd C7xMMA on the AEN -- each C7xMMA is a 256-bit variant that equates to roughly 2 TOPS per core (rough arithmetic sketched below).
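
    As a rough back-of-the-envelope sketch of where that 2 TOPS figure comes from (my assumptions, not official numbers: a 256-bit datapath implying a 32x32 array of 8-bit MACs, each MAC counted as 2 ops, and a 1.0 GHz clock):

    ```python
    # Rough TOPS arithmetic for one 256-bit C7xMMA variant (assumptions above).
    macs_per_cycle = 32 * 32   # assumed 32x32 array of 8-bit MACs
    ops_per_mac = 2            # one multiply + one accumulate
    clock_hz = 1.0e9           # assumed 1.0 GHz C7x clock

    tops = macs_per_cycle * ops_per_mac * clock_hz / 1e12
    print(f"~{tops:.1f} TOPS per core")  # ~2.0 TOPS; two cores on the AEN -> ~4 TOPS
    ```

    The same arithmetic at a reduced clock (e.g. ~850 MHz) lands near the 1.7 TOPS figure that comes up for AM62A benchmarks later in this thread.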

    BR,
    Reese

  • Thanks for the clarification! 

    Can you clarify what nonlinear activations are?

  • Ah, yes of course.

    I'll first note that there are many good resources online describing what these are and why they exist; they may give a more intuitive explanation than my text below.

    In short, nonlinear activations are generally pointwise functions applied to the output of larger layers (e.g. convolution) that have mixed together many points/values. This creates a sort of 'break' in an otherwise linear algorithm, which lets the network generalize to more complex functions and patterns in the underlying data. In essence, neural networks MUST have these nonlinearities to function on complex data.
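
    To make that concrete, here is a minimal NumPy sketch (illustrative only, nothing device-specific) showing that two linear layers without an activation collapse into one equivalent linear layer, while a pointwise ReLU breaks the collapse:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((4, 8))  # weights of a first linear "layer"
    W2 = rng.standard_normal((3, 4))  # weights of a second linear "layer"
    x = rng.standard_normal(8)

    # Without an activation in between, stacking gains no expressive power:
    print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))  # True

    # A pointwise nonlinearity (ReLU here) breaks that equivalence.
    relu = lambda v: np.maximum(v, 0.0)
    print(np.allclose(W2 @ relu(W1 @ x), (W2 @ W1) @ x))  # False (in general)
    ```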

    Some examples are ReLU, SiLU, sigmoid, and tanh (I'd recommend checking out an image of the x-y plot for each of these). Some are really simple, like ReLU, such that there's almost no extra compute involved. Others, like SiLU or tanh, are more complex to compute -- for these, a lookup table is a good way to accelerate on quantized data. For 8-bit quantization, you basically need a 256-element table from which the nonlinear result is fetched.
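
    As a rough sketch of that lookup-table idea (illustrative only -- the quantization parameters below are made up, and TIDL's internal implementation may differ):

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Hypothetical affine quantization: real_value = scale * (code - zero_point)
    in_scale, in_zp = 0.05, 0
    out_scale, out_zp = 1.0 / 255, -128  # maps sigmoid's [0, 1] output onto int8

    # Precompute the activation once for all 256 possible int8 input codes.
    codes = np.arange(-128, 128)
    lut = np.clip(np.round(sigmoid(in_scale * (codes - in_zp)) / out_scale) + out_zp,
                  -128, 127).astype(np.int8)

    # At runtime the "nonlinear activation" is just one table fetch per element.
    q_in = np.array([-128, -10, 0, 10, 127], dtype=np.int8)
    q_out = lut[q_in.astype(np.int32) + 128]  # shift codes to table indices 0..255
    print(q_out)
    ```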

    Some of these activations are mentioned in our documentation here: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/supported_ops_rts_versions.md#feature-set-comparison-across-devices.

    BR,
    Reese

  • Hi Reese,

    Thanks for the good read; I'll take a deeper dive into this. In the meantime, my customer and I put together the table below. Can you help me fill out the rest?

    | Feature / device | AM62A | TDA4xEN (AM67A) | TDA4VE (AM68A) | TDA4VM |
    | --- | --- | --- | --- | --- |
    | C7x Version | C7504 | C7524 | C7120 | C7100 |
    | MMA Version | rev2 | rev2 | rev2 | rev1 |
    | L2 Cache Size per DSP/MMA | 1.25 MB | 2.25 MB | 512 KB | 512 KB |
    | L3 Cache Size | 64 KB | 256 KB | 4 MB | 8 MB |
    | DDR Bandwidth | 14 GB/s | 14 GB/s | 34 GB/s | 17 GB/s |
    | Histogram / Lookup Table Support | no | yes | yes | yes |
    | Number of C7x/MMA | 1 | 2 | 2 | 1 |
    | TOPS | 1 or 2 | 4 | 8 | 8 |
    | ResNet50 v1 224x224 [fps] | 33.5 | 40 | 184 | 121 |

  • Hi David,

    I updated your table with what I know -- I think all boxes are now covered.

    Worth noting that AM62A's benchmarks nominally run at 1.7 TOPS, so perhaps another ~10% is possible at a higher clock. The reason for this is somewhat historical: the first EVM (and what's in the test farm) uses an older PMIC that can't hit the maximum clock across the entire temperature range.

    Additionally, devices like AM67A/TDA4AEN that have more than one C7x core only use one core for their benchmarks, which is why you don't see ~2x AM62A's performance here (a quick sanity check on this follows below).
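
    As a rough sanity check tying those notes to the table (my arithmetic, with the assumption that fps scales roughly linearly with available TOPS, i.e. the benchmark is compute-bound rather than DDR-bound):

    ```python
    # AM62A's 33.5 fps was measured at a nominal 1.7 TOPS; a single C7524
    # core on TDA4xEN delivers ~2 TOPS, so scale accordingly.
    am62a_fps = 33.5
    scaled = am62a_fps * (2.0 / 1.7)
    print(f"{scaled:.1f} fps")  # ~39.4 fps, close to the 40 fps table entry
    ```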

    BR,

    Reese