AM62A3: Request for Time-Series AI Model Examples on AM62A for Industrial Applications (Non-Vision)

Part Number: AM62A3
Other Parts Discussed in Thread: AM62A7

Hello TI E2E Team,

I am currently evaluating the AM62A7 processor for an industrial automation project. While I understand that the AM62A series is highly optimized for Vision AI via its C7x/MMA accelerator, my use case focuses on Industrial Time-Series Data analysis rather than image processing.

Specifically, I am looking to implement AI models on an EtherCAT Master device to perform tasks such as Predictive Maintenance (PdM) or Anomaly Detection based on high-speed sensor data and bus traffic.

I have the following questions regarding the support for non-vision AI:

  1. Availability of Examples: Are there any reference designs or software examples (Jupyter Notebooks, etc.) for Time-Series forecasting or classification (e.g., LSTM, GRU, or 1D-CNN) specifically validated for the AM62A NPU?

  2. Optimization Path: For models in ONNX or TFLite format that process 1D sensor data, what is the recommended workflow to ensure they are offloaded to the C7x/MMA instead of running only on the Arm Cortex-A53 cores?

  3. Sensor Data Integration: Since this will be used in an EtherCAT Master environment, are there any recommended approaches for feeding real-time bus data into the TIDL (TI Deep Learning) runtime efficiently?

If there are any white papers or GitHub repositories covering Non-Vision Edge AI on the AM62Ax platform, please let me know.

Best regards,
Jack Cha

  • Hi Jack,

    These are good questions, and you are correct that this is a vision-optimized accelerator. As such, the set of operators supported on this device is tuned for vision CNNs and transformers.

    We have listed the supported operators and any restrictions on their parameters on the page here [1]. Please note that these are version-sensitive, and the latest 11.1 SDK corresponds to 11_01_06_00.

    • LSTM and GRU are not currently implemented, and convolution support is nominally for square kernels (a 1D convolution can be emulated, at reduced efficiency, with zeroed weights)
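    To illustrate the zeroed-weights idea, here is a minimal pure-Python sketch. It is one reading of the note above, not a description of how TIDL handles this internally. The helper names (`conv1d`, `embed_in_square`, `conv2d_single_row`) are hypothetical: a length-k 1D kernel is placed in the middle row of a k x k kernel of zeros, and the signal is treated as a one-row image.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation (what DL frameworks call convolution)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def embed_in_square(kernel):
    """Place a length-k 1D kernel in the middle row of a k x k kernel of zeros."""
    k = len(kernel)
    square = [[0.0] * k for _ in range(k)]
    square[k // 2] = list(kernel)
    return square


def conv2d_single_row(signal, square):
    """Apply a k x k kernel to a one-row 'image' that is zero-padded vertically."""
    k = len(square)
    n = len(signal)
    # Build a k-row image: the signal sits in row k // 2, all other rows are zeros.
    image = [[0.0] * n for _ in range(k)]
    image[k // 2] = list(signal)
    out = []
    for i in range(n - k + 1):
        acc = 0.0
        for r in range(k):
            for c in range(k):
                acc += image[r][i + c] * square[r][c]
        out.append(acc)
    return out


# Illustrative values only.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernel = [1.0, 0.0, -1.0]

direct = conv1d(signal, kernel)
emulated = conv2d_single_row(signal, embed_in_square(kernel))

# Multiply-accumulates per output sample: k for the 1D form, k * k for the square form.
macs_1d = len(kernel)
macs_square = len(kernel) ** 2
```

    Both paths produce the same output, but the square form spends k * k multiply-accumulates per output sample instead of k, which is where the reduced efficiency comes from.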

    Per your questions: 

    • Availability of Examples: Are there any reference designs or software examples (Jupyter Notebooks, etc.) for Time-Series forecasting or classification (e.g., LSTM, GRU, or 1D-CNN) specifically validated for the AM62A NPU?

    Unfortunately, no. Our examples to date are for vision models and applications.

    • Optimization Path: For models in ONNX or TFLite format that process 1D sensor data, what is the recommended workflow to ensure they are offloaded to the C7x/MMA instead of running only on the Arm Cortex-A53 cores?

    Please see [1]. If every operator in your model is supported, the model will be accelerated entirely on the NPU. When you compile a model, the compiler lists the layers that are not optimized for the NPU and will run on Arm instead. At runtime, you will also see messages reporting how many of the model's layers will be accelerated (X of Y).

    Note that unsupported operators in the middle of a network can slow overall runtime substantially, because data has to move back and forth between Arm and the NPU.
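    As a rough way to reason about this before compiling, the sketch below groups a model's operators into consecutive NPU and Arm runs and counts the hand-offs between them. It is a simplification under stated assumptions: the model is treated as a linear chain (real models are graphs), the function names are hypothetical, and `SUPPORTED` is an illustrative placeholder that should be filled in from [1] for your SDK version. The compiler's own report remains the authoritative answer.

```python
# Placeholder set for illustration only; populate it from the supported-operator list in [1].
SUPPORTED = {"Conv", "Relu"}


def partition_by_support(op_types, supported):
    """Group a linear chain of ops into consecutive runs targeting 'npu' or 'arm'."""
    runs = []
    for op in op_types:
        target = "npu" if op in supported else "arm"
        if runs and runs[-1][0] == target:
            runs[-1][1].append(op)      # extend the current run
        else:
            runs.append((target, [op]))  # start a new run on the other target
    return runs


def summarize(op_types, supported):
    """Return (accelerated layers, total layers, hand-offs between Arm and NPU)."""
    runs = partition_by_support(op_types, supported)
    accelerated = sum(len(ops) for target, ops in runs if target == "npu")
    handoffs = len(runs) - 1
    return accelerated, len(op_types), handoffs


# Hypothetical models: the same ops, with the unsupported one in the middle or at the end.
lstm_in_middle = ["Conv", "Relu", "LSTM", "Conv", "Relu"]
lstm_at_end = ["Conv", "Relu", "Conv", "Relu", "LSTM"]

middle_summary = summarize(lstm_in_middle, SUPPORTED)  # (4, 5, 2)
end_summary = summarize(lstm_at_end, SUPPORTED)        # (4, 5, 1)
```

    In both hypothetical models 4 of 5 layers are accelerated, but placing the unsupported operator in the middle doubles the number of Arm/NPU hand-offs.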

    • Sensor Data Integration: Since this will be used in an EtherCAT Master environment, are there any recommended approaches for feeding real-time bus data into the TIDL (TI Deep Learning) runtime efficiently?

    Good question. I'm not very familiar with EtherCAT or with how the data is unpacked from the network interface and protocol stack.

    Typically, we have a multi-step chain of vision components that all operate on data in a DDR_SHARED_REGION, which is where the remote cores (the C7x NPU included) expect their data to come from. Image buffers are quite large, so this is important for fast runtime. Depending on the size of your inputs and how often you run the model, this may be very relevant for you as well.

    TIDLRT (the core runtime underneath ONNX/TFLite) provides an interface for allocating your inputs in this shared region [2]. It can only be used from C/C++ programs (Python does not allow this type of memory access). Doing this for both the input and the output will reduce extraneous memory copies.
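    To judge whether input copies matter for a given workload, a back-of-the-envelope estimate of the input data rate can help. The sketch below is plain arithmetic. The function name and both workloads (8 sensor channels with 1024-sample float32 windows at 100 inferences per second, and a 640 x 480 RGB camera at 30 frames per second) are hypothetical numbers chosen for illustration, not measurements.

```python
def input_bytes_per_second(elements_per_inference, bytes_per_element, inferences_per_second):
    """Bytes that must reach the model input each second for a given workload."""
    return elements_per_inference * bytes_per_element * inferences_per_second


# Hypothetical time-series workload: 8 channels x 1024 samples, float32, 100 inferences/s.
sensor_rate = input_bytes_per_second(8 * 1024, 4, 100)

# Hypothetical vision workload: 640 x 480 RGB, uint8, 30 frames/s.
camera_rate = input_bytes_per_second(640 * 480 * 3, 1, 30)

print(f"sensor: {sensor_rate / 1e6:.2f} MB/s, camera: {camera_rate / 1e6:.2f} MB/s")
```

    With these assumed numbers the sensor stream is roughly an order of magnitude smaller than the camera stream, so an extra copy per inference costs less. Larger windows or higher inference rates would change that balance.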

    [1] https://github.com/TexasInstruments/edgeai-tidl-tools/blob/11_01_06_00/docs/supported_ops_rts_versions.md 

    [2] https://github.com/TexasInstruments/edgeai-tidl-tools/blob/e70106c59a5bec5a4f794b81b88534279b8a2312/examples/osrt_cpp/ort/onnx_main.cpp#L282