
[FAQ] EDGE-AI-STUDIO: How do I use and retrain AI models from TI’s Model Zoo for my own dataset using TI Deep Learning (TIDL) on Edge AI AM6xA Processors?

Part Number: EDGE-AI-STUDIO
Other Parts Discussed in Thread: AM68A, AM69A, AM67A


Q:

“I have found TI’s model zoo with neural networks that have been validated and benchmarked on AM6xA processors [AM62A, AM67A, AM68A, AM68PA, AM69A or TDA4x SoCs] using the C7x AI accelerator.”

"I need to train these models for my own application with a dataset that I have collected. How do I do this?”

 

A:

The above is a fairly common question from developers seeking to leverage pre-optimized networks from TI for their own custom application. We suggest using these models because they have already been analyzed and optimized for runtime latency and accuracy. The models available from TI are pre-trained on common datasets (e.g. COCO or ImageNet-1k), but may need retraining to be applied to a custom use-case.

For some networks, we have modified the original network architecture to be friendlier to the hardware accelerator used on AM6xA and TDA4x SoCs, the C7xMMA. These modifications are usually made to improve accuracy or runtime efficiency on our fixed-point accelerator; for example, a ReLU activation runs far faster in fixed-point than SiLU, so we replace SiLU with ReLU. We denote these modified architectures by including “lite” or “TI lite” in the model name within our benchmarks or model zoo repo.
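
To make this concrete, here is a minimal sketch (in plain PyTorch) of the kind of activation swap behind a “lite” variant. It is only an illustration of the concept, not the exact procedure TI used; the real modifications live in the training repos under edgeai-tensorlab.

```python
import torch.nn as nn
import torchvision

# Minimal sketch: swap every SiLU activation for ReLU in a PyTorch model.
# Illustration only; TI's actual "lite" modifications are made in the
# training repos under edgeai-tensorlab.
def replace_silu_with_relu(module: nn.Module) -> None:
    for name, child in module.named_children():
        if isinstance(child, nn.SiLU):
            setattr(module, name, nn.ReLU(inplace=True))
        else:
            replace_silu_with_relu(child)  # recurse into submodules

model = torchvision.models.efficientnet_b0(weights=None)  # any model that uses SiLU
replace_silu_with_relu(model)
# The model must be retrained or fine-tuned after a change like this,
# since the swap alters the values each layer produces.
```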

With recent operator support, some original models may now be supported as-is where they were not before. Please see our list of supported operators/layers (and any related restrictions) for the latest information, and note that this documentation is versioned with respect to our SDK.

In subsequent responses, I’ll describe and categorize models within our model zoo as it pertains to retraining and reuse.

 

Links and Repositories of note:

  • edgeai-tensorlab
    • hosts several programming tools for model training, optimization, compilation and benchmarking
  • edgeai-modelzoo
    • Part of tensorlab, hosting links and files for the modelzoo, including some pregenerated model artifacts. This was previously a separate repo (prior to July 2024), so if you find an archived version of edgeai-modelzoo on github, it is likely out of date.
  • Edge AI Studio
    • Online, GUI-based tool for training models, viewing pregenerated benchmarks, and other AI model development tasks.
    • See the Model Selection tool for pre-generated benchmarks
  • edgeai-tidl-tools
    • standalone tools for compiling a trained model that is exported into ONNX or TFLITE format

Please note I have embedded plenty of links into the text for your benefit. Most of these links will reach into the edgeai-tensorlab or edgeai-tidl-tools repositories.

 

TL;DR (too long, didn’t read): TI has a set of AI models that have been validated and benchmarked on our SoCs. These can be retrained, but not all architectures have a full set of examples and programming tools from TI for doing the retraining yourself. In this FAQ, I’ll cover the available resources and the knowledge needed to own this process for your design. Some architectures are modified and labelled “TI Lite” to denote that the architecture is not identical to the original version; others are pulled in by TI as-is. TI-Lite models require that the optimizations TI performed be similarly applied as part of your own training; unmodified architectures have limited support from TI for training (but compilation with TIDL tools is supported end-to-end).

  • Supported Model Category 1: Fully supported models for retraining within TI tools via Edge AI Studio or edgeai-modelmaker.

    TI supports a set of models for transfer learning and retraining on custom datasets. These generally start from pretrained checkpoints that were trained from scratch on large open-source datasets. These architectures have supporting code and examples from TI to handle both training and model compilation.

    The baseline tool here is edgeai-modelmaker (part of edgeai-tensorlab), which can be used programmatically as-is OR via a graphical interface within Model Composer (part of Edge AI Studio). It uses PyTorch as the training framework and exports models into ONNX format. When training completes, the model is compiled for a target SoC; compilation is performed within edgeai-benchmark.

    • If you are using edgeai-modelmaker, it is recommended to use a docker or virtual environment to separate any global dependencies on your development PC from versions that modelmaker will set up.

     

    The set of architectures supported in these tools are subject to change, and typically use a set of scalable architectures to give multiple options that trade off speed and accuracy. For example, the YOLOX architecture has multiple variants (nano, tiny, small, etc.) that can be selected between. The supported set of models reflects common requests, the state of the art, and the speed vs. accuracy profile on TI’s deep learning solution.

     

    For training, the actual framework and training code comes from another subdirectory within the edgeai-tensorlab repo, e.g. YOLOX-based keypoint-detection architectures use edgeai-yolox. Modelmaker uses multiple training tools and provides a consistent interface across them.

     

    Developers can change a variety of training parameters, such as the number of epochs, learning rate, and weight decay. Within Model Composer, there are input fields and descriptions for these. Edgeai-modelmaker exposes such parameters through a YAML configuration file that is applied for both training and compilation – this file is the primary interface to modelmaker. During compilation, any settings in the YAML configuration override the defaults within edgeai-benchmark/configs.

    For example, edgeai-modelmaker/config_classification.yaml will default to mobilenet_v2_lite for training and compilation (a sketch of overriding such a configuration programmatically follows the notes below).

    • Training configurations are set for this model deeper in the edgeai-modelmaker source code
      • note that each model will have a small ‘compilation’ section with a “model_compilation_id”, which is a shorthand for pairing a task-type abbreviation (e.g. CL=classification) with a numerical identifier. CL-6090 corresponds to mobilenet_v2_lite. This is used to find default compilation settings.
    • Compilation configurations are set with defaults within the edgeai-benchmark configs source code for the associated model_selection (aka model compilation id, the aforementioned shorthand for a model name).
      • Note that there are many more compilation/benchmark configurations than there are training configurations – this is because all of the model zoo is compiled and benchmarked within this tool, whereas not all models we use are trained with modelmaker.
      • Some models have specific options that we have found to improve accuracy. For instance, YOLOX architectures designate several layers to run in 16-bit mode (vs. the default 8-bit) for an accuracy improvement
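
    As an illustration of driving modelmaker programmatically, the sketch below loads a configuration YAML and overrides a couple of training parameters before handing it back to modelmaker. The key names are indicative only; check the actual config files (e.g. config_classification.yaml) for the exact schema.

```python
import yaml  # pip install pyyaml

# Illustrative only: load a modelmaker YAML config and override a few training
# parameters. The section/key names are hypothetical; verify them against the
# real config files in edgeai-modelmaker.
with open("config_classification.yaml") as f:
    config = yaml.safe_load(f)

config["training"]["training_epochs"] = 30   # hypothetical key names
config["training"]["learning_rate"] = 1e-3

with open("config_custom.yaml", "w") as f:
    yaml.safe_dump(config, f)
# Point edgeai-modelmaker at config_custom.yaml to run training + compilation.
```
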
  • Supported Model Category 2: TI-Lite models for retraining – uploaded weights, parameters, model source available on modelzoo

    For models that have been modified by TI to be “lite” or “TI lite”, some change to the network architecture was performed to optimize speed and accuracy. In this case, we have taken the upstream version of a model, tuned the architecture for our accelerator, and reuploaded it into our model zoo.

    We have limited support for retraining these model types, and developers are expected to leverage the open source code and supporting tooling to train these modified architectures for their use-case. The information below describes the general process and resources available for reproducing these TI Lite models.

    • If a model is NOT modified by TI, then please look at the third option for supported model types in this FAQ. Use the upstream version of the model.
    • Note that some models inherently use ‘lite’ in the name (e.g. efficientdet-lite3 or SSDLite) and are not modified by TI. If unsure, try checking model hubs like Hugging Face, Tensorflow, paperswithcode, etc. to see if the same name is prevalent in other resources for that specific model.

     

    A developer has 2 approaches for reproducing such TI-Lite models:

    • Find trainable weights for the TI-lite model and the configuration code for that architecture, then use them to train your own version of the TI-Lite model
    • Take the original pretrained weights and model architecture from the upstream location, perform model optimizations to match the TI-lite model (or the set of supported operators), and train

    You may also skip the pretrained weights and start from scratch, but this will be more challenging to train. Larger datasets and many epochs are needed to train from scratch.

     

    For most TI-lite models, we provide trainable weights in PyTorch’s PTH file format as well as the exported ONNX file. For TensorFlow models, retraining these architectures is not supported; developers who must stick to such an architecture will need to analyze the upstream and TI versions, delineate the differences, and modify the model in TensorFlow accordingly.

     

    Within modelzoo, the models are grouped by task type, dataset, and training tool. If the training tool includes “edgeai”, then the model was likely trained within edgeai-tensorlab (the tool name may be abbreviated, e.g. tv=torchvision). The top-level directory structure includes forks of popular upstream training tools like mmdetection, mmpose, and yolox, which also depend on edgeai-torchvision.

     

    Models within these edgeai- subdirectories of modelzoo generally contain:

    • A .PTH containing the trained weights in Pytorch format – this is needed for further training!
    • A .ONNX file exported from the Pytorch training
    • A .YAML file describing the configuration used by TI for preprocessing, compilation, and postprocessing with TIDL
    • For some, a .LOG file containing logging and info messages captured during training.

    Please note that some of these files have a .LINK extension. These are a single line of text containing a link to the actual file, which keeps the git repo small. Most links also include an SDK version, indicating the model was tested and validated with that release.

     

    To retrain the model, you need code that builds the model architecture and a PTH file containing starting weights.

    • Code for the model architecture should be in the repo where it was trained.
      • That training tool should be referenced within the modelzoo directory name, which contains the .PTH file. Tool name may be abbreviated
      • Check the training tool repo’s README for any ‘edgeai’ mentions or references. The team’s practice is generally to prepend/link to TI-specific instructions for ti-lite models before the upstream version’s original README
        • e.g. edgeailite in edgeai-torchvision
        • For some training repos, the documentation will describe exactly which modifications were applied to the models trainable with that tool
      • Please follow standard PyTorch practices for applying PTH checkpoints as the initial weights. The names of the weights must align with the constructed model (see the sketch after this list).
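
    Here is the sketch referenced above for applying a .PTH checkpoint as the starting weights; the model construction should come from the relevant training repo, and torchvision’s mobilenet_v2 is used purely as a stand-in.

```python
import torch
import torchvision

# Minimal sketch: apply a .PTH checkpoint as starting weights before retraining.
# Build the model with the code from the training repo referenced by the modelzoo
# directory; mobilenet_v2 and the checkpoint filename here are stand-ins.
model = torchvision.models.mobilenet_v2(num_classes=10)

checkpoint = torch.load("mobilenet_v2_checkpoint.pth", map_location="cpu")
# Many checkpoints wrap the weights under a 'state_dict' key
state_dict = checkpoint.get("state_dict", checkpoint)

# strict=False reports (rather than errors on) weight names that do not align
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```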

    When the model completes training, it must be exported into ONNX format. If the model performs object or keypoint detection with a detection head, it is also necessary to export a PROTOTXT file that describes this head according to the TIDL meta architectures.
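
    The export itself follows standard PyTorch practice; a rough sketch is below. The input shape, opset, and tensor names are illustrative, so use whatever your training repo and SDK version expect. The PROTOTXT meta-architecture file for detection/keypoint heads is generated by the TI training repos and is not shown here.

```python
import torch
import torchvision

# Rough sketch of exporting a trained PyTorch model to ONNX.
# mobilenet_v2 stands in for your trained model; check the opset supported
# by your TIDL SDK version.
model = torchvision.models.mobilenet_v2(num_classes=10)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
)
```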

    It is also recommended to identify the preprocessing performed during training – similar preprocessing should be applied to inputs prior to inference. Unless this was intentionally changed, it is likely the same as what modelzoo uses (see the model’s config YAML).
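
    For illustration, typical ImageNet-style preprocessing looks like the sketch below; take the real resize, mean, and scale values from the model’s config YAML rather than these example numbers.

```python
import numpy as np
from PIL import Image

# Illustrative preprocessing only; use the resize, mean and scale values from the
# model's config YAML in edgeai-modelzoo instead of these example numbers.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
SCALE = np.array([0.017125, 0.017507, 0.017429], dtype=np.float32)

img = Image.open("test.jpg").convert("RGB").resize((224, 224))
x = np.asarray(img, dtype=np.float32)          # HWC, RGB
x = (x - MEAN) * SCALE                         # normalize as done during training
x = np.transpose(x, (2, 0, 1))[np.newaxis, :]  # NCHW, batch of 1
```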

    Please see edgeai-tidl-tools for the standalone compile/import with TIDL once the exported model files are ready.

  • Supported Model Category 3: Open Source Architecture w/o TI modification

     

    The last category of models in modelzoo described in this FAQ are open source models that originate entirely outside TI.

    For these models, we have very limited technical support as it pertains to training. We have downloaded the model from an external source (such as PyTorch’s model hub, the ONNX model zoo, TensorFlow’s model garden, Hugging Face, etc.), compiled the model with TIDL tools, and benchmarked the performance on our accelerator.

    For these models, we support compilation with TIDL tools. To this end, modelzoo includes the exported model file and a YAML config file that describes the preprocessing and compilation parameters. Please see edgeai-tidl-tools for the standalone compile/import with TIDL.
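
    As a rough sketch of that compile/import step: edgeai-tidl-tools uses an ONNX Runtime flow with a TIDL compilation provider, along the lines below. The option names and values are taken from that repo’s examples and should be verified against the SDK version you are using.

```python
import onnxruntime as rt

# Rough sketch of the OSRT (ONNX Runtime) compile flow from edgeai-tidl-tools.
# Verify option names, paths and calibration settings against the repo's examples
# for your SDK version.
compile_options = {
    "tidl_tools_path": "/path/to/tidl_tools",
    "artifacts_folder": "./model-artifacts/my_model",
    "accuracy_level": 1,                        # enables calibration for quantization
    "advanced_options:calibration_frames": 20,  # number of calibration inputs
}

sess = rt.InferenceSession(
    "model.onnx",
    providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
    provider_options=[compile_options, {}],
    sess_options=rt.SessionOptions(),
)

# Running inference over representative inputs performs calibration and writes
# the compiled artifacts into artifacts_folder, e.g.:
# for x in calibration_inputs:
#     sess.run(None, {"input": x})
```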

  • Additional Topic: Model Optimization for TIDL and C7x

    The C7x is a fixed-point accelerator leveraging matrix-multiplication hardware and a DSP. Some layers and configurations are more efficient on this architecture, and most models benefit from some optimization to such an architecture.

    As a starting point, all layers should be in a supported configuration as described in our supported-operators document: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/supported_ops_rts_versions.md
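
    As a quick sanity check, you can list the operator types an ONNX model uses and compare them against that document by hand (attribute and shape restrictions still need to be checked separately). A minimal sketch:

```python
import onnx
from collections import Counter

# List the ONNX operator types used by a model, to compare against the
# supported-operators document for your TIDL/SDK version.
model = onnx.load("model.onnx")
op_counts = Counter(node.op_type for node in model.graph.node)
for op, count in sorted(op_counts.items()):
    print(f"{op}: {count}")
```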

     

    TI has a few tools for automating optimizations, hosted as part of the edgeai-modeloptimization part of tensorlab. Most optimizations either replace unsupported layer configurations with supported ones or replace a supported operation with a more efficient one. These optimizations can also be categorized based on when they need to be applied – before or after training. Any changes to the model architecture itself are referred to as “model surgery”.

    Any model change that will impact the numerical values coming out of a layer should be performed before training is complete. Pre-training optimization would include, for example:

    • Changing an activation function
    • Changing a convolution or pooling kernel’s size, stride, or dilation
    • Changing an interpolation technique for upscaling
    • Quantization aware training
    • Pruning / enforcing sparsity

    Alternatively, post-training optimizations are appropriate when the change does not actually impact the values themselves, for example:

    • Converting a large maxpool into multiple cascaded ones
    • Replacing dynamic axes with static tensor shapes
    • Modifying a minor attribute/parameter of a layer
    • Fixing an awkward result of model export after training
    • Adding layers to handle input data preprocessing (subtract mean, multiply by scale, convert YUV->RGB format)

    Pre-training optimization

    As part of tensorlab, torchmodelopt is a Python library for applying model optimizations when training with PyTorch. This toolkit includes tooling for quantization aware training, sparsity/pruning, and model surgery.

    Please see the README and documentation on the torchmodelopt page for usage guidelines. A common technique is to train a model ordinarily, apply optimizations, and then continue training with a reduced learning rate for a smaller number of epochs (50-100). It is important to continue training after applying optimizations.
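
    A high-level sketch of that recipe is below. The apply_model_optimization() call is a placeholder for the actual torchmodelopt entry points described in its README, and the toy model, data, and epoch counts are for illustration only.

```python
import torch
import torch.nn as nn

# Sketch of the train -> optimize -> fine-tune recipe. apply_model_optimization()
# is a placeholder for the torchmodelopt surgery/QAT/pruning APIs; the toy model,
# random data, and epoch counts are for illustration only.
def apply_model_optimization(model: nn.Module) -> nn.Module:
    return model  # placeholder: call the torchmodelopt API here

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.SiLU(), nn.Flatten(), nn.Linear(8 * 30 * 30, 10))
data = [(torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))) for _ in range(8)]
loss_fn = nn.CrossEntropyLoss()

def train(model, epochs, lr):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

train(model, epochs=10, lr=1e-2)          # 1) ordinary training
model = apply_model_optimization(model)   # 2) model surgery / QAT / pruning
train(model, epochs=5, lr=1e-3)           # 3) continue at a reduced learning rate
                                          #    (50-100 epochs in practice)
```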

     

    Post-training / export optimization (ONNX)

    Post-training optimization tools are part of the edgeai-tidl-tools, and are primarily geared towards ONNX models, since those are easiest to parse and modify.

    The tidl-onnx-model-optimizer is used to apply optimizations to .ONNX models and has a continually growing set of rules that may be applied to a model. Please see its README for usage instructions; the tool is open for contributions.
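
    As one example of this kind of export fix (shown here with the plain onnx package rather than the TI optimizer, purely to illustrate the idea), the sketch below replaces a dynamic batch dimension with a static one:

```python
import onnx

# Example export fix: replace a dynamic batch dimension with a static one.
# The tidl-onnx-model-optimizer automates rules like this; the plain onnx
# package is used here only to show the idea.
model = onnx.load("model.onnx")

for tensor in list(model.graph.input) + list(model.graph.output):
    dims = tensor.type.tensor_type.shape.dim
    if not dims:
        continue
    if dims[0].dim_param or dims[0].dim_value == 0:  # dynamic ('N', 'batch') or unset
        dims[0].dim_value = 1                        # fix the batch size to 1

onnx.checker.check_model(model)
onnx.save(model, "model_static.onnx")
```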

     

    Additional optimizations for preprocessing – YUV and input_optimization

    Most models require preprocessing on input data, often such that values are on the interval [0,1] or [-1,1]. The model then accepts single-precision floats as the input to the network. However, running this preprocessing on CPU cores and transferring all input data as 4-bytes per value to the accelerator can be wasteful of common SoC resources (CPU, DDR, caches).

    It is more efficient for the input to require as little preprocessing as possible outside the accelerator. Nominally this means changing the model to accept 8-bit input in RGB or YUV encoding and running these elementwise operations on the accelerator itself using its >=256-bit-SIMD DSP.
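
    To illustrate the idea in PyTorch terms (the edgeai-tidl-tools scripts mentioned below apply the equivalent change directly to an already-exported ONNX or TFLite model), a wrapper like the following folds the mean/scale step into the model so that it accepts 8-bit RGB input; a YUV-to-RGB conversion can be folded in the same way. The mean/scale values are examples only.

```python
import torch
import torch.nn as nn
import torchvision

# Illustration: fold input preprocessing into the model so it accepts uint8 RGB.
# The edgeai-tidl-tools scripts make the equivalent change on exported models;
# mobilenet_v2 and the mean/scale values here are stand-ins.
class WithPreproc(nn.Module):
    def __init__(self, model, mean, scale):
        super().__init__()
        self.model = model
        self.register_buffer("mean", torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer("scale", torch.tensor(scale).view(1, 3, 1, 1))

    def forward(self, x_uint8):  # NCHW, uint8 RGB input
        x = (x_uint8.float() - self.mean) * self.scale
        return self.model(x)

wrapped = WithPreproc(torchvision.models.mobilenet_v2(), mean=[123.675, 116.28, 103.53],
                      scale=[0.017125, 0.017507, 0.017429])
wrapped.eval()
dummy = torch.randint(0, 256, (1, 3, 224, 224), dtype=torch.uint8)
torch.onnx.export(wrapped, dummy, "model_with_preproc.onnx", opset_version=11)
```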

    YUV encoding is useful because many cameras and ISPs (including the VISS within the Vision Preprocessing Accelerator (VPAC) on AM6xA and TDA4x SoCs) tend to use YUV-based encodings. Accepting YUV directly in the model removes the need for an additional image-format conversion.

     

    Within edgeai-tidl-tools/scripts, there are ONNX- and TFLITE-specific tools to implement this functionality, so that the preprocessing layers are added prior to the first Conv layer, as shown below (ONNX model for YOLOX-NANO):