
[FAQ] TDA4VH-Q1: Accelerate Transformer Models on TI SoCs.

Part Number: TDA4VH-Q1

TIDL SDK Version: 9.1

I have downloaded a transformer model (DeiT) from TI's edgeai-tidl-tools repo: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/tidl_fsg_vtfr.md#deit-transformer-example

I am facing issues while importing it with the TIDL-RT model import tool in SDK 9.1.

How can we fix this?

  • Transformers: a brief overview

    In the realm of deep learning, transformers have gained significant popularity in recent years, surpassing Convolutional Neural Networks (CNNs) in certain domains. This shift can be attributed to the exceptional performance of transformers in various natural language processing (NLP) tasks and their ability to handle sequential data more effectively. The transformer architecture, introduced in 2017, has revolutionized the field of NLP. It employs a self-attention mechanism, multi-head attention, positional encoding, and feed-forward neural networks. Transformers are particularly good at parallelization, handling long-range dependencies, and adapting to different tasks, making them well-suited for NLP. These advantages contribute to their growing popularity over CNNs.
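To make the self-attention mechanism mentioned above concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core of a multi-head attention (MHA) block. Shapes and weight names are illustrative only and are not tied to any TIDL API.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # x: (seq_len, d_model); project into queries, keys, values
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d_k = q.shape[-1]
    # Every token attends to every other token, which is what gives
    # transformers their long-range dependency handling
    scores = (q @ k.T) / np.sqrt(d_k)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Note that the MatMul and Softmax operations here are exactly the operator types called out later as supported in SDK 9.1.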

    But why Transformers over Conventional CNNs?

    Transformers have proven to be a superior option to Convolutional Neural Networks (CNNs) in certain domains due to their unique advantages. Firstly, transformers can process the entire input sequence in parallel, which leads to faster training and inference compared to the sequential processing of CNNs. Secondly, transformers are excellent at capturing long-range dependencies in data, making them well-suited for tasks that require an understanding of contextual relationships across the entire input sequence. Additionally, transformers have shown promise in addressing object detection and classification problems, expanding their applicability beyond NLP tasks. Lastly, due to their architecture, transformers can be easily adapted to various natural language processing tasks with minimal architectural changes, offering flexibility and efficiency. These benefits have made transformers very popular and have positioned them as the preferred choice over CNNs in specific applications.

    Accelerate Transformer Models on TI Devices

    With the recent SDK 9.1 release, TI extended support for transformer-based models, for faster and better acceleration on TI SoCs. Vision Transformers (VTFR) apply the transformer architecture, consisting of Multi-Head Attention (MHA) and Feed-Forward Network (FFN) blocks, to a wide variety of vision tasks such as object classification, object detection, and semantic segmentation, and are now supported in the latest TIDL SDK release. SDK 9.1 supports the basic classification transformers ViT and DeiT (MatMul, LayerNorm, Softmax) and has partial support for SwinT. Please refer to tidl_fsg_vtfr.md for more details on layer mapping and implementation-level details from the TI side.

    Potential Issues and Resolutions

    We have seen customers generating the DeiT model with timm-based libraries (deit-transformer-example) run into an issue with an unsupported opset version; this can be resolved by downgrading from opset 14 to opset 11. Secondly, the model import tool fails to read the input layer name at the time of model compilation; this can be resolved by explicitly setting the inDataNamesList flag. Please follow the steps below for more details.

    Please change the model's opset version from 14 (as mentioned in the readme's PyTorch code section) to opset 11.

    • torch.onnx.export(deit, x, "deit_tiny.onnx", export_params=True, opset_version=11, do_constant_folding=True, input_names=['input'], output_names=['output'])

    Once you have exported the model at opset 11, follow the steps below.

    Set inDataNamesList = "input.1" 

    Here we are explicitly instructing the NC (network compiler) tool to start compiling the model from the first input layer, input.1; this is a workaround to get past the model import issue.

    Please refer to the configs below.

    Import config : 

    modelType          = 2
    numParamBits       = 8
    
    inputNetFile       = "../../test/testvecs/models/Hirain/deit_tiny_1.onnx"
    outputNetFile      = "../../test/testvecs/output/gen-artifacts/Hirain/tidl_net_deit_tiny_1.bin"
    outputParamsFile   = "../../test/testvecs/output/gen-artifacts/Hirain/tidl_io_deit_tiny_1_"
    
    inDataNorm  = 1
    inMean = 123.675 116.28 103.53
    inScale = 0.017125 0.017507 0.017429
    
    
    inWidth  = 224
    inHeight = 224 
    inNumChannels = 3
    
    numFrames = 1
    inDataNamesList = "input.1"
    
    
    inData = "../../test/testvecs/input/university/image_list.txt"
    
    postProcType = 1
    
    debugTraceLevel = 3
    writeTraceLevel = 3
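As a side note, inDataNorm, inMean, and inScale in the import config correspond to the usual per-channel normalization (pixel - mean) * scale; the values above are the standard ImageNet statistics (the scales are approximately 1/(std * 255)). A minimal NumPy sketch, with an arbitrary sample image:

```python
import numpy as np

# Values mirror inMean / inScale from the import config above
mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
scale = np.array([0.017125, 0.017507, 0.017429], dtype=np.float32)

def normalize(img):
    # img: (H, W, 3) uint8 RGB -> (H, W, 3) float32 normalized
    return (img.astype(np.float32) - mean) * scale

img = np.full((224, 224, 3), 128, dtype=np.uint8)  # arbitrary gray image
out = normalize(img)
print(out.shape, out.dtype)  # (224, 224, 3) float32
```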

    Infer Config :

    #inFileFormat	= 0
    #numFrames	= 1
    
    inFileFormat    = 2
    postProcType = 1
    numFrames   = 1
    
    netBinFile	= "testvecs/output/gen-artifacts/Hirain/tidl_net_deit_tiny_1.bin"
    ioConfigFile	= "testvecs/output/gen-artifacts/Hirain/tidl_io_deit_tiny_1_1.bin"
    
    outData	= "testvecs/output/output-hirai_deit.bin"
    inData	= "testvecs/input/Hirain/image_list.txt"
    
    
    debugTraceLevel	= 2
    #writeTraceLevel	= 1

    Image_list.txt 

    ti-processor-sdk-rtos-j784s4-evm-09_01_00_06/c7x-mma-tidl/ti_dl/test/testvecs/input/input-test-data/input/airshow.jpg 895

    You can get the airshow image from : 

    https://github.com/TexasInstruments/edgeai-tidl-tools/tree/master/test_data
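For clarity, each line of image_list.txt is "<image path> <ground-truth class id>" separated by whitespace (895 is the ImageNet class id for the airshow image, which is why it reappears in the console output below). A small hypothetical parser for this format:

```python
# Hypothetical helper (not part of the TIDL tools) that parses an
# image_list.txt where each line is "<image path> <class id>"
def parse_image_list(text):
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        path, label = line.rsplit(maxsplit=1)
        entries.append((path, int(label)))
    return entries

sample = "testvecs/input/input-test-data/input/airshow.jpg 895\n"
print(parse_image_list(sample))
# -> [('testvecs/input/input-test-data/input/airshow.jpg', 895)]
```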

    Here is a glance at the console output:

    End of Layer # -  598 with outPtrs[0] = 0x558528943480
    TIDL_process is completed with handle : 288f4000
     T   14497.34 Skipping static gen-set function
     .... ..... .../home/sdk/j784s4/9.1/ti-processor-sdk-rtos-j784s4-evm-09_01_00_06/c7x-mma-tidl/ti_dl/test/testvecs/input/input-test-data/input/airshow.jpg
    895
    
     A :   895, 1.0000, 1.0000,   895 .... .....TIDL_deactivate is called with handle : 288f4000
    PREEMPTION: Removing priroty object with handle = 0x5585288f4000 and targetPriority = 0,      Number of obejcts left are = 0, removed object with base  = 0x5585288f7f80 and size =128
    Workstation:~/sdk/j784s4/9.1/ti-processor-sdk-rtos-j784s4-evm-09_01_00_06/c7x-mma-tidl/ti_dl/test$