
SK-AM62A-LP: EDGE AI TIDL Model compilation issues

Part Number: SK-AM62A-LP

I am having problems with the compilation of a custom (Facial Landmark detection) model.
Both the onnx and the tflite version of the model are failing to compile.
They do however work perfectly, when I only do inference on the CPU.

More info about the model in question can be found here:
github.com/.../pj_tflite_face_landmark_with_attention


With tflite, the model compilation script stops with the following error:
tidl_import_common.cpp:192 void* my_alloc(int): Assertion 'ptr != NULL' failed

This is probably due to the following messages:

Unsupported (TIDL check) TIDL layer type --- 26 Tflite layer type --- 34 layer output name--- channel_padding
Unsupported (TIDL check) TIDL layer type --- 1 Tflite layer type --- 3 layer output name--- output_mesh_identity:0
Unsupported (TIDL check) TIDL layer type --- 0 Tflite layer type --- 0 layer output name---           Add_6
Unsupported (import) TIDL layer type for Tflite layer type --- 8  layer output name---           Floor
Unsupported (TIDL check) TIDL layer type --- 54 Tflite layer type --- 53 layer output name---            Cast
Unsupported (import) TIDL layer type for Tflite layer type --- 81  layer output name---            Prod

This is strange, because according to github.com/.../supported_ops_rts_versions.md, Add and Conv2d layers should be supported.


When I add these layers to the deny_list, the compilation script says:

'ALL MODEL CHECK PASSED'


and then during inference:
In TIDL_runtimesPostProccessNet 4
In TIDL_subgraphRtCreate
Bus error (core dumped)


The onnx model has issues with the 'Neg' and 'Mul' layers. When I reduce the network to only the first few layers, the model still does not compile.
How can I fix this?

I have provided the models and compilation script.

mesh_model_compile.zip

  • Hi Kasper,

    Thanks for including enough files and example code for me to look at this on my side. I notice you're setting debug_level to 5. Could you try this at 2 or lower? Levels 3 and 4 are predominantly for use during inference, and I have seen errors thrown during compilation when using debug level >= 3. I'd recommend setting it to 2, since that is maximum verbosity for compilation -- going above that is for diagnosing accuracy issues related to quantization. As you're doing this, could you provide a log of the tflite or onnx compilation when running without any layer types in the deny list?
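
    To make the advice above concrete, here is a minimal sketch of what the compile options might look like. The option names follow the edgeai-tidl-tools examples, but treat them as assumptions rather than a verified API; paths and values are placeholders.

```python
# Hypothetical sketch of TIDL delegate compile options (names assumed from
# the edgeai-tidl-tools examples, not a verified API).
compile_options = {
    "artifacts_folder": "./model-artifacts",  # where compiled artifacts land
    "tensor_bits": 8,                         # 8-bit quantization
    "debug_level": 2,                         # keep <= 2 during compilation
    "deny_list": "",                          # empty: let TIDL try every layer
}

print(compile_options["debug_level"])  # 2
```

    These options would then be passed when loading the TIDL delegate in the compilation script; the key point is keeping debug_level at 2 or below and starting with an empty deny_list to capture the full compile log.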

    Those layers should certainly be allowed in the model, especially simple elementwise operations like Add and Multiply. Layers like Conv2D with additional parameters such as kernel size and padding have restrictions on those parameters, e.g. the kernel must be no larger than 7x7.

    It would also help to know which tidl_tools version you are using. Was this downloaded with edgeai-tidl-tools? Knowing the commit tag of that would be enough. The current commit on the main branch (tag 09_00_00_06, which is the SDK version) uses TIDL 09_00_00_01.

    I'll work to reproduce this on my side and get back to you sometime today or Monday.

    Best,
    Reese

  • Hi Kasper,

    Adding some comments on this after reproducing it myself. I can verify on my side that I'm seeing the same (or at least similar) issues. I tried this on the previous release (8.6) without success. I see that with no layers in the deny-list, it fails during model checks, and if the right layers are denied, it passes model checks but faults during compilation.

    I am going to look into this with the dev team. There are layers in the network that I would expect to require Arm offload, but layers like Conv and Mul should not be offloaded by default. The number of nodes that do this causes a large number of subgraphs (>>16, where 16 is the max that TIDL supports), so it's possible the compiler is struggling to simulate the accelerator, thus causing a fault.

    Is this model purely a custom architecture or is it based on a standard architecture? That pertains both to the backbone/encoder and the detection/keypoint heads. We generally have to do some additional work for detection models since they use a large set of layers with low compute - linked below is a doc related to meta-architectures and how we handle object detection models.

    https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/tidl_fsg_od_meta_arch.md

    I assume Tensorflow is the framework used for model development. Could you provide any info about the version?

    -Reese

  • Hi Reese,

    Thank you for the response. This custom model is basically a (slightly) modified version of the Google Attention Mesh model:

    see https://arxiv.org/pdf/2006.10962.pdf

    and

    https://github.com/google/mediapipe/blob/master/docs/solutions/face_mesh.md#face-landmark-model

    We did not train the model ourselves, but we downloaded it from the PINTO model zoo

    https://github.com/PINTO0309/PINTO_model_zoo/tree/main/282_face_landmark_with_attention

    The onnx version is a result of the tflite2tensorflow conversion tool:

    https://github.com/PINTO0309/tflite2tensorflow

    The large number of subgraphs is probably not the issue. When I edit the onnx model and just take the first few layers, I get a model with only Conv, Add, Mul, Neg and Relu layers. When I try to compile this model it still fails, and on the CPU it still works.

    Kasper

    P.S. I have checked the license of the Facial landmark model and we are allowed to use it :)

  • Hi Kasper,

    Thank you for this information -- it is very helpful. I cannot tell if the Attention Mesh model is actually using any transformer-like components in the architecture -- it doesn't seem so at a glance. I mention this because we don't yet have native support for self-attention/transformers.

    One of the components in this model that may be problematic is the many elementwise operations along a single axis (e.g. the input tensor is NxCxHxW, and the tensor to multiply by is 1x1x1xW) -- I don't believe we currently support this. For layers we do support, there are some configurations that are not covered in the SW implementation on our accelerator, and it seems the logging during compilation is not making it clear why it's failing in this instance. It also looks like this 'parametric relu' (PReLU) is decomposed by tf2onnx into more elementwise operators, since I'm fairly certain PReLU isn't natively supported by TIDL.
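
    The single-axis elementwise case described above can be illustrated with numpy broadcasting. The shapes below mirror the description (NxCxHxW activation, 1x1x1xW constant); whether TIDL's accelerator path accepts this pattern is exactly what is in question, so this only shows why the CPU path has no trouble with it:

```python
import numpy as np

# NxCxHxW activation and a per-axis constant shaped 1x1x1xW.
x = np.random.rand(1, 3, 4, 5).astype(np.float32)
w = np.random.rand(1, 1, 1, 5).astype(np.float32)

# Numpy (and the CPU runtimes) broadcast this multiply along W without
# complaint; the accelerated path may still reject the layer.
y = x * w
print(y.shape)  # (1, 3, 4, 5)
```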

    I've briefed the dev team on this and filed an issue on this topic. In the short term, I think we'll struggle to get this model compiled, frankly. Complex detection heads can be difficult to optimize as well (we support 4-6 'meta-architectures' out of the box to help simplify this for devs; see link in my last reply)

    Are you open to alternate architectures? I would assume a perk of your current model is that it was pretrained. We have a keypoint-detection model based on YOLOX that may be of interest, although this was first developed for human pose, so some modification + training required for doing a face-mesh

    Best,
    Reese

  • Hi Reese,

    I am a colleague of Kasper, also working on the same problem.

    Thank you for your reply with the information. Perhaps you have already noticed this, but when I try to compile the tflite version of this model, it essentially gives two very similar errors: 

    1. Error: Layer 12, BatchGatherND/concat_3:BatchGatherND/concat_3 is missing inputs in the network and cannot be topologically sorted
    2. Error: Layer 0, BatchGatherND_3/concat_3:BatchGatherND_3/concat_3 is missing inputs in the network and cannot be topologically sorted

    On checking with netron.app, it seems BatchGatherND/concat_3 and BatchGatherND_3/concat_3 are the names of connections between a Concatenation layer and a GatherND layer (see screenshots below).

    It is strange that this kind of connection throws this error here, because such a connection (with exact same dimensions) is repeated 10 other times in the network without any problems (in other branches):

    1. BatchGatherND_1/concat_3
    2. BatchGatherND_2/concat_3
    3. BatchGatherND_4/concat_3
    4. BatchGatherND_5/concat_3
    5. BatchGatherND_6/concat_3
    6. BatchGatherND_7/concat_3
    7. BatchGatherND_8/concat_3
    8. BatchGatherND_9/concat_3
    9. BatchGatherND_10/concat_3
    10. BatchGatherND_11/concat_3

    FYI, here is the model card for this model. It does say that this model architecture is based on MobileNetV2 (but with customized blocks), which is supported by TI-EdgeAI (as seen in the examples).

    I'm hoping that this information might help you in debugging this issue further. Fingers crossed!


    Alternatives

    If the specific architecture of this particular network is really the issue, there is an older version of this type of model available that does not have attention-related components (onnx version available here). [Model card available here.]

    Additionally, there is also a newer version of this type of model available here (both tflite and converted onnx available). [Model card available here.]

    However, the compilation of both these tflite models gives a segmentation fault while setting up the interpreter (without any additional information).
    Compiling the ONNX versions gives the error: “Unknown model file format version.”

    Just letting you know these things in case the problems with one of these alternatives are solvable.


    Other than this, the supported alternative models you suggested are unfortunately not feasible for us. We need a dense mesh of landmarks on the face, and training such a model ourselves requires access to large datasets that we don't have :(

    Regards,

    Amogh

  • Hi Amogh,

    Thanks for adding on more information. I'm still looking into this, but I'll make a few comments per your response.

    Regarding the BatchGatherND concatenation nodes, I agree it's strange that errors are thrown here, especially for only 2 of the 12 instances. It seems like it may be looking for an input (name: BatchGatherND_2/Tile:0) that is defined as a static value rather than an actual input to the net. That doesn't explain why it would happen for some and not others... It's a good observation -- thank you for mentioning it. It might be possible to test this theory by modifying that particular net from Python or graphically with https://github.com/ZhangGe6/onnx-modifier

    One issue across the board is with the PRELU nodes - I think this is more pronounced for the ONNX version, where it's broken into a sequence of neg, mul, add, etc. operations. What I notice about these operations is that their static values are stored in a tensor that does not exactly match the dimensions of the input -- it is missing a batch dimension (image below). My assumption is that our compilation tools are rejecting the layer for this reason. Revising an earlier comment, I know of several other models that have done element-wise add/multiply along a specific axis, and those define all 4 dimensions (NCHW) where this is defining 3 (CHW). I'll have to take a deeper look to confirm my suspicion that this is why basic elementwise operations are being rejected during model checks.
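
    A quick numpy sketch of the situation described above. A common lowering of PRelu is relu(x) - alpha * relu(-x) (which produces exactly the Neg/Mul/Sub-style elementwise chain seen here; whether tf2onnx used precisely this form is an assumption). Note the alpha constant is deliberately given only 3 dimensions (CHW, no batch dim), which numpy and the CPU runtimes broadcast without issue:

```python
import numpy as np

def prelu(x, alpha):
    # Reference PReLU: identity for x > 0, alpha * x otherwise.
    return np.where(x > 0, x, alpha * x)

def relu(x):
    return np.maximum(x, 0.0)

x = np.random.randn(1, 3, 4, 4).astype(np.float32)   # NCHW input
alpha = np.random.rand(3, 1, 1).astype(np.float32)   # CHW only: no batch dim

# Decomposed form using only elementwise ops (Relu, Neg, Mul, Sub).
decomposed = relu(x) - alpha * relu(-x)

print(np.allclose(decomposed, prelu(x, alpha)))  # True
```

    The decomposition is numerically identical to PReLU on CPU, so the model runs fine there even if the 3-D constant is what trips up the TIDL model checks.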

    Regarding the unknown model file format version, I took a quick look at the file and saw it was created for ONNX opset 13. We support opset 9 and 11 (preferably 11), so this might be the reason why that alternative model is failing. The TFLite version said v3, although I'm not sure how this aligns with actual tensorflow-lite versions, since it doesn't have a minor version value as I would expect. However, the 'newer' version of the model is using opset 11, so I'm surprised it is breaking on this.

    Understood on the alternative architectures. A dataset for something like this is difficult/expensive to create  :)

    I'll keep looking into this. The dev team is also aware of this, since it seems there are multiple issues at play here.

    Best,
    Reese