This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

SK-AM62A-LP: [edgeai-tidl-tools] A Segmentation fault error occurred during TIDL compilation.

Part Number: SK-AM62A-LP

Hi friends,

When I use tidl_tools to compile the TIDL model for the AM62A, an error occurs:

    Segmentation fault      (core dumped) python3 onnxrt_ep.py -c

This is my model. Can you help me troubleshoot it?

model_best.zip

Best Regards

Henry

  • Hi Henry,

    Thanks for your query. I'll try to compile this on my side to reproduce the error.

    I wanted to get some clarification as well:

    1. Are there any particular settings here that are important? I'm guessing this is a minimal example for testing the workflow. I'll make a small config for the compilation script, but seeing yours might help isolate any issues there.

    2. If you have logs available, would you mind attaching them?

    3. Why is one of the outputs named 'input' whereas the first input is 'input.1'?

    4. When creating this model in ONNX, what opset version did you use? To my knowledge, we support versions 9 and 11 -- I do not expect other versions to import correctly.

    Now, on to the compilation itself: I started by sanity-checking that the ONNX model opens and runs on random input, and it does. I then tried to compile with the following configuration in edgeai-tidl-tools/examples/osrt_python/model_configs.py:

    'model-best-henry2333' : {
        'model_path' : os.path.join(models_base_path, 'model_best.onnx'),
        'source' : {'model_url': None, 'opt': False, 'infer_shape' : False},
        'mean': [0, 0, 0],
        'scale' : [1, 1, 1],
        'num_images' : numImages,
        'model_type' : None,
        'session_name': 'onnxrt'
    }

    I see the error you're running into. At the maximum debug level, I see the segmentation fault occur right after the first layer is identified:

    Supported TIDL layer type --- Conv -- /conv1/Conv
    Segmentation fault (core dumped)

    To me, this means the failure occurred while the import tool was parsing the individual layers of the model. I am going to look deeper at this so I can understand where it's failing.

    Please give me a few days here. In the meantime, could you respond to my clarifying questions above? Thank you.

    Best,
    Reese

  • Hi Reese,

    Thank you for your assistance. Here are some details:

    1. There is no special configuration; these are the relevant settings:

           platform: J7
           version: '8.6'
           tensor_bits: 8

    2. There is no useful log because the program terminated immediately with the segmentation fault.
    3. This model is an example model created to reproduce the issue, using default-generated tensor names. The tensor names have no special meaning.
    4. I usually set opset_version=11.

    Best Regards

    Henry

          

  • Thank you for the details, Henry. That configuration should work.

    Since I was able to reproduce the issue, I will work on this further myself. I'll try to find where and why this segmentation fault is happening. Please give me a couple of days to dig into this.

    Best,
    Reese

  • Hi Henry,

    I'm narrowing in on the issue; it's related to the import tool not allocating tensor sizes correctly for the second convolution layer. While I work on a fix, could I ask that you modify the network to use different names for input.1 and input? A deep dive into the code on my side suggests that the 'input' name is causing an unexpected issue, since that tensor is both an intermediate tensor and an output. I'll verify this soon.

    Best,
    Reese

  • Hi Henry,

    I'll slightly revise what I said previously. The fact that an intermediate/output tensor is named 'input' should not itself be a major problem; rather, the fact that a graph output is also consumed as an intermediate tensor/net appears to be what triggers the failure.

    I assume that tensor was made an output so you can do some additional analysis on the intermediate activations. To me, this should be a supported capability, and I will see what I can learn on this. For now, I would recommend removing the net named 'input' from the graph's set of outputs and retrying.

    I was able to produce a temporary fix for this issue, but it ran into a secondary segmentation fault, which I believe is within the ONNX runtime stack. I will raise this with the dev team.

    -Reese