This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

SK-AM62A-LP: [edgeai-tidl-tools] A Segmentation fault error occurred during TIDL compilation.

Part Number: SK-AM62A-LP

Hi friends,

When I use tidl_tools to compile the TIDL model for the AM62A, an error occurs:

    Segmentation fault      (core dumped) python3 onnxrt_ep.py -c

This is my model. Can you help me troubleshoot it?

model_best.zip

Best Regards

Henry

  • Hi Henry,

    Thanks for your query. I'll try to compile this on my side to reproduce the error.

    I wanted to get some clarification as well:

    1. Are there any particular settings here that are important? I'm guessing this is a minimal example for testing the workflow. I'll make a small config for the compilation script, but seeing yours might help isolate any issues there.

    2. If you have logs available, would you mind attaching them?

    3. Why is one of the outputs named 'input' whereas the first input is 'input.1'?

    4. When creating this model in ONNX, what opset version did you use? To my knowledge, we support versions 9 and 11 -- I do not expect other versions to import correctly.

    Now, on to the compilation itself: I started by sanity-checking that the ONNX model opens and runs on random input, and it does. I then tried to compile with the following configuration in edgeai-tidl-tools/examples/osrt_python/model_configs.py:

    'model-best-henry2333' : {
        'model_path' : os.path.join(models_base_path, 'model_best.onnx'),
        'source' : {'model_url': None, 'opt': False, 'infer_shape' : False},
        'mean': [0, 0, 0],
        'scale' : [1, 1, 1],
        'num_images' : numImages,
        'model_type' : None,
        'session_name': 'onnxrt'
    }

    I see the error you're running into. At the maximum debug level, I see the segmentation fault occur right after the first layer is identified:

    Supported TIDL layer type --- Conv -- /conv1/Conv
    Segmentation fault (core dumped)

    To me, this means the failure occurred while the import tool was parsing the individual layers of the model. I am going to look deeper at this so I can understand where it's failing.

    Please give me a few days here. In the meantime, could you respond to my clarifying questions above? Thank you.

    Best,
    Reese

  • Hi Reese,

    Thank you for your assistance. Here are some details:

    1. There is no special configuration; these are the relevant settings:

           platform: J7
           version: '8.6'
           tensor_bits: 8

    2. There is no useful log because the program terminated immediately with the segmentation fault.
    3. This model is an example model created to reproduce the issue, using default-generated tensor names. The tensor names have no special meaning.
    4. I usually set opset_version=11.

    Best Regards

    Henry

          

  • Thank you for the details, Henry. That configuration should work.

    Since I was able to reproduce the issue, I will work on this further myself. I'll try to find where and why this segmentation fault is happening. Please give me a couple of days to dig into this.

    Best,
    Reese

  • Hi Henry,

    I'm narrowing in on the issue; it's related to the import tool not allocating tensor sizes correctly for the second convolution layer. While I work on a fix, could I ask that you modify the network to use different names for input.1 and input? A deep dive into the code on my side suggests that the 'input' name is causing an unexpected issue, since that tensor is both an intermediate tensor and an output. I'll verify this soon.

    Best,
    Reese

  • Hi Henry,

    I'll slightly revise what I said previously. The fact that an intermediate/output tensor is named 'input' should not itself be a major problem; rather, the fact that a graph output is also consumed as an intermediate tensor/net appears to be what triggers the failure.

    I assume that tensor was made an output so you can do some additional analysis on the intermediate activations. To me, this should be a supported capability, and I will see what I can learn on this. For now, I would recommend removing the net named 'input' from the graph's set of outputs and retrying.

    I was able to produce a temporary fix for this issue, but it ran into a secondary segmentation fault, which I believe is within the ONNX runtime stack. I will raise this with the dev team.

    -Reese