AM62A3: Models compiled with edgeai-tidl-tools cause segmentation fault on AM62A

Part Number: AM62A3

Hello, I am using an AM62A target device where offloading models to the accelerator causes a segmentation fault.

- I have validated the "out-of-box" examples within the Docker container that I use to compile the artifacts

- I have compiled the osrt_python/tfl and osrt_python/ort examples successfully within the container (these are the files that I am transferring to the target)

- I have tested the inference in the docker container without offloading

- I have tested the inference without offloading on my target device

Only when I enable offloading on the AM62A device do I get the segmentation fault. This is the case for the osrt_python/tfl as well as the osrt_python/ort models.

- I am using the release tag 10_00_07_00 within the docker container as well as on the target device

- I have recently updated the target device's TIDL version following this explanation: edgeai-tidl-tools/docs/backward_compatibility.md at 10_00_07_00 · TexasInstruments/edgeai-tidl-tools

On a side note: is the description in edgeai-tidl-tools/docs/backward_compatibility.md at 10_00_07_00 · TexasInstruments/edgeai-tidl-tools sufficient to upgrade / install edgeai-tidl-tools on the target device? I've read something about an RTOS SDK version at some point. Is this also something I need to upgrade? If so, how?

What can I do about this? How do I approach this problem? Thanks for the help!

  • Output of python3 tflrt_delegate.py:

    root@am62dl:/opt/edgeai-tidl-tools/examples/osrt_python/tfl# python3 tflrt_delegate.py
    Running 4 Models - ['cl-tfl-mobilenet_v1_1.0_224', 'ss-tfl-deeplabv3_mnv2_ade20k_float', 'od-tfl-ssd_mobilenet_v2_300_float', 'od-tfl-ssdlite_mobiledet_dsp_320x320_coco']


    Running_Model : cl-tfl-mobilenet_v1_1.0_224

    Number of subgraphs:1 , 34 nodes delegated out of 34 nodes

    Segmentation fault (core dumped)


  • Hello Stefan,

    We will figure out where this is coming from. My first suspicion relates to tool versions:

    - I am using the release tag 10_00_07_00 within the docker container as well as on the target device

    TIDL Tools versions with odd numbers are designated as ones portable to the previous SDK, in this case 9.2. Is that the version of the SDK that you have on your AM62A installation?

    • You should have $EDGEAI_SDK_VERSION set to a string similar to 09_02 or 9.2 within your Linux environment (defined by an auto-run script on login); a quick check is sketched below this list
    • You should have updated the firmware, OSRT components (e.g. TFLite and ONNXRT libs), and other TI libraries like libtivision_apps.so using the steps mentioned in the backward_compatibility.md doc
      • it sounds like you have done this, but simply verifying here
      • Ensure the $SOC environment variable was set to 'am62a'. This should have also been handled by the auto-run script on login 
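
    As a quick sanity check, here is a minimal sketch in Python (the shell equivalent is simply echo $EDGEAI_SDK_VERSION $SOC):

    import os

    # Both values should have been exported by the auto-run login script;
    # if either is missing or unexpected, the update did not fully apply.
    print(os.environ.get("EDGEAI_SDK_VERSION"))  # expect something like "09_02" or "9.2"
    print(os.environ.get("SOC"))                 # expect "am62a"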

    If you are seeing seg faults on any model, then my estimation is that something was not correctly updated. Your approach of testing different components isolates this to the TIDL stack, which is helpful.

    I am also curious why your device's hostname is root@am62dl, but perhaps that was intentional and we can ignore.

    Suggested steps for collecting more info/logs: 

    • Pass debug_level=2 to the runtime when creating the model. It should be sufficient to set this in examples/osrt_python/common_utils.py (see the sketch after this list)
    • On target, run /opt/vx_app_arm_remote_log.out in the background before starting your script
    • Run the python application from gdb, and check the backtrace for the thread that seg-faulted
    • Run `pip3 freeze | grep -i "tflite\|onnx\|tidl"` and share the package versions
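
    For the debug_level step, here is a minimal sketch of how that option reaches the runtime (the paths and model name are illustrative; in the examples, common_utils.py builds this options dict for you):

    import tflite_runtime.interpreter as tflite

    # Illustrative delegate options -- only debug_level is the point here.
    delegate_options = {
        "artifacts_folder": "../../../model-artifacts/cl-tfl-mobilenet_v1_1.0_224/",
        "debug_level": 2,  # 0 = quiet; 2 prints runtime trace output that helps localize a crash
    }

    tidl_delegate = tflite.load_delegate("libtidl_tfl_delegate.so", delegate_options)
    interpreter = tflite.Interpreter(
        model_path="../../../models/public/mobilenet_v1_1.0_224.tflite",
        experimental_delegates=[tidl_delegate],
    )
    interpreter.allocate_tensors()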

    On a side note: is the description in edgeai-tidl-tools/docs/backward_compatibility.md at 10_00_07_00 · TexasInstruments/edgeai-tidl-tools sufficient to upgrade / install edgeai-tidl-tools on the target device? I've read something about an RTOS SDK version at some point. Is this also something I need to upgrade? If so, how?

    Yes, the instructions there are sufficient to upgrade the TIDL stack (not just edgeai-tidl-tools) on the previous SDK with the latest bugfixes and changes, with one caveat -- the memory map between the EVM and your hardware platform must be compatible. If you are on the starter kit EVM, ignore this point.

    I do not think the RTOS SDK (probably PSDK RTOS) is necessary here, but please point me towards this note if you happen across it again. If you needed to change the memory map for your custom hardware, this would be relevant. Note that for AM62A, we have a 'firmware-builder' tool that serves the same function as the PSDK RTOS.

    BR,
    Reese

  • Hello Reese,

    Thanks for the help, much appreciated!

    TIDL Tools versions with odd numbers are designated as ones portable to the previous SDK, in this case 9.2. Is that the version of the SDK that you have on your AM62A installation?

    The EDGEAI_SDK_VERSION is set to 09_00_00. Since I've tried to update the target device to 10_00_07_00, I guess this is wrong, no? I've checked the setup_target_device.sh script that we used to update the device and could not find anything related to updating this environment variable. Am I missing some steps to properly update the device to be compatible with the models compiled with edgeai-tidl-tools version 10_00_07_00? On the target device we have used the 10_00_07_00 tag of edgeai-tidl-tools:

    root@am62dl:/opt/edgeai-tidl-tools# git status
    HEAD detached at 10_00_07_00

    Ensure the $SOC environment variable was set to 'am62a'. This should have also been handled by the auto-run script on login

    The SOC variable is indeed set to am62a upon logging into the target device. This is also what we set when compiling the model artifacts in the Docker container.

    I am also curious why your device's hostname is root@am62dl, but perhaps that was intentional and we can ignore.

    Yes, this is just us renaming our device. This should not matter at all.

    Run the python application from gdb, and check the backtrace for the thread that seg-faulted

    This is the output of GDB when using `thread apply all bt`. Does anything suspicious come to mind here?

    (gdb) run tflrt_delegate.py
    Starting program: /usr/bin/python3 tflrt_delegate.py
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/libthread_db.so.1".
    warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
    [New Thread 0xfffff57cf120 (LWP 81481)]
    [New Thread 0xfffff2fbf120 (LWP 81482)]
    [New Thread 0xfffff07af120 (LWP 81483)]
    Running 4 Models - ['cl-tfl-mobilenet_v1_1.0_224', 'ss-tfl-deeplabv3_mnv2_ade20k_float', 'od-tfl-ssd_mobilenet_v2_300_float', 'od-tfl-ssdlite_mobiledet_dsp_320x320_coco']
    Running_Model : cl-tfl-mobilenet_v1_1.0_224
    Number of subgraphs:1 , 34 nodes delegated out of 34 nodes
    Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
    0x0000000500000004 in ?? ()
    (gdb) thread apply all bt
    Thread 4 (Thread 0xfffff07af120 (LWP 81483) "python3"):
    [backtrace truncated]

    Run `pip3 freeze | grep -i "tflite\|onnx\|tidl"` and share the package versions

    root@am62dl:/opt/edgeai-tidl-tools/examples/osrt_python/tfl# pip3 freeze | grep -i "tflite\|onnx\|tidl"
    onnxruntime-tidl @ file:///home/root/arago_j7_pywhl/onnxruntime_tidl-1.14.0%2B10000005-cp310-cp310-linux_aarch64.whl
    tflite-runtime @ file:///home/root/arago_j7_pywhl/tflite_runtime-2.12.0-cp310-cp310-linux_aarch64.whl

    On target, run /opt/vx_app_arm_remote_log.out in the background before starting your script

    I am not sure if I have done this correctly, but here is the output after running the scripts once or twice.

    [C7x_1 ] 2322568.032723 s: UDMA: Init ... Done !!!
    [C7x_1 ] 2322568.032735 s: MEM: Init ... !!!
    [C7x_1 ] 2322568.032747 s: MEM: Created heap (DDR_LOCAL_MEM, id=0, flags=0x00000004) @ b2000000 of size 117440512 bytes !!!
    [C7x_1 ] 2322568.032776 s: MEM: Init ... Done !!!
    [C7x_1 ] 2322568.032788 s: IPC: Init ... !!!
    [C7x_1 ] 2322568.032800 s: IPC: 3 CPUs participating in IPC !!!
    [C7x_1 ] 2322568.033017 s: IPC: Waiting for HLOS to be ready ... !!!
    [C7x_1 ] 2322568.054528 s: IPC: HLOS is ready !!!
    [C7x_1 ] 2322568.054614 s: IPC: Init ... Done !!!
    [C7x_1 ] 2322568.054629 s: APP: Syncing with 2 CPUs ..
    [output truncated]

    pass debug_level=2 to the runtime when creating the model. It should be sufficient to set this in examples/osrt_python/common_utils.py

    I've compiled the models in our container again and set the logging level. This is the captured output. 

    logging.log

    Another question about the setup_target_device.sh script: I don't quite understand the instructions for the TISDK_IMAGE environment variable. How can I tell whether I need to set adas or edgeai here? What is the difference between EVM boards and SK boards?

    export TISDK_IMAGE=*adas or edgeai* // [adas for evm boards, edgeai for sk boards]

    Also, do I need to update the C7x firmware as well? I've used TISDK_IMAGE=edgeai and have not updated the C7x firmware so far.

    export UPDATE_FIRMWARE_AND_LIB=1

    Really appreciate the help. Is there any other information you need? Do you know if the TIDL installation on our target device is broken or has the wrong version? What are the next steps?

    Best Regards

  • Hello Reese, 

    I have run the setup_target_device.sh script again with the below environment variable to update the C7x firmware. 

    export UPDATE_FIRMWARE_AND_LIB=1


    The osrt_python/tfl example no longer gives a segmentation fault (which is great!), but it gets stuck when running inference on the model. This completely freezes the shell.

    root@am62dl:/opt/edgeai-tidl-tools/examples/osrt_python/tfl# python3 tflrt_delegate.py
    Running 4 Models - ['cl-tfl-mobilenet_v1_1.0_224', 'ss-tfl-deeplabv3_mnv2_ade20k_float', 'od-tfl-ssd_mobilenet_v2_300_float', 'od-tfl-ssdlite_mobiledet_dsp_320x320_coco']
    Running_Model : cl-tfl-mobilenet_v1_1.0_224
    ****** In DelegatePrepare ******
    Number of subgraphs:1 , 34 nodes delegated out of 34 nodes
    ****** In tidlDelegate::Init ******
    ************ in TIDL_subgraphRtCreate ************
    APP: Init ... !!!
    MEM: Init ... !!!
    MEM: Initialized DMA HEAP (fd=6) !!!
    MEM: Init ... Done !!!
    IPC: Init ... !!!
    IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
    [output truncated]
     

    I captured the backtraces using gdb again:

    Type "apropos word" to search for commands related to "word"...
    Reading symbols from python3...
    (No debugging symbols found in python3)
    (gdb) run tflrt_delegate.py
    Starting program: /usr/bin/python3 tflrt_delegate.py
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/libthread_db.so.1".
    warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
    [New Thread 0xfffff57cf120 (LWP 85454)]
    [New Thread 0xfffff2fbf120 (LWP 85455)]
    [New Thread 0xfffff07af120 (LWP 85456)]
    Running 4 Models - ['cl-tfl-mobilenet_v1_1.0_224', 'ss-tfl-deeplabv3_mnv2_ade20k_float', 'od-tfl-ssd_mobilenet_v2_300_float', 'od-tfl-ssdlite_mobiledet_dsp_320x320_coco']
    Running_Model : cl-tfl-mobilenet_v1_1.0_224
    ****** In DelegatePrepare ******
    Number of subgraphs:1 , 34 nodes delegated out of 34 nodes
    [output truncated]

    Any ideas what is going on here?

    Best Regards

  • Hi Stefan,

    Thanks for all the information here -- much appreciated and very helpful. I see the issue.

    The EDGEAI_SDK_VERSION is set to 09_00_00. Since I've tried to update the target device to 10_00_07_00, I guess this is wrong, no?

    Unfortunately yes, this is probably an incompatible combination. Please see the version_compatibility doc. We started this form of backwards compatibility at the 10.0 SDK and maintained compatibility (with the steps you found) for the 9.2 SDK. This does not apply to the 9.0 SDK.

    So this is a version compatibility issue: you are applying 10.0.0.7 firmware that is compatible with the 9.2 SDK to an actual 9.0 SDK installation.

    Are you able to move SDKs to either 9.2 or 10.0? Worth noting that the 10.1 SDK will release within the next couple of weeks. Otherwise, you would need to stick with edgeai-tidl-tools from 09_00_XX_YY

    BR,
    Reese

  • Hey Reese,

    I'm a colleague of Stefan's; we work on the same devboard (so all the information Stefan has given also holds for this post).

    Otherwise, you would need to stick with edgeai-tidl-tools from 09_00_XX_YY

    I tried your suggestion and changed the TIDL tools version in our devcontainer to 09_00_00_06:

    tidl-model-compilation/edgeai-tidl-tools$ git st
    HEAD detached at 09_00_00_06

    I think we also have some issues with updating the SDK on our devboard; Stefan tried to update to 10_00_07_00, but I think this was not successful (see the post above for what Stefan tried).

    However, when I try the example compilation, the Python script hangs (see the error message after Ctrl-C at the end).

    root@c9f23fa83205:/opt/edgeai-tidl-tools/examples/osrt_python/tfl# python tflrt_delegate.py -c
    Running 4 Models - ['cl-tfl-mobilenet_v1_1.0_224', 'ss-tfl-deeplabv3_mnv2_ade20k_float', 'od-tfl-ssd_mobilenet_v2_300_float', 'od-tfl-ssdlite_mobiledet_dsp_320x320_coco']
    Running_Model : cl-tfl-mobilenet_v1_1.0_224
    Running_Model : ss-tfl-deeplabv3_mnv2_ade20k_float
    Running_Model :
    Running_Model : od-tfl-ssdlite_mobiledet_dsp_320x320_coco
    od-tfl-ssd_mobilenet_v2_300_float
    Number of OD backbone nodes = 89
    Size of odBackboneNodeIds = 89
    TIDL Meta PipeLine (Proto) File : ../../../models/public/ssdlite_mobiledet_dsp_320x320_coco_20200519.prototxt
    Number of OD backbone nodes = 112
    Size of odBackboneNodeIds = 112
    Preliminary number of subgraphs:1 , 81 nodes delegated out of 81 nodes
    Preliminary number of subgraphs:1 , 34 nodes delegated out of 34 nodes
    [output truncated]

    This is what is being created:

    root@c9f23fa83205:/opt/edgeai-tidl-tools/models/public# l
    total 117M
    8.7M -rw-r--r-- 1 root root 8.7M Dec 20 10:24 deeplabv3_mnv2_ade20k_float.tflite
    17M -rw-r--r-- 1 root root 17M Dec 20 10:24 mobilenet_v1_1.0_224.tflite
    28M -rw-r--r-- 1 root root 28M Dec 20 10:24 ssdlite_mobiledet_dsp_320x320_coco_20200519.tflite
    4.0K -rw-r--r-- 1 root root 2.9K Dec 20 10:24 ssdlite_mobiledet_dsp_320x320_coco_20200519.prototxt
    65M -rw-r--r-- 1 root root 65M Dec 20 10:24 ssd_mobilenet_v2_300_float.tflite
    (3.10.16) root@c9f23fa83205:/opt/edgeai-tidl-tools/model-artifacts/cl-tfl-mobilenet_v1_1.0_224/tempDir# l
    total 20M
    12K -rw-r--r-- 1 root root 8.8K Dec 20 11:43 86_tidl_net.bin_netLog.txt
    19M -rw-r--r-- 1 root root 19M Dec 20 11:43 86_tidl_net.bin
    40K -rw-r--r-- 1 root root 37K Dec 20 11:43 86_tidl_io_1.bin
    4.0K -rw-r--r-- 1 root root 1.8K Dec 20 11:43 86_tidl_net.bin.layer_info.txt
    236K -rw-r--r-- 1 root root 236K Dec 20 11:43 86_tidl_net.bin.svg
    [output truncated]

    It runs locally, but on the devboard it tells me that "allowedNode.txt" is missing.

    Any ideas what went wrong here? 

    (Note: not urgent, I'll be returning from Christmas holidays in mid-January.)

  • Hi Dominic,

    Okay, so you are compiling against the 9.0 SDK tools now, got it. This is the correct approach if it is not feasible to upgrade the SDK otherwise.

    However, when I try the example compilation, the Python script hangs (see the error message after Ctrl-C at the end).

    I see that you are running the default example here for the models. This will fork multiple processes and may hang while it waits on one to return. Perhaps one of those failed; it is difficult to tell from the logs.

    • I noted on my side that the od-tfl-ssd_mobilenet_v2_300_float model hit a segfault during compilation.
      • If one model doesn't complete and hangs, then the whole run will hang as the main process waits for its workers to complete (see the sketch below).
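
    To illustrate the failure mode, here is a simplified sketch of that fork-and-wait pattern (the names are illustrative rather than the script's actual structure, and it assumes Linux's default fork start method):

    import multiprocessing

    sem = multiprocessing.Semaphore(0)

    def run_model(name):
        print("compiling", name)  # placeholder for the actual per-model compilation
        sem.release()             # a child that segfaults dies before reaching this line

    if __name__ == "__main__":
        models = ["model_a", "model_b"]  # illustrative names
        for m in models:
            multiprocessing.Process(target=run_model, args=(m,)).start()
        for _ in models:
            sem.acquire()  # blocks forever if any child died without releasing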

    Are you interested in a specific model or just trying to test the tools?

    You can run a single model by adding '-m MODEL_CONFIG_NAME' to the command line args, where MODEL_CONFIG_NAME is a key from the examples/osrt_python/model_configs.py. One of these is "cl-tfl-mobilenet_v1_1.0_224". 

    The files in your tempDir look correct, but those are intermediate files (and some debugging info). The directory one level up from that has the important artifact files: there should be 2 binaries, a model file, and a few supporting files like that allowedNode.txt.

    I'd recommend increasing the debug_level parameter to 1. You can change this globally from the common_utils.py file or by adding 'debug_level': 1 to an additional "optional_options" dictionary within a model_configs.py dict entry. Most likely one model is failing and causing the whole script to hang.
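
    As a sketch of the per-model variant (assuming your model_configs.py follows the usual dict-of-dicts layout; everything except the 'optional_options' line mirrors what is already in the file):

    # in examples/osrt_python/model_configs.py, after the existing definitions:
    model_configs["cl-tfl-mobilenet_v1_1.0_224"]["optional_options"] = {
        "debug_level": 1,  # per-model override of the global default from common_utils.py
    }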

    BR,
    Reese

  • Hi Reese,

    I'm back from my holidays and tried out your suggestions -- unfortunately, none of them seems to have worked.

    Are you interested in a specific model or just trying to test the tools?

    Currently, I'm only interested in completing the compilation / deployment workflow for an arbitrary model. Next would be to deploy a custom model and do evaluations (accuracy, inference time) with it. 

    You can run a single model by adding '-m MODEL_CONFIG_NAME' to the command line args, where MODEL_CONFIG_NAME is a key from the examples/osrt_python/model_configs.py. One of these is "cl-tfl-mobilenet_v1_1.0_224". 

    I think the `-m` option is not available (yet?) in the script I'm calling (`root@cd64c3ba8cc1:/opt/edgeai-tidl-tools/examples/osrt_python/tfl# python tflrt_delegate.py -c`), but I just manually edited the `models` list in line 240.

    I also set `ncpus = 1` (line 41), otherwise I still get the thread-related error (os.cpu_count() == 24 on my system); both edits are sketched after the traceback below:

    ^CTraceback (most recent call last):
    File "/opt/edgeai-tidl-tools/examples/osrt_python/tfl/tflrt_delegate.py", line 275, in <module>
    nthreads = join_one(nthreads)
    File "/opt/edgeai-tidl-tools/examples/osrt_python/tfl/tflrt_delegate.py", line 257, in join_one
    sem.acquire()
    KeyboardInterrupt
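
    Roughly, the two edits look like this (line numbers are from the 09_00_00_06 tag and may differ elsewhere):

    # tflrt_delegate.py, around line 240: replaced the full four-model list
    models = ["cl-tfl-mobilenet_v1_1.0_224"]

    # tflrt_delegate.py, around line 41: was derived from the host CPU count
    ncpus = 1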

    I tried all of the following models with the same result:

    'cl-tfl-mobilenet_v1_1.0_224'
    'ss-tfl-deeplabv3_mnv2_ade20k_float'
    'od-tfl-ssd_mobilenet_v2_300_float'
    'od-tfl-ssdlite_mobiledet_dsp_320x320_coco'

    Exemplary output with debug_level = 1 and ncpus = 1:

    (3.10.16) root@cd64c3ba8cc1:/opt/edgeai-tidl-tools/examples/osrt_python/tfl# python tflrt_delegate.py -c
    Running 1 Models - ['od-tfl-ssdlite_mobiledet_dsp_320x320_coco']
    Running_Model : od-tfl-ssdlite_mobiledet_dsp_320x320_coco
    tidl_tools_path = /opt/edgeai-tidl-tools/tidl_tools
    artifacts_folder = ../../../model-artifacts//od-tfl-ssdlite_mobiledet_dsp_320x320_coco/
    tidl_tensor_bits = 8
    debug_level = 1
    num_tidl_subgraphs = 16
    tidl_denylist =
    tidl_denylist_layer_name =
    tidl_denylist_layer_type =
    tidl_allowlist_layer_name =
    model_type =
    tidl_calibration_accuracy_level = 7
    tidl_calibration_options:num_frames_calibration = 2
    tidl_calibration_options:bias_calibration_iterations = 5
    mixed_precision_factor = -1.000000
    model_group_id = 0
    power_of_2_quantization = 2
    [output truncated]

    Not sure if this is of any interest, but we base our dev-container on 

    nvidia/cuda:12.3-devel-ubuntu22.04
    (3.10.16) root@cd64c3ba8cc1:/opt/edgeai-tidl-tools/examples/osrt_python/tfl# nvidia-smi
    Tue Jan 14 14:22:21 2025
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 555.58.02 Driver Version: 556.12 CUDA Version: 12.5 |
    |-----------------------------------------+------------------------+----------------------+
    | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
    | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
    | | | MIG M. |
    |=========================================+========================+======================|
    | 0 NVIDIA RTX A2000 12GB On | 00000000:01:00.0 On | Off |
    | 30% 26C P8 5W / 70W | 255MiB / 12282MiB | 0% Default |
    | | | N/A |
    +-----------------------------------------+------------------------+----------------------+
    +-----------------------------------------------------------------------------------------+
    | Processes: |
    | GPU GI CI PID Type Process name GPU Memory |
    | ID ID Usage |
    |=========================================================================================|
    | No running processes found |
    +-----------------------------------------------------------------------------------------+
    Thanks in advance,
    Dominic

  • Hi Dominic,

    Reese is out this week and won't be able to respond until next week.

    Regards,

    Jianzhong

  • Hi Dominic,

    Thanks for the patience while I was out.

    I realize that I recommended a CLI option -m that wasn't in this version of the tools. My apologies -- I had forgotten that in this release the set of models must be defined within the script itself.

    I see that you are getting the (particularly opaque) "bus error" as the script fails out. This is the case for all the models you try, correct? Generally there is an easy solution when TIDL fails on a bus error. This occurs when some shared memory under /dev/shm fails to clear, and TIDL is then unable to allocate more, resulting in the error. Try the line below to clear the /dev/shm files that TIDL would have created:

    rm /dev/shm/vashm*

    I noted that my compilation ran into an issue in the last stage for the models 'od-tfl-ssdlite_mobiledet_dsp_320x320_coco' and 'od-tfl-ssd_mobilenet_v2_300_float' (later than your logs), but the other two complete without issue and provide reasonable output. To be completely frank, 9.0 SDK was the least stable of releases between 8.6 and current (10.1) -- I recommend upgrading if possible.

    I think your container is fine. Ubuntu 22.04 is correct. SDK 9.0 did not have GPU-based tools for speeding up compilation, so GPU info / status should not play a role here. 

    BR,
    Reese

  • Hi Reese,

    I tried your suggestion -- unfortunately I get the same bus error one line of output later.

    ************ in TIDL_subgraphRtCreate ************
    The soft limit is 2048
    The hard limit is 2048
    MEM: Init ... !!!
    MEM: Init ... Done !!!
    0.0s: VX_ZONE_INIT:Enabled
    0.4s: VX_ZONE_ERROR:Enabled
    0.5s: VX_ZONE_WARNING:Enabled
    0.1520s: VX_ZONE_INIT:[tivxInit:185] Initialization Done !!!
    ************ TIDL_subgraphRtCreate done ************
    tidl_tfLiteRtImport_delegate.cpp Invoke 478
    ******* In TIDL_subgraphRtInvoke ********
    Bus error (core dumped)

    Container resources should be good (I started the container fresh; it is the only running container):

    CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
    f21fbd32371b loving_mclean 6.04% 1.688GiB / 31.19GiB 5.41% 86MB / 1.4MB 0B / 0B 106

    To be completely frank, 9.0 SDK was the least stable of releases between 8.6 and current (10.1) -- I recommend upgrading if possible. 

    I think this is what we'll do. Stefan has meanwhile managed to deploy a custom-trained YOLOX model on version 10.

    Related: I noticed that we'll likely meet on Feb 10, as I'll participate in the SICK workshop that you and Manuel Philippin are signed up for --> would it make sense for us to compile a list of questions / topics for you beforehand?

    best regards,

    Dominic

  • Hi Dominic, 

    Hmm, still experiencing that bus error. I'm surprised clearing the shared memory didn't resolve this, especially if you are compiling one small model as an initial test.

    I do think you'll have a much better experience in 10.0 or newer.

    • I will mention that a model similar to "od-tfl-ssdlite_mobiledet_dsp_320x320_coco" had an issue on 10.0/10.1. I confirmed the fix for that last week, and a bugfix release with it will go live in the next week or so (10.1.0.4 is the equivalent version string). I mention this as a quick warning in case you see an error with "ValueError: basic_string::_M_create" prominently printed.

    I think this is what we'll do. Stefan has meanwhile managed to deploy a custom-trained YOLOX model on version 10.

    Awesome, that's great to hear. 

    would it make sense for us to compile a list of questions / topics for you beforehand?

    Yes! That would be very helpful -- we can then get the content and discussion geared to be as practical as possible. Please send me / Manuel a list of questions so we can review and prepare. Looking forward to meeting you!

    BR,
    Reese