SK-TDA4VM: Optiflow - Custom Model output buffer allocation problems leading to segmentation fault

Part Number: SK-TDA4VM

Hi,

I ported YOLOPX for SK-TDA4VM on SDK version 11.00.00.08 with TIDL Tools tag 11.01.05.00.

The model does run, but because the post-processing code is in Python, it is very slow.

I tried running the model with the Optiflow framework by adding code to /opt/edgeai-dl-inferer and following the steps mentioned in this E2E link.

Adding the code was itself successful: I can see my model running (or at least initializing), and my post-processing constructor gets called. However, I often get a segmentation fault, sometimes even before the constructor is called.

I am attaching files with the terminal output showing the segmentation fault, along with a backtrace from gdb.

In addition, I have one file with the output from a run that completed without any segmentation fault.

I need help with the following:

  1. How do I activate a logging level other than DL_INFER_LOG_ERROR, and how do I get debugging info? I changed the following in /opt/edgeai-gst-apps/configs/yolopx.yaml:
    1. I tried setting the debug mask to 7, but no debug prints or directories are being generated.
    2. Changing log_level also does not seem to make any difference.
  2. You can see in the attached files that the first iteration through the model (just above where the GStreamer pipelines are printed) uses different addresses for the different model outputs. After that, however, four of the outputs share the same address while the other two use different addresses.
    1. Does this mean the first four outputs simply overwrite one another, i.e. detection is overwritten by stride8, and so on?
    2. The address difference between segmentation and lane is much smaller than the size of those outputs, which would mean data is being overwritten there as well.
    3. I need to figure out where the buffer allocation is actually performed; it seems the outputs are somehow not each being allocated their own memory.
  3. Where can I see the code that calls the DL functions? Is there some other part of the SDK where I can see the actual creation of the inferers and post-processor instances when running with Optiflow?
    1. I found some code under /opt/edgeai-dl-inferer/tests that shows how everything is called, but app_dl_inferer_test under bin/Release does not work; it throws a memory allocation error. I even tried it with the semantic segmentation model provided in the edgeai-tidl-tools examples to make sure the error is not caused by my model. A file with that backtrace is also attached.

segfault.txt 

running_fine.txt 

app_dl_inferer_test.txt 

post_process_yolopx.cpp

post_process_yolopx.h

Due to size constraints I am unable to upload the model artifacts and related yaml files. Please let me know how I can share these with you and whether you need anything else.

Regards,

Charanjit Singh

Edit: Added the files; they are to be placed under /opt/edgeai-dl-inferer/post_process/include and src.

  • Hi, can you try to compress the files and share them here?

  • Hi Vaibhav,

    I tried compressing the model artifacts with zip, but the archive is still too large for the upload.

    In the upload below, I removed the optimized model and zipped the rest of the directory. You can download the optimized model from this link, provided by your colleague:

    e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/yolopx_5F00_optimized.onnx 

    yolopx_no_model.zip

    Regards,

    Charanjit