AM67A: Monodepth Estimation

Part Number: AM67A
Other Parts Discussed in Thread: MIDAS

Hi,
I am trying to run the Structure From Motion (SfM) Application and the Edge AI Monodepth Estimation Demo on my board, but I am encountering several issues.

  1. The Structure From Motion Application is missing from my software build, and I do not know how to correctly add or install it on the board.

  2. I want to run the Edge AI Monodepth Estimation Demo using a custom model.
    My board is running TIDL 11_00_06_00, and I generated the model artifacts for de-7310_midas-small_onnx using OSRT.

  3. When I try to launch the GStreamer pipeline for the Monodepth Estimation demo, the pipeline fails to start. This may be caused by incompatibility, missing components, or incorrect integration of the model in TIDL.

  4. My goal is to test a custom Monodepth Estimation model and integrate it into ROS, so I need clarification on:

    • how to add or build the Structure From Motion Application if it is not included in my current build;

    • how to correctly run the Edge AI Monodepth Estimation demo with a custom model on TIDL 11_00_06_00;

    • what steps are required to ensure that the GStreamer pipeline successfully initializes and runs the model.

Best regards,
Andow

  • Hi Andow,

    Good questions here. In general, mono-cam depth estimation has limited support on AM67A in terms of what we provide out-of-box in the SDK. Let me respond to your queries individually.

    The Structure From Motion Application is missing from my software build, and I do not know how to correctly add or install it on the board.

    This is part of the 'vision_apps' [1] package that is only supported on TDA4x processors. This software is part of the PSDK RTOS and also part of the AM67A firmware-builder, but I'll reiterate that industrial devices like AM67A and AM62A do not include vision_apps support from the E2E team.

    I want to run the Edge AI Monodepth Estimation Demo using a custom model.
    My board is running TIDL 11_00_06_00, and I generated the model artifacts for de-7310_midas-small_onnx using OSRT.

    Okay, sounds like that at least compiled. Note that we support postprocessing for image classification, object detection, keypoint detection (for human pose), and semantic/pixel-level segmentation. Depth estimation does not have an associated set of postprocessing functions.

    When I try to launch the GStreamer pipeline for the Monodepth Estimation demo, the pipeline fails to start. This may be caused by incompatibility, missing components, or incorrect integration of the model in TIDL.

    Yep, that's expected. See the point above. Monodepth estimation in our tooling does not include a post-processing implementation; this is for the user (you) to implement. You can use the git repo for mono depth estimation [2] as a baseline, though I only wrote this for the MiDaS model in Python. Other models may have some differences w.r.t. post-processing (e.g. the output is depth vs. disparity).
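
    As a rough illustration of what that post-processing amounts to for a MiDaS-style disparity output (a sketch in Python, not the demo's exact code; the tensor shape and the colormap choice are my assumptions):

        import cv2
        import numpy as np

        def postprocess_disparity(tensor, out_w, out_h):
            # Sketch: convert a raw MiDaS-style disparity tensor into a color image.
            # Assumes the tensor squeezes to a 2D (H, W) map; other models may output
            # depth instead of disparity, which changes the interpretation of the values.
            disp = np.squeeze(np.asarray(tensor, dtype=np.float32))
            # Min-max normalize to 0..255 so the full value range is visible in the frame
            d_min, d_max = disp.min(), disp.max()
            disp_u8 = ((disp - d_min) / (d_max - d_min + 1e-6) * 255.0).astype(np.uint8)
            # Colorize and resize to the display resolution
            color = cv2.applyColorMap(disp_u8, cv2.COLORMAP_JET)
            return cv2.resize(color, (out_w, out_h))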

    My goal is to test a custom Monodepth Estimation model and integrate it into ROS, so I need clarification on:

    • how to add or build the Structure From Motion Application if it is not included in my current build;

    • how to correctly run the Edge AI Monodepth Estimation demo with a custom model on TIDL 11_00_06_00;

    • what steps are required to ensure that the GStreamer pipeline successfully initializes and runs the model.

    For the first, I would recommend using the TDA4x SDK to build vision_apps for the Structure From Motion application. Note that this is a stereo vision application rather than mono.

    To run that demo (which I developed a few years ago), you would minimally need to change the model path in the run_demo.sh script. If it is also MiDaS, as you say, then I'm hopeful that is the extent of the changes. However, incorporating this into edgeai-gst-apps [3] would require you to add a postprocessing function for depth estimation (in Python, C++, or in the tidlpostproc GStreamer plugin, which is also C++).

    On the last point, this depends on whether you need postprocessing. If you just want to run the model to check that it initializes and runs, but let the results disappear into the void or /dev/null, you can make a pipeline with optiflow [4]. The resulting GStreamer string will include some postprocessing and visualization (to an HDMI monitor, to a file, to the network, etc.). You can chop off the portion after the tidlinferer plugin so that the input pipe to the model and the model itself run, but nothing depth-estimation-specific runs thereafter.
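
    If it helps, a minimal Python sketch of that truncation idea: take the string optiflow prints, keep everything up to and including the tidlinferer element, and terminate it with a fakesink (the GStreamer stand-in for /dev/null). The PIPELINE placeholder and the cut logic are mine, not part of optiflow, and assume a single tidlinferer instance in the string:

        import gi
        gi.require_version("Gst", "1.0")
        from gi.repository import Gst

        # Paste the pipeline string printed by optiflow here
        PIPELINE = "<optiflow-generated pipeline string>"

        # Keep the input side and the inference element, drop the task-specific tail
        idx = PIPELINE.find("tidlinferer")
        cut = PIPELINE.find("!", idx)
        truncated = PIPELINE[:cut] + "! fakesink sync=false"

        Gst.init(None)
        pipe = Gst.parse_launch(truncated)
        pipe.set_state(Gst.State.PLAYING)
        # Run for a few seconds; an ERROR message here means the model/pipeline did not start
        msg = pipe.get_bus().timed_pop_filtered(
            5 * Gst.SECOND, Gst.MessageType.ERROR | Gst.MessageType.EOS)
        if msg and msg.type == Gst.MessageType.ERROR:
            print("pipeline error:", msg.parse_error())
        else:
            print("model initialized and pipeline ran without errors")
        pipe.set_state(Gst.State.NULL)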

    I can give more information in whichever direction you need -- hope the verbose response is helpful.

    P.S. For robotics, we're shifting our positioning such that TDA4x devices, rather than AM6xA, are the recommendation, especially if there is any need for functional safety (ASIL/SIL). The AM67A equivalent is TDA4AEN -- they are similar devices but have different software.

    [1] https://git.ti.com/cgit/processor-sdk/vision_apps

    [2] https://github.com/TexasInstruments-Sandbox/edgeai-demo-monodepth-estimation

    [3] https://github.com/TexasInstruments/edgeai-gst-apps

    [4] https://github.com/TexasInstruments/edgeai-gst-apps/tree/main/optiflow 

  • Thank you for the detailed explanation — it was very helpful for me.
    My main task right now is to obtain depth estimation using only a single camera, so it seems that the SFM app is not suitable for my case.

    Therefore, my current goal is to run edgeai-demo-monodepth-estimation.

    Regarding post-processing: I've already figured that part out. I created a small OSRT handler, but I would still like to be able to test the pipeline with a video stream or a live camera.

    At the moment, I have a Webcam C170, which has slightly different parameters compared to the C270, so I updated the resolution for usb-720p in gst_configs.py.

    I also compiled the MiDaS model for the TIDL version that is available on my board.

    After updating the camera and model in run_demo.sh, I run the demo and get the following issue:

    root@j722s-evm:/opt/app/edgeai-demo-monodepth-estimation# ./run_demo.sh 
    libtidl_onnxrt_EP loaded 0x182d3720 
    Final number of subgraphs created are : 1, - Offloaded Nodes - 84, Total Nodes - 84 
    APP: Init ... !!!
       107.149544 s: MEM: Init ... !!!
       107.149634 s: MEM: Initialized DMA HEAP (fd=5) !!!
       107.149851 s: MEM: Init ... Done !!!
       107.149887 s: IPC: Init ... !!!
       107.205711 s: IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
       107.213742 s: GTC Frequency = 200 MHz
    APP: Init ... Done !!!
       107.213934 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_ERROR
       107.213957 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_WARNING
       107.213981 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_INFO
       107.215012 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-0 
       107.215449 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-1 
       107.215991 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-2 
       107.216467 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-3 
       107.216517 s:  VX_ZONE_INFO: [tivxInitLocal:202] Initialization Done !!!
       107.216609 s:  VX_ZONE_INFO: Globally Disabled VX_ZONE_INFO
    Calculating output tensor dimensions and offsets...
    model datatype : float32
    caps: video/x-raw, width=1920, height=1080, format=RGB, framerate=0/1
    Parsing GST pipeline: 
    input: v4l2src device=/dev/video-usb-cam0  ! image/jpeg,width=640,height=480 ! jpegdec ! tiovxdlcolorconvert  !  video/x-raw, format=NV12 ! queue leaky=2 max-size-buffers=2  ! tiovxmultiscaler name=split_resize     split_resize.   ! video/x-raw, width=246, height=246, format=NV12   ! tiovxdlpreproc out-pool-size=4 data-type=float32   channel-order=nchw tensor-format=rgb  mean-0=0.000000 mean-1=0.000000 mean-2=0.000000 scale-0=0.003922 scale-1=0.003922 scale-2=0.003922  ! application/x-tensor-tiovx  ! appsink name=tensor_in max-buffers=2 drop=True    split_resize. ! queue leaky=2 max-size-buffers=2  ! video/x-raw, width=1536, height=864, format=NV12 ! tiovxdlcolorconvert out-pool-size=4 ! video/x-raw, format=RGB ! appsink name=image_in max-buffers=2 drop=True
    
    output:  appsrc format=GST_FORMAT_TIME is-live=true  name=out ! video/x-raw,  format=RGB, width=1920, height=1080  ! queue ! tiovxdlcolorconvert out-pool-size=4  ! video/x-raw, format=NV12  !  tiperfoverlay main-title=""  ! kmssink sync=false driver-name=tidss  plane-id=31 force-modesetting=True
    
    Starting with in_gst: 
    v4l2src device=/dev/video-usb-cam0  ! image/jpeg,width=640,height=480 ! jpegdec ! tiovxdlcolorconvert  !  video/x-raw, format=NV12 ! queue leaky=2 max-size-buffers=2  ! tiovxmultiscaler name=split_resize     split_resize.   ! video/x-raw, width=246, height=246, format=NV12   ! tiovxdlpreproc out-pool-size=4 data-type=float32   channel-order=nchw tensor-format=rgb  mean-0=0.000000 mean-1=0.000000 mean-2=0.000000 scale-0=0.003922 scale-1=0.003922 scale-2=0.003922  ! application/x-tensor-tiovx  ! appsink name=tensor_in max-buffers=2 drop=True    split_resize. ! queue leaky=2 max-size-buffers=2  ! video/x-raw, width=1536, height=864, format=NV12 ! tiovxdlcolorconvert out-pool-size=4 ! video/x-raw, format=RGB ! appsink name=image_in max-buffers=2 drop=True
    
    
    out gst:  appsrc format=GST_FORMAT_TIME is-live=true  name=out ! video/x-raw,  format=RGB, width=1920, height=1080  ! queue ! tiovxdlcolorconvert out-pool-size=4  ! video/x-raw, format=NV12  !  tiperfoverlay main-title=""  ! kmssink sync=false driver-name=tidss  plane-id=31 force-modesetting=True
    Starting GST pipeline
    pull buffers
    pull buffers
    pull buffers
    pull buffers
    pull buffers
    pull buffers
    pull buffers
    pull buffers
    ...

    The camera LED does not turn on. After doing some debugging of the GStreamer commands, I found that the pipeline does not start and crashes right after:

    ! tiovxmultiscaler name=split_resize     split_resize.

    I would appreciate any advice on what I should check next.

  • Hi Andow,

    Yes, I see the problem. Your camera is 720p, but some of the downscaling branches are requesting larger resolutions:

    split_resize. ! queue leaky=2 max-size-buffers=2  ! video/x-raw, width=1536, height=864, format=NV12

    The tiovxmultiscaler plugin (and corresponding "MSC" hardware accelerator block) will only downscale images. No upscaling is supported in this hardware. 

    That portion of the pipeline-generation would need to be edited to handle this smaller input resolution. There may be a similar issue for the output visualization; I see the data from the appsrc plugin is expected to be 1080p:

    appsrc format=GST_FORMAT_TIME is-live=true  name=out ! video/x-raw,  format=RGB, width=1920, height=1080

    This demo-ware application does not handle as many permutations of input/output sizes as edgeai-gst-apps, but its pipeline-generation logic is also not quite so complex.
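
    For example, the kind of guard the pipeline-generation needs is along these lines (a Python sketch; the function name and the rounding are mine, not the demo's code):

        def clamp_scaler_output(cam_w, cam_h, req_w, req_h):
            # Sketch: pick a tiovxmultiscaler output size that never exceeds the camera
            # resolution, since the MSC hardware only downscales. Keeps the requested
            # aspect ratio and rounds to even values for NV12.
            if req_w <= cam_w and req_h <= cam_h:
                return req_w, req_h
            scale = min(cam_w / req_w, cam_h / req_h)
            w = round(req_w * scale) // 2 * 2
            h = round(req_h * scale) // 2 * 2
            return w, h

        # e.g. a 1536x864 display branch with a 1280x720 camera becomes 1280x720
        print(clamp_scaler_output(1280, 720, 1536, 864))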

    BR,
    Reese

  • Thanks for the reply. I fixed the pipeline settings and everything seems to work, but there is no video on the display...

    root@j722s-evm:/opt/app/edgeai-demo-monodepth-estimation# ./run_demo.sh 
    libtidl_onnxrt_EP loaded 0x25954800 
    Final number of subgraphs created are : 1, - Offloaded Nodes - 84, Total Nodes - 84 
    APP: Init ... !!!
     25671.558889 s: MEM: Init ... !!!
     25671.558968 s: MEM: Initialized DMA HEAP (fd=5) !!!
     25671.559181 s: MEM: Init ... Done !!!
     25671.559215 s: IPC: Init ... !!!
     25671.618936 s: IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
     25671.630651 s: GTC Frequency = 200 MHz
    APP: Init ... Done !!!
     25671.630877 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_ERROR
     25671.630894 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_WARNING
     25671.630903 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_INFO
     25671.631824 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-0 
     25671.632152 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-1 
     25671.632387 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-2 
     25671.632627 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-3 
     25671.632657 s:  VX_ZONE_INFO: [tivxInitLocal:202] Initialization Done !!!
     25671.632720 s:  VX_ZONE_INFO: Globally Disabled VX_ZONE_INFO
    Calculating output tensor dimensions and offsets...
    model datatype : float32
    caps: video/x-raw, width=1280, height=720, format=RGB, framerate=0/1
    Parsing GST pipeline: 
    input: v4l2src device=/dev/video-usb-cam0  ! image/jpeg,width=1024,height=768 ! jpegdec ! tiovxdlcolorconvert  !  video/x-raw, format=NV12 ! queue leaky=2 max-size-buffers=2  ! tiovxmultiscaler name=split_resize     split_resize.   ! video/x-raw, width=328, height=246, format=NV12  ! tiovxmultiscaler target=1 ! queue leaky=2 max-size-buffers=2 ! video/x-raw, width=246, height=246  ! tiovxdlpreproc out-pool-size=4 data-type=float32   channel-order=nchw tensor-format=rgb  mean-0=0.000000 mean-1=0.000000 mean-2=0.000000 scale-0=0.003922 scale-1=0.003922 scale-2=0.003922  ! application/x-tensor-tiovx  ! appsink name=tensor_in max-buffers=2 drop=True    split_resize. ! queue leaky=2 max-size-buffers=2  ! video/x-raw, width=1024, height=576, format=NV12 ! tiovxdlcolorconvert out-pool-size=4 ! video/x-raw, format=RGB ! appsink name=image_in max-buffers=2 drop=True
    
    output:  appsrc format=GST_FORMAT_TIME is-live=true  name=out ! video/x-raw,  format=RGB, width=1280, height=720  ! queue ! tiovxdlcolorconvert out-pool-size=4  ! video/x-raw, format=NV12  !  tiperfoverlay main-title=""  ! kmssink sync=false driver-name=tidss  plane-id=31 force-modesetting=True
    
    Starting with in_gst: 
    v4l2src device=/dev/video-usb-cam0  ! image/jpeg,width=1024,height=768 ! jpegdec ! tiovxdlcolorconvert  !  video/x-raw, format=NV12 ! queue leaky=2 max-size-buffers=2  ! tiovxmultiscaler name=split_resize     split_resize.   ! video/x-raw, width=328, height=246, format=NV12  ! tiovxmultiscaler target=1 ! queue leaky=2 max-size-buffers=2 ! video/x-raw, width=246, height=246  ! tiovxdlpreproc out-pool-size=4 data-type=float32   channel-order=nchw tensor-format=rgb  mean-0=0.000000 mean-1=0.000000 mean-2=0.000000 scale-0=0.003922 scale-1=0.003922 scale-2=0.003922  ! application/x-tensor-tiovx  ! appsink name=tensor_in max-buffers=2 drop=True    split_resize. ! queue leaky=2 max-size-buffers=2  ! video/x-raw, width=1024, height=576, format=NV12 ! tiovxdlcolorconvert out-pool-size=4 ! video/x-raw, format=RGB ! appsink name=image_in max-buffers=2 drop=True
    
    
    out gst:  appsrc format=GST_FORMAT_TIME is-live=true  name=out ! video/x-raw,  format=RGB, width=1280, height=720  ! queue ! tiovxdlcolorconvert out-pool-size=4  ! video/x-raw, format=NV12  !  tiperfoverlay main-title=""  ! kmssink sync=false driver-name=tidss  plane-id=31 force-modesetting=True
    Starting GST pipeline
    pull buffers
    got GST buffers in app code
    0.06637763977050781
    pull buffers
    got GST buffers in app code
    0.03599214553833008
    pull buffers
    got GST buffers in app code
    0.03386831283569336
    pull buffers
    got GST buffers in app code
    0.03460121154785156
    pull buffers
    got GST buffers in app code
    0.04039931297302246
    pull buffers
    got GST buffers in app code
    0.04718732833862305
    pull buffers
    got GST buffers in app code
    0.05154824256896973
    pull buffers
    got GST buffers in app code
    0.048491477966308594
    pull buffers
    got GST buffers in app code
    0.05122661590576172
    pull buffers
    got GST buffers in app code
    0.04781746864318848
    pull buffers
    got GST buffers in app code
    0.051619529724121094
    pull buffers
    got GST buffers in app code
    0.04768490791320801
    pull buffers
    got GST buffers in app code
    0.05287361145019531
    pull buffers
    got GST buffers in app code
    0.04698657989501953
    pull buffers
    got GST buffers in app code
    0.05157828330993652
    pull buffers
    got GST buffers in app code
    0.04773426055908203
    pull buffers
    got GST buffers in app code
    0.05212545394897461
    pull buffers
    got GST buffers in app code
    0.047638893127441406
    pull buffers
    got GST buffers in app code
    0.05172157287597656
    pull buffers
    got GST buffers in app code
    0.04822945594787598
    pull buffers
    got GST buffers in app code
    0.05141901969909668
    pull buffers
    got GST buffers in app code
    0.047869205474853516
    pull buffers
    got GST buffers in app code
    0.05159640312194824
    pull buffers
    got GST buffers in app code
    0.048233985900878906
    pull buffers
    got GST buffers in app code
    0.051508188247680664
    pull buffers
    got GST buffers in app code
    0.04786062240600586
    pull buffers
    ^CKB shortcut caught
    
    Ran 26 frames
    **** Runtime Stats ****
    ---- Pull input time (ms): avg 19 +- 5 (min to max: 3 to 26)
    ---- infer time (ms): avg 5
    ---- Output (draw, post-proc) time (ms): avg 20 +- 1
    ---- FPS: 20.73
    -----------------------
    
    True
    paused pipe; waiting gst thread to join
    exiting...
    

  • Okay, I fixed the display and it seems to work correctly, but the camera doesn't respond to objects. Everything is always blue.
    Is this a problem with the model compilation?
    And is there a way to check that the model is correct other than running it in the app?


  • Hello,

    You may want to dump the output tensor into a file or add print statements to look at the distribution of the data. If it is all the same output value, then that is of course an error. What was your approach for compiling the model? I would suggest returning to that script and looking at the output data distribution on static images.
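
    For instance, something along these lines will show the output distribution (a sketch based on the edgeai-tidl-tools OSRT flow; the paths, the artifacts folder, and the 256x256 input size are assumptions you should match to your own compile script):

        import numpy as np
        import onnxruntime as ort

        # Reuse the provider options from your own compile/inference script if they differ
        ep_options = {"artifacts_folder": "/opt/model_zoo/de-7310_midas-small_onnx/artifacts"}
        sess = ort.InferenceSession(
            "/opt/model_zoo/de-7310_midas-small_onnx/model/midas-small.onnx",
            providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
            provider_options=[ep_options, {}],
        )

        # Replace this random input with a real preprocessed image (same mean/scale as compilation)
        inp_name = sess.get_inputs()[0].name
        x = np.random.rand(1, 3, 256, 256).astype(np.float32)
        out = np.squeeze(sess.run(None, {inp_name: x})[0])

        print("shape", out.shape, "min", out.min(), "max", out.max(),
              "mean", out.mean(), "std", out.std())
        np.save("/tmp/depth_out.npy", out)  # dump for offline comparison, e.g. against a CPU-only run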

    Usually these types of models with disparity output need to normalize the output so that there is a better range of data. If you're using the demo code here (it looks like it), then that's handled at [3]. It may be worth printing out the distribution of data there as well.

    Another aspect is that some models are particularly sensitive to 8-bit quantization. Continuous-valued outputs like depth/disparity maps are especially impacted, but this also applies to things like bounding boxes and keypoint pixel locations. To address this, we typically use a mixture of 8-bit and 16-bit precision; usually 16-bit is only needed for the last couple of layers. For the MiDaS model, it looks like there are two layers we've set that option for [1][2].
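
    Concretely, in the OSRT model config that option looks roughly like the snippet below; per the README in [2] it takes a comma-separated string of layer names. The names "511, 983" are placeholders here -- take the real ones from the benchmark config in [1] or from inspecting your ONNX graph, and note that the artifacts need to be recompiled for the change to take effect:

        # Sketch of the mixed-precision option (recompile the artifacts after changing it).
        # "511, 983" is a placeholder; use the layer names from your own model graph.
        runtime_options = {
            "advanced_options:output_feature_16bit_names_list": "511, 983",
        }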

    [1] https://github.com/TexasInstruments/edgeai-tensorlab/blob/3de61dfa503c408346c3bcd029f49a25e42a8a73/edgeai-benchmark/configs/depth_estimation.py#L71 -- see the output_feature_16bit_names_list which has a comma-separated list of layer names

    [2] https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_python/README.md#advanced-options-for-accuracy-enhancement -- reference for the mixed-precision settings

    [3] https://github.com/TexasInstruments-Sandbox/edgeai-demo-monodepth-estimation/blob/b5dd792e45d4ead4a0c2f4b089477906519ad085/display.py#L165 -- postprocessing for the depth map

    BR,
    Reese



  • Hi, it turned out that there was a mistake in the OSRT config for the model. Here is the config that gives this result:

        "de-7310_midas-small_onnx": create_model_config(
            task_type="depth_estimation",
            source=dict(
                model_url="http://software-dl.ti.com/jacinto7/esd/modelzoo/11_00_00/models/vision/depth_estimation/nyudepthv2/midas/midas-small.onnx",
                infer_shape=True,
            ),
            preprocess=dict(
                resize=[256, 256], 
                crop=[256, 256], 
                data_layout="NCHW",
                pad_color=0,
                resize_with_pad=False,
                reverse_channels=False, 
                add_flip_image=False,
            ),
            session=dict(
                session_name="onnxrt",
                model_path=os.path.join(models_base_path, "midas-small.onnx"),
                target_device="AM67A", 
                input_mean=[123.675, 116.28, 103.53],
                input_scale=[0.017125, 0.017507, 0.017429], 
                input_optimization=False, 
            ),
            postprocess=dict(
                with_argmax=False, 
            ),
            extra_info=dict(
                num_images=numImages, 
                num_classes=1,
            ),
            runtime_options = {
                'advanced_options:output_feature_16bit_names_list' : [511, 983],
            },
        )

    Can you review it? Maybe I missed something...
    I also noticed that the estimated distance sometimes changes chaotically on dark objects. Is there any way to increase the stability of the estimate?