AM67A: Monodepth Estimation

Part Number: AM67A
Other Parts Discussed in Thread: MIDAS

Hi,
I am trying to run the Structure From Motion (SfM) Application and the Edge AI Monodepth Estimation Demo on my board, but I am encountering several issues.

  1. The Structure From Motion Application is missing from my software build, and I do not know how to correctly add or install it on the board.

  2. I want to run the Edge AI Monodepth Estimation Demo using a custom model.
    My board is running TIDL 11_00_06_00, and I generated the model artifacts for de-7310_midas-small_onnx using OSRT.

  3. When I try to launch the GStreamer pipeline for the Monodepth Estimation demo, the pipeline fails to start. This may be caused by incompatibility, missing components, or incorrect integration of the model in TIDL.

  4. My goal is to test a custom Monodepth Estimation model and integrate it into ROS, so I need clarification on:

    • how to add or build the Structure From Motion Application if it is not included in my current build;

    • how to correctly run the Edge AI Monodepth Estimation demo with a custom model on TIDL 11_00_06_00;

    • what steps are required to ensure that the GStreamer pipeline successfully initializes and runs the model.

Best regards,
Andow

  • Hi Andow,

    Good questions here. In general, mono-cam depth estimation has limited support on AM67A in terms of what we provide out-of-box in the SDK. Let me respond to your queries individually.

    The Structure From Motion Application is missing from my software build, and I do not know how to correctly add or install it on the board.

    This is part of the 'vision_apps' [1] package that is only supported on TDA4x processors. This software is part of the PSDK RTOS and also part of the AM67A firmware-builder, but I'll reiterate that industrial devices like AM67A and AM62A do not include vision_apps support from the E2E team.

    I want to run the Edge AI Monodepth Estimation Demo using a custom model.
    My board is running TIDL 11_00_06_00, and I generated the model artifacts for de-7310_midas-small_onnx using OSRT.

    Okay, sounds like that at least compiled. Note that we support postprocessing for image classification, object detection, keypoint detection (for human pose), and semantic/pixel-level segmentation. Depth estimation does not have an associated set of postprocessing functions.

    When I try to launch the GStreamer pipeline for the Monodepth Estimation demo, the pipeline fails to start. This may be caused by incompatibility, missing components, or incorrect integration of the model in TIDL.

    Yep, that's expected. See the point above. Monodepth estimation in our tooling does not include a post-processing implementation; this is for the user (you) to implement. You can use the git repo for mono depth estimation [2] as a baseline, though I only wrote this for the MiDaS model in Python. Other models may have some differences w.r.t. post-processing (e.g. the output is depth vs. disparity).
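
    As a rough illustration of what that post-processing amounts to for a MiDaS-style disparity output (a sketch in Python, not the demo's exact code; the tensor shape and the colormap choice are my assumptions):

        import cv2
        import numpy as np

        def postprocess_disparity(tensor, out_w, out_h):
            # Sketch: convert a raw MiDaS-style disparity tensor into a color image.
            # Assumes the tensor squeezes to a 2D (H, W) map; other models may output
            # depth instead of disparity, which changes the interpretation of the values.
            disp = np.squeeze(np.asarray(tensor, dtype=np.float32))
            # Min-max normalize to 0..255 so the full value range is visible in the frame
            d_min, d_max = disp.min(), disp.max()
            disp_u8 = ((disp - d_min) / (d_max - d_min + 1e-6) * 255.0).astype(np.uint8)
            # Colorize and resize to the display resolution
            color = cv2.applyColorMap(disp_u8, cv2.COLORMAP_JET)
            return cv2.resize(color, (out_w, out_h))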

    My goal is to test a custom Monodepth Estimation model and integrate it into ROS, so I need clarification on:

    • how to add or build the Structure From Motion Application if it is not included in my current build;

    • how to correctly run the Edge AI Monodepth Estimation demo with a custom model on TIDL 11_00_06_00;

    • what steps are required to ensure that the GStreamer pipeline successfully initializes and runs the model.

    For the first, I would recommend using the TDA4x SDK to build vision_apps for the Structure From Motion application. Note that this is a stereo vision application rather than mono.

    To run that demo (which I developed a few years ago), you would minimally need to change the model path in the run_demo.sh script. If it is also MiDaS, as you say, then I'm hopeful that is the extent of the changes. However, incorporating this into edgeai-gst-apps [3] would require you to add a postprocessing function for depth estimation (in Python, C++, or in the tidlpostproc GStreamer plugin, which is also C++).

    On the last point, this depends on whether you need postprocessing. If you just want to run the model to check that it initializes and runs, but let the results disappear into the void or /dev/null, you can make a pipeline with optiflow [4]. The resulting GStreamer string will include some postprocessing and visualization (to an HDMI monitor, to a file, to the network, etc.). You can chop off the portion after the tidlinferer plugin so that the input pipe to the model and the model itself run, but nothing depth-estimation-specific runs thereafter.
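
    If it helps, a minimal Python sketch of that truncation idea: take the string optiflow prints, keep everything up to and including the tidlinferer element, and terminate it with a fakesink (the GStreamer stand-in for /dev/null). The PIPELINE placeholder and the cut logic are mine, not part of optiflow, and assume a single tidlinferer instance in the string:

        import gi
        gi.require_version("Gst", "1.0")
        from gi.repository import Gst

        # Paste the pipeline string printed by optiflow here
        PIPELINE = "<optiflow-generated pipeline string>"

        # Keep the input side and the inference element, drop the task-specific tail
        idx = PIPELINE.find("tidlinferer")
        cut = PIPELINE.find("!", idx)
        truncated = PIPELINE[:cut] + "! fakesink sync=false"

        Gst.init(None)
        pipe = Gst.parse_launch(truncated)
        pipe.set_state(Gst.State.PLAYING)
        # Run for a few seconds; an ERROR message here means the model/pipeline did not start
        msg = pipe.get_bus().timed_pop_filtered(
            5 * Gst.SECOND, Gst.MessageType.ERROR | Gst.MessageType.EOS)
        if msg and msg.type == Gst.MessageType.ERROR:
            print("pipeline error:", msg.parse_error())
        else:
            print("model initialized and pipeline ran without errors")
        pipe.set_state(Gst.State.NULL)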

    I can give more information in whichever direction you need -- hope the verbose response is helpful.

    P.S. For robotics, we're shifting our positioning such that TDA4x devices, rather than AM6xA, are the recommendation, especially if there is any need for functional safety (ASIL/SIL). The AM67A equivalent is TDA4AEN -- they are similar devices but have different software.

    [1] https://git.ti.com/cgit/processor-sdk/vision_apps

    [2] https://github.com/TexasInstruments-Sandbox/edgeai-demo-monodepth-estimation

    [3] https://github.com/TexasInstruments/edgeai-gst-apps

    [4] https://github.com/TexasInstruments/edgeai-gst-apps/tree/main/optiflow 

  • Thank you for the detailed explanation — it was very helpful for me.
    My main task right now is to obtain depth estimation using only a single camera, so it seems that the SFM app is not suitable for my case.

    Therefore, my current goal is to run edgeai-demo-monodepth-estimation.

    Regarding post-processing: I've already figured that part out. I created a small OSRT handler, but I would still like to be able to test the pipeline with a video stream or a live camera.

    At the moment, I have a Webcam C170, which has slightly different parameters compared to the C270, so I updated the resolution for usb-720p in gst_configs.py.

    I also compiled the MiDaS model for the TIDL version that is available on my board.

    After updating the camera and model in run_demo.sh, I run the demo and get the following issue:

    root@j722s-evm:/opt/app/edgeai-demo-monodepth-estimation# ./run_demo.sh 
    libtidl_onnxrt_EP loaded 0x182d3720 
    Final number of subgraphs created are : 1, - Offloaded Nodes - 84, Total Nodes - 84 
    APP: Init ... !!!
       107.149544 s: MEM: Init ... !!!
       107.149634 s: MEM: Initialized DMA HEAP (fd=5) !!!
       107.149851 s: MEM: Init ... Done !!!
       107.149887 s: IPC: Init ... !!!
       107.205711 s: IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
       107.213742 s: GTC Frequency = 200 MHz
    APP: Init ... Done !!!
       107.213934 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_ERROR
       107.213957 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_WARNING
       107.213981 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_INFO
       107.215012 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-0 
       107.215449 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-1 
       107.215991 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-2 
       107.216467 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-3 
       107.216517 s:  VX_ZONE_INFO: [tivxInitLocal:202] Initialization Done !!!
       107.216609 s:  VX_ZONE_INFO: Globally Disabled VX_ZONE_INFO
    Calculating output tensor dimensions and offsets...
    model datatype : float32
    caps: video/x-raw, width=1920, height=1080, format=RGB, framerate=0/1
    Parsing GST pipeline: 
    input: v4l2src device=/dev/video-usb-cam0  ! image/jpeg,width=640,height=480 ! jpegdec ! tiovxdlcolorconvert  !  video/x-raw, format=NV12 ! queue leaky=2 max-size-buffers=2  ! tiovxmultiscaler name=split_resize     split_resize.   ! video/x-raw, width=246, height=246, format=NV12   ! tiovxdlpreproc out-pool-size=4 data-type=float32   channel-order=nchw tensor-format=rgb  mean-0=0.000000 mean-1=0.000000 mean-2=0.000000 scale-0=0.003922 scale-1=0.003922 scale-2=0.003922  ! application/x-tensor-tiovx  ! appsink name=tensor_in max-buffers=2 drop=True    split_resize. ! queue leaky=2 max-size-buffers=2  ! video/x-raw, width=1536, height=864, format=NV12 ! tiovxdlcolorconvert out-pool-size=4 ! video/x-raw, format=RGB ! appsink name=image_in max-buffers=2 drop=True
    
    output:  appsrc format=GST_FORMAT_TIME is-live=true  name=out ! video/x-raw,  format=RGB, width=1920, height=1080  ! queue ! tiovxdlcolorconvert out-pool-size=4  ! video/x-raw, format=NV12  !  tiperfoverlay main-title=""  ! kmssink sync=false driver-name=tidss  plane-id=31 force-modesetting=True
    
    Starting with in_gst: 
    v4l2src device=/dev/video-usb-cam0  ! image/jpeg,width=640,height=480 ! jpegdec ! tiovxdlcolorconvert  !  video/x-raw, format=NV12 ! queue leaky=2 max-size-buffers=2  ! tiovxmultiscaler name=split_resize     split_resize.   ! video/x-raw, width=246, height=246, format=NV12   ! tiovxdlpreproc out-pool-size=4 data-type=float32   channel-order=nchw tensor-format=rgb  mean-0=0.000000 mean-1=0.000000 mean-2=0.000000 scale-0=0.003922 scale-1=0.003922 scale-2=0.003922  ! application/x-tensor-tiovx  ! appsink name=tensor_in max-buffers=2 drop=True    split_resize. ! queue leaky=2 max-size-buffers=2  ! video/x-raw, width=1536, height=864, format=NV12 ! tiovxdlcolorconvert out-pool-size=4 ! video/x-raw, format=RGB ! appsink name=image_in max-buffers=2 drop=True
    
    
    out gst:  appsrc format=GST_FORMAT_TIME is-live=true  name=out ! video/x-raw,  format=RGB, width=1920, height=1080  ! queue ! tiovxdlcolorconvert out-pool-size=4  ! video/x-raw, format=NV12  !  tiperfoverlay main-title=""  ! kmssink sync=false driver-name=tidss  plane-id=31 force-modesetting=True
    Starting GST pipeline
    pull buffers
    pull buffers
    pull buffers
    pull buffers
    pull buffers
    pull buffers
    pull buffers
    pull buffers
    ...

    The camera LED does not turn on. After doing some debugging of the GStreamer commands, I found that the pipeline does not start and crashes right after:

    ! tiovxmultiscaler name=split_resize     split_resize.

    I would appreciate any advice on what I should check next.

  • Hi Andow,

    Yes, I see the problem. Your camera is 720p, but some of the downscaling branches are requesting larger resolutions:

    split_resize. ! queue leaky=2 max-size-buffers=2  ! video/x-raw, width=1536, height=864, format=NV12

    The tiovxmultiscaler plugin (and corresponding "MSC" hardware accelerator block) will only downscale images. No upscaling is supported in this hardware. 

    That portion of the pipeline-generation would need to be edited to handle this smaller input resolution. There may be a similar issue for the output visualization; I see the data from the appsrc plugin is expected to be 1080p:

    appsrc format=GST_FORMAT_TIME is-live=true  name=out ! video/x-raw,  format=RGB, width=1920, height=1080

    This demo-ware application does not handle as many permutations of input/output sizes as edgeai-gst-apps, but its pipeline-generation logic is also not quite so complex.
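
    For example, the kind of guard the pipeline-generation needs is along these lines (a Python sketch; the function name and the rounding are mine, not the demo's code):

        def clamp_scaler_output(cam_w, cam_h, req_w, req_h):
            # Sketch: pick a tiovxmultiscaler output size that never exceeds the camera
            # resolution, since the MSC hardware only downscales. Keeps the requested
            # aspect ratio and rounds to even values for NV12.
            if req_w <= cam_w and req_h <= cam_h:
                return req_w, req_h
            scale = min(cam_w / req_w, cam_h / req_h)
            w = round(req_w * scale) // 2 * 2
            h = round(req_h * scale) // 2 * 2
            return w, h

        # e.g. a 1536x864 display branch with a 1280x720 camera becomes 1280x720
        print(clamp_scaler_output(1280, 720, 1536, 864))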

    BR,
    Reese

  • Thanks for the reply. I fixed the pipeline settings and everything seems to work, but there is no video on the display...

    root@j722s-evm:/opt/app/edgeai-demo-monodepth-estimation# ./run_demo.sh 
    libtidl_onnxrt_EP loaded 0x25954800 
    Final number of subgraphs created are : 1, - Offloaded Nodes - 84, Total Nodes - 84 
    APP: Init ... !!!
     25671.558889 s: MEM: Init ... !!!
     25671.558968 s: MEM: Initialized DMA HEAP (fd=5) !!!
     25671.559181 s: MEM: Init ... Done !!!
     25671.559215 s: IPC: Init ... !!!
     25671.618936 s: IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
     25671.630651 s: GTC Frequency = 200 MHz
    APP: Init ... Done !!!
     25671.630877 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_ERROR
     25671.630894 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_WARNING
     25671.630903 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_INFO
     25671.631824 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-0 
     25671.632152 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-1 
     25671.632387 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-2 
     25671.632627 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-3 
     25671.632657 s:  VX_ZONE_INFO: [tivxInitLocal:202] Initialization Done !!!
     25671.632720 s:  VX_ZONE_INFO: Globally Disabled VX_ZONE_INFO
    Calculating output tensor dimensions and offsets...
    model datatype : float32
    caps: video/x-raw, width=1280, height=720, format=RGB, framerate=0/1
    Parsing GST pipeline: 
    input: v4l2src device=/dev/video-usb-cam0  ! image/jpeg,width=1024,height=768 ! jpegdec ! tiovxdlcolorconvert  !  video/x-raw, format=NV12 ! queue leaky=2 max-size-buffers=2  ! tiovxmultiscaler name=split_resize     split_resize.   ! video/x-raw, width=328, height=246, format=NV12  ! tiovxmultiscaler target=1 ! queue leaky=2 max-size-buffers=2 ! video/x-raw, width=246, height=246  ! tiovxdlpreproc out-pool-size=4 data-type=float32   channel-order=nchw tensor-format=rgb  mean-0=0.000000 mean-1=0.000000 mean-2=0.000000 scale-0=0.003922 scale-1=0.003922 scale-2=0.003922  ! application/x-tensor-tiovx  ! appsink name=tensor_in max-buffers=2 drop=True    split_resize. ! queue leaky=2 max-size-buffers=2  ! video/x-raw, width=1024, height=576, format=NV12 ! tiovxdlcolorconvert out-pool-size=4 ! video/x-raw, format=RGB ! appsink name=image_in max-buffers=2 drop=True
    
    output:  appsrc format=GST_FORMAT_TIME is-live=true  name=out ! video/x-raw,  format=RGB, width=1280, height=720  ! queue ! tiovxdlcolorconvert out-pool-size=4  ! video/x-raw, format=NV12  !  tiperfoverlay main-title=""  ! kmssink sync=false driver-name=tidss  plane-id=31 force-modesetting=True
    
    Starting with in_gst: 
    v4l2src device=/dev/video-usb-cam0  ! image/jpeg,width=1024,height=768 ! jpegdec ! tiovxdlcolorconvert  !  video/x-raw, format=NV12 ! queue leaky=2 max-size-buffers=2  ! tiovxmultiscaler name=split_resize     split_resize.   ! video/x-raw, width=328, height=246, format=NV12  ! tiovxmultiscaler target=1 ! queue leaky=2 max-size-buffers=2 ! video/x-raw, width=246, height=246  ! tiovxdlpreproc out-pool-size=4 data-type=float32   channel-order=nchw tensor-format=rgb  mean-0=0.000000 mean-1=0.000000 mean-2=0.000000 scale-0=0.003922 scale-1=0.003922 scale-2=0.003922  ! application/x-tensor-tiovx  ! appsink name=tensor_in max-buffers=2 drop=True    split_resize. ! queue leaky=2 max-size-buffers=2  ! video/x-raw, width=1024, height=576, format=NV12 ! tiovxdlcolorconvert out-pool-size=4 ! video/x-raw, format=RGB ! appsink name=image_in max-buffers=2 drop=True
    
    
    out gst:  appsrc format=GST_FORMAT_TIME is-live=true  name=out ! video/x-raw,  format=RGB, width=1280, height=720  ! queue ! tiovxdlcolorconvert out-pool-size=4  ! video/x-raw, format=NV12  !  tiperfoverlay main-title=""  ! kmssink sync=false driver-name=tidss  plane-id=31 force-modesetting=True
    Starting GST pipeline
    pull buffers
    got GST buffers in app code
    0.06637763977050781
    pull buffers
    got GST buffers in app code
    0.03599214553833008
    pull buffers
    got GST buffers in app code
    0.03386831283569336
    pull buffers
    got GST buffers in app code
    0.03460121154785156
    pull buffers
    got GST buffers in app code
    0.04039931297302246
    pull buffers
    got GST buffers in app code
    0.04718732833862305
    pull buffers
    got GST buffers in app code
    0.05154824256896973
    pull buffers
    got GST buffers in app code
    0.048491477966308594
    pull buffers
    got GST buffers in app code
    0.05122661590576172
    pull buffers
    got GST buffers in app code
    0.04781746864318848
    pull buffers
    got GST buffers in app code
    0.051619529724121094
    pull buffers
    got GST buffers in app code
    0.04768490791320801
    pull buffers
    got GST buffers in app code
    0.05287361145019531
    pull buffers
    got GST buffers in app code
    0.04698657989501953
    pull buffers
    got GST buffers in app code
    0.05157828330993652
    pull buffers
    got GST buffers in app code
    0.04773426055908203
    pull buffers
    got GST buffers in app code
    0.05212545394897461
    pull buffers
    got GST buffers in app code
    0.047638893127441406
    pull buffers
    got GST buffers in app code
    0.05172157287597656
    pull buffers
    got GST buffers in app code
    0.04822945594787598
    pull buffers
    got GST buffers in app code
    0.05141901969909668
    pull buffers
    got GST buffers in app code
    0.047869205474853516
    pull buffers
    got GST buffers in app code
    0.05159640312194824
    pull buffers
    got GST buffers in app code
    0.048233985900878906
    pull buffers
    got GST buffers in app code
    0.051508188247680664
    pull buffers
    got GST buffers in app code
    0.04786062240600586
    pull buffers
    ^CKB shortcut caught
    
    Ran 26 frames
    **** Runtime Stats ****
    ---- Pull input time (ms): avg 19 +- 5 (min to max: 3 to 26)
    ---- infer time (ms): avg 5
    ---- Output (draw, post-proc) time (ms): avg 20 +- 1
    ---- FPS: 20.73
    -----------------------
    
    True
    paused pipe; waiting gst thread to join
    exiting...
    

  • Okay, I fixed the display and it seems to work correctly, but the camera doesn't respond to objects. Everything is always blue.
    Is this a problem with the model compilation?
    And is there a way to check that the model is correct other than running it in the app?


  • Hello,

    You may want to dump the output tensor into a file or add print statements to look at the distribution of the data. If it is all the same output value, then that is of course an error. What was your approach for compiling the model? I would suggest returning to that script and looking at the output data distribution on static images.
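
    For instance, something along these lines will show the output distribution (a sketch based on the edgeai-tidl-tools OSRT flow; the paths, the artifacts folder, and the 256x256 input size are assumptions you should match to your own compile script):

        import numpy as np
        import onnxruntime as ort

        # Reuse the provider options from your own compile/inference script if they differ
        ep_options = {"artifacts_folder": "/opt/model_zoo/de-7310_midas-small_onnx/artifacts"}
        sess = ort.InferenceSession(
            "/opt/model_zoo/de-7310_midas-small_onnx/model/midas-small.onnx",
            providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
            provider_options=[ep_options, {}],
        )

        # Replace this random input with a real preprocessed image (same mean/scale as compilation)
        inp_name = sess.get_inputs()[0].name
        x = np.random.rand(1, 3, 256, 256).astype(np.float32)
        out = np.squeeze(sess.run(None, {inp_name: x})[0])

        print("shape", out.shape, "min", out.min(), "max", out.max(),
              "mean", out.mean(), "std", out.std())
        np.save("/tmp/depth_out.npy", out)  # dump for offline comparison, e.g. against a CPU-only run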

    Usually these types of models with disparity output need to normalize the output so that there is a better range of data. If you're using the demo code here (it looks like it), then that's handled at [3]. It may be worth printing out the distribution of data there as well.

    Another aspect is that some models are particularly sensitive to 8-bit quantization. Continuous-valued outputs like depth/disparity maps are especially impacted, but this also applies to things like bounding boxes and keypoint pixel locations. To address this, we typically use a mixture of 8-bit and 16-bit precision; usually 16-bit is only needed for the last couple of layers. For the MiDaS model, it looks like there are two layers we've set that option for [1][2].
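
    Concretely, in the OSRT model config that option looks roughly like the snippet below; per the README in [2] it takes a comma-separated string of layer names. The names "511, 983" are placeholders here -- take the real ones from the benchmark config in [1] or from inspecting your ONNX graph, and note that the artifacts need to be recompiled for the change to take effect:

        # Sketch of the mixed-precision option (recompile the artifacts after changing it).
        # "511, 983" is a placeholder; use the layer names from your own model graph.
        runtime_options = {
            "advanced_options:output_feature_16bit_names_list": "511, 983",
        }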

    [1] https://github.com/TexasInstruments/edgeai-tensorlab/blob/3de61dfa503c408346c3bcd029f49a25e42a8a73/edgeai-benchmark/configs/depth_estimation.py#L71 -- see the output_feature_16bit_names_list which has a comma-separated list of layer names

    [2] https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_python/README.md#advanced-options-for-accuracy-enhancement -- reference for the mixed-precision settings

    [3] https://github.com/TexasInstruments-Sandbox/edgeai-demo-monodepth-estimation/blob/b5dd792e45d4ead4a0c2f4b089477906519ad085/display.py#L165 -- postprocessing for the depth map

    BR,
    Reese



  • Hi, it turned out that there was a mistake in the OSRT config for the model. Here is the config that gives this result:

        "de-7310_midas-small_onnx": create_model_config(
            task_type="depth_estimation",
            source=dict(
                model_url="http://software-dl.ti.com/jacinto7/esd/modelzoo/11_00_00/models/vision/depth_estimation/nyudepthv2/midas/midas-small.onnx",
                infer_shape=True,
            ),
            preprocess=dict(
                resize=[256, 256], 
                crop=[256, 256], 
                data_layout="NCHW",
                pad_color=0,
                resize_with_pad=False,
                reverse_channels=False, 
                add_flip_image=False,
            ),
            session=dict(
                session_name="onnxrt",
                model_path=os.path.join(models_base_path, "midas-small.onnx"),
                target_device="AM67A", 
                input_mean=[123.675, 116.28, 103.53],
                input_scale=[0.017125, 0.017507, 0.017429], 
                input_optimization=False, 
            ),
            postprocess=dict(
                with_argmax=False, 
            ),
            extra_info=dict(
                num_images=numImages, 
                num_classes=1,
            ),
            runtime_options = {
                'advanced_options:output_feature_16bit_names_list' : [511, 983],
            },
        )

    Can you review it? Maybe I missed something...
    I also noticed that the estimated distance sometimes changes chaotically on dark objects. Is there any way to increase the stability of the estimate?