This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

SK-TDA4VM: Visual Localization with Arducam Stereo Setup

Part Number: SK-TDA4VM
Other Parts Discussed in Thread: TDA4VM

Hi,

We plan to modify the existing Visual Localization demo code for our stereo Arducam setup on the TDA4VM, for warehouse use. We have a few questions related to that.

  1. Can you please briefly explain how to generate the three files that are required for the 3D sparse map?
    1. Voxel info binary file
    2. Map 3D keypoints binary file
    3. Map 3D keypoints descriptor file
  2. What 3D keypoint and descriptor network was used to extract the 64-dimensional feature descriptors for 3D sparse map creation?
  3. Is the DKAZE network (the model for 2D keypoints and descriptors) trained on a CARLA simulator dataset, and would it work for real warehouse data? If not, could you share the model so that we can train it on our warehouse data?
  4. Also, can we use a different keypoint and descriptor deep learning model instead of DKAZE? If yes, how do we use custom models with HW accelerators?

Thanks

  • Hello Aditya,

    1. These 3 files are the outputs of mapping. Please note that 3D LIDAR data and camera images are used for mapping, and the Carla simulator was used for the demo.

    2. Not sure if I understand this question correctly. The DKAZE network outputs 2D key points along with descriptors. The 3D position of each key point comes from the corresponding 3D LIDAR point. The camera and the 3D LIDAR should be calibrated.

    3. I do not think DKAZE trained using the Carla simulator works for a real warehouse. The DKAZE network and the demo were mainly developed as a proof of concept. Since we haven't tested it in the real world, we do not recommend using it for real-world use cases.

    4. I'd like to recommend using a public network, e.g., SuperPoint. The trained network should be compiled for the C7x/MMA. Please refer to GitHub - TexasInstruments/edgeai-tidl-tools: Edgeai TIDL Tools and Examples - This repository contains Tools and example developed for Deep learning runtime (DLRT) offering provided by TI's edge AI solutions.
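    As a rough illustration of that compile flow, here is a hedged sketch of the options one would pass to the ONNX Runtime TIDL compilation provider from edgeai-tidl-tools. The option keys follow the repository's examples, but the model name, paths, and values below are placeholders and should be checked against the current release:

    ```python
    # Hypothetical sketch of compiling an ONNX keypoint network (e.g. a
    # SuperPoint export) for C7x/MMA with edgeai-tidl-tools. Paths and
    # calibration settings are assumed values, not the demo's actual config.
    compile_options = {
        "tidl_tools_path": "/opt/tidl_tools",            # assumed install path
        "artifacts_folder": "./model-artifacts/superpoint",
        "accuracy_level": 1,                             # enable calibration
        "advanced_options:calibration_frames": 16,
        "advanced_options:calibration_iterations": 3,
    }

    # The actual compilation would then run through ONNX Runtime, e.g.:
    # import onnxruntime as ort
    # sess = ort.InferenceSession(
    #     "superpoint.onnx",
    #     providers=["TIDLCompilationProvider"],
    #     provider_options=[compile_options],
    # )
    # Running calibration inputs through `sess` emits the C7x/MMA artifacts
    # into `artifacts_folder`, which the target runtime then loads.
    ```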

    Best regards,

    Do-Kyoung

  • Hi!

    Thanks for the response. Could you please clarify a few more questions:

    1. Could you please confirm whether localization matches 3D descriptors against 2D descriptors? In other words, are the 3D keypoint descriptors of the sparse map (derived from the point cloud) matched against the 2D keypoint descriptors that the DKAZE network extracts from the image, and is that match what is used for localization?

    2. Could you please clarify a few things about the 3D sparse map files, especially the 3D keypoints binary file and the 3D descriptor binary file? My understanding is that a point cloud was obtained from the LIDAR, and a separate network was used to extract 3D keypoints and their corresponding 64-dimensional descriptors from that LIDAR data. Could you let me know whether this was a deep learning network (if so, which model) or a classical CV keypoint-and-descriptor algorithm?

  • Hi Aditya,

    Let me summarize the mapping and localization processes in this demo. Please note that the two processes are quite different in terms of the sensors and algorithms used.

    1. Mapping

    • Sensors: IMU (for ego vehicle's pose estimation), 3D LIDAR, camera. 
    • These sensors should be calibrated. After calibration, we are able to find, for each feature point in the image, the corresponding 3D point from the LIDAR. (This is not always possible, since 3D LIDAR is sparse: for some image feature points there is no exactly matching 3D point. Such points are either not added to the map, or an interpolation technique can be used.)
    • DKAZE network will provide feature points and their descriptors 
      • Other DL networks or CV based methods can be used instead of DKAZE
    • So the map will have a bunch of points in it. Each point has the following information: X, Y, Z (3D coordinate) and 64-D descriptor  
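    To make the mapping steps above concrete, here is a minimal NumPy sketch (not the demo's actual code) of associating image feature points with calibrated LIDAR points. The intrinsics `K`, the LIDAR-to-camera extrinsics `R`, `t`, and the pixel threshold `max_px` are all assumed inputs; feature points without a close-enough projected LIDAR point are simply dropped, as described above:

    ```python
    import numpy as np

    def build_map_points(keypoints_xy, descriptors, lidar_xyz, K, R, t, max_px=2.0):
        """Return (N, 3) 3D positions and (N, 64) descriptors for the image
        keypoints that have a LIDAR point projecting within max_px pixels.
        Keypoints with no nearby LIDAR projection are discarded."""
        cam = lidar_xyz @ R.T + t            # LIDAR points in the camera frame
        in_front = cam[:, 2] > 0             # keep only points ahead of the camera
        cam = cam[in_front]
        uv = cam @ K.T
        uv = uv[:, :2] / uv[:, 2:3]          # pinhole projection to pixel coords
        xyz_out, desc_out = [], []
        for (x, y), d in zip(keypoints_xy, descriptors):
            dist = np.hypot(uv[:, 0] - x, uv[:, 1] - y)
            j = int(np.argmin(dist))
            if dist[j] <= max_px:            # sparse LIDAR: some points won't match
                xyz_out.append(lidar_xyz[in_front][j])
                desc_out.append(d)
        return np.array(xyz_out), np.array(desc_out)
    ```

    Each surviving entry is then one map point: an (X, Y, Z) position paired with its 64-D descriptor.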

    2. Localization

    • Sensors: IMU, camera
      • 3D LIDAR is not used 
      • In localization, IMU is optional. But it should help improve localization accuracy.
    • DKAZE network will provide feature points and their descriptors. Each point has x, y (image coordinates) and a 64-D descriptor
      • The same feature detection method should be used in mapping and localization. Otherwise, we cannot find the corresponding map points for the detected feature points.
    • Localization
      • For the detected feature points in the image, we find the corresponding points in the map by comparing 64-D descriptors 
      • Using the matched feature points, localize the camera (i.e. ego vehicle)
        • We have multiple pairs of (x, y) - (X, Y, Z)
        • We can localize by solving PnP (Perspective-n-Point) 

    Best regards,

    Do-Kyoung