Introduction

Geospatial technology, based on higher geodesy, remote sensing and geographic information science, has changed the way that scientists, engineers and citizens study and interact with their environment. This has led to fundamental advances in various topics of geomatics such as location-based applications, spatial data infrastructures, navigation and geodetic equipment. Nevertheless, conventional surveying, although the most accurate and robust method of applied geodesy, remains a time-consuming process requiring significant human effort (Carrera-Hernández et al. 2020). On the other hand, GNSS (Global Navigation Satellite System) positioning provides unsatisfactory accuracy in urban or highly vegetated areas due to degraded GNSS signal coverage (Chiang et al. 2019).

Specifically, traditional and modern surveying methods are not always complementary: in many cases the use of total stations is mandatory and cannot be substituted with GNSS receivers, while in other cases traditional surveying is prohibitively laborious. However, most surveying procedures, including laser scanning, involve costly equipment, and the need for a cost-effective surveying alternative in GNSS-degraded environments remains a critical unresolved issue (Trigkakis et al. 2020).

In the initial research of this study (Trigkakis et al. 2020), alternative solutions for positioning in GNSS-degraded areas are presented. Some approaches involve the improvement of the signal with respect to independent system analysis (Panigrahi et al. 2015), while other methodologies propose alternative techniques including angle approximation (Tang et al. 2015), shadow matching (Urzua et al. 2017), multipath estimation using 3D models (Zahran et al. 2018) and statistical models (Romero-Ramirez et al. 2018; Partsinevelos et al. 2020). Various studies make use of high-resolution aerial or terrestrial images and statistical / machine learning algorithms in order to georeference, map and detect dynamic patterns (Zahran et al. 2018; Jende et al. 2018), while the use of simultaneous localization and mapping (SLAM) algorithms combined with complementary methods from photogrammetry and / or GNSS / INS (Inertial Navigation System) for localization in GNSS-denied environments is well established in the literature (Bobbe et al. 2017; Gabrlik 2015; Helgesen et al. 2019). Under the same perspective, some studies use monocular SLAM approaches and attempt to solve the inherent problem of scale estimation by using barometers, altimeters and landmarks (Urzua et al. 2017; Kuroswiski et al. 2018) or by utilizing orientation (AHRS) and position sensors (GPS) (Munguía et al. 2016). In Xu et al. (2019), a localization method for indoor environments is presented, which is able to recognize pre-defined markers with centimeter-level accuracy utilizing the RGB-D ORB-SLAM2 algorithm.

Several studies have addressed accurate and / or rapid mapping with the use of mobile mapping systems (MMS), photogrammetry and image processing techniques. In Kalacska et al. (2020), the authors follow the Structure-from-Motion (SfM) with multi-view stereo approach of photogrammetry to produce ortho-images and 3D surfaces without the use of ground control points (GCPs), using UAVs equipped with GNSS receivers and optical sensors. In Pinto and Matos (2020), dense 3D information in underwater environments is constructed through the fusion of multiple light stripe range (LSR) and photometric stereo (PS) methods, outperforming the corresponding conventional methods in terms of accuracy, while in Bañón et al. (2019), aerial images and ground control points (GCPs) are used to produce a 3D model of a coastal region through SfM. The characteristic points are measured using a GPS receiver for the validation of the methodology, with a vertical RMSE of 0.12 m. Tomaštík et al. (2017) evaluate the positional accuracy of forest rapid mapping, using point cloud data created from UAV images and the Agisoft software, with an accuracy level of 20 cm.

Various studies address localization and detection methods employing MMS equipped with stereo sensors. Haque et al. (2020) propose an unmanned aerial system (UAS) which is able to find its location within a 3D CAD model of a pre-defined environment. The UAS, with a stereo-depth camera, maps the area using the ORB-SLAM2 algorithm (Mur-Artal and Tardós 2017), detects and extracts vector features with the aid of a convolutional neural network (CNN) and rectifies its location by comparing the SLAM-mapped area with the 3D CAD model. In Li et al. (2017), the authors propose a pose estimation methodology based on mobile accelerometers, visual markers and stereo vision fusion, achieving centimeter-level accuracy, while in (Vrba and Saska 2020; Vrba et al. 2019), a methodology that detects a micro aerial vehicle (MAV) is proposed, utilizing machine learning techniques and an RGB-stereo depth camera with an average RMS error of 2.86 m. In Zhang et al. (2019), a real-time obstacle avoidance method is developed with the aid of a stereo camera, a GNSS receiver and an embedded system mounted on a UAV in order to detect obstacles and follow an alternative, obstacle-free path. In Ma et al. (2021), the authors utilize a UAV with two cameras and a GNSS receiver in order to detect and geographically localize insulators in power transmission lines based on the bounding box of the detected insulators. Moreover, stereo-depth cameras have been used in UAVs for autonomous landing in GNSS-denied environments, where a UAV is able to detect, locate and land on an unmanned ground vehicle (UGV) making use of information from a multi-camera system and deep learning algorithms (Yang et al. 2018; Animesh et al. 2019).

As discussed above, the literature abounds with positioning methodologies for GNSS-denied areas, rapid mapping solutions using photogrammetric techniques, and localization systems based on SLAM and detection. Although most of these studies propose alternative localization solutions, none of them focuses on surveying or traditional topography combined with computer vision and multi-view geometry algorithms.

In the monocular approach of the present methodology (Trigkakis et al. 2020), an implementation based on SLAM, point cloud and image processing techniques localizes characteristic points in a local coordinate system utilizing only a monocular camera attached to a UAV in combination with a visual marker. Although the main issue of monocular approaches is scale estimation (Sahoo et al. 2021), the proposed methodology addressed this issue by using the "multiple convergence" method, achieving an accuracy level of 50 cm (Trigkakis et al. 2020).

In this study, the methodology is further extended using a stereo camera instead of a single sensor and validated through various indoor and outdoor, UAV and terrestrial experiments. More specifically, the extended methodology takes advantage of a stereo camera and a visual marker and is capable of mapping an unknown area, providing refined estimations of point coordinates in a local 3D coordinate system by fusing stereo SLAM, image processing techniques and coordinate system transformations. The main objective of this study is to propose a surveying or rapid-mapping alternative with an accuracy level of 10 cm or better, using conventional components, while supporting the use of a UAV. Although the present study validates the methodology in terms of localization accuracy in a local coordinate system, the connection with a global coordinate reference system such as WGS-84 is entirely feasible. The methodology can be employed in urban environments or dense-canopy areas where the GNSS signal is degraded, in emergency situations (Chuang 2018) where there is a need for damage assessment (Ampadu et al. 2020; Lassila 2018) and / or in search and rescue applications (Mishra et al. 2020; McRae et al. 2019). The proposed methodology is a cost-effective, rapid and efficient surveying solution in which a few minutes of flying and processing are sufficient to map an area of interest and extract the coordinates of the characteristic points, without limitations related to steeply sloping terrain or occluded areas.

To the best of our knowledge, there is no similar solution that makes use of a visual SLAM algorithm, a stereo camera and a visual marker in order to provide a 3D local coordinate system at a 10 cm level of accuracy. Unlike similar localization methods, the coordinate estimations are not expressed in a software-based reference system but in a reference system which is well-defined in the scene. The main contributions of the study are as follows:

  • An alternative surveying solution was developed using stereo-SLAM, multi-view geometry and coordinate system transformations.

  • The methodology can be performed with minimal and cost-effective equipment, since a stereo camera and at least one visual marker are enough to map an unknown environment, localizing characteristic points in a 3D local coordinate system.

  • All coordinate estimations are transferred and exported to a local reference system which is well-defined in the scene, using the plane and the pose of a visual marker.

  • The proposed solution provides accuracy at a level of 10 cm or better, a significant improvement compared with the monocular approach.

In the following sections the core system, the equipment and the implementation are presented, while in Section 3 an extensive set of experiments and results validates the presented methodology and demonstrates that it can be used as an alternative surveying solution. Sections 4 and 5 discuss the results and conclusions of the proposed methodology.

Materials and methods

The main goal of this study is the localization of visual markers and characteristic points of the scene, providing their local coordinates in 3D space at a high level of accuracy using minimal equipment. The presented methodology maps the area of interest by extracting the pose estimation of pre-defined visual markers and a point cloud in a local coordinate system using stereo vision. At first, the visual markers are placed in the scene; the origin marker defines the reference system of the coordinate estimations while the target markers represent the characteristic points or features. Subsequently, a SLAM algorithm enables the stereo camera to map the desired area and localize itself in an unknown environment (Mur-Artal and Tardós 2017), while in combination with image and geometrical processing, the present methodology estimates the coordinates of the target markers and an arbitrary point cloud which approximates the structure of the environment, allowing additional measurements in the local coordinate system of the scene.

System architecture

The system architecture is presented in the following figure (Fig. 1).

Fig. 1
figure 1

System architecture of the proposed methodology

As presented in Fig. 1, the processing in the procedure is separated into three stages. Initially, a video is captured by a stereo camera, creating a bag file (http://wiki.ros.org/Bags). After the recording process, Python scripts extract the stereo imaging data using ROS (Robot Operating System, https://www.ros.org/), obtain calibration information from the camera's factory settings, and separate and store the imaging data per sensor, as sketched below.
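A minimal sketch of this extraction stage, assuming the RealSense ROS driver's default infrared topic names and a hypothetical bag file name, could split the two infrared streams into per-sensor image folders like so:

```python
import os
import cv2
import rosbag
from cv_bridge import CvBridge

# Assumed topic names; actual names depend on the recording configuration
TOPICS = {
    "/camera/infra1/image_rect_raw": "left",
    "/camera/infra2/image_rect_raw": "right",
}

bridge = CvBridge()
with rosbag.Bag("recording.bag") as bag:  # hypothetical file name
    for topic, msg, t in bag.read_messages(topics=list(TOPICS)):
        # Convert the ROS image message to an OpenCV grayscale image
        img = bridge.imgmsg_to_cv2(msg, desired_encoding="mono8")
        out_dir = TOPICS[topic]
        os.makedirs(out_dir, exist_ok=True)
        # File names keyed by timestamp so left/right frames can be paired
        cv2.imwrite(os.path.join(out_dir, f"{t.to_nsec()}.png"), img)
```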

Subsequently, the SLAM system uses the ORB algorithm (Rublee et al. 2011) to extract interest keypoints from the images, together with local descriptors which help the system recognize the features from multiple angles and distances. The SLAM algorithm extracts ORB features from both images (left and right) and, for each ORB feature of the left image, detects the corresponding ORB feature of the right image. The coordinates of stereo interest keypoints are defined as $(u_L, v_L, u_R)$ (Mur-Artal and Tardós 2017), where the first two coordinates $(u_L, v_L)$ are the horizontal and vertical coordinates in the left image, while the third is the horizontal coordinate in the right image. Afterwards, the system, using the internal parameters of the camera and the information of the extracted features, predicts the position and orientation of the camera (pose), and if it observes groups of features in multiple sequential frames, it stores a keyframe.
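The stereo ORB matching happens inside ORB-SLAM2's C++ core; purely as an illustration of the idea, a minimal OpenCV sketch of extracting and left-right matching ORB features might look as follows (the frame file names are placeholders):

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp_l, des_l = orb.detectAndCompute(left, None)
kp_r, des_r = orb.detectAndCompute(right, None)

# Match binary descriptors with Hamming distance and cross-checking
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des_l, des_r)

stereo_keypoints = []
for m in matches:
    u_l, v_l = kp_l[m.queryIdx].pt
    u_r, v_r = kp_r[m.trainIdx].pt
    # On rectified images a valid match lies on (almost) the same scanline
    if abs(v_l - v_r) < 2.0:
        stereo_keypoints.append((u_l, v_l, u_r))
```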

Based on the process above, the SLAM algorithm outputs multiple keyframes which are treated as landmarks, since in combination with the keypoints they are necessary for local mapping, loop closure detection and camera re-localization. For the optimization of the camera pose prediction, local mapping and loop closure detection, the SLAM algorithm utilizes bundle adjustment (BA) with the Levenberg–Marquardt method (Mur-Artal and Tardós 2017).

After the SLAM process ends, it outputs a point cloud and a trajectory of the scene, while traditional image processing techniques such as adaptive and Otsu thresholding (Otsu 1979) provide the identification of target markers. Subsequently, through the multi-line convergence method (Trigkakis et al. 2020), the locations of the markers are estimated, while the pose of the origin marker is optimized with the plane alignment method (Trigkakis et al. 2020). Finally, the coordinate estimations are transferred to a local coordinate system defined by the pose of the origin marker (see Section 2.2). After the end of the process, the resulting 3D scene with the point cloud, the camera trajectory and the marker estimations is graphically presented through the visualization module (see Section 2.3).

The study's methodology performs SLAM processing based on the ORB-SLAM2 algorithm (Mur-Artal and Tardós 2017) making use of two infrared sensors, while the ArUco library (Romero-Ramirez et al. 2018; Garrido-Jurado et al. 2016) is utilized to represent the origin and target markers. ArUco markers are synthetic square markers defined by a binary (black and white) matrix with a black border and a specific identifier (id), meaning that different markers are given different identities.
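For illustration, detecting ArUco markers and estimating their pose relative to the camera can be sketched with OpenCV's aruco module roughly as below; the dictionary choice, intrinsics and file name are assumptions, and the aruco API was reorganized in OpenCV 4.7, so these calls follow the older interface:

```python
import cv2
import numpy as np

K = np.array([[600.0, 0, 424], [0, 600.0, 240], [0, 0, 1]])  # assumed intrinsics
dist = np.zeros(5)          # assumed negligible distortion
marker_len = 0.30           # 30 x 30 cm markers, as used in this study

aruco_dict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_4X4_50)  # assumed dictionary
params = cv2.aruco.DetectorParameters_create()

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frame
corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict, parameters=params)
if ids is not None:
    # One rotation/translation vector per detected marker, in the camera frame
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(corners, marker_len, K, dist)
    for marker_id, tvec in zip(ids.flatten(), tvecs):
        print(marker_id, tvec.ravel())
```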

The ORB-SLAM2 algorithm was selected due to its robustness over several state-of-the-art SLAM alternatives such as LDSO (Gao et al. 2018), OpenVINS (Geneva et al. 2020), VINS-Fusion (Qin et al. 2019), Maplab (Schneider et al. 2018), Basalt (Usenko et al. 2019), Kimera (Rosinol et al. 2020) and OpenVSLAM (Sumikura et al. 2019). In Sharafutdinov et al. (2021), the authors compare the above SLAM alternatives using the absolute pose error metric in position and rotation on popular robotics datasets such as EuRoC MAV (Burri et al. 2016), OpenLORIS-Scene (Shi et al. 2020) and KITTI (Geiger et al. 2012). The results show that ORB-SLAM2 and OpenVSLAM achieved the highest overall accuracy. Moreover, in (Giubilato et al. 2018), the authors compare stereo visual SLAM algorithms for planetary rovers, demonstrating the superiority of ORB-SLAM2 against S-PTAM (Pire et al. 2015), LibVISO2 (Geiger et al. 2011), RTAB-Map (Labbé and Michaud 2013) and ZED-VO (the proprietary software of the ZED development toolkit, https://www.stereolabs.com/developers/release/).

Unlike the monocular solution in the previous version of this study (Trigkakis et al. 2020), where the scale was calculated mathematically, ORB-SLAM2 in stereo mode provides real-world scale in meters directly, due to the known baseline between the two camera sensors.
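This scale recovery follows from standard stereo triangulation: for a rectified pair, a stereo keypoint $(u_L, v_L, u_R)$ with disparity $u_L - u_R$ has metric depth

$$z = \frac{f_x \, b}{u_L - u_R}$$

where $f_x$ is the focal length in pixels and $b$ is the stereo baseline in meters.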

Functionality

A core component of the methodology for the final coordinate estimations and 3D scene reconstruction is the coordinate system definition. The first coordinate system is defined and established by ORB-SLAM2 using the first frame of the captured video. The x and y axes of this initial coordinate system follow the right and top directions of the frame respectively, while the z axis coincides with the camera viewing direction towards the landscape. Subsequently, the calibration data and the camera pose (retrieved from the camera trajectory information), together with the target marker coordinates calculated by OpenCV (https://opencv.org/) algorithms, yield the rotation and translation vectors that transform the initial reference system to the camera reference system. Finally, the reference system definition module calculates the translation vector and rotation matrix from the orientation and translation of the origin marker and defines the final reference system based on the origin marker's pose, as sketched below. The x and y axes of the marker reference system follow the right and top directions of the marker, while the z axis follows the zenith direction, as depicted in Fig. 2.

Fig. 2
figure 2

Coordinate system transformation. During the mapping process, the proposed methodology uses the first recorded frame as defined by ORB-SLAM2, while after the export of coordinate estimations, the system defines a local coordinate system using the pose of the origin marker. Image source: (Trigkakis et al. 2020)
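A minimal sketch of this final change of reference system, assuming the origin marker's pose (rvec, tvec) has already been estimated in the SLAM frame:

```python
import cv2
import numpy as np

def to_marker_frame(points, rvec, tvec):
    """points: (N, 3) array in the SLAM frame; returns (N, 3) in the marker frame."""
    R, _ = cv2.Rodrigues(rvec)       # rotation: marker frame -> SLAM frame
    T = np.eye(4)                    # homogeneous transform marker -> SLAM
    T[:3, :3] = R
    T[:3, 3] = np.asarray(tvec).ravel()
    T_inv = np.linalg.inv(T)         # SLAM frame -> marker frame
    homog = np.hstack([points, np.ones((len(points), 1))])
    return (T_inv @ homog.T).T[:, :3]
```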

Concerning the marker pose estimates, the Multiple Line Convergence (M.L.C.) method (Trigkakis et al. 2020) was implemented. M.L.C. is a method for defining the marker location that is based on the observation that the extended line segments connecting each pose estimate with the corresponding camera position converge in an area that corresponds to the location of the marker in the 3D scene. The method finds the optimal point at which the extended line segments converge, using pseudo-inverse least-squares optimization (Samuel 2004; Eldén 1982). The method can be described by the following equation:

$$p = S^{+} \cdot C$$
(1)

where p is the point that minimizes the total distance from all the lines (the theoretical convergence point), $S^{+}$ is the pseudo-inverse of the matrix S defined in Eq. 2, and C is defined in Eq. 3.

$$S =\sum_{i}\left[{n}_{i}{n}_{i}^{T} - I\right]$$
(2)
$$C=\sum_{i}\left[{n}_{i}{n}_{i}^{T} - I\right]{a}_{i}$$
(3)

where each line is indexed by i, $a_i$ is the starting point of line i, $n_i$ is the unit direction of line i, and I is the identity matrix.
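Translating Eqs. 1-3 directly into NumPy, where each marker detection contributes a line starting at the camera position $a_i$ with unit direction $n_i$ toward the marker estimate:

```python
import numpy as np

def multi_line_convergence(starts, directions):
    """Point closest (in least squares) to a set of 3D lines."""
    S = np.zeros((3, 3))
    C = np.zeros(3)
    for a, n in zip(starts, directions):
        n = np.asarray(n, dtype=float)
        n = n / np.linalg.norm(n)          # ensure unit direction
        M = np.outer(n, n) - np.eye(3)     # n_i n_i^T - I   (Eqs. 2, 3)
        S += M
        C += M @ np.asarray(a, dtype=float)
    return np.linalg.pinv(S) @ C           # p = S^+ C       (Eq. 1)
```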

Subsequently, the plane alignment (P.A.) method (Trigkakis et al. 2020) is performed to correct the translation and rotation errors of the origin marker, which defines the final coordinate system of the scene. This step is important because any pose estimation error in the origin marker is propagated to every target marker and all point cloud data of the scene. With the P.A. method, the pose and rotation of the origin marker are corrected, leading to reliable measurements and an accurate definition of the origin coordinate system.
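The internals of the P.A. step are detailed in Trigkakis et al. (2020); purely as an assumption-laden sketch of the general idea, a plane can be fitted to the cloud points around the origin marker by SVD and the marker's z axis rotated onto the fitted normal:

```python
import numpy as np

def fit_plane_normal(points):
    """Least-squares plane normal of an (N, 3) neighborhood via SVD."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    return vt[-1]  # direction of smallest variance = plane normal

def align_rotation(z_marker, normal):
    """Minimal rotation taking the (unit) marker z axis onto the (unit) plane
    normal; assumes the two vectors are not opposite."""
    v = np.cross(z_marker, normal)
    c = np.dot(z_marker, normal)
    vx = np.array([[0, -v[2], v[1]],
                   [v[2], 0, -v[0]],
                   [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)  # Rodrigues formula
```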

Visualization module

The visualization of the results is a requirement during the implementation, testing and experimentation phases of the methodology, since visualizing the data in a 3D scene is important for tasks that require metric information from the corresponding real-world scene. Beyond the visualization capabilities that the module provides during processing, it can be used in offline mode with a built-in user interface, allowing navigation and interaction within the visual scene. More specifically, it is able to render the entire scene containing the marker estimations, the camera trajectory and the point cloud, while the camera trajectory pose vectors and the detection line segments from the camera to the markers' centers are also depicted in the 3D scene (Fig. 3).

Fig. 3
figure 3

Multiple point selection inside the visualization module. A selected point is shown through the UI at the top of the screen, both in its original point cloud coordinates and with coordinates expressed in the marker-determined coordinate system. When multiple points are selected, their locations are averaged and treated as a new point, again with both original and marker-determined coordinates

From a more practical perspective, the user interface supports point selection through a cursor, displaying the corresponding coordinate estimations on the screen, while providing a succinct overview of the camera trajectory by reproducing the path of the camera through animation.

Figure 3 depicts the user interface (UI) of the visualization module, in which part of a 3D scene is visualized. All the features displayed in the figure are located in the local reference system based on the origin marker. The poses (translation and orientation) of the camera are depicted using line segments (in green, Fig. 3) which follow the trajectory of the camera, while the marker detections are visualized as red lines connecting each camera position with the center of the detected marker. It is worth mentioning that the center of each marker is placed at the convergence point of the lines according to the multi-line convergence method. Finally, the user is able to select a point from the point cloud (visualized in green), exporting at the top right of the screen its coordinates in both the initial and the final coordinate system, while if more points are selected, the module exports the corresponding mean values of the selected points.
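The visualization module itself is custom-built; purely as an illustrative stand-in, the same scene elements (green point cloud, camera trajectory polyline) could be rendered with an off-the-shelf library such as Open3D:

```python
import numpy as np
import open3d as o3d

cloud_xyz = np.load("cloud.npy")      # hypothetical exported point cloud (N, 3)
traj_xyz = np.load("trajectory.npy")  # hypothetical camera positions (M, 3)

pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(cloud_xyz))
pcd.paint_uniform_color([0.0, 0.8, 0.0])   # point cloud in green

# Camera trajectory as a polyline of consecutive positions
lines = [[i, i + 1] for i in range(len(traj_xyz) - 1)]
traj = o3d.geometry.LineSet(
    points=o3d.utility.Vector3dVector(traj_xyz),
    lines=o3d.utility.Vector2iVector(lines),
)
o3d.visualization.draw_geometries([pcd, traj])
```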

Equipment setup

The main equipment components include a stereo camera, the ArUco markers, a conventional computing system and a UAV. Concerning the stereo camera, the Intel RealSense D435 stereo-depth camera was used, which includes two infrared sensors (left and right), a color sensor and an infrared projector for depth information. In the present study, only the two infrared sensors were used. The resolution of the camera sensors is 1280 × 800, the sensor aspect ratio is 8:5, the focal length is 1.93 mm, and the format is 10-bit RAW.

The UAV used in the present study is a custom-made hexacopter (Fig. 4a). The frame, as well as the propellers, is made of carbon fiber, while the Flight Control Unit (FCU) is a Pixhawk 2 Cube. The UAV is designed to be used with a companion embedded computer, a Raspberry Pi 4 module with 8 GB of RAM running at an overclocked rate of 2.3 GHz (Fig. 4c). The embedded computer interfaces with both the FCU and the RealSense camera (Fig. 4b).

Fig. 4
figure 4

a The custom-made hexacopter UAV. b The Intel RealSense D435 stereo camera attached to the UAV. c The Raspberry Pi 4 module

The origin and target markers have a size of 30 × 30 cm and are mounted on a custom-made adjustable stand. This stand stabilizes the marker pose in a horizontal reference plane with the aid of two stainless steel threaded rods and a leveler (Fig. 5a).

Fig. 5
figure 5

a The origin marker on a custom-made adjustable stand, which stabilizes the marker pose in a horizontal reference plane using two stainless steel threaded rods and a leveler. b The GPT-3000 geodetic total station

For validation purposes and ground-truth measurements, a Topcon GPT-3000 geodetic total station was used, with a mean square error (m.s.e.) measurement accuracy of ± (3 mm + 2 ppm × D), where D is the measured distance between the total station and the prism (Fig. 5b).

Results

To validate the present methodology, an extensive set of experiments was performed under different conditions relating to the study area, the arrangement of markers on the ground and the use or not of a UAV.

For the evaluation process, a geodetic total station was utilized to measure the reference coordinates of the visual markers and several characteristic points. The origin of the local coordinate system was defined at the center of the origin marker, with coordinates X = 0, Y = 0 and Z = 0. It is worth mentioning that the videos were recorded at 90 fps using a resolution of 848 × 480.

For the evaluation of the experiments, the absolute error (\(\left|{\mathrm{X}}_{\mathrm{meas}}-{\mathrm{X}}_{\mathrm{est}}\right|\)) between the measured coordinates X, Y, Z and the corresponding estimations is used, while the horizontal error (\(\sqrt{{\mathrm{X}}_{\mathrm{err}}^{2}+{\mathrm{Y}}_{\mathrm{err}}^{2}}\)) is also calculated, as written out explicitly below. The experiments are separated into two main sections: terrestrial and UAV experiments.
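A trivial but unambiguous formulation of the two metrics, with hypothetical array names:

```python
import numpy as np

def absolute_errors(measured, estimated):
    """Per-axis absolute error |X_meas - X_est| for X, Y, Z columns."""
    return np.abs(np.asarray(measured) - np.asarray(estimated))

def horizontal_error(err_xyz):
    """sqrt(X_err^2 + Y_err^2) from a (..., 3) per-axis error array."""
    err_xyz = np.asarray(err_xyz)
    return np.sqrt(err_xyz[..., 0] ** 2 + err_xyz[..., 1] ** 2)
```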

In each experiment, a visual marker representing the origin of the local coordinate system and one or three markers representing the targets are placed in the scene and measured with the total station for ground truth information. Afterwards, the stereo camera, stand-alone or attached to the UAV, is guided through a desired trajectory path in order to identify the markers and map the surroundings.

The experiments were designed to simulate a real-case scenario of surveying a plot or a field in which traditional land surveying techniques and equipment are utilized. More specifically, the main field work of a surveyor is to measure the coordinates of a few points that form the borders of the mapping area, while in most cases the path that the surveyor follows can be approximated by straight, right-angle, step-shaped, pi-shaped and square-based paths.

Thus, in the present experimentation, the methodology was tested utilizing the commonly used paths referred to above, while the visual markers representing the characteristic points of the path were placed at locations forming the shape of each path, similarly to a real-case scenario. For instance, in a square-shaped survey area the visual markers are placed at the four corners of the square, while on a long straight path two or more visual markers are located along the path.

Terrestrial experiments

Straight path (indoor)

The first experiment was conducted in an indoor environment. Two markers were utilized, for the origin and the target respectively, at a distance of 3.5 m, while the camera followed a straight path as presented in Fig. 6a. It is worth mentioning that the target marker was placed about 70 cm higher than the origin marker. The results of this experiment are presented in Table 1.

Fig. 6
figure 6

Trajectory paths of indoor experiments a straight path b right angle path c step-shaped path

Table 1 Estimations of straight path (indoor) experiment

As presented in Table 1, the X error is below 5.50 cm and the Y error below 3 cm, yielding a horizontal error at a level of 6 cm, while the vertical error is below 4 cm.

Right-angle path (indoor)

This experiment was performed in an indoor environment. Three target markers and an origin marker were used, placed in a right-angle shape (Fig. 6b). The maximum distance between the origin and the farthest target marker is 7.20 m.

In this experiment, as observed in Table 2, a horizontal error of 2.57 cm is achieved in target 1, with 0.019 cm of error in the X axis and 2.569 cm in the Y axis, while in targets 2 and 3 the horizontal error reaches a level of 6 cm, with about 5.50 cm of error in the X axis and 5 cm in the Y axis. The vertical error in target 1 is 3.10 cm, while in targets 2 and 3 the accuracy decreases, with an error of 9 cm.

Table 2 Estimations of right-angle path (indoor) experiment

Step-shaped path (indoor)

In the following indoor experiment, three target markers and an origin marker were used, placed in the same shape as in the previous experiment, but the camera followed a step-shaped path (Fig. 6c). The maximum distance between the origin and the final target marker is 7.20 m.

In Table 3, the horizontal error of target 1 is about 2 cm, with 0.85 cm of error in the X axis and 1.9 cm in the Y axis, while in targets 2 and 3 an error below 5.5 cm is observed. In target 2 the X and Y errors are about 3 cm, while target 3 reaches an error of 5 cm in X and 2.3 cm in Y. The vertical error of target 1 is below 1 mm, while in targets 2 and 3 the error is about 5.50 cm and 7.50 cm respectively.

Table 3 Estimations of step-shaped path (indoor) experiment

Pi-shaped path (outdoor)

This experiment was conducted in an outdoor environment on a sunny day. Three markers were used as targets, placed along a pi-shaped path as depicted in Fig. 7. The maximum distance between the origin and the target 3 marker was about 17.50 m.

Fig. 7
figure 7

a Survey area b Pi-shaped trajectory path

In this outdoor experiment, as presented in Table 4, the horizontal error of target 1 is 1.70 cm with 0.08 cm of error in the Y axis, while target 2's horizontal error is below 3.5 cm with 0.4 cm of error in the X axis. In target 3 the error increases to a level of 9 cm, with 0.3 cm of error in the X axis and 8.8 cm in the Y axis. Although the study area in this experiment was wider than those of the indoor environments and the distance between the origin and the farthest marker was 10 m larger than in experiments 2 and 3, the horizontal accuracy was higher. This result is reasonable because in a sunny outdoor environment the sensors receive refined information thanks to the illumination conditions of the scene, which results in more accurate mapping, while the geometry of the outdoor scene, such as buildings and trees, helps the methodology provide more accurate results for the target markers. Concerning the vertical error, targets 1 and 2 achieve errors below 7.8 and 2.5 cm respectively, while in target 3 the error increases to a level of 25 cm. This difference in vertical error between target 3 and targets 1-2 is due to the variable height of the camera between target 2 and target 3, caused by the different altitude levels of the ground. This is an interesting result indicating that the camera height should be stable during the mapping process, as confirmed by the UAV experiment results in the following section.

Table 4 Estimations of pi-shaped path (outdoor) experiment

UAV experiments

Straight path

The first experiment with the custom-built UAV was conducted in an outdoor environment without obstacles such as trees or buildings. Two markers were utilized, for the origin and the target respectively, at a distance of about 5 m, while the UAV followed a straight path at a flight height of 3 m (Fig. 8).

Fig. 8
figure 8

a Survey area b Straight trajectory path

As presented in Table 5, the horizontal error is 7.15 cm, with below 5.50 cm of error in the X and Y axes, while the vertical error is 1.15 cm. It is worth mentioning that the vertical error is refined compared with the non-UAV indoor and outdoor experiments, due to the stable height of the camera.

Table 5 Estimations of straight path (outdoor) experiment

Square path

The last experiment was performed in the same area as the previous experiment (Fig. 8a) using the custom-made UAV. Four markers were used, one for the origin and three for the targets, at distances of about 5 m, while the UAV followed a square path at a flight height of 3 m (Fig. 9). This experiment was repeated 50 times under similar conditions and equipment setup over a period of a month, in order to evaluate the presented methodology using statistical metrics. For each target marker, the absolute error between the coordinate estimations (X, Y, Z) and the corresponding ground truth coordinates (measured with the total station) was calculated, extracting the average, the RMSE and the standard deviation for each axis (X, Y and Z), as well as the horizontal error RMSExy; one plausible formulation of this evaluation is sketched below.
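The sketch assumes, for one target, a (50, 3) array of X, Y, Z estimates across the repeated flights and its total-station ground truth; RMSExy is read here as the combination of the per-axis horizontal RMSEs:

```python
import numpy as np

def evaluate(estimates, truth):
    """estimates: (50, 3) array of X, Y, Z; truth: (3,) ground truth."""
    err = np.abs(estimates - truth)        # absolute error per flight and axis
    mean = err.mean(axis=0)                # average error per axis
    std = err.std(axis=0, ddof=1)          # sample standard deviation per axis
    rmse = np.sqrt(((estimates - truth) ** 2).mean(axis=0))
    rmse_xy = np.sqrt(rmse[0] ** 2 + rmse[1] ** 2)   # horizontal RMSExy
    return mean, std, rmse, rmse_xy
```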

Fig. 9
figure 9

Square trajectory path and camera direction

As presented in Table 6, which contains the results for target 1, the average X error is below 5 cm with a standard deviation of 3 cm, while in the Y axis the average error is below 8 cm with a standard deviation of 5 cm. The horizontal error is at a level of 10 cm, with an RMSE of 5.50 cm for the X axis and 9.50 cm for the Y axis. The results in the Z axis achieve very high accuracy, with a 1.80 cm average error, 1.23 cm standard deviation and a vertical error of 2.13 cm.

Table 6 Evaluation of square path (outdoor) experiment for target 1

For target 2, as presented in Table 7, the average X error is below 5.50 cm with a standard deviation of 2.20 cm, while in the Y axis the average error is below 8 cm with a standard deviation of 5 cm. The horizontal error is at a level of 10 cm, with an RMSE of 6 cm for the X axis and 9.00 cm for the Y axis. The results in the Z axis achieve an accuracy (vertical error) at a level of 3.50 cm, with an average of 3.40 cm and a standard deviation at a level of 1 cm.

Table 7 Evaluation of square path (outdoor) experiment for target 2

Finally, for target 3, as presented in Table 8, the average X error is below 9.00 cm with a standard deviation of 1.70 cm, while in the Y axis the average error is below 7 cm with a standard deviation of 4 cm. The horizontal error is at a level of 11.50 cm, with an RMSE of 8.90 cm for the X axis and 7.60 cm for the Y axis. The results in the Z axis again achieve high accuracy (vertical error) at a level of 4 cm, with an average of 3.20 cm and a standard deviation at a level of 2.50 cm.

Table 8 Evaluation of square path (outdoor) experiment for target 3

Initial and proposed methodology comparison

For the comparison between the initial (Trigkakis et al. 2020) and the proposed methodology, an outdoor terrestrial experiment was conducted multiple times, maintaining similar conditions and experiment setup. The trajectory path had a right-angle shape, while three targets were placed in the scene (Fig. 10).

Fig. 10
figure 10

Outdoor experiment with right-angle trajectory path

As presented in Fig. 10, the distance between the starting point of the camera and the origin marker is larger than in the previous experiments of the proposed methodology, because the monocular SLAM requires more time to initialize and start the mapping and tracking procedure.

The two approaches were tested in terms of the distance estimations between the origin and target markers and the localization accuracy. The reference measurements were produced using a geodetic total station, which provides an error of about 1-2 cm.

The measured distances between the markers and the corresponding estimations (on average) of the initial and proposed methodologies are presented in the following table:

As presented in Table 9, the distance estimations of the initial methodology are considerably less accurate than those of the proposed methodology. More specifically, in target 1 the proposed methodology yields an error at a level of 3.50 cm, which is almost three times smaller than the corresponding error of the initial methodology. Moreover, in targets 2 and 3, which are placed at a larger distance than target 1, the proposed methodology keeps the error at a respectable level of about 6 cm, while the initial methodology produces an error at a level of 20 cm, about four times larger than the corresponding errors of the proposed methodology.

Table 9 Evaluation in distance estimations of the initial (I.M) and proposed (P.M) methodology

The target localization results (on average) of both approaches are presented in the following table:

As observed in Table 10, while the accuracy of the proposed methodology in the Z axis is only slightly higher, the horizontal accuracy is analogous to the accuracy in distance estimations. More specifically, the proposed methodology provides high accuracy in target 1, at a level of 3 cm, which is quite close to the total station's accuracy, while the initial approach produces an error at a level of 8.50 cm. Moreover, in targets 2 and 3, the proposed methodology provides errors of about 6 and 7.50 cm respectively, while the initial approach localizes the targets with an error of about 19.50 cm, which is four times higher than that of the proposed methodology.

Table 10 Evaluation of target localization of the initial (I.M) and proposed (P.M) methodology

In short, the proposed methodology proved its superiority over the initial approach (Trigkakis et al. 2020), while its estimations at distances of about 6 m or below are close to the accuracy of land surveying, which remains the most accurate and reliable method for point localization in applied geodesy. Thus, the results of the proposed methodology are considered sufficient, since the target localization is provided using a software-based approach, whereas land surveying relies on the mechanical parts of the total station to measure the distances and angles that produce the corresponding target localization.

Validation of characteristic points

This experiment was performed in order to validate the coordinates of characteristic points from the extracted point cloud. Characteristic points, in the sense of topography, are points that compose significant geometric features in a surveying area, e.g. building corners, plot boundaries etc. The purpose of the experiment was to determine the accuracy of five characteristic points on two sides of a building (Fig. 11). The first three points (point 1, point 2 and point 3) are building corners at about 1.5 m above the ground, while the final two points (point 4 and point 5) are building corners at ground level. The characteristic points were measured with the total station to obtain their ground truth. The results are presented in Table 11.

Fig. 11
figure 11

a A map with the characteristic measured points (p1, p2, p3, p4, p5) and path trajectory b The study area in which the characteristic points are depicted

Table 11 Evaluation of characteristic measured points

As presented in Table 11, the horizontal error in points 1, 2, 4 and 5 is below 10 cm, while the vertical error in points 1, 3, 4 and 5 is below 5 cm. It is worth observing that point 4 has a horizontal error of 0.50 cm and a vertical error of 1.50 cm. The cloud points are mainly created from intense contrast and texture features in the scene, which reconstruct the geometry of the 3D environment. Thus, points 4 and 5, which are defined at building corners on the ground, achieve refined results compared to points 1, 2 and 3, which were defined 1.50 m above the ground. Points 1 and 2, defined at building corners with intense color contrast, have greater horizontal accuracy than point 3, while the high vertical accuracy of point 3 (0.71 cm) is due to the rotation of the camera direction in the horizontal plane (in order to follow the right-angle shape of the trajectory, Fig. 11), allowing wider stereo vision information for point 3.

Discussion

The proposed methodology addresses the issue of high-accuracy mapping with the use of stereo vision. The methodology estimates the marker and arbitrary point locations in the 3D space of an unknown environment using a visual marker and a stereo camera, with or without a UAV. At first, the stereo camera is guided through a desired path, identifies the markers and maps the surroundings. In a second step, a local coordinate system is created for the scene, with an origin defined by the first marker that the camera comes across, while the coordinates of the target markers and the arbitrary points of the point cloud are calculated relative to the origin marker. In order to evaluate the methodology, different sets of experiments were performed in terms of study area (indoor / outdoor), the use or not of a UAV, the number and location of markers, the trajectory paths and the point cloud validation, while in the last UAV experiment a statistical evaluation was conducted based on 50 separate recordings of the scene. Although the methodology's accuracy is highly correlated with some factors discussed below, it achieved quite remarkable horizontal and vertical accuracy considering the minimal equipment used.

The experiments are separated into two main categories: non-UAV and UAV experiments. Concerning the non-UAV experiments, the maximum horizontal error is at a level of 6 cm and the minimum error at a level of 2 cm for the indoor environment, while in the outdoor environment the maximum horizontal error is less than 9 cm and the minimum error less than 2 cm. It is observed that in non-UAV experiments the horizontal error is linearly correlated with the distance between the target markers and the origin marker, increasing in proportion to that distance (Table 12).

Table 12 Distance (from the origin to target marker) and horizontal error in non-UAV experiments

It is worth mentioning that the outdoor non-UAV experiment achieves higher horizontal accuracy than the indoor experiments. For instance, as shown in Table 12, the horizontal error in target 2 of the outdoor pi-shaped experiment is 3.42 cm at a distance of 9.26 m from the origin, while in the indoor experiments the error in target 2 is at a level of 5.50 cm at a distance of 7 m. Moreover, the error in target 3 of the outdoor experiment is 8.81 cm at a distance of 17.33 m, only 3 cm higher than the corresponding errors of the indoor experiments (5.99, 5.33 cm), which have a 10 m shorter distance (7.22 m). This difference in horizontal error between indoor and outdoor experiments is due to the refined information that the sensors acquire in outdoor environments because of natural illumination, while the geometry of the outdoor scene, such as buildings and trees, helps the methodology provide more accurate results for target markers located close to features like a building.

Regarding the vertical error, the distance does not seem to affect the results in general, as presented in Table 13 below.

Table 13 Distance (from the origin to target marker) and vertical error in non-UAV experiments

The main reason for the decreased vertical accuracy in non-UAV experiments is the variable height of the camera, caused by the natural vertical motion of the human body while walking and by the different altitude levels of the ground. Thus, in indoor experiments, where the only vertical motion of the camera is due to walking, the maximum vertical error is less than 9.50 cm and the minimum vertical error is less than 1 cm. On the other hand, the combination of these two factors in the outdoor environment can significantly decrease the vertical accuracy. As observed in Table 13, while the vertical error of targets 1 and 2 is less than 8 cm, the error of target 3 reaches a level of 25 cm. The overall vertical accuracy can be refined if the camera height is stable, as proved in the UAV experiments. A gimbal system for the terrestrial, non-UAV measurements could also further compensate possible vertical errors.

In the UAV experiments, the stereo camera is attached to the custom-made UAV and interfaces with the embedded computing system, while the flight height is about 3 m. The first UAV experiment was performed on a straight path trajectory and achieved a horizontal error of less than 7.50 cm and a vertical error of less than 1.5 cm. Even though this first experiment made evident that the horizontal accuracy is quite promising and the vertical error seems decreased compared with the non-UAV methodology, it was difficult to evaluate the UAV-enabled methodology, because the wind during the flight and the drifts of the UAV from the trajectory path had a significant impact on the results. Thus, a square path experiment was conducted in which the UAV performed 50 flights over several days in sunny weather and low wind speed conditions, in order to evaluate the UAV-enabled accuracy statistically. The square shape of the path was chosen for its simplicity, in order to ease the interpretation of the results and to simulate a real-world survey area.

As presented in Table 14, the horizontal error in all targets is less than 12 cm, while all RMSE errors in the X and Y axes are less than 9 cm (Tables 6, 7 and 8). The standard deviation in the X axis varies in a range of 1.50 to 3 cm, while in the Y axis it reaches a level of 5 cm (Table 15). This variability in horizontal error is highly correlated with the UAV drifts from the trajectory path during the flight.

Table 14 RMSExy and RMSEz per target
Table 15 Standard deviation for X, Y and Z axes per target

Most of the flights in this experiment preserved a stable and accurate square trajectory path, which produced refined results, as observed in Tables 16, 17 and Fig. 12. The horizontal error in target 1 and target 2 is at a level of 8 cm, with a standard deviation of less than 3.10 cm in the X axis and less than 2.90 cm in the Y axis, while in target 3 the error is at a level of 11 cm, with a standard deviation of 1.63 cm in X and 2.63 cm in Y.

Table 16 RMSExy and RMSEz per target for the flights without UAV drifts
Table 17 Standard deviation for X, Y and Z axes per target for the flights without UAV drifts
Fig. 12
figure 12

Horizontal error in terms of RMSE of flights with drift and flights with no drift from trajectory path

Figure 12 visualizes the impact of the UAV drifts on the results. In target 3 the accuracy is about 1 cm higher, while in targets 1 and 2 the error decreases by 2 and 3 cm respectively in the results without drift. The figure, together with Tables 14 and 16, shows that the weather conditions, and more specifically the wind speed, which is the most important factor for a stable flight, are quite important for the effectiveness of the methodology. It is worth mentioning that the horizontal errors are very similar to those of the non-UAV experiments, which proves that the flight height of 3-4 m has no impact on the results.

On the other hand, the vertical error in terms of RMSE derived from all the recordings (with and without drift) is less than 2.20 cm in target 1 with a standard deviation of 1.23 cm, while for targets 2 and 3 the vertical errors are less than 3.80 cm with standard deviations of 1.05 and 2.20 cm respectively. Compared with the non-UAV experiments, the vertical error is significantly lower, which verifies that a stable camera height is highly recommended for the improvement of vertical accuracy.

Concerning the point cloud that the methodology exports, although it cannot be used for target estimations at specific points, the experiment in Section 3.3 proves that most of the characteristic points measured with the total station were identified, generating a denser point cloud close to the measured points because of the color and depth contrast of features. Thus, the point cloud reconstructs the 3D environment with centimeter-level accuracy, which can be visualized and measured through the visualization module.

To sum up, as the results proved, the proposed methodology can be used as a surveying alternative in outdoor or indoor environments, with or without a UAV. More specifically, two real-case mapping scenarios can be covered: the localization of points in an accessible urban or vegetated area, and the mapping of a rocky or hard-to-walk area. Regarding the urban or vegetated area, the user can walk along the trajectory path with the stereo camera in a steady motion, achieving a 3D error (XY and Z error) at a level of 6 cm at a distance of about 20 m between the origin and target markers. On the other hand, in a hard-to-walk area, the user can place the visual markers at the desired locations and then use a UAV with an attached stereo camera to map the area and estimate the target coordinates, maintaining a steady camera motion. The 3D error, according to the evaluation above, is at a level of 6.11 cm and 7 cm with and without drifts respectively. Thus, with or without the use of a UAV, the methodology covers most real-life cases in terms of environment variability.

For high-performance localization of the targets, as presented in the results section, a certain way of mapping has to be followed in order for the methodology to provide quality results. The camera has to follow a trajectory that begins a few meters before the origin marker and then proceeds, approaching closely and passing all the markers (origin and targets) while maintaining an uncomplicated path. This technique provides a reasonable sequence of frames to the system, aiding the feature extraction process, the camera pose estimations and consequently the marker localization using the M.L.C. and P.A. methods. Sudden, erratic camera movements, or a complicated camera trajectory in which the camera does not approach the markers directly while maintaining a steady direction, will significantly decrease the accuracy.

Moreover, a large distance (about 20 m or above) between the origin and target markers can decrease the accuracy of the methodology, especially when the origin and the target markers are not collinear. However, the error of each target depends only on the location of the origin marker and is not affected by the other markers. For instance, in the UAV-based square path experiment, the errors of target 1, target 2 and target 3 are not accumulated; instead, each of these targets is associated only with the origin marker. Nevertheless, the increased distance between the origin and target markers can affect the accuracy. This limitation of the methodology can be addressed with the integration of a life-long SLAM architecture (as termed by the computer vision community), which constitutes one of the main future goals of the proposed methodology.

Compared with the monocular approach of this methodology (Trigkakis et al. 2020), the stereo approach achieved highly improved results in a more complex experimentation. In Trigkakis et al. (2020) the RMSExy for the target marker is 41 cm and the RMSEz is 6.4 cm, while in the stereo approach the horizontal error reaches a level of 11 cm with about 3 cm of vertical error. The accuracy of the stereo approach is about four times better in terms of RMSExy and two times better in terms of RMSEz, proving that the stereo approach produces more accurate and reliable results.

Conclusions

The present study proposes an alternative mapping methodology which focuses on surveying in indoor and outdoor environments. The main contribution of this study is that it solves the issue of precise localization of visual points at a 10 cm level of accuracy using only a stereo camera, a conventional computing system and a visual marker, while the use of a UAV is also integrated and tested.

In other words, unlike similar computer vision systems, this study focuses on high-accuracy localization using a coordinate system defined in the scene, based on the pose of a physical marker. This makes the methodology directly comparable with the traditional surveying process, in which the measurements are conducted using a geodetic total station and the coordinate system is defined in the scene using the internal geometry of the total station on a reference point. However, while traditional surveying requires significant human effort and quite costly equipment, the proposed methodology requires just a walk along a specific path in the scene and is able to provide the coordinate estimations in a few minutes.

Moreover, the UAV and non-UAV approaches of the methodology provide flexibility to the users in terms of area morphology, type of environment and desired accuracy. For instance, for the mapping of a steep and rocky survey area the UAV approach is recommended, while for an indoor environment the non-UAV use can achieve respectable accuracy. For a surveying project which requires high vertical accuracy, the UAV approach will produce refined results, while for the mapping of a plot, the non-UAV approach can achieve high horizontal accuracy.

Although the present study provides significant advantages compared with traditional surveying, it has limitations that are still under research. Although the mapping process in the field is quite straightforward, it is affected by factors such as the wind speed, the variable height of the camera, the illumination conditions and the distance from the origin, which decrease the accuracy of the system. Also, even in an ideal scenario where the error can be decreased below the level of 10 cm, it is still far from the accuracy of traditional surveying, which achieves an accuracy level of 1-2 cm.

However, knowing the factors that decrease the accuracy sets the basis for future research on the optimization of the methodology. For instance, a smaller and more stable UAV could perform more accurate flights with even higher horizontal accuracy, or a gimbal device could stabilize the camera during data collection. Moreover, optimizing the SLAM algorithm to improve its robustness under different illumination conditions or to maintain its accuracy over larger distances could further increase the effectiveness of this study.

Taking into account all the above constraints, the main future goal of this study is to achieve accuracy at a level of 2-5 cm. This level of accuracy, in combination with the minimal cost of equipment, could change the way mainstream surveying is conducted and add a new, simple and cost-effective surveying technique in terms of scientific methodology and equipment.