1 Introduction

Simultaneous Localization and Mapping (SLAM) is a basic prerequisite for intelligent robots and a necessary capability of driverless vehicles. Although there are many accurate and effective solutions to the SLAM problem, such as methods based on vision [1] or Lidar [2, 3], or methods that integrate their advantages [4], most existing approaches [5, 6], including these, rely on the assumption of a static world, which greatly limits their scope of application. The typical way to estimate Lidar ego-motion is to apply the iterative closest point (ICP) method to the point clouds of adjacent frames [7]. Compared with cameras, Lidar can work even in the dark, and many 3D Lidars capture the details of the environment over a long distance [6]. Therefore, this work focuses on using 3D Lidar to address the SLAM problem in urban environments containing complex outliers.

The main contributions of this paper are as follows:

  • We integrate a robust outlier elimination method based on 4-point congruent sets into a state-of-the-art SLAM framework to further improve the accuracy and robustness of odometry and mapping.

  • We use a coarse-to-fine registration method in place of the traditional single-step registration to further reduce the computation of the registration step and thus the total runtime of the system.

The rest of the paper is organized as follows: we introduce related work in Sect. 10.2, describe the proposed method in Sect. 10.3, compare our method with state-of-the-art methods in Sect. 10.4, and summarize our work and discuss future directions in Sect. 10.5.

2 Related Work

2.1 Point Cloud Registration

The overlap rate of the two point clouds to be registered is a key parameter and is usually inversely proportional to the number of outliers. If two point clouds have a large overlap rate or obvious point correspondences, their registration can be achieved by the iterative closest point (ICP) algorithm [8] and its various improved variants without initialization; otherwise, ICP falls into a local optimum. Go-ICP [9] and Gogma [10] use branch and bound in six-dimensional space to achieve global registration of point clouds without corresponding points. Because the branch and bound method has exponential complexity, these methods are computationally expensive, and they often diverge when the overlap is small [15].

Common coarse registration methods are divided into hypothesis-and-test methods, represented by Random Sample Consensus (RANSAC), and geometric feature-based methods, represented by 4PCS [11]. RANSAC and its improved variants have cubic complexity and are not suitable for large-scale point clouds. Mellado et al. [13] reduced the complexity successively to quadratic and linear by matching congruent four-point sets in the two point clouds. Theiler et al. [12] match sets of key points, which reduces the size of the set to be searched. Mohamad et al. [14] generalize the construction of the coplanar four-point basis, which further improves the efficiency of point cloud registration. Raposo and Barreto [15] use only the topological relationship and normal vectors between two points to construct matching rules, which greatly improves registration efficiency and suits point clouds with lower overlap ratios. Starting from the initial solution of coarse registration, fine registration locally minimizes a nonconvex error function to obtain the exact solution. Shan and Englot [6] suggest searching rotation and translation separately to reduce the registration complexity.

There is a large body of related work, but these methods either cannot fully meet the real-time, robustness, and high-precision requirements of Lidar SLAM or have not been systematically evaluated on Lidar point cloud datasets.

2.2 Lidar Odometry and Mapping

LOAM [2], ranked second in the odometry evaluation of the KITTI vision benchmark [16], has very low drift in short-trajectory scenarios and has greatly promoted the development of the Lidar SLAM field. However, LOAM has no loop closure detection to eliminate the continuous accumulation of errors, so it may produce significant errors over long trajectories. More importantly, LOAM does not consider dynamic objects, resulting in incorrect results in dynamic scenes. LeGo-LOAM [6] improves LOAM by adding loop closure detection to reduce motion drift and by adding clustering and segmentation modules that discard clusters with fewer than 30 points to filter noise. LOL [17] improves odometry accuracy by detecting the geometric similarity between online 3D point clouds and prior offline maps.

3 Proposed System

3.1 Task Description and Definitions

The problem addressed in this work is how to use the point cloud observed by 3D Lidar to estimate its ego-motion and build a global map for the traversed environment.

We define the measured point cloud as P, the Lidar pose as X, and the global map as M, use the superscript to denote the time stamp, and define the k-th scan point cloud as \({\varvec{P}}^{{\varvec{k}}}\), where a point is \(p_{i}^{k}\), that is, \({\varvec{P}}^{{\varvec{k}}} = \left( {p_{1}^{k} ,p_{2}^{k} , \ldots ,p_{n}^{k} } \right)\), \(p_{i}^{k} = \left( {x_{i}^{k} ,y_{i}^{k} ,z_{i}^{k} } \right)\), where n is the size of the point cloud and \(k \in {\mathbb{N}}^{ + }\).

Under the above definitions, the problem can be modeled as follows: given the map \(M^{t - 1}\) and Lidar pose \(X^{t - 1}\) at time t − 1 and the point cloud \({\varvec{P}}^{{\varvec{t}}}\) at time t, update \(X^{t}\) and \(M^{t}\), as shown in (10.1) and (10.2).

$$ X^{t} = {\mathbb{F}}\left( {X^{t - 1} ,{\varvec{P}}^{{\varvec{t}}} } \right) = {\mathbb{F}}\left( {X^{t - 1} ,{\mathbb{W}}\left( {{\varvec{P}}^{{*}{\varvec{t}}} ,W^{t} } \right)} \right) $$
(10.1)
$$ M^{t} = {\mathbb{G}}\left( {M^{t - 1} ,{\varvec{P}}^{{\varvec{t}}} } \right) = {\mathbb{G}}\left( {M^{t - 1} ,{\mathbb{W}}({\varvec{P}}^{{*}{\varvec{t}}} ,W^{t} )} \right) $$
(10.2)

where \(P^{{{*}t}}\) is the description of the real scene in the field of view at time t, \(W^{t} = (w_{1}^{t} ,w_{2}^{t} , \ldots ,w_{i}^{t} )\) collects the interference factors such as sensor noise, abnormal measurement values, and dynamic objects, and \({\mathbb{W}}\) is the unknown noise function that transforms the real scene into the measured point cloud. Next, we define the set of outliers in the point cloud caused by \(W^{t}\) that are harmful to registration as \({\varvec{P}}_{{{\varvec{out}}}}\); the remaining points are \({{\varvec{P}}}_{{\varvec{i}}{\varvec{n}}}\), namely,

$$ {\varvec{P}} = {\varvec{P}}_{{{\varvec{out}}}} \cup {\varvec{P}}_{{{\varvec{in}}}} = {\mathbb{W}}\left( {P^{*} ,W} \right) $$
(10.3)

The task of this paper is to find the optimal functions \({\mathbb{F}}\) and \({\mathbb{G}}\), and the core procedure is to find and eliminate \({{\varvec{P}}}_{{\varvec{o}}{\varvec{u}}{\varvec{t}}}\) in order to optimize them.

3.2 Segmentation

The input of the segmentation module is the original 3D Lidar point cloud \({\varvec{P}} = \left\{ {p_{1} ,p_{2} , \ldots ,p_{n} } \right\}\); the output is m clusters \(\{ P_{{C_{1} }} ,P_{{C_{2} }} , \ldots ,P_{{C_{m} }} \}\) and the corresponding cluster label vector \(L = \left( {l_{1} ,l_{2} , \ldots ,l_{n} } \right)\), where \(l_{i} \in \left[ { - 1,m} \right] \cap {\mathbb{Z}}\); label −1 indicates that the point belongs to the ground, and a positive integer gives the serial number of a non-ground cluster.

The point cloud \({\varvec{P}}^{{\varvec{k}}}\) is first projected onto the circular range image \(P_{U,V} \triangleq \{ p_{{\left( {u,v} \right)}} |u \in \left[ {1,360^\circ /R_{{{\text{hori}}}} } \right] \cap {\mathbb{Z}}\), \(v \in \left[ {1,N_{{{\text{scan}}}} } \right] \cap {\mathbb{Z}}\}\) of size \(N_{{{\text{pixel}}}} = 360^\circ /R_{{{\text{hori}}}} *N_{{{\text{scan}}}}\). The row index \({\text{row}}_{i}\) of \(p_{i}\) is shown in (10.4), and the column index \({\text{col}}_{i}\) is shown in (10.5).

$$ {\text{row}}_{i} = \left( {\tan^{ - 1} \frac{{z_{i} }}{{\sqrt {x_{i}^{2} + y_{i}^{2} } }} + {\text{Pit}}_{{{\text{bottom}}}} } \right)/R_{{{\text{vert}}}} $$
(10.4)
$$ {\text{col}}_{i} = \tan^{ - 1} \frac{{y_{i} }}{{x_{i} }}/R_{{{\text{hori}}}} $$
(10.5)

The value of each image pixel is the distance from \(p_{i}\) to the Lidar, \(r_{i} = \sqrt {x_{i}^{2} + y_{i}^{2} + z_{i}^{2} }\), where \(R_{{{\text{hori}}}}\) is the horizontal resolution of the Lidar, \(R_{{{\text{vert}}}}\) is the vertical resolution of the Lidar, \({\text{Pit}}_{{{\text{bottom}}}}\) is the minimum pitch of the Lidar beams, and \(N_{{{\text{scan}}}}\) is the number of Lidar lines. After the projection, a point \(p_{i}\) in \(P\) can be uniquely identified by its row index and column index, denoted by:

$$ p_{i} \equiv p_{{\left( {u,v} \right)}} = \left( {x_{{\left( {u,v} \right)}} ,y_{{\left( {u,v} \right)}} ,z_{{\left( {u,v} \right)}} } \right) $$
(10.6)
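To make the projection concrete, the following is a minimal Python/NumPy sketch of (10.4)–(10.6). The sensor parameters (n_scan, res_hori, res_vert, pitch_bottom) are illustrative values, not ones specified in this chapter, and shifting the yaw angle by 180° to obtain non-negative column indices is an implementation choice of this sketch.

```python
import numpy as np

def project_to_range_image(points, n_scan=64, res_hori=0.2, res_vert=0.4,
                           pitch_bottom=24.8):
    """Project an (n, 3) point cloud onto a range image following (10.4)-(10.6).
    The resolution and field-of-view values are illustrative, not from the paper."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)                        # pixel value r_i
    pitch = np.degrees(np.arctan2(z, np.sqrt(x**2 + y**2)))
    yaw = np.degrees(np.arctan2(y, x))                     # in (-180, 180]

    row = np.floor((pitch + pitch_bottom) / res_vert).astype(int)   # (10.4)
    col = np.floor((yaw + 180.0) / res_hori).astype(int)            # (10.5), shifted

    n_col = int(360.0 / res_hori)
    image = np.full((n_scan, n_col), np.nan)
    valid = (row >= 0) & (row < n_scan) & (col >= 0) & (col < n_col)
    image[row[valid], col[valid]] = r[valid]               # p_i <-> p_(u,v), (10.6)
    return image, row, col, r
```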

Then the ground is extracted from the range image \(P_{U,V}\). For points \(p_{u,v + 1}\) and \(p_{u,v}\), where \(v \in \left[ {1,\frac{1}{2}N_{{{\text{scan}}}} } \right] \cap {\mathbb{Z}}\), if the pitch between the adjacent rows \({\text{Pit}}_{v,v + 1}\) (as shown in (10.7)) is less than 10°, both points are marked as ground points.

$$ {\text{Pit}}_{v,v + 1} = \tan^{ - 1} \frac{{z_{u,v + 1} - z_{u,v} }}{{\sqrt {(x_{u,v + 1} - x_{u,v} )^{2} + (y_{u,v + 1} - y_{u,v} )^{2} } }} $$
(10.7)
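The following is a hedged sketch of the ground extraction rule in (10.7): for each column, vertically adjacent pixels in the lower half of the image whose inter-row pitch is below 10° are marked as ground. Storing one 3D point per range-image pixel (NaN where empty) in xyz_image is our assumption about the data layout.

```python
import numpy as np

def ground_mask_from_range_image(xyz_image, pitch_thresh_deg=10.0):
    """Mark ground pixels by the pitch angle between vertically adjacent
    pixels, as in (10.7).  xyz_image: (n_scan, n_col, 3), NaN for empty pixels.
    Only the lower half of the rows is tested, as in Sect. 3.2."""
    n_scan, n_col, _ = xyz_image.shape
    ground = np.zeros((n_scan, n_col), dtype=bool)
    for v in range(n_scan // 2):               # lower beams only
        d = xyz_image[v + 1] - xyz_image[v]    # (n_col, 3) differences
        dz = d[:, 2]
        dxy = np.sqrt(d[:, 0]**2 + d[:, 1]**2)
        pitch = np.degrees(np.arctan2(dz, dxy))            # (10.7)
        is_ground = np.abs(pitch) < pitch_thresh_deg       # NaN compares False
        ground[v, is_ground] = True
        ground[v + 1, is_ground] = True
    return ground
```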

Finally, the range-image-based segmentation method [18] is applied to cluster the points into several clusters, and points from the same cluster are assigned a unique label. After this process, for \(p_{i}^{k} \in {\varvec{P}}^{{\varvec{k}}}\), we obtain \(l_{i}\), \({\text{row}}_{i}\), \({\text{col}}_{i}\), and \(r_{i}\), where \(i \in \left[ {1,N_{{{\text{pixel}}}} } \right] \cap {\mathbb{Z}}\). \({\varvec{P}}^{{\varvec{k}}}\) is divided into a ground point set and several non-ground point sets, that is, \({\varvec{P}}^{{\varvec{k}}} = {\varvec{P}}_{{\varvec{g}}} \cup {\varvec{P}}_{{{\varvec{ng}}}}\); it is worth noting that both types of points may contain outliers. Extracting features only from non-ground points reduces time consumption, and label matching and noise filtering improve the accuracy of registration.
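The clustering itself follows [18]; as a rough stand-in, the sketch below labels connected components on the range image with a simple range-difference criterion (max_range_diff) instead of the angle-based criterion of [18], and drops clusters smaller than min_cluster points as noise. Both thresholds are assumed values.

```python
import numpy as np
from collections import deque

def label_clusters(range_image, ground, max_range_diff=0.5, min_cluster=30):
    """Simplified stand-in for the range-image segmentation of [18]:
    4-connected BFS merging neighbouring non-ground pixels whose range
    difference is below max_range_diff.  Returned labels: -1 ground,
    -2 small-cluster noise, 1..m valid clusters."""
    n_scan, n_col = range_image.shape
    labels = np.zeros((n_scan, n_col), dtype=int)
    labels[ground] = -1
    next_label = 1
    for v in range(n_scan):
        for u in range(n_col):
            if labels[v, u] != 0 or np.isnan(range_image[v, u]):
                continue
            queue, members = deque([(v, u)]), [(v, u)]
            labels[v, u] = next_label
            while queue:
                cv, cu = queue.popleft()
                for dv, du in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nv, nu = cv + dv, (cu + du) % n_col    # wrap in azimuth
                    if not 0 <= nv < n_scan or labels[nv, nu] != 0:
                        continue
                    if np.isnan(range_image[nv, nu]):
                        continue
                    if abs(range_image[nv, nu] - range_image[cv, cu]) < max_range_diff:
                        labels[nv, nu] = next_label
                        queue.append((nv, nu))
                        members.append((nv, nu))
            if len(members) < min_cluster:                 # discard small clusters
                for mv, mu in members:
                    labels[mv, mu] = -2
            else:
                next_label += 1
    return labels
```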

3.3 Feature Extraction

To reduce the computation of registration, we extract representative feature points to reduce the number of points to be registered. The feature extraction method is similar to that used in LeGo-LOAM. We first use (10.8) to evaluate the curvature \(c_{i}\) of point \(p_{i}\), defining points whose curvature is greater than a threshold \(c_{{{\text{thre}}}}\) as edge features and points whose curvature is less than \(c_{{{\text{thre}}}}\) as planar features.

$$ c_{i} = \frac{1}{{\left| {\varvec{S}} \right| \cdot \left\| {r_{i} } \right\|}}\left\| {\mathop \sum \limits_{j \in S,j \ne i} \left( {r_{j} - r_{i} } \right)} \right\| $$
(10.8)

where S is a set of consecutive points from the same row of the range image, uniformly distributed on both sides of \(p_{i}\), and |S| is the number of points in S.
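Below is a small sketch of the curvature evaluation in (10.8) over each row of the range image, together with the threshold-based split into edge and planar features. The window size half_window and the threshold c_thre are assumed values; the chapter does not give them.

```python
import numpy as np

def curvature_per_row(range_image, half_window=5):
    """Smoothness/curvature c_i of (10.8), computed per row of the range image
    with |S| = 2 * half_window neighbours around each pixel."""
    n_scan, n_col = range_image.shape
    c = np.full_like(range_image, np.nan)
    S = 2 * half_window
    for v in range(n_scan):
        r = range_image[v]
        for u in range(half_window, n_col - half_window):
            if np.isnan(r[u]):
                continue
            neigh = np.concatenate((r[u - half_window:u],
                                    r[u + 1:u + 1 + half_window]))
            if np.isnan(neigh).any():
                continue
            c[v, u] = np.abs(np.sum(neigh - r[u])) / (S * np.abs(r[u]))  # (10.8)
    return c

def classify_features(c, c_thre=0.1):
    """Edge features: c > c_thre; planar features: c < c_thre (threshold assumed)."""
    return c > c_thre, c < c_thre
```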

3.4 Lidar Odometry

The input of the Lidar odometry module is the feature points of the point cloud; the module estimates the Lidar motion between adjacent scans and eliminates the motion distortion of the point cloud through feature matching. The motion between adjacent scans is estimated by finding the rotation matrix \(\tilde{\user2{R}} \in {\text{SO}}(3)\) and translation vector \(\tilde{t} \in {\mathbb{R}}^{3}\) that minimize the point cloud registration error, as in (10.9).

$$ \tilde{\user2{T}} \triangleq \left( {\tilde{\user2{R}},\tilde{t}} \right) = \arg \min \mathop \sum \limits_{i} \left\| {\tilde{\user2{R}}^{t} p_{i}^{t} + \tilde{t} - q_{i}^{t - 1} } \right\| $$
(10.9)

where \(q_{i}^{t - 1}\) is the corresponding point at t − 1 found by \(p_{i}^{t}\).
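For intuition about (10.9), the sketch below gives the closed-form least-squares (Kabsch/Umeyama) solution for the purely point-to-point case with known correspondences. The actual odometry in LOAM/LeGo-LOAM minimizes point-to-edge and point-to-plane distances with the L-M algorithm, so this is only an illustrative simplification.

```python
import numpy as np

def best_rigid_transform(p, q):
    """Closed-form least-squares solution of the point-to-point variant of
    (10.9): find R, t minimizing sum_i ||R p_i + t - q_i||^2, where p and q
    are (n, 3) arrays of matched points."""
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    H = (p - p_mean).T @ (q - q_mean)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t
```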

LOAM describes \({\mathbb{F}}\) in detail, and LeGo-LOAM optimizes it. For brevity, this chapter focuses on the further optimization built on these methods, which is achieved by matching only the valid feature points in adjacent frames. The optimization process can be defined as:

$$ \min \left\| {\tilde{\user2{T}}^{t} - {\varvec{T}}^{{{*}{\varvec{t}}}} } \right\|\quad {\text{s}}.{\text{t}}.\;p_{i}^{t} \in {\varvec{F}}_{{{\varvec{in}}}}^{{\varvec{t}}} $$
(10.10)

where \({\varvec{T}}^{{*}{\varvec{t}}}\) is the optimal transformation matrix from \({\varvec{P}}^{{\varvec{t}}}\) to \({\varvec{P}}^{{{\varvec{t}} - 1}}\), \({\varvec{R}}^{\user2{*}}\) and \({\text{tr}}^{*}\) are the corresponding rotation matrix and translation vector, and \({\varvec{F}}_{{{\varvec{in}}}}^{{\varvec{t}}}\) is the set of valid feature points at time t, namely \({\varvec{F}}^{{\varvec{t}}} = \left( {{\varvec{F}}_{{{\varvec{out}}}}^{{\varvec{t}}} \cup {\varvec{F}}_{{{\varvec{in}}}}^{{\varvec{t}}} } \right)\). The process of finding the correspondences is shown in Fig. 10.1: the transformation is searched in the planar feature point set and the edge feature point set separately, and the computational effort is reduced by narrowing the candidate range of corresponding points. We focus on two strategies: outlier elimination and weight compensation of feature points.

Fig. 10.1
figure 1

Corresponding relation of feature points at adjacent moments

(1) Outlier elimination: To solve this problem, we use a two-step coarse-to-fine registration strategy: rough registration is first used to detect outliers, and the outliers are then excluded from the Levenberg–Marquardt (L-M) optimization. We can express the outlier feature points \({\varvec{F}}_{{{\varvec{out}}}}\) as:

$$ {\varvec{F}}_{{{\varvec{out}}}}^{{\varvec{t}}} = \left\{ {p_{i}^{t} \,|\,\left\| {{\varvec{R}}^{*} p_{i}^{t} + {\text{tr}}^{*} - q_{i}^{t - 1} } \right\| > \varepsilon \, \wedge \,p_{i}^{t} \in {\varvec{F}}^{{\varvec{t}}} } \right\} $$
(10.11)

where \(q_{i}^{t - 1}\) is the feature point corresponding to \(p_{i}^{t}\). We first define feature descriptor \({\varvec{d}}^{{\varvec{t}}}\) and then use \({\varvec{d}}^{{\varvec{t}}}\) and \({\varvec{d}}^{{{\varvec{t}} - 1}}\) to calculate pose transformation \(\tilde{\user2{T}}_{{{\text{rough}}}}\) as follows:

  1. Finding the first point in the four-point basis: We select the feature point \(f_{{\left( {u_{1} ,v_{1} } \right)}}^{t} \in {\varvec{F}}^{{\varvec{t}}}\) as the first point \(a_{1}^{t}\) in the ith range image, namely \(a_{1}^{t} \triangleq f_{{\left( {u_{1} ,v_{1} } \right)}}^{t}\), where \(u_{1}^{t} \in \left[ {1 + \frac{{360^\circ *\left( {i - 1} \right)}}{{N_{{{\text{sub}}}} *R_{{{\text{hori}}}} }},\frac{360^\circ *i}{{N_{{{\text{sub}}}} *R_{{{\text{hori}}}} }}} \right] \cap {\mathbb{Z}}\) and \(v_{1}^{t} \in \left[ {\frac{{N_{{{\text{scan}}}} }}{4},\frac{{N_{{{\text{scan}}}} }}{2}} \right] \cap [v_{1}^{t - 1} - 1,v_{1}^{t - 1} + 1] \cap {\mathbb{Z}}\).

  2. Finding the second point: We select \(a_{2}^{t} \triangleq f_{{\left( {u_{2} ,v_{2} } \right)}}^{t}\) in the sub-image farthest from \(a_{1}^{t}\), where \(u_{2}^{t} \in \left[ {1 + \frac{{360^\circ *\left( {i - 1 + \frac{{N_{{{\text{div}}}} }}{2}} \right)}}{{N_{{{\text{sub}}}} *R_{{{\text{hori}}}} }},\frac{{360^\circ *\left( {i + \frac{{N_{{{\text{div}}}} }}{2}} \right)}}{{N_{{{\text{sub}}}} *R_{{{\text{hori}}}} }}} \right] \cap {\mathbb{Z}}\), \(f_{{\left( {u_{2} ,v_{2} } \right)}}^{t} \in P_{{\left( {U,V} \right)_{{i + \frac{1}{2}N_{{{\text{sub}}}} }} }}^{t}\); the line segment with \(a_{1}^{t}\) and \(a_{2}^{t}\) as endpoints is \(l_{1}^{t} \triangleq a_{1}^{t} a_{2}^{t}\).

  3. Constructing feature descriptors: We select \(a_{3}^{t}\) and \(a_{4}^{t}\) from the sub-images excluding those of \(a_{1}^{t}\) and \(a_{2}^{t}\) to satisfy the conditions that \(l_{2}^{t} \triangleq a_{3}^{t} a_{4}^{t}\) is coplanar with \(l_{1}^{t}\), \(\left| {l_{2}^{t} } \right| > \frac{{\left| {l_{1}^{t} } \right|}}{4}\), and \(\angle \left( {l_{1}^{t} ,l_{2}^{t} } \right) > \frac{180^\circ }{{N_{{{\text{sub}}}} }}\). The intersection of \(l_{1}^{t}\) and \(l_{2}^{t}\) is \(e^{t}\). We then record the feature descriptor \({\varvec{d}}^{{\varvec{t}}}\), with \({\varvec{d}}^{{\varvec{t}}} \triangleq \left( {d_{1}^{t} ,d_{2}^{t} } \right)\) when the above three conditions hold and \({\varvec{d}}^{{\varvec{t}}} \triangleq \left( {d_{1}^{t} ,d_{3}^{t} } \right)\) otherwise, where \(d_{1}^{t} \triangleq \left( {a_{1}^{t} ,a_{2}^{t} } \right)\), \(d_{2}^{t} \triangleq \left( {a_{3}^{t} ,a_{4}^{t} ,e^{t} ,\angle \left( {l_{1}^{t} ,l_{2}^{t} } \right)} \right)\), and \(d_{3}^{t} \triangleq \left( {c_{{a_{1} }}^{t} ,c_{{a_{2} }}^{t} } \right)\).

  4. Matching feature descriptors: If the conditions in step 3 hold, we use Super4PCS [13] to match \({\varvec{d}}^{{\varvec{t}}}\) and \({\varvec{d}}^{{{\varvec{t}} - 1}}\); otherwise, we use 2PNS [15].

The process of the second, L-M step is similar to that of the first step, but the edge feature points are used instead, and the number of rows of the feature points in steps 1 and 2 is no longer limited (Fig. 10.2).

Fig. 10.2
figure 2

Schematic diagram of four-point base selection
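The rough transform obtained from the descriptor matching is then used to screen feature points before the L-M refinement. The sketch below applies (10.11) under the assumption that the correspondences \(q_i^{t-1}\) have already been found; the threshold eps is an assumed value, not one given in the chapter.

```python
import numpy as np

def split_inlier_outlier_features(F_t, Q_prev, R_rough, t_rough, eps=0.3):
    """Coarse-to-fine step of Sect. 3.4: feature points whose residual under
    the rough transform exceeds eps form F_out (10.11) and are excluded from
    the L-M refinement.  F_t, Q_prev: (n, 3) matched feature points at t and t-1."""
    residual = np.linalg.norm((F_t @ R_rough.T) + t_rough - Q_prev, axis=1)
    outlier_mask = residual > eps              # F_out, Eq. (10.11)
    return F_t[~outlier_mask], F_t[outlier_mask]
```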

(2) Weight compensation of feature points: In the L-M iterations, LeGo-LOAM gives each feature point a weight \(s_{i}\) that is inversely proportional to the distance between the Lidar and the point. The difference is that we also consider the influence of the point cloud holes left in the background by the occlusion of dynamic objects on the stability of the system. Therefore, we reduce the weight of the feature points within a certain range of a dynamic cluster.

$$ \widetilde{{s_{i} }} = s_{i} {*}\left( {1 - \frac{{R_{{{\text{hori}}}} }}{{\left| {\theta_{{{\text{yaw}}}} } \right|}}} \right) $$
(10.12)

where \(\widetilde{{s_{i} }}\) is the new feature weight and \(\theta_{{{\text{yaw}}}}\) is the minimum yaw difference between a feature point and a dynamic object.
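The following is a sketch of the weight compensation in (10.12). Computing theta_yaw by brute force over the yaw angles of the dynamic-cluster points, and clamping it to at least R_hori so that the weights stay non-negative, are our own assumptions.

```python
import numpy as np

def compensate_weights(s, feature_yaw_deg, dynamic_yaw_deg, res_hori=0.2):
    """Down-weight feature points close in yaw to a dynamic cluster, per (10.12).
    s: (n,) original L-M weights; feature_yaw_deg: (n,) yaw of each feature;
    dynamic_yaw_deg: (m,) yaw of dynamic-cluster points, all in degrees."""
    diff = np.abs(feature_yaw_deg[:, None] - dynamic_yaw_deg[None, :])
    diff = np.minimum(diff, 360.0 - diff)           # wrap-around yaw difference
    theta_yaw = diff.min(axis=1)                    # minimum yaw difference
    theta_yaw = np.maximum(theta_yaw, res_hori)     # keep weights non-negative
    return s * (1.0 - res_hori / theta_yaw)         # (10.12)
```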

3.5 Lidar Mapping

Let us review the mapping model \({\mathbb{G}}\) in (10.2), which uses \(M^{t - 1}\) and \({\varvec{P}}^{{\varvec{t}}}\) to build \(M^{t}\); the global map contains a large amount of feature information, and the earlier information contains less error. The Lidar mapping module further optimizes the pose estimation by matching \({\varvec{F}}^{{\varvec{t}}}\) with the submap \(M_{{{\text{sub}}}}^{t - 1}\) around the Lidar; please refer to [20] for the detailed matching and optimization procedures. We use the map storage method of LeGo-LOAM and store only the feature point sets \({\varvec{F}}^{{\varvec{t}}}\).

$$ M^{t - 1} \triangleq \left\{ {\underbrace {{\left\{ {{\varvec{F}}^{1} } \right\},{ }\left\{ {{\varvec{F}}^{2} } \right\}, \ldots ,\left\{ {{\varvec{F}}^{{{\varvec{t}} - {\varvec{k}}2}} } \right\}}}_{{M_{{{\text{hist}}}}^{t - 1} }}, \ldots ,\underbrace {{\left\{ {{\varvec{F}}^{{{\varvec{t}} - {\varvec{k}}}} } \right\}, \ldots ,\left\{ {{\varvec{F}}^{{\varvec{t}}} } \right\}}}_{{M_{{{\text{sub}}}}^{t - 1} }}} \right\} $$
(10.13)

In addition, an optional function is to further eliminate the error of the mapping module by detecting loop closures. In LeGo-LOAM, if corresponding features of \({\varvec{F}}^{{\varvec{t}}}\) can be found by ICP in the earlier map \(M_{{{\text{hist}}}}^{t - 1}\) or in recent features, a new constraint is added. The difference is that we again use the coarse-to-fine registration strategy of the odometry module to replace the single-step ICP.

Then, the estimated pose of the sensor is updated by sending the pose graph to the optimization system (e.g., [19]).
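A minimal sketch of the feature-only map storage of (10.13): the global map is kept as the ordered list of per-scan feature sets, the most recent k sets form the submap M_sub used for scan-to-map matching, and the older sets form M_hist used for loop closure detection. The window size k is an assumed parameter, not a value from the chapter.

```python
class FeatureMap:
    """Feature-only map storage following (10.13)."""

    def __init__(self, k=50):
        self.k = k                          # submap window size (assumed)
        self.history = []                   # ordered feature sets F^1 .. F^t

    def add(self, features_t):
        self.history.append(features_t)     # append F^t after each scan

    def submap(self):
        return self.history[-self.k:]       # M_sub: the k most recent scans

    def hist(self):
        return self.history[:-self.k]       # M_hist: older scans, loop closure
```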

4 Experiments

All experiments were performed on a laptop equipped with a 2.5 GHz Intel i7-4700 processor and 16 GB of RAM.

4.1 Feature Extraction

Figure 10.3 shows a frame of the point cloud and its feature points (in green) in scenario 1, which contains five moving pedestrians (circled in white boxes); Fig. 10.3a shows the incorrect feature extraction results of LeGo-LOAM. It can be seen that many feature points lie on unstable outliers belonging to the pedestrians, while our result (Fig. 10.3b) is more reasonable because we detect outliers and avoid extracting feature points from them, which prevents the incorrect odometry and mapping results caused by wrong feature matching. For conciseness, we show only the edge feature points, which are sufficient to illustrate the problem.

Fig. 10.3
figure 3

Feature extraction results of LeGo-LOAM and ours

The results show that our method has the advantage of effective feature extraction in scenes containing a large proportion of outliers (Table 10.1).

Table 10.1 Comparison of effective features ratio between LeGo-LOAM and ours

4.2 Odometry

Figure 10.4 shows the odometry comparison for scenario 1, where the red curve is the result of LeGO-LOAM and the blue one is ours. The results of the two methods differ little from a global perspective, but there are clear differences in detail. We choose the most representative location (the upper left corner of scenario 1, where pedestrians gather) to show the contrast. As can be seen from the green box, the red line has a distinct jagged edge, which is exactly the wrong odometry estimation caused by the outliers, while the blue line has no such problem.

Fig. 10.4
figure 4

Odometry comparison results in scenario 1 between the two methods, where the green box is enlarged to highlight the differences. Ours is smoother and more accurate than that of LeGo-LOAM

Figure 10.5 shows the comparison between the two methods and the ground truth in scenario 2; we can see from Fig. 10.5 that the accuracy is still maintained in this long-trajectory scene (with a length of about 3730 m). The relative pose estimation error is calculated by comparing the final pose with the initial pose. The rotation and translation errors of LeGo-LOAM are 5.65° and 3.19 m, while ours are 4.52° and 2.75 m. These two groups of experiments illustrate the advantage of our method in odometry estimation.

Fig. 10.5
figure 5

Odometry comparison results in scenario 2 between the two methods. Green box is enlarged to highlight the differences
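The relative pose error reported above can be computed as in the sketch below: the relative motion from the initial pose to the final pose is formed for both the estimate and the ground truth, the rotation error is the angle of the residual rotation, and the translation error is the norm of the residual translation. Representing poses as homogeneous 4 × 4 matrices is our assumption.

```python
import numpy as np

def relative_pose_error(T_est_start, T_est_end, T_gt_start, T_gt_end):
    """Rotation (deg) and translation (m) error between the estimated and
    ground-truth relative motion from the initial pose to the final pose."""
    rel_est = np.linalg.inv(T_est_start) @ T_est_end
    rel_gt = np.linalg.inv(T_gt_start) @ T_gt_end
    err = np.linalg.inv(rel_gt) @ rel_est                    # residual transform
    cos_angle = np.clip((np.trace(err[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = np.degrees(np.arccos(cos_angle))
    trans_err = np.linalg.norm(err[:3, 3])
    return rot_err_deg, trans_err
```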

4.3 Mapping

The results of representative scenarios indicate that LeGo-LOAM produces incorrect mapping results because it does not consider outliers, as shown in Figs. 10.6, 10.7, 10.8, and 10.9.

Fig. 10.6
figure 6

Mapping result of scenario 1. a The result of LeGO-LOAM, in which the shadows left by outliers are clearly visible. b Our result: we obtain a clean map by identifying outliers and avoiding extracting feature points from them and mapping with them

Fig. 10.7
figure 7

Details comparison in scenario 1. a, b, and c are the results of LeGo-LOAM. d, e, and f are the results of our method. The white arrows highlight the differences

Fig. 10.8
figure 8

Mapping result of scenario 2. a The result of LeGO-LOAM. b The result of our method

Fig. 10.9
figure 9

Details comparison in scenario 2. a, b, and c are the results of LeGo-LOAM. d, e, and f are the results of our method. The white arrow highlights the differences

Figure 10.6 shows the mapping result of scenario 1, where Fig. 10.6a is the LeGO-LOAM mapping result. Figure 10.7 shows the details of the mapping result of scenario 1, where (a), (b), and (c) are the results of incorrect mapping caused by incorrect feature extraction: dynamic pedestrians and vehicles participate in feature extraction and association and are thus added to the map, leaving obvious residual shadows. Figure 10.7a is the enlarged result at the top of the map, where the residual shadow is the movement track left by four pedestrians walking forward. Figure 10.7b is a local enlargement on the left of the map, where the residual shadow is the track left by a vehicle moving upward. Figure 10.7c is a local enlargement of the lower right part of the map, where the residual shadow is left by a moving car. Different from LeGO-LOAM, we first detect and eliminate outliers and extract only valid features, so we have an obvious advantage when mapping in dynamic scenarios.

Figure 10.8 shows the mapping result of scenario 2. When we zoom in on the local map near the loop closure, as shown in Fig. 10.9, we can see obvious differences, as indicated by the white arrows: the buildings in the map of LeGO-LOAM are obviously offset, while ours remains globally consistent, which benefits from the two-step coarse-to-fine registration strategy.

4.4 Runtime

The running times of the odometry and mapping modules for our method and LeGo-LOAM are shown in Table 10.2. The time consumption of the odometry module is reduced by about 15%, and that of the mapping module by about 31%. The results demonstrate the advantage of our approach in terms of time consumption, which stems from replacing single-step registration with the two-step coarse-to-fine registration strategy.

Table 10.2 Runtime (ms) comparison of one frame between LeGo-LOAM and ours

5 Conclusions and Future Works

In this chapter, we integrated an outlier elimination method based on 4-point congruent sets and a coarse-to-fine registration strategy into a Lidar SLAM framework, improving the accuracy and robustness of odometry and mapping while reducing runtime. Future work includes further improving the accuracy of the SLAM system in complex and highly dynamic scenes, because we find that the proposed system still performs unsatisfactorily when many dense moving objects surround the Lidar; methods for the robot kidnapping problem may offer solutions to this issue. We will also try to deploy the SLAM system on multiple base stations so that cooperative operation can compensate for the severe occlusion of a single base station by moving objects. In addition, we will further combine the semantic and motion information of the scene, which will also provide a new idea for the construction of semantic maps.