1 Introduction

With the development of unmanned driving, real-time localization and navigation for outdoor mobile robots and unmanned vehicles are receiving more and more attention. The common solution for navigation tasks of unmanned vehicles at present relies heavily on high-precision maps and GPS/INS information. However, when GPS data and high-precision maps are not available, SLAM (Simultaneous Localization and Mapping) is considered a good substitute. Visual SLAM and LIDAR SLAM are the two commonly used outdoor SLAM approaches. Since visual SLAM algorithms are greatly influenced by the intensity of light, the optical structure and the texture features of the environment, the robustness of such systems is often challenged, especially in long-term, large-scale applications. In contrast, LIDAR SLAM algorithms match point clouds obtained by LIDAR sensors and thus depend mainly on the geometric information of the environment. Benefiting from wide measurement range, high accuracy and good robustness, LIDAR SLAM has been widely used in unmanned robots and vehicles, and is often considered a key technology for autonomous navigation problems such as "unmanned driving".

Generally speaking, LIDAR SLAM approaches for on-road applications are relatively mature, but there is no satisfactory solution for the complicated, unstructured, multi-scene environments dealt with in this paper, such as ground/watersurface composite environments (harbors, dams, bridges, islands, etc.). Ground/watersurface composite environments differ greatly from ground-only environments. First, there are fewer laser points and sparser features on the water because laser rays are barely reflected by the water surface, so approaches depending on traditional geometric features (lines, corners, edges, and so on) or on global/local descriptors (B-SHOT, FPFH, VFH, and so on) may fail. Second, unmanned ships are vulnerable to wind and waves, so their motion is unstable, which may distort the 3D point clouds. Finally, there is a lot of dynamic noise on the water during perception, which may greatly affect the matching results if not processed effectively. For these reasons, traditional LIDAR SLAM methods designed for ground-only environments show poor robustness and performance when applied to complex multi-scene environments with large areas of water. In addition, shortcomings such as matching failures during high-speed motion and a strong dependence on GPS/INS information also make traditional SLAM approaches difficult to apply in practical unmanned navigation systems.

In this paper, the SLAM problem of outdoor robots such as UGVs or USVs in outdoor multi-scene environments is studied. In view of the shortcomings mentioned above, approaches based on traditional geometric features or descriptors extracted from point clouds are not considered here, because in such challenging environments few features can be extracted for most scans, which may make the algorithm unstable and time-consuming, especially in low-overlap, high-speed, highly dynamic scenes. Instead, the Super4PCS algorithm, which extracts only co-planar 4-point sets, is adopted for initial transformation estimation in scan matching for its ease of use, better robustness and real-time performance even in low-overlap multi-scene applications. Moreover, two other classical iterative point-to-point scan matching approaches, ICP (Iterative Closest Point) and NDT (Normal Distributions Transform), are introduced and combined to construct the final comprehensive coarse-to-fine LIDAR SLAM system (S4-SLAM), which achieves a good balance between accuracy (under 1% global drift) and real-time performance (5 Hz localization output, 1 Hz correction and mapping output), so that high accuracy and real-time performance can be maintained in both UGV and USV applications.

The main contributions and the innovations of our method are as follows:

  1.

    Traditional SLAM approaches are mainly designed for and used in ground-only environments. Their performance degrades heavily (sometimes the matching may even fail) when they are used in ground/watersurface composite environments. By introducing the Super4PCS algorithm, in which only four co-planar points are needed for scan matching, our method provides better environmental adaptability to deal with sparse point clouds or feature points, high-speed movements, and high noise levels. Furthermore, several modifications are made to the original Super4PCS algorithm to improve the real-time performance and reduce the mismatch rate, so that it can be applied effectively to complicated multi-scene environments.

  2.

    In this paper the mainstream framework of existing SLAM approaches is adopted and greatly improved. By extending the traditional single localization node into a new two-node localization-correction structure in the front-end, an effective combination of, and a good trade-off among, scan matching algorithms such as Super4PCS, ICP, and NDT is achieved. Specifically, Super4PCS rapidly provides an initial iteration value for ICP to realize a coarse-to-fine positioning process in the localization node at a high frequency, and NDT is sped up by introducing a dynamic voxel map to correct the localization node at a lower frequency. In this way a real-time LIDAR SLAM system with good robustness is constructed;

  3.

    Compared to other SLAM approaches, the proposed S4-SLAM method has a lower dependence on GPS/INS data, and can be applied in GPS-loss or GPS-denied environments. Extensive experiments have been carried out on the KITTI dataset and in campus and harbor environments, and the results show that our method produces high-frequency and high-precision positioning and mapping outputs even when unmanned vehicles/ships move at high speed.

2 Related work

LIDAR sensors are becoming more and more popular in mobile robot and unmanned vehicle applications, and a large number of LIDAR SLAM approaches have been proposed accordingly. In this section, only 6-DoF LIDAR SLAM approaches with 3D point cloud registration/mapping based on scan matching algorithms are discussed. The earliest and best-established algorithm for solving the point cloud matching problem is ICP (Besl and McKay 1992). However, the standard ICP algorithm needs a good initial iteration value, otherwise the final iteration result may fall into a local optimum (Pottmann et al. 2006; Bae 2009). Many researchers have proposed modified versions of the original ICP algorithm. Bouaziz et al. (2013) proposed the Sparse ICP algorithm, which replaces the 2-norm with a sparse norm and achieves better registration results, but it is still a local optimization algorithm and cannot guarantee convergence to the global optimum. Yang et al. (2016) proposed the GO-ICP algorithm, which tries to avoid the local-minima problem and ensures global convergence, but it requires a larger overlap between the two point clouds. A widely used and effective global solution is to roughly align two point clouds manually or with some pre-processing technique to get a coarse initial pose estimation and then use ICP iterations to achieve fine registration, but this coarse-to-fine method is generally more complex and difficult to apply in real time. Biber and Straßer (2003) proposed the NDT (Normal Distributions Transform) algorithm based on a completely probabilistic model description, and Magnusson (2009) generalized the NDT method to 3D, but its performance is sensitive to the selection of the grid-cell size. Irani and Raghavan (1999) studied RANSAC-style randomized matching algorithms, in which three pairs of corresponding points are randomly selected from two point clouds and the rigid transformation between the two clouds is calculated iteratively.
However, this approach is still not applicable to large-scale point clouds. Aiger et al. (2008) proposed the 4PCS algorithm as a variant of RANSAC: a coplanar 4-point set is extracted from the two point clouds, and the rigid transformation with the minimum registration error is taken as the final registration result, which makes 4PCS an excellent global registration algorithm. Furthermore, Mellado et al. (2014) proposed the Super4PCS algorithm, which improves on 4PCS: it greatly reduces the complexity of the original 4PCS and can handle larger displacements and lower overlap between point clouds. The experimental results of this paper also show that the Super4PCS algorithm is robust enough to maintain its performance in ground/watersurface multi-scene applications with fewer points and much sparser features. Super4PCS can be used to find a coarse approximation of the global optimum, which is then refined by a local algorithm (e.g., ICP or NDT).

ICP-based LIDAR SLAM approaches can be used effectively when the vehicle moves slowly or the scanning rate is set high. In this case, two adjacent frames of point clouds have a large overlap and can be registered and aligned well. Almeida and Santos (2013) use the ICP algorithm to match two adjacent frames and obtain the corresponding position transformations, but their method may fall into a local optimum when the overlap between frames is small; besides, it can hardly meet real-time requirements. Viejo and Cazorla (2014) extract scene vertices and plane features from point clouds to complete matching and positioning. Moosmann and Stiller (2011) proposed a 3D LIDAR SLAM approach that accounts for the motion distortion caused by the rotation of the LIDAR and uses an interpolation strategy to remove it; a 3D grid map with small surfaces is then constructed, and the constraint relationship between points and surfaces is used to calculate the transformation matrix. Ceriani et al. (2015) construct a sparse grid map and use a generalized ICP method to accomplish the positioning task. Dubé et al. (2018) proposed a 3D LIDAR SLAM framework based on segment matching, in which the region-growing idea is adopted to segment a point cloud and track each segment after segmentation; the transformation matrix is then obtained by verifying geometric consistency. Furthermore, Dubé et al. (2017) added a point-cloud-based loop detection method to their SLAM algorithm. Their method extracts multiple features of each segment and uses a random forest to complete segment classification and matching; the geometric consistency of the segments is verified to detect a true loop.
Zhang and Singh (2014) put forward a real-time LIDAR odometry method (LOAM), which estimates the velocity of the LIDAR and continuously corrects the LIDAR pose by matching edge and planar feature points extracted from the scanned data. Zhang and Singh (2015) put forward a method combining visual odometry and LIDAR odometry (V-LOAM). Their experiments proved that this method is insensitive to the surrounding light conditions and shows satisfactory accuracy and robustness even when the robot moves at high speed. Shan and Englot (2018) proposed LeGO-LOAM, which improves the LOAM algorithm by extracting flat features on the ground and sharp features off the ground and then matching the features by a two-step L-M optimization; however, this algorithm needs to assume that the ground exists. Koide et al. (2018) proposed a real-time SLAM algorithm that constructs ground constraints by extracting the ground plane and uses graph optimization to build the map, but it also needs to assume that the ground exists and cannot operate in non-ground environments. Deschaud (2018) proposed a SLAM method based on an Implicit Moving Least Squares surface representation (IMLS-SLAM). The method extracts the sampling points with the strongest observability with respect to the pose parameters, and then uses the constraint relationship between the sample points and the implicit surfaces to complete the scan-to-model registration; it achieves precision similar to LOAM, but cannot run in real time. Ji et al. (2018) propose the CPFG-SLAM algorithm, which combines NDT and ICP to perform better than LOAM in open fields with few feature points; however, its computational complexity is high and it cannot run in real time either.
Behley and Stachniss (2018) use a surfel map for 3D LIDAR SLAM in large outdoor scenes, mapping the point clouds obtained by the rotating laser onto 2D planes in order to construct the surfel map, but the accuracy and real-time performance are still lower than those of LOAM.

Most of the LIDAR SLAM approaches mentioned above are designed for ground environments with rich features, and either depend strongly on ground points or need to treat the ground as an important constraint, which limits their performance and robustness in more challenging multi-scene environments. So far, few studies on LIDAR SLAM in ground/watersurface composite environments have been reported, and most existing algorithms find it difficult to achieve good results in this kind of environment. As far as the authors know, this paper is the first to carry out LIDAR SLAM research and experimental verification on both unmanned ground vehicles and unmanned ships. Three existing mainstream matching methods (Super4PCS, ICP, NDT) are fused to improve the standard SLAM framework, and a high-precision SLAM system with good robustness and real-time performance is constructed for complex multi-scene environments.

3 System overview

Figure 1 shows the diagram of the proposed S4-SLAM system. Our SLAM system is composed of two parts: the odometry as the front-end and loop closure optimization as the back-end. In the odometry, the classic SLAM framework is improved and extended from the traditional single localization node into a new two-node structure comprising a localization node and a correction node, where the localization node outputs location information at a high frequency and the correction node corrects the localization node at a lower frequency.

Fig. 1

Block diagram of the S4-SLAM system

Considering the trade-off between real-time performance and localization accuracy, a coarse-to-fine scan matching method combining the Super4PCS algorithm with the ICP algorithm is used in the localization node. In view of the shortcomings of the Super4PCS algorithm, a filter is proposed to roughly remove the ground and dynamic objects, and a heuristic iterative strategy based on overlap-rate evaluation is introduced to select the four coplanar points. These improvements accelerate the matching of the original Super4PCS algorithm. After that, the standard ICP method is used for fine matching. Since the coarse matching step provides a good initial value for ICP, an accurate pose estimation between frames can be calculated quickly with few ICP iterations after down-sampling. Moreover, because motion drift and noise are inevitable when matching two adjacent frames, the odometry must be corrected periodically to ensure localization accuracy. In the correction node, a fixed-size dynamic voxel-grid local sub-map is built and the odometry is corrected by matching the current key-frame with the sub-map. In the loop closure function, our system introduces a location-based loop detection approach and verifies loops with the overlap rate to reduce the probability of false positives. The trajectory output node integrates the pose messages of the odometry and loop closure functions, and finally outputs the pose of the vehicle.

4 Data preprocessing

LIDAR devices acquire laser points by rotating laser rays. During scanning, the point cloud acquired within one scan is distorted by the motion of the vehicle, so a data preprocessing step should be carried out to eliminate this motion distortion before registration.

In this paper, coordinate frames and symbols are defined as follows. The LIDAR coordinate system is expressed as {L}, with the origin located at the geometric center of the LIDAR, the x-axis pointing forward, the y-axis pointing left, and the z-axis pointing upward. The world coordinate system {W} is set to coincide with {L} at the initial position for convenience. Let k denote the k-th scan of the LIDAR, Pk the point cloud obtained from the k-th scan, \( \tilde{P}_{k} \) the point cloud obtained after removing the ground and the dynamic objects from Pk, and \( \hat{P}_{k} \) the point cloud obtained after downsampling \( \tilde{P}_{k} \). A point \( i \in P_{k} \) is denoted as \( X_{(k,i)}^{L} \) in {L} and \( X_{(k,i)}^{W} \) in {W}, where the uppercase superscript indicates the coordinate system.

Let the start time of the k-th scan be tk and the end time be tk+1; the motion of the vehicle during [tk, tk+1] is expressed as:

$$ \Delta \varvec{s}_{(k,k + 1)} = [\Delta \theta_{x} ,\Delta \theta_{y} ,\Delta \theta_{z} ,\Delta p_{x} ,\Delta p_{y} ,\Delta p_{z} ]^{\text{T}} , $$
(1)

where \( \Delta \theta_{x} ,\Delta \theta_{y} ,\Delta \theta_{z} \) are the rotation angles around the x-, y-, and z-axes, and \( \Delta p_{x} ,\Delta p_{y} ,\Delta p_{z} \) are the displacements along the x-, y-, and z-axes, respectively. For a laser point \( \tilde{X}_{(k,i)}^{L} \) acquired at \( t_{i}^{k} \in \left[ {t_{k} ,t_{k + 1} } \right] \), a linear interpolation strategy can be used to eliminate the motion distortion: the motion of the vehicle relative to the initial time \( t_{k} \) is \( c \cdot \Delta \varvec{s}_{(k,k + 1)} \), where \( c = (t_{i}^{k} - t_{k} )/t_{\text{scan}} \) and tscan is the period of one scan. However, it is difficult to measure the time interval \( t_{i}^{k} - t_{k} \) accurately in real-life applications, so the scanning angle of the laser ray is used instead to calculate c as \( c = (\alpha_{i}^{k} - \alpha_{k} )/360 \), where \( \alpha_{i}^{k} \) and \( \alpha_{k} \) are the horizontal rotation angles of the laser ray at the current and the initial time, respectively. Then \( \tilde{X}_{(k,i)}^{L} \) is projected to the coordinate system of the initial time:

$$ X_{(k,i)}^{L} = R_{(k,i)}^{ - 1} (\tilde{X}_{(k,i)}^{L} - T_{(k,i)} ) , $$
(2)

where \( T_{(k,i)} = c \cdot [\Delta p_{x} ,\Delta p_{y} ,\Delta p_{z} ]^{\rm T} \) represents the translation vector and \( R_{(k,i)} = R_{z(k,i)} R_{y(k,i)} R_{x(k,i)} \) represents the rotation matrix, and \( X_{(k,i)}^{L} \) is the point after the motion distortion is eliminated.

If the vehicle carries an inertial navigation system such as an IMU, \( \Delta \varvec{s}_{(k,k + 1)} \) can be obtained directly from the device. Otherwise, a constant-velocity motion model can be used as an approximation, assuming that the motion of the current frame equals that of the previous frame: \( \Delta \varvec{s}_{(k,k + 1)} = \Delta \varvec{s}_{(k - 1,k)} \). Through the above process, the motion distortion is eliminated.
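As a concrete sketch (our illustration, not the authors' implementation), the per-point undistortion of Eq. (2) with the angle-based interpolation factor c can be written as:

```python
import numpy as np

def euler_to_rotation(rx, ry, rz):
    """R = Rz * Ry * Rx for rotations about the x-, y- and z-axes (radians)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def undistort_point(point, alpha_i, alpha_0, delta_s):
    """Project a distorted point back to the scan's start frame (Eq. 2).

    point:   measured point (3,) in the LIDAR frame {L}
    alpha_i: horizontal rotation angle of the ray (degrees)
    alpha_0: rotation angle at the start of the scan (degrees)
    delta_s: per-scan motion [d_theta_x, d_theta_y, d_theta_z, dp_x, dp_y, dp_z]
    """
    c = ((alpha_i - alpha_0) % 360.0) / 360.0      # interpolation factor
    R = euler_to_rotation(*(c * delta_s[:3]))      # R_(k,i)
    T = c * delta_s[3:]                            # T_(k,i)
    return np.linalg.inv(R) @ (point - T)
```

For example, halfway through the scan (c = 0.5), a point measured at [2, 0, 0] under a pure forward motion of 1 m per scan maps back to [1.5, 0, 0].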

5 LIDAR odometry

The LIDAR odometry is improved and extended from the traditional single localization node into a new two-node structure, that is, it consists of a localization node and a correction node. The localization node realizes a coarse-to-fine matching process by combining Super4PCS with ICP to output pose messages at a high frequency. The correction node uses the NDT method to match each key-frame with a dynamic local sub-map and corrects the localization node at a lower frequency.

5.1 The localization node

5.1.1 Review of Super4PCS

The standard 4PCS algorithm solves the global 3D registration problem by using co-planar sets of four points. Figure 2 shows the principle of the algorithm. By elementary geometry, the cross ratios of line segments remain invariant under affine transformations and rigid motions. Suppose four points in a set \( B = \{ a,b,c,d\} \) selected from a point cloud P are co-planar, the line segments ab and cd intersect at point e, and the cross ratios of the two line segments are \( r_{1} = \left\| {a - e} \right\|/d_{1} \) and \( r_{2} = \left\| {c - e} \right\|/d_{2} \), where d1 and d2 are the lengths of the two segments: \( d_{1} = \left\| {a - b} \right\| \), \( d_{2} = \left\| {c - d} \right\| \). According to the invariance of the cross ratios r1 and r2, all corresponding 4-point sets \( U = \{ U_{1} ,U_{2} , \ldots ,U_{\text{v}} \} \) can be found in the point cloud Q. For each \( U_{i} ,i = 1,2, \ldots ,{\text{v}} \), the optimal rigid transformation matrix between B and \( U_{i} \) is computed, and the solution with the largest overlap between P and Q determines the true corresponding 4-point set \( B' = \{ a',b',c',d'\} \).

Fig. 2

The cross ratios of line segments remain invariant under affine transformations and rigid motion
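The invariance illustrated in Fig. 2 is easy to verify numerically. The sketch below (our illustrative code, not from the paper) computes r1 and r2 for a coplanar base before and after an arbitrary rigid motion:

```python
import numpy as np

def line_intersection(a, b, c, d):
    """Least-squares intersection point e of lines a+t(b-a) and c+s(d-c)."""
    A = np.stack([b - a, -(d - c)], axis=1)
    ts, *_ = np.linalg.lstsq(A, c - a, rcond=None)
    return a + ts[0] * (b - a)

def cross_ratios(a, b, c, d):
    """r1 = |a-e|/|a-b|, r2 = |c-e|/|c-d| for the coplanar base {a,b,c,d}."""
    e = line_intersection(a, b, c, d)
    return (np.linalg.norm(a - e) / np.linalg.norm(a - b),
            np.linalg.norm(c - e) / np.linalg.norm(c - d))

# Coplanar base whose segments ab and cd intersect at e = (1, 0, 0)
a, b = np.array([0., 0, 0]), np.array([4., 0, 0])
c, d = np.array([1., -1, 0]), np.array([1., 3, 0])
r1, r2 = cross_ratios(a, b, c, d)            # both 0.25

# Apply an arbitrary rigid motion: rotation about z plus a translation
t = np.array([1., 2, 3])
th = 0.6
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1]])
r1p, r2p = cross_ratios(*(R @ p + t for p in (a, b, c, d)))
assert np.isclose(r1, r1p) and np.isclose(r2, r2p)   # ratios are preserved
```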

In view of the slow registration speed of the 4PCS algorithm, Mellado et al. proposed the Super4PCS algorithm, which uses smart indexing to reduce the computational complexity of 4PCS significantly. The time complexity of the 4PCS algorithm is \( O(n^{2} + k) \), where n is the number of points in the point cloud Q and k is the number of candidate congruent point sets. Super4PCS makes two improvements over 4PCS: (1) the distance-based search for matching point pairs is accelerated by rasterizing the point clouds; (2) a smart index is proposed to filter out all redundant point pairs. Through these two improvements, the time complexity is reduced to the linear \( O(n + k_{1} + k_{2} ) \), where n is the number of points in the target point cloud Q, k1 is the number of point pairs with distance d, and k2 is the number of congruent sets obtained after filtering.

However, the original Super4PCS algorithm still has the following disadvantages when used for point cloud matching:

  1.

    Firstly, 4PCS and Super4PCS may find wrong corresponding point pairs in point clouds containing large planes, which results in matching failures. As shown in Fig. 3, the four points a, b, c, d on the plane S1 are co-planar, and many corresponding co-planar point sets such as {a1, b1, c1, d1}, {a2, b2, c2, d2}, … can be found on the plane S2. In fact, such wrong point sets exist in large numbers because all points in a plane are co-planar, and finding the correct set while excluding the wrong ones is very time-consuming. One feasible solution is to filter out these large planes from the point cloud before matching to avoid unnecessary calculation and improve the efficiency of the algorithm;

    Fig. 3

    It is possible to find wrong point sets in a point cloud with large planes: {a1, b1, c1, d1} is the correct point set corresponding to {a, b, c, d}, and {a2, b2, c2, d2} is a wrong candidate point set

  2.

    Secondly, dynamic objects in the environment also affect the registration. If one point of the co-planar 4-point set B lies on a dynamic object, it will be difficult to find a 4-point set in point cloud Q that is truly congruent with B, and a lot of time will be wasted verifying wrong point sets. Therefore, dynamic objects should be removed from the point clouds before registration to improve its efficiency and accuracy;

  3.

    Finally, the Super4PCS algorithm may not find the correct corresponding point set if a selected point is not in the overlapping area. The algorithm first randomly selects three points in the point cloud Q, and then selects a fourth point such that the four points are co-planar and the distances between them are as large as possible. When the overlap rate of the two point clouds is high, this strategy can find the best match quickly; when the overlap rate is low, it may fail.

5.1.2 Coarse matching based on improved Super4PCS

In view of the above-mentioned disadvantages of the original Super4PCS algorithm, several improvements are made in this paper to enhance the performance of this registration approach. For disadvantage (1), a ground filter based on elevation differences is proposed to filter out ground points, avoiding a large number of wrong matches and meanwhile improving the real-time performance by reducing the number of potential matching pairs. For disadvantage (2), small dynamic objects are clustered and filtered out using a 2D binary-image projection of the grid map, which speeds up the clustering process. For disadvantage (3), a heuristic iterative strategy based on the overlap rate between the two point sets is proposed to optimize the selection of the co-planar four points, reducing the number of candidate points and the noise interference; this is especially useful when the overlap rate is low. All these strategies are integrated to improve the performance of the original Super4PCS by carefully balancing registration accuracy and real-time performance. The details of these improvements are given as follows.

As seen in real-life experiments, points from flat ground usually form the largest plane in the point cloud, which may lead to wrong matches, so ground points should be filtered out first to improve the matching efficiency. Considering real-time performance, the concept of elevation differences is used to roughly filter out ground points without accurately extracting the ground and obstacles (Kammel et al. 2008). As shown in Fig. 4, a local grid map is built with its origin at the geometric center of the LIDAR. The size of each cell is set to 30 cm × 30 cm, and the extent of the map matches the scanning range of our LIDAR. Each laser point \( X_{(k,i)}^{L} \) is mapped onto a single grid cell c(m,n), and all points in c(m,n) constitute the point set Pmn. For Pmn, the maximum elevation difference ∆hmn is computed; if ∆hmn is less than a pre-determined threshold hthr, the cell c(m,n) is considered a ground cell. After traversing all cells and removing the detected ground cells, the point cloud \( \tilde{P}_{k} \) without ground points is obtained. Figure 5a shows the point cloud before ground filtering, and Fig. 5b the point cloud after ground filtering.
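A minimal version of this elevation-difference filter could look as follows (the grid size and threshold values here are illustrative, not the paper's tuned parameters):

```python
import numpy as np

def remove_ground(points, cell=0.3, h_thr=0.15):
    """Drop points in grid cells whose elevation spread is below h_thr.

    points: (N, 3) array in the LIDAR frame {L}; cell: grid size in metres.
    A cell whose max z minus min z is small is assumed to be flat ground.
    """
    keys = np.floor(points[:, :2] / cell).astype(np.int64)
    cells = {}
    for i, key in enumerate(map(tuple, keys)):
        cells.setdefault(key, []).append(i)
    keep = np.zeros(len(points), dtype=bool)
    for members in cells.values():
        z = points[members, 2]
        if z.max() - z.min() >= h_thr:   # tall content -> obstacle, keep it
            keep[members] = True
    return points[keep]
```

Flat patches (small z-spread within a cell) are removed in a single pass, without any explicit plane fitting, which is what makes the filter cheap enough for real-time use.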

Fig. 4

Set up a local grid map and detect ground points according to the maximum elevation differences

Fig. 5

The process of filtering out ground and dynamic objects: a shows the point cloud before ground filtering; b shows the ground detection results, where red points are detected as ground and will be filtered out, and green points belong to buildings and other objects that should remain; c shows the binary image mapped from the grid map; d shows the image after erosion; e shows the result of the clustering process; f shows the result after filtering out the dynamic objects, where the blue point cloud is a possible dynamic object and the red point cloud is the detected ground

As for the removal of dynamic objects, in this paper small objects are filtered out instead of dynamic objects, based on the observation that dynamic objects in the real world are usually small in size. Moreover, the grid map is mapped to a 2D binary image in order to greatly speed up the clustering process, as shown in Fig. 5c, in which black pixels represent objects above the ground.

Because objects in the real world may be occluded, they may appear discontinuous if clustered directly on the original image. Therefore, an image erosion operator is applied before clustering, as shown in Fig. 5d. After that, a region-labeling algorithm is used to cluster the point cloud in the following steps: (1) traverse the image row by row and column by column; (2) if the current pixel value is 0 (black), check whether the pixels to its left and above are also 0; (3) if so, merge the current pixel into the neighbor's cluster, otherwise assign it a new label. After labeling, all pixels with the same label are considered to belong to the same cluster, and their rectangular bounding boxes can be calculated, as shown in Fig. 5e. Clusters with areas less than a pre-determined threshold sthr are filtered out. Finally, the point cloud after filtering out the ground and dynamic objects is recovered from the grid map, as shown in Fig. 5f, where the green point cloud is the original point cloud, the blue one is the possible dynamic object, and the red one is classified as ground.
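The labeling step can be sketched with a simple 4-connected flood fill on the binary image (a stand-in of our own for the paper's row/column two-pass scheme); components smaller than the area threshold are cleared as candidate small dynamic objects:

```python
import numpy as np
from collections import deque

def label_and_filter(img, min_area=3):
    """4-connected region labelling on a binary image (1 = above-ground object).

    Components with fewer than min_area pixels are cleared; min_area plays the
    role of the area threshold s_thr (the value here is illustrative).
    """
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    out = img.copy()
    for y in range(h):
        for x in range(w):
            if img[y, x] and not seen[y, x]:
                comp, q = [(y, x)], deque([(y, x)])
                seen[y, x] = True
                while q:                      # flood-fill one component
                    cy, cx = q.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            comp.append((ny, nx))
                            q.append((ny, nx))
                if len(comp) < min_area:      # small cluster -> dynamic object
                    for cy, cx in comp:
                        out[cy, cx] = 0
    return out
```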

As for the selection of the co-planar four points, a heuristic iterative strategy based on the concept of the LCP (Largest Common Pointset) and the calculation of the overlap rate is proposed in this paper, which determines the maximum range in which the selected points should lie. Suppose there are two point clouds P and Q; find a subset Pmax of P such that \( \forall p_{i} \in P_{\hbox{max} } \), \( \exists q_{i} \in Q \) with \( \left\| {T(p_{i} ) - q_{i} } \right\| \le \delta \), \( \delta > 0 \), and the overlap ratio calculated by the following formula is maximal (da Silva et al. 2011):

$$ \lambda_{\hbox{max} } = \frac{{\left| {P_{\hbox{max} } } \right|}}{\left| P \right|} . $$
(3)

which is the maximum overlap under the admissible error δ; the operator \( | \cdot | \) denotes the number of points in a point set. The largest overlap ratio λmax corresponds to the maximum range for selecting points.
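Given a candidate transform T, the overlap ratio of Eq. (3) can be evaluated with a nearest-neighbour query. The sketch below uses SciPy's k-d tree, which is an implementation choice of ours rather than necessarily the paper's:

```python
import numpy as np
from scipy.spatial import cKDTree

def overlap_ratio(P, Q, T, delta=0.1):
    """Fraction of points of P that, after the candidate rigid transform T
    (a 4x4 homogeneous matrix), lie within delta of some point of Q (Eq. 3)."""
    Pt = (T[:3, :3] @ P.T).T + T[:3, 3]       # apply T to every point of P
    dists, _ = cKDTree(Q).query(Pt, k=1)      # nearest neighbour in Q
    return float(np.mean(dists <= delta))
```

A transform that truly aligns the two clouds yields a ratio near 1, while a poor candidate yields a ratio near 0, so this quantity can directly rank candidate 4-point congruences.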

The initial overlap rate can be roughly estimated from the motion between two adjacent frames. As shown in Fig. 6, since point clouds are acquired by LIDAR scanning, two adjacent frames can be approximated by two circles, and their overlap rate can be estimated from the overlap area of the two circles. The area sc of the shaded region in Fig. 6 is:

$$ s_{c} = 2r^{2} \arccos \left( {\frac{{l_{12} }}{2r}} \right) - l_{12} \sqrt {r^{2} - \frac{1}{4}l_{12}^{2} } , $$
(4)

where r is the scanning range of the LIDAR and l12 is the distance between the centers of the two circles, which can be obtained from the motion of the vehicle. The estimated initial overlap ratio is then:

$$ \lambda_{\text{init}} = \frac{{s_{c} }}{{\pi r^{2} }} . $$
(5)
Fig. 6

The overlap rate can be roughly estimated by using the intersecting parts of two circles: overlap rate = shaded area/circle area
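Equations (4) and (5) reduce to a short estimate of λinit from the sensor range r and the inter-frame displacement l12:

```python
import numpy as np

def initial_overlap(r, l12):
    """Estimated initial overlap rate of two scans (Eqs. 4-5): intersection
    area of two radius-r circles whose centers are l12 apart, over pi*r^2."""
    if l12 >= 2.0 * r:
        return 0.0                      # the scans no longer overlap at all
    s_c = 2.0 * r**2 * np.arccos(l12 / (2.0 * r)) - l12 * np.sqrt(r**2 - l12**2 / 4.0)
    return s_c / (np.pi * r**2)
```

For instance, with r = 100 m and a 5 Hz update, even 100 km/h motion gives l12 ≈ 5.6 m between frames and an estimated overlap above 0.9.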

The algorithm for computing the overlap ratio λ iteratively is given in Algorithm 1.


In order to reduce the computing time, a threshold \( \lambda_{\text{thr}} \) is set in advance so that the iteration terminates when \( \lambda \ge \lambda_{\text{thr}} \). The maximum translation distance dmax and the maximum rotation angle amax of the vehicle are also set in advance, so that movements larger than these two thresholds are excluded to accelerate the iteration.

5.1.3 Fine matching based on ICP

After coarse matching, ICP is used to accomplish fine matching between the two point clouds. The standard ICP algorithm was first proposed by Besl and McKay (1992) and has complexity O(NP · NQ), where NP and NQ are the numbers of points in the two point clouds. In this paper, an ICP implementation with a 3D k-d tree is used to reduce the time complexity to O(NP log NQ). However, when the point clouds contain a large number of points, the algorithm can still be time-consuming. Therefore, downsampling with voxel grids is carried out before matching, which greatly reduces the number of points while maintaining the shape characteristics of the point cloud. Together with the good initial transformation provided by the coarse matching step, this greatly accelerates ICP. One of the main contributions of this paper is the use of this coarse-to-fine matching scheme in the front-end of our SLAM system, combining several mainstream registration techniques to achieve a good trade-off between accuracy and real-time performance.
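A compact sketch of this fine-matching stage (voxel down-sampling plus point-to-point ICP with a k-d tree; a generic textbook version, not the paper's exact code) is:

```python
import numpy as np
from scipy.spatial import cKDTree

def voxel_downsample(P, voxel=0.2):
    """Keep one centroid per voxel to compress the cloud (voxel size illustrative)."""
    keys = np.floor(P / voxel).astype(np.int64)
    cells = {}
    for p, k in zip(P, map(tuple, keys)):
        cells.setdefault(k, []).append(p)
    return np.array([np.mean(v, axis=0) for v in cells.values()])

def icp(P, Q, T0=np.eye(4), iters=20):
    """Point-to-point ICP refining a coarse estimate T0 (e.g. from Super4PCS).

    Nearest neighbours via a k-d tree give O(N_P log N_Q) per iteration.
    """
    tree = cKDTree(Q)
    T = T0.copy()
    src = (T[:3, :3] @ P.T).T + T[:3, 3]
    for _ in range(iters):
        _, idx = tree.query(src, k=1)
        tgt = Q[idx]
        # Closed-form rigid alignment of the matched pairs (Kabsch / SVD)
        mp, mq = src.mean(0), tgt.mean(0)
        H = (src - mp).T @ (tgt - mq)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mq - R @ mp
        src = (R @ src.T).T + t
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T                      # accumulate the refinement
    return T
```

In practice one would call `icp(voxel_downsample(P), voxel_downsample(Q), T0)` with T0 taken from the Super4PCS coarse match, so that only a few iterations are needed.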

5.1.4 Summary of the localization algorithm

The task of the localization node is to obtain the motion transformation matrix between two adjacent frames of point clouds. The whole iterative procedure is summarized in Algorithm 2. It takes \( P_{k - 1} \) (the last scan), \( P_{k} \) (the current scan), and \( \tilde{P}_{k - 1} \), \( \hat{P}_{k - 1} \), \( \tilde{T}_{k - 1}^{W} \) (the outputs from the last recursion) as inputs. Finally, the algorithm outputs \( \tilde{P}_{k} \), \( \hat{P}_{k} \), \( \tilde{T}_{k}^{W} \) for the next recursion.

Figure 7 shows final matching results of the localization algorithm. Figure 7a shows the matching result of the Super4PCS coarse matching method, and Fig. 7b shows the matching result of the ICP fine matching method.

Fig. 7

The matching result of the localization algorithm. The blue point cloud is the source point cloud, the red point cloud is the target point cloud, and the green point cloud is the resulting point cloud. Coarse matching roughly aligns two frame point clouds, and fine matching aligns the point clouds precisely (Color figure online)

5.2 The correction node

The localization algorithm, which matches two adjacent frames, inevitably produces cumulative errors that grow gradually with time. In view of that, to make the algorithm effective for long-term navigation tasks, a so-called correction node is introduced in this paper to correct the location information periodically with selected key-frames using NDT matching. By doing so, the traditional single localization node in the odometry part of a SLAM system is extended to a new two-node structure consisting of both a localization node and a correction node, which further improves the accuracy of the odometry. Furthermore, a dynamic voxel grid map is proposed to speed up the matching process of the NDT algorithm and make it more suitable for real-time localization applications.

In the correction node, key-frames are selected in a simple yet effective way: one key-frame every five frames. No more sophisticated approach, such as calculating displacements or overlap ratios between frames, is used, for the following reason. Because the localization node is updated at 5 Hz, key-frames in the correction node are selected at roughly 1 Hz. Note that a LIDAR sensor mounted on an unmanned vehicle is often omnidirectional with a range of over 100 meters, so the displacement between two adjacent key-frames is less than 30 meters even if the vehicle travels at a very high speed of 100 km/h. That means key-frames are never far from each other, so registration accuracy can be guaranteed even in real-time applications. For other scenes with different LIDARs or vehicle speeds, a similar key-frame selection strategy can be devised.
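The 30 m bound quoted above follows directly from the key-frame cadence; a one-line check (the function name and parameterization are ours):

```python
def keyframe_displacement(speed_kmh, loc_hz=5.0, every_n=5):
    """Worst-case displacement between adjacent key-frames when
    key-frames are taken every `every_n` localization updates
    running at `loc_hz` Hz, for a vehicle at `speed_kmh` km/h."""
    key_hz = loc_hz / every_n          # key-frame frequency (Hz)
    return (speed_kmh / 3.6) / key_hz  # metres per key-frame interval
```

At 100 km/h this gives about 27.8 m, safely inside the 100 m+ LIDAR range.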

When a key-frame is selected, the correction node uses NDT to match it with the dynamic local sub-map, and the output of the localization node is corrected by directly replacing it with the resulting transformation. Magnusson et al. (2009) showed with a large number of experiments that NDT outperforms ICP in both robustness and accuracy. In addition, the NDT algorithm has good real-time performance and stability, and adapts well to different environments. Therefore, NDT is adopted in the correction node to match the current key-frame with the local sub-map.

The NDT algorithm divides the space occupied by the reference point cloud into a voxel grid. Then the probability density function (PDF) describing each voxel is calculated according to the distribution of the scan points that fall in it. In this paper a dynamic voxel grid map is introduced, which can be updated dynamically with newly arriving points. The map supports dynamic insertion and deletion of points via a HashMap storage structure of voxels. Each voxel contains several attributes including its index, centroid (mean), covariance matrix, points, and number of points. Only voxels containing more than a certain number of points are considered active and used in our correction algorithm. In order to prevent hash collisions, the voxel index I(q) of the voxel in which a point q lies is calculated as:

$$ I(q) = \left\lfloor {t_{x} } \right\rfloor + \left( {\left\lfloor {t_{y} } \right\rfloor \ll h_{\text{bits}} } \right) + \left( {\left\lfloor {t_{z} } \right\rfloor \ll 2h_{\text{bits}} } \right), $$
(6)

where ≪ represents the left-shift operator and \( t = T_{\text{mg}}^{ - 1} \cdot q \cdot r_{\text{es}}^{ - 1} \), where Tmg is the transformation from the current coordinate system to the map coordinate system and res is the grid resolution. The index I(q) is used as the key of the HashMap, and every point is stored in its corresponding bucket, so that only the means and covariance matrices of voxels with newly inserted points need to be recalculated in each iteration. When n points qi (i = 1, 2, …, n) are inserted into a voxel already containing m points, the mean μ of the voxel is updated as:

$$ \mu \leftarrow \left( {m \cdot \mu + \sum\limits_{i = 1}^{n} {q_{i} } } \right) \cdot \frac{1}{n + m},\quad m \leftarrow n + m, $$
(7)

and the covariance matrix is recalculated as:

$$ \varSigma = \frac{1}{m - 1}\sum\limits_{i = 1}^{m} {(q_{i} - \mu )(q_{i} - \mu )^{\rm T} } , $$
(8)
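Eqs. (6)-(8) can be sketched as follows. `H_BITS` is an assumed bit width, since the paper does not specify h_bits, and the `Voxel` class is a simplified stand-in for one HashMap entry of the dynamic grid:

```python
import numpy as np

H_BITS = 20  # assumed bits per axis; the paper leaves h_bits unspecified

def voxel_index(q, T_mg_inv, res):
    """Spatial-hash index of Eq. (6): transform point q into the map
    frame, scale by the grid resolution res, floor each coordinate and
    pack the three integers into one key via left shifts."""
    t = (T_mg_inv @ np.append(q, 1.0))[:3] / res
    ix, iy, iz = np.floor(t).astype(np.int64)
    return int(ix) + (int(iy) << H_BITS) + (int(iz) << (2 * H_BITS))

class Voxel:
    """One HashMap entry: point list, running mean (Eq. 7) and
    covariance recomputed from the stored points (Eq. 8)."""
    def __init__(self):
        self.points, self.mu, self.m = [], np.zeros(3), 0

    def insert(self, new_pts):
        n = len(new_pts)
        # incremental mean update of Eq. (7)
        self.mu = (self.m * self.mu + np.sum(new_pts, axis=0)) / (self.m + n)
        self.m += n
        self.points.extend(list(new_pts))

    def covariance(self):
        # Eq. (8); requires m >= 2 points in the voxel
        d = np.array(self.points) - self.mu
        return d.T @ d / (self.m - 1)
```

Only voxels that received new points between iterations need `insert`/`covariance` re-run, which is what makes the dynamic map cheap to maintain.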

As the vehicle moves, the dynamic grid map deletes voxels outside the radius of the map to reduce storage memory. Moreover, the ground-plane points eliminated as mentioned above are restored to compensate for the accumulated height error of the odometry. The correction algorithm is summarized as Algorithm 3.


In Algorithm 3, the letter j denotes the j-th update of the key-frames, and \( M_{j} \) is the local sub-map after the j-th update. The inputs of the algorithm are the current frame of point cloud \( P_{k} \), the current local sub-map \( M_{j - 1} \), the pose transformation of the last key-frame \( T_{j - 1}^{W} \), and the current output of the localization node \( \tilde{T}_{k}^{W} \). The algorithm collects key-frames at a frequency of 1 Hz, and \( \hat{P}_{j} \) is obtained by down-sampling the current key-frame \( P_{j} \). The current key-frame is then matched with the local sub-map using the NDT algorithm to obtain \( T_{j}^{W} \), with the initial transformation \( \tilde{T}_{k}^{W} \) taken from the localization node. After that, the algorithm updates the local sub-map with \( \hat{P}_{j} \) and maintains a fixed-size local dynamic voxel grid map. Finally, the algorithm outputs \( M_{j} \), \( T_{k}^{W} \) and \( T_{j}^{W} \) for the next iteration.

Figure 8 shows the correction process. Suppose that in the k-th scan we obtain the point cloud Pk (colored in red) and the local map Mj (colored in black). Our goal is to find the transformation \( T_{k}^{W} \) and to complete the matching of Pk and Mj. The matching result is the green part in Fig. 8.

Fig. 8

The correction node uses NDT to match the current key-frame with the local sub-map. The black is the built local map, the red is the source point cloud, and the green is the resulting point cloud (Color figure online)

It should be noted that when a key-frame is selected and corrected by NDT matching, the normal frames between this key-frame and the previous one need not be corrected with linear interpolation techniques, because our algorithm is intended for real-time applications, so only the key-frame is corrected and updated.

6 Loop closure

For loop closure in the back end, a heuristic location-based detection method is introduced in this paper: a 3D k-d tree is used to store historical locations (namely poses), which is then searched for historical locations near the currently estimated location, and a loop candidate is considered detected if the distance between the two locations is less than a threshold rth. It should be noted that although the measures used here are simple (yet effective) and mainly constructed to keep the loop closure node consistent with the proposed two-node localization-correction odometry module, they are more like implementation details (or tricks) than theoretical contributions. Nevertheless, the implementation of the loop closure is of great significance to the integrity and practical application of the proposed S4-SLAM algorithm, and is given in detail as follows.

The process is shown in Fig. 9. Considering real-time performance, the difference of frame indexes between a selected historical location and the currently estimated location must be greater than a threshold nth, which avoids a large number of unnecessary verification processes. Historical locations satisfying the above conditions are marked as candidate loops.
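The candidate-selection rule can be sketched as below. A linear scan replaces the 3D k-d tree for brevity, and the value of n_th is illustrative since the paper does not report it:

```python
import numpy as np

def find_loop_candidates(history, current_xy, k, r_th=20.0, n_th=50):
    """Location-based loop detection: among historical poses, keep
    those within r_th metres of the current estimate whose frame index
    differs from the current index k by more than n_th frames."""
    cands = []
    for idx, xy in enumerate(history):
        near = np.linalg.norm(np.asarray(xy) - current_xy) < r_th
        if k - idx > n_th and near:
            cands.append(idx)
    return cands
```

Each returned index then goes through the Super4PCS overlap verification described below before being accepted as a real loop.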

Fig. 9

Loop detection on the pose graph via a location-based strategy. For example, the current frame index is k, and r is the radius to search for the historical locations. Three frames with index k-i-5, k-i-4, k-i-3 are candidate loops, while the frame with index k-1 should be excluded

Moreover, the overlap rate (namely LCP) is used to verify candidate loops, which improves the accuracy and recall rate of the algorithm and reduces the probability of false positives. The Super4PCS algorithm is used to match the two frames of each candidate loop in turn. When the overlap rate is greater than a threshold λloop_th, the candidate loop is determined to be a real loop. Finally, the ICP algorithm is used to match the loop frames and the iteration terminates.

Our method can still achieve good results in cases of large displacements and low overlap between point clouds, because Super4PCS is a global matching algorithm and the verification process reduces the probability of false positives. It works well when the accumulative drift is less than rth, but the performance depends heavily on the selection of rth: if rth is too large, many unnecessary verifications may be carried out; if it is too small, real loops may be missed. After a lot of experiments, this paper takes rth = 20 m. If GPS data is available, it could be used to assist the above process in a more effective way.

Based on the pose graph, an optimization step is carried out to close the detected loop in a similar way to Latif et al. (2013). Nodes are set to poses of key-frames, and an edge is the motion estimation between two nodes. The optimization problem is transformed into a least squares problem as follows:

$$ \mathop {\hbox{min} }\limits_{\xi } \frac{1}{2}\sum\limits_{(i,j) \in \varepsilon } {e_{ij}^{\rm T} \varSigma_{ij}^{ - 1} e_{ij} } , $$
(9)

where ε is the set of all edges; ξ is the optimization variable (the poses of all nodes); eij is the error function of poses; and \( \varSigma_{ij}^{ - 1} \) is the information matrix.

It should also be noted that the loop closure algorithm only optimizes the key-frames; normal frames between key-frames do not participate in the optimization process. In this paper, linear interpolation is used to obtain the pose information of the normal frames between two adjacent key-frames. The interpolation scaling can be calculated as the ratio of the current pose estimates between these frames, using the outputs of the localization node and the correction node, or simply set to equal values.
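A planar sketch of such an interpolation, simplified by us to (x, y, yaw) since the paper does not fix the exact parameterization:

```python
import math

def lerp_pose(pose_a, pose_b, s):
    """Interpolate a normal-frame pose between two optimized key-frame
    poses. pose = (x, y, yaw); translation is interpolated linearly and
    yaw along the shortest angular path; s in [0, 1] is the
    interpolation scaling described in the text."""
    xa, ya, ta = pose_a
    xb, yb, tb = pose_b
    # wrap the yaw difference into (-pi, pi] so we turn the short way
    dt = math.atan2(math.sin(tb - ta), math.cos(tb - ta))
    return (xa + s * (xb - xa), ya + s * (yb - ya), ta + s * dt)
```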

7 Experimental results

In this section, our proposed S4-SLAM method is tested extensively on the KITTI dataset, as well as in real-life outdoor experiments in a campus and a harbor with a ground mobile robot and an unmanned surface vehicle respectively. All algorithms are implemented in the ROS (Robot Operating System) environment on a PC platform with an Intel(R) Core(TM) i5-3210 3.0 GHz dual-core CPU and 16 GB DDR3 RAM. In the experiments, we use the same criterion as the KITTI server, namely the drift ratio, which is calculated as follows: (1) calculate drifts every 10 frames, and find frames that are 100 m, 200 m, …, 800 m away from the current frame according to the ground truth; (2) find the same-numbered frames in the test dataset, and calculate pose transformations between these frames and the current frame; (3) compare these transformations to the true pose transformations to obtain the translational error \( t_{e} (x_{e} ,y_{e} ,z_{e} ) \). The drift ratio is then calculated as:

$$ {\text{Drift}} = \frac{{\left\| {t_{\text{e}} } \right\|}}{L} \times 100\% , $$
(10)

where L is the displacement distance of the vehicle.
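Eq. (10) in code form, a direct transcription (the function name is ours):

```python
import numpy as np

def drift_ratio(t_est, t_true, L):
    """Drift criterion of Eq. (10): norm of the translational error
    t_e between estimated and ground-truth displacement, expressed as
    a percentage of the travelled distance L (metres)."""
    t_e = np.asarray(t_est) - np.asarray(t_true)
    return np.linalg.norm(t_e) / L * 100.0
```

For instance, a 1 m translational error after 100 m of travel is a drift of 1%.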

7.1 Tests on the KITTI datasets

The KITTI dataset was established by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute at Chicago (Geiger et al. 2012, 2013). It is currently the largest computer vision algorithm evaluation dataset for automatic driving in the world. The data collection platform of the KITTI dataset is assembled with 2 grayscale cameras, 2 color cameras, a Velodyne HDL-64E LIDAR, 4 optical lenses, and a GPS/INS navigation system (see Fig. 10). Our method uses data from the Velodyne LIDAR only. The LIDAR runs at 10 Hz and has a \( 360^{ \circ } \) horizontal field of view, a \( 26.8^{ \circ } \) vertical field of view and a 120 m measuring range. The ground truth used in the experiments is obtained by a high-precision GPS/INS navigation system with drift of less than 10 cm. The odometry benchmark consists of 22 sequences of data containing grayscale images, color images, and laser point cloud data. For the sake of fairness, ground truth is provided only for Sequences 00-10, while Sequences 11-21 without ground truth are used by the website to evaluate algorithms. For all test sequences, the website's evaluation computes translational and rotational errors for all possible subsequences of length (100, …, 800) meters, and the server ranks algorithms according to the evaluation results. The average drift of our algorithm is under 1%.

Fig. 10

The KITTI platform is equipped with a Velodyne HDL-64E LIDAR, two stereo cameras, and a high-precision GPS/INS for ground truth acquisition

7.1.1 Verification for our improved Super4PCS

In experiments, Sequences 00-10 of KITTI dataset are used to test our improved Super4PCS algorithm on both accuracy and real-time performances.

Table 1 shows the accuracy improvement obtained by filtering out ground and dynamic objects. It can be seen that the new algorithm reduces the drift from 1.56% to 0.93%, i.e., an improvement in accuracy of over 40%.

Table 1 Effects of ground and dynamic objects removal

As shown in Table 2, the matching speeds, translation errors and rotation errors of the improved Super4PCS algorithm and the original Super4PCS algorithm are compared on Sequences 01, 03, 05, 07 and 09. The average time consumed by the improved Super4PCS is about 0.1 s, which is over 60% faster than the original Super4PCS algorithm. For the accuracy comparison, following KITTI's suggestion, the translation error and the rotation error are calculated every few frames (an actual driving distance of about 3 m) on each sequence, and the average is finally taken. For each pair of frames Frame(i) and Frame(i + 3), the transformation between the two algorithm results is calculated as T, that is, Frame(i + 3) = T·Frame(i). Setting the true transformation as Tg, we have

$$ T^{ - 1} T_{g} = \left( {\begin{array}{*{20}c} a & {\quad * } & {\quad * } & {\quad dx} \\ * & {\quad b} & {\quad * } & {\quad dy} \\ * & {\quad * } & {\quad c} & {\quad dz} \\ * & {\quad * } & {\quad * } & {\quad 1} \\ \end{array} } \right) $$
(11)
Table 2 Comparison of time costs, translation and rotation errors between the improved Super4pcs and the original Super4pcs

The translation error and the rotation error between two frames are then calculated as

$$ \varepsilon_{\text{translation}} = \sqrt {dx^{2} + dy^{2} + dz^{2} } $$
(12)
$$ \varepsilon_{\text{rotation}} = \cos^{ - 1} \left( {\hbox{max} \left( {\hbox{min} \left( {\frac{a + b + c - 1}{2},1} \right), - 1} \right)} \right) $$
(13)
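Eqs. (11)-(13) can be transcribed directly: the diagonal entries a, b, c of \( T^{-1}T_g \) form the trace of its rotation block, and dx, dy, dz its last column (the function name is ours):

```python
import numpy as np

def pairwise_errors(T, T_g):
    """Eqs. (11)-(13): form E = T^{-1} T_g, read the translation error
    from E's last column (Eq. 12) and the rotation error from the
    trace of its 3x3 rotation block (Eq. 13), clipped for safety."""
    E = np.linalg.inv(T) @ T_g
    trans_err = np.linalg.norm(E[:3, 3])               # sqrt(dx^2+dy^2+dz^2)
    c = (np.trace(E[:3, :3]) - 1.0) / 2.0              # (a+b+c-1)/2
    rot_err = np.arccos(np.clip(c, -1.0, 1.0))
    return trans_err, rot_err
```

If T equals the ground-truth transform, E is the identity and both errors are zero.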

In Table 2, the translation errors of the improved Super4PCS are over 30% smaller than the original errors, while the rotation errors are over 15% smaller. It can be concluded that the improvements to Super4PCS not only speed up the matching process but also improve the precision, enabling the algorithm to run in scenes with high real-time and accuracy requirements.

7.1.2 Positioning and mapping experiments on the KITTI dataset

Table 3 shows the average time consumption of each algorithm (including ground and dynamic object filtering, Super4PCS, ICP, and NDT) used in the S4-SLAM system. The point cloud data in the KITTI dataset were de-skewed with an external odometry, so we do not need to consider the effects of motion distortion. Finally, the system outputs positioning information at a frequency of about 5 Hz and mapping information at a frequency of about 1 Hz. Since the operating frequency of the LIDAR in the KITTI dataset is 10 Hz, our system runs at half the original frequency.

Table 3 Time-consuming results of the algorithms in S4-SLAM system

The KITTI datasets mainly contain three types of environments: “urban” with buildings around, “country” on small roads with vegetation in the scene, and “highway” where roads are wide and the surrounding environment is relatively clean. Among them, the KITTI data acquisition platform moves slowly in urban and country environments and fast in highway environments (with a top speed of 85 km/h). In order to test the performance of S4-SLAM under different environments and conditions, especially with high speed, large displacements and low overlap between point clouds, the above datasets are sampled for testing at intervals of 2, 3 and 4 frames respectively. Table 4 shows the drift and the time cost of the algorithm: the average drifts are 1.03%, 1.31%, and 1.43% respectively, slightly larger than the original result of 0.93% in Table 1. Moreover, the time costs per frame are 233 ms, 245 ms, and 269 ms respectively, which can basically meet the requirements of real-time applications. It can be concluded that S4-SLAM maintains good robustness in cases of high speeds, large displacements and low overlap between point clouds.

Table 4 Time cost and drift of our method

Figure 11 shows the trajectory comparison between our algorithm and the ground truth on Sequences 00, 05, 06, 07, 08 and 09. It can be seen that all trajectories basically coincide with the ground truth, which demonstrates that the S4-SLAM system can meet the crucial requirement of long-distance accurate localization for outdoor vehicles.

Fig. 11

The trajectory comparison between the results of our algorithm and the groundtruth on the KITTI training dataset. The red trajectory is the groundtruth and the green trajectory is our results, we successfully detect the real loops in sequence 00, 05, 06, 07, 09, but fail in sequence 08, because the cumulative drift is greater than the detection radius rth in sequence 08 (Color figure online)

Figure 12 shows mapping results of S4-SLAM with datasets collected from different on-road ground environments. Figure 12a shows the map built in a city block, Fig. 12b the map built on a country road, and Fig. 12c the map built on a highway. From the above results, it can be seen that the established 3D point cloud maps are consistent with the actual environments.

Fig. 12

The mapping results for different scenarios in KITTI datasets, a is in a city block (Sequence 07), b is on a small road with many vegetations (Sequence 09), and c is on a highway (Sequence 01)

The localization and mapping experiments in Figs. 11 and 12 show that the proposed S4-SLAM algorithm has good environmental adaptability and can basically meet the requirements of real-time localization and mapping for outdoor ground vehicles, even at high driving speeds.

Furthermore, our method is also compared with the top-ranked LOAM algorithm on KITTI, as Table 5 shows, in which the results of LOAM are directly adopted from Zhang and Singh (2017). LOAM is widely recognized as the state-of-the-art LIDAR-only outdoor SLAM approach in the KITTI benchmark, which makes it a suitable and representative baseline for comparison. It can be seen that our algorithm outperforms LOAM in some sequences such as 00, 01, 03, 05 and 07. The results show that the performance of S4-SLAM is generally comparable to LOAM, though there are still some differences between them. In structured environments like urban and highway scenes, where buildings produce fewer noises and outliers, S4-SLAM performs even better than LOAM. On the other hand, in country environments with many trees and bushes which produce more noises and outliers, S4-SLAM is somewhat affected and does not perform as well as LOAM. This also shows that a good pre-processing filter to remove noises and outliers is important for S4-SLAM, because fewer mismatches can then be achieved.

Table 5 Comparison of drift between our SLAM method and LOAM

As for real-time performance, the updating frequency of LOAM is 10 Hz, while that of S4-SLAM is 5 Hz. However, though slower than LOAM, S4-SLAM remains effective in more challenging environments, such as ground/watersurface multi-scene applications, as the later experimental results in a harbor show, which indicates that S4-SLAM has better robustness than LOAM.

7.2 Outdoor tests

In order to test the proposed S4-SLAM method in real-life applications, a ground wheeled mobile robot (a UGV) and an unmanned surface vehicle (a USV) are used for experiments in an off-road campus ground environment and a ground/watersurface composite harbor environment respectively, without GPS or IMU information. Both the UGV and USV platforms are equipped with a Hesai Pandar40 LIDAR for acquiring point cloud data for environmental perception. The Hesai Pandar40 is a hybrid solid-state LIDAR sensor with 40 scanning lines, a 10 Hz scanning frequency and a 150 m scanning range.

7.2.1 SEU(Southeast University) campus tests

Firstly, an outdoor off-road ground experiment was carried out on the campus of Southeast University. The UGV platform used in this experiment is shown in Fig. 13a. It is equipped with a high-precision IMU, a differential GPS and a Hesai Pandar40 LIDAR. However, only LIDAR data is used in S4-SLAM, while the differential GPS and IMU data are used to provide the ground truth. Our aim is to test the LIDAR-only S4-SLAM algorithm in GPS-denied environments. Figure 13b shows the traveling trajectory of the robot. The end of the trajectory is set to the start point, so the final trajectory forms a loop. The average moving speed of the mobile robot is about 3 m/s, and the total trajectory length is approximately 0.6 km. Figure 13c shows the actual scene of the SEU campus. There are a great number of structured and unstructured features in the environment such as trees, buildings, pedestrians, vehicles and a big fountain filled with water, which make it a really complicated environment. As shown in Fig. 13d, the drift of the front-end LIDAR odometry without loop closing is 0.89%, while the average drift of S4-SLAM with loop closure is further reduced to 0.48%, which means both the odometry function and the correction function output satisfactory localization results. The loop closure is very effective in improving the localization accuracy, though the real-time performance of the system is slightly reduced. The above results indicate that our S4-SLAM algorithm can meet different levels of requirements in real-life applications: if high real-time performance is crucial, the odometry function alone works very well; otherwise S4-SLAM can provide more accurate outputs. Figure 13e shows the mapping result of S4-SLAM. Buildings, trees, the big fountain and other objects in the scene can be clearly seen in the figure. The established 3D point cloud map is highly consistent with the actual environment.
The SEU campus test demonstrates the effectiveness and practicability of S4-SLAM in complicated ground environments with structured/unstructured terrains, as well as dynamic targets like fast-moving pedestrians and vehicles.

Fig. 13

Positioning and mapping experiments in campus, a is a photo of the mobile robot, b is a photo of our experimental environment, c is the aerial photo of the campus from Google Maps, d is the trajectory comparison between S4-SLAM, the LIDAR odometry and the ground truth: the trajectory of S4-SLAM and the ground truth basically coincide, while the error of the LIDAR odometry is larger and its trajectory does not form a loop, e is the mapping result of the campus; the map built by S4-SLAM is basically consistent with the actual environment

7.3 Tests in harbor

In order to verify the robustness of the proposed method in scenes with sparse feature points, high moving speeds and high dynamic noise, experiments are also carried out in a harbor, i.e., a ground/watersurface composite environment. In the experiments, the “Jinghai No. 3” unmanned ship platform is used, as shown in Fig. 14a. The platform is equipped with a high-precision IMU, a differential GPS, two cameras and a Hesai Pandar40 LIDAR. Again, only LIDAR data is used in S4-SLAM, and the ground truth is obtained by the differential GPS and the high-precision IMU. Figure 14b, c show photos of the harbor. The distance between the unmanned ship and the shore is no more than 30 m. The average speed of the unmanned ship in the experiment is about 10 m/s, and the total length of the trajectory is about 3 km. Figure 14d shows the trajectory comparison for the first test, where the red line is the result of our algorithm, the blue line is the result of LOAM, and the black line is the ground truth. It can be seen that the trajectory of S4-SLAM nearly coincides with the ground truth, with a drift of 0.83%, while the accuracy of LOAM degrades rapidly and produces a large drift. Figure 14e shows the mapping results. The established 3D point cloud map of S4-SLAM is also highly consistent with the actual environment.

Fig. 14

Positioning and mapping experiments in harbor for test 01, a is a photo of the unmanned ship, b is a photo of our experimental environment, c is the aerial photo of the harbor from Google Maps, d is the trajectory comparison between our algorithm, LOAM and ground truth, e is the mapping result of the harbor, the ship in the harbor is clearly visible in the map built by the proposed method

Moreover, other tests are carried out at different times and with different driving trajectories to show the effectiveness of S4-SLAM. Four of them are selected, illustrated in Fig. 15, and analyzed. The global drifts of S4-SLAM and LOAM for these four sequences are compared in Table 6. All parameters of LOAM are re-tuned to produce the best results it can. It can be seen clearly that LOAM's drifts increase with the traveling length, while S4-SLAM tracks the ground truth with better accuracy. By checking the frame-to-frame registration results, it can be found that not only orientation but also translation errors accumulate gradually as the vehicle travels. This may be because the main idea of LOAM is to extract both edge and plane points for scan matching, which makes it work well in ground-only environments. However, in harbor environments the lack of data on the watersurface makes plane features much scarcer, so the algorithm may be unable to register two frames well with only edge points, as can be seen in Fig. 16a. It should also be noted that the high-speed movements and the high noise levels in this challenging environment, the latter mostly arising from diffuse reflection on the watersurface, can lead to bad features or wrong matchings and further weaken the performance of the LOAM algorithm. The registration results of LOAM can be seen in Fig. 16b, which clearly shows the registration error in top view, while Fig. 16c shows a large bending error in front view. LOAM may even fail if its parameters are not tuned properly. In contrast, the proposed S4-SLAM algorithm still performs very well, with better registration results and much smaller drifts, as can be seen in Fig. 16d, e. Similar results can also be observed for other frame-to-frame matchings in the experiments.

Fig. 15

Other four tests in the harbor. a, b, c, d are the results for test 02 to 05. The left figures are the localization results, where the black lines are the ground truth, and the red lines are the resulting trajectories of S4-SLAM. The right figures are the mapping results (Color figure online)

Table 6 Drifts of S4-SLAM vs LOAM for harbor tests
Fig. 16

Comparative analysis of LOAM and S4-SLAM. a shows the feature points extracted by LOAM from one frame; b shows the registration error of LOAM in top view; c shows the registration error of LOAM in front view; d shows the registration result of S4-SLAM in top view; e shows the registration result of S4-SLAM in front view

From the above experimental results it can be concluded that the performance of LOAM depends highly on the type of environment and cannot be proved robust across different kinds of applications, especially in multi-scene environments such as the ground/watersurface composite environments discussed in this paper. It works quite well in scenes with rich feature points and few disturbances or noises, but is less satisfactory in scenes with sparse feature points and high noise levels, which lead to more registration errors and drift. On the other hand, S4-SLAM incorporates several algorithms including Super4PCS, ICP and NDT, creating a coarse-to-fine registration process that keeps the whole framework stable and robust even in ground/watersurface composite environments, with a global drift under 1%. The S4-SLAM method is thus proved to be an effective SLAM solution for more challenging environments. The core idea of S4-SLAM is to integrate existing mainstream registration technologies (such as Super4PCS, ICP and NDT) to solve the registration problem with sparse feature points and high noise levels in the challenging multi-scene environments mentioned above. In addition, improvements are made to the existing SLAM framework to achieve an effective combination and hierarchical processing of the different registration algorithms. The SLAM system constructed in this paper is accurate, robust to noises, and has balanced real-time performance even in complicated multi-scene environments.

8 Conclusion and future work

In this paper, a novel real-time LIDAR SLAM system named S4-SLAM is constructed to solve the problem of long-distance accurate localization and mapping of robots such as UGVs or USVs in ground/watersurface multi-scene applications. The system is composed of a LIDAR odometry function and a loop closure function. In the odometry function, a new correction node is added to the traditional localization node so that pose outputs with different levels of accuracy can be produced at different frequencies, and in the loop closure function, loops are detected to optimize the map. Our system works well on both the KITTI datasets and real-life scenes including a campus and a harbor. It is proved to achieve a satisfactory trade-off between real-time performance and accuracy, and to remain robust even in cases of few feature points, high moving speeds and high dynamic noises.

In our current system, loop detection is a simple process carried out by a heuristic location-based method. Finding a real-time and more stable loop closure algorithm will be the focus of our future work. Other work includes tests in more water areas including lakes, rivers and seasides, and the fusion of GPS and IMU data into the current system in order to build a LIDAR SLAM system with better performance.