
1 Introduction

Large-scale 3D point clouds and LIDAR (Light Detection And Ranging) techniques have gradually emerged and become ubiquitous in recent years, with LIDAR serving as the main means of large-scale 3D point cloud generation. Acquisition of both indoor and outdoor environments is now well developed and used in many fields such as navigation, architecture and real estate, and it is gaining popularity thanks to the availability of 3D laser scanners and range cameras.

Compared to other modeling techniques, point cloud data obtained by LIDAR and Kinect has irreplaceable merits. First, the data is real and faithful to the scene, in the spirit of "what you see is what you get". Second, large-scale data comprises millions of points or more and carries rich information, such as millimeter-level accuracy. However, the inherent noise makes it difficult to handle interlaced objects such as trees and other plants. Another shortcoming of current methods is that position, color and intensity are rarely combined to generate models: existing algorithms usually process point positions but ignore the true color of each point, which needs further improvement.

To achieve better results from large-scale LIDAR point cloud data, many studies have attempted to establish or improve point cloud processing algorithms. In these methods, the major challenge lies in identifying noise and classifying cluttered scenes. Fortunately, several open-source libraries have emerged for point cloud processing, e.g., the Point Cloud Library (PCL) [1], a mature library for n-D point clouds and 3D geometry processing.

2 Point Cloud Processing

Point cloud processing has matured into a well-established pipeline. We summarize its basic steps in Fig. 1.

Fig. 1. Basic point cloud processing steps

2.1 Filtering

Filtering is usually the first step of point cloud processing; it removes noisy points and outliers, fills holes and compresses the data in order to obtain "clean" data.

Filtering methods have been studied for a long time: [2, 3] applied similar filtering methods to detect targets such as planar terrain surfaces and to classify buildings as well as small elements such as electrical power lines. As a complement, [4] used first-pulse data to improve the results. Instead of classifying points in a local neighborhood, [5] first segmented the point cloud into patches in which all points can be connected through a smooth path of nearby points, and then classified these segments based on their geometric relationships with the surrounding segments.
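
As a concrete illustration, the sketch below shows how such a filtering step might look with PCL [1], combining statistical outlier removal with voxel-grid compression; the file names and all parameter values are illustrative assumptions, not settings from the cited works.

    #include <pcl/io/pcd_io.h>
    #include <pcl/point_types.h>
    #include <pcl/filters/statistical_outlier_removal.h>
    #include <pcl/filters/voxel_grid.h>

    int main()
    {
      pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
      pcl::PointCloud<pcl::PointXYZ>::Ptr denoised(new pcl::PointCloud<pcl::PointXYZ>);
      pcl::PointCloud<pcl::PointXYZ>::Ptr compressed(new pcl::PointCloud<pcl::PointXYZ>);

      // "scan.pcd" is a placeholder name for the raw LIDAR/Kinect scan.
      pcl::io::loadPCDFile<pcl::PointXYZ>("scan.pcd", *cloud);

      // Remove points whose mean distance to their 50 nearest neighbors
      // deviates by more than one standard deviation from the global mean.
      pcl::StatisticalOutlierRemoval<pcl::PointXYZ> sor;
      sor.setInputCloud(cloud);
      sor.setMeanK(50);
      sor.setStddevMulThresh(1.0);
      sor.filter(*denoised);

      // Compress the data with a 1 cm voxel grid (data compression step).
      pcl::VoxelGrid<pcl::PointXYZ> grid;
      grid.setInputCloud(denoised);
      grid.setLeafSize(0.01f, 0.01f, 0.01f);
      grid.filter(*compressed);

      pcl::io::savePCDFile("scan_clean.pcd", *compressed);
      return 0;
    }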

2.2 Feature Estimation

Features serve as a key criterion for judging and describing points. Local features and global features are the two main ways of estimating quantities such as curvature and point normals. For feature estimation, [6] first developed a robust algorithm that extracts surfaces, feature lines and feature junctions from noisy point clouds. A later improvement was [7], which treated feature detection and reconstruction as a joint problem when the input is given as a point cloud.
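
To make the local-feature case concrete, a minimal sketch using PCL's normal estimation is given below; it computes a normal and a curvature value per point from a fixed-radius neighborhood. The helper name and the 3 cm radius are illustrative assumptions.

    #include <pcl/point_types.h>
    #include <pcl/features/normal_estimation.h>
    #include <pcl/search/kdtree.h>

    // Estimate per-point normals and curvature from local neighborhoods.
    pcl::PointCloud<pcl::Normal>::Ptr
    estimateNormals(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud)
    {
      pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
      ne.setInputCloud(cloud);

      // Neighbors are gathered through a kd-tree (see Sect. 2.8).
      pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
      ne.setSearchMethod(tree);

      // 3 cm neighborhood radius (illustrative; depends on scan density).
      ne.setRadiusSearch(0.03);

      pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
      ne.compute(*normals);   // each pcl::Normal also stores a curvature estimate
      return normals;
    }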

2.3 Key Point Extraction

Key points, also known as points of interest, lie on the 3D point cloud or surface model and are detected by defining certain criteria and extracting the points that satisfy them. Technically speaking, the number of such points is far smaller than that of the original cloud, which makes it possible to focus the analysis on what we really care about. Key points can also be combined with local feature descriptors to form key descriptors, which compactly represent the data previously acquired from a Kinect or similar device. Well-known descriptors such as SIFT and SURF are often used in this step.
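
Since SIFT and SURF originate from images, a purely geometric 3D detector is shown below instead, as an illustration of the idea: a hedged sketch using PCL's ISS (Intrinsic Shape Signatures) keypoint detector. The helper name, the resolution-based radii and the thresholds are assumptions chosen for illustration.

    #include <pcl/point_types.h>
    #include <pcl/keypoints/iss_3d.h>
    #include <pcl/search/kdtree.h>

    // Detect interest points with the ISS detector; the output is a much
    // smaller cloud than the input, suitable for later description/matching.
    pcl::PointCloud<pcl::PointXYZ>::Ptr
    detectKeypoints(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud, double resolution)
    {
      pcl::ISSKeypoint3D<pcl::PointXYZ, pcl::PointXYZ> iss;
      pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);

      iss.setSearchMethod(tree);
      iss.setSalientRadius(6 * resolution);   // support size for the scatter matrix
      iss.setNonMaxRadius(4 * resolution);    // non-maximum suppression radius
      iss.setThreshold21(0.975);              // eigenvalue ratio thresholds
      iss.setThreshold32(0.975);
      iss.setMinNeighbors(5);
      iss.setInputCloud(cloud);

      pcl::PointCloud<pcl::PointXYZ>::Ptr keypoints(new pcl::PointCloud<pcl::PointXYZ>);
      iss.compute(*keypoints);
      return keypoints;
    }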

2.4 Registration

In reverse engineering, computer vision and fields such as digital heritage, point clouds usually suffer from defects such as incomplete data and translational or rotational misalignment. To obtain a complete data model, an appropriate coordinate transformation is needed: point sets acquired from different viewpoints are merged into a unified coordinate system, after which operations such as visualization can be carried out. For registration, [8] successfully handled a cluttered scene.
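
A minimal pairwise registration sketch with PCL's ICP implementation is shown below; it estimates the rigid transform that merges a source scan into the target's coordinate system. The helper name and the iteration/distance settings are illustrative assumptions.

    #include <pcl/point_types.h>
    #include <pcl/registration/icp.h>

    // Align a source scan to a target scan and report the rigid transform.
    Eigen::Matrix4f
    alignScans(const pcl::PointCloud<pcl::PointXYZ>::Ptr& source,
               const pcl::PointCloud<pcl::PointXYZ>::Ptr& target)
    {
      pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
      icp.setInputSource(source);
      icp.setInputTarget(target);
      icp.setMaximumIterations(50);           // illustrative stopping criterion
      icp.setMaxCorrespondenceDistance(0.1);  // ignore pairs farther than 10 cm

      pcl::PointCloud<pcl::PointXYZ> aligned;
      icp.align(aligned);                     // source expressed in the target frame

      // 4x4 rigid transform mapping source coordinates into the target system.
      return icp.getFinalTransformation();
    }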

2.5 Segmentation

Segmentation assigns a common label to points belonging to a similar region or surface. The method in [9] introduced a smoothness constraint. According to [9], segmentation methods can be divided into three categories, with the target shape acting as the judgment criterion.

  1. Edge-based segmentation. Typical variations were reported early in [10, 11]. Two stages are involved: edge detection and point grouping; the first detects the outlines of the borders between different regions, while the second produces the final segments.

  2. Surface-based segmentation. The similarity measure relies on local surface properties: points that are spatially close and have similar surface properties are merged together, which makes this approach comparatively noise-resistant. Like the previous category, surface-based segmentation has two major variants: bottom-up, which starts from seed points and grows regions, and top-down, which starts from all the points and fits a single surface to them [12, 13]. A minimal region-growing sketch is given after this list.

  3. Scan-line-based segmentation. In the first stage each row is considered a scan line and treated independently of the others, which makes this method especially suitable for range images. A typical application is [14], dealing with man-made constructions.
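
The sketch below illustrates the surface-based, bottom-up variant using PCL's region-growing segmentation, which groups points into smooth patches based on normal deviation and curvature. The helper name, neighbor count and thresholds are illustrative assumptions rather than values from [9, 12, 13].

    #include <cmath>
    #include <vector>
    #include <pcl/point_types.h>
    #include <pcl/search/kdtree.h>
    #include <pcl/segmentation/region_growing.h>

    // Group points into smooth surface patches (surface-based, bottom-up).
    std::vector<pcl::PointIndices>
    segmentSmoothRegions(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud,
                         const pcl::PointCloud<pcl::Normal>::Ptr& normals)
    {
      pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);

      pcl::RegionGrowing<pcl::PointXYZ, pcl::Normal> reg;
      reg.setSearchMethod(tree);
      reg.setInputCloud(cloud);
      reg.setInputNormals(normals);           // normals from the feature-estimation step
      reg.setNumberOfNeighbours(30);
      reg.setMinClusterSize(100);             // discard tiny, noisy segments
      reg.setSmoothnessThreshold(3.0f / 180.0f * M_PI);  // max normal deviation, 3 degrees
      reg.setCurvatureThreshold(1.0f);

      std::vector<pcl::PointIndices> clusters;
      reg.extract(clusters);                  // one PointIndices set per smooth region
      return clusters;
    }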

2.6 Sample Consensus

Methods such as Random Sample Consensus (RANSAC) and primitives such as planes and cylinders are commonly employed, or freely combined, in this step. Early work used Voronoi point insertion in local tangent spaces and Moving Least Squares (MLS) projection to realize the sampling. Later, [15-17] developed different versions of the Locally Optimal Projector (LOP) in the same period to effectively overcome outliers and noise, while the latest work [18] presented an edge-aware variant with higher robustness.
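
As a short example of the RANSAC side of this step, the sketch below fits a dominant plane with PCL's sample-consensus segmentation; other primitives such as cylinders follow the same pattern with a different model type. The helper name, distance threshold and iteration count are illustrative assumptions.

    #include <pcl/point_types.h>
    #include <pcl/ModelCoefficients.h>
    #include <pcl/segmentation/sac_segmentation.h>

    // Fit the dominant plane (e.g., ground or a facade) with RANSAC.
    void fitPlane(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud)
    {
      pcl::ModelCoefficients::Ptr coefficients(new pcl::ModelCoefficients);
      pcl::PointIndices::Ptr inliers(new pcl::PointIndices);

      pcl::SACSegmentation<pcl::PointXYZ> seg;
      seg.setModelType(pcl::SACMODEL_PLANE);  // cylinders etc. use other model types
      seg.setMethodType(pcl::SAC_RANSAC);
      seg.setDistanceThreshold(0.02);         // 2 cm inlier band (illustrative)
      seg.setMaxIterations(1000);
      seg.setInputCloud(cloud);

      // Outputs the inlier indices and the plane ax + by + cz + d = 0.
      seg.segment(*inliers, *coefficients);
    }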

2.7 Surface Generation

Surface reconstruction is used in a broad range of fields, from data visualization and machine vision to medical technology and even aerospace. [2, 19, 20] are among the latest studies in this field, and far more work preceded them. Further discussion of surface reconstruction is included in Sects. 3 and 4.
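
For orientation, the sketch below shows one common reconstruction route in PCL, greedy projection triangulation over an oriented point cloud; the helper name and numeric parameters are illustrative assumptions, and they are not tied to the methods of [2, 19, 20].

    #include <pcl/point_types.h>
    #include <pcl/PolygonMesh.h>
    #include <pcl/search/kdtree.h>
    #include <pcl/surface/gp3.h>

    // Triangulate an oriented point cloud (points + normals) into a mesh.
    pcl::PolygonMesh
    triangulate(const pcl::PointCloud<pcl::PointNormal>::Ptr& cloud_with_normals)
    {
      pcl::search::KdTree<pcl::PointNormal>::Ptr tree(new pcl::search::KdTree<pcl::PointNormal>);
      tree->setInputCloud(cloud_with_normals);

      pcl::GreedyProjectionTriangulation<pcl::PointNormal> gp3;
      gp3.setSearchRadius(0.05);              // max edge length (illustrative, 5 cm)
      gp3.setMu(2.5);                         // neighbor distance multiplier
      gp3.setMaximumNearestNeighbors(100);
      gp3.setInputCloud(cloud_with_normals);
      gp3.setSearchMethod(tree);

      pcl::PolygonMesh mesh;
      gp3.reconstruct(mesh);                  // triangle mesh approximating the surface
      return mesh;
    }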

2.8 Data Structure

At the end of the whole point cloud processing pipeline, much attention should be paid to the data structure, which is a key issue for point data storage and processing: an efficient structure has a critical effect on algorithm speed and memory footprint. Fast neighborhood-based search is realized at this level; [21] for the kd-tree and [22] for the octree are excellent studies in this field.
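
The sketch below shows the two neighborhood queries as exposed by PCL, a k-nearest-neighbor search on a kd-tree and a radius search on an octree; the helper name, the query counts and the resolution are illustrative assumptions.

    #include <vector>
    #include <pcl/point_types.h>
    #include <pcl/kdtree/kdtree_flann.h>
    #include <pcl/octree/octree_search.h>

    void neighborQueries(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud,
                         const pcl::PointXYZ& query)
    {
      std::vector<int> indices;
      std::vector<float> sqr_distances;

      // kd-tree [21]: find the 10 nearest neighbors of the query point.
      pcl::KdTreeFLANN<pcl::PointXYZ> kdtree;
      kdtree.setInputCloud(cloud);
      kdtree.nearestKSearch(query, 10, indices, sqr_distances);

      // Octree [22]: fixed-radius search on a 5 cm spatial subdivision.
      pcl::octree::OctreePointCloudSearch<pcl::PointXYZ> octree(0.05f);
      octree.setInputCloud(cloud);
      octree.addPointsFromInputCloud();
      octree.radiusSearch(query, 0.1, indices, sqr_distances);
    }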

3 Urban Environment Laser Scanning

One of our focuses is to analyze the methods commonly applied to point clouds of large outdoor scenes. From this part on, we concentrate on LiDAR data acquisition and processing for urban environments, the most frequently considered type of large-scale scene.

3.1 Target of Laser Scanning and Remote Sensing for Urban Environment

Most research has focused on recovering single buildings or downtown areas, while newly emerging work targets residential areas. It is worth mentioning that [23] showed how to obtain detailed scan data: two or more rotating laser scanners were mounted on a moving car, or even a helicopter, to scan in full view.

Dense buildings: An earlier work [24] first worked out ways to rapidly and automatically reconstruct large-scale models based on remote sensing data. The following year saw an explosion of notable work [25, 26]. More recently, [27] proposed a 3D urban scene reconstruction method based on exploiting the properties of architectural scenes, and [28], appearing almost at the same time, complemented it by considering trees and topologically complex ground.

Residential area reconstruction is a newly emerging topic of interest. In contrast to the multi-floor or high-rise buildings mentioned in [29, 30], [31] gave a unique idea to decompose and reconstruct irregular low buildings. Another problem to address is the dense trees that frequently accompany residential buildings; for these areas, related research has clearly defined the problem and found a comparatively clear way to detect vegetation. Other previous work includes [32, 33].

3.2 Scanning Methods and Solutions

Great efforts have been dedicated to the 3D reconstruction of urban environments from point data sets, but challenges remain when the scenes become significantly complex.

3.2.1 Manhattan-World (MW) Grammars

[26] combined existing mapping and navigation databases with computer vision methods under the Manhattan-World assumption. Furthermore, [34] extended MW methods so that an independent, complete model describing buildings with partial texture can be obtained. Figure 2 contrasts the pipelines of the two methods. [25] also took MW into consideration and created flat-roof models. Tracing back to [36], we find an origin of the MW method: researchers had observed that most indoor and outdoor (city) scenes are designed on a Manhattan three-dimensional grid.

Fig. 2. Pipelines of Zheng et al. [26] (top) and Vanegas et al. [34] (bottom)
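
As a rough, hypothetical illustration of the MW assumption itself (not of the pipelines in [25, 26, 34]), the sketch below labels each estimated point normal by the closest of the three world axes; real systems first estimate the dominant Manhattan frame and handle facades, roofs and clutter far more carefully. The function name and threshold are assumptions.

    #include <cmath>
    #include <vector>
    #include <pcl/point_types.h>
    #include <Eigen/Core>

    // Assign each normal to the nearest Manhattan axis (0: x, 1: y, 2: z),
    // or -1 if it is not sufficiently axis-aligned. Assumes the cloud has
    // already been rotated into the dominant Manhattan frame.
    std::vector<int>
    labelManhattanAxes(const pcl::PointCloud<pcl::Normal>& normals,
                       float min_abs_cos = 0.9f)
    {
      std::vector<int> labels(normals.size(), -1);
      for (std::size_t i = 0; i < normals.size(); ++i)
      {
        Eigen::Vector3f n(normals[i].normal_x, normals[i].normal_y, normals[i].normal_z);
        n.normalize();
        int best_axis = -1;
        float best_cos = min_abs_cos;
        for (int axis = 0; axis < 3; ++axis)
        {
          float c = std::fabs(n[axis]);   // |cos| between the normal and the axis
          if (c > best_cos) { best_cos = c; best_axis = axis; }
        }
        labels[i] = best_axis;            // walls map to x/y, flat roofs and ground to z
      }
      return labels;
    }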

3.2.2 Aerial LiDAR Method

Many research efforts have addressed the complex problem of modeling cities from aerial LiDAR data, and several automatic pipelines have been introduced by recent work (e.g., [32, 37, 38]). All of the above first remove trees and noise, while the remaining objects are divided into ground points and building patches, which are then gridded.

Some problems remain to be solved so that non-planar objects can also be reconstructed; hence, work targeting primitives emerged. Based on an RJMCMC sampler, [39] established a two-step procedure to combine parametric models. [37, 40] also addressed this problem and detected planes via user interaction, and [24, 35, 41, 42] further illustrated this line of methods.

3.2.3 Multi-view Stereo (MVS) Algorithm

Unlike data captured by LiDAR, MVS combines various viewpoints. [27] proposed a 3D urban scene reconstruction method based on exploiting the properties of architectural scenes: briefly, it used a given set of calibrated photographs to generate point clouds by means of an MVS algorithm, whose details were given in [43]. [44] noted that MVS imagery sometimes yields spatially heterogeneous point distributions, with no induced adjacency relationship between any two points, and with outliers.

As a complement, a patch-based MVS (PMVS) algorithm was presented in [43]. It starts from a sparse set of matched key points and repeatedly performs matching, expansion and filtering, until a visibility constraint can be applied to filter away false matches.

3.3 Major Objects of Urban Remote Sensing

Several papers indicate that there are three representative elements of urban scenes to be concerned with, namely buildings, trees and ground.

Buildings are among the most important elements of the urban environment, and parts such as roofs and walls are the focus of numerous studies. [19] interpreted the reconstruction of such parts well (see Fig. 3), and, as illustrated in Fig. 4, [28] simplified mesh patches while keeping high accuracy. Trees are always somewhat troublesome when it comes to accurate reconstruction. Although [31] did handle residential scenes, its treatment of trees was still a simplified template-matching method. Using billboards to represent trees would be a shortcut, but at street-view level more realistic tree modeling is necessary, as in [45].

Fig. 3. Reconstruction of roof and wall

Fig. 4. Simplified mesh-patches

Ground on its own constitutes an important part of the landscape in any field. Point cloud work on ground concentrates mainly on surface reconstruction: a continuous surface is often used to represent the ground and, generally speaking, a plane is taken as an approximation of it.

3.4 Advantages/Disadvantages of Existing Methods

Because of the diversity and complexity of our references, their limitations and contributions cannot be covered completely in this paper. Here we briefly summarize them as follows:

  1. For Manhattan-World methods there are mainly three limitations, the first being the MW assumption itself: although the MW-compliant parts are reconstructed efficiently, much architecture does not belong to this type.

  2. Second, because the idea depends on classification, poor results may appear owing to large amounts of noise and missing data.

  3. Third, none of the work above can effectively handle data sets with subtle changes or highly variable surfaces.

4 Indoor Scenes

Our other focus is to analyze the methods commonly applied to point clouds of indoor scenes. In contrast to the exterior surfaces of buildings, which are relatively piecewise flat, indoor scenes have more complicated 3D structure [46]. Apart from the endless variety of furniture shapes, the layout of rooms, inside and out, is also a big problem (Fig. 5).

Fig. 5. Complex indoor scene

4.1 Scene Understanding

The major problem lies in recognizing hundreds of objects that are of the same kind but have different shapes: even a single kind of object can take several forms (Fig. 6), which increases the difficulty of handling the scanned data.

Fig. 6. Different results under the same search premise

4.1.1 Separation

As addressed in [46] (see Fig. 7), classification and separation are interdependent issues, an observation that led to an algorithm that traverses the whole room with a search-classify region-growing process. [47] presented a method that identifies objects according to texture and surroundings. [48] presented an algorithm for indoor scene separation in which labeled features are detected and the whole scene is separated via graph-cut. [49] combined color, depth and contextual information to realize a semantic labeling process.

Fig. 7. Separation and classification outcome of Lee et al. [46]

4.1.2 Classification

As mentioned above, classification based on 3D boxes around objects is adopted in [50], which [51] supplemented with physical considerations. Coming from outside the image-understanding background, [52] first pre-segmented the acquired points and then found a good way to detect repeating regions. The latest work is [49], which used a graphical model to learn features and contextual relations across objects.

Apart from the two sub-problems, geometric priors for objects are also involved. Similar to [53], [54] used geometry to represent individual objects, a representation commonly utilized in understanding surroundings. Similar work includes [55-57]; all of these engaged in understanding indoor objects and filling in missing parts.

4.2 Scanning Techniques

Thanks to the rapid development of range cameras, scanning has become an easy task. Among the vast literature, the possibility of real-time lightweight 3D scanning was demonstrated early on by [59]. Among the more up-to-date techniques, [60] presented a guided real-time scanning setup in which the incoming 3D data stream is continuously analyzed and the data quality automatically assessed.

Going further, repetition [61] and symmetry [62, 63] have also received some attention. Primitives likewise play a role in completing missing parts, and other geometric proxies and abstractions, including curves, skeletons and planar abstractions, have been used. In the context of image understanding, indoor scenes have been abstracted and modeled as collections of simple cuboids [58] to capture a variety of man-made objects.

4.3 Scene Modeling

Decades ago, people began mounting laser scanners on mobile robots to capture indoor environments, and the literature introduced ICP (iterative closest point) and SLAM (simultaneous localization and mapping) techniques; however, the cost of the hardware limited both approaches. More recently, parts have been used as entities for discovering repetitions [52], training classifiers, or facilitating shape synthesis, and in [60] multiple objects of a single category can be represented by a smaller set of part-based templates. These approaches usually rest on expensive matching and lack low-memory-footprint real-time realizations.

5 Conclusions

Point clouds and large-scale scenes based on optical acquisition are topics that have been gaining increasing attention in recent years, and new related research springs up constantly. So far, remarkable progress has been made both in the basic processing of traditional point sets and in newly developed approaches to scanning streets, parks and households, while new algorithms continuously appear to improve on previous ones. This work not only helps us understand the large environments we live in, but also helps draw better blueprints for upcoming city construction and detailed interior design.

Challenges still exist and more remains to be done: acquired models need to be more accurate and less noisy, data sets need to be greatly enlarged, and reconstruction results still leave much room for improvement.

As the technology develops, more accurate range cameras are coming into use, which will greatly improve the resolution and accuracy of point clouds. In addition, improved algorithms can shorten computation time while enhancing robustness. The expected outcome is to obtain clean point data and successfully reconstruct as many kinds of architecture as possible; trees, heritage buildings and other irregular structures remain the main problems to be solved.

This survey mainly provides an overview of previous work; the methods and ideas it covers should be explored further through the references in order to gain a more complete understanding. Our goal is to lay a foundation for newcomers to this field, and we hope to offer valuable insights into this important research area and to encourage new contributions.