1 Introduction

LiDAR (light detection and ranging) is a surveying technology that measures distance by illuminating a target with a laser light [1,2,3,4]. Lidar is popularly used as a technology to make various kinds of geometric and geological data. In some cases, it is simply referred to as laser scanning or 3D scanning, with terrestrial, airborne and mobile applications.

Typical LiDAR systems produce large-scale data sets, with lots of 3D sampling points. Sometimes, they refer these data point sets as point clouds. Figure 1 shows a typical airborne LiDAR acquisition system for large landscape-scale data sets. Figure 2 shows another terrestrial LiDAR acquisition system, for ground structures like buildings, rock formations, and others.

One of the fundamental problems with LiDAR data is that there are still no general-purpose, open standard for storing and analyzing LiDAR data. Thus, the LiDAR system developers should make decisions on the storing file formats and fundamental analysis methods.

Another problem occurs with the enormous number of points captured with LiDAR equipment. A typical large-scale point cloud consists of millions of points, which makes it hard to be loaded onto the main memory of commercial PC’s. To efficiently use these LiDAR data with other applications including GIS (geographic information system) and others, extracting abstract information from the LiDAR data is strongly required.

Fig. 1
figure 1

A landscape-scale airborne LiDAR data acquisition system (courtesy of ASPRS)

Fig. 2
figure 2

Terrestrial LiDAR data acquisition system

In this paper, we first present a set of solutions for the LiDAR data handling. The widely used LiDAR file formats are also explained. Then, we show our method to extract geometric representation from cloud points, acquired from airborne LiDAR devices and also from terrestrial LiDAR devices. We also show efficient way of using existing general-purpose geometric algorithms and their implementation.

Our prototype implementation of the LiDAR framework was accomplished through combining the LiDAR file formats, our method of extracting geometric primitives, and the generic versions of geometric algorithms. We finally show the results of our LiDAR framework.

2 LiDAR file formats

Since there are no dominant and/or open standard specifications for storing LiDAR data, there are various kinds of file formats such as XYZ, PTS, PTX, PCD, and others. Among them, we will show two of the most widely used LiDAR data file formats: LAS and E57. Our practical choices of E57 handling library is also explained.

2.1 LAS file format

The LAS file format (short for LASer) is a data format designed to store 3D point cloud data, especially for airborne LiDAR scanning devices. It was developed by the American Society for Photogrammetry and Remote Sensing (ASPRS) [5, 6]. The LAS format is a sequential binary format used to store data from sensors and as intermediate processing storage by some applications.

Figure 3 shows the global view of a LAS file format. Figure 4 shows an example format of single point data record stored in the LAS file, containing intensity, scan angle, color value. Those data are typically specialized to detect and extract features from the area-based LiDAR data.

The libLAS is a library for reading and writing geospatial data encoded in the LAS file format [7]. It consists of base library with multiple application programming interfaces available for programming languages like C, C++, Python as well as languages available in .NET Framework. Also, a variety of useful command-line utilities is provided for translating LAS files from one version of the LAS format to another, inspecting header information, and translating LAS data to and from text.

2.2 E57 file format

The E57 file format is a compact, vendor-neutral format for storing point clouds, images, and metadata produced by 3D imaging systems, such as laser scanners [8, 9]. This file format is specified by the ASTM, an international standards organization, and it is documented in the ASTM E2807 standard. The E57 format is intended to be a more general format that is well-suited for storing data across a variety of application domains.

At a high level, the structure of an E57 file is a hierarchical tree structure, as shown in Fig. 5. The format of the hierarchy is based on the XML data format. At a low level, the actual point data is represented using a compressed binary format. Other large data blocks, such as images, are also represented efficiently in binary. In this way, the format supports flexibility and extensibility using text-based XML, while enabling efficient input/output and storage using compressed streams of binary data.

Fig. 3
figure 3

LAS file format

Fig. 4
figure 4

An example point data record format stored in the LAS file

Fig. 5
figure 5

E57 file format

An E57 file is divided into three parts: a header, a set of optional binary sections, and an XML section. The header is a small, 48-byte binary structure that contains critical file-level information, such as the version number and the location of the XML section. The XML section contains the hierarchical tree structure describe above. If the file contains point data or images, these portions of the hierarchy are referenced by the XML section, and the actual data is stored in the binary sections, with a separate section for each set of points or image.

2.3 libE57: an existing implementation

The libE57 system consists of a library, supporting utilities and example programs, and documentation. The software includes two separate application programming interfaces (APIs) for reading, writing, and manipulating E57 files—the foundation API and the Simple API [9].

The foundation API is a full-featured interface that operates at a relatively low-level, allowing control over all aspects of an E57 file, including custom extensions.

The simple API is a simplified interface (built on top of the foundation API) that supports the most common use cases for reading and writing E57 files. The library also includes tools for converting LAS format files into E57 files, and a tool for validating E57 file correctness is under development.

3 Geometric primitive extraction

From the point data in either of LAS and E57 file formats, we can reconstruct the LiDAR scanned point cloud, as shown in Fig. 6. In this section, we show two implementation schemes to extract geometric primitives from the point clouds. The existing geometric library of CGAL [12] are used to realize our schemes.

3.1 DEM extraction

In this section, we present our scheme to extract DEM (digital elevation model) data from airborne LiDAR data. Digital elevation model (DEM) represents bare ground surfaces without any objects like plants and buildings. To extract DEM data from the LiDAR point clouds, not only reconstruction of the point clouds but also object removals are needed [10]. However, this method takes a large amount of time to find neighboring points for each points. Thus, it is not appropriate for large-scale airborne LiDAR point clouds which typically consists of millions of points.

Fig. 6
figure 6

An example LiDAR scanned point cloud (courtesy of UNAVCO)

To generate DEM’s, we used height maps, which can be extracted from the height information of point clouds. Height maps are actually the projection of point clouds onto 2D planes. Initially, a height map is constructed as a two-dimensional array with zero values. For every points in the target point cloud, we project those points onto the 2D plane. During the projection, we can store the original height of those points as floating point numbers. In our system, we digitized the Z values to integer values between 0 and 255, for more simplicity and efficiency. Since we will use these height maps mainly for visualization purposes, we can accept these rough approximation.

If needed, we can use floating-point values for more accuracy. Since the Z values in our height map is restricted to integers between 0 and 255, the height map can be interpreted as an intensity map, as shown in Fig. 7.

We extracts the contours through connecting projected points on height map with the same height value as shown in Fig. 8. However, using these contours directly for the DEM’s has some problems, since generated height map contains lots of small contours, which represents objects like trees and buildings, while DEM should represent the bare ground without any objects.

Fig. 7
figure 7

Height maps presented as intensity maps. (a) An example height map of airborne LiDAR data in rural area. (b) An example height map of airborne LiDAR data

Fig. 8
figure 8

Contour extraction from the height map of Fig. 7b.

Our system can update generated height map to remove objects, instead of manipulating point cloud itself. Since height maps can be represented as 2D intensity maps, it is much concise to read, and it can be manipulated as bitmap images.

Height maps can not only be manipulated itself, but also can be applied image based filters. For example, we can apply median filter to remove noises captured during LiDAR data acquisition, Gaussian filter to remove small object like trees and many another image based filters. Figure 9a shows the median filter applied version of height map. Figure 9b represents the generated contours using filtered height maps. Newly generated version of contour shows much smaller number of partial contours, which means small objects are automatically removed and in a proper way.

Fig. 9
figure 9

Height maps and contours after filtering operations. a The median filtered height map. b Generated contours from the filtered height map

Additionally, for LiDAR data that have color information, we also construct texture maps from color information of point clouds. Texture maps are represented as two-dimensional array. To construct a texture map, we initially clear it with zero color values. Then, for every points in point clouds, we project that point to the 2D-plane with the size of the given texture map. The color values of those points are stored at the corresponding locations. We used this texture map as an aerial photograph to render precise terrain. An example of constructed texture map is shown in Fig. 10.

3.2 3D Model reconstruction

In this section, we present our implementation scheme to construct the 3D geometric models from cloud points acquired with terrestrial LiDAR equipment. Our final goal is to build 3D model of real-world constructions, from their scanned LiDAR data. Those geometric models will be used on other application systems including 3DS Max, 3D GIS, and others.

Although there are lots of researches on 3D model reconstruction from point clouds [11], it is hard to adopt those methods to reconstruct surfaces for large-scale point clouds, which typically correspond to buildings and real-world constructions. According to [11], the biggest number of cloud points used for surface reconstruction is approximately 6 million, and reconstruction from these 6 million points took at least 30 seconds to get reasonable results. However, our captured data of building point clouds consist of more than 248 million points, as shown in fig 11a. Considering almost reconstruction algorithms show the time complexity of O(\(n^{2})\), it would take tremendous time to reconstruct those point clouds.

Another problem we can’t adopt precedent methods is they assume that point clouds would be in a single closed surface. Though buildings should be in closed surface like cube, LiDAR scanner can’t scan bottom side of the building, and finally we have big holes at the bottom. Figure 11b shows a typical sample point cloud of a specific building. Figure fig11c shows the hole at the bottom of that building.

So, our scheme is to manually pick geometric primitives like triangles, squares, cylinders and/or cubes out of point clouds to generate 3D Models. After picking basic attributes, we grabbed framebuffer to generate texture of generated basic primitives. Figure 12 shows a sequence of semi-automatically extracting geometric primitives from a large-scale point cloud.

Fig. 10
figure 10

A texture map extracted from the LiDAR data

Fig. 11
figure 11

Statistics and samples of our terrestrial LiDAR data. a LiDAR scanned data of buildings in Suwon, Korea. b A sample point cloud of a building. c The point cloud have a big hole in the bottom side

Fig. 12
figure 12

Sequence of extracting geometric primitives from point clouds to construct the 3D geometric model. a selecting a specific geometric primitives from the point cloud. b a simply extracted geometric primitive from a. c Generating a set of geometric primitives from the point clouds. d The final result of geometric primitives

3.3 Geometric primitive extraction with CGAL

In this section, we introduce CGAL, the Computational Geometry Algorithms Library. CGAL is a software library of computational geometry algorithms [12]. CGAL is a software project that provides easy access to efficient and reliable geometric algorithms in the form of a C++ library. CGAL is used in various areas needing geometric computation, such as geographic information systems, computer aided design, molecular biology, medical imaging, computer graphics, and robotics.

The CGAL library offers data structures and algorithms like triangulations, convex hull algorithms, point set processing, arrangements of curves, surface and volume mesh generation, geometry processing, alpha shapes, shape analysis, axis-aligned bounding boxes (AABB) and k-D trees.

In this famous geometry library, we focused on the following three components:

  • Point set processing component: This component implements methods to analyze and process unorganized point sets. The input is an unorganized point set (un-oriented or oriented). The point set can be analyzed to measure its average spacing, and processed through functions devoted to the simplification, outlier removal, smoothing, normal estimation, normal orientation and feature edges estimation.

  • Point set shape detection component: This component implements an efficient RANSAC-based primitive shape detection algorithm for un-oriented point sets. RANSAC (random sample consensus) is an iterative method to estimate parameters of a mathematical model from a set of observed data which contains outliers. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed [13, 14].

  • Estimation of local differential properties of point-sampled surfaces component: For a surface discretized as a point cloud or a mesh, it is desirable to estimate pointwise differential quantities. This component allows the estimation of local differential quantities of a surface from a point sample.

We tried to use it for our geometric primitive extraction from cloud points that have closed surface. Figure 13 shows examples of surface reconstruction of point cloud with closed surface. Figure 13a shows a raw point cloud, and Fig. 13b–d is its geometric mesh representation, with respect to the difference resolution settings.

Fig. 13
figure 13

Examples of surface reconstruction with closed surface. a Original point cloud, 2,503 points. b Mesh representation with 2648 points. c Mesh representation with 932 points. d Mesh representation with 432 points

Fig. 14
figure 14

Our over-all system design for LiDAR data handling

Fig. 15
figure 15

Integrating the geometric extraction with an existing GIS application program

4 Implementation and its results

Originally, our fundamental strategy for the geometric data extraction was to develop in-house versions of the well-known registration algorithms. However, we rapidly recognized that preceding well-known algorithms are inefficient for both airborne and terrestrial LiDAR data.

We developed new schemes to extract geometric primitives from LiDAR data and adopted them to our prototype implementation. Additionally, we also tried to use CGAL which is one of widely-known well-known implementation, in our implementation. The basic steps in our prototype implementation are presented in Fig. 14. From the implementation of our prototype system, we acquired results in Figs. 9, 12, and 13 with reasonable performances.

The acquired geometric primitives are combined to construct the geometric representation of real-world constructions, from their corresponding point clouds. The geometric primitives can be used in various application programs. Figure 15 shows an example integration of our extracted geometric model with existing GIS navigation platform.

5 Conclusions and future work

Nowadays, we have many requirements for LiDAR data handling. We started to find an efficient way of handling LiDAR data and its related information. Actually, we focused on two technical issues:

  • Storing the LiDAR data itself

  • Extracting geometric primitives from those point clouds.

Our analysis shows that the E57 file formats and some registration algorithms are good solutions to these issues. Additionally, we found that some algorithms are already available in public domain, especially in the geometry handling libraries including CGAL.

We developed two new implementation schemes to extract geometric primitives. With these new schemes, we achieved the final prototype of LiDAR data handling framework. Through combining these methods, we got the geometric models remarkably efficiently, even in a cost-effective way.

Our prototype system still need some more technical improvements in its practical implementations. In near future, we will accelerate our implementation with GPU-based parallel executions with CUDA and/or OpenCL.