1 Introduction

Understanding 3D images is increasingly important in fields as diverse as navigation, architecture, and biology. Each domain is replete with unique patterns that practitioners must uncover as they struggle to make sense of their data. In urban scenes, for example, the topology is characterized by the broad planes of facades, the slightly curving manifold of the street, and the fenestration of regular repeated architectural features like windows, balconies and cornices. Also notable is the fractal noise of vegetation, the Bezier curves of parked cars, and the high frequency spikes of objects in motion faster than the laser can catch them. In this paper we show how a focus on the repeated architectural structures yields significant scene understanding and can be used for compressing and registering range scans. We demonstrate a method to discover these regular structures online, as the scanner scans. We apply knowledge of these structures to facilitate compression and registration.

Processing point clouds quickly is crucial in many scenarios. Analyzing scans column by column allows seamless integration with the 3D camera because the camera is likewise scanning the scene column by column. This synchronizes the execution of the algorithm with gathering the data, accelerating our work to the point of just-in-time scene understanding. While focusing on individual columns gives excellent local knowledge on the scene we do not wish to lose sight of the big picture. After all, each column is just a sliver of a much larger tableau. However, by keeping an evolving data structure with macro features and greedily updating as the scan unfolds we are not forced to choose between speed and global knowledge, in fact our algorithm achieves both.

We can imagine this online processing as a line sweep algorithm. The scanline or column of measurements moves across the scene in discrete steps. At each step we search for periodicity and planes. We aggregate and update this data maintaining best estimates about higher level features in the scene. The processing occurs on the fly so that by the time the scanline reaches the edge of the scene our algorithm has done its work and requires no further manipulations of the data. Integration into the 3D camera hardware is a natural step forward for these types of algorithms.

2 Related Work

Modeling 3D urban scenes from range scan data is a field of active research (Allen et al. 2003; Stamos et al. 2008; Zhao and Shibasaki 2003). There is a plethora of range image segmentation techniques from range data including edge detection (Bellon and Silva 2002; Wami and Batchelor 1994), region growing (Besl and Jain 1988; Pulli and Pietikinen 1993), and polynomial surface fitting (Besl and Jain 1988; Marshall et al. 2001). Most of these methods provide edge maps or regions expressed as polynomial functions. Yu et al. (2001) utilizes a graph-cut approach for segmentation. Our earlier work includes segmentation algorithms for the extraction of planar, smooth non-planar, and non-smooth connected components (Chao and Stamos 2007; Stamos and Allen 2002; Stamos et al. 2006).

Detecting symmetries and repetitive structures (such as windows and balconies) can be an important element of scene understanding. The concept of shape grammars for buildings introduced by Stiny (1982) as a formal approach to architectural design has been used for the image-based modeling of facades acquired through aerial photographs (Muller et al. 2007) and for the generation of synthetic cities (Muller et al. 2006). A shape-grammar approach that uses ground-based images was presented in Teboul et al. (2010, 2011). An earlier approach is the one of Lee and Nevatia (2004). The work of Mayer and Reznik (2007) uses a sequence of images to produce a sparse 3D point cloud as input to the detection algorithms. Work on the same topic is presented in Wu et al. (2010). Finally Park et al. (2009) detects regular patterns in images that are not constrained to be buildings.

Fewer approaches exist when the input is a 3D range scan. Pauly et al. (2008) derives regularities of substructures from a 3D model or range scan of a scene. This works by detecting symmetries (or similarity transformations) of basic structures in a regular grid. This general approach can be used for extracting regularities but it is sensitive in the calculation of curvatures and computationally intensive. Shen et al. (2011) presents a partitioning of urban facades which detects grids in an adaptive way. This extends the transformation voting technique of Pauly et al. (2008) to facades that are not globally rectilinear. In Stamos and Allen (2002) window-like rectangular features were extracted by using 3D edge detection on high-resolution 3D data. Line features are again utilized in Bokeloh et al. (2009) for symmetry detection. This symmetry detection of line features builds upon a graph based method for detecting symmetry presented in Berner et al. (2008). Nan et al. (2010) presents an interactive interface which exploits regular structures in urban scenes to improve sparse point clouds. Employing a Markov Network approach (Triebel et al. 2006) labels points as windows, but requires training. The work of Li et al. (2011) provides a fusion of 2D- and 3D-segmentation and symmetry detection for decomposition of urban facades. The authors of Martinet et al. (2006) use spherical harmonic decompositions to detect rotational and reflective symmetries in meshes. Our work also exploits harmonics, but we use them to detect translational symmetries per scanline. Our paper provides algorithms that work online and purely on 3D-range input without relying at all on 2D-images. We have presented work in the online detection of repeated patterns in urban scenes using 3D-information (Friedman and Stamos 2011). This paper extends the applicability of that work into the area of 3D-range registration. Table 1 compares and contrasts several of the algorithms discussed above.

Table 1 Comparison of regularity detecting algorithms. This table is adapted from Mitra et al. (2012)

There is a vast literature in the area of 3D-range registration Iterative Closest Point (ICP) is one of the most popular range registration algorithms (Besl and McKay 1992; Rusinkiewicz and Levoy 2001). ICP provides very accurate results but requires a good initialization of the registration transformation. Methods that do not require initialization in most cases detect and match features such as spin images (Huber and Hebert 2003) or lines (Stamos and Leordeanu 2003). A RANSAC-type automated ICP algorithm is presented in Aiger et al. (2008). In our paper we are describing an automated range registration algorithm that is based on online detection and matching of repeated patterns. Our algorithm is extremely fast and accurate without being confused by parts of the scene that may be different (for instance objects such as cars that can move and change positions during acquisition, or vegetation that may change between seasons).

Our contributions with respect to earlier work can be summarized as follows:

  1. (a)

    We extract repeated structures from large-scale scenes online, processing the data as it is acquired while also enabling extremely efficient offline processing.

  2. (b)

    Using periodicity as the basis of our detection, we have robust results even in low resolution areas of the scene.

  3. (c)

    We compress data-rich range scans while maintaining their architectural regularity.

  4. (d)

    We present automatic registration opening the door to simultaneous acquisition and realtime registration from many range scanners at once.

  5. (e)

    All of the above are achieved without requiring user interaction, or training data.

3 Data Acquisition

Our lab operates a Leica ScanStation2 laser scanner (Leica Geosystems 2012). The data retrieved is organized into a two dimensional array. The direction of the laser is controlled by delicately calibrated stepper motors. The laser rotates a small consistent angle vertically and records a measurement and then repeats. Once an entire column has been recorded another motor rotates the laser a small consistent angle horizontally and another column is recorded. In the pages that follow we will refer to single arrays within the 2D array as both columns and scanlines. In the coordinate system returned by the scanner the z-axis corresponds to the vertical vector.

This data is considered 2.5 dimensional because the 2D array of 3D points contains valuable information that distinguishes the scan from an unstructured point cloud. For example, investigating neighborhoods in the 2D array we are able to fit normals in constant time without resorting to an exhaustive search of nearest neighbors, or the additional overhead of an approximate nearest neighbors (ANN) library.

4 Extraction of Repeated Architectural Features Through Fourier Analysis

Windows and balconies are intriguing data points in urban laser scans. They are ubiquitous: virtually every commercial and residential structure is perforated by some kind of fenestration. In general, they are regular, occurring at a defined period within the building. Yet despite this regularity they are challenging to detect because of their high variability of appearance.

Glass windows without any shades often allow the laser into an interior from which only negligible light can return, yielding a missing data point. Some materials used to make window blinds appear extremely planar and may be mistaken for facades. Other materials like fabrics and plastic shades will present highly irregular geometry depending on the wind, the angle of incidence and other factors.

To compound the confusion many window sills are crowded with house plants, children’s toys and other accouterments of apartment interiors. Balconies are marked by prominent self occlusions. Terraces which protrude from a building cast shadows on the facade which grow with the incident angle of the laser. This produces vastly different results for the same structure depending on its altitude.

All these facts make detection of regular architectural features difficult. Nevertheless the periodicity of the features is salient. While each individual feature may be unpredictable when analyzed together they are predictably unpredictable. This regularity can be extracted through Fourier analysis and this knowledge is fertile ground for downstream processing.

4.1 Finding the Major Planes

We begin by approximating the major planes for each column.Footnote 1 Most scanlines in the urban setting are dominated by points belonging to the ground and points from a facade. In general facades are extremely planar. The manifold of a city’s pavement is usually curved, but its curvature is in general low and a planar approximation is often sufficient. To extract these major planes we fit minor planes in small neighborhoods around each point in the scanline. Using Principal Component Analysis (PCA) we determine the normal to these minor planes as the eigenvector corresponding to the smallest eigenvalue of the covariance matrix of the neighborhood. The eigenvalue itself is also stored as its magnitude is a measure of the goodness of fit of the local plane to the local data. Throughout the scan we maintain our best estimates of the facade and ground normals.

Buildings rise more or less perpendicularly from the more or less horizontal ground. So we may filter potential ground normals from potential facade normals by taking the dot product with the vertical vector. We assume the vertical vector is known as there a number of simple ways to determine it. We expect this product to be near 0 for ground normals and near 1 for facade normals. Now we have gathered two groups of small local planes from which we would like to deduce two global planes. The eigenvalues computed earlier help separate the noise from the signal by measuring the reliability of each local plane. However, simply selecting the normals corresponding to the smallest eigenvalues in the column would be a mistake. It is common, for example for a facade with a balcony to register two reliable vertical planes, or a scan that passes over a box truck or a ceiling from an interior may contain two or more horizontal planes with very low eigenvalues. To insure against these misdetections, the major planes are chosen as those with low eigenvalues and agreement with the majority of other potential vertical or horizontal planes.

As the scan progresses and more columns are recorded we greedily update our major plane estimates. While searching for the most reliable facade normal, it is important that we remain vigilant against facade shifts that occur when the scanner passes over a corner of a building or two adjacent facades. Before the running estimate of the facade is updated the angle between the old normal and the new normal is measured. If it is insignificant and the current normal is more reliable than the previous it is updated. If this angle is near π/2 we note a corner. If this angle is not near zero and not near π/2 this may be a noisy column or modern architecture. The angle between facade normals does not detect facade translation which are common when adjacent buildings are offset from the street by different amounts. To ensure that the facade has not been translated we must check that:

$$ (f_1 - f_2) \cdot n_1 \approx (f_1 - f_2)\cdot n_2 \approx 0, $$
(1)

where f 1 is a point on the old facade, f 2 is a point on the new facade and n 1 and n 2 are normals of the two facades. If this is not the case then we must note a facade shift and going forward greedily improve the new facade. See Fig. 1. In settings where online processing is unnecessary, it is possible to improve previous global estimates after one pass has been made over the data.

Fig. 1
figure 1

The red lines indicate the edges of the major planes. The intersection of the ground plane with the facade plane can be computed even though this vector is often obscured in the raw data by vegetation, vehicles and pedestrians. The darker blue points are candidates for the facade plane and the lighter yellow points are ground plane candidates. Note that the facade shift has been correctly identified by the algorithm (Color figure online)

4.2 The Column Function

The global features of the major planes are complemented by the meso-level features of regular fenestration in windows and balconies. Fenestration tends to be regular occurring once for each floor. The first step in the detection of this regularity is construction of a column function. The estimates of the ground and facade planes allow us to distinguish all the points in the column on or near the facade. The rest of our analysis will focus on this subset.

The column function is derived from a local measure taken at each point in the column. Conceivably curvature, eigenvalues, or consecutive angles could be used. In our work we have chosen consecutive angles for the simplicity and speed of their computation. Two column functions are shown in Fig. 2. Despite being a noisy and simple metric the regularity of repetitive features shines through.

Fig. 2
figure 2

Two column functions are displayed. The lighter one has covered a planar region wheres as the darker scanline and corresponding column function pass over a section of the building with windows. The regularity of the windows shines through the noisy column function. This regularity can be precisely extracted using Fourier analysis (Color figure online)

Before this regularity can be extracted the column function must be interpolated. As it stands the column function is distorted by an oversampling of close features and an under sampling of distant ones. Figure 3 illustrates the predicament. To rectify this, points are sorted by their height. The column function is scaled to reflect these heights. After the linear interpolation, the column function shows actual distances in the scene and not the concentration of measurements returned by the laser scanner.

Fig. 3
figure 3

The scanner over-samples near features and under-samples distant ones. As a result the beginning of the column function before interpolation displays a lower frequency period. It appears that the frequency increases throughout the column function when in fact the period is constant, as can be seen in the column function after interpolation. Interpolation is critical to precisely detect the period of the column function

4.3 Frequency Space

The signal after interpolation is truly periodic. Fourier analysis will indicate exactly which period is present in the function. The Discrete Fourier Transform maps our signal into the frequency domain. Figure 4 displays one such transform. It is symmetric because the column function input is real. In frequency space periodic functions have salient peaks at the frequencies of the period in the signal. The zeroth frequency is the average of the signal and is the largest peak in the transform, but tells us nothing about the period. Rather it is the secondary spikes which indicate the dominant period of a column function. These secondary spikes are detected in one pass over the data. At this point it is possible to return to the spatial domain and plot the periods found in the frequency space. The results are displayed in Fig. 5. The results show that analysis of a single column is enough to discover the period of repetitive architectural features.

Fig. 4
figure 4

The Fourier Transform of a column function containing windows has noticeable peaks corresponding to the frequency of the period. The column function is a real signal so the transform is symmetric which accounts for the double penultimate highest peaks. The highest peak in this transform is the zeroth frequency

Fig. 5
figure 5

The dominant period from a selection of column functions which covered areas of periodicity

4.4 Aggregation

The estimation of the period from a single column may be improved through aggregation. Since the algorithm is processing data online directly after uncovering the period latent in one column function another column function is ready for processing. With high probability this column will record data from the same repeated architectural features that were uncovered previously. Adjacent columns with near identical period can then be grouped providing higher level insight into the features present in the facade. The period estimates can be improved by detecting the secondary spikes in the sum of all the transforms of the neighboring columns. Before summing the transforms must be centered about the zeroth frequency. This is because the columns are not necessarily the same size as occlusions may obstruct some scanlines and not others. Figure 6 illustrates this process. The resultant period is a more robust measure of the frequency of the repetitive features.

Fig. 6
figure 6

Aggregating Fourier Transforms provides a more robust measure of the periodicity of a set of adjacent periodic columns. The second highest peak in the aggregated transform is more prominent than in the transform of any single column. The black columns are planar regions and the colored columns are repeated structures (Color figure online)

4.5 Fenestration Centroids from Square Waves

Deeper insight into the periodic feature can be obtained if we fit a square wave to the column function. This wave measures the actual height of the architectural feature and allows us to extract the exact middle of the windows and balconies, see Figs. 8 and 9. The previously gathered information about periodicity in the column allows us to uncover the best-fitting square wave quickly. It is practical to exhaustively check all possible square waves up to some finite resolution whose period coincides with the period the Fourier analysis uncovered in the column. The square wave that fits the column function will be our estimate of both the height and centers of the architectural features.

To fit a square wave the column function is turned into a binary function by taking all values above a threshold as 1 and all values below it as 0. We measure the difference between square waves and this binary thresholded version of the column function. We repeat this process for every possible square wave within the period of the column, up to some discrete resolution. The wave with the smallest difference is chosen as the square wave approximation of the column function. A typical square wave and the column function it fits are shown in Fig. 7.

Fig. 7
figure 7

A square wave can be fit to the column functions. This yields more information about the features and can be used to obtain robust translation vectors between different views

The middle of the high edge of the square wave corresponds with the middle of the repeated fenestration. Square waves are generated for each column in each aggregated window group computed in Sect. 4.4. The center of the feature can then be estimated as the median of the vertical centers of each column and the mean of the horizontal distance between the first and last column. The centers computed for each column can be seen in Fig. 8 and the aggregate centers are shown in Figs. 9 and 21.

Fig. 8
figure 8

The square wave gives a good estimation of the window centers. Note that the periodic points from the individual column do contain both high and low frequency outliers. However in the aggregate our window center estimation is robust. Furthermore high frequency “outliers” often contain additional information such as the frequency of panes within each window. See Fig. 9 for the aggregate centroids that are the medians of the many centers shown above

Fig. 9
figure 9

The median feature middle from all the columns in the group is selected as the height of the groups centroid. The horizontal location is determined by the Chebyshev distance in x and y dimensions between the first and last scanlines in the group (Color figure online)

4.6 Evaluating Regularity Detection

We employ several methods to evaluate the quality of the detected translational symmetries. First, polygons of the repeated architectural features are superimposed on the scan data. This gives a visual verification of the detection of both the period and the phase of the features. Figure 10 displays the superpositions in several different scans.

Fig. 10
figure 10

The green polygons indicate the areas the algorithms detected as window regions. The size of each period is determined by the dominant frequency. The size and centering of the polygons is determined by the square wave fitting as described in Sect. 4.5. We compare these polygons against the ground truth from Fig. 11 to obtain Table 2 (Color figure online)

For a quantitative assessment we have annotated the scans with labels for the regular features. For example, the red points in Fig. 11 are labeled as windows by a human visually selecting them. This ground truth is compared against the polygons displayed in Fig. 10. If 90 % or more of the labeled points fall within the area detected as a window it is considered a true positive. If there is a detected window and no labeled points fall within it, then it is a false positive, and lastly, if there are labeled points but no detected window it is a false negative. In this way, Table 2 is obtained.

Fig. 11
figure 11

Points belonging to window regions are colored red. These labeled scans are compared against our algorithms output to evaluate the regularity detection. The quantitative results are shown in Table 2 (Color figure online)

Table 2 The precision and recall of the regularity detection is presented here

4.6.1 Runtime Analysis

The algorithms presented here run in O(nlogn) time where n is the size of a single column. The sorting of the column function and the subsequent Fast Fourier transform both come with a cost of O(nlogn). The aggregation of adjacent columns has the effect of multiplying n by a constant but does not change our big-O analysis. This is because windows and other periodic features do not scale up with the size of the scan. A larger city does not have larger windows. Rather the windows have a more or less constant range of widths and heights regardless of the size of the scan in which they occur. By a similar argument our exhaustive search for a square wave approximation of the column function is limited by the size of the period.

We conducted time tests of our algorithm running a variety of scans. The results are presented in Table 3. The per point processing time varies from 3 to 16 microseconds. This is several orders of magnitude faster than the rate at which the scanner acquires a single point. Additionally, our code is implemented in Java and not optimized.

Table 3 This table shows the algorithm’s running time on range scans of various sizes. Run time is not completely proportional to scan size. This is because the individual column size and the ratio of periodic columns also factor into the algorithm’s performance

4.6.2 Limitations

Many buildings in modern cities exhibit vertical periodicity. However many buildings contain aperiodic fenestration. Our algorithm can detect the major planes of such structures but will not detect regularity. Our algorithm is not appropriate for low-rise residential structures. These structures are unlikely to contain vertically repeated features.

In low resolution scans, the local planes fit to each point become unreliable. This obscures the alternation between planar facade and architectural features. If the resolution is very low even major plane estimation will become error-prone.

5 Compressing Range Scans

Range scans of urban scenes are dense. Data-rich point clouds of millions of points are not practical for many applications. Navigational robots may have limited computational resources onboard and processing millions of points may exceed their capacity. Networked applications may not have the bandwidth necessary to transmit this huge volume of data. However, much of the urban scene is repetitive, especially fenestration. This redundancy can be reduced through compression.

5.1 Segmentation and the Representative Region

Compression begins by identifying a representative region which will be stored in full, and used as a replacement for the data we remove. The vertical period of the facade was discovered in Sect. 4. Moreover, in Sect. 4.4 horizontal sections of the facade were grouped and classified as planar or periodic. We now wish to segment the facade into like regions. One of these regions ought to be able to replace all the others vertically aligned with it, without compromising the character of the facade. Furthermore, the planar sections of the facade which are interspersed with the periodic sections of windows and balconies are also ripe for compression.

Segmentation proceeds by creating a rectangle around each periodic feature. The height of the rectangle is determined by the period detected in Sect. 4.3. The width of the rectangle is determined by taking half the distance to the center of the neighboring periodic features, unless these features occur in the beginning or end of a facade in which case the rectangle is stretched to the facade’s edge. We use the Chebyshev distance to extract this rectangle. If we used the Euclidean distance we would extract the circle around each window center.

Because windows typically have vertical and horizontal symmetry we wish to extract the feature’s surrounding rectangle. The Chebyshev distance between two points α and β in ℝ3 is defined as:

$$ D_{Chebyshev}(\alpha, \beta) : \max_{i \in(x,y,z)} \bigl(|\alpha_i - \beta_i|\bigr). $$
(2)

We use this metric to determine the rectangular region with sides equal to the length of the dominant period. One of these regions is chosen as the representative, in our experiments we choose the median size feature from the sets aggregated in Sect. 4.4. At first glance it is tempting to choose the densest region as the representative. This strategy is often a mistake. Usually the region that is densest has a unique feature that distinguishes it from the rest of regions and disqualifies it as fair representative. Instead the region with median density is chosen.

To complete the compression we translate the chosen region vertically by the distance of the period. In this way the size of the column can be reduced by the number of windows found in the group of columns.

For a more ambitious reduction we assume that the building facade is planar. The vertical distance that came from the Fourier analysis can be coupled with the horizontal distance to the next adjacent group of periodic columns. The Chebyshev metric is altered to handle two different distances one vertical, the other horizontal. In the coordinate system given by the scanner the vertical axis corresponds to the z-axis of the points. However, the horizontal direction along the facade may be a combination of both the x and y coordinates. So, to find the horizontal distance to the next group of columns we take the Chebyshev distance in both the x and y directions.

$$ D_{Vertical}(\alpha, \beta) : \bigl(|\alpha_z - \beta_z|\bigr) $$
(3)
$$ D_{Horizontal}(\alpha, \beta): \max\bigl(|\alpha_x - \beta_x|, |\alpha_y - \beta_y|\bigr) $$
(4)

Now much larger swaths of the building are included in the compression, see Fig. 16. The entire facade is compressed by a factor equal to the number of periodic elements it contains.

5.2 Comparison of Facade Segments

Some regions may contain extra information that should not be discarded when we compress with the median region. To avoid compressing regions with something that is not truly representative, the regions in question should be compared. Comparing 3D point clouds can be computationally intensive. To maintain the realtime flavor of the algorithm we choose a method based on Osada et al. (2002). Histograms are generated from a subset of points within each region. The histogram contains the Euclidean distance of randomly selected pairs of points from the subset. To prevent spurious matches, it is sensible to choose high curvature points (computed as in Cazals and Pouget 2005), or points whose eigenvalues computed during the local PCA were high, see Fig. 17. These points contain more of the unique character of each region rather than the homogeneous planarity. Comparing histograms is simple: one is subtracted from the other. Alternative histogram distance metrics exist as detailed in Pele and Werman (2010), however the L1 metric is simple and efficient for our purposes. A large absolute value in the difference between histograms indicates two distinct regions, both of which should be preserved in the compressed scan. Here we introduce a parameter γ as the threshold by which regions are declared similar. This parameter allows user’s discretion over the degree to which their scans are compressed and the amount of data loss they will tolerate. Figure 18 shows the same scan with different levels of compression achieved by altering this threshold. The graphs in Fig. 19 show the compression rates and error at different histogram thresholds. We measure the error by averaging the Hausdorff distance between the translated representative region and the region it replaces.

5.3 Replacing Regions with Translations of the Representative

Once the regions which are sufficiently similar to the representative are identified they can be removed. Scans with repetitive features removed are shown in Figs. 12, 13, 14, 15, 20 and 21. Working just with this reduced set it is still possible to recreate the look and feel of the initial scan. By translating the representative region vertically by multiples of the dominant period we are able to fill the missing data.

Fig. 12
figure 12

Our framework for scan compression. The compressed scan has reduced all the windows above while maintaining the ornamentation and irregularities

Fig. 13
figure 13

The top image is a segmented scan. The middle image is a compressed version of the scan above it. The bottom image shows the amount of reduction. In the bottom image the grey points indicate the representative regions which are copied and translated to achieve the compression. See Sect. 5.3. The compression ratio is 48 % with 383735 points removed out of 805176. Even though some window groups contain multiple columns of windows compression is unaffected because the periodicity of all of them has been correctly identified (Color figure online)

Fig. 14
figure 14

The left image is a segmented scan. The middle image is a compressed version of the scan to the left. The right image shows the data reduction. The grey points are the representative regions. The compression ratio is 47 %, 513983 points have been removed out of a total of 1082958 points (Color figure online)

Fig. 15
figure 15

The left image is a segmented scan. The middle image is a compressed version of the scan to the left. The right image shows the amount of reduction. The grey points are the representative regions. The compression ratio is 55 %, with 300628 points removed out of 546964 total points (Color figure online)

Fig. 16
figure 16

These facades have been segmented by their periodicity. The Chebyshev metric identified the rectangles containing each periodic feature (Color figure online)

Fig. 17
figure 17

Histograms are used as shape signatures to efficiently distinguish between distinct regions. Left: some regions of high curvature points. Right: same regions with their shape histograms. Similar regions have similar histograms

Fig. 18
figure 18

The same scan with different levels of compression applied. The amount of compression is determined by changing the threshold which separates the histogram shape signatures, as described in Sect. 5.2. From left to right the compression rates are 57 %, 47 % and 13 %, and the histogram difference thresholds, γ are 2000, 700, and 500

Fig. 19
figure 19

At left the compression rates achieved at various histogram thresholds. At right the cumulative error at various histogram thresholds. The error is measured by averaging the Hausdorff distance (in meters) between the original regions and the representative that replaces them

Fig. 20
figure 20

Scans are compressed by repeating a representative region in place of the original data (Color figure online)

Fig. 21
figure 21

The left image is the original scan. The middle images shows the feature centroids superimposed over the compressed scan. The right image shows all the points remaining after the compressed points are removed. The centroid of the architectural features discovered in Sect. 4.5 improve the compression results by aligning the groups by floor

6 Registering in Real Time

Registration is a major topic of research in 3D computer vision community. Here we present a realtime registration technique which builds upon the information acquired from the analysis of the periods of the windows and balconies. This strategy could be employed by two or more scanners, for simultaneous realtime multi-view scene acquisition. Our algorithm is extremely efficient and robust for scenes with repetitive patterns. Figures 22 and 23 show synthetic views, while Figs. 24 and 25 show two distinct real world scans registered using our algorithm.

Fig. 22
figure 22

Two views were synthesized from a single scan by applying a known transformation T. Our method then estimates a transformation \(T_{*}^{-1}\), using the methods of Sect. 6

Fig. 23
figure 23

A single group of regular features occurring in both scans is sufficient to achieve an accurate registration

Fig. 24
figure 24

Fours views of two distinct scans that have been registered following Sect. 6. The points from one scan are shown in blue and points from the other scan are shown in black

Fig. 25
figure 25

Six views of two distinct scans one with blue points the other with black points. Despite a wide variance of resolution and incident angle of the laser, the transformation between the views can be computed online and automatically (Color figure online)

6.1 Orthonormal Bases of Aggregated Columns

Solving for the rotation between two overlapping scans is essentially a problem of defining bases. We must find the orthonormal bases that describe the two sets to be registered. Let us call these two sets S 1 and S 2. For each point x 1S 1 its corresponding point in S 2 called x rot is given by:

$$ x_{rot} = B_2B_1^{-1}x_1, $$
(5)

where B 1 is the matrix composed of the orthonormal bases of S 1 and B 2 is a similar matrix for S 2. To find the orthonormal bases B 1 and B 2 we find the vertical vector v and the orthogonal facade vector f. The facade vector is established by taking the vector between any two points within the facade, called f start and projecting this vector onto v. The facade vector is the vector between f start and its projection onto v. By subtracting out the component of f start in the direction of v we ensure that f is orthogonal to v.

$$ f = f_{start} - \bigl((f_{start} \cdot v) v\bigr). $$
(6)

Both vectors f and v are normalized. Now their cross product yields the complete set of three basis vectors:

$$ B_* = \left (\begin{array}{c} f \\ v \\ f \times v \end{array} \right ) $$
(7)

Figure 26 shows two sets of basis vectors and the repeated architectural features they were computed from.

Fig. 26
figure 26

Basis vectors for two window groups

6.2 Computing Corresponding Pairs

Once the rotation matrix \(B_{2}B_{1}^{-1}\) is known, solving for the translation between the scans is reduced to finding a pair of corresponding points. More precisely, we wish to find two locations in each scan of a single real world point. The translation between scans t is then:

$$ t = x_2 - B_2B_1^{-1}x_1, $$
(8)

where x 1 and x 2 are two data points that correspond to the same real world point.

There are several ways to compute x 1 and x 2. The probability that the laser will measure the exact same point in two different scans of the same scene is vanishingly small. Therefore choosing a pair of points and simply calculating the translation vector between them is bound to introduce some error. It is preferable to rely on higher level knowledge about the scene. To this end we propose two different methods for computing translations between different views. The methods are appropriate in different circumstances. The window width sequence method is best when the areas of overlap between the views are large and the resolution is high. The corner vector method is not as reliable as the window sequence but it requires less resolution and overlap.

6.2.1 Window Width Sequences

Registering two scans by a pair of architectural features should result in an alignment of all of the overlapping features. We compute the rotation matrix based on two sets of these features, the window groups from Sect. 4.4. If the mapping between feature groups is correct it should ensure that all the adjacent feature groups are also aligned. It is easy to check this by relying on their width. Expanding from the pair of feature groups we have aligned, we should find similar feature groups in all the remaining sets of adjacent features until the edge of the overlap.

6.2.2 Corner Vector

If the scan does not contain significant or high resolution overlap it may be difficult to distinguish this pattern of matching features because there might be only one set of features that we can align. In these circumstances we rely again on the macro data about major planes found in Sect. 4.1. Now we can use the ground plane estimate to compute the vertical coordinate of the translation vector.

For corresponding aggregated window column groups we try to match the first columns from each set of columns. This locks in two of the coordinates of the translation vector. To find the vertical translation we resort to our major plane detection from Sect. 4.1. The cross product of the facade and ground normals gives the direction of the line of intersection between the two planes (assuming they are not parallel):

$$ i_d = n_g \times n_f. $$
(9)

Now this directional vector must be translated to align with the intersection of the major planes. For this translation vector we combine the z-coordinate of the ground plane and x and y coordinates of a point on the facade plane in the current scan line. Essentially dropping a line from the facade plane until it hits the ground.

6.3 Evaluation of Registration

Since the number of feature groups is relatively small it is not impractical to compute many possible transformations and then evaluate them all and select the best to register the scans. So, many transformations are generated and one will be selected as the best. We eliminate spurious transformations in a number of ways. First we ensure that the feature groups to be aligned have similar periods as detected in Sect. 4.3. Furthermore we only compute transformations between window groups of similar widths. Then we evaluate the quality of the transformation by checking the distance between aligned columns once a transformation has been applied to one of them.

To further evaluate our registration algorithm we performed ICP on our results. Surprisingly, there was a notable degradation in the quality of the registration. After close analysis we concluded that the ICP algorithm tried to align regions like parked cars that did not actually correspond to the same real world structures. This experience illustrated another strength of our algorithm. Not only do we compute registration in realtime far faster than ICP, we base our registration on facade features which are lasting elements of the scene. ICP treats all close points equally, so scans performed on different days are susceptible to error when a new set of cars are parked on the street, for example. The average error returned by the ICP algorithm was 0.65 meters. ICP measures an error 0.914 meters before registration but much of this error comes from false correspondences and does not actually indicate a flaw in the registration. The scan before and after ICP is applied is shown in Fig. 27, clearly the registration before ICP is more accurate.

Fig. 27
figure 27

The left image shows two views registered by our algorithm. The right images shows the registration after ICP is performed. Notice how the quality of the registration has been degraded by ICP. The lamppost, mailbox and trash can all exhibit significant error, which was not present in the real time registration on the left

The transformation matrix output by our algorithm was compared against the transformation matrix computed from an interactive registration program where a user selects several corresponding points in the two scans. The interactive program then uses a least squares technique to find the transformation matrix between the scans. In our tests the per entry difference between the rotation matrices from our automatic approach and the interactive program never exceeded 0.0084 and averaged 0.0037. Alternative methods for comparing rotation matrices are presented in Huynh (2009). The translation vectors differed by an average of 0.12 meters.

7 Conclusion and Future Work

We have shown an online algorithm for detecting regular features and we have exploited this regularity to compress and register urban range scans. The meso-level information of the regular architectural fenestration provides tremendous insight into the scan. It will no doubt be shown to be applicable to a host of other applications in urban scene processing.

7.1 Smarter Scanners

One field ripe for research is integrating online algorithms of the kind described here into the hardware of the laser scanner. By processing the scan as it is acquired the offline computational resources are freed to pursue higher level recognition tasks.

Compression in particular could be helpful in an onboard application. A robot navigating a complex scene may simply not have the resources to store an entire point cloud of an urban scene. Integrating compression into the act of processing circumvents this need.

7.2 Parallel Range Scanning

Another area of future work will employ the registration algorithm presented here to allow simultaneous realtime registering of scans from multiple scanners. This application could prove critical in range finding sensor networks. If registration can occur as the scans are recorded each sensor could leverage all the global information acquired as it becomes available. In this way each sensor could enjoy the speed and robustness that parallelism affords without sacrificing the global knowledge of a serial scan.