1 Introduction

Understanding and modeling dynamic 3D urban scenes in detail can effectively support urban environmental sustainability. In particular, efficient large-scale spatiotemporal urban monitoring is critical in various engineering, civilian, and military applications such as urban and rural planning, mapping, updating geographic information systems, housing valuation, population estimation, surveillance, transportation, archeology, architecture, augmented reality, 3D visualization, virtual tourism, location-based services, navigation, wireless telecommunications, disaster management, and noise, heat, and exhaust spreading simulations. All these subjects are actively discussed in the geography, geoscience, and computer vision scientific communities, both in academia and industry. Companies like Google and Microsoft are seeking to include extensively up-to-date 2D and 3D urban models in their products (Microsoft Virtual Earth and Google Earth).

The prohibitively high cost of manually generating such 2D and 3D dynamic models/maps explains the urgent need for automatic approaches, especially when one considers modeling and monitoring time-varying events within complex urban areas. In addition, there is an emerging need for algorithms that provide generic solutions through the automated and concurrent processing of all available data, like panchromatic, multispectral, hyperspectral, radar, and digital elevation data. However, processing multimodal data is not straightforward (He et al. 2011b; Longbotham et al. 2012; Berger et al. 2013) and requires novel, sophisticated algorithms that, on the one hand, can accept as input multiple data from different sensors, with different dimensions and different geometric, spatial, and spectral properties, and, on the other hand, can automatically register and process them.

Furthermore, despite the important research activity during the last decades, there are still important challenges towards the development of automated and accurate change detection algorithms (Lu et al. 2011c; Longbotham et al. 2012; Hussain et al. 2013). It is generally agreed, and verified by the quantitative evaluation of recent research efforts, that there is still no single, generic, automated methodology appropriate for all applications and/or all case studies. The maximum accuracy of the 2010 multimodal change detection contest was just over 70 % (Longbotham et al. 2012). This accords with Wilkinson's earlier report on the minor improvement in the performance of classification algorithms over the preceding decade (Wilkinson 2005). Even the latest machine learning techniques have not contributed much to the remote sensing data classification problem: standard approaches usually reach levels of accuracy similar to the newer, more advanced ones. Therefore, several aspects of the change detection process, towards the efficient 2D and 3D updating of geospatial databases, pose emerging challenges.

The aforementioned need for more intensive research and development is further boosted by the available and growing petabyte archives of geospatial (big) data. Along with the increasing volume and reliability of real-time sensor observations, the need for high-performance big geospatial data processing and analysis systems, able to model and simulate geospatially enabled content, is greater than ever. At both global and local scales, the vision of a global human settlement layer (Craglia et al. 2012) with multiscale volumetric information describing our planet in detail in 4D (spatial dimensions plus time) requires generic, automated, efficient, and accurate new technologies.

Towards this end, a significant amount of research is still focusing on the design, development, and validation of novel computational change detection procedures. Among them, those concentrating on forest change detection hold the biggest share (Hansen and Loveland 2012) due to their importance for climate change and biodiversity and the suitability of past and current satellite remote sensing sensors, their spatial and spectral properties, and operational monitoring algorithms (Phelps et al. 2013). Cropland, vegetation, and urban environments are the other change detection and monitoring targets that benefit most from the current and upcoming very high-spatial-resolution, very high-spectral-resolution, and very high-temporal-resolution remote sensing data.

This chapter focuses on recent advances in change detection computational methods for monitoring urban environments from satellite remote sensing data. In order to study change detection methodologies, their main key components are identified and examined independently, and the most recent techniques are presented in a systematic fashion. In particular, publications from the last 6 years are reviewed, and recent research efforts are classified into categories according to the type of algorithm employed, the type of geospatial data used, and the type of detection target. Earlier reviews (Lu et al. 2004, 2011c; Radke et al. 2005) give a detailed summary of the efforts during the preceding decades (Singh 1989). Moreover, the focus here is on change detection methods applied to medium-, high-, and very high-resolution data, since for urban environments smaller scales do not provide spatial products with accuracies suitable for local geospatial database updating. In the following sections, several aspects of the change detection targets, end products, the relevant remote sensing data, preprocessing, and core change detection algorithms are detailed and discussed.

2 Change Detection Targets and End Products

The main detection targets in urban environments are land cover, land use, urban growth, impervious surfaces, man-made objects, buildings, and roads. In that order, they call for increasing spatial accuracy, from regional to more local scales. Therefore, each query for monitoring a specific phenomenon, terrain class, or terrain object poses specific constraints that describe the end product of the procedure: What is the detection target and its desired location and size? What is the desired time period? What is the required spatial accuracy?

The answers to the aforementioned questions fix various parameters and largely determine the approaches and algorithms that should be employed. Table 10.1 summarizes the recent research activity on change detection and monitoring of urban environments according to the desired product and target on which each recent study has focused. Land cover/land use, urbanization, impervious surfaces and man-made objects, buildings, and slums or damaged buildings comprise the five dominant categories.

Table 10.1 Change detection and monitoring targets

These categories do not refer to different terrain objects but rather to a hierarchical terrain-object relation, as in most model-based descriptions (ontologies, grammars, etc.). This categorization reflects both the different end-product requirements, like their spatial scale, and the type of urban objects/terrain classes that are required for detection and monitoring. Along with the different specifications of the currently available remote sensing data, this is, actually, the main reason why these categories form distinct groups in the literature in terms of data, methods, and validation practices. In particular, the biggest share is held by the efforts that focus either on land cover/land use or on building change detection.

On the one hand, the opening of the United States Geological Survey's Landsat data archive (Woodcock et al. 2008; Wulder et al. 2012) and the newly launched Landsat Data Continuity Mission (LDCM) have enabled easy access to a record of historical data and related studies on monitoring mainly land-cover/land-use changes, updating national land cover maps, and detecting the spatiotemporal dynamics, the evolution of land-use change, and landscape patterns. With this increased data availability and the expanding open data policies in both the USA and the EU, similar studies can respond to the current demand for improving the capacity to mass-process big data and enable efficient spatiotemporal modeling and monitoring.

On the other hand, a significant amount of research has focused on local scales and building change detection. Novel, promising automated algorithms have been developed which allow one to automatically detect, capture, analyze, and efficiently model single buildings in dynamic urban scenes. Mainly model-based approaches, like parametric, structural, statistical, procedural, and grammar-based ones, have been designed to detect buildings and spatiotemporal changes, both in 2D and in 3D. Google Earth, Virtual Earth, and other government applications and databases must remain up to date, and therefore the motivation for automated algorithms instead of costly manual digitization procedures remains high.

Apart from the requirements regarding the multiple properties of the desired product and detection target, the change detection procedure is affected by a number of parameters, including spatial, spectral, thematic, and temporal constraints; radiometric, atmospheric, and geometric properties; and soil moisture conditions. Therefore, a sophisticated methodology should be able to address, in a preprocessing step, all the various constraints and conditions that will enable an effective and accurate core spatiotemporal analysis. In the following two subsections, certain important aspects regarding the multiple properties of the remote sensing data are detailed, along with a brief description of the required preprocessing steps.

3 Remote Sensing Data

During the last decades, important technological advances in optics, photonics, electronics, and nanotechnology have allowed the development of frame and push-broom sensors with high spatial and spectral resolution. New satellite missions are scheduled continuously, and remote sensing data of gradually higher quality from both passive and active sensors will become available. However, today, data with high spatial and spectral resolution are restricted to either military or commercial use. In Table 10.2, a summary of the currently available satellite remote sensing sensors which were employed in recent change detection studies is reported, along with the major data specifications and cost. Apart from their spatial, spectral, and temporal resolution, the reported cost refers to archive data (apart from the Cartosat-1 case) and is associated with the specific product/mode which offers the highest spatial resolution. The cost refers to list prices (e-geos 2013; GeoStore 2013) and has been estimated for the minimum ("per scene") order and per square kilometer (km²) in order to ease the comparison. It is obvious that, when moving from the medium- and high-spatial-resolution products to the very high-resolution ones, the cost per square kilometer increases significantly, i.e., from about 1 € to about 20 € per km². The high-spatial-resolution SAR satellite sensors also offer costly products, priced similarly to or higher than the optical ones. In addition, it should be noted that, as we move from smaller to larger spatial scales, the number of images required to cover the same area increases significantly. Therefore, the cost of delivering change detection geospatial products increases exponentially as we move from regional land cover/use or urban growth studies to local building change detection and cadastral map updating.

Table 10.2 Summary of the currently available satellite remote sensing data, which have been employed in recent studies, and their major specifications and cost

In Table 10.3, recent change detection approaches are classified according to the type of remote sensing data used in each recent study. Medium- to high-resolution optical data, radar data, and multimodal data (Fig. 10.1) hold the biggest share of the recent change detection research activity. However, 3D data (satellite or airborne) and vector data from existing geodatabases are gaining increasing attention for spatiotemporal monitoring at local scales. At regional scales, the research activity, as already mentioned, has been empowered by the expanding US and EU open data policies. Moreover, new open products which include basic but necessary preprocessing procedures will further boost research and development for quantifying global and regional transitions given the changing state of global/regional climate, biodiversity, food, and other critical environmental/ecosystem issues. Web-enabled Landsat data is an example, where large volumes of preprocessed Landsat 7 Enhanced Thematic Mapper Plus data are operationally offered to ease the mapping of land-cover extent and change (Hansen et al. 2014).

Fig. 10.1

A multimodal, multitemporal remote sensing dataset covering a 25 km² region in the East Prefecture of Attica, Greece. The corresponding DEM is shown in the upper right image. Middle row: an aerial orthomosaic acquired in 2010 (left), a WorldView-2 image acquired in 2011 (middle), and a WorldView-2 image acquired in 2010 (right). Bottom row: a QuickBird image acquired in 2009 (left), a QuickBird image acquired in 2007 (middle), and a TerraSAR-X image acquired in 2013 (right)

Table 10.3 Remote sensing data and recent change detection and monitoring research studies

4 Data Preprocessing

Certain factors, such as the radiometric calibration and normalization between multitemporal datasets, the quality of atmospheric corrections, the quality of data registration, the complexity of the landscape and topography under investigation, the analyst's skill and experience, and, last but not least, the selected change detection algorithm, are directly associated with the quality of the change detection product. The initial preprocessing stage, which current efforts try to standardize (Yang and Lo 2000; Chander et al. 2009; Hansen et al. 2014), addresses important issues regarding the radiometric, atmospheric, and geometric corrections of the available datasets, transforming them from raw data to geospatial ready-for-analysis data. However, there are still a number of challenges that should be addressed (Villa et al. 2012) in order to exploit raw big remote sensing data and transform them into big geospatial reflectance surfaces; the most important is automation. In the following two subsections, the main preprocessing procedures are briefly described and discussed. It should be noted that, for Landsat datasets, certain protocols have been proposed and widely adopted (Han et al. 2007; Vicente-Serrano et al. 2008), including (i) geometric correction, (ii) calibration of the satellite signal to obtain "top of atmosphere" radiance, (iii) atmospheric correction to estimate surface reflectance, (iv) topographic correction, and (v) relative radiometric normalization between images obtained at different dates. The latter is not required in cases where, e.g., an absolute physical correction model has been employed. The radiometric processing should come first; however, this is not always the case since, for example, former Landsat datasets in Europe were already distributed geometrically corrected (e.g., level 1 system corrected from the European Space Agency).

4.1 Radiometric and Atmospheric Correction and Calibration

The main goal of radiometric and atmospheric corrections is to model the various sources of noise which affect the information captured by the sensor and make it difficult to differentiate the surface signal from any type of noise. Despite the efforts persistently made to calibrate satellite sensors, correct lifetime radiometric trends, and minimize the effect of atmospheric noise, certain studies have shown that the application of accurate sensor calibrations and complex atmospheric corrections does not guarantee the multitemporal homogeneity of (e.g., Landsat) datasets, since complete atmospheric properties are difficult to quantify and simplifications are commonly assumed (Han et al. 2007). Therefore, a cross-calibration across the data stack and time series can address the problem.

Given a remote sensing optical dataset, the first step is to convert the captured raw digital numbers to "top of atmosphere" radiance values (Chander et al. 2009; Villa et al. 2012, and the references therein). The second step is to model the upward and downward irradiance, which is constrained by gas absorption and scattering from water molecules and aerosols. Complex radiative transfer models simulate the atmosphere and light interactions along the sun-to-terrain and terrain-to-sensor trajectories. Although such an atmospheric correction can account for signal attenuation and to some extent restore the intercomparability of satellite images taken on different dates, "top of atmosphere" values are widely used directly for inventory and ecosystem studies or in procedures based on post-classification change detection approaches. However, recent studies indicate that cross-calibration and atmospheric corrections are required prior to relative normalization, since otherwise certain remote sensing products and accurate biophysical parameters like vegetation indices cannot be calculated (Vicente-Serrano et al. 2008).
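The first of these steps can be sketched with the standard DN-to-reflectance conversion; note that the gain, bias, and solar irradiance (ESUN) values used in any real application are not chosen freely but come from the sensor's published calibration metadata (cf. Chander et al. 2009):

```python
import math

def dn_to_toa_reflectance(dn, gain, bias, esun, d_au, sun_elev_deg):
    """Convert a raw digital number (DN) to top-of-atmosphere reflectance.

    gain, bias   : band-specific sensor calibration coefficients
    esun         : mean exoatmospheric solar irradiance for the band
    d_au         : Earth-Sun distance (astronomical units) at acquisition
    sun_elev_deg : solar elevation angle in degrees
    """
    radiance = gain * dn + bias                 # at-sensor spectral radiance
    zenith = math.radians(90.0 - sun_elev_deg)  # solar zenith angle
    return (math.pi * radiance * d_au ** 2) / (esun * math.cos(zenith))
```

For instance, with hypothetical coefficients `gain=0.5`, `bias=1.0`, a DN of 100 yields a radiance of 51; the reflectance then follows from the solar geometry of the scene.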

The third step is to model the modified illumination conditions due to the scene topography. In order to simplify this extremely complex setting, in practice one concentrates on the shaded areas, which deliver lower than expected reflectance, and on the sunny areas, which deliver higher than expected reflectance; then, usually, a Lambertian terrain behavior is assumed, or non-Lambertian effects are modeled. Last but not least, a relative radiometric normalization should be performed between the images of the time series/dataset in cases where an absolute physical correction model was not employed. The normalization process is based on a linear comparison between the images acquired on different dates. To this end, linear regression and other automated techniques like pseudo-invariant feature regression have given promising results (Vicente-Serrano et al. 2008), indicating that relative radiometric normalization is an absolutely essential step to ensure high levels of homogeneity between the images of the dataset.
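A minimal sketch of this relative normalization, assuming a mask of pseudo-invariant pixels has already been identified (the mask itself is an input here, not derived):

```python
import numpy as np

def relative_normalization(subject, reference, invariant_mask):
    """Linearly map a subject image onto the radiometry of a reference
    image using pseudo-invariant pixels (assumed unchanged between dates)."""
    s = subject[invariant_mask].astype(float)
    r = reference[invariant_mask].astype(float)
    slope, intercept = np.polyfit(s, r, 1)  # least-squares line r ~ slope*s + b
    return slope * subject.astype(float) + intercept
```

The fitted line is applied to the entire subject image, so changed pixels keep their relative contrast while the overall radiometry matches the reference date.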

4.2 Geometric Corrections and Data Registration

Once the radiometric and atmospheric calibration has been performed, the next step is to register, co-register, and geo-reference the available data. Early studies (Dai and Khorram 1998; Roy 2000; Bovolo et al. 2009) underlined the serious problems caused by data misregistration and how significantly it affects the change detection product. Therefore, in order to develop operational detection systems, the registration problem must be addressed in an optimal way (Klaric et al. 2013). This is a common challenge in most computer vision, medical imaging, remote sensing, and robotics applications, and it is the reason why image registration, segmentation, and object detection hold the biggest share of modern image analysis and computer vision research and development (Sotiras et al. 2013).

Briefly, the image registration task involves three main components: a transformation model, an objective function, and an optimization method. The success of the procedure naturally depends on the transformation model and the objective function. The dependency on the optimization process follows from the fact that image registration is inherently an ill-posed problem; in almost all realistic scenarios and computer vision applications, the registration is ill-posed according to Hadamard's definition of well-posed problems. Therefore, devising each component of the registration algorithm in such a way that the requirements (regarding accuracy, automation, speed, etc.) are met is a demanding and challenging process (Eastman et al. 2007; Le Moigne et al. 2011; Sotiras et al. 2013).

The intensive research on invariant feature descriptors (Lowe 2004) has empowered automation of the feature detection (points, lines, regions, templates, etc.) procedure. Combined with model fitting approaches, an optimal set of parameters of the selected mathematical model (i.e., transformation, deformation, etc.) is estimated through iterative non-deterministic algorithms that exclude outliers. Area-based methods, mutual information methods, and descriptor-based algorithms restore data deformations, and through resampling the data are warped to the reference. Furthermore, since effective modeling requires rich spatial, spectral, and temporal observations over the structured environment, recent approaches fuse data from various sensors, i.e., multimodal data (Fig. 10.1). The various sensors include frame and push-broom cameras and multispectral, hyperspectral, and thermal cameras, while the various platforms include satellite, airborne, UAV, and ground systems.
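The iterative, outlier-excluding model fitting described above can be sketched as a minimal RANSAC loop estimating a 2D affine transform from matched feature points. This is an illustrative sketch only: the point correspondences are assumed given (e.g., from descriptor matching), and an operational pipeline would add degeneracy checks and refinement.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform P (3x2) with dst ~ [src | 1] @ P."""
    A = np.hstack([src, np.ones((len(src), 1))])
    P, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return P

def ransac_affine(src, dst, n_iter=300, tol=1.0, seed=0):
    """RANSAC: repeatedly fit the model on minimal 3-point samples and keep
    the consensus set with the most inliers, excluding outlier matches."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    ones = np.ones((len(src), 1))
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=3, replace=False)
        P = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(np.hstack([src, ones]) @ P - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    # final least-squares refit on the largest consensus set
    return fit_affine(src[best], dst[best]), best
```

The non-deterministic sampling is what allows the procedure to tolerate a substantial fraction of gross mismatches in the correspondence set.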

In multimodal data registration (De Nigris et al. 2012; He et al. 2011b), mutual information techniques have become a standard reference, mainly in medical imaging (Legg et al. 2013; Wachinger and Navab 2012; Sotiras et al. 2013). However, being an area-based technique, the mutual information process has natural limitations. To address them, combinations with other, preferably feature-based, methods have gained high robustness and reliability. To speed up the computation, scale space representations (Tzotsos et al. 2014) are employed along with fast optimization algorithms. However, when the data have significant rotation and/or scaling differences, these methods either fail or become computationally very expensive. Future development on addressing the multimodal data challenges may concentrate more on feature-based methods, where appropriate invariant and modality-insensitive features (Heinrich et al. 2012) can provide a reliable and adequate volume of features for generic and automated multimodal data registration.
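As a minimal sketch of the area-based criterion underlying these techniques, mutual information between two co-registered images can be estimated from their joint intensity histogram:

```python
import numpy as np

def mutual_information(img_a, img_b, bins=64):
    """Estimate mutual information (in nats) from the joint intensity
    histogram of two equally shaped images."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of image A
    py = pxy.sum(axis=0, keepdims=True)       # marginal of image B
    nz = pxy > 0                              # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))
```

A registration driven by this criterion maximizes the score over the parameters of the chosen transformation model, since misalignment weakens the statistical dependence between the two intensity distributions.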

To sum up, the described radiometric and geometric corrections between all the available data of a given time series transform raw data to valuable “ready-for-analysis” geospatial datasets and ensure an optimal exploitation from the following, in the processing chain, core change detection algorithms.

5 Unsupervised Change Detection Methods

Unsupervised approaches are based on automated computational frameworks that usually produce binary maps indicating whether a change has occurred or not. Therefore, standard unsupervised change detection techniques are not usually based on a detailed analysis of the concept of change but rather compare two or more images by assuming that their radiometric properties are similar except where real changes occurred (Bruzzone and Bovolo 2013). However, this assumption is not satisfied in realistic scenarios, especially at local scales. In particular, the captured complexity of terrain objects, with different spectral behaviors at different dates and environmental conditions, is significant, especially in very high-resolution data. That is the main reason why, although unsupervised change detection methods have so far validated their effectiveness on medium- to high-resolution data, usually under pixel-based image analysis, they become less accurate when the spatial resolution reaches submeter levels (Hussain et al. 2013).

Unsupervised approaches have accumulated a significant amount of research effort since (i) on the one hand, they are more attractive from an operational point of view, allowing automation without the need for manual collection of reference data/samples, and (ii) on the other hand, they can possibly address the aforementioned challenges and move towards a semantic change labeling by identifying the exact land-cover transition.

In Table 10.4, a summary of the recent unsupervised change detection studies is presented. Recent methods are classified according to the core technique on which they were mainly based. The majority of recent studies rely on standard direct comparisons, data transformations, data fusion, multiscale analysis, and clustering. Most of the recent unsupervised methods are also pixel-based approaches and focus on the pixel-by-pixel analysis of the multispectral multitemporal data.

Table 10.4 Summary of recent change detection studies classified according to their unsupervised or supervised nature and the main technique that they were based on

More specifically, after a certain computation (like a transformation, a spectral analysis, etc.), they calculate the magnitude of the change vectors and apply a thresholding technique in order to detect possible changes.
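A minimal pixel-based sketch of this magnitude-and-threshold scheme, using change vector analysis with an Otsu threshold (one common automatic choice; many alternatives exist in the cited literature):

```python
import numpy as np

def change_magnitude(t1, t2):
    """Change vector analysis: per-pixel Euclidean norm of the spectral
    difference between two co-registered (H, W, bands) images."""
    return np.linalg.norm(t2.astype(float) - t1.astype(float), axis=-1)

def otsu_threshold(values, bins=256):
    """Otsu's method: the threshold maximizing between-class variance."""
    hist, edges = np.histogram(values.ravel(), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w = hist / hist.sum()
    w0 = np.cumsum(w)             # class-0 (no-change) probability per cut
    mu = np.cumsum(w * centers)   # class-0 first moment per cut
    valid = (w0 > 0) & (w0 < 1)
    between = np.zeros_like(w0)
    between[valid] = (mu[-1] * w0[valid] - mu[valid]) ** 2 \
        / (w0[valid] * (1 - w0[valid]))
    return centers[np.argmax(between)]
```

A binary change map then follows as `change_magnitude(t1, t2) > otsu_threshold(...)`, with all pixels above the threshold flagged as candidate changes.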

An important number of approaches are based on ratios, kernels, change vector analysis, and indices (Bovolo et al. 2012; Canty and Nielsen 2008; Celik 2009; Chen et al. 2011; Dalla Mura et al. 2008; Renza et al. 2013; Demir et al. 2013; Gueguen et al. 2011; Marchesi and Bruzzone 2009; Marpu et al. 2011; Volpi et al. 2012). Other efforts are based on multiscale analysis like wavelets (Bovolo et al. 2013; Celik and Ma 2010, 2011; Dalla Mura et al. 2008; Moser et al. 2011), fuzzy theory (Ling et al. 2011; Luo and Li 2011; Robin et al. 2010), and clustering and MRFs (Aiazzi et al. 2013; Celik 2010; Ghosh et al. 2011, 2013; Salmon et al. 2011; Moser and Serpico 2009; Moser et al. 2011; Wang et al. 2013).

Spectral mixture analysis (Yetgin 2012), level sets (Bazi et al. 2010; Hao et al. 2014), and data fusion approaches (Du et al. 2012, 2013; Moser and Serpico 2009; Ma et al. 2012; Gong et al. 2012) also hold an important share. Moreover, despite the fact that their core algorithms are supervised, recently proposed automated studies are based on object-based techniques (Bouziani et al. 2010), semi-supervised support vectors (Bovolo et al. 2008), and neural networks (Pacifici and Del Frate 2010).

In addition, among the recent unsupervised techniques, a clear advantage belongs to those that can model the dependence between spatially adjacent image neighbors, either by standard texture or morphological measures or by clustering, Markov random fields, Bayesian networks, and context-sensitive analysis. Such frameworks (Celik 2009, 2010; Ghosh et al. 2013; Volpi et al. 2012; Bruzzone and Bovolo 2013) can cope more efficiently with the complexity depicted in very high-resolution data.

Promising experimental results from the application of an unsupervised change detection procedure, based on the iteratively reweighted multivariate alteration detection (IR-MAD) algorithm (Nielsen 2007; Canty and Nielsen 2008), are presented in Figs. 10.2, 10.3, and 10.4. Based on the invariance properties of the standard MAD transform, where the orthogonal differences are assumed to contain the maximum information in all spectral bands, an iterative reweighting procedure involving no-change probabilities can account for the efficient detection of changes. In the upper row of Fig. 10.2, the QuickBird image acquired in 2007 is shown, while the corresponding QuickBird image acquired in 2009 is presented in the middle row. The changes detected after the application of the IR-MAD and post-processing morphological algorithms are shown in the bottom row. All changes represent the new buildings that were constructed in the region after 2007. The detected changes/buildings are overlaid on the 2009 image and shown in red; the ground truth data are shown in the same manner in green.
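A single, unweighted MAD iteration can be sketched via canonical correlation analysis as follows. This is only the core transform: the full IR-MAD algorithm of Nielsen (2007) additionally reweights each pixel by its no-change probability, derived from the chi-square statistic below, and iterates the estimation to convergence.

```python
import numpy as np
from scipy.linalg import eigh

def mad_variates(X, Y):
    """One unweighted MAD iteration on two (n_pixels, n_bands) images.

    Returns the MAD variates M (differences of paired canonical variates)
    and a per-pixel chi-square change statistic (large => likely change)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    S = np.cov(np.hstack([Xc, Yc]), rowvar=False)
    Sxx, Sxy = S[:p, :p], S[:p, p:]
    Syx, Syy = S[p:, :p], S[p:, p:]
    # canonical correlations via two symmetric generalized eigenproblems;
    # eigh normalizes eigenvectors so the canonical variates have unit variance
    rho2, A = eigh(Sxy @ np.linalg.solve(Syy, Syx), Sxx)
    _, B = eigh(Syx @ np.linalg.solve(Sxx, Sxy), Syy)
    A, B = A[:, ::-1], B[:, ::-1]               # descending correlation order
    rho = np.sqrt(np.clip(rho2[::-1], 0.0, 1.0))
    U, V = Xc @ A, Yc @ B
    V *= np.sign(np.sum(U * V, axis=0))          # align signs of canonical pairs
    M = U - V                                    # MAD variates, var = 2(1 - rho)
    sigma2 = np.maximum(2.0 * (1.0 - rho), 1e-12)
    chi_sq = np.sum(M ** 2 / sigma2, axis=1)
    return M, chi_sq
```

Thresholding `chi_sq` (e.g., against a chi-square quantile with `n_bands` degrees of freedom) yields the binary change map that the morphological post-processing then cleans up.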

Fig. 10.2

Unsupervised change detection in multitemporal high resolution data. Upper row: The raw QuickBird image, acquired in 2007, in RGB321 (left) and R432 (right). Middle row: The raw QuickBird image, acquired in 2009, in RGB321 (left) and R432 (right). Bottom row: The detected changes (they are all new buildings), overlaid in the 2009 image, are shown with a red color. Ground truth data are shown in green

Fig. 10.3

The changes (buildings) detected in an unsupervised manner and the corresponding ground truth data. Upper row: a map with the possible changes after the application of the regularized iteratively reweighted MAD algorithm (left) and after thresholding (right). Bottom row: the detected changes (buildings) after the application of morphological post-processing (left) and the ground truth (right). All the changes (new buildings) have been successfully detected. The quantitative evaluation reported a low detection completeness of around 60 % and a much higher detection correctness of 95 %

Fig. 10.4

The detected changes (new buildings) in 3D after the application of the unsupervised change detection procedure on QuickBird 2007 and 2009 satellite data. The detected new buildings in 3D are shown in the upper part, while the 3D buildings from the ground truth data are shown at the bottom. After a close inspection, one can observe the low completeness and high correctness detection rates of the unsupervised change detection algorithm

In Fig. 10.3, the IR-MAD output and the corresponding binary image after thresholding are shown in the upper row. The detected changes (new buildings) after the application of a morphological post-processing procedure and the corresponding ground truth data are shown in the bottom row. All the changes (all new buildings) were successfully detected by the unsupervised procedure. The quantitative evaluation reported a low detection completeness of around 60 % and a high detection correctness of 95 %. This can also be observed in Fig. 10.4, where the detected changes have been associated with the corresponding DEM. The detected new buildings in 3D are shown in the upper part of Fig. 10.4, while the 3D buildings from the ground truth data are shown at the bottom.
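The completeness and correctness figures quoted above follow the standard per-pixel detection-rate definitions, which can be computed directly from binary change masks:

```python
import numpy as np

def completeness_correctness(detected, truth):
    """Detection rates from boolean change masks.

    completeness = TP / (TP + FN): fraction of true changes that were found
    correctness  = TP / (TP + FP): fraction of detections that are real
    """
    tp = np.logical_and(detected, truth).sum()
    completeness = tp / truth.sum()
    correctness = tp / detected.sum()
    return completeness, correctness
```

The combination observed here, low completeness with high correctness, thus corresponds to an algorithm that misses part of each changed footprint but raises very few false alarms.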

6 Supervised Change Detection Methods

Supervised classification approaches traditionally detect changes through a post-classification process (which usually compares independent classifications of each date). This process also enables the detection of actual class transitions instead of a binary "change or no change" product. However, errors from each step and each individual classification propagate and accumulate in the end product. Moreover, collecting reliable, dense training sample sets can be difficult and time-consuming in certain cases (e.g., historical data), or even unrealistic if one has to deal with extensive dense time series and multimodal data. In practice, however, the post-classification approach is nowadays the most standard one, especially at global and regional scales, for land-cover, land-use, and urbanization monitoring.
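At its core, a post-classification comparison reduces to a class-transition (cross-tabulation) matrix between the two classification maps, a minimal sketch of which is:

```python
import numpy as np

def transition_matrix(labels_t1, labels_t2, n_classes):
    """Cross-tabulate class labels of two dates: entry (i, j) counts pixels
    classified as class i at date 1 and class j at date 2. Off-diagonal
    entries are the detected class transitions; diagonal entries are
    pixels labeled as unchanged."""
    m = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(m, (labels_t1.ravel(), labels_t2.ravel()), 1)
    return m
```

The total changed area is then the matrix sum minus its trace, which also makes explicit why classification errors at either date propagate directly into spurious off-diagonal transitions.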

At more local scales and for very high-resolution data, the standard supervised approach is an object-oriented one under an object-based image analysis framework (Blaschke 2010). Multilevel segmentation and supervised classification are the key processes there (Tzotsos et al. 2011, 2014). Recent object-based change detection approaches (Table 10.4) include scale space filtering and multivariate alteration detection (Doxani et al. 2012), the combination with multi-view airborne laser scanning data (Hebel et al. 2013), and the detection of impervious surfaces (Xian and Homer 2010), shaded areas (Zhou et al. 2009), landslides (Lu et al. 2011b), and building damage after earthquakes (Brunner et al. 2010). Another promising combination is to employ data mining techniques under an object-based framework in order to address big datasets and dense, long-term time series (Schneider 2012).

To this end, algorithms focusing on knowledge discovery in databases aim at extracting/mining nontrivial, implicit information from unstructured datasets. In particular, for geospatial datasets, data mining techniques exploit spatial and nonspatial properties in order to discover the desired knowledge/data. Dos Santos Silva et al. (2008) proposed a data mining framework which associates each change pattern with one predefined type of change by employing a decision-tree classifier to describe shapes found in land-use maps. Boulila et al. (2011) employed fuzzy sets and a data mining procedure to build predictions and decisions; accounting for the imperfections inherent in the spatiotemporal mining process, they proposed an approach towards a more accurate and reliable extraction of spatiotemporal land-cover changes. Vieira et al. (2012) introduced a joint object-based data mining framework in which, instead of the standard supervised classification step, a data mining algorithm was employed to generate decision trees from certain training sets. Schneider (2012) proposed an approach that exploits multi-seasonal information in dense time stacks of Landsat imagery, comparing the performance of maximum likelihood, boosted decision trees, and support vector machines. Experimental results indicated only minor differences in the overall detection accuracy between boosted decision trees and support vector machines, while for band combinations across the entire dataset, both classifiers achieved similar accuracy and success rates.

This observation is in accordance with similar recent studies (Table 10.4) which employ powerful machine learning classifiers (Bovolo et al. 2010; Chini et al. 2008; Camps-Valls et al. 2008; Habib et al. 2009; Pacifici and Del Frate 2010; Demir et al. 2012; Pagot and Pesaresi 2008; Taneja et al. 2013; Volpi et al. 2013) for supervised change detection and indicate why they are so popular for remote sensing classification and change detection problems. However, machine learning algorithms are usually time-consuming, and efforts towards a more computationally efficient design and algorithmic optimization are required (Habib et al. 2009). Moreover, at local scales and for very high-resolution data, including 3D or vector data, there is much room for research and development in order to exploit the entire multimodal datasets. In particular, an important outcome from the recent 2012 multimodal remote sensing data contest (Berger et al. 2013) is that none of the submitted algorithms actually exploited in full synergy the entire available dataset, which included very high-resolution multispectral images (with a 50 cm spatial resolution for the panchromatic channel), very high-resolution radar data (TerraSAR-X), and LiDAR 3D data from the city of San Francisco, USA.

Therefore, at local scales but not only, novel, sophisticated, generic solutions should exploit the recent advances in 2D and 3D building extraction, reconstruction, and 3D city modeling, which have gained considerable attention during the last decade due to emerging engineering applications including augmented reality, virtual tourism, location-based services, navigation, wireless telecommunications, disaster management, etc. In a manner similar to post-classification change detection, monitoring the structured environment, both in 2D and 3D, can build on these recent advances in building extraction and reconstruction through, for instance, a direct comparison between two different dates. In the following subsection, recent building detection and modeling methods are briefly reviewed.

7 Computational Methods for 2D and 3D Building Extraction and Modeling

The accurate extraction and recognition of man-made objects from remote sensing data has been an important topic in remote sensing, photogrammetry, and computer vision for more than two decades. Urban object extraction is still an active research field, with the focus shifting to detailed object representation, the use of data from multiple sensors, and the design of novel generic algorithms.

Recent quantitative results from the ISPRS (WGIII/4) benchmark on urban object detection and 3D building reconstruction (Rottensteiner et al. 2013) indicated that, in 2D, buildings can be recognized and separated from the other terrain objects; however, there is room for improvement towards the detection of small building structures and the precise delineation of building boundaries.

In 3D, none of the methods was able to fully exploit the spatial accuracy of the available datasets. Therefore, although 3D building reconstruction may be considered a solved problem for visualization purposes, novel efficient algorithms are still required for geospatial applications where geometrically and topologically accurate building models are needed. Moreover, regarding other urban objects such as trees, there is also much room for research and development towards their efficient extraction and discrimination in complex urban regions.

In Table 10.5, a summary of recent building and road network extraction and reconstruction approaches is presented. They are classified into three categories, i.e., 2D building detection/extraction, 3D building extraction/reconstruction, and road network detection. Buildings, among the other man-made objects, dominate the research interest due to the aforementioned emerging applications that their efficient modeling can enable. In general, advanced methods are likely to have a model-based structure and take into consideration the available intrinsic information, such as color, texture, shape, and size, as well as topological information, such as location and neighborhood. Novel expressive ways for the efficient modeling of urban terrain objects both in 2D and 3D have already received significant attention from the research community. Beyond the standard generic, parametric, polyhedral, and structural models, novel ones have recently been proposed, such as statistical models, geometric shape priors, and procedural modeling with L-system or other shape grammars (Rousson and Paragios 2008; Matei et al. 2008; Zebedin et al. 2008; Poullis and You 2010; Karantzalos and Paragios 2010; Simon et al. 2010). Furthermore, focusing on automation and efficiency, certain optimization algorithms have been developed for model-based object extraction and reconstruction, such as discrete optimization algorithms, Markov random fields, and Markov chain Monte Carlo (Szeliski et al. 2008).

Table 10.5 Summary of recent building and road network extraction and reconstruction approaches

Focusing on 2D building boundary detection, various techniques have been proposed (Champion et al. 2010; Katartzis and Sahli 2008; Senaras et al. 2013; Karantzalos and Argialas 2009; Stankov and He 2013; Wegner et al. 2011; Huang and Zhang 2012; Zhou et al. 2009), including unsupervised, semi-supervised, and supervised ones.

Even if the end product is in 2D, certain studies are based on 3D data (e.g., DSM, LiDAR) (dos Santos Galvanin and Porfírio Dal Poz 2012; Yang et al. 2013; Rutzinger et al. 2009; Sampath and Shan 2010). In particular, buildings can be detected by calculating the difference between object and terrain heights. When other data are also available, data fusion and classification approaches are employed. Other approaches focus on processing very high-resolution satellite data, and certain of those have proposed algorithms for building detection from just a single aerial or satellite panchromatic image (Benedek et al. 2010; Karantzalos and Paragios 2009; Katartzis and Sahli 2008; Wegner et al. 2011; Huang and Zhang 2012).
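
The height-difference principle mentioned above can be sketched in a few lines: subtracting a terrain model (DTM) from a surface model (DSM) yields a normalized DSM (nDSM), whose above-ground heights can be thresholded into building candidates. The toy arrays and the 2 m minimum height below are illustrative assumptions, not values from any cited study.

```python
# Hedged sketch of height-based building detection: DSM - DTM = nDSM,
# then threshold the above-ground heights. All values are illustrative.
import numpy as np

dsm = np.array([[101.0, 101.2, 108.5, 108.7],
                [101.1, 101.3, 108.6, 108.8],
                [101.0, 101.1, 101.2, 101.3]])   # surface heights (m)
dtm = np.full_like(dsm, 101.0)                   # bare-earth heights (m)

ndsm = dsm - dtm                 # object heights above ground
building_mask = ndsm > 2.0       # assumed minimum building height of 2 m

print(building_mask.astype(int))
```

In practice, the mask would be further cleaned with morphology and combined with spectral cues (e.g., NDVI) to separate buildings from trees of similar height.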

The reported qualitative and quantitative validation indicates that automated detection is hindered by certain factors. The major difficulty is scene complexity, as most urban scenes usually contain very rich information and various cues. These cues, which are mainly other artificial surfaces and man-made objects, possess important geometric and radiometric similarities with buildings. In addition, occlusions, shadows, perspective differences, and data quality issues significantly constrain the operational performance of the developed automated algorithms.

In 3D, a number of methods are based only on a digital surface model or a set of point clouds (Lafarge et al. 2010; Rutzinger et al. 2009; Sampath and Shan 2010; Sun and Salvaggio 2013; Sirmacek et al. 2012; Heo et al. 2013). Others exploit multimodal data such as optical and 3D data (Karantzalos and Paragios 2010) or optical and SAR data (Sportouche et al. 2011). Even in 3D, there are efforts based on a single optical satellite image (Izadi and Saeedi 2012) or a single SAR one (Ferro et al. 2013). Image-based 3D reconstruction has also been demonstrated from user-contributed photos (Irschara et al. 2012) and multiangular optical images (Turlapaty et al. 2012).

Experimental results demonstrating the performance of supervised classification algorithms combined with post-classification procedures for building extraction from high-resolution satellite data are shown in Figs. 10.5 and 10.6.

Fig. 10.5
figure 5

Supervised 2D building detection based on data classification algorithms. Upper row: A Pleiades image acquired in 2013 (left) and the result from a standard minimum distance classification algorithm, showing only classes related to buildings (right). The quantitative evaluation reported a low detection overall quality of 62 %. Middle row: The result from a standard maximum likelihood classification algorithm with an overall detection rate of 67 % (left). A SVM classifier scores higher, with an overall detection quality of 74 % (right). Bottom row: The detected buildings, after post-classification processing of the SVM output, labeled and shown with different colors (left). The detected buildings overlaid on the raw Pleiades image (right)

Fig. 10.6
figure 6

The detected buildings in 3D after the application of a supervised SVM classifier and post-processing procedures on the high spatial resolution Pleiades data (top). Scene buildings in 3D as extracted from the ground truth data (bottom)

Standard pixel-based classification algorithms like the minimum distance, maximum likelihood, and SVMs deliver detection outcomes with a low correctness rate. In particular, in the upper left part of Fig. 10.5, the raw Pleiades image acquired in 2013 is shown. The result from the minimum distance algorithm, showing only classes related to buildings, is shown in the upper right part of the figure. The quantitative evaluation reported a low detection overall quality of 62 % for the minimum distance algorithm. With the same ground samples, the maximum likelihood algorithm reported an overall detection rate of 67 % and the result is shown in the middle row (left). The SVM classifier scores higher with an overall detection quality of 74 % (middle right).
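
The three pixel-based classifiers compared above can be prototyped with scikit-learn, where nearest-centroid classification corresponds to minimum distance and quadratic discriminant analysis is a Gaussian maximum-likelihood classifier. This is a hedged sketch on synthetic two-band samples; the spectral values and class names are invented for illustration and do not reproduce the Pleiades experiment.

```python
# Hedged sketch comparing minimum distance (nearest centroid), Gaussian
# maximum likelihood (quadratic discriminant analysis), and SVM classifiers
# on synthetic two-band pixel samples. All values are illustrative.
import numpy as np
from sklearn.neighbors import NearestCentroid
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two spectral bands, two classes: "building" and "vegetation"
building = rng.normal(loc=[0.35, 0.30], scale=0.03, size=(50, 2))
vegetation = rng.normal(loc=[0.10, 0.45], scale=0.03, size=(50, 2))
X = np.vstack([building, vegetation])
y = np.array(["building"] * 50 + ["vegetation"] * 50)

for clf in (NearestCentroid(), QuadraticDiscriminantAnalysis(), SVC(kernel="rbf")):
    clf.fit(X, y)
    pred = clf.predict([[0.33, 0.31]])[0]   # a pixel near the building cluster
    print(type(clf).__name__, "->", pred)
```

On such well-separated synthetic clusters all three agree; the accuracy differences reported in the text arise on real imagery, where class distributions overlap and are non-Gaussian.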

After post-classification procedures, including mathematical morphology, calculation of object radiometric and geometric properties, and spatial relation analysis, the result of the supervised classification has been refined and its correctness rate is significantly improved. The detected buildings, based on the SVM output, which have been recognized and labeled by the algorithm, are shown in 2D with different colors in the bottom row of Fig. 10.5 (left). The detected buildings overlaid on the raw Pleiades image are also presented in the bottom right of Fig. 10.5. Moreover, the still low detection rate can be observed in Fig. 10.6, where the detected buildings are shown in 3D (top) against all scene buildings as extracted from the ground truth data (bottom).
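
The morphological part of such a post-classification refinement can be sketched with scipy.ndimage: a binary opening removes salt-and-pepper noise, connected components are labeled, and components below a minimum area are rejected. The toy mask, the 2 x 2 structuring element, and the 4-pixel area threshold are illustrative assumptions, not the chapter's actual parameters.

```python
# Hedged sketch of post-classification refinement: morphological opening,
# connected-component labeling, and minimum-area filtering of a binary
# building mask. All parameters and the mask itself are illustrative.
import numpy as np
from scipy import ndimage

raw = np.array([[0, 1, 0, 0, 0, 0],
                [0, 0, 1, 1, 1, 0],
                [0, 0, 1, 1, 1, 0],
                [0, 0, 1, 1, 1, 0],
                [1, 0, 0, 0, 0, 0]], dtype=bool)  # noisy classifier output

opened = ndimage.binary_opening(raw, structure=np.ones((2, 2)))
labels, n = ndimage.label(opened)                  # label candidate buildings

# Keep only components covering at least 4 pixels
sizes = ndimage.sum(opened, labels, index=range(1, n + 1))
refined = np.isin(labels, [i + 1 for i, s in enumerate(sizes) if s >= 4])
print(refined.astype(int))
```

Here the two isolated pixels are removed while the compact block survives; real pipelines would add the radiometric and geometric property checks mentioned in the text.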

8 Conclusion and Future Directions

Computational change detection is a mature field that has been extensively studied by the geography, geoscience, and computer vision scientific communities during the past decades. An important amount of research and development has been devoted to comprehensive problem formulation, generic and standardized procedures, various applications, and validation for real and critical earth observation challenges.

In this review, we have made an effort to provide a comprehensive survey of the recent developments in the field of 2D and 3D change detection approaches in urban environments. Our approach was structured around the key change detection components, i.e., (i) the properties of the change detection targets and end products; (ii) the characteristics of the remote sensing data; (iii) the initial radiometric, atmospheric, and geometric corrections; (iv) the unsupervised methodologies; (v) the supervised methodologies; and (vi) the building extraction and reconstruction algorithms.

The aim was to focus our presentation on recent approaches that have not been covered in previous surveys, and therefore, advances during the last 6 years have been reviewed. In addition, the change detection approaches were classified according to the monitoring targets (Table 10.1) and according to the remote sensing data they were designed to process (Table 10.3). The unsupervised and supervised methods were classified according to the core algorithm on which they are mainly based (Table 10.4). Moreover, a summary of the currently available satellite remote sensing sensors employed in recent studies, along with their major specifications and cost, is given in Table 10.2. Recent approaches focusing on 2D and 3D building extraction and modeling are given in Table 10.5, providing important computational frameworks which can be directly or partially adopted for addressing the change detection problem more efficiently. In particular, in a similar way to change detection approaches based on post-classification comparison procedures, building changes can be extracted by comparing multitemporal building detection maps and reconstructed urban/city models.

Based on the current status and state of the art, the validation outcomes of relevant studies, and the special challenges of each detection component separately, the present study highlights certain issues and insights that may be applicable for future research and development.

8.1 Need to Design Novel Multimodal Computational Frameworks

In accordance with recent reports (Longbotham et al. 2012; Zhang 2012; Berger et al. 2013), this survey highlights that the fusion of multimodal, multitemporal data is considered to be the ultimate solution for optimized information extraction. Currently, there is a lack of single, generic frameworks that can process and exploit all available geospatial data in full synergy. This is a rather crucial issue, since effective and accurate detection and modeling require rich spatial, spectral, and temporal (remote or not) observations over the structured environment acquired (i) from various sensors, including frame and push-broom cameras and multispectral, hyperspectral, thermal, and radar sensors, and (ii) from various platforms, including satellite, airborne, UAV, and ground systems. This is not a trivial task, and a lot of research and development is thus required.

8.2 Need for Efficient Unsupervised Techniques Able to Identify “From-To” Change Trajectories

Unsupervised and supervised approaches hold equal shares of research interest. In particular, the unsupervised ones in many cases achieve the same overall detection accuracy levels as the supervised ones do (e.g., Longbotham et al. 2012). This is a really promising fact, given the possible capability of (near) real-time response to urgent and time-critical change detection tasks without training samples available. In dense time series and big geospatial data analysis, this seems, also, the only possible direction. However, most applications require end products which report the detailed land-cover/land-use “from-to” change trajectories instead of a binary “change or no-change” map (Lu et al. 2011c; Bruzzone and Bovolo 2013). The need for incorporating spatial context and relationships into the detection procedure and for introducing automated algorithms able to detect changes with a semantic meaning is underlined by the present study.
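
Deriving “from-to” trajectories from two co-registered land-cover maps amounts to cross-tabulating the class labels per pixel into a transition matrix. The sketch below uses toy class maps and class names chosen purely for illustration.

```python
# Hedged sketch of "from-to" change trajectories: cross-tabulate two
# co-registered land-cover maps into a transition matrix. Toy data only.
import numpy as np

classes = ["water", "vegetation", "builtup"]
map_t1 = np.array([[0, 1, 1],
                   [1, 1, 2],
                   [2, 2, 2]])     # class indices at date 1
map_t2 = np.array([[0, 2, 1],
                   [2, 1, 2],
                   [2, 2, 2]])     # class indices at date 2

# transition[i, j] = number of pixels going from class i (t1) to class j (t2)
n = len(classes)
transition = np.zeros((n, n), dtype=int)
np.add.at(transition, (map_t1.ravel(), map_t2.ravel()), 1)

for i in range(n):
    for j in range(n):
        if i != j and transition[i, j] > 0:
            print(f"{classes[i]} -> {classes[j]}: {transition[i, j]} px")
```

The diagonal of the matrix holds unchanged pixels; the off-diagonal cells are exactly the semantic “from-to” trajectories that a binary change map cannot provide.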

8.3 The Importance of Open Data Policies

Furthermore, this survey highlights the importance of open data policies. This is mainly due to the fact that the extensive recent research activity at regional scales has been boosted by the currently increasing US and EU open data policies, and mostly by the opening of the United States Geological Survey’s Landsat data archive (Woodcock et al. 2008; Wulder et al. 2012), including current and future missions. Even when not offered in a raw or quality-controlled format or under a formal open data framework, the increasing availability of Google Earth/Street View and Microsoft Bing Maps/Streetside data can also ease certain applications and studies. All these open data and open source (regarding software) initiatives and policies ensure the availability of big geospatial data and of remote sensing datasets spanning densely over longer periods, which, moreover, can enable further research towards quantifying global and regional transitions given the changing state of the urban environment, global and regional climate, biodiversity, food, and other critical environmental/ecosystem issues.

8.4 The Importance of Automation

The aforementioned availability of open big geospatial data imposes, as never before, the need for automation. Despite the important advances and the available image processing technologies, powered mainly by the computer vision community, the skills and experience of an analyst are still very important for the success of a classification/post-classification procedure (Weng 2011; Lu et al. 2011c), requiring human intervention which is labor-intensive and subjective. Therefore, introducing generic, automated computational methods in every change detection component is of fundamental importance.

8.5 The Importance of Innovative Basic Research in the Core of the Change Detection Mechanism

Recent state-of-the-art change detection, classification, and modeling methodologies do not reach high (>80 %) levels of accuracy and success rates when complex and/or extensive regions, local scales, relatively small urban objects, or dense time series are explored in the urban environment (Wilkinson 2005; Longbotham et al. 2012; Berger et al. 2013; Rottensteiner et al. 2013). Thus, there is a strong need for designing new core classification, change detection, and modeling approaches able to properly handle the high amount of spatial, spectral, and temporal information from the new generation of sensors and to search effectively through huge archives of remote sensing datasets.

8.6 The Importance of Operational Data Preprocessing

Most standard remote sensing algorithms and techniques (classifications, indices, biophysical parameters, model inversions, object detection, etc.) assume cloud-free data that have already been radiometrically, atmospherically, and geometrically corrected. However, this is not an operationally solved problem yet. The production of a European cloud-free mosaic, twice per year, was not 100 % feasible despite the availability of three different satellite sensors and considerable flexibility in the date windows around every region (Hoersch and Amans 2012). Moreover, in accordance with recent relevant studies (Vicente-Serrano et al. 2008), this survey underlines the fact that it is essential to ensure the homogeneity of multitemporal datasets through operational radiometric and geometric data corrections, including sensor calibration, cross-calibration, atmospheric, geometric, and topographic corrections, and relative radiometric normalization using objective statistical techniques. Being able to reconcile the different spectral signatures that the same invariant terrain object exhibits across time series data, and to construct cloud-free reflectance surfaces operationally (Villa et al. 2012), will further boost the effectiveness and applicability of remote sensing methods in emerging urban environmental applications.
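
A common statistical technique for the relative radiometric normalization mentioned above is to fit a linear gain/offset between the two dates over pseudo-invariant features (PIFs) and apply it to the subject image. The reflectance values and the PIF set below are illustrative assumptions, not measurements from any cited study.

```python
# Hedged sketch of relative radiometric normalization via linear regression
# on pseudo-invariant features (PIFs). All reflectance values are invented.
import numpy as np

# Reflectances of assumed invariant targets in the reference and subject images
ref_pifs = np.array([0.10, 0.25, 0.40, 0.55, 0.70])
sub_pifs = np.array([0.14, 0.32, 0.50, 0.68, 0.86])   # subject image is brighter

gain, offset = np.polyfit(sub_pifs, ref_pifs, 1)      # least-squares line
print(f"gain={gain:.3f} offset={offset:.3f}")

subject_band = np.array([0.20, 0.50, 0.80])
normalized = gain * subject_band + offset             # radiometrically matched
print(np.round(normalized, 3))
```

The same regression would be computed per band; robust variants weight or reject PIFs whose residuals suggest they actually changed between the dates.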

To sum up, the significant research interest in urban change detection and modeling is driven by real, critical, and current environmental and engineering problems, which pose emerging technological questions and challenges. Recent advances in the domain indicate that state-of-the-art remote sensing and computer vision approaches can be fused and further expanded towards the fruitful and comprehensive exploitation of open, big geospatial data.