1 Introduction

Image classification is the process of assigning pixels or image objects to corresponding land cover/land use categories. The correct assignment of pixels or objects to their respective classes cannot happen in isolation, because the recognition of an object is influenced by the presence of surrounding objects as well as by the overall scene context. Traditional pixel-based classification systems fail to assign all pixels to the appropriate class because they rely solely on low level pixel information, and they often produce topological errors, such as finding a car on top of a building, due to spectral similarities. Adding contextual information to a classification expert system can prevent such errors and improve the classification results (Kumar et al. 2008; Marques et al. 2011; Cressie 2015). In fact, high level image information such as shape compactness, area, the distance separating an object from the closest road or parking area, and the mean spectral value within a specific band can together form a combination of attributes that describes a unique object. However, finding the right combination of parameters is not an easy task given the large amount of information generated by the segmentation process. Moreover, methods presented in the current literature rely on a trial and error process and lack objective techniques for assessing the quality of the image information to be considered for classification (Meinel and Neubert 2004; Blaschke 2010). This leaves the analyst with very little control over the classification outcomes. In this paper we propose an approach based on Bayesian probabilities that evaluates the quality of image information through a classification simulation before it is integrated into a classification system, in order to maximise classification accuracy.
Through this technique, the analyst gains greater control over the classification results by deciding, on the basis of the simulation results, which image information to include.

2 Data and Methodology

2.1 Dataset

High resolution satellite data such as GeoEye or Ikonos imagery would be best suited to urban land cover/land use classification (Aguilar et al. 2012). Due to budget constraints, we instead chose a 0.5 m aerial photograph provided free of charge by the National Geospatial Information (NGI) in Cape Town. The 0.5 m resolution colour image was acquired on 14 April 2014 and has four spectral bands: red (R), green (G) and blue (B) in the visible spectrum, plus near infrared (NIR). Because of the large extent of the area, a subset of the image was created without compression in order to preserve data quality, as compression alters image information (Campbell 2007). The vector data used to extract object metrics such as area, shape compactness and spatial distances between objects was produced by digitizing object outlines, because polygon outline digitizing remains the most accurate technique for object metric extraction (Chang 2008). The snapping distance while digitizing was set to 2, as larger values can alter objects’ shapes (Chang 2008).

2.2 Modelling the Urban Scene

In urban analysis, modelling a scene means identifying the most relevant high level image information that describes the objects within the scene. When it is very challenging to separate certain objects within an urban scene based on spectral properties, they can often be discriminated successfully using size measures such as area (Pozzi and Small 2002). Area describes an object more reliably than measures such as perimeter. Small buildings and cars, which sometimes share similar spectral properties in the red band, can be separated by area because a car has an area two to three times smaller than that of a building. Object shape measures such as shape compactness should always be considered in the analysis (Wentz 2000). Visually identifying and comparing objects on the basis of shape is easy and intuitive for humans, but difficult for artificial intelligence systems. As a result, numerous attempts have been made to quantify shape using measures such as shape compactness or the ratio of an object’s length to its width (Wentz 2000). In this study, shape compactness was estimated using Eq. (1):

$$ \text{shape compactness}=\frac{4\pi \cdot \text{Area}}{\text{Perimeter}^2} $$
(1)

This ratio returns values ranging from 0 to 1; values closer to 1 describe more compact shapes, while values closer to 0 characterize less compact shapes. The spectral signature of an object is very relevant in land cover/land use classification (Khedam and Belhadj-Aissa 2011). It can improve the discrimination of an object from others because the spectral signature of a given object differs from one band to another. For instance, the vegetation class can be separated from other classes based on its spectral response in the near infrared and red bands. In fact, 90% of the spectral information describing vegetation is contained in the near infrared and red bands, while only 10% is spread across the other bands (Bonn and Escadafal 1996). To identify spectral signatures that enable a good separation between the objects in the urban scene, we used the minimum distance feature space optimization technique (Kumar et al. 2008). Spectral signatures showing large distances between classes were considered suitable for separating them. A high resolution multispectral photograph offers good spatial detail that can be exploited in image classification. Objects within an image relate to each other through spatial relationships such as the distances between them. A good measure of the spatial relationship between objects is the Euclidean distance (Poelmans et al. 2013).
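As an illustration of Eq. (1) and of the Euclidean distance measure, the following minimal Python sketch computes both from digitized polygon vertices. It is not the workflow used in this study: the function names, the shoelace formula for area and the example coordinates are assumptions introduced here for illustration only.

```python
import math


def polygon_area_perimeter(vertices):
    """Return (area, perimeter) of a simple polygon given as [(x, y), ...].

    Area uses the shoelace formula; vertices are assumed to be ordered
    along the outline (clockwise or counter-clockwise), not self-intersecting.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def shape_compactness(vertices):
    """Eq. (1): 4 * pi * Area / Perimeter^2, a value in (0, 1]."""
    area, perimeter = polygon_area_perimeter(vertices)
    return 4.0 * math.pi * area / perimeter ** 2


def euclidean_distance(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


# Hypothetical outlines (map units), for illustration only.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]   # compact, building-like
strip = [(0, 0), (20, 0), (20, 1), (0, 1)]      # elongated, road-like

print(shape_compactness(square))   # pi / 4, about 0.785
print(shape_compactness(strip))    # much closer to 0
print(euclidean_distance((0, 0), (3, 4)))
```

A square gives a compactness of π/4, while the elongated strip falls well below it, which is the behaviour the ratio is meant to capture.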

Probabilistic analysis is widely used to handle uncertainty in applications such as aerial image classification (Flygare 1997). Bayesian networks provide a solid form of knowledge representation and a flexible approach to reasoning for predicting the values of non-observed variables (Table 1). A Bayesian network can predict a land cover classification outcome based on a single variable (the land cover type) and the data evidence available for the variable to be classified (Flygare 1997). The conditional probabilities were estimated in this study using Eq. (2).

$$ P\left(X \mid Y\right)=\frac{P\left(Y \mid X\right)P(X)}{P(Y)} $$
(2)
Table 1 An estimation of the land use/land cover classes’ states using probabilities

In Eq. (2), X and Y are the respective states of the parent nodes (type of land cover) and the child nodes (spectral, spatial and other attributes) for each variable. Table 1 illustrates the states of the parent node describing the building, road, sport field and grassland classes.
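To make Eq. (2) concrete, the sketch below applies Bayes' rule to a single piece of evidence, with P(Y) obtained from the law of total probability. The prior values and the likelihoods for roads and sport fields are hypothetical placeholders, not the values of Table 1; only the likelihoods for buildings (1.0) and grassland (0.25) echo the proportions of highly compact objects reported later in this section.

```python
def posterior(prior, likelihood):
    """Bayes' rule for one observed child state Y.

    prior[c]      = P(X = c)        (state of the parent node)
    likelihood[c] = P(Y | X = c)    (observed child state given the class)
    Returns P(X = c | Y) for every class c.
    """
    joint = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(joint.values())  # P(Y), by total probability
    if evidence == 0:
        raise ValueError("the observed state has zero probability under the model")
    return {c: value / evidence for c, value in joint.items()}


# Hypothetical priors over the four classes (assumed, not from Table 1).
prior = {"building": 0.4, "road": 0.3, "sport field": 0.1, "grassland": 0.2}

# P(high shape compactness | class); road and sport field values are assumed.
p_high_compactness = {
    "building": 1.0,
    "road": 0.1,
    "sport field": 0.5,
    "grassland": 0.25,
}

for land_cover, p in posterior(prior, p_high_compactness).items():
    print(f"P({land_cover} | high compactness) = {p:.3f}")
```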

After the states of the land cover classes were estimated, child nodes were associated with parent nodes based on the high level image information derived from the scene model. For this simulation, each class was described by at least two child nodes. Shape compactness values greater than 0.5 were considered high, values between 0.3 and 0.5 medium, and values smaller than 0.3 poor. On this basis, 100% of building objects were found to have high shape compactness, as they approximate real world shapes, while 15% of objects identified as roads had medium shape compactness and 25% of objects identified as grassland had high shape compactness.

To separate impervious surfaces from non-impervious surfaces, we chose to work with the red band because it contains the most spectral information characterising impervious surfaces. By analysing the reflectance of various objects in the scene, we found that most objects’ reflectance values fell below 80, between 80 and 140, or above 140. Accordingly, we labelled any signature greater than 140 as a “high spectral signature”, any signature between 80 and 140 as a “medium spectral signature” and any signature below 80 as a “low spectral signature” (Table 2).

Table 2 An example of states of the child nodes based on pixel reflectance in the red band
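The discretisation of child node states described above can be sketched as two small threshold functions. The thresholds (0.3 and 0.5 for shape compactness, 80 and 140 for red band reflectance) come from the text; the function names are hypothetical, and treating values that fall exactly on a boundary as "medium" is an assumption, since the text does not specify it.

```python
def compactness_state(value):
    """Map a shape compactness value in [0, 1] to a child node state."""
    if value > 0.5:
        return "high"
    if value >= 0.3:  # assumption: boundary values count as medium
        return "medium"
    return "poor"


def red_band_state(reflectance):
    """Map a red band reflectance value to a child node state."""
    if reflectance > 140:
        return "high"
    if reflectance >= 80:  # assumption: boundary values count as medium
        return "medium"
    return "low"


# Hypothetical objects: (shape compactness, mean red band reflectance).
objects = {"obj_1": (0.78, 165), "obj_2": (0.35, 110), "obj_3": (0.12, 60)}
for name, (compactness, red) in objects.items():
    print(name, compactness_state(compactness), red_band_state(red))
```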

After estimating the different states of the parent and child nodes, a classification simulation was run in the GeNIe/SMILE software to determine the probability of occurrence of each land use/land cover class given the states of the child and parent nodes. The simulation of a multi-criterion classification revealed that the building class would achieve a classification accuracy of 94.3% if the image classification were based on shape compactness values between 0.3 and 0.5 and a spectral signature greater than 140. The same combination of criteria would classify grassland with a poor accuracy of 5.7%, while roads would achieve a classification accuracy of 92.4%. Based solely on the high spectral signature criterion, the building class would achieve an accuracy of 51.9%, roads 12.3%, sport grounds 3.4% and grassland 32.3%. Based on the medium spectral signature criterion in the red band, the building class would achieve a poor accuracy of 40.9%, while the road and grassland classes would achieve 49.2% and 9.8% respectively. It therefore appears that no single decision criterion is sufficient and that maximum accuracy can be achieved by a combination of criteria.
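The effect of combining criteria can be illustrated with a small stand-in for the simulation. The sketch below is not the GeNIe/SMILE model used in this study: it assumes a simple network in which the child nodes are conditionally independent given the land cover class, and every probability in it is a hypothetical placeholder chosen only to show the mechanism.

```python
def combine_evidence(prior, cpts, observed):
    """Posterior over classes given the observed states of several child nodes.

    prior[c]             = P(class = c)
    cpts[node][c][state] = P(node = state | class = c)
    observed[node]       = observed state of that child node
    Assumes child nodes are conditionally independent given the class.
    """
    scores = {}
    for cls, p in prior.items():
        for node, state in observed.items():
            p *= cpts[node][cls][state]
        scores[cls] = p
    total = sum(scores.values())
    if total == 0:
        raise ValueError("observed evidence has zero probability under the model")
    return {cls: s / total for cls, s in scores.items()}


# All numbers below are hypothetical, not taken from Tables 1 and 2.
prior = {"building": 1 / 3, "road": 1 / 3, "grassland": 1 / 3}
cpts = {
    "compactness": {
        "building": {"high": 0.80, "medium": 0.15, "poor": 0.05},
        "road": {"high": 0.05, "medium": 0.15, "poor": 0.80},
        "grassland": {"high": 0.25, "medium": 0.35, "poor": 0.40},
    },
    "red_band": {
        "building": {"high": 0.6, "medium": 0.3, "low": 0.1},
        "road": {"high": 0.2, "medium": 0.6, "low": 0.2},
        "grassland": {"high": 0.3, "medium": 0.2, "low": 0.5},
    },
}

single = combine_evidence(prior, cpts, {"red_band": "high"})
both = combine_evidence(prior, cpts, {"red_band": "high", "compactness": "high"})
print("spectral criterion only:", round(single["building"], 3))
print("spectral + compactness: ", round(both["building"], 3))
```

With these placeholder values, adding the shape criterion raises the posterior probability of the building class above what the spectral criterion gives alone, which mirrors the conclusion that a combination of criteria is needed.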

2.3 Multi-scale Image Classification

After modelling the urban scene using object metrics extracted from the digitized polygons, the image was segmented and then classified using the multi-scale object classification expert system described in Fig. 1.

Fig. 1 Classification expert system. Final classifications are presented in rounded-edge shapes and the high level descriptive criteria in rectangle shape

3 Results

3.1 Segmentation

The parameters used for the three segmentation levels are reported in Table 3, and the resulting images for each segmentation level are presented in Fig. 2. The segmentation scale parameters of 100, 180 and 200 were chosen because they correctly delineate the major urban objects within their respective land use/land cover classes (Ikokou and Smit 2013).

Table 3 Multi-resolution segmentation parameters

Fig. 2 Multi-resolution segmentation results: (a) at a scale of 55 (level 0), (b) at a scale of 100 (level 1), (c) at a scale of 120 (level 2) and (d) at a scale of 200 (level 3)

3.2 Classification Results

An accuracy assessment was carried out to evaluate the classification, using the traditional pixel-based error matrix. The overall accuracy, the user’s and producer’s accuracies and the kappa coefficient were estimated. To this end, the classification results were exported in raster format and compared with their respective reference samples. The reference data was collected manually for each land cover/land use class, and sufficient ground samples were collected to account for the spectral variability within each class (Salehi et al. 2012). Our technique achieved an overall classification accuracy of 96% with a kappa coefficient of 92.03% (Fig. 3).
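The accuracy measures named above can all be derived from the error matrix. The following sketch shows the standard calculations; the matrix orientation (rows as classified labels, columns as reference labels), the function name and the two-class example counts are assumptions made for illustration and are not the data of this study.

```python
def accuracy_metrics(matrix):
    """Compute accuracy measures from a square error (confusion) matrix.

    matrix[i][j] = number of samples classified as class i whose
    reference label is class j. Every class is assumed to have at
    least one classified and one reference sample.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(matrix[i]) for i in range(n)]                       # classified
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # reference

    overall = correct / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)

    users = [matrix[i][i] / row_totals[i] for i in range(n)]      # 1 - commission error
    producers = [matrix[j][j] / col_totals[j] for j in range(n)]  # 1 - omission error
    return {
        "overall": overall,
        "kappa": kappa,
        "users": users,
        "producers": producers,
    }


# Hypothetical two-class error matrix (e.g. building vs. road sample counts).
example = [[45, 5],
           [5, 45]]
metrics = accuracy_metrics(example)
print(f"overall accuracy: {metrics['overall']:.2f}")
print(f"kappa: {metrics['kappa']:.2f}")
```

Kappa computed this way lies between -1 and 1, so a value reported as 92.03 corresponds to 0.9203 on that scale.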

Fig. 3 In the left image, all the large buildings were classified as a single class. With the use of contextual information, this single class was broken down into commercial and educational buildings

The overall classification results showed that selecting object features with a high potential to describe individual classes can reduce misclassification and improve accuracy. The use of Bayesian probabilities played a very important role in the feature selection that led to these satisfactory results.

4 Discussion and Conclusion

The classification system proposed in this study produced better results than those of Salehi et al. (2012) and Breytenbach et al. (2013), who used similar classification techniques but did not assess the quality of high level image information prior to the classification process. Morphological and topological features contributed to improving the separation between impervious surface classes, including roads, residential buildings, commercial buildings, educational buildings and parking areas. These results suggest that more consideration should be given to morphological features, rather than spectral features alone, when classifying impervious land use/land cover classes. The very promising classification results obtained in this study highlight the usefulness of evaluating the potential of image information prior to the classification process in order to achieve better results, especially when classifying very heterogeneous areas such as urban areas with high resolution imagery. Instead of relying on a trial and error selection of image information, this research presents a novel multi-scale image analysis system suitable for high resolution data when classifying complex urban environments. The method offers a robust, practical, fast and easy to use system for classifying high resolution imagery of urban areas. The technique overcame the spectral and spatial complexity of the study area, resulting in an overall classification accuracy of 96%. This degree of agreement between the classified objects and the corresponding references is very promising and shows the great benefit of considering only relevant image information for object-based classification.