Unsupervised mapping of a hybrid urban area in South Africa

Ikokou, Guy Blanchard; Smit, Julian

doi:10.1007/s12518-021-00379-y

Original Paper
Published: 07 June 2021

Volume 13, pages 619–643, (2021)

Applied Geomatics

Guy Blanchard Ikokou¹ &
Julian Smit²

Abstract

Hybrid urban areas are dominated by strong spectral mixtures from formal and informal housing units, which make them difficult to map even for the most robust classifier. Proposals to introduce other descriptive features, such as size, shape, texture, and context, into the classification process come with another drawback: how to ensure that the selected feature thresholds are optimal. Image segmentation, which is the backbone of object-based analysis, depends on a range of parameters including the scale parameter and the shape, smoothness, colour, and compactness weighting factors. Current techniques for selecting optimal segmentation scales give the remote sensing analyst control over only one parameter out of five (20%). This study proposes a classification strategy that gives the analyst control of 60% of the parameters to ensure an acceptable segmentation outcome. The study also proposes a feature selection approach that eliminates feature overlaps within the feature space which may not be observable within the original data. An automatic optimal parameter selection function is also proposed. Tested on a SPOT5 resolution merge image, the approach matched or outperformed the accuracy metrics of Kemper et al. (Int Arch Photogramm Remote Sens Spat Inform Sci 40(7): 1389, 2015), with overall accuracy, sensitivity, specificity, precision, and true skill statistic of 0.97, 0.96, 1, 0.942, and 0.95 respectively, against 0.97, 0.804, 0.98, 0.477, and 0.781. Similar trends are observed for the average errors of omission for built-up and non-built-up structures, at 0.042 and 0 against 0.196 and 0.164. The errors of commission for built-up and non-built-up structures were 0.060 and 0.008 respectively, compared to 0.523 and 0.585.

Introduction

Modern cities are generally composed of well-structured housing units and street patterns. The quest for better living conditions in major cities has resulted in the development of informal settlements within or at the peripheries of towns, creating towns or cities with a mixture of formal and informal housing structures (hybrid urban areas). Planning for service delivery in these informal areas requires detailed land cover information. Image classification has been widely used to extract land information from satellite and aerial images (Herold et al. 2003; Du Plessis 2015; Kohli et al. 2016; Debbage et al. 2017). Improvements in sensor resolution have resulted in highly detailed geospatial information. However, one drawback of such improvements is the high spectral variability within individual classes, which can lead to intra-class confusion due to similarities in spectral signatures. The situation is exacerbated in complex environments such as urban areas, where object spectral signatures are complex and unclear (Ben-Dor et al. 2001; Herold et al. 2004; Kuffer and Barros 2011). For instance, Heiden et al. (2001) reported that tile materials such as polyethylene, bitumen, and concrete mainly dominate roofs of residential buildings, while zinc materials dominate non-residential structures such as commercial buildings. The authors reported that low reflectance in long wavelengths was the main characteristic of roofs made of zinc, while strong reflectance in these wavelengths characterises roof materials made of chains of hydrocarbons such as polyethylene or bitumen. In addition, rooftops made of concrete have a low reflectance in short wavelengths.

Herold et al. (2002) and Herold et al. (2004) reported that the reflectance of roads increases with wavelength and that roads made of concrete and gravel can reach reflectance peaks in the infrared band. Concrete roads were described as having high reflectance in the visible light spectrum, while new asphalt roads show low reflectance in short wavelengths and high reflectance in long wavelengths including the visible and infrared. The authors reported that red tile and wood shingle roofs exhibit high reflectance in the infrared band. It is also argued that the presence of iron oxide in red tile roofs increases the absorption in the visible light (Weng and Quattrochi 2006). Moreover, high reflection in the green band is a key attribute of water bodies, while urban vegetation shows high reflectance in the red and infrared (Herold et al. 2003; Jilge et al. 2016). The development of very high spatial resolution sensors has also made objects’ shapes and contextual attributes crucial in image classification (Reigber et al. 2007; Meng et al. 2009; Chen et al. 2012). Steiniger et al. (2008) reported that similar spatial advantages can also be obtained from high-resolution aerial photographs.

Numerous image classification strategies of urban areas have been reported in the literature. Novack and Kux (2010) proposed an object-based classification strategy of an informal settlement in Sao Paulo in Brazil using a high-resolution Quickbird image. The approach used the Segmentation Parameter Tuner algorithm for the selection of optimal scale parameters. The principle underlying the selection process is that a parameter search is performed based on the fitness between a training sample drawn by the user and the segmentation produced by the algorithm (Costa et al. 2008). The authors extracted geometrical, spectral and textural segment features to train the object-based classification. The technique achieved a classification accuracy of about 70% with a kappa agreement of 65%. One drawback of the scale parameter selection process is the restriction in user defined search range as the proposed range may not include certain optimal scale parameters for the segmentation (Meyer and Niekerk 2016). An alternative image classification approach was presented in Odindi et al. (2012) who performed a land cover classification of Port Elizabeth using a Landsat image. The authors used the statistical K-means and ISODATA pixel-based classifiers to extract built up structures, green vegetation, water bodies, dune, and bare ground. Although the strategy has been widely used for land cover mapping (Abbas et al. 2016), the centroids estimated by the K-means are not always representative of their respective classes. The other limitation of the algorithm pointed out in Singh et al. (2011) is the presence of empty clusters during the classification. Furthermore, a comparison between the Iterative Self Organizing Data Analysis and eCognition has highlighted the superiority of the object-based algorithm as the latter relies on meaningful objects rather than individual pixels (Manakos et al. 2000). 
Similarly, Fundisi and Musakwa (2017) classified high-resolution Pleiades images of an urban area in Gauteng province (South Africa) using the ISODATA algorithm in ENVI software. The mapped urban area was dominated by vegetation cover, which explains the high overall classification accuracy of 85.5% and kappa agreement of 77%. More recently, Gxumisa and Breytenbach (2017) also pointed out the superiority of object-based classification approaches over their pixel-based counterparts when classifying a SPOT5 multispectral image covering the Soshanguve area in Gauteng province, South Africa. Object-based classification strategies have also been reported as more suitable for heterogeneous areas such as urban areas (Kemper et al. 2015; Degerickx et al. 2017; Kuffer et al. 2017; Ouerghemmi et al. 2017; Van der Linden et al. 2019). One major contribution to the good building extraction performance of the strategy proposed in Gxumisa and Breytenbach (2017) was the introduction of elevation data into the classification process to minimize the influence of pixel similarity among classes. However, to improve the accuracy further, the authors suggested the use of techniques such as principal component analysis to identify the spectral bands that best describe each individual class (Gxumisa and Breytenbach 2017).

This study proposes a feature selection method in which a principal component analysis first identifies candidate optimal descriptive features of objects. These features are then projected onto a Chebyshev’s matrix to eliminate feature overlaps that may not be observable in the original data. The final candidate features are qualified as unique after this “optimization”, in order to ensure an acceptable object identification during the classification process. The selection approach is tested on inter-class distance features to produce the most potent measures for separating each object from its neighbours. The proposed approach for selecting segmentation parameters includes a local search, through which candidate optimal combinations of scale parameter, compactness, and colour are identified to produce good quality segments at a single segmentation level. The obtained parameters are further processed through a global search tool to identify how many segmentation levels are needed to represent most of the land use/land cover classes in the imagery and to reveal the associated parameter combinations. An additional scale/compactness search function is proposed to automatically identify the best parameter matches for acceptable segmentation outcomes.

Study area and methods

Study area and data

To test our image classification strategy, we chose the city of Stellenbosch due to its small size (Fig. 1), its challenging landscape with mountains and hills, and its diversity in urban vegetation cover, building footprints, and water bodies including dams, swimming pools, and reservoirs. Stellenbosch is a small town in the Western Cape Province located at 33.9321° S and 18.8602° E. Stellenbosch municipality covers an area of 831 km². To the west and southwest, it extends as far as the urban edge of the Cape Town metropolitan area, while to the east and southeast it is bounded by mountain ranges. The western part of Stellenbosch municipality and the eastern part of Franschhoek valley are separated by mountains. The town has a population of 77,476 inhabitants, about 50% of whom live in suburbs including but not limited to Idasvallei, Coatesville, Die Boord, Brandwacht, Jamestown, Paradyskloof, and the shantytown of Kayamandi at the north-west periphery of the city. The university is located near the city centre, while some schools are spread within the city and the shantytown of Kayamandi. Land use and land cover in Stellenbosch are similar to those of most South African cities, including but not limited to roads, residential, urban vegetation, commercial, industrial, and educational buildings, and water bodies.

The data used in this study include a multispectral SPOT5 image at 10 m spatial resolution and a panchromatic SPOT5 image at 2.5 m spatial resolution. The historical imagery was captured on 20 November 2008. In addition to the satellite imagery, a 0.5 m high-resolution aerial photograph covering the study area was acquired from the National Geospatial Information office in Cape Town. All satellite data were supplied with metadata files by the South African National Space Agency (SANSA). The date of image capture did not matter for the study since the focus was on the land use/land cover classes; these may have expanded since 2008, but this has no technical implications for the presented methodology.

After the pre-processing of the satellite imagery, we segmented the enhanced image using 8 randomly selected scale parameters to collect the segments’ brightness attributes. The series of segmentations was done keeping the shape/colour and the compactness/smoothness parameters at 0.5, for equal influence on the results, and we gave each band a weight of 1 so that the multiresolution algorithm in eCognition takes into account all the spectral information made available by each band. The collected brightness measures were representative of all the land use/land cover classes involved in this study. In total, 1460 object spectral brightness measures were sampled across the image to avoid a biased representation of certain classes. It must be noted that the shape compactness we are referring to here is distinct from the segmentation parameter associated with the smoothness and will be computed as follows:

$$\text{Shape compactness}=\frac{4\pi \times \text{object area}}{{\left(\text{object perimeter}\right)}^{2}}$$

(1)
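As a minimal sketch of Eq. (1), the function below computes the shape compactness from an object's area and perimeter. The function name and the two test objects are hypothetical and are not taken from the study data.

```python
import math


def shape_compactness(area: float, perimeter: float) -> float:
    """Shape compactness of Eq. (1): 4*pi*area / perimeter**2."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2


# Hypothetical objects (not taken from the study data):
# a circle of radius 10 m and a 10 m x 10 m square.
circle = shape_compactness(area=math.pi * 10.0 ** 2, perimeter=2.0 * math.pi * 10.0)
square = shape_compactness(area=100.0, perimeter=40.0)
print(f"circle: {circle:.3f}, square: {square:.3f}")
```

The index equals 1 for a circle and π/4 (about 0.785) for a square, and it decreases as an outline becomes more elongated or irregular.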

Because of spectral similarities across land use/land cover classes within urban areas, attributes such as inter-object separation distances, perimeter measures, shape compactness indices, object lengths, widths, and length over width ratios will be considered in the classification process. Among these attributes, the inter-object separation distance seems to be the most complex, because in residential urban areas a large number of buildings may share the same proximity distance. Moreover, classes such as urban trees, which may be very close to residential buildings, may exhibit proximity distances similar to those of residential buildings. Under such circumstances, even the most robust classifier would have serious difficulties separating these classes. One solution we will explore and incorporate into the classification strategy is an approach that refines these distance topology relationships and offers optimised measures that minimize misclassifications.

To minimize computer storage and analysis time and to reduce the image analyst's effort, the size of the images was reduced. The satellite images and the aerial photograph used in this study were cropped using Gimp 2.1.0.8 software without compression, so as not to alter the pixel values or the image statistics (Campbell 2006).

Study methodology

Geometric corrections

The imagery provided for the study was SPOT5 imagery of level 1A, which requires geometric corrections before any further analysis (Sowmya et al. 2017; MohanRajan et al. 2020). Geometric correction of satellite imagery consists of modelling the relationship between image coordinates and ground coordinates. The first-order polynomial model was disregarded for our area since it is more suitable for flat landscapes and our study area is dominated by hills and mountains. Instead, the second-degree polynomial model was selected for the geometric corrections (Mather and Tso 2016). All the satellite images were geometrically corrected to the Lo 19 projection, which is a local projection system, using the corrected 2008 colour aerial photograph as a reference. The panchromatic satellite image was first registered to the aerial photograph to preserve the high spatial resolution, then the multispectral image was co-registered to the panchromatic satellite image using Erdas Imagine software. The nearest neighbour resampling method was used because, according to Mather and Tso (2016), it does not alter the original pixel values and produces fewer distortions than cubic convolution and bilinear interpolation (Baboo and Devi 2010). We used a total of 10 ground control points, which were identified in both satellite images and the aerial photograph. The total root mean square error of the correction of an image, computed from the residuals u and v of the n control points in the two coordinate directions, is estimated as follows (Rocchini and Rita 2005):

$$Total\ rms=\sqrt{\frac{1}{n}\sum \left({u}^{2}+{v}^{2}\right)}$$

(2)
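The sketch below illustrates the two steps described above: a least-squares fit of a second-degree polynomial to control points, followed by the total RMS of the residuals. It reads Eq. (2) as the square root of the mean of u² + v² over the n control points. The function names and the ten synthetic control points are hypothetical; this is not the Erdas Imagine implementation.

```python
import numpy as np


def design_matrix(xy):
    """Second-degree polynomial terms 1, x, y, xy, x^2, y^2 for each point."""
    x, y = xy[:, 0], xy[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])


def fit_second_degree(image_xy, reference_xy):
    """Least-squares coefficients mapping image to reference coordinates."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(image_xy), reference_xy, rcond=None)
    return coeffs  # shape (6, 2): one column per reference coordinate


def total_rms(u, v):
    """Square root of the mean squared residual over all control points."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sqrt(np.mean(u ** 2 + v ** 2)))


# Ten synthetic control points (hypothetical, in arbitrary units).
rng = np.random.default_rng(0)
image_xy = rng.uniform(0.0, 10.0, size=(10, 2))
true_coeffs = np.array([[5.0, -3.0], [1.0, 0.1], [0.1, 1.0],
                        [0.01, 0.0], [0.002, 0.0], [0.0, 0.003]])
reference_xy = design_matrix(image_xy) @ true_coeffs

coeffs = fit_second_degree(image_xy, reference_xy)
residuals = design_matrix(image_xy) @ coeffs - reference_xy
rms = total_rms(residuals[:, 0], residuals[:, 1])
print(f"total RMS on noise-free synthetic points: {rms:.2e}")
```

A second-degree model has six coefficients per coordinate, so ten control points leave four redundant observations per coordinate for estimating the residuals.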

Figure 2 presents the results of the image registration process. It can be observed that there is a continuation of linear features between the multispectral SPOT5 image on the right and the colour aerial photograph, revealing the success of the geometric correction strategy.

Reflectance normalisation

To collect remotely sensed data of lasting value, the data must be calibrated to physical units such as reflectance, because the radiance recorded by the sensor for each pixel is an apparent radiance. This apparent radiance is the combination of the radiance of the object on the earth surface and atmospheric effects (Rani et al. 2017). The estimation of ground reflectance requires three steps: the conversion of pixel DN values to apparent radiance, the conversion of the apparent radiance to apparent reflectance, and finally the conversion of the apparent reflectance to ground reflectance. For SPOT imagery, the conversion from pixel DN values to apparent radiance is done through the following (Mather and Tso 2016):

$${L}_{\lambda }=\frac{\text{Pixel DN value}}{\text{Gain}}+\text{Bias}$$

(3)

With $L_{\lambda }$, the apparent radiance. The conversion of the apparent radiance to the apparent reflectance is done through the following equation (Mather and Tso 2016):

$${\rho}_{\lambda }=\frac{\pi {L}_{\lambda }{d}^2}{ESUN_{\lambda}\cos {\theta}_s}$$

(4)

With $\rho_{\lambda }$ the apparent reflectance, $ESUN_{\lambda }$ the exo-atmospheric solar irradiance in $W\,m^{-2}\,\mu m^{-1}$, $d$ the Earth-Sun distance in astronomical units and $\theta_{s}$ the solar zenith angle. The Earth-Sun distance is estimated as follows, with $JD$ the Julian day of image acquisition (Mather 2004):

$$d=1-0.01674\cos \left( JD-4\right)$$

(5)
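The chain from DN to apparent reflectance can be sketched as follows. Three things in this sketch are assumptions rather than statements of the authors' procedure: the DN is divided by the gain (the usual convention for SPOT calibration coefficients), the day offset in the Earth-Sun distance is scaled by 0.9856 degrees per day (a factor found in commonly quoted versions of this approximation but not in Eq. (5) as printed), and all gain, bias, irradiance, and angle values are hypothetical. Only the acquisition date comes from the text.

```python
import math
from datetime import date


def dn_to_radiance(dn, gain, bias=0.0):
    """Apparent radiance, assuming the DN is divided by the gain."""
    return dn / gain + bias


def earth_sun_distance(day_of_year):
    """Earth-Sun distance in AU; the 0.9856 deg/day factor is an assumption."""
    return 1.0 - 0.01674 * math.cos(math.radians(0.9856 * (day_of_year - 4)))


def apparent_reflectance(radiance, esun, sun_zenith_deg, d):
    """Eq. (4): pi * L * d^2 / (ESUN * cos(theta_s))."""
    return math.pi * radiance * d ** 2 / (esun * math.cos(math.radians(sun_zenith_deg)))


# Acquisition date from the text; every other number is hypothetical.
day = date(2008, 11, 20).timetuple().tm_yday
d = earth_sun_distance(day)
radiance = dn_to_radiance(dn=120, gain=1.5, bias=0.0)
rho = apparent_reflectance(radiance, esun=1850.0, sun_zenith_deg=30.0, d=d)
print(day, round(d, 5), round(rho, 4))
```

In practice the gain, bias, solar irradiance, and solar angles would be read per band from the metadata files supplied with the imagery.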

The target reflectance is estimated by multiplying the apparent reflectance by 400; the result is rounded and encoded back to 8-bit radiometric resolution through the following piece of code implemented in the Erdas function environment:

$${\displaystyle \begin{array}{l}\mathrm{IF}\ \mathrm{ROUND}\left( reflectance\times 400\right)>255\ \mathrm{THEN}\\ {}\quad DN=255\\ {}\mathrm{ELSE}\ \mathrm{IF}\ \mathrm{ROUND}\left( reflectance\times 400\right)<0\\ {}\quad DN=0\\ {}\mathrm{ELSE}\\ {}\quad DN=\mathrm{ROUND}\left( reflectance\times 400\right)\\ {}\mathrm{ENDIF}\end{array}}$$

(6)
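For readers working outside Erdas, the conditional in (6) is equivalent to rounding followed by clipping to the 8-bit range. The sketch below expresses it with NumPy; the function name and sample values are hypothetical.

```python
import numpy as np


def reflectance_to_dn(reflectance, scale=400):
    """Scale reflectance, round, and clip to the 8-bit range 0..255."""
    scaled = np.rint(np.asarray(reflectance, dtype=float) * scale)
    return np.clip(scaled, 0, 255).astype(np.uint8)


# Hypothetical reflectance values, including two that fall outside 0..255
# after scaling.
sample = [-0.1, 0.0, 0.25, 0.5, 0.7]
dn = reflectance_to_dn(sample)
print(dn.tolist())
```

Note that `np.rint` rounds exact halves to the nearest even integer, which may differ from the ROUND function of the Erdas environment for values ending in exactly .5.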

All the parameters used in the conversion of apparent radiance to ground reflectance were provided in the metadata files of the satellite imagery. The result of the conversion from pixel DN to ground reflectance is a range of pixel values from 0 to 255 grey levels in the absence of atmospheric errors. A visual analysis of our imagery revealed the presence of water bodies within the study area, so the lowest reflectance value should be zero or close to zero in the infrared band. However, the statistics of our multispectral image show that the image still has atmospheric distortions, as illustrated in Fig. 3.

From the observation of Fig. 3, there is a need for further processing of the imagery to reduce the minimum pixel value attributed to water bodies to zero or a value very close to zero.

Atmospheric corrections

Several atmospheric correction models exist in the literature (López-Serrano et al. 2016; Sowmya et al. 2017; Boakye et al. 2020). Atmospheric correction methods can be related to the spectral resolution of the available multispectral satellite imagery and the availability of image capture data (Dutta and Das 2019; Lhissou et al. 2020; Miky 2019). Figure 4 details a workflow guiding the selection of the appropriate atmospheric correction method.

Of the two paths in the workflow, the SPOT5 multispectral image satisfies the requirements of the left path. Among the four correction models proposed, the empirical atmospheric correction model requires ground calibration data in the scene, and the ancillary data provided with the imagery did not contain such information. The dark object subtraction model assumes that no atmospheric transmittance is lost and that no diffuse downward radiation occurs at the surface (Song et al. 2001), but the hilly and mountainous landscape of Stellenbosch town does not satisfy these requirements. Moreover, a visual analysis of the land use/land cover classes on the satellite imagery and aerial photograph did not show the presence of dense vegetation cover, excluding the possibility of using the Dense Dark Vegetation method. A radiative transfer model therefore seems suitable for our study area; ATCOR2 and ATCOR3, available in the image pre-processing software PCI Geomatica, are such radiative transfer models. Since ATCOR2 is more suitable for flat landscapes (Richter and Center 2004), ATCOR3 was selected for our study area; the tool has been used for atmospheric corrections of mountainous areas (Tan et al. 2012; Ateşoğlu and Tunay 2014). Since ATCOR3 requires a digital elevation model, we used contour lines and GPS point coordinates provided by the National Geospatial Information office to produce one using ArcGIS software. Figure 5 shows the outcome of the atmospheric correction process. After selection of a few water body segments, it can be observed that the lowest pixel value in the infrared band was reduced to 0.083018 while the brightest water body segment had a reflectance of 0.50404, which is expected in the absence of atmospheric distortions on the objects’ ground reflectance values.

Data fusion

Poor spatial and spectral resolutions have been a great challenge for urban mapping. Some authors have suggested a spatial resolution finer than 5 m for multispectral images (Herold et al. 2003). High spatial resolution is required for a better description of metrics such as objects’ shapes, whereas different objects and land surfaces are better identified if high spectral resolution is available (De Jong and Van DerMeer 2005). The SPOT5 panchromatic and multispectral resolutions are respectively 2.5 m and 10 m, offering respective pixel sizes of 6.25 m² and 100 m². Informal building units are reported to have sizes between 6 and 20 m², while formal residential building units are described as having sizes greater than 30 m² (Busgeeth et al. 2008). A single 10 m multispectral pixel is therefore larger than most building units. As a consequence, an image that provides rich spectral information about the objects on the earth’s surface but only coarse spatial resolution may not be a good choice for extracting informal housing units. A solution to this dilemma is to bring together the high spectral resolution of the multispectral image and the high spatial resolution of the panchromatic image into one single image that benefits from both properties. Several resolution merge approaches have been reported in the literature, each with its strengths and weaknesses (Simone et al. 2002; Ghassemian 2016; Pohl and Van Genderen 2016). Shamshad et al. (2004) investigated four resolution merge techniques: the Principal Component Analysis, the Multiplicative, the Brovey Transform, and the Wavelet Transform methods. Their study reveals that all four techniques improved the image spatial resolution but only the Principal Component Analysis and the Wavelet Transform preserved the statistical parameters of the bands. For this study the Wavelet Transform method included in Erdas Imagine software was used. The Wavelet Transform was chosen over the Principal Component Analysis because the method does not alter the image radiometric resolution (Shamshad et al. 2004; Mehra and Nishchal 2014). The outcome of this process is a multispectral image that possesses both the high spectral resolution and the high spatial resolution derived from the input images, as shown in Fig. 6.

The visual interpretation of the original 10 m multispectral SPOT5 image reveals it would have been very difficult to extract high quality building outlines due to the poor spatial resolution. With the resolution merge performed, it can be observed that some building outlines are well represented.

Features identification

Spectral features

Spectral features play a vital role in describing objects on the Earth surface, as discussed previously. To describe objects within our area, a few spectral metrics were estimated in order to separate land cover/land use classes from one another when running the classification algorithm. NDVI has been proven efficient at separating vegetation from non-vegetation classes (Gandhi et al. 2015; Hashim et al. 2019), so a threshold of the index was estimated to isolate the vegetation class from other classes. In addition to the NDVI, thresholds in the green band were estimated to enhance the discrimination between the vegetation class and other classes. Spectral thresholds in the infrared band were estimated to isolate buildings from water bodies. Additionally, some spectral thresholds were estimated to separate residential from non-residential building classes.
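As an illustration of the NDVI thresholding step, the sketch below computes the index from near-infrared and red reflectance and applies a cut-off. The threshold of 0.3, the function names, and the sample reflectances are hypothetical placeholders; the thresholds actually estimated in the study are not given in this section.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), set to 0 where the sum is zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.zeros_like(total)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


def vegetation_mask(nir, red, threshold=0.3):
    """True where NDVI exceeds the (hypothetical) vegetation threshold."""
    return ndvi(nir, red) > threshold


# Hypothetical segment reflectances: vegetation, bare surface, no signal.
nir = [0.5, 0.2, 0.0]
red = [0.1, 0.2, 0.0]
mask = vegetation_mask(nir, red)
print(mask.tolist())
```

The same pattern applies to the green-band and infrared-band thresholds mentioned above, with the band value compared directly against its estimated cut-off.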

Shape features

Shape makes it possible to separate tree patches from green building roofs. For that purpose, shape compactness thresholds were estimated using Eq. (1). A number of shape compactness thresholds were also estimated to separate residential from non-residential buildings and informal from formal housing units.

Size features

Size characteristics including areas, perimeters, lengths, widths, and length over width ratios were estimated in order to identify the most meaningful measures for separating land use/land cover classes from one another. For instance, length over width thresholds and length thresholds were estimated to separate roads from non-road classes. Some area size thresholds were selected to separate buildings from water bodies and residential from non-residential buildings. Area size thresholds of parking areas were estimated to separate educational buildings from commercial buildings as well as informal housing units from formal housing units.

Distance features

High-dimensional data are very common in image classification when multiple features, such as proximity distances between various components of land use/land cover classes, are to be considered (Lever et al. 2017). Proximity between objects in urban areas is among the most diverse of these features, making it very difficult to separate one class from the others due to the large dimensionality of the data. To identify the most prominent distance features that enable the separation between classes and to reduce the dimension of the data, we opted to process the collected measures through a Principal Component Analysis (Ng 2017; Cushion et al. 2019). For this purpose, samples of the various classes were digitized in ArcGIS and the distances separating the classes were then recorded manually using the ArcGIS measure tool. Table 1 presents the averaged distance measures between the various land use/land cover classes.

Table 1 Averaged inter-class mutually separating distances
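The principal component step can be sketched with a singular value decomposition of the centred distance matrix. The distance values below are invented for illustration because the entries of Table 1 are not reproduced here; the function name and column meanings are likewise hypothetical, and the subsequent overlap-elimination step of the study is not covered.

```python
import numpy as np


def pca(X, n_components=2):
    """Return scores, component loadings, and explained variance ratios."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = centred @ vt[:n_components].T
    return scores, vt[:n_components], explained[:n_components]


# Hypothetical distance measures in metres: rows = sampled objects,
# columns = distance to the nearest road, building, and tree patch.
distances = np.array([
    [5.0, 10.0, 3.0],
    [6.0, 12.0, 3.5],
    [20.0, 41.0, 2.0],
    [22.0, 43.0, 2.5],
    [50.0, 99.0, 8.0],
    [52.0, 105.0, 7.5],
])
scores, loadings, explained = pca(distances, n_components=2)
print(np.round(explained, 3))
print(np.round(loadings, 3))
```

The loadings show how strongly each distance feature contributes to a component, which is one way to rank the features. The data are centred but not standardised here, so features with larger ranges dominate; dividing each column by its standard deviation first is a reasonable alternative when the features have very different scales.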
