Introduction

Urban models are computer-based simulations used for testing theories about spatial location and the interaction between land uses and related activities. They also provide a digital environment for testing the consequences of physical planning policies on the future form of cities. Three-dimensional (3D) city models of urban areas are an important input for many applications in the field of urban monitoring. They are also used for simulation and planning in the case of catastrophic events such as flooding, tsunamis or earthquakes. With the availability of very high resolution (VHR) satellite data, the three-dimensional shape of large urban areas can be investigated quickly and inexpensively. Most current methods for generating city models, however, depend on a large amount of interactive work.

Virtual 3D city models give an overview of the existing city, which is useful for decision making. A future development plan can be visualized by overlaying it on the existing virtual 3D city model, which enables planners to simulate alternatives and make informed decisions. The possible data sources for a 3D city model are cadastral data, digital terrain models, building models, street-space models and green-space models (Döllner et al., 2006). The concept of an n-D urban model, given by Hamilton et al. (2005), provides a holistic view of the city by integrating various kinds of information. This information includes 2D maps, 3D urban models, thematic information, timeline analysis, local surveys, and policies and regulations developed by the local authorities.

The 3D city models are also useful in disaster management for events such as floods, fires and earthquakes. They play a vital role in the strategic planning of rescue operations, which helps people to be rescued effectively (Kolbe et al., 2005).

In recent years, establishing 3D city models and Geographic Information Systems (GIS) has become increasingly popular. Owing to recent developments in computer technology, visualization is becoming more important and effective for professionals who deal with information systems. With continuous technological advancement, the requirements for virtual reality, 3D GIS, urban modelling, etc., are steadily increasing across applications. A 3D city model is a 3D representation of a city or an urban environment, built from data derived from multiple sources such as stereo aerial images, airborne LIDAR data and high resolution satellite imagery. These sources contain a large number of objects of different classes and structures, stored in different data models.

Quality Classes of 3D City Models

Standards for the representation of 3D city models have been defined by the City Geography Markup Language (CityGML).

CityGML

CityGML is an XML-based encoding, implemented as an application schema for the Geography Markup Language (GML), for the representation and storage of 3D city and landscape models. It covers their geometrical, topological and semantic aspects (www.citygml.org, Kolbe et al. 2005). It is an open data model built on GML3, the international standard for spatial data exchange issued by the Open Geospatial Consortium (OGC). CityGML is compatible with OGC web services such as Web Feature Service, Web Processing Service and Catalogue Service. The class taxonomy of CityGML distinguishes between buildings and non-building features such as vegetation, water bodies and transportation facilities (Kolbe et al., 2005). A CityGML dataset defines information at five levels of detail, from level 0 to level 4 (Fig. 1).

Fig. 1 Different levels of detail of a 3D city model

  • Level of detail 0 (LoD0): digital terrain model overlaid with an aerial image

  • Level of detail 1 (LoD1): building block model

  • Level of detail 2 (LoD2): block model enhanced with roof structures and facade textures

  • Level of detail 3 (LoD3): model enhanced with detailed roof and wall structures, including balconies and vegetation

  • Level of detail 4 (LoD4): interior structures such as rooms, doors, stairs and furniture

Different levels of detail require different types of data collection. LoD-0 models can be produced by overlaying raster data sets on the Digital Elevation Model. LoD-1 requires the geometry (footprint and height) of the objects, which then appear with their elevation on LoD-0. If the data are of very high resolution, roof structures can also be extracted, which upgrades the level of detail from LoD-1 to LoD-2. For detailing up to LoD-3, aerial imagery is also used to provide texture, since it is more economical and often no other source is available (Steidler and Beck, 2005).

However, aerial images do not yield high resolution textures, because the oblique view distorts the texture of vertical walls. To create a view from a pedestrian perspective, textures must be applied at the fine resolutions that only terrestrial photography provides (Göbel and Freiwald, 2008) in order to develop a proper LoD-3 model. Many projects therefore use terrestrial images for texturing the objects (Holzer and Forkert 2004). Terrestrial surveys can provide all the necessary data, but their costs are higher than those of airborne methods, so methods such as car-mounted fisheye cameras have been adopted to reduce costs (Forkert et al. 2005). Airborne Laser Scanning (ALS) is also used for automatic detection and modelling of buildings (Rottensteiner, 2003). LoD-4 needs information on interior structures, which can be provided by architectural plans, as shown in the case study of the Vienna underground modelling (Forkert 2006).

The present study aims to develop techniques for 3D modelling of buildings for urban morphological analysis. An attempt has been made to develop a 3D city model at LoD-1 for parts of Ahmedabad city, Gujarat, and Hyderabad city, Telangana State. The broad methodology involves extraction of urban building footprints using object-oriented classification and extraction of urban 3D features using a Digital Elevation Model.

Objectives

The main objective was to develop procedures for 3D modelling of buildings, with the following sub-objectives:

  • Automatic extraction of building footprints from satellite images by developing rule based approach

  • Generation of DEM and extraction of building heights

  • Development of LOD-1 3D city model.

Study area

In this work, two cities, namely Ahmedabad (Gujarat) and Hyderabad (Andhra Pradesh), were considered for the development of the 3D city model.

  a. Ahmedabad is the largest city and former capital of the Indian state of Gujarat, covering an area of 464 km². It is the fifth largest city and seventh largest metropolitan area of India. Ahmedabad is located on the banks of the river Sabarmati.

  b. Hyderabad is the capital and largest city of the southern Indian state of Andhra Pradesh. It occupies 650 km² along the banks of the Musi River. Hyderabad has a population of 6.8 million and a metropolitan population of 7.75 million. It is the fourth most populous city and sixth most populous urban agglomeration in India.

Data used

In this study, Indian Remote Sensing satellite data were used for the generation of the 3D city model. CARTOSAT-1 is the first Indian Remote Sensing satellite capable of providing in-orbit stereo images. The images are used for cartographic applications meeting global requirements. The cameras of this satellite have a resolution of 2.5 m. It provides the stereo pairs required for generating DEMs, ortho-image products and value-added products for various GIS applications. In this study, four pairs of Cartosat-1 stereo data were used for Hyderabad city and two pairs for Ahmedabad city.

CARTOSAT-2A is the thirteenth satellite in the Indian Remote Sensing (IRS) satellite series. It is a sophisticated and rugged remote sensing satellite that can provide scene-specific spot imagery. The satellite carries a panchromatic (PAN) camera with a spatial resolution better than 1 m and a swath of 9.6 km. Its applications include cartography, infrastructure development and natural resource management. In this study, six scenes of Cartosat-2 data were used for Hyderabad city and three scenes for Ahmedabad city.

LISS-IV can work in either panchromatic or multispectral mode with a resolution of 5.8 m. For Resourcesat-1, the swath width varies from 23.9 km in multispectral mode to 70.3 km in panchromatic mode. In this study, four scenes of LISS-IV multispectral data were used for both cities.

Methodology

The approach for the automatic derivation of the building block model of the city from Indian remote sensing satellite data is given in Fig. 2.

Fig. 2 Methodology for derivation of the digital building model

DEM Generation

Photogrammetric techniques are used to generate DEMs from stereoscopic data such as Cartosat-1, IKONOS, QuickBird, SPOT and PRISM. Earlier, physical sensor models were used, but nowadays RPC models are more popular for processing stereoscopic data. The Rational Function Model (RFM) coefficients, called RPCs, provided along with the Cartosat data are essentially a form of generalized sensor model. These model parameters are sensor independent, have high fitting accuracy and can be calculated in real time (Tao and Hu 2002).

Rational Function Model

A sensor model relates 3D object point positions to their corresponding image positions through the collinearity condition. The RFM relates object space coordinates to the image space coordinates. The image pixel coordinates (x, y) are expressed as ratios of polynomials of ground coordinates (X, Y, Z). Generally they are represented as third order polynomials. Ratios have a forward form:

$$ x=\frac{P_1\left(X,Y,Z\right)}{P_2\left(X,Y,Z\right)} $$
(1)
$$ y=\frac{P_3\left(X,Y,Z\right)}{P_4\left(X,Y,Z\right)} $$
(2)

These equations are called the upward RF. The RF model is usually generated from a rigorous sensor model.

In Eqs. (1) and (2), P_i (i = 1, 2, 3 and 4) are the polynomial functions.
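As an illustration of Eqs. (1) and (2), the following minimal Python sketch evaluates the two ratios for one ground point using third order polynomials. The function names and the ordering of the 20 polynomial terms are assumptions made for this example only; real RPC files define their own term order and also normalise the coordinates with offsets and scales, which is omitted here.

```python
from itertools import product


def cubic_terms(X, Y, Z):
    """Return all monomials X^i * Y^j * Z^k with i + j + k <= 3 (20 terms).

    The ordering (by total degree, then by exponent tuple) is an assumption
    of this sketch, not the ordering used in any particular RPC file.
    """
    exps = sorted(
        (e for e in product(range(4), repeat=3) if sum(e) <= 3),
        key=lambda e: (sum(e), e),
    )
    return [X**i * Y**j * Z**k for i, j, k in exps]


def rfm_project(coeffs, X, Y, Z):
    """Evaluate x = P1/P2 and y = P3/P4 for one ground point.

    coeffs is a sequence of four lists (P1..P4), each holding 20
    coefficients in the term order produced by cubic_terms.
    """
    terms = cubic_terms(X, Y, Z)
    p1, p2, p3, p4 = (sum(c * t for c, t in zip(p, terms)) for p in coeffs)
    return p1 / p2, p3 / p4
```

With this ordering the first four terms are 1, Z, Y and X, so a coefficient list that is zero everywhere except at index 3 represents the polynomial P = X.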

The coefficients of these polynomials need to be updated using ground control points (GCPs), and there are various methods to update RPC solutions in the absence of the sensor model (Tao and Hu 2002). In this study, a complementary transformation has been used to improve the accuracy of the RPC solution with the GCPs. It is an indirect method in which an affine transformation is developed between the scan and pixel values of the control points. The transformation relates the measured coordinates of the control points to those computed from the forward RPCs (Tao et al., 2004; Tao and Hu, 2002). This transformation takes care of all the distortions including drift, drag and scale. In this study, Cartosat-1 stereo pairs over Ahmedabad and Hyderabad city, along with the corresponding RPCs, were used to generate the DEM.
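One way to picture such a complementary affine transformation is a least-squares fit between the image coordinates predicted by the RPCs for each control point and the coordinates actually measured in the image. The sketch below is a generic illustration under that assumption; the function names are hypothetical and it is not the software used in the study.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x


def fit_affine(projected, measured):
    """Least-squares affine map from RPC-projected to measured image coordinates.

    Returns (a, b) such that x' = a0 + a1*x + a2*y and y' = b0 + b1*x + b2*y.
    At least three non-collinear control points are required.
    """
    rows = [(1.0, x, y) for x, y in projected]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs_x = [sum(r[i] * m[0] for r, m in zip(rows, measured)) for i in range(3)]
    rhs_y = [sum(r[i] * m[1] for r, m in zip(rows, measured)) for i in range(3)]
    return solve3(normal, rhs_x), solve3(normal, rhs_y)


def apply_affine(params, x, y):
    """Apply the fitted correction to one RPC-projected image coordinate."""
    a, b = params
    return a[0] + a[1] * x + a[2] * y, b[0] + b[1] * x + b[2] * y
```

The constant terms a0 and b0 absorb shifts, while the remaining terms absorb linear effects such as drift and scale, which is why an affine model is a common choice for this kind of correction.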

DEM Accuracy

The DEM generated from Cartosat-1 stereo data over Ahmedabad and Hyderabad city needs to be evaluated before it is used to derive building elevation information. To estimate the accuracy of the DEM, Differential Global Positioning System (DGPS) measurements were taken after establishing a reference station in each region. The reference stations were set up from 96 h of continuous measurements, processed with IGS station observations and precise ephemeris using the Bernese software. With reference to these base stations, 25 and 27 highly precise points were established over Hyderabad and Ahmedabad city, respectively. The maximum error of the DEM was found to be 5.6 m and 5.3 m, with standard deviations of 1.85 m and 2.04 m, for Hyderabad and Ahmedabad city, respectively.
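The two accuracy figures quoted above can be computed from check points as in the following sketch. It assumes that the error at each point is the DEM elevation minus the DGPS elevation and that the sample standard deviation is meant; neither detail is stated in the text, and the function name is hypothetical.

```python
import statistics


def dem_error_stats(dem_heights, dgps_heights):
    """Maximum absolute error and standard deviation of DEM errors.

    dem_heights and dgps_heights are elevations (m) at the same check
    points. The error definition and the use of the sample standard
    deviation are assumptions of this sketch.
    """
    errors = [d - g for d, g in zip(dem_heights, dgps_heights)]
    return {
        "max_abs_error": max(abs(e) for e in errors),
        "std_dev": statistics.stdev(errors),
    }
```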

Data Merging

A sensor has radiometric, spatial, spectral and temporal resolution. The Resolution Merge function resamples low spatial resolution data to a higher spatial resolution while retaining the spectral information. In this study, panchromatic Cartosat-2 data were merged with Resourcesat-2 LISS-IV data to produce a 1 m multispectral output. The resulting multispectral image is then used for the classification of building footprints using an object-oriented approach.
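The text does not name the merging algorithm used. Purely as an illustration of what a resolution merge does, the sketch below applies the Brovey transform, a simple and well-known pan-sharpening method, after nearest-neighbour upsampling by an integer factor. The choice of Brovey, the integer factor and the function names are all assumptions of this example.

```python
def upsample_nearest(band, factor):
    """Repeat every pixel factor x factor times (nearest-neighbour resampling)."""
    out = []
    for row in band:
        wide = [v for v in row for _ in range(factor)]
        out.extend([wide[:] for _ in range(factor)])
    return out


def brovey_merge(ms_bands, pan, factor):
    """Fuse low-resolution multispectral bands with a high-resolution pan band.

    Each fused band is band * pan / (sum of all bands), computed per pixel
    on the pan grid. ms_bands is a list of 2D lists; pan is a 2D list whose
    size is factor times the size of each multispectral band.
    """
    up = [upsample_nearest(b, factor) for b in ms_bands]
    rows, cols = len(pan), len(pan[0])
    fused = [[[0.0] * cols for _ in range(rows)] for _ in up]
    for r in range(rows):
        for c in range(cols):
            total = sum(b[r][c] for b in up)
            if total == 0:
                continue  # leave zero where there is no multispectral signal
            for k, b in enumerate(up):
                fused[k][r][c] = b[r][c] * pan[r][c] / total
    return fused
```

The band ratios of the multispectral data are preserved in the output, while the spatial detail comes from the panchromatic band.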

Object based classification

Assigning a class to a cluster of DN values of an image is known as image classification. In remote sensing there are two methods of classification: pixel-based and object-based. Pixel-based methods classify individual pixels directly on the basis of their tone or reflectance value. In contrast, object-based classification first aggregates image pixels into spectrally homogeneous image objects using an image segmentation algorithm, and then classifies each object based on one or more object characteristics.

Multiresolution Segmentation

For object-based classification, the popular segmentation approach is multiresolution segmentation. The multiresolution segmentation algorithm consecutively merges pixels or existing image objects, based on a pairwise region-merging technique. It is an optimization procedure which, for a given number of image objects, minimizes the average heterogeneity and maximizes their respective homogeneity. This homogeneity criterion is defined as a combination of spectral homogeneity and shape homogeneity. The algorithm also locally minimizes the average heterogeneity of image objects for a given resolution of image objects.

The object homogeneity to which the scale parameter refers is defined in the Composition of Homogeneity Criterion field. The scale parameter determines the maximum allowed heterogeneity for the resulting image objects: higher values result in larger image objects and smaller values in smaller ones. In this context, homogeneity is used as a synonym for minimized heterogeneity. The internal criteria for determining homogeneity are defined in terms of color and shape, and shape is further divided into smoothness and compactness (Fig. 3). In this study, we used a weight of 0.8 for color and 0.2 for shape. Within shape, compactness and smoothness were both set to 0.5, with a scale parameter of 15.
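The way these weights interact can be sketched as a weighted sum of the heterogeneity changes caused by a candidate merge. The example below assumes the commonly described formulation in which the combined cost is compared with the square of the scale parameter; the individual heterogeneity terms are taken as given inputs, and the function names are hypothetical.

```python
def fusion_cost(h_color, h_compact, h_smooth, w_color=0.8, w_compact=0.5):
    """Weighted heterogeneity increase for merging two image objects.

    h_color, h_compact and h_smooth are the increases in spectral,
    compactness and smoothness heterogeneity caused by the merge.
    The shape weight is 1 - w_color and the smoothness weight is
    1 - w_compact, matching the 0.8 / 0.2 and 0.5 / 0.5 settings above.
    """
    w_shape = 1.0 - w_color
    w_smooth = 1.0 - w_compact
    h_shape = w_compact * h_compact + w_smooth * h_smooth
    return w_color * h_color + w_shape * h_shape


def may_merge(cost, scale=15):
    """Assumed stopping rule: merge only while the cost is below scale squared."""
    return cost < scale ** 2
```

Under this assumed rule, a scale parameter of 15 allows merges whose weighted cost stays below 225, which is why larger scale values produce larger objects.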

Fig. 3 Multiresolution concept flow diagram

Classification of Building Footprints

During segmentation, pixels are aggregated into image objects. Image objects have two major advantages. The first is the reduced number of class assignments to be performed; the second is that objects are more meaningful units for feature assignment, since they can be evaluated for classification on the basis of different object properties such as form, spectral values, texture and context. These image objects are polygons of roughly similar size with interior homogeneity (Flanders et al., 2003). The image objects are classified using a membership function classifier, where the rules and constraints that control the classification procedure are defined in membership functions (Myint et al. 2011). Through the rule sets, a membership function describes the feature characteristics that determine whether or not an image object belongs to a particular class.

Rule Set Generation

There are basically two types of rule-based approaches for the classification of image objects. The first utilizes a single knowledge base that contains all the rules, while the second decomposes the multi-source database into individual analyses and then integrates them into a joint analysis for the classification of image objects (Li and Chen, 2005). In the first approach, the class assignment of an image object is defined by developing a class hierarchy or process tree that assigns unclassified image objects through a collection of rule sets. These rule sets contain the criteria by which an object is classified into a particular class. The criteria may be based on object layer values, object shape parameters, geometry and position, as well as association parameters. To define a process tree, a parent process is first added to the tree. All further rules are then added as child processes, which follow the parent process in assigning a class to an image object. The rule sets were generated to assign the clusters to a class using an additive or a subtractive approach. In the additive approach, different rule sets are defined to extract different sets of buildings from the unclassified data, so as to capture the maximum number of features. This may also bring in some extra objects that do not belong to the object class; these need to be subtracted by defining further rule sets. In this study, rule sets were derived from spectral properties, namely the standard deviation and brightness value of the different bands of the multispectral image.

The standard deviation of the cluster was computed as

$$ {\sigma}_k=\sqrt{\frac{1}{sx\times sy}\left({\displaystyle \sum_{\left(x,y\right)}{\left({c}_k\left(x,y\right)\right)}^2-\frac{1}{sx\times sy}{\displaystyle \sum_{\left(x,y\right)}{c}_k\left(x,y\right)}{\displaystyle \sum_{\left(x,y\right)}{c}_k\left(x,y\right)}}\right)} $$
(3)

The mean brightness value of the cluster was computed as

$$ {\overline{c}}_k=\frac{1}{sx\times sy}{\displaystyle \sum_{\left(x,y\right)}{c}_k\left(x,y\right)} $$
(4)

where

c_k(x, y) : the pixel intensity at position (x, y)

sx : number of pixels in the x-direction

sy : number of pixels in the y-direction
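Equations (3) and (4) can be checked numerically with a few lines of Python. The sketch below treats the cluster as a rectangular block of sx × sy pixel values given as a list of rows; the function names are chosen for this example only.

```python
import math


def mean_brightness(pixels):
    """Eq. (4): mean pixel intensity over the sx * sy pixels of the cluster."""
    values = [v for row in pixels for v in row]
    return sum(values) / len(values)


def std_dev(pixels):
    """Eq. (3): sqrt of (sum of squares - (sum)^2 / n) / n, with n = sx * sy."""
    values = [v for row in pixels for v in row]
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    # max() guards against a tiny negative value caused by rounding
    return math.sqrt(max(0.0, (total_sq - total * total / n) / n))
```

Written this way, Eq. (3) is the population standard deviation, i.e. the square root of the mean squared intensity minus the squared mean of Eq. (4).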

A mean brightness value of 100 in the red and green bands was used as the parent process to assign image objects to the building class. It was observed that some non-building features were included in the assigned class. The subtractive approach was used to remove these non-building objects, with a rule based on the standard deviation of the red and green bands: a standard deviation of less than 9 removed the non-building features from the building class and reassigned them as unclassified. However, it was also observed that some high-rise apartments were then missing from the classification. To include the missing objects, an additive rule was defined with a mean brightness value greater than 200 in the red and green bands, which selected the high-rise apartments from the unclassified objects and assigned them to the building class.
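The sequence of parent, subtractive and additive rules can be written as a small decision function. This is only a reading of the description above: the text does not say whether the parent threshold of 100 is a lower bound, nor whether both bands must satisfy each rule, so those choices are assumptions of the sketch, as is the function name.

```python
def classify_object(mean_red, mean_green, std_red, std_green):
    """Apply the three rules in order to one image object.

    Assumptions: the parent rule means brightness above 100, and every
    rule must hold in both the red and the green band.
    """
    label = "unclassified"
    # Parent rule: bright objects are first assigned to the building class
    if mean_red > 100 and mean_green > 100:
        label = "building"
    # Subtractive rule: low standard deviation sends the object back
    if label == "building" and std_red < 9 and std_green < 9:
        label = "unclassified"
    # Additive rule: very bright unclassified objects are re-added
    if label == "unclassified" and mean_red > 200 and mean_green > 200:
        label = "building"
    return label
```

The order matters: the additive rule only looks at objects that are unclassified after the subtractive step, which mirrors the parent and child processes of the process tree.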

After classification of the image objects, connected objects are merged into single objects and exported as a polygon shapefile containing the building footprints.

Extraction of Building Height

The building footprints extracted by object-based classification were used to obtain elevation information from the stereoscopic Cartosat DEM. The DEM needs conditioning to remove spikes and depressions. It was conditioned iteratively by considering the elevation values of the eight neighbouring pixels. The extraction of building heights is shown in Fig. 4. The DEM was overlaid with the building footprints, and the average elevation inside each footprint was extracted using zonal statistics in a GIS environment with the mean statistic. To estimate the building height, the elevation of the base of the building also needs to be computed. For this, a buffer of 25 m was generated around each building to cover the surrounding elevation. This surrounding area may include the ground, adjacent roads or any other location representing ground elevation. The lower elevation is computed using zonal statistics within the buffer region with the minimum statistic. The two resulting tables were joined on a primary key field. The difference between the average elevation and the minimum elevation in the joined table represents the height of the building represented by each footprint.
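The height computation described above (mean elevation inside the footprint minus minimum elevation inside the buffer) can be sketched on a raster grid as follows. The DEM cell size is not given in the text, so it is a required input here; the buffer is approximated by grid cells within the given distance of any footprint cell, and the function names are hypothetical.

```python
def buffer_cells(footprint, radius_cells, n_rows, n_cols):
    """Grid cells within radius_cells (Euclidean) of any footprint cell.

    The footprint cells themselves are included, as in a polygon buffer.
    """
    r = int(radius_cells)
    cells = set()
    for fr, fc in footprint:
        for dr in range(-r, r + 1):
            for dc in range(-r, r + 1):
                rr, cc = fr + dr, fc + dc
                inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                if inside and dr * dr + dc * dc <= radius_cells ** 2:
                    cells.add((rr, cc))
    return cells


def building_height(dem, footprint, cell_size_m, buffer_m=25.0):
    """Mean elevation in the footprint minus minimum elevation in the buffer.

    dem is a 2D list of elevations (m); footprint is a collection of
    (row, col) cells covered by one building.
    """
    n_rows, n_cols = len(dem), len(dem[0])
    roof = sum(dem[r][c] for r, c in footprint) / len(footprint)
    zone = buffer_cells(footprint, buffer_m / cell_size_m, n_rows, n_cols)
    ground = min(dem[r][c] for r, c in zone)
    return roof - ground
```

Because the roof value is a zonal mean and the ground value a zonal minimum, any smoothing of building edges in the DEM lowers the estimated height.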

Fig. 4 Methodology for building height extraction

Development of 3D City Model

Buildings are the most important part of a 3D city model for many applications. Manual modelling can lead to very good results but is feasible only for very small areas. Visual analysis is an important component of landscape planning and part of the process of identifying the most suitable site for a development project. In a 3D city model, the LoD replaces the concept of scale used in traditional 2D maps. The 3D city model is developed by overlaying features on the elevation surface in a 3D environment. The features need to be extruded by their height, and vertical exaggeration is applied to maintain visual quality. The extracted building footprints, with building heights derived from the DEM, were overlaid on the DEM.
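Extruding a footprint into an LoD-1 block amounts to copying its outline at the base elevation and again at the base elevation plus the building height, then connecting the two rings with walls. The sketch below shows this geometry in plain Python; it assumes a simple footprint polygon given as counter-clockwise (x, y) vertices, and the function name is hypothetical.

```python
def extrude_footprint(footprint_xy, base_elevation, height):
    """Return vertices and faces of a flat-roofed block (LoD-1 prism).

    footprint_xy is a list of (x, y) vertices without a repeated closing
    point. Faces are tuples of vertex indices into the returned list.
    """
    n = len(footprint_xy)
    bottom = [(x, y, base_elevation) for x, y in footprint_xy]
    top = [(x, y, base_elevation + height) for x, y in footprint_xy]
    vertices = bottom + top
    # One rectangular wall per footprint edge
    walls = [(i, (i + 1) % n, n + (i + 1) % n, n + i) for i in range(n)]
    floor = tuple(range(n - 1, -1, -1))  # reversed so that it faces downward
    roof = tuple(range(n, 2 * n))
    return vertices, [floor, roof] + walls
```

For example, a square footprint with a base elevation of 50 m and a height of 12 m yields eight vertices and six faces, with the roof ring at 62 m.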

Results and Discussion

The stereo analysis over Ahmedabad and Hyderabad city was evaluated using DGPS measurements. The model accuracies for Ahmedabad in Easting, Northing and elevation were 2.56 m, 2.31 m and 1.160 m, respectively. Similarly, the model accuracies for Hyderabad were 2.48 m, 2.25 m and 3.97 m in Easting, Northing and elevation, respectively. This clearly indicates that the overall model accuracy for both regions is within the expected error of 5 m. The resultant DEMs of Ahmedabad and Hyderabad city and their surroundings are shown in Figs. 5 and 6, respectively. The object-based classification was performed on the merged product of Cartosat-2 and LISS-IV. Assigning classes through a class hierarchy is a better approach, but the rule sets need to be defined carefully. The rule sets vary between data sets, and care is needed to avoid misclassification. The extracted building footprints for parts of Ahmedabad and Hyderabad city are shown in Figs. 7 and 8, respectively. The object elevation was estimated as the average elevation value within the object. The building height was then extracted from this elevation value by selecting the nearest ground elevation.

Fig. 5 DEM of Ahmedabad city and surroundings

Fig. 6 DEM of Hyderabad city and surroundings

Fig. 7 Building footprints with different elevations overlaid on the image of part of Ahmedabad city

Fig. 8 Building footprints with different elevations for part of Hyderabad city

It was also observed that very low buildings did not get proper elevation information, i.e., buildings in slum regions with a maximum height of up to 4 m and buildings with a ground floor only. These buildings showed heights of the order of 1 m or less, clearly indicating the impact of the digital elevation model on the extraction of building heights. The method clearly identified all the footprints of medium-sized and large buildings with proper shape. Errors in the DEM mainly affect low-rise buildings, as compared to medium- and high-rise buildings.

The overlay analysis of the building footprints in vector format was carried out in the ArcScene 3D environment. The vector layer of footprints was extruded by the height information, and the DEM was taken as the base height. The DEM was draped with the merged product as the base layer. The 3D city models for parts of Ahmedabad and Hyderabad city are shown in Figs. 9 and 10, respectively.

Fig. 9 3D city model for a part of Ahmedabad city

Fig. 10 3D city model for a part of Hyderabad city

Conclusions

This study highlighted the significance of 3D city model generation employing photogrammetric techniques. Details of building topology were extracted using semi-automated and automated techniques applied to high resolution satellite data after merging. The study was carried out to demonstrate a procedure for generating a 3D city model using satellite data. The accuracy of the DEM restricts the delineation of building heights, which may be the cause of error in estimating the height of low-rise buildings. This is a pilot study for future high resolution satellites such as the Cartosat-2C/2D and Cartosat-3 series. In such future missions, the vertical accuracy will improve from 5 m to 1 m, and finer building details will be visible owing to the higher spatial resolution. The enhanced spatial resolution will improve the accuracy of the multispectral and merged products used for object-based classification. A 3D city model better than LoD-2 requires the blueprint of each building, which is not required at the city planning level.