Introduction

Urban areas are expanding faster than the urban population itself (Tewolde and Cabral 2011). This rapid growth places heavy pressure on the available land and resources (Rafiee et al. 2009). Urban planners and policy-makers therefore need advanced and reliable methods for studying cities to make the informed decisions necessary to guide sustainable development in rapidly changing urban environments (Pham et al. 2011). Urban expansion modeling is an interdisciplinary field, as it involves numerous scientific areas such as GIS, complexity theory, urban geography, and remote sensing (Rui 2013). One of the main topics in urban expansion modeling is sprawl measurement. Urban sprawl is a multifaceted concept dealing with the expansion of auto-oriented and low-density development (Yuan et al. 2005). Following Ewing (1994, 1997), sprawl is a form of spatial development, mainly found in open and rural lands on the edge of metropolitan areas, that is characterized by low densities, scattered and discontinuous leapfrog expansion, segregation of land uses, etc.

According to the literature, statistical techniques along with remote sensing and GIS have been increasingly used in urban sprawl studies (Jat et al. 2006). Statistical techniques have also been used to determine the relationship between impervious area and various urban development parameters. In recent years, analysts have made considerable progress in quantifying urban sprawl (Wei et al. 2006; Yu and Ng 2007; Schneider and Woodcock 2008; Singh 2014). A common approach is to consider changes in built-up area and population density over space and time (Effat and El Shobaky 2015). The built-up area is generally considered the key parameter for quantifying urban sprawl (Barnes et al. 2001; Epstein et al. 2002). Characterizing urban sprawl involves appropriate quantification and statistical summarization, such as Shannon’s entropy, patchiness, landscape metrics, hierarchical clustering, and regression analysis.

Shannon’s entropy is a well-accepted measure for assessing urban sprawl and has been used in a variety of studies (Bhatta 2009). It measures the extent of urban sprawl and can be computed directly within a GIS environment (Chatterjee et al. 2016). Shannon’s entropy acts as an indicator of spatial concentration or dispersion: it can specify the degree of urban expansion by examining whether land development is dispersed or compact (Joshi et al. 2006).

The present study aimed to generate an urban sprawl model using two statistical approaches: relative Shannon’s entropy and hierarchical clustering analysis. Measuring physical growth in the urban districts of Mashhad city is essential to urban planners and decision-makers, who need an up-to-date database for planning and management purposes. In this study, the spatial simulation of the sprawl model was carried out using GIS techniques.

Methodology

Study area

Mashhad, the capital city of Khorasan-e-Razavi province, is located in northeastern Iran (36°37′–36°58′N, 59°26′–59°44′E) and has a population of about 2,766,300 (Statistical Centre of Iran 2011). The area within the urban boundary of Mashhad is recorded as 292 km² and consists of 13 municipality districts, including a central business district (CBD) (Fig. 1). Hence, the population density of Mashhad as a whole is about 94.7 p/ha (Mansouri Daneshvar et al. 2017).

Fig. 1 General position and geographical location of the study area (the numbers refer to the municipality districts of Mashhad)

The raw statistical data on population, total surface area, built-up area, green space and open area, residential real estate area, and average building floors for each municipality district are shown in Table 1. For instance, the largest population and the largest total surface area belong to districts 2 and 7, respectively, while the largest built-up area and the largest residential real estate area belong to districts 9 and 10, respectively. The spatial distribution of these data reveals statistical divergence among the districts of Mashhad city. This divergence in the raw data could be a key signature of urban sprawl, which is examined in this paper.

Table 1 Statistical raw data of population, total surface area, built-up area, green space and open area, residential real estate area, and average building floors based on each municipality district in Mashhad

Urban sprawl modeling

The present study attempted to build an urban sprawl model using two statistical approaches: relative Shannon’s entropy and hierarchical clustering analysis (Fig. 2). On this basis, two types of variables were produced for the districts of the study area: geo-statistical variables and independent variables. The geo-statistical variables were produced as indices to be used as model inputs, whereas the independent variables were used to calculate statistical correlations. In this regard, a correlation coefficient was estimated between the sprawl index output by the model and each independent variable. The model was simulated and visualized spatially in GIS, similar to prior studies that have focused on urban sprawl models (e.g., Mohammady and Delvar 2016; Malik and Abdalla 2017).

Fig. 2 Diagram of urban sprawl modeling

Geo-statistical variables

Five geo-statistical variables were used as input measurements in the sprawl model: population density index (PDI), constructional density index (CDI), residential density index (RDI), horizontal density index (HDI), and vertical density index (VDI), each calculated for every municipality district in Mashhad (Table 2). All of these variables belong to one major group of sprawl indices, namely density indices. Density is the most popular sprawl measure (Galster et al. 2001), and there are various types of density and many ways to measure them (Burton 2000). Density is defined here as the ratio between the amount of a raw quantity and the surface area, its own or that of another quantity, on which it exists. The five geo-statistical indices were calculated as follows (Eqs. 1–5):

Table 2 Geo-statistical variables and indices of population density index, constructional density index, residential density index, horizontal density index, and vertical density index based on each municipality district in Mashhad
$$PDI=\frac{P}{{{A_t}}},$$
(1)

where P is the population, A_t is the total surface area in hectares, and PDI is the population density index in persons per hectare (p/ha) for each zone (district).

$$CDI=\frac{{{A_b} \times 100}}{{{A_o}}},$$
(2)

where A_b is the built-up area in hectares, A_o is the green space and open area in hectares, and CDI is the constructional density index in percent for each zone (district).

$$RDI=\frac{{{A_r} \times 100}}{{{A_t}}},$$
(3)

where A_r is the residential real estate area in hectares, A_t is the total surface area in hectares, and RDI is the residential density index in percent for each zone (district).

$$HDI=\frac{{{A_b} \times 100}}{{{A_t}}},$$
(4)

where A_b is the built-up area in hectares, A_t is the total surface area in hectares, and HDI is the horizontal density index in percent for each zone (district).

$$VDI=\frac{{F \times {A_b} \times 100}}{{{A_t}}},$$
(5)

where F is the zonal mean of building floors, A_b is the built-up area in hectares, A_t is the total surface area in hectares, and VDI is the vertical density index in percent for each zone (district). Higher values of these variables indicate less urban sprawl and dispersion. This inverse relationship is taken into account directly in the sprawl measurement procedure.
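Eqs. 1–5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `District` class, its field names, and the example numbers are hypothetical and are not taken from Table 1.

```python
from dataclasses import dataclass


@dataclass
class District:
    """Raw inputs for one district (hypothetical field names)."""
    population: float      # P, persons
    total_area_ha: float   # A_t, hectares
    built_up_ha: float     # A_b, hectares
    open_area_ha: float    # A_o, green space and open area, hectares
    residential_ha: float  # A_r, hectares
    mean_floors: float     # F, zonal mean of building floors


def density_indices(d: District) -> dict:
    """Return the five density indices of Eqs. 1-5 for one district."""
    return {
        "PDI": d.population / d.total_area_ha,                          # Eq. 1, p/ha
        "CDI": d.built_up_ha * 100 / d.open_area_ha,                    # Eq. 2, %
        "RDI": d.residential_ha * 100 / d.total_area_ha,                # Eq. 3, %
        "HDI": d.built_up_ha * 100 / d.total_area_ha,                   # Eq. 4, %
        "VDI": d.mean_floors * d.built_up_ha * 100 / d.total_area_ha,   # Eq. 5, %
    }


# Made-up district used only to exercise the formulas (not from Table 1).
example = District(population=200_000, total_area_ha=2_000, built_up_ha=800,
                   open_area_ha=400, residential_ha=500, mean_floors=2.0)
indices = density_indices(example)
```

Note that CDI divides by the open area A_o rather than the total area A_t, so it can exceed 100% when the built-up area is larger than the open area.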

Shannon’s entropy

According to the literature, the entropy method is the most reliable and robust metric among the available urban sprawl measurement indices. Shannon’s entropy (E) can be used to measure the degree of spatial concentration or dispersion of a geographical variable (x_i) among zones (Thomas 1981). Shannon’s entropy in its traditional form is given by Eq. 6:

$$E = -\sum\limits_{i=1}^{n} P_i \times \log_e P_i,$$
(6)

where P_i is the proportion of a phenomenon occurring in the ith zone, given by Eq. 7, x_i is the observed value of the phenomenon in the ith zone, and n is the total number of zones (13).

$${P_i}=\frac{{{x_i}}}{{\sum\nolimits_{{i=1}}^{n} {{x_i}} }}.$$
(7)

Relative entropy rescales the entropy value to the range from 0 to 1. According to Thomas (1981), the relative entropy (H_n) for n zones can be calculated as follows:

$${H_n}=\frac{E}{{{{\log }_e}(n)}},$$
(8)

where log_e(n) is the upper limit of entropy. However, the present study aims to measure the degree of spatial compactness or sprawl of a geographical zone based on several variables. Hence, a modified relative entropy index (H_k) is defined as follows:

$$H_k = \frac{-\sum\limits_{i=1}^{k} P_i \times \log_e P_i}{\log_e(k)},$$
(9)

where k is the total number of variables (5) and log_e(k) is the upper limit of entropy, equal to 1.609. The entropy itself ranges from 0 to log_e(k), so H_k ranges from 0 to 1. In this study, 0.5 is taken as the threshold for each district. Because higher values of the applied variables indicate less sprawl, values lower than 0.5 are considered to indicate sprawl, whereas higher values indicate compactness. In the present study, the sprawl index is defined as the reversed value of the relative entropy.
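The entropy calculation of Eqs. 6–9 can be sketched as follows. The function names are illustrative, and the input values are made up. How the five indices are scaled before they enter Eq. 9 is not specified in this section, so the sketch simply assumes a list of non-negative values for one district and normalizes them with Eq. 7.

```python
import math


def proportions(values):
    """Eq. 7: scale non-negative values so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def shannon_entropy(p):
    """Eq. 6: E = -sum(P_i * ln P_i); terms with P_i = 0 contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def relative_entropy(values):
    """Eqs. 8-9: entropy divided by its upper limit ln(k), k = len(values)."""
    k = len(values)
    if k < 2:
        raise ValueError("need at least two values")
    return shannon_entropy(proportions(values)) / math.log(k)


def is_sprawl(h_k, threshold=0.5):
    """Apply the 0.5 threshold: lower relative entropy is read as sprawl."""
    return h_k < threshold


# Illustrative inputs only (not values from Table 4).
even = relative_entropy([1, 1, 1, 1, 1])     # equal proportions
skewed = relative_entropy([96, 1, 1, 1, 1])  # one value dominates
```

With equal proportions the relative entropy reaches its upper bound of 1, while a single dominating value pushes it toward 0 and below the 0.5 threshold.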

Hierarchical clustering analysis

Hierarchical clustering analysis (HCA) and principal components analysis (PCA) are widely used for clustering geographical data (Mansouri Daneshvar 2015). In this research, the proximity matrix was derived in SPSS software using squared Euclidean distance as the dissimilarity measure to cluster the cases (districts). For the hierarchical clustering, the Euclidean distance measure was used for the observations and Ward’s method for the linkage rule. Among the various distance measures and linkage rules that can be implemented in hierarchical clustering analysis, this combination produced the most distinctive groups for the data used. In the present study, the HCA clustering dendrogram is used to check the modified relative entropy results.
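To make the linkage rule concrete, the following is a minimal pure-Python sketch of agglomerative clustering with Ward’s criterion on squared Euclidean distances. It is not the SPSS procedure used in the study and it does not produce a dendrogram or rescaled distances. The function name and the sample points are hypothetical.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    points: list of equal-length numeric sequences, one per case.
    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], [float(v) for v in p]) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged:
        # n_a * n_b / (n_a + n_b) * squared Euclidean distance of centroids.
        (ma, ca), (mb, cb) = a, b
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(ma) * len(mb) / (len(ma) + len(mb)) * sq_dist

    while len(clusters) > n_clusters:
        # Pick the pair of clusters whose merge is cheapest.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]],
                                                    clusters[ij[1]]))
        (ma, ca), (mb, cb) = clusters[i], clusters[j]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        # Delete j first (j > i) so that index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append((ma + mb, centroid))

    return sorted(sorted(members) for members, _ in clusters)


# Six made-up cases with two features each, forming three obvious groups.
sample = [[0.30, 0.10], [0.35, 0.12], [0.60, 0.50],
          [0.62, 0.55], [0.90, 0.95], [0.92, 0.90]]
groups = ward_clusters(sample, 3)
```

Stopping at three clusters mirrors the high, moderate, and low sprawl classes; a full implementation would instead record every merge and its cost to draw the dendrogram.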

Independent variables

In this research, five independent variables were compiled for the thirteen municipality districts: informal settlements, building parcel size, per capita car ownership, land price, and frequency of crimes. These independent variables are shown in Table 3.

Table 3 Independent characteristics of informal settlements, building parcel size, per capita car ownership, land price, and frequency of crimes for each municipality district in Mashhad

Correlation coefficient

In the research model, both Shannon’s entropy and clustering analysis are used to characterize dispersion or compactness, which indicates the degree of urban sprawl. Further statistical analysis is carried out to examine the relationship between the sprawl index and the independent variables. According to the model diagram in Fig. 2, a correlation coefficient analysis is used to detect possible relationships between the sprawl index and the aforementioned independent variables. These relationships can be used to anticipate future urban sprawl.
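The results section reports Spearman coefficients, so a rank correlation can be sketched as below. This is a self-contained illustration, not the statistical software routine used in the study. The function names and the two sample series are hypothetical, and no significance test is included.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up series: land price falls steadily as the sprawl index rises.
sprawl_index = [0.1, 0.4, 0.2, 0.8, 0.6]
land_price = [50, 20, 40, 5, 10]
rho = spearman(sprawl_index, land_price)
```

Because the sample series are perfectly monotone in opposite directions, the coefficient is at its lower bound of −1; real district data would give intermediate values.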

Results and discussion

Geo-statistical indices

Using Eqs. 1–5, five geo-statistical indices were calculated and mapped in Fig. 3, with each index categorized into three classes. The first variable, the population density index, revealed that low population densities with high suitability for urban dispersion belong to districts 7–9 and 12 (Fig. 3a). The second variable, the constructional density index, revealed that low densities of built-up area relative to vacant area belong to districts 2, 5, 8 and 12 (Fig. 3b). The third variable, the residential density index, revealed that low densities of residential real estate area with high suitability for scattered settlements belong to districts 2–3, 7 and 12 (Fig. 3c). The fourth variable, the horizontal density index, revealed that low densities of built-up area relative to total surface area belong to districts 2, 5, 7–8 and 12 (Fig. 3d). The fifth variable, the vertical density index, revealed that low densities of floored built-up area relative to total surface area belong to districts 5, 7–8 and 12 (Fig. 3e). Taking all indices together, districts 2, 7, 8, and 12 have the highest potential for sprawl growth among the districts in the study area.

Fig. 3 Spatial visualization of the study area for urban sprawl indices. a Population density index. b Residential density index. c Constructional density index. d Horizontal density index. e Vertical density index. f Distances to CBD (all legends were inserted into figures separately)

The assumption of decentralization from a central core to the urban periphery is often fundamental to the characterization of sprawl (Galster et al. 2001). Hence, the distance from the spatial center point of each district to the central business district (CBD) was measured. The estimated distances for districts 2, 7, 8, and 12 were 7, 7, 4, and 16 km, respectively. Compared with the distances of the other districts to the CBD in Fig. 3f, no meaningful relationship is observed. Therefore, decentralization does not appear to influence the sprawl measurement in Mashhad city.

Urban sprawl index measurement

According to Eq. 9, the components of Shannon’s entropy for each municipality district in Mashhad were calculated and are given in Table 4. The modified relative entropy in this table reveals that the highest urban sprawl occurs in districts 12, 7, and 8, with values of 0.324, 0.414, and 0.447, respectively. The relative entropy results are mapped in Fig. 4. This figure shows that these crucial districts are located in the northwestern and southeastern parts of Mashhad city. Based on the legal documents of Mashhad urban planning, the northwestern region is the area suitable for future urban development. Hence, the sprawl expansion of the city over district 2 is a natural trend of physical development in the study area. In contrast, the southeastern region, which includes the Mashhad international airport area, has no role in future urban development. Hence, district 7 should be protected from the impacts of sprawl.

Table 4 Statistical estimation of the components of Shannon’s entropy based on each municipality district in Mashhad
Fig. 4 Distribution of the relative entropy index in three categories for each municipality district in Mashhad (the legend was inserted into the figure; the numbers refer to sprawled municipality districts in Mashhad)

Following the HCA methodology, a clustering dendrogram was used to check the entropy results. For this purpose, three clusters of high, moderate, and low sprawl were produced as a dendrogram in Fig. 5, cut at a rescaled distance cluster combination of 10. Like the relative entropy results, the hierarchical clustering analysis placed the crucial districts 7, 8, and 12 in the high sprawl cluster. According to the results, the HCA and modified entropy procedures yielded the same categories of sprawl expansion in the study area and could therefore be applied in similar studies wherever sprawl growth occurs. Jat et al. (2008) stated that the results of different analyses of urban sprawl, such as multivariate regression analysis, landscape metrics of patchiness and built-up density, and remotely sensed satellite imagery, agree well with Shannon’s entropy values.

Fig. 5 Clustering dendrogram of the districts based on urban sprawl measurement in HCA (Source: extracted from SPSS software)

Statistical correlations

In this study, Spearman correlation coefficients between the sprawl index and the independent variables (informal settlements, building parcel size, per capita car ownership, land price, and frequency of crimes) were calculated for the 13 municipality districts. The correlations are presented in Table 5. According to this table, a significant direct correlation (R = 0.44) was detected between the sprawl index and informal settlement development. Direct correlations (R = 0.30–0.32) were also observed between the sprawl index and both the frequency of crimes and building parcel size. In contrast, the results revealed an inverse correlation (R = −0.50) between the sprawl index and the land price index. The correlation between the sprawl index and the car ownership index was not significant.

Table 5 The correlations between the relative sprawl index and five independent characteristics of informal settlements, building parcel size, per capita car ownership, land price, and frequency of crimes

Consequently, sprawl expansion in Mashhad city depends on the growth of informal settlements and an increase in the frequency of crimes. In addition to informal settlements and crimes, the largest building parcel sizes and the lowest land prices contribute to urban sprawl in Mashhad. The low land price in the study area is a triggering factor for the settlement of low-income groups in informal settlements. The spread of informal and squatter settlements increases urban and suburban crime. Feng et al. (2007) pointed out that sprawled urban development in China is characterized by fragmentation and irregularity of the landscape, low population density, and low income, and consequently by negative environmental impacts. Hence, urban management in the study area should control sprawl expansion in the crucial districts by preventing land use change and land degradation.

Conclusion

Urban sprawl refers to the dispersed and unplanned expansion of urban areas, which causes a wide range of environmental and socio-economic problems in developing countries. Gillham (2002) and Helbich and Leitner (2010) reported that urban sprawl forms at the fringes of metropolitan areas, spreading through low-density commercial and industrial development, followed by large uncontrolled urban expansion with low-quality services and accessibility. However, urban sprawl formation in developing countries may follow different growth patterns compared with other parts of the world (Alsharif and Pradhan 2014). On this basis, sprawl measurement in such countries needs further statistical detection and quantification.

In the present study, an urban sprawl model using two statistical approaches, relative Shannon’s entropy and hierarchical clustering analysis, was generated for Mashhad city, northeastern Iran. The inputs to the urban sprawl model were five geo-statistical indices: population density index (PDI), constructional density index (CDI), residential density index (RDI), horizontal density index (HDI), and vertical density index (VDI). The relative entropy results reveal that districts 12, 8, and 7, located in the northwestern and southeastern parts of the city, are the crucial zones of Mashhad with regard to sprawl expansion. Like the relative entropy results, the hierarchical clustering analysis placed the crucial districts 7, 8, and 12 in the high sprawl cluster. Based on the statistical correlations, significant direct correlations were detected between the sprawl model output and both informal settlement development and frequency of crimes. In contrast, the results revealed an inverse correlation between the sprawl model output and the land price index.

Consequently, sprawl expansion in Mashhad city depends on the growth of informal settlements and an increase in crimes. The inverse relationship between the sprawl index and the land price index indicates that low-income people populate the crucial districts, especially the scattered vacant areas. This sprawl expansion could trigger the growth of informal settlements and other negative environmental impacts in the study area. The most prevalent unplanned urban growth pattern is low-density built-up dispersal, generally termed urban sprawl, which has negative impacts on environmental and social dimensions (Poelmans and Van Rompaey 2009).