Introduction

Urban expansion results from human alteration of the land surface (Lambin et al. 2001; Zadbagher et al. 2018) and is intensifying with economic growth. Modeling urban expansion can reveal its underlying processes and spatiotemporal dynamics (Akın et al. 2015; Liu et al. 2017), giving planners and decision-makers early warning of its ecological and environmental consequences (Smidt et al. 2018; Sun et al. 2018). Cellular automata (CA) models are effective tools for reconstructing past urban expansion and projecting future scenarios from land-use dynamics (Liu et al. 2018; Soares-Filho et al. 2013). Biophysical, socioeconomic, demographic, and meteorological factors have been identified as drivers of urban expansion (Ahmed et al. 2014; Osman et al. 2016), and these factors are commonly used to build the transition rules of CA models. However, including too many factors can introduce multicollinearity, which degrades model performance and reduces the models' utility (Feng and Tong 2017b; Von Thaden et al. 2018). Factor selection is therefore crucial in constructing CA models, and comparing individual and multiple factors in CA-based urban modeling should improve our understanding of land-use dynamics.

Accurate analysis of the relationships between urban expansion and its drivers is fundamental in estimating land conversion probability, which guides CA models to control the distribution of new urban cells (Kamusoko and Gamba 2015). Wahyudi and Liu (2013) reported that, during the past two decades, an increasing number of driving factors of various categories have been applied in CA modeling. We have categorized these influencing factors as:

  1. Biophysical variables that represent primary determinants of urban expansion, including meteorological and topographic conditions such as land surface temperature, normalized difference vegetation index, digital elevation model (DEM), and terrain slope (Dubovyk et al. 2011; Mahiny and Clarke 2012)

  2. Human-disturbance variables that represent proximity disturbance, such as distance to the city center, road networks, water bodies, ecological reserves, and farmland (Engelen 2002; Reilly et al. 2009), as well as activity disturbance, such as urban development activities and intensity (Li et al. 2011; Mitsova et al. 2011; Sang et al. 2011)

  3. Socioeconomic variables that include social, economic, and demographic factors, such as employment potential, gross domestic product (GDP), land price, and population density (Dang and Kawasaki 2017; Haase et al. 2012; Poelmans and Rompaey 2009; Rienow and Goetzke 2015)

  4. Institutional variables that represent macro-control policies on urban development, including urban planning regulations, rural development policies, and land development regulations (Delden et al. 2010; Deng et al. 2015; Peña et al. 2005)

It is difficult to include all these factors in CA modeling because they comprise hundreds of discrete variables, many of which have been shown to be highly correlated. Such correlation causes multicollinearity, which in turn reduces simulation accuracy. To reduce multicollinearity while minimizing information loss, modelers have identified dominant factors using methods such as quantitative analysis, rough set theory, regression analysis, and spatial statistics (Mondal et al. 2015; Osman et al. 2016; Wang et al. 2011). To eliminate data redundancy, principal component analysis was applied to identify the factors that dominate land development (Li and Yeh 2002). Osman et al. (2016) applied an analytic hierarchy process to determine the weights of candidate factors in different regions of the Giza Governorate in Egypt. Rather than using all candidate factors, Wang et al. (2011) calibrated CA-based urban models using dominant factors identified by rough set theory. Ahmed et al. (2014) applied a multivariate statistical model to examine the influence of drivers on urban expansion in two periods and reconstructed the expansion using CA models. Geographically weighted regression has been used to probe the spatial variation of driving factors for urban expansion modeling (Mondal et al. 2015; Shafizadeh-Moghadam and Helbich 2015). To assess the effects of driving factors on CA-based simulation results, González et al. (2015) applied a simplified global sensitivity analysis in modeling the urban expansion of Madrid.
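The multicollinearity problem described above can be checked numerically before factors enter a CA model. The following is a minimal sketch, assuming synthetic factor samples, that uses the variance inflation factor (VIF), a standard collinearity diagnostic that is not among the methods cited above: VIF_j = 1/(1 − R_j²), where R_j² comes from regressing factor j on all the other factors. The variable names are hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = sampled cells, columns = factors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares
    regression of factor j on all other factors plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + remaining factors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs

# Hypothetical samples: two distance factors that are nearly collinear, one independent factor
rng = np.random.default_rng(0)
dist_city = rng.random(500)
dist_road = 0.9 * dist_city + 0.1 * rng.random(500)
dem = rng.random(500)
vifs = variance_inflation_factors(np.column_stack([dist_city, dist_road, dem]))
print(vifs.round(1))
```

A common rule of thumb treats VIF values above about 5–10 as a sign of problematic collinearity; here the two deliberately related distance factors far exceed that range, while the independent elevation stand-in stays near 1.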

Comprehensive analysis of the factors affecting urban development is generally beneficial for CA modeling. However, Feng and Tong (2017b) reported that CA model performance may decline when more factors (e.g., more than five) are included, because multicollinearity among them is likely to produce inaccurate weights. This raises three questions: (1) Which factors are important to urban expansion in rapidly urbanizing areas, and is a specific factor on its own useful for building a CA model? (2) If an individual factor can be used to build a CA model, do multi-factor-based (MFB) CA models necessarily outperform individual-factor-based (IFB) CA models? (3) How does each factor affect the prediction of future urban scenarios? The objective of this study is to address these questions.

We applied a generalized additive model (GAM) to examine the ability of each factor to explain urban expansion and to rank the factors by their statistical significance. GAM is a nonparametric extension of the generalized linear model that uses unspecified smoothing functions to capture nonlinear relationships between response and explanatory variables (Larsen 2015). GAMs have been used to build the transition rules of urban CA models (Brown and Goovaerts 2002; Feng and Tong 2017a); here, we use GAM to quantify the explanatory ability and rank order of each factor when constructing urban CA models. We then applied differential evolution (DE) to capture the land transition rules of the CA models. DE was recently integrated with CA into a hybrid DE-CA model that successfully simulated rapid urban expansion in Kunming (Feng and Tong 2018a). DE automatically minimizes the total residuals of the transition rules, improving accuracy over traditional CA methods. Because DE-CA can also predict future scenarios, we applied it to simulate urban expansion using the individual and multiple factors identified by GAM. Hangzhou, which lies on the southeast coast of China, was selected as the case study area. Modeling urban expansion and land-use change in Hangzhou is of great interest to researchers (Hou et al. 2019; Liu et al. 2018) because it is a rapidly urbanizing and economically developed city in the Yangtze River Delta. These studies applied a gradient CA model and a CA-Markov model, but it remains unclear how each driving factor will affect Hangzhou's urban scenarios. For comparison, we calibrated DE-CA models based on individual (IFB-DE-CA) and multiple (MFB-DE-CA) factors using land-use change data from 2005 to 2015, and used these models to simulate the 2015 urban pattern of Hangzhou. Finally, we projected different urban scenarios for 2030 using the DE-CA models.

Study area and data

Study area

Hangzhou is the capital of Zhejiang province on the southeast coast of China (Fig. 1a). The city covers 16,596 km² and has jurisdiction over 13 sub-areas, including 9 urban districts and 4 satellite cities (Fig. 1b, c).

Fig. 1 The Hangzhou study area

Elevation decreases gradually from the steep southwest to the gently sloping northeast. More than half of the study area is hilly, so the most extensive urban development has occurred in the flatter northeast. In the Urban Agglomeration Development Planning of the Yangtze River Delta (NDRC 2016), Hangzhou was designated a Type I large city with more than 5 million urban residents. According to the local Bureau of Statistics (tjj.hangzhou.gov.cn), Hangzhou had 9.2 million registered and short-term residents in 2016, with a population density of 554 persons per km². Hangzhou's GDP has increased substantially since China's opening up, rising from 284 million Chinese Yuan (about 45 million USD) in 1978 to 1130 billion Chinese Yuan (about 180 billion USD) in 2016. As in other big cities in China, the combined effects of rapid population and economic growth have driven fast expansion of the urban area: the built-up area of Hangzhou grew from 314 km² in 2005 to 506 km² in 2015, as reported by the local Bureau of Statistics. Comparison of the 2005 and 2015 land-use patterns classified from Landsat images shows that new built-up areas occurred principally around the Hangzhou city center and in two nearby suburban areas (Binjiang and Xiaoshan). In this study, we focused on the Hangzhou city center and all adjacent suburban areas, which together cover about 6300 km² (Fig. 1c).

Data and variables

We used vector and raster data to produce the dependent and independent variables. The Hangzhou administrative map was used to identify the Hangzhou boundaries, city center, and county centers (Fig. 2a). We acquired traffic network datasets from OpenStreetMap, which classifies roads into categories such as motorway, trunk, and primary. We included national and provincial roads, which have greater impacts on urban expansion, and excluded urban roads and rural lanes, which have weaker influences. The raster datasets include Landsat images acquired on March 7, 2005 and October 13, 2015, obtained from the Geospatial Data Cloud (www.gscloud.cn). We applied a Mahalanobis distance classifier in ENVI 5.2 to classify these images into land-use patterns. Because our focus is urban expansion, we considered only three land-use types (urban, non-urban, and water body) when producing the 2005–2015 urban change map (Fig. 2b). We also used a digital elevation model (DEM) from the ASTER Global Digital Elevation Map and rasterized population density from WorldPop (worldpop.org.uk). At a spatial resolution of 30 m, each map contains 3011 × 4233 cells.
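The Mahalanobis distance rule assigns each pixel to the class whose training distribution it is closest to, accounting for band covariance. The sketch below illustrates that rule with NumPy; it is not ENVI's implementation (ENVI options such as distance thresholds are not reproduced), and the three-band training reflectances are hypothetical.

```python
import numpy as np

def train_mahalanobis(samples_by_class):
    """Estimate a mean vector and inverse covariance matrix for each class."""
    stats = {}
    for label, pixels in samples_by_class.items():
        pixels = np.asarray(pixels, dtype=float)  # shape (n_pixels, n_bands)
        mean = pixels.mean(axis=0)
        inv_cov = np.linalg.inv(np.cov(pixels, rowvar=False))
        stats[label] = (mean, inv_cov)
    return stats

def classify(pixels, stats):
    """Assign each pixel to the class with the smallest squared Mahalanobis distance."""
    pixels = np.asarray(pixels, dtype=float)
    labels = list(stats)
    d2 = np.empty((pixels.shape[0], len(labels)))
    for k, label in enumerate(labels):
        mean, inv_cov = stats[label]
        diff = pixels - mean
        d2[:, k] = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # diff^T * inv_cov * diff per pixel
    return np.array(labels)[d2.argmin(axis=1)]

# Hypothetical training pixels for the three land-use types used in the study
rng = np.random.default_rng(1)
training = {
    "urban": rng.normal([0.30, 0.30, 0.30], 0.02, size=(200, 3)),
    "non-urban": rng.normal([0.05, 0.40, 0.20], 0.02, size=(200, 3)),
    "water": rng.normal([0.05, 0.02, 0.01], 0.02, size=(200, 3)),
}
stats = train_mahalanobis(training)
result = classify([[0.29, 0.31, 0.30], [0.06, 0.03, 0.01]], stats)
print(result)
```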

Fig. 2 Major road networks and urban expansion from 2005 to 2015

The observed 2005–2015 urban expansion was used to generate the dependent variable (y), with a value of 1 (urban expansion) or 0 (other). We extracted four distance-based independent variables (CITY, COUN, RAIL, and ROAD) from vector maps using the Euclidean Distance tool in ArcGIS 10.1 (Table 1; Fig. 3a–d). We included the biophysical DEM factor to reflect the influence of terrain and the demographic POP factor to reflect the influence of population density. All driving factors were normalized using the method of Feng and Tong (2018b) to remove the influence of differing units and magnitudes on DE-CA parameterization.
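Distance factors such as ROAD assign every cell its straight-line distance to the nearest feature. The sketch below is a brute-force NumPy equivalent of a Euclidean distance raster that is suitable only for small grids; the authors used the ArcGIS tool. Because the normalization of Feng and Tong (2018b) is not spelled out here, min–max scaling to [0, 1] is shown as an assumption, not as their method.

```python
import numpy as np

def euclidean_distance_raster(feature_mask, cell_size=30.0):
    """Distance (m) from every cell centre to the nearest feature cell.

    feature_mask is a 2-D boolean array (True = road, railway, centre, ...).
    Brute force over all feature cells, so memory grows with H * W * n_features.
    """
    rows, cols = np.indices(feature_mask.shape)
    fr, fc = np.nonzero(feature_mask)
    d = np.hypot(rows[..., None] - fr, cols[..., None] - fc)  # shape (H, W, n_features)
    return d.min(axis=-1) * cell_size

def min_max_normalize(raster):
    """Assumed normalization: rescale a raster linearly to [0, 1]."""
    lo, hi = np.nanmin(raster), np.nanmax(raster)
    return (raster - lo) / (hi - lo)

roads = np.zeros((5, 5), dtype=bool)
roads[:, 0] = True                        # hypothetical road along the western edge
dist = euclidean_distance_raster(roads)   # 0, 30, 60, 90, 120 m across each row
norm = min_max_normalize(dist)            # 0.0, 0.25, 0.5, 0.75, 1.0 across each row
```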

Table 1 Dependent [D] and independent [I] variables and summary statistics
Fig. 3 Spatial visualization of independent variables in the DE-CA model

Among the independent variables, CITY and COUN have the highest mean values, while DEM and POP have the smallest (Table 1). The standard deviations of CITY and COUN are similar, suggesting similar spatial distributions of these two factors. The standard deviation of RAIL is larger than that of ROAD, indicating a more dispersed distribution of RAIL and a lower density of the railway network. DEM and POP have the smallest standard deviations, suggesting that most of the area is relatively flat and most regions have low population density, some with zero residents (Fig. 3e, f).

Methods

Workflow

Figure 4 shows our DE-CA modeling procedure in four steps: (1) data collection and preprocessing, including radiometric calibration, atmospheric and geometric rectification, and cropping to the study area; (2) variable extraction, including classification of Landsat images into land-use patterns and spatial visualization of the driving factors; (3) factor assessment, in which samples are drawn from the land-use change map and driving factors by systematic sampling and GAM is applied to identify the rank order of the driving factors; and (4) modeling and assessment, in which CA transition rules are constructed using DE to build IFB-DE-CA and MFB-DE-CA models that simulate the present urban pattern and future scenarios.
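Step (3) draws samples from aligned rasters by systematic sampling. A minimal sketch, assuming a regular grid in which every `step`-th cell is kept along rows and columns; the sampling interval actually used is not given in this section, and the function and variable names are hypothetical.

```python
import numpy as np

def systematic_sample(y, factors, step):
    """Take every `step`-th cell in the row and column directions from aligned rasters.

    y: 2-D array of 0/1 change labels; factors: list of 2-D factor rasters.
    Cells with NaN in any layer (e.g., outside the study area) are dropped.
    """
    ys = y[::step, ::step].astype(float).ravel()
    X = np.column_stack([f[::step, ::step].ravel() for f in factors]).astype(float)
    keep = ~np.isnan(ys) & ~np.isnan(X).any(axis=1)
    return X[keep], ys[keep]

change = np.zeros((10, 10))                          # hypothetical 2005-2015 change map
dist_city = np.arange(100, dtype=float).reshape(10, 10)
X, y = systematic_sample(change, [dist_city], step=5)  # keeps cells (0,0), (0,5), (5,0), (5,5)
```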

Fig. 4 Modeling procedure under different strategies for factor combination

The DE-CA model

Our DE-CA model improves on a typical CA by calibrating it automatically with DE. In CA, four elements jointly determine land transformation: the current cell state, neighborhood effects, constraints, and land transition probability. The DE-CA transition rules can be expressed as

$$ {Stat}_m^{t+1}= LandTrans\left({Stat}_m^t, NeiEff, Con,{P}_t\right) $$
(1)

where

  • Land transition function (LandTrans) denotes the transition rules that integrate the effects of the four current elements to determine the future cell state;

  • Future cell state (\( {Stat}_m^{t+1} \)) represents the state of cell m at time t + 1 and current cell state (\( {Stat}_m^t \)) represents the state of the cell m at time t;

  • Neighborhood effects (NeiEff) denote the influence of neighboring urban cells on the state of the cell being processed; a 5 × 5 Moore neighborhood is widely applied in CA models (Pan et al. 2010; Wu 1998);

  • Constraints (Con) denote prohibited areas resulting from unsuitable conditions or urban planning regulations, including broad water bodies, high-slope areas, and protected areas (Feng and Tong 2018b); and

  • Transition probability (Pt) denotes the temporally stationary, but spatially non-stationary, land conversion probability defined by driving factors. The probability is calculated by (Mustafa et al. 2018):

$$ {P}_t=\frac{e^{\left({a}_0+{a}_1\times {D}_1+\dots +{a}_n\times {D}_n\right)}}{1+{e}^{\left({a}_0+{a}_1\times {D}_1+\dots +{a}_n\times {D}_n\right)}} $$
(2)

where a0 is the intercept, n is the number of factors, and a1, … , an are parameters for each factor representing its weight. Here, n ranges from 1 to 6, with n = 1 indicating only one factor included in the modeling and n = 6 indicating all six factors included in the modeling.
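The four elements of Eq. (1) can be combined into a single CA iteration. The sketch below computes Pt from Eq. (2), a 5 × 5 Moore neighborhood share (NeiEff), and a constraint mask (Con). How LandTrans combines these elements is not specified above, so the multiplicative rule Pt × NeiEff compared against a threshold is an assumption, and all names are hypothetical.

```python
import numpy as np

def transition_probability(factors, a):
    """Eq. (2): logistic probability; a = [a0, a1, ..., an] matches the n factor rasters."""
    z = a[0] + sum(ai * f for ai, f in zip(a[1:], factors))
    return 1.0 / (1.0 + np.exp(-z))  # identical to e^z / (1 + e^z)

def neighborhood_effect(urban, size=5):
    """Share of urban cells in a size x size Moore window (centre excluded, edges zero-padded)."""
    r = size // 2
    padded = np.pad(urban.astype(float), r)
    h, w = urban.shape
    total = np.zeros((h, w))
    for dr in range(-r, r + 1):
        for dc in range(-r, r + 1):
            if dr == 0 and dc == 0:
                continue
            total += padded[r + dr : r + dr + h, r + dc : r + dc + w]
    return total / (size * size - 1)

def ca_step(urban, p_t, constraint, threshold, size=5):
    """One iteration of Eq. (1) under the assumed rule: a non-urban, unconstrained
    cell becomes urban when p_t * NeiEff exceeds the threshold."""
    score = p_t * neighborhood_effect(urban, size)
    convert = ~urban & ~constraint & (score > threshold)
    return urban | convert

urban = np.zeros((7, 7), dtype=bool)
urban[2:5, 2:5] = True                   # existing 3 x 3 urban block
constraint = np.zeros((7, 7), dtype=bool)
constraint[3, 1] = True                  # e.g., a protected cell
p_t = np.full((7, 7), 0.8)
new = ca_step(urban, p_t, constraint, threshold=0.15)
```

Cell (1, 3) has 6 urban cells among its 24 neighbors, so its score is 0.8 × 0.25 = 0.2 and it converts; cell (3, 1) has the same score but stays non-urban because it is constrained.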

The parameters (an) are commonly retrieved using logistic regression (LR). However, many studies have reported that LR alone cannot sufficiently capture urban expansion dynamics (Feng and Tong 2018a; Li and Yeh 2002). In contrast, heuristics such as DE can search for parameters that represent complex urban expansion by minimizing the residuals over the fitting samples. Residuals are usually summarized as the root-mean-square error (RMSE) between the observed land conversion values and the predicted transition probabilities. An RMSE-based objective function thus turns calibration of the transition rules into an optimization problem that DE can solve. A typical objective function is (Feng and Tong 2018a):

$$ \operatorname{Min}\ Func\left({D}_1,\dots, {D}_n\right)=\sqrt[2]{\frac{\sum {\left({P}_v\left({D}_1,\dots, {D}_n\right)-{P}_o\right)}_m^2}{Q}} $$
(3)

where Func(D1, … , Dn) is the objective function; for cell m, Pv(D1,  … , Dn) is the predicted transition probability and Po is the observed land conversion; and Q is the number of samples.
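Eq. (3) can be written directly as a function to be minimized. In calibration the sampled factor values D are fixed and the search varies the parameters a0, …, an of Eq. (2), so the sketch below takes the parameters as its first argument. The sample values are hypothetical.

```python
import numpy as np

def rmse_objective(a, D, p_obs):
    """Eq. (3): RMSE between logistic probabilities (Eq. 2) and observed conversions.

    a: parameters [a0, a1, ..., an]; D: (Q, n) sampled factor values; p_obs: (Q,) 0/1 labels.
    """
    z = a[0] + D @ np.asarray(a[1:])
    p_pred = 1.0 / (1.0 + np.exp(-z))
    return np.sqrt(np.mean((p_pred - p_obs) ** 2))

D = np.array([[0.1], [0.4], [0.6], [0.9]])  # hypothetical normalized factor samples
p_obs = np.array([0, 0, 1, 1])              # observed conversion (1 = became urban)
print(rmse_objective([0.0, 0.0], D, p_obs))   # uninformative parameters: every prediction is 0.5
print(rmse_objective([-10.0, 20.0], D, p_obs))
```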

Like genetic algorithms (GAs), DE is a population-based global heuristic guided by an objective function and by mutation, crossover, and selection operators. Compared with GAs, DE searches for the global optimum using differences between members of the population and is less prone to becoming trapped in local optima (Das et al. 2009; Noroozi et al. 2011). DE starts by initializing a population within predefined bound constraints. For each target vector, it randomly selects three other vectors from the population, subtracts two of them to form a difference vector, and adds the scaled difference to the third to generate a mutant vector; crossover between the mutant and the target vector yields a trial vector. If the trial vector's objective function value is smaller than that of the target vector, the trial vector replaces the target in the next generation. After repeated mutation, crossover, and selection, DE converges to a near-optimal solution, which here is a set of optimal parameters for the CA transition rules.
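The loop described above can be sketched as the classic DE/rand/1/bin scheme. This is a minimal illustration, not a reimplementation of the R package DEoptim, whose default strategy and bound handling may differ. The values F = 0.8 and CR = 0.5 are illustrative, and any objective, such as Eq. (3) as a function of the parameters, can be passed as `f`.

```python
import numpy as np

def differential_evolution(f, lower, upper, pop_size=40, F=0.8, CR=0.5, generations=200, seed=0):
    """Minimal DE/rand/1/bin minimiser of f over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    d = lower.size
    pop = lower + rng.random((pop_size, d)) * (upper - lower)  # initialise within bounds
    cost = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # three distinct members, all different from the target vector i
            r1, r2, r3 = rng.choice([k for k in range(pop_size) if k != i], 3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])  # scaled difference added to a base vector
            mutant = np.clip(mutant, lower, upper)      # keep the mutant inside the bounds
            cross = rng.random(d) < CR
            cross[rng.integers(d)] = True               # take at least one component from the mutant
            trial = np.where(cross, mutant, pop[i])
            trial_cost = f(trial)
            if trial_cost <= cost[i]:                   # greedy selection against the target
                pop[i], cost[i] = trial, trial_cost
    best = cost.argmin()
    return pop[best], cost[best]

# Usage on a simple test function whose minimum is at (1.5, -2.0)
best, best_cost = differential_evolution(lambda x: np.sum((x - [1.5, -2.0]) ** 2), [-5, -5], [5, 5])
print(best, best_cost)
```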

Thresholds were defined according to the total land available for development and the maximum number of iterations from the initial to the final year, with one iteration representing 1 year. When simulating the urban pattern, the total land available was taken to be the actual urban area in the final year (2015). A threshold is accepted if the simulated urban area reaches the total land available within the maximum number of iterations. We defined the thresholds for future predictions in the same way.
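One simple way to tie a threshold to the land demand and the number of iterations is to spread the demand evenly over the iterations and set the threshold at the score of the next-ranked eligible cell. This quota rule is a hypothetical illustration of the idea described above, not the authors' documented procedure.

```python
import numpy as np

def quota_threshold(score, eligible, demand_cells, iterations):
    """Hypothetical rule: choose a threshold so about demand/iterations cells convert per iteration."""
    quota = int(np.ceil(demand_cells / iterations))
    candidates = np.sort(score[eligible])[::-1]  # eligible scores, highest first
    if quota >= candidates.size:
        return -np.inf                           # every eligible cell can convert
    return candidates[quota]                     # with 'score > threshold', the top `quota` cells convert (ties aside)

score = np.arange(10, dtype=float).reshape(2, 5)  # hypothetical conversion scores
eligible = np.ones_like(score, dtype=bool)
t = quota_threshold(score, eligible, demand_cells=6, iterations=2)  # quota of 3 cells per iteration
```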

Factor combination using GAM

We adopted a GAM to assess the influence of each factor before building the transition rules. The model has the advantage of quantifying the factors’ ability to explain urban expansion and ranking them accordingly. A standard GAM allows flexible estimation of the underlying nonlinear relationships between dependent and independent variables (Guisan Jr et al. 2002). An example of GAM can be represented as

$$ g\left(E(y)\right)={b}_0+{s}_1\left(\mathrm{DEM}\right)+{s}_2\left(\mathrm{POP}\right)+{s}_3\left(\mathrm{CITY}\right)+{s}_4\left(\mathrm{RAIL}\right)+{s}_5\left(\mathrm{ROAD}\right)+{s}_6\left(\mathrm{COUN}\right)\kern1.5em $$
(4)

where g(·) is a link function relating the expected value E(y) to the independent variables, b0 is the intercept, and si(·) is a nonparametric smoothing function relating each variable to g(E(y)). The terms are ordered by the factors' impacts in descending order; this order may differ when the factors are used to explain urban expansion at other locations or in other periods.

We used GAM in a rank-order-sensitive manner, introducing factors into the model one by one: a factor entered earlier has a stronger impact on urban expansion, whereas a factor entered later has a weaker impact. The percentage of deviance explained (PDE) indicates how much of the variation in urban expansion a factor, or a set of factors, explains in the GAM, and the Akaike Information Criterion (AIC) indicates the model's comparative performance (lower is better). We therefore used PDE and AIC to define the rank order of individual factors and of factor combinations.
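Both criteria follow from the binomial deviance of a fitted model. For 0/1 outcomes, PDE = (null deviance − residual deviance)/null deviance, where the null model predicts the mean of y everywhere, and AIC = −2 log L + 2 × edf, where edf is the model's effective degrees of freedom. The sketch below computes both from fitted probabilities; the GAM fit itself (for example, in R's mgcv) is assumed to have been done elsewhere, and the example values are hypothetical.

```python
import numpy as np

def binomial_deviance(y, p, eps=1e-12):
    """Deviance (-2 * log-likelihood) for 0/1 outcomes y and fitted probabilities p."""
    p = np.clip(p, eps, 1 - eps)
    return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def deviance_explained(y, p_fitted):
    """PDE = (null deviance - residual deviance) / null deviance."""
    null = binomial_deviance(y, np.full(y.shape, y.mean()))
    return (null - binomial_deviance(y, p_fitted)) / null

def aic(y, p_fitted, edf):
    """AIC = -2 log L + 2 * edf (edf = effective degrees of freedom of the fitted model)."""
    return binomial_deviance(y, p_fitted) + 2.0 * edf

y = np.array([0, 0, 1, 1])
p_fitted = np.array([0.1, 0.1, 0.9, 0.9])   # hypothetical fitted probabilities
pde = deviance_explained(y, p_fitted)       # about 0.848
model_aic = aic(y, p_fitted, edf=2.0)
```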

Evaluation methods

Accuracy assessment

CA models are usually assessed by cell-by-cell comparison between the simulated and actual patterns (Feng and Tong 2018b; Liu et al. 2017; Musa et al. 2017). This method usually generates an error matrix that reports overall accuracy and the Kappa coefficient. Although pattern comparison indicates the global performance of CA models, it is weak at distinguishing changes that are simulated correctly from those simulated falsely (Pontius and Millones 2011). CA models can also be assessed by overlaying the simulated and actual urban expansion, which usually reports hits, correct rejections, misses, and false alarms (Aldwaik and Pontius Jr 2012; Pontius et al. 2013). A hit means that reference urban expansion during 2005–2015 was correctly simulated as urban expansion, and a correct rejection means that actual non-urban persistence during 2005–2015 was correctly identified. A miss indicates that urban expansion during 2005–2015 was incorrectly simulated as non-urban persistence, while a false alarm indicates that non-urban persistence was mistakenly simulated as urban expansion.
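Both assessments can be computed from aligned boolean maps. The sketch below assumes urban masks on the same grid and expresses each change component as a percentage of the non-water extent, which is one common convention; the denominator behind the reported percentages is not stated here. Cells already urban in 2005 count only toward urban persistence. Overall accuracy and Cohen's kappa are computed cell by cell on the 2015 maps. The tiny maps are hypothetical.

```python
import numpy as np

def change_components(ref_2005, ref_2015, sim_2015, water):
    """Hits, misses, false alarms and correct rejections as percentages of the non-water extent."""
    extent = ~water
    n = extent.sum()
    changeable = extent & ~ref_2005            # non-urban in 2005, so change was possible
    obs = ref_2015 & changeable                # observed urban expansion
    sim = sim_2015 & changeable                # simulated urban expansion

    def pct(mask):
        return 100.0 * mask.sum() / n

    return {
        "hit": pct(obs & sim),
        "miss": pct(obs & ~sim),
        "false_alarm": pct(~obs & sim),
        "correct_rejection": pct(~obs & ~sim & changeable),
    }

def overall_accuracy_kappa(ref, sim, water):
    """Cell-by-cell overall accuracy and Cohen's kappa for binary urban maps (water excluded)."""
    r, s = ref[~water], sim[~water]
    po = np.mean(r == s)
    pe = r.mean() * s.mean() + (1 - r.mean()) * (1 - s.mean())
    return po, (po - pe) / (1 - pe)

ref_2005 = np.array([[1, 0, 0], [0, 0, 0]], dtype=bool)
ref_2015 = np.array([[1, 1, 1], [0, 0, 0]], dtype=bool)
sim_2015 = np.array([[1, 1, 0], [1, 0, 0]], dtype=bool)
water = np.array([[0, 0, 0], [0, 0, 1]], dtype=bool)
comps = change_components(ref_2005, ref_2015, sim_2015, water)  # each component is 20% here
po, kappa = overall_accuracy_kappa(ref_2015, sim_2015, water)   # 0.6 and about 0.167
```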

Landscape metrics

To assess the agreement between simulated and actual landscape patterns, researchers have applied metrics that quantify landscape composition and structure (Chaudhuri and Clarke 2013; Feng and Tong 2017b; Whitsed and Smallbone 2017). These metrics describe spatial characteristics at the landscape, class, and patch levels (Feng et al. 2018; McGarigal 2014). Among commonly used landscape metrics, area-edge metrics reflect the size and edges of patches, shape metrics reflect patch complexity, and aggregation metrics describe the spatial aggregation of urban patches (McGarigal 2014). In this study, we calculated ten class-level metrics to evaluate urban land-use patterns following Niesterowicz and Stepinski (2016):

  • Percentage of Landscape (PLAND), Largest Patch Index (LPI), and Total Edge (TE) from the area-edge category

  • Perimeter-Area Fractal Dimension (PAFRAC) from the shape category

  • Patch Density (PD), Interspersion and Juxtaposition Index (IJI), Patch Cohesion Index (COHESION), Landscape Division Index (DIVISION), Aggregation Index (AI), and Splitting Index (SPLIT) from the aggregation category
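Four of the area-edge and aggregation metrics listed above reduce to patch labeling and counting. The sketch below computes PLAND, LPI, PD, and TE for the urban class from a boolean raster. It assumes FRAGSTATS-style definitions with an 8-neighbor patch rule and excludes the landscape boundary from TE; it is not FRAGSTATS itself, and the remaining metrics (PAFRAC, IJI, COHESION, DIVISION, AI, SPLIT) are not covered.

```python
import numpy as np
from collections import deque

def label_patches(mask):
    """Label 8-connected patches of True cells; returns (labels, number_of_patches)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and labels[i, j] == 0:
                current += 1
                labels[i, j] = current
                queue = deque([(i, j)])
                while queue:  # breadth-first flood fill of one patch
                    r, c = queue.popleft()
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            rr, cc = r + dr, c + dc
                            if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and labels[rr, cc] == 0:
                                labels[rr, cc] = current
                                queue.append((rr, cc))
    return labels, current

def class_metrics(urban, cell_size=30.0):
    """PLAND (%), LPI (%), PD (patches per 100 ha) and TE (m) for the urban class."""
    labels, n_patches = label_patches(urban)
    total_area = urban.size * cell_size ** 2                   # m^2, whole landscape
    sizes = np.bincount(labels.ravel())[1:] * cell_size ** 2   # patch areas, m^2
    pland = 100.0 * urban.sum() * cell_size ** 2 / total_area
    lpi = 100.0 * sizes.max() / total_area if n_patches else 0.0
    pd = n_patches / total_area * 1e6                          # 1e6 m^2 = 100 ha
    u = urban.astype(int)                                      # count urban/non-urban cell faces
    edges = np.abs(np.diff(u, axis=0)).sum() + np.abs(np.diff(u, axis=1)).sum()
    return {"PLAND": pland, "LPI": lpi, "PD": pd, "TE": edges * cell_size}

grid = np.zeros((4, 4), dtype=bool)
grid[0:2, 0:2] = True   # a 4-cell patch
grid[3, 3] = True       # a separate 1-cell patch
m = class_metrics(grid)
```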

Results

Factor effects

A null model with no factors (Model-0) and six IFB-DE-CA models (models 1–6) were constructed to evaluate each factor's impact on urban expansion (Table 2). The GAM fitting statistics show that the rank order of the factors is DEM, POP, CITY, RAIL, ROAD, and COUN (Table 2). Terrain (DEM) and population density (POP) have the greatest influence, with ~12.8% deviance explained, while COUN explains the least (~2.4%). Models 1–2 yield the largest PDE and smallest AIC, indicating good performance. In contrast, models 3–6 yield smaller PDE and larger AIC, implying that they were built on less influential factors.

Table 2 The PDE, AIC, and rank order that show factor contribution for each IFB-DE-CA model

For comparison, fifteen MFB-DE-CA models were constructed to assess the influence of multiple factors on urban expansion (Table 3). Among the two-factor models, model 7 (DEM and POP) explained ~15% of the deviance, while the other four explained ~13%. Among the three-factor models, model 12 explained more than 16% of the deviance, while the others explained less than 16%. The four- and five-factor models explained similar deviance. For each number of factors, we selected the model that explained the most deviance, together with model 21, which includes all factors, to simulate the 2005–2015 urban expansion in Hangzhou. Deviance explained increased and AIC decreased as more factors were added, indicating better fit when multicollinearity is not considered. Although this suggests that more factors lead to more accurate simulations, the expectation may not hold in simulation practice.

Table 3 The PDE and AIC in GAMs that show factor contribution for the MFB-DE-CA models

Transition rules and land conversion probability maps

Assigning the controlling parameters is critical when solving the objective function with DE, because heuristics are sensitive to them. Following Feng and Tong (2018a), we used the default values (Table 4) recommended in the R package "DEoptim". Among these parameters, the population size was set to 20 times the number of parameters, i.e., the number of variables plus the intercept (a0). The lower and upper bounds of the DE search were derived from the parameters estimated by LR: for a positive LR parameter, the lower bound is 0 and the upper bound is twice the LR estimate; for a negative LR parameter, the upper bound is 0 and the lower bound is twice the LR estimate (Table 5).
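The bound rule and population size above amount to simple arithmetic on the LR estimates. A minimal sketch; applying the rule to the intercept as well is an assumption, and an LR estimate of exactly zero would give a degenerate [0, 0] interval. The LR values are hypothetical.

```python
def de_bounds_from_lr(lr_params):
    """Bounds as described in the text: [0, 2a] for a positive LR estimate a, [2a, 0] for a negative one."""
    lower, upper = [], []
    for a in lr_params:
        if a >= 0:
            lower.append(0.0)
            upper.append(2.0 * a)
        else:
            lower.append(2.0 * a)
            upper.append(0.0)
    return lower, upper

lr_params = [1.5, -3.0, 0.25]          # hypothetical LR estimates: a0, a1, a2
lower, upper = de_bounds_from_lr(lr_params)
pop_size = 20 * len(lr_params)         # 20 x (number of variables + intercept)
```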

Table 4 Definition of controlling parameters in DE for solving the objective function
Table 5 The lower and upper bounds for DE and the computed CA parameters of all the selected models

Table 5 shows that the same factor may yield different parameters, even with different signs, in different models, suggesting that factor effects change across models. For all factors except POP, a negative parameter indicates a promoting effect on urban expansion (conversion probability rises as the factor value falls, e.g., closer to a city center or at lower elevation), while a positive parameter indicates a resisting effect. DEM has negative parameters with the largest absolute values in the models, suggesting that it strongly drives urban expansion; this agrees with the rank order and deviance explained for DEM (Table 3). CITY's parameters are negative in models 3, 19, and 21, with no major differences among them, indicating similar promoting effects. ROAD yields negative parameters in models 5 and 21, and the larger absolute value in model 5 indicates a stronger influence.

Some factors have different impacts in different models. POP is positive in model 2 but negative in the five MFB-DE-CA models, suggesting that POP promotes urban expansion in model 2 but inhibits it in the others. COUN is negative in model 6 but positive in the other MFB-DE-CA models except model 7, suggesting that COUN has a weaker influence when combined with other factors. RAIL is negative, with an attractive effect, in models 4 and 16, whereas it is positive, with a repulsive effect, in models 19 and 21.

The conversion probabilities derived from the IFB-DE-CA models were stretched to range between 0 and 1, and used as input into the DE-CA models to simulate urban expansion. The maps in Fig. 5 show different spatial patterns that are characterized by their corresponding factors. In the DEM-based model 1 (Fig. 5a), conversion probability is low in the high-elevation areas but high in the low-lying areas. Model 2 yields high conversion probability in the city center with high population density, while it yields low conversion probability in other areas with low population density (Fig. 5b). In models 3–6 (Fig. 5c–f), high conversion probabilities occur principally in areas near the Hangzhou city center, railways, main roads, and county centers, respectively.

Fig. 5 Land conversion probability maps produced by all six IFB-DE-CA models

The multi-factor conversion probability maps in Fig. 6a–e are similar to each other, with minor differences, and all resemble the DEM-based map (Fig. 5a), probably because of the dominant effect of elevation and topography. Visual inspection of the difference maps reveals different spatial patterns between model 7 and models 12 and 16, but the differences are small and probably insignificant (Fig. 6f, g). Although the maps of model 7 and models 19 and 21 look similar, their cell-level probability differences (Fig. 6h, i) range from 0.00 to 0.32, suggesting that model 7 may differ substantially from models 19 and 21. All models share the same minimum and maximum because of normalization, but most means and standard deviations differ between models (Table 6), leading to different land conversion thresholds.

Fig. 6 Land conversion probability maps and their differences for the MFB-DE-CA models

Table 6 Summary statistics of land conversion probability from different models

Simulation results

We simulated 11 urban patterns for 2015 using the DE-CA models but present only nine in Fig. 7, because models 12 and 16 are very similar, as are models 19 and 21. Figure 7 shows that the selected DE-CA models can generate 2015 urban patterns in which newly built-up areas occur around the existing built-up areas.

Fig. 7 Simulated 2015 Hangzhou urban patterns from DE-CA models with different factors

Models 1, 2, 7, 12, and 19 (Fig. 7a, b, g–i) yielded quite similar outcomes with minor differences. All models simulated relatively few changes in the outer suburb of Lin’an (region 3). Compared with the other models, model 2 simulated fewer changes in regions 1 and 2 (Fig. 7b), while model 19 simulated the fewest changes in region 1 but more changes in region 5 (Fig. 7i). Models 3 and 4 produced similar spatial patterns, both simulating fewer changes in the far suburbs of Lin’an and Fuyang (Fig. 7c, d), with the simulated urban expansion concentrated around the Hangzhou city center. However, model 3 allocated new urban areas along the east–west direction, while model 4 allocated them along the north–south direction. The simulated changes in model 5 occur mainly along the road network (Fig. 7e), confirming the significant influence of major roads on urban expansion. Model 6 captured most of the urban expansion in the Hangzhou city center and two of its distant suburbs (Lin’an and Fuyang), but it missed many other urban areas in region 2 (Fig. 7f).

Accuracy analysis and model response

The overall accuracy of all the DE-CA models varies with the iteration number (Fig. 8). Models 1, 2, and 7 reached their greatest accuracy (~91%) at the 10th iteration (Fig. 8a), models 3–6 reached theirs (~89%) at the 6th iteration (Fig. 8a), and the remaining four models reached theirs (~90%) at the 8th iteration (Fig. 8b). The overall accuracy of models 1, 2, and 7 increased monotonically during iteration, and their simulated quantity of urban expansion matched the total land available for development at the 10th iteration. The overall accuracy of models 3–6 first increased with iteration and peaked at the 6th iteration; their new urban areas reached the total land available for development at the 10th iteration, when overall accuracy was lower than at the 6th. Like models 3–6, models 12, 16, 19, and 21 peaked before the final iteration, at the 8th. These results show that DE-CA models with different driving factors follow different simulation processes and produce different outcomes.

Fig. 8 Overall accuracy versus iteration number for 2015 urban expansion simulations

To show the spatial patterns of accurate and erroneous simulations, we overlaid the 2005 reference map, the 2015 reference map, and the 2015 simulated map (Fig. 9). The overlay maps have five categories: urban agreement, non-urban agreement, miss, false alarm, and water body; water bodies were excluded from the accuracy assessment. The simulations of models 1, 2, 7, 12, and 19 had fewer misses and false alarms (Fig. 9a, b, g–i), with errors occurring principally in eastern and northern Hangzhou. In contrast, models 3–6 yielded more misses in eastern Hangzhou, and most of their false alarms occurred around the existing built-up area (Fig. 9c). Model 4 yielded more false expansion in southern and northern Hangzhou (Fig. 9d), while model 5 yielded more false expansion along the road network (Fig. 9e). For model 6, false alarms occurred in the Hangzhou city center and northern Hangzhou, as well as in two satellite cities (Lin’an and Fuyang).

Fig. 9 Overlay maps showing 2015 simulation hits, misses, and false alarms

As each simulation progressed, hits increased monotonically, misses decreased, and false alarms increased (Fig. 10), indicating that capturing more actual urban expansion also introduced more simulation errors. Quantitatively, models 1 and 2 had 5.7% hits for 2005–2015 urban expansion, while models 3–6 had 0.7 percentage points fewer. Models 1 and 2 had 4.9% misses and 4.4% false alarms, whereas models 3–6 produced the largest misses (5.6%) and false alarms (5.5%). Models 7, 12, and 16 had more than 5.7% hits, with no quantitative differences among them. The hits of models 19 and 21 (~5.4%) are slightly smaller than those of models 7, 12, and 16. Models 7 and 12 had less than 5% misses, while models 16, 19, and 21 had more than 5%. Models 7 and 16 yielded similar false alarms (~4.4%), while models 12, 19, and 21 yielded slightly more. Overall, IFB-DE-CA models 1 and 2 and all MFB-DE-CA models outperform IFB-DE-CA models 3–6 on all metrics (hits, misses, and false alarms), suggesting that the latter four models are less suitable for modeling urban expansion in Hangzhou.

Fig. 10

Hit, miss, and false alarm percentages as a function of iteration number in simulating 2015 urban patterns
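Hits, misses, and false alarms compare observed change (2005 reference to 2015 reference) with simulated change (2005 reference to 2015 simulation). The sketch below is one plausible way to compute them as percentages of the non-water study area; the function name and the choice of denominator are assumptions, not taken from the paper.

```python
import numpy as np


def three_map_percentages(ref_t0, ref_t1, sim_t1, water):
    """Hits, misses, and false alarms as % of the non-water area.

    ref_t0, ref_t1: Boolean reference rasters (True = urban) at start and end.
    sim_t1: Boolean simulated raster (True = urban) at the end date.
    water: Boolean raster, True for water cells, which are excluded.
    """
    ref_t0, ref_t1, sim_t1, water = (
        np.asarray(a, dtype=bool) for a in (ref_t0, ref_t1, sim_t1, water)
    )
    valid = ~water
    observed_change = ~ref_t0 & ref_t1 & valid    # non-urban -> urban in reality
    simulated_change = ~ref_t0 & sim_t1 & valid   # non-urban -> urban in the model
    n = valid.sum()
    hits = (observed_change & simulated_change).sum()
    misses = (observed_change & ~simulated_change).sum()
    false_alarms = (~observed_change & simulated_change).sum()
    return {
        "hit": float(100.0 * hits / n),
        "miss": float(100.0 * misses / n),
        "false_alarm": float(100.0 * false_alarms / n),
    }
```

Because hits and misses partition the observed change, their sum is the same for every model at the final iteration. The reported values agree with this: 5.7% + 4.9% = 10.6% for models 1 and 2, and (5.7% − 0.7%) + 5.6% = 10.6% for models 3–6.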

Predicted 2030 scenarios

To evaluate the effect of each factor on urban scenarios, we predicted Hangzhou’s urban scenarios for 2030 using the IFB-DE-CA models, because each includes only one factor. We linearly extrapolated the 2005–2015 annual growth rate to determine the amount of future urban expansion. Hangzhou’s urban area expanded by approximately 488 km² from 2005 to 2015, yielding a projected expansion of ~ 732 km² from 2015 to 2030 and a total urban area of ~ 2125 km² by 2030.
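The quantity projection is simple arithmetic: the mean annual growth over the calibration decade is held constant over the 15-year horizon. The helper name below is hypothetical; the inputs are the figures reported in the text.

```python
def projected_expansion(past_growth_km2: float, past_years: float,
                        future_years: float) -> float:
    """Linear extrapolation: hold the past mean annual growth constant.

    The mean annual growth is past_growth_km2 / past_years
    (488 / 10 = 48.8 km2 per year for Hangzhou, 2005-2015).
    Multiplying before dividing keeps the result exact for these inputs.
    """
    return past_growth_km2 * future_years / past_years


added = projected_expansion(488, 10, 15)   # 2015-2030 expansion, km2
print(added)                               # 732.0
```

Adding this increment to the 2015 urban area gives the 2030 total; the reported ~ 2125 km² therefore implies a 2015 urban area of roughly 2125 − 732 ≈ 1393 km².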

Figure 11 shows distinct differences in spatial pattern among the scenarios, reflecting the effect of each factor on future urban expansion when it is applied in a DE-CA model. The model 1 scenario shows that newly built-up areas will occur mainly in the flat, low-slope areas of eastern and northern Hangzhou (Fig. 11a). The model 2 scenario shows that urban expansion will occur primarily in densely populated areas, with less expansion in areas of low population density (Fig. 11b). Models 3 and 4 show less expansion in eastern Hangzhou and in the outer suburbs of Lin’an and Fuyang (Fig. 11c, d). Model 3 predicts greater urban expansion than model 4 in southern and northwestern Hangzhou, whereas model 4 predicts much more expansion in Xihu District. Model 5 suggests that new urban areas will form along existing major roads (Fig. 11e). Model 6 (Fig. 11f) projects that, over the coming 15 years, expansion will occur in the fringe areas of both the Hangzhou city center and the satellite cities of Lin’an and Fuyang.

Fig. 11

Urban scenarios predicted for 2030 using the six IFB-DE-CA models

To examine differences among the predictions, we used ten class-level landscape metrics to characterize the spatial patterns of the scenarios (Table 7). PLAND, the percentage of urban land in 2030, differs by only ~ 0.7% among the models, indicating a similar ability to control the quantity of urban land. Models 1–2 have greater LPI values than the other models, suggesting that by 2030 these two models fill in more of the non-urban areas enclosed by the largest 2015 urban patch. This is consistent with their relatively smaller TE and PD values. PAFRAC suggests that the model 2 scenario yields a relatively less complex urban pattern than the other scenarios. As indicated by IJI, the urban patches of the model 3 scenario show greater interspersion and juxtaposition, and those of the model 5 scenario show less, than those of the other models. This indicates that the CITY factor draws more new urban cells toward the Hangzhou city center (Fig. 11c), while the ROAD factor attracts more urban cells along road networks (Fig. 11e). All scenarios have similarly large values of the COHESION, DIVISION, and AI metrics, demonstrating high connectivity among urban patches and the aggregation effect of the DE-CA models in predicting future urban expansion. Models 5–6 show the highest SPLIT values, suggesting that their urban landscapes are more fragmented than those of the other four scenarios. Because all DE-CA models were calibrated by DE using the same samples, the differences among the scenarios can be attributed to the effects of the different driving factors.

Table 7 Landscape metrics of predicted 2030 urban land-use
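Several of the class-level metrics in Table 7 reduce to simple functions of patch areas. The sketch below computes five of them (PLAND, LPI, PD, DIVISION, SPLIT) for a binary urban raster, following the FRAGSTATS definitions as commonly documented. The 8-neighbor patch rule, the 30 m cell size, and the use of the whole raster as landscape area are assumptions for illustration; FRAGSTATS itself (or `scipy.ndimage.label` for patch labeling) would normally be used in practice.

```python
from collections import deque

import numpy as np


def patch_areas(urban, cell_area):
    """Areas of urban patches under the 8-neighbor rule."""
    urban = np.asarray(urban, dtype=bool)
    rows, cols = urban.shape
    seen = np.zeros_like(urban)
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not urban[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill of one patch.
            queue, count = deque([(r, c)]), 0
            seen[r, c] = True
            while queue:
                i, j = queue.popleft()
                count += 1
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and urban[ni, nj] and not seen[ni, nj]):
                            seen[ni, nj] = True
                            queue.append((ni, nj))
            areas.append(count * cell_area)
    return np.array(areas, dtype=float)


def class_metrics(urban, cell_size_m=30.0):
    """PLAND, LPI, PD, DIVISION, and SPLIT for the urban class."""
    urban = np.asarray(urban, dtype=bool)
    cell_area = cell_size_m ** 2                 # m2 per cell
    total = urban.size * cell_area               # landscape area A, m2
    a = patch_areas(urban, cell_area)
    return {
        "PLAND": 100.0 * a.sum() / total,
        "LPI": 100.0 * a.max() / total if a.size else 0.0,
        "PD": a.size / total * 1e6,              # patches per 100 ha (1e6 m2)
        "DIVISION": 1.0 - np.sum((a / total) ** 2),
        "SPLIT": total ** 2 / np.sum(a ** 2) if a.size else float("inf"),
    }
```

On a 4 × 4 toy grid with one 4-cell patch and one isolated cell, this gives PLAND = 31.25 and LPI = 25.0; higher SPLIT and DIVISION both indicate that the urban class is broken into more, smaller patches.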

Discussion

We used a stepwise GAM to examine the effects of the driving factors and ranked them by statistical significance. Each factor was used to produce a land conversion probability map, from which a DE-CA model was built. Such individual-factor models have not been reported previously. We identified five combinations of land-use driving factors to generate conversion probability maps, which were in turn used to build five MFB-DE-CA models. All DE-CA models were then used to simulate Hangzhou’s rapid urban expansion during 2005–2015, yielding overall accuracies exceeding 89%. These accuracies are comparable to those of previous case studies in Hangzhou, which reported accuracies of 84–87% (Hou et al. 2019; Liu et al. 2018). To examine the impacts of individual factors on future urban expansion, we applied all six IFB-DE-CA models to predict Hangzhou’s urban scenarios for 2030.
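The ranking step can be illustrated with forward stepwise selection by AIC. In the sketch below, a logistic GLM fitted by iteratively reweighted least squares stands in for the GAM, so each factor enters linearly rather than as a smooth term; this substitution, the function names, and the synthetic factors are all assumptions for illustration, not the authors' procedure.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by IRLS; returns the maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = np.clip(p * (1.0 - p), 1e-10, None)        # IRLS weights
        z = X @ beta + (y - p) / w                      # working response
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))


def forward_stepwise(factors, y):
    """Rank factors by the order in which they lower AIC.

    factors: dict mapping factor name -> 1-D array of sampled values.
    y: 0/1 array (1 = cell converted to urban).
    """
    n = len(y)
    selected, remaining = [], list(factors)
    design = np.ones((n, 1))                            # intercept-only start
    best_aic = -2 * fit_logistic(design, y) + 2 * design.shape[1]
    while remaining:
        trials = {}
        for name in remaining:
            X = np.column_stack([design, factors[name]])
            trials[name] = -2 * fit_logistic(X, y) + 2 * X.shape[1]
        name = min(trials, key=trials.get)
        if trials[name] >= best_aic:                    # no candidate improves AIC
            break
        best_aic = trials[name]
        design = np.column_stack([design, factors[name]])
        selected.append(name)
        remaining.remove(name)
    return selected
```

With synthetic data in which a DEM-like factor has a stronger effect than a POP-like factor, the DEM-like factor is added first; in the actual study the GAM's fitting statistics play the role that AIC plays here.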

Our models show that a CA model can be built on an individual factor and still generate defensible simulation results. Among the factors, DEM and POP had the greatest impact on urban expansion in Hangzhou from 2005 to 2015, and the models built on these two factors (models 1 and 2) produced more accurate simulations than those built on the other individual factors. While earlier work suggests including more significant driving factors in urban CA models (Engelen 2002; Wahyudi and Liu 2013), our findings show that MFB-DE-CA models are not necessarily superior to IFB-DE-CA models. The IFB-DE-CA and MFB-DE-CA models differ in both simulation accuracy and spatial pattern. IFB-DE-CA models 1–2 performed better than MFB-DE-CA models 19 and 21, which in turn outperformed IFB-DE-CA models 3–6 by ~ 1%. This indicates that a CA model using only the most influential factor (e.g., DEM) performs better than one using only a less significant factor (e.g., RAIL), and may perform better than models that include both the most influential factor and other less significant factors (e.g., DEM-POP-COUN-RAIL-CITY). These findings are consistent with Feng and Tong (2017b), who showed that model performance can decline when too many factors are included.

CA models are built on the land conversion probability defined by the driving factors, combined with the effects of other model elements. How these elements are defined and processed may substantially affect the simulation results (Poelmans and Van Rompaey 2010; Sang et al. 2011). Relevant elements include neighborhood configuration, constraints, spatial resolution, sampling methods, and particularly the algorithms used to define land conversion probability (Feng and Tong 2017a). In this study, all DE-CA models used the same settings except for the factors included in the transition rules, so the differences among the models can be attributed mostly to the different impacts of the driving factors.
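To make the interplay of these elements concrete, the sketch below shows one synchronous update of a generic constrained CA in which a factor-based conversion probability is multiplied by a Moore-neighborhood density term and a binary constraint, and cells convert when the product exceeds a threshold. This multiplicative form, the 3 × 3 neighborhood, and the threshold value are hypothetical illustrations; the DE-CA transition rule and its DE-calibrated parameters are not reproduced here.

```python
import numpy as np


def ca_step(urban, p_factor, constraint, threshold=0.5):
    """One synchronous update of a hypothetical constrained CA.

    urban: Boolean raster of current urban cells.
    p_factor: conversion probability derived from the driving factor(s).
    constraint: 1 where development is allowed, 0 where it is excluded.
    """
    urban = np.asarray(urban, dtype=bool)
    padded = np.pad(urban.astype(float), 1)     # zero padding at the edges
    rows, cols = urban.shape
    # Count urban cells among the 8 Moore neighbors of every cell.
    neigh = sum(
        padded[1 + di: 1 + di + rows, 1 + dj: 1 + dj + cols]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    omega = neigh / 8.0                          # neighborhood density in [0, 1]
    p = np.asarray(p_factor) * omega * np.asarray(constraint)
    return urban | (p > threshold)               # urban cells never revert
```

Holding the neighborhood, constraint, and threshold fixed while swapping only `p_factor` mirrors the design described above, in which the transition-rule factors are the only element that varies between models.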

Driving factors are approximations of real geographic and socioeconomic elements, and representing these elements as spatial data may lose some spatial detail (Goodchild et al. 1992). Different categories of factors have different effects on urban expansion, and even factors within the same category may affect urban expansion differently because they represent different geographical aspects. When more driving factors are included in DE-CA models, the interactions among them do not necessarily improve the simulation results and may even reduce simulation accuracy. Adding driving factors is likely to introduce multicollinearity (Feng and Tong 2017b) and more geographical data errors, which propagate through data processing and model implementation and degrade model performance.
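One standard way to screen candidate factors for multicollinearity before building a multi-factor model is the variance inflation factor, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing factor j on all other factors. The paper does not state that VIF was used; the sketch below is an illustrative diagnostic only, and the commonly cited cut-offs of 5 or 10 are rules of thumb rather than fixed thresholds.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n samples x k factors)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress factor j on an intercept plus all other factors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2) if r2 < 1.0 else np.inf
    return out
```

Two nearly redundant factors (for example, two distance-to-infrastructure layers that track each other closely) would both show large VIFs, flagging that only one of them may be worth keeping.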

Selecting appropriate driving factors is crucial in CA modeling, because accounting for the most influential factors can produce more realistic simulation results. GAM is well suited to quantifying the impacts of factors through its fitting statistics and to selecting an appropriate combination of factors. Our study illustrates how the most influential driving factors can be selected from a set of candidates for calibrating CA models. Future urban scenarios from the various IFB-DE-CA models differ because each scenario emphasizes the impact of a different factor. For a comprehensive prediction of urban expansion, IFB-DE-CA models may not be the best choice, because cities are complex systems affected by numerous factors. However, IFB-DE-CA models can inform urban modelers and policy-makers about potential urban scenarios when each factor is considered separately. These scenario predictions can serve as early warnings of the consequences of different urban development schemes and can help policy-makers adjust plans to avoid unfavorable future urban scenarios.

Conclusions

We used a stepwise GAM to rank the candidate driving factors and identified 21 combinations of these factors, which were used to calibrate 21 DE-CA models. Of these, six IFB-DE-CA models and five MFB-DE-CA models were applied to simulate urban expansion in Hangzhou from 2005 to 2015, and all of them produced defensible 2015 simulations with overall accuracies exceeding 89%. Finally, we applied all six IFB-DE-CA models to project urban scenarios for 2030.

We conclude that (1) a CA model can be constructed using only one driving factor, (2) IFB-DE-CA models may outperform MFB-DE-CA models in simulating present urban patterns, and (3) the two types of models differ mainly in their projections of future urban scenarios. Each IFB-DE-CA model produces a distinct future scenario shaped by its factor, informing modelers and policy-makers about how cities would develop if that factor were applied alone. This improves our understanding of the effects of driving factors on urban dynamics and of their impacts on CA models when incorporated separately. In contrast, MFB-DE-CA models account for several factors simultaneously and allocate new urban cells where the combined effect of all factors exceeds the threshold. Because cities are complex systems, we suggest using GAM to examine the significance and impact of each candidate factor before deciding whether to include it in the model.