Introduction

Land use change is a complex process that is affected by human activities and natural environmental changes (Arsanjani et al. 2011; Etemadi et al. 2018; Liu et al. 2017b; Memarian et al. 2012; Meyer and Turner 1994; Wang et al. 2019a; Watson 2000; Yang et al. 2014). Since China’s reform and opening policies were implemented in 1978, the country has experienced rapid land use change. Rapid industrialization and urbanization have resulted in significant environmental impacts, such as water and air pollution (Lin and Zhu 2018; Peng et al. 2016; Shao et al. 2006; Wang et al. 2019b). This is especially true in cities such as Shenyang in northern China, which is a key internal trading center and has a reputation as an industry leader in China (Geng et al. 2013). Therefore, a study of land use change in this area is urgently needed to facilitate environmental management.

Modeling is an important technique for studying land use dynamics (Lambin 1997; Mustafa et al. 2018; Omrani et al. 2017; Zheng et al. 2015). Rapid advances in geospatial models have made it increasingly possible to simulate land use change. Different geospatial models, such as CLUE, GEOMOD, cellular automata (CA), Markov chain model, and SLEUTH, have been used to assess land use change (Arsanjani et al. 2013; Dickinson and Henderson-Sellers 1988; Dietzel and Clarke 2006; Hagenauer and Helbich 2018; Veldkamp and Fresco 1996; Verburg et al. 2002). CA and Markov chain models are the most common approaches used for simulating land use dynamics. It is difficult to simulate the spatial pattern of land use change using Markov chain models (Ye and Bai 2008). However, CA models with powerful spatial computing can be used to predict spatial variations in land use (Sang et al. 2011). Therefore, it is expected that the integration of CA and Markov chain models may have potential for projecting land use change.

Worldwide use of CA–Markov models for predicting land use change has significantly increased in recent years (Al-sharif and Pradhan 2014; Arsanjani et al. 2011; Chen et al. 2013; Fu et al. 2018; Kamusoko et al. 2009; Luo et al. 2015; Mishra and Rai 2016; Mitsova et al. 2011; Naboureh et al. 2017; Vaz et al. 2012; Wang et al. 2019c; Yang et al. 2014). With continued rapid urbanization, we expect the use of CA–Markov models for predicting land use change to increase further. According to the method used to generate probability surfaces of the driving factors, studies on land use change modeling can be divided into two categories: multicriteria evaluation (MCE) and logistic regression. MCE is a common approach to generate probability surfaces of driving factors (Fu et al. 2018; Kamusoko et al. 2009; Ku 2016; Vaz et al. 2012; Zhou et al. 2012). For example, Kamusoko et al. (2009) simulated future land cover changes (up to 2030) based on an MCE-CA–Markov model to assess rural sustainability in Zimbabwe. The results indicated that if the current land cover trends continued without holistic sustainable development measures, severe land degradation would occur. Zhou et al. (2012) successfully assessed regional land salinization resulting from biophysical and human-induced influences using an MCE-CA–Markov model in the Yinchuan Plain in northwest China in 2009. Logistic regression is also frequently used to generate probability surfaces of driving variables (Arsanjani et al. 2013; Hamdy et al. 2016; Islam et al. 2018; Liu et al. 2015; Siddiqui et al. 2017; Sun et al. 2018; Wang et al. 2019c). For example, Arsanjani et al. (2013) analyzed suburban expansion in the metropolitan area of Tehran in Iran using a logistic regression–CA–Markov model. Siddiqui et al. (2017) simulated urban growth dynamics of an Indian metropolitan area using a logistic regression–CA–Markov model. Compared with the MCE method, logistic regression is an easier and more efficient method for generating suitability maps (Arsanjani et al. 2013; Fu et al. 2018; Islam et al. 2018; Liu et al. 2015; Siddiqui et al. 2017; Sun et al. 2018; Wang et al. 2019c); therefore, this method has become increasingly popular for land use change simulations worldwide.

In this research, we used a model that combines logistic regression, CA, and Markov chain analysis to evaluate the changes in eight land use types in southern Shenyang in northern China under various driving forces including environmental and socio-economic factors. This area is experiencing rapid urbanization and environmental problems are increasing, such as haze events. The goal of this study was to evaluate the potential of using a logistic regression–CA–Markov model for estimating land use changes in a cool temperate region and simulate future land use changes in order to provide reference data for environmental management.

Materials and methods

Study area

The study area is located in central Liaoning province (122°41′~123°80′E, 41°20′~42°29′N) in northern China (Fig. 1) and covers an area of approximately 8.42 × 10^5 ha. There are four distinct seasons; spring and autumn are mild, summer is hot (25~31 °C), and winter is cold (−25~−30 °C). The average annual precipitation is about 646 mm. There are 12 districts and the population is nearly 6.9 million. Cultivated land is the predominant land use and accounted for 69.1% of the study area in 2010. This area has experienced rapid urbanization and land use change. The built-up land increased from 12.6% (2000) to 17.4% (2010). The economy of this area is based on heavy industry and the area is one of China’s largest industrial centers. It is home to an extensive industrial system including electronics, textiles, chemicals, metallurgy, and food industries (Geng et al. 2013).

Fig. 1 The geographic location of the study area (red area)

Data and data processing

The main data sources were digital land use maps from 2000, 2005, and 2010. The digital map data for the study area were classified into eight classes: residential land, urban traffic, mining lease, industrial land, cultivated land, forest land, undisturbed (desert and bare land), and water. This study focuses not only on assessing urban expansion but also on detecting changes in industrial land, urban traffic, and residential land. In addition, two categories of driving forces are expected to explain land use change, namely, (i) environmental factors including elevation, slope, aspect, and precipitation and (ii) socio-economic factors including population density, distance to roads, water, tourist attractions, and towns, as well as gross domestic product (GDP). Elevation, slope, and aspect were derived from a 30-m digital elevation model (DEM) obtained from the U.S. NASA website (http://reverb.echo.nasa.gov/reverb/). The other data used in this study, such as the road, water, and town layers, were obtained from the statistical yearbook and the local land department.

Modeling approach

In this study, we integrated a logistic regression model, CA, and Markov chain analysis to predict the expected land cover for 2010 and 2020 using IDRISI software. The CA–Markov model requires a land cover dataset to represent the initial state, a Markov transition matrix, a group of land use suitability images, a number of iterations, and a contiguity filter. Specifically, two land use maps at different time points were used to calculate the transition probabilities with a Markov chain model: (1) the matrix of transition probabilities between the 2000 and 2005 maps was used to predict land use in 2010, with 2005 as the starting point; (2) similarly, the 2000 and 2010 land use maps were used to calculate the transition probabilities to predict land use in 2020, with 2010 as the starting point. Suitability maps were created from the driving variables using logistic regression. Subsequently, the resulting probability surfaces of the dependent variables (different land types) were used to estimate the degree of change based on the CA–Markov model. A standard 5 × 5 contiguity filter was used as the neighborhood definition in the simulations, and 10 CA iterations were used to predict the spatial pattern in the study area. During each iteration, the pixels with the highest transition probability and highest suitability for a particular land type were changed to the new land type, whereas the pixels with lower probabilities and lower suitability remained unchanged. Spatial data with a 30-m resolution were used as the input to the model.

The overall prediction accuracy (OPA) and the kappa index (Rosenfield and Fitzpatrick-Lins 1986) were used to assess the model performance. The OPA is defined as follows:

$$ \mathrm{OPA}\left(\%\right)=\frac{\mathrm{Simulated}\ \mathrm{land}\ \mathrm{use}\cap \mathrm{Actual}\ \mathrm{land}\ \mathrm{use}}{\mathrm{Actual}\ \mathrm{land}\ \mathrm{use}}\times 100 $$
(1)

The OPA provides a measure of the similarity between the simulated land use and the actual land use and ranges from 0 to 100%. The closer the OPA is to 100%, the more accurate the model is.
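The two accuracy measures can be illustrated with a minimal Python sketch. It assumes that both maps have been flattened into equal-length lists of class labels and that the kappa index is the standard Cohen's kappa (observed agreement corrected for chance agreement); the function names and the toy maps are hypothetical and are not taken from the study.

```python
from collections import Counter


def overall_accuracy(simulated, actual):
    """OPA in percent: share of cells whose simulated class equals the actual class."""
    if len(simulated) != len(actual) or not actual:
        raise ValueError("maps must be non-empty and of equal length")
    matches = sum(s == a for s, a in zip(simulated, actual))
    return 100.0 * matches / len(actual)


def cohen_kappa(simulated, actual):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), assumed here to be the kappa index."""
    n = len(actual)
    p_o = overall_accuracy(simulated, actual) / 100.0
    sim_counts = Counter(simulated)
    act_counts = Counter(actual)
    # Chance agreement from the class proportions of the two maps
    p_e = sum(sim_counts[c] * act_counts[c] for c in act_counts) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy maps (CL: cultivated, RL: residential, FL: forest)
actual = ["CL", "CL", "RL", "RL", "FL", "CL", "RL", "CL"]
simulated = ["CL", "RL", "RL", "RL", "FL", "CL", "CL", "CL"]

opa = overall_accuracy(simulated, actual)
kappa = cohen_kappa(simulated, actual)
print(f"OPA = {opa:.1f}%, kappa = {kappa:.3f}")
```

Kappa is lower than the raw agreement because part of the agreement would be expected by chance when a few classes dominate the map.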

Logistic regression

Logistic regression is used when the dependent variable is binary (0 or 1) and the independent variables are continuous or categorical (Long 1997; McCullagh and Nelder 1989). The dependent variable in a logistic regression model represents the probability that a particular theme will be in one of the categories (Arsanjani et al. 2013). The basic assumption is that the probability that the dependent variable takes the value of 1 (positive response) follows the logistic curve, and its value can be estimated with the following formula:

$$ \mathrm{P}\left(y=1|X\right)=\frac{\exp \left(\sum BX\right)}{1+\exp \left(\sum BX\right)} $$
(2)

where P is the probability that the dependent variable equals 1; X represents the independent variables, X = (x0, x1, x2, ..., xk), with x0 = 1; and B represents the estimated parameters, B = (b0, b1, b2, ..., bk).

In order to linearize the model and remove the 0/1 boundaries of the original dependent variable (probability), the following transformation is commonly applied:

$$ {\mathrm{P}}^{\prime }=\ln \left(\mathrm{P}/\left(1-\mathrm{P}\right)\right) $$
(3)

This transformation is referred to as the logit or logistic transformation. Note that after the transformation, P′ can theoretically assume any value between minus and plus infinity (Lewicki and Hill 2006). By applying the logit transformation to both sides of the above logistic regression model, we obtain the standard linear regression model:

$$ \ln \left(\mathrm{P}/\left(1-\mathrm{P}\right)\right)={b}_0+{b}_1{x}_1+{b}_2{x}_2+\dots +{b}_k{x}_k+\mathrm{error}\ \mathrm{term} $$
(4)
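Equations 2 to 4 can be checked numerically with a short Python sketch. The coefficients and predictor values below are hypothetical and serve only to show that the logit transformation recovers the linear predictor; they are not estimates from this study.

```python
import math


def logistic_probability(coefficients, predictors):
    """Eq. 2: P(y = 1 | X), with x0 = 1 prepended for the intercept b0."""
    x = [1.0] + list(predictors)
    z = sum(b * xi for b, xi in zip(coefficients, x))
    # 1 / (1 + exp(-z)) is algebraically equal to exp(z) / (1 + exp(z))
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Eq. 3: maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))


# Hypothetical parameters b0, b1, b2 and one cell with two standardized predictors
b = [-2.0, 1.5, -0.5]
cell = [1.0, 1.0]

p = logistic_probability(b, cell)   # linear predictor is -2.0 + 1.5 - 0.5 = -1.0
print(f"P = {p:.4f}, logit(P) = {logit(p):.4f}")
```

Applying the function to every cell of the stacked predictor rasters would give a probability surface of the kind used as a suitability image.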

The main driving forces determining land use change were investigated in this study, and logistic regression was used to create probability surfaces to determine the sites that were most likely to be developed. The dependent variable is the area of change (e.g., change from non-residential land to residential land); it has a binary form in which a value of 1 indicates a change to residential land and a value of 0 indicates no change within a time period (Fig. 2). The independent variables have been described in “Data and data processing”. Figure 3 shows the spatial representation of the independent variables. The relative operating characteristic (ROC) method was used to validate the performance of the approach (Alsharif and Pradhan 2013; Pontius and Schneider 2001). ROC = 1 indicates a perfect fit and ROC = 0.5 indicates a random fit. In this study, stratified random sampling was chosen to eliminate spatial autocorrelation.
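The interpretation of the ROC statistic can be sketched in Python as follows. The sketch assumes that the ROC value is computed as the area under the ROC curve, which equals the probability that a randomly chosen changed cell receives a higher predicted probability than a randomly chosen unchanged cell; this pairwise formulation is a standard equivalent and is not necessarily the threshold-based procedure implemented in the software used in the study. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    changed = [s for s, y in zip(scores, labels) if y == 1]
    unchanged = [s for s, y in zip(scores, labels) if y == 0]
    if not changed or not unchanged:
        raise ValueError("both changed (1) and unchanged (0) cells are required")
    wins = 0.0
    for p in changed:
        for q in unchanged:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(changed) * len(unchanged))


# Hypothetical predicted probabilities for six sampled cells
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]   # 1 = changed to residential land, 0 = no change

auc = roc_auc(scores, labels)
print(f"ROC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cells, so it is only practical for a sample of cells such as the one drawn by stratified random sampling.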

Fig. 2 Dependent variable Y; change to residential land during 2000–2005 (left) and 2000–2010 (right) (no change: green cells; change to residential land: red cells)

Fig. 3 Spatial representation of the independent variables

CA–Markov model

The Markov chain model is a random process model that describes how likely it is that one state (t1) changes into another state (t2) (Houet and Hubert-Moy 2006). Based on the Bayes’ theorem of conditional probability, the land use change is calculated using the following formula (Sang et al. 2011):

$$ \mathrm{S}\left(t+1\right)={\mathrm{P}}_{ij}\times \mathrm{S}(t) $$
(5)

where S(t) and S(t + 1) are the system states at times t and t + 1, respectively. Pij is the state transition probability matrix and is calculated as follows:

$$ {\displaystyle \begin{array}{c}{\mathrm{P}}_{\mathrm{ij}}=\left[\begin{array}{cccc}{\mathrm{P}}_{11}& {\mathrm{P}}_{12}& \cdots & {\mathrm{P}}_{1\mathrm{n}}\\ {}{\mathrm{P}}_{21}& {\mathrm{P}}_{22}& \cdots & {\mathrm{P}}_{2\mathrm{n}}\\ {}\cdots & \cdots & \cdots & \cdots \\ {}{\mathrm{P}}_{\mathrm{n}1}& {\mathrm{P}}_{\mathrm{n}2}& \cdots & {\mathrm{P}}_{\mathrm{n}\mathrm{n}}\end{array}\right]\\ {}0\le {\mathrm{P}}_{ij}<1\ \mathrm{and}\ {\sum}_{j=1}^n{\mathrm{P}}_{ij}=1,\left(i,j=1,2,3\dots n\right)\end{array}} $$
(6)
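Equations 5 and 6 can be illustrated with a minimal Python sketch that estimates a transition probability matrix by cross-tabulating two maps and then projects the class totals one time step ahead. The sketch assumes that each row of the matrix refers to the class at the earlier date, so that every row sums to 1 and the projected total of class j is the sum over i of S_i(t) × P_ij. The toy maps, the three classes, and the rule that a class absent from the earlier map persists are hypothetical.

```python
def transition_matrix(map_t1, map_t2, classes):
    """Row-normalized cross-tabulation: entry [i][j] is P(class i at t1 -> class j at t2)."""
    index = {c: k for k, c in enumerate(classes)}
    n = len(classes)
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(map_t1, map_t2):
        counts[index[a]][index[b]] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a class absent at t1 is treated as fully persistent
            matrix.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def project(state, matrix):
    """Eq. 5: S_j(t + 1) = sum_i S_i(t) * P_ij."""
    n = len(state)
    return [sum(state[i] * matrix[i][j] for i in range(n)) for j in range(n)]


classes = ["CL", "RL", "FL"]   # cultivated, residential, forest
map_2000 = ["CL", "CL", "CL", "CL", "RL", "RL", "FL", "FL"]
map_2010 = ["CL", "CL", "CL", "RL", "RL", "RL", "FL", "CL"]

P = transition_matrix(map_2000, map_2010, classes)
state_2010 = [map_2010.count(c) for c in classes]   # cell counts per class
state_2020 = project(state_2010, P)                 # expected cell counts
print(P)
print(state_2020)
```

The projection gives only the expected quantity of each class, not where the changes occur, which is the limitation that the CA component addresses.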

However, in a Markov chain model, the spatial distribution of the land cover categories is unknown (Ye and Bai 2008). To address this problem, the CA–Markov model was developed to add a spatial dimension to the model using CA. CA has been widely used to simulate urban sprawl (Arsanjani et al. 2013; Dietzel and Clarke 2004; Guan et al. 2011; Torrens 2006; White and Engelen 1993). A CA model is an agent or object that has the ability to change its state based on the application of a rule that relates the new state to its previous state and those of its neighbors (Eastman 2009; Surabuddin Mondal et al. 2013). With its powerful spatial computing capability, a CA model can be used to predict spatial variation. It has been demonstrated that the integration of a Markov chain model and CA is effective for predicting land use changes (Behera et al. 2012).
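The role of the CA component can be sketched with one simplified allocation step in Python. This is an illustration under stated assumptions, not the algorithm implemented in IDRISI: the neighborhood is a plain 5 × 5 window in which all neighbors have equal weight, the suitability of a cell is multiplied by the share of its neighbors that already belong to the target class, and the number of cells to convert per iteration (`n_convert`) is assumed to come from the Markov transition areas divided by the number of iterations. All names and values are hypothetical.

```python
def neighbour_share(grid, row, col, target, radius=2):
    """Share of cells in the (2 * radius + 1)^2 window, centre excluded, that hold `target`."""
    hits = 0
    total = 0
    for r in range(max(0, row - radius), min(len(grid), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(grid[0]), col + radius + 1)):
            if (r, c) != (row, col):
                total += 1
                hits += grid[r][c] == target
    return hits / total if total else 0.0


def ca_step(grid, suitability, target, n_convert):
    """Convert the n_convert non-target cells with the highest neighbourhood-weighted suitability."""
    scored = []
    for r, line in enumerate(grid):
        for c, cls in enumerate(line):
            if cls != target:
                score = suitability[r][c] * neighbour_share(grid, r, c, target)
                scored.append((score, r, c))
    # Highest score first; row and column break ties deterministically
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    new_grid = [list(line) for line in grid]   # leave the input map untouched
    for score, r, c in scored[:n_convert]:
        if score > 0:
            new_grid[r][c] = target
    return new_grid


grid = [
    ["CL", "CL", "CL", "CL"],
    ["CL", "RL", "CL", "CL"],
    ["CL", "CL", "CL", "CL"],
]
suitability = [   # hypothetical suitability for residential land (RL)
    [0.1, 0.2, 0.9, 0.1],
    [0.3, 0.0, 0.4, 0.1],
    [0.1, 0.2, 0.1, 0.8],
]

result = ca_step(grid, suitability, "RL", n_convert=2)
for line in result:
    print(" ".join(line))
```

Repeating such a step for every land use class and for each of the iterations would spread the quantities given by the Markov model across the map, with cells near existing patches of the target class being favored.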

Results

Description of land use change during 2000–2010

The land use change from 2000 to 2010 was determined to quantify the extent and location of change. The changes are shown in Table 1 and Fig. 4.

Table 1 Area of land use changes during 2000–2010

Fig. 4 Land use maps of (a) 2000 and (b) 2010. Note: ML: mining lease, FL: forest land, IL: industrial land, CL: cultivated land, WT: water, UT: urban traffic, RL: residential land, and UD: undisturbed (desert and bare land)

As shown in Table 1, the main land use types in the area are cultivated land and residential land, which together accounted for 85.5% of the total area in 2010. Residential land (including urban residential land and rural settlements) increased by 37.4% from 2000 to 2010, whereas cultivated land showed a decreasing trend (−6.7%), reflecting the rapid decline in cultivated land resources. In the period 2000–2010, approximately 41,576.9 ha of cultivated land changed into other land types. Mining lease, industrial land, urban traffic, and undisturbed land also increased in this period, whereas forest land and water exhibited fewer changes. The most apparent trend is the expansion of built-up land including residential land (37.4%), urban traffic (29.3%), and industrial land (10.1%); most changes were located in the northeastern, eastern, and southeastern parts of the study area. Additionally, changes were observed in the main core of the metropolis (Fig. 4).

Quantification of land change and transition potential maps

The Markov chain model was used to analyze the land cover images from two time points and output a transition probability matrix, a transition area matrix, and a set of conditional probability images. The probability matrix shows the probability that each land cover category changes into every other land cover category (Table 2). For example, during 2000–2010, there was a 22.76% chance that cultivated land would transition into residential land and only a 1.98% chance that cultivated land would turn into forest land. During the same period, there was a 15.65% chance that forest land would transition into cultivated land. The transition area matrix lists the area that is expected to change from each land cover type to each other land cover type over the specified number of time units (Table 3). Figure 5 shows the Markov transition areas and suitability maps of the four major land cover classes in the study area in central Liaoning province in northern China. An average ROC value of 0.824 was obtained, which supports the validity of the logistic regression model for predicting the most probable areas of development. These results were used for the subsequent land use change predictions.

Table 2 Markov transition probability matrix
Table 3 Transition area matrix based on the Markov model for 2020 (ha)

Fig. 5 An illustrative example of the Markov transition probability areas and suitability maps of the four major land cover classes for 2010

Land use change prediction results

We compared the simulated land use map of 2010 derived from the CA–Markov model with the actual land use map of 2010. A match of 83.7% was achieved between the simulated and actual maps of 2010, which represents a satisfactory calibration. The kappa index was 0.86, indicating very good agreement between the simulated and observed land cover. Thus, the approach was used to predict the future land cover (i.e., in 2020). Figure 6b illustrates the spatial pattern of land cover in 2020.

Fig. 6 Land use map of (a) 2010 and (b) simulated land use map of 2020. Note: ML: mining lease, FL: forest land, IL: industrial land, CL: cultivated land, WT: water, UT: urban traffic areas, RL: residential land, and UD: undisturbed (desert and bare land)

Table 4 shows the areas of land use change in the period 2010–2020. Forest land and cultivated land exhibited significant reductions; for example, cultivated land decreased by 18.0%. Much of the cultivated land transitioned into other land types (Table 3). Approximately 104,673.7 ha of cultivated land changed into other land types in 2010–2020. There is an increasing trend in the built-up area including residential land (63.6%), urban traffic (54.4%), and industrial land (29.4%). The urban expansion is more extensive during 2010–2020 than during 2000–2010. There is a new wave of suburban development in the southwestern, western, and northwestern parts of the study area. Significant changes were also observed in the main core of the metropolis (Fig. 6).

Table 4 Area of land use change during 2010–2020

Discussion

In this study, logistic regression, a Markov chain model, and a CA model were integrated to simulate urban expansion. Logistic regression is one of the most frequently used approaches to determine the suitability of a particular land use type in land use change modeling (Verhagen 2007). Our results indicate that the logistic regression–CA–Markov model shows good predictive performance for land use changes in a cool temperate region, such as our study site in northern China. The results are similar to those of previous studies that used logistic regression–CA–Markov models to predict land use changes (Arsanjani et al. 2013; Memarian et al. 2012; Siddiqui et al. 2017; Sun et al. 2018; Wang et al. 2019c). Notably, different thematic resolutions (2-, 4-, 5-, 6-, 8-, and 10-class land use maps) were used to simulate land use changes in these studies. For example, Arsanjani et al. (2013) integrated logistic regression, Markov chain, and CA models to simulate urban expansion with five classes, and an agreement of 89.0% was observed between the simulated and actual maps. In our study, the logistic regression–CA–Markov model was used to predict land use change for eight classes, and the overall accuracy was 83.7%. Memarian et al. (2012) simulated land use/cover changes with 10 classes using a logistic regression–CA–Markov model, and a match of 89.9% was achieved between the predicted and actual maps. These results suggest that a higher thematic resolution does not necessarily result in a lower OPA. This further emphasizes the potential of the logistic regression–CA–Markov model for land use change prediction.

The model outputs indicate that our study area is facing unprecedented challenges due to rapid urbanization. Based on the latest city plans of the Shenyang government, the urbanization rate will reach 90% in 2030. This situation will further exacerbate the extensive environmental pollution currently existing in this area (Fu et al. 2011; Liu et al. 2017a; Wang et al. 2013b; Yue and Du 2010). For example, the built-up land (consisting of industrial land, urban traffic areas, and residential land) accounted for 12.67% of the total area in 2000 (Table 1) and is projected to increase significantly to 28.25% in 2020 (Table 4). As urbanization progresses, the area of impervious surfaces, such as pavement, rooftops, and compacted soil, increases (Choe et al. 2002; Ferreira et al. 2013); as a result, runoff volumes from terrestrial catchments increase (Arnold Jr. and Gibbons 1996; Coker et al. 2018; Desta et al. 2019; Shuster et al. 2005) and pollutants (heavy metals, nutrients, and organic compounds) increase in the water after precipitation events (Arnold Jr. and Gibbons 1996; Carey et al. 2013; Coker et al. 2018; Islam et al. 2015; Wu et al. 2016). In addition, there is a significant increase in industrial land areas (Tables 1 and 4). An increasing number of factories discharge sewage into waterbodies, resulting in large challenges for water quality management in the future. Moreover, as shown in Tables 1 and 4, the large increase in urban traffic areas and residential land will cause extensive and dangerous air pollution (Liu et al. 2017a; Xue et al. 2016), such as smog/haze, because many pollutants that affect human health (e.g., PM2.5, PM10, SO2, and NO2) are discharged from industry, car exhaust fumes (Moldovan et al. 1999), and coal-fired stoves in winter (Buhre et al. 2005; Xiao et al. 2015). For example, Liu et al. (2017a) found that SO2 emissions in core urban areas were significantly higher than those in the surrounding urban areas during the heating season in Shenyang.

Our results also suggest that forest land and cultivated land have been shrinking owing to the urban expansion and the current trend will continue (Tables 3 and 4). Therefore, it is urgent to strengthen the protection of forest land and cultivated land, prevent the indiscriminate use of these areas, and promote sustainable use. Protection of these lands will facilitate socially sustainable development in this region, where short-term economic benefits should be balanced with long-term environmental and economic sustainability.

Although the logistic regression–CA–Markov model is an effective technique for investigating land use change, several factors significantly affect the prediction uncertainties in land use modeling. First, previous studies have shown that the main factors affecting model prediction accuracy are the driving forces (Arsanjani et al. 2013; Memarian et al. 2012; Park et al. 2011). Thus, the choice of an optimal set of driving forces can improve model prediction accuracy. In our study, several environmental and socio-economic factors were considered. However, the limited number of driving factors may have resulted in errors in the estimation of land use changes. Hence, additional socio-economic factors, such as the distance to hospitals and the distance to markets, should be considered in land change modeling to improve model prediction accuracy. Second, land development policy has been inconsistent in recent years owing to the revitalization of the old industrial base in Northeast China (Wang et al. 2013a), which may also have caused uncertainty in the logistic regression–CA–Markov model predictions. For example, if a plot of land has been allocated for the purpose of developing a garden, the developer may instead choose to turn the garden into a factory. Finally, logistic regression techniques suffer from certain limitations, such as spatial autocorrelation of the independent variables (Arsanjani et al. 2013; Fotheringham et al. 2000; Hu and Lo 2007; Smith 1994), which may lead to errors in the suitability images (e.g., suitability images for predicting the land use map of 2020) and subsequently cause uncertainty in land use modeling. In summary, land use modeling is a complex process that is affected by natural factors, human-induced driving factors, and model limitations, all of which may result in prediction errors of land use changes; therefore, model outputs should be used with caution.

Conclusion

We used land use maps from different time points (2000, 2005, and 2010) and integrated logistic regression, CA, and a Markov model to simulate land use changes in Shenyang city in northern China. The results indicate that the hybrid model has good potential to simulate land use changes. However, many significant uncertainty factors, such as socio-economic driving forces, model limitations, and land development policy, affect the prediction accuracy of land use change and result in simulation errors of the land use dynamics. Consequently, in order to improve the prediction accuracy, these uncertainty factors should be thoroughly considered and addressed when predicting land use dynamics.

The model outputs showed that our study area is facing unprecedented challenges due to rapid urbanization, which is likely to exacerbate the significant environmental pollution currently existing in this area. This poses a great challenge for environmental management in the future, and adaptation measures (e.g., low impact development and use of clean energy) should be implemented. In addition, forest land and cultivated land have been shrinking due to urban expansion. Therefore, it is urgent to prevent the indiscriminate use of forest land and cultivated land and promote the sustainable use of land. The results of this study are expected to provide input for land use management and environmental protection in this area.