1 Introduction

Rapid urbanization over the past 50 years, driven by population growth and migration from rural to urban and suburban areas, presents one of the greatest challenges in environmental, economic, social, political, and cultural research (Antrop 2004; Tang et al. 2012; Tayyebi et al. 2012). In the United States, the urban population accounts for 82 % of the total, with an estimated annual growth rate of 1.2 % from 2010 to 2015 (US Census 2011). The motivation to model urban landscape dynamics arises from the need to examine where and to what extent landscape change has occurred and, furthermore, to understand how and why the changes occur (Weng 2002; Yang and Lo 2002). One of the greatest challenges in designing effective urban models is that their performance is often limited by inadequate digital data sources over time, as well as by limited consideration of external drivers such as socioeconomic development and human disturbance (Pickett et al. 1997; Mcintyre et al. 2000).

Remote sensing, with its ability to provide large-scale data sources such as historical maps or urban land use maps, has been used as an effective tool for quantitatively measuring urban landscapes and modeling urbanization at relatively large spatial scales (Herold et al. 2003; Tang 2011). Images from satellite sensors provide a large amount of cost-effective multispectral and multi-temporal data for monitoring landscape changes and estimating biophysical characteristics of land surfaces (Weng 2002). Many researchers have proposed routines that combine remote sensing with GIS in urban growth models (Tang 2011; Tayyebi et al. 2013). Significant progress in acquiring remotely sensed data at higher spatial resolution and in developing spatial geographic process models has widened research on the process, driving forces, and impacts of urbanization.

The cellular automata (CA) model, introduced by Tobler in 1979, is one of the most powerful spatial dynamics techniques used to simulate complex urban systems (Batty and Xie 1994). The CA model allows researchers to view the city as a self-organizing system in which the basic land parcels are developed into various land use types. Cecchini and Viola (1990) applied simple decision rules in the CA model to predict the complex, large-scale structure of the urban growth process. Wu (1998) combined multicriteria evaluation (MCE) and GIS with the CA model to define the transition rules in a visualized environment. Shafizadeh-Moghadam and Helbich (2013) used the analytical hierarchy process (AHP) to determine the weights in a Markov chain-cellular automata urban growth model.

The advantages of the CA model in simulating urban spatial processes and dynamics (Hillier and Hanson 1984; White and Engelen 1993) have been widely documented because the theoretical abstraction of the CA model and the practical constraints of the real world can be easily related (Batty and Xie 1994; Clarke and Hoppen 1997; Wu and Martin 2002). The model begins from a homogeneous cell-based grid and adjusts itself through transition rules derived from each cell’s local spatiotemporal neighborhood. This makes the CA model suitable for simulating complex and hierarchical structures, since unknown or immeasurable spatiotemporal variables can be incorporated and manipulated in the model. Another advantage of CA simulation is the ability to incorporate parameters or weights that represent alternative socioeconomic states during model development (Clarke and Gaydos 1998; Li and Yeh 2000). With improved computing techniques, the CA model is also able to explore more complex human behavior by defining different transition rules (Li and Yeh 2000; Wu and Martin 2002). However, the tension between the simple local transition rules of CA models and the complex, unpredictable social changes in urban landscapes remains.

In this context, this chapter develops a spatially explicit CA model to simulate urban growth patterns using the classification results from Landsat images, together with a second model that combines socioeconomic data with the same classification results. The two CA models were compared to test how socioeconomic data could improve urban model simulation in Houston over the last 30 years. Specifically, the following research questions were addressed: How can socioeconomic data be incorporated with remote sensing in an urban growth model? Do the socioeconomic data improve the model? For which classes does the model improve?

2 Urban Model Review and Socioeconomic Data in the Model

With the availability of spatial data on a large scale, various sophisticated models were developed, especially after the late 1990s, such as the UrbanSim model (Waddell 2002), the Markov chain model (Stewart 1994), the LUCAS model (Berry et al. 1996), the CLUE model (De Kong et al. 1999), the area-based model (Lichtenberg 1985; Tayyebi et al. 2011, 2013), the CA model (Batty and Xie 1994), the Land Transformation Model (Pijanowski et al. 1997, 2014), and the agent-based model (Liebrand et al. 1998). A detailed review of these spatially explicit models is given in Table 12.1.

Table 12.1 Detailed comparison of urban models

In terms of how the model object is represented, there are vector-based models and grid-based models (Herold 2004), and both have been used to incorporate socioeconomic data. Vector-based models use a thematic map as the input data, and the spatial objects are usually defined as homogeneous land units. UrbanSim is a land use simulation model for growth management, regional land use, and transportation planning that has been applied in the states of Hawaii, Oregon, and Utah (Waddell 2002). Within the context of urban infrastructure and governmental policy, UrbanSim represents the zonal structure of the urban area to monitor the socioeconomic behaviors of households, businesses, and land developers. Theoretically, UrbanSim is an object-oriented model. The What if model (Klosterman 1999) begins with uniform analysis zones or homogeneous land units generated by GIS software. By applying governmental policies and land use demands, this model derives an aggregate value of the regional condition for the land units. The What if model projects future land use patterns by balancing supply, demand, and land suitability at different locations. The area-based model is a vector-based model used in resource assessments to predict the availability of farm and forest land. Derived from the regional model (Palmquist 1989), the area-based model allocates the proportions of a given land use to predefined land use categories using Lichtenberg’s (1985) acreage allocation method (Tayyebi et al. 2011, 2013).

Another vector-based model is the Markov model, which predicts future landscape patterns based on spatial transition probabilities. Although the Markov model is a typical spatial transition model, early Markovian analysis served as a descriptive tool for predicting land use change on a local or regional scale (Bell 1974; Bourne 1976; Arsanjani et al. 2013). Strictly speaking, the Markov model is not a vector-based model; it is based on the statistical results derived from the thematic map. Lopez et al. (2001) used a Markov chain to simulate the relationships among a set of urban and social variables in predicting land use/cover change in the urban fringe of Morelia city, Mexico. Weng (2002) demonstrated that integrating satellite remote sensing and GIS techniques into stochastic urban modeling was an effective approach for analyzing the direction, rate, and spatial pattern of landscape change in the Zhujiang Delta of China. Tang et al. (2007) improved the Markov chain model by incorporating a modified genetic algorithm into the urban boundary expansion for urban simulation. Mathematically, most vector-based models rely on static equations, and this characteristic makes it possible to integrate statistical information into the model entities. The major drawbacks of such models are their poor handling of dynamic entities and poor representation of external variables, e.g., spatial information and socioeconomic factors.

Grid-based models are better suited to solving these problems than vector-based ones. The Land-Use Change Analysis System (LUCAS) is a grid-based model that integrates socioeconomic and ecological variables in multilayered, gridded maps (Berry et al. 1996). This model consists of three modules: socioeconomics, which derives the transition probability as a function of socioeconomic driving variables; landscape change, which predicts the landscape maps from the output of the socioeconomic module; and environmental impacts, which estimates the impacts on selected environmental variables from the landscape maps produced by the second module. The Land Transformation Model (LTM) (Pijanowski et al. 1997, 2014) applies spatial rules to land use transitions for each location in the processed spatial layer or grid. Its grid format makes it easy to quantify the contribution of different spatial variables. To aggregate land use change and change drivers, this model adopted a method similar to that of the Conversion of Land Use and its Effects (CLUE) model (De Kong et al. 1999). Both apply variable values in grid format to create a series of future land use patterns over time. The cellular automata model has been proposed and developed to simulate urban land use by incorporating various socioeconomic variables, such as a dynamic transportation model (Aljoufie et al. 2013) and dynamic population density (Van Vliet et al. 2012).

The agent-based model (Liebrand et al. 1998) is a complex behavior model that uses both vector data and raster data. Usually, the raster data form the agents’ environment, and the agents, in turn, act on the simulated environment. This model can be applied to a wide variety of simulations, including moving cars, animals, people, or even organizations. Socioeconomic variables, as both the agents’ status and the driving forces, have been incorporated into the model to simulate individual activities (An et al. 2005). This model is difficult to develop and control because it requires information on each individual agent and a prediction of that agent’s potential behaviors.

Generally, a reliable urban growth model should have the following capabilities: (1) providing an appropriate theoretical and technical framework for urban growth; (2) understanding and describing the historical dynamics of urban structures; and (3) exploring and incorporating different economic and social parameters to monitor the urban growth.

3 Study Area and Data Preparation

The eastern metropolitan area of Houston, Texas, covering an area of 1,200 km², was chosen as the study site (Fig. 12.1). Houston is situated in the northern portion of the Gulf coastal plain, a 60 by 80 km-wide swath along the Texas Gulf Coast, 80 km from the Gulf of Mexico (Moser 1998). This area has experienced rapid urban development since the 1930s, after oil was discovered in nearby oil fields (Tang et al. 2008). These discoveries made it the largest city in Texas as of 1930 and the fourth largest city in the United States since 1990 (Texas State Historical Association 2002). Although the government has tried to diversify the economy (Key to the city 2001), the city’s unchallenged role as an international center of oil technology, its position as headquarters for a number of the world’s largest energy companies, and its strong refining and petrochemical manufacturing base should shore up the local economy of Houston in the near future. The representative land use/land cover classes in the selected region include residential area, commercial/industrial area, transportation, woodland, grassland, and barren/soil.

Fig. 12.1 Study site, Houston, Texas, in the United States

Landsat MSS/TM satellite images from the 1970s to 2010 were collected for this study. All images were georeferenced to the Universal Transverse Mercator projection using ENVI. Conventional maximum likelihood classification (MLC) was adopted to obtain four classified landscape maps with six landscape classes each. We chose two sets of samples of around 600 pixels each, one for training and one for testing. The selection of separate training and test samples was guided by the characteristics of each class in the different years. The overall accuracy of the classified maps was 92 % (1979), 94 % (1990), 96 % (2000), and 95 % (2010). Figure 12.2 shows the classification results, and Table 12.2 lists the proportion of each land use type.
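The classification step can be illustrated with a minimal sketch of a Gaussian maximum likelihood classifier in Python with NumPy. This is not the ENVI implementation used in the study. The function names, the equal class priors, and the synthetic two-band training data are assumptions made only for illustration.

```python
import numpy as np

def train_mlc(samples, labels):
    """Estimate a mean vector and covariance matrix for each class."""
    stats = {}
    for c in np.unique(labels):
        x = samples[labels == c]
        stats[int(c)] = (x.mean(axis=0), np.cov(x, rowvar=False))
    return stats

def classify_mlc(pixels, stats):
    """Assign each pixel to the class with the highest Gaussian log-likelihood.

    Equal prior probabilities are assumed for all classes.
    """
    classes = sorted(stats)
    scores = []
    for c in classes:
        mean, cov = stats[c]
        diff = pixels - mean
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every pixel to the class mean
        maha = np.einsum("ij,jk,ik->i", diff, inv, diff)
        scores.append(-0.5 * (logdet + maha))
    return np.array(classes)[np.argmax(scores, axis=0)]

# Synthetic two-band training samples for two hypothetical classes
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
                   rng.normal(10.0, 1.0, (50, 2))])
train_labels = np.repeat([0, 1], 50)
stats = train_mlc(train, train_labels)
predicted = classify_mlc(np.array([[0.0, 0.0], [10.0, 10.0]]), stats)
```

In practice the pixel array would hold one row per image pixel and one column per spectral band, and the predicted labels would be reshaped back to the image dimensions.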

Fig. 12.2 Satellite images and classification results from the MLC method for October 1979, December 1990, November 2000, and October 2010

Table 12.2 The proportion of each land use type from 1979 to 2010 in Houston

In order to represent the rapid socioeconomic development in the Houston area, four major socioeconomic variables were collected: population density, house density, road density, and distance to highways (Van Vliet et al. 2012; Aljoufie et al. 2013). These four variables were collected at census block level from the official website of the U.S. Census Bureau (US Census Bureau 2010).

4 Methodology

A cellular automata model was developed to investigate scenarios of future urban land transformations in Houston. The model ran on a 30-m grid; the transition rules were applied to all cells simultaneously, and the entire grid was updated at each annual iteration. The transition rules were defined in terms of the difference between the center cell and its eight neighbors within a 3 × 3 Moore neighborhood. To determine the state of a cell in a given time period, the simulation function was written as:

$$ {\mathrm{S}}_{i,j}^{t+1}={a}_{\mathrm{N}}\times {\mathrm{N}}_{i,j}^t+{a}_{\mathrm{M}}\times {\mathrm{M}}_{i,j}+{a}_{\mathrm{S}\mathrm{E}}\times {\mathrm{S}\mathrm{E}}_{i,j} $$
(12.1)

where $\mathrm{N}_{i,j}^t$ denotes the diffusion factor with respect to the neighborhood, $\mathrm{M}_{i,j}$ denotes the Markov transition probabilities, and $\mathrm{SE}_{i,j}$ denotes the socioeconomic status of each single cell and its neighborhood; $a_{\mathrm{N}}$, $a_{\mathrm{M}}$, and $a_{\mathrm{SE}}$ are the coefficients for these variables.
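As a rough illustration of how Eq. 12.1 might be evaluated over a whole grid, the following Python sketch combines three factor layers with scalar coefficients. The chapter does not state how the weighted score is converted into a land use class, so the rule used here (each cell takes the class with the highest score) is an assumption, as are the array layout, the function names, and the toy values.

```python
import numpy as np

def combine_factors(N, M, SE, a_N, a_M, a_SE):
    """Weighted sum of Eq. 12.1, evaluated for every class and cell at once."""
    return a_N * N + a_M * M + a_SE * SE

def next_state(N, M, SE, a_N=1.0, a_M=1.0, a_SE=1.0):
    """Assumed decision rule: each cell takes the class with the highest score.

    N, M, and SE all have shape (n_classes, rows, cols).
    """
    scores = combine_factors(N, M, SE, a_N, a_M, a_SE)
    return np.argmax(scores, axis=0)

# Toy example with 2 classes on a 2 x 2 grid (hypothetical values)
N = np.array([[[0.75, 0.25], [0.5, 0.125]],
              [[0.25, 0.75], [0.5, 0.875]]])
M = np.full((2, 2, 2), 0.5)
SE = np.zeros((2, 2, 2))
SE[1] = 0.25  # socioeconomic pressure favours class 1 everywhere
state = next_state(N, M, SE, a_N=2.0, a_M=1.0, a_SE=1.0)
```

With these values, only the upper-left cell keeps class 0; the tie in the diffusion factor at the lower-left cell is broken in favour of class 1 by the socioeconomic term.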

For a self-organizing CA model, the diffusion factor, Markov transition rules, and socioeconomic status were defined as:

$$ {\mathrm{N}}_{i,j}=\frac{n_{i,j}}{{\displaystyle \sum {n}_{i,j}}} $$
(12.2)
$$ {\mathrm{M}}_{i,j}={\displaystyle \sum_{k=1}^k\frac{{\mathrm{N}\left(i,j\right)}\!\left/ \!{{\displaystyle \sum_{i=1}^mm}}\right.}{k*{m}_k}} $$
(12.3)
$$ {\mathrm{SE}}_{i,j}=\frac{{\displaystyle \sum_{n=1}^n\left(\frac{d_{i,j}^n- \min \left({d}_{i,j}^n\right)}{ \max \left({d}_{i,j}^n\right)- \min \left({d}_{i,j}^n\right)}\right)}}{n} $$
(12.4)

where $n_{i,j}$ is the total number of cells of class i surrounding the observed class j, $\mathrm{N}(i,j)$ is the observed landscape amount changing from class i to class j during a total of m years at k internal steps, and $d_{i,j}^n$ is the difference in the four selected socioeconomic variables between the observed center cell and its n neighbors (Fig. 12.3).
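One possible reading of Eqs. 12.2 and 12.4 is sketched below in Python. The diffusion factor is taken as the share of the eight Moore neighbors in each class, and the socioeconomic index as the mean of min-max normalized difference layers. The edge padding at the grid border, the treatment of constant layers, and the choice to average over stacked layers are assumptions of this sketch.

```python
import numpy as np

def neighborhood_fraction(grid, n_classes):
    """Share of the eight Moore neighbours belonging to each class.

    Returns an array of shape (n_classes, rows, cols). Border cells use
    edge padding, which is an assumption of this sketch.
    """
    rows, cols = grid.shape
    padded = np.pad(grid, 1, mode="edge")
    counts = np.zeros((n_classes, rows, cols))
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the centre cell itself
            shifted = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            for c in range(n_classes):
                counts[c] += (shifted == c)
    return counts / 8.0

def socioeconomic_index(diffs):
    """Mean of min-max normalised layers; diffs has shape (n_layers, rows, cols)."""
    lo = diffs.min(axis=(1, 2), keepdims=True)
    hi = diffs.max(axis=(1, 2), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant layers normalise to 0
    return ((diffs - lo) / span).mean(axis=0)

grid = np.array([[0, 0, 1],
                 [0, 1, 1],
                 [1, 1, 1]])
frac = neighborhood_fraction(grid, n_classes=2)

diffs = np.array([[[0.0, 5.0], [10.0, 5.0]],
                  [[2.0, 2.0], [2.0, 2.0]]])
se = socioeconomic_index(diffs)
```

For the center cell of the toy grid, three of the eight neighbors belong to class 0 and five to class 1, so the two fractions are 0.375 and 0.625.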

Fig. 12.3 The visualization of the socioeconomic values in Houston: (a) population density; (b) house density; (c) road density; and (d) distance to highway

Although the socioeconomic data were collected for the last year of the simulation, the difference in socioeconomic values between the observed cell and its neighbors was used to determine the socioeconomic factors. Different socioeconomic variables carry different weights in their impact on urban land use/land cover change. To determine the weight of each socioeconomic variable, 20 experts in the fields of socioeconomics and land use change were invited to rate each variable on an index ranging from 0 to 10, representing the weight from the highest impact to the lowest impact. The average values of these ratings are shown in Table 12.3.

Table 12.3 The weight of socioeconomic indices

A critical issue in CA modeling is the provision of proper methods to calibrate the model, that is, to find appropriate coefficients for the diffusion factor, Markov transition rules, and socioeconomic status (Hagen-Zanker and Lajoie 2008; Van Vliet et al. 2011). To calibrate the model, we used the classified Landsat TM images as empirical maps for the following dates: November 5, 1984; July 20, 1990; October 6, 1999; and November 9, 2000. We randomly selected an encoded weight number (ranging from 1 to 10) for each factor, ran the CA model using these weight numbers, and compared the cells simulated by the CA model with the corresponding cells in the empirical maps to choose the weight numbers with the highest fitness. For each combination, the CA model was run at yearly intervals until the next calibration year. These steps were repeated until the year of the last calibration map.
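The calibration loop described above can be sketched as a simple random search. In the Python sketch below, `run_model` is a hypothetical callable standing in for the CA simulation, and the fitness measure (the share of matching cells) and the number of trials are assumptions, since the chapter does not specify them.

```python
import numpy as np

def fitness(simulated, empirical):
    """Share of cells whose simulated class matches the empirical map."""
    return float(np.mean(simulated == empirical))

def calibrate(run_model, empirical, n_trials=200, seed=0):
    """Random search over integer weights in 1..10 for the three factors.

    run_model is a hypothetical callable (a_N, a_M, a_SE) -> simulated map.
    """
    rng = np.random.default_rng(seed)
    best_weights, best_fit = None, -1.0
    for _ in range(n_trials):
        weights = tuple(int(w) for w in rng.integers(1, 11, size=3))
        fit = fitness(run_model(*weights), empirical)
        if fit > best_fit:
            best_weights, best_fit = weights, fit
    return best_weights, best_fit

# Toy stand-in for the CA model: it reproduces the empirical map only
# when the neighbourhood weight exceeds the socioeconomic weight.
empirical = np.array([[1, 0], [1, 1]])

def toy_model(a_N, a_M, a_SE):
    return empirical if a_N > a_SE else np.zeros_like(empirical)

best_weights, best_fit = calibrate(toy_model, empirical, n_trials=50)
```

A random search of this kind is easy to implement but gives no guarantee of finding the best combination; with three factors and ten levels each, an exhaustive search over all 1,000 combinations would also be feasible.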

For validation, the model’s simulation output was compared with the empirical map for the same simulated year (Pontius et al. 2004; Pontius and Cheuk 2006) through visual inspection and quantitative evaluation. In this research, we adopted the classified map from October 31, 2011, as the empirical map and overlaid it with the predicted map to generate a black-and-white error image. In addition, an error matrix was built to report the user’s and producer’s accuracy for each class as well as the overall accuracy and Kappa for the entire landscape.
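The validation measures named above follow standard definitions and can be sketched in Python as follows. The orientation of the error matrix (rows for predicted classes, columns for empirical classes) and the toy maps are assumptions of this sketch.

```python
import numpy as np

def error_matrix(predicted, empirical, n_classes):
    """Rows: predicted (map) class; columns: empirical (reference) class."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(m, (predicted.ravel(), empirical.ravel()), 1)
    return m

def accuracy_report(m):
    """User's and producer's accuracy per class, overall accuracy, and Kappa.

    Every class is assumed to occur at least once in both maps;
    otherwise the per-class ratios divide by zero.
    """
    total = m.sum()
    diag = np.diag(m)
    users = diag / m.sum(axis=1)      # correct / all cells predicted as the class
    producers = diag / m.sum(axis=0)  # correct / all reference cells of the class
    overall = diag.sum() / total
    expected = (m.sum(axis=1) * m.sum(axis=0)).sum() / total**2
    kappa = (overall - expected) / (1 - expected)
    return users, producers, overall, kappa

predicted = np.array([[0, 0, 1], [1, 1, 0]])
empirical = np.array([[0, 1, 1], [1, 0, 0]])

error_image = predicted != empirical  # True marks an incorrectly predicted cell
m = error_matrix(predicted, empirical, n_classes=2)
users, producers, overall, kappa = accuracy_report(m)
```

In this toy case four of six cells agree, giving an overall accuracy of about 0.67 and a Kappa of about 0.33 once chance agreement (0.5) is removed.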

5 Results and Discussion

Since our model was based on actual observations from the last 30 years in Houston, the temporal transition probability matrix was calculated by accumulating the periods from 1979 to 2010. We first calculated the yearly transition matrix for each pair of consecutive maps (1979–1990, 1990–2000, and 2000–2010) and then calculated the yearly transition matrix between 1979 and 2010 using Eq. 12.3. The yearly transition probability matrix from 1979 to 2010 is shown in Table 12.4.
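A simplified way to derive a yearly transition matrix from two classified maps is sketched below in Python. It counts class-to-class changes, converts them to probabilities for the whole period, and spreads the off-diagonal probabilities evenly over the years. This linear annualization is an assumption made for illustration and is not necessarily identical to Eq. 12.3.

```python
import numpy as np

def transition_counts(map_start, map_end, n_classes):
    """Number of cells moving from each start class (row) to each end class (column)."""
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (map_start.ravel(), map_end.ravel()), 1)
    return counts

def yearly_transition_matrix(map_start, map_end, n_classes, years):
    """Linear annualisation of the period transition probabilities (an assumption)."""
    counts = transition_counts(map_start, map_end, n_classes)
    row_totals = counts.sum(axis=1, keepdims=True)
    period = counts / np.where(row_totals > 0, row_totals, 1.0)
    yearly = period / years
    np.fill_diagonal(yearly, 0.0)
    # The diagonal holds the probability of no change, so each row sums to 1
    np.fill_diagonal(yearly, 1.0 - yearly.sum(axis=1))
    return yearly

map_1979 = np.array([[0, 0, 0], [0, 1, 1]])
map_1989 = np.array([[0, 0, 1], [1, 1, 1]])
yearly = yearly_transition_matrix(map_1979, map_1989, n_classes=2, years=10)
```

Here half of the class 0 cells change to class 1 over ten years, which the linear rule turns into a yearly conversion probability of 0.05.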

Table 12.4 Yearly transition probability (%) matrix from 1979 to 2010

Using the yearly transition probability matrix in Table 12.4, we parameterized the CA model with the Markov transition probability and the socioeconomic variables at the census block level. Two CA models were built, one with the socioeconomic variables and one without. Figures 12.4 and 12.5 show the initial state and simulated pattern of Houston with and without the socioeconomic variables, respectively.

Fig. 12.4 The simulated landscape pattern of Houston with the socioeconomic factors

Fig. 12.5 The simulated landscape pattern of Houston without the socioeconomic factors

The simulated results from the two models show a similar general pattern of urban sprawl: fast shrinkage of grassland and woodland and clear outward expansion of residential and industrial/commercial areas. This growth pattern can be observed in the southeastern and northeastern parts of the city, where a large amount of new residential and industrial/commercial area has been built in the last 30 years. Unlike other large cities in the United States, Houston did not adopt zoning laws in its urban planning. The lack of zoning has led to abundant urban sprawl in Houston, resulting in a relatively large metropolitan area and low population density. Land developers spurred the spread of Houston when they built suburbs such as Pasadena (1892), Houston Heights (1892), Deer Park (1892), Bellaire (1911), West University Place (1919), and River Oaks (1922–24).

Although the simulated results from the two models show a similar sprawl pattern, the model with the parameterized socioeconomic variables corresponded better to the “abrupt” expansion of residential and industrial/commercial areas. Figure 12.6 shows that this “abrupt” expansion was simulated well by the model with the socioeconomic data, which predicted a larger area for these human-related landscapes in the year 2010. The “abrupt” expansion was caused by the rapid economic development, population growth, and road construction in Houston. The pattern simulated by the model without socioeconomic factors lagged well behind, especially in simulating the rapid growth of suburban areas. The differences between the two models indicate that the CA spatial model can simulate urban evolution behaviors when enough driving factors are incorporated.

Fig. 12.6 The estimated results from two models in 2010

To display the error in the predicted maps, we compared our predicted results with the empirical map. The differential maps are shown in Fig. 12.7. White pixels in the figure represent areas predicted correctly, while dark pixels represent incorrect predictions. Generally, the residential areas were predicted best, and most of the errors were found in suburban areas, which were mostly grassland and barren/soil landscapes. Woodland was predicted better than the other natural landscapes, which might be explained by the large forest reserves in northeastern Houston in Sheldon Lake State Park and Dwight D. Eisenhower Park.

Fig. 12.7 The differential map between the predicted map and empirical map (a) with socioeconomic factors and (b) without socioeconomic factors

In Fig. 12.7, the result predicted with the socioeconomic data (Fig. 12.7a) was better than the one without (Fig. 12.7b), as indicated by the larger number of white pixels. This can be confirmed in southwestern Houston, such as Gulfton, Sharpstown, and Bellaire, and in southeastern Houston between Deer Park and Pasadena. Incorrect predictions were also consistently found in the industrial/commercial areas of southern and northeastern Houston, such as Missouri City and Jersey Village. This is understandable, since the chosen socioeconomic data, especially population density and house density, represent residential areas better than industrial/commercial areas.

Further validation of the two models was carried out through the confusion matrix (Table 12.5). This table compares the simulated results with the empirical map: the user’s accuracy and producer’s accuracy represent the accuracy for each class, and the overall accuracy and Kappa represent the accuracy for the entire landscape. In both models, the best predicted class was the residential area (with 66.97 %/59.03 % user’s accuracy and 77.32 %/53.91 % producer’s accuracy) and the worst predicted class was barren/soil (with 40.78 %/20.48 % user’s accuracy and 1.43 %/11.74 % producer’s accuracy). The barren/soil class, although it covered the smallest area in the study site, was easily confused with other classes such as industrial/commercial or residential area. Incorporating socioeconomic data into the model improved the simulation of the residential and industrial/commercial classes, which left barren/soil with the lowest accuracy.

Table 12.5 Confusion matrix and the model validation for two models

One disadvantage of incorporating socioeconomic data into the model was the overestimation of residential area, which led to a relative underestimation of the industrial landscape as well as natural landscapes such as woodland and grassland. This might be improved as more socioeconomic data are incorporated as driving forces in the model. The analysis of the model validation showed that appropriate ancillary parameters are necessary for the CA model to produce a solid result. In fact, the value of the simulation approach lies in its exploratory nature, which allows models to be improved with additional variables later. Meanwhile, the CA model has an “aggregate” function that smooths the heterogeneous pattern within urban and suburban areas. One way to address this problem is to incorporate better data sources into the model, such as higher spatial resolution images or sub-pixel classifications, to improve the accuracy of CA models.

6 Conclusion

The spatiotemporal CA model of urban landscape patterns using multi-temporal TM and MSS imagery enabled us to characterize the internal structure of landscapes and monitor the landscape dynamics for Houston. Moreover, we also explored the potential of socioeconomic variables to detect how human forces affect the urban spatial pattern.

The CA model, coupled with the Markov transition probability, demonstrated the capability of trend projection for landscape change. This spatiotemporal model provided not only a quantitative description of past change but also the direction and magnitude of future change. However, based on the experimental results and exploratory analysis, several limitations remain in the current study:

  • Since the modeling process involves data from multiple sources, the accuracy of the prediction result is closely related to the individual accuracy of each type of data, especially the different remote sensing data sources. Developing a robust method to incorporate data at different spatial resolutions remains an open issue.

  • Although the Markov transition probability was calculated at the census block level, it was stationary and unable to accommodate unpredictable influences such as climate, policy, and human disturbance. In addition, the pace of landscape change usually varies over the entire period.

  • In this research, we assumed that the relationship between socioeconomic factors, neighborhood effect, and Markov transition probability was linear and deterministic during the calibration. Finding an exact dynamic coefficient between them remains a difficult problem in urban modeling.

At present, it is not fully conclusive that the CA model based on socioeconomic data is superior to the one without socioeconomic data, especially for the natural landscapes. More sophisticated methods, applied to a series of varied landscapes, are still needed to verify this new model.

Most urban landscapes have been influenced by human disturbance, resulting in a heterogeneous mosaic of natural and human-managed patches that vary in size, shape, and arrangement (Turner 1989). Landscape responses to human disturbance are important but difficult to estimate, because landscape-level simulation involves numerous challenging experiments and hypotheses in the development of models (Vaz et al. 2012). These hypotheses are usually adopted to make the process model easier to manipulate, which leads to a more homogeneous pattern in the predicted result. Thus, it is necessary to relate the homogeneous analysis in the model prediction to the heterogeneous analysis in the quantitative landscape method for a comprehensive understanding of the urbanization process. In conclusion, this urban study shows that by incorporating more spatial algorithms into the prediction of landscape change, long-term landscape changes can be reproduced more accurately in the future.