Introduction

Land use and land cover (LULC) dynamics are a major global environmental issue because they are significantly affected by climate at the global level (Keshtkar and Voigt 2016). In addition, they disturb the water balance and biological cycles at local and regional scales, to the extent that they increase the incidence of natural hazards such as drought (Gidey et al. 2017). The issue has therefore received attention from scientists and decision makers seeking to better understand and evaluate future dynamics and their impacts on the environment (Mas et al. 2014; Mubea et al. 2011). As a result, there is high demand for improved LULC information in order to develop sustainable land use systems (Jansen and Di Gregorio 1998). Keshtkar and Voigt (2016), Omar et al. (2014), and Subedi et al. (2013) reported that predicting future LULC dynamics is a complex process because it involves both environmental and socio-economic factors. LULC dynamics are thus triggered by both natural and anthropogenic factors (Keshtkar and Voigt 2016).

Simulation-based LULC modeling can simplify and provide better insights into potential future developments (Omar et al. 2014). Hence, prediction of future LULC dynamics at the basin scale is crucial for management of land resources, improvement of the eco-environment, and sustainable development of the basin's water resources (Wang et al. 2014). The prediction of forthcoming LULC dynamics in areas where the economy depends on agriculture (e.g., Ethiopia) has a profound impact on land management, restoration of water resources, monitoring of vegetation cover dynamics, and decision making on land use systems (Corgne et al. 2003). Quantitative data on where, when, and why land-cover changes take place globally are not well addressed (Lambin 1997). However, predictions of future LULC based on geo-spatial technologies (e.g., remote sensing and GIS) provide knowledge of how much, where, and what type of LULC change has occurred (Weng 2002). This is achieved by combining LULC dynamics models with remote sensing and GIS to predict future LULC change in the functioning of the earth system; examples include the Conversion of Land Use and its Effects at small scale modeling framework (CLUE), Markov chain, Clue-S, CA_Markov, agent-based models (ABM), and the grid-based GeoMoD model (Veldkamp and Lambin 2001). However, all of these models except CA_Markov rely on a limited number of theories and methods (Eric et al. 2007; Veldkamp and Lambin 2001). For instance, the efficiency of the Clue-S model is not satisfactory, and it has to rely on results from other auxiliary software (Li et al. 2015). Besides, it is not policy sensitive and cannot easily incorporate the range of policy variables that might be of interest in predicting the impacts of various land use policies (Iacono et al. 2015). Similarly, CLUE uses a logistic regression model that is run in a statistical program, the temporal dynamics of LULC in this model are controlled by the conversion metrics, and the model does not offer any method to improve the realism of simulated landscapes by reproducing spatial patterns (Mas et al. 2007). In addition, the Markov model is limited in its ability to provide information about the spatial distribution of LULC dynamics. Furthermore, the GeoMoD model uses exactly two categories, can simulate only the transition from the first category to the second, and cannot simulate an additional simultaneous transition from the second category to the first (Pontius and Malanson 2005).

In contrast, the cellular automata and Markov chain (CA_Markov) model has been found to be the most universal and effective model for predicting future (short- to long-term) LULC dynamics under various scenarios (e.g., socio-economic and physical) because it generates a better spatial pattern for each LULC category than other models (e.g., Clue-S, GeoMoD, and Markov chain). CA_Markov is an expert-driven process that spatially allocates expected categorical LULC by using categorical suitability maps (Paegelow et al. 2014). It also allows any number of categories and can simulate the transition from any category to any other category (Pontius and Malanson 2005). The model involves three major processes: Markov chain analysis, cellular automata, and validation. Furthermore, the model is strong due to its dynamic simulation capability, high efficiency, simple calibration, and ability to simulate multiple land covers and complex patterns spatially and temporally (Memarian et al. 2012; Regmi et al. 2014). The basic principle of the CA_Markov model is that the state of a cell at the next moment is a function of the present states of its neighboring cells (Wang et al. 2014). The model has also been applied successfully in sub-tropical and tropical areas to forecast the distribution of future LULC (Rendana et al. 2015). Predicting future LULC dynamics using the CA_Markov model may help to diminish the impacts of vegetation deterioration, loss of biodiversity, and soil erosion (Ghosh et al. 2017; Verburg et al. 2004) and can serve as a baseline for water quality assessment, hydrologic modeling, and climate change studies.

The CA_Markov model is a hybrid model that combines the concepts of cellular automata (CA) and the Markov chain (Hyandye and Martz 2017; Mondal et al. 2015; Zhilong et al. 2017). A CA is a set of identical elements, called cells, each of which is located in a regular and discrete space (Fan et al. 2008). The basic principle of CA is that land use change for any location (cell) can be explained by its current state and changes in its neighboring cells (Parsa et al. 2016). The transition rules therefore depend on neighboring cells and are expressed as probabilities that consider only the present state. Moser et al. (2013) noted that the Markov model represents a wide and general family of stochastic models for the temporal and spatial dependence properties associated with 1-D and multidimensional random sequences or random fields. Combining CA and the Markov chain (CA_Markov) is more accurate and logical for predicting future LULC change for a certain domain (Omar et al. 2014). The CA_Markov model computes the state of a pixel based on its initial state, the conditions in the surrounding pixels, and a set of transition rules (Verburg et al. 2004). This underlies the dynamics of change events based on the proximity concept, so that regions closer to existing areas of the same class are more probable to change to a different class (Memarian et al. 2012). The most important feature of CA_Markov is its ability to predict complex dynamic temporal and spatial patterns through a set of transition rules (Behera et al. 2012; Fan et al. 2008). The model helps to resolve issues related to LULC dynamics because it works on the basis of transition probabilities and uses suitability maps derived from the MCE for each LULC category to generate reliable projections for the future. Iacono et al. (2012) reported that one of the most desirable qualities of the Markov chain model is its simplicity and ability to describe the complex and long-term process of land use conversion in terms of simple transition probabilities, making it a potentially useful sketch-planning tool. The amounts of LULC change calculated in the Markov chain analysis using the transition potential maps are used to predict the future LULC (Shooshtari and Gholamalifard 2015).

Nowadays, rapid changes in LULC can lead to land degradation and environmental problems due to anthropogenic activities such as agricultural land expansion, deforestation, urbanization, and tourism (Rendana et al. 2015). Scenario-based predictions of LULC support policy makers and other stakeholders in examining past, current, and future effects of land cover change on ecological and socio-economic processes and in setting appropriate intervention measures to diminish those effects (Sohl and Sleeter 2012). Therefore, improved methods of LULC prediction are required to assess and project future LULC change in the functioning of the Earth System (Lambin et al. 2001). Iacono et al. (2012) stated that stochastic models are used to simulate and explore dynamic land use change. The CA_Markov model combines cellular automata (CA), Markov chain, Multi-Criteria Evaluation, and Multi-Objective Land Allocation (MOLA) to detect the dynamics of future LULC changes. Models that predict future land cover patterns at different time scales can support the generation of plausible scenarios for assessing land cover conditions under a range of assumptions (e.g., rates and patterns) (Serneels and Lambin 2001; Verburg et al. 2004).

Ethiopia is highly vulnerable to natural and anthropogenically induced environmental changes (e.g., LULC change). In recent decades, changes caused by anthropogenic forces have occurred at a faster pace than natural variations because of population growth (Keshtkar and Voigt 2016). Humans have thus largely influenced the earth's environment by changing LULC (Singh et al. 2015). The challenge is severe in the highlands, where the study site is located. The 1984–2015 LULC history shows that the study area has experienced a significant shrinking of water bodies and grasslands and increases in croplands, barren lands, built-up areas, forests, and shrublands. The future LULC of the study area has not yet been adequately studied. As a result, the farming systems of smallholder farmers may be considerably affected by future changes in their land use system. Weng (2002) reported that the application of stochastic models to simulate LULC dynamics in developing countries (e.g., Ethiopia) is rare. Therefore, there is a need to integrate remote sensing, GIS, and Markov techniques for monitoring and modeling LULC changes. Spatially explicit modeling of future LULC dynamics is thus important for describing processes of change in quantitative terms and evaluating the magnitude, pattern, and type of LULC changes (Serneels and Lambin 2001; Weng 2002). Only a few studies have combined both physical and socio-economic factors in the CA_Markov model to predict LULC effectively (Sayemuzzaman and Jha 2014). For example, Wang and Murayama (2017) employed CA_Markov in Tianjin, northeastern China; Kityuttachai et al. (2013) in Thailand; Keshtkar and Voigt (2016) in central Germany; and Gashaw et al. (2017) in the Andassa watershed, Blue Nile Basin, Ethiopia. However, the majority of them did not include core LULC dynamic factors.

Hence, there is a need to integrate both socio-economic and physical factors to improve the prediction of future LULC dynamics over short time scales, because the CA_Markov model is more efficient at short time scales (10–20 years) than at long time scales. Predicting future LULC at different time scales (e.g., short and long term) can help policy makers, land use planners, environmentalists, conservation planners, and practitioners to better understand and, through policy intervention, mitigate the effects of potential modifications and/or alterations of LULC that might happen in the near future (Halmy et al. 2015; Roy et al. 2014; Verburg et al. 2006). This may help to ensure sustainable land management and agricultural development (Bewket and Abebe 2013). The aim of this study is, therefore, to predict and analyze future scenarios of LULC change from 2015 to 2033 by integrating both physical and socio-economic factors using the cellular automata and Markov chain (CA_Markov) model. The findings of this study are vital to foster better decisions and improve land use policy within the framework of sustainable land use planning in relation to the future likelihood of changes or development.

Materials and methods

Study area

This study was conducted in Raya and its environs (Northern Ethiopia), an intermountain plain located between 39°24′40′′ and 40°25′20′′ E longitude and between 12°7′20′′ and 13°8′0′′ N latitude (Fig. 1) (Gidey et al. 2017). It consists of 11 districts, namely Megale, Yalo, Gulina, Gidan, Kobo, Alaje, Alamata, Hintalo Wejirat, Ofla, Endamehoni, and Raya Azebo. The total area of the study area is estimated at 14,532 km², of which 48% falls in the southern Tigray region, 22% in the Amhara region, and 30% in the Afar region (Gidey et al. 2017). The area receives up to 558 mm of rainfall annually (Gidey et al. 2017). Rainfall in the area is erratic and bimodal (Ayenew et al. 2013). During the last 33 years, the maximum (Tmax) and minimum (Tmin) temperatures were 30.5 and 15.9 °C, respectively. The study area spans four river basins: Denakil, which covers about 10265.8 km² (70.64%); Lake Ashinge, 16.0 km² (0.11%); Abay (Blue Nile), 13.2 km² (0.09%); and Tekeze, 4237.0 km² (29.16%) (Gidey et al. 2017). The mean elevation of the area is 1762 meters above sea level (m.a.s.l.). The slope of the study area ranges from 0% (flat) to 395.3% (very steep). Eutric cambisols are the predominant soil type, covering about 4667.1 km² (32.1%), while dystric gleysols cover only a small portion, 1.1 km² (0.001%) (Gidey et al. 2017).

Fig. 1 Location map of the study area. Source: Gidey et al. (2017)

Data acquisition

Earth observation data

Predictions of future LULC dynamics require a substantial amount of earth observation data to conduct an effective analysis (Araya and Cabral 2010; Keshtkar and Voigt 2016). Earth observation data sets are a rich source from which updated land-cover maps and changes can be analyzed and predicted (Keshtkar and Voigt 2016). Hence, the LULC data of this study were prepared from Landsat Thematic Mapper (TM) and Operational Land Imager (OLI) images, path 168/169 and row 051/052, for the years 1984, 1995, and 2015. Gidey et al. (2017) reported eight major LULC types: cropland (Cl), forestland (Fl), shrub/bush land (Shl), built-up area (Bu), water bodies (Wb), grassland (Gl), barren land (Bl), and floodplain (deposition) areas (Fp). These data sets were used as the baseline for predicting the LULC dynamics of 2033.

Socio-economic data

In this study, socio-economic data such as population, road and river were gathered from the Central Statistical Agency of Ethiopia. These data sets were used as one of the socio-economic driving forces of LULC dynamics to prepare the suitability maps for running the CA_Markov model.

Data processing and analysis

Markov chain model

A Markov chain is a stochastic model that predicts the probability of LULC change from one state to another by taking into account the past trend of LULC change at different spatio-temporal scales. In other words, a Markov chain is a series of random values whose probabilities at a given time interval depend on the value at the previous time (Surabuddin Mondal et al. 2013). The output of this model is based on the probability of transition (Adhikari and Southworth 2012). The transition probability matrix of LULC change from time one to time two is the basis for projecting to later time periods (Surabuddin Mondal et al. 2013). The probability matrix is a set of conditional probabilities for the cells in the model to go to a particular new state (Akin et al. 2014). This model can be used as a basis to predict how a particular LULC type changes over time (Fan et al. 2008; Hyandye and Martz 2017; Iacono et al. 2015; Subedi et al. 2013; Mandal 2014). Currently, several studies use Markov analysis to simulate land use change over different types of landscapes (Halmy et al. 2015). The Markov model predicts the quantity of each LULC type or the dynamic changes of the LULC pattern, but it is not good at dealing with the spatial pattern of landscape change (Li et al. 2015) because it does not provide the spatial distribution of the change, which is essential for understanding the potential impact of the projected changes (Halmy et al. 2015). The basic principle of the Markov chain model is that land use at some point in the future \((t+1)\) can be determined as a function of current land use \((t)\) (Iacono et al. 2015). Coppedge et al. (2007) reported that LULC change at any particular location may not be random but depends on the previous or current land use situation. In this study, the Markov chain model was based on the following mathematical expressions, as widely used elsewhere (e.g., Subedi et al. 2013) (Eqs. 1, 2):

$${L_{(t+1)}}={p_{ij}} \times {L_{(t)}}$$
(1)

and

$${p_{ij}}=\left[ {\begin{array}{*{20}{c}} {{p_{11}}}&{{p_{12}}}& \cdots &{{p_{1m}}} \\ {{p_{21}}}&{{p_{22}}}& \cdots &{{p_{2m}}} \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ {{p_{m1}}}&{{p_{m2}}}& \cdots &{{p_{mm}}} \end{array}} \right]$$
(2)

where \({L_{(t+1)}}\) and \({L_{(t)}}\) are the LULC status at times \(t+1\) and \(t\), respectively, and \({p_{ij}}\) is the transition probability matrix, with \(0 \leqslant {p_{ij}}<1\) and \(\sum\nolimits_{j=1}^{m} {p_{ij}} =1\) \((i,j=1,2,\ldots,m)\).
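As an illustration of Eqs. 1 and 2, the following minimal sketch projects class areas one step forward. It assumes a row-stochastic matrix (each row sums to 1) applied to a row vector of areas; the three classes and all numbers are hypothetical and are not taken from this study.

```python
import numpy as np

# Hypothetical 3-class example (cropland, shrubland, grassland).
# The probabilities and areas are illustrative only.
P = np.array([
    [0.80, 0.15, 0.05],   # transitions from cropland
    [0.20, 0.70, 0.10],   # transitions from shrubland
    [0.30, 0.10, 0.60],   # transitions from grassland
])
L_t = np.array([500.0, 300.0, 200.0])  # area per class at time t (km^2)


def project(areas, transition, steps=1):
    """Apply L(t+1) = L(t) P repeatedly; each row of P must sum to 1."""
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    out = np.asarray(areas, dtype=float)
    for _ in range(steps):
        out = out @ transition
    return out


L_next = project(L_t, P)
```

Because every row of the matrix sums to 1, the total area is conserved from one step to the next; only its distribution among classes changes.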

In this study, Markov chain analysis was applied to assess the transition matrices and the probabilities of change for the 1984–1995, 1995–2015, and 1984–2015 LULC dynamics. The Markov procedure was as follows. First, the LULC maps of 1984, 1995, and 2015, which were classified in the Earth Resources Data Analysis System (ERDAS) Imagine v. 2014 remote sensing software, were imported into the IDRISI-TerrSet Geospatial Monitoring and Modeling System. The file formats of all LULC images were then converted from image (.img) to TIFF (.tiff) to run the Markov chain model. In the Markov chain model, the earlier (e.g., 1984) and later (e.g., 1995) LULC maps were defined as the baseline, and then the number of time periods between 1984 and 1995 was defined. Similarly, the number of time periods to project forward from the second image, i.e., 1995–2015, was set to 20 years to produce LULC dynamic scenarios. A background cell option of 0.0 and a proportional error of 0.15 were then assigned to generate the predicted LULC of 2015 at the 85% level of accuracy, based on the transition probability matrix, the transition area file, and the set of conditional probability images generated by the Markov model for each LULC type or category. The transition probability matrix is a text file that contains the probability of each LULC type changing to every other type. It was obtained by cross-tabulation of two images from different time periods to determine the probability that a pixel in one land-use class changes into another class (Subedi et al. 2013; Iacono et al. 2015). The transition probability matrix was mathematically expressed based on Ghosh et al. (2017) as follows (Eq. 3):

$$\chi =\sum {\frac{{{{(O - E)}^2}}}{E}}$$
(3)

where \(\chi\) = transition probability matrix, \(O\) = observed number of transitions, and \(E\) = expected number of transitions.
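The sum in Eq. 3 can be evaluated directly from two tables of transition counts. The sketch below is illustrative only: the function name and the 2 × 2 counts are hypothetical, and cells with a zero expected count are skipped to avoid division by zero, which is an assumption rather than part of Eq. 3.

```python
import numpy as np


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells with a non-zero expectation."""
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    mask = e > 0
    return float(np.sum((o[mask] - e[mask]) ** 2 / e[mask]))


# Hypothetical observed and expected numbers of transitions (pixels).
observed = np.array([[90, 10], [20, 80]])
expected = np.array([[80, 20], [25, 75]])
stat = chi_square(observed, expected)
```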

Similarly, the transition area file is a text file that records the number of pixels expected to change from each LULC type to every other LULC class over a specified period of time (Adhikari and Southworth 2012). Furthermore, the conditional probability images show the probability that each LULC type is found at each pixel after the specified number of time periods. In this study, the transition probability matrix, the transition areas matrix, and a set of conditional probability images were generated for eight LULC classes: cropland (Cl), forestland (Fl), shrub/bush land (Shl), built-up area (Bu), water body (Wb), grassland (Gl), barren land (Bl), and floodplain area (Fp). Fan et al. (2008) reported that the control factors in a Markov chain model are the transition probabilities, which are the conditional probabilities of the system going to a particular new state given its current state. The actual LULC of 2015 was then used as the base for simulating the 2033 LULC because it is the only recent LULC map for the study area.
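The cross-tabulation that underlies the transition area and transition probability matrices can be sketched as follows. This is a simplified illustration with hypothetical 3 × 3 maps and integer class codes starting at 0; it is not the TerrSet implementation, which also handles background cells and the proportional error.

```python
import numpy as np


def cross_tabulate(earlier, later, n_classes):
    """Count pixels moving from class i (earlier map) to class j (later map)."""
    earlier = np.asarray(earlier).ravel()
    later = np.asarray(later).ravel()
    counts = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(counts, (earlier, later), 1)
    return counts


def transition_probabilities(counts):
    """Divide each row by its total; rows with no pixels stay all zero."""
    counts = counts.astype(float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


# Hypothetical classified maps for two dates (class codes 0, 1, 2).
map_t1 = np.array([[0, 0, 1], [0, 1, 1], [2, 2, 2]])
map_t2 = np.array([[0, 1, 1], [0, 1, 0], [2, 2, 1]])
areas = cross_tabulate(map_t1, map_t2, 3)   # transition areas in pixels
probs = transition_probabilities(areas)      # row-wise transition probabilities
```

Rows index the earlier class and columns the later class, so the diagonal of `probs` holds the share of each class that persisted between the two dates.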

Cellular automata (CA)

A cellular automaton (CA) is a model that has the ability to change and control complex spatially distributed processes. The model provides clear insights into local and global patterns of land cover dynamics by relating the new state of a cell to its previous state and to the states of its neighbors (Surabuddin Mondal et al. 2013; Al-sharif and Pradhan 2014). The CA model has a strong capability for simulating the spatio-temporal characteristics of complex systems (Yang et al. 2014), which is why it has been used extensively as a spatially dynamic model in LULC research (Adhikari and Southworth 2012; Omar et al. 2014). The model can be understood as a dynamic and relatively simple spatial system in which the state of each cell of the matrix depends on the previous state of the cells within a defined neighborhood, in accordance with a set of transition rules (Rocha et al. 2007). The CA model is therefore able to predict the spatial distribution of the LULC pattern and its dynamics because it adds the spatial properties of LULC. It uses not only the previous state of a land cover, as a Markov model does, but also the state of neighboring cells in its transition rules (Adhikari and Southworth 2012). The CA model serves as an analytical engine that enables dynamic modeling within GIS (Ye and Bai 2008) and remote sensing environments. Despite its advantages, CA has some problems in the definition of transition rules and model structure (Rocha et al. 2007). As a result, it cannot predict LULC dynamics adequately on its own. This shortcoming can be overcome through integration with other dynamic and empirical models (Halmy et al. 2015), as in CA_Markov. The CA model was expressed mathematically (e.g., Sang et al. 2011) as follows (Eq. 4):

$$S\left( {t,t+1} \right)=f\left( {S(t),N} \right)$$
(4)

where S is the set of limited and discrete cellular states, N is the Cellular field, t and t + 1 indicate the different times, and f is the transformation rule of cellular states in local space.
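A toy instance of Eq. 4 may make the roles of \(S\), \(N\), and \(f\) more concrete. In the sketch below, the neighborhood is a 3 × 3 window and the transformation rule is a simple majority vote; both are assumptions chosen for brevity and are not the rules used in this study.

```python
import numpy as np


def ca_step(state, n_classes):
    """One synchronous update: each cell adopts the most common class in its
    3 x 3 neighborhood (itself included); ties go to the lowest class index."""
    padded = np.pad(state, 1, mode="edge")
    rows, cols = state.shape
    votes = np.zeros((n_classes, rows, cols), dtype=int)
    for dr in range(3):
        for dc in range(3):
            window = padded[dr:dr + rows, dc:dc + cols]
            for k in range(n_classes):
                votes[k] += (window == k)
    return votes.argmax(axis=0)


# Hypothetical grid: a single isolated cell of class 1 surrounded by class 0.
grid = np.array([
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
next_grid = ca_step(grid, n_classes=2)
```

Under this rule the isolated cell is absorbed by its neighbors, which illustrates how a neighborhood-based transition rule favors spatially contiguous patterns.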

Cellular automata and Markov chain model (CA_Markov) integration

The combined cellular automata (CA) and Markov chain model is dynamic in time and state. It is robust in predicting the transitions, or spatial and temporal dynamics, among a number of LULC types. The CA_Markov model has been used extensively to predict future LULC because it integrates the advantages of cellular automata and the Markov chain, adding an element of spatial contiguity as well as knowledge of the likely spatial distribution of transitions (Arsanjani et al. 2011; Eastman 2003; Li et al. 2015; Mas et al. 2007). The CA and Markov chain components therefore depend on each other (Omar et al. 2014) to predict future LULC effectively. The model is capable of generating a better spatiotemporal pattern of LULC change (Sayemuzzaman and Jha 2014). This study, therefore, applied the CA and Markov chain models together to predict the future likelihood of LULC dynamics in both the spatial and temporal domains. The CA_Markov model was used to simulate the long-term dynamics of LULC (2015–2033) based on past land cover patterns and on the driving forces of both temporal change and spatial distribution, using the IDRISI-TerrSet Geospatial Monitoring and Modeling System software. The prediction of future LULC was supported by transition probabilities controlled by local rules. To run, the CA_Markov model requires three types of data: the base land cover image (e.g., LULC of 2015), the Markov transition areas file generated by the Markov chain model, and the collection of transition suitability images, which was prepared using the Multi-criteria evaluation (MCE) module of TerrSet. A standard 5 \(\times\) 5 contiguity filter was used to define the neighborhood of each cell, to generate better results than other contiguity filters (e.g., 3 \(\times\) 3), and to create spatially explicit contiguity weighting factors. Pixels that are far from an existing LULC class have lower suitability than pixels that are near it (Subedi et al. 2013).
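The effect of a 5 \(\times\) 5 contiguity filter on suitability can be sketched as below. The diamond-shaped kernel and the multiplicative weighting are assumptions made for illustration; the exact kernel and weighting applied inside TerrSet may differ.

```python
import numpy as np

# 5 x 5 diamond-shaped kernel, assumed here to stand in for the standard filter.
KERNEL = np.array([
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
], dtype=float)


def neighborhood_weight(class_mask, kernel=KERNEL):
    """Fraction of kernel cells around each pixel that already hold the class."""
    half = kernel.shape[0] // 2
    padded = np.pad(class_mask.astype(float), half, mode="constant")
    rows, cols = class_mask.shape
    total = np.zeros((rows, cols))
    for dr in range(kernel.shape[0]):
        for dc in range(kernel.shape[1]):
            if kernel[dr, dc]:
                total += kernel[dr, dc] * padded[dr:dr + rows, dc:dc + cols]
    return total / kernel.sum()


def adjust_suitability(suitability, class_mask):
    """Scale suitability by the neighborhood weight so distant pixels score lower."""
    return suitability * neighborhood_weight(class_mask)


# Hypothetical 7 x 7 scene with one existing pixel of the class at the center.
mask = np.zeros((7, 7), dtype=bool)
mask[3, 3] = True
suit = np.full((7, 7), 200.0)
adjusted = adjust_suitability(suit, mask)
```

Pixels within the kernel's reach of the existing class keep part of their suitability, while pixels beyond it drop to zero in this simplified weighting.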

CA_Markov parametric selection and analysis

The CA_Markov model assumes that each parameter or factor of LULC dynamics will continue to operate as before. Eastman (2012) reported that one of the basic factors that can trigger LULC dynamics is proximity (e.g., proximity to roads and rivers). Physical closeness to an existing LULC class is likely to be a driver of change to this class in the future (Halmy et al. 2015). Proximity to major roads is thus one of the best indicators of LULC dynamics because people residing along roads can expand their settlements and/or clear forests, shrubs, or bushes, either to enlarge their croplands or for fuel wood collection and charcoal production. This type of practice is very common in all districts of the study area. Halmy et al. (2015) reported that LULC change drivers often include population growth, distance to roads, and other factors. Therefore, this study considered the major physical and socio-economic parameters, namely population density, elevation, rainfall, slope, LULC type, and proximity to roads and rivers (Table 1), to prepare the transition suitability maps. The CA_Markov factors can be selected based on existing literature, analysts, or the knowledge of a group of experts (López-Marrero et al. 2011; Hadi et al. 2014), because there is no consistent standard for defining the suitability level of each LULC factor. In this study, expert opinion, the knowledge of the researchers, and the literature were used to determine the suitability levels.

Table 1 Factors, membership function types/shapes, and control points used for LULC suitability map development (Improved from Akin et al. 2014; Alimi et al. 2016; Araya and Cabral 2010; Keshtkar and Voigt 2016; Khoi and Murayama 2010; Luo et al. 2015; Owusu et al. 2017)

For instance, the slope gradient was computed from the ASTER DEM at 30 m \(\times\) 30 m spatial resolution and then reclassified in ArcGIS 10.4.1. Similarly, the population density was derived from the total population of the study area. Population density was considered as a factor because highly and densely populated areas can increase the food deficit. Moreover, the Euclidean distance function was applied to estimate the proximity to roads and rivers based on the closest cell, and each layer was then reclassified in ArcGIS 10.4.1. The two extreme values (low and high) were taken from the Euclidean distance layers for use as input to the fuzzy set membership analysis. Omar et al. (2014) and Rocha et al. (2007) reported that the distance values should then be standardized to a continuous suitability scale (0–255) through a fuzzy approach using both linear and sigmoidal functions. Accordingly, this study standardized the values of each factor and constraint using the fuzzy standardization in the IDRISI-TerrSet Geospatial Monitoring and Modeling System software developed by Clark Labs at Clark University for the analysis of geospatial information.
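The standardization of a distance layer to the 0–255 byte scale can be sketched as follows. The control points (500 m and 5000 m) and the distances are hypothetical, and the cosine-squared curve is only one common form of sigmoidal membership; the functions below are illustrative and are not the TerrSet FUZZY module.

```python
import numpy as np


def fuzzy_linear_decreasing(values, low, high):
    """Map values <= low to 255 and values >= high to 0, linearly in between."""
    v = np.asarray(values, dtype=float)
    membership = np.clip((high - v) / (high - low), 0.0, 1.0)
    return np.round(membership * 255).astype(np.uint8)


def fuzzy_sigmoidal_decreasing(values, low, high):
    """Same end points, but with a smooth cosine-squared fall-off."""
    v = np.asarray(values, dtype=float)
    x = np.clip((v - low) / (high - low), 0.0, 1.0)
    membership = np.cos(x * np.pi / 2.0) ** 2
    return np.round(membership * 255).astype(np.uint8)


# Hypothetical Euclidean distances to the nearest road, in meters.
distance_to_road_m = np.array([0.0, 500.0, 2500.0, 5000.0, 8000.0])
linear = fuzzy_linear_decreasing(distance_to_road_m, low=500.0, high=5000.0)
sigmoid = fuzzy_sigmoidal_decreasing(distance_to_road_m, low=500.0, high=5000.0)
```

Both functions are monotonically decreasing here because suitability is assumed to fall with distance from a road; an increasing factor would simply reverse the mapping.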

CA_Markov model calibration, fuzzy standardization and model execution

Model calibration is the process whereby scientists select parameters and improve the goodness of fit of the model (Pontius and Malanson 2005). This study calibrated and standardized both the factors and the constraints to develop suitability maps for the CA_Markov model (Fig. 2). The term factors refers to the criteria, or continuous images, that represent the physical and socio-economic parameters (e.g., elevation, slope, rainfall, and population density). A constraint, in contrast, is expressed as a Boolean (logical) map, i.e., 0 and 1, to limit or exclude LULC types (e.g., lake) that are believed to change only rarely into other LULC types such as built-up area. In this analysis, the excluded areas were coded 0 and those open for consideration were coded 1. Each factor and constraint was calibrated and standardized in the IDRISI-TerrSet Geospatial Monitoring and Modeling System. The fuzzy standardization evaluates the fuzzy set membership values (possibilities) of data cells based on any of three membership functions: sigmoidal, J-shaped, and linear (Eastman 2003). The fuzzy membership functions are used to standardize the criterion scores, i.e., to rescale the factors to 0–255 in byte format or 0.0–1.0 in real format, where 0 represents unsuitable (or least suitable) and 255 the most suitable (Mishra et al. 2014; Omar et al. 2014; Keshtkar and Voigt 2016). Moreover, the MCE module computes a Boolean analysis, Weighted Linear Combination (WLC), or Ordered Weighted Averaging (OWA) of factors (Eastman 2003). In this study, the Boolean analysis was employed for the constraints (e.g., lake) only, while the WLC was used for the other factors (e.g., elevation, slope, and proximity to road). The standardized factors of WLC express a perspective of suitability: the higher the score, the more suitable the location is for the specific LULC. There is, however, no real threshold that allows the definitive allocation of areas to be chosen and excluded (Jiang and Eastman 2000). The WLC was applied by assigning a weight to each factor, followed by a summation of the results to yield a suitability map as follows (Eqs. 5 and 6):

Fig. 2 Schematic diagram for simulating the future LULC using the CA_Markov model

$$S=\sum {w_i}{x_i}$$
(5)

where \(S=\)suitability, \({w_i}=\)weight of factor i, \({x_i}=\) criterion score of factor i

$$S=\sum\limits_{i=1}^{n} {w_i}{c_i}\prod\limits_{i=1}^{n} {r_i}$$
(6)

where \(S\) = suitability map of each factor, \({w_i}\) = weight of each factor, \({c_i}\) = criteria, and \({r_i}\) = restrictions or constraints.
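Eqs. 5 and 6 can be illustrated with a small numerical sketch. It treats \({r_i}\) as Boolean constraint layers that multiply the weighted sum, which is an assumption about the intended reading of Eq. 6; the layer names, weights, and scores are hypothetical.

```python
import numpy as np


def weighted_linear_combination(factors, weights, constraints=()):
    """Return sum_i(w_i * x_i) multiplied by every Boolean constraint layer."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("factor weights are expected to sum to 1")
    suitability = np.zeros_like(np.asarray(factors[0], dtype=float))
    for w, x in zip(weights, factors):
        suitability += w * np.asarray(x, dtype=float)
    for r in constraints:
        suitability *= np.asarray(r, dtype=float)
    return suitability


# Hypothetical standardized factor scores (0-255) and one Boolean constraint.
slope = np.array([[255, 100], [50, 0]])
road = np.array([[200, 200], [100, 255]])
not_lake = np.array([[1, 1], [1, 0]])   # 0 marks the excluded (lake) cell
s = weighted_linear_combination([slope, road], [0.6, 0.4], [not_lake])
```

The excluded cell receives a suitability of 0 regardless of its factor scores, which is the practical effect of a Boolean constraint.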

The weight of each LULC driving factor was determined through pairwise comparison in the Analytic Hierarchy Process (AHP). The method has a unique advantage when the quantification and comparison of the important variables is complex (Keshtkar and Voigt 2016). Hadi et al. (2014) applied an equal weighting method because each suitability map was weighted during the fuzzy standardization. This study, however, departs from that approach because the LULC factors do not contribute equally; the weight of each factor was instead determined based on its percentage of influence. The literature was reviewed to determine the weights as well as the degrees of suitability. The AHP, a mathematical method for analyzing complex decision problems with multiple criteria, was applied to estimate the relative weight of each factor (Table 2). The factor with the highest weight is the most influential, while the factor with the lowest weight is the least important in LULC change. The overall consistency ratio of the factors is 0.09, which is within the acceptable range (below the conventional threshold of 0.10). The overall process of factor definition and model implementation is shown in Fig. 2.
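The derivation of AHP weights and the consistency ratio can be sketched as follows. The 3 × 3 pairwise comparison matrix is hypothetical and does not reproduce Table 2; the random index values are those commonly attributed to Saaty.

```python
import numpy as np

# Random consistency index commonly attributed to Saaty, for orders 3 to 9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(pairwise):
    """Return (weights, consistency ratio) from a reciprocal comparison matrix."""
    a = np.asarray(pairwise, dtype=float)
    n = a.shape[0]
    if n not in RANDOM_INDEX:
        raise ValueError("this sketch supports matrices of order 3 to 9")
    eigenvalues, eigenvectors = np.linalg.eig(a)
    k = int(np.argmax(eigenvalues.real))          # principal eigenvalue
    principal = np.abs(eigenvectors[:, k].real)
    weights = principal / principal.sum()         # normalized eigenvector
    lambda_max = eigenvalues[k].real
    consistency_index = (lambda_max - n) / (n - 1)
    return weights, consistency_index / RANDOM_INDEX[n]


# Hypothetical judgments: factor A vs B = 3, A vs C = 5, B vs C = 3.
comparisons = np.array([
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
])
w, cr = ahp_weights(comparisons)
```

The weights sum to 1, and a consistency ratio below 0.10 indicates that the pairwise judgments are acceptably consistent.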

Table 2 The eigenvector of weights of each factor considered in this study

CA_Markov model validation

Scientists need a better and larger set of tools to validate land use and land cover dynamics models, because it is essential to know a model’s prediction accuracy (Pontius and Schneider 2001). Some scientists prefer models that express the theory of the mechanisms of land change processes, while others place more weight on a model’s ability to extrapolate the observed pattern of change based on past empirical patterns (Pontius and Malanson 2005). The usefulness of LULC models is thus measured by the accuracy of the model output (Paegelow et al. 2014). The validation process provides several statistics for measuring the similarity between two qualitative images, including specialized kappa measures that discriminate between errors of quantity and errors of location (Eastman 2003). In this study, the kappa statistical validation tool was applied to evaluate the agreement of the projected 2015 LULC with the actual (observed) 2015 LULC.
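As a simplified illustration of kappa-based validation, the sketch below computes the standard Cohen's kappa between a simulated and an observed map. The specialized kappa variants that separate quantity and location errors are not reproduced here, and the 3 × 3 maps are hypothetical.

```python
import numpy as np


def cohen_kappa(simulated, observed, n_classes):
    """Agreement between two categorical maps, corrected for chance."""
    sim = np.asarray(simulated).ravel()
    obs = np.asarray(observed).ravel()
    confusion = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(confusion, (sim, obs), 1)
    total = confusion.sum()
    p_observed = np.trace(confusion) / total
    p_chance = np.sum(confusion.sum(axis=0) * confusion.sum(axis=1)) / total ** 2
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical simulated and observed maps with class codes 0, 1, 2.
simulated_map = np.array([[0, 0, 1], [1, 1, 2], [2, 2, 2]])
observed_map = np.array([[0, 0, 1], [1, 2, 2], [2, 2, 0]])
kappa = cohen_kappa(simulated_map, observed_map, 3)
```

A value of 1 indicates perfect agreement, and a value near 0 indicates agreement no better than chance.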

Analysis of the predicted LULC relative change

Gidey et al. (2017) reported that LULC change detection and analysis integrates a wide range of methods to estimate the differences between two classified images. In this research, the predicted LULC changes were analyzed using relative change, i.e., the area of a single LULC type before and after the change, as follows (Eqs. 7 and 8):

$$C=(\Delta f - \Delta i)$$
(7)
$$C\left( \% \right)=(\Delta f - \Delta i)/\Delta i \times 100$$
(8)

where \(C\) = total relative change, \(C(\%)\) = LULC relative change in percentage, and \(\Delta f\) and \(\Delta i\) = total area coverage of the final and initial LULC, respectively.

Furthermore, the annual rate of LULC relative change was computed as follows (Eq. 9):

$$C=(\Delta f - \Delta i)/\Delta i \times \frac{1}{T} \times 100$$
(9)

where \(C\) = annual relative change of each LULC type, \(\Delta f\) and \(\Delta i\) = final and initial area coverage of each LULC type, respectively, and \(T\) = time period (interval) between the initial and final dates.
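Equations 7 to 9 can be checked with a short worked example. The sketch below is a direct transcription of the three formulas; the function name `relative_change` and the input areas (200 and 250 sq km over 18 years) are hypothetical and are not taken from Tables 5 or 6.

```python
def relative_change(initial_area, final_area, years):
    """Return (change, percent change, annual percent change) per Eqs. 7-9."""
    change = final_area - initial_area           # Eq. 7: absolute change
    change_pct = change / initial_area * 100.0   # Eq. 8: change relative to initial area
    annual_pct = change_pct / years              # Eq. 9: Eq. 8 divided by the interval T
    return change, change_pct, annual_pct


# Hypothetical class: 200 sq km initially, 250 sq km after 18 years.
change, change_pct, annual_pct = relative_change(200.0, 250.0, 18)
print(change, change_pct, round(annual_pct, 2))
```

Because Eq. 9 divides the percentage change evenly by \(T\), it describes a simple average annual rate and not a compound rate.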

Analysis of the gains and losses of predicted LULC dynamics

In this study, the gain and losses of the predicted LULC category were analyzed quantitatively to measure the dynamics as follows (Eq. 10):

$$\begin{aligned} p_{loss(i),j} &= (p_{j,i} - p_{i,j})/(p_{i} - p_{i}) \times 100, \quad i \ne j \\ p_{gain(i),j} &= (p_{i,j} - p_{j,i})/(p_{i} - p_{i}) \times 100, \quad i \ne j \end{aligned}$$
(10)

where \(p_{loss(i),j}\) is the percentage taken by LULC category \(j\) in the total “conversion loss” of row category \(i\); \(p_{gain(i),j}\) is the percentage taken by category \(j\) in the total “conversion gain” of row category \(i\); and \(p_{i,j}\) and \(p_{j,i}\) are the corresponding entries of the transition matrix for the conversions from \(i\) to \(j\) and from \(j\) to \(i\).
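As a complementary illustration, gross gains and losses per category can be tabulated directly from a cross-tabulation matrix. The sketch below is a generic accounting of a transition matrix and is not a literal implementation of Eq. 10; the function name `gains_and_losses` and the 3 × 3 area matrix are hypothetical.

```python
import numpy as np


def gains_and_losses(crosstab):
    """Rows = earlier category, columns = later category, entries = area."""
    m = np.asarray(crosstab, dtype=float)
    persistence = np.diag(m)                 # area that stays in the same category
    loss = m.sum(axis=1) - persistence       # area leaving each category
    gain = m.sum(axis=0) - persistence       # area entering each category
    net = gain - loss                        # net change per category
    return gain, loss, net


# Hypothetical transition areas in sq km (illustration only).
crosstab = [[80, 15, 5],
            [10, 60, 10],
            [0, 5, 45]]
gain, loss, net = gains_and_losses(crosstab)
print(gain, loss, net)
```

The net changes sum to zero because every unit of area lost by one category is gained by another.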

Results and discussions

Analysis of the Markov chain transition probability and area file matrix (cell)

The transition probability matrix, which is the most important component of a Markov chain model, is shown in Table 3 for the periods 1984–1995, 1995–2015, and 1984–2015. The rows in Tables 3 and 4 show the earlier LULC types, and the columns represent the later or projected LULC categories. In each transition matrix, the diagonal values represent the probability that a land cover class persists from time 0 to time 1 (Halmy et al. 2015). The matrix expresses the probability that a pixel of any given class will change to any other class, or stay the same, in the next period (Eastman 2003). Surabuddin et al. (2013) reported that a given parcel of land might theoretically change from one category of land use to any other at any time. This is because LULC dynamics are not unidirectional in nature; a given land cover type might theoretically change from one category of LULC to any other (Han et al. 2015). Luo et al. (2015) used the 1990–2007 LULC to generate a transition probability matrix, which was then used to predict the landscape patterns of 2020 in the inland river delta of Central Asia. Similarly, Halmy et al. (2015) used historical land use data, in which past land transformations and transitions were assessed, to predict the future LULC in the northwestern coastal desert of Egypt. Furthermore, Keshtkar and Voigt (2016) computed transition potentials from the historical land-cover conditions of 1990–2000 and 2000–2010 to show how each land cover was projected to change in central Germany.
This study likewise used the historical LULC of 1984–2015 to generate the transition probability matrix and transition area files, which are the most important parameters in analyzing the future LULC. The transition area files indicate how many cells or pixels are expected to change from one category to another (e.g., shrub/bush lands to croplands).
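How a transition probability matrix and a transition area file relate to two classified maps can be sketched as below. This is a minimal illustration assuming integer class labels and a single projection step of the same length as the calibration interval; it does not reproduce how TerrSet handles other projection horizons. The function names and the ten-cell maps are hypothetical.

```python
import numpy as np


def markov_transition(earlier, later, n_classes):
    """Return (cell counts, row-normalized probabilities) between two maps."""
    t0 = np.asarray(earlier).ravel()
    t1 = np.asarray(later).ravel()
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (t0, t1), 1)            # rows: earlier class, columns: later class
    row_totals = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_totals,
                      out=np.zeros_like(counts), where=row_totals > 0)
    return counts, probs


def project_cells(current_counts, probs):
    """Expected cells per class after one more step of the same length."""
    return np.asarray(current_counts, dtype=float) @ probs


# Hypothetical maps with three classes (illustration only).
earlier = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
later = [0, 0, 1, 1, 1, 1, 1, 2, 2, 0]
counts, probs = markov_transition(earlier, later, 3)
later_counts = np.bincount(later, minlength=3)
projected = project_cells(later_counts, probs)
print(probs.round(3))
print(projected.round(3))
```

Each row of `probs` sums to 1, and the diagonal holds the persistence probabilities described above; multiplying the current cell counts by `probs` gives the expected cells per class, which is the quantity summarized in a transition area file.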

Table 3 Probability of LULC dynamics based on Markov transition matrix from 1984 to 1995, 1995 to 2015, 1984 to 2015 in the study area

Table 4 presents the transition area matrix, i.e., the total number of pixels (cells) expected to change in the next time period. Pairs of LULC maps (1984–1995, 1995–2015, and 1984–2015) were used to generate the transition area matrices, which depict how each LULC type was projected to change. In this study, the transition area file of 1995–2015 was used to predict the 2015 LULC, which was then compared with the actual LULC of 2015 to validate the reliability of CA_Markov. The transition area file of 1984–2015 was applied to forecast the LULC of 2033.

Table 4 Markov transition area file matrix of various LULC from 1984 to 1995, 1995–2015, 1984–2015 in the study area

Fuzzy standardization and suitability maps derivation

CA_Markov is an expert-driven process that spatially allocates the expected LULC by using categorical suitability maps (Paegelow et al. 2014). Figure 3 depicts the physical and socio-economic suitability maps of each factor. These suitability maps were used as input to produce the transition suitability image and to carry out an integrated analysis in the CA_Markov model. Yang et al. (2014) reported that the transition potential image is useful to control the spatial distribution of LULC. Each factor was therefore converted separately into a relative scale from 0 to 255 using the fuzzy standardization tool (Poska et al. 2008). A pixel value of 0 indicates that the pixel is unsuitable (least suitable), while a value of 255 indicates that it is most suitable for a particular LULC. Subedi et al. (2013) reported that socio-economic factors are prime drivers of LULC change; however, only physical factors were considered in their study. This study therefore argues that considering physical factors alone is not adequate to predict future LULC dynamics using the CA_Markov model, because socio-economic factors also drive change. For example, population density could be a prime trigger of LULC dynamics because highly populated areas may cause water bodies, forestlands, and shrub/bush lands to shrink. Omitting the socio-economic drivers may therefore weaken LULC predictions. A combination of physical and socio-economic data and other relevant parameters can thus improve the accuracy of future LULC simulation (Arsanjani et al. 2011; Yang et al. 2014). Behera et al. (2012) observed that both physical and socio-economic drivers, including residential/industrial development and road–rail and settlement proximity, influenced the spatial pattern of the watershed LULC, leading to linear growth of settlements and agricultural areas in the Choudwar watershed, India.
Similarly, this study incorporates both physical and socio-economic factors to predict future LULC dynamics and to inform a proper and sustainable land use planning system. The main reason is that land-use planning is one of the most important policy instruments for the conservation of natural resources and the proper management of land parcels for various uses (López–Marrero et al. 2011; Woodcock et al. 1983). Furthermore, studies have shown that areas without access roads are less likely to be disturbed by human intervention.
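The standardization and combination steps can be sketched as follows. A linear membership function is assumed here because the shape of the function used in the study is not stated, and the weighted linear combination stands in for the MCE step; the function names, control points, weights, and pixel values are all hypothetical.

```python
import numpy as np


def fuzzy_linear(values, low, high, increasing=True):
    """Rescale a factor to 0-255 with a linear membership between low and high."""
    v = np.asarray(values, dtype=float)
    membership = np.clip((v - low) / (high - low), 0.0, 1.0)
    if not increasing:
        membership = 1.0 - membership        # suitability falls as the factor rises
    return np.rint(membership * 255).astype(np.uint8)


def weighted_linear_combination(factors, weights, constraint=None):
    """Combine standardized factors with weights; constraint is a 0/1 mask."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # make sure the weights sum to 1
    stack = np.stack([np.asarray(f, dtype=float) for f in factors])
    suitability = np.tensordot(w, stack, axes=1)
    if constraint is not None:
        suitability = suitability * np.asarray(constraint)
    return np.rint(suitability).astype(np.uint8)


# Hypothetical four-pixel example: distance to road (m) and slope (degrees).
road = fuzzy_linear([0, 250, 1000, 2000], low=0, high=1000, increasing=False)
slope = fuzzy_linear([0, 10, 30, 40], low=0, high=40, increasing=False)
lake_mask = [1, 1, 1, 0]                     # 0 marks a constrained (excluded) pixel
suitability = weighted_linear_combination([road, slope], [0.6, 0.4], lake_mask)
print(road, slope, suitability)
```

The constraint mask plays the role of the lake layer in Fig. 3: constrained pixels receive a suitability of 0 regardless of the factor scores.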

Fig. 3

Standardized physical and socio-economic factors of LULC in the study area. a Elevation, b population density, c proximity to stream, d proximity to fl, e slope, f proximity to shl, g proximity to fp, h proximity to bl, i proximity to gl, j proximity to bu, k long term mean annual rainfall, l proximity to cl, m proximity to road, n transition suitability image, o proximity to lake (constraints)

Prediction and change analysis of future LULC

Figure 4a, b shows the historical (2015) and predicted (2033) LULC in the study area. The predicted 2033 LULC comprises eight major types identified at different spatial extents: cropland (Cl) 6153.38 sq km (42.3%), forestland (Fl) 352.88 sq km (2.4%), shrub/bush lands (Shl/bu) 4257.38 sq km (29.3%), built-up area (Bu) 879.11 sq km (6%), water body (Wb) 41.03 sq km (0.3%), grasslands (Gl) 238.13 sq km (1.6%), barren lands (Bl) 2114.8 sq km (14.6%), and floodplain area (Fp) 495.5 sq km (3.4%) (Tables 5, 6).

Fig. 4

Spatial distribution of the historical LULC 2015 (a) and predicted LULC 2033 (b) in the study area

Table 5 Expected change of future land use and land cover (2033) type of the study area
Table 6 Annual rate of relative future LULC change in Sq km

Eva et al. (2006) stated that the main LULC dynamics have been the conversion of natural vegetation into agricultural lands. However, conversion is not limited to natural vegetation changing into agricultural lands. For instance, Fig. 4 shows that a single LULC type in the study area is projected to convert into several other LULC types by 2033. This LULC conversion may increase the intensity of the Urban Heat Island (UHI) effect in the area. The study area is anticipated to experience a significant LULC change by 2033.

Rendana et al. (2015) employed the cellular automata and Markov chain model to predict future LULC and observed significant LULC change in 1997–2014 (18.95%) and 2014–2020 (3.66%). In addition, the authors reported that open water 80.37 ha (0.54%), mixed agriculture 501.02 ha (12.24%), open land 499.95 ha (5.47%), and built-up areas 119.88 ha (0.85%) are expected to increase in 2020. Sayemuzzaman and Jha (2014) observed that agricultural land in North Carolina was expected to decrease by nearly 7% in 2030 compared with 2001, with no significant changes in water body and other land category coverage. Furthermore, Yulianto et al. (2016) reported annual decreases in forest (10.52 ha), dry land (13.22 ha), paddy fields (14.49 ha), and shrubbery (1.15 ha), while annual increases in bare soil (6.79 ha), plantation (11.14 ha), settlement (11.49 ha), and water body (9.7 ha) were predicted in the Tondano watershed, North Sulawesi, Indonesia. Likewise, Gashaw et al. (2017) observed that cultivated land in the Andassa watershed is anticipated to rise from 76.8% in 2015 to 83.3% in 2030. The authors also stated that the built-up area is projected to expand rapidly from 1.1% in 2015 to 2.0% in 2030. These expansions come largely at the expense of land cover types (e.g., forest land, shrub land, and grasslands) that help to diminish the adverse effects of climate extremes such as drought and floods. However, this study observes different trends in forest, shrub/bush lands, and grasslands in Raya, Ethiopia. For instance, forestland, shrub/bush lands, and grasslands are projected to expand annually by 6.0 sq km (2.47%), 39.4 sq km (1.11%), and 1.7 sq km (0.83%), respectively (Table 6). Besides, the built-up area is anticipated to grow annually at a rate of 15.9 sq km (2.68%).
In contrast, croplands, which were the leading LULC type, are expected to shrink annually by 4.4 sq km (0.07%), water body by 0.3 sq km (0.62%), barren land by 44.4 sq km (1.52%), and floodplain areas by 14.0 sq km (1.87%). A possible reason for the shrinking cropland is the attention given by the local and federal governments to area closure to improve natural resources (e.g., forest and shrub/bush lands). Hadi et al. (2014) stated that vegetation cover is projected to decrease by 45.11 sq km (30.34%) by 2030 in Tikrit, Iraq. The authors also reported that this reduction might contribute to eco-environmental degradation in the area. Changes in LULC can therefore cause significant environmental effects, such as a decrease in rainfall, an increase in surface temperature, and land degradation, which can contribute to the occurrence of drought and famine (Ildoromi and Safari Shad 2017).

Yang et al. (2015) noted that the current spatial pattern of land use is similar to the historical pattern, and that the land cover observed during 1954–2005 was also observed in the 1930s. This study likewise finds that the dynamics observed in the study area were driven mainly by historical LULC change trends. Lopez et al. (2001) applied Markov transition matrices to predict LULC change in Morelia city, Mexico, and reported that the city grew rapidly from 709 ha in 1960 to 3368 ha in 1990, followed by plantations and cropland, while grasslands and shrub land were the least stable categories. Furthermore, the historical LULC is subject to changes into various cover patterns at different magnitudes (Fig. 5). For instance, most of the forestland in the study area is projected to remain as forest cover (203.47 sq km), while the rest is expected to shift moderately into cropland (16.92 sq km), grassland (15.06 sq km), shrub/bush land (7.62 sq km), built-up area (1.02 sq km), water body (0.08 sq km), and barren land (0.01 sq km). Similarly, other LULC types are also anticipated to shift into various cover types.

Fig. 5

Patterns of the LULC 2015–2033

Moreover, this study evaluated the statistical relationship between the baseline LULC of 2015 and the predicted LULC of 2033. The Pearson correlation indicated that the two were positively and strongly correlated (r = 0.981), and the relationship was statistically significant (p < 0.001).
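A correlation of this kind can be sketched from class-level areas. The example below assumes the correlation is taken over the eight class totals, which the text does not state explicitly. The 2033 areas and net changes are the figures reported in this article; the 2015 areas are back-calculated from them and are therefore approximations, not values read from Table 5.

```python
import numpy as np

classes = ["cropland", "forestland", "shrub/bush land", "built-up area",
           "water body", "grassland", "barren land", "floodplain"]

# Predicted 2033 areas in sq km, as reported in the text.
area_2033 = np.array([6153.38, 352.88, 4257.38, 879.11,
                      41.03, 238.13, 2114.8, 495.5])

# Reported net gains (+) and losses (-) between 2015 and 2033, in sq km.
net_change = np.array([-78.9, 108.0, 710.0, 286.2,
                       -5.2, 31.0, -800.0, -251.68])

# Assumption: 2015 area = 2033 area minus the reported net change.
area_2015 = area_2033 - net_change

# Pearson correlation coefficient between the two sets of class areas.
r = np.corrcoef(area_2015, area_2033)[0, 1]
print(f"Pearson r = {r:.3f}")
```

With only eight classes, the coefficient is dominated by the largest categories (cropland and shrub/bush land), so it mainly reflects the stability of the overall ranking of class areas.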

Losses and gains analysis in the future LULC

Figure 6 shows the gains and losses of the predicted LULC. The results indicate that forest lands (Fl) 108 sq km (44.5%), shrub/bush lands (Shl/bu) 710 sq km (20%), built-up area (Bu) 286.2 sq km (48.3%), and grasslands (Gl) 31 sq km (15%) are predicted to increase (gain) relative to the 2015 LULC coverage. However, significant reductions (losses) are projected in water body (Wb) 5.2 sq km (11.2%), croplands (Cl) 78.9 sq km (1.3%), barren lands (Bl) 800 sq km (27.4%), and floodplain area (Fp) 251.68 sq km (33.7%). Furthermore, smallholder farmers in the study area report that crop production and productivity are diminishing at alarming rates due to the shrinking of croplands and climate change. Conversely, the increase in forest land and the reduction in barren land and floodplain may benefit the study area by moderating climate conditions, improving livelihoods, protecting watersheds, and mitigating climate change and land degradation impacts. However, the decrease in water bodies may contribute to the frequency and severity of drought, which has significant impacts on both livestock and humans. Besides, the increase in built-up area indicates rapid population growth. The increase in human populations, combined with continuing development, has caused unprecedented LULC change and resulted in serious impacts on ecological systems and landscape patterns (Yang et al. 2014). Therefore, an environmentally friendly land use policy is of paramount importance to balance the demand for land and reduce the impacts that arise from it.

Fig. 6

Analysis of the gain and losses of LULC in the study area from 2015 to 2033

Validation of the predicted LULC

Validation is the process of measuring the agreement between the predicted data and the actual or reference data (Verburg et al. 2006; Pontius and Malanson 2005). The reference data are treated as more accurate than the predicted LULC because no model predicts LULC dynamics perfectly. The Kappa statistic is widely applied to validate the predicted LULC against the actual LULC. Keshtkar and Voigt (2016) reported that the predictive power of the CA_Markov model was successfully evaluated using the Kappa statistic. Pontius and Malanson (2005) applied the kappa statistic to evaluate the accuracy of the predicted LULC of 2011 and its agreement with the actual LULC map of 2011. This helps to determine how well a pair of maps agree in terms of the quantity of cells and their location in each category. Furthermore, to ensure the reliability of the projected LULC of 2033, the predicted and actual LULC of 2015 were compared using the validation tool in TerrSet. The resulting statistics were Kappa for no information (Kno) = 0.7072, Kappa for location (Klocation) = 0.8135, and standard Kappa (Kstandard) = 0.6593. This indicates moderate to high agreement between the actual and predicted LULC, which is an acceptable level. It suggests that the CA_Markov model is capable of predicting future LULC patterns (Halmy et al. 2015; Yang et al. 2014).

Yang et al. (2014) reported Kappa Index of Agreement (KIA) values of 0.736 for forest, 0.577 for cultivated land, 0.534 for construction land, and 0.373 for water body, which are all acceptable. Similarly, Akin et al. (2014) found a validation accuracy of about 69%, which was lower than expected due to the radiometric quality of the Landsat 1972 data, the lack of in-situ data for the Landsat 1972 and 1986 data, and the heterogeneous landscape and complex urban structure of Istanbul, among other factors. This study, however, obtained relatively better results at \(30 \times 30\) m resolution. Several factors contributed to this result: (1) all image pre-processing techniques (radiometric, geometric, and atmospheric correction) were applied before any analysis; (2) adequate in-situ measurements were gathered in the field to appropriately classify, interpret, and analyze the LULC types; and (3) post-classification techniques were employed at field level.

Conclusions

This study aimed to predict and monitor future LULC scenarios (2015–2033) using the cellular automata and Markov chain model (CA_Markov), taking into consideration the physical and socio-economic drivers of LULC dynamics in Raya and its surroundings, Northern Ethiopia. The historical LULC change data of 1984–1995, 1995–2015, and 1984–2015 were used as a baseline to predict the 2033 LULC. Both the transition rules and the transition area matrices for these periods were produced quantitatively using the Markov chain model in the TerrSet Geospatial Monitoring and Modeling System. The physical and socio-economic factors of LULC change were standardized using fuzzy standardization, and Multi Criteria Evaluation (MCE) was then used to produce the transition suitability image based on the suitability (influence) of each factor. The CA_Markov model was then applied with a standard 5 \(\times\) 5 contiguity filter to predict the 2033 LULC condition. As a result, eight major LULC types were identified: cropland (Cl) 6153.38 sq km (42.3%), forestland (Fl) 352.88 sq km (2.4%), shrub/bush lands (Shl/bu) 4257.38 sq km (29.3%), built-up area (Bu) 879.11 sq km (6%), water body (Wb) 41.03 sq km (0.3%), grasslands (Gl) 238.13 sq km (1.6%), barren lands (Bl) 2114.8 sq km (14.6%), and floodplain area (Fp) 495.5 sq km (3.4%). In addition, the Pearson correlation between the historical and predicted LULC indicated a positive, strong, and statistically significant relationship (r = 0.981, p < 0.001). The forest lands, shrub/bush lands, built-up area, and grasslands are predicted to increase (gain) from the 2015 coverage by 108 sq km (44.5%), 710 sq km (20%), 286.2 sq km (48.3%), and 31 sq km (15%), respectively.
However, significant reductions (losses) in water body (Wb), croplands (Cl), barren lands (Bl), and floodplain (Fp) area are projected, by 5.2 sq km (11.2%), 78.9 sq km (1.3%), 800 sq km (27.4%), and 251.68 sq km (33.7%), respectively. The increase in forest land and the reduction in barren land and floodplain may benefit the study area by moderating climate conditions, improving livelihoods, protecting watersheds, and mitigating climate change and land degradation impacts. However, the decrease in water bodies may contribute to the frequency and severity of drought, which has significant impacts on both livestock and humans. Besides, the increase in built-up area indicates rapid population growth, which may remain a challenge unless an environmentally friendly land use policy is implemented to balance the demand for land and diminish the impacts that arise from it. This study may serve as a useful benchmark to foster better decisions and improve land use policies within the framework of a sustainable land use planning system.