Introduction

Landslides, the downward or outward movements of slope forming material such as earth, debris or rock under gravity, are a major natural disaster common to many parts of the world. Although location dependent attributes such as land cover, geology, slope angle, surface deposits and hydrology create the conditions for failure, landslides are usually triggered by external factors such as rainfall. This research is focused on rainfall induced landslides. Rainwater infiltration can destabilize soil slopes and can cause weathered bedrock to shed fragments along with the overburden soil.

Historically, prediction of rainfall induced landslides has been performed using rainfall intensity and duration thresholds at locations at risk for landslides. However, the fundamental reason behind slope failure lies in the rise of soil moisture levels due to rainfall or groundwater table surges. Measuring in situ soil moisture at pre-identified sites can be prohibitive because of the extensive instrumentation required, and instrument readings can be unreliable. Hence the use of remotely sensed soil moisture in landslide prediction could become a viable alternative in the future. In addition, some location attributes such as land cover can also be derived from remote sensing.

The objective of this research was to evaluate the feasibility of using remote sensing for predicting rainfall induced landslide risk. This was accomplished through statistical modeling of remotely sensed soil moisture and landslide attributes.

Causes of Landslide Occurrence

For a landslide to occur, the site should contain favorable conditions for slope failure. However, for a naturally occurring slope with such conditions, typically a trigger is also necessary to cause failure.

Location Based Attributes

Many location based factors can create favorable conditions for landslides. Some well known factors such as geology, slope, land cover, hydrology and surface deposits can be used as potential predictors of landslide occurrence (Weerasinghe et al. 2011).

Geology

Geologic factors such as underlying lithology, presence of joints and fractures in the bedrock, amount and direction of dip, etc. have an impact on landslide occurrence (Weerasinghe et al. 2011).

Slope

Magnitude and shape (terraced or straight) of slopes are important factors as the movement of surface deposits in a landslide occurs under the effect of gravity.

Land Use and Land Cover

Land use practices can affect the behavior of a slope under the effect of a trigger. Deforestation, de-rooting and reduction in forested area can expose a slope to surface erosion and increased landsliding potential.

Hydrologic Factors

Since the sustenance of higher levels of soil moisture can create favorable conditions for slope failure, the proximity to water bodies, characteristics of the drainage basin such as shape of the basin, total length of stream channels per unit area in the basin, etc. could affect the occurrence of a landslide.

Surface Deposits

The occurrence of a landslide depends on the properties of the surface deposits which would slide, such as the shear strength of the deposit, the height, unit weight of the deposit, etc. Moreover, the soil formation could provide an indication of landslide potential as well, i.e. presence of colluvium in a slope would indicate past failures and hence a higher failure potential in the future (Bhandari and Thayalan 1994).

Triggering Attributes

Common landslide triggers are discussed below.

Rainfall

The most common landslide triggering factor is rainfall. Rainfall intensity and duration thresholds developed through empirical, statistical or process based estimations are commonly used as predictors of landslide occurrence (Berti et al. 2012). Infiltration of rain water into soil causes an increase in soil moisture and pore water pressure. This leads to a decrease in shear strength, which can be explained by the Mohr-Coulomb failure criterion in Eqs. (1) and (2).

$$ \tau_{f} = c + \sigma^{\prime } \tan \phi $$
(1)

where \( \tau_{f} \) is the soil shear strength along the failure plane, c is the cohesion, σ′ is the effective stress of the soil perpendicular to the failure plane and \( \phi \) is the angle of internal friction. The effective stress can be computed using the following equation (Das 2006):

$$ \sigma^{\prime } = \sigma - u_{a} + \chi \left( {u_{a} - u_{w} } \right) $$
(2)

where σ is the total stress of the soil, \( u_{a} \) is the pore air pressure, \( u_{w} \) is the pore water pressure and χ is the fraction of a unit cross-sectional area of soil occupied by water, which would be zero for dry soil and 1.0 for fully saturated soil.

Based on Eq. (2), it can be seen that with the increase of pore water pressure, the effective stress decreases, thus decreasing the shear strength. This is further illustrated in Fig. 1.

Fig. 1 Sequence of events leading to the occurrence of a landslide
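
The effect of rising pore water pressure in Eqs. (1) and (2) can be illustrated numerically. The following is a minimal sketch; the cohesion, friction angle and stress values are hypothetical and chosen only for illustration, not taken from the study sites.

```python
import math

def effective_stress(sigma, u_a, u_w, chi):
    """Eq. (2): sigma' = sigma - u_a + chi * (u_a - u_w)."""
    return sigma - u_a + chi * (u_a - u_w)

def shear_strength(c, sigma_eff, phi_deg):
    """Eq. (1): tau_f = c + sigma' * tan(phi), with phi given in degrees."""
    return c + sigma_eff * math.tan(math.radians(phi_deg))

# Hypothetical soil parameters (kPa and degrees), for illustration only
c, phi_deg = 5.0, 30.0
sigma, u_a, chi = 100.0, 0.0, 1.0

strengths = []
for u_w in (0.0, 20.0, 40.0):  # pore water pressure rising with infiltration
    sigma_eff = effective_stress(sigma, u_a, u_w, chi)
    tau_f = shear_strength(c, sigma_eff, phi_deg)
    strengths.append(tau_f)
    print(f"u_w = {u_w:5.1f} kPa  sigma' = {sigma_eff:6.1f} kPa  tau_f = {tau_f:6.2f} kPa")
```

With all other quantities held fixed, each increase in pore water pressure lowers the effective stress and therefore the available shear strength.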

Since the reason behind slope failure due to rainfall is the increase of soil moisture, moisture evaluations need to be incorporated in rainfall induced landslide prediction systems. However, measurement of the soil moisture content at the site level can be prohibitive due to high cost and complexity in implementation.

On the other hand, a significant research effort has been devoted to developing soil moisture surrogate measurements which are used in place of the actual soil moisture in the field. These parameters are developed by modeling the physical processes leading to soil moisture increase due to rainfall. Since these measurements are process based, the data requirement can be very high. Thus, in order to apply these models in a practical scenario, assumptions have to be made, which can adversely affect the accuracy of the output. Therefore, this research is specifically focused on the viability of using remotely sensed soil moisture evaluations for predicting rainfall induced landslides.

Other Triggering Factors

Accompanying landslides have often added to the devastation caused by major earthquakes. Vibrations caused by earthquakes can also result in the soil overburden losing its shear strength (e.g. liquefaction), which leads to landslides. Explosive volcanic eruptions, as well as the dissolution in groundwater of gases released by magma, which weakens the underlying rock, could cause landsliding. Furthermore, wildfires can damage vegetation, which results in de-rooting of slopes and exposes them to erosion, thus causing landslides. Moreover, undercutting or over-loading of slopes could promote or even initiate landslides.

Current Methods of Landslide Hazard Assessment

The following five main techniques have been used in the assessment of landslide hazard (Mantovani et al. 1996): (1) distribution analysis, (2) qualitative analysis, (3) deterministic methods, (4) frequency analysis and (5) stochastic methods.

Distribution analysis involves direct mapping of historic landslides and thus, it provides information regarding landslide hazard only at the locations of previous failures. In qualitative analysis, the landslide hazard in selected regions is assessed with maps developed for location based attribute severities combined under one subjective rule, which is based on the experts’ opinions. However, fuzzy sets can be applied in order to eliminate the subjective uncertainty arising from experts’ judgment (Weerasinghe et al. 2011).

On the other hand, deterministic methods use a process based approach (e.g. slope stability analysis) in landslide hazard assessment. However, this method can be prohibitive due to its extensive data requirements. Moreover, it would be difficult to apply this analysis at a regional scale. Landslide frequency analysis involves assessment of landslide risk due to a trigger based on its pre-determined threshold value. However, it should be noted that since the location based attributes of landsliding are generally not considered, this estimation would be site specific and hence inapplicable at a regional scale.

Numerous past studies have focused on the stochastic approach to predicting landslides, which involves landslide risk assessment using probabilistic estimations, regression analyses or artificial neural networks, based on location based attributes and triggering factors. Wang et al. (2016) recently conducted a comparative study to assess landslide risk in Japan with logistic regression, decision trees, frequency ratios, weights of evidence and artificial neural networks. In that study, the impact of the causative factors of landslides was investigated to derive relationships for predicting landsliding. The logistic regression method was found to yield the best classification results, while the decision tree, which was developed using the CART (Classification and Regression Tree) algorithm, performed fairly well. Tien Bui et al. (2016) applied bagged trees in GIS-based modeling of rainfall induced landslides.

Use of Remote Sensing in Landslide Prediction

Since measuring soil moisture at site level can be prohibitive due to the cost and complexity of instrumentation, remotely sensed soil moisture can be used as an alternative (Ray and Jacobs 2007). Different remote sensing techniques such as microwave remote sensing and thermal remote sensing are able to detect soil moisture. Of these techniques, microwave remote sensing exhibits the lowest signal attenuation and can therefore penetrate cloud cover without a significant reduction in signal strength. Thus, it is the most widely used remote sensing technique for detecting soil moisture.

Use of Microwave Remote Sensing in Detecting Soil Moisture

When microwave radiation (300 MHz–300 GHz frequency range in the electromagnetic spectrum) comes into contact with an object on earth, different portions of the radiation are reflected, scattered, absorbed and transmitted. The absorbed radiation is later emitted by the object. Microwave remote sensing can be categorized as: (1) Active microwave remote sensing and (2) Passive microwave remote sensing.

Active Microwave Remote Sensing

In active microwave remote sensing, a pulse is sent to the object of interest by the satellite and the portion of scattering returned is measured by the receiver. The level of scatter depends on several factors and the properties of the incident wave itself. Some factors important in the current study are the soil moisture, surface roughness, land cover, incidence angle and frequency. Hence, scattering of microwave radiation offers important information which could be used to remotely sense the properties of the land surface. Scattering is quantified using a parameter known as the ‘backscatter coefficient’.

Passive Microwave Remote Sensing

Passive microwave remote sensing (Fung and Chen 2010) quantifies the earth’s emission of the microwave radiation that had been previously absorbed from solar radiation. Emission of radiation is quantified using the brightness temperature, the temperature of a black body in thermal equilibrium with its surrounding, which would emit the same intensity of radiation of the measured frequency. Black bodies absorb all the incident electromagnetic radiation, irrespective of the frequency. At any frequency, a black body in thermal equilibrium emits more energy than any other body at the same temperature. Imagery derived from passive microwave remote sensing is high in temporal resolution, but low in spatial resolution, while the opposite is true for active microwave remote sensing.

Landslide Prediction Using Remotely Sensed Soil Moisture

Considerable research effort has been spent on using remotely sensed soil moisture in the study of landslides. In a qualitative comparison study performed on three landslide sites (Ray and Jacobs 2007), a relationship was seen among remotely sensed soil moisture evaluated using the brightness temperatures derived from Advanced Microwave Scanning Radiometer (AMSR-E) sensor, precipitation derived from Tropical Rainfall Measuring Mission (TRMM) satellite and landslide events. It was observed that the events of increased soil moisture followed the events of rainfall and matched well with landslide events. Another important study has been conducted by the same researchers where downscaled remotely sensed soil moisture was successfully employed in developing landslide susceptibility maps for a region in California, USA, using a deterministic approach (Ray et al. 2010).

More recently, Advanced Scatterometer (ASCAT) derived soil moisture has been used to investigate the feasibility of employing remotely sensed soil moisture to improve the prediction of landslides (Brocca et al. 2012). The above authors have correlated crack width propagation, which is an indication of impending slope failure, with SWI (soil water index) derived from remotely sensed ASCAT soil moisture content and the rainfall, to obtain threshold moisture conditions for failure. However, a single threshold value derived by this method would not incorporate the effects of previously identified landslide attributes. Hence, the results of the above study would be site specific.

It is seen that no previous study has employed remotely sensed soil moisture at a broad scale in quantitative landslide risk assessment, which is the objective of this study.

Methodology

Site Selection

Northern and southern regions of western Oregon, USA, were selected for this study due to the availability of data on over 12,000 landslides recorded since 1932 at these locations (Burns and Watzig 2014). Information on the date, length, width, area, volume and location of the landslide and the extent of damage it caused was available for most of these landslides.

Development of the Landslide Database

Although there were over 12,000 landslide sites available throughout the state of Oregon, only 815 sites (Fig. 2) had remotely sensed soil moisture available on the date of landslide occurrence, thus limiting the database to those sites. The soil moisture information for these sites was obtained from European Space Agency (ESA) satellites (Scheepmaker and Frankenberg 2011). Furthermore, information regarding the following attributes was obtained at these sites as well: (1) slope, (2) geology, (3) soil type and (4) land cover. Digital elevation models (DEM) developed for the state of Oregon at a resolution of 10 m × 10 m were used to calculate the slopes from the elevation and distance data, using ArcMap.

Fig. 2 Selected landslide sites of Oregon
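
The derivation of slope from gridded elevation data mentioned above can be sketched as follows. This is a simple central-difference estimate on a regular grid, offered only as an illustration under the assumption of a 10 m cell size; it is not the algorithm implemented in ArcMap, and the function name and sample grid are hypothetical.

```python
import math

def slope_degrees(dem, row, col, cell_size=10.0):
    """Slope angle (degrees) at an interior DEM cell from central differences.

    dem is a list of rows of elevations in metres; cell_size is the grid
    spacing in metres (10 m assumed here to match the DEM resolution).
    """
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2.0 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2.0 * cell_size)
    # Gradient magnitude is rise over run; its arctangent is the slope angle
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

# Hypothetical 3 x 3 grid: a plane rising 10 m per 10 m cell in the x direction
dem = [
    [0.0, 10.0, 20.0],
    [0.0, 10.0, 20.0],
    [0.0, 10.0, 20.0],
]
centre_slope = slope_degrees(dem, 1, 1)
print(f"Slope at the centre cell: {centre_slope:.1f} degrees")
```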

Lithology maps used in this project were developed by the United States Geological Survey (USGS) in 2005 (Walker and MacLeod 1991). Updated soil type maps developed by the United States Department of Agriculture (USDA) and Natural Resources Conservation Service (NRCS) were used to extract soil type information (Soil Survey Staff 2013). The relevant rock and soil types are seen in column 1 of Table 1. The land cover patterns were obtained from the National Land Cover Datasets (NLCD) of 1992, 2001, 2006 and 2011 (Homer et al. 2007; Fry et al. 2011; Vogelmann et al. 2001). These datasets consist of 20 different land cover classes, which were assigned to the 8 broader classes shown in column 1 of Table 1.

From the above data, it could be observed that many landslides had been caused by single storms such as those in February 1996 and November 1996. In these situations, some landslides could have been triggered by a precursor landslide. Thus, in such cases, the successor landslides had to be removed from the database. In order to do so, the largest dimension of each landslide was identified and a buffer area with a radius greater than twice that dimension was created around it. Any smaller magnitude landslides occurring within the buffer area of a major landslide were removed, eliminating the cascading incidents from the database.
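
The buffer-based removal of cascading landslides described above can be sketched as follows. This is only an illustration under stated assumptions: the input list is assumed to hold landslides from a single storm event, coordinates are assumed to be projected (in metres), and the buffer factor of 2.0 is the lower bound implied by the text rather than the value actually used. The function and field names are hypothetical.

```python
import math

def remove_cascading(landslides, buffer_factor=2.0):
    """Drop smaller landslides that lie inside the buffer of a larger one.

    Each landslide is a dict with 'id', 'x', 'y' (metres, projected
    coordinates) and 'size' (largest dimension in metres).
    """
    kept = []
    # Visit landslides from largest to smallest so major ones claim buffers first
    for slide in sorted(landslides, key=lambda s: s["size"], reverse=True):
        inside_buffer = any(
            math.hypot(slide["x"] - major["x"], slide["y"] - major["y"])
            <= buffer_factor * major["size"]
            for major in kept
        )
        if not inside_buffer:
            kept.append(slide)
    return kept

# Hypothetical landslides from one storm
storm_slides = [
    {"id": "A", "x": 0.0, "y": 0.0, "size": 100.0},
    {"id": "B", "x": 150.0, "y": 0.0, "size": 20.0},   # within 200 m of A
    {"id": "C", "x": 500.0, "y": 0.0, "size": 30.0},   # outside A's buffer
]
kept_ids = [s["id"] for s in remove_cascading(storm_slides)]
print(kept_ids)
```

Sorting by size first means a landslide is only ever compared against larger landslides that have themselves been retained.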

Table 1 Landslide attributes and the coefficients of logistic regression model

In order to develop a model that can differentiate locations with high landslide potential from those with low potential, sites with no landslides were included in the modeling. Thus, an equal number of randomly selected sites from the study area with no reported landslides, together with their corresponding attributes, were added to the landslide database.

Analytical Approach

The authors used the three stochastic approaches discussed below to assess landslide hazard based on the location specific attributes of slope, geology, soil type and land cover, together with moisture data obtained from satellite images, which are expected to represent both the hydrological and rainfall triggering effects.

(1) Logistic regression analysis

Logistic regression modeling can be employed in landslide studies to predict the probability of landslide occurrence at a given location, based on the attributes of the location and triggering factors. Since the occurrence of a landslide is a discrete variable taking values of 0 or 1, it cannot be modeled with linear regression. Instead, the natural logarithm of the odds of landslide occurrence, i.e. the probability of landslide occurrence over the probability of non-occurrence, which is a continuous variable, is modeled. The probability of occurrence of a landslide using logistic regression can be expressed as:

$$ P\left( F \right) = \frac{1}{{1 + { \exp }[ - \left( {\beta_{0} + \beta_{\text{i}} X_{\text{i}} + \beta_{k} X_{k} + \cdots } \right)]}} $$
(3)

where β0, βi and βk are coefficients associated with the continuous variables Xi and categorical variables Xk. Continuous variables such as soil moisture and slope are associated with single coefficients while categorical variables such as geology, soil type and land cover contain different coefficients for each sub-category. If the specific sub-category is present at a certain location, a value of 1 would be assigned to that sub-category and values of zero would be assigned to other sub-categories rendering the contribution to the above equation from category ‘k’ to be βk. A sample dataset from the database used in this study is given in Table 2. Parameter estimates are obtained using maximum likelihood. Since the outcome of the model is the binary status of failure, it is assumed that the outcome follows a binomial distribution. It is further assumed that the relationship in Eq. (3) between the mean of the above binomial distribution and the predictor variables is a logistic function. Thus, the likelihood function of the observations can be expressed as follows:

$$ \begin{aligned} L\left( {\beta_{0} ,\beta_{\text{i}} ,\beta_{k} , \ldots } \right) & = \prod\limits_{i:y = 1} P\left( {F_{i} } \right)\prod\limits_{{i^{\prime } :y = 0}} \left[ {1 - P\left( {F_{{i^{\prime } }} } \right)} \right] \\ & = \prod\limits_{i:y = 1} \frac{1}{{1 + \exp [ - \left( {\beta_{0} + \beta_{\text{i}} X_{\text{i}} + \beta_{k} X_{k} + \cdots } \right)]}} \\ & \quad \times \prod\limits_{{i^{\prime } :y = 0}} \left[ {1 - \frac{1}{{1 + \exp [ - \left( {\beta_{0} + \beta_{\text{i}} X_{\text{i}} + \beta_{k} X_{k} + \cdots } \right)]}}} \right] \\ \end{aligned} $$
(4)

where y = 1 denotes failure while y = 0 denotes non-failures. The above parameters are evaluated by maximizing the logarithm of the likelihood function.
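
The probability in Eq. (3) and the logarithm of the likelihood in Eq. (4) can be sketched in a few lines. The coefficient values, variable names and data layout below are hypothetical and serve only to show how one-hot encoded categorical variables enter the linear term; they are not the fitted parameters of this study.

```python
import math

def failure_probability(beta0, beta_cont, x_cont, beta_cat, categories):
    """Eq. (3): probability of failure P(F) for one site.

    beta_cont, x_cont: coefficients and values of the continuous variables.
    beta_cat: {variable: {sub_category: coefficient}}.
    categories: {variable: sub_category present at the site}.
    """
    z = beta0 + sum(b * x for b, x in zip(beta_cont, x_cont))
    # One-hot encoding: only the sub-category present at the site contributes
    z += sum(beta_cat[var][sub] for var, sub in categories.items())
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(probabilities, outcomes):
    """Logarithm of Eq. (4): log P(F) for failures, log(1 - P(F)) otherwise."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(probabilities, outcomes)
    )

# Hypothetical coefficients: intercept, [soil moisture, slope], land cover classes
beta0 = -4.0
beta_cont = [8.0, 0.05]
beta_cat = {"land_cover": {"barren": 0.5, "forest": -0.5}}

p_site = failure_probability(beta0, beta_cont, [0.4, 20.0], beta_cat,
                             {"land_cover": "forest"})
print(f"P(F) = {p_site:.3f}")
print(f"log-likelihood = {log_likelihood([p_site, 0.2], [1, 0]):.3f}")
```

A maximum likelihood fit would search for the coefficients that make `log_likelihood` as large as possible over all sites in the database.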

Table 2 A sample dataset from the database

(2) Analysis based on decision trees

Decision trees segment the predictor space into a number of regions, based on a selected decision rule. A decision tree consists of a root node, split nodes and terminal nodes. The root node consists of input data while split nodes consist of results of the intermediate partitioning of input data based on the selected decision rule. Terminal nodes, also known as leaves, consist of final classifications assigned to the partitioned data. Inputs at the root node and split nodes in this study would be landslide predictor variables, i.e. location based attributes and triggering factors, while the output at terminal nodes would be the landslide occurrence or non-occurrence.

A decision tree developed based on the standard CART algorithm was used in this study. First, the input data is examined and the best split is performed. Then recursive binary splitting is performed at each child node generated thereafter such that the classification error at each node is minimized. The Gini index defined in Eq. (5) (James et al. 2007) is used for estimating the classification error in this study.

$$ G = \mathop \sum \limits_{k = 1}^{K} p_{mk} (1 - p_{mk} ) $$
(5)

where G represents the Gini index, \( p_{mk} \) stands for the proportion of observations in the m-th region that belong to the k-th class and K represents the number of classes in the classification. Thus, if \( p_{mk} \) is either zero or 1, G would be zero.

The decision rule used in this study was the minimization of G in Eq. (5) at each splitting stage. As an example, if a decision tree is to be developed for the dataset in Table 2, the CART algorithm will consider all possible binary splits of the variables, and select the split which results in the lowest value of G for a given node.
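
The split search described above can be sketched for a single continuous predictor. This is a minimal illustration, assuming that candidate splits are scored by the size-weighted average of the Gini indices of the two child nodes (a common CART convention that the text does not spell out) and that candidate thresholds are taken at observed values. The function names and sample data are hypothetical.

```python
def gini(labels):
    """Eq. (5): sum over classes of p * (1 - p) for the labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    total = 0.0
    for k in set(labels):
        p = labels.count(k) / n
        total += p * (1.0 - p)
    return total

def best_threshold(values, labels):
    """Try every binary split 'value < t' and return (t, weighted Gini)."""
    best = (None, float("inf"))
    # Skip the smallest value: splitting there would leave the left child empty
    for t in sorted(set(values))[1:]:
        left = [y for v, y in zip(values, labels) if v < t]
        right = [y for v, y in zip(values, labels) if v >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical soil moisture values and landslide outcomes (1 = failure)
soil_moisture = [0.20, 0.25, 0.30, 0.40, 0.45, 0.50]
outcome = [0, 0, 0, 1, 1, 1]
threshold, score = best_threshold(soil_moisture, outcome)
print(threshold, score)
```

A full tree would repeat this search over every predictor at every node and recurse into the two children of the winning split.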

(3) Analysis based on bagged trees

Decision trees could suffer from high variance, i.e., if the tree is trained using a different dataset, the resulting tree could be quite different from the original tree. Thus, bagging is introduced to minimize this error. Bagging involves taking multiple repeated samples from the same data set to generate different training sets of data. The model is trained on each individual dataset generated in this manner. Final predictions are made based on the majority rule, i.e., preponderance of individual decisions of landslide occurrence or non-occurrence generated from the multiple samples. This could help reduce the error induced by variance and thus, bagging helps to improve the prediction accuracy of a decision tree (James et al. 2007).
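
The bagging procedure described above can be sketched as follows. To keep the example short, a one-split classifier on a single predictor stands in for a full decision tree; this substitution, the function names and the sample data are assumptions made for illustration only.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-split classifier on (value, label) pairs.

    Stands in for a full decision tree to keep the sketch short.
    """
    best = None
    for t in sorted({v for v, _ in sample}):
        left = [y for v, y in sample if v < t]
        right = [y for v, y in sample if v >= t]
        if not left or not right:
            continue
        left_class = Counter(left).most_common(1)[0][0]
        right_class = Counter(right).most_common(1)[0][0]
        errors = (sum(y != left_class for y in left)
                  + sum(y != right_class for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_class, right_class)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y for _, y in sample).most_common(1)[0][0]
        return lambda v: majority
    _, t, left_class, right_class = best
    return lambda v: left_class if v < t else right_class

def fit_bagged(data, n_models=30, seed=0):
    """Train one model per bootstrap sample drawn from the same dataset."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(data) observations with replacement
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))
    return models

def predict_bagged(models, value):
    """Majority vote over the individual model predictions."""
    votes = Counter(model(value) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical (soil moisture, outcome) pairs, 1 = landslide
data = [(0.20, 0), (0.25, 0), (0.30, 0), (0.35, 0),
        (0.40, 1), (0.45, 1), (0.50, 1), (0.55, 1)]
models = fit_bagged(data)
print(predict_bagged(models, 0.10), predict_bagged(models, 0.60))
```

Because each model sees a slightly different resampled dataset, individual models may disagree, and the vote averages out part of that variability.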

Cross Validation

Cross validation is employed to improve the prediction accuracy of statistical models. A common approach to model testing is the validation set approach, where a randomly selected portion of the data set is set aside during model training and used for model testing. The validation set approach is used widely in landslide studies (Wang et al. 2016). However, in this method the number of observations used in training is reduced, since a portion of the data set is not used in model training, which could compromise model accuracy. Furthermore, since only one data set is used in testing the model, the accuracy would depend on the data set that was selected for testing, thereby generating a classification error due to variance.

In order to address these drawbacks, a k-fold cross validation approach has been used. This method involves dividing the training data set randomly into ‘k’ sub-samples. Every time the model is trained, one of these sub-samples is set aside for testing while the remaining k − 1 sub-samples are used in model training. The procedure is repeated so that every sub-sample is used once for testing. Thus, ‘k’ different model combinations are developed. The final model is developed by averaging the results of all the combinations, which helps to improve the prediction accuracy by decreasing the model variance while also overcoming the shortcomings of the validation set approach.
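
The k-fold procedure above can be sketched as follows. This is a minimal illustration in which the averaged quantity is taken to be the test accuracy of each fold, which is one reading of "averaging the results"; the `fit` and `predict` callables and all names are hypothetical placeholders for any of the three models.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index, so fold sizes differ by at most 1
    folds = [indices[j::k] for j in range(k)]
    for j in range(k):
        test_idx = folds[j]
        train_idx = [i for m, fold in enumerate(folds) if m != j for i in fold]
        yield train_idx, test_idx

def cross_validated_accuracy(data, k, fit, predict, seed=0):
    """Average test accuracy over k folds.

    data is a list of (features, label) pairs; fit(train) returns a model
    and predict(model, features) returns a label.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k, seed):
        model = fit([data[i] for i in train_idx])
        correct = sum(predict(model, data[i][0]) == data[i][1] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k

# Show the fold sizes for ten observations split five ways
splits = list(k_fold_indices(10, 5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```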

Results of the Study

(1) Logistic regression model

When trained with the previously described landslide database, the logistic regression model in Eq. (3) yielded parameters listed in column 2 of Table 1. The simplified format of the model for one sample case is provided below. Based on the parameters of Table 1, the probability of failure at a location with “barren” land cover class, “mudstone” geology and “andisol” soil order can be expressed in Eq. (6).

$$ \begin{aligned} P\left( F \right) & = \frac{1}{{1 + { \exp }[ - \left( {6.2 + 15.1{\text{SM}} + 0.02{\text{SL}} - 2.0 + 0.8 + 1.4} \right)]}} \\ & = \frac{1}{{1 + { \exp }[ - \left( {6.29 + 15.12{\text{SM}} + 0.02{\text{SL}}} \right)]}} \\ \end{aligned} $$
(6)

where SM is the volumetric soil moisture content and SL is the slope angle of the given location.

The locations with probability of failure greater than 0.5 are classified as high-risk locations while the remaining locations are classified as low risk ones. Analysis of the database based on logistic regression resulted in a classification accuracy of 80.7% (Table 3).
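
The thresholding rule and the resulting classification accuracy can be sketched as follows. The probabilities and observed outcomes below are hypothetical and are not drawn from the landslide database.

```python
def classify(probabilities, threshold=0.5):
    """Label a site high risk (1) when P(F) exceeds the threshold, else low risk (0)."""
    return [1 if p > threshold else 0 for p in probabilities]

def accuracy(predicted, observed):
    """Fraction of sites whose predicted class matches the observed outcome."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical failure probabilities and observed outcomes for five sites
probs = [0.91, 0.35, 0.62, 0.08, 0.50]
observed = [1, 0, 0, 0, 1]
predicted = classify(probs)
print(predicted, accuracy(predicted, observed))
```

Note that a probability of exactly 0.5 falls in the low-risk class here, since the text classifies only locations with probability greater than 0.5 as high risk.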

Table 3 Classification accuracies of the three models

(2) Decision tree

Figure 3 shows the decision tree developed in this study, of which only the first three predictor variable splits are shown due to space limitations. As seen in Fig. 3, at every node, all the possible binary splits of predictor variables (e.g. soil moisture) are considered and the split which has the lowest value of the Gini index (Eq. 5) is selected as the splitting criterion of that node. As an example, the split between soil moisture content <0.39 and soil moisture content ≥0.39 resulted in the lowest Gini index for node 1, hence it was selected as the splitting criterion for that node. Next, splitting is performed in a similar manner for each child node produced by the first split. As an example, the child node with soil moisture content <0.39 is split further into those with rock types of andesite, clay or mud, gravel, greywacke or mudstone and those with rock types of basalt or sand. The decision tree is grown further by splitting data in this manner until all the resulting nodes have a Gini index of zero, i.e. they contain only failures or only non-failures. Once this criterion is achieved for all the nodes, splitting is terminated. The resulting decision tree, which contained multiple nodes, yielded a classification accuracy of 89.2% (Table 3).

Fig. 3 First three predictor variable splits of the decision tree

(3) Bagged tree

In bagging, on the other hand, 30 different decision trees like the one in Fig. 3 are grown in the above manner by taking multiple repeated training samples from the same training dataset. The final class assigned to a given location would be the class assigned by a majority of the above decision trees. The bagged tree classification resulted in an accuracy of 92.2% (Table 3).

Discussion of Results

When the same models were formulated based on the landslide attributes alone, i.e. without the remotely sensed moisture, the corresponding classification accuracies were significantly lower. Thus, the results demonstrate that all three stochastic methods possess the ability to model, to a reasonable accuracy, the complete relationship among the landslide attributes, remotely sensed soil moisture and landslide occurrence. However, the authors believe that the relatively high accuracies, especially those of the latter two models, result from training and validation of the models based on different portions of the same dataset. To address this limitation, the authors are currently in the process of setting up a second landslide database from a different geographical region that would be used for independent verification of the above models.

Conclusions

Timely evaluation of sudden increases in soil moisture levels provides a reliable means of landslide risk assessment in landslide prone areas. However, continuous in situ measurement of soil moisture content could be prohibitive due to labor cost, complexity of instrumentation and the unreliability of instrument readings. Hence, frequently calibrated, remotely sensed soil moisture provides a viable alternative to in situ instrumentation. The classification models developed in this study demonstrated reasonably high accuracies in predicting locations of landsliding in the considered study area. Moreover, future adoption of techniques that can further downscale the remotely sensed soil moisture product used in this study is expected to improve the accuracy and reliability of the predictions. The authors hope to refine these preliminary prediction models to develop more comprehensive, convenient and timely methods of identifying locations which are subject to high risk from rainfall triggered landslides.