Effect of time and space partitioning strategies of samples on regional landslide susceptibility modelling

Khanna, Kirti; Martha, Tapas R.; Roy, Priyom; Kumar, K. Vinod

doi:10.1007/s10346-021-01627-3

Technical Note
Published: 29 January 2021

Volume 18, pages 2281–2294, (2021)

Landslides

Kirti Khanna¹, Tapas R. Martha¹ (ORCID: orcid.org/0000-0002-7761-9800), Priyom Roy¹ & K. Vinod Kumar¹

A Correction to this article was published on 02 March 2021

This article has been updated

Abstract

Landslide susceptibility modelling assesses the spatial probability of future landslide occurrences for disaster risk reduction. In this study, we investigated the effect of time and space partitioning strategies of samples on the performance of regional landslide susceptibility models for macro-scale mapping in the state of Mizoram, India, covering an area of 21,087 km². We used landslide inventory data for 2014 and 2017, consisting of 1205 and 2265 landslides, respectively, to train and test the models with four sampling strategies: spatial, temporal, temporal (size constrained) and temporal (geographic constrained). We used five commonly applied models, namely multiclass weighted overlay (MCWO), information value (IV), weights of evidence (WoE), logistic regression (LR) and artificial neural network (ANN), to evaluate the effect of the sampling strategies on model performance for regional landslide susceptibility mapping. Model performance was validated using the receiver operating characteristic (ROC) curve. The traditional spatial sampling strategy, applied to the 2014 landslides with a random 70:30 split, gave high performance for all five models but failed to predict the landslides in 2017. When the models were trained with the 2014 landslides, the 2017 landslides used for validation, either entirely or in different split conditions (both size and geographic constrained), gave consistent performance for the non-linear susceptibility models (LR and ANN), even though the testing sample was large or had a different spatial disposition. The results show the importance of sample selection during validation of landslide susceptibility models on a regional scale.

Introduction

Landslides are frequent natural disasters that affect people and economies worldwide. According to EM-DAT, 275 people died (cases > 10) and 54,908 people were affected (cases > 100) due to landslides, with an economic loss of 0.9 billion US$, in the year 2018 (EMDAT 2019). The Himalayan belt is the hotbed for landslide disasters in the world (Froude and Petley 2018). One of the primary management and mitigation measures to reduce landslide disaster risk is to create landslide susceptibility maps (LSM). These maps give the spatial probability of future landslide occurrence in an area (Reichenbach et al. 2018) and thus help planners to prioritise localities in a region for land development activities (Sepe et al. 2019). They also help in regional landslide early warning through incorporation of rainfall threshold models (Mathew et al. 2014).

Landslide susceptibility modelling is a constantly evolving area of research, and a comprehensive review of susceptibility models was recently provided by Reichenbach et al. (2018) and Lombardo et al. (2020). The reliability of LSMs depends mostly on the amount and quality of available data, the working scale and the selection of the appropriate methodology of analysis (Ayalew and Yamagishi 2005; Lombardo et al. 2020). Over the years, LSMs have been prepared in different parts of the world using heuristic, statistical and deterministic approaches (Van Westen 1993; Aleotti and Chowdhury 1999; Reichenbach et al. 2018). Deterministic approach for landslide susceptibility modelling on a regional scale was found to be effective for landslide early warning (Montrasio et al. 2014). But, data-driven statistical methods are commonly used in susceptibility modelling, and they include bivariate analysis, multivariate analysis, neural network, fuzzy logic and genetic algorithms (van Westen et al. 1997; Aleotti and Chowdhury 1999; Guzzetti et al. 2005; Kanungo et al. 2006). Other landslide susceptibility analyses include probabilistic methods and machine learning techniques wherein the weights are assigned according to the probability of landslide and non-landslide occurrences (Bonham-Carter 1994; Vahidnia et al. 2010; Di Napoli et al. 2020). Selection of predisposing parameters which is also important for the success of a susceptibility model has been investigated by Guzzetti et al. (2006), Ghosh et al. (2011) and Cevasco et al. (2014).

Landslide susceptibility assessment is generally based on the concept that ‘the present and the past are key to the future’ (Varnes 1984; Carrara et al. 1991; Hutchinson 1995; Guzzetti et al. 1999; Aleotti and Chowdhury 1999), which implies that slope failures in the future will occur under the same conditions that led to past instability (Guzzetti et al. 1999). This is why most hazard analysts take into account an updated landslide inventory, which represents the fundamental data for identifying the hill-slope instability factors that trigger landslides (Lee and Sambath 2006). Therefore, the use of a future or time partitioned inventory is desirable for validation of landslide susceptibility models (Chung and Fabbri 2003; Remondo et al. 2003). Landslide inventory maps, which portray spatial and temporal patterns of landslide distribution, type of movement, rate of movement and kind of material displaced (earth, debris or rock), are used to train and test susceptibility models (Pardeshi et al. 2013). A common practice for acceptance of susceptibility models is validation by division of the dependent variable in time or space. The space partitioned method splits samples randomly in a particular ratio (commonly 70:30). This method is useful in the absence of time-dependent variables and offers a non-time assessment of results derived from prediction models (Chung and Fabbri 2003). The performance of the prediction result of susceptibility models is estimated using the receiver operating characteristic (ROC) curve (Fawcett 2006; Lee et al. 2004; Blahut et al. 2010). However, studies using statistical methods (e.g. Kavzoglu et al. 2014; Pellicani et al. 2017; Xiao et al. 2020), although reporting high performance using ROC, were applied to limited areas and could not conclusively justify the use of one GIS model over another for regional landslide susceptibility mapping.

The objective of this study is to evaluate performance of data-driven models for regional landslide susceptibility with temporally and spatially split landslide inventory data. We investigated landslide susceptibility over a large area covering 21,087 km² of the Mizoram state, India, using training and testing data prepared with four sampling strategies from a bi-temporal (2014 and 2017) landslide inventory dataset. These are shallow landslides induced by rainfall during the monsoon season. Five susceptibility models, viz. multiclass weighted overlay (MCWO), information value (IV), weights of evidence (WoE), logistic regression (LR) and artificial neural network (ANN), were used to find the effect of the sampling strategies on regional landslide susceptibility mapping. Major landslide predisposing factors in the Himalayas used by previous researchers (e.g. Kanungo et al. 2006; Ghosh et al. 2011) such as slope, aspect, landform, lithology, distance to lineaments, soil and land use were used in the susceptibility models.

Study area

The state of Mizoram in India, covering an area of 21,087 km², was considered for regional landslide susceptibility modelling on macro-scale (Fig. 1). This state generally witnesses a large number of landslides during the monsoon season. The minimum and maximum elevations of the study area are approximately 550 m and 2100 m, respectively. The state has a highly rugged terrain with narrow, deep valleys and steep slopes (Fig. 1). Landslide inventory for both the years mapped using satellite data is also shown in Fig. 1.

The rainfall pattern for both the years (2014 and 2017) is shown in Fig. 2. The graph indicates that the amount of total rainfall during June 2017 is almost twice the amount of total rainfall during June 2014. Excess rainfall in 2017 during the monsoon season (June to September) has resulted in occurrence of more landslides in 2017 in comparison to 2014.

The geology of Mizoram is controlled by the eastern syntaxial bend of the Himalayan orogeny (Valdiya 2016). The Neogene sedimentary rocks of the Tipam and Surma groups are the primary litho units that constitute the region. The Surma group unconformably overlies the Barail group, which is made up of shale and siltstones. The Surma group is divided into lower, middle and upper Bhuban formations, which change transitionally to the Bokabil formation. Shales, siltstones and sandstones are the main rock units, occurring as interbedded or massive layers (Valdiya 2016). The Tipam group lies conformably over the Surma group and mainly comprises thickly bedded sandstones. The sedimentary rocks are folded into asymmetrical anticlines and synclines along N-S axes. The folded and friable arenaceous rocks constituting the topography make the region highly vulnerable to landslides.

Materials and methods

The occurrence of landslides is controlled by predisposing factors such as lithology, landform, soil and geological structure (Carrara et al. 1991; Guzzetti et al. 1999). These layers were prepared and integrated in GIS using weightages derived through five modelling techniques. The flowchart of the methodology is shown in Fig. 3.

Landslide predisposing factors

We have used Cartosat-1 DEM (30 m) for generating topographic factors such as slope angle and slope aspect. The slope angle ranges from 0° to 87° and was classified into ten classes using the natural break method. Aspect is categorised into eight directional classes ranging from 0° to 360° w.r.t. the North. The classes for the slope and aspect are shown in Fig. 4a and b.

Other factors used in the study are lithology, landform, lineaments, soil and land use. Geological map (i.e. lithology and lineament) on 1:50,000 scale published (www.bhukosh.gsi.gov.in) by Geological Survey of India (GSI) was used in the study (GSI 2020). Shale and sandstone of the upper/middle Bhuban formation of the Surma group form the major litho types in the area (Fig. 4c). Euclidean distance, in the case of lineaments, was calculated through the identification of the nearest landslide location (Fig. 4d). Land Use and Land Cover (LULC) map prepared by NRSC (NRSC 2014) on 1:50,000 scale using satellite data was used in this study. Majority of the area is covered with the deciduous forest with evergreen/semi-evergreen and scrub forest being the less dominant types (Fig. 4e). The soil texture map prepared on 1:50,000 scale by Mizoram Remote Sensing Applications Centre (MIRSAC) using satellite data and field survey was used in this study. The soil is formed by the erosion of the Surma and Tipam group of rocks and is classified mainly as loamy and clayey (Fig. 4f). Landform map prepared on 1:50,000 scale jointly by GSI and National Remote Sensing Centre (NRSC 2012) using satellite data and digital elevation model was used in the study to calculate the weightages of landform classes for landslide occurrence. Highly and moderately dissected hills and valleys oriented north-south are mostly found in the area (Fig. 4g). These predisposing factors were converted to 30 m × 30 m grid size and ingested to the susceptibility models.

Landslide inventory

High-resolution multi-spectral images of LISS-IV acquired from the Resourcesat-2 satellite were used for mapping landslides using the object-based change detection method (Martha et al. 2010, 2011, 2016). Image characteristics such as reduced NDVI in landslide affected areas and increased brightness due to exposure of new rock and soil are mainly used for detection and mapping of landslides. Landslides in Mizoram are mostly rainfall-induced shallow landslides and are small in size. Therefore, we mapped the entire body of each landslide as a single polygon, since it is difficult to differentiate the scarp from the remaining parts of the landslide body. Figure 5 shows pre- and post-landslide satellite images of Mizoram used in landslide inventory mapping. As shown in Fig. 1, landslide occurrences in the east-central part of the study area are fewer in both periods. Landslides in the 2014 period occurred across the entire study area, although they were more prevalent in the northern part. On the contrary, the majority of landslides in 2017 occurred in the Lunglei district (western part of the study area; Fig. 1) due to cyclone-induced rainfall, which offered an ideal opportunity to validate the models using time partitioned samples. Table 1 shows the summary statistics of landslides mapped in the 2014 and 2017 periods.

Table 1 Summary statistics of landslides mapped in Mizoram for the years 2014 and 2017

GIS models for susceptibility mapping

Five models, namely information value (IV), multiclass weighted overlay (MCWO), weights of evidence (WoE), logistic regression (LR) and artificial neural network (ANN), were used to generate landslide susceptibility maps using time and space partitioned samples. These models are briefly described below.

Information value (IV) method

This method provides information about the relative influence of predisposing factors on landslide occurrence. The information value J_i for the ith class of a predisposing factor X with respect to landslides is given in Eq. (1) (Yin and Yan 1988).

$$ {\mathrm{J}}_{\mathrm{i}}=\ln \frac{S_{\mathrm{i}}/{N}_{\mathrm{i}}}{S/N} $$

(1)

where S_i is the area of landslides within the ith class of causative factor X, N_i is the area of the ith class of the predisposing factor X, S is the total area of the landslides in the study area, and N is the total area of the study area. The final susceptibility index map was generated by integrating all factors as shown in Eq. (2).

$$ \mathrm{LSI}=\sum \limits_{i=1}^n{\mathrm{J}}_{\mathrm{i}} $$

(2)

where LSI is the landslide susceptibility index and i varies from 1 to n.
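
Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: the class names and areas are hypothetical, the classes of a factor are assumed to partition the whole study area, and Eq. (2) is read here as a sum over the predisposing factors present in one grid cell.

```python
import math

def information_values(landslide_area, class_area):
    """Eq. (1): J_i = ln((S_i / N_i) / (S / N)) for every class of one factor."""
    S = sum(landslide_area.values())  # total landslide area
    N = sum(class_area.values())      # total study area
    values = {}
    for cls, N_i in class_area.items():
        S_i = landslide_area.get(cls, 0.0)
        # ln(0) is undefined, so classes without landslides are marked None
        values[cls] = math.log((S_i / N_i) / (S / N)) if S_i > 0 else None
    return values

def susceptibility_index(cell_classes, iv_tables):
    """Eq. (2): sum the information values of the classes found in one cell."""
    return sum(iv_tables[factor][cls] for factor, cls in cell_classes.items())

# Hypothetical areas (km^2) for three slope classes
slope_area = {"gentle": 500.0, "moderate": 300.0, "steep": 200.0}
slope_landslides = {"gentle": 1.0, "moderate": 3.0, "steep": 6.0}

iv_slope = information_values(slope_landslides, slope_area)
lsi = susceptibility_index({"slope": "steep"}, {"slope": iv_slope})
```

A class whose landslide density equals the regional density gets J_i = 0, a denser class gets a positive value, and a sparser class gets a negative value.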

Multiclass weighted overlay (MCWO) method

The MCWO method weighs predisposing factors using landslide inventory data and calculates the spatial association of landslides with categorical variables using Yule’s coefficient (Y_C) (Eq. (3)) (Ghosh et al. 2011).

$$ {Y}_C=\frac{\sqrt{M_{cl}/{M}_{\overline{c}l}}-\sqrt{M_{c\overline{l}}/{M}_{\overline{c}\overline{l}}}}{\sqrt{M_{cl}/{M}_{\overline{c}l}}+\sqrt{M_{c\overline{l}}/{M}_{\overline{c}\overline{l}}}} $$

(3)

where M_cl is the area of ‘positive match’ where a factor class and landslides are both present, $ {M}_{\overline{c}l} $ is the area of ‘mismatch’ where a factor class is absent, but landslides are present, $ {M}_{c\overline{l}} $ is the area of ‘mismatch’ where a factor class is present, but landslides are absent, and $ {M}_{\overline{c}\overline{l}} $ is the area of ‘negative match’ where both factor class and landslide are absent. The value of Y_C ranges between −1 and +1. A negative Y_C means less spatial association, whereas a positive Y_C means high spatial association (Ghosh et al. 2011). Based on the Y_C, the landslide favourability score for each factor class is generated using Eq. (4).

$$ \mathrm{LOFS}=\left\{\begin{array}{ll}0 & \mathrm{for}\ {Y}_C\le 0\\ {}\frac{Y_C}{Y_{C\max }} & \mathrm{for}\ {Y}_C>0\end{array}\right. $$

(4)

where LOFS stands for landslide observed favourability score and Y_Cmax is the highest value among all Y_C values of the predisposing factor class.

The LOFS values can determine the predictor sub-class weight, but the absolute value of the landslide predisposing factor, on the whole, can be determined by the ratio of difference of spatial association (Y_C) as shown in Eq. (5).

$$ \mathrm{PR}=\left[{\mathrm{SA}}_{\mathrm{max}}-{\mathrm{SA}}_{\mathrm{min}}\right]/\left[{\left({\mathrm{SA}}_{\mathrm{max}}-{\mathrm{SA}}_{\mathrm{min}}\right)}_{\mathrm{min}}\right] $$

(5)

where PR stands for predictor rating and SA stands for the spatial association between factor classes with respect to landslides.
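
The chain from Yule's coefficient to LOFS and predictor rating (Eqs. (3) to (5)) can be sketched as below. This is a hedged illustration rather than the authors' workflow: the function names and the example areas are hypothetical, and the denominator of Eq. (5) is interpreted here as the smallest range of Y_C found among all factors.

```python
import math

def yules_coefficient(m_cl, m_ncl, m_cnl, m_ncnl):
    """Eq. (3). Areas: class & landslide, no class & landslide,
    class & no landslide, no class & no landslide."""
    a = math.sqrt(m_cl / m_ncl)
    b = math.sqrt(m_cnl / m_ncnl)
    return (a - b) / (a + b)

def lofs(yc_by_class):
    """Eq. (4): scale positive Y_C values by the largest Y_C of the factor."""
    yc_max = max(yc_by_class.values())
    return {c: (yc / yc_max if yc > 0 else 0.0) for c, yc in yc_by_class.items()}

def predictor_ratings(yc_by_factor):
    """Eq. (5): Y_C range of each factor divided by the smallest range
    among all factors (an assumed reading of the denominator)."""
    ranges = {f: max(v.values()) - min(v.values()) for f, v in yc_by_factor.items()}
    smallest = min(ranges.values())
    return {f: r / smallest for f, r in ranges.items()}

# Hypothetical two-class factor; class B is the complement of class A
yc_litho = {
    "A": yules_coefficient(4.0, 1.0, 100.0, 100.0),
    "B": yules_coefficient(1.0, 4.0, 100.0, 100.0),
}
scores = lofs(yc_litho)
ratings = predictor_ratings({"lithology": yc_litho, "soil": {"x": 0.0, "y": 1.0 / 3.0}})
```

In this example class A is positively associated with landslides and class B negatively, so only class A receives a non-zero favourability score.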

Weights of evidence (WoE) method

Weights of evidence (WoE) was primarily developed for mineral exploration applications (Bonham-Carter 1994). But due to its broad applicability and scope, it has also been used in the field of landslide susceptibility zonation (Mathew et al. 2007). WoE is based on the concept of prior and posterior probability, assuming that input layers are independent of one another (Neuhäuser and Terhorst 2007).

An open-source geospatial tool in ArcGIS (Arc-SDM-10.5, Sawatzky et al. 2009) was used to calculate the weights (W⁺ and W⁻) for each class, depending on the association between landslides and the layers. Other parameters, such as contrast (C) and studentised contrast (Sc), were also estimated to describe the spatial relationship between landslides and predisposing factors. The WoE method is discussed in detail by Neuhäuser and Terhorst (2007), Mathew et al. (2007), Blahut et al. (2010) and Pudi et al. (2018).
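
For readers unfamiliar with the weights, the sketch below uses the standard weights-of-evidence formulation described by Bonham-Carter (1994), with W⁺ and W⁻ as log ratios of conditional probabilities and the contrast as their difference. It does not reproduce the internals of Arc-SDM, the cell counts are hypothetical, and the studentised contrast is left out.

```python
import math

def weights_of_evidence(n_class_slide, n_class, n_slide, n_total):
    """Return (W+, W-, contrast) for one factor class from cell counts."""
    n_noslide = n_total - n_slide
    p_b_l = n_class_slide / n_slide                  # P(class | landslide)
    p_b_nl = (n_class - n_class_slide) / n_noslide   # P(class | no landslide)
    w_plus = math.log(p_b_l / p_b_nl)
    w_minus = math.log((1.0 - p_b_l) / (1.0 - p_b_nl))
    return w_plus, w_minus, w_plus - w_minus

# Hypothetical counts: 10,000 cells, 100 landslide cells,
# 2,000 cells in the class, 50 of them with landslides
w_plus, w_minus, contrast = weights_of_evidence(50, 2000, 100, 10000)
```

A positive W⁺ together with a negative W⁻ gives a positive contrast, which indicates that the class is spatially associated with landslides.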

Logistic regression (LR) method

Logistic regression is a multivariate technique that models the relationship between a dichotomous dependent variable and a set of independent variables. The dataset used for the model comprises the training landslide data and an equal number of randomly selected non-landslide data (Lee et al. 2002). The status of each cell in the landslide database is represented as ‘1’, indicating the presence of landslides, or as ‘0’, indicating the absence of landslides (Yesilnacar and Topal 2005). The model was executed in the Statistical Package for the Social Sciences (SPSS 2017). It is based on the logistic function f(z), which is defined in Eq. (6).

$$ \mathrm{f}\left(\mathrm{z}\right)=1/\left(1+{\mathrm{e}}^{-\mathrm{z}}\right) $$

(6)

where z varies from −∞ to +∞. To obtain the logistic model from the logistic function, z is written as the sum of a constant, which is the intercept of the model, and the products of the independent variables and their respective coefficients (Eq. (7)).

$$ z={\beta}_o+\sum \limits_{i=1}^n{\beta}_i{X}_i $$

(7)

where β_o is the intercept of the model, β_i is the coefficient of the ith independent factor, X_i is the ith independent factor, and i varies from 1 to n.
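
Equations (6) and (7) combine into a single probability calculation per grid cell, sketched below. The intercept, coefficients and factor values are hypothetical placeholders, not values fitted in the study.

```python
import math

def landslide_probability(intercept, coefficients, factors):
    """Eq. (7) gives z as a linear combination; Eq. (6) maps z to (0, 1)."""
    z = intercept + sum(b * x for b, x in zip(coefficients, factors))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model with two factors
p_neutral = landslide_probability(0.0, [0.8, -0.4], [0.0, 0.0])   # z = 0
p_high = landslide_probability(-1.0, [0.8, -0.4], [5.0, 1.0])     # z = 2.6
```

With z = 0 the function returns 0.5, and increasingly positive z pushes the probability towards 1.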

Artificial neural network (ANN) method

Artificial neural network (ANN) is one of the widely used techniques in landslide susceptibility modelling (Gómez and Kavzoglu 2005). The purpose of an ANN is to build a model of the data generating process so that the network can predict outputs from inputs through a learning process (Lee 2005). A feed forward network using the multi-layer perceptron (MLP) technique, comprising input, hidden and output layers (three-layer architecture), was utilised in the study (Fig. 6). A detailed description of MLP can be found in Basheer and Hajmeer (2000). Input data are transformed into output classes by interconnected neurons, whose weighted inputs are summed up (Kanungo et al. 2006). The number of neurons in the input and output layers depends on the number of data sources, and the number of neurons used in processing is often determined by trial and error. These networks are generally non-linear and can process and analyse intricate data patterns (Kanungo et al. 2009). The network learns by adjusting the weights between the neurons in response to the errors between the actual output values and the target output values, based on specific algorithms (Lee et al. 2004).

Two stages that are generally involved in using neural networks for multisource classification are (i) the training stage wherein internal weights are adjusted and (ii) the classifying stage (Lee et al. 2004). Weights physically represent connections between processing units or neurons, and each neuron has a rule for summing the input weights and a rule for calculating an output value (Ermini et al. 2005). The rules can be formed from different algorithms which are implemented until the desired threshold is reached. The back-propagation algorithm, which is generally used and also applied in the present study, trains the network until some minimal targeted error is achieved between the desired and actual output values (Bishop 1995; Pradhan et al. 2010). Formally, the input that a single node receives is weighted according to Eq. (8).

$$ {\mathrm{net}}_{\mathrm{b}}=\sum \limits_{\mathrm{a}}{w}_{\mathrm{a}\mathrm{b}}\,{\mathrm{o}}_{\mathrm{a}} $$

(8)

where w_ab represents the weight between nodes a and b, o_a is the output from node a, and the sum runs over all nodes a that feed node b. Output from node b is given by Eq. (9).

$$ {\mathrm{o}}_{\mathrm{b}}=\mathrm{f}\left({\mathrm{net}}_{\mathrm{b}}\right) $$

(9)

The function f is usually a non-linear sigmoid function that is applied to the weighted sum of inputs before the signal propagates to the next layer. The error, E, for an input training pattern, I, is a function of the desired output vector, d, and the actual output vector, o, given by Eq. (10).

$$ \mathrm{E}=\frac{1}{2}\sum \limits_{\mathrm{c}}{\left({\mathrm{d}}_{\mathrm{c}}-{\mathrm{o}}_{\mathrm{c}}\right)}^2 $$

(10)

The error is propagated back through the neural network and is minimised by adjusting the weights between layers (Paola and Schowengerdt 1995).

The training phase was executed with seven predisposing factors (slope, aspect, lithology, landform, LULC, soil and structure) together with landslide and non-landslide data. The values were normalised and fed into the ANN architecture. The ANN produced hidden layer weights and an importance matrix through 12 non-linearly connected neurons.
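
The forward pass of Eqs. (8) and (9) and one back-propagation update that reduces the error of Eq. (10) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the network used in the study: 12 hidden neurons, the absence of bias terms, the learning rate, the random initial weights and the input values are all illustrative choices.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    # Eqs. (8)-(9): weighted sum of incoming outputs, then the sigmoid
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, d, w_hidden, w_out, rate=0.5):
    """One gradient-descent update; returns the error before the update."""
    hidden, o = forward(x, w_hidden, w_out)
    delta_out = (o - d) * o * (1.0 - o)  # derivative of E = 0.5 * (d - o)^2
    for j, h in enumerate(hidden):
        # hidden-node delta uses the output weight before it is updated
        delta_h = delta_out * w_out[j] * h * (1.0 - h)
        w_out[j] -= rate * delta_out * h
        for i, xi in enumerate(x):
            w_hidden[j][i] -= rate * delta_h * xi
    return 0.5 * (d - o) ** 2

random.seed(0)
n_inputs, n_hidden = 7, 12  # seven factors; hidden size is an assumption
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]

x = [0.2, 0.5, 0.1, 0.9, 0.4, 0.7, 0.3]  # hypothetical normalised factor values
errors = [train_step(x, 1.0, w_hidden, w_out) for _ in range(50)]
```

Repeating the update on a landslide cell (target 1) drives the output towards 1, so the recorded error shrinks over the iterations.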

Model training and validation

Sample preparation

The landslide database of 2014 and 2017 was created as polygons in the form of ESRI shape files. These shape files form the base for training and testing of susceptibility models. Both training and testing of models were carried out in a raster environment. Hence, the polygon shape files were converted to a raster file of 30 m × 30 m grid size. However, in instances where the landslide area is less than 900 m², the polygons were first converted to points (centroids of the polygons) and then rasterised as 30 m × 30 m grid. Thus, all landslides were converted to 30 m × 30 m grid. WoE, LR and ANN models require the dependent variable to be ingested as points during model execution. Hence, the 30 m × 30 m grids corresponding to training data were further converted to points and ingested to these three models to calculate weights of independent variables.

Data training and validation

Any prediction model aims to find the probability of future occurrence of landslides using historical landslide data. This means that a prediction made using historical landslide inventory data needs to be validated using succeeding landslide data. Generally, in the absence of a subsequent (i.e. future) inventory, the standard approach is to select the landslide inventory of a particular year and randomly split it in a 70:30 ratio (Pellicani et al. 2017; Taalab et al. 2018; Vakhshoori et al. 2019; Xiao et al. 2020) or to take equal numbers of training and testing samples (Kavzoglu et al. 2014; Segoni et al. 2020). In another study, Guzzetti et al. (2006) showed that susceptibility models generated using a large number of landslides perform better than models trained with fewer landslides. This indicates that training sample size also influences the performance of susceptibility models. In this study, the landslide inventory database of 2014 and 2017 was used to design four strategies of training and testing samples to evaluate the effect of disparity in the sample population (both spatial and temporal) on model performance. The random splitting of samples into training and testing data was iterated ten times to ensure that the accuracy obtained for the susceptibility models is not a result of chance (Kanungo et al. 2006; Lombardo et al. 2020). The landslide polygons were split randomly into training and testing data using the geostatistical analyst tool of ArcGIS 10.5 software and subsequently rasterised as explained in the previous section.

I. Strategy 1: Spatial sampling - The landslide inventory of 2014 was used for both model generation and validation, wherein the dataset was randomly divided into 70% (training) and 30% (testing) landslides (Fig. 7a). This is the most common approach followed in landslide susceptibility modelling (Aleotti and Chowdhury 1999; Guzzetti et al. 1999; Ghosh et al. 2011).
II. Strategy 2: Temporal sampling - The inventory of 2014 (100%) was used to train the models, and the inventory of 2017 (100%) was used to test the models (Fig. 7b). This is the ideal approach to validate the performance of landslide prediction models (Chung and Fabbri 2003).
III. Strategy 3: Temporal sampling (size constrained testing) - The inventory of 2014 (100%) was used to train the models, and 50% of the inventory of 2017 was used to test the models. This was done to remove the bias of oversampled testing data by approximately equalising the testing and training sample populations (Fig. 7c).
IV. Strategy 4: Temporal sampling (geographic constrained testing) - The inventory of 2014 (100%) was used to train the models, and 50% of the inventory of 2017, constrained geographically, was used to test the models. There is one cluster of landslides in the western part of Lunglei district (Fig. 1). This cluster boundary was used to geographically constrain the selection of testing samples. Herein, 20% of the testing samples were selected from landslides within the cluster and 80% from landslides in the remaining area outside the cluster, all corresponding to the year 2017 (Fig. 7d). This helped us to evaluate the effect of spatial bias in the testing sample population on the performance of the models.
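
The four strategies can be sketched with Python's standard `random` module as shown below. The study performed the split with the geostatistical analyst tool of ArcGIS, so this is only an illustrative stand-in: the landslide identifiers, the seed and the assumption that the first 1130 landslides of 2017 lie inside the Lunglei cluster are hypothetical.

```python
import random

def percent_of(n, percent):
    """Integer percentage with round-half-up, avoiding float rounding."""
    return (n * percent + 50) // 100

def random_split(ids, train_percent, seed=0):
    """Strategy 1: shuffle once, then cut into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = percent_of(len(shuffled), train_percent)
    return shuffled[:cut], shuffled[cut:]

def geographic_constrained(in_cluster, outside, n_test, cluster_percent=20, seed=0):
    """Strategy 4: fixed share of testing samples drawn from inside the cluster."""
    rng = random.Random(seed)
    n_in = percent_of(n_test, cluster_percent)
    return rng.sample(in_cluster, n_in) + rng.sample(outside, n_test - n_in)

inv_2014 = list(range(1205))            # hypothetical landslide IDs
inv_2017 = list(range(10000, 12265))    # 2265 hypothetical IDs
cluster_2017 = inv_2017[:1130]          # assumed cluster membership
outside_2017 = inv_2017[1130:]

train_1, test_1 = random_split(inv_2014, 70)                 # strategy 1
train_2, test_2 = inv_2014, inv_2017                         # strategy 2
test_3 = random.Random(0).sample(inv_2017, percent_of(len(inv_2017), 50))  # strategy 3
test_4 = geographic_constrained(cluster_2017, outside_2017, len(test_3))   # strategy 4
```

Changing the seed reproduces the repeated random splits; strategy 2 has no random component because both inventories are used in full.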

The landslide susceptibility models were validated for their predictive performance using the receiver operating characteristic (ROC) curve (Blahut et al. 2010; Frattini et al. 2010; Ghosh et al. 2011). False positives and true positives were calculated from a contingency table by applying a range of different cut-offs (Frattini et al. 2010). The ROC curve was created as a two-dimensional graph of true positive rate (y-axis) against false positive rate (x-axis). The area under the curve (AUC) of the ROC is the quantitative measure of the susceptibility model performance (Sarkar et al. 2013). The ROC curve depicts the relative trade-offs between benefits (true positives) and costs (false positives) (Fawcett 2006).
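
The construction of the ROC curve from successive cut-offs, and the AUC as the trapezoidal area beneath it, can be sketched as follows. The susceptibility scores and labels are hypothetical, and the function names are illustrative rather than taken from any tool used in the study.

```python
def roc_points(scores, labels):
    """Lower the cut-off through the sorted scores and record (FPR, TPR)."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        cutoff = pairs[i][0]
        # cells with tied scores cross the cut-off together
        while i < len(pairs) and pairs[i][0] == cutoff:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical susceptibility scores; label 1 = landslide cell, 0 = non-landslide cell
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
auc = area_under_curve(roc_points(scores, labels))
```

For this toy dataset, eight of the nine landslide and non-landslide pairs are ranked correctly, so the AUC is 8/9.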

Results and discussion

Model training

The four sampling strategies based on two training cases, i.e. 70% and 100% of landslides of 2014, resulting in a total of 844 and 1205 landslides, respectively, were used for training the five models. The two training cases, due to their uniformity in spatial disposition, preserve the effective control of predisposing factors on the prediction of landslides by the five models. The control of the predisposing factors for the 70% dataset training case is summarised in Table 2 and that of the 100% dataset training case is summarised in Table 3.

Table 2 Relative control of predisposing factors on susceptibility models estimated with 70% training data

Table 3 Relative control of predisposing factors on susceptibility models estimated with 100% training data

As shown in Tables 2 and 3, the training sample population influences the relative weight of the predisposing factors. Lithology, land use and aspect have the highest control on the occurrence of landslides in all five methods. Interestingly, however, when the models are trained with 100% of the 2014 landslides, the role of slope is diminished in comparison to training with 70% of the 2014 landslides. This corroborates our understanding that landslides in the Northeast Himalayas in India occur in all kinds of slope conditions, provided the right kind of lithology (e.g. alternating bands of sandstone, shale and siltstone) exists.

Model validation

Landslide susceptibility maps generated using 70% and 100% of 2014 landslides as training datasets were validated with spatial and temporal testing landslide data as per the strategies described in the “Data training and validation” section. The accuracy as AUC (%) estimated using the ROC curve provided a direct comparison of the model performance among the four types of space and time partitioned landslide inventory testing datasets. Table 4 shows AUC (%) obtained using ten iterations of training/testing sampling strategies (1, 3 and 4). Strategy 2 involves only one iteration since entire temporal data were used for training and testing of models. The standard deviation of AUC estimated with ten random splitting iterations is quite low (Table 4), which indicates that model performance is not a result of chance. The accuracy of models reported in this study corresponds to the maximum AUC value.

Table 4 Comparison of spatial and temporal sampling strategies and split iterations on model performance (AUC in %)

In sampling strategy 1, the maximum AUC (84%) was obtained for the IV and ANN models, followed by the WoE (80%), LR (78%) and MCWO (77%) models (Table 4). This indicates that IV, ANN and WoE are the best performing models for landslide susceptibility mapping of a large area, in comparison to the MCWO and LR models, when the spatial sampling method is used. The AUC values obtained using sampling strategy 2 are 66% (MCWO), 66% (IV), 65% (WoE), 73% (LR) and 74% (ANN). There is thus a substantial decrease in the performance of the IV and WoE models when the temporal sampling strategy (strategy 2) is adopted. This is mainly due to an increase in testing sample size (664%), while the training size has increased only by 30% (Fig. 7b). The result shown in Table 4 indicates that the ANN method, which produced the maximum AUC (74%) with the temporal sampling strategy, is effective in training predisposing factors with a higher training population (from 844 to 1205).

The landslide occurrence in the 2017 period was pervasive owing to high-intensity rainfall. The peak rainfall recorded in June 2017 was ~700 mm, which is an order of magnitude higher than that of 2014 (Fig. 2). This factor was taken into account while devising the temporal sampling strategies. Further, the influence of the size (number of landslide occurrences) of the training dataset was also analysed while validating the model. Therefore, in strategy 3, i.e. size-constrained testing, we considered 50% of the landslides in 2017 as testing samples (1133 landslides), which were validated against the models trained using 100% of the landslides in 2014. The results (Table 4) are similar to those of strategy 2, which indicates that the testing sample population has little influence on the performance of the susceptibility models, and that models trained with 1205 landslide samples are adequate to predict a future large landslide event. A large cluster (approx. 50% of the total) of landslides in 2017 is present in the Lunglei region (Fig. 1). This was due to intense cyclonic rainfall in the Lunglei region, an anomalous scenario occurring during the regular seasonal monsoon. Even in a random selection of testing data, more samples are selected from that area, creating a possible spatial bias in the modelling results (Fig. 7c1). To evaluate the effect of geographic bias in the testing sample population on model performance, a stratified systematic sampling strategy, i.e. strategy 4, was considered, wherein 20% of the testing landslides were randomly picked from the Lunglei cluster (227 landslides), while the remaining 80% were randomly picked from the rest of the state (906 landslides). This reduced the number of testing samples in the Lunglei cluster (Fig. 7d1) and gave a better geographic distribution of the testing sample population in the area outside the cluster, while retaining 50% of the total sample population (1133) of landslides in 2017.
Results show an increase in the AUC (Table 4) of the WoE and IV susceptibility models for strategy 4 in comparison to strategies 2 and 3, indicating that the distribution of the testing sample population has an influence on model performance. However, strategy 4 has no effect on the AUC of the ANN model (Table 4) in comparison to strategies 2 and 3, indicating that the distribution of the testing sample population has no influence on the performance of the ANN model.
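
The two constrained testing draws described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the landslide IDs, the cluster membership (the first 1130 IDs are arbitrarily assigned to the Lunglei cluster to mimic the "approx. 50%" share), the function names and the random seed are all hypothetical, and the study's own GIS-based procedure may differ.

```python
import random


def draw_random_test_set(ids, n_test, seed=0):
    """Strategy 3 (size constrained): simple random draw of n_test landslides."""
    rng = random.Random(seed)
    return rng.sample(ids, n_test)


def draw_constrained_test_set(cluster_ids, other_ids, n_test,
                              cluster_share=0.2, seed=0):
    """Strategy 4 (geographic constrained): cap the share of testing
    landslides drawn from the cluster and draw the rest from elsewhere."""
    rng = random.Random(seed)
    n_cluster = round(n_test * cluster_share)   # 20% of 1133 -> 227
    n_other = n_test - n_cluster                # remaining 80% -> 906
    return rng.sample(cluster_ids, n_cluster) + rng.sample(other_ids, n_other)


# Hypothetical inventory: 2265 landslide IDs for 2017.
all_ids = list(range(2265))
lunglei_ids = all_ids[:1130]    # assumed cluster membership, illustration only
other_ids = all_ids[1130:]

test_s3 = draw_random_test_set(all_ids, 1133)
test_s4 = draw_constrained_test_set(lunglei_ids, other_ids, 1133)

lunglei = set(lunglei_ids)
n_s4_cluster = sum(1 for i in test_s4 if i in lunglei)
print(len(test_s3), len(test_s4), n_s4_cluster)  # 1133 1133 227
```

Both draws keep the same testing size (1133), so any difference in AUC between them can be attributed to where the testing landslides are located and not to how many there are.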

Results from the four sampling strategies show that the number and distribution of landslides have a role in the performance of susceptibility models covering a large area (Table 4). Non-linear methods (e.g. LR and ANN) of susceptibility modelling are adaptable to a large area. The ANN method provided the highest accuracy (84%) with strategy 1 and consistently high accuracy (74%) with strategies 2, 3 and 4. The performance of the ANN model is also better than that of the other four models in strategy 2, wherein the testing population (2265) is quite large in comparison to the training population (1205). This indicates that the ANN model is suitable for predicting a large number of future landslides in macro-scale landslide susceptibility modelling over a large area. Figure 8 shows the susceptibility maps of Mizoram state generated by the five models using strategies 1 and 2. The susceptibility maps were classified into five categories, and the area of testing landslides within each category is also shown in Fig. 8. Strategies 3 and 4 used the same susceptibility maps as strategy 2 and hence are not shown separately.

Conclusion

The effect of four sampling strategies prepared using the time and space partitioning approach was investigated for regional macro-scale landslide susceptibility mapping of Mizoram state in India, covering an area of 21,087 km². The traditional spatial sampling strategy (i.e. a 70:30 split) showed the highest performance of the susceptibility models but failed to retain similar performance in spatially predicting the subsequent (future) occurrence of landslides.

Training landslide data, compounded by an increase in the testing sample population arising from a different boundary condition such as the heavy rainfall of 2017, influence the performance of regional landslide susceptibility models. The prediction performance remains consistently high in the case of the ANN model, irrespective of the size, distribution and temporal variation of the testing data. This implies that the ANN is able to effectively train the predisposing factors for spatially predicting future landslides irrespective of the number of incidences. The effect of removing the geographic bias (Lunglei district cluster) is evident in the performance of the MCWO, IV and WoE models. Nevertheless, ANN is the best model for creating long-term susceptibility models, followed by the LR model.

The outcome of any landslide susceptibility model largely depends on the experience of the experts utilising the method, well-distributed information on past landslides and terrain-specific information. However, when the process needs to be applied over a large area for macro-scale landslide susceptibility mapping with significant variability in terrain conditions, a priori knowledge of a specific method may be challenging to acquire; hence, the ANN method, which produced consistently high performance in predicting the spatial probability of future landslide occurrences, is recommended irrespective of the area of investigation.

Change history

02 March 2021
A Correction to this paper has been published: https://doi.org/10.1007/s10346-021-01646-0

References

Aleotti P, Chowdhury R (1999) Landslide hazard assessment: summary review and new perspectives. Bull Eng Geol Environ 58:21–44
Google Scholar
Ayalew L, Yamagishi H (2005) The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 65(1-2):15–31
Google Scholar
Basheer IA, Hajmeer M (2000) Artificial neural networks: fundamentals, computing, design and application. J Microbiol Methods 43:3–31
Google Scholar
Bishop C (1995) Neural networks for pattern recognition. Oxford University Press, Oxford
Google Scholar
Blahut J, Van Westen CJ, Sterlacchini S (2010) Analysis of landslide inventories for accurate prediction of debris-flow source areas. Geomorphology 119(1-2):36–51. https://doi.org/10.1016/j.geomorph.2010.02.017
Article Google Scholar
Bonham-Carter GF (1994) Geographic information systems for geoscientists: modelling with GIS. Computer Methods in Geosciences, vol. 13. Pergamon Press, Oxford, p 398
Google Scholar
Carrara A, Cardinali M, Detti R, Guzzetti F, Pasqui V, Reichenbach P (1991) GIS techniques and statistical models in evaluating landslide hazard. Earth Surf Process Landf 16:427–445
Google Scholar
Cevasco A, Pepe G, Brandolini P (2014) The influences of geological and land-use settings on shallow landslides triggered by an intense rainfall event in a coastal terraced environment. Bull Eng Geol Environ 73(3):859–875
Google Scholar
Chung CJ, Fabbri AG (2003) Validation of spatial prediction models for landslide hazard mapping. Nat Hazards 30:451–472
Google Scholar
Di Napoli M, Carotenuto F, Cevasco A, Confuorto P, Di Martire D, Firpo M, Pepe G, Raso E, Calcaterra D (2020) Machine learning ensemble modelling as a tool to improve landslide susceptibility mapping reliability. Landslides 17(8):1897–1914
Google Scholar
EM-DAT (2019) www.emdat.be
Ermini L, Catani F, Casagli N (2005) Artificial neural networks applied to landslide susceptibility assessment. Geomorphology 66:327–343
Google Scholar
Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27:861–874
Google Scholar
Frattini P, Crosta G, Carrara A (2010) Techniques for evaluating the performance of landslide susceptibility models. Eng Geol 111:62–72
Google Scholar
Froude MJ, Petley D (2018) Global fatal landslide occurrence from 2004 to 2016. Nat Hazards Earth Syst Sci 18:2161–2181
Google Scholar
Ghosh S, Carranza EJM, Van Westen CJ, Jetten VG, Bhattacharya DN (2011) Selecting and weighting spatial predictors for empirical modelling of landslide susceptibility in the Darjeeling Himalayas (India). Geomorphology 131:35–56
Google Scholar
GSI (2020) www.bhukosh.gsi.gov.in, Geology 50K map of Mizoram, Accessed on 13 February 2020
Guzzetti F, Carrara A, Cardinali M, Reichenbach P (1999) Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology 31(1-4):181–216
Google Scholar
Guzzetti F, Reichenbach P, Cardinali M, Galli M, Ardizzone F (2005) Probabilistic landslide hazard assessment at the basin scale. Geomorphology 72:272–299
Google Scholar
Guzzetti F, Reichenbach P, Ardizzone F, Cardinali M, Galli M (2006) Estimating the quality of landslide susceptibility models. Geomorphology 81(1-2):166–184
Google Scholar
Gόmez H, Kavzoglu T (2005) Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Eng Geol 78:11–27
Google Scholar
Hutchinson JN (1995) Keynote paper: landslide hazard assessment. In: Bell (ed) Landslides. Balkema, Rotterdam, pp 1805–1841
Google Scholar
Kanungo DP, Arora MK, Sarkar S, Gupta RP (2006) A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas. Eng Geol 85(3-4):347–366
Google Scholar
Kanungo D, Arora M, Sarkar S, Gupta R (2009) Landslide susceptibility zonation (LSZ) mapping - a review. J S Asia Disast Stud 2(1):81–105
Google Scholar
Kavzoglu T, Sahin EK, Colkesen I (2014) Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides 11(3):425–439
Google Scholar
Lee S (2005) Application of logistic regression model and its validation for landslide susceptibility mapping using GIS and remote sensing data. Int J Remote Sens 26(7):1477–1491
Google Scholar
Lee S, Sambath T (2006) Landslide susceptibility mapping in the Damrei Romel area, Cambodia using frequency ratio and logistic regression models. Environ Geol 50:847–855
Google Scholar
Lee S, Choi J, Min K (2002) Landslide susceptibility analysis and verification using the Bayesian probability model. Environ Geol 43(12):120–131
Google Scholar
Lee S, Choi J, Min K (2004) Probabilistic landslide hazard mapping using GIS and remote sensing data at Boun, Korea. Int J Remote Sens 25(11):2037–2052. https://doi.org/10.1080/01431160310001618734
Article Google Scholar
Lee S, Ryu J-H, Won J-S, Park H-J (2004) Determination and application of the weights for landslide susceptibility mapping using artificial neural network. Eng Geol 71:289–302
Lombardo L, Opitz T, Ardizzone F, Guzzetti F, Huser R (2020) Space-time landslide predictive modelling. Earth Sci Rev 209:103318
Google Scholar
Martha TR, Kerle N, Jetten V, van Westen CJ, Kumar KV (2010) Characterising spectral, spatial and morphometric properties of landslides for semi-automatic detection using object-oriented methods. Geomorphology 116(1-2):24–36
Google Scholar
Martha TR, Kerle N, van Westen CJ, Jetten V, Kumar KV (2011) Segment optimisation and data-driven thresholding for knowledge-based landslide detection by object-based image analysis. IEEE Trans Geosci Remote Sens 49(12):4928–4943
Google Scholar
Martha TR, Kamala P, Jose J, Kumar KV, Sankar GJ (2016) Identification of new landslides from high resolution satellite data covering a large area using object-based change detection methods. J Indian Soc Remote Sens 44(4):515–524
Google Scholar
Mathew J, Jha VK, Rawat GS (2007) Weights of evidence modelling for landslide hazard zonation mapping in part of Bhagirathi valley, Uttarakhand. Curr Sci 92(5):628–638
Google Scholar
Mathew J, Babu DG, Kundu S, Vinod Kumar K, Pant CC (2014) Integrating intensity–duration-based rainfall threshold and antecedent rainfall-based probability estimate towards generating early warning for rainfall-induced landslides in parts of the Garhwal Himalaya, India. Landslides 11(4):575–588
Google Scholar
Montrasio L, Valentino R, Corina A, Rossi L, Rudari R (2014) A prototype system for space-time assessment of rainfall-induced shallow landslides in Italy. Nat Hazards 74(2):1263–1290
Google Scholar
Neuhäuser B, Terhorst B (2007) Landslide susceptibility assessment using “weightsof-evidence” applied to a study area at the Jurassic escarpment (SW-Germany). Geomorphology 86:12–24
Google Scholar
NRSC (2012) NRSC Technical Document: manual for geomorphologyand lineament mapping. Document reference number: NRSC-RSAA-ERG-G&GD-SEP' 12-TR-445
NRSC (2014) Land use/land cover database on 1:50,000 scale, Natural Resources Census Project, LUCMD, LRUMG, RSAA. Hyderabad, National Remote Sensing Centre, ISRO
Google Scholar
Paola JD, Schowengerdt RA (1995) A review and analysis of back propagation neural networks for classification of remotely sensed multi-spectral imagery. Int J Remote Sens 16:3033–3058
Google Scholar
Pardeshi SD, Autade SE, Pardeshi SS (2013) Landslide hazard assessment: recent trends and techniques. SpringerPlus 2(1):523. https://doi.org/10.1186/2193-1801-2-523
Article Google Scholar
Pellicani R, Argentiero I, Spilotro G (2017) GIS-based predictive models for regional-scale landslide susceptibility assessment and risk mapping along road corridors. Geomat Nat Haz Risk 8(2):1012–1033
Google Scholar
Pradhan B, Youssef A, Varathrajo R (2010) Approaches for delineating landslide hazard areas using different training sites in an advanced artificial neural network model. Geo-spatial Inf Sci 13(2):93–102. https://doi.org/10.1007/s11806-010-0236-7
Article Google Scholar
Pudi R, Roy P, Martha TR, Kumar KV, Rao PR (2018) Spatial potential analysis of earthquakes in the western Himalayas using b-value and thrust association. J Geol Soc India 91(6):664–670
Google Scholar
Reichenbach P, Rossi M, Malamud BD, Mihir M, Guzzetti F (2018) A review of statistically-based landslide susceptibility models. Earth Sci Rev 180:60–91
Google Scholar
Remondo J, González A, De Terán JRD, Cendrero A, Fabbri A, Chung C-JF (2003) Validation of landslide susceptibility maps; examples and applications from a case study in Northern Spain. Nat Hazards 30:437–449. https://doi.org/10.1023/B:NHAZ.0000007201.80743.fc
Sarkar S, Roy AK, Martha TR (2013) Landslide susceptibility assessment using information value method in parts of the Darjeeling Himalayas. J Geol Soc India 82(4):351–362
Google Scholar
Sawatzky DL, Raines GL, Bonham-Carter GF, Looney CG (2009) Spatial data modeller (SDM): ArcMAP 9.3 geoprocessing tools for spatial data modelling using weights of evidence, logistic Regression, fuzzy logic and neural networks. http://arcscripts.esri.com/details.asp?dbid=15341
Segoni S, Pappafico G, Luti T, Catani F (2020) Landslide susceptibility assessment in complex geological settings: sensitivity to geological information and insights on its parameterisation. Landslides. 17:2443–2453. https://doi.org/10.1007/s10346-019-01340-2
Article Google Scholar
Sepe C, Confuorto P, Angrisani AC, Di Martire D, Di Napoli M, Calcaterra D (2019) Application of a statistical approach to landslide susceptibility map generation in urban settings. In: Shakoor A, Cato K (eds) IAEG/AEG Annual meeting proceedings, San Francisco, California, 2018 - Volume 1. Springer, Cham, pp 155–162. https://doi.org/10.1007/978-3-319-93124-1_19
Chapter Google Scholar
SPSS (2017) SPSS for Windows, Version 23, 2017. SPSS Inc., Chicago
Google Scholar
Taalab K, Cheng T, Zhang Y (2018) Mapping landslide susceptibility and types using random forest. Big Earth Data 2(2):159–178
Google Scholar
Vahidnia NH, Alesheikh AA, Mohommad A, Hosseinalli F (2010) A GIS based neuro-fuzzy procedure for integrating knowledge and data in landslide susceptibility mapping. Comput Geosci 36(9):1101–1114
Google Scholar
Vakhshoori V, Pourghasemi HR, Zare M, Blaschke T (2019) Landslide susceptibility mapping using GIS-based data mining algorithms. Water 11(11):2292
Google Scholar
Valdiya KS (2016) The making of India: geodynamic evolution. Society of Earth Scientists Series. Springer, Cham 924p
Google Scholar
Van Westen CJ (1993) Application of geographic information systems landslide hazard zonation. ITC Publication 15
Van Westen CJ, Rengers N, Terlien MTJ, Soeters R (1997) Prediction of the occurrence of slope instability phenomenal through GIS-based hazard zonation. Geol Rundsch 86(2):404–414
Google Scholar
Varnes DJ (1984) IAEG Commission on landslides and other mass-movements landslide hazard zonation: a review of principles and practice. UNESCO Press, Paris, 63 pp
Google Scholar
Xiao T, Segoni S, Chen L, Yin K. Casagli N (2020) A step beyond landslide susceptibility maps: a simple method to investigate and explain the different outcomes obtained by different approaches. Landslides 17(3): 627-640.
Yesilnacar E, Topal T (2005) Landslide susceptibility mapping: a comparison of logistic Regression and neural networks methods in a medium scale study, Hendek region (Turkey). Eng Geol 79:251–266
Google Scholar
Yin KL, Yan TZ (1988) Statistical prediction models for slope instability of metamorphosed rocks. In: Bonnard C (ed) Proc 5th INternational Symposium on landslides, Pub Rotterdam: A Blakema, Lausanne, pp 1269–1272

Download references

Acknowledgements

We thank Shri Santanu Chowdhury, Director, National Remote Sensing Centre (NRSC), and Dr. P. V. N. Rao, Deputy Director (Remote Sensing Applications Area), NRSC, for their support and encouragement. The critical comments of two anonymous reviewers helped us to improve the study, and we are grateful to them. We are thankful to the Geological Survey of India (GSI) and the Mizoram Remote Sensing Application Centre (MIRSAC) for providing the geological and soil maps, respectively, of the state.

Author information

Authors and Affiliations

Geosciences Group, National Remote Sensing Centre, Indian Space Research Organisation, Hyderabad, 500 037, India
Kirti Khanna, Tapas R. Martha, Priyom Roy & K. Vinod Kumar

Corresponding author

Correspondence to Tapas R. Martha.

Additional information

The original online version of this article was revised: an error was introduced during the publishing process. Eq. 3 was presented incorrectly and Table 4 was incorrectly laid out. The correct Eq. 3 and Table 4 are provided here.

About this article

Cite this article

Khanna, K., Martha, T.R., Roy, P. et al. Effect of time and space partitioning strategies of samples on regional landslide susceptibility modelling. Landslides 18, 2281–2294 (2021). https://doi.org/10.1007/s10346-021-01627-3

Received: 31 August 2020
Accepted: 18 January 2021
Published: 29 January 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s10346-021-01627-3
