Introduction

Landslides are frequent natural disasters that affect people and economies worldwide. According to EM-DAT, 275 people died (cases > 10) and 54,908 people were affected (cases > 100) due to landslides in 2018, with an economic loss of US$ 0.9 billion (EMDAT 2019). The Himalayan belt is a global hotbed of landslide disasters (Froude and Petley 2018). One of the primary management and mitigation measures to reduce landslide disaster risk is to create landslide susceptibility maps (LSMs). These maps give the spatial probability of future landslide occurrence in an area (Reichenbach et al. 2018) and thus help planners to prioritise localities in a region for land development activities (Sepe et al. 2019). They also support regional landslide early warning through the incorporation of rainfall threshold models (Mathew et al. 2014).

Landslide susceptibility modelling is a constantly evolving area of research, and comprehensive reviews of susceptibility models were recently provided by Reichenbach et al. (2018) and Lombardo et al. (2020). The reliability of LSMs depends mostly on the amount and quality of available data, the working scale and the selection of an appropriate methodology of analysis (Ayalew and Yamagishi 2005; Lombardo et al. 2020). Over the years, LSMs have been prepared in different parts of the world using heuristic, statistical and deterministic approaches (Van Westen 1993; Aleotti and Chowdhury 1999; Reichenbach et al. 2018). The deterministic approach to landslide susceptibility modelling on a regional scale has been found to be effective for landslide early warning (Montrasio et al. 2014). However, data-driven statistical methods are more commonly used in susceptibility modelling, and they include bivariate analysis, multivariate analysis, neural networks, fuzzy logic and genetic algorithms (van Westen et al. 1997; Aleotti and Chowdhury 1999; Guzzetti et al. 2005; Kanungo et al. 2006). Other landslide susceptibility analyses include probabilistic methods and machine learning techniques wherein the weights are assigned according to the probability of landslide and non-landslide occurrences (Bonham-Carter 1994; Vahidnia et al. 2010; Di Napoli et al. 2020). The selection of predisposing parameters, which is also important for the success of a susceptibility model, has been investigated by Guzzetti et al. (2006), Ghosh et al. (2011) and Cevasco et al. (2014).

Landslide susceptibility assessment is generally based on the concept that ‘the present and the past are key to the future’ (Varnes 1984; Carrara et al. 1991; Hutchinson 1995; Guzzetti et al. 1999; Aleotti and Chowdhury 1999), which implies that future slope failures will occur under the same conditions that led to past instability (Guzzetti et al. 1999). This is why most hazard analysts rely on an updated landslide inventory, which provides the fundamental data for identifying the hill-slope instability factors that trigger landslides (Lee and Sambath 2006). Therefore, the use of a future or time partitioned inventory is desirable for validation of landslide susceptibility models (Chung and Fabbri 2003; Remondo et al. 2003). Landslide inventory maps, which portray spatial and temporal patterns of landslide distribution, type of movement, rate of movement and kind of material displaced (earth, debris or rock), are used to train and test susceptibility models (Pardeshi et al. 2013). A common practice for acceptance of susceptibility models is validation by dividing the dependent variable in time or space. The space partitioned method splits samples randomly in a particular ratio (commonly 70:30). This method is useful in the absence of time-dependent variables and offers a non-temporal assessment of results derived from prediction models (Chung and Fabbri 2003). The predictive performance of susceptibility models is estimated using the receiver operating characteristic (ROC) curve (Fawcett 2006; Lee et al. 2004; Blahut et al. 2010). However, the statistical methods used in earlier studies (e.g. Kavzoglu et al. 2014; Pellicani et al. 2017; Xiao et al. 2020), although they reported high performance using ROC, were applied to limited areas and could not conclusively justify the use of one GIS model over another for regional landslide susceptibility mapping.

The objective of this study is to evaluate the performance of data-driven models for regional landslide susceptibility with temporally and spatially split landslide inventory data. We investigated landslide susceptibility over a large area covering 21,087 km² of the Mizoram state, India, using training and testing data prepared with four sampling strategies from a bi-temporal (2014 and 2017) landslide inventory dataset. The inventoried landslides are shallow and were induced by rainfall during the monsoon season. Five susceptibility models, viz. multiclass weighted overlay (MCWO), information value (IV), weights of evidence (WoE), logistic regression (LR) and artificial neural network (ANN), were used to find the effect of the sampling strategies on regional landslide susceptibility mapping. Major landslide predisposing factors in the Himalayas used by previous researchers (e.g. Kanungo et al. 2006; Ghosh et al. 2011), such as slope, aspect, landform, lithology, distance to lineaments, soil and land use, were used in the susceptibility models.

Study area

The state of Mizoram in India, covering an area of 21,087 km², was considered for regional landslide susceptibility modelling at the macro-scale (Fig. 1). The state generally witnesses a large number of landslides during the monsoon season. The minimum and maximum elevations of the study area are approximately 550 m and 2100 m, respectively. The state has a highly rugged terrain with narrow, deep valleys and steep slopes (Fig. 1). The landslide inventory for both years, mapped using satellite data, is also shown in Fig. 1.

Fig. 1 Shaded relief map of Mizoram, India, showing the distribution of landslides in 2014 and 2017 periods

The rainfall pattern for both years (2014 and 2017) is shown in Fig. 2. The graph indicates that the total rainfall during June 2017 was almost twice the total rainfall during June 2014. Excess rainfall during the 2017 monsoon season (June to September) resulted in more landslides in 2017 than in 2014.

Fig. 2 Monthly rainfall in Mizoram for 2014 and 2017 (Source: Envis Centre: Mizoram)

The geology of Mizoram is controlled by the eastern syntaxial bend of the Himalayan orogeny (Valdiya 2016). The Neogene sedimentary rocks of the Tipam and Surma groups are the primary litho units that constitute the region. The Surma group unconformably overlies the Barail group, which is made up of shale and siltstones. The Surma group is divided into lower, middle and upper Bhuban formations, which change transitionally to the Bokabil formation. Shales, siltstones and sandstones are the main rock units, occurring as interbedded or massive layers (Valdiya 2016). The Tipam group lies conformably over the Surma group and mainly comprises thickly bedded sandstones. The sedimentary rocks are folded into asymmetrical anticlines and synclines along N-S axes. The folded and friable arenaceous rocks constituting the topography make the region highly vulnerable to landslides.

Materials and methods

The occurrence of landslides is controlled by predisposing factors such as lithology, landform, soil and geological structure (Carrara et al. 1991; Guzzetti et al. 1999). These layers were prepared and integrated in GIS using weightages derived through five modelling techniques. The flowchart of the methodology is shown in Fig. 3.

Fig. 3 Methodology flowchart of landslide susceptibility modelling and validation

Landslide predisposing factors

We used the Cartosat-1 DEM (30 m) to generate topographic factors such as slope angle and slope aspect. The slope angle ranges from 0° to 87° and was classified into ten classes using the natural break method. Aspect was categorised into eight directional classes covering 0° to 360° with respect to north. The classes for slope and aspect are shown in Fig. 4a and b.
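As a small illustration of an eight-class aspect scheme, the sketch below bins an aspect angle (degrees clockwise from north) into 45° sectors centred on the compass directions. The sector limits and the function name `aspect_class` are assumptions made for illustration; the exact class boundaries used in the study are not stated, and flat cells (often coded as −1 in GIS software) would need separate handling.

```python
# Hypothetical helper: bin aspect (degrees clockwise from north) into eight
# 45-degree sectors centred on N, NE, E, SE, S, SW, W and NW.
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def aspect_class(aspect_deg: float) -> str:
    """Return the compass sector for an aspect angle given in degrees."""
    # Shift by half a sector so that "N" covers 337.5 to 22.5 degrees.
    index = int(((aspect_deg % 360.0) + 22.5) // 45.0) % 8
    return DIRECTIONS[index]
```

For example, `aspect_class(200.0)` falls in the sector centred on south.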

Fig. 4 (a) Slope, (b) aspect, (c) lithology (Source: Geological Survey of India), (d) structure (Source: GSI), (e) land use (Source: NRSC), (f) soil (Source: MIRSAC) and (g) landform (Source: NRSC). Insets (e1, f1 and g1) show enlarged views

Other factors used in the study are lithology, landform, lineaments, soil and land use. The geological map (lithology and lineaments) at 1:50,000 scale published by the Geological Survey of India (GSI; www.bhukosh.gsi.gov.in) was used in the study (GSI 2020). Shale and sandstone of the upper/middle Bhuban formation of the Surma group form the major litho types in the area (Fig. 4c). For lineaments, the Euclidean distance was calculated through the identification of the nearest landslide location (Fig. 4d). The Land Use and Land Cover (LULC) map prepared by NRSC (NRSC 2014) at 1:50,000 scale using satellite data was used in this study. The majority of the area is covered with deciduous forest, with evergreen/semi-evergreen and scrub forest being the less dominant types (Fig. 4e). The soil texture map prepared at 1:50,000 scale by the Mizoram Remote Sensing Applications Centre (MIRSAC) using satellite data and field survey was used in this study. The soil is formed by the erosion of the Surma and Tipam groups of rocks and is classified mainly as loamy and clayey (Fig. 4f). The landform map prepared at 1:50,000 scale jointly by GSI and the National Remote Sensing Centre (NRSC 2012) using satellite data and a digital elevation model was used to calculate the weightages of landform classes for landslide occurrence. Highly and moderately dissected hills and valleys oriented north-south are mostly found in the area (Fig. 4g). These predisposing factors were converted to a 30 m × 30 m grid and input to the susceptibility models.

Landslide inventory

High-resolution multi-spectral LISS-IV images acquired by the Resourcesat-2 satellite were used for mapping landslides using the object-based change detection method (Martha et al. 2010, 2011, 2016). Image characteristics such as reduced NDVI in landslide-affected areas and increased brightness due to exposure of fresh rock and soil are mainly used for detection and mapping of landslides. Landslides in Mizoram are mostly rainfall-induced shallow landslides and are small in size. Therefore, we mapped the entire body of each landslide as a single polygon, since it is difficult to differentiate the scarp from the remaining parts of the landslide body. Figure 5 shows pre- and post-landslide satellite images of Mizoram used in landslide inventory mapping. As shown in Fig. 1, landslide occurrences in the east-central part of the study area are few in both periods. Landslides in 2014 occurred over the entire study area, although they were most prevalent in the northern part. In contrast, the majority of landslides in 2017 occurred in the Lunglei district (western part of the study area; Fig. 1) due to cyclone-induced rainfall, which offered an ideal opportunity to validate the models using time partitioned samples. Table 1 shows the summary statistics of landslides mapped in the 2014 and 2017 periods.

Fig. 5 Pre- and post-landslide Resourcesat-2 LISS-IV Mx image used in the preparation of landslide inventory (yellow polygons) after the monsoon seasons in 2014 and 2017

Table 1 Summary statistics of landslides mapped in Mizoram for the years 2014 and 2017

GIS models for susceptibility mapping

Five models, namely information value (IV), multiclass weighted overlay (MCWO), weights of evidence (WoE), logistic regression (LR) and artificial neural network (ANN), were used to generate landslide susceptibility maps using time and space partitioned samples. These models are briefly described below.

Information value (IV) method

This method provides information about the relative influence of predisposing factors on landslide occurrence. The information value Ji for each predisposing factor Xi with respect to landslides is given in Eq. (1) (Yin and Yan 1988).

$$ {\mathrm{J}}_{\mathrm{i}}=\ln \frac{S_{\mathrm{i}}/{N}_{\mathrm{i}}}{S/N} $$
(1)

where Si is the area of landslides within the ith class of causative factor X, Ni is the area of the ith class of the predisposing factor X, S is the total area of the landslides in the study area, and N is the total area of the study area. The final susceptibility index map was generated by integrating all factors as shown in Eq. (2).

$$ \mathrm{LSI}=\sum \limits_{i=1}^n{\mathrm{J}}_{\mathrm{i}} $$
(2)

where LSI is the landslide susceptibility index and i varies from 1 to n, the number of predisposing factors.
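Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names, the handling of landslide-free classes (returning `None`) and the area values are assumptions chosen for the example.

```python
import math


def information_value(s_i, n_i, s_total, n_total):
    """Eq. (1): J_i = ln((S_i / N_i) / (S / N)) for one factor class."""
    if s_i == 0:
        # ln(0) is undefined; returning None for classes without
        # landslides is an assumption of this sketch.
        return None
    return math.log((s_i / n_i) / (s_total / n_total))


def susceptibility_index(values):
    """Eq. (2): sum the information values of the classes a cell falls in."""
    return sum(values)


# Made-up areas (km^2), for illustration only.
S, N = 10.0, 1000.0
j_slope = information_value(4.0, 100.0, S, N)  # landslide density 4x the average
j_litho = information_value(1.0, 400.0, S, N)  # landslide density 0.25x the average
lsi = susceptibility_index([j_slope, j_litho])
```

A class whose landslide density exceeds the regional average receives a positive value and one below the average a negative value; in this made-up case the two contributions cancel.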

Multiclass weighted overlay (MCWO) method

The MCWO method weighs predisposing factors using landslide inventory data and calculates the spatial association of landslides with categorical variables using Yule’s coefficient (YC) (Eq. (3)) (Ghosh et al. 2011).

$$ {\mathrm{Y}}_{\mathrm{c}}=\frac{\sqrt{M_{cl}/{M}_{\overline{c}l}}-\sqrt{M_{c\overline{l}}/{M}_{\overline{c}\overline{l}}}}{\sqrt{M_{cl}/{M}_{\overline{c}l}}+\sqrt{M_{c\overline{l}}/{M}_{\overline{c}\overline{l}}}} $$
(3)

where Mcl is the area of ‘positive match’ where a factor class and landslides are both present, \( {M}_{\overline{c}l} \) is the area of ‘mismatch’ where a factor class is absent, but landslides are present, \( {M}_{c\overline{l}} \) is the area of ‘mismatch’ where a factor class is present, but landslides are absent, and \( {M}_{\overline{c}\overline{l}} \) is the area of ‘negative match’ where both factor class and landslide are absent. The value of YC ranges between −1 and +1. A negative YC means less spatial association, whereas a positive YC means high spatial association (Ghosh et al. 2011). Based on the YC, the landslide favourability score for each factor class is generated using Eq. (4).

$$ \mathrm{LOFS}=\left\{\begin{array}{ll}0& \mathrm{for}\ {\mathrm{Y}}_{\mathrm{c}}\le 0\\ {}{\mathrm{Y}}_{\mathrm{c}}/{\mathrm{Y}}_{\mathrm{c},\max }& \mathrm{for}\ {\mathrm{Y}}_{\mathrm{c}}>0\end{array}\right. $$
(4)

where LOFS stands for landslide observed favourability score and YCmax is the highest value among all YC values of the predisposing factor class.

The LOFS values can determine the predictor sub-class weight, but the absolute value of the landslide predisposing factor, on the whole, can be determined by the ratio of difference of spatial association (YC) as shown in Eq. (5).

$$ \mathrm{PR}=\left[{\mathrm{SA}}_{\mathrm{max}}-{\mathrm{SA}}_{\mathrm{min}}\right]/\left[{\left({\mathrm{SA}}_{\mathrm{max}}-{\mathrm{SA}}_{\mathrm{min}}\right)}_{\mathrm{min}}\right] $$
(5)

where PR stands for predictor rating and SA stands for the spatial association between factor classes and landslides.
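The three MCWO quantities in Eqs. (3) to (5) can be sketched as follows. The function names and the dictionary layout are hypothetical, and the sketch assumes that all four areas in Eq. (3) are positive so that no division by zero occurs.

```python
import math


def yule_coefficient(m_cl, m_ncl, m_cnl, m_ncnl):
    """Eq. (3). Areas: class & landslide, no class & landslide,
    class & no landslide, no class & no landslide."""
    a = math.sqrt(m_cl / m_ncl)
    b = math.sqrt(m_cnl / m_ncnl)
    return (a - b) / (a + b)


def lofs(yc_values):
    """Eq. (4): scale positive coefficients by the maximum; others score 0."""
    yc_max = max(yc_values)
    return [yc / yc_max if yc > 0 else 0.0 for yc in yc_values]


def predictor_ratings(yc_by_factor):
    """Eq. (5): range of spatial association of each factor divided by
    the smallest range found among all factors."""
    ranges = {name: max(v) - min(v) for name, v in yc_by_factor.items()}
    smallest = min(ranges.values())
    return {name: r / smallest for name, r in ranges.items()}
```

With made-up coefficients, `lofs([-0.2, 0.25, 0.5])` gives scores of 0, 0.5 and 1, and a factor whose coefficients span a range four times wider than the narrowest factor receives a predictor rating of about 4.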

Weights of evidence (WoE) method

Weights of evidence (WoE) was primarily developed for mineral exploration applications (Bonham-Carter 1994). However, owing to its broad applicability and scope, it has also been used in the field of landslide susceptibility zonation (Mathew et al. 2007). WoE is based on the concept of prior and posterior probability, assuming that input layers are independent of one another (Neuhäuser and Terhorst 2007).

An open-source geospatial tool in ArcGIS (Arc-SDM-10.5, Sawatzky et al. 2009) was used to calculate weights (W+ and W-) depending on the association between landslides and the layers for each class. Also, other parameters like contrast (c) and studentised contrast (Sc) are estimated to provide a spatial relationship between landslides and predisposing factors. The WoE method is discussed in detail by Neuhäuser and Terhorst (2007), Mathew et al. (2007), Blahut et al. (2010) and Pudi et al. (2018).
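For orientation, the weights and contrast can be sketched from cell counts using the standard weights-of-evidence formulation for a binary pattern. This sketch is an assumption about the general method, not a reproduction of the Arc-SDM tool or its interface; the function name, the argument order and the variance approximation used for the studentised contrast are choices made for this example.

```python
import math


def weights_of_evidence(n_bd, n_bnd, n_nbd, n_nbnd):
    """Cell counts: class & landslide, class & no landslide,
    outside class & landslide, outside class & no landslide."""
    n_d = n_bd + n_nbd      # all landslide cells
    n_nd = n_bnd + n_nbnd   # all non-landslide cells
    # W+ compares how often the class occurs with and without landslides.
    w_plus = math.log((n_bd / n_d) / (n_bnd / n_nd))
    # W- does the same for the absence of the class.
    w_minus = math.log((n_nbd / n_d) / (n_nbnd / n_nd))
    contrast = w_plus - w_minus
    # Approximate standard deviation of the contrast from the cell counts.
    std = math.sqrt(1 / n_bd + 1 / n_bnd + 1 / n_nbd + 1 / n_nbnd)
    return w_plus, w_minus, contrast, contrast / std
```

A class in which landslides are over-represented yields a positive W+ and a negative W-, so the contrast is positive; a class with no association yields values of zero.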

Logistic regression (LR) method

Logistic regression is a multivariate technique that models the relationship between a dichotomous dependent variable and a set of independent variables. The landslide distribution in the study area comprises the training data and an equal number of randomly selected non-landslide data (Lee et al. 2002). The status of each cell in the landslide database is represented as ‘1’, indicating the presence of landslides, or ‘0’, indicating the absence of landslides (Yesilnacar and Topal 2005). The model was executed in the Statistical Package for the Social Sciences (SPSS 2017). It is based on the logistic function f(z), which is defined in Eq. (6).

$$ \mathrm{f}\left(\mathrm{z}\right)=1/\left(1+{\mathrm{e}}^{-\mathrm{z}}\right) $$
(6)

where z varies from −∞ to +∞. To obtain the logistic model from the logistic function, z is written as a linear combination of a constant (the intercept of the model) and the products of the independent variables and their respective coefficients (Eq. (7)).

$$ \mathrm{z}={\beta}_o+\sum \limits_{i=1}^n{\beta}_i{X}_i $$
(7)

where βo is the intercept of the model, βi is the coefficient of the independent factor Xi, and i varies from 1 to n.
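Eqs. (6) and (7) translate directly into code. The sketch below only evaluates a fitted model; estimating the coefficients (done in SPSS in this study) is not shown, and the function names and the coefficient values in the usage note are hypothetical.

```python
import math


def logistic(z):
    """Eq. (6), written in a form that avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def landslide_probability(intercept, coefficients, factors):
    """Eq. (7) followed by Eq. (6): probability of the '1' (landslide) class."""
    z = intercept + sum(b * x for b, x in zip(coefficients, factors))
    return logistic(z)
```

For instance, with an intercept of −1.0 and coefficients 0.5 and 1.0 applied to factor values 2.0 and 0.0, z equals 0 and the predicted probability is 0.5.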

Artificial neural network (ANN) method

The artificial neural network (ANN) is one of the widely used techniques in landslide susceptibility modelling (Gómez and Kavzoglu 2005). The purpose of an ANN is to build a model of the data generating process so that the network can predict outputs from inputs through a learning process (Lee 2005). A feed-forward network using the multi-layer perceptron (MLP) technique, comprising input, hidden and output layers (three-layer architecture), was utilised in the study (Fig. 6). A detailed description of the MLP can be found in Basheer and Hajmeer (2000). Input data are transformed into output classes through interconnected neurons whose weighted inputs are subsequently summed (Kanungo et al. 2006). The number of neurons in the input and output layers depends on the number of data sources, and the number of hidden neurons is often determined by trial and error. These networks are generally non-linear and can process and analyse intricate data patterns (Kanungo et al. 2009). The network learns by adjusting the weights between the neurons in response to the errors between the actual output values and the target output values based on specific algorithms (Lee et al. 2004).

Fig. 6 The architecture of the ANN model used in landslide susceptibility mapping

Two stages that are generally involved in using neural networks for multisource classification are (i) the training stage wherein internal weights are adjusted and (ii) the classifying stage (Lee et al. 2004). Weights physically represent connections between processing units or neurons, and each neuron has a rule for summing the input weights and a rule for calculating an output value (Ermini et al. 2005). The rules can be formed from different algorithms which are implemented until the desired threshold is reached. The back-propagation algorithm, which is generally used and also applied in the present study, trains the network until some minimal targeted error is achieved between the desired and actual output values (Bishop 1995; Pradhan et al. 2010). Formally, the input that a single node receives is weighted according to Eq. (8).

$$ {\mathrm{net}}_{\mathrm{b}}=\sum \limits_{\mathrm{a}}{w}_{\mathrm{a}\mathrm{b}}\ast {\mathrm{o}}_{\mathrm{a}} $$
(8)

where wab represents the weight between nodes a and b and oa is the output from node a. The output from node b is given by Eq. (9).

$$ {\mathrm{o}}_{\mathrm{b}}=\mathrm{f}\left({\mathrm{net}}_{\mathrm{b}}\right) $$
(9)

The function f is usually a non-linear sigmoid function that is applied to the weighted sum of inputs before the signal propagates to the next layer. The error, E, for an input training pattern, I, is a function of the desired output vector, d, and the actual output vector, o, given by Eq. (10), where the sum runs over the output nodes c.

$$ \mathrm{E}=\frac{1}{2}\sum \limits_{\mathrm{c}}{\left({\mathrm{d}}_{\mathrm{c}}-{\mathrm{o}}_{\mathrm{c}}\right)}^2 $$
(10)

The error is propagated back through the neural network and is minimised by adjusting the weights between layers (Paola and Schowengerdt 1995).
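The forward pass of Eqs. (8) and (9), the error of Eq. (10) and a single weight update can be sketched as below. This is a toy illustration under stated assumptions, not the network used in the study: there are no bias terms (Eq. (8) has none), only the hidden-to-output weights are updated, and the learning rate, weights and inputs are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights):
    """Eqs. (8)-(9): weights[b][a] links node a of the previous layer to node b."""
    return [sigmoid(sum(w * o for w, o in zip(row, inputs))) for row in weights]


def forward(inputs, w_hidden, w_output):
    hidden = layer_output(inputs, w_hidden)
    return hidden, layer_output(hidden, w_output)


def error(desired, actual):
    """Eq. (10): half the sum of squared differences over the output nodes."""
    return 0.5 * sum((d - o) ** 2 for d, o in zip(desired, actual))


def update_output_weights(hidden, outputs, desired, w_output, rate=0.5):
    """One gradient-descent step on the hidden-to-output weights only."""
    updated = []
    for row, o, d in zip(w_output, outputs, desired):
        delta = (d - o) * o * (1.0 - o)  # negative dE/dnet for a sigmoid node
        updated.append([w + rate * delta * h for w, h in zip(row, hidden)])
    return updated


# Made-up inputs, weights and target, for illustration only.
x, target = [0.2, 0.9], [1.0]
w_h, w_o = [[0.5, -0.4], [0.3, 0.8]], [[0.7, -0.2]]
hidden, out = forward(x, w_h, w_o)
e_before = error(target, out)
w_o = update_output_weights(hidden, out, target, w_o)
e_after = error(target, forward(x, w_h, w_o)[1])
```

In this toy case the error after the update is smaller than before, which is the behaviour that back-propagation repeats until the targeted error is reached.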

The training phase was executed with seven predisposing factors (i.e. slope, aspect, lithology, landform, LULC, soil and structure) together with landslide and non-landslide data. The values were normalised and fed into the ANN architecture. The ANN produced hidden layer weights and an importance matrix through 12 non-linearly connected neurons.

Model training and validation

Sample preparation

The landslide databases of 2014 and 2017 were created as polygons in the form of ESRI shape files. These shape files form the base for training and testing of the susceptibility models. Both training and testing of models were carried out in a raster environment. Hence, the polygon shape files were converted to a raster file of 30 m × 30 m grid size. However, where the landslide area is less than 900 m², the polygons were first converted to points (centroids of the polygons) and then rasterised to the 30 m × 30 m grid. Thus, all landslides were converted to the 30 m × 30 m grid. The WoE, LR and ANN models require the dependent variable to be input as points during model execution. Hence, the 30 m × 30 m grid cells corresponding to the training data were further converted to points and input to these three models to calculate the weights of the independent variables.
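The conversion rule described above can be sketched in plain Python. This is an illustrative approximation only: the grid origin at (0, 0), the cell-centre rule for larger polygons and all function names are assumptions, and the GIS tools used in the study may assign cells differently.

```python
CELL = 30.0  # grid size in metres


def polygon_area_and_centroid(vertices):
    """Shoelace formulas for a simple polygon given as [(x, y), ...]."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return abs(a), (cx / (6.0 * a), cy / (6.0 * a))


def contains(vertices, x, y):
    """Ray-casting point-in-polygon test."""
    inside = False
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            inside = not inside
    return inside


def landslide_cells(vertices):
    """Return (col, row) indices of the 30 m cells representing one landslide."""
    area, (cx, cy) = polygon_area_and_centroid(vertices)
    if area < CELL * CELL:
        # Landslides smaller than 900 m^2: a single cell at the centroid.
        return {(int(cx // CELL), int(cy // CELL))}
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    cells = set()
    for col in range(int(min(xs) // CELL), int(max(xs) // CELL) + 1):
        for row in range(int(min(ys) // CELL), int(max(ys) // CELL) + 1):
            # Assumed rule: a cell is kept when its centre lies in the polygon.
            if contains(vertices, (col + 0.5) * CELL, (row + 0.5) * CELL):
                cells.add((col, row))
    return cells
```

A 10 m × 10 m landslide therefore maps to exactly one cell, whereas a 60 m × 60 m landslide aligned with the grid maps to four cells.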

Data training and validation

Any prediction model aims to find the probability of future occurrence of landslides using historical landslide data. This means that a prediction made using a historical landslide inventory needs to be validated using subsequent landslide data. Generally, in the absence of a subsequent (i.e. future) inventory, the standard approach is to select the landslide inventory of a particular year and randomly split it in a 70:30 ratio (Pellicani et al. 2017; Taalab et al. 2018; Vakhshoori et al. 2019; Xiao et al. 2020) or to take equal numbers of training and testing samples (Kavzoglu et al. 2014; Segoni et al. 2020). In another study, Guzzetti et al. (2006) showed that susceptibility models generated using a large number of landslides perform better than models trained with fewer landslides. This indicates that training sample size also influences the performance of susceptibility models. In this study, the landslide inventory databases of 2014 and 2017 were used to design four strategies of training and testing samples to assess the effect of disparities in the sample population (both spatial and temporal) on model performance. The random splitting of samples into training and testing data was iterated ten times to rule out the possibility that the accuracy obtained for the susceptibility models is a result of chance (Kanungo et al. 2006; Lombardo et al. 2020). The landslide polygons were split randomly into training and testing data using the geostatistical analyst tool of ArcGIS 10.5 software and subsequently rasterised as explained in the previous section.

  I. Strategy 1: Spatial sampling - The landslide inventory of 2014 was considered for model generation and validation, wherein the dataset was randomly divided into 70% (training) and 30% (testing) landslides (Fig. 7a). This is the most common approach followed in landslide susceptibility modelling (Aleotti and Chowdhury 1999; Guzzetti et al. 1999; Ghosh et al. 2011).

  II. Strategy 2: Temporal sampling - The inventory of 2014 (100%) was considered to train the models, and the inventory of 2017 (100%) was used to test the models (Fig. 7b). This is the ideal approach to validate the performance of landslide prediction models (Chung and Fabbri 2003).

  III. Strategy 3: Temporal sampling (size constrained testing) - The inventory of 2014 (100%) was considered to train the models, and the inventory of 2017 (50%) was used to test the models. This was done to remove the bias of oversampled testing data by approximately equalising the testing and training sample populations (Fig. 7c).

  IV. Strategy 4: Temporal sampling (geographic constrained testing) - The inventory of 2014 (100%) was considered to train the models, and the inventory of 2017 (50%) was constrained geographically to test the models. There is one cluster of landslides in the western part of Lunglei district (Fig. 1). This cluster boundary was used to geographically constrain the selection of testing samples. Herein, 20% of the 2017 testing samples were drawn from landslides within the cluster and 80% from landslides in the remaining area outside the cluster (Fig. 7d). This helped us to assess the effect of spatial bias in the testing sample population on the performance of the models.
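The four strategies listed above can be sketched as simple random selections over landslide identifiers. This is a hedged illustration: the study used the ArcGIS geostatistical analyst tool for splitting, whereas the helper names, the rounding rule and the assignment of identifiers to the Lunglei cluster below are assumptions. The inventory sizes (1205 landslides in 2014 and 2265 in 2017) are those reported in this paper.

```python
import random


def share(n, percent):
    """Integer count for a percentage of n, rounding halves up."""
    return (n * percent + 50) // 100


def split_70_30(ids, rng):
    """Strategy 1: random 70:30 split of a single inventory."""
    shuffled = ids[:]
    rng.shuffle(shuffled)
    n_train = share(len(shuffled), 70)
    return shuffled[:n_train], shuffled[n_train:]


def size_constrained(test_ids, rng, percent=50):
    """Strategy 3: random subset of the later inventory."""
    return rng.sample(test_ids, share(len(test_ids), percent))


def geographic_constrained(cluster_ids, other_ids, n_test, rng, cluster_percent=20):
    """Strategy 4: a fixed share of the test set from the cluster, the rest outside."""
    n_cluster = share(n_test, cluster_percent)
    return rng.sample(cluster_ids, n_cluster) + rng.sample(other_ids, n_test - n_cluster)


rng = random.Random(42)
inv_2014 = list(range(1205))
inv_2017 = list(range(2265))
train_s1, test_s1 = split_70_30(inv_2014, rng)   # strategy 1
train_s2, test_s2 = inv_2014, inv_2017           # strategy 2 uses both in full
test_s3 = size_constrained(inv_2017, rng)        # strategy 3
# Made-up cluster membership: the first 1130 identifiers stand in for the cluster.
cluster, outside = inv_2017[:1130], inv_2017[1130:]
test_s4 = geographic_constrained(cluster, outside, len(test_s3), rng)
```

Re-running the random splits with different seeds mimics the ten iterations used to check that the reported accuracy is not a result of chance.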

Fig. 7 Number and distribution of training and testing landslides in four sampling strategies. a Strategy 1, Spatial sampling; b Strategy 2, Temporal sampling; c Strategy 3, Temporal sampling (size constrained testing) and d Strategy 4, Temporal sampling (geographic constrained testing). The bias of the testing sample population has been reduced in Strategy 4 (fewer samples in d1 in comparison to c1).

The landslide susceptibility models were validated for their predictive performance using the receiver operating characteristic (ROC) curve (Blahut et al. 2010; Frattini et al. 2010; Ghosh et al. 2011). False positives and true positives were calculated from a contingency table by applying a range of different cut-offs (Frattini et al. 2010). The ROC curve was created as a two-dimensional graph of true positive rate (y-axis) against false positive rate (x-axis). The area under the curve (AUC) of the ROC is the quantitative measure of susceptibility model performance (Sarkar et al. 2013). The ROC curve depicts the relative trade-offs between benefits (true positives) and costs (false positives) (Fawcett 2006).
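The ROC construction described above can be sketched as follows: the cut-off is lowered through the sorted susceptibility scores, a (false positive rate, true positive rate) point is recorded at each distinct score, and the AUC is obtained with the trapezoidal rule. The function names and the example scores are illustrative assumptions, and the sketch expects at least one landslide and one non-landslide sample.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the cut-off through all scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Record a point only once all samples sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With made-up scores `[0.9, 0.8, 0.7, 0.6]`, the labels `[1, 1, 0, 0]` give an AUC of 1.0 (perfect ranking), whereas `[1, 0, 1, 0]` give 0.75.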

Results and discussion

Model training

The four sampling strategies rest on two training cases, i.e. 70% and 100% of the 2014 landslides (844 and 1205 landslides, respectively), which were used to train the five models. The two training cases, owing to their uniform spatial disposition, preserve the effective control of predisposing factors on the prediction of landslides by the five models. The control of the predisposing factors for the 70% training case is summarised in Table 2 and that for the 100% training case in Table 3.

Table 2 Relative control of predisposing factors on susceptibility models estimated with 70% training data
Table 3 Relative control of predisposing factors on susceptibility models estimated with 100% training data

As shown in Tables 2 and 3, the training sample population influences the relative weight of the predisposing factors. Lithology, land use and aspect have the highest control on the occurrence of landslides in all five methods. Interestingly, however, when the models are trained with 100% of the 2014 landslides, the role of slope is diminished in comparison to training with 70% of the 2014 landslides. This corroborates our understanding that landslides in the Northeast Himalayas in India occur in all kinds of slope conditions provided the right kind of lithology (e.g. alternating sandstone-shale-siltstone bands) exists.

Model validation

Landslide susceptibility maps generated using 70% and 100% of 2014 landslides as training datasets were validated with spatial and temporal testing landslide data as per the strategies described in the “Data training and validation” section. The accuracy as AUC (%) estimated using the ROC curve provided a direct comparison of the model performance among the four types of space and time partitioned landslide inventory testing datasets. Table 4 shows AUC (%) obtained using ten iterations of training/testing sampling strategies (1, 3 and 4). Strategy 2 involves only one iteration since entire temporal data were used for training and testing of models. The standard deviation of AUC estimated with ten random splitting iterations is quite low (Table 4), which indicates that model performance is not a result of chance. The accuracy of models reported in this study corresponds to the maximum AUC value.

Table 4 Comparison of spatial and temporal sampling strategies and split iterations on model performance (AUC in %)

In sampling strategy 1, the maximum AUC (84%) was obtained for the IV and ANN models, followed by the WoE (80%), LR (78%) and MCWO (77%) models (Table 4). This indicates that IV, ANN and WoE are the best performing models for landslide susceptibility mapping of a large area in comparison to the MCWO and LR models when the spatial sampling method is used. The AUC values obtained using sampling strategy 2 are MCWO (66%), IV (66%), WoE (65%), LR (73%) and ANN (74%). There is thus a substantial decrease in the performance of the IV and WoE models when the temporal sampling strategy (strategy 2) is adopted. This is mainly due to an increase in testing sample size (664%), while the training size has increased only by 30% (Fig. 7b). The results shown in Table 4 indicate that the ANN method, which produced the maximum AUC (74%) with the temporal sampling strategy, is effective in training predisposing factors with a larger training population (from 844 to 1205).

Landslide occurrence in 2017 was pervasive owing to the high intensity of rainfall. The peak rainfall recorded in June 2017 was ~700 mm, almost twice that of June 2014 (Fig. 2). This factor was taken into account while devising the temporal sampling strategies. Further, the influence of dataset size (number of landslide occurrences) was also analysed while validating the models. Therefore, in strategy 3, i.e. size constrained testing, we considered 50% of the 2017 landslides as testing samples (1133 landslides), which were validated against the models trained using 100% of the 2014 landslides. The results (Table 4) are similar to those of strategy 2, which indicates that the testing sample size has little influence on the performance of susceptibility models and that models trained with 1205 landslide samples are adequate to predict a future large landslide event. A large cluster (approx. 50% of the total) of the 2017 landslides is present in the Lunglei region (Fig. 1). This was due to intense cyclonic rainfall in the Lunglei region, an anomalous scenario occurring during the regular seasonal monsoon. Even in a random selection of testing data, more samples are selected from that area, creating a possible spatial bias in the modelling results (Fig. 7c1). In order to evaluate the effect of geographic bias in the testing sample population on model performance, a stratified random sampling strategy, i.e. strategy 4, was considered, wherein 20% of the testing landslides were randomly picked from the Lunglei cluster (227 landslides), while the remaining 80% were randomly picked from the rest of the state (906 landslides). This reduced the number of testing samples in the Lunglei cluster (Fig. 7d1) and gave a better geographic distribution of testing samples in the remaining area outside the cluster, while retaining the total sample population of 50% (1133) of the 2017 landslides. The results show an increase in the AUC (Table 4) of the WoE and IV susceptibility models for strategy 4 in comparison to strategies 2 and 3, indicating that the distribution of the testing sample population influences model performance. However, strategy 4 has no effect on the AUC of the ANN model (Table 4) in comparison to strategies 2 and 3, indicating that the distribution of the testing sample population has no influence on the performance of the ANN model.

Results from the four sampling strategies show that the number and distribution of landslides have a role in the performance of susceptibility models covering a large area (Table 4). Non-linear methods (e.g. LR and ANN) of susceptibility modelling are adaptable to a large area. The ANN method provided the highest accuracy (84%) under strategy 1 and consistently high accuracy (74%) under strategies 2, 3 and 4. The performance of the ANN model is also better than that of the other four models in strategy 2, wherein the testing population (2265) is quite large in comparison to the training population (1205). This indicates that the ANN model is suitable for predicting a large number of future landslides in macro-scale landslide susceptibility modelling over a large area. Figure 8 shows the susceptibility maps of Mizoram state generated using strategies 1 and 2 by the five models. The susceptibility maps were classified into five categories, and the area of testing landslides within each category is also shown in Fig. 8. Strategies 3 and 4 used the same susceptibility maps as strategy 2 and hence are not shown separately.

Fig. 8 Susceptibility map of Mizoram generated by five models using strategies 1 and 2. Area of landslides (%) not used in model generation (i.e. testing data) is shown adjacent to each susceptibility class

Conclusion

The effect of four sampling strategies prepared using the time and space partitioning approach was investigated for regional macro-scale landslide susceptibility mapping of Mizoram state in India, covering an area of 21,087 km². The traditional spatial sampling strategy (i.e. 70:30) showed the highest performance of the susceptibility models but failed to retain similar performance in spatially predicting the subsequent (future) occurrence of landslides.

The training landslide data, together with an increase in the testing sample population from a different boundary condition such as the heavy rainfall of 2017, influence the performance of regional landslide susceptibility models. The prediction performance remains consistently high in the case of the ANN model, irrespective of the size, distribution and temporal variation of the testing data. This implies that the ANN is able to effectively train the predisposing factors for spatially predicting future landslides irrespective of the number of incidences. The effect of removing the geographic bias (Lunglei district cluster) is evident in the performance of the MCWO, IV and WoE models. Nevertheless, ANN is the best model for creating a long-term susceptibility model, followed by the LR model.

The outcome of any landslide susceptibility model largely depends on the experience of the experts applying the method, well-distributed information on past landslides and terrain-specific information. However, when the process needs to be applied over a large area for macro-scale landslide susceptibility mapping with significant variability in terrain conditions, a priori knowledge of which specific method to use may be difficult to obtain; hence, the ANN method, which produced consistently high performance in predicting the spatial probability of future landslide occurrences, is recommended irrespective of the area of investigation.