1 Introduction

Landslides are among the most critical and life-threatening natural disasters and can cause severe economic losses, especially in the hilly terrain of North-Eastern India, where they pose serious challenges during the rainy season. Formulating adequate methodologies to identify landslide occurrences is therefore a critical task. Several studies have examined machine learning (ML) techniques for landslide identification. For instance, a high-resolution digital elevation model was constructed and its derivatives were exploited to identify bedrock landslides [36]. Advanced remote sensing, visual interpretation and perception have been combined to evaluate remotely sensed images together with topographic surfaces [22]. Studies show that spectral, shape and contextual information can be combined in object-oriented (OO) approaches to landslide identification, and that multi-temporal images can be explored further to identify historical landslides [33, 35], for which the digital terrain model (DTM) is very often used [20]. An object-based approach to landslide inventory mapping, optimized by the Taguchi method, has been proposed [40]. Data segmentation and support vector machines (SVM) can be used together with the DTM to identify forested landslides [60]. DTM derivatives can likewise be combined with random forests (RF) and SVM to identify forested landslides [31]. ML and deep learning (DL) methods have proved efficient and promising for landslide identification as well as for many geo-morphological and geotechnical applications. An integrated method for identifying landslides using ML and DL techniques has been proposed [63], in which the DCNN-11 model and the RecLD landslide database were found to be the most promising.
An adaptive neuro-fuzzy inference system has been proposed for landslide susceptibility mapping in a geographic information system (GIS) environment [3, 42]. A GIS-based SVM model has been proposed for susceptibility mapping of earthquake-triggered landslides [64]. Studies using DT, ANN, GAM, CART, LR and ME models have also been carried out [5, 6, 32, 51, 56, 57]. Although ML techniques are gaining importance, each method has advantages and disadvantages with respect to the factors on which the choice of ML method depends [61]. Consequently, no single technique is universally accepted or preferred for landslide hazard and zonation mapping, and landslide susceptibility mapping and zonation remain a complex and challenging area of study.

2 Causative Factors for the Occurrence of Landslide

The causative factors for landslide occurrence in hilly terrain may be divided into two categories, internal and external. Internal factors such as heavy rainfall, stream erosion, snow melting, groundwater-level change and volcanic eruption [9, 16, 26], and external factors such as expansion of agricultural and built-up areas, deforestation, clear-cutting, shifting agriculture and poorly planned road construction, play an important role in triggering landslides and in increasing their number. The external factors are mostly human activities. Previous studies indicate that the frequency and magnitude of landslides have been increasing, and relate this to elevation, slope gradient, slope aspect, slope curvature, rainfall, fault distance, distance to drainage, distance to road, LULC, NDVI, TWI, STI and SPI [17, 45, 58, 59]. In addition to the accumulation of internal and external factors, climatic extremes in hilly or mountainous regions must also be considered. In the Indian Himachal Region, internal factors such as lithology, altitude, slope steepness, soil fragility and heavy rainfall, together with anthropogenic activities such as rapid deforestation and shifting and expanding agriculture, are major reasons for the increase in landslides in many potentially unstable areas [50]. The growing number of landslides is in turn a cause of tree loss, forest fragmentation, changes in LULC and slope instability [1, 38], so that the natural landscape is strongly affected [14, 52]. Mitigating frequent landslides on unstable slopes and assessing their adverse effects on natural landscapes is therefore one context of this paper.

3 Statistical Approaches for Landslide Susceptibility and Zonation Mapping

Several techniques have been proposed for evaluating landslide susceptibility and hazard zonation, including landslide inventory design, statistical modeling, probabilistic methods and deterministic methods [8, 45, 46, 48]. In the past few years, landslide susceptibility and zonation mapping has shifted from heuristic approaches to statistical (data-driven) approaches. The statistical methods can be broadly classified into two categories: bi-variate and multivariate statistical analysis.

3.1 Bi-variate Statistical Methodologies

Bi-variate statistical techniques rest on inductive reasoning: if a relationship holds in all observed cases, it is assumed to hold in all cases. The general assumption behind these techniques is that “past and present are the key to the future”. Common techniques under the bi-variate statistical approach are the Weight-of-Evidence (WoE) model [39] and the Information Value (IV) model [37]. In addition, the Frequency Ratio (FR) analysis method, also known as the likelihood ratio method, has been proposed [30], as have the Fuzzy Logic approach and the Weighted Overlay method [28].

Weight-of-Evidence (WoE) Approach

WoE is a quantitative, data-driven approach that calculates the weights of the causative factors while avoiding subjectivity in the weighting. Originally developed for the identification and exploration of mineral deposits, the method was later applied to landslide susceptibility; it yields the prior probability, the conditional probability and the positive and negative weights of landslide susceptibility [11, 54]. The positive and negative weights are:

$$ W^{ + ve} = ln\frac{{P\{ B|D\} }}{{P\{ B|\overline{D}\} }} $$
(1)
$$ W^{ - ve} = ln\frac{{P\{ \overline{B}|D\} }}{{P\{ \overline{B}|\overline{D}\} }} $$
(2)

Here ‘P’ denotes probability, ‘\({\text{B}}\)’ and ‘\({\overline{\text{B}}}\)’ denote the presence and absence of a given landslide causative factor, and ‘\({\text{D}}\)’ and ‘\({\overline{\text{D}}}\)’ denote the presence and absence of landslides, respectively.
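As a rough illustration of Eqs. (1) and (2), the positive and negative weights can be computed from a 2 × 2 table of pixel counts. The following is a minimal sketch and not a method taken from the cited studies: the function name `weights_of_evidence`, its argument names and the pixel counts are all hypothetical, and it assumes that every cell of the table is non-zero so that the logarithms are defined.

```python
import math


def weights_of_evidence(n_b_d, n_b_nd, n_nb_d, n_nb_nd):
    """Return (W_plus, W_minus) for one factor class from pixel counts.

    n_b_d   : pixels where the factor class is present and a landslide is present
    n_b_nd  : factor class present, landslide absent
    n_nb_d  : factor class absent, landslide present
    n_nb_nd : factor class absent, landslide absent
    All four counts are assumed to be greater than zero.
    """
    n_d = n_b_d + n_nb_d      # all landslide pixels
    n_nd = n_b_nd + n_nb_nd   # all non-landslide pixels

    # Conditional probabilities estimated as pixel proportions.
    p_b_given_d = n_b_d / n_d
    p_b_given_nd = n_b_nd / n_nd
    p_nb_given_d = n_nb_d / n_d
    p_nb_given_nd = n_nb_nd / n_nd

    w_plus = math.log(p_b_given_d / p_b_given_nd)      # positive weight, Eq. (1)
    w_minus = math.log(p_nb_given_d / p_nb_given_nd)   # negative weight, Eq. (2)
    return w_plus, w_minus


# Invented counts, for illustration only.
w_plus, w_minus = weights_of_evidence(60, 940, 40, 8960)
print(f"W+ = {w_plus:.3f}, W- = {w_minus:.3f}")
```

With these invented counts the factor class is over-represented among landslide pixels, so the positive weight comes out greater than zero and the negative weight less than zero.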

Information Value (IV) Method

Also known as the landslide index method, this method computes a weight for each class of every landslide causative factor from the landslide density in that class [37, 66]. The information value is expressed as:

$$ W = ln\frac{Landslide\,density\,with\,a\,potential\,class\,of\,causative\,factors}{{Landslide\,density\,in\,the\,area}} $$
(3)
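$$ W = ln\frac{{N_{pix} \left( {S_i } \right)/N_{pix} \left( {x_i } \right)}}{{\sum N_{pix} \left( {S_i } \right)/\sum N_{pix} \left( {x_i } \right)}} $$
(4)

where \(N_{pix} \left( {S_i } \right)\) is the number of landslide pixels in class i and \(N_{pix} \left( {x_i } \right)\) is the total number of pixels in class i.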
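The density ratio of Eq. (3) can be illustrated with a short calculation over the classes of a single causative factor. This is only a sketch under stated assumptions: the function name `information_values` and the pixel counts are hypothetical, the landslide density of the area is taken as total landslide pixels over total pixels, and every class is assumed to contain at least one landslide pixel so that the logarithm is defined.

```python
import math


def information_values(landslide_pixels, class_pixels):
    """Return the information value W of each class of one causative factor.

    landslide_pixels[i] : landslide pixels falling in class i (assumed > 0)
    class_pixels[i]     : all pixels belonging to class i
    """
    # Landslide density over the whole study area.
    overall_density = sum(landslide_pixels) / sum(class_pixels)

    values = []
    for n_slide, n_class in zip(landslide_pixels, class_pixels):
        class_density = n_slide / n_class            # density within the class
        values.append(math.log(class_density / overall_density))
    return values


# Invented counts for three slope classes, for illustration only.
iv = information_values([10, 40, 50], [5000, 3000, 2000])
for i, w in enumerate(iv):
    print(f"class {i}: W = {w:.3f}")
```

A negative value marks a class with a lower landslide density than the area as a whole, and a positive value marks a class with a higher density.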

Frequency Ratio Analysis Method

Also known as the likelihood ratio method, FR is one of the most widely used bi-variate statistical techniques. It is based on the correlation between the classes of the potential causative factors and the spatial distribution of past landslides in the study area [7, 30]. FR > 1 indicates a stronger correlation of the class with landslide occurrence, while FR < 1 indicates a weaker one. It can be represented as

$$ FR = \frac{Percentage\,of\,landslide\,in\,a\,class}{{Area\,of\,the\,factor\,class\,as\,a\,percentage\,of\,the\,entire\,area}} $$
(5)

The landslide susceptibility index (LSI) can thus be represented as follows:

$$ LSI = \sum_{i = 1}^n {FR_i } $$
(6)
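Equations (5) and (6) can be sketched in a few lines. The example below is illustrative only: the function names and the pixel counts are hypothetical, and it assumes that the sum in Eq. (6) runs over the causative factors, taking for each factor the FR of the class in which the pixel falls.

```python
def frequency_ratios(landslide_pixels, class_pixels):
    """Return the FR of each class of one causative factor, following Eq. (5).

    landslide_pixels[i] : landslide pixels falling in class i
    class_pixels[i]     : all pixels belonging to class i
    """
    total_landslide = sum(landslide_pixels)
    total_pixels = sum(class_pixels)
    return [
        (n_slide / total_landslide) / (n_class / total_pixels)
        for n_slide, n_class in zip(landslide_pixels, class_pixels)
    ]


def landslide_susceptibility_index(fr_of_pixel_classes):
    """Sum the FR values of the classes a pixel falls in, one per factor (Eq. 6)."""
    return sum(fr_of_pixel_classes)


# Invented counts for two factors, for illustration only.
fr_slope = frequency_ratios([10, 40, 50], [5000, 3000, 2000])
fr_landuse = frequency_ratios([30, 70], [6000, 4000])

# A pixel lying in slope class 2 and land-use class 1.
lsi = landslide_susceptibility_index([fr_slope[2], fr_landuse[1]])
print(f"LSI = {lsi:.2f}")
```

Applying the same lookup and sum to every pixel would give an LSI raster, which can then be reclassified into susceptibility zones.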

Weighted Overlay Method

In this method, the landslide hazard is calculated by assigning weights based on the correlation of landslide frequency with its causative factors [12]. The method assumes that the conditions under which landslides occurred in the past will, if they recur elsewhere in the future, again result in landslides. The higher the weight assigned to a potential causative factor or to one of its classes, the greater its significance for landslide occurrence [25, 29]. It is expressed as:

$$ S = \frac{\sum W*SP}{{\sum W}} $$
(7)

Here ‘W’ denotes the weight assigned to the respective factor, ‘SP’ the weight of the spatial class and ‘S’ the spatial value of the output map.
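For a single pixel, Eq. (7) reduces to a weighted average of the class weights. The sketch below is a minimal illustration; the function name `weighted_overlay` and the weights used are hypothetical values chosen for the example, not values from the cited studies.

```python
def weighted_overlay(factor_weights, class_weights):
    """Return the output value S of one pixel, following Eq. (7).

    factor_weights[i] : weight W assigned to causative factor i
    class_weights[i]  : weight SP of the class of factor i in which the pixel falls
    """
    weighted_sum = sum(w * sp for w, sp in zip(factor_weights, class_weights))
    return weighted_sum / sum(factor_weights)


# Invented weights for three factors, for illustration only.
s = weighted_overlay([5, 3, 2], [9, 4, 1])
print(f"S = {s}")
```

Because the sum is divided by the total factor weight, S stays on the same scale as the class weights, whatever scale is used for the factor weights.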

3.2 Multivariate Statistical Analysis and Methodologies

The multivariate statistical approach to landslide hazard zonation and susceptibility mapping is based on the relative contribution of each potential instability factor to the overall landslide susceptibility of the study area [41]. These methods determine the percentage of landslide area for every pixel, produce a data layer of landslide presence and absence, and then reclassify the hazard accordingly. Logistic regression (LR) analysis and discriminant analysis fall under this category.

Logistic Regression (LR) Analysis

In the LR method, the relationship between landslide occurrence and the factors on which it depends is represented by the following equation:

$$ P = \frac{1}{{1 + e^{ - z} }} $$
(8)

where P is the probability of landslide occurrence and z is a linear combination of the form:

$$ z = c_0 + c_1 x_1 + c_2 x_2 + \ldots + c_n x_n $$
(9)

where \(x_i\) (i = 1, 2, 3, …, n) are the environmental factors for landslides, \(c_0\) is the model intercept and \(c_i\) (i = 1, 2, 3, …, n) are the regression coefficients. This methodology has been applied extensively to landslide susceptibility in the Umbria region of central Italy [19]. LR has also been used in a landslide hazard zonation mapping model for Hong Kong based on a DEM in a GIS [49]. A comparative analysis of different ML techniques and a heuristic model for predicting landslides has also been carried out [23].
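Once the coefficients are known, Eqs. (8) and (9) can be evaluated directly. The sketch below only shows this evaluation step and does not fit a model: the function name `landslide_probability`, the coefficients and the factor values are hypothetical, and in practice the coefficients would come from fitting LR to a landslide inventory.

```python
import math


def landslide_probability(intercept, coefficients, factors):
    """Return P from Eq. (8), with z computed as in Eq. (9).

    intercept    : c_0
    coefficients : [c_1, ..., c_n]
    factors      : [x_1, ..., x_n], the factor values at one location
    """
    z = intercept + sum(c * x for c, x in zip(coefficients, factors))
    return 1.0 / (1.0 + math.exp(-z))


# Invented coefficients and factor values (slope in degrees, rainfall in mm),
# for illustration only.
p = landslide_probability(-4.0, [0.08, 0.002], [35.0, 1200.0])
print(f"P = {p:.3f}")
```

Since the logistic function maps any z to the interval (0, 1), the result can be read directly as a susceptibility score, with z = 0 corresponding to P = 0.5.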

Discriminant Analysis Methodology

Discriminant analysis is a frequently used multivariate statistical modeling technique that finds the maximum difference between two groups, landslide and non-landslide, for each potential causative factor. The method assumes all dependent variables to be categorical rather than continuous [17]. The weights are then calculated on the basis of this maximum difference. The method has two variants: (a) Quadratic Discriminant Analysis (QDA) and (b) Linear Discriminant Analysis (LDA) [62]. It yields the Standardized Discriminant Function Coefficients (SDFC), which express the relative significance of each variable in the discriminant function as a predictor of slope instability, itself regarded as one of the most important factors in landslide occurrence. Variables with the largest SDFC are the most strongly correlated with the presence or absence of landslides in the study area [21, 44].

4 Pros and Cons of Different Statistical Methodologies for Landslide Susceptibility and Zonation Mapping

Previous studies suggest that the advantages and disadvantages of a technique depend on the context in which it is applied, on how the data are procured or collected and on the scale of application [4, 19]. Statistical methodologies are built on the correlation between past landslides and their causative factors; weights are assigned to the factors to measure this correlation, and these weights are determined statistically. In effect, they analyse the functional relationship between the thematic factors or variables and the distribution of slope deterioration, also termed the landslide inventory. Statistical techniques are advantageous because they can be applied over a large area and because past landslide data can be used extensively to determine, stratify and calculate the weights of the various causative factors, as in the WoE model [2]. There are, however, limitations to these data-driven techniques. The fundamental disadvantage is the need to collect past landslide inventory data over a large area, since the general rules for landslide susceptibility are formulated from the past landslides in that area. A well-defined and well-distributed landslide inventory is therefore essential as input for reliable results. No fully accepted technique exists for this, which motivates more extensive study of landslide susceptibility and hazard zonation mapping [53].

In addition, effective and reliable results require that the necessary input data be collected and validated, yet such data are rarely available. Considerable effort is therefore needed, together with extensive interaction between geo-morphologists and statisticians, to process and assess the collected geo-environmental, geo-morphological and landslide data. The study area also plays a crucial role: statistical models depend strongly on the study area, which makes it difficult to compare landslide susceptibility classes from different locations. Moreover, studies are much needed in hilly terrain for future geo-morphological and environmental planning, but maps produced by statistical methodologies are very often not understandable to non-specialists, including planners and stakeholders [13, 43, 46, 64]. Research indicates that statistical techniques are well suited to medium-scale studies in a data-scarce environment; small-scale studies are also possible, but the results may be less reliable because data collection over a large geographical area is less feasible, which makes the statistical methodologies less practical, or impractical, at that scale.

5 Landslide Susceptibility Assessment and Zonation Mapping Techniques: Literature Survey in Indian Context

This section summarizes the literature on landslide susceptibility and zonation mapping techniques in the Indian context. An in-depth study of landslides along the national highway NH-39 in Manipur examined the various landslide triggering mechanisms and found that the landslides are caused by wedge failure, which makes the slopes unstable [27]. It also observed that terrain comprising soil and rock with a high factor of safety (0.62–1.82) is landslide-prone. A landslide risk and hazard assessment technique using an index value, the landslide nominal risk factor (LNRF), together with GIS techniques has been proposed for the Ramganga catchment, Himalayas [18]. Among heuristic techniques, a quantitative methodology has been developed for landslide hazard zonation based on a numerical rating scheme called the landslide hazard evaluation factor (LHEF). For the Indian mountains, a comparison between BIS and WoE [15] shows that the latter produces better results. A comparison of the BIS, MCA and FR methods [24] showed that the FR method performs best of the three. Studies have also been carried out to establish the impact of landslides on human lives in the Himalayan region of India [10]. Taken together, these studies make clear that the landslide database or inventory is insufficient, a gap that a universally accepted procedure, developed through extensive study, could close.

6 Recent Gaps and Future Directives in the Study of Landslide Susceptibility Mapping and Hazard Zonation

The Indian Himalayan Region is highly susceptible to natural disasters, landslides among them. Landslide susceptibility mapping and hazard assessment are crucial for picking out susceptible areas and assessing the risk, so that disaster and economic losses can be minimized. The national-level organizations and institutions associated with disaster mitigation and analysis in India include NRSC, IIRS, NIDM, ISRO, GSI and BMTPC. To obtain landslide susceptibility maps, GSI applies AHP, a knowledge-driven approach, to rate the factor classes and assign weights to the potential factors. AHP is a semi-quantitative method that assigns weights through pairwise relative comparison in the decision process without any inconsistencies. However, AHP gives no certainty about the ranking of the geo-factors, which may differ from expert to expert. Other quantitative techniques therefore need to be compared with AHP in order to prepare useful landslide susceptibility maps. The NRSC plays a significant role in preparing landslide inventories from earth observation data and has prepared historical landslide inventories using a semi-automatic image analysis algorithm [34]. The historical landslide inventories and landslide susceptibility maps are limited because they are event-based, tied to events such as earthquakes and rainfall. Multi-spatial, multi-temporal and non-event-based landslide inventories and susceptibility mapping are therefore crucial, and their absence is an existing gap in disaster mitigation.

7 Conclusion

Landslide identification, susceptibility mapping and hazard zonation are comprehensive, crucial and at the same time very demanding tasks, because historical landslide inventories and the datasets required by the existing statistical and knowledge-driven methodologies are limited and difficult to acquire. Statistical and quantitative methodologies are found to be reliable and promising, and landslide identification, prediction and hazard mitigation based on them are comprehensible because they rest on real, interpreted data; however, the limited availability of credible data makes these techniques effort-intensive. Furthermore, the purpose of the investigation, the extent of the study area, the type of landslide and the availability of resources must be considered when choosing a technique. At present, the combination of quantitative and data-driven techniques has made landslide susceptibility and zonation mapping more objective and promising. The study of the subject is nevertheless a never-ending process, and a good understanding of the governing factors is required. ML techniques should be combined with remote sensing and GIS methodologies to produce comprehensive susceptibility maps for this complex natural geo-hazard, so that hazard mitigation and management can be carried out at local, state and national levels.

Table 1 compares the Weight-of-Evidence, Frequency Ratio, Information Value and Logistic Regression methods for regions in the Darjeeling Himalayas.

Table 1. Comparison of WoE, FR, IV, LR