14.1 Introduction

Landslides are a widespread global hazard that cause losses across the geophysical environment. Their devastating impact is not limited to socioeconomic losses but extends to severe environmental consequences. Global disaster reports indicate that landslides account for 4.9% of all natural disasters [1]. The spatiotemporal occurrence of landslides is governed by triggering processes, including earthquakes, extreme rainfall, snow/glacier melting, land-use/land cover (LULC) changes, and various anthropogenic activities that cause vibrations, overburden on soil material, removal of lateral support, and changes in the moisture content of soil or rock structures [2, 3]. Global economic expansion and unplanned, haphazard development in mountainous areas have exacerbated the socioeconomic impacts of landslides in recent times [4, 5]. Hence, for effective landslide risk management, the areas susceptible to landslides should be identified accurately so that adequate response and emergency measures can be administered.

A detailed investigation of the factors that lead to slope failure can provide relevant information about landslide occurrence. Landslide susceptibility mapping (LSM) is considered fundamental to landslide hazard assessment, management, and mitigation [6, 7]. Susceptibility mapping is a complex process of establishing the interdependence between historical landslide events and the topographical, geological, and hydrological variables that are expected to influence landslide occurrence on a regional scale [8]. The stages of landslide susceptibility analysis include landslide inventory generation, identification of landslide causative factors (LCFs), spatial correlation of these variables within a modeling framework, and model validation [9, 10]. The predictive potential of a model depends on the quality and accuracy of the landslide inventory and the optimal selection of LCFs [4, 11]. For reliable susceptibility mapping of an area, landslides must be mapped accurately and the causative factors selected on a sound, logical basis [12, 13]. Optimization of landslide causative factors is crucial for an effective prediction model [14, 15]. Researchers have used factor analysis, multicollinearity analysis, linear correlation, the certainty factor approach, and multifactor set techniques for the optimal selection of LCFs [10, 16]. These techniques have often been found to be inadequate or highly time-consuming, and no standard guidelines are available for the optimal selection of LCFs. Hence, there is a need for an approach to LCF selection that achieves high predictive potential from the applied models within a reasonable time.

The LSM modeling process has evolved over the years from qualitative models such as weight of evidence (WOE) [17, 18], analytical hierarchy process (AHP) [19, 20], etc. toward data-driven models such as the evidential belief functions (EBF) [21, 22], frequency ratio (FR) [23, 24], certainty factor (CF) [25, 26], etc. These techniques have performed satisfactorily for landslide prediction but sometimes fail to capture functional correlations between LCFs [4]. Another drawback of bivariate models is that a hypothesis must be assumed before modeling [27].

Recent advances in machine learning (ML) algorithms and their integration with Python or R programming have proved to be an effective tool for mapping and analyzing natural hazards [28]. Traditional ML algorithms include logistic regression (LR) [4, 29], artificial neural networks (ANN) [30, 31], decision tree (DT) [17, 32], support vector machine (SVM) [28, 33], fuzzy logic (FL) [34, 35], Naïve Bayes (NB) algorithm [36, 37], kernel logistic regression (KLR) [38, 39], random forest (RF) [21, 29], etc. ML algorithms can adapt their internal structure to the type of landslide data and have the potential to analyze and update factor contributions automatically and continuously [37, 40]. However, the results of ML techniques are prone to errors and are sometimes difficult to interpret with respect to the individual contribution of the subclasses of each LCF. In recent times, various ensemble or hybrid learning techniques have been used, combining multiple modeling approaches to improve the overall predictive potential of the model [41]. Commonly used ensemble techniques in landslide susceptibility analysis include bagging, stacking, and boosting. For example, a hybrid bagging-based kernel logistic regression (BKLR) approach was adopted by [42] using two kernel functions. The objective of the current study is therefore to integrate the frequency ratio (FR) statistical model with a radial kernel-based support vector machine (SVM) learning model to analyze and predict potential landslide-prone areas. The prediction and validation accuracy of the FR, SVM, and hybrid FR-SVM models is analyzed using ROC curves.

14.2 Study Area

The study area is the Mandi district of Himachal Pradesh, lying between 31°13′50″–32°04′30″ N and 76°37′20″–77°23′15″ E. The district has a total area of 3951 km² and a population of 901,344. A major part of the district falls in the Lesser Himalayan region, comprising steep and rugged mountain ranges and fluvial valleys. The region's altitude varies between 500 m and 3400 m, from low-lying valleys to higher-elevation mountain ranges. The district has a forest cover of 45%: scrub, sal, and bamboo forests are found at lower elevations, while alpine forests are characteristic of higher elevations. Siwalik and Lesser Himalayan soils, which are generally high in organic matter and occur on rugged topography, are the main soils of the district. The district has two distinct and well-defined hydrogeological units, that is, the porous formations constituted by unconsolidated sediments and the fissured formations. The study area is drained by the Beas and Sutlej rivers. The road density of the region is 155 km per 100 km², which is higher than the state's average density.

A landslide inventory is a prerequisite for analyzing the spatial distribution of landslides, which is necessary to identify potential landslide zones in the study area [43]. The primary landslide inventory was generated by visual interpretation of high-resolution Google Earth images and by analyzing the terrain characteristics derived from the Advanced Land Observing Satellite (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) Digital Elevation Model (DEM). The published landslide inventories from the Himachal Pradesh State Disaster Management Authority (HPSDMA), the Geological Survey of India (GSI), and NASA, along with numerous newspaper articles and previous landslide studies of the study area [44,45,46], were the auxiliary data sources. The spatial information of the landslides was extracted, and an inventory was compiled by incorporating geomorphologic, LULC, landslide magnitude (length and area), and other characteristics. A total of 1723 landslides were mapped, with an average area of 1425 m²; individual landslide areas range from a minimum of 2.5 m² to a maximum of 2.9 × 10⁵ m². The landslide inventory was further split into 70% training and 30% validation datasets in ArcGIS, as suggested by [47,48,49] (Fig. 14.1).
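The 70/30 split described above can be illustrated with a short sketch. The study performed the split in ArcGIS; the Python version below is only an assumed equivalent based on simple random sampling, and the function name `split_inventory` and the seed value are hypothetical.

```python
import numpy as np

def split_inventory(n_landslides, train_fraction=0.7, seed=42):
    """Randomly split landslide indices into training and validation sets.

    The seed is an arbitrary, hypothetical choice made for reproducibility.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_landslides)
    n_train = int(round(train_fraction * n_landslides))
    return np.sort(shuffled[:n_train]), np.sort(shuffled[n_train:])

# 1723 mapped landslides, as reported for the inventory
train_idx, valid_idx = split_inventory(1723)
print(len(train_idx), len(valid_idx))
```

With 1723 landslides, this yields 1206 training and 517 validation records, the same counts as those reported in the conclusion of the chapter.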

Fig. 14.1
A map of the Mandi district of Himachal Pradesh that includes Jogindernagar in the north and Shikari Devi toward the south. The landslide testing and training datasets are marked across the map, with clusters around Sundernagar, Mandi, Pandoh, Sarkaghat, Padhar, and Jogindernagar. Shikari Devi is marked among the areas of highest elevation.

Study area and landslide inventory with training and testing datasets

14.3 Materials and Methods

14.3.1 Landslide Causative Factors (LCFs)

The predictive capabilities of the applied algorithms depend on the quality of processing data derived from data sources such as DEM and satellite images. This study incorporates 11 independent factors influencing landslide occurrence. ALOS-PALSAR digital elevation model (DEM) of 12.5 m resolution was used to derive elevation, slope gradient, slope aspect, curvature, topographical wetness index (TWI), and drainage density whereas the Landsat-8 (OLI) imagery was used to derive the normalized difference vegetation index (NDVI) and lineament density maps. The rest of the thematic layers, such as distance from road, geology, and soil maps, were procured and mapped using data from various government repositories [50, 51]. The summary of data products and derived information is given in Table 14.1. The slope gradient measures the steepness of the hilly slopes and was subdivided into five categories using natural break classification. The plan curvature is defined as the angle of contours generated by their intersection with the horizontal surface. The plan curvature represents the direction of maximum slope and helps in identifying the morphology of topography of the area and differentiating between valleys and ridges [52, 53]. Although the relationship between landslides and aspect is still under investigation, the aspect has a direct relationship with discontinuities, vegetation covers, and soil moisture, which affects the landslide occurrences [36, 54]. The elevation of a region signifies the altitude from mean sea level and is widely used in susceptibility analysis. The northern region of the study area has higher elevation varying from 3000 to 6000 m. The drainage density represents stream length per unit area in a drainage basin. It directly influences the erodibility of slopes that are dissected by channels and influences the surface runoff [55]. The topographical wetness index (TWI) is the measure of accumulation of water in areas having variable elevations. 
Higher TWI values signify a greater tendency of slopes toward erosion [5, 53]. The TWI map of the study area was prepared from the flow accumulation and slope rasters using the equation TWI = ln(FS/tan α), where FS represents the flow accumulation and α represents the slope gradient. The lineament density represents the density of linear surface features that reflect the underlying faults and fractures. The lineaments of the study area were derived from Landsat-8 imagery and the line density tools in GIS [56]. The NDVI is a dimensionless index that gives information about the vegetation cover in an area. The NDVI map of the district was generated from Landsat-8 images by image analysis in ERDAS IMAGINE software using NDVI = (NIR − RED)/(NIR + RED), where NIR represents the reflectance in the near-infrared channel and RED represents the reflectance in the red channel of the electromagnetic spectrum.
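As an illustration of the two indices, the following is a minimal sketch assuming the common forms TWI = ln(FS/tan α) and NDVI = (NIR − RED)/(NIR + RED). The function names, the small constant `eps` that guards flat cells, and the sample values are assumptions for illustration and are not taken from the study, which derived these layers in GIS and ERDAS IMAGINE.

```python
import numpy as np

def twi(flow_accumulation, slope_deg, eps=1e-6):
    """TWI = ln(FS / tan(alpha)).

    eps is a hypothetical guard that avoids log(0) and division by zero
    on cells with zero flow accumulation or zero slope.
    """
    fs = np.asarray(flow_accumulation, dtype=float)
    tan_alpha = np.tan(np.radians(np.asarray(slope_deg, dtype=float)))
    return np.log((fs + eps) / (tan_alpha + eps))

def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED); cells with NIR + RED = 0 are set to 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Illustrative values only
twi_values = twi([100.0, 10.0], [45.0, 10.0])
ndvi_values = ndvi([0.5, 0.3, 0.0], [0.1, 0.3, 0.0])
```

The NDVI result always lies between −1 and 1, with higher values indicating denser vegetation.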

Table 14.1 Data purpose and source along with their scale/resolution

The geology of an area represents the topographical aspects of the underlying surface of an area along with its mineral and rock types [53]. The Jutogh formation of Mandi district comprises slates, schists, and quartzite with hematite. Mandi-Darla volcanics also occur, representing past lava flows. The Shah formation is characterized by salt grit, dolomite, limestone, quartzite, and red shales [57, 58]. The soil properties influence the landslide occurrence in the region [53]. The study area's soil map was procured from the National Bureau of Soil Survey and Land Use Planning (ICAR-NBSS and LUP) and digitized in a GIS environment. Five categories of soils were identified based on their geomorphological and erosion characteristics. The Lesser Himalayan soils of side and reposed slopes are predominant in the district; they are coarse loamy and skeletal loamy soils that facilitate moderate to severe erosion [59]. Additionally, Siwalik soils of fluvial valleys are widespread in Mandi district; they have a sandy to loamy structure and facilitate moderate erosion.

Unplanned road construction often disrupts the natural bed slopes, which results in a higher probability of slope failure [48, 52]. The road network of national highways, state highways, and major district roads in the study area was procured from the MORTH and GSI databases and digitized in GIS software. The Euclidean distance from roads was calculated using the distance-from-line operation in GIS and was classified into five categories from 0 to 500 m at an interval of 100 m to analyze the influence of road construction on landslide occurrence. The methodology adopted for the study area is described through a flowchart as shown in Fig. 14.2, and the maps of thematic layers are shown in Fig. 14.3.

Fig. 14.2
A flowchart in which maps and reports lead to the landslide causative factors and the multicollinearity test. Satellite images and investigation contribute to the landslide inventory and field validation. Together these make up the landslide inventory or database, which feeds the selection of appropriate models, the analysis, and model validation and comparison.

The methodology adopted for the susceptibility analysis

Fig. 14.3
Eleven maps of the same area showing the distribution of each landslide causative factor: slope gradient, plan curvature, slope aspect, elevation, drainage density, lineament density, geology, NDVI, soil, distance from roads, and TWI.

Landslide causative factors: (a) slope gradient, (b) plan curvature, (c) slope aspect, (d) elevation, (e) drainage density, (f) lineament density, (g) geology, (h) NDVI, (i) soil, (j) distance from roads, and (k) TWI

14.3.2 Bivariate Frequency Ratio (FR) Model

The FR is the ratio of the proportion of landslide pixels within a class of a causative factor to the proportion of landslide pixels in the whole study area, both computed from the input raster layers. The FR values are computed for each class of causative factors using Eq. (14.1). An FR value greater than 1 indicates a high correlation of landslides with that class, while a value less than 1 indicates a lower correlation [60]. A study to assess landslide susceptibility using the frequency ratio method was carried out by [61], which incorporated nine predictor variables. The FR model is considered a good modeling algorithm because it is easy to apply and has produced better results than similar models [62]:

$$ \textrm{FR}\left(\textrm{i}\right)=\left({\textrm{N}}_{\textrm{LP}}/{\textrm{N}}_{\textrm{TP}}\right)/\left(\sum {\textrm{N}}_{\textrm{LP}}/\sum {\textrm{N}}_{\textrm{TP}}\right) $$
(14.1)

where:

  • NLP, landslide pixels in each class of landslide factors

  • NTP, total number of pixels in each class

  • ∑NLP, total landslide pixels in the study area

  • ∑NTP, total pixels in the whole study area

The FR values calculated for all landslide factors are combined to produce the final LSM map using Eq. (14.2):

$$ {\textrm{LSM}}_{\textrm{FR}}={\textrm{FR}}_1+{\textrm{FR}}_2+{\textrm{FR}}_3+\dots +{\textrm{FR}}_n $$
(14.2)
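Eqs. (14.1) and (14.2) can be illustrated with a small sketch. The two toy factor rasters, the landslide mask, and the helper names `frequency_ratio` and `fr_raster` are hypothetical and serve only to show the arithmetic; they are not data or code from the study.

```python
import numpy as np

def frequency_ratio(class_raster, landslide_mask):
    """Eq. (14.1): FR per class = (N_LP / N_TP) / (sum N_LP / sum N_TP)."""
    overall = landslide_mask.sum() / landslide_mask.size
    fr = {}
    for c in np.unique(class_raster):
        in_class = class_raster == c
        n_tp = in_class.sum()                     # total pixels in the class
        n_lp = (landslide_mask & in_class).sum()  # landslide pixels in the class
        fr[int(c)] = float((n_lp / n_tp) / overall)
    return fr

def fr_raster(class_raster, fr):
    """Replace each class label by its FR value."""
    out = np.zeros(class_raster.shape, dtype=float)
    for c, value in fr.items():
        out[class_raster == c] = value
    return out

# Toy 3 x 4 rasters: two classified factors and a landslide mask (illustrative only)
factor_a = np.array([[1, 1, 2, 2], [1, 1, 2, 3], [3, 3, 3, 3]])
factor_b = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [2, 2, 2, 2]])
landslides = np.array([[0, 0, 1, 1], [0, 0, 0, 0], [1, 0, 0, 0]], dtype=bool)

fr_a = frequency_ratio(factor_a, landslides)
fr_b = frequency_ratio(factor_b, landslides)

# Eq. (14.2): the susceptibility index is the sum of the FR rasters of all factors
lsm_fr = fr_raster(factor_a, fr_a) + fr_raster(factor_b, fr_b)
```

In this toy case, class 2 of the first factor holds two of the three landslide pixels in only three cells, so its FR is well above 1, whereas class 1 contains no landslides and has an FR of 0.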

14.3.3 Support Vector Machine (SVM) Model

The SVM algorithm is a set of supervised machine learning algorithms used to analyze nonlinear data for regression as well as classification [63]. SVM is a nonparametric approach which uses classification hyperplane along with a set of data points that are closer to the hyperplane called support vectors to maximize classification margin [33]. The SVM is considered a robust algorithm and is widely used in landslide susceptibility analysis.

The kernel trick implicitly projects nonlinearly separable data into a higher-dimensional feature space, and the resulting optimization problem is solved using Lagrange multipliers. Kernel functions can be categorized as radial, linear, polynomial, or sigmoid [10, 64] and are mathematically expressed as Eq. (14.3):

$$ k\left({x}_i,{y}_j\right)=\left[f\left({x}_i\right),f\left({y}_j\right)\right] $$
(14.3)

where xi and yj are the dimensional inputs for kernel function k in an n-dimensional environment. The optimum hyperplane is generated using the decision function given in Eq. (14.4):

$$ y(x)=\left(\alpha .\rho (x)\right)+b, $$
(14.4)

where α represents the orientation vector of the hyperplane, ρ(x) is the input sample x which is to be converted, and b represents the distance of hyperplane from the origin.
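For the radial kernel used in this study, the kernel value is commonly written as k(x, y) = exp(−γ‖x − y‖²). The sketch below evaluates this standard form for a few sample points. The chapter does not report its kernel parameters, so the value of `gamma`, the function name, and the sample points are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Radial kernel k(x, y) = exp(-gamma * ||x - y||^2) for all row pairs.

    gamma is a hypothetical value; it is not reported in the chapter.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq_dist = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

# Three illustrative samples with two features each
samples = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
K = rbf_kernel(samples, samples)
```

Each sample has a kernel value of 1 with itself, and the value decays toward 0 as two samples move apart in feature space, which is what lets the SVM draw a nonlinear boundary in the original space.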

14.4 Result and Discussion

14.4.1 Test for Multicollinearity

A multicollinearity test was carried out to check for correlation among the landslide causative factors, which are the independent variables. Multicollinearity can reduce the accuracy of the model. Tolerance values less than 0.1 and variance inflation factor (VIF) values greater than 10 suggest high correlation between independent variables, and such variables should be removed from the dataset. The results for the 11 landslide factors indicate that the VIF and tolerance values are within the acceptable limits, as shown in Table 14.2.
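The tolerance and VIF thresholds above can be illustrated with a minimal sketch that uses the standard definitions VIF = 1/(1 − R²) and tolerance = 1 − R², where R² comes from regressing one factor on all the others. The synthetic data and the function name `vif_and_tolerance` are assumptions for illustration and do not reproduce the values of Table 14.2.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return a list of (VIF, tolerance) pairs, one per column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    results = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other factors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
        r2 = 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)
        tolerance = 1.0 - r2
        results.append((1.0 / tolerance, tolerance))
    return results

# Synthetic example: three independent factors (illustrative only)
rng = np.random.default_rng(0)
independent = rng.normal(size=(200, 3))
stats_ok = vif_and_tolerance(independent)

# Adding a near-copy of the first factor creates strong collinearity
near_copy = independent[:, 0] + 0.01 * rng.normal(size=200)
stats_bad = vif_and_tolerance(np.column_stack([independent, near_copy]))
```

In the second case, the duplicated factor and its near-copy both exceed the VIF limit of 10, so one of them would be dropped before modeling.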

Table 14.2 Multicollinearity analysis

14.4.2 LSM Using FR Model

All the classes of landslide causative factors were rasterized and reclassified in the GIS environment for analysis. The FR values were calculated for the 11 landslide causative factors, and their spatial relation to landslide occurrences was analyzed. The resultant FR values of each subclass of the landslide factors were calculated in GIS as shown in Table 14.3. The analysis indicates that drainage density, geology, NDVI, distance from road, and TWI were the critical factors affecting the study area's landslide susceptibility. Among the hydrological parameters, areas with moderately high, high, and very high drainage density and TWI values were prone to landslides. Areas in the vicinity of roads were also found to be more susceptible to landslides. In the current study, the 0–100 m distance-from-road class had the highest FR values and hence requires special attention while planning construction activities. The susceptibility map generated was reclassified into very low, low, moderately high, and very high susceptibility zones as shown in Fig. 14.4a.

Table 14.3 Landslide occurrences and their spatial relation with landslide causative factors
Fig. 14.4
Three maps of the Mandi district labeled LSM FR, LSM SVM, and LSM FR-SVM. The first map has very high susceptibility patches toward the west, the second map has high intensities toward the eastern part, and the third map appears similar to the first one.

Landslide susceptibility maps: (a) FR model, (b) SVM model, and (c) FR-SVM model

14.4.3 LSM Using SVM Model

The SVM model was implemented in the GIS environment using R integration and was used for the spatial prediction of landslides in the study area. The susceptibility map generated was reclassified into very low, low, moderately high, and very high susceptibility zones, as shown in Fig. 14.4b. It was observed that slope gradient, drainage density, lineament density, soil, and distance from the road were the vital parameters that influence landslide occurrence. The slope gradient influences landslide occurrence because steeper slopes are more prone to failure. The results also show that the steep (35°–45°) and very steep (>45°) slope gradient classes are susceptible to landslides, whereas landslide occurrences were minimal for flatter slopes (<15°). Similarly, owing to the gentle terrain at lower elevations (400 m–1000 m), these regions also witness fewer landslides. The highest probability of landslides was observed for moderate (1000 m–1500 m) to moderately high (1500 m–2000 m) elevations, but at high (2000 m–2500 m) to very high elevations (2500 m–3500 m), the probability of landslides again decreases.

14.4.4 LSM Using FR-SVM Model

The landslide causative factors were reclassified according to the values obtained from the FR model. The radial-based SVM algorithm was then applied to these reclassified factors to generate the final LSMFR-SVM map. The LSM produced using the FR-SVM model was classified into five categories as shown in Fig. 14.4c. The LSMFR-SVM map analysis indicated that 15.28% of the total area falls into the high landslide susceptibility zone, whereas 6.49% of the total area falls into the very high landslide susceptibility zone. The analysis confirmed that TWI, drainage density, and NDVI were highly correlated with landslide occurrence in the region.

14.5 Discussion

The damage caused by landslides is not limited to socioeconomic loss but has a profound and lasting impact on the overall development of the area. The effective management of landslide risk relies predominantly on the accuracy of the area's landslide susceptibility maps. This study compares the FR, SVM, and hybrid FR-SVM models in terms of their accuracy in predicting landslides.

The analysis of the LSMs produced using the FR, SVM, and hybrid FR-SVM models indicated that areas in the vicinity of roads (0–100 m) with high drainage density and steeper slopes are more susceptible to landslides. In hilly regions such as Mandi district, unplanned road construction activities continuously interfere with the natural bed slope of the region. The areas at lower elevation have more extensive developmental activities than the higher regions of the study area. Since the regions at higher elevations are less accessible, fewer landslide incidences are reported there.

The remaining landslide factors, that is, slope aspect, NDVI, geology, etc., had moderate to low influence on landslide occurrence. The results of this study are in accordance with similar landslide susceptibility studies of mountainous regions [30, 38, 51, 65].

The SVM is considered a robust model and has been applied in a wide variety of landslide susceptibility analyses. However, it does not assign effective relative weights to the subclasses of causative factors. The predictive power of the SVM is greatest when the sample datasets are nonlinear, but its use of kernel functions for classification sometimes leads to overfitting. The FR is a quantitative statistical model that can easily assign factor importance to the subclasses of causative factors in minimal time.

While comparing the results obtained from the models, the FR-SVM model performed better than individual FR and SVM models. The FR-SVM model obtained a higher AUC value (84.7%) for model prediction as compared to FR model (77.9%) and SVM model (81.2%) as shown in Fig. 14.5. Hence, the use of hybrid model for predicting landslide susceptibility helps in eliminating the shortcoming of individual methods and improves the overall accuracy of the model.

Fig. 14.5
Three graphs with R O C curves, with false positives versus true positives. An ascending line in each graph represents a random guess. The curves begin at 0, have a distinct peak at around 0.9 between 0.2 and 0.4 and then rise gradually further. All values estimated.

ROC curves with AUC values: (a) FR model, (b) SVM model, and (c) FR-SVM model
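The AUC values reported for the three models summarize the ROC curves in Fig. 14.5. As an illustration of how such a value is obtained, the sketch below builds ROC points from predicted susceptibility scores and integrates them with the trapezoidal rule. The labels and scores are invented for illustration, the function names are hypothetical, and the sketch assumes that no two samples share the same score.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (FPR, TPR) arrays obtained by sweeping the threshold downward.

    Assumes binary labels (1 = landslide, 0 = non-landslide) and no tied scores.
    """
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    y_sorted = y_true[order]
    true_pos = np.cumsum(y_sorted)
    false_pos = np.cumsum(1 - y_sorted)
    tpr = np.concatenate([[0.0], true_pos / true_pos[-1]])
    fpr = np.concatenate([[0.0], false_pos / false_pos[-1]])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Invented validation labels and susceptibility scores (illustrative only)
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_points(labels, scores)
auc_value = auc(fpr, tpr)
```

An AUC of 0.5 corresponds to the random-guess diagonal in the ROC plots, and values closer to 1 indicate better separation of landslide and non-landslide locations.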

14.6 Conclusion

Landslides constantly threaten the communities residing in landslide-prone mountainous regions. The planning of landslide management and mitigation activities in such regions requires adequate knowledge of the location and probable impact of landslides. The Mandi district of Himachal Pradesh is highly prone to landslides and has witnessed landslide-induced destruction for many years. In the present study, 1723 landslides were documented from various sources and field observations, of which 1206 (70%) were assigned to the training dataset and 517 (30%) to the testing dataset using random sampling. It should be noted that all these landslide incidences were assumed to be triggered by rainfall only, and other triggering factors such as earthquakes, volcanic eruptions, and rapid snowmelt were neglected. Although they are independent, the landslide causative parameters in this study were chosen manually based on the topographical and hydrological conditions of the study area. Eleven factors having high correlation with landslide occurrences were selected for the identification of areas prone to landslides. The results of all models indicated that drainage density, slope gradient, and road distance are the three variables most highly correlated with landslide occurrence. The results confirmed that the hybrid FR-SVM model is more robust and achieves better AUC values than the individual FR and SVM models. Integrating a bivariate model with a machine learning model provides highly accurate spatial correlation among variables with minimal overfitting. The outcomes of the present study should be considered during the planning of mitigation strategies in mountainous regions with similar topographic and hydrological conditions.