Introduction

Landslides, one of the most significant natural geologic phenomena in mountainous regions, often results in significant threats to human lives, infrastructure, and the environment (Mergili et al. 2014; Gariano and Guzzetti 2016). Landslides encompass a diverse array of occurrences such as rock falls, shallow debris flows, slides and avalanches. Though gravity's influence on slopes is the fundamental catalyst, additional interplay of geological, climatic, and anthropogenic factors triggers the landslides (Chen and Li 2020; Yunus et al. 2021). Because of this complexity, accurate prediction of landslide hazards is extremely challenging.

Examining slopes for their susceptibility to future risks is of utmost importance, because landslides present a threat that is widespread across nearly every mountainous region of the world (van Westen et al. 2008; Petley 2012). As world populations expand into mountainous regions, it becomes important to discuss the inherent risks posed by landslide hazards in these settings (Zhou et al. 2021). Equally important is comprehending how urban centers, settlements, and administrative regions can strategically address these risks through comprehensive land-use planning, engineering considerations for new construction, and the development of robust infrastructure systems (Guzzetti et al. 2012). Therefore, developing the landslide susceptibilities for the study of interest constitutes first stage in assessing landslide hazard and risk (van Westen et al. 2006; Dou et al. 2019).

Landslide susceptibility modelling (LSM), in this study, refers to modelling the probability of a landslide occurrence within the study of interest, determined by analyzing various conditioning factors (Brabb, 1984; Reichenbach et al. 2018). The LSM output serve as valuable resources, offering insights into the spatial probability of future landslide events (Lee and Min 2001; Feizizadeh and Blaschke 2014; Peethambaran et al. 2020). Precise delineation of landslide susceptibilities within a vulnerable region holds the potential to enhance disaster preparedness and response. By identifying zones prone to susceptibility, it facilitates proactive measures for land use management, strategic infrastructure planning, and the enhancement of community resilience (Rahman et al. 2019, 2021).

A significant number of studies have recently addressed the concept that accurate susceptibility maps depend on a reliable landslide inventory (i.e., the quality of the landslide mapping method, positional accuracy, and the sampling strategy) (Petschko et al. 2014; Steger et al. 2016; Chang et al. 2019). Nevertheless, there is an urgent need for a comprehensive examination of how the quality of mapping, specifically the selection of landslide pixel/point samples, can influence the modelling of landslide susceptibility.

While previous studies have aimed to highlight the differences in the predictive capabilities of landslide susceptibility models (LSMs) generated using different sampling strategies (e.g., Nefeslioglu et al. 2008; Hussin et al. 2016; Dou et al. 2020; Yu and Chen 2024), a comprehensive comparison of factor analysis and explanation of underlying mechanisms associated with each sampling scheme has been lacking. Therefore, this study explores into a critical evaluation of the variations in prediction accuracies of LSMs derived from distinct sampling strategies, including landslide centroid points, scarp points, and accumulation points.

We employed typical machine learning (ML) models such as random forest (RF), artificial neural network (ANN), k-nearest neighbors (KNN), and logistic regression (LR) to assess the predictive performance of LSMs and validate our assumptions. Consequently, this study sheds light on the conceptual disparities of LSMs produced using the aforementioned sampling strategies.

Geographic settings

Because of the complex mountain settings, India ranks among the top countries worldwide with a high susceptibility to landslides, experiencing at least one fatality per 100 square kilometers annually in hilly terrains as a result of landslide occurrences. Exposure of the huge population to landslides in areas spanning steep hillslopes of Himalayas to the ecologically sensitive Western Ghats, this region is very critical in disaster viewpoint (Gupta et al. 2018; Sankar 2018; Ramasamy et al. 2019; Saha et al. 2021). With its fragile geological structure, tectonics and intense monsoon rains, landslide hazards are very typical in Himalayas during heavy rainy spells (Dubey et al. 2005; Kanungo and Sharma 2014; Dikshit and Satyam 2018; Dikshit et al. 2020). This has been aggravated in recent period due to warming climate and skewed monsoon patterns. The infamous 2013 Uttarakhand disaster, triggered by heavy rainfall, exemplified the catastrophic impact of landslides in the country. In addition, the road construction, widening of existing roads, dam construction and urbanization that been increased in recent periods resulted in increased number of landslide hazards (Yunus et al. 2021).

On the other hand, the monsoon season which brings heavy rainfall between June and September, is a significant contributor to landslides in Western Ghats region of India. The intense rains during monsoon months saturate the soil, destabilizing slopes and triggering landslides. In 2018 and 2019, the Indian state of Kerala witnessed several catastrophic events with at least 100 people died and several thousands of properties damaged. Research conducted by Yunus et al. (2021) underscores the correlation between the human-induced factors, such as deforestation and road cutting, which amplify the landslide risks in the region. Their study also highlights the intense rainfall patterns and heightened landslide occurrences in this region. Furthermore, Ajin et al. (2022) shed light on the geological and geomorphological factors that contribute to landslide susceptibility, accentuating the complex nature of the terrain. As these authors collectively demonstrate, the multifaceted interplay of natural and anthropogenic factors calls for a comprehensive approach to mitigate landslide risks in the Western Ghats, especially in the less explored northern Ghats regions.

In the past few years, numerous landslide activity happened within the northern Western Ghats, especially in the areas surrounding Mahabaleshwar, Satara, Raigad and Koyna region, which have experienced heavy to very extreme precipitations. The present study analyses the landslide vulnerability of areas surrounding the Koyna Reservoir. Located in the western state of Maharashtra, India, the Koyna Reservoir region is situated at the northern Western Ghats Mountain range (Fig. 1). Here, we aim at providing an accurate landslide susceptibility map to mitigate future risks in this region by providing vulnerable zones. For this, we mapped the landslides that happened during 2021 by using high resolution satellite images. The 2021 monsoon rainfall event was unusual, triggering hundreds of landslides in different part of the Satara district, Maharashtra (Fig. 1). According to IMD observational data, Satara experienced its wettest day ever on 23rd July, 2021 with a record rainfall of 594 mm in 24 h. Mahabaleshwar station, north of Koyna reservoir, received about 943 mm of rainfall in 48 h during 21st to 22nd, July 2023 (IMD 2021).

Fig. 1
figure 1

(a) Location of the Koyna reservoir, Western Ghats, India, and the mapped landslide polygons for the region. Three different sampling strategies are adopted for landslide susceptibility modelling (LSM), namely (b) sampling from landslide scarp, (c) sampling from landslide centroid and (d) sampling from landslide deposits/accumulation zone. Filled triangles in (a) represent the locations of the field photographs depicted in Fig. 2, corresponding to Taliye, Posare, Shiral, and Dhokawale, respectively, from north to south

The Koyna reservoir is the result of the Koyna Dam, a major hydroelectric project constructed on the Krishna River. The elevation of the region ranges from sea-level to about 1500 m, with a mean of 500 m, and variations across the diverse terrain. Surrounded by dense forests, the region is rich in ecological diversity, including various flora and fauna species endemic to the Western Ghats. These encompass evergreen forests, deciduous woodlands, and grasslands that contribute to the region's unique ecological profile. The climate of the study region can be classified as tropical monsoon, characterized by extreme precipitation in monsoon (exceed 2000 mm annually). The geology of the region is predominantly homogenous, covered mainly by Deccan basalts (~ 95%), and partly with iron and aluminum rich laterite deposits. The regions geological history is intertwined with seismic activity, reportedly in the Koyna-Warna earthquake zone. Significant earthquakes have been reported in the past; notable one is the Koynanagar earthquake that occurred on 11 December, 1967 with a magnitude of 7.5 recorded in Ritcher scale (Narain and Gupta 1968). The epicenter lies about 5 km of the Koyna reservoir.

Data and methods

Landslide inventory mapping

The first stage in the precise prediction of landslide-prone areas is the preparation of an accurate landslide inventory map (Guzzetti et al. 2012). In this study, a comprehensive landslide inventory maps were carefully prepared through manual interpretation of post-July 2021 rainfall event imagery from Planet (3 m), Sentinel-2 (10 m), and Landsat 8 OLI (30 m) scenes. Landslide boundaries were digitized as polygons from cloud-free images acquired after July 2021 in a GIS environment. We preferred manual digitization instead of automated extraction techniques as automated methods may lead to upward biases in the land use change classes. To ensure the accuracy of identified landslides stemming from the intense rainfall episodes of July 2021, we cross-referenced pre-event satellite data from Planet and Sentinel-2. Additionally, to enhance the reliability of our mapping endeavours, we cross-checked the mapped landslide boundaries using high-resolution Google Earth© archives from 2020 and 2021, thereby enhancing the overall robustness of the inventory. Furthermore, the locations of accessible large landslides were verified in the field as part of our study. Some representative examples of the field surveyed landslide locations are shown in Fig. 2.

Fig. 2
figure 2

Slope failures photographed in (a) Taliye (18°06′40.5"N 73°35′00.9"E), (b) Posare (17°37′26.4"N 73°38′04.9"E), (c) Shiral (17°22′17.6"N 73°50′21.9"E) and (d) Dhokawale (17°19′21.9"N 73°44′55.7"E)

Landslides can be mapped at various spatial discretization levels, for example, as landslide initiation points (Gorum et al. 2011), centroid points (Wartman et al. 2013), or landslide polygons (Tanyaş et al. 2022). But, in the context of LSM, the selection of sampling methods is also a critical consideration (Guzzetti et al. 2012). Some researchers opted for point data to express the landslides as a means of simplification or due to constraints, such as limited access to high-resolution imagery (Tien Bui et al. 2019; Pham et al. 2020). In contrast, other researchers choose to employ polygon data, encompassing the various components of landslides, including the scarp, crown, and deposition areas (Dou et al. 2019; Chang et al. 2019). In the present study, we have chosen to map the landslides as polygons, mainly to assess the total affected areas. Highest and lowest elevation in the polygon is further identified as source and deposition zones respectively, and buffered to ~ 10% of the size of the landslide polygon, and their centroid points are selected for LSM analysis. Accordingly, different sampling criteria were chosen as the input data for the susceptibility models: they are, (a) centroid point of entire landslide body, (b) centroid point representing landslide scarp polygon, and (c) centroid point representing the landslide accumulation polygon. All the processing steps were carried in a GIS environment (ArcGIS v10.8).

Factor analysis

Fourteen landslide conditioning factors (LCF) typically used in LSM studies were selected in this study. Table 1 summarizes the source data and their description in detail, and Fig. 3 shows the factor maps.. Given the predominantly homogeneous terrain characterized by a single outcrop lithology (Deccan Basalt), covering over 90% of the study area, we have excluded it as a contributing factor to landslide conditioning within the study area.

Table 1 Landslide conditioning factors selected in this study for the preparation of landslide susceptibility modelling
Fig. 3
figure 3

Various landslide conditioning factors (LCF) used in the landslide susceptibility modelling: (a) aspect, (b) curvature, (c) elevation, (d) topographic position index (landforms), (e) distance to lineaments, (f) LS factor, (g) normalized difference vegetation index, (h) positive openness, (i) stream power index, (j) slope, (k) rainfall, (l) distance to channels, (m) topographic wetness index, and (n) valley depth

Subsequently, we performed a collinearity analysis to test the inter-associations among the independent variables. Multicollinearity occurs when a variable is highly correlated with one or more other variables (Allen 1997), and the variance inflation factor (VIF) and tolerance (TOL, i.e., 1/VIF) are two common tests for collinearity. According to previous studies, VIF>10 or TOL<0.1 indicate significant multicollinearity (Dormann et al. 2013). Table 2 summarizes the VIF values for the conditioning factors and the results shows that the factors doesn’t exhibit any significant multicollinearity. The information gain function (IG) was then used to calculate the factor importance of independent variables. In feature selection and for locating root nodes in tree-based models, IG is one of the quickest and simplest attribute ranking techniques (Alhaj et al. 2016). Figure 4 shows the significance and ranking of the LCFs using the respective training dataset.

Table 2 Collinearity tests conducted for different landslide conditioning factors using value inflation factors (VIF). Note that the VIF values for all factors are lower than 10
Fig. 4
figure 4

Rank importance of LCFs using the IG attribute evaluation. a) Landslide centroid points, b) scarp points, c) accumulation points

Landslide susceptibility modelling

While traditional regression analysis is effective for landslide modelling (Reichenbach et al. 2018), comparative evaluation often pinpoints the advantageous of machine learning models in capture the complex, non-linear interactions present in different variables and landslide initiation (Huang et al. 2020; Goetz et al. 2015). Machine learning methods, on the other hand, have the flexibility to model intricate relationships between multiple predictor variables and the target variable, allowing for more accurate predictions (Merghadi et al. 2020). By leveraging algorithms such as Random Forest, logistic regression, K-nearest neighbours, or neural networks, ML can efficiently process and extract meaningful patterns from various landslide conditioning factors, thereby enhancing the predictive power of the LSM models. For this research, we thoroughly examined and compared widely employed four machine learning models. After comparison of all four models and their predictive capacity, we then chose the best fit model to determine the final landslide susceptibility assessment near Koyna reservoir region.

Based on our defined criteria, we created landslide susceptibility maps by utilizing landslide points obtained from three different sources: the landslide centroid, the scarp, and the accumulation region. To model the data effectively, we compiled a dataset consisting of 3066 landslide points and an equal number of randomly selected non-landslide points within the study area. To generate the random points, we applied a 500 m buffer around the mapped landslide polygons and excluded this area from the study zone. This ensured that no landslide areas were inadvertently included in the selection of non-landslide points. This comprehensive landslide and non-landslide dataset were then subsequently divided into two parts: 70% for training and 30% for validation purposes.

Random forest

Random Forest (RF) is an ensemble ML technique that produces numerous decision trees, which are then weighted to calculate a classification scheme (Breiman et al. 1984; Merghadi et al. 2020). Due to their demonstrated superior performance in multiple cases of LSM analysis, and because of less hyper parameters to deal with, RF is favored over other conventional statistical approaches logistic regression (LR), frequency ratio (FR), and ML models such as support vector machines (SVM), and decision trees (DT) (Merghadi et al. 2020; Fan et al. 2021). RF’s accuracy rates in past studies found to be higher in most cases (90% and over) and they can be achieved, even with limited data points (Catani et al. 2013; Dou et al. 2019). We used the “randomForest” package in R 4.3.0 software to model the landslide susceptibility.

Logistic regression

Logistic Regression (LR), one another prominent ML algorithm, was initially adopted from statistical techniques (Budimir et al. 2015). LR uses the logit function (sigmoid function) in the modelling approach. It stands as an explicit instance within the broader scope of generalized linear models, designed to yield binary outcomes. LR's distinctive capability to determine the optimal fitting function for representing the intricate association among the presence/absence of landslides and a combination of landslide conditioning factors (LCF) is coupled with the advantage of requiring minimal adjustment of 'hyper-parameters’ (Chen et al. 2015).

Fundamentally, LR establishes a connection between the probability of a landslide incidence and a link function presumed to provide the conditioning factors that potentially influence landslide presence. This is formalized through the Equation:

$$\widehat{P}=\frac{1}{1+{e}^{-z}}=\frac{{e}^{z}}{1+{e}^{z}}$$

Here P signifies the probability of occurrence of landslides, confined to 0 and 1, delineated by a sigmoid curve. The term z represents a linear fitting equation incorporating the provided ensemble of variables associated with landslides, structured as described in Equation:

$$Z={b}_{0}+{b}_{1}{X}_{1}+{b}_{2}{X}_{2}+\dots +{b}_{n}{X}_{n}$$

K-Nearest neighbors

K-nearest neighbors (KNN) is a fitting straightforward ML algorithm, where the input entails the k-nearest training instances within the feature space. The resulting output comprises probabilities of class membership. In classification scenarios, these probabilities signify the degree of ambiguity surrounding the assignment of an individual item to a specific class. The categorization of an item is established by means of a consensus among its neighbouring items. Subsequently, the item is allocated to the class that prevails among its k nearest neighbors, with k often denoting a small positive integer. When k is set to 1, the item is straightforwardly categorized according to the class of its sole nearest neighbors (Merghadi et al. 2020).

KNN though recognized as a 'lazy' supervised model, but it highlights that the algorithm's calculations and are not contingent on the data's distribution; rather, the model naturally adapts to the data (Merghadi et al. 2020). This adaptability is particularly advantageous in cases such as landslide susceptibility modelling, where the distribution of landslides might not conform to standard patterns. On the contrary, the algorithm approximates the function locally and postpones all computations until the classification stage. Consequently, KNN doesn't create a discerning function from the training data; instead, it essentially stores the training dataset in memory.

Artificial neural network

Also known as a multilayer perceptron (MLP), the Artificial Neural Network (ANN) consists of input neurons (input layer), one or more fully connected hidden layers, and an output layer (in the case of binary classification). The performance of an ANN model is sensitive to the number of hidden layers, the type of activation function and the way in which the weights are updated. The purpose of an ANN is to construct a model of the data-generating process, enabling the network to generalize and predict outputs from inputs it hasn't previously encountered. The network operates in two modes: learning and recall. During the learning phase, facilitated by the back-propagation learning algorithm, the network adjusts its initially random connection weights based on a set of stimulus pairs called learning examples. This phase includes input data and corresponding expected outputs. In the recall phase, the synthesized knowledge from the learning phase is applied. This allows the network to provide coherent responses to inputs not present during the training phase, demonstrating its ability to generalize and make predictions beyond its training data.

Model performance evaluation

The predictive accuracy of developed LSM should be checked for its validity (Tien Bui et al. 2019; Pham et al. 2020). To evaluate the model performance, we employed accuracy (classification rate), kappa statistics, and area underneath the receiver operating characteristic (ROC) curve (AUC) metrics. Kappa index demonstrates the reliability of the landslide models (Bennett et al. 2013). If the kappa value is close to zero, the model is unreliable, but for values closer to one, the model is reliable. According to Landis and Koch (1977), a kappa index near zero indicates poor agreement between the model's estimations and observed reality, while values falling within the ranges of 0.4–0.6, 0.6–0.8, and 0.8–1 signify moderate, substantial, and almost conditions of agreement, respectively.

The accuracy is calculated using the possibility indices true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The TP and FP represent the number of pixels correctly identified as landslide and non-landslide pixels. However, TN and FN are some landslide pixels that were misclassified as landslide and non-landslide pixels (Dou et al. 2019; Pham et al. 2020).

$$\text{Accuracy }=\frac{\text{TP}+\text{TN}}{\left(\text{TP}+\text{TN}+\text{FP}+\text{FN}\right)}$$

Results

Landslide distributions and controlling factor analysis

A total of 3066 landslide polygons were delineated from the study area after 2021 using post-event satellite images, covering a total area of approximately 9.58 km2 and large accessible areas were field verified (Figs. 1 and Fig. 2). In Fig. 5, we present the analysis of first five most important landslide conditioning factors identified using IG ranks from the best fit model found from this study. The figure is divided into three parts: (left panel—representing factor analysis of centroid areas; middle panel—representing scarp areas; and right panel—representing accumulation areas. The lowercase letters (a-e) correspond to the important landslide conditioning factors. This organization allows for a comprehensive examination of the conditioning factors.

Fig. 5
figure 5

Plot showing landslide aerial density (LAD) and landslide density (LA) sampled for centroid (left panel), landslide scarp (middle panel) and landslide accumulation (right panel) zones for (a) Rainfall (b) slope, (c) LS factor, (d) positive openness, and (e) stream power index

Results shows that the majority of landslides are concentrated on slopes steeper than 20 degrees, encompassing over 56–84% of the total landslide area (Fig. 5a). However, a detailed visualization of slope classes with landslide area (LA) and landslide aerial density (LAD) in different scheme of landslide sampling reveals significant trend differences. For instance, the peak LA and LAD for slope classes derived from centroid schema lies in 20˚– 25˚ and 25˚- 30˚respectively. This trend is partly consistent in the accumulation region, where the peak aerial density (LAD) approaches 1.6. However, the examination of the scarp region reveals a peak LAD of approximately 3 in the 30°–37° slope classes. These findings generally support conventional assumptions, suggesting that scarp samples exhibit greater predictive power than other sampling schemes when comparing different slope classes.

We observed significant variations in the LS factor and landslide occurrence across three different sampling schemes (Fig. 5b). While the majority of landslides occur in areas with slope lengths ranging from 8 to 59, their associated LAD values vary considerably. Specifically, landslide accumulation samples exhibit a peak LAD of 6.7 in regions characterized by high slope lengths ranging from 177 to 626. In contrast, scarp samples show zero LAD for LS factor within these classes, while centroid samples display LAD values of less than 2, with both schemes peaking at the 25–60 LSF class. In previous studies, it has been revealed that slope gradient decreases with increasing slope length, and this correlation substantiate the potential positive relationship between better predictive powers in scarp and centroid samples over the accumulation zone (Qiu et al. 2018).

Positive openness (POS) expresses the degree of dominance or openness of the landscape to the sky for a pixel of the terrain model (Yunus et al. 2021). A low value of POS highlights the topographic concavities. Our results indicate that, in all three sampling scenarios, the majority of landslide-prone areas (>50%) fall in the medium positive openness class ranges (1.29 to 1.38) indicating both concave and convex hillslope terrains exhibit the evidence of landslides (Fig. 5c).

Analysis of the stream power index (SPI), a metric indicating the erosive potential of flowing water (Wilson and Gallant 2000; Deng et al. 2007), reveals distinct trends across the three landslide sampling schemes. However, results consistently demonstrate that over 77% of landslides are concentrated in areas characterized by high SPI values, which correspond to regions of high aerial density (LAD). The spatial distribution of the highest landslide occurrences generally aligns with areas of maximum unit stream power. Nonetheless, the LAD trend for scarp samples peaks within the 2.6–4.2 class, contrasting with the other two schemes where SPI peaks occur in the 6 to 16 classes. This discrepancy between scarp and other samples may be attributed to the fact that in headwater channels, high LAD values in scarp regions are sufficient to induce significant landslide erosion.

The NDVI value, a proxy for vegetation dominance or not, indicates the presence and intensity of vegetation in the study area. The relationship between landslide distribution and NDVI values in the study region suggests that landslides are more common in densely vegetated areas, with more than 75% of the total landslide area is in NDVI value greater than 0.5 (Fig. 5e). In all the instances, there is a progressive increase in landslide area and LAD from a low NDVI value to a high NDVI value. Conventionally, it's often assumed that landslides are more prevalent in areas with sparse vegetation due to factors such as reduced root cohesion and increased water infiltration leading to soil instability. However, the study's observation that landslides are more common in densely vegetated areas, particularly those with NDVI values greater than 0.5, suggests a more complex interplay of factors.

One possible explanation for this discrepancy is that while dense vegetation can indeed stabilize slopes through root reinforcement, it can also increase the weight and water-holding capacity of the soil, potentially leading to increased pore water pressure and, consequently, higher landslide susceptibility, especially when study area like Western Ghats, where the intense rainfall and steep terrain coincides.

Model comparison and validation

We explored the predictive performance power of different machine learning models in LSM such as RF, ANN, KNN, LR models for training and validation. Accuracy (ACC) of different models were then performed with the help of confusion matrix, and the receiver operating characteristic (ROC) curve.

The performance results for training (Table 3) and testing (Table 4) show that all the models produced good results (AUC>0.790, ACC>0.718 and kappa>0.436). Furthermore, the comparative analysis of these models clearly demonstrated that the Random Forest (RF) model outperformed the others by a significant margin on both training and testing datasets (Fig. 6). We found that the prediction rates are inconsequential with the RF model irrespective of the sampling technique (AUC: 0.902 – 0.952). Whereas, testing with ANN (AUC: 0.866 – 0.913), KNN (AUC: 0.816 – 0.868) and LR (AUC: 0.817 – 0.874) shows significant variations in the accuracies between the three datasets.

Table 3 Performance results of the implemented models using training datasets
Table 4 Performance results of the implemented models using validation datasets
Fig. 6
figure 6

Receiver operating characteristic (ROC) curve and AUC values for the four different machine learning models with samples from landslide centroid, scarp, and accumulation using training (a-c), and testing (a'-c') datasets respectively

When employing all the machine learning models, it is evident that landslide scarp points achieved much better results than landslide centroid points and accumulation points. Models using landslide scarp points consistently achieved superior performance with a minimum accuracy value of 79% and AUC values greater than 85%. RF model demonstrated excellent performance in all three scenarios using validation datasets, specifically when using landslide scarp points, achieving an impressive 95.2% AUC and an 88.8% accuracy rate. When applied to datasets with landslide centroid points, it showed slightly lower performance, with a 90.2% AUC and 82.3% accuracy. For landslide accumulation points, the model maintained an accuracy of 86.7% and had an AUC of about 93.5%. Also, the RF model consistently achieved a kappa statistics value above 0.644, demonstrating strong agreement between model estimations and observed data. Specifically, the model applied to landslide scarp points demonstrated a strong kappa value of 0.774, while the accumulation dataset yielded a respectable 0.734, and the centroid point model achieved a satisfactory kappa value of 0.644. Moreover, these models demonstrated a lower false positive rate, highlighting the effectiveness of the landslide scarp point approach in comparison to the other methods.

In this study, the ranking of model accuracy follows the order: RF>ANN>LR>KNN. However, when considering the model accuracy in relation to specific scenarios, it can be observed that the model developed using landslide scarp points exhibited a higher accuracy than those developed using landslide accumulation and landslide centroid points. Hence, in terms of LSM performance, the predictive power is ranked in the following order: landslide scarp point>landslide accumulation point>landslide centroid point.

Estimation of landslide distribution

Following a comparison of different machine learning models, we chose the Random Forest (RF) model for landslide susceptibility modelling due to its superior accuracy. Figure 7 and 8 shows the landslide susceptibility maps derived from 14 LCFs and the landslide inventory from the study area, modelled using the different ML methods. The susceptibility maps are categorized into five levels based on 2.5%, 25%, 50% 75% and 95% of the total pixels following Merghadi et al. (2020). They are corresponding to very low, low, medium, high, and very high landslide classes. Table 5 illustrates the distribution of area covered by each susceptibility class in the study region utilizing different machine learning models.

Fig. 7
figure 7

Landslide susceptibility maps generated using (a) LR model (b) ANN and (c) KNN (left panel: landslide centroid points; middle pane: scarp points; and right panel: accumulation points)

Fig. 8
figure 8

Landslide susceptibility maps generated using RF models (left panel: landslide centroid points; middle panel: scarp points; and right panel: accumulation points)

Table 5 Total area extent covered by each susceptibility class

The quantitative evaluation of landslide distribution reveals distinct trends when employing various machine learning models (Table 6). Quantitative assessment of most accurate RF model using landslide centroid points predicted that around 90.45% of the landslide area fell within the medium to very high susceptibility classes (Fig. 8 and Table 5). Conversely, the model employing landslide scarp points showed that approximately 90.69% of the landslide area is categorized in the very high to medium probability classes, while the model using landslide accumulation points yielded similar results with around 96% of the landslide area assigned to the medium to very high classes. The KNN model indicates that approximately 80% of the landslide area falls within the medium to very high probability classes, with approximately 71% of this area located within the very high susceptibility region. In contrast, the utilization of landslide scarp points indicates that approximately 76% of the area is classified within the medium to very high probability class, with 68% falling specifically within the very high probability region.

Table 6 Extent of landslide area covered by each susceptibility model

However, it's important to note that the model using landslide centroid points and landslide accumulation points exhibited a higher false positive rate in comparison to the model utilizing landslide scarp points (Table 4). Specifically, the RF model with landslide scarp points demonstrated a lower false positive rate at 0.115, while the models using landslide centroid and accumulation points had false positive rates of 0.179 and 0.133, respectively. This consistent pattern was evident in the outcomes of all assessed models.

Discussion

Estimation of landslide distribution using RF model

The choice of landslide sampling criteria has garnered significant attention in landslide susceptibility studies, as discussed in earlier literature (Hussin et al. 2016; Dou et al. 2020). The assessment of the ranking of LCF in relation to three different sampling strategies using Information Gain Ratio (IGR) is presented in Fig. 3. It is noteworthy that the ranking in IGR changes with three different sampling approaches, indicating there is obvious variance in the results as well. Despite the variances, it can be noted that the top four position in the best two LSM-AUC output scheme (scarp method and centroid method) remains the same except in their order of ranking given to slope, LS factor, positive openness and SPI. Indeed, the ranking of positive openness and stream power index remains same at 3rd and 4th position in both methods, but slope has gained maximum significance in the scarp method compared to the latter. Interestingly, one can be noted that the importance of slope decreased considerably when we use the samples from deposit/accumulation zones. This is anticipated because, at the accumulation zones the control of slope on landslide initiation is very small, and therefore it has got the least AUC values in LSM output. This result highlights the importance and robustness of the IGR in LSM modelling. Another noteworthy change while employing the scarp sampling strategy in the ranking is the increased significance of NDVI and decreased significance of landforms.

Robust results with scarp samples and random forest models

Among the major controlling factors identified for the landslides, slope, SPI and LS factors are typical of all rainfall-induced landslides in Western Ghats in addition to the anthropogenic drivers of landslides (Yunus et al. 2021). With respect to the accuracy of LSM, our empirical observations suggest that sampling scarp areas tends to yield a more effective model compared to other methods. This finding is consistent with prior research results (Simon et al. 2013; Süzen and Doyuran 2004; Dou et al. 2020). Landslide centroid point sampling method offers advantages in terms of time, simplicity, and automation. But it is important to note that it relies on the center of gravity of the landslide polygon, potentially resulting in reduced accuracy. Whereas, scarp areas are indicative of instability and undisturbed morphological conditions, making them a crucial element in susceptibility assessment. Therefore, choice of an appropriate landslide inventory map should be contingent on the specific research objectives. For the computation of landslide area, the utilization of landslide polygons is recommended as they depict the realistic representation of the landslide initiation condition. On the other hand, if the primary aim is to expediently determine the locations of landslides, point data provides a straightforward and efficient approach. Nevertheless, in scenarios where a comprehensive assessment of susceptibility is required, particularly in regions prone to instability, focusing on the scarp areas is often the preferred approach.

In terms of model performance, we noticed that RF algorithms achieve excellent results compared to other machine learning algorithms, which are in good agreement with previous types of research (Dou et al. 2020; Merghadi et al. 2020; Yunus et al. 2021). This observation underscores the robustness of the RF model. Even when the characterization of landslide scarp and body in an inventory is less accurate, RF modelling can significantly enhance predictive power, demonstrating its reliability. However, it's noteworthy that when using the RF model, significant variations in AUC values were observed based on the sampling methods employed. Hence, this study recommends utilizing samples within the landslide scarp area and employing the RF technique to enhance the predictive performance of landslide susceptibilities. This approach can maximize the accuracy and reliability of landslide susceptibility assessments.

Furthermore, we extended the validation of our study framework to an entirely different study site, the Chaliyar Basin in Kerala. This region encountered unprecedented rainfall and landslides during 2018–19 (Yunus et al. 2021). The Random Forest (RF) model, initially trained for the Koyna region, was applied to the region; remarkably, the resulting AUC value was found to be 0.83 (Fig. 9). This superior performance of the model framework without requiring fine-tuning or adjustments to the selected conditioning factors, indicate the capability of model performance in other environmental settings.

Fig. 9
figure 9

Validation of the model framework to Chaliyar Basin, Kerala. Note the area under the ROC curve values is 0.83

Limitations

The LSM modelling ability from different sampling approaches presented in this study may be influenced by the manual mapping procedures used and the inherent limitations in remote sensing technology, which are restricted by the sensor’s spatial resolution. Implementing machine learning models and semi-automated to automated mapping approaches using NDVI as a vegetation proxy for landslide mapping may ensure consistency (Deijns et al. 2020; Milledge et al. 2022). Additionally, using post-event high-resolution DEMs helps in differentiating landslide partitions into scarp and accumulation zones (Dou et al. 2021). Different landslide geometries and individual capacity in mapping them have varying adaptability, but there is not yet a fully successful automated approach in landslide mapping techniques adapted for a wider region. Therefore, future studies should focus on developing a best-fit data sampling approach that minimizes the impacts of manual editing and area mismatch, for example, by incorporating emerging automated AI-based models.

A key limitation of this study lies in the challenge of addressing non-landslide sampling in a random manner. For instance, it was difficult to accurately classify areas where landslides had not yet occurred and assign them similar characteristics as scarp, centroid, or accumulation zones. This difficulty in effectively addressing non-landslide sampling introduces biases into the susceptibility modelling process. Further research can also be attempted to study extreme event characteristics, given that the landslide cases in this study are concentrated in a single event. Consequently, the results may be influenced by factors such as the concentrated nature of the rainfall.

Another significant concern is the reliance on a single-event landslide case presented in this study. Additionally, the homogeneity of the lithology in the entire study area limits the model applicability due to training bias. Without comprehensive data on historical landslide occurrences and a wide set of training samples from different environmental conditions, the predictive accuracy of the susceptibility models for a wider region may be compromised, particularly when coupled with selection issues in non-landslide zones. However, we assume that the non-landslide points encompass all types of conditioning zones. This assumption enhances the robustness of the findings and may therefore reflect the practical utility of the susceptibility maps generated by the study.

Conclusions

This research compiled a thorough inventory documenting over 3000 landslides triggered by an extreme rainfall event in July 2023 in the northern Western Ghats. By employing four machine learning models (LR, ANN, KNN, and RF) and three sampling schemes (landslide scarp samples, centroids of landslide bodies, and samples from landslide accumulation zones), we further assessed landslide susceptibility in the region. All the four ML models, and three sampling schemes displayed AUC values exceeding 0.80, indicating promising outcomes and showcasing the effectiveness of machine learning techniques in landslide studies.

Notably, the RF method exhibited the highest performance with an AUC of 0.95, followed by ANN (AUC = 0.91) and LR (AUC = 0.87). Regarding performance of sampling schemes, landslide scarp samples proved to be more effective than accumulation zones and scarp centroids in susceptibility modelling. Analysis of the feature importance by means of IG ratio underscored the significant influence of steep slopes and slope lengths in the Western Ghats terrain on triggering substantial landslides. The insights gained from this study aim to assist landslide investigators in selecting appropriate data types and models, as these choices can significantly impact the accuracy of the final susceptibility maps.