1 Introduction

Landslides are natural hazards that frequently occur worldwide and threaten the safety of individuals, buildings, and other infrastructures (Iverson, 2000; Kjekstad & Highland, 2009). Population growth, urban development, and indiscriminate use of natural resources have increased the susceptibility of many regions to landslides. It has been estimated that landslides are responsible for 1000 deaths and 4 billion USD of damage annually (Lee & Pradhan, 2007).

Accurate estimation of the spatial distribution of landslides and the generation of susceptibility maps are essential for hazard mitigation and urban development planning in landslide-prone areas. Classic methods, such as direct mapping through field surveying, typically involve measurements of mass displacements, which may be time-consuming, costly, and impractical over large areas (Kovács et al., 2019). Alternative methods include indirect mapping through the use of modeling techniques, relationships between landslide conditioning factors (e.g., land use, slope class, distance from roads or streams, and other site-specific properties that can be predictors of landslide susceptibility), and recorded locations of historical landslides (Guzzetti et al., 2006). The premise for using such conditioning factors with historical data is that future landslides will likely occur in locations that have similar properties as the regions in which past landslides occurred (Guzzetti et al., 2006). In general, alternative approaches to landslide susceptibility mapping (LSM) can overcome the limitations of direct mapping techniques and accelerate the production of susceptibility maps to alleviate the hazards associated with landslides worldwide.

In recent years, many such alternative approaches have been developed, ranging in type from physical models to machine learning (ML) methods (Merghadi et al., 2020). For example, deterministic models evaluate landslide susceptibility with physical laws and data including rock and soil properties, topography, and hydrological conditions (Yilmaz, 2009). However, these models may oversimplify landslide processes, and the required data are usually expensive or difficult to obtain, especially when the study area is large and heterogeneous. Heuristic approaches for LSM establish classes for judging the relative contributions of multiple landslide variables (Dahal et al., 2008a, 2008b; Dai & Lee, 2002). Although these ranking or rating methods are effective, they may be highly subjective compared with other data-driven approaches (Lee et al., 2001). Probabilistic models exploit the statistical properties of landslide factors in locations of past landslides (Constantin et al., 2011; Jaafari et al., 2014; Shirzadi et al., 2017). Other statistical models apply logistic regression, binary logistic regression (Shahabi et al., 2015; Tien Bui et al., 2016), fuzzy logic (Sur et al., 2021, 2022), and knowledge-based methods (Althuwaynee et al., 2016; Kumar & Anbalagan, 2016).

ML models typically use general-purpose learning algorithms that can identify patterns in data, including complex or nonlinear data. Diverse ML techniques have been applied to LSM and noted to achieve excellent results. For instance, a highly optimized support vector machine (SVM) workflow was noted to attain prediction accuracies > 90% (Dou et al., 2020a). Ensemble methods using random forest and decision tree achieved excellent area under the receiver operating characteristic curve (AUC) values of 0.91 and 0.98, respectively (Arabameri et al., 2020; Chowdhuri et al., 2020). However, ML workflows may be complicated, and their optimization is often challenging. Moreover, owing to the rapid development of ML techniques, many methods with potential to improve LSM predictions have not been fully evaluated.

Recent studies have applied deep learning (DL) methods for LSM. The increased complexity or numbers of layers and nodes, which makes these frameworks “deep,” renders them well suited for predicting the complicated relationships between landslide conditioning factors and geographic landslide likelihoods (Schmidhuber, 2015). For example, Dou et al. (2020b) compared the performance of a deep neural network (DNN) against logistic regression (LR) and an artificial neural network (ANN) for LSM in Japan. The DNN model outperformed the LR and ANN during training and testing. Thi Ngo et al. (2021) compared convolutional neural network (CNN) and recurrent neural network (RNN) models for LSM in Iran on a national scale and reported that both models achieved AUC values higher than 0.85. These studies highlighted the potential of DL methods for LSM.

Previous studies that assessed and compared the performances of different DL methods did not consider the optimization process, and thus, their conclusions may be unreliable. Although several researchers attempted to optimize their models using different optimization algorithms, they focused on a single method and compared the effects of different optimization algorithms on the model performance. In this study, we combined both objectives. Specifically, we performed a reliable and comprehensive comparison among three methods and investigated the possible performance advantage of DL methods (DNN and CNN) over ML (SVM) methods. These three methods were selected because although they have separately demonstrated strong potential for LSM, their landslide susceptibility assessment capabilities have not yet been comprehensively compared. In addition to following the methodologies of existing LSM studies that separately used SVM, DNN, and CNN methods, we optimized the model hyperparameters that control the learning process to ensure that each model reaches its maximum potential and the results can be reliably compared. The performances of the three methods in performing LSM in a landslide-prone province in Iran were assessed using a landslide inventory to evaluate the locations of previous landslides and a set of landslide conditioning factors. The AUC, which has been commonly used in ML and DL studies, was adopted as the performance metric to facilitate the comparison of the obtained results with those reported in the existing studies and to identify the most effective approaches for LSM.

2 Study area

Iran is a mountainous country with many major population centers located on sloping terrains that are exposed to landslide hazards. In this work, the Kermanshah province in western Iran (Fig. 1), which is one of the most landslide-prone provinces, was selected as the study site. Kermanshah has a total area of 95970 km² and is located between 33°40´ N–35°20´ N and 45°20´ E–48°10´ E in the Zagros Mountain range. The elevation in the province ranges from 116 to 3359 m above sea level. The region has average low and high temperatures of 20.61 °C and 36.1 °C, respectively, and an average annual rainfall of 500 mm, making it one of the wettest provinces in the country (http://www.kermanshahmet.ir/met/amar).

Fig. 1 Location of the study area (Kermanshah, Iran)

The province is partially covered with high-density vegetation, agricultural lands, regions of sparse vegetation, and plains connecting mountains and valleys. Kermanshah is seismically active, positioned over the High Zagros Fault (HZF), which is the most active fault in the area. The HZF is 1375 km long with a NW-SE bearing. Between August 2019 and December 2021, 14 earthquakes of magnitude 4 or higher were recorded, which resulted in landslides and severe damage to infrastructure. The area comprises two geological zones that cover its northeast and southwest regions: the Sanandaj–Sirjan zone, which consists of sedimentary and igneous/metamorphic rocks owing to high volcanic activity, and the Zagros zone, which covers a much larger portion of the province and consists mostly of sedimentary rock with some ophiolites (Ao et al., 2016; Arian & Aram, 2014).

Landslides in Kermanshah province are mostly triggered by intense rainfall or seismic activity such as earthquakes. Some of the most devastating landslides in Iran’s history have occurred in Kermanshah, including the largest landslide recorded in the past 20 years, which occurred in Mela Kabood with an area of effect of 4.61 km². This landslide was triggered by a 7.3 magnitude earthquake in November 2017. Many other regions of the province were also damaged by landslides triggered by this earthquake, such as North Dalahoo and Zanganeh outpost (Fig. 2a–c). Multiple occurrences of rockfall were also reported in Babayadegar, Northeast Dalahoo, and Piran (Fig. 2d–f). These phenomena damaged critical infrastructures, disrupted water supply lines and roads, and led to severe injuries (Bordbar et al., 2022). According to reports from Tasnim news agency (https://www.tasnimnews.com), 20 villages were evacuated, and the residents had to be relocated.

Fig. 2 Photographs of landslide and rockfall damage across Kermanshah province (Haghshenas et al., 2017): a Mela Kabood, b Dalahoo, c Zanganeh outpost, d Babayadegar, e Northeast Dalahoo, f Piran (https://www.tasnimnews.com)

In April 2019, multiple regions in Kermanshah province were damaged by landslides caused by intense rainfall and flooding (Fig. 3). According to the Islamic Republic News Agency (https://www.irna.ir), 281 villages, 134 km of roads, 46 buildings, and the feeding pipeline of an oil refinery were damaged due to landslides.

Fig. 3 Roads damaged by landslides due to intense rainfall (https://www.irna.ir)

3 Materials and methods

3.1 Overview

The creation of landslide susceptibility maps through SVM, DNN, and CNN modeling consisted of a multi-step workflow, established with reference to previous studies (Fig. 4). First, the sampling points were split into testing and training groups, followed by data sourcing and processing to develop a geodatabase. Using the training sample, we extracted different formats of data from the geodatabase for model optimization and training. Multiple batches of models were trained using different sets of hyperparameters for each method. The AUC and root mean square error (RMSE) per epoch were recorded for each of the models as a graph for future comparison. After training multiple batches of models per method, the model with the highest AUC and a stable RMSE graph in each batch was selected. Next, the AUC and RMSE graphs of the final set of chosen models were assessed, and the best model resulting from the optimization of the corresponding method was selected. The performances of these optimized models were validated using additional metrics. If the models exhibited satisfactory validation results, they were used to generate susceptibility maps of the area.

Fig. 4 Workflow of this study

3.2 Geospatial dataset

Two essential datasets were acquired for this study: (1) a landslide inventory, and (2) a set of landslide conditioning factors. These data were sourced as digital maps from national authorities (e.g., the Geological Survey and Mineral Exploration of Iran and the Ministry of Agriculture Jihad) and various online sources. An elevation map and its derivatives (e.g., slope, aspect, and curvature) were created using ALOS World elevation tiles that were resampled from 30 m to 85 m resolution raster images to ensure uniform resolution across the dataset.

3.2.1 Landslide inventory

The landslide inventory included 110 landslide location records from across the study area. Points for these landslides were generated from the centroid of the landslide scarp. This inventory was used to create a landslide density map, which was divided into five density classes or regions (very low, low, moderate, high, and very high). Most of the 110 landslide locations lay within the “high” and “very high” density regions. A second set of 110 locations was created to represent non-landslide locations, by randomly sampling points in the very-low-density regions. Figure 5 shows the distributions of both types of points and the density map.

Fig. 5 Landslide density and point distribution across Kermanshah province

This dataset was split into two groups of training and testing points. Such sets are typically partitioned to include 70–80% of the data for training and 20–30% for testing (Nefeslioglu et al., 2008; Pourghasemi & Rahmati, 2018). In this study, an 80–20% split was chosen. Stratified sampling was performed to create the training and testing sets to ensure equal numbers of points from each group of landslide and non-landslide points. The geospatial data were used to generate training data as image patches for the CNN model and as data tables extracted from both vector and raster datasets for the SVM and DNN models. The input datasets were extracted at the locations of landslide and non-landslide points.
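The stratified 80–20% partition described above can be illustrated with a minimal sketch in plain Python. The function name `stratified_split`, the random seed, and the label encoding (1 for landslide, 0 for non-landslide) are assumptions made for illustration and are not taken from the study's code.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class keeps the same share in both sets."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        # Indices of all points belonging to the current class
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# 110 landslide points (1) and 110 non-landslide points (0), as in the inventory
labels = [1] * 110 + [0] * 110
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting each class separately, rather than shuffling all 220 points together, is what keeps the landslide and non-landslide counts balanced in both sets.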

3.2.2 Landslide conditioning factors

Four major groups of factors affect landslide susceptibility: geomorphological, hydrological, geological, and environmental factors. In this study, 14 landslide conditioning factors from these groups were considered: elevation; slope; aspect; planar curvature; rainfall; topographic wetness index (TWI); stream power index (SPI); valley depth; land use; dominant soil; lithology; and distances from roads, drainages, and faults. A map was obtained for each factor. The maps were of different types, such as ordinal data in which the order is relevant (e.g., elevation and slope), and nominal data without any ranking or order (aspect, land use, dominant soil, and lithology). Table 1 summarizes the properties and importance of the factor maps.

Table 1 Description of conditioning factors

Geomorphological factors play a crucial role in landslide susceptibility assessment. Elevation is one of the most commonly used factors in landslide modeling, as elevated steep slopes affect the surface reliability and stability (Umar et al., 2014). The slope angle has been widely used as a key factor in landslide modeling as it can represent the shear stress and force and considerably affects hydrological processes (Nohani et al., 2019; Pourghasemi & Rahmati, 2018). The slope aspect, which indicates the azimuth of maximum slope, affects the amount of sunlight and rainfall received, which influences the precipitation and vegetation and root development (Jaafari, 2018; Kavzoglu et al., 2015). Curvature indicates the concavity or convexity of the surface, which is another morphological factor that can affect erosive processes and their intensity to potentially destabilize the surface. The planar curvature is the amount of curvature in a horizontal plane that determines the convergence or divergence of flows and runoffs (Fallah-Zazuli et al., 2019; Jaafari et al., 2015; Pourghasemi & Rahmati, 2018). Valley depth, which indicates the difference in elevation between the valley and upstream ridge, affects the slope stability and soil pore water pressure, which influence the landslide occurrence probabilities (Hadji et al., 2018; Hakim et al., 2022; Pourghasemi et al., 2020).

Among hydrological factors, rainfall is a notable landslide-triggering factor in the study area, which can also cause erosion, affect vegetation density, and promote the generation of runoff (Dou et al., 2019b; Mondini et al., 2011). TWI is another factor that has been commonly used in similar studies as it can clarify the influence of the topography and flow accumulation on soil conditions and spatial wetness patterns (Arabameri et al., 2019; Lee & Pradhan, 2007; Panahi et al., 2022). The SPI can reflect the erosive power of flows and its effect on the surface (Sestraș et al., 2019; Sevgen et al., 2019). Drainages and other water flows significantly influence the erosive processes, which then affect the slope stability and landslide probability (Dou et al., 2019a, 2019b; Fallah-Zazuli et al., 2019; Kadirhodjaev et al., 2020); therefore, the distance from these sources was selected as a hydrological factor.

Geological factors are another set of commonly used factors in landslide modeling. The soil texture indicates the strength and permeability of soils, which affect the erosion processes and shear stress (Sharma et al., 2012). We used the dominant soil texture or textures present in each unit to reclassify and simplify the available dataset. As another key factor, the lithology determines the mineral characteristics of different rock types, permeability of rocks, and their contribution to the generation of surface runoff (Hong et al., 2015; Reneau, 2000; Yilmaz & Ercanoglu, 2019). Faults represent notable geological factors as they cause tectonic activities that can trigger landslides. Moreover, faults affect the geomorphology of the surface by deforming it, thereby influencing the slope stability (Fallah-Zazuli et al., 2019; Nguyen et al., 2019; Pham et al., 2020). Therefore, the distance from faults was introduced as a factor.

The last set of impact factors used in landslide susceptibility assessment is environmental factors. In this study, land use and distance from roads were used as environmental factors. Land being used for different purposes has different properties, with some being more susceptible to landslides. Depending on the type of land use or land cover, different units can indicate industrial development status or expectations, their effect on soil stability, and current or expected vegetation density (Nasiri et al., 2019; Nedbal & Brom, 2018), which can alter the landslide probability. The distance from roads and transportation networks was used as an impact factor as their development may have destabilized the soil in their vicinity, and the force applied to the ground by traversing vehicles can increase the chance of landslide occurrence (Fallah-Zazuli et al., 2019; Sestraș et al., 2019).

All the conditioning factor maps were rasterized with a resolution of 85 m. The raster layers were resampled, and polygon layers were rasterized using appropriate toolboxes to create a raster with the same grid as the other layers. Figure 6 shows an overview of the conditioning factor maps.

Fig. 6 Conditioning factor maps

3.2.3 Multicollinearity analysis

Before the modeling phase, any multicollinearity among selected parameters must be analyzed and identified. Removing factors with high correlation helps decrease the data dimensionality and model complexity, which can shorten the training phase and prevent models from becoming biased toward certain factors (Tehrany et al., 2019). In this study, the extent of correlation between the conditioning factors was evaluated using the variance inflation factor (VIF), calculated as

$$VI{F}_{i}=\frac{1}{1-{R}_{i}^{2}}$$
(1)

where \(R_i\) is the multiple correlation coefficient between the ith factor and the other conditioning factors, so that \(R_i^2\) is the coefficient of determination obtained by regressing the ith factor on the others. According to the literature (Kalantar et al., 2019; Roy & Saha, 2019), factors with VIF > 5 are considered to have high multicollinearity and should be removed or combined with another related variable into a single index (O’Brien, 2007).
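Eq. (1) can be evaluated by regressing each factor on the remaining factors with ordinary least squares. The following plain-Python sketch does so for small toy inputs. The helper names (`solve`, `r_squared`, `vif`) and the toy values are hypothetical and do not reproduce the data or software used in the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_squared(y, columns):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[k] for col in columns] for k in range(n)]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(design[k][i] * design[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(design[k][i] * y[k] for k in range(n)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def vif(factors):
    """Apply Eq. (1) to every factor; `factors` maps a factor name to its sampled values."""
    result = {}
    for name, values in factors.items():
        others = [v for key, v in factors.items() if key != name]
        result[name] = 1.0 / (1.0 - r_squared(values, others))
    return result


# Hypothetical toy values, not data from the study
independent = {"a": [1.0, -1.0, 1.0, -1.0], "b": [1.0, 1.0, -1.0, -1.0]}
collinear = {"c": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
             "d": [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]}
print(vif(independent))  # uncorrelated factors: VIF close to 1
print(vif(collinear))    # nearly proportional factors: VIF far above 5
```

The normal-equation solver is adequate for a handful of well-scaled factors; for larger or ill-conditioned inputs, a numerical library would be the safer choice.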

3.3 Numerical modeling methods

3.3.1 SVM

SVM is a supervised learning method that originated from statistical learning theory and the structural risk minimization principle (Lee et al., 2017b; Vapnik, 1995). This method can be used for both classification and regression (Cristianini & Shawe-Taylor, 2000; Vapnik, 1995). SVM determines a line or hyper-plane in the multi-dimensional space of training samples to separate the samples into two classes with optimal margins (distances between the separating surface and the closest points) (Xu et al., 2012; Yao et al., 2008). Larger margins have been noted to be more resistant to noise (Kanevski, 2009). The initial space of an SVM can be transformed into a feature space with a higher dimensionality by using a kernel function. This transformation can increase the linear separation between the points (Abe, 2010; Chang & Lin, 2001) and allow the SVM to function as a nonlinear classifier as well. The commonly used kernel functions are the radial basis function (RBF) and linear, sigmoid, and polynomial kernels.

The RBF kernel, defined in Eq. (2), has been applied successfully in similar nonlinear regression problems, such as flood modeling (Tehrany et al., 2015), and was thus used in this work after numerous optimization trials.

$$K\left( {\vec{x},\overrightarrow {{x^{\prime}}} } \right) = \exp \left( { - \frac{{\left\| {x - x^{\prime}} \right\|^{2} }}{{2\sigma^{2} }}} \right)$$
(2)

where \(\left\| {x - x^{\prime}} \right\|^{2}\) is the squared Euclidean distance between the two feature vectors, and \(\sigma\) is a tuning parameter (Tien Bui et al., 2016). In addition to the kernel function, the SVM has two other key hyperparameters: gamma (\(\gamma =\frac{1}{2{\sigma }^{2}}\)) and regularization. Gamma determines the spread of classification boundaries, thereby affecting the flexibility of the model in classifying new data samples close to the classification boundaries. The regularization parameter is used for error control and relates to the tolerance of misclassification. With lower tolerance, the boundaries become stricter, resulting in lower training errors and higher training accuracies. However, the model reliability may not necessarily increase and must be assessed using the test dataset after training.
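The relation between \(\sigma\) and \(\gamma\) can be illustrated with a short sketch of the Gaussian RBF kernel, \(\exp(-\gamma \left\| x - x^{\prime} \right\|^{2})\) with \(\gamma = 1/(2\sigma^{2})\). The function name `rbf_kernel` and the sample vectors are illustrative assumptions.

```python
import math


def rbf_kernel(x, x_prime, sigma):
    """Gaussian RBF kernel value for two feature vectors of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))  # squared Euclidean distance
    gamma = 1.0 / (2.0 * sigma ** 2)
    return math.exp(-gamma * sq_dist)


# Identical vectors give the maximum similarity of 1
print(rbf_kernel([0.2, 0.5], [0.2, 0.5], sigma=1.0))
# A smaller sigma (larger gamma) makes the similarity decay faster with distance
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0))
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=0.5))
```

A larger gamma therefore narrows the influence of each training sample, which is why it controls the spread of the classification boundaries.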

3.3.2 DNN

The DNN model is an ANN with more than one hidden layer (LeCun et al., 2015). This deeper structure allows the model to extract more complex features and patterns from the input data (Schmidhuber, 2015). DNN models transform data from one representation to another and are therefore widely used for pattern detection and classification in nonlinear problems such as landslide zoning.

The first and last layers of a DNN are the input and output layers, respectively. Several hidden layers are present between the input and output layers. Each node acts as a variable, containing the value calculated from the previous layer’s nodes and an activation function that transforms the previous values into a new range. Given the structure of a DNN, it is necessary to determine the proper combination of the number of layers, number of nodes in each layer, and activation function in each layer. Increasing the node and layer counts can enable the recognition of more complex data relationships but also increases the number of parameters to be calculated, which can hinder the learning process. The activation function determines the data transformation between layers, which influences the feature classification quality. Various activation functions, such as the rectified linear unit (ReLU) and sigmoid, linear, and tanh functions, are available to accomplish diverse modeling objectives.

DNNs are large neural networks and are thus susceptible to overfitting, which deteriorates their performance on new data samples. Overfitting causes the network to mimic the properties of the training samples, thereby reducing the model flexibility. Dropout layers are typically used to prevent this phenomenon. A dropout layer ignores a randomly selected subset of the layer nodes at each training step, so that the weights attached to these nodes are not updated in that step. The dropout rate, expressed as a percentage, determines the share of nodes that are dropped. Notably, dropout rates that are too low will not prevent overfitting, whereas excessively high rates may result in underfitting and prevent convergence.
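The effect of a dropout layer on a vector of node activations can be sketched as follows. The sketch uses the common "inverted dropout" convention, in which the surviving activations are rescaled by 1/(1 − rate) during training and the layer is inactive at prediction time. Whether the models in this study relied on exactly this convention is an assumption, and the function name `dropout` is illustrative.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate`; rescale the survivors (inverted dropout)."""
    if not training or rate == 0.0:
        # At prediction time, every node is kept unchanged
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
layer_output = [1.0] * 10
print(dropout(layer_output, rate=0.1, rng=rng))
print(dropout(layer_output, rate=0.1, training=False))
```

Rescaling the surviving activations keeps the expected sum of the layer output unchanged, so the next layer sees inputs of a similar magnitude during training and prediction.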

Other parameters, such as the batch size and learning rate, also affect the DNN training and convergence speeds. During DNN training, optimal model weights are determined over a series of training epochs in which the weights are iteratively updated to decrease the overall model error. The batch size controls the number of sample points processed before each weight update and can thus be appropriately selected to balance the training accuracy and speed. The learning rate controls how much the weights are allowed to change when updated. A high learning rate will result in overshooting while updating weights, which may retard or prevent model convergence or destabilize the learning process. In contrast, a low learning rate will drastically slow down model convergence and increase the number of training epochs needed to reach adequate metrics.
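The roles of the batch size and learning rate can be illustrated with a deliberately small example: fitting a single weight w in y = w·x by mini-batch gradient descent. This is a toy sketch under stated assumptions (one parameter, mean squared error loss, hypothetical data generated with w = 2) and not the training procedure of the DNN itself.

```python
def train(xs, ys, batch_size, learning_rate, epochs):
    """Fit y = w * x by mini-batch gradient descent on the mean squared error."""
    w, updates = 0.0, 0
    for _ in range(epochs):
        # One epoch passes over all samples, one batch at a time
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            grad = sum(2.0 * (w * x - y) * x for x, y in zip(bx, by)) / len(bx)
            w -= learning_rate * grad  # the learning rate scales the size of each update
            updates += 1
    return w, updates


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # hypothetical data generated with w = 2

print(train(xs, ys, batch_size=2, learning_rate=0.01, epochs=200))  # two updates per epoch
print(train(xs, ys, batch_size=4, learning_rate=0.01, epochs=200))  # one update per epoch
print(train(xs, ys, batch_size=4, learning_rate=0.2, epochs=20))    # overshoots and diverges
```

With the same number of epochs, the smaller batch size yields more weight updates, whereas the excessive learning rate makes each update overshoot the optimum so that the error grows instead of shrinking.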

3.3.3 CNN

CNNs are modified DNNs that specialize in processing images or gridded data with convolutional, pooling, flattening, and fully connected layers (Li et al., 2021). Without these layers, image processing with a DNN may require excessive data samples and result in a slow and process-heavy training phase (Lee et al., 2017a). Convolution layers apply filters or kernels over regions of images and then pass results to the next layer, thereby creating feature maps. Pooling layers decrease the number of pixels, and thus, the image size and overall parameter count, while preserving important features. Flattening layers convert the resulting feature map to a fully connected layer with an equal number of neurons. The product is the input layer of an ANN or DNN, which is responsible for the classification of the features extracted from the initial image data.
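The convolution and flattening steps can be sketched in plain Python for a single-channel image and a single filter. The sketch computes the sliding-window product without padding (strictly a cross-correlation, which is the operation that deep learning libraries refer to as convolution). The function names and the toy 3 × 3 input are illustrative assumptions.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image without padding and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols)]
            for r in range(rows)]


def flatten(feature_map):
    """Convert a two-dimensional feature map to a one-dimensional list."""
    return [value for row in feature_map for value in row]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]  # responds to differences along the diagonal

feature_map = convolve2d(image, kernel)
print(feature_map)           # 2 x 2 feature map
print(flatten(feature_map))  # vector passed to the fully connected layers
```

In a trained CNN, the kernel values are learned weights rather than fixed numbers, and each convolution layer applies many such filters in parallel.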

The number of convolution layers influences feature detection in the input data, in addition to the learning speed and parameter count. Similar to the node count in the DNNs, the number of filters in each convolution layer determines the amount of data transferred from one layer to another, which defines the balance between speed and performance. Two other parameters that affect information preservation are the kernel size and stride. The kernel size determines the data window used in each step for feature detection. The optimal choice for this parameter depends on the problem: a small kernel size may result in the loss of possible spatial data patterns or of error mitigation, whereas a large kernel size may increase the number of parameters and negatively affect the training process and convergence. In this study, 3 × 3 and 5 × 5 pixel kernels were tested for each layer to optimize the model performance. The stride relates to the travel of the kernel or pooling window over the input data. Increased stride may decrease the parameter count but result in overshooting and the loss of important data features. In this study, stride values of 1, 2, and 3 pixels were tested for the max pooling layers.
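The influence of the window size and stride on the size of the output can be made concrete with a short sketch. Without padding, a window of size k moved with stride s over n pixels yields ⌊(n − k)/s⌋ + 1 positions along each axis. The absence of padding and the 15-pixel patch width used in the loop are assumptions for illustration only.

```python
def output_size(input_size, window, stride):
    """Number of window positions along one axis when no padding is used."""
    return (input_size - window) // stride + 1


def max_pool(image, window, stride):
    """Keep the largest value inside each window position."""
    rows = output_size(len(image), window, stride)
    cols = output_size(len(image[0]), window, stride)
    return [[max(image[r * stride + i][c * stride + j]
                 for i in range(window) for j in range(window))
             for c in range(cols)]
            for r in range(rows)]


# Output width for the kernel sizes and strides listed above (hypothetical 15-pixel patch)
for window in (3, 5):
    for stride in (1, 2, 3):
        print(window, stride, output_size(15, window, stride))

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(max_pool(image, window=2, stride=2))
```

A larger stride shrinks the output faster, which lowers the number of values passed to later layers at the cost of skipping window positions.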

3.4 Hyperparameter optimization

The hyperparameters of the CNN, DNN, and SVM models (Table 2) affect their learning rate and stability, as discussed in the previous sections. Therefore, we determined the optimal combination of these hyperparameters for each algorithm.

Table 2 Hyperparameters of the considered models

Hyperparameter tuning was conducted during the training phase. All the processing and modeling steps were programmed in Python (Release 3.7). The Keras library was used for the CNN and DNN model development, and the scikit-learn library was used for SVM modeling. The CNN and DNN models were optimized using the Optuna library (Akiba et al., 2019), and the SVM model was optimized using scikit-learn’s built-in grid search tool. Table 3 summarizes the software and hardware used for the computational process.
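A grid search evaluates every combination of the candidate hyperparameter values and retains the best-scoring one. The library-free sketch below illustrates this idea for the SVM regularization parameter (denoted C here) and gamma. It does not reproduce the scikit-learn tool, and both the candidate values and the scoring function `toy_score` are hypothetical stand-ins for a validation metric such as the AUC.

```python
import math
from itertools import product


def grid_search(param_grid, score):
    """Score every combination in `param_grid` and return the best parameters and score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for a validation metric; peaks at C = 10 and gamma = 0.01."""
    return (-abs(math.log10(params["C"]) - 1.0)
            - abs(math.log10(params["gamma"]) + 2.0))


# Hypothetical candidate values, not the grid used in the study
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
best_params, best_score = grid_search(param_grid, toy_score)
print(best_params)
```

The cost of an exhaustive search grows with the product of the numbers of candidate values, which is manageable for the few SVM hyperparameters but motivates sampling-based optimizers for networks with many hyperparameters.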

Table 3 Software and hardware specifications

3.5 Evaluation of results

The model performances were evaluated using the training and testing datasets. The performance metrics used during the testing step were the classification accuracy and AUC calculated as follows:

$$AUC={\int }_{0}^{1}f\left(FPR\right)dFPR=1-{\int }_{0}^{1}f\left(TPR\right)dTPR$$
(3)
$$\text{Accuracy}\,=\,\frac{TP+TN}{TP+FP+TN+FN}$$
(4)

where TPR and FPR are the true positive rate and false positive rate, respectively. TPR indicates the percentage of landslide samples correctly classified, and FPR represents the percentage of non-landslide samples misclassified. FN, TN, TP, and FP denote the numbers of false negatives, true negatives, true positives, and false positives, respectively (Darabi et al., 2021). The accuracy was calculated as the ratio of correct predictions among all predictions. The error values for all models were calculated as the RMSE.

$$\text{RMSE}=\sqrt{\frac{\sum_{i=1}^{n}{\left({Y}_\text{Actual}-{Y}_\text{Predicted}\right)}^{2}}{n}}$$
(5)

where \({Y}_\text{Actual}\) and \({Y}_\text{Predicted}\) are the real and predicted values, respectively, and \(n\) is the total number of samples. Following Khosravi et al. (2016), AUC values in the ranges 0.5–0.6, 0.6–0.7, 0.7–0.8, 0.8–0.9, and 0.9–1.0 were used to indicate poor, moderate, good, very good, and excellent performances, respectively.
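The three metrics can be computed from the observed labels and predicted susceptibility values as sketched below. The AUC is obtained here through its equivalent pairwise-ranking form (the probability that a randomly chosen landslide sample receives a higher value than a randomly chosen non-landslide sample) rather than by numerical integration of Eq. (3). The 0.5 classification threshold, the function names, and the four sample values are assumptions for illustration.

```python
import math


def accuracy(y_true, y_pred, threshold=0.5):
    """Eq. (4): share of samples whose thresholded prediction matches the label."""
    correct = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Eq. (5): root mean square difference between actual and predicted values."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def auc(y_true, y_pred):
    """Area under the ROC curve via pairwise comparison; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_pred) if y == 1]
    neg = [p for y, p in zip(y_true, y_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = landslide) and predicted susceptibility values
y_true = [1, 1, 0, 0]
y_pred = [0.9, 0.4, 0.6, 0.1]
print(accuracy(y_true, y_pred), auc(y_true, y_pred), rmse(y_true, y_pred))
```

The pairwise form compares every landslide sample with every non-landslide sample, which is practical for a few hundred points; sorting-based implementations are preferable for large datasets.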

3.6 Susceptibility mapping

After optimizing the hyperparameters to maximize the model performance, the final models were used to generate the susceptibility maps. This process resulted in a raster map of landslide susceptibility indexes per model, with values ranging from 0 to 1, where 0 indicated similarity to non-landslide samples, and 1 indicated similarity to historical landslide occurrences. The resulting map values were then reclassified into five categories corresponding to very low, low, moderate, high, and very high landslide-prone areas. This classification was performed using the quantile method, which has been commonly used for threshold value determination with skewed data (Akgun et al., 2012).
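Quantile reclassification places the class breaks so that each class receives approximately the same number of pixels. A minimal sketch with the Python standard library is given below. It operates on a flat list of index values instead of a raster, requires Python 3.8 or later for `statistics.quantiles`, and the function name `reclassify` and the sample values are illustrative assumptions.

```python
from bisect import bisect_right
from statistics import quantiles

CLASSES = ["very low", "low", "moderate", "high", "very high"]


def reclassify(indexes):
    """Assign each susceptibility index to one of five quantile-based classes."""
    breaks = quantiles(indexes, n=5)  # four break values separating five classes
    return [CLASSES[bisect_right(breaks, value)] for value in indexes]


# Hypothetical susceptibility indexes between 0 and 1
indexes = [i / 100 for i in range(100)]
labels = reclassify(indexes)
print({name: labels.count(name) for name in CLASSES})
```

Because the breaks follow the data distribution, skewed index values still yield classes of similar size, unlike equal-interval breaks.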

4 Results

4.1 Multicollinearity analysis

As discussed, the VIF method was used to identify any multicollinearity among the proposed impact factors. The results indicated the lack of any multicollinearity among the factors, as the VIF values of all the conditioning factors were lower than 5 (Table 4), and therefore, all 14 factors were used in the modeling process.

Table 4 Variance inflation factor (VIF) for landslide conditioning factors

4.2 Optimization results

The optimized hyperparameters for the CNN, DNN, and SVM models are listed in Table 5. The CNN optimization process recommended the use of one or two hidden layers in most candidate models, resulting in a simpler model. Moreover, introducing dropout did not benefit the CNN model, and including a dropout rate resulted in fewer candidate models in each test batch.

Table 5 Hyperparameter optimization results

The DNN model was not sensitive to activation functions. All three functions (Softmax, ReLU, and linear) were interchangeably used across layers among the candidate models. Optimized dropout rates rarely exceeded 10% in the candidate models, potentially because of the small sample size. The DNN depth was noted to be more important than the layer size (neuron count). Models with more layers and fewer neurons in each layer were more common candidates in test batches than shallow models with neuron counts higher than 50.

4.3 Model validation

Figure 7 shows the model predictions of landslide susceptibility during both training and testing, together with their error values. The plots show the expected values (red lines) and calculated values (blue lines) of each sample in the training and testing sets. RMSE values were calculated for each dataset and model. Although more of their predictions were close to the expected values (0 and 1), the DNN and SVM models had more instances with larger errors than the CNN model, resulting in higher overall RMSE values over both the training and testing datasets.

Fig. 7 Expected (red lines) and predicted (blue lines) values during training (left) and testing (right)

All models achieved satisfactory classification accuracy, RMSE, and AUC values (Table 6). The CNN model (88% classification accuracy during testing) was more accurate than the DNN model (79% classification accuracy during testing) and SVM model (80% classification accuracy during testing). The SVM model was slightly more accurate than the DNN model, despite having a slightly higher RMSE (approximately 0.43 and 0.40 in testing, respectively). In terms of the AUC values, during testing, the CNN model (AUC = 0.88) outperformed both the DNN (AUC = 0.82) and SVM (AUC = 0.80) models. Moreover, the CNN model exhibited a higher robustness than the other models, owing to its smaller difference between the testing and training AUC values. Specifically, the CNN was slightly more robust than the DNN model, but considerably more robust than the SVM model.

Table 6 Model performance metrics

4.4 Susceptibility maps

Landslide susceptibility maps were generated with each model (Fig. 8). The maps contained comparable land-area proportions of the five different susceptibility classes (Figs. 8 and 9). However, the CNN predicted considerably more land area in the very low susceptibility class and considerably less land area in the moderate and high classes. These results could be explained by the fact that the CNN used the properties of neighboring pixels when evaluating each pixel. Because very high and very low index values determined the error (which is the loss metric to be minimized in the training phase), the model tended to prioritize them over intermediate values. Therefore, fewer intermediate values were predicted. For example, pixels of high susceptibility were close to pixels of very high susceptibility and were thus classified as having very high susceptibility. The percentages of each susceptibility class for each model are shown in Fig. 9.

Fig. 8 Landslide susceptibility maps produced by the (a) CNN, (b) DNN, and (c) SVM models

Fig. 9 Percentages of each landslide susceptibility class
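The class percentages discussed in this section can be illustrated with a short sketch that assigns each pixel's susceptibility index to one of the five classes and reports the share of pixels per class. The equal-interval breaks (0.2, 0.4, 0.6, 0.8), the function names, and the pixel values are assumptions made only for illustration; the classification scheme used to produce the maps is not specified in this section and may differ.

```python
from collections import Counter

CLASSES = ["very low", "low", "moderate", "high", "very high"]


def classify(index, breaks=(0.2, 0.4, 0.6, 0.8)):
    """Map a susceptibility index in [0, 1] to one of five classes.

    Equal-interval breaks are an assumption of this sketch.
    """
    # Count how many break values the index reaches; that count is the class rank.
    return CLASSES[sum(index >= b for b in breaks)]


def class_percentages(indices):
    """Percentage of pixels falling in each susceptibility class."""
    counts = Counter(classify(v) for v in indices)
    return {c: 100.0 * counts[c] / len(indices) for c in CLASSES}


# Hypothetical susceptibility indices for ten pixels.
pixels = [0.05, 0.10, 0.15, 0.30, 0.55, 0.70, 0.85, 0.90, 0.95, 0.99]
shares = class_percentages(pixels)

for name in CLASSES:
    print(f"{name}: {shares[name]:.1f}%")
```

For equally sized pixels, these percentages also correspond to land-area proportions.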

4.5 Factor importance analysis

To identify the conditioning factors that most notably affected the LSM, the Relief-F method was used to calculate the importance of each factor. Figure 10 presents a comparison of the factor importance values for each model. The most important conditioning factors among all models were rainfall, distance from roads, and distance from drainage, followed closely by elevation, slope, TWI, valley depth, and distance from faults. The importance of the remaining factors was considerably lower.

Fig. 10 Comparison of the importance of conditioning factors between models
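To convey the idea behind Relief-type importance scores, the following sketch implements a simplified two-class variant: each factor is rewarded when it differs between a sample and its nearest samples of the opposite class (misses) and penalized when it differs between the sample and its nearest samples of the same class (hits). This is not the implementation used in this study; the function name, the Manhattan distance, the choice of k, and the toy data are assumptions, and the factors are assumed to be scaled to [0, 1] beforehand.

```python
def relieff_weights(X, y, k=1):
    """Simplified two-class ReliefF weights.

    X: list of samples, each a list of factor values scaled to [0, 1].
    y: list of class labels (1 = landslide, 0 = non-landslide).
    Each class must contain at least two samples.
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d
    for i in range(n):
        def distance(j):
            # Manhattan distance between sample i and sample j.
            return sum(abs(a - b) for a, b in zip(X[i], X[j]))

        others = sorted((j for j in range(n) if j != i), key=distance)
        hits = [j for j in others if y[j] == y[i]][:k]
        misses = [j for j in others if y[j] != y[i]][:k]
        for f in range(d):
            hit_diff = sum(abs(X[i][f] - X[j][f]) for j in hits) / len(hits)
            miss_diff = sum(abs(X[i][f] - X[j][f]) for j in misses) / len(misses)
            weights[f] += (miss_diff - hit_diff) / n
    return weights


# Hypothetical data: factor 0 separates the classes, factor 1 does not.
X = [[0.1, 0.5], [0.2, 0.1], [0.0, 0.9], [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
y = [1, 1, 1, 0, 0, 0]

weights = relieff_weights(X, y, k=1)
print(weights)
```

In this toy example, the factor that separates the two classes receives the larger weight, which mirrors how higher Relief-F scores are interpreted as higher factor importance.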

5 Discussion

5.1 Importance of localized susceptibility assessment

Landslide susceptibility in Iran has previously been studied with various approaches, such as fuzzy analytic network process (Alilou et al., 2019) and ML techniques (Pourghasemi & Rahmati, 2018; Shirzadi et al., 2019). The existing studies were conducted in different regions with different climates and geological properties. Although the models attained high accuracy, they yielded differing results regarding the importance of conditioning factors. In other words, although ML and DL models can be successfully applied to diverse regions, they must represent the regional differences associated with landslide factors and triggering mechanisms. For example, although Thi Ngo et al. (2021) used the CNN and RNN methods to perform a national-scale LSM of Iran, their results are not necessarily reliable or correct for every region in the country, and the use of DL methods necessitates more comprehensive research.

5.2 Model performance

The ML and DL methods evaluated in this study have been separately used in existing landslide studies and noted to yield promising results (Dou et al., 2020a, 2020b; Thi Ngo et al., 2021). A direct comparison of the three methods combined with parameter optimization suggested that the CNN model has a performance advantage over the more commonly used SVM and DNN models. Specifically, the CNN model exhibited higher AUC and accuracy values across the dataset and the smallest RMSE. Moreover, the CNN’s AUC during testing (0.88) was comparable to the CNN performance (0.85) reported by Thi Ngo et al. (2021). However, this AUC was slightly lower than the values for a DNN (0.90–0.92) reported by Dou et al. (2020b) and fell within the range reported for an SVM model (0.74–0.91) by Dou et al. (2020a).

5.3 Factor importance

We observed that rainfall, distance from roads, distance from drainage, and elevation were the most notable factors affecting LSM. Rainfall is known to be a key factor affecting landslide occurrence in the area. In particular, Kermanshah experiences intense rainfall events and receives more annual rainfall than the national average, which promotes soil destabilization. Road construction and usage lead to further destabilization, which leaves slopes prone to failure. Proximity to the drainage network affects the land through erosion and runoff, which can produce unstable layers and lead to mass movements. Such movements and drainage patterns are also known to affect the landslide probability. Finally, elevation influences the landslide probability by indirectly affecting the precipitation and vegetation cover. Many researchers have highlighted the significance of the distances from roads (Dao et al., 2020) and drainage (Dao et al., 2020; Kalantar et al., 2020), slope and elevation (Dou et al., 2020a; Liang et al., 2021; Mandal et al., 2021), rainfall (Mandal et al., 2021), and TWI (Liang et al., 2021; Panahi et al., 2020) in LSM. In contrast, other researchers have indicated that rainfall (Liang et al., 2021), slope (Dao et al., 2020), and land use (Panahi et al., 2020) are not highly important. The variable importance results obtained in this study are moderately different from those reported by Thi Ngo et al. (2021) based on their national-scale LSM of Iran. While we determined rainfall, distance from roads, distance from drainage, and elevation to be the most important factors, Thi Ngo et al. (2021) deemed these factors moderately important and identified slope as the most important factor. Nevertheless, there are several similarities between the results of the two studies, such as the aspect and elevation having low and moderate importance values, respectively. These comparisons emphasize that LSM should be conducted locally in countries with diverse climates and geological properties.

5.4 Limitations and scope for future studies

The two main limitations of this study compared with similar previous studies are the smaller sample size of the landslide inventory and the lower accuracy and resolution of the input data and certain factor maps. As such, developing a single model and comparing its performance with that reported in existing studies would not yield reliable results. To address these limitations, we developed multiple models using the same limited dataset and assessed their potential in modeling such data. Although all the developed models exhibited satisfactory performance, model performance can be enhanced by using larger and more accurate datasets.

The sample size is a key factor affecting model training. In this study, 110 landslide locations were combined with 110 randomly sampled non-landslide locations and split into 80% and 20% subsets for training and testing, respectively. The resulting training (176) and testing (44) points represented small samples for DL algorithms. For DL models, such as CNNs and DNNs, large sample sizes are recommended, especially if the models are configured with high levels of complexity (e.g., a high hidden layer or convolution layer count), as shown by similar previous studies, in which 440 total points were used on average (Dao et al., 2020; Yao et al., 2020). A small sample size can limit the model complexity and thus suppress the advantages of DL approaches. As a structured learning method, the SVM was less affected by the sample size than the CNN and DNN and exhibited a performance comparable with that of the DNN model.
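The split sizes stated above can be checked with a short sketch. The stratified (per-class) shuffling, the function name, and the random seed are assumptions made for illustration; this section does not specify how the 80% and 20% subsets were drawn.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices into training and testing subsets per class.

    Stratification by class is an assumption of this sketch.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test


# 110 landslide (1) and 110 non-landslide (0) locations, as in this study.
labels = [1] * 110 + [0] * 110
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
```

With 220 points in total, an 80% and 20% split gives 176 training and 44 testing points, which matches the counts reported above.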

In future work, several strategies can be implemented to alleviate the effects of small sample sizes. Additional data on landslide locations can likely be acquired through other methods, such as remote sensing, thereby increasing the sample size (Kalantar et al., 2020; Liang et al., 2021). Moreover, semi-supervised learning can be introduced to add more samples to each class and update uncertain labels in training iterations (Yao et al., 2020). Lastly, RNNs, which use neural nodes that retain information from previous samples and steps, may be promising for this problem, as they have been shown to improve the performance of models with small sample sizes (Xiao et al., 2018).

The accuracy of the input data may also limit the accuracy of the models. In this study, certain conditioning factor maps had scales ranging from 1:100,000 to 1:500,000, which are smaller than those used in similar previous studies. The larger pixel size of the output maps led to a low spatial resolution and, potentially, a low accuracy. The data resolution, and possibly the model accuracy, can be enhanced by using more detailed mapping survey data or remote sensing products.

6 Conclusions

The performances of the CNN, DNN, and SVM algorithms for LSM in Kermanshah, Iran were evaluated and compared. The hyperparameters were optimized so that each model approached its peak performance, which enabled a reliable comparison. The results indicated that the CNN (AUC = 0.88) outperformed the DNN (AUC = 0.82) and SVM (AUC = 0.80) models for LSM. Moreover, the CNN model was more robust than the other models, given the smaller difference in its AUC values for the training and testing datasets. In addition, the CNN predicted the locations of landslide and non-landslide points with the lowest overall RMSE. The superiority of the CNN was attributed to the characteristics of the dataset, which had a lower spatial accuracy and fewer sample points than those used in similar studies conducted worldwide. In other words, the CNN model could handle datasets with low data quality and quantity more effectively than the other models considered. Although these three data-driven techniques had not been directly optimized and compared for LSM prior to this study, their individual performances, in terms of the AUC values, were comparable to those reported previously. Therefore, the CNN may be a valuable tool for LSM to support future planning and development in other landslide-prone regions worldwide, especially in areas with limited data availability or quality.