1 Introduction

About 12.6% of India’s land surface area (excluding snow-covered regions) is landslide-prone (Chawla et al. 2018). This amounts to 0.42 million km², of which around 43% lies in the North-eastern part of the Himalayas (GSI 2014). According to the National Crime Records Bureau’s (NCRB) reports on accidental deaths (2010–2019), landslides kill about 304 people in India each year on average. The highest annual death toll, 499, was recorded in 2014. A 2011 study by the National Institute of Disaster Management (NIDM) estimated the monetary loss due to landslides in India at Rs. 150–200 crores. Kalimpong district is part of the North-eastern Himalayan Region (NEHR) and experiences frequent landslides, especially during the monsoon from July to September. The Kalimpong region has steeply sloping, heavily drained hilly terrain and receives intense rainfall, which makes it extremely susceptible to landslides. Infrastructure projects such as roads, settlements, and hydro-power schemes, built at the expense of vegetative cover, also loosen the slope material. The loosened material then slides downslope readily with only slight lubrication by water. Together, these factors make Kalimpong a region of interest for landslide studies. It is also imperative to demarcate the landslide-susceptible zones so that appropriate measures can be taken to mitigate the threats to life and property.

Over the years, landslide studies have evolved. Earlier, ground measurement data and landslide records were used to prepare landslide inventory maps. These are simple maps that contain the location, dimensions, and area of the landslide (Guzzetti et al. 1999). With the advancement of aerial photography, the preparation of these maps became easier and more convenient. The studies by Ayalew and Yamagishi (2005), Pradhan and Lee (2009), and others indicate the wide applications of aerial photographs in landslide studies. Gradually, developments in remote sensing (RS) techniques further facilitated effective landslide detection, mapping, monitoring, and hazard analysis (Tofani et al. 2013). Digital elevation models (DEM) have become an integral part of the landslide susceptibility mapping process as observed from the studies of Dahal et al. (2008), Balsubramani and Kumaraswamy (2013), and many more. Many spatial parameters contributing to landslides such as slope, curvature, aspect, drainage, etc., can be easily extracted from a DEM. Other RS techniques like RADAR, LiDAR, UAVs, etc., have also been used to study landslides and identify their risks. However, with the improvement of RS techniques and exploration of multiple landslide causative factors, the handling of huge data became difficult. This was aided by GIS which can store, modify, analyze, and display large amounts of spatial data conveniently. Therefore, RS and GIS have been used extensively as tools for mapping landslides and predicting future hazards (Gupta and Joshi 1990; Nagarajan et al. 1998; Gupta 2003).

A review of the literature identifies various methods of landslide susceptibility mapping (LSM). The simplest are landslide inventories. More sophisticated methods take quantitative approaches based on statistics and probability. (i) Probabilistic likelihood (frequency ratio) analysis calculates the probability that a landslide factor will cause a landslide, termed the frequency ratio (FR). The FR values of all factors are added to obtain the probability of landslide occurrence (Pradhan and Lee 2010a; Park et al. 2013). (ii) The information value method (IVM) and its modified form are statistical methods that also use probabilities, like the FR method (Sarkar et al. 2006; Balsubramani and Kumaraswamy 2013; Ghildiyal et al. 2019). (iii) The fuzzy logic method uses fuzzy set theory. Fuzzy memberships are assigned to the factors that aid landslide occurrence for each pixel of the study area raster. The maximum membership is assigned to all the pixels to make an LSM (Pradhan 2010). (iv) The weight of evidence (WofE) method assigns positive weights to locations that have a landslide factor and negative weights to locations that do not. These weights determine the final landslide susceptibility (Dahal et al. 2008). (v) Logistic regression is another common statistical method for LSM. It helps in finding a mathematical relation between a landslide and its factors (Ayalew and Yamagishi 2005; Chi et al. 2019). (vi) The analytical hierarchy process (AHP) is also a method for LSM, but it has the demerit of introducing subjectivity. An expert assigns weights to each of the landslide factors through pair-wise comparison of the relative importance of one factor over another (Yalcin 2008; Pourghasemi et al. 2012; Haoran et al. 2019). The latest works on LSM use machine learning methods such as support vector machines (Bahareh et al. 2019), Naïve Bayes (Yaning et al. 2019), reduced error pruning trees (Pham et al. 2019), rotation forest and decision trees (Haoyuan et al. 2018), etc. All of these methods have their own sets of assumptions, merits, and demerits. However, artificial neural networks (ANN) have been described as an advantageous method for LSM because they do not depend on how the data are statistically distributed. They can work with many kinds of data, even imprecise data, and do not violate assumptions (Ermini et al. 2005; Pradhan and Lee 2007, 2009, 2010a, b). Therefore, a specific ANN algorithm called resilient back propagation (Rprop) has been used as a new approach to LSM in this study.

The main objective of this research is to prepare a landslide susceptibility map of Kalimpong using an Rprop ANN model that can predict future landslides quickly and with high accuracy. The Rprop algorithm updates the weights of factors very quickly and can give highly accurate results in a very short time (Navneel et al. 2013). Thus, the Rprop ANN model used for LSM can quickly provide a reliable reference map for formulating policy to mitigate landslide risks.

2 Study area and physiographic settings

This study focuses on Kalimpong district in the northern part of West Bengal, India, which lies between latitudes 26°51′40″ and 27°11′44″N and longitudes 88°23′16″ and 88°53′00″E (figure 1a). The district has a total area of 1,092 km² and an average elevation of 1,603 m. The highest and lowest elevations are 3124 and 82 m above mean sea level, respectively. The main town sits on a ridge adjacent to the Teesta River. Kalimpong is drained by the Teesta, Relli, Leesh, Geesh, Neora, Murti, and Jaldhaka rivers and by numerous small streams. These rivers and streams actively denude the valley-side slopes through erosion, which steepens the slopes. The interfluvial area is sharpened, making the terrain more prone to landslides. The annual mean temperature is about 17.5°C, with mild summers averaging a maximum of 26°C from April to June (IMD 2020). The monsoonal rainfall from June to September acts as the main triggering factor for the landslides. Kalimpong’s average monthly rainfall in the monsoon season ranges from 119 to 417 cm (https://worldclim.org). The main problem arises when landslides destroy the highways that connect Kalimpong to the plains of Siliguri (figure 2). The supply chain is disrupted, causing hardship to people. Sometimes even houses and passenger vehicles are carried downslope with landslide debris, causing loss of life.

Figure 1 (a) The study area of Kalimpong and its location with (b) 184 landslide inventory points.

Figure 2 Landslides in Kalimpong (a) a section of the Lava Road (27°02′02.5″N, 88°41′38.7″E), (b) road near Yang Makum Khasmahal (26°56′34.1″N, 88°29′11.4″E), (c) NH10 near the Teesta River (26°56′34.1″N, 88°29′11.4″E), (d) near Nimbong Khasmahal (26°57′42.2″N, 88°34′49.3″E), and (e) a section of Rishi Road (27°05′24.8″N, 88°39′01.0″E).

The Kalimpong Hills are part of the tectonically active Eastern Himalaya. The geological units of Kalimpong consist of Precambrian slates, schist, phyllite, quartzite, gneisses, lower Gondwana and Shivalik sandstones, and recent to sub-recent alluvium. The overall rock type of the district mainly consists of granite, gneiss, shales, and most importantly sandstone and solidified but poorly consolidated clutter of conglomerate formations. The soil in the Kalimpong area is typically reddish. Dark soils are also found occasionally due to the existence of phyllites and schists. The soil zones of the study area have been described in table 1. Different soils contribute differently to landslide occurrence. Landslides show a high probability of occurrence in moderately shallow, well-drained, gravelly loamy soils with moderate erodibility and rockiness (Mondal and Mandal 2019). Figure 3(a) shows the regions of the study area where these soils are found (National Bureau of Soil Survey and Land Use Planning – NBSS & LUP Data). Kalimpong area mostly has coarse loamy and gravelly loamy soils which are shallow in depth and excessively drained. A characteristic feature of the grain size composition of the soils in Kalimpong is the high percentage of sandy and coarse particles up to 50–80%. The steeper slope segments have a high content of coarse debris. All of these features contribute to the low cohesivity of the soil making it slip-ready. Narrow ridges separated by closely-spaced V-shaped valleys, where the slope varies between 15° and 40° are found in the region (Geomorphological Field Guide Book, IGI 2017). The extensive compression due to the inter-continental plate convergence in this area has caused large-scale deformation in the lithological units. Cracks, joints, thrust planes, and schist planes are present throughout the region leading to a fragile lithology. This fragile base loses its cohesion during rainfall causing slope instability and ultimately failure. 
Rapid urbanization driven by tourism in Kalimpong since the 1950s has led to deforestation. Forest cover has been replaced by tea gardens, causing the loss of deep-root reinforcement of soils. The construction of communication lines, buildings, roads, and hydro-power facilities has further aggravated the fragility of these hills. To add to the problem, Kalimpong falls in high seismic intensity Zone-IV (BIS 2002), which also makes it susceptible to tremor-induced landslides. Landslides in the study area are of the debris, translational, rockslide, and debris-cum-rockslide types.

Table 1 Types of soil within the Kalimpong study area (NBSS and LUP data).
Figure 3 (a) Soil zonation of Kalimpong and (b) landslide and non-landslide (safe) points.

3 Materials and methodology

3.1 Landslide inventory

The landslide inventory described in section 1 is the first step in LSM (Yalcin et al. 2011). This map of landslide locations was used to train the Rprop ANN model to determine the weights of the landslide factors. The inventory was prepared by digitizing landslide polygons in ESRI ArcGIS 10.6.1. To achieve this, World Imagery was added as a base map layer and the polygons were drawn at a scale of 1:4,000. The base map features Maxar imagery at 0.3-m resolution for select metropolitan areas around the world, 0.5-m resolution across the United States and parts of Western Europe, and 1-m resolution across the rest of the world. A total of 184 landslide occurrence sites were polygonized within the study area (figure 1b). For inaccessible locations, the landslide polygons were validated with Google Earth imagery in ERDAS Imagine; accessible locations were ground-truthed with a GPS receiver. This step ensured good-quality data for training the Rprop ANN model. The total area of the inventory landslide polygons was 5.4 × 10⁵ m² with a standard deviation of 5948.50 m². To separate the safe zones from the unsafe zones, a non-landslide (safe) point layer was created using a slope cut-off value (figure 3b). Of the 184 landslides, only three (around 1.63% of the total) fell on slopes of less than 10°. Hence, a slope of 10° was chosen as the threshold between slide occurrence and no slide occurrence: slopes below 10° were considered safe, and slopes above 10° were considered landslide-prone. This information was then entered into a custom-built model in QGIS to randomly create 184 non-slide points in zones with slopes <10°. The unsafe slope regions were transformed into grids of 200 m × 200 m to simulate the areas of landslide occurrence. 
Similarly, the safe slope zones were transformed into grids of the same dimensions to simulate areas of no landslide occurrence. The grids were randomly distributed within the safe and unsafe slopes. These grids were then used to divide the data into training and testing sets comprising 80% (424 observations) and 20% (110 observations) of the total grids, respectively. A few prominent landslides in Kalimpong are illustrated in figure 2 with their coordinates.
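The random partition into training and testing grids can be sketched as follows. This is a minimal illustration and not the authors' QGIS workflow; the function name `split_grids` and the random seed are assumptions, while the counts (534 grids in total, 424 for training, 110 for testing) come from the text.

```python
import numpy as np

def split_grids(n_total, n_train, seed=0):
    """Randomly partition grid indices into training and testing subsets."""
    rng = np.random.default_rng(seed)      # seed is an arbitrary assumption
    shuffled = rng.permutation(n_total)    # random order of all grid indices
    train_idx = np.sort(shuffled[:n_train])
    test_idx = np.sort(shuffled[n_train:])
    return train_idx, test_idx

# 424 training and 110 testing observations, as reported in the text.
train_idx, test_idx = split_grids(n_total=534, n_train=424)
```

Because the indices are drawn from a single permutation, no grid can appear in both subsets.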

3.2 Thematic data layers of landslide factors

Various factors that contribute to landslide study have been identified through an extensive literature review (table 2). The landslide factors can be determined through the knowledge of landslides (Guzzetti et al. 1999). A total of 11 landslide factors were considered to suitably contribute to landslide phenomena in Kalimpong. These factors can be subdivided into geomorphological (slope, elevation, curvature, aspect), hydrological (distance to drainage, topographic wetness index – TWI), geological (distance to lineaments, soil type), anthropogenic (distance to roads, landuse), physiographic (normalized difference vegetation index – NDVI), and the main triggering factor (rainfall). All the landslide factors, their data sources and details, and some references have been summarized in table 2.

Table 2 Landslide factors and their data sources.

3.2.1 Geomorphological factors

These factors were derived using ArcGIS 10.6.1 from the Cartosat-1 Digital Elevation Model (CartoDEM) downloaded from https://bhuvan.nrsc.gov.in. It is a DEM produced by ISRO with a resolution of around 30 m. The elevation range was subdivided into 82–486.49, 486.49–844.50, 844.50–1223.76, 1223.76–1627.28, 1627.28–2124.87, and 2124.87–3124 m using the Jenks Natural breaks (JNB) classifier for training the Rprop ANN model (figure 4a). This classifier identifies breakpoints between classes so that the differences between classes are maximized (Federici et al. 2007; Conforti et al. 2014). Elevation and landslide occurrence are directly correlated (Umar et al. 2014). Aspect affects the rate of weathering by strengthening the effects of direct sunlight exposure, the impact of rainfall, and the abrasion of dry winds (Ercanoglu and Gokceoglu 2002). Weathered material on the slope takes a long time to consolidate, which leaves the loose material on the slopes ready to slide. Data values of aspect were also derived in ArcGIS from the CartoDEM (figure 4b). The aspect range (–1° to 360°) was subdivided into eight categories for this study, viz., N (north) to NE (north-east) facing (0°–45°), NE to E (east) facing (45°–90°), E to SE (south-east) facing (90°–135°), SE to S (south) facing (135°–180°), S to SW (south-west) facing (180°–225°), SW to W (west) facing (225°–270°), W to NW (north-west) facing (270°–315°), and NW to N facing (315°–360°). A flat aspect of –1° was not included in the analysis as flat surfaces do not contribute to landslides. Slope plays a very crucial role in landslide occurrence by reducing the shear strength of slope materials. Steeper slopes increase the tendency of landslide occurrence (Nefeslioglu et al. 2010; Mandal and Mandal 2018). It was observed that the slope values range from 0° (flat) to a little over 70° (steep-slope) in the study area. 
The slope range was sub-classed using the JNB classifier into 0°–12.75°, 12.75°–22.51°, 22.51°–30.63°, 30.63°–39.83°, and 39.831°–71.18° (figure 4c). Curvature represents how the slope gradient varies spatially. The topography can have convex (positive curvature), concave (negative curvature), or straight segments (0 curvature) of the slope (Conforti et al. 2014). Concave curvatures lead to the concentration of drainage which saturates (moisture saturation) and lubricates the slope materials, greatly reducing its shear strength. The probability of a landslide event increases with higher negative values of curvature which represent an increasing degree of concavity (Mondal and Mandal 2019). Standard curvature values were taken for this study. Standard curvature combines both planform and profile curvatures. The plan curvature affects how the water and landslide debris converges or diverges over the space, and the profile curvature influences the speed of the flow. A positive planform curvature value indicates sideward convexity while a positive profile curvature indicates upward concavity of the surface. They affect the rate of erosion and deposition, and hence the sediment yield (Das 2021). The standard curvature values were in the range of –44 to 40. This range was divided into curvature classes of –44 to –20.03, –20.03 to –11.96, –11.96 to –3.97, –3.97 to –0.02, –0.02 to 4.01, 4.01–7.96, 7.96–12, 12–40 using the JNB classifier (figure 4d).
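The eight aspect categories listed in this section can be assigned to raster cells as sketched below. This is an illustrative NumPy version, not the ArcGIS reclassification used in the study; the function name `classify_aspect`, the short labels, and the rule that a shared boundary such as 45° goes to the higher class are assumptions.

```python
import numpy as np

# Short labels for the eight 45-degree classes, in the order given in the text.
ASPECT_LABELS = ["N-NE", "NE-E", "E-SE", "SE-S", "S-SW", "SW-W", "W-NW", "NW-N"]

def classify_aspect(aspect_deg):
    """Return class indices 0..7 for aspect in degrees; flat cells (-1) map to -1."""
    aspect = np.atleast_1d(np.asarray(aspect_deg, dtype=float))
    classes = np.floor(aspect / 45.0).astype(int)
    classes = np.clip(classes, 0, 7)   # an aspect of exactly 360 falls in the last class
    classes[aspect < 0] = -1           # flat aspect, excluded from the analysis
    return classes

sample_classes = classify_aspect([0, 44.9, 45, 180, 359, 360, -1])
```

Cells flagged with -1 would be masked out before training, consistent with the exclusion of flat aspect described above.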

Figure 4 Geomorphological landslide factor rasters. (a) Elevation, (b) aspect, (c) slope, and (d) curvature.

3.2.2 Hydrological factors

The proximity of a location to drainage has a positive relationship with landslide phenomena (Yilmaz 2009). Greater proximity to streams led to a higher probability of landslide occurrence and vice-versa (Lee and Pradhan 2007). Streams could adversely affect the stability by either eroding the toe or saturating the slope or both (Gokceoglu and Aksoy 1996; Pradhan et al. 2010; Yalcin et al. 2011). The CartoDEM was used to build the drainage map. After the vector drainage map was created, multiple ring buffers were generated at intervals of 30, 70, 150, 300, and >300 m for covering the entire study area. This buffer map was then rasterized in ArcGIS for use in the analysis (figure 5a). TWI represents the moisture saturation condition in the terrain, which indicates the accumulation of drainage (Beven and Kirkby 1979). The role of moisture saturation (m.s.) towards landslide occurrence has already been described in section 3.2.1. The TWI map (figure 5b) was prepared from the slope map and flow accumulation map using a raster calculator in ArcGIS 10.6.1 by applying a modified form of Beven and Kirkby’s equation for TWI (1979) as stated in equation (1):

Figure 5 Hydrological landslide factor rasters. (a) Distance to drainage and (b) TWI.

$$\mathrm{TWI }=\frac{\left(\alpha +1\right)\times \mathrm{cell\,size}}{\mathrm{tan}\left(\beta \times \frac{\pi }{180}\right)+0.001},$$
(1)

where α represents flow accumulation, and β is the slope in degrees. The TWI values range from 0.71 (low m.s.) to 19.98 (high m.s.). The TWI values were sub-divided into 0.70–2.83, 2.83–3.75, 3.75–4.70, 4.70–5.78, 5.78–7.13, 7.13–8.81, 8.81–10.97, and 10.97–19.98 using the JNB classifier for training the Rprop ANN model.
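Equation (1) can be evaluated cell by cell as sketched below. This is a minimal NumPy illustration rather than the raster calculator expression used in the study; the function name `twi_equation_1`, the `take_log` flag, and the sample inputs are assumptions. The classical Beven and Kirkby index is the natural logarithm of the ratio of contributing area to tan β, so the flag is offered only for comparison; the text does not state whether a logarithm was applied.

```python
import numpy as np

def twi_equation_1(flow_accumulation, slope_deg, cell_size, take_log=False):
    """Evaluate equation (1): ((alpha + 1) * cell size) / (tan(beta) + 0.001)."""
    alpha = np.asarray(flow_accumulation, dtype=float)
    beta_rad = np.deg2rad(np.asarray(slope_deg, dtype=float))  # beta * pi / 180
    ratio = ((alpha + 1.0) * cell_size) / (np.tan(beta_rad) + 0.001)
    # take_log is an assumption added for comparison with the classical index.
    return np.log(ratio) if take_log else ratio

# Hypothetical cell: no upslope cells, 45 degree slope, 30 m cell size.
twi_value = twi_equation_1(flow_accumulation=0, slope_deg=45.0, cell_size=30.0)
```

The 0.001 term in the denominator keeps the expression finite on flat cells, where tan β is zero.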

3.2.3 Geological factors

Lineaments are very important geological features that influence landslides (O’Leary et al. 1976). They signify faulting, shear cracks, and joints in the underlying lithological formations. These features make the terrain fragile and therefore prone to landslides (Mondal and Mandal 2019). The chance of landslide occurrence is inversely related to the distance from lineaments (Choi et al. 2012). To produce the lineament map, Bhuvan Web Map Services (WMS) were used (https://bhuvan-app1.nrsc.gov.in/thematic/thematic/index.php). The WMS layer contains the West Bengal lineament map at 1:50,000 scale, from which the lineaments in Kalimpong were extracted. After the vector lineament map was created, multiple ring buffers were generated at intervals of 100, 400, 1000, 2000, 3500, 5000, 6500, and 8000 m for the Kalimpong study area (figure 6). The influence of soil types on the occurrence of landslides has been discussed in section 2. Hard copy NBSS & LUP soil maps at a scale of 1:5,00,000 were scanned and geo-referenced to the required projection system using Image-to-Image registration in ERDAS Imagine. The soil types in the Kalimpong study area are W001, W002, W003, W004, W007, and W008 as per NBSS & LUP data, Kolkata (figure 3a; table 1).
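The multiple ring buffers around lineaments amount to binning each cell's distance into classes. The sketch below shows one way to do this with `numpy.digitize`; it is not the ArcGIS buffer-and-rasterize procedure used in the study. The function name `distance_class` and the choice to include the outer edge of each ring in that ring are assumptions, while the break values come from the text.

```python
import numpy as np

# Outer radii of the ring buffers (m), as listed for the lineament map.
LINEAMENT_BREAKS_M = [100, 400, 1000, 2000, 3500, 5000, 6500, 8000]

def distance_class(distance_m, breaks=LINEAMENT_BREAKS_M):
    """Return the ring index for each distance (0 = nearest ring)."""
    d = np.atleast_1d(np.asarray(distance_m, dtype=float))
    # right=True places a distance equal to a break in the inner ring.
    return np.digitize(d, breaks, right=True)

ring_ids = distance_class([0, 100, 101, 8000])
```

Distances beyond the last break receive an index one past the final ring, which makes cells outside every buffer easy to detect.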

Figure 6 Distance to lineament map.

3.2.4 Anthropogenic factors

The distance to roads has a similar effect on landslide occurrence as the distance to drainage (Ayalew and Yamagishi 2005). Road construction in hilly areas requires cutting of slopes, which destabilizes the consolidated material and makes it prone to landslides. The roads in the study area were digitized manually in ArcGIS 10.6.1, using the World Topographic Map as the base map. It has a scale of 1:4,000 for India (the coverage and source details can be found at www.arcgis.com/home/item.html?id=30e5fe3149c34df1ba922e6f5bbf808f). After the vector road map was created, multiple ring buffers were generated at intervals of 20, 50, 100, 400, 1000, 2000, 3500, 5000, 6500, and 8000 m (figure 7a). Landuse plays a crucial role in the modification of consolidated slope material through anthropogenic intervention (Yalcin 2008). Infrastructure works such as roads and buildings, especially in hilly areas, destabilize slopes through overburden and undercutting. Expansion of towns and villages also causes deforestation, which reduces the reinforcement provided by roots against soil erosion. To prepare the landuse map, LISS-IV images were subjected to supervised image classification in ERDAS Imagine. Spectral signatures spread throughout the image were taken for the landuse classes of agriculture, dense forest, open forest, water body, settlement, barren land, and sand. These signatures were used to classify the image with the maximum likelihood classifier. The signatures were then checked by computing Kappa from the contingency matrix, which gave a value of 0.987. Since this value showed almost perfect agreement (Kappa between 0.81 and 1.00) with the reference data (Landis and Koch 1977), the signatures were used for landuse classification. The incorrectly classified pixels were then manually recoded in ERDAS Imagine, producing the final landuse map (figure 7b).
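The Kappa check on the classification signatures can be illustrated with Cohen's Kappa computed from a contingency matrix. The sketch below is a generic implementation, not the ERDAS Imagine routine; the function name `cohen_kappa` and the 2 × 2 example matrix are hypothetical and are not the study's data.

```python
import numpy as np

def cohen_kappa(confusion):
    """Cohen's Kappa from a square contingency matrix (rows: reference, columns: classified)."""
    m = np.asarray(confusion, dtype=float)
    total = m.sum()
    observed = np.trace(m) / total                                  # observed agreement
    expected = (m.sum(axis=0) * m.sum(axis=1)).sum() / total ** 2   # chance agreement
    return (observed - expected) / (1.0 - expected)

# Hypothetical two-class matrix for illustration only.
example_kappa = cohen_kappa([[45, 5],
                             [10, 40]])   # approximately 0.70
```

On the Landis and Koch (1977) scale cited above, values between 0.81 and 1.00 indicate almost perfect agreement.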

Figure 7 Anthropogenic landslide factor rasters. (a) Distance to roads and (b) landuse.

3.2.5 Physiographic factor

Soils with sparse vegetation or barren lands are more susceptible to landslides. In contrast, vegetated slopes provide root reinforcement that holds the slope material in place and mitigates the effect of rain and runoff. This increases soil strength and stability, which ultimately reduces the susceptibility to landslides (Beguería 2006). NDVI represents the vegetative cover in an area and thus indicates the extent of protection of the slope against landslides. The NDVI map (figure 8a) was derived from the LISS-IV image in ArcGIS 10.6.1. NDVI values ranged from as low as –0.27 in vegetation-deprived regions to as high as 0.84 in regions of dense forest cover. The NDVI values were sub-classed into –0.27 to 0.19, 0.19–0.43, 0.43–0.57, 0.57–0.67, and 0.67–0.83 to train the Rprop ANN model.
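NDVI is conventionally computed as (NIR − Red) / (NIR + Red) from the near-infrared and red reflectance bands. The sketch below shows that calculation in NumPy; the text does not say which LISS-IV bands or which ArcGIS tool were used, so the function name `ndvi`, the band arguments, and the sample reflectances are assumptions.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index; cells where NIR + Red is 0 return 0."""
    nir = np.atleast_1d(np.asarray(nir, dtype=float))
    red = np.atleast_1d(np.asarray(red, dtype=float))
    denom = nir + red
    out = np.zeros_like(denom)
    # Divide only where the denominator is non-zero to avoid warnings and NaNs.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectances: a vegetated cell, a bare cell, and an empty cell.
ndvi_values = ndvi([0.5, 0.2, 0.0], [0.1, 0.2, 0.0])
```

Dense vegetation reflects strongly in the near-infrared and absorbs red light, which is why forested cells approach the upper end of the range reported above.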

Figure 8 Physiographic and triggering factor rasters. (a) NDVI and (b) rainfall.

3.2.6 Triggering factor

The most commonly cited triggering factor for landslides is rainfall. The probability of landslide initiation is directly proportional to the amount of rainfall (Pradhan and Lee 2007). Rainfall data of 10 years was downloaded from the Worldclim website (https://worldclim.org/data/worldclim21.html). This high-resolution rainfall data was used for the analysis (figure 8b). Kalimpong’s average monthly rainfall in the monsoon seasons from June to September was found to be in the range of 119–417 cm. The rainfall values were categorized using the JNB classifier as 119.21–186.60, 186.60–237.60, 237.60–275.77, 275.77–314.83, 314.83–361.06, and 361.06–417.12 cm for training the Rprop ANN model.

3.3 Workflow

The detailed workflow of the study (figure 9) consists of (i) collection of data and pre-processing, (ii) preparation of thematic data layers, (iii) generation of the datasets for the Rprop ANN model, (iv) modelling of the Rprop ANN, (v) validation of the model using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, and (vi) generation of Kalimpong’s landslide susceptibility map.
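Step (v) of the workflow relies on the area under the ROC curve. One way to compute it, sketched below, uses the equivalence between AUC and the probability that a randomly chosen landslide cell scores higher than a randomly chosen non-landslide cell. The function name `roc_auc` and the sample labels and scores are hypothetical, and the text does not state which software produced the AUC in the study.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]      # susceptibility scores at landslide locations
    neg = scores[~labels]     # susceptibility scores at non-landslide locations
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Hypothetical test observations: 1 = landslide, 0 = non-landslide.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])   # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.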

Figure 9 Workflow of the Rprop ANN architecture and analysis.

4 LSM using Rprop ANN model

The resilient back propagation (Rprop) ANN model was originally designed to solve some inherent problems of the traditional backpropagation (Bprop) ANN model. The latter has been widely used for landslide susceptibility studies (Park et al. 2009; Choi et al. 2012; Bahareh et al. 2019). This study proposes the Rprop ANN for LSM instead of the traditional Bprop ANN model because of its well-established advantages in soft computing. Navneel et al. (2013), among others, have found Rprop to be considerably better than Bprop in terms of speed and accuracy. Like Bprop, Rprop is a supervised learning algorithm; the two differ in the weight update mechanism. Because of this difference, the training error of Rprop converges extremely fast, and the algorithm can also handle imprecise input data (Widodo et al. 2017). The landslide factors are the input nodes of the ANN model, and the training dataset of the landslide inventory is the output node in the training phase. Comma-separated value (csv) files were prepared for the training and testing datasets. Each file consisted of landslide locations and non-landslide locations. The landslide factor raster values for the same locations were appended to the csv files. Landslide locations were given a value of 1 and non-landslide locations were given a value of 0, thus transforming each factor into binary 1s and 0s. The categorical variables of aspect, landuse, soil type, distance to drainage, distance to lineament, and distance to road were converted to numeric variables of 0s and 1s by dividing each variable into its subclasses using dummy variables. All the variables were then normalized using Z-score standardization or Min–Max scaling, as typical neural network algorithms require data that are on a 0–1 scale. The Rprop ANN model was trained using 424 observations from the training dataset in this study. 
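The dummy-variable encoding and Min–Max scaling described above can be sketched as follows. This is an illustration only; the function names `one_hot` and `min_max_scale` and the sample inputs are assumptions, and the study's actual preprocessing code is not given in the text. The elevation values in the example are the minimum, mean, and maximum reported in section 2.

```python
import numpy as np

def one_hot(class_ids, n_classes):
    """Encode integer class ids (0..n_classes-1) as rows of 0s and 1s (dummy variables)."""
    ids = np.asarray(class_ids, dtype=int)
    encoded = np.zeros((ids.size, n_classes))
    encoded[np.arange(ids.size), ids] = 1.0
    return encoded

def min_max_scale(values):
    """Rescale values linearly to the 0-1 range; a constant input maps to zeros."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.zeros_like(v)
    return (v - v.min()) / span

aspect_dummies = one_hot([0, 2, 1], n_classes=3)        # hypothetical class ids
scaled_elevation = min_max_scale([82.0, 1603.0, 3124.0])
```

In practice the scaling constants would be taken from the training set and reused for the testing set so that both share the same 0–1 mapping.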
The ANN develops the relationship between the landslide factor and the known landslide output in the training stage through a successive update of weights. Each input node of the landslide factors gets a final weight which is a measure of its contribution to the occurrence of a landslide. Bprop ANN calculates changes in landslide factor weights (\({\Delta w}_{ij}^{(t)}\)) through the magnitude of the partial derivative of the error function (E) (equation 2). Riedmiller and Braun (1992) proposed the Rprop algorithm and its underlying equations. These equations have been adapted to determine landslide susceptibility in this study.

$${\Delta w}_{ij}^{(t)}=\alpha \times {x}_{i}^{(t)}\times {\delta }_{j}^{(t)},$$
(2)

where \(\alpha \) is the learning rate of the ANN model, \({x}_{i}^{(t)}\) represents the input weights propagating back to the ith neuron at time step t, and \({\delta }_{j}^{(t)}\) is the corresponding error gradient of the jth landslide factor. Rprop alternatively calculates individual \({\Delta }_{ij}^{(t)}\) (update-value) for the weights of each landslide factor in the following way (equation 3):

$${\Delta }_{ij}^{(t)}=\left\{\begin{array}{l}{\eta }^{+} \times {\Delta }_{ij}^{(t-1) } ,\quad {\text {if}} \quad \frac{\partial {E}^{(t-1)}}{\partial {w}_{ij}}\times \frac{\partial {E}^{(t)}}{\partial {w}_{ij}}>0\\ {\eta }^{-} \times {\Delta }_{ij}^{(t-1) } ,\quad {\text {if}} \quad \frac{\partial {E}^{(t-1)}}{\partial {w}_{ij}}\times \frac{\partial {E}^{(t)}}{\partial {w}_{ij}}<0\\ {\Delta }_{ij}^{\left(t-1\right)} , \quad \quad \quad {\text {else}}\end{array}\right.$$
(3)

where 0 < \({\eta }^{-}\) < 1 < \({\eta }^{+}\). \({\Delta }_{ij}^{(t)}\) continuously adjusts during the learning process based on the sign of the partial derivative of E and not its magnitude. \({\eta }^{+}\) and \({\eta }^{-}\) are constants by which \({\Delta }_{ij}^{(t)}\) of each landslide factor is adjusted to control the speed and accuracy of weight updates. After the \({\Delta }_{ij}^{(t)}\) values for each weight are calculated, the weights of the landslide factors are updated in two ways. First, if the current and previous partial derivatives of E retain their signs, then the weight is updated using equation (4):

$${\Delta w}_{ij}^{(t)}=\left\{\begin{array}{l}{-\Delta }_{ij}^{(t)} , \quad {\text {if}} \quad \frac{\partial {E}^{(t)}}{\partial {w}_{ij}}>0\\ {+\Delta }_{ij}^{(t)} , \quad {\text {if}} \quad \frac{\partial {E}^{(t)}}{\partial {w}_{ij}}<0\\ 0 , \quad \quad \quad {\text {else}}.\end{array}\right.$$
(4)

Second, if the signs of the current and previous partial derivatives differ, the local minimum was skipped, and thus the weight is reverted to its previous state. Its previous partial derivative is also reset to 0 (equation 5).

$${\Delta w}_{ij}^{(t)} = {-\Delta w}_{ij}^{(t-1)} ,\quad {\text {if}} \quad \frac{\partial {E}^{(t-1)}}{\partial {w}_{ij}}\times \frac{\partial {E}^{(t)}}{\partial {w}_{ij}} < 0.$$
(5)

The new weight of the jth landslide factor is calculated using equation (6):

$${w}_{ij}^{(t+1)} = {w}_{ij}^{(t)}+ {\Delta w}_{ij}^{(t)},$$
(6)

where \({w}_{ij}^{(t)}\) is the old weight of the jth landslide factor, and \({w}_{ij}^{(t+1)}\) is the new weight of the jth landslide factor. The new weights are computed until a suitable threshold of error convergence is achieved by the Rprop ANN model.
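Equations (3) to (6) can be combined into a single update step, sketched below in NumPy and exercised on a simple quadratic error rather than on the landslide data. The function names `rprop_step` and `minimise_quadratic`, the initial update-value of 0.1, and the bounds `delta_min` and `delta_max` are assumptions that do not appear in the text; the factors 0.5 and 1.2 match the learningrate.factor values reported for the model.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta, prev_dw,
               eta_minus=0.5, eta_plus=1.2, delta_min=1e-6, delta_max=50.0):
    """One Rprop update with weight backtracking, following equations (3)-(6)."""
    sign_product = grad * prev_grad
    same_sign = sign_product > 0
    flipped = sign_product < 0

    # Equation (3): grow or shrink each update-value from the sign product only.
    delta = np.where(same_sign, np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(flipped, np.maximum(delta * eta_minus, delta_min), delta)

    # Equation (4) when the sign is retained; equation (5) reverts after a sign change.
    dw = np.where(flipped, -prev_dw, -np.sign(grad) * delta)

    # After a revert the stored derivative is set to 0 for the next step.
    stored_grad = np.where(flipped, 0.0, grad)

    # Equation (6): apply the weight change.
    return w + dw, stored_grad, delta, dw

def minimise_quadratic(target, n_steps=200):
    """Minimise E(w) = sum((w - target)^2) with Rprop as a toy demonstration."""
    target = np.asarray(target, dtype=float)
    w = np.zeros_like(target)
    prev_grad = np.zeros_like(target)
    prev_dw = np.zeros_like(target)
    delta = np.full_like(target, 0.1)      # assumed initial update-value
    for _ in range(n_steps):
        grad = 2.0 * (w - target)          # dE/dw for the toy error
        w, prev_grad, delta, prev_dw = rprop_step(w, grad, prev_grad, delta, prev_dw)
    return w

w_final = minimise_quadratic([3.0, -1.5])
```

Only the sign of the gradient enters the update, so rescaling the error function would leave the trajectory unchanged; this is the property that distinguishes Rprop from the magnitude-based update in equation (2).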

The Rprop ANN architecture was configured as: (i) no. of hidden layers = 5; (ii) algorithm = RPROP; (iii) activation function = Sigmoid (default); (iv) error function = cross entropy; (v) maximum no. of steps = 1e+08; (vi) threshold = 0.01 (default); (vii) start weights = null (default); (viii) learningrate.limit = null (default); (ix) learningrate.factor = list(minus = 0.5, plus = 1.2) (default). The training error, or root mean square error (RMSE), of the Rprop ANN model (figure 10) in this study was found to be 0.02635, and a threshold of 0.008912 was reached in 1,614 steps. The final weight of the jth landslide factor upon convergence is given by \(W_{j} = {w}_{ij}^{(t+1)}\). The final landslide susceptibility index using Rprop (\({\mathrm{LSI}}_{\mathrm{Rprop}}\)) is given by equation (7):

Figure 10 The Rprop ANN model.

$${\mathrm{LSI}}_{\mathrm{Rprop}}=\sum_{j=1}^{n}{C}_{j}\times {W}_{j},$$
(7)

where n is the total no. of landslide factors, and \({C}_{j}\) is the value of the jth landslide factor.
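Equation (7) is a weighted sum over the landslide factors and can be evaluated for many cells at once as a matrix-vector product. The sketch below assumes the factor values are arranged with one row per cell and one column per factor; the function name `landslide_susceptibility_index` and all numbers in the example are hypothetical and are not the weights of table 3.

```python
import numpy as np

def landslide_susceptibility_index(factor_values, weights):
    """Equation (7): LSI = sum over j of C_j * W_j, evaluated for every cell."""
    C = np.asarray(factor_values, dtype=float)   # shape (n_cells, n_factors)
    W = np.asarray(weights, dtype=float)         # shape (n_factors,)
    return C @ W

# Two hypothetical cells described by three hypothetical factor values.
lsi = landslide_susceptibility_index([[1, 0, 1],
                                      [0, 1, 0]],
                                     [0.5, 0.3, 0.2])
```

Cells with larger index values would be mapped to higher susceptibility classes.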

5 Results and discussion

5.1 Contributions of landslide factors to landslide

The value of \({W}_{j}\) from table 3 using Rprop ANN shows the contribution of each landslide factor to landslide occurrence. The magnitude of the weights \({W}_{j}\) can be used to rate the significance of a landslide factor. Factors with greater weights contribute more to a landslide phenomenon than those with lower weights (Dong et al. 2020). All the landslide factors and their Rprop ANN weights are summarized in table 3. The weights allotted to the elevation classes by Rprop ANN indicate that the chances of landslide occurrence are greatest in the range of 844.50–1223.76 m. From the weights of the slope sub-classes, it is evident that as the slope increases and exceeds 30°, the terrain starts to become more prone to landslide events. The steepest slopes of the region, i.e., 39.831°–71.18°, are the most dangerous in terms of landslide activity. The weights of the aspect classes show that NW to N facing, SE to S facing, and S to SW facing aspects contribute more than other aspect classes to landslide occurrence. The trend of weights of the curvature classes indicates that slopes with greater concavity (negative values) are more susceptible to landslide phenomena. The lower NDVI values, representing absent or sparse vegetation, have higher weights and thus contribute more to landslides than the higher NDVI values representing dense vegetation. There is a general increasing trend in the weights of TWI, signifying that slopes with greater moisture saturation, i.e., higher TWI, have a greater effect on the slope instability that causes landslides. The subclasses of landuse with their weights are dense forest (0.0037), open forest (0.098), waterbody (0.049), settlement (0.10), barren (0.12), sand (0.041), and agriculture (0.019), from which it is evident that barren land contributes most to landslide occurrence among the landuse classes, and dense forest contributes least. This also conforms with the NDVI weights mentioned earlier. 
As per the Rprop ANN model, the landuse category of waterbody received a weight of 0.049. This can be attributed to the size of the landslide grids chosen for the study. Landslides in the study area were simulated as 200 m × 200 m square grids to represent the true extent of landslides. Thus, some landslide grids near a waterbody also subsumed some pixels of the waterbody. Therefore, the waterbody received a contribution value despite its non-contributory nature to landslides. However, the contribution of waterbody relative to other landuse categories and other landslide factors is low, and it conforms to other landslide studies. Also, the low overall weight of landuse (table 3) negates the impact of the contribution of a waterbody to landslide susceptibility. Among all the soil types, type W004, i.e., moderately shallow, excessively drained soils with a gravelly-loamy surface, showing moderate erosion and rockiness and found on steep side slopes, shows the maximum contribution to Kalimpong’s landslides, which conforms with the study of Mondal and Mandal (2019). The lowest contribution is from soil type W002, which shows strong rockiness (table 1). A location within 70–150 m of a drainage source had the highest influence among the distance classes as per the Rprop ANN model. The distance to lineaments also plays some part in landslide occurrence, as observed from its category weights. Locations nearer to a lineament (0–100 m, weight 0.10) are more likely to experience landslides than locations farther away (6500–8000 m, weight 0.071). A location’s proximity to a road also influences its landslide susceptibility. The categorical data indicate that a location within 20–50 m of a road may be prone to landslides. From the subclass weights of the triggering factor, i.e., rainfall, it can be concluded that rainfall values of over 275 cm have greater chances of triggering a landslide.

Table 3 Contribution of landslide factors to landslides through Rprop ANN weights.

5.2 Landslide factor importance plot

A landslide factor importance graph was drawn based on the overall weights of each factor (figure 11). The overall weights of the landslide factors are elevation (0.21), slope (0.62), aspect (0.07), curvature (0.069), NDVI (0.51), TWI (0.28), distance to drainage (0.053), distance to roads (0.0698), distance to lineaments (0.0653), landuse (0.0615), rainfall (0.17), and soil type (0.0962) (table 3). The slope of the terrain has the highest contribution with an overall weight of 0.62, and the lowest contributor is distance to drainage with an overall weight of 0.053. Thus, it can be inferred that slope, as a geomorphological factor, is the most significant factor leading to landslides in Kalimpong. Distance to drainage received a low overall weight as it may not always contribute to landslides. For instance, a fragile cliff near a stream, or a steep slope with unconsolidated sediments, may be landslide-prone even without the influence of drainage, which is the case in Kalimpong. The presence of the coarser soil type W004 and steep slopes (section 5.1) contributes substantially to landslide occurrence with slight lubrication from rainwater and is independent of nearby drainage. The other geomorphological landslide factors, viz., elevation, curvature, and aspect, show contributions of 0.21, 0.069, and 0.07, respectively. It can be deduced from this that the elevation of the hilly terrain of Kalimpong is another important geomorphological contributing factor to landslides, second to the slope. Curvature and aspect have relatively lower impacts on landslide phenomena. The hydrological landslide factors of distance to drainage and TWI have Wj values of 0.053 and 0.28, respectively. This shows that TWI, i.e., the saturation condition of the slope, also plays a crucial role as it lubricates the slope materials. 
The geological landslide factors, viz., distance to lineaments and soil type, have contributions of 0.0653 and 0.0962, while the anthropogenic landslide factors, viz., distance to roads and landuse, have Wj values of 0.0698 and 0.0615, respectively. The absence of vegetation plays a very significant role in landslide occurrence, as evident from the Wj value of NDVI at 0.51 (as discussed in section 5.1). This factor is more significant than elevation and TWI, and is the second most important landslide factor overall in Kalimpong. The values of the slope, TWI, and NDVI suggest that barren or sparsely vegetated land on steep and wet slopes would be highly susceptible to landslides. The trigger factor, i.e., rainfall, was allotted a value of 0.17 by the Rprop ANN model, which indicates its moderate significance. Therefore, the landslide factors contributing to landslides in Kalimpong can be arranged in descending order of significance as slope (0.62), NDVI (0.51), TWI (0.28), elevation (0.21), rainfall (0.17), soil type (0.0962), aspect (0.07), distance to roads (0.0698), curvature (0.069), distance to lineaments (0.0653), landuse (0.0615), and distance to drainage (0.053).
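The ordering stated above can be reproduced directly from the overall weights. The following is a minimal Python sketch, not the authors' R workflow; the dictionary keys and the function name `rank_factors` are illustrative choices, while the numeric values are the overall Wj weights quoted in this section.

```python
# Overall Rprop ANN weights (Wj) as quoted in section 5.2 (table 3).
# The key spellings are illustrative; only the numbers come from the text.
overall_weights = {
    "elevation": 0.21,
    "slope": 0.62,
    "aspect": 0.07,
    "curvature": 0.069,
    "NDVI": 0.51,
    "TWI": 0.28,
    "distance to drainage": 0.053,
    "distance to roads": 0.0698,
    "distance to lineaments": 0.0653,
    "landuse": 0.0615,
    "rainfall": 0.17,
    "soil type": 0.0962,
}


def rank_factors(weights):
    """Return (factor, weight) pairs sorted from highest to lowest weight."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


ranking = rank_factors(overall_weights)
for position, (factor, weight) in enumerate(ranking, start=1):
    print(f"{position:2d}. {factor:<24s} {weight:.4f}")
```

Sorting on the weight alone is sufficient here because no two factors share the same overall weight.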

Figure 11 Landslide factor importance graph.

5.3 Rprop ANN model validation and accuracy test

A landslide susceptibility model has no significance unless its results are validated (Chung and Fabbri 2003; Beguería 2006). The model validation process can be divided into two phases. First, the model is run to predict the training dataset, i.e., the data that was used to train the model. This phase assesses the fit of the model and its suitability for prediction. The accuracy with which it predicts the training dataset is called the success rate. Second, the model is run on the testing dataset, which was not used for training the model. The accuracy with which it predicts the testing dataset is called the prediction rate. This indicates the actual prediction capability of the model. A prediction rate of 0.5 or 50% indicates that the model prediction is no better than a random guess. The higher the prediction rate, the higher the suitability of the model for making actual predictions (Chung and Fabbri 2003). Validation of the model in the two phases with high success and prediction rates allows the application of the model in any zone with similar geo-environmental features (Conforti et al. 2014). The receiver operating characteristic (ROC) curve is commonly used to evaluate and quantify the prediction capability of a model (Beguería 2006; Carrara et al. 2008; Dong et al. 2009; Baeza et al. 2010). ROC curves are plotted with the false positive rate (1 – specificity) on the x-axis and the true positive rate (sensitivity) on the y-axis for varying probability thresholds. The true positive rate indicates the proportion of actual landslide occurrences that were correctly predicted as landslides, while the false positive rate indicates the proportion of non-landslide occurrences that were incorrectly predicted as landslides (Swets 1988). The area under the curve (AUC) of the ROC curve is a quantitative indicator of the prediction accuracy of the model (Akgun et al. 2012). The ROC curve is drawn for each phase of validation and the AUC is determined. 
The AUC value for the first phase represents the success rate of the model, and the AUC value for the second phase represents the prediction rate of the model.
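To make the ROC construction described above concrete, the following Python sketch sweeps the threshold over a set of predicted scores and integrates the resulting curve with the trapezoidal rule. It is an illustration under stated assumptions, not the R code used in the study: the function names `roc_points` and `auc` are hypothetical, and the labels and scores are toy values invented for the example.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 for landslide, 0 for no landslide.
    scores: predicted susceptibility, higher means more landslide-prone.
    The threshold is lowered step by step over the distinct score values.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every observation tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data, invented purely for illustration (not from the study).
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
toy_auc = auc(roc_points(toy_labels, toy_scores))
print(f"AUC on toy data: {toy_auc:.2f}")
```

Applying the same two functions to the training predictions would give the success rate, and applying them to the testing predictions would give the prediction rate.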

The Rprop ANN model in this study was validated using the above methodology. The training and testing datasets derived from the ground/Google-Earth validated landslide inventory were used. Firstly, the Rprop ANN model prediction was checked on the training data itself to see if the model could predict its training data accurately. The model correctly classified all 278 observations of 0s and all 146 observations of 1s in its training data (424 observations), where 0s represent no landslide occurrence and 1s represent landslides. When the Rprop ANN model was used to classify the testing data (110 observations), 55 of the 74 0s were classified as 0s and 19 were classified as 1s. Of the 36 1s, 28 were classified as 1s and 8 were classified as 0s. The ROC curves were plotted in R Studio with the false positive rate on the x-axis and the true positive rate on the y-axis for varying landslide susceptibility thresholds. The ROC curves for the success rate and the prediction rate of the Rprop ANN model are illustrated in figure 12(a and b), respectively. From the ROC curves, the AUC for the success rate was calculated to be 1 or 100%, and for the prediction rate, it was 0.8435 or 84.35%. This indicates that the Rprop ANN model in this study is a good-fit model and has a high predictive capability to forecast future landslides.
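The testing-data counts reported above form a 2 × 2 confusion matrix, from which the usual single-threshold metrics follow by simple arithmetic. The Python sketch below works them out; the function name `confusion_metrics` is illustrative, and the derived figures are not reported in the study itself. They describe one classification threshold only, whereas the AUC of 0.8435 summarizes performance over all thresholds, so the two are not expected to coincide.

```python
def confusion_metrics(tn, fp, tp, fn):
    """Derive standard rates from confusion-matrix counts."""
    total = tn + fp + tp + fn
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # 1 - specificity
    }


# Counts for the testing dataset as stated in the text:
# 55 zeros classified as 0, 19 zeros classified as 1,
# 28 ones classified as 1, 8 ones classified as 0.
metrics = confusion_metrics(tn=55, fp=19, tp=28, fn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}" if name != "total" else f"{name}: {value}")
```

With these counts, the overall accuracy is 83/110 (about 75.5%), the sensitivity is 28/36 (about 77.8%), and the specificity is 55/74 (about 74.3%).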

Figure 12 AUC of ROC curves for (a) success rate and (b) prediction rate.

5.4 Landslide susceptibility map by Rprop ANN

The Rprop ANN model was used to compute the prediction of landslide occurrence using the pixel values of the rasters of all landslide factor layers taken for this study. The final landslide susceptibility map of Kalimpong was produced in R Studio using equation (8) (the extended form of equation (7) using overall weight ‘Wj’ values from table 3):

$${\mathrm{LSI}}_{\mathrm{Rprop}}={\mathrm{Rprop}}_{\mathrm{Slope}}\times 0.62+{\mathrm{Rprop}}_{\mathrm{Elevation}}\times 0.21+{\mathrm{Rprop}}_{\mathrm{TWI}}\times 0.28+{\mathrm{Rprop}}_{\mathrm{Rainfall}}\times 0.17+{\mathrm{Rprop}}_{\mathrm{NDVI}}\times 0.51+{\mathrm{Rprop}}_{\mathrm{Curvature}}\times 0.069 +{\mathrm{Rprop}}_{\text{Soil \; type}}\times 0.0962 +{\mathrm{Rprop}}_{\text{distance\; to\; drainage}}\times 0.053 +{\mathrm{Rprop}}_{\text{distance\; to\; lineament}}\times 0.0653+{\mathrm{Rprop}}_{\text{distance\; to\; road}}\times 0.0698+{\mathrm{Rprop}}_{\mathrm{Aspect}}\times 0.07+{\mathrm{Rprop}}_{\mathrm{landuse}}\times 0.0615.$$
(8)
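Equation (8) is a weighted linear sum evaluated pixel by pixel. The Python sketch below evaluates it for a single pixel, assuming each factor raster has already been reclassified to its Rprop class weight. The overall weights are those in equation (8); the function name, the dictionary keys, and the example pixel values are hypothetical, and the sketch does not reproduce any rescaling the authors may have applied to obtain the 0.01–1 range of the final map.

```python
# Overall weights Wj taken from equation (8) / table 3.
OVERALL_WEIGHTS = {
    "slope": 0.62,
    "elevation": 0.21,
    "TWI": 0.28,
    "rainfall": 0.17,
    "NDVI": 0.51,
    "curvature": 0.069,
    "soil type": 0.0962,
    "distance to drainage": 0.053,
    "distance to lineament": 0.0653,
    "distance to road": 0.0698,
    "aspect": 0.07,
    "landuse": 0.0615,
}


def landslide_susceptibility_index(class_weights):
    """Evaluate equation (8) for one pixel.

    class_weights maps each factor name to the Rprop class weight of the
    class that the pixel falls into for that factor.
    """
    missing = set(OVERALL_WEIGHTS) - set(class_weights)
    if missing:
        raise KeyError(f"missing factors: {sorted(missing)}")
    return sum(class_weights[f] * w for f, w in OVERALL_WEIGHTS.items())


# Hypothetical pixel: every class weight here is made up for illustration,
# except landuse = 0.12, which is the barren-land class weight from section 5.1.
example_pixel = {factor: 0.05 for factor in OVERALL_WEIGHTS}
example_pixel["landuse"] = 0.12
example_lsi = landslide_susceptibility_index(example_pixel)
print(f"LSI for the hypothetical pixel: {example_lsi:.4f}")
```

Because the function only needs one class weight per factor, the same expression can be applied to whole raster layers cell by cell.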

The landslide susceptibility map of the Kalimpong district was annotated in ArcGIS 10.6.1 (figure 13). The susceptibility values in the map range from 0.01 (extremely low) to 1 (very high). Using the JNB classifier, the map classifies the entire study area into five landslide susceptibility zones (LSZ): very high (0.81–1), high (0.60–0.81), moderate (0.40–0.60), low (0.15–0.40), and extremely low (0.01–0.15). The percentage of the study area falling in each landslide susceptibility zone is shown in figure 14.
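Once the class breaks are known, assigning a susceptibility value to a zone is a simple lookup. The Python sketch below uses the four break values listed above with the standard-library `bisect` module; it does not compute the natural breaks themselves. The function name is illustrative, and the handling of values that fall exactly on a break (assigned here to the higher zone) is an assumption, since the text does not state it.

```python
from bisect import bisect_right

# Upper/lower class breaks as listed in the text.
BREAKS = [0.15, 0.40, 0.60, 0.81]
ZONES = ["extremely low", "low", "moderate", "high", "very high"]


def susceptibility_zone(value):
    """Map a susceptibility value in [0, 1] to one of the five zones.

    Assumption: a value exactly equal to a break goes to the higher zone.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("susceptibility value must lie between 0 and 1")
    return ZONES[bisect_right(BREAKS, value)]


for sample in (0.05, 0.30, 0.50, 0.70, 0.95):
    print(f"{sample:.2f} -> {susceptibility_zone(sample)}")
```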

Figure 13 Kalimpong’s landslide susceptibility map using Rprop ANN.

Figure 14 Percentages of total area under different susceptibility categories.

5.5 Comparison with earlier studies

A study by Chawla et al. (2018) on LSM in the Darjeeling Himalaya using particle swarm optimization (PSO)-SVM assigned weights to its parameters based on literature and past knowledge of landslides. The selected landslide factors and their weights were drainage buffer (10), lineament buffer (9), slope (8), rainfall (7), earthquake (7), lithology (6), landuse (5), fault buffer (5), valley buffer (4), soil (3), relief (3), and aspect (1). The Rprop ANN weights allotted to the slope (high contribution) and aspect (low contribution) in this study (section 5.2) conform to the study of Chawla et al. (2018). Another study by Mondal and Mandal (2019) on LSM in the Darjeeling Himalaya using the Index of Entropy (IOE) model yielded a prediction accuracy of 78.2%. In comparison, the Rprop ANN model in this study gave a prediction rate of 84.35% for the same Darjeeling Himalayan Region (including Kalimpong). This signifies that the Rprop ANN model can give better accuracy with the landslide factors taken in this study. The landslide factors of Mondal and Mandal (2019) and their IOE weights were elevation (0.070), aspect (0.086), slope (0.051), curvature (0.051), geology (0.109), soil (0.185), lineament density (0.025), distance from lineament (0.081), drainage density (0.041), distance to drainage (0.105), SPI (0.022), TWI/CTI (0.107), rainfall (0.053), NDVI (0.022), and LULC (0.070). Their study determined soil type to be the most important contributing factor to landslides, whereas slope was the highest contributor in this study, which also achieved a higher prediction accuracy. They also classified the LSZ map into five zones, with percentages of total pixels of very low (5.17%), low (24.08%), moderate (35.55%), high (25.48%), and very high (9.72%). The present study has LSZ with percentages of total pixels of extremely low (55.8%), low (24.9%), moderate (7.6%), high (5.6%), and very high (6.1%). 
The two studies place a similar share of the area in the very high and low susceptibility zones, whereas the results differ in the other categories (figure 14). A recent study on LSM of Sweden using Bprop ANN gave a prediction rate accuracy of 80.1% (Abbas et al. 2019). This suggests the superiority of the Rprop ANN model over the Bprop ANN model for LSM in terms of prediction rate accuracy (as described in section 4).

5.6 Important locations in landslide susceptible zones

The derived landslide susceptibility map was categorized into extremely low, low, moderate, high, and very high susceptibility zones covering 610, 272, 83, 61, and 66.7 km2 of Kalimpong’s area, respectively (figure 14). The important locations in the study area falling in different susceptibility zones are listed in table 4. The locations were identified by overlaying the susceptibility map on a Google Map base layer in ArcGIS 10.6.1. Some locations are illustrated in figure 2. Figure 2(a) shows a section of the Lava Road affected by a landslide. The section lies at 27°02′02.5″N, 88°41′38.7″E and falls in the moderate susceptibility category as per the landslide susceptibility map (figure 13). Figure 2(b) illustrates a landslide-damaged road near Yang Makum Khasmahal. This region falls in the high susceptibility category and has coordinates 26°56′34.1″N, 88°29′11.4″E. Figure 2(c) was also taken in a high landslide susceptibility region known for landslide-damaged roads, viz., the NH10 near the Teesta River at 26°56′34.1″N, 88°29′11.4″E. The NH10 runs almost parallel to the Teesta on the western side of Kalimpong. The highway is extremely prone to landslides and is usually damaged in the rainy season, cutting off Kalimpong’s connectivity to the plains. One of the very high susceptibility places is the Nimbong Khasmahal Region of Kalimpong. Figure 2(d) shows a place near Nimbong Khasmahal at 26°57′42.2″N, 88°34′49.3″E. Figure 2(e) shows a damaged section of Rishi Road (27°05′24.8″N, 88°39′01.0″E), which lies in a moderate susceptibility zone as per the final landslide susceptibility map. The main characteristic features of the zones of moderate to high susceptibility are steep slopes, degraded vegetative cover, retention of water in the slopes after heavy rainfall, and moderate elevation, as discussed in section 5.1. The main Kalimpong town has low to moderate susceptibility regions because of well-established settlements, roads, and drainage networks. 
The low susceptibility regions show good vegetative cover, which keeps the slope materials from sliding, as discussed in section 3.2.5 and section 5.1. Forests in Kalimpong, including the major portion of Neora Valley National Park, fall in the low to extremely low susceptibility category (table 4).
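The zone areas given at the start of this section can be checked against the percentages quoted in section 5.5 with a short calculation. The Python sketch below converts each area to its share of the total; the variable and function names are illustrative, and the total of 1092.7 km2 is simply the sum of the five stated areas rather than a figure quoted in the text.

```python
# Zone areas in km2 as stated in section 5.6.
zone_areas_km2 = {
    "extremely low": 610.0,
    "low": 272.0,
    "moderate": 83.0,
    "high": 61.0,
    "very high": 66.7,
}


def area_shares(areas):
    """Return each zone's share of the total area, in percent."""
    total = sum(areas.values())
    return {zone: 100.0 * area / total for zone, area in areas.items()}


total_area = sum(zone_areas_km2.values())
shares = area_shares(zone_areas_km2)
print(f"Total area: {total_area:.1f} km2")
for zone, share in shares.items():
    print(f"{zone:<14s} {share:5.1f}%")
```

Rounded to one decimal place, the shares agree with the 55.8%, 24.9%, 7.6%, 5.6%, and 6.1% reported for the present study in section 5.5.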

Table 4 Important locations in the study area and their landslide susceptibilities.

6 Conclusion

In this study, the possible application of Rprop ANN in landslide susceptibility mapping was examined. Twelve landslide factors were processed and converted to CSV files for analysis. A suitable architecture was chosen to run the Rprop ANN model, and it was verified using datasets that were not used in training. After checking the accuracy, the landslide susceptibility map was prepared using the landslide factor raster layers. The study also presents detailed discussions on the contribution of landslide factors, a few comparisons with earlier studies, and illustrations of some important landslide-affected locations in the study area. However, a few limitations were recognized in this study, viz., the inability to ground-validate inaccessible landslide locations, the size of the landslide grids taken in this study (as described in section 5.1), and the consideration of 12 landslide factors only. Despite the limitations, the landslide susceptibility map of Kalimpong developed using Rprop ANN in this study showed reasonable accuracy. The prediction rate of the model at 84.35% highlights the suitability of the combination of the 12 landslide factors taken for this study. The final susceptibility map has distinguishable areas of five susceptibility classes (figure 13), and the results are encouraging. Even so, the prediction accuracy may be improved by taking into account factors that have not been considered, e.g., lithology, soil overburden thickness, stream power index, etc. Further, the size of the grids may be set separately for each landslide to avoid overlapping pixels of nearby drainages and to improve the prediction rate.

On the other hand, the Rprop ANN has many inherent advantages such as high speed and accuracy (section 4). The speed and efficiency of this model allowed the integration of 12 major landslide factors for the study without any burden on the system’s computational capability. Thus, Rprop ANN is feasible for working with a greater number of landslide factors. Also, raster files of very high resolution, i.e., small cell sizes, can be analyzed in a short time. This enables the user to retain minute details of a landslide raster layer and reduces the need to resample files to lower resolutions due to system resource constraints. Therefore, this model can be run on low-end systems too without compromising the quality of the output. Other advantages include the absence of subjectivity in assigning weights to landslide factors, which is present in methods like the analytical hierarchy process (AHP). Just like any other ANN model, the Rprop ANN model is independent of the type of distribution of the data. Both continuous and categorical data can be integrated without violating any model assumptions. Also, ANNs can readily identify patterns and derive a solution set for any kind of input data.

The final susceptibility map produced using Rprop ANN highlights all the regions of Kalimpong that need a policy maker’s attention. The moderate to very high landslide susceptibility zones of Kalimpong, where landslides occur frequently, are regions of immediate concern. These regions require slope stabilization measures such as the use of chemical binders, construction of retaining walls, pile driving, provision of efficient drainage, grouting of fissures in underlying rocks, etc. Existing roads in and around Kalimpong already have stretches reinforced with retaining walls, but the stabilization work is mostly reactive, carried out after slope failures have occurred. To mitigate the devastating impacts of landslides on life and property, it is imperative to take preventive measures. Hence, this landslide susceptibility map can be taken as a reference to implement necessary ground improvement techniques and landslide management efforts. The Rprop ANN model can also be used to quickly develop landslide susceptibility maps for site suitability analysis of an infrastructure project. Therefore, this study demonstrates the capability of the Rprop ANN model to obtain a reliable landslide susceptibility map quickly and efficiently.