Introduction

Landslides have a severe impact on human life and economic activity in many countries of the world (Shahabi and Hashim 2015). About 25 percent of India's total territory is hilly or mountainous, and the tectonically active Great Himalayan range is its main mountainous region (Dubey et al. 2005). Because of the immature and fragile geological structure, landslides are very common in the Himalayan region. Landslides alone cost India about US $500 million per year (Dubey et al. 2005). Proper planning is therefore necessary to reduce the intensity of the hazard and to prevent it, and a landslide susceptibility map is a significant tool for this purpose. The availability of highly accurate remote sensing data has made landslide susceptibility mapping easier (Shahabi and Hashim 2015). A landslide is the outcome of the combined action of various physical and anthropogenic factors that affect the event directly or indirectly. Hence all methods of landslide susceptibility analysis follow more or less the same steps: (1) identification of the causative factors responsible for slope instability, (2) calculation of ratings or weights by a suitable method or technique, (3) assessment of the role of the instability factors in landslide occurrence on the basis of their primary ratings or weights, (4) identification of landslide susceptibility zones by classifying the landslide susceptibility index (LSI) (Anbalagan 1992; Guzzetti et al. 1999; Dai et al. 2002). Landslide susceptibility mapping identifies and divides the land surface according to the degree of actual and potential landsliding. For this reason it can be a key tool for planners selecting suitable sites for human settlement and infrastructural development (Parise 2002). In recent times, Geographical Information System (GIS) based statistical modelling has become popular for landslide studies.
Throughout the world, a great deal of work has been done in a GIS environment, and many researchers have applied probabilistic models (Lee and Choi 2003; Pradhan et al. 2006, 2011; Kayastha 2015). Statistical models such as logistic regression (Bathrellos et al. 2009; Akgun 2012; Mondal and Mandal 2017) and the statistical index (SI) model (Bui et al. 2011; Regmi et al. 2014; Mandal and Mandal 2017) have also been employed for landslide susceptibility analysis. In addition, expert knowledge based models such as the analytical hierarchy process (Tazik et al. 2014; Kumar and Anbalagan 2016; Rahaman and Aruchamy 2017) and the weighted overlay model (Shit et al. 2016) have been used for landslide susceptibility assessment.

Landslides have been observed and studied since the late 1800s (Endlich 1876; Atwood and Mather 1932; Crandell and Varnes 1961; Varnes and Savage 1996). Later, landslides were studied with both qualitative (Ko Ko et al. 2004) and quantitative approaches (Dhakal et al. 2000; Mandal and Maiti 2015). Qualitative approaches are more effective for small scale landslide hazard mapping (Yoshimatsu and Abe 2006; Castellanos et al. 2008; Jian and Xiang-guo 2009), whereas quantitative methods are more suitable for large scale mapping (Vakhshoori and Zare 2015). In the last two decades, remote sensing and GIS based techniques have been widely used to assess landslide hazard (Nagarajan et al. 1998). The choice of technique depends on the scale, practical knowledge of the study area, scientific knowledge, and the cost and time available for the study (Vakhshoori and Zare 2015). Researchers have grouped GIS based techniques and methods into direct geomorphological mapping (Cardinali et al. 2002), landslide inventory mapping (Guzzetti et al. 2012), heuristic methods (Leoni et al. 2015), statistical methods (Tornyai and Lúchava 2015) and deterministic approaches (Van Westen and Terlien 1996). Throughout the world, a great deal of work has been carried out in a GIS environment, and many researchers have successfully applied probabilistic models (Zhou et al. 2002; Lee and Choi 2003; Youssef et al. 2009, 2012; Kayastha 2015). Statistical models such as logistic regression (Tunusluoglu et al. 2007; Mondal and Mandal 2017) and the statistical index (SI) model (Bui et al. 2011; Regmi et al. 2014; Mandal and Mandal 2017) have also been employed for landslide susceptibility analysis. In addition, expert knowledge based models such as the analytical hierarchy process (Tazik et al. 2014; Kumar and Anbalagan 2016), the weighted overlay model (Shit et al. 2016; Basharat et al. 2016; Kanwal et al. 2016) and multi-criteria evaluation techniques (Ahmed 2015) have been used for landslide susceptibility assessment.

Slope instability, and with it landsliding, is a common problem throughout Sikkim. Landslide occurrence is increasing in the Rorachu river basin because of growing anthropogenic activities such as rapid urban expansion, illicit hill cutting and incessant deforestation. In Sikkim, researchers have studied the use of a 3-D digital elevation model (DEM) for landslide assessment (Dubey et al. 2005), the mechanism of landslide initiation (Anbarasu et al. 2010), GPS monitoring of landslide movement (Rawat et al. 2011), the use of high resolution satellite data for damage and geological assessment (Martha et al. 2015), a rule-based semi-automated method for landslide detection (Siyahghalati et al. 2014), and remote sensing and GIS based statistical approaches for landslide susceptibility analysis (Sarkar et al. 2008; Rawat et al. 2016; Vishwakarma et al. 2017). These models have improved the information, databases and mapping techniques available for landslide analysis (Chen et al. 2016). The present study identifies landslide susceptible areas in the Rorachu river basin of the eastern Sikkim Himalaya using a GIS based logistic regression model.

Study area

The study was carried out in the Rorachu river basin of the eastern Sikkim Himalaya, in the northern part of East Sikkim district. The study area extends from 27°17′19″ to 27°23′52″N and from 88°35′37″ to 88°43′17″E and covers 71.73 sq. km. The maximum and minimum altitudes of the Rorachu river basin are 4213 and 834 m respectively (Fig. 1). Altitude increases rapidly from the southwestern extent (Ranipool, 27°17′29.04″N, 88°35′29.76″E) towards the northeastern extent (Pandramaile, 27°21′41″N, 88°42′45″E). According to Köppen’s climatic classification, the study area has a subtropical highland climate (Cwb). Because of its high relief and sheltered environment, the Rorachu basin has a mild temperature throughout the year, with an average maximum temperature of 22 °C during summer and 4 °C during winter. Dissected hilly terrain is the main geomorphological unit of the study area. The northern and northeastern parts of the basin, where the maximum altitude is found, are highly dissected hilly terrain, whereas relatively low lying areas such as Gangtok and Ranipool are moderately dissected hilly terrain. Steep slopes, together with numerous first and second order streams, may be the prime causes of the high topographic dissection of the basin and of slope instability in the Rorachu river basin of East Sikkim.

Fig. 1 Location map of the study area

Database of the study

A landslide is a complex geo-environmental process whose occurrence depends on tectonic, climatic, geological, topographical and anthropogenic factors, so selecting proper factors is a prime task in landslide related studies. The selection and mapping of an appropriate set of factors responsible for landslide occurrence depends on detailed knowledge of the main causes of landslide initiation (Guzzetti et al. 1999). To prepare the landslide susceptibility map, 13 landslide causative factors were considered: rainfall, slope, aspect, curvature, relief, drainage density, distance from drainage, geology, soil, distance from lineament, distance from road, NDVI and land use/land cover. Previous literature (Mondal and Mandal 2017) shows that these factors have been widely used in landslide susceptibility analysis. For the logistic regression analysis, two separate spatial databases were constructed: one of continuous data (rainfall, slope, aspect, curvature, relief, drainage density, distance from drainage, distance from lineament, distance from road and NDVI) and one of categorical or discrete data (geology, soil and land use/land cover). In the present study, data and maps from sources such as the Survey of India, the Geological Survey of India, the National Atlas and Thematic Mapping Organization (NATMO), the United States Geological Survey (USGS) and the world climate website (http://www.worldclim.org) were collected to prepare thematic data layers of the causative factors (Table 1).

Table 1 Sources of the different data layers of the study area

Methodology of the study

The methodological framework of the study (Fig. 2) is divided into several distinct segments: construction of a spatial database of landslide conditioning factors and landslide locations, formulation of the model, preparation of the landslide susceptibility map and validation of the model. The framework rests on the familiar principle that “present and past are keys to the future”. Under this principle, existing landslides are used to evaluate areas of future landsliding (Bai et al. 2010), which is why the spatial database was constructed. For the spatial database, all landslide causative factors were converted into raster data with a 30 m × 30 m cell size.

Fig. 2 Methodological framework of the study

Preparation of landslide inventory:

In GIS based statistical analysis, a landslide distribution map or landslide inventory is very important, because it provides not only the spatial information of landslides but also helps to extract parametric information about landslide affected and adjacent areas. For these reasons, landslide inventory maps are widely used for GIS based landslide susceptibility analysis (Bui et al. 2011; Shit et al. 2016; Mondal and Mandal 2017) and for seed cell sampling based landslide susceptibility analysis (Bai et al. 2010; Wang et al. 2013; Dagdelenler et al. 2015). To identify the shape, size and location of landslides and their types of material and movement, comprehensive field studies were conducted using GPS (Bai et al. 2010). In the Rorachu river basin, a total of 80 landslides were recognized, covering a total area of 0.85 sq. km (Fig. 3). A detailed landslide database was prepared during the field study on the basis of the landslide classifications of Varnes (1978) and Cruden and Varnes (1996). The database comprises shallow landslides, deep seated landslides, rock slides and earth slides. To prepare the landslide inventory map, all landslide locations were vectorized from Landsat 8 OLI, Sentinel-2 and Google Earth images using ArcMap 10.3. A vector to raster transformation was then carried out to prepare the raster layer of landslides.

Fig. 3 Landslide inventory map of the Rorachu river basin

Application of logistic regression model:

Logistic regression is a statistical model that permits multivariable regression analysis between a dependent variable and a group of independent variables (Bai et al. 2010). It is an important multivariate method for predicting the presence or absence of an outcome from a group of predictor variables (Lee 2007). Its main advantage is that, by adding a suitable link function to ordinary linear regression, the predictors may be continuous, discrete or both, and the data need not be normally distributed. In the present study, the dependent variable, landslide, is represented by binary digits: 0 for absence and 1 for presence of the event. Where the dependent variable is binary, the logistic link is suitable (Atkinson and Massari 1998). After the dependent variable is converted into a logit variable, the logistic regression algorithm applies maximum likelihood estimation. Logistic regression thus measures the probability of an event (Atkinson and Massari 1998; Dai and Lee 2002). For a binary dependent variable, the logistic regression model can be written as Eq. 1.

$$P=\frac{1}{1+e^{-z}}$$
(1)

where P is the probability of the event, which ranges between 0 and 1 on a sigmoid curve, and z is the linear combination of the model (linear logistic model), which ranges between −∞ and +∞ (Eq. 2).

$$z=b_0+b_1x_1+b_2x_2+\ldots+b_nx_n$$
(2)

where b0 is the intercept of the model, bi (i = 1, 2, 3, …, n) are the slope coefficients and xi (i = 1, 2, 3, …, n) are the independent variables, n being their number. The linear logistic regression represents the existing conditions (presence or absence of landslide) on the basis of pre-failure conditions (independent variables).
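Eqs. 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function names, the intercept, the two coefficients and the variable values below are hypothetical and are not taken from the study.

```python
import math

def linear_predictor(intercept, coefficients, values):
    """Eq. 2: z = b0 + b1*x1 + ... + bn*xn."""
    if len(coefficients) != len(values):
        raise ValueError("one coefficient per independent variable is required")
    return intercept + sum(b * x for b, x in zip(coefficients, values))

def probability(z):
    """Eq. 1: P = 1 / (1 + e^(-z)), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical two-variable model, for illustration only.
z = linear_predictor(-1.0, [0.5, -0.2], [4.0, 5.0])
print(z, probability(z))
```

Whatever the value of z, the returned probability stays within the 0 to 1 range described above.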

In the present study, the spatial relationship between landslides and their causative factors in the Rorachu river basin of eastern Sikkim was assessed using the logistic regression model. To construct the model and calculate probabilities, all data layers, including the landslide inventory, were converted into point format, and the value at each point was used in the IBM Statistical Package for the Social Sciences (SPSS) software. All factors were used to calculate the coefficient values of the model. A landslide is a very complex process controlled by several geo-environmental factors. On the basis of the coefficient values, the model indicates the role of each causative factor, that is, which factors are most important for landslide occurrence in the Rorachu river basin.
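How coefficients can be estimated from point data by maximum likelihood is sketched below with plain batch gradient ascent. It does not reproduce the SPSS procedure used in the study. The single rescaled factor, the nine points, the learning rate and the iteration count are all hypothetical choices made for illustration.

```python
import math

def sigmoid(z):
    """Logistic function P = 1 / (1 + e^(-z)), stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(coefficients, x):
    """Probability for one point; coefficients = [b0, b1, ..., bn]."""
    z = coefficients[0] + sum(b * v for b, v in zip(coefficients[1:], x))
    return sigmoid(z)

def fit_logistic(rows, labels, learning_rate=0.5, iterations=5000):
    """Maximise the log-likelihood by batch gradient ascent."""
    coefficients = [0.0] * (len(rows[0]) + 1)
    for _ in range(iterations):
        gradient = [0.0] * len(coefficients)
        for x, y in zip(rows, labels):
            error = y - predict(coefficients, x)  # observed minus predicted
            gradient[0] += error
            for j, v in enumerate(x):
                gradient[j + 1] += error * v
        coefficients = [b + learning_rate * g / len(rows)
                        for b, g in zip(coefficients, gradient)]
    return coefficients

# Hypothetical points: one factor rescaled to 0..1, landslide present (1) or absent (0).
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9]]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(rows, labels)
print(b0, b1)
```

A positive fitted coefficient means the predicted probability of landslide presence rises with the factor, which is how the sign of each coefficient is read in the discussion of results.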

Conversion of categorical variables

In logistic regression analysis, categorical variables are nominal and are generally converted into numeric variables using binary dummy variables. Excessive use of dummy variables, however, lengthens the equation and may affect the accuracy of the model. For this reason, the category wise frequency ratio (Eq. 3) was calculated (Table 2) to convert each categorical variable into a numeric variable (Bai et al. 2010; Lee and Pradhan 2007; Wang et al. 2013). The frequency ratio is defined in Eq. 3.

Table 2 Category wise frequency ratio (FR) of different categorical variables
$$FR=\frac{LA_i/A_i}{\sum_{i=1}^{N} LA_i \Big/ \sum_{i=1}^{N} A_i}$$
(3)

where LAi is the number of landslide pixels in the ith class of a parameter, Ai is the total number of pixels in the ith class of that parameter and N is the total number of classes in the parameter. The categories of geology, soil and land use/land cover were transformed into numerical variables in this way.
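A minimal sketch of Eq. 3 follows. The pixel counts for the three classes are hypothetical and are not the values of Table 2.

```python
def frequency_ratio(landslide_pixels, class_pixels):
    """Eq. 3 for every class: (LA_i / A_i) / (sum of LA_i / sum of A_i)."""
    total_landslide = sum(landslide_pixels)
    total_pixels = sum(class_pixels)
    if total_landslide == 0 or total_pixels == 0:
        raise ValueError("totals must be greater than zero")
    overall = total_landslide / total_pixels
    return [(la / a) / overall for la, a in zip(landslide_pixels, class_pixels)]

# Hypothetical counts for a factor with three classes.
landslide_pixels = [10, 30, 60]    # LA_i
class_pixels = [1000, 1000, 2000]  # A_i
fr = frequency_ratio(landslide_pixels, class_pixels)
print(fr)
```

A value above 1 means a class holds a larger share of landslide pixels than its share of the area, and a value below 1 means a smaller share.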

Preparation of landslide susceptibility map:

The probability calculated by the model was used as the landslide susceptibility index (LSI). The probability values were assigned to the point features using the data joining tool of ArcMap 10.3. The points were then interpolated with the inverse distance weighting (IDW) tool in ArcMap 10.3 to prepare the landslide susceptibility map. On the basis of the LSI values, five landslide susceptibility zones were identified using the natural breaks classification scheme in ArcMap 10.3.
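The idea behind inverse distance weighting can be sketched as follows. The power of 2 and the use of every sample point are assumptions, since the interpolation settings are not stated, and the sample coordinates and LSI values are hypothetical.

```python
import math

def idw(points, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    points is a list of (px, py, value) tuples. Closer points get larger
    weights, namely 1 / distance**power.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value  # the location coincides with a sample point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical LSI values at three points (coordinates in arbitrary map units).
samples = [(0.0, 0.0, 0.2), (2.0, 0.0, 0.8), (0.0, 2.0, 0.5)]
print(idw(samples, 1.0, 0.5))
```

Because the estimate is a weighted average, an interpolated LSI never falls outside the range of the sampled probabilities.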

Model validation

GIS based statistical modeling is a very useful technique for landslide susceptibility analysis. But the acceptance, accuracy and predictive capacity of the model depend on the proper validation. Without validation there is no scientific significance of such models. In the present study, receiver operating characteristics (ROC) curve and decision rule based landslide density method were used to validate the model.

Model validation by ROC curve:

The ROC curve is a commonly used method for visualizing the performance of a binary classifier, that is, a classifier with two possible outcomes: presence of an event (positive) or absence of an event (negative). A cut-off point or threshold value is used to discriminate between the two outcomes. On the basis of this classification, each result falls into one of four types: true positive or TP (presence of the event correctly classified as positive), false negative or FN (presence classified as negative), true negative or TN (absence correctly classified as negative) and false positive or FP (absence classified as positive). These four counts are needed to calculate sensitivity and specificity (Eqs. 4, 5), where a, b, c and d denote the numbers of TP, FN, FP and TN results respectively. The ROC curve is a two dimensional diagram with 1 − specificity on the X axis and sensitivity on the Y axis. The precision of the test depends on how well it separates event areas from non-event areas. The accuracy of a test is measured by the area under the ROC curve (AUC). AUC values range between 0.5 and 1.0, where 1 indicates a perfect test and 0.5 indicates a useless test.

$$\text{Sensitivity}=\frac{a}{a+b}$$
(4)
$$\text{Specificity}=\frac{d}{c+d}$$
(5)
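Eqs. 4 and 5 and the AUC can be illustrated with a short sketch. It assumes that a, b, c and d are the counts of true positives, false negatives, false positives and true negatives, which is the reading consistent with the definitions of sensitivity and specificity. The scores, labels and the 0.5 threshold are hypothetical. The AUC is computed from pairwise comparisons rather than by drawing the curve.

```python
def confusion_counts(scores, labels, threshold):
    """Return (a, b, c, d) = (TP, FN, FP, TN) for a given cut-off value."""
    a = b = c = d = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            a += 1  # true positive
        elif label == 1:
            b += 1  # false negative
        elif predicted_positive:
            c += 1  # false positive
        else:
            d += 1  # true negative
    return a, b, c, d

def sensitivity(a, b):
    """Eq. 4."""
    return a / (a + b)

def specificity(c, d):
    """Eq. 5."""
    return d / (c + d)

def auc(scores, labels):
    """Share of (event, non-event) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities and observed presence (1) or absence (0).
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
a, b, c, d = confusion_counts(scores, labels, threshold=0.5)
print(sensitivity(a, b), specificity(c, d), auc(scores, labels))
```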

Model validation by landslide density method:

Landslide density is the ratio of the actual landslide area in a susceptibility class to the area of that class (Sarkar et al. 2008). Class wise landslide density was calculated from the areas with and without landslides in each susceptibility class. The basic rule of the method is that, for a highly accurate map, landslide density increases with increasing LSI values and the highest landslide density is found in the very high susceptibility class.
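The decision rule can be checked with a few lines of code. The class areas and landslide areas below are hypothetical and are not the results of the study.

```python
def landslide_density(landslide_area, class_area):
    """Density per class = landslide area in the class / total area of the class."""
    return [ls / total for ls, total in zip(landslide_area, class_area)]

def increases_with_susceptibility(densities):
    """True when density rises strictly from the lowest to the highest class."""
    return all(low < high for low, high in zip(densities, densities[1:]))

# Hypothetical areas in sq. km, ordered very low, low, moderate, high, very high.
class_area = [20.0, 18.0, 15.0, 12.0, 6.0]
landslide_area = [0.01, 0.04, 0.10, 0.25, 0.45]
densities = landslide_density(landslide_area, class_area)
print(densities, increases_with_susceptibility(densities))
```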

Landslide conditioning factors

The present study assesses landslide susceptibility using the logistic regression model. For this purpose, rainfall, slope, aspect, curvature, relief, drainage density, distance from drainage, distance from lineament, distance from road, geology, soil, NDVI and land use/land cover were selected and used as significant factors for landslide occurrence. The data layers of the landslide conditioning factors are of two types: continuous data (rainfall, slope, aspect, curvature, relief, drainage density, distance from drainage, distance from lineament, distance from road and NDVI) and categorical or discrete data (geology, soil and land use/land cover). Categorical data must be converted before they can be used in logistic regression analysis.

Rainfall

Rainfall is considered a triggering factor for landslide occurrence because it increases soil saturation, which makes slope materials unstable. Heavy rainfall also increases surface run-off and hence the discharge and erosive capacity of small stream segments. The rapid erosion caused by tiny rills, gullies and lower order streams directly affects slope stability by reducing the cohesiveness of the soil. Rainfall data for the study area were collected from the world climate website (http://www.worldclim.org) and a rainfall distribution map was prepared using ArcMap 10.3. The rainfall distribution map (Fig. 4) shows a maximum average annual rainfall of 224 cm in the Rorachu river basin.

Fig. 4 Rainfall distribution map of the Rorachu river basin

Morphometric factor

Among the morphometric factors, slope, aspect, curvature and relief were extracted from the Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) of 30 m spatial resolution using ArcMap 10.3. Previous literature (Shit et al. 2016; Mondal and Mandal 2017) shows that these factors have been widely used in landslide susceptibility analysis. To calculate drainage density, rivers and streams were vectorized from topographical map no. 78A/11 and compared with the digital elevation model for necessary corrections. Drainage density was calculated after Horton’s (1945) method (Eq. 6).

$$Dd=\left( {\frac{{Lk}}{{Ak}}} \right)$$
(6)

where Dd denotes drainage density, Lk is the length of the streams of a basin and Ak is the total area of the basin. The basin was divided into 1 km × 1 km grids and the length of all stream segments in each grid was measured to analyze drainage density. From the values obtained, a drainage density map was prepared with the inverse distance weighting (IDW) tool in ArcMap 10.3. The slope map of the Rorachu river basin (Fig. 5a) shows that slope ranges between 0° and 68.9495°. Slope aspect is measured clockwise in degrees from 0 to 360, and the aspect map (Fig. 5b) of the study area was prepared from these degree values. Slope aspect influences temperature and precipitation, which affect soil moisture, soil thickness and the vegetation cover of the slope. South oriented slopes in the northern hemisphere normally receive more precipitation and become unstable through the direct effect of saturation. The curvature of the Rorachu river basin ranges from −28.09 to 6.79 (Fig. 5c), where positive values indicate convexity and negative values indicate concavity of the slope. Relief increases gradually northward, and the maximum relief of 4114 m (Fig. 5d) was recorded in the northern part of the basin. Drainage density in the Rorachu river basin ranges from 0.81 to 7.46 (Fig. 6a).
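The grid based calculation of Eq. 6 can be sketched as below. The stream lengths per grid cell are hypothetical, and km per sq. km is an assumed unit.

```python
def drainage_density(stream_length_km, area_sq_km):
    """Eq. 6: Dd = Lk / Ak, here in km of stream per sq. km."""
    if area_sq_km <= 0:
        raise ValueError("area must be greater than zero")
    return stream_length_km / area_sq_km

# Hypothetical total stream length (km) measured in four 1 km x 1 km grid cells.
grid_lengths_km = [0.9, 2.4, 5.1, 7.2]
grid_area_sq_km = 1.0
grid_density = [drainage_density(length, grid_area_sq_km) for length in grid_lengths_km]
print(grid_density)
```

With 1 km × 1 km cells, the drainage density of a cell is numerically equal to the stream length measured inside it.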

Fig. 5 Slope (a), aspect (b), curvature (c) and relief (d) of the Rorachu river basin

Fig. 6 Drainage density (a), distance from drainage (b), distance from lineament (c) and distance from road (d) of the Rorachu river basin

Distance from drainage, lineament and road

Streams are important agents of slope saturation and hence of slope instability, and distance from drainage has been widely used as a factor by many researchers (Bui et al. 2011; Aghdam et al. 2016; Wu et al. 2017). To assess the influence of streams on landslide occurrence, the drainage network of the Rorachu river basin was digitized from topographical map no. 78A/11, and a distance from drainage map was prepared with distances of 100, 200, 300, 400, 500, 600 and 700 m (Fig. 6b). Lineaments are considered the linear expression of underlying geological structures such as faults. Hence, slopes with lineaments have a greater tendency to become unstable. Using the line tool of Geomatica 10.2, lineaments of the study area were extracted from the panchromatic band of a Landsat 8 OLI image with 15 m spatial resolution and compared with the 1:50,000 scale thematic map of Bhuvan (http://www.bhuvan.nrsc.gov.in). To better understand the influence of lineaments on slope instability, a distance from lineament map (Fig. 6c) was prepared using buffer distances of 100, 200, 400, 800, 1200, 1600 and 2400 m. Modification of the slope angle by road construction, together with the load of heavy vehicles, may affect slope stability by increasing stress and reducing the cohesiveness of the soil. As a result, landslides occur near roads. In the Rorachu river basin, 56 landslide locations were identified along the 31A National Highway. The road network of the study area was digitized from the same topographical map and updated with a Google Earth image of 2016 and a 10 m resolution Sentinel-2 satellite image of 2017. To assess the impact of roads on landslide occurrence, a distance from road map was prepared with buffer distances of 100, 200, 400, 800, 1600, 2400 and 3600 m (Fig. 6d). All buffer maps were prepared using ArcMap 10.3.
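Assigning a point to one of the buffer bands can be sketched with a sorted list of break distances. The break values are the road buffer distances given above. The function name and the rule that a distance equal to a break falls in the nearer band are assumptions.

```python
from bisect import bisect_left

# Buffer distances from road (m), as listed in the text.
ROAD_BREAKS_M = [100, 200, 400, 800, 1600, 2400, 3600]

def buffer_band(distance_m, breaks):
    """Index of the first band whose outer limit is >= distance_m.

    Band 0 covers 0 to breaks[0]. A result of len(breaks) means the point
    lies beyond the outermost buffer.
    """
    if distance_m < 0:
        raise ValueError("distance cannot be negative")
    return bisect_left(breaks, distance_m)

for distance in (50, 100, 150, 3600, 4000):
    print(distance, buffer_band(distance, ROAD_BREAKS_M))
```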

Geology and soil

Geology is considered one of the most important causative factors for slope instability and landslides. Fragile and weak geological structures are more prone to landslides. The geological map of the study area was prepared from the district resource map of East Sikkim collected from the Geological Survey of India, Kolkata. The geological groups were vectorized using ArcMap 10.3. From the geological map, five lithological groups were identified in the Rorachu river basin: quartzite, sillimanite bearing granite gneiss, schist, amphibolite and granite gneiss (Lingtse gneiss) (Fig. 7a). The soil map of the Rorachu river basin was collected from the Natural Resource Atlas of Sikkim, guided by the National Atlas and Thematic Mapping Organization (NATMO), Kolkata. The soil of the study area was divided into seven categories on the basis of the material present in the soil (Fig. 7b): fine loamy fluventic eutrudepts (S001), coarse loamy humic pachic dystrudepts (S002), coarse loamy humic dystrudepts (S003), fine loamy typic paleudolls (S004), fine skeletal cumulic hapludolls (S005), loamy skeletal entic hapludolls (S006) and coarse loamy typic hapludolls (S007). The characteristics of the soils are given in Table 3.

Fig. 7 Geology (a), soil (b), NDVI (c) and land use and land cover (d) of the Rorachu river basin

Table 3 Soil characteristics of the Rorachu river basin

NDVI and land use/land cover

The normalized difference vegetation index (NDVI) map (Fig. 7c) was prepared from a Sentinel-2 image of 10 m spatial resolution using the NDVI index in Erdas Imagine 9.2, according to the following equation:

$$NDVI=\frac{{(NIR - R)}}{{(NIR+R)}}$$
(7)

where NDVI is the normalized difference vegetation index, NIR is the near infrared band (band no. 8) and R is the red band (band no. 4) of the Sentinel-2 image. NDVI values range between −1 and +1, where values closer to 0 denote less vegetation and values closer to +1 indicate a dense cover of green leaves. The land use/land cover map was also prepared from the Sentinel-2 image, using the maximum likelihood algorithm in Erdas Imagine 9.2 under a supervised classification scheme, and the classification accuracy was then assessed with Cohen’s Kappa coefficient. The land use/land cover map (Fig. 7d) reflects the condition of the physical environment and anthropogenic activities. Bare ground, settlement, road, river, terrace farming, sparse vegetation and dense vegetation were identified as the significant land use/land cover classes in the study area. Dense forest, which occupies 58.03% of the area, is the dominant land cover, followed by sparse vegetation, bare ground, settlement, river, terrace farming and road.
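Eq. 7 applied to a single pixel can be sketched as follows. The reflectance values are hypothetical, and returning None for a pixel with no signal in either band is an assumption made here, not a rule stated in the study.

```python
def ndvi(nir, red):
    """Eq. 7: NDVI = (NIR - R) / (NIR + R) for one pixel.

    Returns None when both bands are zero, because the ratio is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total

# Hypothetical reflectance values for band 8 (NIR) and band 4 (red).
print(ndvi(0.5, 0.1))  # vegetated pixel: strong NIR, weak red
print(ndvi(0.2, 0.2))  # equal reflectance gives an index of 0
```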

Results and discussion

Analysis of logistic regression model

The model was constructed from training data (725 landslide pixels and 78,975 non-landslide pixels) obtained from the thirteen thematic data layers and the landslide inventory map. The result was then checked to assess whether the dataset used in the model was suitable. Several procedures are available for verifying a model result or assessing its goodness of fit. In the present study, the Hosmer–Lemeshow test, the Cox and Snell R² and Nagelkerke R², and the −2log likelihood method were used. Under the Hosmer–Lemeshow goodness of fit test, the model is considered suitably fitted if the significance of the chi square value is greater than 0.05 (Bai et al. 2010). The test gave a significance of 0.259 for the model (Table 4), which indicates that the model is suitable; in other words, the dataset was suitable for logistic regression analysis. Pseudo R² is another measure of model fit. The model is regarded as perfect if the pseudo R² value is 1, and a value above 0.2 indicates a good overall fit (Clark and Hosking 1986). The Cox and Snell R² and Nagelkerke R² were used to assess how well the logistic regression model fits the data. The calculated Nagelkerke R² value was 0.209, which indicates a relatively good fit (Table 5).

Table 4 Result of Hosmer–Lemeshow test and model summary
Table 5 Result of -2log likelihood test

The −2log likelihood (−2LL) is another measure used to evaluate the goodness of fit of the model. It is a key concept for understanding tests in multiple regression (Garcia-Rodriguez et al. 2007). Generally, a smaller value of −2LL implies a better fit. In this model, −2LL was calculated using all independent variables and then excluding them one by one. The model yields the lowest value when all the independent variables are used (Table 5). This means that the causative factors selected for model construction were prime factors for landslide occurrence in the Rorachu river basin and were well suited to the model.
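Both pseudo R² statistics can be obtained from −2LL values using their standard definitions, as sketched below. The sample size and the two −2LL values are hypothetical and are not those of Tables 4 and 5. The null model is the intercept-only model.

```python
import math

def cox_snell_r2(neg2ll_null, neg2ll_model, n):
    """Cox and Snell R2 = 1 - exp((-2LL_model - (-2LL_null)) / n)."""
    return 1.0 - math.exp((neg2ll_model - neg2ll_null) / n)

def nagelkerke_r2(neg2ll_null, neg2ll_model, n):
    """Cox and Snell R2 divided by its maximum attainable value."""
    maximum = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell_r2(neg2ll_null, neg2ll_model, n) / maximum

# Hypothetical values: 1000 points, -2LL of 800 (null) and 700 (fitted model).
print(cox_snell_r2(800.0, 700.0, 1000))
print(nagelkerke_r2(800.0, 700.0, 1000))
```

The rescaling is what allows the Nagelkerke R² to reach 1 for a perfect model, whereas the Cox and Snell R² cannot.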

The coefficient values of the logistic regression model were calculated from the independent and dependent variables (Table 6), and the following linear combination was constructed from the coefficients for the present study:

Table 6 Coefficient of logistic regression and overall statistics of landslide conditioning factors
$$\begin{aligned} z &= 1.980 - (0.002 \times \text{RAIN}) + (0.380 \times \text{SLOPE}) - (0.030 \times \text{ASPECT}) - (0.580 \times \text{CURV}) - (0.002 \times \text{REL}) \\ &\quad - (0.351 \times \text{DD}) + (2.952 \times \text{GEOL}) + (1.220 \times \text{SOIL}) - (0.003 \times \text{DRAI}) + (0.00 \times \text{LIN}) \\ &\quad - (0.002 \times \text{ROAD}) + (0.340 \times \text{NDVI}) + (0.103 \times \text{LULC}) \end{aligned}$$
(8)

where RAIN indicates rainfall, SLOPE indicates slope, ASPECT indicates aspect, CURV indicates curvature, REL indicates relief, DD indicates drainage density, GEOL indicates geology, SOIL indicates soil, DRAI indicates distance from drainage, LIN indicates distance from lineament, ROAD indicates distance from road, NDVI indicates normalized difference vegetation index (NDVI), and LULC indicates land use/land cover of the study area.
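How Eq. 8 turns the factor values of one point into a landslide susceptibility index can be sketched as follows. The coefficients are copied from Eq. 8 as printed and should be checked against Table 6. The factor values of the example point are hypothetical, including their units.

```python
import math

INTERCEPT = 1.980
COEFFICIENTS = {  # as printed in Eq. 8
    "RAIN": -0.002, "SLOPE": 0.380, "ASPECT": -0.030, "CURV": -0.580,
    "REL": -0.002, "DD": -0.351, "GEOL": 2.952, "SOIL": 1.220,
    "DRAI": -0.003, "LIN": 0.0, "ROAD": -0.002, "NDVI": 0.340, "LULC": 0.103,
}

def linear_combination(point):
    """z of Eq. 8 for a dict holding one value per factor."""
    return INTERCEPT + sum(COEFFICIENTS[name] * point[name] for name in COEFFICIENTS)

def susceptibility_index(point):
    """Eq. 1 applied to z: P = 1 / (1 + e^(-z)), stable for large |z|."""
    z = linear_combination(point)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical point; GEOL, SOIL and LULC hold frequency ratio values.
point = {
    "RAIN": 200.0, "SLOPE": 30.0, "ASPECT": 180.0, "CURV": 0.0, "REL": 2000.0,
    "DD": 3.0, "GEOL": 1.2, "SOIL": 1.0, "DRAI": 150.0, "LIN": 500.0,
    "ROAD": 300.0, "NDVI": 0.4, "LULC": 1.1,
}
print(linear_combination(point), susceptibility_index(point))
```

Because the coefficient of LIN is zero, changing the distance from lineament leaves the index unchanged.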

The calculated coefficient values reveal that not all landslide causative factors were equally important for landslide occurrence in the Rorachu river basin of the eastern Sikkim Himalaya. In general, the significance value of the coefficients was less than 0.05, which shows that the independent variables used in the model were significant for the occurrence of the event. The coefficient value also explains the contribution of each causative factor to landslide occurrence: factors with higher coefficient values are considered more influential than the others. In addition, the coefficient value describes how the dependent variable changes with the independent variables. The coefficient values show that not all landslide causative factors are positively related to landslide occurrence in the basin, and in some cases no relationship was found (Table 6). Geology, soil, slope, NDVI and land use/land cover have a positive relationship with landslides, with coefficient values of 2.952, 0.122, 0.038, 0.340 and 0.103 respectively. Rainfall, relief, aspect, curvature, drainage density, distance from drainage and distance from road, on the other hand, have a negative relationship. The coefficient values also show that distance from lineament has no relationship with landslide occurrence in the Rorachu river basin. Geology, soil and land use/land cover were converted into continuous data using the frequency ratio, so their relationship with landslides rests on the frequency ratio values. Geology yields the maximum coefficient value (Table 6), which shows that it is the prime causative factor for landslide occurrence. Among the geological sub-divisions, the maximum frequency ratio values were found for quartzite and sillimanite bearing granite gneiss (Table 6).
The high positive coefficient reflects the same condition: the categories with higher frequency ratio values were more prone to landslides. Soils with coarser materials are more prone to landslides because they are excessively drained. Cohesion is lowest in this type of soil, so the soil mass moves downslope very easily when moist. The same was observed in the Rorachu river basin, where the maximum frequency ratio was found in coarse loamy typic hapludolls, followed by loamy skeletal entic hapludolls, coarse loamy humic dystrudepts and coarse loamy humic pachic dystrudepts. The positive coefficient of soil indicates the same condition as for geology. For land use/land cover, road, river and bare ground were the main features influencing landslide occurrence in the study area. The Normalized Difference Vegetation Index (NDVI) represents the presence or absence of vegetation. The positive coefficient of NDVI indicates that landslides occurred not only on bare surfaces but also in well-vegetated areas of the Rorachu river basin; a few landslides and landslide scars were found in the Ranipool forest block, the Gangtok forest block and Bhusuk. Rainfall, relief and drainage density generally have a positive relationship with landslides, but in the present study their relationship is negative, which is the reverse of the normal condition. The maximum rainfall in the basin was found in the south-western part, whereas most of the landslides occurred in the northern and north-western parts. This shows that, although rainfall is a major triggering factor, landslides in the Rorachu river basin are not controlled by rainfall alone. In the logistic regression model of the present study, the coefficient of relief was −0.002, which indicates a very weak negative relationship between relief and landslide occurrence.
But the actual picture is quite different. In the Rorachu river basin the relief ranges between 834 m and 4114 m, but the maximum number of landslides was found between 2500 m and 3500 m elevation, and 56% of the landslide pixels were found within 3500 m altitude. The coefficient value suggests that landslide occurrence decreases with increasing relief, but a sigmoid pattern of landslide occurrence was actually observed with respect to relief. In hilly and mountainous areas, slope stability depends on the spacing of streams and on drainage density; in other words, the distribution of drainage influences slope stability. Slope stability decreases where slopes are dissected by the active down-cutting of lower-order streams. This dissection not only reduces the strength of the soil but also changes the slope angle considerably. Hence the slopes closer to the drainage are less stable. Drainage density is high when the stream length per unit area is large, which depends on the shape and alignment of the stream segments: areas with straight channels have a lower drainage density than areas where channels are more or less sinuous. In the Rorachu river basin, the slopes of the entire northern and north-western part were highly dissected by small segments of lower-order streams, and as a result a large number of debris flows were found in this area. Some of the debris flows directly intersect the 31A National Highway. Consequently, the coefficient of the distance from drainage data layer was negative. This negative relationship clearly shows that more landslides were found closer to the rivers in the basin: out of 785 landslide training points, 726 (92.48%) were found within 200 m of rivers. This clearly explains the role of fluvial erosion in slope stability in the Rorachu river basin of the eastern Sikkim Himalaya.
For drainage density, however, the relationship with landslides was negative in the logistic regression model. This is because the stream length in the moderate to low relief areas of the Rorachu river basin was greater than in the high relief areas. Slope played an important role in this variation of drainage density. In high relief areas the slope is generally more than 50°, and as a result the lower-order stream segments are very straight. In moderate to low relief areas, on the other hand, higher-order streams are more sinuous or irregular. Hence, in the present study the maximum drainage density was found in the south-eastern part of the basin, where relief is moderate to low. For curvature, positive and negative values indicate convex and concave slopes respectively, and a zero value indicates flat areas. Concave slopes are generally more prone to landslides than convex slopes. The coefficient of curvature in the model was −0.058, which indicates that in the Rorachu river basin concave slopes were more prone to landslides than convex slopes and that landslides were absent in flat areas. Roads are one of the major causative factors in the present study area. Of 80 landslides, 56 were found along the 31A National Highway in the basin, which clearly shows the influence of slope cutting for road construction and of vibration from vehicular traffic. The study shows that 594 training landslide points were found within a 200 m buffer distance from roads; thus 75.66% of the total landslides occurred very close to roads in the Rorachu river basin.
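One common way to read logistic regression coefficients such as those quoted above is to exponentiate them into odds ratios. The sketch below does this for the coefficient values stated in the text; the helper name `odds_ratio` is introduced here for illustration only. Because the factors are measured in different units, an odds ratio describes the effect of a one-unit change in that factor and is not, by itself, a ranking of importance.

```python
import math

# Coefficient values quoted in the text (Table 6 of the source).
reported = {
    "geology": 2.952,
    "NDVI": 0.340,
    "soil": 0.122,
    "land use/land cover": 0.103,
    "slope": 0.038,
    "relief": -0.002,
    "curvature": -0.058,
}

def odds_ratio(beta):
    """Multiplicative change in the odds of a landslide per unit increase of a factor."""
    return math.exp(beta)

odds = {name: odds_ratio(beta) for name, beta in reported.items()}

# List the factors from the largest to the smallest odds ratio.
for name, value in sorted(odds.items(), key=lambda item: item[1], reverse=True):
    direction = "raises" if value > 1.0 else "lowers"
    print(f"{name:22s} exp(b) = {value:7.3f} ({direction} the odds)")
```

An odds ratio above 1 corresponds to a positive coefficient and one below 1 to a negative coefficient, which matches the sign-based interpretation used in the text.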

The Wald statistic was also used in the study to identify the causative factors of landslide occurrence. The Wald statistic indicates the significance of the variables responsible for the occurrence of an event (Wang et al. 2013); variables with greater Wald values are considered the most significant factors for the event. Table 6 reveals that all factors used in the model were significant at the 0.05 significance level and that geology produced the highest Wald value, followed by distance from road, rainfall and slope. Relief, drainage density and land use/land cover have moderate Wald values, and the other factors have comparatively lower Wald values. Both the coefficient and the Wald value of geology are much higher than those of the other landslide conditioning factors, so geology is clearly the dominant factor for landslide occurrence in the Rorachu river basin.
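For a single coefficient, the Wald statistic is usually computed as the squared ratio of the coefficient to its standard error and compared with a chi-square distribution with one degree of freedom. The sketch below illustrates that calculation. The coefficient is the geology value quoted in the text, but the standard error is a hypothetical number, since standard errors are not reported in this section.

```python
import math

def wald_statistic(beta, standard_error):
    """Wald chi-square statistic for one coefficient: (beta / SE) squared."""
    return (beta / standard_error) ** 2

def wald_p_value(wald):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))

beta_geology = 2.952   # coefficient quoted in the text
se_geology = 0.15      # hypothetical standard error (assumption)

w = wald_statistic(beta_geology, se_geology)
p = wald_p_value(w)
print(f"Wald = {w:.1f}, significant at 0.05: {p < 0.05}")
```

A larger Wald value corresponds to a smaller p-value, which is why the factor with the highest Wald value is read as the most significant one.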

Analysis of landslide susceptibility map

The probability values calculated by the logistic regression model were used as the landslide susceptibility index (LSI). After its preparation, the landslide susceptibility map was divided into five classes (very low, low, moderate, high and very high) using the manual classification method in the ArcMap software package to demarcate different landslide susceptibility zones (LSZ).
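A manual classification of this kind amounts to assigning each LSI value to a class according to user-defined break values. The sketch below shows the idea outside of ArcMap. The break at 0.034218 is inferred from the LSI threshold that the text associates with the high and very high classes; the other three breaks are hypothetical, because they are not listed in this section.

```python
from bisect import bisect_left

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]

# Upper limits of the first four classes. Only 0.034218 comes from the text
# (inferred lower limit of the "high" class); the other values are hypothetical.
BREAKS = [0.005, 0.015, 0.034218, 0.10]

def classify_lsi(lsi, breaks=BREAKS):
    """Return the susceptibility class of an LSI value (upper limits inclusive)."""
    return CLASS_NAMES[bisect_left(breaks, lsi)]

for value in (0.000023, 0.02, 0.05, 0.349046):
    print(f"LSI {value:.6f} -> {classify_lsi(value)}")
```

Using `bisect_left` makes each break value belong to the lower class, so only values strictly above 0.034218 fall into the high or very high classes.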

The final landslide susceptibility map (Fig. 8) shows that the LSI value ranges from 0.000023 to 0.349046. The map makes it evident that the northern stretch of the basin is more susceptible to landslides than the southern part. After the demarcation of the landslide zones, the class-wise pixel count and area were calculated. Of the 71.73 sq. km area of the basin, 36.70 sq. km (51.17%) was occupied by the very low susceptibility class, and the smallest area, 1.15 sq. km (1.61%), belonged to the very high susceptibility class (Table 7). The southern, south-eastern and some central parts of the basin are stable. The existing landslide location data show that most of the landslides were located in the northern, north-eastern and north-western parts of the basin, which belong to the high and very high susceptibility classes. These two classes contain 0.65 sq. km (75.48%) of the slide area (Table 7). This indicates that, in areas with an LSI value above 0.034218, slope instability and landslides can be a serious threat to life, property and natural resources in the Rorachu river basin. In contrast, landslides were rarely seen in the very low and low susceptibility classes, which contain 0.09 sq. km (9.72%) of the slide area (Table 7).

Fig. 8 Landslide susceptibility map of the study area

Table 7 Landslide susceptibility class wise total area and landslide area distribution

Validation of the model

Model validation by ROC curve

A receiver operating characteristics (ROC) curve was prepared from 946 landslide pixels (including training and validation data) and 78,754 non-landslide pixels to validate the accuracy of the logistic regression model. The ROC curve and the area under the curve (AUC) indicate very good model accuracy (Fig. 9). The AUC was 0.947 (94.7%), which shows that the result of the logistic regression model is very close to a perfect classification of the data (Table 8). The asymptotic significance of the ROC curve indicates that the curve is also statistically significant.
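The AUC can be interpreted as the probability that a randomly chosen landslide pixel receives a higher LSI than a randomly chosen non-landslide pixel. The sketch below computes it with the rank-based (Mann-Whitney) formulation, which is one standard way to obtain the AUC and is not necessarily the procedure used by the statistical package in this study. The scores are hypothetical.

```python
def auc_from_scores(landslide_scores, non_landslide_scores):
    """AUC as the probability that a landslide pixel outranks a non-landslide pixel.

    Ties between the two groups count as one half.
    """
    labelled = [(s, 1) for s in landslide_scores]
    labelled += [(s, 0) for s in non_landslide_scores]
    labelled.sort(key=lambda pair: pair[0])

    rank_sum_pos = 0.0
    n = len(labelled)
    i = 0
    while i < n:
        # Find the group of tied scores starting at position i.
        j = i
        while j < n and labelled[j][0] == labelled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        positives_in_group = sum(label for _, label in labelled[i:j])
        rank_sum_pos += avg_rank * positives_in_group
        i = j

    n_pos = len(landslide_scores)
    n_neg = len(non_landslide_scores)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)

# Hypothetical LSI values for a handful of pixels (illustration only).
landslide_lsi = [0.30, 0.12, 0.02]
non_landslide_lsi = [0.001, 0.004, 0.03, 0.0005]
print(f"AUC = {auc_from_scores(landslide_lsi, non_landslide_lsi):.3f}")
```

A value of 0.5 would correspond to a model that ranks pixels no better than chance, and a value of 1.0 to a perfect separation of landslide and non-landslide pixels.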

Fig. 9 Receiver operating characteristics (ROC) curve

Table 8 Overall statistics of ROC curve and AUC

Model validation by landslide density method

The calculated landslide density shows that landslide density increased towards the higher susceptibility classes, and the maximum landslide density of 0.2695 was found in the very high landslide susceptibility class (Table 9).

Table 9 Landslide densities of different landslide susceptibility classes

On the basis of the calculated landslide density, a landslide density curve was prepared to visualize the trend. Because the landslide density increases from the very low to the very high susceptibility class, as the method requires, the overall accuracy and predictive capacity of the logistic regression model can be considered very satisfactory.
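The landslide density check can be sketched as follows, assuming that landslide density is the ratio of landslide pixels to all pixels within a susceptibility class. This definition is an assumption, as it is not restated in this section, and the pixel counts below are hypothetical rather than the values of Table 9.

```python
CLASS_ORDER = ["very low", "low", "moderate", "high", "very high"]

# Hypothetical (landslide pixels, total pixels) per class; not the data of Table 9.
pixel_counts = {
    "very low": (10, 40000),
    "low": (40, 20000),
    "moderate": (100, 10000),
    "high": (300, 6000),
    "very high": (400, 1500),
}

def landslide_density(landslide_pixels, class_pixels):
    """Share of a susceptibility class that is covered by landslide pixels."""
    return landslide_pixels / class_pixels

def is_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

densities = [landslide_density(*pixel_counts[name]) for name in CLASS_ORDER]
for name, density in zip(CLASS_ORDER, densities):
    print(f"{name:10s} density = {density:.4f}")
print("Density rises with susceptibility class:", is_increasing(densities))
```

If the density did not rise from class to class, the susceptibility zoning would be inconsistent with the observed landslide distribution.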

Conclusion

GIS-based statistical modelling is widely used in landslide susceptibility analysis. These methods are well suited to, and widely accepted for, areas where data availability is limited. Landslides are generally not easy to predict or forecast. In this regard, a landslide susceptibility map can provide concise and necessary information about the slope instability of a particular region. Such a map can be a useful tool for decision makers and planners because it delineates the potential areas of future landslides. As a complex geo-environmental phenomenon, a landslide is driven by a group of factors, and not all factors are equally responsible for the event. Under these conditions the logistic regression model is well suited to analysing the relationship between landslides and their related factors, as it estimates the relationship in a non-linear structure. In the present study, non-morphometric factors (geology, soil, NDVI, land use/land cover and distance from road) were found to be more responsible than the morphometric factors. Finally, it may be concluded that the landslide susceptibility map of the Rorachu river basin can be treated as an important source of information for future urban and land use planning of the region.