1 Introduction

A landslide is a geological mass-wasting process in which rock, debris or earth moves down natural or engineered slopes. The term also covers the mass movement of rock, slope failures, topples, slides and flows such as debris flows and mudflows (Novotný 1978; Cruden and Varnes 1996; Hungr et al. 2013; Gariano and Guzzetti 2016; Subramani et al. 2017). Landslides pose a serious threat to human life, habitats and socio-economic activities, including industrial establishments and lifelines such as transportation networks and communication systems (Martha et al. 2015). The prediction of landslide-prone areas and their causative factors has become a growing concern under the pressure of urbanization (Aleotti and Chowdhury 1999; Sajinkumar et al. 2011). A growing population puts pressure on land resources and pushes human activities into hilly regions. This has increased the socio-economic losses caused by slope failures during landslides (Das et al. 2011; Subramani et al. 2017; Hong et al. 2017).

According to the Geological Survey of India, about 0.42 million km2 (12.6%) of the land area of India, excluding snow-covered regions, falls under the landslide hazard zone (NLRMS 2019). Of this, 0.09 million km2 lies in the Western Ghats and Konkan hills, which pass through Tamil Nadu, Kerala, Karnataka, Goa and Maharashtra (NDMA Annual Report 2018–2019; Deepu et al. 2019; Thennavan and Ganapathy 2020). According to the National Disaster Management Authority, Government of India, the Western Ghats is the second-most landslide-vulnerable region in India after the Himalayas (Kuriakose et al. 2009; Anbazhagan and Sajinkumar 2011; Sajinkumar and Anbazhagan 2015; Pradhan et al. 2019). Landslides are a severe hazard that causes substantial human and financial losses in the country. On average, landslides claim about 500 lives and cause approximately Rs. 300 crore of damage annually (Parkash 2015).

Landslide susceptibility modelling has been widely explored and has attracted considerable research interest. The methods used for susceptibility modelling are classified in the literature as qualitative or quantitative (Chalkias et al. 2014). Qualitative methods are knowledge-driven (Aleotti and Chowdhury 1999; Westen 2000), whereas quantitative methods are based on the relationships between controlling factors and landslides (Guzzetti et al. 1999). Quantitative methods can be further classified as: (i) data-driven methods, which include bivariate statistics, multivariate statistics, artificial neural networks, etc., and (ii) physically-based methods, which include deterministic and probabilistic methods (Niyogi et al. 2010; Tsai et al. 2015; Chae et al. 2017). Geographic Information Systems (GIS) have played a significant role in landslide research because they combine the collection, storage, manipulation, display and analysis of spatially referenced data. The combination of GIS and multivariate statistical analysis is considered a beneficial approach in landslide mapping (Van Westen 1994; Chung et al. 1995; Guzzetti et al. 1999; Chen and Wang 2007). Although these statistical models are considered important in landslide studies, they are poor at explaining spatial variation within the study area (Pourghasemi et al. 2020). Landslide susceptibility studies are area-specific, so it is essential to understand the spatial interactions between landslide locations and conditioning factors (Erener and Düzgün 2010). Combining spatial elements with a statistical model could therefore be a more robust approach to landslide modelling than standard statistical prediction methods (Pourghasemi et al. 2020).

Spatial econometrics is an evolving branch of econometrics that handles spatial interaction (spatial autocorrelation) and spatial structure (spatial heterogeneity) in regression models for cross-sectional and panel data (Anselin 2003). These methods are applied in a wide range of fields such as demand analysis, labour economics, agricultural economics and environmental economics. The spatial autoregressive model (SAR) and geographically weighted regression (GWR) are widely used global and local regression techniques, respectively. This study explores spatial models and the possibility of using them to identify and measure the human influencing parameters of landslide occurrences in Kerala.

This investigation was prompted by the landslides in Kerala during 2018 and 2019. According to the Kerala State Disaster Management Authority (KSDMA), 1426 landslides were reported in 2018 and 711 in 2019 (Martha et al. 2019). The study emphasizes land-use alterations made by human intervention and their association with landslide occurrences, alongside natural parameters. It is organized in three parts: the first identifies the landslide hotspots in Kerala, the second identifies the human influencing parameters that accelerate landslide occurrences, and the third deals with landslide modelling and the comparative performance of various methods.

2 Study area

2.1 Geographical and geological setting

Kerala is a relatively small state with an area of 38,863 km2, located at the southwestern tip of peninsular India between latitudes 8°17′30″–12°47′40″N and longitudes 74°51′57″–77°24′47″E (figure 1) (SoE Report 2007; Vinod 2017). The state is topographically diverse, with physiographic divisions running from the lowlands in the west through the midlands (7.5–75.0 m) to the highlands (> 75.0 m) in the east (SoE Report 2007). The terrain slopes gradually from the Western Ghats in the east to the Arabian Sea in the west (Das 2020). Kerala's distinct geomorphic features include the tallest peak of the Western Ghats, Anamudi (2695 m), 44 rivers and a coastal plain studded with several lagoons and barrier systems (Kumar 1994). Geologically, the state is part of the south Indian shield and is dominated by crystalline rock formations (Singh et al. 2016). The state is rich in natural resources such as biodiversity, forest, water and minerals. Temperature and humidity are high throughout the year and the state receives copious rainfall for nearly 10 months. Kerala is highly vulnerable to various natural disasters, a vulnerability compounded by changing climatic conditions. The state's topographic position, with steep slopes on one side and the sea on the other, increases disaster complexity.

Figure 1 The study area.

2.2 Landslides of 2018 and 2019

According to KSDMA, 14.43% of the state is prone to landslides, especially during the monsoon season. Debris flows, known as Urul Pottal in the local vernacular, are the most common type of landslide in Kerala (Vineetha et al. 2019). During 2018 and 2019, the state experienced its worst disasters since 1924 (KPDNA 2018). The 2018 disaster caused the death of 483 people, large-scale loss of property and massive floods that affected around 5.4 million people (PDNA 2018; Abraham et al. 2019; Martha et al. 2019). Several landslides, mainly debris flows, occurred in various parts of the state along with the floods. According to KSDMA, 1426 landslides were reported from 10 districts of the state in 2018. Figure 2 shows that Idukki district witnessed the largest number of landslides in 2018. The combination of high-intensity rainfall and steep slopes with highly dissected hills and thick topsoil increases the probability of landslides (Martha et al. 2019). The sudden release of excess water from dams due to rising water levels resulted in toe cutting along rivers and accelerated the occurrence of landslides and damage (Abraham et al. 2019).

Figure 2 Landslide location (2018 and 2019).

Following 2018, the state again faced severe landslides in 2019. A total of 711 landslides and extensive flooding were reported from various parts of the state, causing the death of 121 people (KSDMA 2019 data). Although the number of landslides was lower, their intensity was observed to be greater than in the previous year, causing more casualties. Figure 2 shows that the northern districts of Kerala, namely Wayanad, Malappuram and Kozhikode, were the worst-affected regions in 2019. In 2019, two hamlets in the state, one at Kavalappara in Malappuram district and the other at Puthumala in Wayanad district, were completely washed away by landslides (https://indianexpress.com/article/explained/how-a-hill-in-the-western-ghats-buried-a-part-of-a-village-in-kerala-kavalapparalandslide-5903013). The village-wise landslide distribution for 2018 and 2019 is shown in figure 3.

Figure 3 Village-wise landslide distribution.

3 Data and methods

The data used in the study are landslide location, topography, rainfall, geology and land-use. The landslide inventory comprises 2137 landslides from 2018 and 2019 and was obtained from KSDMA. The landslide locations are spatially analyzed and intersected with land-use to identify the land-use at each location. The land-use map is prepared from Sentinel-2 images for the year 2018 downloaded from the United States Geological Survey (USGS) website. The images are classified using the unsupervised K-means clustering algorithm and verified against government data. The quarry data for the whole state was collected from a study conducted by the Forest Health Division, Kerala Forest Research Institute, Peechi (Alex and Sajeev 2017). Freely available road network data from OpenStreetMap (OSM) is used to calculate road density. Elevation is obtained from a 30 m × 30 m Digital Elevation Model (DEM) collected from USGS. ASTER GDEM of 30-m resolution is downloaded from the USGS website and used for slope generation in GIS. Rainfall station data obtained from the India Meteorological Department (IMD) is averaged and interpolated to generate the rainfall map for Kerala. Lithology and lineament data were acquired from the Kerala State Land Use Board, Department of Planning and Economic Affairs, Government of Kerala.

The methodology flowchart for the study is shown in figure 4. The first part of the study identifies the anthropogenic conditioning factors influencing landslides through the spatial intersection of landslide location and land-use data. Based on the results obtained, anthropogenic conditioning factors such as plantation area, built-up area, cropland, forest plantation and quarries (figure 5) were selected for statistical evaluation along with natural conditioning factors (figure 6). The study is carried out at an aggregated unit level, the village boundary, with landslide density as the dependent variable (figure 7). Landslide hotspots were identified across the whole of Kerala. Hotspot analysis identifies statistically significant clusters of high values (hotspots) and low values (coldspots) in the dataset. These clusters are sets of neighbouring spatial units with similar values. In other words, hotspots group locations with positive spatial autocorrelation (Getis and Aldstadt 2004).
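The hotspot step can be illustrated with a small sketch. It assumes the Getis-Ord Gi* statistic, which is a common choice for this kind of analysis but is not named explicitly in the text; the function name `getis_ord_gi_star`, the binary contiguity weights and the six-unit toy data are all hypothetical. Each unit receives a z-score: large positive values mark hotspots and large negative values mark coldspots.

```python
import numpy as np

def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score of every spatial unit.

    values  : 1-D array, e.g. landslide density per village (toy data here).
    weights : (n, n) binary neighbour matrix; the diagonal is set to 1
              because Gi* counts the unit itself as part of its neighbourhood.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    np.fill_diagonal(w, 1.0)
    n = x.size
    mean = x.mean()
    std = np.sqrt((x ** 2).mean() - mean ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = std * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Toy example: six units along a line, each neighbouring the adjacent units.
n_units = 6
line_weights = np.zeros((n_units, n_units))
for i in range(n_units - 1):
    line_weights[i, i + 1] = line_weights[i + 1, i] = 1.0

density = np.array([0.0, 0.0, 0.0, 0.1, 0.9, 1.0])  # hypothetical densities
z_scores = getis_ord_gi_star(density, line_weights)
print(np.round(z_scores, 2))
```

Under a normal approximation, a unit whose z-score exceeds about 1.65, 1.96 or 2.58 would be flagged as a hotspot at roughly 90%, 95% or 99% confidence, which corresponds to the confidence levels reported for the villages in the results.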

Figure 4 Methodology.

Figure 5 Spatial distribution of human conditioning factors (independent variables): percentage distribution of (a) plantation area; (b) quarry area; (c) cropland; (d) forest plantation area; (e) built-up area; and (f) road density.

Figure 6 Spatial distribution of natural conditioning factors (independent variables): (a) slope, (b) elevation, (c) rainfall, (d) lithology, and (e) lineament density.

Figure 7 Village-wise landslide density (dependent variable).

The regressors comprise the anthropogenic conditioning factors (plantation area, built-up area, cropland area, forest plantation area and quarry area, each expressed as a percentage, together with road density) and the natural conditioning factors (slope, elevation, rainfall, lithology and lineament density). All are aggregated at the village boundary level and used as independent variables. An ordinary least squares (OLS) regression was estimated first, and its results prompted a check for spatial models. Moran’s I test on the OLS residuals indicates the presence of spatial autocorrelation in the data. The Lagrange multiplier (LM) test is then used to identify a suitable spatial regression model. The LM test finds both the spatial autoregressive model and the spatial error model significant, so the robust LM statistics are used for model selection. The SAR model is selected because it has the lower p value in the robust LM test. Because spatial phenomena vary across space, local spatial analysis can provide a more accurate and realistic depiction, especially in landslide studies. Geographically weighted regression (GWR), a local regression method, is therefore used to analyze the local variability of landslides. The study used GeoDa software for OLS and SAR modelling and R for GWR.
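The residual diagnostic described above can be sketched in a few lines. This is a minimal illustration, not the GeoDa procedure: it computes Moran's I for a vector of residuals and, as a swapped-in technique, attaches a permutation-based pseudo p-value rather than the analytical test a package would normally report. The function names, the binary weights and the toy residuals are hypothetical.

```python
import numpy as np

def morans_i(residuals, weights):
    """Moran's I = (n / S0) * (e' W e) / (e' e) for centred residuals e."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()  # OLS residuals with an intercept already have zero mean
    w = np.asarray(weights, dtype=float)
    return (e.size / w.sum()) * (e @ w @ e) / (e @ e)

def permutation_p_value(residuals, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(residuals, weights)
    shuffled = [morans_i(rng.permutation(residuals), weights)
                for _ in range(n_perm)]
    extreme = sum(value >= observed for value in shuffled)
    return (extreme + 1) / (n_perm + 1)

# Toy example: six units along a line with residuals that trend across space.
n_units = 6
line_weights = np.zeros((n_units, n_units))
for i in range(n_units - 1):
    line_weights[i, i + 1] = line_weights[i + 1, i] = 1.0

trending = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])   # similar neighbours
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # dissimilar neighbours

print(morans_i(trending, line_weights))
print(morans_i(alternating, line_weights))
print(permutation_p_value(trending, line_weights))
```

A clearly positive statistic with a small p-value, as the study reports for its OLS residuals, means that neighbouring villages have similar residuals, which is what motivates the move to a spatial model.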

3.1 Spatial autoregressive model (SAR)

Spatial regression is a standard linear regression model formed by incorporating spatial effects into the model (Saputro et al. 2019). SAR model considers the dependency of the spatial unit and the corresponding neighbouring units (external effects or spatial interaction) of the dependent variable. The spatial lag term ρWy is added into the regression equation to incorporate the neighbourhood effects. This is expressed as (Saputro et al. 2019):

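$$ y = \rho Wy + X\beta + \varepsilon, $$

where y is the vector of the dependent variable, X is the matrix of independent variables with coefficient vector β, ρ is the spatial autoregressive coefficient, W is the spatial weight matrix (so that Wy is the spatially lagged dependent variable) and ε is the error term.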
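To make the role of the spatial lag term concrete, the sketch below simulates data from the SAR equation and recovers ρ and β by maximizing the concentrated log-likelihood over a grid of ρ values. The text does not describe the estimator used, so maximum likelihood is an assumption here, and the grid search is a didactic shortcut rather than what any particular package does. The function names, the ring-shaped row-standardized weight matrix and all simulated values are hypothetical.

```python
import numpy as np

def sar_log_likelihood(rho, y, X, W):
    """Concentrated log-likelihood of y = rho*W*y + X*beta + eps at a given rho."""
    n = y.size
    y_filtered = y - rho * (W @ y)                      # (I - rho*W) y
    beta, *_ = np.linalg.lstsq(X, y_filtered, rcond=None)
    resid = y_filtered - X @ beta
    sigma2 = resid @ resid / n
    _, logdet = np.linalg.slogdet(np.eye(n) - rho * W)  # Jacobian term
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1) + logdet
    return loglik, beta

def fit_sar(y, X, W, grid=None):
    """Pick the rho on the grid with the highest concentrated log-likelihood."""
    grid = np.linspace(-0.95, 0.95, 191) if grid is None else grid
    best_ll, best_rho, best_beta = -np.inf, None, None
    for rho in grid:
        ll, beta = sar_log_likelihood(rho, y, X, W)
        if ll > best_ll:
            best_ll, best_rho, best_beta = ll, rho, beta
    return best_rho, best_beta, best_ll

# Simulated example: 200 units on a ring, each with two neighbours (weights 0.5).
rng = np.random.default_rng(0)
n = 200
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_rho, true_beta = 0.6, np.array([1.0, 2.0])
eps = rng.normal(scale=0.5, size=n)
# Reduced form: y = (I - rho*W)^(-1) (X*beta + eps)
y = np.linalg.solve(np.eye(n) - true_rho * W, X @ true_beta + eps)

rho_hat, beta_hat, loglik = fit_sar(y, X, W)
print(rho_hat, beta_hat, loglik)
```

The reduced form used in the simulation shows why OLS is inadequate when ρ is non-zero: each observation depends on the regressors and errors of its neighbours, not only on its own.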

3.2 Geographically weighted regression (GWR)

GWR is a spatial regression technique used to study local variations in a process and its variables (Brunsdon et al. 1996, 1999). GWR estimates the parameters locally, based on geographical location, and thereby incorporates real-world spatial variation (Hong et al. 2017). In ordinary (global) linear regression, the model is a linear function of a set of independent variables:

$$ y_{i} = a_{0} + \sum_{k=1}^{m} a_{k} x_{ik} + \varepsilon_{i}, \quad i = 1, 2, \ldots, n, $$

where yi is the ith observation of the dependent variable, xik is the ith observation of the kth independent variable, εi is the error term with zero mean, ak is the regression coefficient of the kth variable and a0 is the intercept. In GWR, the coefficients are unique to each location i rather than being global estimates. The model is written as (Hong et al. 2017):

$$ y_{i} = a_{i0} + \sum_{k=1}^{m} a_{ik} x_{ik} + \varepsilon_{i}, \quad i = 1, 2, \ldots, n, $$

where aik is the coefficient value of the kth parameter at location i (Brunsdon et al. 1996; Hong et al. 2017). The GWR output provides a localized estimate of parameters including R2 and provides the local variation of beta coefficients (Hong et al. 2017).
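The local estimation behind the GWR equation can be sketched as a weighted least squares fit repeated at every location. The study ran GWR in R and does not state the kernel or bandwidth, so the fixed Gaussian kernel, the bandwidth of 0.1 and the function name `gwr_coefficients` below are assumptions made purely for illustration, as is the simulated data.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Estimate one coefficient vector (intercept first) per location.

    At location i, observations are weighted by a Gaussian kernel of their
    distance to i, and beta_i = (X' W_i X)^(-1) X' W_i y is solved.
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    betas = np.empty((len(y), design.shape[1]))
    for i, point in enumerate(coords):
        dist = np.linalg.norm(coords - point, axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        xtw = design.T * w                      # X' W_i without forming W_i
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Simulated example: the effect of x on y grows from 1 in the west to 3 in the east.
rng = np.random.default_rng(1)
n = 200
u = np.linspace(0.0, 1.0, n)
coords = np.column_stack([u, np.zeros(n)])
x = rng.normal(size=n)
y = (1.0 + 2.0 * u) * x

local_betas = gwr_coefficients(coords, x, y, bandwidth=0.1)
print(local_betas[0], local_betas[-1])  # local slope differs between the two ends
```

A global regression would report a single slope for this data, whereas the local fit recovers a coefficient surface, which is the non-stationarity that the GWR results in this study are used to examine.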

As mentioned above, this study utilizes the global regression models OLS and SAR and local regression model GWR for identifying human conditioning factors influencing landslide occurrence and the results are interpreted and analyzed.

4 Results

4.1 Landslide and anthropogenic conditioning factors

Earth's surface is continuously changing due to unsustainable practices carried out without regard to terrain conditions. Improper land-use management is a major determinant of natural disasters such as landslides (Aleotti and Chowdhury 1999; Ercanoglu and Gokceoglu 2004; Lee and Talib Jasmi Abdul 2005; Madhu and Vijith 2008; Vasantha Kumar and Bhagavanulu 2008). Large areas of forest land have been cleared for agriculture, plantation and other activities in Kerala (Jha et al. 1995; Kumar 2005). The consequences are evident in the recurrent disasters in the state, such as frequent flash floods and landslides, soil erosion and silting of reservoirs, which result in ecological and environmental problems (Kumar 2005).

Table 1 consolidates the number of landslides in each district of Kerala and the corresponding land-use. The highest share of landslides occurred in Idukki district (49.04%), followed by Malappuram (12.82%) and Wayanad (11.04%). The district-wise percentage distribution of land-use at landslide locations is shown in figure 8. By land-use, the largest share of landslides, 59.38%, occurred in plantation areas (tea, coffee, rubber, cardamom, etc.). Forest comes second, with 21.85% of all landslides falling on forest land (which includes all forest types, such as evergreen, deciduous and fragmented forest). The other human-modified land-uses, namely built-up, cropland, forest plantation (forest planted by the government, such as teak and pine) and quarry, account for 0.37%, 2.20%, 2.39% and 0.42% of landslides, respectively. Table 1 also shows that landslides in forest plantation are most numerous in Malappuram district, mainly because of its large government teak plantations. Ernakulam and Palakkad have a larger number of landslides in built-up areas. About 64.76% of all landslides (59.38 + 0.37 + 2.20 + 2.39 + 0.42) occurred in human-modified land-uses, which suggests that land-use could significantly affect landslide occurrence. Human conditioning factors such as plantation area, built-up area, cropland, forest plantation and quarries were therefore selected for statistical evaluation, along with natural conditioning factors such as slope, elevation, rainfall, lithology and lineament, in the latter part of the study. Figure 9 gives the correlation between the selected landslide conditioning factors and indicates that elevation is correlated with both plantation area and slope.

Table 1 District-wise landslide distribution and land-use.
Figure 8 Landslide and land-use percentage.

Figure 9 Correlation of independent variables.

4.2 Hotspot analysis

The spatial pattern of landslide distribution in the study area is analyzed using hotspot analysis and depicted in figure 10. The result indicates that the landslide hotspots are concentrated in the Idukki, Ernakulam, Kottayam, Wayanad, Kozhikode and Malappuram districts of Kerala. Table 2 shows that no cold spots were identified in the study area. The identified hotspots can be grouped into two clusters. The primary cluster is located in the southeast portion of the state and includes areas of Idukki, Ernakulam and Kottayam districts. The secondary cluster is located in the northeast portion of the state and includes parts of Wayanad, Kozhikode and Malappuram districts. Of the 14 districts in the state, six contain landslide hotspots, and all these hotspots are located in the Western Ghats. Hotspots in Idukki district include 41 villages at 99% confidence and two villages at 95% confidence. Ernakulam district has one village at 99% confidence and one at 90% confidence. Kottayam district has five villages at 99% confidence. Wayanad district has four villages at 90% confidence and one village at 99% confidence. Kozhikode district has three villages at 95% confidence and one village at 99% confidence. Malappuram district has one village at 95% confidence and one at 99% confidence. Idukki district, which forms most of the primary cluster, has plantation as its predominant land-use. Plantation is also a major land-use in Wayanad and Malappuram districts.

Figure 10 Landslide hotspots.

Table 2 Coefficients of hotspot analysis.

4.3 OLS model

The human influencing parameters identified from table 1 were regressed on landslide density aggregated to village boundaries, along with the natural conditioning factors. The OLS regression has low explanatory power, with an R2 value of 21.7%. Moran’s I of the residuals is statistically significant, so the null hypothesis of no spatial autocorrelation is rejected. The Lagrange Multiplier (LM) test indicates that both LM-Lag and LM-Error are significant, but the Robust LM-Lag statistic is more highly significant than the Robust LM-Error statistic. This result supports the selection and estimation of the SAR model. Most of the anthropogenic conditioning factors, along with the natural parameters slope and rainfall, gave statistically significant results. Details of the OLS model are provided in tables 3 and 4.

Table 3 Summary statistics for OLS model.
Table 4 Summary statistics of test for spatial autocorrelation.

4.4 SAR and GWR models

The coefficients of the six human influencing parameters and five natural conditioning parameters regressed on landslide density using the SAR model are presented in table 5. The results indicate that human-modified land-use activities such as plantation, quarry, cropland and road are significant in the study area. The natural parameters slope and rainfall also influence landslides. Based on the SAR coefficients, quarry plays a prominent role in landslide occurrence among the anthropogenic conditioning parameters. The increase in the log-likelihood from 1315.54 (OLS) to 1590.34 (SAR) indicates an improved fit and better explanatory power. The Akaike information criterion falls to −3124.68 for SAR, indicating lower information loss than OLS. The spatial autoregressive coefficient is estimated as 0.669. The predictive performance of the model is 50%, better than that of OLS.
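The link between the log-likelihood and the Akaike information criterion quoted above can be checked with a short calculation, assuming the standard definition AIC = 2k − 2 ln L. The number of estimated parameters k is not stated in the text; the value k = 28 used below is back-computed from the two reported SAR figures under that assumption and should be treated as hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Reported SAR log-likelihood; k = 28 is an assumption inferred from the
# reported AIC, not a value given in the text.
sar_aic = aic(1590.34, 28)
print(round(sar_aic, 2))

# For the same k, the lower OLS log-likelihood gives a higher (worse) AIC.
print(aic(1315.54, 28) > sar_aic)
```

Because the penalty term 2k is small relative to the gain of about 275 in log-likelihood, the AIC ranking of SAR above OLS does not depend on the exact parameter count.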

Table 5 Summary statistics for SAR.

GWR, on the other hand, examines the relationship between the dependent and independent variables in aggregated spatial units as if each unit had its own local regression parameters. The range and average values of the coefficients, along with the model performance indicators, are given in table 6. The AICc value obtained is −2977.968 and R2 is 47%. The ranges of the local coefficients of the independent variables obtained in GWR are −0.0032700 to 0.10691 with a mean of −0.00026869 (percentage area of built-up), −0.0024609 to 0.0025813 with a mean of 0.00020947 (percentage area of plantation), −0.0077835 to 0.043267 with a mean of 0.0093223 (percentage area of quarry), −0.0043979 to 0.013156 with a mean of 0.0015400 (road density), −0.0076673 to 0.0092037 with a mean of −0.000045277 (percentage of forest plantation), −0.00040918 to 0.00099887 with a mean of 0.00017659 (percentage area of cropland), −0.0027095 to 0.032236 with a mean of 0.0094472 (average slope), −0.0011853 to 0.00055598 with a mean of 0.00000071149 (elevation), −0.00020374 to 0.00055299 with a mean of 0.000077782 (average rainfall) and −0.23077 to 0.18941 with a mean of −0.021993 (the coefficients of lithology are given in table 6). These local coefficients indicate that the relationships between landslide density and the independent variables are not stationary but vary across space. The predicted values of all three regression models are plotted in GIS and shown in figure 11. The maximum range of landslide predictions obtained is 0.344 for OLS, 0.33 for SAR and 0.66 for GWR.

Table 6 Summary statistics for GWR.
Figure 11 Landslide model comparison of predicted values of OLS, SAR and GWR.

5 Discussion

The study analyzes the relationship between landslide occurrence and land-use activity using spatial overlay and statistical modelling. The results indicate that 64.76% of all landslides fall in human-modified landscapes and that the majority of them are in plantations. According to the hotspot analysis, there are two clusters of landslide hotspots in Kerala. The primary cluster includes most of the area under Idukki district and small regions of Ernakulam and Kottayam districts. The secondary cluster is located along the boundaries of Kozhikode, Malappuram and Wayanad districts. All regression results indicate that the percentage area under quarry and plantation, along with road density, has a major role in landslide occurrences. Among the natural conditioning factors, slope and rainfall play a major role in landslide occurrences.

Model performance indicators such as AICc and R2 show that the spatial regression models are better than OLS at capturing the dynamics of landslides. Comparing the prediction maps of the three models with the landslide density map indicates that GWR reproduces the spatial pattern best. The two clusters identified by the hotspot analysis are correctly predicted as high landslide density areas by GWR. The other two models identify one more cluster in northern Kerala, in the Kannur and Kasaragod region, as a high landslide density region. The observed landslide density shown in figure 7 is spatially more similar in pattern to the GWR prediction than to those of the other models. GWR also has a better prediction range at higher landslide densities.

6 Conclusion

This study aims to identify the landslide hotspots in Kerala, India, based on the 2018 and 2019 landslides and to apply spatial modelling techniques to identify the human influencing parameters that accelerate landslide occurrences. The hotspot analysis indicates that the landslide hotspots are concentrated in two clusters: the primary cluster includes Idukki, Ernakulam and Kottayam districts and the secondary cluster includes Wayanad, Kozhikode and Malappuram districts. All these hotspots lie in the Western Ghats mountain region, which indicates the need for a detailed regional study of landslides in the Western Ghats. Examining the landslide locations and their corresponding land-use shows that the largest share of landslides occurred in plantation areas (59.38%), with the maximum in Idukki district. In total, 64.76% of the state's landslides occurred in human-modified land-uses such as plantation, built-up, cropland, forest plantation and quarries.

OLS, SAR and GWR models were used in the study to analyze the role of anthropogenic conditioning parameters in triggering landslides. The results indicate that human-modified landscapes have an important role in landslide occurrences, along with natural parameters. Although few landslides occur within quarries, quarry has a significant role in landslide occurrences compared with the other anthropogenic conditioning factors. All models indicate that human-modified land-use practices such as quarry, plantation and cropland are positively correlated with landslide occurrences. Slope and rainfall are also found to influence landslides positively.

The SAR and GWR models used in the study incorporate the spatial correlations of the independent variables. The results therefore illustrate that conventional susceptibility assessment can produce better results when spatial regression models are applied. Although the statistical evaluation of the models shows that the performance of the SAR model is superior, spatial examination of the predicted values suggests that GWR is the better model. Because GWR considers the regional variation in each variable, it is more useful for understanding the behaviour of the independent variables. The study concludes that both global and local models can be used for landslide studies, depending on their applicability. The SAR model could be used if spatial effects are to be incorporated globally, while GWR can be applied if the local effects of spatial variation are to be analyzed.