1 Introduction

Natural disasters arise from the physical processes inherent in the natural dynamics of the planet, but anthropogenic actions play an increasingly important role in their occurrence (Smith 2013; Tehrany et al. 2014; Noh et al. 2016). According to the Intergovernmental Panel on Climate Change (IPCC 2014), natural disasters have been occurring with higher frequency and magnitude, consequently increasing the damage to society. Factors such as unplanned urbanization and economic development, the intensification of human activities and climate change disrupt the natural balance of the environment, which aggravates environmental problems and increases the susceptibility to landslides (Kjeldsen 2010; Bai et al. 2011; Dou et al. 2015a).

The impacts of disasters caused by both natural phenomena and anthropogenic actions can be accentuated by socioeconomic issues, such as the lack of urban planning, public policies and effective disaster prevention actions (Farber et al. 2010; Mata-Lima et al. 2013). According to reports from the Emergency Events Database, most disasters occur in developing countries, where they are associated with high population density in at-risk areas and with the social vulnerability of the population (EM-DAT 2017). As these phenomena are not completely predictable, it is necessary to identify the areas susceptible to disasters prior to human occupation, so that mitigation measures can be implemented (Bai et al. 2011).

Mass movements are slope mass-wasting processes involving soil, rock or debris under the influence of gravity (Gariano and Guzzetti 2016). They are among the most destructive natural phenomena and one of the most common hazards, responsible for immense property damage and hundreds of deaths and injuries each year, as well as for environmental impacts. In this article, we focus on landslides, which can occur naturally but can also be induced by human activities (Saha et al. 2002; Highland and Bobrowsky 2008; Pradhan 2010; Ozdemi 2013; Chen et al. 2015). Landslides usually result from a combination of several environmental factors, most notably heavy rainfall on sloping terrain. Topography is the most influential factor in the occurrence of landslides, along with geological, geomorphological and vegetation cover characteristics (Bai et al. 2011; Ozdemi 2013; Sujhata and Rajamanickam 2014). Changes in land cover, deforestation and increased livestock activity on slopes are some of the factors that can trigger large landslides (Shirzadi et al. 2012).

Through scientific analysis, it is possible to identify areas susceptible to landslides, which is of great value in urban and regional planning, as this information can be used to design measures that avoid or at least reduce the impacts of these extreme events. Susceptibility maps, which show the areas most prone to landslides, are of high interest and can be used in environmental management, land use planning and infrastructure development (Dou et al. 2019). However, understanding and predicting landslides remains a major challenge. Susceptible areas can be modeled from the relationships between past landslide occurrences and a set of environmental factors (Pradhan 2010; Oh and Pradhan 2011). Several attempts have been made to reduce the subjectivity and error of such mappings, and with the development of both computing and geographic information systems (GIS), expectations for the use of large datasets have increased. GIS has been widely used to delineate landslide areas because of its many built-in statistical methods. In recent decades, the use of quantitative methods for creating susceptibility maps, and of statistical models based on bivariate or multivariate techniques, has become popular (Nandi and Shakoor 2009; Bai et al. 2010; Wu and Song 2018).

To obtain a reliable evaluation, it is necessary to choose an appropriate methodology for analysis and modeling. Several studies have evaluated landslide susceptibility using GIS and logistic regression (LR). Nandi and Shakoor (2009) used bivariate and multivariate statistical analyses to predict the distribution of landslide areas, and GIS to evaluate the relationship between the events and the factors that contribute to their occurrence. Pradhan (2010) produced a landslide hazard map through the cross-validation of a multivariate logistic regression model with 10 input variables. Shirzadi et al. (2012) describe the application of logistic regression to rock-fall susceptibility; their results demonstrated that a logistic regression model within a GIS framework is useful and suitable for rock-fall susceptibility mapping and can be used to reduce susceptibility associated with rock-fall. Schlögel et al. (2018) used logistic regression techniques to evaluate the effect of the spatial resolution of the digital elevation model (DEM) on the quality of landslide susceptibility modeling. Despite their efficient results, these models require a wide range of data, which is difficult to obtain. The advantages of LR methods are the ease of dealing with categorical independent variables and of classifying individuals into categories. In addition, the results are expressed as probabilities and have a high degree of reliability.

The objective of this work was to develop and assess a landslide prediction model using geomorphological variables and logistic regression. GIS was used to map areas susceptible to landslides using only four predictive variables (slope, geology, pedology and land use and coverage), with the municipality of Novo Hamburgo, Rio Grande do Sul, Brazil, as a case study. The results of the developed model were assessed by comparison with those of a traditional landslide assessment model proposed by the Geological Survey of Brazil (CPRM).

2 Study area

The municipality of Novo Hamburgo is located at coordinates 29°47′ S and 51°13′ W, in the State of Rio Grande do Sul, southern Brazil. The number of inhabitants is approximately 238,940 and the territorial area is 224 km² (IBGE 2010) (Fig. 1).

Fig. 1 Location of the study area

The study area is part of the phytogeographic region classified as semideciduous Seasonal Forest, which nowadays only exists on the slopes of the Serra Geral, and the climate is subtropical, with four well-defined seasons. The average annual precipitation is 125.2 mm, with the rainiest months being September and June (Fig. 2).

Fig. 2 Average precipitation and temperature of the study area

The largest expansion of the urban area of the municipality, and the onset of the occupation of irregular areas, occurred between 1977 and 1997. The exhaustion of the territory suitable for urban occupation resulted in flood and landslide problems, as the low-income population moved into hazard areas. During the last 25 years, three landslide episodes occurred in inhabited areas: in 1993, the first landslide in the municipality was recorded, which occurred on the outskirts of the city and reached two residences; in 2002, another landslide partially destroyed two houses and caused the evacuation of the area, where there were approximately 80 houses; and in 2011, a landslide led to the deaths of three children (Riegel and Quevedo 2015).

3 Materials and methods

The susceptibility map proposed and validated by Riegel and Quevedo (2015) was used as the dependent variable for the logistic regression model (LRM). The method for identifying areas susceptible to landslides proposed by Riegel and Quevedo (2015) is based on the crossing of thematic maps, considering the use of weights and hierarchical values (Fig. 3a). The work is based on the methodology proposed by Ross (1994), which aims to attribute ranges of environmental fragility; the weights and values of importance were therefore proposed on the basis of a bibliographical review.

Fig. 3 (a) Susceptibility map proposed by Riegel and Quevedo (2015); (b) map reclassified as the dependent variable

The susceptibility map constructed by Riegel and Quevedo (2015) aims to identify landslide areas and used the following thematic maps: slope map, land use and coverage map, pedological map and map of geological aspects. The first procedure was the reclassification of the maps, based on pre-established weights, followed by the conversion of the files from vector to raster. The hierarchical importance of the thematic maps was then established and values of importance were assigned to each variable. The scores from 1 to 5 attributed to each of the variables were established by Riegel and Quevedo (2015) based on Ross (1994), Highland (2008) and Brazil (2012), where 1 corresponds to the smallest contribution to landslide susceptibility and 5 to the greatest (Table 1).

Table 1 Weights and importance values attributed to each variable due to susceptibility to landslides (Riegel and Quevedo 2015)

To adjust the susceptibility map for the construction of the LRM, the first step was to establish a binary formulation using the software ArcGIS®. High and very high susceptibility areas were assigned a value of 1, while areas of moderate, low and very low susceptibility were assigned a value of 0, as shown in Fig. 3b. This procedure separates the susceptible regions from the low-risk areas and yields the input map for the LRM formulation.
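The binary reclassification described above can be sketched with NumPy. This is only an illustration of the rule (high and very high become 1, everything else 0), not the ArcGIS workflow used in the study; the class codes 1 to 5, the nodata value and the function name `to_binary` are assumptions made for the example.

```python
import numpy as np

# Hypothetical 5-class susceptibility raster (1 = very low ... 5 = very high).
# The class codes and the nodata value are assumptions for illustration.
NODATA = -9999
susceptibility = np.array([
    [1, 2, 3, 4],
    [5, 4, 3, NODATA],
    [2, 5, 1, 3],
])


def to_binary(classes, high_classes=(4, 5), nodata=NODATA):
    """Map high/very high classes to 1 and all other valid classes to 0."""
    binary = np.isin(classes, high_classes).astype(np.int16)
    binary[classes == nodata] = nodata  # keep nodata cells unchanged
    return binary


binary_map = to_binary(susceptibility)
print(binary_map)
```

Keeping nodata cells separate from the 0 class avoids treating pixels outside the study area as non-susceptible observations.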

As predictive variables for the LRM process, the following thematic maps were used: slope, geological aspects, pedological aspects and land use and coverage. The slope map was extracted from the digital terrain model (DTM), obtained through the ASTER GDEM system, with a resolution of 30 m, using the ArcGIS software. The pedological and geological maps were extracted from the bases built by the RADAMBRASIL project (1983). Finally, the map of land use and coverage was taken from the databases provided by the MONALISA Project (2005), with an update of the urban area of 2009.

Figure 4 presents the thematic maps used as independent variables. The slope map (Fig. 4a) shows a variation from 0 to 52°. According to Bai et al. (2011), Ozdemi (2013), Paulín et al. (2014) and Meten et al. (2015), the slope is one of the determining factors in the occurrence of landslides.

Fig. 4 (a) Slope map; (b) map of geological aspects; (c) map of land use and coverage; (d) pedological map

Geology is one of the main factors influencing the occurrence of landslides, being used in the LRMs of Bai et al. (2011) and Paulín et al. (2014). The geological map (Fig. 4b) is composed of the following units: (1) Alluvial deposits, (2) Alluvial Caves, (3) Serra Geral Formation (Gramado Facies), (4) Botucatu Formation, (5) Pirambóia Formation and (6) Peat. The Colluvial and Alluvial Reservoirs, composed mainly of sands and silty-clayey sediments, were formed in the Holocene (Quaternary). The Serra Geral Formation is composed of basic effusive rocks (basalts) that originated in the Cretaceous. The Botucatu and Pirambóia Formations are composed of sandstones and date, respectively, to the Jurassic and Permian periods.

The map of land use and coverage (Fig. 4c) contains the following classes: (1) native forest; (2) native forest and rural anthropic environment; (3) secondary vegetation and rural anthropic environment; (4) forestry; (5) rural environment; (6) urban sprawl; (7) urban occupation; (8) wetland; (9) water body. Anthropogenic actions are directly linked to a variety of disasters, including landslides, given the changes they cause in the environment (Kjeldsen 2010; Bai et al. 2011).

Finally, the pedological map (Fig. 4d) contains the following classes: (1) Arenic Red Dystrophic Argisol, (2) Reddish Argisol Latossolic, (3) Typical Dystrophic Red Argisol, (4) Typical Orbicular Haplic Chernosol and (5) Hydromorphic Eutrophic Arenic Planosol. With the increase in landslides, pedological maps have provided pertinent information for identifying areas at risk (CPRM 2008).

After the compilation and organization of the data, the maps were converted to raster format with a spatial resolution of 30 m, making all the files compatible. The files were then exported from ArcGIS in the ASCII Raster format, which allows the data to be viewed in a spreadsheet layout in which each cell corresponds to a pixel in the study area. As input data for the LRM, 50% of the pixels of the study area were used, i.e., 123,308 pixels. The remaining pixels were subsequently used for calibration and validation of the model.
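As a rough sketch of this data preparation step, the snippet below reads an Esri ASCII grid into an array and splits the valid pixels into two halves. The grid contents are made up, the helper names are hypothetical, and the random split is an assumption: the text does not state how the 50% subset was selected.

```python
import io
import numpy as np

# A tiny Esri ASCII grid standing in for one exported layer (values are made up).
ASCII_GRID = """ncols 4
nrows 3
xllcorner 0.0
yllcorner 0.0
cellsize 30
NODATA_value -9999
1 2 3 4
5 4 3 -9999
2 5 1 3
"""


def read_ascii_grid(text, header_lines=6):
    """Return the grid values as a 2-D array, skipping the six header lines."""
    return np.loadtxt(io.StringIO(text), skiprows=header_lines)


def split_pixels(n_pixels, fraction=0.5, seed=0):
    """Randomly split pixel positions into a fitting subset and the remainder."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_pixels)
    cut = int(round(n_pixels * fraction))
    return order[:cut], order[cut:]


grid = read_ascii_grid(ASCII_GRID)
valid = np.flatnonzero(grid.ravel() != -9999)  # flat indices of valid pixels
fit_pos, holdout_pos = split_pixels(valid.size)
fit_pixels, holdout_pixels = valid[fit_pos], valid[holdout_pos]
print(grid.shape, valid.size, fit_pixels.size, holdout_pixels.size)
```

Because every layer shares the same 30 m grid, the same flat pixel indices can be used to pull the dependent variable and each predictor into one table with a row per pixel.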

The process proposed in this study combines LR with GIS, an approach that became more widely used after the 1990s and has become more frequent in recent years (Venticinque et al. 2007; Nandi and Shakoor 2009; Pradhan 2010; Schlögel et al. 2018). LR is a model that relates a set of n independent variables to a binary dependent variable Y that assumes only two possible states, 0 or 1. The logistic model allows the direct estimation of the probability of occurrence of an event (Y = 1):

$$P\left( {Y = 1} \right) = \frac{{\exp (\beta_{0} + \beta_{1} x_{1} + \cdots + \beta_{n} x_{n} )}}{{1 + \exp (\beta_{0} + \beta_{1} x_{1} + \cdots + \beta_{n} x_{n} )}}$$
(1)

in which βi (i = 0, 1, …, n) are the parameters of the model, estimated by the maximum likelihood method. Thus, the likelihood function will be given as:

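$$L\left( \beta \right) = \prod\limits_{i = 1}^{N} \pi_{i}^{y_{i}} \left( 1 - \pi_{i} \right)^{1 - y_{i}}$$
(2)

with yi ∈ {0, 1} and πi = P(Y = 1 | X = xi), where N is the number of observations (here, pixels) and xi is the vector of predictor values for observation i.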

The method consists of estimating the β that maximizes the log-likelihood function, i.e., the parameters that best reproduce the observed data. After the parameters were estimated, the fit of the model to the data was verified using the likelihood ratio test, which compares the joint probabilities of the samples to determine whether each variable should be included in the model (Owen 2001).
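To make the estimation step concrete, the following is a minimal sketch of maximum likelihood fitting by Newton-Raphson, followed by a likelihood-ratio statistic against an intercept-only model. It is not the SPSS procedure used in the study: the data are synthetic, and the function names, seed and coefficient values are assumptions chosen for illustration.

```python
import numpy as np


def log_likelihood(beta, X, y):
    """Log of the likelihood: sum of y*log(pi) + (1 - y)*log(1 - pi)."""
    z = X @ beta
    # log(pi) = z - log(1 + e^z) and log(1 - pi) = -log(1 + e^z)
    return float(np.sum(y * z - np.logaddexp(0.0, z)))


def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Newton-Raphson maximisation of the log-likelihood.

    X must already contain a leading column of ones for the intercept.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (y - pi)
        information = X.T @ (X * (pi * (1.0 - pi))[:, None])
        step = np.linalg.solve(information, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Synthetic data (not the study's pixels), generated from known coefficients.
rng = np.random.default_rng(42)
x = rng.normal(size=(2000, 2))
X = np.column_stack([np.ones(len(x)), x])
true_beta = np.array([-1.0, 0.8, -0.5])
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logistic(X, y)
ll_full = log_likelihood(beta_hat, X, y)
ll_null = log_likelihood(fit_logistic(X[:, :1], y), X[:, :1], y)
lr_statistic = 2.0 * (ll_full - ll_null)  # likelihood-ratio statistic
print(beta_hat, lr_statistic)
```

A large likelihood-ratio statistic indicates that the added predictors improve the fit over the intercept-only model, which is the basis for deciding whether a variable enters the model.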

The model was generated in the software SPSS®, which yielded an equation for the probability of landslide occurrence; this equation was applied and validated using all the pixels of the mapping (246,615 pixels). Then, in the ArcGIS® software, the map was spatialized using the “Raster Calculator” tool. The final result was also validated against the landslide susceptibility map constructed by CPRM (2015) using the receiver operating characteristic (ROC) curve (Peres and Cancelliere 2014).

In order to assess the accuracy of the model in predicting risk areas for landslides, a field study was carried out in the susceptible areas of the municipality of Novo Hamburgo, considering free access areas. In this field study, 29 points that already presented the occurrence of landslides or evidence of imminent landslides were selected, as exemplified in Fig. 5.

Fig. 5 (a) Landslide in an urban area located close to a school; (b) landslide in the backyard of a residence, located on the side of a road

4 Results and discussion

Multiple logistic regression was initially constructed based on four variables (Table 1) and on the classification shown in Fig. 3b, where high and very high-risk areas were assigned a value of 1. The likelihood-ratio test was used to determine which variables would be added to the model (Shirzadi et al. 2012). Among the variables analyzed (Fig. 4), the best prediction model was obtained by considering all four. The results presented significant values (χ² = 1276.552; p < 0.01) for the Hosmer and Lemeshow test (1989), indicating that the model is useful for analyzing the probability of landslide occurrence. The Nagelkerke R² was 0.430, that is, the model can explain 43% of the variation recorded in the dependent variable. The coefficients of the model and their significance are shown in Table 2:

Table 2 Coefficients used in LRM

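The coefficients are introduced in Eq. (3) in order to define the probability of landslides, where the subscripts PED, USO, GEO and DEC denote the pedological, land use and coverage, geological and slope variables, respectively.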

$$P\left( {Y = 1} \right) = \frac{{\exp \left( { - 4.129 + 0.089x_{{{\text{PED}}}} - 0.079x_{{{\text{USO}}}} + 0.105x_{{{\text{GEO}}}} + 0.133x_{{{\text{DEC}}}} } \right)}}{{1 + \exp \left( { - 4.129 + 0.089x_{{{\text{PED}}}} - 0.079x_{{{\text{USO}}}} + 0.105x_{{{\text{GEO}}}} + 0.133x_{{{\text{DEC}}}} } \right)}}$$
(3)
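As a worked illustration of Eq. (3), the sketch below evaluates the fitted probability for a single pixel. The coefficients are copied from the equation; the reading of the subscripts (PED pedology, USO land use, GEO geology, DEC slope), the function name and the input values are assumptions, since the text does not state how each predictor is coded when it enters the equation.

```python
import math

# Coefficients copied from Eq. (3).
COEFFS = {"intercept": -4.129, "PED": 0.089, "USO": -0.079, "GEO": 0.105, "DEC": 0.133}


def landslide_probability(ped, uso, geo, dec):
    """Evaluate Eq. (3) for one pixel."""
    z = (COEFFS["intercept"] + COEFFS["PED"] * ped + COEFFS["USO"] * uso
         + COEFFS["GEO"] * geo + COEFFS["DEC"] * dec)
    return 1.0 / (1.0 + math.exp(-z))  # same as exp(z) / (1 + exp(z))


# Hypothetical input codes, chosen only to show the arithmetic.
gentle = landslide_probability(ped=1, uso=1, geo=1, dec=5)
steep = landslide_probability(ped=1, uso=1, geo=1, dec=40)
print(round(gentle, 3), round(steep, 3))
```

With the other inputs held fixed, the positive slope coefficient means the probability rises as the slope value increases, while the negative land use coefficient lowers it.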

The four variables used to predict the susceptibility to landslides in the studied area were considered significant and included in the model. According to the coefficients B and the C.I. for EXP (B) presented in Table 2, the variable with the smallest contribution to the model, although significant, was land use and coverage. Shirzadi et al. (2012) eliminated the land use variable from their model because it was considered non-significant. Pradhan and Lee (2010) and Dou et al. (2015a) suggest that factors with very low or zero predictive power should be removed in order to reduce noise and uncertainty, improving the predictive capacity of the model. In this study, it was considered pertinent to keep the land use and coverage variable, given that most of the area susceptible to landslides is found in areas undergoing urbanization.

The LRM had a total success rate of 87.3%, i.e., a satisfactory classification ability (Table 3). Despite the reduced number of predictive variables, the model achieved a “total success rate” close to models that consider a much larger number of variables, such as the ones shown in studies conducted by Pradhan and Lee (2010), Bai et al. (2011), Shirzadi et al. (2012) and Dou et al. (2015a).

Table 3 Classification of LRM
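A classification table of this kind, and the overall success rate derived from it, can be computed as in the sketch below. The observations and fitted probabilities are made up, and the 0.5 cut-off is an assumption, as the text does not state the threshold used.

```python
import numpy as np


def classification_table(y_true, prob, cutoff=0.5):
    """Return a 2x2 table (rows: observed 0/1, columns: predicted 0/1)
    and the overall proportion of correctly classified cases."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = (np.asarray(prob) >= cutoff).astype(int)
    table = np.zeros((2, 2), dtype=int)
    np.add.at(table, (y_true, y_pred), 1)  # count each (observed, predicted) pair
    return table, np.trace(table) / table.sum()


# Made-up observations and fitted probabilities.
observed = [0, 0, 0, 1, 1, 0, 1, 0]
fitted = [0.10, 0.35, 0.62, 0.80, 0.45, 0.05, 0.91, 0.20]
table, success_rate = classification_table(observed, fitted)
print(table)
print(success_rate)
```

The diagonal of the table holds the correctly classified cases, so the success rate is the diagonal sum divided by the total number of cases.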

Figure 6a presents the map obtained by applying the equation derived from the LRM to the thematic maps for all pixels in the study area. The initial continuous result was subdivided into three classes: High (> 40%), Moderate (20–40%) and Low (< 20%). The classes were established in order to approximate the LRM to the classification method of Riegel and Quevedo (2015) and to allow comparison with the CPRM susceptibility map, since these methods do not present results in terms of probability but in classes of levels of occurrence. The maps used for calibration and validation of the model are shown, respectively, in Fig. 6b and c: the Riegel and Quevedo (2015) susceptibility map and the CPRM (2015) susceptibility map. The three maps agree on the location of the susceptible areas, concentrated to the north in the urban region and at the southern and southeastern ends of the rural environment. All three models identified these regions as susceptible to landslides; however, the differences between them are visible, especially in areas of moderate susceptibility, where a very large variation is perceived.
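The three-class subdivision can be expressed as a simple thresholding rule, sketched below. The thresholds come from the text; the treatment of values exactly equal to 20% or 40% is an assumption, because the text does not say which class the boundaries belong to.

```python
import numpy as np

CLASS_NAMES = np.array(["Low", "Moderate", "High"])


def classify_probability(prob):
    """Low (< 20%), Moderate (20-40%), High (> 40%)."""
    prob = np.asarray(prob, dtype=float)
    # Boundary handling is an assumption: exactly 0.20 and 0.40 fall in Moderate.
    codes = np.where(prob > 0.40, 2, np.where(prob >= 0.20, 1, 0))
    return CLASS_NAMES[codes]


classes = classify_probability([0.05, 0.20, 0.33, 0.40, 0.75])
print(classes)
```

The same function works on a full probability raster, since the comparisons are applied element by element.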

Fig. 6 (a) Map of susceptibility to mass movements (LRM); (b) Riegel and Quevedo (2015) susceptibility map; (c) CPRM (2015) susceptibility chart

The LRM was validated by comparison with the susceptibility map proposed by the CPRM (2015). Specifically, the results were examined quantitatively using the receiver operating characteristic (ROC) curve. To examine the reliability and efficiency of the landslide probability model, the area under the curve (AUC), which represents the accuracy of the model, was calculated. The AUC expresses how well the method and factors classify landslides, and a total area of 1 denotes perfect prediction accuracy (Shirzadi et al. 2012; Dou et al. 2015b). The prediction curve for the model is presented in Fig. 7; the AUC value is 0.825 (Table 4), which corresponds to a prediction accuracy of 82.5%. The AUC value obtained by the LRM is considered excellent according to Hosmer and Lemeshow (1989).
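For readers who want to reproduce this kind of check, the AUC can be computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which equals the area under the ROC curve. The sketch below uses made-up reference labels and probabilities, and the function name is hypothetical; it is not the procedure or data used in the study.

```python
import numpy as np


def roc_auc(y_true, score):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half); equal to the area under the ROC curve."""
    y_true = np.asarray(y_true, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y_true], score[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Made-up reference classes (1 = susceptible) and model probabilities.
reference = [0, 0, 1, 1, 0, 1, 0, 1]
probability = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.60]
auc = roc_auc(reference, probability)
print(auc)
```

This pairwise form compares every positive with every negative, so it suits small samples; for a full raster, a rank-based or library implementation would be more memory-efficient.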

Table 4 Area under the ROC curve (AUC)—accuracy of the mapping of susceptibility to mass movements

Fig. 7 ROC curve: accuracy of mapping susceptibility to mass movements

To validate the LRM, 29 points were geo-referenced at which landslides had already occurred or, at least, there was evidence of imminent landslides. Figure 8 shows an enlargement of the areas where the fieldwork was carried out, with the 29 selected points marked. The LRM correctly classified 86% of the points identified in the fieldwork as having a high or moderate probability of landslide occurrence. For the map presented by CPRM, the percentage of correct classification falls to 79%, with many points being classified as having a moderate probability of landslide.

Fig. 8 (a) Enlargement of the landslide susceptibility map (LRM) with the 29 validation points; (b) CPRM (2015) susceptibility map with the 29 validation points

The LRM performed well in the classification of areas susceptible to landslides, even though it was estimated with a small number of predictive variables. According to Chang et al. (2019), quantitative methods such as logistic regression are useful for problem-solving and have been used successfully in risk assessment applications. It was also found that using the Riegel and Quevedo (2015) method as the information base to generate the landslide (1) and non-landslide (0) points did not compromise the predictive potential of the LRM. The model showed satisfactory levels of both accuracy and classification, equivalent to those of other studies that use LR and are based on the compilation of landslide inventories, such as Pradhan and Lee (2010), Bai et al. (2011), Shirzadi et al. (2012) and Schlögel et al. (2018) (Fig. 8).

LR is the most popular susceptibility model; it is simple, straightforward, highly interpretable and produces an accurate and robust prediction (Chang et al. 2019). Thus, the LRM can be considered an efficient tool to support public managers in reducing the impacts of natural disasters. Planning should not allow the occupation of places unfit for it. However, the current reality of the analyzed municipality shows that the most worrying region is located above RS 239, an area already occupied by low-income housing. In addition to being characterized as susceptible, these regions are also considered Environmental Protection Areas, which are subject to occupation restrictions, in the form of low occupancy and utilization rates, imposed by the Municipal Master Plan (Novo Hamburgo 2004, 2010). In some parts of this region, there are also Permanent Preservation Areas, whose occupation is not allowed (Brazil 2012). However, relocating the population that currently occupies these areas to suitable environments is almost unviable in a territory such as the municipality of Novo Hamburgo, which has undergone several landslide processes in irregularly occupied areas. Thus, the first step would be to include the susceptibility patches in the master plan and to prohibit new occupations entirely, in order to reduce the impact on these areas with steep hillslopes. In addition, because the dwellings are irregular, frequent inspection is the most appropriate method to inhibit occupation and to avoid harm to society.

5 Final considerations

Over the years, the frequency of natural disasters has become a concern. Because landslides have very specific characteristics and sometimes cause fatalities, planning based on preventive strategies that help reduce their negative effects is important. The use of GIS combined with LR was efficient in the analysis of landslide susceptibility, resulting in a fast and practical method that uses few variables, reduces the need for data collection, has a low computational cost and at the same time provides adequate coefficients for defining areas with a greater or lesser degree of risk.

The simplicity of the method, together with input data that do not require an extensive field assessment, yields a result that can be used as a preliminary susceptibility analysis. It may indicate the regions where landslides are most probable, guiding detailed field investigation and the survey of variables to be used in more robust models. The method can therefore be replicated in other regions and presents consistent results for inclusion in zoning and occupation plans.