Introduction

Landslides are defined as all downslope movements of Earth materials; they include earth slides, rock falls, topples, and debris flows (Varnes 1984). Landslides have caused considerable loss of life and damage to property. Billions of dollars are lost every year throughout the world as a direct consequence of landslides (Brabb 1984). In terms of human lives, they have claimed as many as 100,000 lives in China (Li and Wang 1992), and as many as 254 people died as a consequence of the February 17, 2017 mudslide in Colombia. Bell (1999) stated that, for the same degree of hazard, losses are of a higher order of magnitude in developing countries than in developed ones. In developed countries, adequate mitigation measures are very often implemented, while in developing countries, the lack of resources prevents adequate spending on landslide mitigation and research.

Mitigating landslide hazard and accounting for it in local or regional planning require a deep understanding of the factors governing the process. Landslide occurrence results from the simultaneous interplay between a set of conditioning factors, or quasi-static parameters (Carrara et al. 1990 and 1991), and triggering factors. The quasi-static parameters, such as geology, structure, mineralogy, slope angle, slope aspect, drainage network, lineaments, land use, land cover, and relief, usually predispose a given area to landslides. In fact, the spatial distribution of these independent parameters defines the susceptibility to landslides in an area (Carrara et al. 1990 and 1991).

In landslide susceptibility assessment, it is customary to apply the principle of uniformitarianism, which states that, in the realm of landslides, present and future landslides occur under the same conditions that led to previous slides (Brabb 1984). Under this principle, landslide susceptibility can be assessed using a heuristic approach, a statistical approach, or a deterministic mechanical approach.

For regional studies, the first two approaches are often used, while at comparatively larger scales, the deterministic approach is more appropriate.

Several statistical modeling techniques have been used to study the landslide susceptibility of every pixel of an area. These techniques include weights of evidence (Bonham-Carter 1994), frequency ratio (Lee and Min 2001), analytical hierarchy process (Saaty 1980), logistic regression (McFadden 1973), and many more. These methods have been successfully applied to landslide susceptibility assessment by several researchers, such as Lee and Sambath (2006), Pradhan and Lee (2010), Pradhan et al. (2010), Chen et al. (2016), Teerarungsigul et al. (2016), Wang et al. (2016), Le et al. (2017), Aditian et al. (2018), and Jacobs et al. (2018), using GIS software to handle the geospatial database.

The present research work deals with the assessment of landslide susceptibility in the Souk Ahras area. It aims to assess landslide susceptibility on a pixel-based mapping unit using three statistical modeling tools, namely logistic regression, frequency ratio, and weights of evidence. The performance and the prediction accuracy of each of the three models are evaluated in a part of the study area reserved for this purpose. The model that gives the best prediction rate among the three can then be used to predict potential landslide locations in neighboring areas. Furthermore, the resulting susceptibility map will be a valuable document for city planning, infrastructure construction, and agricultural development in the region.

General setting

The study area covers 344.78 km² between latitudes 36° 7′ 12.53″ and 36° 17′ 59.76″ N and longitudes 7° 37′ 22.68″ and 7° 57′ 6.31″ E and includes two municipalities belonging to the administrative district of Souk Ahras (Fig. 1). It is characterized by hilly terrain that reaches a maximum altitude of 1286 m and by scattered settlements that are sometimes located on steep slopes. The inter-annual variations in rainfall over the period 1986–2015 show that 2009 was the wettest year, with 1180 mm/year, and 1993 the driest, with 391.3 mm/year. The geological setting is typical of the upstream Medjerda basin, which originated from the evolution of the Tellian external zones (Vila 1980). Being part of this complex domain, the study area is disturbed by rising Triassic diapirs and by the effects of neotectonic events that modified the original sedimentary setup (Chabbi et al. 2016). The lithostratigraphic succession consists of marls, clayey marls, and limestone of the Upper Cretaceous; clay marls, yellow limestone, and marls of the Paleogene; conglomerates, sandstones, clays, and marls of the Neogene; and slope scree, gravel, sand, silt, and the superficial deposits of the Quaternary (Mahtali 2009). Hadji et al. (2014) have shown that the fissured marls of the Souk Ahras region play a fundamental role in the development of slope failure processes.

Fig. 1

Geographical location of the study area, shown on the digital elevation model

Materials and methods

Slope instability phenomena result from the interplay of a variety of factors, involving the geological, geomorphological, and hydroclimatological characteristics of the terrain and human-related activities. Consequently, a large amount of spatial data has to be analyzed to predict the stability of the slopes and hillsides within the area. In this study, a landslide database comprising ten factors, namely lithology (rock type), slope angle (°), slope aspect (°), profile curvature, plan curvature, distance to river (m), proximity to road (m), distance to faults (m), elevation (m), and precipitation (mm), was created. Each of these attributes is represented as a thematic layer within the GIS software.

Landslide susceptibility assessment in the area was carried out in three main stages. The first was the preparation of a landslide inventory map of the study area from visual interpretation of satellite images, dichotomous images, and field surveys. The second was the thematic mapping of the ten factors contributing directly or indirectly to the occurrence of landslides. The third was data handling and the calculation of the susceptibility value of each pixel within the study area by applying statistical methods.

All landslide conditioning factors were managed on a GIS platform using ArcGIS software, whereas the statistical procedure was carried out using the SPSS statistical package.

Inventory map

The landslide inventory map (Fig. 2) provides the spatial distribution of existing landslides. It is the first and most important thematic layer in the landslide susceptibility assessment procedure. It was compiled from multisource data: visual interpretation of Landsat 8 images, which are considered a tool for timely and large-scale monitoring of changes in land use (Chen et al. 2013), previous works dealing with landslides in the region, and direct mapping in the field. Photographs of a few significant landslides that occurred in the Hanancha and Tiffech regions are shown in Fig. 2b–e.

Fig. 2

Landslide inventory. a Landslide inventory map including landslide training and validation data sets. b Landslide at the Tiffech entryway, RN 81 at Pk88+00 (36° 13′ 7.5″ N, 7° 51′ 33.91″ E, alt. 700 m). c, d Two landslides at the Hanancha entryway, RN 81B at Pk03+00 (36° 15′ 48.32″ N, 7° 52′ 10.92″ E, alt. 720 m). e Landslide on the northern flank of the Ain Dalia dam (36° 16′ 33.21″ N, 7° 49′ 34.16″ E, alt. 680 m)

This map includes 301 landslide polygons distributed over 13,451 pixels. The pixel size of the landslide raster is 30 m × 30 m. The landslide inventory map was randomly partitioned into two data sets: a training data set with 10,760 pixels (80%) for building the landslide susceptibility models and a validation data set with 2691 pixels (20%) for validating the performance of these models. The distribution of the entire landslide data set, the training data set, and the validation data set for each parameter class is shown in Fig. 13a–j.
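The random 80/20 partition described above can be sketched as follows. This is only an illustration: the source does not state how the split was drawn in the GIS software, so the function name `split_landslide_pixels`, the pixel-level shuffling, and the seed are assumptions.

```python
import numpy as np

def split_landslide_pixels(n_pixels, train_fraction=0.8, seed=0):
    """Randomly partition pixel indices into training and validation sets.

    Hypothetical helper: the split is done per pixel, which is an assumption.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_pixels)       # random order of pixel indices
    n_train = int(n_pixels * train_fraction)   # 80% rounded down
    return shuffled[:n_train], shuffled[n_train:]

# 13,451 landslide pixels, as reported for the inventory map
train_idx, valid_idx = split_landslide_pixels(13451)
print(len(train_idx), len(valid_idx))  # 10760 2691
```

With 13,451 pixels, rounding the 80% share down reproduces the 10,760 and 2691 pixel counts reported in the text.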

Landslide conditioning factors

To build the landslide susceptibility model, all the preparatory and triggering factors for landslide occurrence in the study area need to be analyzed under the assumption that the past and present are keys to the future (Varnes 1984). The database of predisposing factors was built from a Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) with 30 m spatial resolution obtained from the United States Geological Survey (USGS), orthophotos, geological maps, and precipitation data (29 years of measurements). All the landslide conditioning factors were divided into several classes based on frequency analysis of landslide occurrences in the study area, which covers a total of 383,081 pixels (30 × 30 m) within a grid of 679 rows and 988 columns.

Geology plays an important role in landslide incidence, as different geological units have different susceptibilities to active geomorphological processes (Lee and Talib 2005; Yesilnacar and Topal 2005; Lee and Evangelista 2006; García-Rodríguez and Malpica 2010). The geological features were digitized from the four 1:50,000 geological maps covering the study area (Souk Ahras, M’Daourouche, Sedrata, and Abdi) (Fig. 3). The 22 lithological units occupying the study area have been grouped into eight categories to simplify their management (Table 1).

Fig. 3

Lithological map: 1, fluvial alluvium; 2, gravitational formation; 3, alluvium; 4, Diluvian formation; 5, limestone; 6, red clays, silts; 7, conglomerates, sandstones, and clay; 8, conglomerates, gravelites, sandstones, clays, and marls; 9, siltstones, clayey marl, sandstone; 10, sandstones, marl, conglomerates; 11, quartz sandstone, gray clays; 12, bituminous black and brown limestone with Globigerina, black marl limestone; 13, black clay marl; 14, marl with rare intercalations of limestone; 15, limestone with rare intercalations of marls; 16, limestone with Inoceramus and marl with Globotruncana; 17, marl clay, gray marl limestone; 18, limestone, limestone and sandstone, calcareous marl; 19, marls and gray marl clay and past marl-limestone; 20, clay and gypsum-sandstone; 21, dolomite; 22, gray limestone, marl, dolomite. (Digitized from the Souk Ahras, Sedrata, Abdi, and M’Daourouche 1:50,000 geological maps)

Table 1 Classification of lithological units in the study area

Tectonic features play a significant role in landslide occurrence. Rocks adjacent to fault zones are usually heavily fractured and weathered, which produces favorable geological conditions for landslides (Fig. 4). About 74.6% of all the landslide events in the study area occurred within 500 m of a fault. To assess the relationship between lineaments and landslide initiation, buffer zones were drawn on both sides of the existing faults (< 50, 50–200, 200–500, 500–1000, and > 1000 m).
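As a rough illustration of how such distance classes can be assigned once a distance raster exists, the sketch below bins distances with `numpy.digitize`. The class limits come from the text; the function name `classify_distance`, the sample distances, and the convention that a boundary value (for example exactly 50 m) falls in the upper class are assumptions.

```python
import numpy as np

# Class limits in metres from the text: <50, 50-200, 200-500, 500-1000, >1000
FAULT_BREAKS_M = [50, 200, 500, 1000]

def classify_distance(distance_m, breaks):
    """Return a 1-based buffer class code for each distance value."""
    # digitize returns 0 for values below the first break, so shift by one
    return np.digitize(distance_m, breaks) + 1

# Hypothetical distances to the nearest fault, in metres
example = np.array([10.0, 120.0, 499.0, 750.0, 2500.0])
classes = classify_distance(example, FAULT_BREAKS_M)
print(classes.tolist())  # [1, 2, 3, 4, 5]
```

The same helper would apply to the road and river buffers by swapping in their class limits.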

Fig. 4

Distance to fault map of the study area

Road construction is one of the most important anthropogenic factors triggering slope instabilities. To account for the influence of proximity to roads on landslide occurrence, buffer zones were created on both sides of the roads. Using multiple buffer analysis in GIS, the study area was divided into six classes of distance to roads: < 50, 50–150, 150–250, 250–500, 500–1000, and > 1000 m (Fig. 5).

Fig. 5

Distance to roads map of the study area

Rainfall is one of the main triggering factors for landslides, particularly, in mountainous areas. In general, water in pore spaces and cracks causes an increase in hydrostatic pressures and subsequently a reduction of the shear strength. The mean annual precipitation map (1986–2015) (Fig. 6) has been interpolated from precipitation data of six meteorological stations located inside and in the vicinity of the study area.

Fig. 6

Rainfall map of the study area

Slope angle is the principal conditioning factor in landslide incidence and is frequently used by researchers in landslide susceptibility mapping (Bui et al. 2012; Nourani et al. 2014). In general, as the slope angle increases, the destabilizing component of the weight of the land mass increases while the normal component decreases. Consequently, the resisting force decreases, which leads directly to instability when the critical angle is reached. The slope map of the study area was derived from the digital elevation model (Fig. 1). It was classified into six classes with an interval of 5° (Fig. 7). The dominant terrain units in the study area (70%) have slope angles between 5° and 20°.
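A minimal sketch of deriving a slope map from a DEM grid and binning it at 5° intervals is given below. It uses simple finite differences from `numpy.gradient`, which is an assumption and may differ from the algorithm used by the GIS software; the function names and the class limits (0–5°, …, 20–25°, > 25°) are likewise illustrative.

```python
import numpy as np

def slope_degrees(dem, cell_size=30.0):
    """Slope angle in degrees from a DEM, using finite differences."""
    dz_dy, dz_dx = np.gradient(dem, cell_size)  # elevation change per metre
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

def classify_slope(slope_deg, interval=5.0, n_classes=6):
    """Assign 1-based classes of fixed width; the last class is open-ended."""
    classes = np.floor(slope_deg / interval).astype(int) + 1
    return np.clip(classes, 1, n_classes)

# Hypothetical DEM: a plane rising 30 m per 30 m cell, i.e. a 45 degree slope
dem = 30.0 * np.tile(np.arange(5, dtype=float), (5, 1))
slope = slope_degrees(dem)
slope_class = classify_slope(slope)
```

On this synthetic plane every cell has a slope of 45° and therefore falls in the steepest class.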

Fig. 7

Slope angle map of the study area

Elevation is one of the factors influencing slope stability (Regmi et al. 2010). It varies from 512 to 1287 m and decreases from the northeast to the southwest. The elevation values were divided into eight classes, namely 521–600, 600–700, 700–800, 800–900, 900–1000, 1000–1100, 1100–1200, and 1200–1287 m (Fig. 8).

Fig. 8

Elevation map of the study area

The slope aspect map (Fig. 9) of the study area is also derived from the DEM. It gives information on the exposure of the slope relative to north. Less sunny slopes are less exposed to evaporation and therefore retain more moisture, which contributes to reducing soil shear resistance. Consequently, the materials covering the slope become more susceptible to sliding (García-Rodríguez and Malpica 2010). The aspect values indicate the direction that each cell’s slope faces: north, northeast, east, southeast, south, southwest, west, northwest, or flat land (Avtar et al. 2011).

Fig. 9

Slope aspect map of the study area

Plan curvature differentiates between the concavity and the convexity of slopes. Positive values indicate that the surface is upwardly convex in that cell, and negative values show that the surface is upwardly concave. A zero value represents a flat surface (Lee and Evangelista 2006). In this study, the plan curvature has been divided into three classes, namely convergent, flat, and divergent (Fig. 10).

Fig. 10

Plan curvature map of the study area

The profile curvature indicates the curvature of the surface in the direction of slope (Wilson and Gallant 2000). It affects the flow velocity of water draining the surface and influences erosion and deposition. The profile curvature was reclassified into three classes, namely concave (−), flat (0), and convex (+) (Fig. 11). Erosion prevails on convex profiles, while deposition occurs on concave ones.

Fig. 11

Profile curvature map of the study area

The hydrographic network influences the distribution of unstable areas by creating and maintaining fresh slopes: erosion by ravines at the break of the slopes causes soil movements. Hence, buffer zones need to be defined by measuring the distance separating the drainage line from the vulnerable zone. Multiple buffers were applied to categorize this factor into six classes, namely < 50, 50–100, 100–150, 150–200, 200–250, and > 250 m (Fig. 12).

Fig. 12

Distance to river map of the study area

Landslide susceptibility assessments

The frequency ratio model

The frequency ratio method (Lee and Min 2001) is one of the simplest probabilistic models based on the spatial relationships between the distribution of landslides and landslide conditioning factors (Youssef et al. 2015; Chen et al. 2016; Aditian et al. 2018) (Fig. 13). The frequency ratio (FR) of a particular parameter class is defined as the ratio of the percentage of landslide area falling within the class to the percentage of the total area occupied by that class. The frequency ratios of the different parameter classes are given in Table 2.

Fig. 13

The relationship between landslide occurrence (entire landslide data set, training landslide data set, and validation landslide data set, in percentage) and each factor

Table 2 Spatial relationships between landslide conditioning factors and landslide events using frequency ratio and logistic regression models

An FR value less than 1 indicates a low probability of landslide occurrence, while an FR value greater than 1 indicates a greater susceptibility to the phenomenon.

In order to combine all the weight values of the n parameters, an overall landslide susceptibility index (LSI) is calculated by summing all the weights of the parameters using the following formula (Lee and Talib 2005):

$$ \mathrm{LSI}={\mathrm{FR}}_1+{\mathrm{FR}}_2+\dots +{\mathrm{FR}}_n $$
(1)

The LSI values were mapped on a landslide susceptibility map (Fig. 14a).
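The FR definition and the summation in Eq. (1) can be illustrated with a small sketch. The function names `frequency_ratio` and `susceptibility_index` and the tiny example rasters are hypothetical; the study itself performed these steps in GIS software.

```python
import numpy as np

def frequency_ratio(class_map, landslide_mask):
    """Return {class code: FR} for one classified factor raster."""
    class_map = np.asarray(class_map)
    slides = np.asarray(landslide_mask, dtype=bool)
    ratios = {}
    for c in np.unique(class_map):
        in_class = class_map == c
        pct_landslide = (slides & in_class).sum() / slides.sum()  # share of landslide pixels
        pct_area = in_class.sum() / class_map.size                # share of total area
        ratios[int(c)] = float(pct_landslide / pct_area)
    return ratios

def susceptibility_index(class_maps, landslide_mask):
    """Eq. (1): sum the FR value of each factor's class at every pixel."""
    lsi = np.zeros(np.shape(landslide_mask), dtype=float)
    for class_map in class_maps:
        class_map = np.asarray(class_map)
        for c, fr in frequency_ratio(class_map, landslide_mask).items():
            lsi[class_map == c] += fr
    return lsi

# Hypothetical 2 x 4 raster with two classes of a single factor
factor = np.array([[1, 1, 2, 2], [1, 1, 2, 2]])
slides = np.array([[True, False, True, True], [False, False, True, False]])
fr = frequency_ratio(factor, slides)          # class 1 -> 0.5, class 2 -> 1.5
lsi = susceptibility_index([factor], slides)
```

In this toy example, class 2 holds three of the four landslide pixels on half of the area, so its FR exceeds 1, while class 1 falls below 1.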

Fig. 14

Landslide susceptibility maps. a Using frequency ratio model. b Using weights of evidence model. c Using logistic regression model

Weights of evidence model

The weights of evidence (WoE) method is a Bayesian probability model in log-linear form that uses prior and posterior probabilities; it is applied where sufficient data are available to estimate the relative importance of evidence by statistical means (Bonham-Carter 1994).

This method was originally developed for a nonspatial application in medical diagnosis (Spiegelhalter and Knill-Jones 1984). Afterwards, it was applied to mineral potential mapping with GIS (Bonham-Carter et al. 1989; Bonham-Carter 1994), and it has since been used to assess landslide susceptibility (Neuhäuser and Terhorst 2007; Dahal et al. 2008; Regmi et al. 2010; Khosravi et al. 2016; Teerarungsigul et al. 2016). A detailed description of the mathematical formulation of the method is available in Bonham-Carter (1994), Bonham-Carter et al. (1989), Dahal et al. (2008), and Pradhan et al. (2010).

In the present study, we have used weights of evidence modeling for landslide susceptibility evaluation and mapping. According to the method, positive and negative weights (W+ and W−) are assigned to each landslide causative factor (A) based on the presence or absence of landslides (U) within the area (Eqs. 2 and 3). Hence, this method uses the landslide inventory data to weight the factors (Bonham-Carter et al. 1989).

$$ {W}^{+}=\ln \frac{P\left\{A\mid U\right\}}{P\left\{A\mid \overline{U}\right\}} $$
(2)
$$ {W}^{-}=\ln \frac{P\left\{\overline{A}\mid U\right\}}{P\left\{\overline{A}\mid \overline{U}\right\}} $$
(3)

where W+ and W− are the weights for the presence or absence of landslides within a certain class of a causative factor map, P is the probability, and ln is the natural logarithm. A is the presence of a potential landslide causative factor, Ā is the absence of a potential landslide causative factor, U is the presence of a landslide, and Ū is the absence of a landslide.

The weights measure the correlation between evidence (predictive variable) and event (response variable). The weights for a binary predictor factor are defined (Pradhan et al. 2010) as follows: a positive weight (W+) indicates that the causative factor is present at the landslide location, and its magnitude indicates the positive correlation between the presence of the causative factor and landslides. A negative weight (W−) indicates the absence of the causative factor and shows the level of negative correlation (Dahal et al. 2008; Xu et al. 2012).

The difference between the two weights is known as the weight contrast or the final weight (Dahal et al. 2008), which is expressed as follows (Xu et al. 2012):

$$ C={W}^{+}-{W}^{-} $$
(4)

The magnitude of the contrast reflects the correlation between a causative factor class and the occurrence of landslides. For a positive spatial association, the value of C is positive, and when a spatial association is lacking, the value is negative (Kayastha et al. 2012). The standard deviation of C is calculated as follows:

$$ S(C)=\sqrt{S^2\left({W}^{+}\right)+{S}^2\left({W}^{-}\right)} $$
(5)

where S²(W+) is the variance of the positive weights and S²(W−) is the variance of the negative weights. The standardized contrast C/S(C) gives a measure of confidence (Neuhäuser and Terhorst 2007). The results are given in Table 3. All negative and/or decimal contrast values (C) are transformed into integers by addition and multiplication in the statistical package so that they can be used to produce the landslide susceptibility map of the study area.
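Equations (2) to (5) can be sketched for a single factor class from pixel counts, as below. The variance terms use the reciprocal-count approximation commonly attributed to Bonham-Carter (1994); that choice, the function name `weights_of_evidence`, and the example rasters are assumptions, not details taken from this study.

```python
import math
import numpy as np

def weights_of_evidence(class_mask, landslide_mask):
    """Return (W+, W-, C, S(C)) for one binary factor class."""
    a = np.asarray(class_mask, dtype=bool)      # A: factor class present
    u = np.asarray(landslide_mask, dtype=bool)  # U: landslide present
    n_au = int(np.sum(a & u))                   # class and landslide
    n_a_notu = int(np.sum(a & ~u))              # class, no landslide
    n_nota_u = int(np.sum(~a & u))              # no class, landslide
    n_nota_notu = int(np.sum(~a & ~u))          # neither
    n_u = n_au + n_nota_u
    n_notu = n_a_notu + n_nota_notu
    # Any zero count raises an error here; real data need explicit handling
    w_plus = math.log((n_au / n_u) / (n_a_notu / n_notu))          # Eq. (2)
    w_minus = math.log((n_nota_u / n_u) / (n_nota_notu / n_notu))  # Eq. (3)
    contrast = w_plus - w_minus                                    # Eq. (4)
    # Assumed approximation: S^2(W) is the sum of reciprocal pixel counts
    s_c = math.sqrt(1 / n_au + 1 / n_a_notu + 1 / n_nota_u + 1 / n_nota_notu)
    return w_plus, w_minus, contrast, s_c

# Hypothetical 2 x 4 example: the class covers the two right-hand columns
in_class = np.array([[False, False, True, True], [False, False, True, True]])
slides = np.array([[True, False, True, True], [False, False, True, False]])
w_plus, w_minus, contrast, s_c = weights_of_evidence(in_class, slides)
studentized = contrast / s_c  # standardized contrast C/S(C)
```

Here three of the four landslide pixels lie inside the class, so W+ equals ln 3, W− equals −ln 3, and the contrast is positive.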

Table 3 Spatial relationship between landslide conditioning factors and landslide events using weights of evidence model

Logistic regression model

Logistic regression, one of the multivariate analysis methods developed by McFadden (1973), forms a multivariate regression relation between a dependent variable and several independent variables (Lee 2005) by means of a nonlinear relationship (Yesilnacar and Topal 2005). Detailed descriptions of the logistic regression technique can be found in the literature (Hosmer and Lemeshow 2000).

In the logistic regression approach, the dependent variable is dichotomous and the independent variables can be either continuous or discrete or any combination of both types and they do not necessarily have normal distributions (Lee and Sambath 2006; Bai et al. 2010; Jacobs et al. 2018).

In this paper, we have used the LR method for landslide susceptibility mapping in order to find the best-fitting model describing the relationship between the dependent variable, a binary variable representing the presence or absence of landslides (1 or 0), and the ten independent parameters. The logistic regression model applies maximum likelihood estimation after transforming the dependent variable into a logit variable (Bai et al. 2010). The logistic model can be expressed as follows:

$$ P=\frac{1}{1+{e}^{-Z}} $$
(6)

where P is the probability of landslide occurrence, which varies from 0 to 1 along an S-shaped curve, and Z is the linear combination of the predictive variables, which varies from −∞ to +∞. It is defined as follows:

$$ Z={B}_0+{B}_1{X}_1+{B}_2{X}_2+\dots +{B}_n{X}_n $$
(7)

where B0 is the intercept of the model, B1, B2, …, Bn are the coefficients of the LR model, and X1, X2, …, Xn are the independent variables.
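Equations (6) and (7) can be evaluated for a single pixel as in the sketch below. The function name `landslide_probability` and the numerical inputs are hypothetical and are not the fitted coefficients of this study.

```python
import math

def landslide_probability(intercept, coefficients, values):
    """Logistic model: P = 1 / (1 + exp(-Z)), with Z = B0 + sum(Bi * Xi)."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))  # Eq. (7)
    return 1.0 / (1.0 + math.exp(-z))                                 # Eq. (6)

# Hypothetical two-factor example
p_neutral = landslide_probability(0.0, [0.5, -0.2], [0.0, 0.0])  # Z = 0 gives P = 0.5
p_high = landslide_probability(-1.0, [0.5, -0.2], [6.0, 1.0])    # Z = 1.8
```

A value of Z equal to zero corresponds to a probability of 0.5, and increasingly positive values of Z push P towards 1.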

For the analysis, we have produced a map showing the area affected by landslides, with a total of 10,761 landslide pixels and an equal number of non-landslide pixels randomly chosen from the landslide-free area, to represent the dependent variable (1 for landslide presence, 0 for landslide absence). The spatial database of each factor was converted into raster format with a pixel resolution of 30 × 30 m. Nominal parameters were converted to numeric ones by creating dummy variables for all the categories of each independent variable. The landslide raster and the factor maps were converted into dbf format for use in SPSS Version 20 statistical software, and the correlations between the landslide event and each factor were calculated in order to produce a landslide susceptibility map. Using Eq. (7) and the coefficients shown in Table 2, the final equation predicting landslide occurrence is obtained as follows:

$$ Z=-0.2694042891986+\left(0.073089321975\times \mathrm{elevation}\right)+\left(0.051630073777\times \mathrm{slope}\ \mathrm{aspect}\right)+\left(-0.09979018356\times \mathrm{distance}\ \mathrm{to}\ \mathrm{faults}\right)+\left(-0.00305615436\times \mathrm{distance}\ \mathrm{to}\ \mathrm{river}\right)+\left(-0.035805899229\times \mathrm{lithology}\right)+\left(0.011739997814\times \mathrm{plan}\ \mathrm{curvature}\right)+\left(-0.13323703649\times \mathrm{precipitation}\right)+\left(0.112478991710\times \mathrm{profile}\ \mathrm{curvature}\right)+\left(-0.11167225408\times \mathrm{proximity}\ \mathrm{to}\ \mathrm{road}\right)+\left(0.241084809732\times \mathrm{slope}\ \mathrm{angle}\right). $$

The three susceptibility maps based on FR, WoE, and LR statistical models have been divided into five classes as very low, low, moderate, high, and very high, using the natural break method (Fig. 14a–c). The susceptibility class and the percentage of landslide area in each class are shown in Table 4 and Fig. 15.

Table 4 The five susceptibility zones
Fig. 15

Histogram of landslide area and susceptibility class generated by the three models (LR, FR, and WoE)

Validations of landslide susceptibility maps

Prediction models have no scientific significance without validation (Bui et al. 2012). In this paper, the landslide susceptibility models are validated using the area under the curve (AUC) method. The landslide susceptibility maps constructed by the FR, WoE, and LR models were compared with both the training and validation data sets. The AUC values obtained represent, respectively, the success rate and the prediction rate of the models. The success rate describes how well a model fits past events, and the prediction rate describes how well it predicts the occurrence of future landslide events. The receiver operating characteristic (ROC) curves were plotted using the cumulative percentage of decreasing susceptibility index on the horizontal axis and the cumulative percentage of observed landslide occurrence on the vertical axis (Fig. 16a, b). In this study, the success rate curves, computed with the training data (80% of the landslide data), gave AUC values of 0.8350 (83.50%), 0.8211 (82.11%), and 0.9057 (90.57%) for the FR, WoE, and LR models, respectively (Fig. 16a), and the prediction rate curves, computed with the validation data (20% of the landslide data), gave AUC values of 0.8412 (84.12%), 0.8314 (83.14%), and 0.9091 (90.91%), respectively (Fig. 16b). These AUC values show that all three models used in this study have reasonably high prediction accuracy and can be used for the spatial prediction of landslides. Comparing the models on the basis of their AUC values, the map produced by the LR model gave the best result, and the LR method was therefore found to be the most successful one.
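A success-rate or prediction-rate curve of this kind, and its AUC, can be sketched as follows. The sketch assumes that the horizontal axis is the cumulative share of pixels ranked by decreasing susceptibility index, which is one reading of the description above; the function name `rate_curve_auc` and the four-pixel example are hypothetical, and tied susceptibility values receive no special treatment.

```python
import numpy as np

def rate_curve_auc(susceptibility, landslide_mask):
    """Cumulative-area curve and its AUC (trapezoidal rule)."""
    scores = np.asarray(susceptibility, dtype=float).ravel()
    slides = np.asarray(landslide_mask, dtype=bool).ravel()
    order = np.argsort(-scores, kind="stable")               # most susceptible first
    cum_area = np.arange(1, scores.size + 1) / scores.size   # x: share of pixels
    cum_slides = np.cumsum(slides[order]) / slides.sum()     # y: share of landslides
    x = np.concatenate(([0.0], cum_area))
    y = np.concatenate(([0.0], cum_slides))
    auc = float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
    return x, y, auc

# Hypothetical example: both landslide pixels have the highest index values
x, y, auc = rate_curve_auc([0.9, 0.8, 0.2, 0.1], [True, True, False, False])
print(auc)  # 0.75
```

Passing the training landslide pixels gives a success rate, while passing the validation pixels gives a prediction rate.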

Fig. 16

AUC curves representing quality of models. a Success rate. b Prediction rate

Discussion and conclusions

As stated earlier, the Souk Ahras area is characterized by mountainous relief, with high hills (up to 1286 m) and deep, wide valleys. The hillsides are subject to very active mass wasting and erosion. All types of landslide, i.e., rotational slides, planar slides, mud flows, creep, and rock falls, occur throughout the study area. They affect all the rock types to varying degrees and occur on almost all slope angles, which suggests the simultaneous interplay of several parameters in the process of landslide occurrence.

A total of 301 landslides were identified and mapped, and ten landslide conditioning factors were considered as input data for a statistically based landslide susceptibility evaluation and modeling. The statistical techniques used herein are the logistic regression, weights of evidence, and frequency ratio methods. The analysis was carried out using GIS technology, as it facilitates the storing, processing, and display of results in a very efficient manner.

The work has resulted in three landslide susceptibility maps. Each of them is classified into five hierarchical zones of susceptibility, from very high to very low. The sites most prone to landslides are concentrated in the northwest, central, and southeast parts of the study area. These zones are mainly distributed on the Triassic units (clay and gypsum-sandstone), on the marls and gray marl clay of the lower Campanian-upper Santonian, on the siltstones, clayey marl, and sandstone of the upper and middle Miocene, and on cut slopes or embankments alongside roads.

The predictive capacities of the used statistical approaches have been validated by means of the ROC analysis.

The results of the landslide susceptibility assessment show that all three landslide models (FR, WoE, and LR) perform well, with reasonably high success and prediction rates. The landslide susceptibility map produced using the LR method gives the highest success and prediction rates, with AUC values of 90.57% and 90.91%, respectively, followed by the FR and WoE models.

Our work leads us to conclude that the LR model gives better results than the WoE and FR models, which are close to one another. It is one of the best models used in landslide susceptibility assessment; it uses a sequence of convergence criteria to maximize the likelihood function for predicting landslide occurrences. The susceptibility maps produced could be a basic prerequisite for any proposed development project and will be quite useful for finding suitable locations for new developments.