Introduction

In recent years, a sharp increase has been observed in the number of natural hazards, such as earthquakes, floods and landslides, that have turned into disasters causing significant economic and social losses as a result of rapid population growth, inadequate urbanisation, global climate change, and natural earth dynamics. As a result of 6,873 natural disasters occurring worldwide, 1.35 million people died, and an average of 218 million people were affected by these disasters (EMDAT, 2019). Environmental land use planning studies are at the center of the sustainable development process. Therefore, in order to protect natural environments, it is necessary to develop policies that protect nature from negative human effects and protect people from natural hazards, and to monitor the effectiveness of these policies systematically (Guzzetti et al. 2006; Regmi et al. 2010; Pradhan 2010; Park and Lee 2014; Chen et al. 2015; Chen et al. 2016; Lee and Park 2016; Wang et al. 2017; Banerjee et al. 2018; Park et al. 2018; Chen et al. 2019; Park et al. 2019; Khan et al. 2019; Cordoba et al., 2020, b). Landslides are one of the most widespread and damaging natural hazards in mountainous regions (Pradhan et al. 2010; Ghosh et al. 2017; Tekin and Çan, 2018; Tekin 2019; Sharma and Mahajan 2019; Panchal and Shrivastava 2020). For large areas where limited information is available on the spatial and temporal probabilities of landslide events, landslide susceptibility assessment is considered the initial step towards the landslide hazard and risk assessments used in land use planning (Corominas et al., 2014, b).

Landslide susceptibility maps depict where landslides are likely to occur in the future, that is, the spatial likelihood of landsliding, taking into account the environmental factors that cause landslides. Preparing such maps requires an understanding of the complex nature of mass movements and of the factors that control them. The reliability of landslide susceptibility maps depends on the quality and accuracy of the input data and on the selection of an appropriate method for the analyses (Paryani et al., 2020; Pradhan et al., 2020; El-Haddad et al., 2021; Pourghasemi et al. 2012; Corominas et al., 2014, b; Goetz et al. 2015; Hungr 2016; Rossi and Reichenbach, 2016).

In this study, the characteristics of landslides and landslide susceptibility modelling of the Ceyhan watershed, one of the 25 major watersheds of Turkey, were evaluated to assist Ceyhan river watershed management processes. Watershed management is an efficient way to respond to many global challenges in areas such as ecology, water supply, land use, climate change adaptation and disaster risk management. Among these, disaster risk management actions such as hazard assessment, mapping and zoning, early warning systems and risk reduction interventions have a greater influence on effective watershed management (FAO 2017). The novelty of this research lies in the following sequence. First, the completeness of the historical landslide inventory of the watershed was evaluated by frequency-size statistics. Second, the total number of landslides that have occurred over time and the landslide affected area were estimated using frequency density curves and the magnitude scale, following the approach suggested by Malamud et al. (2004). Then, a landslide susceptibility map of the watershed was prepared using the logistic regression model. Finally, the landslide affected area estimated over time from the frequency density curves and magnitude scale was compared with the landslide susceptibility map. It was found that almost 10 % of the watershed has been affected by landslides over time and that about 20 % of the watershed falls in the high to very high susceptibility classes.

Study area and landslide causative factors

The Ceyhan watershed is located between longitudes 35°40′ and 37°41′ E and latitudes 36°40′ and 38°40′ N, with an areal extent of 21,730 km². The watershed descends from the high mountains of the Eastern Taurus into the Çukurova plain before the river flows into İskenderun Bay in the eastern Mediterranean (Fig. 1). The upstream part of the watershed is characterized by the Afşin Elbistan depression, which is surrounded by high mountains, and its eastern divide separates the Mediterranean basin from that of the Euphrates River. Almost all of Kahramanmaraş and Osmaniye Provinces, a part of Ceyhan and Yumurtalık Districts in Adana Province, the Central Districts of Adana Province and Kozan District are located within the borders of the Ceyhan watershed. The Ceyhan river basin is bordered by the Seyhan river basin to the west and the Asi river basin to the south.

Fig. 1 Location of the study area

The Ceyhan watershed has a transitional climate between the Mediterranean and continental types, in which winters are generally harsh and droughts can be observed in summer. Most of the total annual precipitation falls in winter. According to the WorldClim climate data, the average annual precipitation is 945 mm (WorldClim 2019, https://worldclim.org/) (Fig. 2).

Fig. 2 Annual precipitation for the study area (https://worldclim.org/)

The study area is located in the upper parts of the Taurus orogenic mountain belt, which extends east-west. The youngest units in the study area consist of Quaternary deposits. Landforms ranging from marine coastal and delta plain areas to high mountainous areas are observed. The young units are morphological units dissected by the well-developed stream drainage system. The terrain generally consists of valley systems with steep, deeply cut slopes and of the ridges and peaks between these valleys. Thick Paleozoic and Lower Tertiary platform carbonate deposits (Fig. 3), turbiditic sequences, terrestrial clastics and ophiolitic rocks form the tectonic units in the orogenic mountain belt (Ulu, 2002). These units are covered by thick Tertiary clastics and carbonates. The Paleozoic and Mesozoic basement units are exposed in the deeply incised valleys in the upper parts of the Taurus Mountains. Ophiolitic rocks are also observed in similar areas; however, these rocks cover larger areas (Özgül and Kozlu, 2002; Kop, 2003; Usta et al. 2004). Miocene neritic limestones and clastic and carbonate units are observed in the central parts of the study area. These units play an important role in the morphology because of their resistance to abrasion. At the same time, their susceptibility to dissolution leads to the formation of karstic features. The karstic structures in the study area are generally small to medium sized. The Miocene or younger Pliocene clastic units, which are transitional with these carbonates, form the marginal sediments of the Adana basin (Ünlügenç, 1993).

Fig. 3 Geology (Akbaş et al. 2011) and active faults (Emre et al. 2018) map of the study area

The elevation ranges from sea level to 3075 m a.s.l. The downstream part of the watershed is represented by the Çukurova plain, with elevations of up to 88 m. In the upstream part, around Afşin Elbistan, elevations between 1142 and 1300 m correspond to a large depression area that hosts the largest lignite reserves of the country (Fig. 4a, b). The other parts of the watershed are mainly characterized by deeply incised valleys and dissected topography (Fig. 5a, b).

Fig. 4 Large depression area (a) and lignite reserves (b) of the study area

Fig. 5 Land surface parameters of the study area: DEM (a) and slope map (b)

Profile curvature can be defined as the curvature parallel to the slope direction (Wilson and Gallant, 2000). By expressing the rate of change of slope, it reflects the flow rate of water on the surface, the transport of sediments along the slope and thus the developing erosion. Plan curvature is revealed by the intersection of the surface with the horizontal plane; it can also be expressed as the rate of change of orientation along a contour (Wilson and Gallant, 2000). The cross-sectional curvature expresses the areas of topographic convergence and divergence, that is, where water flowing on the surface tends to concentrate. The topographic wetness index (TWI) is widely used to express the topographic location and extent of water-saturated areas with surface flow potential (Moore et al. 1991). The mean slope of a pixel is defined as the average slope value within its 3 × 3 pixel neighbourhood. The roughness index expresses the roughness of the surface by relating elevation and slope curvature within the defined pixel window (3 × 3 pixels). The detailed statistics of the raster data used in the study are shown in Table 1.
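The two neighbourhood-based parameters above can be illustrated with a minimal Python sketch. It is not the workflow used in the study: the edge handling (using only the available neighbours), the inclusion of the centre pixel in the 3 × 3 mean, the clamping of flat cells, and the sample slope values are all assumptions made for illustration. The TWI is written in its usual form ln(a / tan β), where a is the specific catchment area and β the local slope.

```python
import math


def mean_3x3(grid):
    """Average each cell with its available neighbours in a 3 x 3 window."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(a / tan(beta)).

    Slopes below min_slope_deg are clamped (an assumption) so that flat
    cells do not cause a division by zero.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))


# Hypothetical slope raster in degrees, for illustration only.
slope = [
    [10.0, 12.0, 14.0],
    [20.0, 22.0, 24.0],
    [30.0, 32.0, 34.0],
]
mean_slope = mean_3x3(slope)
```

In a real application the same operations would normally be run on the full raster with a GIS or array library; the sketch only makes the definitions concrete.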

Table 1 Descriptive statistics of the continuous independent variables for the study area

Landslide inventory map

Landslide inventory maps are one of the basic and most important inputs for landslide susceptibility, hazard and risk assessments. Landslide inventory maps are mainly classified into archive and geomorphological types (Guzzetti et al. 2000). Archive inventories are prepared from historical documents and show the location and consequences of individual landslides. Geomorphological landslide inventories are subdivided into three types: historical, event and multi-temporal. Event inventory maps represent the landslides caused by a single triggering event. Historical inventory maps show the landslides discernible at the time of mapping. Multi-temporal inventories are the most appropriate type, since they document the state, style and distribution of landslide activity in specific time intervals throughout the region (Guzzetti et al. 2012).

In this study, the historical landslide inventory database prepared by Duman et al. (2011) and Can et al. (2013) was used (Fig. 6) and updated through additional field studies together with the interpretation of Google Earth images. According to the archive inventory data prepared by Gökçe et al. (2008), 322 disastrous landslide events that led to the evacuation of 2000 residences were reported in the study area.

Fig. 6 Landslide inventory map of the study area

A total of 901 landslides covering 175 km², with an average landslide area of 0.19 km², were identified in the study area (Fig. 7). Historical landslide inventories are known to be incomplete because some landslides become undetectable over time as a result of erosion, vegetation change or other environmental factors. The completeness of the landslide inventory is evaluated in detail in the “Completeness of the landslide inventory” section.

Fig. 7 Images of some landslides in the study area

Logistic regression methods

Landslide susceptibility assessments model the relationships between the landslides that have occurred in a region and the environmental variables controlling them in order to identify where landslides may develop in the future. Approaches to the preparation of landslide susceptibility maps are grouped into two categories: qualitative and quantitative (Soeters and van Westen, 1996; Aleotti and Chowdury, 1999; Guzzetti et al. 1999; Chung and Fabbri 2008; Corominas and Moya, 2010). In the present study, the logistic regression method was used, which is the most widely used statistically based method in landslide susceptibility assessments (Pourghasemi et al. 2018; Reichenbach et al. 2018).

Logistic regression is used to determine the relationship between a dependent variable and the independent variables in cases where the dependent variable is binary; the expected values of the dependent variable are obtained as probabilities conditional on the independent variables (Atkinson and Massari 2011). In other words, it is a regression method that classifies observations by assigning the expected value of the dependent variable as a probability given the independent variables (Fig. 8).

Fig. 8 Flow chart of the study

In Eq. 1, the X values are the independent variables (landslide causative factors) and the β values are the regression coefficients of the independent variables. Z is the resulting linear combination, which can take any value between −∞ and +∞.

$$ Z_i=\beta_0+\beta_1 X_1+\beta_2 X_2+\beta_3 X_3+\dots +\beta_n X_n $$
(1)

Since the Z value calculated with Eq. 1 varies between −∞ and +∞, the logistic transformation is applied to convert it to a probability (Eq. 2). In this equation, P is the probability that an event occurs. The P values calculated across the region indicate the likelihood of landslides.

$$ P=\frac{1}{1+e^{-Z}} $$
(2)

The inverse of this transformation, the logit, approaches −∞ as the probability P approaches 0 and +∞ as it approaches 1 (Hosmer et al. 2013). In the data sets used for logistic regression, the ratio between the two classes (1, 0) of the binary dependent variable affects the overall classification accuracy. When the classes are unbalanced, the logistic regression model gives results in favor of the more frequent class (Hosmer et al. 2013). To avoid this, modeling is done by selecting an equal number of observations for both classes (Hosmer et al. 2013; Heckmann et al. 2014; Tekin and Çan 2018).
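Equations 1 and 2 and the balanced selection of the two classes can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the coefficient values in the usage lines and the random seed are hypothetical and are not taken from the study.

```python
import math
import random


def linear_predictor(beta0, betas, xs):
    """Eq. 1: Z = beta0 + beta1*X1 + ... + betan*Xn."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))


def probability(z):
    """Eq. 2: P = 1 / (1 + exp(-Z)), mapping any real Z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of Eq. 2: tends to -inf as p -> 0 and to +inf as p -> 1."""
    return math.log(p / (1.0 - p))


def balanced_sample(landslide_ids, stable_ids, seed=0):
    """Draw the same number of landslide and landslide-free samples."""
    rng = random.Random(seed)
    n = min(len(landslide_ids), len(stable_ids))
    return rng.sample(landslide_ids, n), rng.sample(stable_ids, n)


# Hypothetical coefficients and factor values, for illustration only.
z = linear_predictor(-1.0, [0.05, 0.2], [20.0, 1.5])
p = probability(z)
slides, stable = balanced_sample(list(range(100)), list(range(1000, 5000)))
```

Balancing by undersampling the larger class is one simple way to obtain equal class sizes; the cited studies may differ in how the landslide-free samples are drawn.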

Results and discussion

Completeness of the landslide inventory

Although the accuracy, reliability, and completeness of landslide inventories are of great importance, there are no standard criteria for evaluating them (Galli et al. 2008; Guzzetti et al. 2012; Trigila et al. 2010; van Westen et al. 2008). In some studies, the frequency-area relationships of different landslide inventories have been evaluated to assess landslide characteristics and completeness (e.g., Galli et al. 2008; Guzzetti et al., 2006; Guzzetti et al. 2008; Malamud et al. 2004; Van Den Eeckhaut et al., 2006).

In this study, using the method proposed by Malamud et al. (2004), the probability density distribution and the landslide magnitude scale were analyzed to estimate the number of landslides and the landslide affected area over time in the Ceyhan watershed. Malamud et al. (2004) found that three complete event landslide inventories triggered by different factors were well approximated by the same probability density function, the three-parameter inverse gamma distribution, which is also known as the three-parameter Pearson type 5 distribution (Eq. 3) (Fig. 9a).

$$ p\left({A}_L;\rho, a,s\right)=\frac{1}{N_{LT}}\frac{\delta {N}_L}{\delta {A}_L}=\frac{1}{a\Gamma \left(\rho \right)}{\left[\frac{a}{{A}_L-s}\right]}^{\rho +1}\exp \left[-\frac{a}{{A}_L-s}\right] $$
(3)

where

A_L is the area of the landslide polygon (km²)

ρ is the continuous shape parameter that controls the negative power-law decay for medium to large landslides

a is the continuous scale parameter that controls the maximum of the probability distribution

s is the continuous location parameter that controls the positive power-law decay for small landslides

N_LT is the total number of landslides in the inventory

δN_L/δA_L is the ratio of the number of landslides to the width of the landslide area interval, that is, the frequency density of landslides (km⁻²)

Γ(ρ) is the gamma function of ρ

Fig. 9 Probability density distribution (a) and theoretical frequency density curves (b) of the historical landslide inventory
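As a rough numerical check of Eq. 3, the density can be evaluated with the standard library alone. The sketch below uses the ρ, a and s values reported in the text for the Ceyhan inventory; the integration helper, its grid and its limits are hypothetical choices made for illustration, not part of the method of Malamud et al. (2004).

```python
import math

# Parameter values reported in the text for the Ceyhan inventory.
RHO = 1.25     # shape parameter (dimensionless)
A = 0.124      # scale parameter (km2)
S = -0.0102    # location parameter (km2)


def inverse_gamma_pdf(area_km2, rho=RHO, a=A, s=S):
    """Three-parameter inverse gamma probability density (km^-2), Eq. 3."""
    x = area_km2 - s
    if x <= 0.0:
        return 0.0
    return (a / x) ** (rho + 1.0) * math.exp(-a / x) / (a * math.gamma(rho))


def integrate_log_grid(func, lower, upper, steps=20000):
    """Trapezoidal rule on a logarithmically spaced grid (hypothetical helper)."""
    ratio = (upper / lower) ** (1.0 / steps)
    total, x0, f0 = 0.0, lower, func(lower)
    for _ in range(steps):
        x1 = x0 * ratio
        f1 = func(x1)
        total += 0.5 * (f0 + f1) * (x1 - x0)
        x0, f0 = x1, f1
    return total


# The density should integrate to about 1 over positive landslide areas.
total_probability = integrate_log_grid(inverse_gamma_pdf, 1e-6, 1e4)

# For large areas the density decays roughly as A_L ** -(rho + 1).
tail_ratio = inverse_gamma_pdf(100.0) / inverse_gamma_pdf(10.0)
```

Here `total_probability` should come out close to 1, and `tail_ratio` close to 10 ** -2.25, which corresponds to the power-law decay for medium to large landslides.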

Applying the method proposed by Malamud et al. (2004), the probability density distribution and the best-fitted three-parameter inverse gamma probability density function for the Ceyhan watershed landslide inventory are given in Fig. 9a. The ρ, a, and s parameters were determined as 1.25, 0.124 km² and −0.0102 km², respectively. A similar power law exponent, −(ρ + 1) = −2.25, was found, with a rollover located at A_L = 5.1 × 10⁻² km², which is two orders of magnitude larger than that of the general probability distribution given by Malamud et al. (2004). Malamud et al. (2004) and Van Den Eeckhaut et al. (2006) also found that historical landslide inventories show a power law tail for medium to large landslides and a rollover for smaller landslides that is shifted to the right. This deviation is attributed to the incompleteness of historical landslide inventories. A landslide-event magnitude scale, m_L, was also proposed by Malamud et al. (2004) in relation to the total number of landslides associated with the triggering event (Eq. 4):

$$ {m}_L={\log}_{10}\left({N}_{LT}\right) $$
(4)

where N_LT is the total number of landslides. According to the equation, events triggering between 10 and 10⁸ landslides have magnitudes, m_L, between 1 and 8. Malamud et al. (2004) also suggested using the same approach for historical landslide inventories to determine the total area, total volume and total number of landslides by calculating the frequency density, f(A_L) (km⁻²), from Eq. 5.

$$ f\left({A}_L\right)=\frac{\delta {N}_L}{\delta {A}_L}={N}_{LT}\,p\left({A}_L\right) $$
(5)

The second part of the equation allows theoretical frequency density curves for various landslide magnitude scales to be determined by multiplying the probability density of landslides with the total number of landslides.

The frequency density distribution of the historical landslide inventory for the Ceyhan watershed obtained by Eq. 5 is given in Fig. 9b together with the theoretical frequency density curves. The power law tail of the frequency density is in good agreement with the theoretical curves for landslides larger than 0.4 km². The magnitude of the historical landslide inventory for the Ceyhan watershed was calculated as m_L = 5.8 ± 0.1. This estimate suggests that the historical landslide inventory represents less than 1 % of the landslides that have occurred in the watershed. Multiplying the estimated total number of landslides in the study area by the mean landslide area of 3.07 × 10⁻³ km² obtained by Malamud et al. (2004) gives a total landslide area of about 1990 ± 450 km², which is 9.15 % of the study area. According to this estimate, the present historical landslide inventory represents approximately 8.85 % of the area previously affected by landslides.
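The arithmetic behind these estimates can be retraced with a short Python sketch. All numerical inputs are the values quoted in the text; the sketch uses only the central magnitude m_L = 5.8 and ignores the ± 0.1 uncertainty, so its output should be read as an order-of-magnitude check rather than a reproduction of the reported figures.

```python
import math

# Values quoted in the text.
N_MAPPED = 901                # mapped landslides in the historical inventory
MAPPED_AREA_KM2 = 175.0       # total mapped landslide area
WATERSHED_AREA_KM2 = 21730.0  # areal extent of the watershed
MEAN_AREA_KM2 = 3.07e-3       # mean landslide area of Malamud et al. (2004)
M_L = 5.8                     # central magnitude estimate, uncertainty ignored


def magnitude(n_landslides):
    """Eq. 4: m_L = log10(N_LT)."""
    return math.log10(n_landslides)


def landslides_from_magnitude(m_l):
    """Invert Eq. 4 to recover the total number of landslides."""
    return 10.0 ** m_l


n_total = landslides_from_magnitude(M_L)
total_area_km2 = n_total * MEAN_AREA_KM2

print(f"estimated number of landslides: {n_total:,.0f}")
print(f"mapped share of landslides: {100 * N_MAPPED / n_total:.2f} %")
print(f"estimated landslide area: {total_area_km2:,.0f} km2")
print(f"share of watershed: {100 * total_area_km2 / WATERSHED_AREA_KM2:.1f} %")
print(f"mapped share of area: {100 * MAPPED_AREA_KM2 / total_area_km2:.1f} %")
```

With the rounded magnitude the sketch gives roughly 6.3 × 10⁵ landslides and a little over 1900 km², which lies inside the reported 1990 ± 450 km² range; the small differences from the quoted percentages come from the rounding of m_L.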

Landslide susceptibility

In quantitative, data-driven landslide susceptibility assessments, either vector-based or pixel-based terrain mapping units must be selected to prepare the data matrices. In this study, a pixel size of 150 × 150 m was used for the terrain units. In pixel-based landslide susceptibility modelling, several approaches have been used to sample the landslide affected pixels (e.g., Suzen and Doyuran 2004; Can et al. 2005; Clerici et al. 2006; Van Den Eeckhaut et al. 2006; Gorum et al. 2008; Nefeslioglu et al. 2008; Yilmaz 2010; Nefeslioglu et al. 2012; Regmi et al. 2014; Hussin et al. 2016; Tekin and Çan 2018).

In this study, the method proposed by Tekin and Çan (2018) was used, in which whole landslide polygons are considered when selecting 80 % of the landslides for training and 20 % for validation. They found that the prediction success rates of the susceptibility maps were higher when pixels were selected by entire landslide polygons than when pixels were selected randomly from any part of the landslide polygons. Accordingly, landslide susceptibility assessments were performed on three different data sets, each containing equal proportions of landslide affected and landslide free pixels.
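The idea of splitting by whole polygons rather than by individual pixels can be sketched as follows. This is a simplified illustration and not the implementation of Tekin and Çan (2018): the function name, the random seed and the toy data are hypothetical, and the input is assumed to be a list giving the landslide polygon identifier of each landslide pixel.

```python
import random


def split_by_polygon(pixel_polygon_ids, train_fraction=0.8, seed=42):
    """Split landslide pixels so that each polygon falls entirely in one set.

    Returns two lists of pixel indices (training, validation).
    """
    polygons = sorted(set(pixel_polygon_ids))
    rng = random.Random(seed)
    rng.shuffle(polygons)
    n_train = round(train_fraction * len(polygons))
    train_polygons = set(polygons[:n_train])
    train = [i for i, pid in enumerate(pixel_polygon_ids) if pid in train_polygons]
    test = [i for i, pid in enumerate(pixel_polygon_ids) if pid not in train_polygons]
    return train, test


# Toy data: polygon k contributes k + 1 pixels (hypothetical).
pixel_ids = [k for k in range(10) for _ in range(k + 1)]
train_idx, test_idx = split_by_polygon(pixel_ids)
```

Because the split is made on polygons, no landslide contributes pixels to both the training and the validation set, which avoids validating the model on pixels that are spatially adjacent to its own training pixels.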

The confusion matrix of the best-performing logistic regression model is shown in Table 2; its overall accuracy is 76.9 %.
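For readers unfamiliar with the measure, the overall accuracy is the share of correctly classified pixels in the confusion matrix. The following Python sketch shows the calculation; the counts are hypothetical and are not the values of Table 2.

```python
def overall_accuracy(tn, fp, fn, tp):
    """Share of all pixels that were classified correctly."""
    return (tn + tp) / (tn + fp + fn + tp)


def sensitivity(fn, tp):
    """Share of landslide pixels predicted as landslide."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of landslide-free pixels predicted as landslide free."""
    return tn / (tn + fp)


# Hypothetical counts for a balanced sample (not the values of Table 2).
counts = {"tn": 80, "fp": 20, "fn": 26, "tp": 74}
accuracy = overall_accuracy(**counts)
```

With balanced classes, as used in this study, the overall accuracy equals the mean of sensitivity and specificity, so it is not dominated by the landslide-free class.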

Table 2 Classification table of the logistic regression model results

The geological map, one of the independent variables, was treated as categorical data, whereas the other variables were treated as continuous data in the analysis. In the logistic regression equation, the X values are the independent variables, which are considered as landslide causative factors, and the β values are the regression coefficients of the independent variables. The significance of each variable is assessed with the Wald test. The β values express the relative effect of each independent variable on the dependent variable. Therefore, a higher coefficient indicates a higher degree of susceptibility and a lower coefficient indicates a lower degree of susceptibility (Table 3).

Table 3 Beta coefficients and Wald test statistics of the variables for dataset

The probability values of the landslide susceptibility map of the study area were classified into five categories, from very low to very high, using equal intervals (Fig. 10). All three susceptibility maps have high predictive accuracy; for the map with the best statistical result, the class percentages in the study area are 45.50 % very low, 19.42 % low, 13.49 % medium, 13.44 % high and 8.16 % very high. The test landslides are distributed among the susceptibility classes as 9.78 % very low, 8.76 % low, 15.38 % medium, 39.78 % high and 32.30 % very high (Fig. 11a), the training landslides as 2.47 % very low, 8.56 % low, 16.59 % medium, 38.97 % high and 33.41 % very high (Fig. 11b), and all landslides as 2.61 % very low, 8.56 % low, 16.33 % medium, 39.06 % high and 33.45 % very high (Fig. 11c). The performance of the logistic regression results was evaluated with ROC curves, and the area under the curve (AUC) was found to range between 0.831 and 0.842 (Fig. 11d).
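The equal-interval classification and the AUC statistic can be sketched in Python as follows. The class boundaries at 0.2, 0.4, 0.6 and 0.8 assume that the probabilities span the full range from 0 to 1, which the text does not state explicitly, and the scores in the usage lines are hypothetical.

```python
CLASS_NAMES = ["very low", "low", "medium", "high", "very high"]


def susceptibility_class(p):
    """Assign a probability in [0, 1] to one of five equal-width classes."""
    index = min(int(p * 5), 4)  # p == 1.0 falls in the top class
    return CLASS_NAMES[index]


def auc(landslide_scores, stable_scores):
    """Area under the ROC curve as a rank statistic.

    Equals the probability that a randomly chosen landslide pixel scores
    higher than a randomly chosen landslide-free pixel (ties count half).
    """
    wins = 0.0
    for sp in landslide_scores:
        for sn in stable_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(landslide_scores) * len(stable_scores))


# Hypothetical scores, for illustration only.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

The pairwise loop is quadratic in the number of pixels, so for a full raster a sorted, rank-based calculation or a statistics library would be the practical choice; the sketch is meant only to show what the AUC value measures.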

Fig. 10 Landslide susceptibility map of the study area

Fig. 11 Success and prediction rate curves for the test (a), training (b) and all (c) landslides, and AUC values of the ROC curves for the landslide susceptibility map (d)

The Jackknife curves of the geological map, mean slope, roughness and slope parameters used in the study were calculated. Jackknife curves show the contribution of each variable to the landslide susceptibility map. On this basis, the most effective parameters for landslide formation in the Ceyhan watershed are geology, roughness and slope (Fig. 12).

Fig. 12 Jackknife test for AUC of individual environmental variable importance for the landslide susceptibility map

For the environmental variables found to be important in the Jackknife results, the minimum, maximum, mean and range of standard deviations around the mean within the high and very high landslide susceptibility areas are presented. As can be seen in Fig. 13, the box and whisker plot displays the roughness, DEM, mean slope and slope variables across all high and very high landslide susceptibility areas. The roughness values range from a minimum of 2.24 to a maximum of 16.41, with an average of 7.38. The DEM values range from 0 to 3008, with an average of 1016, whereas the mean slope ranges from 1.29 to 46.03, with an average of 14.70. Lastly, the slope ranges from 0 to 57.51, with an average of 14.83.

Fig. 13 Box and whisker plot of the important variables in all high and very high landslide susceptibility areas (minimum and maximum values, mean value (black circle), range of two standard deviations around the mean)

Conclusions

River watersheds host important dynamic natural processes that shape landscape features under the control of climate, geology, hydrology and biodiversity. In this study, the historical landslide inventory data and landslide susceptibility mapping of the Ceyhan watershed were evaluated. Landslides are complex natural events in terms of the type of material, type of failure and rate of movement, and they involve a wide variety of conditioning and triggering mechanisms under different geological, geomorphological, physical and man-made conditions. For this reason, it is not possible to collect the same level of information for all of the landslides in the inventory. In this study, the completeness of the landslide inventory was evaluated using frequency-size statistics. A power law decay was obtained for medium to large landslides, similar to that of the historical landslide inventories in the literature. The total landslide affected area estimated from the landslide frequency-magnitude relationship is almost 10 % of the study area, which is 11 times larger than the area of the mapped landslides. In the final landslide susceptibility map, the high and very high susceptibility zones cover about 20 % of the study area. The results suggest that the landslide database and the landslide preparatory factors considered enable an accurate and reliable susceptibility zonation to be used in the framework of Ceyhan river watershed management.