Introduction

Soil is an important natural resource in every country, and different types of erosion cause its loss. Risk assessment and quantification of soil erosion are two main activities that support better policy planning for natural resources, agriculture, and the environment (Feng et al. 2010; Mandal and Sharda 2013; Zhao et al. 2013). Soil erosion and sediment yield are complex issues affected by different factors (Choubin et al. 2018). Several studies have described the on-site (e.g., loss of productive capacity and land degradation) and off-site (e.g., sedimentation of reservoirs and damage to infrastructure) effects of soil erosion, which often cause environmental and economic problems (Riksen and De Graaff 2001; Ledermann et al. 2010; Shi et al. 2012; Mullan 2013; Mekonnen et al. 2015). Quantifying the relationship between soil erosion and its conditioning factors is a major challenge and has attracted many researchers (Bakker et al. 2005; Koulouri and Giourga 2007). In fact, quantifying soil erosion involves complex and unstructured decisions; therefore, a comprehensive and systematic method is needed to achieve this goal (Renschler and Harbor 2002; Bahadur 2009; Conoscenti et al. 2013).

In the recent past, different methods have been introduced for estimating soil erosion and sediment yield. One of the best-known methods is the Universal Soil Loss Equation (USLE), which has been applied in many countries under different conditions (Harmon and Doe 2001). Some of its applications in developing countries have been reported by Meusburger et al. (2013) and Csáfordi et al. (2012). The revised USLE (RUSLE) was introduced as a modification of the USLE model to calculate the mean annual erosion (Duarte et al. 2016; Renard et al. 1997; Abdullah et al. 2017). Another model used in other studies is the Erosion Potential Method (EPM), an empirical model developed to predict soil erosion and sediment yield (Amiri 2010). The Modified Pacific South-west Inter-Agency Committee (MPSIAC) framework was developed to estimate erosion in the USA (Ilanloo 2012). Several studies in Iran have reported acceptable performance of the MPSIAC framework under arid and semi-arid conditions (Shahzeidi et al. 2012; Bagherzadeh and Daneshvar 2013; Taheri et al. 2013). Abdullah et al. (2017) used the MPSIAC, EPM, and RUSLE models to predict soil erosion in the Umm Nigga area, Kuwait. Their findings showed that MPSIAC performed better than the other models, followed by EPM and RUSLE.

Other researchers have taken a different approach, predicting the spatial susceptibility to different kinds of erosion (e.g., gully, landslide, rill, and interrill) with statistical and machine learning models. For instance, Conoscenti et al. (2008) analyzed relationships between geo-environmental factors and the spatial distribution of erosion landforms using a geostatistical multivariate approach. Rahmati et al. (2016a) evaluated the capability of frequency ratio (FR) and weights-of-evidence (WofE), two common statistical models, for spatial prediction of gully erosion susceptibility. They demonstrated that the FR and WofE models are efficient tools for gully susceptibility assessment. In another work, Rahmati et al. (2017a) applied a conditional probability (CP) model to map the susceptibility to gully erosion in a semi-arid region of Iran. Angileri et al. (2016) generated a water erosion susceptibility map by implementing stochastic gradient Treeboost in a study area in Italy, considering both rill-interrill and gully erosion. They regarded altitude, landform, aspect, and land use as the most important factors affecting rill and interrill erosion. Conforti et al. (2011) used an information value model to map gully erosion susceptibility, with land use, stream power index (SPI), slope, aspect, topographic wetness index (TWI), and slope length as modelling factors. Their findings implied that 88% of the gullies fell in the high and very high susceptibility categories. Märker et al. (2011) used stochastic gradient boosting and bootstrap aggregation models to predict the potential spatial distribution of erosion processes. They stated that these models provide insights into the factors controlling erosion processes and are valuable tools in soil conservation and geomorphology. Conoscenti et al. (2014) applied a GIS-based logistic regression model in central-northern Sicily, Italy, for susceptibility mapping of gully erosion. Maximum entropy is a data mining model that has been used in several fields of study, including ecological modeling (Phillips et al. 2006; Phillips and Dudík 2008; Kleidon et al. 2010; Elith et al. 2011; Harte and Newman 2014), groundwater potential mapping (Rahmati et al. 2016b), and landslide susceptibility mapping (Vorpahl et al. 2012; Felicísimo et al. 2013; Park 2015; Kornejady et al. 2017a).

In the current study, two major objectives were considered: (i) assessing the capability of the maximum entropy model for susceptibility mapping of rill erosion, for which it is being applied for the first time, and (ii) applying the MPSIAC framework to estimate the soil erosion rate in the study area. The Golgol watershed, Ilam province, Iran, was chosen for this research because it is one of the sub-watersheds of the Ilam Dam, and the results could be very helpful for management of the dam. Determining the areas most susceptible to rill erosion could be an initial step for soil conservation plans, reducing their cost and the time required.

Material and methods

Study area

The Golgol watershed, Ilam province, Iran, is located between 46° 27′ and 46° 38′ eastern longitude and 33° 25′ and 33° 38′ northern latitude (Fig. 1). Elevation in the watershed ranges from 1013 to 2156 m. The average annual rainfall and temperature in the study area are 580 mm and 16.9 °C, respectively. The main species in the forest lands of the watershed is Quercus brantii. A reservoir dam in the lowlands of the study area supplies drinking water to Ilam city. Sediment produced in the sub-watersheds can fill the reservoir and is regarded as a crucial threat to sustainable development in the study area. From a geological point of view, the area is located in the folded Zagros zone, with lithological units such as limestone, shale, valley terrace deposits, or a combination of them.

Fig. 1 Location of the study area, training and validation rill erosion locations in Ilam province, Iran

Methodology

Application of MPSIAC framework

The MPSIAC framework requires nine factors: surface geology, land cover, soil, climate, land use, channel erosion, present erosion, topography, and runoff (Daneshvar and Bagherzadeh 2012). These layers were prepared from data obtained from the Ministry of Agriculture-Jahad, Iran (MAJ 2014).

Surface geology (Y1)

The surface geology factor (Y1) was calculated from X1, the surface geology factor in the PSIAC model. X1 is defined on the basis of rock type, hardness, and fracturing (Daneshvar and Bagherzadeh 2012). The scores of the units were defined within a range of 1 to 10 based on local conditions in Iran (Feyznia 1995).

Soil (Y2)

The soil map was obtained from the Ministry of Agriculture-Jahad, Iran (MAJ 2014). This factor can be calculated from the erodibility factor (K) as follows:

$$ {Y}_2=16.67K $$
(1)

where Y2 is the soil variable in MPSIAC and K is the erodibility factor, which can be calculated with the RUSLE model (Benzer 2010; Dumas and Printemps 2010).

Climate factor (Y3)

Climate influences the soil and vegetation cover and affects runoff at the watershed scale. This factor in MPSIAC can be computed as below:

$$ {Y}_3=0.2{P}_2 $$
(2)

where Y3 denotes the climate variable and P2 is the six-hour rainfall with a two-year return period (mm).

Runoff (Y4)

Runoff is closely related to the climate of the watershed. An intense flood, although it occurs rarely, contributes strongly to the yearly sediment yield of the watershed. This variable can be obtained as below:

$$ {Y}_4=0.006R+10{Q}_p $$
(3)

where Y4 denotes the runoff variable, R is the runoff (mm), and Qp is the specific peak discharge (m3 km−2 s−1).

Topography (Y5)

This factor was obtained from the digital elevation model (DEM) of the study area. This variable can be calculated as below:

$$ {Y}_5=0.33S $$
(4)

where Y5 denotes the topography variable and S is the mean slope (%), which was extracted from a 1:50,000-scale DEM of the study area.

Ground cover (Y6)

This factor includes any cover on the ground that moderates the effect of rainfall, such as vegetation and litter. It can be calculated as below:

$$ {Y}_6=0.2 Pb $$
(5)

where Y6 denotes the ground cover variable and Pb is the percentage of bare land in each land unit (%).

Land use (Y7)

The land use map was obtained from MAJ (2014). This factor can be obtained as follows:

$$ {Y}_7=20-0.2 Pc $$
(6)

where Y7 denotes the land use variable and Pc is the canopy cover of each land unit (%).

Upland erosion (Y8)

Upland erosion was determined with the US Bureau of Land Management (BLM) method (Abdullah et al. 2017). This factor can be obtained as follows:

$$ {Y}_8=0.25 SSF $$
(7)

where Y8 denotes the current erosion amount and SSF is the soil surface factor. In the BLM method, the SSF is determined by seven conditioning factors.

Channel erosion (Y9)

For channel erosion in any watershed, the type, shape, geomorphology, and bank erosion of the rivers are essential factors to consider. This variable can be calculated as below:

$$ {Y}_9=1.67 SSF.g $$
(8)

where Y9 represents channel erosion and SSF.g is the gully erosion score based on the BLM method.
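As a compact illustration of Eqs. 1 to 8, the following Python sketch computes the nine factor scores for a single hydrological unit. It is only a sketch under stated assumptions: the class and field names are hypothetical, the input values are invented for illustration and are not data from this study, and Y1 is simply passed through as the X1 score.

```python
from dataclasses import dataclass


@dataclass
class UnitInputs:
    """Inputs for one hydrological unit (field names are illustrative)."""
    y1_geology: float     # surface geology score (X1), taken as given
    k_erodibility: float  # K, erodibility factor
    p2_mm: float          # P2, two-year six-hour rainfall (mm)
    runoff_mm: float      # R, runoff (mm)
    qp: float             # Qp, specific peak discharge (m3 km-2 s-1)
    slope_pct: float      # S, mean slope (%)
    bare_pct: float       # Pb, bare land (%)
    canopy_pct: float     # Pc, canopy cover (%)
    ssf: float            # SSF, soil surface factor (BLM)
    ssf_gully: float      # SSF.g, gully erosion score (BLM)


def mpsiac_scores(u: UnitInputs) -> dict:
    """Return the nine MPSIAC factor scores following Eqs. 1 to 8."""
    return {
        "Y1": u.y1_geology,
        "Y2": 16.67 * u.k_erodibility,              # Eq. 1
        "Y3": 0.2 * u.p2_mm,                        # Eq. 2
        "Y4": 0.006 * u.runoff_mm + 10.0 * u.qp,    # Eq. 3
        "Y5": 0.33 * u.slope_pct,                   # Eq. 4
        "Y6": 0.2 * u.bare_pct,                     # Eq. 5
        "Y7": 20.0 - 0.2 * u.canopy_pct,            # Eq. 6
        "Y8": 0.25 * u.ssf,                         # Eq. 7
        "Y9": 1.67 * u.ssf_gully,                   # Eq. 8
    }


# Hypothetical input values, for illustration only.
example = UnitInputs(
    y1_geology=7.0, k_erodibility=0.5, p2_mm=25.0, runoff_mm=100.0,
    qp=0.4, slope_pct=15.0, bare_pct=20.0, canopy_pct=50.0,
    ssf=28.0, ssf_gully=5.0,
)
scores = mpsiac_scores(example)
for name, value in scores.items():
    print(f"{name} = {value:.2f}")
```

Keeping each equation on its own line makes it easy to check the coefficients against the text and to swap in unit-specific inputs.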

Rill erosion susceptibility mapping

To create the rill erosion susceptibility map of the study area, the locations of this erosion type were first determined through extensive field surveys, and 79 locations were detected. Following the literature, these locations were divided into training and validation classes at a 70:30 ratio. In other words, out of the 79 rill erosion occurrences, 55 were randomly selected for training the model, and the remaining 24 were used for validation. The training dataset was used to train the maximum entropy (ME) model. Then, the rill erosion conditioning factors were prepared and mapped in ArcGIS 10.2. These factors are elevation, slope percent, slope aspect, SPI, topographic wetness index (TWI), distance from the stream, plan curvature, lithology, land use, and soil. The conditioning factors were used as independent variables, while the rill erosion inventory was the dependent variable. In the next step, the ME model was applied using the conditioning factors and training locations. Lastly, the receiver operating characteristic (ROC) curve was calculated to evaluate the performance of the model (Chang-Jo and Fabbri 2003; Pourghasemi and Rahmati 2018).
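The random 70:30 partition of the inventory described above can be sketched as follows. The function name, the fixed seed, and the integer location identifiers are assumptions made for illustration; the source does not state how the random selection was carried out.

```python
import random


def split_locations(locations, train_fraction=0.7, seed=42):
    """Randomly split inventory locations into training and validation sets."""
    rng = random.Random(seed)        # fixed seed so the split is repeatable
    shuffled = list(locations)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 79 hypothetical location identifiers standing in for the field inventory.
location_ids = list(range(1, 80))
train, validation = split_locations(location_ids)
print(len(train), len(validation))  # prints "55 24"
```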

Rill erosion conditioning factors

In this work, ten rill erosion conditioning factors were selected for the modelling process based on the literature (Lu et al. 2001; Cerdan et al. 2002; Govers et al. 2007; Hancock et al. 2008; Auerswald et al. 2009; Wirtz et al. 2012; Angileri et al. 2016). However, no standard methodology or specific framework has yet been established for choosing the conditioning factors used to model rill erosion susceptibility (Conoscenti et al. 2014; Angileri et al. 2016). First, a topographic map of the study watershed at 1:100,000 scale was prepared. From the corresponding DEM, layers such as elevation, slope percent, slope aspect, SPI, TWI, and plan curvature were extracted (Razandi et al. 2015; Falah et al. 2017; Siahkamari et al. 2017). Elevation in this watershed ranges from 1013 to 2156 m. Slope percent influences the speed of water flow, so steeper slopes are more susceptible to rill erosion. Slope percent in this study ranges from 0 to 352.8%. Slope aspect was prepared for the study area and categorized into nine classes: the main directions, the intermediate directions, and flat. The stream power index signifies the erosive power of the flow in the studied region. This factor was calculated in SAGA-GIS and ranges from 0 to 81.75. TWI is another topo-hydrological factor, which was developed by Moore et al. (1991) and can be obtained as below:

$$ TWI=\ln \left(\alpha /\tan \beta \right) $$
(9)

where α represents the cumulative upslope area draining to a point and β is the slope angle at that point. Plan curvature is the last topographic factor considered in this study. It can be regarded as the curvature of the contour line formed by the intersection of a horizontal plane with the surface (Yilmaz et al. 2012; Ghorbani Nejad et al. 2017).
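Equation 9 can be evaluated per cell as in the short sketch below. The function name and the minimum-slope clamp are assumptions added here so that flat cells (tan β = 0) do not cause a division by zero; the source does not say how SAGA-GIS or the authors handled flat cells, and the input values are invented.

```python
import math


def twi(alpha, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index, TWI = ln(alpha / tan(beta)).

    alpha: cumulative upslope area draining to the cell (must be > 0).
    slope_deg: slope angle beta in degrees.
    min_slope_deg: assumed lower bound on the slope to avoid tan(beta) = 0.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(alpha / math.tan(beta))


# Hypothetical cells: the same contributing area on a steep and a gentle slope.
steep = twi(alpha=1000.0, slope_deg=45.0)
gentle = twi(alpha=1000.0, slope_deg=5.0)
print(round(steep, 2), round(gentle, 2))
```

For a fixed contributing area, the gentler slope yields the larger TWI, which matches the interpretation of the index as a proxy for where water tends to accumulate.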

Lithology was obtained from a 1:100,000-scale geological map of the study area (GSI 1997). There are four lithological categories in the study area: KEpd-gu, Kbgp, Qft2, and OMas (Table 1).

Table 1 Lithology of the Ilam Dam watershed, Iran

The land use map of the study area was prepared from Enhanced Thematic Mapper Plus (ETM+) images by supervised classification with the maximum likelihood algorithm in ENVI 4.2. Five classes of land use exist in the study watershed, which are agriculture, rangeland, residential, and fragmented forest (Fig. 2i). The soils of the study area comprise Entisols, Inceptisols, and Vertisols (Fig. 2j). In addition, the distance-from-rivers map of the study area was calculated with the Euclidean distance function in ArcGIS 10.2 (Fig. 2f).

Fig. 2 Rill erosion conditioning factors

Rill erosion susceptibility mapping by maximum entropy model

The ME model was developed by Phillips et al. (2006) and was first employed in ecological studies for species distribution modelling (Rahmati et al. 2016b). This approach can be implemented with incident (presence) data only, i.e., the erosion locations in this study. The ME model is a machine learning technique that makes it possible to predict incidents from incomplete data (Medley 2010). The probability distribution in this model is subject to a set of constraints derived from the incident data by examining the conditioning factors (Felicísimo et al. 2012, 2013). The output of the model is a map that represents the probability of rill erosion occurrence at each pixel of the study area. The model was applied in the MaxEnt software.

Results

MPSIAC framework

The results of MPSIAC are presented in Table 2, and the factors are explained below. For the surface geology factor, X1 values, as well as Y1 values, range from 3.65 to 9.89 with an average of 7.67. For the soil factor, Y2 ranges from 6.83 to 8.67 with an average of 4.51. Runoff (Y4) ranges from 3.52 to 5.28 with an average of 4.51. For topography, Y5 has minimum, maximum, and average values of 4.32, 5.5, and 4.91, respectively. Ground cover (Y6) ranges from 3.21 to 4.82 with an average of 4.02. For land use (Y7), the minimum, maximum, and average values are 8.96, 10.76, and 9.841, respectively. Upland erosion (Y8) ranges between 5.95 and 8.2 with an average of 7.18. Channel erosion (Y9) ranges from 6.91 to 10.18 with an average of 9.282.

Table 2 The scores of different factors in MPSIAC model

Annual sediment yield of hydrological units

Yearly sediment yield (Qs) (m3/km2/y) was calculated for each hydrological unit as well as for the whole area (Table 3). According to the results of MPSIAC, predicted soil loss ranges from 124.36 m3/km2/y in hydrological unit 10 to 200.97 m3/km2/y in hydrological unit 13. The soil loss predicted for the whole watershed is 170.67 m3/km2/y.

Table 3 The results of sediment yield calculation by MPSIAC model for each hydrological unit and the whole area

Susceptibility map of rill erosion

The response curves of the ME model are shown in Fig. 3. For elevation, most of the erosion locations are concentrated between 1200 and 1600 m. Most rill erosion occurs where slope percent is higher than 25. For slope aspect, south, south-west, and west-facing slopes have more rill erosion incidents. SPI values above 6 have a higher frequency of rill erosion occurrence in the studied area. Areas with TWI values between 8 and 13 had the highest occurrence of rill erosion. Distances from the river of 0 to 1000 had the highest concentration of rill erosion incidents, and an inverse relationship was observed between this factor and rill erosion occurrence. For plan curvature, values lower than 0 had a high concentration of this erosion. The response curve for lithology showed that Kbgp had the highest erosion incidence, while the lowest was seen in the OMas class. For land use, fragmented forest and agriculture had the highest and second highest erosion occurrence, respectively. The response curve for the soil factor showed that Vertisols and Entisols had the highest amount of rill erosion.

Fig. 3 Response curves for each erosion conditioning factor

In addition, the importance of the rill erosion conditioning factors was assessed with a Jackknife test, as shown in Table 4. Land use, slope percent, aspect, and SPI were identified as the most important erosion conditioning factors (ECFs), with contribution percent values of 14.5, 13.58, 13.3, and 12.2, respectively. On the other hand, TWI, plan curvature, and soil were identified as the least important factors, with contribution percent values of 3.2, 7.1, and 7.5, respectively (Table 4).

Table 4 The contribution of each rill erosion conditioning factor in the modelling process

The erosion susceptibility map (ESM) produced by the ME model is shown in Fig. 4. The ESM was classified into low, moderate, high, and very high classes, which cover 19.61, 34.22, 28.76, and 17.39% of the studied region, respectively (Table 5, Fig. 5).
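The step from a continuous probability map to class-wise area percentages can be sketched as below. The break values are hypothetical, since the source does not state which classification scheme was used to define the four classes, and the sketch assumes all pixels have equal area.

```python
from bisect import bisect_right
from collections import Counter

CLASS_NAMES = ["low", "moderate", "high", "very high"]


def classify(prob, breaks=(0.25, 0.5, 0.75)):
    """Assign a susceptibility class to one pixel (break values are assumed)."""
    return CLASS_NAMES[bisect_right(breaks, prob)]


def area_percentages(pixel_probs, breaks=(0.25, 0.5, 0.75)):
    """Percentage of equal-area pixels falling in each susceptibility class."""
    counts = Counter(classify(p, breaks) for p in pixel_probs)
    total = len(pixel_probs)
    return {name: 100.0 * counts[name] / total for name in CLASS_NAMES}


# Hypothetical pixel probabilities, for illustration only.
demo = area_percentages([0.10, 0.20, 0.30, 0.45, 0.60, 0.70, 0.80, 0.95])
print(demo)
```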

Fig. 4 Soil erosion susceptibility map produced by ME model

Table 5 Area percentage related to different classes of the ESM produced by ME model

Fig. 5 Classified soil erosion susceptibility map of the study area produced by ME model

Evaluation of the model performance

In several studies, the receiver operating characteristic (ROC) curve has been used for classification and validation purposes in fields such as landslide, flood, groundwater, and forest fire modelling (Rahmati and Melesse 2016; Hong et al. 2017; Chen et al. 2017a, b; Naghibi et al. 2017c; Rahmati et al. 2017b; Rahmati and Pourghasemi 2017). Thus, in this investigation, the ROC curve was used to evaluate the ESM produced by ME. The area under this curve indicates how efficiently the model separates erosion from non-erosion locations (Yesilnacar and Topal 2005; Tahmassebipoor et al. 2016; Haghizadeh et al. 2017). An area under the curve (AUC-ROC) value close to 1 indicates high model performance, while a lower value indicates that the model classifies the event poorly (Naghibi et al. 2017a, b). To construct this curve, a number of non-erosion locations equal to the number of erosion locations was selected, and the ME values were extracted for them. These values were then entered in SPSS 20, and the ROC plot was generated. Fig. 6 shows the AUC-ROC of the ME model for the training and validation datasets. The AUC-ROC values for the training and validation datasets are 0.867 (86.7%) and 0.794 (79.4%), respectively.
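The AUC-ROC used here has a simple rank interpretation: it is the probability that a randomly chosen erosion location receives a higher model value than a randomly chosen non-erosion location. The sketch below computes it directly from that definition. It is not the SPSS procedure used in the study, and the function name and scores are invented for illustration.

```python
def auc_roc(pos_scores, neg_scores):
    """AUC as the share of (erosion, non-erosion) pairs ranked correctly.

    Ties between an erosion and a non-erosion score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical ME values at three erosion and three non-erosion locations.
erosion = [0.9, 0.8, 0.4]
non_erosion = [0.7, 0.3, 0.2]
print(round(auc_roc(erosion, non_erosion), 3))  # prints 0.889
```

Here eight of the nine pairs are ordered correctly, giving 8/9. A value of 0.5 would correspond to a model that ranks no better than chance.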

Fig. 6 ROC plot for training and validation data by ME model

Discussion and conclusion

There are various kinds of erosion, such as gully erosion and rill erosion, each affected by different conditioning factors; hence, it is necessary to investigate them separately and consider their specific characteristics (Vandekerckhove et al. 2000; Kornejady et al. 2017b). Considering these differences, the current study was conducted to investigate the parameters affecting rill erosion and their importance with a Jackknife test, generate a rill erosion susceptibility map with the ME model, and determine the sediment yield of each hydrological unit of the watershed.

According to the results of MPSIAC, the highest and lowest predicted values of soil loss were observed in hydrological units 13 and 10, respectively. The soil loss predicted for the whole watershed was 170.67 m3/km2/y. The higher soil loss in hydrological unit 13 could be related to its lithology, which comprises marl and anhydrite with a calcite layer in between. The high erodibility of marl and anhydrite makes this hydrological unit more susceptible to rill erosion. Hydrological unit 10, in contrast, is formed of calcite, a geological unit resistant to erosion, which explains its lowest erosion value.

Furthermore, the findings of this study showed that the ME model was successful in predicting the probability of rill erosion occurrence for both the training and validation datasets. AUC-ROC values above 0.70 (70%) indicate that the ESM produced in this study is reliable and that the approach could be implemented in other studies with similar conditions. The ME model is suitable for modelling natural phenomena such as rill erosion (Rahmati et al. 2016a). It should be noted that the ME machine learning algorithm does not require prior removal of outliers. Another capability of this model is that it can predict the occurrence of events that are complicated and have a nonlinear structure (Phillips et al. 2006; Phillips and Dudík 2008; Kornejady et al. 2017a).

In addition, the importance of the ECFs was assessed using the Jackknife test. Land use, slope percent, aspect, and SPI made a high contribution in the study area, whereas TWI, plan curvature, and soil made a low contribution. Two of the highly important factors, slope percent and SPI, strongly influence the erosive power of water flow in a watershed, which explains their high importance (Shrimali et al. 2001; Yesilnacar and Topal 2005; Sharma and Tiwari 2009; Conforti et al. 2011).

Finally, the findings of this study confirmed the acceptable performance of the ME model in producing a rill erosion susceptibility map, as validated by ROC curves for the training and validation datasets. The result is in agreement with the findings of Pourghasemi et al. (2017), who applied the ME model to predict the susceptibility to gully erosion in Iran. The ME model is a general-purpose machine learning model whose presence-only property can be considered a strong advantage in remote and inaccessible areas. This feature is especially important for soil erosion and landslide studies, since one cannot reject the possibility of erosion or landslide occurrence at a location even when the phenomenon has not been observed there (Kornejady et al. 2017a). However, as a main disadvantage of the ME model, an inadequate set of geo-environmental factors and a lack of attention to the erosion process can threaten the prediction accuracy. As another conclusion, land use, slope percent, aspect, and SPI made a high contribution in the study area, whereas TWI, plan curvature, and soil made a low contribution. The results of the MPSIAC and ME models show that, overall, the southern part of the study area has a higher susceptibility to rill erosion than the northern part, although highly susceptible areas also exist in some places in the north. This finding can be useful for soil erosion control and conservation plans in the studied area. A closer look at the results reveals that areas highly susceptible to rill erosion are often located in the fragmented forest and agriculture land use classes. Inappropriate wood logging and transportation strategies, as well as unsuitable agricultural plans and management, may have caused this relationship between land use and rill erosion susceptibility. Considering the acceptable performance of the ME model in this research, its use in other areas with different characteristics is suggested in order to validate this methodology for more general and wider application. Lastly, newer data mining models, and the coupling of statistical and data mining models, could be used to obtain better results in mapping rill erosion susceptibility.