3.1 Introduction

Gully erosion (GE) is the “erosion process whereby runoff water accumulates and often recurs in narrow channels and, over short periods, removes the soil from this narrow area to considerable depths” (Poesen et al. 2003). The channel formed by this process is generally called a “gully”. Because the phenomenon is geographically widespread, gullies are also known by a variety of local names, including Ravine (France), Uvrag (Russia), Wadi (Arabic), Nuallah or Cho (India), Hakisan Gaung (Malay, Malaysia), Carcava or Arroyo (Spain), Donga (South Africa), and Kwari (Hausa, Northern Nigeria) (Castillo and Gómez 2016; Maria and Nicolae 2017).

Gullies are commonly classified as “permanent” or “ephemeral”. A permanent gully is a wide and deep channel eroded by concentrated flow that removes the upland soil and parent material, and it cannot be eliminated through conventional tillage operations. Ephemeral gullies, by contrast, are formed by concentrated overland flow and can be remediated by conventional tillage operations (Garosi et al. 2019). Ephemeral gullies are smaller than permanent gullies but larger than rill systems. Unlike rills, ephemeral gullies form in the same location each season.

GE is a threshold phenomenon controlled by numerous factors: gullies form only after thresholds of runoff erosivity and soil erodibility have been crossed. In addition to rainfall, the erosive power of runoff depends on topography, which controls the discharge, concentration, and velocity of overland flow (Conoscenti et al. 2013). The morphology, density, and development of gullies in a particular area are also substantially regulated by the underlying bedrock (Poesen 2011). Gully incidence is further governed by soil resistance, which is dictated by soil properties such as texture, bulk density, moisture conditions, and organic matter content (Poesen et al. 2003). GE is also related to the type and stage of crop production, as well as to tillage direction and conservation practices (Li et al. 2016). In addition, several studies have reported that land use changes, intensification of farming activities, and overgrazing trigger gullies or increase GE rates (Conoscenti and Rotigliano 2020).

Once formed, gullies become major sediment sources and often cause environmental problems within their reach (on-site effects) and downstream (off-site effects). Generally, GE has three main consequences: (i) significant land degradation and loss of productive capacity, (ii) high sediment yields and sediment discharge, which can transport both nutrients and pollutants, and (iii) sedimentation of reservoirs (reducing their water storage capacity) and damage to infrastructure and transport routes. GE can also cause associated ecological problems such as eutrophication and accelerated desertification. Therefore, the prediction of gully erosion susceptibility is the first and most important step in averting the undesired effects of gullies and achieving sustainable development (Rahmati et al. 2017b).

According to Domínguez-Cuesta (2013), susceptibility reflects the condition of being weak or easily influenced. In natural hazard terms, susceptibility relates to the spatial dimension of hazards. It refers to the tendency of a region to experience the consequences of a hazardous process (e.g., floods, earthquakes, erosion) without considering either the moment of occurrence or the possible casualties and financial impacts. Thus, gully erosion susceptibility (GES) may be defined as the probability of spatial occurrence of GE based on the relationships between the distribution of past gullies and the factors that influenced their occurrence.

The prediction of GES involves several qualitative (knowledge-driven) and quantitative (data-driven) methods. The knowledge-driven methods depend on expert opinion. The most common of these methods simply use GE inventories to characterize sites with similar geographical properties that are susceptible to GE. Certain knowledge-driven approaches, however, incorporate ranking and weighting and may become semi data-driven in nature. Examples are the use of the analytic hierarchy process (AHP) of Saaty and Vargas (2012) by Arabameri et al. (2019b) and of weighted linear combination (WLC) by Sujatha and Sridhar (2019). AHP involves building a hierarchy of decision elements (factors) and then comparing all possible pairs in a matrix to obtain a weight for each element together with a consistency ratio. It rests on three principles: decomposition, comparative judgment, and synthesis of priorities. WLC combines maps of GE influencing factors by assigning a standardized score (primary-level weight) to each class of a particular parameter and a factor weight (secondary-level weight) to the parameters themselves. Because these approaches are partly subjective, their results vary with the knowledge of the experts (Arabameri et al. 2019c).
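The pairwise comparison step of AHP can be illustrated with a short Python sketch. It is not taken from the cited studies: the three-factor comparison matrix is hypothetical, the weights use the row geometric mean approximation rather than the exact principal eigenvector, and the random index values are those commonly tabulated for Saaty's method.

```python
from math import prod

# Saaty's random consistency index (RI) for matrix orders 1 to 10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric mean method."""
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_ratio(matrix, weights):
    """Return CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]

# Hypothetical judgments for three factors (for example slope, rainfall, land use).
comparisons = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
weights = ahp_weights(comparisons)
cr = consistency_ratio(comparisons, weights)
```

Because this example matrix is perfectly consistent, the weights come out in the ratio 6:3:1 and the consistency ratio is essentially zero. With real expert judgments, a consistency ratio below about 0.1 is conventionally regarded as acceptable.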

Data-driven approaches rely on statistical description of the relationship between GE and influencing factors. They statistically assess the combination of the influencing factors that are more closely related to the spatial distribution of existing gullies. Therefore, according to the prevailing terrain conditions (e.g., slope angle, slope shape, lithology, land cover) it is possible to quantitatively predict the likelihood of future gully occurrence even in non-gully affected areas. In this process, data from past and present gullies are used to evaluate the relative importance of the influencing factors and respective classes. Data-driven GES models are sustained by three major assumptions: (i) gullies can be recognized, classified, and mapped; (ii) GE influencing factors can be identified, registered, and used to build GE predictive models; and (iii) future GE occurrence can be spatially predicted (Arabameri et al. 2019c; Zêzere et al. 2017). In other words, following the Uniformitarianism principle, the past and the present are considered keys to the future, hence it is assumed that future gullies are more likely to occur under the same geologic and geomorphologic conditions that led to past gullies.

Data-driven methods can be grouped into bivariate statistical analysis (BSA) and multivariate statistical analysis (MSA) (Lucà et al. 2011). BSA compares each influencing factor independently with the GE distribution. Weights are assigned to the GE influencing factors based on gully density, using methods such as information value and weights of evidence. Bivariate statistical models do not consider the interdependence of predictive variables, which is a major drawback of the method (Arabameri et al. 2019c; Zêzere et al. 2017). MSA evaluates the combined relationship between the dependent variable and a set of independent variables.
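To make the density-based weighting of BSA concrete, the following Python sketch computes an information value for each class of a single factor, using the common formulation in which the weight is the natural logarithm of the class gully density divided by the overall gully density. The cell counts and class names are hypothetical, and the formulation is an assumption, since the text does not specify one.

```python
from math import log

def information_value(gully_cells_in_class, cells_in_class, gully_cells_total, cells_total):
    """ln(class gully density / overall gully density); None if the class has no gullies."""
    if gully_cells_in_class == 0:
        return None  # the logarithm is undefined for a class without gullies
    class_density = gully_cells_in_class / cells_in_class
    overall_density = gully_cells_total / cells_total
    return log(class_density / overall_density)

# Hypothetical cell counts for three slope classes: (gully cells, all cells).
slope_classes = {"gentle": (10, 500), "moderate": (30, 300), "steep": (60, 200)}
gully_total = sum(g for g, _ in slope_classes.values())
cells_total = sum(c for _, c in slope_classes.values())
scores = {name: information_value(g, c, gully_total, cells_total)
          for name, (g, c) in slope_classes.items()}
```

A positive score marks a class in which gullies are denser than average, a negative score one in which they are sparser. Each factor is scored on its own, which is exactly why the method cannot account for interdependence among factors.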

Numerous MSA methods exist, but those commonly used to predict GES include linear and quadratic discriminant analyses and logistic regression. Linear and quadratic discriminant analyses were used by Arabameri and Pourghasemi (2019) to predict GES in Iran, and the approach was also reported to be effective for defining GES classes in the Pathro River Basin of India (Gayen et al. 2019). Logistic regression has been applied to susceptibility mapping by various researchers, including Lucà et al. (2011), Conoscenti et al. (2014), Reza and Ronak (2015), Arabameri et al. (2018), Razavi-Termeh et al. (2020), and Arabameri et al. (2020c), and it is the most widely used MSA approach. Compared with other MSA methods, its independent variables do not need to be linearly related or normally distributed, and they can be categorical, continuous, or a mixture of both. In addition, logistic regression can be combined effectively with other statistical methods and/or knowledge-driven methods to improve GES prediction. Hence, the main objective of this study is to extend the application of logistic regression as a data-driven technique for GES prediction. The method was applied in South Gombe State, Nigeria, where land development is restricted by intense GE arising from the semi-arid climate and physiographic conditions. A thorough review of the literature to date shows that, despite the high sensitivity of this area to GE, no comprehensive study has been carried out to identify the areas that are particularly susceptible to it.

3.2 Materials and Methodology

3.2.1 The Case Study: South Gombe State (SGS), Nigeria

SGS occupies about 8023 km² and is located within Gombe State, Nigeria. It is bounded by latitudes 9°30′ and 10°20′ N and longitudes 10°40′ and 11°50′ E (Fig. 3.1). SGS has a semi-arid climate marked by two distinct seasons: a rainy season from April to October and a dry season from November to March. The average annual temperature is 27 °C, while the average annual rainfall is about 835 mm, much of it falling between June and September. The rains come as intense, violent showers of short duration, especially at the beginning and end of the rainy season. The early rains of April, which come just after the dry season, are very effective in the gullying process. The low relative humidity characteristic of the dry season leaves the soil surface dry and cracked at various points. These cracks, together with human features such as footpaths, are rapidly exploited by runoff from the storms of the early rains and greatly favor the inception and subsequent evolution of gullies. Rain in the area may fall continuously for two, three, or more hours, although most of it comes during the first 40 min. This type of rainfall causes considerable damage in a relatively short time, especially where the soil is bare or only partially covered by vegetation that cannot protect it from the erosive impact of the rain.

Fig. 3.1 Location of the study area and gully landforms distribution

The geology of the area comprises the crystalline basement and Cretaceous sedimentary formations (Ikusemoran et al. 2018) (Table 3.1). The soils correlate with the underlying geology and are mostly sandy, low in organic matter, and characterized by low water-holding capacity. The only exception is the wetland (fadama) soils, which are fine-textured with higher organic matter content and relatively higher water-holding capacity (Ikusemoran et al. 2016). Geomorphologically, SGS is characterized by complex terrain dominated by rugged hills of granite and sandstone, volcanic plugs, a sedimentary and volcanic plateau, and low, swampy plains, with elevation ranging from 200 m (a.s.l.) in the central parts to about 1170 m (a.s.l.) in the southern section. Land use in the area is characterized by livestock grazing, rain-fed farming, and dry season fadama (market gardening) cultivation.

Table 3.1 Geologic formations, lithological characteristics, and age of rock materials

3.2.2 Methodology

3.2.2.1 Spatial Data Used for the Study

The data used in this study were collected by integrating field data, remote sensing (RS) data, and geographic information system (GIS) applications. GIS data processing can produce GE maps at low cost and with acceptable accuracy. GIS also allows fast and easy representation and analysis of spatial data and can generally incorporate information layers from diverse sources (Arabameri et al. 2020d). Thus, the base data used in this study were satellite imagery archived by Google Earth, Landsat 8 images (spatial resolution 30 m), soil and geological maps (scale 1:100,000), and a Digital Elevation Model (DEM) with a spatial resolution of 20 m. Table 3.2 presents a summary of the data types used in the present study.

Table 3.2 Summary of data used for the study

3.2.2.2 Spatial Analytical Process

Figure 3.2 displays the flowchart of the approach followed in this study, which consists of the following steps: (i) spatial data collection, which involves the identification and extraction of permanent GE features (gully inventory data) and the assembly of GE influencing factors (GEIFs); (ii) spatial data analysis, comprising a check of independence (multicollinearity) among the GEIFs and random partitioning of the gully inventory data; (iii) susceptibility spatial prediction, which involves determining the spatial relationship between GE and the GEIFs, estimating the probability of gully occurrence, and generating the GE susceptibility map; and (iv) validation of the results.

Fig. 3.2 Flow chart of data and procedures followed for the study

3.2.2.3 Spatial Data Collection

As described above, GE susceptibility analysis assumes that past and present gullies are key to understanding the occurrence of future gullies. The acquisition and construction of a GE inventory is therefore an important step in geomorphological analyses and stochastic modeling such as the present study. In this study, gully landforms (Fig. 3.3) were identified through the interpretation of Google Earth images taken in 2018 and comprehensive field surveys conducted in early 2019, during which a global positioning system (GPS) was used to record the coordinates of gullies. The gullies were mapped as polygons and later converted to points located at the head-cut of each gully. An equal number of non-gully point locations were randomly selected and combined with the gully locations.

Fig. 3.3 Examples of identified gullies and their effects on (a i, ii) agricultural land and (b i, ii) infrastructure in the study area

As previously mentioned, GES is regulated by a variety of factors. However, it is difficult to use all factors at once (Arabameri et al. 2019a, b, c), so it is important to pick the key factors that could theoretically affect GE in a specific area. Although there is no standardized guiding theory for the selection of factors (Arabameri et al. 2018), a comprehensive literature review, the local environmental conditions in the study area, and multicollinearity checks were used to select 14 factors: rainfall, aspect, slope angle (SA), length of slope (LS), elevation, plan curvature (PC), topographic wetness index (TWI), stream power index (SPI), drainage density (DD), distance from stream (DS), distance from road (DR), geology, soil texture (ST), and land use (LU). After the GEIFs have been selected and analyzed, their numerical values must be classified, both for effective modeling and so that they can be understood as clearly as possible when represented on a map. For this reason, existing GIS packages are equipped with several methods for classifying values automatically. The commonly used ones include the quantile, equal interval, Jenks natural breaks, and geometrical interval classifications (Osaragi 2019). Those suitable for the data used in this study were applied.

3.2.2.4 Spatial Data Analysis

3.2.2.4.1 Partitioning of GE Inventory Data

Before its implementation, the logistic regression model must be trained and tested with two separate samples of data. Earlier research on logistic regression shows that there is no clear rule of thumb for the percentages of training and testing samples. Nevertheless, it has been suggested that the percentage for testing should be inversely proportional to the square root of the number of free adjustable parameters (Abdulkadir et al. 2020). Consequently, using unsupervised filtering procedures to avoid replications, the GE inventory data were divided into 70% training and 30% testing samples. This is consistent with the percentages used in related studies (Abdulkadir et al. 2020; Arabameri et al. 2020a; Rahmati et al. 2017a).
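A 70/30 partition of a balanced inventory can be sketched in Python as follows. This is only an illustration: the unsupervised filtering procedure used in the study is not specified here, so a plain seeded shuffle stands in for it, and the point identifiers, the function name `split_inventory`, and the seed are hypothetical.

```python
import random

def split_inventory(points, train_fraction=0.7, seed=42):
    """Shuffle labeled points and split them into training and testing samples."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical inventory: 130 gully head-cut points (label 1) and an equal
# number of randomly placed non-gully points (label 0), identified by an index.
inventory = [(i, 1) for i in range(130)] + [(i, 0) for i in range(130, 260)]
train, test = split_inventory(inventory)
```

With 260 points this yields 182 training and 78 testing points, and no point appears in both samples. A stratified split, which keeps the gully to non-gully ratio identical in both samples, would be a reasonable refinement.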

3.2.2.4.2 Extraction of GEIFs

Table 3.3 presents the methods of extraction and classification of GEIFs selected for the study. A Digital Elevation Model (DEM) with a spatial resolution of 20 m was produced from 1:50,000-scale topographic contour maps, and used for extraction of topographical factors: elevation, SA, LS, SPI, TWI, DD, DS, DR, aspect, and PC.

Table 3.3 GEIFs and methods used for their extraction and classification

Fig. 3.4 GEIF maps: a rainfall, b elevation, c SA, d LS, e SPI, f TWI, g DD, h DS, i DR, j aspect, k PC, l geology, m ST, n LU

Elevation plays a significant role in determining vegetation cover type and, to a large extent, precipitation characteristics (Gómez-Gutiérrez et al. 2015). Consequently, GE may occur at different elevations, depending on the initiating and developing mechanisms. SA plays a significant role in the dynamics of the processes controlling landscape development; it primarily affects surface runoff, DD, and soil erosion (van der Meij et al. 2017). According to Conforti et al. (2010), steep SA facilitates high runoff velocity and the consequent initiation of rills and gullies. Hence, SA is crucial for the prediction of GES. LS combines slope length (L) and steepness (S), of which L controls sediment detachment and generation, and S controls the movement of these sediments in response to intense rainfall and related runoff (Pradeep et al. 2014). It is a key variable used in GES assessment. Theoretically, areas with high LS values are prone to GE activity (Conoscenti et al. 2014; Lucà et al. 2011).

Stream power is an important factor considered in the assessment of GES. It is the potential for flowing water to perform geomorphic work and is used to measure the erosive power of water flow based on the assumption that discharge is proportionate to the catchment area. The index SPI is one of the main factors controlling slope erosion processes since the erosive power of running water directly influences slope toe erosion and river incision (Conforti et al. 2010). It is also indicative of the potential energy available to entrain sediment so that areas with high SPI have a great potential for erosion.

TWI relates local topographic slope to the upslope contributing area at any given location within a watershed. It is commonly employed as a proxy for the potential for surface and subsurface water accumulation due to runoff and lateral transmissivity (Raduła et al., 2018). Since the dynamics, location, and size of saturated source areas are controlled by subsurface hydraulic characteristics and watershed topography, TWI is frequently used to model GES. Technically, areas with larger upslope drainage areas and shallower slopes will produce larger TWI values, indicating a higher propensity for GE.
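The two indices can be illustrated with their most common formulations, TWI = ln(As / tan β) and SPI = As · tan β, where As is the specific catchment area and β is the local slope angle. The Python sketch below assumes these formulations; the exact expressions used in this study are listed in Table 3.3 and may differ (for example, some authors take the logarithm of SPI). The cell values and the flat-cell floor are hypothetical.

```python
from math import log, radians, tan

MIN_TAN_SLOPE = 1e-6  # hypothetical floor that avoids division by zero on flat cells

def topographic_wetness_index(specific_catchment_area, slope_degrees):
    """TWI = ln(As / tan(beta)), with As in m^2 per m and beta the local slope."""
    tan_beta = max(tan(radians(slope_degrees)), MIN_TAN_SLOPE)
    return log(specific_catchment_area / tan_beta)

def stream_power_index(specific_catchment_area, slope_degrees):
    """SPI = As * tan(beta)."""
    return specific_catchment_area * tan(radians(slope_degrees))

# Two hypothetical cells with the same contributing area but different slopes.
gentle = topographic_wetness_index(500.0, 2.0), stream_power_index(500.0, 2.0)
steep = topographic_wetness_index(500.0, 30.0), stream_power_index(500.0, 30.0)
```

For the same contributing area, the gentler cell has the higher TWI (water tends to accumulate) while the steeper cell has the higher SPI (flow has more erosive power), which is why the two indices are treated as separate factors.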

DD, also known as stream density, is the ratio of the total length of streams or channels (in km) to the size of the area being studied (in km²). DD is a useful numerical measure of landscape dissection and runoff potential. Hypothetically, a high DD reflects a highly dissected drainage basin with a relatively rapid hydrological response to rainfall events, while a low DD indicates a poorly drained basin with a slow hydrological response (Charizopoulos et al. 2019). The critical value of DD above which soil erosion by water may occur is 0.90 km per km² (Surjit et al. 2015).

Gullies are generally connected to the stream network of an area, enabling the removal of the materials eroded from upland areas (Conoscenti et al. 2014). Joshi et al. (2019) observed that first- and second-order streams behave like gullies and accelerate soil erosion. These streams are normally located at higher elevations with steeper slopes, where conditions are conducive to soil erosion. Further channel incision in those streams leads to their expansion and to the initiation of gullies in the nearby non-incised surface. For this reason, DS is recognized as a factor of gully development, with the assumption that areas closer to existing first- and second-order streams are more susceptible to GE (Dube et al. 2014; Zakerinejad and Maerker 2015).

Roads are important in rural and urban development; however, they have a significant impact on gully occurrence and distribution (Pourghasemi et al. 2017). Roads induce concentration of surface runoff, divert concentrated runoff to other catchments, and increase catchment size, which eventually leads to gully development (Conoscenti et al. 2014). Consequently, the spatial distribution of road networks was examined.

Aspect is a topographical attribute that is considered crucial in GES assessment. It is the direction that a slope faces, expressed in degrees clockwise from north and ranging from 0 to 360. A value of negative one (−1) is used to indicate flat surfaces such as flood plains and fluvial terraces. According to Conoscenti et al. (2014) and Rahmati et al. (2016), aspect indirectly influences GE processes through its relationship with the duration of sunlight exposure, evapotranspiration, moisture retention, and the type and distribution of vegetation cover on slopes.
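As a small illustration of this convention, the Python sketch below maps an aspect value to one of eight compass classes plus a flat class. The eight 45° sectors are an assumption made for the example; the classes actually used in this study are those given in Table 3.3.

```python
COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def aspect_class(aspect_degrees):
    """Map an aspect in degrees (clockwise from north) to a compass class."""
    if aspect_degrees < 0:
        return "Flat"  # -1 marks flat surfaces
    # Each sector spans 45 degrees and is centered on its compass direction.
    sector = int(((aspect_degrees % 360) + 22.5) // 45) % 8
    return COMPASS[sector]
```

For example, `aspect_class(350)` and `aspect_class(0)` both fall in the north-facing class, because that sector straddles the 0°/360° boundary.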

PC is described as the curvature of a contour line formed by intersecting a horizontal plane with the surface. The term curvature technically denotes the rate of change of SA or aspect, usually in a particular direction (Conforti et al. 2010). PC influences slope erosion processes through the convergence or divergence of water during downslope flow. In assessing PC, positive values describe convexity, while negative values characterize concavity of the slope. PC values around zero indicate that the surface is flat.

Rainfall is a central factor that drives soil erosion and gully development through its potential ability (erosivity) to disintegrate soil aggregates and transport them downslope (Nearing et al. 2017). Rainfall erosivity is a function of the physical characteristics of rainfall (intensity and duration). These aspects, together with rainfall amount, drop size distribution, terminal velocity, and extraneous factors such as wind velocity and slope angle, determine rainfall erosivity (Rutebuka et al. 2020). In this study, rainfall erosivity was analyzed using annual rainfall distribution data from weather stations in SGS.

Geological features and the weathering properties of material exposed at or close to the earth's surface affect GE. Both soft and hard rocks are subject to degradation through fluvial erosion, mass wasting, soil creep, and landslides. Soft rocks are generally more susceptible to soil erosion than hard rocks. The effectiveness of degradation is a function of the degree of consolidation or cementation of the different sedimentary types. GE in areas underlain by rocks of the Basement Complex, including younger and older granites, usually occurs along modern and ancient water channels that were earlier covered by alluvial deposits. Small-scale gullies produced by fluvial action are generally associated with Basement Complex areas. Weathered zones of Basement Complex terrain are strewn with incipient and minor gullies running down slopes and oriented in all directions.

The susceptibility of soil to GE depends on its characteristics, which are closely associated with the geologic setting and contribute significantly to soil infiltration, runoff rate, soil resistance to erosion, and gully occurrence (Rahmati et al. 2016). ST, organic content, structure, and permeability have been shown to influence soil erodibility. In particular, ST is strongly correlated with soil erosion because it controls the erodibility and cohesiveness of the soil (Pal 2015). Thus, ST was assessed for the present study.

LUs are among the major environmental factors regulating hydrological and geomorphological processes by controlling overland flow, runoff generation, and sediment dynamics. Barren land and sparse rangeland are more susceptible to erosion than pastures and forests with dense cover, where the vegetation greatly reduces the erosive action of surface runoff. In fact, there is a negative association between the rate of erosion and the density of vegetation (Hayas et al. 2017). Figure 3.4 shows the 14 selected factors used for the study.

3.2.2.4.3 Multicollinearity Scrutiny

Multicollinearity is a statistical phenomenon that describes interdependency conditions among multiple independent variables. In other words, it is the lack of independence which is indicated by high intercorrelations among a set of variables. In GES assessment, numerous methods have been used to test for multicollinearity. However, the variance inflation factor (VIF) and tolerance (TOL) are frequently used for this purpose (Cama et al. 2017). Hence, the VIF and TOL are used in this study. A VIF ≥ 10 or TOL ≤ 0.1 indicates serious multicollinearity (Guo-Liang et al. 2017).
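The two diagnostics can be computed from the correlation matrix of the factors, because the VIF of factor j equals the j-th diagonal element of the inverse correlation matrix and TOL is its reciprocal. The Python sketch below is a minimal illustration under that identity, not the software routine used in the study; the two sample columns and the function names are hypothetical.

```python
from math import sqrt

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix: perfect collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif_and_tolerance(columns):
    """Return one (VIF, TOL) pair per column, with TOL = 1 / VIF."""
    inverse = invert(correlation_matrix(columns))
    return [(inverse[j][j], 1.0 / inverse[j][j]) for j in range(len(columns))]

# Two hypothetical factors whose correlation is 0.8.
diagnostics = vif_and_tolerance([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]])
```

For these two columns the correlation is 0.8, so each factor has a VIF of about 2.78 and a TOL of 0.36, which would pass both thresholds quoted above.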

3.2.3 Susceptibility Spatial Modeling Using MLR

3.2.3.1 Determination of Spatial Relation Between GE and GEIFs

The GEIFs have been assessed and their independence confirmed. However, the relationship between the GEIFs and GE still needs to be determined in order to identify the most influential factors and successfully predict GES. To achieve this, forward stepwise logistic regression was employed. Stepwise logistic regression is most often used where the important independent variables are not known and their associations with the outcome are not well understood. It involves estimating the model with each variable entered in turn and examining the change in the log-likelihood when each variable is added. If the observed significance level (Wald test) of a variable is less than the probability threshold for entering the model (0.05 in this study), the variable is entered and the model statistics are recalculated to see whether any other variables are eligible for entry. The final model excludes all insignificant independent variables, and coefficients are allocated to the independent variable classes according to their correlation with the gully training data set. If a coefficient is positive, its exponentiated value (the odds ratio) is greater than one, meaning that the event is more likely to occur. If a coefficient is negative, the odds ratio is less than one, and the odds of the event occurring decrease.
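The interpretation of coefficient signs can be checked with a few lines of Python. The sketch assumes that each class coefficient is expressed relative to a baseline, so that exponentiating it gives an odds ratio; the two coefficient values are those reported in the results of this chapter for one rainfall class and one elevation class, while the function names are illustrative.

```python
from math import exp

def odds_ratio(coefficient):
    """Exponentiate a logistic regression coefficient to obtain an odds ratio."""
    return exp(coefficient)

def direction(coefficient):
    """Describe how a class changes the odds of gully occurrence."""
    ratio = odds_ratio(coefficient)
    if ratio > 1:
        return "increases odds"
    if ratio < 1:
        return "decreases odds"
    return "no change"

# Coefficients reported in the results for one rainfall class and one elevation class.
rainfall_ratio = odds_ratio(2.261)    # roughly 9.6
elevation_ratio = odds_ratio(-0.299)  # roughly 0.74
```

A coefficient of 2.261 therefore multiplies the odds of gully occurrence by roughly 9.6, whereas a coefficient of −0.299 reduces them to about three quarters of the baseline.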

3.2.3.2 Estimation of Probability of Gully Occurrence

The principle of logistic regression rests on the analysis of a problem in which a result measured with a dichotomous variable, such as 0 and 1 or true and false, is determined from one or more independent factors. In gully erosion susceptibility mapping, the goal of logistic regression is to find the best-fitting model to describe the relationship between the presence or absence of gullies (dependent variable) and a set of independent parameters. Logistic regression generates the model statistics and the coefficients of a formula that predicts a logit transformation of the probability that the dependent variable is 1 (probability of occurrence of a gully event). Thus, in this study, binary logistic regression estimates the probability (P) of the occurrence of a gully through the formula:

$$P=\frac{1}{1+{e}^{-Z}}$$
(3.5)

where P is the probability of a gully occurring and Z is a value from −∞ to +∞, defined by the following equation:

$$Z={b}_{0}+{b}_{1}{X}_{1}+{b}_{2}{X}_{2}+\cdots+{b}_{n}{X}_{n}$$
(3.6)

where b0 is the intercept (constant) of the logistic regression model, n is the number of independent variables, and b1, b2, …, bn are coefficients that measure the influence or contribution of the independent variables (X1, X2, …, Xn). In addition to the model statistics and coefficients, the outcome of the logistic regression process was obtained in ArcGIS as a predicted probability map with values confined between 0 and 1. The predicted probability values were subsequently reclassified into five susceptibility classes using the Jenks natural breaks classification method to produce the final gully erosion susceptibility map.
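Equations 3.5 and 3.6 translate directly into code. The Python sketch below evaluates them for a single map cell; the intercept, coefficients, and factor values are hypothetical and are not the fitted values of this study.

```python
from math import exp

def linear_predictor(intercept, coefficients, values):
    """Z = b0 + b1*X1 + ... + bn*Xn (Eq. 3.6)."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))

def gully_probability(z):
    """P = 1 / (1 + e^(-Z)) (Eq. 3.5), written to avoid overflow for large |Z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

# Hypothetical intercept, coefficients, and factor values for a single map cell.
b0 = -1.5
b = [2.0, -0.5, 0.8]
x = [1.0, 1.0, 0.0]
z = linear_predictor(b0, b, x)   # -1.5 + 2.0 - 0.5 + 0.0 = 0.0
p = gully_probability(z)         # 0.5 when Z is exactly 0
```

Applying the same two functions to every cell of the factor rasters would give a probability surface bounded by 0 and 1, which can then be reclassified into susceptibility classes.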

3.3 Validation of the GESM

Validation is the task of demonstrating that the model is a suitable representation of the actual system: that it reproduces system behavior with enough reliability to satisfy analysis objectives (Rahmati et al. 2017b). Without validation, the prediction model and image are ineffective and lack scientific significance. As a result, the final users would not be sure of the model outputs and are unlikely to use them for planning and decision-making. Normally, the validation of predictions is based on the comparison between the prediction results and the unknown target pattern, the areas affected by future gully erosion. The unknown target pattern is usually represented by a part of the known gully pattern. Usually, the comparison of the model results and observed data is represented through a confusion matrix (Table 3.4).

Table 3.4 Confusion matrix used for the evaluation of models

In Table 3.4, TP (true positive) and TN (true negative) are the numbers of gully and non-gully locations that are correctly classified, whereas FN (false negative) and FP (false positive) are the numbers that are incorrectly classified. Sensitivity is the proportion of gully locations that are correctly classified, and specificity is the proportion of non-gully locations that are correctly classified. Several approaches, such as Efficiency, the Kappa coefficient, the Seed Cell Area Index (SCAI), and the Area Under the Receiver Operating Characteristic curve (AUROC), have been used for validating gully erosion susceptibility maps (GESMs). In this study, the AUROC was used. The ROC curve is a graphical plot that illustrates the discriminating ability of a model as its discrimination threshold is varied (Vakhshoori and Zare 2018). It is created by plotting sensitivity on the y-axis against 1 − specificity (the false positive rate) on the x-axis. The two parameters are derived as

$$\mathrm{Sensitivity}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$
(3.7)
$$\mathrm{Specificity}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}$$
(3.8)

The shape of the ROC curve indicates the predictive performance of the model: the closer the curve is to the upper left corner, the higher the predictive performance. The highest possible value, AUROC = 1, represents 100% specificity and 100% sensitivity. Furthermore, AUROC values of <0.6 indicate a poor, 0.6–0.7 a moderate, 0.7–0.8 a good, 0.8–0.9 a very good, and >0.9 an excellent model performance (Jiang 2020; Vakhshoori and Zare 2018).
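The validation measures can be sketched in plain Python as follows. The example is illustrative only: the four labeled points and their predicted probabilities are hypothetical, the AUROC is computed through its equivalent pairwise-ranking form rather than by integrating a plotted curve, and the assignment of values lying exactly on a class boundary is an assumption, since the quoted ranges do not specify it.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity (Eq. 3.7) and specificity (Eq. 3.8) at one probability threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auroc(labels, scores):
    """Area under the ROC curve as the share of gully/non-gully pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def performance_class(auc):
    """Verbal rating following the thresholds quoted in the text."""
    if auc > 0.9:
        return "excellent"
    if auc > 0.8:
        return "very good"
    if auc > 0.7:
        return "good"
    if auc >= 0.6:
        return "moderate"
    return "poor"

# Hypothetical testing sample: 1 = gully, 0 = non-gully, with predicted probabilities.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
auc = auroc(labels, scores)                               # 3 of 4 pairs ranked correctly
sens, spec = sensitivity_specificity(labels, scores, 0.5)
```

In this toy sample three of the four gully/non-gully pairs are ranked correctly, giving an AUROC of 0.75, which falls in the "good" class.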

3.4 Results and Discussion

3.4.1 Gully Erosion Inventory

The interpretation of Google Earth imagery and the field investigations confirmed that the study area is indeed affected by GE. A total of 130 critical gully erosion sites, covering 34,499.2 ha or 4.3% of the study area, were identified and classified, based on their position in the landscape and their morphological, hydraulic, and geometric characteristics, into three main types of permanent gullies: continuous (35%), discontinuous (55%), and bank (10%). Generally, the gullies are incisions with near-vertical banks and are mostly large. Gully depth ranges from 0.9 to 12.2 m, less frequently up to 20 m, and exceptionally even more. The length reaches a maximum of 10,857 m for continuous, 497.2 m for discontinuous, and 1092.3 m for bank gullies. Cross sections of the gullies are mostly U-shaped, although V-shaped gullies are also found. Spatially, in the northern parts of the study area, where the Gombe and Kerikeri geologic formations are extensively exposed, gully channels are mostly wide and U-shaped and lack vegetation cover on the side slopes, signifying an active gullying stage. In the mountainous southern part, narrow, V-shaped gullies are common, with some vegetation growing on the slopes, indicating stabilization. However, the presence of falls in some of the channels indicates gully rejuvenation. Table 3.5 presents the morphometric characteristics of the identified permanent gullies in the study area.

Table 3.5 Morphometric characteristics of the identified permanent gullies

3.4.2 Multicollinearity Scrutiny

The results of the multicollinearity scrutiny (Table 3.6) show that the highest VIF is 4.854 and the lowest tolerance is 0.206. Therefore, there is no problematic collinearity among the factors, which allows all of them to be included in the final gully erosion susceptibility modeling process.
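Tolerance is the reciprocal of VIF, so the two reported extremes describe the same factor. The sketch below checks that relationship for the reported values. The cut-offs `VIF_LIMIT` and `TOLERANCE_LIMIT` are a commonly quoted rule of thumb and are an assumption here, since this section does not state which thresholds were applied.

```python
def tolerance_from_vif(vif: float) -> float:
    """Tolerance is defined as 1 / VIF."""
    if vif <= 0:
        raise ValueError("VIF must be positive")
    return 1.0 / vif


# Values reported in the text.
highest_vif = 4.854
lowest_tolerance = tolerance_from_vif(highest_vif)

# Assumed rule-of-thumb limits (not taken from this section).
VIF_LIMIT = 10.0
TOLERANCE_LIMIT = 0.1

collinearity_flagged = highest_vif > VIF_LIMIT or lowest_tolerance < TOLERANCE_LIMIT

if __name__ == "__main__":
    print(f"tolerance = {lowest_tolerance:.3f}, flagged = {collinearity_flagged}")
```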

Table 3.6 Multicollinearity test results for the relationship among factors

3.4.3 Susceptibility Spatial Prediction

3.4.3.1 Spatial Relationships Between GE and GEIFs

The results of the spatial relationship between GE and each GEIF obtained from the forward stepwise logistic regression are displayed in Fig. 3.5.

Fig. 3.5

Spatial relationships between GE and each class of GEIF a rainfall, b elevation, c SA, d LS, e SPI, f TWI, g DD, h DS, i DR, j Aspect, k PC, l geology, m ST, n LU

According to Fig. 3.5a, all rainfall classes have a positive relationship with GE occurrence, since all the coefficient (β) values assigned to the classes are positive. However, the rainfall classes 1001–1050 and >1100, with β values of 2.261 and 1.183, exhibited a strong positive relationship compared with the first two classes, which showed only a weak positive association. The results suggest that the propensity for GE occurrence in the study area increases with rainfall amount. The result is in harmony with the findings of Conforti et al. (2010) in the Turbolo catchment, Italy; Rahmati et al. (2016) in the Chavar region, Iran; and Makaya et al. (2019) in the upper uMgeni catchment in KwaZulu Natal, South Africa, who also showed that GE development generally occurs in regions with high rainfall.

Regarding elevation, the regression analysis indicated a positive but weak relationship with GE activity. Fig. 3.5b shows that the probability of GE occurrence increases with elevation up to the 300–500 m class (β = 0.369), after which the relationship reverses and eventually becomes negative in the >700 m class (β = −0.299). This result agrees with the findings of numerous studies (Alireza et al. 2019; Arabameri et al. 2020d, 2019a; Rahmati et al. 2017b), which found that lower elevations are most susceptible to GE.

In the case of SA, a positive spatial correlation exists between gully formation and areas with SA below 30°. This is confirmed by the positive β values of 1.387, 0.872, and 0.454 assigned to the <5, 5–10, and 11–30 classes, respectively (Fig. 3.5c). Conversely, for SA classes above 30°, β values were negative, indicating weak relationships and a low probability of gully occurrence. This result is in agreement with Lazarus (2012); Adediji et al. (2013); Rahmati et al. (2016), (2017a); Debanshi and Pal (2018); and Arabameri et al. (2020d), who found most gullies occurring on lower slopes. According to these studies, a possible reason is that lower SAs have greater soil depth, and that intense rainfall impact and greater runoff from upslope decrease soil strength, resulting in the development and extension of the gully channels.

The influence of LS on GE, shown in Fig. 3.5d, revealed that the probability of gully occurrence increased with increasing LS. The highest probability was found where LS was 6–10 m (β = 1.262). Immediately beyond this maximum, however, the probability of hosting a gully decreased abruptly. The reason for this sudden decline is unclear, but it might relate to land use, vegetation, and other factors in the area. The result of this study confirms the findings of Bagio et al. (2017) and Zabihi et al. (2018) that a direct relationship exists between LS and GE. Their results also implied that the higher the LS, the higher the probability of GE occurrence, owing to increased runoff velocity and a decreasing detachment and transport threshold of soil particles.

Concerning SPI, a significant positive association exists with GE. As shown in Fig. 3.5e, a strong positive correlation was found in the classes 51–100 (β = 3.456) and 101–200 (β = 1.356), while the remaining three classes had a low positive influence on GE. Overall, the results indicate that the probability of hosting a gully increases with SPI. This outcome is supported by the works of Shit et al. (2015); Rahmati et al. (2017a); Amiri et al. (2019); and Domazetović et al. (2019), who also observed that higher SPI values have higher erosion potential and often coincide with the appearance of soil removal and GE occurrence.

The relationship between TWI and GE occurrence follows an ascending pattern, with the lowest class (<10) assigned the minimum β value (0.070) and the highest class (>18) the maximum β value (3.112) (Fig. 3.5f). Thus, the result suggests that GE susceptibility intensifies with an increase in TWI. This result is similar to that of Arabameri et al. (2020b) in the Chah Mousi watershed in Semnan province, Iran, who found that gully formation in the watershed is particularly favored in areas with high TWI values, representing zones of saturation with high surface soil water along drainage paths. These saturated areas favor gully formation since the surface soils lose their strength as they become wet. Also, Zabihi et al. (2018) reported in their study in the Mazandaran province of Iran that the greater the TWI, the greater the potential for gully occurrence. High values of TWI increase the filtration rate and provide the conditions for piping and roof collapse, resulting in the development of gully tunnels and, eventually, the appearance of gullies on the surface.

Considering DD, the link with GE follows an ascending pattern similar to that between TWI and GE. Positive relationships are observed in the >1.5 (β = 4.612), 1.1–1.5 (β = 1.759), and 0.5–1.0 (β = 0.096) classes, while the class <0.5 (β = −0.025) exhibited a negative relationship (Fig. 3.5g). This means that susceptibility to GE increases with an increase in DD. Conoscenti et al. (2014); Dewitte et al. (2015); Rahmati et al. (2017a); Azareh et al. (2019); and Arabameri et al. (2020b) reported similar results.

The analysis of the association between DS and GE showed that GE increases as DS decreases (Fig. 3.5h). In other words, locations less than 500 m from a stream were more susceptible to GE. This finding conforms with reports by Dube et al. (2014) and Conoscenti et al. (2014) that areas close to a stream are more prone to gully development than areas far from it.

As with DS, the relationship between GE and DR showed that the nearer a site is to a road, the higher the potential for GE. Distances of less than 500 m from a road were positively correlated with gully locations (Fig. 3.5i), which highlights the role of road development and disturbance of ground surfaces in promoting landscape degradation. This result is in consonance with the findings of previous works (Nyssen et al. 2002; Rahmati et al. 2017a; Seutloali et al. 2016) in that, when the distance from a road increases, the probability of GE occurrence decreases.

For aspect, a positive correlation exists for east (β = 2.737), southeast (β = 0.811), northeast (β = 0.644), south (β = 0.418), north (β = 0.225), and southwest (β = 0.087) facing slopes, while west, northwest, and flat aspects exhibited negative relationships (Fig. 3.5j). This result implies that slopes facing from north clockwise through southwest have a greater propensity to host gullies than west-facing slopes. This is in agreement with Zabihi et al. (2018), who reported that eastward aspects in the Mazandaran province of Northern Iran are more susceptible to GE. This is so because eastward-facing slopes get more solar radiation in the northern hemisphere and, as a result, experience more evaporation, higher soil porosity (total pore space), lower soil strength, and lower vegetation density.

The β values linking GE and the PC classes revealed an all-positive relationship (Fig. 3.5k). The concave class exhibited the strongest relationship, with a β value of 0.287, followed by the flat class (0.083) and the convex class (0.021). This result is consistent with the findings reported by Conforti et al. (2010) in the Turbolo catchment, Italy, and Rahmati et al. (2016) in the Chavar region, Iran, who also confirmed that flat and concave curvatures are more prone to GE.

Regarding the connection between GE and geologic Formations, Fig. 3.5l showed that all Formations have a weak positive relationship with GE except the Basalt Formation, which had a negative relationship with the phenomenon. The positive relationship ranges from β = 0.053 on the Basement Complex to β = 0.615 on the Gombe sandstone. This confirms the study by Lazarus (2012), who concluded that the sandstones and shales that dominate Gombe State's geology are more susceptible to GE than other geologic Formations in the area.

ST and GE revealed a significant relationship. Among the six ST types (Fig. 3.5m), the sandy class had the highest positive β value (0.524), indicating the highest GE susceptibility, followed by sandy clay loam (β = 0.467), sandy loam (β = 0.214), and loamy sand (β = 0.103). In contrast, silt clay loam and silt clay obtained negative β values, showing a negative influence on GE incidence. A close look at the results also revealed that areas dominated by sandy loam and sandy clay loam hosted more gullies because their high sand content renders them easily detachable. Similarly, Abdulfatai et al. (2014) and Igwe and Egbueri (2018) stated that the dominance of sand in soils accelerates GE, while the silt clay loam and silt clay classes, owing to their high clay content, are resistant to soil erosion and gully development.

Regarding LU (Fig. 3.5n), the regression analyses established that bare surface, shrubland, farmland, built-up areas, and woodland have positive β values. The highest β value (1.197) was found on bare surfaces, followed by shrubland, with a β value of 0.762. A negative relationship between LU and GE occurred in the wetland and waterbody classes, with waterbody having the lowest β value (−7.534). The finding is in harmony with previous studies (Amiri et al. 2019; Devátý et al. 2019; Shellberg et al. 2016), which concluded that protected locations like forest and rangelands experience less GE than bare surface regions.

3.4.3.2 The Relative Importance of the GEIFs

After the stepwise regression had allocated β values to the classes of all factors, confirming their relationship with GES, the relative importance of each factor was determined from the highest β value within the factor. The higher the β value, the stronger the effect of the given factor on GE occurrence. As shown in Table 3.7 and Fig. 3.6, DR, DD, DS, SPI, TWI, aspect, and rainfall had the strongest values and contributed significantly to GES, whereas SA, LS, and LU exhibited moderate importance. In contrast, geology, ST, elevation, and PC were the least important and had less impact on gully occurrence in the study area. Therefore, this result suggests that, among the fourteen factors entered into the analysis, the DR factor, by its highly significant β value, is the most important single factor affecting GES in the study area.
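The ranking rule described above (order the factors by their largest class coefficient) can be sketched in a few lines. The β values below are the largest ones quoted for each factor in this section and in Eq. 3.9. Table 3.7 itself is not reproduced here, so the dictionary should be read as an illustration rather than as the table's content, and the name `max_beta` is hypothetical.

```python
# Largest class coefficient quoted in the text for each factor (listed alphabetically).
max_beta = {
    "Aspect": 2.737,
    "DD": 4.612,
    "DR": 5.175,
    "DS": 3.831,
    "Elevation": 0.369,
    "Geology": 0.615,
    "LS": 1.262,
    "LU": 1.197,
    "PC": 0.287,
    "Rainfall": 2.261,
    "SA": 1.387,
    "SPI": 3.456,
    "ST": 0.524,
    "TWI": 3.112,
}

# Rank factors from most to least important by their highest beta value.
ranking = sorted(max_beta, key=max_beta.get, reverse=True)

if __name__ == "__main__":
    for rank, factor in enumerate(ranking, start=1):
        print(f"{rank:2d}. {factor:<10s} beta_max = {max_beta[factor]:.3f}")
```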

Table 3.7 Relative importance and ranking of GEIFs
Fig. 3.6

Relative importance of GEIFs

3.4.3.3 GE Probability Mapping

After the influence of each factor class was determined, GES was predicted by computing the probability of gully occurrence in the study area using Eqs. 3.5 and 3.6. The value of z (Eq. 3.6) was computed by substituting b with the β values of the most important factors determined earlier. Because the highest positive significant coefficient in the analysis (β = 5.175) belongs to the DR class <500 m, DR was taken as the most significant determining factor for GE occurrence: assuming the rest of the factors are constant, a unit change in DR multiplies the odds of GE occurrence by e^5.175, or 176.796. Hence, the β values of the most significant factors were entered into the equation to compute z as

$$\begin{aligned} z = {} & -9.327 + 5.175 \times \mathrm{DR}(\text{Class 1}) + 1.811 \times \mathrm{DR}(\text{Class 2}) + 4.612 \times \mathrm{DD}(\text{Class 1}) + 1.759 \times \mathrm{DD}(\text{Class 2}) \\ & + 3.831 \times \mathrm{DS}(\text{Class 1}) + 1.413 \times \mathrm{DS}(\text{Class 2}) + 3.456 \times \mathrm{SPI}(\text{Class 3}) + 1.356 \times \mathrm{SPI}(\text{Class 4}) \\ & + 3.112 \times \mathrm{TWI}(\text{Class 5}) + 1.122 \times \mathrm{TWI}(\text{Class 4}) + 2.737 \times \mathrm{Aspect}(\text{east}) + 2.261 \times \mathrm{rainfall}(\text{Class 4}) \\ & + 1.183 \times \mathrm{rainfall}(\text{Class 3}) + 1.387 \times \mathrm{SA}(\text{Class 1}) + 1.262 \times \mathrm{LS}(\text{Class 3}) + 1.105 \times \mathrm{LS}(\text{Class 2}) \\ & + 1.197 \times \mathrm{LU}(\text{bare surface}) \end{aligned}$$
(3.9)
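The factor of 176.796 quoted for DR is the exponential of its coefficient. In a logistic model this quantity is the odds ratio, that is, the multiplicative change in the odds p/(1 − p) when the class indicator switches from 0 to 1 with all other terms held fixed. A minimal check, with hypothetical variable names:

```python
import math

# Coefficient of the DR class nearest to roads, as quoted in the text.
beta_dr_class1 = 5.175

# exp(beta) is the odds ratio associated with switching this indicator on.
odds_ratio = math.exp(beta_dr_class1)

if __name__ == "__main__":
    print(f"odds ratio for DR (Class 1): {odds_ratio:.3f}")
```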

Once the value of z was obtained, it was inserted into Eq. 3.10 and the probability of GE occurrence was calculated as

$$P(\text{probability of gully occurrence}) = \frac{1}{1 + \exp\left[-\left(\begin{aligned} & -9.327 + 5.175 \times \mathrm{DR}(\text{Class 1}) + 1.811 \times \mathrm{DR}(\text{Class 2}) + 4.612 \times \mathrm{DD}(\text{Class 1}) \\ & + 1.759 \times \mathrm{DD}(\text{Class 2}) + 3.831 \times \mathrm{DS}(\text{Class 1}) + 1.413 \times \mathrm{DS}(\text{Class 2}) \\ & + 3.456 \times \mathrm{SPI}(\text{Class 3}) + 1.356 \times \mathrm{SPI}(\text{Class 4}) + 3.112 \times \mathrm{TWI}(\text{Class 5}) + 1.122 \times \mathrm{TWI}(\text{Class 4}) \\ & + 2.737 \times \mathrm{Aspect}(\text{east}) + 2.261 \times \mathrm{rainfall}(\text{Class 4}) + 1.183 \times \mathrm{rainfall}(\text{Class 3}) \\ & + 1.387 \times \mathrm{SA}(\text{Class 1}) + 1.262 \times \mathrm{LS}(\text{Class 3}) + 1.105 \times \mathrm{LS}(\text{Class 2}) + 1.197 \times \mathrm{LU}(\text{bare surface}) \end{aligned}\right)\right]}$$
(3.10)
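To make Eqs. 3.9 and 3.10 concrete, the sketch below evaluates them for a single location. It assumes that each class term is a 0/1 indicator (1 if the location falls in that class, 0 otherwise) and that classes not listed in Eq. 3.9 contribute nothing to z. The function names and the example location are hypothetical and are not taken from the study's data.

```python
import math

INTERCEPT = -9.327

# Coefficients of Eq. 3.9, keyed by (factor, class).
COEFFICIENTS = {
    ("DR", "Class 1"): 5.175, ("DR", "Class 2"): 1.811,
    ("DD", "Class 1"): 4.612, ("DD", "Class 2"): 1.759,
    ("DS", "Class 1"): 3.831, ("DS", "Class 2"): 1.413,
    ("SPI", "Class 3"): 3.456, ("SPI", "Class 4"): 1.356,
    ("TWI", "Class 5"): 3.112, ("TWI", "Class 4"): 1.122,
    ("Aspect", "east"): 2.737,
    ("rainfall", "Class 4"): 2.261, ("rainfall", "Class 3"): 1.183,
    ("SA", "Class 1"): 1.387,
    ("LS", "Class 3"): 1.262, ("LS", "Class 2"): 1.105,
    ("LU", "bare surface"): 1.197,
}


def linear_predictor(active_classes):
    """Eq. 3.9: intercept plus the coefficients of the classes present."""
    # Assumption: classes absent from Eq. 3.9 add zero to z.
    return INTERCEPT + sum(COEFFICIENTS.get(key, 0.0) for key in active_classes)


def gully_probability(active_classes):
    """Eq. 3.10: logistic transform of z."""
    z = linear_predictor(active_classes)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical location: close to a road and a stream, in the DD Class 1 zone.
    cell = [("DR", "Class 1"), ("DD", "Class 1"), ("DS", "Class 1")]
    print(f"z = {linear_predictor(cell):.3f}, P = {gully_probability(cell):.4f}")
```

Applied to every raster cell, the same two functions would yield the probability surface that is subsequently mapped.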

The output of the above computation was subsequently transferred into ArcGIS, and the GESM, with the probability (p) ranging from 0 to 1, was prepared. The produced map was then partitioned into 20 classes of equal width (0.05 probability intervals). After overlaying it with the GE inventory map, a histogram of gully and non-gully occurrences against the probability classes was plotted. Lastly, based on the histogram, the probability range was reclassified using the natural breaks classification method into five classes: 0–0.076, 0.076–0.494, 0.494–0.733, 0.733–0.898, and 0.898–1.00, representing relatively safe, less susceptible, moderately susceptible, highly susceptible, and extremely susceptible, respectively. Table 3.8 and Fig. 3.7 present the characteristics of the five susceptibility classes and the final GESM.
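The two classification steps can be illustrated outside a GIS as follows. This is a sketch only: the study performed the reclassification in ArcGIS, the function names are hypothetical, and because the published class limits share their end points, the code assumes that a value exactly on a break falls into the lower class.

```python
from bisect import bisect_left

# Upper limits of the first four susceptibility classes, as reported in the text.
BREAKS = [0.076, 0.494, 0.733, 0.898]
LABELS = [
    "relatively safe",
    "less susceptible",
    "moderately susceptible",
    "highly susceptible",
    "extremely susceptible",
]


def equal_interval_bin(p: float, n_bins: int = 20) -> int:
    """Index (0-based) of the equal-width probability bin containing p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # p = 1.0 is placed in the last bin rather than in a bin of its own.
    return min(int(p * n_bins), n_bins - 1)


def susceptibility_class(p: float) -> str:
    """Map a probability to one of the five reported susceptibility classes."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # Assumption: a value equal to a break belongs to the lower class.
    return LABELS[bisect_left(BREAKS, p)]


if __name__ == "__main__":
    for p in (0.02, 0.30, 0.60, 0.80, 0.95):
        print(p, equal_interval_bin(p), susceptibility_class(p))
```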

Table 3.8 Characteristics of the five GES classes
Fig. 3.7

Predictive GESM derived using the logistic regression model

A visual analysis of the final GESM (Fig. 3.7) shows that most parts of the study area, particularly the eastern and southwestern parts, fall in the less and moderately susceptible classes, while the northwest and central parts fall in the highly and extremely susceptible classes. Field observation also showed that the areas of high and extreme susceptibility are mostly located near roads. This underscores the results of the relationship between GE and GEIFs, in which DR (<500 m) is the most important single factor determining GE susceptibility in the study area.

3.5 Validation

The predictive performance of the MLR model was evaluated using the validation data set (78 locations). This was done by calculating the values of sensitivity and specificity and then drawing the AUROC curve. As indicated earlier, sensitivity was computed using Eq. 3.7 as the fraction of locations hosting gullies that were correctly classified as susceptible, while specificity was computed using Eq. 3.8 as the fraction of gully-free locations that were correctly classified as not susceptible. Table 3.9 and Fig. 3.8 present the validation results and the AUROC curve. Table 3.9 shows that the model correctly classified 37 gully locations and 35 non-gully locations, representing 94.87% and 89.74%, respectively. Also, the overall predictive performance of the model, as given by the AUROC value (0.923 or 92.3%), indicates excellent performance.
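The reported percentages can be reproduced from the counts with the usual definitions of sensitivity and specificity. The numbers of misclassified locations (2 gully and 4 non-gully) are not given in the text. They are inferred here on the assumption that the 78 validation locations are split evenly into 39 gully and 39 non-gully sites, which is the split that reproduces the reported 94.87% and 89.74%.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of gully locations correctly classified as susceptible."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-gully locations correctly classified as not susceptible."""
    return true_neg / (true_neg + false_pos)


# Correct classifications reported in the text.
TP, TN = 37, 35
# Inferred misclassifications (assumes 39 gully and 39 non-gully validation sites).
FN, FP = 39 - TP, 39 - TN

if __name__ == "__main__":
    print(f"sensitivity = {100 * sensitivity(TP, FN):.2f}%")
    print(f"specificity = {100 * specificity(TN, FP):.2f}%")
```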

Table 3.9 Confusion matrix that presents results of the validation
Fig. 3.8

AUROC curve for the final GESM

3.6 Conclusion

GE is the water erosion process that cuts into soils and forms permanent gullies on the landscape that cannot be eliminated through conventional tillage operations. The formation and development of gullies is an important environmental threat throughout the world since it is responsible for land degradation, increased sediment delivery, and reduced water quality. It is also responsible for a decreased water travel time to rivers (and hence increased flooding probabilities), for the filling up of ponds and reservoirs, and for the destruction of infrastructure (buildings and roads). Therefore, the prediction of areas susceptible to GE is a crucial issue for environmental scientists, land managers, and decision-makers. To tackle this problem, researchers usually employ knowledge-driven and data-driven predictive methods. This study demonstrated the ability of data-driven logistic regression to accurately predict GES in SGS, Nigeria.

A GE inventory was prepared from a total of 260 gully and non-gully locations compiled from the interpretation of Google Earth images and field investigations. In addition, local environmental conditions and a 20 m DEM allowed the selection of soil texture, geology, land use, rainfall, and several topographical factors influencing GE susceptibility. Subsequently, the inventory data were randomly split into two datasets, 182 locations (70%) for training the logistic regression model and 78 locations (30%) for validating the prediction results, while the independence of the influencing factors was assessed using multicollinearity scrutiny. Results of the forward stepwise regression for the relationship between GE and the selected factors indicated that distance from road is key to gully formation. After running the logit function, the resultant susceptibility map revealed that 3.1% of the study area was relatively safe, 50.1% less susceptible, 23.3% moderately susceptible, 19.2% highly susceptible, and 4.3% extremely susceptible. Validation using the area under the receiver operating characteristic curve gave a prediction accuracy of 92.3%. This study further confirmed logistic regression as an excellent and accurate data-driven method for spatial analysis and prediction of GE susceptibility. The method can be applied in other areas with similar physiographic characteristics.