Introduction

Soil cation exchange capacity (CEC) is an essential characteristic for assessing soil fertility and nutrient retention capacity, and it is a commonly applied indicator of soil condition or vulnerability. Soil management needs in the study region motivated this research, which develops predictive models from CEC measurements on a sample of soils from the region. The models are used to study CEC behavior in terms of the contributory soil constituents: clay, silt, sand, gypsum and organic matter. These soil constituents carry intrinsically negative charges on their surfaces and therefore adsorb exchangeable cations (Tang et al. 2009).

The two properties that most account for the reactivity of soils are surface area and surface charge. Surface area is a direct result of particle size and shape. Most of the total surface area of mineral soil is due to clay-sized particles and soil organic matter. Charge development in soils is intimately associated with these same two fractions, although the sand and silt size fractions may contribute some CEC if coarse-grained vermiculite is present. Charge development in soils occurs as a result of both isomorphic substitution and ionization of functional groups on the surface of the solids that make up the soil matrix. These two mechanisms give rise to the permanent and the pH-dependent charges of soils, respectively (Yola et al. 2014). Plant roots are capable of cation exchange, and root zones are where this capability is exploited. The quantity of negative charges in soil, known as CEC, is measurable, but its direct measurement is expensive, difficult and time-consuming (Evans 1989). Therefore, this paper aims to gain an insight into modeling the CEC in terms of the soil composition of clay, silt, sand, gypsum and organic matter. The significance of the CEC of soil particles stems from the ‘ease’ with which cations exchange with one another, to the extent that they are readily available for plants. The CEC is among the most important soil properties required in soil databases (Manrique et al. 1991) and serves as an input to soil and environmental models (Keller et al. 2001).

Study topics involving the CEC include: fluoride sorption and desorption on soils (Gago et al. 2014), monitoring general variability of soil attributes under different land use types in calcareous soils (Rezapour 2014), influence of pasture degradation on soil quality indicators that included physical, chemical, biological and micromorphological attributes (Ayoubi et al. 2014), isolation of different soil organic carbon fractions and coal C in a reclaimed minesoil chronosequence (Chaudhuri et al. 2013), determination of adsorption efficiency based on cation exchange capacity (Gatima et al. 2006), adsorption of aqueous phenol solution in soil (Subramanyam and Das 2009), impact of sewage and mining activities on the distribution of heavy metals in the water–soil–vegetation system (Semhi et al. 2013), potential impact of fluorine-rich fertilizers on the aquifer system (Marimon et al. 2013), prediction of subsurface heterogeneity for contaminated soil management (Moon et al. 2013), identification of heavy metal sources in agricultural soil (Huang et al. 2013), effects of cations and anions on iron and manganese sorption and desorption capacity in calcareous soils (Moharami and Jalali 2013), effect of the addition of granitic powder to an acidic soil (Silva et al. 2013) and effects of agricultural practice and land use on the distribution and origin of some potentially toxic metals in soils (Moghaddas et al. 2013).

In the laboratory, the CEC of soils is measured using the ammonium acetate (NH4OAc) method through the replacement of sodium (Na+) ions with ammonium (NH4+) ions, but this is difficult, time-consuming and expensive (Carpena et al. 1972). An alternative approach is to estimate the CEC from more easily measurable and readily available soil properties, such as particle size distribution (clay, sand and silt content), gypsum and organic matter; the resulting predictive relationships are referred to as pedo-transfer functions (PTFs), a term coined in soil science (Bouma 1989). In recent years, data mining techniques such as artificial neural networks (ANN) and support vector machines (SVM) have been widely used in the modeling of various complex environmental problems (Yilmaz and Kaynar 2011).
A neural network is an adaptable system that learns relationships from input and output datasets and is able to make predictions for a previously unseen dataset with characteristics similar to the input set (Haykin 1999; ASCE 2000). Previous PTF studies formulating CEC models using ANNs include Van Bladel et al. (1975), Baker and Ellison (2008), Tang et al. (2009), Minasny and McBratney (2002), Minasny et al. (1999), Schaap et al. (1998) and Silveira et al. (2013). Gruszczyński (2009) applied multiple regression, a polynomial neural network and a fuzzy neural network (ANFIS) to the prediction of CEC in southern Poland. The results showed that ANFIS performed better than the multiple regression and polynomial neural network models. Tang et al. (2009) investigated four radial basis function neural networks (RBFN) for the estimation of CEC from 457 soil physicochemical properties. Soil horizon; pH; organic carbon content; and clay, silt and sand contents were taken as the input variables. The results showed high correlation coefficients between predicted and measured CEC values.

As an alternative to ANN, the SVM proposed by Vapnik (1998) is a powerful tool for nonlinear classification, regression and time series prediction (Wang et al. 2008). SVMs belong to the kernel-based learning approaches and have gained wide popularity. They are supervised machine learning systems that use a linear hypothesis space in a high-dimensional space called the feature space. The working principle of SVM rests on kernel functions, which implicitly map the data to a higher-dimensional space. This makes SVM a powerful tool for modeling nonlinear complex environmental problems (Bhagwat and Maity 2012). Several studies reported the use of SVM in forecasting soil water (Wu et al. 2008), soil moisture prediction (Gill et al. 2006), estimation of soil hydraulic parameters (Twarakavi et al. 2009), modeling soil diffuse reflectance spectra (Rossel and Behrens 2010) and soil type classification (Kovačević et al. 2010). Liao et al. (2014) used SVM, multiple stepwise regression (MSR) and ANN models to estimate the CEC from easily measured physicochemical properties (e.g., texture, soil organic matter, pH) based on 208 soil samples in Qingdao City, China. They indicated that the SVM model gave better CEC estimates than the ANN and MSR models. A sensitivity analysis was also conducted to explore the influence of each input parameter on the CEC predictions; clay content was the most influential parameter and sand content the least influential.

Study area

The area under study covers approximately 8000 ha (longitude 46°35′–46°40′E, latitude 35°08′–35°12′N) in the Khoja region, Tabriz, Azerbaijan, Iran. The Khoja Soil and Water Conservation Research Station is located 30 km east of Tabriz and 60 km from Eher (also known as Ahar). The prevailing semiarid climate in the region is cold continental, with a mean annual rainfall of 220–270 mm. The altitude in the vicinity of the region is approximately 1550 m above mean sea level. The region is surrounded by mountains and hills through which the River Aji flows to Lake Urmu. The underlying rock formations are often granitic intrusions of Oligo-Miocene age, intruding through a sequence of calcareous rocks.

The sampling procedure started with a preliminary soil investigation, followed by the design of a random sampling procedure. Flexibility was exercised when setting out the sampling locations to reflect the inherent variability of the soil (e.g., the sampling points were set out to capture locations with tree lines, variability due to soil texture or notable land use changes). The underlying principle in selecting sample locations is that the results should be representative of the soil, so that the fitted models are not an artifact of the particular measurements but represent the whole area by capturing its inherent variability. The study area is located on a floodplain with well-established flooding patterns; therefore, soil variability depends more on local weathering than on regional variability or land use. Past attempts to assess the land suitability of the site for agricultural purposes by various methods (Malekian and Jafarzadeh 2011) were used as a preliminary basis before gathering data. On this basis, the sampling procedure was designed: sampling at regular grid points was found to be more appropriate for this study area, with randomness allowed in the number of samples taken at each soil horizon.

Materials and Methods

Artificial neural networks (ANNs)

ANNs, often traced back to McCulloch and Pitts (1943), are inspired by the workings of the brain and nervous systems of biological organisms, with a capability for self-learning and automatic abstraction and the possible benefit of reducing modeling time. One application of ANNs is as an alternative modeling strategy to traditional methods of data and time series analysis. The fundamental processing element of an ANN is the neuron. In the hidden layers, each neuron j computes \(u_{j}\), a weighted sum of its p input signals \(x_{i}\), \(i = 1, 2, \ldots ,p\), with weights \(w_{ij}\), and then applies a nonlinear activation function to produce an output signal \(x_{j}\). The model of a neuron is shown in Fig. 1. A neuron j is described mathematically by the following pair of equations:

$$u_{j} = \mathop \sum \limits_{i = 1}^{p} w_{ij} \cdot x_{i}$$
(1)

and

$$x_{j} = \varphi \left( {u_{j} - \theta_{j} } \right)$$
(2)

where \(\theta_{j}\) is the threshold of neuron j and \(\varphi\) is the activation function. The threshold has the effect of applying an affine transformation to the output of the linear combiner in the model of Fig. 1 (Haykin 1999; Melesse and Hanley 2005). In this study, the logistic sigmoid nonlinear function (Bilgili et al. 2007) is used as the activation function, expressed as:

$$\varphi \left( x \right) = \frac{1}{{1 + {\text{e}}^{ - x} }}$$
(3)
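
As an illustrative sketch of Eqs. (1)–(3), and not the implementation used in this study, the output of a single neuron can be computed as follows. The function names and the numerical values are hypothetical and chosen only for demonstration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid activation, Eq. (3)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, theta):
    """Output x_j of one neuron.

    Eq. (1): u_j is the weighted sum of the p input signals.
    Eq. (2): x_j = phi(u_j - theta_j), with phi the logistic sigmoid.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    u_j = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(u_j - theta)


# Hypothetical inputs, weights and threshold (not taken from the study)
signals = [0.2, 0.5, 0.3]
weights = [0.4, -0.1, 0.7]
print(neuron_output(signals, weights, theta=0.1))
```

Because the sigmoid is bounded, the neuron output always lies strictly between 0 and 1, whatever the size of the weighted sum.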

The type of ANN used in this study is a feed-forward multilayer perceptron (MLP) trained with the back propagation (BP) learning algorithm, a combination commonly used for complex environmental problems, including soil science applications, and widely applied to classification and prediction problems. The structure of a three-layer MLP is shown in Fig. 2. Back propagation convergence is slow, but it has the advantages of accuracy and adaptability (Kisi 2005).

Fig. 1 Nonlinear model of a neuron (Haykin 1999)

Fig. 2 Simple configuration of multilayer perceptron neural network

The MLP in Fig. 2 consists of three layers: an input layer, a hidden layer and an output layer. A set of neurons or nodes is arranged in each layer. The numbers of neurons in the input and output layers are defined by the numbers of input and output variables of the system under investigation, respectively. However, the number of neurons in the hidden layer(s) is usually determined via a trial-and-error procedure. As seen from the figure, the neurons of each layer are connected to the neurons of the next layer by weights. The typical performance function used for training feed-forward neural networks is the mean squared error (MSE) of the network:

$${\text{MSE}} = \frac{1}{N}\mathop \sum \limits_{j = 1}^{N} \left( {T_{j} - O_{j} } \right)^{2}$$
(4)

where \(T_{j}\) is the target output, \(O_{j }\) is the actual output at output unit j, and N is the number of data patterns.
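
The forward pass of a three-layer MLP and the MSE of Eq. (4) can be sketched as below. This is a minimal illustration under stated assumptions rather than the network trained in this study: a single linear output unit is assumed, the hidden units use the logistic sigmoid, and all names and weights are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(x, hidden_weights, hidden_thresholds,
                output_weights, output_threshold):
    """Forward pass: inputs -> sigmoid hidden layer -> one linear output.

    hidden_weights holds one row of weights per hidden neuron.
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) - theta)
        for row, theta in zip(hidden_weights, hidden_thresholds)
    ]
    # Assumption: the output unit is linear (no squashing function)
    return sum(w * h for w, h in zip(output_weights, hidden)) - output_threshold


def mse(targets, outputs):
    """Mean squared error over N data patterns, Eq. (4)."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and equal in length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


# Hypothetical two-input network with two hidden neurons
prediction = mlp_forward(
    [0.3, 0.8],
    hidden_weights=[[0.5, -0.2], [0.1, 0.9]],
    hidden_thresholds=[0.0, 0.1],
    output_weights=[1.2, -0.7],
    output_threshold=0.05,
)
print(prediction, mse([0.4], [prediction]))
```

Training then amounts to adjusting the weights and thresholds so that the MSE over the training patterns decreases.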

Support vector machines (SVM)

The SVM is based on statistical learning theory (Vapnik 1998). Because of its good generalization performance, SVM is receiving increasing attention in pattern classification and nonlinear regression estimation (Cao and Tay Francis 2003). For a given training dataset with N samples, represented by \(\left( {x_{1} ,y_{1} } \right), \ldots ,\left( {x_{N} ,y_{N} } \right)\), where x is an input vector and y is the corresponding output value, the SVM estimator (f) for regression can be represented by:

$$f\left( x \right) = \varvec{w} \cdot \emptyset \left( x \right) + b$$
(5)

where w is a weight vector, b is a bias, ‘·’ denotes the dot product and \(\emptyset\) is a nonlinear mapping function. A smaller norm of w corresponds to a flatter function in Eq. (5), which can be obtained by minimizing the squared Euclidean norm \(\left\| w \right\|^{2}\). Vapnik (1995) introduced the following convex optimization problem with an \(\varepsilon\)-insensitive loss function for regression using SVM:

$$\begin{aligned} {\text{minimize}}&\quad \frac{1}{2}\left\| w \right\|^{2} + C\mathop \sum \limits_{k = 1}^{N} \left( {\xi_{k}^{ + } + \xi_{k}^{ - } } \right) \\ {\text{subject}}\;{\text{to}}&\quad \left\{ {\begin{array}{ll} {y_{k} - \left( {w \cdot \emptyset \left( {x_{k} } \right) + b} \right) \le \varepsilon + \xi_{k}^{ + } } & {} \\ { - y_{k} + \left( {w \cdot \emptyset \left( {x_{k} } \right) + b} \right) \le \varepsilon + \xi_{k}^{ - } } & {k = 1,2, \ldots ,N} \\ {\xi_{k}^{ + } ,\xi_{k}^{ - } \ge 0} & {} \\ \end{array} } \right. \\ \end{aligned}$$
(6)

where \(C\) is a positive trade-off parameter, or capacity parameter, that weights the empirical error in the optimization problem and so determines the trade-off between the flatness of the function and the amount to which deviations larger than \(\varepsilon\) are tolerated. The slack variables \(\xi_{k}^{ + }\) and \(\xi_{k}^{ - }\) represent the upper and lower deviations of the output beyond the error tolerance \(\varepsilon\) (Misra et al. 2009). Lagrangian multipliers and the Karush–Kuhn–Tucker (KKT) conditions are used to solve the optimization problem of Eq. (6) in a dual form. Support vectors are the input vectors that have nonzero Lagrangian multipliers under the KKT conditions (Yoon et al. 2011). Figure 3 shows a schematic diagram of the SVM used in this study.

Fig. 3 A schematic structure of SVM model (Yoon et al. 2011)
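
To make the role of the slack variables in Eq. (6) concrete, the sketch below evaluates them, and the resulting objective value, for given predictions. It is illustrative only: it does not solve the optimization problem, and the function names and numbers are hypothetical.

```python
def slack_variables(y, f, eps):
    """Smallest slacks that satisfy the constraints of Eq. (6).

    xi_plus  > 0 only when the target lies more than eps above the prediction;
    xi_minus > 0 only when the target lies more than eps below the prediction.
    """
    xi_plus = max(0.0, y - f - eps)
    xi_minus = max(0.0, f - y - eps)
    return xi_plus, xi_minus


def svr_objective(w, C, targets, predictions, eps):
    """Value of the objective in Eq. (6) for fixed weights and predictions."""
    flatness = 0.5 * sum(wi * wi for wi in w)  # (1/2) * ||w||^2
    penalty = 0.0
    for y, f in zip(targets, predictions):
        xi_plus, xi_minus = slack_variables(y, f, eps)
        penalty += xi_plus + xi_minus
    return flatness + C * penalty


# Hypothetical values: errors inside the eps-tube cost nothing
print(slack_variables(1.0, 1.05, eps=0.1))
print(slack_variables(2.0, 1.0, eps=0.25))
print(svr_objective([3.0, 4.0], C=2.0, targets=[2.0], predictions=[1.0], eps=0.25))
```

Predictions that fall within \(\varepsilon\) of the target contribute no penalty, which is why only a subset of the training samples ends up as support vectors.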

In natural processes, almost all the predictor variables (input space) are nonlinearly related to the predicted variable. This limits the usefulness of a linear formulation of the problem, as shown in Eq. (6). The limitation is overcome by mapping the input space onto a higher-dimensional space (feature space) using a kernel function, which makes it possible to work implicitly in that feature space. Eq. (5) can then be rewritten using Lagrangian multipliers \((\alpha_{i} \;{\text{and}}\; \alpha_{i}^{*} )\) as follows:

$$f\left( {x,\alpha_{i} ,\alpha_{i}^{*} } \right) = \mathop \sum \limits_{i = 1}^{N} \left( {\alpha_{i} - \alpha_{i}^{*} } \right) \cdot K\left( {x,x_{i} } \right) + b$$
(7)

where \(K\left( {x,x_{i} } \right)\) is the kernel function. Commonly used kernel functions include the linear, polynomial, radial basis and sigmoid kernel functions. In this study, the widely used radial basis function (RBF) kernel function was used. The RBF kernel function is defined by Eq. (8):

$$K\left( {x,x_{i} } \right) = \exp \left( { - \frac{{\left\| {x - x_{i} } \right\|^{2} }}{{\sigma^{2} }}} \right)$$
(8)

When using SVM with the RBF kernel function, three parameters have to be optimized during training: the kernel parameter \((\sigma )\), the capacity parameter \((C)\) and the width of the insensitive loss function \((\varepsilon )\). In this study, an internal cross-validation (Wang and Hu 2005) is adopted during creation of the SVM model to obtain an optimal combination of the three parameters.
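
The kernel expansion of Eq. (7) with the RBF kernel of Eq. (8) can be sketched as follows, reading the numerator of Eq. (8) as the squared Euclidean distance between x and x_i. The support vectors, coefficients and bias below are hypothetical; in practice they result from solving the dual problem.

```python
import math


def rbf_kernel(x, x_i, sigma):
    """RBF kernel, Eq. (8): exp(-||x - x_i||^2 / sigma^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-squared_distance / sigma ** 2)


def svr_predict(x, support_vectors, coefficients, b, sigma):
    """Eq. (7): sum of (alpha_i - alpha_i*) * K(x, x_i), plus the bias b.

    coefficients[i] stands for the difference alpha_i - alpha_i*.
    """
    return sum(
        c * rbf_kernel(x, sv, sigma)
        for sv, c in zip(support_vectors, coefficients)
    ) + b


# Hypothetical support vectors and coefficients (not fitted to any data)
support_vectors = [[0.1, 0.4], [0.7, 0.2]]
coefficients = [1.5, -0.8]
print(svr_predict([0.3, 0.3], support_vectors, coefficients, b=0.2, sigma=2.36))
```

A small \(\sigma\) makes each support vector influence only its immediate neighborhood, whereas a large \(\sigma\) gives a smoother estimate.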

Data specification

The present study includes 380 soil samples collected from different horizons of 80 soil profiles. The approach adopted consists of marking 80 sites spaced at approximately 1000 m on a 1:25,000 scale map of the area. These sites were set out using GPS, and pits measuring 1 m × 2 m × 1.5 m (depth) were dug. Samples were taken from each site and analyzed in the laboratory to measure their CEC and their contents of clay, silt, sand, gypsum and organic matter (OM). Texture (clay, silt and sand), gypsum and organic matter were measured by the Bouyoucos hydrometer, Stone’s and Walkley-Black methods, respectively (Sayegh et al. 1978; Nelson and Sommers 1982), and the CEC values (in cmol kg−1) were determined using the Bower method (Sparks et al. 1996). The 380 samples were divided randomly into 323 samples for model training and the remaining 57 samples (15 % of the total) for testing the models. Sand particles range in size between 2.0 and 0.05 mm; silt, between 0.05 and 0.002 mm; and clay, less than 0.002 mm. Gypsum, known to chemists as calcium sulfate dihydrate (CaSO4·2H2O), is a soft naturally occurring mineral used in agriculture primarily as a ‘clean green’ soil conditioner and fertilizer (Anonymus 1992). Organic matter in soil is composed of carbon, oxygen, hydrogen, nitrogen, phosphorus and sulfur; it is a complex mixture of substances that range from freshly deposited plant and animal parts to residual, stable humus compounds. Clay minerals usually range from 10 to 150 meq/100 g in CEC values, and organic matter ranges from 200 to 400 meq/100 g. The kind and amount of clay and organic matter therefore greatly influence the CEC of soils (Parker 2010).

The key statistics of the dataset are presented in Table 1, and Table 2 presents the estimated correlations between the variables, according to which (1) the correlation of CEC with silt and gypsum is poor, and (2) CEC is directly correlated with clay, silt and OM but inversely with sand and gypsum. Based on Table 1, the training and testing data are broadly consistent with each other, whereas Table 2 shows that the correlation of the CEC with the input variables is ranked in the order clay, OM, sand (inverse correlation), silt and gypsum. The strong cross-correlation between silt and sand is, however, a concern.

Table 1 Summary of statistics characteristics of input and output parameters
Table 2 Correlation matrix of the data used
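
The random division of the 380 samples into 323 training and 57 testing samples can be sketched as below. The seed and the function name are assumptions made for reproducibility of the illustration; the partition actually used in this study is not reproduced here.

```python
import random


def split_indices(n_samples, n_test, seed=0):
    """Randomly partition sample indices into training and testing sets."""
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability
    indices = list(range(n_samples))
    rng.shuffle(indices)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


train_idx, test_idx = split_indices(380, 57)
print(len(train_idx), len(test_idx))  # 323 57
```

Every sample index appears in exactly one of the two sets, so no testing sample is seen during training.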

Performance measures

Three performance criteria, the correlation coefficient (CC), the root-mean-square error (RMSE) and the Nash–Sutcliffe efficiency coefficient (E), are used in this study to assess the goodness of fit of the models. The CC, which ranges from −1 to 1, is a statistical measure of how close the regression line is to the observed data, and a coefficient of ±1 indicates that the regression line perfectly fits the observed data. The RMSE can provide a balanced evaluation of the goodness of fit of the model, as it is more sensitive to the larger relative errors caused by low values; a perfect model has an RMSE of zero. The E-values range between −∞ and 1.0, with E = 1 obtained for a perfect fit. These performance measures are calculated by (Ghorbani et al. 2013):

$${\text{CC}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left[ {\left( {{\text{CEC}}_{\rm o} } \right)_{i} - \left( {\overline{\text{CEC}}_{\rm o} } \right)} \right]\left[ {\left( {{\text{CEC}}_{\rm e} } \right)_{i} - \left( {\overline{\text{CEC}}_{\rm e} } \right)} \right]}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left[ {\left( {{\text{CEC}}_{\rm o} } \right)_{i} - \left( {\overline{\text{CEC}}_{\rm o} } \right)} \right]^{2} \mathop \sum \nolimits_{i = 1}^{n} \left[ {\left( {{\text{CEC}}_{\rm e} } \right)_{i} - \left( {\overline{\text{CEC}}_{\rm e} } \right)} \right]^{2} } }}$$
(9)
$${\text{RMSE = }}\sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\left( {{\text{CEC}}_{\rm o} } \right)_{i} - \left( {{\text{CEC}}_{\rm e} } \right)_{i} } \right)^{2} }}{n}}$$
(10)
$$E = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left[ {\left( {{\text{CEC}}_{\rm o} } \right)_{i} - \left( {{\text{CEC}}_{\rm e} } \right)_{i} } \right]^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left[ {\left( {{\text{CEC}}_{\rm o} } \right)_{i} - \left( {\overline{\text{CEC}}_{\rm o} } \right)} \right]^{2} }}$$
(11)

where the subscripts ‘o’ and ‘e’ represent the observed and estimated values; the average value of the associated variable is represented with a bar above it and n is the total number of records.
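
The three criteria can be computed as in the following sketch, which uses the standard definitions with squared deviations in the Nash–Sutcliffe efficiency. The function names and the sample values are hypothetical and are not data from this study.

```python
import math


def correlation_coefficient(observed, estimated):
    """Pearson correlation coefficient (CC), Eq. (9)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    covariance = sum((o - mean_o) * (e - mean_e)
                     for o, e in zip(observed, estimated))
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    spread_e = sum((e - mean_e) ** 2 for e in estimated)
    return covariance / math.sqrt(spread_o * spread_e)


def rmse(observed, estimated):
    """Root-mean-square error, Eq. (10)."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def nash_sutcliffe(observed, estimated):
    """Nash-Sutcliffe efficiency (E), Eq. (11)."""
    mean_o = sum(observed) / len(observed)
    residual = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    variance = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - residual / variance


# Hypothetical observed and estimated values
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [2.0, 3.0, 4.0, 5.0]
print(correlation_coefficient(observed, estimated),
      rmse(observed, estimated),
      nash_sutcliffe(observed, estimated))
```

In this example the estimates are offset by a constant, so the CC stays at 1 while the RMSE and E both register the bias. This is one reason for reporting the three criteria together.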

Results and Discussion

In the application of ANN and SVM in this study for the prediction model, five soil characteristics (clay, silt, sand, gypsum and organic matter) are considered as the input variables, and CEC is considered as the output variable. In order to develop PTFs for predicting CEC with the proposed models, both training and test data were normalized using the following equation:

$$x_{n} = \frac{{x - x_{\hbox{min} } }}{{x_{\hbox{max} } - x_{\hbox{min} } }}$$
(12)

where \(x_{n}\) is the normalized value (lying between 0 and 1), x is the original data point, and \(x_{\hbox{min} }\) and \(x_{\hbox{max} }\) are the minimum and maximum values in the dataset, respectively. In this study, an MLP model has been constructed for predicting the CEC using the MATLAB software, which also randomizes the dataset, as the data points are not a time series. Neurons in the input layer have no transfer function. The logistic sigmoid transfer function was used in the hidden layer, and a linear transfer function (purelin) was employed from the hidden to the output layer as the activation function. The logistic sigmoid is the most widely used nonlinear activation function, and its most important features are continuity and differentiability. Purelin is a pure linear function that passes its input through unchanged. This combination enables the network to handle nonlinearities in the input data while producing a wide range of values at the output (Zhang et al. 1998).
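
The min–max normalization of Eq. (12) can be sketched as follows. The optional `x_min` and `x_max` arguments are an assumption added for illustration, so that the same scaling can be reused on another subset; the values shown are hypothetical.

```python
def min_max_normalize(values, x_min=None, x_max=None):
    """Scale values with Eq. (12): (x - x_min) / (x_max - x_min).

    If x_min or x_max is not supplied, it is taken from `values`.
    """
    if x_min is None:
        x_min = min(values)
    if x_max is None:
        x_max = max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("x_max and x_min must differ")
    return [(v - x_min) / span for v in values]


# Hypothetical clay contents (%), not data from the study
print(min_max_normalize([10.0, 20.0, 30.0]))  # [0.0, 0.5, 1.0]
```

Scaling all five inputs and the output to a common range prevents variables with large numerical values from dominating the training of either model.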

The neural network used in this study was trained for 1000 epochs with the Levenberg–Marquardt learning algorithm. The optimal values of the learning rate and momentum were 0.001 and 0.9, respectively. The mean squared error (MSE) is used as the performance measure of the MLP. The optimal number of neurons in the hidden layer was identified by trial and error, varying the number of hidden neurons from 1 to 20. Results from Fig. 4 suggest that a neural network with one hidden layer of 13 nodes performs well with this dataset, achieving CC = 0.920, RMSE = 1.803 cmol kg−1 and E = 0.846 with the training dataset and CC = 0.860, RMSE = 2.731 cmol kg−1 and E = 0.723 with the testing dataset. Figure 5 shows the scatter plots obtained from the optimum ANN model for the training and testing datasets. The performance measures and Fig. 4 suggest that the neural network model closely approximates the actual observations, indicating the effectiveness of this modeling approach in predicting CEC values.

Fig. 4 (a) The RMSE and (b) the CC of different neural networks

Fig. 5 Scatter plots of measured and predicted CEC by ANN model in (a) training set and (b) testing set

To provide a comparison for the neural network modeling approach, SVM is used as a second modeling approach for predicting CEC values. Many studies on the use of SVM in soil science reported improved performance with the RBF kernel-based SVM (Twarakavi et al. 2009; Zhu and Xu 2011; Cisty et al. 2011; Lamorski et al. 2013; Shi et al. 2013). Therefore, in this study, the RBF kernel is used as the kernel function for CEC modeling, and the accuracy of the SVM model depends on identifying suitable values of the parameters (C, ε, σ). The RMSE was used as the criterion for optimizing these parameters. The performance criteria with the optimal parameters (C, ε, σ) = (4.68, 0.79, 2.36) suggest that the SVM performs well in predicting CEC values (CC = 0.896, RMSE = 2.040 cmol kg−1 and E = 0.803 with the training set and CC = 0.849, RMSE = 2.796 cmol kg−1 and E = 0.709 with the testing set). Figure 6 shows the scatter plots of the results obtained using the SVM model for the training and testing datasets. The results show good agreement between predicted and measured values, consistent with the performance criteria above.

Fig. 6 Scatter plots of measured and predicted CEC by SVM model in (a) training set and (b) testing set
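
The text does not state how the parameter space (C, ε, σ) was searched. One simple possibility, sketched here under that assumption, is an exhaustive grid search that keeps the combination with the lowest RMSE. The callable `rmse_of` is hypothetical and stands for any routine that trains an SVM with the given parameters and returns its validation RMSE; a toy error surface is used below in its place.

```python
import itertools


def grid_search(c_values, eps_values, sigma_values, rmse_of):
    """Return the (C, eps, sigma) combination with the lowest RMSE."""
    best_params, best_rmse = None, float("inf")
    for params in itertools.product(c_values, eps_values, sigma_values):
        score = rmse_of(*params)
        if score < best_rmse:
            best_params, best_rmse = params, score
    return best_params, best_rmse


def toy_rmse(C, eps, sigma):
    """Stand-in error surface with its minimum at (1.0, 0.1, 2.0).

    Purely illustrative; it is unrelated to the results of this study.
    """
    return abs(C - 1.0) + abs(eps - 0.1) + abs(sigma - 2.0)


best, score = grid_search(
    c_values=[0.1, 1.0, 10.0],
    eps_values=[0.01, 0.1, 1.0],
    sigma_values=[0.5, 2.0, 8.0],
    rmse_of=toy_rmse,
)
print(best, score)
```

The cost of such a search grows with the product of the three grid sizes, so coarse grids refined around the best point are a common compromise.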

The performance of the ANN and SVM techniques is compared in Table 3. The results indicate an appropriate performance by both models for CEC prediction. The ANN model performed better than the SVM model on this dataset, with a higher CC (0.920), a lower RMSE (1.803 cmol kg−1) and a higher Nash–Sutcliffe coefficient (0.846) on the training data, and a higher CC (0.860), a lower RMSE (2.731 cmol kg−1) and a higher Nash–Sutcliffe coefficient (0.723) on the test data. The comparison of the ANN and SVM models illustrated in Fig. 7 suggests that both methods perform poorly in extrapolating the maximum and minimum values of the CEC data.

Table 3 Performance evaluation criteria of SVM and ANN models
Fig. 7 Comparison of the SVM and ANN models predicted and measured CEC values in testing set

Conclusion

In this study, ANN and SVM were used to predict the CEC for an agricultural site using five input variables. From the results, the following conclusions can be drawn:

Both the ANN and SVM models were able to predict CEC within acceptable limits. Both methods perform poorly in extrapolating the maximum and minimum values of the CEC data. The ANN model provided better estimates than the SVM model on the testing set. Because all machine learning approaches are data-dependent in nature, it is suggested that both techniques be tested with datasets from different regions before they are adopted for CEC prediction elsewhere.