Introduction

There is an increasing demand for reliable large-scale soil data to meet the requirements of models for planning of land-use systems, characterization of soil pollution, and prediction of land degradation (McBratney et al. 2002; Zolfaghari et al. 2016). Cation exchange capacity (CEC) is one of the most important soil properties that is required in soil databases (Amini et al. 2005; Liao et al. 2014), and is used as an input in soil and environmental models (Keller et al. 2001). CEC refers to the quantity of negative charges in soil (Jaremko and Kalembasa 2014). The negative charge may be pH dependent (soil organic matter) or permanent (some clay minerals) (Liao et al. 2014; Zolfaghari et al. 2016). Although CEC can be measured directly, its measurement is difficult and expensive. Pedotransfer functions (PTFs) provide an alternative by estimating CEC from more readily available soil data (Liao et al. 2014; Emamgolizadeh et al. 2015; Zolfaghari et al. 2016).

In recent years, various PTFs have been developed to estimate CEC from basic physical and chemical soil properties (McBratney et al. 2002; Amini et al. 2005; Kianpoor et al. 2012; Bayat et al. 2014; Liao et al. 2014). In most of these models, CEC is assumed to be a linear function of soil organic carbon and clay content (McBratney et al. 2002; Sarmadian and Taghizadeh Mehrjardi 2008; Kianpoor et al. 2012). Multiple linear regression (MLR) analysis is generally used to find the relevant coefficients in the model equations. Often, however, models developed for one region do not give adequate estimates for a different region (Wagner et al. 2001; Amini et al. 2005; Emamgolizadeh et al. 2015).

A recent approach to modeling PTFs is the use of artificial neural networks (ANNs). ANNs have been successfully employed to predict soil properties that are difficult to measure (Minasny and McBratney 2002; Amini et al. 2005; Bayat et al. 2014; Emamgolizadeh et al. 2015). An advantage of using ANNs is that no specific type of function needs to be assumed a priori to model the relationship between inputs and outputs. The optimum relation that links input data to output data is obtained through a training procedure. ANN models are generally expected to be superior to MLR models because of their greater flexibility (Amini et al. 2005; Bayat et al. 2014; Emamgolizadeh et al. 2015). A type of artificial neural network known as the multi-layer perceptron (MLP), which uses a back-propagation training algorithm, is usually used for generating PTFs (Minasny and McBratney 2002; Amini et al. 2005; Sarmadian and Taghizadeh Mehrjardi 2008; Lake et al. 2009; Keshavarzia and Sarmadiana 2010; Yilmaz and Kaynar 2011; Kianpoor et al. 2012; Emamgolizadeh et al. 2015). This network uses neurons whose output is a function of a weighted sum of the inputs.

Several attempts have been made to model soil physicochemical parameters and related processes with artificial intelligence-based techniques, including modeling of the daily and hourly behavior of runoff (Aqil et al. 2007), estimation of soil erosion and nutrient concentrations in runoff (Kim and Gilley 2008), modeling of Pb(II) adsorption from aqueous solution (Yetilmezsoy and Demirel 2008), determination of clay dispersibility (Zorluer et al. 2010), estimation of the groutability of granular soils (Tekin and Akbas 2011), prediction of the swell potential of clayey soils (Yilmaz and Kaynar 2011), prediction of the soil water retention curve (Abbasi et al. 2011), land suitability evaluation (Keshavarzi et al. 2011), and estimation of wet soil aggregate stability (Besalatpour et al. 2013). Some studies have also considered the capability of soft computing techniques for predicting soil CEC, such as those by Amini et al. (2005), Sarmadian and Taghizadeh Mehrjardi (2008), Tang et al. (2009), Kianpoor et al. (2012), Keshavarzi et al. (2012), Sarmadian et al. (2013), Liao et al. (2014), Bayat et al. (2014), Emamgolizadeh et al. (2015) and Zolfaghari et al. (2016). The findings of these researchers demonstrated that PTFs developed with artificial intelligence-based techniques predicted CEC more efficiently than regression-based PTFs. Nevertheless, few studies have focused on developing PTFs for CEC prediction by means of the adaptive neuro-fuzzy inference system (ANFIS).

The objectives of this study were to develop a suitable artificial neural network for estimating CEC in soils of the Guilan region in northern Iran, and to compare the artificial neural network with regression and adaptive neuro-fuzzy inference system models developed for the same soils.

Materials and methods

Study area and data collection

This research was carried out in paddy soils of Guilan Province. The study area is located between 49°31′ and 49°45′E longitude and 37°7′ and 37°27′N latitude in the north of Guilan Province, on the southern coast of the Caspian Sea, northern Iran (Fig. 1). The regional climate is very humid, with a mean annual precipitation of 1293.6 mm and a mean annual temperature of 15.8 °C. The soil moisture regimes are Aquic and Udic and the soil temperature regime is Thermic; the soil parent materials are derived from river sediments. The soil series names of the study area and their distribution are presented in Table 1 and Fig. 1, respectively. All soil profiles were deep except those of soil series 10 and 11. Soil texture ranged from light to heavy among the soil series (Fig. 2).

Fig. 1 Study area location and soil type map of the area

Table 1 Soil series names of the study area with surface soil texture
Fig. 2 Textural distribution of the training and testing data sets on the USDA soil texture triangle

Chemical and physical properties were determined on 171 soil samples collected from various horizons of 120 soil profiles. Using the profile descriptions and laboratory analyses of the soil samples, all the studied soils were classified as Entisols and Inceptisols on the basis of Soil Survey Staff (2014b). The soil properties measured for this research were organic carbon (OC), pH, calcium carbonate, CEC and the soil texture fractions (clay, sand and silt content). The following analytical methods were employed: organic carbon content was determined using the Walkley–Black method (Nelson and Sommers 1982), particle size distribution using the pipette method (Soil Survey Staff 2014a), CEC using sodium acetate (pH 8.2), pH with a pH meter in a 1:2 soil-to-water suspension, and CaCO3 using the titration method (Soil Survey Staff 2014a). The results of these determinations were used as input variables to develop the CEC estimation models. The data set was divided into two subsets. One subset was used for generating PTFs and calibrating PTFs from the literature, and the other subset was used for testing the models. The division was based on stratified random sampling: the data were sorted by CEC and stratified into 10 CEC groups, and 25 % of the data were randomly selected from each group for testing. The remaining 75 % of the data were used for calibration. The division was carried out so that the statistical characteristics of the soil properties (e.g. minimum and maximum) were similar in the two subsets.
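The stratified split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually run in the study: the function name `stratified_split`, the random seed and the synthetic CEC values are all hypothetical, and the rounding rule for the per-group test size is one reasonable choice among several.

```python
import random

def stratified_split(records, key, n_groups=10, test_fraction=0.25, seed=0):
    """Sort by `key`, cut into n_groups strata, draw test_fraction of each for testing."""
    rng = random.Random(seed)
    ordered = sorted(records, key=key)
    n = len(ordered)
    train, test = [], []
    for g in range(n_groups):
        # Integer bounds so that the strata cover every record exactly once.
        lo = g * n // n_groups
        hi = (g + 1) * n // n_groups
        stratum = ordered[lo:hi]
        n_test = round(len(stratum) * test_fraction)
        test_idx = set(rng.sample(range(len(stratum)), n_test))
        for i, rec in enumerate(stratum):
            (test if i in test_idx else train).append(rec)
    return train, test

# Hypothetical data: 171 samples with made-up CEC values (not the study data).
_rng = random.Random(1)
samples = [{"id": i, "cec": _rng.uniform(10.0, 40.0)} for i in range(171)]
train, test = stratified_split(samples, key=lambda r: r["cec"])
```

Because every stratum contributes to both subsets, the minimum, maximum and spread of CEC stay comparable between the calibration and test sets, which is the property the division aims for.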

Prediction methods

Multiple linear and non-linear regression model

The general purpose of multiple regression is to learn more about the relationship between several independent (predictor) variables and a dependent (criterion) variable. Multiple regression is the most common method used in the development of PTFs.

Artificial neural networks (ANNs)

Artificial neural networks (ANNs) are a form of artificial intelligence whose architecture attempts to simulate the biological structure of the human brain and nervous system (Amini et al. 2005). In this study, the ANN model developed was a multi-layer perceptron (MLP), which is the most commonly used neural network structure in ecological modeling and soil science (Agyare et al. 2007; Besalatpour et al. 2013; Emamgolizadeh et al. 2015). The MLP algorithm developed in this research is a feed-forward back-propagation network (FFBP) model. There are two input elements, %Clay and %OC, and one output element, CEC, so that the MLP architecture is 2-m-1, where m represents the number of hidden neurons. A schematic diagram of the network is given in Fig. 3. Assume P is a (d × n) rescaled input matrix where the rows consist of elements (i.e. clay and OC) and the columns are the samples. Initially, we calculate a linear combination, aj, of the weighted input elements, Pi, plus a constant bias, \(w_{jo}^{(h)}\), expressed as:

Fig. 3 A schematic structure of the feed-forward back-propagation neural network

$$a_{j} = \sum\limits_{i = 1}^{d} w_{ji}^{(h)} P_{i} + w_{jo}^{(h)}, \quad j = 1, \ldots, m \quad \text{and} \quad i = 1, \ldots, d$$
(1)

where d is the number of elements, m is the number of neurons, and \(w_{ji}^{(h)}\) denotes the weights given to the input i of the neuron j in the hidden layer. The matrix, aj, is then activated by a tangent sigmoid function, f, to produce the output of the hidden layer, Zj:

$$Z_{j} = f\left( {a_{j} } \right) = - 1 + \left[ {2/\left( {1 + exp\left( { - 2a_{j} } \right)} \right)} \right]$$
(2)

In the output layer (Fig. 3), the outputs of the hidden layer are summed linearly to produce CEC estimates:

$$CEC_{{\left( {predicted} \right)}} = \mathop \sum \limits_{j = 1}^{m} w_{j}^{(o)} Z_{j} + w_{o}^{(o)}$$
(3)

The above procedure is repeated for every sample, i.e. n times. The weights in the above equations are adjustable parameters of the network and are optimized during the network training procedure. The commonly used objective function in training is the mean squared error (MSE) typically specified as:

$$MSE = \frac{1}{n}\sum\limits_{k = 1}^{n} \left( CEC_{predicted,k} - CEC_{measured,k} \right)^{2}, \quad k = 1, \ldots, n.$$
(4)
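Equations (1) to (4) can be traced for a small network in plain Python. This is only a sketch: the weights, biases and data below are hypothetical values for m = 3 hidden neurons chosen for illustration, not the fitted FFBP weights of this study.

```python
import math

def tansig(a):
    # Tangent sigmoid of Eq. (2); algebraically equal to tanh(a).
    return -1.0 + 2.0 / (1.0 + math.exp(-2.0 * a))

def predict(p, w_h, b_h, w_o, b_o):
    """Eqs. (1)-(3) for one rescaled sample p = [clay, OC]."""
    z = []
    for w_j, b_j in zip(w_h, b_h):
        a_j = sum(w * x for w, x in zip(w_j, p)) + b_j   # Eq. (1)
        z.append(tansig(a_j))                             # Eq. (2)
    return sum(w * z_j for w, z_j in zip(w_o, z)) + b_o   # Eq. (3)

def mse(predicted, measured):
    # Eq. (4): mean of the squared prediction errors over the n samples.
    n = len(measured)
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n

# Hypothetical weights for a 2-3-1 network (illustrative only).
w_h = [[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]]   # hidden weights, one row per neuron
b_h = [0.0, 0.1, -0.1]                          # hidden biases
w_o = [0.4, 0.3, -0.2]                          # output weights
b_o = 0.5                                       # output bias
inputs = [[0.3, 0.2], [0.6, 0.5], [0.8, 0.1]]   # rescaled [clay, OC] samples
measured = [0.4, 0.6, 0.7]                      # rescaled CEC values
predicted = [predict(p, w_h, b_h, w_o, b_o) for p in inputs]
error = mse(predicted, measured)
```

Training then amounts to adjusting `w_h`, `b_h`, `w_o` and `b_o` so that `error` (or a regularized variant of it) becomes as small as possible.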

Error minimization can be obtained by a number of procedures. Frequently, the Levenberg–Marquardt (More 1977) algorithm is used in feed-forward networks (Schaap et al. 1998). A problem that usually occurs during network training is over-fitting or overtraining, which means that the network learns to work well for the training inputs, but not well enough for a test data set. To avoid overtraining, Amini et al. (2005) proposed a regularized objective function, MSEReg, in which the mean of the squared network weights is added to the MSE:

$$MSE_{Reg} = \gamma MSE + \left( {1 - \gamma } \right)MSW$$
(5)

where \(\gamma\) is a performance ratio calculated by means of Bayesian regularization in combination with the Levenberg–Marquardt algorithm and MSW is the mean of the squared weights and biases (Amini et al. 2005). When the data set is small and a function approximation network is being trained, Bayesian regularization provides better generalization performance than early stopping. This is because Bayesian regularization does not require that a validation data set be separate from the training data set; it uses all the data (Help of MATLAB R2015b software 2015). For this purpose, we used the “create network or data toolbox” of MATLAB software, in which the training, adaption learning, performance and transfer functions were Bayesian regularization, gradient descent, MSEReg and tangent sigmoid, respectively.
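The regularized objective of Eq. (5) is simple to evaluate once the MSE and the network weights are known. The sketch below assumes a fixed, hypothetical value of \(\gamma\) purely for illustration; in Bayesian regularization the ratio is estimated during training rather than set by hand, and that estimation step is not reproduced here.

```python
def msw(weights):
    # Mean of the squared weights and biases.
    return sum(w * w for w in weights) / len(weights)

def mse_reg(mse_value, weights, gamma):
    # Eq. (5): weighted combination of the data error and the weight penalty.
    return gamma * mse_value + (1.0 - gamma) * msw(weights)

# Hypothetical flattened list of all network weights and biases.
all_weights = [0.5, -0.2, 0.1, 0.8, 0.0, 0.1, 0.4, 0.3, 0.5]
# gamma = 0.9 is an assumed value, not one reported in the study.
value = mse_reg(0.02, all_weights, gamma=0.9)
```

With \(\gamma\) close to 1 the objective is dominated by the prediction error; smaller values penalize large weights more strongly, which is what keeps the fitted function smooth.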

MATLAB R2015b software (2015) was used to develop PTFs for predicting CEC by means of the ANN model. To this end, all data were first normalized between 0.1 and 0.9 to achieve effective network training. Luk et al. (2000) stated that neural networks trained on normalized data generally achieve better performance and faster convergence, although the advantages diminish as network and sample size become large. The data set was normalized in two stages: (1) Pre-processing: The input (clay and OC) and output (CEC) data for the training and test data sets were initially rescaled to fall within the range of [0.1, 0.9] by the transfer function (Help of MATLAB R2015b software 2015):

$$P_{norm} = \left[ {0.8 \times \left( {\left( {P_{i} - P_{min} } \right)/\left( {P_{max} - P_{min} } \right)} \right)} \right] + 0.1$$
(6)

where Pnorm is the rescaled input matrix, Pi is the input matrix, and Pmin and Pmax are two vectors containing the minimum and the maximum values of the input matrix, respectively. The output (CEC) of the network is also rescaled by using its minimum and maximum values. (2) Post-processing: To back-transform the results of the network we used the following equation:

$$P_{i} = \left[ {1.25 \times \left( {P_{norm} - 0.1} \right) \times \left( {P_{max} - P_{min} } \right)} \right] + P_{min}$$
(7)
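A round trip through the two stages shows that the back-transform is the algebraic inverse of Eq. (6): solving Eq. (6) for Pi gives a factor of 1/0.8 = 1.25 multiplied by the range (Pmax − Pmin). The clay values below are hypothetical and serve only to exercise the two functions.

```python
def rescale(p, p_min, p_max):
    # Eq. (6): map [p_min, p_max] linearly onto [0.1, 0.9].
    return 0.8 * (p - p_min) / (p_max - p_min) + 0.1

def back_transform(p_norm, p_min, p_max):
    # Inverse of Eq. (6): recover the value in its original units.
    return 1.25 * (p_norm - 0.1) * (p_max - p_min) + p_min

clay = [12.0, 25.5, 48.0, 60.0]   # hypothetical clay contents (%)
lo, hi = min(clay), max(clay)
scaled = [rescale(c, lo, hi) for c in clay]
restored = [back_transform(s, lo, hi) for s in scaled]
```

The smallest and largest values map to 0.1 and 0.9, and `restored` reproduces `clay` up to floating-point rounding.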

Adaptive neuro-fuzzy inference system (ANFIS) model

In ANFIS, fuzzy rule bases are combined with neural networks to train the system using experimental data and obtain appropriate membership functions for process prediction and control (Lertworasirikul 2008; Besalatpour et al. 2013). The Takagi–Sugeno–Kang (TSK) model (Takagi and Sugeno 1985), one of the most frequently used precise fuzzy models, was used in the current study to predict soil CEC. For simplicity, it is assumed that the inference system has two input variables, x and y, and that each variable has two fuzzy subsets. A typical rule set with two fuzzy if–then rules for a first-order Sugeno fuzzy model can be defined as Eqs. (8) and (9):

$${\text{Rule 1}}:{\text{ If}}\,x\,{\text{is}}\,A_{1} \,{\text{and}}\,y\,{\text{is}}\,B_{1} \,{\text{Then}}\,f_{ 1} = p_{ 1} x + q_{ 1} y + r_{ 1}$$
(8)
$${\text{Rule 2}}:{\text{ If}}\,x\,{\text{is}}\,A_{2} \,{\text{and}}\,y\,{\text{is}}\,B_{2} \,{\text{Then}}\,f_{ 2} = p_{ 2} {\text{x}} + q_{ 2} y + r_{ 2}$$
(9)

where \(A_{1}\), \(A_{2}\) and \(B_{1}\), \(B_{2}\) are the membership functions for inputs x and y, respectively, and \(p_{1}\), \(q_{1}\), \(r_{1}\) and \(p_{2}\), \(q_{2}\), \(r_{2}\) are the parameters of the output function. The corresponding equivalent ANFIS architecture for the two-input first-order Sugeno fuzzy model with two rules is illustrated in Fig. 4a. The general architecture of ANFIS, which consists of five layers, namely the fuzzy, product, normalized, defuzzy and output layers, is depicted in Fig. 4b. In this architecture, the circular nodes represent nodes that are fixed, whereas the square nodes are nodes that have parameters to be learnt (Yilmaz and Kaynar 2011).

Fig. 4 (a) Two-input first-order Sugeno fuzzy model with two rules and (b) equivalent adaptive neuro-fuzzy inference system architecture

Layer 1 Every node in this layer is represented by a square node including a node function. The node function employed by each node determines the membership relation between the input and output functions.

Layer 2 Every node in this layer is a fixed (circle) node labeled Π, and its output is the product of the signals obtained from layer 1.

Layer 3 Every node in this layer is a fixed (circle) node labeled N. The nodes normalize the firing strength by calculating the ratio of the firing strength of this node to the sum of all the firing strengths.

Layer 4 Every node in this layer is represented by a square node including a node function.

Layer 5 The single node in this layer is a fixed (circle) node labeled ∑ that computes the overall output as the summation of all incoming signals.
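The five layers can be followed for the two-rule system of Eqs. (8) and (9) in a short Python sketch. Everything numeric here is an assumption made for illustration: the membership functions are taken to be products of two sigmoids, and their parameters, the consequent parameters (p, q, r) and the inputs are hypothetical rather than fitted values.

```python
import math

def sigmoid(x, a, c):
    return 1.0 / (1.0 + math.exp(-a * (x - c)))

def psig(x, a1, c1, a2, c2):
    # Product of two sigmoids, used here as an assumed membership shape.
    return sigmoid(x, a1, c1) * sigmoid(x, a2, c2)

def anfis_forward(x, y, mf_x, mf_y, consequents):
    # Layer 1: membership grades of each input in its fuzzy subsets.
    mu_a = [psig(x, *params) for params in mf_x]
    mu_b = [psig(y, *params) for params in mf_y]
    # Layer 2: firing strength of rule i is the product mu_Ai(x) * mu_Bi(y).
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalize the firing strengths so that they sum to one.
    total = sum(w)
    w_bar = [w_i / total for w_i in w]
    # Layer 4: weight the first-order consequents f_i = p_i*x + q_i*y + r_i.
    weighted = [wb * (p * x + q * y + r)
                for wb, (p, q, r) in zip(w_bar, consequents)]
    # Layer 5: the overall output is the sum of all incoming signals.
    return sum(weighted), w_bar

# Hypothetical parameters for inputs rescaled to roughly [0, 1].
mf_x = [(10.0, 0.0, -10.0, 0.5), (10.0, 0.5, -10.0, 1.0)]   # A1, A2
mf_y = [(10.0, 0.0, -10.0, 0.5), (10.0, 0.5, -10.0, 1.0)]   # B1, B2
consequents = [(1.0, 0.5, 0.1), (0.8, 0.2, 0.3)]            # (p, q, r) per rule
out, w_bar = anfis_forward(0.4, 0.6, mf_x, mf_y, consequents)
```

Since the normalized firing strengths sum to one, the output is a weighted average of the rule outputs f1 and f2. During training, the membership parameters of layer 1 and the consequent parameters of layer 4 are the quantities that are adjusted.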

Performance evaluation criteria

Four standard statistical performance evaluation criteria were used to assess the prediction accuracy of the models developed: root mean square error (RMSE), the coefficient of determination (R2), mean bias error (MBE) and relative improvement (RI). They were calculated using the following equations:

$$RMSE = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }$$
(10)
$$R^{2} = 1 - \left[ {\left( {\mathop \sum \limits_{i = 1}^{n} (y_{i} - \hat{y}_{i} )^{2} } \right){\bigg{/}}\left( {\mathop \sum \limits_{i = 1}^{n} (y_{i} - \bar{y}_{i} )^{2} } \right)} \right]$$
(11)
$$MBE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)$$
(12)
$$RI = [(RMSE_{Reg} - RMSE_{M} )/RMSE_{Reg} ] \times 100$$
(13)

where yi denotes the measured value, \(\hat{y}_{i}\) is the predicted value, \(\bar{y}_{i}\) is the average of the measured values, and n is the total number of observations. The MBE characterizes the mean difference between the calculated and measured data; hence, it is a criterion of systematic error in the model fitting. Negative and positive values of MBE indicate under- and overestimation by the PTFs for a given parameter, respectively. RMSEReg is the root mean square error of the regression model and RMSEM is the root mean square error of the other models (Bayat et al. 2014).
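The four criteria of Eqs. (10) to (13) can be written as small Python functions. This is a sketch for illustration: the measured and predicted values are hypothetical, and the MBE is coded exactly as Eq. (12) is written, that is, measured minus predicted.

```python
import math

def rmse(y, y_hat):
    # Eq. (10)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def r2(y, y_hat):
    # Eq. (11): one minus residual sum of squares over total sum of squares.
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def mbe(y, y_hat):
    # Eq. (12) as written: mean of (measured - predicted).
    return sum(a - b for a, b in zip(y, y_hat)) / len(y)

def relative_improvement(rmse_reg, rmse_m):
    # Eq. (13): percentage reduction in RMSE relative to the regression model.
    return (rmse_reg - rmse_m) / rmse_reg * 100.0

measured = [20.0, 25.0, 30.0, 35.0]    # hypothetical CEC values
predicted = [21.0, 24.0, 31.0, 34.0]   # hypothetical model output
scores = (rmse(measured, predicted), r2(measured, predicted), mbe(measured, predicted))
```

In this example the errors alternate in sign, so the MBE is zero even though the RMSE is not, which illustrates why the two criteria are reported together.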

Results

Data summary statistics

Pertinent statistics of the soil properties used to calibrate and test the newly developed models are given in Table 2. The correlation coefficients between variables are given in Table 3. The correlations between CEC and soil OC (r = 0.63) and between CEC and clay content (r = 0.82) were the highest among the measured properties and were positive and significant at the 0.01 level. Therefore, clay and organic carbon content were used for prediction of CEC. The coefficient of variation (CV) of the soil organic carbon content was about three times as large as those of the clay percentage and CEC (Table 2). This large variation in OC is due largely to the variability in manure and compost applications as fertilizer, the return of rice plant residues, and soil amendments in the study area.

Table 2 Statistics of the training and testing data sets
Table 3 Correlation coefficients of the measured soil properties

Multiple linear and non-linear regressions (MLR and MNLR)

PTFs for predicting soil CEC in the study area were developed with MLR and MNLR models using SPSS 24 software (2016), with the above-mentioned physicochemical soil properties as independent variables. In regression analysis, normality of the data distribution is one of the primary assumptions that has to be checked. Therefore, the normality of the data was evaluated using the Kolmogorov–Smirnov method. All data had a normal distribution. After the normality test, a multiple linear regression function was derived for the training data set through the stepwise method. In this method, all variables were first entered as inputs and, subsequently, those with no significant effect on the output parameter were eliminated. The resulting MLR model related CEC to OC and clay content (Eq. 14). It was found that the equations relating CEC to the input variables through the MLR model were not statistically strong enough to establish significant models, because only a few inputs were highly correlated with CEC. Moreover, the accuracy of pedotransfer functions depends on the number of inputs, and increasing the number of inputs decreases the accuracy of the estimates (Amini et al. 2005). Therefore, only OC and clay were used to develop the non-linear regression model. Different model types, including power, exponential and cubic, were tested for the non-linear regression. Finally, the best linear and non-linear regression equations derived for the training data set are given as Eqs. (14) and (15); the results of the variance analysis of both models are given in Table 4:

Table 4 Variance analysis result of multiple linear and non-linear regression models
$$CEC = 4.263 + 0.455\,Clay + 1.097\,OC\,\,\,\,R^{2} = 0.77$$
(14)
$$CEC = 0.55 + 0.64\,Clay^{0.97} + 0.55\,OC^{1.26} \,\,\,R^{2} = 0.79$$
(15)
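As a usage illustration, Eqs. (14) and (15) can be evaluated for a single soil sample. The coefficients are those of the two equations; the sample values (30 % clay, 2 % OC) are hypothetical, and the inputs are assumed to be percentages, as for the %Clay and %OC inputs of the network.

```python
def cec_mlr(clay, oc):
    # Eq. (14): multiple linear regression PTF.
    return 4.263 + 0.455 * clay + 1.097 * oc

def cec_mnlr(clay, oc):
    # Eq. (15): multiple non-linear (power-type) regression PTF.
    return 0.55 + 0.64 * clay ** 0.97 + 0.55 * oc ** 1.26

sample_clay, sample_oc = 30.0, 2.0   # hypothetical sample, in percent
linear_estimate = cec_mlr(sample_clay, sample_oc)
nonlinear_estimate = cec_mnlr(sample_clay, sample_oc)
```

Because the fitted exponents in Eq. (15) are close to 1, the two equations give similar estimates over the range of clay and OC contents used for calibration; neither should be extrapolated far outside that range.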

From the numerous available PTFs derived to predict CEC, we selected only those regression models that used OC and clay as independent variables and had a coefficient of determination, R2, greater than 0.5. The selected PTFs were calibrated for the study region using a generalized least squares procedure with a subset of the training data (Table 2). The models and the values of their evaluation criteria are given in Table 5. The R2 and RMSE values showed that the regression models of the current study were the most accurate.

Table 5 Selected pedotransfer functions and their calibration coefficients

Optimization of artificial neural network model

We used a feed-forward back-propagation neural network in this study. We constructed one network that used organic carbon and clay content as inputs, because former researchers such as Amini et al. (2005), Sarmadian and Taghizadeh Mehrjardi (2008) and Sarmadian et al. (2013) found that these inputs gave the best results. Also, these inputs had the highest correlation coefficients with CEC in the current study (Table 3). Finding the optimum number of hidden neurons in the hidden layer is an important step in developing FFBP networks. In neural network design, too many hidden units cause over-fitting, while too few hidden units cause under-fitting. To find the optimum number of hidden units, the RMSEs of the network with two inputs (OC and clay) were plotted versus the number of hidden units (Fig. 5); the network with 7 hidden units had the lowest RMSE and was therefore selected.

Fig. 5 RMSE versus number of neurons in the hidden layer of the FFBP network, used to select a suitable number of neurons

The objective function without regularization (not shown) produced numerous local minima and fluctuated greatly as the number of hidden units increased. The regularized objective function, Eq. (5), showed a more stable response with respect to the number of hidden units, as the RMSE of the training and testing procedures decreased gradually as the number of hidden units increased to 7 (Fig. 5). There are a number of advantages in using the Bayesian regularization algorithm (BRA) over others. One advantage is that it is model-driven rather than data-driven, owing to its Bayesian principles as opposed to maximum likelihood principles. Another advantage of BRA is that of pruning. The weight penalty term that is added to the algorithm means that, as long as sufficient hidden neurons are supplied, the BRA will automatically prune the ANN to the optimum architecture and over-fitting is avoided (Amini et al. 2005). For this reason, as illustrated by Fig. 5, adding more hidden neurons does not improve the model. After repeated experiments, a persistent minimum value of RMSE occurred at 7 hidden units with two inputs (Fig. 5), suggesting that pruning occurred above 7 hidden units. Therefore, we used an FFBP network containing 7 hidden units (FFBP7H), with a tangent sigmoid transfer function, Bayesian regularization training function and gradient descent adaptation learning function, for further analysis. The weights for the FFBP7H model are given in Table 6. Comparison of the results of the current study with those of Sarmadian and Taghizadeh Mehrjardi (2008), Tang et al. (2009), Lake et al. (2009), Keshavarzia and Sarmadiana (2010), Kianpoor et al. (2012), Sarmadian et al. (2013), Bayat et al. (2014) and Emamgolizadeh et al. (2015) showed that using the Bayesian regularization algorithm for model learning increased the accuracy of the artificial neural network for estimating CEC. Amini et al. (2005) also increased the accuracy of their ANN model by using the Bayesian regularization learning algorithm, so our result is in agreement with theirs.

Table 6 The weights used for the two FFBP networks with 7 neurons

Adaptive neuro-fuzzy inference system (ANFIS)

In this study, the ANFIS model was also applied to predict CEC using the same normalized data that were used for the ANN model. In the ANFIS system, each input parameter may be clustered into several class values in layer 1 to build up fuzzy rules, and each fuzzy rule is constructed using two or more membership functions in layer 2. Several methods have been proposed to classify the input data and to make the rules, among which the most widespread are grid partition and subtractive fuzzy clustering (Aqil et al. 2007; Ertunc and Hosoz 2008; Yetilmezsoy et al. 2011; Kianpoor et al. 2012). In this study, grid partition was used. The Psigmoid membership function and the number of membership functions were then selected for the input parameters, and a linear membership function was selected for the output parameter, to generate the fuzzy inference system (FIS). The hybrid algorithm was applied to train the FIS. Training for 40 epochs gave the optimal result with minimum error. After the FIS was trained, the model was validated using the testing data. The parameter types and their values used for training ANFIS are given in Table 7. The performance of the ANFIS model for the test data set and the related statistical evaluation results are given in Table 8. The values of R2, RMSE, MBE and RI for the ANFIS testing stage were 0.82, 1.184, 0.218 and 25.7, respectively; both the regression and ANFIS models were less efficient than the feed-forward back-propagation network model. Comparison of the ANFIS model trained in this study with those trained by Kianpoor et al. (2012) and Keshavarzi et al. (2012) showed that the accuracy of ANFIS was higher in the current study, because we used grid partition to classify the input data and make the rules, whereas they used subtractive fuzzy clustering. Therefore, using grid partition increased the accuracy of training in our research. Yilmaz and Kaynar (2011) and Vafakhah et al. (2014) used grid partition and the hybrid algorithm for FIS generation and training, respectively, in the ANFIS model and reported high accuracy for model training. Our result is therefore in agreement with theirs.

Table 7 Different parameter types and their values used for training ANFIS
Table 8 Test results of the regression, neural network and adaptive neuro-fuzzy inference system

Discussion

After determining the regression equations, the results of the MLR and MNLR models were compared with experimental data to evaluate their accuracy. The coefficient of determination (R2) between the measured and predicted values is a good indicator of the prediction performance of a model (Gokceoglu and Zorlu 2004; Kianpoor et al. 2012). The values of R2, RMSE, MBE and RI obtained using MLR and MNLR are shown in Table 8. For the test data set, the R2, RMSE and MBE values were 0.68, 1.593 and −0.328 for linear regression and 0.73, 1.364 and 0.286 for non-linear regression, respectively. The MNLR model was more accurate than the MLR model, which indicates that the relationship between CEC and soil properties such as organic carbon and clay is non-linear and complex. The MLR result is in contrast with the results of Yilmaz et al. (2012) and Kianpoor et al. (2012). However, it agrees with those reported by McBratney et al. (2002), Amini et al. (2005), Sarmadian and Taghizadeh Mehrjardi (2008), Bayat et al. (2014) and Emamgolizadeh et al. (2015), whose results showed a high correlation coefficient for predicting soil CEC by means of multiple linear regression models. As mentioned above, more inputs result in less accurate estimates (Amini et al. 2005), and this point explains their results. The input data in the studies of McBratney et al. (2002), Amini et al. (2005) and Sarmadian and Taghizadeh Mehrjardi (2008) were clay and organic carbon.

The test data set was used to evaluate the performance of the MLR, MNLR, neural network and ANFIS models for predicting CEC. The statistical results of the comparisons are given in Table 8, which shows that the neural network model had a larger R2 value than the regression and ANFIS models. This is in line with the work of Yilmaz and Kaynar (2011), Kianpoor et al. (2012) and Bayat et al. (2014), whose findings demonstrated that the FFBP model predicted the swell potential of clayey soils and CEC more accurately than both multiple regression equations and the adaptive neuro-fuzzy inference system. The MBE values indicated that the artificial neural network and ANFIS models overestimated CEC. This overestimation was, however, small, especially for the FFBP model. The smallest RMSE was produced by the FFBP7H model and the largest by the linear regression model; these results agree with Amini et al. (2005), Kianpoor et al. (2012) and Sarmadian et al. (2013). The relative improvement of the models was calculated using the linear regression model as a reference. The results in Table 8 show that the FFBP model had in general the largest relative improvement (RI), in agreement with Amini et al. (2005) and Bayat et al. (2014). The scatter plots of the measured versus predicted CEC for the test data set are given in Fig. 6 for the prediction models; from these, we identified FFBP7H as the best model for predicting CEC.
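As a check on how Eq. (13) is applied with the linear regression model as reference, the RI of the ANFIS model can be recomputed from the test RMSE values reported in this paper (1.593 for linear regression and 1.184 for ANFIS):

$$RI_{ANFIS} = \frac{1.593 - 1.184}{1.593} \times 100 \approx 25.7$$

which reproduces the reported value of 25.7. Applying the same formula to the non-linear regression model (RMSE = 1.364) gives

$$RI_{MNLR} = \frac{1.593 - 1.364}{1.593} \times 100 \approx 14.4$$

This second figure is derived here from the reported RMSE values rather than quoted from Table 8, so it should be read as an approximate illustration.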

Fig. 6 Scatter plots of the measured versus predicted CEC for the testing data using the regression, ANFIS and FFBP7H network models with two inputs (OC and clay)

On the other hand, the proposed ANN model performed, in general, better than the ANFIS model in predicting CEC when the evaluation criteria are compared. The existing patterns and trends among the input variables and the output (CEC) are relatively complex and intricate. It appears that the ANN model was more capable of extracting the existing patterns among the input variables and the output. Neural networks can extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques because of their remarkable ability to derive a general solution from complicated or imprecise data (Yilmaz and Kaynar 2011; Besalatpour et al. 2013; Bayat et al. 2014). These artificial networks have the capability of learning from examples and are able to solve intricate, nonlinear problems and problems that are very tedious to solve by conventional methods. In addition, when a data stream is analyzed using a neural network, it is possible to detect important predictive patterns that were not previously apparent to a non-expert (Yilmaz and Kaynar 2011; Besalatpour et al. 2013). Finally, all these indicate that the ANFIS approach may not always be a better choice for predicting soil CEC.

Conclusion

In this study, multiple linear and non-linear regression, an artificial neural network model (feed-forward back-propagation network, FFBP) and an adaptive neuro-fuzzy inference system were employed to develop a pedotransfer function for predicting soil CEC using available soil properties. The performance of the regression, neural network and ANFIS models was evaluated using a test data set. The newly developed FFBP neural network PTF with 7 hidden neurons predicted CEC better than the regression and ANFIS models and significantly improved the accuracy of the prediction by up to 80.3 %. Neural network models are in general more suitable for capturing the non-linearity of the relationship between variables. In this study, however, the relationship between CEC and clay and organic carbon appeared to be dominantly linear. Consequently, with the use of the proposed ANNs, especially the FFBP network, CEC can be estimated from only a limited number of measured soil properties, thus saving effort, time and funds. Finally, using the Bayesian regularization algorithm for model learning in FFBP, and grid partition for classifying the input data and making the rules in the ANFIS model, markedly increased the accuracy of these models for CEC prediction. We suggest that researchers use genetic algorithms for optimization of models and rules in future work.