Introduction

Mining and related industries are widely considered to have adverse environmental effects, notable in both magnitude and diversity. Among these, heavy metals are often released as a result of mining, milling and industrial processing. Extraction of sulphide ores poses a major water quality problem because most such sources generate acid mine drainage (AMD). The oxidation of sulphide minerals, in particular pyrite and marcasite (FeS2), exposed to atmospheric oxygen during or after mining generates acidic waters with pH values as low as 2 and high concentrations of dissolved iron (Fe), sulphate (SO4) and toxic heavy metals such as lead, copper, zinc, aluminium and mercury, which are harmful to humans and to the aquatic environment (Williams 1975; Daskalakis and Helz 1999; Moncur et al. 2005; Balistrieri et al. 2007; Zhao et al. 2007).

The Sarcheshmeh copper deposit is recognised as the fourth largest in the world, containing 1 billion tonnes of ore averaging 0.9% copper and 0.03% molybdenum (Banisi and Finch 2001). The ore body is located in Kerman province, southeastern Iran. Mining operations have created many low-grade waste dumps and raised numerous environmental problems. Sulphide mineral oxidation and AMD generation in the Sarcheshmeh copper mine, and their impacts on the Shur River, have been investigated in the past (Marandi et al. 2007; Shahabpour and Doorandish 2008; Doulati Ardejani et al. 2008; Bani Assadi et al. 2008; Derakhshandeh and Alipour 2010).

The Shur River at the Sarcheshmeh copper mine has been polluted by AMD, with pH values ranging between 2 and 4.5 and high concentrations of heavy metals. Predicting heavy metal concentrations in the Shur River using cost-effective and quick methods such as artificial neural networks (ANN) and support vector machines (SVM) is therefore valuable for developing appropriate remediation and monitoring strategies.

Several investigations have applied artificial neural networks (ANN) and multiple linear regression (MLR) to different fields of environmental engineering in recent decades (Karunanithi et al. 1994; Lek and Guegan 1999; Govindaraju 2000; Karul et al. 2000; Bowers and Shedrow 2000; Kemper and Sommer 2002; Dedecker et al. 2004; Kuo et al. 2004; Khandelwal and Singh 2005; Almasri and Kaluarachchi 2005; Kurunc et al. 2005; Sengorur et al. 2006; Kuo et al. 2007; Messikh et al. 2007; Palani et al. 2008; Hanbay et al. 2008; Chenard and Caissie 2008; Dogan et al. 2009; Singh et al. 2009; Rooki et al. 2011). More recent work in artificial intelligence has produced a machine learning approach known as the SVM. The SVM relies on statistical learning theory, which enables learning machines to generalise to unseen data. It was introduced in the early 1990s as a non-linear solution for classification and regression tasks (Vapnik 1995; Behzad et al. 2009). The technique has shown superior performance on various problems owing to its generalisation ability and robustness against noise and interference (Steinwart 2008). An SVM seeks the solution that uses the minimum possible energy of the data (Martinez-Ramon and Cristodoulou 2006; Bishop 2006; Cristianini and Shawe-Taylor 2000). In general, at least three reasons account for the success of SVM: its ability to learn well with only a small number of parameters, its robustness against model error, and its computational efficiency compared with several other methods such as neural networks and fuzzy networks (Martinez-Ramon and Cristodoulou 2006; Wang 2005). The literature shows that although many studies have applied the ANN method to mining and related environmental problems, the SVM method has not previously been used for environmental assessment or for predicting heavy metals in AMD. In this paper, concentrations of major heavy metals were sampled and analysed in the Shur River of the Sarcheshmeh copper mine, southeast Iran, and the heavy metals in the river impacted by AMD are predicted using SVM. The results of the SVM predictions are compared with those of the GRNN (Rooki et al. 2011).

Site description

The Sarcheshmeh copper mine is located 160 km southwest of Kerman and 50 km southwest of Rafsanjan in Kerman province, Iran. The main access road to the study area is the Kerman–Rafsanjan–Shahr Babak road. The mine belongs to the Band Mamazar–Pariz Mountains, and its average elevation is 1,600 m. The mean annual precipitation of the site varies from 300 to 550 mm, and the temperature ranges from +35°C in summer to −20°C in winter. The area is covered with snow for about 3–4 months per year, and wind speeds sometimes exceed 100 km/h. Rough topography predominates in the mining area. Figure 1 shows the geographical position of the Sarcheshmeh copper mine.

Fig. 1

Location of the Sarcheshmeh mine and Shur River (after Atapour and Aftabi 2007; Rooki et al. 2011)

The ore body at Sarcheshmeh is oval shaped, about 2,300 m long and 1,200 m wide. The deposit is associated with the late Tertiary Sarcheshmeh granodiorite porphyry stock. The geology of the Sarcheshmeh porphyry deposit is very complicated and various rock types are found there. Mineralization is of Late Tertiary age, with chalcocite, chalcopyrite, covellite, bornite, and molybdenite as the main ore minerals; molybdenum, gold, and silver also occur in the deposit. The oxide zone consists mainly of cuprite, tenorite, malachite, and azurite. Pyrite is the principal gangue mineral and is responsible for the acidity of the mine drainage (Monjezi et al. 2009). Open pit mining is used to extract the copper, and approximately 40,000 tons of ore (average grades 0.9% Cu and 0.03% Mo) are extracted per day (Banisi and Finch 2001). The catchment area of the Shur River is approximately 200 km2 and its discharge is about 0.53 m3/s (Monjezi et al. 2009).

Sampling and field methods

Water sampling in the Shur River downstream from the Sarcheshmeh mine was carried out in February 2006. The samples comprised water from the Shur River (Fig. 1) originating from the Sarcheshmeh mine, acidic leachates from the heap structure, run-off of leaching solution into the river, and samples affected by tailings along the river. The water samples were immediately acidified by adding HNO3 (10 cc acid per 1,000 cc sample) and stored under cool conditions. The equipment used in this study comprised sample containers, GPS, an oven, an autoclave, a pH meter, and atomic absorption and ICP analysers. The pH of the water was measured in the field using a portable pH meter; the other physical parameters recorded were total dissolved solids (TDS), electrical conductivity (EC) and temperature. Dissolved metal analyses were performed using an atomic absorption spectrometer (AA220) in the water laboratory of the National Iranian Copper Industries Company (NICIC). An ICP analyser (model 6000) was also used to determine the concentrations of heavy metals present at ppb levels. Table 1 gives the minimum, maximum, and mean values of some physical and chemical parameters. The mean heavy metal values in Table 1 indicate severe conditions for aquatic life and the surrounding environment of the Shur River. According to the correlation matrix (Table 2), pH, SO4 and Mg show the strongest correlations with the heavy metal (Cu, Mn and Zn) concentrations.

Table 1 Maximum, minimum, and mean values of physical and chemical constituents, including heavy metals, of the Shur River
Table 2 Correlation matrix between heavy metals concentrations and independent variables

Support vector machine

In pattern recognition, the SVM algorithm constructs non-linear decision functions by training a classifier to perform a linear separation in a high-dimensional space that is non-linearly related to the input space.

To generalize the SVM algorithm to regression analysis, an analogue of the margin is constructed in the space of the target values (y) using Vapnik’s ε-insensitive loss function, shown in Fig. 2:

$$ \left| {y - f(x)} \right|_{\varepsilon } := \max \left\{ {0,\left| {y - f(x)} \right| - \varepsilon } \right\} $$
(1)
Fig. 2

Concept of ε-insensitivity. Only the samples out of the ±ε margin will have a nonzero slack variable, so they will be the only ones that will be part of the solution (Liu et al. 2009)
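As a minimal illustration (ours, not part of the original study), Eq. (1) translates directly into Python with NumPy:

import numpy as np
def eps_insensitive_loss(y, f_x, eps):
    # Vapnik's epsilon-insensitive loss: max(0, |y - f(x)| - eps)
    return np.maximum(0.0, np.abs(y - f_x) - eps)
# Residuals inside the +/- eps tube contribute zero loss:
print(eps_insensitive_loss(np.array([1.0, 1.2, 2.0]), np.array([1.1, 1.1, 1.1]), eps=0.15))
# -> [0.  0.  0.75]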

To estimate a linear regression

$$ f(x) = (w \cdot x) + b $$
(2)

with precision ε, one minimizes

$$ \frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{i = 1}^{N} {\left| {y_{i} - f(x_{i} )} \right|_{\varepsilon } } $$
(3)
Introducing slack variables \( \xi_{i} \) and \( \xi^{\prime}_{i} \), this can be written as a constrained optimization problem: minimize

$$ L(w,\xi ,\xi^{\prime}) = \frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{i = 1}^{N} {(\xi_{i} + \xi^{\prime}_{i} )} $$
(4)

subject to

$$ ((w \cdot x_{i} ) + b) - y_{i} \le \varepsilon + \xi_{i} $$
(5)
$$ y_{i} - ((w \cdot x_{i} ) + b) \le \varepsilon + \xi^{\prime}_{i} $$
(6)
$$ \xi_{i} ,\;\xi^{\prime}_{i} \ge 0 $$
(7)

for all i = 1,…, N. It should be noted that, according to (5) and (6), any error smaller than ε does not require a nonzero \( \xi_{i} \) or \( \xi^{\prime}_{i} \) and does not enter the objective function (3).

Generalized kernel-based regression estimation is carried out in complete analogy to the pattern recognition case. Introducing Lagrange multipliers, one arrives at the following optimization problem: for C > 0 and ε > 0 chosen a priori, maximize

$$ L(\alpha ,\alpha^{\prime}) = - \frac{1}{2}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {(\alpha_{i} - \alpha_{i}^{\prime } )(\alpha_{j} - \alpha_{j}^{\prime } )\,x_{i}^{T} x_{j} } } + \sum\limits_{i = 1}^{N} {((\alpha_{i} - \alpha_{i}^{\prime } )y_{i} - (\alpha_{i} + \alpha_{i}^{\prime } )\varepsilon )} $$
(8)
$$ {\text{subject to }}0 \le \alpha_{i} ,\;\alpha_{i}^{\prime } \le C $$
(9)

where \( x_{i} \) only appears inside an inner product. To obtain a potentially better representation of the data, the data points can be mapped into an alternative space, generally called the feature space (a pre-Hilbert or inner product space), through the replacement:

$$ x_{i} \cdot x_{j} \to \varphi (x_{i} ) \cdot \varphi (x_{j} ) $$
(10)

The functional form of the mapping \( \varphi (x_{i} ) \) does not need to be known, since it is implicitly defined by the choice of kernel \( k(x_{i} ,x_{j} ) = \varphi (x_{i} ) \cdot \varphi (x_{j} ) \), an inner product in Hilbert space. With a suitable choice of kernel, data that are not linearly separable in the input space can become separable in the feature space. Thus, whereas the n-parity and two-spirals problems are non-separable by a hyperplane in the input space, they can be separated in the feature space by the RBF kernel:

$$ k(x_{i} ,x_{j} ) = e^{{ - \left\| {x_{i} - x_{j} } \right\|^{2} /2\sigma^{2} }} $$
(11)

where \( \sigma \) is the Gaussian parameter.
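As a sketch with our own variable names, Eq. (11) is a one-liner in Python:

import numpy as np
def rbf_kernel(x_i, x_j, sigma):
    # Gaussian (RBF) kernel of Eq. (11): exp(-||x_i - x_j||^2 / (2 sigma^2))
    return np.exp(-np.sum((np.asarray(x_i) - np.asarray(x_j)) ** 2) / (2.0 * sigma ** 2))

In practice the input variables are normalised before the kernel is applied, so that no single variable dominates the distance \( \left\| {x_{i} - x_{j} } \right\| \).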

Several other common choices of kernel are listed in Table 3.

Table 3 Different common kernels

Then, the regression estimate takes the form

$$ y_{j} = \sum\limits_{i = 1}^{N} {(\alpha_{i} - \alpha_{i}^{\prime } )\varphi (x_{i} )^{T} \varphi (x_{j} )} + b = \sum\limits_{i = 1}^{N} {(\alpha_{i} - \alpha_{i}^{\prime } )K(x_{i} ,x_{j} )} + b $$
(12)

where b is computed using the fact that (5) becomes an equality with \( \xi_{i} = 0 \) if 0 < α i  < C, and (6) becomes an equality with \( \xi^{\prime}_{i} = 0 \) if 0 < \( \alpha_{i}^{\prime } \) < C.
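To make Eq. (12) concrete, the following sketch (our own names, not code from this study) evaluates the kernel expansion at a new point, given the coefficient differences \( (\alpha_{i} - \alpha_{i}^{\prime }) \) and the bias b returned by an SVR solver with the RBF kernel (11):

import numpy as np
def svr_predict(x_new, X_sv, dual_coef, b, sigma):
    # Eq. (12): f(x) = sum_i (alpha_i - alpha'_i) k(x_i, x) + b,
    # where X_sv holds the support vectors and dual_coef the corresponding
    # (alpha_i - alpha'_i) values; zero coefficients never contribute.
    k = np.exp(-np.sum((X_sv - x_new) ** 2, axis=1) / (2.0 * sigma ** 2))
    return float(np.dot(dual_coef, k) + b)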

Several extensions of this algorithm are possible. From an abstract point of view, all that is needed is a target function that depends on the vector (w, ξ). There are multiple degrees of freedom in constructing this function, including some freedom in how to penalize, or regularize, different parts of the vector, and some freedom in how to use the kernel trick (Agarwala et al. 2008; Quang-Anh et al. 2005; Stefano and Giuseppe 2006; Lia et al. 2007; Hwei-Jen and Jih Pin 2009; Eryarsoy et al. 2009; Chih-Hung et al. 2009; Sanchez 2003).

Support vector machine implementation for prediction of heavy metals

As with other multivariate statistical models, the performance of SVM regression depends on the combination of several parameters: the capacity parameter C, the \( \varepsilon \) of the \( \varepsilon \)-insensitive loss function, and the kernel type K with its corresponding parameters. C is a regularization parameter that controls the trade-off between maximizing the margin and minimizing the training error. If C is too small, insufficient stress is placed on fitting the training data; if C is too large, the algorithm will overfit the training data. However, Wang et al. (2003) indicated that the prediction error is scarcely influenced by C. To make the learning process stable, a large value should be set for C (e.g., C = 100).

The optimal value of \( \varepsilon \) depends on the type of noise present in the data, which is usually unknown. Even if enough knowledge of the noise were available to select an optimal value of \( \varepsilon \), there is the practical consideration of the number of resulting support vectors. \( \varepsilon \)-insensitivity prevents the entire training set from meeting the boundary conditions, and so allows for the possibility of sparsity in the solution of the dual formulation. Therefore, choosing an appropriate value of \( \varepsilon \) is critical.

Since a non-linear SVM is applied in this study, a suitable kernel function must be selected. Previously published research (Dibike et al. 2001; Han and Cluckie 2004) indicates that the Gaussian radial basis function is more efficient than other kernel functions. The Gaussian kernel has the form:

$$ k(x_{i} ,x_{j} ) = e^{{ - \left\| {x_{i} - x_{j} } \right\|^{2} /2\sigma^{2} }} $$
(13)

where \( \sigma \) is a constant parameter of the kernel that controls the amplitude of the Gaussian function and hence the generalization ability of the SVM. This parameter must be optimized.

To find the optimum values of the two parameters (\( \sigma \) and \( \varepsilon \)) and to prevent overfitting of the model, the data set was randomly separated into a training set of 44 samples and a test set of 12 samples, and leave-one-out cross-validation was performed on the whole training set. The leave-one-out (LOO) procedure consists of removing one example from the training set, constructing the decision function on the basis of the remaining training data only, and then testing on the removed example (Liu et al. 2006). In this fashion one tests all examples of the training data and measures the fraction of errors over the total number of training examples. The root mean square error (RMS) was used as the error function to evaluate the quality of the model:

$$ {\text{RMS}} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \widehat{y}_{i} } \right)^{2} } }}{n}} $$
(14)

where \( y_{i} \) is the measured value, \( \hat{y}_{i} \) denotes the predicted value, and n stands for the number of samples. The detailed process of selecting the parameters and the effect of each parameter on the generalization performance of the model are shown in Fig. 3. To obtain the optimal value of σ, SVMs with different σ were trained, with σ varying from 0.01 to 0.2 in steps of 0.01. The RMS for each σ was calculated from the LOO cross-validation of the training set in order to determine the optimal value; the curve of RMS versus σ is shown in Fig. 3, and the optimal σ was found to be 0.13. Similarly, to find the optimal \( \varepsilon \), the RMS for different values of \( \varepsilon \) was calculated; the curve of RMS versus ε is shown in Fig. 3, from which the optimal \( \varepsilon \) was found to be 0.08.

Fig. 3

Sigma versus RMS error (left) and epsilon versus RMS error (right) on LOO cross-validation
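The parameter search described above can be reproduced in outline with scikit-learn, which parameterises the Gaussian kernel by gamma = 1/(2σ²) rather than by σ itself. The data loader below is a hypothetical placeholder, and the ε grid is our assumption (only the σ grid is reported explicitly):

import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import LeaveOneOut, cross_val_predict
# Hypothetical loader: X is the (44, 3) training matrix of pH, SO4 and Mg;
# y is the concentration of one metal (e.g. Cu). Not part of this study.
X, y = load_training_data()
best_rms, best_sigma, best_eps = np.inf, None, None
for sigma in np.arange(0.01, 0.21, 0.01):      # sigma swept from 0.01 to 0.2, step 0.01
    for eps in np.arange(0.01, 0.21, 0.01):    # assumed epsilon grid
        model = SVR(kernel="rbf", C=100.0, epsilon=eps,
                    gamma=1.0 / (2.0 * sigma ** 2))
        y_hat = cross_val_predict(model, X, y, cv=LeaveOneOut())
        rms = np.sqrt(np.mean((y - y_hat) ** 2))  # Eq. (14) on the LOO predictions
        if rms < best_rms:
            best_rms, best_sigma, best_eps = rms, sigma, eps
print(f"optimal sigma={best_sigma:.2f}, epsilon={best_eps:.2f}, LOO RMS={best_rms:.3f}")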

From the above discussion, σ, \( \varepsilon \) and C were fixed at 0.13, 0.08 and 100, respectively, for which the SVM model had 45 support vectors. Figure 4 shows a schematic of the SVM structure. The prediction correlation coefficients (R) of the test set for Mn, Fe, Cu and Zn are 0.94, 0.88, 0.951 and 0.95, respectively (Fig. 5).

Fig. 4

Schematic of SVM structure

Fig. 5

Obtained results of SVM in the prediction process of Mn, Fe, Cu, and Zn in test data

Prediction by general regression neural network

To check the accuracy of the SVM in predicting the heavy metals contained in the AMD, its results were compared with those of the General Regression Neural Network (GRNN) proposed by Specht (1991). GRNN is a type of supervised network that trains quickly on sparse data sets and, rather than categorising data, produces continuous-valued outputs. It is a three-layer network with one hidden neuron for each training pattern, and is a modification of the probabilistic neural network that has been successfully used in many engineering applications. Huang and Williamson (1994) described GRNN as an easy-to-implement tool with efficient training capabilities and the ability to handle incomplete patterns. GRNN is known to be particularly useful for approximating continuous functions; it may have multidimensional inputs, and it fits multidimensional surfaces through the data (Huang et al. 1996).

The optimum structure of the GRNN was obtained by trial and error, and the optimum smooth factor (SF) was selected as 0.10. The network has three layers: an input layer with 3 neurons (pH, SO4 and Mg); a hidden layer of 44 neurons (the number of training samples), each with a radial basis activation function; and an output layer with 4 neurons (Cu, Fe, Mn and Zn) with linear activation functions (Rooki et al. 2011). The prediction correlation coefficients (R) of the test set for Mn, Fe, Cu and Zn are 0.94, 0.88, 0.951 and 0.95, respectively (Fig. 6).

Fig. 6

Obtained results of GRNN in the prediction process of Mn, Fe, Cu, and Zn in test data
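For reference, the GRNN forward pass is simple enough to sketch directly: the prediction is a Gaussian-weighted average of the training outputs, with one hidden neuron per training pattern and the smooth factor acting as the kernel width. A minimal sketch under these assumptions (variable names are ours):

import numpy as np
def grnn_predict(x_new, X_train, Y_train, sf=0.10):
    # Specht's (1991) GRNN: kernel-weighted average of training outputs.
    # X_train: (44, 3) inputs (pH, SO4, Mg); Y_train: (44, 4) outputs
    # (Cu, Fe, Mn, Zn); sf is the smooth factor (Gaussian width).
    d2 = np.sum((X_train - x_new) ** 2, axis=1)  # squared distance to each pattern
    w = np.exp(-d2 / (2.0 * sf ** 2))            # one hidden neuron per pattern
    return (w @ Y_train) / np.sum(w)             # normalised weighted average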

Discussion

In this research work, we have demonstrated an application of the support vector machine to the prediction of heavy metals contained in acid mine drainage. To assess its performance, we compared the results obtained by SVM with those of the GRNN. Figures 5 and 6 present the results of the two methods, and Table 4 compares the correlation coefficient (R) and root mean square error (RMS) of the two methods for both the training and test data. The subscripts 1 and 2 on R and RMS in Table 4 refer to the training and test data, respectively.

Table 4 Comparison of the results (R, RMS) of the two methods on training and test data

As is clear from Figs. 5 and 6 and Table 4, SVM performs considerably better than the GRNN, giving higher correlation coefficients (R) for the predicted values. SVM thus offers two reliable characteristics: good prediction accuracy and short running time.

Conclusions

The support vector machine (SVM) is a machine learning methodology based on statistical learning theory (SLT). Its notable features include the fact that the requirements on the kernel and the nature of the optimization problem yield a unique global optimum, high generalization performance, and no risk of converging to a local optimum. In this paper, a method to predict major heavy metals in the Shur River impacted by AMD has been presented using the SVM, with the GRNN also applied for comparison. Although both methods are data-driven models, the SVM was found to run considerably faster with higher accuracy. In terms of accuracy, the SVM technique reduced the RMS error relative to the GRNN model (Table 4). Regarding running time, the SVM requires a small fraction of the computational time used by the GRNN, which is also an important factor in choosing an appropriate, high-performance data-driven model.