Introduction

A key element in an exploration project is evaluation of the results in different exploration phases. This is important to obtain scientific and economic implications to move to the next steps of exploration. This problem is more obvious when the volumes of exploration operations are increased and the quality levels of the explorations are different. Comparative, integration, and modeling methods have been used to solve this problem (Bonham-Carter et al. 1988; Venkataraman et al. 2000; Moon et al. 2006). Classification methods are important in exploration data modeling methodologies. In exploration data modeling methodologies, deposits are divided into training and testing classes. Using the training class, a model is built on the basis of exploration datasets. Then, based on this model and the exploration data of the testing class, probabilistic results of the model are estimated. In case of positive results in estimation, extension of exploration is proposed. Classification methods are divided into two categories (Duda et al. 2000; Theodoridis and Koutroumbas 2009): (i) for data with specified probability density function and (ii) for data without specified probability density. In this research, discriminant analysis (DA) representing the first category and support vector machine (SVM) representing the second category are used.

The application of DA in mineral exploration was first introduced by Harris (1965) to evaluate the role of geological variables in determining the probability of mineralization. This method has also been used to identify geochemical anomalies (Rose 1972; Whitehead and Govett 1974), to identify exploration criteria to separate ore-bearing rocks from barren rocks (Divi et al. 1979; Fedikow et al. 1991), to separate rock units and minerals (McKinley et al. 2014; Tibljas et al. 2002; Varadanchari and Mukherjee 2004), and to map hydrothermal alterations (Tahmasebi et al. 2010). Roshani et al. (2013) used the DA method to correlate surface geochemical anomalies with borehole data in the Kuh-Panj porphyry copper deposit, SE Iran. Belkhiri and Mouni (2014) used the DA for geochemical characterization of surface water and groundwater resources. Modeling of mineral potential using DA has been demonstrated by Agterberg (1974), Chung (1977), Prelat (1977), Bonham-Carter and Chung (1983), Harris and Pan (1999), Pan and Harris (2000), and Harris et al. (2003). Carranza (2009) has demonstrated a GIS-based technique for spatial evidence representation in modeling of mineral potential through DA.

The SVM was first proposed for pattern recognition by Vapnik in 1982. Then, it was used by Li (2005) to identify optimum locations for drilling the oil and gas fields in China. This method was also used for classification, regression and data ranking (Yu and Kim 2012), lithologic classification (Al-Anazi and Gates 2010; Yu et al. 2012), geochemical fingerprinting of ores (Savu-Krohn et al. 2011), identification of hydrothermal alteration associated with mineralization (Abbaszadeh et al. 2013), mineral resource estimation (Chatterjee and Bandopadhyay 2011), and modeling of mineral potential (Wu et al. 2010; Zuo and Carranza 2011; Abedi et al. 2012; Rodriguez-Galiano et al. 2014).

With regard to modeling of mineral potential, the performance of DA has been compared and contrasted with classification methods suitable for data with specified probability density [e.g., logistic regression (LR)] and with classification methods suitable for data without specified probability density [e.g., artificial neural networks (ANN)] (cf. Agterberg 1974; Harris and Pan 1999; Harris et al. 2003). A common conclusion of these comparative studies is that, with regard to modeling of mineral potential, classification methods suitable for data without specified probability density (e.g., ANN) generally outperform classification methods suitable for data with specified probability density (e.g., DA, LR), and that LR generally outperforms DA. In this study, we compare and contrast DA with SVM, which has been around since just a decade or so ago. Both DA and SVM make use of a hyperplane for classification (see below), and this commonality is the basis for comparing their performance in this study.

The study area (20 km2) covers the Sari-Gunay and Agh-Dagh hills at the Sari-Gunay epithermal gold deposit, NW Iran. The exploration, including extensive drilling, has been completed at the Sari-Gunay hill, but the Agh-Dagh hill and its surrounding areas still need more drilling. In this research, DA and SVM are applied to the soil geochemical and borehole data to obtain algorithms for mapping potential gold mineralization. The final potential map can be used for further drilling in the Sari-Gunay exploration area.

Methods

Classification methods can be divided into two types, one in which the probability distribution of data must be specified and the other without the need to know the probability distribution of data. The aim of the first type is to minimize empirical risk, while the second type is used to minimize structural risk (Cristianini and Taylor 2000). In empirical risk minimization, the output is a model with a minimum error in the training data, whereas in the structural risk minimization, the output is a model with generalization properties that has a minimum error in both the training and the testing data (Theodoridis and Koutroumbas 2009).

The DA and LR methods belong to the first type, and the SVM and ANN methods are attributed to the second type. The DA method has been chosen because, compared to LR, it is less sensitive to the number of samples and the size of training set (Davis 2002). It has been shown that when sample sizes are equal and homogeneity of variance/covariance holds, DA is more accurate than LR (Hu and Yu 2011). Discriminant analysis has more assumptions and restrictions than LR (Davis 2002), but when the assumptions of DA are met it is more powerful than LR (e.g., Bökeoğlu Çokluk and Büyüköztürk 2008; Cohen et al. 2002; Hu and Yu 2011). The SVM method was also selected because of its four main advantages (Auria and Moro 2008; Byvatov et al. 2003; Shawe-Taylor and Cristianini 2004): (1) it has a regularization parameter, which makes the user think about avoiding over-fitting; (2) it uses the kernel trick, so it can build an expert knowledge about the problem via the kernel; (3) SVM is defined by a convex optimization problem (no local minima), which makes it more efficient than ANN; and (4) it is an approximation bound on the test error rate, and there is a substantial body of theory behind it that suggests to choose SVM over other methods.

Discriminant Analysis

DA obtains a hyperplane that describes the separation between the observed groups. It allows classification of new observations into one of the known and predefined groups. This method is based on the mean and pooled matrix of the data. To perform DA, two conditions are required, namely (Davis 2002; McLachlan 1992): (i) all variables must have a normal distribution; and (ii) the matrix of pooled variance–covariance for all groups should be homogeneous. If these two conditions are satisfied, DA with a linear function can be used, whereas if the first condition is satisfied but the second condition is not, DA with quadratic function can be used instead of a linear function (Croux and Joossens 2005; Friedman 1989). If, in a new observation point, the p variables are measured, then

$$ x_{i} = (x_{i1} , x_{i2} ,x_{i3} , \ldots ,x_{ip} )^{T} \quad i = 1,2, \ldots ,n, $$
(1)

where x i is a column array of the data and n is the number of samples. To classify the new observation point between K classes that are separated by a hyperplane, computation of the classification score of that point in each class is required. The value of the score is obtained from the following equation:

$$ cf\left( {x_{i} } \right) = \left( {x_{i} - \mu_{k} } \right)^{T} \Sigma_{k}^{ - 1} \left( {x_{i} - \mu_{k} } \right) + \ln \left| {\Sigma_{k} } \right| - 2\log \pi_{k} $$
(2)

In this equation, μ k is the group k mean vector (1 ≤ kK), \( \Sigma_{k}^{ - 1} \) is the inverse covariance matrix of group k, \( \left| {\Sigma_{k} } \right| \) is the covariance matrix determinant, and \( \pi_{k} \) is the initial probability of group k. Therefore, the new observation point belongs to the group that has the lowest classification score. This equation is called quadratic function discriminant analysis (QDA) (Wu et al. 1996).

If the covariance matrix of the groups is homogeneous in Eq. (2), then the matrix of pooled covariance between groups can be used. This pooled matrix is calculated from the following equation:

$$ \Sigma = \frac{1}{N}\sum\nolimits_{k = 1}^{K} {n_{k} \Sigma_{k} } $$
(3)

By placing the matrix in Eq. (2), we will have

$$ cf\left( {x_{i} } \right) = \left( {x_{i} - \mu_{k} } \right)^{T} \Sigma^{ - 1} \left( {x_{i} - \mu_{k} } \right) - 2\log \pi_{k} = 2\mu_{k}^{T} \Sigma^{ - 1} x_{i} - \mu_{k}^{T} \Sigma^{ - 1} \mu_{k} - 2\log \pi_{k} $$
(4)

This equation is called linear function discriminant analysis (LDA) (George and Fernandez 2002; Wu et al. 1996).

In DA, three aspects are considered for function validation (Srivastava 2002): (1) significant differences between the classes; (2) proper selection of significant variables in the discriminant function; and (3) estimation of the probability of misclassification.

Support Vector Machine

SVM is a tool based on the statistical learning theory (Vapnik 1999). In this method, the input vectors are mapped by kernel functions into a multi-dimensional space with larger dimension than the original dimension. Then, a hyperplane is created that separates the input vectors at the maximum possible distance. This hyperplane is called the hyperplane with the maximum separating boundary. Two other parallel hyperplanes with no data between them exist on both sides of this hyperplane (Fig. 1). As the distance between two parallel hyperplanes is increased, the classification error is decreased (Ben-Hur and Weston 2010; Burges 1998).

Figure 1
figure 1

A separating hyperplane with a maximum possible margin to distinguish classes in a dataset in a linear case. The support vectors, the data fall in the two hyperplanes (dash lines) parallel to the separating hyperplane, are encircled

Suppose the training data include N data pairs in Eq. (5); therefore, the independent variable x i is classified in two classes of y i  = 1 and y i  = −1 (Kavzoglu and Colkesen 2009).

$$ D = \left\{ {(x_{i} ,y_{i} )|x_{i} \in R^{n} , y_{i} \in \left\{ { - 1,1} \right\}} \right\}_{i = 1}^{N} $$
(5)

Then, the separating hyperplane equation of the two classes is

$$ \left\{ {x\in R|f\left( x \right) = w \cdot x + b = 0} \right\}, \quad w\in R, \;b\in R, $$
(6)

where w is the vector of hyperplane coefficient and b is the vector representing the hyperplane distance away from the origin of coordinates. Selection of unique solutions for these two parameters would lead to a maximum distance between the two parallel hyperplanes and increase in generalization of the separating hyperplane, and therefore, the optimization is

$$ \min_{w,b} \frac{1}{2}\left\| w \right\|^{2} + c\sum\nolimits_{i = 1}^{N} {\xi_{i} } $$
(7)

subject to the following constraints:

$$ y_{i} \left( {\varphi (x_{i} ) \cdot w + b} \right) \ge \left( {1 - \xi_{i} } \right) \forall i,\quad \xi_{i} \ge 0, $$
(8)

where \( \varphi \left( {x_{i} } \right) \) is the kernel function, c is the fixed value of capacity, and \( \xi_{i} \) is the auxiliary parameter to separate overlapping data. Larger values of c represent a further consideration to the points close to the decision boundary, and smaller values that are points away from the decision boundary are optimized in solving the problem. This SVM method is called c-SVM approach. But in nu-SVM approach, the optimization is

$$ \min_{w,b} \frac{1}{2}\left\| w \right\|^{2} - nu \times \rho + \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\xi_{i} } $$
(9)

subject to the following constraints:

$$ y_{i} \left( {\varphi (x_{i} ) \cdot w + b} \right) \ge \rho - \xi_{i} \forall i,\quad \xi_{i} \ge 0, \;\rho \ge 0, $$
(10)

where ρ is half of the distance between the two parallel hyperplanes and nu is the control parameter effect of the second part of the optimization function. The value of nu is selected between zero and one. In order to solve Eqs. (7) and (9), the Wolfe dual Lagrange function is used by keeping the Karush–Kuhn–Tucker conditions. Detailed information can be found in Theodoridis and Koutroumbas (2009). By obtaining the w and b vectors by maximizing the Wolfe dual Lagrange function, the decision function to classify a new sample is (Yang et al. 2008):

$$ G\left( x \right) = {\text{sign}}\left[ {f\left( x \right)} \right] = {\text{sign}}\left[ {\varphi (x) \cdot w + b} \right] $$
(11)

To obtain appropriate results in classification using SVM, two aspects must be considered. First, to obtain optimal values for the parameters c and nu, respectively, the methods c-SVM and nu-SVM are used by cross-validation method. Second, to obtain the appropriate kernel function, the trial-and-error method is used. In the SVM method, four functions of linear kernel, polynomial kernel, radial basis function, and the sigmoid function are used additionally (Table 1).

Table 1 Common Kernel Functions in SVM Method

Geology of the Study Area

Sari-Gunay deposit is located in 60 km west of Hamadan city in the northwestern part of Iran. The geology of this area in terms of tectonic setting is part of the Tethyan Ocean formed between the Eurasian and Gondwana supercontinents. Evidence for closure of at least two oceans (i.e., Paleotethys in Paleozoic and Neotethys in Cenozoic) has been recorded in this area (Alavi 1994; Bagheri and Stampfli 2008; Stampfli and Borel 2004; Stőcklin 1968). The Sanandaj–Sirjan magmatic–metamorphic zone and Urumieh–Dokhtar magmatic arc (Fig. 2a) are two important geologic subdivisions of the Zagros orogenic belt in this area (Aliyari et al. 2012; Kouhestani et al. 2012). Epithermal gold and porphyry copper mineralizations in these two zones are associated with subduction and obduction processes from Jurassic to Miocene (Dargahi et al. 2010; Mirnejad et al. 2011; Moosavi et al. 2008).

Figure 2
figure 2

a Sanandaj–Sirjan zone and Urumieh–Dokhtar magmatic arc in Iran. b Geological map of the Sari-Gunay epithermal gold deposit (Wilkinson 2005)

The Sari-Gunay gold deposit is composed of two mineralized zones (Sari-Gunay hill in the northwest and Agh-Dagh hill in the southeast) (Figs. 2b, 3). It lies on the northern western part of Sanandaj–Sirjan zone (Fig. 2a). This deposit is related to sub-volcanic intrusion in a high potassium sub-alkaline volcanic complex of the Middle Miocene (Richards et al. 2006). The bedrock in this area is mostly Jurassic limestone and some clastic sediments as well as metamorphic sedimentary rocks. These rocks are covered by andesitic volcanic and Oligo-Miocene to Pleistocene pyroclastic rocks. The mineralized host rocks are strongly altered diatreme breccia with sub-alkaline dacite porphyry (Fig. 2b). These rock units are intersected by a large number of antimony sulfide and arsenic veins (Asadi et al. 2014).

Figure 3
figure 3

Location map of the soil geochemical samples on the topography map of the study area

According to Richards et al. (2006), Sari-Gunay is mostly a large low sulfidation epithermal system showing strong vertical and lateral gold mineralization associated with silicified and brecciated volcanic and sub-volcanic rocks. The epithermal system is characterized by a gold–stibnite–mercury-rich core, surrounded by arsenic halos and further away by lead–silver–zinc–copper halos.

Two distinct mineralization phases of epithermal gold and porphyry Cu–Au have been recorded at Sari-Gunay (Richards et al. 2006; Wilkinson 2005). The first mineralization phase is a porphyry system associated with potassic, sericite, and quartz–tourmaline–feldspar–potassic alteration assemblages. This porphyry Cu–Au mineralization is low grade and deeper than 500 m and is mostly restricted to the sub-volcanic dykes showing weak to moderate potassic alteration. The second phase of mineralization is low-temperature epithermal gold system associated with strong silicification and sericite–fengite alteration and formation of gold that is mostly hosted in arsenian pyrite (Wilkinson 2005; Richards et al. 2006).

The porphyry mineralization mostly occurred as quartz–pyrite–chalcopyrite–tennantite–magnetite vein/veinlets (with average 0.25% Cu and ≤0.5 g/t Au grade), quartz–tourmaline veins and breccia cements (with minor amounts of sulfide minerals and low gold grade), quartz–pyrite–stibnite–realgar–orpiment veins (with dominantly gold mineralization), and quartz–calcite–pyrite ± galena ± sphalerite veins (with low grade of base metals). The dominant vein trend is 20°–30° (NNE-SSW) with dips from vertical to 70° WNW. Sari-Gunay with 120–130 Mt resource at an average grade of 2 g/t gold is so far the largest discovered gold deposit in Iran (Wilkinson 2005; Richards et al. 2006).

Mineralization at Agh-Dagh hill is mainly associated with diatreme breccia on northwest slopes of the hill (Fig. 2b). Arsenic mineralization at Agh-Dagh hill is stronger, and gold and antimony mineralizations are weaker than Sari-Gunay hill (Wilkinson 2005). Mineralization at Agh-Dagh deposit is associated with sills that contain the suite of metals Fe–As–Sb–Hg–Au–Ag–Tl with outer zones enriched in Fe–Pb–Ag–Zn elements.

Data Preparation

Within the Sari-Gunay exploration area, 1724 soil geochemical samples were collected and analyzed for 46 elements by ICP-MS method and for Au by fire assay method. The soil sample density varied from 100 m × 25 m to 100 m × 100 m, depending on the evidences of surface mineralization (Fig. 3). The most effective elements (listed in the Table 2 with their statistical parameters) have been selected from 47 elements by using stepwise and feature selection methods (see Sect. 5.1 and 5.2 for details) and then used for DA and SVM classifications. After removing the outlier and censored geochemical data, the results were normalized by using normal score transformation method (Tabachnick and Fidell 2012). In the normal score transformation, in addition to data normalization, standardization was also considered (Siddiqui and Syed Osman 2013). Therefore, the mean of the total normal score of geochemical samples per cell is considered as the normal score of each element in that cell.

Table 2 Statistical Parameters of the Dependent Variable and Independent Variables Involved in the Classification Functions

Forty six exploration boreholes data have been used in this study (Fig. 4). Core samples were analyzed like surficial soil geochemical samples and in OMAC laboratory in Ireland with supervised by Rio-Tinto Mining and Exploration Limited. In-house reference materials were used in every batch of 50 samples and sub-sample duplicates for each 25 measurements to check the analytical accuracy. This accuracy has been evaluated by calculating the relative standard deviation (RSD) with Thompson and Howarth (1976) method. Analytical precision (2 × RSD) exhibited better than ±10% for all elements.

Figure 4
figure 4

a Locations of the mineralized zones and boreholes in the Sari-Gunay hill area. b Number–size fractal plot of the PI variable

In this research, ordinary kriging was used to interpolate and calculate the average gold grade of blocks of 10 m × 25 m × 25 m as the optimal dimensions for reserve estimation (Bartram 2005). Then, for each 25 m × 25 m cell at the surface (Fig. 4a), the sum of block grades above cutoff grade (0.5 g/t) multiplied by the thickness of the blocks was calculated and considered as an index of the cell productivity (Solovov 1985). In Table 2, the statistical parameters of the productivity index (PI) are shown. From 1767 cells, 660 cells were below economic gold mineralization (i.e., the PI is zero), and regarded as the background class.

To identify the number of classes of the remaining cells with economic gold mineralization, the number–size fractal method was used (Mandelbrot 1983; Deng et al. 2009). In the fractal method, the cumulative number of grade has a power-law function with the mineralization variables such as block thickness and grade that can be used to calculate the PI (Sanderson et al. 1994; Roberts et al. 1998; Wang et al. 2010). The number–size plot showed that the PI data were composed of two classes (Fig. 4b). The inflection point (corresponding to a value of 4050 g/t) was considered as the threshold for separating the classes. Thus, the PI variable is classified into three populations, namely class A with PI of zero, class B with intermediate PI, and class C with high PI. Obviously, high PI in each cell represents gold mineralization in that cell. Figure 4a shows the distribution of these cells in the Sari-Gunay hill area.

Classification

From 1767 cells, with their PIs estimated, 387 cells overlap with the soil geochemical samples. This is due to the geochemical sampling with 100 m line spacing. Seventy percent of these soil geochemical samples were selected randomly as the training data to construct the classification function, and the remaining thirty percent were used as the testing data. The populations of these two datasets are shown in Table 3.

Table 3 The Number of Cells Overlapped with Geochemical Samples Based on Each Population

The first step in classification is to reduce the number of variables involved in the classification function, which is also called the reduced dimensions. This is due to the reason that some of the independent variables can be a linear or nonlinear combination of other independent variables. The presence of these variables in the function is useless and they will just increase the computations. Besides, reducing the number of variables would increase generalizability of the function and reduce errors in classification of the test data.

Classification by DA

In order to determine the effective elements in the discriminant function, the stepwise method with Mahalanobis distance was used (Roshani et al. 2013). The results showed that only seven out of the 47 elements (i.e., Au, Na, W, Hg, K, Y, and Co) are the most effective for DA of the present data. The Au as the major element in mineralization and Hg as the minor element in mineralization are appropriately included in the discriminant function. In addition, Co, W, and Y (low mobility elements) and Na and K were considered effective as they are typically associated with low sulphidation epithermal hydrothermal alteration (Pirajno 2009; Robert et al. 2007). Therefore, linear and quadratic discriminant functions by using the training data were computed for the seven significant elements. The leave-one-out cross-validation method (Hastie et al. 2009) was used to determine the accuracy of discriminant functions on the classification of the training data. Then, the obtained discriminant functions were used to classify the testing data, and the results were compared with the earlier classes. The results of DA classification of the training and testing data are shown in Tables 4 and 5.

Table 4 Results of Classification of the Training and Testing Data by LDA
Table 5 Results of Classification of the Training and Testing Data by QDA

The classification accuracy of the testing data was almost close to the classification of the training data (Tables 4, 5). This demonstrates an acceptable validity of the discrimination functions. The best accuracy of classification in both linear and nonlinear methods was attributed to the B population, and the lowest accuracy was attributed to the data of the C population. In general, the QDA function with classification accuracy of 68.2% showed a better performance than the LDA function with classification accuracy of 61.5 %.

Classification by SVM

First, the effective elements were selected through the feature selection method, with nearest neighbor criteria (Hastie et al. 2009). Gold, Ag, As, Ba, Be, Cd, Co, Cu, La, Ni, P, Pb, S, Sb, Sc, Th, Y, and Zr are these elements that involved in the classification functions as the independent variables. Gold, Ag, Cu, and Pb were considered the key elements, and As, Sb, and S are also effective elements as they are typically associated with low sulphidation epithermal gold mineralization (Pirajno 2009; Robert et al. 2007). Classification of the training data was done by each four kernel functions, and the best results were achieved with radial basis function kernel. The 10-fold cross-validation was used to optimize parameters c and nu, as well as parameter γ for the selected radial basis function kernel. The results of data classification by two methods of c-SVM and nu-SVM are listed in Tables 6 and 7.

Table 6 Results of Classification Using c-SVM with Optimized Parameters γ = 0.01 and c = 151
Table 7 Results of Classification Using nu-SVM with Optimized Parameters γ = 0.015 and nu = 0.320

The results in Tables 6 and 7 indicated that in both methods of SVM, the best classification was attributed to the population A, and the lowest accuracy was attributed to the population C. The considerable difference between the accuracy of the classification of the training data and the testing data in both methods was due to the usage of the classification model obtained by the training data for classification of the given data. However, the accuracy of the obtained classification by the cross-validation method in both SVM methods was close to the classification accuracy of the testing data. In total, classification of the training data and the testing data, both methods with classification accuracy of 88.9% revealed the same behavior.

Discussion

It is very important to define the location of the boreholes accurately in different stages of exploration. In most exploration projects, the borehole locations are identified by integrated analysis of the surface exploration data. In this research, we provide a new method to model the geometry of the mineralization by using a combination of surface geochemical data and subsurface drilling data to suggest further drilling locations. For this purpose, two classification methods were used in Sari-Gunay gold deposit.

By comparing the results of the classification by the DA method with the SVM method on the testing data, the nu-SVM method revealed to have the best ability and the QDA method showed the lowest ability to separate the data of the population A. The LDA method showed the weakest performance and three other methods showed the best performances in classification of the data of the population B. The best and weakest methods for classification of the data of the population C were attributed the nu-SVM and LDA methods, respectively. Therefore, it can be concluded that the methods of SVM perform a better classification than DA methods. Also, the nu-SVM method was the best method of classification, and after that c-SVM, QDA, and LDA methods showed better performances, respectively.

These results can be interpreted with two mathematical and geological aspects. In mathematical view, SVM method can be applied irrespective to data distribution and outliers. This method is also known as the generalization performance of the classifier (Duda et al. 2000; Theodoridis and Koutroumbas 2009). Therefore, it can be operated satisfactorily with training and testing data to gain better results. But, in geological view, Ag, As, Cd, Cu, Pb, S, and Sb independent variables are chalcophile elements that have a significant positive correlation with gold mineralization. Ba, Be, La, and Zr independent variables are also lithophile elements that have a significant negative correlation with gold mineralization in the study area. These two groups of elements have major role in the classification functions of the SVM method.

The above-mentioned four classification functions were applied to the geochemical data to predict the extension of gold mineralization at surface and to determine the optimum locations for further drillings in Sari-Gunay hill. These data were also used to define the appropriate locations for reconnaissance drilling at the Agh-Dagh hill. The results of surface modeling are shown in Figure 5. The classification results indicated three gold potential areas:

Figure 5
figure 5

Predicting models of gold mineralization by four classification functions in the exploration area (The background, moderate, and high potential areas are shown as blue, yellow, and red, respectively; black points indicate the location of the drilled exploration boreholes)

  1. I.

    The strongest and largest gold potential area is located in the central part of the Sari-Gunay hill. As shown in Figure 5, the four methods of classification separated the areas with high and moderate PI values from the background. However, the comparison of the obtained models with kriging indicates that the shape and size of these two potential areas in the obtained models by SVM method are more consistent with actual results (Fig. 4a). In addition, the mineralization model of QDA method is reasonably consistent with the kriging model.

  2. II.

    The second important gold potential area is located in Agh-Dagh hill. All four classification functions revealed two strong and moderate gold mineralized zones (high and moderate PI values) at the Agh-Dagh hill (Fig. 5). In terms of shape and size of these two zones, two methods of LDA and c-SVM were similar. The QDA method showed the largest gold mineralized area and the nu-SVM method showed the smallest gold mineralized area at the Agh-Dagh hill.

  3. III.

    The third important gold potential area is located between the Sari-Gunay and the Agh-Dagh hills. This area is mostly covered by soil. There are a few stibnite veins trending in a NE–SW direction in this area. All classification models showed moderate mineralized zones in this area (Fig. 5). The LDA method determined the continuity of the mineralization better than the other methods in this zone. It is concluded that LDA method is more useful in determining small areas of oriented structural control mineralization, while SVM works better to determine large areas of disseminated mineralization in this study area.

Two and three boreholes already successfully drilled at the Agh-Dagh hill and the area between the Sari-Gunay and Agh-Dagh hill, respectively. Four of these boreholes lie on the moderate and one hole lies in the high potential area. Lithology and chemical analysis of cores, obtained from these boreholes, shown that gold mineralization occurred in lithic tuff breccia and dacite–andesite porphyritic rocks in dispersion vein-like shape with an average grade of 3.1 g/t gold in II area (Fig. 5) and dacite crystalline tuff and dacite–andesite porphyritic rocks in narrow veins with an average grade of 4.1 g/t gold in III area (Fig. 5). Silicification and sericite alterations are the principal hydrothermal alterations in these rocks. Therefore, both moderate and high potential areas (moderate and high PI) are proposed as potential areas for further reconnaissance drilling. The high potential areas (high PI) at Sari-Gunay hill is proposed for further systematic drilling, and moderate potential areas here are proposed for further reconnaissance drilling.

Conclusions

The main conclusions obtained from the applications of two classification methods, DA and SVM, to the surface soil and drilling data to locate the high gold potential areas for further drilling in the Sari-Gunay gold deposit are as follows:

  1. 1.

    SVM methods classified the PI in all three classes of background, moderate, and high areas for the training and testing data due to the lower structural errors.

  2. 2.

    All four models of gold mineralization prediction in Sari-Gunay hill showed reasonable agreement with the model obtained by the kriging method. Nevertheless, the SVM models showed a better fit with kriging model in terms of shape and extent of the mineralization with different quantities.

  3. 3.

    All four models predicted the mineralization in Agh-Dagh hill, which is very similar to Sari-Gunay hill. However, the size of the strong mineralization zone in the Agh-Dagh hill is much smaller than the Sari-Gunay hill.

  4. 4.

    The obtained gold mineralization prediction model by the LDA method showed good continuity of gold mineralization along the northeast–southwest trending valley between the two hills.

  5. 5.

    Zones of high PI are proposed as the first priority for infill systematic drilling at the Sari-Gunay hill, and zones with medium PI are proposed for further reconnaissance drilling at Agh-Dagh hill as well as the area between Sari-Gunay and Agh-Dagh hills.